# SAMSum Heterogeneous 18-Model Swarm (Ollama) — Colab Notebook

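This notebook runs a heterogeneous swarm of **18 Ollama models** on the **SAMSum** dataset and saves per-sample outputs as JSONL under `RESULTS_DIR`. In the runs shown below this is `/content/slm_evaluation/results`; point `BASE_DIR` at a mounted Google Drive folder to keep the files after the runtime resets.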

**Consensus method:** SentenceTransformer embeddings + cosine similarity (select the most "central" summary).
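
A minimal sketch of the consensus step, assuming that "central" means the candidate with the highest mean cosine similarity to the other candidates. The notebook's own `get_consensus` is not shown, so the body below, the extra `embed` parameter, and the exact meaning of `avg_similarity` and `consensus_scores` are assumptions; only the `(best_idx, meta)` return shape and those two key names come from the evaluation cells. `toy_embed` is a hypothetical stand-in for a SentenceTransformer encoder so the example runs without downloading a model.

```python
import math

def _cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def get_consensus(summaries, embed):
    """Pick the summary whose embedding is closest, on average, to all the others.

    `embed` maps a list of strings to a list of vectors (one per string).
    """
    vecs = [list(v) for v in embed(summaries)]
    n = len(vecs)
    if n == 1:
        scores = [1.0]
    else:
        # Mean similarity to the *other* candidates (self-similarity is excluded).
        scores = [
            sum(_cosine(vecs[i], vecs[j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)
        ]
    best_idx = max(range(n), key=lambda i: scores[i])
    meta = {"avg_similarity": scores[best_idx], "consensus_scores": scores}
    return best_idx, meta

def toy_embed(texts):
    """Hypothetical embedder: word counts over a tiny fixed vocabulary."""
    vocab = ["hannah", "larry", "betty", "number", "party"]
    return [[t.lower().count(w) for w in vocab] for t in texts]

candidates = [
    "Hannah asks Larry for Betty's number.",
    "Hannah will ask Larry for the number of Betty.",
    "Wanda plans a party.",
]
best_idx, meta = get_consensus(candidates, toy_embed)
print(best_idx, meta["consensus_scores"])
```

The evaluation cells call `get_consensus(valid_summaries)` with a single argument, so with a real encoder the `embed` argument would need to be bound beforehand (for example with `functools.partial`), or the encoder would be held as a module-level object.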

**Notes**
- Concurrency is capped (6 in-flight requests by default) to keep Colab stable.
- Put `samsum.jsonl` in `/content/` (Colab file area).
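
The concurrency cap mentioned in the notes is implemented in the evaluation cells with an `asyncio.Semaphore` wrapped around each model call. The pattern can be sketched in isolation as follows; `run_bounded` and its dummy `job` are hypothetical names used only for this illustration, and the short sleep stands in for one request to Ollama.

```python
import asyncio

async def run_bounded(n_tasks, limit):
    """Run n_tasks dummy jobs with at most `limit` in flight; return (results, peak)."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def job(i):
        nonlocal in_flight, peak
        async with sem:  # blocks here once `limit` jobs are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for one model call
            in_flight -= 1
            return i

    # gather() returns results in submission order, regardless of completion order.
    results = await asyncio.gather(*(job(i) for i in range(n_tasks)))
    return results, peak

results, peak = asyncio.run(run_bounded(n_tasks=18, limit=6))
print(len(results), "jobs finished; peak in flight:", peak)
```

Inside a notebook cell, where an event loop is already running, `await run_bounded(18, 6)` would be used instead of `asyncio.run(...)`.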


In [None]:
!pip -q install rouge-score bert-score sentence-transformers scikit-learn spacy pandas tqdm httpx fastapi uvicorn
!python -m spacy download en_core_web_sm -q

# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh
!ollama --version


  Preparing metadata (setup.py) ... done
  Building wheel for rouge-score (setup.py) ... done
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Cr

In [None]:
from google.colab import drive
import os, time, json, asyncio, subprocess
from pathlib import Path

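# Results go to the Colab VM's local disk, which is wiped when the runtime resets.
# To persist them, mount Drive and point BASE_DIR at it instead, e.g.:
#   drive.mount("/content/drive")
#   BASE_DIR = "/content/drive/MyDrive/slm_evaluation"
BASE_DIR = "/content/slm_evaluation"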
RESULTS_DIR = f"{BASE_DIR}/results"
os.makedirs(RESULTS_DIR, exist_ok=True)

DATASET_PATH = "/content/samsum.jsonl"  # upload file to /content/
print("RESULTS_DIR:", RESULTS_DIR)
print("DATASET_PATH:", DATASET_PATH)


Mounted at /content/drive
RESULTS_DIR: /content/drive/MyDrive/slm_evaluation/results
DATASET_PATH: /content/samsum.jsonl


In [61]:
# Store models on the Colab VM's local disk (ephemeral: lost when the runtime resets)
import os, subprocess, time

# Model storage directory
os.environ["OLLAMA_MODELS"] = "/content/"
os.makedirs(os.environ["OLLAMA_MODELS"], exist_ok=True)

os.environ["OLLAMA_NUM_PARALLEL"] = "8"          # parallel requests served per loaded model
os.environ["OLLAMA_MAX_LOADED_MODELS"] = "18"    # allow all 18 models to stay loaded (memory permitting)
os.environ["OLLAMA_KEEP_ALIVE"] = "24h"          # keep models loaded for 24h after their last request

# Stop stale server if any
!pkill -f "ollama serve" || true

# Start and log
!nohup ollama serve > /tmp/ollama.log 2>&1 &

# Wait for readiness
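import requests
for i in range(60):
    try:
        r = requests.get("http://127.0.0.1:11434/api/tags", timeout=2)
        if r.status_code == 200:
            print("OK: Ollama is up")
            break
    except requests.RequestException:
        pass  # server is not accepting connections yet
    time.sleep(1)  # wait before the next attempt, after an error or a non-200 reply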
else:
    print("ERROR: Ollama did not start. Showing last log lines:")
    !tail -n 120 /tmp/ollama.log

# Show current tags
!curl -s http://127.0.0.1:11434/api/tags | head -c 2000


^C
OK: Ollama is up
{"models":[{"name":"stablelm2:1.6b","model":"stablelm2:1.6b","modified_at":"2026-01-03T00:39:19.388393534Z","size":982790462,"digest":"714a6116cffa8b415b52c62a7a2d09ba6227ed733baa0025c937a36aee5504f3","details":{"parent_model":"","format":"gguf","family":"stablelm","families":["stablelm"],"parameter_size":"2B","quantization_level":"Q4_0"}},{"name":"tinyllama:1.1b","model":"tinyllama:1.1b","modified_at":"2026-01-03T00:39:07.826291773Z","size":637700138,"digest":"2644915ede352ea7bdfaff0bfac0be74c719d5d5202acb63a6fb095b52f394a4","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"1B","quantization_level":"Q4_0"}},{"name":"orca-mini:7b","model":"orca-mini:7b","modified_at":"2026-01-03T00:38:57.525310185Z","size":3825807733,"digest":"9c9618e2e8952fa2c2fac4986cc128923bd36f239639d7b31ad2692f26de6dab","details":{"parent_model":"","format":"gguf","family":"llama","families":null,"parameter_size":"7B","quantization_level":"Q4_0

In [25]:
!ollama list

NAME    ID    SIZE    MODIFIED 


In [26]:
MODELS = [
  "qwen2.5:7b-instruct",
  "llama3.1:8b",
  "mistral:7b-instruct",
  "phi3:mini",
  "phi3:medium",
  "gemma:7b-instruct",
  "gemma:2b",
  "qwen2.5:3b-instruct",
  "qwen2.5:1.5b-instruct",
  "llama3.2:3b",
  "llama3.2:1b",
  "openchat:7b",
  "neural-chat:7b",
  "dolphin-mistral:7b",
  "zephyr:7b",
  "orca-mini:7b",
  "tinyllama:1.1b",
  "stablelm2:1.6b"
]
print('Model count:', len(MODELS))

Model count: 18
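
Before pulling, one way to see which entries of `MODELS` the server already has is to compare them with the `models[].name` values in the `/api/tags` response shown earlier. The sketch below is illustrative only: `missing_models` is a hypothetical helper that is not part of the notebook, and the sample payload is abbreviated from the output above.

```python
import json

def missing_models(tags_payload, wanted):
    """Return the names in `wanted` that are absent from an /api/tags response."""
    data = json.loads(tags_payload) if isinstance(tags_payload, str) else tags_payload
    present = {entry.get("name") for entry in data.get("models", [])}
    return [name for name in wanted if name not in present]

# Abbreviated sample of the /api/tags JSON (only the "name" field is kept).
sample_tags = (
    '{"models": [{"name": "stablelm2:1.6b"}, '
    '{"name": "tinyllama:1.1b"}, {"name": "orca-mini:7b"}]}'
)
wanted = ["llama3.1:8b", "tinyllama:1.1b", "orca-mini:7b", "stablelm2:1.6b"]
print(missing_models(sample_tags, wanted))
```

In the notebook, the payload would come from `requests.get("http://127.0.0.1:11434/api/tags").json()` and `wanted` would be `MODELS`.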


In [27]:
# Pull all models (this can take a while; reruns are fast while the models are still cached under OLLAMA_MODELS)
assert len(MODELS) == 18, f"Expected 18 models, got {len(MODELS)}"

for m in MODELS:
    print("Pulling:", m)
    !ollama pull $m

# Verify
!curl -s http://127.0.0.1:11434/api/tags | head -c 4000


Pulling: qwen2.5:7b-instruct

In [66]:
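import os, time, json, asyncio
from pathlib import Path

import httpx
from tqdm import tqdm

# Assumes these are defined in earlier cells:
# call_ollama, ollama_is_up, get_consensus, check_factuality, MODELS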

# pick options (define if not already defined)
try:
    RUN_OPTIONS
except NameError:
    RUN_OPTIONS = {"temperature": 0.2, "num_predict": 256, "num_ctx": 4096}

def _truncate(s: str, n: int = 400):
    if s is None:
        return ""
    s = str(s)
    return s if len(s) <= n else s[:n] + " ...[truncated]"

def _format_candidate_block(c, idx=None):
    header = f"MODEL {idx+1:02d}: {c['model']}" if idx is not None else f"MODEL: {c['model']}"
    lat = f"{c.get('latency', 0.0):.2f}s"
    att = c.get("attempts", 1)
    status = "OK" if not str(c.get("summary","")).startswith("ERROR") else "ERROR"
    return (
        f"{header}\n"
        f"  status: {status} | latency: {lat} | attempts: {att}\n"
        f"  summary: {c.get('summary','')}\n"
    )

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call the model up to `retries` times; return (text, latency, attempts)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            await asyncio.sleep(0.5 * k)  # linear backoff before the next attempt
    return last_txt, last_lat, retries

async def run_eval_hetero18(
    max_samples=None,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,   # generation options forwarded to every model call
):
    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    run_id = f"colab_hetero18_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting Evaluation: {run_id}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print(f"Models: {len(MODELS)} | concurrency={concurrency} | samples={len(dataset)}")
    print("-" * 80)

    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            prompt = f"Summarize this message in one short sentence:\n\n{item['message']}\n\nSummary:"

            async def _one(m):
                async with sem:
                    txt, lat, attempts = await call_with_retry(m, prompt, client, options, retries=3)
                    return m, txt, lat, attempts

            results = await asyncio.gather(*[_one(m) for m in MODELS])

            candidates = []
            for (m, txt, lat, attempts) in results:
                candidates.append({
                    "model": m,
                    "summary": txt,
                    "latency": float(lat),
                    "attempts": int(attempts),
                })

            valid_summaries = [
                c["summary"] for c in candidates
                if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
            ]

            if valid_summaries:
                best_idx_valid, meta = get_consensus(valid_summaries)
                swarm_summary = valid_summaries[best_idx_valid]
            else:
                meta = {}
                swarm_summary = "ERROR: All models failed"

            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                sep = "=" * 90
                header = f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}"
                msg_preview = _truncate(item["message"].replace("\n", " "), 400)

                print("\n" + sep)
                print(header)
                print("MESSAGE (preview):", msg_preview)
                print("-" * 90)

                _log_line("\n" + sep)
                _log_line(header)
                _log_line("MESSAGE:")
                _log_line(item["message"])
                _log_line("-" * 90)

                for j, c in enumerate(candidates):
                    c_out = dict(c)
                    if truncate_console and isinstance(c_out["summary"], str):
                        c_out["summary"] = _truncate(c_out["summary"], truncate_console)
                    block = _format_candidate_block(c_out, idx=j)
                    print(block)
                    _log_line(_format_candidate_block(c, idx=j))

                print("-" * 90)
                print("CONSENSUS avg_similarity:", meta.get("avg_similarity"))
                if meta.get("consensus_scores") is not None:
                    print("CONSENSUS scores:", meta.get("consensus_scores"))
                print("SWARM SUMMARY:", swarm_summary)
                print(sep + "\n")

                _log_line("-" * 90)
                _log_line(f"CONSENSUS avg_similarity: {meta.get('avg_similarity')}")
                _log_line(f"CONSENSUS scores: {meta.get('consensus_scores')}")
                _log_line(f"SWARM SUMMARY: {swarm_summary}")
                _log_line(sep)

            result = {
                "id": item.get("id"),
                "swarm_summary": swarm_summary,
                "swarm_candidates": candidates,
                "consensus_metadata": meta,
                "factuality": check_factuality(item["message"], swarm_summary),
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(result) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# Run
await run_eval_hetero18(
    max_samples=None,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    options=RUN_OPTIONS
)


Starting Evaluation: colab_hetero18_1767404215
Results saved to: /content/slm_evaluation/results/colab_hetero18_1767404215/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_hetero18_1767404215/run_log.txt
Models: 18 | concurrency=6 | samples=819
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 0/819 [00:00<?, ?it/s]


SAMPLE 1/819 | ID: samsum_test_0000
MESSAGE (preview): Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him 🙂 Hannah: Urgh.. Alright Hannah: Bye Amanda:  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: qwen2.5:7b-instruct
  status: OK | latency: 9.62s | attempts: 1
  summary: Hannah reluctantly agrees to ask Larry for Betty's number after Amanda suggests it.

MODEL 02: llama3.1:8b
  status: OK | latency: 25.96s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 6.69s | attempts: 1
  summary: Hannah asks Am

CancelledError: 
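
`run_eval_hetero18` relies on a `call_ollama(model, prompt, client, options)` helper that returns `(text, latency)` and is not defined in the cells shown. The following is a minimal sketch of what such a helper could look like, assuming Ollama's non-streaming `/api/generate` endpoint and the convention, visible in the retry logic, that failures come back as strings starting with `ERROR`. The function body, the `OLLAMA_URL` constant, and the `_FakeClient` / `_FakeResponse` classes are assumptions for illustration; in the notebook, `client` would be the `httpx.AsyncClient` opened by the evaluation loop.

```python
import asyncio
import time

OLLAMA_URL = "http://127.0.0.1:11434"  # assumed: the local address used elsewhere in the notebook

async def call_ollama(model, prompt, client, options):
    """POST one non-streaming generate request; return (text, latency_seconds).

    Errors are returned as text starting with "ERROR" instead of being raised,
    so the caller can retry or filter them out.
    """
    payload = {"model": model, "prompt": prompt, "stream": False, "options": options}
    start = time.perf_counter()
    try:
        resp = await client.post(f"{OLLAMA_URL}/api/generate", json=payload)
        resp.raise_for_status()
        text = resp.json().get("response", "").strip()
        if not text:
            text = "ERROR: empty response"
    except Exception as exc:  # network errors, timeouts, HTTP errors, bad JSON
        text = f"ERROR: {type(exc).__name__}: {exc}"
    return text, time.perf_counter() - start

class _FakeResponse:
    """Hypothetical stand-in for an HTTP response, used only for the demo below."""
    def __init__(self, body):
        self._body = body
    def raise_for_status(self):
        return None
    def json(self):
        return self._body

class _FakeClient:
    """Hypothetical stand-in for httpx.AsyncClient so the sketch runs offline."""
    async def post(self, url, json):
        return _FakeResponse({"response": f" summary from {json['model']} "})

demo_text, demo_latency = asyncio.run(
    call_ollama("llama3.1:8b", "Summarize: hi", _FakeClient(), {"temperature": 0.2})
)
print(demo_text, f"({demo_latency:.4f}s)")
```

Inside the notebook, where an event loop is already running, the call would be `await call_ollama(...)` rather than `asyncio.run(...)`. Returning errors as text rather than raising keeps `asyncio.gather` from cancelling the other model calls when one request fails.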

In [65]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &


^C


In [68]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &



import os, time, json, asyncio
from pathlib import Path

# Assumes these already exist in your notebook:
# - DATASET_PATH
# - RESULTS_DIR
# - RUN_OPTIONS
# - call_ollama(model, prompt, client, options) -> (txt, latency)
# - ollama_is_up() -> bool
# - get_consensus(list_of_strings) -> (best_idx, meta)
# - check_factuality(message, summary) -> any
# Also requires: httpx, tqdm

def _truncate(s: str, n: int = 400):
    if s is None:
        return ""
    s = str(s)
    return s if len(s) <= n else s[:n] + " ...[truncated]"

def _format_candidate_block(c, idx=None):
    header = f"AGENT {idx+1:02d}: {c['model']}" if idx is not None else f"MODEL: {c['model']}"
    lat = f"{c.get('latency', 0.0):.2f}s"
    att = c.get("attempts", 1)
    status = "OK" if not str(c.get("summary","")).startswith("ERROR") else "ERROR"
    return (
        f"{header}\n"
        f"  status: {status} | latency: {lat} | attempts: {att}\n"
        f"  summary: {c.get('summary','')}\n"
    )

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call the model up to `retries` times; return (text, latency, attempts)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            await asyncio.sleep(0.5 * k)  # linear backoff before the next attempt
    return last_txt, last_lat, retries

async def run_eval_homo6(
    model="llama3.1:8b",
    max_samples=None,
    agents=6,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,
):
    import httpx
    from tqdm import tqdm

    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    run_id = f"colab_homo6_{model.replace(':','_').replace('/','_')}_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting Homogeneous Evaluation: {run_id}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print(f"Model: {model} | agents={agents} | concurrency={concurrency} | samples={len(dataset)}")
    print("-" * 80)

    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            prompt = f"Summarize this message in one short sentence:\n\n{item['message']}\n\nSummary:"

            async def _one(agent_idx: int):
                async with sem:
                    txt, lat, attempts = await call_with_retry(model, prompt, client, options, retries=3)
                    return agent_idx, txt, lat, attempts

            results = await asyncio.gather(*[_one(i) for i in range(agents)])

            candidates = []
            for (agent_idx, txt, lat, attempts) in results:
                candidates.append({
                    "model": model,
                    "agent_idx": int(agent_idx),
                    "summary": txt,
                    "latency": float(lat),
                    "attempts": int(attempts),
                })

            valid_summaries = [
                c["summary"] for c in candidates
                if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
            ]

            if valid_summaries:
                best_idx_valid, meta = get_consensus(valid_summaries)
                swarm_summary = valid_summaries[best_idx_valid]
            else:
                meta = {}
                swarm_summary = "ERROR: All agents failed"

            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                sep = "=" * 90
                header = f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}"
                msg_preview = _truncate(item["message"].replace("\n", " "), 400)

                print("\n" + sep)
                print(header)
                print("MESSAGE (preview):", msg_preview)
                print("-" * 90)

                _log_line("\n" + sep)
                _log_line(header)
                _log_line("MESSAGE:")
                _log_line(item["message"])
                _log_line("-" * 90)

                for j, c in enumerate(candidates):
                    c_out = dict(c)
                    if truncate_console and isinstance(c_out["summary"], str):
                        c_out["summary"] = _truncate(c_out["summary"], truncate_console)
                    block = _format_candidate_block(c_out, idx=j)
                    print(block)
                    _log_line(_format_candidate_block(c, idx=j))

                print("-" * 90)
                print("CONSENSUS avg_similarity:", meta.get("avg_similarity"))
                if meta.get("consensus_scores") is not None:
                    print("CONSENSUS scores:", meta.get("consensus_scores"))
                print("SWARM SUMMARY:", swarm_summary)
                print(sep + "\n")

                _log_line("-" * 90)
                _log_line(f"CONSENSUS avg_similarity: {meta.get('avg_similarity')}")
                _log_line(f"CONSENSUS scores: {meta.get('consensus_scores')}")
                _log_line(f"SWARM SUMMARY: {swarm_summary}")
                _log_line(sep)

            result = {
                "id": item.get("id"),
                "model": model,
                "agents": int(agents),
                "swarm_summary": swarm_summary,
                "swarm_candidates": candidates,
                "consensus_metadata": meta,
                "factuality": check_factuality(item["message"], swarm_summary),
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(result) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# Run (default: llama3.1:8b, 6 agents)
await run_eval_homo6(
    model="llama3.1:8b",
    max_samples=None,
    agents=6,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    options=RUN_OPTIONS
)


^C
Starting Homogeneous Evaluation: colab_homo6_llama3.1_8b_1767405547
Results saved to: /content/slm_evaluation/results/colab_homo6_llama3.1_8b_1767405547/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_homo6_llama3.1_8b_1767405547/run_log.txt
Model: llama3.1:8b | agents=6 | concurrency=6 | samples=819
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 1/819 [00:03<52:25,  3.85s/it]


SAMPLE 1/819 | ID: samsum_test_0000
MESSAGE (preview): Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him 🙂 Hannah: Urgh.. Alright Hannah: Bye Amanda:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 3.75s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 02: llama3.1:8b
  status: OK | latency: 3.69s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 03: llama3.1:8b
  status: OK | latency: 3.74s | attempts: 1
  summary: Hannah asked Amanda for Be

Processing Samples:   1%|▏         | 11/819 [00:17<17:33,  1.30s/it]


SAMPLE 11/819 | ID: samsum_test_0010
MESSAGE (preview): Wanda: Let's make a party! Gina: Why? Wanda: beacuse. I want some fun! Gina: ok, what do u need? Wanda: 1st I need too make a list Gina: noted and then? Wanda: well, could u take yours father car and go do groceries with me? Gina: don't know if he'll agree Wanda: I know, but u can ask :) Gina: I'll try but theres no promisess Wanda: I know, u r the best! Gina: When u wanna go Wanda: Friday? Gina:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Wanda wants to have a party and asks Gina for help with errands so they can plan it together.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Wanda wants to have a party and needs Gina's help with errands before it can happen.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Wanda wants to have a pa

Processing Samples:   3%|▎         | 21/819 [00:31<21:25,  1.61s/it]


SAMPLE 21/819 | ID: samsum_test_0020
MESSAGE (preview): Deirdre: Hi Beth, how are you love? Beth: Hi Auntie Deirdre, I'm been meaning to message you, had a favour to ask. Deirdre: Wondered if you had any thought about your Mum's 40th, we've got to do something special! Beth: How about a girls weekend, just mum, me, you and the girls, Kira will have to come back from Uni, of course. Deirdre: Sounds fab! Get your thinking cap on, it's only in 6 weeks! Be ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 2.00s | attempts: 1
  summary: Beth is planning a surprise 40th birthday party for her mom and also wants to try working at Deirdre's salon as part of her work experience.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.88s | attempts: 1
  summary: Beth is planning a 40th birthday celebration for her mother and also wants to try working at Deirdre's salon as part of her work experience.

Processing Samples:   4%|▍         | 31/819 [00:47<19:53,  1.51s/it]


SAMPLE 31/819 | ID: samsum_test_0030
MESSAGE (preview): Mary: hey, im kinda broke, lend me a few box Carter: okay, give me an hour, im at the train station Mary: cool, thanks
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Mary asked Carter to lend her some money and he agreed to meet up with her.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Mary asked Carter for a loan of a few dollars and he agreed to meet her at the train station.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: Mary asked Carter to lend her money and he agreed to meet up with her after an hour.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Mary asked Carter to lend her some money and he agreed to meet up with her after an hour.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary:

Processing Samples:   5%|▌         | 41/819 [00:59<17:20,  1.34s/it]


SAMPLE 41/819 | ID: samsum_test_0040
MESSAGE (preview): Sebastian: It's been already a year since we moved here. Sebastian: This is totally the best time of my life. Kevin: Really?  Sebastian: Yeah! Totally maaan. Sebastian: During this 1 year I learned more than ever.  Sebastian: I learned how to be resourceful, I'm learning responsibility, and I literally have the power to make my dreams come true. Kevin: It's great to hear that. Kevin: It's great th ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.54s | attempts: 1
  summary: Sebastian reflects on the past year, expressing his satisfaction with his life and feeling grateful to have someone he loves by his side.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: Sebastian is reflecting on the past year of his life, expressing gratitude and satisfaction with his decisions and accomplishments.

AGENT 03: 

Processing Samples:   6%|▌         | 51/819 [01:12<16:19,  1.28s/it]


SAMPLE 51/819 | ID: samsum_test_0050
MESSAGE (preview): Nick: You look absolutely gorgeous and have a lovely smile.  Nick: Would love to get to know you a bit more. How about we meet up for a drink sometime? Jane: Hmmm... You're shooting a bit above your range aren't you? Nick: Why would you think that hon? Jane: Because I'm not that desperate. Nick: That was a bit below the belt. Nick: You're nice but you're not THAT hot. Jane: Oh is your poor little  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: A woman, Jane, responds negatively and insultingly to a man's friendly online approach, leading him to become angry and withdraw his invitation.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.31s | attempts: 1
  summary: The conversation between Nick and Jane quickly turns from a friendly exchange to a heated argument, with both parties exchanging insults

Processing Samples:   7%|▋         | 61/819 [01:22<12:52,  1.02s/it]


SAMPLE 61/819 | ID: samsum_test_0060
MESSAGE (preview): Salma: <file_video> Salma: the latest cat meme Hugh: oh sweet, I can never get enough of those lol Hugh: <file_photo> Salma: hahaha same
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.67s | attempts: 1
  summary: Salma and Hugh are sharing a funny cat video and photo.

AGENT 06: llama3.1:8b
  status: O

Processing Samples:   9%|▊         | 71/819 [01:34<15:09,  1.22s/it]


SAMPLE 71/819 | ID: samsum_test_0070
MESSAGE (preview): Ali: I think I left my wallet at your place yesterday. Could you check?  Mohammad: Give me a sec, I'll have a look around my room. Ali: OK. Mohammad: Found it! Ali: Phew, I don't know what I'd do if it wasn't there. Can you bring it to uni tomorrow? Mohammad: Sure thing.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.08s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it and agreed to bring it to university the next day.

AGENT 04: llama3.1:8b
  statu

Processing Samples:  10%|▉         | 81/819 [01:47<15:33,  1.26s/it]


SAMPLE 81/819 | ID: samsum_test_0080
MESSAGE (preview): Ryan: I have a bad feeling about this Ryan: <file_other> Sebastian: Ukraine... Sebastian: This russian circus will never end... Ryan: I hope the leaders of of nations will react somehow to this shit. Sebastian: I hope so too :(
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: The conversation is about concerns over Russia's actions in Ukraine and the fear that nothing will be done to stop them.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.84s | attempts: 1
  summary: The conversation is about concerns and fears regarding Russia's actions in Ukraine.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: The conversation appears to be a discussion between Ryan and Sebastian about their concern over Russia's actions in Ukraine.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.84s | attemp

Processing Samples:  11%|█         | 91/819 [02:00<14:34,  1.20s/it]


SAMPLE 91/819 | ID: samsum_test_0090
MESSAGE (preview): Ashleigh: Looks like we're going to the cinema!!  Ashleigh: <file_gif> Peter: You got the job?? Ashleigh: I got hte job! :D Peter: <file_gif> Ashleigh: <file_gif>
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Ashleigh just received a job offer and is excited to celebrate with Peter.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Ashleigh got a new job and they're going to celebrate by going to the cinema.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Ashleigh just received a job offer and is excited to celebrate with Peter at the cinema.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Ashleigh just found out she got a new job and is excited to celebrate by going to the cinema.

AGENT 05: llama3.1:8b
  status: OK | late

Processing Samples:  12%|█▏        | 101/819 [02:12<15:01,  1.26s/it]


SAMPLE 101/819 | ID: samsum_test_0100
MESSAGE (preview): Chris: Hi there! Where are you? Any chance of skyping? Rick: Hi! Our last two days in Cancun before flying to Havana. Yeah, skyping is an idea. When would it suit you? Rick: We don't have the best of connections in the room but I can get you pretty well in the lobby. Chris: What's the time in your place now? Rick: 6:45 pm Chris: It's a quarter to one in the morning here. Am still in front of the b ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.33s | attempts: 1
  summary: Chris and Rick discuss scheduling a Skype call for the next day at 3 pm, when they are in Cuba.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.42s | attempts: 1
  summary: Chris and Rick discuss and arrange a Skype call for the next day at 3pm their time, when they are in Cuba.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.46s | attempts: 1
  summary:

Processing Samples:  14%|█▎        | 111/819 [02:24<14:14,  1.21s/it]


SAMPLE 111/819 | ID: samsum_test_0110
MESSAGE (preview): Teacher: Rashi, why are you so low?  Rashi: Ma’am I’m a bit confused about my career.  Teacher: What is your confusion? Rashi: I was discussing with my friends about the career options.  Teacher: Hmm. Rashi: There are too many to choose from. Teacher: Choose a career based on what truly interests you.  Rashi: I have many that interests me. How does it determine the career? Teacher: The passion you ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.38s | attempts: 1
  summary: A teacher helps a student named Rashi understand that choosing a career based on personal interest and passion is key to success.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: A teacher helps a student named Rashi understand how to choose a career that aligns with her interests and passions.

AGENT 03: llama3.1:8b
  status: 

Processing Samples:  15%|█▍        | 121/819 [02:37<16:02,  1.38s/it]


SAMPLE 121/819 | ID: samsum_test_0120
MESSAGE (preview): John: wanna go see "A Star is Born" on Wed? Joan: sorry can't Joan: super busy  Joan: don't have time for anything :(  John: that's a shame Joan: I'm free on Thursday John: I could do Thursday Joan: ok! so around 8pm? John: sure sounds great John: I'll see where it's palying and send you the details Joan: ok great!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 04:

Processing Samples:  16%|█▌        | 131/819 [02:48<11:33,  1.01s/it]


SAMPLE 131/819 | ID: samsum_test_0130
MESSAGE (preview): Ryan: You're going to the casting?  Ryan: So you think you can dance 🤩 Jack: I am!  Jack: this time im going Ryan: U should go really Jack: I know, wanna come with me?  Ryan: I thought about it!  Jack: Nice well! I will meet you there! 😝😝😝
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.70s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 05: llama3.1:8b
  status: OK | latency: 0

Processing Samples:  17%|█▋        | 141/819 [03:01<13:15,  1.17s/it]


SAMPLE 141/819 | ID: samsum_test_0140
MESSAGE (preview): Sean: I overslept :/ Sam: Again?? Sean: I know.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.66s | attempts: 1
  summary: Sean overslept and Sam is surprised by it.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Sean has overslept, and Sam is surprised that it's happened again.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Sean has overslept, and Sam is surprised by his lack of punctuality.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.84s | attempts: 1
  summary: Sean overslept, and Sam is surprised it's not the first time.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Sean has overslept and is late, which is not the first time it's happened.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Sean has overslept, 

Processing Samples:  18%|█▊        | 151/819 [03:12<12:09,  1.09s/it]


SAMPLE 151/819 | ID: samsum_test_0150
MESSAGE (preview): Mike: will call u back in 2 hrs, ok?  Mike: can't talk right now, sry Mike: *sorry Dale: cool, no problem Dale: until then
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: Mike is busy and will call Dale back within two hours.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.68s | attempts: 1
  summary: Mike is busy and will call Dale back within two hours.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.84s | attempts: 1
  summary: Mike is unable to talk at the moment but plans to call Dale back within two hours.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Mike is unable to talk and will call Dale back in two hours.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Mike is busy and will call Dale back within 2 hours.

AGENT 06: llama3.1:8b
  status:

Processing Samples:  20%|█▉        | 161/819 [03:24<11:50,  1.08s/it]


SAMPLE 161/819 | ID: samsum_test_0160
MESSAGE (preview): Alexander: Personal request to send me message when you will be in taxi Alexander: If any problem, call me Tom: ;) Tom: Thank You, I appreciate it Alexander: Taxi confirmation below Alexander: <file_photo> Tom: Thank you for the transport, we arrived safely, although without luggages :/ Alexander: Good but bad Tom: Yeeees
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Alexander arranged a taxi for Tom and received confirmation that they arrived safely, albeit without their luggage.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: Alexander arranged a taxi for Tom and confirmed its arrival.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Alexander arranged a taxi for Tom and confirmed its arrival.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.90s | a

Processing Samples:  21%|██        | 171/819 [03:38<15:20,  1.42s/it]


SAMPLE 171/819 | ID: samsum_test_0170
MESSAGE (preview): Keith: Hi there kiddo, when are you planning to visit you old parents? :) Laura: Hey Dad, I'm not sure yet. I've been pretty busy recenlty.There is this big project coming… Keith: Oh, I understand, all work and no play…XD Laura: Daad! Don't be mean! You know I treat studying seriously! Keith: I know, you take after your mum :) By the way I think she bought some b-day gift for you… Laura: Next Satu ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Laura and her father Keith discuss her upcoming visit to see them and make plans for her to come next Saturday.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Laura and her father, Keith, discuss her upcoming visit to see them and make plans for the weekend.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.24s | attempts: 1
 

Processing Samples:  22%|██▏       | 181/819 [03:49<10:53,  1.02s/it]


SAMPLE 181/819 | ID: samsum_test_0180
MESSAGE (preview): Rob: Are we meeting up 2morrow? Eve: How about Sunday Eve: stores are open so we can go then Rob: Ok I just don't know what time they close Eve: we have ot go in the moring  Eve: I have some stuff to do around 3 Rob: ok
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.66s | attempts: 1
  summary: They agreed to meet up on Sunday morning.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: They rescheduled their meeting from tomorrow to Sunday morning.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.63s | attempts: 1
  summary: They are planning to meet up on Sunday morning.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.70s | attempts: 1
  summary: They are planning to meet up on Sunday morning.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: They are planning to meet up on Sunday m

Processing Samples:  23%|██▎       | 191/819 [04:03<14:51,  1.42s/it]


SAMPLE 191/819 | ID: samsum_test_0190
MESSAGE (preview): Emely: Hey. Could u help me with one task? I have to fill in the gaps and I find it a little bit strange task Titus: How is that strange ? Emely: I have problems to so it haha. Can I send a photo of this task and then I will wrote u my answers? There are 10 sentences Titus: Ok I'm doing something now but I can look at it quickly if it won't take up a lot of my time Emely: Ok Emely: <file_photo> Ti ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.53s | attempts: 1
  summary: Emely asked Titus for help with a task involving filling in gaps, but they had trouble communicating due to issues viewing a photo sent by Emely.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.57s | attempts: 1
  summary: Emely asked Titus for help with a task involving filling in gaps, but they had trouble communicating due to issues with sending and vi

Processing Samples:  25%|██▍       | 201/819 [04:16<12:08,  1.18s/it]


SAMPLE 201/819 | ID: samsum_test_0200
MESSAGE (preview): Abdellilah: Where are you? Sam: work Abdellilah: What time you finish? Sam: Not til 5 Abdellilah: Are your bringing him over tonight: Sam: No in the morning: Abdellilah: ok, what time? Sam: About 9. Is that ok? Abdellilah: ok - see you then
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place on Wednesday at around 9am.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place on Saturday morning around 9am.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place around 9am the next day.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Sam is finishing work at 5 and plans to bring someone over aroun

Processing Samples:  26%|██▌       | 211/819 [04:32<13:40,  1.35s/it]


SAMPLE 211/819 | ID: samsum_test_0210
MESSAGE (preview): Yaz: Going to slimming club tonight? Mary: Well, I don't want to, but I really should. I've been so bad, though! Yaz: Me too, choccies, wine, cake, you name it! Mary: Well, we should bite the bullet. She'll tell us off, I expect. Feels like being back at school! Yaz: Well, she IS our old cookery teacher! See you at 6ish, pick you up!? Mary: Yep! See you then!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: The two friends are planning to attend a slimming club together tonight, despite not wanting to.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Two friends, Mary and Yaz, are planning to attend a slimming club together tonight.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Two friends, Yaz and Mary, are planning to attend a slimming club together 

Processing Samples:  27%|██▋       | 221/819 [04:45<13:27,  1.35s/it]


SAMPLE 221/819 | ID: samsum_test_0220
MESSAGE (preview): Lisa: Hello Peter. What have you been doing off late? Peter: Bit busy with work. Lisa: Too much of work, huh? Peter: Not really, I’ve been working out Lisa:  Ohh, trying to get in shape? Peter: Nah, I don’t have much of weight to lose, just want to improve my health. Lisa:  That’s a good thought. I was also thinking of working out a bit. What do you do? Do you go to a Gym? Peter: Yes, I hold a mem ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Lisa and Peter discuss their interest in improving their health through exercise, proper eating, and adequate sleep.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Lisa and Peter discuss their interest in improving their health through exercise, diet, and sleep habits.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.17s | at

Processing Samples:  28%|██▊       | 231/819 [04:57<11:29,  1.17s/it]


SAMPLE 231/819 | ID: samsum_test_0230
MESSAGE (preview): Tiffany: buy me a burger on your way home Railey: ok Tiffany: thx, sis :)
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.67s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey

Processing Samples:  29%|██▉       | 241/819 [05:10<13:12,  1.37s/it]


SAMPLE 241/819 | ID: samsum_test_0240
MESSAGE (preview): Aggie: When do you start work? Peter: at 8  Peter: why? Aggie: no when do you start the new job? Peter: oh on the 6th Aggie: Ok I'll get a babysitter then Peter: lol wanted a free babysitter Aggie: yeah sorry :P
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Aggie was planning to hire a babysitter for Peter's first day of work, but he had actually already started working.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Aggie was planning to use Peter's work schedule as an excuse for needing a babysitter, but he had already started working.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Aggie and Peter discuss his upcoming work schedule, with Aggie initially planning to hire a babysitter for him.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.35s | 

Processing Samples:  31%|███       | 251/819 [05:22<11:19,  1.20s/it]


SAMPLE 251/819 | ID: samsum_test_0250
MESSAGE (preview): Shaldona: WE ARE GONNA GET MARRIED ❤️❤️ Shaldona: <file_others> Shaldona: This is our mobile inviation for our wedding. Shaldona: Invitation* Piper: Hey. You haven’t sent me any messages for a few years. Piper: And now you are sending me your wedding invitation  Piper: THROUGH MESSENGER? Shaldona: ..... Shaldona: Well.. Shaldona: I had no enough time to meet everybody and give this in person. Shal ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Shaldona sent a wedding invitation via messenger to someone she hadn't spoken to in years.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summary: Shaldona sent a wedding invitation via messenger to someone she hadn't contacted in years.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summary: Shaldona sent a wedd

Processing Samples:  32%|███▏      | 261/819 [05:35<12:22,  1.33s/it]


SAMPLE 261/819 | ID: samsum_test_0260
MESSAGE (preview): Debbie: Help, I don't know which dress to buy! <file_photo> or <file_photo>? Kelly: The red one! It's beautiful. Denise: It is, but the green one will suit you better. Kelly: Why? Debbie looks good in red. Denise: She does, but in my opinion that dress would look better on someone taller. Deb needs a shorter one. Kelly: Right, I haven't thought about it. Debbie: So the green one? Denise: Definitel ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Debbie asked for help choosing between two dresses and her friends Kelly and Denise helped her decide on the green one.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Debbie asked for help choosing between two dresses and her friends Kelly and Denise helped her decide on the green one.

AGENT 03: llama3.1:8b
  status: OK | la

Processing Samples:  33%|███▎      | 271/819 [05:45<09:44,  1.07s/it]


SAMPLE 271/819 | ID: samsum_test_0270
MESSAGE (preview): Conrad: I'm outside the house Conrad: I forgot my keys... Rebecca: 💩 Tiffany: I'll be home at 10-11 Rebecca: I'm coming back even later Conrad: Oh no... Conrad: I'll wait in the coffee shop
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Conrad is locked out of the house and will wait for someone to let him in.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Conrad is locked out of the house and will wait at a nearby coffee shop.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Conrad is locked out of the house and will wait at a nearby coffee shop.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Conrad is locked out of the house and will wait in a nearby coffee shop.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.94s | a

Processing Samples:  34%|███▍      | 281/819 [05:57<10:13,  1.14s/it]


SAMPLE 281/819 | ID: samsum_test_0280
MESSAGE (preview): Stan: She replied :-) Dave: She did? Stan: <file_photo> Dave: Lucky you! Stan: I can't believe it! She's my dream come true! Dave: Good luck today! Where are you going to take her? Stan: Pat&Gill's Dave: Good choice. Let me know how it was :-) Stan: I will. Dave: In minute detail :-) Stan: Forget it!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Stan is excitedly planning a date with someone he considers his "dream come true".

AGENT 02: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Stan is excitedly planning a date with someone he considers his "dream come true".

AGENT 03: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Stan is excitedly planning a date with someone he considers his "dream come true".

AGENT 04: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1


Processing Samples:  36%|███▌      | 291/819 [06:11<11:32,  1.31s/it]


SAMPLE 291/819 | ID: samsum_test_0290
MESSAGE (preview): Vincent: <file_photo> Damian: What happened to your lamp? Vincent: I broke it xD  Vincent: With my bare hand Damian: You didn't do this just to show off did you? Vincent: Hahaha. xD No. Vincent: I was playing with my cat with a ribbon Vincent: And while raising my hand I just hit the lamp and the glass cover broke Damian: Shit happens. You ordered new one yet? Vincent: Yeah. Should be ready to pic ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Vincent accidentally broke his lamp with his bare hand while playing with his cat.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Vincent accidentally broke his lamp with his bare hand while playing with his cat, but has already ordered a replacement.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summar

Processing Samples:  37%|███▋      | 301/819 [06:24<10:41,  1.24s/it]


SAMPLE 301/819 | ID: samsum_test_0300
MESSAGE (preview): Jerry: Hi sweetie :) Janet: Hi sugar ;) Jerry: I'm coming home Janet: Can't wait ;) Jerry: I should be there in 40 minutes Janet: Ok, I'm waiting for you :) Jerry: How was your day? Janet: Oh, it was ok but my boss is a pain in the ass sometimes Jerry: I know, she can be a bitch :P Janet: Yes she can! ;) Jerry: See you later darling Janet: <3
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Jerry and Janet are exchanging sweet goodbyes and catching up about their day before Jerry's expected return home.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Jerry and Janet are exchanging affectionate messages while Jerry is on his way home from work.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Jerry and Janet are exchanging affectionate messages as Jerry h

Processing Samples:  38%|███▊      | 311/819 [06:38<12:29,  1.48s/it]


SAMPLE 311/819 | ID: samsum_test_0310
MESSAGE (preview): Sharon: My mum knows Ahmed: Aaaand? Sharon: She’s angry, I’m grounded Ahmed: For fuck’s sake you’re 19 Sharon: But I still live with my parents, let me remind you -_- Ahmed: So move out Sharon: It’s not that easy, hello Ahmed: Move in with me, we would be together all the time :* Sharon: I’d have to go to work, I’m still a student Ahmed: Don’t worry, I’ll take care of you Sharon: My parents will s ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.67s | attempts: 1
  summary: Sharon's 19-year-old daughter is grounded by her parents and Ahmed suggests she move in with him, but Sharon is hesitant due to her responsibilities as a student.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.68s | attempts: 1
  summary: Sharon's 19-year-old daughter is grounded by her parents and Ahmed suggests she move in with him, but Sharon is hesit

Processing Samples:  39%|███▉      | 321/819 [06:52<10:39,  1.28s/it]


SAMPLE 321/819 | ID: samsum_test_0320
MESSAGE (preview): Andre: i just read the news about the bear attack on the zoo :-( Andre: yikes - i would've never had imagine something like that could happen Megan: that's why I always say you can't keep animals y cages!!! Megan: it's cruel and it's wrong!!! Megan: i'm not surprised the bear reacted that way
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Andre is upset about a recent bear attack at the zoo, and Megan agrees that keeping animals in cages is cruel.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Andre is upset about a bear attack at the zoo, and Megan agrees that keeping animals in cages is cruel.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Andre is upset about a recent bear attack at the zoo, and Megan agrees that keeping animals in cages is cruel

Processing Samples:  40%|████      | 331/819 [07:03<10:35,  1.30s/it]


SAMPLE 331/819 | ID: samsum_test_0330
MESSAGE (preview): Bella: It's valentine's day!😁😁😁 Aria: For somebody without bf today is kinda miserable day.....😢😢 Bella: There are a lot of vendors selling roses on the street here. Bella: <file_photo> Aria: <file_gif>  Bella: hahahahahahaha!! That looks SO SAD! :'‑(:'‑(:'‑( Aria: I feel like the weather is colder than it really is. How's the weather there? Bella: Here? it's 3 degree today. Wow! Even coffee shops ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.52s | attempts: 1
  summary: Two friends, Bella and Aria, exchange Valentine's Day messages, with Aria expressing sadness about being single on the holiday.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.70s | attempts: 1
  summary: The conversation is between two friends, Bella and Aria, on Valentine's Day, discussing their respective plans for the day and looking forward to a futu

Processing Samples:  42%|████▏     | 341/819 [07:16<10:38,  1.34s/it]


SAMPLE 341/819 | ID: samsum_test_0340
MESSAGE (preview): Joel: Hey, how are you guys doing? How's Coco? Becky: She's actually sitting on my lap purring right now. Trent: confirmed Joel: no way :D pics or didn't happen! Becky: <file_photo> Joel: haha! It took you guys 2 days to win her over. She's usually not that easy. Trent: maybe she remembers us visiting you? Joel: Maybe... oh, man, I'm jealous ;) Is she behaving ok? Becky: She's a total gem. Absolut ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: Joel is checking in on his friends' cat, Coco, and getting updates on how she's adjusting to their care.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.43s | attempts: 1
  summary: Becky and Trent are taking care of Joel's cat Coco, who is adjusting well to their home despite some minor misbehavior.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.48s | 

Processing Samples:  43%|████▎     | 351/819 [07:28<11:06,  1.42s/it]


SAMPLE 351/819 | ID: samsum_test_0350
MESSAGE (preview): Noah: Hi there! The quince we talked about the other day... Are you still interested? Sophie: Hello Noah, but of course I am.Thank you. Noah: Actually William went to collect the rest of them for you immediately. They're in a basket on our terrace, so you call collect them any time. Sophie: That's very very kind of him! He's really a darling. Sophie: We won't be going your direction any time soon  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 2.34s | attempts: 1
  summary: Noah and Sophie discuss the quince fruit they had previously talked about, and make arrangements for its collection by Sophie's twins, who will be passing by Noah's location on their way home from college.

AGENT 02: llama3.1:8b
  status: OK | latency: 2.34s | attempts: 1
  summary: Noah and Sophie discuss collecting quinces that William picked up from the

Processing Samples:  44%|████▍     | 361/819 [07:41<09:10,  1.20s/it]


SAMPLE 361/819 | ID: samsum_test_0360
MESSAGE (preview): Chandler: Phoebe!! Do you have money??  Phoebe: Yes I have .. But why do you need it.. Chandler: Open your door.. And pay the delivery guy standing outside my door.. Phoebe: Oh chandler !! you idiot... On my way..
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person who is at his door.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person who is at his door.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person who is at his door.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Chandler needs Phoebe to pay for a delivery that was mistakenly sent to his apartment.

AGENT 05: llama3.1:8b
  status: OK | 

Processing Samples:  45%|████▌     | 371/819 [07:53<08:41,  1.16s/it]


SAMPLE 371/819 | ID: samsum_test_0370
MESSAGE (preview): Henry: Do you know what happened to cafe Kohaku? Julia: The one near Covent Garden? Henry: Yeah. I wanted to reserve a table, but I can't find them anywhere. Their Facebook page disappeared. Julia: I'm afraid they're closed :( Henry: For good?! Julia: Afraid so. I've been there last week and wanted to take my cousin there, but the place's empty Julia: Probably the rent was too high. Henry: Oh no,  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.20s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed down permanently due to high rent costs.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed permanently due to high rent costs.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed

Processing Samples:  47%|████▋     | 381/819 [08:07<09:51,  1.35s/it]


SAMPLE 381/819 | ID: samsum_test_0380
MESSAGE (preview): Don: Hi Cindy. Have you made all arrangements? Cindy: It's about today's meeting or your trip next week? Don: Both, I suppose:) Cindy: You have meeting with management board today at 2 pm. Don: Where did you set it up? Cindy: In our conference room. Cindy: Catering will bring some tea, coffee and snacks. Don: That's good. Don: Did everybody got the agenda? Cindy: Yep. Don: How did Andy react when  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.48s | attempts: 1
  summary: Don and Cindy discuss arrangements for his meeting with management today and his trip next week, confirming details such as the conference room location, catering, and travel plans.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.23s | attempts: 1
  summary: Don and Cindy discuss arrangements for both Don's meeting with management today and his upcoming t

Processing Samples:  48%|████▊     | 391/819 [08:20<10:23,  1.46s/it]


SAMPLE 391/819 | ID: samsum_test_0390
MESSAGE (preview): Ost: I bought a garage place in the garage. Vesna: Cool. Ost: I got it cheap. It's my intention to rent it out and do the restitution of my money. Vesna: Well doneeeee!. Ost: There is a big crowd in parking, so the prices for finding garage places are favorable, in this area. Vesna: Really!. Ost: Return on investment is much higher than if I give money under term savings in the Bank. Vesna: You ma ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.40s | attempts: 1
  summary: Ost bought a garage in Belgrade at a low price with the intention of renting it out for a higher return on investment.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.47s | attempts: 1
  summary: Ost bought a garage in Belgrade at a low price with the intention of renting it out for a higher return on investment.

AGENT 03: llama3.1:8b
  status: OK | late

Processing Samples:  49%|████▉     | 401/819 [08:34<10:14,  1.47s/it]


SAMPLE 401/819 | ID: samsum_test_0400
MESSAGE (preview): Jill: so how was your date anyway? :) Susan: it was perfect, he was so sweet!! <3 Jill: tell me everything!! Susan: so first he picked me up from home, all dressed up and everything Jill: suit on? :D Susan: nooo, not like that, but he had a really nice black shirt, elegant shoes, looked like Brad Pitt hahaha Jill: hahahahaha Susan: <file_gif> Jill: so where did he take you? Susan: that's the best  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Susan had a fun date with a man who took her rollerskating at a disco.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Susan had a fun date with a man who took her rollerskating at a disco.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Susan had a fun date with a man who took her rollerskating a

Processing Samples:  50%|█████     | 411/819 [08:46<07:15,  1.07s/it]


SAMPLE 411/819 | ID: samsum_test_0410
MESSAGE (preview): Ryan: Merry Christmas everybody! Nick: Merry Xmas! Chris: Merry Xmas to you! Chris: Where are you btw? Ryan: visiting family in Manchester  Ryan: white a disaster hahhaha Chris: We stayed home with Ann, first time on our own! Chris: and I've liked it so far very much
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: The group is exchanging holiday greetings and catching up about their Christmas plans.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: The group is exchanging Christm

Processing Samples:  51%|█████▏    | 421/819 [08:59<09:18,  1.40s/it]


SAMPLE 421/819 | ID: samsum_test_0420
MESSAGE (preview): Zack: Dude have you seen the new Jersey Shore season? Dwayne: Didn't even knew a new one came out xD Zack: Ahahah no surprise there. But yeah I watched a couple of episodes and it was hilarious. Dwayne: It's the original cast? Zack: Yep. Snooki, J-Wow, Pauly D, Ron, The Situation, Deena and Vinny. Dwayne: I herad The Situation was having some bad legal problems.. Zack: Yeah and he still is. Dwayne ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.54s | attempts: 1
  summary: Zack and Dwayne discuss the new season of Jersey Shore, with Zack encouraging Dwayne to watch it after expressing his own enjoyment of the show.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.55s | attempts: 1
  summary: Zack and Dwayne discuss the new season of Jersey Shore, with Zack encouraging Dwayne to watch it after expressing his own enjoyment of 

Processing Samples:  53%|█████▎    | 431/819 [09:13<10:00,  1.55s/it]


SAMPLE 431/819 | ID: samsum_test_0430
MESSAGE (preview): Nathan: Hey Honey, what are our plans for tomorrow then? Deborah: So I've been thinking that we should pack everything today, we won't have time tomorrow Nathan: Totally true, I started packing my stuff in the bedroom, I also told the girls to start packing too Deborah: If Sofie wants to take the unicorn we won't have much space in the car XD Nathan: No, I explained to her that he'll be fine witho ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.42s | attempts: 1
  summary: The couple is making plans to pack up and leave for a trip with their family the next day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.71s | attempts: 1
  summary: The couple, Nathan and Deborah, are making plans to pack up their belongings and head out on a trip with their family, including their daughters Sofie and Lucy.

AGENT 03: llama3.1:8b
  sta

Processing Samples:  54%|█████▍    | 441/819 [09:27<07:39,  1.22s/it]


SAMPLE 441/819 | ID: samsum_test_0440
MESSAGE (preview): Andrew: Have you got the paper from the office? Paula: Not yet, I'm going there tomorrow. Andrew: You'd better... The deadline for the application is Friday at noon. Paula: Ok, sure! I promise I won't forget it!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Andrew reminds Paula that she needs to pick up a paper from the office by Friday at noon.

AGENT 05: llama3.1:8b
  status: OK | la

Processing Samples:  55%|█████▌    | 451/819 [09:38<07:26,  1.21s/it]


SAMPLE 451/819 | ID: samsum_test_0450
MESSAGE (preview): Grace: i didn't get to talk to you a lot at the party last night Audrey: i know, i really wanted to catch up with you! :-( Audrey: what's going on in your life? Audrey: anything new? Grace: not really, i'm excited about graduating soon Grace: i think i'll apply for grad school but i'm still not sure Audrey: you always have liked school lol Grace: i've also thought of taking a year off Audrey: that ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Grace and Audrey caught up briefly at a party, discussing Grace's plans after graduation.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Grace and Audrey catch up on each other's lives, discussing Grace's upcoming graduation and potential plans for graduate school or taking a year off.

AGENT 03: llama3.1:8b
  status: OK | la

Processing Samples:  56%|█████▋    | 461/819 [09:52<08:13,  1.38s/it]


SAMPLE 461/819 | ID: samsum_test_0460
MESSAGE (preview): Trinny: What time do we meet? Susannah: 5, 6? Linda: six, please? I need time to get ready after work;) Trinny: 6 is good! Susannah: Ok Susannah: at the main entrance? Trinny: Ok Linda: see you Ladies! Trinny: :*
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: The three friends have agreed to meet at 6pm at the main entrance.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.84s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: The grou

Processing Samples:  58%|█████▊    | 471/819 [10:04<07:32,  1.30s/it]


SAMPLE 471/819 | ID: samsum_test_0470
MESSAGE (preview): Rob: I wanna start Get off the couch challenge! Who's with me? I'm gonna start with 30 min of activity every day. Greg: way ahead of you mate! at least 60 min a day! Rob: good for you! 30 min a day is gonna be challenging for me! Greg: good luck then! Will: don't be too hard on yourself. better to be realistic. fingers crossed! Anna: i'm in! 30 min sounds reasonable!  Greg: look for variety of exe ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: Rob is starting a "Get off the couch challenge" where he will do at least 30 minutes of activity every day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.38s | attempts: 1
  summary: Rob is starting a "Get off the couch" challenge where he will aim to do at least 30 minutes of physical activity every day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1

Processing Samples:  59%|█████▊    | 481/819 [10:17<07:30,  1.33s/it]


SAMPLE 481/819 | ID: samsum_test_0480
MESSAGE (preview): Drade: Mum he got out of the group chatting room Marenda: I am sure he is angry! Marenda: Why did you act that aggressively? Marenda: You know your brother's temper Drade: I just said what he had said was wrong! Marenda: Apologize and send the invitation to him Drade: Why should I? Drade: I didn't say anything wrong!!!!! Drade: He failed in his business, didn't he? Marenda: But did you have to tal ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: A brother and sister are arguing over Drade's harsh comments about their brother's failed business venture.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: A brother and sister are arguing over Drade's criticism of their brother's failed business venture.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  sum

Processing Samples:  60%|█████▉    | 491/819 [10:27<05:19,  1.03it/s]


SAMPLE 491/819 | ID: samsum_test_0490
MESSAGE (preview): Matt: Do you have Patrick's phone number? He's not picking up my calls... Matt: Is he mad at me? Mary: Sure, here you go: +44 7700 900077 Susan: He changed his number :) Matt: phew, thanks!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Matt asked for Patrick's phone number because his previous calls weren't being answered.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Matt was unable to reach Patrick on the phone and Mary provided him with Patrick's new phone number.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Patrick had changed his phone number and Matt was unable to reach him.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Matt was unable to reach Patrick and Mary provided him with Patrick's new phone number.

Processing Samples:  61%|██████    | 501/819 [10:41<06:41,  1.26s/it]


SAMPLE 501/819 | ID: samsum_test_0500
MESSAGE (preview): Helen: Hey, Simo, are you there? Simon: Yep babe, what's up? Helen: I was calling you before... Simon: Sorry I was on the phone, I didn't hear you... Tell me. Helen: It's a bit embarrassing... The toilet paper is finished, could you fetch me some tissues, please? Simon: Hahaha sure, no worries!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper and tissues.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Simon agrees to buy Helen more tissue after she runs out of toilet paper.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.84s | attempts: 1
  summary: Helen asked Simon to buy more toilet paper because they were out.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: Helen asked Simon to buy more toilet paper a

Processing Samples:  62%|██████▏   | 511/819 [10:54<05:46,  1.13s/it]


SAMPLE 511/819 | ID: samsum_test_0510
MESSAGE (preview): Mum: hungry? Steve: starving... Mum: I'll wrap something up for you. Steve: You're the best, thank you!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Steve is very hungry and his mom offers to pack him a snack.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Steve is very hungry and his mom offers to pack him a snack.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Steve is hungry and his mum offers to pack him a snack.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Steve is extremely hungry and his mom offers to pack him a snack.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.74s | attempts: 1
  summary: Steve is hungry and his mum offers to prepare food for him.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.68s |

Processing Samples:  64%|██████▎   | 521/819 [11:05<05:32,  1.12s/it]


SAMPLE 521/819 | ID: samsum_test_0520
MESSAGE (preview): Jair: Still busy? Callum: Yes a little sorry Jair: ok
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.65s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating they were both busy.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Jair and Callum are both busy, but Jair is understanding.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating they are both busy.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating they are both busy.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating that they are both busy.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.76s | a

Processing Samples:  65%|██████▍   | 531/819 [11:16<05:39,  1.18s/it]


SAMPLE 531/819 | ID: samsum_test_0530
MESSAGE (preview): Adam: I have a juicy gossip for you guys ;) Borys: About whom? Adam: So apparently, Iga planned a weekend getaway with her boyfriend, but they had to cancel because he couldn't talk his group into changing the date of the presentation :D Katia: And she is pissed? Adam: What do you think? That is why they aren't sitting together since Monday ;) Katia: Who would have expected such a gossip girl from ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Iga and her boyfriend had to cancel their weekend plans due to his work commitment, causing tension between them.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.14s | attempts: 1
  summary: Iga and her boyfriend had to cancel their weekend plans due to his work obligations, causing tension between them.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.07

Processing Samples:  66%|██████▌   | 541/819 [11:29<06:06,  1.32s/it]


SAMPLE 541/819 | ID: samsum_test_0540
MESSAGE (preview): Chris: I tried jogging today! Matt: and the key word is "tried"? Matt: not to mention that you have the weirdest habit to stat the conversation 5 minutes before my work ends :P Matt: good (or bad) thing is that I need to stay a bit longer today ;) Chris: Sorry :P Chris: I have no other friends ;) Matt: it's not a problem for me, you're the one getting a short conversation ;) Chris: Anyway, I was c ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.69s | attempts: 1
  summary: Chris and Matt have a conversation about Chris's attempt at jogging, with Matt offering words of encouragement but also a dose of skepticism that Chris will be able to sustain his performance.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.60s | attempts: 1
  summary: Chris and Matt had a conversation about Chris trying to jog again after a break, with M

Processing Samples:  67%|██████▋   | 551/819 [11:39<04:40,  1.05s/it]


SAMPLE 551/819 | ID: samsum_test_0550
MESSAGE (preview): Jeremih: hey, tell your sis to text back Hansel: haha, thats your issues bro, dont drag me into it Jeremih: she's mad at me Hansel: for what Jeremih: i dont even know😔 Hansel:😢😂 Jeremih: youre laughing Hansel: haha, ill tell her but next time i wont interfere Jeremih: Okay bro, thanks
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: Jeremih is having a disagreement with his sister and asks Hansel to help mediate.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Jeremih is having issues with his sister and asks Hansel to help mediate the situation.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Jeremih is trying to resolve a conflict with his sister and asks Hansel for help.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Je

Processing Samples:  68%|██████▊   | 561/819 [11:52<05:08,  1.20s/it]


SAMPLE 561/819 | ID: samsum_test_0560
MESSAGE (preview): Oscar: A coffee at Tristano's? Payne: Why not. in 15 mins? Oscar: let's make it half an hour ok? Payne: great, i'll be there Oscar: see you there
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 05: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Oscar and Payne agreed t

Processing Samples:  70%|██████▉   | 571/819 [12:04<04:31,  1.09s/it]


SAMPLE 571/819 | ID: samsum_test_0570
MESSAGE (preview): Emily: shit, I'm waiting to see my doctor and you won't believe who's waiting with me Emily: <file_gif> Adam: who then? Emily: a nun and a blind crazy man Emily: it feels spooky Adam: haha XD
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.23s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual pair of companions, including a nun and a blind man.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual group, including a nun and a blind man.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual group, including a nun and a blind man.

AGENT 04: llama3.1:8b
  status: OK | latenc

Processing Samples:  71%|███████   | 581/819 [12:17<05:05,  1.28s/it]


SAMPLE 581/819 | ID: samsum_test_0580
MESSAGE (preview): Ella: did you send me mms?  i can't open it for some reason :/ Caleb: yeah, I sent you a pic of Maya Caleb: she's playing a mouse in a school play and I wanted to show you her costume Caleb: she looks sooo adorable in it! :-) Ella: aww...:) can you send me this photo via email? Caleb: done :-)
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.25s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, and they discussed how to share the photo with Ella.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.25s | attempts: 1
  summary: Caleb sent Ella a picture of their daughter Maya's school play costume, and they discussed how to share the photo with Ella.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.25s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, and t

Processing Samples:  72%|███████▏  | 591/819 [12:29<04:35,  1.21s/it]


SAMPLE 591/819 | ID: samsum_test_0590
MESSAGE (preview): Ethan: who's going to see solstafir? Noah: solstafir! when? Noah: i had no idea they were playing Archie: I'm not :D. Leo: i'm going ofc! Ethan: 21.11 Noah: that's a wednesday... ehh... Noah: i'll have to see, might have to stay late at work... Leo: nooo you have to come Ethan: yeah you have to! Archie: There are better concerts :P. Noah: i know, i would really like to see them, but yeah Leo: arch ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Friends are discussing attending a Solstafir concert on November 21.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: A group of friends are discussing attending a Solstafir concert on November 21.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: The group is discussing attending a Solstafir concert

Processing Samples:  73%|███████▎  | 601/819 [12:41<04:09,  1.14s/it]


SAMPLE 601/819 | ID: samsum_test_0600
MESSAGE (preview): Daniel: I'm packing for the trip, are you done with it? Michael: Mostly done, don't forget your passport! Matt: is an ID not enough? Michael: sure not! Brian: why? Michael: Bosnia and Herzegovina is not in the EU, so we will have a proper border control Michael: no passport, no entry Daniel: good you wrote us this I think :P Matt: yes, we're morons  Daniel: haha, yes, a bit Brian: I didn't even kn ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: The group is preparing for a trip to Bosnia and Herzegovina (not just Croatia) and Michael reminds them that they will need passports due to border control.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: The group discusses and confirms that a valid passport is required for their upcoming trip to Bosnia and Herzegovina.

AGEN

Processing Samples:  75%|███████▍  | 611/819 [12:53<03:58,  1.15s/it]


SAMPLE 611/819 | ID: samsum_test_0610
MESSAGE (preview): James: im out Tony: ??? James: from the team, this old prick pissed me off Tony: listen, I know the coach is pretty annoying at times (all the time lol) but he really is good James: i dont give a fuck, too much for me, already texted him Tony: shiiiit man
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 05: llama3.1:8b
  status: OK | la

Processing Samples:  76%|███████▌  | 621/819 [13:05<04:19,  1.31s/it]


SAMPLE 621/819 | ID: samsum_test_0620
MESSAGE (preview): Ken: Hey, how are you doing today? Ellen: Fine. How are you? Ken: Not doing well at all today. Really depressed. Frustrated. Stressed out. Ellen: Oh no! Ken: Really bad day yesterday carrying over to today. Ellen: Can I do anything? Ken: I'll be fine. Just overwhelmed and burned out. Ellen: Sorry! Ken: Sorry, don't mean to bring you down. Ellen: You didn't, just worried about you!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Ken is having a bad day due to depression, frustration, and stress from the previous day's events.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Ken is having a bad day due to being depressed, frustrated, stressed, and burned out from the previous day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Ken is having a bad day due

Processing Samples:  77%|███████▋  | 631/819 [13:19<03:56,  1.26s/it]


SAMPLE 631/819 | ID: samsum_test_0630
MESSAGE (preview): JP: What's new? Samara: Not much. JP: Oh. Samara: Want to play? JP: Nah. Gotta go to work. Samara: K, see ya!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: JP and Samara briefly chat about playing together, but JP has to go to work instead.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: JP and Samara briefly chat about playing together, but JP has to leave for work.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.08s | attempts: 1
  summary: JP and Samara briefly chat about their day, with JP declining an invitation to play from Samara due to having to go to work.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: JP and Samara briefly chat about their day, with JP declining an invitation to play from Samara because he has to go to work.

AGENT 05: ll

Processing Samples:  78%|███████▊  | 641/819 [13:34<04:01,  1.36s/it]


SAMPLE 641/819 | ID: samsum_test_0640
MESSAGE (preview): Lorena: Hi, can you help me with something? Martin: Well, I can try. Martin: Depends what it is? Lorena: I got a new desk and it comes with assembly instructions but I give up, I just can't do it. Lorena: I'm illiterate when it comes to instructions, haha. Lorena: So I could really use some help... Martin: Hmm, I can't today, but how about tomorrow? Shouldn't take long anyway. Lorena: Yeah, it's f ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Lorena needs help assembling her new desk and Martin agrees to assist her the next day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Lorena needs help assembling her new desk and Martin agrees to assist her the next day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Lorena asked Martin for he

Processing Samples:  79%|███████▉  | 651/819 [13:47<03:32,  1.26s/it]


SAMPLE 651/819 | ID: samsum_test_0650
MESSAGE (preview): Jake: Maaaan! you in london?! Mike: <file_photo> Mike: yeah mate! dreams coming true! Jake: cool! congrats!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Jake and Mike are exchanging congratulations over Mike's dream of visiting London becoming a reality.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Jake and Mike are exchanging congratulations over Mike's arrival in London.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Jake and Mike are exchanging congratulations over Mike's trip to London.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Jake and Mike are congratulating each other on Mike's trip to London.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Jake and Mike are exchanging cong

Processing Samples:  81%|████████  | 661/819 [13:59<02:49,  1.07s/it]


SAMPLE 661/819 | ID: samsum_test_0660
MESSAGE (preview): Miley: don't want to go to work tomorrow! Miley: wanna sleep!!! Miley: need some rest :( Aaron: know what u feelin Aaron: really want some holiday already Miley: :( Aaron: fighting girl!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summ

Processing Samples:  82%|████████▏ | 671/819 [14:11<03:06,  1.26s/it]


SAMPLE 671/819 | ID: samsum_test_0670
MESSAGE (preview): Autumn: u sewn anything? Autumn: show me show me!! :D Autumn: <file_gif> Israel: no :( Israel: haven't had the time yet  ;( Autumn: ooh :(  Autumn: u need to have pririorities in life! Autumn: like me - i haven't checked the tests for a month because the dress wouldn't sew itself :D Israel: you've sewn a dres?? Israel: you didn't show it to me Autumn: oh didn't i? Autumn: there will be some photos ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Autumn and Israel are discussing Autumn's sewing projects, with Autumn teasingly showing off her work.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Autumn and Israel are chatting about Autumn's sewing projects, with Autumn teasingly showing off her work.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.28s | attempts: 1
  s

Processing Samples:  83%|████████▎ | 681/819 [14:25<03:20,  1.45s/it]


SAMPLE 681/819 | ID: samsum_test_0680
MESSAGE (preview): Lesley: Can you please go on a walk with the dog when you get home? Chloe: I don't know when I'm going to be home though. Lesley: What do you mean? Doesn't school finish up at 3PM? Chloe: Yes, but I was going to go to Megan's house tonight. Lesley: Chloe, you didn't ask me if you could go. You cannot just assume that you are going.  Chloe: Mom, seriously. I already told her I was going to come ove ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.45s | attempts: 1
  summary: Lesley asks Chloe to come home from school early to take care of the dog, as her father will be working late and no one else will be home.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.39s | attempts: 1
  summary: Lesley tells Chloe she must come home after school to take care of the dog, rather than going straight to a friend's house.

AGENT 03: llama3

Processing Samples:  84%|████████▍ | 691/819 [14:38<02:50,  1.33s/it]


SAMPLE 691/819 | ID: samsum_test_0690
MESSAGE (preview): Mia: God, he keeps talking about the stupid wall Jennifer: I think he's just senile Karine: unfortunately it's much more Karine: this populist strategy to scare people with a minority Karine: and then present himself as the only savior Peter: Karine is right, I think he's not as senile and stupid as we would like to believe Mia: so it's even worse - he's pure evil Peter: possibly yes Karine: I rea ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.56s | attempts: 1
  summary: The group is discussing Donald Trump, with some members labeling him as "pure evil" and others describing him as a populist leader who is also stupid and ignorant.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.20s | attempts: 1
  summary: The group is discussing how they believe Donald Trump's actions are motivated by evil intentions rather than senilit

Processing Samples:  86%|████████▌ | 701/819 [14:52<02:50,  1.45s/it]


SAMPLE 701/819 | ID: samsum_test_0700
MESSAGE (preview): Charlie: How are you doing. Daisy: Good, seeing a lot of beautiful places Charlie: Where are you now? Daisy: in Naples  Charlie: Pity, I hoped you may be still in Sicily. Daisy: No, we took a ferry early in the morning yesterday. Charlie: I am still thinking about the evening we spent together. Daisy: Yes, it was nice. Charlie: I would like to meet you again. Daisy: But where? Charlie: Either in t ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.53s | attempts: 1
  summary: Charlie and Daisy, who had a romantic evening together in Sicily, plan to meet again in Florence, Italy on September 1-5.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.44s | attempts: 1
  summary: Charlie and Daisy, who had previously spent an evening together in Sicily, are making plans to meet again in Florence, Italy.

AGENT 03: llama3.1:8b
  status:

Processing Samples:  87%|████████▋ | 711/819 [15:04<01:47,  1.00it/s]


SAMPLE 711/819 | ID: samsum_test_0710
MESSAGE (preview): Shanon: Yeah. So what bout him? Victor: He's no longer our teacher :) Oliver: I'm so happy :) Sid: Me too! Shanon: Bt y? Sid: He got fired 2day and thrown out of school? Oliver: Literally thrown out! Shanon: What?! Victor: I suppose he's had it and lost it. Oliver: Or maybe had one too many ;) Sid: Not. Our. Teacher. Anymore!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Their teacher has been fired and physically removed from the school.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Their teacher has been fired and physically removed from the school.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Their teacher has been fired and literally thrown out of the school.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Their

Processing Samples:  88%|████████▊ | 721/819 [15:17<01:54,  1.17s/it]


SAMPLE 721/819 | ID: samsum_test_0720
MESSAGE (preview): Greg: Hi Mum, how's the cat doing? Patti: I just rang the vets, she's fine! Greg: Thank God, been worrying about her all day! Patti: They said I can pick her up later. I'll pop home and fetch the cage after work. Should be there at 5ish. Greg: Good, see you at home, bye!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office by Patti later that day.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: The cat is fine and will be picked up from the vet later that day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office later that day.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: The cat is fine and will be picked up fr

Processing Samples:  89%|████████▉ | 731/819 [15:29<02:00,  1.36s/it]


SAMPLE 731/819 | ID: samsum_test_0730
MESSAGE (preview): Molly: D‑': Luca: What??? Molly: I can't think anymore today! Luca: LOL! Molly: Seriously! Luca: Deep breath... Molly: It's the silly season, isn't it? Luca: Yep. Just hang in there. Molly: Trying... Luca: Try harder! LOL! Molly: Driving me nuts! Luca: I know, but you can't let it get you down! Molly: Yeah, you're right. Luca: Of course I am! LOL! Molly: Modest, too.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: Molly is feeling overwhelmed and frustrated during the holiday season, but Luca tries to cheer her up with jokes and encouragement.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Molly is feeling overwhelmed and frustrated during the holiday season.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Molly is feeling overwhelmed and frustrated d

Processing Samples:  90%|█████████ | 741/819 [15:42<01:41,  1.30s/it]


SAMPLE 741/819 | ID: samsum_test_0740
MESSAGE (preview): Joyce: Honey, can I take the car? Kyle: No, I'm sorry. Joyce: Why not? How am I supposed to go get Harry? Kyle: I've got to go to a mechanic. The car is making a weird noise. Joyce: Then can you get Harry on your way? Kyle: I don't know, I'm already running late. Joyce: Am I supposed to go get him by bus? That's ridiculous. Kyle: Fine, I'll get him. Joyce: Thank you.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Kyle is unable to let Joyce take the car because he needs it for a mechanic appointment.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Kyle is unable to take Joyce to pick up Harry because he needs to take the car to a mechanic.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Kyle is unable to take Joyce to pick up Harry because he needs to

Processing Samples:  92%|█████████▏| 751/819 [15:56<01:45,  1.55s/it]


SAMPLE 751/819 | ID: samsum_test_0750
MESSAGE (preview): Bobby: U know what annoys me the most? Mickey: Not being able to watch TV or listen to music without pesky interruptions? Bobby: That too, but the thing that annoys me the most is pharmaceuticals, medicine and drugs. Mickey: Y? Aren't they good 4 ur health? Bobby: They may well be, but not for my wallet! Mickey: What do u mean? Bobby: I was watching a film last night and suddenly there's a bunch o ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.88s | attempts: 1
  summary: Bobby believes that pharmaceutical companies invent illnesses and sell existing products as medical treatments to make money, citing an example of a non-existent illness being promoted as a legitimate condition.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.81s | attempts: 1
  summary: Bobby believes that pharmaceutical companies invent illnesses to sel

Processing Samples:  93%|█████████▎| 761/819 [16:08<01:12,  1.25s/it]


SAMPLE 761/819 | ID: samsum_test_0760
MESSAGE (preview): Pamela: Have you tried applying for the RGS grant? Zoe: I did last year. Zoe: Wasn't lucky... Xiara: Me neither Pamela: Do you know if I have to be a member? Zoe: It depends on the award Pamela: Ok I'll have a look at their website 
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Pamela asked about applying for the RGS grant and received advice from Zoe that it may require membership, but Pamela will check the website.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Pamela is considering applying for an RGS grant and asks Zoe about the eligibility requirements.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Pamela is inquiring about eligibility for the RGS grant and whether membership is required.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.16s |

Processing Samples:  94%|█████████▍| 771/819 [16:22<01:04,  1.34s/it]


SAMPLE 771/819 | ID: samsum_test_0770
MESSAGE (preview): Matthew: so is there any after party planed after the wedding? Kristina: after the wedding? I think it will last till the morning... Matthew: haha, i mean on the next day :D Bart: i think Mark said there is something planned Bart: at their house probably Matthew: oh thats good Matthew: better atmosphere and everything, more chill Kristina: that's great, i 've been to an after party like this one K ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: The group is discussing plans for an after-party to take place at the groom's house the day after the wedding.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: There is an after-party planned for the day after the wedding, likely at Mark's house.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.30s | attempts: 1
  summary: Ther

Processing Samples:  95%|█████████▌| 781/819 [16:33<00:44,  1.16s/it]


SAMPLE 781/819 | ID: samsum_test_0780
MESSAGE (preview): Karine: <file_video> Jessie: That was us at kindergarten! Karine: yes!! 
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: The two friends, Karine and Jessie, recognized themselves from a video of their time in kindergarten.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from thei

Processing Samples:  97%|█████████▋| 791/819 [16:46<00:35,  1.28s/it]


SAMPLE 791/819 | ID: samsum_test_0790
MESSAGE (preview): Louis: <file_gif> Archie: Wtf? Lol! Louis: <file_gif> Archie: That one looks like Melanie Louis: True! Hahahahah Archie: Just like in this pic Archie: <file_photo> Louis: Lol! It's her! Archie: By the way, she was way hotter when she had cury hair Louis: I disagree, sir! She looks absolutely gorgeous with straight hair Archie: Oh well, you have the right to an opinion even if it's wrong XD
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Louis and Archie are joking about a photo of Melanie, comparing her with and without curly hair.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Louis and Archie are joking about a photo of Melanie, discussing her appearance in different hairstyles.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Louis and Archie are jo

Processing Samples:  98%|█████████▊| 801/819 [16:59<00:23,  1.29s/it]


SAMPLE 801/819 | ID: samsum_test_0800
MESSAGE (preview): Linda: Hi Dad, I want to buy flowers for mum! But I don't remember which one she likes :( Michael: Well, she likes all the flowers I believe Linda: That doesn't help! I'm on a flower market right now! Michael: Send me some pics then Linda: <file_photo>  Michael: Tulips are nice, roses too Linda:  What about carnations? Michael: No, carnations are boring :D Linda: Thanks Dad, srsly… Michael:  What  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.26s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mom and he guides her through the selection process via photos.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.26s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the selection process via photos.

AGENT 03: llama3.1:8b
  

Processing Samples:  99%|█████████▉| 811/819 [17:10<00:09,  1.13s/it]


SAMPLE 811/819 | ID: samsum_test_0810
MESSAGE (preview): Luke: Ryanair to cancel flights affecting 400,000 more passengers. So unlucky Erin! Erin: Yeeeep...it touched me as well finally 😩😩😩😩 Now they screw all my plans to go to Egypt... Luke: You’re not the only one Erin: Good to know that my feelings are shared with other 399,999 people 😩 Nathan: Relevant news story for everyone to consider: http://waterfordwhispersnews.com/2017/09/18/ryanair-cancel-ef ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.14s | attempts: 1
  summary: Ryanair has announced it will cancel more flights, affecting over 400,000 passengers.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Ryanair has cancelled flights affecting over 400,000 passengers.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.14s | attempts: 1
  summary: Ryanair has announced it will cancel more flights, 

Processing Samples: 100%|██████████| 819/819 [17:21<00:00,  1.27s/it]


✓ Done. outputs.jsonl: /content/slm_evaluation/results/colab_homo6_llama3.1_8b_1767405547/outputs.jsonl
✓ Done. run_log.txt:  /content/slm_evaluation/results/colab_homo6_llama3.1_8b_1767405547/run_log.txt
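
**Note: how a "central" summary can be picked.** The evaluation cell that follows takes its final `swarm_summary` from `get_consensus`, which is defined elsewhere in this notebook and is described in the header as SentenceTransformer embeddings plus cosine similarity. The block below is only a minimal sketch of that selection step, not the notebook's actual implementation. `pick_central` and `cosine` are hypothetical names, the toy 2-D vectors stand in for real sentence embeddings so that nothing has to be downloaded, and the `avg_similarity` / `consensus_scores` keys simply mirror the keys the cell reads from `meta`; their exact meaning in `get_consensus` is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pick_central(embeddings):
    """Return (best_index, meta) for the vector closest on average to the others.

    Hypothetical stand-in for get_consensus: it takes precomputed vectors
    instead of encoding summaries with a SentenceTransformer model.
    """
    n = len(embeddings)
    if n == 0:
        raise ValueError("need at least one embedding")
    if n == 1:
        # A single candidate is trivially the consensus.
        return 0, {"avg_similarity": 1.0, "consensus_scores": [1.0]}
    scores = []
    for i in range(n):
        # Mean similarity of candidate i to every other candidate.
        others = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        scores.append(sum(others) / len(others))
    best = max(range(n), key=scores.__getitem__)
    return best, {"avg_similarity": scores[best], "consensus_scores": scores}

# Toy "embeddings": two near-duplicates and one outlier.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
best_idx, meta = pick_central(toy)
print("central candidate:", best_idx)
```

With real summaries the vectors would come from an embedding model; the selection logic stays the same, and an outlier summary ends up with the lowest score because it is far from everything else.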





In [69]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &

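import os, time, json, asyncio
from pathlib import Path

# Give the freshly restarted server a moment to bind its port before the
# ollama_is_up() check inside run_eval_hetero6 runs.
time.sleep(5)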

HETRO6_MODELS = [
    "llama3.1:8b",
    "qwen2.5:7b-instruct",
    "mistral:7b-instruct",
    "gemma:7b-instruct",
    "openchat:7b",
    "neural-chat:7b",
]

def _truncate(s: str, n: int = 400) -> str:
    """Return s as a string, cut to n characters plus a marker if it was longer."""
    if s is None:
        return ""
    s = str(s)
    return s if len(s) <= n else s[:n] + " ...[truncated]"

def _format_candidate_block(c, idx=None):
    """Render one candidate dict (model, summary, latency, attempts) as a log block."""
    header = f"MODEL {idx+1:02d}: {c['model']}" if idx is not None else f"MODEL: {c['model']}"
    lat = f"{c.get('latency', 0.0):.2f}s"
    att = c.get("attempts", 1)
    # A candidate counts as failed when its summary text starts with "ERROR".
    status = "OK" if not str(c.get("summary","")).startswith("ERROR") else "ERROR"
    return (
        f"{header}\n"
        f"  status: {status} | latency: {lat} | attempts: {att}\n"
        f"  summary: {c.get('summary','')}\n"
    )

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call call_ollama up to `retries` times; return (text, latency, attempts_used)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            # Linear backoff (0.5s, 1.0s, ...); skip the sleep after the final attempt.
            await asyncio.sleep(0.5 * k)
    return last_txt, last_lat, retries

async def run_eval_hetero6(
    max_samples=None,
    models=HETRO6_MODELS,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,
):
    import httpx
    from tqdm import tqdm

    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    run_id = f"colab_hetero6_highparams_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting Heterogeneous-6 Evaluation: {run_id}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print(f"Models: {len(models)} | concurrency={concurrency} | samples={len(dataset)}")
    print("Models:", models)
    print("-" * 80)

    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            prompt = f"Summarize this message in one short sentence:\n\n{item['message']}\n\nSummary:"

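            # Query every model for this sample at once. Samples are processed one
            # at a time, so the semaphore only throttles when concurrency < len(models).
            async def _one(m):
                async with sem:
                    txt, lat, attempts = await call_with_retry(m, prompt, client, options, retries=3)
                    return m, txt, lat, attempts

            results = await asyncio.gather(*[_one(m) for m in models])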

            candidates = []
            for (m, txt, lat, attempts) in results:
                candidates.append({
                    "model": m,
                    "summary": txt,
                    "latency": float(lat),
                    "attempts": int(attempts),
                })

            valid_summaries = [
                c["summary"] for c in candidates
                if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
            ]

            if valid_summaries:
                best_idx_valid, meta = get_consensus(valid_summaries)
                swarm_summary = valid_summaries[best_idx_valid]
            else:
                meta = {}
                swarm_summary = "ERROR: All models failed"

            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                sep = "=" * 90
                header = f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}"
                msg_preview = _truncate(item["message"].replace("\n", " "), 400)

                print("\n" + sep)
                print(header)
                print("MESSAGE (preview):", msg_preview)
                print("-" * 90)

                _log_line("\n" + sep)
                _log_line(header)
                _log_line("MESSAGE:")
                _log_line(item["message"])
                _log_line("-" * 90)

                for j, c in enumerate(candidates):
                    c_out = dict(c)
                    if truncate_console and isinstance(c_out["summary"], str):
                        c_out["summary"] = _truncate(c_out["summary"], truncate_console)
                    block = _format_candidate_block(c_out, idx=j)
                    print(block)
                    _log_line(_format_candidate_block(c, idx=j))

                print("-" * 90)
                print("CONSENSUS avg_similarity:", meta.get("avg_similarity"))
                if meta.get("consensus_scores") is not None:
                    print("CONSENSUS scores:", meta.get("consensus_scores"))
                print("SWARM SUMMARY:", swarm_summary)
                print(sep + "\n")

                _log_line("-" * 90)
                _log_line(f"CONSENSUS avg_similarity: {meta.get('avg_similarity')}")
                _log_line(f"CONSENSUS scores: {meta.get('consensus_scores')}")
                _log_line(f"SWARM SUMMARY: {swarm_summary}")
                _log_line(sep)

            result = {
                "id": item.get("id"),
                "swarm_summary": swarm_summary,
                "swarm_candidates": candidates,
                "consensus_metadata": meta,
                "factuality": check_factuality(item["message"], swarm_summary),
                "models": models,
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(result) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# Run
await run_eval_hetero6(
    max_samples=None,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    options=RUN_OPTIONS
)
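
**Note: inspecting `outputs.jsonl` after a run.** Once the run above finishes, the JSONL file can be summarized offline. The block below is a rough sketch, assuming each line has the record shape written by `run_eval_hetero6` (`swarm_summary` plus a `swarm_candidates` list with `model`, `summary`, `latency`, `attempts`). `summarize_outputs` is a hypothetical helper, and the two demo records with `model-a` / `model-b` are synthetic placeholders, not results from this notebook.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

def summarize_outputs(path):
    """Aggregate per-model call count, error count, mean latency and consensus wins."""
    stats = defaultdict(lambda: {"calls": 0, "errors": 0, "latency_sum": 0.0, "wins": 0})
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            win_credited = False
            for cand in rec.get("swarm_candidates", []):
                s = stats[cand["model"]]
                s["calls"] += 1
                s["latency_sum"] += float(cand.get("latency", 0.0))
                summary = cand.get("summary")
                if not isinstance(summary, str) or summary.startswith("ERROR"):
                    s["errors"] += 1
                elif summary == rec.get("swarm_summary") and not win_credited:
                    # If several candidates are identical, credit only the first one.
                    s["wins"] += 1
                    win_credited = True
    return {
        model: {
            "calls": s["calls"],
            "errors": s["errors"],
            "mean_latency": s["latency_sum"] / s["calls"],
            "wins": s["wins"],
        }
        for model, s in stats.items()
    }

# Synthetic records, only to show the expected shape of the file.
demo_records = [
    {"id": "demo_0", "swarm_summary": "x", "swarm_candidates": [
        {"model": "model-a", "summary": "x", "latency": 1.0, "attempts": 1},
        {"model": "model-b", "summary": "ERROR: timeout", "latency": 3.0, "attempts": 3},
    ]},
    {"id": "demo_1", "swarm_summary": "z", "swarm_candidates": [
        {"model": "model-a", "summary": "y", "latency": 2.0, "attempts": 1},
        {"model": "model-b", "summary": "z", "latency": 1.0, "attempts": 1},
    ]},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "outputs.jsonl"
    demo_path.write_text(
        "\n".join(json.dumps(r) for r in demo_records) + "\n", encoding="utf-8"
    )
    report = summarize_outputs(demo_path)

for model, row in report.items():
    print(model, row)
```

For a real run, pass the `outputs.jsonl` path printed at the end of the run instead of the temporary demo file.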


^C
Starting Heterogeneous-6 Evaluation: colab_hetero6_highparams_1767406589
Results saved to: /content/slm_evaluation/results/colab_hetero6_highparams_1767406589/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_hetero6_highparams_1767406589/run_log.txt
Models: 6 | concurrency=6 | samples=819
Models: ['llama3.1:8b', 'qwen2.5:7b-instruct', 'mistral:7b-instruct', 'gemma:7b-instruct', 'openchat:7b', 'neural-chat:7b']
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 1/819 [00:20<4:41:47, 20.67s/it]


SAMPLE 1/819 | ID: samsum_test_0000
MESSAGE (preview): Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him 🙂 Hannah: Urgh.. Alright Hannah: Bye Amanda:  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 20.63s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.35s | attempts: 1
  summary: Hannah reluctantly agrees to ask Larry for Betty's number after Amanda suggests it.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 2.68s | attempts: 1
  summary: Hannah asks Am

Processing Samples:   1%|▏         | 11/819 [02:02<2:17:33, 10.22s/it]


SAMPLE 11/819 | ID: samsum_test_0010
MESSAGE (preview): Wanda: Let's make a party! Gina: Why? Wanda: beacuse. I want some fun! Gina: ok, what do u need? Wanda: 1st I need too make a list Gina: noted and then? Wanda: well, could u take yours father car and go do groceries with me? Gina: don't know if he'll agree Wanda: I know, but u can ask :) Gina: I'll try but theres no promisess Wanda: I know, u r the best! Gina: When u wanna go Wanda: Friday? Gina:  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Wanda wants to have a party and needs Gina's help with errands before it can happen.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.91s | attempts: 1
  summary: Wanda is planning a party and asks Gina to help by going grocery shopping and inviting their father's car.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.80s | attempts: 1
  summ

Processing Samples:   3%|▎         | 21/819 [03:44<2:07:55,  9.62s/it]


SAMPLE 21/819 | ID: samsum_test_0020
MESSAGE (preview): Deirdre: Hi Beth, how are you love? Beth: Hi Auntie Deirdre, I'm been meaning to message you, had a favour to ask. Deirdre: Wondered if you had any thought about your Mum's 40th, we've got to do something special! Beth: How about a girls weekend, just mum, me, you and the girls, Kira will have to come back from Uni, of course. Deirdre: Sounds fab! Get your thinking cap on, it's only in 6 weeks! Be ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Beth asks her aunt Deirdre for work experience in a salon and Deirdre agrees to let her try out the beauty therapy side.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.46s | attempts: 1
  summary: Deirdre and Beth discuss plans for their mother's 40th birthday and Beth's interest in trying work experience at Deirdre's salon.

MODEL 03: mistral:7b-

Processing Samples:   4%|▍         | 31/819 [05:33<2:34:55, 11.80s/it]


SAMPLE 31/819 | ID: samsum_test_0030
MESSAGE (preview): Mary: hey, im kinda broke, lend me a few box Carter: okay, give me an hour, im at the train station Mary: cool, thanks
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 15.06s | attempts: 1
  summary: Mary asked Carter to lend her some money and he agreed to meet up with her after an hour.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 2.68s | attempts: 1
  summary: Mary is borrowing money from Carter temporarily.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.42s | attempts: 1
  summary: Carter agrees to lend Mary some boxes within an hour; Mary expresses gratitude.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 11.72s | attempts: 1
  summary: Mary requests financial assistance from Carter, who agrees to lend her supplies after an hour.

MODEL 05: openchat:7b
  status: OK | latency: 0.47s | attempts: 1
  summary: Mary i

Processing Samples:   5%|▌         | 41/819 [07:20<2:30:31, 11.61s/it]


SAMPLE 41/819 | ID: samsum_test_0040
MESSAGE (preview): Sebastian: It's been already a year since we moved here. Sebastian: This is totally the best time of my life. Kevin: Really?  Sebastian: Yeah! Totally maaan. Sebastian: During this 1 year I learned more than ever.  Sebastian: I learned how to be resourceful, I'm learning responsibility, and I literally have the power to make my dreams come true. Kevin: It's great to hear that. Kevin: It's great th ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 14.24s | attempts: 1
  summary: Sebastian is reflecting on the past year of his life, expressing gratitude and happiness with his decisions and relationships.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.68s | attempts: 1
  summary: Sebastian reflects on a year of personal growth and happiness, finding support in his friendship with Kevin, who wishes he had similar experience

Processing Samples:   6%|▌         | 51/819 [08:59<2:06:30,  9.88s/it]


SAMPLE 51/819 | ID: samsum_test_0050
MESSAGE (preview): Nick: You look absolutely gorgeous and have a lovely smile.  Nick: Would love to get to know you a bit more. How about we meet up for a drink sometime? Jane: Hmmm... You're shooting a bit above your range aren't you? Nick: Why would you think that hon? Jane: Because I'm not that desperate. Nick: That was a bit below the belt. Nick: You're nice but you're not THAT hot. Jane: Oh is your poor little  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.69s | attempts: 1
  summary: The conversation between Nick and Jane quickly turns from a friendly exchange to a heated argument, with both parties exchanging insults before Nick ends the interaction.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.45s | attempts: 1
  summary: The summary of the exchange is a bitter breakup or rejection, with Nick initiating a date that was rej

Processing Samples:   7%|▋         | 61/819 [10:42<2:10:25, 10.32s/it]


SAMPLE 61/819 | ID: samsum_test_0060
MESSAGE (preview): Salma: <file_video> Salma: the latest cat meme Hugh: oh sweet, I can never get enough of those lol Hugh: <file_photo> Salma: hahaha same
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 11.26s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.53s | attempts: 1
  summary: Salma and Hugh enjoyed sharing and laughing over a cat meme.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.40s | attempts: 1
  summary: Hugh shares a new cat meme with Salma, who finds it amusing.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.50s | attempts: 1
  summary: Salma and Hugh share cat memes and photos.

MODEL 05: openchat:7b
  status: OK | latency: 0.36s | attempts: 1
  summary: Salma and Hugh share and enjoy cat memes and photos.

MODEL 06: neural-chat:7b


Processing Samples:   9%|▊         | 71/819 [12:22<2:05:15, 10.05s/it]


SAMPLE 71/819 | ID: samsum_test_0070
MESSAGE (preview): Ali: I think I left my wallet at your place yesterday. Could you check?  Mohammad: Give me a sec, I'll have a look around my room. Ali: OK. Mohammad: Found it! Ali: Phew, I don't know what I'd do if it wasn't there. Can you bring it to uni tomorrow? Mohammad: Sure thing.
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.64s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.69s | attempts: 1
  summary: Ali's wallet was found by Mohammad, and he agreed to bring it to Ali at university the next day.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.24s | attempts: 1
  summary: Mohammad found Ali's wallet at his place and agreed to bring it to university the next day.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 

Processing Samples:  10%|▉         | 81/819 [14:03<2:00:49,  9.82s/it]


SAMPLE 81/819 | ID: samsum_test_0080
MESSAGE (preview): Ryan: I have a bad feeling about this Ryan: <file_other> Sebastian: Ukraine... Sebastian: This russian circus will never end... Ryan: I hope the leaders of of nations will react somehow to this shit. Sebastian: I hope so too :(
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.68s | attempts: 1
  summary: The conversation appears to be a discussion between Ryan and Sebastian about their concerns regarding Russia's actions in Ukraine.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.00s | attempts: 1
  summary: Ryan and Sebastian express concern about a situation involving Russia and Ukraine, hoping that global leaders will respond.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.54s | attempts: 1
  summary: Ryan expresses concern about a situation, possibly referring to geopolitical events in Ukraine, and hopes for a response

Processing Samples:  11%|█         | 91/819 [15:49<2:04:47, 10.28s/it]


SAMPLE 91/819 | ID: samsum_test_0090
MESSAGE (preview): Ashleigh: Looks like we're going to the cinema!!  Ashleigh: <file_gif> Peter: You got the job?? Ashleigh: I got hte job! :D Peter: <file_gif> Ashleigh: <file_gif>
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 11.52s | attempts: 1
  summary: Ashleigh got a new job and they're going to celebrate by going to the cinema.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.62s | attempts: 1
  summary: Ashleigh got a new job and is going to the cinema with Peter.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 2.93s | attempts: 1
  summary: Ashleigh has been offered a new job and is excited about going to the cinema with someone.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.67s | attempts: 1
  summary: Ashleigh celebrates getting a new job and plans to go to the cinema with Peter.

MODEL 05: openchat:7b
  status: OK | latenc

Processing Samples:  12%|█▏        | 101/819 [17:33<2:16:51, 11.44s/it]


SAMPLE 101/819 | ID: samsum_test_0100
MESSAGE (preview): Chris: Hi there! Where are you? Any chance of skyping? Rick: Hi! Our last two days in Cancun before flying to Havana. Yeah, skyping is an idea. When would it suit you? Rick: We don't have the best of connections in the room but I can get you pretty well in the lobby. Chris: What's the time in your place now? Rick: 6:45 pm Chris: It's a quarter to one in the morning here. Am still in front of the b ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 14.07s | attempts: 1
  summary: Chris and Rick plan to Skype each other the next day at 3pm, when Rick is in Havana.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.51s | attempts: 1
  summary: Chris and Rick安排在第二天下午3点进行Skype通话，并请Rick向Helen问好。

MODEL 03: mistral:7b-instruct
  status: OK | latency: 1.05s | attempts: 1
  summary: Chris and Rick discuss the possibility of skyping,

Processing Samples:  14%|█▎        | 111/819 [19:09<2:05:44, 10.66s/it]


SAMPLE 111/819 | ID: samsum_test_0110
MESSAGE (preview): Teacher: Rashi, why are you so low?  Rashi: Ma’am I’m a bit confused about my career.  Teacher: What is your confusion? Rashi: I was discussing with my friends about the career options.  Teacher: Hmm. Rashi: There are too many to choose from. Teacher: Choose a career based on what truly interests you.  Rashi: I have many that interests me. How does it determine the career? Teacher: The passion you ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 12.95s | attempts: 1
  summary: A teacher helps a student named Rashi understand that choosing a career based on personal interest and passion is key to success.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.14s | attempts: 1
  summary: The teacher advises Rashi to choose a career based on her interests and passion, emphasizing the importance of balancing duties with long-ter

Processing Samples:  15%|█▍        | 121/819 [20:52<2:05:16, 10.77s/it]


SAMPLE 121/819 | ID: samsum_test_0120
MESSAGE (preview): John: wanna go see "A Star is Born" on Wed? Joan: sorry can't Joan: super busy  Joan: don't have time for anything :(  John: that's a shame Joan: I'm free on Thursday John: I could do Thursday Joan: ok! so around 8pm? John: sure sounds great John: I'll see where it's palying and send you the details Joan: ok great!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 3.28s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.75s | attempts: 1
  summary: John and Joan will meet to watch "A Star is Born" on Thursday at 8 pm.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.65s | attempts: 1
  summary: John and Joan discuss plans to watch "A Star is Born" on Thursday at 8 pm.

MODEL 04: gemma:7b-instruct
  status: OK 

Processing Samples:  16%|█▌        | 131/819 [22:29<1:50:52,  9.67s/it]


SAMPLE 131/819 | ID: samsum_test_0130
MESSAGE (preview): Ryan: You're going to the casting?  Ryan: So you think you can dance 🤩 Jack: I am!  Jack: this time im going Ryan: U should go really Jack: I know, wanna come with me?  Ryan: I thought about it!  Jack: Nice well! I will meet you there! 😝😝😝
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.60s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.51s | attempts: 1
  summary: Jack is going to the casting and invites Ryan to come with him.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 2.83s | attempts: 1
  summary: Jack is planning to attend a dance casting and invites Ryan to join him.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.58s | attempts: 1
  summary: Ryan and Jack are planning to attend a casting together.

MODEL 05: openchat:7

Processing Samples:  17%|█▋        | 141/819 [24:11<1:49:05,  9.65s/it]


SAMPLE 141/819 | ID: samsum_test_0140
MESSAGE (preview): Sean: I overslept :/ Sam: Again?? Sean: I know.
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.50s | attempts: 1
  summary: Sean has overslept, and Sam is surprised by this repeated occurrence.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.20s | attempts: 1
  summary: Sean overslept again and feels guilty about it.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.31s | attempts: 1
  summary: Sean has overslept again, acknowledged by Sam with a question and Sean's response confirming the occurrence.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.41s | attempts: 1
  summary: Sean admits to oversleeping for the second time.

MODEL 05: openchat:7b
  status: OK | latency: 2.55s | attempts: 1
  summary: Sean overslept again, and Sam is concerned.

MODEL 06: neural-chat:7b
  status: OK | latency: 0.22s | attempts: 1
  

Processing Samples:  18%|█▊        | 151/819 [26:01<1:54:02, 10.24s/it]


SAMPLE 151/819 | ID: samsum_test_0150
MESSAGE (preview): Mike: will call u back in 2 hrs, ok?  Mike: can't talk right now, sry Mike: *sorry Dale: cool, no problem Dale: until then
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.48s | attempts: 1
  summary: Mike is busy and will call Dale back within two hours.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.61s | attempts: 1
  summary: Mike will call Dale back in two hours and is currently unavailable to talk.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.31s | attempts: 1
  summary: Mike informs Dale that he cannot talk currently and will get back within two hours, to which Dale responds understandingly and agrees to wait.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.51s | attempts: 1
  summary: Mike is unavailable for conversation but will call back later.

MODEL 05: openchat:7b
  status: OK | latency: 5.00s | att

Processing Samples:  20%|█▉        | 161/819 [27:52<2:06:04, 11.50s/it]


SAMPLE 161/819 | ID: samsum_test_0160
MESSAGE (preview): Alexander: Personal request to send me message when you will be in taxi Alexander: If any problem, call me Tom: ;) Tom: Thank You, I appreciate it Alexander: Taxi confirmation below Alexander: <file_photo> Tom: Thank you for the transport, we arrived safely, although without luggages :/ Alexander: Good but bad Tom: Yeeees
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 14.03s | attempts: 1
  summary: Alexander arranged a taxi for Tom and received confirmation that they arrived safely, albeit without their luggage.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.50s | attempts: 1
  summary: Alexander arranged a taxi service and informed Tom with a photo, though Tom's luggage was left behind.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.43s | attempts: 1
  summary: Tom has been informed about the taxi confirmation and appr

Processing Samples:  21%|██        | 171/819 [29:31<1:50:48, 10.26s/it]


SAMPLE 171/819 | ID: samsum_test_0170
MESSAGE (preview): Keith: Hi there kiddo, when are you planning to visit you old parents? :) Laura: Hey Dad, I'm not sure yet. I've been pretty busy recenlty.There is this big project coming… Keith: Oh, I understand, all work and no play…XD Laura: Daad! Don't be mean! You know I treat studying seriously! Keith: I know, you take after your mum :) By the way I think she bought some b-day gift for you… Laura: Next Satu ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Laura and her father Keith discuss her upcoming visit to see him and her mother.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.87s | attempts: 1
  summary: Laura will visit her parents next Saturday and they plan to have a lasagna dinner.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.95s | attempts: 1
  summary: Laura plans to visit h

Processing Samples:  22%|██▏       | 181/819 [31:14<1:55:22, 10.85s/it]


SAMPLE 181/819 | ID: samsum_test_0180
MESSAGE (preview): Rob: Are we meeting up 2morrow? Eve: How about Sunday Eve: stores are open so we can go then Rob: Ok I just don't know what time they close Eve: we have ot go in the moring  Eve: I have some stuff to do around 3 Rob: ok
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 13.61s | attempts: 1
  summary: They agreed to meet on Sunday morning.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.13s | attempts: 1
  summary: Rob and Eve will meet on Sunday morning, with Eve finishing around 3 PM.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.60s | attempts: 1
  summary: Eve suggests meeting up on Sunday morning, but Rob is unsure about the closing time of the stores, and Eve mentions she has something to do around 3 PM.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.58s | attempts: 1
  summary: Rob and Eve plan to meet up on S

Processing Samples:  23%|██▎       | 191/819 [32:58<1:48:45, 10.39s/it]


SAMPLE 191/819 | ID: samsum_test_0190
MESSAGE (preview): Emely: Hey. Could u help me with one task? I have to fill in the gaps and I find it a little bit strange task Titus: How is that strange ? Emely: I have problems to so it haha. Can I send a photo of this task and then I will wrote u my answers? There are 10 sentences Titus: Ok I'm doing something now but I can look at it quickly if it won't take up a lot of my time Emely: Ok Emely: <file_photo> Ti ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Emely asked for help with a task involving filling in gaps, but had trouble sending a photo of the task to Titus.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.29s | attempts: 1
  summary: Emely is asking Titus to help with a task of filling in gaps, but is having trouble seeing the photo of the task clearly.

MODEL 03: mistral:7b-instruct
  sta

Processing Samples:  25%|██▍       | 201/819 [34:47<1:55:42, 11.23s/it]


SAMPLE 201/819 | ID: samsum_test_0200
MESSAGE (preview): Abdellilah: Where are you? Sam: work Abdellilah: What time you finish? Sam: Not til 5 Abdellilah: Are your bringing him over tonight: Sam: No in the morning: Abdellilah: ok, what time? Sam: About 9. Is that ok? Abdellilah: ok - see you then
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 13.69s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place around 9am the next morning.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.03s | attempts: 1
  summary: Sam will bring someone over around 9 AM tomorrow.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.35s | attempts: 1
  summary: Sam will arrive around 9 AM the next day and Abdellilah is okay with it.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.54s | attempts: 1
  summary: Abdellilah and Sam discuss plans for the evening, including the arrival of

Processing Samples:  26%|██▌       | 211/819 [36:33<1:52:19, 11.08s/it]


SAMPLE 211/819 | ID: samsum_test_0210
MESSAGE (preview): Yaz: Going to slimming club tonight? Mary: Well, I don't want to, but I really should. I've been so bad, though! Yaz: Me too, choccies, wine, cake, you name it! Mary: Well, we should bite the bullet. She'll tell us off, I expect. Feels like being back at school! Yaz: Well, she IS our old cookery teacher! See you at 6ish, pick you up!? Mary: Yep! See you then!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 12.86s | attempts: 1
  summary: The two friends are planning to attend a slimming club together tonight, despite feeling reluctant.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.97s | attempts: 1
  summary: Both Yaz and Mary are going to the slimming club despite having been less than disciplined recently, and they will meet up at 6.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.82s | attempts: 1
  summary: Both Mary 

Processing Samples:  27%|██▋       | 221/819 [38:17<1:50:23, 11.08s/it]


SAMPLE 221/819 | ID: samsum_test_0220
MESSAGE (preview): Lisa: Hello Peter. What have you been doing off late? Peter: Bit busy with work. Lisa: Too much of work, huh? Peter: Not really, I’ve been working out Lisa:  Ohh, trying to get in shape? Peter: Nah, I don’t have much of weight to lose, just want to improve my health. Lisa:  That’s a good thought. I was also thinking of working out a bit. What do you do? Do you go to a Gym? Peter: Yes, I hold a mem ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 3.29s | attempts: 1
  summary: Lisa and Peter discuss their interest in improving their health through exercise, diet, and sleep habits.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.03s | attempts: 1
  summary: Peter and Lisa discuss their health routines, emphasizing exercise, diet, and sleep as important factors for maintaining good health.

MODEL 03: mistral:7b-instruct
 

Processing Samples:  28%|██▊       | 231/819 [40:03<1:53:31, 11.58s/it]


SAMPLE 231/819 | ID: samsum_test_0230
MESSAGE (preview): Tiffany: buy me a burger on your way home Railey: ok Tiffany: thx, sis :)
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 15.01s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 2.80s | attempts: 1
  summary: Railey agreed to buy Tiffany a burger on his way home.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.39s | attempts: 1
  summary: Railey agreed to buy Tiffany a burger on his way home.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 11.70s | attempts: 1
  summary: Tiffany requested a burger from Railey.

MODEL 05: openchat:7b
  status: OK | latency: 0.47s | attempts: 1
  summary: Tiffany asks Railey to buy her a burger, and Railey agrees.

MODEL 06: neural-chat:7b
  status: OK | latency: 0.49s | attempts: 1
  summary: Tiffany request

Processing Samples:  29%|██▉       | 241/819 [41:45<1:39:24, 10.32s/it]


SAMPLE 241/819 | ID: samsum_test_0240
MESSAGE (preview): Aggie: When do you start work? Peter: at 8  Peter: why? Aggie: no when do you start the new job? Peter: oh on the 6th Aggie: Ok I'll get a babysitter then Peter: lol wanted a free babysitter Aggie: yeah sorry :P
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: Aggie and Peter discuss his upcoming work schedule, with Aggie initially planning to hire a babysitter for their child on the day he starts work.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.63s | attempts: 1
  summary: Peter starts his new job on the 6th, which will allow Aggie to arrange babysitting for the kids.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.28s | attempts: 1
  summary: Aggie learns that Peter starts his new job on the 6th, and she will arrange for a babysitter accordingly.

MODEL 04: gemma:7b-instruct
  status: OK

Processing Samples:  31%|███       | 251/819 [43:29<1:45:52, 11.18s/it]


SAMPLE 251/819 | ID: samsum_test_0250
MESSAGE (preview): Shaldona: WE ARE GONNA GET MARRIED ❤️❤️ Shaldona: <file_others> Shaldona: This is our mobile inviation for our wedding. Shaldona: Invitation* Piper: Hey. You haven’t sent me any messages for a few years. Piper: And now you are sending me your wedding invitation  Piper: THROUGH MESSENGER? Shaldona: ..... Shaldona: Well.. Shaldona: I had no enough time to meet everybody and give this in person. Shal ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 14.05s | attempts: 1
  summary: Shaldona sent a wedding invitation via Messenger to an acquaintance who hadn't heard from her in years.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.48s | attempts: 1
  summary: Shaldona is sending Piper a digital wedding invitation after years of silence, which Piper finds hypocritical.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0

Processing Samples:  32%|███▏      | 261/819 [45:11<1:31:33,  9.85s/it]


SAMPLE 261/819 | ID: samsum_test_0260
MESSAGE (preview): Debbie: Help, I don't know which dress to buy! <file_photo> or <file_photo>? Kelly: The red one! It's beautiful. Denise: It is, but the green one will suit you better. Kelly: Why? Debbie looks good in red. Denise: She does, but in my opinion that dress would look better on someone taller. Deb needs a shorter one. Kelly: Right, I haven't thought about it. Debbie: So the green one? Denise: Definitel ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.69s | attempts: 1
  summary: Debbie asked for help choosing between two dresses and her friends Kelly and Denise helped her decide on the green one.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.57s | attempts: 1
  summary: Debbie is deciding between a red and green dress, with friends recommending the green one as it suits her more despite looking good on red too, while Ke

Processing Samples:  33%|███▎      | 271/819 [46:50<1:28:53,  9.73s/it]


SAMPLE 271/819 | ID: samsum_test_0270
MESSAGE (preview): Conrad: I'm outside the house Conrad: I forgot my keys... Rebecca: 💩 Tiffany: I'll be home at 10-11 Rebecca: I'm coming back even later Conrad: Oh no... Conrad: I'll wait in the coffee shop
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.86s | attempts: 1
  summary: Conrad is locked out of the house and will wait for someone to let him in.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.68s | attempts: 1
  summary: Conrad is waiting at a coffee shop because he forgot his keys and won't be let in by Rebecca or Tiffany.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.73s | attempts: 1
  summary: Conrad is outside the house without his keys, Tiffany will be home between 10 and 11, but Rebecca will return later, causing Conrad to wait at a coffee shop.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.63s | attempts: 1
 

Processing Samples:  34%|███▍      | 281/819 [48:42<1:42:30, 11.43s/it]


SAMPLE 281/819 | ID: samsum_test_0280
MESSAGE (preview): Stan: She replied :-) Dave: She did? Stan: <file_photo> Dave: Lucky you! Stan: I can't believe it! She's my dream come true! Dave: Good luck today! Where are you going to take her? Stan: Pat&Gill's Dave: Good choice. Let me know how it was :-) Stan: I will. Dave: In minute detail :-) Stan: Forget it!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 13.96s | attempts: 1
  summary: Stan has a date with his dream girl and is excitedly sharing the details with Dave.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.46s | attempts: 1
  summary: Stan is excited to take his crush to Pat&Gill's and can't believe she agreed to go out with him.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.82s | attempts: 1
  summary: Stan received a positive response from someone, who he considers his dream come true, and they are planning to go to P

Processing Samples:  36%|███▌      | 291/819 [50:26<1:30:47, 10.32s/it]


SAMPLE 291/819 | ID: samsum_test_0290
MESSAGE (preview): Vincent: <file_photo> Damian: What happened to your lamp? Vincent: I broke it xD  Vincent: With my bare hand Damian: You didn't do this just to show off did you? Vincent: Hahaha. xD No. Vincent: I was playing with my cat with a ribbon Vincent: And while raising my hand I just hit the lamp and the glass cover broke Damian: Shit happens. You ordered new one yet? Vincent: Yeah. Should be ready to pic ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.61s | attempts: 1
  summary: Vincent accidentally broke his lamp with his bare hand while playing with his cat.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.46s | attempts: 1
  summary: Vincent accidentally broke his lamp with his hand while playing with his cat, but he has already ordered a new one to be picked up on Tuesday.

MODEL 03: mistral:7b-instruct
  status: OK | 

Processing Samples:  37%|███▋      | 301/819 [52:15<1:32:00, 10.66s/it]


SAMPLE 301/819 | ID: samsum_test_0300
MESSAGE (preview): Jerry: Hi sweetie :) Janet: Hi sugar ;) Jerry: I'm coming home Janet: Can't wait ;) Jerry: I should be there in 40 minutes Janet: Ok, I'm waiting for you :) Jerry: How was your day? Janet: Oh, it was ok but my boss is a pain in the ass sometimes Jerry: I know, she can be a bitch :P Janet: Yes she can! ;) Jerry: See you later darling Janet: <3
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.70s | attempts: 1
  summary: Jerry and Janet are exchanging affectionate messages as Jerry heads home from work.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.13s | attempts: 1
  summary: Jerry and Janet are looking forward to seeing each other after Jerry gets home from 40 minutes away, while discussing Janet's frustrating day at work.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.60s | attempts: 1
  summary: Jerry and Janet are lo

Processing Samples:  38%|███▊      | 311/819 [53:59<1:24:59, 10.04s/it]


SAMPLE 311/819 | ID: samsum_test_0310
MESSAGE (preview): Sharon: My mum knows Ahmed: Aaaand? Sharon: She’s angry, I’m grounded Ahmed: For fuck’s sake you’re 19 Sharon: But I still live with my parents, let me remind you -_- Ahmed: So move out Sharon: It’s not that easy, hello Ahmed: Move in with me, we would be together all the time :* Sharon: I’d have to go to work, I’m still a student Ahmed: Don’t worry, I’ll take care of you Sharon: My parents will s ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.42s | attempts: 1
  summary: Sharon's 19-year-old daughter is grounded by her parents and Sharon's boyfriend Ahmed suggests she move in with him, but she refuses due to her responsibilities as a student.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.34s | attempts: 1
  summary: Sharon and Ahmed discuss Sharon's difficult situation with her parents and struggle over whether 

Processing Samples:  39%|███▉      | 321/819 [55:46<1:23:50, 10.10s/it]


SAMPLE 321/819 | ID: samsum_test_0320
MESSAGE (preview): Andre: i just read the news about the bear attack on the zoo :-( Andre: yikes - i would've never had imagine something like that could happen Megan: that's why I always say you can't keep animals y cages!!! Megan: it's cruel and it's wrong!!! Megan: i'm not surprised the bear reacted that way
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Andre is expressing sadness about a bear attack at the zoo, while Megan believes keeping animals in cages is cruel and contributes to such incidents.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.03s | attempts: 1
  summary: Andre expresses shock over a bear attack at the zoo, while Megan criticizes keeping animals in captivity and speculates about the bear's reaction.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 3.12s | attempts: 1
  summary: Megan expre

Processing Samples:  40%|████      | 331/819 [57:33<1:22:57, 10.20s/it]


SAMPLE 331/819 | ID: samsum_test_0330
MESSAGE (preview): Bella: It's valentine's day!😁😁😁 Aria: For somebody without bf today is kinda miserable day.....😢😢 Bella: There are a lot of vendors selling roses on the street here. Bella: <file_photo> Aria: <file_gif>  Bella: hahahahahahaha!! That looks SO SAD! :'‑(:'‑(:'‑( Aria: I feel like the weather is colder than it really is. How's the weather there? Bella: Here? it's 3 degree today. Wow! Even coffee shops ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: The conversation is between two friends, Bella and Aria, on Valentine's Day, discussing their respective plans for the day and looking forward to a future reunion in Warsaw.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.99s | attempts: 1
  summary: Aria feels sad on Valentine's Day without a boyfriend, while Bella plans to order pizza and go for

Processing Samples:  42%|████▏     | 341/819 [59:21<1:28:03, 11.05s/it]


SAMPLE 341/819 | ID: samsum_test_0340
MESSAGE (preview): Joel: Hey, how are you guys doing? How's Coco? Becky: She's actually sitting on my lap purring right now. Trent: confirmed Joel: no way :D pics or didn't happen! Becky: <file_photo> Joel: haha! It took you guys 2 days to win her over. She's usually not that easy. Trent: maybe she remembers us visiting you? Joel: Maybe... oh, man, I'm jealous ;) Is she behaving ok? Becky: She's a total gem. Absolut ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 13.07s | attempts: 1
  summary: Becky and Trent are caring for Joel's cat, Coco, while he is away and everything seems to be going well.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.22s | attempts: 1
  summary: Becky's cats, Trent and Coco, have finally won Coco over after two days, though she still occasionally tries to scratch the couch.

MODEL 03: mistral:7b-instruct
  st

Processing Samples:  43%|████▎     | 351/819 [1:00:58<1:23:21, 10.69s/it]


SAMPLE 351/819 | ID: samsum_test_0350
MESSAGE (preview): Noah: Hi there! The quince we talked about the other day... Are you still interested? Sophie: Hello Noah, but of course I am.Thank you. Noah: Actually William went to collect the rest of them for you immediately. They're in a basket on our terrace, so you call collect them any time. Sophie: That's very very kind of him! He's really a darling. Sophie: We won't be going your direction any time soon  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 14.62s | attempts: 1
  summary: Noah and Sophie discuss collecting some quinces that William collected from their terrace, and also touch on William's career-focused personality and potential future plans.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 6.03s | attempts: 1
  summary: Sophie and Noah discuss collecting quinces, catch up on each other's families, and briefly touch 

Processing Samples:  44%|████▍     | 361/819 [1:02:41<1:14:43,  9.79s/it]


SAMPLE 361/819 | ID: samsum_test_0360
MESSAGE (preview): Chandler: Phoebe!! Do you have money??  Phoebe: Yes I have .. But why do you need it.. Chandler: Open your door.. And pay the delivery guy standing outside my door.. Phoebe: Oh chandler !! you idiot... On my way..
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.57s | attempts: 1
  summary: Chandler needs Phoebe to pay for a delivery that was mistakenly left at his door.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.38s | attempts: 1
  summary: Chandler asks Phoebe to let the delivery person outside his door collect her payment.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.23s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person at his door, as he is out of money.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.54s | attempts: 1
  summary: Chandler instructs Phoebe to pay a delivery guy outsi

Processing Samples:  45%|████▌     | 371/819 [1:04:28<1:16:53, 10.30s/it]


SAMPLE 371/819 | ID: samsum_test_0370
MESSAGE (preview): Henry: Do you know what happened to cafe Kohaku? Julia: The one near Covent Garden? Henry: Yeah. I wanted to reserve a table, but I can't find them anywhere. Their Facebook page disappeared. Julia: I'm afraid they're closed :( Henry: For good?! Julia: Afraid so. I've been there last week and wanted to take my cousin there, but the place's empty Julia: Probably the rent was too high. Henry: Oh no,  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed permanently due to high rent costs.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.02s | attempts: 1
  summary: Cafe Kohaku has closed down, leaving both Henry and Julia disappointed, as they had enjoyed it in that location and were looking forward to trying their brownie.

MODEL 03: mistral:7b-instruct
 

Processing Samples:  47%|████▋     | 381/819 [1:06:16<1:15:12, 10.30s/it]


SAMPLE 381/819 | ID: samsum_test_0380
MESSAGE (preview): Don: Hi Cindy. Have you made all arrangements? Cindy: It's about today's meeting or your trip next week? Don: Both, I suppose:) Cindy: You have meeting with management board today at 2 pm. Don: Where did you set it up? Cindy: In our conference room. Cindy: Catering will bring some tea, coffee and snacks. Don: That's good. Don: Did everybody got the agenda? Cindy: Yep. Don: How did Andy react when  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Don and Cindy discuss arrangements for both Don's meeting with management today and his upcoming trip next week.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.31s | attempts: 1
  summary: Don and Cindy are discussing and confirming arrangements for both today's meeting with the management board and Don's upcoming trip, with all details mostly ta

Processing Samples:  48%|████▊     | 391/819 [1:08:04<1:18:06, 10.95s/it]


SAMPLE 391/819 | ID: samsum_test_0390
MESSAGE (preview): Ost: I bought a garage place in the garage. Vesna: Cool. Ost: I got it cheap. It's my intention to rent it out and do the restitution of my money. Vesna: Well doneeeee!. Ost: There is a big crowd in parking, so the prices for finding garage places are favorable, in this area. Vesna: Really!. Ost: Return on investment is much higher than if I give money under term savings in the Bank. Vesna: You ma ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Ost bought a garage in Belgrade at a low price with the intention of renting it out for a higher return on investment.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.55s | attempts: 1
  summary: Ost has bought a garage place at a cheap price in Belgrade, planning to rent it out for better returns than bank savings, due to growing vehicle ownershi

Processing Samples:  49%|████▉     | 401/819 [1:09:47<1:10:47, 10.16s/it]


SAMPLE 401/819 | ID: samsum_test_0400
MESSAGE (preview): Jill: so how was your date anyway? :) Susan: it was perfect, he was so sweet!! <3 Jill: tell me everything!! Susan: so first he picked me up from home, all dressed up and everything Jill: suit on? :D Susan: nooo, not like that, but he had a really nice black shirt, elegant shoes, looked like Brad Pitt hahaha Jill: hahahahaha Susan: <file_gif> Jill: so where did he take you? Susan: that's the best  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: Susan had a fun date with a man who took her rollerskating at a disco.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.90s | attempts: 1
  summary: Susan had a great date with an elegantly dressed man who took her to a rollerskating disco, which she enjoyed as a change from typical restaurant dates.

MODEL 03: mistral:7b-instruct
  status: OK | la

Processing Samples:  50%|█████     | 411/819 [1:11:27<1:06:11,  9.73s/it]


SAMPLE 411/819 | ID: samsum_test_0410
MESSAGE (preview): Ryan: Merry Christmas everybody! Nick: Merry Xmas! Chris: Merry Xmas to you! Chris: Where are you btw? Ryan: visiting family in Manchester  Ryan: white a disaster hahhaha Chris: We stayed home with Ann, first time on our own! Chris: and I've liked it so far very much
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.87s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.82s | attempts: 1
  summary: Ryan is visiting his family in Manchester during the holidays, which he considers a disaster, while Chris and Nick spent Christmas at home with Ann for the first time and enjoyed it.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.72s | attempts: 1
  summary: Everyone is wishing each other a Merry Christmas. Ryan is visiting family 

Processing Samples:  51%|█████▏    | 421/819 [1:13:17<1:10:24, 10.62s/it]


SAMPLE 421/819 | ID: samsum_test_0420
MESSAGE (preview): Zack: Dude have you seen the new Jersey Shore season? Dwayne: Didn't even knew a new one came out xD Zack: Ahahah no surprise there. But yeah I watched a couple of episodes and it was hilarious. Dwayne: It's the original cast? Zack: Yep. Snooki, J-Wow, Pauly D, Ron, The Situation, Deena and Vinny. Dwayne: I herad The Situation was having some bad legal problems.. Zack: Yeah and he still is. Dwayne ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Zack and Dwayne discuss the new season of Jersey Shore, with Zack encouraging Dwayne to watch it after expressing surprise at its release.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.49s | attempts: 1
  summary: Zack recommends Dwayne watch the new season of Jersey Shore, especially since it features the original cast including The Situation, 

Processing Samples:  53%|█████▎    | 431/819 [1:15:06<1:14:46, 11.56s/it]


SAMPLE 431/819 | ID: samsum_test_0430
MESSAGE (preview): Nathan: Hey Honey, what are our plans for tomorrow then? Deborah: So I've been thinking that we should pack everything today, we won't have time tomorrow Nathan: Totally true, I started packing my stuff in the bedroom, I also told the girls to start packing too Deborah: If Sofie wants to take the unicorn we won't have much space in the car XD Nathan: No, I explained to her that he'll be fine witho ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 3.68s | attempts: 1
  summary: The couple is making plans to pack and prepare for a family trip to the forest, where they will hike with their parents while Nathan and Deborah have some alone time.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.59s | attempts: 1
  summary: Nathan and Deborah are making plans to go on a family trip with some alone time, while also taking care o

Processing Samples:  54%|█████▍    | 441/819 [1:16:47<1:04:38, 10.26s/it]


SAMPLE 441/819 | ID: samsum_test_0440
MESSAGE (preview): Andrew: Have you got the paper from the office? Paula: Not yet, I'm going there tomorrow. Andrew: You'd better... The deadline for the application is Friday at noon. Paula: Ok, sure! I promise I won't forget it!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 11.55s | attempts: 1
  summary: Andrew reminds Paula to pick up a paper from the office by Friday at noon.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.70s | attempts: 1
  summary: Paula will pick up the paper tomorrow before the Friday noon deadline.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.67s | attempts: 1
  summary: Paula has not obtained the paper from the office yet and plans to get it tomorrow, but she needs to remember that the application deadline is on Friday at noon.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.73s | attempts: 1
  summary

Processing Samples:  55%|█████▌    | 451/819 [1:18:31<1:01:46, 10.07s/it]


SAMPLE 451/819 | ID: samsum_test_0450
MESSAGE (preview): Grace: i didn't get to talk to you a lot at the party last night Audrey: i know, i really wanted to catch up with you! :-( Audrey: what's going on in your life? Audrey: anything new? Grace: not really, i'm excited about graduating soon Grace: i think i'll apply for grad school but i'm still not sure Audrey: you always have liked school lol Grace: i've also thought of taking a year off Audrey: that ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Grace and Audrey caught up briefly at a party, discussing Grace's plans after graduation, including possibly attending grad school or taking a year off.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.21s | attempts: 1
  summary: Grace and Audrey catch up about Grace's plans to graduate and possibly take a year off or apply for grad school.

MODEL

Processing Samples:  56%|█████▋    | 461/819 [1:20:11<58:21,  9.78s/it]  


SAMPLE 461/819 | ID: samsum_test_0460
MESSAGE (preview): Trinny: What time do we meet? Susannah: 5, 6? Linda: six, please? I need time to get ready after work;) Trinny: 6 is good! Susannah: Ok Susannah: at the main entrance? Trinny: Ok Linda: see you Ladies! Trinny: :*
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.61s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.32s | attempts: 1
  summary: The group has arranged to meet at 6 PM at the main entrance.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.27s | attempts: 1
  summary: The group has agreed to meet at the main entrance at 6 PM, and Linda needs extra time to get ready after work.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.62s | attempts: 1
  summary: The group has confirmed their meeting time and location at 6 PM at the main

Processing Samples:  58%|█████▊    | 471/819 [1:21:57<56:46,  9.79s/it]


SAMPLE 471/819 | ID: samsum_test_0470
MESSAGE (preview): Rob: I wanna start Get off the couch challenge! Who's with me? I'm gonna start with 30 min of activity every day. Greg: way ahead of you mate! at least 60 min a day! Rob: good for you! 30 min a day is gonna be challenging for me! Greg: good luck then! Will: don't be too hard on yourself. better to be realistic. fingers crossed! Anna: i'm in! 30 min sounds reasonable!  Greg: look for variety of exe ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.77s | attempts: 1
  summary: Rob has started a "Get off the couch challenge" and invited his friends to join him in doing at least 30 minutes of physical activity every day.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.43s | attempts: 1
  summary: Rob is starting a 30-minute daily activity challenge, with friends offering varying levels of support and encouragement.

MODEL

Processing Samples:  59%|█████▊    | 481/819 [1:23:38<57:12, 10.15s/it]  


SAMPLE 481/819 | ID: samsum_test_0480
MESSAGE (preview): Drade: Mum he got out of the group chatting room Marenda: I am sure he is angry! Marenda: Why did you act that aggressively? Marenda: You know your brother's temper Drade: I just said what he had said was wrong! Marenda: Apologize and send the invitation to him Drade: Why should I? Drade: I didn't say anything wrong!!!!! Drade: He failed in his business, didn't he? Marenda: But did you have to tal ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: A brother and sister are arguing over Drade's harsh comments about their brother's failed business venture.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.26s | attempts: 1
  summary: Drade and Marenda are discussing a conflict with their brother over a group chat message, with Drade refusing to apologize.

MODEL 03: mistral:7b-instruct
  status:

Processing Samples:  60%|█████▉    | 491/819 [1:25:22<53:46,  9.84s/it]


SAMPLE 491/819 | ID: samsum_test_0490
MESSAGE (preview): Matt: Do you have Patrick's phone number? He's not picking up my calls... Matt: Is he mad at me? Mary: Sure, here you go: +44 7700 900077 Susan: He changed his number :) Matt: phew, thanks!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.77s | attempts: 1
  summary: Matt asked for Patrick's phone number because he wasn't answering his calls.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.54s | attempts: 1
  summary: Matt gets Patrick's new phone number from Mary and Susan after worrying about a potential conflict.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.69s | attempts: 1
  summary: Matt is asking if Patrick might be upset with him because he's not answering calls; Mary provides Patrick's new phone number and Susan informs that it's a new number.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.52s | attempt

Processing Samples:  61%|██████    | 501/819 [1:27:12<53:45, 10.14s/it]  


SAMPLE 501/819 | ID: samsum_test_0500
MESSAGE (preview): Helen: Hey, Simo, are you there? Simon: Yep babe, what's up? Helen: I was calling you before... Simon: Sorry I was on the phone, I didn't hear you... Tell me. Helen: It's a bit embarrassing... The toilet paper is finished, could you fetch me some tissues, please? Simon: Hahaha sure, no worries!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.57s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper and tissues.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.35s | attempts: 1
  summary: Helen asks Simon to get her some tissues because they are out of toilet paper.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.19s | attempts: 1
  summary: Helen needs more toilet paper and asks Simon to bring her tissues instead.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.58s | attempts: 1
  summary: Helen re

Processing Samples:  62%|██████▏   | 511/819 [1:28:52<50:02,  9.75s/it]


SAMPLE 511/819 | ID: samsum_test_0510
MESSAGE (preview): Mum: hungry? Steve: starving... Mum: I'll wrap something up for you. Steve: You're the best, thank you!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.57s | attempts: 1
  summary: Steve is very hungry and his mother offers to prepare food for him.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.36s | attempts: 1
  summary: Steve is very hungry and grateful that his mum is making him a meal.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.16s | attempts: 1
  summary: Steve is hungry and appreciates his mother preparing food for him.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.57s | attempts: 1
  summary: Steve expresses extreme hunger and his mother offers to prepare food for him.

MODEL 05: openchat:7b
  status: OK | latency: 4.93s | attempts: 1
  summary: Mum offers to make food for Steve who is very hungry.


Processing Samples:  64%|██████▎   | 521/819 [1:30:36<54:04, 10.89s/it]


SAMPLE 521/819 | ID: samsum_test_0520
MESSAGE (preview): Jair: Still busy? Callum: Yes a little sorry Jair: ok
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 12.73s | attempts: 1
  summary: Jair asked if Callum was still busy and received a brief response indicating he was.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.55s | attempts: 1
  summary: Callum is still a bit busy and Jair acknowledges it.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.73s | attempts: 1
  summary: Jair asks if Callum is still busy, and Callum responds affirmatively, expressing regret. Jair acknowledges this. (Shortened response: "Callum confirms he's a bit busy and apologizes.")

MODEL 04: gemma:7b-instruct
  status: OK | latency: 9.38s | attempts: 1
  summary: Callum is still working but feels slightly sorry.

MODEL 05: openchat:7b
  status: OK | latency: 0.45s | attempts: 1
  summary: Jair and Cal

Processing Samples:  65%|██████▍   | 531/819 [1:32:19<48:19, 10.07s/it]


SAMPLE 531/819 | ID: samsum_test_0530
MESSAGE (preview): Adam: I have a juicy gossip for you guys ;) Borys: About whom? Adam: So apparently, Iga planned a weekend getaway with her boyfriend, but they had to cancel because he couldn't talk his group into changing the date of the presentation :D Katia: And she is pissed? Adam: What do you think? That is why they aren't sitting together since Monday ;) Katia: Who would have expected such a gossip girl from ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: Iga and her boyfriend had to cancel their weekend plans due to his work obligations, causing tension between them.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.90s | attempts: 1
  summary: Adam shared that Iga had to cancel her weekend getaway because her boyfriend couldn't change his group's presentation date, leading to tension between them.


Processing Samples:  66%|██████▌   | 541/819 [1:34:07<51:53, 11.20s/it]


SAMPLE 541/819 | ID: samsum_test_0540
MESSAGE (preview): Chris: I tried jogging today! Matt: and the key word is "tried"? Matt: not to mention that you have the weirdest habit to stat the conversation 5 minutes before my work ends :P Matt: good (or bad) thing is that I need to stay a bit longer today ;) Chris: Sorry :P Chris: I have no other friends ;) Matt: it's not a problem for me, you're the one getting a short conversation ;) Chris: Anyway, I was c ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 3.57s | attempts: 1
  summary: Chris and Matt have a conversation about Chris trying to jog again after a break, with Matt providing some humorous skepticism and encouragement.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.29s | attempts: 1
  summary: Chris shared his experience jogging for the first time in a while, while Matt provided some humorous and relatable commentary 

Processing Samples:  67%|██████▋   | 551/819 [1:35:48<44:48, 10.03s/it]


SAMPLE 551/819 | ID: samsum_test_0550
MESSAGE (preview): Jeremih: hey, tell your sis to text back Hansel: haha, thats your issues bro, dont drag me into it Jeremih: she's mad at me Hansel: for what Jeremih: i dont even know😔 Hansel:😢😂 Jeremih: youre laughing Hansel: haha, ill tell her but next time i wont interfere Jeremih: Okay bro, thanks
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.66s | attempts: 1
  summary: Jeremih is having a dispute with his sister and asks Hansel to help mediate.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.70s | attempts: 1
  summary: Jeremih's sister is mad at him, and Hansel will pass on the message but won't get involved.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.31s | attempts: 1
  summary: Jeremih is having a communication issue with his sister and Hansel offers to help, but later states he won't get involved in the future.

MODEL 04:

Processing Samples:  68%|██████▊   | 561/819 [1:37:34<42:18,  9.84s/it]


SAMPLE 561/819 | ID: samsum_test_0560
MESSAGE (preview): Oscar: A coffee at Tristano's? Payne: Why not. in 15 mins? Oscar: let's make it half an hour ok? Payne: great, i'll be there Oscar: see you there
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.58s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.42s | attempts: 1
  summary: Oscar and Payne have rescheduled their coffee meeting to half an hour and will meet at Tristano's.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.24s | attempts: 1
  summary: Oscar and Payne plan to meet at Tristano's for a coffee in 30 minutes.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.59s | attempts: 1
  summary: Oscar and Payne are planning to meet for coffee at Tristano's in 30 minutes.

MODEL 05: openchat:7b
  status: OK | latency: 2.63s

Processing Samples:  70%|██████▉   | 571/819 [1:39:19<42:04, 10.18s/it]


SAMPLE 571/819 | ID: samsum_test_0570
MESSAGE (preview): Emily: shit, I'm waiting to see my doctor and you won't believe who's waiting with me Emily: <file_gif> Adam: who then? Emily: a nun and a blind crazy man Emily: it feels spooky Adam: haha XD
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised by the unusual people around her, including a nun and a blind man.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.74s | attempts: 1
  summary: Emily is waiting to see the doctor with a nun and a blind man, which she finds spooky.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 8.66s | attempts: 1
  summary: Emily is unexpectedly waiting with an unusual pair, a nun and a blind man, at the doctor's office, finding the situation somewhat eerie.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.70s | attemp

Processing Samples:  71%|███████   | 581/819 [1:40:57<38:00,  9.58s/it]


SAMPLE 581/819 | ID: samsum_test_0580
MESSAGE (preview): Ella: did you send me mms?  i can't open it for some reason :/ Caleb: yeah, I sent you a pic of Maya Caleb: she's playing a mouse in a school play and I wanted to show you her costume Caleb: she looks sooo adorable in it! :-) Ella: aww...:) can you send me this photo via email? Caleb: done :-)
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, and they discussed how to share the photo with Ella.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.93s | attempts: 1
  summary: Caleb sent Ella a picture of Maya's adorable mouse costume from the school play, which Ella requested to open via email.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.91s | attempts: 1
  summary: Caleb sent Ella a picture of Maya in her mouse costume for the 

Processing Samples:  72%|███████▏  | 591/819 [1:42:44<41:10, 10.84s/it]


SAMPLE 591/819 | ID: samsum_test_0590
MESSAGE (preview): Ethan: who's going to see solstafir? Noah: solstafir! when? Noah: i had no idea they were playing Archie: I'm not :D. Leo: i'm going ofc! Ethan: 21.11 Noah: that's a wednesday... ehh... Noah: i'll have to see, might have to stay late at work... Leo: nooo you have to come Ethan: yeah you have to! Archie: There are better concerts :P. Noah: i know, i would really like to see them, but yeah Leo: arch ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 3.24s | attempts: 1
  summary: The group is discussing attending a Solstafir concert on November 21.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.96s | attempts: 1
  summary: Noah is considering going to see Solstafir on Wednesday despite potentially having to stay late at work.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.98s | attempts: 1
  summary: Noah is uns

Processing Samples:  73%|███████▎  | 601/819 [1:44:27<36:40, 10.09s/it]


SAMPLE 601/819 | ID: samsum_test_0600
MESSAGE (preview): Daniel: I'm packing for the trip, are you done with it? Michael: Mostly done, don't forget your passport! Matt: is an ID not enough? Michael: sure not! Brian: why? Michael: Bosnia and Herzegovina is not in the EU, so we will have a proper border control Michael: no passport, no entry Daniel: good you wrote us this I think :P Matt: yes, we're morons  Daniel: haha, yes, a bit Brian: I didn't even kn ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: The group is discussing travel requirements for a trip to Bosnia and Herzegovina, where a passport will be necessary due to border control.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.59s | attempts: 1
  summary: The group is traveling to Bosnia and Herzegovina and emphasizes the importance of bringing passports for proper entry.

MODEL 03: mi

Processing Samples:  75%|███████▍  | 611/819 [1:46:11<34:32,  9.96s/it]


SAMPLE 611/819 | ID: samsum_test_0610
MESSAGE (preview): James: im out Tony: ??? James: from the team, this old prick pissed me off Tony: listen, I know the coach is pretty annoying at times (all the time lol) but he really is good James: i dont give a fuck, too much for me, already texted him Tony: shiiiit man
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 5.88s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 9.06s | attempts: 1
  summary: James has quit the team due to disagreements with the coach.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.69s | attempts: 1
  summary: James has decided to leave the team due to frustration with a team member, specifically the coach. Tony acknowledges the coach can be annoying but also praises his abilities, while expressing concern upon hearing James' deci

Processing Samples:  76%|███████▌  | 621/819 [1:47:57<35:00, 10.61s/it]


SAMPLE 621/819 | ID: samsum_test_0620
MESSAGE (preview): Ken: Hey, how are you doing today? Ellen: Fine. How are you? Ken: Not doing well at all today. Really depressed. Frustrated. Stressed out. Ellen: Oh no! Ken: Really bad day yesterday carrying over to today. Ellen: Can I do anything? Ken: I'll be fine. Just overwhelmed and burned out. Ellen: Sorry! Ken: Sorry, don't mean to bring you down. Ellen: You didn't, just worried about you!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 11.73s | attempts: 1
  summary: Ken is having a bad day due to feeling depressed, frustrated, stressed, and burned out.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.85s | attempts: 1
  summary: Ken is feeling depressed, frustrated, and stressed due to an overwhelming and burning-out situation from a bad day that carried over into today.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.69s | attempt

Processing Samples:  77%|███████▋  | 631/819 [1:49:43<32:54, 10.50s/it]


SAMPLE 631/819 | ID: samsum_test_0630
MESSAGE (preview): JP: What's new? Samara: Not much. JP: Oh. Samara: Want to play? JP: Nah. Gotta go to work. Samara: K, see ya!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 11.50s | attempts: 1
  summary: JP and Samara briefly chat about playing together but JP has to go to work instead.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.64s | attempts: 1
  summary: JP and Samara briefly consider playing but JP has to go to work instead.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.37s | attempts: 1
  summary: Samara suggests playing but JP declines as they have to go to work.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.66s | attempts: 1
  summary: Samara and JP briefly discuss their plans, with JP having to leave for work.

MODEL 05: openchat:7b
  status: OK | latency: 8.17s | attempts: 1
  summary: Samara and JP briefly discu

Processing Samples:  78%|███████▊  | 641/819 [1:51:32<33:19, 11.23s/it]


SAMPLE 641/819 | ID: samsum_test_0640
MESSAGE (preview): Lorena: Hi, can you help me with something? Martin: Well, I can try. Martin: Depends what it is? Lorena: I got a new desk and it comes with assembly instructions but I give up, I just can't do it. Lorena: I'm illiterate when it comes to instructions, haha. Lorena: So I could really use some help... Martin: Hmm, I can't today, but how about tomorrow? Shouldn't take long anyway. Lorena: Yeah, it's f ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 3.23s | attempts: 1
  summary: Lorena needs help assembling her new desk and Martin agrees to assist her the next day.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.65s | attempts: 1
  summary: Martin will help Lorena assemble her new desk tomorrow.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.86s | attempts: 1
  summary: Lorena asks Martin for help assembling a n

Processing Samples:  79%|███████▉  | 651/819 [1:53:15<30:20, 10.83s/it]


SAMPLE 651/819 | ID: samsum_test_0650
MESSAGE (preview): Jake: Maaaan! you in london?! Mike: <file_photo> Mike: yeah mate! dreams coming true! Jake: cool! congrats!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 12.71s | attempts: 1
  summary: Jake and Mike are congratulating each other on Mike's trip to London.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.66s | attempts: 1
  summary: Mike is in London, realizing a dream, and Jake congratulates him.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.61s | attempts: 1
  summary: Mike confirms he is in London, implying his dream of visiting the city has come true.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 9.40s | attempts: 1
  summary: Jake and Mike are excited to be in London together.

MODEL 05: openchat:7b
  status: OK | latency: 0.66s | attempts: 1
  summary: Jake is excited to learn that Mike is in London, and they

Processing Samples:  81%|████████  | 661/819 [1:54:54<25:08,  9.55s/it]


SAMPLE 661/819 | ID: samsum_test_0660
MESSAGE (preview): Miley: don't want to go to work tomorrow! Miley: wanna sleep!!! Miley: need some rest :( Aaron: know what u feelin Aaron: really want some holiday already Miley: :( Aaron: fighting girl!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.20s | attempts: 1
  summary: Miley and Aaron are both feeling tired and want a break from work.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 4.97s | attempts: 1
  summary: Miley is tired and wants a break, with Aaron showing understanding and support.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.30s | attempts: 1
  summary: Miley and Aaron both express a desire for rest and a break from work.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.47s | attempts: 1
  summary: Miley is feeling exhausted and longing for a vacation.

MODEL 05: openchat:7b
  status: OK | latency: 2.56s | atte

Processing Samples:  82%|████████▏ | 671/819 [1:56:42<25:40, 10.41s/it]


SAMPLE 671/819 | ID: samsum_test_0670
MESSAGE (preview): Autumn: u sewn anything? Autumn: show me show me!! :D Autumn: <file_gif> Israel: no :( Israel: haven't had the time yet  ;( Autumn: ooh :(  Autumn: u need to have pririorities in life! Autumn: like me - i haven't checked the tests for a month because the dress wouldn't sew itself :D Israel: you've sewn a dres?? Israel: you didn't show it to me Autumn: oh didn't i? Autumn: there will be some photos ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Autumn and Israel are discussing Autumn's sewing projects, with Autumn teasingly showing off her work.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.04s | attempts: 1
  summary: Autumn has sewn a dress and is sharing her sewing progress with Israel, who expresses excitement to see more.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 9.04

Processing Samples:  83%|████████▎ | 681/819 [1:58:28<23:26, 10.19s/it]


SAMPLE 681/819 | ID: samsum_test_0680
MESSAGE (preview): Lesley: Can you please go on a walk with the dog when you get home? Chloe: I don't know when I'm going to be home though. Lesley: What do you mean? Doesn't school finish up at 3PM? Chloe: Yes, but I was going to go to Megan's house tonight. Lesley: Chloe, you didn't ask me if you could go. You cannot just assume that you are going.  Chloe: Mom, seriously. I already told her I was going to come ove ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.84s | attempts: 1
  summary: Lesley tells Chloe she must come home after school to let the dog out, but can then go visit Megan afterwards.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.50s | attempts: 1
  summary: Chloe agrees to walk the dog after visiting Megan, despite initial resistance, under her mother Lesley's conditions.

MODEL 03: mistral:7b-instruct
  status: OK 

Processing Samples:  84%|████████▍ | 691/819 [2:00:14<22:12, 10.41s/it]


SAMPLE 691/819 | ID: samsum_test_0690
MESSAGE (preview): Mia: God, he keeps talking about the stupid wall Jennifer: I think he's just senile Karine: unfortunately it's much more Karine: this populist strategy to scare people with a minority Karine: and then present himself as the only savior Peter: Karine is right, I think he's not as senile and stupid as we would like to believe Mia: so it's even worse - he's pure evil Peter: possibly yes Karine: I rea ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 1.23s | attempts: 1
  summary: The group is discussing Donald Trump, with some members calling him "pure evil" and others labeling him as "stupid", but all agreeing that he poses a significant threat to the world.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.20s | attempts: 1
  summary: The group discusses their concerns about a populist leader, agreeing he is both dangerous

Processing Samples:  86%|████████▌ | 701/819 [2:01:58<19:32,  9.94s/it]


SAMPLE 701/819 | ID: samsum_test_0700
MESSAGE (preview): Charlie: How are you doing. Daisy: Good, seeing a lot of beautiful places Charlie: Where are you now? Daisy: in Naples  Charlie: Pity, I hoped you may be still in Sicily. Daisy: No, we took a ferry early in the morning yesterday. Charlie: I am still thinking about the evening we spent together. Daisy: Yes, it was nice. Charlie: I would like to meet you again. Daisy: But where? Charlie: Either in t ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 9.24s | attempts: 1
  summary: Charlie and Daisy, who had a romantic evening together in Sicily, plan to meet again in Florence on September 1-5.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.92s | attempts: 1
  summary: Charlie and Daisy plan to meet in Florence from September 1-5 to explore the city together.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 1.19s | at

Processing Samples:  87%|████████▋ | 711/819 [2:03:48<18:53, 10.49s/it]


SAMPLE 711/819 | ID: samsum_test_0710
MESSAGE (preview): Shanon: Yeah. So what bout him? Victor: He's no longer our teacher :) Oliver: I'm so happy :) Sid: Me too! Shanon: Bt y? Sid: He got fired 2day and thrown out of school? Oliver: Literally thrown out! Shanon: What?! Victor: I suppose he's had it and lost it. Oliver: Or maybe had one too many ;) Sid: Not. Our. Teacher. Anymore!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 11.51s | attempts: 1
  summary: Their teacher has been fired and physically removed from the school.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.67s | attempts: 1
  summary: Their teacher was fired and thrown out of the school, and the students are celebrating.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.37s | attempts: 1
  summary: The group is happy because their teacher has been fired from the school.

MODEL 04: gemma:7b-instruct
  status: OK |

Processing Samples:  88%|████████▊ | 721/819 [2:05:29<17:04, 10.45s/it]


SAMPLE 721/819 | ID: samsum_test_0720
MESSAGE (preview): Greg: Hi Mum, how's the cat doing? Patti: I just rang the vets, she's fine! Greg: Thank God, been worrying about her all day! Patti: They said I can pick her up later. I'll pop home and fetch the cage after work. Should be there at 5ish. Greg: Good, see you at home, bye!
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.61s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office later that day.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.92s | attempts: 1
  summary: Patti will pick up their cat from the vet's office around 5 PM after work.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.36s | attempts: 1
  summary: Greg is relieved to hear that his cat is fine after Patti visited the vet and will pick her up later.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.57s | attempts: 1
  su

Processing Samples:  89%|████████▉ | 731/819 [2:07:11<14:25,  9.83s/it]


SAMPLE 731/819 | ID: samsum_test_0730
MESSAGE (preview): Molly: D‑': Luca: What??? Molly: I can't think anymore today! Luca: LOL! Molly: Seriously! Luca: Deep breath... Molly: It's the silly season, isn't it? Luca: Yep. Just hang in there. Molly: Trying... Luca: Try harder! LOL! Molly: Driving me nuts! Luca: I know, but you can't let it get you down! Molly: Yeah, you're right. Luca: Of course I am! LOL! Molly: Modest, too.
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.70s | attempts: 1
  summary: Molly is feeling overwhelmed and frustrated with the holiday season, but Luca tries to cheer her up.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.42s | attempts: 1
  summary: Molly is struggling with stress during the busy season, and Luca is trying to encourage her to stay positive.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.26s | attempts: 1
  summary: Molly is feeling over

Processing Samples:  90%|█████████ | 741/819 [2:09:00<13:58, 10.75s/it]


SAMPLE 741/819 | ID: samsum_test_0740
MESSAGE (preview): Joyce: Honey, can I take the car? Kyle: No, I'm sorry. Joyce: Why not? How am I supposed to go get Harry? Kyle: I've got to go to a mechanic. The car is making a weird noise. Joyce: Then can you get Harry on your way? Kyle: I don't know, I'm already running late. Joyce: Am I supposed to go get him by bus? That's ridiculous. Kyle: Fine, I'll get him. Joyce: Thank you.
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 11.66s | attempts: 1
  summary: Kyle is unable to take Joyce to pick up Harry because he has a car problem and is already running late.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.74s | attempts: 1
  summary: Kyle agrees to pick up Harry after deciding Joyce cannot use the car.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 1.03s | attempts: 1
  summary: Joyce needs the car to pick up Harry, but Kyle can't let 

Processing Samples:  92%|█████████▏| 751/819 [2:10:45<12:24, 10.94s/it]


SAMPLE 751/819 | ID: samsum_test_0750
MESSAGE (preview): Bobby: U know what annoys me the most? Mickey: Not being able to watch TV or listen to music without pesky interruptions? Bobby: That too, but the thing that annoys me the most is pharmaceuticals, medicine and drugs. Mickey: Y? Aren't they good 4 ur health? Bobby: They may well be, but not for my wallet! Mickey: What do u mean? Bobby: I was watching a film last night and suddenly there's a bunch o ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 12.53s | attempts: 1
  summary: Bobby believes that pharmaceutical companies invent illnesses and sell them as medical products to make money, citing an example of a fictional illness being marketed as a real one in a commercial.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 1.25s | attempts: 1
  summary: Bobby suspects that pharmaceutical companies are creating non-existent il

Processing Samples:  93%|█████████▎| 761/819 [2:12:26<09:52, 10.22s/it]


SAMPLE 761/819 | ID: samsum_test_0760
MESSAGE (preview): Pamela: Have you tried applying for the RGS grant? Zoe: I did last year. Zoe: Wasn't lucky... Xiara: Me neither Pamela: Do you know if I have to be a member? Zoe: It depends on the award Pamela: Ok I'll have a look at their website 
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.70s | attempts: 1
  summary: Pamela is considering applying for the RGS grant and asks Zoe about the eligibility requirements.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 7.90s | attempts: 1
  summary: Zoe and Xiara both applied for the RGS grant last year without success, and Pamela plans to check the website to see if she needs to be a member.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.53s | attempts: 1
  summary: The group discussed applying for the RGS grant, with Pamela and two others mentioning past unsuccessful attempts. Pamela plan

Processing Samples:  94%|█████████▍| 771/819 [2:14:09<08:04, 10.09s/it]


SAMPLE 771/819 | ID: samsum_test_0770
MESSAGE (preview): Matthew: so is there any after party planed after the wedding? Kristina: after the wedding? I think it will last till the morning... Matthew: haha, i mean on the next day :D Bart: i think Mark said there is something planned Bart: at their house probably Matthew: oh thats good Matthew: better atmosphere and everything, more chill Kristina: that's great, i 've been to an after party like this one K ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.88s | attempts: 1
  summary: There is an after-party planned for the day after the wedding at Mark's house, with leftover food and liquor.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.75s | attempts: 1
  summary: Matthew, Kristina, and Bart are looking forward to the after-party at Mark's house, which is expected to involve leftover food from the wedding, plenty of drinks,

Processing Samples:  95%|█████████▌| 781/819 [2:15:49<05:33,  8.76s/it]


SAMPLE 781/819 | ID: samsum_test_0780
MESSAGE (preview): Karine: <file_video> Jessie: That was us at kindergarten! Karine: yes!! 
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 5.94s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from kindergarten.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 0.45s | attempts: 1
  summary: Karine and Jessie recognize a video from their kindergarten days.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.23s | attempts: 1
  summary: They recognized a video as being from their kindergarten days.

MODEL 04: gemma:7b-instruct
  status: OK | latency: 0.50s | attempts: 1
  summary: Karine and Jessie reminisce about their childhood together.

MODEL 05: openchat:7b
  status: OK | latency: 0.33s | attempts: 1
  summary: Jessie and Karine reminisce about their kindergarten days.

MODEL 06: neural-chat:7b
  status: OK | latency: 2.6

Processing Samples:  97%|█████████▋| 791/819 [2:17:35<04:37,  9.93s/it]


SAMPLE 791/819 | ID: samsum_test_0790
MESSAGE (preview): Louis: <file_gif> Archie: Wtf? Lol! Louis: <file_gif> Archie: That one looks like Melanie Louis: True! Hahahahah Archie: Just like in this pic Archie: <file_photo> Louis: Lol! It's her! Archie: By the way, she was way hotter when she had cury hair Louis: I disagree, sir! She looks absolutely gorgeous with straight hair Archie: Oh well, you have the right to an opinion even if it's wrong XD
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 8.94s | attempts: 1
  summary: The conversation is about a shared photo of Melanie, where Archie and Louis are jokingly debating whether she looks better with curly or straight hair.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 5.59s | attempts: 1
  summary: Louis and Archie are joking about someone who resembles Melanie in a photo, debating her appearance with curly versus straight hair.

MODEL 03:

Processing Samples:  98%|█████████▊| 801/819 [2:19:16<02:57,  9.86s/it]


SAMPLE 801/819 | ID: samsum_test_0800
MESSAGE (preview): Linda: Hi Dad, I want to buy flowers for mum! But I don't remember which one she likes :( Michael: Well, she likes all the flowers I believe Linda: That doesn't help! I'm on a flower market right now! Michael: Send me some pics then Linda: <file_photo>  Michael: Tulips are nice, roses too Linda:  What about carnations? Michael: No, carnations are boring :D Linda: Thanks Dad, srsly… Michael:  What  ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the options.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.16s | attempts: 1
  summary: Linda is getting flowers for her mom and gets help from her dad to choose the right type.

MODEL 03: mistral:7b-instruct
  status: OK | latency: 0.86s | attempts: 1

Processing Samples:  99%|█████████▉| 811/819 [2:20:55<01:18,  9.78s/it]


SAMPLE 811/819 | ID: samsum_test_0810
MESSAGE (preview): Luke: Ryanair to cancel flights affecting 400,000 more passengers. So unlucky Erin! Erin: Yeeeep...it touched me as well finally 😩😩😩😩 Now they screw all my plans to go to Egypt... Luke: You’re not the only one Erin: Good to know that my feelings are shared with other 399,999 people 😩 Nathan: Relevant news story for everyone to consider: http://waterfordwhispersnews.com/2017/09/18/ryanair-cancel-ef ...[truncated]
------------------------------------------------------------------------------------------
MODEL 01: llama3.1:8b
  status: OK | latency: 0.67s | attempts: 1
  summary: Ryanair has cancelled flights affecting over 400,000 passengers.

MODEL 02: qwen2.5:7b-instruct
  status: OK | latency: 8.38s | attempts: 1
  summary: Many passengers, including Erin and Ava, are facing flight cancellations by Ryanair, leading to disrupted travel plans and potential compensation hopes.

MODEL 03: mistral:7b-instruct
  status: OK | latency:

Processing Samples: 100%|██████████| 819/819 [2:22:19<00:00, 10.43s/it]


✓ Done. outputs.jsonl: /content/slm_evaluation/results/colab_hetero6_highparams_1767406589/outputs.jsonl
✓ Done. run_log.txt:  /content/slm_evaluation/results/colab_hetero6_highparams_1767406589/run_log.txt
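
Each line of `outputs.jsonl` is one JSON record, so a finished run can be reloaded for a quick sanity check before computing metrics. The following is a minimal sketch, not part of the original notebook. It assumes each record carries the `swarm_summary` and `swarm_candidates` keys that the homogeneous cell further down writes; the heterogeneous run's exact schema is not shown in this section, so treat the key names as an assumption. `load_run` and `failure_stats` are hypothetical helper names.

```python
import json

def load_run(path):
    """Read a JSONL results file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

def failure_stats(records):
    """Count samples and candidates whose summary is missing or an 'ERROR...' string."""
    def is_error(s):
        return not isinstance(s, str) or s.startswith("ERROR")

    failed_samples = sum(1 for r in records if is_error(r.get("swarm_summary")))
    failed_candidates = sum(
        1
        for r in records
        for c in r.get("swarm_candidates", [])
        if is_error(c.get("summary"))
    )
    return {
        "samples": len(records),
        "failed_samples": failed_samples,
        "failed_candidates": failed_candidates,
    }
```

Typical use would be `failure_stats(load_run(out_file))`, with `out_file` pointing at the `outputs.jsonl` path printed at the end of a run.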





Below: the homogeneous run (12 agents, all using the same model, `llama3.1:8b`).
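
The cell below relies on a `get_consensus` helper defined earlier in the notebook, which returns `(best_idx, meta)`. As an illustration of the "most central summary" idea only, here is a small sketch that works on precomputed embedding vectors; in the notebook those vectors would come from a SentenceTransformer model. The function name `pick_central`, the plain-list vectors, and the exact meaning given to `avg_similarity` and `consensus_scores` are assumptions for this example, not the notebook's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pick_central(embeddings):
    """Return (best_idx, meta) for the vector with the highest mean similarity to the others."""
    n = len(embeddings)
    if n == 0:
        raise ValueError("need at least one embedding")
    if n == 1:
        # A single candidate is trivially the consensus.
        return 0, {"avg_similarity": 1.0, "consensus_scores": [1.0]}
    scores = []
    for i in range(n):
        others = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        scores.append(sum(others) / len(others))
    best_idx = max(range(n), key=scores.__getitem__)  # ties resolve to the first index
    return best_idx, {"avg_similarity": scores[best_idx], "consensus_scores": scores}

# Toy example: two near-duplicate vectors and one outlier.
best, meta = pick_central([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

In a homogeneous swarm many agents return identical summaries (as the logs below show), so the per-candidate scores tend to cluster near 1 and the selection mostly filters out the occasional outlier.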

In [71]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &

import os, time, json, asyncio
from pathlib import Path

# Assumes these already exist in your notebook:
# - DATASET_PATH
# - RESULTS_DIR
# - RUN_OPTIONS
# - call_ollama(model, prompt, client, options) -> (txt, latency)
# - ollama_is_up() -> bool
# - get_consensus(list_of_strings) -> (best_idx, meta)
# - check_factuality(message, summary) -> any
# Also requires: httpx, tqdm

def _truncate(s: str, n: int = 400):
    if s is None:
        return ""
    s = str(s)
    return s if len(s) <= n else s[:n] + " ...[truncated]"

def _format_candidate_block(c, idx=None):
    header = f"AGENT {idx+1:02d}: {c['model']}" if idx is not None else f"MODEL: {c['model']}"
    lat = f"{c.get('latency', 0.0):.2f}s"
    att = c.get("attempts", 1)
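    summary = c.get("summary")
    # Mirror the validity check used for consensus: only non-error strings count as OK.
    status = "OK" if isinstance(summary, str) and not summary.startswith("ERROR") else "ERROR"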
    return (
        f"{header}\n"
        f"  status: {status} | latency: {lat} | attempts: {att}\n"
        f"  summary: {c.get('summary','')}\n"
    )

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call `call_ollama` up to `retries` times; return (text, latency, attempts_used)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            # Linear backoff (0.5s, 1.0s, ...); no need to wait after the final attempt.
            await asyncio.sleep(0.5 * k)
    return last_txt, last_lat, retries

async def run_eval_homo12(
    model="llama3.1:8b",
    max_samples=None,
    agents=12,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,
):
    import httpx
    from tqdm import tqdm

    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    run_id = f"colab_homo12_{model.replace(':','_').replace('/','_')}_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting Homogeneous-12 Evaluation: {run_id}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print(f"Model: {model} | agents={agents} | concurrency={concurrency} | samples={len(dataset)}")
    print("-" * 80)

    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            prompt = f"Summarize this message in one short sentence:\n\n{item['message']}\n\nSummary:"

            async def _one(agent_idx: int):
                async with sem:
                    txt, lat, attempts = await call_with_retry(model, prompt, client, options, retries=3)
                    return agent_idx, txt, lat, attempts

            results = await asyncio.gather(*[_one(i) for i in range(agents)])

            candidates = []
            for (agent_idx, txt, lat, attempts) in results:
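                candidates.append({
                    "model": model,
                    "agent_idx": int(agent_idx),
                    "summary": txt,
                    # Guard against a missing latency so one failed call cannot crash the run.
                    "latency": float(lat) if lat is not None else 0.0,
                    "attempts": int(attempts),
                })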

            valid_summaries = [
                c["summary"] for c in candidates
                if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
            ]

            if valid_summaries:
                best_idx_valid, meta = get_consensus(valid_summaries)
                swarm_summary = valid_summaries[best_idx_valid]
            else:
                meta = {}
                swarm_summary = "ERROR: All agents failed"

            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                sep = "=" * 90
                header = f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}"
                msg_preview = _truncate(item["message"].replace("\n", " "), 400)

                print("\n" + sep)
                print(header)
                print("MESSAGE (preview):", msg_preview)
                print("-" * 90)

                _log_line("\n" + sep)
                _log_line(header)
                _log_line("MESSAGE:")
                _log_line(item["message"])
                _log_line("-" * 90)

                for j, c in enumerate(candidates):
                    c_out = dict(c)
                    if truncate_console and isinstance(c_out["summary"], str):
                        c_out["summary"] = _truncate(c_out["summary"], truncate_console)
                    block = _format_candidate_block(c_out, idx=j)
                    print(block)
                    _log_line(_format_candidate_block(c, idx=j))

                print("-" * 90)
                print("CONSENSUS avg_similarity:", meta.get("avg_similarity"))
                if meta.get("consensus_scores") is not None:
                    print("CONSENSUS scores:", meta.get("consensus_scores"))
                print("SWARM SUMMARY:", swarm_summary)
                print(sep + "\n")

                _log_line("-" * 90)
                _log_line(f"CONSENSUS avg_similarity: {meta.get('avg_similarity')}")
                _log_line(f"CONSENSUS scores: {meta.get('consensus_scores')}")
                _log_line(f"SWARM SUMMARY: {swarm_summary}")
                _log_line(sep)

            result = {
                "id": item.get("id"),
                "model": model,
                "agents": int(agents),
                "swarm_summary": swarm_summary,
                "swarm_candidates": candidates,
                "consensus_metadata": meta,
                "factuality": check_factuality(item["message"], swarm_summary),
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(result) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# Run (default: llama3.1:8b, 12 agents)
await run_eval_homo12(
    model="llama3.1:8b",
    max_samples=None,
    agents=12,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    options=RUN_OPTIONS
)


^C
Starting Homogeneous-12 Evaluation: colab_homo12_llama3.1_8b_1767415147
Results saved to: /content/slm_evaluation/results/colab_homo12_llama3.1_8b_1767415147/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_homo12_llama3.1_8b_1767415147/run_log.txt
Model: llama3.1:8b | agents=12 | concurrency=6 | samples=819
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 1/819 [00:04<1:02:52,  4.61s/it]


SAMPLE 1/819 | ID: samsum_test_0000
MESSAGE (preview): Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him 🙂 Hannah: Urgh.. Alright Hannah: Bye Amanda:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 3.78s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 02: llama3.1:8b
  status: OK | latency: 3.72s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 03: llama3.1:8b
  status: OK | latency: 3.78s | attempts: 1
  summary: Hannah asked Amanda for Be

Processing Samples:   1%|▏         | 11/819 [00:27<29:01,  2.16s/it]


SAMPLE 11/819 | ID: samsum_test_0010
MESSAGE (preview): Wanda: Let's make a party! Gina: Why? Wanda: beacuse. I want some fun! Gina: ok, what do u need? Wanda: 1st I need too make a list Gina: noted and then? Wanda: well, could u take yours father car and go do groceries with me? Gina: don't know if he'll agree Wanda: I know, but u can ask :) Gina: I'll try but theres no promisess Wanda: I know, u r the best! Gina: When u wanna go Wanda: Friday? Gina:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Wanda wants to have a party and needs Gina's help with errands before it can happen.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Wanda wants to have a party and needs Gina's help with errands before it can happen.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Wanda wants to have a party and n

Processing Samples:   3%|▎         | 21/819 [00:51<34:48,  2.62s/it]


SAMPLE 21/819 | ID: samsum_test_0020
MESSAGE (preview): Deirdre: Hi Beth, how are you love? Beth: Hi Auntie Deirdre, I'm been meaning to message you, had a favour to ask. Deirdre: Wondered if you had any thought about your Mum's 40th, we've got to do something special! Beth: How about a girls weekend, just mum, me, you and the girls, Kira will have to come back from Uni, of course. Deirdre: Sounds fab! Get your thinking cap on, it's only in 6 weeks! Be ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.84s | attempts: 1
  summary: Beth asks her aunt Deirdre for work experience at a salon, and Deirdre agrees to let her try out beauty therapy tasks.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.97s | attempts: 1
  summary: Beth is planning a surprise 40th birthday party for her mom and also wants to try working at Deirdre's salon as part of her work experience.

AGENT 03: llama3.1:8

Processing Samples:   4%|▍         | 31/819 [01:16<33:00,  2.51s/it]


SAMPLE 31/819 | ID: samsum_test_0030
MESSAGE (preview): Mary: hey, im kinda broke, lend me a few box Carter: okay, give me an hour, im at the train station Mary: cool, thanks
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Mary asked Carter to lend her money and he agreed to meet up with her after an hour.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Mary asked Carter to lend her money and he agreed to meet up with her.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Mary asked Carter to lend her money and he agreed to meet her at the train station in an hour.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Mary asked Carter to lend her some money and he agreed to meet up with her after an hour.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Mar

Processing Samples:   5%|▌         | 41/819 [01:37<28:41,  2.21s/it]


SAMPLE 41/819 | ID: samsum_test_0040
MESSAGE (preview): Sebastian: It's been already a year since we moved here. Sebastian: This is totally the best time of my life. Kevin: Really?  Sebastian: Yeah! Totally maaan. Sebastian: During this 1 year I learned more than ever.  Sebastian: I learned how to be resourceful, I'm learning responsibility, and I literally have the power to make my dreams come true. Kevin: It's great to hear that. Kevin: It's great th ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.36s | attempts: 1
  summary: Sebastian is reflecting on the past year of his life, expressing gratitude and happiness with his decisions and relationships.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.51s | attempts: 1
  summary: Sebastian reflects on the past year, expressing his happiness and fulfillment with his life, while Kevin expresses admiration and envy of Sebastian's succ

Processing Samples:   6%|▌         | 51/819 [01:59<28:48,  2.25s/it]


SAMPLE 51/819 | ID: samsum_test_0050
MESSAGE (preview): Nick: You look absolutely gorgeous and have a lovely smile.  Nick: Would love to get to know you a bit more. How about we meet up for a drink sometime? Jane: Hmmm... You're shooting a bit above your range aren't you? Nick: Why would you think that hon? Jane: Because I'm not that desperate. Nick: That was a bit below the belt. Nick: You're nice but you're not THAT hot. Jane: Oh is your poor little  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.27s | attempts: 1
  summary: A man and a woman exchange flirtatious messages, but she rejects him and he becomes angry and insults her.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.40s | attempts: 1
  summary: The conversation between Nick and Jane quickly turns from a friendly exchange to a heated argument, with both parties exchanging insults before Nick ends the interaction.

AG

Processing Samples:   7%|▋         | 61/819 [02:18<22:59,  1.82s/it]


SAMPLE 61/819 | ID: samsum_test_0060
MESSAGE (preview): Salma: <file_video> Salma: the latest cat meme Hugh: oh sweet, I can never get enough of those lol Hugh: <file_photo> Salma: hahaha same
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 06: llama3.1:8b
  status

Processing Samples:   9%|▊         | 71/819 [02:38<25:29,  2.05s/it]


SAMPLE 71/819 | ID: samsum_test_0070
MESSAGE (preview): Ali: I think I left my wallet at your place yesterday. Could you check?  Mohammad: Give me a sec, I'll have a look around my room. Ali: OK. Mohammad: Found it! Ali: Phew, I don't know what I'd do if it wasn't there. Can you bring it to uni tomorrow? Mohammad: Sure thing.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summary: Al

Processing Samples:  10%|▉         | 81/819 [03:00<25:26,  2.07s/it]


SAMPLE 81/819 | ID: samsum_test_0080
MESSAGE (preview): Ryan: I have a bad feeling about this Ryan: <file_other> Sebastian: Ukraine... Sebastian: This russian circus will never end... Ryan: I hope the leaders of of nations will react somehow to this shit. Sebastian: I hope so too :(
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: The conversation between Ryan and Sebastian expresses concern and frustration about the ongoing conflict in Ukraine.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: The conversation is about concerns and disappointment regarding Russia's actions in Ukraine.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: The conversation is about concerns and fears regarding Russia's actions in Ukraine.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: The conversation is

Processing Samples:  11%|█         | 91/819 [03:21<24:38,  2.03s/it]


SAMPLE 91/819 | ID: samsum_test_0090
MESSAGE (preview): Ashleigh: Looks like we're going to the cinema!!  Ashleigh: <file_gif> Peter: You got the job?? Ashleigh: I got hte job! :D Peter: <file_gif> Ashleigh: <file_gif>
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Ashleigh just found out she got a new job and is excited to celebrate by going to the cinema.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Ashleigh has just been offered a new job and is excited to celebrate with a trip to the cinema.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Ashleigh just found out she got a new job and is excited to celebrate by going to the cinema.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Ashleigh just found out she got a new job and is excited to celebrate by going to the cinema.


Processing Samples:  12%|█▏        | 101/819 [03:43<25:41,  2.15s/it]


SAMPLE 101/819 | ID: samsum_test_0100
MESSAGE (preview): Chris: Hi there! Where are you? Any chance of skyping? Rick: Hi! Our last two days in Cancun before flying to Havana. Yeah, skyping is an idea. When would it suit you? Rick: We don't have the best of connections in the room but I can get you pretty well in the lobby. Chris: What's the time in your place now? Rick: 6:45 pm Chris: It's a quarter to one in the morning here. Am still in front of the b ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.45s | attempts: 1
  summary: Chris and Rick arrange to Skype each other the next day at 3pm, when they are both in a location with better internet connection.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.38s | attempts: 1
  summary: Chris and Rick plan to Skype each other the next day at 3pm, when they are both likely to have a better internet connection.

AGENT 03: llama3.1:8b
  

Processing Samples:  14%|█▎        | 111/819 [04:02<23:58,  2.03s/it]


SAMPLE 111/819 | ID: samsum_test_0110
MESSAGE (preview): Teacher: Rashi, why are you so low?  Rashi: Ma’am I’m a bit confused about my career.  Teacher: What is your confusion? Rashi: I was discussing with my friends about the career options.  Teacher: Hmm. Rashi: There are too many to choose from. Teacher: Choose a career based on what truly interests you.  Rashi: I have many that interests me. How does it determine the career? Teacher: The passion you ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: A teacher helps a student named Rashi understand how to choose a career by following her passions and balancing duties with success.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.25s | attempts: 1
  summary: A teacher helps a student named Rashi understand how to choose a career that aligns with her interests and passions.

AGENT 03: llama3.1:8b
  statu

Processing Samples:  15%|█▍        | 121/819 [04:26<27:36,  2.37s/it]


SAMPLE 121/819 | ID: samsum_test_0120
MESSAGE (preview): John: wanna go see "A Star is Born" on Wed? Joan: sorry can't Joan: super busy  Joan: don't have time for anything :(  John: that's a shame Joan: I'm free on Thursday John: I could do Thursday Joan: ok! so around 8pm? John: sure sounds great John: I'll see where it's palying and send you the details Joan: ok great!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 04:

Processing Samples:  16%|█▌        | 131/819 [04:45<20:21,  1.77s/it]


SAMPLE 131/819 | ID: samsum_test_0130
MESSAGE (preview): Ryan: You're going to the casting?  Ryan: So you think you can dance 🤩 Jack: I am!  Jack: this time im going Ryan: U should go really Jack: I know, wanna come with me?  Ryan: I thought about it!  Jack: Nice well! I will meet you there! 😝😝😝
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 05: llama3.1:8b
  status: OK | latency: 0

Processing Samples:  17%|█▋        | 141/819 [05:07<22:30,  1.99s/it]


SAMPLE 141/819 | ID: samsum_test_0140
MESSAGE (preview): Sean: I overslept :/ Sam: Again?? Sean: I know.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.72s | attempts: 1
  summary: Sean overslept and Sam is surprised by it.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Sean has overslept and is aware that it's not the first time.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Sean has overslept again, and Sam is surprised by the repetition.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Sean has overslept, and Sam is surprised by his lack of punctuality.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Sean has overslept, and Sam is surprised by this not being the first time it's happened.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: Sean ov

Processing Samples:  18%|█▊        | 151/819 [05:27<21:30,  1.93s/it]


SAMPLE 151/819 | ID: samsum_test_0150
MESSAGE (preview): Mike: will call u back in 2 hrs, ok?  Mike: can't talk right now, sry Mike: *sorry Dale: cool, no problem Dale: until then
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.72s | attempts: 1
  summary: Mike is busy and will call Dale back within two hours.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Mike is unable to talk at the moment but promises to call Dale back within 2 hours.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Mike is unable to talk and will call Dale back within two hours.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Mike is unable to talk at the moment but will call Dale back within two hours.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.66s | attempts: 1
  summary: Mike is busy and will call Dale back within two hours.



Processing Samples:  20%|█▉        | 161/819 [05:46<19:29,  1.78s/it]


SAMPLE 161/819 | ID: samsum_test_0160
MESSAGE (preview): Alexander: Personal request to send me message when you will be in taxi Alexander: If any problem, call me Tom: ;) Tom: Thank You, I appreciate it Alexander: Taxi confirmation below Alexander: <file_photo> Tom: Thank you for the transport, we arrived safely, although without luggages :/ Alexander: Good but bad Tom: Yeeees
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Alexander arranged a taxi for Tom and provided a photo of the confirmation.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Alexander arranged a taxi ride for Tom and confirmed its details.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: Alexander arranged a taxi pickup for Tom and confirmed its arrival.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.88s | attempts: 1
  summary: Alexan

Processing Samples:  21%|██        | 171/819 [06:09<25:46,  2.39s/it]


SAMPLE 171/819 | ID: samsum_test_0170
MESSAGE (preview): Keith: Hi there kiddo, when are you planning to visit you old parents? :) Laura: Hey Dad, I'm not sure yet. I've been pretty busy recenlty.There is this big project coming… Keith: Oh, I understand, all work and no play…XD Laura: Daad! Don't be mean! You know I treat studying seriously! Keith: I know, you take after your mum :) By the way I think she bought some b-day gift for you… Laura: Next Satu ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Laura and her father Keith discuss her upcoming visit to see them and make plans for her to come next Saturday.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.24s | attempts: 1
  summary: Laura and her father Keith discuss her upcoming visit to see them, with Laura agreeing on a date of next Saturday.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.15s 

Processing Samples:  22%|██▏       | 181/819 [06:29<19:12,  1.81s/it]


SAMPLE 181/819 | ID: samsum_test_0180
MESSAGE (preview): Rob: Are we meeting up 2morrow? Eve: How about Sunday Eve: stores are open so we can go then Rob: Ok I just don't know what time they close Eve: we have ot go in the moring  Eve: I have some stuff to do around 3 Rob: ok
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: They are planning to meet up on Sunday morning.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.68s | attempts: 1
  summary: They plan to meet up on Sunday morning.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.70s | attempts: 1
  summary: They are planning to meet up on Sunday morning.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: They are planning to meet up on Sunday, likely in the morning.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.59s | attempts: 1
  summary: They agreed to meet on Sunday morning.

AGE

Processing Samples:  23%|██▎       | 191/819 [06:54<26:31,  2.53s/it]


SAMPLE 191/819 | ID: samsum_test_0190
MESSAGE (preview): Emely: Hey. Could u help me with one task? I have to fill in the gaps and I find it a little bit strange task Titus: How is that strange ? Emely: I have problems to so it haha. Can I send a photo of this task and then I will wrote u my answers? There are 10 sentences Titus: Ok I'm doing something now but I can look at it quickly if it won't take up a lot of my time Emely: Ok Emely: <file_photo> Ti ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.65s | attempts: 1
  summary: Emely asked Titus for help with a task involving filling in gaps, but they had trouble communicating due to issues with sending and viewing a photo of the task.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.65s | attempts: 1
  summary: Emely asked Titus for help with a task involving filling in gaps, but they had trouble communicating due to issues with

Processing Samples:  25%|██▍       | 201/819 [07:17<21:11,  2.06s/it]


SAMPLE 201/819 | ID: samsum_test_0200
MESSAGE (preview): Abdellilah: Where are you? Sam: work Abdellilah: What time you finish? Sam: Not til 5 Abdellilah: Are your bringing him over tonight: Sam: No in the morning: Abdellilah: ok, what time? Sam: About 9. Is that ok? Abdellilah: ok - see you then
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Sam is finishing work at 5 and will bring someone over around 9 the next morning.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place on Wednesday at around 9am.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place around 9am the next morning.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place on Wednesd

Processing Samples:  26%|██▌       | 211/819 [07:43<23:05,  2.28s/it]


SAMPLE 211/819 | ID: samsum_test_0210
MESSAGE (preview): Yaz: Going to slimming club tonight? Mary: Well, I don't want to, but I really should. I've been so bad, though! Yaz: Me too, choccies, wine, cake, you name it! Mary: Well, we should bite the bullet. She'll tell us off, I expect. Feels like being back at school! Yaz: Well, she IS our old cookery teacher! See you at 6ish, pick you up!? Mary: Yep! See you then!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Two friends, Mary and Yaz, are planning to attend a slimming club together tonight.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Two friends, Mary and Yaz, are planning to attend a slimming club together tonight.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Two friends, Mary and Yaz, are planning to attend a slimming club together tonight.

AGE

Processing Samples:  27%|██▋       | 221/819 [08:05<23:44,  2.38s/it]


SAMPLE 221/819 | ID: samsum_test_0220
MESSAGE (preview): Lisa: Hello Peter. What have you been doing off late? Peter: Bit busy with work. Lisa: Too much of work, huh? Peter: Not really, I’ve been working out Lisa:  Ohh, trying to get in shape? Peter: Nah, I don’t have much of weight to lose, just want to improve my health. Lisa:  That’s a good thought. I was also thinking of working out a bit. What do you do? Do you go to a Gym? Peter: Yes, I hold a mem ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: Lisa and Peter discuss their interest in improving their health through exercise, diet, and sleep.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.23s | attempts: 1
  summary: Lisa and Peter discuss their interest in improving their health through exercise, proper eating, and sufficient sleep.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.21s | attempt

Processing Samples:  28%|██▊       | 231/819 [08:27<20:05,  2.05s/it]


SAMPLE 231/819 | ID: samsum_test_0230
MESSAGE (preview): Tiffany: buy me a burger on your way home Railey: ok Tiffany: thx, sis :)
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.74s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey

Processing Samples:  29%|██▉       | 241/819 [08:49<23:32,  2.44s/it]


SAMPLE 241/819 | ID: samsum_test_0240
MESSAGE (preview): Aggie: When do you start work? Peter: at 8  Peter: why? Aggie: no when do you start the new job? Peter: oh on the 6th Aggie: Ok I'll get a babysitter then Peter: lol wanted a free babysitter Aggie: yeah sorry :P
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.47s | attempts: 1
  summary: Aggie was planning to use Peter's work schedule as an excuse to get a free babysitter, but he revealed that his new job doesn't start until the 6th.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Aggie was planning to use Peter's work schedule as an excuse for getting a babysitter, but he had already started working.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: Aggie was planning to use Peter's work schedule as an excuse to get a babysitter, but he started his new job on the 6th.

AGENT 04: ll

Processing Samples:  31%|███       | 251/819 [09:11<20:29,  2.16s/it]


SAMPLE 251/819 | ID: samsum_test_0250
MESSAGE (preview): Shaldona: WE ARE GONNA GET MARRIED ❤️❤️ Shaldona: <file_others> Shaldona: This is our mobile inviation for our wedding. Shaldona: Invitation* Piper: Hey. You haven’t sent me any messages for a few years. Piper: And now you are sending me your wedding invitation  Piper: THROUGH MESSENGER? Shaldona: ..... Shaldona: Well.. Shaldona: I had no enough time to meet everybody and give this in person. Shal ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Shaldona sent a mobile wedding invitation via messenger to Piper after years of no contact.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Shaldona sent a mobile wedding invitation via Messenger to her acquaintance Piper after years of no contact.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: S

Processing Samples:  32%|███▏      | 261/819 [09:32<21:15,  2.29s/it]


SAMPLE 261/819 | ID: samsum_test_0260
MESSAGE (preview): Debbie: Help, I don't know which dress to buy! <file_photo> or <file_photo>? Kelly: The red one! It's beautiful. Denise: It is, but the green one will suit you better. Kelly: Why? Debbie looks good in red. Denise: She does, but in my opinion that dress would look better on someone taller. Deb needs a shorter one. Kelly: Right, I haven't thought about it. Debbie: So the green one? Denise: Definitel ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: Debbie asked for help choosing between two dresses and her friends Kelly and Denise helped her decide on the green one.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Debbie asked for help choosing between two dresses and her friends Kelly and Denise helped her decide on the green one.

AGENT 03: llama3.1:8b
  status: OK | la

Processing Samples:  33%|███▎      | 271/819 [09:50<17:01,  1.86s/it]


SAMPLE 271/819 | ID: samsum_test_0270
MESSAGE (preview): Conrad: I'm outside the house Conrad: I forgot my keys... Rebecca: 💩 Tiffany: I'll be home at 10-11 Rebecca: I'm coming back even later Conrad: Oh no... Conrad: I'll wait in the coffee shop
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Conrad is locked out of the house and will wait in a nearby coffee shop.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Conrad is locked out of the house and will wait at a nearby coffee shop.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Conrad is locked out of his house and will wait at a nearby coffee shop.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Conrad is locked out of his house and will wait at a nearby coffee shop.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.77s | att

Processing Samples:  34%|███▍      | 281/819 [10:11<18:59,  2.12s/it]


SAMPLE 281/819 | ID: samsum_test_0280
MESSAGE (preview): Stan: She replied :-) Dave: She did? Stan: <file_photo> Dave: Lucky you! Stan: I can't believe it! She's my dream come true! Dave: Good luck today! Where are you going to take her? Stan: Pat&Gill's Dave: Good choice. Let me know how it was :-) Stan: I will. Dave: In minute detail :-) Stan: Forget it!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Stan has a date with a woman he considers his "dream come true" and is going to take her to Pat&Gill's.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Stan is excitedly planning a date with someone he considers his "dream come true".

AGENT 03: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Stan is excitedly planning a date with someone he considers his "dream come true".

AGENT 04: llama3.1:8b
  status: OK | latency:

Processing Samples:  36%|███▌      | 291/819 [10:36<20:26,  2.32s/it]


SAMPLE 291/819 | ID: samsum_test_0290
MESSAGE (preview): Vincent: <file_photo> Damian: What happened to your lamp? Vincent: I broke it xD  Vincent: With my bare hand Damian: You didn't do this just to show off did you? Vincent: Hahaha. xD No. Vincent: I was playing with my cat with a ribbon Vincent: And while raising my hand I just hit the lamp and the glass cover broke Damian: Shit happens. You ordered new one yet? Vincent: Yeah. Should be ready to pic ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Vincent accidentally broke his lamp with his bare hand while playing with his cat, but has already ordered a replacement.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Vincent accidentally broke his lamp with his bare hand while playing with his cat, but has already ordered a replacement.

AGENT 03: llama3.1:8b
  status: OK 

Processing Samples:  37%|███▋      | 301/819 [10:58<18:13,  2.11s/it]


SAMPLE 301/819 | ID: samsum_test_0300
MESSAGE (preview): Jerry: Hi sweetie :) Janet: Hi sugar ;) Jerry: I'm coming home Janet: Can't wait ;) Jerry: I should be there in 40 minutes Janet: Ok, I'm waiting for you :) Jerry: How was your day? Janet: Oh, it was ok but my boss is a pain in the ass sometimes Jerry: I know, she can be a bitch :P Janet: Yes she can! ;) Jerry: See you later darling Janet: <3
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Jerry and Janet are exchanging sweet and playful messages as Jerry heads home to Janet after work.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Jerry and Janet are exchanging sweet messages as Jerry heads home from work.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: Jerry and Janet are exchanging affectionate messages as Jerry heads home from work.

AGENT 04: ll

Processing Samples:  38%|███▊      | 311/819 [11:22<21:18,  2.52s/it]


SAMPLE 311/819 | ID: samsum_test_0310
MESSAGE (preview): Sharon: My mum knows Ahmed: Aaaand? Sharon: She’s angry, I’m grounded Ahmed: For fuck’s sake you’re 19 Sharon: But I still live with my parents, let me remind you -_- Ahmed: So move out Sharon: It’s not that easy, hello Ahmed: Move in with me, we would be together all the time :* Sharon: I’d have to go to work, I’m still a student Ahmed: Don’t worry, I’ll take care of you Sharon: My parents will s ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.55s | attempts: 1
  summary: Sharon's parents are angry with her and she's grounded, leading to a disagreement between Sharon and Ahmed over whether she should move out of the house.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.66s | attempts: 1
  summary: Sharon's 19-year-old daughter is grounded by her parents and Ahmed suggests she move in with him, but Sharon refuses due to he

Processing Samples:  39%|███▉      | 321/819 [11:45<18:28,  2.23s/it]


SAMPLE 321/819 | ID: samsum_test_0320
MESSAGE (preview): Andre: i just read the news about the bear attack on the zoo :-( Andre: yikes - i would've never had imagine something like that could happen Megan: that's why I always say you can't keep animals y cages!!! Megan: it's cruel and it's wrong!!! Megan: i'm not surprised the bear reacted that way
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Andre is upset about a recent bear attack at the zoo, and Megan agrees that keeping animals in cages is cruel.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.24s | attempts: 1
  summary: Andre is expressing sadness about a bear attack at the zoo, while Megan believes keeping animals in cages is cruel and contributed to the incident.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summary: Andre expressed shock over a bear attack at the zoo, while Megan 

Processing Samples:  40%|████      | 331/819 [12:05<18:08,  2.23s/it]


SAMPLE 331/819 | ID: samsum_test_0330
MESSAGE (preview): Bella: It's valentine's day!😁😁😁 Aria: For somebody without bf today is kinda miserable day.....😢😢 Bella: There are a lot of vendors selling roses on the street here. Bella: <file_photo> Aria: <file_gif>  Bella: hahahahahahaha!! That looks SO SAD! :'‑(:'‑(:'‑( Aria: I feel like the weather is colder than it really is. How's the weather there? Bella: Here? it's 3 degree today. Wow! Even coffee shops ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.60s | attempts: 1
  summary: The conversation is between two friends, Bella and Aria, on Valentine's Day, discussing their plans and feelings about the holiday while apart from each other.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.54s | attempts: 1
  summary: The conversation is between two friends, Bella and Aria, who are discussing Valentine's Day and their respective plans f

Processing Samples:  42%|████▏     | 341/819 [12:26<17:53,  2.25s/it]


SAMPLE 341/819 | ID: samsum_test_0340
MESSAGE (preview): Joel: Hey, how are you guys doing? How's Coco? Becky: She's actually sitting on my lap purring right now. Trent: confirmed Joel: no way :D pics or didn't happen! Becky: <file_photo> Joel: haha! It took you guys 2 days to win her over. She's usually not that easy. Trent: maybe she remembers us visiting you? Joel: Maybe... oh, man, I'm jealous ;) Is she behaving ok? Becky: She's a total gem. Absolut ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.42s | attempts: 1
  summary: Joel is catching up with Becky and Trent about their cat Coco, who has apparently warmed up to them after a 2-day visit.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.39s | attempts: 1
  summary: Becky and Trent are caring for Joel's cat, Coco, while he is away and everything seems to be going well.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.35s |

Processing Samples:  43%|████▎     | 351/819 [12:47<18:25,  2.36s/it]


SAMPLE 351/819 | ID: samsum_test_0350
MESSAGE (preview): Noah: Hi there! The quince we talked about the other day... Are you still interested? Sophie: Hello Noah, but of course I am.Thank you. Noah: Actually William went to collect the rest of them for you immediately. They're in a basket on our terrace, so you call collect them any time. Sophie: That's very very kind of him! He's really a darling. Sophie: We won't be going your direction any time soon  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 2.11s | attempts: 1
  summary: Noah and Sophie discuss collecting some quinces that William collected from their terrace, and also touch on William's career-focused lifestyle and potential future plans, including marriage and children.

AGENT 02: llama3.1:8b
  status: OK | latency: 2.38s | attempts: 1
  summary: Noah and Sophie discuss the quince fruit that William collected for her, an

Processing Samples:  44%|████▍     | 361/819 [13:09<15:50,  2.07s/it]


SAMPLE 361/819 | ID: samsum_test_0360
MESSAGE (preview): Chandler: Phoebe!! Do you have money??  Phoebe: Yes I have .. But why do you need it.. Chandler: Open your door.. And pay the delivery guy standing outside my door.. Phoebe: Oh chandler !! you idiot... On my way..
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: Chandler needs Phoebe to pay for a delivery that was mistakenly left at his doorstep.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person who is waiting at his door.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Chandler needs Phoebe to pay for a delivery that was mistakenly sent to his apartment instead of hers.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person who is at his d

Processing Samples:  45%|████▌     | 371/819 [13:29<15:08,  2.03s/it]


SAMPLE 371/819 | ID: samsum_test_0370
MESSAGE (preview): Henry: Do you know what happened to cafe Kohaku? Julia: The one near Covent Garden? Henry: Yeah. I wanted to reserve a table, but I can't find them anywhere. Their Facebook page disappeared. Julia: I'm afraid they're closed :( Henry: For good?! Julia: Afraid so. I've been there last week and wanted to take my cousin there, but the place's empty Julia: Probably the rent was too high. Henry: Oh no,  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: The cafe Kohaku near Covent Garden has closed permanently due to high rent costs.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.20s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed permanently due to high rent costs.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed 

Processing Samples:  47%|████▋     | 381/819 [13:53<16:06,  2.21s/it]


SAMPLE 381/819 | ID: samsum_test_0380
MESSAGE (preview): Don: Hi Cindy. Have you made all arrangements? Cindy: It's about today's meeting or your trip next week? Don: Both, I suppose:) Cindy: You have meeting with management board today at 2 pm. Don: Where did you set it up? Cindy: In our conference room. Cindy: Catering will bring some tea, coffee and snacks. Don: That's good. Don: Did everybody got the agenda? Cindy: Yep. Don: How did Andy react when  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.20s | attempts: 1
  summary: Don and Cindy discuss arrangements for Don's meeting with management today and his upcoming trip next week.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Don and Cindy discuss arrangements for Don's meeting with management today and his upcoming trip next week.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.23s | attempts:

Processing Samples:  48%|████▊     | 391/819 [14:15<17:45,  2.49s/it]


SAMPLE 391/819 | ID: samsum_test_0390
MESSAGE (preview): Ost: I bought a garage place in the garage. Vesna: Cool. Ost: I got it cheap. It's my intention to rent it out and do the restitution of my money. Vesna: Well doneeeee!. Ost: There is a big crowd in parking, so the prices for finding garage places are favorable, in this area. Vesna: Really!. Ost: Return on investment is much higher than if I give money under term savings in the Bank. Vesna: You ma ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.42s | attempts: 1
  summary: Ost bought a garage in Belgrade at a low price with the intention of renting it out for a higher return on investment.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.39s | attempts: 1
  summary: Ost bought a garage in Belgrade at a low price with the intention of renting it out for a higher return on investment.

AGENT 03: llama3.1:8b
  status: OK | late

Processing Samples:  49%|████▉     | 401/819 [14:39<17:35,  2.52s/it]


SAMPLE 401/819 | ID: samsum_test_0400
MESSAGE (preview): Jill: so how was your date anyway? :) Susan: it was perfect, he was so sweet!! <3 Jill: tell me everything!! Susan: so first he picked me up from home, all dressed up and everything Jill: suit on? :D Susan: nooo, not like that, but he had a really nice black shirt, elegant shoes, looked like Brad Pitt hahaha Jill: hahahahaha Susan: <file_gif> Jill: so where did he take you? Susan: that's the best  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.08s | attempts: 1
  summary: Susan had a fun date with a man who took her to rollerskating disco.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: Susan had a fun date with a man who took her to rollerskating disco and she's excited about it.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.27s | attempts: 1
  summary: Susan had a fun date with a man who t

Processing Samples:  50%|█████     | 411/819 [14:59<12:19,  1.81s/it]


SAMPLE 411/819 | ID: samsum_test_0410
MESSAGE (preview): Ryan: Merry Christmas everybody! Nick: Merry Xmas! Chris: Merry Xmas to you! Chris: Where are you btw? Ryan: visiting family in Manchester  Ryan: white a disaster hahhaha Chris: We stayed home with Ann, first time on our own! Chris: and I've liked it so far very much
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: The group is exchanging holiday greetings

Processing Samples:  51%|█████▏    | 421/819 [15:23<16:25,  2.48s/it]


SAMPLE 421/819 | ID: samsum_test_0420
MESSAGE (preview): Zack: Dude have you seen the new Jersey Shore season? Dwayne: Didn't even knew a new one came out xD Zack: Ahahah no surprise there. But yeah I watched a couple of episodes and it was hilarious. Dwayne: It's the original cast? Zack: Yep. Snooki, J-Wow, Pauly D, Ron, The Situation, Deena and Vinny. Dwayne: I herad The Situation was having some bad legal problems.. Zack: Yeah and he still is. Dwayne ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.64s | attempts: 1
  summary: Zack and Dwayne discuss the new season of Jersey Shore, with Zack encouraging Dwayne to watch it after expressing his surprise at how The Situation has changed.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.64s | attempts: 1
  summary: Zack and Dwayne discuss the new season of Jersey Shore, with Zack encouraging Dwayne to watch it after expressing his s

Processing Samples:  53%|█████▎    | 431/819 [15:47<16:57,  2.62s/it]


SAMPLE 431/819 | ID: samsum_test_0430
MESSAGE (preview): Nathan: Hey Honey, what are our plans for tomorrow then? Deborah: So I've been thinking that we should pack everything today, we won't have time tomorrow Nathan: Totally true, I started packing my stuff in the bedroom, I also told the girls to start packing too Deborah: If Sofie wants to take the unicorn we won't have much space in the car XD Nathan: No, I explained to her that he'll be fine witho ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.51s | attempts: 1
  summary: The couple is making plans to pack up and leave for a trip with their family, including a hike to the forest.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.45s | attempts: 1
  summary: The couple is making plans to pack and prepare for a family trip to the forest with their children.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.45s | attempts: 1
  s

Processing Samples:  54%|█████▍    | 441/819 [16:10<12:59,  2.06s/it]


SAMPLE 441/819 | ID: samsum_test_0440
MESSAGE (preview): Andrew: Have you got the paper from the office? Paula: Not yet, I'm going there tomorrow. Andrew: You'd better... The deadline for the application is Friday at noon. Paula: Ok, sure! I promise I won't forget it!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Andrew reminds Paula to pick up a paper from the office by Friday at noon.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: Paula needs to retrieve a paper from the office by Friday at noon.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.89s |

Processing Samples:  55%|█████▌    | 451/819 [16:31<13:18,  2.17s/it]


SAMPLE 451/819 | ID: samsum_test_0450
MESSAGE (preview): Grace: i didn't get to talk to you a lot at the party last night Audrey: i know, i really wanted to catch up with you! :-( Audrey: what's going on in your life? Audrey: anything new? Grace: not really, i'm excited about graduating soon Grace: i think i'll apply for grad school but i'm still not sure Audrey: you always have liked school lol Grace: i've also thought of taking a year off Audrey: that ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Grace and Audrey caught up briefly at a party, discussing Grace's plans after graduating from college.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.24s | attempts: 1
  summary: Grace and Audrey caught up over text after missing each other at a party, discussing Grace's plans for graduation and potential future endeavors.

AGENT 03: llama3.1:8b
  status

Processing Samples:  56%|█████▋    | 461/819 [16:54<14:15,  2.39s/it]


SAMPLE 461/819 | ID: samsum_test_0460
MESSAGE (preview): Trinny: What time do we meet? Susannah: 5, 6? Linda: six, please? I need time to get ready after work;) Trinny: 6 is good! Susannah: Ok Susannah: at the main entrance? Trinny: Ok Linda: see you Ladies! Trinny: :*
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.88s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: The three women have agreed to meet at 6pm at the main entrance.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: The three women have agreed to meet at 6pm at the main entrance.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: The

Processing Samples:  58%|█████▊    | 471/819 [17:16<13:03,  2.25s/it]


SAMPLE 471/819 | ID: samsum_test_0470
MESSAGE (preview): Rob: I wanna start Get off the couch challenge! Who's with me? I'm gonna start with 30 min of activity every day. Greg: way ahead of you mate! at least 60 min a day! Rob: good for you! 30 min a day is gonna be challenging for me! Greg: good luck then! Will: don't be too hard on yourself. better to be realistic. fingers crossed! Anna: i'm in! 30 min sounds reasonable!  Greg: look for variety of exe ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Rob has started a "Get off the couch challenge" to do at least 30 minutes of activity every day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.31s | attempts: 1
  summary: Rob is starting a "Get off the couch challenge" where he will aim to do at least 30 minutes of physical activity every day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.31s | atte

Processing Samples:  59%|█████▊    | 481/819 [17:39<13:50,  2.46s/it]


SAMPLE 481/819 | ID: samsum_test_0480
MESSAGE (preview): Drade: Mum he got out of the group chatting room Marenda: I am sure he is angry! Marenda: Why did you act that aggressively? Marenda: You know your brother's temper Drade: I just said what he had said was wrong! Marenda: Apologize and send the invitation to him Drade: Why should I? Drade: I didn't say anything wrong!!!!! Drade: He failed in his business, didn't he? Marenda: But did you have to tal ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: A brother and sister are arguing over Drade's comments about their brother's failed business, which Drade claims were justified.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: A brother and sister are arguing over Drade's harsh comments about their brother's failed business venture.

AGENT 03: llama3.1:8b
  status: OK | laten

Processing Samples:  60%|█████▉    | 491/819 [17:58<09:42,  1.78s/it]


SAMPLE 491/819 | ID: samsum_test_0490
MESSAGE (preview): Matt: Do you have Patrick's phone number? He's not picking up my calls... Matt: Is he mad at me? Mary: Sure, here you go: +44 7700 900077 Susan: He changed his number :) Matt: phew, thanks!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Matt was unable to reach Patrick on the phone and Mary provided him with Patrick's new phone number.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Patrick had changed his phone number and Matt was unable to reach him.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Patrick had changed his phone number and Matt was unable to reach him.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Matt was unable to reach Patrick and was worried that he might be angry with him.

AGENT 05: llama3.1:8b
  

Processing Samples:  61%|██████    | 501/819 [18:21<11:24,  2.15s/it]


SAMPLE 501/819 | ID: samsum_test_0500
MESSAGE (preview): Helen: Hey, Simo, are you there? Simon: Yep babe, what's up? Helen: I was calling you before... Simon: Sorry I was on the phone, I didn't hear you... Tell me. Helen: It's a bit embarrassing... The toilet paper is finished, could you fetch me some tissues, please? Simon: Hahaha sure, no worries!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper because they were out.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper and tissues.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Helen asked Simon to buy her some tissue because they were out of toilet paper.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Helen asked Simon to buy her more 

Processing Samples:  62%|██████▏   | 511/819 [18:43<10:01,  1.95s/it]


SAMPLE 511/819 | ID: samsum_test_0510
MESSAGE (preview): Mum: hungry? Steve: starving... Mum: I'll wrap something up for you. Steve: You're the best, thank you!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Steve is very hungry and his mother offers to prepare food for him.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Steve is hungry and his mum offers to prepare food for him.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Steve is extremely hungry and his mother offers to prepare food for him.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Steve is extremely hungry and his mother offers to prepare food for him.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: Steve is hungry and his mother offers to prepare food for him.

AGENT 06: llama3.1:8

Processing Samples:  64%|██████▎   | 521/819 [19:03<10:07,  2.04s/it]


SAMPLE 521/819 | ID: samsum_test_0520
MESSAGE (preview): Jair: Still busy? Callum: Yes a little sorry Jair: ok
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Jair asked if Callum was still busy and received a brief response indicating that he was.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating they were both busy.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Jair and Callum are both busy, but Jair is okay with that.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating they are both busy.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.88s | attempts: 1
  summary: Jair asked if Callum was still busy and received a brief response indicating that he was.

AGENT 06: llama3.1:8b
 

Processing Samples:  65%|██████▍   | 531/819 [19:24<10:05,  2.10s/it]


SAMPLE 531/819 | ID: samsum_test_0530
MESSAGE (preview): Adam: I have a juicy gossip for you guys ;) Borys: About whom? Adam: So apparently, Iga planned a weekend getaway with her boyfriend, but they had to cancel because he couldn't talk his group into changing the date of the presentation :D Katia: And she is pissed? Adam: What do you think? That is why they aren't sitting together since Monday ;) Katia: Who would have expected such a gossip girl from ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Iga and her boyfriend had to cancel their weekend plans due to his work obligations, causing tension between them.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Iga and her boyfriend had to cancel their weekend plans due to work obligations, causing tension between them.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.07s |

Processing Samples:  66%|██████▌   | 541/819 [19:45<09:57,  2.15s/it]


SAMPLE 541/819 | ID: samsum_test_0540
MESSAGE (preview): Chris: I tried jogging today! Matt: and the key word is "tried"? Matt: not to mention that you have the weirdest habit to stat the conversation 5 minutes before my work ends :P Matt: good (or bad) thing is that I need to stay a bit longer today ;) Chris: Sorry :P Chris: I have no other friends ;) Matt: it's not a problem for me, you're the one getting a short conversation ;) Chris: Anyway, I was c ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: Chris and Matt exchange jokes and teasing about Chris' attempt at jogging after a long break.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.47s | attempts: 1
  summary: Chris and Matt have a lighthearted conversation about Chris trying to jog again after a long break.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.55s | attempts: 1
  summary: Chris an

Processing Samples:  67%|██████▋   | 551/819 [20:04<08:27,  1.89s/it]


SAMPLE 551/819 | ID: samsum_test_0550
MESSAGE (preview): Jeremih: hey, tell your sis to text back Hansel: haha, thats your issues bro, dont drag me into it Jeremih: she's mad at me Hansel: for what Jeremih: i dont even know😔 Hansel:😢😂 Jeremih: youre laughing Hansel: haha, ill tell her but next time i wont interfere Jeremih: Okay bro, thanks
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Jeremih is having a disagreement with his sister and asks Hansel to help mediate.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Jeremih is having a disagreement with his sister and asks Hansel to help mediate the situation.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Jeremih is having a disagreement with his sister and asks Hansel to help mediate the situation.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.11s | att

Processing Samples:  68%|██████▊   | 561/819 [20:26<08:56,  2.08s/it]


SAMPLE 561/819 | ID: samsum_test_0560
MESSAGE (preview): Oscar: A coffee at Tristano's? Payne: Why not. in 15 mins? Oscar: let's make it half an hour ok? Payne: great, i'll be there Oscar: see you there
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Oscar and Payne agreed t

Processing Samples:  70%|██████▉   | 571/819 [20:46<08:17,  2.01s/it]


SAMPLE 571/819 | ID: samsum_test_0570
MESSAGE (preview): Emily: shit, I'm waiting to see my doctor and you won't believe who's waiting with me Emily: <file_gif> Adam: who then? Emily: a nun and a blind crazy man Emily: it feels spooky Adam: haha XD
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual group, including a nun and a blind man.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual group, including a nun and a blind man.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised by the unusual people she has encountered in the waiting room.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 

Processing Samples:  71%|███████   | 581/819 [21:09<09:04,  2.29s/it]


SAMPLE 581/819 | ID: samsum_test_0580
MESSAGE (preview): Ella: did you send me mms?  i can't open it for some reason :/ Caleb: yeah, I sent you a pic of Maya Caleb: she's playing a mouse in a school play and I wanted to show you her costume Caleb: she looks sooo adorable in it! :-) Ella: aww...:) can you send me this photo via email? Caleb: done :-)
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, which Ella couldn't open as an MMS.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.26s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, and they discussed how to share the photo with Ella.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, which Ella couldn't open

Processing Samples:  72%|███████▏  | 591/819 [21:30<07:52,  2.07s/it]


SAMPLE 591/819 | ID: samsum_test_0590
MESSAGE (preview): Ethan: who's going to see solstafir? Noah: solstafir! when? Noah: i had no idea they were playing Archie: I'm not :D. Leo: i'm going ofc! Ethan: 21.11 Noah: that's a wednesday... ehh... Noah: i'll have to see, might have to stay late at work... Leo: nooo you have to come Ethan: yeah you have to! Archie: There are better concerts :P. Noah: i know, i would really like to see them, but yeah Leo: arch ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: The group is discussing attending a Solstafir concert on November 21.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: The group is discussing attending a Solstafir concert on November 21.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Friends are discussing attending a Solstafir concert on Novemb

Processing Samples:  73%|███████▎  | 601/819 [21:51<07:19,  2.02s/it]


SAMPLE 601/819 | ID: samsum_test_0600
MESSAGE (preview): Daniel: I'm packing for the trip, are you done with it? Michael: Mostly done, don't forget your passport! Matt: is an ID not enough? Michael: sure not! Brian: why? Michael: Bosnia and Herzegovina is not in the EU, so we will have a proper border control Michael: no passport, no entry Daniel: good you wrote us this I think :P Matt: yes, we're morons  Daniel: haha, yes, a bit Brian: I didn't even kn ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: The group discusses and confirms that a passport is required for travel to Bosnia and Herzegovina.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: The group's travel plans involve a trip to Bosnia and Herzegovina, which requires a passport for entry.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary:

Processing Samples:  75%|███████▍  | 611/819 [22:12<07:02,  2.03s/it]


SAMPLE 611/819 | ID: samsum_test_0610
MESSAGE (preview): James: im out Tony: ??? James: from the team, this old prick pissed me off Tony: listen, I know the coach is pretty annoying at times (all the time lol) but he really is good James: i dont give a fuck, too much for me, already texted him Tony: shiiiit man
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.76s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 05: llama3.1:8b
  status: OK | la

Processing Samples:  76%|███████▌  | 621/819 [22:34<07:33,  2.29s/it]


SAMPLE 621/819 | ID: samsum_test_0620
MESSAGE (preview): Ken: Hey, how are you doing today? Ellen: Fine. How are you? Ken: Not doing well at all today. Really depressed. Frustrated. Stressed out. Ellen: Oh no! Ken: Really bad day yesterday carrying over to today. Ellen: Can I do anything? Ken: I'll be fine. Just overwhelmed and burned out. Ellen: Sorry! Ken: Sorry, don't mean to bring you down. Ellen: You didn't, just worried about you!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Ken is having a bad day due to feeling depressed, frustrated, stressed, and burned out from the previous day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Ken is having a tough day due to being overwhelmed and burnt out from the previous day's stress.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: Ken is having a bad day due

Processing Samples:  77%|███████▋  | 631/819 [22:58<06:50,  2.18s/it]


SAMPLE 631/819 | ID: samsum_test_0630
MESSAGE (preview): JP: What's new? Samara: Not much. JP: Oh. Samara: Want to play? JP: Nah. Gotta go to work. Samara: K, see ya!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summary: JP and Samara briefly chat about their day, with JP declining an invitation to play from Samara because he has to go to work.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: JP and Samara briefly chat about playing together, but JP has to leave for work instead.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: JP and Samara exchanged brief messages about doing nothing exciting happening and then parted ways.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: JP and Samara briefly chat about their day, but JP has to leave for work.

AGENT 05: llama3.1:8b
  status: OK | lat

Processing Samples:  78%|███████▊  | 641/819 [23:22<06:55,  2.34s/it]


SAMPLE 641/819 | ID: samsum_test_0640
MESSAGE (preview): Lorena: Hi, can you help me with something? Martin: Well, I can try. Martin: Depends what it is? Lorena: I got a new desk and it comes with assembly instructions but I give up, I just can't do it. Lorena: I'm illiterate when it comes to instructions, haha. Lorena: So I could really use some help... Martin: Hmm, I can't today, but how about tomorrow? Shouldn't take long anyway. Lorena: Yeah, it's f ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Lorena needs help assembling her new desk and Martin agrees to assist her the next day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Lorena needs help assembling her new desk and Martin agrees to assist her the next day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: Lorena asked Martin for he

Processing Samples:  79%|███████▉  | 651/819 [23:45<06:26,  2.30s/it]


SAMPLE 651/819 | ID: samsum_test_0650
MESSAGE (preview): Jake: Maaaan! you in london?! Mike: <file_photo> Mike: yeah mate! dreams coming true! Jake: cool! congrats!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Jake and Mike are exchanging congratulations over Mike's dream of visiting London becoming a reality.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Jake and Mike are exchanging congratulations over Mike's trip to London.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Jake and Mike are exchanging congratulations on Mike's trip to London.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Jake and Mike are exchanging congratulations over Mike's dream of visiting London becoming a reality.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Jake a

Processing Samples:  81%|████████  | 661/819 [24:06<05:04,  1.93s/it]


SAMPLE 661/819 | ID: samsum_test_0660
MESSAGE (preview): Miley: don't want to go to work tomorrow! Miley: wanna sleep!!! Miley: need some rest :( Aaron: know what u feelin Aaron: really want some holiday already Miley: :( Aaron: fighting girl!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break, but trying to stay motivated.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.86s 

Processing Samples:  82%|████████▏ | 671/819 [24:28<05:11,  2.10s/it]


SAMPLE 671/819 | ID: samsum_test_0670
MESSAGE (preview): Autumn: u sewn anything? Autumn: show me show me!! :D Autumn: <file_gif> Israel: no :( Israel: haven't had the time yet  ;( Autumn: ooh :(  Autumn: u need to have pririorities in life! Autumn: like me - i haven't checked the tests for a month because the dress wouldn't sew itself :D Israel: you've sewn a dres?? Israel: you didn't show it to me Autumn: oh didn't i? Autumn: there will be some photos ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: Autumn and Israel are discussing Autumn's sewing projects, with Autumn teasing Israel about not having time to sew.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.23s | attempts: 1
  summary: Autumn and Israel are chatting about Autumn's sewing projects, with Autumn teasing Israel for not having time to sew yet.

AGENT 03: llama3.1:8b
  status: OK | late

Processing Samples:  83%|████████▎ | 681/819 [24:52<05:48,  2.52s/it]


SAMPLE 681/819 | ID: samsum_test_0680
MESSAGE (preview): Lesley: Can you please go on a walk with the dog when you get home? Chloe: I don't know when I'm going to be home though. Lesley: What do you mean? Doesn't school finish up at 3PM? Chloe: Yes, but I was going to go to Megan's house tonight. Lesley: Chloe, you didn't ask me if you could go. You cannot just assume that you are going.  Chloe: Mom, seriously. I already told her I was going to come ove ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.49s | attempts: 1
  summary: Lesley asks Chloe to come home from school early to take care of the dog, as no one else will be home for a long time.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.37s | attempts: 1
  summary: Lesley tells Chloe she must come home after school to take care of the dog, as no one else will be there.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.50s | 

Processing Samples:  84%|████████▍ | 691/819 [25:13<04:50,  2.27s/it]


SAMPLE 691/819 | ID: samsum_test_0690
MESSAGE (preview): Mia: God, he keeps talking about the stupid wall Jennifer: I think he's just senile Karine: unfortunately it's much more Karine: this populist strategy to scare people with a minority Karine: and then present himself as the only savior Peter: Karine is right, I think he's not as senile and stupid as we would like to believe Mia: so it's even worse - he's pure evil Peter: possibly yes Karine: I rea ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.65s | attempts: 1
  summary: The group is discussing Donald Trump, with some members labeling him as "pure evil" and others describing him as a populist leader who is intentionally manipulating people for his own gain.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.65s | attempts: 1
  summary: The group is discussing Donald Trump, with some members labeling him as "pure evil" and ot

Processing Samples:  86%|████████▌ | 701/819 [25:37<04:52,  2.48s/it]


SAMPLE 701/819 | ID: samsum_test_0700
MESSAGE (preview): Charlie: How are you doing. Daisy: Good, seeing a lot of beautiful places Charlie: Where are you now? Daisy: in Naples  Charlie: Pity, I hoped you may be still in Sicily. Daisy: No, we took a ferry early in the morning yesterday. Charlie: I am still thinking about the evening we spent together. Daisy: Yes, it was nice. Charlie: I would like to meet you again. Daisy: But where? Charlie: Either in t ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: Charlie and Daisy plan to meet up again in Florence, Italy on September 1st.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.32s | attempts: 1
  summary: Charlie and Daisy, who met in Sicily, plan to meet again in Florence, Italy on September 1st.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.50s | attempts: 1
  summary: Charlie and Daisy make plans to

Processing Samples:  87%|████████▋ | 711/819 [25:58<03:02,  1.69s/it]


SAMPLE 711/819 | ID: samsum_test_0710
MESSAGE (preview): Shanon: Yeah. So what bout him? Victor: He's no longer our teacher :) Oliver: I'm so happy :) Sid: Me too! Shanon: Bt y? Sid: He got fired 2day and thrown out of school? Oliver: Literally thrown out! Shanon: What?! Victor: I suppose he's had it and lost it. Oliver: Or maybe had one too many ;) Sid: Not. Our. Teacher. Anymore!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Their teacher was fired and physically removed from the school.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Their teacher has been fired and physically removed from the school.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Their teacher has been fired and physically removed from the school.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Their teac

Processing Samples:  88%|████████▊ | 721/819 [26:19<03:18,  2.02s/it]


SAMPLE 721/819 | ID: samsum_test_0720
MESSAGE (preview): Greg: Hi Mum, how's the cat doing? Patti: I just rang the vets, she's fine! Greg: Thank God, been worrying about her all day! Patti: They said I can pick her up later. I'll pop home and fetch the cage after work. Should be there at 5ish. Greg: Good, see you at home, bye!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office later that day.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office later that day.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office later that day.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: The cat is fine and will be picked up fr

Processing Samples:  89%|████████▉ | 731/819 [26:41<03:25,  2.33s/it]


SAMPLE 731/819 | ID: samsum_test_0730
MESSAGE (preview): Molly: D‑': Luca: What??? Molly: I can't think anymore today! Luca: LOL! Molly: Seriously! Luca: Deep breath... Molly: It's the silly season, isn't it? Luca: Yep. Just hang in there. Molly: Trying... Luca: Try harder! LOL! Molly: Driving me nuts! Luca: I know, but you can't let it get you down! Molly: Yeah, you're right. Luca: Of course I am! LOL! Molly: Modest, too.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Molly is frustrated with the stress of the holiday season and her friend Luca tries to cheer her up.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Molly is frustrated with the stress of the holiday season and her friend Luca tries to cheer her up.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Molly is frustrated with the holiday season an

Processing Samples:  90%|█████████ | 741/819 [27:03<02:55,  2.25s/it]


SAMPLE 741/819 | ID: samsum_test_0740
MESSAGE (preview): Joyce: Honey, can I take the car? Kyle: No, I'm sorry. Joyce: Why not? How am I supposed to go get Harry? Kyle: I've got to go to a mechanic. The car is making a weird noise. Joyce: Then can you get Harry on your way? Kyle: I don't know, I'm already running late. Joyce: Am I supposed to go get him by bus? That's ridiculous. Kyle: Fine, I'll get him. Joyce: Thank you.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.14s | attempts: 1
  summary: Kyle is unable to take Joyce to pick up Harry because he needs to take the car to a mechanic.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Kyle is unable to let Joyce take the car because he needs it for a mechanic appointment.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Kyle is unable to take Joyce to pick up Harry because he has a ca

Processing Samples:  92%|█████████▏| 751/819 [27:28<03:04,  2.71s/it]


SAMPLE 751/819 | ID: samsum_test_0750
MESSAGE (preview): Bobby: U know what annoys me the most? Mickey: Not being able to watch TV or listen to music without pesky interruptions? Bobby: That too, but the thing that annoys me the most is pharmaceuticals, medicine and drugs. Mickey: Y? Aren't they good 4 ur health? Bobby: They may well be, but not for my wallet! Mickey: What do u mean? Bobby: I was watching a film last night and suddenly there's a bunch o ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.97s | attempts: 1
  summary: Bobby believes that pharmaceutical companies invent illnesses and sell them as medical products to make money, citing an example of a non-existent illness being promoted in a commercial for mints.

AGENT 02: llama3.1:8b
  status: OK | latency: 2.20s | attempts: 1
  summary: Bobby believes that pharmaceutical companies invent illnesses and sell them as medi

Processing Samples:  93%|█████████▎| 761/819 [27:50<02:10,  2.26s/it]


SAMPLE 761/819 | ID: samsum_test_0760
MESSAGE (preview): Pamela: Have you tried applying for the RGS grant? Zoe: I did last year. Zoe: Wasn't lucky... Xiara: Me neither Pamela: Do you know if I have to be a member? Zoe: It depends on the award Pamela: Ok I'll have a look at their website 
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.14s | attempts: 1
  summary: Pamela and Zoe discuss applying for the RGS grant, but both had unsuccessful attempts last year.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.08s | attempts: 1
  summary: Pamela asks about eligibility for the RGS grant and is informed that membership may not be required.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: Pamela is inquiring about eligibility for the RGS grant and seeking advice from Zoe, who has previously applied.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summa

Processing Samples:  94%|█████████▍| 771/819 [28:13<01:52,  2.34s/it]


SAMPLE 771/819 | ID: samsum_test_0770
MESSAGE (preview): Matthew: so is there any after party planed after the wedding? Kristina: after the wedding? I think it will last till the morning... Matthew: haha, i mean on the next day :D Bart: i think Mark said there is something planned Bart: at their house probably Matthew: oh thats good Matthew: better atmosphere and everything, more chill Kristina: that's great, i 've been to an after party like this one K ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.40s | attempts: 1
  summary: The group is discussing an after-party planned for the day after the wedding at Mark's house, which they are all excited to attend.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.39s | attempts: 1
  summary: There is a planned after-party at the groom's house on the day after the wedding, and several people are planning to attend.

AGENT 03: llama3.1:8b


Processing Samples:  95%|█████████▌| 781/819 [28:32<01:16,  2.02s/it]


SAMPLE 781/819 | ID: samsum_test_0780
MESSAGE (preview): Karine: <file_video> Jessie: That was us at kindergarten! Karine: yes!! 
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.65s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from kindergarten.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 06: llam

Processing Samples:  97%|█████████▋| 791/819 [28:55<01:01,  2.20s/it]


SAMPLE 791/819 | ID: samsum_test_0790
MESSAGE (preview): Louis: <file_gif> Archie: Wtf? Lol! Louis: <file_gif> Archie: That one looks like Melanie Louis: True! Hahahahah Archie: Just like in this pic Archie: <file_photo> Louis: Lol! It's her! Archie: By the way, she was way hotter when she had cury hair Louis: I disagree, sir! She looks absolutely gorgeous with straight hair Archie: Oh well, you have the right to an opinion even if it's wrong XD
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Louis and Archie are joking about a photo of Melanie, comparing her appearance with different hairstyles.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Louis and Archie are joking about a photo of Melanie, discussing her appearance with and without curly hair.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Louis and 

Processing Samples:  98%|█████████▊| 801/819 [29:17<00:40,  2.25s/it]


SAMPLE 801/819 | ID: samsum_test_0800
MESSAGE (preview): Linda: Hi Dad, I want to buy flowers for mum! But I don't remember which one she likes :( Michael: Well, she likes all the flowers I believe Linda: That doesn't help! I'm on a flower market right now! Michael: Send me some pics then Linda: <file_photo>  Michael: Tulips are nice, roses too Linda:  What about carnations? Michael: No, carnations are boring :D Linda: Thanks Dad, srsly… Michael:  What  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the options.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.35s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the process by suggesting specific types of flowers.

AGENT 03: llama3.1:8b
 

Processing Samples:  99%|█████████▉| 811/819 [29:36<00:15,  1.97s/it]


SAMPLE 811/819 | ID: samsum_test_0810
MESSAGE (preview): Luke: Ryanair to cancel flights affecting 400,000 more passengers. So unlucky Erin! Erin: Yeeeep...it touched me as well finally 😩😩😩😩 Now they screw all my plans to go to Egypt... Luke: You’re not the only one Erin: Good to know that my feelings are shared with other 399,999 people 😩 Nathan: Relevant news story for everyone to consider: http://waterfordwhispersnews.com/2017/09/18/ryanair-cancel-ef ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Ryanair has cancelled flights affecting over 400,000 passengers.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Ryanair has cancelled flights affecting over 400,000 passengers.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.08s | attempts: 1
  summary: Ryanair has announced it will cancel more flights, affecting around 400,

Processing Samples: 100%|██████████| 819/819 [29:55<00:00,  2.19s/it]


✓ Done. outputs.jsonl: /content/slm_evaluation/results/colab_homo12_llama3.1_8b_1767415147/outputs.jsonl
✓ Done. run_log.txt:  /content/slm_evaluation/results/colab_homo12_llama3.1_8b_1767415147/run_log.txt
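
A finished run can be sanity-checked directly from `outputs.jsonl`. The sketch below is a minimal example, not part of the original notebook: it assumes each line is a JSON object with a `swarm_candidates` list whose entries carry `model`, `summary`, and `latency`, which is the layout the run functions in this notebook write. The helper name `summarize_run` and the demo records (including the model names `model-a` and `model-b`) are made up for illustration.

```python
import json
from collections import defaultdict

def summarize_run(lines):
    """Aggregate call count, error count, and mean latency per model.

    `lines` is any iterable of JSONL strings (an open file works too).
    """
    lat_sum = defaultdict(float)
    n_calls = defaultdict(int)
    n_errors = defaultdict(int)
    n_samples = 0

    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rec = json.loads(line)
        n_samples += 1
        for c in rec.get("swarm_candidates", []):
            m = c.get("model", "unknown")
            n_calls[m] += 1
            lat_sum[m] += float(c.get("latency") or 0.0)
            # The notebook marks failed calls with a summary starting with "ERROR"
            if str(c.get("summary", "")).startswith("ERROR"):
                n_errors[m] += 1

    per_model = {
        m: {
            "calls": n_calls[m],
            "errors": n_errors[m],
            "mean_latency": lat_sum[m] / n_calls[m],
        }
        for m in n_calls
    }
    return n_samples, per_model

# Made-up demo records, only to show the expected shape
demo = [
    json.dumps({"id": "demo_0", "swarm_candidates": [
        {"model": "model-a", "summary": "ok", "latency": 1.0},
        {"model": "model-b", "summary": "ERROR: timeout", "latency": 3.0},
    ]}),
    json.dumps({"id": "demo_1", "swarm_candidates": [
        {"model": "model-a", "summary": "ok", "latency": 2.0},
    ]}),
]
n_samples, stats = summarize_run(demo)
print(n_samples, stats)
```

For a real run, pass the open file instead of the demo list, for example `with open(out_file, encoding="utf-8") as f: summarize_run(f)`.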





12 hetero

In [70]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &

import os, time, json, asyncio
from pathlib import Path

# Larger models from YOUR MODELS list (7B/8B, plus phi3:medium at 14B); only 10 are unique, so the list is backfilled below
HETRO12_MODELS = [
    "llama3.1:8b",
    "qwen2.5:7b-instruct",
    "mistral:7b-instruct",
    "gemma:7b-instruct",
    "openchat:7b",
    "neural-chat:7b",
    "dolphin-mistral:7b",
    "zephyr:7b",
    "orca-mini:7b",
    "phi3:medium",
    "gemma:7b-instruct",   # duplicate of an entry above; dropped by the de-duplication step below
    "qwen2.5:7b-instruct", # duplicate of an entry above; dropped by the de-duplication step below
]

# Remove duplicates while preserving order, then enforce exactly 12 if possible
_seen = set()
HETRO12_MODELS = [m for m in HETRO12_MODELS if not (m in _seen or _seen.add(m))]

# If fewer than 12 unique models remain, backfill from the smaller models in the order listed here (the order is not sorted by size)
if len(HETRO12_MODELS) < 12:
    preferred = [
        "stablelm2:1.6b",
        "tinyllama:1.1b",
        "llama3.2:3b",
        "qwen2.5:3b-instruct",
        "gemma:2b",
        "qwen2.5:1.5b-instruct",
        "llama3.2:1b",
        "phi3:mini",
    ]
    for m in preferred:
        if m not in HETRO12_MODELS:
            HETRO12_MODELS.append(m)
        if len(HETRO12_MODELS) == 12:
            break

assert len(HETRO12_MODELS) == 12, f"Need 12 models, got {len(HETRO12_MODELS)}: {HETRO12_MODELS}"

def _truncate(s: str, n: int = 400):
    if s is None:
        return ""
    s = str(s)
    return s if len(s) <= n else s[:n] + " ...[truncated]"

def _format_candidate_block(c, idx=None):
    header = f"MODEL {idx+1:02d}: {c['model']}" if idx is not None else f"MODEL: {c['model']}"
    lat = f"{c.get('latency', 0.0):.2f}s"
    att = c.get("attempts", 1)
    status = "OK" if not str(c.get("summary","")).startswith("ERROR") else "ERROR"
    return (
        f"{header}\n"
        f"  status: {status} | latency: {lat} | attempts: {att}\n"
        f"  summary: {c.get('summary','')}\n"
    )

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call call_ollama up to `retries` times; return (text, latency, attempts_used)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            # Linear backoff between attempts; no wait after the final one
            await asyncio.sleep(0.5 * k)
    return last_txt, last_lat, retries

async def run_eval_hetero12(
    max_samples=None,
    models=HETRO12_MODELS,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,
):
    import httpx
    from tqdm import tqdm

    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    run_id = f"colab_hetero12_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting Heterogeneous-12 Evaluation: {run_id}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print(f"Models: {len(models)} | concurrency={concurrency} | samples={len(dataset)}")
    print("Models:", models)
    print("-" * 80)

    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            prompt = f"Summarize this message in one short sentence:\n\n{item['message']}\n\nSummary:"

            async def _one(m):
                async with sem:
                    txt, lat, attempts = await call_with_retry(m, prompt, client, options, retries=3)
                    return m, txt, lat, attempts

            results = await asyncio.gather(*[_one(m) for m in models])

            candidates = []
            for (m, txt, lat, attempts) in results:
                candidates.append({
                    "model": m,
                    "summary": txt,
                    "latency": float(lat),
                    "attempts": int(attempts),
                })

            valid_summaries = [
                c["summary"] for c in candidates
                if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
            ]

            if valid_summaries:
                best_idx_valid, meta = get_consensus(valid_summaries)
                swarm_summary = valid_summaries[best_idx_valid]
            else:
                meta = {}
                swarm_summary = "ERROR: All models failed"

            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                sep = "=" * 90
                header = f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}"
                msg_preview = _truncate(item["message"].replace("\n", " "), 400)

                print("\n" + sep)
                print(header)
                print("MESSAGE (preview):", msg_preview)
                print("-" * 90)

                _log_line("\n" + sep)
                _log_line(header)
                _log_line("MESSAGE:")
                _log_line(item["message"])
                _log_line("-" * 90)

                for j, c in enumerate(candidates):
                    c_out = dict(c)
                    if truncate_console and isinstance(c_out["summary"], str):
                        c_out["summary"] = _truncate(c_out["summary"], truncate_console)
                    block = _format_candidate_block(c_out, idx=j)
                    print(block)
                    _log_line(_format_candidate_block(c, idx=j))

                print("-" * 90)
                print("CONSENSUS avg_similarity:", meta.get("avg_similarity"))
                if meta.get("consensus_scores") is not None:
                    print("CONSENSUS scores:", meta.get("consensus_scores"))
                print("SWARM SUMMARY:", swarm_summary)
                print(sep + "\n")

                _log_line("-" * 90)
                _log_line(f"CONSENSUS avg_similarity: {meta.get('avg_similarity')}")
                _log_line(f"CONSENSUS scores: {meta.get('consensus_scores')}")
                _log_line(f"SWARM SUMMARY: {swarm_summary}")
                _log_line(sep)

            result = {
                "id": item.get("id"),
                "swarm_summary": swarm_summary,
                "swarm_candidates": candidates,
                "consensus_metadata": meta,
                "factuality": check_factuality(item["message"], swarm_summary),
                "models": models,
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(result) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# Run
await run_eval_hetero12(
    max_samples=None,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    options=RUN_OPTIONS
)


^C
Starting Heterogeneous-12 Evaluation: colab_hetero12_1767415130
Results saved to: /content/slm_evaluation/results/colab_hetero12_1767415130/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_hetero12_1767415130/run_log.txt
Models: 12 | concurrency=6 | samples=819
Models: ['llama3.1:8b', 'qwen2.5:7b-instruct', 'mistral:7b-instruct', 'gemma:7b-instruct', 'openchat:7b', 'neural-chat:7b', 'dolphin-mistral:7b', 'zephyr:7b', 'orca-mini:7b', 'phi3:medium', 'stablelm2:1.6b', 'tinyllama:1.1b']
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 0/819 [00:13<?, ?it/s]


CancelledError: 
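
The run functions hand the valid summaries to `get_consensus`, which is defined elsewhere in the notebook and is expected to return `(best_idx, meta)`. As a rough illustration of the "most central summary" idea, the sketch below picks the row of an embedding matrix with the highest mean cosine similarity to the other rows. It is not the notebook's implementation: the function name `pick_central` is hypothetical, the toy vectors stand in for real SentenceTransformer embeddings, and the meaning given to the `avg_similarity` and `consensus_scores` keys is an assumption based only on the names the logging code reads.

```python
import numpy as np

def pick_central(embeddings):
    """Return (best_idx, meta) for the row most similar, on average, to the others."""
    X = np.asarray(embeddings, dtype=float)
    if X.ndim != 2 or X.shape[0] == 0:
        raise ValueError("expected a non-empty 2-D array of embeddings")

    n = X.shape[0]
    if n == 1:
        # A single candidate is trivially the consensus
        return 0, {"avg_similarity": 1.0, "consensus_scores": [1.0]}

    # L2-normalise rows so the dot product equals cosine similarity
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.clip(norms, 1e-12, None)
    sim = X @ X.T

    # Mean similarity of each row to every other row (self-similarity excluded)
    scores = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)
    best = int(np.argmax(scores))

    meta = {
        "avg_similarity": float(scores.mean()),       # assumed meaning
        "consensus_scores": scores.round(4).tolist(),  # assumed meaning
    }
    return best, meta

# Toy 2-D "embeddings": the middle vector lies between the other two
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
best_idx, meta = pick_central(toy)
print(best_idx, meta)
```

With real summaries, the rows would come from encoding the candidate strings with a SentenceTransformer model before calling the function. In a homogeneous swarm where most agents return the same sentence, the scores cluster near 1.0 and the choice among identical candidates is arbitrary.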

18 homo

In [72]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &

import os, time, json, asyncio
from pathlib import Path

# Assumes these already exist in your notebook:
# - DATASET_PATH
# - RESULTS_DIR
# - RUN_OPTIONS
# - call_ollama(model, prompt, client, options) -> (txt, latency)
# - ollama_is_up() -> bool
# - get_consensus(list_of_strings) -> (best_idx, meta)
# - check_factuality(message, summary) -> any
# Also requires: httpx, tqdm

def _truncate(s: str, n: int = 400):
    if s is None:
        return ""
    s = str(s)
    return s if len(s) <= n else s[:n] + " ...[truncated]"

def _format_candidate_block(c, idx=None):
    header = f"AGENT {idx+1:02d}: {c['model']}" if idx is not None else f"MODEL: {c['model']}"
    lat = f"{c.get('latency', 0.0):.2f}s"
    att = c.get("attempts", 1)
    status = "OK" if not str(c.get("summary","")).startswith("ERROR") else "ERROR"
    return (
        f"{header}\n"
        f"  status: {status} | latency: {lat} | attempts: {att}\n"
        f"  summary: {c.get('summary','')}\n"
    )

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call call_ollama up to `retries` times; return (text, latency, attempts_used)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            # Linear backoff between attempts; no wait after the final one
            await asyncio.sleep(0.5 * k)
    return last_txt, last_lat, retries

async def run_eval_homo18(
    model="llama3.1:8b",
    max_samples=None,
    agents=18,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,
):
    import httpx
    from tqdm import tqdm

    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    run_id = f"colab_homo18_{model.replace(':','_').replace('/','_')}_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting Homogeneous-18 Evaluation: {run_id}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print(f"Model: {model} | agents={agents} | concurrency={concurrency} | samples={len(dataset)}")
    print("-" * 80)

    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            prompt = f"Summarize this message in one short sentence:\n\n{item['message']}\n\nSummary:"

            async def _one(agent_idx: int):
                async with sem:
                    txt, lat, attempts = await call_with_retry(model, prompt, client, options, retries=3)
                    return agent_idx, txt, lat, attempts

            results = await asyncio.gather(*[_one(i) for i in range(agents)])

            candidates = []
            for (agent_idx, txt, lat, attempts) in results:
                candidates.append({
                    "model": model,
                    "agent_idx": int(agent_idx),
                    "summary": txt,
                    "latency": float(lat),
                    "attempts": int(attempts),
                })

            valid_summaries = [
                c["summary"] for c in candidates
                if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
            ]

            if valid_summaries:
                best_idx_valid, meta = get_consensus(valid_summaries)
                swarm_summary = valid_summaries[best_idx_valid]
            else:
                meta = {}
                swarm_summary = "ERROR: All agents failed"

            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                sep = "=" * 90
                header = f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}"
                msg_preview = _truncate(item["message"].replace("\n", " "), 400)

                print("\n" + sep)
                print(header)
                print("MESSAGE (preview):", msg_preview)
                print("-" * 90)

                _log_line("\n" + sep)
                _log_line(header)
                _log_line("MESSAGE:")
                _log_line(item["message"])
                _log_line("-" * 90)

                for j, c in enumerate(candidates):
                    c_out = dict(c)
                    if truncate_console and isinstance(c_out["summary"], str):
                        c_out["summary"] = _truncate(c_out["summary"], truncate_console)
                    block = _format_candidate_block(c_out, idx=j)
                    print(block)
                    _log_line(_format_candidate_block(c, idx=j))

                print("-" * 90)
                print("CONSENSUS avg_similarity:", meta.get("avg_similarity"))
                if meta.get("consensus_scores") is not None:
                    print("CONSENSUS scores:", meta.get("consensus_scores"))
                print("SWARM SUMMARY:", swarm_summary)
                print(sep + "\n")

                _log_line("-" * 90)
                _log_line(f"CONSENSUS avg_similarity: {meta.get('avg_similarity')}")
                _log_line(f"CONSENSUS scores: {meta.get('consensus_scores')}")
                _log_line(f"SWARM SUMMARY: {swarm_summary}")
                _log_line(sep)

            result = {
                "id": item.get("id"),
                "model": model,
                "agents": int(agents),
                "swarm_summary": swarm_summary,
                "swarm_candidates": candidates,
                "consensus_metadata": meta,
                "factuality": check_factuality(item["message"], swarm_summary),
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(result) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# Run (default: llama3.1:8b, 18 agents)
await run_eval_homo18(
    model="llama3.1:8b",
    max_samples=None,
    agents=18,
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    options=RUN_OPTIONS
)


^C
Starting Homogeneous-18 Evaluation: colab_homo18_llama3.1_8b_1767416982
Results saved to: /content/slm_evaluation/results/colab_homo18_llama3.1_8b_1767416982/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_homo18_llama3.1_8b_1767416982/run_log.txt
Model: llama3.1:8b | agents=18 | concurrency=6 | samples=819
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 1/819 [00:05<1:13:36,  5.40s/it]


SAMPLE 1/819 | ID: samsum_test_0000
MESSAGE (preview): Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him 🙂 Hannah: Urgh.. Alright Hannah: Bye Amanda:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 3.77s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 02: llama3.1:8b
  status: OK | latency: 3.77s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 03: llama3.1:8b
  status: OK | latency: 3.78s | attempts: 1
  summary: Hannah asked Amanda for Be

Processing Samples:   1%|▏         | 11/819 [00:37<41:35,  3.09s/it]


SAMPLE 11/819 | ID: samsum_test_0010
MESSAGE (preview): Wanda: Let's make a party! Gina: Why? Wanda: beacuse. I want some fun! Gina: ok, what do u need? Wanda: 1st I need too make a list Gina: noted and then? Wanda: well, could u take yours father car and go do groceries with me? Gina: don't know if he'll agree Wanda: I know, but u can ask :) Gina: I'll try but theres no promisess Wanda: I know, u r the best! Gina: When u wanna go Wanda: Friday? Gina:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Wanda wants to have a party and asks Gina for help with errands so they can plan together.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Wanda wants to have a party and needs Gina's help with errands before it can happen.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Wanda wants to have a party

Processing Samples:   3%|▎         | 21/819 [01:11<48:33,  3.65s/it]


SAMPLE 21/819 | ID: samsum_test_0020
MESSAGE (preview): Deirdre: Hi Beth, how are you love? Beth: Hi Auntie Deirdre, I'm been meaning to message you, had a favour to ask. Deirdre: Wondered if you had any thought about your Mum's 40th, we've got to do something special! Beth: How about a girls weekend, just mum, me, you and the girls, Kira will have to come back from Uni, of course. Deirdre: Sounds fab! Get your thinking cap on, it's only in 6 weeks! Be ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.82s | attempts: 1
  summary: Beth asks her aunt Deirdre for work experience in a salon and Deirdre agrees to let her try out the beauty therapy side.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.94s | attempts: 1
  summary: Beth wants to try working at her aunt Deirdre's salon as part of her work experience and is interested in pursuing a career in beauty therapy.

AGENT 03: llama3

Processing Samples:   4%|▍         | 31/819 [01:47<46:55,  3.57s/it]


SAMPLE 31/819 | ID: samsum_test_0030
MESSAGE (preview): Mary: hey, im kinda broke, lend me a few box Carter: okay, give me an hour, im at the train station Mary: cool, thanks
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Mary asked Carter to lend her some money and he agreed to meet up with her after an hour.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: Mary asked Carter to lend her money and he agreed to meet up with her.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: Mary asked Carter to lend her money and he agreed to meet up with her.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Mary asked Carter to lend her money and he agreed to meet up with her after an hour.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: Mary asked Carter to lend h

Processing Samples:   5%|▌         | 41/819 [02:17<41:25,  3.19s/it]


SAMPLE 41/819 | ID: samsum_test_0040
MESSAGE (preview): Sebastian: It's been already a year since we moved here. Sebastian: This is totally the best time of my life. Kevin: Really?  Sebastian: Yeah! Totally maaan. Sebastian: During this 1 year I learned more than ever.  Sebastian: I learned how to be resourceful, I'm learning responsibility, and I literally have the power to make my dreams come true. Kevin: It's great to hear that. Kevin: It's great th ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.38s | attempts: 1
  summary: Sebastian is reflecting on the past year of his life, expressing gratitude and satisfaction with his decisions and relationships.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.48s | attempts: 1
  summary: Sebastian reflects on the past year, expressing his happiness and fulfillment with his life, while Kevin expresses admiration and envy of Sebastian's s

Processing Samples:   6%|▌         | 51/819 [02:48<40:36,  3.17s/it]


SAMPLE 51/819 | ID: samsum_test_0050
MESSAGE (preview): Nick: You look absolutely gorgeous and have a lovely smile.  Nick: Would love to get to know you a bit more. How about we meet up for a drink sometime? Jane: Hmmm... You're shooting a bit above your range aren't you? Nick: Why would you think that hon? Jane: Because I'm not that desperate. Nick: That was a bit below the belt. Nick: You're nice but you're not THAT hot. Jane: Oh is your poor little  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.30s | attempts: 1
  summary: The conversation between Nick and Jane quickly turns from a friendly exchange to a heated argument, with both parties exchanging insults before ending abruptly.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.36s | attempts: 1
  summary: A man and a woman exchange flirtatious messages, but she responds with insults when he asks her out, leading him to canc

Processing Samples:   7%|▋         | 61/819 [03:15<33:06,  2.62s/it]


SAMPLE 61/819 | ID: samsum_test_0060
MESSAGE (preview): Salma: <file_video> Salma: the latest cat meme Hugh: oh sweet, I can never get enough of those lol Hugh: <file_photo> Salma: hahaha same
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Salma and Hugh are sharing a funny cat video and photo.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Salma and Hugh are sharing and laughing at a new cat meme.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Salma and Hugh are sharing a cat meme with each other.

AGENT 06: llama3.1:8b
  status: OK | 

Processing Samples:   9%|▊         | 71/819 [03:42<36:05,  2.90s/it]


SAMPLE 71/819 | ID: samsum_test_0070
MESSAGE (preview): Ali: I think I left my wallet at your place yesterday. Could you check?  Mohammad: Give me a sec, I'll have a look around my room. Ali: OK. Mohammad: Found it! Ali: Phew, I don't know what I'd do if it wasn't there. Can you bring it to uni tomorrow? Mohammad: Sure thing.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Ali asked Mohammad to check for his lost wallet and was relieved when Mohammad found it.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Al

Processing Samples:  10%|▉         | 81/819 [04:14<36:30,  2.97s/it]


SAMPLE 81/819 | ID: samsum_test_0080
MESSAGE (preview): Ryan: I have a bad feeling about this Ryan: <file_other> Sebastian: Ukraine... Sebastian: This russian circus will never end... Ryan: I hope the leaders of of nations will react somehow to this shit. Sebastian: I hope so too :(
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: The conversation appears to be a discussion between Ryan and Sebastian about their concerns regarding Russia's actions in Ukraine.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: The conversation is about concerns and fears regarding the ongoing conflict in Ukraine.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: The conversation between Ryan and Sebastian expresses concern and frustration about the ongoing conflict in Ukraine.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.75s | 

Processing Samples:  11%|█         | 91/819 [04:45<35:41,  2.94s/it]


SAMPLE 91/819 | ID: samsum_test_0090
MESSAGE (preview): Ashleigh: Looks like we're going to the cinema!!  Ashleigh: <file_gif> Peter: You got the job?? Ashleigh: I got hte job! :D Peter: <file_gif> Ashleigh: <file_gif>
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Ashleigh just found out she got a new job and is excited to celebrate by going to the cinema.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Ashleigh just found out she got a new job and is excited to celebrate by going to the cinema.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Ashleigh just found out she got a new job and is excited to celebrate by going to the cinema.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Ashleigh got a new job and they're going to celebrate by going to the cinema.

AGENT 05: llama3.

Processing Samples:  12%|█▏        | 101/819 [05:16<35:31,  2.97s/it]


SAMPLE 101/819 | ID: samsum_test_0100
MESSAGE (preview): Chris: Hi there! Where are you? Any chance of skyping? Rick: Hi! Our last two days in Cancun before flying to Havana. Yeah, skyping is an idea. When would it suit you? Rick: We don't have the best of connections in the room but I can get you pretty well in the lobby. Chris: What's the time in your place now? Rick: 6:45 pm Chris: It's a quarter to one in the morning here. Am still in front of the b ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.41s | attempts: 1
  summary: Chris and Rick discuss scheduling a Skype call for the next day at 3pm, when they are in Cuba and have better internet access.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.45s | attempts: 1
  summary: Chris and Rick discuss and arrange a Skype call for the next day at 3pm their local time, while Chris is in a different time zone.

AGENT 03: llama3.1:8b

Processing Samples:  14%|█▎        | 111/819 [05:44<34:15,  2.90s/it]


SAMPLE 111/819 | ID: samsum_test_0110
MESSAGE (preview): Teacher: Rashi, why are you so low?  Rashi: Ma’am I’m a bit confused about my career.  Teacher: What is your confusion? Rashi: I was discussing with my friends about the career options.  Teacher: Hmm. Rashi: There are too many to choose from. Teacher: Choose a career based on what truly interests you.  Rashi: I have many that interests me. How does it determine the career? Teacher: The passion you ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.36s | attempts: 1
  summary: A teacher helps a student named Rashi understand that choosing a career based on personal interest and passion is key to success.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.50s | attempts: 1
  summary: The teacher helps Rashi understand that choosing a career based on personal interest and passion is key to success, while also learning to balance dut

Processing Samples:  15%|█▍        | 121/819 [06:17<38:47,  3.33s/it]


SAMPLE 121/819 | ID: samsum_test_0120
MESSAGE (preview): John: wanna go see "A Star is Born" on Wed? Joan: sorry can't Joan: super busy  Joan: don't have time for anything :(  John: that's a shame Joan: I'm free on Thursday John: I could do Thursday Joan: ok! so around 8pm? John: sure sounds great John: I'll see where it's palying and send you the details Joan: ok great!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: John and Joan rescheduled their plans to see "A Star is Born" from Wednesday to Thursday at 8pm.

AGENT 04:

Processing Samples:  16%|█▌        | 131/819 [06:44<28:56,  2.52s/it]


SAMPLE 131/819 | ID: samsum_test_0130
MESSAGE (preview): Ryan: You're going to the casting?  Ryan: So you think you can dance 🤩 Jack: I am!  Jack: this time im going Ryan: U should go really Jack: I know, wanna come with me?  Ryan: I thought about it!  Jack: Nice well! I will meet you there! 😝😝😝
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: Ryan and Jack are planning to attend a dance casting together.

AGENT 05: llama3.1:8b
  status: OK | latency: 0

Processing Samples:  17%|█▋        | 141/819 [07:15<32:53,  2.91s/it]


SAMPLE 141/819 | ID: samsum_test_0140
MESSAGE (preview): Sean: I overslept :/ Sam: Again?? Sean: I know.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.65s | attempts: 1
  summary: Sean overslept and Sam is surprised by it.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Sean has overslept, and Sam is surprised by his lack of punctuality.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Sean has overslept, and Sam is surprised it's happened again.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Sean has overslept for the second time, according to Sam.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Sean has overslept, and Sam is surprised by the repetition of this occurrence.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Sean has overslept, and S

Processing Samples:  18%|█▊        | 151/819 [07:44<30:29,  2.74s/it]


SAMPLE 151/819 | ID: samsum_test_0150
MESSAGE (preview): Mike: will call u back in 2 hrs, ok?  Mike: can't talk right now, sry Mike: *sorry Dale: cool, no problem Dale: until then
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Mike is unable to talk at the moment but will call Dale back in two hours.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Mike and Dale are arranging for Mike to call Dale back later.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.74s | attempts: 1
  summary: Mike is busy and will call Dale back within two hours.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Mike is busy and will call Dale back within 2 hours.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: Mike is unable to talk and will call Dale back in two hours.

AGENT 06: llama3.1:8b
  status: 

Processing Samples:  20%|█▉        | 161/819 [08:12<27:55,  2.55s/it]


SAMPLE 161/819 | ID: samsum_test_0160
MESSAGE (preview): Alexander: Personal request to send me message when you will be in taxi Alexander: If any problem, call me Tom: ;) Tom: Thank You, I appreciate it Alexander: Taxi confirmation below Alexander: <file_photo> Tom: Thank you for the transport, we arrived safely, although without luggages :/ Alexander: Good but bad Tom: Yeeees
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Alexander arranged transportation for Tom and received a confirmation that they arrived safely, albeit with some issues.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.72s | attempts: 1
  summary: Alexander arranged a taxi ride for Tom and confirmed its details.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: Alexander arranged transportation for Tom and confirmed their arrival.

AGENT 04: llama3.1:8b
  status: OK 

Processing Samples:  21%|██        | 171/819 [08:44<36:10,  3.35s/it]


SAMPLE 171/819 | ID: samsum_test_0170
MESSAGE (preview): Keith: Hi there kiddo, when are you planning to visit you old parents? :) Laura: Hey Dad, I'm not sure yet. I've been pretty busy recenlty.There is this big project coming… Keith: Oh, I understand, all work and no play…XD Laura: Daad! Don't be mean! You know I treat studying seriously! Keith: I know, you take after your mum :) By the way I think she bought some b-day gift for you… Laura: Next Satu ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Laura and her father Keith discuss her upcoming visit to see them and make plans for her to come next Saturday.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Laura and her father, Keith, discuss her upcoming visit to see him and her mother.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Laura a

Processing Samples:  22%|██▏       | 181/819 [09:12<27:53,  2.62s/it]


SAMPLE 181/819 | ID: samsum_test_0180
MESSAGE (preview): Rob: Are we meeting up 2morrow? Eve: How about Sunday Eve: stores are open so we can go then Rob: Ok I just don't know what time they close Eve: we have ot go in the moring  Eve: I have some stuff to do around 3 Rob: ok
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.66s | attempts: 1
  summary: They agreed to meet up on Sunday morning.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: They are planning to meet up on Sunday morning.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.57s | attempts: 1
  summary: They agreed to meet on Sunday morning.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: They are planning to meet up on Sunday morning.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: They are planning to meet up on Sunday morning.

AGENT 06: llama3

Processing Samples:  23%|██▎       | 191/819 [09:48<36:45,  3.51s/it]


SAMPLE 191/819 | ID: samsum_test_0190
MESSAGE (preview): Emely: Hey. Could u help me with one task? I have to fill in the gaps and I find it a little bit strange task Titus: How is that strange ? Emely: I have problems to so it haha. Can I send a photo of this task and then I will wrote u my answers? There are 10 sentences Titus: Ok I'm doing something now but I can look at it quickly if it won't take up a lot of my time Emely: Ok Emely: <file_photo> Ti ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.52s | attempts: 1
  summary: Emely asked Titus for help with a task involving filling in gaps, but they had trouble communicating due to issues with sending and viewing a photo.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.52s | attempts: 1
  summary: Emely asked Titus for help with a task involving filling in gaps, but they had trouble communicating due to issues viewing a photo 

Processing Samples:  25%|██▍       | 201/819 [10:21<29:36,  2.87s/it]


SAMPLE 201/819 | ID: samsum_test_0200
MESSAGE (preview): Abdellilah: Where are you? Sam: work Abdellilah: What time you finish? Sam: Not til 5 Abdellilah: Are your bringing him over tonight: Sam: No in the morning: Abdellilah: ok, what time? Sam: About 9. Is that ok? Abdellilah: ok - see you then
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place around 9am the next morning.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Sam will bring someone over around 9am the next morning.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place on Saturday morning around 9 am.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Sam and Abdellilah are making plans for Sam to bring someone over on Saturday mo

Processing Samples:  26%|██▌       | 211/819 [10:58<32:51,  3.24s/it]


SAMPLE 211/819 | ID: samsum_test_0210
MESSAGE (preview): Yaz: Going to slimming club tonight? Mary: Well, I don't want to, but I really should. I've been so bad, though! Yaz: Me too, choccies, wine, cake, you name it! Mary: Well, we should bite the bullet. She'll tell us off, I expect. Feels like being back at school! Yaz: Well, she IS our old cookery teacher! See you at 6ish, pick you up!? Mary: Yep! See you then!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Two friends, Mary and Yaz, are planning to attend a slimming club together tonight.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Two friends, Mary and Yaz, are planning to attend a slimming club together tonight.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Two friends, Yaz and Mary, are planning to attend a slimming club together tonight.

AGE

Processing Samples:  27%|██▋       | 221/819 [11:28<32:01,  3.21s/it]


SAMPLE 221/819 | ID: samsum_test_0220
MESSAGE (preview): Lisa: Hello Peter. What have you been doing off late? Peter: Bit busy with work. Lisa: Too much of work, huh? Peter: Not really, I’ve been working out Lisa:  Ohh, trying to get in shape? Peter: Nah, I don’t have much of weight to lose, just want to improve my health. Lisa:  That’s a good thought. I was also thinking of working out a bit. What do you do? Do you go to a Gym? Peter: Yes, I hold a mem ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Lisa and Peter discuss their interest in improving their health through exercise, diet, and sleep habits.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.32s | attempts: 1
  summary: Lisa and Peter discuss their interest in improving their health through exercise, with Peter sharing his gym routine and Lisa considering making changes to her diet and sleep

Processing Samples:  28%|██▊       | 231/819 [11:59<28:48,  2.94s/it]


SAMPLE 231/819 | ID: samsum_test_0230
MESSAGE (preview): Tiffany: buy me a burger on your way home Railey: ok Tiffany: thx, sis :)
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.75s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.69s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Tiffany asked Railey to buy her a burger on his way home.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Tiffany asked Railey

Processing Samples:  29%|██▉       | 241/819 [12:30<32:31,  3.38s/it]


SAMPLE 241/819 | ID: samsum_test_0240
MESSAGE (preview): Aggie: When do you start work? Peter: at 8  Peter: why? Aggie: no when do you start the new job? Peter: oh on the 6th Aggie: Ok I'll get a babysitter then Peter: lol wanted a free babysitter Aggie: yeah sorry :P
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.44s | attempts: 1
  summary: Aggie was planning to use Peter's work schedule as an excuse for getting a babysitter, but he revealed that his new job doesn't start until the 6th.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.50s | attempts: 1
  summary: Aggie was planning to use Peter's upcoming work schedule as an excuse to get a free babysitter, but he revealed that his new job doesn't start until the 6th.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.97s | attempts: 1
  summary: Aggie was planning to use Peter's upcoming work schedule as an excuse to get a free babysitter.

AG

Processing Samples:  31%|███       | 251/819 [13:00<27:34,  2.91s/it]


SAMPLE 251/819 | ID: samsum_test_0250
MESSAGE (preview): Shaldona: WE ARE GONNA GET MARRIED ❤️❤️ Shaldona: <file_others> Shaldona: This is our mobile inviation for our wedding. Shaldona: Invitation* Piper: Hey. You haven’t sent me any messages for a few years. Piper: And now you are sending me your wedding invitation  Piper: THROUGH MESSENGER? Shaldona: ..... Shaldona: Well.. Shaldona: I had no enough time to meet everybody and give this in person. Shal ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.08s | attempts: 1
  summary: Shaldona sent a mobile wedding invitation to Piper through Messenger after years of not contacting her.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Shaldona sent a mobile wedding invitation via Messenger to Piper after years of no contact.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Shaldo

Processing Samples:  32%|███▏      | 261/819 [13:30<30:02,  3.23s/it]


SAMPLE 261/819 | ID: samsum_test_0260
MESSAGE (preview): Debbie: Help, I don't know which dress to buy! <file_photo> or <file_photo>? Kelly: The red one! It's beautiful. Denise: It is, but the green one will suit you better. Kelly: Why? Debbie looks good in red. Denise: She does, but in my opinion that dress would look better on someone taller. Deb needs a shorter one. Kelly: Right, I haven't thought about it. Debbie: So the green one? Denise: Definitel ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.30s | attempts: 1
  summary: Debbie asked for advice on which dress to buy and her friends helped her decide that the green one would be a better fit.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.22s | attempts: 1
  summary: Debbie asked for help choosing between two dresses and her friends Kelly and Denise helped her decide on the green one.

AGENT 03: llama3.1:8b
  status: OK | 

Processing Samples:  33%|███▎      | 271/819 [13:56<24:03,  2.63s/it]


SAMPLE 271/819 | ID: samsum_test_0270
MESSAGE (preview): Conrad: I'm outside the house Conrad: I forgot my keys... Rebecca: 💩 Tiffany: I'll be home at 10-11 Rebecca: I'm coming back even later Conrad: Oh no... Conrad: I'll wait in the coffee shop
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: Conrad is locked out of the house and will wait in a nearby coffee shop.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Conrad is locked out of the house and will wait at a nearby coffee shop.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Conrad is locked out of the house and will wait in a nearby coffee shop.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Conrad is locked out of the house and will wait in a nearby coffee shop.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.90s | att

Processing Samples:  34%|███▍      | 281/819 [14:24<25:40,  2.86s/it]


SAMPLE 281/819 | ID: samsum_test_0280
MESSAGE (preview): Stan: She replied :-) Dave: She did? Stan: <file_photo> Dave: Lucky you! Stan: I can't believe it! She's my dream come true! Dave: Good luck today! Where are you going to take her? Stan: Pat&Gill's Dave: Good choice. Let me know how it was :-) Stan: I will. Dave: In minute detail :-) Stan: Forget it!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Stan is excitedly planning a date with someone he has a crush on, and his friend Dave is offering words of encouragement and advice.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Stan is excitedly planning a date with someone he considers his "dream come true" and has asked Dave for advice on the outing.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Stan is excitedly planning a date with someone he consid

Processing Samples:  36%|███▌      | 291/819 [14:58<28:06,  3.19s/it]


SAMPLE 291/819 | ID: samsum_test_0290
MESSAGE (preview): Vincent: <file_photo> Damian: What happened to your lamp? Vincent: I broke it xD  Vincent: With my bare hand Damian: You didn't do this just to show off did you? Vincent: Hahaha. xD No. Vincent: I was playing with my cat with a ribbon Vincent: And while raising my hand I just hit the lamp and the glass cover broke Damian: Shit happens. You ordered new one yet? Vincent: Yeah. Should be ready to pic ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Vincent accidentally broke his lamp with his bare hand while playing with his cat, but has already ordered a replacement.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.07s | attempts: 1
  summary: Vincent accidentally broke his lamp with his bare hand while playing with his cat, but has already ordered a replacement.

AGENT 03: llama3.1:8b
  status: OK 

Processing Samples:  37%|███▋      | 301/819 [15:29<26:11,  3.03s/it]


SAMPLE 301/819 | ID: samsum_test_0300
MESSAGE (preview): Jerry: Hi sweetie :) Janet: Hi sugar ;) Jerry: I'm coming home Janet: Can't wait ;) Jerry: I should be there in 40 minutes Janet: Ok, I'm waiting for you :) Jerry: How was your day? Janet: Oh, it was ok but my boss is a pain in the ass sometimes Jerry: I know, she can be a bitch :P Janet: Yes she can! ;) Jerry: See you later darling Janet: <3
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Jerry and Janet are exchanging affectionate messages while Jerry is on his way home from work.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Jerry and Janet are exchanging affectionate messages while Jerry is on his way home from work.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Jerry and Janet are exchanging excited messages as Jerry heads home after work.

AG

Processing Samples:  38%|███▊      | 311/819 [16:05<30:44,  3.63s/it]


SAMPLE 311/819 | ID: samsum_test_0310
MESSAGE (preview): Sharon: My mum knows Ahmed: Aaaand? Sharon: She’s angry, I’m grounded Ahmed: For fuck’s sake you’re 19 Sharon: But I still live with my parents, let me remind you -_- Ahmed: So move out Sharon: It’s not that easy, hello Ahmed: Move in with me, we would be together all the time :* Sharon: I’d have to go to work, I’m still a student Ahmed: Don’t worry, I’ll take care of you Sharon: My parents will s ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.44s | attempts: 1
  summary: Sharon's parents are angry with her and she's grounded, leading to a disagreement between Sharon and Ahmed over her living situation.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.65s | attempts: 1
  summary: Sharon's parents are angry with her and she's grounded, leading to a disagreement between Sharon and Ahmed over whether she should move out of her

Processing Samples:  39%|███▉      | 321/819 [16:37<26:20,  3.17s/it]


SAMPLE 321/819 | ID: samsum_test_0320
MESSAGE (preview): Andre: i just read the news about the bear attack on the zoo :-( Andre: yikes - i would've never had imagine something like that could happen Megan: that's why I always say you can't keep animals y cages!!! Megan: it's cruel and it's wrong!!! Megan: i'm not surprised the bear reacted that way
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.11s | attempts: 1
  summary: Andre is upset about a recent bear attack at the zoo, and Megan agrees that keeping animals in cages is cruel.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: Andre is upset about a recent bear attack at a zoo, and Megan agrees that keeping animals in cages is cruel.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Andre is upset about a bear attack at the zoo, and Megan agrees that keeping animals in cages is cruel.


Processing Samples:  40%|████      | 331/819 [17:06<24:41,  3.04s/it]


SAMPLE 331/819 | ID: samsum_test_0330
MESSAGE (preview): Bella: It's valentine's day!😁😁😁 Aria: For somebody without bf today is kinda miserable day.....😢😢 Bella: There are a lot of vendors selling roses on the street here. Bella: <file_photo> Aria: <file_gif>  Bella: hahahahahahaha!! That looks SO SAD! :'‑(:'‑(:'‑( Aria: I feel like the weather is colder than it really is. How's the weather there? Bella: Here? it's 3 degree today. Wow! Even coffee shops ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.69s | attempts: 1
  summary: The conversation is between two friends, Bella and Aria, who are discussing Valentine's Day and their plans for the day, with Aria feeling down about being single on this holiday.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.43s | attempts: 1
  summary: Two friends, Bella and Aria, exchange messages on Valentine's Day, discussing their plans and feelin

Processing Samples:  42%|████▏     | 341/819 [17:36<25:43,  3.23s/it]


SAMPLE 341/819 | ID: samsum_test_0340
MESSAGE (preview): Joel: Hey, how are you guys doing? How's Coco? Becky: She's actually sitting on my lap purring right now. Trent: confirmed Joel: no way :D pics or didn't happen! Becky: <file_photo> Joel: haha! It took you guys 2 days to win her over. She's usually not that easy. Trent: maybe she remembers us visiting you? Joel: Maybe... oh, man, I'm jealous ;) Is she behaving ok? Becky: She's a total gem. Absolut ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.48s | attempts: 1
  summary: Becky and Trent are caring for Joel's cat, Coco, while he is away, and everything seems to be going well.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.46s | attempts: 1
  summary: Becky and Trent are caring for Joel's cat, Coco, while he is away and everything seems to be going well.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.43s | attempts: 1
  

Processing Samples:  43%|████▎     | 351/819 [18:05<25:11,  3.23s/it]


SAMPLE 351/819 | ID: samsum_test_0350
MESSAGE (preview): Noah: Hi there! The quince we talked about the other day... Are you still interested? Sophie: Hello Noah, but of course I am.Thank you. Noah: Actually William went to collect the rest of them for you immediately. They're in a basket on our terrace, so you call collect them any time. Sophie: That's very very kind of him! He's really a darling. Sophie: We won't be going your direction any time soon  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 2.35s | attempts: 1
  summary: Noah and Sophie discuss collecting some quinces that William collected for her, and also touch on Noah's concerns about his son William's career-focused life and potential lack of interest in marriage or family.

AGENT 02: llama3.1:8b
  status: OK | latency: 2.75s | attempts: 1
  summary: Noah and Sophie discuss the quinces they had previously talked about

Processing Samples:  44%|████▍     | 361/819 [18:35<22:15,  2.92s/it]


SAMPLE 361/819 | ID: samsum_test_0360
MESSAGE (preview): Chandler: Phoebe!! Do you have money??  Phoebe: Yes I have .. But why do you need it.. Chandler: Open your door.. And pay the delivery guy standing outside my door.. Phoebe: Oh chandler !! you idiot... On my way..
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.83s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person who is at his door.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.92s | attempts: 1
  summary: Chandler needs Phoebe to pay for a delivery that was mistakenly left at his door.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.84s | attempts: 1
  summary: Chandler asks Phoebe to pay a delivery person at his door.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: Chandler needs Phoebe to pay a delivery person who is waiting at his door.

AGENT 05: llama3.1:8b
  status: OK | laten

Processing Samples:  45%|████▌     | 371/819 [19:03<20:20,  2.72s/it]


SAMPLE 371/819 | ID: samsum_test_0370
MESSAGE (preview): Henry: Do you know what happened to cafe Kohaku? Julia: The one near Covent Garden? Henry: Yeah. I wanted to reserve a table, but I can't find them anywhere. Their Facebook page disappeared. Julia: I'm afraid they're closed :( Henry: For good?! Julia: Afraid so. I've been there last week and wanted to take my cousin there, but the place's empty Julia: Probably the rent was too high. Henry: Oh no,  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.16s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed permanently due to high rent costs.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed down permanently due to high rent costs.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Cafe Kohaku near Covent Garden has closed

Processing Samples:  47%|████▋     | 381/819 [19:37<23:29,  3.22s/it]


SAMPLE 381/819 | ID: samsum_test_0380
MESSAGE (preview): Don: Hi Cindy. Have you made all arrangements? Cindy: It's about today's meeting or your trip next week? Don: Both, I suppose:) Cindy: You have meeting with management board today at 2 pm. Don: Where did you set it up? Cindy: In our conference room. Cindy: Catering will bring some tea, coffee and snacks. Don: That's good. Don: Did everybody got the agenda? Cindy: Yep. Don: How did Andy react when  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.60s | attempts: 1
  summary: Don and Cindy discuss arrangements for Don's meeting with management today and his trip next week, confirming details such as the venue, catering, agenda, travel plans, and accommodations.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.76s | attempts: 1
  summary: Don and Cindy discuss arrangements for Don's meeting with management today and his trip nex

Processing Samples:  48%|████▊     | 391/819 [20:09<24:53,  3.49s/it]


SAMPLE 391/819 | ID: samsum_test_0390
MESSAGE (preview): Ost: I bought a garage place in the garage. Vesna: Cool. Ost: I got it cheap. It's my intention to rent it out and do the restitution of my money. Vesna: Well doneeeee!. Ost: There is a big crowd in parking, so the prices for finding garage places are favorable, in this area. Vesna: Really!. Ost: Return on investment is much higher than if I give money under term savings in the Bank. Vesna: You ma ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.40s | attempts: 1
  summary: Ost bought a garage in Belgrade at a low price with the intention of renting it out for a higher return on investment.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.25s | attempts: 1
  summary: Ost bought a garage in Belgrade and plans to rent it out due to high demand and low interest rates.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.40s | attemp

Processing Samples:  49%|████▉     | 401/819 [20:42<24:08,  3.47s/it]


SAMPLE 401/819 | ID: samsum_test_0400
MESSAGE (preview): Jill: so how was your date anyway? :) Susan: it was perfect, he was so sweet!! <3 Jill: tell me everything!! Susan: so first he picked me up from home, all dressed up and everything Jill: suit on? :D Susan: nooo, not like that, but he had a really nice black shirt, elegant shoes, looked like Brad Pitt hahaha Jill: hahahahaha Susan: <file_gif> Jill: so where did he take you? Susan: that's the best  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Susan had a fun date with a man who took her rollerskating at a disco.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Susan had a fun date with a man who took her to a rollerskating disco.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.24s | attempts: 1
  summary: Susan had a fun date with a man who took her to rollerskatin

Processing Samples:  50%|█████     | 411/819 [21:12<18:03,  2.65s/it]


SAMPLE 411/819 | ID: samsum_test_0410
MESSAGE (preview): Ryan: Merry Christmas everybody! Nick: Merry Xmas! Chris: Merry Xmas to you! Chris: Where are you btw? Ryan: visiting family in Manchester  Ryan: white a disaster hahhaha Chris: We stayed home with Ann, first time on our own! Chris: and I've liked it so far very much
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: The group is exchanging holiday greetings and sharing their Christmas plans.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: The group is exchanging holiday greetings and catching up on their Christmas plans.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.71s | attempts: 1
  summary: The group is exchanging holiday gr

Processing Samples:  51%|█████▏    | 421/819 [21:44<22:50,  3.44s/it]


SAMPLE 421/819 | ID: samsum_test_0420
MESSAGE (preview): Zack: Dude have you seen the new Jersey Shore season? Dwayne: Didn't even knew a new one came out xD Zack: Ahahah no surprise there. But yeah I watched a couple of episodes and it was hilarious. Dwayne: It's the original cast? Zack: Yep. Snooki, J-Wow, Pauly D, Ron, The Situation, Deena and Vinny. Dwayne: I herad The Situation was having some bad legal problems.. Zack: Yeah and he still is. Dwayne ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.58s | attempts: 1
  summary: Zack and Dwayne discuss the new season of Jersey Shore, with Zack encouraging Dwayne to watch it after expressing his own enjoyment of the show.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.57s | attempts: 1
  summary: Zack and Dwayne discuss the new season of Jersey Shore, with Zack encouraging Dwayne to watch it after expressing his own enjoyment of 

Processing Samples:  53%|█████▎    | 431/819 [22:17<23:31,  3.64s/it]


SAMPLE 431/819 | ID: samsum_test_0430
MESSAGE (preview): Nathan: Hey Honey, what are our plans for tomorrow then? Deborah: So I've been thinking that we should pack everything today, we won't have time tomorrow Nathan: Totally true, I started packing my stuff in the bedroom, I also told the girls to start packing too Deborah: If Sofie wants to take the unicorn we won't have much space in the car XD Nathan: No, I explained to her that he'll be fine witho ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.46s | attempts: 1
  summary: The couple is discussing and making plans for a trip with their family to the forest, including packing, transportation, and activities.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.25s | attempts: 1
  summary: The couple is making plans to pack and prepare for a family trip tomorrow.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.65s | attempts: 1
 

Processing Samples:  54%|█████▍    | 441/819 [22:49<18:15,  2.90s/it]


SAMPLE 441/819 | ID: samsum_test_0440
MESSAGE (preview): Andrew: Have you got the paper from the office? Paula: Not yet, I'm going there tomorrow. Andrew: You'd better... The deadline for the application is Friday at noon. Paula: Ok, sure! I promise I won't forget it!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.87s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Andrew reminds Paula that she needs to pick up a paper from the office by Friday at noon.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.81s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.88s | attempts: 1
  summary: Paula needs to pick up a paper from the office by Friday at noon.

AGENT 05: llama3.1:8b
  status: OK | la

Processing Samples:  55%|█████▌    | 451/819 [23:18<18:06,  2.95s/it]


SAMPLE 451/819 | ID: samsum_test_0450
MESSAGE (preview): Grace: i didn't get to talk to you a lot at the party last night Audrey: i know, i really wanted to catch up with you! :-( Audrey: what's going on in your life? Audrey: anything new? Grace: not really, i'm excited about graduating soon Grace: i think i'll apply for grad school but i'm still not sure Audrey: you always have liked school lol Grace: i've also thought of taking a year off Audrey: that ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Grace and Audrey caught up briefly at a party, discussing Grace's plans after graduating from college.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Grace and Audrey caught up briefly at a party, discussing Grace's plans after graduation.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Grace and

Processing Samples:  56%|█████▋    | 461/819 [23:51<19:39,  3.29s/it]


SAMPLE 461/819 | ID: samsum_test_0460
MESSAGE (preview): Trinny: What time do we meet? Susannah: 5, 6? Linda: six, please? I need time to get ready after work;) Trinny: 6 is good! Susannah: Ok Susannah: at the main entrance? Trinny: Ok Linda: see you Ladies! Trinny: :*
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.91s | attempts: 1
  summary: The three women have agreed to meet at 6pm at the main entrance.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: The group has agreed to meet at 6pm at the main entrance.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: The group 

Processing Samples:  58%|█████▊    | 471/819 [24:23<18:44,  3.23s/it]


SAMPLE 471/819 | ID: samsum_test_0470
MESSAGE (preview): Rob: I wanna start Get off the couch challenge! Who's with me? I'm gonna start with 30 min of activity every day. Greg: way ahead of you mate! at least 60 min a day! Rob: good for you! 30 min a day is gonna be challenging for me! Greg: good luck then! Will: don't be too hard on yourself. better to be realistic. fingers crossed! Anna: i'm in! 30 min sounds reasonable!  Greg: look for variety of exe ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.33s | attempts: 1
  summary: Rob is starting a "Get off the couch challenge" where participants aim to do at least 30 minutes of physical activity every day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.20s | attempts: 1
  summary: Rob has started a "Get off the couch" challenge to do at least 30 minutes of activity every day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.38s |

Processing Samples:  59%|█████▊    | 481/819 [24:56<19:29,  3.46s/it]


SAMPLE 481/819 | ID: samsum_test_0480
MESSAGE (preview): Drade: Mum he got out of the group chatting room Marenda: I am sure he is angry! Marenda: Why did you act that aggressively? Marenda: You know your brother's temper Drade: I just said what he had said was wrong! Marenda: Apologize and send the invitation to him Drade: Why should I? Drade: I didn't say anything wrong!!!!! Drade: He failed in his business, didn't he? Marenda: But did you have to tal ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.18s | attempts: 1
  summary: Drade and Marenda are arguing over Drade's harsh comments towards his brother, who had failed in business.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: A brother and sister are arguing over Drade's harsh comments about their brother's failed business venture.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.04s | attempts: 

Processing Samples:  60%|█████▉    | 491/819 [25:21<13:36,  2.49s/it]


SAMPLE 491/819 | ID: samsum_test_0490
MESSAGE (preview): Matt: Do you have Patrick's phone number? He's not picking up my calls... Matt: Is he mad at me? Mary: Sure, here you go: +44 7700 900077 Susan: He changed his number :) Matt: phew, thanks!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: Matt was unable to reach Patrick on the phone and Mary provided him with Patrick's new phone number.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Matt asked for Patrick's phone number because he wasn't answering Matt's calls.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Matt was unable to reach Patrick on his old phone number and Mary provided him with the new one.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: Matt was unable to reach Patrick on the phone and Mary provided him with

Processing Samples:  61%|██████    | 501/819 [25:54<16:32,  3.12s/it]


SAMPLE 501/819 | ID: samsum_test_0500
MESSAGE (preview): Helen: Hey, Simo, are you there? Simon: Yep babe, what's up? Helen: I was calling you before... Simon: Sorry I was on the phone, I didn't hear you... Tell me. Helen: It's a bit embarrassing... The toilet paper is finished, could you fetch me some tissues, please? Simon: Hahaha sure, no worries!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper because they were out.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.86s | attempts: 1
  summary: Helen asked Simon to buy more toilet paper because they were out.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper and tissues.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.89s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper b

Processing Samples:  62%|██████▏   | 511/819 [26:25<14:18,  2.79s/it]


SAMPLE 511/819 | ID: samsum_test_0510
MESSAGE (preview): Mum: hungry? Steve: starving... Mum: I'll wrap something up for you. Steve: You're the best, thank you!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Steve is extremely hungry and his mother offers to pack him a snack.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Steve is extremely hungry and his mother offers to prepare food for him.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Steve is extremely hungry and his mother offers to pack him a snack.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.70s | attempts: 1
  summary: Steve is extremely hungry and his mother offers to prepare food for him.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Steve is hungry and his mom offers to pack him a snack.

AGENT 06: llama3.

Processing Samples:  64%|██████▎   | 521/819 [26:54<13:48,  2.78s/it]


SAMPLE 521/819 | ID: samsum_test_0520
MESSAGE (preview): Jair: Still busy? Callum: Yes a little sorry Jair: ok
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating they were both busy.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating that Callum is still busy.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.74s | attempts: 1
  summary: Jair and Callum are briefly discussing their current workload.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Jair and Callum exchanged brief messages indicating that Callum is still busy.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.62s | attempts: 1
  summary: Jair and Callum are briefly discussing their current workload.

AGENT 06: llama3.1:8b
  status: OK | latency: 0.73

Processing Samples:  65%|██████▍   | 531/819 [27:23<14:29,  3.02s/it]


SAMPLE 531/819 | ID: samsum_test_0530
MESSAGE (preview): Adam: I have a juicy gossip for you guys ;) Borys: About whom? Adam: So apparently, Iga planned a weekend getaway with her boyfriend, but they had to cancel because he couldn't talk his group into changing the date of the presentation :D Katia: And she is pissed? Adam: What do you think? That is why they aren't sitting together since Monday ;) Katia: Who would have expected such a gossip girl from ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Iga and her boyfriend had to cancel their weekend plans due to his work obligations, causing tension between them.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Iga and her boyfriend had to cancel their weekend plans due to his work commitment, causing tension in their relationship.

AGENT 03: llama3.1:8b
  status: OK | late

Processing Samples:  66%|██████▌   | 541/819 [27:53<14:04,  3.04s/it]


SAMPLE 541/819 | ID: samsum_test_0540
MESSAGE (preview): Chris: I tried jogging today! Matt: and the key word is "tried"? Matt: not to mention that you have the weirdest habit to stat the conversation 5 minutes before my work ends :P Matt: good (or bad) thing is that I need to stay a bit longer today ;) Chris: Sorry :P Chris: I have no other friends ;) Matt: it's not a problem for me, you're the one getting a short conversation ;) Chris: Anyway, I was c ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.34s | attempts: 1
  summary: Chris and Matt exchange jokes and teasing about Chris's attempt at jogging after a long break.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.61s | attempts: 1
  summary: Chris and Matt have a conversation about Chris's attempt at jogging, with Matt providing a dose of reality to balance out Chris's enthusiasm.

AGENT 03: llama3.1:8b
  status: OK | laten

Processing Samples:  67%|██████▋   | 551/819 [28:20<11:58,  2.68s/it]


SAMPLE 551/819 | ID: samsum_test_0550
MESSAGE (preview): Jeremih: hey, tell your sis to text back Hansel: haha, thats your issues bro, dont drag me into it Jeremih: she's mad at me Hansel: for what Jeremih: i dont even know😔 Hansel:😢😂 Jeremih: youre laughing Hansel: haha, ill tell her but next time i wont interfere Jeremih: Okay bro, thanks
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Jeremih is having a disagreement with his sister and asks Hansel to help mediate the situation.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: Jeremih is trying to resolve a conflict with his sister and asks Hansel for help.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Jeremih is having issues with his sister and asks Hansel to help mediate the situation.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1

Processing Samples:  68%|██████▊   | 561/819 [28:51<12:46,  2.97s/it]


SAMPLE 561/819 | ID: samsum_test_0560
MESSAGE (preview): Oscar: A coffee at Tristano's? Payne: Why not. in 15 mins? Oscar: let's make it half an hour ok? Payne: great, i'll be there Oscar: see you there
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.95s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: Oscar and Payne agreed to meet for a coffee at Tristano's in 30 minutes.

AGENT 05: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Oscar and Payne agreed t

Processing Samples:  70%|██████▉   | 571/819 [29:19<11:48,  2.86s/it]


SAMPLE 571/819 | ID: samsum_test_0570
MESSAGE (preview): Emily: shit, I'm waiting to see my doctor and you won't believe who's waiting with me Emily: <file_gif> Adam: who then? Emily: a nun and a blind crazy man Emily: it feels spooky Adam: haha XD
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual group, including a nun and a blind man.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.20s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual group, including a nun and a blind person.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.27s | attempts: 1
  summary: Emily is waiting to see her doctor and is surprised to be surrounded by an unusual pair of fellow patients, including a nun and a blind man.

AGENT 04: llama3.1:8b
  status: OK 

Processing Samples:  71%|███████   | 581/819 [29:52<12:40,  3.20s/it]


SAMPLE 581/819 | ID: samsum_test_0580
MESSAGE (preview): Ella: did you send me mms?  i can't open it for some reason :/ Caleb: yeah, I sent you a pic of Maya Caleb: she's playing a mouse in a school play and I wanted to show you her costume Caleb: she looks sooo adorable in it! :-) Ella: aww...:) can you send me this photo via email? Caleb: done :-)
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, and they discussed how to share the photo with Ella.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.15s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, but initially only as an MMS.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.21s | attempts: 1
  summary: Caleb sent Ella a picture of his daughter Maya's school play costume, and they discussed how to shar

Processing Samples:  72%|███████▏  | 591/819 [30:21<10:55,  2.87s/it]


SAMPLE 591/819 | ID: samsum_test_0590
MESSAGE (preview): Ethan: who's going to see solstafir? Noah: solstafir! when? Noah: i had no idea they were playing Archie: I'm not :D. Leo: i'm going ofc! Ethan: 21.11 Noah: that's a wednesday... ehh... Noah: i'll have to see, might have to stay late at work... Leo: nooo you have to come Ethan: yeah you have to! Archie: There are better concerts :P. Noah: i know, i would really like to see them, but yeah Leo: arch ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.98s | attempts: 1
  summary: Friends are discussing attending a Solstafir concert on November 21.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: The group is discussing attending a Solstafir concert on November 21.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: The group is discussing attending a Solstafir concert on Novemb

Processing Samples:  73%|███████▎  | 601/819 [30:50<10:19,  2.84s/it]


SAMPLE 601/819 | ID: samsum_test_0600
MESSAGE (preview): Daniel: I'm packing for the trip, are you done with it? Michael: Mostly done, don't forget your passport! Matt: is an ID not enough? Michael: sure not! Brian: why? Michael: Bosnia and Herzegovina is not in the EU, so we will have a proper border control Michael: no passport, no entry Daniel: good you wrote us this I think :P Matt: yes, we're morons  Daniel: haha, yes, a bit Brian: I didn't even kn ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.28s | attempts: 1
  summary: The group is discussing travel requirements for a trip to Bosnia and Herzegovina, where a passport is necessary due to border control.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.34s | attempts: 1
  summary: The group is discussing travel requirements for a trip to Bosnia and Herzegovina, where a passport is necessary due to border control.

AGENT 03:

Processing Samples:  75%|███████▍  | 611/819 [31:20<09:48,  2.83s/it]


SAMPLE 611/819 | ID: samsum_test_0610
MESSAGE (preview): James: im out Tony: ??? James: from the team, this old prick pissed me off Tony: listen, I know the coach is pretty annoying at times (all the time lol) but he really is good James: i dont give a fuck, too much for me, already texted him Tony: shiiiit man
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.68s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.74s | attempts: 1
  summary: James has quit the team due to frustration with their coach.

AGENT 05: llama3.1:8b
  status: OK | la

Processing Samples:  76%|███████▌  | 621/819 [31:51<10:45,  3.26s/it]


SAMPLE 621/819 | ID: samsum_test_0620
MESSAGE (preview): Ken: Hey, how are you doing today? Ellen: Fine. How are you? Ken: Not doing well at all today. Really depressed. Frustrated. Stressed out. Ellen: Oh no! Ken: Really bad day yesterday carrying over to today. Ellen: Can I do anything? Ken: I'll be fine. Just overwhelmed and burned out. Ellen: Sorry! Ken: Sorry, don't mean to bring you down. Ellen: You didn't, just worried about you!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.05s | attempts: 1
  summary: Ken is having a bad day due to being depressed, frustrated, and stressed from the previous day's events.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.02s | attempts: 1
  summary: Ken is having a bad day due to feeling depressed, frustrated, stressed, and burned out.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Ken is having a bad day due to feeling de

Processing Samples:  77%|███████▋  | 631/819 [32:25<09:42,  3.10s/it]


SAMPLE 631/819 | ID: samsum_test_0630
MESSAGE (preview): JP: What's new? Samara: Not much. JP: Oh. Samara: Want to play? JP: Nah. Gotta go to work. Samara: K, see ya!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: JP and Samara briefly chat about playing together, but JP has to leave for work.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: JP and Samara briefly chat about playing together but JP has to leave for work.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.00s | attempts: 1
  summary: The conversation between JP and Samara is a brief exchange about doing nothing in particular together.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.90s | attempts: 1
  summary: The conversation between JP and Samara is a brief exchange about doing nothing in particular together.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.96s | attempts

Processing Samples:  78%|███████▊  | 641/819 [32:58<09:56,  3.35s/it]


SAMPLE 641/819 | ID: samsum_test_0640
MESSAGE (preview): Lorena: Hi, can you help me with something? Martin: Well, I can try. Martin: Depends what it is? Lorena: I got a new desk and it comes with assembly instructions but I give up, I just can't do it. Lorena: I'm illiterate when it comes to instructions, haha. Lorena: So I could really use some help... Martin: Hmm, I can't today, but how about tomorrow? Shouldn't take long anyway. Lorena: Yeah, it's f ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Lorena needs help assembling her new desk and Martin agrees to assist her the next day.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.13s | attempts: 1
  summary: Lorena needs help assembling her new desk and Martin agrees to assist her the next day.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Lorena needs help assembli

Processing Samples:  79%|███████▉  | 651/819 [33:31<08:53,  3.18s/it]


SAMPLE 651/819 | ID: samsum_test_0650
MESSAGE (preview): Jake: Maaaan! you in london?! Mike: <file_photo> Mike: yeah mate! dreams coming true! Jake: cool! congrats!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Jake and Mike are exchanging congratulations over Mike's arrival in London.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.96s | attempts: 1
  summary: Jake and Mike are exchanging congratulatory messages because Mike is now in London, a dream destination for him.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.73s | attempts: 1
  summary: Jake and Mike are exchanging congratulations on Mike's trip to London.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.93s | attempts: 1
  summary: Jake and Mike are exchanging congratulatory messages because Mike is now in London, a dream destination for him.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.80s | atte

Processing Samples:  81%|████████  | 661/819 [34:01<07:14,  2.75s/it]


SAMPLE 661/819 | ID: samsum_test_0660
MESSAGE (preview): Miley: don't want to go to work tomorrow! Miley: wanna sleep!!! Miley: need some rest :( Aaron: know what u feelin Aaron: really want some holiday already Miley: :( Aaron: fighting girl!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Miley and Aaron are both feeling tired and wanting a break from work.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.85s | attempts: 1
  summ

Processing Samples:  82%|████████▏ | 671/819 [34:31<07:25,  3.01s/it]


SAMPLE 671/819 | ID: samsum_test_0670
MESSAGE (preview): Autumn: u sewn anything? Autumn: show me show me!! :D Autumn: <file_gif> Israel: no :( Israel: haven't had the time yet  ;( Autumn: ooh :(  Autumn: u need to have pririorities in life! Autumn: like me - i haven't checked the tests for a month because the dress wouldn't sew itself :D Israel: you've sewn a dres?? Israel: you didn't show it to me Autumn: oh didn't i? Autumn: there will be some photos ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Autumn and Israel are discussing sewing a dress, with Autumn teasingly showing Israel photos of her progress.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Autumn and Israel are discussing sewing a dress, with Autumn teasing Israel about not having time to sew one yet.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.26s | a

Processing Samples:  83%|████████▎ | 681/819 [35:04<07:57,  3.46s/it]


SAMPLE 681/819 | ID: samsum_test_0680
MESSAGE (preview): Lesley: Can you please go on a walk with the dog when you get home? Chloe: I don't know when I'm going to be home though. Lesley: What do you mean? Doesn't school finish up at 3PM? Chloe: Yes, but I was going to go to Megan's house tonight. Lesley: Chloe, you didn't ask me if you could go. You cannot just assume that you are going.  Chloe: Mom, seriously. I already told her I was going to come ove ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.40s | attempts: 1
  summary: Lesley tells Chloe she must come home after school to take care of the dog, rather than going straight to her friend's house.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.40s | attempts: 1
  summary: Lesley tells Chloe she must come home after school to take care of the dog, rather than going to a friend's house first.

AGENT 03: llama3.1:8b
  status: 

Processing Samples:  84%|████████▍ | 691/819 [35:35<07:00,  3.29s/it]


SAMPLE 691/819 | ID: samsum_test_0690
MESSAGE (preview): Mia: God, he keeps talking about the stupid wall Jennifer: I think he's just senile Karine: unfortunately it's much more Karine: this populist strategy to scare people with a minority Karine: and then present himself as the only savior Peter: Karine is right, I think he's not as senile and stupid as we would like to believe Mia: so it's even worse - he's pure evil Peter: possibly yes Karine: I rea ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.62s | attempts: 1
  summary: The group is discussing Donald Trump, with some members labeling him as "pure evil" and others describing him as both stupid and dangerous due to his confidence and persistence.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.78s | attempts: 1
  summary: A group of friends discuss their concerns about a politician, specifically Donald Trump, and agree tha

Processing Samples:  86%|████████▌ | 701/819 [36:08<06:39,  3.39s/it]


SAMPLE 701/819 | ID: samsum_test_0700
MESSAGE (preview): Charlie: How are you doing. Daisy: Good, seeing a lot of beautiful places Charlie: Where are you now? Daisy: in Naples  Charlie: Pity, I hoped you may be still in Sicily. Daisy: No, we took a ferry early in the morning yesterday. Charlie: I am still thinking about the evening we spent together. Daisy: Yes, it was nice. Charlie: I would like to meet you again. Daisy: But where? Charlie: Either in t ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.33s | attempts: 1
  summary: Charlie and Daisy reconnect after a previous encounter in Sicily, and plan to meet up again in Florence, Italy.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.34s | attempts: 1
  summary: Charlie and Daisy, who met in Sicily, plan to meet again in Florence, Italy on September 1st.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.38s | attempts: 1
  summa

Processing Samples:  87%|████████▋ | 711/819 [36:37<04:17,  2.38s/it]


SAMPLE 711/819 | ID: samsum_test_0710
MESSAGE (preview): Shanon: Yeah. So what bout him? Victor: He's no longer our teacher :) Oliver: I'm so happy :) Sid: Me too! Shanon: Bt y? Sid: He got fired 2day and thrown out of school? Oliver: Literally thrown out! Shanon: What?! Victor: I suppose he's had it and lost it. Oliver: Or maybe had one too many ;) Sid: Not. Our. Teacher. Anymore!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.79s | attempts: 1
  summary: Their teacher has been fired and physically removed from the school.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Their teacher has been fired and literally thrown out of school.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Their teacher has been fired and physically removed from the school.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.82s | attempts: 1
  summary: Their tea

Processing Samples:  88%|████████▊ | 721/819 [37:08<04:42,  2.89s/it]


SAMPLE 721/819 | ID: samsum_test_0720
MESSAGE (preview): Greg: Hi Mum, how's the cat doing? Patti: I just rang the vets, she's fine! Greg: Thank God, been worrying about her all day! Patti: They said I can pick her up later. I'll pop home and fetch the cage after work. Should be there at 5ish. Greg: Good, see you at home, bye!
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office later that day.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: The cat is fine and will be picked up from the vet's office later that day.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: The cat is fine and will be picked up from the vet later that day.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.94s | attempts: 1
  summary: The cat is fine and will be picked up from the ve

Processing Samples:  89%|████████▉ | 731/819 [37:38<04:51,  3.31s/it]


SAMPLE 731/819 | ID: samsum_test_0730
MESSAGE (preview): Molly: D‑': Luca: What??? Molly: I can't think anymore today! Luca: LOL! Molly: Seriously! Luca: Deep breath... Molly: It's the silly season, isn't it? Luca: Yep. Just hang in there. Molly: Trying... Luca: Try harder! LOL! Molly: Driving me nuts! Luca: I know, but you can't let it get you down! Molly: Yeah, you're right. Luca: Of course I am! LOL! Molly: Modest, too.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.09s | attempts: 1
  summary: Molly is feeling overwhelmed and frustrated during the holiday season, but Luca tries to cheer her up with humor.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.20s | attempts: 1
  summary: Molly is feeling overwhelmed and frustrated during the holiday season, but Luca tries to cheer her up with jokes and encouragement.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Mo

Processing Samples:  90%|█████████ | 741/819 [38:09<04:04,  3.14s/it]


SAMPLE 741/819 | ID: samsum_test_0740
MESSAGE (preview): Joyce: Honey, can I take the car? Kyle: No, I'm sorry. Joyce: Why not? How am I supposed to go get Harry? Kyle: I've got to go to a mechanic. The car is making a weird noise. Joyce: Then can you get Harry on your way? Kyle: I don't know, I'm already running late. Joyce: Am I supposed to go get him by bus? That's ridiculous. Kyle: Fine, I'll get him. Joyce: Thank you.
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Kyle is unable to take the car because he needs to visit a mechanic and will instead pick up Harry.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.10s | attempts: 1
  summary: Kyle is unable to take Joyce to pick up Harry because he has a car problem and is already running late.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Kyle is unable to let Joyce take the car bec

Processing Samples:  92%|█████████▏| 751/819 [38:44<04:17,  3.78s/it]


SAMPLE 751/819 | ID: samsum_test_0750
MESSAGE (preview): Bobby: U know what annoys me the most? Mickey: Not being able to watch TV or listen to music without pesky interruptions? Bobby: That too, but the thing that annoys me the most is pharmaceuticals, medicine and drugs. Mickey: Y? Aren't they good 4 ur health? Bobby: They may well be, but not for my wallet! Mickey: What do u mean? Bobby: I was watching a film last night and suddenly there's a bunch o ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.90s | attempts: 1
  summary: Bobby believes that pharmaceutical companies invent illnesses and sell them as medical products to make money, citing an example of a non-existent illness being marketed as a legitimate condition.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.90s | attempts: 1
  summary: Bobby believes that pharmaceutical companies invent illnesses and sell products to 

Processing Samples:  93%|█████████▎| 761/819 [39:14<03:04,  3.18s/it]


SAMPLE 761/819 | ID: samsum_test_0760
MESSAGE (preview): Pamela: Have you tried applying for the RGS grant? Zoe: I did last year. Zoe: Wasn't lucky... Xiara: Me neither Pamela: Do you know if I have to be a member? Zoe: It depends on the award Pamela: Ok I'll have a look at their website 
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.12s | attempts: 1
  summary: Pamela and her friends discuss applying for the RGS grant, but none of them were successful last year.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.19s | attempts: 1
  summary: Pamela and her friends discuss applying for the RGS grant, but none of them were successful in getting it last year.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.99s | attempts: 1
  summary: Pamela is inquiring about eligibility for the RGS grant and whether membership is required.

AGENT 04: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summ

Processing Samples:  94%|█████████▍| 771/819 [39:46<02:34,  3.22s/it]


SAMPLE 771/819 | ID: samsum_test_0770
MESSAGE (preview): Matthew: so is there any after party planed after the wedding? Kristina: after the wedding? I think it will last till the morning... Matthew: haha, i mean on the next day :D Bart: i think Mark said there is something planned Bart: at their house probably Matthew: oh thats good Matthew: better atmosphere and everything, more chill Kristina: that's great, i 've been to an after party like this one K ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.34s | attempts: 1
  summary: There is an after-party planned for the day after the wedding at Mark's house, and several people are excited to attend.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.34s | attempts: 1
  summary: There is an after-party planned for the day after the wedding at Mark's house, and several people are planning to attend.

AGENT 03: llama3.1:8b
  status: OK |

Processing Samples:  95%|█████████▌| 781/819 [40:14<01:47,  2.83s/it]


SAMPLE 781/819 | ID: samsum_test_0780
MESSAGE (preview): Karine: <file_video> Jessie: That was us at kindergarten! Karine: yes!! 
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 02: llama3.1:8b
  status: OK | latency: 0.68s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 03: llama3.1:8b
  status: OK | latency: 0.78s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 04: llama3.1:8b
  status: OK | latency: 0.77s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGENT 05: llama3.1:8b
  status: OK | latency: 0.80s | attempts: 1
  summary: Karine and Jessie shared a video of themselves from their kindergarten days.

AGE

Processing Samples:  97%|█████████▋| 791/819 [40:45<01:26,  3.09s/it]


SAMPLE 791/819 | ID: samsum_test_0790
MESSAGE (preview): Louis: <file_gif> Archie: Wtf? Lol! Louis: <file_gif> Archie: That one looks like Melanie Louis: True! Hahahahah Archie: Just like in this pic Archie: <file_photo> Louis: Lol! It's her! Archie: By the way, she was way hotter when she had cury hair Louis: I disagree, sir! She looks absolutely gorgeous with straight hair Archie: Oh well, you have the right to an opinion even if it's wrong XD
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.06s | attempts: 1
  summary: Louis and Archie are joking about a photo of Melanie, comparing her appearance with different hairstyles.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Louis and Archie are jokingly discussing a photo of Melanie, a woman they know.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.03s | attempts: 1
  summary: Louis and Archie are jokingly discussin

Processing Samples:  98%|█████████▊| 801/819 [41:17<00:57,  3.22s/it]


SAMPLE 801/819 | ID: samsum_test_0800
MESSAGE (preview): Linda: Hi Dad, I want to buy flowers for mum! But I don't remember which one she likes :( Michael: Well, she likes all the flowers I believe Linda: That doesn't help! I'm on a flower market right now! Michael: Send me some pics then Linda: <file_photo>  Michael: Tulips are nice, roses too Linda:  What about carnations? Michael: No, carnations are boring :D Linda: Thanks Dad, srsly… Michael:  What  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.26s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he guides her through photos of different options.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.31s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the process by sending suggestions based on photos.

AGENT 03

Processing Samples:  99%|█████████▉| 811/819 [41:43<00:22,  2.76s/it]


SAMPLE 811/819 | ID: samsum_test_0810
MESSAGE (preview): Luke: Ryanair to cancel flights affecting 400,000 more passengers. So unlucky Erin! Erin: Yeeeep...it touched me as well finally 😩😩😩😩 Now they screw all my plans to go to Egypt... Luke: You’re not the only one Erin: Good to know that my feelings are shared with other 399,999 people 😩 Nathan: Relevant news story for everyone to consider: http://waterfordwhispersnews.com/2017/09/18/ryanair-cancel-ef ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Ryanair has announced it will cancel more flights, affecting over 400,000 passengers.

AGENT 02: llama3.1:8b
  status: OK | latency: 1.17s | attempts: 1
  summary: Ryanair has announced it will cancel more flights, affecting around 400,000 passengers.

AGENT 03: llama3.1:8b
  status: OK | latency: 1.01s | attempts: 1
  summary: Ryanair has cancelled flight

Processing Samples: 100%|██████████| 819/819 [42:10<00:00,  3.09s/it]


✓ Done. outputs.jsonl: /content/slm_evaluation/results/colab_homo18_llama3.1_8b_1767416982/outputs.jsonl
✓ Done. run_log.txt:  /content/slm_evaluation/results/colab_homo18_llama3.1_8b_1767416982/run_log.txt
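
**Mix ratio per swarm size.** The mixed-prompt run in the next cell gives roughly 60% of the agents the normal prompt and the rest the intentionally incorrect one, rounding the correct share up. The sketch below reproduces only that arithmetic on its own; the name `counts_for_k` and the `correct_ratio` parameter are illustrative assumptions, not helpers defined elsewhere in this notebook.

```python
import math
from typing import Tuple

def counts_for_k(k: int, correct_ratio: float = 0.6) -> Tuple[int, int]:
    """Return (correct, incorrect) agent counts for a swarm of size k."""
    # Round the correct share up so correct-prompt agents keep the majority.
    correct = math.ceil(correct_ratio * k)
    return correct, k - correct

# Print the mix for the four swarm sizes used in the run.
for k in (3, 6, 12, 18):
    correct, incorrect = counts_for_k(k)
    print(f"k={k:2d}: correct={correct:2d}, incorrect={incorrect}")
```

For the four sizes used here this yields 2/1, 4/2, 8/4 and 11/7 (correct/incorrect). The last pair matches the `Max mix for k=18` line printed at the start of the run.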





In [73]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &

import os, time, json, asyncio, math
from pathlib import Path

# Assumes these exist already:
# DATASET_PATH, RESULTS_DIR, RUN_OPTIONS
# call_ollama(model, prompt, client, options) -> (txt, latency)
# ollama_is_up() -> bool
# get_consensus(list_of_strings) -> (best_idx, meta)
# check_factuality(message, summary) -> any

def _truncate(s: str, n: int = 400):
    if s is None:
        return ""
    s = str(s)
    return s if len(s) <= n else s[:n] + " ...[truncated]"

def _format_candidate_block(c, idx=None):
    header = f"AGENT {idx+1:02d}" if idx is not None else "AGENT"
    lat = f"{c.get('latency', 0.0):.2f}s"
    att = c.get("attempts", 1)
    status = "OK" if not str(c.get("summary","")).startswith("ERROR") else "ERROR"
    corr = "CORRECT" if c.get("is_correct_prompt") else "INCORRECT"
    return (
        f"{header} [{corr}]\n"
        f"  status: {status} | latency: {lat} | attempts: {att}\n"
        f"  summary: {c.get('summary','')}\n"
    )

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call call_ollama up to `retries` times; return (text, latency, attempts_used)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            # Linear backoff between attempts; no need to wait after the final one.
            await asyncio.sleep(0.5 * k)
    return last_txt, last_lat, retries

def _counts_for_k(k: int):
    """Return (correct, incorrect) agent counts for swarm size k: about 60% correct, rounded up."""
    if k == 3:
        return 2, 1
    c = int(math.ceil(0.6 * k))
    w = k - c
    return c, w

async def run_eval_homo_mixed_3_6_12_18(
    model="llama3.1:8b",
    max_samples=None,
    sizes=(3, 6, 12, 18),
    concurrency=6,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,
    seed=1337,
):
    import httpx
    from tqdm import tqdm
    import random

    sizes = tuple(sorted(set(int(x) for x in sizes)))
    assert sizes == (3, 6, 12, 18), f"Expected sizes (3,6,12,18), got {sizes}"

    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    # Compute max needed once (for k=18)
    max_k = max(sizes)
    max_correct, max_wrong = _counts_for_k(max_k)
    assert max_correct + max_wrong == max_k

    run_id = f"colab_homo_mixed_3_6_12_18_{model.replace(':','_').replace('/','_')}_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting Mixed-Prompt Homogeneous Evaluation: {run_id}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print(f"Model: {model} | sizes={sizes} | per-sample calls={max_k} | concurrency={concurrency} | samples={len(dataset)}")
    print(f"Max mix for k={max_k}: correct={max_correct}, incorrect={max_wrong}")
    print("-" * 80)

    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    rng = random.Random(seed)

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            msg = item["message"]

            # Correct prompt (normal summarization)
            prompt_correct = (
                "Summarize this message in one short sentence.\n\n"
                f"{msg}\n\nSummary:"
            )

            # Incorrect prompt: asks for a deliberately wrong summary (robustness test).
            # The instruction is explicit so that a loose but faithful paraphrase is not mistaken for an incorrect summary.
            prompt_incorrect = (
                "Summarize this message in one short sentence, but make the summary intentionally incorrect.\n"
                "Change key facts (who did what, what happened, or the outcome) so it contradicts the message.\n\n"
                f"{msg}\n\nIntentionally incorrect summary:"
            )

            # Shuffle which agent slots receive the correct vs. incorrect prompt for this sample.
            # Results are re-bucketed below (correct list + incorrect list) so each size k can slice its own mix.
            flags = [True] * max_correct + [False] * max_wrong
            rng.shuffle(flags)

            async def _one(agent_idx: int, is_correct: bool):
                async with sem:
                    p = prompt_correct if is_correct else prompt_incorrect
                    txt, lat, attempts = await call_with_retry(model, p, client, options, retries=3)
                    return {
                        "model": model,
                        "agent_idx": int(agent_idx),
                        "is_correct_prompt": bool(is_correct),
                        "summary": txt,
                        "latency": float(lat) if lat is not None else 0.0,  # guard against a missing latency on failure
                        "attempts": int(attempts),
                    }

            # One batch per sample (18 calls), reused for all sizes
            batch = await asyncio.gather(*[_one(i, flags[i]) for i in range(max_k)])

            correct_candidates = [c for c in batch if c["is_correct_prompt"]]
            wrong_candidates = [c for c in batch if not c["is_correct_prompt"]]

            # Sanity check: the flags guarantee exactly max_correct / max_wrong candidates of each type.
            # Fail loudly instead of backfilling one bucket from the other, which would mislabel agents.
            assert len(correct_candidates) == max_correct and len(wrong_candidates) == max_wrong, (
                f"unexpected mix: {len(correct_candidates)} correct, {len(wrong_candidates)} incorrect"
            )

            results_by_size = {}

            for k in sizes:
                c_need, w_need = _counts_for_k(k)
                chosen = correct_candidates[:c_need] + wrong_candidates[:w_need]

                valid_summaries = [
                    c["summary"] for c in chosen
                    if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
                ]

                if valid_summaries:
                    best_idx_valid, meta = get_consensus(valid_summaries)
                    swarm_summary = valid_summaries[best_idx_valid]
                else:
                    meta = {}
                    swarm_summary = "ERROR: All agents failed"

                results_by_size[str(k)] = {
                    "k": int(k),
                    "correct_needed": int(c_need),
                    "incorrect_needed": int(w_need),
                    "swarm_summary": swarm_summary,
                    "swarm_candidates": chosen,
                    "consensus_metadata": meta,
                    "factuality": check_factuality(msg, swarm_summary),
                }

            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                sep = "=" * 90
                header = f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}"
                msg_preview = _truncate(msg.replace("\n", " "), 400)

                print("\n" + sep)
                print(header)
                print("MESSAGE (preview):", msg_preview)
                print("-" * 90)

                _log_line("\n" + sep)
                _log_line(header)
                _log_line("MESSAGE:")
                _log_line(msg)
                _log_line("-" * 90)

                # Print only the k=18 candidate blocks (full set), since smaller ks are subsets
                chosen18 = results_by_size["18"]["swarm_candidates"]
                for j, c in enumerate(chosen18):
                    c_out = dict(c)
                    if truncate_console and isinstance(c_out["summary"], str):
                        c_out["summary"] = _truncate(c_out["summary"], truncate_console)
                    block = _format_candidate_block(c_out, idx=j)
                    print(block)
                    _log_line(_format_candidate_block(c, idx=j))

                print("-" * 90)
                for k in sizes:
                    r = results_by_size[str(k)]
                    print(f"k={k} | correct={r['correct_needed']} incorrect={r['incorrect_needed']} | avg_similarity={r['consensus_metadata'].get('avg_similarity')}")
                    print("SWARM SUMMARY:", r["swarm_summary"])
                    print("-" * 90)

                print(sep + "\n")

            record = {
                "id": item.get("id"),
                "model": model,
                "scenario": "mixed_prompts_incorrect_injection",
                "sizes": list(sizes),
                "max_k": int(max_k),
                "seed": int(seed),
                "results_by_size": results_by_size,
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# Run
await run_eval_homo_mixed_3_6_12_18(
    model="llama3.1:8b",
    max_samples=None,
    concurrency=6,
    print_each=True,
    print_every=100,
    truncate_console=300,
    options=RUN_OPTIONS,
    seed=1337,
)


^C
Starting Mixed-Prompt Homogeneous Evaluation: colab_homo_mixed_3_6_12_18_llama3.1_8b_1767419513
Results saved to: /content/slm_evaluation/results/colab_homo_mixed_3_6_12_18_llama3.1_8b_1767419513/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_homo_mixed_3_6_12_18_llama3.1_8b_1767419513/run_log.txt
Model: llama3.1:8b | sizes=(3, 6, 12, 18) | per-sample calls=18 | concurrency=6 | samples=819
Max mix for k=18: correct=11, incorrect=7
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 1/819 [00:06<1:23:00,  6.09s/it]


SAMPLE 1/819 | ID: samsum_test_0000
MESSAGE (preview): Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him 🙂 Hannah: Urgh.. Alright Hannah: Bye Amanda:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 3.74s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 02 [CORRECT]
  status: OK | latency: 3.74s | attempts: 1
  summary: Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Larry instead.

AGENT 03 [CORRECT]
  status: OK | latency: 3.68s | attempts: 1
  summary: Hannah asked Amanda for Betty's pho

Processing Samples:  12%|█▏        | 101/819 [06:50<51:42,  4.32s/it]


SAMPLE 101/819 | ID: samsum_test_0100
MESSAGE (preview): Chris: Hi there! Where are you? Any chance of skyping? Rick: Hi! Our last two days in Cancun before flying to Havana. Yeah, skyping is an idea. When would it suit you? Rick: We don't have the best of connections in the room but I can get you pretty well in the lobby. Chris: What's the time in your place now? Rick: 6:45 pm Chris: It's a quarter to one in the morning here. Am still in front of the b ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.60s | attempts: 1
  summary: Chris and Rick discuss and arrange a Skype call for the next day at 3pm Rick's time, when he is in Cuba.

AGENT 02 [CORRECT]
  status: OK | latency: 1.52s | attempts: 1
  summary: Chris and Rick discuss and agree on a Skype call for the next day at 3pm, when they are in Cuba.

AGENT 03 [CORRECT]
  status: OK | latency: 1.66s | attempts: 1
  summary: Chris and

Processing Samples:  25%|██▍       | 201/819 [13:30<38:55,  3.78s/it]


SAMPLE 201/819 | ID: samsum_test_0200
MESSAGE (preview): Abdellilah: Where are you? Sam: work Abdellilah: What time you finish? Sam: Not til 5 Abdellilah: Are your bringing him over tonight: Sam: No in the morning: Abdellilah: ok, what time? Sam: About 9. Is that ok? Abdellilah: ok - see you then
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.22s | attempts: 1
  summary: Sam is finishing work at 5 and will bring someone over around 9 the next morning.

AGENT 02 [CORRECT]
  status: OK | latency: 1.26s | attempts: 1
  summary: Sam will bring someone over to Abdellilah's place on Saturday morning around 9am.

AGENT 03 [CORRECT]
  status: OK | latency: 1.22s | attempts: 1
  summary: Sam is finishing work at 5 and will bring someone over around 9 the next morning.

AGENT 04 [CORRECT]
  status: OK | latency: 0.84s | attempts: 1
  summary: Sam is finishing work at 5 and will bring someone over around 9 t

Processing Samples:  37%|███▋      | 301/819 [20:15<32:46,  3.80s/it]


SAMPLE 301/819 | ID: samsum_test_0300
MESSAGE (preview): Jerry: Hi sweetie :) Janet: Hi sugar ;) Jerry: I'm coming home Janet: Can't wait ;) Jerry: I should be there in 40 minutes Janet: Ok, I'm waiting for you :) Jerry: How was your day? Janet: Oh, it was ok but my boss is a pain in the ass sometimes Jerry: I know, she can be a bitch :P Janet: Yes she can! ;) Jerry: See you later darling Janet: <3
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.02s | attempts: 1
  summary: Jerry and Janet are exchanging sweet and casual messages as Jerry heads home from work.

AGENT 02 [CORRECT]
  status: OK | latency: 0.92s | attempts: 1
  summary: Jerry and Janet are exchanging sweet messages as Jerry heads home to see her.

AGENT 03 [CORRECT]
  status: OK | latency: 1.00s | attempts: 1
  summary: Jerry and Janet are exchanging affectionate messages as Jerry heads home after work.

AGENT 04 [CORRECT]
  status: O

Processing Samples:  49%|████▉     | 401/819 [27:04<33:25,  4.80s/it]


SAMPLE 401/819 | ID: samsum_test_0400
MESSAGE (preview): Jill: so how was your date anyway? :) Susan: it was perfect, he was so sweet!! <3 Jill: tell me everything!! Susan: so first he picked me up from home, all dressed up and everything Jill: suit on? :D Susan: nooo, not like that, but he had a really nice black shirt, elegant shoes, looked like Brad Pitt hahaha Jill: hahahahaha Susan: <file_gif> Jill: so where did he take you? Susan: that's the best  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.24s | attempts: 1
  summary: Susan had a fun date with a man who took her to rollerskating disco.

AGENT 02 [CORRECT]
  status: OK | latency: 1.32s | attempts: 1
  summary: Susan had a fun date with a man who took her to rollerskating disco.

AGENT 03 [CORRECT]
  status: OK | latency: 1.03s | attempts: 1
  summary: Susan had a fun date with a man who took her to rollerskating disco.

AGE

Processing Samples:  61%|██████    | 501/819 [33:49<22:22,  4.22s/it]


SAMPLE 501/819 | ID: samsum_test_0500
MESSAGE (preview): Helen: Hey, Simo, are you there? Simon: Yep babe, what's up? Helen: I was calling you before... Simon: Sorry I was on the phone, I didn't hear you... Tell me. Helen: It's a bit embarrassing... The toilet paper is finished, could you fetch me some tissues, please? Simon: Hahaha sure, no worries!
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.07s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper because they were out.

AGENT 02 [CORRECT]
  status: OK | latency: 1.00s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper and tissues.

AGENT 03 [CORRECT]
  status: OK | latency: 1.08s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper because they were out.

AGENT 04 [CORRECT]
  status: OK | latency: 0.99s | attempts: 1
  summary: Helen asked Simon to buy her more toilet paper because t

Processing Samples:  73%|███████▎  | 601/819 [40:12<12:37,  3.48s/it]


SAMPLE 601/819 | ID: samsum_test_0600
MESSAGE (preview): Daniel: I'm packing for the trip, are you done with it? Michael: Mostly done, don't forget your passport! Matt: is an ID not enough? Michael: sure not! Brian: why? Michael: Bosnia and Herzegovina is not in the EU, so we will have a proper border control Michael: no passport, no entry Daniel: good you wrote us this I think :P Matt: yes, we're morons  Daniel: haha, yes, a bit Brian: I didn't even kn ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.14s | attempts: 1
  summary: The group discusses and confirms that a passport is required for travel to Bosnia and Herzegovina.

AGENT 02 [CORRECT]
  status: OK | latency: 1.33s | attempts: 1
  summary: The group is preparing for a trip to Bosnia and Herzegovina and Michael reminds them that a passport is required due to border control.

AGENT 03 [CORRECT]
  status: OK | latency: 1.28s |

Processing Samples:  86%|████████▌ | 701/819 [47:16<08:58,  4.56s/it]


SAMPLE 701/819 | ID: samsum_test_0700
MESSAGE (preview): Charlie: How are you doing. Daisy: Good, seeing a lot of beautiful places Charlie: Where are you now? Daisy: in Naples  Charlie: Pity, I hoped you may be still in Sicily. Daisy: No, we took a ferry early in the morning yesterday. Charlie: I am still thinking about the evening we spent together. Daisy: Yes, it was nice. Charlie: I would like to meet you again. Daisy: But where? Charlie: Either in t ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.65s | attempts: 1
  summary: Charlie and Daisy plan to meet up in Florence, Italy on September 1st after Charlie decides to take a train there from Naples.

AGENT 02 [CORRECT]
  status: OK | latency: 1.64s | attempts: 1
  summary: Charlie and Daisy, who had a romantic evening together in Sicily, plan to meet again in Florence, Italy on September 1-5.

AGENT 03 [CORRECT]
  status: OK | la

Processing Samples:  98%|█████████▊| 801/819 [53:57<01:12,  4.05s/it]


SAMPLE 801/819 | ID: samsum_test_0800
MESSAGE (preview): Linda: Hi Dad, I want to buy flowers for mum! But I don't remember which one she likes :( Michael: Well, she likes all the flowers I believe Linda: That doesn't help! I'm on a flower market right now! Michael: Send me some pics then Linda: <file_photo>  Michael: Tulips are nice, roses too Linda:  What about carnations? Michael: No, carnations are boring :D Linda: Thanks Dad, srsly… Michael:  What  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 [CORRECT]
  status: OK | latency: 1.54s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the process by suggesting specific types of flowers.

AGENT 02 [CORRECT]
  status: OK | latency: 1.53s | attempts: 1
  summary: Linda asks her dad for help choosing flowers to buy for their mother and he assists her by suggesting specific types of flowers.

AGENT 03

Processing Samples: 100%|██████████| 819/819 [55:05<00:00,  4.04s/it]


✓ Done. outputs.jsonl: /content/slm_evaluation/results/colab_homo_mixed_3_6_12_18_llama3.1_8b_1767419513/outputs.jsonl
✓ Done. run_log.txt:  /content/slm_evaluation/results/colab_homo_mixed_3_6_12_18_llama3.1_8b_1767419513/run_log.txt
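
**Note: waiting for the server before a run.** The next cell restarts `ollama serve` in the background and then checks reachability once through `ollama_is_up()`. If the server needs a few seconds to bind its port, that single check can fail even though the server is about to come up. The following is a minimal sketch of a polling helper, not part of the original notebook. The names `http_ok` and `wait_until_up` are hypothetical, and it assumes that the base URL `http://127.0.0.1:11434` answers a plain GET with HTTP 200 once Ollama is ready.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        return False


def wait_until_up(url="http://127.0.0.1:11434", max_wait_s=60.0, interval_s=1.0, probe=http_ok):
    """Poll `probe(url)` until it succeeds or `max_wait_s` elapses; return the final result."""
    deadline = time.monotonic() + max_wait_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_up()` right after the `!nohup ... ollama serve &` line and before starting a run would give the server up to a minute to start. The `probe` parameter exists only so the loop can be exercised without a live server.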





In [77]:
!pkill -f "ollama serve" || true
!nohup env \
  OLLAMA_MODELS="/content/" \
  OLLAMA_KEEP_ALIVE="24h" \
  OLLAMA_MAX_LOADED_MODELS="18" \
  OLLAMA_NUM_PARALLEL="16" \
  OLLAMA_MAX_QUEUE="2048" \
  ollama serve > /tmp/ollama.log 2>&1 &
import os, time, json, asyncio
from pathlib import Path

# Assumes these already exist in your notebook:
# DATASET_PATH, RESULTS_DIR, RUN_OPTIONS
# call_ollama(model, prompt, client, options) -> (txt, latency)
# ollama_is_up() -> bool
# get_consensus(list_of_strings) -> (best_idx, meta)
# check_factuality(message, summary) -> any

async def call_with_retry(model, prompt, client, options, retries=3):
    """Call the model up to `retries` times; return (text, latency, attempts_used)."""
    last_txt, last_lat = None, None
    for k in range(1, retries + 1):
        txt, lat = await call_ollama(model, prompt, client, options)
        if isinstance(txt, str) and not txt.startswith("ERROR"):
            return txt, lat, k
        last_txt, last_lat = txt, lat
        if k < retries:
            # Linear backoff (0.5s, 1.0s, ...); no need to sleep after the final attempt.
            await asyncio.sleep(0.5 * k)
    return last_txt, last_lat, retries

async def run_eval_k3_same_prompt(
    run_name: str,
    models: list,
    max_samples=None,
    concurrency=3,
    print_each=True,
    print_every=10,
    truncate_console=300,
    write_text_log=True,
    options=RUN_OPTIONS,
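):
    """Run 3 agents on the same prompt per sample, pick a consensus summary, and append one JSONL record per sample."""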
    import httpx
    from tqdm import tqdm

    assert len(models) == 3, f"Expected exactly 3 models, got {len(models)}"

    if not os.path.exists(DATASET_PATH):
        print(f"ERROR: Dataset not found at {DATASET_PATH}. Upload samsum.jsonl to /content/")
        return

    if not await ollama_is_up():
        print("ERROR: Ollama not reachable at http://127.0.0.1:11434")
        print("Try: !tail -n 120 /tmp/ollama.log")
        return

    dataset = []
    with open(DATASET_PATH, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dataset.append(json.loads(line))

    if max_samples is not None:
        dataset = dataset[:max_samples]

    run_id = f"{run_name}_{int(time.time())}"
    run_dir = Path(RESULTS_DIR) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    out_file = run_dir / "outputs.jsonl"
    log_file = run_dir / "run_log.txt"

    print(f"Starting: {run_id}")
    print(f"Models: {models}")
    print(f"Results saved to: {out_file}")
    if write_text_log:
        print(f"Text log saved to: {log_file}")
    print("-" * 80)

    # Samples are processed one at a time with 3 calls each, so values above 3 have no extra effect.
    sem = asyncio.Semaphore(concurrency)

    def _log_line(s: str):
        if write_text_log:
            with open(log_file, "a", encoding="utf-8") as lf:
                lf.write(s + "\n")

    def _truncate(s: str, n: int):
        if s is None:
            return ""
        s = str(s)
        return s if len(s) <= n else s[:n] + " ...[truncated]"

    async with httpx.AsyncClient(timeout=300.0) as client:
        for idx_item, item in enumerate(tqdm(dataset, desc="Processing Samples")):
            msg = item["message"]
            prompt = f"Summarize this message in one short sentence:\n\n{msg}\n\nSummary:"

            async def _one(i: int, m: str):
                async with sem:
                    txt, lat, attempts = await call_with_retry(m, prompt, client, options, retries=3)
                    return {
                        "model": m,
                        "agent_idx": int(i),
                        "summary": txt,
                        "latency": float(lat) if lat is not None else None,
                        "attempts": int(attempts),
                    }

            candidates = await asyncio.gather(*[_one(i, models[i]) for i in range(3)])

            valid_summaries = [
                c["summary"] for c in candidates
                if isinstance(c["summary"], str) and not c["summary"].startswith("ERROR")
            ]

            if valid_summaries:
                best_idx, meta = get_consensus(valid_summaries)
                swarm_summary = valid_summaries[best_idx]
            else:
                meta = {}
                swarm_summary = "ERROR: All agents failed"

            # Console output and run_log.txt entries are written only for every `print_every`-th sample.
            do_print = print_each and ((idx_item % print_every) == 0)
            if do_print:
                print("\n" + "=" * 90)
                print(f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}")
                print("MESSAGE (preview):", _truncate(msg.replace("\n", " "), 400))
                print("-" * 90)
                for j, c in enumerate(candidates):
                    s = c["summary"]
                    if truncate_console and isinstance(s, str):
                        s = _truncate(s, truncate_console)
                    print(f"AGENT {j+1:02d} | model={c['model']} | latency={float(c.get('latency') or 0.0):.2f}s | attempts={c['attempts']}")
                    print(" ", s)
                print("-" * 90)
                print("CONSENSUS avg_similarity:", meta.get("avg_similarity"))
                print("SWARM SUMMARY:", swarm_summary)
                print("=" * 90 + "\n")

                _log_line("=" * 90)
                _log_line(f"SAMPLE {idx_item+1}/{len(dataset)} | ID: {item.get('id','NA')}")
                _log_line("MESSAGE:")
                _log_line(msg)
                _log_line("-" * 90)
                for j, c in enumerate(candidates):
                    _log_line(f"AGENT {j+1:02d} | model={c['model']} | latency={c.get('latency')} | attempts={c['attempts']}")
                    _log_line(c.get("summary", ""))
                _log_line("-" * 90)
                _log_line(f"CONSENSUS avg_similarity: {meta.get('avg_similarity')}")
                _log_line(f"SWARM SUMMARY: {swarm_summary}")
                _log_line("=" * 90)

            record = {
                "id": item.get("id"),
                "run_name": run_name,
                "k": 3,
                "models": models,
                "swarm_summary": swarm_summary,
                "swarm_candidates": candidates,
                "consensus_metadata": meta,
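                # Skip the factuality check when every agent failed; the placeholder string is not a summary.
                "factuality": check_factuality(msg, swarm_summary) if valid_summaries else None,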
            }

            with open(out_file, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")

    print(f"\n✓ Done. outputs.jsonl: {out_file}")
    if write_text_log:
        print(f"✓ Done. run_log.txt:  {log_file}")

# ------------------------
# HOMO-3: same model x3
# ------------------------
async def run_homo3(model="llama3.1:8b", **kwargs):
    await run_eval_k3_same_prompt(
        run_name=f"colab_homo3_{model.replace(':','_').replace('/','_')}",
        models=[model, model, model],
        **kwargs
    )

# ------------------------
# HETERO-3: 3 different models
# ------------------------
async def run_hetero3(models=("llama3.1:8b", "qwen2.5:7b-instruct", "mistral:7b-instruct"), **kwargs):
    await run_eval_k3_same_prompt(
        run_name="colab_hetero3",
        models=list(models),
        **kwargs
    )

# Example:
await run_homo3(model="llama3.1:8b", max_samples=None, concurrency=10, print_every=100, options=RUN_OPTIONS)
# await run_hetero3(models=("llama3.1:8b","qwen2.5:7b-instruct","gemma:7b-instruct"), max_samples=50, concurrency=3, options=RUN_OPTIONS)


^C
Starting: colab_homo3_llama3.1_8b_1767423259
Models: ['llama3.1:8b', 'llama3.1:8b', 'llama3.1:8b']
Results saved to: /content/slm_evaluation/results/colab_homo3_llama3.1_8b_1767423259/outputs.jsonl
Text log saved to: /content/slm_evaluation/results/colab_homo3_llama3.1_8b_1767423259/run_log.txt
--------------------------------------------------------------------------------


Processing Samples:   0%|          | 1/819 [00:03<47:33,  3.49s/it]


SAMPLE 1/819 | ID: samsum_test_0000
MESSAGE (preview): Hannah: Hey, do you have Betty's number? Amanda: Lemme check Hannah: <file_gif> Amanda: Sorry, can't find it. Amanda: Ask Larry Amanda: He called her last time we were at the park together Hannah: I don't know him well Hannah: <file_gif> Amanda: Don't be shy, he's very nice Hannah: If you say so.. Hannah: I'd rather you texted him Amanda: Just text him 🙂 Hannah: Urgh.. Alright Hannah: Bye Amanda:  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=3.43s | attempts=1
  Hannah asked Amanda for Betty's phone number, but Amanda suggested she ask Larry instead.
AGENT 02 | model=llama3.1:8b | latency=3.43s | attempts=1
  Hannah asked Amanda for Betty's phone number, but Amanda suggested she ask Larry instead.
AGENT 03 | model=llama3.1:8b | latency=3.45s | attempts=1
  Hannah asked Amanda for Betty's phone number, but Amanda suggested asking Lar

Processing Samples:  12%|█▏        | 101/819 [01:24<09:04,  1.32it/s]


SAMPLE 101/819 | ID: samsum_test_0100
MESSAGE (preview): Chris: Hi there! Where are you? Any chance of skyping? Rick: Hi! Our last two days in Cancun before flying to Havana. Yeah, skyping is an idea. When would it suit you? Rick: We don't have the best of connections in the room but I can get you pretty well in the lobby. Chris: What's the time in your place now? Rick: 6:45 pm Chris: It's a quarter to one in the morning here. Am still in front of the b ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.87s | attempts=1
  Chris and Rick discuss scheduling a Skype call for the next day at 3pm, when they are in Cuba and Chris is back home.
AGENT 02 | model=llama3.1:8b | latency=0.81s | attempts=1
  Chris and Rick discuss scheduling a Skype call for the next day at 3pm, when they are in Cuba.
AGENT 03 | model=llama3.1:8b | latency=0.82s | attempts=1
  Chris and Rick discuss and arrange a Skype

Processing Samples:  25%|██▍       | 201/819 [02:42<07:39,  1.34it/s]


SAMPLE 201/819 | ID: samsum_test_0200
MESSAGE (preview): Abdellilah: Where are you? Sam: work Abdellilah: What time you finish? Sam: Not til 5 Abdellilah: Are your bringing him over tonight: Sam: No in the morning: Abdellilah: ok, what time? Sam: About 9. Is that ok? Abdellilah: ok - see you then
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.66s | attempts=1
  Sam will bring someone over to Abdellilah's place on Wednesday at around 9 am.
AGENT 02 | model=llama3.1:8b | latency=0.65s | attempts=1
  Sam will bring someone over to Abdellilah's place on Saturday at around 9am.
AGENT 03 | model=llama3.1:8b | latency=0.61s | attempts=1
  Sam is finishing work at 5 and will bring someone over around 9 the next morning.
------------------------------------------------------------------------------------------
CONSENSUS avg_similarity: 0.767013688882192
SWARM SUMMARY: Sam will bring someone over to Abdellilah'

Processing Samples:  37%|███▋      | 301/819 [04:02<06:36,  1.30it/s]


SAMPLE 301/819 | ID: samsum_test_0300
MESSAGE (preview): Jerry: Hi sweetie :) Janet: Hi sugar ;) Jerry: I'm coming home Janet: Can't wait ;) Jerry: I should be there in 40 minutes Janet: Ok, I'm waiting for you :) Jerry: How was your day? Janet: Oh, it was ok but my boss is a pain in the ass sometimes Jerry: I know, she can be a bitch :P Janet: Yes she can! ;) Jerry: See you later darling Janet: <3
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.61s | attempts=1
  Jerry and Janet are exchanging sweet and playful messages as Jerry heads home after work.
AGENT 02 | model=llama3.1:8b | latency=0.60s | attempts=1
  Jerry and Janet are exchanging affectionate messages as Jerry heads home from work.
AGENT 03 | model=llama3.1:8b | latency=0.58s | attempts=1
  Jerry and Janet are exchanging sweet and playful messages as Jerry heads home after work.
--------------------------------------------------------------

Processing Samples:  49%|████▉     | 401/819 [05:23<06:28,  1.08it/s]


SAMPLE 401/819 | ID: samsum_test_0400
MESSAGE (preview): Jill: so how was your date anyway? :) Susan: it was perfect, he was so sweet!! <3 Jill: tell me everything!! Susan: so first he picked me up from home, all dressed up and everything Jill: suit on? :D Susan: nooo, not like that, but he had a really nice black shirt, elegant shoes, looked like Brad Pitt hahaha Jill: hahahahaha Susan: <file_gif> Jill: so where did he take you? Susan: that's the best  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.65s | attempts=1
  Susan had a fun date with a man who took her to rollerskating disco.
AGENT 02 | model=llama3.1:8b | latency=0.70s | attempts=1
  Susan had a fun and unconventional date with a man who took her rollerskating at a disco.
AGENT 03 | model=llama3.1:8b | latency=0.66s | attempts=1
  Susan had a fun date with a man who took her to rollerskating disco.
---------------------------

Processing Samples:  61%|██████    | 501/819 [06:43<04:16,  1.24it/s]


SAMPLE 501/819 | ID: samsum_test_0500
MESSAGE (preview): Helen: Hey, Simo, are you there? Simon: Yep babe, what's up? Helen: I was calling you before... Simon: Sorry I was on the phone, I didn't hear you... Tell me. Helen: It's a bit embarrassing... The toilet paper is finished, could you fetch me some tissues, please? Simon: Hahaha sure, no worries!
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.55s | attempts=1
  Helen asked Simon to buy her more toilet paper because they were out.
AGENT 02 | model=llama3.1:8b | latency=0.61s | attempts=1
  Helen asked Simon to buy her more tissue because they were out of toilet paper.
AGENT 03 | model=llama3.1:8b | latency=0.59s | attempts=1
  Helen asked Simon to buy her more toilet paper because they were out.
------------------------------------------------------------------------------------------
CONSENSUS avg_similarity: 0.961122969786326
SWARM SUMMARY: Helen

Processing Samples:  73%|███████▎  | 601/819 [08:00<02:37,  1.38it/s]


SAMPLE 601/819 | ID: samsum_test_0600
MESSAGE (preview): Daniel: I'm packing for the trip, are you done with it? Michael: Mostly done, don't forget your passport! Matt: is an ID not enough? Michael: sure not! Brian: why? Michael: Bosnia and Herzegovina is not in the EU, so we will have a proper border control Michael: no passport, no entry Daniel: good you wrote us this I think :P Matt: yes, we're morons  Daniel: haha, yes, a bit Brian: I didn't even kn ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.76s | attempts=1
  The group is reminded that a passport is required for travel to Bosnia and Herzegovina due to its non-EU status.
AGENT 02 | model=llama3.1:8b | latency=0.81s | attempts=1
  The group is discussing travel requirements for a trip to Bosnia and Herzegovina, where a passport is necessary due to border control.
AGENT 03 | model=llama3.1:8b | latency=0.82s | attempts=1
  The gro

Processing Samples:  86%|████████▌ | 701/819 [09:23<01:45,  1.12it/s]


SAMPLE 701/819 | ID: samsum_test_0700
MESSAGE (preview): Charlie: How are you doing. Daisy: Good, seeing a lot of beautiful places Charlie: Where are you now? Daisy: in Naples  Charlie: Pity, I hoped you may be still in Sicily. Daisy: No, we took a ferry early in the morning yesterday. Charlie: I am still thinking about the evening we spent together. Daisy: Yes, it was nice. Charlie: I would like to meet you again. Daisy: But where? Charlie: Either in t ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.71s | attempts=1
  Charlie and Daisy make plans to meet up again in Florence, Italy on September 1st.
AGENT 02 | model=llama3.1:8b | latency=0.74s | attempts=1
  Charlie and Daisy make plans to meet up again in Florence, Italy on September 1st.
AGENT 03 | model=llama3.1:8b | latency=0.84s | attempts=1
  Charlie and Daisy, who had a romantic evening together in Sicily, plan to meet again in F

Processing Samples:  98%|█████████▊| 801/819 [10:43<00:14,  1.24it/s]


SAMPLE 801/819 | ID: samsum_test_0800
MESSAGE (preview): Linda: Hi Dad, I want to buy flowers for mum! But I don't remember which one she likes :( Michael: Well, she likes all the flowers I believe Linda: That doesn't help! I'm on a flower market right now! Michael: Send me some pics then Linda: <file_photo>  Michael: Tulips are nice, roses too Linda:  What about carnations? Michael: No, carnations are boring :D Linda: Thanks Dad, srsly… Michael:  What  ...[truncated]
------------------------------------------------------------------------------------------
AGENT 01 | model=llama3.1:8b | latency=0.82s | attempts=1
  Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the process by suggesting specific types of flowers.
AGENT 02 | model=llama3.1:8b | latency=0.72s | attempts=1
  Linda asks her dad for help choosing flowers to buy for their mother and he guides her through the options.
AGENT 03 | model=llama3.1:8b | latency=0.75s | attempts=1
 

Processing Samples: 100%|██████████| 819/819 [10:56<00:00,  1.25it/s]


✓ Done. outputs.jsonl: /content/slm_evaluation/results/colab_homo3_llama3.1_8b_1767423259/outputs.jsonl
✓ Done. run_log.txt:  /content/slm_evaluation/results/colab_homo3_llama3.1_8b_1767423259/run_log.txt
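
**Note: what "consensus" selects.** The run above prints a `CONSENSUS avg_similarity` value and a `SWARM SUMMARY` chosen by `get_consensus`, which is defined elsewhere in the notebook. For readers looking at this cell in isolation, here is a minimal sketch of the idea stated in the notebook introduction (pick the most "central" candidate by cosine similarity). It works on plain lists of floats rather than SentenceTransformer embeddings. The names `cosine` and `pick_central` are hypothetical, and reporting the winner's mean similarity to the other candidates as `avg_similarity` is an assumption; the notebook's own function may compute that value differently.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pick_central(vectors):
    """Return (best_idx, meta) for the vector with the highest mean similarity to the others."""
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one vector")
    if n == 1:
        return 0, {"avg_similarity": 1.0}
    means = []
    for i in range(n):
        sims = [cosine(vectors[i], vectors[j]) for j in range(n) if j != i]
        means.append(sum(sims) / len(sims))
    # Ties resolve to the lowest index, because max() keeps the first maximum it sees.
    best_idx = max(range(n), key=means.__getitem__)
    return best_idx, {"avg_similarity": means[best_idx]}
```

With three identical candidates, as happens often in the homogeneous run above, every mean similarity is 1.0 and the first candidate is returned.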





In [78]:
!cp -r /content/slm_evaluation /content/drive/MyDrive/slm_evaluation_new
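
**Note: a quick sanity check on a finished run.** Each line of `outputs.jsonl` written by `run_eval_k3_same_prompt` carries `swarm_summary` and a `swarm_candidates` list whose entries have `summary`, `latency`, and `attempts`. The sketch below reads one such file and reports a few aggregate counts. The function name `summarize_run` and the choice of statistics are illustrative assumptions, not part of the original notebook.

```python
import json
from statistics import mean


def _is_valid(text):
    """A candidate counts as valid if it is a string that does not start with 'ERROR'."""
    return isinstance(text, str) and not text.startswith("ERROR")


def summarize_run(out_file):
    """Aggregate basic statistics from an outputs.jsonl file produced by the k=3 runner."""
    samples = failed = unanimous = retried_calls = 0
    latencies = []
    with open(out_file, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            samples += 1
            if not _is_valid(rec.get("swarm_summary")):
                failed += 1
            cands = rec.get("swarm_candidates", [])
            texts = [c.get("summary") for c in cands]
            # Unanimous: every agent succeeded and all produced the same string.
            if texts and all(_is_valid(t) for t in texts) and len(set(texts)) == 1:
                unanimous += 1
            for c in cands:
                if c.get("latency") is not None:
                    latencies.append(float(c["latency"]))
                if int(c.get("attempts", 1)) > 1:
                    retried_calls += 1
    return {
        "samples": samples,
        "failed_samples": failed,
        "unanimous_samples": unanimous,
        "retried_calls": retried_calls,
        "mean_latency_s": mean(latencies) if latencies else None,
    }
```

For example, `summarize_run(".../outputs.jsonl")` on the homogeneous run would show how often all three `llama3.1:8b` agents returned exactly the same sentence, which is a rough indicator of how much diversity the swarm has to work with.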