# Anthropic Interviewer – Human-Only Conversation Dataset

**Dataset source:** Hugging Face `Anthropic/AnthropicInterviewer` (workforce, creatives, scientists splits).

**Goal:** Merge all splits and keep only human speech for analysis — one row per transcript with human turns in chronological order.

## BDD

### Imports et préparation

In [1]:
# pip install datasets pandas tqdm pyarrow  # uncomment if needed

from datasets import load_dataset
import pandas as pd
import re
from tqdm import tqdm
from pathlib import Path

### BDD et préparation – Chargement et fusion des splits

In [2]:
ds = load_dataset("Anthropic/AnthropicInterviewer")

workforce = ds["workforce"].to_pandas()
creatives = ds["creatives"].to_pandas()
scientists = ds["scientists"].to_pandas()

workforce["split"] = "workforce"
creatives["split"] = "creatives"
scientists["split"] = "scientists"

df_raw = pd.concat([
    workforce[["split", "transcript_id", "text"]],
    creatives[["split", "transcript_id", "text"]],
    scientists[["split", "transcript_id", "text"]],
], ignore_index=True)

print("Shape:", df_raw.shape)
print("\nSplit counts:")
print(df_raw["split"].value_counts())

Shape: (1250, 3)

Split counts:
split
workforce     1000
creatives      125
scientists     125
Name: count, dtype: int64


### Utilitaires de parsing (speaker, nettoyage, transcript)

In [3]:
def detect_speaker(line):
    """Return 'AI', 'HUMAN', or 'unknown' based on line prefix."""
    line_lower = line.strip().lower()
    if any(line_lower.startswith(p) for p in ("assistant:", "claude:", "interviewer:")):
        return "AI"
    if any(line_lower.startswith(p) for p in ("user:", "participant:", "interviewee:")):
        return "HUMAN"
    return "unknown"


def clean_text(text):
    """Remove speaker prefix and normalize whitespace."""
    if not text or not isinstance(text, str):
        return ""
    text = re.sub(r"^(Assistant|Claude|Interviewer|User|Participant|Interviewee):\s*", "", text, flags=re.IGNORECASE)
    return " ".join(text.split()).strip()


def parse_transcript(text):
    """
    Parse transcript into list of (speaker, text) turns.
    Splits on common speaker prefixes and returns chronological turns.
    """
    if not text or not isinstance(text, str):
        return []
    lines = [ln.strip() for ln in text.split("\n") if ln.strip()]
    turns = []
    current_speaker = None
    current_text = []
    for line in lines:
        speaker = detect_speaker(line)
        if speaker != "unknown":
            if current_speaker is not None and current_text:
                turns.append((current_speaker, clean_text(" ".join(current_text))))
            current_speaker = speaker
            current_text = [line]
        else:
            if current_text:
                current_text.append(line)
    if current_speaker is not None and current_text:
        turns.append((current_speaker, clean_text(" ".join(current_text))))
    return turns

### Construction du jeu human-only (1 ligne = 1 transcript)

In [4]:
MAX_HUMAN_TURNS = 40

rows = []
for _, row in tqdm(df_raw.iterrows(), total=len(df_raw), desc="Parsing transcripts"):
    split = row["split"]
    transcript_id = row["transcript_id"]
    text = row["text"]
    turns = parse_transcript(text)
    human_turns = [t[1] for t in turns if t[0] == "HUMAN" and t[1]]
    truncated = len(human_turns) > MAX_HUMAN_TURNS
    if truncated:
        human_turns = human_turns[:MAX_HUMAN_TURNS]
    n_human_turns = len(human_turns)
    out = {
        "split": split,
        "transcript_id": transcript_id,
        "n_human_turns": n_human_turns,
        "truncated": truncated,
    }
    for i in range(1, MAX_HUMAN_TURNS + 1):
        out[f"human_turn_{i:02d}"] = human_turns[i - 1] if i <= n_human_turns else ""
    rows.append(out)

df_final = pd.DataFrame(rows)
df_final

Parsing transcripts: 100%|██████████| 1250/1250 [00:01<00:00, 1088.50it/s]


Unnamed: 0,split,transcript_id,n_human_turns,truncated,human_turn_01,human_turn_02,human_turn_03,human_turn_04,human_turn_05,human_turn_06,...,human_turn_31,human_turn_32,human_turn_33,human_turn_34,human_turn_35,human_turn_36,human_turn_37,human_turn_38,human_turn_39,human_turn_40
0,workforce,work_0000,13,False,"No, I don't have any questions. Let's do it! A...",I pretty rarely use AI in my typical workday. ...,I will open the AI model and provide it with a...,"I always modify them. They're never ""perfect""....",I do not believe there are any tasks that AI c...,"I turn to AI when I'm stuck. For example, if t...",...,,,,,,,,,,
1,workforce,work_0001,12,False,That sounds good. AI: Great! Let's dive in the...,I use AI sparingly at my job. I only use it to...,I use Grammarly primarily. I'd been asked to u...,Sure. I almost always use its spelling suggest...,"For the most part, I'd prefer to handle anythi...",Sure. It's important that my clients can under...,...,,,,,,,,,,
2,workforce,work_0002,11,False,"No questions, we can begin AI: Great! Let's di...",I've used it primarily in the creation of spre...,"So using my specific use case example, I would...","From my past experience, it usually takes a co...","well the deciding factor for me is, if Im spen...","Unfortunately, I haven't found any tasks or si...",...,,,,,,,,,,
3,workforce,work_0003,13,False,Soinds good. Let's proceed. AI: Great! Let's d...,I'm in real estate so I use Ai for a variety o...,For property descriptions I will typically ent...,"I definitely include the basics of room count,...","Definitely, I've chosen not to utilize any Ai ...",I think I would need to feel confident that th...,...,,,,,,,,,,
4,workforce,work_0004,10,False,"Sounds good to me, let's begin. AI: Great! Let...",I'm a data analyst at SUEZ and I use AI for a ...,"Yes , I often tell it to write the entire code...",I'd prefer if AI could handle it independently...,"Yes , for super sensitive information, like bu...",I am the only person who really uses AI. I wor...,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1245,scientists,science_0120,8,False,"That sounds good, I have no questions at this ...",A recent project I've worked on is building a ...,"I use Python extensively in my work, because i...",I found that it was much faster to decompose s...,Sometimes decisions are made using information...,I'll often test the functions on small synthet...,...,,,,,,,,,,
1246,scientists,science_0121,11,False,Yes that sounds good. I have no questions. AI:...,The research component of my work largely invo...,The first time we used AI was to ask about how...,We needed a data structure that enabled us to ...,We asked the AI to pull out the data points we...,"So, calculating weighted arithmetic means of g...",...,,,,,,,,,,
1247,scientists,science_0122,9,False,"No AI: Great! Let's dive in then. To start, co...",I was working on getting ideas for a dissertat...,I am still mulling over the type of dissertati...,Having AI search for keywords reduced the time...,"If a reference was provided by the authors, I ...",I haven't begun developing my methodology as y...,...,,,,,,,,,,
1248,scientists,science_0123,10,False,Sounds good to me! AI: Wonderful! Let's dive r...,I've been working with my student on investiga...,"I had the idea when I was in grad school, but ...","Yes, not initially, but when it came time to a...","I've wanted to learn R for a long time, but I'...",I had a gut feeling based on visuals what the ...,...,,,,,,,,,,


### Contrôles de cohérence

In [5]:
print("Total transcripts:", len(df_final))
print("Mean n_human_turns:", df_final["n_human_turns"].mean().round(2))
print("\nStats per split:")
print(df_final.groupby("split").agg(
    count=("transcript_id", "count"),
    mean_turns=("n_human_turns", "mean"),
).round(2))
print("\n% truncated:", (df_final["truncated"].sum() / len(df_final) * 100).round(1), "%")

# Assert no AI text in human turn columns
ai_prefixes = ("assistant:", "claude:", "interviewer:")
turn_cols = [c for c in df_final.columns if c.startswith("human_turn_")]
for col in turn_cols:
    for val in df_final[col].dropna():
        if val and isinstance(val, str):
            val_lower = val[:50].lower()
            assert not any(val_lower.startswith(p) for p in ai_prefixes), f"AI text in {col}"
print("\nAssertion passed: no AI text in df_final human turn columns.")

Total transcripts: 1250
Mean n_human_turns: 11.6

Stats per split:
            count  mean_turns
split                        
creatives     125       10.07
scientists    125        9.28
workforce    1000       12.08

% truncated: 0.0 %

Assertion passed: no AI text in df_final human turn columns.


### Export (CSV et Parquet)

In [6]:
out_dir = Path("./out")
out_dir.mkdir(parents=True, exist_ok=True)

df_final.to_csv(out_dir / "df_final.csv", index=False)
df_final.to_parquet(out_dir / "df_final.parquet", index=False)

print("Saved:", out_dir / "df_final.csv", out_dir / "df_final.parquet")

Saved: out\df_final.csv out\df_final.parquet


## Début d'analyses

### 1ère analyse : Discours vs réalité

Dans cette partie, je m’intéresse à ce que j’appelle le décalage entre le discours et la réalité du travail avec l’IA.

Quand les personnes parlent de leur usage de l’IA dans un cadre officiel, comme une interview, elles ne décrivent pas seulement ce qu’elles font concrètement. Elles décrivent aussi ce qu’il est socialement acceptable de dire.

On observe très souvent des formulations prudentes, par exemple : “Je m’en sers juste pour corriger l’orthographe” ou “je fais attention à ne pas trop déléguer”.

Mais dans le même entretien, ces mêmes personnes expliquent ensuite qu’elles génèrent des campagnes complètes, des rapports, ou qu’elles s’appuient sur l’IA pour prendre des décisions importantes.

L’objectif ici n’est pas de dire que les gens mentent, mais de montrer qu’il existe déjà une norme implicite autour du “bon usage” de l’IA.

Autrement dit, l’impact de l’IA sur le travail commence d’abord dans la manière dont on en parle, avant même de se voir clairement dans les tâches elles-mêmes.

#### LLM Annotation (Ollama / qwen2.5:7b-instruct)

But : annoter automatiquement tous les tours **humains** présents dans `df_final` via un LLM local.
Modèle : `qwen2.5:7b-instruct` via Ollama (`http://localhost:11434`).
Outputs : `df_final_annot` (turn-level annoté), `df_metrics` (Axe 1) + graphes matplotlib.

In [8]:
import requests
import json
import time

OLLAMA_BASE_URL = "http://localhost:11434"
MODEL_NAME = "qwen2.5:7b-instruct"

def _ollama_get_tags():
    r = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=10)
    r.raise_for_status()
    return r.json()

try:
    tags = _ollama_get_tags()
    models = tags.get("models", [])
    model_names = [m.get("name") for m in models if isinstance(m, dict)]
    print("Ollama OK. Models available (sample):", model_names[:10])
    if MODEL_NAME not in model_names:
        print(f"\nModel '{MODEL_NAME}' not found in /api/tags.")
        print(f"Run: ollama pull {MODEL_NAME}")
    else:
        print(f"Model found: {MODEL_NAME}")
except Exception as e:
    print("Failed to reach Ollama at", OLLAMA_BASE_URL)
    raise

Ollama OK. Models available (sample): ['qwen2.5:7b-instruct']
Model found: qwen2.5:7b-instruct


In [9]:
import re

_INVISIBLES_RE = re.compile(r"[\u200B-\u200F\u202A-\u202E\u2060-\u206F\ufeff]")
_WS_RE = re.compile(r"[ \t\r\f\v]+")


def clean_text_basic(s):
    """Basic cleanup: remove invisibles, normalize whitespace/newlines, strip."""
    if s is None:
        return ""
    if not isinstance(s, str):
        s = str(s)
    s = _INVISIBLES_RE.sub("", s)
    s = s.replace("\r\n", "\n").replace("\r", "\n")
    # Normalize horizontal whitespace
    s = _WS_RE.sub(" ", s)
    # Collapse too many blank lines
    s = re.sub(r"\n{3,}", "\n\n", s)
    return s.strip()


def strict_json_load(s):
    """Load strict JSON; if it fails, try extracting first JSON array block."""
    if s is None:
        raise ValueError("Cannot parse JSON from None")
    if not isinstance(s, str):
        s = str(s)

    try:
        return json.loads(s)
    except Exception:
        pass

    # Try extract first top-level JSON array
    start = s.find("[")
    end = s.rfind("]")
    if start != -1 and end != -1 and end > start:
        candidate = s[start : end + 1]
        return json.loads(candidate)

    raise ValueError("Invalid JSON (expected a JSON array)")

In [10]:
SYSTEM_PROMPT = """You are an annotation engine.

Task: For each human turn, classify the *stance* about AI usage.

Labels (choose exactly one):
- MINIMIZATION: downplays AI use, frames it as minor, safe, superficial, or "just" for small tasks.
- PRACTICAL_USE: describes concrete, substantive, or workflow-integrated AI use (generation, decision support, automation, production use).
- NEUTRAL: neither downplays nor describes substantive use; unclear/mixed without clear stance.

Rules:
- Do NOT judge truthfulness, honesty, or intention.
- Do NOT infer beyond the text.
- Return STRICT JSON ONLY. No prose, no markdown.

Output schema: a JSON array of objects, one per input record, with fields:
- row_id (string): copied from input
- label (string): one of MINIMIZATION | PRACTICAL_USE | NEUTRAL
- confidence (number): between 0 and 1
- evidence (string): a short quote/span from the input supporting the label
"""


def _validate_annotation_item(item):
    if not isinstance(item, dict):
        raise ValueError("Each output item must be an object")
    for k in ("row_id", "label", "confidence", "evidence"):
        if k not in item:
            raise ValueError(f"Missing field: {k}")

    row_id = item["row_id"]
    label = item["label"]
    conf = item["confidence"]
    ev = item["evidence"]

    if not isinstance(row_id, str) or not row_id:
        raise ValueError("row_id must be a non-empty string")
    if label not in {"MINIMIZATION", "PRACTICAL_USE", "NEUTRAL"}:
        raise ValueError("Invalid label")
    if not isinstance(conf, (int, float)) or not (0 <= float(conf) <= 1):
        raise ValueError("confidence must be in [0,1]")
    if not isinstance(ev, str):
        raise ValueError("evidence must be a string")

    return {
        "row_id": row_id,
        "label": label,
        "confidence": float(conf),
        "evidence": ev.strip(),
    }


def call_ollama(batch_records, max_retries=2, sleep_s=0.5):
    """Call Ollama generate API with strict JSON parsing + repair retries."""
    if not isinstance(batch_records, list) or not batch_records:
        return []

    input_json = json.dumps(batch_records, ensure_ascii=False)

    base_prompt = (
        "INPUT JSON:\n"
        + input_json
        + "\n\nOUTPUT JSON ONLY:"  # force strict
    )

    def _post(prompt_text):
        payload = {
            "model": MODEL_NAME,
            "stream": False,
            "system": SYSTEM_PROMPT,
            "prompt": prompt_text,
            "options": {"temperature": 0},
        }
        r = requests.post(f"{OLLAMA_BASE_URL}/api/generate", json=payload, timeout=120)
        r.raise_for_status()
        data = r.json()
        return data.get("response", "")

    prompt = base_prompt
    last_raw = None

    for attempt in range(max_retries + 1):
        raw = _post(prompt)
        last_raw = raw
        try:
            parsed = strict_json_load(raw)
            if not isinstance(parsed, list):
                raise ValueError("Top-level JSON must be an array")

            cleaned = [_validate_annotation_item(x) for x in parsed]

            expected = {str(r.get("row_id")) for r in batch_records}
            cleaned = [x for x in cleaned if x["row_id"] in expected]

            return cleaned
        except Exception as e:
            if attempt >= max_retries:
                raise ValueError(
                    "Ollama JSON parse/validation failed after retries. "
                    f"Last error: {e}\nLast raw output:\n{last_raw}"
                )

            prompt = (
                "The previous output was invalid JSON or did not match the schema.\n"
                "Return ONLY a valid JSON array matching the schema exactly.\n"
                "Do not add any commentary.\n\n"
                "INPUT JSON:\n"
                + input_json
                + "\n\nINVALID OUTPUT (for reference):\n"
                + (raw if isinstance(raw, str) else str(raw))
                + "\n\nOUTPUT JSON ONLY:"
            )
            time.sleep(sleep_s)

    return []

In [11]:
import pandas as pd
import numpy as np

# Source unique
assert "df_final" in globals(), "df_final is not defined in the notebook"

# Detect wide (human_turn_XX) vs already turn-level
wide_turn_cols = [c for c in df_final.columns if isinstance(c, str) and c.startswith("human_turn_")]

if wide_turn_cols:
    # Build turn-level table from df_final (in-memory only)
    id_vars = [c for c in ["split", "transcript_id"] if c in df_final.columns]
    if "transcript_id" not in df_final.columns:
        raise ValueError("df_final must contain transcript_id")
    if "split" not in df_final.columns:
        print("Warning: df_final has no 'split' column; graphs by split will be limited.")

    df_turns = df_final[id_vars + wide_turn_cols].melt(
        id_vars=id_vars,
        value_vars=wide_turn_cols,
        var_name="turn_col",
        value_name="text",
    )
    # turn_id from suffix
    df_turns["turn_id"] = df_turns["turn_col"].str.replace("human_turn_", "", regex=False)
    df_turns["turn_id"] = pd.to_numeric(df_turns["turn_id"], errors="coerce").astype("Int64")

    TEXT_COL = "text"
else:
    # Turn-level already
    candidates = ["human_text", "text", "turn_text", "cleaned_text", "utterance"]
    found = [c for c in candidates if c in df_final.columns]
    if not found:
        raise ValueError(
            "Could not find a text column in df_final. Expected one of " + ", ".join(candidates)
        )
    TEXT_COL = found[0]
    df_turns = df_final.copy()

# Clean + drop empty
_df_text = df_turns[TEXT_COL].apply(clean_text_basic)
df_turns[TEXT_COL] = _df_text

df_turns = df_turns[df_turns[TEXT_COL].astype(str).str.len() > 0].copy()

# Stable row_id
if "transcript_id" in df_turns.columns and "turn_id" in df_turns.columns:
    df_turns["row_id"] = df_turns["transcript_id"].astype(str) + "::" + df_turns["turn_id"].astype(str)
else:
    df_turns["row_id"] = df_turns.index.astype(str)

print("Turn-level rows to annotate:", len(df_turns))
print("TEXT_COL:", TEXT_COL)
print(df_turns[[c for c in ["split", "transcript_id", "turn_id", "row_id", TEXT_COL] if c in df_turns.columns]].head(3))

Turn-level rows to annotate: 14495
TEXT_COL: text
       split transcript_id  turn_id        row_id  \
0  workforce     work_0000        1  work_0000::1   
1  workforce     work_0001        1  work_0001::1   
2  workforce     work_0002        1  work_0002::1   

                                                text  
0  No, I don't have any questions. Let's do it! A...  
1  That sounds good. AI: Great! Let's dive in the...  
2  No questions, we can begin AI: Great! Let's di...  


In [None]:
from math import ceil

batch_size = 5  # configurable

MAX_CHARS = 800  # truncate input text to max 600–800 characters
records = [
    {"row_id": rid, "text": (txt[:MAX_CHARS] if isinstance(txt, str) else str(txt)[:MAX_CHARS])}
    for rid, txt in zip(df_turns["row_id"].astype(str), df_turns[TEXT_COL].astype(str))
]

print("Total records:", len(records), "| batch_size:", batch_size, "| batches:", ceil(len(records)/batch_size))

results_by_row_id = {}
failed_batches = 0

for start in range(0, len(records), batch_size):
    batch = records[start : start + batch_size]
    batch_ids = [b["row_id"] for b in batch]

    try:
        out = call_ollama(batch)
        for item in out:
            results_by_row_id[item["row_id"]] = item
    except Exception as e:
        failed_batches += 1
        print(f"Batch {start//batch_size + 1} failed:", e)
        # Continue; we'll mark missing as NA

    if (start // batch_size + 1) % 10 == 0:
        print(f"Progress: {start + len(batch)}/{len(records)} | annotated: {len(results_by_row_id)} | failed_batches: {failed_batches}")

# Attach to df_turns
ann = pd.DataFrame.from_records(list(results_by_row_id.values()))

if ann.empty:
    print("No annotations returned.")
    df_final_annot = df_turns.copy()
    df_final_annot["label"] = pd.NA
    df_final_annot["confidence"] = pd.NA
    df_final_annot["evidence"] = pd.NA
else:
    df_final_annot = df_turns.merge(ann[["row_id", "label", "confidence", "evidence"]], on="row_id", how="left")

print("Annotated turn-level df shape:", df_final_annot.shape)
print(df_final_annot[[c for c in ["split", "transcript_id", "turn_id", "row_id", "label", "confidence"] if c in df_final_annot.columns]].head(5))

Total records: 14495 | batch_size: 5 | batches: 2899
Progress: 50/14495 | annotated: 50 | failed_batches: 0
Progress: 100/14495 | annotated: 100 | failed_batches: 0
Progress: 150/14495 | annotated: 150 | failed_batches: 0
Progress: 200/14495 | annotated: 200 | failed_batches: 0


In [None]:
# Metrics Axe 1 (discours vs réalité) à partir de df_final_annot

required_cols = [c for c in ["transcript_id", "split", "label"] if c in df_final_annot.columns]
if "transcript_id" not in df_final_annot.columns or "label" not in df_final_annot.columns:
    raise ValueError("df_final_annot must have at least transcript_id and label")

# Ordering key
if "turn_id" in df_final_annot.columns:
    order_col = "turn_id"
else:
    order_col = None

rows = []
for transcript_id, g in df_final_annot.groupby("transcript_id", sort=False):
    split = g["split"].iloc[0] if "split" in g.columns else "(unknown)"

    labels = g["label"].dropna().astype(str)
    has_min = (labels == "MINIMIZATION").any()
    has_use = (labels == "PRACTICAL_USE").any()
    discursive_gap = bool(has_min and has_use)

    first_min = np.nan
    first_use = np.nan

    if order_col is not None:
        gg = g.copy()
        # ensure numeric order where possible
        gg[order_col] = pd.to_numeric(gg[order_col], errors="coerce")
        gg = gg.sort_values(order_col)
        min_rows = gg[gg["label"] == "MINIMIZATION"]
        use_rows = gg[gg["label"] == "PRACTICAL_USE"]
        if len(min_rows):
            first_min = float(min_rows[order_col].iloc[0])
        if len(use_rows):
            first_use = float(use_rows[order_col].iloc[0])
    else:
        # fall back to appearance order in the dataframe
        min_idx = g.index[g["label"] == "MINIMIZATION"]
        use_idx = g.index[g["label"] == "PRACTICAL_USE"]
        if len(min_idx):
            first_min = float(min_idx.min())
        if len(use_idx):
            first_use = float(use_idx.min())

    gap_distance = np.nan
    if np.isfinite(first_min) and np.isfinite(first_use):
        gap_distance = first_use - first_min

    rows.append(
        {
            "split": split,
            "transcript_id": transcript_id,
            "has_min": has_min,
            "has_use": has_use,
            "discursive_gap": discursive_gap,
            "first_min_turn_id": first_min,
            "first_use_turn_id": first_use,
            "gap_distance": gap_distance,
        }
    )

df_metrics = pd.DataFrame(rows)
print("df_metrics shape:", df_metrics.shape)
print(df_metrics.head())

if "split" in df_metrics.columns:
    print("\nDiscursive gap rate by split:")
    print(df_metrics.groupby("split")["discursive_gap"].mean().sort_values(ascending=False).round(3))

In [None]:
import matplotlib.pyplot as plt

# 1) Bar: discursive_gap rate by split
if "split" in df_metrics.columns and df_metrics["split"].nunique() > 0:
    rates = df_metrics.groupby("split")["discursive_gap"].mean().sort_index()
    plt.figure(figsize=(7, 4))
    plt.bar(rates.index.astype(str), rates.values)
    plt.ylim(0, 1)
    plt.ylabel("Discursive gap rate")
    plt.title("Discursive gap rate by split")
    plt.xticks(rotation=20, ha="right")
    plt.tight_layout()
    plt.show()
else:
    print("No 'split' available in df_metrics for split-level bar chart.")

# 2) Stacked bar: label distribution by split
if "split" in df_final_annot.columns:
    tmp = df_final_annot.copy()
    tmp = tmp[tmp["label"].notna()].copy()
    counts = (
        tmp.groupby(["split", "label"]).size().unstack("label").fillna(0).astype(int)
    )
    # Ensure consistent label order
    for col in ["MINIMIZATION", "PRACTICAL_USE", "NEUTRAL"]:
        if col not in counts.columns:
            counts[col] = 0
    counts = counts[["MINIMIZATION", "PRACTICAL_USE", "NEUTRAL"]]

    splits = counts.index.astype(str)
    bottoms = np.zeros(len(counts))

    plt.figure(figsize=(8, 4))
    for lab in counts.columns:
        vals = counts[lab].values
        plt.bar(splits, vals, bottom=bottoms, label=lab)
        bottoms += vals

    plt.ylabel("# turns")
    plt.title("Label distribution by split (turn-level)")
    plt.xticks(rotation=20, ha="right")
    plt.legend(title="label")
    plt.tight_layout()
    plt.show()
else:
    print("No 'split' available in df_final_annot for stacked bar chart.")

# 3) Histogram: gap_distance overall
vals = df_metrics["gap_distance"].dropna().astype(float)
if len(vals):
    plt.figure(figsize=(7, 4))
    plt.hist(vals.values, bins=30)
    plt.xlabel("gap_distance (first_use - first_min)")
    plt.ylabel("# transcripts")
    plt.title("Distribution of gap_distance")
    plt.tight_layout()
    plt.show()
else:
    print("No finite gap_distance values to plot.")

In [None]:
from pathlib import Path

out_dir = Path("./out")
out_dir.mkdir(parents=True, exist_ok=True)

# Exports finaux
(df_final_annot).to_csv(out_dir / "df_final_annot.csv", index=False)
(df_metrics).to_csv(out_dir / "df_metrics.csv", index=False)

# Parquet (si dépendance dispo)
try:
    (df_final_annot).to_parquet(out_dir / "df_final_annot.parquet", index=False)
    print("Saved Parquet:", out_dir / "df_final_annot.parquet")
except Exception as e:
    print("Parquet export skipped (missing engine like pyarrow/fastparquet). Error:", e)

print("Saved:", out_dir / "df_final_annot.csv", "and", out_dir / "df_metrics.csv")

In [None]:
# --- Model switch (Ollama) ---
# Switch the local model used for annotation.

MODEL = "mistral-7b-instruct"  # or: "llama3.1:13b-instruct"
MODEL_NAME = MODEL  # backward-compat for earlier cells

# Verify Ollama is reachable + model is available
r = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=10)
r.raise_for_status()
tags = r.json()
model_names = [m.get("name") for m in tags.get("models", []) if isinstance(m, dict)]
print("Ollama OK. Models available (sample):", model_names[:10])

if MODEL not in model_names:
    print(f"\nModel '{MODEL}' not found in /api/tags.")
    print(f"Run: ollama pull {MODEL}")
else:
    print(f"Model found: {MODEL}")

In [None]:
# --- call_ollama override to use MODEL ---
# Same logic as before, only switching the target model + num_predict.


def call_ollama(batch_records, max_retries=2, sleep_s=0.5):
    """Call Ollama generate API with strict JSON parsing + repair retries."""
    if not isinstance(batch_records, list) or not batch_records:
        return []

    input_json = json.dumps(batch_records, ensure_ascii=False)

    base_prompt = (
        "INPUT JSON:\n"
        + input_json
        + "\n\nOUTPUT JSON ONLY:"  # force strict
    )

    def _post(prompt_text):
        payload = {
            "model": MODEL,
            "stream": False,
            "system": SYSTEM_PROMPT,
            "prompt": prompt_text,
            "options": {
                "temperature": 0,
                "num_predict": 120,
            },
        }
        r = requests.post(
            f"{OLLAMA_BASE_URL}/api/generate",
            json=payload,
            timeout=(10, None),
        )
        r.raise_for_status()
        data = r.json()
        return data.get("response", "")

    prompt = base_prompt
    last_raw = None

    for attempt in range(max_retries + 1):
        raw = _post(prompt)
        last_raw = raw
        try:
            parsed = strict_json_load(raw)
            if not isinstance(parsed, list):
                raise ValueError("Top-level JSON must be an array")

            cleaned = [_validate_annotation_item(x) for x in parsed]

            expected = {str(r.get("row_id")) for r in batch_records}
            cleaned = [x for x in cleaned if x["row_id"] in expected]

            return cleaned
        except Exception as e:
            if attempt >= max_retries:
                raise ValueError(
                    "Ollama JSON parse/validation failed after retries. "
                    f"Last error: {e}\nLast raw output:\n{last_raw}"
                )

            prompt = (
                "The previous output was invalid JSON or did not match the schema.\n"
                "Return ONLY a valid JSON array matching the schema exactly.\n"
                "Do not add any commentary.\n\n"
                "INPUT JSON:\n"
                + input_json
                + "\n\nINVALID OUTPUT (for reference):\n"
                + (raw if isinstance(raw, str) else str(raw))
                + "\n\nOUTPUT JSON ONLY:"
            )
            time.sleep(sleep_s)

    return []


print("call_ollama now using MODEL =", MODEL)

### 2ème analyse : l'IA transforme le travail