# Dialogue Summarization Workshop. From English to Low Resource Languages

This notebook is a hands on, end to end tutorial on summarizing multi speaker dialogue. You will start with English, then switch into a low resource mindset by restricting data, adding noise, and applying techniques that transfer well when you have little labeled data and limited tools.

You can run everything on CPU. If you have a GPU, the optional LLM section will run faster.

## What you will build

1. A clean dialogue dataset from raw text.
2. A strong non LLM baseline summarizer (TextRank).
3. An LLM based summarizer with prompt engineering (zero shot, one shot, few shot).
4. A small evaluation harness (ROUGE like metrics, plus sanity checks).
5. A low resource adaptation playbook you can reuse for any language.

## How to use this notebook

- Cells marked **Checkpoint** are the recommended stopping points.
- Cells marked **Challenge** are optional. They still run out of the box, then you can modify them to earn extra points.
- If you get stuck, re run from the top. Most steps are deterministic when the random seed is fixed.


## 0. Setup

### Option A. Minimal pip setup (recommended for workshops)

Run the next cell. It installs only what this notebook uses.

### Option B. Conda environment (Python 3.9)

Use this option if you prefer an isolated environment.

```bash
conda create -n dialogue-sum python=3.9 -y
conda activate dialogue-sum
pip install -U pip
pip install "numpy<2" pandas scikit-learn networkx matplotlib tqdm
pip install "transformers==4.49.0" "datasets==3.2.0" "accelerate>=0.25.0" sentencepiece
pip install rouge-score evaluate
pip install ipywidgets
```

If you are on a managed cluster, you may need to load a CUDA module before installing PyTorch. Use the PyTorch install command recommended for your platform.


In [None]:
# Minimal dependencies for this notebook.
# If you already have these installed, you can skip this cell.

import sys, subprocess

def pip_install(pkgs):
    cmd = [sys.executable, "-m", "pip", "install", "-q"] + pkgs
    print("Running:", " ".join(cmd))
    subprocess.check_call(cmd)

pkgs = [
    "numpy<2",
    "pandas",
    "scikit-learn",
    "networkx",
    "matplotlib",
    "tqdm",
    "rouge-score",
    "evaluate",
    "transformers==4.49.0",
    "datasets==3.2.0",
    "accelerate",
    "sentencepiece",
]
try:
    import ipywidgets  # noqa
except Exception:
    pkgs.append("ipywidgets")

try:
    pip_install(pkgs)
    print("Done.")
except Exception as e:
    print("Install step failed. You can continue if you already have the packages.")
    print("Error:", repr(e))


In [None]:
import re
import math
import random
from typing import List, Dict

import numpy as np
import pandas as pd
from tqdm.auto import tqdm

random.seed(842)
np.random.seed(842)

print("Imports ready.")


## 1. Get a real dialogue dataset

To keep this notebook self contained, we use a public domain English play. The text is not invented for this tutorial. It is an excerpt from *The Importance of Being Earnest* by Oscar Wilde (first performed in 1895, public domain in many jurisdictions).

We will parse it into speaker turns, then create short dialogue segments that resemble real conversations.


In [None]:
RAW_TEXT = '''
[Enter Lane.]

LANE. Why, Mr. Worthing, I suppose this is one of your pleasant
surprises? I have been expecting you back some time ago.

JACK. I have not been able to return sooner. I have been detained in
town.

LANE. I have received a message from Mr. Algernon. He says he will be
down at four o'clock.

JACK. Is Mr. Algernon here?

LANE. Yes, sir. He is in the dining-room.

JACK. I must see him at once.

[Enter Algernon.]

ALGERNON. How are you, my dear Ernest? What brings you up to town?

JACK. Oh, pleasure, pleasure. What else should bring one anywhere?

ALGERNON. Eating as usual, I see.

JACK. I believe it is customary in good society to take some
refreshment at five o'clock.

ALGERNON. Well, it is a custom that I approve of, and I will do my best
to start it again. However, you are not quite truthful. You did not
come up for pleasure.

JACK. What on earth do you mean?

ALGERNON. You came up to town to tell me to keep away from your cousin.

JACK. My cousin?

ALGERNON. Yes. That charming girl you are always talking about.

JACK. Cecily?

ALGERNON. Cecily. She is my cousin now, you know.

JACK. You have never met her.

ALGERNON. She is my cousin because I intend to marry her.
'''

def parse_play_to_turns(text: str) -> pd.DataFrame:
    """
    Parse a simple play excerpt into (speaker, utterance) turns.
    Assumptions.
    1) Speaker turns look like 'NAME.' at the start of a line.
    2) Stage directions are in [brackets] or parentheses and are dropped.

    Returns a DataFrame with columns: turn_id, speaker, text.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    turns = []
    current_speaker = None
    buffer = []

    def flush():
        nonlocal buffer, current_speaker
        if current_speaker and buffer:
            utt = " ".join(buffer).strip()
            utt = re.sub(r"\s+", " ", utt)
            if utt:
                turns.append({"speaker": current_speaker, "text": utt})
        buffer = []

    speaker_pat = re.compile(r"^([A-Z][A-Z\s'\-]+)\.\s*(.*)$")

    for ln in lines:
        if ln.startswith("[") and ln.endswith("]"):
            continue
        if ln.startswith("(") and ln.endswith(")"):
            continue

        m = speaker_pat.match(ln)
        if m:
            flush()
            current_speaker = m.group(1).strip()
            rest = m.group(2).strip()
            if rest:
                buffer.append(rest)
        else:
            buffer.append(ln)

    flush()
    df = pd.DataFrame(turns)
    df.insert(0, "turn_id", range(len(df)))
    return df

turns_df = parse_play_to_turns(RAW_TEXT)
turns_df.head(10)


### 1.1 Create dialogue windows

Many dialogue datasets contain long conversations. Summarization is easier to teach with smaller windows. We will create overlapping windows of turns, then treat each window as a dialogue sample.

You can adjust the window size. Smaller windows are easier for small models. Larger windows stress test context handling.
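
To see how window size and stride interact, the sketch below lists the turn ranges that each window would cover. `window_bounds` is a hypothetical helper written for this illustration. It assumes the last window may be shorter than the others so that the final turns are still covered.

```python
from typing import List, Tuple

def window_bounds(n_turns: int, window_turns: int = 10, stride: int = 6) -> List[Tuple[int, int]]:
    """Return (start, end) turn indices for overlapping windows. `end` is exclusive."""
    bounds = []
    for start in range(0, n_turns, stride):
        end = min(n_turns, start + window_turns)
        bounds.append((start, end))
        if end == n_turns:  # the tail is covered, so stop here
            break
    return bounds

print(window_bounds(19, window_turns=10, stride=6))
# [(0, 10), (6, 16), (12, 19)]
```

With a window of 10 and a stride of 6, consecutive windows share 4 turns. A larger stride reduces overlap and yields fewer samples.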


In [None]:
def make_dialogue_windows(turns: pd.DataFrame, window_turns: int = 10, stride: int = 6) -> pd.DataFrame:
    """
    Convert a turn DataFrame into overlapping dialogue windows.

    Windows start every `stride` turns. The last window may be shorter than
    `window_turns` so that the final turns are always covered.

    Returns a DataFrame with: sample_id, dialogue_text, speakers_involved, n_turns.
    """
    samples = []
    n = len(turns)
    # Iterate over every stride position up to n. Stopping at n - window_turns
    # would drop the tail whenever the stride does not line up with the end.
    for sample_id, start in enumerate(range(0, n, stride)):
        end = min(n, start + window_turns)
        chunk = turns.iloc[start:end]
        dialogue_lines = [f"{r.speaker}: {r.text}" for r in chunk.itertuples()]
        dialogue_text = "\n".join(dialogue_lines)
        speakers = sorted(set(chunk["speaker"].tolist()))
        samples.append(
            {
                "sample_id": sample_id,
                "dialogue_text": dialogue_text,
                "speakers_involved": speakers,
                "n_turns": int(end - start),
            }
        )
        if end == n:
            break
    return pd.DataFrame(samples)

samples_df = make_dialogue_windows(turns_df, window_turns=10, stride=6)
samples_df


### 1.2 Preview one sample

Read the dialogue. Then, in your own words, write a one sentence summary in the next cell. Keep it short. This will become our first human reference.


In [None]:
sample = samples_df.loc[0, "dialogue_text"]
print(sample)


In [None]:
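# Your one sentence reference summary.
# You can edit this string. The notebook will still run if you do not.
# Note: this default describes the whole excerpt, including the marriage reveal,
# which falls outside the first 10 turn window printed above. Trim it to match
# the sample if you want ROUGE to compare like with like.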

REFERENCE_SUMMARY = "Jack arrives and learns Algernon is visiting, then Algernon teases Jack and reveals he plans to marry Jack's cousin Cecily."

print(REFERENCE_SUMMARY)


## 2. Baseline. Extractive TextRank summarization

Before using an LLM, build a baseline that is fast, cheap, and interpretable. TextRank selects the most central sentences using a similarity graph and PageRank.

This baseline is largely language agnostic, as long as you can split text into sentences. The only language specific part of the code below is the English stop word list. That is why it is valuable for low resource languages.
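
To make the ranking step less of a black box, here is a minimal power iteration sketch of PageRank on a toy similarity matrix. The function name `pagerank_scores`, the damping factor of 0.85, and the toy matrix are assumptions for illustration. The notebook itself relies on `networkx.pagerank`.

```python
import numpy as np

def pagerank_scores(sim, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """Score nodes of a weighted similarity graph with plain power iteration."""
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    col_sums = sim.sum(axis=0)
    safe_sums = np.where(col_sums > 0, col_sums, 1.0)
    # Column-normalize so each column is a probability distribution.
    # Columns without edges spread their weight uniformly.
    trans = np.where(col_sums > 0, sim / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1.0 - damping) / n + damping * (trans @ scores)
    return scores

# Toy example: sentence 1 is similar to both of the others, so it is the most central.
toy_sim = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.6],
    [0.1, 0.6, 0.0],
]
scores = pagerank_scores(toy_sim)
print("Most central sentence index:", int(np.argmax(scores)))
```

The sentence with the strongest links to the rest of the text gets the highest score, which is the intuition behind picking the top ranked sentences as an extractive summary.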


In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import networkx as nx

def split_sentences(text: str) -> List[str]:
    """
    Very simple sentence splitter.
    For robust multilingual splitting, consider spaCy or Stanza.
    """
    text = re.sub(r"\s+", " ", text).strip()
    sents = re.split(r"(?<=[\.\?\!])\s+", text)
    return [s.strip() for s in sents if s.strip()]

def textrank_summarize(dialogue_text: str, max_sentences: int = 2) -> str:
    """
    Extractive summarization using TextRank on sentence similarity.
    """
    content = re.sub(r"^[A-Z][A-Z\s'\-]+:\s*", "", dialogue_text, flags=re.MULTILINE)
    sentences = split_sentences(content)
    if not sentences:
        return ""
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(sentences)
    sim = cosine_similarity(X)
    np.fill_diagonal(sim, 0.0)

    graph = nx.from_numpy_array(sim)
    scores = nx.pagerank(graph, max_iter=200)

    ranked = sorted(range(len(sentences)), key=lambda i: scores.get(i, 0.0), reverse=True)
    picked = sorted(ranked[:max_sentences])
    return " ".join([sentences[i] for i in picked])

baseline_summary = textrank_summarize(sample, max_sentences=2)
print("Baseline summary:\n", baseline_summary)


### 2.1 Quick evaluation. ROUGE

ROUGE is imperfect, but it is a quick sanity check. We will compute ROUGE 1, ROUGE 2, and ROUGE L against your reference summary.
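
To show what the metric measures, here is a simplified ROUGE 1 F1 computed by hand. This is a sketch that assumes lowercase whitespace tokenization and no stemming, so its numbers can differ from the `rouge-score` package. The function name `rouge1_f1` is made up for this example.

```python
from collections import Counter

def rouge1_f1(pred: str, ref: str) -> float:
    """Unigram overlap F1 between a predicted and a reference summary."""
    pred_tokens = pred.lower().split()
    ref_tokens = ref.lower().split()
    # Clipped overlap: each token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("jack meets algernon", "jack meets algernon in town"))
# precision = 3/3, recall = 3/5, so F1 = 0.75
```

ROUGE 2 applies the same idea to pairs of adjacent tokens, and ROUGE L uses the longest common subsequence instead of n gram counts.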


In [None]:
from rouge_score import rouge_scorer

def rouge_scores(pred: str, ref: str) -> Dict[str, float]:
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    scores = scorer.score(ref, pred)
    return {k: v.fmeasure for k, v in scores.items()}

print("ROUGE (baseline vs reference):")
rouge_scores(baseline_summary, REFERENCE_SUMMARY)


## 3. Mini quiz. What makes dialogue summarization harder?

Try to answer before running the cell. Then run it for instant feedback.


In [None]:
try:
    import ipywidgets as widgets
    from IPython.display import display
except Exception:
    widgets = None

QUESTION = "Which factor is most specific to dialogue summarization, compared to single speaker summarization?"
OPTIONS = [
    "A. Dialogues contain named entities.",
    "B. Dialogues include speaker turns and pragmatic intent.",
    "C. Dialogues use punctuation.",
    "D. Dialogues are always longer than articles.",
]
CORRECT = 1
EXPLANATION = "Speaker turns and pragmatic intent are core. You often need to resolve who said what and why."

def run_quiz():
    if widgets is None:
        print(QUESTION)
        for opt in OPTIONS:
            print(opt)
        print("\nCorrect:", OPTIONS[CORRECT])
        print("Explanation:", EXPLANATION)
        return

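    # value=None starts with nothing selected, so choosing option A also triggers feedback.
    radio = widgets.RadioButtons(options=OPTIONS, value=None, description="Your answer:")
    out = widgets.Output()

    def on_change(change):
        if change["new"] is None:
            return
        with out:
            out.clear_output()
            idx = OPTIONS.index(change["new"])
            if idx == CORRECT:
                print("Correct.")
            else:
                print("Not quite.")
            print("Explanation:", EXPLANATION)

    # React only to changes of the selected value, not to other widget traits.
    radio.observe(on_change, names="value")
    display(radio, out)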

run_quiz()


## 4. LLM summarization with prompt engineering (optional)

We will use a small instruction tuned model. By default we try `google/flan-t5-small`.

If you have no internet access, model download will fail. In that case, skip this section or point `MODEL_NAME` to a local model path.


In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-small"

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", device)

tokenizer = None
model = None

try:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).to(device)
    print("Loaded:", MODEL_NAME)
except Exception as e:
    print("Could not load model. Reason:", repr(e))
    print("You can continue with the baseline sections.")


### 4.1 Zero shot instruction prompt

Prompting matters. We will start with a plain instruction. Then we will refine it.

Goal. Produce a concise, faithful summary. Mention key decisions, conflicts, and planned actions.


In [None]:
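def generate_summary_t5(dialogue_text: str, prompt: str, max_new_tokens: int = 80, temperature: float = 0.0, top_p: float = 1.0) -> str:
    if tokenizer is None or model is None:
        return "(LLM section skipped. Model not available.)"

    full_prompt = prompt.strip() + "\n\nDIALOGUE:\n" + dialogue_text.strip() + "\n\nSUMMARY:"
    # Inputs longer than the tokenizer's maximum length are truncated.
    inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True).to(device)

    # temperature == 0 means greedy decoding. Sampling parameters are passed only
    # when sampling is enabled, because recent transformers versions warn about
    # sampling settings that are set while do_sample is False.
    gen_kwargs = {"max_new_tokens": int(max_new_tokens), "do_sample": temperature > 0.0}
    if gen_kwargs["do_sample"]:
        gen_kwargs["temperature"] = float(temperature)
        gen_kwargs["top_p"] = float(top_p)

    with torch.no_grad():
        output_ids = model.generate(**inputs, **gen_kwargs)

    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()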

ZERO_SHOT_PROMPT = "Summarize the following conversation in 1 to 2 sentences. Keep only the important information."

llm_zero = generate_summary_t5(sample, ZERO_SHOT_PROMPT)
print(llm_zero)


### 4.2 Prompt remix playground

You will remix a prompt by selecting options. This is a safe way to teach prompt engineering without making it feel abstract.

Pick your settings, then run the cell. Try to make the summary both concise and faithful.


In [None]:
STYLE_OPTIONS = ["neutral", "bullet", "tweet", "meeting_minutes"]
FOCUS_OPTIONS = ["decisions", "conflict", "relationships", "actions"]

def build_prompt(style: str, focus: str, max_sentences: int) -> str:
    style = style.lower().strip()
    focus = focus.lower().strip()

    base = f"Summarize the conversation in at most {max_sentences} sentences."
    if focus == "decisions":
        base += " Focus on decisions and commitments."
    elif focus == "conflict":
        base += " Focus on disagreements and what caused them."
    elif focus == "relationships":
        base += " Focus on who relates to whom and the social situation."
    elif focus == "actions":
        base += " Focus on actions and next steps."

    if style == "bullet":
        base += " Use 2 to 4 bullet points."
    elif style == "tweet":
        base += " Write it as a single tweet style sentence, under 240 characters."
    elif style == "meeting_minutes":
        base += " Format as meeting minutes with sections: Context, Key Points, Next Steps."

    base += " Do not invent facts. Preserve names."
    return base

def run_playground(style="neutral", focus="relationships", max_sentences=2):
    prompt = build_prompt(style, focus, max_sentences)
    print("Prompt:\n", prompt, "\n")
    out = generate_summary_t5(sample, prompt, max_new_tokens=120, temperature=0.0)
    print("Model output:\n", out)
    return out

llm_play = run_playground(style="meeting_minutes", focus="relationships", max_sentences=2)


### 4.3 One shot and few shot prompts

When you have little data, examples are powerful. We will create a small in notebook prompt set.

You can replace the examples with your own dialogues later.
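
Moving from one shot to few shot only means adding more demonstrations in the same layout. The sketch below assembles such a prompt from a list of (dialogue, summary) pairs. `build_few_shot_prompt` is a hypothetical helper, and the second example pair is invented for illustration.

```python
from typing import List, Tuple

def build_few_shot_prompt(examples: List[Tuple[str, str]], instruction: str) -> str:
    """Assemble an instruction followed by several (dialogue, summary) demonstrations."""
    parts = [instruction.strip(), ""]
    for i, (dialogue, summary) in enumerate(examples, start=1):
        parts += [f"Example {i}.", "DIALOGUE:", dialogue.strip(), "", "SUMMARY:", summary.strip(), ""]
    parts.append("Now summarize this dialogue.")
    return "\n".join(parts)

few_shot_examples = [
    (
        "ALICE: Are we still meeting at 3?\nBOB: Yes, but I will be 10 minutes late.",
        "Alice and Bob confirm a 3 pm meeting. Bob will be 10 minutes late.",
    ),
    (
        # Invented example pair, not taken from any dataset.
        "CARA: The printer is out of paper.\nDAN: I will order more today.",
        "The printer is out of paper and Dan will order more today.",
    ),
]

FEW_SHOT_PROMPT = build_few_shot_prompt(
    few_shot_examples,
    "Summarize the conversation in 1 to 2 sentences. Do not invent facts.",
)
print(FEW_SHOT_PROMPT)
```

Small models have a limited input length, so a handful of short examples is usually the practical ceiling. Examples that resemble your target dialogues in length and topic tend to be the most useful.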


In [None]:
EXAMPLE_DIALOGUE = """ALICE: Are we still meeting at 3?
BOB: Yes, but I will be 10 minutes late.
ALICE: Ok. Please bring the slides.
BOB: Will do."""

EXAMPLE_SUMMARY = "Alice and Bob confirm a 3 pm meeting. Bob will arrive 10 minutes late and will bring the slides."

ONE_SHOT_PROMPT = f"""Summarize the conversation in 1 to 2 sentences. Do not invent facts.

Example.
DIALOGUE:
{EXAMPLE_DIALOGUE}

SUMMARY:
{EXAMPLE_SUMMARY}

Now summarize this dialogue.
"""

llm_one = generate_summary_t5(sample, ONE_SHOT_PROMPT, max_new_tokens=120, temperature=0.0)
print(llm_one)


### 4.4 Generation parameters. Temperature and length

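Temperature controls how random sampling is. In this notebook a temperature of 0 means greedy decoding, and higher values add variety at the risk of less faithful summaries. `max_new_tokens` caps the output length, which limits how much detail you get.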

Use the sliders if available. Otherwise, edit the numbers and rerun.
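
The effect of temperature is easiest to see on a toy distribution. The sketch below assumes the standard definition in which logits are divided by the temperature before the softmax. The logits are made up, and `softmax_with_temperature` is a name chosen for this example.

```python
import numpy as np

def softmax_with_temperature(logits, temperature: float) -> np.ndarray:
    """Turn logits into probabilities after scaling them by 1 / temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print("temperature", t, "->", softmax_with_temperature(toy_logits, t).round(3))
```

Low temperatures concentrate probability on the top token, so outputs become more repeatable. High temperatures flatten the distribution, which gives unlikely tokens a better chance of being sampled.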


In [None]:
def demo_generation_controls(temperature: float = 0.0, max_new_tokens: int = 80):
    prompt = build_prompt(style="neutral", focus="actions", max_sentences=2)
    out = generate_summary_t5(sample, prompt, max_new_tokens=max_new_tokens, temperature=temperature, top_p=0.95)
    print("temperature:", temperature, "max_new_tokens:", max_new_tokens)
    print(out)

try:
    import ipywidgets as widgets
    from IPython.display import display
    if tokenizer is None or model is None:
        raise RuntimeError("Model not available, skipping widgets.")
    ui = widgets.interactive(
        demo_generation_controls,
        temperature=widgets.FloatSlider(min=0.0, max=1.0, step=0.1, value=0.0),
        max_new_tokens=widgets.IntSlider(min=30, max=200, step=10, value=80),
    )
    display(ui)
except Exception:
    demo_generation_controls(temperature=0.0, max_new_tokens=80)
    demo_generation_controls(temperature=0.7, max_new_tokens=120)


## 5. Compare baselines vs LLM

We compare summaries and compute ROUGE against your reference.

In real work, you should also do human evaluation, for example factuality checks, checks for missing action items, and checks of speaker attribution.
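
A lightweight automated check can complement human review. The sketch below flags capitalized words in a summary that never occur in the dialogue, which is a rough signal for invented names. `unsupported_names` is a hypothetical helper. It assumes a script with upper and lower case, and it will also flag harmless sentence initial words, so treat its output as a prompt for manual inspection.

```python
import re
from typing import List

def unsupported_names(summary: str, dialogue: str) -> List[str]:
    """Return capitalized words in the summary that never appear in the dialogue."""
    # Compare case-insensitively, because speaker labels are often upper case.
    dialogue_words = {w.lower() for w in re.findall(r"[^\W\d_]+", dialogue)}
    flagged = []
    for word in re.findall(r"\b[A-Z][a-z]+\b", summary):
        if word.lower() not in dialogue_words and word not in flagged:
            flagged.append(word)
    return flagged

toy_dialogue = "JACK: I must see him at once.\nLANE: Yes, sir."
print(unsupported_names("Jack tells Cecily he is leaving.", toy_dialogue))
# ['Cecily']
```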


In [None]:
results = []
results.append(("TextRank baseline", baseline_summary))
results.append(("LLM zero shot", llm_zero))
results.append(("LLM one shot", llm_one))
results.append(("LLM prompt remix", llm_play))

rows = []
for name, pred in results:
    rows.append({"system": name, "summary": pred, **rouge_scores(pred, REFERENCE_SUMMARY)})

pd.DataFrame(rows).sort_values("rougeL", ascending=False)


## 6. Low resource mode. Make English behave like a low resource language

Low resource usually means one or more of the following.
- Very little labeled data.
- Limited tools for tokenization, sentence splitting, and normalization.
- Domain mismatch. Your data looks different from what models saw during pre training.
- Orthography variation and borrowing, including code switching.

We will simulate these constraints in English by.
1) Reducing the available context.
2) Corrupting the text with noise and inconsistent spelling.
3) Removing punctuation, which hurts naive sentence splitting.

Then we apply strategies that transfer to true low resource settings.


In [None]:
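def low_resource_corrupt(text: str, drop_punct_prob: float = 0.5, typo_prob: float = 0.08, seed: int = 842) -> str:
    """
    Simulate noisy input by randomly dropping punctuation and injecting typos.
    A typo either flips the case of a letter or replaces it with the next
    letter of the alphabet (in lower case).
    """
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    out_chars = []
    for ch in text:
        if ch in ".?!," and rng.random() < drop_punct_prob:
            continue
        if ch.isalpha() and rng.random() < typo_prob:
            # The alphabet shift is only defined for ASCII letters.
            # Other letters, for example "ä", only get a case flip.
            if rng.random() < 0.5 or not ch.isascii():
                out_chars.append(ch.swapcase())
            else:
                out_chars.append(chr(((ord(ch.lower()) - 97 + 1) % 26) + 97))
        else:
            out_chars.append(ch)
    return "".join(out_chars)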

low_text = low_resource_corrupt(sample, drop_punct_prob=0.8, typo_prob=0.05)
print(low_text[:600])


In [None]:
print("Baseline on clean text:")
print(textrank_summarize(sample, max_sentences=2))
print("\nBaseline on low resource corrupted text:")
print(textrank_summarize(low_text, max_sentences=2))


### 6.1 Strategy toolkit

Here are practical tactics that often help in low resource dialogue summarization.

1. Normalize input.
   - Fix common punctuation issues.
   - Normalize whitespace.
   - Normalize speaker labels.

2. Use robust segmentation.
   - If sentence splitting fails, summarize at turn level.

3. Constrain generation.
   - Use explicit length limits.
   - Instruct the model to preserve names, numbers, and decisions.

4. Add lightweight context.
   - Provide a glossary of names and places.
   - Provide a domain hint, such as "family conversation" or "customer support".

5. Evaluate with targeted checks.
   - Did we preserve who wants to marry whom?
   - Did we hallucinate actions that never happened?

We will implement 1 and 2 now.
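
Items 3 and 4 can be combined in a single prompt. One possible shape is sketched below, with a length limit, a domain hint, and a small glossary of names. `build_constrained_prompt` and its parameters are hypothetical, and the wording of the instructions is only a starting point.

```python
from typing import List, Optional

def build_constrained_prompt(names: List[str], domain_hint: Optional[str] = None, max_sentences: int = 1) -> str:
    """Build a summarization prompt with a length limit, a domain hint, and a name glossary."""
    lines = [f"Summarize the conversation in at most {max_sentences} sentence(s)."]
    if domain_hint:
        lines.append(f"Domain: {domain_hint}.")
    if names:
        lines.append("Names that may appear: " + ", ".join(names) + ".")
    lines.append("Preserve names, numbers, and decisions. Do not invent facts.")
    return "\n".join(lines)

print(build_constrained_prompt(["Jack", "Algernon", "Cecily"], domain_hint="family conversation"))
```

The glossary gives the model the correct spelling of names even when the dialogue itself is noisy, and the domain hint narrows down what kind of content it should expect.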


In [None]:
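# Speaker labels are upper case tokens (letters, digits, ' or -) followed by a colon,
# for example "JACK:", "LADY BRACKNELL:", "SPEAKER1:", or a single letter such as "A:".
SPEAKER_LABEL = re.compile(r"(?:^|\s+)((?:[A-Z][A-Z0-9'\-]+(?: [A-Z][A-Z0-9'\-]+)*|[A-Z]):)\s*")

def normalize_dialogue(text: str) -> str:
    text = text.replace("\t", " ")
    text = re.sub(r"\s+", " ", text)
    # Put every speaker label back at the start of its own line.
    text = SPEAKER_LABEL.sub(r"\n\1 ", text)
    return text.strip()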

def turn_level_summarize(dialogue_text: str, max_turns: int = 3) -> str:
    """
    Extractive turn level summarization, more robust than sentence splitting.
    """
    lines = [ln.strip() for ln in dialogue_text.splitlines() if ln.strip()]
    lines = [ln for ln in lines if len(ln) > 10]
    if not lines:
        return ""
    if len(lines) <= max_turns:
        return " ".join(lines)

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(lines)
    sim = cosine_similarity(X)
    np.fill_diagonal(sim, 0.0)
    graph = nx.from_numpy_array(sim)
    scores = nx.pagerank(graph, max_iter=200)
    ranked = sorted(range(len(lines)), key=lambda i: scores.get(i, 0.0), reverse=True)
    picked = sorted(ranked[:max_turns])
    return " ".join([lines[i] for i in picked])

print("Before normalization:\n", low_text[:250], "\n")
norm_low = normalize_dialogue(low_text)
print("After normalization:\n", norm_low[:250])
print("\nTurn level summary on corrupted text:\n", turn_level_summarize(norm_low, max_turns=3))


### 6.2 Low resource prompting

If you can use an instruction model, you can push it to behave better on noisy input. The key is to add constraints.

We will.
- Ask for short output.
- Ask it to avoid inventing facts.
- Ask it to preserve names.


In [None]:
LOW_RESOURCE_PROMPT = """Summarize the conversation in 1 sentence.
Rules.
1) Do not invent facts.
2) Preserve names exactly as they appear.
3) If the text is noisy, infer only what is obvious."""

if tokenizer is None or model is None:
    print("Model not available, skipping.")
else:
    print(generate_summary_t5(norm_low, LOW_RESOURCE_PROMPT, max_new_tokens=60, temperature=0.0))


## Optional mini dataset hook. Try a real non-English case in two minutes

This workshop is designed to start in English, then transfer the same workflow to a low-resource language.

Below are two quick options.

1. **MiniLux micro-set (synthetic)**. A small set of short Luxembourgish snippets, some of them mixed with French, created for teaching. It is intentionally tiny and imperfect, so that you can iterate fast and discuss typical issues such as code-switching, named entities, and spelling variation.

2. **Hugging Face low-resource sample (real text)**. Pull a few examples from a multilingual summarization dataset (the loader below defaults to 20, the demo call uses 5) and run the same prompt, plus the same evaluation, to see how performance changes outside English. The dataset contains news articles rather than dialogues, so each article is first converted into a pseudo-dialogue.


In [None]:
# Option 1. MiniLux micro-set (synthetic)
# This is only for the workshop. You can replace it with your own low-resource dialogues later.

mini_lux = [
    {
        "id": "lux_001",
        "dialogue": "A: Moien. Hues du Zäit fir e Kaffi?\nB: Jo, mä just zéng Minutten. Ech muss gläich op d'Aarbecht.\nA: Ok. Mir treffen eis beim Gare.\nB: Super, ech kommen direkt.",
        "reference_summary_en": "They agree to meet for a quick coffee at the station before B goes to work."
    },
    {
        "id": "lux_002",
        "dialogue": "A: Wéi war d'Reunioun haut?\nB: Ganz laang. Mir hu just d'Agenda diskutéiert.\nA: An hu mir eng Decisioun?\nB: Nee, mir maachen et nächste Woch nach eng Kéier.",
        "reference_summary_en": "The meeting was long, they only discussed the agenda, and no decision was made."
    },
    {
        "id": "lux_003",
        "dialogue": "A: Kanns du mer de Rapport schécken?\nB: Jo. Ech schécken en elo per Mail.\nA: Merci. Ech muss en nach haut ofginn.\nB: Kloer, ech maachen et direkt.",
        "reference_summary_en": "B will email A the report immediately because A must submit it today."
    },
    {
        "id": "lux_004",
        "dialogue": "A: Ech sinn am Stau op der A6.\nB: Ok, dann fänke mir ouni dech un.\nA: Gitt mir zéng Minutten.\nB: Passt. Mir halen dir e Sëtz fräi.",
        "reference_summary_en": "A is stuck in traffic but will arrive in about ten minutes, and the others will start and save a seat."
    },
    {
        "id": "lux_005",
        "dialogue": "A: Ech hu muer en rendez-vous chez le médecin.\nB: Bass du ok?\nA: Jo, just e Check-up.\nB: Ok, soen mer dono wéi et gaangen ass.",
        "reference_summary_en": "A has a doctor appointment tomorrow for a check-up and will update B afterward."
    },
    {
        "id": "lux_006",
        "dialogue": "A: Wou si mir mam Projet?\nB: Mir hu 80 Prozent fäerdeg.\nA: Wat feelt nach?\nB: D'Dokumentatioun an d'Tester.",
        "reference_summary_en": "The project is about 80 percent done, but documentation and testing are still missing."
    },
    {
        "id": "lux_007",
        "dialogue": "A: Ech kréien ëmmer eng Fehlermeldung.\nB: Wéi eng?\nA: 'Permission denied'.\nB: Dann hues du wahrscheinlech keng Rechter. Probéier et mat sudo oder fro den Admin.",
        "reference_summary_en": "A gets a permission error, and B suggests using sudo or asking the admin for access."
    },
    {
        "id": "lux_008",
        "dialogue": "A: Mir treffen eis um 14:00.\nB: Ech sinn um 14:15 do.\nA: Ok, ech waarden am Café.\nB: Merci. Bis gläich.",
        "reference_summary_en": "They planned to meet at 14:00, but B will arrive at 14:15 and A will wait at a café."
    },
    {
        "id": "lux_009",
        "dialogue": "A: Hues du d'Presentatioun gesinn?\nB: Jo, si ass gutt, mä d'Grafike sinn ze kleng.\nA: Ok, ech maachen se méi grouss.\nB: Super, dann ass et perfekt.",
        "reference_summary_en": "B thinks the presentation is good but the charts are too small, so A will enlarge them."
    },
    {
        "id": "lux_010",
        "dialogue": "A: Ech sinn haut am Homeoffice.\nB: Ok, kënns du trotzdem an de Call?\nA: Jo, ech sinn do um 10:00.\nB: Top, ech schécken de Link.",
        "reference_summary_en": "A works from home but will join the 10:00 call, and B will send the link."
    },
    {
        "id": "lux_011",
        "dialogue": "A: Mir brauche nach e Beispill fir d'Course.\nB: Wat fir ee Beispill?\nA: E klengt Dialog-Set fir Zesummefaassung.\nB: Ok, ech schreiwen 20 kuerz Dialogen.",
        "reference_summary_en": "They need a small dialogue dataset for a summarization course, and B will write 20 short dialogues."
    },
    {
        "id": "lux_012",
        "dialogue": "A: Kanns du den Text nach eng Kéier kontrolléieren?\nB: Jo, ech kucken no Tippfeeler.\nA: An och Punktuatioun.\nB: Maachen ech.",
        "reference_summary_en": "B will proofread the text for typos and punctuation."
    },
    {
        "id": "lux_013",
        "dialogue": "A: Ech hu meng Schlësselen vergiess.\nB: Wou bass du?\nA: Virun der Dier.\nB: Ech kommen, ginn mer fënnef Minutten.",
        "reference_summary_en": "A forgot their keys and is locked out, and B will come in five minutes."
    },
    {
        "id": "lux_014",
        "dialogue": "A: De Bus kënnt net.\nB: Hues du d'App gekuckt?\nA: Jo, et steet 'retard'.\nB: Dann huele mir en Taxi.",
        "reference_summary_en": "The bus is delayed, so they decide to take a taxi."
    },
    {
        "id": "lux_015",
        "dialogue": "A: Ech muss nach d'Fichieren eroplueden.\nB: Wou?\nA: Op Hugging Face.\nB: Ok, vergiss net d'Lizens an d'Readme.",
        "reference_summary_en": "A needs to upload files to Hugging Face, and B reminds them to include a license and README."
    },
    {
        "id": "lux_016",
        "dialogue": "A: D'GPU ass fräi.\nB: Super, dann starte mir den Training.\nA: Ech setzen batch size op 4.\nB: Ok, da maache mir gradient accumulation.",
        "reference_summary_en": "They have GPU availability and will start training with a small batch size and gradient accumulation."
    },
    {
        "id": "lux_017",
        "dialogue": "A: Kanns du mir den Deadline soen?\nB: Et ass Freideg um 18:00.\nA: Merci, ech maachen et haut nach.\nB: Gutt Iddi.",
        "reference_summary_en": "The deadline is Friday at 18:00, and A plans to finish today."
    },
    {
        "id": "lux_018",
        "dialogue": "A: Ech hunn d'Donnéeën gereinegt.\nB: Super. Hues du och d'Nummeren normaliséiert?\nA: Jo, ech hunn se an Wierder ëmgewandelt.\nB: Perfekt.",
        "reference_summary_en": "A cleaned the data and normalized numbers by converting them into words."
    },
    {
        "id": "lux_019",
        "dialogue": "A: Ech verstinn d'Resultater net.\nB: Wat ass komesch?\nA: D'Accuracy ass héich, mä d'F1 ass niddreg.\nB: Dann ass et wahrscheinlech Klassen-Imbalance.",
        "reference_summary_en": "Accuracy is high but F1 is low, suggesting class imbalance."
    },
    {
        "id": "lux_020",
        "dialogue": "A: Tu peux me rappeler le plan?\nB: Oui. D'abord on teste en anglais, après on passe au luxembourgeois.\nA: An de Prompt bleift ähnlech.\nB: Genau.",
        "reference_summary_en": "They will test in English first, then switch to Luxembourgish while keeping a similar prompt."
    },
    {
        "id": "lux_021",
        "dialogue": "A: Ech sinn net sécher ob 'Zentrum' richteg ass.\nB: Et hänkt vum Dialektgebiet of.\nA: Ok, ech kontrolléieren d'Metadata.\nB: Gutt, d'Labels mussen konsistent sinn.",
        "reference_summary_en": "They will verify the metadata because dialect labels must be consistent."
    },
    {
        "id": "lux_022",
        "dialogue": "A: D'Audio ass ze laang.\nB: Wéi laang?\nA: 25 Sekonnen.\nB: Dann schneiden mir et op 10 Sekonnen fir d'Training.",
        "reference_summary_en": "The audio is 25 seconds long, so they will trim it to 10 seconds for training."
    },
    {
        "id": "lux_023",
        "dialogue": "A: Ech hu keng Internet um Laptop.\nB: Probéier d'WLAN nei.\nA: Ok, ech maachen restart.\nB: Wann et net geet, huele mir en Hotspot.",
        "reference_summary_en": "A has no internet, B suggests restarting Wi-Fi, and they may use a hotspot if needed."
    },
    {
        "id": "lux_024",
        "dialogue": "A: D'Zesummefaassung ass ze laang.\nB: Setz eng Limit.\nA: Wéi vill?\nB: Probéier 2 Sätz an maximal 60 Wierder.",
        "reference_summary_en": "They will constrain the summary length to two sentences and at most 60 words."
    },
    {
        "id": "lux_025",
        "dialogue": "A: Ech wëll eng neutral Zesummefaassung.\nB: Da schreiwe mir am Prompt: 'neutral, factual, no opinion'.\nA: Ok, ech testen dat.\nB: Gutt, a kuck ob Bias kënnt.",
        "reference_summary_en": "They want a neutral factual summary and will encode that in the prompt and then test for bias."
    },
]

def sample_and_summarize(dialogue_set, k=1, seed=7, prompt=ZERO_SHOT_PROMPT):
    # Use a local RNG so that sampling here does not reset the notebook's global seed.
    rng = random.Random(seed)
    items = rng.sample(dialogue_set, k=k)
    for ex in items:
        print("ID:", ex["id"])
        print("\nDIALOGUE:\n", ex["dialogue"])
        pred = generate_summary_t5(ex["dialogue"], prompt=prompt, max_new_tokens=80, temperature=0.0)
        print("\nMODEL SUMMARY:\n", pred)
        print("\nREFERENCE (EN):\n", ex["reference_summary_en"])
        print("\n" + "-"*70 + "\n")

sample_and_summarize(mini_lux, k=2)

# Option 2. Pull a tiny real low-resource sample from Hugging Face
# This uses XL-Sum (multilingual news summarization). Not a dialogue dataset.
# For the workshop, we convert each article into a "pseudo-dialogue" so we can reuse the same pipeline.

from datasets import load_dataset

def article_to_pseudo_dialogue(article_text: str, max_turns: int = 6) -> str:
    # Lightweight sentence split. Good enough for teaching.
    sentences = [s.strip() for s in article_text.replace("\n", " ").split(".") if s.strip()]
    sentences = sentences[:max_turns]
    turns = []
    for i, s in enumerate(sentences):
        speaker = "ANCHOR" if i % 2 == 0 else "REPORTER"
        turns.append(f"{speaker}: {s}.")
    return "\n".join(turns)

def load_low_resource_hf_sample(language_subset: str = "yoruba", n: int = 20):
    ds = load_dataset("csebuetnlp/xlsum", language_subset, split=f"train[:{n}]")
    # XL-Sum fields are typically: "text" and "summary"
    out = []
    for i, ex in enumerate(ds):
        dialogue = article_to_pseudo_dialogue(ex["text"], max_turns=8)
        out.append(
            {
                "id": f"xlsum_{language_subset}_{i:03d}",
                "dialogue": dialogue,
                "reference_summary": ex["summary"],
            }
        )
    return out

# The download needs internet access, so fail softly like the model loading cell does.
try:
    xlsum_yoruba = load_low_resource_hf_sample(language_subset="yoruba", n=5)
except Exception as e:
    xlsum_yoruba = []
    print("Could not load XL-Sum. Reason:", repr(e))
    print("You can continue with the MiniLux micro-set.")

if xlsum_yoruba:
    print("Example pseudo-dialogue from XL-Sum (yoruba subset):")
    print(xlsum_yoruba[0]["dialogue"])
    print("\nReference summary (yoruba):")
    print(xlsum_yoruba[0]["reference_summary"])

    print("\nNow run the same English prompt on the pseudo-dialogue. It will usually struggle, and that is the point.")
    pred = generate_summary_t5(xlsum_yoruba[0]["dialogue"], prompt=ZERO_SHOT_PROMPT, max_new_tokens=80, temperature=0.0)
    print("\nMODEL SUMMARY:\n", pred)


## 7. Challenge. Adapt to your own low resource language

Now you have an English pipeline. The next step is to replace the English dialogue with data from your target language.

If you work on a language with limited resources, use the same structure.
1) Create turns with speaker labels.
2) Normalize and segment.
3) Start with an extractive baseline.
4) Add a multilingual model or a translation pivot only if you need it.
5) Evaluate with a small set of human references.

The next cell includes a ready to use template. It runs as is. Replace `MY_DIALOGUE` with your own data.


In [None]:
MY_DIALOGUE = """SPEAKER1: Replace this with your own dialogue in any language.
SPEAKER2: Keep speaker labels. Keep short lines if possible.
SPEAKER1: Then rerun the cells below."""

clean = normalize_dialogue(MY_DIALOGUE)
summary_baseline = turn_level_summarize(clean, max_turns=3)
print("Baseline summary:\n", summary_baseline)

if tokenizer is not None and model is not None:
    prompt = build_prompt(style="neutral", focus="actions", max_sentences=2)
    summary_llm = generate_summary_t5(clean, prompt, max_new_tokens=80, temperature=0.0)
    print("\nLLM summary:\n", summary_llm)
else:
    print("\nLLM not available. Baseline is your default.")


## 8. Wrap up

You now have a reproducible dialogue summarization pipeline that is usable with.
- No LLM, via TextRank and turn level extraction.
- A small instruction model, via prompt engineering.
- Low resource conditions, via normalization and constraints.

If you want to push further for true low resource languages.
- Swap English stopwords for a custom list, or disable stopwords.
- Use character n-gram TF-IDF for languages that do not separate words with whitespace.
- Add a small glossary and a retrieval step, then feed only the relevant turns to the model.
- Build a tiny evaluation set, 50 to 200 dialogues with one reference summary each.
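
As a starting point for the first two suggestions, the sketch below swaps the word level vectorizer for character n-grams and drops the English stop word list. The three Chinese sentences and the n-gram range of 1 to 2 are assumptions chosen for illustration, so tune the range for your language.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "会议在三点开始",  # "The meeting starts at three"
    "会议三点开始吗",  # "Does the meeting start at three"
    "我忘了带钥匙",    # "I forgot to bring the keys"
]

# analyzer="char" builds features from character n-grams, so no word
# segmentation is needed. No stop word list is passed.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
X = vectorizer.fit_transform(sentences)
sim = cosine_similarity(X)

print("Similarity of sentences 0 and 1:", round(float(sim[0, 1]), 2))
print("Similarity of sentences 0 and 2:", round(float(sim[0, 2]), 2))
```

The first two sentences share most of their characters and score as similar, while the third shares none with the first. The resulting similarity matrix can be passed to the same PageRank step that the TextRank baseline uses.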
