# 🎙️ Context Summarization with Realtime API

Build an end‑to‑end **voice bot** that listens to your mic, speaks back in real time, and **summarises long conversations** so response quality holds up as the context grows.

---

## 🏃‍♂️ What You’ll Build
1. **Live microphone streaming** → OpenAI *Realtime* (voice‑to‑voice) endpoint.
2. **Instant transcripts & speech playback** on every turn.
3. **Conversation state container** that stores **every** user/assistant message.
4. **Automatic “context trim”** – when the token window crosses a configurable threshold (2,000 tokens in this demo; quality tends to drift near 32 k), older turns are compressed into a summary.
5. **Extensible design** you can adapt to support customer‑support bots, kiosks, or multilingual assistants.

---

## 🎯 Learning Objectives
By the end of this notebook you can:

| Skill | Why it matters |
|-------|----------------|
| Capture audio with `sounddevice` | Low‑latency input is critical for natural UX |
| Use WebSockets with the OpenAI **Realtime** API | Streaming beats polling for speed & simplicity |
| Track token usage and detect “context pressure” | Prevents quality loss in long chats |
| Summarise & prune history on‑the‑fly | Keeps conversations coherent without manual resets |

---

## 🔧 Prerequisites

| Requirement | Details |
|-------------|---------|
| **Python ≥ 3.10** | `async` / typing improvements |
| **OpenAI API key** | Set `OPENAI_API_KEY` in your shell or paste inline (*not ideal for prod*) |
| Mic + speakers | Grant OS permission if prompted |


**Need help setting up the key?**  
> Follow the [official quick‑start guide](https://platform.openai.com/docs/quickstart#step-2-set-your-api-key).

---

### 🚀 One‑liner install (run in a fresh cell)

In [None]:
# Run once to install or upgrade dependencies (comment out if already installed)
# !pip install --upgrade openai websockets sounddevice simpleaudio

In [1]:
# Essential imports & constants
import os, asyncio, base64, json, sys, itertools
from dataclasses import dataclass, field
from typing import List, Literal

import sounddevice as sd         # microphone capture
import simpleaudio               # speaker playback
import websockets                # WebSocket client
import openai                    # OpenAI Python SDK >= 1.14.0

# Audio/config knobs
SAMPLE_RATE_HZ    = 24_000   # Required by pcm16
CHUNK_DURATION_MS = 40       # ≈ latency granularity
BYTES_PER_SAMPLE  = 2        # pcm16 = 2 bytes/sample
SUMMARY_TRIGGER   = 2_000    # Summarise when context ≥ this
KEEP_LAST_TURNS   = 4        # Keep these turns verbatim
SUMMARY_MODEL     = "gpt-4o-mini"  # Cheaper, fast summariser

In [2]:
# Set your API key safely
openai.api_key = os.getenv("OPENAI_API_KEY", "")
if not openai.api_key:
    raise ValueError("OPENAI_API_KEY not found – please set env var or edit this cell.")

## 2 · Key Concepts Behind the Realtime Voice API

This section gives you the mental model you’ll need before diving into code.  Skim it now; refer back whenever something in the notebook feels “magic”.

---

### 2.1 Realtime vs Chat Completions — Why WebSockets?

|  | **Chat Completions (HTTP)** | **Realtime (WebSocket)** |
|---|---|---|
| Transport | Stateless request → response | Persistent, bi‑directional socket |
| Best for | Plain text or batched jobs | *Live* audio + incremental text |
| Latency model | 1 RTT per message | Sub‑200 ms deltas during one open session |
| Event types | *None* (single JSON) | `session.*`, `input_audio_buffer.append`, `response.*`, … |


**Flow**: you talk ▸ server transcribes ▸ assistant replies ▸ you talk again.  
> Mirrors natural conversation while keeping event handling simple.

---

### 2.2 Audio Encoding Fundamentals

| Parameter | Value | Why it matters |
|-----------|-------|----------------|
| **Format** | PCM‑16 (signed 16‑bit) | Widely supported; no compression delay |
| **Sample rate** | 24 kHz | Required by Realtime endpoint |
| **Chunk size** | ≈ 40 ms | Lower chunk → snappier response ↔ higher packet overhead |

`chunk_bytes  = sample_rate * bytes_per_sample * chunk_duration_s`
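
As a quick worked check of the formula, the sketch below plugs in the values from the table (24 kHz, 2 bytes per sample, 40 ms). The helper name `chunk_bytes` is illustrative and not part of the notebook's code.

```python
def chunk_bytes(sample_rate_hz: int, bytes_per_sample: int, chunk_ms: int) -> int:
    """Bytes in one mono PCM chunk lasting `chunk_ms` milliseconds."""
    # Integer arithmetic avoids float rounding for typical audio parameters.
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000

size = chunk_bytes(24_000, 2, 40)
print(size)                # 1920 bytes per 40 ms chunk
print(size * 1000 // 40)   # 48000 bytes per second of audio
```

Halving the chunk duration halves the payload per event but doubles the number of events sent per second, which is the trade-off noted in the table.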

---

### 2.3 Context Windows & Token Economics

* GPT‑4o Realtime accepts **up to 128 K tokens** in theory.  
* In practice, answer quality starts to drift around **≈ 32 K tokens**.  
* Every user/assistant turn consumes tokens → the window **only grows**.
* **Fix**: Summarise older turns into a single assistant message, keep the last few verbatim turns, and continue.
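
The fix in the last bullet can be sketched as a pure function, independent of any API. This is a minimal illustration under stated assumptions: messages are plain `(role, text)` tuples, and `fake_summary` is a hypothetical stand-in for a real LLM call.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text); a simplified stand-in for a conversation turn

def compress_history(
    history: List[Message],
    keep_last: int,
    summarise: Callable[[str], str],
) -> List[Message]:
    """Replace all but the last `keep_last` messages with one summary message."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last:
        return list(history)          # nothing old enough to compress
    old, recent = history[:-keep_last], history[-keep_last:]
    transcript = "\n".join(f"{role}: {text}" for role, text in old)
    return [("assistant", summarise(transcript))] + recent

# Toy summariser (hypothetical): a real one would call an LLM.
def fake_summary(transcript: str) -> str:
    return f"[summary of {len(transcript.splitlines())} messages]"

chat = [("user", "hi"), ("assistant", "hello"), ("user", "joke?"),
        ("assistant", "knock knock"), ("user", "who's there?")]
print(compress_history(chat, keep_last=2, summarise=fake_summary))
```

The three oldest messages collapse into a single assistant message, while the last two stay verbatim so the model keeps the exact wording of the most recent exchange.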

---

### 2.4 Conversation State

Instead of scattered globals, the notebook passes around one **state object**:

In [3]:
@dataclass
class Turn:
    """One utterance in the dialogue (user **or** assistant)."""
    role: Literal["user", "assistant"]
    item_id: str                    # Server‑assigned identifier
    text: str | None = None         # Filled once transcript is ready

@dataclass
class ConversationState:
    """All mutable data the session needs — nothing more, nothing less."""
    history: List[Turn] = field(default_factory=list)         # Ordered log
    waiting: dict[str, asyncio.Future] = field(default_factory=dict)  # Pending transcript fetches
    summary_count: int = 0

    latest_tokens: int = 0          # Window size after last reply
    summarising: bool = False       # Guard so we don’t run two summaries at once

Here is a helper that prints the current conversation history between the user and the assistant.

In [4]:
def print_history(state) -> None:
    """Pretty-print the running transcript so far."""
    print("—— Conversation so far ———————————————")
    for turn in state.history:
        text_preview = (turn.text or "").strip().replace("\n", " ")
        print(f"[{turn.role:<9}] {text_preview}  ({turn.item_id})")
    print("——————————————————————————————————————————")

## 3 · Token Utilisation – Text vs Voice

Large‑token windows are precious: every extra token you burn costs latency + money.  
For **audio** the bill climbs much faster than for plain text because amplitude, timing, and other acoustic details must be represented.

*Rule of thumb*: **1 word of text ≈ 1 token**, but **1 second of 24‑kHz PCM‑16 ≈ 150 audio tokens**.  
In practice you’ll often see **≈ 10 ×** more tokens for the *same* sentence spoken aloud than typed.

---

### 3.1 Hands‑on comparison 📊

The cells below:

1. **Send `TEXT` to Chat Completions** → read `prompt_tokens`.  
2. **Turn the same `TEXT` into speech** with TTS.  
3. **Stream the speech into the Realtime endpoint** → read `audio_tokens` from the `response.done` usage block.  
4. Print a ratio so you can see the multiplier on *your* hardware / account.

**Note**  
> * The Realtime usage block reports *audio tokens* separately from text tokens.  
> * Network latency depends on audio length. Keep the sample short for a snappy demo.

In [16]:
# ╔══════════════════════════════════════════════════════════════════╗
# ║ 3 · Token Utilisation – Text vs Voice (Realtime S2S demo)        ║
# ║ Requires: pip install openai websockets soundfile resampy        ║
# ╚══════════════════════════════════════════════════════════════════╝

import asyncio, base64, io, json, os, sys, wave, pathlib
from typing import List

import numpy as np, soundfile as sf, resampy, websockets, openai

# ─── Config ──────────────────────────────────────────────────────────────
TEXT = (
    "Hello there, I am measuring tokens for text versus voice because we want to better compare the number of tokens used when sending a message as text versus when converting it to speech.."
)
CHAT_MODEL  = "gpt-4o-mini"
TTS_MODEL   = "gpt-4o-mini-tts"
RT_MODEL    = "gpt-4o-realtime-preview"          # S2S model
VOICE       = "shimmer"

TARGET_SR   = 24_000
PCM_SCALE   = 32_767
CHUNK_MS    = 120                                # stream step

if not os.getenv("OPENAI_API_KEY"):
    sys.exit("❌  Please export OPENAI_API_KEY before running.")

HEADERS = {
    "Authorization": f"Bearer {openai.api_key}",
    "OpenAI-Beta":   "realtime=v1",
}

show = lambda l, v: print(f"{l:<28}: {v}")

# ─── Helpers ─────────────────────────────────────────────────────────────
def float_to_pcm16(x: np.ndarray) -> bytes:
    return (np.clip(x, -1, 1) * PCM_SCALE).astype("<i2").tobytes()

def chunk_pcm(pcm: bytes, ms: int = CHUNK_MS) -> List[bytes]:
    step = TARGET_SR * 2 * ms // 1000
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]

# ─── 1 · Count text tokens ──────────────────────────────────────────────
chat = openai.chat.completions.create(
    model=CHAT_MODEL,
    messages=[{"role": "user", "content": TEXT}],
    max_tokens=1,
    temperature=0,
)
text_tokens = chat.usage.prompt_tokens
show("📄 Text prompt tokens", text_tokens)

# ─── 2 · Synthesis to WAV & PCM16 ───────────────────────────────────────
wav_bytes = openai.audio.speech.create(
    model=TTS_MODEL, input=TEXT, voice=VOICE, response_format="wav"
).content

with wave.open(io.BytesIO(wav_bytes)) as w:
    pcm_bytes = w.readframes(w.getnframes())
duration_sec = len(pcm_bytes) / (2 * TARGET_SR)
show("🔊 Audio length (s)", f"{duration_sec:.2f}")


📄 Text prompt tokens        : 42
🔊 Audio length (s)          : 10.90


In [19]:
# ─── 3 · Realtime streaming & token harvest ─────────────────────────────
async def count_audio_tokens(pcm: bytes) -> int:
    url = f"wss://api.openai.com/v1/realtime?model={RT_MODEL}"
    chunks = chunk_pcm(pcm)

    # websockets >= 14 expects `additional_headers=` here instead of `extra_headers=`.
    async with websockets.connect(url, extra_headers=HEADERS,
                                  max_size=1 << 24) as ws:

        # Wait for session.created
        while json.loads(await ws.recv())["type"] != "session.created":
            pass

        # Configure modalities + voice
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "voice": VOICE,
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-transcribe"},
            }
        }))

        # Stream user audio chunks (no manual commit; server VAD handles it)
        for c in chunks:
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(c).decode(),
            }))

        async for raw in ws:
            ev = json.loads(raw)
            t = ev.get("type")

            if t == "response.done":
                print(ev)
                return ev["response"]["usage"]\
                         ["input_token_details"]["audio_tokens"]

audio_tokens = await count_audio_tokens(pcm_bytes)
show("🎤 Audio input tokens", audio_tokens)

# ─── 4 · Comparison ─────────────────────────────────────────────────────
ratio = audio_tokens / text_tokens if text_tokens else float("inf")
show("⚖️  Audio/Text ratio", f"{ratio:.1f}×")
print(f"\n≈{int(audio_tokens/duration_sec)} audio‑tokens / sec vs ≈1 token / word.")

{'type': 'response.done', 'event_id': 'event_BTt4GCm5hrzW0tyejZTFI', 'response': {'object': 'realtime.response', 'id': 'resp_BTt4A0rNrAbJjcPWAAwzY', 'status': 'completed', 'status_details': None, 'output': [{'id': 'item_BTt4As3XS4ZhxsRHhMGzQ', 'object': 'realtime.item', 'type': 'message', 'status': 'completed', 'role': 'assistant', 'content': [{'type': 'audio', 'transcript': "Sure thing! When you're measuring tokens, keep in mind that text and voice can indeed differ in token usage. For text, tokens are counted based on words, punctuation, and formatting, while for voice, the conversion to speech may lead to more tokens due to the nuances of pronunciation and emphasis. Comparing them will help you optimize your communication strategy. How can I assist you further with this?"}]}], 'conversation_id': 'conv_BTt48QSC6gI4XX9ZlDRVU', 'modalities': ['text', 'audio'], 'voice': 'shimmer', 'output_audio_format': 'pcm16', 'temperature': 0.8, 'max_output_tokens': 'inf', 'usage': {'total_tokens': 8

This is a short toy example. With longer transcripts, the gap between text-token and audio-token counts grows.


---

## 3 · Streaming Audio: Mic → WebSocket → OpenAI

### 3.1 Capture Microphone Input

In [4]:
async def mic_to_queue(pcm_queue: asyncio.Queue[bytes]):
    """Capture raw PCM‑16 audio and push 40 ms chunks to an async queue."""
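    blocksize = int(SAMPLE_RATE_HZ * CHUNK_DURATION_MS / 1000)
    loop = asyncio.get_running_loop()

    def _enqueue(chunk: bytes) -> None:
        try:
            pcm_queue.put_nowait(chunk)
        except asyncio.QueueFull:
            pass  # drop if network slower than mic

    def _callback(indata, _frames, _time, status):
        # sounddevice invokes this on its own audio thread, and asyncio.Queue
        # is not thread-safe, so hand the chunk over to the event loop.
        if status:
            print(status, file=sys.stderr)
        loop.call_soon_threadsafe(_enqueue, bytes(indata))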

    with sd.RawInputStream(
        samplerate=SAMPLE_RATE_HZ,
        blocksize=blocksize,
        dtype="int16",
        channels=1,
        callback=_callback,
    ):
        await asyncio.Event().wait()  # run until cancelled

### 3.2 Send Audio Chunks to the API

In [5]:
b64 = lambda blob: base64.b64encode(blob).decode()

async def queue_to_websocket(pcm_queue: asyncio.Queue[bytes], ws):
    """Read audio chunks from queue and send as JSON events."""
    try:
        while (chunk := await pcm_queue.get()) is not None:
            await ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": b64(chunk),
            }))
    except websockets.ConnectionClosed:
        print("WebSocket closed – stopping uploader")

### 3.3 Handle Incoming Events 
A conversation round‑trip produces these notable events:

| Event Type | When It Arrives | Purpose |
|-----------------------------------------|----------------------------|----------------------------------------------------------|
| session.created | upon connection | confirms handshake & auth |
| conversation.item.created (user) | after user finishes speaking | placeholder user item; may lack transcript initially |
| conversation.item.retrieved | when transcript ready | fills missing user text |
| response.audio.delta | streaming | assistant audio bytes (incremental) |
| response.done | assistant finished | final assistant text & usage metrics |
| conversation.item.deleted | after pruning | acknowledges deletion |
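
One way to keep handling of these events tidy is a small router keyed on the event's `type` field. The sketch below is an illustrative alternative to a long `if`/`elif` chain, not the structure this notebook uses. `EventRouter` is a hypothetical class, and the sample events are hand-written rather than captured from a real session.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

class EventRouter:
    """Route decoded server events to handlers keyed by their `type` field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[str] = []      # event types seen without a handler

    def on(self, event_type: str) -> Callable[[Handler], Handler]:
        def register(fn: Handler) -> Handler:
            self._handlers[event_type] = fn
            return fn
        return register

    def dispatch(self, raw: str) -> None:
        event = json.loads(raw)
        handler = self._handlers.get(event.get("type", ""))
        if handler is None:
            self.unhandled.append(event.get("type", "<missing>"))
        else:
            handler(event)

router = EventRouter()
seen: List[str] = []

@router.on("session.created")
def _on_session(event: dict) -> None:
    seen.append("session ready")

@router.on("response.done")
def _on_done(event: dict) -> None:
    seen.append(f"tokens={event['response']['usage']['total_tokens']}")

# Hand-written sample events (not captured from a real session).
router.dispatch(json.dumps({"type": "session.created"}))
router.dispatch(json.dumps({"type": "response.done",
                            "response": {"usage": {"total_tokens": 228}}}))
router.dispatch(json.dumps({"type": "some.other.event"}))
print(seen, router.unhandled)
```

Recording unhandled types instead of raising keeps the receive loop alive when the server sends events the client does not care about.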

## 4 · Dynamic Context Management & Summarisation

### 4.1 Detect When to Summarise
We monitor `latest_tokens`, taken from the usage block of each `response.done` event. When it reaches `SUMMARY_TRIGGER` and the history holds more than `KEEP_LAST_TURNS` turns, we spin up a background summarisation coroutine.
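
The trigger condition can be written as a small predicate, which makes it easy to test in isolation. This is a sketch: the function name `should_summarise` is an assumption, and the defaults mirror the `SUMMARY_TRIGGER` and `KEEP_LAST_TURNS` constants defined at the top of the notebook.

```python
def should_summarise(latest_tokens: int, n_turns: int, summarising: bool,
                     trigger: int = 2_000, keep_last: int = 4) -> bool:
    """True when the window is over budget and there are old turns to compress."""
    return (
        latest_tokens >= trigger      # context pressure
        and n_turns > keep_last       # something older than the verbatim tail
        and not summarising           # no summary already running
    )

print(should_summarise(2_138, 9, False))   # True
print(should_summarise(2_138, 4, False))   # False: nothing to prune
print(should_summarise(1_897, 7, False))   # False: under the trigger
print(should_summarise(2_138, 9, True))    # False: a summary is in flight
```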

### 4.2 Generate & Insert a Summary
We summarise every message except the last four (`KEEP_LAST_TURNS`) **in French**. Later we ask the voice agent which language the summary was written in, to check that the summary was successfully inserted into the Realtime API conversation context.

In [7]:
async def run_summary_llm(text: str) -> str:
    """Call a lightweight model to summarise `text`."""
    resp = await asyncio.to_thread(lambda: openai.chat.completions.create(
        model=SUMMARY_MODEL,
        temperature=0,
        messages=[
            {"role": "system", "content": "Summarise in French the following conversation "
                            "in one concise paragraph so it can be used as "
                            "context for future dialogue."},
            {"role": "user", "content": text},
        ],
    ))
    return resp.choices[0].message.content.strip()

In [16]:
async def summarise_and_prune(ws, state):
    """Summarise old turns, delete them server‑side, and prepend a single summary
    turn locally + remotely."""
    state.summarising = True
    print(
        f"⚠️  Token window ≈{state.latest_tokens} ≥ {SUMMARY_TRIGGER}. Summarising…",
    )
    old_turns, recent_turns = state.history[:-KEEP_LAST_TURNS], state.history[-KEEP_LAST_TURNS:]
    convo_text = "\n".join(f"{t.role}: {t.text}" for t in old_turns if t.text)
    
    if not convo_text:
        print("Nothing to summarise (transcripts still pending).")
        state.summarising = False
        return  # keep history intact; a later response.done can retry

    summary_text = await run_summary_llm(convo_text)
    state.summary_count += 1
    summary_id = f"sum_{state.summary_count:03d}"
    state.history[:] = [Turn("assistant", summary_id, summary_text)] + recent_turns
    
    print_history(state)    

    # Create summary on server
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "previous_item_id": "root",
        "item": {
            "id": summary_id,
            "type": "message",
            "role": "assistant",
            "content": [{"type": "text", "text": summary_text}],
        },
    }))

    # Delete old items
    for turn in old_turns:
        await ws.send(json.dumps({
            "type": "conversation.item.delete",
            "item_id": turn.item_id,
        }))

    print(f"✅ Summary inserted ({summary_id})")
    
    state.summarising = False

In [9]:
async def fetch_full_item(
    ws, item_id: str, state: ConversationState, attempts: int = 1
):
    """
    Ask the server for a full conversation item; retry up to 5× if the
    transcript field is still null.  Resolve the waiting future when done.
    """
    # If there is already a pending fetch, just await it
    if item_id in state.waiting:
        return await state.waiting[item_id]

    fut = asyncio.get_running_loop().create_future()
    state.waiting[item_id] = fut

    await ws.send(json.dumps({
        "type": "conversation.item.retrieve",
        "item_id": item_id,
    }))
    item = await fut

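    # If transcript still missing retry (max 5×)
    content = item.get("content") or [{}]
    if attempts < 5 and not content[0].get("transcript"):
        await asyncio.sleep(0.4 * attempts)
        # Drop the resolved future so the retry re-sends the retrieve request
        state.waiting.pop(item_id, None)
        return await fetch_full_item(ws, item_id, state, attempts + 1)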

    # Done – remove the marker
    state.waiting.pop(item_id, None)
    return item
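
`fetch_full_item` relies on a request/reply pattern: a future is stored under the item id, and something in the receive loop has to resolve it when the matching `conversation.item.retrieved` event arrives. The sketch below isolates that pattern with a simulated server reply. All names in it (`demo_pending_future`, `request`, `on_retrieved`, `item_123`) are hypothetical, and the one-second timeout is an added safeguard rather than something the notebook does.

```python
import asyncio
from typing import Dict

async def demo_pending_future() -> dict:
    """Resolve a request's future from a separate 'event handler' callback."""
    loop = asyncio.get_running_loop()
    waiting: Dict[str, asyncio.Future] = {}

    async def request(item_id: str) -> dict:
        fut = loop.create_future()
        waiting[item_id] = fut             # mark the request as pending
        try:
            return await asyncio.wait_for(fut, timeout=1.0)
        finally:
            waiting.pop(item_id, None)     # always clear the marker

    def on_retrieved(event: dict) -> None:
        # Called by the receive loop when the matching reply arrives.
        fut = waiting.get(event["item"]["id"])
        if fut is not None and not fut.done():
            fut.set_result(event["item"])

    # Simulate the server replying 10 ms after the request is sent.
    fake_event = {"item": {"id": "item_123", "content": [{"transcript": "hello"}]}}
    loop.call_later(0.01, on_retrieved, fake_event)
    return await request("item_123")
```

In a notebook, run it with `await demo_pending_future()`. Without a resolver such as `on_retrieved`, the awaiting coroutine would wait indefinitely, which is why a timeout is a useful guard.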


## 5 · End‑to‑End Workflow Demonstration

Run the two cells below to launch an interactive session. Press Ctrl‑C to stop recording.

In [13]:
# --------------------------------------------------------------------------- #
# 🎤 Realtime session                                                          #
# --------------------------------------------------------------------------- #
async def realtime_session(model="gpt-4o-realtime-preview", voice="shimmer", enable_playback=True):
    """
    Main coroutine: connects to the Realtime endpoint, spawns helper tasks,
    and processes incoming events in a big async‑for loop.
    """
    state = ConversationState()  # Reset state for each run

    pcm_queue: asyncio.Queue[bytes] = asyncio.Queue()
    assistant_audio: List[bytes] = []

    # ----------------------------------------------------------------------- #
    # Open the WebSocket connection to the Realtime API                       #
    # ----------------------------------------------------------------------- #
    url = f"wss://api.openai.com/v1/realtime?model={model}"
    headers = {"Authorization": f"Bearer {openai.api_key}", "OpenAI-Beta": "realtime=v1"}

    # websockets >= 14 expects `additional_headers=` here instead of `extra_headers=`.
    async with websockets.connect(url, extra_headers=headers, max_size=1 << 24) as ws:
        # ------------------------------------------------------------------- #
        # Wait until server sends session.created                             #
        # ------------------------------------------------------------------- #
        while json.loads(await ws.recv())["type"] != "session.created":
            pass
        print("session.created ✅")

        # ------------------------------------------------------------------- #
        # Configure session: voice, modalities, audio formats, transcription  #
        # ------------------------------------------------------------------- #
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": voice,
                "modalities": ["audio", "text"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-transcribe"},
            },
        }))

        # ------------------------------------------------------------------- #
        # Launch background tasks: mic capture → queue → websocket            #
        # ------------------------------------------------------------------- #
        mic_task = asyncio.create_task(mic_to_queue(pcm_queue))
        upl_task = asyncio.create_task(queue_to_websocket(pcm_queue, ws))

        print("🎙️ Speak now (Ctrl‑C to quit)…")

        try:
            # ------------------------------------------------------------------- #
            # Main event loop: process incoming events from the websocket         #
            # ------------------------------------------------------------------- #
            async for event_raw in ws:
                event = json.loads(event_raw)
                etype = event["type"]

                # --------------------------------------------------------------- #
                # User just spoke ⇢ conversation.item.created (role = user)        #
                # --------------------------------------------------------------- #
                if etype == "conversation.item.created" and event["item"]["role"] == "user":
                    item = event["item"]
                    text = None
                    if item["content"]:
                        text = item["content"][0].get("transcript")
                    
                    state.history.append(Turn("user", event["item"]["id"], text))
                    
                    # If transcript not yet available, fetch it later
                    if text is None:
                        asyncio.create_task(fetch_full_item(ws, item["id"], state))

                # --------------------------------------------------------------- #
                # Transcript fetched ⇢ conversation.item.retrieved                 #
                # --------------------------------------------------------------- #
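                elif etype == "conversation.item.retrieved":
                    item = event["item"]
                    content = item["content"][0] if item.get("content") else {}
                    # Fill missing transcript in history
                    for t in state.history:
                        if t.item_id == item["id"]:
                            t.text = content.get("transcript")
                            break
                    # Wake the fetch_full_item() call waiting on this item
                    fut = state.waiting.get(item["id"])
                    if fut is not None and not fut.done():
                        fut.set_result(item)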

                # --------------------------------------------------------------- #
                # Assistant audio arrives in deltas                               #
                # --------------------------------------------------------------- #
                elif etype == "response.audio.delta":
                    assistant_audio.append(base64.b64decode(event["delta"]))

                # --------------------------------------------------------------- #
                # Assistant reply finished ⇢ response.done                        #
                # --------------------------------------------------------------- #
                elif etype == "response.done":
                    for item in event["response"]["output"]:
                        if item["role"] == "assistant":
                            txt = item["content"][0]["transcript"]
                            state.history.append(Turn("assistant", item["id"], txt))
                            # print(f"\n🤖 {txt}\n")
                    state.latest_tokens = event["response"]["usage"]["total_tokens"]
                    print(f"—— response.done  (window ≈{state.latest_tokens} tokens) ——")
                    print_history(state)
                    
                    # Fetch any still‑missing user transcripts
                    for turn in state.history:
                        if (turn.role == "user"
                            and turn.text is None
                            and turn.item_id not in state.waiting):
                            asyncio.create_task(
                                fetch_full_item(ws, turn.item_id, state)
                            )

                    # Playback collected audio once reply completes
                    if enable_playback and assistant_audio:
                        simpleaudio.play_buffer(b"".join(assistant_audio), 1, BYTES_PER_SAMPLE, SAMPLE_RATE_HZ)
                        assistant_audio.clear()

                    # Summarise if context too large – fire in background so we don't block dialogue
                    if state.latest_tokens >= SUMMARY_TRIGGER and len(state.history) > KEEP_LAST_TURNS and not state.summarising:
                        asyncio.create_task(summarise_and_prune(ws, state))

        except KeyboardInterrupt:
            print("\nStopping…")
        finally:
            mic_task.cancel()
            await pcm_queue.put(None)
            await upl_task

In [None]:
# Run the realtime session (this cell blocks until you stop it)
await realtime_session()

```raw
🎙️ Speak now (Ctrl‑C to quit)…
—— response.done  (window ≈228 tokens) ——
—— Conversation so far ———————————————
[user     ]   (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
——————————————————————————————————————————
—— response.done  (window ≈0 tokens) ——
—— Conversation so far ———————————————
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
——————————————————————————————————————————
—— response.done  (window ≈422 tokens) ——
—— Conversation so far ———————————————
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
[user     ]   (item_BTfMIbULFByNpzbBMjP18)
[assistant] Sure thing! Why don't scientists trust atoms? Because they make up everything!  (item_BTfMI7oH0KvSsGxEfSOTP)
——————————————————————————————————————————
—— response.done  (window ≈1897 tokens) ——
—— Conversation so far ———————————————
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
[user     ] Can you tell me a joke?  (item_BTfMIbULFByNpzbBMjP18)
[assistant] Sure thing! Why don't scientists trust atoms? Because they make up everything!  (item_BTfMI7oH0KvSsGxEfSOTP)
[user     ]   (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
——————————————————————————————————————————
—— response.done  (window ≈2138 tokens) ——
—— Conversation so far ———————————————
[user     ] Hey, what's up?  (item_BTfKeRGKfjQ976Ojmgpl6)
[assistant] Hey there! Not much, just here to help out. What's up with you?  (item_BTfKeuSJAlvr8WMqewiOo)
[user     ]   (item_BTfMI1PSozC8zYfxBGDEA)
[user     ] Can you tell me a joke?  (item_BTfMIbULFByNpzbBMjP18)
[assistant] Sure thing! Why don't scientists trust atoms? Because they make up everything!  (item_BTfMI7oH0KvSsGxEfSOTP)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ]   (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
——————————————————————————————————————————
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
——————————————————————————————————————————
—— response.done  (window ≈0 tokens) ——
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
——————————————————————————————————————————
—— response.done  (window ≈0 tokens) ——
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
[user     ]   (item_BTfTVHuCRBUzG82xQlx1o)
——————————————————————————————————————————
—— response.done  (window ≈2082 tokens) ——
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur commence par saluer et demander de l'aide, puis il demande une blague. L'assistant répond avec une blague sur les atomes, soulignant son rôle d'aide et d'interaction amicale.  (sum_001)
[user     ] Hey, یہ 500 word story?  (item_BTfOaHtFgPzBAcUJiZ6Jp)
[assistant] Once upon a time, in a cozy little village, there lived a baker named Lucy. Known for her magical touch with pastries, Lucy dreamed of creating a pie so extraordinary, it would put their village on the map.  One day, news of a royal pie contest reached the village. The winner would earn a place in the royal kitchen, and Lucy knew this was her chance. She experimented day and night, seeking the perfect recipe.  Finally, she crafted a pie with a golden crust, filled with enchanted berries that shimmered. The day of the contest arrived, and Lucy's pie dazzled the judges, winning first place!  Her victory brought fame to the village, and Lucy's bakery became a beloved destination for all. And so, Lucy's dream came true, one delicious pie at a time.  (item_BTfOaW9YNEXg1c7jAVP71)
[user     ] Any other funny things you can tell me?  (item_BTfQYIvLsqARzsSwUF5Wv)
[assistant] Absolutely! How about this: Why did the scarecrow win an award? Because he was outstanding in his field!  (item_BTfQYoNSn0Lv33LMrWle5)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
[user     ]   (item_BTfTVHuCRBUzG82xQlx1o)
[user     ]   (item_BTfTV6S7x7gTfHgBhZFst)
[assistant] The language of the first summary of our conversation was French.  (item_BTfTVSiLtjisYYInKT40R)
——————————————————————————————————————————
—— Conversation so far ———————————————
[assistant] Dans cette conversation, l'utilisateur demande de l'aide et une blague, tandis que l'assistant répond avec une blague sur les atomes et une autre sur un épouvantail, montrant son rôle d'interaction amicale et humoristique. L'utilisateur évoque également une histoire de 500 mots sur une boulangère nommée Lucy, qui remporte un concours de tartes, ce qui apporte la renommée à son village.  (sum_002)
[user     ]   (item_BTfTVqDdNbi2X05U8rHIs)
[user     ]   (item_BTfTVHuCRBUzG82xQlx1o)
[user     ] The summary of our conversation  (item_BTfTV6S7x7gTfHgBhZFst)
[assistant] The language of the first summary of our conversation was French.  (item_BTfTVSiLtjisYYInKT40R)
```

---

## 6 · Real‑World Applications & Extension Ideas
- **Customer‑support voicebots:** The running summary gives a human agent the gist at hand‑off without replaying the full transcript, which also limits how much raw conversation is exposed.
- **Multilingual assistants:** Change `SUMMARY_MODEL` and the summary system prompt to translate and condense context across languages.
- **Accessibility tools:** Real‑time captioning plus summarised notes for users who are deaf or hard of hearing.
- **Embedded devices:** Run voice activity detection (VAD) on the device and stream only speech segments to reduce bandwidth.
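
As a rough illustration of the multilingual idea, the summary language can be made a parameter of the request that is sent to the summarisation model. The sketch below is not the notebook's own helper: `Turn`, `build_summary_request`, the prompt wording, and the `"gpt-4o-mini"` default are all assumptions made for this example. It only builds a chat-style request body and does not call the API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: any chat-capable model name could be used here.
SUMMARY_MODEL = "gpt-4o-mini"


@dataclass
class Turn:
    """Hypothetical stand-in for one stored conversation turn."""
    role: str  # "user" or "assistant"
    text: str


def build_summary_request(turns: List[Turn], language: str = "English") -> Dict:
    """Build a chat-style request body asking for a summary in `language`."""
    # Skip turns whose transcript is empty, such as silent audio turns.
    transcript = "\n".join(
        f"{t.role}: {t.text}" for t in turns if t.text.strip()
    )
    system_prompt = (
        f"Summarise the conversation below in {language}, in one concise "
        "paragraph, so it can serve as context for later turns."
    )
    return {
        "model": SUMMARY_MODEL,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": transcript},
        ],
    }
```

Calling `build_summary_request(turns, language="Spanish")` would yield a payload whose system prompt asks for a Spanish summary. Keeping the language out of the hard-coded prompt means one code path can serve several locales.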

> **Extension Challenge:** Build a browser‑based UI with the Web Speech API so users don't need a Python environment.


## Next Steps & Further Reading