release: 0.5.5 — latency pass 1 (chunker + after_llm 3-tier + WS TTS) by nicolotognoni · Pull Request #81 · PatterAI/Patter

nicolotognoni · 2026-04-29T11:10:47Z

Summary

0.5.5 lands four user-visible additions that target the first-token-to-first-audio (TTFA) path in pipeline mode, plus the multilingual / Italian polish for the sentence chunker. All changes are additive or opt-in — existing 0.5.4 callers keep their current behaviour unchanged.

The release is grounded in a long-form research pass (ElevenLabs latency posts, LiveKit Agents, Pipecat, Cartesia, Daily, Vapi/Retell production benchmarks) and a follow-up review by 11 parallel review agents — all CRITICAL / HIGH findings are folded into the per-feature commits.

Commits

| Commit | Scope |
|---|---|
| `aa252fa` | `test(parity)`: 61-case cross-SDK fixture + standalone Py↔TS runner |
| `c729a8d` | `feat(chunker)`: IT/EN abbreviations, multilingual terminators, opt-in aggressive first-flush |
| `184f820` | `feat(hooks)`: `after_llm` 3-tier API + deprecated legacy callable adapter |
| `e3b7dd2` | `feat(tts)`: `ElevenLabsWebSocketTTS` opt-in low-latency variant |
| `4e7eb06` | `release: 0.5.5`: version bump + CHANGELOG |
| `5ed6b5c` | `docs`: Mintlify pages for the four additions |

Highlights

Sentence chunker

35+ new abbreviations: EN (vs, etc, Gen, Sen, Rep, Lt, Cpt, Capt, Col, Cmdr, Adm, No, Vol, pp, cf, ca, op, Mt, Hwy, Rt, Pl, Ave, Blvd, Sq) and IT (Sig, Sgr, Dott, Prof, Avv, Ing, Geom, Rag, Arch, On, Egr, Spett, Gent, Ill + ecc, cit, cap, sez, art, pag, fig, tab, cfr, vol, ed).
Multilingual terminator support: ASCII semicolon, Unicode ellipsis, full-width Japanese, Hindi/Devanagari (। ॥), Arabic (؟ ؛ ۔ ؏), Khmer (។ ៕), Burmese (။), Armenian (։), Ethiopic (። ፧), Tibetan (༎ ༏).
All-caps name flush bug fix (Pipecat #1692): "...with RAMESH." no longer sits in the buffer forever.
Suffix-followed-by-starter pattern preserves the trailing period (Patter Inc. He left. keeps Inc.).
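
As a rough illustration of the two ideas above (a terminator character class built from a merged set, plus an abbreviation guard), here is a minimal sketch. The names `TERMINATORS`, `ABBREVIATIONS`, and `split_sentences` are hypothetical and are not the SDK's `SentenceChunker` internals. Only a small subset of the lists is included, and ellipses and the other guards are ignored.

```python
import re

# Hypothetical subset of the merged terminator set (Latin + non-Latin).
TERMINATORS = [".", "!", "?", ";", "।", "॥", "؟", "۔", "။", "։", "።"]
# Hypothetical subset of the abbreviation list (lower-case, without the dot).
ABBREVIATIONS = {"vs", "etc", "gen", "sen", "sig", "dott", "prof", "ecc"}

# Build the character class from the merged set, so supporting a new script
# only requires a new list entry.
_STOP = re.compile("[" + "".join(re.escape(t) for t in TERMINATORS) + "]")


def split_sentences(text: str) -> list[str]:
    """Split on any terminator, skipping periods that end a known abbreviation."""
    sentences, start = [], 0
    for match in _STOP.finditer(text):
        if match.group() == ".":
            words = text[start:match.start()].split()
            if words and words[-1].lower() in ABBREVIATIONS:
                continue  # "Dott." or "vs." is not a sentence boundary
        piece = text[start:match.end()].strip()
        if piece:
            sentences.append(piece)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With this sketch, `split_sentences("Il Dott. Rossi arriva. Bene!")` keeps `Dott.` inside the first sentence, and a Hindi danda ends a sentence the same way a period does.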

Opt-in `aggressiveFirstFlush`

New Agent.aggressive_first_flush: bool = False (Python) / AgentOptions.aggressiveFirstFlush?: boolean (TypeScript). Default OFF.
When enabled, emits the first clause of each turn on a soft punctuation boundary (,, em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA on the first sentence.
Italian (language="it") hard-disables the feature regardless of caller preference (decimal-comma + dot-thousands inversion would split mid-number).
8 guards prevent regressions on decimals, thousands separators, currency, JSON, ellipsis, open delimiters, comma-before-quote, sub-token ambiguity.
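
The interaction between the soft-boundary flush and its guards can be sketched as follows. This is an assumption-laden illustration, not the shipped logic: `first_clause`, `SOFT_BOUNDARIES`, and `MIN_LEN` are hypothetical names, and only four of the eight guards (min-length, digit-adjacent comma, open delimiters, sub-token ambiguity) are modelled.

```python
from typing import Optional

SOFT_BOUNDARIES = {",", "\u2014", "\u2013"}  # comma, em-dash, en-dash
MIN_LEN = 40  # mirrors the ~40 char threshold described above


def first_clause(buffer: str, language: str = "en") -> Optional[str]:
    """Return the first clause if it looks safe to flush early, else None."""
    if language == "it":
        return None  # hard-disabled: the comma is the decimal separator
    for i, ch in enumerate(buffer):
        if ch not in SOFT_BOUNDARIES or i + 1 < MIN_LEN:
            continue  # not a soft boundary, or the clause is still too short
        if i + 1 >= len(buffer):
            return None  # sub-token ambiguity: wait for one more character
        if ch == "," and buffer[i - 1].isdigit() and buffer[i + 1].isdigit():
            continue  # decimal comma or thousands separator, e.g. 1,000
        head = buffer[: i + 1]
        if head.count("(") != head.count(")") or head.count('"') % 2:
            continue  # open paren or quote: do not split inside it
        return head
    return None
```

A caller would typically try `first_clause` only once per turn and fall back to normal sentence-boundary flushing when it returns `None`.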

`after_llm` 3-tier API

New shape: { onChunk, onSentence, onResponse }. Each tier optional, sync or async accepted.
- onChunk (sync, ~0 ms) — per-token transform applied inline.
- onSentence (async, 50–300 ms) — runs between chunker and TTS. Returning null keeps original; "" drops the sentence.
- onResponse (async, 500 ms – 2 s) — full-response rewrite. Blocks streaming TTS. Use only when sentence-level rewrite is insufficient.
Legacy callable (text, ctx) => str still works → mapped to onResponse with one-shot PatterDeprecationWarning (Python — subclass of both DeprecationWarning and UserWarning so it surfaces by default in library code) or console.warn (TypeScript). Removal scheduled for v0.7.0.
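
To make the return semantics of the sentence tier concrete, here is a small self-contained sketch. The helper `apply_sentence_hook` and the example hooks are hypothetical; the SDK's real hook types and call sites may differ. Python's `None` stands in for `null`.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical hook shape for this sketch.
OnSentence = Callable[[str], Awaitable[Optional[str]]]


async def apply_sentence_hook(sentence: str, on_sentence: Optional[OnSentence]) -> Optional[str]:
    """Return the text to synthesise, or None when the sentence is dropped."""
    if on_sentence is None:
        return sentence
    result = await on_sentence(sentence)
    if result is None:
        return sentence  # None keeps the original sentence
    if result == "":
        return None      # empty string drops the sentence
    return result


def strip_markdown(token: str) -> str:
    """Chunk tier: cheap, synchronous, per-token transform."""
    return token.replace("*", "")


async def redact(sentence: str) -> Optional[str]:
    """Sentence tier: rewrite, keep (None), or drop (empty string)."""
    if "secret" in sentence:
        return ""
    if "555-0100" in sentence:
        return sentence.replace("555-0100", "[redacted]")
    return None


async def demo() -> list[str]:
    # Each toy token is already a full sentence, so no chunker is needed here.
    tokens = ["Call **555-0100** now. ", "This is secret. ", "Goodbye."]
    spoken = []
    for token in tokens:
        text = await apply_sentence_hook(strip_markdown(token).strip(), redact)
        if text is not None:
            spoken.append(text)
    return spoken
```

Running `asyncio.run(demo())` rewrites the first sentence, drops the second, and passes the third through unchanged.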

`ElevenLabsWebSocketTTS`

New opt-in TTS class targeting wss://api.elevenlabs.io/v1/text-to-speech/{voice}/stream-input instead of HTTP /stream. Saves ~50 ms HTTP setup and avoids TLS cold-start per utterance.
Drop-in API: same synthesize() / synthesizeStream() signature, same for_twilio() / for_telnyx() factories, same default model eleven_flash_v2_5.
auto_mode=true default. inactivity_timeout=60 default. eleven_v3* rejected with a clear error.
Per-utterance lifecycle (per-session pooling on the roadmap).
Resilience hardening: 5 s connect timeout, 30 s per-frame timeout, raises ElevenLabsTTSError on server error, best-effort EOS in finally, audio frame size cap 512 KB, log-string sanitisation, api_key private with read-only property, all WS listeners removed in finally.
The HTTP ElevenLabsTTS class is untouched — both transports coexist.
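
Two of the hardening items above (prefix-based `eleven_v3*` rejection and log-string sanitisation) are simple enough to sketch. The function names `validate_model` and `sanitise_for_log` are hypothetical. The 200-character limit comes from the commit message further down, and the exception type is an assumption.

```python
import re

MAX_LOG_CHARS = 200  # truncation limit quoted in the commit message


def validate_model(model_id: str) -> str:
    """Reject any model id starting with eleven_v3 (covers preview/alpha variants)."""
    if model_id.startswith("eleven_v3"):
        raise ValueError(
            f"{model_id!r} is not supported on the WebSocket endpoint; use the HTTP class"
        )
    return model_id


def sanitise_for_log(message: str) -> str:
    """Strip CR/LF/NUL and truncate so server text cannot forge extra log lines."""
    return re.sub(r"[\r\n\x00]", "", message)[:MAX_LOG_CHARS]
```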

Cross-SDK parity infrastructure

New tests/parity/sentence_chunker_parity.py runner + 61-case fixture covering EN / IT / CJK / Hindi / Arabic / Khmer / Burmese / Armenian / Ethiopic scripts. Verifies Python and TypeScript chunkers produce identical sentence streams.

Backward compatibility

Zero breaking changes for 0.5.4 callers. The chunker's expanded terminator set may emit slightly different sentence boundaries on responses that previously relied on the old behaviour (e.g. text containing Hindi । now flushes correctly), but the cross-SDK parity fixture documents every behaviour change.

Test plan

Python unit + integration green: 1011 PASS (unit) + 53 PASS (integration), 7 skipped
TypeScript unit green: 1163 PASS / 67 files (e2e Playwright excluded — pre-existing gating)
Cross-SDK chunker parity: 53 PASS / 8 XFAIL (documented quirks/regressions) / 0 FAIL on the 61-case fixture
TS build green (cjs + esm + dts)
All examples in examples/ import cleanly; no deprecation warnings on import
11 parallel review agents — all CRITICAL / HIGH findings folded into the per-feature commits
Live Twilio TTFA benchmark (deferred to a follow-up — see TODO)

Follow-ups (deferred to separate PRs)

Phase 4 — parallel TTS queue with N+1 prefetch (10 invariants + 14 race-condition tests required)
pipecat-ai/smart-turn integration (BSD-2, 23 languages, 8 MB ONNX) as a TurnAnalyzer sitting above Silero VAD
NLTK Punkt Italian + MarkdownTextFilter + SkipTagsAggregator (Pipecat ports, M-effort)
Live bench with the env-configured Twilio number to validate the latency claims in the CHANGELOG
elevenlabsWs(...) factory string helper for parity with elevenlabs(...)
Residual MEDIUM/LOW polish from the 11-agent review (regex pre-compilation, chunk_length_schedule validation, NOTICE.md attribution, etc.)

Add a 61-case fixture documenting expected sentence-chunker output for every supported edge case across English, Italian, CJK, Hindi, Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts. Each case carries the ideal `expected_sentences` plus an optional `current_behavior` field that documents known regressions / by-design quirks so the runner can xfail them without blocking CI. Standalone runner (`sentence_chunker_parity.py`) executes each case through the Python `SentenceChunker`, spawns `node sentence_chunker_shim.js` for the TypeScript equivalent, and compares emissions case-by-case. Self-contained — does not depend on the main `tests/parity/run.py` runner (which currently fails on the recent `patter` -> `getpatter` package rename). Result on the current main branch: 53 PASS / 8 XFAIL / 0 FAIL / 0 PARITY_FAIL — Python and TypeScript chunkers produce identical sentence streams for every covered case.
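
The PASS / XFAIL / FAIL / PARITY_FAIL outcomes could be derived from the two fixture fields roughly as follows. This is a sketch of one plausible classification rule. The function `classify` and the exact precedence of the checks are assumptions, not the runner's actual code.

```python
def classify(case: dict, py_out: list[str], ts_out: list[str]) -> str:
    """Classify one fixture case from the Python and TypeScript emissions."""
    if py_out != ts_out:
        return "PARITY_FAIL"  # the two SDKs disagree with each other
    if py_out == case["expected_sentences"]:
        return "PASS"
    if py_out == case.get("current_behavior"):
        return "XFAIL"        # documented quirk, does not block CI
    return "FAIL"


# Hypothetical fixture case, for illustration only.
example_case = {
    "expected_sentences": ["Hello.", "World."],
    "current_behavior": ["Hello. World."],
}
```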

…ive first-flush

Three layered improvements to ``SentenceChunker`` (parity Py↔TS), all additive, with no breaking change to the default behaviour:

**Italian + English abbreviations** (Phase 1, 7)

* Prefix list adds Sig, Sgr, Dott, Prof, Avv, Ing, Geom, Rag, Arch, On, Egr, Spett, Gent, Ill (Italian honorifics) plus Gen, Sen, Rep, Lt, Cpt, Capt, Col, Cmdr, Adm (Pipecat NLTK Punkt).
* Suffix list adds ecc, cit, cap, sez, art, pag, fig, tab, cfr, vol, ed (Italian) plus vs, etc, No, Vol, pp, cf, ca, op, Mt, Hwy, Rt, Pl, Ave, Blvd, Sq (Pipecat).
* Suffix-followed-by-starter pattern now preserves the trailing period (e.g. ``Patter Inc. He left.`` keeps ``Inc.`` instead of dropping it).
* All-caps name fix (Pipecat #1692): the maybe-short-flush gate-5 acronym guard previously blocked any uppercase-preceded period, so ``"...with RAMESH."`` would never flush. Now only purely uppercase ASCII words ≤3 chars (U/US/USA/NATO patterns) are treated as acronyms.

**Multilingual terminator support** (Phase 7)

* Added ASCII semicolon ``;``, Unicode ellipsis ``…``, full-width semicolon/period/Japanese half-width to the terminator set.
* Ported Pipecat's ``UNAMBIGUOUS_NON_LATIN_TERMINATORS`` (BSD-2): Hindi Devanagari ``। ॥``, Arabic ``؟ ؛ ۔ ؏``, Khmer ``។ ៕``, Burmese ``။``, Armenian ``։``, Ethiopic ``። ፧``, Tibetan ``༎ ༏``.
* Final ``<stop>`` regex builds its character class from the merged set.

**Opt-in aggressive first-clause flush** (Phase 2)

* New constructor option ``aggressive_first_flush`` (Python) / ``aggressiveFirstFlush`` (TypeScript). **Default OFF.**
* When enabled, emits the first clause of the response on a soft punctuation boundary (``,``, em-dash, en-dash) once the buffer reaches ``aggressive_first_min_len`` (default 40 chars). Saves 200–500 ms TTFA on the first sentence of each turn.
* Eight guards prevent regressions on the safe-but-aggressive path: min-length, decimal-comma (``3,14``), thousands-separator (``1,000,000``), currency (``$1,000``, ``€1.000,50``), balanced parens/brackets/braces/double-quotes (protects JSON), ellipsis (``...``, ``…``), comma-before-quote, sub-token ambiguity (requires one char after the terminator).
* Italian (``language="it"``) hard-disables the feature regardless of caller preference: Italian inverts the EN convention (``,`` decimal, ``.`` thousands), so a comma-flush would split mid-number.
* New ``Agent.aggressive_first_flush: bool = False`` field on the Python ``Agent`` model. TypeScript ``AgentOptions.aggressiveFirstFlush`` is shipped in the after_llm 3-tier commit alongside the rest of the ``types.ts`` surface.

Test coverage: +11 Python unit tests and +11 TypeScript unit tests for the aggressive first-flush feature, plus parity-fixture cases for RAMESH, Hindi danda, Arabic question mark, ASCII semicolon, Unicode ellipsis, vs./etc./Gen./Sen. abbreviations.

Sentence-chunker constants and abbreviation lists ported from Pipecat (BSD-2-Clause, Daily) and from the LiveKit-derived regex base (Apache-2.0).
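
The narrowed acronym guard from the all-caps name fix can be sketched as a one-line predicate. The name `is_short_acronym` is hypothetical. The sketch takes the "≤3 chars" limit literally, so under this reading a four-letter word such as NATO would not count as an acronym.

```python
def is_short_acronym(word: str) -> bool:
    """True only for purely uppercase ASCII words of at most 3 characters."""
    return 0 < len(word) <= 3 and word.isascii() and word.isalpha() and word.isupper()


def period_may_flush(text_before_period: str) -> bool:
    """A trailing period may flush unless the preceding word is a short acronym."""
    words = text_before_period.split()
    return not (words and is_short_acronym(words[-1]))
```

Under this rule `"...with RAMESH"` is allowed to flush, while `"made in the USA"` still waits for more context.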

…pter

The ``after_llm`` pipeline hook used to be a single callable ``(text, ctx) → str`` that received the full LLM response only after the stream completed. Buffering the entire response added 500 ms – 2 s of TTFA for any agent that configured the hook.

This commit introduces a 3-tier API that lets callers pick the right latency budget for their transform:

* ``onChunk`` (sync, ~0 ms): per-token transform applied inline before the stream-handler ever sees the token. Use for: regex replace, markdown strip, profanity char-swap. Does NOT block streaming.
* ``onSentence`` (async, 50–300 ms): runs between the sentence chunker and TTS. Returns rewritten sentence, ``null`` to keep the original, ``""`` to drop the sentence. Use for: PII redaction, persona overlay, refusal swap. Adds latency only on the rewritten sentence, not the full turn.
* ``onResponse`` (async, 500 ms – 2 s): full-response rewrite that buffers the LLM stream then runs once. **Blocks streaming TTS.** Use only when sentence-level rewrite is insufficient (e.g. structured output validation that needs the full text).

Backward compatibility
----------------------

The legacy single callable ``afterLlm: (text, ctx) => string`` still works and is mapped to ``onResponse`` with a one-shot ``PatterDeprecationWarning`` (Python: subclass of both ``DeprecationWarning`` and ``UserWarning`` so it surfaces by default in library code) or ``console.warn`` (TypeScript). Removal scheduled for v0.7.0.

Detection in TypeScript uses ``typeof hook === 'function'`` (not ``hook.length`` arity sniffing, because that pattern breaks under minifiers and arrow defaults). Detection in Python uses ``callable(hook)`` plus ``_has_tier_attrs(hook)`` to disambiguate from object-form hooks.

Wire-up
-------

* ``llm_loop.py`` / ``llm-loop.ts``: ``has_after_llm_response`` (and the legacy callable that maps to it) gates token buffering. ``has_after_llm_chunk`` triggers per-token transform inline before yield.
* ``stream_handler.py`` / ``stream-handler.ts``: applies ``has_after_llm_sentence`` between the chunker emit and the TTS synthesise call. Both the streaming-LLM path and the non-streaming ``_speakFinalResponse`` path apply the hook for parity.
* The same ``stream_handler`` change wires ``Agent.aggressive_first_flush`` / ``AgentOptions.aggressiveFirstFlush`` into the chunker constructor (Phase 2 wire-up that needed ``stream_handler`` and ``types.ts`` to land here alongside the hook changes; separating them would have required interactive patch staging on the same hunks).

Test coverage
-------------

* +11 Python pytest cases under ``TestAfterLlmThreeTier`` covering: no hook pass-through, legacy callable maps to ``on_response`` with deprecation warning, dict / Protocol / object forms, drop-by-empty, fail-open on hook exception, type confusion (non-string return), legacy alias methods (``has_after_llm`` / ``run_after_llm``) preserved.
* +9 TypeScript Vitest cases covering the equivalent surface.
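
The Python-side detection and one-shot warning described under "Backward compatibility" might look roughly like the sketch below. It reuses the names `PatterDeprecationWarning` and `_has_tier_attrs` from the commit message, but the bodies are illustrative stand-ins. `normalise_hook`, `TIER_KEYS`, and the module-level `_warned` flag are assumptions.

```python
import warnings
from typing import Any, Callable, Dict

TIER_KEYS = ("on_chunk", "on_sentence", "on_response")
_warned = False  # module-level flag so the warning fires once per process


class PatterDeprecationWarning(DeprecationWarning, UserWarning):
    """Stand-in for the SDK class of the same name."""


def _has_tier_attrs(hook: Any) -> bool:
    """Object-form hooks expose at least one tier as an attribute."""
    return any(hasattr(hook, key) for key in TIER_KEYS)


def normalise_hook(hook: Any) -> Dict[str, Callable]:
    """Map dict, object, or legacy callable hooks onto a tier dict."""
    global _warned
    if isinstance(hook, dict):
        return {k: v for k, v in hook.items() if k in TIER_KEYS}
    if _has_tier_attrs(hook):
        return {k: getattr(hook, k) for k in TIER_KEYS if hasattr(hook, k)}
    if callable(hook):
        if not _warned:
            warnings.warn(
                "single-callable after_llm is deprecated; use on_response",
                PatterDeprecationWarning,
                stacklevel=2,
            )
            _warned = True
        return {"on_response": hook}  # legacy form maps to the response tier
    raise TypeError("unsupported after_llm hook")
```

Checking for tier attributes before `callable(hook)` matters because an object-form hook may itself define `__call__`.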

New TTS provider that targets ElevenLabs' streaming-input WebSocket endpoint (``/v1/text-to-speech/{voice}/stream-input``) instead of the HTTP ``/stream`` endpoint used by ``ElevenLabsTTS``. Saves ~50 ms HTTP request setup per utterance and avoids the TLS cold-start handshake on bursty calls.

Drop-in API matching ``ElevenLabsTTS``:

* Same ``synthesize`` (Python) / ``synthesizeStream`` (TypeScript) signature returning ``AsyncGenerator<bytes>``.
* Same ``for_twilio()`` / ``for_telnyx()`` factories.
* Same default model ``eleven_flash_v2_5``.
* Top-level export ``getpatter.ElevenLabsWebSocketTTS`` (Py) / ``import { ElevenLabsWebSocketTTS } from "getpatter"`` (TS).

Defaults
--------

* ``auto_mode=true``: server picks chunk timing.
* ``inactivity_timeout=60`` (range 5–180).
* Per-utterance lifecycle. Documented as a known trade-off vs Pipecat's per-session pool (pooling is on the roadmap for v0.6.x).
* ``eleven_v3*`` is rejected at construction with a clear error: the WS stream-input endpoint does not support v3; users must fall back to the HTTP class.

Resilience contract (post-review hardening)
-------------------------------------------

* **Connect timeout 5 s** (Pipecat-aligned, was 15 s in earlier drafts) bounds DNS + TLS handshake.
* **Per-frame receive timeout 30 s** prevents the generator hanging forever on a stalled server.
* **Permanent error handler attached BEFORE the open await**: closes a window where an error fired after the once-listener resolved would surface as ``uncaughtException`` in Node.
* **All ws listeners removed in ``finally``**: no closure leak past socket close.
* **Server ``error`` raises ``ElevenLabsTTSError``** instead of silently completing, so the caller can distinguish "synthesis succeeded with empty text" from "synth failed mid-stream".
* **Best-effort EOS ``{"text":""}`` in ``finally``**: tells ElevenLabs to stop billing for unconsumed audio. Sending it immediately after ``flush:true`` (the previous draft) risked truncating tail audio under ``auto_mode=true``.
* **Audio frame size cap 512 KB** prevents OOM via malicious / malformed base64 (real frames are ~75 KB decoded).
* **Server error string sanitised** before logging (strips CR/LF/NUL, truncates to 200 chars), which defends against log-line injection.
* **``api_key`` private** (``_api_key`` + read-only ``api_key`` property) so ``vars(tts)`` / dataclass-style introspection cannot surface the secret.
* **``eleven_v3`` prefix-based reject** also blocks ``eleven_v3_preview``, ``eleven_v3_alpha``.
* **Public wrapper exposes the full options surface** (``voice_settings``, ``language_code``, ``inactivity_timeout``, ``chunk_length_schedule``); earlier drafts dropped them.
* **Default voice consistency**: the public wrapper no longer overrides the provider class default. Both layers use Rachel (``21m00Tcm4TlvDq8ikWAM``) so direct-construct and wrapped-construct paths agree.

Public surface
--------------

* ``getpatter/providers/elevenlabs_ws_tts.py``: provider class ``ElevenLabsWebSocketTTS`` + ``ElevenLabsTTSError``.
* ``getpatter/tts/elevenlabs_ws.py``: wrapper class ``TTS`` re-exported as ``ElevenLabsWebSocketTTS`` from the package root.
* ``sdk-ts/src/providers/elevenlabs-ws-tts.ts`` + corresponding TypeScript wrapper at ``sdk-ts/src/tts/elevenlabs-ws.ts``.
* ``sdk-ts/src/providers/elevenlabs-tts.ts``: ``resolveVoiceId`` promoted from module-private to public export so the WS variant can share the voice-name → voice-id resolution table without duplicating the lookup map.
* ``sdk-py/getpatter/__init__.py`` and ``sdk-ts/src/index.ts``: top-level re-exports.

Test coverage
-------------

* +20 Python pytest cases (construction, factories, URL build, send sequence, ``isFinal`` termination, voice settings in init, ``chunk_length_schedule`` only with ``auto_mode=False``, ``eleven_v3`` rejection + variants, env-var resolution).
* +11 TypeScript Vitest cases covering the equivalent surface, including a faked ``ws`` module that records sent frames.

The HTTP ``ElevenLabsTTS`` class is **untouched**: both transports coexist and the user picks per-call.
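
The audio frame size cap from the resilience contract can be illustrated with a short decoder. This is a sketch only: `decode_audio_frame` and `FrameTooLargeError` are hypothetical names, and the provider's real check may measure the payload differently.

```python
import base64
import binascii

MAX_FRAME_BYTES = 512 * 1024  # cap quoted in the commit message


class FrameTooLargeError(ValueError):
    """Hypothetical error type for this sketch."""


def decode_audio_frame(b64_audio: str) -> bytes:
    """Decode one base64 audio frame, refusing oversized or malformed payloads."""
    # 4 base64 chars encode 3 bytes, so estimate the decoded size before
    # allocating (the estimate can overshoot by up to 2 bytes due to padding).
    if len(b64_audio) * 3 // 4 > MAX_FRAME_BYTES:
        raise FrameTooLargeError("audio frame exceeds 512 KB cap")
    try:
        return base64.b64decode(b64_audio, validate=True)
    except binascii.Error as exc:
        raise ValueError("malformed base64 audio frame") from exc
```

Checking the encoded length first means an oversized frame is rejected without ever materialising the decoded bytes.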

Bump ``getpatter`` to 0.5.5 across both SDKs (Python ``pyproject.toml``, TypeScript ``package.json`` + ``package-lock.json``, and the SDK ``__version__`` / ``VERSION`` constants kept in sync).

CHANGELOG entry covers the four user-visible additions shipped in this release:

* Sentence chunker: IT/EN abbreviations + multilingual terminators + RAMESH-style all-caps flush bug fix (Pipecat #1692). Default behaviour unchanged for existing users.
* Opt-in ``aggressive_first_flush`` / ``aggressiveFirstFlush`` on ``Agent`` / ``AgentOptions``: emits the first clause of each turn on a soft-punctuation boundary (",", em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA. Italian hard-disabled (decimal-comma + dot-thousands inversion). 8 guards prevent regressions on decimals, currency, JSON, ellipsis, open-delimiters, comma-before-quote, sub-token ambiguity.
* New 3-tier ``after_llm`` API (``onChunk`` / ``onSentence`` / ``onResponse``). Legacy single-callable form still works (mapped to ``onResponse``) but emits a one-shot ``PatterDeprecationWarning`` / ``console.warn``. Removal: v0.7.0.
* New opt-in ``ElevenLabsWebSocketTTS`` class: drop-in replacement for ``ElevenLabsTTS`` (HTTP) using the ``stream-input`` WebSocket endpoint. Saves ~50 ms HTTP setup + TLS cold-start per utterance. Per-utterance lifecycle (per-session pooling on the roadmap).

Test totals after this release: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 / 8 XFAIL / 0 FAIL on a 61-case fixture spanning EN, IT, CJK, Hindi, Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts.

Cumulative review hardening from 11 parallel review agents (Python-reviewer, TypeScript-reviewer, provider-reviewer, sdk-parity, security-reviewer, code-reviewer, code-simplifier, refactor-cleaner, docs-sync, build-validator, examples-validator) is folded into the phase-specific commits. See the per-feature commits in this branch for the detailed CRITICAL / HIGH fix lists.

… flush

Document the four user-visible additions shipped in 0.5.5:

* **ElevenLabsWebSocketTTS**: new provider sub-pages ``docs/{python,typescript}-sdk/providers/elevenlabs-websocket.mdx``. What it is, why use it, ``for_twilio`` / ``for_telnyx`` factories, full constructor params table, ``eleven_v3*`` limitation, per-utterance lifecycle trade-off, ``ElevenLabsTTSError``. Both sub-pages added to the TTS group navigation in ``docs/docs.json``. Existing ``tts.mdx`` providers table updated with the new row plus a callout pointing at the WS variant.
* **``after_llm`` 3-tier API**: new "Pipeline Hooks" section in ``docs/{python,typescript}-sdk/events.mdx``: per-tier table for ``onChunk`` (sync, ~0 ms), ``onSentence`` (async, 50–300 ms), and ``onResponse`` (async, 500 ms – 2 s, blocks streaming). Return semantics (``null`` keep / ``""`` drop), legacy callable migration path with ``PatterDeprecationWarning`` (Python) / one-shot ``console.warn`` (TypeScript), removal in v0.7.0.
* **``aggressive_first_flush`` opt-in**: new row in the ``AgentOptions`` / ``Agent`` parameters tables in ``docs/{python,typescript}-sdk/agents.mdx`` and ``reference.mdx`` with the Italian hard-disable note. Python ``features.mdx`` adds a dedicated section with code example and the 8-guard summary.
* **Chunker improvements**: Python ``features.mdx`` documents the expanded EN abbreviations (``vs.``, ``etc.``, ``Gen.``, ``Sen.``), IT abbreviations (``Sig.``, ``Dott.``, ``S.p.A.``, ``ecc.``), and multilingual terminator support (Hindi / Arabic / Armenian / Ethiopic / Khmer / Burmese / Tibetan). TypeScript SDK has no chunker page so no equivalent change required.

``docs.json`` JSON validated end-to-end. No source / examples / CHANGELOG / NOTICE files touched.

mintlify · 2026-04-29T11:10:54Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
patter-06b046ce	🟢 Ready	View Preview	Apr 29, 2026, 11:11 AM


Six fixes uncovered by the 0.5.5 acceptance matrix run, ranging from a HIGH-severity onnxruntime-node version mismatch that blocks Silero VAD on macOS x86_64 to a misleading metric that makes healthy calls look slow.

**Bug #1 (HIGH): SileroVAD onnxruntime-node 1.24+ API drift**

* ``optionalDependencies.onnxruntime-node`` tightened from ``^1.18.0`` to ``~1.18.0``. The caret was resolving to 1.24.x where ``listSupportedBackends`` was removed and the prebuilt ``bin/`` layout drifted, so ``import('onnxruntime-node')`` failed on macOS x86_64.
* ``loadOnnxRuntime`` now classifies the underlying error (``missing`` / ``binding`` / ``api-drift`` / ``unknown``) and surfaces a targeted remedy plus the original error chain via ``Error.cause``. Previously the failure mode was hidden behind a single "could not be resolved" string.

**Bug #2 (MEDIUM): ElevenLabsConvAI agent_id error message**

* The env-var fallback already worked but the error message did not say *where* to get an agent ID from (the dashboard, not the API key). Updated both Python and TypeScript constructors to point users at https://elevenlabs.io/app/conversational-ai and reiterate that the agent ID is per-deployed-agent.
* Python ``ConvAI.__post_init__`` now raises when ``agent_id`` is empty (was silently passing through). TypeScript already did this, so the two SDKs are now in parity.

**Bug #3 (MEDIUM): ElevenLabs WS payment_required**

* New typed exception ``ElevenLabsPlanError`` (subclass of ``ElevenLabsTTSError``) raised when the WS endpoint returns ``payment_required``. Free / Starter plans now get a clear "upgrade or use the HTTP class (drop-in API)" message instead of an opaque ``ElevenLabsTTSError: ElevenLabs WS error: payment_required``.
* Detection is case-insensitive and matches both the exact server string and any ``payment_required`` substring.

**Bug #5 (MEDIUM): barge-in fragile in pipeline mode without VAD**

* On tunnel + speakerphone setups the agent's own TTS leaks into the inbound mic feed, STT transcribes it, and the legacy "always forward + bargeInThresholdMs" heuristic fails to fire the cancel, so the agent talks over the user.
* ``serve()`` now logs a one-shot warning at startup when ``agent.engine`` is undefined, ``agent.vad`` is undefined, and ``bargeInThresholdMs > 0``, recommending ``SileroVAD`` or ``bargeInThresholdMs: 0``. Both Python and TypeScript.

**Bug #6 (LOW): pipeline ``total_ms`` misleading on long utterances**

* ``total_ms`` spans the user's entire utterance (including pauses) because it includes ``stt_ms``, which itself measures STT-stream-open to transcript-finalisation. On a 4 s user turn ``total_ms`` reads ~5.5 s even though the agent's TTFA after end-of-speech is ~1.3-1.5 s, which is misleading as a p95 / SLO metric.
* New ``LatencyBreakdown.agent_response_ms`` field (Python + TypeScript). Computed as ``endpoint_ms + llm_ttft_ms + tts_ms`` when all three signals are available, ``undefined`` / ``None`` otherwise. This is the user-perceived latency dashboards should track.
* ``total_ms`` kept unchanged for backward compatibility.

**Bug #7 (HIGH): outbound TwiML races tunnel startup**

* The documented ``void phone.serve(...) → setTimeout → phone.call(...)`` pattern reads ``localConfig.webhookUrl`` while the cloudflared hostname is still resolving, producing ``wss://undefined/...`` in the dial TwiML and a Twilio 11100 call drop on answer.
* New ``phone.tunnelReady`` Promise (TS) / ``phone.tunnel_ready`` ``asyncio.Future`` (Python). Resolves to the public webhook hostname once ``serve()`` knows it (immediately for static webhookUrl, after ``startTunnel`` for ``tunnel: true``). Rejects if ``serve()`` fails before the hostname is known.
* Documented pattern is now ``await phone.tunnelReady`` instead of ``setTimeout(10_000)``: deterministic, no race.
* Same root-cause fix likely also addresses Bug #4 (intermittent WS upgrade race) which the acceptance run flagged as a related symptom.

Test totals after the fixes: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8 XFAIL / 0 FAIL on the 61-case fixture. No regressions.
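
The ``agent_response_ms`` rule from Bug #6 (sum of three signals, or nothing when any is missing) can be sketched as a computed property. The dataclass below is a hypothetical subset of ``LatencyBreakdown``; whether the SDK stores the value or derives it on access is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencyBreakdown:
    """Hypothetical subset of the fields named in the Bug #6 notes."""

    endpoint_ms: Optional[float] = None
    llm_ttft_ms: Optional[float] = None
    tts_ms: Optional[float] = None

    @property
    def agent_response_ms(self) -> Optional[float]:
        """endpoint_ms + llm_ttft_ms + tts_ms, or None if any signal is missing."""
        parts = (self.endpoint_ms, self.llm_ttft_ms, self.tts_ms)
        if any(p is None for p in parts):
            return None
        return sum(parts)
```

For example, `LatencyBreakdown(300, 700, 400).agent_response_ms` evaluates to 1400, while leaving any of the three fields unset yields `None`.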

…+ diagnostics

Three layered fixes targeting the intermittent "outbound call connects but never receives the WS upgrade" failure (Twilio 11100 on answer) documented in BUGS.md.

**Root cause A: StatusCallbackEvent encoding**

Twilio expects ``StatusCallbackEvent`` as a multi-value parameter (repeated keys), NOT a space-separated single value. The previous ``'initiated ringing answered completed'`` form triggered Twilio notification 21626 ("invalid statusCallbackEvents") on every outbound call, and on some ingestion paths also broke the answer-handler webhook, which is exactly the symptom that produced 11100.

* TypeScript: use ``params.append('StatusCallbackEvent', evt)`` four times so URLSearchParams emits repeated query keys.
* Python: pass the canonical twilio-python snake_case key ``status_callback_event`` as a list. twilio-python serialises it as the multi-value form Twilio expects.

**Root cause B: server-not-yet-listening race**

The previous ``phone.tunnelReady`` (TS) / ``phone.tunnel_ready`` (Py) signal resolves as soon as the cloudflared hostname is known, BEFORE the embedded HTTP / WS server has finished initialising. ``phone.call`` placed immediately afterwards races the Twilio Media Streams upgrade and produces a half-ready route → 11100.

New ``phone.ready`` (TS Promise / Py Future) resolves only after:

1. Tunnel hostname known
2. Carrier auto-config complete
3. EmbeddedServer in ``listen`` state (TS) / uvicorn ``started`` flag set (Py)

Outbound pattern is now:

```ts
void phone.serve({ agent, tunnel: true });
await phone.ready; // <-- safe for outbound
await phone.call(...);
```

``tunnelReady`` is kept as a separate signal for integrations that only need the hostname (e.g. webhook registration), with a docstring note pointing at ``ready`` for outbound use.

**Root cause C: opaque diagnostics**

On call drop the user could not tell whether Twilio rejected the dial, the tunnel resolved late, or the WS upgrade failed. The new ``phone.call`` flow logs the Twilio notifications URL on every outbound call ("check here if the call drops with no audio") so self-diagnosis does not require learning the Twilio API.

**Test parity**

Updated ``test_twilio_statuscallback_always_registered`` to read the new ``status_callback_event`` key (with fallback to the legacy ``StatusCallbackEvent`` for forward compat). Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files. No regressions.
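
The difference between the space-separated and repeated-key encodings in root cause A is easy to see with the standard library alone. This sketch only shows the form encoding. It makes no Twilio API call, and the variable names are illustrative.

```python
from urllib.parse import urlencode

EVENTS = ["initiated", "ringing", "answered", "completed"]

# Legacy form: one key whose value is a space-separated string.
legacy = urlencode({"StatusCallbackEvent": " ".join(EVENTS)})

# Multi-value form: the same key repeated once per event.
repeated = urlencode([("StatusCallbackEvent", evt) for evt in EVENTS])

print(legacy)
print(repeated)
```

The first string carries a single `StatusCallbackEvent=initiated+ringing+answered+completed` pair, while the second repeats `StatusCallbackEvent=` four times, which is the shape the commit message says Twilio expects.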

Resolves doc conflicts so the release branch can be landed:

- CHANGELOG: keep both 0.5.5 (this branch) and the canonical 0.5.4 entry from main
- docs/python-sdk/events.mdx: place EventBus section above the new 3-tier PipelineHooks; remove the older single-callable hook description (covered by the migration section)
- docs/python-sdk/tts.mdx: keep both the telephony-factory paragraph and the WebSocket variant note
- docs/typescript-sdk/events.mdx + tts.mdx: same treatment as the Python pages

Merges in the notebook tutorial series and 0.5.3/0.5.4 docs alignment from main; no SDK source code conflicts.

…eployment DEVLOG.md and superpowers/specs/2026-04-24-patter-feature-test-notebook-design.md fail Mintlify's MDX parser (filenames begin with digits, which MDX treats as JSX expressions). Skip both paths so the docs site can deploy.

- Remove docs/DEVLOG.md and docs/superpowers/ (internal planning notes, no value to public docs site). The .mintignore introduced in the previous commit is no longer needed and is removed too.
- sdk-ts/src/client.ts: attach a no-op `.catch` to `_ready` and `_tunnelReady` so callers that never await them don't trigger Node's unhandled-rejection warning when serve() validates inputs synchronously. Awaiters of `phone.ready` / `phone.tunnelReady` still see the rejection.
- sdk-ts/package-lock.json: add trailing newline (end-of-file-fixer).
- examples/notebooks/**.ipynb: nbstripout pass to clear cell outputs and execution counts, matching the repo convention enforced by .pre-commit-config.yaml.

PR #79 added an optional Docker launcher under examples/notebooks/python/ and re-touched all 24 .ipynb files (kernel ID renumbering, source-array reshape).

Resolution:

- examples/notebooks/python/**.ipynb + examples/notebooks/typescript/**.ipynb: take the main version. Our only prior contribution to these files was the nbstripout pass, which is now re-applied via pre-commit (no behaviour or content of ours is lost).
- docs/DEVLOG.md + docs/superpowers/plans/2026-04-24-...md: keep deletion. Both were removed from this branch as out-of-scope for the public docs site; no merge-back.

nicolotognoni added 6 commits April 29, 2026 11:37

mintlify Bot deployed to staging - docs April 29, 2026 11:11 View deployment

nicolotognoni added 4 commits April 29, 2026 15:46

mintlify Bot deployed to staging - docs May 1, 2026 05:57 View deployment

mintlify Bot deployed to staging - docs May 1, 2026 06:04 View deployment

mintlify Bot deployed to staging - docs May 1, 2026 06:13 View deployment

nicolotognoni merged commit 7ac0282 into main May 1, 2026
15 checks passed

nicolotognoni deleted the release/0.5.5-latency-pass-1 branch May 8, 2026 14:56

nicolotognoni merged 12 commits into main from release/0.5.5-latency-pass-1
