
release: 0.5.5 — latency pass 1 (chunker + after_llm 3-tier + WS TTS) #81

Merged: nicolotognoni merged 12 commits into main from release/0.5.5-latency-pass-1 on May 1, 2026

@nicolotognoni
Collaborator

Summary

0.5.5 lands four user-visible additions that target the first-token-to-first-audio (TTFA) path in pipeline mode, plus the multilingual / Italian polish for the sentence chunker. All changes are additive or opt-in — existing 0.5.4 callers keep their current behaviour unchanged.

The release is grounded in a long-form research pass (ElevenLabs latency posts, LiveKit Agents, Pipecat, Cartesia, Daily, Vapi/Retell production benchmarks) and a follow-up review by 11 parallel review agents — all CRITICAL / HIGH findings are folded into the per-feature commits.

Commits

| Commit | Scope |
| --- | --- |
| aa252fa | test(parity): 61-case cross-SDK fixture + standalone Py↔TS runner |
| c729a8d | feat(chunker): IT/EN abbreviations, multilingual terminators, opt-in aggressive first-flush |
| 184f820 | feat(hooks): after_llm 3-tier API + deprecated legacy callable adapter |
| e3b7dd2 | feat(tts): ElevenLabsWebSocketTTS opt-in low-latency variant |
| 4e7eb06 | release: 0.5.5, version bump + CHANGELOG |
| 5ed6b5c | docs: Mintlify pages for the four additions |

Highlights

Sentence chunker

  • 35+ new abbreviations: EN (vs, etc, Gen, Sen, Rep, Lt, Cpt, Capt, Col, Cmdr, Adm, No, Vol, pp, cf, ca, op, Mt, Hwy, Rt, Pl, Ave, Blvd, Sq) and IT (Sig, Sgr, Dott, Prof, Avv, Ing, Geom, Rag, Arch, On, Egr, Spett, Gent, Ill + ecc, cit, cap, sez, art, pag, fig, tab, cfr, vol, ed).
  • Multilingual terminator support: ASCII semicolon, Unicode ellipsis, full-width Japanese, Hindi/Devanagari (। ॥), Arabic (؟ ؛ ۔ ؏), Khmer (។ ៕), Burmese (။), Armenian (։), Ethiopic (። ፧), Tibetan (༎ ༏).
  • All-caps name flush bug fix (Pipecat #1692): "...with RAMESH." no longer sits in the buffer forever.
  • Suffix-followed-by-starter pattern preserves the trailing period (Patter Inc. He left. keeps Inc.).

Opt-in aggressiveFirstFlush

  • New Agent.aggressive_first_flush: bool = False (Python) / AgentOptions.aggressiveFirstFlush?: boolean (TypeScript). Default OFF.
  • When enabled, emits the first clause of each turn on a soft punctuation boundary (",", em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA on the first sentence.
  • Italian (language="it") hard-disables the feature regardless of caller preference (decimal-comma + dot-thousands inversion would split mid-number).
  • 8 guards prevent regressions on decimals, thousands separators, currency, JSON, ellipsis, open delimiters, comma-before-quote, sub-token ambiguity.

after_llm 3-tier API

  • New shape: { onChunk, onSentence, onResponse }. Each tier optional, sync or async accepted.
    • onChunk (sync, ~0 ms) — per-token transform applied inline.
    • onSentence (async, 50–300 ms) — runs between chunker and TTS. Returning null keeps original; "" drops the sentence.
    • onResponse (async, 500 ms – 2 s) — full-response rewrite. Blocks streaming TTS. Use only when sentence-level rewrite is insufficient.
  • Legacy callable (text, ctx) => str still works → mapped to onResponse with one-shot PatterDeprecationWarning (Python — subclass of both DeprecationWarning and UserWarning so it surfaces by default in library code) or console.warn (TypeScript). Removal scheduled for v0.7.0.

ElevenLabsWebSocketTTS

  • New opt-in TTS class targeting wss://api.elevenlabs.io/v1/text-to-speech/{voice}/stream-input instead of HTTP /stream. Saves ~50 ms HTTP setup and avoids TLS cold-start per utterance.
  • Drop-in API: same synthesize() / synthesizeStream() signature, same for_twilio() / for_telnyx() factories, same default model eleven_flash_v2_5.
  • auto_mode=true default. inactivity_timeout=60 default. eleven_v3* rejected with a clear error.
  • Per-utterance lifecycle (per-session pooling on the roadmap).
  • Resilience hardening: 5 s connect timeout, 30 s per-frame timeout, raises ElevenLabsTTSError on server error, best-effort EOS in finally, audio frame size cap 512 KB, log-string sanitisation, api_key private with read-only property, all WS listeners removed in finally.
  • The HTTP ElevenLabsTTS class is untouched — both transports coexist.

Cross-SDK parity infrastructure

  • New tests/parity/sentence_chunker_parity.py runner + 61-case fixture covering EN / IT / CJK / Hindi / Arabic / Khmer / Burmese / Armenian / Ethiopic scripts. Verifies Python and TypeScript chunkers produce identical sentence streams.

Backward compatibility

Zero breaking changes for 0.5.4 callers. The chunker's expanded terminator set may emit slightly different sentence boundaries on responses that previously relied on the old behaviour (e.g. text containing Hindi now flushes correctly), but the cross-SDK parity fixture documents every behaviour change.

Test plan

  • Python unit + integration green: 1011 PASS (unit) + 53 PASS (integration), 7 skipped
  • TypeScript unit green: 1163 PASS / 67 files (e2e Playwright excluded — pre-existing gating)
  • Cross-SDK chunker parity: 53 PASS / 8 XFAIL (documented quirks/regressions) / 0 FAIL on the 61-case fixture
  • TS build green (cjs + esm + dts)
  • All examples in examples/ import cleanly; no deprecation warnings on import
  • 11 parallel review agents — all CRITICAL / HIGH findings folded into the per-feature commits
  • Live Twilio TTFA benchmark (deferred to a follow-up — see TODO)

Follow-ups (deferred to separate PRs)

  • Phase 4 — parallel TTS queue with N+1 prefetch (10 invariants + 14 race-condition tests required)
  • pipecat-ai/smart-turn integration (BSD-2, 23 languages, 8 MB ONNX) as a TurnAnalyzer sitting above Silero VAD
  • NLTK Punkt Italian + MarkdownTextFilter + SkipTagsAggregator (Pipecat ports, M-effort)
  • Live bench with the env-configured Twilio number to validate the latency claims in the CHANGELOG
  • elevenlabsWs(...) factory string helper for parity with elevenlabs(...)
  • Residual MEDIUM/LOW polish from the 11-agent review (regex pre-compilation, chunk_length_schedule validation, NOTICE.md attribution, etc.)

Add a 61-case fixture documenting expected sentence-chunker output for
every supported edge case across English, Italian, CJK, Hindi, Arabic,
Khmer, Burmese, Armenian, and Ethiopic scripts. Each case carries the
ideal `expected_sentences` plus an optional `current_behavior` field that
documents known regressions / by-design quirks so the runner can xfail
them without blocking CI.

Standalone runner (`sentence_chunker_parity.py`) executes each case
through the Python `SentenceChunker`, spawns `node sentence_chunker_shim.js`
for the TypeScript equivalent, and compares emissions case-by-case.
Self-contained — does not depend on the main `tests/parity/run.py`
runner (which currently fails on the recent `patter` -> `getpatter`
package rename).

Result on the current main branch: 53 PASS / 8 XFAIL / 0 FAIL /
0 PARITY_FAIL — Python and TypeScript chunkers produce identical
sentence streams for every covered case.
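
The four result buckets above can be sketched as a small classifier. This is a minimal illustration, not the real runner: the function name ``classify_case`` and the precedence of the checks are assumptions, and only the ``expected_sentences`` / ``current_behavior`` fields and the status names come from the description above.

```python
from typing import Optional, Sequence


def classify_case(
    expected_sentences: Sequence[str],
    py_out: Sequence[str],
    ts_out: Sequence[str],
    current_behavior: Optional[Sequence[str]] = None,
) -> str:
    """Classify one fixture case (hypothetical logic, not the shipped runner)."""
    py_list, ts_list = list(py_out), list(ts_out)
    # The two SDKs must agree with each other before anything else is judged.
    if py_list != ts_list:
        return "PARITY_FAIL"
    # Ideal output reached.
    if py_list == list(expected_sentences):
        return "PASS"
    # Output differs from the ideal but matches the documented quirk.
    if current_behavior is not None and py_list == list(current_behavior):
        return "XFAIL"
    return "FAIL"
```

Under this reading, an XFAIL never hides a cross-SDK divergence, because the parity check runs first.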
…ive first-flush

Three layered improvements to ``SentenceChunker`` (parity Py↔TS), all
additive — no breaking change to the default behaviour:

**Italian + English abbreviations** (Phase 1, 7)
* Prefix list adds Sig, Sgr, Dott, Prof, Avv, Ing, Geom, Rag, Arch, On,
  Egr, Spett, Gent, Ill (Italian honorifics) plus Gen, Sen, Rep, Lt,
  Cpt, Capt, Col, Cmdr, Adm (Pipecat NLTK Punkt).
* Suffix list adds ecc, cit, cap, sez, art, pag, fig, tab, cfr, vol, ed
  (Italian) plus vs, etc, No, Vol, pp, cf, ca, op, Mt, Hwy, Rt, Pl, Ave,
  Blvd, Sq (Pipecat).
* Suffix-followed-by-starter pattern now preserves the trailing period
  (e.g. ``Patter Inc. He left.`` keeps ``Inc.`` instead of dropping it).
* All-caps name fix (Pipecat #1692): the maybe-short-flush gate-5
  acronym guard previously blocked any uppercase-preceded period, so
  ``"...with RAMESH."`` would never flush. Now only purely uppercase
  ASCII words ≤3 chars (U/US/USA/NATO patterns) are treated as acronyms.
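
A minimal sketch of the narrowed acronym guard, taking the "purely uppercase ASCII, at most 3 chars" rule literally. The helper name ``looks_like_acronym`` is hypothetical, and how the real guard treats a four-letter word such as NATO is not stated here.

```python
def looks_like_acronym(word: str, max_len: int = 3) -> bool:
    """Return True only for short, purely uppercase ASCII words (e.g. "US")."""
    return (
        0 < len(word) <= max_len
        and word.isascii()
        and word.isalpha()
        and word.isupper()
    )


def period_may_flush(word_before_period: str) -> bool:
    """A period after a non-acronym word is allowed to end the sentence."""
    return not looks_like_acronym(word_before_period)
```

With this rule ``RAMESH`` is six characters long, so the period after it is free to flush.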

**Multilingual terminator support** (Phase 7)
* Added ASCII semicolon ``;``, Unicode ellipsis ``…``, full-width
  semicolon/period/Japanese half-width to the terminator set.
* Ported Pipecat's ``UNAMBIGUOUS_NON_LATIN_TERMINATORS`` (BSD-2): Hindi
  Devanagari ``। ॥``, Arabic ``؟ ؛ ۔ ؏``, Khmer ``។ ៕``, Burmese ``။``,
  Armenian ``։``, Ethiopic ``። ፧``, Tibetan ``༎ ༏``.
* Final ``<stop>`` regex builds its character class from the merged set.
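
Building the stop pattern from a merged terminator set might look roughly like the sketch below. The constant and function names are assumptions, the terminator sets are abbreviated, and the naive splitter ignores the abbreviation handling described earlier.

```python
import re

# Abbreviated sets for illustration; the real lists are longer.
ASCII_TERMINATORS = {".", "!", "?", ";"}
NON_LATIN_TERMINATORS = {"\u2026", "\u0964", "\u0965", "\u104b", "\u0589", "\u1362"}


def build_stop_pattern(*groups):
    """Merge terminator sets and compile one escaped character class."""
    merged = set().union(*groups)
    char_class = "".join(re.escape(ch) for ch in sorted(merged))
    return re.compile(f"[{char_class}]+")


def split_sentences(text, pattern):
    """Naive splitter: cut after every run of terminator characters."""
    sentences, start = [], 0
    for match in pattern.finditer(text):
        piece = text[start:match.end()].strip()
        if piece:
            sentences.append(piece)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Usage: ``split_sentences("नमस्ते। Hello; world.", build_stop_pattern(ASCII_TERMINATORS, NON_LATIN_TERMINATORS))`` cuts after the danda, the semicolon, and the period.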

**Opt-in aggressive first-clause flush** (Phase 2)
* New constructor option ``aggressive_first_flush`` (Python) /
  ``aggressiveFirstFlush`` (TypeScript). **Default OFF.**
* When enabled, emits the first clause of the response on a soft
  punctuation boundary (``,``, em-dash, en-dash) once the buffer reaches
  ``aggressive_first_min_len`` (default 40 chars). Saves 200–500 ms TTFA
  on the first sentence of each turn.
* Eight guards prevent regressions on the safe-but-aggressive path:
  min-length, decimal-comma (``3,14``), thousands-separator
  (``1,000,000``), currency (``$1,000``, ``€1.000,50``), balanced
  parens/brackets/braces/double-quotes (protects JSON), ellipsis
  (``...``, ``…``), comma-before-quote, sub-token ambiguity (requires
  one char after the terminator).
* Italian (``language="it"``) hard-disables the feature regardless of
  caller preference — Italian inverts EN convention (``,`` decimal,
  ``.`` thousands), so a comma-flush would split mid-number.
* New ``Agent.aggressive_first_flush: bool = False`` field on Python
  ``Agent`` model. TypeScript ``AgentOptions.aggressiveFirstFlush`` is
  shipped in the after_llm 3-tier commit alongside the rest of the
  ``types.ts`` surface.
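
To make the guard list concrete, here is a rough sketch of a soft-boundary check covering a subset of the guards (Italian disable, min-length, digit-comma-digit, comma-before-quote, balanced delimiters, one char after the boundary). The function names are hypothetical, and reading the min-length guard as a bound on the buffer length is an assumption.

```python
from typing import Optional

SOFT_BOUNDARIES = {",", "\u2014", "\u2013"}  # comma, em dash, en dash
OPENERS = {"(": ")", "[": "]", "{": "}"}


def _delimiters_balanced(text: str) -> bool:
    """True when brackets are closed in order and double quotes are paired."""
    stack = []
    closers = set(OPENERS.values())
    for ch in text:
        if ch in OPENERS:
            stack.append(OPENERS[ch])
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
    return not stack and text.count('"') % 2 == 0


def first_clause_cut(buffer: str, language: str = "en", min_len: int = 40) -> Optional[int]:
    """Return the index just past a safe soft boundary, or None to keep buffering."""
    if language == "it" or len(buffer) < min_len:
        return None
    for i, ch in enumerate(buffer):
        if ch not in SOFT_BOUNDARIES:
            continue
        if i + 1 >= len(buffer):
            return None  # need one char after the boundary to disambiguate
        prev_ch = buffer[i - 1] if i > 0 else ""
        next_ch = buffer[i + 1]
        if ch == "," and prev_ch.isdigit() and next_ch.isdigit():
            continue  # decimal comma, thousands separator, or currency
        if next_ch in ('"', "'"):
            continue  # comma directly before a quote
        if not _delimiters_balanced(buffer[: i + 1]):
            continue  # still inside parens, brackets, braces, or quotes
        return i + 1
    return None
```

Returning ``None`` simply means "fall back to the normal sentence-terminator path", so a guard that is too strict costs latency but never correctness.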

Test coverage: 11 new Python unit tests and 11 new TypeScript unit
tests for the aggressive first-flush feature, plus parity-fixture cases
for RAMESH, Hindi danda, Arabic question mark, ASCII semicolon, Unicode
ellipsis, and the vs./etc./Gen./Sen. abbreviations.

Sentence-chunker constants and abbreviation lists ported from Pipecat
(BSD-2-Clause, Daily) and from the LiveKit-derived regex base
(Apache-2.0).
…pter

The ``after_llm`` pipeline hook used to be a single callable
``(text, ctx) → str`` that received the full LLM response only after
the stream completed. Buffering the entire response added 500 ms – 2 s
of TTFA for any agent that configured the hook.

This commit introduces a 3-tier API that lets callers pick the right
latency budget for their transform:

* ``onChunk`` (sync, ~0 ms) — per-token transform applied inline before
  the stream-handler ever sees the token. Use for: regex replace,
  markdown strip, profanity char-swap. Does NOT block streaming.
* ``onSentence`` (async, 50–300 ms) — runs between the sentence chunker
  and TTS. Returns rewritten sentence, ``null`` to keep the original,
  ``""`` to drop the sentence. Use for: PII redaction, persona overlay,
  refusal swap. Adds latency only on the rewritten sentence, not the
  full turn.
* ``onResponse`` (async, 500 ms – 2 s) — full-response rewrite that
  buffers the LLM stream then runs once. **Blocks streaming TTS.** Use
  only when sentence-level rewrite is insufficient (e.g. structured
  output validation that needs the full text).
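
The sentence-tier return contract (rewrite, keep, drop) can be sketched as follows. ``apply_sentence_hook`` and ``redact_digits`` are hypothetical names, Python's ``None`` stands in for ``null``, and the sketch handles async hooks only.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

SentenceHook = Callable[[str], Awaitable[Optional[str]]]


async def apply_sentence_hook(sentences: List[str], hook: SentenceHook) -> List[str]:
    """Run the hook on each chunker emission before it would reach TTS."""
    spoken = []
    for sentence in sentences:
        result = await hook(sentence)
        if result is None:
            spoken.append(sentence)  # None keeps the original sentence
        elif result == "":
            continue  # empty string drops the sentence entirely
        else:
            spoken.append(result)  # anything else replaces it
    return spoken


async def redact_digits(sentence: str) -> Optional[str]:
    """Example hook: drop 'secret' sentences, mask digits, keep the rest."""
    if "secret" in sentence:
        return ""
    if any(ch.isdigit() for ch in sentence):
        return "".join("#" if ch.isdigit() else ch for ch in sentence)
    return None
```

Because the hook runs per sentence, only the sentences it actually rewrites pay its latency.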

Backward compatibility
----------------------
The legacy single callable ``afterLlm: (text, ctx) => string`` still
works and is mapped to ``onResponse`` with a one-shot
``PatterDeprecationWarning`` (Python — subclass of both
``DeprecationWarning`` and ``UserWarning`` so it surfaces by default in
library code) or ``console.warn`` (TypeScript). Removal scheduled for
v0.7.0.

Detection in TypeScript uses ``typeof hook === 'function'`` (not
``hook.length`` arity sniffing — that pattern breaks under minifiers
and arrow defaults). Detection in Python uses ``callable(hook)`` plus
``_has_tier_attrs(hook)`` to disambiguate from object-form hooks.
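
A rough sketch of the Python-side normalisation, under stated assumptions: the warning class here is a local stand-in that mirrors the described double inheritance, ``normalize_after_llm`` is a hypothetical name, and the snake_case tier names ``on_chunk`` / ``on_sentence`` are assumed by analogy with ``on_response``.

```python
import warnings


class PatterDeprecationWarning(DeprecationWarning, UserWarning):
    """Stand-in for the SDK class: visible by default via UserWarning."""


_TIERS = ("on_chunk", "on_sentence", "on_response")
_warned_once = False


def normalize_after_llm(hook):
    """Map any accepted hook shape onto a dict of tier callables."""
    global _warned_once
    if hook is None:
        return {}
    if isinstance(hook, dict):
        return {k: v for k, v in hook.items() if k in _TIERS and v is not None}
    # Object form: check tier attributes before the bare-callable test,
    # because an object with __call__ could otherwise be misread as legacy.
    if any(hasattr(hook, tier) for tier in _TIERS):
        return {t: getattr(hook, t) for t in _TIERS if getattr(hook, t, None) is not None}
    if callable(hook):
        if not _warned_once:
            warnings.warn(
                "single-callable after_llm is deprecated; use the 3-tier form",
                PatterDeprecationWarning,
                stacklevel=2,
            )
            _warned_once = True
        return {"on_response": hook}
    raise TypeError(f"unsupported after_llm hook: {type(hook).__name__}")
```

The module-level flag is what makes the warning one-shot regardless of the active warning filters.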

Wire-up
-------
* ``llm_loop.py`` / ``llm-loop.ts`` — ``has_after_llm_response`` (and
  the legacy callable that maps to it) gates token buffering.
  ``has_after_llm_chunk`` triggers per-token transform inline before
  yield.
* ``stream_handler.py`` / ``stream-handler.ts`` — applies
  ``has_after_llm_sentence`` between the chunker emit and the TTS
  synthesise call. Both the streaming-LLM path and the non-streaming
  ``_speakFinalResponse`` path apply the hook for parity.
* The same ``stream_handler`` change wires
  ``Agent.aggressive_first_flush`` / ``AgentOptions.aggressiveFirstFlush``
  into the chunker constructor (Phase 2 wire-up that needed
  ``stream_handler`` and ``types.ts`` to land here alongside the hook
  changes — separating them would have required interactive patch
  staging on the same hunks).

Test coverage
-------------
* +11 Python pytest cases under ``TestAfterLlmThreeTier`` covering: no
  hook pass-through, legacy callable maps to ``on_response`` with
  deprecation warning, dict / Protocol / object forms, drop-by-empty,
  fail-open on hook exception, type confusion (non-string return),
  legacy alias methods (``has_after_llm`` / ``run_after_llm``) preserved.
* +9 TypeScript Vitest cases covering the equivalent surface.

New TTS provider that targets ElevenLabs' streaming-input WebSocket
endpoint (``/v1/text-to-speech/{voice}/stream-input``) instead of the
HTTP ``/stream`` endpoint used by ``ElevenLabsTTS``. Saves ~50 ms HTTP
request setup per utterance and avoids the TLS cold-start handshake on
bursty calls.

Drop-in API matching ``ElevenLabsTTS``:

* Same ``synthesize`` (Python) / ``synthesizeStream`` (TypeScript)
  signature returning ``AsyncGenerator<bytes>``.
* Same ``for_twilio()`` / ``for_telnyx()`` factories.
* Same default model ``eleven_flash_v2_5``.
* Top-level export ``getpatter.ElevenLabsWebSocketTTS`` (Py) /
  ``import { ElevenLabsWebSocketTTS } from "getpatter"`` (TS).

Defaults
--------
* ``auto_mode=true`` — server picks chunk timing.
* ``inactivity_timeout=60`` (range 5–180).
* Per-utterance lifecycle. Documented as a known trade-off vs Pipecat's
  per-session pool (pooling is on the roadmap for v0.6.x).
* ``eleven_v3*`` is rejected at construction with a clear error — the
  WS stream-input endpoint does not support v3; users must fall back
  to the HTTP class.
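
Construction-time checks for these defaults could be sketched like this. ``validate_ws_tts_options`` is a hypothetical helper, and raising ``ValueError`` for an out-of-range ``inactivity_timeout`` is an assumption; the text above only states the range.

```python
def validate_ws_tts_options(
    model_id: str = "eleven_flash_v2_5",
    inactivity_timeout: int = 60,
    auto_mode: bool = True,
) -> dict:
    """Validate options before any socket is opened (illustrative only)."""
    # Prefix match so preview/alpha variants of v3 are rejected as well.
    if model_id.startswith("eleven_v3"):
        raise ValueError(
            f"{model_id} is not supported on the WS stream-input endpoint; "
            "use the HTTP class instead"
        )
    if not 5 <= inactivity_timeout <= 180:
        raise ValueError("inactivity_timeout must be between 5 and 180 seconds")
    return {
        "model_id": model_id,
        "inactivity_timeout": inactivity_timeout,
        "auto_mode": auto_mode,
    }
```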

Resilience contract (post-review hardening)
-------------------------------------------
* **Connect timeout 5 s** (Pipecat-aligned, was 15 s in earlier
  drafts) bounds DNS + TLS handshake.
* **Per-frame receive timeout 30 s** prevents the generator hanging
  forever on a stalled server.
* **Permanent error handler attached BEFORE the open await** — closes
  a window where an error fired after the once-listener resolved would
  surface as ``uncaughtException`` in Node.
* **All ws listeners removed in ``finally``** — no closure leak past
  socket close.
* **Server ``error`` raises ``ElevenLabsTTSError``** instead of
  silently completing — caller can distinguish "synthesis succeeded
  with empty text" from "synth failed mid-stream".
* **Best-effort EOS ``{"text":""}`` in ``finally``** — tells
  ElevenLabs to stop billing for unconsumed audio. Sending it
  immediately after ``flush:true`` (the previous draft) risked
  truncating tail audio under ``auto_mode=true``.
* **Audio frame size cap 512 KB** prevents OOM via malicious /
  malformed base64 (real frames are ~75 KB decoded).
* **Server error string sanitised** before logging (strips CR/LF/NUL,
  truncates to 200 chars) — defends against log-line injection.
* **``api_key`` private** (``_api_key`` + read-only ``api_key``
  property) so ``vars(tts)`` / dataclass-style introspection cannot
  surface the secret.
* **``eleven_v3`` prefix-based reject** also blocks
  ``eleven_v3_preview``, ``eleven_v3_alpha``.
* **Public wrapper exposes the full options surface**
  (``voice_settings``, ``language_code``, ``inactivity_timeout``,
  ``chunk_length_schedule``) — earlier drafts dropped them.
* **Default voice consistency**: the public wrapper no longer
  overrides the provider class default — both layers use Rachel
  (``21m00Tcm4TlvDq8ikWAM``) so direct-construct and wrapped-construct
  paths agree.
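
Two of the hardening items above, log sanitisation and the frame size cap, are easy to sketch in isolation. The helper names are hypothetical, and applying the 512 KB cap to the estimated decoded size before decoding is an assumption about where the check sits.

```python
import base64

MAX_FRAME_BYTES = 512 * 1024
MAX_LOG_CHARS = 200


def sanitize_for_log(message: str) -> str:
    """Strip CR/LF/NUL and truncate so server text cannot forge log lines."""
    cleaned = message.replace("\r", "").replace("\n", "").replace("\x00", "")
    return cleaned[:MAX_LOG_CHARS]


def decode_audio_frame(b64_audio: str) -> bytes:
    """Decode one base64 audio frame, refusing oversized payloads up front."""
    # Every 4 base64 characters decode to at most 3 bytes.
    if len(b64_audio) * 3 // 4 > MAX_FRAME_BYTES:
        raise ValueError("audio frame exceeds the 512 KB cap")
    return base64.b64decode(b64_audio, validate=True)
```

Checking the length before decoding means an oversized payload is rejected without allocating the decoded buffer.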

Public surface
--------------
* ``getpatter/providers/elevenlabs_ws_tts.py`` — provider class
  ``ElevenLabsWebSocketTTS`` + ``ElevenLabsTTSError``.
* ``getpatter/tts/elevenlabs_ws.py`` — wrapper class ``TTS`` re-exported
  as ``ElevenLabsWebSocketTTS`` from the package root.
* ``sdk-ts/src/providers/elevenlabs-ws-tts.ts`` + corresponding
  TypeScript wrapper at ``sdk-ts/src/tts/elevenlabs-ws.ts``.
* ``sdk-ts/src/providers/elevenlabs-tts.ts`` — ``resolveVoiceId``
  promoted from module-private to public export so the WS variant can
  share the voice-name → voice-id resolution table without
  duplicating the lookup map.
* ``sdk-py/getpatter/__init__.py`` and ``sdk-ts/src/index.ts`` —
  top-level re-exports.

Test coverage
-------------
* +20 Python pytest cases (construction, factories, URL build, send
  sequence, ``isFinal`` termination, voice settings in init,
  ``chunk_length_schedule`` only with ``auto_mode=False``,
  ``eleven_v3`` rejection + variants, env-var resolution).
* +11 TypeScript Vitest cases covering the equivalent surface,
  including a faked ``ws`` module that records sent frames.

The HTTP ``ElevenLabsTTS`` class is **untouched** — both transports
coexist and the user picks per-call.

Bump ``getpatter`` to 0.5.5 across both SDKs (Python ``pyproject.toml``,
TypeScript ``package.json`` + ``package-lock.json``, and the SDK
``__version__`` / ``VERSION`` constants kept in sync).

CHANGELOG entry covers the four user-visible additions shipped in this
release:

* Sentence chunker — IT/EN abbreviations + multilingual terminators +
  RAMESH-style all-caps flush bug fix (Pipecat #1692). Default
  behaviour unchanged for existing users.
* Opt-in ``aggressive_first_flush`` / ``aggressiveFirstFlush`` on
  ``Agent`` / ``AgentOptions`` — emits the first clause of each turn
  on a soft-punctuation boundary (",", em-dash, en-dash) once the
  buffer reaches ~40 chars. Saves 200–500 ms TTFA. Italian
  hard-disabled (decimal-comma + dot-thousands inversion). 8 guards
  prevent regressions on decimals, currency, JSON, ellipsis,
  open-delimiters, comma-before-quote, sub-token ambiguity.
* New 3-tier ``after_llm`` API (``onChunk`` / ``onSentence`` /
  ``onResponse``). Legacy single-callable form still works (mapped to
  ``onResponse``) but emits a one-shot ``PatterDeprecationWarning`` /
  ``console.warn``. Removal: v0.7.0.
* New opt-in ``ElevenLabsWebSocketTTS`` class — drop-in replacement
  for ``ElevenLabsTTS`` (HTTP) using the ``stream-input`` WebSocket
  endpoint. Saves ~50 ms HTTP setup + TLS cold-start per utterance.
  Per-utterance lifecycle (per-session pooling on the roadmap).

Test totals after this release: Python 1064 PASS / 7 skip,
TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 / 8
XFAIL / 0 FAIL on a 61-case fixture spanning EN, IT, CJK, Hindi,
Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts.

Cumulative review hardening from 11 parallel review agents
(Python-reviewer, TypeScript-reviewer, provider-reviewer, sdk-parity,
security-reviewer, code-reviewer, code-simplifier, refactor-cleaner,
docs-sync, build-validator, examples-validator) is folded into the
phase-specific commits — see the per-feature commits in this branch
for the detailed CRITICAL / HIGH fix lists.
… flush

Document the four user-visible additions shipped in 0.5.5:

* **ElevenLabsWebSocketTTS** — new provider sub-pages
  ``docs/{python,typescript}-sdk/providers/elevenlabs-websocket.mdx``.
  What it is, why use it, ``for_twilio`` / ``for_telnyx`` factories,
  full constructor params table, ``eleven_v3*`` limitation,
  per-utterance lifecycle trade-off, ``ElevenLabsTTSError``. Both
  sub-pages added to the TTS group navigation in ``docs/docs.json``.
  Existing ``tts.mdx`` providers table updated with the new row plus a
  callout pointing at the WS variant.

* **``after_llm`` 3-tier API** — new "Pipeline Hooks" section in
  ``docs/{python,typescript}-sdk/events.mdx``: per-tier table for
  ``onChunk`` (sync, ~0 ms), ``onSentence`` (async, 50–300 ms), and
  ``onResponse`` (async, 500 ms – 2 s, blocks streaming). Return
  semantics (``null`` keep / ``""`` drop), legacy callable migration
  path with ``PatterDeprecationWarning`` (Python) / one-shot
  ``console.warn`` (TypeScript), removal in v0.7.0.

* **``aggressive_first_flush`` opt-in** — new row in the
  ``AgentOptions`` / ``Agent`` parameters tables in
  ``docs/{python,typescript}-sdk/agents.mdx`` and ``reference.mdx``
  with the Italian hard-disable note. Python ``features.mdx`` adds a
  dedicated section with code example and the 8-guard summary.

* **Chunker improvements** — Python ``features.mdx`` documents the
  expanded EN abbreviations (``vs.``, ``etc.``, ``Gen.``, ``Sen.``),
  IT abbreviations (``Sig.``, ``Dott.``, ``S.p.A.``, ``ecc.``), and
  multilingual terminator support (Hindi / Arabic / Armenian /
  Ethiopic / Khmer / Burmese / Tibetan). TypeScript SDK has no
  chunker page so no equivalent change required.

``docs.json`` JSON validated end-to-end. No source / examples /
CHANGELOG / NOTICE files touched.
@mintlify

mintlify Bot commented Apr 29, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
patter-06b046ce 🟢 Ready View Preview Apr 29, 2026, 11:11 AM


Six fixes uncovered by the 0.5.5 acceptance matrix run, ranging from a
HIGH-severity onnxruntime-node version mismatch that blocks Silero VAD
on macOS x86_64 to a misleading metric that makes healthy calls look
slow.

**Bug #1 (HIGH) — SileroVAD onnxruntime-node 1.24+ API drift**
* ``optionalDependencies.onnxruntime-node`` tightened from ``^1.18.0`` to
  ``~1.18.0`` — the caret was resolving to 1.24.x where
  ``listSupportedBackends`` was removed and the prebuilt ``bin/`` layout
  drifted, so ``import('onnxruntime-node')`` failed on macOS x86_64.
* ``loadOnnxRuntime`` now classifies the underlying error
  (``missing`` / ``binding`` / ``api-drift`` / ``unknown``) and surfaces a
  targeted remedy plus the original error chain via ``Error.cause`` —
  previously the failure mode was hidden behind a single "could not be
  resolved" string.

**Bug #2 (MEDIUM) — ElevenLabsConvAI agent_id error message**
* The env-var fallback already worked but the error message did not say
  *where* to get an agent ID from (the dashboard, not the API key).
  Updated both Python and TypeScript constructors to point users at
  https://elevenlabs.io/app/conversational-ai and reiterate that the
  agent ID is per-deployed-agent.
* Python ``ConvAI.__post_init__`` now raises when ``agent_id`` is empty
  (was silently passing through) — TypeScript already did this. Parity.

**Bug #3 (MEDIUM) — ElevenLabs WS payment_required**
* New typed exception ``ElevenLabsPlanError`` (subclass of
  ``ElevenLabsTTSError``) raised when the WS endpoint returns
  ``payment_required``. Free / Starter plans now get a clear "upgrade
  or use the HTTP class (drop-in API)" message instead of an opaque
  ``ElevenLabsTTSError: ElevenLabs WS error: payment_required``.
* Detection is case-insensitive and matches both the exact server
  string and any ``payment_required`` substring.
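
The mapping from a server error string to a typed exception can be sketched as below. The two class names match the ones described above but are redefined locally as stand-ins; ``error_from_server`` and the message wording are hypothetical.

```python
class ElevenLabsTTSError(Exception):
    """Stand-in for the SDK's base TTS error."""


class ElevenLabsPlanError(ElevenLabsTTSError):
    """Stand-in for the plan-limit error raised on payment_required."""


def error_from_server(message: str) -> ElevenLabsTTSError:
    """Pick the exception type for a WS error string (illustrative only)."""
    # Case-insensitive substring match covers the exact string and variants.
    if "payment_required" in message.lower():
        return ElevenLabsPlanError(
            "The WebSocket endpoint is not available on this plan: "
            "upgrade or use the HTTP class (drop-in API)."
        )
    return ElevenLabsTTSError(f"ElevenLabs WS error: {message}")
```

Because the plan error subclasses the base error, existing ``except ElevenLabsTTSError`` handlers keep working.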

**Bug #5 (MEDIUM) — barge-in fragile in pipeline mode without VAD**
* On tunnel + speakerphone setups the agent's own TTS leaks into the
  inbound mic feed, STT transcribes it, and the legacy
  "always forward + bargeInThresholdMs" heuristic fails to fire the
  cancel — the agent talks over the user.
* ``serve()`` now logs a one-shot warning at startup when
  ``agent.engine`` is undefined, ``agent.vad`` is undefined, and
  ``bargeInThresholdMs > 0``, recommending ``SileroVAD`` or
  ``bargeInThresholdMs: 0``. Both Python and TypeScript.

**Bug #6 (LOW) — pipeline ``total_ms`` misleading on long utterances**
* ``total_ms`` spans the user's entire utterance (including pauses)
  because it includes ``stt_ms``, which itself measures STT-stream-open
  to transcript-finalisation. On a 4 s user turn ``total_ms`` reads
  ~5.5 s even though the agent's TTFA after end-of-speech is ~1.3-1.5
  s — misleading as a p95 / SLO metric.
* New ``LatencyBreakdown.agent_response_ms`` field (Python +
  TypeScript). Computed as ``endpoint_ms + llm_ttft_ms + tts_ms`` when
  all three signals are available, ``undefined`` / ``None`` otherwise.
  This is the user-perceived latency dashboards should track.
* ``total_ms`` kept unchanged for backward compatibility.
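
The new field's formula is simple enough to show directly. This is a sketch of the computation as described, written as a free function with a hypothetical name; the sample numbers in the usage note are illustrative, not measurements.

```python
from typing import Optional


def compute_agent_response_ms(
    endpoint_ms: Optional[float],
    llm_ttft_ms: Optional[float],
    tts_ms: Optional[float],
) -> Optional[float]:
    """Sum the three post-speech stages, or None if any signal is missing."""
    parts = (endpoint_ms, llm_ttft_ms, tts_ms)
    if any(part is None for part in parts):
        return None
    return sum(parts)
```

For example, stages of 300, 600, and 450 ms give 1350 ms, independent of how long the user spoke.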

**Bug #7 (HIGH) — outbound TwiML races tunnel startup**
* The documented ``void phone.serve(...) → setTimeout → phone.call(...)``
  pattern reads ``localConfig.webhookUrl`` while the cloudflared
  hostname is still resolving, producing
  ``wss://undefined/...`` in the dial TwiML and a Twilio 11100 call
  drop on answer.
* New ``phone.tunnelReady`` Promise (TS) / ``phone.tunnel_ready``
  ``asyncio.Future`` (Python). Resolves to the public webhook hostname
  once ``serve()`` knows it (immediately for static webhookUrl,
  after ``startTunnel`` for ``tunnel: true``). Rejects if ``serve()``
  fails before the hostname is known.
* Documented pattern is now ``await phone.tunnelReady`` instead of
  ``setTimeout(10_000)`` — deterministic, no race.
* Same root-cause fix likely also addresses Bug #4 (intermittent WS
  upgrade race) which the acceptance run flagged as a related symptom.

Test totals after the fixes: Python 1064 PASS / 7 skip, TypeScript
1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8 XFAIL / 0
FAIL on the 61-case fixture. No regressions.
…+ diagnostics

Three layered fixes targeting the intermittent "outbound call connects
but never receives the WS upgrade" failure (Twilio 11100 on answer)
documented in BUGS.md.

**Root cause A — StatusCallbackEvent encoding**
Twilio expects ``StatusCallbackEvent`` as a multi-value parameter
(repeated keys), NOT a space-separated single value. The previous
``'initiated ringing answered completed'`` form triggered Twilio
notification 21626 ("invalid statusCallbackEvents") on every outbound
call, and on some ingestion paths also broke the answer-handler webhook
which is exactly the symptom that produced 11100.

* TypeScript: use ``params.append('StatusCallbackEvent', evt)`` four
  times so URLSearchParams emits repeated query keys.
* Python: pass the canonical twilio-python snake_case key
  ``status_callback_event`` as a list — twilio-python serialises it as
  the multi-value form Twilio expects.
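
The difference between the two encodings is visible with the standard library alone. This sketch only shows the wire format using ``urllib.parse.urlencode``; it does not call twilio-python, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

EVENTS = ["initiated", "ringing", "answered", "completed"]


def encode_space_separated(events):
    """Old form: one key whose value is a space-joined string."""
    return urlencode({"StatusCallbackEvent": " ".join(events)})


def encode_multi_value(events):
    """New form: the key is repeated once per event (doseq=True)."""
    return urlencode({"StatusCallbackEvent": events}, doseq=True)
```

``encode_space_separated(EVENTS)`` yields a single ``StatusCallbackEvent=initiated+ringing+answered+completed`` pair, while ``encode_multi_value(EVENTS)`` yields four ``StatusCallbackEvent=...`` pairs joined by ``&``.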

**Root cause B — server-not-yet-listening race**
The previous ``phone.tunnelReady`` (TS) / ``phone.tunnel_ready`` (Py)
signal resolves as soon as the cloudflared hostname is known, BEFORE
the embedded HTTP / WS server has finished initialising. ``phone.call``
placed immediately afterwards races the Twilio Media Streams upgrade
and produces a half-ready route → 11100.

New ``phone.ready`` (TS Promise / Py Future) resolves only after:
1. Tunnel hostname known
2. Carrier auto-config complete
3. EmbeddedServer in ``listen`` state (TS) / uvicorn ``started`` flag
   set (Py)

Outbound pattern is now:

```ts
void phone.serve({ agent, tunnel: true });
await phone.ready;        // <-- safe for outbound
await phone.call(...);
```

``tunnelReady`` is kept as a separate signal for integrations that
only need the hostname (e.g. webhook registration), with a docstring
note pointing at ``ready`` for outbound use.
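
On the Python side the same ordering can be modelled with an ``asyncio.Future``. The sketch below uses a ``FakePhone`` stand-in rather than the real client, so every method body and the hostname are assumptions; it only demonstrates that awaiting the readiness signal orders the dial after the server is listening.

```python
import asyncio


class FakePhone:
    """Hypothetical stand-in for the SDK client; models only the ready signal."""

    def __init__(self):
        self.ready = None  # created on the running loop in place_outbound()
        self.events = []

    async def serve(self):
        self.events.append("tunnel hostname known")
        await asyncio.sleep(0)
        self.events.append("carrier configured")
        await asyncio.sleep(0)
        self.events.append("server listening")
        self.ready.set_result("example.invalid")  # resolve only after all three steps

    async def call(self, to):
        self.events.append(f"dial {to}")


async def place_outbound(phone, to):
    """Start serving in the background, then dial once the phone is ready."""
    phone.ready = asyncio.get_running_loop().create_future()
    server = asyncio.create_task(phone.serve())
    await phone.ready  # replaces a fixed sleep; no race with startup
    await phone.call(to)
    await server
    return phone.events
```

Running ``asyncio.run(place_outbound(FakePhone(), "+15550100"))`` records the dial strictly after the three startup steps.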

**Root cause C — opaque diagnostics**
On call drop the user could not tell whether Twilio rejected the dial,
the tunnel resolved late, or the WS upgrade failed. The new
``phone.call`` flow logs the Twilio notifications URL on every
outbound call ("check here if the call drops with no audio") so
self-diagnosis does not require learning the Twilio API.

**Test parity**
Updated ``test_twilio_statuscallback_always_registered`` to read the
new ``status_callback_event`` key (with fallback to the legacy
``StatusCallbackEvent`` for forward compat). Python 1064 PASS / 7
skip, TypeScript 1163 PASS / 67 files. No regressions.

Resolves doc conflicts so the release branch can be landed:
- CHANGELOG: keep both 0.5.5 (this branch) and the canonical 0.5.4 entry from main
- docs/python-sdk/events.mdx: place EventBus section above the new 3-tier PipelineHooks; remove the older single-callable hook description (covered by the migration section)
- docs/python-sdk/tts.mdx: keep both the telephony-factory paragraph and the WebSocket variant note
- docs/typescript-sdk/events.mdx + tts.mdx: same treatment as the Python pages

Merges in the notebook tutorial series and 0.5.3/0.5.4 docs alignment from main; no SDK source code conflicts.
…eployment

DEVLOG.md and superpowers/specs/2026-04-24-patter-feature-test-notebook-design.md fail Mintlify's MDX parser (filenames begin with digits, which MDX treats as JSX expressions). Skip both paths so the docs site can deploy.

- Remove docs/DEVLOG.md and docs/superpowers/ (internal planning notes, no value to public docs site). The .mintignore introduced in the previous commit is no longer needed and is removed too.
- sdk-ts/src/client.ts: attach a no-op `.catch` to `_ready` and `_tunnelReady` so callers that never await them don't trigger Node's unhandled-rejection warning when serve() validates inputs synchronously. Awaiters of `phone.ready` / `phone.tunnelReady` still see the rejection.
- sdk-ts/package-lock.json: add trailing newline (end-of-file-fixer).
- examples/notebooks/**.ipynb: nbstripout pass that clears cell outputs and execution counts to match the repo convention enforced by .pre-commit-config.yaml.

PR #79 added an optional Docker launcher under examples/notebooks/python/ and re-touched all 24 .ipynb files (kernel ID renumbering, source-array reshape).

Resolution:
- examples/notebooks/python/**.ipynb + examples/notebooks/typescript/**.ipynb: take the main version. Our only prior contribution to these files was the nbstripout pass, which is now re-applied via pre-commit (no behaviour or content of ours is lost).
- docs/DEVLOG.md + docs/superpowers/plans/2026-04-24-...md: keep deletion. Both were removed from this branch as out-of-scope for the public docs site; no merge-back.
@nicolotognoni nicolotognoni merged commit 7ac0282 into main May 1, 2026
15 checks passed
@nicolotognoni nicolotognoni deleted the release/0.5.5-latency-pass-1 branch May 8, 2026 14:56