release: 0.5.5 — latency pass 1 (chunker + after_llm 3-tier + WS TTS)#81
Add a 61-case fixture documenting expected sentence-chunker output for every supported edge case across English, Italian, CJK, Hindi, Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts. Each case carries the ideal `expected_sentences` plus an optional `current_behavior` field that documents known regressions / by-design quirks so the runner can xfail them without blocking CI. Standalone runner (`sentence_chunker_parity.py`) executes each case through the Python `SentenceChunker`, spawns `node sentence_chunker_shim.js` for the TypeScript equivalent, and compares emissions case-by-case. Self-contained — does not depend on the main `tests/parity/run.py` runner (which currently fails on the recent `patter` -> `getpatter` package rename). Result on the current main branch: 53 PASS / 8 XFAIL / 0 FAIL / 0 PARITY_FAIL — Python and TypeScript chunkers produce identical sentence streams for every covered case.
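A minimal sketch of how a runner could sort each case into the four result buckets reported above. The `ParityCase` dataclass and `classify` function are hypothetical names for illustration; only the `expected_sentences` and `current_behavior` field names come from the fixture description, and the real runner may order its checks differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParityCase:
    """One fixture entry (hypothetical shape; field names follow the text above)."""
    name: str
    expected_sentences: list[str]
    current_behavior: Optional[list[str]] = None  # documented quirk, if any


def classify(case: ParityCase, py_out: list[str], ts_out: list[str]) -> str:
    """Return PASS, XFAIL, FAIL, or PARITY_FAIL for a single case."""
    if py_out != ts_out:
        return "PARITY_FAIL"  # the two SDKs disagree with each other
    if py_out == case.expected_sentences:
        return "PASS"
    if case.current_behavior is not None and py_out == case.current_behavior:
        return "XFAIL"  # known deviation, does not block CI
    return "FAIL"
```

Checking parity first keeps a Python/TypeScript divergence from being masked by an xfail entry.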
…ive first-flush
Three layered improvements to ``SentenceChunker`` (parity Py↔TS), all additive, with no breaking change to the default behaviour:
**Italian + English abbreviations** (Phase 1, 7)
* Prefix list adds Sig, Sgr, Dott, Prof, Avv, Ing, Geom, Rag, Arch, On, Egr, Spett, Gent, Ill (Italian honorifics) plus Gen, Sen, Rep, Lt, Cpt, Capt, Col, Cmdr, Adm (Pipecat NLTK Punkt).
* Suffix list adds ecc, cit, cap, sez, art, pag, fig, tab, cfr, vol, ed (Italian) plus vs, etc, No, Vol, pp, cf, ca, op, Mt, Hwy, Rt, Pl, Ave, Blvd, Sq (Pipecat).
* Suffix-followed-by-starter pattern now preserves the trailing period (e.g. ``Patter Inc. He left.`` keeps ``Inc.`` instead of dropping it).
* All-caps name fix (Pipecat #1692): the maybe-short-flush gate-5 acronym guard previously blocked any uppercase-preceded period, so ``"...with RAMESH."`` would never flush. Now only purely uppercase ASCII words ≤3 chars (U/US/USA/NATO patterns) are treated as acronyms.
**Multilingual terminator support** (Phase 7)
* Added ASCII semicolon ``;``, Unicode ellipsis ``…``, full-width semicolon/period/Japanese half-width to the terminator set.
* Ported Pipecat's ``UNAMBIGUOUS_NON_LATIN_TERMINATORS`` (BSD-2): Hindi Devanagari ``। ॥``, Arabic ``؟ ؛ ۔ ؏``, Khmer ``។ ៕``, Burmese ``။``, Armenian ``։``, Ethiopic ``። ፧``, Tibetan ``༎ ༏``.
* Final ``<stop>`` regex builds its character class from the merged set.
**Opt-in aggressive first-clause flush** (Phase 2)
* New constructor option ``aggressive_first_flush`` (Python) / ``aggressiveFirstFlush`` (TypeScript). **Default OFF.**
* When enabled, emits the first clause of the response on a soft punctuation boundary (``,``, em-dash, en-dash) once the buffer reaches ``aggressive_first_min_len`` (default 40 chars). Saves 200–500 ms TTFA on the first sentence of each turn.
* Eight guards prevent regressions on the safe-but-aggressive path: min-length, decimal-comma (``3,14``), thousands-separator (``1,000,000``), currency (``$1,000``, ``€1.000,50``), balanced parens/brackets/braces/double-quotes (protects JSON), ellipsis (``...``, ``…``), comma-before-quote, sub-token ambiguity (requires one char after the terminator).
* Italian (``language="it"``) hard-disables the feature regardless of caller preference, because Italian inverts the EN convention (``,`` decimal, ``.`` thousands), so a comma-flush would split mid-number.
* New ``Agent.aggressive_first_flush: bool = False`` field on Python ``Agent`` model. TypeScript ``AgentOptions.aggressiveFirstFlush`` is shipped in the after_llm 3-tier commit alongside the rest of the ``types.ts`` surface.
Test coverage: +11 Python unit tests and +11 TypeScript unit tests for the aggressive first-flush feature, plus parity-fixture cases for RAMESH, Hindi danda, Arabic question mark, ASCII semicolon, Unicode ellipsis, and the vs./etc./Gen./Sen. abbreviations.
Sentence-chunker constants and abbreviation lists ported from Pipecat (BSD-2-Clause, Daily) and from the LiveKit-derived regex base (Apache-2.0).
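To make the guard list concrete, here is a rough sketch of a comma-flush check covering a subset of the guards (min-length, digit-comma-digit for decimals and thousands, comma-before-quote, balanced delimiters, one-char lookahead, Italian hard-disable). The function names `is_balanced` and `can_flush_at_comma` are assumptions for illustration and are not taken from the SDK; the shipped chunker may implement these checks differently.

```python
# Hypothetical helpers, not the SDK implementation.
_PAIRS = {"(": ")", "[": "]", "{": "}"}


def is_balanced(text: str) -> bool:
    """True when brackets are properly closed and double quotes come in pairs."""
    stack = []
    for ch in text:
        if ch in _PAIRS:
            stack.append(_PAIRS[ch])
        elif ch in _PAIRS.values():
            if not stack or stack.pop() != ch:
                return False
    return not stack and text.count('"') % 2 == 0


def can_flush_at_comma(buffer: str, idx: int, min_len: int = 40,
                       language: str = "en") -> bool:
    """Decide whether the comma at buffer[idx] is a safe first-clause boundary."""
    if language == "it":
        return False  # Italian uses ',' as the decimal separator
    if buffer[idx] != ",":
        return False
    head = buffer[: idx + 1]
    if len(head) < min_len:
        return False  # min-length guard
    if idx + 1 >= len(buffer):
        return False  # need one char of lookahead after the comma
    prev_ch = buffer[idx - 1] if idx > 0 else ""
    next_ch = buffer[idx + 1]
    if prev_ch.isdigit() and next_ch.isdigit():
        return False  # decimal comma or thousands separator
    if next_ch in "\"'":
        return False  # comma-before-quote
    return is_balanced(head)  # never flush inside (...), [...], {...} or "..."
```

The lookahead requirement is why the check can only run once at least one more token has arrived after the comma.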
…pter
The ``after_llm`` pipeline hook used to be a single callable ``(text, ctx) → str`` that received the full LLM response only after the stream completed. Buffering the entire response added 500 ms – 2 s of TTFA for any agent that configured the hook.
This commit introduces a 3-tier API that lets callers pick the right latency budget for their transform:
* ``onChunk`` (sync, ~0 ms): per-token transform applied inline before the stream-handler ever sees the token. Use for: regex replace, markdown strip, profanity char-swap. Does NOT block streaming.
* ``onSentence`` (async, 50–300 ms): runs between the sentence chunker and TTS. Returns rewritten sentence, ``null`` to keep the original, ``""`` to drop the sentence. Use for: PII redaction, persona overlay, refusal swap. Adds latency only on the rewritten sentence, not the full turn.
* ``onResponse`` (async, 500 ms – 2 s): full-response rewrite that buffers the LLM stream then runs once. **Blocks streaming TTS.** Use only when sentence-level rewrite is insufficient (e.g. structured output validation that needs the full text).
Backward compatibility
----------------------
The legacy single callable ``afterLlm: (text, ctx) => string`` still works and is mapped to ``onResponse`` with a one-shot ``PatterDeprecationWarning`` (Python; a subclass of both ``DeprecationWarning`` and ``UserWarning`` so it surfaces by default in library code) or ``console.warn`` (TypeScript). Removal scheduled for v0.7.0.
Detection in TypeScript uses ``typeof hook === 'function'`` (not ``hook.length`` arity sniffing; that pattern breaks under minifiers and arrow defaults). Detection in Python uses ``callable(hook)`` plus ``_has_tier_attrs(hook)`` to disambiguate from object-form hooks.
Wire-up
-------
* ``llm_loop.py`` / ``llm-loop.ts``: ``has_after_llm_response`` (and the legacy callable that maps to it) gates token buffering. ``has_after_llm_chunk`` triggers per-token transform inline before yield.
* ``stream_handler.py`` / ``stream-handler.ts``: applies ``has_after_llm_sentence`` between the chunker emit and the TTS synthesise call. Both the streaming-LLM path and the non-streaming ``_speakFinalResponse`` path apply the hook for parity.
* The same ``stream_handler`` change wires ``Agent.aggressive_first_flush`` / ``AgentOptions.aggressiveFirstFlush`` into the chunker constructor (Phase 2 wire-up that needed ``stream_handler`` and ``types.ts`` to land here alongside the hook changes; separating them would have required interactive patch staging on the same hunks).
Test coverage
-------------
* +11 Python pytest cases under ``TestAfterLlmThreeTier`` covering: no hook pass-through, legacy callable maps to ``on_response`` with deprecation warning, dict / Protocol / object forms, drop-by-empty, fail-open on hook exception, type confusion (non-string return), legacy alias methods (``has_after_llm`` / ``run_after_llm``) preserved.
* +9 TypeScript Vitest cases covering the equivalent surface.
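The return semantics and the legacy mapping can be sketched roughly as follows. This is a simplified, synchronous illustration: `Hooks`, `normalise`, and `apply_sentence_hook` are hypothetical names, the hooks here take only the text (the real ones also receive a context and the sentence tier is async), the local `PatterDeprecationWarning` is a stand-in for the SDK class, and the one-shot deduplication of the warning is left out.

```python
import warnings
from dataclasses import dataclass
from typing import Callable, Optional


class PatterDeprecationWarning(DeprecationWarning, UserWarning):
    """Local stand-in for the SDK warning class described above."""


@dataclass
class Hooks:
    """Hypothetical container for the three tiers; each tier is optional."""
    on_chunk: Optional[Callable[[str], str]] = None
    on_sentence: Optional[Callable[[str], Optional[str]]] = None
    on_response: Optional[Callable[[str], str]] = None


def normalise(hook) -> Hooks:
    """Map None, an object-form hook, or a legacy callable onto Hooks."""
    if hook is None:
        return Hooks()
    if isinstance(hook, Hooks):
        return hook
    if callable(hook):
        warnings.warn(
            "single-callable after_llm is deprecated; use on_response",
            PatterDeprecationWarning,
            stacklevel=2,
        )
        return Hooks(on_response=hook)
    raise TypeError(f"unsupported after_llm hook: {type(hook).__name__}")


def apply_sentence_hook(hooks: Hooks, sentence: str) -> Optional[str]:
    """Return the text to hand to TTS, or None when the sentence is dropped."""
    if hooks.on_sentence is None:
        return sentence
    try:
        result = hooks.on_sentence(sentence)
    except Exception:
        return sentence  # fail open: keep the original on hook error
    if result is None or not isinstance(result, str):
        return sentence  # None keeps the original; non-string returns are ignored
    if result == "":
        return None  # empty string drops the sentence
    return result
```

Failing open keeps a buggy hook from silencing the agent mid-call, at the cost of occasionally speaking an unfiltered sentence.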
New TTS provider that targets ElevenLabs' streaming-input WebSocket
endpoint (``/v1/text-to-speech/{voice}/stream-input``) instead of the
HTTP ``/stream`` endpoint used by ``ElevenLabsTTS``. Saves ~50 ms HTTP
request setup per utterance and avoids the TLS cold-start handshake on
bursty calls.
Drop-in API matching ``ElevenLabsTTS``:
* Same ``synthesize`` (Python) / ``synthesizeStream`` (TypeScript)
signature returning ``AsyncGenerator<bytes>``.
* Same ``for_twilio()`` / ``for_telnyx()`` factories.
* Same default model ``eleven_flash_v2_5``.
* Top-level export ``getpatter.ElevenLabsWebSocketTTS`` (Py) /
``import { ElevenLabsWebSocketTTS } from "getpatter"`` (TS).
Defaults
--------
* ``auto_mode=true`` — server picks chunk timing.
* ``inactivity_timeout=60`` (range 5–180).
* Per-utterance lifecycle. Documented as a known trade-off vs Pipecat's
per-session pool (pooling is on the roadmap for v0.6.x).
* ``eleven_v3*`` is rejected at construction with a clear error — the
WS stream-input endpoint does not support v3; users must fall back
to the HTTP class.
Resilience contract (post-review hardening)
-------------------------------------------
* **Connect timeout 5 s** (Pipecat-aligned, was 15 s in earlier
drafts) bounds DNS + TLS handshake.
* **Per-frame receive timeout 30 s** prevents the generator hanging
forever on a stalled server.
* **Permanent error handler attached BEFORE the open await** — closes
a window where an error fired after the once-listener resolved would
surface as ``uncaughtException`` in Node.
* **All ws listeners removed in ``finally``** — no closure leak past
socket close.
* **Server ``error`` raises ``ElevenLabsTTSError``** instead of
silently completing — caller can distinguish "synthesis succeeded
with empty text" from "synth failed mid-stream".
* **Best-effort EOS ``{"text":""}`` in ``finally``** — tells
ElevenLabs to stop billing for unconsumed audio. Sending it
immediately after ``flush:true`` (the previous draft) risked
truncating tail audio under ``auto_mode=true``.
* **Audio frame size cap 512 KB** prevents OOM via malicious /
malformed base64 (real frames are ~75 KB decoded).
* **Server error string sanitised** before logging (strips CR/LF/NUL,
truncates to 200 chars) — defends against log-line injection.
* **``api_key`` private** (``_api_key`` + read-only ``api_key``
property) so ``vars(tts)`` / dataclass-style introspection cannot
surface the secret.
* **``eleven_v3`` prefix-based reject** also blocks
``eleven_v3_preview``, ``eleven_v3_alpha``.
* **Public wrapper exposes the full options surface**
(``voice_settings``, ``language_code``, ``inactivity_timeout``,
``chunk_length_schedule``) — earlier drafts dropped them.
* **Default voice consistency**: the public wrapper no longer
overrides the provider class default — both layers use Rachel
(``21m00Tcm4TlvDq8ikWAM``) so direct-construct and wrapped-construct
paths agree.
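Three of the hardening items above are small enough to sketch directly: log sanitisation, the prefix-based ``eleven_v3`` reject, and the audio frame size cap. The helper names and constants below are illustrative assumptions, not the provider's actual internals, and the sketch raises plain ``ValueError`` where the provider may use its own error types.

```python
import base64

MAX_FRAME_BYTES = 512 * 1024  # audio frame cap from the list above
MAX_LOG_CHARS = 200


def sanitise_for_log(message: str) -> str:
    """Strip CR/LF/NUL and truncate so a server string cannot forge log lines."""
    cleaned = message.replace("\r", "").replace("\n", "").replace("\x00", "")
    return cleaned[:MAX_LOG_CHARS]


def reject_v3(model_id: str) -> None:
    """Prefix match, so eleven_v3_preview and eleven_v3_alpha are rejected too."""
    if model_id.startswith("eleven_v3"):
        raise ValueError(
            f"{model_id!r} is not supported on the WebSocket stream-input "
            "endpoint; use the HTTP class instead"
        )


def decode_audio_frame(b64_audio: str) -> bytes:
    """Decode one base64 audio frame, refusing oversized payloads."""
    # base64 inflates data by 4/3, so bound the encoded length before decoding
    if len(b64_audio) > (MAX_FRAME_BYTES * 4) // 3 + 4:
        raise ValueError("audio frame exceeds size cap")
    data = base64.b64decode(b64_audio)
    if len(data) > MAX_FRAME_BYTES:
        raise ValueError("audio frame exceeds size cap")
    return data
```

Checking the encoded length first avoids allocating the decoded buffer for a payload that is going to be rejected anyway.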
Public surface
--------------
* ``getpatter/providers/elevenlabs_ws_tts.py`` — provider class
``ElevenLabsWebSocketTTS`` + ``ElevenLabsTTSError``.
* ``getpatter/tts/elevenlabs_ws.py`` — wrapper class ``TTS`` re-exported
as ``ElevenLabsWebSocketTTS`` from the package root.
* ``sdk-ts/src/providers/elevenlabs-ws-tts.ts`` + corresponding
TypeScript wrapper at ``sdk-ts/src/tts/elevenlabs-ws.ts``.
* ``sdk-ts/src/providers/elevenlabs-tts.ts`` — ``resolveVoiceId``
promoted from module-private to public export so the WS variant can
share the voice-name → voice-id resolution table without
duplicating the lookup map.
* ``sdk-py/getpatter/__init__.py`` and ``sdk-ts/src/index.ts`` —
top-level re-exports.
Test coverage
-------------
* +20 Python pytest cases (construction, factories, URL build, send
sequence, ``isFinal`` termination, voice settings in init,
``chunk_length_schedule`` only with ``auto_mode=False``,
``eleven_v3`` rejection + variants, env-var resolution).
* +11 TypeScript Vitest cases covering the equivalent surface,
including a faked ``ws`` module that records sent frames.
The HTTP ``ElevenLabsTTS`` class is **untouched** — both transports
coexist and the user picks per-call.
Bump ``getpatter`` to 0.5.5 across both SDKs (Python ``pyproject.toml``,
TypeScript ``package.json`` + ``package-lock.json``, and the SDK
``__version__`` / ``VERSION`` constants kept in sync).
CHANGELOG entry covers the four user-visible additions shipped in this
release:
* Sentence chunker — IT/EN abbreviations + multilingual terminators +
RAMESH-style all-caps flush bug fix (Pipecat #1692). Default
behaviour unchanged for existing users.
* Opt-in ``aggressive_first_flush`` / ``aggressiveFirstFlush`` on
``Agent`` / ``AgentOptions`` — emits the first clause of each turn
on a soft-punctuation boundary (",", em-dash, en-dash) once the
buffer reaches ~40 chars. Saves 200–500 ms TTFA. Italian
hard-disabled (decimal-comma + dot-thousands inversion). 8 guards
prevent regressions on decimals, currency, JSON, ellipsis,
open-delimiters, comma-before-quote, sub-token ambiguity.
* New 3-tier ``after_llm`` API (``onChunk`` / ``onSentence`` /
``onResponse``). Legacy single-callable form still works (mapped to
``onResponse``) but emits a one-shot ``PatterDeprecationWarning`` /
``console.warn``. Removal: v0.7.0.
* New opt-in ``ElevenLabsWebSocketTTS`` class — drop-in replacement
for ``ElevenLabsTTS`` (HTTP) using the ``stream-input`` WebSocket
endpoint. Saves ~50 ms HTTP setup + TLS cold-start per utterance.
Per-utterance lifecycle (per-session pooling on the roadmap).
Test totals after this release: Python 1064 PASS / 7 skip,
TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8
XFAIL / 0 FAIL on a 61-case fixture spanning EN, IT, CJK, Hindi,
Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts.
Cumulative review hardening from 11 parallel review agents
(Python-reviewer, TypeScript-reviewer, provider-reviewer, sdk-parity,
security-reviewer, code-reviewer, code-simplifier, refactor-cleaner,
docs-sync, build-validator, examples-validator) is folded into the
phase-specific commits — see the per-feature commits in this branch
for the detailed CRITICAL / HIGH fix lists.
… flush
Document the four user-visible additions shipped in 0.5.5:
* **ElevenLabsWebSocketTTS** — new provider sub-pages
``docs/{python,typescript}-sdk/providers/elevenlabs-websocket.mdx``.
What it is, why use it, ``for_twilio`` / ``for_telnyx`` factories,
full constructor params table, ``eleven_v3*`` limitation,
per-utterance lifecycle trade-off, ``ElevenLabsTTSError``. Both
sub-pages added to the TTS group navigation in ``docs/docs.json``.
Existing ``tts.mdx`` providers table updated with the new row plus a
callout pointing at the WS variant.
* **``after_llm`` 3-tier API** — new "Pipeline Hooks" section in
``docs/{python,typescript}-sdk/events.mdx``: per-tier table for
``onChunk`` (sync, ~0 ms), ``onSentence`` (async, 50–300 ms), and
``onResponse`` (async, 500 ms – 2 s, blocks streaming). Return
semantics (``null`` keep / ``""`` drop), legacy callable migration
path with ``PatterDeprecationWarning`` (Python) / one-shot
``console.warn`` (TypeScript), removal in v0.7.0.
* **``aggressive_first_flush`` opt-in** — new row in the
``AgentOptions`` / ``Agent`` parameters tables in
``docs/{python,typescript}-sdk/agents.mdx`` and ``reference.mdx``
with the Italian hard-disable note. Python ``features.mdx`` adds a
dedicated section with code example and the 8-guard summary.
* **Chunker improvements** — Python ``features.mdx`` documents the
expanded EN abbreviations (``vs.``, ``etc.``, ``Gen.``, ``Sen.``),
IT abbreviations (``Sig.``, ``Dott.``, ``S.p.A.``, ``ecc.``), and
multilingual terminator support (Hindi / Arabic / Armenian /
Ethiopic / Khmer / Burmese / Tibetan). TypeScript SDK has no
chunker page so no equivalent change required.
``docs.json`` JSON validated end-to-end. No source / examples /
CHANGELOG / NOTICE files touched.
Six fixes uncovered by the 0.5.5 acceptance matrix run, ranging from a HIGH-severity onnxruntime-node version mismatch that blocks Silero VAD on macOS x86_64 to a misleading metric that makes healthy calls look slow.
**Bug #1 (HIGH): SileroVAD onnxruntime-node 1.24+ API drift**
* ``optionalDependencies.onnxruntime-node`` tightened from ``^1.18.0`` to ``~1.18.0``. The caret was resolving to 1.24.x where ``listSupportedBackends`` was removed and the prebuilt ``bin/`` layout drifted, so ``import('onnxruntime-node')`` failed on macOS x86_64.
* ``loadOnnxRuntime`` now classifies the underlying error (``missing`` / ``binding`` / ``api-drift`` / ``unknown``) and surfaces a targeted remedy plus the original error chain via ``Error.cause``. Previously the failure mode was hidden behind a single "could not be resolved" string.
**Bug #2 (MEDIUM): ElevenLabsConvAI agent_id error message**
* The env-var fallback already worked but the error message did not say *where* to get an agent ID from (the dashboard, not the API key). Updated both Python and TypeScript constructors to point users at https://elevenlabs.io/app/conversational-ai and reiterate that the agent ID is per-deployed-agent.
* Python ``ConvAI.__post_init__`` now raises when ``agent_id`` is empty (was silently passing through). TypeScript already did this, so this restores parity.
**Bug #3 (MEDIUM): ElevenLabs WS payment_required**
* New typed exception ``ElevenLabsPlanError`` (subclass of ``ElevenLabsTTSError``) raised when the WS endpoint returns ``payment_required``. Free / Starter plans now get a clear "upgrade or use the HTTP class (drop-in API)" message instead of an opaque ``ElevenLabsTTSError: ElevenLabs WS error: payment_required``.
* Detection is case-insensitive and matches both the exact server string and any ``payment_required`` substring.
**Bug #5 (MEDIUM): barge-in fragile in pipeline mode without VAD**
* On tunnel + speakerphone setups the agent's own TTS leaks into the inbound mic feed, STT transcribes it, and the legacy "always forward + bargeInThresholdMs" heuristic fails to fire the cancel, so the agent talks over the user.
* ``serve()`` now logs a one-shot warning at startup when ``agent.engine`` is undefined, ``agent.vad`` is undefined, and ``bargeInThresholdMs > 0``, recommending ``SileroVAD`` or ``bargeInThresholdMs: 0``. Both Python and TypeScript.
**Bug #6 (LOW): pipeline ``total_ms`` misleading on long utterances**
* ``total_ms`` spans the user's entire utterance (including pauses) because it includes ``stt_ms``, which itself measures STT-stream-open to transcript-finalisation. On a 4 s user turn ``total_ms`` reads ~5.5 s even though the agent's TTFA after end-of-speech is ~1.3-1.5 s, which is misleading as a p95 / SLO metric.
* New ``LatencyBreakdown.agent_response_ms`` field (Python + TypeScript). Computed as ``endpoint_ms + llm_ttft_ms + tts_ms`` when all three signals are available, ``undefined`` / ``None`` otherwise. This is the user-perceived latency dashboards should track.
* ``total_ms`` kept unchanged for backward compatibility.
**Bug #7 (HIGH): outbound TwiML races tunnel startup**
* The documented ``void phone.serve(...) → setTimeout → phone.call(...)`` pattern reads ``localConfig.webhookUrl`` while the cloudflared hostname is still resolving, producing ``wss://undefined/...`` in the dial TwiML and a Twilio 11100 call drop on answer.
* New ``phone.tunnelReady`` Promise (TS) / ``phone.tunnel_ready`` ``asyncio.Future`` (Python). Resolves to the public webhook hostname once ``serve()`` knows it (immediately for static webhookUrl, after ``startTunnel`` for ``tunnel: true``). Rejects if ``serve()`` fails before the hostname is known.
* Documented pattern is now ``await phone.tunnelReady`` instead of ``setTimeout(10_000)`` (deterministic, no race).
* Same root-cause fix likely also addresses Bug #4 (intermittent WS upgrade race) which the acceptance run flagged as a related symptom.
Test totals after the fixes: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8 XFAIL / 0 FAIL on the 61-case fixture. No regressions.
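As a small illustration of two of the fixes above, the sketch below computes ``agent_response_ms`` from its three components and shows a case-insensitive ``payment_required`` check. Both function names are hypothetical; only the formula and the matching rule come from the commit message.

```python
from typing import Optional


def agent_response_ms(endpoint_ms: Optional[float],
                      llm_ttft_ms: Optional[float],
                      tts_ms: Optional[float]) -> Optional[float]:
    """Sum the three post-end-of-speech stages, or None if any is missing."""
    parts = (endpoint_ms, llm_ttft_ms, tts_ms)
    if any(p is None for p in parts):
        return None
    return sum(parts)


def is_plan_error(server_message: str) -> bool:
    """Case-insensitive match on the exact string or any substring occurrence."""
    return "payment_required" in server_message.lower()
```

Returning ``None`` instead of a partial sum keeps dashboards from averaging in turns where one stage was never measured.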
…+ diagnostics
Three layered fixes targeting the intermittent "outbound call connects
but never receives the WS upgrade" failure (Twilio 11100 on answer)
documented in BUGS.md.
**Root cause A — StatusCallbackEvent encoding**
Twilio expects ``StatusCallbackEvent`` as a multi-value parameter
(repeated keys), NOT a space-separated single value. The previous
``'initiated ringing answered completed'`` form triggered Twilio
notification 21626 ("invalid statusCallbackEvents") on every outbound
call, and on some ingestion paths also broke the answer-handler webhook
which is exactly the symptom that produced 11100.
* TypeScript: use ``params.append('StatusCallbackEvent', evt)`` four
times so URLSearchParams emits repeated query keys.
* Python: pass the canonical twilio-python snake_case key
``status_callback_event`` as a list — twilio-python serialises it as
the multi-value form Twilio expects.
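The difference between the two encodings can be shown with the standard library alone. This sketch uses ``urllib.parse.urlencode`` rather than twilio-python, so it only illustrates the wire format, not the SDK call itself.

```python
from urllib.parse import urlencode

events = ["initiated", "ringing", "answered", "completed"]

# Space-separated single value: the form that triggered notification 21626.
single_value = urlencode({"StatusCallbackEvent": " ".join(events)})

# Repeated keys: doseq=True expands the list into one key=value pair per item.
multi_value = urlencode({"StatusCallbackEvent": events}, doseq=True)

print(single_value)
print(multi_value)
```

The first form produces one ``StatusCallbackEvent`` key with ``+``-joined values; the second repeats the key four times, which is the multi-value shape described above.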
**Root cause B — server-not-yet-listening race**
The previous ``phone.tunnelReady`` (TS) / ``phone.tunnel_ready`` (Py)
signal resolves as soon as the cloudflared hostname is known, BEFORE
the embedded HTTP / WS server has finished initialising. ``phone.call``
placed immediately afterwards races the Twilio Media Streams upgrade
and produces a half-ready route → 11100.
New ``phone.ready`` (TS Promise / Py Future) resolves only after:
1. Tunnel hostname known
2. Carrier auto-config complete
3. EmbeddedServer in ``listen`` state (TS) / uvicorn ``started`` flag
set (Py)
Outbound pattern is now:
```ts
void phone.serve({ agent, tunnel: true });
await phone.ready; // <-- safe for outbound
await phone.call(...);
```
``tunnelReady`` is kept as a separate signal for integrations that
only need the hostname (e.g. webhook registration), with a docstring
note pointing at ``ready`` for outbound use.
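The ordering between the two signals can be sketched with plain ``asyncio`` futures. ``FakePhone`` is a hypothetical stand-in for the client, the sleeps stand in for real startup work, and the hostname is a placeholder; the only point is that ``ready`` resolves strictly after the three steps listed above, while ``tunnel_ready`` resolves after the first.

```python
import asyncio


class FakePhone:
    """Hypothetical stand-in for the SDK client, for illustration only."""

    def __init__(self) -> None:
        loop = asyncio.get_running_loop()
        self.tunnel_ready: asyncio.Future = loop.create_future()
        self.ready: asyncio.Future = loop.create_future()
        self.log: list[str] = []

    async def serve(self) -> None:
        try:
            await asyncio.sleep(0.01)  # 1. tunnel hostname becomes known
            host = "example.invalid"
            self.tunnel_ready.set_result(host)
            self.log.append("tunnel")
            await asyncio.sleep(0.01)  # 2. carrier auto-config completes
            self.log.append("carrier")
            await asyncio.sleep(0.01)  # 3. embedded server starts listening
            self.log.append("listening")
            self.ready.set_result(host)
        except Exception as exc:
            # Reject whichever signals are still pending so awaiters do not hang.
            for fut in (self.tunnel_ready, self.ready):
                if not fut.done():
                    fut.set_exception(exc)
            raise

    async def call(self, to: str) -> None:
        self.log.append(f"call {to}")


async def main() -> list[str]:
    phone = FakePhone()
    server = asyncio.create_task(phone.serve())
    await phone.ready  # safe point for outbound calls
    await phone.call("+15550100")
    await server
    return phone.log
```

Awaiting ``tunnel_ready`` instead would let ``call`` run before the "listening" step, which is the race described under root cause B.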
**Root cause C — opaque diagnostics**
On call drop the user could not tell whether Twilio rejected the dial,
the tunnel resolved late, or the WS upgrade failed. The new
``phone.call`` flow logs the Twilio notifications URL on every
outbound call ("check here if the call drops with no audio") so
self-diagnosis does not require learning the Twilio API.
**Test parity**
Updated ``test_twilio_statuscallback_always_registered`` to read the
new ``status_callback_event`` key (with fallback to the legacy
``StatusCallbackEvent`` for forward compat). Python 1064 PASS / 7
skip, TypeScript 1163 PASS / 67 files. No regressions.
Resolves doc conflicts so the release branch can be landed:
- CHANGELOG: keep both 0.5.5 (this branch) and the canonical 0.5.4 entry from main
- docs/python-sdk/events.mdx: place EventBus section above the new 3-tier PipelineHooks; remove the older single-callable hook description (covered by the migration section)
- docs/python-sdk/tts.mdx: keep both the telephony-factory paragraph and the WebSocket variant note
- docs/typescript-sdk/events.mdx + tts.mdx: same treatment as the Python pages
Merges in the notebook tutorial series and 0.5.3/0.5.4 docs alignment from main; no SDK source code conflicts.
…eployment
DEVLOG.md and superpowers/specs/2026-04-24-patter-feature-test-notebook-design.md fail Mintlify's MDX parser (filenames begin with digits, which MDX treats as JSX expressions). Skip both paths so the docs site can deploy.
- Remove docs/DEVLOG.md and docs/superpowers/ (internal planning notes, no value to public docs site). The .mintignore introduced in the previous commit is no longer needed and is removed too.
- sdk-ts/src/client.ts: attach a no-op `.catch` to `_ready` and `_tunnelReady` so callers that never await them don't trigger Node's unhandled-rejection warning when serve() validates inputs synchronously. Awaiters of `phone.ready` / `phone.tunnelReady` still see the rejection.
- sdk-ts/package-lock.json: add trailing newline (end-of-file-fixer).
- examples/notebooks/**.ipynb: nbstripout pass that clears cell outputs and execution counts to match the repo convention enforced by .pre-commit-config.yaml.
PR #79 added an optional Docker launcher under examples/notebooks/python/ and re-touched all 24 .ipynb files (kernel ID renumbering, source-array reshape). Resolution:
- examples/notebooks/python/**.ipynb + examples/notebooks/typescript/**.ipynb: take the main version. Our only prior contribution to these files was the nbstripout pass, which is now re-applied via pre-commit (no behaviour or content of ours is lost).
- docs/DEVLOG.md + docs/superpowers/plans/2026-04-24-...md: keep deletion. Both were removed from this branch as out-of-scope for the public docs site; no merge-back.
Summary
0.5.5 lands four user-visible additions that target the first-token-to-first-audio (TTFA) path in pipeline mode, plus the multilingual / Italian polish for the sentence chunker. All changes are additive or opt-in — existing 0.5.4 callers keep their current behaviour unchanged.
The release is grounded in a long-form research pass (ElevenLabs latency posts, LiveKit Agents, Pipecat, Cartesia, Daily, Vapi/Retell production benchmarks) and a follow-up review by 11 parallel review agents — all CRITICAL / HIGH findings are folded into the per-feature commits.
Commits
* `aa252fa` `test(parity)`: 61-case cross-SDK fixture + standalone Py↔TS runner
* `c729a8d` `feat(chunker)`: IT/EN abbreviations, multilingual terminators, opt-in aggressive first-flush
* `184f820` `feat(hooks)`: `after_llm` 3-tier API + deprecated legacy callable adapter
* `e3b7dd2` `feat(tts)`: `ElevenLabsWebSocketTTS` opt-in low-latency variant
* `4e7eb06` `release: 0.5.5`: version bump + CHANGELOG
* `5ed6b5c` `docs`: Mintlify pages for the four additions
Highlights
Sentence chunker
* Multilingual terminators: Hindi Devanagari (`। ॥`), Arabic (`؟ ؛ ۔ ؏`), Khmer (`។ ៕`), Burmese (`။`), Armenian (`։`), Ethiopic (`። ፧`), Tibetan (`༎ ༏`).
* All-caps name fix (Pipecat #1692): `"...with RAMESH."` no longer sits in the buffer forever.
* Suffix-followed-by-starter pattern preserves the trailing period (`Patter Inc. He left.` keeps `Inc.`).
Opt-in `aggressiveFirstFlush`
* `Agent.aggressive_first_flush: bool = False` (Python) / `AgentOptions.aggressiveFirstFlush?: boolean` (TypeScript). Default OFF.
* Emits the first clause on a soft punctuation boundary (`,`, em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA on the first sentence.
* Italian (`language="it"`) hard-disables the feature regardless of caller preference (decimal-comma + dot-thousands inversion would split mid-number).
`after_llm` 3-tier API
* Object form `{ onChunk, onSentence, onResponse }`. Each tier optional, sync or async accepted.
* `onChunk` (sync, ~0 ms): per-token transform applied inline.
* `onSentence` (async, 50–300 ms): runs between chunker and TTS. Returning `null` keeps original; `""` drops the sentence.
* `onResponse` (async, 500 ms – 2 s): full-response rewrite. Blocks streaming TTS. Use only when sentence-level rewrite is insufficient.
* Legacy `(text, ctx) => str` still works → mapped to `onResponse` with one-shot `PatterDeprecationWarning` (Python; subclass of both `DeprecationWarning` and `UserWarning` so it surfaces by default in library code) or `console.warn` (TypeScript). Removal scheduled for v0.7.0.
`ElevenLabsWebSocketTTS`
* Targets `wss://api.elevenlabs.io/v1/text-to-speech/{voice}/stream-input` instead of HTTP `/stream`. Saves ~50 ms HTTP setup and avoids TLS cold-start per utterance.
* Same `synthesize()` / `synthesizeStream()` signature, same `for_twilio()` / `for_telnyx()` factories, same default model `eleven_flash_v2_5`.
* `auto_mode=true` default. `inactivity_timeout=60` default. `eleven_v3*` rejected with a clear error.
* Resilience contract: `ElevenLabsTTSError` on server error, best-effort EOS in `finally`, audio frame size cap 512 KB, log-string sanitisation, `api_key` private with read-only property, all WS listeners removed in `finally`.
* The HTTP `ElevenLabsTTS` class is untouched; both transports coexist.
Cross-SDK parity infrastructure
* `tests/parity/sentence_chunker_parity.py` runner + 61-case fixture covering EN / IT / CJK / Hindi / Arabic / Khmer / Burmese / Armenian / Ethiopic scripts. Verifies Python and TypeScript chunkers produce identical sentence streams.
Backward compatibility
Zero breaking changes for 0.5.4 callers. The chunker's expanded terminator set may emit slightly different sentence boundaries on responses that previously relied on the old behaviour (e.g. text containing Hindi `।` now flushes correctly), but the cross-SDK parity fixture documents every behaviour change.
Test plan
* … `examples/` import cleanly; no deprecation warnings on import
Follow-ups (deferred to separate PRs)
* `pipecat-ai/smart-turn` integration (BSD-2, 23 languages, 8 MB ONNX) as a `TurnAnalyzer` sitting above Silero VAD
* `MarkdownTextFilter` + `SkipTagsAggregator` (Pipecat ports, M-effort)
* `elevenlabsWs(...)` factory string helper for parity with `elevenlabs(...)`
* … (`chunk_length_schedule` validation, NOTICE.md attribution, etc.)