build(deps-dev): bump typescript from 5.9.3 to 6.0.2 in /sdk-ts #7
Closed
dependabot[bot] wants to merge 1 commit into
Bumps [typescript](https://github.com/microsoft/TypeScript) from 5.9.3 to 6.0.2.
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](microsoft/TypeScript@v5.9.3...v6.0.2)

---
updated-dependencies:
- dependency-name: typescript
  dependency-version: 6.0.2
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] (Author) commented:
OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting. If you change your mind, just re-open this PR and I'll resolve any conflicts on it.
nicolotognoni added a commit that referenced this pull request on Apr 21, 2026
…#66)

* fix(deps): pin websockets>=14 and add python-multipart

Fixes BUG #7 and #9 from acceptance suite.

- websockets: pin >=14,<16. The 'additional_headers=' kwarg used by the OpenAI Realtime, Deepgram STT and ElevenLabs ConvAI adapters is only supported on the new asyncio client that became the default in 14.0. Under 13.x the call failed with 'got an unexpected keyword argument additional_headers', blocking every streaming provider.
- python-multipart: add to the base install. Starlette >= 0.45 raises on 'await request.form()' without python-multipart installed, so every Twilio webhook returned 422 and the call was silently dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server): repair Twilio & Telnyx webhook stack

Fixes BUG #6, #8, #16 from acceptance suite.

- #8 Request/Response import lifted to the top of server.py. With ``from __future__ import annotations`` in place, FastAPI's ``get_type_hints(handler)`` resolved the 'Request' annotation against module globals where only WebSocket was imported. The ForwardRef stayed unresolved, FastAPI classified the parameter as a query-string field and every Twilio/Telnyx webhook POST returned HTTP 422 before the handler body could run. Local mode was fundamentally broken on 0.4.3.
- #6 dashboard tracking of failed outbound calls: new route ``POST /webhooks/twilio/status`` consumes Twilio statusCallback events (initiated/ringing/answered/completed/no-answer/busy/failed) and feeds them into MetricsStore.update_call_status. Operators now see every dialled attempt in the dashboard, including ones that never reach media.
- #16 Telnyx Call Control: ``/webhooks/telnyx/voice`` now POSTs ``actions/answer`` on call.initiated and ``actions/streaming_start`` on call.answered against the REST API and returns empty HTTP 200. Previously the route returned a JSON ``{commands: [...]}`` body that Telnyx silently discards, so the call rang forever.
Twilio voice route also falls back to the ``Caller`` / ``Called`` form fields when ``From`` / ``To`` are empty (see BUG #6 notes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telnyx): WS event shape, frame format, track filter, audio sender

Fixes BUG #17, #18, #19 from acceptance suite.

- #17 Media-stream WebSocket events use ``event`` (start / media / stop / dtmf / error / connected), not the Call Control REST notification ``event_type``. Audio payload lives in ``data.media.payload`` (base64), caller/callee live in ``data.start.{from,to}``. Previously the bridge matched ``event_type == "stream_started"`` and looked for audio in ``payload.audio.chunk``, so no media chunk was ever decoded and the agent never heard the caller.
- #18 Outbound wire format corrected to ``{"event":"media","media":{"payload":b64}}`` and ``{"event":"clear"}``. The legacy ``event_type``/``payload.audio.chunk`` shape was silently dropped by Telnyx, so the caller heard silence.
- #19 When ``stream_track=both_tracks`` Telnyx emits media for both the caller leg and the agent's own outbound leg; forwarding the outbound echo broke OpenAI Realtime turn detection ("speech_started" never fired). The bridge now drops every frame whose ``media.track`` is not ``"inbound"`` before forwarding.

OpenAI Realtime handler on Telnyx is now configured with ``audio_format="g711_ulaw"`` to match the PCMU 8 kHz bidirectional stream. The TelnyxAudioSender transcodes PCM16 16 kHz → mulaw 8 kHz for pipeline / ConvAI providers (PCM16 TTS output) and passes mulaw bytes through when OpenAI Realtime provides them directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(twilio): OpenAI Realtime audio format + pass-through audio sender

Fixes BUG #10 from acceptance suite.

OpenAI Realtime emits PCM16 at 24 kHz natively.
The Twilio handler previously left ``audio_format`` at the pcm16 default and fed the bytes into TwilioAudioSender, which unconditionally ran ``resample_16k_to_8k(pcm) → pcm16_to_mulaw`` assuming 16 kHz input. 24 kHz bytes run through a 16→8 kHz resampler come out as 12 kHz audio that Twilio plays at 8 kHz, i.e. at ~67% of the correct speed, so the caller heard a deep, slurred voice.

Fix: on the Twilio path construct ``OpenAIRealtimeStreamHandler(..., audio_format="g711_ulaw")`` so OpenAI emits Twilio-native mulaw 8 kHz directly. Pair it with ``TwilioAudioSender(..., input_is_mulaw_8k=True)`` which skips the resample+mulaw encode and forwards the bytes as-is. Pipeline and ConvAI still produce PCM16 @ 16 kHz and go through the default transcoding path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pipeline): STT path + hooks + barge-in + dedup + hallucination filter

Fixes BUG #12, #15, #20, #22 from acceptance suite.

- #12 Pipeline on Twilio: the bridge converts mulaw 8 kHz → PCM16 16 kHz before STT. The STT adapter used to be built with ``for_twilio=True`` (mulaw 8 kHz), so Deepgram decoded the already-PCM bytes as mulaw and produced garbage transcripts. The pipeline now always configures linear16 @ 16 kHz.
- #15 ``PipelineHooks.before_send_to_stt`` was declared but never invoked. ``PipelineStreamHandler.on_audio_received`` now runs the hook on every inbound chunk and drops the chunk when it returns ``None``.
- #20 Pipeline barge-in: ``on_audio_received`` used to skip STT when ``_is_speaking=True``, blocking any barge-in detection. It now keeps forwarding caller audio to STT during TTS (unless ``agent.barge_in_threshold_ms == 0``), and ``_stt_loop`` flips ``_is_speaking=False`` + ``send_clear`` on any Deepgram transcript with text observed while speaking. Effective latency floor is ~800 ms (Deepgram interim), so noisy / short TTS sentences may not actually be interrupted; full sub-second barge-in requires a server-side VAD (Silero, already supported via ``agent.vad=``).
- #22 Dedup + throttle + hallucination filter. Low-quality STT (Whisper on mulaw 8 kHz) emits several nearly-identical final transcripts in 1–2 s ("you", "you", "you") and hallucinates short fillers from silence / TTS echo. Each used to kick off a new LLM+TTS turn, and consecutive turns overlapped on the caller's line. Fix in ``_stt_loop``: dedup identical finals within 2 s, drop any final within 500 ms of the last committed turn, drop a curated blacklist of fillers (``you``, ``thank you``, ``yeah``, ``uh``, ``.``…).

Also adds the 8 kHz output path used by the Telnyx handler via a shared linear16 STT factory in ``handlers/common.py``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(providers): voice name resolver, Deepgram knobs, TTS streaming resample

Fixes BUG #11, #13, #23 from acceptance suite.

- #11 ElevenLabs voice-name resolver. ``Patter.elevenlabs(voice="rachel")`` (the quickstart default) used to pass "rachel" verbatim into the /text-to-speech/{voice_id}/stream URL, which 404s because the API only accepts the opaque 20-char voice IDs. The new ``resolve_voice_id`` helper maps ~45 common display names (rachel, adam, matilda, alloy, …) to their voice IDs and returns unknown strings unchanged so custom voices keep working. Removes the ad-hoc "alloy" substitution in stream_handler.
- #13 DeepgramSTT exposes ``endpointing_ms`` / ``utterance_end_ms`` / ``smart_format`` / ``interim_results`` / ``vad_events`` kwargs and the ``Patter.deepgram(...)`` factory forwards them via ``STTConfig.options``. Defaults tuned for telephony (endpointing_ms=150, utterance_end_ms=1000). The transcript gate is loosened to ``is_final OR speech_final`` so we don't wait up to utterance_end_ms on every turn. Pipeline turn latency on Twilio drops from ~4 s to ~2.2 s.
- #23 OpenAI TTS streaming resample. ``response_format=pcm`` returns 24 kHz PCM16 chunks that must be downsampled to 16 kHz.
The old implementation did the 3:2 downsample chunk-by-chunk without preserving filter state, so cross-chunk alignment drifted and the caller heard pops / dropped audio. Now uses ``audioop.ratecv`` with a persistent ``state`` and stashes odd trailing bytes between calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scheduler,fallback): per-loop schedulers + async close + cancel probes

Fixes BUG #2, #3, #5 from acceptance suite.

- #3 Scheduler singleton dies across event loops. The old ``_scheduler_singleton`` bound to the first loop it saw; pytest-asyncio closed that loop at the end of every test and the next scheduled callback crashed with ``Event loop is closed``. Replaced by ``_schedulers_by_loop``, a dict keyed on ``id(asyncio.get_event_loop())`` that drops stale entries when the owning loop has been closed. Adds ``reset_for_tests()`` to tear down every cached scheduler; the public ``shutdown()`` is now an alias for it.
- #2 ``FallbackLLMProvider.complete_stream``: convenience wrapper that flattens ``{"type": "text"}`` chunks so callers don't have to switch on chunk type. Mirrors the TS SDK's ``completeStream``.
- #5 ``FallbackLLMProvider`` recovery task leak. ``_probe`` tasks created by ``_start_recovery`` were never awaited, and pytest-asyncio tears the loop down before they finish. Adds ``aclose()`` and async context manager support (``__aenter__``/``__aexit__``) so callers can ``async with FallbackLLMProvider(...)`` and have the probes cancelled + awaited on exit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tools): @tool adapter unpacks kwargs into user function

Fixes BUG #21 from acceptance suite.

``@tool`` exposed the raw user function as ``handler`` but ``services/tool_executor._execute_handler`` always calls ``handler(arguments_dict, call_context_dict)``. Every typed tool, e.g.
``async def check_order(order_id: str)``, crashed at runtime with "takes 1 positional argument but 2 were given" and OpenAI Realtime received a fallback error JSON instead of the tool's result.

The decorator now wraps the user function in an async adapter whose signature matches the executor's contract ``(arguments, call_context)``. The adapter inspects the original signature: if it already takes ``(arguments, call_context)`` positionally it passes through unchanged, otherwise it filters ``arguments`` to the user function's declared parameter names and calls ``fn(**args)``. The original function is still reachable via ``handler.__wrapped__`` for introspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): track failed & no-answer outbound calls

Fixes BUG #6 from acceptance suite.

The embedded dashboard used to show only calls that made it to the media channel. An outbound dial that rang out (``status=no-answer``, ``busy``, ``failed``) never produced a webhook hit, so the row never appeared in the UI even though Twilio billed for the attempt.

Changes:

- ``MetricsStore.record_call_initiated({call_id, caller, callee, …})`` pre-registers the call when ``Patter.call()`` returns, so the row shows up the moment the dial is dispatched.
- ``MetricsStore.update_call_status(call_id, status, **extra)`` promotes the record through the lifecycle (ringing → in-progress → completed / no-answer / busy / failed / canceled). Terminal states move the row from active to the completed list so the UI timer freezes. Fed by the new ``/webhooks/twilio/status`` route.
- ``MetricsStoreProtocol`` extended with the two new methods.
- ``call_end`` now synthesises a minimal metrics shim when the call ended without a full CallMetrics payload, so the UI can still render duration / status.
- Dashboard UI: new ``STATUS`` column, filter pills (all / completed / failed), colour-coded badges (green / yellow / red / orange), red row tint for failed statuses, and SSE listeners for the new ``call_initiated`` and ``call_status`` events. The duration timer respects ``data-ended`` so rows that already received call_end stop ticking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): ring_timeout + agent.hooks/vad/audio_filter forwarding + call pre-register

Fixes BUG #14 + IMP2 + completes BUG #6 from acceptance suite.

- #14 ``Patter.agent(...)`` used to drop ``hooks``, ``text_transforms``, ``vad``, ``audio_filter``, ``background_audio`` and ``barge_in_threshold_ms`` even though the ``Agent`` dataclass accepted them. The factory now forwards all fields.
- IMP2 ``ring_timeout: int | None`` kwarg on ``Patter.call(...)``. Forwarded to Twilio as ``Timeout=`` and to Telnyx as ``timeout_secs`` (added to ``TelnyxAdapter.initiate_call``). Italian mobile carriers silence-drop the default ~28 s ring on US→IT calls; the quickstart now works with ``ring_timeout=60``.
- #6 ``Patter.call()`` pre-registers the dialled call in the MetricsStore via ``record_call_initiated(...)`` before returning, so the dashboard shows the attempt even when the callee never picks up. The Twilio branch also passes ``StatusCallbackEvent="initiated ringing answered completed"`` so we receive every state transition.

Also exposes the new Deepgram knobs on the ``Patter.deepgram(...)`` factory (``model``, ``endpointing_ms``, ``utterance_end_ms``, ``smart_format``, ``interim_results``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): models barge_in_threshold_ms + STT/TTS options, top-level mix_pcm, docstring

Rolls up the smaller API additions: BUG #1, #04g, extras from #13/#15.

- ``Agent.barge_in_threshold_ms`` (default 300): hangover window before treating caller audio as barge-in.
Used by PipelineStreamHandler and mirrored on TS ``AgentOptions.bargeInThresholdMs``.
- ``STTConfig.options`` / ``TTSConfig.options``: provider-specific knobs bag (e.g. Deepgram endpointing) that ``common._create_stt_from_config`` unpacks when building the adapter. Keeps older ``STTConfig`` callers forward-compatible.
- Top-level ``patter.mix_pcm(agent, bg, ratio)``: parity alias for the TS ``mixPcm(...)`` standalone helper (BUG #04g). Thin wrapper over the existing ``PcmMixer`` class with an explicit ratio.
- ``patter/__init__.py`` docstring enumerates the installable extras (scheduling, anthropic, groq, cerebras, google, …) so ``pip install getpatter`` users discover them without hitting a ``RuntimeError: Scheduling requires the 'apscheduler' package`` at call time (BUG #1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: align Python tests with BUG #12/#16/#17/#18/#19/#21 fixes

- ``test_local_mode``: pipeline Twilio bridge test now patches ``DeepgramSTT`` directly instead of ``DeepgramSTT.for_twilio``; after BUG #12 the pipeline path uses the default linear16 16 kHz adapter on both telephony providers.
- ``test_new_features``: ``machine_detection=False`` no longer asserts an empty extra_params dict; BUG #6 now always wires a ``StatusCallback`` so the dashboard sees failed attempts. The test keeps its original intent (AMD-specific params absent) and additionally checks the status callback is set.
- ``test_server_unit::TestTelnyxVoiceRoute``: rewritten to assert the REST ``actions/answer`` POST after BUG #16; the route no longer returns a JSON commands body.
- ``test_telnyx_bridge_unit``: helper messages updated to the ``{event: start|media|stop}`` wire shape from BUG #17; the OpenAI Realtime audio_format assertion now expects ``g711_ulaw`` (from #18).
- ``test_telnyx_handler_unit``: TelnyxAudioSender test uses ``input_is_mulaw_8k=True`` so the round-trip byte assertion still holds with the new PCM16→mulaw transcode path (#18).
Wire format asserts ``event == "media"`` / ``event == "clear"``.
- ``test_tool_decorator``: invokes handlers with the new adapter signature ``(arguments_dict, call_context_dict)`` (#21), including a sync-wrapped handler awaited through the adapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ts/api): Python parity: auto-detect local, static factories, ring_timeout

Brings TS parity with Python on BUG #4 parity items + #14 agent fields + IMP2 ring_timeout.

- Auto-detect local mode: ``new Patter({twilioSid, twilioToken, …})`` without explicit ``mode: 'local'`` is now treated as local when apiKey is missing (mirrors Python).
- Static provider factories: ``Patter.deepgram(...)``, ``Patter.elevenlabs(...)``, ``Patter.whisper(...)``, ``Patter.openaiTts(...)``, ``Patter.cartesia(...)``, ``Patter.rime(...)``, ``Patter.lmnt(...)``.
- ``STTConfig.toDict`` / ``TTSConfig.toDict`` are now optional: plain object literals ``{provider, apiKey, language}`` are accepted everywhere (fallback serialisation is handled via ``sttConfigToDict`` / ``ttsConfigToDict`` helpers).
- ``STTConfig`` gets an ``options`` bag (parity with Python BUG #13).
- ``LocalCallOptions.ringTimeout`` forwarded to Twilio as ``Timeout`` and Telnyx as ``timeout_secs``, plus ``StatusCallbackEvent`` wired so the dashboard sees ringing/no-answer/busy/failed transitions (BUG #6).
- ``AgentOptions.bargeInThresholdMs`` (parity with #20 on Python).
- ``LocalOptions.deepgramKey`` / ``elevenlabsKey`` added as provider-level defaults (parity with Python Patter() kwargs).
- ``Patter.call()`` Twilio branch pre-registers the dialled call with ``metricsStore.recordCallInitiated`` so no-answer / busy / failed attempts still show up in the dashboard.
- ``providers.deepgram(...)`` factory exposes the Deepgram knobs (model / endpointing_ms / utterance_end_ms / smart_format / interim_results) and carries them in ``STTConfig.options``.
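
The ``options`` bag that the Deepgram factory fills (here and in the Python BUG #13 fix) might look roughly like the sketch below. It is written in Python for brevity and is not the SDK's code: the ``STTConfig`` stand-in, its field names, and the rule that unset knobs are omitted from the bag are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class STTConfig:
    """Hypothetical stand-in for the SDK's STT config, not the real class."""

    provider: str
    api_key: str
    language: str = "en"
    # Provider-specific knobs travel in one untyped bag so the core config
    # does not grow a new field for every provider option.
    options: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        data: dict[str, Any] = {
            "provider": self.provider,
            "api_key": self.api_key,
            "language": self.language,
        }
        if self.options:  # assumption: the bag is only serialised when set
            data["options"] = dict(self.options)
        return data


def deepgram(
    api_key: str,
    *,
    language: str = "en",
    model: Optional[str] = None,
    endpointing_ms: Optional[int] = None,
    utterance_end_ms: Optional[int] = None,
    smart_format: Optional[bool] = None,
    interim_results: Optional[bool] = None,
) -> STTConfig:
    """Illustrative factory: collect only the knobs the caller actually set."""
    knobs = {
        "model": model,
        "endpointing_ms": endpointing_ms,
        "utterance_end_ms": utterance_end_ms,
        "smart_format": smart_format,
        "interim_results": interim_results,
    }
    options = {name: value for name, value in knobs.items() if value is not None}
    return STTConfig(provider="deepgram", api_key=api_key, language=language, options=options)
```

Filtering on ``is not None`` rather than truthiness keeps an explicit ``smart_format=False`` in the bag, which matters when the adapter's default is ``True``.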
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts/providers): voice resolver, Deepgram knobs, TTS streaming resample

TS parity port of Python BUG #11, #13, #23.

- ElevenLabs: ``resolveVoiceId()`` maps display names (rachel, adam, matilda, alloy, …) to the opaque 20-char voice IDs accepted by the /text-to-speech/{voice_id}/stream endpoint. Map mirrors the Python SDK byte-for-byte.
- DeepgramSTT: constructor overloaded to accept ``DeepgramSTTOptions`` (endpointingMs / utteranceEndMs / smartFormat / interimResults / vadEvents) alongside the legacy positional form. Transcript gate loosened to ``is_final OR speech_final`` so short utterances don't wait for Deepgram's utterance_end commit.
- OpenAITTS: streaming 24 kHz → 16 kHz resample now carries state (``carryByte`` + ``leftover`` samples) between chunks so cross-chunk alignment doesn't drift. The legacy ``resample24kTo16k`` static is kept as a thin wrapper around the streaming path for the existing unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts): Telnyx stack, pipeline hooks/barge-in/dedup, dashboard status, scheduler sync

TS parity port of the Python fixes for BUG #2/#3/#6/#12/#15/#16/#17/#18/#19/#20/#22.

- ``stream-handler.ts``: ``handleAudio`` now runs the ``before_send_to_stt`` hook (#15), transcodes Twilio mulaw 8 kHz → PCM16 16 kHz unconditionally on the pipeline path (#12), and keeps forwarding caller audio during TTS so barge-in can trigger (#20). ``processTranscript`` implements the dedup + 500 ms throttle + hallucination-word blacklist from #22 and flips ``isSpeaking`` + ``sendClear`` on any transcript with text while the agent is speaking (#20).
- ``server.ts``: ``TelnyxBridge.sendAudio`` / ``sendClear`` use the correct ``{event:"media",media:{payload:b64}}`` wire format (#18); the Telnyx WS handler matches ``data.event`` (start / media / stop / dtmf / error / connected) and drops frames whose ``media.track !== "inbound"`` before forwarding (#17, #19); the ``/webhooks/telnyx/voice`` route POSTs ``actions/answer`` and ``actions/streaming_start`` via the Call Control REST API and returns empty HTTP 200 (#16). ``TwilioBridge.createStt`` picks linear16 16 kHz when ``provider === 'pipeline'`` so Deepgram doesn't decode already-PCM bytes as mulaw (#12). A new ``/webhooks/twilio/status`` handler consumes Twilio status callbacks and updates the dashboard (#6).
- ``scheduler.ts``: ``scheduleCron`` returns a ``ScheduleHandle`` synchronously (lazy node-cron import happens in the background), for parity with Python #4. ``scheduleInterval`` accepts ``{intervalMs}`` or ``{seconds}`` in addition to the legacy positional ms, matching Python ``schedule_interval(seconds=...)``.
- ``fallback-provider.ts``: ``completeStream()`` text-only convenience generator (#2), ``aclose()`` + ``Symbol.asyncDispose`` so ``await using fallback = ...`` has parity with Python's ``async with FallbackLLMProvider(...)`` (#5).
- ``dashboard/store.ts``: ``recordCallInitiated`` pre-registers outbound attempts, ``updateCallStatus`` promotes rows through ringing / no-answer / busy / failed and moves terminal states to the completed list (#6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ts): align with 0.4.4 wire-format & provider API changes

- ``providers.test.ts``: toDict now surfaces ``options`` when set, knobs forwarding verified.
- ``types.test.ts``: toDict optional chain covered.
- ``openai-tts.test.ts``: 1-byte input no longer returns the byte verbatim; the streaming resampler stashes it as ``carryByte`` and the stateless wrapper flushes only complete samples, so the test now asserts an empty buffer.
- ``integration/twilio-pipeline.test.ts`` + ``integration/telnyx-pipeline.test.ts``: ``handleAudio`` is now async; tests await it. Telnyx fixture feeds mulaw 8 kHz and asserts the transcoded PCM16 16 kHz lands on the STT mock (BUG #12 + #19).
- ``unit/server-routes.test.ts``: Telnyx webhook tests assert the REST ``actions/answer`` + ``actions/streaming_start`` POSTs and the empty HTTP 200 response (BUG #16).
- ``package-lock.json``: refreshed for the sdk-ts worktree so the ``0.4.3`` → ``0.4.3-worktree`` alignment is consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(unit): regression coverage for BUG #6/#22/#23 + ring_timeout (IMP2)

Three new unit test files lock in fixes that previously lived in the acceptance suite as live-call checks:

test_pipeline_dedup.py (13 tests)
- Hallucination blacklist: "you", "thank you", ".", case/punctuation variants, empty-after-strip all drop silently.
- 2-second duplicate window with time.time monkeypatched so parity with the live Whisper feedback loop is deterministic.
- 500 ms back-to-back throttle covering legitimate vs spurious second turns.
- Interim / empty finals must not fire on_transcript.

test_openai_tts_resample.py (7 tests)
- Cross-chunk ratecv state: multi-chunk stream output matches a single-shot resample byte-for-byte.
- Odd-byte boundary: a chunk ending on a dangling byte must not drop the sample.
- Empty / single-byte / tiny chunks must not crash.
- Response is always aclosed on both successful and early-exit paths.

test_twilio_status_and_ring_timeout.py (13 tests)
- /webhooks/twilio/status routes to update_call_status with parsed duration, and survives missing SID, bad duration, and the dashboard-disabled path.
- Twilio signature enforcement on the status endpoint.
- Twilio ring_timeout -> Timeout REST param, Telnyx -> timeout_secs.
- Twilio StatusCallback / StatusCallbackEvent are always registered on outbound calls so BUG #6 cannot regress.
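
The three transcript rules these tests pin down (filler blacklist, 2-second duplicate window, 500 ms throttle) can be sketched as a small gate. This is an illustrative reconstruction, not the code in ``_stt_loop``: the ``TranscriptGate`` name, the filler subset, comparing duplicates against the last committed turn, and the injectable clock are all assumptions.

```python
import string
import time
from typing import Callable, Optional

# Assumed subset of the curated filler blacklist; "" covers "." after stripping.
FILLERS = {"you", "thank you", "yeah", "uh", ""}
DEDUP_WINDOW_S = 2.0   # identical finals inside this window are dropped
THROTTLE_S = 0.5       # any final this soon after a committed turn is dropped


def _normalise(text: str) -> str:
    """Lower-case and strip surrounding whitespace and punctuation."""
    return text.strip().strip(string.punctuation).strip().lower()


class TranscriptGate:
    """Hypothetical helper deciding whether a transcript starts a new turn."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_text: Optional[str] = None
        self._last_commit_at: Optional[float] = None

    def accept(self, text: str, is_final: bool = True) -> bool:
        if not is_final:
            return False  # interim results never start a turn
        norm = _normalise(text)
        if norm in FILLERS:
            return False  # hallucinated filler or empty after stripping
        now = self._clock()
        if self._last_commit_at is not None:
            elapsed = now - self._last_commit_at
            if elapsed < THROTTLE_S:
                return False  # back-to-back final, too soon after last turn
            if norm == self._last_text and elapsed < DEDUP_WINDOW_S:
                return False  # same words again inside the duplicate window
        self._last_text = norm
        self._last_commit_at = now
        return True
```

Passing the clock in as a parameter plays the same role as monkeypatching ``time.time`` in the tests: the window boundaries can be exercised without sleeping.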
Full unit suite: 728 passed, 2 skipped.

* docs+ci: latency/provider caveats + audit workflow

README
- Pipeline turn-latency floor documented (~2.0–2.8 s) with per-stage breakdown so users know to switch to `provider="openai_realtime"` for sub-second UX.
- ElevenLabs free-tier library-voice restriction (402) with pointer to `ELEVENLABS_VOICE_ID`.
- Telnyx outbound D38 Outbound Profile requirement.
- Google Gemini free-tier quota=0 caveat.
- Whisper hallucination filter documented.
- `ring_timeout` + status callback description added to call().

.github/workflows/audit.yml (new)
- pip-audit on sdk-py runtime deps.
- npm audit on sdk-ts production deps.
- bandit static analysis with SARIF upload to GitHub Security.
- Runs on dep-manifest changes, weekly schedule, and manual dispatch.
- Findings are advisory-only to keep the pipeline from flaking on upstream CVE churn (telephony stack pulls many C-wrapped libs).

Baseline audit run: npm=0, bandit medium+/high-confidence=0, pip-audit=2 (pytest dev-only + transformers optional-extra only).

* docs(readme): remove local-measured latency numbers from Voice Modes

The millisecond ranges previously listed for each provider came from a single local benchmark run and are neither representative nor a target. Keep the modes table qualitative and replace the per-stage breakdown with a short note that latency is inherited from the chosen providers, so there are no hard numbers for callers to anchor on.

* test(unit): bug coverage gaps: BUG #15/#19/#20

Three new unit test modules fill the remaining coverage gap for the bugs fixed on this branch:

test_pipeline_bargein.py (7 tests), BUG #20
- Interim transcript during TTS triggers send_clear + is_speaking=False.
- record_turn_interrupted is fired on the metrics accumulator.
- send_clear throwing does not crash the STT loop (fail-open).
- No barge-in when the agent is idle or the transcript has no text.
- Final transcripts also trigger the barge-in branch before the downstream LLM turn runs.

test_before_send_to_stt_hook.py (9 tests), BUG #15
- Sync / async hook returning None drops the chunk (zero STT sends).
- Returning modified bytes forwards the new buffer verbatim.
- Hook receives the decoded PCM, not the raw mulaw payload.
- Raising hooks fail-open: original audio still reaches STT.
- Missing hook / hooks instance with before_send_to_stt=None are both bypass paths that must still forward audio.

test_telnyx_track_filter.py (5 tests), BUG #19
- track=inbound forwards, track=outbound drops.
- Missing `track` field defaults to inbound (legacy Telnyx payloads).
- Mixed stream: only inbound frames reach the handler, in order.
- Unknown track values are skipped defensively.

Full unit suite: 749 passed, 2 skipped (+21 from this commit).

* feat(sdk-py): add cartesia/rime/lmnt static factories + vad_events to deepgram

Brings Python SDK to parity with sdk-ts:

- Adds Patter.cartesia / Patter.rime / Patter.lmnt static methods so local-mode users can configure these TTS providers the same way they do in TypeScript.
- Adds the missing vad_events keyword to Patter.deepgram and the patter.providers.deepgram factory; the DeepgramSTT ctor already accepted it, but the public config helper silently dropped the flag.

* chore: bump to 0.4.4

Regression suites re-run after the bump:

- sdk-py: 749 passed, 2 skipped
- sdk-ts: 932 passed (57 test files, including soak)

* fix(ci): integration tests on 0.4.4 wire format + misc hygiene

Addresses the five failing CI checks on PR #66.

Telnyx integration tests (test_telnyx_{convai,pipeline,realtime}.py)
- ``_telnyx_stream_started`` / ``_telnyx_media_event`` / ``_telnyx_stream_stopped`` helpers migrated from the pre-0.4.4 ``{event_type, payload.audio.chunk}`` shape to the real Telnyx media-stream wire format ``{event, start|media.payload}`` (BUG #17/#18).
Without this the bridge silently drops every test frame and 11 integration tests fail with "handler called 0 times".
- ``test_audio_format_pcm16`` renamed to ``test_audio_format_g711_ulaw`` and the assertion flipped: Telnyx is PCMU 8 kHz bidirectional (BUG #19), Realtime runs on ``g711_ulaw`` so both legs stay pass-through.

sdk-ts/src/scheduler.ts
- Removed the trailing blank line that broke the pre-commit ``end-of-file-fixer`` hook.

.github/workflows/audit.yml
- Bandit stock CLI doesn't support ``-f sarif``: install ``bandit-sarif-formatter`` alongside bandit, and guard the upload-sarif step with ``hashFiles`` so future formatter breakage doesn't fail the job.

Local verification: 802 passed, 4 skipped (sdk-py unit + integration).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
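
For reference, the media-stream wire shape that the fixtures were migrated to (BUG #17/#18) and the inbound-track filter (BUG #19) can be sketched as below. The function names are hypothetical and only the fields named in the commit message (``event``, ``media.payload``, ``media.track``) are modelled; this is not the bridge's actual code.

```python
import base64
import json
from typing import Optional


def media_frame(audio: bytes) -> str:
    """Outbound audio frame: {"event": "media", "media": {"payload": <base64>}}."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"event": "media", "media": {"payload": payload}})


def clear_frame() -> str:
    """Outbound frame asking the far end to drop any queued audio."""
    return json.dumps({"event": "clear"})


def inbound_audio(raw: str) -> Optional[bytes]:
    """Return decoded caller audio, or None when the frame should be ignored."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # start / stop / dtmf / error / connected carry no audio
    media = msg.get("media") or {}
    # A missing track is treated as inbound; any other value is skipped so the
    # agent's own outbound leg is never fed back into turn detection.
    if media.get("track", "inbound") != "inbound":
        return None
    payload = media.get("payload")
    if not payload:
        return None
    return base64.b64decode(payload)
```

A fixture built with ``media_frame(b"...")`` round-trips through ``inbound_audio``, whereas a frame in the old ``event_type`` shape returns ``None``, which is the "handler called 0 times" symptom described above.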
nicolotognoni
added a commit
that referenced
this pull request
Apr 21, 2026
* fix(deps): pin websockets>=14 and add python-multipart Fixes BUG #7 and #9 from acceptance suite. - websockets: pin >=14,<16. The 'additional_headers=' kwarg used by the OpenAI Realtime, Deepgram STT and ElevenLabs ConvAI adapters is only supported on the new asyncio client that became the default in 14.0. Under 13.x the call failed with 'got an unexpected keyword argument additional_headers', blocking every streaming provider. - python-multipart: add to the base install. Starlette >= 0.45 raises on 'await request.form()' without python-multipart installed, so every Twilio webhook returned 422 and the call was silently dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): repair Twilio & Telnyx webhook stack Fixes BUG #6, #8, #16 from acceptance suite. - #8 Request/Response import lifted to the top of server.py. With ``from __future__ import annotations`` in place, FastAPI's ``get_type_hints(handler)`` resolved the 'Request' annotation against module globals where only WebSocket was imported. The ForwardRef stayed unresolved, FastAPI classified the parameter as a query-string field and every Twilio/Telnyx webhook POST returned HTTP 422 before the handler body could run. Local mode was fundamentally broken on 0.4.3. - #6 dashboard tracking of failed outbound calls: new route ``POST /webhooks/twilio/status`` consumes Twilio statusCallback events (initiated/ringing/answered/completed/no-answer/busy/failed) and feeds them into MetricsStore.update_call_status. Operators now see every dialled attempt in the dashboard, including ones that never reach media. - #16 Telnyx Call Control: ``/webhooks/telnyx/voice`` now POSTs ``actions/answer`` on call.initiated and ``actions/streaming_start`` on call.answered against the REST API and returns empty HTTP 200. Previously the route returned a JSON ``{commands: [...]}`` body that Telnyx silently discards — the call rang forever. 
Twilio voice route also falls back to the ``Caller`` / ``Called`` form fields when ``From`` / ``To`` are empty (see BUG #6 notes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(telnyx): WS event shape, frame format, track filter, audio sender Fixes BUG #17, #18, #19 from acceptance suite. - #17 Media-stream WebSocket events use ``event`` (start / media / stop / dtmf / error / connected), not the Call Control REST notification ``event_type``. Audio payload lives in ``data.media.payload`` (base64), caller/callee live in ``data.start.{from,to}``. Previously the bridge matched ``event_type == "stream_started"`` and looked for audio in ``payload.audio.chunk`` — no media chunk was ever decoded, so the agent never heard the caller. - #18 Outbound wire format corrected to ``{"event":"media","media":{"payload":b64}}`` and ``{"event":"clear"}``. The legacy ``event_type``/``payload.audio.chunk`` shape was silently dropped by Telnyx, so the caller heard silence. - #19 When ``stream_track=both_tracks`` Telnyx emits media for both the caller leg and the agent's own outbound leg; forwarding the outbound echo broke OpenAI Realtime turn detection ("speech_started" never fired). The bridge now filters ``media.track != "inbound"`` before forwarding. OpenAI Realtime handler on Telnyx is now configured with ``audio_format="g711_ulaw"`` to match the PCMU 8 kHz bidirectional stream. The TelnyxAudioSender transcodes PCM16 16 kHz → mulaw 8 kHz for pipeline / ConvAI providers (PCM16 TTS output) and passes mulaw bytes through when OpenAI Realtime provides them directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(twilio): OpenAI Realtime audio format + pass-through audio sender Fixes BUG #10 from acceptance suite. OpenAI Realtime emits PCM16 at 24 kHz natively. 
The Twilio handler previously left ``audio_format`` at the pcm16 default and fed the bytes into TwilioAudioSender, which unconditionally ran ``resample_16k_to_8k(pcm) → pcm16_to_mulaw`` assuming 16 kHz input. 24 kHz bytes run through a 16→8 kHz resampler come out at ~66% of the correct rate — the caller heard a deep, slurred voice. Fix: on the Twilio path construct ``OpenAIRealtimeStreamHandler(..., audio_format="g711_ulaw")`` so OpenAI emits Twilio-native mulaw 8 kHz directly. Pair it with ``TwilioAudioSender(..., input_is_mulaw_8k=True)`` which skips the resample+mulaw encode and forwards the bytes as-is. Pipeline and ConvAI still produce PCM16 @ 16 kHz and go through the default transcoding path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pipeline): STT path + hooks + barge-in + dedup + hallucination filter Fixes BUG #12, #15, #20, #22 from acceptance suite. - #12 Pipeline on Twilio: the bridge converts mulaw 8 kHz → PCM16 16 kHz before STT. The STT adapter used to be built with ``for_twilio=True`` (mulaw 8 kHz) — Deepgram decoded the already-PCM bytes as mulaw and produced garbage transcripts. The pipeline now always configures linear16 @ 16 kHz. - #15 ``PipelineHooks.before_send_to_stt`` was declared but never invoked. ``PipelineStreamHandler.on_audio_received`` now runs the hook on every inbound chunk and drops the chunk when it returns ``None``. - #20 Pipeline barge-in: ``on_audio_received`` used to skip STT when ``_is_speaking=True``, blocking any barge-in detection. It now keeps forwarding caller audio to STT during TTS (unless ``agent.barge_in_threshold_ms == 0``), and ``_stt_loop`` flips ``_is_speaking=False`` + ``send_clear`` on any Deepgram transcript with text observed while speaking. Effective latency floor is ~800 ms (Deepgram interim), so noisy / short TTS sentences may not actually be interrupted — full sub-second barge-in requires a server-side VAD (Silero, already supported via ``agent.vad=``). 
- #22 Dedup + throttle + hallucination filter. Low-quality STT (Whisper on mulaw 8 kHz) emits several nearly identical final transcripts in 1–2 s ("you", "you", "you") and hallucinates short fillers from silence / TTS echo. Each used to kick off a new LLM+TTS turn, and consecutive turns overlapped on the caller's line. Fix in ``_stt_loop``: dedup identical finals within 2 s, drop any final within 500 ms of the last committed turn, drop a curated blacklist of fillers (``you``, ``thank you``, ``yeah``, ``uh``, ``.``…).

Also adds the 8 kHz output path used by the Telnyx handler via a shared linear16 STT factory in ``handlers/common.py``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(providers): voice name resolver, Deepgram knobs, TTS streaming resample

Fixes BUG #11, #13, #23 from acceptance suite.

- #11 ElevenLabs voice-name resolver. ``Patter.elevenlabs(voice="rachel")`` (the quickstart default) used to pass "rachel" verbatim into the /text-to-speech/{voice_id}/stream URL, which 404s because the API only accepts the opaque 20-char voice IDs. The new ``resolve_voice_id`` helper maps ~45 common display names (rachel, adam, matilda, alloy, …) to their voice IDs and returns unknown strings unchanged so custom voices keep working. Removes the ad-hoc "alloy" substitution in stream_handler.
- #13 DeepgramSTT exposes ``endpointing_ms`` / ``utterance_end_ms`` / ``smart_format`` / ``interim_results`` / ``vad_events`` kwargs and the ``Patter.deepgram(...)`` factory forwards them via ``STTConfig.options``. Defaults tuned for telephony (endpointing_ms=150, utterance_end_ms=1000). The transcript gate is loosened to ``is_final OR speech_final`` so we don't wait up to utterance_end_ms on every turn. Pipeline turn latency on Twilio drops from ~4 s to ~2.2 s.
- #23 OpenAI TTS streaming resample. ``response_format=pcm`` returns 24 kHz PCM16 chunks that must be downsampled to 16 kHz.
The old implementation did the 3:2 downsample chunk-by-chunk without preserving filter state, so cross-chunk alignment drifted and the caller heard pops / dropped audio. Now uses ``audioop.ratecv`` with a persistent ``state`` and stashes odd trailing bytes between calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scheduler,fallback): per-loop schedulers + async close + cancel probes Fixes BUG #2, #3, #5 from acceptance suite. - #3 Scheduler singleton dies across event loops. The old ``_scheduler_singleton`` bound to the first loop it saw; pytest-asyncio closed that loop at the end of every test and the next scheduled callback crashed with ``Event loop is closed``. Replaced by ``_schedulers_by_loop`` — a dict keyed on ``id(asyncio.get_event_loop())`` that drops stale entries when the owning loop has been closed. Adds ``reset_for_tests()`` to tear down every cached scheduler; the public ``shutdown()`` is now an alias for it. - #2 ``FallbackLLMProvider.complete_stream`` — convenience wrapper that flattens ``{"type": "text"}`` chunks so callers don't have to switch on chunk type. Mirrors the TS SDK's ``completeStream``. - #5 ``FallbackLLMProvider`` recovery task leak. ``_probe`` tasks created by ``_start_recovery`` were never awaited, and pytest-asyncio tears the loop down before they finish. Adds ``aclose()`` and async context manager support (``__aenter__``/``__aexit__``) so callers can ``async with FallbackLLMProvider(...)`` and have the probes cancelled + awaited on exit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tools): @tool adapter unpacks kwargs into user function Fixes BUG #21 from acceptance suite. ``@tool`` exposed the raw user function as ``handler`` but ``services/tool_executor._execute_handler`` always calls ``handler(arguments_dict, call_context_dict)``. Every typed tool — e.g. 
``async def check_order(order_id: str)`` — crashed at runtime with "takes 1 positional argument but 2 were given" and OpenAI Realtime received a fallback error JSON instead of the tool's result. The decorator now wraps the user function in an async adapter whose signature matches the executor's contract ``(arguments, call_context)``. The adapter inspects the original signature: if it already takes ``(arguments, call_context)`` positionally it passes through unchanged, otherwise it filters ``arguments`` to the user function's declared parameter names and calls ``fn(**args)``. The original function is still reachable via ``handler.__wrapped__`` for introspection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dashboard): track failed & no-answer outbound calls Fixes BUG #6 from acceptance suite. The embedded dashboard used to show only calls that made it to the media channel. An outbound dial that rang out (``status=no-answer``, ``busy``, ``failed``) never produced a webhook hit, so the row never appeared in the UI even though Twilio billed for the attempt. Changes: - ``MetricsStore.record_call_initiated({call_id, caller, callee, …})`` pre-registers the call when ``Patter.call()`` returns, so the row shows up the moment the dial is dispatched. - ``MetricsStore.update_call_status(call_id, status, **extra)`` promotes the record through the lifecycle (ringing → in-progress → completed / no-answer / busy / failed / canceled). Terminal states move the row from active to the completed list so the UI timer freezes. Fed by the new ``/webhooks/twilio/status`` route. - ``MetricsStoreProtocol`` extended with the two new methods. - ``call_end`` now synthesises a minimal metrics shim when the call ended without a full CallMetrics payload, so the UI can still render duration / status. 
- Dashboard UI: new ``STATUS`` column, filter pills (all / completed / failed), colour-coded badges (green / yellow / red / orange), red row tint for failed statuses, and SSE listeners for the new ``call_initiated`` and ``call_status`` events. The duration timer respects ``data-ended`` so rows that already received call_end stop ticking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): ring_timeout + agent.hooks/vad/audio_filter forwarding + call pre-register Fixes BUG #14 + IMP2 + completes BUG #6 from acceptance suite. - #14 ``Patter.agent(...)`` used to drop ``hooks``, ``text_transforms``, ``vad``, ``audio_filter``, ``background_audio`` and ``barge_in_threshold_ms`` even though the ``Agent`` dataclass accepted them. The factory now forwards all fields. - IMP2 ``ring_timeout: int | None`` kwarg on ``Patter.call(...)``. Forwarded to Twilio as ``Timeout=`` and to Telnyx as ``timeout_secs`` (added to ``TelnyxAdapter.initiate_call``). Italian mobile carriers silence-drop the default ~28 s ring on US→IT calls; the quickstart now works with ``ring_timeout=60``. - #6 ``Patter.call()`` pre-registers the dialled call in the MetricsStore via ``record_call_initiated(...)`` before returning, so the dashboard shows the attempt even when the callee never picks up. The Twilio branch also passes ``StatusCallbackEvent="initiated ringing answered completed"`` so we receive every state transition. Also exposes the new Deepgram knobs on the ``Patter.deepgram(...)`` factory (``model``, ``endpointing_ms``, ``utterance_end_ms``, ``smart_format``, ``interim_results``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): models barge_in_threshold_ms + STT/TTS options, top-level mix_pcm, docstring Rolls up the smaller API additions — BUG #1, #04g, extras from #13/#15. - ``Agent.barge_in_threshold_ms`` (default 300) — hangover window before treating caller audio as barge-in. 
Used by PipelineStreamHandler and mirrored on TS ``AgentOptions.bargeInThresholdMs``. - ``STTConfig.options`` / ``TTSConfig.options`` — provider-specific knobs bag (e.g. Deepgram endpointing) that ``common._create_stt_from_config`` unpacks when building the adapter. Keeps older ``STTConfig`` callers forward-compatible. - Top-level ``patter.mix_pcm(agent, bg, ratio)`` — parity alias for the TS ``mixPcm(...)`` standalone helper (BUG #04g). Thin wrapper over the existing ``PcmMixer`` class with an explicit ratio. - ``patter/__init__.py`` docstring enumerates the installable extras (scheduling, anthropic, groq, cerebras, google, …) so ``pip install getpatter`` users discover them without hitting a ``RuntimeError: Scheduling requires the 'apscheduler' package`` at call time (BUG #1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: align Python tests with BUG #12/#16/#17/#18/#19/#21 fixes - ``test_local_mode``: pipeline Twilio bridge test now patches ``DeepgramSTT`` directly instead of ``DeepgramSTT.for_twilio`` — after BUG #12 the pipeline path uses the default linear16 16 kHz adapter on both telephony providers. - ``test_new_features``: ``machine_detection=False`` no longer asserts an empty extra_params dict; BUG #6 now always wires a ``StatusCallback`` so the dashboard sees failed attempts. The test keeps its original intent (AMD-specific params absent) and additionally checks the status callback is set. - ``test_server_unit::TestTelnyxVoiceRoute``: rewritten to assert the REST ``actions/answer`` POST after BUG #16 — the route no longer returns a JSON commands body. - ``test_telnyx_bridge_unit``: helper messages updated to the ``{event: start|media|stop}`` wire shape from BUG #17; the OpenAI Realtime audio_format assertion now expects ``g711_ulaw`` (from #18). - ``test_telnyx_handler_unit``: TelnyxAudioSender test uses ``input_is_mulaw_8k=True`` so the round-trip byte assertion still holds with the new PCM16→mulaw transcode path (#18). 
Wire format asserts ``event == "media"`` / ``event == "clear"``. - ``test_tool_decorator``: invokes handlers with the new adapter signature ``(arguments_dict, call_context_dict)`` (#21), including a sync-wrapped handler awaited through the adapter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ts/api): Python parity — auto-detect local, static factories, ring_timeout Brings TS parity with Python on BUG #4 parity items + #14 agent fields + IMP2 ring_timeout. - Auto-detect local mode: ``new Patter({twilioSid, twilioToken, …})`` without explicit ``mode: 'local'`` is now treated as local when apiKey is missing (mirrors Python). - Static provider factories: ``Patter.deepgram(...)``, ``Patter.elevenlabs(...)``, ``Patter.whisper(...)``, ``Patter.openaiTts(...)``, ``Patter.cartesia(...)``, ``Patter.rime(...)``, ``Patter.lmnt(...)``. - ``STTConfig.toDict`` / ``TTSConfig.toDict`` are now optional — plain object literals ``{provider, apiKey, language}`` are accepted everywhere (fallback serialisation is handled via ``sttConfigToDict`` / ``ttsConfigToDict`` helpers). - ``STTConfig`` gets an ``options`` bag (parity with Python BUG #13). - ``LocalCallOptions.ringTimeout`` forwarded to Twilio as ``Timeout`` and Telnyx as ``timeout_secs`` — plus ``StatusCallbackEvent`` wired so the dashboard sees ringing/no-answer/busy/failed transitions (BUG #6). - ``AgentOptions.bargeInThresholdMs`` (parity with #20 on Python). - ``LocalOptions.deepgramKey`` / ``elevenlabsKey`` added as provider-level defaults (parity with Python Patter() kwargs). - ``Patter.call()`` Twilio branch pre-registers the dialled call with ``metricsStore.recordCallInitiated`` so no-answer / busy / failed attempts still show up in the dashboard. - ``providers.deepgram(...)`` factory exposes the Deepgram knobs (model / endpointing_ms / utterance_end_ms / smart_format / interim_results) and carries them in ``STTConfig.options``. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts/providers): voice resolver, Deepgram knobs, TTS streaming resample

TS parity port of Python BUG #11, #13, #23.

- ElevenLabs: ``resolveVoiceId()`` maps display names (rachel, adam, matilda, alloy, …) to the opaque 20-char voice IDs accepted by the /text-to-speech/{voice_id}/stream endpoint. Map mirrors the Python SDK byte-for-byte.
- DeepgramSTT: constructor overloaded to accept ``DeepgramSTTOptions`` (endpointingMs / utteranceEndMs / smartFormat / interimResults / vadEvents) alongside the legacy positional form. Transcript gate loosened to ``is_final OR speech_final`` so short utterances don't wait for Deepgram's utterance_end commit.
- OpenAITTS: streaming 24 kHz → 16 kHz resample now carries state (``carryByte`` + ``leftover`` samples) between chunks so cross-chunk alignment doesn't drift. The legacy ``resample24kTo16k`` static is kept as a thin wrapper around the streaming path for the existing unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts): Telnyx stack, pipeline hooks/barge-in/dedup, dashboard status, scheduler sync

TS parity port of the Python fixes for BUG #2/#3/#6/#12/#15/#16/#17/#18/#19/#20/#22.

- ``stream-handler.ts``: ``handleAudio`` now runs the ``before_send_to_stt`` hook (#15), transcodes Twilio mulaw 8 kHz → PCM16 16 kHz unconditionally on the pipeline path (#12), and keeps forwarding caller audio during TTS so barge-in can trigger (#20). ``processTranscript`` implements the dedup + 500 ms throttle + hallucination-word blacklist from #22 and flips ``isSpeaking`` + ``sendClear`` on any transcript with text while the agent is speaking (#20).
- ``server.ts``: ``TelnyxBridge.sendAudio`` / ``sendClear`` use the correct ``{event:"media",media:{payload:b64}}`` wire format (#18); the Telnyx WS handler matches ``data.event`` (start / media / stop / dtmf / error / connected) and filters ``media.track !== "inbound"`` before forwarding (#17, #19); the ``/webhooks/telnyx/voice`` route POSTs ``actions/answer`` and ``actions/streaming_start`` via the Call Control REST API and returns empty HTTP 200 (#16). ``TwilioBridge.createStt`` picks linear16 16 kHz when ``provider === 'pipeline'`` so Deepgram doesn't decode already-PCM bytes as mulaw (#12). A new ``/webhooks/twilio/status`` handler consumes Twilio status callbacks and updates the dashboard (#6). - ``scheduler.ts``: ``scheduleCron`` returns a ``ScheduleHandle`` synchronously (lazy node-cron import happens in the background) — parity with Python #4. ``scheduleInterval`` accepts ``{intervalMs}`` or ``{seconds}`` in addition to the legacy positional ms, matching Python ``schedule_interval(seconds=...)``. - ``fallback-provider.ts``: ``completeStream()`` text-only convenience generator (#2), ``aclose()`` + ``Symbol.asyncDispose`` so ``await using fallback = ...`` parity with Python's ``async with FallbackLLMProvider(...)`` (#5). - ``dashboard/store.ts``: ``recordCallInitiated`` pre-registers outbound attempts, ``updateCallStatus`` promotes rows through ringing / no-answer / busy / failed and moves terminal states to the completed list (#6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(ts): align with 0.4.4 wire-format & provider API changes - ``providers.test.ts``: toDict now surfaces ``options`` when set, knobs forwarding verified. - ``types.test.ts``: toDict optional chain covered. - ``openai-tts.test.ts``: 1-byte input no longer returns the byte verbatim — the streaming resampler stashes it as ``carryByte`` and the stateless wrapper flushes only complete samples, so the test now asserts an empty buffer. 
- ``integration/twilio-pipeline.test.ts`` + ``integration/telnyx-pipeline.test.ts``: ``handleAudio`` is now async; tests await it. Telnyx fixture feeds mulaw 8 kHz and asserts the transcoded PCM16 16 kHz lands on the STT mock (BUG #12 + #19). - ``unit/server-routes.test.ts``: Telnyx webhook tests assert the REST ``actions/answer`` + ``actions/streaming_start`` POSTs and the empty HTTP 200 response (BUG #16). - ``package-lock.json``: refreshed for the sdk-ts worktree so the ``0.4.3`` → ``0.4.3-worktree`` alignment is consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(unit): regression coverage for BUG #6/#22/#23 + ring_timeout (IMP2) Three new unit test files lock in fixes that previously lived in the acceptance suite as live-call checks: test_pipeline_dedup.py (13 tests) - Hallucination blacklist: "you", "thank you", ".", case/punctuation variants, empty-after-strip all drop silently. - 2-second duplicate window with time.time monkeypatched so parity with the live Whisper feedback loop is deterministic. - 500 ms back-to-back throttle covering legitimate vs spurious second turns. - Interim / empty finals must not fire on_transcript. test_openai_tts_resample.py (7 tests) - Cross-chunk ratecv state: multi-chunk stream output matches a single-shot resample byte-for-byte. - Odd-byte boundary: a chunk ending on a dangling byte must not drop the sample. - Empty / single-byte / tiny chunks must not crash. - Response is always aclosed on both successful and early-exit paths. test_twilio_status_and_ring_timeout.py (13 tests) - /webhooks/twilio/status routes to update_call_status with parsed duration, and survives missing SID, bad duration, and the dashboard-disabled path. - Twilio signature enforcement on the status endpoint. - Twilio ring_timeout -> Timeout REST param, Telnyx -> timeout_secs. - Twilio StatusCallback / StatusCallbackEvent are always registered on outbound calls so BUG #6 cannot regress. 
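The three rules that ``test_pipeline_dedup.py`` exercises (filler blacklist, 2-second duplicate window, 500 ms throttle) can be sketched roughly as follows. This is an illustrative reconstruction, not the SDK's ``_stt_loop`` code: the class name ``FinalTranscriptFilter``, the injectable ``clock`` parameter, and the exact normalisation are assumptions, and the blacklist is only the subset named in the commit message.

```python
import string
import time

# Illustrative subset of the filler blacklist; the real list is longer.
# The empty string covers finals that are empty after stripping (e.g. ".").
HALLUCINATIONS = {"you", "thank you", "yeah", "uh", ""}


class FinalTranscriptFilter:
    """Hypothetical helper: decides whether a final transcript starts a turn."""

    def __init__(self, dedup_window_s=2.0, throttle_s=0.5, clock=time.monotonic):
        self._dedup_window_s = dedup_window_s
        self._throttle_s = throttle_s
        self._clock = clock  # injectable so tests can control time
        self._last_text = None
        self._last_commit_at = float("-inf")

    @staticmethod
    def _normalize(text):
        # Lower-case and trim whitespace plus leading/trailing punctuation.
        return text.strip().strip(string.punctuation).strip().lower()

    def accept(self, text):
        now = self._clock()
        norm = self._normalize(text)
        if norm in HALLUCINATIONS:
            return False  # known filler or empty after stripping
        elapsed = now - self._last_commit_at
        if norm == self._last_text and elapsed < self._dedup_window_s:
            return False  # same final repeated inside the duplicate window
        if elapsed < self._throttle_s:
            return False  # too close to the previously committed turn
        self._last_text = norm
        self._last_commit_at = now
        return True
```

Passing the clock in, rather than monkeypatching ``time.time`` as the real tests do, is a choice made here only to keep the sketch self-contained.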
Full unit suite: 728 passed, 2 skipped. * docs+ci: latency/provider caveats + audit workflow README - Pipeline turn-latency floor documented (~2.0–2.8 s) with per-stage breakdown so users know to switch to `provider="openai_realtime"` for sub-second UX. - ElevenLabs free-tier library-voice restriction (402) with pointer to `ELEVENLABS_VOICE_ID`. - Telnyx outbound D38 Outbound Profile requirement. - Google Gemini free-tier quota=0 caveat. - Whisper hallucination filter documented. - `ring_timeout` + status callback description added to call(). .github/workflows/audit.yml (new) - pip-audit on sdk-py runtime deps. - npm audit on sdk-ts production deps. - bandit static analysis with SARIF upload to GitHub Security. - Runs on dep-manifest changes, weekly schedule, and manual dispatch. - Findings are advisory-only to keep the pipeline from flaking on upstream CVE churn (telephony stack pulls many C-wrapped libs). Baseline audit run: npm=0, bandit medium+/high-confidence=0, pip-audit=2 (pytest dev-only + transformers optional-extra only). * docs(readme): remove local-measured latency numbers from Voice Modes The millisecond ranges previously listed for each provider came from a single local benchmark run and are neither representative nor a target. Keep the modes table qualitative and replace the per-stage breakdown with a short note that latency is inherited from the chosen providers — no hard numbers we don't want callers anchoring on. * test(unit): bug coverage gaps — BUG #15/#19/#20 Three new unit test modules fill the remaining coverage gap for the bugs fixed on this branch: test_pipeline_bargein.py (7 tests) — BUG #20 - Interim transcript during TTS triggers send_clear + is_speaking=False. - record_turn_interrupted is fired on the metrics accumulator. - send_clear throwing does not crash the STT loop (fail-open). - No barge-in when the agent is idle or the transcript has no text. 
- Final transcripts also trigger the barge-in branch before the downstream LLM turn runs. test_before_send_to_stt_hook.py (9 tests) — BUG #15 - Sync / async hook returning None drops the chunk (zero STT sends). - Returning modified bytes forwards the new buffer verbatim. - Hook receives the decoded PCM, not the raw mulaw payload. - Raising hooks fail-open: original audio still reaches STT. - Missing hook / hooks instance with before_send_to_stt=None are both bypass paths that must still forward audio. test_telnyx_track_filter.py (5 tests) — BUG #19 - track=inbound forwards, track=outbound drops. - Missing `track` field defaults to inbound (legacy Telnyx payloads). - Mixed stream: only inbound frames reach the handler, in order. - Unknown track values are skipped defensively. Full unit suite: 749 passed, 2 skipped (+21 from this commit). * feat(sdk-py): add cartesia/rime/lmnt static factories + vad_events to deepgram Brings Python SDK to parity with sdk-ts: - Adds Patter.cartesia / Patter.rime / Patter.lmnt static methods so local-mode users can configure these TTS providers the same way they do in TypeScript. - Adds the missing vad_events keyword to Patter.deepgram and the patter.providers.deepgram factory — the DeepgramSTT ctor already accepted it, but the public config helper silently dropped the flag. * chore: bump to 0.4.4 Regression suites re-run after the bump: - sdk-py: 749 passed, 2 skipped - sdk-ts: 932 passed (57 test files, including soak) * fix(ci): integration tests on 0.4.4 wire format + misc hygiene Addresses the five failing CI checks on PR #66. Telnyx integration tests (test_telnyx_{convai,pipeline,realtime}.py) - ``_telnyx_stream_started`` / ``_telnyx_media_event`` / ``_telnyx_stream_stopped`` helpers migrated from the pre-0.4.4 ``{event_type, payload.audio.chunk}`` shape to the real Telnyx media-stream wire format ``{event, start|media.payload}`` (BUG #17/#18). 
Without this the bridge silently drops every test frame and 11 integration tests fail with "handler called 0 times". - ``test_audio_format_pcm16`` renamed to ``test_audio_format_g711_ulaw`` and the assertion flipped — Telnyx is PCMU 8 kHz bidirectional (BUG #19), Realtime runs on ``g711_ulaw`` so both legs stay pass-through. sdk-ts/src/scheduler.ts - Removed the trailing blank line that broke the pre-commit ``end-of-file-fixer`` hook. .github/workflows/audit.yml - Bandit stock CLI doesn't support ``-f sarif`` — install ``bandit-sarif-formatter`` alongside bandit, and guard the upload-sarif step with ``hashFiles`` so future formatter breakage doesn't fail the job. Local verification: 802 passed, 4 skipped (sdk-py unit + integration). * docs: update SDK reference for 0.4.4 features - Update version to 0.4.4 in API reference - Add static factories: cartesia(), rime(), lmnt() for TTS - Document new agent() parameters: hooks, text_transforms, vad, audio_filter, background_audio, barge_in_threshold_ms - Add ring_timeout parameter to call() signature - Document Deepgram tuning options: endpointing_ms, utterance_end_ms, vad_events - Synchronize Python and TypeScript API documentation for parity * docs: document barge_in_threshold_ms configuration Update barge-in feature documentation to reflect new barge_in_threshold_ms parameter (default 300ms). Document how to customize or disable via agent configuration. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
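For readers following the ``@tool`` change (BUG #21), the adapter idea can be sketched as below. This is a minimal sketch under assumptions, not the SDK's decorator: the pass-through check on the literal parameter names ``arguments`` / ``call_context`` is a guess at how "already takes ``(arguments, call_context)``" is detected, and ``check_order`` is just the example function named in the commit message.

```python
import asyncio
import functools
import inspect


def tool(fn):
    """Hypothetical sketch: adapt fn to the executor's (arguments, call_context) contract."""
    params = inspect.signature(fn).parameters
    # Assumption: a function declared exactly as (arguments, call_context)
    # already matches the contract and is called unchanged.
    passthrough = list(params) == ["arguments", "call_context"]
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

    @functools.wraps(fn)  # also sets handler.__wrapped__ = fn
    async def handler(arguments, call_context):
        if passthrough:
            result = fn(arguments, call_context)
        else:
            # Keep only the keys the user function declares (all keys if it takes **kwargs).
            kwargs = arguments if accepts_var_kw else {
                key: value for key, value in arguments.items() if key in params
            }
            result = fn(**kwargs)
        # Support both sync and async user functions.
        if inspect.isawaitable(result):
            result = await result
        return result

    return handler


@tool
async def check_order(order_id: str):
    return f"order {order_id}"


# The executor-style call: a dict of arguments plus a call-context dict.
reply = asyncio.run(check_order({"order_id": "42", "unused": True}, {"call_id": "c1"}))
```

Filtering to declared parameter names means extra keys sent by the model are ignored rather than raising ``TypeError``; whether the real adapter does exactly this for every signature shape is not stated in the commit message.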
nicolotognoni added a commit that referenced this pull request on Apr 29, 2026
Six fixes uncovered by the 0.5.5 acceptance matrix run, ranging from a HIGH-severity onnxruntime-node version mismatch that blocks Silero VAD on macOS x86_64 to a misleading metric that makes healthy calls look slow.

**Bug #1 (HIGH): SileroVAD onnxruntime-node 1.24+ API drift**

* ``optionalDependencies.onnxruntime-node`` tightened from ``^1.18.0`` to ``~1.18.0``. The caret was resolving to 1.24.x, where ``listSupportedBackends`` was removed and the prebuilt ``bin/`` layout drifted, so ``import('onnxruntime-node')`` failed on macOS x86_64.
* ``loadOnnxRuntime`` now classifies the underlying error (``missing`` / ``binding`` / ``api-drift`` / ``unknown``) and surfaces a targeted remedy plus the original error chain via ``Error.cause``. Previously the failure mode was hidden behind a single "could not be resolved" string.

**Bug #2 (MEDIUM): ElevenLabsConvAI agent_id error message**

* The env-var fallback already worked but the error message did not say *where* to get an agent ID from (the dashboard, not the API key). Updated both Python and TypeScript constructors to point users at https://elevenlabs.io/app/conversational-ai and reiterate that the agent ID is per-deployed-agent.
* Python ``ConvAI.__post_init__`` now raises when ``agent_id`` is empty (it was silently passing through). TypeScript already did this, so the two SDKs are now at parity.

**Bug #3 (MEDIUM): ElevenLabs WS payment_required**

* New typed exception ``ElevenLabsPlanError`` (subclass of ``ElevenLabsTTSError``) raised when the WS endpoint returns ``payment_required``. Free / Starter plans now get a clear "upgrade or use the HTTP class (drop-in API)" message instead of an opaque ``ElevenLabsTTSError: ElevenLabs WS error: payment_required``.
* Detection is case-insensitive and matches both the exact server string and any ``payment_required`` substring.

**Bug #5 (MEDIUM): barge-in fragile in pipeline mode without VAD**

* On tunnel + speakerphone setups the agent's own TTS leaks into the inbound mic feed, STT transcribes it, and the legacy "always forward + bargeInThresholdMs" heuristic fails to fire the cancel, so the agent talks over the user.
* ``serve()`` now logs a one-shot warning at startup when ``agent.engine`` is undefined, ``agent.vad`` is undefined, and ``bargeInThresholdMs > 0``, recommending ``SileroVAD`` or ``bargeInThresholdMs: 0``. Both Python and TypeScript.

**Bug #6 (LOW): pipeline ``total_ms`` misleading on long utterances**

* ``total_ms`` spans the user's entire utterance (including pauses) because it includes ``stt_ms``, which itself measures STT-stream-open to transcript-finalisation. On a 4 s user turn ``total_ms`` reads ~5.5 s even though the agent's TTFA after end-of-speech is ~1.3-1.5 s, which is misleading as a p95 / SLO metric.
* New ``LatencyBreakdown.agent_response_ms`` field (Python + TypeScript). Computed as ``endpoint_ms + llm_ttft_ms + tts_ms`` when all three signals are available, ``undefined`` / ``None`` otherwise. This is the user-perceived latency dashboards should track.
* ``total_ms`` kept unchanged for backward compatibility.

**Bug #7 (HIGH): outbound TwiML races tunnel startup**

* The documented ``void phone.serve(...) → setTimeout → phone.call(...)`` pattern reads ``localConfig.webhookUrl`` while the cloudflared hostname is still resolving, producing ``wss://undefined/...`` in the dial TwiML and a Twilio 11100 call drop on answer.
* New ``phone.tunnelReady`` Promise (TS) / ``phone.tunnel_ready`` ``asyncio.Future`` (Python). Resolves to the public webhook hostname once ``serve()`` knows it (immediately for static webhookUrl, after ``startTunnel`` for ``tunnel: true``). Rejects if ``serve()`` fails before the hostname is known.
* Documented pattern is now ``await phone.tunnelReady`` instead of ``setTimeout(10_000)``: deterministic, no race.
* The same root-cause fix likely also addresses Bug #4 (intermittent WS upgrade race), which the acceptance run flagged as a related symptom.

Test totals after the fixes: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8 XFAIL / 0 FAIL on the 61-case fixture. No regressions.
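
The ``agent_response_ms`` rule from Bug #6 is simple enough to show as a sketch. The field names below come from the commit message, but the dataclass shape is an assumption: the real ``LatencyBreakdown`` probably carries more fields, and the commit describes ``agent_response_ms`` as a field whereas this sketch derives it with a property.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencyBreakdown:
    # Assumed subset of fields; names follow the commit message.
    stt_ms: Optional[float] = None
    endpoint_ms: Optional[float] = None
    llm_ttft_ms: Optional[float] = None
    tts_ms: Optional[float] = None

    @property
    def agent_response_ms(self) -> Optional[float]:
        """endpoint + LLM time-to-first-token + TTS, or None if any signal is missing."""
        parts = (self.endpoint_ms, self.llm_ttft_ms, self.tts_ms)
        if any(part is None for part in parts):
            return None
        return sum(parts)


# stt_ms covers the whole utterance, so it is deliberately left out of the sum.
turn = LatencyBreakdown(stt_ms=4000, endpoint_ms=300, llm_ttft_ms=600, tts_ms=400)
```

Returning ``None`` instead of a partial sum keeps dashboards from plotting a number that silently omits a stage.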
nicolotognoni added a commit that referenced this pull request on May 1, 2026
…#81) * test(parity): cross-SDK sentence chunker fixture + standalone runner Add a 61-case fixture documenting expected sentence-chunker output for every supported edge case across English, Italian, CJK, Hindi, Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts. Each case carries the ideal `expected_sentences` plus an optional `current_behavior` field that documents known regressions / by-design quirks so the runner can xfail them without blocking CI. Standalone runner (`sentence_chunker_parity.py`) executes each case through the Python `SentenceChunker`, spawns `node sentence_chunker_shim.js` for the TypeScript equivalent, and compares emissions case-by-case. Self-contained — does not depend on the main `tests/parity/run.py` runner (which currently fails on the recent `patter` -> `getpatter` package rename). Result on the current main branch: 53 PASS / 8 XFAIL / 0 FAIL / 0 PARITY_FAIL — Python and TypeScript chunkers produce identical sentence streams for every covered case. * feat(chunker): IT/EN abbreviations, multilingual terminators, aggressive first-flush Three layered improvements to ``SentenceChunker`` (parity Py↔TS), all additive — no breaking change to the default behaviour: **Italian + English abbreviations** (Phase 1, 7) * Prefix list adds Sig, Sgr, Dott, Prof, Avv, Ing, Geom, Rag, Arch, On, Egr, Spett, Gent, Ill (Italian honorifics) plus Gen, Sen, Rep, Lt, Cpt, Capt, Col, Cmdr, Adm (Pipecat NLTK Punkt). * Suffix list adds ecc, cit, cap, sez, art, pag, fig, tab, cfr, vol, ed (Italian) plus vs, etc, No, Vol, pp, cf, ca, op, Mt, Hwy, Rt, Pl, Ave, Blvd, Sq (Pipecat). * Suffix-followed-by-starter pattern now preserves the trailing period (e.g. ``Patter Inc. He left.`` keeps ``Inc.`` instead of dropping it). * All-caps name fix (Pipecat #1692): the maybe-short-flush gate-5 acronym guard previously blocked any uppercase-preceded period, so ``"...with RAMESH."`` would never flush. 
Now only purely uppercase ASCII words ≤3 chars (U/US/USA/NATO patterns) are treated as acronyms. **Multilingual terminator support** (Phase 7) * Added ASCII semicolon ``;``, Unicode ellipsis ``…``, full-width semicolon/period/Japanese half-width to the terminator set. * Ported Pipecat's ``UNAMBIGUOUS_NON_LATIN_TERMINATORS`` (BSD-2): Hindi Devanagari ``। ॥``, Arabic ``؟ ؛ ۔ ؏``, Khmer ``។ ៕``, Burmese ``။``, Armenian ``։``, Ethiopic ``። ፧``, Tibetan ``༎ ༏``. * Final ``<stop>`` regex builds its character class from the merged set. **Opt-in aggressive first-clause flush** (Phase 2) * New constructor option ``aggressive_first_flush`` (Python) / ``aggressiveFirstFlush`` (TypeScript). **Default OFF.** * When enabled, emits the first clause of the response on a soft punctuation boundary (``,``, em-dash, en-dash) once the buffer reaches ``aggressive_first_min_len`` (default 40 chars). Saves 200–500 ms TTFA on the first sentence of each turn. * Eight guards prevent regressions on the safe-but-aggressive path: min-length, decimal-comma (``3,14``), thousands-separator (``1,000,000``), currency (``$1,000``, ``€1.000,50``), balanced parens/brackets/braces/double-quotes (protects JSON), ellipsis (``...``, ``…``), comma-before-quote, sub-token ambiguity (requires one char after the terminator). * Italian (``language="it"``) hard-disables the feature regardless of caller preference — Italian inverts EN convention (``,`` decimal, ``.`` thousands), so a comma-flush would split mid-number. * New ``Agent.aggressive_first_flush: bool = False`` field on Python ``Agent`` model. TypeScript ``AgentOptions.aggressiveFirstFlush`` is shipped in the after_llm 3-tier commit alongside the rest of the ``types.ts`` surface. Test coverage: +11 Python unit tests + +11 TypeScript unit tests for the aggressive first-flush feature + parity-fixture cases for RAMESH, Hindi danda, Arabic question mark, ASCII semicolon, Unicode ellipsis, vs./etc./Gen./Sen. abbreviations. 
Sentence-chunker constants and abbreviation lists ported from Pipecat (BSD-2-Clause, Daily) and from the LiveKit-derived regex base (Apache-2.0).

* feat(hooks): after_llm 3-tier API with deprecated legacy callable adapter

The ``after_llm`` pipeline hook used to be a single callable ``(text, ctx) → str`` that received the full LLM response only after the stream completed. Buffering the entire response added 500 ms – 2 s of TTFA for any agent that configured the hook.

This commit introduces a 3-tier API that lets callers pick the right latency budget for their transform:

* ``onChunk`` (sync, ~0 ms) - per-token transform applied inline before the stream-handler ever sees the token. Use for: regex replace, markdown strip, profanity char-swap. Does NOT block streaming.
* ``onSentence`` (async, 50–300 ms) - runs between the sentence chunker and TTS. Returns rewritten sentence, ``null`` to keep the original, ``""`` to drop the sentence. Use for: PII redaction, persona overlay, refusal swap. Adds latency only on the rewritten sentence, not the full turn.
* ``onResponse`` (async, 500 ms – 2 s) - full-response rewrite that buffers the LLM stream then runs once. **Blocks streaming TTS.** Use only when sentence-level rewrite is insufficient (e.g. structured output validation that needs the full text).

Backward compatibility
----------------------

The legacy single callable ``afterLlm: (text, ctx) => string`` still works and is mapped to ``onResponse`` with a one-shot ``PatterDeprecationWarning`` (Python - subclass of both ``DeprecationWarning`` and ``UserWarning`` so it surfaces by default in library code) or ``console.warn`` (TypeScript). Removal scheduled for v0.7.0.

Detection in TypeScript uses ``typeof hook === 'function'`` (not ``hook.length`` arity sniffing - that pattern breaks under minifiers and arrow defaults). Detection in Python uses ``callable(hook)`` plus ``_has_tier_attrs(hook)`` to disambiguate from object-form hooks.
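The legacy-callable detection and the ``onSentence`` return semantics can be sketched roughly as follows. The snake_case tier names ``on_chunk`` / ``on_sentence``, the ``AfterLlmTiers`` container, ``normalize_after_llm`` and ``resolve_sentence`` are hypothetical names chosen for this sketch; only ``PatterDeprecationWarning``, ``_has_tier_attrs`` and ``on_response`` are named in the commit message. The sketch makes no claim about how Python's default warning filters treat the warning class.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Callable, Optional


class PatterDeprecationWarning(DeprecationWarning, UserWarning):
    """Subclass of both categories, as described in the commit message."""


TIER_NAMES = ("on_chunk", "on_sentence", "on_response")  # assumed Python spellings


@dataclass
class AfterLlmTiers:  # hypothetical container for the three tiers
    on_chunk: Optional[Callable[..., Any]] = None
    on_sentence: Optional[Callable[..., Any]] = None
    on_response: Optional[Callable[..., Any]] = None


_warned = False  # makes the deprecation warning one-shot


def _has_tier_attrs(hook: Any) -> bool:
    return any(callable(getattr(hook, name, None)) for name in TIER_NAMES)


def normalize_after_llm(hook: Any) -> AfterLlmTiers:
    """Map None, dict, object-form, or legacy callable hooks to tiers."""
    global _warned
    if hook is None:
        return AfterLlmTiers()
    if isinstance(hook, dict):
        return AfterLlmTiers(**{n: hook.get(n) for n in TIER_NAMES})
    if _has_tier_attrs(hook):
        return AfterLlmTiers(**{n: getattr(hook, n, None) for n in TIER_NAMES})
    if callable(hook):
        if not _warned:
            warnings.warn(
                "single-callable after_llm is deprecated; use the 3-tier form",
                PatterDeprecationWarning,
                stacklevel=2,
            )
            _warned = True
        return AfterLlmTiers(on_response=hook)
    raise TypeError("after_llm must be None, a dict, an object, or a callable")


def resolve_sentence(original: str, rewritten: Any) -> Optional[str]:
    """None keeps the original, "" drops the sentence (returns None),
    and a non-string return fails open to the original."""
    if rewritten is None or not isinstance(rewritten, str):
        return original
    return rewritten or None
```

Checking the dict and object forms before the bare ``callable`` test matters because an object-form hook may itself define ``__call__``.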
Wire-up
-------

* ``llm_loop.py`` / ``llm-loop.ts`` - ``has_after_llm_response`` (and the legacy callable that maps to it) gates token buffering. ``has_after_llm_chunk`` triggers per-token transform inline before yield.
* ``stream_handler.py`` / ``stream-handler.ts`` - applies ``has_after_llm_sentence`` between the chunker emit and the TTS synthesise call. Both the streaming-LLM path and the non-streaming ``_speakFinalResponse`` path apply the hook for parity.
* The same ``stream_handler`` change wires ``Agent.aggressive_first_flush`` / ``AgentOptions.aggressiveFirstFlush`` into the chunker constructor (Phase 2 wire-up that needed ``stream_handler`` and ``types.ts`` to land here alongside the hook changes - separating them would have required interactive patch staging on the same hunks).

Test coverage
-------------

* +11 Python pytest cases under ``TestAfterLlmThreeTier`` covering: no hook pass-through, legacy callable maps to ``on_response`` with deprecation warning, dict / Protocol / object forms, drop-by-empty, fail-open on hook exception, type confusion (non-string return), legacy alias methods (``has_after_llm`` / ``run_after_llm``) preserved.
* +9 TypeScript Vitest cases covering the equivalent surface.

* feat(tts): ElevenLabsWebSocketTTS - opt-in low-latency WS variant

New TTS provider that targets ElevenLabs' streaming-input WebSocket endpoint (``/v1/text-to-speech/{voice}/stream-input``) instead of the HTTP ``/stream`` endpoint used by ``ElevenLabsTTS``. Saves ~50 ms HTTP request setup per utterance and avoids the TLS cold-start handshake on bursty calls.

Drop-in API matching ``ElevenLabsTTS``:

* Same ``synthesize`` (Python) / ``synthesizeStream`` (TypeScript) signature returning ``AsyncGenerator<bytes>``.
* Same ``for_twilio()`` / ``for_telnyx()`` factories.
* Same default model ``eleven_flash_v2_5``.
* Top-level export ``getpatter.ElevenLabsWebSocketTTS`` (Py) / ``import { ElevenLabsWebSocketTTS } from "getpatter"`` (TS).
Defaults
--------

* ``auto_mode=true`` - server picks chunk timing.
* ``inactivity_timeout=60`` (range 5–180).
* Per-utterance lifecycle. Documented as a known trade-off vs Pipecat's per-session pool (pooling is on the roadmap for v0.6.x).
* ``eleven_v3*`` is rejected at construction with a clear error - the WS stream-input endpoint does not support v3; users must fall back to the HTTP class.

Resilience contract (post-review hardening)
-------------------------------------------

* **Connect timeout 5 s** (Pipecat-aligned, was 15 s in earlier drafts) bounds DNS + TLS handshake.
* **Per-frame receive timeout 30 s** prevents the generator hanging forever on a stalled server.
* **Permanent error handler attached BEFORE the open await** - closes a window where an error fired after the once-listener resolved would surface as ``uncaughtException`` in Node.
* **All ws listeners removed in ``finally``** - no closure leak past socket close.
* **Server ``error`` raises ``ElevenLabsTTSError``** instead of silently completing - caller can distinguish "synthesis succeeded with empty text" from "synth failed mid-stream".
* **Best-effort EOS ``{"text":""}`` in ``finally``** - tells ElevenLabs to stop billing for unconsumed audio. Sending it immediately after ``flush:true`` (the previous draft) risked truncating tail audio under ``auto_mode=true``.
* **Audio frame size cap 512 KB** prevents OOM via malicious / malformed base64 (real frames are ~75 KB decoded).
* **Server error string sanitised** before logging (strips CR/LF/NUL, truncates to 200 chars) - defends against log-line injection.
* **``api_key`` private** (``_api_key`` + read-only ``api_key`` property) so ``vars(tts)`` / dataclass-style introspection cannot surface the secret.
* **``eleven_v3`` prefix-based reject** also blocks ``eleven_v3_preview``, ``eleven_v3_alpha``.
* **Public wrapper exposes the full options surface** (``voice_settings``, ``language_code``, ``inactivity_timeout``, ``chunk_length_schedule``) - earlier drafts dropped them.
* **Default voice consistency**: the public wrapper no longer overrides the provider class default - both layers use Rachel (``21m00Tcm4TlvDq8ikWAM``) so direct-construct and wrapped-construct paths agree.

Public surface
--------------

* ``getpatter/providers/elevenlabs_ws_tts.py`` - provider class ``ElevenLabsWebSocketTTS`` + ``ElevenLabsTTSError``.
* ``getpatter/tts/elevenlabs_ws.py`` - wrapper class ``TTS`` re-exported as ``ElevenLabsWebSocketTTS`` from the package root.
* ``sdk-ts/src/providers/elevenlabs-ws-tts.ts`` + corresponding TypeScript wrapper at ``sdk-ts/src/tts/elevenlabs-ws.ts``.
* ``sdk-ts/src/providers/elevenlabs-tts.ts`` - ``resolveVoiceId`` promoted from module-private to public export so the WS variant can share the voice-name → voice-id resolution table without duplicating the lookup map.
* ``sdk-py/getpatter/__init__.py`` and ``sdk-ts/src/index.ts`` - top-level re-exports.

Test coverage
-------------

* +20 Python pytest cases (construction, factories, URL build, send sequence, ``isFinal`` termination, voice settings in init, ``chunk_length_schedule`` only with ``auto_mode=False``, ``eleven_v3`` rejection + variants, env-var resolution).
* +11 TypeScript Vitest cases covering the equivalent surface, including a faked ``ws`` module that records sent frames.

The HTTP ``ElevenLabsTTS`` class is **untouched** - both transports coexist and the user picks per-call.

* release: 0.5.5 - latency pass 1 (chunker + after_llm 3-tier + WS TTS)

Bump ``getpatter`` to 0.5.5 across both SDKs (Python ``pyproject.toml``, TypeScript ``package.json`` + ``package-lock.json``, and the SDK ``__version__`` / ``VERSION`` constants kept in sync).
CHANGELOG entry covers the four user-visible additions shipped in this release:

* Sentence chunker - IT/EN abbreviations + multilingual terminators + RAMESH-style all-caps flush bug fix (Pipecat #1692). Default behaviour unchanged for existing users.
* Opt-in ``aggressive_first_flush`` / ``aggressiveFirstFlush`` on ``Agent`` / ``AgentOptions`` - emits the first clause of each turn on a soft-punctuation boundary (",", em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA. Italian hard-disabled (decimal-comma + dot-thousands inversion). 8 guards prevent regressions on decimals, currency, JSON, ellipsis, open-delimiters, comma-before-quote, sub-token ambiguity.
* New 3-tier ``after_llm`` API (``onChunk`` / ``onSentence`` / ``onResponse``). Legacy single-callable form still works (mapped to ``onResponse``) but emits a one-shot ``PatterDeprecationWarning`` / ``console.warn``. Removal: v0.7.0.
* New opt-in ``ElevenLabsWebSocketTTS`` class - drop-in replacement for ``ElevenLabsTTS`` (HTTP) using the ``stream-input`` WebSocket endpoint. Saves ~50 ms HTTP setup + TLS cold-start per utterance. Per-utterance lifecycle (per-session pooling on the roadmap).

Test totals after this release: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 / 8 XFAIL / 0 FAIL on a 61-case fixture spanning EN, IT, CJK, Hindi, Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts.

Cumulative review hardening from 11 parallel review agents (Python-reviewer, TypeScript-reviewer, provider-reviewer, sdk-parity, security-reviewer, code-reviewer, code-simplifier, refactor-cleaner, docs-sync, build-validator, examples-validator) is folded into the phase-specific commits - see the per-feature commits in this branch for the detailed CRITICAL / HIGH fix lists.
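Three of the WS TTS hardening rules listed in the resilience contract (error-string sanitising, the ``eleven_v3`` prefix reject, and the 512 KB audio frame cap) are small enough to sketch in isolation. The helper names below are hypothetical and the sketch assumes "strips" means the control characters are deleted; the real provider class may structure these checks differently.

```python
import base64

MAX_AUDIO_FRAME_BYTES = 512 * 1024  # cap from the resilience contract
MAX_ERROR_CHARS = 200               # truncation length from the resilience contract


def sanitize_server_error(message: str) -> str:
    """Delete CR, LF and NUL, then truncate, so the text is safe to log."""
    cleaned = message.translate({ord("\r"): None, ord("\n"): None, 0: None})
    return cleaned[:MAX_ERROR_CHARS]


def is_ws_supported_model(model_id: str) -> bool:
    """Prefix match, so eleven_v3_preview and eleven_v3_alpha are rejected too."""
    return not model_id.startswith("eleven_v3")


def decode_audio_frame(b64_audio: str) -> bytes:
    """Decode one base64 audio frame, refusing anything above the cap."""
    # Base64 encodes n bytes as 4 * ceil(n / 3) characters, so oversized
    # payloads can be rejected before allocating the decoded buffer.
    max_encoded = 4 * ((MAX_AUDIO_FRAME_BYTES + 2) // 3)
    if len(b64_audio) > max_encoded:
        raise ValueError("audio frame exceeds size cap")
    audio = base64.b64decode(b64_audio)
    if len(audio) > MAX_AUDIO_FRAME_BYTES:
        raise ValueError("audio frame exceeds size cap")
    return audio
```

Checking the encoded length first keeps a hostile frame from being decoded at all, and the second check catches the boundary case where the encoded length rounds up.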
* docs: Mintlify pages for 0.5.5 - WS TTS, after_llm 3-tier, aggressive flush

Document the four user-visible additions shipped in 0.5.5:

* **ElevenLabsWebSocketTTS** - new provider sub-pages ``docs/{python,typescript}-sdk/providers/elevenlabs-websocket.mdx``. What it is, why use it, ``for_twilio`` / ``for_telnyx`` factories, full constructor params table, ``eleven_v3*`` limitation, per-utterance lifecycle trade-off, ``ElevenLabsTTSError``. Both sub-pages added to the TTS group navigation in ``docs/docs.json``. Existing ``tts.mdx`` providers table updated with the new row plus a callout pointing at the WS variant.
* **``after_llm`` 3-tier API** - new "Pipeline Hooks" section in ``docs/{python,typescript}-sdk/events.mdx``: per-tier table for ``onChunk`` (sync, ~0 ms), ``onSentence`` (async, 50–300 ms), and ``onResponse`` (async, 500 ms – 2 s, blocks streaming). Return semantics (``null`` keep / ``""`` drop), legacy callable migration path with ``PatterDeprecationWarning`` (Python) / one-shot ``console.warn`` (TypeScript), removal in v0.7.0.
* **``aggressive_first_flush`` opt-in** - new row in the ``AgentOptions`` / ``Agent`` parameters tables in ``docs/{python,typescript}-sdk/agents.mdx`` and ``reference.mdx`` with the Italian hard-disable note. Python ``features.mdx`` adds a dedicated section with code example and the 8-guard summary.
* **Chunker improvements** - Python ``features.mdx`` documents the expanded EN abbreviations (``vs.``, ``etc.``, ``Gen.``, ``Sen.``), IT abbreviations (``Sig.``, ``Dott.``, ``S.p.A.``, ``ecc.``), and multilingual terminator support (Hindi / Arabic / Armenian / Ethiopic / Khmer / Burmese / Tibetan). TypeScript SDK has no chunker page so no equivalent change required.

``docs.json`` JSON validated end-to-end. No source / examples / CHANGELOG / NOTICE files touched.
* fix: 5 bugs from 2026-04-29 acceptance run (sdk-ts 0.5.5)

Five fixes uncovered by the 0.5.5 acceptance matrix run, ranging from a HIGH-severity onnxruntime-node version mismatch that blocks Silero VAD on macOS x86_64 to a misleading metric that makes healthy calls look slow.

**Bug #1 (HIGH) - SileroVAD onnxruntime-node 1.24+ API drift**

* ``optionalDependencies.onnxruntime-node`` tightened from ``^1.18.0`` to ``~1.18.0`` - the caret was resolving to 1.24.x where ``listSupportedBackends`` was removed and the prebuilt ``bin/`` layout drifted, so ``import('onnxruntime-node')`` failed on macOS x86_64.
* ``loadOnnxRuntime`` now classifies the underlying error (``missing`` / ``binding`` / ``api-drift`` / ``unknown``) and surfaces a targeted remedy plus the original error chain via ``Error.cause`` - previously the failure mode was hidden behind a single "could not be resolved" string.

**Bug #2 (MEDIUM) - ElevenLabsConvAI agent_id error message**

* The env-var fallback already worked but the error message did not say *where* to get an agent ID from (the dashboard, not the API key). Updated both Python and TypeScript constructors to point users at https://elevenlabs.io/app/conversational-ai and reiterate that the agent ID is per-deployed-agent.
* Python ``ConvAI.__post_init__`` now raises when ``agent_id`` is empty (was silently passing through) - TypeScript already did this. Parity.

**Bug #3 (MEDIUM) - ElevenLabs WS payment_required**

* New typed exception ``ElevenLabsPlanError`` (subclass of ``ElevenLabsTTSError``) raised when the WS endpoint returns ``payment_required``. Free / Starter plans now get a clear "upgrade or use the HTTP class (drop-in API)" message instead of an opaque ``ElevenLabsTTSError: ElevenLabs WS error: payment_required``.
* Detection is case-insensitive and matches both the exact server string and any ``payment_required`` substring.

**Bug #5 (MEDIUM) - barge-in fragile in pipeline mode without VAD**

* On tunnel + speakerphone setups the agent's own TTS leaks into the inbound mic feed, STT transcribes it, and the legacy "always forward + bargeInThresholdMs" heuristic fails to fire the cancel - the agent talks over the user.
* ``serve()`` now logs a one-shot warning at startup when ``agent.engine`` is undefined, ``agent.vad`` is undefined, and ``bargeInThresholdMs > 0``, recommending ``SileroVAD`` or ``bargeInThresholdMs: 0``. Both Python and TypeScript.

**Bug #6 (LOW) - pipeline ``total_ms`` misleading on long utterances**

* ``total_ms`` spans the user's entire utterance (including pauses) because it includes ``stt_ms``, which itself measures STT-stream-open to transcript-finalisation. On a 4 s user turn ``total_ms`` reads ~5.5 s even though the agent's TTFA after end-of-speech is ~1.3-1.5 s - misleading as a p95 / SLO metric.
* New ``LatencyBreakdown.agent_response_ms`` field (Python + TypeScript). Computed as ``endpoint_ms + llm_ttft_ms + tts_ms`` when all three signals are available, ``undefined`` / ``None`` otherwise. This is the user-perceived latency dashboards should track.
* ``total_ms`` kept unchanged for backward compatibility.

**Bug #7 (HIGH) - outbound TwiML races tunnel startup**

* The documented ``void phone.serve(...) → setTimeout → phone.call(...)`` pattern reads ``localConfig.webhookUrl`` while the cloudflared hostname is still resolving, producing ``wss://undefined/...`` in the dial TwiML and a Twilio 11100 call drop on answer.
* New ``phone.tunnelReady`` Promise (TS) / ``phone.tunnel_ready`` ``asyncio.Future`` (Python). Resolves to the public webhook hostname once ``serve()`` knows it (immediately for static webhookUrl, after ``startTunnel`` for ``tunnel: true``). Rejects if ``serve()`` fails before the hostname is known.
* Documented pattern is now ``await phone.tunnelReady`` instead of ``setTimeout(10_000)`` - deterministic, no race.
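The Bug #3 classification can be sketched as below. The two exception classes here are stand-ins defined locally for the example, and ``error_for_server_message`` is a hypothetical helper; only the class names, the subclass relationship, and the case-insensitive substring match come from the commit message.

```python
class ElevenLabsTTSError(Exception):
    """Stand-in for the SDK's base WS TTS error."""


class ElevenLabsPlanError(ElevenLabsTTSError):
    """Stand-in for the typed error raised on payment_required."""


def error_for_server_message(message: str) -> ElevenLabsTTSError:
    """Pick the exception type for a server error string (hypothetical helper)."""
    # Case-insensitive substring match, so both the exact string and longer
    # messages that embed it are classified as a plan problem.
    if "payment_required" in message.lower():
        return ElevenLabsPlanError(
            "ElevenLabs WS streaming needs a paid plan: upgrade or use the HTTP class"
        )
    return ElevenLabsTTSError(f"ElevenLabs WS error: {message}")
```

Because ``ElevenLabsPlanError`` subclasses ``ElevenLabsTTSError``, existing ``except ElevenLabsTTSError`` handlers keep working while new code can catch the narrower type first.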
* Same root-cause fix likely also addresses Bug #4 (intermittent WS upgrade race) which the acceptance run flagged as a related symptom.

Test totals after the fixes: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8 XFAIL / 0 FAIL on the 61-case fixture. No regressions.

* fix(bug-4): outbound WS upgrade race - encoded events + ready signal + diagnostics

Three layered fixes targeting the intermittent "outbound call connects but never receives the WS upgrade" failure (Twilio 11100 on answer) documented in BUGS.md.

**Root cause A - StatusCallbackEvent encoding**

Twilio expects ``StatusCallbackEvent`` as a multi-value parameter (repeated keys), NOT a space-separated single value. The previous ``'initiated ringing answered completed'`` form triggered Twilio notification 21626 ("invalid statusCallbackEvents") on every outbound call, and on some ingestion paths also broke the answer-handler webhook which is exactly the symptom that produced 11100.

* TypeScript: use ``params.append('StatusCallbackEvent', evt)`` four times so URLSearchParams emits repeated query keys.
* Python: pass the canonical twilio-python snake_case key ``status_callback_event`` as a list - twilio-python serialises it as the multi-value form Twilio expects.

**Root cause B - server-not-yet-listening race**

The previous ``phone.tunnelReady`` (TS) / ``phone.tunnel_ready`` (Py) signal resolves as soon as the cloudflared hostname is known, BEFORE the embedded HTTP / WS server has finished initialising. ``phone.call`` placed immediately afterwards races the Twilio Media Streams upgrade and produces a half-ready route → 11100.

New ``phone.ready`` (TS Promise / Py Future) resolves only after:

1. Tunnel hostname known
2. Carrier auto-config complete
3. EmbeddedServer in ``listen`` state (TS) / uvicorn ``started`` flag set (Py)

Outbound pattern is now:

```ts
void phone.serve({ agent, tunnel: true });
await phone.ready; // <-- safe for outbound
await phone.call(...);
```

``tunnelReady`` is kept as a separate signal for integrations that only need the hostname (e.g. webhook registration), with a docstring note pointing at ``ready`` for outbound use.

**Root cause C - opaque diagnostics**

On call drop the user could not tell whether Twilio rejected the dial, the tunnel resolved late, or the WS upgrade failed. The new ``phone.call`` flow logs the Twilio notifications URL on every outbound call ("check here if the call drops with no audio") so self-diagnosis does not require learning the Twilio API.

**Test parity**

Updated ``test_twilio_statuscallback_always_registered`` to read the new ``status_callback_event`` key (with fallback to the legacy ``StatusCallbackEvent`` for forward compat). Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files. No regressions.

* chore(docs): mintignore DEVLOG and superpowers/ to unblock Mintlify deployment

DEVLOG.md and superpowers/specs/2026-04-24-patter-feature-test-notebook-design.md fail Mintlify's MDX parser (filenames begin with digits, which MDX treats as JSX expressions). Skip both paths so the docs site can deploy.

* chore: drop DEVLOG/superpowers, fix CI failures

- Remove docs/DEVLOG.md and docs/superpowers/ (internal planning notes, no value to public docs site). The .mintignore introduced in the previous commit is no longer needed and is removed too.
- sdk-ts/src/client.ts: attach a no-op `.catch` to `_ready` and `_tunnelReady` so callers that never await them don't trigger Node's unhandled-rejection warning when serve() validates inputs synchronously. Awaiters of `phone.ready` / `phone.tunnelReady` still see the rejection.
- sdk-ts/package-lock.json: add trailing newline (end-of-file-fixer).
- examples/notebooks/**.ipynb: nbstripout pass — clear cell outputs and execution counts to match the repo convention enforced by .pre-commit-config.yaml.
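For reference, the multi-value ``StatusCallbackEvent`` encoding from Root cause A of the bug-4 fix can be reproduced with the Python standard library alone. This is a minimal sketch of the wire format, not the SDK's code path (which goes through twilio-python); ``encode_status_callback`` and the ``example.com`` URL are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

STATUS_EVENTS = ["initiated", "ringing", "answered", "completed"]


def encode_status_callback(callback_url: str, events=STATUS_EVENTS) -> str:
    """Form-encode the status callback with one StatusCallbackEvent key per event."""
    pairs = [("StatusCallback", callback_url)]
    # A list of (key, value) pairs lets urlencode repeat the key, which is
    # the multi-value form the commit message says Twilio expects.
    pairs += [("StatusCallbackEvent", event) for event in events]
    return urlencode(pairs)


# The legacy single-value form, shown only for comparison.
legacy_body = urlencode({"StatusCallbackEvent": " ".join(STATUS_EVENTS)})
fixed_body = encode_status_callback("https://example.com/webhooks/twilio/status")
```

Here ``legacy_body`` carries a single key whose value joins the events with ``+`` (encoded spaces), while ``fixed_body`` repeats the key four times.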
Bumps typescript from 5.9.3 to 6.0.2.
Release notes
Sourced from typescript's releases.
Commits
- 607a22a Bump version to 6.0.2 and LKG
- 9e72ab7 🤖 Pick PR #63239 (Fix missing lib files in reused pro...) into release-6.0 (#...
- 35ff23d 🤖 Pick PR #63163 (Port anyFunctionType subtype fix an...) into release-6.0 (#...
- e175b69 Bump version to 6.0.1-rc and LKG
- af4caac Update LKG
- 8efd7e8 Merge remote-tracking branch 'origin/main' into release-6.0
- 206ed1a Deprecate assert in import() (#63172)
- e688ac8 Update dependencies (#63156)
- 29b300d Bump the github-actions group across 1 directory with 2 updates (#63205)
- 0c2c7a3 DOM update (#63183)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)