build(deps): bump actions/setup-python from 5 to 6 #2
Merged
nicolotognoni merged 1 commit on Apr 10, 2026
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
FrancescoRosciano pushed a commit that referenced this pull request Apr 12, 2026
Resolves both moderate-severity Dependabot alerts (#1 esbuild dev server SSRF, #2 vite path traversal in .map handling). Both were transitive dev dependencies pinned by vitest 2.x via vite 5.x. vitest 4.1.4 pulls vite 8.0.8 + esbuild 0.27.4 — all past patched versions. npm audit reports 0 vulnerabilities. All 311 TS tests pass.
6 tasks
nicolotognoni added a commit that referenced this pull request Apr 21, 2026
…#66)

* fix(deps): pin websockets>=14 and add python-multipart

Fixes BUG #7 and #9 from acceptance suite.

- websockets: pin >=14,<16. The 'additional_headers=' kwarg used by the OpenAI Realtime, Deepgram STT and ElevenLabs ConvAI adapters is only supported on the new asyncio client that became the default in 14.0. Under 13.x the call failed with 'got an unexpected keyword argument additional_headers', blocking every streaming provider.
- python-multipart: add to the base install. Starlette >= 0.45 raises on 'await request.form()' without python-multipart installed, so every Twilio webhook returned 422 and the call was silently dropped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(server): repair Twilio & Telnyx webhook stack

Fixes BUG #6, #8, #16 from acceptance suite.

- #8 Request/Response import lifted to the top of server.py. With ``from __future__ import annotations`` in place, FastAPI's ``get_type_hints(handler)`` resolved the 'Request' annotation against module globals where only WebSocket was imported. The ForwardRef stayed unresolved, FastAPI classified the parameter as a query-string field and every Twilio/Telnyx webhook POST returned HTTP 422 before the handler body could run. Local mode was fundamentally broken on 0.4.3.
- #6 dashboard tracking of failed outbound calls: new route ``POST /webhooks/twilio/status`` consumes Twilio statusCallback events (initiated/ringing/answered/completed/no-answer/busy/failed) and feeds them into MetricsStore.update_call_status. Operators now see every dialled attempt in the dashboard, including ones that never reach media.
- #16 Telnyx Call Control: ``/webhooks/telnyx/voice`` now POSTs ``actions/answer`` on call.initiated and ``actions/streaming_start`` on call.answered against the REST API and returns empty HTTP 200. Previously the route returned a JSON ``{commands: [...]}`` body that Telnyx silently discards, so the call rang forever.
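The name-resolution step behind #8 can be reproduced with the standard library alone. The following is a minimal sketch, not the code from server.py: ``handler``, ``FakeRequest`` and ``resolve_hints`` are hypothetical names introduced for illustration. It shows that a string annotation only resolves when its name is present in the globals passed to ``typing.get_type_hints``. The stdlib raises ``NameError`` in that case, whereas the commit describes FastAPI as leaving the ForwardRef unresolved.

```python
from typing import Any, Callable, Dict, Optional, get_type_hints


class FakeRequest:
    """Stand-in for the framework's Request class (hypothetical)."""


def handler(request: "Request") -> None:
    """String annotation, as produced by `from __future__ import annotations`."""


def resolve_hints(
    fn: Callable[..., Any], extra: Optional[Dict[str, Any]] = None
) -> Optional[Dict[str, Any]]:
    """Resolve annotations against the function's globals plus `extra`.

    Returns None when a forward reference cannot be resolved.
    """
    namespace = {**fn.__globals__, **(extra or {})}
    try:
        return get_type_hints(fn, globalns=namespace)
    except NameError:
        return None


# "Request" is not a module-level name here, so resolution fails.
missing = resolve_hints(handler)
# Once the name is available at module level (simulated via `extra`), it resolves.
found = resolve_hints(handler, {"Request": FakeRequest})
```

Passing the name through ``extra`` simulates what the module-level import achieves: the annotation string now matches a key in the namespace used for resolution.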
Twilio voice route also falls back to the ``Caller`` / ``Called`` form fields when ``From`` / ``To`` are empty (see BUG #6 notes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telnyx): WS event shape, frame format, track filter, audio sender

Fixes BUG #17, #18, #19 from acceptance suite.

- #17 Media-stream WebSocket events use ``event`` (start / media / stop / dtmf / error / connected), not the Call Control REST notification ``event_type``. Audio payload lives in ``data.media.payload`` (base64), caller/callee live in ``data.start.{from,to}``. Previously the bridge matched ``event_type == "stream_started"`` and looked for audio in ``payload.audio.chunk``; no media chunk was ever decoded, so the agent never heard the caller.
- #18 Outbound wire format corrected to ``{"event":"media","media":{"payload":b64}}`` and ``{"event":"clear"}``. The legacy ``event_type``/``payload.audio.chunk`` shape was silently dropped by Telnyx, so the caller heard silence.
- #19 When ``stream_track=both_tracks`` Telnyx emits media for both the caller leg and the agent's own outbound leg; forwarding the outbound echo broke OpenAI Realtime turn detection ("speech_started" never fired). The bridge now drops frames with ``media.track != "inbound"`` before forwarding.

OpenAI Realtime handler on Telnyx is now configured with ``audio_format="g711_ulaw"`` to match the PCMU 8 kHz bidirectional stream. The TelnyxAudioSender transcodes PCM16 16 kHz → mulaw 8 kHz for pipeline / ConvAI providers (PCM16 TTS output) and passes mulaw bytes through when OpenAI Realtime provides them directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(twilio): OpenAI Realtime audio format + pass-through audio sender

Fixes BUG #10 from acceptance suite.

OpenAI Realtime emits PCM16 at 24 kHz natively.
The Twilio handler previously left ``audio_format`` at the pcm16 default and fed the bytes into TwilioAudioSender, which unconditionally ran ``resample_16k_to_8k(pcm) → pcm16_to_mulaw`` assuming 16 kHz input. 24 kHz bytes run through a 16→8 kHz resampler come out at ~66% of the correct rate, so the caller heard a deep, slurred voice.

Fix: on the Twilio path construct ``OpenAIRealtimeStreamHandler(..., audio_format="g711_ulaw")`` so OpenAI emits Twilio-native mulaw 8 kHz directly. Pair it with ``TwilioAudioSender(..., input_is_mulaw_8k=True)`` which skips the resample+mulaw encode and forwards the bytes as-is. Pipeline and ConvAI still produce PCM16 @ 16 kHz and go through the default transcoding path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pipeline): STT path + hooks + barge-in + dedup + hallucination filter

Fixes BUG #12, #15, #20, #22 from acceptance suite.

- #12 Pipeline on Twilio: the bridge converts mulaw 8 kHz → PCM16 16 kHz before STT. The STT adapter used to be built with ``for_twilio=True`` (mulaw 8 kHz), so Deepgram decoded the already-PCM bytes as mulaw and produced garbage transcripts. The pipeline now always configures linear16 @ 16 kHz.
- #15 ``PipelineHooks.before_send_to_stt`` was declared but never invoked. ``PipelineStreamHandler.on_audio_received`` now runs the hook on every inbound chunk and drops the chunk when it returns ``None``.
- #20 Pipeline barge-in: ``on_audio_received`` used to skip STT when ``_is_speaking=True``, blocking any barge-in detection. It now keeps forwarding caller audio to STT during TTS (unless ``agent.barge_in_threshold_ms == 0``), and ``_stt_loop`` flips ``_is_speaking=False`` + ``send_clear`` on any Deepgram transcript with text observed while speaking. Effective latency floor is ~800 ms (Deepgram interim), so noisy / short TTS sentences may not actually be interrupted; full sub-second barge-in requires a server-side VAD (Silero, already supported via ``agent.vad=``).
- #22 Dedup + throttle + hallucination filter. Low-quality STT (Whisper on mulaw 8 kHz) emits several nearly-identical final transcripts in 1–2 s ("you", "you", "you") and hallucinates short fillers from silence / TTS echo. Each used to kick off a new LLM+TTS turn, and consecutive turns overlapped on the caller's line. Fix in ``_stt_loop``: dedup identical finals within 2 s, drop any final within 500 ms of the last committed turn, drop a curated blacklist of fillers (``you``, ``thank you``, ``yeah``, ``uh``, ``.``…).

Also adds the 8 kHz output path used by the Telnyx handler via a shared linear16 STT factory in ``handlers/common.py``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(providers): voice name resolver, Deepgram knobs, TTS streaming resample

Fixes BUG #11, #13, #23 from acceptance suite.

- #11 ElevenLabs voice-name resolver. ``Patter.elevenlabs(voice="rachel")`` (the quickstart default) used to pass "rachel" verbatim into the /text-to-speech/{voice_id}/stream URL, which 404s because the API only accepts the opaque 20-char voice IDs. The new ``resolve_voice_id`` helper maps ~45 common display names (rachel, adam, matilda, alloy, …) to their voice IDs and returns unknown strings unchanged so custom voices keep working. Removes the ad-hoc "alloy" substitution in stream_handler.
- #13 DeepgramSTT exposes ``endpointing_ms`` / ``utterance_end_ms`` / ``smart_format`` / ``interim_results`` / ``vad_events`` kwargs and the ``Patter.deepgram(...)`` factory forwards them via ``STTConfig.options``. Defaults tuned for telephony (endpointing_ms=150, utterance_end_ms=1000). The transcript gate is loosened to ``is_final OR speech_final`` so we don't wait up to utterance_end_ms on every turn. Pipeline turn latency on Twilio drops from ~4 s to ~2.2 s.
- #23 OpenAI TTS streaming resample. ``response_format=pcm`` returns 24 kHz PCM16 chunks that must be downsampled to 16 kHz.
The old implementation did the 3:2 downsample chunk-by-chunk without preserving filter state, so cross-chunk alignment drifted and the caller heard pops / dropped audio. Now uses ``audioop.ratecv`` with a persistent ``state`` and stashes odd trailing bytes between calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scheduler,fallback): per-loop schedulers + async close + cancel probes

Fixes BUG #2, #3, #5 from acceptance suite.

- #3 Scheduler singleton dies across event loops. The old ``_scheduler_singleton`` bound to the first loop it saw; pytest-asyncio closed that loop at the end of every test and the next scheduled callback crashed with ``Event loop is closed``. Replaced by ``_schedulers_by_loop``, a dict keyed on ``id(asyncio.get_event_loop())`` that drops stale entries when the owning loop has been closed. Adds ``reset_for_tests()`` to tear down every cached scheduler; the public ``shutdown()`` is now an alias for it.
- #2 ``FallbackLLMProvider.complete_stream``: convenience wrapper that flattens ``{"type": "text"}`` chunks so callers don't have to switch on chunk type. Mirrors the TS SDK's ``completeStream``.
- #5 ``FallbackLLMProvider`` recovery task leak. ``_probe`` tasks created by ``_start_recovery`` were never awaited, and pytest-asyncio tears the loop down before they finish. Adds ``aclose()`` and async context manager support (``__aenter__``/``__aexit__``) so callers can ``async with FallbackLLMProvider(...)`` and have the probes cancelled + awaited on exit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tools): @tool adapter unpacks kwargs into user function

Fixes BUG #21 from acceptance suite.

``@tool`` exposed the raw user function as ``handler`` but ``services/tool_executor._execute_handler`` always calls ``handler(arguments_dict, call_context_dict)``. Every typed tool, e.g.
``async def check_order(order_id: str)``, crashed at runtime with "takes 1 positional argument but 2 were given" and OpenAI Realtime received a fallback error JSON instead of the tool's result.

The decorator now wraps the user function in an async adapter whose signature matches the executor's contract ``(arguments, call_context)``. The adapter inspects the original signature: if it already takes ``(arguments, call_context)`` positionally it passes through unchanged, otherwise it filters ``arguments`` to the user function's declared parameter names and calls ``fn(**args)``. The original function is still reachable via ``handler.__wrapped__`` for introspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): track failed & no-answer outbound calls

Fixes BUG #6 from acceptance suite.

The embedded dashboard used to show only calls that made it to the media channel. An outbound dial that rang out (``status=no-answer``, ``busy``, ``failed``) never produced a webhook hit, so the row never appeared in the UI even though Twilio billed for the attempt.

Changes:

- ``MetricsStore.record_call_initiated({call_id, caller, callee, …})`` pre-registers the call when ``Patter.call()`` returns, so the row shows up the moment the dial is dispatched.
- ``MetricsStore.update_call_status(call_id, status, **extra)`` promotes the record through the lifecycle (ringing → in-progress → completed / no-answer / busy / failed / canceled). Terminal states move the row from active to the completed list so the UI timer freezes. Fed by the new ``/webhooks/twilio/status`` route.
- ``MetricsStoreProtocol`` extended with the two new methods.
- ``call_end`` now synthesises a minimal metrics shim when the call ended without a full CallMetrics payload, so the UI can still render duration / status.
- Dashboard UI: new ``STATUS`` column, filter pills (all / completed / failed), colour-coded badges (green / yellow / red / orange), red row tint for failed statuses, and SSE listeners for the new ``call_initiated`` and ``call_status`` events. The duration timer respects ``data-ended`` so rows that already received call_end stop ticking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): ring_timeout + agent.hooks/vad/audio_filter forwarding + call pre-register

Fixes BUG #14 + IMP2 + completes BUG #6 from acceptance suite.

- #14 ``Patter.agent(...)`` used to drop ``hooks``, ``text_transforms``, ``vad``, ``audio_filter``, ``background_audio`` and ``barge_in_threshold_ms`` even though the ``Agent`` dataclass accepted them. The factory now forwards all fields.
- IMP2 ``ring_timeout: int | None`` kwarg on ``Patter.call(...)``. Forwarded to Twilio as ``Timeout=`` and to Telnyx as ``timeout_secs`` (added to ``TelnyxAdapter.initiate_call``). Italian mobile carriers silence-drop the default ~28 s ring on US→IT calls; the quickstart now works with ``ring_timeout=60``.
- #6 ``Patter.call()`` pre-registers the dialled call in the MetricsStore via ``record_call_initiated(...)`` before returning, so the dashboard shows the attempt even when the callee never picks up. The Twilio branch also passes ``StatusCallbackEvent="initiated ringing answered completed"`` so we receive every state transition.

Also exposes the new Deepgram knobs on the ``Patter.deepgram(...)`` factory (``model``, ``endpointing_ms``, ``utterance_end_ms``, ``smart_format``, ``interim_results``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): models barge_in_threshold_ms + STT/TTS options, top-level mix_pcm, docstring

Rolls up the smaller API additions: BUG #1, #04g, extras from #13/#15.

- ``Agent.barge_in_threshold_ms`` (default 300): hangover window before treating caller audio as barge-in.
Used by PipelineStreamHandler and mirrored on TS ``AgentOptions.bargeInThresholdMs``.
- ``STTConfig.options`` / ``TTSConfig.options``: provider-specific knobs bag (e.g. Deepgram endpointing) that ``common._create_stt_from_config`` unpacks when building the adapter. Keeps older ``STTConfig`` callers forward-compatible.
- Top-level ``patter.mix_pcm(agent, bg, ratio)``: parity alias for the TS ``mixPcm(...)`` standalone helper (BUG #04g). Thin wrapper over the existing ``PcmMixer`` class with an explicit ratio.
- ``patter/__init__.py`` docstring enumerates the installable extras (scheduling, anthropic, groq, cerebras, google, …) so ``pip install getpatter`` users discover them without hitting a ``RuntimeError: Scheduling requires the 'apscheduler' package`` at call time (BUG #1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: align Python tests with BUG #12/#16/#17/#18/#19/#21 fixes

- ``test_local_mode``: pipeline Twilio bridge test now patches ``DeepgramSTT`` directly instead of ``DeepgramSTT.for_twilio``; after BUG #12 the pipeline path uses the default linear16 16 kHz adapter on both telephony providers.
- ``test_new_features``: ``machine_detection=False`` no longer asserts an empty extra_params dict; BUG #6 now always wires a ``StatusCallback`` so the dashboard sees failed attempts. The test keeps its original intent (AMD-specific params absent) and additionally checks the status callback is set.
- ``test_server_unit::TestTelnyxVoiceRoute``: rewritten to assert the REST ``actions/answer`` POST after BUG #16; the route no longer returns a JSON commands body.
- ``test_telnyx_bridge_unit``: helper messages updated to the ``{event: start|media|stop}`` wire shape from BUG #17; the OpenAI Realtime audio_format assertion now expects ``g711_ulaw`` (from #18).
- ``test_telnyx_handler_unit``: TelnyxAudioSender test uses ``input_is_mulaw_8k=True`` so the round-trip byte assertion still holds with the new PCM16→mulaw transcode path (#18).
Wire format asserts ``event == "media"`` / ``event == "clear"``.
- ``test_tool_decorator``: invokes handlers with the new adapter signature ``(arguments_dict, call_context_dict)`` (#21), including a sync-wrapped handler awaited through the adapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ts/api): Python parity: auto-detect local, static factories, ring_timeout

Brings TS parity with Python on BUG #4 parity items + #14 agent fields + IMP2 ring_timeout.

- Auto-detect local mode: ``new Patter({twilioSid, twilioToken, …})`` without explicit ``mode: 'local'`` is now treated as local when apiKey is missing (mirrors Python).
- Static provider factories: ``Patter.deepgram(...)``, ``Patter.elevenlabs(...)``, ``Patter.whisper(...)``, ``Patter.openaiTts(...)``, ``Patter.cartesia(...)``, ``Patter.rime(...)``, ``Patter.lmnt(...)``.
- ``STTConfig.toDict`` / ``TTSConfig.toDict`` are now optional; plain object literals ``{provider, apiKey, language}`` are accepted everywhere (fallback serialisation is handled via ``sttConfigToDict`` / ``ttsConfigToDict`` helpers).
- ``STTConfig`` gets an ``options`` bag (parity with Python BUG #13).
- ``LocalCallOptions.ringTimeout`` forwarded to Twilio as ``Timeout`` and Telnyx as ``timeout_secs``, plus ``StatusCallbackEvent`` wired so the dashboard sees ringing/no-answer/busy/failed transitions (BUG #6).
- ``AgentOptions.bargeInThresholdMs`` (parity with #20 on Python).
- ``LocalOptions.deepgramKey`` / ``elevenlabsKey`` added as provider-level defaults (parity with Python Patter() kwargs).
- ``Patter.call()`` Twilio branch pre-registers the dialled call with ``metricsStore.recordCallInitiated`` so no-answer / busy / failed attempts still show up in the dashboard.
- ``providers.deepgram(...)`` factory exposes the Deepgram knobs (model / endpointing_ms / utterance_end_ms / smart_format / interim_results) and carries them in ``STTConfig.options``.
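The ring-timeout forwarding listed above amounts to a per-provider parameter mapping. Below is a rough sketch in Python (the language of the SDK this commit mirrors), assuming a hypothetical ``ring_params`` helper that is not part of either SDK. Only the parameter names ``Timeout``, ``timeout_secs`` and ``StatusCallbackEvent``, and the event string, come from the commit messages.

```python
from typing import Dict, Optional

# Value quoted in the commit message for the Twilio branch.
STATUS_EVENTS = "initiated ringing answered completed"


def ring_params(provider: str, ring_timeout: Optional[int]) -> Dict[str, object]:
    """Build provider-specific dial parameters (hypothetical helper)."""
    params: Dict[str, object] = {}
    if provider == "twilio":
        # Status events are always requested so failed dials stay visible.
        params["StatusCallbackEvent"] = STATUS_EVENTS
        if ring_timeout is not None:
            params["Timeout"] = ring_timeout
    elif provider == "telnyx":
        if ring_timeout is not None:
            params["timeout_secs"] = ring_timeout
    else:
        raise ValueError(f"unknown provider: {provider}")
    return params
```

Leaving the timeout out when it is ``None`` lets each provider apply its own default ring duration.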
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts/providers): voice resolver, Deepgram knobs, TTS streaming resample

TS parity port of Python BUG #11, #13, #23.

- ElevenLabs: ``resolveVoiceId()`` maps display names (rachel, adam, matilda, alloy, …) to the opaque 20-char voice IDs accepted by the /text-to-speech/{voice_id}/stream endpoint. Map mirrors the Python SDK byte-for-byte.
- DeepgramSTT: constructor overloaded to accept ``DeepgramSTTOptions`` (endpointingMs / utteranceEndMs / smartFormat / interimResults / vadEvents) alongside the legacy positional form. Transcript gate loosened to ``is_final OR speech_final`` so short utterances don't wait for Deepgram's utterance_end commit.
- OpenAITTS: streaming 24 kHz → 16 kHz resample now carries state (``carryByte`` + ``leftover`` samples) between chunks so cross-chunk alignment doesn't drift. The legacy ``resample24kTo16k`` static is kept as a thin wrapper around the streaming path for the existing unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts): Telnyx stack, pipeline hooks/barge-in/dedup, dashboard status, scheduler sync

TS parity port of the Python fixes for BUG #2/#3/#6/#12/#15/#16/#17/#18/#19/#20/#22.

- ``stream-handler.ts``: ``handleAudio`` now runs the ``before_send_to_stt`` hook (#15), transcodes Twilio mulaw 8 kHz → PCM16 16 kHz unconditionally on the pipeline path (#12), and keeps forwarding caller audio during TTS so barge-in can trigger (#20). ``processTranscript`` implements the dedup + 500 ms throttle + hallucination-word blacklist from #22 and flips ``isSpeaking`` + ``sendClear`` on any transcript with text while the agent is speaking (#20).
- ``server.ts``: ``TelnyxBridge.sendAudio`` / ``sendClear`` use the correct ``{event:"media",media:{payload:b64}}`` wire format (#18); the Telnyx WS handler matches ``data.event`` (start / media / stop / dtmf / error / connected) and drops frames with ``media.track !== "inbound"`` before forwarding (#17, #19); the ``/webhooks/telnyx/voice`` route POSTs ``actions/answer`` and ``actions/streaming_start`` via the Call Control REST API and returns empty HTTP 200 (#16). ``TwilioBridge.createStt`` picks linear16 16 kHz when ``provider === 'pipeline'`` so Deepgram doesn't decode already-PCM bytes as mulaw (#12). A new ``/webhooks/twilio/status`` handler consumes Twilio status callbacks and updates the dashboard (#6).
- ``scheduler.ts``: ``scheduleCron`` returns a ``ScheduleHandle`` synchronously (lazy node-cron import happens in the background), for parity with Python #4. ``scheduleInterval`` accepts ``{intervalMs}`` or ``{seconds}`` in addition to the legacy positional ms, matching Python ``schedule_interval(seconds=...)``.
- ``fallback-provider.ts``: ``completeStream()`` text-only convenience generator (#2), ``aclose()`` + ``Symbol.asyncDispose`` so ``await using fallback = ...`` mirrors Python's ``async with FallbackLLMProvider(...)`` (#5).
- ``dashboard/store.ts``: ``recordCallInitiated`` pre-registers outbound attempts, ``updateCallStatus`` promotes rows through ringing / no-answer / busy / failed and moves terminal states to the completed list (#6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ts): align with 0.4.4 wire-format & provider API changes

- ``providers.test.ts``: toDict now surfaces ``options`` when set, knobs forwarding verified.
- ``types.test.ts``: toDict optional chain covered.
- ``openai-tts.test.ts``: 1-byte input no longer returns the byte verbatim; the streaming resampler stashes it as ``carryByte`` and the stateless wrapper flushes only complete samples, so the test now asserts an empty buffer.
- ``integration/twilio-pipeline.test.ts`` + ``integration/telnyx-pipeline.test.ts``: ``handleAudio`` is now async; tests await it. Telnyx fixture feeds mulaw 8 kHz and asserts the transcoded PCM16 16 kHz lands on the STT mock (BUG #12 + #19).
- ``unit/server-routes.test.ts``: Telnyx webhook tests assert the REST ``actions/answer`` + ``actions/streaming_start`` POSTs and the empty HTTP 200 response (BUG #16).
- ``package-lock.json``: refreshed for the sdk-ts worktree so the ``0.4.3`` → ``0.4.3-worktree`` alignment is consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(unit): regression coverage for BUG #6/#22/#23 + ring_timeout (IMP2)

Three new unit test files lock in fixes that previously lived in the acceptance suite as live-call checks:

test_pipeline_dedup.py (13 tests)
- Hallucination blacklist: "you", "thank you", ".", case/punctuation variants, empty-after-strip all drop silently.
- 2-second duplicate window with time.time monkeypatched so parity with the live Whisper feedback loop is deterministic.
- 500 ms back-to-back throttle covering legitimate vs spurious second turns.
- Interim / empty finals must not fire on_transcript.

test_openai_tts_resample.py (7 tests)
- Cross-chunk ratecv state: multi-chunk stream output matches a single-shot resample byte-for-byte.
- Odd-byte boundary: a chunk ending on a dangling byte must not drop the sample.
- Empty / single-byte / tiny chunks must not crash.
- Response is always aclosed on both successful and early-exit paths.

test_twilio_status_and_ring_timeout.py (13 tests)
- /webhooks/twilio/status routes to update_call_status with parsed duration, and survives missing SID, bad duration, and the dashboard-disabled path.
- Twilio signature enforcement on the status endpoint.
- Twilio ring_timeout -> Timeout REST param, Telnyx -> timeout_secs.
- Twilio StatusCallback / StatusCallbackEvent are always registered on outbound calls so BUG #6 cannot regress.
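The blacklist, duplicate-window and throttle rules that test_pipeline_dedup.py exercises can be pictured as a small gate in front of the LLM turn. This is a minimal sketch under stated assumptions, not the code in ``_stt_loop``: the class name ``TranscriptGate``, the ``accept`` method and the filler subset are hypothetical, and both windows are measured from the last committed turn. Only the 2 s and 500 ms thresholds and the sample fillers come from the commit messages.

```python
import string
from typing import Optional

# Illustrative subset of the filler blacklist; "" covers empty-after-strip.
FILLERS = {"you", "thank you", "yeah", "uh", ""}


class TranscriptGate:
    """Decides whether a final transcript should start a new turn (hypothetical)."""

    def __init__(self, dedup_window_s: float = 2.0, throttle_s: float = 0.5) -> None:
        self.dedup_window_s = dedup_window_s
        self.throttle_s = throttle_s
        self._last_text: Optional[str] = None
        self._last_time: Optional[float] = None

    def accept(self, text: str, now: float) -> bool:
        # Normalise case, whitespace and surrounding punctuation.
        norm = text.lower().strip(string.punctuation + string.whitespace)
        if norm in FILLERS:
            return False  # hallucinated filler
        if self._last_time is not None:
            elapsed = now - self._last_time
            if elapsed < self.throttle_s:
                return False  # too soon after the last committed turn
            if norm == self._last_text and elapsed < self.dedup_window_s:
                return False  # duplicate final inside the window
        self._last_text = norm
        self._last_time = now
        return True
```

Passing ``now`` explicitly, rather than reading the clock inside the gate, keeps such a filter deterministic under test without monkeypatching.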
Full unit suite: 728 passed, 2 skipped.

* docs+ci: latency/provider caveats + audit workflow

README
- Pipeline turn-latency floor documented (~2.0–2.8 s) with per-stage breakdown so users know to switch to `provider="openai_realtime"` for sub-second UX.
- ElevenLabs free-tier library-voice restriction (402) with pointer to `ELEVENLABS_VOICE_ID`.
- Telnyx outbound D38 Outbound Profile requirement.
- Google Gemini free-tier quota=0 caveat.
- Whisper hallucination filter documented.
- `ring_timeout` + status callback description added to call().

.github/workflows/audit.yml (new)
- pip-audit on sdk-py runtime deps.
- npm audit on sdk-ts production deps.
- bandit static analysis with SARIF upload to GitHub Security.
- Runs on dep-manifest changes, weekly schedule, and manual dispatch.
- Findings are advisory-only to keep the pipeline from flaking on upstream CVE churn (telephony stack pulls many C-wrapped libs).

Baseline audit run: npm=0, bandit medium+/high-confidence=0, pip-audit=2 (pytest dev-only + transformers optional-extra only).

* docs(readme): remove local-measured latency numbers from Voice Modes

The millisecond ranges previously listed for each provider came from a single local benchmark run and are neither representative nor a target. Keep the modes table qualitative and replace the per-stage breakdown with a short note that latency is inherited from the chosen providers, with no hard numbers for callers to anchor on.

* test(unit): bug coverage gaps: BUG #15/#19/#20

Three new unit test modules fill the remaining coverage gap for the bugs fixed on this branch:

test_pipeline_bargein.py (7 tests), BUG #20
- Interim transcript during TTS triggers send_clear + is_speaking=False.
- record_turn_interrupted is fired on the metrics accumulator.
- send_clear throwing does not crash the STT loop (fail-open).
- No barge-in when the agent is idle or the transcript has no text.
- Final transcripts also trigger the barge-in branch before the downstream LLM turn runs.

test_before_send_to_stt_hook.py (9 tests), BUG #15
- Sync / async hook returning None drops the chunk (zero STT sends).
- Returning modified bytes forwards the new buffer verbatim.
- Hook receives the decoded PCM, not the raw mulaw payload.
- Raising hooks fail-open: original audio still reaches STT.
- Missing hook / hooks instance with before_send_to_stt=None are both bypass paths that must still forward audio.

test_telnyx_track_filter.py (5 tests), BUG #19
- track=inbound forwards, track=outbound drops.
- Missing `track` field defaults to inbound (legacy Telnyx payloads).
- Mixed stream: only inbound frames reach the handler, in order.
- Unknown track values are skipped defensively.

Full unit suite: 749 passed, 2 skipped (+21 from this commit).

* feat(sdk-py): add cartesia/rime/lmnt static factories + vad_events to deepgram

Brings Python SDK to parity with sdk-ts:

- Adds Patter.cartesia / Patter.rime / Patter.lmnt static methods so local-mode users can configure these TTS providers the same way they do in TypeScript.
- Adds the missing vad_events keyword to Patter.deepgram and the patter.providers.deepgram factory; the DeepgramSTT ctor already accepted it, but the public config helper silently dropped the flag.

* chore: bump to 0.4.4

Regression suites re-run after the bump:
- sdk-py: 749 passed, 2 skipped
- sdk-ts: 932 passed (57 test files, including soak)

* fix(ci): integration tests on 0.4.4 wire format + misc hygiene

Addresses the five failing CI checks on PR #66.

Telnyx integration tests (test_telnyx_{convai,pipeline,realtime}.py)
- ``_telnyx_stream_started`` / ``_telnyx_media_event`` / ``_telnyx_stream_stopped`` helpers migrated from the pre-0.4.4 ``{event_type, payload.audio.chunk}`` shape to the real Telnyx media-stream wire format ``{event, start|media.payload}`` (BUG #17/#18).
Without this the bridge silently drops every test frame and 11 integration tests fail with "handler called 0 times".
- ``test_audio_format_pcm16`` renamed to ``test_audio_format_g711_ulaw`` and the assertion flipped: Telnyx is PCMU 8 kHz bidirectional (BUG #19), Realtime runs on ``g711_ulaw`` so both legs stay pass-through.

sdk-ts/src/scheduler.ts
- Removed the trailing blank line that broke the pre-commit ``end-of-file-fixer`` hook.

.github/workflows/audit.yml
- Bandit stock CLI doesn't support ``-f sarif``; install ``bandit-sarif-formatter`` alongside bandit, and guard the upload-sarif step with ``hashFiles`` so future formatter breakage doesn't fail the job.

Local verification: 802 passed, 4 skipped (sdk-py unit + integration).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
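The media-stream wire shape that the migrated test helpers target (BUG #17/#18), together with the inbound-track filter from BUG #19, can be sketched as follows. This is an illustration only: ``encode_media``, ``encode_clear`` and ``decode_inbound`` are hypothetical names, not functions from the SDK. The frame keys, and the rule that a missing ``track`` counts as inbound, are taken from the commit messages above.

```python
import base64
import json
from typing import Optional


def encode_media(audio: bytes) -> str:
    """Outbound audio frame: {"event":"media","media":{"payload":b64}}."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"event": "media", "media": {"payload": payload}})


def encode_clear() -> str:
    """Outbound frame that flushes queued audio: {"event":"clear"}."""
    return json.dumps({"event": "clear"})


def decode_inbound(raw: str) -> Optional[bytes]:
    """Return caller audio from a media frame, or None if the frame is skipped."""
    message = json.loads(raw)
    if message.get("event") != "media":
        return None  # start / stop / dtmf / error / connected carry no audio
    media = message.get("media") or {}
    # A missing track is treated as inbound; anything else is dropped.
    if media.get("track", "inbound") != "inbound":
        return None
    return base64.b64decode(media.get("payload", ""))
```

Dropping every track value other than ``inbound`` also covers unknown values, which matches the defensive behaviour the track-filter tests describe.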
nicolotognoni added a commit that referenced this pull request Apr 21, 2026
The old implementation did the 3:2 downsample chunk-by-chunk without preserving filter state, so cross-chunk alignment drifted and the caller heard pops / dropped audio. Now uses ``audioop.ratecv`` with a persistent ``state`` and stashes odd trailing bytes between calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scheduler,fallback): per-loop schedulers + async close + cancel probes Fixes BUG #2, #3, #5 from acceptance suite. - #3 Scheduler singleton dies across event loops. The old ``_scheduler_singleton`` bound to the first loop it saw; pytest-asyncio closed that loop at the end of every test and the next scheduled callback crashed with ``Event loop is closed``. Replaced by ``_schedulers_by_loop`` — a dict keyed on ``id(asyncio.get_event_loop())`` that drops stale entries when the owning loop has been closed. Adds ``reset_for_tests()`` to tear down every cached scheduler; the public ``shutdown()`` is now an alias for it. - #2 ``FallbackLLMProvider.complete_stream`` — convenience wrapper that flattens ``{"type": "text"}`` chunks so callers don't have to switch on chunk type. Mirrors the TS SDK's ``completeStream``. - #5 ``FallbackLLMProvider`` recovery task leak. ``_probe`` tasks created by ``_start_recovery`` were never awaited, and pytest-asyncio tears the loop down before they finish. Adds ``aclose()`` and async context manager support (``__aenter__``/``__aexit__``) so callers can ``async with FallbackLLMProvider(...)`` and have the probes cancelled + awaited on exit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tools): @tool adapter unpacks kwargs into user function Fixes BUG #21 from acceptance suite. ``@tool`` exposed the raw user function as ``handler`` but ``services/tool_executor._execute_handler`` always calls ``handler(arguments_dict, call_context_dict)``. Every typed tool — e.g. 
``async def check_order(order_id: str)`` — crashed at runtime with "takes 1 positional argument but 2 were given" and OpenAI Realtime received a fallback error JSON instead of the tool's result. The decorator now wraps the user function in an async adapter whose signature matches the executor's contract ``(arguments, call_context)``. The adapter inspects the original signature: if it already takes ``(arguments, call_context)`` positionally it passes through unchanged, otherwise it filters ``arguments`` to the user function's declared parameter names and calls ``fn(**args)``. The original function is still reachable via ``handler.__wrapped__`` for introspection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dashboard): track failed & no-answer outbound calls Fixes BUG #6 from acceptance suite. The embedded dashboard used to show only calls that made it to the media channel. An outbound dial that rang out (``status=no-answer``, ``busy``, ``failed``) never produced a webhook hit, so the row never appeared in the UI even though Twilio billed for the attempt. Changes: - ``MetricsStore.record_call_initiated({call_id, caller, callee, …})`` pre-registers the call when ``Patter.call()`` returns, so the row shows up the moment the dial is dispatched. - ``MetricsStore.update_call_status(call_id, status, **extra)`` promotes the record through the lifecycle (ringing → in-progress → completed / no-answer / busy / failed / canceled). Terminal states move the row from active to the completed list so the UI timer freezes. Fed by the new ``/webhooks/twilio/status`` route. - ``MetricsStoreProtocol`` extended with the two new methods. - ``call_end`` now synthesises a minimal metrics shim when the call ended without a full CallMetrics payload, so the UI can still render duration / status. 
- Dashboard UI: new ``STATUS`` column, filter pills (all / completed / failed), colour-coded badges (green / yellow / red / orange), red row tint for failed statuses, and SSE listeners for the new ``call_initiated`` and ``call_status`` events. The duration timer respects ``data-ended`` so rows that already received call_end stop ticking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): ring_timeout + agent.hooks/vad/audio_filter forwarding + call pre-register Fixes BUG #14 + IMP2 + completes BUG #6 from acceptance suite. - #14 ``Patter.agent(...)`` used to drop ``hooks``, ``text_transforms``, ``vad``, ``audio_filter``, ``background_audio`` and ``barge_in_threshold_ms`` even though the ``Agent`` dataclass accepted them. The factory now forwards all fields. - IMP2 ``ring_timeout: int | None`` kwarg on ``Patter.call(...)``. Forwarded to Twilio as ``Timeout=`` and to Telnyx as ``timeout_secs`` (added to ``TelnyxAdapter.initiate_call``). Italian mobile carriers silence-drop the default ~28 s ring on US→IT calls; the quickstart now works with ``ring_timeout=60``. - #6 ``Patter.call()`` pre-registers the dialled call in the MetricsStore via ``record_call_initiated(...)`` before returning, so the dashboard shows the attempt even when the callee never picks up. The Twilio branch also passes ``StatusCallbackEvent="initiated ringing answered completed"`` so we receive every state transition. Also exposes the new Deepgram knobs on the ``Patter.deepgram(...)`` factory (``model``, ``endpointing_ms``, ``utterance_end_ms``, ``smart_format``, ``interim_results``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): models barge_in_threshold_ms + STT/TTS options, top-level mix_pcm, docstring Rolls up the smaller API additions — BUG #1, #04g, extras from #13/#15. - ``Agent.barge_in_threshold_ms`` (default 300) — hangover window before treating caller audio as barge-in. 
Used by PipelineStreamHandler and mirrored on TS ``AgentOptions.bargeInThresholdMs``. - ``STTConfig.options`` / ``TTSConfig.options`` — provider-specific knobs bag (e.g. Deepgram endpointing) that ``common._create_stt_from_config`` unpacks when building the adapter. Keeps older ``STTConfig`` callers forward-compatible. - Top-level ``patter.mix_pcm(agent, bg, ratio)`` — parity alias for the TS ``mixPcm(...)`` standalone helper (BUG #04g). Thin wrapper over the existing ``PcmMixer`` class with an explicit ratio. - ``patter/__init__.py`` docstring enumerates the installable extras (scheduling, anthropic, groq, cerebras, google, …) so ``pip install getpatter`` users discover them without hitting a ``RuntimeError: Scheduling requires the 'apscheduler' package`` at call time (BUG #1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: align Python tests with BUG #12/#16/#17/#18/#19/#21 fixes - ``test_local_mode``: pipeline Twilio bridge test now patches ``DeepgramSTT`` directly instead of ``DeepgramSTT.for_twilio`` — after BUG #12 the pipeline path uses the default linear16 16 kHz adapter on both telephony providers. - ``test_new_features``: ``machine_detection=False`` no longer asserts an empty extra_params dict; BUG #6 now always wires a ``StatusCallback`` so the dashboard sees failed attempts. The test keeps its original intent (AMD-specific params absent) and additionally checks the status callback is set. - ``test_server_unit::TestTelnyxVoiceRoute``: rewritten to assert the REST ``actions/answer`` POST after BUG #16 — the route no longer returns a JSON commands body. - ``test_telnyx_bridge_unit``: helper messages updated to the ``{event: start|media|stop}`` wire shape from BUG #17; the OpenAI Realtime audio_format assertion now expects ``g711_ulaw`` (from #18). - ``test_telnyx_handler_unit``: TelnyxAudioSender test uses ``input_is_mulaw_8k=True`` so the round-trip byte assertion still holds with the new PCM16→mulaw transcode path (#18). 
Wire format asserts ``event == "media"`` / ``event == "clear"``. - ``test_tool_decorator``: invokes handlers with the new adapter signature ``(arguments_dict, call_context_dict)`` (#21), including a sync-wrapped handler awaited through the adapter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ts/api): Python parity — auto-detect local, static factories, ring_timeout Brings TS parity with Python on BUG #4 parity items + #14 agent fields + IMP2 ring_timeout. - Auto-detect local mode: ``new Patter({twilioSid, twilioToken, …})`` without explicit ``mode: 'local'`` is now treated as local when apiKey is missing (mirrors Python). - Static provider factories: ``Patter.deepgram(...)``, ``Patter.elevenlabs(...)``, ``Patter.whisper(...)``, ``Patter.openaiTts(...)``, ``Patter.cartesia(...)``, ``Patter.rime(...)``, ``Patter.lmnt(...)``. - ``STTConfig.toDict`` / ``TTSConfig.toDict`` are now optional — plain object literals ``{provider, apiKey, language}`` are accepted everywhere (fallback serialisation is handled via ``sttConfigToDict`` / ``ttsConfigToDict`` helpers). - ``STTConfig`` gets an ``options`` bag (parity with Python BUG #13). - ``LocalCallOptions.ringTimeout`` forwarded to Twilio as ``Timeout`` and Telnyx as ``timeout_secs`` — plus ``StatusCallbackEvent`` wired so the dashboard sees ringing/no-answer/busy/failed transitions (BUG #6). - ``AgentOptions.bargeInThresholdMs`` (parity with #20 on Python). - ``LocalOptions.deepgramKey`` / ``elevenlabsKey`` added as provider-level defaults (parity with Python Patter() kwargs). - ``Patter.call()`` Twilio branch pre-registers the dialled call with ``metricsStore.recordCallInitiated`` so no-answer / busy / failed attempts still show up in the dashboard. - ``providers.deepgram(...)`` factory exposes the Deepgram knobs (model / endpointing_ms / utterance_end_ms / smart_format / interim_results) and carries them in ``STTConfig.options``. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ts/providers): voice resolver, Deepgram knobs, TTS streaming resample TS parity port of Python BUG #11, #13, #23. - ElevenLabs: ``resolveVoiceId()`` maps display names (rachel, adam, matilda, alloy, …) to the opaque 20-char UUIDs accepted by the /text-to-speech/{voice_id}/stream endpoint. Map mirrors the Python SDK byte-for-byte. - DeepgramSTT: constructor overloaded to accept ``DeepgramSTTOptions`` (endpointingMs / utteranceEndMs / smartFormat / interimResults / vadEvents) alongside the legacy positional form. Transcript gate loosened to ``is_final OR speech_final`` so short utterances don't wait for Deepgram's utterance_end commit. - OpenAITTS: streaming 24 kHz → 16 kHz resample now carries state (``carryByte`` + ``leftover`` samples) between chunks so cross-chunk alignment doesn't drift. The legacy ``resample24kTo16k`` static is kept as a thin wrapper around the streaming path for the existing unit tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ts): Telnyx stack, pipeline hooks/barge-in/dedup, dashboard status, scheduler sync TS parity port of the Python fixes for BUG #2/#3/#6/#12/#15/#16/#17/#18/#19/#20/#22. - ``stream-handler.ts``: ``handleAudio`` now runs the ``before_send_to_stt`` hook (#15), transcodes Twilio mulaw 8 kHz → PCM16 16 kHz unconditionally on the pipeline path (#12), and keeps forwarding caller audio during TTS so barge-in can trigger (#20). ``processTranscript`` implements the dedup + 500 ms throttle + hallucination-word blacklist from #22 and flips ``isSpeaking`` + ``sendClear`` on any transcript with text while the agent is speaking (#20). 
- ``server.ts``: ``TelnyxBridge.sendAudio`` / ``sendClear`` use the correct ``{event:"media",media:{payload:b64}}`` wire format (#18); the Telnyx WS handler matches ``data.event`` (start / media / stop / dtmf / error / connected) and filters ``media.track !== "inbound"`` before forwarding (#17, #19); the ``/webhooks/telnyx/voice`` route POSTs ``actions/answer`` and ``actions/streaming_start`` via the Call Control REST API and returns empty HTTP 200 (#16). ``TwilioBridge.createStt`` picks linear16 16 kHz when ``provider === 'pipeline'`` so Deepgram doesn't decode already-PCM bytes as mulaw (#12). A new ``/webhooks/twilio/status`` handler consumes Twilio status callbacks and updates the dashboard (#6). - ``scheduler.ts``: ``scheduleCron`` returns a ``ScheduleHandle`` synchronously (lazy node-cron import happens in the background) — parity with Python #4. ``scheduleInterval`` accepts ``{intervalMs}`` or ``{seconds}`` in addition to the legacy positional ms, matching Python ``schedule_interval(seconds=...)``. - ``fallback-provider.ts``: ``completeStream()`` text-only convenience generator (#2), ``aclose()`` + ``Symbol.asyncDispose`` so ``await using fallback = ...`` parity with Python's ``async with FallbackLLMProvider(...)`` (#5). - ``dashboard/store.ts``: ``recordCallInitiated`` pre-registers outbound attempts, ``updateCallStatus`` promotes rows through ringing / no-answer / busy / failed and moves terminal states to the completed list (#6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(ts): align with 0.4.4 wire-format & provider API changes - ``providers.test.ts``: toDict now surfaces ``options`` when set, knobs forwarding verified. - ``types.test.ts``: toDict optional chain covered. - ``openai-tts.test.ts``: 1-byte input no longer returns the byte verbatim — the streaming resampler stashes it as ``carryByte`` and the stateless wrapper flushes only complete samples, so the test now asserts an empty buffer. 
- ``integration/twilio-pipeline.test.ts`` + ``integration/telnyx-pipeline.test.ts``: ``handleAudio`` is now async; tests await it. Telnyx fixture feeds mulaw 8 kHz and asserts the transcoded PCM16 16 kHz lands on the STT mock (BUG #12 + #19). - ``unit/server-routes.test.ts``: Telnyx webhook tests assert the REST ``actions/answer`` + ``actions/streaming_start`` POSTs and the empty HTTP 200 response (BUG #16). - ``package-lock.json``: refreshed for the sdk-ts worktree so the ``0.4.3`` → ``0.4.3-worktree`` alignment is consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(unit): regression coverage for BUG #6/#22/#23 + ring_timeout (IMP2) Three new unit test files lock in fixes that previously lived in the acceptance suite as live-call checks: test_pipeline_dedup.py (13 tests) - Hallucination blacklist: "you", "thank you", ".", case/punctuation variants, empty-after-strip all drop silently. - 2-second duplicate window with time.time monkeypatched so parity with the live Whisper feedback loop is deterministic. - 500 ms back-to-back throttle covering legitimate vs spurious second turns. - Interim / empty finals must not fire on_transcript. test_openai_tts_resample.py (7 tests) - Cross-chunk ratecv state: multi-chunk stream output matches a single-shot resample byte-for-byte. - Odd-byte boundary: a chunk ending on a dangling byte must not drop the sample. - Empty / single-byte / tiny chunks must not crash. - Response is always aclosed on both successful and early-exit paths. test_twilio_status_and_ring_timeout.py (13 tests) - /webhooks/twilio/status routes to update_call_status with parsed duration, and survives missing SID, bad duration, and the dashboard-disabled path. - Twilio signature enforcement on the status endpoint. - Twilio ring_timeout -> Timeout REST param, Telnyx -> timeout_secs. - Twilio StatusCallback / StatusCallbackEvent are always registered on outbound calls so BUG #6 cannot regress. 
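The dedup, throttle and blacklist rules that ``test_pipeline_dedup.py`` locks in can be sketched roughly as follows. This is an illustration, not the code in ``_stt_loop``: the ``FinalTranscriptGate`` class, its ``accept`` method and the choice to measure both windows from the last committed turn are assumptions, and ``FILLERS`` is only the subset of the blacklist quoted in the commit message.

```python
import string
from typing import Optional

# Subset of the filler blacklist named in the commit message (the full list is elided there).
FILLERS = {"you", "thank you", "yeah", "uh"}
DEDUP_WINDOW_S = 2.0   # identical finals inside this window are dropped
THROTTLE_S = 0.5       # any final this soon after a committed turn is dropped


class FinalTranscriptGate:
    """Hypothetical helper: decides whether a final transcript starts a new turn."""

    def __init__(self) -> None:
        self._last_text: Optional[str] = None
        self._last_commit: Optional[float] = None

    def accept(self, text: str, now: float) -> bool:
        # Normalise case and surrounding punctuation so "You." matches "you".
        norm = text.lower().strip(string.punctuation + string.whitespace)
        if not norm or norm in FILLERS:
            return False  # empty after strip, or a known hallucinated filler
        if self._last_commit is not None:
            elapsed = now - self._last_commit
            if elapsed < THROTTLE_S:
                return False  # back-to-back throttle
            if norm == self._last_text and elapsed < DEDUP_WINDOW_S:
                return False  # duplicate final inside the dedup window
        self._last_text, self._last_commit = norm, now
        return True
```

Passing ``now`` in explicitly keeps the sketch deterministic, which is the same reason the tests monkeypatch ``time.time``.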
Full unit suite: 728 passed, 2 skipped. * docs+ci: latency/provider caveats + audit workflow README - Pipeline turn-latency floor documented (~2.0–2.8 s) with per-stage breakdown so users know to switch to `provider="openai_realtime"` for sub-second UX. - ElevenLabs free-tier library-voice restriction (402) with pointer to `ELEVENLABS_VOICE_ID`. - Telnyx outbound D38 Outbound Profile requirement. - Google Gemini free-tier quota=0 caveat. - Whisper hallucination filter documented. - `ring_timeout` + status callback description added to call(). .github/workflows/audit.yml (new) - pip-audit on sdk-py runtime deps. - npm audit on sdk-ts production deps. - bandit static analysis with SARIF upload to GitHub Security. - Runs on dep-manifest changes, weekly schedule, and manual dispatch. - Findings are advisory-only to keep the pipeline from flaking on upstream CVE churn (telephony stack pulls many C-wrapped libs). Baseline audit run: npm=0, bandit medium+/high-confidence=0, pip-audit=2 (pytest dev-only + transformers optional-extra only). * docs(readme): remove local-measured latency numbers from Voice Modes The millisecond ranges previously listed for each provider came from a single local benchmark run and are neither representative nor a target. Keep the modes table qualitative and replace the per-stage breakdown with a short note that latency is inherited from the chosen providers — no hard numbers we don't want callers anchoring on. * test(unit): bug coverage gaps — BUG #15/#19/#20 Three new unit test modules fill the remaining coverage gap for the bugs fixed on this branch: test_pipeline_bargein.py (7 tests) — BUG #20 - Interim transcript during TTS triggers send_clear + is_speaking=False. - record_turn_interrupted is fired on the metrics accumulator. - send_clear throwing does not crash the STT loop (fail-open). - No barge-in when the agent is idle or the transcript has no text. 
- Final transcripts also trigger the barge-in branch before the downstream LLM turn runs. test_before_send_to_stt_hook.py (9 tests) — BUG #15 - Sync / async hook returning None drops the chunk (zero STT sends). - Returning modified bytes forwards the new buffer verbatim. - Hook receives the decoded PCM, not the raw mulaw payload. - Raising hooks fail-open: original audio still reaches STT. - Missing hook / hooks instance with before_send_to_stt=None are both bypass paths that must still forward audio. test_telnyx_track_filter.py (5 tests) — BUG #19 - track=inbound forwards, track=outbound drops. - Missing `track` field defaults to inbound (legacy Telnyx payloads). - Mixed stream: only inbound frames reach the handler, in order. - Unknown track values are skipped defensively. Full unit suite: 749 passed, 2 skipped (+21 from this commit). * feat(sdk-py): add cartesia/rime/lmnt static factories + vad_events to deepgram Brings Python SDK to parity with sdk-ts: - Adds Patter.cartesia / Patter.rime / Patter.lmnt static methods so local-mode users can configure these TTS providers the same way they do in TypeScript. - Adds the missing vad_events keyword to Patter.deepgram and the patter.providers.deepgram factory — the DeepgramSTT ctor already accepted it, but the public config helper silently dropped the flag. * chore: bump to 0.4.4 Regression suites re-run after the bump: - sdk-py: 749 passed, 2 skipped - sdk-ts: 932 passed (57 test files, including soak) * fix(ci): integration tests on 0.4.4 wire format + misc hygiene Addresses the five failing CI checks on PR #66. Telnyx integration tests (test_telnyx_{convai,pipeline,realtime}.py) - ``_telnyx_stream_started`` / ``_telnyx_media_event`` / ``_telnyx_stream_stopped`` helpers migrated from the pre-0.4.4 ``{event_type, payload.audio.chunk}`` shape to the real Telnyx media-stream wire format ``{event, start|media.payload}`` (BUG #17/#18). 
Without this the bridge silently drops every test frame and 11 integration tests fail with "handler called 0 times". - ``test_audio_format_pcm16`` renamed to ``test_audio_format_g711_ulaw`` and the assertion flipped — Telnyx is PCMU 8 kHz bidirectional (BUG #19), Realtime runs on ``g711_ulaw`` so both legs stay pass-through. sdk-ts/src/scheduler.ts - Removed the trailing blank line that broke the pre-commit ``end-of-file-fixer`` hook. .github/workflows/audit.yml - Bandit stock CLI doesn't support ``-f sarif`` — install ``bandit-sarif-formatter`` alongside bandit, and guard the upload-sarif step with ``hashFiles`` so future formatter breakage doesn't fail the job. Local verification: 802 passed, 4 skipped (sdk-py unit + integration). * docs: update SDK reference for 0.4.4 features - Update version to 0.4.4 in API reference - Add static factories: cartesia(), rime(), lmnt() for TTS - Document new agent() parameters: hooks, text_transforms, vad, audio_filter, background_audio, barge_in_threshold_ms - Add ring_timeout parameter to call() signature - Document Deepgram tuning options: endpointing_ms, utterance_end_ms, vad_events - Synchronize Python and TypeScript API documentation for parity * docs: document barge_in_threshold_ms configuration Update barge-in feature documentation to reflect new barge_in_threshold_ms parameter (default 300ms). Document how to customize or disable via agent configuration. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
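The ``@tool`` adapter described under BUG #21 above can be sketched as below. This is a minimal illustration under stated assumptions, not the SDK's decorator: ``adapt_tool`` is a hypothetical name, and detecting the pass-through case by parameter names is an assumption about how "already takes ``(arguments, call_context)``" is decided.

```python
import functools
import inspect
from typing import Any, Awaitable, Callable, Dict

Handler = Callable[[Dict[str, Any], Dict[str, Any]], Awaitable[Any]]


def adapt_tool(fn: Callable[..., Any]) -> Handler:
    """Hypothetical stand-in for the adapter that ``@tool`` builds."""
    params = inspect.signature(fn).parameters
    # Assumption: a function declared exactly as (arguments, call_context)
    # already matches the executor contract and is called unchanged.
    passthrough = list(params) == ["arguments", "call_context"]

    @functools.wraps(fn)  # also exposes the original as handler.__wrapped__
    async def handler(arguments: Dict[str, Any], call_context: Dict[str, Any]) -> Any:
        if passthrough:
            result = fn(arguments, call_context)
        else:
            # Keep only the keys the user function declares, then unpack them.
            kwargs = {k: v for k, v in arguments.items() if k in params}
            result = fn(**kwargs)
        if inspect.isawaitable(result):  # supports sync and async tools
            result = await result
        return result

    return handler


async def check_order(order_id: str) -> str:
    """Example typed tool, mirroring the signature quoted in the commit."""
    return f"order {order_id}: shipped"


def add(a: int, b: int) -> int:
    """Example synchronous tool."""
    return a + b
```

With this shape the executor can keep calling ``handler(arguments_dict, call_context_dict)`` for every tool, and unexpected keys in ``arguments`` are dropped rather than raising a ``TypeError``.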
nicolotognoni added a commit that referenced this pull request on Apr 29, 2026
Six fixes uncovered by the 0.5.5 acceptance matrix run, ranging from a HIGH-severity onnxruntime-node version mismatch that blocks Silero VAD on macOS x86_64 to a misleading metric that makes healthy calls look slow. **Bug #1 (HIGH) — SileroVAD onnxruntime-node 1.24+ API drift** * ``optionalDependencies.onnxruntime-node`` tightened from ``^1.18.0`` to ``~1.18.0`` — the caret was resolving to 1.24.x where ``listSupportedBackends`` was removed and the prebuilt ``bin/`` layout drifted, so ``import('onnxruntime-node')`` failed on macOS x86_64. * ``loadOnnxRuntime`` now classifies the underlying error (``missing`` / ``binding`` / ``api-drift`` / ``unknown``) and surfaces a targeted remedy plus the original error chain via ``Error.cause`` — previously the failure mode was hidden behind a single "could not be resolved" string. **Bug #2 (MEDIUM) — ElevenLabsConvAI agent_id error message** * The env-var fallback already worked but the error message did not say *where* to get an agent ID from (the dashboard, not the API key). Updated both Python and TypeScript constructors to point users at https://elevenlabs.io/app/conversational-ai and reiterate that the agent ID is per-deployed-agent. * Python ``ConvAI.__post_init__`` now raises when ``agent_id`` is empty (was silently passing through) — TypeScript already did this. Parity. **Bug #3 (MEDIUM) — ElevenLabs WS payment_required** * New typed exception ``ElevenLabsPlanError`` (subclass of ``ElevenLabsTTSError``) raised when the WS endpoint returns ``payment_required``. Free / Starter plans now get a clear "upgrade or use the HTTP class (drop-in API)" message instead of an opaque ``ElevenLabsTTSError: ElevenLabs WS error: payment_required``. * Detection is case-insensitive and matches both the exact server string and any ``payment_required`` substring. 
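As a rough sketch of the Bug #3 detection rule: the two exception names come from the commit message, while the stand-in class bodies, the ``error_from_ws_message`` helper and the exact message wording are assumptions made for illustration.

```python
class ElevenLabsTTSError(Exception):
    """Stand-in for the SDK's base TTS error."""


class ElevenLabsPlanError(ElevenLabsTTSError):
    """Stand-in for the typed error raised on ``payment_required``."""


def error_from_ws_message(message: str) -> ElevenLabsTTSError:
    """Hypothetical helper: map a WS error string to the exception to raise."""
    # A case-insensitive substring test covers both the exact server string
    # and any longer message that merely contains ``payment_required``.
    if "payment_required" in message.lower():
        return ElevenLabsPlanError(
            "ElevenLabs WS endpoint returned payment_required: "
            "upgrade or use the HTTP class (drop-in API)"
        )
    return ElevenLabsTTSError(f"ElevenLabs WS error: {message}")
```

Because ``ElevenLabsPlanError`` subclasses ``ElevenLabsTTSError``, existing ``except ElevenLabsTTSError`` handlers keep working, and callers that want to fall back to the HTTP class can catch the narrower type first.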
**Bug #5 (MEDIUM) — barge-in fragile in pipeline mode without VAD** * On tunnel + speakerphone setups the agent's own TTS leaks into the inbound mic feed, STT transcribes it, and the legacy "always forward + bargeInThresholdMs" heuristic fails to fire the cancel — the agent talks over the user. * ``serve()`` now logs a one-shot warning at startup when ``agent.engine`` is undefined, ``agent.vad`` is undefined, and ``bargeInThresholdMs > 0``, recommending ``SileroVAD`` or ``bargeInThresholdMs: 0``. Both Python and TypeScript. **Bug #6 (LOW) — pipeline ``total_ms`` misleading on long utterances** * ``total_ms`` spans the user's entire utterance (including pauses) because it includes ``stt_ms``, which itself measures STT-stream-open to transcript-finalisation. On a 4 s user turn ``total_ms`` reads ~5.5 s even though the agent's TTFA after end-of-speech is ~1.3-1.5 s — misleading as a p95 / SLO metric. * New ``LatencyBreakdown.agent_response_ms`` field (Python + TypeScript). Computed as ``endpoint_ms + llm_ttft_ms + tts_ms`` when all three signals are available, ``undefined`` / ``None`` otherwise. This is the user-perceived latency dashboards should track. * ``total_ms`` kept unchanged for backward compatibility. **Bug #7 (HIGH) — outbound TwiML races tunnel startup** * The documented ``void phone.serve(...) → setTimeout → phone.call(...)`` pattern reads ``localConfig.webhookUrl`` while the cloudflared hostname is still resolving, producing ``wss://undefined/...`` in the dial TwiML and a Twilio 11100 call drop on answer. * New ``phone.tunnelReady`` Promise (TS) / ``phone.tunnel_ready`` ``asyncio.Future`` (Python). Resolves to the public webhook hostname once ``serve()`` knows it (immediately for static webhookUrl, after ``startTunnel`` for ``tunnel: true``). Rejects if ``serve()`` fails before the hostname is known. * Documented pattern is now ``await phone.tunnelReady`` instead of ``setTimeout(10_000)`` — deterministic, no race. 
* Same root-cause fix likely also addresses Bug #4 (intermittent WS upgrade race) which the acceptance run flagged as a related symptom. Test totals after the fixes: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8 XFAIL / 0 FAIL on the 61-case fixture. No regressions.
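The ``agent_response_ms`` rule from Bug #6 above can be illustrated as follows. The field names are taken from the commit message; modelling the value as a computed property on a small dataclass is an assumption for the sketch, since the commit describes it as a field on the SDK's ``LatencyBreakdown``.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencyBreakdown:
    """Illustrative subset of the latency record; the real model has more fields."""

    endpoint_ms: Optional[float] = None
    llm_ttft_ms: Optional[float] = None
    tts_ms: Optional[float] = None

    @property
    def agent_response_ms(self) -> Optional[float]:
        # Only defined when all three signals were measured for the turn.
        parts = (self.endpoint_ms, self.llm_ttft_ms, self.tts_ms)
        if any(p is None for p in parts):
            return None
        return sum(parts)
```

Unlike ``total_ms``, this sum excludes ``stt_ms``, so it does not grow with the length of the caller's utterance.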
nicolotognoni added a commit that referenced this pull request on May 1, 2026
…#81) * test(parity): cross-SDK sentence chunker fixture + standalone runner Add a 61-case fixture documenting expected sentence-chunker output for every supported edge case across English, Italian, CJK, Hindi, Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts. Each case carries the ideal `expected_sentences` plus an optional `current_behavior` field that documents known regressions / by-design quirks so the runner can xfail them without blocking CI. Standalone runner (`sentence_chunker_parity.py`) executes each case through the Python `SentenceChunker`, spawns `node sentence_chunker_shim.js` for the TypeScript equivalent, and compares emissions case-by-case. Self-contained — does not depend on the main `tests/parity/run.py` runner (which currently fails on the recent `patter` -> `getpatter` package rename). Result on the current main branch: 53 PASS / 8 XFAIL / 0 FAIL / 0 PARITY_FAIL — Python and TypeScript chunkers produce identical sentence streams for every covered case. * feat(chunker): IT/EN abbreviations, multilingual terminators, aggressive first-flush Three layered improvements to ``SentenceChunker`` (parity Py↔TS), all additive — no breaking change to the default behaviour: **Italian + English abbreviations** (Phase 1, 7) * Prefix list adds Sig, Sgr, Dott, Prof, Avv, Ing, Geom, Rag, Arch, On, Egr, Spett, Gent, Ill (Italian honorifics) plus Gen, Sen, Rep, Lt, Cpt, Capt, Col, Cmdr, Adm (Pipecat NLTK Punkt). * Suffix list adds ecc, cit, cap, sez, art, pag, fig, tab, cfr, vol, ed (Italian) plus vs, etc, No, Vol, pp, cf, ca, op, Mt, Hwy, Rt, Pl, Ave, Blvd, Sq (Pipecat). * Suffix-followed-by-starter pattern now preserves the trailing period (e.g. ``Patter Inc. He left.`` keeps ``Inc.`` instead of dropping it). * All-caps name fix (Pipecat #1692): the maybe-short-flush gate-5 acronym guard previously blocked any uppercase-preceded period, so ``"...with RAMESH."`` would never flush. 
Now only purely uppercase ASCII words ≤3 chars (U/US/USA/NATO patterns) are treated as acronyms. **Multilingual terminator support** (Phase 7) * Added ASCII semicolon ``;``, Unicode ellipsis ``…``, full-width semicolon/period/Japanese half-width to the terminator set. * Ported Pipecat's ``UNAMBIGUOUS_NON_LATIN_TERMINATORS`` (BSD-2): Hindi Devanagari ``। ॥``, Arabic ``؟ ؛ ۔ ؏``, Khmer ``។ ៕``, Burmese ``။``, Armenian ``։``, Ethiopic ``። ፧``, Tibetan ``༎ ༏``. * Final ``<stop>`` regex builds its character class from the merged set. **Opt-in aggressive first-clause flush** (Phase 2) * New constructor option ``aggressive_first_flush`` (Python) / ``aggressiveFirstFlush`` (TypeScript). **Default OFF.** * When enabled, emits the first clause of the response on a soft punctuation boundary (``,``, em-dash, en-dash) once the buffer reaches ``aggressive_first_min_len`` (default 40 chars). Saves 200–500 ms TTFA on the first sentence of each turn. * Eight guards prevent regressions on the safe-but-aggressive path: min-length, decimal-comma (``3,14``), thousands-separator (``1,000,000``), currency (``$1,000``, ``€1.000,50``), balanced parens/brackets/braces/double-quotes (protects JSON), ellipsis (``...``, ``…``), comma-before-quote, sub-token ambiguity (requires one char after the terminator). * Italian (``language="it"``) hard-disables the feature regardless of caller preference — Italian inverts EN convention (``,`` decimal, ``.`` thousands), so a comma-flush would split mid-number. * New ``Agent.aggressive_first_flush: bool = False`` field on Python ``Agent`` model. TypeScript ``AgentOptions.aggressiveFirstFlush`` is shipped in the after_llm 3-tier commit alongside the rest of the ``types.ts`` surface. Test coverage: +11 Python unit tests + +11 TypeScript unit tests for the aggressive first-flush feature + parity-fixture cases for RAMESH, Hindi danda, Arabic question mark, ASCII semicolon, Unicode ellipsis, vs./etc./Gen./Sen. abbreviations. 
Sentence-chunker constants and abbreviation lists ported from Pipecat (BSD-2-Clause, Daily) and from the LiveKit-derived regex base (Apache-2.0). * feat(hooks): after_llm 3-tier API with deprecated legacy callable adapter The ``after_llm`` pipeline hook used to be a single callable ``(text, ctx) → str`` that received the full LLM response only after the stream completed. Buffering the entire response added 500 ms – 2 s of TTFA for any agent that configured the hook. This commit introduces a 3-tier API that lets callers pick the right latency budget for their transform: * ``onChunk`` (sync, ~0 ms) — per-token transform applied inline before the stream-handler ever sees the token. Use for: regex replace, markdown strip, profanity char-swap. Does NOT block streaming. * ``onSentence`` (async, 50–300 ms) — runs between the sentence chunker and TTS. Returns rewritten sentence, ``null`` to keep the original, ``""`` to drop the sentence. Use for: PII redaction, persona overlay, refusal swap. Adds latency only on the rewritten sentence, not the full turn. * ``onResponse`` (async, 500 ms – 2 s) — full-response rewrite that buffers the LLM stream then runs once. **Blocks streaming TTS.** Use only when sentence-level rewrite is insufficient (e.g. structured output validation that needs the full text). Backward compatibility ---------------------- The legacy single callable ``afterLlm: (text, ctx) => string`` still works and is mapped to ``onResponse`` with a one-shot ``PatterDeprecationWarning`` (Python — subclass of both ``DeprecationWarning`` and ``UserWarning`` so it surfaces by default in library code) or ``console.warn`` (TypeScript). Removal scheduled for v0.7.0. Detection in TypeScript uses ``typeof hook === 'function'`` (not ``hook.length`` arity sniffing — that pattern breaks under minifiers and arrow defaults). Detection in Python uses ``callable(hook)`` plus ``_has_tier_attrs(hook)`` to disambiguate from object-form hooks. 
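The legacy-callable detection described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK source: `AfterLlmTiers`, `normalize_after_llm`, and `apply_sentence_result` are hypothetical names, the snake_case tier keys are assumed for the Python side, and the local `PatterDeprecationWarning` and `_has_tier_attrs` only mirror the behaviour the commit message describes.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Callable, Optional

TIER_NAMES = ("on_chunk", "on_sentence", "on_response")


class PatterDeprecationWarning(DeprecationWarning, UserWarning):
    """Stand-in: subclassing both makes the warning visible by default."""


@dataclass
class AfterLlmTiers:
    on_chunk: Optional[Callable[..., Any]] = None
    on_sentence: Optional[Callable[..., Any]] = None
    on_response: Optional[Callable[..., Any]] = None


def _has_tier_attrs(hook: Any) -> bool:
    """Object-form hooks expose at least one callable tier attribute."""
    return any(callable(getattr(hook, name, None)) for name in TIER_NAMES)


def normalize_after_llm(hook: Any) -> AfterLlmTiers:
    """Map None, dict, object-form, or legacy callable hooks onto the tiers."""
    if hook is None:
        return AfterLlmTiers()
    if isinstance(hook, dict):
        return AfterLlmTiers(**{n: hook.get(n) for n in TIER_NAMES})
    if _has_tier_attrs(hook):
        return AfterLlmTiers(**{n: getattr(hook, n, None) for n in TIER_NAMES})
    if callable(hook):
        warnings.warn(
            "single-callable after_llm is deprecated; use on_response",
            PatterDeprecationWarning,
            stacklevel=2,
        )
        return AfterLlmTiers(on_response=hook)
    raise TypeError(f"unsupported after_llm hook: {type(hook).__name__}")


def apply_sentence_result(original: str, result: Optional[str]) -> Optional[str]:
    """None keeps the original sentence, "" drops it, anything else replaces it."""
    if result is None:
        return original
    if result == "":
        return None  # caller skips TTS for this sentence
    return result
```

Checking the object form before the bare `callable` test matters: a class instance that defines both `__call__` and tier methods should be treated as a tiered hook, not as the legacy form.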
Wire-up ------- * ``llm_loop.py`` / ``llm-loop.ts`` — ``has_after_llm_response`` (and the legacy callable that maps to it) gates token buffering. ``has_after_llm_chunk`` triggers per-token transform inline before yield. * ``stream_handler.py`` / ``stream-handler.ts`` — applies ``has_after_llm_sentence`` between the chunker emit and the TTS synthesise call. Both the streaming-LLM path and the non-streaming ``_speakFinalResponse`` path apply the hook for parity. * The same ``stream_handler`` change wires ``Agent.aggressive_first_flush`` / ``AgentOptions.aggressiveFirstFlush`` into the chunker constructor (Phase 2 wire-up that needed ``stream_handler`` and ``types.ts`` to land here alongside the hook changes — separating them would have required interactive patch staging on the same hunks). Test coverage ------------- * +11 Python pytest cases under ``TestAfterLlmThreeTier`` covering: no hook pass-through, legacy callable maps to ``on_response`` with deprecation warning, dict / Protocol / object forms, drop-by-empty, fail-open on hook exception, type confusion (non-string return), legacy alias methods (``has_after_llm`` / ``run_after_llm``) preserved. * +9 TypeScript Vitest cases covering the equivalent surface. * feat(tts): ElevenLabsWebSocketTTS — opt-in low-latency WS variant New TTS provider that targets ElevenLabs' streaming-input WebSocket endpoint (``/v1/text-to-speech/{voice}/stream-input``) instead of the HTTP ``/stream`` endpoint used by ``ElevenLabsTTS``. Saves ~50 ms HTTP request setup per utterance and avoids the TLS cold-start handshake on bursty calls. Drop-in API matching ``ElevenLabsTTS``: * Same ``synthesize`` (Python) / ``synthesizeStream`` (TypeScript) signature returning ``AsyncGenerator<bytes>``. * Same ``for_twilio()`` / ``for_telnyx()`` factories. * Same default model ``eleven_flash_v2_5``. * Top-level export ``getpatter.ElevenLabsWebSocketTTS`` (Py) / ``import { ElevenLabsWebSocketTTS } from "getpatter"`` (TS). 
Defaults -------- * ``auto_mode=true`` — server picks chunk timing. * ``inactivity_timeout=60`` (range 5–180). * Per-utterance lifecycle. Documented as a known trade-off vs Pipecat's per-session pool (pooling is on the roadmap for v0.6.x). * ``eleven_v3*`` is rejected at construction with a clear error — the WS stream-input endpoint does not support v3; users must fall back to the HTTP class. Resilience contract (post-review hardening) ------------------------------------------- * **Connect timeout 5 s** (Pipecat-aligned, was 15 s in earlier drafts) bounds DNS + TLS handshake. * **Per-frame receive timeout 30 s** prevents the generator hanging forever on a stalled server. * **Permanent error handler attached BEFORE the open await** — closes a window where an error fired after the once-listener resolved would surface as ``uncaughtException`` in Node. * **All ws listeners removed in ``finally``** — no closure leak past socket close. * **Server ``error`` raises ``ElevenLabsTTSError``** instead of silently completing — caller can distinguish "synthesis succeeded with empty text" from "synth failed mid-stream". * **Best-effort EOS ``{"text":""}`` in ``finally``** — tells ElevenLabs to stop billing for unconsumed audio. Sending it immediately after ``flush:true`` (the previous draft) risked truncating tail audio under ``auto_mode=true``. * **Audio frame size cap 512 KB** prevents OOM via malicious / malformed base64 (real frames are ~75 KB decoded). * **Server error string sanitised** before logging (strips CR/LF/NUL, truncates to 200 chars) — defends against log-line injection. * **``api_key`` private** (``_api_key`` + read-only ``api_key`` property) so ``vars(tts)`` / dataclass-style introspection cannot surface the secret. * **``eleven_v3`` prefix-based reject** also blocks ``eleven_v3_preview``, ``eleven_v3_alpha``. 
* **Public wrapper exposes the full options surface** (``voice_settings``, ``language_code``, ``inactivity_timeout``, ``chunk_length_schedule``) — earlier drafts dropped them. * **Default voice consistency**: the public wrapper no longer overrides the provider class default — both layers use Rachel (``21m00Tcm4TlvDq8ikWAM``) so direct-construct and wrapped-construct paths agree. Public surface -------------- * ``getpatter/providers/elevenlabs_ws_tts.py`` — provider class ``ElevenLabsWebSocketTTS`` + ``ElevenLabsTTSError``. * ``getpatter/tts/elevenlabs_ws.py`` — wrapper class ``TTS`` re-exported as ``ElevenLabsWebSocketTTS`` from the package root. * ``sdk-ts/src/providers/elevenlabs-ws-tts.ts`` + corresponding TypeScript wrapper at ``sdk-ts/src/tts/elevenlabs-ws.ts``. * ``sdk-ts/src/providers/elevenlabs-tts.ts`` — ``resolveVoiceId`` promoted from module-private to public export so the WS variant can share the voice-name → voice-id resolution table without duplicating the lookup map. * ``sdk-py/getpatter/__init__.py`` and ``sdk-ts/src/index.ts`` — top-level re-exports. Test coverage ------------- * +20 Python pytest cases (construction, factories, URL build, send sequence, ``isFinal`` termination, voice settings in init, ``chunk_length_schedule`` only with ``auto_mode=False``, ``eleven_v3`` rejection + variants, env-var resolution). * +11 TypeScript Vitest cases covering the equivalent surface, including a faked ``ws`` module that records sent frames. The HTTP ``ElevenLabsTTS`` class is **untouched** — both transports coexist and the user picks per-call. * release: 0.5.5 — latency pass 1 (chunker + after_llm 3-tier + WS TTS) Bump ``getpatter`` to 0.5.5 across both SDKs (Python ``pyproject.toml``, TypeScript ``package.json`` + ``package-lock.json``, and the SDK ``__version__`` / ``VERSION`` constants kept in sync). 
CHANGELOG entry covers the four user-visible additions shipped in this release: * Sentence chunker — IT/EN abbreviations + multilingual terminators + RAMESH-style all-caps flush bug fix (Pipecat #1692). Default behaviour unchanged for existing users. * Opt-in ``aggressive_first_flush`` / ``aggressiveFirstFlush`` on ``Agent`` / ``AgentOptions`` — emits the first clause of each turn on a soft-punctuation boundary (",", em-dash, en-dash) once the buffer reaches ~40 chars. Saves 200–500 ms TTFA. Italian hard-disabled (decimal-comma + dot-thousands inversion). 8 guards prevent regressions on decimals, currency, JSON, ellipsis, open-delimiters, comma-before-quote, sub-token ambiguity. * New 3-tier ``after_llm`` API (``onChunk`` / ``onSentence`` / ``onResponse``). Legacy single-callable form still works (mapped to ``onResponse``) but emits a one-shot ``PatterDeprecationWarning`` / ``console.warn``. Removal: v0.7.0. * New opt-in ``ElevenLabsWebSocketTTS`` class — drop-in replacement for ``ElevenLabsTTS`` (HTTP) using the ``stream-input`` WebSocket endpoint. Saves ~50 ms HTTP setup + TLS cold-start per utterance. Per-utterance lifecycle (per-session pooling on the roadmap). Test totals after this release: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 / 8 XFAIL / 0 FAIL on a 61-case fixture spanning EN, IT, CJK, Hindi, Arabic, Khmer, Burmese, Armenian, and Ethiopic scripts. Cumulative review hardening from 11 parallel review agents (Python-reviewer, TypeScript-reviewer, provider-reviewer, sdk-parity, security-reviewer, code-reviewer, code-simplifier, refactor-cleaner, docs-sync, build-validator, examples-validator) is folded into the phase-specific commits — see the per-feature commits in this branch for the detailed CRITICAL / HIGH fix lists. 
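Three items from the ElevenLabsWebSocketTTS resilience contract (error-string sanitisation, the 512 KB frame cap, and the prefix-based `eleven_v3` reject) are small enough to sketch directly. The function names below are hypothetical and the code is a reading of the commit message, not the provider implementation; only the numeric limits and the stripped characters come from the text.

```python
import base64

MAX_ERROR_LEN = 200                  # truncate logged server errors
MAX_AUDIO_FRAME_BYTES = 512 * 1024   # decoded audio frame cap
_STRIP_TABLE = str.maketrans("", "", "\r\n\x00")


def sanitize_server_error(message: str) -> str:
    """Remove CR/LF/NUL and truncate, so a server string cannot forge log lines."""
    return message.translate(_STRIP_TABLE)[:MAX_ERROR_LEN]


def ensure_ws_supported(model_id: str) -> None:
    """Prefix match, so eleven_v3_preview / eleven_v3_alpha are rejected too."""
    if model_id.startswith("eleven_v3"):
        raise ValueError(
            f"{model_id} is not supported on the WS stream-input endpoint; "
            "use the HTTP class instead"
        )


def decode_audio_frame(b64_audio: str) -> bytes:
    """Reject oversized frames before decoding to bound memory use."""
    estimated = len(b64_audio) * 3 // 4  # upper bound on decoded size
    if estimated > MAX_AUDIO_FRAME_BYTES:
        raise ValueError(f"audio frame too large: ~{estimated} bytes")
    return base64.b64decode(b64_audio)
```

Estimating the decoded size from the encoded length means an oversized payload is refused without allocating the decoded buffer at all.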
* docs: Mintlify pages for 0.5.5 — WS TTS, after_llm 3-tier, aggressive flush Document the four user-visible additions shipped in 0.5.5: * **ElevenLabsWebSocketTTS** — new provider sub-pages ``docs/{python,typescript}-sdk/providers/elevenlabs-websocket.mdx``. What it is, why use it, ``for_twilio`` / ``for_telnyx`` factories, full constructor params table, ``eleven_v3*`` limitation, per-utterance lifecycle trade-off, ``ElevenLabsTTSError``. Both sub-pages added to the TTS group navigation in ``docs/docs.json``. Existing ``tts.mdx`` providers table updated with the new row plus a callout pointing at the WS variant. * **``after_llm`` 3-tier API** — new "Pipeline Hooks" section in ``docs/{python,typescript}-sdk/events.mdx``: per-tier table for ``onChunk`` (sync, ~0 ms), ``onSentence`` (async, 50–300 ms), and ``onResponse`` (async, 500 ms – 2 s, blocks streaming). Return semantics (``null`` keep / ``""`` drop), legacy callable migration path with ``PatterDeprecationWarning`` (Python) / one-shot ``console.warn`` (TypeScript), removal in v0.7.0. * **``aggressive_first_flush`` opt-in** — new row in the ``AgentOptions`` / ``Agent`` parameters tables in ``docs/{python,typescript}-sdk/agents.mdx`` and ``reference.mdx`` with the Italian hard-disable note. Python ``features.mdx`` adds a dedicated section with code example and the 8-guard summary. * **Chunker improvements** — Python ``features.mdx`` documents the expanded EN abbreviations (``vs.``, ``etc.``, ``Gen.``, ``Sen.``), IT abbreviations (``Sig.``, ``Dott.``, ``S.p.A.``, ``ecc.``), and multilingual terminator support (Hindi / Arabic / Armenian / Ethiopic / Khmer / Burmese / Tibetan). TypeScript SDK has no chunker page so no equivalent change required. ``docs.json`` JSON validated end-to-end. No source / examples / CHANGELOG / NOTICE files touched. 
* fix: 5 bugs from 2026-04-29 acceptance run (sdk-ts 0.5.5) Five fixes uncovered by the 0.5.5 acceptance matrix run, ranging from a HIGH-severity onnxruntime-node version mismatch that blocks Silero VAD on macOS x86_64 to a misleading metric that makes healthy calls look slow. **Bug #1 (HIGH) — SileroVAD onnxruntime-node 1.24+ API drift** * ``optionalDependencies.onnxruntime-node`` tightened from ``^1.18.0`` to ``~1.18.0`` — the caret was resolving to 1.24.x where ``listSupportedBackends`` was removed and the prebuilt ``bin/`` layout drifted, so ``import('onnxruntime-node')`` failed on macOS x86_64. * ``loadOnnxRuntime`` now classifies the underlying error (``missing`` / ``binding`` / ``api-drift`` / ``unknown``) and surfaces a targeted remedy plus the original error chain via ``Error.cause`` — previously the failure mode was hidden behind a single "could not be resolved" string. **Bug #2 (MEDIUM) — ElevenLabsConvAI agent_id error message** * The env-var fallback already worked but the error message did not say *where* to get an agent ID from (the dashboard, not the API key). Updated both Python and TypeScript constructors to point users at https://elevenlabs.io/app/conversational-ai and reiterate that the agent ID is per-deployed-agent. * Python ``ConvAI.__post_init__`` now raises when ``agent_id`` is empty (was silently passing through) — TypeScript already did this. Parity. **Bug #3 (MEDIUM) — ElevenLabs WS payment_required** * New typed exception ``ElevenLabsPlanError`` (subclass of ``ElevenLabsTTSError``) raised when the WS endpoint returns ``payment_required``. Free / Starter plans now get a clear "upgrade or use the HTTP class (drop-in API)" message instead of an opaque ``ElevenLabsTTSError: ElevenLabs WS error: payment_required``. * Detection is case-insensitive and matches both the exact server string and any ``payment_required`` substring. 
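The Bug #3 detection rule (case-insensitive, matching any `payment_required` substring) might look roughly like this. The two exception classes are local stand-ins with the names used in the commit message, and `classify_ws_error` is a hypothetical helper; the real provider may structure this differently.

```python
class ElevenLabsTTSError(Exception):
    """Stand-in for the provider's base TTS error."""


class ElevenLabsPlanError(ElevenLabsTTSError):
    """Stand-in: raised when the plan does not include the WS endpoint."""


def classify_ws_error(server_message: str) -> ElevenLabsTTSError:
    """Map a server error string onto the most specific exception type."""
    if "payment_required" in server_message.lower():
        return ElevenLabsPlanError(
            "ElevenLabs WS endpoint requires a paid plan: upgrade, or use "
            "the HTTP ElevenLabsTTS class (drop-in API)"
        )
    return ElevenLabsTTSError(f"ElevenLabs WS error: {server_message}")
```

Because `ElevenLabsPlanError` subclasses `ElevenLabsTTSError`, existing `except ElevenLabsTTSError` handlers keep catching it.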
**Bug #5 (MEDIUM) — barge-in fragile in pipeline mode without VAD** * On tunnel + speakerphone setups the agent's own TTS leaks into the inbound mic feed, STT transcribes it, and the legacy "always forward + bargeInThresholdMs" heuristic fails to fire the cancel — the agent talks over the user. * ``serve()`` now logs a one-shot warning at startup when ``agent.engine`` is undefined, ``agent.vad`` is undefined, and ``bargeInThresholdMs > 0``, recommending ``SileroVAD`` or ``bargeInThresholdMs: 0``. Both Python and TypeScript. **Bug #6 (LOW) — pipeline ``total_ms`` misleading on long utterances** * ``total_ms`` spans the user's entire utterance (including pauses) because it includes ``stt_ms``, which itself measures STT-stream-open to transcript-finalisation. On a 4 s user turn ``total_ms`` reads ~5.5 s even though the agent's TTFA after end-of-speech is ~1.3-1.5 s — misleading as a p95 / SLO metric. * New ``LatencyBreakdown.agent_response_ms`` field (Python + TypeScript). Computed as ``endpoint_ms + llm_ttft_ms + tts_ms`` when all three signals are available, ``undefined`` / ``None`` otherwise. This is the user-perceived latency dashboards should track. * ``total_ms`` kept unchanged for backward compatibility. **Bug #7 (HIGH) — outbound TwiML races tunnel startup** * The documented ``void phone.serve(...) → setTimeout → phone.call(...)`` pattern reads ``localConfig.webhookUrl`` while the cloudflared hostname is still resolving, producing ``wss://undefined/...`` in the dial TwiML and a Twilio 11100 call drop on answer. * New ``phone.tunnelReady`` Promise (TS) / ``phone.tunnel_ready`` ``asyncio.Future`` (Python). Resolves to the public webhook hostname once ``serve()`` knows it (immediately for static webhookUrl, after ``startTunnel`` for ``tunnel: true``). Rejects if ``serve()`` fails before the hostname is known. * Documented pattern is now ``await phone.tunnelReady`` instead of ``setTimeout(10_000)`` — deterministic, no race. 
* Same root-cause fix likely also addresses Bug #4 (intermittent WS upgrade race) which the acceptance run flagged as a related symptom.

Test totals after the fixes: Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files, cross-SDK chunker parity 53 PASS / 8 XFAIL / 0 FAIL on the 61-case fixture. No regressions.

* fix(bug-4): outbound WS upgrade race - encoded events + ready signal + diagnostics

Three layered fixes targeting the intermittent "outbound call connects but never receives the WS upgrade" failure (Twilio 11100 on answer) documented in BUGS.md.

**Root cause A - StatusCallbackEvent encoding**

Twilio expects ``StatusCallbackEvent`` as a multi-value parameter (repeated keys), NOT a space-separated single value. The previous ``'initiated ringing answered completed'`` form triggered Twilio notification 21626 ("invalid statusCallbackEvents") on every outbound call, and on some ingestion paths also broke the answer-handler webhook which is exactly the symptom that produced 11100.

* TypeScript: use ``params.append('StatusCallbackEvent', evt)`` four times so URLSearchParams emits repeated query keys.
* Python: pass the canonical twilio-python snake_case key ``status_callback_event`` as a list; twilio-python serialises it as the multi-value form Twilio expects.

**Root cause B - server-not-yet-listening race**

The previous ``phone.tunnelReady`` (TS) / ``phone.tunnel_ready`` (Py) signal resolves as soon as the cloudflared hostname is known, BEFORE the embedded HTTP / WS server has finished initialising. ``phone.call`` placed immediately afterwards races the Twilio Media Streams upgrade and produces a half-ready route → 11100.

New ``phone.ready`` (TS Promise / Py Future) resolves only after:

1. Tunnel hostname known
2. Carrier auto-config complete
3. EmbeddedServer in ``listen`` state (TS) / uvicorn ``started`` flag set (Py)

Outbound pattern is now:

```ts
void phone.serve({ agent, tunnel: true });
await phone.ready; // <-- safe for outbound
await phone.call(...);
```

``tunnelReady`` is kept as a separate signal for integrations that only need the hostname (e.g. webhook registration), with a docstring note pointing at ``ready`` for outbound use.

**Root cause C - opaque diagnostics**

On call drop the user could not tell whether Twilio rejected the dial, the tunnel resolved late, or the WS upgrade failed. The new ``phone.call`` flow logs the Twilio notifications URL on every outbound call ("check here if the call drops with no audio") so self-diagnosis does not require learning the Twilio API.

**Test parity**

Updated ``test_twilio_statuscallback_always_registered`` to read the new ``status_callback_event`` key (with fallback to the legacy ``StatusCallbackEvent`` for forward compat). Python 1064 PASS / 7 skip, TypeScript 1163 PASS / 67 files. No regressions.

* chore(docs): mintignore DEVLOG and superpowers/ to unblock Mintlify deployment

DEVLOG.md and superpowers/specs/2026-04-24-patter-feature-test-notebook-design.md fail Mintlify's MDX parser (filenames begin with digits, which MDX treats as JSX expressions). Skip both paths so the docs site can deploy.

* chore: drop DEVLOG/superpowers, fix CI failures

- Remove docs/DEVLOG.md and docs/superpowers/ (internal planning notes, no value to public docs site). The .mintignore introduced in the previous commit is no longer needed and is removed too.
- sdk-ts/src/client.ts: attach a no-op `.catch` to `_ready` and `_tunnelReady` so callers that never await them don't trigger Node's unhandled-rejection warning when serve() validates inputs synchronously. Awaiters of `phone.ready` / `phone.tunnelReady` still see the rejection.
- sdk-ts/package-lock.json: add trailing newline (end-of-file-fixer).
- examples/notebooks/**.ipynb: nbstripout pass — clear cell outputs and execution counts to match the repo convention enforced by .pre-commit-config.yaml.
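Root cause A of the bug-4 fix above comes down to how one parameter is serialised on the wire. The sketch below uses only the standard library to contrast the two encodings; it does not call twilio-python, and the variable names are chosen for this example.

```python
from urllib.parse import parse_qs, urlencode

EVENTS = ["initiated", "ringing", "answered", "completed"]

# Legacy form: one key whose value is a space-separated string.
legacy_body = urlencode({"StatusCallbackEvent": " ".join(EVENTS)})

# Fixed form: the key is repeated once per event (multi-value parameter).
fixed_body = urlencode([("StatusCallbackEvent", evt) for evt in EVENTS])

# A server parsing the form body sees one value vs. four values.
legacy_values = parse_qs(legacy_body)["StatusCallbackEvent"]
fixed_values = parse_qs(fixed_body)["StatusCallbackEvent"]
```

`urlencode({"StatusCallbackEvent": EVENTS}, doseq=True)` produces the same repeated-key body as the list-of-tuples form.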
nicolotognoni
added a commit
that referenced
this pull request
May 4, 2026
Bug #2 from the barge-in audit: on speakerphone / tunnel-loop deployments the agent's outbound TTS bleeds back into the mic. VAD sees that bleed as continuous voice-like energy and never transitions out of "speaking" state, so a caller interruption only registers during natural TTS pauses → "interrupt sometimes works, sometimes the agent keeps talking" intermittent symptom. Fix at the source — proper acoustic echo cancellation. NLMS adaptive filter (2048 taps @ 16 kHz, 128 ms history) subtracts an estimate of the TTS-derived echo from the mic stream before VAD/STT see it. Geigel double-talk detector freezes adaptation when the caller is speaking on top of the agent so the filter does not learn the user's voice as part of the channel response. Convergence on the synthetic narrowband test signal: - ~24 dB ERLE after 1 s of TTS-only training - Near-end speech preserved within 0 dB during double-talk Not a drop-in replacement for WebRTC AEC3 (state-of-the-art needs adaptive sub-band processing + comfort noise + nonlinear post-filter that this scope does not cover). For production-grade quality, wrap a binding to ``webrtc-audio-processing-2`` externally. - libraries/python/getpatter/audio/aec.py — NlmsEchoCanceller class. - libraries/typescript/src/audio/aec.ts — TS parity. - Agent.echo_cancellation / AgentOptions.echoCancellation — opt-in flag, default false. Handset / headset deployments don't need it and the 0.5–2 s convergence period would briefly attenuate caller speech if they spoke before any TTS played. - PipelineStreamHandler.start() (Py) / StreamHandler.initPipeline (TS) instantiate the canceller when the flag is on. Far-end tap fires before the carrier transcode in synthesizeSentence; near-end runs after the inbound 8k→16k resample, before VAD. - 8 unit tests per SDK covering convergence, double-talk preservation, construction validation, pass-through-before-priming, reset, empty buffers. Tests: Py 1574 passed (+8), TS 1236 passed (+8), tsc clean.
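The NLMS update and the Geigel freeze described above fit in a few lines of pure Python. This is a toy sketch, not the `NlmsEchoCanceller` shipped in `aec.py`: the 8-tap filter, the step size, the 0.5 Geigel threshold, and the synthetic echo path (a 2-sample delay with gain 0.3) are all assumptions picked so the example runs quickly.

```python
import random


class NlmsEchoCanceller:
    """Toy NLMS canceller; parameter values are illustrative only."""

    def __init__(self, taps=8, mu=0.5, eps=1e-8, geigel_threshold=0.5):
        self.weights = [0.0] * taps   # estimated echo-path impulse response
        self.history = [0.0] * taps   # far-end samples, newest first
        self.mu = mu
        self.eps = eps
        self.geigel_threshold = geigel_threshold

    def process(self, far_end: float, mic: float) -> float:
        """Feed one far-end (TTS) and one mic sample; return the cleaned sample."""
        self.history.insert(0, far_end)
        self.history.pop()
        echo_estimate = sum(w * x for w, x in zip(self.weights, self.history))
        error = mic - echo_estimate
        # Geigel test: mic louder than the scaled far-end peak suggests
        # near-end speech, so adaptation is frozen for this sample.
        peak = max(abs(x) for x in self.history)
        double_talk = abs(mic) > self.geigel_threshold * peak
        if not double_talk:
            energy = sum(x * x for x in self.history) + self.eps
            step = self.mu * error / energy
            self.weights = [
                w + step * x for w, x in zip(self.weights, self.history)
            ]
        return error


def residual_ratio(samples: int = 3000, tail: int = 500) -> float:
    """Residual-to-echo energy ratio over the last `tail` samples."""
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(samples)]
    # Synthetic echo path: far-end delayed by 2 samples, attenuated to 0.3.
    mic = [0.3 * far[n - 2] if n >= 2 else 0.0 for n in range(samples)]
    aec = NlmsEchoCanceller()
    cleaned = [aec.process(f, m) for f, m in zip(far, mic)]
    residual = sum(e * e for e in cleaned[-tail:])
    echo = sum(m * m for m in mic[-tail:])
    return residual / echo
```

On this noiseless single-path signal the filter converges almost completely, which is far easier than real speech over a real channel; the convergence figures quoted in the commit refer to its own test signal, not to this toy.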
nicolotognoni
added a commit
that referenced
this pull request
May 8, 2026
…wave (#83) * chore: scrub competitor lineage + bug fixes + phone preamble Repo-wide pass to remove external license headers, "ported from" notes and competitor product names from source files, plus three runtime fixes and one missing best-practice feature surfaced by the audit. ## Cleanup (zero residual livekit/pipecat/apache references) - Removed Apache 2.0 header blocks from 12 Python + 12 TypeScript provider files (the headers travelled in from external ports; Patter ships under the root MIT LICENSE only — no per-file copyright notices). - Stripped "Adapted from livekit-plugins-X" / "Ported from pipecat" / "Based on LiveKit Agents" provenance comments across ~40 source files in sdk-py/getpatter/{services,providers,observability,resources,evals}/ and sdk-ts/src/{services,providers,observability}/, including the cartesia-stt USER_AGENT integration tag. - Rewrote competitor framing in 12 docs MDX pages (provider docs, patter-tool, call-logging) — descriptions now stand on Patter's own shape, no migration-from-X language. - Renamed test fixtures and variables that named LiveKit/Pipecat in sentence_chunker tests (Py + TS) and the parity scenario JSON; test logic preserved. - Removed personal-name copyright in LICENSE / sdk-py/LICENSE / sdk-ts/LICENSE in favour of "Patter Contributors". - .gitignore: ignore .ruff_cache/, sdk/ (legacy build dir from the pre-rename Python SDK), .agents/, skills-lock.json. ## Bug fixes - llm_loop.py:420-421 (Python): cache_read_input_tokens / cache_creation_input_tokens were Anthropic-style keys, but every Python provider emits cache_read_tokens / cache_write_tokens. Fix reads the keys the providers actually emit, so OpenAI / Google cache attribution is no longer silently zeroed. - llm-loop.ts:304-308 (TS): non-OK upstream HTTP responses were logged and silently swallowed; callers couldn't distinguish empty model output from API failure. Now throws PatterConnectionError with the status + truncated body. 
## Performance - text_transforms.py: precompiled the 14 markdown regex patterns and 2 emoji-cleanup helpers as module-level constants — they previously recompiled on every sentence flush. Drop-in win, public API and 37/37 existing tests unchanged. ## Feature: default phone-call preamble - New Agent.disable_phone_preamble (Py) / disablePhonePreamble (TS) field, default False. When False, LLMLoop prepends a short spoken-language preamble to system_prompt instructing the model to avoid markdown / emojis / bullet lists and keep replies concise. - Wired through stream_handler and test_mode in both SDKs. - Adds two Py tests and one TS test covering the new behaviour. ## Test status - Python: 1466 passed, 8 skipped - TypeScript: 1164/1164 passed * chore(env): add per-SDK .env.example, drop obsolete cloud variant - sdk-py/.env.example, sdk-ts/.env.example: full inventory of every env var the SDK reads at runtime, grouped by role (telephony, LLM providers, STT, TTS, tracing, Patter tunables). Only OPENAI_API_KEY + a telephony carrier is required; the rest are uncommented as the user enables specific provider integrations. - .env.example.cloud removed — variables (PATTER_DATABASE_URL, PATTER_ENCRYPTION_KEY, PATTER_REDIS_URL, etc.) belonged to the hosted cloud surface that was retired in 0.5.3. - Root .env.example kept as a minimal quickstart sample. * refactor(pricing): introduce PricingUnit enum Replace the magic strings ``"minute"`` / ``"1k_chars"`` / ``"token"`` sprinkled across DEFAULT_PRICING with a named enum, so the pricing table reads as a typed shape rather than free-form dicts. - Python: ``PricingUnit(StrEnum)`` — ``MINUTE``, ``THOUSAND_CHARS``, ``TOKEN``. Subclassing ``str`` keeps the dict JSON-serialisable and unchanged for any consumer that compares against the literal string. - TypeScript: ``PricingUnit`` const object + ``PricingUnitValue`` union type. 
``ProviderPricing.unit`` accepts ``PricingUnitValue | string`` so user overrides loaded from JSON / env config still flow through ``mergePricing`` without type gymnastics. - Behaviour preserved end-to-end: 143 Python pricing/metrics tests pass, 18 TypeScript pricing tests pass, full suites 1466 Py / 1164 TS green. * refactor: reorganize as libraries/{python,typescript}; drop in-repo examples mcp-use-style monorepo layout: each SDK gets its own library folder with README, CLAUDE.md, .env.example, tests/, and the package source. Sample code is maintained in separate example repos and is no longer tracked here (notebooks tutorial preserved — it's the documentation, not an example). ## Layout ``` libraries/ ├── python/ (was sdk-py/) │ ├── README.md, CLAUDE.md, LICENSE, .env.example │ ├── pyproject.toml, pytest.ini │ ├── getpatter/ │ └── tests/ └── typescript/ (was sdk-ts/) ├── README.md, CLAUDE.md, LICENSE, .env.example ├── package.json, tsconfig.json, vitest.config.ts, tsup.config.ts ├── src/ └── tests/ ``` ## What changed - 405 ``git mv`` renames so history follows every file. ``sdk-py/`` and ``sdk-ts/`` no longer exist on disk. - Per-library CLAUDE.md guides (~40 lines each); .gitignore exception ``!libraries/*/CLAUDE.md`` so the library guides ARE tracked while the root guide stays ignored. - CI: ``.github/workflows/{audit,release,test,docs-feature-drift}.yml`` rewritten to the new paths. ``scripts/check_feature_docs_drift.py`` also fixed (it had a stale ``patter/__init__.py`` from the pre-rename era). - Pre-commit, pre-push, ``scripts/pr-validate.sh``, top-level README and CONTRIBUTING.md re-pointed at ``libraries/{python,typescript}``. - Internal package re-organisation (``handlers → telephony``, splitting ``audio/``, ``tools/``) deliberately deferred to a follow-up PR — that layer of import-path churn doesn't belong in the same commit as the outer rename. ## Examples ``examples/{developer,enterprise,startup,integrations}/`` removed (24 files + the index README). 
Sample code is published in dedicated repos. ``examples/notebooks/`` kept — it's the 24-notebook tutorial series documented in the Mintlify site and depended on by ``.github/workflows/notebooks.yml`` and ``.pre-commit-config.yaml``. PatterTool docs now point at the external example repo (TODO comment left for the canonical URL — to fill in once the public examples repo is public). ## Test status - Python: 1413 passed, 6 skipped (pytest libraries/python/tests) - TypeScript: 1164 passed, 67 files (vitest run libraries/typescript) - TypeScript: ``tsc --noEmit`` clean (one pre-existing ``@ts-expect-error`` in silero-vad — predates this branch) * refactor(types,providers): enum-ify config + tighten Agent.provider type Wave 2 of the cleanup pass — covers half of the provider integrations. Replaces hardcoded model/voice/format/sample-rate strings with typed enums (Python ``StrEnum`` / ``IntEnum``, TypeScript ``const`` objects + union types) so user code gets autocomplete and the type system catches typos at the call site instead of at the provider's HTTP 400. ## Agent / public types - ``Agent.provider`` (Python) tightened from ``str`` to a ``ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]`` alias. TS counterpart was already a string union. - Expanded ``Agent`` (Py) and ``AgentOptions`` (TS) docstrings to document the precedence rule for fields that appear both on the agent and on the engine adapter (``voice``, ``model``, ``language``): explicit kwarg on ``agent()`` wins; otherwise the engine value populates the agent via ``_unpack_engine``; otherwise the default. - No behaviour change. ``StrEnum`` subclasses ``str``; existing callers passing raw strings keep working. ## Providers covered Python: ``anthropic_llm``, ``cartesia_tts``, ``cerebras_llm``, ``deepgram_stt``, ``elevenlabs_tts``, ``google_llm``, ``groq_llm``, ``lmnt_tts``, ``openai_realtime``, ``rime_tts``. 
TypeScript: ``anthropic-llm``, ``cerebras-llm``, ``deepgram-stt``, ``elevenlabs-tts``, ``google-llm``, ``groq-llm``, ``lmnt-tts``, ``openai-realtime``, ``rime-tts``. Each module now exposes its own ``<Provider>Model`` / ``<Provider>OutputFormat`` / ``<Provider>Voice`` / etc. New enums are re-exported from ``__init__.py`` and ``index.ts`` in dedicated "provider-specific enums" sections. ## Still pending The following providers still hold magic strings — covered in a follow-up commit: ``assemblyai_stt``, ``soniox_stt``, ``speechmatics_stt``, ``cartesia_stt``, ``telnyx_stt``, ``whisper_stt``, ``elevenlabs_ws_tts``, ``openai_tts``, ``telnyx_tts``, ``gemini_live``, ``ultravox_realtime``, ``silero_vad``, ``silero_onnx``, ``krisp_*``. The TS ``cartesia-tts.ts`` enums also still need to land (Py is already covered). ## Test status - Python: 1466 passed, 8 skipped - TypeScript: 1164/1164 passed; ``tsc --noEmit`` clean (one pre-existing silero-vad warning unchanged) * refactor(providers,server): finalize provider enums + bug-fix wave Provider enum residuals (Wave 2.5) - Python: assemblyai_stt, cartesia_stt, soniox_stt, speechmatics_stt, telnyx_stt, whisper_stt, elevenlabs_ws_tts, openai_tts, telnyx_tts, gemini_live, ultravox_realtime, silero_vad, silero_onnx, krisp_* - TypeScript: assemblyai-stt, cartesia-stt, cartesia-tts, soniox-stt - All hardcoded model/voice/format strings now live behind StrEnum/IntEnum (Python) or const-object + value union (TypeScript) Bug fixes (Wave 3a) - stream_handler: barge-in now sets asyncio.Event / AbortController to cancel in-flight LLM stream instead of letting it run to completion - server (Py): SSRF validator on outbound webhook URLs + per-IP WS cap (MAX_WS_PER_IP=10) for parity with TS - server (Py): voicemail POST gets explicit 10s timeout - metrics (Py): agent_response_ms accepts 0.0 instead of treating it as "missing" (use is None gate) - metrics (TS): emit llm/stt/tts TTFB events on the event bus - observability/event_bus (Py): 
listener errors now surface to logger instead of being swallowed - server (TS): queryTelephonyCost catch logs instead of silent return * feat(errors): add ErrorCode enum to exception taxonomy Stable, machine-readable error codes attached to every Patter exception class. Existing class-name-based catches keep working; the enum is additive. ErrorCode values (10): CONFIG, CONNECTION, AUTH, TIMEOUT, RATE_LIMIT, WEBHOOK_VERIFICATION, INPUT_VALIDATION, PROVIDER_ERROR, PROVISION, INTERNAL. - Python: StrEnum on `exceptions.py`; class-default `code` attribute per subclass (PatterError → INTERNAL, PatterConnectionError → CONNECTION, AuthenticationError → AUTH, ProvisionError → PROVISION, RateLimitError → RATE_LIMIT). Optional `code=` kwarg on the base ctor lets callers override per-instance. - TypeScript: const-object + value union in `errors.ts`; `readonly code: ErrorCode` on every class; optional `{ code }` constructor option. Same class→code mapping byte-for-byte with Python. - Both SDKs re-export `ErrorCode` from the package root. - Test parity asserts the enum value sets match between SDKs. * feat(errors): wire ErrorCode enum into exceptions module + package roots Companion to 8b8c503 (test files). Ships the actual enum + class wiring: - libraries/python/getpatter/exceptions.py — ErrorCode StrEnum, default .code per subclass, optional code= kwarg on PatterError.__init__ - libraries/python/getpatter/__init__.py — re-export ErrorCode - libraries/typescript/src/errors.ts — ErrorCode const-object + value union, readonly code on every class, optional { code } ctor option - libraries/typescript/src/index.ts — re-export ErrorCode * perf(elevenlabs-ws-tts): auto-flip output_format=ulaw_8000 when paired with Twilio ElevenLabs WS TTS streams `ulaw_8000` natively. When the carrier is Twilio (mulaw 8 kHz), we can let ElevenLabs do the encoding server-side and skip the SDK-side mulaw transcode entirely. 
- ElevenLabsWebSocketTTS.set_telephony_carrier(carrier) / TS setTelephonyCarrier(carrier): duck-typed hook called by the stream handler after TTS construction. Maps "twilio" → "ulaw_8000", "telnyx" → "pcm_16000" (lowest conversion).
- output_format constructor arg becomes truly optional (sentinel); user-passed format wins over the carrier hint.
- for_twilio / for_telnyx factories already pass explicit formats → the carrier hint is a no-op for those callers.
- 7 new unit cases per SDK in TestCarrierAutoFlip / equivalent: default flip, URL contains ulaw_8000, telnyx no-op, explicit format respected, factory wins, unknown carrier no-op.

No public-API break: existing constructor calls behave identically when no carrier hook is wired up.

* perf(openai-tts): add direct 24k→8k resample path (opt-in via target_sample_rate)

OpenAI TTS streams 24 kHz audio. The default 24k→16k resample stays for the Telnyx (PCM 16 kHz) carrier; for Twilio (mulaw 8 kHz) the chained 24→16 + 16→8 used to cost two ratecv passes. New `target_sample_rate=8000` constructor opt-in collapses the two passes into a single 3:1 decimation with a tighter LPF (Nyquist ≈ 4 kHz).

- Python: getpatter.services.transcoding.create_resampler_24k_to_8k() factory; OpenAITTS gains optional `target_sample_rate=16000` (default preserves existing behaviour).
- TypeScript: createResampler24kTo8k() + 24000→8000 case in StatefulResampler; OpenAITTS gains optional positional `targetSampleRate=16000` with `LPF_ALPHA_8K=0.45` for proper anti-aliasing at 4 kHz Nyquist.

Auto-engagement on Twilio carriers is deferred; the audio sender currently assumes 16 kHz PCM input. Manual opt-in keeps the change narrowly scoped.

* feat(sentence-chunker): per-language honorifics + single-word flush

Bug #48: per-language honorifics
- New HONORIFICS_{EN,IT,ES,DE,FR,PT} constants merged into HONORIFICS_ALL (sorted longest-first). Module-level HONORIFICS_REGEX_ALT alternation built once.
The aggregation is union-of-all regardless of `language` (mixed-language deployments are common; safer default).
- splitSentences prefix regex sources from the union; sentences like "Ho incontrato il Sig. Rossi alla riunione" no longer split mid-honorific in any of the supported languages.

Bug #49: single-word "Yes." never flushed
- DEFAULT_MIN_WORDS_FOR_SHORT_FLUSH lowered from 2 → 1; single-word replies ending in "."/"!"/"?" now flush on the terminator.
- New gate #6 in maybeShortFlush blocks flushes whose trailing word is a known honorific; prevents "Mr." / "Sig." escaping as a standalone sentence.
- Legacy escape hatch: pass `minWordsForShortFlush=2` to restore the pre-fix behaviour.

Tests: 22 Python + 21 TS new honorific cases; 12 + 12 single-word flush cases. Existing tests updated where they asserted the old buffered behaviour for single-word replies. Both suites green (Py 1538, TS 1224).

* docs(changelog,chore): unreleased section + tool_executor docstring + silero-vad lint

- CHANGELOG.md: comprehensive Unreleased section covering reorg, provider enums, error taxonomy, bug-fix wave, perf wins, and cleanup work landing on this branch.
- tool_executor.py: add module-level docstring describing the SSRF guard, response-size cap, and OTel span emission.
- silero-vad.ts:127: replace stale @ts-expect-error directive (now a TS2578 warning since onnxruntime-node types resolve at build) with a plain comment explaining the optional-peer-dep dynamic import.

* refactor(layout,py): split internal layout into telephony/, audio/, tools/

Internal restructure of the Python SDK; PUBLIC API surface unchanged.

- handlers/{twilio,telnyx,common}_handler.py → telephony/{twilio,telnyx,common}.py ("_handler" suffix dropped; the parent module name already conveys intent). stream_handler.py promoted out of handlers/ to package root since it's the per-call orchestrator, not a telephony adapter. handlers/ folder removed.
- services/{transcoding,pcm_mixer,background_audio}.py → audio/* (audio pipeline collected in one place).
- services/{tool_decorator,tool_executor}.py → tools/* (tool-decoration & webhook-execution kept together).
- Other services/* stay where they are: llm_loop, metrics, sentence_chunker, text_transforms, ivr, fallback_provider, pipeline_hooks, chat_context, call_log, remote_message.
- tts/ and stt/ namespaces kept: they expose getpatter.{tts,stt}.<provider>.{TTS,STT} with env-var auto-resolve and are public surface.
- File moves use git mv so blame/history follow.
- Imports rewritten across providers, server, services, tests, and package-root re-exports.

Python tests: 1538 passed. TS side ships in a separate commit.

* refactor(layout,ts): mirror Python telephony/, audio/, tools/ layout

TS internal restructure for parity with the Python d5d9391 commit. Public API surface unchanged.

- carriers/{twilio,telnyx}.ts → telephony/{twilio,telnyx}.ts (rename for naming parity with Py; "carrier" was the original term, "telephony" reads better next to twilio/telnyx).
- transcoding.ts → audio/transcoding.ts.
- services/background-audio.ts → audio/background-audio.ts.
- tool-decorator.ts → tools/tool-decorator.ts.
- Imports rewritten across client, index, types, stream-handler, deepfilternet-filter, plus 5 test files.

TS tests: 1224 passed, tsc --noEmit clean. The telephony/audio/tools triad now matches between Python and TypeScript SDKs.

* docs(claude.md): reflect new telephony/audio/tools layout

Update per-library AI-agent quickstarts to match the post-restructure package tree. Adds the new folder names (telephony/, audio/, tools/) and a one-line description per folder.
* docs(changelog): document internal layout reorg in Unreleased section

* docs(getpatter): fill missing docstrings in services/llm/tts/stt/observability/dashboard/top-level

* docs(getpatter): fill missing docstrings in providers/telephony/audio/tools

Adds 1-3 line docstrings to public symbols (modules, classes, methods) in libraries/python/getpatter/{providers,telephony,audio,tools} that previously had none. No behaviour changes; pre-existing docstrings are left untouched.

* docs(getpatter-ts): fill missing JSDoc in providers/telephony/audio/tools

* docs(getpatter-ts): fill missing JSDoc in services/llm/tts/stt/observability/dashboard/top-level

Adds short JSDoc summaries to public classes, interfaces, type aliases, and exported functions that were missing them. Existing JSDoc is preserved verbatim; this is a fill-the-gaps pass only, no rewrites.

* docs(changelog): note docstring/JSDoc sweep in Unreleased

* chore(comments): update sdk-py/sdk-ts path refs after libraries/ reorg

Mechanical replace of stale path strings in docstrings, comments, and .env.example headers:
- sdk-ts/src/* → libraries/typescript/src/*
- sdk-py/getpatter/* → libraries/python/getpatter/*
- conceptual "(sdk-py)" → "the Python SDK"

No behaviour change; tests still 1538 passed, tsc clean.

* chore(ts): remove playwright e2e tests + devDeps

The e2e Playwright suite (tests/e2e/*.spec.ts + playwright.config.ts + @playwright/test / playwright devDeps) was inherited from an earlier "comprehensive test suite" PR but never integrated with downstream test infra after the libraries/ reorg. Per CLAUDE.md, end-to-end call testing lives in a separate downstream test repo.

- Drop libraries/typescript/playwright.config.ts.
- Drop libraries/typescript/tests/e2e/ (6 .spec.ts + test-server.ts).
- Remove @playwright/test and playwright from package.json devDeps.
- Refresh package-lock.json (npm install).
- silero-vad.ts: switch back to @ts-ignore on the optional onnxruntime-node import; the dynamic-import line surfaces a TS7016 warning when types are unresolved post-lock-refresh.

* feat(parity): port DefaultToolExecutor, LLMChunk, builtin_clip_path, select_sound_from_list, resample_24k_to_16k from TypeScript SDK

Closes 5 public-surface gaps in the Python SDK so every symbol exported from libraries/typescript/src/index.ts now has a Python counterpart.

- ``DefaultToolExecutor``: async tool dispatcher with retry/fallback, webhook SSRF validation via ``server.validate_webhook_url``, and the same JSON error shape as the TypeScript class. Added to ``services/llm_loop.py``.
- ``LLMChunk``: frozen dataclass mirror of the TS ``LLMChunk`` interface (text/tool_call/done/usage). ``to_dict()`` produces the same shape as ``OpenAILLMProvider.stream`` for callers that prefer dicts.
- ``builtin_clip_path``: top-level helper resolving ``BuiltinAudioClip`` values (or raw filenames) to absolute paths. ``BuiltinAudioClip.path`` now delegates to the new function for a single source of truth.
- ``select_sound_from_list``: promoted from a private static method on ``BackgroundAudioPlayer`` to a public top-level function. The static method is preserved as a backward-compatible delegator.
- ``resample_24k_to_16k``: stateless one-shot helper following the existing ``resample_8k_to_16k`` / ``resample_16k_to_8k`` convention, including the per-process ``DeprecationWarning`` latch.

All five symbols are re-exported from ``getpatter.__init__`` and listed in ``__all__``. The five ``TODO(parity)`` markers are removed in the same commit.

25 unit tests added in ``tests/unit/test_parity_ports.py`` covering public-symbol reachability, ``LLMChunk`` round-trip, real handler/webhook dispatch through ``DefaultToolExecutor`` (including the SSRF-blocked branch), bundled clip resolution, weighted selection empirics, and equivalence of ``resample_24k_to_16k`` to a single-shot ``StatefulResampler``.
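For readers unfamiliar with the frozen-dataclass pattern mentioned for ``LLMChunk``, a rough sketch is shown below. The class name, field types, and the choice to omit ``None`` fields from the dict are assumptions made for illustration; the actual ``LLMChunk`` definition and ``to_dict()`` shape are not reproduced in this commit message.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class LLMChunkSketch:
    """Illustrative stand-in with the four fields named in the commit."""

    text: Optional[str] = None
    tool_call: Optional[Dict[str, Any]] = None
    done: bool = False
    usage: Optional[Dict[str, Any]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Assumption for this sketch: unset (None) fields are left out.
        return {key: value for key, value in asdict(self).items() if value is not None}


chunk = LLMChunkSketch(text="Hello")
print(chunk.to_dict())
```

Freezing the dataclass makes each chunk immutable once created, so a chunk handed to several consumers cannot be modified by one of them behind the others' backs.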
Tests: 1546 → 1571, all passing.

* docs: document ErrorCode enum and disable_phone_preamble

* docs: document PricingUnit, OpenAITTS targetSampleRate, WS carrier auto-detect, and LLM primitives

* docs: document SentenceChunker language, provider enums, and audio helpers

* fix(ci): defer silero_onnx imports + remove dead E2E workflow

CI failures on PR #83:

1. Python SDK Tests (3.11/3.13) and Security Tests fail with ModuleNotFoundError: No module named 'numpy'. Cause: a recent commit eagerly re-exported OnnxExecutionProvider and SileroOnnxSampleRate from getpatter/__init__.py, which transitively imports numpy via silero_onnx. Move the re-export into the existing __getattr__ lazy path (parallel with SileroVAD / KrispSampleRate) so importing getpatter no longer requires the silero extra.
2. E2E Tests job tries to run playwright on a folder we deleted in 47a97f0. Drop the job; E2E lives in a downstream test repo.

Local verify: `python3 -c 'import getpatter'` works with numpy blocked via meta_path; pytest still 1563 passed.

* release: 0.6.0

Bump 0.5.5 → 0.6.0 for the cleanup + restructure release. Minor bump because of two breaking changes:

1. Agent.provider is now a closed enum (ProviderMode); arbitrary strings error.
2. Internal import paths reorganised: callers reaching into getpatter.handlers.* / getpatter.services.{transcoding,pcm_mixer,background_audio,tool_decorator,tool_executor} must migrate to the new telephony/, audio/, tools/ folders.

Public top-level imports (`from getpatter import Patter`) are unchanged. Migration guidance and the full surface diff live in CHANGELOG.md under the 0.6.0 section.
- libraries/python/pyproject.toml: 0.5.5 → 0.6.0
- libraries/python/getpatter/__init__.py: __version__ = "0.6.0"
- libraries/typescript/package.json: 0.5.5 → 0.6.0
- CHANGELOG.md: rename Unreleased section to 0.6.0 (2026-05-03)

* fix(silero-vad): align defaults with upstream Silero VAD

Three SileroVAD.load defaults were tuned for telephony in an earlier pass, but they diverged from the upstream Silero defaults documented in snakers4/silero-vad utils_vad.py. The upstream values are conservative and well-vetted; align with them so callers who follow the official Silero docs get consistent behaviour.

| param                | old (telephony-tuned) | upstream Silero | new    |
|----------------------|-----------------------|-----------------|--------|
| min_speech_duration  | 0.05 s (50 ms)        | 250 ms          | 0.25 s |
| min_silence_duration | 0.55 s (550 ms)       | 100 ms          | 0.10 s |
| prefix_padding       | 0.5 s (500 ms)        | 30 ms           | 0.03 s |

Activation threshold (0.5) and the derived deactivation threshold (max(t-0.15, 0.01) ≈ 0.35) already matched upstream and stay unchanged. Both SDKs match byte-for-byte; no test references the prior literals.

Tests: Py 1563 passed, TS 1224 passed, tsc --noEmit clean.

Callers that previously relied on the telephony-tuned defaults can restore them explicitly via `SileroVAD.load(min_speech_duration=0.05, min_silence_duration=0.55, prefix_padding_duration=0.5)` (Py) or the `minSpeechDuration: 0.05, minSilenceDuration: 0.55, prefixPaddingDuration: 0.5` options (TS).

* feat(silero-vad): auto-VAD in pipeline mode + forPhoneCall preset + robust ONNX path

Three changes that make SileroVAD usable out of the box. Closes the "have to recreate VAD every time" complaint and removes the createRequire(import.meta.url).resolve("getpatter") workaround callers needed under bundlers that break import.meta.url.

1. Auto-VAD in pipeline mode
   - PipelineStreamHandler auto-loads SileroVAD.for_phone_call (Py) / SileroVAD.forPhoneCall (TS) when agent.vad is not provided.
   - Falls back silently to the STT-endpoint heuristic when the silero extra / onnxruntime-node is not installed.
   - No opt-out flag: auto-VAD is a strict upgrade over the heuristic when the optional dep is available.

2. forPhoneCall / for_phone_call factory
   - Convenience wrapper around load(...) that pre-applies the telephony preset (sample_rate=16000, min_silence_duration=1.0).
   - Pass overrides for noisy environments (e.g. minSilenceDuration=1.5 for tunnel + speakerphone echo).

3. Robust ONNX model resolution (TS)
   - silero-vad.ts now probes 4 anchor candidates (__dirname, import.meta.url, createRequire(import.meta.url).resolve("getpatter/package.json"), createRequire(cwd).resolve) crossed with 3 path shapes (<dir>/resources/, <dir>/../resources/, <dir>/dist/resources/). Mirrors the user-side workaround directly inside the SDK so callers stop needing it.

Tests: Py 1563 passed, TS 1224 passed, tsc --noEmit clean.

* fix(tunnel): block phone.ready until tunnel hostname is publicly resolvable

Outbound calls placed immediately after `phone.ready` would race the public DNS edge: cloudflared returns the trycloudflare.com URL the moment its control plane has issued it, but the edge can take several seconds to start serving the hostname. Twilio (and any webhook caller) gets HTTP 502 "Unknown host" and the call is torn down before ever reaching the WS media stream.

`phone.ready` now blocks until:

1. Embedded server is in `listen` state (existing behaviour)
2. Tunnel hostname resolves through public DNS (1.1.1.1 / 8.8.8.8)
3. 2.5 s grace window passes for the cloudflared origin bridge

DNS resolution bypasses the OS resolver to dodge macOS mDNSResponder's aggressive NXDOMAIN cache.

Why DNS-only and not full HTTP probing: trycloudflare quick tunnels frequently fail same-host loopback (NAT hairpinning / IPv4 vs IPv6 race) even when the URL is reachable from external hosts. Twilio's edge resolves via public DNS, so DNS resolution is the correct proxy for "Twilio can reach us".
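The readiness wait described in this commit (retry the DNS lookup with capped exponential backoff inside a total budget, then hold for the grace window) can be sketched roughly as below. The function names are hypothetical, the resolver is injected as a plain callable rather than the SDK's own DNS code, and the doubling factor is an assumption; the commit only states the 250 ms start, the 2 s cap, and the 30 s budget.

```python
import time
from typing import Callable, Iterator, Optional


def backoff_delays(initial: float = 0.25, cap: float = 2.0, factor: float = 2.0) -> Iterator[float]:
    """Yield retry delays that grow from `initial` and saturate at `cap`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def wait_until_resolvable(
    hostname: str,
    resolve: Callable[[str], Optional[str]],  # hypothetical: returns an IP or None
    budget_s: float = 30.0,
    grace_s: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return the resolved address, or None if the budget runs out first."""
    deadline = clock() + budget_s
    for delay in backoff_delays():
        address = resolve(hostname)
        if address is not None:
            # The hostname resolves publicly; wait out the grace window.
            sleep(grace_s)
            return address
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never sleep past the overall deadline.
        sleep(min(delay, remaining))
    return None
```

Injecting `resolve`, `sleep`, and `clock` keeps the loop testable without network access or real waiting.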
- Python: raw UDP DNS query parser (~30 lines, no new dependency). Smoke-test verified against cloudflare.com (returns IP) and non-existent host (returns None). Fallback 1.1.1.1 → 8.8.8.8.
- TypeScript: Node's `dns.Resolver` with custom servers (3 lines, c-ares built-in already bypasses OS cache).
- Static / explicit-webhookUrl callers skip the probe (the operator already knows the host is up).
- 30 s total timeout with exponential backoff (250 ms → 2 s capped).

Cleanup pass on the surrounding docstrings: aligned the Py rationale with the TS one (the previous Py docstring incorrectly claimed an HTTP /health probe). Both now explain DNS-only + grace as a single coherent design choice.

Tests: Py 1563 passed, TS 1224 passed, tsc --noEmit clean.

* fix: pipeline-mode hardening - assemblyai handshake race, barge-in trailing chunk, tunnel resolver budget, alarm fatigue cleanup

Five small but customer-visible fixes uncovered while running outbound pipeline tests on this branch.

1. AssemblyAISTT.sendAudio: silently drop audio on WS not OPEN
   - Mirrors Deepgram / Cartesia / Soniox / Telnyx parity. Twilio starts streaming media frames immediately on call connect; the first ~10–25 frames (200–500 ms) race the AssemblyAI WS handshake. The previous `throw` propagated out of `handleAudio` and killed the call. Lost frames during connect are acceptable (the user is still saying "Hello"), and the connect path retries on close.
   - Architectural race in server.ts (handleAudio fires concurrently with handleCallStart) is flagged out-of-scope; this fix is the symptom-level guard.

2. afterSynthesize hook + barge-in race
   - Both Py and TS pipelines re-check `isSpeaking` after the `afterSynthesize` hook returns. The hook's await yields long enough for the VAD path to fire BARGE-IN and flip `isSpeaking` to false; without the re-check, exactly one trailing TTS chunk (~20–100 ms of audio) raced past the cancel and prolonged the perceived "agent didn't stop" window.

3.
SileroVAD.for_phone_call / forPhoneCall: match upstream defaults
   - Stop pinning min_silence_duration=1.0s. The factory now only pins sample_rate=16000 (the only rate Patter's pipeline-mode bus uses); every other parameter mirrors snakers4/silero-vad upstream defaults. Docstring documents the override path (min_silence_duration=0.5–1.0 s) for deployments that experience truncation on natural pauses.

4. Tunnel reachability: c-ares budget fix + 60s ceiling
   - `Resolver({ timeout: 1500, tries: 1 })` overrides c-ares's default 5000 ms × 4 = up to 20 s per resolve4 call so the outer retry loop actually retries. Without this each NXDOMAIN burned 5–20 s of wall-clock and the budget ran out after 1–2 attempts.
   - Total budget raised 30 s → 60 s for slow Cloudflare propagation.

5. Stale "Pipeline mode without VAD" warning removed (Py + TS)
   - The warning fired even now that auto-VAD lands a working SileroVAD when onnxruntime is installed. Operators saw a scary warning AND a successful auto-VAD log on every call: alarm-fatigue territory. The auto-VAD path already logs a single accurate message in the rare case the load fails, so the server-level warning is pure noise.

Tests: Py 1563 passed, TS 1224 passed, tsc --noEmit clean.

* fix(llm-loop): cancel in-flight LLM stream on barge-in (architectural)

Barge-in used to set llmAbort/llm_cancel_event but the upstream provider fetch was never aborted; the loop only checked ``signal.aborted`` between tokens. With no token arriving (fetch blocked on the network response), the loop sat blocked until the provider's own 30 s timeout.

Symptom: after the user interrupted the agent, the next utterance was queued but never processed because ``transcriptProcessing`` / the equivalent Py guard stayed true until the original LLM fetch timed out, so the agent stayed silent up to 30 s.

This commit plumbs the per-turn cancel signal end-to-end:

TypeScript
- New ``LLMStreamOptions { signal?: AbortSignal }`` shape.
- ``LLMProvider.stream`` accepts ``opts?: LLMStreamOptions``; ``LLMLoop.run`` accepts ``opts`` and forwards it.
- Built-in ``OpenAILLMProvider`` and the four standalone providers (cerebras, anthropic, groq, google) now combine ``opts.signal`` with their existing 30 s timeout via ``AbortSignal.any([...])`` so a barge-in tears the fetch down immediately.
- ``runPipelineLlm`` passes ``{ signal: llmSignal }`` into ``llmLoop.run``.

Python
- ``LLMProvider.stream`` accepts ``cancel_event: asyncio.Event | None``; ``LLMLoop.run`` forwards it.
- Built-in ``OpenAILLMProvider``, plus ``cerebras_llm``, ``anthropic_llm``, ``google_llm`` now check the event between upstream chunks and short-circuit (``await response.close()`` / break out of ``async with`` / ``return``).
- ``PipelineStreamHandler`` passes ``self._llm_cancel_event`` into ``llm_loop.run``.

Backward compat: every parameter is keyword-only with a default of None; existing callers keep working. Test mocks updated with ``**_kwargs`` so they accept the new kwarg without rewrites.

Tests: Py 1563 passed (10 mock signatures patched), TS 1224 passed, tsc --noEmit clean.

* fix(llm-loop,ts): mergeAbortSignals polyfill - Node 18 compat

The previous commit's signal-merge used AbortSignal.any() unconditionally, but that API only landed in Node 20.3. Patter's engines.node says ">=18.0.0" so users on Node 18.x would have crashed with "AbortSignal.any is not a function" on the first LLM call, a worse-than-original-bug regression introduced by the cancel fix.

Add a small ``mergeAbortSignals(...signals)`` helper exported from ``llm-loop.ts``:
- Falls through to ``AbortSignal.any(filtered)`` when available (Node 20.3+, browsers).
- Polyfills via aggregating ``AbortController`` + ``addEventListener('abort')`` listeners on Node 18 / older runtimes.
- Single-signal inputs short-circuit to the input itself (no allocation).
All five LLM provider stream sites (built-in OpenAI in llm-loop.ts + the four standalones cerebras / anthropic / groq / google) now call ``mergeAbortSignals(opts?.signal, AbortSignal.timeout(30_000))`` which behaves identically on Node 20.3+ and gracefully on Node 18.

Tests: TS 1224 / lint clean. Py 1563.

* fix(client): clear tunnel-owned webhookUrl on disconnect()

Bug reproduced today against the published 0.6.0 dist:

1. Patter.serve() starts a cloudflared tunnel and stores the freshly-minted hostname in localConfig.webhookUrl so subsequent call()s in the same process resolve to the right host.
2. Patter.disconnect() stops the tunnel handle and embedded server but never clears localConfig.webhookUrl.
3. Plugins / integrations that wrap Patter often dispose+rebuild on agent-identity changes via disconnect() → serve(). On the second serve() the stale webhookUrl is still set AND the constructor still wants a Cloudflare tunnel → the guard "Cannot use both tunnel: true and webhookUrl. Pick one." throws and the plugin breaks.

Fix tracks ownership of the webhookUrl: a new private flag ``tunnelOwnsWebhookUrl`` (TS) / ``_tunnel_owns_webhook_url`` (Py) goes true the moment serve() pulls the hostname out of the tunnel handle. disconnect() clears the field IF AND ONLY IF the flag is set, leaving explicit / Static-tunnel hostnames in place. It also drops the ``ready`` / ``tunnel_ready`` deferreds so a follow-up serve() recreates them fresh; without this the next ``await phone.ready`` resolved with the previous lifecycle's hostname.

- libraries/typescript/src/client.ts: tunnelOwnsWebhookUrl flag, set in serve() after tunnel start, cleared in disconnect() along with the Promise pair (re-derived with pre-resolve when an explicit webhookUrl is still configured).
- libraries/python/getpatter/client.py: parity port.
Frozen-dataclass config gets ``replace(webhook_url="")``; deferreds are cleared to None so the lazy ``ready`` / ``tunnel_ready`` properties recreate them on next access.

Tests added (5 Py + 4 TS):
- disconnect clears the tunnel-owned webhookUrl
- explicit webhookUrl is preserved across disconnect
- disconnect is idempotent
- ready / tunnelReady are fresh handles after disconnect
- _server reference is null after disconnect

Suites: Py 1566 (+3), TS 1228 (+4), tsc clean.

* feat(audio): NLMS acoustic echo cancellation (opt-in)

Bug #2 from the barge-in audit: on speakerphone / tunnel-loop deployments the agent's outbound TTS bleeds back into the mic. VAD sees that bleed as continuous voice-like energy and never transitions out of "speaking" state, so a caller interruption only registers during natural TTS pauses → "interrupt sometimes works, sometimes the agent keeps talking" intermittent symptom.

Fix at the source: proper acoustic echo cancellation. NLMS adaptive filter (2048 taps @ 16 kHz, 128 ms history) subtracts an estimate of the TTS-derived echo from the mic stream before VAD/STT see it. Geigel double-talk detector freezes adaptation when the caller is speaking on top of the agent so the filter does not learn the user's voice as part of the channel response.

Convergence on the synthetic narrowband test signal:
- ~24 dB ERLE after 1 s of TTS-only training
- Near-end speech preserved within 0 dB during double-talk

Not a drop-in replacement for WebRTC AEC3 (state-of-the-art needs adaptive sub-band processing + comfort noise + nonlinear post-filter that this scope does not cover). For production-grade quality, wrap a binding to ``webrtc-audio-processing-2`` externally.

- libraries/python/getpatter/audio/aec.py: NlmsEchoCanceller class.
- libraries/typescript/src/audio/aec.ts: TS parity.
- Agent.echo_cancellation / AgentOptions.echoCancellation: opt-in flag, default false.
Handset / headset deployments don't need it and the 0.5–2 s convergence period would briefly attenuate caller speech if they spoke before any TTS played.
- PipelineStreamHandler.start() (Py) / StreamHandler.initPipeline (TS) instantiate the canceller when the flag is on. Far-end tap fires before the carrier transcode in synthesizeSentence; near-end runs after the inbound 8k→16k resample, before VAD.
- 8 unit tests per SDK covering convergence, double-talk preservation, construction validation, pass-through-before-priming, reset, empty buffers.

Tests: Py 1574 passed (+8), TS 1236 passed (+8), tsc clean.

* fix(client,py): expose aggressive_first_flush / disable_phone_preamble / echo_cancellation in agent() builder

Pre-existing parity violation surfaced during the AEC audit: the Py ``Patter.agent(...)`` builder enumerates kwargs explicitly, so any field not listed silently drops on the floor. Three boolean flags on the Agent dataclass (``aggressive_first_flush``, ``disable_phone_preamble``, and the freshly added ``echo_cancellation``) were unreachable through the builder, forcing users to construct ``Agent(...)`` directly. TS does not have this problem because ``agent(opts)`` spreads the whole ``AgentOptions`` object, so every field passes through.

Add the three flags to the Py builder signature and forward them to ``Agent(...)``. Defaults match the dataclass (all ``False``) so existing callers keep their behaviour.

2 new tests:
- builder defaults match dataclass defaults (no silent True leak)
- explicit ``aggressive_first_flush=True`` / ``disable_phone_preamble=True`` / ``echo_cancellation=True`` reach the resulting Agent

Tests: Py 1576 passed (+2), TS 1236 unchanged, tsc clean.
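As background for the NLMS echo canceller introduced in the feat(audio) commit above, the core update rule with a Geigel-style double-talk gate can be sketched in a few lines of pure Python. This is a toy under stated assumptions, not the SDK's ``NlmsEchoCanceller``: the class name, the 32-tap length (chosen so the example runs quickly), and the synthetic two-tap echo path are all illustrative.

```python
import math
import random


class NlmsSketch:
    """Toy sample-by-sample NLMS echo canceller with a Geigel-style gate."""

    def __init__(self, taps: int = 32, step: float = 0.1, rho: float = 0.6, eps: float = 1e-8):
        if taps <= 0 or not 0.0 < step < 2.0:
            raise ValueError("taps must be positive and step must lie in (0, 2)")
        self.weights = [0.0] * taps
        self.history = [0.0] * taps  # far-end samples, newest first
        self.step = step
        self.rho = rho
        self.eps = eps

    def process(self, far: float, near: float) -> float:
        """Take one far-end (TTS) and one near-end (mic) sample; return the residual."""
        self.history.insert(0, far)
        self.history.pop()
        estimate = sum(w * x for w, x in zip(self.weights, self.history))
        error = near - estimate
        # Geigel test: a mic sample louder than rho * recent far-end peak is
        # treated as caller speech, so the weights are not updated for it.
        far_peak = max(abs(x) for x in self.history)
        if abs(near) <= self.rho * far_peak:
            norm = sum(x * x for x in self.history) + self.eps
            gain = self.step * error / norm
            self.weights = [w + gain * x for w, x in zip(self.weights, self.history)]
        return error


def erle_db(mic, residual) -> float:
    """Echo return loss enhancement: mic energy over residual energy, in dB."""
    mic_energy = sum(s * s for s in mic)
    residual_energy = sum(s * s for s in residual) + 1e-30
    return 10.0 * math.log10(mic_energy / residual_energy)


rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
# Synthetic echo path: two delayed, attenuated copies of the far-end signal.
mic = [
    0.3 * (far[n - 3] if n >= 3 else 0.0) + 0.15 * (far[n - 7] if n >= 7 else 0.0)
    for n in range(len(far))
]
aec = NlmsSketch()
residual = [aec.process(f, m) for f, m in zip(far, mic)]
print(round(erle_db(mic[-500:], residual[-500:]), 1))
```

The printed value is the ERLE over the last 500 samples. On this noise-free synthetic signal it should be high because the filter can model the echo path exactly; real rooms and carriers behave far less kindly.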
* fix(audio): NLMS AEC - 512 taps + adaptive step for fast convergence

Real cellular-call test on 0.6.0 with the initial 2048-tap + constant-0.1-step config exposed an 8–12 s convergence window during which the user's first turn was either over-cancelled to silence (filter eats voice while learning the channel) or contaminated by residual echo (Deepgram transcribes garbage and discards). The user report: ~11 s of perceived silence after firstMessage, then everything worked from turn 4 onward. Net first-turn UX was worse than no AEC.

The architectural fix the user asked for ("source-level, no workaround, solid"): two NLMS hyperparameter changes that compress convergence into the first ~250 ms, the same window where the agent's firstMessage finishes playing.

1. **512 taps (was 2048)**: 4× fewer coefficients to converge with no measurable cancellation loss on cellular / VoIP paths whose RT60 stays under ~50 ms after the carrier's own echo suppression. Pass ``filter_taps=2048`` explicitly for landline hairpin loops where the tail extends beyond 32 ms @ 16 kHz.
2. **Adaptive step**: aggressive warm-up step (0.5) for the first 0.5 s of processed audio, then taper to the textbook 0.1 for steady-state tracking. The Geigel double-talk detector still gates updates so the larger step does not learn the caller's voice into the echo model.

Verification: regression-test fed a broadband synthetic signal (3 sinusoids + white noise) in realistic 20 ms frames hits **17–19 dB ERLE in the very first 250 ms** with the new defaults, well above the previous 0 dB at the 1.25 s mark.

- New constructor knobs: ``warmup_step_size`` (default 0.5), ``warmup_seconds`` (default 0.5). Step branch is constant within a frame so the inner sample-by-sample loop stays branch-free.
- Validation extended for the two new fields.
- ``reset()`` now clears the ``processed_samples`` counter so the warm-up window re-engages on filter reset.
- 1 new regression test per SDK enforces the "≥10 dB ERLE in the first 250 ms with defaults" guarantee on a broadband signal.

Tests: Py 1577 passed (+1), TS 1237 passed (+1), tsc --noEmit clean.

* fix(stream-handler): firstMessage isSpeaking + AEC tap; barge-in-only ring flush

Two fixes for the speakerphone "agent unresponsive on first turn / mid-call gets stuck after a few exchanges" symptom reported on 0.6.0 cellular tests.

1. firstMessage was bypassing beginSpeaking + AEC far-end tap

   The ``firstMessage`` block streamed TTS chunks directly to the carrier without (a) marking ``isSpeaking=true`` and (b) pushing each chunk to ``aec.pushFarEnd()``. Consequence on speakerphone: while the intro played, the self-hearing guard never engaged, the user's audio (mixed with TTS bleed) was forwarded to STT and produced garbage transcripts; AEC had no reference signal so the bleed survived in the inbound channel.

   Wraps the firstMessage TTS streaming loop in ``beginSpeaking()`` + ``try/finally { endSpeakingWithGrace() }`` and pushes each chunk to ``aec.pushFarEnd()`` before encoding for the carrier. Mirrors the per-turn behaviour of ``runPipelineLlm`` / ``_process_streaming_response``.

2. Ring buffer must NOT flush on natural turn end

   An earlier iteration also flushed ``inboundAudioRing`` from ``endSpeakingWithGrace`` so user audio captured during the agent's TTS that never tripped VAD would still reach STT. In practice this raced live STT input post-grace: the ring contained partially-cancelled echo (AEC still adapting) and possibly over-cancelled user voice (Geigel rho=0.6 misses quiet double-talk). Replaying on every turn produced phantom transcripts that confused the LLM and caused the "out of order responses + agent gets stuck" symptom the user observed mid-call. Reverted: flush only on real barge-in (where VAD confirmed user speech).
Audio captured during the agent's turn that VAD did not classify as speech is intentionally dropped at the next ``beginSpeaking``; the user can repeat themselves rather than have the LLM react to a stale phantom transcript.

The barge-in flush remains: extracted into ``flushInboundAudioRing()`` / ``_flush_inbound_audio_ring()`` helpers (clean refactor, 1 caller now).

Stale "2048 taps + 0.5–2 s convergence" log message updated to the post-AEC-tuning "512 taps + 0.5 s warmup μ=0.5 → ~250 ms convergence".

Tests: Py 1577 passed, TS 1237 passed, tsc --noEmit clean.

* fix(stream-handler): gate barge-in on minimum agent-speaking duration

The previous fix wrapped the firstMessage TTS in ``beginSpeaking`` + ``endSpeakingWithGrace`` so the self-hearing guard could engage during the intro. This worked, but exposed a second defect: the AEC filter needs ~500 ms of TTS reference to converge, and during that warmup window residual TTS bleed in the inbound mic stream still looks like speech to VAD. With ``isSpeaking=true`` from frame zero of the firstMessage, the very first chunk of bleed triggered an immediate barge-in cancel; the firstMessage was killed before a single byte had been played. Test reported "agent never speaks".

Fix: gate both barge-in entry points (VAD ``speech_start`` and transcript-based) on a 1-second minimum agent-speaking duration. Real users almost never start interrupting within the first second of an agent turn anyway, and the gate cleanly covers the AEC convergence period (500 ms warmup + safety margin).

- TypeScript: ``MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN = 1000`` static on ``StreamHandler``. New ``speakingStartedAt: number | null`` field set in ``beginSpeaking()`` and cleared in ``cancelSpeaking()`` and the grace flip. New ``canBargeIn()`` helper used by both barge-in sites; suppressed events log at debug level so call-debug logs still show why the cancel did not fire.
- Python: ``MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN = 1.0`` module-level constant.
``_speaking_started_at`` field with the same lifecycle. ``_can_barge_in()`` helper applied at the VAD speech_start path in ``on_audio_received`` and at the entry of ``_handle_barge_in``. Helper uses ``getattr`` so test fixtures that bypass ``_begin_speaking`` still permit barge-in to fire. 5 new unit tests (3 Py + 5 TS): - ``canBargeIn() / _can_barge_in()`` returns true with no active turn, false within the gate window, true past the gate window. - ``handleBargeIn / _handle_barge_in`` returns / does nothing during the warmup window, ``isSpeaking`` stays True. - ``handleBargeIn / _handle_barge_in`` fires normally past the gate. Tests: Py 1579 passed (+2), TS 1242 passed (+5), tsc --noEmit clean. * fix(stream-handler): warn that NLMS AEC is wrong for PSTN; keep it off-default The previous AEC commits added a server-side NLMS adaptive filter and exposed an ``echoCancellation`` flag. Real-call testing on cellular PSTN turned up a fundamental architectural mismatch the early benchmarks did not catch: the round-trip echo path on Twilio Media Streams is ~250–1500 ms (jitter buffer + carrier loop), but a 512-tap NLMS filter at 16 kHz can only see the most recent 32 ms of far-end samples. The echo never lands inside the filter's window, the weights stay near zero, and the filter silently no-ops. Worse, with ``isSpeaking=true`` during firstMessage and a barge-in gate of 1 s, once the gate releases any residual bleed reaching VAD triggers an immediate self-cancel — the agent stops talking right after starting. Industry consensus from this round of research: - LiveKit & Pipecat handle echo cancellation at the transport layer for browser/native paths only. - Twilio's own guidance is to "rely on network echo cancellers" for telephone scenarios. - Vapi, Retell, Bland do not run server-side AEC. They rely on the carrier's network echo suppression and the caller device's built-in AEC (modern handsets ship one). 
Server-side NLMS is the right tool only when the SDK owns the audio path end-to-end and the loop latency is on the order of the filter window (~30 ms — browser WebRTC, mobile native). PSTN does not meet that bar and never will under realistic carrier conditions. This commit: - ``echoCancellation`` stays opt-in (default false) so existing PSTN callers see no change in behaviour. - When ``echoCancellation: true`` is detected on a Twilio or Telnyx carrier, log a clear warning explaining why it will not work as intended and what to do instead. The filter is still instantiated so curious operators can compare; the warning makes the recommendation explicit. For PSTN deployments, the working stack is: Patter's self-hearing guard + 1 s barge-in cooldown + Silero VAD with the phone-tuned preset + carrier / handset native echo suppression. No server-side AEC. Tests: Py 1579 passed, TS 1242 passed, tsc --noEmit clean. * fix: barge-in robustness + AMD on-by-default + STT finalize + post-cancel drain Six architectural fixes for the post-barge-in failure modes surfaced during the 0.6.0 acceptance pass against real PSTN calls. Validated end-to-end on six pipeline stacks (Deepgram + Groq/OpenAI/Anthropic/Cerebras/Google + Cartesia/OpenAI TTS) with verbose Italian conversation flow. 1. Adaptive barge-in gate - MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_AEC = 1000 (covers AEC warmup) - MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_NO_AEC = 250 (anti-flicker only) - canBargeIn() picks the right gate based on whether AEC is wired. - Suppression call sites log at INFO level with the AEC state. 2. Inbound audio ring cap reduced from 30 frames (~600 ms) to 13 (~260 ms) to match VAD minSpeechDuration. Pre-fix, the replay was dragging in ~350 ms of agent TTS bleed which Deepgram (default English) transcribed as English garbage and committed to the LLM as phantom user input. 3. STT.finalize() on VAD speech_end - New optional finalize() on STTAdapter / STTProvider. 
- DeepgramSTT.finalize() exposes {type: 'Finalize'} as a public method. - StreamHandler calls stt.finalize() whenever the SDK's VAD signals speech_end so the provider returns is_final immediately rather than waiting on its own (slow) endpointing heuristic. 4. AMD on by default + onMachineDetection callback (Twilio + Telnyx parity) - New MachineDetectionResult carrier-agnostic shape. - Twilio: MachineDetection=DetectMessageEnd + AsyncAmd=true (no answer-latency penalty on human pickups). - Telnyx: answering_machine_detection=greeting_end. - Callback fires on both webhooks before the legacy voicemail-drop path so callers see the result regardless of voicemailMessage. 5. Post-cancel drain window of 150 ms - Tracks lastCancelAt timestamp on every barge-in cancel (both VAD-path and transcript-path). - beginSpeaking() is now async and awaits the drain remainder so the remote PSTN player has time to flush the cancelled turn's tail before the next TTS chunk lands on top of it. Eliminates the "doubled first sentence" audio artefact reported during testing. 6. AssemblyAI accepts a parity-only `language` field for cross-provider uniformity (forwarded as no-op; AssemblyAI selects language by model). Both SDKs (TypeScript and Python) updated with identical defaults, constants, and call-site coverage. Unit tests: TS 24/24 passing, Python 33/33 passing. Includes [DIAG] INFO logs in TS deepgram-stt.ts and stream-handler.ts for the diagnostic phase; these can be removed in a follow-up commit once the bleed-transcription root cause is sealed. * feat(sdk): tool platform overhaul + Realtime fixes + persist option + tunnel grace Bundles the SDK changes from a focused work session: 5 bug fixes + 6 new feature areas, with full Python ↔ TypeScript parity. Bug fixes --------- * fix(client): bump cloudflared quick-tunnel grace 2.5 → 5 s. 
The 2.5 s window covered HTTP propagation only — Twilio's WSS upgrade for the media stream goes through a different cloudflared edge route that takes ~1-3 s longer; ~5 % of first calls dropped silently at pickup with no media. 5 s drops the failure rate to <1 %. (client.ts / client.py) * fix(realtime): handler-only tools were silently ignored in TS Realtime mode (CRITICAL). `handleFunctionCall` only dispatched `webhookUrl` tools; tools with an in-process `handler` callback (the default pattern in the demos) fell through without sending `function_call_output`, hanging the model. * fix(realtime): `onTranscript({ role: 'assistant' })` was never fired. Assistant text was pushed into history but never surfaced via the user-supplied callback, so demos only saw `[user]` lines. * fix(realtime): dashboard transcript shown out of order. OpenAI Realtime emits `input_audio_transcription.completed` AFTER `response.done`, so the naïve push order was [assistant, user, ...]. Added a per-call buffer (`pendingAssistantTurn` + 3 s fallback timer) that holds the assistant turn until the matching user transcript arrives. * fix(realtime): tool invocations were invisible in the transcript timeline. Added `emitToolEvent` that pushes `role: 'tool'` history entries and fires `onTranscript({ role: 'tool', tool_name, tool_args, tool_result, ... })` for the call/return semantics. Features -------- * feat(api): `Patter({ persist })` opt-in dashboard persistence. The on-disk per-call records (metadata.json, transcript.jsonl, events.jsonl) were previously opt-in only via `PATTER_LOG_DIR`. New explicit option: `false` (default), `true` (platform default location), or a custom string path. Env var still works as deployment-time override. * feat(tools): JSON-schema validation at `agent()` build time + OpenAI strict-mode opt-in. 
Schemas are validated structurally for every tool; `Tool({ strict: true })` additionally enforces OpenAI's strict-mode requirements (recursive `additionalProperties: false`, every property in `required`). Catches typos at build time. * feat(tools): retry with exponential backoff + per-tool circuit breaker. Both handler and webhook paths now get 3 attempts with jittered backoff (capped at 5 s). New `CircuitBreakerRegistry` trips OPEN after 5 consecutive failures and stays OPEN for 30 s before allowing a HALF_OPEN probe; while OPEN it returns `{error, fallback: true, circuit_state: "open", retry_after_ms}`. * feat(tools): reassurance auto-message during long tool calls. New `Tool({ reassurance: "Let me check..." })` (or `{ message, afterMs }`) bridges the silence on slow tools by enqueueing the message via `OpenAIRealtimeAdapter.sendText` after `afterMs` (default 1500 ms) — cancelled if the tool returns earlier. Realtime-only for now. * feat(tools): MCP (Model Context Protocol) client integration (MVP). New `agent({ mcpServers: [...] })` plugs the agent into MCP servers (Google Workspace, PayPal, Postgres, GitHub, ...) without writing wrapper handlers. Each server is queried at call start via `tools/list`; discovered tools are wrapped with synthetic handlers that dispatch to `tools/call` and merged into `agent.tools`. Optional dependency: `@modelcontextprotocol/sdk` (TS) / `mcp` (Py extra). Streamable-HTTP transport only for now. * feat(tools): streaming results via async generator handlers. Tool handlers can be `async function*` (TS) / `async def ... yield` (Py) generators that emit `{ progress: "..." }` updates while running; each yield is sent to the agent via `sendText` for inline status. 
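A minimal sketch of the breaker behaviour described in the retry + circuit breaker entry above (OPEN after 5 consecutive failures, 30 s cool-down, then a HALF_OPEN probe). This is not the SDK's `CircuitBreakerRegistry`: the class name `CircuitBreaker`, its `call()` method, and the injectable `clock` are assumptions made for illustration, and only the fallback payload keys are taken from the commit message.

```python
import time
from typing import Any, Callable, Optional

CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"


class CircuitBreaker:
    """Illustrative per-tool breaker (hypothetical, not the SDK class)."""

    def __init__(
        self,
        failure_threshold: int = 5,      # consecutive failures before OPEN
        open_seconds: float = 30.0,      # cool-down before a HALF_OPEN probe
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self.state = CLOSED

    def call(self, fn: Callable[[], Any]) -> Any:
        if self.state == OPEN:
            remaining = self.open_seconds - (self._clock() - self._opened_at)
            if remaining > 0:
                # Short-circuit with the fallback shape named in the commit.
                return {
                    "error": "circuit open",
                    "fallback": True,
                    "circuit_state": "open",
                    "retry_after_ms": int(remaining * 1000),
                }
            self.state = HALF_OPEN  # cool-down elapsed: allow one probe
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success closes the breaker and resets the failure streak.
        self._failures = 0
        self._opened_at = None
        self.state = CLOSED
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed probe re-opens immediately; otherwise wait for the threshold.
        if self.state == HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = OPEN
            self._opened_at = self._clock()
```

Passing the clock in keeps the sketch testable without sleeping; a real registry would presumably keep one such breaker per tool name and wrap the retry loop rather than a single call.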
New files --------- * libraries/typescript/src/tools/schema-validation.ts * libraries/typescript/src/tools/circuit-breaker.ts * libraries/typescript/src/tools/mcp-client.ts * libraries/python/getpatter/tools/schema_validation.py * libraries/python/getpatter/tools/circuit_breaker.py * libraries/python/getpatter/tools/mcp_client.py * test files: 4 TS + 3 Py covering schema validation, breaker, streaming, reassurance Tests ----- 1280 TS · 1156 Py · 0 regressions. Updates two stale tests (AMD on-by-default test in new-features.test.ts; handler retry count in llm-loop.test.ts) to reflect new behaviour. * feat(dashboard): React + Vite SPA replaces inline HTML/CSS template The dashboard is now a real SPA in `dashboard-app/` (Vite + React + TypeScript) instead of a 700-line HTML/CSS/JS string embedded in `dashboard/ui.{ts,py}`. The build pipeline produces a single self-contained HTML file (vite-plugin-singlefile inlines JS + CSS) which is committed to `libraries/typescript/src/dashboard/ui.html` and mirrored to the Python package via `dashboard-app/scripts/sync.mjs`. At runtime the SDK serves the same `GET /` endpoint as before — the inlined HTML is loaded by tsup's esbuild ``text`` loader (TS) or the package-data file (Py). Customer-side: zero change in start-up UX (`phone.serve()` → http://127.0.0.1:8000/), but the dashboard is now typed, modular, and maintainable as proper React. Why this approach (option D from the design discussion): * No CDN dependency at runtime (no unpkg.com / Babel-in-browser). * No new runtime deps in the SDK — React + Vite live only at build time in the dev repo; the published package ships static HTML. * Self-contained bundle: the SDK still works air-gapped and behind corporate firewalls. * Type safety end-to-end (TSX components, tsconfig strict). 
Components ported from the reference design: * Topbar, PageHeader, Metric cards * CallTable with row selection + search * LiveCallPanel (transcript stream + call controls) * LatencyPanel (p50 / p95 / STT / TTS bars) * CostPanel (per-provider breakdown) Hooks: * useDashboardData — fetches `/api/dashboard/calls` + subscribes to the SSE stream at `/api/dashboard/stream` * useTranscript — incremental transcript updates per selected call * mappers.ts — maps the wire format (CallRecord) to the UI shape Build: * `dashboard-app/` is its own Vite project with `npm run build && npm run sync` — sync copies the inlined HTML to both SDKs. * `libraries/typescript/tsup.config.ts` adds the ``.html`` text loader so the asset is inlined into `dist/index.{js,mjs}`. * `libraries/python/pyproject.toml` declares `ui.html` as `getpatter.dashboard` package-data so it ships with `pip install`. * `libraries/typescript/package.json` `files` array includes `src/dashboard/ui.html` so npm packs it. * docs(changelog): unreleased entries for tool platform + Realtime fixes + persist Documents the two preceding commits in CHANGELOG.md under ``## Unreleased``: * Added: ``Patter(persist=...)`` option, JSON-schema validation + strict mode, retry + circuit breaker, reassurance auto-message, MCP client integration, streaming results. * Fixed: Realtime handler-tool dispatch, assistant ``onTranscript``, transcript ordering buffer, tool transcript events, cloudflared quick-tunnel WSS upgrade race. Per the project rule (``.claude/rules/documentation-best-practices.md`` invariant 0): every user-visible change updates ``## Unreleased`` in the same unit of work. The dashboard rewrite is intentionally NOT in the changelog — same URL, same UX, same `phone.serve()` entry point; the SPA migration is internal and customer-invisible. 
* chore(dashboard): commit Python sync of ui.html Mirror of the built dashboard SPA into the Python package — produced by ``dashboard-app/scripts/sync.mjs`` alongside the TS-side ``libraries/typescript/src/dashboard/ui.html``. Should have been part of the dashboard SPA commit; tracking it now keeps the two SDKs in parity for ``pip install getpatter``. * chore: black reformatting of test_dashboard.py Pure formatter pass — splits long argument lists across multiple lines and adds the missing blank line after the conditional ``import fastapi``. No logic changes; the test still verifies the dashboard store and routes the same way. * feat(dashboard): polish — real logo, range filter, interactive sparklines, realtime mode Iterative refinement of the React/Vite dashboard SPA shipped in 3877719. Customer-side it remains a single embedded HTML file served from `phone.serve()` at `/`, but the UX is now markedly closer to the target design. UI changes: * Real Patter logo: mark (wireframe stack-tile from the favicon path, thin stroke instead of the chunky filled silhouette in `docs/logo/light.svg`) + tightened-viewBox wordmark, sized independently so the wordmark stays large while the mark line weight stays light. * Tab title: "Patter | Dashboard". Favicon: stack-tile SVG inline, matching the previous SDK dashboard. * Topbar: dropped Bell / Settings / Avatar buttons and "Place call" CTA (will reintroduce when actually wired). Phone-number pill always shown, derived from the most recent call's Patter-side number. * Live chip pulse: peach static when zero calls, green pulsing when ≥1 is active. * Latency + Cost merged into one MetricsPanel with a peach segmented switcher, fixing the right-rail clipping that hid Cost on short viewports. Realtime mode collapses the STT/LLM/TTS waterfall to a single end-to-end bar (those metrics aren't meaningful when the provider does the round-trip in one model call). 
Range filter (1h / 24h / 7d / All) is now real: * Bucket strategy aligned to natural boundaries — 12 × 5min, 24 × 1h, 7 × 1day, plus 9-bucket auto for All. Tooltip ranges read as "11:00 → 12:00" instead of "11:39 → 12:33". * Filtered call list, headline counters (Calls / Latency p95 / Spend), and sparklines all reflect the active range. Live calls always stay visible even when out of the range so users see what's happening now. * Sparkline scaling: tallest bar normalises to 100, no more lonely single bar surrounded by ghost grey lines. Sparklines are now interactive: * Hover any bar → custom tooltip (instant, dark surface, mono numerics in peach) showing the bucket window, call count, and a 4-call sample (number / status / cost). React-driven, replaces the slow native `title=""`. * Click → selects the newest call in that bucket into the right rail. * Empty buckets are invisible (no grey ghosts). * Bars now sit flush against the card bottom (flex column + `margin-top: auto`), matching the original design. Export CSV button is now wired to `/api/dashboard/export/calls?format=csv` via a transient anchor download. Backend additions: none — every change above is in `dashboard-app/` plus the synced `ui.html` rebundles in both SDKs. Pre-publish flow is still `cd dashboard-app && npm run build && npm run sync`. * feat(tts): add Inworld TTS provider (TTS-2 default, NDJSON streaming) New TTS adapter calling Inworld's HTTP NDJSON streaming endpoint `POST https://api.inworld.ai/tts/v1/voice:stream`. Defaults to `inworld-tts-2` (sub-200 ms TTFB, 100+ languages, natural-language voice steering); pass `model: "inworld-tts-1.5-max"` for the prior generation. Default audio output is PCM_S16LE at 16 kHz so the result feeds straight into the Patter pipeline without transcoding. 
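As a rough illustration of consuming an NDJSON audio stream like the one described above, the sketch below decodes base64 audio from each JSON line and computes the duration of the resulting PCM_S16LE data at 16 kHz. The record key `audio` and the helper names are hypothetical and are not taken from Inworld's API or from the adapter itself; mono audio is also an assumption.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_pcm_chunks(lines: Iterable[bytes], audio_key: str = "audio") -> Iterator[bytes]:
    """Yield decoded audio bytes from NDJSON lines.

    `audio_key` is a hypothetical field name; records without it
    (for example timestamp-only records) are skipped.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank keep-alive lines
        record = json.loads(raw)
        payload = record.get(audio_key)
        if isinstance(payload, str):
            yield base64.b64decode(payload)


def pcm_duration_ms(pcm: bytes, sample_rate_hz: int = 16000) -> float:
    """Duration of PCM_S16LE audio, assuming mono (2 bytes per sample)."""
    return len(pcm) * 1000.0 / (2 * sample_rate_hz)
```

Because the decoded chunks are already 16-bit little-endian PCM at the pipeline's sample rate, they can be concatenated or forwarded frame by frame without resampling.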
Public API parity: - TS: `import { InworldTTS } from "getpatter"` / `getpatter/tts/inworld` - Py: `from getpatter import InworldTTS` / `getpatter.tts.inworld` - Env-var auto-resolve via `INWORLD_API_KEY` (paste the Base64 token from the Inworld dashboard — already in `Authorization: Basic <token>` form). - Optional knobs: `language` (BCP-47), `temperature` (TTS-1.5 only), `speakingRate` (0.5-1.5), `deliveryMode` (`EXPRESSIVE`/`BALANCED`/ `STABLE` — TTS-2 only), `bitrate`. Pricing entry `inworld` added to both pricing tables (placeholder $0.020/1k chars — verify against current platform tier). Optional dependency `getpatter[inworld]` adds `aiohttp>=3.10`. 7 mocked unit tests per SDK covering payload shape, NDJSON line interleave (`audio, timestamp, audio`), base64 audio decoding, optional field omission, env-var fallback, and non-200 error surfacing. New files: - libraries/typescript/src/providers/inworld-tts.ts - libraries/typescript/src/tts/inworld.ts - libraries/python/getpatter/providers/inworld_tts.py - libraries/python/getpatter/tts/inworld.py - libraries/{typescript,python}/{tests/unit/inworld-tts*.test.*,tests/unit/test_inworld_tts.py} * feat(observability): speech-edge events for turn-taking instrumentation Adds seven optional async callbacks to every Patter instance plus a read-only conversation_state snapshot, mirroring the public APIs of LiveKit Agents, Pipecat and OpenAI Realtime so downstream metrics map onto the canonical Hamming AI / Coval / Cekura voice-agent metric set without translation: on_user_speech_started - raw VAD positive edge on_user_speech_ended - raw VAD trailing edge (not EOU) on_user_speech_eos - committed end-of-utterance (canonical "user finished" — anchors eos_to_first_token_ms) on_agent_speech_started - first wire-time chunk (what user hears) on_agent_speech_ended - last wire chunk; payload includes interrupted on_llm_token - TTFT marker, fires once per turn on_audio_out - first TTS chunk per turn (warmup, distinct from wire-time) 
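One way the `eos_to_first_token_ms` metric mentioned above could be derived from two of these events is sketched below. The helper class is hypothetical: its method names echo the callback names, but the real callbacks are async and their payload shapes are not shown in this commit message, so plain millisecond timestamps are assumed here.

```python
from typing import List, Optional


class EosToFirstTokenTracker:
    """Hypothetical helper pairing end-of-utterance with the first LLM token."""

    def __init__(self) -> None:
        self._eos_at_ms: Optional[float] = None
        self.samples_ms: List[float] = []

    def on_user_speech_eos(self, timestamp_ms: float) -> None:
        # Committed end-of-utterance: start the latency clock for this turn.
        self._eos_at_ms = timestamp_ms

    def on_llm_token(self, timestamp_ms: float) -> None:
        # Only the first token after an EOS produces a sample.
        if self._eos_at_ms is None:
            return
        self.samples_ms.append(timestamp_ms - self._eos_at_ms)
        self._eos_at_ms = None
```

In a real integration the two methods would be called from the corresponding callbacks with whatever timestamp the event payload carries.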
Each event also records an OpenTelemetry span event on the current call span (patter.event.*), with gen_ai.* attributes for the LLM event per the OTel GenAI semconv. OTel branch is a zero-cost no-op when the peer dep is missing. Wired into the realtime stream handler so the user/agent edge events fire automatically on the OpenAI Realtime + Twilio/Telnyx path; LLM/TTS-warmup events are exposed on the dispatcher for adapter/pipeline integrations. Public API: SpeechEvents, SpeechEventCallback, ConversationStateSnapshot, UserState, AgentState, EouTrigger. Tests: 16 unit tests Py + 15 unit tests TS covering payload schema, state transitions, idempotency, OTel attach contract, callback-exception isolation, and Patter-level proxy mirroring. Motivated by patter-agent-runner's 15 turn-taking acceptance verbs that previously auto-skipped because the SDK did not surface per-side speech edges. * fix: Realtime first_message role swap, dashboard 404 spam, SDK plumbing for speech-edge events Three Realtime mode fixes (Python + TypeScript parity) plus the host- binding / observability plumbing required to drive the speech-edge event suite from external test runners. Realtime: first_message role swap ------------------…
Bumps actions/setup-python from 5 to 6.
Release notes
Sourced from actions/setup-python's releases.
... (truncated)
Commits
- a309ff8 Bump urllib3 from 2.6.0 to 2.6.3 in /tests/data (#1264)
- bfe8cc5 Upgrade @actions dependencies to Node 24 compatible versions (#1259)
- 4f41a90 Bump urllib3 from 2.5.0 to 2.6.0 in /tests/data (#1253)
- 83679a8 Bump @types/node from 24.1.0 to 24.9.1 and update macos-13 to macos-15-intel ...
- bfc4944 Bump prettier from 3.5.3 to 3.6.2 (#1234)
- 97aeb3e Bump requests from 2.32.2 to 2.32.4 in /tests/data (#1130)
- 443da59 Bump actions/publish-action from 0.3.0 to 0.4.0 & Documentation update for pi...
- cfd55ca graalpy: add graalpy early-access and windows builds (#880)
- bba65e5 Bump typescript from 5.4.2 to 5.9.3 and update docs/advanced-usage.md (#1094)
- 18566f8 Improve wording and "fix example" (remove 3.13) on testing against pre-releas...

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)