Modified README #17
Merged
Patter is now local-only. Remove cloud quickstart, self-hosting section, cloud vs local table, cloud architecture diagram, and cloud API examples. Update How It Works diagram to show embedded architecture.
6 tasks
nicolotognoni added a commit that referenced this pull request on Apr 21, 2026
Addresses the five failing CI checks on PR #66.

Telnyx integration tests (test_telnyx_{convai,pipeline,realtime}.py)
- ``_telnyx_stream_started`` / ``_telnyx_media_event`` / ``_telnyx_stream_stopped`` helpers migrated from the pre-0.4.4 ``{event_type, payload.audio.chunk}`` shape to the real Telnyx media-stream wire format ``{event, start|media.payload}`` (BUG #17/#18). Without this the bridge silently drops every test frame and 11 integration tests fail with "handler called 0 times".
- ``test_audio_format_pcm16`` renamed to ``test_audio_format_g711_ulaw`` and the assertion flipped: Telnyx is PCMU 8 kHz bidirectional (BUG #19), and Realtime runs on ``g711_ulaw``, so both legs stay pass-through.

sdk-ts/src/scheduler.ts
- Removed the trailing blank line that broke the pre-commit ``end-of-file-fixer`` hook.

.github/workflows/audit.yml
- Bandit's stock CLI doesn't support ``-f sarif``: install ``bandit-sarif-formatter`` alongside bandit, and guard the upload-sarif step with ``hashFiles`` so future formatter breakage doesn't fail the job.

Local verification: 802 passed, 4 skipped (sdk-py unit + integration).
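For readers unfamiliar with the frame shapes involved, here is a minimal sketch of test helpers that emit the ``{event, start|media.payload}`` wire format described above, plus a decoder that ignores the legacy ``event_type`` shape. The function names are hypothetical and are not the helpers in the test suite. The ``track`` field and its ``inbound`` default are taken from the BUG #19 notes elsewhere in this thread, not from this commit.

```python
import base64
import json
from typing import Optional


def telnyx_start_frame(caller: str, callee: str) -> str:
    """Build a media-stream ``start`` frame; caller/callee sit under ``start``."""
    return json.dumps({"event": "start", "start": {"from": caller, "to": callee}})


def telnyx_media_frame(audio: bytes, track: str = "inbound") -> str:
    """Build a ``media`` frame carrying base64-encoded audio bytes."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"event": "media", "media": {"track": track, "payload": payload}})


def telnyx_stop_frame() -> str:
    """Build a ``stop`` frame."""
    return json.dumps({"event": "stop"})


def decode_inbound_audio(raw: str) -> Optional[bytes]:
    """Return decoded audio for inbound ``media`` frames, else ``None``.

    Frames in the legacy ``event_type`` shape have no ``event`` key and are
    therefore ignored, which mirrors why the old helpers produced
    "handler called 0 times".
    """
    message = json.loads(raw)
    if message.get("event") != "media":
        return None
    media = message.get("media") or {}
    # Assumption: a missing ``track`` is treated as inbound.
    if media.get("track", "inbound") != "inbound":
        return None
    payload = media.get("payload")
    return base64.b64decode(payload) if payload else None
```

A test fixture would feed ``telnyx_start_frame(...)``, a few ``telnyx_media_frame(...)`` messages and ``telnyx_stop_frame()`` into the bridge in that order.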
nicolotognoni added a commit that referenced this pull request on Apr 21, 2026
…#66) * fix(deps): pin websockets>=14 and add python-multipart Fixes BUG #7 and #9 from acceptance suite. - websockets: pin >=14,<16. The 'additional_headers=' kwarg used by the OpenAI Realtime, Deepgram STT and ElevenLabs ConvAI adapters is only supported on the new asyncio client that became the default in 14.0. Under 13.x the call failed with 'got an unexpected keyword argument additional_headers', blocking every streaming provider. - python-multipart: add to the base install. Starlette >= 0.45 raises on 'await request.form()' without python-multipart installed, so every Twilio webhook returned 422 and the call was silently dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): repair Twilio & Telnyx webhook stack Fixes BUG #6, #8, #16 from acceptance suite. - #8 Request/Response import lifted to the top of server.py. With ``from __future__ import annotations`` in place, FastAPI's ``get_type_hints(handler)`` resolved the 'Request' annotation against module globals where only WebSocket was imported. The ForwardRef stayed unresolved, FastAPI classified the parameter as a query-string field and every Twilio/Telnyx webhook POST returned HTTP 422 before the handler body could run. Local mode was fundamentally broken on 0.4.3. - #6 dashboard tracking of failed outbound calls: new route ``POST /webhooks/twilio/status`` consumes Twilio statusCallback events (initiated/ringing/answered/completed/no-answer/busy/failed) and feeds them into MetricsStore.update_call_status. Operators now see every dialled attempt in the dashboard, including ones that never reach media. - #16 Telnyx Call Control: ``/webhooks/telnyx/voice`` now POSTs ``actions/answer`` on call.initiated and ``actions/streaming_start`` on call.answered against the REST API and returns empty HTTP 200. Previously the route returned a JSON ``{commands: [...]}`` body that Telnyx silently discards — the call rang forever. 
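The #8 failure mode can be reproduced with the standard library alone. The sketch below is not FastAPI's code. It calls ``typing.get_type_hints`` directly, uses a string annotation in place of ``from __future__ import annotations`` (both leave the annotation as a string), and defines a hypothetical ``Request`` class that is deliberately absent from module globals. Here the lookup raises ``NameError``; per the description above, FastAPI instead kept the unresolved ForwardRef and classified the parameter as a query-string field.

```python
import typing


def make_webhook():
    class Request:  # hypothetical stand-in, visible only inside this function
        """Plays the role of a class that was never imported at module level."""

    # The string annotation is what postponed evaluation would store.
    def webhook(request: "Request") -> None:
        return None

    return webhook, Request


webhook, request_cls = make_webhook()

# Resolution uses the function's module globals, where "Request" is missing.
try:
    typing.get_type_hints(webhook)
    resolved_without_import = True
except NameError:
    resolved_without_import = False

# Supplying the name (the equivalent of a top-level import) fixes resolution.
hints_with_import = typing.get_type_hints(webhook, globalns={"Request": request_cls})
```

Lifting the import to module scope, as the fix does, has the same effect as the explicit ``globalns`` here: the name becomes resolvable wherever the annotation string is evaluated.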
Twilio voice route also falls back to the ``Caller`` / ``Called`` form fields when ``From`` / ``To`` are empty (see BUG #6 notes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(telnyx): WS event shape, frame format, track filter, audio sender Fixes BUG #17, #18, #19 from acceptance suite. - #17 Media-stream WebSocket events use ``event`` (start / media / stop / dtmf / error / connected), not the Call Control REST notification ``event_type``. Audio payload lives in ``data.media.payload`` (base64), caller/callee live in ``data.start.{from,to}``. Previously the bridge matched ``event_type == "stream_started"`` and looked for audio in ``payload.audio.chunk`` — no media chunk was ever decoded, so the agent never heard the caller. - #18 Outbound wire format corrected to ``{"event":"media","media":{"payload":b64}}`` and ``{"event":"clear"}``. The legacy ``event_type``/``payload.audio.chunk`` shape was silently dropped by Telnyx, so the caller heard silence. - #19 When ``stream_track=both_tracks`` Telnyx emits media for both the caller leg and the agent's own outbound leg; forwarding the outbound echo broke OpenAI Realtime turn detection ("speech_started" never fired). The bridge now filters ``media.track != "inbound"`` before forwarding. OpenAI Realtime handler on Telnyx is now configured with ``audio_format="g711_ulaw"`` to match the PCMU 8 kHz bidirectional stream. The TelnyxAudioSender transcodes PCM16 16 kHz → mulaw 8 kHz for pipeline / ConvAI providers (PCM16 TTS output) and passes mulaw bytes through when OpenAI Realtime provides them directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(twilio): OpenAI Realtime audio format + pass-through audio sender Fixes BUG #10 from acceptance suite. OpenAI Realtime emits PCM16 at 24 kHz natively. 
The Twilio handler previously left ``audio_format`` at the pcm16 default and fed the bytes into TwilioAudioSender, which unconditionally ran ``resample_16k_to_8k(pcm) → pcm16_to_mulaw`` assuming 16 kHz input. 24 kHz bytes run through a 16→8 kHz resampler come out at ~66% of the correct rate — the caller heard a deep, slurred voice. Fix: on the Twilio path construct ``OpenAIRealtimeStreamHandler(..., audio_format="g711_ulaw")`` so OpenAI emits Twilio-native mulaw 8 kHz directly. Pair it with ``TwilioAudioSender(..., input_is_mulaw_8k=True)`` which skips the resample+mulaw encode and forwards the bytes as-is. Pipeline and ConvAI still produce PCM16 @ 16 kHz and go through the default transcoding path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(pipeline): STT path + hooks + barge-in + dedup + hallucination filter Fixes BUG #12, #15, #20, #22 from acceptance suite. - #12 Pipeline on Twilio: the bridge converts mulaw 8 kHz → PCM16 16 kHz before STT. The STT adapter used to be built with ``for_twilio=True`` (mulaw 8 kHz) — Deepgram decoded the already-PCM bytes as mulaw and produced garbage transcripts. The pipeline now always configures linear16 @ 16 kHz. - #15 ``PipelineHooks.before_send_to_stt`` was declared but never invoked. ``PipelineStreamHandler.on_audio_received`` now runs the hook on every inbound chunk and drops the chunk when it returns ``None``. - #20 Pipeline barge-in: ``on_audio_received`` used to skip STT when ``_is_speaking=True``, blocking any barge-in detection. It now keeps forwarding caller audio to STT during TTS (unless ``agent.barge_in_threshold_ms == 0``), and ``_stt_loop`` flips ``_is_speaking=False`` + ``send_clear`` on any Deepgram transcript with text observed while speaking. Effective latency floor is ~800 ms (Deepgram interim), so noisy / short TTS sentences may not actually be interrupted — full sub-second barge-in requires a server-side VAD (Silero, already supported via ``agent.vad=``). 
- #22 Dedup + throttle + hallucination filter. Low-quality STT (Whisper on mulaw 8 kHz) emits several nearly-identical final transcripts in 1–2 s ("you", "you", "you") and hallucinates short fillers from silence / TTS echo. Each used to kick off a new LLM+TTS turn, and consecutive turns overlapped on the caller's line. Fix in ``_stt_loop``: dedup identical finals within 2 s, drop any final within 500 ms of the last committed turn, drop a curated blacklist of fillers (``you``, ``thank you``, ``yeah``, ``uh``, ``.``…). Also adds the 8 kHz output path used by the Telnyx handler via a shared linear16 STT factory in ``handlers/common.py``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(providers): voice name resolver, Deepgram knobs, TTS streaming resample Fixes BUG #11, #13, #23 from acceptance suite. - #11 ElevenLabs voice-name resolver. ``Patter.elevenlabs(voice="rachel")`` (the quickstart default) used to pass "rachel" verbatim into the /text-to-speech/{voice_id}/stream URL, which 404s because the API only accepts the opaque 20-char voice IDs. The new ``resolve_voice_id`` helper maps ~45 common display names (rachel, adam, matilda, alloy, …) to their UUIDs and returns unknown strings unchanged so custom voices keep working. Removes the ad-hoc "alloy" substitution in stream_handler. - #13 DeepgramSTT exposes ``endpointing_ms`` / ``utterance_end_ms`` / ``smart_format`` / ``interim_results`` / ``vad_events`` kwargs and the ``Patter.deepgram(...)`` factory forwards them via ``STTConfig.options``. Defaults tuned for telephony (endpointing_ms=150, utterance_end_ms=1000). The transcript gate is loosened to ``is_final OR speech_final`` so we don't wait up to utterance_end_ms on every turn. Pipeline turn latency on Twilio drops from ~4 s to ~2.2 s. - #23 OpenAI TTS streaming resample. ``response_format=pcm`` returns 24 kHz PCM16 chunks that must be downsampled to 16 kHz. 
The old implementation did the 3:2 downsample chunk-by-chunk without preserving filter state, so cross-chunk alignment drifted and the caller heard pops / dropped audio. Now uses ``audioop.ratecv`` with a persistent ``state`` and stashes odd trailing bytes between calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(scheduler,fallback): per-loop schedulers + async close + cancel probes Fixes BUG #2, #3, #5 from acceptance suite. - #3 Scheduler singleton dies across event loops. The old ``_scheduler_singleton`` bound to the first loop it saw; pytest-asyncio closed that loop at the end of every test and the next scheduled callback crashed with ``Event loop is closed``. Replaced by ``_schedulers_by_loop`` — a dict keyed on ``id(asyncio.get_event_loop())`` that drops stale entries when the owning loop has been closed. Adds ``reset_for_tests()`` to tear down every cached scheduler; the public ``shutdown()`` is now an alias for it. - #2 ``FallbackLLMProvider.complete_stream`` — convenience wrapper that flattens ``{"type": "text"}`` chunks so callers don't have to switch on chunk type. Mirrors the TS SDK's ``completeStream``. - #5 ``FallbackLLMProvider`` recovery task leak. ``_probe`` tasks created by ``_start_recovery`` were never awaited, and pytest-asyncio tears the loop down before they finish. Adds ``aclose()`` and async context manager support (``__aenter__``/``__aexit__``) so callers can ``async with FallbackLLMProvider(...)`` and have the probes cancelled + awaited on exit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tools): @tool adapter unpacks kwargs into user function Fixes BUG #21 from acceptance suite. ``@tool`` exposed the raw user function as ``handler`` but ``services/tool_executor._execute_handler`` always calls ``handler(arguments_dict, call_context_dict)``. Every typed tool — e.g. 
``async def check_order(order_id: str)`` — crashed at runtime with "takes 1 positional argument but 2 were given" and OpenAI Realtime received a fallback error JSON instead of the tool's result. The decorator now wraps the user function in an async adapter whose signature matches the executor's contract ``(arguments, call_context)``. The adapter inspects the original signature: if it already takes ``(arguments, call_context)`` positionally it passes through unchanged, otherwise it filters ``arguments`` to the user function's declared parameter names and calls ``fn(**args)``. The original function is still reachable via ``handler.__wrapped__`` for introspection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dashboard): track failed & no-answer outbound calls Fixes BUG #6 from acceptance suite. The embedded dashboard used to show only calls that made it to the media channel. An outbound dial that rang out (``status=no-answer``, ``busy``, ``failed``) never produced a webhook hit, so the row never appeared in the UI even though Twilio billed for the attempt. Changes: - ``MetricsStore.record_call_initiated({call_id, caller, callee, …})`` pre-registers the call when ``Patter.call()`` returns, so the row shows up the moment the dial is dispatched. - ``MetricsStore.update_call_status(call_id, status, **extra)`` promotes the record through the lifecycle (ringing → in-progress → completed / no-answer / busy / failed / canceled). Terminal states move the row from active to the completed list so the UI timer freezes. Fed by the new ``/webhooks/twilio/status`` route. - ``MetricsStoreProtocol`` extended with the two new methods. - ``call_end`` now synthesises a minimal metrics shim when the call ended without a full CallMetrics payload, so the UI can still render duration / status. 
- Dashboard UI: new ``STATUS`` column, filter pills (all / completed / failed), colour-coded badges (green / yellow / red / orange), red row tint for failed statuses, and SSE listeners for the new ``call_initiated`` and ``call_status`` events. The duration timer respects ``data-ended`` so rows that already received call_end stop ticking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): ring_timeout + agent.hooks/vad/audio_filter forwarding + call pre-register Fixes BUG #14 + IMP2 + completes BUG #6 from acceptance suite. - #14 ``Patter.agent(...)`` used to drop ``hooks``, ``text_transforms``, ``vad``, ``audio_filter``, ``background_audio`` and ``barge_in_threshold_ms`` even though the ``Agent`` dataclass accepted them. The factory now forwards all fields. - IMP2 ``ring_timeout: int | None`` kwarg on ``Patter.call(...)``. Forwarded to Twilio as ``Timeout=`` and to Telnyx as ``timeout_secs`` (added to ``TelnyxAdapter.initiate_call``). Italian mobile carriers silence-drop the default ~28 s ring on US→IT calls; the quickstart now works with ``ring_timeout=60``. - #6 ``Patter.call()`` pre-registers the dialled call in the MetricsStore via ``record_call_initiated(...)`` before returning, so the dashboard shows the attempt even when the callee never picks up. The Twilio branch also passes ``StatusCallbackEvent="initiated ringing answered completed"`` so we receive every state transition. Also exposes the new Deepgram knobs on the ``Patter.deepgram(...)`` factory (``model``, ``endpointing_ms``, ``utterance_end_ms``, ``smart_format``, ``interim_results``). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): models barge_in_threshold_ms + STT/TTS options, top-level mix_pcm, docstring Rolls up the smaller API additions — BUG #1, #04g, extras from #13/#15. - ``Agent.barge_in_threshold_ms`` (default 300) — hangover window before treating caller audio as barge-in. 
Used by PipelineStreamHandler and mirrored on TS ``AgentOptions.bargeInThresholdMs``. - ``STTConfig.options`` / ``TTSConfig.options`` — provider-specific knobs bag (e.g. Deepgram endpointing) that ``common._create_stt_from_config`` unpacks when building the adapter. Keeps older ``STTConfig`` callers forward-compatible. - Top-level ``patter.mix_pcm(agent, bg, ratio)`` — parity alias for the TS ``mixPcm(...)`` standalone helper (BUG #04g). Thin wrapper over the existing ``PcmMixer`` class with an explicit ratio. - ``patter/__init__.py`` docstring enumerates the installable extras (scheduling, anthropic, groq, cerebras, google, …) so ``pip install getpatter`` users discover them without hitting a ``RuntimeError: Scheduling requires the 'apscheduler' package`` at call time (BUG #1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: align Python tests with BUG #12/#16/#17/#18/#19/#21 fixes - ``test_local_mode``: pipeline Twilio bridge test now patches ``DeepgramSTT`` directly instead of ``DeepgramSTT.for_twilio`` — after BUG #12 the pipeline path uses the default linear16 16 kHz adapter on both telephony providers. - ``test_new_features``: ``machine_detection=False`` no longer asserts an empty extra_params dict; BUG #6 now always wires a ``StatusCallback`` so the dashboard sees failed attempts. The test keeps its original intent (AMD-specific params absent) and additionally checks the status callback is set. - ``test_server_unit::TestTelnyxVoiceRoute``: rewritten to assert the REST ``actions/answer`` POST after BUG #16 — the route no longer returns a JSON commands body. - ``test_telnyx_bridge_unit``: helper messages updated to the ``{event: start|media|stop}`` wire shape from BUG #17; the OpenAI Realtime audio_format assertion now expects ``g711_ulaw`` (from #18). - ``test_telnyx_handler_unit``: TelnyxAudioSender test uses ``input_is_mulaw_8k=True`` so the round-trip byte assertion still holds with the new PCM16→mulaw transcode path (#18). 
Wire format asserts ``event == "media"`` / ``event == "clear"``. - ``test_tool_decorator``: invokes handlers with the new adapter signature ``(arguments_dict, call_context_dict)`` (#21), including a sync-wrapped handler awaited through the adapter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ts/api): Python parity — auto-detect local, static factories, ring_timeout Brings TS parity with Python on BUG #4 parity items + #14 agent fields + IMP2 ring_timeout. - Auto-detect local mode: ``new Patter({twilioSid, twilioToken, …})`` without explicit ``mode: 'local'`` is now treated as local when apiKey is missing (mirrors Python). - Static provider factories: ``Patter.deepgram(...)``, ``Patter.elevenlabs(...)``, ``Patter.whisper(...)``, ``Patter.openaiTts(...)``, ``Patter.cartesia(...)``, ``Patter.rime(...)``, ``Patter.lmnt(...)``. - ``STTConfig.toDict`` / ``TTSConfig.toDict`` are now optional — plain object literals ``{provider, apiKey, language}`` are accepted everywhere (fallback serialisation is handled via ``sttConfigToDict`` / ``ttsConfigToDict`` helpers). - ``STTConfig`` gets an ``options`` bag (parity with Python BUG #13). - ``LocalCallOptions.ringTimeout`` forwarded to Twilio as ``Timeout`` and Telnyx as ``timeout_secs`` — plus ``StatusCallbackEvent`` wired so the dashboard sees ringing/no-answer/busy/failed transitions (BUG #6). - ``AgentOptions.bargeInThresholdMs`` (parity with #20 on Python). - ``LocalOptions.deepgramKey`` / ``elevenlabsKey`` added as provider-level defaults (parity with Python Patter() kwargs). - ``Patter.call()`` Twilio branch pre-registers the dialled call with ``metricsStore.recordCallInitiated`` so no-answer / busy / failed attempts still show up in the dashboard. - ``providers.deepgram(...)`` factory exposes the Deepgram knobs (model / endpointing_ms / utterance_end_ms / smart_format / interim_results) and carries them in ``STTConfig.options``. 
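The ring-timeout and status-callback forwarding described for both SDKs can be sketched as a small parameter builder. It is written in Python for consistency with the other examples in this thread. The function name and the ``status_callback_url`` argument are hypothetical; only the parameter names ``Timeout``, ``timeout_secs``, ``StatusCallback`` and ``StatusCallbackEvent`` and the event string come from the commit text.

```python
from typing import Dict, Optional

# Event list quoted in the commit message for the Twilio branch.
TWILIO_STATUS_EVENTS = "initiated ringing answered completed"


def outbound_call_params(
    provider: str,
    ring_timeout: Optional[int] = None,
    status_callback_url: Optional[str] = None,
) -> Dict[str, object]:
    """Map SDK-level call options onto provider-specific request fields."""
    params: Dict[str, object] = {}
    if provider == "twilio":
        if ring_timeout is not None:
            params["Timeout"] = ring_timeout
        if status_callback_url:
            params["StatusCallback"] = status_callback_url
            params["StatusCallbackEvent"] = TWILIO_STATUS_EVENTS
    elif provider == "telnyx":
        if ring_timeout is not None:
            params["timeout_secs"] = ring_timeout
    else:
        raise ValueError(f"unsupported provider: {provider!r}")
    return params
```

Leaving ``ring_timeout`` as ``None`` omits the field entirely, so the provider's own default ring duration applies.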
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ts/providers): voice resolver, Deepgram knobs, TTS streaming resample TS parity port of Python BUG #11, #13, #23. - ElevenLabs: ``resolveVoiceId()`` maps display names (rachel, adam, matilda, alloy, …) to the opaque 20-char UUIDs accepted by the /text-to-speech/{voice_id}/stream endpoint. Map mirrors the Python SDK byte-for-byte. - DeepgramSTT: constructor overloaded to accept ``DeepgramSTTOptions`` (endpointingMs / utteranceEndMs / smartFormat / interimResults / vadEvents) alongside the legacy positional form. Transcript gate loosened to ``is_final OR speech_final`` so short utterances don't wait for Deepgram's utterance_end commit. - OpenAITTS: streaming 24 kHz → 16 kHz resample now carries state (``carryByte`` + ``leftover`` samples) between chunks so cross-chunk alignment doesn't drift. The legacy ``resample24kTo16k`` static is kept as a thin wrapper around the streaming path for the existing unit tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ts): Telnyx stack, pipeline hooks/barge-in/dedup, dashboard status, scheduler sync TS parity port of the Python fixes for BUG #2/#3/#6/#12/#15/#16/#17/#18/#19/#20/#22. - ``stream-handler.ts``: ``handleAudio`` now runs the ``before_send_to_stt`` hook (#15), transcodes Twilio mulaw 8 kHz → PCM16 16 kHz unconditionally on the pipeline path (#12), and keeps forwarding caller audio during TTS so barge-in can trigger (#20). ``processTranscript`` implements the dedup + 500 ms throttle + hallucination-word blacklist from #22 and flips ``isSpeaking`` + ``sendClear`` on any transcript with text while the agent is speaking (#20). 
- ``server.ts``: ``TelnyxBridge.sendAudio`` / ``sendClear`` use the correct ``{event:"media",media:{payload:b64}}`` wire format (#18); the Telnyx WS handler matches ``data.event`` (start / media / stop / dtmf / error / connected) and filters ``media.track !== "inbound"`` before forwarding (#17, #19); the ``/webhooks/telnyx/voice`` route POSTs ``actions/answer`` and ``actions/streaming_start`` via the Call Control REST API and returns empty HTTP 200 (#16). ``TwilioBridge.createStt`` picks linear16 16 kHz when ``provider === 'pipeline'`` so Deepgram doesn't decode already-PCM bytes as mulaw (#12). A new ``/webhooks/twilio/status`` handler consumes Twilio status callbacks and updates the dashboard (#6). - ``scheduler.ts``: ``scheduleCron`` returns a ``ScheduleHandle`` synchronously (lazy node-cron import happens in the background) — parity with Python #4. ``scheduleInterval`` accepts ``{intervalMs}`` or ``{seconds}`` in addition to the legacy positional ms, matching Python ``schedule_interval(seconds=...)``. - ``fallback-provider.ts``: ``completeStream()`` text-only convenience generator (#2), ``aclose()`` + ``Symbol.asyncDispose`` so ``await using fallback = ...`` parity with Python's ``async with FallbackLLMProvider(...)`` (#5). - ``dashboard/store.ts``: ``recordCallInitiated`` pre-registers outbound attempts, ``updateCallStatus`` promotes rows through ringing / no-answer / busy / failed and moves terminal states to the completed list (#6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(ts): align with 0.4.4 wire-format & provider API changes - ``providers.test.ts``: toDict now surfaces ``options`` when set, knobs forwarding verified. - ``types.test.ts``: toDict optional chain covered. - ``openai-tts.test.ts``: 1-byte input no longer returns the byte verbatim — the streaming resampler stashes it as ``carryByte`` and the stateless wrapper flushes only complete samples, so the test now asserts an empty buffer. 
- ``integration/twilio-pipeline.test.ts`` + ``integration/telnyx-pipeline.test.ts``: ``handleAudio`` is now async; tests await it. Telnyx fixture feeds mulaw 8 kHz and asserts the transcoded PCM16 16 kHz lands on the STT mock (BUG #12 + #19). - ``unit/server-routes.test.ts``: Telnyx webhook tests assert the REST ``actions/answer`` + ``actions/streaming_start`` POSTs and the empty HTTP 200 response (BUG #16). - ``package-lock.json``: refreshed for the sdk-ts worktree so the ``0.4.3`` → ``0.4.3-worktree`` alignment is consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(unit): regression coverage for BUG #6/#22/#23 + ring_timeout (IMP2) Three new unit test files lock in fixes that previously lived in the acceptance suite as live-call checks: test_pipeline_dedup.py (13 tests) - Hallucination blacklist: "you", "thank you", ".", case/punctuation variants, empty-after-strip all drop silently. - 2-second duplicate window with time.time monkeypatched so parity with the live Whisper feedback loop is deterministic. - 500 ms back-to-back throttle covering legitimate vs spurious second turns. - Interim / empty finals must not fire on_transcript. test_openai_tts_resample.py (7 tests) - Cross-chunk ratecv state: multi-chunk stream output matches a single-shot resample byte-for-byte. - Odd-byte boundary: a chunk ending on a dangling byte must not drop the sample. - Empty / single-byte / tiny chunks must not crash. - Response is always aclosed on both successful and early-exit paths. test_twilio_status_and_ring_timeout.py (13 tests) - /webhooks/twilio/status routes to update_call_status with parsed duration, and survives missing SID, bad duration, and the dashboard-disabled path. - Twilio signature enforcement on the status endpoint. - Twilio ring_timeout -> Timeout REST param, Telnyx -> timeout_secs. - Twilio StatusCallback / StatusCallbackEvent are always registered on outbound calls so BUG #6 cannot regress. 
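The dedup, throttle and hallucination rules that ``test_pipeline_dedup.py`` locks in can be sketched as a standalone gate. This is an illustration under stated assumptions, not the ``_stt_loop`` code: the class name, the injectable clock and the exact normalisation are hypothetical, and the blacklist holds only the fillers quoted in the commit message.

```python
import string
import time
from typing import Callable, Optional

# Fillers quoted in the commit message; the real list is longer.
HALLUCINATIONS = frozenset({"you", "thank you", "yeah", "uh"})


class TranscriptGate:
    """Decide whether a final transcript should start a new turn."""

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        dedup_window_s: float = 2.0,
        throttle_s: float = 0.5,
    ) -> None:
        self._clock = clock
        self._dedup_window_s = dedup_window_s
        self._throttle_s = throttle_s
        self._last_text: Optional[str] = None
        self._last_time: Optional[float] = None

    def accept(self, text: str, is_final: bool = True) -> bool:
        if not is_final:
            return False  # interim results never commit a turn
        # Normalise case and surrounding punctuation so "You." matches "you".
        normalized = text.strip().lower().strip(string.punctuation + " ")
        if not normalized or normalized in HALLUCINATIONS:
            return False
        now = self._clock()
        if self._last_time is not None:
            elapsed = now - self._last_time
            if elapsed < self._throttle_s:
                return False  # too soon after the last committed turn
            if normalized == self._last_text and elapsed < self._dedup_window_s:
                return False  # identical final inside the dedup window
        self._last_text = normalized
        self._last_time = now
        return True
```

Injecting the clock plays the same role as monkeypatching ``time.time`` in the tests: the 2 s and 500 ms windows can be exercised without sleeping.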
Full unit suite: 728 passed, 2 skipped. * docs+ci: latency/provider caveats + audit workflow README - Pipeline turn-latency floor documented (~2.0–2.8 s) with per-stage breakdown so users know to switch to `provider="openai_realtime"` for sub-second UX. - ElevenLabs free-tier library-voice restriction (402) with pointer to `ELEVENLABS_VOICE_ID`. - Telnyx outbound D38 Outbound Profile requirement. - Google Gemini free-tier quota=0 caveat. - Whisper hallucination filter documented. - `ring_timeout` + status callback description added to call(). .github/workflows/audit.yml (new) - pip-audit on sdk-py runtime deps. - npm audit on sdk-ts production deps. - bandit static analysis with SARIF upload to GitHub Security. - Runs on dep-manifest changes, weekly schedule, and manual dispatch. - Findings are advisory-only to keep the pipeline from flaking on upstream CVE churn (telephony stack pulls many C-wrapped libs). Baseline audit run: npm=0, bandit medium+/high-confidence=0, pip-audit=2 (pytest dev-only + transformers optional-extra only). * docs(readme): remove local-measured latency numbers from Voice Modes The millisecond ranges previously listed for each provider came from a single local benchmark run and are neither representative nor a target. Keep the modes table qualitative and replace the per-stage breakdown with a short note that latency is inherited from the chosen providers — no hard numbers we don't want callers anchoring on. * test(unit): bug coverage gaps — BUG #15/#19/#20 Three new unit test modules fill the remaining coverage gap for the bugs fixed on this branch: test_pipeline_bargein.py (7 tests) — BUG #20 - Interim transcript during TTS triggers send_clear + is_speaking=False. - record_turn_interrupted is fired on the metrics accumulator. - send_clear throwing does not crash the STT loop (fail-open). - No barge-in when the agent is idle or the transcript has no text. 
- Final transcripts also trigger the barge-in branch before the downstream LLM turn runs. test_before_send_to_stt_hook.py (9 tests) — BUG #15 - Sync / async hook returning None drops the chunk (zero STT sends). - Returning modified bytes forwards the new buffer verbatim. - Hook receives the decoded PCM, not the raw mulaw payload. - Raising hooks fail-open: original audio still reaches STT. - Missing hook / hooks instance with before_send_to_stt=None are both bypass paths that must still forward audio. test_telnyx_track_filter.py (5 tests) — BUG #19 - track=inbound forwards, track=outbound drops. - Missing `track` field defaults to inbound (legacy Telnyx payloads). - Mixed stream: only inbound frames reach the handler, in order. - Unknown track values are skipped defensively. Full unit suite: 749 passed, 2 skipped (+21 from this commit). * feat(sdk-py): add cartesia/rime/lmnt static factories + vad_events to deepgram Brings Python SDK to parity with sdk-ts: - Adds Patter.cartesia / Patter.rime / Patter.lmnt static methods so local-mode users can configure these TTS providers the same way they do in TypeScript. - Adds the missing vad_events keyword to Patter.deepgram and the patter.providers.deepgram factory — the DeepgramSTT ctor already accepted it, but the public config helper silently dropped the flag. * chore: bump to 0.4.4 Regression suites re-run after the bump: - sdk-py: 749 passed, 2 skipped - sdk-ts: 932 passed (57 test files, including soak) * fix(ci): integration tests on 0.4.4 wire format + misc hygiene Addresses the five failing CI checks on PR #66. Telnyx integration tests (test_telnyx_{convai,pipeline,realtime}.py) - ``_telnyx_stream_started`` / ``_telnyx_media_event`` / ``_telnyx_stream_stopped`` helpers migrated from the pre-0.4.4 ``{event_type, payload.audio.chunk}`` shape to the real Telnyx media-stream wire format ``{event, start|media.payload}`` (BUG #17/#18). 
Without this the bridge silently drops every test frame and 11 integration tests fail with "handler called 0 times". - ``test_audio_format_pcm16`` renamed to ``test_audio_format_g711_ulaw`` and the assertion flipped — Telnyx is PCMU 8 kHz bidirectional (BUG #19), Realtime runs on ``g711_ulaw`` so both legs stay pass-through. sdk-ts/src/scheduler.ts - Removed the trailing blank line that broke the pre-commit ``end-of-file-fixer`` hook. .github/workflows/audit.yml - Bandit stock CLI doesn't support ``-f sarif`` — install ``bandit-sarif-formatter`` alongside bandit, and guard the upload-sarif step with ``hashFiles`` so future formatter breakage doesn't fail the job. Local verification: 802 passed, 4 skipped (sdk-py unit + integration). --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
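The ``before_send_to_stt`` behaviour listed in the BUG #15 test coverage (sync or async hook, ``None`` drops the chunk, modified bytes are forwarded, a raising hook fails open) can be sketched as follows. The function name ``apply_stt_hook`` and the ``Hook`` alias are hypothetical; this is not the ``PipelineStreamHandler`` implementation.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Optional, Union

# A hook may be sync or async and may return bytes or None.
Hook = Callable[[bytes], Union[Optional[bytes], Awaitable[Optional[bytes]]]]


async def apply_stt_hook(hook: Optional[Hook], pcm: bytes) -> Optional[bytes]:
    """Return the bytes to forward to STT, or ``None`` to drop the chunk."""
    if hook is None:
        return pcm  # no hook configured: bypass
    try:
        result = hook(pcm)
        if inspect.isawaitable(result):
            result = await result
    except Exception:
        # Fail open: a broken hook must not silence the caller.
        return pcm
    return result


async def _demo() -> None:
    def mute_everything(_pcm: bytes) -> None:
        return None

    kept = await apply_stt_hook(None, b"\x01\x02")
    dropped = await apply_stt_hook(mute_everything, b"\x01\x02")
    print(kept, dropped)


if __name__ == "__main__":
    asyncio.run(_demo())
```

Catching ``Exception`` rather than ``BaseException`` keeps task cancellation working while still shielding the audio path from hook bugs.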
nicolotognoni added a commit that referenced this pull request on Apr 21, 2026
* fix(deps): pin websockets>=14 and add python-multipart Fixes BUG #7 and #9 from acceptance suite. - websockets: pin >=14,<16. The 'additional_headers=' kwarg used by the OpenAI Realtime, Deepgram STT and ElevenLabs ConvAI adapters is only supported on the new asyncio client that became the default in 14.0. Under 13.x the call failed with 'got an unexpected keyword argument additional_headers', blocking every streaming provider. - python-multipart: add to the base install. Starlette >= 0.45 raises on 'await request.form()' without python-multipart installed, so every Twilio webhook returned 422 and the call was silently dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): repair Twilio & Telnyx webhook stack Fixes BUG #6, #8, #16 from acceptance suite. - #8 Request/Response import lifted to the top of server.py. With ``from __future__ import annotations`` in place, FastAPI's ``get_type_hints(handler)`` resolved the 'Request' annotation against module globals where only WebSocket was imported. The ForwardRef stayed unresolved, FastAPI classified the parameter as a query-string field and every Twilio/Telnyx webhook POST returned HTTP 422 before the handler body could run. Local mode was fundamentally broken on 0.4.3. - #6 dashboard tracking of failed outbound calls: new route ``POST /webhooks/twilio/status`` consumes Twilio statusCallback events (initiated/ringing/answered/completed/no-answer/busy/failed) and feeds them into MetricsStore.update_call_status. Operators now see every dialled attempt in the dashboard, including ones that never reach media. - #16 Telnyx Call Control: ``/webhooks/telnyx/voice`` now POSTs ``actions/answer`` on call.initiated and ``actions/streaming_start`` on call.answered against the REST API and returns empty HTTP 200. Previously the route returned a JSON ``{commands: [...]}`` body that Telnyx silently discards — the call rang forever. 
Twilio voice route also falls back to the ``Caller`` / ``Called`` form fields when ``From`` / ``To`` are empty (see BUG #6 notes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(telnyx): WS event shape, frame format, track filter, audio sender

Fixes BUG #17, #18, #19 from acceptance suite.

- #17 Media-stream WebSocket events use ``event`` (start / media / stop / dtmf / error / connected), not the Call Control REST notification ``event_type``. Audio payload lives in ``data.media.payload`` (base64), caller/callee live in ``data.start.{from,to}``. Previously the bridge matched ``event_type == "stream_started"`` and looked for audio in ``payload.audio.chunk``; no media chunk was ever decoded, so the agent never heard the caller.
- #18 Outbound wire format corrected to ``{"event":"media","media":{"payload":b64}}`` and ``{"event":"clear"}``. The legacy ``event_type``/``payload.audio.chunk`` shape was silently dropped by Telnyx, so the caller heard silence.
- #19 When ``stream_track=both_tracks`` Telnyx emits media for both the caller leg and the agent's own outbound leg; forwarding the outbound echo broke OpenAI Realtime turn detection ("speech_started" never fired). The bridge now drops frames where ``media.track != "inbound"`` before forwarding.

OpenAI Realtime handler on Telnyx is now configured with ``audio_format="g711_ulaw"`` to match the PCMU 8 kHz bidirectional stream. The TelnyxAudioSender transcodes PCM16 16 kHz → mulaw 8 kHz for pipeline / ConvAI providers (PCM16 TTS output) and passes mulaw bytes through when OpenAI Realtime provides them directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(twilio): OpenAI Realtime audio format + pass-through audio sender

Fixes BUG #10 from acceptance suite.

OpenAI Realtime emits PCM16 at 24 kHz natively. The Twilio handler previously left ``audio_format`` at the pcm16 default and fed the bytes into TwilioAudioSender, which unconditionally ran ``resample_16k_to_8k(pcm) → pcm16_to_mulaw`` assuming 16 kHz input. 24 kHz bytes run through a 16→8 kHz resampler come out at ~66% of the correct rate, so the caller heard a deep, slurred voice.

Fix: on the Twilio path construct ``OpenAIRealtimeStreamHandler(..., audio_format="g711_ulaw")`` so OpenAI emits Twilio-native mulaw 8 kHz directly. Pair it with ``TwilioAudioSender(..., input_is_mulaw_8k=True)`` which skips the resample+mulaw encode and forwards the bytes as-is. Pipeline and ConvAI still produce PCM16 @ 16 kHz and go through the default transcoding path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(pipeline): STT path + hooks + barge-in + dedup + hallucination filter

Fixes BUG #12, #15, #20, #22 from acceptance suite.

- #12 Pipeline on Twilio: the bridge converts mulaw 8 kHz → PCM16 16 kHz before STT. The STT adapter used to be built with ``for_twilio=True`` (mulaw 8 kHz), so Deepgram decoded the already-PCM bytes as mulaw and produced garbage transcripts. The pipeline now always configures linear16 @ 16 kHz.
- #15 ``PipelineHooks.before_send_to_stt`` was declared but never invoked. ``PipelineStreamHandler.on_audio_received`` now runs the hook on every inbound chunk and drops the chunk when it returns ``None``.
- #20 Pipeline barge-in: ``on_audio_received`` used to skip STT when ``_is_speaking=True``, blocking any barge-in detection. It now keeps forwarding caller audio to STT during TTS (unless ``agent.barge_in_threshold_ms == 0``), and ``_stt_loop`` flips ``_is_speaking=False`` + ``send_clear`` on any Deepgram transcript with text observed while speaking. Effective latency floor is ~800 ms (Deepgram interim), so noisy / short TTS sentences may not actually be interrupted; full sub-second barge-in requires a server-side VAD (Silero, already supported via ``agent.vad=``).
- #22 Dedup + throttle + hallucination filter. Low-quality STT (Whisper on mulaw 8 kHz) emits several nearly-identical final transcripts in 1–2 s ("you", "you", "you") and hallucinates short fillers from silence / TTS echo. Each used to kick off a new LLM+TTS turn, and consecutive turns overlapped on the caller's line. Fix in ``_stt_loop``: dedup identical finals within 2 s, drop any final within 500 ms of the last committed turn, drop a curated blacklist of fillers (``you``, ``thank you``, ``yeah``, ``uh``, ``.``…).

Also adds the 8 kHz output path used by the Telnyx handler via a shared linear16 STT factory in ``handlers/common.py``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(providers): voice name resolver, Deepgram knobs, TTS streaming resample

Fixes BUG #11, #13, #23 from acceptance suite.

- #11 ElevenLabs voice-name resolver. ``Patter.elevenlabs(voice="rachel")`` (the quickstart default) used to pass "rachel" verbatim into the /text-to-speech/{voice_id}/stream URL, which 404s because the API only accepts the opaque 20-char voice IDs. The new ``resolve_voice_id`` helper maps ~45 common display names (rachel, adam, matilda, alloy, …) to their voice IDs and returns unknown strings unchanged so custom voices keep working. Removes the ad-hoc "alloy" substitution in stream_handler.
- #13 DeepgramSTT exposes ``endpointing_ms`` / ``utterance_end_ms`` / ``smart_format`` / ``interim_results`` / ``vad_events`` kwargs and the ``Patter.deepgram(...)`` factory forwards them via ``STTConfig.options``. Defaults tuned for telephony (endpointing_ms=150, utterance_end_ms=1000). The transcript gate is loosened to ``is_final OR speech_final`` so we don't wait up to utterance_end_ms on every turn. Pipeline turn latency on Twilio drops from ~4 s to ~2.2 s.
- #23 OpenAI TTS streaming resample. ``response_format=pcm`` returns 24 kHz PCM16 chunks that must be downsampled to 16 kHz. The old implementation did the 3:2 downsample chunk-by-chunk without preserving filter state, so cross-chunk alignment drifted and the caller heard pops / dropped audio. Now uses ``audioop.ratecv`` with a persistent ``state`` and stashes odd trailing bytes between calls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(scheduler,fallback): per-loop schedulers + async close + cancel probes

Fixes BUG #2, #3, #5 from acceptance suite.

- #3 Scheduler singleton dies across event loops. The old ``_scheduler_singleton`` bound to the first loop it saw; pytest-asyncio closed that loop at the end of every test and the next scheduled callback crashed with ``Event loop is closed``. Replaced by ``_schedulers_by_loop``, a dict keyed on ``id(asyncio.get_event_loop())`` that drops stale entries when the owning loop has been closed. Adds ``reset_for_tests()`` to tear down every cached scheduler; the public ``shutdown()`` is now an alias for it.
- #2 ``FallbackLLMProvider.complete_stream``: convenience wrapper that flattens ``{"type": "text"}`` chunks so callers don't have to switch on chunk type. Mirrors the TS SDK's ``completeStream``.
- #5 ``FallbackLLMProvider`` recovery task leak. ``_probe`` tasks created by ``_start_recovery`` were never awaited, and pytest-asyncio tears the loop down before they finish. Adds ``aclose()`` and async context manager support (``__aenter__``/``__aexit__``) so callers can ``async with FallbackLLMProvider(...)`` and have the probes cancelled + awaited on exit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tools): @tool adapter unpacks kwargs into user function

Fixes BUG #21 from acceptance suite.

``@tool`` exposed the raw user function as ``handler`` but ``services/tool_executor._execute_handler`` always calls ``handler(arguments_dict, call_context_dict)``. Every typed tool (e.g. ``async def check_order(order_id: str)``) crashed at runtime with "takes 1 positional argument but 2 were given" and OpenAI Realtime received a fallback error JSON instead of the tool's result.

The decorator now wraps the user function in an async adapter whose signature matches the executor's contract ``(arguments, call_context)``. The adapter inspects the original signature: if it already takes ``(arguments, call_context)`` positionally it passes through unchanged, otherwise it filters ``arguments`` to the user function's declared parameter names and calls ``fn(**args)``. The original function is still reachable via ``handler.__wrapped__`` for introspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): track failed & no-answer outbound calls

Fixes BUG #6 from acceptance suite.

The embedded dashboard used to show only calls that made it to the media channel. An outbound dial that rang out (``status=no-answer``, ``busy``, ``failed``) never produced a webhook hit, so the row never appeared in the UI even though Twilio billed for the attempt.

Changes:

- ``MetricsStore.record_call_initiated({call_id, caller, callee, …})`` pre-registers the call when ``Patter.call()`` returns, so the row shows up the moment the dial is dispatched.
- ``MetricsStore.update_call_status(call_id, status, **extra)`` promotes the record through the lifecycle (ringing → in-progress → completed / no-answer / busy / failed / canceled). Terminal states move the row from active to the completed list so the UI timer freezes. Fed by the new ``/webhooks/twilio/status`` route.
- ``MetricsStoreProtocol`` extended with the two new methods.
- ``call_end`` now synthesises a minimal metrics shim when the call ended without a full CallMetrics payload, so the UI can still render duration / status.
- Dashboard UI: new ``STATUS`` column, filter pills (all / completed / failed), colour-coded badges (green / yellow / red / orange), red row tint for failed statuses, and SSE listeners for the new ``call_initiated`` and ``call_status`` events. The duration timer respects ``data-ended`` so rows that already received call_end stop ticking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): ring_timeout + agent.hooks/vad/audio_filter forwarding + call pre-register

Fixes BUG #14 + IMP2 + completes BUG #6 from acceptance suite.

- #14 ``Patter.agent(...)`` used to drop ``hooks``, ``text_transforms``, ``vad``, ``audio_filter``, ``background_audio`` and ``barge_in_threshold_ms`` even though the ``Agent`` dataclass accepted them. The factory now forwards all fields.
- IMP2 ``ring_timeout: int | None`` kwarg on ``Patter.call(...)``. Forwarded to Twilio as ``Timeout=`` and to Telnyx as ``timeout_secs`` (added to ``TelnyxAdapter.initiate_call``). Italian mobile carriers silence-drop the default ~28 s ring on US→IT calls; the quickstart now works with ``ring_timeout=60``.
- #6 ``Patter.call()`` pre-registers the dialled call in the MetricsStore via ``record_call_initiated(...)`` before returning, so the dashboard shows the attempt even when the callee never picks up. The Twilio branch also passes ``StatusCallbackEvent="initiated ringing answered completed"`` so we receive every state transition.

Also exposes the new Deepgram knobs on the ``Patter.deepgram(...)`` factory (``model``, ``endpointing_ms``, ``utterance_end_ms``, ``smart_format``, ``interim_results``).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): models barge_in_threshold_ms + STT/TTS options, top-level mix_pcm, docstring

Rolls up the smaller API additions: BUG #1, #04g, extras from #13/#15.

- ``Agent.barge_in_threshold_ms`` (default 300): hangover window before treating caller audio as barge-in. Used by PipelineStreamHandler and mirrored on TS ``AgentOptions.bargeInThresholdMs``.
- ``STTConfig.options`` / ``TTSConfig.options``: provider-specific knobs bag (e.g. Deepgram endpointing) that ``common._create_stt_from_config`` unpacks when building the adapter. Keeps older ``STTConfig`` callers forward-compatible.
- Top-level ``patter.mix_pcm(agent, bg, ratio)``: parity alias for the TS ``mixPcm(...)`` standalone helper (BUG #04g). Thin wrapper over the existing ``PcmMixer`` class with an explicit ratio.
- ``patter/__init__.py`` docstring enumerates the installable extras (scheduling, anthropic, groq, cerebras, google, …) so ``pip install getpatter`` users discover them without hitting a ``RuntimeError: Scheduling requires the 'apscheduler' package`` at call time (BUG #1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: align Python tests with BUG #12/#16/#17/#18/#19/#21 fixes

- ``test_local_mode``: pipeline Twilio bridge test now patches ``DeepgramSTT`` directly instead of ``DeepgramSTT.for_twilio``; after BUG #12 the pipeline path uses the default linear16 16 kHz adapter on both telephony providers.
- ``test_new_features``: ``machine_detection=False`` no longer asserts an empty extra_params dict; BUG #6 now always wires a ``StatusCallback`` so the dashboard sees failed attempts. The test keeps its original intent (AMD-specific params absent) and additionally checks the status callback is set.
- ``test_server_unit::TestTelnyxVoiceRoute``: rewritten to assert the REST ``actions/answer`` POST after BUG #16; the route no longer returns a JSON commands body.
- ``test_telnyx_bridge_unit``: helper messages updated to the ``{event: start|media|stop}`` wire shape from BUG #17; the OpenAI Realtime audio_format assertion now expects ``g711_ulaw`` (from #18).
- ``test_telnyx_handler_unit``: TelnyxAudioSender test uses ``input_is_mulaw_8k=True`` so the round-trip byte assertion still holds with the new PCM16→mulaw transcode path (#18). Wire format asserts ``event == "media"`` / ``event == "clear"``.
- ``test_tool_decorator``: invokes handlers with the new adapter signature ``(arguments_dict, call_context_dict)`` (#21), including a sync-wrapped handler awaited through the adapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ts/api): Python parity: auto-detect local, static factories, ring_timeout

Brings TS parity with Python on BUG #4 parity items + #14 agent fields + IMP2 ring_timeout.

- Auto-detect local mode: ``new Patter({twilioSid, twilioToken, …})`` without explicit ``mode: 'local'`` is now treated as local when apiKey is missing (mirrors Python).
- Static provider factories: ``Patter.deepgram(...)``, ``Patter.elevenlabs(...)``, ``Patter.whisper(...)``, ``Patter.openaiTts(...)``, ``Patter.cartesia(...)``, ``Patter.rime(...)``, ``Patter.lmnt(...)``.
- ``STTConfig.toDict`` / ``TTSConfig.toDict`` are now optional: plain object literals ``{provider, apiKey, language}`` are accepted everywhere (fallback serialisation is handled via ``sttConfigToDict`` / ``ttsConfigToDict`` helpers).
- ``STTConfig`` gets an ``options`` bag (parity with Python BUG #13).
- ``LocalCallOptions.ringTimeout`` forwarded to Twilio as ``Timeout`` and Telnyx as ``timeout_secs``, plus ``StatusCallbackEvent`` wired so the dashboard sees ringing/no-answer/busy/failed transitions (BUG #6).
- ``AgentOptions.bargeInThresholdMs`` (parity with #20 on Python).
- ``LocalOptions.deepgramKey`` / ``elevenlabsKey`` added as provider-level defaults (parity with Python Patter() kwargs).
- ``Patter.call()`` Twilio branch pre-registers the dialled call with ``metricsStore.recordCallInitiated`` so no-answer / busy / failed attempts still show up in the dashboard.
- ``providers.deepgram(...)`` factory exposes the Deepgram knobs (model / endpointing_ms / utterance_end_ms / smart_format / interim_results) and carries them in ``STTConfig.options``.
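The ``@tool`` adapter described under BUG #21 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SDK's actual decorator: the ``tool`` function here is a hypothetical re-implementation that only covers the two cases named in the commit (pass-through for ``(arguments, call_context)`` handlers, keyword unpacking for typed handlers), and it detects the pass-through case by parameter name.

```python
import asyncio
import functools
import inspect


def tool(fn):
    """Hypothetical sketch: adapt fn to an (arguments, call_context) contract."""
    params = list(inspect.signature(fn).parameters)

    @functools.wraps(fn)  # also exposes the original as adapter.__wrapped__
    async def adapter(arguments: dict, call_context: dict):
        if params == ["arguments", "call_context"]:
            # Handler already matches the executor contract: pass through.
            result = fn(arguments, call_context)
        else:
            # Keep only the arguments the user function declares.
            kwargs = {k: v for k, v in arguments.items() if k in params}
            result = fn(**kwargs)
        # Support both sync and async user functions.
        if inspect.isawaitable(result):
            result = await result
        return result

    return adapter


@tool
async def check_order(order_id: str) -> str:
    return f"order {order_id}"


@tool
def raw_handler(arguments, call_context):
    return sorted(arguments)


# The executor always calls handler(arguments_dict, call_context_dict).
print(asyncio.run(check_order({"order_id": "42", "unused": True}, {})))
print(asyncio.run(raw_handler({"b": 1, "a": 2}, {})))
```

Filtering ``arguments`` down to the declared parameter names means an extra key sent by the model does not raise a ``TypeError`` in the user function.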
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts/providers): voice resolver, Deepgram knobs, TTS streaming resample

TS parity port of Python BUG #11, #13, #23.

- ElevenLabs: ``resolveVoiceId()`` maps display names (rachel, adam, matilda, alloy, …) to the opaque 20-char voice IDs accepted by the /text-to-speech/{voice_id}/stream endpoint. Map mirrors the Python SDK byte-for-byte.
- DeepgramSTT: constructor overloaded to accept ``DeepgramSTTOptions`` (endpointingMs / utteranceEndMs / smartFormat / interimResults / vadEvents) alongside the legacy positional form. Transcript gate loosened to ``is_final OR speech_final`` so short utterances don't wait for Deepgram's utterance_end commit.
- OpenAITTS: streaming 24 kHz → 16 kHz resample now carries state (``carryByte`` + ``leftover`` samples) between chunks so cross-chunk alignment doesn't drift. The legacy ``resample24kTo16k`` static is kept as a thin wrapper around the streaming path for the existing unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ts): Telnyx stack, pipeline hooks/barge-in/dedup, dashboard status, scheduler sync

TS parity port of the Python fixes for BUG #2/#3/#6/#12/#15/#16/#17/#18/#19/#20/#22.

- ``stream-handler.ts``: ``handleAudio`` now runs the ``before_send_to_stt`` hook (#15), transcodes Twilio mulaw 8 kHz → PCM16 16 kHz unconditionally on the pipeline path (#12), and keeps forwarding caller audio during TTS so barge-in can trigger (#20). ``processTranscript`` implements the dedup + 500 ms throttle + hallucination-word blacklist from #22 and flips ``isSpeaking`` + ``sendClear`` on any transcript with text while the agent is speaking (#20).
- ``server.ts``: ``TelnyxBridge.sendAudio`` / ``sendClear`` use the correct ``{event:"media",media:{payload:b64}}`` wire format (#18); the Telnyx WS handler matches ``data.event`` (start / media / stop / dtmf / error / connected) and drops frames where ``media.track !== "inbound"`` before forwarding (#17, #19); the ``/webhooks/telnyx/voice`` route POSTs ``actions/answer`` and ``actions/streaming_start`` via the Call Control REST API and returns empty HTTP 200 (#16). ``TwilioBridge.createStt`` picks linear16 16 kHz when ``provider === 'pipeline'`` so Deepgram doesn't decode already-PCM bytes as mulaw (#12). A new ``/webhooks/twilio/status`` handler consumes Twilio status callbacks and updates the dashboard (#6).
- ``scheduler.ts``: ``scheduleCron`` returns a ``ScheduleHandle`` synchronously (lazy node-cron import happens in the background), parity with Python #4. ``scheduleInterval`` accepts ``{intervalMs}`` or ``{seconds}`` in addition to the legacy positional ms, matching Python ``schedule_interval(seconds=...)``.
- ``fallback-provider.ts``: ``completeStream()`` text-only convenience generator (#2), ``aclose()`` + ``Symbol.asyncDispose`` so ``await using fallback = ...`` parity with Python's ``async with FallbackLLMProvider(...)`` (#5).
- ``dashboard/store.ts``: ``recordCallInitiated`` pre-registers outbound attempts, ``updateCallStatus`` promotes rows through ringing / no-answer / busy / failed and moves terminal states to the completed list (#6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ts): align with 0.4.4 wire-format & provider API changes

- ``providers.test.ts``: toDict now surfaces ``options`` when set, knobs forwarding verified.
- ``types.test.ts``: toDict optional chain covered.
- ``openai-tts.test.ts``: 1-byte input no longer returns the byte verbatim; the streaming resampler stashes it as ``carryByte`` and the stateless wrapper flushes only complete samples, so the test now asserts an empty buffer.
- ``integration/twilio-pipeline.test.ts`` + ``integration/telnyx-pipeline.test.ts``: ``handleAudio`` is now async; tests await it. Telnyx fixture feeds mulaw 8 kHz and asserts the transcoded PCM16 16 kHz lands on the STT mock (BUG #12 + #19).
- ``unit/server-routes.test.ts``: Telnyx webhook tests assert the REST ``actions/answer`` + ``actions/streaming_start`` POSTs and the empty HTTP 200 response (BUG #16).
- ``package-lock.json``: refreshed for the sdk-ts worktree so the ``0.4.3`` → ``0.4.3-worktree`` alignment is consistent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(unit): regression coverage for BUG #6/#22/#23 + ring_timeout (IMP2)

Three new unit test files lock in fixes that previously lived in the acceptance suite as live-call checks:

test_pipeline_dedup.py (13 tests)
- Hallucination blacklist: "you", "thank you", ".", case/punctuation variants, empty-after-strip all drop silently.
- 2-second duplicate window with time.time monkeypatched so parity with the live Whisper feedback loop is deterministic.
- 500 ms back-to-back throttle covering legitimate vs spurious second turns.
- Interim / empty finals must not fire on_transcript.

test_openai_tts_resample.py (7 tests)
- Cross-chunk ratecv state: multi-chunk stream output matches a single-shot resample byte-for-byte.
- Odd-byte boundary: a chunk ending on a dangling byte must not drop the sample.
- Empty / single-byte / tiny chunks must not crash.
- Response is always aclosed on both successful and early-exit paths.

test_twilio_status_and_ring_timeout.py (13 tests)
- /webhooks/twilio/status routes to update_call_status with parsed duration, and survives missing SID, bad duration, and the dashboard-disabled path.
- Twilio signature enforcement on the status endpoint.
- Twilio ring_timeout -> Timeout REST param, Telnyx -> timeout_secs.
- Twilio StatusCallback / StatusCallbackEvent are always registered on outbound calls so BUG #6 cannot regress.
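The three rules exercised by ``test_pipeline_dedup.py`` (filler blacklist, 2-second duplicate window, 500 ms throttle) can be sketched as a small gate. This is a minimal illustration, not the code in ``_stt_loop``: the ``TranscriptGate`` class, its parameter names and the injectable clock are assumptions made for the example, and the filler set lists only the entries named in the commit message.

```python
import string
import time

# Fillers named in the commit message; the real blacklist is longer.
FILLERS = {"you", "thank you", "yeah", "uh", ""}


class TranscriptGate:
    """Hypothetical gate deciding whether a final transcript starts a turn."""

    def __init__(self, clock=time.monotonic, dedup_window_s=2.0, throttle_s=0.5):
        self._clock = clock
        self._dedup_window_s = dedup_window_s
        self._throttle_s = throttle_s
        self._last_text = None
        self._last_time = None

    def accept(self, text: str) -> bool:
        now = self._clock()
        # Normalise case and strip surrounding punctuation / whitespace,
        # so "Thank you." and "." both hit the blacklist.
        norm = text.lower().strip(string.punctuation + string.whitespace)
        if norm in FILLERS:
            return False
        if self._last_time is not None:
            elapsed = now - self._last_time
            # Throttle: too soon after the last committed turn.
            if elapsed < self._throttle_s:
                return False
            # Dedup: same text again inside the duplicate window.
            if norm == self._last_text and elapsed < self._dedup_window_s:
                return False
        self._last_text, self._last_time = norm, now
        return True


# Deterministic usage with a fake clock instead of patching time.
fake_now = [0.0]
gate = TranscriptGate(clock=lambda: fake_now[0])
print(gate.accept("Hello there"))   # first final is committed
fake_now[0] = 1.0
print(gate.accept("hello there."))  # duplicate inside the 2 s window
```

Injecting the clock keeps the windows testable without sleeping, which is the same concern the commit addresses by monkeypatching ``time.time``.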
Full unit suite: 728 passed, 2 skipped.

* docs+ci: latency/provider caveats + audit workflow

README
- Pipeline turn-latency floor documented (~2.0–2.8 s) with per-stage breakdown so users know to switch to `provider="openai_realtime"` for sub-second UX.
- ElevenLabs free-tier library-voice restriction (402) with pointer to `ELEVENLABS_VOICE_ID`.
- Telnyx outbound D38 Outbound Profile requirement.
- Google Gemini free-tier quota=0 caveat.
- Whisper hallucination filter documented.
- `ring_timeout` + status callback description added to call().

.github/workflows/audit.yml (new)
- pip-audit on sdk-py runtime deps.
- npm audit on sdk-ts production deps.
- bandit static analysis with SARIF upload to GitHub Security.
- Runs on dep-manifest changes, weekly schedule, and manual dispatch.
- Findings are advisory-only to keep the pipeline from flaking on upstream CVE churn (telephony stack pulls many C-wrapped libs).

Baseline audit run: npm=0, bandit medium+/high-confidence=0, pip-audit=2 (pytest dev-only + transformers optional-extra only).

* docs(readme): remove local-measured latency numbers from Voice Modes

The millisecond ranges previously listed for each provider came from a single local benchmark run and are neither representative nor a target. Keep the modes table qualitative and replace the per-stage breakdown with a short note that latency is inherited from the chosen providers; no hard numbers we don't want callers anchoring on.

* test(unit): bug coverage gaps: BUG #15/#19/#20

Three new unit test modules fill the remaining coverage gap for the bugs fixed on this branch:

test_pipeline_bargein.py (7 tests), BUG #20
- Interim transcript during TTS triggers send_clear + is_speaking=False.
- record_turn_interrupted is fired on the metrics accumulator.
- send_clear throwing does not crash the STT loop (fail-open).
- No barge-in when the agent is idle or the transcript has no text.
- Final transcripts also trigger the barge-in branch before the downstream LLM turn runs.

test_before_send_to_stt_hook.py (9 tests), BUG #15
- Sync / async hook returning None drops the chunk (zero STT sends).
- Returning modified bytes forwards the new buffer verbatim.
- Hook receives the decoded PCM, not the raw mulaw payload.
- Raising hooks fail-open: original audio still reaches STT.
- Missing hook / hooks instance with before_send_to_stt=None are both bypass paths that must still forward audio.

test_telnyx_track_filter.py (5 tests), BUG #19
- track=inbound forwards, track=outbound drops.
- Missing `track` field defaults to inbound (legacy Telnyx payloads).
- Mixed stream: only inbound frames reach the handler, in order.
- Unknown track values are skipped defensively.

Full unit suite: 749 passed, 2 skipped (+21 from this commit).

* feat(sdk-py): add cartesia/rime/lmnt static factories + vad_events to deepgram

Brings Python SDK to parity with sdk-ts:

- Adds Patter.cartesia / Patter.rime / Patter.lmnt static methods so local-mode users can configure these TTS providers the same way they do in TypeScript.
- Adds the missing vad_events keyword to Patter.deepgram and the patter.providers.deepgram factory. The DeepgramSTT ctor already accepted it, but the public config helper silently dropped the flag.

* chore: bump to 0.4.4

Regression suites re-run after the bump:

- sdk-py: 749 passed, 2 skipped
- sdk-ts: 932 passed (57 test files, including soak)

* fix(ci): integration tests on 0.4.4 wire format + misc hygiene

Addresses the five failing CI checks on PR #66.

Telnyx integration tests (test_telnyx_{convai,pipeline,realtime}.py)
- ``_telnyx_stream_started`` / ``_telnyx_media_event`` / ``_telnyx_stream_stopped`` helpers migrated from the pre-0.4.4 ``{event_type, payload.audio.chunk}`` shape to the real Telnyx media-stream wire format ``{event, start|media.payload}`` (BUG #17/#18). Without this the bridge silently drops every test frame and 11 integration tests fail with "handler called 0 times".
- ``test_audio_format_pcm16`` renamed to ``test_audio_format_g711_ulaw`` and the assertion flipped: Telnyx is PCMU 8 kHz bidirectional (BUG #19), Realtime runs on ``g711_ulaw`` so both legs stay pass-through.

sdk-ts/src/scheduler.ts
- Removed the trailing blank line that broke the pre-commit ``end-of-file-fixer`` hook.

.github/workflows/audit.yml
- Bandit stock CLI doesn't support ``-f sarif``: install ``bandit-sarif-formatter`` alongside bandit, and guard the upload-sarif step with ``hashFiles`` so future formatter breakage doesn't fail the job.

Local verification: 802 passed, 4 skipped (sdk-py unit + integration).

* docs: update SDK reference for 0.4.4 features

- Update version to 0.4.4 in API reference
- Add static factories: cartesia(), rime(), lmnt() for TTS
- Document new agent() parameters: hooks, text_transforms, vad, audio_filter, background_audio, barge_in_threshold_ms
- Add ring_timeout parameter to call() signature
- Document Deepgram tuning options: endpointing_ms, utterance_end_ms, vad_events
- Synchronize Python and TypeScript API documentation for parity

* docs: document barge_in_threshold_ms configuration

Update barge-in feature documentation to reflect new barge_in_threshold_ms parameter (default 300ms). Document how to customize or disable via agent configuration.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
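For reference, the Telnyx media-stream handling covered by BUG #17/#18/#19 in the commits above can be sketched in a few lines. This is a simplified illustration built only from the wire shapes quoted in the commit message (``{"event":"media","media":{"payload":b64}}`` and the ``track`` field); the function names ``inbound_audio`` and ``outbound_frame`` are hypothetical and the real bridge handles more event types.

```python
import base64
import json
from typing import Optional


def inbound_audio(raw: str) -> Optional[bytes]:
    """Return decoded caller audio, or None if the frame should be ignored."""
    msg = json.loads(raw)
    # Media-stream frames are keyed on "event", not "event_type".
    if msg.get("event") != "media":
        return None
    media = msg.get("media", {})
    # A missing track is treated as inbound; anything else is dropped,
    # including the agent's own outbound echo.
    if media.get("track", "inbound") != "inbound":
        return None
    return base64.b64decode(media["payload"])


def outbound_frame(audio: bytes) -> str:
    """Wrap audio bytes in the outbound media frame shape."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"event": "media", "media": {"payload": payload}})


# Round trip: a frame without a track field counts as inbound.
frame = outbound_frame(b"\x7f\x00\xff")
print(inbound_audio(frame))

# An outbound-track frame is filtered out before it reaches the handler.
echo = json.dumps(
    {"event": "media", "media": {"track": "outbound", "payload": "AA=="}}
)
print(inbound_audio(echo))
```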
Summary
Test plan