fix(0.6.1): dashboard live merge + firstMessage barge-in + drain marks (re-base of #89) by nicolotognoni · Pull Request #92 · PatterAI/Patter

nicolotognoni · 2026-05-12T14:42:49Z

Summary

Replacement for PR #89 — rebased on top of feat/observability-otel-attrs-0.6.1 HEAD with CHANGELOG conflict resolved (entries land directly under ## 0.6.1 (2026-05-12)).

Implementation

Dashboard SSE reducer immutability: mergeCallPreserving preserves existing call state when a new call arrives.
Twilio firstMessage barge-in: per-chunk mark gating; outbound buffer no longer blocks interruption of the first message.
Drain pending marks on cleanup / handleStop / handleWsClose to prevent orphan futures/promises.
Reset _first_message_mark_counter per send and on cleanup (no stale fm_N matching across turns/calls).
Dashboard mergeCalls: cap at MAX_UI_CALLS=500, sort by startedAtMs desc.
onMark TS now updates lastConfirmedMark only when the mark matches a queued entry (parity with Python).

Test plan

Python: pytest tests/unit — 1262 passed, 5 skipped
TypeScript: npm test — 1506 passed
Dashboard: npm test — 8 passed
TypeScript: tsc --noEmit — clean

…stream

`mergeCallPreserving` in `dashboard-app/src/hooks/useDashboardData.ts` rebuilt the calls array from the server snapshot via `next.map(...)`, so any call present in the previous UI state but missing from the next payload was silently dropped. With back-to-back calls, the SSE `call_start` refresh occasionally landed before the prior call propagated to `/api/dashboard/calls` and the row vanished from the SPA (regression reported as #124).

The merge is now a true upsert: rows present in `prev` but absent from `next` are appended, so prior calls stay visible until the server snapshot stabilises. Server-side eviction (ring buffer of 500) bounds long-running sessions.

Pure merge helpers extracted to `dashboard-app/src/hooks/mergeCalls.ts` and exercised by `dashboard-app/src/hooks/mergeCalls.test.ts` (added Vitest to the SPA so the helpers can be tested in isolation without a React harness).

Refs #124.

The firstMessage TTS chunks were pushed into the carrier WebSocket as fast as the provider yielded them. Twilio's outbound buffer ended up several seconds deep, and a barge-in's sendClear was queued behind the already-enqueued media frames, so the agent kept talking on the user's earpiece for up to ~2 s after the user spoke (#128).

The firstMessage send path is now a paced loop:

* Twilio: every chunk is followed by a unique mark; the loop waits for the oldest unconfirmed mark once FIRST_MESSAGE_MARK_WINDOW (3 chunks ≈ 120 ms) are in flight. ``onMark`` drains the FIFO on echo so the next chunk goes out. ``cancelSpeaking`` (Py: ``_run_barge_in_cancel``) resolves every pending mark waiter so the loop exits on the next tick and ``sendClear`` lands on a near-empty carrier buffer.
* Telnyx (no mark concept): the loop falls back to a playout-duration-based sleep so the buffer can't out-run a clear by more than one chunk.

Both SDKs stay in parity: TS ``sendPacedFirstMessageBytes`` mirrors Py ``_send_paced_first_message_bytes`` and both ``streamPrewarmBytes`` / ``_stream_prewarm_bytes`` delegate to the new helper. The existing prewarm chunking test was updated to echo marks via the mock bridge so it interoperates with the new pacing.

Coverage:

* libraries/typescript/tests/unit/stream-handler.test.ts: ``firstMessage mark-gated pacing`` (3 cases: window cap + barge-in, mark echo slides window, Telnyx playout pacing).
* libraries/python/tests/unit/test_first_message_pacing.py: 4 cases including FIFO mark resolution.

Refs #128.
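The Twilio branch of the paced loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's code: the class name `PacedFirstMessageSender`, its constructor arguments, and the `cancel` method are hypothetical stand-ins for `_send_paced_first_message_bytes`, `on_mark`, and the barge-in cancel path named above.

```python
import asyncio
from collections import deque

FIRST_MESSAGE_MARK_WINDOW = 3  # chunks allowed in flight, as stated in the description


class PacedFirstMessageSender:
    """Hypothetical stand-in for the handler's paced firstMessage path."""

    def __init__(self, send_audio, send_mark):
        self._send_audio = send_audio  # assumed: async callable taking bytes
        self._send_mark = send_mark    # assumed: async callable taking a mark name
        self._pending = deque()        # FIFO of (mark_name, Future)
        self._counter = 0
        self._cancelled = False

    async def send(self, chunks):
        self._counter = 0
        self._cancelled = False
        loop = asyncio.get_running_loop()
        for chunk in chunks:
            if self._cancelled:
                break
            await self._send_audio(chunk)
            self._counter += 1
            name = f"fm_{self._counter}"
            self._pending.append((name, loop.create_future()))
            await self._send_mark(name)
            # Window full: wait for the oldest unconfirmed mark before sending more.
            if len(self._pending) >= FIRST_MESSAGE_MARK_WINDOW:
                await self._pending[0][1]

    def on_mark(self, name):
        # The carrier echoed a mark: release the head of the FIFO if it matches.
        if self._pending and self._pending[0][0] == name:
            _, fut = self._pending.popleft()
            if not fut.done():
                fut.set_result(True)

    def cancel(self):
        # Barge-in: resolve every waiter so the loop exits at its next check.
        self._cancelled = True
        while self._pending:
            _, fut = self._pending.popleft()
            if not fut.done():
                fut.set_result(False)
```

With a carrier that never echoes marks, the loop stalls after three chunks, so a clear issued at that point has at most the window's worth of audio queued ahead of it.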

The firstMessage paced sender accumulates one mark waiter (asyncio.Future on Python / Promise on TS) per chunk in _pending_marks / pendingMarks while audio is streaming to the carrier. The barge-in cancel path already drained these, but a call that ended without going through cancel (carrier WebSocket drop, hangup mid firstMessage, stop event arriving before the paced sender finished) left every queued future unresolved. The send loop was awaiting them, so the orphan futures leaked until the handler itself was garbage-collected.

Fix: PipelineStreamHandler.cleanup (Py) now invokes _drain_pending_marks before tearing down adapters; the TS handleStop and handleWsClose do the equivalent via drainPendingMarks(). Idempotent and safe when the queue is already empty.

Added regression coverage:

- libraries/python/tests/unit/test_first_message_pacing.py (TestCleanupDrainsPendingMarks)
- libraries/typescript/tests/unit/stream-handler.test.ts (cleanup drains pending firstMessage marks — handleStop + handleWsClose)

PipelineStreamHandler._first_message_mark_counter (Py) and StreamHandler.firstMessageMarkCounter (TS) were never reset between turns or calls. With handler re-use, the counter incremented monotonically across turns: a paced send for the second turn issued fm_<previous_count + 1> while the carrier could still be echoing a stale fm_<N> from the previous turn, corrupting FIFO matching in on_mark / onMark.

Fix: reset the counter to 0 at the top of _send_paced_first_message_bytes (Py) / sendPacedFirstMessageBytes (TS) so each paced send begins a fresh fm_1, fm_2, ... sequence. Also reset on cleanup (PipelineStreamHandler.cleanup Py, handleStop + handleWsClose TS) as a belt-and-braces measure at the cross-call boundary.

Coverage:

- libraries/python/tests/unit/test_first_message_pacing.py (TestFirstMessageMarkCounterReset: per-send reset + cleanup reset)
- libraries/typescript/tests/unit/stream-handler.test.ts (firstMessage mark counter resets across sends + on cleanup)
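The reset points can be illustrated with a small sketch. `MarkCounterSketch` and `send_paced` are hypothetical; only the counter attribute name and the `fm_N` naming scheme are taken from the text above.

```python
class MarkCounterSketch:
    """Hypothetical holder for the firstMessage mark counter."""

    def __init__(self):
        self._first_message_mark_counter = 0

    def send_paced(self, chunk_count):
        # Reset at the top of every paced send so names restart at fm_1.
        self._first_message_mark_counter = 0
        names = []
        for _ in range(chunk_count):
            self._first_message_mark_counter += 1
            names.append(f"fm_{self._first_message_mark_counter}")
        return names

    def cleanup(self):
        # Second reset at the call boundary, in case the handler is re-used.
        self._first_message_mark_counter = 0
```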

mergeCallPreserving in dashboard-app/src/hooks/mergeCalls.ts preserved prev_only calls indefinitely by appending them after the fresh snapshot block. Two consequences on a long-lived session:

1. The UI array grew unbounded: once the session cycled through more than 500 calls (the server-side MetricsStore ring buffer default), rows the server had already evicted stayed pinned by prev and were re-appended on every refresh.
2. Ordering was non-deterministic: prev_only rows always landed at the bottom regardless of their startedAtMs, so a newer call could end up below an older one if the snapshot ordering shifted.

Fix: after the upsert pass, sort the merged list by startedAtMs descending and slice to MAX_UI_CALLS = 500 so the SPA mirrors the server ring buffer.

Coverage: dashboard-app/src/hooks/mergeCalls.test.ts adds a 600-prev+1-fresh cap test and an explicit startedAtMs ordering test.
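The upsert, sort, and cap sequence can be sketched as below. The real helper is TypeScript in `mergeCalls.ts`; this is a Python rendering of the logic only, and the `id` key and the `merge_calls` name are assumptions (only `startedAtMs` and `MAX_UI_CALLS` appear in the description).

```python
MAX_UI_CALLS = 500  # mirrors the server-side ring buffer size named above


def merge_calls(prev, nxt, max_calls=MAX_UI_CALLS):
    """Upsert `nxt` over `prev`, then sort newest-first and cap the result.

    Rows are dicts with an assumed "id" key and a "startedAtMs" key.
    """
    merged = {call["id"]: call for call in prev}
    for call in nxt:
        merged[call["id"]] = call  # the fresh snapshot wins for rows present in both
    ordered = sorted(merged.values(), key=lambda c: c["startedAtMs"], reverse=True)
    return ordered[:max_calls]
```

Because the sort runs before the slice, the rows dropped by the cap are always the oldest ones, whichever input they came from.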

…with Python)

StreamHandler.onMark in libraries/typescript/src/stream-handler.ts unconditionally assigned this.lastConfirmedMark = markName before checking whether the name corresponded to a queued mark. Any echo arriving after the queue was drained, or any mark name emitted by adapters outside the firstMessage queue, would overwrite the handler-level field and contaminate downstream barge-in heuristics gated on lastConfirmedMark.

Python stream_handler.py's on_mark never touches a handler-level field at all; the equivalent state lives on TwilioAudioSender.last_confirmed_mark and is updated only by the carrier's own echo handler. The TS path now matches that behaviour defensively: lastConfirmedMark is updated only after the queue lookup confirms a matching entry, mirroring the safer Python semantics.

Coverage: libraries/typescript/tests/unit/stream-handler.test.ts (onMark only updates lastConfirmedMark on a matched mark) asserts that an unmatched echo cannot clobber a previously-set value.
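A small sketch of the "update only on a matched mark" rule. `MarkTracker` is a hypothetical class, and matching anywhere in the queue (rather than only at its head) is an assumption of this sketch, not something the description specifies.

```python
from collections import deque


class MarkTracker:
    """Hypothetical tracker for queued marks and the last confirmed one."""

    def __init__(self):
        self.pending = deque()          # queued mark names, oldest first
        self.last_confirmed_mark = None

    def queue(self, name):
        self.pending.append(name)

    def on_mark(self, name):
        # Look the name up first; unmatched echoes must not touch the field.
        if name not in self.pending:
            return False
        self.pending.remove(name)
        self.last_confirmed_mark = name
        return True
```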

* feat(observability): patter.* OTel span attributes (Python only) + 0.6.1 release

Ports the observability work from the now-closed PR #82 onto the post-refactor `libraries/python/` layout. PR #82 was authored against the legacy `sdk-py/` paths and was consolidated into the 0.6.0 release branch; this commit lands the actual implementation against the new layout for 0.6.1.

What it adds:

- `getpatter.observability.attributes`: three new helpers, `record_patter_attrs(attrs)`, the `patter_call_scope(call_id, side)` context manager, and `attach_span_exporter(patter, exporter, side)`. Lazy-OTel-guarded; no-op when the `[tracing]` extra is not installed. Two ContextVars (`patter.call_id`, `patter.side`) propagate through the asyncio task tree so spans emitted by deeply nested provider code inherit the active call's identity automatically.
- `Patter._attach_span_exporter(exporter, *, side="uut")`: public-but-underscore hook for tools that observe Patter from outside (e.g. an out-of-process agent runner).
- Per-provider cost emission across 19 surfaces: `patter.cost.{telephony_minutes, stt_seconds, tts_chars, llm_input_tokens, llm_output_tokens, realtime_minutes}` stamped on the active span. Provider tag emitted alongside as `patter.{telephony,stt,tts,llm,realtime}.provider`. All call sites wrapped in defensive try/except so observability cannot kill a live call.
- Per-turn latency: `patter.latency.{ttfb_ms, turn_ms}` stamped from `StreamHandler._emit_turn_metrics` via a new `PipelineHookExecutor.record_turn_latency(*, ttfb_ms, turn_ms)`.
- Bridge-level `patter_call_scope` entry on Twilio + Telnyx: entire WebSocket bridge lifetime (incl. hangup/cleanup) bound to the call identity via `contextlib.ExitStack`.
- `TwilioAdapter.record_call_end_cost` / `TelnyxAdapter.record_call_end_cost`: adapter helpers used by the bridge to emit `patter.cost.telephony_minutes` once wall-clock duration is known.

Versions bumped 0.6.0 → 0.6.1 in `__init__.py`, `pyproject.toml`, `package.json`.
CHANGELOG entry added under a new `## 0.6.1 (2026-05-09)` block; the existing `## 0.6.0 (2026-05-08)` block is preserved verbatim, since it reflects exactly what was published to PyPI and npm at that tag.

⚠️ TS parity gap: Python only. TypeScript follow-up tracked separately. This is a known time-boxed exception per `.claude/rules/sdk-parity.md`.

5 new unit tests in `libraries/python/tests/unit/test_observability_attributes_unit.py` exercise the helper module's public surface (`patter_call_scope`, `record_patter_attrs` no-op, `attach_span_exporter` side stamping). Full Python suite: 1719 passed, 7 skipped (green).

Refs: closed PR #82.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dashboard,pipeline): hydrate cost/latency from top-level + barge-in gate from first audio

Two bugs caught during 0.6.0 acceptance against `releases/0.6.0/typescript/matrix/outbound-cartesia-cerebras-elevenlabs.ts`:

1. **Dashboard hydrate schema mismatch**: `CallLogger.log_call_end` writes `cost`/`latency`/`duration_ms`/`telephony_provider` as top-level keys of `metadata.json`, but `MetricsStore.hydrate` looked for them under `meta.metrics.cost`/`meta.metrics.latency`. Every hydrated row landed with `metrics=null`, so cost/latency rendered as `$0.00`/`—` for all on-disk calls (only the in-flight call had real numbers). Fix synthesizes a `metrics` dict from the top-level fields when `meta.metrics` is absent while preserving any explicit `meta.metrics` payload untouched.
2. **Early barge-in self-cancellation**: cloud TTS first-byte latency is 200–700 ms; the 250 ms anti-flicker gate (no-AEC PSTN default) was anchored on `_speaking_started_at`/`speakingStartedAt` and expired BEFORE TTS produced audio. VAD then picked up background noise and self-cancelled the agent's first turn: 0 bytes emitted, line silent.
Fix anchors the gate on a new `_first_audio_sent_at`/`firstAudioSentAt` set AFTER `bridge.sendAudio` / `audio_sender.send_audio` succeeds at the four pipeline emit sites (firstMessage, streaming, regular, WebSocket remote). `_can_barge_in`/`canBargeIn` returns false while the marker is null. Gate values (250 ms / 1000 ms) unchanged — only the anchor moves. Tests: - Py 1717/1717, TS 1394/1394 green; lint clean. - New regressions: `test_hydrate_lifts_top_level_cost_and_latency_into_metrics`, `test_hydrate_preserves_explicit_metrics_when_present`, `test_barge_in_suppressed_before_first_audio_emitted` (Py) + parity TS cases in `tests/dashboard-store.test.ts` and `tests/unit/stream-handler.test.ts`. - Existing `_handle_barge_in`/`handleBargeIn` tests updated to set both timestamps for the new contract. * feat(barge-in): opt-in confirmation strategies (MinWords reference impl) Cloud TTS first-byte latency (200-700 ms) plus PSTN background noise mean the legacy "any VAD speech_start cancels the agent" contract produced frequent false-positive cancels — cough, click, HVAC, breath, or a quick "okay" cut the agent mid-sentence and lost the conversational thread. This PR adds an opt-in two-stage confirmation pipeline. With the new empty-tuple default behaviour is unchanged. Configure ``Agent.barge_in_strategies`` / ``agent.bargeInStrategies`` to enable: 1. VAD speech_start during TTS marks the barge-in PENDING. TTS keeps streaming naturally — the LLM stream stays alive. 2. Each STT transcript is evaluated by every configured strategy (short-circuit OR; per-strategy errors are isolated). 3. First strategy that returns True confirms the cancel: runs the existing send_clear + flush ring + LLM abort sequence. 4. If no strategy confirms within ``barge_in_confirm_ms`` (default 1500 ms) the pending state is dropped and the agent finishes its sentence. 
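The four steps of the two-stage confirmation can be sketched as follows. Everything here is illustrative: `MinWords`, `evaluate_any`, and `PendingBargeIn` are hypothetical simplifications, the `evaluate(transcript, agent_speaking)` signature is an assumption, and the injectable `clock` exists only to make the timeout easy to exercise. The empty-tuple legacy path (immediate cancel on VAD) is not modelled.

```python
import asyncio
import time


class MinWords:
    """Simplified stand-in: require N words while the agent speaks, 1 otherwise."""

    def __init__(self, min_words=3):
        self.min_words = min_words

    async def evaluate(self, transcript, agent_speaking):
        needed = self.min_words if agent_speaking else 1
        return len(transcript.split()) >= needed


async def evaluate_any(strategies, transcript, agent_speaking):
    # Short-circuit OR; an exception in one strategy does not block the others.
    for strategy in strategies:
        try:
            if await strategy.evaluate(transcript, agent_speaking):
                return True
        except Exception:
            continue
    return False


class PendingBargeIn:
    def __init__(self, strategies, confirm_ms=1500, clock=time.monotonic):
        self.strategies = strategies
        self.confirm_ms = confirm_ms
        self.clock = clock
        self.pending_since = None

    def on_speech_start(self):
        # Step 1: VAD speech_start marks the barge-in pending; TTS keeps streaming.
        self.pending_since = self.clock()

    async def on_transcript(self, text, agent_speaking=True):
        if self.pending_since is None:
            return False
        # Step 4: past the confirmation window, drop the pending state.
        if (self.clock() - self.pending_since) * 1000 > self.confirm_ms:
            self.pending_since = None
            return False
        # Steps 2-3: the first confirming strategy triggers the cancel.
        if await evaluate_any(self.strategies, text, agent_speaking):
            self.pending_since = None
            return True  # caller would run the clear + flush + LLM abort sequence
        return False
```

In this sketch a single "okay" leaves the barge-in pending, a three-word transcript confirms it, and a transcript arriving after the window is ignored.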
New module ``getpatter.services.barge_in_strategies`` exposes: - ``BargeInStrategy`` Protocol (async ``evaluate`` + optional ``reset``) - ``MinWordsStrategy`` — filters short backchannels by requiring N words while the agent is speaking and letting any single word through while the agent is silent (so the first user turn is never delayed). - ``evaluate_strategies`` / ``reset_strategies`` helpers. TS parity in ``src/services/barge-in-strategies.ts`` with the same public surface (``MinWordsStrategy``, ``BargeInStrategy`` interface, ``evaluateStrategies``/``resetStrategies``). Wiring lives in stream_handler.py ``_handle_barge_in`` and stream-handler.ts ``handleBargeIn`` — both keep the existing canBargeIn gate (firstAudioSentAt anchor) and only add the strategy check when at least one strategy is configured. Tests: - Py: 1741/1741 green; new ``test_barge_in_strategies.py`` (14) + ``test_barge_in_two_stage.py`` (10). - TS: 1419/1419 green; new ``barge-in-strategies.test.ts`` (15) + ``barge-in-two-stage.test.ts`` (10). Lint clean. - Existing barge-in regression suites still pass byte-for-byte: empty strategies preserve legacy behaviour exactly. CHANGELOG ``## Unreleased`` updated with full design + file list. * feat(0.6.1): cost split + STT methodology fix + prewarm + 13 review fixes Three user-visible features plus a hardening sweep from a 5-agent code review covering security, billing safety, race conditions, and resource leaks. ## Features ### Dashboard cost panel: STT and TTS as separate rows The cost breakdown previously combined STT and TTS into one "STT / TTS" line, hiding which side dominated cost. Now rendered as two adjacent rows labelled with the actual provider name (e.g. "Cartesia STT" / "ElevenLabs TTS"), driven by ``record.metrics.stt_provider`` / ``tts_provider`` already exposed by the backend. Files: ``dashboard-app/src/components/CostPanel.tsx``, ``dashboard-app/src/lib/mappers.ts``. 
### stt_ms is now finalization-only (BREAKING semantic change) Previously ``LatencyBreakdown.stt_ms`` measured ``stt_complete - turn_start`` — which conflated user speech duration with STT processing. A 5 s utterance produced ``stt_ms ≈ 5000`` even when Cartesia/Deepgram finalized in 200 ms after end-of-speech. Industry benchmarks (Picovoice/Deepgram/Gladia/Speechmatics) all report STT latency as the finalization window: ``final_transcript - end_of_speech``. ``stt_ms`` now matches that definition. New optional field ``user_speech_duration_ms`` carries the displaced "how long did the user speak" number. Files: ``libraries/python/getpatter/models.py``, ``libraries/python/getpatter/services/metrics.py``, ``libraries/typescript/src/metrics.ts``. ### Pre-warm services + pre-synth firstMessage ``Agent.prewarm: bool = True`` (default on) warms STT/TTS/LLM provider connections in parallel with carrier ``initiate_call`` so DNS, TLS, HTTP/2 / WebSocket handshakes are complete by the time the callee answers. Concrete ``warmup()`` overrides shipped on Deepgram / Cartesia / AssemblyAI STT, ElevenLabs WS / Cartesia / Inworld TTS, OpenAI Realtime. ``Agent.prewarm_first_message: bool = False`` (opt-in) pre-renders ``first_message`` to TTS bytes during ringing and streams the cached buffer instantly when the carrier emits ``start`` — eliminates 200-700 ms of TTS first-byte latency on the greeting at the cost of paying TTS even when the call isn't answered (logged at WARN level when wasted). ## Review fixes (12 issues from 5-agent multi-perspective review) ### Provider warmup correctness - 🔴 OpenAI Realtime warmup uses ``session.update`` (not the non-spec ``response.create`` with ``generate:false`` which could silently bill tokens or return ``invalid_request_error``). Files: ``providers/openai_realtime.py``, ``providers/openai-realtime.ts``. - 🟡 ElevenLabs WS warmup BOS frame now mirrors the live ``synthesize`` BOS byte-for-byte (``voice_settings`` + ``generation_config``). 
Shared helper ``_build_bos_frame`` / ``buildBosFrame``. Verified billing-safe via no ``flush:true``, no real text. Files: ``providers/elevenlabs_ws_tts.py``, ``providers/elevenlabs-ws-tts.ts``. - 🟡 Inworld TTS warmup uses ``GET /tts/v1/voices`` instead of ``HEAD`` against POST-only stream endpoint (was returning 405 in audit logs). - 🟡 Cartesia STT + AssemblyAI STT warmup error logs no longer leak the API key — catches ``WSServerHandshakeError`` specifically and logs only the HTTP status code, never ``str(exc)`` (which embeds the URL). ### StreamHandler / barge-in correctness - 🟠 Double ``record_overlap_start`` on strategy-confirmed barge-in fixed: VAD start path now stamps T1, the strategy-confirm path no longer overwrites with T2 — ``detection_delay_ms`` is now correct for every user opting into ``barge_in_strategies``. Files: ``stream_handler.py:_do_cancel_for_barge_in``, ``stream-handler.ts:runBargeInCancel``. - 🟠 Pending barge-in task leak fixed: ``cleanup`` (Py) / ``handleStop`` + ``handleWsClose`` (TS) now call ``_clear_pending_barge_in`` so a call ending mid-pending no longer leaves an asyncio.Task / setTimeout firing on a finalized handler. - 🟢 Pre-warm bytes now chunked (1280 B / 40 ms) before ``audio_sender.send_audio`` so barge-in mid-greeting can flush cleanly via the existing mark/clear bookkeeping. ### Patter client + cache hardening - 🟠 Cache eviction on abnormal hangup: the Twilio status callback (``no-answer`` / ``busy`` / ``failed`` / ``canceled``) and the Telnyx ``call.hangup`` / AMD-machine paths now call ``_record_prewarm_waste`` so memory doesn't leak proportional to no-answer rate. - 🟠 Race start-vs-prewarm fixed: a ``_prewarm_consumed`` set tracks consumed call_ids so a late-arriving prewarm task drops its bytes instead of orphaning them in the cache. - 🟡 ``disconnect()`` now cancels in-flight prewarm tasks and clears the cache (no spend leak across serve/disconnect cycles). 
- 🟡 ``prewarm_first_message=True`` on Realtime / ConvAI mode now logs a WARN and skips the spawn (was silently paying TTS for bytes the StreamHandler never consumed). - 🟡 Prewarm cache bounded at 200 entries with TTL-based eviction (``ring_timeout + 5 s``) — caps memory under outbound flood scenarios. ### Documentation - Docstring for ``Agent.barge_in_strategies`` corrected: TTS continues streaming naturally during pending state (was misleadingly described as "paused"). ## Tests 47 new regression tests across 4 new files plus updates to existing suites. Verifies every fix above with authentic mocks at the network boundary only: - ``libraries/python/tests/test_prewarm.py`` (new — 28 tests covering default flag values, no-op default ``warmup``, all-three-providers warmup invocation, opt-out, exception swallow, cache populate / skip / empty-message / timeout, one-shot pop, waste-warn log, StreamHandler cache-hit short-circuit + cache-miss live-TTS fallback, race orphan, disconnect cleanup, cap+TTL eviction, provider-mode validation, chunking). - ``libraries/python/tests/unit/test_provider_warmup.py`` (new — 18 tests covering all 7 concrete ``warmup()`` overrides + billing-safety regressions + key-leak regressions). - ``libraries/typescript/tests/unit/prewarm.test.ts`` (new — 23 TS twins). - ``libraries/typescript/tests/unit/provider-warmup.mocked.test.ts`` (new — 19 TS twins). - Updates to ``test_barge_in_two_stage.py`` (3 ``record_overlap_start`` tests + 2 cleanup tests), ``barge-in-two-stage.test.ts`` (4 TS twins), ``server-routes.test.ts`` (2 status-callback eviction tests). 
## Verification - Python: 1797 passed, 7 skipped, 0 failed (was 1707 + 14 prewarm + 76 inherited from new subclass collection-tests) - TypeScript: 1467 passed across 83 files (was 1430 + 37 new) - TypeScript ``tsc --noEmit`` (lint): clean - TypeScript ``tsup build`` (ESM + CJS + dts + CLI): clean ## CHANGELOG All entries under ``## 0.6.1 (2026-05-09)`` with file paths, line numbers, rationale, and test paths. * fix(0.6.1): WS handoff prewarm + dashboard regressions + first-turn latency Live PSTN smoke tests against ``outbound-cartesia-cerebras-elevenlabs.ts`` exposed several issues in 0.6.1 that were not caught by the unit suite. This commit ships seven fixes plus three quick wins on top of the prewarm pipeline. ## Architectural — WebSocket handoff for prewarm (replaces open-then-close) The 0.6.1 prewarm pipeline as previously shipped (commit ``c585f6d``) opened a streaming-STT and streaming-TTS WebSocket during the carrier ringing window, idled ~250 ms, and closed it. Investigation showed the strategy is structurally insufficient on Node: the ``ws`` package does not thread a TLS session ticket across separate ``new WebSocket(...)`` constructions, so every fresh ``connect()`` at call pickup pays full TCP+TLS+HTTP-101 upgrade. Net saved time was 50–250 ms (DNS cache only) versus 700–1500 ms of cold-start budget. Live test reported "several seconds" first-turn latency, p95 3048 ms. The new strategy keeps the warmed WS open and hands it off to the ``StreamHandler`` at call pickup. New API surface: - ``Patter._prewarmedConnections: Map<callId, ParkedProviderConnections>`` (TS) / ``self._prewarmed_connections: dict[str, ParkedProviderConnections]`` (Py) — keyed by carrier-issued ``call_id``, populated during ringing, drained on call end or after a 30 s safety TTL. - ``provider.openParkedConnection()`` / ``open_parked_connection()`` — added to ``CartesiaSTT``, ``ElevenLabsWebSocketTTS``, ``OpenAIRealtimeAdapter``. 
Opens the WS, sends the same initial config the live ``connect()`` sends (STT: empty config; TTS: BOS frame matching ``synthesize`` BOS byte-for-byte; Realtime: ``session.update``), and returns a handle the caller parks. - ``provider.adoptWebSocket(handle)`` / ``adopt_websocket(handle)`` — added to the same three providers. Accepts a pre-opened WS, validates ``readyState === OPEN``, and proceeds with the live message loop. For ElevenLabs WS TTS the handle carries a ``bosAlreadySent: true`` flag so the first ``synthesizeStream`` iteration does not double-send BOS (which would be a protocol error). - ``StreamHandler`` checks ``client.popPrewarmedConnections(callId)`` before falling back to fresh ``connect()``. On adopt, the path skips TCP+TLS+upgrade and the BOS round-trip — STT connects in 0 ms, TTS in 0 ms. Cleanup wiring: the same status callback paths that already drain the prewarm-audio cache (FIX #91) now also close any parked WS for failed calls (no-answer / busy / failed / canceled / AMD-machine). The 30 s TTL covers the rare carrier path that emits neither ``start`` nor a status callback. Live validation against ``outbound-cartesia-cerebras-elevenlabs.ts``: ``[PREWARM] callId=… provider=stt ms=769`` followed by ``[CONNECT] callId=… provider=stt source=adopted ms=0`` — STT connect went from 150–400 ms to 0 ms. First-turn greeting wire-time dropped from "several seconds" to **990 ms**. Files: ``libraries/typescript/src/client.ts`` (cache + ``parkProviderConnections``, ``popPrewarmedConnections``, ``closePrewarmedConnections``, ``ParkedProviderConnections`` interface, ``closeParkedConnections`` helper); ``libraries/typescript/src/server.ts`` (forwards ``popPrewarmedConnections`` into ``StreamHandlerDeps``); ``libraries/typescript/src/stream-handler.ts`` (adopt-or-connect logic); ``libraries/typescript/src/providers/{cartesia-stt,elevenlabs-ws-tts,openai-realtime}.ts`` (park + adopt API surface). 
Python parity in ``libraries/python/getpatter/{client,server,stream_handler,telephony/twilio,telephony/telnyx}.py`` and ``libraries/python/getpatter/providers/{cartesia_stt,elevenlabs_ws_tts,openai_realtime}.py``. Realtime mode has the API surface but the ``OpenAIRealtimeStreamHandler`` adoption is deferred to a follow-up — pipeline mode dominates the affected use case. ## Quick wins (parallel to WS handoff, smaller individual savings) - **Eager AEC import on ``Patter.serve()``** (gated on ``agent.echo_cancellation=true``). Was previously a lazy ``await import('./audio/aec')`` on first ``start`` event, paying 150–400 ms JIT on the first call. Files: ``libraries/typescript/src/client.ts``, ``libraries/python/getpatter/client.py``. - **Parallel ``stt.connect()`` and TTS-firstMessage kickoff**. Previously the StreamHandler awaited STT before TTS firstMessage — STT does not need to be ready to send firstMessage out, only to receive caller audio. Now both kick off concurrently. Saves 200–400 ms on the first turn. Files: ``libraries/typescript/src/stream-handler.ts``, ``libraries/python/getpatter/stream_handler.py``. - **Timing instrumentation**: new ``[PREWARM]`` and ``[CONNECT]`` INFO logs in the prewarm spawn and provider connect paths, with elapsed-ms per provider. Lets us A/B-test future prewarm changes with numerical evidence rather than perceptual reports. ## Dashboard fixes (third pass — issues found during the round-2 PSTN test) ### Live transcript shows only one turn at a time (BUG #102) ``MetricsStore.recordTurn`` correctly accumulated turns into ``active.turns[]`` but the frontend ``toUiTranscript`` mapper had two paths: a primary keyed on ``record.transcript.length > 0`` (used for completed calls) and a fallback that derived rows from ``record.turns``. For an in-flight call the primary always returned empty (active records never carried ``transcript[]``) and only the fallback rendered, so the two paths diverged. 
Each ``recordTurn`` now mirrors the round-trip into a flat ``active.transcript`` array (one user entry + one assistant entry per turn, filtering empty ``user_text`` and the ``[interrupted]`` agent sentinel), so the primary path sees the same accumulating ``user → assistant → user → assistant → …`` history live calls and completed calls both expose. Files: ``libraries/typescript/src/dashboard/store.ts``, ``libraries/typescript/tests/dashboard-store.test.ts`` (5 new authentic tests). ### Transcript disappears after call end (BUG #101) The Twilio status callback for ``CallStatus=completed`` fires a beat before the WS ``stop`` frame, so ``MetricsStore.updateCallStatus`` moved the active record into the completed buffer **without preserving ``turns[]`` or ``transcript[]``**. The subsequent ``recordCallEnd`` overwrote that completed entry, but in the gap any ``useTranscript`` fetch returned a record with no transcript and the live pane went blank. Three-point fix: (a) ``updateCallStatus`` terminal branch now copies ``active.turns`` and ``active.transcript`` into the new completed entry; (b) ``recordCallEnd`` falls back to active/existing transcript when ``data.transcript`` is empty; (c) the ``useTranscript`` hook subscribes to ``call_end`` SSE events (independent of ``isLive``) so the pane refetches the moment ``recordCallEnd`` lands the SDK-authoritative ``history.entries``. Files: ``libraries/typescript/src/dashboard/store.ts``, ``dashboard-app/src/hooks/useTranscript.ts``. ### Sparkline tooltip generic / wrong metric (BUG #104) The metric-tile sparkline tooltip rendered ``"N call(s)"`` plus a per-call sample list regardless of which card it was attached to — the latency and spend cards therefore showed the same headline as the calls card. 
New ``MetricKind`` prop (``'count' | 'latency' | 'spend'``) threaded through ``Metric`` → ``SparkBar`` → ``SparkTooltip``, with a pure ``bucketHeadline(bucket, kind)`` helper that computes per-card aggregates: ``TOTAL COST $X.XXX`` (sum of per-call cost), ``AVG LATENCY <p95-mean> ms`` (mean of per-call P95), or ``N CALL(S)``. Headline label uppercased, monospace, styled to match the existing time-range header on the same tooltip. Files: ``dashboard-app/src/App.tsx``, ``dashboard-app/src/components/Metric.tsx``, ``dashboard-app/src/styles/dashboard.css``. ### caller / callee never persisted to metadata.json (BUG B from the second pass) Every persisted ``metadata.json`` showed ``"caller": ""``, ``"callee": ""`` for completed calls — only the in-memory ``MetricsStore`` had the right values. The persist layer received empty strings because the ``CallLogger.log_call_end`` data shape was built from agent options rather than the live record. ``server.ts`` ``wrappedStart`` now resolves ``caller``/``callee`` from the active store record before persisting; Python ``record_call_start`` parity fix stops clobbering caller/callee with empty strings on the upgrade-from-initiated path (TS already had the right pattern). ### Call disappears from dashboard after end (BUG C from the second pass) Race-induced duplicate row: Twilio's status callback for ``CallStatus=completed`` fires ~50–200 ms before the WS ``stop`` frame. ``updateCallStatus`` moved the row out of ``activeCalls`` into ``calls[]`` correctly, then the WS ``stop`` drove ``recordCallEnd``, ``activeCalls.get(callId)`` returned undefined, and a duplicate entry was pushed with ``started_at = 0`` and empty caller/callee. The duplicate masked the well-formed earlier row and the 24h window filter excluded it. ``recordCallEnd`` / ``record_call_end`` now searches ``calls[]`` for the existing entry when active is gone and **updates in place**, preserving caller/callee/started_at and merging in the just-collected metrics. 
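The update-in-place behaviour for the status-callback-before-stop race can be sketched like this. `StoreSketch` is a hypothetical, heavily reduced model of the metrics store: the field names and method signatures are assumptions, and only the control flow (search completed calls when the active record is gone, then update rather than append) follows the description.

```python
class StoreSketch:
    """Hypothetical reduced model of the dashboard metrics store."""

    def __init__(self):
        self.active = {}  # call_id -> record for in-flight calls
        self.calls = []   # completed calls

    def record_call_start(self, call_id, caller, callee, started_at):
        self.active[call_id] = {
            "call_id": call_id, "caller": caller,
            "callee": callee, "started_at": started_at,
        }

    def update_call_status(self, call_id, status):
        # Terminal status from the carrier callback moves the row to completed.
        record = self.active.pop(call_id, None)
        if record is not None and status == "completed":
            record["status"] = status
            self.calls.append(record)

    def record_call_end(self, call_id, metrics):
        record = self.active.pop(call_id, None)
        if record is None:
            # Active record already moved: update the completed entry in place.
            for existing in self.calls:
                if existing["call_id"] == call_id:
                    existing["metrics"] = metrics
                    return existing
            record = {"call_id": call_id, "caller": "", "callee": "", "started_at": 0}
        record["metrics"] = metrics
        self.calls.append(record)
        return record
```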
## Tests 47 new regression tests across 6 files (TS + Py parity): - ``libraries/python/tests/test_prewarm_handoff.py`` (new — 6 tests) - ``libraries/typescript/tests/unit/prewarm-handoff.test.ts`` (new — 6 tests) - ``libraries/python/tests/unit/test_dashboard_store_unit.py`` (+4 dedup + active-accessor tests) - ``libraries/python/tests/unit/test_server_unit.py`` (+1 caller/callee persist test) - ``libraries/typescript/tests/dashboard-store.test.ts`` (+7 dedup + transcript accumulate + accessor tests) - ``libraries/typescript/tests/server.test.ts`` (+1 caller/callee persist test using real ``CallLogger``) ## Verification - Python: ``pytest -q`` → 1808 passed, 7 skipped (was 1797 + 11 new) - TypeScript: ``npm test`` → 1481 passed (was 1467 + 14 new) - TypeScript ``tsc --noEmit`` (lint): clean - TypeScript ``tsup build`` (esm + cjs + dts + cli): clean - Dashboard SPA build (``cd dashboard-app && npm run build``): clean (204.93 kB / 63.47 kB gz) - Dashboard sync: both ``libraries/{python,typescript}/.../dashboard/ui.html`` refreshed - Live PSTN smoke test (``outbound-cartesia-cerebras-elevenlabs.ts``): WS handoff log fired, first-turn greeting 990 ms, transcript live and post-end render OK, sparkline tooltip per-card OK * fix(0.6.1): roll back STT debounce, dashboard threshold, Krisp TS scaffold Headline changes since cbe1886: * Rolled back the 400 ms STT-final → LLM dispatch debounce introduced earlier in 0.6.1 (`_scheduleTurnCommit` / `_runDeferredTurnCommit` in TS, `_schedule_turn_commit` / `_delayed_turn_commit` in Python). The partial-transcript reschedule branch was overwriting the dispatched FINAL text with the latest partial, causing entire user turns to be dropped during slow-LLM windows. Verified on real PSTN (round 10k with gpt-5-nano dropped 3 of 5 user turns). Dispatch is now synchronous on `is_final` again. The original double-talk symptom is re-opened with a better fix path documented internally. 
* Kept beneficial 0.6.1 work: `beginSpeaking` stamps `firstAudioSentAt = Date.now()` on every turn so the `canBargeIn()` anti-flicker gate runs in parallel with LLM TTFT + TTS TTFB; VAD `speech_start` calls `anchorUserSpeechStart()` and skips on phantom-during-warmup-gate; commit-drop path re-anchors; WARN log when pipeline has no `llm` / `onMessage` handler; char/4 fallback billing for providers that don't emit a usage chunk; `OpenAILLMProvider.providerKey` static; firstMessage TTS char billing; persist full latency breakdown per percentile in metadata.json; dashboard hydrate reads `transcript.jsonl`; ElevenLabs default flipped to WS.
* Lowered dashboard percentile threshold 5 → 2 turns so the detail pane no longer shows `—` for p50/p95 on typical 4-7 turn PSTN calls while the list column already shows a real number via avg fallback.
* Added Krisp VIVA noise-suppression scaffold for the TypeScript SDK at `libraries/typescript/src/providers/krisp-filter.ts` for cross-SDK parity with the existing Python `KrispVivaFilter`. Throws at construction time because Krisp does not publish an official Node SDK as of 2026-05; users supply SDK + `.kef` model + license. New top-level exports: `KrispVivaFilter`, `KrispVivaFilterOptions`, `KrispSampleRate`, `KrispFrameDuration`, `DeepFilterNetFilter`, `DeepFilterNetOptions`.
* CHANGELOG 0.6.1 section revised to reflect the rollback narrative honestly (debounce attempted, rolled back before release) and to document the new entries.
* Scrubbed competitor-name references from source files (Pipecat, LiveKit) per project rule `.claude/rules/no-competitor-references.md`; replaced with "industry-standard pattern" wording. Source files affected: `stream-handler.ts`, `stream_handler.py`, `metrics.ts`, `services/metrics.py`, `silero_vad.py`.
* Krisp Python wrapper unchanged.

Tests: TS lint clean, vitest 1486/1486 pass; Python pytest unit 1252 pass, 5 skip.
Validated on real PSTN: post-rollback p95 wait 1844 ms over 4 clean sequential turns (no drops) on cellular hotspot — vs catastrophic 8521 ms with 3 dropped turns pre-rollback. * fix: revert ElevenLabs HTTP→WS default flip from 4ff09bd Keep ElevenLabsTTS backed by HTTP REST (original cbe1886 state). The WS default caused pipeline latency regression and prewarm lifecycle bugs. ElevenLabsWebSocketTTS remains available as opt-in via direct import. * fix(0.6.1): dashboard live merge + firstMessage barge-in + drain marks (re-base of #89) (#92) * fix(dashboard): preserve existing calls when new call arrives in SSE stream `mergeCallPreserving` in `dashboard-app/src/hooks/useDashboardData.ts` rebuilt the calls array from the server snapshot via `next.map(...)`, so any call present in the previous UI state but missing from the next payload was silently dropped. With back-to-back calls, the SSE `call_start` refresh occasionally landed before the prior call propagated to `/api/dashboard/calls` and the row vanished from the SPA — regression reported as #124. The merge is now a true upsert: rows present in `prev` but absent from `next` are appended, so prior calls stay visible until the server snapshot stabilises. Server-side eviction (ring buffer of 500) bounds long-running sessions. Pure merge helpers extracted to `dashboard-app/src/hooks/mergeCalls.ts` and exercised by `dashboard-app/src/hooks/mergeCalls.test.ts` (added Vitest to the SPA so the helpers can be tested in isolation without a React harness). Refs #124. * fix(barge-in): firstMessage interruptible via per-chunk mark gating The firstMessage TTS chunks were pushed into the carrier WebSocket as fast as the provider yielded them. Twilio's outbound buffer ended up several seconds deep, and a barge-in's sendClear was queued behind the already-enqueued media frames — the agent kept talking on the user's earpiece for up to ~2 s after the user spoke (#128). 
The firstMessage send path is now a paced loop:

* Twilio: every chunk is followed by a unique mark; the loop waits for the oldest unconfirmed mark once FIRST_MESSAGE_MARK_WINDOW (3 chunks ≈ 120 ms) are in flight. ``onMark`` drains the FIFO on echo so the next chunk goes out. ``cancelSpeaking`` (Py: ``_run_barge_in_cancel``) resolves every pending mark waiter so the loop exits on the next tick and ``sendClear`` lands on a near-empty carrier buffer.
* Telnyx (no mark concept): the loop falls back to a playout-duration-based sleep so the buffer can't out-run a clear by more than one chunk.

Both SDKs stay in parity: TS ``sendPacedFirstMessageBytes`` mirrors Py ``_send_paced_first_message_bytes`` and both ``streamPrewarmBytes`` / ``_stream_prewarm_bytes`` delegate to the new helper. The existing prewarm chunking test was updated to echo marks via the mock bridge so it interoperates with the new pacing.

Coverage:

* libraries/typescript/tests/unit/stream-handler.test.ts: ``firstMessage mark-gated pacing`` (3 cases: window cap + barge-in, mark echo slides window, Telnyx playout pacing).
* libraries/python/tests/unit/test_first_message_pacing.py: 4 cases including FIFO mark resolution.

Refs #128.

* fix(barge-in): drain pending marks on call cleanup/stop/ws-close

The firstMessage paced sender accumulates one mark waiter (asyncio.Future on Python / Promise on TS) per chunk in _pending_marks / pendingMarks while audio is streaming to the carrier. The barge-in cancel path already drained these, but a call that ended without going through cancel (carrier WebSocket drop, hangup mid firstMessage, stop event arriving before the paced sender finished) left every queued future unresolved. The send loop was awaiting them, so the orphan futures leaked until the handler itself was garbage-collected.

Fix: PipelineStreamHandler.cleanup (Py) now invokes _drain_pending_marks before tearing down adapters; the TS handleStop and handleWsClose do the equivalent via drainPendingMarks().
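A minimal sketch of the pending-mark queue and its drain follows. The `MarkQueueSketch` class, the `MarkWaiter` shape, and the choice to resolve every waiter up to and including the echoed mark are assumptions for illustration; the real `pendingMarks` / `drainPendingMarks()` internals are not shown in this PR text.

```typescript
// Hypothetical waiter shape: one entry per in-flight firstMessage chunk.
interface MarkWaiter {
  name: string;
  resolve: () => void;
}

class MarkQueueSketch {
  private pending: MarkWaiter[] = [];
  lastConfirmedMark: string | null = null;

  // Register a waiter for a mark that was just sent to the carrier.
  enqueue(name: string, resolve: () => void): void {
    this.pending.push({ name, resolve });
  }

  // Carrier echoed a mark: resolve waiters up to and including the match.
  // Unmatched names are ignored and leave lastConfirmedMark untouched.
  onMark(name: string): boolean {
    const idx = this.pending.findIndex((w) => w.name === name);
    if (idx < 0) return false;
    for (const w of this.pending.splice(0, idx + 1)) w.resolve();
    this.lastConfirmedMark = name;
    return true;
  }

  // Resolve every outstanding waiter so no send loop stays blocked.
  // A second call sees an empty queue and does nothing.
  drain(): number {
    const drained = this.pending.splice(0, this.pending.length);
    for (const w of drained) w.resolve();
    return drained.length;
  }
}
```

In this sketch a cleanup path would call `drain()` once on stop and once on WebSocket close; the second call returns 0.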
Idempotent and safe when the queue is already empty. Added regression coverage: - libraries/python/tests/unit/test_first_message_pacing.py (TestCleanupDrainsPendingMarks) - libraries/typescript/tests/unit/stream-handler.test.ts (cleanup drains pending firstMessage marks — handleStop + handleWsClose) * fix(barge-in): reset firstMessage mark counter per send + on cleanup PipelineStreamHandler._first_message_mark_counter (Py) and StreamHandler.firstMessageMarkCounter (TS) were never reset between turns or calls. With handler re-use, the counter incremented monotonically across turns — a paced send for the second turn issued fm_<previous_count + 1> while the carrier could still be echoing a stale fm_<N> from the previous turn, corrupting FIFO matching in on_mark / onMark. Fix: reset the counter to 0 at the top of _send_paced_first_message_bytes (Py) / sendPacedFirstMessageBytes (TS) so each paced send begins a fresh fm_1, fm_2, ... sequence. Also reset on cleanup (PipelineStreamHandler.cleanup Py, handleStop + handleWsClose TS) as a belt-and-braces against the cross-call boundary. Coverage: - libraries/python/tests/unit/test_first_message_pacing.py (TestFirstMessageMarkCounterReset — per-send reset + cleanup reset) - libraries/typescript/tests/unit/stream-handler.test.ts (firstMessage mark counter resets across sends + on cleanup) * fix(dashboard): cap merged UI calls at 500 + sort by startedAt desc mergeCallPreserving in dashboard-app/src/hooks/mergeCalls.ts preserved prev_only calls indefinitely by appending them after the fresh snapshot block. Two consequences on a long-lived session: 1. The UI array grew unbounded — once the session cycled through more than 500 calls (the server-side MetricsStore ring buffer default), rows the server had already evicted stayed pinned by prev and were re-appended on every refresh. 2. 
Ordering was non-deterministic: prev_only rows always landed at the bottom regardless of their startedAtMs, so a newer call could end up below an older one if the snapshot ordering shifted.

Fix: after the upsert pass, sort the merged list by startedAtMs descending and slice to MAX_UI_CALLS = 500 so the SPA mirrors the server ring buffer.

Coverage: dashboard-app/src/hooks/mergeCalls.test.ts adds a 600-prev+1-fresh cap test and an explicit startedAtMs ordering test.

* fix(realtime): only update lastConfirmedMark on matched mark (parity with Python)

StreamHandler.onMark in libraries/typescript/src/stream-handler.ts unconditionally assigned this.lastConfirmedMark = markName before checking whether the name corresponded to a queued mark. Any echo arriving after the queue was drained, or any mark name emitted by adapters outside the firstMessage queue, would overwrite the handler-level field and contaminate downstream barge-in heuristics gated on lastConfirmedMark.

Python stream_handler.py's on_mark never touches a handler-level field at all; the equivalent state lives on TwilioAudioSender.last_confirmed_mark and is updated only by the carrier's own echo handler. The TS path now matches that behaviour defensively: lastConfirmedMark is updated only after the queue lookup confirms a matching entry, mirroring the safer Python semantics.

Coverage: libraries/typescript/tests/unit/stream-handler.test.ts (onMark only updates lastConfirmedMark on a matched mark) asserts that an unmatched echo cannot clobber a previously-set value.

* fix(metrics): align EOU semantics + unit (ms) across Python/TS

The Python ``CallMetricsAccumulator._emit_eou_metrics`` had ``end_of_utterance_delay`` and ``transcription_delay`` swapped relative to the TypeScript ``emitEouMetrics`` AND emitted them in seconds while TS emits milliseconds. Dashboards or exporters reading the same metric across both SDKs saw a 1000x disagreement on top of swapped field semantics.
Locked convention (now identical in both SDKs): - end_of_utterance_delay = stt_final - vad_stopped (ms) - transcription_delay = turn_commit - vad_stopped (ms) - on_user_turn_completed_delay (ms, unchanged) Python now clamps negative deltas to 0 (TS already did). The Python ``EOUMetrics`` docstring updated from "seconds" to "milliseconds". Tests pin both behaviours: - libraries/python/tests/test_metrics.py::TestEOUMetricsEmission - libraries/typescript/tests/unit/metrics.test.ts :: CallMetricsAccumulator > emitEouMetrics field semantics Refs: 0.6.1 observability parity audit. * feat(ts): observability OTel no-op stubs for SDK parity The Python SDK exposed three OTel-related helpers since 0.6.1: ``record_patter_attrs``, ``patter_call_scope``, ``attach_span_exporter`` (in ``getpatter.observability.attributes``). The TypeScript SDK had no equivalent surface — every provider adapter that called the Python helpers had no place to call across the parity boundary, violating ``.claude/rules/sdk-parity.md``. Port the helpers to TypeScript as no-ops by default. When ``PATTER_OTEL_ENABLED`` is unset or ``@opentelemetry/api`` is not installed, each helper returns immediately, keeping the zero-cost disabled path that the rest of the observability module already respects. Semantic mapping: - recordPatterAttrs(attrs) <-> record_patter_attrs - patterCallScope({ callId, side }, fn) <-> patter_call_scope - attachSpanExporter(patterInstance, exporter) <-> attach_span_exporter The JS form of patterCallScope takes an async callback because JS lacks ``with``-style context managers; the closure is the scope body. The module uses a module-level stack instead of a ContextVar, which is sufficient for the SDK's one-call-per-handler model. 
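The callback-as-scope shape with a module-level stack can be sketched as below. `callScopeSketch`, `currentScope`, and `scopeStack` are illustrative names, not the SDK's exports, and the OTel gating (`PATTER_OTEL_ENABLED`, `@opentelemetry/api`) is left out entirely.

```typescript
// Hypothetical scope payload, modelled on the { callId, side } argument above.
interface CallScope {
  callId: string;
  side: string;
}

// Module-level stack standing in for Python's ContextVar.
const scopeStack: CallScope[] = [];

// Innermost active scope, or undefined outside any scope.
function currentScope(): CallScope | undefined {
  return scopeStack[scopeStack.length - 1];
}

// The async callback is the scope body; `finally` unwinds the stack
// whether the body resolves or throws.
async function callScopeSketch<T>(scope: CallScope, fn: () => Promise<T>): Promise<T> {
  scopeStack.push(scope);
  try {
    return await fn();
  } finally {
    scopeStack.pop();
  }
}
```

A shared stack does not isolate scopes whose async bodies overlap in time, which is why the commit message ties it to the one-call-per-handler model.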
Tests: - libraries/typescript/tests/unit/observability-attributes.test.ts (7 smoke cases covering the public surface + scope unwind on throw) * fix(0.6.1): Cerebras usage-chunk log + Krisp TS status refresh (re-base of #90) (#91) * chore(cerebras): debug log when usage chunk missing + fallback fires When an upstream LLM stream (Cerebras and similar) does not emit a `usage` chunk despite `stream_options={include_usage:true}`, the char/4 fallback billing path previously emitted WARN on every tool-loop iteration. Multi-tool turns logged 5-10 identical WARN lines for the same call, drowning real warnings. Replace with one-shot INFO at first fallback per LLMLoop instance (provider, model, char counts, est_tokens), then DEBUG for every subsequent iteration with the running `_usage_missing_count` / `_usageMissingCount` total. No billing behaviour change — char/4 estimation still drives `record_llm_usage` / `recordLlmUsage`. Symmetric Python (`logger.info`/`logger.debug`) and TypeScript (`getLogger().info`/`.debug`). * docs(krisp): refresh unavailable message with current SDK status KrispVivaFilter constructor in the TypeScript SDK still throws — no official Krisp Node.js server SDK exists as of 2026-05. Verified via `npm search krisp`: - `@livekit/krisp-noise-filter` (0.4.3, 2026-04) — browser WASM track processor on the local microphone; cannot run server-side. - `@livekit/react-native-krisp-noise-filter` (0.0.3) — mobile native. - `@krisp.ai/kr-local-monitoring` — Krisp's only first-party npm package; "Local Monitoring API", not noise cancellation. Refreshed the thrown message to (a) stamp the verification date, (b) explicitly distinguish "server Node SDK" from the existing browser/RN wrappers, (c) list the LiveKit packages with the reason they don't apply to Patter (server-received PCM/mulaw stream). Python KrispVivaFilter and TS DeepFilterNetFilter remain the only shipped paths. No code behaviour change. 
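The one-shot INFO followed by DEBUG pattern from the Cerebras logging commit above could look roughly like this. `UsageFallbackSketch`, `LoggerLike`, the message wording, and rounding the char/4 estimate up with `Math.ceil` are all assumptions; the commit text only fixes the log levels, the per-instance counter, and the logged fields.

```typescript
// Minimal logger surface so the sketch does not depend on a real logger.
interface LoggerLike {
  info(message: string): void;
  debug(message: string): void;
}

class UsageFallbackSketch {
  private usageMissingCount = 0;
  private readonly log: LoggerLike;

  constructor(log: LoggerLike) {
    this.log = log;
  }

  // Called whenever a stream finishes without a usage chunk.
  // Returns the estimated token count (chars / 4, rounded up: an assumption).
  recordMissingUsage(provider: string, model: string, chars: number): number {
    const estTokens = Math.ceil(chars / 4);
    this.usageMissingCount += 1;
    const detail = `provider=${provider} model=${model} chars=${chars} est_tokens=${estTokens}`;
    if (this.usageMissingCount === 1) {
      // First fallback for this instance: one INFO line.
      this.log.info(`usage chunk missing, using char/4 fallback: ${detail}`);
    } else {
      // Every later iteration: DEBUG with the running total.
      this.log.debug(`usage chunk still missing (count=${this.usageMissingCount}): ${detail}`);
    }
    return estTokens;
  }
}
```

The estimate is returned on every call, so billing is unaffected by which log level fired.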
* fix(krisp): remove competitor package names from error message Per .claude/rules/no-competitor-references.md the TS Krisp filter error message cannot cite competitor package names — refactored the "Browser/React Native" block to describe the category generically (third-party wrappers, client-side scope) without naming specific packages. Same cleanup applied to the matching CHANGELOG entry. No behavioural change. * fix(ts): correct ElevenLabsTTS export comment — HTTP REST default, not WebSocket After commit 8507a34 reverted the HTTP→WS flip, the comment still said ElevenLabsTTS "defaults to WebSocket streaming as of 0.6.1". Updated to reflect current reality: ElevenLabsTTS = HTTP REST (pcm_16000), ElevenLabsWebSocketTTS = WS variant, ElevenLabsRestTTS = HTTP alias. * fix(pipeline): always pace prewarmed first-message audio by playout duration Twilio mark ACKs can batch-resolve simultaneously — when all 3 pending marks in FIRST_MESSAGE_MARK_WINDOW resolve at once, waitForMarkWindow unblocks 3 consecutive loop iterations with no delay, sending a burst of ~120ms of audio. The carrier jitter buffer drains for a moment then refills, producing audible crackling on the first message only (regular turns use synthesizeSentence / synthesize_sentence which send audio directly without marks and are unaffected). Remove the `if markPromise === null` / `if mark_fut is None` guard so the playout sleep (40ms for a 1280-byte chunk) runs unconditionally after every chunk on all carriers. Mark tracking for barge-in is preserved. Files: libraries/typescript/src/stream-handler.ts, libraries/python/getpatter/stream_handler.py. Update tests to use fake timers (TS: vi.useFakeTimers + advanceTimersByTimeAsync, Python: asyncio.sleep mock) so the 40ms per-chunk sleep does not make unit tests slow. Align tts-facade-language.test.ts with the current ElevenLabs HTTP REST default (commit 8507a34 reverted the WS flip). 
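The 40 ms figure for a 1280-byte chunk follows from simple arithmetic, sketched here under the assumption that the chunk is 16-bit mono PCM at 16 kHz (consistent with the `pcm_16000` format named above). `playoutMs` is a hypothetical helper, not an SDK function.

```typescript
// Playout duration of a raw audio chunk in milliseconds.
// Defaults assume 16-bit (2-byte) mono samples at 16 kHz.
function playoutMs(
  byteLength: number,
  sampleRateHz: number = 16000,
  bytesPerSample: number = 2,
): number {
  return (byteLength * 1000) / (sampleRateHz * bytesPerSample);
}

// 1280 bytes / 2 bytes per sample = 640 samples; 640 / 16000 Hz = 40 ms.
const chunkMs = playoutMs(1280);

// Three such chunks in flight correspond to the ~120 ms mark window.
const windowMs = 3 * chunkMs;
```

The same helper gives 20 ms for a 160-byte mulaw frame at 8 kHz (`playoutMs(160, 8000, 1)`).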
* fix(pipeline): correct first-message burst pacing with initialFillComplete flag The previous fix (always sleep 40ms per chunk) eliminated the initial burst needed to pre-fill Twilio's PSTN jitter buffer (250–1500 ms), causing the same crackling symptom it was meant to cure. Root cause: Twilio can batch-resolve all FIRST_MESSAGE_MARK_WINDOW (3) mark ACKs in a single event-loop turn. When the window unblocks 3 consecutive iterations with no sleep, 3 chunks are sent in burst and the jitter buffer drains momentarily → crackling. Correct fix: the first FIRST_MESSAGE_MARK_WINDOW chunks go out in burst (no playout sleep) to pre-fill the jitter buffer. Once the window is first full, a sticky `initialFillComplete` / `initial_fill_complete` flag flips to true and subsequent chunks are paced by playout time (~40 ms per chunk), preventing batch-ACK bursts. On Telnyx (no mark concept) the playout sleep runs unconditionally on every chunk. Tests: 7/7 Python, 35/35 TypeScript — no changes needed on Python side (autouse sleep-patch fixture already makes all sleeps instant). * fix(pipeline): one-shot barge-in + first-message crackle + barge-in gate Three related fixes to the no-AEC Twilio PSTN pipeline that together deliver a smooth-feeling agent on real phone calls (verified live: 5 turns, p95 wait 685 ms, every user utterance produced a fresh VAD speech_start with multiple successful interruptions in one call). 1. One-shot barge-in. After a successful barge-in, subsequent barge-in attempts silently failed. PSTN echo of the agent's TTS kept SileroVAD's smoothed probability above deactivationThreshold (0.35) for the full agent turn, so pubSpeaking stayed true cross-turn and no fresh SILENCE -> SPEECH transition ever fired. 
Added an optional reset() hook to VADProvider; SileroVAD implements it by clearing the pending buffer, pubSpeaking, the speech/silence threshold durations, the ExpFilter, AND the ONNX RNN hidden state + rolling context (without resetting the model the detector "remembers" the echo). StreamHandler invokes reset in beginSpeaking (every new agent turn starts clean) and at the grace-timer fire of endSpeakingWithGrace (natural turn end leaves VAD ready for the next spontaneous user utterance). 2. First-message crackle. StatefulResampler seeded its 5-tap FIR history with input[0] on the first call. When ElevenLabs HTTP streaming delivers a chunk that starts at non-zero amplitude this produced a startup transient audible as a brief crackle at the beginning of the first TTS message. Seeded with zeros instead — the correct initial condition for a filter that has received no prior input. 3. Barge-in gate 250 ms -> 100 ms, suppressed speech flushed. The no-AEC anti-flicker gate was 250 ms, which on short agent turns (< ~400 ms of audio) consumed most of the turn and silently suppressed legitimate interruptions. Reduced to 100 ms (still blocks the ~100-200 ms PSTN echo round-trip). When a speech_start is gate-suppressed the inboundAudioRing accumulates user audio that was previously discarded at the next beginSpeaking; added a suppressedSpeechPending flag so the grace-timer flush replays the ring to STT on natural turn end. Parity: TS unconditionally stamps firstAudioSentAt in beginSpeaking since 2026-05-11; Python _begin_speaking now matches (was conditional on is_first_message, which made any turn with a slow LLM un-interruptible for the full LLM TTFT window). Files touched: libraries/typescript/src/types.ts (VADProvider.reset?) 
libraries/typescript/src/providers/silero-vad.ts (reset impl) libraries/typescript/src/stream-handler.ts (call resetVad, suppressedSpeechPending, gate constant, ring flush on grace) libraries/typescript/src/audio/transcoding.ts (FIR zero-seed) libraries/python/getpatter/providers/base.py (VADProvider.reset) libraries/python/getpatter/providers/silero_onnx.py (OnnxModel.reset) libraries/python/getpatter/providers/silero_vad.py (reset impl) libraries/python/getpatter/stream_handler.py (parity wiring) CHANGELOG.md (Unreleased entries) tests: silero-vad reset coverage (TS + Py), updated stream-handler + transcoding tests for new state-machine wiring. Validation: TS 945/945 unit tests + lint + build green; Python 1259+13 unit tests green; live PSTN call confirmed smooth multi-barge-in. * feat(dashboard): multi-select delete + privacy/dark toggles + dark-mode polish Three coordinated dashboard improvements landed together because they share the same SPA bundle + cross-SDK route/store parity surface. 1. Soft-delete selected calls from the dashboard view + aggregates. On-disk artefacts (metadata.json, transcript.jsonl) are preserved as the durable backup the operator can audit outside the dashboard. - MetricsStore.deleteCalls / delete_calls accept ids, ignore active calls (safety), persist the set atomically to <log_root>/.deleted_call_ids.json so deletions survive restart. - getCalls / getCall / getAggregates / getCallsInRange / callCount / hydrate now filter against the deleted set so rolling metrics (avg latency, total spend) recompute against the visible window immediately on delete. - New endpoints, parity TS↔Python: * DELETE /api/dashboard/calls/:call_id * POST /api/dashboard/calls/delete { call_ids: [...] } - SSE event ``calls_deleted`` so other tabs / external clients re-render in real time. - SPA: per-row checkbox column (live rows disabled), bulk-action bar that reveals on selection > 0 with inline confirmation step ("Removes from view + metrics. 
Logs kept on disk.") gated by a peach destructive button. 2. Top-bar toggles: PII reveal (eye / eye-slash) + theme (sun / moon), both persisted in localStorage so the operator's last choice survives a reload. Default state is hidden + light — screen-share safe out of the box. - New ``useUiPrefs`` hook centralises both prefs and applies the ``body.dark`` class side-effect so the existing dark-mode CSS overrides flip in lockstep. - fmtPhone(p, revealed) renders ``•••<last4>`` using U+2022 BULLET instead of asterisks so the mask sits on the digit baseline. PII cells gain ``font-variant-numeric: tabular-nums`` so toggling reveal doesn't jitter the column width. - Reveal currently honours whatever the server provided — ``PATTER_LOG_REDACT_PHONE`` still controls the on-disk format, unchanged. Operators who want full numbers in the dashboard can set ``PATTER_LOG_REDACT_PHONE=full`` to log new calls in full; historical data stays masked by construction. 3. Dark-mode polish + min-height layout. - Page palette lifted: bg #0d0d0d → #121212, cards #171717 → #1c1c1c, borders #262626 → #2a2a2a. Previous pitch-black felt oppressive against the brand's cream/peach accent. - Active toggles use the peach accent instead of stark white (.seg button.on, .icon-btn.toggle.on) — the white blocks felt like a light-mode leftover floating on the dark page. - Fixed invisible / unreadable elements in dark mode: * Patter logo (was inline ``color: var(--ink)``, now inherits) * Transcript turn body text (.turn .body .txt was #1a1a1a) * Metrics waterfall track + STT bar + value (.wf-row .track was cream, .seg-bar.stt was #1a1a1a, .v was #000) * Duration block value (.duration-block .v was #000) * Sparkline empty bars (cascade from broad ``.spark-bar`` override leaked into ``.empty`` — added ``:not(.empty)``) * kbd ⇧K chip (cream blob on dark) * .ctrl.active, .pill.queued, .pill.fail, .lat-bar.warn, .car-dot.tx, .stack-row labels, .latbox.warn variant. 
* New-row insertion flash used cream end-state → dedicated ``slideInDark`` keyframe. - Min-height baseline so the layout doesn't collapse when no calls match the active range: table scroll area pinned at 540px (was unbounded down), .rr right column 590px, .rr-card 280px. Files touched ============= - dashboard-app/src/components: CallTable, Topbar, PatterLogo, LiveCallPanel, icons, format - dashboard-app/src/hooks: useUiPrefs (new), useDashboardData - dashboard-app/src/lib/api.ts, App.tsx, styles/dashboard.css - libraries/python/getpatter/dashboard: store.py, routes.py, ui.html - libraries/python/tests/test_dashboard.py (TestMetricsStoreDelete — 8 new tests) - libraries/typescript/src/dashboard: store.ts, routes.ts, ui.html - libraries/typescript/tests/unit/dashboard-store.test.ts (deleteCalls — 8 new tests covering hide / aggregates / range / active-skip / idempotent / SSE / persistence / empty input) - CHANGELOG.md Verification ============ - SPA build green (224 KB bundle, gzip 68 KB) - Python: 1832 tests passing (8 new) - TypeScript: 952 tests passing (8 new) + lint clean * fix(dashboard): MetricsPanel Latency/Cost tabs render at the same height Toggling MetricsPanel tabs between Latency and Cost caused a vertical jump because the two layouts had different natural heights — Latency (pipeline mode) renders 4 latency cards + a 3-row waterfall + legend (~230 px), while Cost renders the cost bar + 4-6 stack rows (~180 px). The card outer height shifted by ~50 px on every toggle. Wrapped both tab views in a .metrics-panel-body container with min-height: 240 px (sized to the tallest layout). Both tabs now occupy exactly 321 px outer / 240 px body — switching is a pure content swap with no layout reflow. Verified via Chrome DOM audit: latencyHeight=321, costHeight=321, diff=0. 
Files: dashboard-app/src/components/MetricsPanel.tsx (body wrapper) dashboard-app/src/styles/dashboard.css (.metrics-panel-body rule) libraries/{python,typescript}/.../dashboard/ui.html (resynced bundle) * feat(providers): add OpenAIRealtime2 engine for gpt-realtime-2 (GA Realtime API) The 0.6.1 enum entry for `gpt-realtime-2` advertised it as drop-in with the existing v1 Realtime adapter; OpenAI in fact promoted that model to the GA Realtime API, which rejects the `OpenAI-Beta: realtime=v1` header, requires a different `session.update` wire shape (`type: "realtime"`, `output_modalities`, nested `audio.{input,output}` with MIME types), and renamed the audio-delta event family (`response.audio.*` → `response.output_audio.*`). Going through the v1 adapter with `model: "gpt-realtime-2"` either timed out at connect() or produced a "successful" call with zero audio bytes forwarded to the carrier. New `OpenAIRealtime2` engine marker (kind: `openai_realtime_2`) + new `OpenAIRealtime2Adapter` subclassing `OpenAIRealtimeAdapter`. The subclass overrides only `connect()` (GA payload + no beta header) and `sendFirstMessage()` (forces `output_modalities` shape, re-injects `audio.output.voice` since GA `response.create` doesn't inherit it from session, sets `reasoning.effort: "minimal"` to keep TTFB tight on the literal "say exactly X" greeting). A WS-level `emit` shim renames the GA audio-delta event types back to the v1 names so the parent dispatcher and `StreamHandler` keep working unchanged. The legacy `OpenAIRealtime` engine and `OpenAIRealtimeAdapter` continue to serve `gpt-realtime`, `gpt-realtime-mini`, `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview` against the v1-beta endpoint byte-for-byte unchanged. Only visibility on a handful of fields/methods was promoted from `private` to `protected` so the subclass can reuse heartbeat + message dispatch; no public surface changed. 
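The event-renaming shim can be pictured as a small prefix rewrite. This sketch covers only the `response.output_audio.*` → `response.audio.*` family named above; `toV1Event` and `RealtimeEvent` are hypothetical names, and the real shim may handle more event types.

```typescript
const GA_AUDIO_PREFIX = "response.output_audio.";
const V1_AUDIO_PREFIX = "response.audio.";

// Loose event shape: a type tag plus arbitrary payload fields.
interface RealtimeEvent {
  type: string;
  [key: string]: unknown;
}

// Rename GA audio events back to their v1 names; pass everything else through.
function toV1Event(event: RealtimeEvent): RealtimeEvent {
  if (!event.type.startsWith(GA_AUDIO_PREFIX)) return event;
  const suffix = event.type.slice(GA_AUDIO_PREFIX.length);
  return { ...event, type: V1_AUDIO_PREFIX + suffix };
}
```

Because only the `type` field changes, a dispatcher written against the v1 names would see the payload unchanged.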
Verified end-to-end on a Twilio PSTN call: 13.6s / 3 turns / firstMessage plays in the configured voice, language follows systemPrompt, audio flows both directions. Python parity is a follow-up — flagged in CHANGELOG; the daily docs-feature-drift cron will surface the gap until Python lands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(openai-realtime-2): bidirectional audio transcoding + outbound chunk splitting for Twilio The 0.6.1 OpenAIRealtime2 engine connected and exchanged events but produced silent calls over Twilio: the GA endpoint accepts `audio/pcmu` in `session.update` but the audio engine silently drops mulaw frames (`input_audio_buffer.commit` reports "0.00 ms of audio") and always emits PCM-24 regardless of the declared output format. Until OpenAI ships native g711 on the GA endpoint we transcode on both directions inside the subclass. Inbound (Twilio → model): override `sendAudio` to decode mulaw, apply 2x gain to lift telephony peaks into the GA VAD's expected band, then 3x linear upsample to PCM-24 with a one-sample carry across chunk boundaries. `session.audio.input.format` switched to `{ type: "audio/pcm", rate: 24000 }`. Outbound (model → Twilio): wrap the audio-delta translation to resample PCM-24 → PCM-8 via 24→16→8 chain (second step carries the 5-tap FIR anti-alias filter that the direct 24→8 path lacks), encode to mulaw 8 kHz, and split into 20 ms (160 B) slices emitted as separate audio events. Twilio's media pipeline stalls when fed deltas of the GA's natural ~200-400 ms granularity; 20 ms frames restore the expected playout cadence. VAD tuning: lowered `server_vad` threshold to 0.1 (default 0.5) and raised `silence_duration_ms` to 500 so 3x-upsampled telephony-band audio reliably triggers `speech_started`. 
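Two of the steps above lend themselves to a short sketch: the 3x linear upsample with a one-sample carry, and the split into 160-byte frames. The interpolation scheme here is one plausible reading of the description, not the adapter's actual resampler; mulaw decoding, the 2x gain, and the 24→16→8 downsampling chain are omitted. `Upsample3xSketch` and `splitFrames` are hypothetical names.

```typescript
// 3x linear upsampler that remembers the last input sample so that
// interpolation stays continuous across chunk boundaries.
class Upsample3xSketch {
  private prev: number | null = null;

  process(input: Int16Array): Int16Array {
    const out = new Int16Array(input.length * 3);
    for (let i = 0; i < input.length; i++) {
      const cur = input[i] ?? 0;
      // First sample ever: no history, so hold the current value.
      const prev = this.prev ?? cur;
      out[i * 3] = Math.round(prev + (cur - prev) / 3);
      out[i * 3 + 1] = Math.round(prev + ((cur - prev) * 2) / 3);
      out[i * 3 + 2] = cur;
      this.prev = cur;
    }
    return out;
  }
}

// Split an encoded buffer into fixed-size frames (160 bytes = 20 ms of
// 8 kHz mulaw). The final frame may be shorter than frameBytes.
function splitFrames(bytes: Uint8Array, frameBytes: number = 160): Uint8Array[] {
  const frames: Uint8Array[] = [];
  for (let off = 0; off < bytes.length; off += frameBytes) {
    frames.push(bytes.subarray(off, off + frameBytes));
  }
  return frames;
}
```

Feeding `[0, 300]` and then `[600]` through one upsampler instance yields a ramp without a step at the chunk boundary, which is the point of the carry.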
Visibility bumps on `OpenAIRealtimeAdapter`: `ws`, `armHeartbeatAndListener`, `options` promoted from `private` to `protected` so the subclass can install the wire-level translation shim and reuse the parent's message dispatch unchanged. No public surface changed; v1 adapter behaviour byte-for-byte identical. Known limitation: model output now plays audibly on the caller side, but GA `server_vad` is still tuned for studio audio so the user-speech path remains less reliable than pipeline mode. Pipeline mode (STT+LLM+TTS) is the recommended production path for Twilio in 0.6.1 until OpenAI ships native g711_ulaw GA. Python parity is still a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): 0.6.1 Roll the Unreleased changelog block into the 0.6.1 (2026-05-15) section. Version literals in sdk-py/__init__.py, sdk-py/pyproject.toml, sdk-ts/package.json were already at 0.6.1 from prior commits — this commit only normalises the changelog ahead of the release PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: pre-commit end-of-file-fixer on cartesia-stt.ts Removes a trailing blank line so the Pre-commit CI hook is happy on the 0.6.1 release PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
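For reference, the dashboard merge described in the `mergeCallPreserving` commits above (upsert, then sort by `startedAtMs` descending, then cap at `MAX_UI_CALLS = 500`) might be sketched as follows. `mergeCallsSketch` and the `UiCall` shape are assumptions; in particular, letting the server snapshot win outright for rows present in both inputs is a simplification, and the real helper may preserve some per-row state from `prev`.

```typescript
// Hypothetical row shape; only the fields the merge needs.
interface UiCall {
  callId: string;
  startedAtMs: number;
  status: string;
}

const MAX_UI_CALLS = 500;

// Pure merge: returns a new array and leaves both inputs untouched.
function mergeCallsSketch(
  prev: UiCall[],
  next: UiCall[],
  cap: number = MAX_UI_CALLS,
): UiCall[] {
  const byId = new Map<string, UiCall>();
  // Start from the previous UI state so rows missing from the snapshot survive.
  for (const call of prev) byId.set(call.callId, call);
  // Rows present in the snapshot replace their previous version (upsert).
  for (const call of next) byId.set(call.callId, call);
  // Newest first, then cap so the list cannot grow without bound.
  return Array.from(byId.values())
    .sort((a, b) => b.startedAtMs - a.startedAtMs)
    .slice(0, cap);
}
```

With 600 previous rows and one fresh row, the result holds 500 rows with the fresh call first, matching the cap test described in the commit.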

nicolotognoni added 6 commits May 12, 2026 16:37

nicolotognoni mentioned this pull request May 12, 2026

fix: dashboard live merge + firstMessage interruptible #89

Closed

8 tasks

nicolotognoni merged commit 893a3bb into feat/observability-otel-attrs-0.6.1 May 12, 2026
5 checks passed

This was referenced May 12, 2026

feat(0.6.1): OpenAI Realtime prewarm — tools + double-handshake fix + duck-type adopt (re-base of #88) #93

Merged

fix(0.6.1): Realtime firstMessage interruption on adopted path #95

Open

github-actions Bot deleted the fix/0.6.2-dashboard-and-bargein-v2 branch May 13, 2026 06:49

nicolotognoni merged 6 commits into feat/observability-otel-attrs-0.6.1 from fix/0.6.2-dashboard-and-bargein-v2

nicolotognoni commented May 12, 2026
