Skip to content

fix(0.6.1): dashboard live merge + firstMessage barge-in + drain marks (re-base of #89)#92

Merged
nicolotognoni merged 6 commits into
feat/observability-otel-attrs-0.6.1from
fix/0.6.2-dashboard-and-bargein-v2
May 12, 2026
Merged

fix(0.6.1): dashboard live merge + firstMessage barge-in + drain marks (re-base of #89)#92
nicolotognoni merged 6 commits into
feat/observability-otel-attrs-0.6.1from
fix/0.6.2-dashboard-and-bargein-v2

Conversation

@nicolotognoni
Copy link
Copy Markdown
Collaborator

Summary

Replacement for PR #89 — rebased on top of feat/observability-otel-attrs-0.6.1 HEAD with CHANGELOG conflict resolved (entries land directly under ## 0.6.1 (2026-05-12)).

Implementation

  • Dashboard SSE reducer immutability: mergeCallPreserving preserves existing call state when a new call arrives.
  • Twilio firstMessage barge-in: per-chunk mark gating; outbound buffer no longer blocks interruption of the first message.
  • Drain pending marks on cleanup / handleStop / handleWsClose to prevent orphan futures/promises.
  • Reset _first_message_mark_counter per send and on cleanup (no stale fm_N matching across turns/calls).
  • Dashboard mergeCalls: cap at MAX_UI_CALLS=500, sort by startedAtMs desc.
  • onMark TS now updates lastConfirmedMark only when the mark matches a queued entry (parity with Python).

Test plan

  • Python: pytest tests/unit — 1262 passed, 5 skipped
  • TypeScript: npm test — 1506 passed
  • Dashboard: npm test — 8 passed
  • TypeScript: tsc --noEmit — clean

…stream

`mergeCallPreserving` in `dashboard-app/src/hooks/useDashboardData.ts`
rebuilt the calls array from the server snapshot via `next.map(...)`,
so any call present in the previous UI state but missing from the next
payload was silently dropped. With back-to-back calls, the SSE
`call_start` refresh occasionally landed before the prior call
propagated to `/api/dashboard/calls` and the row vanished from the
SPA — regression reported as #124.

The merge is now a true upsert: rows present in `prev` but absent from
`next` are appended, so prior calls stay visible until the server
snapshot stabilises. Server-side eviction (ring buffer of 500) bounds
long-running sessions.

Pure merge helpers extracted to `dashboard-app/src/hooks/mergeCalls.ts`
and exercised by `dashboard-app/src/hooks/mergeCalls.test.ts` (added
Vitest to the SPA so the helpers can be tested in isolation without a
React harness).

Refs #124.
The firstMessage TTS chunks were pushed into the carrier WebSocket as
fast as the provider yielded them. Twilio's outbound buffer ended up
several seconds deep, and a barge-in's sendClear was queued behind the
already-enqueued media frames — the agent kept talking on the user's
earpiece for up to ~2 s after the user spoke (#128).

The firstMessage send path is now a paced loop:
* Twilio: every chunk is followed by a unique mark; the loop waits for
  the oldest unconfirmed mark once FIRST_MESSAGE_MARK_WINDOW (3 chunks
  ≈ 120 ms) are in flight. ``onMark`` drains the FIFO on echo so the
  next chunk goes out. ``cancelSpeaking`` (Py: ``_run_barge_in_cancel``)
  resolves every pending mark waiter so the loop exits on the next
  tick and ``sendClear`` lands on a near-empty carrier buffer.
* Telnyx (no mark concept): the loop falls back to a playout-duration-
  based sleep so the buffer can't out-run a clear by more than one
  chunk.

Both SDKs stay in parity: TS ``sendPacedFirstMessageBytes`` mirrors Py
``_send_paced_first_message_bytes`` and both ``streamPrewarmBytes`` /
``_stream_prewarm_bytes`` delegate to the new helper. The existing
prewarm chunking test was updated to echo marks via the mock bridge so
it interoperates with the new pacing.

Coverage:
* libraries/typescript/tests/unit/stream-handler.test.ts —
  ``firstMessage mark-gated pacing`` (3 cases: window cap +
  barge-in, mark echo slides window, Telnyx playout pacing).
* libraries/python/tests/unit/test_first_message_pacing.py — 4 cases
  including FIFO mark resolution.

Refs #128.
The firstMessage paced sender accumulates one mark waiter (asyncio.Future
on Python / Promise on TS) per chunk in _pending_marks / pendingMarks
while audio is streaming to the carrier. The barge-in cancel path
already drained these, but a call that ended without going through
cancel — carrier WebSocket drop, hangup mid firstMessage, stop event
arriving before the paced sender finished — left every queued future
unresolved. The send loop was awaiting them, so the orphan futures
leaked until the handler itself was garbage-collected.

Fix: PipelineStreamHandler.cleanup (Py) now invokes _drain_pending_marks
before tearing down adapters; the TS handleStop and handleWsClose do
the equivalent via drainPendingMarks(). Idempotent and safe when the
queue is already empty.

Added regression coverage:
- libraries/python/tests/unit/test_first_message_pacing.py
  (TestCleanupDrainsPendingMarks)
- libraries/typescript/tests/unit/stream-handler.test.ts
  (cleanup drains pending firstMessage marks — handleStop + handleWsClose)
PipelineStreamHandler._first_message_mark_counter (Py) and
StreamHandler.firstMessageMarkCounter (TS) were never reset between
turns or calls. With handler re-use, the counter incremented
monotonically across turns — a paced send for the second turn issued
fm_<previous_count + 1> while the carrier could still be echoing a
stale fm_<N> from the previous turn, corrupting FIFO matching in
on_mark / onMark.

Fix: reset the counter to 0 at the top of _send_paced_first_message_bytes
(Py) / sendPacedFirstMessageBytes (TS) so each paced send begins a
fresh fm_1, fm_2, ... sequence. Also reset on cleanup
(PipelineStreamHandler.cleanup Py, handleStop + handleWsClose TS) as a
belt-and-braces against the cross-call boundary.

Coverage:
- libraries/python/tests/unit/test_first_message_pacing.py
  (TestFirstMessageMarkCounterReset — per-send reset + cleanup reset)
- libraries/typescript/tests/unit/stream-handler.test.ts
  (firstMessage mark counter resets across sends + on cleanup)
mergeCallPreserving in dashboard-app/src/hooks/mergeCalls.ts preserved
prev_only calls indefinitely by appending them after the fresh snapshot
block. Two consequences on a long-lived session:

1. The UI array grew unbounded — once the session cycled through more
   than 500 calls (the server-side MetricsStore ring buffer default),
   rows the server had already evicted stayed pinned by prev and were
   re-appended on every refresh.
2. Ordering was non-deterministic — prev_only rows always landed at
   the bottom regardless of their startedAtMs, so a newer call could
   end up below an older one if the snapshot ordering shifted.

Fix: after the upsert pass, sort the merged list by startedAtMs
descending and slice to MAX_UI_CALLS = 500 so the SPA mirrors the
server ring buffer.

Coverage: dashboard-app/src/hooks/mergeCalls.test.ts adds a
600-prev+1-fresh cap test and an explicit startedAtMs ordering test.
…with Python)

StreamHandler.onMark in libraries/typescript/src/stream-handler.ts
unconditionally assigned this.lastConfirmedMark = markName before
checking whether the name corresponded to a queued mark. Any echo
arriving after the queue was drained, or any mark name emitted by
adapters outside the firstMessage queue, would overwrite the handler-
level field and contaminate downstream barge-in heuristics gated on
lastConfirmedMark.

Python stream_handler.py's on_mark never touches a handler-level
field at all — the equivalent state lives on
TwilioAudioSender.last_confirmed_mark and is updated only by the
carrier's own echo handler. The TS path now matches that behaviour
defensively: lastConfirmedMark is updated only after the queue lookup
confirms a matching entry, mirroring the safer Python semantics.

Coverage: libraries/typescript/tests/unit/stream-handler.test.ts
(onMark only updates lastConfirmedMark on a matched mark) asserts
that an unmatched echo cannot clobber a previously-set value.
@nicolotognoni nicolotognoni merged commit 893a3bb into feat/observability-otel-attrs-0.6.1 May 12, 2026
5 checks passed
@github-actions github-actions Bot deleted the fix/0.6.2-dashboard-and-bargein-v2 branch May 13, 2026 06:49
nicolotognoni added a commit that referenced this pull request May 17, 2026
* feat(observability): patter.* OTel span attributes (Python only) + 0.6.1 release

Ports the observability work from the now-closed PR #82 onto the
post-refactor `libraries/python/` layout. PR #82 was authored against
the legacy `sdk-py/` paths and was consolidated into the 0.6.0 release
branch; this commit lands the actual implementation against the new
layout for 0.6.1.

What it adds:

- `getpatter.observability.attributes` — three new helpers:
  `record_patter_attrs(attrs)`, `patter_call_scope(call_id, side)`
  context manager, `attach_span_exporter(patter, exporter, side)`.
  Lazy-OTel-guarded; no-op when the `[tracing]` extra is not installed.
  Two ContextVars (`patter.call_id`, `patter.side`) propagate through
  the asyncio task tree so spans emitted by deeply nested provider
  code inherit the active call's identity automatically.
- `Patter._attach_span_exporter(exporter, *, side="uut")` — public-but-
  underscore hook for tools that observe Patter from outside (e.g. an
  out-of-process agent runner).
- Per-provider cost emission across 19 surfaces: `patter.cost.{
  telephony_minutes, stt_seconds, tts_chars, llm_input_tokens,
  llm_output_tokens, realtime_minutes}` stamped on the active span.
  Provider tag emitted alongside as `patter.{telephony,stt,tts,llm,
  realtime}.provider`. All call sites wrapped in defensive try/except
  so observability cannot kill a live call.
- Per-turn latency: `patter.latency.{ttfb_ms, turn_ms}` stamped from
  `StreamHandler._emit_turn_metrics` via a new
  `PipelineHookExecutor.record_turn_latency(*, ttfb_ms, turn_ms)`.
- Bridge-level `patter_call_scope` entry on Twilio + Telnyx — entire
  WebSocket bridge lifetime (incl. hangup/cleanup) bound to the call
  identity via `contextlib.ExitStack`.
- `TwilioAdapter.record_call_end_cost` /
  `TelnyxAdapter.record_call_end_cost` — adapter helpers used by the
  bridge to emit `patter.cost.telephony_minutes` once wall-clock
  duration is known.

Versions bumped 0.6.0 → 0.6.1 in `__init__.py`, `pyproject.toml`,
`package.json`. CHANGELOG entry added under a new `## 0.6.1
(2026-05-09)` block; the existing `## 0.6.0 (2026-05-08)` block is
preserved verbatim — it reflects exactly what was published to PyPI
and npm at that tag.

⚠️ TS parity gap: Python only. TypeScript follow-up tracked separately.
This is a known time-boxed exception per `.claude/rules/sdk-parity.md`.

5 new unit tests in `libraries/python/tests/unit/
test_observability_attributes_unit.py` exercise the helper module's
public surface (`patter_call_scope`, `record_patter_attrs` no-op,
`attach_span_exporter` side stamping). Full Python suite: 1719 passed,
7 skipped — green.

Refs: closed PR #82.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dashboard,pipeline): hydrate cost/latency from top-level + barge-in gate from first audio

Two bugs caught during 0.6.0 acceptance against
`releases/0.6.0/typescript/matrix/outbound-cartesia-cerebras-elevenlabs.ts`:

1. **Dashboard hydrate schema mismatch**: `CallLogger.log_call_end` writes
   `cost`/`latency`/`duration_ms`/`telephony_provider` as top-level keys of
   `metadata.json`, but `MetricsStore.hydrate` looked for them under
   `meta.metrics.cost`/`meta.metrics.latency`. Every hydrated row landed
   with `metrics=null`, so cost/latency rendered as `$0.00`/`—` for all
   on-disk calls (only the in-flight call had real numbers). Fix synthesizes
   a `metrics` dict from the top-level fields when `meta.metrics` is absent
   while preserving any explicit `meta.metrics` payload untouched.

2. **Early barge-in self-cancellation**: cloud TTS first-byte latency is
   200–700 ms; the 250 ms anti-flicker gate (no-AEC PSTN default) was
   anchored on `_speaking_started_at`/`speakingStartedAt` and expired
   BEFORE TTS produced audio. VAD then picked up background noise and
   self-cancelled the agent's first turn — 0 bytes emitted, line silent.
   Fix anchors the gate on a new `_first_audio_sent_at`/`firstAudioSentAt`
   set AFTER `bridge.sendAudio` / `audio_sender.send_audio` succeeds at
   the four pipeline emit sites (firstMessage, streaming, regular,
   WebSocket remote). `_can_barge_in`/`canBargeIn` returns false while
   the marker is null. Gate values (250 ms / 1000 ms) unchanged — only
   the anchor moves.

Tests:
- Py 1717/1717, TS 1394/1394 green; lint clean.
- New regressions: `test_hydrate_lifts_top_level_cost_and_latency_into_metrics`,
  `test_hydrate_preserves_explicit_metrics_when_present`,
  `test_barge_in_suppressed_before_first_audio_emitted` (Py) +
  parity TS cases in `tests/dashboard-store.test.ts` and
  `tests/unit/stream-handler.test.ts`.
- Existing `_handle_barge_in`/`handleBargeIn` tests updated to set both
  timestamps for the new contract.

* feat(barge-in): opt-in confirmation strategies (MinWords reference impl)

Cloud TTS first-byte latency (200-700 ms) plus PSTN background noise
mean the legacy "any VAD speech_start cancels the agent" contract
produced frequent false-positive cancels — cough, click, HVAC, breath,
or a quick "okay" cut the agent mid-sentence and lost the
conversational thread.

This PR adds an opt-in two-stage confirmation pipeline. With the new
empty-tuple default behaviour is unchanged. Configure
``Agent.barge_in_strategies`` / ``agent.bargeInStrategies`` to enable:

  1. VAD speech_start during TTS marks the barge-in PENDING. TTS keeps
     streaming naturally — the LLM stream stays alive.
  2. Each STT transcript is evaluated by every configured strategy
     (short-circuit OR; per-strategy errors are isolated).
  3. First strategy that returns True confirms the cancel: runs the
     existing send_clear + flush ring + LLM abort sequence.
  4. If no strategy confirms within ``barge_in_confirm_ms``
     (default 1500 ms) the pending state is dropped and the agent
     finishes its sentence.

New module ``getpatter.services.barge_in_strategies`` exposes:
  - ``BargeInStrategy`` Protocol (async ``evaluate`` + optional ``reset``)
  - ``MinWordsStrategy`` — filters short backchannels by requiring N
    words while the agent is speaking and letting any single word
    through while the agent is silent (so the first user turn is
    never delayed).
  - ``evaluate_strategies`` / ``reset_strategies`` helpers.

TS parity in ``src/services/barge-in-strategies.ts`` with the same
public surface (``MinWordsStrategy``, ``BargeInStrategy`` interface,
``evaluateStrategies``/``resetStrategies``).

Wiring lives in stream_handler.py ``_handle_barge_in`` and
stream-handler.ts ``handleBargeIn`` — both keep the existing
canBargeIn gate (firstAudioSentAt anchor) and only add the strategy
check when at least one strategy is configured.

Tests:
- Py: 1741/1741 green; new ``test_barge_in_strategies.py`` (14) +
  ``test_barge_in_two_stage.py`` (10).
- TS: 1419/1419 green; new ``barge-in-strategies.test.ts`` (15) +
  ``barge-in-two-stage.test.ts`` (10). Lint clean.
- Existing barge-in regression suites still pass byte-for-byte:
  empty strategies preserve legacy behaviour exactly.

CHANGELOG ``## Unreleased`` updated with full design + file list.

* feat(0.6.1): cost split + STT methodology fix + prewarm + 13 review fixes

Three user-visible features plus a hardening sweep from a 5-agent code
review covering security, billing safety, race conditions, and resource
leaks.

## Features

### Dashboard cost panel: STT and TTS as separate rows
The cost breakdown previously combined STT and TTS into one "STT / TTS"
line, hiding which side dominated cost. Now rendered as two adjacent
rows labelled with the actual provider name (e.g. "Cartesia STT" /
"ElevenLabs TTS"), driven by ``record.metrics.stt_provider`` /
``tts_provider`` already exposed by the backend. Files:
``dashboard-app/src/components/CostPanel.tsx``,
``dashboard-app/src/lib/mappers.ts``.

### stt_ms is now finalization-only (BREAKING semantic change)
Previously ``LatencyBreakdown.stt_ms`` measured ``stt_complete -
turn_start`` — which conflated user speech duration with STT processing.
A 5 s utterance produced ``stt_ms ≈ 5000`` even when Cartesia/Deepgram
finalized in 200 ms after end-of-speech. Industry benchmarks
(Picovoice/Deepgram/Gladia/Speechmatics) all report STT latency as the
finalization window: ``final_transcript - end_of_speech``. ``stt_ms``
now matches that definition. New optional field
``user_speech_duration_ms`` carries the displaced "how long did the
user speak" number. Files: ``libraries/python/getpatter/models.py``,
``libraries/python/getpatter/services/metrics.py``,
``libraries/typescript/src/metrics.ts``.

### Pre-warm services + pre-synth firstMessage
``Agent.prewarm: bool = True`` (default on) warms STT/TTS/LLM provider
connections in parallel with carrier ``initiate_call`` so DNS, TLS,
HTTP/2 / WebSocket handshakes are complete by the time the callee
answers. Concrete ``warmup()`` overrides shipped on Deepgram / Cartesia
/ AssemblyAI STT, ElevenLabs WS / Cartesia / Inworld TTS, OpenAI
Realtime. ``Agent.prewarm_first_message: bool = False`` (opt-in)
pre-renders ``first_message`` to TTS bytes during ringing and streams
the cached buffer instantly when the carrier emits ``start`` —
eliminates 200-700 ms of TTS first-byte latency on the greeting at the
cost of paying TTS even when the call isn't answered (logged at WARN
level when wasted).

## Review fixes (12 issues from 5-agent multi-perspective review)

### Provider warmup correctness
- 🔴 OpenAI Realtime warmup uses ``session.update`` (not the non-spec
  ``response.create`` with ``generate:false`` which could silently bill
  tokens or return ``invalid_request_error``). Files:
  ``providers/openai_realtime.py``, ``providers/openai-realtime.ts``.
- 🟡 ElevenLabs WS warmup BOS frame now mirrors the live ``synthesize``
  BOS byte-for-byte (``voice_settings`` + ``generation_config``). Shared
  helper ``_build_bos_frame`` / ``buildBosFrame``. Verified billing-safe
  via no ``flush:true``, no real text. Files:
  ``providers/elevenlabs_ws_tts.py``, ``providers/elevenlabs-ws-tts.ts``.
- 🟡 Inworld TTS warmup uses ``GET /tts/v1/voices`` instead of ``HEAD``
  against POST-only stream endpoint (was returning 405 in audit logs).
- 🟡 Cartesia STT + AssemblyAI STT warmup error logs no longer leak the
  API key — catches ``WSServerHandshakeError`` specifically and logs
  only the HTTP status code, never ``str(exc)`` (which embeds the URL).

### StreamHandler / barge-in correctness
- 🟠 Double ``record_overlap_start`` on strategy-confirmed barge-in
  fixed: VAD start path now stamps T1, the strategy-confirm path no
  longer overwrites with T2 — ``detection_delay_ms`` is now correct for
  every user opting into ``barge_in_strategies``. Files:
  ``stream_handler.py:_do_cancel_for_barge_in``,
  ``stream-handler.ts:runBargeInCancel``.
- 🟠 Pending barge-in task leak fixed: ``cleanup`` (Py) /
  ``handleStop`` + ``handleWsClose`` (TS) now call
  ``_clear_pending_barge_in`` so a call ending mid-pending no longer
  leaves an asyncio.Task / setTimeout firing on a finalized handler.
- 🟢 Pre-warm bytes now chunked (1280 B / 40 ms) before
  ``audio_sender.send_audio`` so barge-in mid-greeting can flush
  cleanly via the existing mark/clear bookkeeping.

### Patter client + cache hardening
- 🟠 Cache eviction on abnormal hangup: the Twilio status callback
  (``no-answer`` / ``busy`` / ``failed`` / ``canceled``) and the Telnyx
  ``call.hangup`` / AMD-machine paths now call ``_record_prewarm_waste``
  so memory doesn't leak proportional to no-answer rate.
- 🟠 Race start-vs-prewarm fixed: a ``_prewarm_consumed`` set tracks
  consumed call_ids so a late-arriving prewarm task drops its bytes
  instead of orphaning them in the cache.
- 🟡 ``disconnect()`` now cancels in-flight prewarm tasks and clears
  the cache (no spend leak across serve/disconnect cycles).
- 🟡 ``prewarm_first_message=True`` on Realtime / ConvAI mode now logs
  a WARN and skips the spawn (was silently paying TTS for bytes the
  StreamHandler never consumed).
- 🟡 Prewarm cache bounded at 200 entries with TTL-based eviction
  (``ring_timeout + 5 s``) — caps memory under outbound flood
  scenarios.

### Documentation
- Docstring for ``Agent.barge_in_strategies`` corrected: TTS continues
  streaming naturally during pending state (was misleadingly described
  as "paused").

## Tests

47 new regression tests across 4 new files plus updates to existing
suites. Verifies every fix above with authentic mocks at the network
boundary only:

- ``libraries/python/tests/test_prewarm.py`` (new — 28 tests covering
  default flag values, no-op default ``warmup``, all-three-providers
  warmup invocation, opt-out, exception swallow, cache populate / skip
  / empty-message / timeout, one-shot pop, waste-warn log, StreamHandler
  cache-hit short-circuit + cache-miss live-TTS fallback, race orphan,
  disconnect cleanup, cap+TTL eviction, provider-mode validation,
  chunking).
- ``libraries/python/tests/unit/test_provider_warmup.py`` (new — 18
  tests covering all 7 concrete ``warmup()`` overrides + billing-safety
  regressions + key-leak regressions).
- ``libraries/typescript/tests/unit/prewarm.test.ts`` (new — 23 TS
  twins).
- ``libraries/typescript/tests/unit/provider-warmup.mocked.test.ts``
  (new — 19 TS twins).
- Updates to ``test_barge_in_two_stage.py`` (3 ``record_overlap_start``
  tests + 2 cleanup tests), ``barge-in-two-stage.test.ts`` (4 TS
  twins), ``server-routes.test.ts`` (2 status-callback eviction tests).

## Verification

- Python: 1797 passed, 7 skipped, 0 failed (was 1707 + 14 prewarm + 76
  inherited from new subclass collection-tests)
- TypeScript: 1467 passed across 83 files (was 1430 + 37 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (ESM + CJS + dts + CLI): clean

## CHANGELOG

All entries under ``## 0.6.1 (2026-05-09)`` with file paths, line
numbers, rationale, and test paths.

* fix(0.6.1): WS handoff prewarm + dashboard regressions + first-turn latency

Live PSTN smoke tests against ``outbound-cartesia-cerebras-elevenlabs.ts``
exposed several issues in 0.6.1 that were not caught by the unit suite.
This commit ships seven fixes plus three quick wins on top of the
prewarm pipeline.

## Architectural — WebSocket handoff for prewarm (replaces open-then-close)

The 0.6.1 prewarm pipeline as previously shipped (commit ``c585f6d``)
opened a streaming-STT and streaming-TTS WebSocket during the carrier
ringing window, idled ~250 ms, and closed it. Investigation showed the
strategy is structurally insufficient on Node: the ``ws`` package does
not thread a TLS session ticket across separate ``new WebSocket(...)``
constructions, so every fresh ``connect()`` at call pickup pays full
TCP+TLS+HTTP-101 upgrade. Net saved time was 50–250 ms (DNS cache only)
versus 700–1500 ms of cold-start budget. Live test reported "several
seconds" first-turn latency, p95 3048 ms.

The new strategy keeps the warmed WS open and hands it off to the
``StreamHandler`` at call pickup. New API surface:

- ``Patter._prewarmedConnections: Map<callId, ParkedProviderConnections>``
  (TS) / ``self._prewarmed_connections: dict[str, ParkedProviderConnections]``
  (Py) — keyed by carrier-issued ``call_id``, populated during ringing,
  drained on call end or after a 30 s safety TTL.
- ``provider.openParkedConnection()`` / ``open_parked_connection()`` —
  added to ``CartesiaSTT``, ``ElevenLabsWebSocketTTS``,
  ``OpenAIRealtimeAdapter``. Opens the WS, sends the same initial config
  the live ``connect()`` sends (STT: empty config; TTS: BOS frame
  matching ``synthesize`` BOS byte-for-byte; Realtime: ``session.update``),
  and returns a handle the caller parks.
- ``provider.adoptWebSocket(handle)`` / ``adopt_websocket(handle)`` —
  added to the same three providers. Accepts a pre-opened WS, validates
  ``readyState === OPEN``, and proceeds with the live message loop. For
  ElevenLabs WS TTS the handle carries a ``bosAlreadySent: true`` flag so
  the first ``synthesizeStream`` iteration does not double-send BOS
  (which would be a protocol error).
- ``StreamHandler`` checks ``client.popPrewarmedConnections(callId)``
  before falling back to fresh ``connect()``. On adopt, the path skips
  TCP+TLS+upgrade and the BOS round-trip — STT connects in 0 ms, TTS in
  0 ms.

Cleanup wiring: the same status callback paths that already drain the
prewarm-audio cache (FIX #91) now also close any parked WS for failed
calls (no-answer / busy / failed / canceled / AMD-machine). The 30 s
TTL covers the rare carrier path that emits neither ``start`` nor a
status callback.

Live validation against ``outbound-cartesia-cerebras-elevenlabs.ts``:
``[PREWARM] callId=… provider=stt ms=769`` followed by
``[CONNECT] callId=… provider=stt source=adopted ms=0`` — STT connect
went from 150–400 ms to 0 ms. First-turn greeting wire-time dropped from
"several seconds" to **990 ms**. Files:
``libraries/typescript/src/client.ts`` (cache + ``parkProviderConnections``,
``popPrewarmedConnections``, ``closePrewarmedConnections``,
``ParkedProviderConnections`` interface, ``closeParkedConnections``
helper); ``libraries/typescript/src/server.ts`` (forwards
``popPrewarmedConnections`` into ``StreamHandlerDeps``);
``libraries/typescript/src/stream-handler.ts`` (adopt-or-connect logic);
``libraries/typescript/src/providers/{cartesia-stt,elevenlabs-ws-tts,openai-realtime}.ts``
(park + adopt API surface). Python parity in
``libraries/python/getpatter/{client,server,stream_handler,telephony/twilio,telephony/telnyx}.py``
and ``libraries/python/getpatter/providers/{cartesia_stt,elevenlabs_ws_tts,openai_realtime}.py``.
Realtime mode has the API surface but the ``OpenAIRealtimeStreamHandler``
adoption is deferred to a follow-up — pipeline mode dominates the
affected use case.

## Quick wins (parallel to WS handoff, smaller individual savings)

- **Eager AEC import on ``Patter.serve()``** (gated on
  ``agent.echo_cancellation=true``). Was previously a lazy
  ``await import('./audio/aec')`` on first ``start`` event, paying
  150–400 ms JIT on the first call. Files:
  ``libraries/typescript/src/client.ts``, ``libraries/python/getpatter/client.py``.
- **Parallel ``stt.connect()`` and TTS-firstMessage kickoff**. Previously
  the StreamHandler awaited STT before TTS firstMessage — STT does not
  need to be ready to send firstMessage out, only to receive caller
  audio. Now both kick off concurrently. Saves 200–400 ms on the first
  turn. Files: ``libraries/typescript/src/stream-handler.ts``,
  ``libraries/python/getpatter/stream_handler.py``.
- **Timing instrumentation**: new ``[PREWARM]`` and ``[CONNECT]`` INFO
  logs in the prewarm spawn and provider connect paths, with elapsed-ms
  per provider. Lets us A/B-test future prewarm changes with numerical
  evidence rather than perceptual reports.

## Dashboard fixes (third pass — issues found during the round-2 PSTN test)

### Live transcript shows only one turn at a time (BUG #102)

``MetricsStore.recordTurn`` correctly accumulated turns into
``active.turns[]`` but the frontend ``toUiTranscript`` mapper had two
paths: a primary keyed on ``record.transcript.length > 0`` (used for
completed calls) and a fallback that derived rows from ``record.turns``.
For an in-flight call the primary always returned empty (active records
never carried ``transcript[]``) and only the fallback rendered, so the
two paths diverged. Each ``recordTurn`` now mirrors the round-trip into
a flat ``active.transcript`` array (one user entry + one assistant entry
per turn, filtering empty ``user_text`` and the ``[interrupted]`` agent
sentinel), so the primary path sees the same accumulating ``user →
assistant → user → assistant → …`` history live calls and completed
calls both expose. Files: ``libraries/typescript/src/dashboard/store.ts``,
``libraries/typescript/tests/dashboard-store.test.ts`` (5 new authentic
tests).

### Transcript disappears after call end (BUG #101)

The Twilio status callback for ``CallStatus=completed`` fires a beat
before the WS ``stop`` frame, so ``MetricsStore.updateCallStatus``
moved the active record into the completed buffer **without preserving
``turns[]`` or ``transcript[]``**. The subsequent ``recordCallEnd``
overwrote that completed entry, but in the gap any ``useTranscript``
fetch returned a record with no transcript and the live pane went
blank. Three-point fix: (a) ``updateCallStatus`` terminal branch now
copies ``active.turns`` and ``active.transcript`` into the new
completed entry; (b) ``recordCallEnd`` falls back to active/existing
transcript when ``data.transcript`` is empty; (c) the
``useTranscript`` hook subscribes to ``call_end`` SSE events
(independent of ``isLive``) so the pane refetches the moment
``recordCallEnd`` lands the SDK-authoritative ``history.entries``.
Files: ``libraries/typescript/src/dashboard/store.ts``,
``dashboard-app/src/hooks/useTranscript.ts``.

### Sparkline tooltip generic / wrong metric (BUG #104)

The metric-tile sparkline tooltip rendered ``"N call(s)"`` plus a
per-call sample list regardless of which card it was attached to —
the latency and spend cards therefore showed the same headline as the
calls card. New ``MetricKind`` prop (``'count' | 'latency' | 'spend'``)
threaded through ``Metric`` → ``SparkBar`` → ``SparkTooltip``, with a
pure ``bucketHeadline(bucket, kind)`` helper that computes per-card
aggregates: ``TOTAL COST $X.XXX`` (sum of per-call cost),
``AVG LATENCY <p95-mean> ms`` (mean of per-call P95), or
``N CALL(S)``. Headline label uppercased, monospace, styled to match
the existing time-range header on the same tooltip. Files:
``dashboard-app/src/App.tsx``, ``dashboard-app/src/components/Metric.tsx``,
``dashboard-app/src/styles/dashboard.css``.

### caller / callee never persisted to metadata.json (BUG B from the second pass)

Every persisted ``metadata.json`` showed ``"caller": ""``,
``"callee": ""`` for completed calls — only the in-memory
``MetricsStore`` had the right values. The persist layer received empty
strings because the ``CallLogger.log_call_end`` data shape was built
from agent options rather than the live record. ``server.ts``
``wrappedStart`` now resolves ``caller``/``callee`` from the active
store record before persisting; Python ``record_call_start`` parity fix
stops clobbering caller/callee with empty strings on the
upgrade-from-initiated path (TS already had the right pattern).

### Call disappears from dashboard after end (BUG C from the second pass)

Race-induced duplicate row: Twilio's status callback for
``CallStatus=completed`` fires ~50–200 ms before the WS ``stop`` frame.
``updateCallStatus`` moved the row out of ``activeCalls`` into
``calls[]`` correctly, then the WS ``stop`` drove ``recordCallEnd``,
``activeCalls.get(callId)`` returned undefined, and a duplicate entry
was pushed with ``started_at = 0`` and empty caller/callee. The
duplicate masked the well-formed earlier row and the 24h window filter
excluded it. ``recordCallEnd`` / ``record_call_end`` now searches
``calls[]`` for the existing entry when active is gone and **updates
in place**, preserving caller/callee/started_at and merging in the
just-collected metrics.

## Tests

47 new regression tests across 6 files (TS + Py parity):
- ``libraries/python/tests/test_prewarm_handoff.py`` (new — 6 tests)
- ``libraries/typescript/tests/unit/prewarm-handoff.test.ts`` (new — 6 tests)
- ``libraries/python/tests/unit/test_dashboard_store_unit.py`` (+4 dedup
  + active-accessor tests)
- ``libraries/python/tests/unit/test_server_unit.py`` (+1 caller/callee
  persist test)
- ``libraries/typescript/tests/dashboard-store.test.ts`` (+7 dedup +
  transcript accumulate + accessor tests)
- ``libraries/typescript/tests/server.test.ts`` (+1 caller/callee persist
  test using real ``CallLogger``)

## Verification

- Python: ``pytest -q`` → 1808 passed, 7 skipped (was 1797 + 11 new)
- TypeScript: ``npm test`` → 1481 passed (was 1467 + 14 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (esm + cjs + dts + cli): clean
- Dashboard SPA build (``cd dashboard-app && npm run build``): clean
  (204.93 kB / 63.47 kB gz)
- Dashboard sync: both ``libraries/{python,typescript}/.../dashboard/ui.html``
  refreshed
- Live PSTN smoke test (``outbound-cartesia-cerebras-elevenlabs.ts``):
  WS handoff log fired, first-turn greeting 990 ms, transcript live and
  post-end render OK, sparkline tooltip per-card OK

* fix(0.6.1): roll back STT debounce, dashboard threshold, Krisp TS scaffold

Headline changes since cbe1886:

* Rolled back the 400 ms STT-final → LLM dispatch debounce introduced
  earlier in 0.6.1 (`_scheduleTurnCommit` / `_runDeferredTurnCommit` in
  TS, `_schedule_turn_commit` / `_delayed_turn_commit` in Python). The
  partial-transcript reschedule branch was overwriting the dispatched
  FINAL text with the latest partial, causing entire user turns to be
  dropped during slow-LLM windows. Verified on real PSTN (round 10k
  with gpt-5-nano dropped 3 of 5 user turns). Dispatch is now
  synchronous on `is_final` again. The original double-talk symptom is
  re-opened with a better fix path documented internally.

* Kept beneficial 0.6.1 work: `beginSpeaking` stamps
  `firstAudioSentAt = Date.now()` on every turn so the
  `canBargeIn()` anti-flicker gate runs in parallel with LLM TTFT +
  TTS TTFB; VAD `speech_start` calls `anchorUserSpeechStart()` and
  skips on phantom-during-warmup-gate; commit-drop path re-anchors;
  WARN log when pipeline has no `llm` / `onMessage` handler; char/4
  fallback billing for providers that don't emit a usage chunk;
  `OpenAILLMProvider.providerKey` static; firstMessage TTS char
  billing; persist full latency breakdown per percentile in
  metadata.json; dashboard hydrate reads `transcript.jsonl`;
  ElevenLabs default flipped to WS.

* Lowered dashboard percentile threshold 5 → 2 turns so the detail
  pane no longer shows `—` for p50/p95 on typical 4-7 turn PSTN calls
  while the list column already shows a real number via avg fallback.

* Added Krisp VIVA noise-suppression scaffold for the TypeScript SDK
  at `libraries/typescript/src/providers/krisp-filter.ts` for cross-
  SDK parity with the existing Python `KrispVivaFilter`. Throws at
  construction time because Krisp does not publish an official Node
  SDK as of 2026-05; users supply SDK + `.kef` model + license. New
  top-level exports: `KrispVivaFilter`, `KrispVivaFilterOptions`,
  `KrispSampleRate`, `KrispFrameDuration`, `DeepFilterNetFilter`,
  `DeepFilterNetOptions`.

* CHANGELOG 0.6.1 section revised to reflect the rollback narrative
  honestly (debounce attempted, rolled back before release) and to
  document the new entries.

* Scrubbed competitor-name references from source files (Pipecat,
  LiveKit) per project rule `.claude/rules/no-competitor-references.md`;
  replaced with "industry-standard pattern" wording. Source files
  affected: `stream-handler.ts`, `stream_handler.py`, `metrics.ts`,
  `services/metrics.py`, `silero_vad.py`.

* Krisp Python wrapper unchanged.

Tests: TS lint clean, vitest 1486/1486 pass; Python pytest unit 1252
pass, 5 skip. Validated on real PSTN: post-rollback p95 wait
1844 ms over 4 clean sequential turns (no drops) on cellular
hotspot — vs catastrophic 8521 ms with 3 dropped turns pre-rollback.

* fix: revert ElevenLabs HTTP→WS default flip from 4ff09bd

Keep ElevenLabsTTS backed by HTTP REST (original cbe1886 state).
The WS default caused pipeline latency regression and prewarm lifecycle bugs.
ElevenLabsWebSocketTTS remains available as opt-in via direct import.

* fix(0.6.1): dashboard live merge + firstMessage barge-in + drain marks (re-base of #89) (#92)

* fix(dashboard): preserve existing calls when new call arrives in SSE stream

`mergeCallPreserving` in `dashboard-app/src/hooks/useDashboardData.ts`
rebuilt the calls array from the server snapshot via `next.map(...)`,
so any call present in the previous UI state but missing from the next
payload was silently dropped. With back-to-back calls, the SSE
`call_start` refresh occasionally landed before the prior call
propagated to `/api/dashboard/calls` and the row vanished from the
SPA — regression reported as #124.

The merge is now a true upsert: rows present in `prev` but absent from
`next` are appended, so prior calls stay visible until the server
snapshot stabilises. Server-side eviction (ring buffer of 500) bounds
long-running sessions.

Pure merge helpers extracted to `dashboard-app/src/hooks/mergeCalls.ts`
and exercised by `dashboard-app/src/hooks/mergeCalls.test.ts` (added
Vitest to the SPA so the helpers can be tested in isolation without a
React harness).

Refs #124.

* fix(barge-in): firstMessage interruptible via per-chunk mark gating

The firstMessage TTS chunks were pushed into the carrier WebSocket as
fast as the provider yielded them. Twilio's outbound buffer ended up
several seconds deep, and a barge-in's sendClear was queued behind the
already-enqueued media frames — the agent kept talking on the user's
earpiece for up to ~2 s after the user spoke (#128).

The firstMessage send path is now a paced loop:
* Twilio: every chunk is followed by a unique mark; the loop waits for
  the oldest unconfirmed mark once FIRST_MESSAGE_MARK_WINDOW (3 chunks
  ≈ 120 ms) are in flight. ``onMark`` drains the FIFO on echo so the
  next chunk goes out. ``cancelSpeaking`` (Py: ``_run_barge_in_cancel``)
  resolves every pending mark waiter so the loop exits on the next
  tick and ``sendClear`` lands on a near-empty carrier buffer.
* Telnyx (no mark concept): the loop falls back to a playout-duration-
  based sleep so the buffer can't out-run a clear by more than one
  chunk.

Both SDKs stay in parity: TS ``sendPacedFirstMessageBytes`` mirrors Py
``_send_paced_first_message_bytes`` and both ``streamPrewarmBytes`` /
``_stream_prewarm_bytes`` delegate to the new helper. The existing
prewarm chunking test was updated to echo marks via the mock bridge so
it interoperates with the new pacing.

Coverage:
* libraries/typescript/tests/unit/stream-handler.test.ts —
  ``firstMessage mark-gated pacing`` (3 cases: window cap +
  barge-in, mark echo slides window, Telnyx playout pacing).
* libraries/python/tests/unit/test_first_message_pacing.py — 4 cases
  including FIFO mark resolution.

Refs #128.

* fix(barge-in): drain pending marks on call cleanup/stop/ws-close

The firstMessage paced sender accumulates one mark waiter (asyncio.Future
on Python / Promise on TS) per chunk in _pending_marks / pendingMarks
while audio is streaming to the carrier. The barge-in cancel path
already drained these, but a call that ended without going through
cancel — carrier WebSocket drop, hangup mid firstMessage, stop event
arriving before the paced sender finished — left every queued future
unresolved. The send loop was awaiting them, so the orphan futures
leaked until the handler itself was garbage-collected.

Fix: PipelineStreamHandler.cleanup (Py) now invokes _drain_pending_marks
before tearing down adapters; the TS handleStop and handleWsClose do
the equivalent via drainPendingMarks(). Idempotent and safe when the
queue is already empty.

Added regression coverage:
- libraries/python/tests/unit/test_first_message_pacing.py
  (TestCleanupDrainsPendingMarks)
- libraries/typescript/tests/unit/stream-handler.test.ts
  (cleanup drains pending firstMessage marks — handleStop + handleWsClose)

* fix(barge-in): reset firstMessage mark counter per send + on cleanup

PipelineStreamHandler._first_message_mark_counter (Py) and
StreamHandler.firstMessageMarkCounter (TS) were never reset between
turns or calls. With handler re-use, the counter incremented
monotonically across turns — a paced send for the second turn issued
fm_<previous_count + 1> while the carrier could still be echoing a
stale fm_<N> from the previous turn, corrupting FIFO matching in
on_mark / onMark.

Fix: reset the counter to 0 at the top of _send_paced_first_message_bytes
(Py) / sendPacedFirstMessageBytes (TS) so each paced send begins a
fresh fm_1, fm_2, ... sequence. Also reset on cleanup
(PipelineStreamHandler.cleanup Py, handleStop + handleWsClose TS) as a
belt-and-braces against the cross-call boundary.

Coverage:
- libraries/python/tests/unit/test_first_message_pacing.py
  (TestFirstMessageMarkCounterReset — per-send reset + cleanup reset)
- libraries/typescript/tests/unit/stream-handler.test.ts
  (firstMessage mark counter resets across sends + on cleanup)

* fix(dashboard): cap merged UI calls at 500 + sort by startedAt desc

mergeCallPreserving in dashboard-app/src/hooks/mergeCalls.ts preserved
prev_only calls indefinitely by appending them after the fresh snapshot
block. Two consequences on a long-lived session:

1. The UI array grew unbounded — once the session cycled through more
   than 500 calls (the server-side MetricsStore ring buffer default),
   rows the server had already evicted stayed pinned by prev and were
   re-appended on every refresh.
2. Ordering was non-deterministic — prev_only rows always landed at
   the bottom regardless of their startedAtMs, so a newer call could
   end up below an older one if the snapshot ordering shifted.

Fix: after the upsert pass, sort the merged list by startedAtMs
descending and slice to MAX_UI_CALLS = 500 so the SPA mirrors the
server ring buffer.

Coverage: dashboard-app/src/hooks/mergeCalls.test.ts adds a
600-prev+1-fresh cap test and an explicit startedAtMs ordering test.

* fix(realtime): only update lastConfirmedMark on matched mark (parity with Python)

StreamHandler.onMark in libraries/typescript/src/stream-handler.ts
unconditionally assigned this.lastConfirmedMark = markName before
checking whether the name corresponded to a queued mark. Any echo
arriving after the queue was drained, or any mark name emitted by
adapters outside the firstMessage queue, would overwrite the handler-
level field and contaminate downstream barge-in heuristics gated on
lastConfirmedMark.

Python stream_handler.py's on_mark never touches a handler-level
field at all — the equivalent state lives on
TwilioAudioSender.last_confirmed_mark and is updated only by the
carrier's own echo handler. The TS path now matches that behaviour
defensively: lastConfirmedMark is updated only after the queue lookup
confirms a matching entry, mirroring the safer Python semantics.

Coverage: libraries/typescript/tests/unit/stream-handler.test.ts
(onMark only updates lastConfirmedMark on a matched mark) asserts
that an unmatched echo cannot clobber a previously-set value.

* fix(metrics): align EOU semantics + unit (ms) across Python/TS

The Python ``CallMetricsAccumulator._emit_eou_metrics`` had
``end_of_utterance_delay`` and ``transcription_delay`` swapped relative
to the TypeScript ``emitEouMetrics`` AND emitted them in seconds while
TS emits milliseconds. Dashboards or exporters reading the same metric
across both SDKs saw a 1000x disagreement on top of swapped field
semantics.

Locked convention (now identical in both SDKs):

- end_of_utterance_delay = stt_final  - vad_stopped  (ms)
- transcription_delay    = turn_commit - vad_stopped (ms)
- on_user_turn_completed_delay                       (ms, unchanged)

Python now clamps negative deltas to 0 (TS already did). The Python
``EOUMetrics`` docstring updated from "seconds" to "milliseconds".

Tests pin both behaviours:
- libraries/python/tests/test_metrics.py::TestEOUMetricsEmission
- libraries/typescript/tests/unit/metrics.test.ts ::
  CallMetricsAccumulator > emitEouMetrics field semantics

Refs: 0.6.1 observability parity audit.

* feat(ts): observability OTel no-op stubs for SDK parity

The Python SDK exposed three OTel-related helpers since 0.6.1:
``record_patter_attrs``, ``patter_call_scope``, ``attach_span_exporter``
(in ``getpatter.observability.attributes``). The TypeScript SDK had no
equivalent surface — every provider adapter that called the Python
helpers had no place to call across the parity boundary, violating
``.claude/rules/sdk-parity.md``.

Port the helpers to TypeScript as no-ops by default. When
``PATTER_OTEL_ENABLED`` is unset or ``@opentelemetry/api`` is not
installed, each helper returns immediately, keeping the zero-cost
disabled path that the rest of the observability module already
respects.

Semantic mapping:
- recordPatterAttrs(attrs)                       <-> record_patter_attrs
- patterCallScope({ callId, side }, fn)          <-> patter_call_scope
- attachSpanExporter(patterInstance, exporter)   <-> attach_span_exporter

The JS form of patterCallScope takes an async callback because JS lacks
``with``-style context managers; the closure is the scope body. The
module uses a module-level stack instead of a ContextVar, which is
sufficient for the SDK's one-call-per-handler model.

Tests:
- libraries/typescript/tests/unit/observability-attributes.test.ts
  (7 smoke cases covering the public surface + scope unwind on throw)

* fix(0.6.1): Cerebras usage-chunk log + Krisp TS status refresh (re-base of #90) (#91)

* chore(cerebras): debug log when usage chunk missing + fallback fires

When an upstream LLM stream (Cerebras and similar) does not emit a
`usage` chunk despite `stream_options={include_usage:true}`, the
char/4 fallback billing path previously emitted WARN on every
tool-loop iteration. Multi-tool turns logged 5-10 identical WARN
lines for the same call, drowning real warnings.

Replace with one-shot INFO at first fallback per LLMLoop instance
(provider, model, char counts, est_tokens), then DEBUG for every
subsequent iteration with the running `_usage_missing_count` /
`_usageMissingCount` total. No billing behaviour change — char/4
estimation still drives `record_llm_usage` / `recordLlmUsage`.
Symmetric Python (`logger.info`/`logger.debug`) and TypeScript
(`getLogger().info`/`.debug`).

* docs(krisp): refresh unavailable message with current SDK status

KrispVivaFilter constructor in the TypeScript SDK still throws — no
official Krisp Node.js server SDK exists as of 2026-05. Verified via
`npm search krisp`:

- `@livekit/krisp-noise-filter` (0.4.3, 2026-04) — browser WASM
  track processor on the local microphone; cannot run server-side.
- `@livekit/react-native-krisp-noise-filter` (0.0.3) — mobile native.
- `@krisp.ai/kr-local-monitoring` — Krisp's only first-party npm
  package; "Local Monitoring API", not noise cancellation.

Refreshed the thrown message to (a) stamp the verification date,
(b) explicitly distinguish "server Node SDK" from the existing
browser/RN wrappers, (c) list the LiveKit packages with the reason
they don't apply to Patter (server-received PCM/mulaw stream).
Python KrispVivaFilter and TS DeepFilterNetFilter remain the only
shipped paths. No code behaviour change.

* fix(krisp): remove competitor package names from error message

Per .claude/rules/no-competitor-references.md the TS Krisp filter
error message cannot cite competitor package names — refactored
the "Browser/React Native" block to describe the category
generically (third-party wrappers, client-side scope) without
naming specific packages. Same cleanup applied to the matching
CHANGELOG entry. No behavioural change.

* fix(ts): correct ElevenLabsTTS export comment — HTTP REST default, not WebSocket

After commit 8507a34 reverted the HTTP→WS flip, the comment still said
ElevenLabsTTS "defaults to WebSocket streaming as of 0.6.1". Updated to
reflect current reality: ElevenLabsTTS = HTTP REST (pcm_16000),
ElevenLabsWebSocketTTS = WS variant, ElevenLabsRestTTS = HTTP alias.

* fix(pipeline): always pace prewarmed first-message audio by playout duration

Twilio mark ACKs can batch-resolve simultaneously — when all 3 pending
marks in FIRST_MESSAGE_MARK_WINDOW resolve at once, waitForMarkWindow
unblocks 3 consecutive loop iterations with no delay, sending a burst
of ~120ms of audio. The carrier jitter buffer drains for a moment then
refills, producing audible crackling on the first message only (regular
turns use synthesizeSentence / synthesize_sentence which send audio
directly without marks and are unaffected).

Remove the `if markPromise === null` / `if mark_fut is None` guard so
the playout sleep (40ms for a 1280-byte chunk) runs unconditionally after
every chunk on all carriers. Mark tracking for barge-in is preserved.
Files: libraries/typescript/src/stream-handler.ts,
       libraries/python/getpatter/stream_handler.py.

Update tests to use fake timers (TS: vi.useFakeTimers + advanceTimersByTimeAsync,
Python: asyncio.sleep mock) so the 40ms per-chunk sleep does not make unit
tests slow. Align tts-facade-language.test.ts with the current ElevenLabs
HTTP REST default (commit 8507a34 reverted the WS flip).

* fix(pipeline): correct first-message burst pacing with initialFillComplete flag

The previous fix (always sleep 40ms per chunk) eliminated the initial
burst needed to pre-fill Twilio's PSTN jitter buffer (250–1500 ms),
causing the same crackling symptom it was meant to cure.

Root cause: Twilio can batch-resolve all FIRST_MESSAGE_MARK_WINDOW (3)
mark ACKs in a single event-loop turn. When the window unblocks 3
consecutive iterations with no sleep, 3 chunks are sent in burst and
the jitter buffer drains momentarily → crackling.

Correct fix: the first FIRST_MESSAGE_MARK_WINDOW chunks go out in burst
(no playout sleep) to pre-fill the jitter buffer. Once the window is
first full, a sticky `initialFillComplete` / `initial_fill_complete`
flag flips to true and subsequent chunks are paced by playout time
(~40 ms per chunk), preventing batch-ACK bursts. On Telnyx (no mark
concept) the playout sleep runs unconditionally on every chunk.

Tests: 7/7 Python, 35/35 TypeScript — no changes needed on Python side
(autouse sleep-patch fixture already makes all sleeps instant).

* fix(pipeline): one-shot barge-in + first-message crackle + barge-in gate

Three related fixes to the no-AEC Twilio PSTN pipeline that together
deliver a smooth-feeling agent on real phone calls (verified live: 5
turns, p95 wait 685 ms, every user utterance produced a fresh VAD
speech_start with multiple successful interruptions in one call).

1. One-shot barge-in. After a successful barge-in, subsequent barge-in
   attempts silently failed. PSTN echo of the agent's TTS kept
   SileroVAD's smoothed probability above deactivationThreshold (0.35)
   for the full agent turn, so pubSpeaking stayed true cross-turn and
   no fresh SILENCE -> SPEECH transition ever fired. Added an optional
   reset() hook to VADProvider; SileroVAD implements it by clearing
   the pending buffer, pubSpeaking, the speech/silence threshold
   durations, the ExpFilter, AND the ONNX RNN hidden state + rolling
   context (without resetting the model the detector "remembers" the
   echo). StreamHandler invokes reset in beginSpeaking (every new
   agent turn starts clean) and at the grace-timer fire of
   endSpeakingWithGrace (natural turn end leaves VAD ready for the
   next spontaneous user utterance).

2. First-message crackle. StatefulResampler seeded its 5-tap FIR
   history with input[0] on the first call. When ElevenLabs HTTP
   streaming delivers a chunk that starts at non-zero amplitude this
   produced a startup transient audible as a brief crackle at the
   beginning of the first TTS message. Seeded with zeros instead —
   the correct initial condition for a filter that has received no
   prior input.

3. Barge-in gate 250 ms -> 100 ms, suppressed speech flushed. The
   no-AEC anti-flicker gate was 250 ms, which on short agent turns
   (< ~400 ms of audio) consumed most of the turn and silently
   suppressed legitimate interruptions. Reduced to 100 ms (still
   blocks the ~100-200 ms PSTN echo round-trip). When a speech_start
   is gate-suppressed the inboundAudioRing accumulates user audio
   that was previously discarded at the next beginSpeaking; added a
   suppressedSpeechPending flag so the grace-timer flush replays the
   ring to STT on natural turn end.

Parity: TS unconditionally stamps firstAudioSentAt in beginSpeaking
since 2026-05-11; Python _begin_speaking now matches (was conditional
on is_first_message, which made any turn with a slow LLM
un-interruptible for the full LLM TTFT window).

Files touched:
  libraries/typescript/src/types.ts (VADProvider.reset?)
  libraries/typescript/src/providers/silero-vad.ts (reset impl)
  libraries/typescript/src/stream-handler.ts (call resetVad,
    suppressedSpeechPending, gate constant, ring flush on grace)
  libraries/typescript/src/audio/transcoding.ts (FIR zero-seed)
  libraries/python/getpatter/providers/base.py (VADProvider.reset)
  libraries/python/getpatter/providers/silero_onnx.py (OnnxModel.reset)
  libraries/python/getpatter/providers/silero_vad.py (reset impl)
  libraries/python/getpatter/stream_handler.py (parity wiring)
  CHANGELOG.md (Unreleased entries)
  tests: silero-vad reset coverage (TS + Py), updated stream-handler
    + transcoding tests for new state-machine wiring.

Validation: TS 945/945 unit tests + lint + build green;
Python 1259+13 unit tests green; live PSTN call confirmed smooth
multi-barge-in.

* feat(dashboard): multi-select delete + privacy/dark toggles + dark-mode polish

Three coordinated dashboard improvements landed together because they
share the same SPA bundle + cross-SDK route/store parity surface.

1. Soft-delete selected calls from the dashboard view + aggregates.
   On-disk artefacts (metadata.json, transcript.jsonl) are preserved
   as the durable backup the operator can audit outside the dashboard.

   - MetricsStore.deleteCalls / delete_calls accept ids, ignore active
     calls (safety), persist the set atomically to
     <log_root>/.deleted_call_ids.json so deletions survive restart.
   - getCalls / getCall / getAggregates / getCallsInRange / callCount /
     hydrate now filter against the deleted set so rolling metrics
     (avg latency, total spend) recompute against the visible window
     immediately on delete.
   - New endpoints, parity TS↔Python:
     * DELETE /api/dashboard/calls/:call_id
     * POST   /api/dashboard/calls/delete  { call_ids: [...] }
   - SSE event ``calls_deleted`` so other tabs / external clients
     re-render in real time.
   - SPA: per-row checkbox column (live rows disabled), bulk-action
     bar that reveals on selection > 0 with inline confirmation step
     ("Removes from view + metrics. Logs kept on disk.") gated by a
     peach destructive button.

2. Top-bar toggles: PII reveal (eye / eye-slash) + theme (sun / moon),
   both persisted in localStorage so the operator's last choice
   survives a reload. Default state is hidden + light — screen-share
   safe out of the box.

   - New ``useUiPrefs`` hook centralises both prefs and applies the
     ``body.dark`` class side-effect so the existing dark-mode CSS
     overrides flip in lockstep.
   - fmtPhone(p, revealed) renders ``•••<last4>`` using U+2022 BULLET
     instead of asterisks so the mask sits on the digit baseline.
     PII cells gain ``font-variant-numeric: tabular-nums`` so toggling
     reveal doesn't jitter the column width.
   - Reveal currently honours whatever the server provided —
     ``PATTER_LOG_REDACT_PHONE`` still controls the on-disk format,
     unchanged. Operators who want full numbers in the dashboard can
     set ``PATTER_LOG_REDACT_PHONE=full`` to log new calls in full;
     historical data stays masked by construction.

3. Dark-mode polish + min-height layout.

   - Page palette lifted: bg #0d0d0d → #121212, cards #171717 →
     #1c1c1c, borders #262626 → #2a2a2a. Previous pitch-black felt
     oppressive against the brand's cream/peach accent.
   - Active toggles use the peach accent instead of stark white
     (.seg button.on, .icon-btn.toggle.on) — the white blocks felt
     like a light-mode leftover floating on the dark page.
   - Fixed invisible / unreadable elements in dark mode:
     * Patter logo (was inline ``color: var(--ink)``, now inherits)
     * Transcript turn body text (.turn .body .txt was #1a1a1a)
     * Metrics waterfall track + STT bar + value (.wf-row .track was
       cream, .seg-bar.stt was #1a1a1a, .v was #000)
     * Duration block value (.duration-block .v was #000)
     * Sparkline empty bars (cascade from broad ``.spark-bar``
       override leaked into ``.empty`` — added ``:not(.empty)``)
     * kbd ⇧K chip (cream blob on dark)
     * .ctrl.active, .pill.queued, .pill.fail, .lat-bar.warn,
       .car-dot.tx, .stack-row labels, .latbox.warn variant.
     * New-row insertion flash used cream end-state → dedicated
       ``slideInDark`` keyframe.
   - Min-height baseline so the layout doesn't collapse when no
     calls match the active range: table scroll area pinned at
     540px (was unbounded down), .rr right column 590px,
     .rr-card 280px.

Files touched
=============
- dashboard-app/src/components: CallTable, Topbar, PatterLogo,
  LiveCallPanel, icons, format
- dashboard-app/src/hooks: useUiPrefs (new), useDashboardData
- dashboard-app/src/lib/api.ts, App.tsx, styles/dashboard.css
- libraries/python/getpatter/dashboard: store.py, routes.py, ui.html
- libraries/python/tests/test_dashboard.py
  (TestMetricsStoreDelete — 8 new tests)
- libraries/typescript/src/dashboard: store.ts, routes.ts, ui.html
- libraries/typescript/tests/unit/dashboard-store.test.ts
  (deleteCalls — 8 new tests covering hide / aggregates / range /
  active-skip / idempotent / SSE / persistence / empty input)
- CHANGELOG.md

Verification
============
- SPA build green (224 KB bundle, gzip 68 KB)
- Python: 1832 tests passing (8 new)
- TypeScript: 952 tests passing (8 new) + lint clean

* fix(dashboard): MetricsPanel Latency/Cost tabs render at the same height

Toggling MetricsPanel tabs between Latency and Cost caused a vertical
jump because the two layouts had different natural heights — Latency
(pipeline mode) renders 4 latency cards + a 3-row waterfall + legend
(~230 px), while Cost renders the cost bar + 4-6 stack rows (~180 px).
The card outer height shifted by ~50 px on every toggle.

Wrapped both tab views in a .metrics-panel-body container with
min-height: 240 px (sized to the tallest layout). Both tabs now
occupy exactly 321 px outer / 240 px body — switching is a pure
content swap with no layout reflow.

Verified via Chrome DOM audit: latencyHeight=321, costHeight=321,
diff=0.

Files:
  dashboard-app/src/components/MetricsPanel.tsx (body wrapper)
  dashboard-app/src/styles/dashboard.css (.metrics-panel-body rule)
  libraries/{python,typescript}/.../dashboard/ui.html (resynced bundle)

* feat(providers): add OpenAIRealtime2 engine for gpt-realtime-2 (GA Realtime API)

The 0.6.1 enum entry for `gpt-realtime-2` advertised it as drop-in with the
existing v1 Realtime adapter; OpenAI in fact promoted that model to the GA
Realtime API, which rejects the `OpenAI-Beta: realtime=v1` header, requires
a different `session.update` wire shape (`type: "realtime"`,
`output_modalities`, nested `audio.{input,output}` with MIME types), and
renamed the audio-delta event family (`response.audio.*` →
`response.output_audio.*`). Going through the v1 adapter with
`model: "gpt-realtime-2"` either timed out at connect() or produced a
"successful" call with zero audio bytes forwarded to the carrier.

New `OpenAIRealtime2` engine marker (kind: `openai_realtime_2`) + new
`OpenAIRealtime2Adapter` subclassing `OpenAIRealtimeAdapter`. The subclass
overrides only `connect()` (GA payload + no beta header) and
`sendFirstMessage()` (forces `output_modalities` shape, re-injects
`audio.output.voice` since GA `response.create` doesn't inherit it from
session, sets `reasoning.effort: "minimal"` to keep TTFB tight on the
literal "say exactly X" greeting). A WS-level `emit` shim renames the GA
audio-delta event types back to the v1 names so the parent dispatcher and
`StreamHandler` keep working unchanged.

The legacy `OpenAIRealtime` engine and `OpenAIRealtimeAdapter` continue to
serve `gpt-realtime`, `gpt-realtime-mini`, `gpt-4o-realtime-preview`,
`gpt-4o-mini-realtime-preview` against the v1-beta endpoint byte-for-byte
unchanged. Only visibility on a handful of fields/methods was promoted
from `private` to `protected` so the subclass can reuse heartbeat + message
dispatch; no public surface changed.

Verified end-to-end on a Twilio PSTN call: 13.6s / 3 turns / firstMessage
plays in the configured voice, language follows systemPrompt, audio flows
both directions.

Python parity is a follow-up — flagged in CHANGELOG; the daily
docs-feature-drift cron will surface the gap until Python lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(openai-realtime-2): bidirectional audio transcoding + outbound chunk splitting for Twilio

The 0.6.1 OpenAIRealtime2 engine connected and exchanged events but produced
silent calls over Twilio: the GA endpoint accepts `audio/pcmu` in
`session.update` but the audio engine silently drops mulaw frames
(`input_audio_buffer.commit` reports "0.00 ms of audio") and always emits
PCM-24 regardless of the declared output format. Until OpenAI ships native
g711 on the GA endpoint we transcode on both directions inside the subclass.

Inbound (Twilio → model): override `sendAudio` to decode mulaw, apply 2x gain
to lift telephony peaks into the GA VAD's expected band, then 3x linear
upsample to PCM-24 with a one-sample carry across chunk boundaries.
`session.audio.input.format` switched to `{ type: "audio/pcm", rate: 24000 }`.

Outbound (model → Twilio): wrap the audio-delta translation to resample
PCM-24 → PCM-8 via 24→16→8 chain (second step carries the 5-tap FIR
anti-alias filter that the direct 24→8 path lacks), encode to mulaw 8 kHz,
and split into 20 ms (160 B) slices emitted as separate audio events.
Twilio's media pipeline stalls when fed deltas of the GA's natural ~200-400 ms
granularity; 20 ms frames restore the expected playout cadence.

VAD tuning: lowered `server_vad` threshold to 0.1 (default 0.5) and raised
`silence_duration_ms` to 500 so 3x-upsampled telephony-band audio reliably
triggers `speech_started`.

Visibility bumps on `OpenAIRealtimeAdapter`: `ws`, `armHeartbeatAndListener`,
`options` promoted from `private` to `protected` so the subclass can install
the wire-level translation shim and reuse the parent's message dispatch
unchanged. No public surface changed; v1 adapter behaviour byte-for-byte
identical.

Known limitation: model output now plays audibly on the caller side, but
GA `server_vad` is still tuned for studio audio so the user-speech path
remains less reliable than pipeline mode. Pipeline mode (STT+LLM+TTS) is
the recommended production path for Twilio in 0.6.1 until OpenAI ships
native g711_ulaw GA.

Python parity is still a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): 0.6.1

Roll the Unreleased changelog block into the 0.6.1 (2026-05-15)
section. Version literals in sdk-py/__init__.py, sdk-py/pyproject.toml,
sdk-ts/package.json were already at 0.6.1 from prior commits — this
commit only normalises the changelog ahead of the release PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: pre-commit end-of-file-fixer on cartesia-stt.ts

Removes a trailing blank line so the Pre-commit CI hook is happy on the
0.6.1 release PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant