Skip to content

release: 0.6.2 — GA Realtime adapter + 14-bug fix wave + dashboard hardening#104

Merged
nicolotognoni merged 11 commits into
mainfrom
release/0.6.2
May 25, 2026
Merged

release: 0.6.2 — GA Realtime adapter + 14-bug fix wave + dashboard hardening#104
nicolotognoni merged 11 commits into
mainfrom
release/0.6.2

Conversation

@nicolotognoni
Copy link
Copy Markdown
Collaborator

Summary

  • New OpenAIRealtime2 engine + adapter (Python parity with TS) speaking the GA Realtime API (gpt-realtime-2), with bidirectional mulaw 8 kHz ↔ PCM 24 kHz transcoding for Twilio / Telnyx.
  • 14-bug fix wave validated live via PSTN against the inbound and outbound Realtime paths, covering prewarm + adoption hardening, dashboard persistence, inbound caller/callee TwiML, Whisper hallucination filter, deferred response.create, and 9 more (full list below).
  • New branded README banner (docs/github-banner.png).

Implementation

  • Bumps: libraries/python/getpatter/__init__.py, libraries/python/pyproject.toml, libraries/typescript/package.json all → 0.6.2.
  • ## Unreleased promoted to ## 0.6.2 (2026-05-25) in CHANGELOG.md.
  • New files: libraries/python/getpatter/engines/openai_realtime_2.py, libraries/python/getpatter/providers/openai_realtime_2.py, libraries/python/tests/unit/test_twilio_adapter_snake_case_kwargs.py.
  • 57 files touched across both SDKs + dashboard SPA + docs.

14 bug-fix wave (all parity Python↔TS via sdk-parity agent)

  1. Prewarm Realtime adoption (liveness check rewrite + app-level keepalive)
  2. FirstMessage stable (phantom barge-in gate via adapter state)
  3. cancel_response no-op when no item in flight (eliminates response_cancel_not_active spam)
  4. Dashboard record_call_end preserves live turns + falls back to active/existing transcript
  5. Persist default ON (~/Library/Application Support/patter on macOS, XDG dir on Linux, %LOCALAPPDATA% on Windows)
  6. Python hydrate() backfills transcript from transcript.jsonl (TS already did)
  7. Standalone dashboard sees call_initiated in real time via notify_dashboard + cli ingest routing
  8. direction persisted in metadata.json (fixes outbound rendering as inbound after hydrate)
  9. PATTER_LOG_REDACT_PHONE default flipped maskfull so the UI reveal toggle works
  10. SDK version auto-derived into aggregates.sdk_version (Python __version__, TS via package.json runtime read)
  11. OpenAI Realtime GA VAD threshold 0.1 → 0.5 (kills phantom barge-in loop on PSTN echo)
  12. Inbound caller/callee via <Parameter> children of <Stream> (Twilio strips query-string params)
  13. Whisper hallucination filter on Realtime transcript_input ("Thank you for watching.", "[music]", etc.)
  14. turn_detection.create_response: false + interrupt_response: false + new request_response() driven from stream-handler after the filter accepts the transcript

Breaking change?

Two opt-out flips with safe defaults:

  • persist=None now defaults to ON (was: off unless PATTER_LOG_DIR was set). Migration: pass persist=False to opt back into the old ephemeral-only behaviour.
  • PATTER_LOG_REDACT_PHONE default flipped maskfull. Migration: set PATTER_LOG_REDACT_PHONE=mask for setups that ship logs off-host.

Both flips are documented in CHANGELOG.md under the 0.6.2 entry.

No API removals or renames. Existing code continues to work without changes.

Test plan

  • Python: pytest tests/ (regression tests in test_metrics.py, test_prewarm.py, test_client_unit.py, test_stream_handler_unit.py, test_twilio_status_and_ring_timeout.py, test_twilio_adapter_snake_case_kwargs.py)
  • TypeScript: npm test + npm run lint + npm run build
  • sdk-parity agent run for every fix in the wave — verified parity for items 1-14
  • Live PSTN validation on Twilio (Italy carrier) — both inbound and outbound, ~12 calls across the day:
    • Prewarm adoption ms=0 confirmed
    • FirstMessage clean (no phantom interruption loop)
    • Dashboard hydrate + live update + transcript persistence across restart
    • Inbound: caller / callee populated, top-bar shows Patter number
    • Reveal toggle alternates +1659...•••4527

Docs updates

  • CHANGELOG.md 0.6.2 block (already populated during the fix wave).
  • docs/github-banner.png refreshed with new branded artwork.
  • Provider / engine docs under docs/python-sdk/ and docs/typescript-sdk/ already touched in earlier commits on the branch (engines, agents, carrier, providers/elevenlabs-tts, providers/silero-vad).

…to v4

The `bandit` job in `.github/workflows/audit.yml` was failing on the
`github/codeql-action/upload-sarif` step with `Resource not accessible
by integration`, leaving Bandit findings out of the GitHub Security tab.
Root cause: the job inherited the repo-default read-only GITHUB_TOKEN
permissions, but the SARIF upload requires `security-events: write`.

Add an explicit per-job `permissions:` block (`contents: read`,
`security-events: write`) and bump `codeql-action/upload-sarif` from
@V3 to @v4 to clear the December 2026 deprecation warning at the same
time.
…d on same turn

Pipeline `transcript.jsonl` rows after a barge-in carried an empty
`user_text` even when the user had clearly spoken. Root cause was a
race between the two turn-close paths:

  1. The VAD-driven barge-in fires `recordTurnInterrupted` synchronously
     inside the audio handler. `_resetTurnState` clears `_turnUserText`.
  2. The in-flight pipeline LLM stream keeps unwinding on its own task
     (we already abort it via `llmAbort`, but `processTranscript` only
     unwinds back to its top-level `recordTurnComplete` call after the
     `for await` loop exits).
  3. That late `recordTurnComplete` pushed a SECOND turn for the same
     logical exchange — `agent_text=<partial cancelled text>`,
     `user_text=""`. The first interrupted turn was emitted to the event
     bus (correctly) but only `recordTurnComplete` is forwarded to the
     `transcript.jsonl` writer, so the operator-facing JSON showed the
     phantom row.

Fix: both SDKs gain a `_turnAlreadyClosed` / `_turn_already_closed`
guard flipped inside `recordTurnInterrupted` (after the existing
`_resetTurnState`). `recordTurnComplete` now returns `null` / `None`
when the flag is set, until the next `startTurn` / `start_turn`
re-arms the accumulator. `emitTurnMetrics` / `_emit_turn_metrics`
were already null-safe, so the late call becomes a silent no-op
end-to-end.

Regression tests pin the bargein → llmAbort → late-complete ordering
and the start_turn re-arm path in both libraries.

See `patter-sdk-acceptance/BUGS.md` (2026-05-05 entry — was tagged
NEEDS RE-VERIFICATION; the code-path analysis stands and the fix is
additive + behind a flag that defaults to the existing behaviour for
turns that never barge-in, so it is safe to land ahead of a fresh
matrix run).
Every pipeline acceptance run hit a 1.5-2.5 s p95 on the first turn
because the TTS first-byte latency (200-700 ms cold) was serialised with
the carrier's media-start event. Pre-rendering the greeting during the
ringing window and streaming the cached buffer at pickup collapses that
to a single Buffer.copy / bytes write — first-turn p95 returns to the
same band as subsequent turns.

Trade-off: paying the TTS bill on calls that ring and never answer
(~$0.001-$0.005 each depending on TTS provider). Opt out with
``prewarmFirstMessage: false`` (TS) / ``prewarm_first_message=False``
(Py) for very high-volume outbound where un-answered TTS spend matters.

Changes:
- libraries/python/getpatter/models.py — Agent dataclass field default
  flipped from False to True; docstring updated.
- libraries/typescript/src/client.ts — Patter.agent() factory now
  defaults the field to true when provider === 'pipeline' and the
  caller didn't pass an explicit value. Realtime / ConvAI modes
  unchanged (those handlers never consume the prewarm cache).
- libraries/typescript/src/types.ts — docstring updated.

Tests added in both SDKs covering the new default and the opt-out path.
…low per-chunk sleep

sendPacedFirstMessageBytes (TS) / _send_paced_first_message_bytes (Py)
were pacing each prewarm chunk with setTimeout / asyncio.sleep of one
chunk-equivalent of playout time (~40 ms for the 1280-byte chunk).
Combined with the waitForMarkWindow back-pressure and JavaScript /
asyncio timer jitter, effective delivery dropped BELOW Twilio's 8 kHz
playout clock on the typical 2-4 s prewarmed greeting, producing
repeated carrier-side underruns. The caller heard the firstMessage as
slow, gravelly, intermittent — even though `p95 wait` reported 0 ms
(the prewarm cache hit was correct; it was the downstream pacing that
was broken).

Twilio's Media Streams docs (websocket-messages) explicitly state media
messages "of any size" are "buffered and played in the order received"
by the carrier-side media server — the carrier is the source of truth
for the 8 kHz playout clock, not our send loop. The live-TTS streaming
path (synthesizeSentence + first-message live fallback) has always
bursted chunks back-to-back without any sleep, and has always worked.

Bring the prewarm path in line with the live path: drop the per-chunk
sleep + the burst-vs-paced switch (initialFillComplete). Per-chunk
marks are still emitted so a barge-in's sendClear keeps fine-grained
granularity to cut, and the existing PREWARM_CHUNK_BYTES (1280 B ≈
40 ms @ 16 kHz PCM16) bounds the worst-case mid-flush amount that
sendClear has to drop.

Cleaned up the now-unused PCM16_16K_BYTES_PER_MS constant in both SDKs.
Surfaced by a parallel review pass (code-reviewer + sdk-parity + docs-sync
+ code-simplifier agents on the release/0.6.2 diff).

1. **Parity fix — Python ``_reset_turn_state`` now clears
   ``_turn_committed_mono``** (libraries/python/getpatter/services/metrics.py).
   TS ``_resetTurnState`` already clears the equivalent field on every
   turn close (metrics.ts), but Python only cleared it inside
   ``start_turn`` and ``record_turn_interrupted``. After a cleanly
   completed turn the field remained set until the next ``start_turn``;
   ``anchor_user_speech_start`` (which guards on
   ``self._turn_committed_mono is not None``) would falsely no-op on a
   VAD ``speech_start`` arriving between ``record_turn_complete`` and
   the next ``start_turn`` on back-to-back turns. Single-line additive
   fix; no behaviour change on already-aligned flows.

2. **Docs — document the prewarm_first_message / prewarmFirstMessage
   default flip** (docs/python-sdk/agents.mdx,
   docs/typescript-sdk/agents.mdx). Added a row in each AgentOptions
   table calling out the 0.6.2 default change and the opt-out
   (``prewarm_first_message=False`` / ``prewarmFirstMessage: false``),
   plus a "Pre-warming the first message" narrative section in the
   Python page explaining the latency vs un-answered-TTS trade-off.

3. **Inventory — appended ``Prewarm first message`` row to
   ``patter-assets/patter_sdk_features.xlsx``** (out-of-tree, not in
   this commit) so the daily docs-drift cron sees the feature as
   covered in both SDKs.

Skipped findings (with rationale):
- "Python prewarm default is unconditional vs TS provider-gated" — flagged
  HIGH by code-reviewer, classified PASS-WITH-NOTE by sdk-parity. The
  Python ``_spawn_prewarm_first_message`` WARN guard makes end-to-end
  behaviour identical, and moving the default off the dataclass would
  break users who instantiate ``Agent(...)`` directly. Documented
  divergence accepted.
- "TS ``sendMarkAwaitable()`` not awaited in prewarm loop" — Python
  ``await`` is needed because ``send_mark`` is async; TS ``ws.send`` is
  sync void. The returned ``Promise<void> | null`` is the ACK future
  consumed later by ``onMark`` / ``drainPendingMarks``, NOT a write
  await. Behaviourally equivalent.
- "Missing Py test for non-pipeline prewarm default" — coverage already
  exists at libraries/python/tests/test_prewarm.py:702 and :725
  (``test_prewarm_skipped_for_realtime_provider`` and
  ``..._for_convai_provider``).
- "recordTurnInterrupted not guarded against late recordTurnComplete"
  inverse race — MEDIUM latent trap but no current caller can trigger
  it (the existing ``interrupted`` flag in ``processTranscript`` guards
  the recordTurnComplete call site). Tracked for a future hardening
  pass.

Tests: Python 58/58 (test_metrics.py + test_prewarm.py), TypeScript
lint clean.
Two follow-ups on the parallel quality-gate review the user requested
should NOT be skipped:

1. HIGH (code-reviewer): Python prewarm_first_message default flipped
   the dataclass to True unconditionally, while TypeScript only applied
   True at the Patter.agent() factory when provider==='pipeline'. End-
   to-end behaviour matched (Python WARN guard suppressed prewarm on
   non-pipeline modes), but users inspecting agent.prewarm_first_message
   programmatically saw the asymmetry — Realtime/ConvAI Python agents
   advertised True even though the cache was never consumed.

   Fix:
   - libraries/python/getpatter/models.py — dataclass default restored
     to False (back-compat for direct Agent(...) construction).
   - libraries/python/getpatter/client.py — Patter.agent() factory now
     accepts an explicit prewarm_first_message kwarg, and when omitted
     applies True iff provider == 'pipeline'. Exact mirror of the
     TypeScript factory in client.ts.
   - libraries/python/tests/test_prewarm.py — three new tests pin the
     factory-level behaviour (pipeline → True, realtime → False,
     explicit kwarg always wins). The existing
     test_default_prewarm_flag_is_true was updated to assert the new
     dataclass default (False).

2. MEDIUM (code-reviewer): the race guard added in commit 7bc143a was
   one-directional — late recordTurnComplete after
   recordTurnInterrupted was correctly dropped, but the inverse ordering
   (late interrupt after a completed turn) could still overwrite an
   emitted turn. No current caller path produces that ordering, but a
   future refactor reordering bargein vs LLM-unwind could.

   Fix: both recordTurnComplete (libraries/python/getpatter/services/
   metrics.py + libraries/typescript/src/metrics.ts) and
   recordTurnInterrupted now read AND write _turn_already_closed /
   _turnAlreadyClosed — bidirectional symmetry. Regression tests added
   in libraries/python/tests/test_metrics.py and libraries/typescript/
   tests/metrics.test.ts.

CHANGELOG entry on the prewarm default flip consolidated into a single
'Changed' section reflecting the final factory-level wiring.

Tests: Python 62/62 (test_metrics.py + test_prewarm.py), TypeScript
37/37 (metrics.test.ts + prewarm.test.ts), lint clean.
Bundled fix wave landed after live PSTN validation across outbound and
inbound flows on the OpenAI Realtime GA path (gpt-realtime-2). Every
behaviour change is mirrored in both SDKs and verified via sdk-parity.

OpenAI Realtime GA (gpt-realtime-2)
- New ``OpenAIRealtime2`` engine + ``OpenAIRealtime2Adapter`` (Python).
  Speaks the GA ``session.update`` shape (``session.type = "realtime"``,
  ``output_modalities``, nested ``audio.{input,output}``) and bidirectionally
  transcodes mulaw 8 kHz ↔ PCM 24 kHz because the GA audio engine silently
  drops mulaw even though the protocol accepts ``audio/pcmu``.
- ``turn_detection.threshold`` raised 0.1 → 0.5 to stop the runaway loop
  where carrier-loopback echo of the agent's own audio kept tripping the
  server VAD and auto-creating new responses.
- ``turn_detection.create_response: false`` + ``interrupt_response: false``.
  Patter now drives ``response.create`` explicitly via ``request_response()``
  after the hallucination filter accepts the user transcript, so a
  Whisper-on-silence hallucination ("Thank you for watching.", "[music]")
  no longer materialises as a phantom assistant turn.
- ``_STT_HALLUCINATIONS`` extended with the 15 most common Whisper
  YouTube-caption fallback phrases; the filter is now applied to the
  Realtime ``transcript_input`` event before LLM commit.

Prewarm + adoption
- Liveness check rewritten to handle current ``websockets`` lib (``state``
  enum + ``close_code`` checks; legacy ``closed`` fallback). The previous
  ``getattr(ws, "closed", True)`` defaulted to "dead" on the new client and
  silently aborted every adoption.
- Application-level keepalive on the parked GA Realtime WS (``session.update``
  every 3 s + WS PING every 4 s) — empirically OpenAI's GA edge closes idle
  sockets within ~6-7 s, so a single PING was never reaching the wire before
  pickup on cellular ringing windows.
- ``adopt_websocket`` cancels the parked keepalive task before the live
  adapter starts so the heartbeat doesn't race ``input_audio_buffer.append``.

Barge-in
- Realtime ``speech_started`` now consults ``_current_response_first_audio_at``
  on the adapter (proxy for "agent is mid-turn") and applies the same
  anti-flicker gate the pipeline mode uses. Without this the firstMessage
  was repeatedly truncated by the loopback echo VAD.
- ``cancel_response`` is now a no-op when no item is in flight — eliminates
  the ``response_cancel_not_active`` ERROR spam every phantom VAD trigger
  emitted.

Dashboard + persistence
- Persistence default flipped from opt-in to ON. ``persist=None`` now
  resolves to the platform user-data dir (``~/Library/Application
  Support/patter`` on macOS / XDG data dir on Linux / ``%LOCALAPPDATA%``
  on Windows). Set ``persist=False`` to opt out.
- ``log_call_start`` / ``logCallStart`` now persists ``direction`` in
  ``metadata.json``. Hydrated outbound calls were rendering as inbound
  (default fallback) and ``pickPhoneNumber`` ended up returning the callee
  (caller's personal number) instead of the Patter number in the topbar.
- ``PATTER_LOG_REDACT_PHONE`` default flipped ``mask`` → ``full``. The UI
  reveal toggle has no source data when the on-disk record is already
  masked, so storing raw is required for the toggle to actually do
  something. ``~/Library/Application Support/patter`` is user-private.
- ``record_call_end`` / ``recordCallEnd`` now preserves the live ``turns``
  array and falls back to active/existing transcript when the SDK's
  end-of-call snapshot is empty. Mirrors TS fix that was never ported.
- Python ``hydrate()`` now backfills the flat ``transcript`` from the
  sibling ``transcript.jsonl`` when ``metadata.json`` has no transcript
  array (the JSONL is the authoritative per-turn record). Parity with TS
  ``loadTranscriptJsonl``.
- New ``aggregates.sdk_version`` field surfaces the runtime package
  version (Python ``getpatter.__version__`` / TS ``package.json`` read at
  runtime via the dist-relative path). Dashboard SPA reads it from the
  aggregates payload instead of an inline constant.
- Standalone dashboard (``patter dashboard``) now sees outbound dials in
  real time: ``client`` fires ``notify_dashboard`` with ``status="initiated"``
  alongside ``record_call_initiated``, and the standalone ``cli`` ingest
  handler routes that status to ``record_call_initiated`` instead of
  treating every payload as ``record_call_start``.

Inbound carrier metadata
- Twilio Media Streams strips the query string from ``<Stream url=...>``
  before opening the WS, so the inbound bridge has been reading empty
  caller / callee since forever. ``generate_stream_twiml`` now accepts
  optional ``parameters`` and emits ``<Parameter name=... value=.../>``
  children of ``<Stream>``; ``twilio_stream_bridge`` falls back to
  ``start.customParameters`` when WS query params are empty.

Test scripts
- Acceptance scripts under ``releases/0.6.1/python`` (in the personal
  acceptance repo, not in this repo) drove the live validation. ``inbound.py``
  and ``outbound_amd_ringtimeout.py`` updated locally to surface ``INFO``
  logging for the new prewarm / hallucination diagnostics.
- Bump Python (``__init__.py`` + ``pyproject.toml``) and TypeScript
  (``package.json``) to ``0.6.2``.
- Promote ``## Unreleased`` to ``## 0.6.2 (2026-05-25)`` in CHANGELOG.
- Refresh ``docs/github-banner.png`` with the new branded artwork
  (Agent / Patter stack) used by the README and the GitHub social
  preview.

Bundles the 14-bug fix wave validated live in 0fc4615 (GA Realtime
adapter, prewarm + adoption hardening, dashboard persistence, inbound
caller/callee via ``<Parameter>``, Whisper hallucination filter,
deferred ``response.create`` + ``request_response()``, version
auto-derive, persist default ON, phone-redact default ``full``,
direction in ``metadata.json``, ring-buffer turns preservation, JSONL
transcript backfill, standalone-dashboard ``call_initiated`` relay).
@mintlify
Copy link
Copy Markdown

mintlify Bot commented May 25, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
patter-06b046ce 🟢 Ready View Preview May 25, 2026, 8:51 AM

💡 Tip: Enable Workflows to automatically generate PRs for you.

Updates the Python + TypeScript regression suites to reflect the public
behaviour changes that landed in 0fc4615 / f16bda0. No source-code
behaviour changes — only test scaffolding + one defensive ``getattr``
on ``_adapter_cls.__name__`` so the debug logger doesn't trip the
``MagicMock`` patch surface used by ``test_local_mode`` /
``test_validation_guardrails``.

Python (8 fixes)
- ``test_twilio_handler.test_stream_url_contains_caller_param`` /
  ``...callee_param`` / ``test_local_mode.test_twilio_webhook_handler_url``
  now assert caller/callee travel as TwiML ``<Parameter>`` children of
  ``<Stream>`` (the ``parameters`` kwarg on ``generate_stream_twiml``)
  instead of query-string params on the WS URL — Twilio strips the
  latter before the WebSocket handshake.
- ``test_providers_io_unit.test_cancel_response_sends_cancel`` now seeds
  ``_current_response_item_id`` before calling ``cancel_response`` since
  the method is a documented no-op when no item is in flight. Added
  ``test_cancel_response_noop_when_no_item_in_flight`` to pin that
  contract.
- ``test_local_mode.test_mark_events_sent_after_audio`` /
  ``test_validation_guardrails.test_guardrail_triggers_cancel_and_replacement``
  /
  ``test_providers_unit.test_realtime_engine_forwards_reasoning_and_transcription_to_adapter``
  now patch ``OpenAIRealtime2Adapter`` (the GA adapter) instead of the
  v1-beta class — both ``openai_realtime`` and ``openai_realtime_2``
  engines route through the GA adapter after the upstream Beta API
  deprecation.
- ``stream_handler`` debug log now wraps ``_adapter_cls.__name__`` in
  ``getattr`` so the three tests above (which patch the adapter class
  with a ``MagicMock``) don't crash on the missing dunder.

TypeScript (4 fixes + 3 skips)
- ``openai-realtime.cancelResponse()`` test seeds
  ``currentResponseItemId`` and added a ``no-op when no item in flight``
  test, mirroring Python.
- ``stream-handler`` barge-in gate tests aligned with the AEC-off gate
  raised from 100 ms to 500 ms on 2026-05-19. ``canBargeIn`` / handleBargeIn
  inputs bumped 50/200/400 ms → 250/700/600 ms accordingly.
- ``prewarm.test`` no longer asserts ``prewarmFirstMessage === true`` by
  default in pipeline mode — the 2026-05-18 default-on attempt was
  reverted on 2026-05-19 (phantom barge-in interaction). Test now pins
  the opt-in semantics described in ``client.ts:536-547``.
- Three ``describe`` blocks marked ``describe.skip``:
  ``firstMessage mark-gated pacing``, ``cleanup drains pending
  firstMessage marks``, ``firstMessage mark counter resets across sends
  + on cleanup`` — the mark-window pacing plumbing they exercised was
  replaced with burst-deliver in commit 5574997
  (``sendPacedFirstMessageBytes`` / ``firstMessageMarkCounter`` /
  ``sendMarkAwaitable`` no longer exist). Kept as ``skip`` rather than
  deleted to preserve the historical intent.
…me engine

Audit ran via 3 parallel agents (Python SDK accuracy, TypeScript SDK
accuracy, navigation + cross-links) cross-referencing every public
identifier, default, and behaviour described in ``docs/`` against the
0.6.2 source. 31 pages updated; 2 brand-new provider pages created so
``OpenAIRealtime2`` ships with first-class documentation in both SDKs.

New pages
- ``docs/python-sdk/providers/openai-realtime-2.mdx``
- ``docs/typescript-sdk/providers/openai-realtime-2.mdx``
  Cover ``OpenAIRealtime2Adapter`` / ``OpenAIRealtime2Provider``: GA
  session-config (``session.type = "realtime"``, nested
  ``audio.{input,output}``, ``create_response: false`` /
  ``interrupt_response: false``), bidirectional mulaw 8 kHz ↔ PCM 24 kHz
  transcoding rationale, voice list, reasoning-effort tiers, and the
  direct-adapter constructor (positional, not options-object).
- ``docs/docs.json`` adds both pages to the Engines group in their
  respective SDKs.

Engines + providers
- ``OpenAIRealtime`` default model is now ``gpt-realtime-mini`` (was
  documented as ``gpt-4o-mini-realtime-preview``). Voice enum widened
  to include ``ash``/``ballad``/``coral``/``sage``/``verse``.
- Added ``reasoning_effort`` and ``input_audio_transcription_model``
  rows to the ``OpenAIRealtime`` constructor table.
- ``ElevenLabsTTS`` default voiceId fixed: ``EXAVITQu4vr4xnSDxMaL``
  (Sarah) → ``21m00Tcm4TlvDq8ikWAM`` (Rachel) — matches source default
  in both SDKs.
- Provider pages add the GA Realtime VAD threshold note (0.5, not 0.1)
  and the Whisper hallucination filter behaviour.

Persistence + dashboard
- ``persist`` default is now ON in both SDKs (was documented as opt-in).
  Flipped narrative + tables + env var notes across ``persist.mdx``,
  ``call-logging.mdx``, ``configuration.mdx``, ``quickstart.mdx``,
  ``reference.mdx``.
- ``PATTER_LOG_REDACT_PHONE`` default ``mask`` → ``full`` across
  ``configuration.mdx``.
- Added ``direction`` and ``aggregates.sdk_version`` fields to the
  dashboard / call-log schema docs.

Inbound carrier metadata
- ``local-mode.mdx`` + ``carrier.mdx`` (both SDKs) now correctly
  describe Twilio inbound caller/callee as travelling via TwiML
  ``<Parameter>`` (Twilio strips URL query params before the WS
  handshake). Telnyx still uses query string — distinction documented.

Call surface
- ``Patter.call()`` parameter signature updated to snake_case
  (``machine_detection=True``, ``ring_timeout=25``) — fixes the
  pre-0.6.2 PascalCase crash documentation.
- AMD narrative flipped to "default on" in ``features.mdx``.
- ``phone.serve()`` examples in TS docs fixed:
  ``phone.serve(agent)`` → ``phone.serve({ agent })`` (5 pages).

Known follow-ups out of scope for this docs audit
- Several TS docs ``import { OpenAIRealtimeModel, ... } from "getpatter"``
  but the const enums live in provider files and are NOT re-exported
  from ``src/index.ts``. Examples won't compile until the re-exports
  are added — flagged for a separate SDK-code commit.
- TS engine wrapper still defaults ``model`` to
  ``"gpt-4o-mini-realtime-preview"`` (Python moved to
  ``"gpt-realtime-mini"`` per CHANGELOG 0.6.2). Docs now describe the
  TS-side reality; parity bump is a separate SDK commit.

Inventory rows for 0.6.2 features appended to
``patter-assets/patter_sdk_features.xlsx`` (status=shipped, sdk=both):
``openai_realtime2_engine``, ``realtime_request_response_api``,
``realtime_whisper_hallucination_filter``, ``persist_default_on``,
``log_redact_phone_default_full``, ``call_metadata_direction_field``,
``aggregates_sdk_version_field``, ``dashboard_call_initiated_relay``,
``twilio_inbound_caller_callee_parameter``.
Two follow-ups surfaced by the 0.6.2 docs-accuracy audit; needed for the
docs examples to compile as written and to close the last Python↔TS
parity gap.

src/index.ts re-exports
- ``OpenAIRealtimeAudioFormat``, ``OpenAIRealtimeModel``,
  ``OpenAIRealtimeVADType``, ``OpenAITranscriptionModel``,
  ``OpenAIVoice`` from ``./providers/openai-realtime``
- ``ElevenLabsModel``, ``ElevenLabsOutputFormat`` from
  ``./providers/elevenlabs-tts``
- ``DeepgramModel`` from ``./providers/deepgram-stt``
- ``CartesiaTTSModel``, ``CartesiaTTSVoiceMode`` from
  ``./providers/cartesia-tts``
- ``RimeModel``, ``RimeAudioFormat`` from ``./providers/rime-tts``
- ``PricingUnit``, ``PRICING_VERSION``, ``PRICING_LAST_UPDATED`` +
  ``PricingUnitValue`` / ``ModelPricing`` types from ``./pricing``
  (``ProviderPricing`` was already exported earlier in the file)

Pre-fix the docs (``tts.mdx``, ``stt.mdx``, ``metrics.mdx``,
``providers/openai-realtime.mdx``, ``providers/elevenlabs-tts.mdx``)
showed ``import { OpenAIRealtimeModel, ... } from "getpatter"`` —
those examples now actually compile.

Engine default model
- ``engines/openai.ts`` ``Realtime.model`` default flipped from
  ``"gpt-4o-mini-realtime-preview"`` to ``"gpt-realtime-mini"`` for
  parity with the Python SDK (which moved on 2026-05). The legacy
  preview model still works when passed explicitly; the GA wave
  recommends ``gpt-realtime-mini`` (or ``gpt-realtime-2`` via the
  ``OpenAIRealtime2`` engine for the flagship). Docstring updated to
  reflect the bump rationale.

Lint clean, 1513 tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant