refactor: repo cleanup, internal restructure, parity ports + bug-fix wave #83
Repo-wide pass to remove external license headers, "ported from" notes
and competitor product names from source files, plus three runtime fixes
and one missing best-practice feature surfaced by the audit.
## Cleanup (zero residual livekit/pipecat/apache references)
- Removed Apache 2.0 header blocks from 12 Python + 12 TypeScript provider
files (the headers travelled in from external ports; Patter ships under
the root MIT LICENSE only — no per-file copyright notices).
- Stripped "Adapted from livekit-plugins-X" / "Ported from pipecat" /
"Based on LiveKit Agents" provenance comments across ~40 source files
in sdk-py/getpatter/{services,providers,observability,resources,evals}/
and sdk-ts/src/{services,providers,observability}/, including the
cartesia-stt USER_AGENT integration tag.
- Rewrote competitor framing in 12 docs MDX pages (provider docs,
  patter-tool, call-logging); descriptions now present Patter on its own
  terms, with no migration-from-X language.
- Renamed test fixtures and variables that named LiveKit/Pipecat in
sentence_chunker tests (Py + TS) and the parity scenario JSON;
test logic preserved.
- Removed personal-name copyright in LICENSE / sdk-py/LICENSE /
sdk-ts/LICENSE in favour of "Patter Contributors".
- .gitignore: ignore .ruff_cache/, sdk/ (legacy build dir from the
pre-rename Python SDK), .agents/, skills-lock.json.
## Bug fixes
- llm_loop.py:420-421 (Python): cache_read_input_tokens /
cache_creation_input_tokens were Anthropic-style keys, but every
Python provider emits cache_read_tokens / cache_write_tokens. Fix
reads the keys the providers actually emit, so OpenAI / Google
cache attribution is no longer silently zeroed.
- llm-loop.ts:304-308 (TS): non-OK upstream HTTP responses were logged
and silently swallowed; callers couldn't distinguish empty model
output from API failure. Now throws PatterConnectionError with the
status + truncated body.
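A minimal sketch of the corrected cache-key lookup, assuming a plain-dict usage payload. The helper name `extract_cache_usage` is hypothetical; only the key names come from the description above.

```python
from typing import Any, Mapping, Tuple


def extract_cache_usage(usage: Mapping[str, Any]) -> Tuple[int, int]:
    """Return (cache_read, cache_write) token counts from a provider usage dict.

    Hypothetical helper. It reads the keys every Python provider emits
    (cache_read_tokens / cache_write_tokens). The pre-fix lookup used
    cache_read_input_tokens / cache_creation_input_tokens, which no Python
    provider emits, so the value silently fell back to 0.
    """
    cache_read = int(usage.get("cache_read_tokens") or 0)
    cache_write = int(usage.get("cache_write_tokens") or 0)
    return cache_read, cache_write


if __name__ == "__main__":
    # Illustrative payload, not real provider output.
    usage = {"input_tokens": 1200, "output_tokens": 80, "cache_read_tokens": 1024}
    print(extract_cache_usage(usage))  # (1024, 0)
```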
## Performance
- text_transforms.py: precompiled the 14 markdown regex patterns and 2
emoji-cleanup helpers as module-level constants — they previously
recompiled on every sentence flush. Drop-in win, public API and
37/37 existing tests unchanged.
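The precompilation pattern looks roughly like this. The four patterns and the `strip_markdown` name are illustrative stand-ins, not the actual 14 patterns in text_transforms.py.

```python
import re

# Compiled once at import time instead of on every sentence flush.
# Illustrative subset; the real module defines its own patterns.
_HEADING_RE = re.compile(r"^\s*#{1,6}\s+", re.MULTILINE)
_BULLET_RE = re.compile(r"^\s*[-*+]\s+", re.MULTILINE)
_BOLD_RE = re.compile(r"\*\*(.+?)\*\*")
_INLINE_CODE_RE = re.compile(r"`([^`]+)`")


def strip_markdown(text: str) -> str:
    """Remove a few common markdown constructs so TTS reads plain speech."""
    text = _HEADING_RE.sub("", text)
    text = _BULLET_RE.sub("", text)
    text = _BOLD_RE.sub(r"\1", text)
    return _INLINE_CODE_RE.sub(r"\1", text)
```

The public function signature stays the same; only where the compiled objects live changes.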
## Feature: default phone-call preamble
- New Agent.disable_phone_preamble (Py) / disablePhonePreamble (TS)
field, default False. When False, LLMLoop prepends a short
spoken-language preamble to system_prompt instructing the model to
avoid markdown / emojis / bullet lists and keep replies concise.
- Wired through stream_handler and test_mode in both SDKs.
- Adds two Py tests and one TS test covering the new behaviour.
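A sketch of the preamble logic, assuming a trimmed-down `Agent` with only the two relevant fields. The preamble wording and the `build_system_prompt` helper are hypothetical; the actual text lives in LLMLoop.

```python
from dataclasses import dataclass

# Hypothetical wording; the SDK's actual preamble text may differ.
PHONE_PREAMBLE = (
    "You are speaking on a phone call. Reply in plain spoken language: "
    "no markdown, no emojis, no bullet lists. Keep replies concise."
)


@dataclass
class Agent:
    """Stand-in for the SDK's Agent, reduced to the fields this sketch needs."""

    system_prompt: str
    disable_phone_preamble: bool = False


def build_system_prompt(agent: Agent) -> str:
    """Prepend the phone preamble unless the agent opts out."""
    if agent.disable_phone_preamble:
        return agent.system_prompt
    return f"{PHONE_PREAMBLE}\n\n{agent.system_prompt}"
```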
## Test status
- Python: 1466 passed, 8 skipped
- TypeScript: 1164/1164 passed
- sdk-py/.env.example, sdk-ts/.env.example: full inventory of every env var the SDK reads at runtime, grouped by role (telephony, LLM providers, STT, TTS, tracing, Patter tunables). Only OPENAI_API_KEY plus one telephony carrier is required; every other variable ships commented out, to be uncommented as the user enables the matching provider integration.
- .env.example.cloud removed: its variables (PATTER_DATABASE_URL, PATTER_ENCRYPTION_KEY, PATTER_REDIS_URL, etc.) belonged to the hosted cloud surface that was retired in 0.5.3.
- Root .env.example kept as a minimal quickstart sample.
Replace the magic strings ``"minute"`` / ``"1k_chars"`` / ``"token"`` sprinkled across DEFAULT_PRICING with a named enum, so the pricing table reads as a typed shape rather than free-form dicts.
- Python: ``PricingUnit(StrEnum)`` with ``MINUTE``, ``THOUSAND_CHARS``, ``TOKEN``. Subclassing ``str`` keeps the dict JSON-serialisable and unchanged for any consumer that compares against the literal string.
- TypeScript: ``PricingUnit`` const object + ``PricingUnitValue`` union type. ``ProviderPricing.unit`` accepts ``PricingUnitValue | string`` so user overrides loaded from JSON / env config still flow through ``mergePricing`` without type gymnastics.
- Behaviour preserved end-to-end: 143 Python pricing/metrics tests pass, 18 TypeScript pricing tests pass, full suites 1466 Py / 1164 TS green.
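A sketch of why the `str` subclass keeps the table backward-compatible. The `DEFAULT_PRICING` entries below are placeholders (hypothetical keys and prices), and the fallback class only exists so the snippet also runs on Python < 3.11.

```python
import json

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # fallback for older interpreters
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal stand-in for enum.StrEnum."""


class PricingUnit(StrEnum):
    MINUTE = "minute"
    THOUSAND_CHARS = "1k_chars"
    TOKEN = "token"


# Placeholder entries; the real DEFAULT_PRICING keys and prices are not shown here.
DEFAULT_PRICING = {
    "example_stt": {"unit": PricingUnit.MINUTE, "price": 0.01},
    "example_tts": {"unit": PricingUnit.THOUSAND_CHARS, "price": 0.01},
}
```

Because each member is a `str`, `entry["unit"] == "minute"` still holds and `json.dumps` emits the plain string value.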
…xamples
mcp-use-style monorepo layout: each SDK gets its own library folder with
README, CLAUDE.md, .env.example, tests/, and the package source. Sample
code is maintained in separate example repos and is no longer tracked
here (the notebook tutorial is preserved because it is documentation, not
an example).
## Layout
```
libraries/
├── python/ (was sdk-py/)
│ ├── README.md, CLAUDE.md, LICENSE, .env.example
│ ├── pyproject.toml, pytest.ini
│ ├── getpatter/
│ └── tests/
└── typescript/ (was sdk-ts/)
├── README.md, CLAUDE.md, LICENSE, .env.example
├── package.json, tsconfig.json, vitest.config.ts, tsup.config.ts
├── src/
└── tests/
```
## What changed
- 405 ``git mv`` renames so history follows every file. ``sdk-py/`` and
``sdk-ts/`` no longer exist on disk.
- Per-library CLAUDE.md guides (~40 lines each); .gitignore exception
``!libraries/*/CLAUDE.md`` so the library guides ARE tracked while the
root guide stays ignored.
- CI: ``.github/workflows/{audit,release,test,docs-feature-drift}.yml``
rewritten to the new paths. ``scripts/check_feature_docs_drift.py``
also fixed (it had a stale ``patter/__init__.py`` from the pre-rename
era).
- Pre-commit, pre-push, ``scripts/pr-validate.sh``, top-level README and
CONTRIBUTING.md re-pointed at ``libraries/{python,typescript}``.
- Internal package re-organisation (``handlers → telephony``, splitting
``audio/``, ``tools/``) deliberately deferred to a follow-up PR — that
layer of import-path churn doesn't belong in the same commit as the
outer rename.
## Examples
``examples/{developer,enterprise,startup,integrations}/`` removed (24
files + the index README). Sample code is published in dedicated repos.
``examples/notebooks/`` kept — it's the 24-notebook tutorial series
documented in the Mintlify site and depended on by
``.github/workflows/notebooks.yml`` and ``.pre-commit-config.yaml``.
PatterTool docs now point at the external example repo (a TODO comment
marks where the canonical URL goes, to be filled in once the examples repo
is public).
## Test status
- Python: 1413 passed, 6 skipped (pytest libraries/python/tests)
- TypeScript: 1164 passed, 67 files (vitest run libraries/typescript)
- TypeScript: ``tsc --noEmit`` clean (one pre-existing
``@ts-expect-error`` in silero-vad — predates this branch)
Wave 2 of the cleanup pass, covering half of the provider integrations. Replaces hardcoded model/voice/format/sample-rate strings with typed enums (Python ``StrEnum`` / ``IntEnum``, TypeScript ``const`` objects + union types) so user code gets autocomplete and the type system catches typos at the call site instead of at the provider's HTTP 400.
## Agent / public types
- ``Agent.provider`` (Python) tightened from ``str`` to a ``ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]`` alias. The TS counterpart was already a string union.
- Expanded ``Agent`` (Py) and ``AgentOptions`` (TS) docstrings to document the precedence rule for fields that appear both on the agent and on the engine adapter (``voice``, ``model``, ``language``): an explicit kwarg on ``agent()`` wins; otherwise the engine value populates the agent via ``_unpack_engine``; otherwise the default applies.
- No behaviour change. ``StrEnum`` subclasses ``str``, so existing callers passing raw strings keep working.
## Providers covered
Python: ``anthropic_llm``, ``cartesia_tts``, ``cerebras_llm``, ``deepgram_stt``, ``elevenlabs_tts``, ``google_llm``, ``groq_llm``, ``lmnt_tts``, ``openai_realtime``, ``rime_tts``.
TypeScript: ``anthropic-llm``, ``cerebras-llm``, ``deepgram-stt``, ``elevenlabs-tts``, ``google-llm``, ``groq-llm``, ``lmnt-tts``, ``openai-realtime``, ``rime-tts``.
Each module now exposes its own ``<Provider>Model`` / ``<Provider>OutputFormat`` / ``<Provider>Voice`` / etc. New enums are re-exported from ``__init__.py`` and ``index.ts`` in dedicated "provider-specific enums" sections.
## Still pending
The following providers still hold magic strings and are covered in a follow-up commit: ``assemblyai_stt``, ``soniox_stt``, ``speechmatics_stt``, ``cartesia_stt``, ``telnyx_stt``, ``whisper_stt``, ``elevenlabs_ws_tts``, ``openai_tts``, ``telnyx_tts``, ``gemini_live``, ``ultravox_realtime``, ``silero_vad``, ``silero_onnx``, ``krisp_*``. The TS ``cartesia-tts.ts`` enums also still need to land (Py is already covered).
## Test status
- Python: 1466 passed, 8 skipped
- TypeScript: 1164/1164 passed; ``tsc --noEmit`` clean (one pre-existing silero-vad warning unchanged)
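The `ProviderMode` alias and the documented precedence rule can be sketched as follows. `resolve_field` is a hypothetical helper for illustration; per the description, the SDK applies the rule across `agent()` and `_unpack_engine`.

```python
from typing import Literal, Optional, get_args

ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]
PROVIDER_MODES = get_args(ProviderMode)  # the three allowed string values


def resolve_field(explicit: Optional[str], engine_value: Optional[str], default: str) -> str:
    """Precedence for voice/model/language: explicit kwarg > engine value > default.

    Hypothetical helper illustrating the rule documented on Agent / AgentOptions.
    """
    if explicit is not None:
        return explicit
    if engine_value is not None:
        return engine_value
    return default


# A type checker accepts this annotation and would flag a typo such as "pipline".
mode: ProviderMode = "pipeline"
```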
Provider enum residuals (Wave 2.5)
- Python: assemblyai_stt, cartesia_stt, soniox_stt, speechmatics_stt, telnyx_stt, whisper_stt, elevenlabs_ws_tts, openai_tts, telnyx_tts, gemini_live, ultravox_realtime, silero_vad, silero_onnx, krisp_*
- TypeScript: assemblyai-stt, cartesia-stt, cartesia-tts, soniox-stt
- All hardcoded model/voice/format strings now live behind StrEnum/IntEnum (Python) or const-object + value union (TypeScript)
Bug fixes (Wave 3a)
- stream_handler: barge-in now sets an asyncio.Event / AbortController to cancel the in-flight LLM stream instead of letting it run to completion
- server (Py): SSRF validator on outbound webhook URLs + per-IP WS cap (MAX_WS_PER_IP=10) for parity with TS
- server (Py): voicemail POST gets an explicit 10 s timeout
- metrics (Py): agent_response_ms accepts 0.0 instead of treating it as "missing" (the check is now `is None` rather than truthiness)
- metrics (TS): emit llm/stt/tts TTFB events on the event bus
- observability/event_bus (Py): listener errors now surface to the logger instead of being swallowed
- server (TS): queryTelephonyCost catch logs instead of returning silently
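A minimal sketch of event-driven cancellation of a streaming LLM response, assuming the stream is an async generator. `fake_llm_stream` and `consume_until_barge_in` are hypothetical names, and this version only checks the event between tokens; a production handler would likely also race the pending read against the event.

```python
import asyncio
from typing import AsyncGenerator

TOKENS = ["Sure", ",", " let", " me", " check", " that", "."]


async def fake_llm_stream() -> AsyncGenerator[str, None]:
    """Hypothetical stand-in for a streaming LLM response."""
    for token in TOKENS:
        await asyncio.sleep(0.01)
        yield token


async def consume_until_barge_in(
    stream: AsyncGenerator[str, None], barge_in: asyncio.Event
) -> list[str]:
    """Collect tokens until the barge-in event is set, then close the upstream stream."""
    received: list[str] = []
    try:
        async for token in stream:
            if barge_in.is_set():
                break  # caller interrupted: stop consuming
            received.append(token)
    finally:
        await stream.aclose()  # stop the in-flight generator instead of letting it run on
    return received


async def demo() -> list[str]:
    barge_in = asyncio.Event()
    # Simulate the caller interrupting about 25 ms into the response.
    asyncio.get_running_loop().call_later(0.025, barge_in.set)
    return await consume_until_barge_in(fake_llm_stream(), barge_in)
```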
Stable, machine-readable error codes attached to every Patter exception
class. Existing class-name-based catches keep working; the enum is
additive.
ErrorCode values (10): CONFIG, CONNECTION, AUTH, TIMEOUT, RATE_LIMIT,
WEBHOOK_VERIFICATION, INPUT_VALIDATION, PROVIDER_ERROR, PROVISION,
INTERNAL.
- Python: StrEnum on `exceptions.py`; class-default `code` attribute
per subclass (PatterError → INTERNAL, PatterConnectionError →
CONNECTION, AuthenticationError → AUTH, ProvisionError → PROVISION,
RateLimitError → RATE_LIMIT). Optional `code=` kwarg on the base
ctor lets callers override per-instance.
- TypeScript: const-object + value union in `errors.ts`; `readonly
code: ErrorCode` on every class; optional `{ code }` constructor
option. Same class→code mapping byte-for-byte with Python.
- Both SDKs re-export `ErrorCode` from the package root.
- Test parity asserts the enum value sets match between SDKs.
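A sketch of the class-default code plus per-instance override described above. Assumptions: each enum value equals its member name, `code=` is keyword-only, every subclass derives directly from `PatterError`, and `is_retryable` is a hypothetical consumer.

```python
from typing import Optional

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # fallback for older interpreters
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal stand-in for enum.StrEnum."""


class ErrorCode(StrEnum):
    # Assumption: each value equals its member name.
    CONFIG = "CONFIG"
    CONNECTION = "CONNECTION"
    AUTH = "AUTH"
    TIMEOUT = "TIMEOUT"
    RATE_LIMIT = "RATE_LIMIT"
    WEBHOOK_VERIFICATION = "WEBHOOK_VERIFICATION"
    INPUT_VALIDATION = "INPUT_VALIDATION"
    PROVIDER_ERROR = "PROVIDER_ERROR"
    PROVISION = "PROVISION"
    INTERNAL = "INTERNAL"


class PatterError(Exception):
    code: ErrorCode = ErrorCode.INTERNAL  # class-level default

    def __init__(self, message: str = "", *, code: Optional[ErrorCode] = None) -> None:
        super().__init__(message)
        if code is not None:
            self.code = code  # per-instance override; the class default is untouched


class PatterConnectionError(PatterError):
    code = ErrorCode.CONNECTION


class AuthenticationError(PatterError):
    code = ErrorCode.AUTH


class ProvisionError(PatterError):
    code = ErrorCode.PROVISION


class RateLimitError(PatterError):
    code = ErrorCode.RATE_LIMIT


def is_retryable(exc: PatterError) -> bool:
    """Hypothetical consumer: branch on the stable code rather than the class name."""
    return exc.code in {ErrorCode.CONNECTION, ErrorCode.TIMEOUT, ErrorCode.RATE_LIMIT}
```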
Companion to 8b8c503 (test files). Ships the actual enum + class wiring:
- libraries/python/getpatter/exceptions.py: ErrorCode StrEnum, default .code per subclass, optional code= kwarg on PatterError.__init__
- libraries/python/getpatter/__init__.py: re-export ErrorCode
- libraries/typescript/src/errors.ts: ErrorCode const-object + value union, readonly code on every class, optional { code } ctor option
- libraries/typescript/src/index.ts: re-export ErrorCode
…d with Twilio
ElevenLabs WS TTS streams `ulaw_8000` natively. When the carrier is Twilio (mulaw 8 kHz), we can let ElevenLabs do the encoding server-side and skip the SDK-side mulaw transcode entirely.
- ElevenLabsWebSocketTTS.set_telephony_carrier(carrier) / TS setTelephonyCarrier(carrier): duck-typed hook called by the stream handler after TTS construction. Maps "twilio" → "ulaw_8000" and "telnyx" → "pcm_16000" (the lowest-conversion option).
- The output_format constructor arg becomes truly optional (sentinel default); a user-passed format wins over the carrier hint.
- The for_twilio / for_telnyx factories already pass explicit formats, so the carrier hint is a no-op for those callers.
- 7 new unit cases per SDK in TestCarrierAutoFlip / equivalent: default flip, URL contains ulaw_8000, telnyx no-op, explicit format respected, factory wins, unknown carrier no-op.
No public-API break: existing constructor calls behave identically when no carrier hook is wired up.
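A sketch of the sentinel-plus-hook pattern, using a hypothetical trimmed-down class. The carrier mapping comes from the description; the `DEFAULT_FORMAT` value is an assumption, since the real default is not stated here.

```python
_UNSET = object()  # sentinel: "caller did not pass output_format"

# Carrier -> ElevenLabs output format, per the mapping described above.
_CARRIER_FORMATS = {"twilio": "ulaw_8000", "telnyx": "pcm_16000"}


class ElevenLabsWebSocketTTSSketch:
    """Hypothetical stand-in for ElevenLabsWebSocketTTS, reduced to format selection."""

    DEFAULT_FORMAT = "pcm_16000"  # assumption: the real default is not stated

    def __init__(self, output_format=_UNSET) -> None:
        self._explicit_format = output_format is not _UNSET
        self.output_format = (
            self.DEFAULT_FORMAT if output_format is _UNSET else output_format
        )

    def set_telephony_carrier(self, carrier: str) -> None:
        """Duck-typed hook: only flips the format when the user did not pick one."""
        if self._explicit_format:
            return  # user-passed format (or factory-passed format) wins
        fmt = _CARRIER_FORMATS.get(carrier.lower())
        if fmt is not None:  # unknown carriers are a no-op
            self.output_format = fmt
```

The sentinel matters because `None` or a string default could not distinguish "caller chose this value" from "caller said nothing".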
…sample_rate)
OpenAI TTS streams 24 kHz audio. The default 24k→16k resample stays for the Telnyx (PCM 16 kHz) carrier; for Twilio (mulaw 8 kHz) the chained 24→16 + 16→8 used to cost two ratecv passes. A new `target_sample_rate=8000` constructor opt-in collapses the two passes into a single 3:1 decimation with a tighter LPF (Nyquist ≈ 4 kHz).
- Python: getpatter.services.transcoding.create_resampler_24k_to_8k() factory; OpenAITTS gains an optional `target_sample_rate=16000` (the default preserves existing behaviour).
- TypeScript: createResampler24kTo8k() + a 24000→8000 case in StatefulResampler; OpenAITTS gains an optional positional `targetSampleRate=16000`, with `LPF_ALPHA_8K=0.45` for anti-aliasing at the 4 kHz Nyquist.
Auto-engagement on Twilio carriers is deferred: the audio sender currently assumes 16 kHz PCM input. Manual opt-in keeps the change narrowly scoped.
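A minimal sketch of a stateful single-pass 3:1 decimator for 16-bit mono PCM. The filter topology is an assumption: a one-pole low-pass using the quoted `alpha=0.45`, which gives only a gentle roll-off, followed by keeping every third sample. `Resampler24kTo8k` is a hypothetical name, and `array("h")` uses native byte order (little-endian on common hosts).

```python
import array


class Resampler24kTo8k:
    """Hypothetical stateful 24 kHz -> 8 kHz decimator for 16-bit mono PCM.

    One-pole low-pass y[n] = y[n-1] + alpha * (x[n] - y[n-1]) to attenuate
    content above the new 4 kHz Nyquist, then keep every third sample.
    """

    def __init__(self, alpha: float = 0.45) -> None:
        self.alpha = alpha
        self._y = 0.0    # filter state carried across chunks
        self._phase = 0  # input index modulo 3, carried across chunks

    def process(self, pcm16: bytes) -> bytes:
        samples = array.array("h")
        samples.frombytes(pcm16)
        out = array.array("h")
        for x in samples:
            self._y += self.alpha * (x - self._y)
            if self._phase == 0:
                # Clamp to the int16 range before storing.
                out.append(max(-32768, min(32767, int(round(self._y)))))
            self._phase = (self._phase + 1) % 3
        return out.tobytes()
```

Carrying `_phase` across calls keeps the 3:1 ratio exact even when chunk sizes are not multiples of three.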
Bug #48: per-language honorifics
- New HONORIFICS_{EN,IT,ES,DE,FR,PT} constants merged into HONORIFICS_ALL (sorted longest-first). The module-level HONORIFICS_REGEX_ALT alternation is built once. The aggregation is the union of all languages regardless of `language` (mixed-language deployments are common, so this is the safer default).
- The splitSentences prefix regex sources from the union, so sentences like "Ho incontrato il Sig. Rossi alla riunione" ("I met Mr. Rossi at the meeting") no longer split mid-honorific in any of the supported languages.
Bug #49: single-word "Yes." never flushed
- DEFAULT_MIN_WORDS_FOR_SHORT_FLUSH lowered from 2 → 1; single-word replies ending in "."/"!"/"?" now flush on the terminator.
- New gate #6 in maybeShortFlush blocks flushes whose trailing word is a known honorific, preventing "Mr." / "Sig." from escaping as a standalone sentence.
- Legacy escape hatch: pass `minWordsForShortFlush=2` to restore the pre-fix behaviour.
Tests: 22 Python + 21 TS new honorific cases; 12 + 12 single-word flush cases. Existing tests updated where they asserted the old buffered behaviour for single-word replies. Both suites green (Py 1538, TS 1224).
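A sketch of the union-of-honorifics alternation and the trailing-honorific gate. The honorific lists are illustrative subsets, and `should_short_flush` is a hypothetical reduction of `maybeShortFlush` to three checks.

```python
import re

# Illustrative subsets; the real HONORIFICS_* constants are longer.
HONORIFICS_EN = ["Mr.", "Mrs.", "Ms.", "Dr."]
HONORIFICS_IT = ["Sig.", "Sig.ra", "Dott."]

# Union across languages, longest first (ties broken alphabetically for determinism).
HONORIFICS_ALL = sorted(set(HONORIFICS_EN + HONORIFICS_IT), key=lambda h: (-len(h), h))
HONORIFICS_REGEX_ALT = "|".join(re.escape(h) for h in HONORIFICS_ALL)  # built once
_TRAILING_HONORIFIC_RE = re.compile(rf"(?:^|\s)(?:{HONORIFICS_REGEX_ALT})$")

DEFAULT_MIN_WORDS_FOR_SHORT_FLUSH = 1


def should_short_flush(buffer: str, min_words: int = DEFAULT_MIN_WORDS_FOR_SHORT_FLUSH) -> bool:
    """Flush a short sentence on its terminator unless it ends in an honorific."""
    text = buffer.strip()
    if not text or text[-1] not in ".!?":
        return False  # no terminator yet
    if len(text.split()) < min_words:
        return False  # legacy behaviour when min_words=2
    if _TRAILING_HONORIFIC_RE.search(text):
        return False  # "Mr." / "Sig." must not escape as a standalone sentence
    return True
```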
… silero-vad lint
- CHANGELOG.md: comprehensive Unreleased section covering the reorg, provider enums, error taxonomy, bug-fix wave, perf wins, and cleanup work landing on this branch.
- tool_executor.py: add a module-level docstring describing the SSRF guard, response-size cap, and OTel span emission.
- silero-vad.ts:127: replace the stale @ts-expect-error directive (now a TS2578 warning since onnxruntime-node types resolve at build) with a plain comment explaining the optional-peer-dep dynamic import.
…ools/
Internal restructure of the Python SDK; PUBLIC API surface unchanged.
- handlers/{twilio,telnyx,common}_handler.py → telephony/{twilio,telnyx,common}.py
("_handler" suffix dropped — the parent module name already conveys
intent). stream_handler.py promoted out of handlers/ to package root
since it's the per-call orchestrator, not a telephony adapter.
handlers/ folder removed.
- services/{transcoding,pcm_mixer,background_audio}.py → audio/* (audio
pipeline collected in one place).
- services/{tool_decorator,tool_executor}.py → tools/* (tool-decoration
& webhook-execution kept together).
- Other services/* stay where they are: llm_loop, metrics,
sentence_chunker, text_transforms, ivr, fallback_provider,
pipeline_hooks, chat_context, call_log, remote_message.
- tts/ and stt/ namespaces kept — they expose
getpatter.{tts,stt}.<provider>.{TTS,STT} with env-var auto-resolve
and are public surface.
- File moves use git mv so blame/history follow.
- Imports rewritten across providers, server, services, tests, and
package-root re-exports. Python tests: 1538 passed.
TS side ships in a separate commit.
TS internal restructure for parity with the Python d5d9391 commit. Public API surface unchanged.
- carriers/{twilio,telnyx}.ts → telephony/{twilio,telnyx}.ts (renamed for naming parity with Py; "carrier" was the original term, but "telephony" reads better next to twilio/telnyx).
- transcoding.ts → audio/transcoding.ts.
- services/background-audio.ts → audio/background-audio.ts.
- tool-decorator.ts → tools/tool-decorator.ts.
- Imports rewritten across client, index, types, stream-handler, deepfilternet-filter, plus 5 test files.
TS tests: 1224 passed, tsc --noEmit clean. The telephony/audio/tools triad now matches between the Python and TypeScript SDKs.
Update per-library AI-agent quickstarts to match the post-restructure package tree. Adds the new folder names (telephony/, audio/, tools/) and a one-line description per folder.
…rvability/dashboard/top-level
…/tools
Adds 1-3 line docstrings to public symbols (modules, classes, methods)
in libraries/python/getpatter/{providers,telephony,audio,tools} that
previously had none. No behaviour changes; pre-existing docstrings are
left untouched.
…ability/dashboard/top-level
Adds short JSDoc summaries to public classes, interfaces, type aliases, and exported functions that were missing them. Existing JSDoc is preserved verbatim; this is a fill-the-gaps pass only, no rewrites.
Mechanical replace of stale path strings in docstrings, comments, and .env.example headers:
- sdk-ts/src/* → libraries/typescript/src/*
- sdk-py/getpatter/* → libraries/python/getpatter/*
- conceptual "(sdk-py)" → "the Python SDK"
No behaviour change; tests still 1538 passed, tsc clean.
The e2e Playwright suite (tests/e2e/*.spec.ts + playwright.config.ts + the @playwright/test / playwright devDeps) was inherited from an earlier "comprehensive test suite" PR but never integrated with downstream test infra after the libraries/ reorg. Per CLAUDE.md, end-to-end call testing lives in a separate downstream test repo.
- Drop libraries/typescript/playwright.config.ts.
- Drop libraries/typescript/tests/e2e/ (6 .spec.ts + test-server.ts).
- Remove @playwright/test and playwright from package.json devDeps.
- Refresh package-lock.json (npm install).
- silero-vad.ts: switch back to @ts-ignore on the optional onnxruntime-node import; the dynamic-import line surfaces a TS7016 warning when types are unresolved after the lock refresh.
…select_sound_from_list, resample_24k_to_16k from TypeScript SDK
Closes 5 public-surface gaps in the Python SDK so every symbol exported from libraries/typescript/src/index.ts now has a Python counterpart.
- ``DefaultToolExecutor``: async tool dispatcher with retry/fallback, webhook SSRF validation via ``server.validate_webhook_url``, and the same JSON error shape as the TypeScript class. Added to ``services/llm_loop.py``.
- ``LLMChunk``: frozen dataclass mirror of the TS ``LLMChunk`` interface (text/tool_call/done/usage). ``to_dict()`` produces the same shape as ``OpenAILLMProvider.stream`` for callers that prefer dicts.
- ``builtin_clip_path``: top-level helper resolving ``BuiltinAudioClip`` values (or raw filenames) to absolute paths. ``BuiltinAudioClip.path`` now delegates to the new function for a single source of truth.
- ``select_sound_from_list``: promoted from a private static method on ``BackgroundAudioPlayer`` to a public top-level function. The static method is preserved as a backward-compatible delegator.
- ``resample_24k_to_16k``: stateless one-shot helper following the existing ``resample_8k_to_16k`` / ``resample_16k_to_8k`` convention, including the per-process ``DeprecationWarning`` latch.
All five symbols are re-exported from ``getpatter.__init__`` and listed in ``__all__``. The five ``TODO(parity)`` markers are removed in the same commit.
25 unit tests added in ``tests/unit/test_parity_ports.py`` covering public-symbol reachability, ``LLMChunk`` round-trip, real handler/webhook dispatch through ``DefaultToolExecutor`` (including the SSRF-blocked branch), bundled clip resolution, weighted selection empirics, and equivalence of ``resample_24k_to_16k`` to a single-shot ``StatefulResampler``.
Tests: 1546 → 1571, all passing.
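A sketch of a frozen `LLMChunk` with a round-trippable `to_dict()`. The field names come from the description; the field types and the rule of omitting `None` fields are assumptions, since the exact `OpenAILLMProvider.stream` dict shape is not shown here.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class LLMChunk:
    """Hypothetical mirror of the TS LLMChunk interface (text/tool_call/done/usage)."""

    text: Optional[str] = None
    tool_call: Optional[dict[str, Any]] = None
    done: bool = False
    usage: Optional[dict[str, Any]] = None

    def to_dict(self) -> dict[str, Any]:
        # Assumption: unset optional fields are omitted from the dict form.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Freezing the dataclass makes chunks safe to fan out to several consumers without one mutating what another sees.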
Bug #2 from the barge-in audit: on speakerphone / tunnel-loop deployments the agent's outbound TTS bleeds back into the mic. VAD sees that bleed as continuous voice-like energy and never transitions out of the "speaking" state, so a caller interruption only registers during natural TTS pauses. The result is the intermittent "interrupt sometimes works, sometimes the agent keeps talking" symptom.
Fix at the source with proper acoustic echo cancellation. An NLMS adaptive filter (2048 taps @ 16 kHz, 128 ms history) subtracts an estimate of the TTS-derived echo from the mic stream before VAD/STT see it. A Geigel double-talk detector freezes adaptation when the caller is speaking on top of the agent, so the filter does not learn the user's voice as part of the channel response.
Convergence on the synthetic narrowband test signal:
- ~24 dB ERLE after 1 s of TTS-only training
- Near-end speech preserved within 0 dB during double-talk
This is not a drop-in replacement for WebRTC AEC3 (state-of-the-art quality needs adaptive sub-band processing, comfort noise, and a nonlinear post-filter, which this scope does not cover). For production-grade quality, wrap a binding to ``webrtc-audio-processing-2`` externally.
- libraries/python/getpatter/audio/aec.py: NlmsEchoCanceller class.
- libraries/typescript/src/audio/aec.ts: TS parity.
- Agent.echo_cancellation / AgentOptions.echoCancellation: opt-in flag, default false. Handset / headset deployments don't need it, and the 0.5–2 s convergence period would briefly attenuate caller speech if they spoke before any TTS played.
- PipelineStreamHandler.start() (Py) / StreamHandler.initPipeline (TS) instantiate the canceller when the flag is on. The far-end tap fires before the carrier transcode in synthesizeSentence; the near-end path runs after the inbound 8k→16k resample, before VAD.
- 8 unit tests per SDK covering convergence, double-talk preservation, construction validation, pass-through-before-priming, reset, and empty buffers.
Tests: Py 1574 passed (+8), TS 1236 passed (+8), tsc clean.
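A readable pure-Python sketch of the technique: NLMS update `w ← w + μ·e·x / (ε + ‖x‖²)` gated by a Geigel detector (`|mic| > ρ·max|far window|` freezes adaptation). `NlmsEchoCancellerSketch` and `erle_demo` are hypothetical, the defaults mirror the numbers above, and the per-sample O(taps) loops make this far too slow for 2048 taps in real time; it illustrates the algorithm, not the SDK's implementation.

```python
import math
import random
from collections import deque


class NlmsEchoCancellerSketch:
    """Hypothetical NLMS echo canceller with a Geigel double-talk detector."""

    def __init__(self, taps: int = 2048, step: float = 0.1,
                 geigel_rho: float = 0.6, eps: float = 1e-6) -> None:
        self.step = step
        self.rho = geigel_rho
        self.eps = eps
        self.w = [0.0] * taps
        self.x = deque([0.0] * taps, maxlen=taps)  # far-end history, newest first

    def process(self, far: float, mic: float) -> float:
        """Consume one far-end (TTS) sample and one mic sample; return the residual."""
        self.x.appendleft(far)
        y_hat = sum(wi * xi for wi, xi in zip(self.w, self.x))  # echo estimate
        e = mic - y_hat  # residual = estimate of near-end speech
        # Geigel: near-end talk if the mic is louder than rho * recent far-end peak.
        far_peak = max(abs(v) for v in self.x)
        if abs(mic) <= self.rho * far_peak:  # adapt only when no double-talk
            norm = self.eps + sum(v * v for v in self.x)  # real code keeps a running sum
            g = self.step * e / norm
            self.w = [wi + g * xi for wi, xi in zip(self.w, self.x)]
        return e


def erle_demo(n: int = 8000, taps: int = 16) -> float:
    """Far-end-only white noise through a short echo path; ERLE (dB) over the last quarter."""
    rng = random.Random(0)
    h = [0.0, 0.0, 0.3, 0.1]  # hypothetical echo path (gain sum 0.4 < rho)
    aec = NlmsEchoCancellerSketch(taps=taps, step=0.5)
    hist = deque([0.0] * len(h), maxlen=len(h))
    mic_pow = err_pow = 0.0
    for i in range(n):
        far = rng.uniform(-1.0, 1.0)
        hist.appendleft(far)
        mic = sum(hk * xk for hk, xk in zip(h, hist))  # echo only, no near-end talk
        e = aec.process(far, mic)
        if i >= 3 * n // 4:
            mic_pow += mic * mic
            err_pow += e * e
    return 10.0 * math.log10(mic_pow / max(err_pow, 1e-20))
```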
…e / echo_cancellation in agent() builder
Pre-existing parity violation surfaced during the AEC audit: the Py ``Patter.agent(...)`` builder enumerates kwargs explicitly, so any field not listed is silently dropped. Three boolean flags on the Agent dataclass (``aggressive_first_flush``, ``disable_phone_preamble``, and the freshly added ``echo_cancellation``) were unreachable through the builder, forcing users to construct ``Agent(...)`` directly. TS does not have this problem because ``agent(opts)`` spreads the whole ``AgentOptions`` object, so every field passes through.
Add the three flags to the Py builder signature and forward them to ``Agent(...)``. Defaults match the dataclass (all ``False``), so existing callers keep their behaviour.
2 new tests:
- builder defaults match dataclass defaults (no silent True leak)
- explicit ``aggressive_first_flush=True`` / ``disable_phone_preamble=True`` / ``echo_cancellation=True`` reach the resulting Agent
Tests: Py 1576 passed (+2), TS 1236 unchanged, tsc clean.
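A sketch of explicit kwarg forwarding plus a guard that would have caught the original gap. `Agent` and `agent` here are hypothetical trimmed-down stand-ins, and `builder_defaults_match` is an illustrative check in the spirit of the two new tests.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class Agent:
    system_prompt: str
    aggressive_first_flush: bool = False
    disable_phone_preamble: bool = False
    echo_cancellation: bool = False


def agent(system_prompt: str, *, aggressive_first_flush: bool = False,
          disable_phone_preamble: bool = False, echo_cancellation: bool = False) -> Agent:
    """Hypothetical builder: every flag is listed and forwarded explicitly."""
    return Agent(
        system_prompt=system_prompt,
        aggressive_first_flush=aggressive_first_flush,
        disable_phone_preamble=disable_phone_preamble,
        echo_cancellation=echo_cancellation,
    )


def builder_defaults_match() -> bool:
    """True if every Agent field is a builder kwarg with the same default."""
    params = inspect.signature(agent).parameters
    for f in fields(Agent):
        if f.name not in params:
            return False  # field unreachable through the builder
        default = params[f.name].default
        if default is not inspect.Parameter.empty and default != f.default:
            return False  # silent default mismatch
    return True
```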
Real cellular-call test on 0.6.0 with the initial 2048-tap +
constant-0.1-step config exposed an 8–12 s convergence window during
which the user's first turn was either over-cancelled to silence
(filter eats voice while learning the channel) or contaminated by
residual echo (Deepgram transcribes garbage and discards). The user
report: ~11 s of perceived silence after firstMessage, then everything
worked from turn 4 onward. Net first-turn UX was worse than no AEC.
The architectural fix the user asked for ("source-level, no workaround,
solid"): two NLMS hyperparameter changes that compress convergence
into the first ~250 ms — the same window where the agent's
firstMessage finishes playing.
1. **512 taps (was 2048)**: 4× fewer coefficients to converge with no
measurable cancellation loss on cellular / VoIP paths whose RT60
stays under ~50 ms after the carrier's own echo suppression. Pass
``filter_taps=2048`` explicitly for landline hairpin loops where the
tail extends beyond 32 ms @ 16 kHz.
2. **Adaptive step**: aggressive warm-up step (0.5) for the first 0.5 s
of processed audio, then taper to the textbook 0.1 for steady-state
tracking. The Geigel double-talk detector still gates updates so the
larger step does not learn the caller's voice into the echo model.
Verification: regression-test fed a broadband synthetic signal (3
sinusoids + white noise) in realistic 20 ms frames hits **17–19 dB
ERLE in the very first 250 ms** with the new defaults — well above the
previous 0 dB at the 1.25 s mark.
- New constructor knobs: ``warmup_step_size`` (default 0.5),
``warmup_seconds`` (default 0.5). Step branch is constant within a
frame so the inner sample-by-sample loop stays branch-free.
- Validation extended for the two new fields.
- ``reset()`` now clears the ``processed_samples`` counter so the
warm-up window re-engages on filter reset.
- 1 new regression test per SDK enforces the "≥10 dB ERLE in the first
250 ms with defaults" guarantee on a broadband signal.
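The warm-up schedule can be sketched as a per-frame step choice. This simplification switches abruptly from 0.5 to 0.1 at the end of the warm-up window (the real taper may be smoother), and `step_for_frame` / `steps_for_call` are hypothetical names.

```python
def step_for_frame(processed_samples: int, sample_rate: int = 16000,
                   warmup_seconds: float = 0.5, warmup_step_size: float = 0.5,
                   steady_step_size: float = 0.1) -> float:
    """Pick the NLMS step once per frame, so the inner per-sample loop has no branch."""
    if processed_samples < warmup_seconds * sample_rate:
        return warmup_step_size
    return steady_step_size


def steps_for_call(n_frames: int, frame_ms: int = 20, sample_rate: int = 16000) -> list[float]:
    """Step used for each successive frame of a call (resetting processed_samples restarts warm-up)."""
    frame_len = sample_rate * frame_ms // 1000  # 320 samples per 20 ms frame at 16 kHz
    steps, processed = [], 0
    for _ in range(n_frames):
        steps.append(step_for_frame(processed, sample_rate))
        processed += frame_len
    return steps
```

With 20 ms frames at 16 kHz, the first 25 frames (0.5 s) use the warm-up step.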
Tests: Py 1577 passed (+1), TS 1237 passed (+1), tsc --noEmit clean.
… ring flush
Two fixes for the speakerphone "agent unresponsive on first turn / mid-call
gets stuck after a few exchanges" symptom reported on 0.6.0 cellular tests.
1. firstMessage was bypassing beginSpeaking + AEC far-end tap
The ``firstMessage`` block streamed TTS chunks directly to the carrier
without (a) marking ``isSpeaking=true`` and (b) pushing each chunk to
``aec.pushFarEnd()``. Consequence on speakerphone: while the intro
played, the self-hearing guard never engaged, the user's audio (mixed
with TTS bleed) was forwarded to STT and produced garbage transcripts;
AEC had no reference signal so the bleed survived in the inbound
channel. Wraps the firstMessage TTS streaming loop in
``beginSpeaking()`` + ``try/finally { endSpeakingWithGrace() }`` and
pushes each chunk to ``aec.pushFarEnd()`` before encoding for the
carrier. Mirrors the per-turn behaviour of ``runPipelineLlm`` /
``_process_streaming_response``.
2. Ring buffer must NOT flush on natural turn end
An earlier iteration also flushed ``inboundAudioRing`` from
``endSpeakingWithGrace`` so user audio captured during the agent's TTS
that never tripped VAD would still reach STT. In practice this raced
live STT input post-grace: the ring contained partially-cancelled echo
(AEC still adapting) and possibly over-cancelled user voice (Geigel
rho=0.6 misses quiet double-talk). Replaying on every turn produced
phantom transcripts that confused the LLM and caused the "out of order
responses + agent gets stuck" symptom the user observed mid-call.
Reverted: flush only on real barge-in (where VAD confirmed user
speech). Audio captured during the agent's turn that VAD did not
classify as speech is intentionally dropped at the next
``beginSpeaking`` — the user can repeat themselves rather than have
the LLM react to a stale phantom transcript.
The barge-in flush remains: extracted into ``flushInboundAudioRing()`` /
``_flush_inbound_audio_ring()`` helpers (clean refactor, 1 caller now).
Stale "2048 taps + 0.5–2 s convergence" log message updated to the
post-AEC-tuning "512 taps + 0.5 s warmup μ=0.5 → ~250 ms convergence".
Tests: Py 1577 passed, TS 1237 passed, tsc --noEmit clean.
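A sketch of fix 1 (firstMessage wrapped in begin/end speaking, with each chunk fed to the AEC reference before carrier encoding). All class and method names are hypothetical snake_case stand-ins for the handler methods named above, and the grace period is elided.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class RecordingAec:
    """Hypothetical AEC stand-in that only records the far-end reference."""

    def __init__(self) -> None:
        self.far_end: List[bytes] = []

    def push_far_end(self, chunk: bytes) -> None:
        self.far_end.append(chunk)


class FirstMessageSketch:
    """Hypothetical reduction of the stream handler's firstMessage path."""

    def __init__(self, aec: Optional[RecordingAec] = None) -> None:
        self.is_speaking = False
        self.aec = aec
        self.sent: List[bytes] = []

    def begin_speaking(self) -> None:
        self.is_speaking = True

    def end_speaking_with_grace(self) -> None:
        # The real handler keeps the guard up for a short grace period; elided here.
        self.is_speaking = False

    async def play_first_message(self, tts_chunks: AsyncIterator[bytes]) -> None:
        self.begin_speaking()  # the self-hearing guard now covers the intro too
        try:
            async for chunk in tts_chunks:
                if self.aec is not None:
                    self.aec.push_far_end(chunk)  # reference signal before carrier encoding
                self.sent.append(chunk)  # stand-in for encode + send to the carrier
        finally:
            self.end_speaking_with_grace()  # runs even if TTS fails or is cancelled


async def demo():
    aec = RecordingAec()
    handler = FirstMessageSketch(aec)
    speaking_during: List[bool] = []

    async def chunks() -> AsyncIterator[bytes]:
        for c in (b"\x00\x01", b"\x02\x03"):
            speaking_during.append(handler.is_speaking)
            yield c

    await handler.play_first_message(chunks())
    return handler, aec, speaking_during


if __name__ == "__main__":
    print(asyncio.run(demo())[2])  # [True, True]
```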
The previous fix wrapped the firstMessage TTS in ``beginSpeaking`` + ``endSpeakingWithGrace`` so the self-hearing guard could engage during the intro. This worked, but exposed a second defect: the AEC filter needs ~500 ms of TTS reference to converge, and during that warmup window residual TTS bleed in the inbound mic stream still looks like speech to VAD. With ``isSpeaking=true`` from frame zero of the firstMessage, the very first chunk of bleed triggered an immediate barge-in cancel, killing the firstMessage before a single byte had played. The test report: "agent never speaks".
Fix: gate both barge-in entry points (VAD ``speech_start`` and transcript-based) on a 1-second minimum agent-speaking duration. Real users almost never start interrupting within the first second of an agent turn anyway, and the gate cleanly covers the AEC convergence period (500 ms warmup + safety margin).
- TypeScript: ``MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN = 1000`` static on ``StreamHandler``. New ``speakingStartedAt: number | null`` field set in ``beginSpeaking()`` and cleared in ``cancelSpeaking()`` and the grace flip. New ``canBargeIn()`` helper used by both barge-in sites; suppressed events log at debug level so call-debug logs still show why the cancel did not fire.
- Python: ``MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN = 1.0`` module-level constant. ``_speaking_started_at`` field with the same lifecycle. ``_can_barge_in()`` helper applied at the VAD speech_start path in ``on_audio_received`` and at the entry of ``_handle_barge_in``. The helper uses ``getattr`` so test fixtures that bypass ``_begin_speaking`` still permit barge-in to fire.
New unit tests (3 Py + 5 TS):
- ``canBargeIn() / _can_barge_in()`` returns true with no active turn, false within the gate window, true past the gate window.
- ``handleBargeIn / _handle_barge_in`` does nothing during the warmup window; ``isSpeaking`` stays True.
- ``handleBargeIn / _handle_barge_in`` fires normally past the gate.
Tests: Py 1579 passed (+2), TS 1242 passed (+5), tsc --noEmit clean.
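A sketch of the speaking-duration gate with an injectable clock so it can be tested deterministically. `BargeInGate` is a hypothetical class; the SDK keeps the equivalent state directly on the stream handler, and whether the boundary is inclusive is an assumption here.

```python
import time
from typing import Callable, Optional

MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN = 1.0


class BargeInGate:
    """Hypothetical reduction of the speaking-duration gate."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._speaking_started_at: Optional[float] = None

    def begin_speaking(self) -> None:
        self._speaking_started_at = self._clock()

    def cancel_speaking(self) -> None:
        self._speaking_started_at = None

    def can_barge_in(self) -> bool:
        if self._speaking_started_at is None:
            return True  # no active agent turn
        elapsed = self._clock() - self._speaking_started_at
        return elapsed >= MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN
```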
…f-default
The previous AEC commits added a server-side NLMS adaptive filter and exposed an ``echoCancellation`` flag. Real-call testing on cellular PSTN turned up a fundamental architectural mismatch the early benchmarks did not catch: the round-trip echo path on Twilio Media Streams is ~250–1500 ms (jitter buffer + carrier loop), but a 512-tap NLMS filter at 16 kHz can only see the most recent 32 ms of far-end samples. The echo never lands inside the filter's window, the weights stay near zero, and the filter silently no-ops. Worse, with ``isSpeaking=true`` during firstMessage and a barge-in gate of 1 s, once the gate releases any residual bleed reaching VAD triggers an immediate self-cancel: the agent stops talking right after starting.
Industry consensus from this round of research:
- LiveKit & Pipecat handle echo cancellation at the transport layer for browser/native paths only.
- Twilio's own guidance is to "rely on network echo cancellers" for telephone scenarios.
- Vapi, Retell, Bland do not run server-side AEC. They rely on the carrier's network echo suppression and the caller device's built-in AEC (modern handsets ship one).
Server-side NLMS is the right tool only when the SDK owns the audio path end-to-end and the loop latency is on the order of the filter window (~30 ms, as in browser WebRTC or mobile native). PSTN does not meet that bar and never will under realistic carrier conditions.
This commit:
- ``echoCancellation`` stays opt-in (default false) so existing PSTN callers see no change in behaviour.
- When ``echoCancellation: true`` is detected on a Twilio or Telnyx carrier, log a clear warning explaining why it will not work as intended and what to do instead. The filter is still instantiated so curious operators can compare; the warning makes the recommendation explicit.
For PSTN deployments, the working stack is: Patter's self-hearing guard + 1 s barge-in cooldown + Silero VAD with the phone-tuned preset + carrier / handset native echo suppression. No server-side AEC.
Tests: Py 1579 passed, TS 1242 passed, tsc --noEmit clean.
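The window arithmetic and the carrier warning can be sketched like this. The function names and warning wording are hypothetical; only the numbers (512 taps, 16 kHz, 32 ms, ~250–1500 ms echo path) and the Twilio/Telnyx condition come from the commit.

```python
import logging

logger = logging.getLogger(__name__)

PSTN_CARRIERS = frozenset({"twilio", "telnyx"})


def filter_window_ms(taps: int = 512, sample_rate: int = 16000) -> float:
    """How much far-end history an NLMS filter of `taps` coefficients can see."""
    return 1000.0 * taps / sample_rate  # 512 taps @ 16 kHz -> 32 ms


def echo_within_window(echo_delay_ms: float, taps: int = 512, sample_rate: int = 16000) -> bool:
    """An echo delayed beyond the window cannot be modelled by the filter."""
    return echo_delay_ms <= filter_window_ms(taps, sample_rate)


def warn_if_pstn_aec(carrier: str, echo_cancellation: bool) -> bool:
    """Log a warning (and return True) when server-side AEC is enabled on a PSTN carrier."""
    if echo_cancellation and carrier.lower() in PSTN_CARRIERS:
        logger.warning(
            "echo_cancellation is enabled on %s: the PSTN echo path (~250-1500 ms) "
            "falls outside the %.0f ms NLMS window; rely on carrier/handset echo "
            "suppression instead.",
            carrier,
            filter_window_ms(),
        )
        return True
    return False
```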
…ncel drain
Six architectural fixes for the post-barge-in failure modes surfaced during
the 0.6.0 acceptance pass against real PSTN calls. Validated end-to-end on
six pipeline stacks (Deepgram + Groq/OpenAI/Anthropic/Cerebras/Google +
Cartesia/OpenAI TTS) with verbose Italian conversation flow.
1. Adaptive barge-in gate
- MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_AEC = 1000 (covers AEC warmup)
- MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_NO_AEC = 250 (anti-flicker only)
- canBargeIn() picks the right gate based on whether AEC is wired.
- Suppression call sites log at INFO level with the AEC state.
2. Inbound audio ring cap reduced from 30 frames (~600 ms) to 13 (~260 ms)
to match VAD minSpeechDuration. Pre-fix, the replay was dragging in
~350 ms of agent TTS bleed which Deepgram (default English) transcribed
as English garbage and committed to the LLM as phantom user input.
3. STT.finalize() on VAD speech_end
- New optional finalize() on STTAdapter / STTProvider.
- DeepgramSTT.finalize() exposes {type: 'Finalize'} as a public method.
- StreamHandler calls stt.finalize() whenever the SDK's VAD signals
speech_end so the provider returns is_final immediately rather than
waiting on its own (slow) endpointing heuristic.
4. AMD on by default + onMachineDetection callback (Twilio + Telnyx parity)
- New MachineDetectionResult carrier-agnostic shape.
- Twilio: MachineDetection=DetectMessageEnd + AsyncAmd=true (no
answer-latency penalty on human pickups).
- Telnyx: answering_machine_detection=greeting_end.
- Callback fires on both webhooks before the legacy voicemail-drop
path so callers see the result regardless of voicemailMessage.
5. Post-cancel drain window of 150 ms
- Tracks lastCancelAt timestamp on every barge-in cancel (both
VAD-path and transcript-path).
- beginSpeaking() is now async and awaits the drain remainder so the
remote PSTN player has time to flush the cancelled turn's tail
before the next TTS chunk lands on top of it. Eliminates the
"doubled first sentence" audio artefact reported during testing.
6. AssemblyAI accepts a parity-only `language` field for cross-provider
uniformity (forwarded as no-op; AssemblyAI selects language by model).
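Items 1 and 5 above both come down to timestamps around the agent's turn. A minimal Python sketch of that state, assuming a single monotonic clock; only the three constants come from this commit, while the class and method names (snake_case stand-ins for `canBargeIn()` / `beginSpeaking()`) are hypothetical:

```python
import asyncio
import time
from typing import Callable, Optional

# Constants quoted in the commit message.
MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_AEC = 1000
MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_NO_AEC = 250
POST_CANCEL_DRAIN_MS = 150


class TurnGateSketch:
    """Hypothetical helper combining the adaptive barge-in gate and the post-cancel drain."""

    def __init__(self, aec_enabled: bool, clock: Callable[[], float] = time.monotonic) -> None:
        self._aec_enabled = aec_enabled
        self._clock = clock  # seconds, monotonic
        self._speaking_since: Optional[float] = None
        self._last_cancel_at: Optional[float] = None

    def gate_ms(self) -> int:
        # Long gate when AEC is wired (covers filter warm-up), short anti-flicker gate otherwise.
        if self._aec_enabled:
            return MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_AEC
        return MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_NO_AEC

    def can_barge_in(self) -> bool:
        if self._speaking_since is None:
            return False  # agent is silent: nothing to interrupt
        elapsed_ms = (self._clock() - self._speaking_since) * 1000
        return elapsed_ms >= self.gate_ms()

    def cancel(self) -> None:
        # Called on every barge-in cancel (VAD path and transcript path).
        self._speaking_since = None
        self._last_cancel_at = self._clock()

    def drain_remaining_s(self) -> float:
        if self._last_cancel_at is None:
            return 0.0
        elapsed_s = self._clock() - self._last_cancel_at
        return max(0.0, POST_CANCEL_DRAIN_MS / 1000 - elapsed_s)

    async def begin_speaking(self) -> None:
        # Wait out whatever is left of the drain window before the next turn's audio starts.
        remaining = self.drain_remaining_s()
        if remaining > 0:
            await asyncio.sleep(remaining)
        self._speaking_since = self._clock()
```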
Both SDKs (TypeScript and Python) updated with identical defaults,
constants, and call-site coverage. Unit tests: TS 24/24 passing, Python
33/33 passing. Includes [DIAG] INFO logs in TS deepgram-stt.ts and
stream-handler.ts for the diagnostic phase; these can be removed in a
follow-up commit once the bleed-transcription root cause is sealed.
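Item 3 relies on Deepgram's `Finalize` control message, a JSON text frame sent on the open streaming socket. A small sketch of the call-site shape, assuming a websocket object with an async `send(str)` method; the class and function names are hypothetical:

```python
import json
from typing import Protocol


class TextSocket(Protocol):
    async def send(self, message: str) -> None: ...


class DeepgramFinalizeSketch:
    """Hypothetical adapter fragment exposing Deepgram's Finalize message as a method."""

    def __init__(self, ws: TextSocket) -> None:
        self._ws = ws

    async def finalize(self) -> None:
        # Ask the provider to flush buffered audio and return is_final now.
        await self._ws.send(json.dumps({"type": "Finalize"}))


async def on_vad_event(event: str, stt: object) -> None:
    # finalize() is optional on the adapter, so look it up instead of requiring it.
    finalize = getattr(stt, "finalize", None)
    if event == "speech_end" and callable(finalize):
        await finalize()
```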
… tunnel grace
Bundles the SDK changes from a focused work session: 5 bug fixes + 6
new feature areas, with full Python ↔ TypeScript parity.
Bug fixes
---------
* fix(client): bump cloudflared quick-tunnel grace 2.5 → 5 s. The 2.5 s
window covered HTTP propagation only — Twilio's WSS upgrade for the
media stream goes through a different cloudflared edge route that
takes ~1-3 s longer; ~5 % of first calls dropped silently at pickup
with no media. 5 s drops the failure rate to <1 %. (client.ts /
client.py)
* fix(realtime): handler-only tools were silently ignored in TS Realtime
mode (CRITICAL). `handleFunctionCall` only dispatched `webhookUrl`
tools; tools with an in-process `handler` callback (the default
pattern in the demos) fell through without sending
`function_call_output`, hanging the model.
* fix(realtime): `onTranscript({ role: 'assistant' })` was never fired.
Assistant text was pushed into history but never surfaced via the
user-supplied callback, so demos only saw `[user]` lines.
* fix(realtime): dashboard transcript shown out of order. OpenAI Realtime
emits `input_audio_transcription.completed` AFTER `response.done`, so
the naïve push order was [assistant, user, ...]. Added a per-call
buffer (`pendingAssistantTurn` + 3 s fallback timer) that holds the
assistant turn until the matching user transcript arrives.
* fix(realtime): tool invocations were invisible in the transcript
timeline. Added `emitToolEvent` that pushes `role: 'tool'` history
entries and fires `onTranscript({ role: 'tool', tool_name, tool_args,
tool_result, ... })` for the call/return semantics.
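The ordering buffer can be expressed with an event-loop timer. A minimal Python sketch, assuming both Realtime events arrive on one asyncio loop; the idea of holding the assistant turn and the 3 s fallback come from the commit, while the class and method names are hypothetical:

```python
import asyncio
from typing import Callable, Optional

Emit = Callable[[str, str], None]  # (role, text)


class TranscriptOrdererSketch:
    """Holds an assistant turn until the matching user transcript arrives, or a fallback fires."""

    def __init__(self, emit: Emit, fallback_s: float = 3.0) -> None:
        self._emit = emit
        self._fallback_s = fallback_s
        self._pending_assistant: Optional[str] = None
        self._timer: Optional[asyncio.TimerHandle] = None

    def on_response_done(self, assistant_text: str) -> None:
        self._flush()  # never hold more than one assistant turn
        self._pending_assistant = assistant_text
        loop = asyncio.get_running_loop()
        self._timer = loop.call_later(self._fallback_s, self._flush)

    def on_user_transcript(self, user_text: str) -> None:
        # The late user transcript goes out first, then the held assistant turn.
        self._emit("user", user_text)
        self._flush()

    def _flush(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        if self._pending_assistant is not None:
            text, self._pending_assistant = self._pending_assistant, None
            self._emit("assistant", text)
```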
Features
--------
* feat(api): `Patter({ persist })` opt-in dashboard persistence. The
on-disk per-call records (metadata.json, transcript.jsonl, events.jsonl)
were previously opt-in only via `PATTER_LOG_DIR`. New explicit option:
`false` (default), `true` (platform default location), or a custom
string path. Env var still works as deployment-time override.
* feat(tools): JSON-schema validation at `agent()` build time +
OpenAI strict-mode opt-in. Schemas are validated structurally for
every tool; `Tool({ strict: true })` additionally enforces OpenAI's
strict-mode requirements (recursive `additionalProperties: false`,
every property in `required`). Catches typos at build time.
* feat(tools): retry with exponential backoff + per-tool circuit
breaker. Both handler and webhook paths now get 3 attempts with
jittered backoff (capped at 5 s). New `CircuitBreakerRegistry` trips
OPEN after 5 consecutive failures and stays OPEN for 30 s before
allowing a HALF_OPEN probe; while OPEN it returns
`{error, fallback: true, circuit_state: "open", retry_after_ms}`.
* feat(tools): reassurance auto-message during long tool calls. New
`Tool({ reassurance: "Let me check..." })` (or
`{ message, afterMs }`) bridges the silence on slow tools by
enqueueing the message via `OpenAIRealtimeAdapter.sendText` after
`afterMs` (default 1500 ms) — cancelled if the tool returns earlier.
Realtime-only for now.
* feat(tools): MCP (Model Context Protocol) client integration (MVP).
New `agent({ mcpServers: [...] })` plugs the agent into MCP servers
(Google Workspace, PayPal, Postgres, GitHub, ...) without writing
wrapper handlers. Each server is queried at call start via
`tools/list`; discovered tools are wrapped with synthetic handlers
that dispatch to `tools/call` and merged into `agent.tools`.
Optional dependency: `@modelcontextprotocol/sdk` (TS) /
`mcp` (Py extra). Streamable-HTTP transport only for now.
* feat(tools): streaming results via async generator handlers. Tool
handlers can be `async function*` (TS) / `async def ... yield` (Py)
generators that emit `{ progress: "..." }` updates while running;
each yield is sent to the agent via `sendText` for inline status.
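A compact sketch of the retry + circuit-breaker policy described above. The numbers (3 attempts, 5 s backoff cap, 5 consecutive failures, 30 s cooldown) and the fallback dict keys come from this commit; the base delay, the full-jitter formula, counting one breaker failure per exhausted call, and every name here are assumptions:

```python
import asyncio
import random
import time
from enum import Enum
from typing import Any, Awaitable, Callable, Optional


class CircuitState(str, Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreakerSketch:
    def __init__(self, threshold: int = 5, cooldown_ms: int = 30_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_ms = cooldown_ms
        self._clock = clock
        self.state = CircuitState.CLOSED
        self.failures = 0
        self._opened_at = 0.0

    def retry_after_ms(self) -> int:
        elapsed_ms = (self._clock() - self._opened_at) * 1000
        return max(0, int(self.cooldown_ms - elapsed_ms))

    def allow(self) -> bool:
        if self.state is CircuitState.OPEN:
            if self.retry_after_ms() > 0:
                return False
            self.state = CircuitState.HALF_OPEN  # cooldown elapsed: allow a probe
        return True

    def record_success(self) -> None:
        self.state, self.failures = CircuitState.CLOSED, 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.state is CircuitState.HALF_OPEN or self.failures >= self.threshold:
            self.state, self._opened_at = CircuitState.OPEN, self._clock()


def jittered_backoff_s(attempt: int, base_s: float = 0.25, cap_s: float = 5.0) -> float:
    # "Full jitter": uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap_s, base_s * 2 ** attempt))


async def call_tool(fn: Callable[[], Awaitable[Any]], breaker: CircuitBreakerSketch,
                    attempts: int = 3,
                    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep) -> Any:
    if not breaker.allow():
        return {"error": "circuit open", "fallback": True,
                "circuit_state": "open", "retry_after_ms": breaker.retry_after_ms()}
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            result = await fn()
        except Exception as exc:  # a real executor would likely narrow this
            last_error = exc
            if attempt + 1 < attempts:
                await sleep(jittered_backoff_s(attempt))
            continue
        breaker.record_success()
        return result
    breaker.record_failure()  # one breaker failure per exhausted call (assumption)
    return {"error": str(last_error), "fallback": True, "circuit_state": breaker.state.value}
```

Under concurrency this sketch lets every caller through while HALF_OPEN; a production breaker would normally admit a single probe.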
New files
---------
* libraries/typescript/src/tools/schema-validation.ts
* libraries/typescript/src/tools/circuit-breaker.ts
* libraries/typescript/src/tools/mcp-client.ts
* libraries/python/getpatter/tools/schema_validation.py
* libraries/python/getpatter/tools/circuit_breaker.py
* libraries/python/getpatter/tools/mcp_client.py
* test files: 4 TS + 3 Py covering schema validation, breaker, streaming, reassurance
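The new `schema_validation` modules enforce the strict-mode rules listed under Features (recursive `additionalProperties: false`, every property in `required`). A minimal recursive checker for just those two rules might look like the following; it is a sketch rather than the shipped validator, its name is hypothetical, and it only descends into object `properties` and array `items`:

```python
from typing import Any, Dict, List


def strict_mode_problems(schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Report where a JSON schema breaks the two strict-mode rules."""
    problems: List[str] = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {', '.join(missing)}")
        for name, sub_schema in properties.items():
            problems.extend(strict_mode_problems(sub_schema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))
    return problems
```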
Tests
-----
1280 TS · 1156 Py · 0 regressions. Updates two stale tests (AMD
on-by-default test in new-features.test.ts; handler retry count in
llm-loop.test.ts) to reflect new behaviour.
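The reassurance behaviour from the Features list (send a message after `afterMs`, cancelled if the tool returns first) maps onto a timer task raced against the tool call. A minimal asyncio sketch; `send_text` stands in for `OpenAIRealtimeAdapter.sendText`, and the function name is hypothetical:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_reassurance(tool_call: Awaitable[T],
                               send_text: Callable[[str], Awaitable[None]],
                               message: str = "Let me check...",
                               after_ms: int = 1500) -> T:
    async def reassure() -> None:
        await asyncio.sleep(after_ms / 1000)
        await send_text(message)

    timer = asyncio.create_task(reassure())
    try:
        return await tool_call
    finally:
        # Tool finished (or failed) first: drop the pending reassurance.
        # cancel() is a no-op if the message already went out.
        timer.cancel()
```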
The dashboard is now a real SPA in `dashboard-app/` (Vite + React +
TypeScript) instead of a 700-line HTML/CSS/JS string embedded in
`dashboard/ui.{ts,py}`. The build pipeline produces a single
self-contained HTML file (vite-plugin-singlefile inlines JS + CSS)
which is committed to `libraries/typescript/src/dashboard/ui.html` and
mirrored to the Python package via `dashboard-app/scripts/sync.mjs`.
At runtime the SDK serves the same `GET /` endpoint as before — the
inlined HTML is loaded by tsup's esbuild ``text`` loader (TS) or the
package-data file (Py). Customer-side: zero change in start-up UX
(`phone.serve()` → http://127.0.0.1:8000/), but the dashboard is now
typed, modular, and maintainable as proper React.
Why this approach (option D from the design discussion):
* No CDN dependency at runtime (no unpkg.com / Babel-in-browser).
* No new runtime deps in the SDK — React + Vite live only at build time
in the dev repo; the published package ships static HTML.
* Self-contained bundle: the SDK still works air-gapped and behind
corporate firewalls.
* Type safety end-to-end (TSX components, tsconfig strict).
Components ported from the reference design:
* Topbar, PageHeader, Metric cards
* CallTable with row selection + search
* LiveCallPanel (transcript stream + call controls)
* LatencyPanel (p50 / p95 / STT / TTS bars)
* CostPanel (per-provider breakdown)
Hooks:
* useDashboardData — fetches `/api/dashboard/calls` + subscribes to the
SSE stream at `/api/dashboard/stream`
* useTranscript — incremental transcript updates per selected call
* mappers.ts — maps the wire format (CallRecord) to the UI shape
Build:
* `dashboard-app/` is its own Vite project with `npm run build && npm
run sync` — sync copies the inlined HTML to both SDKs.
* `libraries/typescript/tsup.config.ts` adds the ``.html`` text loader
so the asset is inlined into `dist/index.{js,mjs}`.
* `libraries/python/pyproject.toml` declares `ui.html` as
`getpatter.dashboard` package-data so it ships with `pip install`.
* `libraries/typescript/package.json` `files` array includes
`src/dashboard/ui.html` so npm packs it.
…s + persist

Documents the two preceding commits in CHANGELOG.md under ``## Unreleased``:

* Added: ``Patter(persist=...)`` option, JSON-schema validation + strict mode, retry + circuit breaker, reassurance auto-message, MCP client integration, streaming results.
* Fixed: Realtime handler-tool dispatch, assistant ``onTranscript``, transcript ordering buffer, tool transcript events, cloudflared quick-tunnel WSS upgrade race.

Per the project rule (``.claude/rules/documentation-best-practices.md`` invariant 0): every user-visible change updates ``## Unreleased`` in the same unit of work.

The dashboard rewrite is intentionally NOT in the changelog: same URL, same UX, same `phone.serve()` entry point; the SPA migration is internal and customer-invisible.
Mirror of the built dashboard SPA into the Python package — produced by ``dashboard-app/scripts/sync.mjs`` alongside the TS-side ``libraries/typescript/src/dashboard/ui.html``. Should have been part of the dashboard SPA commit; tracking it now keeps the two SDKs in parity for ``pip install getpatter``.
Pure formatter pass — splits long argument lists across multiple lines and adds the missing blank line after the conditional ``import fastapi``. No logic changes; the test still verifies the dashboard store and routes the same way.
…ines, realtime mode

Iterative refinement of the React/Vite dashboard SPA shipped in 3877719. Customer-side it remains a single embedded HTML file served from `phone.serve()` at `/`, but the UX is now markedly closer to the target design.

UI changes:

* Real Patter logo: mark (wireframe stack-tile from the favicon path, thin stroke instead of the chunky filled silhouette in `docs/logo/light.svg`) + tightened-viewBox wordmark, sized independently so the wordmark stays large while the mark line weight stays light.
* Tab title: "Patter | Dashboard". Favicon: stack-tile SVG inline, matching the previous SDK dashboard.
* Topbar: dropped Bell / Settings / Avatar buttons and "Place call" CTA (will reintroduce when actually wired). Phone-number pill always shown, derived from the most recent call's Patter-side number.
* Live chip pulse: peach static when zero calls, green pulsing when ≥1 is active.
* Latency + Cost merged into one MetricsPanel with a peach segmented switcher, fixing the right-rail clipping that hid Cost on short viewports. Realtime mode collapses the STT/LLM/TTS waterfall to a single end-to-end bar (those metrics aren't meaningful when the provider does the round-trip in one model call).

Range filter (1h / 24h / 7d / All) is now real:

* Bucket strategy aligned to natural boundaries: 12 × 5min, 24 × 1h, 7 × 1day, plus 9-bucket auto for All. Tooltip ranges read as "11:00 → 12:00" instead of "11:39 → 12:33".
* Filtered call list, headline counters (Calls / Latency p95 / Spend), and sparklines all reflect the active range. Live calls always stay visible even when out of the range so users see what's happening now.
* Sparkline scaling: tallest bar normalises to 100, no more lonely single bar surrounded by ghost grey lines.

Sparklines are now interactive:

* Hover any bar → custom tooltip (instant, dark surface, mono numerics in peach) showing the bucket window, call count, and a 4-call sample (number / status / cost). React-driven, replaces the slow native `title=""`.
* Click → selects the newest call in that bucket into the right rail.
* Empty buckets are invisible (no grey ghosts).
* Bars now sit flush against the card bottom (flex column + `margin-top: auto`), matching the original design.

Export CSV button is now wired to `/api/dashboard/export/calls?format=csv` via a transient anchor download.

Backend additions: none. Every change above is in `dashboard-app/` plus the synced `ui.html` rebundles in both SDKs. Pre-publish flow is still `cd dashboard-app && npm run build && npm run sync`.
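The range filter's "natural boundaries" amount to flooring timestamps to the bucket width. The dashboard itself is React/TypeScript; this Python sketch of the bucket edges assumes UTC alignment (the real UI may align to the browser's local time), ignores the 9-bucket "All" mode, and uses hypothetical names:

```python
from datetime import datetime, timedelta, timezone
from typing import List

# (bucket count, bucket width) per range, as listed in the commit.
RANGES = {
    "1h": (12, timedelta(minutes=5)),
    "24h": (24, timedelta(hours=1)),
    "7d": (7, timedelta(days=1)),
}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to(ts: datetime, step: timedelta) -> datetime:
    """Round `ts` down to a multiple of `step` since the Unix epoch (UTC)."""
    return EPOCH + ((ts - EPOCH) // step) * step


def bucket_edges(now: datetime, range_key: str) -> List[datetime]:
    count, step = RANGES[range_key]
    end = floor_to(now, step) + step  # the bucket containing `now` closes the window
    start = end - count * step
    return [start + i * step for i in range(count + 1)]
```

With `now` at 12:33 the last 24h bucket runs 12:00 → 13:00, which is what produces the "11:00 → 12:00"-style tooltip ranges.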
New TTS adapter calling Inworld's HTTP NDJSON streaming endpoint `POST https://api.inworld.ai/tts/v1/voice:stream`. Defaults to `inworld-tts-2` (sub-200 ms TTFB, 100+ languages, natural-language voice steering); pass `model: "inworld-tts-1.5-max"` for the prior generation. Default audio output is PCM_S16LE at 16 kHz so the result feeds straight into the Patter pipeline without transcoding.

Public API parity:
- TS: `import { InworldTTS } from "getpatter"` / `getpatter/tts/inworld`
- Py: `from getpatter import InworldTTS` / `getpatter.tts.inworld`
- Env-var auto-resolve via `INWORLD_API_KEY` (paste the Base64 token from the Inworld dashboard; it is already in `Authorization: Basic <token>` form).
- Optional knobs: `language` (BCP-47), `temperature` (TTS-1.5 only), `speakingRate` (0.5-1.5), `deliveryMode` (`EXPRESSIVE`/`BALANCED`/`STABLE`, TTS-2 only), `bitrate`.

Pricing entry `inworld` added to both pricing tables (placeholder $0.020/1k chars; verify against current platform tier). Optional dependency `getpatter[inworld]` adds `aiohttp>=3.10`.

7 mocked unit tests per SDK covering payload shape, NDJSON line interleave (`audio, timestamp, audio`), base64 audio decoding, optional field omission, env-var fallback, and non-200 error surfacing.

New files:
- libraries/typescript/src/providers/inworld-tts.ts
- libraries/typescript/src/tts/inworld.ts
- libraries/python/getpatter/providers/inworld_tts.py
- libraries/python/getpatter/tts/inworld.py
- libraries/{typescript,python}/{tests/unit/inworld-tts*.test.*,tests/unit/test_inworld_tts.py}
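The `audio, timestamp, audio` interleave the tests cover can be handled by a line-oriented decoder that skips non-audio lines. A sketch operating on already-received NDJSON lines; the JSON field names (`result.audioContent`, top-level `error`) are assumptions, not confirmed against the Inworld API reference, and the function name is hypothetical:

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Decode base64 audio from an NDJSON stream, skipping non-audio lines.

    Assumed (hypothetical) line shapes: {"result": {"audioContent": "<base64>"}} for audio,
    other keys under "result" for timestamps, and a top-level "error" object on failure.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        message = json.loads(line)
        if "error" in message:
            raise RuntimeError(f"TTS stream error: {message['error']}")
        audio_b64 = message.get("result", {}).get("audioContent")
        if audio_b64:
            yield base64.b64decode(audio_b64)
```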
Adds seven optional async callbacks to every Patter instance plus a read-only
conversation_state snapshot, mirroring the public APIs of LiveKit Agents,
Pipecat and OpenAI Realtime so downstream metrics map onto the canonical
Hamming AI / Coval / Cekura voice-agent metric set without translation:
on_user_speech_started - raw VAD positive edge
on_user_speech_ended - raw VAD trailing edge (not EOU)
on_user_speech_eos - committed end-of-utterance (canonical "user
finished" — anchors eos_to_first_token_ms)
on_agent_speech_started - first wire-time chunk (what user hears)
on_agent_speech_ended - last wire chunk; payload includes interrupted
on_llm_token - TTFT marker, fires once per turn
on_audio_out - first TTS chunk per turn (warmup, distinct
from wire-time)
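Two of the properties the tests below check, callback-exception isolation and the once-per-turn `on_llm_token` marker, can be sketched in a few lines. Class and method names are hypothetical and the OTel span events are omitted:

```python
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

SpeechCallback = Callable[[Dict[str, Any]], Awaitable[None]]
logger = logging.getLogger("speech_events_sketch")


class SpeechEventsSketch:
    def __init__(self, **callbacks: Optional[SpeechCallback]) -> None:
        # Keep only the callbacks the user actually supplied (all are optional).
        self._callbacks = {name: cb for name, cb in callbacks.items() if cb is not None}
        self._llm_token_turn: Optional[int] = None

    async def emit(self, name: str, payload: Dict[str, Any]) -> None:
        callback = self._callbacks.get(name)
        if callback is None:
            return
        try:
            await callback(payload)
        except Exception:
            # A failing user callback must not break the audio pipeline.
            logger.exception("speech event callback %s raised", name)

    async def llm_token(self, turn_id: int) -> None:
        # TTFT marker: only the first token of each turn is reported.
        if self._llm_token_turn == turn_id:
            return
        self._llm_token_turn = turn_id
        await self.emit("on_llm_token", {"turn_id": turn_id})
```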
Each event also records an OpenTelemetry span event on the current call
span (patter.event.*), with gen_ai.* attributes for the LLM event per the
OTel GenAI semconv. OTel branch is a zero-cost no-op when the peer dep is
missing.
Wired into the realtime stream handler so the user/agent edge events fire
automatically on the OpenAI Realtime + Twilio/Telnyx path; LLM/TTS-warmup
events are exposed on the dispatcher for adapter/pipeline integrations.
Public API: SpeechEvents, SpeechEventCallback, ConversationStateSnapshot,
UserState, AgentState, EouTrigger.
Tests: 16 unit tests Py + 15 unit tests TS covering payload schema,
state transitions, idempotency, OTel attach contract, callback-exception
isolation, and Patter-level proxy mirroring.
Motivated by patter-agent-runner's 15 turn-taking acceptance verbs that
previously auto-skipped because the SDK did not surface per-side speech
edges.
…ng for speech-edge events

Three Realtime mode fixes (Python + TypeScript parity) plus the host-binding / observability plumbing required to drive the speech-edge event suite from external test runners.

Realtime: first_message role swap
---------------------------------

Agent.first_message was sent through send_text / sendText, which submits a `conversation.item.create` with role: "user". The model received its own greeting AS user input and replied to it instead of speaking it verbatim. Symptoms ranged from harmless (model continued as assistant anyway) to catastrophic (a receptionist agent saying "Hi! I'd like to schedule a haircut for Friday afternoon": the model swapped role because it interpreted the greeting as a customer cue).

Fix: new OpenAIRealtimeAdapter.send_first_message / sendFirstMessage that calls `response.create` with explicit `instructions: <text>`. This is the documented OpenAI Realtime path for "have the assistant say exactly this", and it produces the correct role attribution in transcripts (previously assistant turns were missing entirely from onTranscript callbacks). StreamHandler calls the new method via duck-typed lookup so older adapter builds without it silently fall back to send_text; no breaking change for downstream provider implementations.

Dashboard: 404 spam from notify_dashboard
-----------------------------------------

Embedded usages where Patter co-tenants port 8000 with another HTTP server (notably the agent-to-agent test runner where driver SDK + UUT SDK + dashboard ingest target all share 127.0.0.1:8000) saw 404 access-log spam on every call from notify_dashboard / notifyDashboard fire-and-forget POSTs. Send-side already swallows errors silently; the noise comes from the receiver's access log. Added a PATTER_DASHBOARD_NOTIFY=0|false|no|off env-var opt-out that skips the POST entirely. Default behaviour unchanged.
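The role-swap fix has two parts: the client event the new adapter method sends, and the duck-typed fallback in StreamHandler. A sketch of both; the payload follows the OpenAI Realtime `response.create` client event (with `response.instructions`), while the helper names are hypothetical:

```python
import json
from typing import Any


def first_message_event(text: str) -> str:
    """Realtime client event asking the model to produce an assistant response from `text`."""
    return json.dumps({"type": "response.create", "response": {"instructions": text}})


async def send_first_message(adapter: Any, text: str) -> None:
    # Duck-typed lookup: newer adapters expose send_first_message, older ones only send_text.
    send_first = getattr(adapter, "send_first_message", None)
    if callable(send_first):
        await send_first(text)
    else:
        await adapter.send_text(text)  # legacy path: arrives as a role "user" item
```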
Speech-edge event plumbing (server + telephony bridge + handler)
---------------------------------------------------------------

The speech-edge events shipped in 0de4111 (on_user_speech_started/ended/eos, on_agent_speech_started/ended, llm_token, audio_out) need to flow Patter → EmbeddedServer → twilio_stream_bridge → StreamHandler. The 0de4111 commit wired the StreamHandler init param but missed the 5 hops in between, so external observers attached via Patter() never saw any events. This commit threads `speech_events` through:

- Patter.__init__ stores the SpeechEvents instance
- EmbeddedServer.from_patter passes it down
- twilio_stream_bridge accepts speech_events kwarg
- OpenAIRealtimeStreamHandler accepts and forwards to super().__init__

Pipeline and ConvAI handlers still TODO (verbs auto-skip when events aren't emitted, so this is non-breaking).

Also added: PipelineStreamHandler._llm_cancel_event init in __init__ (was lazily created on first cancel; race-condition prone), and a try/except ProcessLookupError around tunnel.proc.terminate (the cloudflared subprocess can race-exit before SIGTERM lands).

Audio binding host
------------------

PATTER_BIND_HOST env var (default 127.0.0.1): when running inside a Docker container with --publish 8000:8000, a loopback bind is unreachable from the host's port-mapping. PATTER_BIND_HOST=0.0.0.0 makes the embedded FastAPI / Express bind on all interfaces so Docker can forward host:8000 → container:8000.

CHANGELOG entries for first_message role swap and notify_dashboard opt-out included.
OpenAI Realtime cancel_response now caps audio_end_ms by wall-clock elapsed time (was byte-counter), fixing post-barge-in re-greeting and mid-sentence resume. Files: providers/openai_realtime.py:434-460 + TS parity in providers/openai-realtime.ts.

Pipeline mode now fires on_transcript for assistant turns AND tool calls (previously emitted only by Realtime). LLMLoop exposes an on_tool_call observer wired by stream-handler via _record_tool_call; a new _emit_assistant_transcript helper pushes history AND fires on_transcript with observer-exception isolation. Files: stream_handler.py / stream-handler.ts, services/llm_loop.py / llm-loop.ts.

AssemblyAI STT (Python): coalesce 20 ms Twilio frames to a 60 ms target (above the v3 50 ms minimum), achieving parity with the TS adapter. New flush_audio() drains the tail on close. Files: providers/assemblyai_stt.py.

Cerebras + Groq pricing: silent under-billing fix. gpt-oss-120b (Cerebras default since 0.5.4) and 5 Groq models were all billed at $0. Every CerebrasModel / GroqModel enum value now has a per-1M-token rate. Files: pricing.py / pricing.ts.

TypeScript port of SpeechmaticsSTT: closes the long-standing Python-only gap. Speaks the RT v2 WebSocket protocol directly via ws (no upstream Node SDK). Same options as the Python adapter; the legacy speechmatics() helper now returns a real STTConfig instead of throwing. Files (new): providers/speechmatics-stt.ts, stt/speechmatics.ts.

Pricing tables are now model-aware across STT/TTS/Realtime (were provider-only). New _resolve_provider_rates helper with longest-prefix fallback; mergePricing does a nested-shallow overlay so single-model overrides leave siblings intact. The model is threaded automatically from the agent adapter through CallMetricsAccumulator. Built-in rates for Deepgram, Whisper/Transcribe, ElevenLabs, OpenAI TTS, Cartesia, Rime, LMNT, Inworld, OpenAI Realtime per-model. PRICING_VERSION 2026.2 → 2026.3. Standalone openai_realtime_2 entry collapsed under openai_realtime.models["gpt-realtime-2"].
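Model-aware lookup with a longest-prefix fallback lets dated or suffixed model IDs inherit the rates of their base model. A minimal sketch, assuming a nested `{"models": {...}}` layout like the `openai_realtime.models[...]` entry mentioned above; the function name, provider key, and numbers are placeholders, not the shipped `_resolve_provider_rates`:

```python
from typing import Any, Dict, Mapping, Optional


def resolve_rates_sketch(table: Mapping[str, Dict[str, Any]], provider: str,
                         model: Optional[str]) -> Dict[str, Any]:
    """Provider-level rates, overlaid with the longest model key that prefixes `model`."""
    entry = table[provider]
    rates = {k: v for k, v in entry.items() if k != "models"}
    models = entry.get("models", {})
    if model:
        # An exact key is also a prefix, so it always beats shorter keys.
        candidates = [key for key in models if model.startswith(key)]
        if candidates:
            rates.update(models[max(candidates, key=len)])
    return rates
```

One caveat of pure prefix matching: a key like `model-2` also prefixes an unrelated ID such as `model-2025-01-01`, so model keys need to be chosen with that in mind.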
CircuitBreakerOptions.cooldown_s → cooldown_ms (Python parity with TS cooldownMs). A backward-compat shim emits DeprecationWarning. Files: tools/circuit_breaker.py, tools/tool_executor.py.

OpenAI Realtime engine wrapper now forwards reasoning_effort and input_audio_transcription_model to the underlying adapter (they were silently dropped). Files: engines/openai.py / engines/openai.ts, models.py, client.py, stream_handler.py, server.ts.

CHANGELOG.md: full Unreleased entries for each fix.

package-lock.json regenerated to match package.json (mcp/hono/ajv/jose/zod additions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
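The `cooldown_s` → `cooldown_ms` rename keeps old keyword arguments working through a shim. One common way to express that in a dataclass, sketched with a hypothetical class name and the defaults quoted in the test commit (30 000 ms, threshold 5):

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreakerOptionsSketch:
    failure_threshold: int = 5
    cooldown_ms: Optional[int] = None
    cooldown_s: Optional[float] = None  # deprecated alias kept for old callers

    def __post_init__(self) -> None:
        if self.cooldown_s is not None:
            warnings.warn("cooldown_s is deprecated; use cooldown_ms",
                          DeprecationWarning, stacklevel=3)
            if self.cooldown_ms is None:  # an explicit cooldown_ms wins
                self.cooldown_ms = round(self.cooldown_s * 1000)
        if self.cooldown_ms is None:
            self.cooldown_ms = 30_000  # same default as the TypeScript cooldownMs
```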
Cerebras + Groq pricing (5 cases each in test_pricing.py and pricing.test.ts): full enum coverage, default model billing, specdec-vs-versatile rate distinction, deprecating llama3.1-8b.

AssemblyAI buffering (new test_assemblyai_stt_buffering.py and assemblyai-stt-buffering.test.ts, 5 cases each): 10×20ms frame coalescing, 60ms target sanity for mulaw 8kHz and PCM s16le 16kHz, flush_audio tail drain, pre-connect silent drop, empty-chunk noop, exact-target single send.

Pipeline on_tool_call observer: 3 mocked Python cases plus a new TS describe block in llm-loop.test.ts asserting one user turn → one tool call yields exactly three on_transcript events (call + result + assistant) in order.

OpenAI Realtime cancel_response wall-clock cap: regression test simulating 2 s generated / 30 ms played; asserts audio_end_ms <= 200 ms.

Realtime engine wrapper forwarding: asserts reasoning_effort and input_audio_transcription_model reach the adapter constructor when set, and are omitted from kwargs when unset.

CircuitBreaker cooldown rename: 4 regression tests cover the DeprecationWarning, seconds-shim parity, explicit ms wins, defaults match TypeScript byte-for-byte (30_000ms, threshold 5).

Speechmatics TS port: 21-test mocked suite covering connect handshake, StartRecognition payload, partial vs final translation, error frame propagation, EndOfStream close path with last_seq_no, env-var resolution.

MCP client: new mocked unit suites (Python + TypeScript).

Dashboard HTML test: assertion updated for the React+Vite SPA shell (`<title>Patter | Dashboard</title>`) instead of the legacy "Live calls dashboard" subtitle that no longer lives in the static HTML.

Full suite: Python 1707 / TypeScript 1381, green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
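The AssemblyAI buffering cases above (20 ms frames coalesced to a 60 ms target, tail drained by `flush_audio`, pre-connect drop, empty-chunk no-op) describe a small byte coalescer. A sketch, assuming mulaw 8 kHz is 8 bytes/ms and PCM s16le 16 kHz is 32 bytes/ms; the class name and the fixed-size slicing are assumptions rather than the shipped adapter:

```python
from typing import Callable


class FrameCoalescerSketch:
    """Buffer small carrier frames and forward fixed target-size chunks to the STT socket."""

    def __init__(self, send: Callable[[bytes], None], bytes_per_ms: int, target_ms: int = 60) -> None:
        self._send = send
        self._target_bytes = bytes_per_ms * target_ms
        self._buffer = bytearray()
        self.connected = False

    def push(self, chunk: bytes) -> None:
        if not chunk or not self.connected:
            return  # empty chunks are a no-op; audio before connect is dropped
        self._buffer += chunk
        while len(self._buffer) >= self._target_bytes:
            self._send(bytes(self._buffer[: self._target_bytes]))
            del self._buffer[: self._target_bytes]

    def flush_audio(self) -> None:
        # On close: send whatever tail is left, even if shorter than the target.
        if self.connected and self._buffer:
            self._send(bytes(self._buffer))
            self._buffer.clear()
```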
Adds 33 new provider reference pages — full parity between
docs/python-sdk/providers/ and docs/typescript-sdk/providers/ for:
anthropic, cartesia-tts, cerebras, deepfilternet-filter, deepgram,
elevenlabs-convai, elevenlabs-tts, gemini-live, google, groq, inworld,
krisp-filter, openai-realtime, openai-tts, silero-vad, soniox,
speechmatics, telnyx-stt, telnyx-tts, ultravox-realtime, whisper.
Adds new MCP integration page (Py + TS) covering server
configuration, tool exposure, and lifecycle.
Updates 22 existing docs/{python,typescript}-sdk pages for the Phase 3
SDK changes — agents, call-logging, configuration, dashboard, engines,
events, features, metrics, stt, tools — including the new pricing
model-aware fields, on_tool_call observer, Realtime engine wrapper
forwarding, and circuit-breaker cooldown_ms rename.
docs.json navigation updated for the new provider sections.
docs/dev-tools/dashboard.mdx refreshed for the React+Vite SPA shell.
All pages follow existing Mintlify conventions (CodeGroup, ParamField,
Note components).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Strip trailing blank lines on 4 files flagged by `end-of-file-fixer`
pre-commit hook: `dashboard-app/src/App.tsx`, `docs/python-sdk/events.mdx`,
`docs/typescript-sdk/events.mdx`, `libraries/typescript/src/audio/aec.ts`.
- `tests/unit/test_aec.py` now uses `pytest.importorskip("numpy")` instead
of a top-level `import numpy as np`. numpy ships only with the SDK
`[aec]` / `[audio-filters]` extras, so the base CI Python SDK Tests job
was failing collection with `ModuleNotFoundError: No module named 'numpy'`.
The `[all-extras]` job installs numpy and exercises these tests for real.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s, chunk_size

The HTTP-streaming ElevenLabs facade had a narrower `__init__` signature than the underlying `providers.elevenlabs_tts.ElevenLabsTTS` provider, accepting only api_key/voice_id/model_id/output_format. Users who built a TTS via the public facade silently lost language-aware synthesis and could not pass voice_settings. Multilingual scenarios in the agent-to-agent acceptance suite (`feat_italian_language_live`) crashed downstream once the runner started forwarding `language_code` correctly to the public class.

Python `libraries/python/getpatter/tts/elevenlabs.py:42-55`: extended __init__ with language_code (default None), voice_settings (default None), chunk_size (default 4096). Backward-compatible: existing constructors compile and run unchanged.

TypeScript `libraries/typescript/src/tts/elevenlabs.ts:6-17,50-58`: added languageCode + voiceSettings to ElevenLabsTTSOptions, tightened fields with `readonly` to match the project immutability rule, switched the super-class call to the options-object overload so the optional fields actually reach the underlying ElevenLabsTTS adapter (the positional overload was dropping them).

7 new regression tests in `libraries/python/tests/unit/test_tts_facade_language.py` and `libraries/typescript/tests/tts-facade-language.test.ts` covering: language_code forwarding, voice_settings forwarding, default preservation, env-key resolution, explicit-key wins, missing-key error, and forTwilio() carrier factory regression. Both suites pass clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the Unreleased block accumulated across the refactor wave (58 prior commits on this branch since 0.5.4 — Realtime fixes, pipeline observability, Mintlify docs parity, ElevenLabs facade fix, model-aware pricing, Cerebras/Groq billing fix, AssemblyAI buffering, MCP support, Speechmatics TS port). Adds an empty `## Unreleased` placeholder for post-0.6.0 work. Versions in libraries/python/getpatter/__init__.py, libraries/python/pyproject.toml, libraries/typescript/package.json were already bumped to 0.6.0 earlier in this branch — this commit only finalises the CHANGELOG date stamp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `manageWebhook?: boolean` (default `true`) to `ServeOptions`. When set to `false`, `serve()` skips the call to `autoConfigureCarrier`, leaving the carrier's `voice_url` untouched. Closes a hidden footgun for users running the SDK behind a router/gateway whose Twilio webhook is managed externally (Terraform, an edge gateway, a voice-router function in front of the agent): without this opt-out, every boot silently overwrites the externally-managed value, bypassing the gating layer.

`tunnel: true` overrides the opt-out: the tunnel hostname is dynamic and only known at runtime, so the carrier MUST be reconfigured for inbound calls to land.

Parity note: Python SDK has never auto-configured carriers, so this brings TS `manageWebhook: false` mode in line with default Python behaviour. No Python change required.

Files:
- libraries/typescript/src/types.ts: `ServeOptions.manageWebhook?` + full doc comment.
- libraries/typescript/src/client.ts:507-518: gated `autoConfigureCarrier` call on `wantsCarrierManagement = opts.manageWebhook !== false || wantsCloudflared`.
- libraries/typescript/tests/unit/client.test.ts: 3 authentic tests under `serve() > manageWebhook opt-out` swapping globalThis.fetch to capture Twilio API URLs (no mock-on-mock).
- docs/typescript-sdk/local-mode.mdx: added `manageWebhook` row to the ServeOptions table.
- CHANGELOG.md: Added entry under 0.6.0 (2026-05-08) > Added.

Backward compat: default `true` preserves existing behaviour byte-for-byte; no required field, no changed default.

Authentic tests pass (TS suite 31/31 in client.test.ts file). tsc --noEmit clean.

Docs/DEVLOG entry from upstream PR #84 intentionally not ported (development log style, not for the public docs).

Original PR: #84

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 8, 2026
## Summary

Repository cleanup, internal restructure, and bug-fix wave. Most changes are internal hardening; the breaking surfaces are the package-tree reorganisation and the `Agent.provider` field type tightening, both flagged below with migration guidance.

- `tsc --noEmit` clean

## Implementation
### Cleanup
- `.claude/rules/no-competitor-references.md` codifies the policy going forward.
- `.env.example` files grouped by role (telephony, LLM, STT, TTS).
- `LICENSE` updated to "Copyright (c) 2026 Patter Contributors".
- `examples/`, root `tests/`, `playwright.config.ts` + `tests/e2e/*` (E2E lives in downstream test repo), unused `Dockerfile` bits.
- `DEVLOG` rule (`docs/DEVLOG.md` was never created; CHANGELOG and commit messages already cover the same ground).

### Provider enums
Every voice / LLM / STT / TTS provider that historically accepted hardcoded model / voice / format strings now exposes a typed enum (`StrEnum` / `IntEnum` in Python, const-object + value-union in TypeScript). String form is still accepted for backward compatibility. Coverage: AssemblyAI, Cartesia (STT/TTS), Cerebras, Deepgram, ElevenLabs (HTTP/WS), Gemini Live, Google, Groq, LMNT, OpenAI (Realtime/TTS), Rime, Silero, Soniox, Speechmatics, Telnyx (STT/TTS), Ultravox, Whisper, Anthropic, Krisp.

### Error taxonomy
Add an `ErrorCode` enum with 10 stable values (`CONFIG`, `CONNECTION`, `AUTH`, `TIMEOUT`, `RATE_LIMIT`, `WEBHOOK_VERIFICATION`, `INPUT_VALIDATION`, `PROVIDER_ERROR`, `PROVISION`, `INTERNAL`) attached as a default `.code` attribute on every `PatterError` subclass. Class-name catches keep working; the enum lets downstream code branch on a stable code instead of class-name strings. Re-exported from both package roots.

### Pricing enum
`PricingUnit` enum (`MINUTE` / `THOUSAND_CHARS` / `TOKEN`) replaces free-form unit strings on every `ProviderPricing` row. The TypeScript server back-compat shim still accepts plain strings on the wire.

### Repo + internal layout
- `sdk-py/` → `libraries/python/`, `sdk-ts/` → `libraries/typescript/` (mcp-use-style layout). Top-level user-facing imports are unchanged (`pip install getpatter`, `npm install getpatter`).
- `Agent.provider` typed as a closed enum `ProviderMode = "openai_realtime" | "elevenlabs_convai" | "pipeline"` (TypeScript mirrors). Free-form strings now error.
- `telephony/`, `audio/`, `tools/` subfolders consistently across both SDKs. The `tts/` and `stt/` namespaces stay (they expose `getpatter.{tts,stt}.<provider>.{TTS,STT}` with env-var auto-resolve and are real public API).

### Bug-fix wave (16 fixes)
- `asyncio.Event` (Py) / `AbortController` (TS): previously the LLM kept generating tokens after a user interrupt.
- `llm-loop` HTTP errors now `throw PatterConnectionError`; previously the stream silently returned empty.
- `cache_read_tokens` (was `cache_read_input_tokens`, which silently zeroed cache attribution on OpenAI/Google).
- `system_prompt` (opt-out via `disable_phone_preamble`).
- `text_transforms.py`: precompile 14 markdown regexes (previously recompiled on every call).
- `defineTool` → `tool` (renamed to match Python).
- `event_bus` listener errors are now logged (previously swallowed).
- `metrics.agent_response_ms` accepts `0.0` (previously treated as missing by a truthiness check).
- `metrics` emits `llm/stt/tts_ttfb_ms` events on the EventBus (Py parity).
- `queryTelephonyCost` catch block now logs (previously a silent return).
- `target_sample_rate=8000` collapses the 24k→16k + 16k→8k chain (perf, opt-in).
- `output_format=ulaw_8000` when paired with Twilio.

## Parity ports (Python ← TypeScript)
- `DefaultToolExecutor`: standalone tool dispatcher.
- `LLMChunk`: streaming LLM output chunk.
- `builtin_clip_path`: background-audio clip path resolver.
- `select_sound_from_list`: weighted random selection helper.
- `resample_24k_to_16k`: one-shot stateless helper.

## Documentation sweep
Every public module, class, function, method, interface, and exported type in both SDKs now has at least a one-line description. Pre-existing docstrings were left untouched. ~75 Python docstrings + ~290 TypeScript JSDoc blocks added across 100+ files. No behaviour changes.
## Breaking changes
- `Agent.provider` is now a closed enum: passing arbitrary strings (e.g. the typo `"openai-realtime"`) raises an error. Migration: use one of `"openai_realtime"`, `"elevenlabs_convai"`, `"pipeline"`.
- Python: `getpatter.handlers.*` and `getpatter.services.{transcoding,pcm_mixer,background_audio,tool_decorator,tool_executor}` moved. Migration:
  - `getpatter.handlers.{twilio,telnyx}_handler` → `getpatter.telephony.{twilio,telnyx}`
  - `getpatter.handlers.common` → `getpatter.telephony.common`
  - `getpatter.handlers.stream_handler` → `getpatter.stream_handler` (top-level)
  - `getpatter.services.{transcoding,pcm_mixer,background_audio}` → `getpatter.audio.{...}`
  - `getpatter.services.{tool_decorator,tool_executor}` → `getpatter.tools.{...}`
- TypeScript: `src/carriers/{twilio,telnyx}` → `src/telephony/{twilio,telnyx}`; top-level `transcoding.ts` and `tool-decorator.ts` → `audio/transcoding.ts` and `tools/tool-decorator.ts`.
- The `DEVLOG` workflow rule is removed; CHANGELOG.md and commit messages cover documentation needs going forward.

Public top-level imports (`from getpatter import ...` / `import { Patter } from "getpatter"`) are unchanged.

## Test plan
- `pytest libraries/python/tests/`: 1538 passed, 8 skipped
- `npm test` (vitest): 1224 passed across 68 files
- `npm run lint` (`tsc --noEmit`): clean
- Feature inventory (`patter-assets/patter_sdk_features.xlsx`) updated with 17 new rows

## Docs updates
- `CHANGELOG.md`: full Unreleased section
- `libraries/{python,typescript}/CLAUDE.md`: folder layout updated
- `.claude/rules/README.md`: DEVLOG rule entry removed
- `docs/` (Mintlify): to be aligned by the `docs-sync` agent post-merge