refactor: repo cleanup, internal restructure, parity ports + bug-fix wave by nicolotognoni · Pull Request #83 · PatterAI/Patter

nicolotognoni · 2026-05-02T22:23:19Z

Summary

Repository cleanup, internal restructure, and bug-fix wave. Most changes are internal hardening; the breaking surfaces are the package-tree reorganisation and the Agent.provider field type tightening — both flagged below with migration guidance.

23 commits, 470 files touched (+6.9k / -2.8k lines)
49 originally-tracked tasks closed; 5 parity gaps closed in a follow-up commit
Tests: Python 1538 passed, TypeScript 1224 passed, tsc --noEmit clean

Implementation

Cleanup

Strip inherited third-party license headers (Apache 2.0 blocks from external ports) and LiveKit / Pipecat provenance comments from source files. New rule .claude/rules/no-competitor-references.md codifies the policy going forward.
Per-SDK .env.example files grouped by role (telephony, LLM, STT, TTS).
Root and per-SDK LICENSE updated to "Copyright (c) 2026 Patter Contributors".
Drop in-repo examples/ (examples/notebooks/ is kept as the tutorial series), root tests/, playwright.config.ts + tests/e2e/* (E2E lives in the downstream test repo), unused Dockerfile bits.
Delete the unused DEVLOG rule (docs/DEVLOG.md was never created; CHANGELOG and commit messages already cover the same ground).

Provider enums

Every voice / LLM / STT / TTS provider that historically accepted hardcoded model / voice / format strings now exposes a typed enum (StrEnum / IntEnum in Python, const-object + value-union in TypeScript). String form is still accepted for backward compatibility. Coverage: AssemblyAI, Cartesia (STT/TTS), Cerebras, Deepgram, ElevenLabs (HTTP/WS), Gemini Live, Google, Groq, LMNT, OpenAI (Realtime/TTS), Rime, Silero, Soniox, Speechmatics, Telnyx (STT/TTS), Ultravox, Whisper, Anthropic, Krisp.

Error taxonomy

Add an ErrorCode enum with 10 stable values (CONFIG, CONNECTION, AUTH, TIMEOUT, RATE_LIMIT, WEBHOOK_VERIFICATION, INPUT_VALIDATION, PROVIDER_ERROR, PROVISION, INTERNAL) attached as a default .code attribute on every PatterError subclass. Class-name catches keep working; the enum lets downstream code branch on a stable code instead of class-name strings. Re-exported from both package roots.

Pricing enum

PricingUnit enum (MINUTE / THOUSAND_CHARS / TOKEN) replaces free-form unit strings on every ProviderPricing row. The TypeScript server back-compat shim still accepts plain strings on the wire.

Repo + internal layout

sdk-py/ → libraries/python/, sdk-ts/ → libraries/typescript/ (mcp-use-style layout). Package names and top-level user-facing imports are unchanged (pip install getpatter, npm install getpatter).
Agent.provider typed as a closed enum ProviderMode = "openai_realtime" | "elevenlabs_convai" | "pipeline" (TypeScript mirrors). Free-form strings now error.
Internal modules grouped into telephony/, audio/, tools/ subfolders consistently across both SDKs. The tts/ and stt/ namespaces stay (they expose getpatter.{tts,stt}.<provider>.{TTS,STT} with env-var auto-resolve and are real public API).
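In Python, `typing.Literal` is enforced only by static type checkers. This PR does not show where the SDK raises for a free-form string, so the following is one hedged way to make a typo such as "openai-realtime" also fail at runtime: derive the allowed set from the alias with `typing.get_args`. `validate_provider` is a hypothetical helper.

```python
from typing import Literal, get_args

# Alias as named in this PR; the runtime check below is an illustrative sketch.
ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]
_VALID_MODES: frozenset[str] = frozenset(get_args(ProviderMode))


def validate_provider(value: str) -> ProviderMode:
    """Reject anything outside the closed set, e.g. the typo "openai-realtime"."""
    if value not in _VALID_MODES:
        raise ValueError(
            f"unknown provider {value!r}; expected one of {sorted(_VALID_MODES)}"
        )
    return value  # type: ignore[return-value]
```

Deriving the set from the alias keeps a single source of truth: adding a mode to `ProviderMode` updates both the type checker and the runtime check.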

Bug-fix wave (16 fixes)

Barge-in now cancels in-flight LLM stream (asyncio.Event Py / AbortController TS) — previously LLM kept generating tokens after user interrupt.
TS llm-loop HTTP errors now throw PatterConnectionError — was silently returning empty stream.
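Cache token attribution: llm_loop now reads cache_read_tokens / cache_write_tokens, the keys the providers actually emit (it was reading the Anthropic-style cache_read_input_tokens / cache_creation_input_tokens, which silently zeroed attribution on OpenAI/Google).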
Auto-prepend phone preamble to system_prompt (opt-out via disable_phone_preamble).
text_transforms.py: precompile 14 markdown regexes (was: per-call recompile).
TS defineTool → tool (rename to match Python).
Py event_bus listener errors now logged (were swallowed).
Py SSRF webhook URL validator + per-IP WS cap (parity with TS).
Py voicemail POST: explicit 10 s timeout (was unbounded).
Py metrics.agent_response_ms accepts 0.0 (was treated as missing via truthy check).
TS metrics emits llm/stt/tts_ttfb_ms events on the EventBus (Py parity).
TS queryTelephonyCost catch logs (was silent return).
OpenAI TTS: opt-in target_sample_rate=8000 collapses the 24k→16k + 16k→8k resample chain into a single 3:1 decimation (perf).
ElevenLabs WS TTS auto-flips output_format=ulaw_8000 when paired with Twilio.
Sentence chunker: per-language honorifics (en/it/es/de/fr/pt) + single-word "Yes."/"Done!" flush.

Parity ports (Python ← TypeScript)

DefaultToolExecutor — standalone tool dispatcher.
LLMChunk — streaming LLM output chunk.
builtin_clip_path — background-audio clip path resolver.
select_sound_from_list — weighted random selection helper.
resample_24k_to_16k one-shot stateless helper.

Documentation sweep

Every public module, class, function, method, interface, and exported type in both SDKs now has at least a one-line description. Pre-existing docstrings were left untouched. ~75 Python docstrings + ~290 TypeScript JSDoc blocks added across 100+ files. No behaviour changes.

Breaking changes

Agent.provider is now a closed enum — passing arbitrary strings (e.g. typo "openai-realtime") errors. Migration: use one of "openai_realtime", "elevenlabs_convai", "pipeline".
Internal import paths moved for callers reaching into getpatter.handlers.* / getpatter.services.{transcoding,pcm_mixer,background_audio,tool_decorator,tool_executor}. Migration:
- getpatter.handlers.{twilio,telnyx}_handler → getpatter.telephony.{twilio,telnyx}
- getpatter.handlers.common → getpatter.telephony.common
- getpatter.handlers.stream_handler → getpatter.stream_handler (top-level)
- getpatter.services.{transcoding,pcm_mixer,background_audio} → getpatter.audio.{...}
- getpatter.services.{tool_decorator,tool_executor} → getpatter.tools.{...}
- TypeScript: src/carriers/{twilio,telnyx} → src/telephony/{twilio,telnyx}; top-level transcoding.ts → audio/transcoding.ts, services/background-audio.ts → audio/background-audio.ts, and tool-decorator.ts → tools/tool-decorator.ts.
The unused DEVLOG workflow rule is removed; CHANGELOG.md and commit messages cover documentation needs going forward.

Public top-level imports (from getpatter import ... / import { Patter } from "getpatter") are unchanged.

Test plan

Python: pytest libraries/python/tests/ — 1538 passed, 8 skipped
TypeScript: npm test (vitest) — 1224 passed across 68 files
TypeScript: npm run lint (tsc --noEmit) — clean
Competitor-lineage scan: no flags
Feature inventory (patter-assets/patter_sdk_features.xlsx) updated with 17 new rows

Docs updates

CHANGELOG.md — full Unreleased section
libraries/{python,typescript}/CLAUDE.md — folder layout updated
.claude/rules/README.md — DEVLOG rule entry removed
docs/ (Mintlify) — to be aligned by docs-sync agent post-merge

Repo-wide pass to remove external license headers, "ported from" notes and competitor product names from source files, plus three runtime fixes and one missing best-practice feature surfaced by the audit.

## Cleanup (zero residual livekit/pipecat/apache references)

- Removed Apache 2.0 header blocks from 12 Python + 12 TypeScript provider files (the headers travelled in from external ports; Patter ships under the root MIT LICENSE only, with no per-file copyright notices).
- Stripped "Adapted from livekit-plugins-X" / "Ported from pipecat" / "Based on LiveKit Agents" provenance comments across ~40 source files in sdk-py/getpatter/{services,providers,observability,resources,evals}/ and sdk-ts/src/{services,providers,observability}/, including the cartesia-stt USER_AGENT integration tag.
- Rewrote competitor framing in 12 docs MDX pages (provider docs, patter-tool, call-logging); descriptions now stand on Patter's own shape, with no migration-from-X language.
- Renamed test fixtures and variables that named LiveKit/Pipecat in sentence_chunker tests (Py + TS) and the parity scenario JSON; test logic preserved.
- Removed personal-name copyright in LICENSE / sdk-py/LICENSE / sdk-ts/LICENSE in favour of "Patter Contributors".
- .gitignore: ignore .ruff_cache/, sdk/ (legacy build dir from the pre-rename Python SDK), .agents/, skills-lock.json.

## Bug fixes

- llm_loop.py:420-421 (Python): cache_read_input_tokens / cache_creation_input_tokens were Anthropic-style keys, but every Python provider emits cache_read_tokens / cache_write_tokens. Fix reads the keys the providers actually emit, so OpenAI / Google cache attribution is no longer silently zeroed.
- llm-loop.ts:304-308 (TS): non-OK upstream HTTP responses were logged and silently swallowed; callers couldn't distinguish empty model output from API failure. Now throws PatterConnectionError with the status + truncated body.

## Performance

- text_transforms.py: precompiled the 14 markdown regex patterns and 2 emoji-cleanup helpers as module-level constants; they previously recompiled on every sentence flush. Drop-in win; public API and 37/37 existing tests unchanged.

## Feature: default phone-call preamble

- New Agent.disable_phone_preamble (Py) / disablePhonePreamble (TS) field, default False. When False, LLMLoop prepends a short spoken-language preamble to system_prompt instructing the model to avoid markdown / emojis / bullet lists and keep replies concise.
- Wired through stream_handler and test_mode in both SDKs.
- Adds two Py tests and one TS test covering the new behaviour.

## Test status

- Python: 1466 passed, 8 skipped
- TypeScript: 1164/1164 passed
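A minimal sketch of the opt-out logic for the phone preamble, assuming the preamble is joined to the user's system prompt with a blank line. The `PHONE_PREAMBLE` wording and the `build_system_prompt` helper are hypothetical; the commit only states what the preamble instructs (no markdown, emojis, or bullet lists; concise replies).

```python
# Hypothetical preamble text; the SDK's actual wording is not shown in this PR.
PHONE_PREAMBLE = (
    "You are speaking on a phone call. Reply in plain spoken language: "
    "no markdown, no emojis, no bullet lists, and keep answers concise."
)


def build_system_prompt(system_prompt: str, disable_phone_preamble: bool = False) -> str:
    """Prepend the phone preamble unless the agent opted out."""
    if disable_phone_preamble:
        return system_prompt
    if not system_prompt:
        return PHONE_PREAMBLE
    return f"{PHONE_PREAMBLE}\n\n{system_prompt}"


print(build_system_prompt("You book dentist appointments."))
```

Putting the preamble first means the user's own instructions come last, so they can still refine or override its guidance.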

- sdk-py/.env.example, sdk-ts/.env.example: full inventory of every env var the SDK reads at runtime, grouped by role (telephony, LLM providers, STT, TTS, tracing, Patter tunables). Only OPENAI_API_KEY + a telephony carrier is required; the rest are uncommented as the user enables specific provider integrations.
- .env.example.cloud removed: its variables (PATTER_DATABASE_URL, PATTER_ENCRYPTION_KEY, PATTER_REDIS_URL, etc.) belonged to the hosted cloud surface that was retired in 0.5.3.
- Root .env.example kept as a minimal quickstart sample.

Replace the magic strings ``"minute"`` / ``"1k_chars"`` / ``"token"`` sprinkled across DEFAULT_PRICING with a named enum, so the pricing table reads as a typed shape rather than free-form dicts.

- Python: ``PricingUnit(StrEnum)`` with ``MINUTE``, ``THOUSAND_CHARS``, ``TOKEN``. Subclassing ``str`` keeps the dict JSON-serialisable and unchanged for any consumer that compares against the literal string.
- TypeScript: ``PricingUnit`` const object + ``PricingUnitValue`` union type. ``ProviderPricing.unit`` accepts ``PricingUnitValue | string`` so user overrides loaded from JSON / env config still flow through ``mergePricing`` without type gymnastics.
- Behaviour preserved end-to-end: 143 Python pricing/metrics tests pass, 18 TypeScript pricing tests pass, full suites 1466 Py / 1164 TS green.

…xamples mcp-use-style monorepo layout: each SDK gets its own library folder with README, CLAUDE.md, .env.example, tests/, and the package source. Sample code is maintained in separate example repos and is no longer tracked here (notebooks tutorial preserved: it's the documentation, not an example).

## Layout

```
libraries/
├── python/ (was sdk-py/)
│   ├── README.md, CLAUDE.md, LICENSE, .env.example
│   ├── pyproject.toml, pytest.ini
│   ├── getpatter/
│   └── tests/
└── typescript/ (was sdk-ts/)
    ├── README.md, CLAUDE.md, LICENSE, .env.example
    ├── package.json, tsconfig.json, vitest.config.ts, tsup.config.ts
    ├── src/
    └── tests/
```

## What changed

- 405 ``git mv`` renames so history follows every file. ``sdk-py/`` and ``sdk-ts/`` no longer exist on disk.
- Per-library CLAUDE.md guides (~40 lines each); .gitignore exception ``!libraries/*/CLAUDE.md`` so the library guides ARE tracked while the root guide stays ignored.
- CI: ``.github/workflows/{audit,release,test,docs-feature-drift}.yml`` rewritten to the new paths. ``scripts/check_feature_docs_drift.py`` also fixed (it had a stale ``patter/__init__.py`` from the pre-rename era).
- Pre-commit, pre-push, ``scripts/pr-validate.sh``, top-level README and CONTRIBUTING.md re-pointed at ``libraries/{python,typescript}``.
- Internal package re-organisation (``handlers → telephony``, splitting ``audio/``, ``tools/``) deliberately deferred to a follow-up PR; that layer of import-path churn doesn't belong in the same commit as the outer rename.

## Examples

``examples/{developer,enterprise,startup,integrations}/`` removed (24 files + the index README). Sample code is published in dedicated repos. ``examples/notebooks/`` kept: it's the 24-notebook tutorial series documented in the Mintlify site and depended on by ``.github/workflows/notebooks.yml`` and ``.pre-commit-config.yaml``. PatterTool docs now point at the external example repo (TODO comment left for the canonical URL, to fill in once the public examples repo is public).

## Test status

- Python: 1413 passed, 6 skipped (pytest libraries/python/tests)
- TypeScript: 1164 passed, 67 files (vitest run libraries/typescript)
- TypeScript: ``tsc --noEmit`` clean (one pre-existing ``@ts-expect-error`` in silero-vad, predating this branch)

Wave 2 of the cleanup pass: covers half of the provider integrations. Replaces hardcoded model/voice/format/sample-rate strings with typed enums (Python ``StrEnum`` / ``IntEnum``, TypeScript ``const`` objects + union types) so user code gets autocomplete and the type system catches typos at the call site instead of at the provider's HTTP 400.

## Agent / public types

- ``Agent.provider`` (Python) tightened from ``str`` to a ``ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]`` alias. TS counterpart was already a string union.
- Expanded ``Agent`` (Py) and ``AgentOptions`` (TS) docstrings to document the precedence rule for fields that appear both on the agent and on the engine adapter (``voice``, ``model``, ``language``): explicit kwarg on ``agent()`` wins; otherwise the engine value populates the agent via ``_unpack_engine``; otherwise the default.
- No behaviour change. ``StrEnum`` subclasses ``str``; existing callers passing raw strings keep working.

## Providers covered

- Python: ``anthropic_llm``, ``cartesia_tts``, ``cerebras_llm``, ``deepgram_stt``, ``elevenlabs_tts``, ``google_llm``, ``groq_llm``, ``lmnt_tts``, ``openai_realtime``, ``rime_tts``.
- TypeScript: ``anthropic-llm``, ``cerebras-llm``, ``deepgram-stt``, ``elevenlabs-tts``, ``google-llm``, ``groq-llm``, ``lmnt-tts``, ``openai-realtime``, ``rime-tts``.

Each module now exposes its own ``<Provider>Model`` / ``<Provider>OutputFormat`` / ``<Provider>Voice`` / etc. New enums are re-exported from ``__init__.py`` and ``index.ts`` in dedicated "provider-specific enums" sections.

## Still pending

The following providers still hold magic strings and are covered in a follow-up commit: ``assemblyai_stt``, ``soniox_stt``, ``speechmatics_stt``, ``cartesia_stt``, ``telnyx_stt``, ``whisper_stt``, ``elevenlabs_ws_tts``, ``openai_tts``, ``telnyx_tts``, ``gemini_live``, ``ultravox_realtime``, ``silero_vad``, ``silero_onnx``, ``krisp_*``. The TS ``cartesia-tts.ts`` enums also still need to land (Py is already covered).

## Test status

- Python: 1466 passed, 8 skipped
- TypeScript: 1164/1164 passed; ``tsc --noEmit`` clean (one pre-existing silero-vad warning unchanged)

Provider enum residuals (Wave 2.5)

- Python: assemblyai_stt, cartesia_stt, soniox_stt, speechmatics_stt, telnyx_stt, whisper_stt, elevenlabs_ws_tts, openai_tts, telnyx_tts, gemini_live, ultravox_realtime, silero_vad, silero_onnx, krisp_*
- TypeScript: assemblyai-stt, cartesia-stt, cartesia-tts, soniox-stt
- All hardcoded model/voice/format strings now live behind StrEnum/IntEnum (Python) or const-object + value union (TypeScript)

Bug fixes (Wave 3a)

- stream_handler: barge-in now sets asyncio.Event / AbortController to cancel in-flight LLM stream instead of letting it run to completion
- server (Py): SSRF validator on outbound webhook URLs + per-IP WS cap (MAX_WS_PER_IP=10) for parity with TS
- server (Py): voicemail POST gets explicit 10s timeout
- metrics (Py): agent_response_ms accepts 0.0 instead of treating it as "missing" (use is None gate)
- metrics (TS): emit llm/stt/tts TTFB events on the event bus
- observability/event_bus (Py): listener errors now surface to logger instead of being swallowed
- server (TS): queryTelephonyCost catch logs instead of silent return

Stable, machine-readable error codes attached to every Patter exception class. Existing class-name-based catches keep working; the enum is additive.

ErrorCode values (10): CONFIG, CONNECTION, AUTH, TIMEOUT, RATE_LIMIT, WEBHOOK_VERIFICATION, INPUT_VALIDATION, PROVIDER_ERROR, PROVISION, INTERNAL.

- Python: StrEnum in `exceptions.py`; class-default `code` attribute per subclass (PatterError → INTERNAL, PatterConnectionError → CONNECTION, AuthenticationError → AUTH, ProvisionError → PROVISION, RateLimitError → RATE_LIMIT). Optional `code=` kwarg on the base ctor lets callers override per-instance.
- TypeScript: const-object + value union in `errors.ts`; `readonly code: ErrorCode` on every class; optional `{ code }` constructor option. Same class→code mapping byte-for-byte with Python.
- Both SDKs re-export `ErrorCode` from the package root.
- Test parity asserts the enum value sets match between SDKs.

Companion to 8b8c503 (test files). Ships the actual enum + class wiring:

- libraries/python/getpatter/exceptions.py: ErrorCode StrEnum, default .code per subclass, optional code= kwarg on PatterError.__init__
- libraries/python/getpatter/__init__.py: re-export ErrorCode
- libraries/typescript/src/errors.ts: ErrorCode const-object + value union, readonly code on every class, optional { code } ctor option
- libraries/typescript/src/index.ts: re-export ErrorCode

…d with Twilio

ElevenLabs WS TTS streams `ulaw_8000` natively. When the carrier is Twilio (mulaw 8 kHz), we can let ElevenLabs do the encoding server-side and skip the SDK-side mulaw transcode entirely.

- ElevenLabsWebSocketTTS.set_telephony_carrier(carrier) / TS setTelephonyCarrier(carrier): duck-typed hook called by the stream handler after TTS construction. Maps "twilio" → "ulaw_8000", "telnyx" → "pcm_16000" (lowest conversion).
- output_format constructor arg becomes truly optional (sentinel); a user-passed format wins over the carrier hint.
- for_twilio / for_telnyx factories already pass explicit formats → the carrier hint is a no-op for those callers.
- 7 new unit cases per SDK in TestCarrierAutoFlip / equivalent: default flip, URL contains ulaw_8000, telnyx no-op, explicit format respected, factory wins, unknown carrier no-op.

No public-API break: existing constructor calls behave identically when no carrier hook is wired up.
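This sketch covers only the format-resolution logic described above: a sentinel so an explicit `output_format` always wins, a carrier map, and a duck-typed call site. The carrier → format mapping comes from the commit. `ElevenLabsWSFormatSketch`, `wire_carrier`, and the default format are hypothetical. The commit does not state the default, so `"pcm_16000"` is an assumption.

```python
_UNSET = object()  # sentinel: distinguishes "not passed" from any explicit value

# Carrier → format mapping from the commit message above.
CARRIER_OUTPUT_FORMATS = {"twilio": "ulaw_8000", "telnyx": "pcm_16000"}
DEFAULT_OUTPUT_FORMAT = "pcm_16000"  # assumption: the PR does not state the default


class ElevenLabsWSFormatSketch:
    """Hypothetical class showing only the output-format resolution logic."""

    def __init__(self, output_format: object = _UNSET) -> None:
        self._user_format: str | None = None if output_format is _UNSET else str(output_format)
        self.output_format = (
            self._user_format if self._user_format is not None else DEFAULT_OUTPUT_FORMAT
        )

    def set_telephony_carrier(self, carrier: str) -> None:
        """Apply the carrier hint unless the user chose a format explicitly."""
        if self._user_format is not None:
            return  # an explicit output_format always wins
        # Unknown carriers leave the current format untouched (no-op).
        self.output_format = CARRIER_OUTPUT_FORMATS.get(carrier, self.output_format)


def wire_carrier(tts: object, carrier: str) -> None:
    """Duck-typed call site: only TTS objects that expose the hook are touched."""
    hook = getattr(tts, "set_telephony_carrier", None)
    if callable(hook):
        hook(carrier)
```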

…sample_rate)

OpenAI TTS streams 24 kHz audio. The default 24k→16k resample stays for the Telnyx (PCM 16 kHz) carrier; for Twilio (mulaw 8 kHz) the chained 24→16 + 16→8 used to cost two ratecv passes. New `target_sample_rate=8000` constructor opt-in collapses the two passes into a single 3:1 decimation with a tighter LPF (Nyquist ≈ 4 kHz).

- Python: getpatter.services.transcoding.create_resampler_24k_to_8k() factory; OpenAITTS gains optional `target_sample_rate=16000` (default preserves existing behaviour).
- TypeScript: createResampler24kTo8k() + 24000→8000 case in StatefulResampler; OpenAITTS gains optional positional `targetSampleRate=16000` with `LPF_ALPHA_8K=0.45` for proper anti-aliasing at 4 kHz Nyquist.

Auto-engagement on Twilio carriers is deferred: the audio sender currently assumes 16 kHz PCM input. Manual opt-in keeps the change narrowly scoped.
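Because 24000 / 8000 = 3 is an integer ratio, one pass can low-pass and then keep every third sample. The sketch below assumes a single-pole IIR low-pass, y[n] = y[n-1] + α·(x[n] − y[n-1]), with α set to the `LPF_ALPHA_8K=0.45` value quoted above. That filter form, the phase bookkeeping, and the class name are assumptions. A single pole rolls off gently, so actual anti-aliasing quality depends on the SDK's real filter. `array("h")` uses native byte order, which is little-endian on common x86/ARM hosts.

```python
from array import array

LPF_ALPHA_8K = 0.45  # smoothing constant quoted for the TS resampler


class Resampler24kTo8kSketch:
    """Stateful 24 kHz → 8 kHz PCM16 decimator: one-pole low-pass, keep every 3rd sample."""

    def __init__(self, alpha: float = LPF_ALPHA_8K) -> None:
        self._alpha = alpha
        self._y = 0.0    # low-pass state, carried across chunks
        self._phase = 0  # index within the current group of 3 input samples

    def process(self, pcm16: bytes) -> bytes:
        """Resample one chunk of native-endian 16-bit mono PCM."""
        samples = array("h")
        samples.frombytes(pcm16)
        out = array("h")
        for x in samples:
            self._y += self._alpha * (x - self._y)  # y[n] = y[n-1] + a*(x[n] - y[n-1])
            if self._phase == 0:
                out.append(max(-32768, min(32767, round(self._y))))
            self._phase = (self._phase + 1) % 3
        return out.tobytes()


resampler = Resampler24kTo8kSketch()
chunk_24k = array("h", [1000] * 480).tobytes()  # 20 ms at 24 kHz
chunk_8k = resampler.process(chunk_24k)
print(len(chunk_8k) // 2)  # 160 samples = 20 ms at 8 kHz
```

Carrying both the filter state and the phase across calls is what keeps chunk boundaries seamless when audio arrives in arbitrary frame sizes.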

Bug #48: per-language honorifics

- New HONORIFICS_{EN,IT,ES,DE,FR,PT} constants merged into HONORIFICS_ALL (sorted longest-first). Module-level HONORIFICS_REGEX_ALT alternation built once. The aggregation is union-of-all regardless of `language` (mixed-language deployments are common; safer default).
- splitSentences prefix regex sources from the union, so sentences like "Ho incontrato il Sig. Rossi alla riunione" no longer split mid-honorific in any of the supported languages.

Bug #49: single-word "Yes." never flushed

- DEFAULT_MIN_WORDS_FOR_SHORT_FLUSH lowered from 2 → 1; single-word replies ending in "."/"!"/"?" now flush on the terminator.
- New gate #6 in maybeShortFlush blocks flushes whose trailing word is a known honorific, preventing "Mr." / "Sig." escaping as a standalone sentence.
- Legacy escape hatch: pass `minWordsForShortFlush=2` to restore the pre-fix behaviour.

Tests: 22 Python + 21 TS new honorific cases; 12 + 12 single-word flush cases. Existing tests updated where they asserted the old buffered behaviour for single-word replies. Both suites green (Py 1538, TS 1224).
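A simplified Python sketch of both fixes follows. It uses a longest-first honorific alternation compiled once at import time, a splitter that refuses to break right after an honorific, and a short-flush gate with a one-word minimum. The honorific lists are tiny illustrative subsets. The splitting strategy and the `split_sentences` / `should_short_flush` names are assumptions, not the SDK's `splitSentences` / `maybeShortFlush` implementation.

```python
import re

# Tiny illustrative subsets; the PR's HONORIFICS_{EN,IT,ES,DE,FR,PT} tables are larger.
HONORIFICS_EN = ("Mr.", "Mrs.", "Ms.", "Dr.")
HONORIFICS_IT = ("Sig.", "Sig.ra", "Dott.")
# Union of all languages, longest first.
HONORIFICS_ALL = sorted(set(HONORIFICS_EN + HONORIFICS_IT), key=len, reverse=True)
# Built once at import time instead of per call.
HONORIFICS_REGEX_ALT = "|".join(re.escape(h) for h in HONORIFICS_ALL)
_ENDS_WITH_HONORIFIC = re.compile(rf"(?:^|\s)(?:{HONORIFICS_REGEX_ALT})$")
_TERMINATOR = re.compile(r"[.!?](?=\s|$)")


def split_sentences(text: str) -> list[str]:
    """Split on ./!/? followed by whitespace, except right after an honorific."""
    sentences: list[str] = []
    start = 0
    for m in _TERMINATOR.finditer(text):
        candidate = text[start:m.end()]
        if _ENDS_WITH_HONORIFIC.search(candidate):
            continue  # "Sig." is an abbreviation here, not a sentence end
        sentences.append(candidate.strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def should_short_flush(buffer: str, min_words_for_short_flush: int = 1) -> bool:
    """Flush a short buffer on its terminator unless it ends with an honorific."""
    text = buffer.strip()
    if not text or text[-1] not in ".!?":
        return False
    if len(text.split()) < min_words_for_short_flush:
        return False
    return not _ENDS_WITH_HONORIFIC.search(text)


print(split_sentences("Ho incontrato il Sig. Rossi alla riunione. Poi sono uscito."))
# ['Ho incontrato il Sig. Rossi alla riunione.', 'Poi sono uscito.']
```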

@ts-expect-error

… silero-vad lint

- CHANGELOG.md: comprehensive Unreleased section covering reorg, provider enums, error taxonomy, bug-fix wave, perf wins, and cleanup work landing on this branch.
- tool_executor.py: add module-level docstring describing the SSRF guard, response-size cap, and OTel span emission.
- silero-vad.ts:127: replace stale @ts-expect-error directive (now a TS2578 warning since onnxruntime-node types resolve at build) with a plain comment explaining the optional-peer-dep dynamic import.

…ools/

Internal restructure of the Python SDK; PUBLIC API surface unchanged.

- handlers/{twilio,telnyx,common}_handler.py → telephony/{twilio,telnyx,common}.py ("_handler" suffix dropped; the parent module name already conveys intent). stream_handler.py promoted out of handlers/ to package root since it's the per-call orchestrator, not a telephony adapter. handlers/ folder removed.
- services/{transcoding,pcm_mixer,background_audio}.py → audio/* (audio pipeline collected in one place).
- services/{tool_decorator,tool_executor}.py → tools/* (tool-decoration & webhook-execution kept together).
- Other services/* stay where they are: llm_loop, metrics, sentence_chunker, text_transforms, ivr, fallback_provider, pipeline_hooks, chat_context, call_log, remote_message.
- tts/ and stt/ namespaces kept: they expose getpatter.{tts,stt}.<provider>.{TTS,STT} with env-var auto-resolve and are public surface.
- File moves use git mv so blame/history follow.
- Imports rewritten across providers, server, services, tests, and package-root re-exports.

Python tests: 1538 passed. TS side ships in a separate commit.

TS internal restructure for parity with the Python d5d9391 commit. Public API surface unchanged.

- carriers/{twilio,telnyx}.ts → telephony/{twilio,telnyx}.ts (rename for naming parity with Py; "carrier" was the original term, "telephony" reads better next to twilio/telnyx).
- transcoding.ts → audio/transcoding.ts.
- services/background-audio.ts → audio/background-audio.ts.
- tool-decorator.ts → tools/tool-decorator.ts.
- Imports rewritten across client, index, types, stream-handler, deepfilternet-filter, plus 5 test files.

TS tests: 1224 passed, tsc --noEmit clean. The telephony/audio/tools triad now matches between Python and TypeScript SDKs.

Update per-library AI-agent quickstarts to match the post-restructure package tree. Adds the new folder names (telephony/, audio/, tools/) and a one-line description per folder.


…/tools Adds 1-3 line docstrings to public symbols (modules, classes, methods) in libraries/python/getpatter/{providers,telephony,audio,tools} that previously had none. No behaviour changes; pre-existing docstrings are left untouched.


…ability/dashboard/top-level Adds short JSDoc summaries to public classes, interfaces, type aliases, and exported functions that were missing them. Existing JSDoc is preserved verbatim — this is a fill-the-gaps pass only, no rewrites.

Mechanical replace of stale path strings in docstrings, comments, and .env.example headers:

- sdk-ts/src/* → libraries/typescript/src/*
- sdk-py/getpatter/* → libraries/python/getpatter/*
- conceptual "(sdk-py)" → "the Python SDK"

No behaviour change; tests still 1538 passed, tsc clean.

@ts-ignore

The e2e Playwright suite (tests/e2e/*.spec.ts + playwright.config.ts + @playwright/test / playwright devDeps) was inherited from an earlier "comprehensive test suite" PR but never integrated with downstream test infra after the libraries/ reorg. Per CLAUDE.md, end-to-end call testing lives in a separate downstream test repo.

- Drop libraries/typescript/playwright.config.ts.
- Drop libraries/typescript/tests/e2e/ (6 .spec.ts + test-server.ts).
- Remove @playwright/test and playwright from package.json devDeps.
- Refresh package-lock.json (npm install).
- silero-vad.ts: switch back to @ts-ignore on the optional onnxruntime-node import; the dynamic-import line surfaces a TS7016 warning when types are unresolved post-lock-refresh.

…select_sound_from_list, resample_24k_to_16k from TypeScript SDK

Closes 5 public-surface gaps in the Python SDK so every symbol exported from libraries/typescript/src/index.ts now has a Python counterpart.

- ``DefaultToolExecutor``: async tool dispatcher with retry/fallback, webhook SSRF validation via ``server.validate_webhook_url``, and the same JSON error shape as the TypeScript class. Added to ``services/llm_loop.py``.
- ``LLMChunk``: frozen dataclass mirror of the TS ``LLMChunk`` interface (text/tool_call/done/usage). ``to_dict()`` produces the same shape as ``OpenAILLMProvider.stream`` for callers that prefer dicts.
- ``builtin_clip_path``: top-level helper resolving ``BuiltinAudioClip`` values (or raw filenames) to absolute paths. ``BuiltinAudioClip.path`` now delegates to the new function for a single source of truth.
- ``select_sound_from_list``: promoted from a private static method on ``BackgroundAudioPlayer`` to a public top-level function. The static method is preserved as a backward-compatible delegator.
- ``resample_24k_to_16k``: stateless one-shot helper following the existing ``resample_8k_to_16k`` / ``resample_16k_to_8k`` convention, including the per-process ``DeprecationWarning`` latch.

All five symbols are re-exported from ``getpatter.__init__`` and listed in ``__all__``. The five ``TODO(parity)`` markers are removed in the same commit.

25 unit tests added in ``tests/unit/test_parity_ports.py`` covering public-symbol reachability, ``LLMChunk`` round-trip, real handler/webhook dispatch through ``DefaultToolExecutor`` (including the SSRF-blocked branch), bundled clip resolution, weighted selection empirics, and equivalence of ``resample_24k_to_16k`` to a single-shot ``StatefulResampler``. Tests: 1546 → 1571, all passing.

mintlify · 2026-05-02T22:23:23Z

Preview deployment for your docs.

Project	Status	Preview	Updated (UTC)
patter-06b046ce	🟢 Ready	View Preview	May 2, 2026, 10:24 PM


…to-detect, and LLM primitives

…lpers

Bug #2 from the barge-in audit: on speakerphone / tunnel-loop deployments the agent's outbound TTS bleeds back into the mic. VAD sees that bleed as continuous voice-like energy and never transitions out of "speaking" state, so a caller interruption only registers during natural TTS pauses. The result is the intermittent "interrupt sometimes works, sometimes the agent keeps talking" symptom.

Fix at the source: proper acoustic echo cancellation. NLMS adaptive filter (2048 taps @ 16 kHz, 128 ms history) subtracts an estimate of the TTS-derived echo from the mic stream before VAD/STT see it. Geigel double-talk detector freezes adaptation when the caller is speaking on top of the agent so the filter does not learn the user's voice as part of the channel response.

Convergence on the synthetic narrowband test signal:

- ~24 dB ERLE after 1 s of TTS-only training
- Near-end speech preserved within 0 dB during double-talk

Not a drop-in replacement for WebRTC AEC3 (state-of-the-art needs adaptive sub-band processing + comfort noise + nonlinear post-filter that this scope does not cover). For production-grade quality, wrap a binding to ``webrtc-audio-processing-2`` externally.

- libraries/python/getpatter/audio/aec.py: NlmsEchoCanceller class.
- libraries/typescript/src/audio/aec.ts: TS parity.
- Agent.echo_cancellation / AgentOptions.echoCancellation: opt-in flag, default false. Handset / headset deployments don't need it, and the 0.5–2 s convergence period would briefly attenuate caller speech if they spoke before any TTS played.
- PipelineStreamHandler.start() (Py) / StreamHandler.initPipeline (TS) instantiate the canceller when the flag is on. Far-end tap fires before the carrier transcode in synthesizeSentence; near-end runs after the inbound 8k→16k resample, before VAD.
- 8 unit tests per SDK covering convergence, double-talk preservation, construction validation, pass-through-before-priming, reset, empty buffers.

Tests: Py 1574 passed (+8), TS 1236 passed (+8), tsc clean.
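The NLMS update itself is compact. With far-end history vector x(n), weights w, and mic sample d(n), the canceller outputs e(n) = d(n) − wᵀx(n). When no double-talk is detected, it updates w ← w + μ·e(n)·x(n) / (‖x(n)‖² + ε). The Geigel detector declares double-talk when |d(n)| > ρ·max|x| over the recent far-end window. The following is a deliberately small, sample-by-sample Python sketch. The class name, parameters, and defaults are illustrative (only the Geigel rho=0.6 figure appears elsewhere in this PR), and a real implementation would vectorise the loop.

```python
import math
import random
from collections import deque


class NlmsEchoCancellerSketch:
    """Time-domain NLMS echo canceller gated by a Geigel double-talk detector."""

    def __init__(self, taps: int = 512, step: float = 0.1, geigel_rho: float = 0.6,
                 eps: float = 1e-6) -> None:
        if taps <= 0 or not 0.0 < step < 2.0:
            raise ValueError("need taps > 0 and 0 < step < 2 for NLMS stability")
        self.step, self.rho, self.eps = step, geigel_rho, eps
        self.w = [0.0] * taps                      # echo-path estimate
        self.x = deque([0.0] * taps, maxlen=taps)  # far-end history, newest first
        self.energy = 0.0                          # running sum of x^2 over the window

    def process(self, far: float, near: float) -> float:
        """Take one far-end (TTS) sample and one mic sample; return the cleaned mic sample."""
        oldest = self.x[-1]
        self.x.appendleft(far)  # maxlen evicts the oldest sample from the right
        self.energy = max(self.energy + far * far - oldest * oldest, 0.0)
        echo_estimate = sum(wi * xi for wi, xi in zip(self.w, self.x))
        e = near - echo_estimate
        # Geigel: a mic sample louder than rho * recent far-end peak means the caller
        # is talking, so freeze adaptation to avoid learning their voice as "echo".
        double_talk = abs(near) > self.rho * max(abs(v) for v in self.x)
        if not double_talk:
            g = self.step * e / (self.energy + self.eps)
            for i, xi in enumerate(self.x):
                self.w[i] += g * xi
        return e


def erle_db(near: list[float], residual: list[float]) -> float:
    """Echo return loss enhancement: mic energy over residual energy, in dB."""
    return 10 * math.log10(sum(v * v for v in near) / (sum(v * v for v in residual) + 1e-12))


# Usage: white-noise "TTS" through a short synthetic echo path. The sum of |h| is
# below rho, so pure echo never trips the Geigel detector.
rng = random.Random(7)
h = [0.0, 0.0, 0.3, 0.15, -0.1]
far = [rng.gauss(0.0, 1.0) for _ in range(6000)]
near = [sum(h[k] * far[n - k] for k in range(len(h)) if n >= k) for n in range(len(far))]
aec = NlmsEchoCancellerSketch(taps=16)  # small filter keeps the pure-Python demo fast
residual = [aec.process(f, d) for f, d in zip(far, near)]
print(f"ERLE over the last 1000 samples: {erle_db(near[-1000:], residual[-1000:]):.1f} dB")
```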

…e / echo_cancellation in agent() builder

Pre-existing parity violation surfaced during the AEC audit: the Py ``Patter.agent(...)`` builder enumerates kwargs explicitly, so any field not listed silently drops on the floor. Three boolean flags on the Agent dataclass (``aggressive_first_flush``, ``disable_phone_preamble``, and the freshly added ``echo_cancellation``) were unreachable through the builder, forcing users to construct ``Agent(...)`` directly. TS does not have this problem because ``agent(opts)`` spreads the whole ``AgentOptions`` object, so every field passes through.

Add the three flags to the Py builder signature and forward them to ``Agent(...)``. Defaults match the dataclass (all ``False``) so existing callers keep their behaviour.

2 new tests:

- builder defaults match dataclass defaults (no silent True leak)
- explicit ``aggressive_first_flush=True`` / ``disable_phone_preamble=True`` / ``echo_cancellation=True`` reach the resulting Agent

Tests: Py 1576 passed (+2), TS 1236 unchanged, tsc clean.

Real cellular-call test on 0.6.0 with the initial 2048-tap + constant-0.1-step config exposed an 8–12 s convergence window during which the user's first turn was either over-cancelled to silence (filter eats voice while learning the channel) or contaminated by residual echo (Deepgram transcribes garbage and discards). The user report: ~11 s of perceived silence after firstMessage, then everything worked from turn 4 onward. Net first-turn UX was worse than no AEC.

The architectural fix the user asked for ("source-level, no workaround, solid"): two NLMS hyperparameter changes that compress convergence into the first ~250 ms, the same window where the agent's firstMessage finishes playing.

1. **512 taps (was 2048)**: 4× fewer coefficients to converge with no measurable cancellation loss on cellular / VoIP paths whose RT60 stays under ~50 ms after the carrier's own echo suppression. Pass ``filter_taps=2048`` explicitly for landline hairpin loops where the tail extends beyond 32 ms @ 16 kHz.
2. **Adaptive step**: aggressive warm-up step (0.5) for the first 0.5 s of processed audio, then taper to the textbook 0.1 for steady-state tracking. The Geigel double-talk detector still gates updates so the larger step does not learn the caller's voice into the echo model.

Verification: regression-test fed a broadband synthetic signal (3 sinusoids + white noise) in realistic 20 ms frames hits **17–19 dB ERLE in the very first 250 ms** with the new defaults, well above the previous 0 dB at the 1.25 s mark.

- New constructor knobs: ``warmup_step_size`` (default 0.5), ``warmup_seconds`` (default 0.5). Step branch is constant within a frame so the inner sample-by-sample loop stays branch-free.
- Validation extended for the two new fields.
- ``reset()`` now clears the ``processed_samples`` counter so the warm-up window re-engages on filter reset.
- 1 new regression test per SDK enforces the "≥10 dB ERLE in the first 250 ms with defaults" guarantee on a broadband signal.

Tests: Py 1577 passed (+1), TS 1237 passed (+1), tsc --noEmit clean.
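The two tuning changes reduce to simple arithmetic: an FIR filter with N taps at 16 kHz models N / 16 echo-tail milliseconds (512 taps → 32 ms, 2048 → 128 ms), and the step size switches once the warm-up sample budget (0.5 s × 16000 = 8000 samples) is spent. In the sketch below, `warmup_step_size`, `warmup_seconds`, and `processed_samples` follow the names in the commit. The helper functions themselves are hypothetical.

```python
def nlms_step_size(processed_samples: int, sample_rate: int = 16_000,
                   warmup_step_size: float = 0.5, warmup_seconds: float = 0.5,
                   steady_step_size: float = 0.1) -> float:
    """Choose the NLMS step for the next frame: large during warm-up, then steady."""
    if processed_samples < warmup_seconds * sample_rate:
        return warmup_step_size
    return steady_step_size


def filter_tail_ms(taps: int, sample_rate: int = 16_000) -> float:
    """Echo tail an FIR filter of `taps` coefficients can model, in milliseconds."""
    return taps * 1000 / sample_rate


# Frame loop: the step is chosen once per 20 ms frame, so the per-sample inner
# loop (not shown) never branches on it.
processed = 0
steps = []
for _ in range(30):  # 30 frames x 20 ms = 600 ms of audio
    steps.append(nlms_step_size(processed))
    processed += 320  # 20 ms at 16 kHz
print(steps.count(0.5), steps.count(0.1))  # 25 5
print(filter_tail_ms(512), filter_tail_ms(2048))  # 32.0 128.0
```

Resetting the `processed` counter to zero is what re-engages the warm-up window, matching the `reset()` behaviour described above.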

… ring flush Two fixes for the speakerphone "agent unresponsive on first turn / mid-call gets stuck after a few exchanges" symptom reported on 0.6.0 cellular tests. 1. firstMessage was bypassing beginSpeaking + AEC far-end tap The ``firstMessage`` block streamed TTS chunks directly to the carrier without (a) marking ``isSpeaking=true`` and (b) pushing each chunk to ``aec.pushFarEnd()``. Consequence on speakerphone: while the intro played, the self-hearing guard never engaged, the user's audio (mixed with TTS bleed) was forwarded to STT and produced garbage transcripts; AEC had no reference signal so the bleed survived in the inbound channel. Wraps the firstMessage TTS streaming loop in ``beginSpeaking()`` + ``try/finally { endSpeakingWithGrace() }`` and pushes each chunk to ``aec.pushFarEnd()`` before encoding for the carrier. Mirrors the per-turn behaviour of ``runPipelineLlm`` / ``_process_streaming_response``. 2. Ring buffer must NOT flush on natural turn end An earlier iteration also flushed ``inboundAudioRing`` from ``endSpeakingWithGrace`` so user audio captured during the agent's TTS that never tripped VAD would still reach STT. In practice this raced live STT input post-grace: the ring contained partially-cancelled echo (AEC still adapting) and possibly over-cancelled user voice (Geigel rho=0.6 misses quiet double-talk). Replaying on every turn produced phantom transcripts that confused the LLM and caused the "out of order responses + agent gets stuck" symptom the user observed mid-call. Reverted: flush only on real barge-in (where VAD confirmed user speech). Audio captured during the agent's turn that VAD did not classify as speech is intentionally dropped at the next ``beginSpeaking`` — the user can repeat themselves rather than have the LLM react to a stale phantom transcript. The barge-in flush remains: extracted into ``flushInboundAudioRing()`` / ``_flush_inbound_audio_ring()`` helpers (clean refactor, 1 caller now). 
Stale "2048 taps + 0.5–2 s convergence" log message updated to the post-AEC-tuning "512 taps + 0.5 s warmup μ=0.5 → ~250 ms convergence". Tests: Py 1577 passed, TS 1237 passed, tsc --noEmit clean.
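The flush policy above can be sketched in Python as a bounded ring that only replays audio on a confirmed barge-in. This is a minimal illustration under stated assumptions, not the SDK's code. `InboundAudioRing` and its methods are hypothetical names, and the buffer is assumed to hold raw 20 ms frames as `bytes`.

```python
from collections import deque


class InboundAudioRing:
    """Hypothetical sketch: bounded buffer of inbound frames captured while the agent speaks.

    Frames are replayed to STT only on a confirmed barge-in. Anything still
    buffered when the next agent turn begins is dropped.
    """

    def __init__(self, max_frames: int) -> None:
        # deque(maxlen=...) evicts the oldest frame once the cap is reached.
        self._frames: deque = deque(maxlen=max_frames)

    def capture(self, frame: bytes) -> None:
        self._frames.append(frame)

    def flush_on_barge_in(self) -> list:
        # VAD confirmed user speech: hand the buffered audio to STT, then clear.
        frames = list(self._frames)
        self._frames.clear()
        return frames

    def discard(self) -> None:
        # Next begin_speaking: drop audio VAD never classified as speech
        # instead of replaying it as a potential phantom transcript.
        self._frames.clear()

    def __len__(self) -> int:
        return len(self._frames)
```

Because the ring is bounded, the worst-case replay is capped at `max_frames` × 20 ms of audio, no matter how long the agent's turn lasted.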

The previous fix wrapped the firstMessage TTS in ``beginSpeaking`` + ``endSpeakingWithGrace`` so the self-hearing guard could engage during the intro. This worked, but exposed a second defect. The AEC filter needs ~500 ms of TTS reference to converge, and during that warmup window residual TTS bleed in the inbound mic stream still looks like speech to VAD. With ``isSpeaking=true`` from frame zero of the firstMessage, the very first chunk of bleed triggered an immediate barge-in cancel, killing the firstMessage before a single byte had been played. Testing reported "agent never speaks".

Fix: gate both barge-in entry points (VAD ``speech_start`` and transcript-based) on a 1-second minimum agent-speaking duration. Real users almost never start interrupting within the first second of an agent turn anyway, and the gate cleanly covers the AEC convergence period (500 ms warmup + safety margin).

- TypeScript: ``MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN = 1000`` static on ``StreamHandler``. New ``speakingStartedAt: number | null`` field set in ``beginSpeaking()`` and cleared in ``cancelSpeaking()`` and the grace flip. New ``canBargeIn()`` helper used by both barge-in sites; suppressed events log at debug level so call-debug logs still show why the cancel did not fire.
- Python: ``MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN = 1.0`` module-level constant. ``_speaking_started_at`` field with the same lifecycle. ``_can_barge_in()`` helper applied at the VAD speech_start path in ``on_audio_received`` and at the entry of ``_handle_barge_in``. The helper uses ``getattr`` so test fixtures that bypass ``_begin_speaking`` still permit barge-in to fire.

New unit tests (3 Py + 5 TS):
- ``canBargeIn() / _can_barge_in()`` returns true with no active turn, false within the gate window, true past the gate window.
- ``handleBargeIn / _handle_barge_in`` returns early and does nothing during the warmup window; ``isSpeaking`` stays True.
- ``handleBargeIn / _handle_barge_in`` fires normally past the gate.
Tests: Py 1579 passed (+2), TS 1242 passed (+5), tsc --noEmit clean.
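A minimal Python sketch of the gate described above, assuming an injectable monotonic clock so it can be tested deterministically. `MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN` is the constant named in the commit. `BargeInGate` and its method names are hypothetical stand-ins for the `_speaking_started_at` / `_can_barge_in()` logic, and the inclusive `>=` boundary is an assumption.

```python
import time
from typing import Callable, Optional

# Constant name taken from the commit message; value in seconds.
MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN = 1.0


class BargeInGate:
    """Hypothetical sketch of the speaking-duration gate behind _can_barge_in()."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._speaking_started_at: Optional[float] = None

    def begin_speaking(self) -> None:
        self._speaking_started_at = self._clock()

    def end_speaking(self) -> None:
        # Cleared on cancel and on the grace flip.
        self._speaking_started_at = None

    def can_barge_in(self) -> bool:
        if self._speaking_started_at is None:
            # No active agent turn: nothing to protect, so barge-in is allowed.
            return True
        elapsed = self._clock() - self._speaking_started_at
        return elapsed >= MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN
```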

…f-default

The previous AEC commits added a server-side NLMS adaptive filter and exposed an ``echoCancellation`` flag. Real-call testing on cellular PSTN turned up a fundamental architectural mismatch the early benchmarks did not catch. The round-trip echo path on Twilio Media Streams is ~250–1500 ms (jitter buffer + carrier loop), but a 512-tap NLMS filter at 16 kHz can only see the most recent 32 ms of far-end samples. The echo never lands inside the filter's window, the weights stay near zero, and the filter silently no-ops. Worse, with ``isSpeaking=true`` during firstMessage and a barge-in gate of 1 s, any residual bleed that reaches VAD once the gate releases triggers an immediate self-cancel: the agent stops talking right after starting.

Industry consensus from this round of research:
- LiveKit & Pipecat handle echo cancellation at the transport layer for browser/native paths only.
- Twilio's own guidance is to "rely on network echo cancellers" for telephone scenarios.
- Vapi, Retell and Bland do not run server-side AEC. They rely on the carrier's network echo suppression and the caller device's built-in AEC (modern handsets ship one).

Server-side NLMS is the right tool only when the SDK owns the audio path end-to-end and the loop latency is on the order of the filter window (~30 ms: browser WebRTC, mobile native). PSTN does not meet that bar and never will under realistic carrier conditions.

This commit:
- ``echoCancellation`` stays opt-in (default false) so existing PSTN callers see no change in behaviour.
- When ``echoCancellation: true`` is detected on a Twilio or Telnyx carrier, log a clear warning explaining why it will not work as intended and what to do instead. The filter is still instantiated so curious operators can compare; the warning makes the recommendation explicit.

For PSTN deployments, the working stack is: Patter's self-hearing guard + 1 s barge-in cooldown + Silero VAD with the phone-tuned preset + carrier / handset native echo suppression.
No server-side AEC. Tests: Py 1579 passed, TS 1242 passed, tsc --noEmit clean.
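The window arithmetic behind the argument above fits in a few lines of Python (function names are hypothetical). A filter with N taps at sample rate fs only models echo delays up to N / fs seconds, so covering a 250–1500 ms echo path at 16 kHz would take thousands of taps rather than 512.

```python
import math


def nlms_window_ms(taps: int, sample_rate_hz: int) -> float:
    """Far-end history (ms) covered by an adaptive FIR filter with `taps` coefficients."""
    return taps * 1000.0 / sample_rate_hz


def echo_within_window(echo_delay_ms: float, taps: int, sample_rate_hz: int) -> bool:
    # An NLMS canceller can only model echo whose delay fits inside its tap window.
    return echo_delay_ms <= nlms_window_ms(taps, sample_rate_hz)


def taps_needed(echo_delay_ms: float, sample_rate_hz: int) -> int:
    # Minimum filter length whose window reaches the given echo delay.
    return math.ceil(echo_delay_ms * sample_rate_hz / 1000.0)
```

For example, `nlms_window_ms(512, 16000)` gives the 32 ms figure, while `taps_needed(250, 16000)` and `taps_needed(1500, 16000)` give 4000 and 24000 taps respectively.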

…ncel drain

Six architectural fixes for the post-barge-in failure modes surfaced during the 0.6.0 acceptance pass against real PSTN calls. Validated end-to-end on six pipeline stacks (Deepgram + Groq/OpenAI/Anthropic/Cerebras/Google + Cartesia/OpenAI TTS) with a verbose Italian conversation flow.

1. Adaptive barge-in gate
   - MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_AEC = 1000 (covers AEC warmup)
   - MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_NO_AEC = 250 (anti-flicker only)
   - canBargeIn() picks the right gate based on whether AEC is wired.
   - Suppression call sites log at INFO level with the AEC state.
2. Inbound audio ring cap reduced from 30 frames (~600 ms) to 13 (~260 ms) to match VAD minSpeechDuration. Before this fix, the replay dragged in ~350 ms of agent TTS bleed, which Deepgram (default English) transcribed as English garbage and committed to the LLM as phantom user input.
3. STT.finalize() on VAD speech_end
   - New optional finalize() on STTAdapter / STTProvider.
   - DeepgramSTT.finalize() is a public method that sends the {type: 'Finalize'} control message.
   - StreamHandler calls stt.finalize() whenever the SDK's VAD signals speech_end, so the provider returns is_final immediately rather than waiting on its own (slow) endpointing heuristic.
4. AMD on by default + onMachineDetection callback (Twilio + Telnyx parity)
   - New carrier-agnostic MachineDetectionResult shape.
   - Twilio: MachineDetection=DetectMessageEnd + AsyncAmd=true (no answer-latency penalty on human pickups).
   - Telnyx: answering_machine_detection=greeting_end.
   - The callback fires on both webhooks before the legacy voicemail-drop path, so callers see the result regardless of voicemailMessage.
5. Post-cancel drain window of 150 ms
   - Tracks a lastCancelAt timestamp on every barge-in cancel (both VAD-path and transcript-path).
   - beginSpeaking() is now async and awaits the drain remainder so the remote PSTN player has time to flush the cancelled turn's tail before the next TTS chunk lands on top of it. This eliminates the "doubled first sentence" audio artefact reported during testing.
6. AssemblyAI accepts a parity-only `language` field for cross-provider uniformity (forwarded as a no-op; AssemblyAI selects language by model).

Both SDKs (TypeScript and Python) updated with identical defaults, constants, and call-site coverage. Unit tests: TS 24/24 passing, Python 33/33 passing. Includes [DIAG] INFO logs in TS deepgram-stt.ts and stream-handler.ts for the diagnostic phase; these can be removed in a follow-up commit once the bleed-transcription root cause is sealed.
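A minimal Python sketch of the 150 ms post-cancel drain (fix 5 above), assuming an injectable monotonic clock. `PostCancelDrain`, `record_cancel` and `drain_remaining_s` are hypothetical names. Only the 150 ms value and the "await the remainder in begin_speaking" behaviour come from the commit message.

```python
import asyncio
import time
from typing import Callable, Optional

POST_CANCEL_DRAIN_S = 0.150  # 150 ms, per the commit message


class PostCancelDrain:
    """Hypothetical sketch: delay the next agent turn until the drain window has passed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_cancel_at: Optional[float] = None

    def record_cancel(self) -> None:
        # Called from both the VAD-path and the transcript-path barge-in cancels.
        self._last_cancel_at = self._clock()

    def drain_remaining_s(self) -> float:
        if self._last_cancel_at is None:
            return 0.0
        elapsed = self._clock() - self._last_cancel_at
        return max(0.0, POST_CANCEL_DRAIN_S - elapsed)

    async def begin_speaking(self) -> None:
        # Give the remote PSTN player time to flush the cancelled turn's tail.
        remaining = self.drain_remaining_s()
        if remaining > 0:
            await asyncio.sleep(remaining)
```

Only the remainder is awaited, so a turn that starts well after the last cancel pays no extra latency.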

… tunnel grace

Bundles the SDK changes from a focused work session: 5 bug fixes + 6 new feature areas, with full Python ↔ TypeScript parity.

Bug fixes
---------

* fix(client): bump cloudflared quick-tunnel grace 2.5 → 5 s. The 2.5 s window covered HTTP propagation only. Twilio's WSS upgrade for the media stream goes through a different cloudflared edge route that takes ~1-3 s longer, so ~5 % of first calls dropped silently at pickup with no media. 5 s drops the failure rate to <1 %. (client.ts / client.py)
* fix(realtime): handler-only tools were silently ignored in TS Realtime mode (CRITICAL). `handleFunctionCall` only dispatched `webhookUrl` tools; tools with an in-process `handler` callback (the default pattern in the demos) fell through without sending `function_call_output`, hanging the model.
* fix(realtime): `onTranscript({ role: 'assistant' })` was never fired. Assistant text was pushed into history but never surfaced via the user-supplied callback, so demos only saw `[user]` lines.
* fix(realtime): dashboard transcript shown out of order. OpenAI Realtime emits `input_audio_transcription.completed` AFTER `response.done`, so the naïve push order was [assistant, user, ...]. Added a per-call buffer (`pendingAssistantTurn` + 3 s fallback timer) that holds the assistant turn until the matching user transcript arrives.
* fix(realtime): tool invocations were invisible in the transcript timeline. Added `emitToolEvent`, which pushes `role: 'tool'` history entries and fires `onTranscript({ role: 'tool', tool_name, tool_args, tool_result, ... })` for the call/return semantics.

Features
--------

* feat(api): `Patter({ persist })` opt-in dashboard persistence. The on-disk per-call records (metadata.json, transcript.jsonl, events.jsonl) were previously enabled only via `PATTER_LOG_DIR`. New explicit option: `false` (default), `true` (platform default location), or a custom string path. The env var still works as a deployment-time override.
* feat(tools): JSON-schema validation at `agent()` build time + OpenAI strict-mode opt-in. Schemas are validated structurally for every tool; `Tool({ strict: true })` additionally enforces OpenAI's strict-mode requirements (recursive `additionalProperties: false`, every property in `required`). Catches typos at build time.
* feat(tools): retry with exponential backoff + per-tool circuit breaker. Both handler and webhook paths now get 3 attempts with jittered backoff (capped at 5 s). New `CircuitBreakerRegistry` trips OPEN after 5 consecutive failures and stays OPEN for 30 s before allowing a HALF_OPEN probe; while OPEN it returns `{error, fallback: true, circuit_state: "open", retry_after_ms}`.
* feat(tools): reassurance auto-message during long tool calls. New `Tool({ reassurance: "Let me check..." })` (or `{ message, afterMs }`) bridges the silence on slow tools by enqueueing the message via `OpenAIRealtimeAdapter.sendText` after `afterMs` (default 1500 ms); the message is cancelled if the tool returns earlier. Realtime-only for now.
* feat(tools): MCP (Model Context Protocol) client integration (MVP). New `agent({ mcpServers: [...] })` plugs the agent into MCP servers (Google Workspace, PayPal, Postgres, GitHub, ...) without writing wrapper handlers. Each server is queried at call start via `tools/list`; discovered tools are wrapped with synthetic handlers that dispatch to `tools/call` and merged into `agent.tools`. Optional dependency: `@modelcontextprotocol/sdk` (TS) / `mcp` (Py extra). Streamable-HTTP transport only for now.
* feat(tools): streaming results via async generator handlers. Tool handlers can be `async function*` (TS) / `async def ... yield` (Py) generators that emit `{ progress: "..." }` updates while running; each yield is sent to the agent via `sendText` for inline status.
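The retry + circuit-breaker behaviour described above can be sketched in Python as follows. This is a simplified illustration, not the SDK's `CircuitBreakerRegistry`. The class and function names are hypothetical. The 0.5 s base delay, "full jitter" strategy, counting every failed attempt toward the breaker, and re-raising after the last attempt are all assumptions. The 3 attempts, 5 s cap, threshold of 5, 30 s cooldown and the OPEN fallback shape come from the commit message.

```python
import asyncio
import random
import time
from typing import Any, Awaitable, Callable, Optional


class CircuitBreaker:
    """Hypothetical per-tool breaker: closed -> open after `threshold` failures -> half_open probe."""

    def __init__(self, threshold: int = 5, cooldown_ms: int = 30_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_ms = cooldown_ms
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def _elapsed_ms(self) -> float:
        assert self._opened_at is not None
        return (self._clock() - self._opened_at) * 1000

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        return "half_open" if self._elapsed_ms() >= self.cooldown_ms else "open"

    def retry_after_ms(self) -> int:
        if self._opened_at is None:
            return 0
        return max(0, round(self.cooldown_ms - self._elapsed_ms()))

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self.state == "half_open" or self._failures >= self.threshold:
            # Trip (or re-trip after a failed half_open probe) for another cooldown.
            self._opened_at = self._clock()


def backoff_delay_s(attempt: int, base_s: float = 0.5, cap_s: float = 5.0) -> float:
    # "Full jitter": uniform in [0, min(cap, base * 2**attempt)]. base_s is assumed.
    return random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))


async def call_with_retry(
    fn: Callable[[], Awaitable[Any]],
    breaker: CircuitBreaker,
    attempts: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    if breaker.state == "open":
        # Fail fast with the fallback shape described in the commit message.
        return {"error": "circuit open", "fallback": True,
                "circuit_state": "open", "retry_after_ms": breaker.retry_after_ms()}
    for attempt in range(attempts):
        try:
            result = await fn()
        except Exception:
            breaker.record_failure()
            if attempt == attempts - 1 or breaker.state == "open":
                raise  # out of attempts, or the breaker just tripped
            await sleep(backoff_delay_s(attempt))
        else:
            breaker.record_success()
            return result
```

Injecting `sleep` and `clock` keeps the backoff and cooldown testable without real waits.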
New files
---------

* libraries/typescript/src/tools/schema-validation.ts
* libraries/typescript/src/tools/circuit-breaker.ts
* libraries/typescript/src/tools/mcp-client.ts
* libraries/python/getpatter/tools/schema_validation.py
* libraries/python/getpatter/tools/circuit_breaker.py
* libraries/python/getpatter/tools/mcp_client.py
* test files: 4 TS + 3 Py covering schema validation, breaker, streaming, reassurance

Tests
-----

1280 TS · 1156 Py · 0 regressions. Updates two stale tests (AMD on-by-default test in new-features.test.ts; handler retry count in llm-loop.test.ts) to reflect the new behaviour.
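The two strict-mode rules named in the tools feature list (recursive `additionalProperties: false`, every property listed in `required`) can be checked with a short recursive walk. This is a hypothetical Python sketch, not the contents of `schema_validation.py`. It only descends into object `properties` and array `items`, and ignores other JSON Schema keywords such as `anyOf`.

```python
from typing import Any, Dict, List


def strict_mode_errors(schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Hypothetical sketch: report violations of the two strict-mode rules, recursively."""
    errors: List[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            errors.append(f"{path}: additionalProperties must be false")
        # Every declared property must also appear in `required`.
        for name in sorted(set(props) - set(schema.get("required", []))):
            errors.append(f"{path}.{name}: must be listed in required")
        for name, sub in props.items():
            errors.extend(strict_mode_errors(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        errors.extend(strict_mode_errors(schema["items"], f"{path}[]"))
    return errors
```

Returning a list of path-qualified messages (rather than raising on the first problem) lets a build-time check report every typo in one pass.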

The dashboard is now a real SPA in `dashboard-app/` (Vite + React + TypeScript) instead of a 700-line HTML/CSS/JS string embedded in `dashboard/ui.{ts,py}`. The build pipeline produces a single self-contained HTML file (vite-plugin-singlefile inlines JS + CSS), which is committed to `libraries/typescript/src/dashboard/ui.html` and mirrored to the Python package via `dashboard-app/scripts/sync.mjs`. At runtime the SDK serves the same `GET /` endpoint as before. The inlined HTML is loaded by tsup's esbuild ``text`` loader (TS) or the package-data file (Py). Customer-side: zero change in start-up UX (`phone.serve()` → http://127.0.0.1:8000/), but the dashboard is now typed, modular, and maintainable as proper React.

Why this approach (option D from the design discussion):
* No CDN dependency at runtime (no unpkg.com / Babel-in-browser).
* No new runtime deps in the SDK: React + Vite live only at build time in the dev repo, and the published package ships static HTML.
* Self-contained bundle: the SDK still works air-gapped and behind corporate firewalls.
* Type safety end-to-end (TSX components, tsconfig strict).

Components ported from the reference design:
* Topbar, PageHeader, Metric cards
* CallTable with row selection + search
* LiveCallPanel (transcript stream + call controls)
* LatencyPanel (p50 / p95 / STT / TTS bars)
* CostPanel (per-provider breakdown)

Hooks:
* useDashboardData: fetches `/api/dashboard/calls` + subscribes to the SSE stream at `/api/dashboard/stream`
* useTranscript: incremental transcript updates per selected call
* mappers.ts: maps the wire format (CallRecord) to the UI shape

Build:
* `dashboard-app/` is its own Vite project with `npm run build && npm run sync`; sync copies the inlined HTML to both SDKs.
* `libraries/typescript/tsup.config.ts` adds the ``.html`` text loader so the asset is inlined into `dist/index.{js,mjs}`.
* `libraries/python/pyproject.toml` declares `ui.html` as `getpatter.dashboard` package-data so it ships with `pip install`.
* `libraries/typescript/package.json` `files` array includes `src/dashboard/ui.html` so npm packs it.

…s + persist Documents the two preceding commits in CHANGELOG.md under ``## Unreleased``: * Added: ``Patter(persist=...)`` option, JSON-schema validation + strict mode, retry + circuit breaker, reassurance auto-message, MCP client integration, streaming results. * Fixed: Realtime handler-tool dispatch, assistant ``onTranscript``, transcript ordering buffer, tool transcript events, cloudflared quick-tunnel WSS upgrade race. Per the project rule (``.claude/rules/documentation-best-practices.md`` invariant 0): every user-visible change updates ``## Unreleased`` in the same unit of work. The dashboard rewrite is intentionally NOT in the changelog — same URL, same UX, same `phone.serve()` entry point; the SPA migration is internal and customer-invisible.

Mirror of the built dashboard SPA into the Python package — produced by ``dashboard-app/scripts/sync.mjs`` alongside the TS-side ``libraries/typescript/src/dashboard/ui.html``. Should have been part of the dashboard SPA commit; tracking it now keeps the two SDKs in parity for ``pip install getpatter``.

Pure formatter pass — splits long argument lists across multiple lines and adds the missing blank line after the conditional ``import fastapi``. No logic changes; the test still verifies the dashboard store and routes the same way.

…ines, realtime mode

Iterative refinement of the React/Vite dashboard SPA shipped in 3877719. Customer-side it remains a single embedded HTML file served from `phone.serve()` at `/`, but the UX is now markedly closer to the target design.

UI changes:
* Real Patter logo: mark (wireframe stack-tile from the favicon path, thin stroke instead of the chunky filled silhouette in `docs/logo/light.svg`) + tightened-viewBox wordmark, sized independently so the wordmark stays large while the mark's line weight stays light.
* Tab title: "Patter | Dashboard". Favicon: stack-tile SVG inline, matching the previous SDK dashboard.
* Topbar: dropped the Bell / Settings / Avatar buttons and the "Place call" CTA (to be reintroduced once actually wired). The phone-number pill is always shown, derived from the most recent call's Patter-side number.
* Live chip pulse: static peach when there are zero calls, pulsing green when ≥1 is active.
* Latency + Cost merged into one MetricsPanel with a peach segmented switcher, fixing the right-rail clipping that hid Cost on short viewports. Realtime mode collapses the STT/LLM/TTS waterfall to a single end-to-end bar (those metrics aren't meaningful when the provider does the round-trip in one model call).

Range filter (1h / 24h / 7d / All) is now real:
* Bucket strategy aligned to natural boundaries: 12 × 5 min, 24 × 1 h, 7 × 1 day, plus 9-bucket auto for All. Tooltip ranges read as "11:00 → 12:00" instead of "11:39 → 12:33".
* The filtered call list, headline counters (Calls / Latency p95 / Spend), and sparklines all reflect the active range. Live calls always stay visible even when outside the range, so users see what's happening now.
* Sparkline scaling: the tallest bar normalises to 100, so there is no more lonely single bar surrounded by ghost grey lines.

Sparklines are now interactive:
* Hover any bar → custom tooltip (instant, dark surface, mono numerics in peach) showing the bucket window, call count, and a 4-call sample (number / status / cost). React-driven; replaces the slow native `title=""`.
* Click → selects the newest call in that bucket into the right rail.
* Empty buckets are invisible (no grey ghosts).
* Bars now sit flush against the card bottom (flex column + `margin-top: auto`), matching the original design.

Export CSV button is now wired to `/api/dashboard/export/calls?format=csv` via a transient anchor download.

Backend additions: none. Every change above is in `dashboard-app/` plus the synced `ui.html` rebundles in both SDKs. The pre-publish flow is still `cd dashboard-app && npm run build && npm run sync`.
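The dashboard itself is React, but the bucketing and sparkline rules described above are language-agnostic. The following hypothetical Python sketch illustrates them. It floors timestamps to UTC-aligned multiples of the bucket width (the real UI may align to local time instead, which is an assumption here). It omits the 9-bucket "All" mode, and it scales bars so the tallest one is 100.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

# (bucket width, bucket count) per range filter, as listed in the commit message.
RANGES: Dict[str, Tuple[timedelta, int]] = {
    "1h": (timedelta(minutes=5), 12),
    "24h": (timedelta(hours=1), 24),
    "7d": (timedelta(days=1), 7),
}

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to(ts: datetime, width: timedelta) -> datetime:
    """Floor an aware timestamp to a multiple of `width` (UTC-aligned, an assumption)."""
    return _EPOCH + ((ts - _EPOCH) // width) * width


def bucket_starts(now: datetime, range_key: str) -> List[datetime]:
    width, count = RANGES[range_key]
    last = floor_to(now, width)  # the newest bucket contains `now`
    return [last - width * (count - 1 - i) for i in range(count)]


def sparkline_heights(counts: List[int]) -> List[int]:
    """Scale so the tallest bar is 100; empty buckets stay 0 and render invisible."""
    peak = max(counts, default=0)
    if peak == 0:
        return [0] * len(counts)
    return [round(c * 100 / peak) for c in counts]
```

With `now` at 11:39, the newest 24h bucket starts at 11:00 rather than 11:39, which is what makes tooltips read "11:00 → 12:00".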

New TTS adapter calling Inworld's HTTP NDJSON streaming endpoint `POST https://api.inworld.ai/tts/v1/voice:stream`. Defaults to `inworld-tts-2` (sub-200 ms TTFB, 100+ languages, natural-language voice steering); pass `model: "inworld-tts-1.5-max"` for the prior generation. Default audio output is PCM_S16LE at 16 kHz, so the result feeds straight into the Patter pipeline without transcoding.

Public API parity:
- TS: `import { InworldTTS } from "getpatter"` / `getpatter/tts/inworld`
- Py: `from getpatter import InworldTTS` / `getpatter.tts.inworld`
- Env-var auto-resolve via `INWORLD_API_KEY` (paste the Base64 token from the Inworld dashboard; it is already in `Authorization: Basic <token>` form).
- Optional knobs: `language` (BCP-47), `temperature` (TTS-1.5 only), `speakingRate` (0.5-1.5), `deliveryMode` (`EXPRESSIVE`/`BALANCED`/`STABLE`, TTS-2 only), `bitrate`.

Pricing entry `inworld` added to both pricing tables (placeholder $0.020/1k chars; verify against the current platform tier). Optional dependency `getpatter[inworld]` adds `aiohttp>=3.10`.

7 mocked unit tests per SDK covering payload shape, NDJSON line interleave (`audio, timestamp, audio`), base64 audio decoding, optional field omission, env-var fallback, and non-200 error surfacing.

New files:
- libraries/typescript/src/providers/inworld-tts.ts
- libraries/typescript/src/tts/inworld.ts
- libraries/python/getpatter/providers/inworld_tts.py
- libraries/python/getpatter/tts/inworld.py
- libraries/{typescript,python}/{tests/unit/inworld-tts*.test.*,tests/unit/test_inworld_tts.py}
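A minimal Python sketch of consuming an NDJSON audio stream with interleaved timestamp lines, as exercised by the tests above. The JSON field names (`result`, `audioContent`) are an assumption for illustration; check the Inworld API reference for the actual response schema. `iter_ndjson_audio` is a hypothetical helper, not the adapter's code.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_ndjson_audio(lines: Iterable[str]) -> Iterator[bytes]:
    """Decode audio chunks from an NDJSON TTS stream, skipping non-audio lines.

    ASSUMPTION: audio lines look like {"result": {"audioContent": "<base64>"}};
    timestamp lines carry no audioContent and are ignored.
    """
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank separator lines
        payload = json.loads(line)
        audio_b64 = payload.get("result", {}).get("audioContent")
        if audio_b64:
            yield base64.b64decode(audio_b64)
```

Because each NDJSON line is a complete JSON document, audio can be forwarded chunk by chunk as lines arrive instead of waiting for the whole response.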

Adds seven optional async callbacks to every Patter instance plus a read-only conversation_state snapshot, mirroring the public APIs of LiveKit Agents, Pipecat and OpenAI Realtime so downstream metrics map onto the canonical Hamming AI / Coval / Cekura voice-agent metric set without translation:

- on_user_speech_started: raw VAD positive edge
- on_user_speech_ended: raw VAD trailing edge (not EOU)
- on_user_speech_eos: committed end-of-utterance (canonical "user finished"; anchors eos_to_first_token_ms)
- on_agent_speech_started: first wire-time chunk (what the user hears)
- on_agent_speech_ended: last wire chunk; payload includes interrupted
- on_llm_token: TTFT marker, fires once per turn
- on_audio_out: first TTS chunk per turn (warmup, distinct from wire-time)

Each event also records an OpenTelemetry span event on the current call span (patter.event.*), with gen_ai.* attributes for the LLM event per the OTel GenAI semantic conventions. The OTel branch is a zero-cost no-op when the peer dep is missing.

Wired into the realtime stream handler so the user/agent edge events fire automatically on the OpenAI Realtime + Twilio/Telnyx path; LLM/TTS-warmup events are exposed on the dispatcher for adapter/pipeline integrations.

Public API: SpeechEvents, SpeechEventCallback, ConversationStateSnapshot, UserState, AgentState, EouTrigger.

Tests: 16 unit tests Py + 15 unit tests TS covering payload schema, state transitions, idempotency, OTel attach contract, callback-exception isolation, and Patter-level proxy mirroring. Motivated by patter-agent-runner's 15 turn-taking acceptance verbs that previously auto-skipped because the SDK did not surface per-side speech edges.
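Two of the contracts listed above, callback-exception isolation and the once-per-turn `on_llm_token` TTFT marker, can be illustrated with a small Python dispatcher. `SpeechEventBus`, `emit` and `emit_llm_token` are hypothetical names, not the SDK's `SpeechEvents` API. The sketch omits the OTel span events and conversation-state tracking.

```python
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical callback type: async function receiving an event payload dict.
SpeechCallback = Callable[[Dict[str, Any]], Awaitable[None]]


class SpeechEventBus:
    """Sketch of isolated async event callbacks with a once-per-turn TTFT marker."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, SpeechCallback] = {}
        self._llm_token_turn: Optional[int] = None

    def on(self, event: str, callback: SpeechCallback) -> None:
        self._callbacks[event] = callback

    async def emit(self, event: str, payload: Dict[str, Any]) -> None:
        callback = self._callbacks.get(event)
        if callback is None:
            return  # unsubscribed events cost nothing
        try:
            await callback(payload)
        except Exception:
            # Observer bugs are logged and swallowed so the call itself keeps running.
            logger.exception("speech event callback for %r raised", event)

    async def emit_llm_token(self, turn_id: int, payload: Dict[str, Any]) -> None:
        # on_llm_token is a TTFT marker: report only the first token of each turn.
        if self._llm_token_turn == turn_id:
            return
        self._llm_token_turn = turn_id
        await self.emit("on_llm_token", payload)
```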

…ng for speech-edge events

Three Realtime mode fixes (Python + TypeScript parity) plus the host-binding / observability plumbing required to drive the speech-edge event suite from external test runners.

Realtime: first_message role swap
---------------------------------

Agent.first_message was sent through send_text / sendText, which submits a `conversation.item.create` with role: "user". The model received its own greeting AS user input and replied to it instead of speaking it verbatim. Symptoms ranged from harmless (the model continued as assistant anyway) to catastrophic. In one case a receptionist agent said "Hi! I'd like to schedule a haircut for Friday afternoon", because the model swapped roles after interpreting the greeting as a customer cue.

Fix: new OpenAIRealtimeAdapter.send_first_message / sendFirstMessage that calls `response.create` with explicit `instructions: <text>`. This is the documented OpenAI Realtime path for "have the assistant say exactly this", and it produces the correct role attribution in transcripts (previously assistant turns were missing entirely from onTranscript callbacks). StreamHandler calls the new method via duck-typed lookup, so older adapter builds without it silently fall back to send_text. There is no breaking change for downstream provider implementations.

Dashboard: 404 spam from notify_dashboard
-----------------------------------------

Embedded usages where Patter co-tenants port 8000 with another HTTP server saw 404 access-log spam on every call from notify_dashboard / notifyDashboard fire-and-forget POSTs. The notable case is the agent-to-agent test runner, where the driver SDK, the UUT SDK and the dashboard ingest target all share 127.0.0.1:8000. The send side already swallows errors silently; the noise comes from the receiver's access log. Added a PATTER_DASHBOARD_NOTIFY=0|false|no|off env-var opt-out that skips the POST entirely. Default behaviour unchanged.
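A minimal Python sketch of the first_message fix described above. It builds the Realtime `response.create` client event with the greeting as `response.instructions`, and shows the duck-typed fallback for adapters that lack the new method. `first_message_event` and `send_greeting` are hypothetical helpers. The adapter method names `send_first_message` / `send_text` are taken from the commit message, but their exact signatures are assumed.

```python
import json
from typing import Any


def first_message_event(text: str) -> str:
    """Serialize a Realtime `response.create` event whose instructions are the greeting.

    Unlike `conversation.item.create` with role "user", this never inserts the
    greeting into the conversation as caller input.
    """
    return json.dumps({"type": "response.create", "response": {"instructions": text}})


async def send_greeting(adapter: Any, text: str) -> None:
    # Duck-typed lookup: adapters without send_first_message fall back to send_text.
    send_first = getattr(adapter, "send_first_message", None)
    if callable(send_first):
        await send_first(text)
    else:
        await adapter.send_text(text)
```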
Speech-edge event plumbing (server + telephony bridge + handler)
---------------------------------------------------------------

The speech-edge events shipped in 0de4111 (on_user_speech_started/ended/eos, on_agent_speech_started/ended, llm_token, audio_out) need to flow Patter → EmbeddedServer → twilio_stream_bridge → StreamHandler. The 0de4111 commit wired the StreamHandler init param but missed the 5 hops in between, so external observers attached via Patter() never saw any events. This commit threads `speech_events` through:
- Patter.__init__ stores the SpeechEvents instance
- EmbeddedServer.from_patter passes it down
- twilio_stream_bridge accepts a speech_events kwarg
- OpenAIRealtimeStreamHandler accepts it and forwards it to super().__init__

Pipeline and ConvAI handlers are still TODO (verbs auto-skip when events aren't emitted, so this is non-breaking).

Also added: PipelineStreamHandler._llm_cancel_event is now initialised in __init__ (it was lazily created on first cancel, which was race-condition prone), and a try/except ProcessLookupError now wraps tunnel.proc.terminate (the cloudflared subprocess can exit before SIGTERM lands).

Audio binding host
------------------

PATTER_BIND_HOST env var (default 127.0.0.1). When running inside a Docker container with --publish 8000:8000, a loopback bind is unreachable from the host's port mapping. PATTER_BIND_HOST=0.0.0.0 makes the embedded FastAPI / Express server bind on all interfaces so Docker can forward host:8000 → container:8000.

CHANGELOG entries for the first_message role swap and the notify_dashboard opt-out are included.
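The two environment switches described in this commit (PATTER_DASHBOARD_NOTIFY and PATTER_BIND_HOST) reduce to small lookups. A hypothetical Python sketch follows. Case-insensitive, whitespace-trimmed matching and treating an empty PATTER_BIND_HOST as unset are assumptions. The env var names, the four opt-out values and the 127.0.0.1 default come from the commit message.

```python
import os
from typing import Mapping, Optional

_NOTIFY_OFF_VALUES = {"0", "false", "no", "off"}


def dashboard_notify_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True unless PATTER_DASHBOARD_NOTIFY is 0/false/no/off (case-insensitive, assumed)."""
    env = os.environ if env is None else env
    value = env.get("PATTER_DASHBOARD_NOTIFY")
    if value is None:
        return True  # default behaviour unchanged: notify is on
    return value.strip().lower() not in _NOTIFY_OFF_VALUES


def bind_host(env: Optional[Mapping[str, str]] = None) -> str:
    """PATTER_BIND_HOST, defaulting to loopback; use 0.0.0.0 inside Docker."""
    env = os.environ if env is None else env
    return env.get("PATTER_BIND_HOST") or "127.0.0.1"
```

Passing an explicit mapping keeps both helpers testable without mutating `os.environ`.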

OpenAI Realtime cancel_response now caps audio_end_ms by elapsed wall-clock time (previously a byte counter), fixing post-barge-in re-greeting and mid-sentence resume. Files: providers/openai_realtime.py:434-460 + TS parity in providers/openai-realtime.ts.

Pipeline mode now fires on_transcript for assistant turns AND tool calls (previously emitted only by Realtime). LLMLoop exposes an on_tool_call observer wired by the stream handler via _record_tool_call; a new _emit_assistant_transcript helper pushes history AND fires on_transcript with observer-exception isolation. Files: stream_handler.py / stream-handler.ts, services/llm_loop.py / llm-loop.ts.

AssemblyAI STT (Python): coalesce 20 ms Twilio frames to a 60 ms target (above the v3 50 ms minimum), achieving parity with the TS adapter. New flush_audio() drains the tail on close. Files: providers/assemblyai_stt.py.

Cerebras + Groq pricing: silent under-billing fix. gpt-oss-120b (Cerebras default since 0.5.4) and 5 Groq models were all billed at $0. There are now per-1M-token rates for every CerebrasModel / GroqModel enum. Files: pricing.py / pricing.ts.

TypeScript port of SpeechmaticsSTT, closing a long-standing Python-only gap. It speaks the RT v2 WebSocket protocol directly via ws (no upstream Node SDK) and takes the same options as the Python adapter; the legacy speechmatics() helper now returns a real STTConfig instead of throwing. Files (new): providers/speechmatics-stt.ts, stt/speechmatics.ts.

Pricing tables are now model-aware across STT/TTS/Realtime (previously provider-only). New _resolve_provider_rates helper with longest-prefix fallback; mergePricing does a nested-shallow overlay so single-model overrides leave siblings intact. Auto-threading via CallMetricsAccumulator from the agent adapter's model. Built-in rates for Deepgram, Whisper/Transcribe, ElevenLabs, OpenAI TTS, Cartesia, Rime, LMNT, Inworld, and OpenAI Realtime per model. PRICING_VERSION 2026.2 → 2026.3. The standalone openai_realtime_2 entry is collapsed under openai_realtime.models["gpt-realtime-2"].
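A hypothetical Python sketch of the two pricing behaviours named above: model-specific rate lookup with a longest-prefix fallback, and an overlay that merges one level into a provider's `models` map so that a single-model override leaves sibling models intact. The table shape (provider-level rates plus a nested `models` dict) is an assumption inferred from `openai_realtime.models["gpt-realtime-2"]`. These are not the SDK's `_resolve_provider_rates` / `mergePricing` implementations.

```python
from typing import Any, Dict, Optional


def resolve_rates(table: Dict[str, Any], provider: str, model: Optional[str]) -> Dict[str, Any]:
    """Pick model-specific rates, falling back to the provider-level defaults."""
    entry = table[provider]
    models = entry.get("models", {})
    if model:
        # Longest key that prefixes `model`; an exact key is always the longest match,
        # so a dated snapshot such as "<family>-2025-06-01" resolves to "<family>".
        matches = [key for key in models if model.startswith(key)]
        if matches:
            return models[max(matches, key=len)]
    return {key: value for key, value in entry.items() if key != "models"}


def merge_pricing(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay overrides without mutating `base`, merging one level into `models`."""
    merged = {provider: dict(entry) for provider, entry in base.items()}
    for provider, override in overrides.items():
        target = merged.setdefault(provider, {})
        for key, value in override.items():
            if key == "models":
                # Single-model overrides leave sibling model entries intact.
                target["models"] = {**target.get("models", {}), **value}
            else:
                target[key] = value
    return merged
```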
CircuitBreakerOptions.cooldown_s → cooldown_ms (Python parity with TS cooldownMs). A backward-compat shim emits DeprecationWarning. Files: tools/circuit_breaker.py, tools/tool_executor.py.

The OpenAI Realtime engine wrapper now forwards reasoning_effort and input_audio_transcription_model to the underlying adapter (they were silently dropped). Files: engines/openai.py / engines/openai.ts, models.py, client.py, stream_handler.py, server.ts.

CHANGELOG.md: full Unreleased entries for each fix. package-lock.json regenerated to match package.json (mcp/hono/ajv/jose/zod additions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
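A minimal Python sketch of a rename shim like the one described above. The deprecated `cooldown_s` is still accepted, a `DeprecationWarning` is emitted, and an explicit `cooldown_ms` wins when both are given. The dataclass layout is an assumption, not the shipped `CircuitBreakerOptions`. The 30_000 ms and threshold-5 defaults come from the test description in this PR.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircuitBreakerOptions:
    """Hypothetical sketch of the cooldown_s -> cooldown_ms rename with a shim."""

    threshold: int = 5
    cooldown_ms: Optional[int] = None
    cooldown_s: Optional[float] = None  # deprecated alias, kept for old callers

    def __post_init__(self) -> None:
        if self.cooldown_s is not None:
            warnings.warn(
                "cooldown_s is deprecated; use cooldown_ms",
                DeprecationWarning,
                stacklevel=3,  # point at the caller's constructor call
            )
            if self.cooldown_ms is None:
                # Seconds shim: convert only when the new field was not given.
                self.cooldown_ms = round(self.cooldown_s * 1000)
        if self.cooldown_ms is None:
            self.cooldown_ms = 30_000  # default matching TypeScript's cooldownMs
```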

Cerebras + Groq pricing: 5 cases each in test_pricing.py and pricing.test.ts (full enum coverage, default model billing, specdec-vs-versatile rate distinction, deprecating llama3.1-8b).

AssemblyAI buffering: new test_assemblyai_stt_buffering.py and assemblyai-stt-buffering.test.ts (5 cases each): 10×20 ms frame coalescing, 60 ms target sanity for mulaw 8 kHz and PCM s16le 16 kHz, flush_audio tail drain, pre-connect silent drop, empty-chunk noop, exact-target single send.

Pipeline on_tool_call observer: 3 mocked Python cases plus a new TS describe block in llm-loop.test.ts asserting that one user turn → one tool call yields exactly three on_transcript events (call + result + assistant), in order.

OpenAI Realtime cancel_response wall-clock cap: regression test simulating 2 s generated / 30 ms played, asserting audio_end_ms <= 200 ms.

Realtime engine wrapper forwarding: asserts that reasoning_effort and input_audio_transcription_model reach the adapter constructor when set and are omitted from kwargs when unset.

CircuitBreaker cooldown rename: 4 regression tests cover the DeprecationWarning, seconds-shim parity, explicit ms wins, and defaults matching TypeScript byte-for-byte (30_000 ms, threshold 5).

Speechmatics TS port: 21-test mocked suite covering the connect handshake, StartRecognition payload, partial vs final translation, error frame propagation, EndOfStream close path with last_seq_no, and env-var resolution.

MCP client: new mocked unit suites (Python + TypeScript).

Dashboard HTML test: assertion updated for the React+Vite SPA shell (`<title>Patter | Dashboard</title>`) instead of the legacy "Live calls dashboard" subtitle that no longer lives in the static HTML.

Full suite: Python 1707 / TypeScript 1381, green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
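The AssemblyAI buffering behaviour exercised by these tests can be sketched in Python as follows. It coalesces 20 ms carrier frames into 60 ms chunks, drops frames before connect, treats empty chunks as a no-op, and drains the tail on flush. `FrameCoalescer` and `target_bytes` are hypothetical names. Sending exact target-sized slices (rather than the whole buffer once it passes the target) is an assumption.

```python
from typing import Callable

TARGET_MS = 60  # above AssemblyAI v3's 50 ms minimum, per the commit message


def target_bytes(sample_rate_hz: int, bytes_per_sample: int, target_ms: int = TARGET_MS) -> int:
    # e.g. mu-law 8 kHz -> 480 bytes, PCM s16le 16 kHz -> 1920 bytes per 60 ms
    return sample_rate_hz * bytes_per_sample * target_ms // 1000


class FrameCoalescer:
    """Hypothetical sketch: buffer small carrier frames and send target-sized chunks."""

    def __init__(self, chunk_bytes: int, send: Callable[[bytes], None]) -> None:
        self._chunk_bytes = chunk_bytes
        self._send = send
        self._buf = bytearray()
        self.connected = False

    def push(self, frame: bytes) -> None:
        if not self.connected or not frame:
            return  # pre-connect frames are dropped; empty frames are a no-op
        self._buf.extend(frame)
        while len(self._buf) >= self._chunk_bytes:
            self._send(bytes(self._buf[: self._chunk_bytes]))
            del self._buf[: self._chunk_bytes]

    def flush(self) -> None:
        # Drain the sub-target tail on close so the last words are not lost.
        if self.connected and self._buf:
            self._send(bytes(self._buf))
            self._buf.clear()
```

Ten 20 ms mu-law frames (10 × 160 bytes) yield three 480-byte sends, and `flush()` then sends the remaining 160-byte tail.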

Adds 33 new provider reference pages — full parity between docs/python-sdk/providers/ and docs/typescript-sdk/providers/ for: anthropic, cartesia-tts, cerebras, deepfilternet-filter, deepgram, elevenlabs-convai, elevenlabs-tts, gemini-live, google, groq, inworld, krisp-filter, openai-realtime, openai-tts, silero-vad, soniox, speechmatics, telnyx-stt, telnyx-tts, ultravox-realtime, whisper. Adds new MCP integration page (Py + TS) covering server configuration, tool exposure, and lifecycle. Updates 22 existing docs/{python,typescript}-sdk pages for the Phase 3 SDK changes — agents, call-logging, configuration, dashboard, engines, events, features, metrics, stt, tools — including the new pricing model-aware fields, on_tool_call observer, Realtime engine wrapper forwarding, and circuit-breaker cooldown_ms rename. docs.json navigation updated for the new provider sections. docs/dev-tools/dashboard.mdx refreshed for the React+Vite SPA shell. All pages follow existing Mintlify conventions (CodeGroup, ParamField, Note components). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Strip trailing blank lines on 4 files flagged by `end-of-file-fixer` pre-commit hook: `dashboard-app/src/App.tsx`, `docs/python-sdk/events.mdx`, `docs/typescript-sdk/events.mdx`, `libraries/typescript/src/audio/aec.ts`. - `tests/unit/test_aec.py` now uses `pytest.importorskip("numpy")` instead of a top-level `import numpy as np`. numpy ships only with the SDK `[aec]` / `[audio-filters]` extras, so the base CI Python SDK Tests job was failing collection with `ModuleNotFoundError: No module named 'numpy'`. The `[all-extras]` job installs numpy and exercises these tests for real. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…s, chunk_size

The HTTP-streaming ElevenLabs facade had a narrower `__init__` signature than the underlying `providers.elevenlabs_tts.ElevenLabsTTS` provider, accepting only api_key/voice_id/model_id/output_format. Users who built a TTS via the public facade silently lost language-aware synthesis and could not pass voice_settings. Multilingual scenarios in the agent-to-agent acceptance suite (`feat_italian_language_live`) crashed downstream once the runner started forwarding `language_code` correctly to the public class.

Python `libraries/python/getpatter/tts/elevenlabs.py:42-55`: extended __init__ with language_code (default None), voice_settings (default None), chunk_size (default 4096). Backward-compatible: existing constructor calls run unchanged.

TypeScript `libraries/typescript/src/tts/elevenlabs.ts:6-17,50-58`: added languageCode + voiceSettings to ElevenLabsTTSOptions, marked the fields `readonly` to match the project immutability rule, and switched the super-class call to the options-object overload so the optional fields actually reach the underlying ElevenLabsTTS adapter (the positional overload was dropping them).

7 new regression tests in `libraries/python/tests/unit/test_tts_facade_language.py` and `libraries/typescript/tests/tts-facade-language.test.ts` covering: language_code forwarding, voice_settings forwarding, default preservation, env-key resolution, explicit-key wins, missing-key error, and forTwilio() carrier factory regression. Both suites pass clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the Unreleased block accumulated across the refactor wave (58 prior commits on this branch since 0.5.4 — Realtime fixes, pipeline observability, Mintlify docs parity, ElevenLabs facade fix, model-aware pricing, Cerebras/Groq billing fix, AssemblyAI buffering, MCP support, Speechmatics TS port). Adds an empty `## Unreleased` placeholder for post-0.6.0 work. Versions in libraries/python/getpatter/__init__.py, libraries/python/pyproject.toml, libraries/typescript/package.json were already bumped to 0.6.0 earlier in this branch — this commit only finalises the CHANGELOG date stamp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds `manageWebhook?: boolean` (default `true`) to `ServeOptions`. When set to `false`, `serve()` skips the call to `autoConfigureCarrier`, leaving the carrier's `voice_url` untouched. This closes a hidden footgun for users running the SDK behind a router or gateway whose Twilio webhook is managed externally (Terraform, an edge gateway, a voice-router function in front of the agent): without this opt-out, every boot silently overwrites the externally managed value and bypasses the gating layer.

`tunnel: true` overrides the opt-out: the tunnel hostname is dynamic and only known at runtime, so the carrier MUST be reconfigured for inbound calls to land.

Parity note: the Python SDK has never auto-configured carriers, so `manageWebhook: false` brings the TS SDK in line with default Python behaviour. No Python change required.

Files:
- `libraries/typescript/src/types.ts`: `ServeOptions.manageWebhook?` plus a full doc comment.
- `libraries/typescript/src/client.ts:507-518`: gated the `autoConfigureCarrier` call on `wantsCarrierManagement = opts.manageWebhook !== false || wantsCloudflared`.
- `libraries/typescript/tests/unit/client.test.ts`: 3 authentic tests under `serve() > manageWebhook opt-out` that swap `globalThis.fetch` to capture Twilio API URLs (no mock-on-mock).
- `docs/typescript-sdk/local-mode.mdx`: added a `manageWebhook` row to the ServeOptions table.
- `CHANGELOG.md`: entry under 0.6.0 (2026-05-08) > Added.

Backward compat: the default `true` preserves existing behaviour byte-for-byte; no new required field, no changed default. Authentic tests pass (31/31 in the TS client.test.ts file). tsc --noEmit clean. The Docs/DEVLOG entry from upstream PR #84 was intentionally not ported (development-log style, not meant for the public docs).

Original PR: #84

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
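The gating rule for carrier management reduces to a single boolean expression. A minimal sketch of that rule, assuming a `tunnel` flag stands in for the cloudflared request; the interface and function below are hypothetical illustrations, not the SDK's `serve()` implementation:

```typescript
// Hypothetical sketch of the manageWebhook gate; not the SDK's real code.
// Treating `tunnel === true` as "wants cloudflared" is an assumption.
interface ServeOptionsSketch {
  /** Default true. When false, leave the carrier's voice_url untouched. */
  readonly manageWebhook?: boolean;
  /** When true, a tunnel with a runtime-only hostname is started. */
  readonly tunnel?: boolean;
}

function wantsCarrierManagement(opts: ServeOptionsSketch): boolean {
  const wantsCloudflared = opts.tunnel === true;
  // `!== false` (rather than `=== true`) keeps the default, undefined,
  // meaning "manage the webhook", so existing callers see no change.
  // A tunnel always forces reconfiguration: its hostname is only known at runtime.
  return opts.manageWebhook !== false || wantsCloudflared;
}
```

Under this rule, `{}` and `{ manageWebhook: true }` configure the carrier, `{ manageWebhook: false }` leaves it alone, and `{ manageWebhook: false, tunnel: true }` still configures it, because an externally managed webhook cannot know a hostname that only exists at runtime.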

nicolotognoni added 24 commits May 1, 2026 09:05

docs(claude.md): reflect new telephony/audio/tools layout

ae9a98f

Update per-library AI-agent quickstarts to match the post-restructure package tree. Adds the new folder names (telephony/, audio/, tools/) and a one-line description per folder.

docs(changelog): document internal layout reorg in Unreleased section

87427b4

docs(getpatter): fill missing docstrings in services/llm/tts/stt/observability/dashboard/top-level

6f26140

docs(getpatter-ts): fill missing JSDoc in providers/telephony/audio/tools

0ccfcfb

docs(changelog): note docstring/JSDoc sweep in Unreleased

1ce232e

mintlify Bot deployed to staging - docs May 2, 2026 22:24

nicolotognoni added 3 commits May 3, 2026 00:28

docs: document ErrorCode enum and disable_phone_preamble

012c145

docs: document PricingUnit, OpenAITTS targetSampleRate, WS carrier auto-detect, and LLM primitives

5ccfb73

docs: document SentenceChunker language, provider enums, and audio helpers

e1b5d08

mintlify Bot deployed to staging - docs May 2, 2026 22:34

nicolotognoni and others added 19 commits May 4, 2026 14:49

chore: black reformatting of test_dashboard.py

c7474c5

Pure formatter pass — splits long argument lists across multiple lines and adds the missing blank line after the conditional ``import fastapi``. No logic changes; the test still verifies the dashboard store and routes the same way.

mintlify Bot deployed to staging - docs May 8, 2026 16:01

mintlify Bot deployed to staging - docs May 8, 2026 16:05

nicolotognoni and others added 3 commits May 8, 2026 19:56

mintlify Bot deployed to staging - docs May 8, 2026 21:16

This was referenced May 8, 2026

feat(observability): emit patter.cost.* and patter.latency.* OTel span attributes #82

Closed

feat(sdk-ts): add manageWebhook opt-out to ServeOptions #84

Closed

nicolotognoni merged commit 2d47d7d into main May 8, 2026
14 checks passed

github-actions Bot deleted the refactor/repo-cleanup-and-restructure branch May 9, 2026 06:17


refactor: repo cleanup, internal restructure, parity ports + bug-fix wave #83

nicolotognoni merged 59 commits into main from refactor/repo-cleanup-and-restructure

nicolotognoni commented May 2, 2026


mintlify Bot commented May 2, 2026 (edited)


