refactor: repo cleanup, internal restructure, parity ports + bug-fix wave #83

Merged
nicolotognoni merged 59 commits into main from refactor/repo-cleanup-and-restructure
May 8, 2026

@nicolotognoni

Summary

Repository cleanup, internal restructure, and bug-fix wave. Most changes are internal hardening; the breaking surfaces are the package-tree reorganisation and the Agent.provider field type tightening — both flagged below with migration guidance.

  • 23 commits, 470 files touched (+6.9k / -2.8k lines)
  • 49 originally-tracked tasks closed; 5 parity gaps closed in a follow-up commit
  • Tests: Python 1538 passed, TypeScript 1224 passed, tsc --noEmit clean

Implementation

Cleanup

  • Strip every competitor license header (LiveKit / Pipecat / Apache, etc.) from inherited source files. New rule .claude/rules/no-competitor-references.md codifies the policy going forward.
  • Per-SDK .env.example files grouped by role (telephony, LLM, STT, TTS).
  • Root and per-SDK LICENSE updated to "Copyright (c) 2026 Patter Contributors".
  • Drop in-repo examples/, root tests/, playwright.config.ts + tests/e2e/* (E2E lives in downstream test repo), unused Dockerfile bits.
  • Delete the unused DEVLOG rule (docs/DEVLOG.md was never created; CHANGELOG and commit messages already cover the same ground).

Provider enums

Every voice / LLM / STT / TTS provider that historically accepted hardcoded model / voice / format strings now exposes a typed enum (StrEnum / IntEnum in Python, const-object + value-union in TypeScript). String form is still accepted for backward compatibility. Coverage: AssemblyAI, Cartesia (STT/TTS), Cerebras, Deepgram, ElevenLabs (HTTP/WS), Gemini Live, Google, Groq, LMNT, OpenAI (Realtime/TTS), Rime, Silero, Soniox, Speechmatics, Telnyx (STT/TTS), Ultravox, Whisper, Anthropic, Krisp.

Error taxonomy

Add an ErrorCode enum with 10 stable values (CONFIG, CONNECTION, AUTH, TIMEOUT, RATE_LIMIT, WEBHOOK_VERIFICATION, INPUT_VALIDATION, PROVIDER_ERROR, PROVISION, INTERNAL) attached as a default .code attribute on every PatterError subclass. Class-name catches keep working; the enum lets downstream code branch on a stable code instead of class-name strings. Re-exported from both package roots.

Pricing enum

PricingUnit enum (MINUTE / THOUSAND_CHARS / TOKEN) replaces free-form unit strings on every ProviderPricing row. The TypeScript server back-compat shim still accepts plain strings on the wire.

Repo + internal layout

  • sdk-py/ → libraries/python/, sdk-ts/ → libraries/typescript/ (mcp-use-style layout). Top-level user-facing imports are unchanged (pip install getpatter, npm install getpatter).
  • Agent.provider typed as a closed enum ProviderMode = "openai_realtime" | "elevenlabs_convai" | "pipeline" (TypeScript mirrors). Free-form strings now error.
  • Internal modules grouped into telephony/, audio/, tools/ subfolders consistently across both SDKs. The tts/ and stt/ namespaces stay (they expose getpatter.{tts,stt}.<provider>.{TTS,STT} with env-var auto-resolve and are real public API).

Bug-fix wave (16 fixes)

  • Barge-in now cancels in-flight LLM stream (asyncio.Event Py / AbortController TS) — previously LLM kept generating tokens after user interrupt.
  • TS llm-loop HTTP errors now throw PatterConnectionError — was silently returning empty stream.
  • Cache token attribution: emit key is cache_read_tokens (was cache_read_input_tokens — silently zero attribution on OpenAI/Google).
  • Auto-prepend phone preamble to system_prompt (opt-out via disable_phone_preamble).
  • text_transforms.py: precompile 14 markdown regexes (was: per-call recompile).
  • TS defineTool → tool (rename to match Python).
  • Py event_bus listener errors now logged (were swallowed).
  • Py SSRF webhook URL validator + per-IP WS cap (parity with TS).
  • Py voicemail POST: explicit 10 s timeout (was unbounded).
  • Py metrics.agent_response_ms accepts 0.0 (was treated as missing via truthy check).
  • TS metrics emits llm/stt/tts_ttfb_ms events on the EventBus (Py parity).
  • TS queryTelephonyCost catch logs (was silent return).
  • OpenAI TTS opt-in target_sample_rate=8000 collapses 24k→16k+16k→8k chain (perf, opt-in).
  • ElevenLabs WS TTS auto-flips output_format=ulaw_8000 when paired with Twilio.
  • Sentence chunker: per-language honorifics (en/it/es/de/fr/pt) + single-word "Yes."/"Done!" flush.
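
The `agent_response_ms` fix above comes down to replacing a truthiness check with an explicit `is None` gate. A minimal sketch of the difference, assuming a hypothetical `format_latency` helper that is not part of the SDK:

```python
from typing import Optional


def format_latency(agent_response_ms: Optional[float]) -> str:
    """Treat only None as missing; 0.0 is a valid measurement."""
    # A truthy check (`if not agent_response_ms`) would wrongly drop 0.0 here.
    if agent_response_ms is None:
        return "n/a"
    return f"{agent_response_ms:.1f} ms"
```

With the `is None` gate, a measured `0.0` is reported as a value instead of being folded into the "missing" branch.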

Parity ports (Python ← TypeScript)

  • DefaultToolExecutor — standalone tool dispatcher.
  • LLMChunk — streaming LLM output chunk.
  • builtin_clip_path — background-audio clip path resolver.
  • select_sound_from_list — weighted random selection helper.
  • resample_24k_to_16k one-shot stateless helper.

Documentation sweep

Every public module, class, function, method, interface, and exported type in both SDKs now has at least a one-line description. Pre-existing docstrings were left untouched. ~75 Python docstrings + ~290 TypeScript JSDoc blocks added across 100+ files. No behaviour changes.

Breaking changes

  1. Agent.provider is now a closed enum — passing arbitrary strings (e.g. typo "openai-realtime") errors. Migration: use one of "openai_realtime", "elevenlabs_convai", "pipeline".
  2. Internal import paths have moved for callers reaching into getpatter.handlers.* / getpatter.services.{transcoding,pcm_mixer,background_audio,tool_decorator,tool_executor}. Migration:
    • getpatter.handlers.{twilio,telnyx}_handler → getpatter.telephony.{twilio,telnyx}
    • getpatter.handlers.common → getpatter.telephony.common
    • getpatter.handlers.stream_handler → getpatter.stream_handler (top-level)
    • getpatter.services.{transcoding,pcm_mixer,background_audio} → getpatter.audio.{...}
    • getpatter.services.{tool_decorator,tool_executor} → getpatter.tools.{...}
    • TypeScript src/carriers/{twilio,telnyx} → src/telephony/{twilio,telnyx}; top-level transcoding.ts and tool-decorator.ts → audio/transcoding.ts and tools/tool-decorator.ts.
  3. The unused DEVLOG workflow rule is removed; CHANGELOG.md and commit messages cover documentation needs going forward.

Public top-level imports (from getpatter import ... / import { Patter } from "getpatter") are unchanged.

Test plan

  • Python: pytest libraries/python/tests/ — 1538 passed, 8 skipped
  • TypeScript: npm test (vitest) — 1224 passed across 68 files
  • TypeScript: npm run lint (tsc --noEmit) — clean
  • No competitor lineage scan flags
  • Feature inventory (patter-assets/patter_sdk_features.xlsx) updated with 17 new rows

Docs updates

  • CHANGELOG.md — full Unreleased section
  • libraries/{python,typescript}/CLAUDE.md — folder layout updated
  • .claude/rules/README.md — DEVLOG rule entry removed
  • docs/ (Mintlify) — to be aligned by docs-sync agent post-merge

Repo-wide pass to remove external license headers, "ported from" notes
and competitor product names from source files, plus three runtime fixes
and one missing best-practice feature surfaced by the audit.

## Cleanup (zero residual livekit/pipecat/apache references)

- Removed Apache 2.0 header blocks from 12 Python + 12 TypeScript provider
  files (the headers travelled in from external ports; Patter ships under
  the root MIT LICENSE only — no per-file copyright notices).
- Stripped "Adapted from livekit-plugins-X" / "Ported from pipecat" /
  "Based on LiveKit Agents" provenance comments across ~40 source files
  in sdk-py/getpatter/{services,providers,observability,resources,evals}/
  and sdk-ts/src/{services,providers,observability}/, including the
  cartesia-stt USER_AGENT integration tag.
- Rewrote competitor framing in 12 docs MDX pages (provider docs,
  patter-tool, call-logging) — descriptions now stand on Patter's own
  shape, no migration-from-X language.
- Renamed test fixtures and variables that named LiveKit/Pipecat in
  sentence_chunker tests (Py + TS) and the parity scenario JSON;
  test logic preserved.
- Removed personal-name copyright in LICENSE / sdk-py/LICENSE /
  sdk-ts/LICENSE in favour of "Patter Contributors".
- .gitignore: ignore .ruff_cache/, sdk/ (legacy build dir from the
  pre-rename Python SDK), .agents/, skills-lock.json.

## Bug fixes

- llm_loop.py:420-421 (Python): cache_read_input_tokens /
  cache_creation_input_tokens were Anthropic-style keys, but every
  Python provider emits cache_read_tokens / cache_write_tokens. Fix
  reads the keys the providers actually emit, so OpenAI / Google
  cache attribution is no longer silently zeroed.
- llm-loop.ts:304-308 (TS): non-OK upstream HTTP responses were logged
  and silently swallowed; callers couldn't distinguish empty model
  output from API failure. Now throws PatterConnectionError with the
  status + truncated body.

## Performance

- text_transforms.py: precompiled the 14 markdown regex patterns and 2
  emoji-cleanup helpers as module-level constants — they previously
  recompiled on every sentence flush. Drop-in win, public API and
  37/37 existing tests unchanged.
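
A minimal sketch of the module-level precompile pattern, assuming a hypothetical three-pattern subset and hypothetical names (the 14 real patterns in `text_transforms.py` are not reproduced here):

```python
import re

# Hypothetical subset of markdown patterns, compiled once at import time
# instead of on every sentence flush.
_BOLD_RE = re.compile(r"\*\*(.+?)\*\*")
_INLINE_CODE_RE = re.compile(r"`([^`]+)`")
_HEADING_RE = re.compile(r"^#{1,6}\s+", re.MULTILINE)


def strip_markdown(text: str) -> str:
    """Remove a few markdown constructs using the precompiled patterns."""
    text = _BOLD_RE.sub(r"\1", text)
    text = _INLINE_CODE_RE.sub(r"\1", text)
    return _HEADING_RE.sub("", text)
```

Because the compiled objects live at module scope, each call only pays for matching, which is why the change can stay invisible to the public API.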

## Feature: default phone-call preamble

- New Agent.disable_phone_preamble (Py) / disablePhonePreamble (TS)
  field, default False. When False, LLMLoop prepends a short
  spoken-language preamble to system_prompt instructing the model to
  avoid markdown / emojis / bullet lists and keep replies concise.
- Wired through stream_handler and test_mode in both SDKs.
- Adds two Py tests and one TS test covering the new behaviour.
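
The opt-out logic can be sketched as follows. The preamble wording, the reduced `Agent` dataclass, and the `build_system_prompt` helper are assumptions for illustration; the PR does not show the actual text or the place inside LLMLoop where the prepend happens.

```python
from dataclasses import dataclass

# Hypothetical wording; the actual preamble text is not shown in this PR.
PHONE_PREAMBLE = (
    "You are speaking on a phone call. Avoid markdown, emojis and bullet "
    "lists, and keep replies concise."
)


@dataclass
class Agent:
    """Reduced stand-in for the SDK's Agent dataclass."""
    system_prompt: str
    disable_phone_preamble: bool = False


def build_system_prompt(agent: Agent) -> str:
    """Prepend the preamble unless the agent opted out."""
    if agent.disable_phone_preamble:
        return agent.system_prompt
    return f"{PHONE_PREAMBLE}\n\n{agent.system_prompt}"
```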

## Test status

- Python: 1466 passed, 8 skipped
- TypeScript: 1164/1164 passed
- sdk-py/.env.example, sdk-ts/.env.example: full inventory of every
  env var the SDK reads at runtime, grouped by role (telephony, LLM
  providers, STT, TTS, tracing, Patter tunables). Only OPENAI_API_KEY
  + a telephony carrier is required; the rest are uncommented as the
  user enables specific provider integrations.
- .env.example.cloud removed — variables (PATTER_DATABASE_URL,
  PATTER_ENCRYPTION_KEY, PATTER_REDIS_URL, etc.) belonged to the
  hosted cloud surface that was retired in 0.5.3.
- Root .env.example kept as a minimal quickstart sample.
Replace the magic strings ``"minute"`` / ``"1k_chars"`` / ``"token"``
sprinkled across DEFAULT_PRICING with a named enum, so the pricing
table reads as a typed shape rather than free-form dicts.

- Python: ``PricingUnit(StrEnum)`` — ``MINUTE``, ``THOUSAND_CHARS``,
  ``TOKEN``. Subclassing ``str`` keeps the dict JSON-serialisable and
  unchanged for any consumer that compares against the literal string.
- TypeScript: ``PricingUnit`` const object + ``PricingUnitValue`` union
  type. ``ProviderPricing.unit`` accepts ``PricingUnitValue | string``
  so user overrides loaded from JSON / env config still flow through
  ``mergePricing`` without type gymnastics.
- Behaviour preserved end-to-end: 143 Python pricing/metrics tests pass,
  18 TypeScript pricing tests pass, full suites 1466 Py / 1164 TS green.
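
A small sketch of why the `str` subclass keeps existing consumers working. It uses the `(str, Enum)` mixin rather than `StrEnum` so it also runs on Python versions before 3.11; the pricing row and its price are made up.

```python
import json
from enum import Enum


class PricingUnit(str, Enum):
    """The str mixin keeps members equal to, and serialisable as, plain strings."""
    MINUTE = "minute"
    THOUSAND_CHARS = "1k_chars"
    TOKEN = "token"


# Hypothetical pricing row; the price is illustrative only.
row = {"unit": PricingUnit.MINUTE, "price": 0.01}

# Comparisons against the old literal strings still succeed.
matches_literal = row["unit"] == "minute"

# json.dumps emits the member's string value, so the wire format is unchanged.
encoded = json.dumps(row)
```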
…xamples

mcp-use-style monorepo layout: each SDK gets its own library folder with
README, CLAUDE.md, .env.example, tests/, and the package source. Sample
code is maintained in separate example repos and is no longer tracked
here (notebooks tutorial preserved — it's the documentation, not an
example).

## Layout

```
libraries/
├── python/         (was sdk-py/)
│   ├── README.md, CLAUDE.md, LICENSE, .env.example
│   ├── pyproject.toml, pytest.ini
│   ├── getpatter/
│   └── tests/
└── typescript/     (was sdk-ts/)
    ├── README.md, CLAUDE.md, LICENSE, .env.example
    ├── package.json, tsconfig.json, vitest.config.ts, tsup.config.ts
    ├── src/
    └── tests/
```

## What changed

- 405 ``git mv`` renames so history follows every file. ``sdk-py/`` and
  ``sdk-ts/`` no longer exist on disk.
- Per-library CLAUDE.md guides (~40 lines each); .gitignore exception
  ``!libraries/*/CLAUDE.md`` so the library guides ARE tracked while the
  root guide stays ignored.
- CI: ``.github/workflows/{audit,release,test,docs-feature-drift}.yml``
  rewritten to the new paths. ``scripts/check_feature_docs_drift.py``
  also fixed (it had a stale ``patter/__init__.py`` from the pre-rename
  era).
- Pre-commit, pre-push, ``scripts/pr-validate.sh``, top-level README and
  CONTRIBUTING.md re-pointed at ``libraries/{python,typescript}``.
- Internal package re-organisation (``handlers → telephony``, splitting
  ``audio/``, ``tools/``) deliberately deferred to a follow-up PR — that
  layer of import-path churn doesn't belong in the same commit as the
  outer rename.

## Examples

``examples/{developer,enterprise,startup,integrations}/`` removed (24
files + the index README). Sample code is published in dedicated repos.
``examples/notebooks/`` kept — it's the 24-notebook tutorial series
documented in the Mintlify site and depended on by
``.github/workflows/notebooks.yml`` and ``.pre-commit-config.yaml``.

PatterTool docs now point at the external example repo (TODO comment
left for the canonical URL — to fill in once the public examples repo is
public).

## Test status

- Python: 1413 passed, 6 skipped (pytest libraries/python/tests)
- TypeScript: 1164 passed, 67 files (vitest run libraries/typescript)
- TypeScript: ``tsc --noEmit`` clean (one pre-existing
  ``@ts-expect-error`` in silero-vad — predates this branch)
Wave 2 of the cleanup pass — covers half of the provider integrations.
Replaces hardcoded model/voice/format/sample-rate strings with typed
enums (Python ``StrEnum`` / ``IntEnum``, TypeScript ``const`` objects +
union types) so user code gets autocomplete and the type system catches
typos at the call site instead of at the provider's HTTP 400.

## Agent / public types

- ``Agent.provider`` (Python) tightened from ``str`` to a
  ``ProviderMode = Literal["openai_realtime", "elevenlabs_convai",
  "pipeline"]`` alias. TS counterpart was already a string union.
- Expanded ``Agent`` (Py) and ``AgentOptions`` (TS) docstrings to
  document the precedence rule for fields that appear both on the agent
  and on the engine adapter (``voice``, ``model``, ``language``):
  explicit kwarg on ``agent()`` wins; otherwise the engine value
  populates the agent via ``_unpack_engine``; otherwise the default.
- No behaviour change. ``StrEnum`` subclasses ``str``; existing callers
  passing raw strings keep working.
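
For illustration, a `Literal` alias like the one described can also drive a runtime check. The `validate_provider` helper and the choice of `ValueError` are assumptions; the PR does not say how or where the SDK rejects unknown values.

```python
from typing import Literal, get_args

ProviderMode = Literal["openai_realtime", "elevenlabs_convai", "pipeline"]


def validate_provider(value: str) -> str:
    """Hypothetical runtime check mirroring the static Literal alias."""
    allowed = get_args(ProviderMode)
    if value not in allowed:
        raise ValueError(f"provider must be one of {allowed}, got {value!r}")
    return value
```

A type checker flags a typo such as `"openai-realtime"` at the call site, and a check of this shape would catch the same typo for untyped callers.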

## Providers covered

Python: ``anthropic_llm``, ``cartesia_tts``, ``cerebras_llm``,
``deepgram_stt``, ``elevenlabs_tts``, ``google_llm``, ``groq_llm``,
``lmnt_tts``, ``openai_realtime``, ``rime_tts``.

TypeScript: ``anthropic-llm``, ``cerebras-llm``, ``deepgram-stt``,
``elevenlabs-tts``, ``google-llm``, ``groq-llm``, ``lmnt-tts``,
``openai-realtime``, ``rime-tts``.

Each module now exposes its own ``<Provider>Model`` /
``<Provider>OutputFormat`` / ``<Provider>Voice`` / etc. New enums are
re-exported from ``__init__.py`` and ``index.ts`` in dedicated
"provider-specific enums" sections.

## Still pending

The following providers still hold magic strings — covered in a
follow-up commit:
``assemblyai_stt``, ``soniox_stt``, ``speechmatics_stt``,
``cartesia_stt``, ``telnyx_stt``, ``whisper_stt``,
``elevenlabs_ws_tts``, ``openai_tts``, ``telnyx_tts``, ``gemini_live``,
``ultravox_realtime``, ``silero_vad``, ``silero_onnx``, ``krisp_*``.
The TS ``cartesia-tts.ts`` enums also still need to land (Py is
already covered).

## Test status

- Python: 1466 passed, 8 skipped
- TypeScript: 1164/1164 passed; ``tsc --noEmit`` clean (one pre-existing
  silero-vad warning unchanged)
Provider enum residuals (Wave 2.5)
- Python: assemblyai_stt, cartesia_stt, soniox_stt, speechmatics_stt,
  telnyx_stt, whisper_stt, elevenlabs_ws_tts, openai_tts, telnyx_tts,
  gemini_live, ultravox_realtime, silero_vad, silero_onnx, krisp_*
- TypeScript: assemblyai-stt, cartesia-stt, cartesia-tts, soniox-stt
- All hardcoded model/voice/format strings now live behind StrEnum/IntEnum
  (Python) or const-object + value union (TypeScript)

Bug fixes (Wave 3a)
- stream_handler: barge-in now sets asyncio.Event / AbortController to
  cancel in-flight LLM stream instead of letting it run to completion
- server (Py): SSRF validator on outbound webhook URLs + per-IP WS cap
  (MAX_WS_PER_IP=10) for parity with TS
- server (Py): voicemail POST gets explicit 10s timeout
- metrics (Py): agent_response_ms accepts 0.0 instead of treating it as
  "missing" (use is None gate)
- metrics (TS): emit llm/stt/tts TTFB events on the event bus
- observability/event_bus (Py): listener errors now surface to logger
  instead of being swallowed
- server (TS): queryTelephonyCost catch logs instead of silent return
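
The barge-in fix in the list above can be sketched in Python as an `asyncio.Event` that the consumer checks between tokens. Everything here (`fake_llm_stream`, `speak`, the `on_token` hook that simulates the interrupt) is a hypothetical stand-in, not the stream handler's real code.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Optional


async def fake_llm_stream(tokens: List[str]) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM call."""
    for token in tokens:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token


async def speak(tokens: List[str], cancel: asyncio.Event,
                on_token: Optional[Callable[[str], None]] = None) -> List[str]:
    """Consume the stream until it ends or barge-in sets the event."""
    spoken: List[str] = []
    stream = fake_llm_stream(tokens)
    try:
        async for token in stream:
            if cancel.is_set():
                break  # stop pulling tokens once the caller interrupts
            spoken.append(token)
            if on_token is not None:
                on_token(token)
    finally:
        await stream.aclose()  # release the underlying stream promptly
    return spoken


async def demo() -> List[str]:
    cancel = asyncio.Event()
    heard: List[str] = []

    def on_token(token: str) -> None:
        heard.append(token)
        if len(heard) == 2:  # simulate the caller interrupting after two tokens
            cancel.set()

    return await speak(["Sure,", "I", "can", "help"], cancel, on_token)
```

Running `asyncio.run(demo())` stops after the second token instead of draining the whole stream.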
Stable, machine-readable error codes attached to every Patter exception
class. Existing class-name-based catches keep working; the enum is
additive.

ErrorCode values (10): CONFIG, CONNECTION, AUTH, TIMEOUT, RATE_LIMIT,
WEBHOOK_VERIFICATION, INPUT_VALIDATION, PROVIDER_ERROR, PROVISION,
INTERNAL.

- Python: StrEnum on `exceptions.py`; class-default `code` attribute
  per subclass (PatterError → INTERNAL, PatterConnectionError →
  CONNECTION, AuthenticationError → AUTH, ProvisionError → PROVISION,
  RateLimitError → RATE_LIMIT). Optional `code=` kwarg on the base
  ctor lets callers override per-instance.
- TypeScript: const-object + value union in `errors.ts`; `readonly
  code: ErrorCode` on every class; optional `{ code }` constructor
  option. Same class→code mapping byte-for-byte with Python.
- Both SDKs re-export `ErrorCode` from the package root.
- Test parity asserts the enum value sets match between SDKs.
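
A sketch of how downstream code might branch on the stable code. The member names and class-to-code mapping come from the text above; the string values, the direct inheritance from `PatterError`, and the `should_retry` helper are assumptions. The `(str, Enum)` mixin stands in for `StrEnum` so the sketch runs before Python 3.11.

```python
from enum import Enum
from typing import Optional


class ErrorCode(str, Enum):
    # String values are assumed; the PR lists only the member names.
    CONFIG = "CONFIG"
    CONNECTION = "CONNECTION"
    AUTH = "AUTH"
    TIMEOUT = "TIMEOUT"
    RATE_LIMIT = "RATE_LIMIT"
    WEBHOOK_VERIFICATION = "WEBHOOK_VERIFICATION"
    INPUT_VALIDATION = "INPUT_VALIDATION"
    PROVIDER_ERROR = "PROVIDER_ERROR"
    PROVISION = "PROVISION"
    INTERNAL = "INTERNAL"


class PatterError(Exception):
    code: ErrorCode = ErrorCode.INTERNAL  # class-level default

    def __init__(self, message: str = "", *, code: Optional[ErrorCode] = None) -> None:
        super().__init__(message)
        if code is not None:
            self.code = code  # per-instance override


class PatterConnectionError(PatterError):
    code = ErrorCode.CONNECTION


class RateLimitError(PatterError):
    code = ErrorCode.RATE_LIMIT


def should_retry(exc: PatterError) -> bool:
    """Hypothetical policy: branch on the stable code, not the class name."""
    return exc.code in {ErrorCode.CONNECTION, ErrorCode.TIMEOUT, ErrorCode.RATE_LIMIT}
```

Existing `except PatterConnectionError:` handlers are unaffected; the code is only an additional attribute to branch on.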
Companion to 8b8c503 (test files). Ships the actual enum + class wiring:

- libraries/python/getpatter/exceptions.py — ErrorCode StrEnum, default
  .code per subclass, optional code= kwarg on PatterError.__init__
- libraries/python/getpatter/__init__.py — re-export ErrorCode
- libraries/typescript/src/errors.ts — ErrorCode const-object + value
  union, readonly code on every class, optional { code } ctor option
- libraries/typescript/src/index.ts — re-export ErrorCode
…d with Twilio

ElevenLabs WS TTS streams `ulaw_8000` natively. When the carrier is
Twilio (mulaw 8 kHz), we can let ElevenLabs do the encoding server-side
and skip the SDK-side mulaw transcode entirely.

- ElevenLabsWebSocketTTS.set_telephony_carrier(carrier) / TS
  setTelephonyCarrier(carrier) — duck-typed hook called by the stream
  handler after TTS construction. Maps "twilio" → "ulaw_8000",
  "telnyx" → "pcm_16000" (lowest conversion).
- output_format constructor arg becomes truly optional (sentinel) —
  user-passed format wins over the carrier hint.
- for_twilio / for_telnyx factories already pass explicit formats →
  the carrier hint is a no-op for those callers.
- 7 new unit cases per SDK in TestCarrierAutoFlip / equivalent: default
  flip, URL contains ulaw_8000, telnyx no-op, explicit format respected,
  factory wins, unknown carrier no-op.

No public-API break — existing constructor calls behave identically
when no carrier hook is wired up.
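
The sentinel-plus-hint behaviour can be sketched as below. `WebSocketTTS` is a hypothetical stand-in for `ElevenLabsWebSocketTTS`, and the `pcm_16000` default is an assumption; only the carrier-to-format mapping is taken from the commit message.

```python
_UNSET = object()  # sentinel: distinguishes "not passed" from any real value

# Mapping taken from the commit message above.
_CARRIER_FORMATS = {"twilio": "ulaw_8000", "telnyx": "pcm_16000"}


class WebSocketTTS:
    """Hypothetical stand-in for ElevenLabsWebSocketTTS."""

    DEFAULT_FORMAT = "pcm_16000"  # assumed default, not confirmed by the PR

    def __init__(self, output_format: object = _UNSET) -> None:
        self._explicit = output_format is not _UNSET
        self.output_format = (
            str(output_format) if self._explicit else self.DEFAULT_FORMAT
        )

    def set_telephony_carrier(self, carrier: str) -> None:
        """Apply the carrier hint unless the user chose a format explicitly."""
        if self._explicit:
            return  # user-passed format wins over the carrier hint
        # Unknown carriers leave the current format untouched.
        self.output_format = _CARRIER_FORMATS.get(carrier, self.output_format)
```

Using a sentinel instead of `None` lets the constructor tell "caller did not pass a format" apart from any value the caller might legitimately pass.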
…sample_rate)

OpenAI TTS streams 24 kHz audio. The default 24k→16k resample stays for
the Telnyx (PCM 16 kHz) carrier; for Twilio (mulaw 8 kHz) the chained
24→16 + 16→8 used to cost two ratecv passes. New `target_sample_rate=8000`
constructor opt-in collapses the two passes into a single 3:1 decimation
with a tighter LPF (Nyquist ≈ 4 kHz).

- Python: getpatter.services.transcoding.create_resampler_24k_to_8k()
  factory; OpenAITTS gains optional `target_sample_rate=16000` (default
  preserves existing behaviour).
- TypeScript: createResampler24kTo8k() + 24000→8000 case in
  StatefulResampler; OpenAITTS gains optional positional
  `targetSampleRate=16000` with `LPF_ALPHA_8K=0.45` for proper
  anti-aliasing at 4 kHz Nyquist.

Auto-engagement on Twilio carriers is deferred — the audio sender
currently assumes 16 kHz PCM input. Manual opt-in keeps the change
narrowly scoped.
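
A rough sketch of single-pass 3:1 decimation, assuming a one-pole low-pass filter. The filter type is an assumption; the commit only quotes the `0.45` smoothing factor for the TypeScript side, and the real resampler is stateful across chunks whereas this helper is one-shot.

```python
import array

LPF_ALPHA_8K = 0.45  # smoothing factor quoted in the commit message


def resample_24k_to_8k(pcm16: bytes) -> bytes:
    """One-shot 3:1 decimation of 16-bit mono PCM with a one-pole low-pass.

    Samples are read in native byte order, which matches little-endian PCM
    on little-endian hosts.
    """
    samples = array.array("h")
    samples.frombytes(pcm16)
    out = array.array("h")
    state = 0.0
    for index, sample in enumerate(samples):
        # One-pole IIR low-pass: move the state a fraction toward the input.
        state += LPF_ALPHA_8K * (sample - state)
        if index % 3 == 2:  # keep every third filtered sample
            out.append(max(-32768, min(32767, int(round(state)))))
    return out.tobytes()
```

One 20 ms frame at 24 kHz (480 samples) yields 160 samples at 8 kHz, so the output is a third of the input length in a single pass.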
Bug #48 — per-language honorifics
- New HONORIFICS_{EN,IT,ES,DE,FR,PT} constants merged into HONORIFICS_ALL
  (sorted longest-first). Module-level HONORIFICS_REGEX_ALT alternation
  built once. The aggregation is union-of-all regardless of `language`
  (mixed-language deployments are common; safer default).
- splitSentences prefix regex sources from the union — sentences like
  "Ho incontrato il Sig. Rossi alla riunione" no longer split mid-honorific
  in any of the supported languages.

Bug #49 — single-word "Yes." never flushed
- DEFAULT_MIN_WORDS_FOR_SHORT_FLUSH lowered from 2 → 1; single-word
  replies ending in "."/"!"/"?" now flush on the terminator.
- New gate #6 in maybeShortFlush blocks flushes whose trailing word is a
  known honorific — prevents "Mr." / "Sig." escaping as a standalone
  sentence.
- Legacy escape hatch: pass `minWordsForShortFlush=2` to restore the
  pre-fix behaviour.

Tests: 22 Python + 21 TS new honorific cases; 12 + 12 single-word flush
cases. Existing tests updated where they asserted the old buffered
behaviour for single-word replies. Both suites green (Py 1538, TS 1224).
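
A non-streaming sketch of the honorific guard, assuming abbreviated honorific lists and a hypothetical `split_sentences` function; the real chunker is incremental and has more gates than shown here.

```python
import re
from typing import List

# Hypothetical, abbreviated lists; the real constants cover six languages.
HONORIFICS_EN = ["Mr", "Mrs", "Dr"]
HONORIFICS_IT = ["Sig", "Dott"]
# Union of all languages, longest first, built once at import time.
HONORIFICS_ALL = sorted(set(HONORIFICS_EN + HONORIFICS_IT), key=lambda h: (-len(h), h))
HONORIFICS_REGEX_ALT = "|".join(re.escape(h) for h in HONORIFICS_ALL)

_BOUNDARY_RE = re.compile(r"[.!?]\s+")
_TRAILING_HONORIFIC_RE = re.compile(rf"\b(?:{HONORIFICS_REGEX_ALT})$")


def split_sentences(text: str) -> List[str]:
    """Split on sentence terminators, skipping periods that close an honorific."""
    sentences: List[str] = []
    start = 0
    for match in _BOUNDARY_RE.finditer(text):
        head = text[start:match.start()]  # text before the terminator
        if text[match.start()] == "." and _TRAILING_HONORIFIC_RE.search(head):
            continue  # "Sig." / "Mr." is not a sentence end; keep buffering
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    tail = text[start:]
    if tail:
        sentences.append(tail)  # a single-word reply such as "Yes." lands here
    return sentences
```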
… silero-vad lint

- CHANGELOG.md: comprehensive Unreleased section covering reorg,
  provider enums, error taxonomy, bug-fix wave, perf wins, and
  cleanup work landing on this branch.
- tool_executor.py: add module-level docstring describing the SSRF
  guard, response-size cap, and OTel span emission.
- silero-vad.ts:127: replace stale @ts-expect-error directive (now
  a TS2578 warning since onnxruntime-node types resolve at build) with
  a plain comment explaining the optional-peer-dep dynamic import.
…ools/

Internal restructure of the Python SDK; PUBLIC API surface unchanged.

- handlers/{twilio,telnyx,common}_handler.py → telephony/{twilio,telnyx,common}.py
  ("_handler" suffix dropped — the parent module name already conveys
  intent). stream_handler.py promoted out of handlers/ to package root
  since it's the per-call orchestrator, not a telephony adapter.
  handlers/ folder removed.
- services/{transcoding,pcm_mixer,background_audio}.py → audio/* (audio
  pipeline collected in one place).
- services/{tool_decorator,tool_executor}.py → tools/* (tool-decoration
  & webhook-execution kept together).
- Other services/* stay where they are: llm_loop, metrics,
  sentence_chunker, text_transforms, ivr, fallback_provider,
  pipeline_hooks, chat_context, call_log, remote_message.
- tts/ and stt/ namespaces kept — they expose
  getpatter.{tts,stt}.<provider>.{TTS,STT} with env-var auto-resolve
  and are public surface.
- File moves use git mv so blame/history follow.
- Imports rewritten across providers, server, services, tests, and
  package-root re-exports. Python tests: 1538 passed.

TS side ships in a separate commit.
TS internal restructure for parity with the Python d5d9391 commit.
Public API surface unchanged.

- carriers/{twilio,telnyx}.ts → telephony/{twilio,telnyx}.ts (rename for
  naming parity with Py; "carrier" was the original term, "telephony"
  reads better next to twilio/telnyx).
- transcoding.ts → audio/transcoding.ts.
- services/background-audio.ts → audio/background-audio.ts.
- tool-decorator.ts → tools/tool-decorator.ts.
- Imports rewritten across client, index, types, stream-handler,
  deepfilternet-filter, plus 5 test files. TS tests: 1224 passed,
  tsc --noEmit clean.

The telephony/audio/tools triad now matches between Python and
TypeScript SDKs.
Update per-library AI-agent quickstarts to match the post-restructure
package tree. Adds the new folder names (telephony/, audio/, tools/)
and a one-line description per folder.
…/tools

Adds 1-3 line docstrings to public symbols (modules, classes, methods)
in libraries/python/getpatter/{providers,telephony,audio,tools} that
previously had none. No behaviour changes; pre-existing docstrings are
left untouched.
…ability/dashboard/top-level

Adds short JSDoc summaries to public classes, interfaces, type aliases, and
exported functions that were missing them. Existing JSDoc is preserved
verbatim — this is a fill-the-gaps pass only, no rewrites.
Mechanical replace of stale path strings in docstrings, comments, and
.env.example headers:
- sdk-ts/src/* → libraries/typescript/src/*
- sdk-py/getpatter/* → libraries/python/getpatter/*
- conceptual "(sdk-py)" → "the Python SDK"

No behaviour change; tests still 1538 passed, tsc clean.
The e2e Playwright suite (tests/e2e/*.spec.ts + playwright.config.ts +
@playwright/test / playwright devDeps) was inherited from an earlier
"comprehensive test suite" PR but never integrated with downstream
test infra after the libraries/ reorg. Per CLAUDE.md, end-to-end call
testing lives in a separate downstream test repo.

- Drop libraries/typescript/playwright.config.ts.
- Drop libraries/typescript/tests/e2e/ (6 .spec.ts + test-server.ts).
- Remove @playwright/test and playwright from package.json devDeps.
- Refresh package-lock.json (npm install).
- silero-vad.ts: switch back to @ts-ignore on the optional
  onnxruntime-node import — the dynamic-import line surfaces a TS7016
  warning when types are unresolved post-lock-refresh.
…select_sound_from_list, resample_24k_to_16k from TypeScript SDK

Closes 5 public-surface gaps in the Python SDK so every symbol exported
from libraries/typescript/src/index.ts now has a Python counterpart.

- ``DefaultToolExecutor`` — async tool dispatcher with retry/fallback,
  webhook SSRF validation via ``server.validate_webhook_url``, and the
  same JSON error shape as the TypeScript class. Added to
  ``services/llm_loop.py``.
- ``LLMChunk`` — frozen dataclass mirror of the TS ``LLMChunk``
  interface (text/tool_call/done/usage). ``to_dict()`` produces the same
  shape as ``OpenAILLMProvider.stream`` for callers that prefer dicts.
- ``builtin_clip_path`` — top-level helper resolving ``BuiltinAudioClip``
  values (or raw filenames) to absolute paths. ``BuiltinAudioClip.path``
  now delegates to the new function for a single source of truth.
- ``select_sound_from_list`` — promoted from a private static method on
  ``BackgroundAudioPlayer`` to a public top-level function. The static
  method is preserved as a backward-compatible delegator.
- ``resample_24k_to_16k`` — stateless one-shot helper following the
  existing ``resample_8k_to_16k`` / ``resample_16k_to_8k`` convention,
  including the per-process ``DeprecationWarning`` latch.

All five symbols are re-exported from ``getpatter.__init__`` and listed
in ``__all__``. The five ``TODO(parity)`` markers are removed in the
same commit.

25 unit tests added in ``tests/unit/test_parity_ports.py`` covering
public-symbol reachability, ``LLMChunk`` round-trip, real
handler/webhook dispatch through ``DefaultToolExecutor`` (including the
SSRF-blocked branch), bundled clip resolution, weighted selection
empirics, and equivalence of ``resample_24k_to_16k`` to a single-shot
``StatefulResampler``.

Tests: 1546 → 1571, all passing.
@mintlify

mintlify Bot commented May 2, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| patter-06b046ce | 🟢 Ready | View Preview | May 2, 2026, 10:24 PM |


nicolotognoni and others added 19 commits May 4, 2026 14:49
Bug #2 from the barge-in audit: on speakerphone / tunnel-loop
deployments the agent's outbound TTS bleeds back into the mic. VAD
sees that bleed as continuous voice-like energy and never transitions
out of "speaking" state, so a caller interruption only registers
during natural TTS pauses → "interrupt sometimes works, sometimes the
agent keeps talking" intermittent symptom.

Fix at the source — proper acoustic echo cancellation. NLMS adaptive
filter (2048 taps @ 16 kHz, 128 ms history) subtracts an estimate of
the TTS-derived echo from the mic stream before VAD/STT see it.
Geigel double-talk detector freezes adaptation when the caller is
speaking on top of the agent so the filter does not learn the user's
voice as part of the channel response.

Convergence on the synthetic narrowband test signal:
- ~24 dB ERLE after 1 s of TTS-only training
- Near-end speech preserved within 0 dB during double-talk

Not a drop-in replacement for WebRTC AEC3 (state-of-the-art needs
adaptive sub-band processing + comfort noise + nonlinear post-filter
that this scope does not cover). For production-grade quality, wrap
a binding to ``webrtc-audio-processing-2`` externally.

- libraries/python/getpatter/audio/aec.py — NlmsEchoCanceller class.
- libraries/typescript/src/audio/aec.ts — TS parity.
- Agent.echo_cancellation / AgentOptions.echoCancellation — opt-in
  flag, default false. Handset / headset deployments don't need it
  and the 0.5–2 s convergence period would briefly attenuate caller
  speech if they spoke before any TTS played.
- PipelineStreamHandler.start() (Py) / StreamHandler.initPipeline
  (TS) instantiate the canceller when the flag is on. Far-end tap
  fires before the carrier transcode in synthesizeSentence; near-end
  runs after the inbound 8k→16k resample, before VAD.
- 8 unit tests per SDK covering convergence, double-talk preservation,
  construction validation, pass-through-before-priming, reset, empty
  buffers.

Tests: Py 1574 passed (+8), TS 1236 passed (+8), tsc clean.
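
For readers unfamiliar with NLMS, a toy sketch of the update rule follows. It is not the SDK's `NlmsEchoCanceller`: it uses 8 taps, has no Geigel double-talk detector, and runs on a synthetic noiseless echo path, so the ERLE it reaches says nothing about real calls.

```python
import math
import random
from collections import deque
from typing import Deque, List


class NlmsSketch:
    """Toy NLMS echo canceller (no double-talk detector, tiny tap count)."""

    def __init__(self, taps: int = 8, step: float = 0.5, eps: float = 1e-6) -> None:
        self.weights: List[float] = [0.0] * taps
        self.history: Deque[float] = deque([0.0] * taps, maxlen=taps)
        self.step = step
        self.eps = eps

    def process(self, far: float, mic: float) -> float:
        """Return mic minus the estimated echo of the far-end signal."""
        self.history.appendleft(far)  # newest far-end sample first
        estimate = sum(w * x for w, x in zip(self.weights, self.history))
        error = mic - estimate
        energy = sum(x * x for x in self.history) + self.eps
        gain = self.step * error / energy  # step normalised by input energy
        self.weights = [w + gain * x for w, x in zip(self.weights, self.history)]
        return error


def erle_db(mic: List[float], residual: List[float]) -> float:
    """Echo return loss enhancement: echo power before vs after cancellation."""
    before = sum(m * m for m in mic)
    after = sum(r * r for r in residual) + 1e-12
    return 10.0 * math.log10(before / after)


def demo() -> float:
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    # Synthetic echo path: attenuated copy of the far end, delayed by 3 samples.
    mic = [0.0, 0.0, 0.0] + [0.6 * x for x in far[:-3]]
    aec = NlmsSketch()
    residual = [aec.process(f, m) for f, m in zip(far, mic)]
    return erle_db(mic[2000:], residual[2000:])  # measure after convergence
```

The division by the far-end energy is what makes the step "normalised": adaptation speed stays roughly independent of how loud the TTS signal is.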
…e / echo_cancellation in agent() builder

Pre-existing parity violation surfaced during the AEC audit: the Py
``Patter.agent(...)`` builder enumerates kwargs explicitly, so any field
not listed silently drops on the floor. Three boolean flags on the Agent
dataclass — ``aggressive_first_flush``, ``disable_phone_preamble``, and
the freshly added ``echo_cancellation`` — were unreachable through the
builder, forcing users to construct ``Agent(...)`` directly.

TS does not have this problem because ``agent(opts)`` spreads the whole
``AgentOptions`` object, so every field passes through.

Add the three flags to the Py builder signature and forward them to
``Agent(...)``. Defaults match the dataclass (all ``False``) so existing
callers keep their behaviour.

2 new tests:
- builder defaults match dataclass defaults (no silent True leak)
- explicit ``aggressive_first_flush=True`` / ``disable_phone_preamble=
  True`` / ``echo_cancellation=True`` reach the resulting Agent

Tests: Py 1576 passed (+2), TS 1236 unchanged, tsc clean.
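
One way to guard against this class of regression is to compare the builder signature with the dataclass fields. The reduced `Agent`, the free-standing `agent` builder, and `unforwarded_fields` are hypothetical stand-ins; the tests added in this commit may be structured differently.

```python
import inspect
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Agent:
    """Reduced stand-in for the SDK dataclass."""
    system_prompt: str = ""
    aggressive_first_flush: bool = False
    disable_phone_preamble: bool = False
    echo_cancellation: bool = False


def agent(system_prompt: str = "", *, aggressive_first_flush: bool = False,
          disable_phone_preamble: bool = False,
          echo_cancellation: bool = False) -> Agent:
    """Hypothetical builder that lists kwargs explicitly and forwards each one."""
    return Agent(
        system_prompt=system_prompt,
        aggressive_first_flush=aggressive_first_flush,
        disable_phone_preamble=disable_phone_preamble,
        echo_cancellation=echo_cancellation,
    )


def unforwarded_fields() -> List[str]:
    """Dataclass fields that the builder signature does not accept."""
    accepted = set(inspect.signature(agent).parameters)
    return [f.name for f in fields(Agent) if f.name not in accepted]
```

A test asserting that `unforwarded_fields()` is empty would fail as soon as a new field is added to the dataclass without a matching builder kwarg.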
Real cellular-call test on 0.6.0 with the initial 2048-tap +
constant-0.1-step config exposed an 8–12 s convergence window during
which the user's first turn was either over-cancelled to silence
(filter eats voice while learning the channel) or contaminated by
residual echo (Deepgram transcribes garbage and discards). The user
report: ~11 s of perceived silence after firstMessage, then everything
worked from turn 4 onward. Net first-turn UX was worse than no AEC.

The architectural fix the user asked for ("source-level, no workaround,
solid"): two NLMS hyperparameter changes that compress convergence
into the first ~250 ms — the same window where the agent's
firstMessage finishes playing.

1. **512 taps (was 2048)**: 4× fewer coefficients to converge with no
   measurable cancellation loss on cellular / VoIP paths whose RT60
   stays under ~50 ms after the carrier's own echo suppression. Pass
   ``filter_taps=2048`` explicitly for landline hairpin loops where the
   tail extends beyond 32 ms @ 16 kHz.
2. **Adaptive step**: aggressive warm-up step (0.5) for the first 0.5 s
   of processed audio, then taper to the textbook 0.1 for steady-state
   tracking. The Geigel double-talk detector still gates updates so the
   larger step does not learn the caller's voice into the echo model.

Verification: a regression test that feeds a broadband synthetic signal (3
sinusoids + white noise) in realistic 20 ms frames reaches **17–19 dB
ERLE within the first 250 ms** with the new defaults, well above the
previous 0 dB at the 1.25 s mark.

- New constructor knobs: ``warmup_step_size`` (default 0.5),
  ``warmup_seconds`` (default 0.5). Step branch is constant within a
  frame so the inner sample-by-sample loop stays branch-free.
- Validation extended for the two new fields.
- ``reset()`` now clears the ``processed_samples`` counter so the
  warm-up window re-engages on filter reset.
- 1 new regression test per SDK enforces the "≥10 dB ERLE in the first
  250 ms with defaults" guarantee on a broadband signal.

Tests: Py 1577 passed (+1), TS 1237 passed (+1), tsc --noEmit clean.
… ring flush

Two fixes for the speakerphone "agent unresponsive on first turn / mid-call
gets stuck after a few exchanges" symptom reported on 0.6.0 cellular tests.

1. firstMessage was bypassing beginSpeaking + AEC far-end tap
   The ``firstMessage`` block streamed TTS chunks directly to the carrier
   without (a) marking ``isSpeaking=true`` and (b) pushing each chunk to
   ``aec.pushFarEnd()``. Consequence on speakerphone: while the intro
   played, the self-hearing guard never engaged, the user's audio (mixed
   with TTS bleed) was forwarded to STT and produced garbage transcripts;
   AEC had no reference signal so the bleed survived in the inbound
   channel. Wraps the firstMessage TTS streaming loop in
   ``beginSpeaking()`` + ``try/finally { endSpeakingWithGrace() }`` and
   pushes each chunk to ``aec.pushFarEnd()`` before encoding for the
   carrier. Mirrors the per-turn behaviour of ``runPipelineLlm`` /
   ``_process_streaming_response``.

2. Ring buffer must NOT flush on natural turn end
   An earlier iteration also flushed ``inboundAudioRing`` from
   ``endSpeakingWithGrace`` so user audio captured during the agent's TTS
   that never tripped VAD would still reach STT. In practice this raced
   live STT input post-grace: the ring contained partially-cancelled echo
   (AEC still adapting) and possibly over-cancelled user voice (Geigel
   rho=0.6 misses quiet double-talk). Replaying on every turn produced
   phantom transcripts that confused the LLM and caused the "out of order
   responses + agent gets stuck" symptom the user observed mid-call.
   Reverted: flush only on real barge-in (where VAD confirmed user
   speech). Audio captured during the agent's turn that VAD did not
   classify as speech is intentionally dropped at the next
   ``beginSpeaking`` — the user can repeat themselves rather than have
   the LLM react to a stale phantom transcript.

The barge-in flush remains: extracted into ``flushInboundAudioRing()`` /
``_flush_inbound_audio_ring()`` helpers (clean refactor, 1 caller now).

Stale "2048 taps + 0.5–2 s convergence" log message updated to the
post-AEC-tuning "512 taps + 0.5 s warmup μ=0.5 → ~250 ms convergence".

Tests: Py 1577 passed, TS 1237 passed, tsc --noEmit clean.
The previous fix wrapped the firstMessage TTS in
``beginSpeaking`` + ``endSpeakingWithGrace`` so the self-hearing guard
could engage during the intro. This worked, but exposed a second
defect: the AEC filter needs ~500 ms of TTS reference to converge, and
during that warmup window residual TTS bleed in the inbound mic stream
still looks like speech to VAD. With ``isSpeaking=true`` from frame
zero of the firstMessage, the very first chunk of bleed triggered an
immediate barge-in cancel — the firstMessage was killed before a
single byte had been played. Test reported "agent never speaks".

Fix: gate both barge-in entry points (VAD ``speech_start`` and
transcript-based) on a 1-second minimum agent-speaking duration. Real
users almost never start interrupting within the first second of an
agent turn anyway, and the gate cleanly covers the AEC convergence
period (500 ms warmup + safety margin).

- TypeScript: ``MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN = 1000`` static
  on ``StreamHandler``. New ``speakingStartedAt: number | null`` field
  set in ``beginSpeaking()`` and cleared in ``cancelSpeaking()`` and
  the grace flip. New ``canBargeIn()`` helper used by both barge-in
  sites; suppressed events log at debug level so call-debug logs still
  show why the cancel did not fire.
- Python: ``MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN = 1.0`` module-level
  constant. ``_speaking_started_at`` field with the same lifecycle.
  ``_can_barge_in()`` helper applied at the VAD speech_start path in
  ``on_audio_received`` and at the entry of ``_handle_barge_in``.
  Helper uses ``getattr`` so test fixtures that bypass
  ``_begin_speaking`` still permit barge-in to fire.
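
A minimal sketch of the gate logic described above, assuming a monotonic clock passed in by the caller. `SpeakingGate` is a hypothetical stand-in for the handler state, not the SDK class; only the constant name and its 1.0 s value come from this message.

```python
from typing import Optional

MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN = 1.0  # value quoted in the commit message


class SpeakingGate:
    """Hypothetical holder for the agent-turn start timestamp."""

    def __init__(self) -> None:
        self.speaking_started_at: Optional[float] = None

    def begin_speaking(self, now: float) -> None:
        self.speaking_started_at = now

    def cancel_speaking(self) -> None:
        self.speaking_started_at = None

    def can_barge_in(self, now: float) -> bool:
        # No active agent turn: there is nothing to protect, so allow it.
        if self.speaking_started_at is None:
            return True
        elapsed = now - self.speaking_started_at
        return elapsed >= MIN_AGENT_SPEAKING_S_BEFORE_BARGE_IN
```

Passing `now` explicitly (instead of reading the clock inside the helper) is a choice made here to keep the sketch deterministic under test.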

New unit tests (3 Py + 5 TS):
- ``canBargeIn() / _can_barge_in()`` returns true with no active turn,
  false within the gate window, true past the gate window.
- ``handleBargeIn / _handle_barge_in`` returns / does nothing during
  the warmup window, ``isSpeaking`` stays True.
- ``handleBargeIn / _handle_barge_in`` fires normally past the gate.

Tests: Py 1579 passed (+2), TS 1242 passed (+5), tsc --noEmit clean.
…f-default

The previous AEC commits added a server-side NLMS adaptive filter and
exposed an ``echoCancellation`` flag. Real-call testing on cellular
PSTN turned up a fundamental architectural mismatch the early
benchmarks did not catch: the round-trip echo path on Twilio Media
Streams is ~250–1500 ms (jitter buffer + carrier loop), but a 512-tap
NLMS filter at 16 kHz can only see the most recent 32 ms of far-end
samples. The echo never lands inside the filter's window, the weights
stay near zero, and the filter silently no-ops. Worse, with
``isSpeaking=true`` during firstMessage and a barge-in gate of 1 s,
once the gate releases any residual bleed reaching VAD triggers an
immediate self-cancel — the agent stops talking right after starting.

Industry consensus from this round of research:
- LiveKit & Pipecat handle echo cancellation at the transport layer
  for browser/native paths only.
- Twilio's own guidance is to "rely on network echo cancellers" for
  telephone scenarios.
- Vapi, Retell, Bland do not run server-side AEC. They rely on the
  carrier's network echo suppression and the caller device's built-in
  AEC (modern handsets ship one).

Server-side NLMS is the right tool only when the SDK owns the audio
path end-to-end and the loop latency is on the order of the filter
window (~30 ms — browser WebRTC, mobile native). PSTN does not meet
that bar and never will under realistic carrier conditions.
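
The mismatch is easy to check with arithmetic. The sketch below uses hypothetical helper names and only the figures quoted in this message (512 taps, 16 kHz, a 250–1500 ms round-trip on PSTN, roughly 30 ms on browser or native paths).

```python
def filter_window_ms(taps: int, sample_rate_hz: int) -> float:
    """Span of far-end history an FIR/NLMS filter with `taps` coefficients can see."""
    return taps * 1000.0 / sample_rate_hz


def echo_fits_window(echo_delay_ms: float, taps: int, sample_rate_hz: int) -> bool:
    """True when the echo arrives while its reference is still inside the filter."""
    return echo_delay_ms <= filter_window_ms(taps, sample_rate_hz)
```

For 512 taps at 16 kHz the window is 32 ms, so a 250 ms echo never overlaps its own reference signal, while a 30 ms loop does.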

This commit:

- ``echoCancellation`` stays opt-in (default false) so existing PSTN
  callers see no change in behaviour.
- When ``echoCancellation: true`` is detected on a Twilio or Telnyx
  carrier, log a clear warning explaining why it will not work as
  intended and what to do instead. The filter is still instantiated so
  curious operators can compare; the warning makes the recommendation
  explicit.

For PSTN deployments, the working stack is: Patter's self-hearing
guard + 1 s barge-in cooldown + Silero VAD with the phone-tuned preset
+ carrier / handset native echo suppression. No server-side AEC.

Tests: Py 1579 passed, TS 1242 passed, tsc --noEmit clean.
…ncel drain

Six architectural fixes for the post-barge-in failure modes surfaced during
the 0.6.0 acceptance pass against real PSTN calls. Validated end-to-end on
six pipeline stacks (Deepgram + Groq/OpenAI/Anthropic/Cerebras/Google +
Cartesia/OpenAI TTS) with verbose Italian conversation flow.

1. Adaptive barge-in gate
   - MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_AEC = 1000 (covers AEC warmup)
   - MIN_AGENT_SPEAKING_MS_BEFORE_BARGE_IN_NO_AEC = 250 (anti-flicker only)
   - canBargeIn() picks the right gate based on whether AEC is wired.
   - Suppression call sites log at INFO level with the AEC state.

2. Inbound audio ring cap reduced from 30 frames (~600 ms) to 13 (~260 ms)
   to match VAD minSpeechDuration. Pre-fix, the replay was dragging in
   ~350 ms of agent TTS bleed which Deepgram (default English) transcribed
   as English garbage and committed to the LLM as phantom user input.

3. STT.finalize() on VAD speech_end
   - New optional finalize() on STTAdapter / STTProvider.
   - DeepgramSTT.finalize() exposes {type: 'Finalize'} as a public method.
   - StreamHandler calls stt.finalize() whenever the SDK's VAD signals
     speech_end so the provider returns is_final immediately rather than
     waiting on its own (slow) endpointing heuristic.

4. AMD on by default + onMachineDetection callback (Twilio + Telnyx parity)
   - New MachineDetectionResult carrier-agnostic shape.
   - Twilio: MachineDetection=DetectMessageEnd + AsyncAmd=true (no
     answer-latency penalty on human pickups).
   - Telnyx: answering_machine_detection=greeting_end.
   - Callback fires on both webhooks before the legacy voicemail-drop
     path so callers see the result regardless of voicemailMessage.

5. Post-cancel drain window of 150 ms
   - Tracks lastCancelAt timestamp on every barge-in cancel (both
     VAD-path and transcript-path).
   - beginSpeaking() is now async and awaits the drain remainder so the
     remote PSTN player has time to flush the cancelled turn's tail
     before the next TTS chunk lands on top of it. Eliminates the
     "doubled first sentence" audio artefact reported during testing.

6. AssemblyAI accepts a parity-only `language` field for cross-provider
   uniformity (forwarded as no-op; AssemblyAI selects language by model).
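
The post-cancel drain in item 5 could look roughly like the following. This is a sketch under assumptions: `DrainGate` and its attribute names are hypothetical, only the 150 ms figure comes from this message, and the real handlers carry far more state.

```python
import asyncio
import time
from typing import Optional

POST_CANCEL_DRAIN_S = 0.150  # 150 ms, as quoted in the commit message


class DrainGate:
    """Hypothetical handler fragment tracking the last barge-in cancel."""

    def __init__(self) -> None:
        self.last_cancel_at: Optional[float] = None
        self.is_speaking = False

    def cancel_speaking(self) -> None:
        self.is_speaking = False
        self.last_cancel_at = time.monotonic()

    async def begin_speaking(self) -> float:
        """Wait out the rest of the drain window, then mark the turn active.

        Returns the number of seconds this call chose to wait.
        """
        waited = 0.0
        if self.last_cancel_at is not None:
            elapsed = time.monotonic() - self.last_cancel_at
            remaining = POST_CANCEL_DRAIN_S - elapsed
            if remaining > 0:
                await asyncio.sleep(remaining)
                waited = remaining
        self.is_speaking = True
        return waited
```

Only the remainder of the window is awaited, so a turn that starts well after the cancel pays no extra latency.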

Both SDKs (TypeScript and Python) updated with identical defaults,
constants, and call-site coverage. Unit tests: TS 24/24 passing, Python
33/33 passing. Includes [DIAG] INFO logs in TS deepgram-stt.ts and
stream-handler.ts for the diagnostic phase; these can be removed in a
follow-up commit once the bleed-transcription root cause is sealed.
… tunnel grace

Bundles the SDK changes from a focused work session: 5 bug fixes + 6
new feature areas, with full Python ↔ TypeScript parity.

Bug fixes
---------
* fix(client): bump cloudflared quick-tunnel grace 2.5 → 5 s. The 2.5 s
  window covered HTTP propagation only — Twilio's WSS upgrade for the
  media stream goes through a different cloudflared edge route that
  takes ~1-3 s longer; ~5 % of first calls dropped silently at pickup
  with no media. 5 s drops the failure rate to <1 %. (client.ts /
  client.py)
* fix(realtime): handler-only tools were silently ignored in TS Realtime
  mode (CRITICAL). `handleFunctionCall` only dispatched `webhookUrl`
  tools; tools with an in-process `handler` callback (the default
  pattern in the demos) fell through without sending
  `function_call_output`, hanging the model.
* fix(realtime): `onTranscript({ role: 'assistant' })` was never fired.
  Assistant text was pushed into history but never surfaced via the
  user-supplied callback, so demos only saw `[user]` lines.
* fix(realtime): dashboard transcript shown out of order. OpenAI Realtime
  emits `input_audio_transcription.completed` AFTER `response.done`, so
  the naïve push order was [assistant, user, ...]. Added a per-call
  buffer (`pendingAssistantTurn` + 3 s fallback timer) that holds the
  assistant turn until the matching user transcript arrives.
* fix(realtime): tool invocations were invisible in the transcript
  timeline. Added `emitToolEvent` that pushes `role: 'tool'` history
  entries and fires `onTranscript({ role: 'tool', tool_name, tool_args,
  tool_result, ... })` for the call/return semantics.

Features
--------
* feat(api): `Patter({ persist })` opt-in dashboard persistence. The
  on-disk per-call records (metadata.json, transcript.jsonl, events.jsonl)
  were previously opt-in only via `PATTER_LOG_DIR`. New explicit option:
  `false` (default), `true` (platform default location), or a custom
  string path. Env var still works as deployment-time override.
* feat(tools): JSON-schema validation at `agent()` build time +
  OpenAI strict-mode opt-in. Schemas are validated structurally for
  every tool; `Tool({ strict: true })` additionally enforces OpenAI's
  strict-mode requirements (recursive `additionalProperties: false`,
  every property in `required`). Catches typos at build time.
* feat(tools): retry with exponential backoff + per-tool circuit
  breaker. Both handler and webhook paths now get 3 attempts with
  jittered backoff (capped at 5 s). New `CircuitBreakerRegistry` trips
  OPEN after 5 consecutive failures and stays OPEN for 30 s before
  allowing a HALF_OPEN probe; while OPEN it returns
  `{error, fallback: true, circuit_state: "open", retry_after_ms}`.
* feat(tools): reassurance auto-message during long tool calls. New
  `Tool({ reassurance: "Let me check..." })` (or
  `{ message, afterMs }`) bridges the silence on slow tools by
  enqueueing the message via `OpenAIRealtimeAdapter.sendText` after
  `afterMs` (default 1500 ms) — cancelled if the tool returns earlier.
  Realtime-only for now.
* feat(tools): MCP (Model Context Protocol) client integration (MVP).
  New `agent({ mcpServers: [...] })` plugs the agent into MCP servers
  (Google Workspace, PayPal, Postgres, GitHub, ...) without writing
  wrapper handlers. Each server is queried at call start via
  `tools/list`; discovered tools are wrapped with synthetic handlers
  that dispatch to `tools/call` and merged into `agent.tools`.
  Optional dependency: `@modelcontextprotocol/sdk` (TS) /
  `mcp` (Py extra). Streamable-HTTP transport only for now.
* feat(tools): streaming results via async generator handlers. Tool
  handlers can be `async function*` (TS) / `async def ... yield` (Py)
  generators that emit `{ progress: "..." }` updates while running;
  each yield is sent to the agent via `sendText` for inline status.
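
To make the circuit-breaker behaviour above concrete, here is a minimal single-tool sketch. It is not the SDK's `CircuitBreakerRegistry`: the class and method names are hypothetical, the clock is passed in as milliseconds for determinism, and only the threshold (5 consecutive failures) and cooldown (30 s) come from this message.

```python
from typing import Optional


class CircuitBreaker:
    """Hypothetical per-tool breaker with CLOSED / OPEN / HALF_OPEN states."""

    def __init__(self, threshold: int = 5, cooldown_ms: int = 30_000) -> None:
        self.threshold = threshold
        self.cooldown_ms = cooldown_ms
        self.failures = 0
        self.opened_at_ms: Optional[int] = None

    def state(self, now_ms: int) -> str:
        if self.opened_at_ms is None:
            return "closed"
        if now_ms - self.opened_at_ms >= self.cooldown_ms:
            return "half_open"  # cooldown over: one probe may go through
        return "open"

    def allow(self, now_ms: int) -> bool:
        return self.state(now_ms) != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at_ms = None

    def record_failure(self, now_ms: int) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            # Trip, or re-trip after a failed probe, restarting the cooldown.
            self.opened_at_ms = now_ms

    def retry_after_ms(self, now_ms: int) -> int:
        if self.state(now_ms) != "open" or self.opened_at_ms is None:
            return 0
        return self.cooldown_ms - (now_ms - self.opened_at_ms)
```

A caller would check `allow()` before invoking the tool and, while the breaker is open, return a fallback payload carrying `retry_after_ms()` instead of running the handler.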

New files
---------
* libraries/typescript/src/tools/schema-validation.ts
* libraries/typescript/src/tools/circuit-breaker.ts
* libraries/typescript/src/tools/mcp-client.ts
* libraries/python/getpatter/tools/schema_validation.py
* libraries/python/getpatter/tools/circuit_breaker.py
* libraries/python/getpatter/tools/mcp_client.py
* test files: 4 TS + 3 Py covering schema validation, breaker, streaming, reassurance

Tests
-----
1280 TS · 1156 Py · 0 regressions. Updates two stale tests (AMD
on-by-default test in new-features.test.ts; handler retry count in
llm-loop.test.ts) to reflect new behaviour.
The dashboard is now a real SPA in `dashboard-app/` (Vite + React +
TypeScript) instead of a 700-line HTML/CSS/JS string embedded in
`dashboard/ui.{ts,py}`. The build pipeline produces a single
self-contained HTML file (vite-plugin-singlefile inlines JS + CSS)
which is committed to `libraries/typescript/src/dashboard/ui.html` and
mirrored to the Python package via `dashboard-app/scripts/sync.mjs`.

At runtime the SDK serves the same `GET /` endpoint as before — the
inlined HTML is loaded by tsup's esbuild ``text`` loader (TS) or the
package-data file (Py). Customer-side: zero change in start-up UX
(`phone.serve()` → http://127.0.0.1:8000/), but the dashboard is now
typed, modular, and maintainable as proper React.

Why this approach (option D from the design discussion):
* No CDN dependency at runtime (no unpkg.com / Babel-in-browser).
* No new runtime deps in the SDK — React + Vite live only at build time
  in the dev repo; the published package ships static HTML.
* Self-contained bundle: the SDK still works air-gapped and behind
  corporate firewalls.
* Type safety end-to-end (TSX components, tsconfig strict).

Components ported from the reference design:
* Topbar, PageHeader, Metric cards
* CallTable with row selection + search
* LiveCallPanel (transcript stream + call controls)
* LatencyPanel (p50 / p95 / STT / TTS bars)
* CostPanel (per-provider breakdown)

Hooks:
* useDashboardData — fetches `/api/dashboard/calls` + subscribes to the
  SSE stream at `/api/dashboard/stream`
* useTranscript — incremental transcript updates per selected call
* mappers.ts — maps the wire format (CallRecord) to the UI shape

Build:
* `dashboard-app/` is its own Vite project with `npm run build && npm
  run sync` — sync copies the inlined HTML to both SDKs.
* `libraries/typescript/tsup.config.ts` adds the ``.html`` text loader
  so the asset is inlined into `dist/index.{js,mjs}`.
* `libraries/python/pyproject.toml` declares `ui.html` as
  `getpatter.dashboard` package-data so it ships with `pip install`.
* `libraries/typescript/package.json` `files` array includes
  `src/dashboard/ui.html` so npm packs it.
…s + persist

Documents the two preceding commits in CHANGELOG.md under
``## Unreleased``:

* Added: ``Patter(persist=...)`` option, JSON-schema validation +
  strict mode, retry + circuit breaker, reassurance auto-message,
  MCP client integration, streaming results.
* Fixed: Realtime handler-tool dispatch, assistant ``onTranscript``,
  transcript ordering buffer, tool transcript events, cloudflared
  quick-tunnel WSS upgrade race.

Per the project rule (``.claude/rules/documentation-best-practices.md``
invariant 0): every user-visible change updates ``## Unreleased`` in
the same unit of work. The dashboard rewrite is intentionally NOT in
the changelog — same URL, same UX, same `phone.serve()` entry point;
the SPA migration is internal and customer-invisible.
Mirror of the built dashboard SPA into the Python package — produced by
``dashboard-app/scripts/sync.mjs`` alongside the TS-side
``libraries/typescript/src/dashboard/ui.html``. Should have been part
of the dashboard SPA commit; tracking it now keeps the two SDKs in
parity for ``pip install getpatter``.
Pure formatter pass — splits long argument lists across multiple lines
and adds the missing blank line after the conditional ``import
fastapi``. No logic changes; the test still verifies the dashboard
store and routes the same way.
…ines, realtime mode

Iterative refinement of the React/Vite dashboard SPA shipped in
3877719. Customer-side it remains a single embedded HTML file served
from `phone.serve()` at `/`, but the UX is now markedly closer to the
target design.

UI changes:
* Real Patter logo: mark (wireframe stack-tile from the favicon path,
  thin stroke instead of the chunky filled silhouette in
  `docs/logo/light.svg`) + tightened-viewBox wordmark, sized
  independently so the wordmark stays large while the mark line weight
  stays light.
* Tab title: "Patter | Dashboard". Favicon: stack-tile SVG inline,
  matching the previous SDK dashboard.
* Topbar: dropped Bell / Settings / Avatar buttons and "Place call" CTA
  (will reintroduce when actually wired). Phone-number pill always shown,
  derived from the most recent call's Patter-side number.
* Live chip pulse: peach static when zero calls, green pulsing when ≥1
  is active.
* Latency + Cost merged into one MetricsPanel with a peach segmented
  switcher, fixing the right-rail clipping that hid Cost on short
  viewports. Realtime mode collapses the STT/LLM/TTS waterfall to a
  single end-to-end bar (those metrics aren't meaningful when the
  provider does the round-trip in one model call).

Range filter (1h / 24h / 7d / All) is now real:
* Bucket strategy aligned to natural boundaries — 12 × 5min, 24 × 1h,
  7 × 1day, plus 9-bucket auto for All. Tooltip ranges read as
  "11:00 → 12:00" instead of "11:39 → 12:33".
* Filtered call list, headline counters (Calls / Latency p95 / Spend),
  and sparklines all reflect the active range. Live calls always stay
  visible even when out of the range so users see what's happening now.
* Sparkline scaling: tallest bar normalises to 100, no more lonely
  single bar surrounded by ghost grey lines.

Sparklines are now interactive:
* Hover any bar → custom tooltip (instant, dark surface, mono numerics
  in peach) showing the bucket window, call count, and a 4-call sample
  (number / status / cost). React-driven, replaces the slow native
  `title=""`.
* Click → selects the newest call in that bucket into the right rail.
* Empty buckets are invisible (no grey ghosts).
* Bars now sit flush against the card bottom (flex column +
  `margin-top: auto`), matching the original design.

Export CSV button is now wired to `/api/dashboard/export/calls?format=csv`
via a transient anchor download.

Backend additions: none — every change above is in `dashboard-app/`
plus the synced `ui.html` rebundles in both SDKs. Pre-publish flow is
still `cd dashboard-app && npm run build && npm run sync`.
New TTS adapter calling Inworld's HTTP NDJSON streaming endpoint
`POST https://api.inworld.ai/tts/v1/voice:stream`. Defaults to
`inworld-tts-2` (sub-200 ms TTFB, 100+ languages, natural-language voice
steering); pass `model: "inworld-tts-1.5-max"` for the prior generation.
Default audio output is PCM_S16LE at 16 kHz so the result feeds straight
into the Patter pipeline without transcoding.

Public API parity:
- TS:  `import { InworldTTS } from "getpatter"` / `getpatter/tts/inworld`
- Py:  `from getpatter import InworldTTS`        / `getpatter.tts.inworld`
- Env-var auto-resolve via `INWORLD_API_KEY` (paste the Base64 token from
  the Inworld dashboard — already in `Authorization: Basic <token>` form).
- Optional knobs: `language` (BCP-47), `temperature` (TTS-1.5 only),
  `speakingRate` (0.5-1.5), `deliveryMode` (`EXPRESSIVE`/`BALANCED`/
  `STABLE` — TTS-2 only), `bitrate`.

Pricing entry `inworld` added to both pricing tables (placeholder
$0.020/1k chars — verify against current platform tier). Optional
dependency `getpatter[inworld]` adds `aiohttp>=3.10`.

7 mocked unit tests per SDK covering payload shape, NDJSON line
interleave (`audio, timestamp, audio`), base64 audio decoding, optional
field omission, env-var fallback, and non-200 error surfacing.

New files:
- libraries/typescript/src/providers/inworld-tts.ts
- libraries/typescript/src/tts/inworld.ts
- libraries/python/getpatter/providers/inworld_tts.py
- libraries/python/getpatter/tts/inworld.py
- libraries/{typescript,python}/{tests/unit/inworld-tts*.test.*,tests/unit/test_inworld_tts.py}
Adds seven optional async callbacks to every Patter instance plus a read-only
conversation_state snapshot, mirroring the public APIs of LiveKit Agents,
Pipecat and OpenAI Realtime so downstream metrics map onto the canonical
Hamming AI / Coval / Cekura voice-agent metric set without translation:

  on_user_speech_started   - raw VAD positive edge
  on_user_speech_ended     - raw VAD trailing edge (not EOU)
  on_user_speech_eos       - committed end-of-utterance (canonical "user
                             finished" — anchors eos_to_first_token_ms)
  on_agent_speech_started  - first wire-time chunk (what user hears)
  on_agent_speech_ended    - last wire chunk; payload includes interrupted
  on_llm_token             - TTFT marker, fires once per turn
  on_audio_out             - first TTS chunk per turn (warmup, distinct
                             from wire-time)

Each event also records an OpenTelemetry span event on the current call
span (patter.event.*), with gen_ai.* attributes for the LLM event per the
OTel GenAI semconv. OTel branch is a zero-cost no-op when the peer dep is
missing.

Wired into the realtime stream handler so the user/agent edge events fire
automatically on the OpenAI Realtime + Twilio/Telnyx path; LLM/TTS-warmup
events are exposed on the dispatcher for adapter/pipeline integrations.

Public API: SpeechEvents, SpeechEventCallback, ConversationStateSnapshot,
UserState, AgentState, EouTrigger.

Tests: 16 unit tests Py + 15 unit tests TS covering payload schema,
state transitions, idempotency, OTel attach contract, callback-exception
isolation, and Patter-level proxy mirroring.

Motivated by patter-agent-runner's 15 turn-taking acceptance verbs that
previously auto-skipped because the SDK did not surface per-side speech
edges.
…ng for speech-edge events

Three Realtime mode fixes (Python + TypeScript parity) plus the host-
binding / observability plumbing required to drive the speech-edge
event suite from external test runners.

Realtime: first_message role swap
---------------------------------
Agent.first_message was sent through send_text / sendText, which submits
a `conversation.item.create` with role: "user". The model received its
own greeting AS user input and replied to it instead of speaking it
verbatim. Symptoms ranged from harmless (model continued as assistant
anyway) to catastrophic (a receptionist agent saying "Hi! I'd like to
schedule a haircut for Friday afternoon" — the model swapped role
because it interpreted the greeting as a customer cue).

Fix: new OpenAIRealtimeAdapter.send_first_message / sendFirstMessage
that calls `response.create` with explicit `instructions: <text>`. This
is the documented OpenAI Realtime path for "have the assistant say
exactly this", and it produces the correct role attribution in
transcripts (previously assistant turns were missing entirely from
onTranscript callbacks).

StreamHandler calls the new method via duck-typed lookup so older
adapter builds without it silently fall back to send_text — no breaking
change for downstream provider implementations.
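
The duck-typed lookup can be sketched as follows. The helper `deliver_first_message` and the two toy adapters are hypothetical; only the method names `send_first_message` and `send_text` come from this message, and the real adapters talk to the OpenAI Realtime socket rather than appending to a list.

```python
import asyncio
from typing import List, Tuple


async def deliver_first_message(adapter, text: str) -> str:
    """Prefer the dedicated method; fall back to send_text on older adapters."""
    send_first = getattr(adapter, "send_first_message", None)
    if callable(send_first):
        await send_first(text)
        return "send_first_message"
    await adapter.send_text(text)
    return "send_text"


class OldAdapter:
    """Toy adapter that predates send_first_message."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    async def send_text(self, text: str) -> None:
        self.sent.append(("user", text))


class NewAdapter(OldAdapter):
    """Toy adapter exposing the new method."""

    async def send_first_message(self, text: str) -> None:
        self.sent.append(("assistant", text))
```

Looking the method up with `getattr` keeps third-party adapters working without requiring them to subclass or implement anything new.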

Dashboard: 404 spam from notify_dashboard
-----------------------------------------
Embedded deployments where Patter shares port 8000 with another HTTP
server (notably the agent-to-agent test runner, where the driver SDK, UUT
SDK, and dashboard ingest target all share 127.0.0.1:8000) saw 404
access-log spam on every call from the notify_dashboard / notifyDashboard
fire-and-forget POSTs.

The send side already swallows errors silently; the noise comes from the
receiver's access log. Added a PATTER_DASHBOARD_NOTIFY=0|false|no|off
env-var opt-out that skips the POST entirely. Default behaviour unchanged.
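
One way to parse such an opt-out flag is sketched below. The helper name is hypothetical, and treating the value case-insensitively and ignoring surrounding whitespace are assumptions of this sketch, not documented behaviour.

```python
import os
from typing import Mapping, Optional

_OPT_OUT_VALUES = {"0", "false", "no", "off"}


def dashboard_notify_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when PATTER_DASHBOARD_NOTIFY is set to an opt-out value."""
    source = os.environ if env is None else env
    raw = source.get("PATTER_DASHBOARD_NOTIFY")
    if raw is None:
        return True  # default behaviour: keep notifying
    return raw.strip().lower() not in _OPT_OUT_VALUES
```

The caller would skip the fire-and-forget POST whenever this returns `False`.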

Speech-edge event plumbing (server + telephony bridge + handler)
---------------------------------------------------------------
The speech-edge events shipped in 0de4111 (on_user_speech_started/
ended/eos, on_agent_speech_started/ended, on_llm_token, on_audio_out) need
to flow Patter → EmbeddedServer → twilio_stream_bridge → StreamHandler.
The 0de4111 commit wired the StreamHandler init param but missed the 5
hops in between, so external observers attached via Patter() never saw
any events.

This commit threads `speech_events` through:
- Patter.__init__ stores the SpeechEvents instance
- EmbeddedServer.from_patter passes it down
- twilio_stream_bridge accepts speech_events kwarg
- OpenAIRealtimeStreamHandler accepts and forwards to super().__init__

Pipeline and ConvAI handlers still TODO (verbs auto-skip when events
aren't emitted, so this is non-breaking).

Also added for Twilio: PipelineStreamHandler._llm_cancel_event init in
__init__ (was lazily created on first cancel; race-condition prone),
and a try/except ProcessLookupError around tunnel.proc.terminate (the
cloudflared subprocess can race-exit before SIGTERM lands).

Audio binding host
------------------
PATTER_BIND_HOST env var (default 127.0.0.1) — when running inside a
Docker container with --publish 8000:8000, a loopback bind is
unreachable from the host's port-mapping. PATTER_BIND_HOST=0.0.0.0
makes the embedded FastAPI / Express bind on all interfaces so Docker
can forward host:8000 → container:8000.

CHANGELOG entries for first_message role swap and notify_dashboard
opt-out included.
OpenAI Realtime cancel_response now caps audio_end_ms by wall-clock
elapsed (was byte-counter), fixing post-barge-in re-greeting and
mid-sentence resume. Files: providers/openai_realtime.py:434-460 +
TS parity in providers/openai-realtime.ts.
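
The idea behind the cap, as a hedged sketch: the caller cannot have heard more audio than wall-clock time has elapsed since playback began, so the truncation point is bounded by both quantities. The function name and signature are hypothetical and do not mirror the adapter's internals.

```python
def capped_audio_end_ms(generated_audio_ms: int, elapsed_wallclock_ms: int) -> int:
    """Bound the truncation point by what could actually have been played."""
    return max(0, min(generated_audio_ms, elapsed_wallclock_ms))
```

With 2 s of audio generated but only 30 ms elapsed, the result is 30 ms rather than 2000 ms, so the model is told that almost none of its response was heard.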

Pipeline mode now fires on_transcript for assistant turns AND tool
calls (previously emitted only by Realtime). LLMLoop exposes
on_tool_call observer wired by stream-handler via _record_tool_call;
new _emit_assistant_transcript helper pushes history AND fires
on_transcript with observer-exception isolation. Files:
stream_handler.py / stream-handler.ts, services/llm_loop.py / llm-loop.ts.

AssemblyAI STT (Python): coalesce 20ms Twilio frames to 60ms target
(above v3 50ms minimum), achieving parity with the TS adapter. New
flush_audio() drains tail on close. Files: providers/assemblyai_stt.py.
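
Frame coalescing of this kind can be sketched with a small byte buffer. `FrameCoalescer` is a hypothetical class, not the adapter itself; the 20 ms frame and 60 ms target sizes are the figures quoted above, and the sample formats (8 kHz mulaw at 1 byte per sample, 16 kHz PCM s16le at 2 bytes per sample) are the ones named in the tests for this change.

```python
from typing import List


class FrameCoalescer:
    """Accumulate small audio frames and emit fixed-size chunks."""

    def __init__(
        self, sample_rate_hz: int, bytes_per_sample: int, target_ms: int = 60
    ) -> None:
        self.target_bytes = sample_rate_hz * bytes_per_sample * target_ms // 1000
        self._buf = bytearray()

    def push(self, frame: bytes) -> List[bytes]:
        """Buffer one frame; return every full target-sized chunk now available."""
        self._buf.extend(frame)
        chunks: List[bytes] = []
        while len(self._buf) >= self.target_bytes:
            chunks.append(bytes(self._buf[: self.target_bytes]))
            del self._buf[: self.target_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return and clear whatever is left, for sending on close."""
        tail = bytes(self._buf)
        self._buf.clear()
        return tail
```

Ten 20 ms mulaw frames (160 bytes each) produce three 480-byte sends, with the remaining 160 bytes returned by `flush()`.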

Cerebras + Groq pricing — silent under-billing fix. gpt-oss-120b
(Cerebras default since 0.5.4) and 5 Groq models all billed $0.
Now per-1M-token rates for every CerebrasModel / GroqModel enum.
Files: pricing.py / pricing.ts.

TypeScript port of SpeechmaticsSTT — closes long-standing Python-only
gap. RT v2 WebSocket protocol direct via ws (no upstream Node SDK).
Same options as Python adapter; legacy speechmatics() helper now
returns a real STTConfig instead of throwing. Files (new):
providers/speechmatics-stt.ts, stt/speechmatics.ts.

Pricing tables now model-aware across STT/TTS/Realtime (was
provider-only). New _resolve_provider_rates helper with longest-prefix
fallback; mergePricing nested-shallow overlay so single-model
overrides leave siblings intact. Auto-threading via
CallMetricsAccumulator from agent adapter model. Built-in rates for
Deepgram, Whisper/Transcribe, ElevenLabs, OpenAI TTS, Cartesia, Rime,
LMNT, Inworld, OpenAI Realtime per-model. PRICING_VERSION 2026.2 →
2026.3. Standalone openai_realtime_2 entry collapsed under
openai_realtime.models["gpt-realtime-2"].
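
A rough sketch of the two mechanisms named above, longest-prefix lookup and a nested-shallow overlay. The function names, the flattened `provider -> model -> rates` layout, the `per_minute` field, and all numbers are illustrative placeholders, not the SDK's actual tables or prices.

```python
from typing import Dict, Optional

Rates = Dict[str, float]
ModelTable = Dict[str, Rates]


def resolve_model_rates(models: ModelTable, model: str) -> Optional[Rates]:
    """Exact match first, then the longest table key that prefixes `model`."""
    if model in models:
        return models[model]
    prefixes = [name for name in models if model.startswith(name)]
    if not prefixes:
        return None
    return models[max(prefixes, key=len)]


def merge_pricing(
    base: Dict[str, ModelTable], overrides: Dict[str, ModelTable]
) -> Dict[str, ModelTable]:
    """Overlay per-model overrides without dropping sibling models."""
    merged = {provider: dict(models) for provider, models in base.items()}
    for provider, models in overrides.items():
        merged.setdefault(provider, {}).update(models)
    return merged
```

Copying each provider's table before updating it means an override for one model leaves the other models, and the original `base` mapping, untouched.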

CircuitBreakerOptions.cooldown_s → cooldown_ms (Python parity with TS
cooldownMs). Backward-compat shim emits DeprecationWarning. Files:
tools/circuit_breaker.py, tools/tool_executor.py.
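
A backward-compatible rename of this kind is commonly written as below. `BreakerOptions` is a hypothetical dataclass for illustration, not the SDK's `CircuitBreakerOptions`; the defaults (threshold 5, 30 000 ms) and the "explicit ms wins" rule are taken from the test notes in this PR.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class BreakerOptions:
    """Hypothetical options object illustrating the cooldown_s -> cooldown_ms shim."""

    threshold: int = 5
    cooldown_ms: Optional[int] = None
    cooldown_s: Optional[float] = None  # deprecated alias, kept for old callers

    def __post_init__(self) -> None:
        if self.cooldown_s is not None:
            warnings.warn(
                "cooldown_s is deprecated; use cooldown_ms",
                DeprecationWarning,
                stacklevel=2,
            )
            # An explicit cooldown_ms takes precedence over the legacy field.
            if self.cooldown_ms is None:
                self.cooldown_ms = int(self.cooldown_s * 1000)
        if self.cooldown_ms is None:
            self.cooldown_ms = 30_000
```

Old call sites such as `BreakerOptions(cooldown_s=2)` keep working and resolve to 2000 ms, while emitting a warning that points users at the new field.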

OpenAI Realtime engine wrapper now forwards reasoning_effort and
input_audio_transcription_model to the underlying adapter (were
silently dropped). Files: engines/openai.py / engines/openai.ts,
models.py, client.py, stream_handler.py, server.ts.

CHANGELOG.md: full Unreleased entries for each fix.
package-lock.json regenerated to match package.json (mcp/hono/ajv/
jose/zod additions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cerebras + Groq pricing — 5 cases each in test_pricing.py and
pricing.test.ts: full enum coverage, default model billing,
specdec-vs-versatile rate distinction, deprecating llama3.1-8b.

AssemblyAI buffering — new test_assemblyai_stt_buffering.py and
assemblyai-stt-buffering.test.ts (5 cases each): 10×20ms frame
coalescing, 60ms target sanity for mulaw 8kHz and PCM s16le 16kHz,
flush_audio tail drain, pre-connect silent drop, empty-chunk noop,
exact-target single send.

Pipeline on_tool_call observer — 3 mocked Python cases plus a new TS
describe block in llm-loop.test.ts asserting one user turn → one tool
call yields exactly three on_transcript events (call + result +
assistant) in order.

OpenAI Realtime cancel_response wallclock cap — regression simulating
2s generated / 30ms played, asserts audio_end_ms <= 200ms.

Realtime engine wrapper forwarding — assert reasoning_effort and
input_audio_transcription_model reach the adapter constructor when
set, omitted from kwargs when unset.

CircuitBreaker cooldown rename — 4 regression tests cover the
DeprecationWarning, seconds-shim parity, explicit ms wins, defaults
match TypeScript byte-for-byte (30_000ms, threshold 5).

Speechmatics TS port — 21-test mocked suite covering connect handshake,
StartRecognition payload, partial vs final translation, error frame
propagation, EndOfStream close path with last_seq_no, env-var
resolution.

MCP client — new mocked unit suites (Python + TypeScript).

Dashboard HTML test — assertion updated for React+Vite SPA shell
(`<title>Patter | Dashboard</title>`) instead of legacy "Live calls
dashboard" subtitle that no longer lives in the static HTML.

Full suite: Python 1707 / TypeScript 1381 — green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 33 new provider reference pages — full parity between
docs/python-sdk/providers/ and docs/typescript-sdk/providers/ for:
anthropic, cartesia-tts, cerebras, deepfilternet-filter, deepgram,
elevenlabs-convai, elevenlabs-tts, gemini-live, google, groq, inworld,
krisp-filter, openai-realtime, openai-tts, silero-vad, soniox,
speechmatics, telnyx-stt, telnyx-tts, ultravox-realtime, whisper.

Adds new MCP integration page (Py + TS) covering server
configuration, tool exposure, and lifecycle.

Updates 22 existing docs/{python,typescript}-sdk pages for the Phase 3
SDK changes — agents, call-logging, configuration, dashboard, engines,
events, features, metrics, stt, tools — including the new pricing
model-aware fields, on_tool_call observer, Realtime engine wrapper
forwarding, and circuit-breaker cooldown_ms rename.

docs.json navigation updated for the new provider sections.
docs/dev-tools/dashboard.mdx refreshed for the React+Vite SPA shell.

All pages follow existing Mintlify conventions (CodeGroup, ParamField,
Note components).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Strip trailing blank lines on 4 files flagged by `end-of-file-fixer`
  pre-commit hook: `dashboard-app/src/App.tsx`, `docs/python-sdk/events.mdx`,
  `docs/typescript-sdk/events.mdx`, `libraries/typescript/src/audio/aec.ts`.

- `tests/unit/test_aec.py` now uses `pytest.importorskip("numpy")` instead
  of a top-level `import numpy as np`. numpy ships only with the SDK
  `[aec]` / `[audio-filters]` extras, so the base CI Python SDK Tests job
  was failing collection with `ModuleNotFoundError: No module named 'numpy'`.
  The `[all-extras]` job installs numpy and exercises these tests for real.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nicolotognoni and others added 3 commits May 8, 2026 19:56
…s, chunk_size

The HTTP-streaming ElevenLabs facade had a narrower `__init__` signature
than the underlying `providers.elevenlabs_tts.ElevenLabsTTS` provider —
accepting only api_key/voice_id/model_id/output_format. Users who built
a TTS via the public facade silently lost language-aware synthesis and
could not pass voice_settings.

Multilingual scenarios in the agent-to-agent acceptance suite
(`feat_italian_language_live`) crashed downstream once the runner
started forwarding `language_code` correctly to the public class.

Python `libraries/python/getpatter/tts/elevenlabs.py:42-55`: extended
__init__ with language_code (default None), voice_settings (default
None), chunk_size (default 4096). Backward-compatible — existing
constructors compile and run unchanged.
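
A rough sketch of the facade pattern this paragraph describes, under stated assumptions: `ProviderTTS` and `FacadeTTS` are hypothetical names, the string defaults are placeholders, and the `voice_settings` payload is illustrative. Only the three new parameter names and their defaults (None, None, 4096) come from the commit message.

```python
from typing import Any, Dict, Optional


class ProviderTTS:
    """Hypothetical stand-in for the underlying provider class."""

    def __init__(
        self,
        api_key: str,
        voice_id: str,
        model_id: str,
        output_format: str,
        language_code: Optional[str] = None,
        voice_settings: Optional[Dict[str, Any]] = None,
        chunk_size: int = 4096,
    ) -> None:
        self.api_key = api_key
        self.voice_id = voice_id
        self.model_id = model_id
        self.output_format = output_format
        self.language_code = language_code
        self.voice_settings = voice_settings
        self.chunk_size = chunk_size


class FacadeTTS(ProviderTTS):
    """Hypothetical public facade whose signature matches the provider's."""

    def __init__(
        self,
        api_key: str,
        voice_id: str = "example-voice",    # placeholder default
        model_id: str = "example-model",    # placeholder default
        output_format: str = "example-format",  # placeholder default
        language_code: Optional[str] = None,
        voice_settings: Optional[Dict[str, Any]] = None,
        chunk_size: int = 4096,
    ) -> None:
        # Forward every option by keyword so none is silently dropped.
        super().__init__(
            api_key=api_key,
            voice_id=voice_id,
            model_id=model_id,
            output_format=output_format,
            language_code=language_code,
            voice_settings=voice_settings,
            chunk_size=chunk_size,
        )


italian = FacadeTTS(api_key="test-key", language_code="it",
                    voice_settings={"stability": 0.5})
legacy_call = FacadeTTS(api_key="test-key")  # pre-change call shape still works
```

Because the new parameters are keyword arguments with defaults, a call that passes only the original four arguments behaves as before.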

TypeScript `libraries/typescript/src/tts/elevenlabs.ts:6-17,50-58`:
added languageCode + voiceSettings to ElevenLabsTTSOptions, tightened
fields with `readonly` to match project immutability rule, switched
the super-class call to the options-object overload so the optional
fields actually reach the underlying ElevenLabsTTS adapter (the
positional overload was dropping them).

7 new regression tests in
`libraries/python/tests/unit/test_tts_facade_language.py` and
`libraries/typescript/tests/tts-facade-language.test.ts`
covering: language_code forwarding, voice_settings forwarding, default
preservation, env-key resolution, explicit-key wins, missing-key error,
and forTwilio() carrier factory regression. Both suites pass clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the Unreleased block accumulated across the refactor wave
(58 prior commits on this branch since 0.5.4 — Realtime fixes,
pipeline observability, Mintlify docs parity, ElevenLabs facade
fix, model-aware pricing, Cerebras/Groq billing fix, AssemblyAI
buffering, MCP support, Speechmatics TS port). Adds an empty
`## Unreleased` placeholder for post-0.6.0 work.

Versions in libraries/python/getpatter/__init__.py,
libraries/python/pyproject.toml, libraries/typescript/package.json
were already bumped to 0.6.0 earlier in this branch — this commit
only finalises the CHANGELOG date stamp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `manageWebhook?: boolean` (default `true`) to `ServeOptions`. When
set to `false`, `serve()` skips the call to `autoConfigureCarrier`,
leaving the carrier's `voice_url` untouched. Closes a hidden footgun
for users running the SDK behind a router/gateway whose Twilio webhook
is managed externally (Terraform, an edge gateway, a voice-router
function in front of the agent) — without this opt-out, every boot
silently overwrites the externally-managed value, bypassing the gating
layer.

`tunnel: true` overrides the opt-out — the tunnel hostname is dynamic
and only known at runtime, so the carrier MUST be reconfigured for
inbound calls to land.
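
The resulting decision table can be spelled out in a few lines. The option itself exists only in the TypeScript SDK; the sketch below merely renders the gating expression from `client.ts` in Python for illustration. `wants_carrier_management` is a hypothetical helper, and it assumes `wantsCloudflared` corresponds to `tunnel: true`.

```python
from typing import Optional


def wants_carrier_management(manage_webhook: Optional[bool], tunnel: bool) -> bool:
    """Python rendering of `opts.manageWebhook !== false || wantsCloudflared`."""
    # None models an omitted option, which behaves like the default `true`.
    return manage_webhook is not False or tunnel


# Enumerate every combination of the two inputs.
decisions = {
    (manage_webhook, tunnel): wants_carrier_management(manage_webhook, tunnel)
    for manage_webhook in (None, True, False)
    for tunnel in (False, True)
}
```

Only one of the six combinations skips carrier configuration: `manageWebhook: false` with no tunnel.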

Parity note: Python SDK has never auto-configured carriers, so this
brings TS `manageWebhook: false` mode in line with default Python
behaviour. No Python change required.

Files:
- libraries/typescript/src/types.ts — `ServeOptions.manageWebhook?`
  + full doc comment.
- libraries/typescript/src/client.ts:507-518 — gated
  `autoConfigureCarrier` call on
  `wantsCarrierManagement = opts.manageWebhook !== false || wantsCloudflared`.
- libraries/typescript/tests/unit/client.test.ts — 3 authentic tests
  under `serve() > manageWebhook opt-out` swapping globalThis.fetch
  to capture Twilio API URLs (no mock-on-mock).
- docs/typescript-sdk/local-mode.mdx — added `manageWebhook` row to
  the ServeOptions table.
- CHANGELOG.md — Added entry under 0.6.0 (2026-05-08) > Added.

Backward compat: default `true` preserves existing behaviour
byte-for-byte; no required field, no changed default.

Authentic tests pass (TS suite: 31/31 in client.test.ts). `tsc
--noEmit` clean. The Docs/DEVLOG entry from upstream PR #84 was
intentionally not ported (development-log style, not for the public docs).

Original PR: #84
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nicolotognoni nicolotognoni merged commit 2d47d7d into main May 8, 2026
14 checks passed
@github-actions github-actions Bot deleted the refactor/repo-cleanup-and-restructure branch May 9, 2026 06:17