
feat(providers): declarative TTS + STT provider config (#979 phase 2) #985

Merged
chaholl merged 1 commit into main from feat/declarative-tts-stt-providers
Apr 14, 2026

@chaholl chaholl commented Apr 14, 2026

Closes #979. TTS and STT providers now follow the same declarative shape as chat (existing) and embedding (#984) providers.

What's new

spec:
  tts_providers:
    - id: voice
      type: elevenlabs
      model: eleven_turbo_v2
      credential:
        credential_env: ELEVEN_API_KEY
    - id: cart
      type: cartesia
      additional_config:
        ws_url: wss://api.cartesia.ai/tts/websocket
  stt_providers:
    - type: openai
      model: whisper-1
      credential:
        credential_env: OPENAI_API_KEY
  • `pkg/config`: TTSProviderConfig/STTProviderConfig and matching slices on RuntimeConfigSpec; validation enforces type allowlists (TTS: openai/elevenlabs/cartesia, STT: openai) and rejects duplicate IDs.
  • `runtime/tts` and `runtime/stt`: ProviderSpec/Factory/CreateFromSpec/RegisterFactory/ResolveCredential, mirroring the embedding-factory pattern. Each provider self-registers via init() in an *_register.go file.
  • `sdk`: applyTTSProviders + applySTTProviders build instances from the spec; the first declared entry becomes the default ttsService/sttService unless one is set programmatically via WithTTS / WithVADMode.
  • New how-to doc at docs/sdk/how-to/declarative-tts-stt-providers.md.
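
To make the validation rules above concrete, here is a minimal sketch of the allowlist and duplicate-ID checks. It is not the code from `pkg/config`: the struct is a trimmed-down stand-in, and `allowedTTSTypes`, `effectiveID`, and `validateTTSProviders` are hypothetical names. The fallback from an omitted `id` to the provider `type` is an assumption based on the "default-ID-from-type" test mentioned in the commit message.

```go
package main

import "fmt"

// TTSProviderConfig is a hypothetical, trimmed-down stand-in for the
// struct added to pkg/config; only fields relevant to validation appear.
type TTSProviderConfig struct {
	ID    string
	Type  string
	Model string
}

// allowedTTSTypes mirrors the TTS allowlist named in the PR description.
var allowedTTSTypes = map[string]bool{
	"openai":     true,
	"elevenlabs": true,
	"cartesia":   true,
}

// effectiveID returns the declared ID, falling back to the provider type
// when the ID is omitted (assumed behaviour, not confirmed by the source).
func effectiveID(p TTSProviderConfig) string {
	if p.ID != "" {
		return p.ID
	}
	return p.Type
}

// validateTTSProviders rejects unknown provider types and duplicate IDs.
func validateTTSProviders(providers []TTSProviderConfig) error {
	seen := make(map[string]bool, len(providers))
	for i, p := range providers {
		if !allowedTTSTypes[p.Type] {
			return fmt.Errorf("tts_providers[%d]: unsupported type %q", i, p.Type)
		}
		id := effectiveID(p)
		if seen[id] {
			return fmt.Errorf("tts_providers[%d]: duplicate id %q", i, id)
		}
		seen[id] = true
	}
	return nil
}

func main() {
	ok := []TTSProviderConfig{
		{ID: "voice", Type: "elevenlabs", Model: "eleven_turbo_v2"},
		{ID: "cart", Type: "cartesia"},
	}
	fmt.Println(validateTTSProviders(ok))

	// The second entry collides with the first entry's type-derived ID.
	dup := []TTSProviderConfig{
		{Type: "openai"},
		{ID: "openai", Type: "elevenlabs"},
	}
	fmt.Println(validateTTSProviders(dup))
}
```

An STT variant would presumably look the same with a one-entry allowlist (openai).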

Test plan

  • go test ./pkg/config ./runtime/tts ./runtime/stt ./sdk -count=1 -race green
  • golangci-lint run --new-from-rev=main clean
  • Coverage on changed files ≥ 80% (100% on register closures)
  • Schema regenerated
  • CI green

This wraps #979 — chat, embedding, TTS, and STT providers all share the same declarative shape now.

Closes #979. TTS and STT providers now follow the same declarative
shape as chat and embedding providers.

- pkg/config: TTSProviderConfig + STTProviderConfig (id, type, model,
  base_url, credential, additional_config) and matching slices on
  RuntimeConfigSpec. Validation enforces type allowlists (TTS:
  openai/elevenlabs/cartesia, STT: openai) and rejects duplicate IDs.
- runtime/tts and runtime/stt: ProviderSpec + Factory + CreateFromSpec
  + RegisterFactory + ResolveCredential, mirroring the embedding
  factory pattern. Each provider in the package gets a companion
  *_register.go file that calls RegisterFactory from init(). This
  keeps the registration surface co-located with its provider and
  avoids any cross-package import cycle.
- sdk: applyTTSProviders + applySTTProviders build instances from the
  spec and store them by ID. First declared entry becomes the
  default ttsService / sttService unless one is already set via
  WithTTS / WithVADMode.
- Cartesia's ws_url passes through additional_config.
- Tests cover validation, apply, default-ID-from-type, duplicate
  detection, programmatic precedence, and per-provider register
  closures (100% coverage on the register files).
- New how-to doc covers the YAML shape and the extension pattern for
  new providers.
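
The registration and default-selection behaviour described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from runtime/tts or sdk: the `Service` interface, `fakeService`, `applyProviders`, and the field layout of `ProviderSpec` are hypothetical, and only the names RegisterFactory, CreateFromSpec, ProviderSpec, and Factory come from the PR. Their real signatures may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// Service is a hypothetical minimal TTS interface for this sketch.
type Service interface{ Name() string }

// ProviderSpec carries the declarative fields a factory needs (assumed layout).
type ProviderSpec struct {
	ID, Type, Model  string
	AdditionalConfig map[string]any
}

// Factory builds a Service from a spec.
type Factory func(spec ProviderSpec) (Service, error)

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// RegisterFactory associates a provider type with its factory.
func RegisterFactory(providerType string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[providerType] = f
}

// CreateFromSpec looks up the factory for spec.Type and invokes it.
func CreateFromSpec(spec ProviderSpec) (Service, error) {
	mu.RLock()
	f, ok := factories[spec.Type]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("tts: no factory registered for type %q", spec.Type)
	}
	return f(spec)
}

type fakeService struct{ name string }

func (s fakeService) Name() string { return s.name }

// In the PR this kind of init() lives in a per-provider *_register.go file.
func init() {
	RegisterFactory("cartesia", func(spec ProviderSpec) (Service, error) {
		// ws_url passes through additional_config; a missing key yields "".
		url, _ := spec.AdditionalConfig["ws_url"].(string)
		return fakeService{name: "cartesia@" + url}, nil
	})
}

// applyProviders builds every declared provider and stores it by ID. The
// default is the existing (programmatically set) service if there is one,
// otherwise the first declared entry.
func applyProviders(specs []ProviderSpec, existing Service) (map[string]Service, Service, error) {
	byID := make(map[string]Service, len(specs))
	def := existing
	for _, spec := range specs {
		svc, err := CreateFromSpec(spec)
		if err != nil {
			return nil, nil, err
		}
		byID[spec.ID] = svc
		if def == nil {
			def = svc
		}
	}
	return byID, def, nil
}

func main() {
	specs := []ProviderSpec{{
		ID:               "cart",
		Type:             "cartesia",
		AdditionalConfig: map[string]any{"ws_url": "wss://api.cartesia.ai/tts/websocket"},
	}}
	_, def, err := applyProviders(specs, nil)
	fmt.Println(def.Name(), err)
}
```

Because registration happens in init(), adding a provider in this scheme means adding one file to the package rather than editing a central switch statement.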

This wraps the #979 work: chat, embedding, TTS, and STT providers
all share the same declarative shape now. The
SelectorContext.Embeddings bridge to RAG-configured embedding
instances (M1 of #980) lights up automatically when an embedding
provider is declared.
@chaholl chaholl merged commit 3b3dfd8 into main Apr 14, 2026
24 checks passed
@sonarqubecloud

Quality Gate failed

Failed conditions
12.7% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@chaholl chaholl deleted the feat/declarative-tts-stt-providers branch April 18, 2026 17:49

Development

Successfully merging this pull request may close these issues.

Declarative config for embedding, TTS, and STT providers
