Add Speech to Text support via ISpeechToTextClient implementation #134

Merged: kzu merged 1 commit into main from dev/stt on Apr 30, 2026

kzu commented Apr 30, 2026

Implements ISpeechToTextClient for xAI's Grok models, including:

  • GrokSpeechToTextClient: unary transcription via POST /v1/stt and streaming transcription via wss://.../v1/stt WebSocket protocol. Handles session handshake, chunked audio upload, interim/final transcript events, word-level timing, and language detection.

  • GrokSpeechToTextOptions: Grok-specific options for audio format, sample rate, multichannel, diarization, interim results, and endpointing timeout.

  • AsISpeechToTextClient() extension on GrokClient wires up the client with the correct HTTP handler and WebSocket factory.
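
The "chunked audio upload" step of the streaming path can be pictured with a minimal sketch like the one below. It is not the PR's code: `AudioChunker`, `ReadChunks`, and the 8192-byte default are hypothetical names and values chosen for illustration, and the real client may frame audio differently.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class AudioChunker
{
    // Splits an audio stream into buffers of at most chunkSize bytes.
    // The default chunk size is an illustrative assumption, not a value from the PR.
    public static IEnumerable<byte[]> ReadChunks(Stream audio, int chunkSize = 8192)
    {
        var buffer = new byte[chunkSize];
        int read;
        while ((read = audio.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy the bytes actually read so each yielded chunk owns its data.
            var chunk = new byte[read];
            Array.Copy(buffer, chunk, read);
            yield return chunk;
        }
    }
}
```

A streaming client would presumably send each chunk as one binary WebSocket message after the session handshake, then signal the end of audio so the server can emit the final transcript.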

Fix: voice REST clients (TTS and STT) were accidentally reusing the gRPC channel's BalancerHttpHandler, which throws for plain HTTP/1.1 requests. Added GrokClient.HttpHandler, backed by a separate httpHandlers cache using the same Polly retry pipeline but independent of the gRPC channel. AsITextToSpeechClient and AsISpeechToTextClient now use client.HttpHandler instead of client.ChannelHandler.Handler. The channels dictionary now holds ChannelBase directly rather than a tuple, since the HttpMessageHandler is no longer needed from it.
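
To illustrate the shape of the fix, here is a rough sketch of a per-endpoint handler cache kept separate from the gRPC channels. The type and member names (`EndpointHandlers`, `GetHttpHandler`, `CreateHandler`) are hypothetical, and the Polly retry pipeline mentioned above is left out; only the caching idea is shown.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Http;

// Hypothetical sketch, not the library's actual code: a cache of plain HTTP
// handlers keyed by endpoint, independent of any gRPC channel cache, so REST
// calls never receive the gRPC channel's handler.
sealed class EndpointHandlers
{
    readonly ConcurrentDictionary<Uri, HttpMessageHandler> httpHandlers = new();

    // Returns one shared plain-HTTP handler per endpoint.
    public HttpMessageHandler GetHttpHandler(Uri endpoint) =>
        httpHandlers.GetOrAdd(endpoint, _ => CreateHandler());

    // Placeholder for the retry-enabled handler; the retry pipeline is omitted here.
    static HttpMessageHandler CreateHandler() => new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(5),
    };
}
```

A REST client built on this would wrap the cached handler with `new HttpClient(handler, disposeHandler: false)`, so disposing one client does not tear down the handler shared by the others.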

Add TextToSpeech_SpeechToText integration test that streams TTS audio to a temp file and transcribes it back with STT, asserting the roundtrip text matches (punctuation-insensitive via NormalizeTranscription).
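
The PR does not show how NormalizeTranscription is written, so the following is only a guess at what a punctuation-insensitive comparison could look like. `TranscriptComparison` is a hypothetical container, and the exact rules (lowercasing, dropping everything except letters, digits, and whitespace) are assumptions.

```csharp
using System;
using System.Linq;

static class TranscriptComparison
{
    // Hypothetical stand-in for the test's NormalizeTranscription helper:
    // lowercases, drops punctuation and symbols, and collapses whitespace runs.
    public static string NormalizeTranscription(string text)
    {
        var kept = text
            .Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c))
            .Select(c => char.ToLowerInvariant(c))
            .ToArray();

        // Splitting on a null separator splits on any whitespace.
        var words = new string(kept).Split(
            (char[]?)null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }
}
```

With this sketch, "Hello, world!" and "hello   world" both normalize to "hello world", so a roundtrip that only changes casing, punctuation, or spacing still compares equal.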

Update readme with ISpeechToTextClient usage examples alongside the existing TTS documentation.

kzu added the enhancement label on Apr 30, 2026
kzu enabled auto-merge (rebase) on April 30, 2026 17:57

kzu commented Apr 30, 2026

65 passed, 36 skipped

🧪 Details on Ubuntu 24.04.4 LTS

from retest v1.1.0 on .NET 10.0.7 with 💜 by @devlooped

kzu merged commit fb41835 into main on Apr 30, 2026
4 checks passed
kzu deleted the dev/stt branch on April 30, 2026 17:58
kzu changed the title from "Add ISpeechToTextClient support via GrokSpeechToTextClient" to "Add Speech to Text support via ISpeechToTextClient implementation" on Apr 30, 2026