Add Speech to Text support via ISpeechToTextClient implementation by kzu · Pull Request #134 · devlooped/xAI

kzu · 2026-04-30T17:57:04Z

Implements ISpeechToTextClient for xAI's Grok models, including:

- GrokSpeechToTextClient: unary transcription via POST /v1/stt and streaming transcription via wss://.../v1/stt WebSocket protocol. Handles session handshake, chunked audio upload, interim/final transcript events, word-level timing, and language detection.
- GrokSpeechToTextOptions: Grok-specific options for audio format, sample rate, multichannel, diarization, interim results, and endpointing timeout.
- AsISpeechToTextClient() extension on GrokClient wires up the client with the correct HTTP handler and WebSocket factory.
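
The chunked audio upload mentioned above can be illustrated with a minimal sketch. It is not the code from this PR: `AudioChunker`, `UploadAsync`, the `send` delegate, and the 4096-byte default chunk size are all hypothetical names and values chosen for illustration, and the session handshake and transcript events are left out entirely.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the PR: splits an audio stream into chunks.
public static class AudioChunker
{
    // Reads `audio` in chunks of at most `chunkSize` bytes and passes each one to `send`.
    // The buffer is reused, so `send` must finish with a chunk before it completes.
    // Returns the number of chunks that were sent.
    public static async Task<int> UploadAsync(
        Stream audio,
        Func<ReadOnlyMemory<byte>, CancellationToken, ValueTask> send,
        int chunkSize = 4096, // assumed value, not taken from the PR
        CancellationToken cancellation = default)
    {
        var buffer = new byte[chunkSize];
        var chunks = 0;
        int read;
        while ((read = await audio.ReadAsync(buffer.AsMemory(), cancellation)) > 0)
        {
            // Only the bytes actually read are forwarded.
            await send(buffer.AsMemory(0, read), cancellation);
            chunks++;
        }
        return chunks;
    }
}
```

With a connected `System.Net.WebSockets.WebSocket`, the delegate could be `(chunk, ct) => socket.SendAsync(chunk, WebSocketMessageType.Binary, true, ct)`. Whether the xAI endpoint expects binary frames of this shape is an assumption here, not something the PR description states.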

Fix: voice REST clients (TTS and STT) were accidentally reusing the gRPC channel's BalancerHttpHandler, which throws for plain HTTP/1.1 requests. Added GrokClient.HttpHandler, backed by a separate httpHandlers cache using the same Polly retry pipeline but independent of the gRPC channel. AsITextToSpeechClient and AsISpeechToTextClient now use client.HttpHandler instead of client.ChannelHandler.Handler. The channels dictionary now holds ChannelBase directly rather than a tuple, since the HttpMessageHandler is no longer needed from it.
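
The idea behind the separate handler cache can be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's implementation: `HandlerCache` and `GetHttpHandler` are hypothetical names, the cache key is assumed to be the endpoint string, and the Polly retry pipeline is omitted.

```csharp
using System.Collections.Concurrent;
using System.Net.Http;

// Hypothetical sketch: a cache of plain HTTP handlers kept apart from any gRPC channel state.
public static class HandlerCache
{
    // One handler per endpoint, independent of the gRPC channels dictionary.
    static readonly ConcurrentDictionary<string, HttpMessageHandler> httpHandlers = new();

    // Returns the cached handler for `endpoint`, creating it on first use.
    // GetOrAdd may invoke the factory more than once under contention,
    // but only one handler is stored and returned to all callers.
    public static HttpMessageHandler GetHttpHandler(string endpoint) =>
        httpHandlers.GetOrAdd(endpoint, _ => new SocketsHttpHandler());
}
```

Keeping REST handlers in their own dictionary means a plain HTTP/1.1 request never passes through a handler that was built for the gRPC channel, which is the failure the fix describes.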

Add TextToSpeech_SpeechToText integration test that streams TTS audio to a temp file and transcribes it back with STT, asserting the roundtrip text matches (punctuation-insensitive via NormalizeTranscription).
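
A punctuation-insensitive comparison of this kind might look like the sketch below. The actual NormalizeTranscription helper is not shown in this description, so `TranscriptionText.Normalize` is a hypothetical stand-in that assumes lowercasing, dropping punctuation and symbols, and collapsing whitespace are enough.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the test's NormalizeTranscription helper.
public static class TranscriptionText
{
    // Lowercases the text, drops punctuation and symbol characters,
    // and collapses runs of whitespace into single spaces.
    public static string Normalize(string text)
    {
        var kept = text
            .Where(c => !char.IsPunctuation(c) && !char.IsSymbol(c))
            .Select(char.ToLowerInvariant)
            .ToArray();

        // An empty separator array makes Split treat any whitespace as a delimiter.
        var words = new string(kept)
            .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

        return string.Join(' ', words);
    }
}
```

A roundtrip assertion would then compare `TranscriptionText.Normalize(expected)` with `TranscriptionText.Normalize(transcribed)`, so differences such as a trailing period or capitalization do not fail the test.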

Update readme with ISpeechToTextClient usage examples alongside the existing TTS documentation.


kzu · 2026-04-30T17:58:37Z

🧪 Details on Ubuntu 24.04.4 LTS

from retest v1.1.0 on .NET 10.0.7 with 💜 by @devlooped

kzu added the enhancement New feature or request label Apr 30, 2026

kzu enabled auto-merge (rebase) April 30, 2026 17:57

kzu merged commit fb41835 into main Apr 30, 2026
4 checks passed

kzu deleted the dev/stt branch April 30, 2026 17:58

kzu changed the title ~~Add ISpeechToTextClient support via GrokSpeechToTextClient~~ Add Speech to Text support via ISpeechToTextClient implementation Apr 30, 2026

kzu merged 1 commit into main from dev/stt