feat(ai-grok): audio, speech, and realtime adapters + example wiring (#506)
Add `grokSpeech` (TTS via /v1/tts), `grokTranscription` (STT via /v1/stt),
and `grokRealtime` + `grokRealtimeToken` (Voice Agent via /v1/realtime)
because xAI's standalone audio APIs were shipped publicly and the
adapter previously exposed only text/image/summarize. The TTS/STT
endpoints are not OpenAI-compatible so these adapters use direct fetch
rather than the OpenAI SDK; the realtime API mirrors OpenAI's shape with
URL/provider swaps. E2E coverage is wired via mock.mount('/v1/tts'...)
on aimock.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
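Since the TTS endpoint is called with a direct fetch rather than the OpenAI SDK, the request body has to be assembled by hand. A minimal sketch, assuming the `/v1/tts` JSON payload uses the `text`, `voice`, and `output_format` fields shown in the PR's sequence diagram (the default voice id `ara` is illustrative, not confirmed):

```typescript
// Hedged sketch: build the JSON body for a direct fetch to /v1/tts.
// Field names follow the sequence diagram in this PR; confirm the exact
// payload shape against the xAI docs before relying on it.
interface TTSRequestBody {
  text: string
  voice: string
  output_format: string
}

function buildTtsBody(
  text: string,
  voice = 'ara', // assumed default voice id
  outputFormat = 'mp3',
): string {
  const body: TTSRequestBody = { text, voice, output_format: outputFormat }
  return JSON.stringify(body)
}

const body = buildTtsBody('Hello from Grok')
console.log(JSON.parse(body).output_format) // "mp3"
```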
Adopt the @tanstack/ai/adapter-internals logger across grokSpeech, grokTranscription, grokRealtimeToken, and grokRealtime so users can toggle debug output the same way they do on other adapters — `debug: true` for full tracing, `debug: false` to silence, or a DebugConfig for per-category control and a custom Logger. Replaces the remaining console.error / console.warn calls in the realtime adapter with logger.errors so nothing is lost when debugging is off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured, and the review can be resumed with the usual CodeRabbit commands.
📝 Walkthrough

Adds Grok audio support: TTS, STT (word-level timestamps), and a realtime WebRTC voice-agent with ephemeral tokens; introduces adapters, types, utilities, tests, example app/UI/server wiring, and e2e mocks to exercise Grok audio flows.
Sequence Diagrams

```mermaid
sequenceDiagram
    actor User
    participant Client as Browser Client
    participant Server as App Server
    participant Grok as Grok API
    User->>Client: Request speech (text)
    Client->>Server: POST /api/generate-speech (provider: grok, text)
    Server->>Grok: POST /v1/tts (text, voice, output_format) + Bearer
    Grok-->>Server: audio bytes (ArrayBuffer)
    Server->>Server: base64-encode, assemble TTSResult
    Server-->>Client: { audio, format, contentType, id }
    Client->>User: Play audio
```
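The "base64-encode, assemble TTSResult" step in the diagram above can be sketched with a small environment-safe helper (the function name is illustrative; Node has `Buffer`, browsers fall back to `btoa`):

```typescript
// Hypothetical helper: convert raw TTS audio bytes to base64 for a JSON
// response. Uses Node's Buffer when present, else the browser btoa path.
function arrayBufferToBase64(buffer: ArrayBuffer): string {
  const nodeBuffer = (globalThis as any).Buffer
  if (nodeBuffer) {
    return nodeBuffer.from(buffer).toString('base64')
  }
  // Browser-safe fallback: bytes -> binary string -> base64
  let binary = ''
  for (const byte of new Uint8Array(buffer)) {
    binary += String.fromCharCode(byte)
  }
  return btoa(binary)
}

// Example: the 3 bytes "RIF" encode to 4 base64 characters.
const sample = new Uint8Array([82, 73, 70]).buffer
console.log(arrayBufferToBase64(sample)) // "UklG"
```

This sidesteps the browser incompatibility the review flags below for the adapter's bare `Buffer.from(...)` call.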
```mermaid
sequenceDiagram
    actor User
    participant Client as Browser Client
    participant Server as App Server
    participant Grok as Grok API
    User->>Client: Upload/record audio
    Client->>Server: POST /api/transcribe (provider: grok, file)
    Server->>Grok: POST /v1/stt (multipart/form-data) + Bearer
    Grok-->>Server: JSON { text, language, words:[{word,start,end}] }
    Server-->>Client: { text, language, words }
    Client->>User: Show transcript with timestamps
```
```mermaid
sequenceDiagram
    actor User
    participant Client as Browser Client
    participant TokenSvc as App Server (token endpoint)
    participant GrokSvc as Grok Realtime (signaling)
    User->>Client: Start realtime session
    Client->>TokenSvc: Request ephemeral token
    TokenSvc->>GrokSvc: POST /v1/realtime/client_secrets (model) + Bearer
    GrokSvc-->>TokenSvc: { client_secret, expires_at, model, features }
    TokenSvc-->>Client: { client_secret.value, expiresAt }
    Client->>GrokSvc: WebRTC offer/answer, open data channel
    GrokSvc-->>Client: transcripts, assistant audio, events
    Client->>User: Play audio / show partial transcripts
```
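The token exchange in the diagram above maps the `client_secrets` response to the client-facing shape; a minimal sketch, assuming the `expires_at * 1000` seconds-to-milliseconds scaling the review mentions (the `toRealtimeToken` name and output field names are illustrative):

```typescript
// Sketch: map Grok's /v1/realtime/client_secrets response to an ephemeral
// token object. expires_at is Unix seconds; JS timestamps are milliseconds.
interface GrokClientSecretResponse {
  client_secret: { value: string; expires_at: number }
  model: string
}

function toRealtimeToken(res: GrokClientSecretResponse) {
  return {
    token: res.client_secret.value,
    expiresAt: res.client_secret.expires_at * 1000, // seconds -> ms epoch
    model: res.model,
  }
}

const token = toRealtimeToken({
  client_secret: { value: 'ek_abc', expires_at: 1_700_000_600 },
  model: 'grok-voice-fast-1.0',
})
console.log(token.expiresAt) // 1700000600000
```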
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
View your CI Pipeline Execution ↗ for commit 268ad6a
The transcription and TTS adapters were calling super(config, model), but BaseTranscriptionAdapter/BaseTTSAdapter expect (model, config), causing TS2345 build errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
After the logger was wired into the audio adapters, the unit tests need to provide one when calling transcribe/generateSpeech directly (activities normally inject it via resolveDebugOption).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per project convention, tests should not invoke adapter methods directly — they call generateSpeech()/generateTranscription() with the adapter instance, so the core function injects the logger, emits events, and exercises the real public surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 9
🧹 Nitpick comments (10)
packages/typescript/ai-grok/src/utils/audio.ts (1)
20-31: Tighten string-input handling in `toAudioFile`.

A couple of robustness gaps in the string branch:

- The `data:` path assumes base64 encoding without checking for a `;base64` marker. A percent-encoded data URL (e.g. `data:audio/wav,%52%49...`) will be fed to `atob` and throw an opaque `InvalidCharacterError`.
- The non-`data:` string branch unconditionally treats the input as base64 and labels it `audio/mpeg`. If a caller accidentally passes a URL, filesystem path, or raw text, `atob` will throw without any hint about what went wrong.
- Minor: for `audio/mpeg` the derived filename becomes `audio.mpeg` rather than the more conventional `audio.mp3`, and MIME subtypes like `x-wav` yield an invalid extension. Servers typically key off `Content-Type`, so this is cosmetic, but a small map would be cleaner.

♻️ Suggested hardening

```diff
+const MIME_EXTENSION: Record<string, string> = {
+  'audio/mpeg': 'mp3',
+  'audio/mp3': 'mp3',
+  'audio/wav': 'wav',
+  'audio/x-wav': 'wav',
+  'audio/ogg': 'ogg',
+  'audio/flac': 'flac',
+  'audio/aac': 'aac',
+  'audio/mp4': 'mp4',
+  'audio/webm': 'webm',
+}
+
 if (typeof audio === 'string') {
   if (audio.startsWith('data:')) {
     const [header = '', base64Data = ''] = audio.split(',')
+    if (!/;base64/i.test(header)) {
+      throw new Error(
+        'toAudioFile: only base64-encoded data URLs are supported',
+      )
+    }
     const mimeType = header.match(/data:([^;]+)/)?.[1] || 'audio/mpeg'
     const buffer = base64ToArrayBuffer(base64Data)
-    const extension = mimeType.split('/')[1] || 'mp3'
+    const extension = MIME_EXTENSION[mimeType] || 'bin'
     return new File([buffer], `audio.${extension}`, { type: mimeType })
   }
-  const buffer = base64ToArrayBuffer(audio)
-  return new File([buffer], 'audio.mp3', { type: 'audio/mpeg' })
+  try {
+    const buffer = base64ToArrayBuffer(audio)
+    return new File([buffer], 'audio.mp3', { type: 'audio/mpeg' })
+  } catch {
+    throw new Error(
+      'toAudioFile: string input must be a base64 payload or data: URL',
+    )
+  }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/utils/audio.ts` around lines 20 - 31, In toAudioFile's string handling (the branch that checks typeof audio === 'string'), ensure data: URLs are checked for a ';base64' marker before using base64ToArrayBuffer/atob and if not base64 decode the payload via decodeURIComponent; validate the non-data string input with a conservative base64 regex before treating it as base64 and throw a clear error if it looks like a URL/path/plain text (don't call base64ToArrayBuffer blindly). Also normalize MIME→extension using a small map (e.g., audio/mpeg -> mp3, audio/wav or audio/x-wav -> wav, audio/ogg -> ogg) when deriving the filename, and use base64ToArrayBuffer only for confirmed base64 payloads (update calls in toAudioFile accordingly).

examples/ts-react-chat/src/routes/realtime.tsx (2)
216-220: Nit: voice `<option>` labels display lowercase IDs.

The speech providers catalog renders voice labels with capitalized names (e.g., `Eve`, `Ara`). Surfacing the raw id here is functional but inconsistent with how voices are presented elsewhere. Consider a small `{ id, label }` map or just titlecasing `v`.

✨ Optional polish

```diff
-{GROK_VOICES.map((v) => (
-  <option key={v} value={v}>
-    {v}
-  </option>
-))}
+{GROK_VOICES.map((v) => (
+  <option key={v} value={v}>
+    {v.charAt(0).toUpperCase() + v.slice(1)}
+  </option>
+))}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/realtime.tsx` around lines 216 - 220, The voice <option> labels currently render raw IDs from GROK_VOICES (mapped as v) and appear lowercase; update the rendering in the map over GROK_VOICES so the visible label is titlecased or derived from an { id, label } mapping (keep option value={v or id} unchanged) — locate the GROK_VOICES usage in the realtime.tsx dropdown mapping and replace the displayed text {v} with a titlecased version (or use v.label) while preserving the option key/value as the id.
226-305: Optional: extract the `openai || grok` predicate.

The `(provider === 'openai' || provider === 'grok')` check is repeated 5 times for OpenAI-compatible realtime UI (output mode, temperature, eagerness, tools indicator, image upload). A small local constant like `const isOpenAICompat = provider === 'openai' || provider === 'grok'` would reduce drift risk when a third compatible provider is added.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/routes/realtime.tsx` around lines 226 - 305, Extract the repeated (provider === 'openai' || provider === 'grok') predicate into a local boolean (e.g., const isOpenAICompat = provider === 'openai' || provider === 'grok') and replace all occurrences in this file that gate OpenAI-compatible UI (the select for OUTPUT_MODE_OPTIONS / setOutputMode, the temperature range using setTemperature, the semanticEagerness select using setSemanticEagerness, the tools indicator block, and the image upload block) with isOpenAICompat; ensure the new constant is declared in the component scope above these uses so the renders (and calls to getStatusColor/getModeIndicator) behave identically.

packages/typescript/ai-grok/package.json (1)
15-24: Consider adding `/adapters` and `/realtime` subpath exports for tree-shakeability.

The new `grokSpeech`, `grokTranscription`, `grokRealtime`, and `grokRealtimeToken` adapters (plus the new `src/realtime/index.ts` barrel) are only reachable via the package root (`.`). Adding dedicated subpath exports keeps bundles tree-shakeable for consumers that only import a single capability, and aligns with the project-wide pattern for provider adapter packages.

As per coding guidelines: "Export tree-shakeable adapters from /adapters subpath exports in provider adapter packages".

📦 Suggested exports additions

```diff
   "exports": {
     ".": {
       "types": "./dist/esm/index.d.ts",
       "import": "./dist/esm/index.js"
     },
     "./tools": {
       "types": "./dist/esm/tools/index.d.ts",
       "import": "./dist/esm/tools/index.js"
+    },
+    "./adapters": {
+      "types": "./dist/esm/adapters/index.d.ts",
+      "import": "./dist/esm/adapters/index.js"
+    },
+    "./realtime": {
+      "types": "./dist/esm/realtime/index.d.ts",
+      "import": "./dist/esm/realtime/index.js"
     }
   },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/package.json` around lines 15 - 24, The package exports currently only expose the package root, so the new adapters and realtime barrel aren't tree-shakeable; add explicit subpath exports for "/adapters" (pointing to the built adapter barrel(s) that export grokSpeech, grokTranscription, grokRealtimeToken) and "/realtime" (pointing to the built realtime barrel that exports grokRealtime / the src/realtime/index.ts barrel) in package.json exports so consumers can import e.g. "ai-grok/adapters" or "ai-grok/realtime" directly; update the exports object to map "./adapters": { "types": "./dist/esm/adapters/index.d.ts", "import": "./dist/esm/adapters/index.js" } and "./realtime": { "types": "./dist/esm/realtime/index.d.ts", "import": "./dist/esm/realtime/index.js" } (ensure the built files exist or update build output paths accordingly).

packages/typescript/ai-grok/src/realtime/types.ts (1)
79-111: `GrokRealtimeSessionResponse` is over-specified and unvalidated at runtime.

In `token.ts` only `client_secret.value`, `client_secret.expires_at`, and `model` are actually read, and the JSON response is narrowed with a plain `as GrokRealtimeSessionResponse` cast. Two downsides:

- Most fields are typed as required/non-optional — if xAI omits any of them (or changes `tool_choice: string` to a differently-shaped value), consumers who destructure them will hit runtime `undefined`s that TypeScript promised wouldn't exist.
- There's no runtime validation, so a malformed response surfaces as a cryptic `TypeError` deep in the caller instead of a clear "Invalid Grok session response" error.

Recommend either (a) marking the unused-by-SDK fields `?`/`unknown`, or — preferably, per the repo guideline to "Use Zod for schema validation" in `packages/typescript/**/src/**/*.ts` — defining a small `z.object({ model: z.string(), client_secret: z.object({ value: z.string(), expires_at: z.number() }) })` schema in `token.ts` and parsing the response through it. As per coding guidelines: "Use Zod for schema validation and tool definition across the library".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/realtime/types.ts` around lines 79 - 111, GrokRealtimeSessionResponse is over-specified and unvalidated; update token.ts to validate only the fields you actually read (model, client_secret.value, client_secret.expires_at) by replacing the broad cast with a Zod schema and parsing the fetched JSON through it (e.g., define a z.object({ model: z.string(), client_secret: z.object({ value: z.string(), expires_at: z.number() }) }) and use safe parsing), then use the parsed values when accessing client_secret.value / client_secret.expires_at and model; alternatively mark all other properties on GrokRealtimeSessionResponse as optional/unknown if you choose not to add validation.

packages/typescript/ai-grok/src/realtime/token.ts (1)
35-35: API key is read at adapter construction, not at token-generation time.
`getGrokApiKeyFromEnv()` runs when `grokRealtimeToken(options)` is called, so any module that imports/builds this adapter at startup will throw if `XAI_API_KEY` hasn't been loaded yet (e.g., before dotenv/framework config), and subsequent env updates won't be picked up. Consider reading it inside `generateToken()` for lazy evaluation, matching the pattern used by many request-time adapters:

♻️ Proposed refactor

```diff
 export function grokRealtimeToken(
   options: GrokRealtimeTokenOptions = {},
 ): RealtimeTokenAdapter {
-  const apiKey = getGrokApiKeyFromEnv()
   const logger = resolveDebugOption(options.debug)

   return {
     provider: 'grok',

     async generateToken(): Promise<RealtimeToken> {
+      const apiKey = getGrokApiKeyFromEnv()
       const model: GrokRealtimeModel = options.model ?? 'grok-voice-fast-1.0'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/realtime/token.ts` at line 35, The adapter currently calls getGrokApiKeyFromEnv() at construction time inside grokRealtimeToken(), causing startup-time throws and preventing later env updates from being picked up; move the call into the generateToken() function so the API key is read lazily at token-generation time, i.e., remove the top-level/constructor invocation of getGrokApiKeyFromEnv() and instead call it inside generateToken() (and handle missing key there with the existing error/log flow) so the behavior matches request-time adapters.

examples/ts-react-chat/src/lib/use-realtime.ts (1)
49-58: Consider adding an exhaustiveness guard for `adapterForProvider`.

The switch returns in each `case` but has no `default`. Today the `Provider` union is fully covered, so the implicit return is `undefined` and the types are fine. If a new provider is ever added to `Provider` without a matching `case`, callers will silently receive `undefined` at runtime and TS may or may not surface it depending on config. Adding a `never` guard future-proofs this:

♻️ Proposed refactor

```diff
 function adapterForProvider(provider: Provider) {
   switch (provider) {
     case 'openai':
       return openaiRealtime()
     case 'elevenlabs':
       return elevenlabsRealtime()
     case 'grok':
       return grokRealtime()
+    default: {
+      const _exhaustive: never = provider
+      throw new Error(`Unknown provider: ${_exhaustive as string}`)
+    }
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/lib/use-realtime.ts` around lines 49 - 58, The switch in adapterForProvider(provider: Provider) lacks an exhaustiveness guard and can return undefined if Provider later gains a member; update adapterForProvider to include a default branch that asserts unreachable (use a never-typed variable or throw an Error) so TypeScript will force handling of new Provider cases at compile time and ensure callers never get undefined at runtime.

packages/typescript/ai-grok/tests/audio-adapters.test.ts (1)
1-254: Solid black-box coverage for TTS/STT; consider adding a realtime adapter test too.

The unit tests exercise the public request/response contract cleanly (URL, headers, JSON body, codec mapping, FormData fields, error strings) and properly restore `globalThis.fetch` in `afterEach`. One gap worth filling in a follow-up: the new `grokRealtimeToken` POSTs to `/v1/realtime/client_secrets` and multiplies `expires_at * 1000`; a mocked-fetch unit test would lock in the URL/body/`expiresAt` math without needing WebRTC. Not a blocker for this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/tests/audio-adapters.test.ts` around lines 1 - 254, Add a unit test that mocks globalThis.fetch to verify the realtime token flow: call the function grokRealtimeToken (or the adapter method that obtains realtime tokens) and assert fetch was POSTed to '/v1/realtime/client_secrets' with the expected Authorization header and request body, return a mocked JSON containing client_id, client_secret and expires_at (unix seconds) and assert the function multiplies expires_at by 1000 to produce the returned expiresAt value; ensure the test restores globalThis.fetch in afterEach and checks the response mapping (client_id/client_secret and expiresAt) to lock in the URL/body/expires_at math.

packages/typescript/ai-grok/src/adapters/transcription.ts (2)
61-66: Constructor argument order differs from base class; note OpenAI follows the same pattern.

`BaseTranscriptionAdapter` expects `(model, config)`, but `GrokTranscriptionAdapter` uses `(config, model)` and swaps the arguments in `super(model, config)`. While this works correctly, `OpenAITranscriptionAdapter` follows the same pattern, suggesting it may be intentional, whereas `FalTranscriptionAdapter` aligns with the base class signature. Consider aligning all adapters to the base class convention for consistency across the package.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/adapters/transcription.ts` around lines 61 - 66, The GrokTranscriptionAdapter constructor currently declares parameters as (config, model) but calls super(model, config), which differs from BaseTranscriptionAdapter's (model, config) signature; update GrokTranscriptionAdapter to declare its constructor as (model: TModel, config: GrokTranscriptionConfig) (matching BaseTranscriptionAdapter and FalTranscriptionAdapter), keep the super(model, config) call, and then read this.apiKey, this.baseURL and this.defaultHeaders from the config parameter; also scan and update any callers/tests that instantiate new GrokTranscriptionAdapter to pass (model, config) order if needed so adapters are consistent with BaseTranscriptionAdapter and OpenAITranscriptionAdapter.
102-110: Add timeout support to the fetch request.

The `fetch` call lacks a timeout mechanism, so a stalled xAI `/v1/stt` response will hang indefinitely. Multipart uploads over unreliable networks are particularly susceptible to this issue. Consider adding a `timeout` property to `GrokTranscriptionConfig` and wiring it to an `AbortController` on the fetch request.

(The suggestion to propagate `options.signal` from `TranscriptionOptions` is not currently viable, as that interface does not expose an abort signal.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/adapters/transcription.ts` around lines 102 - 110, The fetch to `${this.baseURL}/stt` needs timeout support: add an optional timeout:number to GrokTranscriptionConfig, create an AbortController in the method that calls fetch (use controller.signal in the fetch options), start a timer (setTimeout) that calls controller.abort() after timeout ms, and clear the timer once fetch resolves or throws; ensure the abort is handled (AbortError) and that headers/body usage remains unchanged. Reference: GrokTranscriptionConfig, the fetch call to `${this.baseURL}/stt`, and TranscriptionOptions (not exposing a signal) when wiring the AbortController and timeout.
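The AbortController wiring the comment above describes can be sketched generically. This is a hedged sketch, not the adapter's code: `withTimeout` is a hypothetical helper, shown against a stand-in task rather than a live `/v1/stt` fetch, but the same `signal` would be passed to `fetch` in practice:

```typescript
// Sketch of the suggested timeout pattern: abort the signal after
// `timeoutMs`, and always clear the timer so the process can exit cleanly.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    // In the adapter this would be: fetch(url, { signal, ...opts })
    return await run(controller.signal)
  } finally {
    clearTimeout(timer)
  }
}

// Stand-in task that only settles when aborted, simulating a stalled upload:
withTimeout(
  (signal) =>
    new Promise<string>((_, reject) => {
      signal.addEventListener('abort', () => reject(new Error('timed out')))
    }),
  50,
).catch((err: Error) => console.log(err.message)) // logs "timed out"
```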
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-grok/src/adapters/transcription.ts`:
- Around line 121-127: The mapping that builds words in the function producing
TranscriptionWord is dropping Grok STT's optional confidence and speaker
(GrokSTTWord) fields—update the code to preserve these by extending the returned
shape (e.g., add optional confidence:number and speaker:string to
TranscriptionWord or create a provider-specific GrokTranscriptionWord extension)
and then include w.confidence and w.speaker in the map that constructs words;
also update any type declarations and downstream consumers to accept the new
optional fields (or document the limitation if you choose not to extend the
type).
In `@packages/typescript/ai-grok/src/adapters/tts.ts`:
- Around line 85-91: The headers object currently spreads this.defaultHeaders
last which allows callers to override critical headers; change the order so
caller-provided headers are spread first (i.e., ...this.defaultHeaders) and then
explicitly set the required headers Authorization: `Bearer ${this.apiKey}` and
'Content-Type': 'application/json' afterwards so the adapter's Authorization and
Content-Type in the TTS request always take precedence; update the headers
construction where it's created in the TTS adapter (look for the headers block
that references this.defaultHeaders and this.apiKey).
- Around line 100-101: GrokSpeechAdapter uses Node's Buffer (`const audio =
Buffer.from(arrayBuffer).toString('base64')`) which throws in browsers; update
GrokSpeechAdapter (and the code around arrayBuffer/audio) to detect environment
and use a fallback: if typeof Buffer !== 'undefined' use
Buffer.from(...).toString('base64'), otherwise convert the ArrayBuffer to base64
via a browser-safe path (e.g. Uint8Array -> String.fromCharCode -> btoa), or
alternatively add a clear JSDoc comment on GrokSpeechAdapter/export indicating
it is server-only so bundlers/tree-shakers don't include it in browser builds.
Ensure the runtime check targets the Buffer usage in the response.arrayBuffer()
handling.
In `@packages/typescript/ai-grok/src/realtime/adapter.ts`:
- Around line 415-425: sendEvent currently pushes events into pendingEvents when
dataChannel isn't open, but after disconnect() sets dataChannel = null those
events are buffered forever; add an explicit closed boolean flag (e.g.,
isClosed) set to true in disconnect(), then update sendEvent to check isClosed
first and either log/error via logger.provider or throw when callers attempt to
send after disconnect, instead of pushing to pendingEvents; keep existing
behavior of buffering only when the channel exists but is not yet open
(dataChannel != null && dataChannel.readyState !== 'open') and ensure
pendingEvents is flushed only when a new dataChannel is established.
- Around line 136-142: The dataChannel.onerror handler is interpolating the
RTCErrorEvent directly (producing “[object Event]”); update the
dataChannel.onerror callback (the handler assigned to dataChannel.onerror) to
extract a useful message from the RTCErrorEvent (e.g., check evt.error?.message,
evt.message, or serialize the event) and include that extracted string in both
logger.errors('grok.realtime fatal', ...) and the emit('error', ...) payload so
the logged and emitted error contain meaningful diagnostics instead of the raw
Event object.
- Around line 682-684: The await dataChannelReady in connect() can hang forever
because dataChannelReady only resolves on dataChannel.onopen and has no timeout
or rejection; update connect() to race dataChannelReady with a timeout (e.g.,
Promise.race with a configurable timeout) and wire rejection paths by attaching
dataChannel.onerror to reject and peer connection state changes
(pc.onconnectionstatechange) to reject when state === 'failed' or
'disconnected'; ensure any rejection triggers the existing cleanup logic to stop
the mic track and close the RTCPeerConnection so resources aren't left active.
- Around line 533-592: The updateSession function currently always sets
sessionUpdate.input_audio_transcription = { model: 'grok-stt' }, which
overwrites caller intent and prevents disabling/overriding STT; change
updateSession (in the updateSession method) to only set
sessionUpdate.input_audio_transcription when the caller explicitly provided
transcription settings (e.g. via
config.providerOptions?.input_audio_transcription or a new
config.inputAudioTranscription field), and allow explicit null to disable
transcription; do not unconditionally write the default model here (keep the
default only during session.create or when no prior value exists elsewhere).
- Around line 173-199: Wrap the SDP negotiation (the sequence using
pc.createOffer, pc.setLocalDescription, fetch to GROK_REALTIME_URL, and
pc.setRemoteDescription) in a try/catch and in the catch perform deterministic
cleanup before rethrowing: stop all MediaStream tracks on localStream, close pc
via pc.close(), close the data channel if present (e.g. dataChannel.close()),
and close the AudioContext if created (audioContext.close()), then rethrow the
original error; also apply the same cleanup in the existing getUserMedia catch
path so that if getUserMedia fails you still close/cleanup pc, dataChannel,
localStream tracks and audioContext if they exist. Ensure each cleanup step
checks for existence (localStream, pc, dataChannel, audioContext) to avoid
additional errors and preserve the original thrown error when rethrowing.
In `@packages/typescript/ai-grok/src/realtime/types.ts`:
- Around line 12-14: The GrokRealtimeModel union is duplicated here causing a
split source of truth; remove the local declaration in realtime/types.ts and
re-export the canonical type from model-meta.ts (which defines
GROK_REALTIME_MODELS and exports type GrokRealtimeModel = (typeof
GROK_REALTIME_MODELS)[number]) so consumers import the single authoritative
GrokRealtimeModel; update the export in realtime/types.ts to re-export the type
symbol GrokRealtimeModel from model-meta.ts and delete the explicit union in
this file.
---
Nitpick comments:
In `@examples/ts-react-chat/src/lib/use-realtime.ts`:
- Around line 49-58: The switch in adapterForProvider(provider: Provider) lacks
an exhaustiveness guard and can return undefined if Provider later gains a
member; update adapterForProvider to include a default branch that asserts
unreachable (use a never-typed variable or throw an Error) so TypeScript will
force handling of new Provider cases at compile time and ensure callers never
get undefined at runtime.
In `@examples/ts-react-chat/src/routes/realtime.tsx`:
- Around line 216-220: The voice <option> labels currently render raw IDs from
GROK_VOICES (mapped as v) and appear lowercase; update the rendering in the map
over GROK_VOICES so the visible label is titlecased or derived from an { id,
label } mapping (keep option value={v or id} unchanged) — locate the GROK_VOICES
usage in the realtime.tsx dropdown mapping and replace the displayed text {v}
with a titlecased version (or use v.label) while preserving the option key/value
as the id.
- Around line 226-305: Extract the repeated (provider === 'openai' || provider
=== 'grok') predicate into a local boolean (e.g., const isOpenAICompat =
provider === 'openai' || provider === 'grok') and replace all occurrences in
this file that gate OpenAI-compatible UI (the select for OUTPUT_MODE_OPTIONS /
setOutputMode, the temperature range using setTemperature, the semanticEagerness
select using setSemanticEagerness, the tools indicator block, and the image
upload block) with isOpenAICompat; ensure the new constant is declared in the
component scope above these uses so the renders (and calls to
getStatusColor/getModeIndicator) behave identically.
In `@packages/typescript/ai-grok/package.json`:
- Around line 15-24: The package exports currently only expose the package root,
so the new adapters and realtime barrel aren’t tree-shakeable; add explicit
subpath exports for "/adapters" (pointing to the built adapter barrel(s) that
export grokSpeech, grokTranscription, grokRealtimeToken) and "/realtime"
(pointing to the built realtime barrel that exports grokRealtime / the
src/realtime/index.ts barrel) in package.json exports so consumers can import
e.g. "ai-grok/adapters" or "ai-grok/realtime" directly; update the exports
object to map "./adapters": { "types": "./dist/esm/adapters/index.d.ts",
"import": "./dist/esm/adapters/index.js" } and "./realtime": { "types":
"./dist/esm/realtime/index.d.ts", "import": "./dist/esm/realtime/index.js" }
(ensure the built files exist or update build output paths accordingly).
In `@packages/typescript/ai-grok/src/adapters/transcription.ts`:
- Around line 61-66: The GrokTranscriptionAdapter constructor currently declares
parameters as (config, model) but calls super(model, config), which differs from
BaseTranscriptionAdapter's (model, config) signature; update
GrokTranscriptionAdapter to declare its constructor as (model: TModel, config:
GrokTranscriptionConfig) (matching BaseTranscriptionAdapter and
FalTranscriptionAdapter), keep the super(model, config) call, and then read
this.apiKey, this.baseURL and this.defaultHeaders from the config parameter;
also scan and update any callers/tests that instantiate new
GrokTranscriptionAdapter to pass (model, config) order if needed so adapters are
consistent with BaseTranscriptionAdapter and OpenAITranscriptionAdapter.
- Around line 102-110: The fetch to `${this.baseURL}/stt` needs timeout support:
add an optional timeout:number to GrokTranscriptionConfig, create an
AbortController in the method that calls fetch (use controller.signal in the
fetch options), start a timer (setTimeout) that calls controller.abort() after
timeout ms, and clear the timer once fetch resolves or throws; ensure the abort
is handled (AbortError) and that headers/body usage remains unchanged.
Reference: GrokTranscriptionConfig, the fetch call to `${this.baseURL}/stt`, and
TranscriptionOptions (not exposing a signal) when wiring the AbortController and
timeout.
In `@packages/typescript/ai-grok/src/realtime/token.ts`:
- Line 35: The adapter currently calls getGrokApiKeyFromEnv() at construction
time inside grokRealtimeToken(), causing startup-time throws and preventing
later env updates from being picked up; move the call into the generateToken()
function so the API key is read lazily at token-generation time, i.e., remove
the top-level/constructor invocation of getGrokApiKeyFromEnv() and instead call
it inside generateToken() (and handle missing key there with the existing
error/log flow) so the behavior matches request-time adapters.
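The lazy-read shape can be sketched like this — `getGrokApiKeyFromEnv` and the returned token string are stand-ins for the real helper and request flow:

```typescript
// Stand-in for the real env helper.
function getGrokApiKeyFromEnv(): string {
  const key = process.env.XAI_API_KEY
  if (!key) throw new Error('XAI_API_KEY is not set')
  return key
}

function grokRealtimeToken() {
  // No env access here — constructing the adapter never throws at startup,
  // and later env updates are picked up.
  return {
    async generateToken(): Promise<string> {
      const apiKey = getGrokApiKeyFromEnv() // read at token-generation time
      // ...the real implementation would POST to /v1/realtime/client_secrets
      return `ephemeral-token-for-${apiKey.slice(0, 4)}`
    },
  }
}
```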
In `@packages/typescript/ai-grok/src/realtime/types.ts`:
- Around line 79-111: GrokRealtimeSessionResponse is over-specified and
unvalidated; update token.ts to validate only the fields you actually read
(model, client_secret.value, client_secret.expires_at) by replacing the broad
cast with a Zod schema and parsing the fetched JSON through it (e.g., define a
z.object({ model: z.string(), client_secret: z.object({ value: z.string(),
expires_at: z.number() }) }) and use safe parsing), then use the parsed values
when accessing client_secret.value / client_secret.expires_at and model;
alternatively mark all other properties on GrokRealtimeSessionResponse as
optional/unknown if you choose not to add validation.
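The review proposes Zod; the same idea can be shown dependency-free — validate only the three fields `token.ts` actually reads and reject anything else. A hand-rolled sketch (the Zod version would be a `z.object` with the identical shape):

```typescript
interface ParsedSessionResponse {
  model: string
  client_secret: { value: string; expires_at: number }
}

// Validate only what we read; everything else in the response stays unknown.
function parseSessionResponse(json: unknown): ParsedSessionResponse {
  const obj = (typeof json === 'object' && json !== null ? json : {}) as Record<
    string,
    unknown
  >
  const secret = (typeof obj.client_secret === 'object' &&
  obj.client_secret !== null
    ? obj.client_secret
    : {}) as Record<string, unknown>
  const model = obj.model
  const value = secret.value
  const expiresAt = secret.expires_at
  if (
    typeof model !== 'string' ||
    typeof value !== 'string' ||
    typeof expiresAt !== 'number'
  ) {
    throw new Error('Unexpected /v1/realtime/client_secrets response shape')
  }
  return { model, client_secret: { value, expires_at: expiresAt } }
}
```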
In `@packages/typescript/ai-grok/src/utils/audio.ts`:
- Around line 20-31: In toAudioFile's string handling (the branch that checks
typeof audio === 'string'), ensure data: URLs are checked for a ';base64' marker
before using base64ToArrayBuffer/atob and if not base64 decode the payload via
decodeURIComponent; validate the non-data string input with a conservative
base64 regex before treating it as base64 and throw a clear error if it looks
like a URL/path/plain text (don’t call base64ToArrayBuffer blindly). Also
normalize MIME→extension using a small map (e.g., audio/mpeg -> mp3, audio/wav
or audio/x-wav -> wav, audio/ogg -> ogg) when deriving the filename, and use
base64ToArrayBuffer only for confirmed base64 payloads (update calls in
toAudioFile accordingly).
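The string-handling branch described above can be sketched as follows — `decodeAudioString` is an illustrative helper, not the real `toAudioFile`, and the MIME map is deliberately small:

```typescript
const MIME_TO_EXT: Record<string, string> = {
  'audio/mpeg': 'mp3',
  'audio/wav': 'wav',
  'audio/x-wav': 'wav',
  'audio/ogg': 'ogg',
}

// Conservative base64 check so URLs/paths/plain text are rejected loudly.
const BASE64_RE = /^[A-Za-z0-9+/]+={0,2}$/

function decodeAudioString(audio: string): { bytes: Uint8Array; ext: string } {
  if (audio.startsWith('data:')) {
    const [header, payload = ''] = audio.slice(5).split(',', 2)
    const [mime, ...params] = header.split(';')
    const ext = MIME_TO_EXT[mime] ?? 'bin'
    // Only decode as base64 when the ';base64' marker is actually present.
    const bytes = params.includes('base64')
      ? Uint8Array.from(Buffer.from(payload, 'base64'))
      : new TextEncoder().encode(decodeURIComponent(payload))
    return { bytes, ext }
  }
  if (!BASE64_RE.test(audio)) {
    throw new Error(
      'Expected base64 audio data; input looks like a URL/path/plain text',
    )
  }
  return { bytes: Uint8Array.from(Buffer.from(audio, 'base64')), ext: 'mp3' }
}
```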
In `@packages/typescript/ai-grok/tests/audio-adapters.test.ts`:
- Around line 1-254: Add a unit test that mocks globalThis.fetch to verify the
realtime token flow: call the function grokRealtimeToken (or the adapter method
that obtains realtime tokens) and assert fetch was POSTed to
'/v1/realtime/client_secrets' with the expected Authorization header and request
body, return a mocked JSON containing client_id, client_secret and expires_at
(unix seconds) and assert the function multiplies expires_at by 1000 to produce
the returned expiresAt value; ensure the test restores globalThis.fetch in
afterEach and checks the response mapping (client_id/client_secret and
expiresAt) to lock in the URL/body/expires_at math.
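Sketched framework-free (the real test would use vitest), the flow the prompt asks to lock in could look like this — `generateToken` and the response shape are illustrative stand-ins, and the `{ session: { model } }` body follows the later schema fix in this PR:

```typescript
interface TokenResult {
  clientId: string
  clientSecret: string
  expiresAt: number // milliseconds
}

// Stand-in for the adapter method under test.
async function generateToken(apiKey: string): Promise<TokenResult> {
  const res = await fetch('https://api.x.ai/v1/realtime/client_secrets', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({ session: { model: 'grok-voice-fast-1.0' } }),
  })
  const json = (await res.json()) as {
    client_id: string
    client_secret: string
    expires_at: number // unix seconds
  }
  return {
    clientId: json.client_id,
    clientSecret: json.client_secret,
    expiresAt: json.expires_at * 1000, // seconds -> ms: the math to lock in
  }
}

async function runTest(): Promise<void> {
  const originalFetch = globalThis.fetch
  let capturedUrl = ''
  let capturedAuth = ''
  globalThis.fetch = (async (url: any, init?: RequestInit) => {
    capturedUrl = String(url)
    capturedAuth = new Headers(init?.headers).get('authorization') ?? ''
    return new Response(
      JSON.stringify({ client_id: 'cid', client_secret: 'cs', expires_at: 1_700_000_000 }),
    )
  }) as typeof fetch
  try {
    const result = await generateToken('test-key')
    if (!capturedUrl.endsWith('/realtime/client_secrets')) throw new Error('wrong URL')
    if (capturedAuth !== 'Bearer test-key') throw new Error('wrong auth header')
    if (result.expiresAt !== 1_700_000_000_000) throw new Error('expiresAt math wrong')
  } finally {
    globalThis.fetch = originalFetch // mirrors the requested afterEach restore
  }
}
```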
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bd7948d1-7f46-4bfd-a320-09263fbbc664
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (26)
- .changeset/grok-audio-speech-support.md
- examples/ts-react-chat/src/lib/audio-providers.ts
- examples/ts-react-chat/src/lib/server-audio-adapters.ts
- examples/ts-react-chat/src/lib/server-fns.ts
- examples/ts-react-chat/src/lib/use-realtime.ts
- examples/ts-react-chat/src/routes/api.generate.speech.ts
- examples/ts-react-chat/src/routes/api.transcribe.ts
- examples/ts-react-chat/src/routes/realtime.tsx
- packages/typescript/ai-grok/package.json
- packages/typescript/ai-grok/src/adapters/transcription.ts
- packages/typescript/ai-grok/src/adapters/tts.ts
- packages/typescript/ai-grok/src/audio/transcription-provider-options.ts
- packages/typescript/ai-grok/src/audio/tts-provider-options.ts
- packages/typescript/ai-grok/src/index.ts
- packages/typescript/ai-grok/src/model-meta.ts
- packages/typescript/ai-grok/src/realtime/adapter.ts
- packages/typescript/ai-grok/src/realtime/index.ts
- packages/typescript/ai-grok/src/realtime/token.ts
- packages/typescript/ai-grok/src/realtime/types.ts
- packages/typescript/ai-grok/src/utils/audio.ts
- packages/typescript/ai-grok/src/utils/index.ts
- packages/typescript/ai-grok/tests/audio-adapters.test.ts
- testing/e2e/global-setup.ts
- testing/e2e/src/lib/feature-support.ts
- testing/e2e/src/lib/media-providers.ts
- testing/e2e/tests/test-matrix.ts
export type GrokRealtimeModel =
  | 'grok-voice-fast-1.0'
  | 'grok-voice-think-fast-1.0'
🛠️ Refactor suggestion | 🟠 Major
GrokRealtimeModel is duplicated between realtime/types.ts and model-meta.ts.
model-meta.ts already defines GROK_REALTIME_MODELS plus export type GrokRealtimeModel = (typeof GROK_REALTIME_MODELS)[number]. Redeclaring the same union here creates two independent sources of truth that will drift (e.g., if a new realtime model is added to GROK_REALTIME_MODELS but not here, consumers importing from ./types will get stale types). Prefer re-exporting from model-meta.ts:
♻️ Proposed refactor
-/**
- * Grok realtime (Voice Agent) model options.
- */
-export type GrokRealtimeModel =
- | 'grok-voice-fast-1.0'
- | 'grok-voice-think-fast-1.0'
+import type { GrokRealtimeModel } from '../model-meta'
+export type { GrokRealtimeModel }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-grok/src/realtime/types.ts` around lines 12 - 14, The
GrokRealtimeModel union is duplicated here causing a split source of truth;
remove the local declaration in realtime/types.ts and re-export the canonical
type from model-meta.ts (which defines GROK_REALTIME_MODELS and exports type
GrokRealtimeModel = (typeof GROK_REALTIME_MODELS)[number]) so consumers import
the single authoritative GrokRealtimeModel; update the export in
realtime/types.ts to re-export the type symbol GrokRealtimeModel from
model-meta.ts and delete the explicit union in this file.
CR Round 1 — cr-loop findings (default mode)
Scope: diff-scoped + load-bearing findings from a 7-agent unbiased review. Pre-existing code is recorded but deferred in default mode. Round 1: 23a / 12b / 0c findings → fixing bucket (a).
Bucket (a) — MANDATORY (will be addressed in fix commits)
ai-grok package
examples/ts-react-chat
testing/e2e
Bucket (b) — discretionary / speculative, not fixing this round: voice validation table; …
Bucket (c) — pre-existing / out of diff: none.
ai-grok realtime adapter: - cleanup pc/localStream/audioContext/dataChannel on connect() failure - dataChannelReady rejects on error/close/ICE-failed/timeout - RTCErrorEvent extracted properly instead of [object Event] - onmessage parse errors emit to consumers - input_audio_transcription no longer overrides caller on every update - response.done preserves idle mode after stopAudioCapture - setupOutputAudioAnalysis disposes prior audioElement, surfaces autoplay blocks - audioContext.resume failures emit error instead of silent swallow - currentMessageId reset on response.created (tool-only turns) - pc.onconnectionstatechange / oniceconnectionstatechange emit status_change - sendImage uses object image_url for OpenAI-realtime compatibility - unknown server events logged via default branch ai-grok TTS/STT: - getContentType returns audio/L16 for pcm (valid IANA MIME) - toAudioFile requires explicit audio_format for bare base64 - transcription option renamed format -> inverse_text_normalization ai-grok realtime token: - expires_at unit-safety guard (seconds vs ms) ai-grok types: - single source of truth for GrokRealtimeModel (model-meta) ai-grok tests: - cover aac/flac in pickCodec test - normalize header assertions via Headers() - add realtime-token unit-safety tests examples/ts-react-chat: - resolveModel fails loud via InvalidModelOverrideError (no silent fallback) - audio/speech/transcribe routes return 400 with structured body testing/e2e: - media-providers uses valid grok-2-image-1212 model - test-matrix imports from feature-support (dedupe)
CR Round 2 (confirmation) — new load-bearing findings
Round 2: 12a / 30+b / 2c → fixing bucket (a). These are mostly fix-regressions introduced by the Round 1 fix batch, plus a few genuine gaps agents newly surfaced.
Bucket (a) — MANDATORY
ai-grok/realtime
ai-grok/utils
ai-grok/adapters
examples/ts-react-chat
tests
Bucket (b) — deferred: voice-at-adapter-boundary validation, multichannel/channels pair validation, inverse_text_normalization/language coupling, silent opus/aac→mp3 log, Buffer.from browser compat, …
Bucket (c) — pre-existing
ai-grok realtime adapter: - shared teardownConnection() helper runs on SDP and post-SDP failure paths, disposing input/output analysers, audio element, mic, data channel, pc, and audio context - pre-open dataChannelReady rejection on failed/closed/disconnected pc states - pc.onconnectionstatechange is sole source of status_change (ice handler only rejects) - sendImage detects data: prefix (no more double-wrap) ai-grok audio utils: - malformed data: URI MIME parse throws instead of silently defaulting to audio/mpeg - empty/missing base64 payload throws - explicit audioFormat argument wins over URI-embedded MIME ai-grok TTS: - audio/L16 content-type includes required rate= parameter from modelOptions.sample_rate ai-grok tests: - realtime-token afterEach restores original XAI_API_KEY - new coverage for malformed data URIs, audioFormat precedence, and rate= in audio/L16 examples/ts-react-chat: - new UnknownProviderError typed class, 400 mapping in audio/speech/transcribe routes - server-fns ServerFnError wraps typed adapter errors with stable code/details
CR Round 3 (confirmation) — 4 new load-bearing findings
Round 3: 4a / 40+b / 3c → fixing bucket (a). Most agent findings this round are speculative xAI-docs-verification concerns, defensive-coding suggestions, or pre-existing patterns (bucket b/c). The 4 truly new bucket (a) items:
Bucket (b) — deferred
Extensive list: xAI-docs-verification items (…)
Bucket (c) — pre-existing
examples/ts-react-chat: - generateSpeechFn/transcribeFn/generateSpeechStreamFn/transcribeStreamFn now wrap adapter construction with rethrowAudioAdapterError for consistent typed-error responses - realtime image display guards against data:/http(s): double-wrap ai-grok realtime adapter: - teardownConnection drains pendingEvents; sendEvent logs and skips after teardown ai-grok TTS: - sample_rate always forwarded in output_format so body and contentType rate agree
CR Round 4 (confirmation) — 7 new load-bearing findings
Round 4: 7a / 40+b / 2c → fixing bucket (a). Most items are fix-regressions introduced by Rounds 2/3 plus one leak I missed.
Bucket (a) — MANDATORY
Bucket (b) — deferred: xAI docs-verification items (wire field names), diarization data drop, voice validation table, RFC-exact μ-law/A-law content-type, RTCPeerConnection iceServers, example routes err.message leakage, ElevenLabs …
Bucket (c) — pre-existing
ai-grok realtime adapter: - teardownConnection on getUserMedia failure (mic/pc/dataChannel leak on mic denial) - response.function_call_arguments.done drops event if call_id absent (no item_id fallback) - isTornDown set at top of teardown to guard handlers firing during close() awaits - setupInputAudioAnalysis/setupOutputAudioAnalysis skip when torn down - onconnectionstatechange no longer double-emits status_change during disconnect() ai-grok audio utils: - toAudioFile Blob/File branch prefers explicit audioFormat over Blob.type ai-grok TTS: - sample_rate forwarded only when caller provides one or codec is pcm (don't override server defaults for container codecs) Tests updated to cover new audioFormat precedence paths and adjusted sample_rate assertions.
CR Round 5 (confirmation) — 6 new load-bearing findings
Round 5: 6a / 30+b / 4c → fixing bucket (a).
Bucket (a) — MANDATORY
Bucket (b) — deferred: xAI wire-name verification items, …
Bucket (c) — pre-existing: openai runtime dep, missing …
ai-grok realtime adapter: - pc.connectionState=failed triggers automatic teardownConnection (mic/pc/audioContext no longer leak on spontaneous failure) - flushPendingEvents wraps send in try/catch; emits error on failure instead of hanging caller - handleServerEvent case 'error' validates shape of event.error; preserves code/type/param; safe against null/missing fields - autoplay and audioContext.resume failures log without emitting fatal error events (routine browser-policy outcomes) - dataChannel.onerror/onclose gated behind isTornDown to suppress post-disconnect error events examples/ts-react-chat: - realtime.tsx handleImageUpload validates FileReader result, file.type, and base64 extraction; surfaces errors visibly
CR Round 6 (confirmation) — 1 new load-bearing finding
Round 6: 1a / 40+b / 6c → fixing bucket (a).
Bucket (a) — MANDATORY
Bucket (b) — deferred
Most Round 6 findings recur from prior rounds and depend on xAI docs verification or fall into stylistic/design-call territory:
Bucket (c) — pre-existing
Missing …
…enames utils/audio.ts produced 'audio.basic' and 'audio.x-alaw-basic' for mulaw/alaw via the default-branch MIME split. Servers using filename as a format hint now see 'audio.mulaw' / 'audio.alaw', matching the reverse toMimeType mapping.
CR Round 7 (confirmation) — CONVERGED on bucket (a)
Round 7: 0a / 40+b / 9c. Every finding in this round repeats a previously-deferred bucket (b) item or a pre-existing bucket (c) concern. No new fix-regressions introduced by prior fix batches. Convergence trajectory: 23 → 12 → 4 → 7 → 6 → 1 → 0 bucket (a) findings over 7 rounds. Proceeding to final verification (CI on PR HEAD).
Bucket (b) items remain explicitly deferred in default mode — mostly xAI-docs verification items (…).
Commits in this CR cycle: …
CR-loop complete ✅
CI status on PR HEAD (47ed1fb): all gating checks green — E2E Tests, Test (PR), autofix, Preview, Provenance, Version Preview, Continuous Releases. (CodeRabbit is an external review bot, not a gating check.) Convergence: 0 bucket-(a) findings after a 7-agent unbiased confirmation round in Round 7. Trajectory: 23 → 12 → 4 → 7 → 6 → 1 → 0 bucket-(a) findings over 7 rounds. 6 fix batches:
Deferred (bucket b/c): mostly xAI-docs-verification items (wire field names for ITN/modalities/endpoint paths, …)
Actionable comments posted: 2
🧹 Nitpick comments (1)
examples/ts-react-chat/src/lib/server-fns.ts (1)
111-127: Optional: extract the `let adapter; try { ... } catch { rethrow }` dance into a helper.
The same 6-line pattern is duplicated across five handlers. A small helper like `buildOrRethrow(() => buildSpeechAdapter(...))` would deduplicate and make the intent (translate adapter errors) more obvious at call sites, without changing behavior.
Also applies to: 137-153, 164-180, 279-298, 308-327
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/ts-react-chat/src/lib/server-fns.ts` around lines 111 - 127, Extract the repeated try/catch that calls buildSpeechAdapter and rethrowAudioAdapterError into a small helper (e.g., buildOrRethrow) and replace each duplicated block with a single call to that helper; specifically, create a helper that accepts a zero-arg function (or provider string) to call buildSpeechAdapter, catches any error and calls rethrowAudioAdapterError(err), and returns the adapter so handlers like the anonymous function in .handler and the other handlers referencing buildSpeechAdapter (lines around start of generateSpeech calls) simply call buildOrRethrow(() => buildSpeechAdapter(data.provider ?? 'openai')) (or buildOrRethrow(data.provider)) to get the adapter and then pass it to generateSpeech.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/ts-react-chat/src/routes/api.generate.audio.ts`:
- Around line 73-92: The two 400 error branches return inconsistent payload keys
for the provider; in the InvalidModelOverrideError branch (the block that calls
jsonError with error 'invalid_model_override') change the returned key named
provider to providerId so it matches the UnknownProviderError branch and the
rest of the codebase (see rethrowAudioAdapterError in server-fns.ts and the
sibling route api.transcribe.ts). Locate the jsonError call inside the
InvalidModelOverrideError handling and rename the provider property to
providerId while keeping the same value (err.providerId or err.provider
depending on how the error exposes it), ensuring both branches emit providerId
consistently.
In `@packages/typescript/ai-grok/src/realtime/adapter.ts`:
- Around line 825-834: sendEvent currently checks dataChannel?.readyState ===
'open' then calls dataChannel.send without try/catch, which can throw (e.g.,
InvalidStateError) and leak into callers; change sendEvent so that the send is
wrapped in a try/catch similar to flushPendingEvents: call logger.provider
before sending, attempt JSON.stringify and dataChannel.send inside a try block,
and on any error catch it, log the error (use logger.provider or logger.error
with the same context including event type), and push the event onto
pendingEvents so it will be retried by flushPendingEvents; reference sendEvent,
flushPendingEvents, dataChannel, pendingEvents, and logger.provider in your
change.
---
Nitpick comments:
In `@examples/ts-react-chat/src/lib/server-fns.ts`:
- Around line 111-127: Extract the repeated try/catch that calls
buildSpeechAdapter and rethrowAudioAdapterError into a small helper (e.g.,
buildOrRethrow) and replace each duplicated block with a single call to that
helper; specifically, create a helper that accepts a zero-arg function (or
provider string) to call buildSpeechAdapter, catches any error and calls
rethrowAudioAdapterError(err), and returns the adapter so handlers like the
anonymous function in .handler and the other handlers referencing
buildSpeechAdapter (lines around start of generateSpeech calls) simply call
buildOrRethrow(() => buildSpeechAdapter(data.provider ?? 'openai')) (or
buildOrRethrow(data.provider)) to get the adapter and then pass it to
generateSpeech.
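The helper the nitpick proposes can be sketched as follows — `AudioAdapterError` and the rethrow mapping are stand-ins for the real typed-error code in server-fns.ts:

```typescript
// Stand-in for the typed error class used by the example app.
class AudioAdapterError extends Error {}

// Stand-in for the real helper that maps adapter errors to typed errors.
function rethrowAudioAdapterError(err: unknown): never {
  if (err instanceof AudioAdapterError) throw err
  throw new AudioAdapterError(err instanceof Error ? err.message : String(err))
}

function buildOrRethrow<T>(build: () => T): T {
  try {
    return build()
  } catch (err) {
    rethrowAudioAdapterError(err) // typed as `never`, so no fall-through
  }
}

// Call sites collapse from a 6-line try/catch to:
// const adapter = buildOrRethrow(() => buildSpeechAdapter(data.provider ?? 'openai'))
```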
📒 Files selected for processing (19)
- examples/ts-react-chat/src/lib/server-audio-adapters.ts
- examples/ts-react-chat/src/lib/server-fns.ts
- examples/ts-react-chat/src/routes/api.generate.audio.ts
- examples/ts-react-chat/src/routes/api.generate.speech.ts
- examples/ts-react-chat/src/routes/api.transcribe.ts
- examples/ts-react-chat/src/routes/realtime.tsx
- packages/typescript/ai-grok/src/adapters/transcription.ts
- packages/typescript/ai-grok/src/adapters/tts.ts
- packages/typescript/ai-grok/src/audio/transcription-provider-options.ts
- packages/typescript/ai-grok/src/realtime/adapter.ts
- packages/typescript/ai-grok/src/realtime/index.ts
- packages/typescript/ai-grok/src/realtime/token.ts
- packages/typescript/ai-grok/src/realtime/types.ts
- packages/typescript/ai-grok/src/utils/audio.ts
- packages/typescript/ai-grok/tests/audio-adapters.test.ts
- packages/typescript/ai-grok/tests/realtime-token.test.ts
- testing/e2e/src/lib/feature-support.ts
- testing/e2e/src/lib/media-providers.ts
- testing/e2e/tests/test-matrix.ts
✅ Files skipped from review due to trivial changes (3)
- packages/typescript/ai-grok/src/audio/transcription-provider-options.ts
- packages/typescript/ai-grok/src/realtime/index.ts
- packages/typescript/ai-grok/src/realtime/types.ts
🚧 Files skipped from review as they are similar to previous changes (8)
- examples/ts-react-chat/src/routes/api.transcribe.ts
- testing/e2e/src/lib/feature-support.ts
- testing/e2e/src/lib/media-providers.ts
- examples/ts-react-chat/src/lib/server-audio-adapters.ts
- packages/typescript/ai-grok/src/realtime/token.ts
- packages/typescript/ai-grok/src/adapters/transcription.ts
- examples/ts-react-chat/src/routes/realtime.tsx
- testing/e2e/tests/test-matrix.ts
if (err instanceof InvalidModelOverrideError) {
  return jsonError(400, {
    error: 'invalid_model_override',
    message: err.message,
    provider: err.providerId,
    requestedModel: err.requestedModel,
    allowedModels: err.allowedModels,
  })
}
// Defense-in-depth: the Zod enum schema above should already reject
// unknown providers, but surface a typed 400 here in case that
// validation drifts or is bypassed.
if (err instanceof UnknownProviderError) {
  return jsonError(400, {
    error: 'unknown_provider',
    message: err.message,
    providerId: err.providerId,
    allowedProviders: err.allowedProviders,
  })
}
Minor: payload field name differs between the two 400 cases (provider vs providerId).
The invalid_model_override branch emits the provider under the key provider, while the unknown_provider branch uses providerId. Consumers doing generic error rendering will need a special case for each code. server-fns.ts' rethrowAudioAdapterError uses providerId for both, and the sibling route api.transcribe.ts likely matches — consider aligning to providerId everywhere for a uniform client contract.
🔧 Proposed fix
if (err instanceof InvalidModelOverrideError) {
return jsonError(400, {
error: 'invalid_model_override',
message: err.message,
- provider: err.providerId,
+ providerId: err.providerId,
requestedModel: err.requestedModel,
allowedModels: err.allowedModels,
})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/ts-react-chat/src/routes/api.generate.audio.ts` around lines 73 -
92, The two 400 error branches return inconsistent payload keys for the
provider; in the InvalidModelOverrideError branch (the block that calls
jsonError with error 'invalid_model_override') change the returned key named
provider to providerId so it matches the UnknownProviderError branch and the
rest of the codebase (see rethrowAudioAdapterError in server-fns.ts and the
sibling route api.transcribe.ts). Locate the jsonError call inside the
InvalidModelOverrideError handling and rename the provider property to
providerId while keeping the same value (err.providerId or err.provider
depending on how the error exposes it), ensuring both branches emit providerId
consistently.
if (dataChannel?.readyState === 'open') {
  logger.provider(
    `provider=grok direction=out type=${(event.type as string | undefined) ?? '<unknown>'}`,
    { frame: event },
  )
  dataChannel.send(JSON.stringify(event))
} else {
  pendingEvents.push(event)
}
🧩 Analysis chain
🌐 Web query:
Can RTCDataChannel.send() throw even when readyState is 'open' due to a race with the state transitioning to 'closing'?
💡 Result:
Yes, RTCDataChannel.send can throw an InvalidStateError even when readyState is 'open' due to a race condition with the state transitioning to 'closing'.
Citations:
- 1: RTCDataChannel.send during 'closing' state w3c/webrtc-pc#1827
- 2: https://developer.mozilla.org/en-US/docs/Web/API/RTCDataChannel/readyState
- 3: DataChannel .send InvalidStateError node-webrtc/node-webrtc#417
- 4: DOMException : Failed to execute "send" on 'RTCDataChannel' webrtc/samples#990
- 5: InvalidStateError: RTCDataChannel.readyState is not 'open' feross/simple-peer#480
🏁 Script executed:
cd /repository && find . -type f -name "adapter.ts" | grep -i realtime | head -5
🏁 Script executed:
# Check the file and verify the code snippet and surrounding context
cat -n packages/typescript/ai-grok/src/realtime/adapter.ts | sed -n '820,870p'
🏁 Script executed:
# Search for flushPendingEvents implementation to verify it has try/catch
cat -n packages/typescript/ai-grok/src/realtime/adapter.ts | sed -n '837,861p'
🏁 Script executed:
# Check for error handling patterns in the file - look for emit('error') and logger usage
rg "emit\('error'" packages/typescript/ai-grok/src/realtime/adapter.ts -A 2 -B 2
🏁 Script executed:
# Check logger.errors usage pattern in this file
rg "logger\.errors" packages/typescript/ai-grok/src/realtime/adapter.ts -A 1 -B 1 | head -20
sendEvent doesn't guard against dataChannel.send throwing — inconsistent with flushPendingEvents.
RTCDataChannel.send can throw after the readyState === 'open' check — e.g., InvalidStateError if the channel transitions to closing/closed between the check and the call, or TypeError/NetworkError for buffer/network issues. flushPendingEvents (lines 837–861) wraps its send loop in try/catch; sendEvent does not. Since the public methods (sendText, sendImage, updateSession, interrupt) are synchronous fire-and-forget calls typically invoked from React event handlers, an uncaught throw here will propagate into consumer code and likely trip an error boundary.
Proposed fix
if (dataChannel?.readyState === 'open') {
logger.provider(
`provider=grok direction=out type=${(event.type as string | undefined) ?? '<unknown>'}`,
{ frame: event },
)
- dataChannel.send(JSON.stringify(event))
+ try {
+ dataChannel.send(JSON.stringify(event))
+ } catch (error) {
+ logger.errors('grok.realtime sendEvent failed', {
+ error,
+ source: 'grok.realtime',
+ eventType: (event.type as string | undefined) ?? '<unknown>',
+ })
+ emit('error', {
+ error: error instanceof Error ? error : new Error(String(error)),
+ })
+ }
} else {
pendingEvents.push(event)
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-grok/src/realtime/adapter.ts` around lines 825 - 834,
sendEvent currently checks dataChannel?.readyState === 'open' then calls
dataChannel.send without try/catch, which can throw (e.g., InvalidStateError)
and leak into callers; change sendEvent so that the send is wrapped in a
try/catch similar to flushPendingEvents: call logger.provider before sending,
attempt JSON.stringify and dataChannel.send inside a try block, and on any error
catch it, log the error (use logger.provider or logger.error with the same
context including event type), and push the event onto pendingEvents so it will
be retried by flushPendingEvents; reference sendEvent, flushPendingEvents,
dataChannel, pendingEvents, and logger.provider in your change.
…tion, fix xAI realtime event names
Refactors from user review:
adapters:
- tts.ts: extract buildTTSRequestBody helper (codec/sample_rate/voice default
resolution + body assembly). Export getContentType for consumer use.
- transcription.ts: extract buildTranscriptionFormData helper (wire-field
mapping including xAI's named 'format' boolean toggle for inverse text
normalization).
model-meta.ts: audio and realtime models now use the same
`as const satisfies ModelMeta` convention as chat/image models
(GROK_TTS, GROK_STT, GROK_VOICE_FAST_1, GROK_VOICE_THINK_FAST_1) with
input/output modalities and tool_calling / reasoning capabilities.
realtime adapter:
- Replace drive-by 'as' casts on untyped server events with runtime-checked
readers (readString, readObject, readObjectArray); malformed frames return
undefined instead of throwing a TypeError.
- Accept both legacy OpenAI-realtime event names and current xAI voice-agent
names per docs.x.ai: response.output_audio.* / response.output_audio_transcript.* /
response.text.* (plus existing response.audio.* / response.audio_transcript.* /
response.output_text.* aliases for compatibility).
- RealtimeServerError type replaces repeated 'as Error & { code?: string }' casts.
realtime token:
- Wrap request body with { session: { model } } per xAI /v1/realtime/client_secrets
schema (was bare { model } before).
Post-CR refactor — review feedback addressed
Pushed follow-up commits (183a7cc, aa53a9c) responding to maintainer review:
1. Helper extraction
2.
3.
4. xAI docs verification (resolved Round 7 bucket (b) items) ✅ Confirmed correct:
🔧 Fixed per docs:
5.
…time contract
The RealtimeAdapter / RealtimeConnection interfaces are duplicated locally in src/realtime/realtime-contract.ts. The adapter imports them from there instead of @tanstack/ai-client, so consumers of @tanstack/ai-grok no longer have to install @tanstack/ai-client unless they also want to construct a RealtimeClient from it (structural typing covers that use case). @tanstack/ai-client stays as a devDependency to run a type-level drift check (tests/realtime-contract.drift.test-d.ts) that asserts our inlined contract is bidirectionally assignable to the canonical one. If ai-client ever changes the interface, that file will fail to compile and we update both in lockstep. publint --strict: clean.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (4)
packages/typescript/ai-grok/src/adapters/tts.ts (1)
144-146: `bit_rate` silently dropped for non-mp3 codecs.
If a caller sets `modelOptions.bit_rate` alongside e.g. `codec: 'opus'` (or uses a format that maps to mp3 via `pickCodec`, but explicitly overrides codec to something else), the bit_rate is discarded without any signal. At minimum, consider a debug log via the existing `logger` so this isn't silently ignored; alternatively document that `bit_rate` only applies when the resolved codec is `mp3`.
🤖 Prompt for AI Agents
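A dependency-free sketch of the warn-and-drop behavior this nitpick suggests — `resolveBitRate` and the injected `warn` callback are illustrative names, not the real adapter code:

```typescript
type Codec = 'mp3' | 'opus' | 'aac' | 'flac' | 'pcm'

// Forward bit_rate only for mp3; otherwise warn instead of silently dropping.
function resolveBitRate(
  codec: Codec,
  bitRate: number | undefined,
  warn: (msg: string) => void,
): number | undefined {
  if (bitRate === undefined) return undefined
  if (codec !== 'mp3') {
    warn(
      `bit_rate=${bitRate} ignored: only applies when the resolved codec is mp3 (got ${codec})`,
    )
    return undefined
  }
  return bitRate
}
```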
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/adapters/tts.ts` around lines 144 - 146, The code currently drops modelOptions.bit_rate when the resolved codec (variable codec from pickCodec) is not 'mp3' with no signal; update the tts adapter to detect when modelOptions.bit_rate is set but codec !== 'mp3' and emit a clear logger warning/debug message (use the existing logger instance) stating that bit_rate is ignored for non-mp3 codecs and showing the provided bit_rate and resolved codec; alternatively, if you prefer stricter behavior, validate and throw an error instead—references: codec, modelOptions.bit_rate, outputFormat, and pickCodec so you can locate the logic to add the log/validation.
packages/typescript/ai-grok/src/realtime/adapter.ts (3)
100-107: `connectionMode !== 'webrtc'` becomes a runtime error with no type-level signal.

If a caller sets `connectionMode: 'websocket'`, the failure happens asynchronously inside `connect()` rather than at construction. If `GrokRealtimeOptions['connectionMode']` is typed to include `'websocket'`, that's a correctness hazard; if it's typed as `'webrtc'` only, this whole `else` branch is unreachable. Either:

- narrow the type to `'webrtc'` and drop the dead branch, or
- fail fast in `grokRealtime()` itself (synchronously) so the error surfaces at setup time rather than after the caller has already obtained a token.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/realtime/adapter.ts` around lines 100 - 107, The runtime-only error in connect() for non-'webrtc' connectionMode must be moved to a synchronous validation or removed by narrowing the option type: either update GrokRealtimeOptions['connectionMode'] to only allow 'webrtc' and delete the dead else branch in connect(), or add an immediate check in grokRealtime(options) that throws (and logs via logger.errors('grok.realtime fatal', ...)) when options.connectionMode !== 'webrtc' so callers fail fast before token acquisition; reference the connect() implementation and grokRealtime() entry point to locate and apply the change.
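The fail-fast option could be sketched like this. This is illustrative only: the option and function names mirror the review comment (`GrokRealtimeOptions`, `grokRealtime`), not the adapter's actual internals, and the type here is a stand-in.

```ts
// Sketch, not the adapter's real code: validate the mode synchronously at
// construction so an unsupported value surfaces before any token is fetched,
// rather than asynchronously inside connect().
type ConnectionMode = 'webrtc' | 'websocket'

interface GrokRealtimeOptionsSketch {
  connectionMode?: ConnectionMode
}

function validateConnectionMode(options: GrokRealtimeOptionsSketch): 'webrtc' {
  const mode = options.connectionMode ?? 'webrtc'
  if (mode !== 'webrtc') {
    throw new Error(
      `grokRealtime: connectionMode '${mode}' is not supported; only 'webrtc' is implemented`,
    )
  }
  return mode
}
```

Calling this at the top of `grokRealtime()` would make the error a setup-time failure instead of a late async rejection.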
693-711: Nitpick: innercontentshadows the outercontentarray.On line 695
contentis the array of content parts; on line 705 a newconst content = readString(part, 'text')shadows it within theelse ifbranch. Readers tracking the loop have to re-bind the name mentally. Renaming the inner one (e.g.textContentor justtext) would avoid the shadow:♻️ Suggested rename
- } else if (partType === 'text') { - const content = readString(part, 'text') - if (content) { - message.parts.push({ type: 'text', content }) - } - } + } else if (partType === 'text') { + const text = readString(part, 'text') + if (text) { + message.parts.push({ type: 'text', content: text }) + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/realtime/adapter.ts` around lines 693 - 711, The inner const named `content` in the `else if (partType === 'text')` branch shadows the outer `content` array (from `const content = readObjectArray(item, 'content')`) inside the loop `for (const part of content)`, which is confusing; rename the inner variable (e.g. to `text` or `textContent`) and update the subsequent `message.parts.push({ type: 'text', content })` to use the new name so the outer `content` array (and the `for (const part of content)` loop) is not shadowed; the changes affect the `readString(part, 'text')` usage and the `message.parts.push` call in this block.
119-121: Add ICE server configuration to `RTCPeerConnection` for restrictive networks.

`new RTCPeerConnection()` with no `iceServers` relies entirely on host candidates + server-advertised ICE endpoints. On restrictive networks (symmetric NAT, corporate firewalls), this fails silently as a 15s `dataChannelReady` timeout rather than a clear ICE gathering failure. Consider either:

- Default to a public STUN server (e.g., `stun:stun.l.google.com:19302`)
- Expose an `iceServers` option on `GrokRealtimeOptions` so consumers can supply TURN credentials when needed

Both approaches match WebRTC production patterns and align with OpenAI realtime adapter best practices.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/realtime/adapter.ts` around lines 119 - 121, The RTCPeerConnection is created without ICE servers which causes silent failures on restrictive networks; update the RTCPeerConnection instantiation in adapter.ts (where pc is declared) to accept an iceServers config by: add an iceServers option to GrokRealtimeOptions and use it when constructing pc (e.g., pass { iceServers: options.iceServers ?? [{ urls: 'stun:stun.l.google.com:19302' }] }) so consumers can supply TURN credentials while defaulting to a public STUN; ensure types/interfaces for GrokRealtimeOptions are updated to include iceServers and that any callers pass through or rely on the default.
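A minimal sketch of the suggested wiring, assuming a hypothetical `iceServers` option is added to `GrokRealtimeOptions` (the option name and the helper below are the review's suggestion, not existing adapter code):

```ts
// Sketch: resolve an RTCConfiguration-style object with a public-STUN
// default when the caller supplies nothing. The IceServer shape mirrors the
// standard RTCIceServer dictionary.
interface IceServer {
  urls: string | Array<string>
  username?: string
  credential?: string
}

function resolveRtcConfig(
  iceServers?: Array<IceServer>,
): { iceServers: Array<IceServer> } {
  return {
    iceServers: iceServers ?? [{ urls: 'stun:stun.l.google.com:19302' }],
  }
}

// At the construction site this would become something like:
//   const pc = new RTCPeerConnection(resolveRtcConfig(options.iceServers))
```

Defaulting to STUN keeps the zero-config path working behind most NATs, while the option lets consumers inject TURN credentials for symmetric-NAT environments.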
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-grok/src/adapters/tts.ts`:
- Around line 138-150: The contentType advertised for mulaw/alaw drops an
explicitly provided sample_rate, causing consumers to assume 8kHz; update the
logic that builds the Content-Type (the code using getContentType and the
sampleRateForContentType variable) so that when codec === 'mulaw' or 'alaw' and
callerSampleRate or modelOptions.sample_rate is set you include a rate=
parameter in the Content-Type (i.e., ensure sampleRateForContentType picks up
callerSampleRate and propagate it into the getContentType output) —
alternatively, if you prefer stricter behavior, validate in the same spot (using
codec, callerSampleRate, modelOptions.sample_rate, and outputFormat) and throw a
clear error when a non-default rate is passed for mulaw/alaw; implement one of
these fixes and apply the same change where similar logic exists (the other
block referenced around lines 205-211).
In `@packages/typescript/ai-grok/src/realtime/token.ts`:
- Around line 58-65: generateToken currently calls
fetch(GROK_REALTIME_CLIENT_SECRETS_URL, ...) without a timeout which can hang;
update generateToken in token.ts to create an AbortSignal with a default timeout
(e.g., 15s) and pass it to fetch via the signal option, exposing the timeout as
an optional parameter on generateToken (or options object) so callers can
override; prefer AbortSignal.timeout(...) when running on Node versions that
support it and fall back to a manual AbortController + setTimeout that calls
controller.abort(), and ensure you handle/propagate abort errors consistently
from generateToken.
- Around line 74-89: The code dereferences sessionData.client_secret.expires_at
and sessionData.model without runtime validation, causing opaque TypeErrors on
malformed responses; update the token parsing in token.ts to validate the
response shape (use a Zod schema for GrokRealtimeSessionResponse or at minimum
check sessionData.client_secret?.value and sessionData.client_secret?.expires_at
exist) before using them, normalize expires_at as before (treat >1e12 as ms),
fall back to the requested model if sessionData.model is missing, and throw a
clear, descriptive Error that includes HTTP context (status/code and body) when
validation fails so callers get an actionable error instead of a TypeError.
- Around line 50-57: The comment is accurate: the requestBody in the token
creation code never wires an expires_after override from the caller; add an
optional expiresAfterSeconds?: number property to GrokRealtimeTokenOptions and,
in the function that builds requestBody (the variable requestBody and the
session: { model } assignment), conditionally add requestBody.expires_after = {
seconds: expiresAfterSeconds } when that option is provided so the generated
body matches the xAI docs and the JSDoc comment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 10e51339-3598-4e76-a5d1-79eee5db4732
📒 Files selected for processing (6)
- packages/typescript/ai-grok/src/adapters/transcription.ts
- packages/typescript/ai-grok/src/adapters/tts.ts
- packages/typescript/ai-grok/src/model-meta.ts
- packages/typescript/ai-grok/src/realtime/adapter.ts
- packages/typescript/ai-grok/src/realtime/token.ts
- packages/typescript/ai-grok/tests/realtime-token.test.ts
✅ Files skipped from review due to trivial changes (1)
- packages/typescript/ai-grok/tests/realtime-token.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/typescript/ai-grok/src/model-meta.ts
```ts
// xAI docs (docs.x.ai/developers/rest-api-reference/inference/voice)
// specify the body as `{ expires_after: { seconds }, session: { model } }`.
// `expires_after` defaults to 600s on the server, so we only set it
// if the caller overrides; `session.model` is required to pin the
// voice agent model for this token.
const requestBody: Record<string, unknown> = {
  session: { model },
}
```
Comment references expires_after override that isn't wired up.
The JSDoc-style comment on lines 50–54 says "we only set it if the caller overrides," but requestBody is unconditionally { session: { model } } with no branch reading options.expires_after (or similar) from GrokRealtimeTokenOptions. Either drop the misleading sentence or wire the override through so callers can actually set it:
♻️ Suggested fix (wire through, if that's the intent)
```diff
-  const requestBody: Record<string, unknown> = {
-    session: { model },
-  }
+  const requestBody: Record<string, unknown> = {
+    session: { model },
+  }
+  if (options.expiresAfterSeconds !== undefined) {
+    requestBody.expires_after = { seconds: options.expiresAfterSeconds }
+  }
```

(Requires adding `expiresAfterSeconds?: number` to `GrokRealtimeTokenOptions`.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-grok/src/realtime/token.ts` around lines 50 - 57, The
comment is accurate: the requestBody in the token creation code never wires an
expires_after override from the caller; add an optional expiresAfterSeconds?:
number property to GrokRealtimeTokenOptions and, in the function that builds
requestBody (the variable requestBody and the session: { model } assignment),
conditionally add requestBody.expires_after = { seconds: expiresAfterSeconds }
when that option is provided so the generated body matches the xAI docs and the
JSDoc comment.
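The fix can be isolated as a pure body-builder, which also makes it trivially testable. `expiresAfterSeconds` is the hypothetical new option from the suggestion above, not a parameter the adapter currently accepts:

```ts
// Sketch of the suggested wiring: only include `expires_after` when the
// caller overrides the server-side 600s default, so the body matches the
// xAI docs' `{ expires_after: { seconds }, session: { model } }` shape.
function buildTokenRequestBody(
  model: string,
  expiresAfterSeconds?: number,
): Record<string, unknown> {
  const requestBody: Record<string, unknown> = {
    session: { model },
  }
  if (expiresAfterSeconds !== undefined) {
    requestBody.expires_after = { seconds: expiresAfterSeconds }
  }
  return requestBody
}
```

Keeping the default omission (rather than always sending 600) lets the server's own default evolve without the adapter pinning a stale value.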
Actionable comments posted: 3
🧹 Nitpick comments (2)
packages/typescript/ai-grok/src/realtime/adapter.ts (2)
157-167: A throwing handler aborts remaining subscribers in `emit`.

If any registered handler throws, the `for` loop unwinds and subsequent handlers for the same event are never invoked. For events like `error` / `status_change` / `mode_change`, one misbehaving subscriber can silently prevent the others (e.g. internal bookkeeping in `RealtimeClient`) from running. Wrapping each handler call in a `try/catch` that routes the error through `logger.errors` would contain the blast radius.

♻️ Proposed fix

```diff
 function emit<TEvent extends RealtimeEvent>(
   event: TEvent,
   payload: Parameters<RealtimeEventHandler<TEvent>>[0],
 ) {
   const handlers = eventHandlers.get(event)
   if (handlers) {
     for (const handler of handlers) {
-      handler(payload)
+      try {
+        handler(payload)
+      } catch (error) {
+        logger.errors('grok.realtime event handler threw', {
+          error,
+          source: 'grok.realtime',
+          event,
+        })
+      }
     }
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/realtime/adapter.ts` around lines 157 - 167, The emit function currently stops invoking remaining subscribers if one handler throws; update emit (function emit and its use of eventHandlers) to wrap each handler(payload) call in a try/catch so a thrown error does not abort the loop, and in the catch call logger.errors (include context such as event and handler identity if available) to report the error while continuing to invoke remaining handlers.
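The containment property is easy to demonstrate in isolation. This standalone model replaces the adapter's `logger.errors` with a plain callback; the names here are illustrative, not the adapter's:

```ts
// Minimal model of the proposed fix: one throwing subscriber must not
// prevent later subscribers from running. Errors are routed to a reporter
// callback where the real adapter would call logger.errors.
type Handler<T> = (payload: T) => void

function emitAll<T>(
  handlers: Iterable<Handler<T>>,
  payload: T,
  reportError: (error: unknown) => void,
) {
  for (const handler of handlers) {
    try {
      handler(payload)
    } catch (error) {
      reportError(error) // swallow after reporting; keep iterating
    }
  }
}
```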
1107-1170: Per-call `Uint8Array` allocations in the visualization getters.

`calculateLevel` and the four `get*Data` getters each allocate a fresh `Uint8Array` every time they're read. These are typically polled at the visualization render rate (~60 FPS) for the session duration, producing a steady stream of short-lived buffers and GC pressure. Consider hoisting per-analyser reusable `Uint8Array`s into the outer closure and just re-filling them on each call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/realtime/adapter.ts` around lines 1107 - 1170, The visualization getters allocate new Uint8Array each call; hoist reusable buffers into the getAudioVisualization closure (e.g., inputTimeDomainBuffer, outputTimeDomainBuffer, inputFreqBuffer, outputFreqBuffer) and reuse them in calculateLevel, getInputTimeDomainData, getOutputTimeDomainData, getInputFrequencyData, and getOutputFrequencyData; ensure you allocate or resize each buffer when its analyser appears or its fftSize/frequencyBinCount changes (compare analyser.fftSize / frequencyBinCount before reuse) and fill the existing buffers with analyser.getByteTimeDomainData / getByteFrequencyData instead of creating new Uint8Array instances per call.
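The reuse-with-resize-check idea can be sketched as a small helper; `fillData` stands in for `getByteTimeDomainData` / `getByteFrequencyData`, and the helper name is hypothetical. Note the trade-off flagged elsewhere in this review: a reused buffer is aliased across calls, so callers must copy if they retain it.

```ts
// Sketch: keep one Uint8Array per analyser and reallocate only when the
// required length changes (e.g. fftSize / frequencyBinCount changed),
// instead of allocating a fresh buffer on every ~60 FPS poll.
function makeBufferedReader(
  getLength: () => number,
  fillData: (target: Uint8Array) => void,
): () => Uint8Array {
  let buffer = new Uint8Array(0)
  return () => {
    const length = getLength()
    if (buffer.length !== length) {
      buffer = new Uint8Array(length) // resize only when the analyser changes
    }
    fillData(buffer)
    return buffer
  }
}
```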
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-grok/src/realtime/adapter.ts`:
- Around line 154-155: emptyFrequencyData and emptyTimeDomainData are
module-scoped shared Uint8Arrays that get returned directly from
getInputFrequencyData/getOutputFrequencyData/getInputTimeDomainData/getOutputTimeDomainData
and can be mutated by callers; change this by replacing the shared buffers with
factory functions (e.g. makeEmptyFrequencyData and makeEmptyTimeDomainData) that
return a new Uint8Array each call (zero-filled for frequency, 128-filled for
time-domain) and update the four getters (getInputFrequencyData,
getOutputFrequencyData, getInputTimeDomainData, getOutputTimeDomainData) to
return makeEmptyFrequencyData()/makeEmptyTimeDomainData() instead of the shared
arrays so callers receive fresh buffers.
- Around line 719-721: The case handling for 'conversation.item.truncated' only
emits 'interrupted' and leaves currentMode (usually 'speaking') unchanged;
mirror the logic in the interrupt() method by setting currentMode to 'listening'
and emitting a 'mode_change' event so listeners see the transition; update the
'conversation.item.truncated' branch (where emit('interrupted', { messageId:
currentMessageId ?? undefined }) is called) to also set currentMode =
'listening' and emit('mode_change', { mode: 'listening' }) (keeping
currentMessageId handling consistent), ensuring consistency with response.done
guards and avoiding a stale speaking state.
In `@packages/typescript/ai-grok/src/realtime/realtime-contract.ts`:
- Line 15: Update the JSDoc reference that currently points to
realtime-contract.drift-check.ts so it correctly names the actual drift-check
file present in this PR (realtime-contract.drift.test-d.ts); locate the comment
in realtime-contract.ts and replace the stale filename with the correct filename
to avoid dead references.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 81f896d1-85ab-4c4d-844b-3205706da0df
📒 Files selected for processing (4)
- packages/typescript/ai-grok/package.json
- packages/typescript/ai-grok/src/realtime/adapter.ts
- packages/typescript/ai-grok/src/realtime/realtime-contract.ts
- packages/typescript/ai-grok/tests/realtime-contract.drift.test-d.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/typescript/ai-grok/package.json
- tts.ts / transcription.ts: spread `defaultHeaders` BEFORE Authorization /
Content-Type so a caller-supplied header can't silently clobber the bearer
token or auth content-type.
- utils/audio.ts: new `arrayBufferToBase64` helper — Buffer fast path on
Node, chunked btoa fallback everywhere else (browser, Workers, Bun). Replaces
the Node-only `Buffer.from(arrayBuffer).toString('base64')` in tts.ts.
- transcription.ts: new `GrokTranscriptionWord` interface extends the core
`TranscriptionWord` with optional `confidence` and `speaker`. The adapter
now preserves both fields when xAI returns them, so callers that narrow via
`as Array<GrokTranscriptionWord>` get the diarization output they asked
for. Test expectations updated.
- tts.ts: mulaw/alaw `contentType` now includes a `;rate=…` parameter (as
`audio/PCMU` / `audio/PCMA` per RFC 3551) when the caller requests a
non-default sample rate, instead of the 8 kHz-implying `audio/basic` /
`audio/x-alaw-basic`.
- realtime/adapter.ts: `conversation.item.truncated` flips mode back to
`listening` so the visualiser can't get stuck on `speaking` after an
interrupt. `sendEvent` wraps `dataChannel.send` in try/catch consistent
with `flushPendingEvents`. The shared `emptyFrequencyData` /
`emptyTimeDomainData` buffers are gone — `getAudioVisualization`
returns a fresh `Uint8Array` per call so consumers can't mutate a
module-level instance.
- realtime/token.ts: adds a 15s `AbortController` timeout on the
client_secrets request so a dead endpoint can't hang the caller forever.
Validates `client_secret.value` / `expires_at` shape at runtime before
dereferencing so a malformed response throws a descriptive error.
- realtime/realtime-contract.ts: JSDoc filename ref updated.
- examples/ts-react-chat audio/speech/transcribe routes: unify the 400
unknown_provider payload under the `provider` key (was `providerId`)
to match the invalid_model_override branch and the request body.
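The cross-runtime base64 helper described above can be sketched like this. The function name matches the commit message; the exact implementation in `utils/audio.ts` may differ, so treat this as an illustration of the Buffer-fast-path-plus-chunked-btoa pattern:

```ts
// Sketch of a cross-runtime base64 helper: Buffer fast path on Node,
// chunked btoa fallback elsewhere (browser, Workers, Bun). Chunking avoids
// exceeding the argument limit of String.fromCharCode on large buffers.
function arrayBufferToBase64(arrayBuffer: ArrayBuffer): string {
  if (typeof Buffer !== 'undefined') {
    return Buffer.from(arrayBuffer).toString('base64')
  }
  const bytes = new Uint8Array(arrayBuffer)
  let binary = ''
  const chunkSize = 0x8000 // 32k chars per fromCharCode call
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize))
  }
  return btoa(binary)
}
```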
CodeRabbit feedback addressed (92695f6)

Went through every CodeRabbit inline comment on the PR. Most had already been fixed earlier in the cr-loop; the remaining real items are in this commit.

Fixes landed in this commit

- Auth / header precedence (tts.ts, transcription.ts)
- Cross-runtime base64 (utils/audio.ts, tts.ts)
- Diarization / confidence data preservation (transcription.ts)
- μ-law / A-law
- Realtime interruption state (adapter.ts)
- Audio-visualization buffer aliasing (adapter.ts)
- Realtime token robustness (token.ts)
- Error payload consistency (examples/ts-react-chat audio/speech/transcribe routes)
- Doc fix (realtime-contract.ts)

Items already resolved earlier in the cr-loop

All CodeRabbit inline items are now either fixed or part of an earlier commit on this branch. Typecheck clean, 53/53 tests pass.
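The 15s timeout wiring for the client_secrets request, described in the token.ts bullet above, can be sketched as follows. Function names (`timeoutSignal`, `fetchClientSecret`) are illustrative, not the adapter's real exports, and the response handling is simplified to focus on the abort plumbing:

```ts
// Sketch: prefer AbortSignal.timeout where the runtime provides it, fall
// back to a manual AbortController + setTimeout otherwise.
function timeoutSignal(ms: number): AbortSignal {
  if (typeof AbortSignal.timeout === 'function') {
    return AbortSignal.timeout(ms)
  }
  const controller = new AbortController()
  setTimeout(() => controller.abort(), ms)
  return controller.signal
}

async function fetchClientSecret(
  url: string,
  apiKey: string,
  body: unknown,
  timeoutMs = 15_000, // caller-overridable default
): Promise<unknown> {
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
    signal: timeoutSignal(timeoutMs),
  })
  if (!response.ok) {
    throw new Error(
      `client_secrets request failed: ${response.status} ${await response.text()}`,
    )
  }
  return response.json()
}
```

On timeout, `fetch` rejects with an abort error, which the caller can distinguish from HTTP-level failures.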
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/typescript/ai-grok/src/adapters/tts.ts (1)
134-154: Slightly misleading comment around `pcmDefault` — consider tightening.

The comment at lines 135–137 says `pcmDefault` is "used only when we MUST attach a rate to the contentType (pcm)", but it's also used as the fallback for `sampleRateForContentType` on non-pcm codecs (line 154), where it happens to coincidentally match xAI's documented default of 24000 Hz for all codecs. That coincidence is what makes the mulaw/alaw `audio/PCMU;rate=24000` labeling correct when the caller doesn't pin a rate — worth making explicit so the invariant doesn't silently break if xAI ever changes the default.

♻️ Suggested comment rewording

```diff
-  // Default sample rate documented in GrokTTSProviderOptions is 24000 Hz —
-  // used only when we MUST attach a rate to the contentType (pcm) and the
-  // caller didn't pick one.
+  // xAI's documented default sample rate across codecs is 24000 Hz. We use
+  // it in two places:
+  //   1. `output_format.sample_rate` for pcm (where the Content-Type carries
+  //      the rate via `audio/L16;rate=…`, so the label MUST match the bytes).
+  //   2. `sampleRateForContentType` for non-pcm codecs (mulaw/alaw) when the
+  //      caller didn't pin a rate — relies on the server actually producing
+  //      24000 Hz output when no rate is sent.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/typescript/ai-grok/src/adapters/tts.ts` around lines 134 - 154, Update the comment around the pcmDefault constant to state that pcmDefault (24000) is not only used when embedding a rate for pcm but also serves as the fallback used by sampleRateForContentType for constructing content-type labels for other codecs (e.g., mulaw/alaw), and explicitly note this relies on xAI's documented default of 24000 Hz so the invariant must be revisited if xAI changes defaults; refer to pcmDefault, sampleRateForContentType, callerSampleRate, codec and modelOptions when adding the clarification.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/typescript/ai-grok/src/realtime/adapter.ts`:
- Around line 1037-1114: In updateSession, remove the undocumented default
input_audio_transcription payload (the sessionUpdate.input_audio_transcription =
{ model: 'grok-stt' } branch) so we don't send an unsupported parameter to xAI;
preserve the caller override behavior that copies an explicit
providerOptions.inputAudioTranscription / input_audio_transcription value
(including null/false) into sessionUpdate.input_audio_transcription when
present, but if the caller did not provide it, do not inject any default (or set
it explicitly to null if you prefer explicit disablement); update references in
the function (providerOptions, callerTranscription, hasSentInitialSessionUpdate,
sendEvent) accordingly to only send sessionUpdate.input_audio_transcription when
callerTranscription is defined.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 500e52a3-14ab-4bc3-a4c1-98cc1c845c0d
📒 Files selected for processing (11)
- examples/ts-react-chat/src/routes/api.generate.audio.ts
- examples/ts-react-chat/src/routes/api.generate.speech.ts
- examples/ts-react-chat/src/routes/api.transcribe.ts
- packages/typescript/ai-grok/src/adapters/transcription.ts
- packages/typescript/ai-grok/src/adapters/tts.ts
- packages/typescript/ai-grok/src/realtime/adapter.ts
- packages/typescript/ai-grok/src/realtime/realtime-contract.ts
- packages/typescript/ai-grok/src/realtime/token.ts
- packages/typescript/ai-grok/src/utils/audio.ts
- packages/typescript/ai-grok/src/utils/index.ts
- packages/typescript/ai-grok/tests/audio-adapters.test.ts
✅ Files skipped from review due to trivial changes (1)
- packages/typescript/ai-grok/src/realtime/realtime-contract.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/typescript/ai-grok/src/utils/index.ts
- examples/ts-react-chat/src/routes/api.generate.audio.ts
```ts
updateSession(config: Partial<RealtimeSessionConfig>) {
  const sessionUpdate: Record<string, unknown> = {}

  if (config.instructions) {
    sessionUpdate.instructions = config.instructions
  }

  if (config.voice) {
    sessionUpdate.voice = config.voice
  }

  if (config.vadMode) {
    if (config.vadMode === 'semantic') {
      sessionUpdate.turn_detection = {
        type: 'semantic_vad',
        eagerness: config.semanticEagerness ?? 'medium',
      }
    } else if (config.vadMode === 'server') {
      sessionUpdate.turn_detection = {
        type: 'server_vad',
        threshold: config.vadConfig?.threshold ?? 0.5,
        prefix_padding_ms: config.vadConfig?.prefixPaddingMs ?? 300,
        silence_duration_ms: config.vadConfig?.silenceDurationMs ?? 500,
      }
    } else {
      sessionUpdate.turn_detection = null
    }
  }

  if (config.tools !== undefined) {
    sessionUpdate.tools = config.tools.map((t) => ({
      type: 'function',
      name: t.name,
      description: t.description,
      parameters: t.inputSchema ?? { type: 'object', properties: {} },
    }))
    sessionUpdate.tool_choice = 'auto'
  }

  if (config.outputModalities) {
    sessionUpdate.modalities = config.outputModalities
  }

  if (config.temperature !== undefined) {
    sessionUpdate.temperature = config.temperature
  }

  if (config.maxOutputTokens !== undefined) {
    sessionUpdate.max_response_output_tokens = config.maxOutputTokens
  }

  // Let callers forward an explicit `input_audio_transcription` value
  // through `providerOptions` — including `null` / `false` to disable
  // the feature. Only apply our `grok-stt` default on the first
  // session.update and only if the caller hasn't set it themselves.
  const providerOptions: Record<string, unknown> =
    config.providerOptions ?? {}
  const callerTranscription =
    'inputAudioTranscription' in providerOptions
      ? providerOptions.inputAudioTranscription
      : 'input_audio_transcription' in providerOptions
        ? providerOptions.input_audio_transcription
        : undefined
  if (callerTranscription !== undefined) {
    sessionUpdate.input_audio_transcription =
      callerTranscription === false ? null : callerTranscription
  } else if (!hasSentInitialSessionUpdate) {
    sessionUpdate.input_audio_transcription = { model: 'grok-stt' }
  }

  if (Object.keys(sessionUpdate).length > 0) {
    sendEvent({
      type: 'session.update',
      session: sessionUpdate,
    })
    hasSentInitialSessionUpdate = true
  }
},
```
🧩 Analysis chain
🌐 Web query:
xAI realtime session update input_audio_transcription model names grok-stt documentation
💡 Result:
In xAI's realtime Voice Agent API (wss://api.x.ai/v1/realtime), input audio transcription is provided automatically via the server event conversation.item.input_audio_transcription.completed, which delivers the transcript of user input audio. Official xAI documentation does not explicitly document a configurable input_audio_transcription parameter (like OpenAI's realtime API) in the session.update event for specifying STT model names. The listed session parameters include instructions, voice, tools, turn_detection, and audio formats, but no STT model field. The Voice Agent models are grok-voice-fast-1.0 (default, deprecated) and grok-voice-think-fast-1.0 (recommended), specified in the WebSocket URL query (e.g., ?model=grok-voice-think-fast-1.0). xAI offers a separate standalone Grok STT API (https://api.x.ai/v1/stt for batch, wss://api.x.ai/v1/stt for streaming). Third-party sources suggest using model=grok-stt for this API, but official docs do not list a model parameter for STT endpoints—instead, configuration uses query params like language, format, diarize. No explicit "grok-stt" model name appears in official Voice Agent or STT docs. LiveKit's xAI plugin sets a default input_audio_transcription but omits the model, implying built-in/default STT for the realtime session without user-configurable model selection.
Citations:
- 1: https://docs.x.ai/developers/rest-api-reference/inference/voice
- 2: https://docs.livekit.io/reference/python/v1/livekit/plugins/xai/realtime/realtime_model.html
- 3: https://docs.x.ai/developers/model-capabilities/audio/voice-agent
- 4: https://docs.x.ai/developers/model-capabilities/audio/speech-to-text
- 5: https://blog.laozhang.ai/en/posts/grok-speech-to-text-api
Remove or reconsider the input_audio_transcription session parameter — it is not documented in xAI's realtime API.
xAI's official realtime session.update documentation does not include an input_audio_transcription parameter. The realtime Voice Agent API provides transcript data automatically through server events (conversation.item.input_audio_transcription.completed). Sending { model: 'grok-stt' } as an undocumented parameter may be rejected by xAI's API, causing first-session connections to fail with a server error frame.
The caller-override mechanism (explicit null/false, object, or default) is sound in design, but the default value should either be removed entirely or set to null to avoid unsupported parameters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/typescript/ai-grok/src/realtime/adapter.ts` around lines 1037-1114: in updateSession, remove the undocumented default input_audio_transcription payload (the sessionUpdate.input_audio_transcription = { model: 'grok-stt' } branch) so we don't send an unsupported parameter to xAI; preserve the caller override behavior that copies an explicit providerOptions.inputAudioTranscription / input_audio_transcription value (including null/false) into sessionUpdate.input_audio_transcription when present, but if the caller did not provide it, do not inject any default (or set it explicitly to null if you prefer explicit disablement); update references in the function (providerOptions, callerTranscription, hasSentInitialSessionUpdate, sendEvent) accordingly to only send sessionUpdate.input_audio_transcription when callerTranscription is defined.
Summary
- Adds TTS (`grokSpeech`), STT (`grokTranscription`), and realtime voice agent (`grokRealtime`/`grokRealtimeToken`) adapters in `@tanstack/ai-grok`, with model metadata, provider options, and unit tests.
- Wires the `ts-react-chat` example across the realtime, speech, and transcription pages by extending the shared `SPEECH_PROVIDERS`/`TRANSCRIPTION_PROVIDERS` catalogs and the `buildSpeechAdapter`/`buildTranscriptionAdapter` factories.
- Adopts the `@tanstack/ai/adapter-internals` debug logger (`resolveDebugOption`, `InternalLogger`) across the new adapters so users can flip `debug: true`/`false`/`DebugConfig` the same way they do on other providers. Replaces the remaining `console.error`/`console.warn` calls in the realtime adapter with `logger.errors`.

Test plan
- `pnpm --filter @tanstack/ai-grok test:lib` — unit tests for TTS, transcription, and realtime
- `pnpm --filter @tanstack/ai-e2e test:e2e` — end-to-end media suite picks up Grok via the feature-support matrix
- `XAI_API_KEY=... pnpm --filter ts-react-chat dev`
  - `/realtime` — select "Grok Voice Agent", pick a voice (eve/ara/rex/sal/leo), hold a conversation; verify tools and image upload still work
  - `/generations/speech` — select Grok TTS, try each voice across streaming / direct / server-fn modes
  - `/generations/transcription` — select Grok STT, upload an audio file across all three modes
- Set `debug: true` on `grokRealtime`, `grokSpeech`, and `grokTranscription` calls; confirm request / provider / errors output appears with the `[tanstack-ai:...]` prefixes

🤖 Generated with Claude Code
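The debug option described above can be sketched like this — a hedged stand-in for `@tanstack/ai/adapter-internals`, not its actual implementation; `resolveDebug`, `makeLogger`, and the category names are illustrative:

```typescript
// Sketch: resolve a debug option (true / false / per-category config)
// into a full config, and build a logger whose error output is gated
// on it, so nothing leaks to the console when debugging is off.
type DebugConfig = { request?: boolean; provider?: boolean; errors?: boolean }
type DebugOption = boolean | DebugConfig | undefined

function resolveDebug(option: DebugOption): Required<DebugConfig> {
  if (option === true) return { request: true, provider: true, errors: true }
  if (option === false || option === undefined) {
    return { request: false, provider: false, errors: false }
  }
  // Partial config: unset categories default to off.
  return { request: false, provider: false, errors: false, ...option }
}

function makeLogger(option: DebugOption, scope: string) {
  const cfg = resolveDebug(option)
  return {
    errors: (...args: Array<unknown>): void => {
      if (cfg.errors) console.error(`[tanstack-ai:${scope}]`, ...args)
    },
  }
}
```

Under this shape, replacing a bare `console.error` with `makeLogger(debug, 'grok-realtime').errors(...)` keeps the message when debugging is on and silently drops it otherwise, matching the behavior the PR describes.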