Caution: Review failed. Pull request was closed or merged during review.
Walkthrough
A new Text-to-Speech (TTS) provider implementation for the xAI plugin is added.
Sequence Diagram
sequenceDiagram
participant Client as Client
participant TTS as XAITTS
participant API as xAI TTS API
participant Decoder as Audio Decoder
Client->>TTS: stream_audio(text, **kwargs)
TTS->>TTS: Prepare request payload<br/>(voice, codec, sample_rate, etc.)
loop Retry Logic (up to 3 attempts)
TTS->>API: POST /tts/generate<br/>(with exponential backoff on 429/500/503)
API-->>TTS: Audio bytes (PCM/WAV/MP3/G.711)
end
TTS->>Decoder: _decode_audio(bytes, codec)
alt codec == "pcm"
Decoder->>Decoder: Pass through raw PCM
else codec == "mulaw" or "alaw"
Decoder->>Decoder: Numpy-based G.711 decoding
else codec == "wav"
Decoder->>Decoder: wave module unpacking
else codec == "mp3"
Decoder->>Decoder: pydub MP3 decoding
end
Decoder-->>TTS: PcmData
TTS-->>Client: PcmData | Iterator | AsyncIterator
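The decoder branches in the diagram can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the plugin's code: per the diagram the real G.711 path is numpy-based and the MP3 path uses pydub, while this sketch uses plain Python and omits A-law and MP3. The names `decode_audio` and `mulaw_byte_to_pcm16` are hypothetical.

```python
import io
import struct
import wave

MULAW_BIAS = 0x84  # bias used by G.711 mu-law companding


def mulaw_byte_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS
    return -magnitude if sign else magnitude


def decode_audio(data: bytes, codec: str) -> bytes:
    """Return little-endian 16-bit PCM bytes (hypothetical helper, not the plugin API)."""
    if codec == "pcm":
        # Raw PCM passes through untouched.
        return data
    if codec == "mulaw":
        samples = [mulaw_byte_to_pcm16(b) for b in data]
        return struct.pack(f"<{len(samples)}h", *samples)
    if codec == "wav":
        # The wave module strips the RIFF header and returns the frames.
        with wave.open(io.BytesIO(data), "rb") as wav_file:
            return wav_file.readframes(wav_file.getnframes())
    raise ValueError(f"unsupported codec in this sketch: {codec}")
```

A real implementation would also carry the sample rate and channel count alongside the bytes, which is presumably what `PcmData` does.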
Estimated code review effort: 4 (Complex), ~50 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
llm.py:
- Drop the `getattr` chain in `_extract_tool_calls_from_response` in favor of direct attribute access: `Response.tool_calls` always returns a list and the `ToolCall` proto fields (id, function, name, arguments) are guaranteed-present.
- Remove the dead `call_id` fallback (no such field on the proto) and narrow the bare `except Exception` to `json.JSONDecodeError`.
xai_realtime.py:
- Refresh the stale "as of xai-sdk 1.5.0" docstrings; verified xai-sdk 1.11 still ships no realtime/voice/websocket wrapper, so the raw `websockets` implementation remains correct.
- Bump cosmetic `DEFAULT_MODEL` from "grok-3-fast" to "grok-4" (per the existing docstring this value is informational and not sent to the API).
- Hoist `aiohttp` import to the module top.
- Narrow each `except Exception` to specific tuples: `OSError`/`WebSocketException`/`TimeoutError` for connect, `ConnectionError`/`WebSocketException` for send paths. The processing loop now swallows only transient transport/decode errors, so programming bugs surface instead of being silently logged.
- Pass `plugin_name="xai"` on the `LLMResponseChunkEvent` emitted from `_handle_response_done`, matching every other event in the file.
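A minimal sketch of the llm.py change described above, assuming the shape the commit message states (a list of tool calls with `id`, `function.name`, and `function.arguments` as a JSON string). The dataclasses are hypothetical stand-ins for the xai-sdk proto types, and falling back to an empty dict on malformed JSON is an assumption, not necessarily what the plugin does.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class FunctionCall:
    """Hypothetical stand-in for the proto's function field."""
    name: str
    arguments: str  # JSON-encoded argument object


@dataclass
class ToolCall:
    """Hypothetical stand-in for the ToolCall proto."""
    id: str
    function: FunctionCall


def extract_tool_calls(tool_calls: list[ToolCall]) -> list[dict[str, Any]]:
    """Read fields directly; only malformed JSON is treated as recoverable."""
    extracted = []
    for call in tool_calls:
        try:
            arguments = json.loads(call.function.arguments)
        except json.JSONDecodeError:
            # Assumption: malformed arguments degrade to an empty dict.
            arguments = {}
        extracted.append(
            {"id": call.id, "name": call.function.name, "arguments": arguments}
        )
    return extracted
```

Because only `json.JSONDecodeError` is caught, a typo such as a misspelled attribute raises `AttributeError` immediately instead of being swallowed.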
Session config now mirrors the livekit xAI plugin's known-working shape:
- Send model name ("grok-4-1-fast-non-reasoning") in session.update
- Include input_audio_transcription for server-side transcription
- Expand turn_detection from bare {"type":"server_vad"} to full
ServerVad config with threshold, padding, duration, and
interrupt_response=False (prevents mic echo from cancelling the
agent's own response mid-sentence)
- Fix DEFAULT_SAMPLE_RATE from 48000 to 24000 — xAI's realtime
model emits PCM at 24 kHz; tagging frames as 48 kHz caused 2x
playback speed and premature buffer drain
- Hoist aiohttp import to module level
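The session config and sample-rate fix above can be illustrated roughly as follows. This is a sketch under assumptions: the `session` nesting and the key names `prefix_padding_ms` and `silence_duration_ms` follow the common OpenAI-style realtime convention and are not confirmed by this PR, the numeric VAD values and the empty transcription config are placeholders, and `build_session_update` and `playback_seconds` are hypothetical helpers.

```python
DEFAULT_SAMPLE_RATE = 24000  # xAI realtime audio is 24 kHz per the commit message


def build_session_update(model: str = "grok-4-1-fast-non-reasoning") -> dict:
    """Build a session.update message (field layout is an assumption)."""
    return {
        "type": "session.update",
        "session": {
            "model": model,
            # Placeholder: the real transcription settings are not shown in the PR.
            "input_audio_transcription": {},
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,            # placeholder value
                "prefix_padding_ms": 300,    # placeholder value
                "silence_duration_ms": 200,  # placeholder value
                # Keeps mic echo from cancelling the agent mid-sentence.
                "interrupt_response": False,
            },
        },
    }


def playback_seconds(num_samples: int, tagged_rate: int) -> float:
    """How long a buffer plays when its frames are tagged with tagged_rate."""
    return num_samples / tagged_rate
```

One second of 24 kHz audio is 24000 samples. Tagged correctly it plays for `playback_seconds(24000, 24000)`, which is 1.0 s; tagged as 48 kHz it plays for 0.5 s, which is the 2x speed-up the fix removes.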
Diagnostics:
- Explicitly handle response.cancelled / response.cancel events with
a WARNING log so server-initiated interrupts are visible
- Bump unhandled event types from DEBUG to INFO for runtime visibility
- Handle rate_limits.updated at DEBUG
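The log-level routing in the diagnostics list could look something like the sketch below. The function names, the logger name, and the `handled_types` parameter are hypothetical; only the event type strings and the levels come from the commit message.

```python
import logging
from typing import Optional

logger = logging.getLogger("xai_realtime_sketch")  # hypothetical logger name

CANCEL_EVENTS = frozenset({"response.cancelled", "response.cancel"})


def diagnostic_level(
    event_type: str, handled_types: frozenset = frozenset()
) -> Optional[int]:
    """Pick the log level for an event; None means a real handler owns it."""
    if event_type in CANCEL_EVENTS:
        return logging.WARNING  # server-initiated interrupts must be visible
    if event_type == "rate_limits.updated":
        return logging.DEBUG
    if event_type in handled_types:
        return None
    return logging.INFO  # unhandled types, raised from DEBUG for visibility


def log_event(event: dict, handled_types: frozenset = frozenset()) -> Optional[int]:
    """Log the event at its diagnostic level and return that level."""
    event_type = event.get("type", "")
    level = diagnostic_level(event_type, handled_types)
    if level is not None:
        logger.log(level, "xAI realtime event: %s", event_type)
    return level
```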
on_audio_done was calling _audio_track.flush() unconditionally on every
RealtimeAudioOutputDoneEvent. flush() discards the buffer immediately
("Playback stops immediately"), which truncates audio when the server
finishes sending faster than real-time playback drains.
Now flush() is only called when event.interrupted is True (barge-in).
On normal completion the buffer drains naturally through playback.
This only affects realtime plugins that deliver audio via WebSocket
events through the _audio_track buffer (currently xAI). OpenAI and
Gemini use WebRTC where audio bypasses this buffer path entirely.
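The behavior change can be sketched as below. `AudioTrackSketch` and `AudioDoneEvent` are hypothetical stand-ins for the real `_audio_track` and `RealtimeAudioOutputDoneEvent`; only the `interrupted` flag and the discard-on-flush semantics are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class AudioTrackSketch:
    """Hypothetical buffer: flush() discards everything not yet played."""
    buffer: bytearray = field(default_factory=bytearray)

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)

    def flush(self) -> None:
        self.buffer.clear()


@dataclass
class AudioDoneEvent:
    """Hypothetical stand-in for RealtimeAudioOutputDoneEvent."""
    interrupted: bool = False


def on_audio_done(track: AudioTrackSketch, event: AudioDoneEvent) -> None:
    # Only a barge-in discards pending audio; on normal completion the
    # buffer is left alone so playback can drain it.
    if event.interrupted:
        track.flush()
```

With the old unconditional `flush()`, any audio still buffered when the server finished sending would have been dropped, which is the truncation described above.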
Grok TTS plugin support