5 changes: 5 additions & 0 deletions AGENTS.md
@@ -0,0 +1,5 @@
# xAI SDK implementation notes

- `GrokClient` is primarily backed by generated gRPC protocol clients, but text to speech uses xAI's documented REST/WebSocket voice endpoints because there are no generated TTS protocol types in `src\xAI.Protocol`.
- `AsITextToSpeechClient` returns an `ITextToSpeechClient` implementation that uses `POST /v1/tts` for unary audio and `wss://.../v1/tts` for streaming audio.
- TTS defaults follow xAI docs: voice `eve`, language `en` when omitted by `TextToSpeechOptions`, and MP3 output when no codec is specified.
72 changes: 72 additions & 0 deletions readme.md
@@ -45,6 +45,12 @@ var chat = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)

var images = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)
    .AsIImageGenerator("grok-imagine-image");

var speech = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)
    .AsITextToSpeechClient();

var audio = await speech.GetAudioAsync("Hello! Welcome to xAI text to speech.",
    new TextToSpeechOptions { VoiceId = "eve", Language = "en" });
```

## File Attachments
@@ -393,6 +399,72 @@ var editedImage = (UriContent)result.Contents.First();
Console.WriteLine($"Edited image URL: {editedImage.Uri}");
```

## Text to Speech

Grok supports text to speech via the `ITextToSpeechClient` abstraction from Microsoft.Extensions.AI.
Use `AsITextToSpeechClient` to get a TTS client:

```csharp
var speech = new GrokClient(Environment.GetEnvironmentVariable("XAI_API_KEY")!)
    .AsITextToSpeechClient();
```

### Unary (single response)

Call `GetAudioAsync` to synthesize speech in a single request. The result contains a `DataContent`
with the audio bytes and media type:

```csharp
var response = await speech.GetAudioAsync("Hello! Welcome to xAI text to speech.",
    new TextToSpeechOptions { VoiceId = "eve", Language = "en" });

var audio = (DataContent)response.Contents.First();
// audio.MediaType == "audio/mpeg" (MP3 by default)
await File.WriteAllBytesAsync("output.mp3", audio.Data.ToArray());
```

Available voices include `ara`, `eve`, `leo`, `rex`, and `sal`. When `VoiceId` or `Language` is not
specified, the client defaults to the `eve` voice and English (`en`).
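
The fallback behavior described above can be sketched as a small helper. This is an illustration only: `TtsDefaults` and `ResolvedTts` are hypothetical names that are not part of the xAI SDK, and the real client may resolve defaults differently. The fallback values (`eve`, `en`, MP3) are the ones stated in this document.

```csharp
#nullable enable
using System;

// Omitted values fall back to the documented defaults.
var resolved = TtsDefaults.Resolve(voiceId: null, language: null, audioFormat: null);
Console.WriteLine($"{resolved.VoiceId} {resolved.Language} {resolved.AudioFormat}"); // prints "eve en mp3"

// Explicit values are kept unchanged.
var custom = TtsDefaults.Resolve("rex", "en", "wav");
Console.WriteLine($"{custom.VoiceId} {custom.Language} {custom.AudioFormat}"); // prints "rex en wav"

// Hypothetical type, for illustration only (not an SDK type).
public sealed record ResolvedTts(string VoiceId, string Language, string AudioFormat);

// Hypothetical helper, for illustration only (not an SDK type).
public static class TtsDefaults
{
    public static ResolvedTts Resolve(string? voiceId, string? language, string? audioFormat) =>
        new(
            string.IsNullOrWhiteSpace(voiceId) ? "eve" : voiceId,          // default voice
            string.IsNullOrWhiteSpace(language) ? "en" : language,         // default language
            string.IsNullOrWhiteSpace(audioFormat) ? "mp3" : audioFormat); // default codec
}
```

Keeping the fallbacks in one place like this makes it easy to see which values a request will carry when the caller passes `null` or leaves `TextToSpeechOptions` unset.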

### Streaming

Call `GetStreamingAudioAsync` to receive audio chunks as they are generated, enabling low-latency
playback or progressive file writes:

```csharp
await using var fileStream = File.Create("output.mp3");

await foreach (var update in speech.GetStreamingAudioAsync("Hello from streaming TTS!",
    new TextToSpeechOptions { VoiceId = "eve", AudioFormat = "mp3" }))
{
    if (update.Kind == TextToSpeechResponseUpdateKind.AudioUpdating)
    {
        // Append each audio chunk to the file as soon as it arrives.
        foreach (var content in update.Contents.OfType<DataContent>())
            await fileStream.WriteAsync(content.Data);
    }
}
```

### Grok-Specific Options

Use `GrokTextToSpeechOptions` to control audio quality and streaming behavior beyond the base
`TextToSpeechOptions`:

```csharp
var options = new GrokTextToSpeechOptions
{
    VoiceId = "rex",
    Language = "en",
    AudioFormat = "mp3",           // mp3 | wav | pcm | mulaw | alaw
    SampleRate = 24000,            // Hz
    BitRate = 128000,              // bits per second (MP3 only)
    OptimizeStreamingLatency = 1,  // 0–4; higher trades quality for lower latency
    TextNormalization = true,      // expand abbreviations and numbers before synthesis
};

var response = await speech.GetAudioAsync("Streaming at 24 kHz, 128 kbps.", options);
```
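
The constraints noted in the comments above (the listed formats, the 0–4 latency range, and `BitRate` applying to MP3 only) could be checked before a request is sent. The following is a minimal sketch under stated assumptions: `TtsOptionChecks` is a hypothetical helper that is not part of the xAI SDK, the parameter types are guesses, and this document does not say whether the service rejects or silently ignores out-of-range values.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Collect problems before sending a request.
var problems = TtsOptionChecks.Validate(audioFormat: "wav", bitRate: 128000, optimizeStreamingLatency: 5);
foreach (var problem in problems)
    Console.WriteLine(problem);

// Hypothetical helper, for illustration only (not an SDK type).
public static class TtsOptionChecks
{
    // Formats listed in the options comment above.
    private static readonly HashSet<string> KnownFormats =
        new(StringComparer.OrdinalIgnoreCase) { "mp3", "wav", "pcm", "mulaw", "alaw" };

    public static IReadOnlyList<string> Validate(
        string? audioFormat, int? bitRate, int? optimizeStreamingLatency)
    {
        var problems = new List<string>();
        var format = audioFormat ?? "mp3"; // MP3 is the documented default

        if (!KnownFormats.Contains(format))
            problems.Add($"Unknown audio format '{format}'.");

        // The comment above marks BitRate as MP3 only; flagging it here is an assumption.
        if (bitRate is not null && !format.Equals("mp3", StringComparison.OrdinalIgnoreCase))
            problems.Add("BitRate applies to MP3 output only.");

        // A null value (option not set) never matches the relational pattern.
        if (optimizeStreamingLatency is < 0 or > 4)
            problems.Add("OptimizeStreamingLatency must be between 0 and 4.");

        return problems;
    }
}
```

With the sample arguments, the sketch reports two problems: a bit rate supplied for a non-MP3 format and a latency level outside 0–4.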

<!-- #xai -->

# xAI.Protocol
8 changes: 5 additions & 3 deletions src/xAI.Tests/ChatClientTests.cs
@@ -21,7 +21,9 @@ public async Task OpenAIInvokesTools()
{ "user", "What day is today?" },
};

-var chat = new OpenAIClient(Configuration["OPENAI_API_KEY"]!).GetChatClient("gpt-5.4").AsIChatClient()
+var chat = new OpenAIClient(Configuration["OPENAI_API_KEY"]!)
+    .GetChatClient("gpt-5.4")
+    .AsIChatClient()
     .AsBuilder()
     .UseFunctionInvocation(configure: client => client.MaximumIterationsPerRequest = 3)
     .UseLogging(output.AsLoggerFactory())
@@ -96,10 +98,10 @@ public async Task GrokInvokesTools()
[SecretsFact("XAI_API_KEY")]
public async Task GrokReasoningModelOutputsBothContentAndEncryptedReasoning()
{
-var grok = new GrokClient(Configuration["XAI_API_KEY"]!).AsIChatClient("grok-4-1-fast");
+var grok = new GrokClient(Configuration["XAI_API_KEY"]!).AsIChatClient("grok-4-1-fast-reasoning");

 var response = await grok.GetResponseAsync(
-    "What is 3 + 4? Respond with just the number.",
+    "What is 3 + 4? Respond with just the number, think about it really well.",
new GrokChatOptions
{
UseEncryptedContent = true