Add ITextToSpeechClient abstraction, middleware, and OpenAI implementation by stephentoub · Pull Request #7381 · dotnet/extensions

stephentoub · 2026-03-10T14:09:59Z

This PR adds a comprehensive ITextToSpeechClient abstraction to Microsoft.Extensions.AI, the inverse of the existing ISpeechToTextClient.

Abstractions (Microsoft.Extensions.AI.Abstractions)

ITextToSpeechClient - Core interface with GetAudioAsync/GetStreamingAudioAsync
DelegatingTextToSpeechClient - Pipeline delegation base class
TextToSpeechOptions - Options with ModelId, VoiceId, Language, AudioFormat, Speed, Pitch, Volume, and RawRepresentationFactory
TextToSpeechResponse / TextToSpeechResponseUpdate - Response types with DataContent binary audio
TextToSpeechResponseUpdateKind - Kind struct (SessionOpen, Error, AudioUpdating, AudioUpdated, SessionClose)
TextToSpeechResponseUpdateExtensions - Coalescing extensions
TextToSpeechClientMetadata / TextToSpeechClientExtensions - Metadata and GetService
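
The relationship between the core interface and the delegating base class can be sketched as follows. This is a minimal illustration using simplified stand-in types declared in a hypothetical `TtsSketch` namespace: the method signature, the `byte[]` return type, and the `DefaultVoiceClient` and `RecordingClient` classes are assumptions made for the example, not the definitions shipped in Microsoft.Extensions.AI.Abstractions.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

namespace TtsSketch
{
    // Stand-in types for illustration; the real ones in
    // Microsoft.Extensions.AI.Abstractions may be shaped differently.
    public sealed class TextToSpeechOptions
    {
        public string? ModelId { get; set; }
        public string? VoiceId { get; set; }
        public float? Speed { get; set; }
    }

    public interface ITextToSpeechClient
    {
        // Assumed signature: text in, encoded audio bytes out.
        Task<byte[]> GetAudioAsync(
            string text,
            TextToSpeechOptions? options = null,
            CancellationToken cancellationToken = default);
    }

    // Delegating base class: forwards every call to the inner client unless overridden.
    public class DelegatingTextToSpeechClient : ITextToSpeechClient
    {
        public DelegatingTextToSpeechClient(ITextToSpeechClient innerClient) =>
            InnerClient = innerClient ?? throw new ArgumentNullException(nameof(innerClient));

        protected ITextToSpeechClient InnerClient { get; }

        public virtual Task<byte[]> GetAudioAsync(
            string text,
            TextToSpeechOptions? options = null,
            CancellationToken cancellationToken = default) =>
            InnerClient.GetAudioAsync(text, options, cancellationToken);
    }

    // Hypothetical middleware: fills in a default voice when the caller did not choose one.
    public sealed class DefaultVoiceClient : DelegatingTextToSpeechClient
    {
        private readonly string _defaultVoice;

        public DefaultVoiceClient(ITextToSpeechClient innerClient, string defaultVoice)
            : base(innerClient) => _defaultVoice = defaultVoice;

        public override Task<byte[]> GetAudioAsync(
            string text,
            TextToSpeechOptions? options = null,
            CancellationToken cancellationToken = default)
        {
            // Copy rather than mutate the caller's options instance.
            var effective = new TextToSpeechOptions
            {
                ModelId = options?.ModelId,
                VoiceId = options?.VoiceId ?? _defaultVoice,
                Speed = options?.Speed,
            };
            return base.GetAudioAsync(text, effective, cancellationToken);
        }
    }

    // Test double that records the voice it was asked to use.
    public sealed class RecordingClient : ITextToSpeechClient
    {
        public string? LastVoice { get; private set; }

        public Task<byte[]> GetAudioAsync(
            string text,
            TextToSpeechOptions? options = null,
            CancellationToken cancellationToken = default)
        {
            LastVoice = options?.VoiceId;
            return Task.FromResult(Array.Empty<byte>());
        }
    }
}
```

Deriving from a delegating base means a middleware only overrides the calls it cares about; everything else passes through to the inner client unchanged.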

Middleware (Microsoft.Extensions.AI)

TextToSpeechClientBuilder - Builder pattern with DI integration
ConfigureOptionsTextToSpeechClient - Options configuration middleware
LoggingTextToSpeechClient - Logging middleware (skips binary audio serialization)
OpenTelemetryTextToSpeechClient - OpenTelemetry tracing middleware
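
A rough sketch of how a builder can compose such middleware into a pipeline is shown below. The types live in a hypothetical `TtsPipelineSketch` namespace and are simplified stand-ins: the single-argument `GetAudioAsync`, the `Use` overload, and the `LoggingClient` and `SilenceClient` classes are assumptions for illustration, and the real builder also integrates with dependency injection, which is omitted here. The logging layer records sizes only, in the spirit of skipping binary audio serialization.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace TtsPipelineSketch
{
    // Simplified stand-in for the client abstraction.
    public interface ITextToSpeechClient
    {
        Task<byte[]> GetAudioAsync(string text);
    }

    // Collects middleware factories and wraps them around an inner client.
    public sealed class TextToSpeechClientBuilder
    {
        private readonly ITextToSpeechClient _innerClient;
        private readonly List<Func<ITextToSpeechClient, ITextToSpeechClient>> _factories = new();

        public TextToSpeechClientBuilder(ITextToSpeechClient innerClient) =>
            _innerClient = innerClient ?? throw new ArgumentNullException(nameof(innerClient));

        public TextToSpeechClientBuilder Use(Func<ITextToSpeechClient, ITextToSpeechClient> factory)
        {
            _factories.Add(factory ?? throw new ArgumentNullException(nameof(factory)));
            return this;
        }

        public ITextToSpeechClient Build()
        {
            ITextToSpeechClient client = _innerClient;

            // Apply in reverse so the first Use call becomes the outermost layer.
            for (int i = _factories.Count - 1; i >= 0; i--)
            {
                client = _factories[i](client);
            }

            return client;
        }
    }

    // Logs request and response sizes, never the audio bytes themselves.
    public sealed class LoggingClient : ITextToSpeechClient
    {
        private readonly ITextToSpeechClient _inner;
        private readonly Action<string> _log;

        public LoggingClient(ITextToSpeechClient inner, Action<string> log)
        {
            _inner = inner;
            _log = log;
        }

        public async Task<byte[]> GetAudioAsync(string text)
        {
            _log($"request: {text.Length} chars");
            byte[] audio = await _inner.GetAudioAsync(text).ConfigureAwait(false);
            _log($"response: {audio.Length} bytes");
            return audio;
        }
    }

    // Fake provider: returns one zero byte per input character.
    public sealed class SilenceClient : ITextToSpeechClient
    {
        public Task<byte[]> GetAudioAsync(string text) =>
            Task.FromResult(new byte[text.Length]);
    }
}
```

With this ordering, a request passes through the middleware in the order the `Use` calls were written, and the response travels back in the opposite order.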

OpenAI Implementation (Microsoft.Extensions.AI.OpenAI)

OpenAITextToSpeechClient wrapping AudioClient.GenerateSpeechAsync, exposed via AsITextToSpeechClient()
Maps VoiceId, Speed, AudioFormat; supports RawRepresentationFactory for full SDK escape hatch
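
As an illustration of what such a mapping might involve, the sketch below translates a media type into one of the `response_format` names the OpenAI speech endpoint accepts (`mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`) and keeps a speed value within the endpoint's documented 0.25 to 4.0 range. The `OpenAISpeechMapping` class, the choice of media-type strings as input, and the clamping behavior are all assumptions for the example; the PR description does not spell out how `OpenAITextToSpeechClient` performs the mapping.

```csharp
#nullable enable
using System;

namespace TtsOpenAIMappingSketch
{
    // Hypothetical helper; not the mapping code from the PR.
    public static class OpenAISpeechMapping
    {
        // Translate a media type into an OpenAI response_format name.
        // Unknown or missing values fall back to mp3, the endpoint's default.
        public static string ToResponseFormat(string? mediaType)
        {
            switch (mediaType?.Trim().ToLowerInvariant())
            {
                case "audio/opus": return "opus";
                case "audio/aac": return "aac";
                case "audio/flac": return "flac";
                case "audio/wav": return "wav";
                case "audio/l16": return "pcm";
                case "audio/mpeg": return "mp3";
                default: return "mp3";
            }
        }

        // Keep the requested speed inside the range the endpoint documents.
        public static float ClampSpeed(float speed) =>
            Math.Min(4.0f, Math.Max(0.25f, speed));
    }
}
```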

Tests

9 abstraction tests, 5 middleware tests, 16 OpenAI unit tests, 5 integration tests
All pass across net462, net8.0, net9.0, net10.0

Microsoft Reviewers: Open in CodeFlow

Copilot

Pull request overview

This PR adds a comprehensive ITextToSpeechClient abstraction to Microsoft.Extensions.AI, mirroring the existing ISpeechToTextClient pattern. It introduces core abstraction types, middleware components (logging, OpenTelemetry, options configuration), DI integration, and an OpenAI implementation backed by AudioClient.GenerateSpeechAsync.

Changes:

New abstraction types in Microsoft.Extensions.AI.Abstractions: ITextToSpeechClient, DelegatingTextToSpeechClient, TextToSpeechOptions, TextToSpeechResponse, TextToSpeechResponseUpdate, TextToSpeechResponseUpdateKind, TextToSpeechClientMetadata, TextToSpeechClientExtensions, and TextToSpeechResponseUpdateExtensions.
Middleware pipeline in Microsoft.Extensions.AI: TextToSpeechClientBuilder, DI service collection extensions, ConfigureOptionsTextToSpeechClient, LoggingTextToSpeechClient, and OpenTelemetryTextToSpeechClient with builder extensions.
OpenAI implementation (OpenAITextToSpeechClient) wrapping AudioClient with voice/speed/format mapping and non-streaming fallback for GetStreamingAudioAsync.
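
The non-streaming fallback mentioned in the last item can be pictured roughly like this: the streaming method awaits the complete audio and then yields it as a single update, and a coalescing helper concatenates updates back into one buffer. The `AudioUpdate` type, the delegate-based signature, and the single `AudioUpdated` update are assumptions for illustration; which update kinds the actual implementation emits is not stated in this summary.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

namespace TtsStreamingSketch
{
    // Simplified stand-in for a streaming update carrying a chunk of audio.
    public sealed class AudioUpdate
    {
        public AudioUpdate(string kind, byte[] data)
        {
            Kind = kind;
            Data = data;
        }

        public string Kind { get; }
        public byte[] Data { get; }
    }

    public static class StreamingFallback
    {
        // Fallback: run the non-streaming call, then surface the result as one update.
        public static async IAsyncEnumerable<AudioUpdate> GetStreamingAudioAsync(
            Func<string, CancellationToken, Task<byte[]>> getAudioAsync,
            string text,
            [EnumeratorCancellation] CancellationToken cancellationToken = default)
        {
            byte[] audio = await getAudioAsync(text, cancellationToken).ConfigureAwait(false);
            yield return new AudioUpdate("AudioUpdated", audio);
        }

        // Coalesce: concatenate the audio from every update into a single buffer.
        public static async Task<byte[]> ToAudioAsync(
            IAsyncEnumerable<AudioUpdate> updates,
            CancellationToken cancellationToken = default)
        {
            using var buffer = new MemoryStream();
            await foreach (AudioUpdate update in updates.WithCancellation(cancellationToken).ConfigureAwait(false))
            {
                buffer.Write(update.Data, 0, update.Data.Length);
            }

            return buffer.ToArray();
        }
    }
}
```

A caller written against the streaming method keeps working with such a fallback, but receives no audio until the whole clip has been generated.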

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/.../TextToSpeech/ITextToSpeechClient.cs` | Core interface with `GetAudioAsync`, `GetStreamingAudioAsync`, `GetService` |
| `src/.../TextToSpeech/DelegatingTextToSpeechClient.cs` | Base class for pipeline delegation |
| `src/.../TextToSpeech/TextToSpeechOptions.cs` | Options with ModelId, VoiceId, Language, AudioFormat, Speed, Pitch, Volume, RawRepresentationFactory |
| `src/.../TextToSpeech/TextToSpeechResponse.cs` | Response type with Contents, Usage, ToTextToSpeechResponseUpdates |
| `src/.../TextToSpeech/TextToSpeechResponseUpdate.cs` | Streaming update type with Kind, Contents |
| `src/.../TextToSpeech/TextToSpeechResponseUpdateKind.cs` | Kind struct (SessionOpen, Error, AudioUpdating, AudioUpdated, SessionClose) |
| `src/.../TextToSpeech/TextToSpeechResponseUpdateExtensions.cs` | Coalescing extensions for updates → response |
| `src/.../TextToSpeech/TextToSpeechClientMetadata.cs` | Metadata with ProviderName, ProviderUri, DefaultModelId |
| `src/.../TextToSpeech/TextToSpeechClientExtensions.cs` | GetService extension |
| `src/.../AI/TextToSpeech/TextToSpeechClientBuilder.cs` | Builder pattern for pipelines |
| `src/.../AI/TextToSpeech/TextToSpeechClientBuilderServiceCollectionExtensions.cs` | DI registration (keyed + unkeyed) |
| `src/.../AI/TextToSpeech/TextToSpeechClientBuilderTextToSpeechClientExtensions.cs` | AsBuilder() extension |
| `src/.../AI/TextToSpeech/ConfigureOptionsTextToSpeechClient.cs` | Options configuration middleware |
| `src/.../AI/TextToSpeech/ConfigureOptionsTextToSpeechClientBuilderExtensions.cs` | ConfigureOptions builder extension |
| `src/.../AI/TextToSpeech/LoggingTextToSpeechClient.cs` | Logging middleware (skips binary audio serialization) |
| `src/.../AI/TextToSpeech/LoggingTextToSpeechClientBuilderExtensions.cs` | UseLogging builder extension |
| `src/.../AI/TextToSpeech/OpenTelemetryTextToSpeechClient.cs` | OpenTelemetry tracing/metrics middleware |
| `src/.../AI/TextToSpeech/OpenTelemetryTextToSpeechClientBuilderExtensions.cs` | UseOpenTelemetry builder extension |
| `src/.../AI/OpenTelemetryConsts.cs` | Adds `TypeAudio` constant |
| `src/.../AI.OpenAI/OpenAITextToSpeechClient.cs` | OpenAI implementation with format mapping |
| `src/.../AI.OpenAI/OpenAIClientExtensions.cs` | `AsITextToSpeechClient()` extension on AudioClient |
| `src/.../AI.Abstractions/Utilities/AIJsonUtilities.Defaults.cs` | Registers TTS types for source-gen JSON serialization |
| `src/Shared/DiagnosticIds/DiagnosticIds.cs` | Adds `AITextToSpeech` experimental diagnostic ID |
| `test/.../TestTextToSpeechClient.cs` | Test helper client |
| `test/.../TestJsonSerializerContext.cs` | Adds TTS types to test serialization context |
| `test/.../TextToSpeech/*Tests.cs` | Comprehensive tests for all new types |
| `test/.../OpenAITextToSpeechClientTests.cs` | Unit tests for OpenAI implementation |
| `test/.../OpenAITextToSpeechClientIntegrationTests.cs` | Integration tests |
| `test/.../TextToSpeechClientIntegrationTests.cs` | Base integration test class |

ericstj

This looks pretty good and is a rather straightforward adaptation of established patterns.

MikeAlhayek · 2026-03-13T18:38:38Z

@stephentoub I am wondering if we care to add another interface for generating supported voices? The TTS service should be able to support a specific voice.

stephentoub · 2026-03-13T19:31:04Z

> @stephentoub I am wondering if we care to add another interface for generating supported voices? The TTS service should be able to support a specific voice.

I don't understand the question. Can you elaborate?

MikeAlhayek · 2026-03-13T22:34:52Z

@stephentoub

As you know, TTS providers typically offer multiple voices that can be used when synthesizing audio. In order to request synthesis using a specific voice, we need a way to discover which voices a provider supports.

I recently worked on a similar initiative to add TTS support and introduced a Task<SpeechVoice[]> GetSpeechVoicesAsync() method that returns the list of supported voices (when the provider exposes them). Having such a method is very useful because it allows consumers to know which voices are available and request the synthesized utterance using one of those voices.

I’m wondering if we could add a similar method to ITextToSpeechClient to expose the list of available voices. I'm not entirely sure whether ITextToSpeechClient is the best place for this or if it should be introduced through a separate interface. That said, adding it directly to ITextToSpeechClient might still make sense since it is closely related to the same client and its capabilities.

I did this here:
https://github.com/CrestApps/CrestApps.OrchardCore/blob/eedb7ecf8550509fb8ec222856642faf44d580ab/src/Abstractions/CrestApps.OrchardCore.AI.Abstractions/IAIClientProvider.cs#L54-L68
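
One way the separate-interface option could look is sketched below: voice discovery is an optional capability that a consumer probes for through `GetService`, so providers without a voice-listing API simply do not implement it. Everything here is hypothetical and declared in a stand-in `TtsVoiceDiscoverySketch` namespace: the `ITextToSpeechVoiceProvider` interface, the `SpeechVoice` shape, the `GetService` signature, and the two fake clients are assumptions for illustration, not part of the merged PR or of the linked CrestApps code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

namespace TtsVoiceDiscoverySketch
{
    // Hypothetical description of a voice a provider offers.
    public sealed class SpeechVoice
    {
        public SpeechVoice(string id, string? language = null)
        {
            Id = id;
            Language = language;
        }

        public string Id { get; }
        public string? Language { get; }
    }

    // Hypothetical optional capability; only some providers would implement it.
    public interface ITextToSpeechVoiceProvider
    {
        Task<IReadOnlyList<SpeechVoice>> GetSpeechVoicesAsync(CancellationToken cancellationToken = default);
    }

    // Stand-in for the client abstraction, reduced to service lookup.
    public interface ITextToSpeechClient
    {
        object? GetService(Type serviceType, object? serviceKey = null);
    }

    public static class VoiceDiscovery
    {
        // Returns the provider's voices, or an empty list when discovery is unsupported.
        public static async Task<IReadOnlyList<SpeechVoice>> GetVoicesOrEmptyAsync(
            ITextToSpeechClient client,
            CancellationToken cancellationToken = default)
        {
            if (client.GetService(typeof(ITextToSpeechVoiceProvider)) is ITextToSpeechVoiceProvider provider)
            {
                return await provider.GetSpeechVoicesAsync(cancellationToken).ConfigureAwait(false);
            }

            return Array.Empty<SpeechVoice>();
        }
    }

    // Fake provider with no voice-listing API.
    public sealed class NoVoicesClient : ITextToSpeechClient
    {
        public object? GetService(Type serviceType, object? serviceKey = null) => null;
    }

    // Fake provider that exposes a fixed voice list.
    public sealed class FixedVoicesClient : ITextToSpeechClient, ITextToSpeechVoiceProvider
    {
        public object? GetService(Type serviceType, object? serviceKey = null) =>
            serviceKey is null && serviceType.IsInstanceOfType(this) ? this : null;

        public Task<IReadOnlyList<SpeechVoice>> GetSpeechVoicesAsync(CancellationToken cancellationToken = default) =>
            Task.FromResult<IReadOnlyList<SpeechVoice>>(new[] { new SpeechVoice("voice-a", "en-US") });
    }
}
```

Keeping discovery out of the core interface would avoid forcing every provider to implement a method it cannot support, at the cost of an extra lookup for consumers.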

stephentoub · 2026-03-14T01:36:29Z

Thanks. My concern is, at least as far as I'm aware, a bunch of services don't actually expose that, e.g. to my knowledge OpenAI doesn't provide an API for retrieving the list of voices, Gemini doesn't appear to, etc. Am I just missing it?

MikeAlhayek · 2026-03-14T04:40:02Z

I hear you. However, many providers do expose these voices, and in many real-world use cases they are crucial. I think all providers will probably expose multiple voices as demand grows. I use ElevenLabs and Azure TTS, and both have ways to retrieve them. Azure TTS does not support OpenAI, as far as I know, yet it exposes them.

Add ITextToSpeechClient and friends

8780366

stephentoub requested review from a team as code owners March 10, 2026 14:10

Copilot AI review requested due to automatic review settings March 10, 2026 14:10

github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Mar 10, 2026

Copilot started reviewing on behalf of stephentoub March 10, 2026 14:10

dotnet-policy-service bot assigned stephentoub Mar 10, 2026

Copilot AI reviewed Mar 10, 2026

ericstj approved these changes Mar 10, 2026

...ibraries/Microsoft.Extensions.AI.Abstractions/TextToSpeech/TextToSpeechResponseUpdateKind.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToSpeech/TextToSpeechResponse.cs

Refactor TextToSpeechResponse to use Contents property

50d5df8

stephentoub enabled auto-merge (squash) March 10, 2026 19:58

stephentoub merged commit cbde52a into dotnet:main Mar 10, 2026
6 checks passed

stephentoub deleted the tts branch March 13, 2026 19:30

stephentoub merged 2 commits into dotnet:main from stephentoub:tts