fix: stabilize model-level TTS routing and add model TTS settings #1632
📝 Walkthrough
This PR implements unified TTS provider support as a first-class model capability. It introduces …
Changes: Unified TTS Provider
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add …
Actionable comments posted: 2
🧹 Nitpick comments (1)
resources/model-db/providers.json (1)
181588-181610: ⚡ Quick win
Normalize `gpt-4o-mini-tts` field shape with peer TTS entries.
Line 181588 onward omits `limit` and `open_weights`, while neighboring TTS entries include them. Keeping a consistent record shape reduces downstream null-guard branching.
Proposed diff
 {
   "id": "gpt-4o-mini-tts",
   "name": "gpt-4o-mini-tts",
   "display_name": "gpt-4o-mini-tts",
   "modalities": {
     "input": [
       "text"
     ],
     "output": [
       "audio"
     ]
   },
+  "limit": {
+    "context": 8192,
+    "output": 8192
+  },
   "temperature": false,
   "tool_call": false,
   "reasoning": {
     "supported": false
   },
   "attachment": false,
+  "open_weights": false,
   "cost": {
     "input": 0.48,
     "output": 0.96
   },
   "type": "tts"
 },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@resources/model-db/providers.json` around lines 181588 - 181610, the JSON entry for the TTS model with id/name "gpt-4o-mini-tts" is missing the standard fields "limit" and "open_weights" used by other TTS entries; update the object for "gpt-4o-mini-tts" to include the same "limit" structure (e.g., requests/characters/hour or whatever shape peers use) and the "open_weights" boolean/metadata key with the same defaults as neighboring TTS entries so the record shape matches peers and avoids extra null checks in consumers.
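A consumer-side check makes the null-guard point concrete. The sketch below is not code from this PR: it assumes a simplified, hypothetical `ModelRecord` shape that models only `id`, `type`, `limit`, and `open_weights`, and flags any `type: "tts"` record lacking the fields its peers carry. `example-peer-tts` is a made-up id.

```typescript
// Hypothetical, simplified shape of a model record in resources/model-db/providers.json.
// Only the fields discussed in this comment are modeled; real entries carry more.
interface ModelRecord {
  id: string
  type?: string
  limit?: { context: number; output: number }
  open_weights?: boolean
}

// Fields assumed to be required on every TTS entry, based on the peer entries above.
const REQUIRED_TTS_FIELDS = ['limit', 'open_weights'] as const

// Returns each TTS record that is missing one or more required fields.
// Records of other types are skipped.
function findIncompleteTtsRecords(
  models: ModelRecord[]
): Array<{ id: string; missing: string[] }> {
  const result: Array<{ id: string; missing: string[] }> = []
  for (const model of models) {
    if (model.type !== 'tts') continue
    const missing = REQUIRED_TTS_FIELDS.filter((field) => model[field] === undefined)
    if (missing.length > 0) result.push({ id: model.id, missing })
  }
  return result
}

// Usage with sample data (the second record is hypothetical).
const sample: ModelRecord[] = [
  { id: 'gpt-4o-mini-tts', type: 'tts' },
  { id: 'example-peer-tts', type: 'tts', limit: { context: 8192, output: 8192 }, open_weights: false },
]
console.log(findIncompleteTtsRecords(sample))
// → [{ id: 'gpt-4o-mini-tts', missing: ['limit', 'open_weights'] }]
```

Once every TTS entry has the same shape, a check like this returns an empty list and consumers can read `limit` without a fallback branch.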
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts`:
- Around line 313-318: The audio extraction is brittle: update the logic around
firstMessage and audioData to defensively check both paths and handle missing
fields—inspect firstMessage.audio?.data first, then iterate firstMessage.content
(if Array.isArray) and return the first item where item?.type === 'audio' &&
item?.audio?.data exists; if none found, also consider item?.audio (in case data
is top-level) or item?.data as fallback before throwing. Modify the code around
the audioData computation (referencing firstMessage and audioData) to perform
these guarded checks and only throw the Error('TTS response missing audio data
in choices[0].message.audio.data') after all fallbacks are exhausted.
In `@src/renderer/src/components/settings/TtsSettingsFields.vue`:
- Around line 54-61: The Label and Input in TtsSettingsFields.vue are using the
wrong i18n keys (settings.model.modelConfig.timeout.label and
settings.model.modelConfig.name.placeholder); replace them with dedicated
"instructions" keys (for example settings.model.instructions.label and
settings.model.instructions.placeholder) in the two t(...) calls used by Label
and the Input's placeholder, keep the binding to t and the `@update:model-value`
handler onInstructionsInput unchanged, and then add/update those new keys in the
i18n resource files so translations are available.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d3fd03ba-47f7-4fee-9ec4-b6c314ffb6a4
📒 Files selected for processing (18)
- docs/features/unified-tts-provider/plan.md
- docs/features/unified-tts-provider/spec.md
- docs/features/unified-tts-provider/tasks.md
- resources/model-db/providers.json
- src/main/presenter/configPresenter/index.ts
- src/main/presenter/configPresenter/modelConfig.ts
- src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts
- src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts
- src/renderer/settings/components/ProviderModelList.vue
- src/renderer/src/components/settings/ModelConfigDialog.vue
- src/renderer/src/components/settings/TtsSettingsFields.vue
- src/renderer/src/composables/useModelTypeDetection.ts
- src/shared/contracts/common.ts
- src/shared/contracts/domainSchemas.ts
- src/shared/model.ts
- src/shared/ttsSettings.ts
- src/shared/types/model-db.ts
- src/shared/types/presenters/legacy.presenters.d.ts
const audioData =
  firstMessage?.audio?.data ??
  firstMessage?.content?.find((item) => item?.type === 'audio')?.audio?.data
if (!audioData) {
  throw new Error('TTS response missing audio data in choices[0].message.audio.data')
}
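Pattern B audio extraction may miss edge cases.
The fallback path at line 315 searches `message.content` for an item with `type === 'audio'`, while the primary path checks `message.audio.data`. If the response structure varies, extraction fails in two ways. When `message.content` is a string (the usual chat-completions shape), optional chaining guards only against `null` and `undefined`, so `content?.find(...)` throws a `TypeError` instead of the descriptive error. When `content` is an array but no item has `type === 'audio'`, or the matching item has no `audio` field, `audioData` is `undefined` and the request fails.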
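A minimal reproduction of the string-content case, assuming `content` can arrive as a plain string at runtime even if it is typed as an array. `LooseMessage`, `ContentPart`, and `extractCurrent` are hypothetical names; the function body copies the expression from `runtime.ts` quoted above.

```typescript
// Hypothetical stand-in for choices[0].message; `content` is unknown because
// its runtime shape (string, array, or null) is exactly what is being tested.
type LooseMessage = { audio?: { data?: string }; content?: unknown }
type ContentPart = { type?: string; audio?: { data?: string } }

// Mirrors the current expression: optional chaining skips only null/undefined.
function extractCurrent(message: LooseMessage | undefined): string | undefined {
  const content = message?.content as ContentPart[] | null | undefined
  return message?.audio?.data ?? content?.find((item) => item?.type === 'audio')?.audio?.data
}

let thrown: unknown
try {
  // A string has no .find method, so `content?.find(...)` ends up calling undefined.
  extractCurrent({ content: 'plain text reply' })
} catch (error) {
  thrown = error
}
console.log(thrown instanceof TypeError) // true
```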
🛡️ Consider a more defensive extraction
-    const audioData =
-      firstMessage?.audio?.data ??
-      firstMessage?.content?.find((item) => item?.type === 'audio')?.audio?.data
+    const audioData =
+      firstMessage?.audio?.data ??
+      (Array.isArray(firstMessage?.content)
+        ? firstMessage.content.find((item) => item?.type === 'audio')?.audio?.data
+        : undefined)
     if (!audioData) {
       throw new Error('TTS response missing audio data in choices[0].message.audio.data')
     }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const audioData =
  firstMessage?.audio?.data ??
  (Array.isArray(firstMessage?.content)
    ? firstMessage.content.find((item) => item?.type === 'audio')?.audio?.data
    : undefined)
if (!audioData) {
  throw new Error('TTS response missing audio data in choices[0].message.audio.data')
}
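The AI-agent prompt for this comment asks for more fallbacks than the committable suggestion covers. A minimal sketch of one reading of that order, assuming the loose message shape below (`TtsMessage`, `AudioContentPart`, `audioFromPart`, and `extractTtsAudioData` are hypothetical names, not existing helpers in `runtime.ts`): check `message.audio.data`, scan `content` only when it is an array, and for each `type === 'audio'` part try `audio.data`, then a bare-string `audio`, then `data`.

```typescript
// Hypothetical loose typing for choices[0].message; only the fields read here are modeled.
interface AudioContentPart {
  type?: string
  audio?: { data?: string } | string
  data?: string
}

interface TtsMessage {
  audio?: { data?: string }
  content?: unknown // may be a string, an array of parts, or absent at runtime
}

// Pulls the payload out of one content part: a bare-string `audio`,
// otherwise `audio.data`, otherwise a top-level `data` field.
function audioFromPart(part: AudioContentPart): string | undefined {
  if (typeof part.audio === 'string') return part.audio
  return part.audio?.data ?? part.data
}

// Returns the audio payload, throwing only after every fallback is exhausted.
function extractTtsAudioData(message: TtsMessage | undefined): string {
  // Primary path: choices[0].message.audio.data
  const primary = message?.audio?.data
  if (primary) return primary

  // Fallback: scan content parts, but only when content really is an array.
  const content = message?.content
  if (Array.isArray(content)) {
    for (const raw of content) {
      if (typeof raw !== 'object' || raw === null) continue
      const part = raw as AudioContentPart
      if (part.type !== 'audio') continue
      const data = audioFromPart(part)
      if (data) return data
    }
  }

  throw new Error('TTS response missing audio data in choices[0].message.audio.data')
}

// Usage: an array-content response, then a string-content response.
console.log(extractTtsAudioData({ content: [{ type: 'audio', data: 'UklGRg==' }] })) // UklGRg==
try {
  extractTtsAudioData({ content: 'no audio here' })
} catch (error) {
  console.log((error as Error).message) // TTS response missing audio data in choices[0].message.audio.data
}
```

Throwing the same descriptive error in every miss case keeps the message stable regardless of which response shape the provider returned.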
<div class="space-y-2">
  <Label>{{ t('settings.model.modelConfig.timeout.label') }}</Label>
  <Input
    :model-value="tts.instructions ?? ''"
    :placeholder="t('settings.model.modelConfig.name.placeholder')"
    @update:model-value="onInstructionsInput"
  />
</div>
Incorrect i18n keys for instructions field.
Line 55 uses `settings.model.modelConfig.timeout.label` for the instructions label, and line 58 uses `settings.model.modelConfig.name.placeholder` for the instructions placeholder. These look like copy-paste errors: the field would display the timeout label and the model-name placeholder instead of text describing the instructions.
🔧 Proposed fix to use dedicated i18n keys
 <div class="space-y-2">
-  <Label>{{ t('settings.model.modelConfig.timeout.label') }}</Label>
+  <Label>{{ t('settings.provider.voiceai.instructions.label') }}</Label>
   <Input
     :model-value="tts.instructions ?? ''"
-    :placeholder="t('settings.model.modelConfig.name.placeholder')"
+    :placeholder="t('settings.provider.voiceai.instructions.placeholder')"
     @update:model-value="onInstructionsInput"
   />
 </div>
📝 Committable suggestion
<div class="space-y-2">
  <Label>{{ t('settings.provider.voiceai.instructions.label') }}</Label>
  <Input
    :model-value="tts.instructions ?? ''"
    :placeholder="t('settings.provider.voiceai.instructions.placeholder')"
    @update:model-value="onInstructionsInput"
  />
</div>
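The fix also requires the new keys to exist in every locale file. A small sketch of a key-presence check, assuming nested locale objects addressed by dot-separated paths. `enMessages`, `resolveMessage`, and the English strings are hypothetical, and whether `settings.provider.voiceai.*` already exists in the project's i18n resources has not been verified here.

```typescript
// Nested locale messages: each value is either a string or another level.
type Messages = { [key: string]: string | Messages }

// Hypothetical English fragment for the keys proposed above; wording is illustrative.
const enMessages: Messages = {
  settings: {
    provider: {
      voiceai: {
        instructions: {
          label: 'Instructions',
          placeholder: 'Optional guidance for voice, tone, or pacing',
        },
      },
    },
  },
}

// Walks a dot-separated key path; returns undefined if a segment is missing
// or the path ends on a nested object instead of a string.
function resolveMessage(messages: Messages, key: string): string | undefined {
  let node: string | Messages | undefined = messages
  for (const segment of key.split('.')) {
    if (node === undefined || typeof node === 'string') return undefined
    node = node[segment]
  }
  return typeof node === 'string' ? node : undefined
}

// Keys the instructions field uses after the fix.
const requiredKeys = [
  'settings.provider.voiceai.instructions.label',
  'settings.provider.voiceai.instructions.placeholder',
]
const missing = requiredKeys.filter((key) => resolveMessage(enMessages, key) === undefined)
console.log(missing) // []
```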
Summary
This PR finishes and hardens model-level TTS support across runtime, model DB inference, and settings UI.
Runtime / Routing
- … (`openai/tts-1`) are recognized
- … `modalities: ["text", "audio"]`
Model Type / Endpoint Inference
- … `tts` in provider-model type inference paths
- … `ModelType.TTS` to `ApiEndpointType.AudioSpeech`
Settings UI
- … (`voice`, `responseFormat`, `speed`, `instructions`)
- … `ModelConfigDialog`
- … (`chat`/`image`/`audio-speech`)
Model DB
- … `aihubmix` TTS models as `type: "tts"`
- … `xiaomimimo` provider entry (not present)
Verification
- `pnpm run typecheck`
- `pnpm run format`
- `pnpm run i18n`
- `pnpm run lint`
All checks pass locally.
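The routing and settings changes summarized above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `ModelType` and `ApiEndpointType` are reduced stand-ins with assumed string values (the real endpoint type also covers image), the audio-only-output heuristic is an assumption, and the request body uses the snake_case parameter names of OpenAI's `/v1/audio/speech` endpoint (`response_format`, `speed`, `instructions`).

```typescript
// Hypothetical stand-ins for the names in the summary; values are assumptions.
const ModelType = { Chat: 'chat', TTS: 'tts' } as const
type ModelType = (typeof ModelType)[keyof typeof ModelType]

const ApiEndpointType = { Chat: 'chat', AudioSpeech: 'audio-speech' } as const
type ApiEndpointType = (typeof ApiEndpointType)[keyof typeof ApiEndpointType]

interface ModelMeta {
  id: string // may be provider-prefixed, e.g. "openai/tts-1"
  type?: string
  modalities?: { input?: string[]; output?: string[] }
}

// An explicit `type: "tts"` wins; otherwise (assumption) audio-only output means TTS.
function inferModelType(model: ModelMeta): ModelType {
  if (model.type === 'tts') return ModelType.TTS
  const output = model.modalities?.output ?? []
  return output.length > 0 && output.every((m) => m === 'audio') ? ModelType.TTS : ModelType.Chat
}

// TTS routes to the audio-speech endpoint; every other type stays on chat in this sketch.
function endpointFor(type: ModelType): ApiEndpointType {
  return type === ModelType.TTS ? ApiEndpointType.AudioSpeech : ApiEndpointType.Chat
}

// The per-model settings fields listed under "Settings UI".
interface TtsSettings {
  voice: string
  responseFormat?: string
  speed?: number
  instructions?: string
}

// Builds an OpenAI-style /v1/audio/speech request body with snake_case field names.
function buildSpeechBody(model: string, input: string, tts: TtsSettings): Record<string, string | number> {
  const body: Record<string, string | number> = { model, input, voice: tts.voice }
  if (tts.responseFormat) body.response_format = tts.responseFormat
  // OpenAI documents speed as 0.25 to 4.0; clamp instead of letting the request fail.
  if (tts.speed !== undefined) body.speed = Math.min(4, Math.max(0.25, tts.speed))
  const instructions = tts.instructions?.trim()
  if (instructions) body.instructions = instructions
  return body
}

const meta: ModelMeta = { id: 'gpt-4o-mini-tts', modalities: { input: ['text'], output: ['audio'] } }
console.log(endpointFor(inferModelType(meta))) // audio-speech
console.log(buildSpeechBody(meta.id, 'Hello', { voice: 'alloy', speed: 5 }))
// { model: 'gpt-4o-mini-tts', input: 'Hello', voice: 'alloy', speed: 4 }
```

Keeping the camelCase settings object separate from the wire format means the settings UI never has to know provider-specific parameter names.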
Summary by CodeRabbit
Release Notes