- Implemented audio input handling in NewThreadPage.vue, allowing users to attach audio files and transcribe them.
- Enhanced ChatInputBox and ChatInputToolbar components to support voice input functionality.
- Added speech recognition capabilities using the useSpeechRecognition composable.
- Updated model capabilities to include support for audio input and speech recognition.
- Introduced new routes for audio transcription and updated related tests to ensure functionality.
- Added tests for audio input handling, speech recognition, and integration with the AI SDK.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (19)
✅ Files skipped from review due to trivial changes (5)
🚧 Files skipped from review as they are similar to previous changes (11)
📝 Walkthrough
Adds local microphone recording, WAV normalization and upload, a typed transcription route + renderer client, provider-native OpenAI-style transcription with a completion fallback, model capability gating for audio, UI recording controls and animations, audio-attachment filtering, i18n strings, and tests.
Changes
Voice Input Transcription
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/shared/modelConfigDefaults.ts (1)
8-18: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Wire `speechRecognition` into the fallback defaults.
`DEFAULT_MODEL_SPEECH_RECOGNITION` is defined but not included in `DEFAULT_MODEL_CAPABILITY_FALLBACKS`, so fallback-derived configs can still expose `speechRecognition` as `undefined` instead of `false`.
🔧 Proposed fix
     export const DEFAULT_MODEL_CAPABILITY_FALLBACKS = Object.freeze({
       contextLength: DEFAULT_MODEL_CONTEXT_LENGTH,
       maxTokens: DEFAULT_MODEL_MAX_TOKENS,
       vision: DEFAULT_MODEL_VISION,
    +  speechRecognition: DEFAULT_MODEL_SPEECH_RECOGNITION,
       functionCall: DEFAULT_MODEL_FUNCTION_CALL,
       reasoning: DEFAULT_MODEL_REASONING
     })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/shared/modelConfigDefaults.ts` around lines 8 - 18, DEFAULT_MODEL_SPEECH_RECOGNITION is defined but not included in DEFAULT_MODEL_CAPABILITY_FALLBACKS, leaving fallback-derived configs with speechRecognition undefined; update the DEFAULT_MODEL_CAPABILITY_FALLBACKS Object.freeze block to include a speechRecognition property set to DEFAULT_MODEL_SPEECH_RECOGNITION so fallback resolution returns false by default (modify the DEFAULT_MODEL_CAPABILITY_FALLBACKS object where contextLength, maxTokens, vision, functionCall, and reasoning are defined).
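To illustrate why the missing key matters, here is a minimal sketch of fallback resolution. The constant values, the `CapabilityFlags` type, and the `resolveCapability` helper are assumptions made for illustration; the project's real resolver may differ.

```typescript
// Assumed default values; the real constants live in src/shared/modelConfigDefaults.ts.
const DEFAULT_MODEL_VISION = false
const DEFAULT_MODEL_SPEECH_RECOGNITION = false

// Hypothetical, trimmed-down capability shape used only in this sketch.
type CapabilityFlags = { vision?: boolean; speechRecognition?: boolean }

// Before the fix: speechRecognition is absent from the fallback map.
const fallbacksBefore: CapabilityFlags = Object.freeze({
  vision: DEFAULT_MODEL_VISION
})

// After the fix: the fallback map carries an explicit default.
const fallbacksAfter: CapabilityFlags = Object.freeze({
  vision: DEFAULT_MODEL_VISION,
  speechRecognition: DEFAULT_MODEL_SPEECH_RECOGNITION
})

// Hypothetical resolver: prefer the explicit config value, else the fallback.
function resolveCapability(
  explicit: CapabilityFlags,
  fallbacks: CapabilityFlags,
  key: keyof CapabilityFlags
): boolean | undefined {
  return explicit[key] ?? fallbacks[key]
}

// With no explicit value, the "before" map yields undefined and the "after" map yields false.
const resolvedBefore = resolveCapability({}, fallbacksBefore, 'speechRecognition')
const resolvedAfter = resolveCapability({}, fallbacksAfter, 'speechRecognition')
```

Under these assumptions, callers that compare against `false` or rely on a strict boolean type would only behave consistently with the second map.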
🧹 Nitpick comments (1)
src/renderer/api/ModelClient.ts (1)
222-237: ⚡ Quick win
Consider simplifying the optional filename spread.
The conditional spread for the optional `filename` parameter works correctly but could be more concise using short-circuit evaluation.
♻️ Simpler optional parameter pattern
     const result = await bridge.invoke(modelsTranscribeAudioRoute.name, {
       providerId,
       modelId,
       audioBase64,
       mimeType,
    -  ...(filename ? { filename } : {})
    +  ...(filename && { filename })
     })
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/renderer/api/ModelClient.ts` around lines 222 - 237, The conditional spread for the optional filename in transcribeAudio is verbose; replace the ternary spread (...(filename ? { filename } : {})) with a concise short-circuit spread like ...(filename && { filename }) when building the payload for modelsTranscribeAudioRoute.name so the filename is included only when defined.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/presenter/llmProviderPresenter/aiSdk/messageMapper.ts`:
- Around line 151-157: The code computes a fallback mediaType but still passes
the original actualMediaType into buildAudioProviderOptions, causing the
generated data URL to use an unsupported MIME type; update the call to
buildAudioProviderOptions to pass the computed mediaType (not actualMediaType)
and adjust buildAudioProviderOptions' signature/usage to accept and use this
mediaType so the data:<mime>;base64,... string and provider options consistently
use OPENAI_COMPATIBLE_AUDIO_FALLBACK_MEDIA_TYPE when applicable.
In `@src/main/presenter/llmProviderPresenter/index.ts`:
- Around line 401-403: The check uses normalizedMimeType but may be comparing
with mixed-case input (e.g., "Audio/WAV"); ensure normalizedMimeType is created
by lowercasing the incoming mimeType (e.g., normalizedMimeType =
mimeType.toLowerCase()) before performing the startsWith('audio/') validation so
the condition accepts valid audio types regardless of case; update the logic
around the normalizedMimeType variable where the MIME is validated in the LLM
provider presenter (the block that throws Error(`Invalid audio MIME type for
transcription: ${mimeType}`)) to use the lowercased value.
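A minimal sketch of the case-insensitive check described in this comment. The helper name `assertAudioMimeType` is hypothetical; only the lowercasing step, the `startsWith('audio/')` test, and the error message come from the review.

```typescript
// Hypothetical helper; the real validation lives in the LLM provider presenter.
function assertAudioMimeType(mimeType: string): string {
  // Lowercase first so inputs such as "Audio/WAV" pass the prefix check.
  const normalizedMimeType = mimeType.toLowerCase()
  if (!normalizedMimeType.startsWith('audio/')) {
    throw new Error(`Invalid audio MIME type for transcription: ${mimeType}`)
  }
  return normalizedMimeType
}
```

Returning the normalized value lets later code reuse one canonical form, while the error message still reports the input exactly as it was received.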
In `@src/renderer/src/components/chat/composables/useAudioRecorder.ts`:
- Around line 88-95: mediaRecorder.onstop currently calls options.onRecorded
even after cleanupRecorder()/cleanup() has run, which can emit stale callbacks;
fix by adding a disposal guard: introduce and set a local boolean flag (e.g.,
isDisposed or isActiveRecording) that cleanupRecorder()/cleanup() flips to true,
and in mediaRecorder.onstop check the flag before invoking options.onRecorded
(or alternatively null out options.onRecorded in cleanup and guard for its
existence in onstop); reference mediaRecorder.onstop, cleanupRecorder(),
cleanup(), and options.onRecorded when implementing the guard.
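The disposal guard suggested here could look roughly like the sketch below. `RecorderLike` is a stand-in for the browser `MediaRecorder` that models only the `onstop` hook, and `createGuardedRecorder` is a hypothetical name; the actual composable has more state than this.

```typescript
// Minimal stand-in for MediaRecorder; only the onstop hook is modelled.
interface RecorderLike {
  onstop: (() => void) | null
}

// Hypothetical wiring: onRecorded fires on stop unless cleanup ran first.
function createGuardedRecorder(
  recorder: RecorderLike,
  onRecorded: () => void
): { cleanup: () => void } {
  let isDisposed = false

  recorder.onstop = () => {
    // Skip the callback once the recorder has been cleaned up.
    if (isDisposed) return
    onRecorded()
  }

  return {
    cleanup: () => {
      isDisposed = true
    }
  }
}
```

A boolean flag is used instead of nulling out the callback so that the guard stays local to the closure and does not mutate the caller's options object.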
In `@src/renderer/src/components/chat/composables/useSpeechRecognition.ts`:
- Around line 100-106: The switch in useSpeechRecognition.ts that inspects
error.message currently groups 'transcription-timeout' with decode failures;
update the switch in the function handling speech errors so that
'transcription-timeout' is not returned as 'decode-failed'—remove it from the
decode-failed case and return a distinct, appropriate error key (e.g.,
'transcription-timeout' or 'timeout') in the switch's default/own case so
callers of the composable (useSpeechRecognition) can distinguish timeout vs
decode failures.
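As a sketch of the separation requested above, the mapping below gives the timeout its own result key. The `SpeechErrorKey` union, the `'decode-error'` message, and the `'unknown'` fallback are assumptions for illustration; only `'transcription-timeout'` and `'decode-failed'` are named in the review.

```typescript
// Hypothetical set of error keys exposed to callers of the composable.
type SpeechErrorKey = 'decode-failed' | 'transcription-timeout' | 'unknown'

// Hypothetical mapper: timeout and decode failures no longer share a case.
function mapSpeechError(error: Error): SpeechErrorKey {
  switch (error.message) {
    case 'decode-failed':
    case 'decode-error': // assumed second decode-related message
      return 'decode-failed'
    case 'transcription-timeout':
      return 'transcription-timeout'
    default:
      return 'unknown'
  }
}
```

With a distinct key, the UI can choose a retry prompt for timeouts and a format warning for decode failures.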
In `@src/renderer/src/i18n/fa-IR/settings.json`:
- Line 427: In the fa-IR translation entry (the JSON description string
currently reading "مشخص میکند آیا این مدل ورود صوتی با تبدیل محلی گفتار به متن
را مجاز میکند یا نه."), replace the phrase "ورود صوتی" with "ورودی صوتی" so the
description reads with the correct wording for "voice input" and matches other
feature labels; update the value for the same "description" string in
settings.json accordingly.
In `@src/renderer/src/i18n/ru-RU/chat.json`:
- Line 71: The message for the JSON key "audioInputUnsupportedDescription"
contains the pseudo-plural "аудиовложение(й)"; replace that with a neutral,
natural Russian phrase that works for any count (for example, use "аудиофайлы"
or "аудиозаписи") so the string reads smoothly: "Модель {model} не поддерживает
аудиоввод. {count} аудиофайлы были пропущены." Update the value for
audioInputUnsupportedDescription accordingly.
In `@src/renderer/src/pages/NewThreadPage.vue`:
- Around line 899-918: prepareFilesForCurrentModel currently calls
resolveModel(), which can return a different model than the active ACP draft
target and thus mis-filter attachments; replace the resolveModel() call with the
actual submission target used for the ACP draft (either by reading the active
ACP draft target from state/context or by adding a parameter like
submissionTarget to prepareFilesForCurrentModel) and use that selection when
calling modelClient.getCapabilities(selection.providerId, selection.modelId);
keep the filtering logic with filterUnsupportedAudioAttachments and
notifyUnsupportedAudioAttachments the same and preserve the early-return when no
selection or files are empty.
In `@src/shared/contracts/routes/models.routes.ts`:
- Around line 192-198: The input schema's audioBase64 and mimeType are unbounded
causing potential oversized IPC payloads; update the zod schema in
models.routes.ts (the input: z.object({...}) block) to add max limits: add
.max(15_000_000) to audioBase64 to cap base64 audio around ~10MB binary and add
.max(255) to mimeType (and consider .max(255) on filename.optional() if
desired). Keep the field names providerId, modelId, audioBase64, mimeType, and
filename unchanged when applying these .max(...) constraints.
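The relationship between the suggested 15,000,000-character cap and the binary payload size can be checked with a short calculation. This sketch assumes standard padded base64 (4 output characters per 3 input bytes); the helper names are hypothetical.

```typescript
// Length of the padded base64 encoding for a given number of bytes.
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4
}

// Largest byte count that can fit in a base64 string of the given length.
function maxDecodedBytes(base64Chars: number): number {
  return Math.floor(base64Chars / 4) * 3
}

const tenMiB = 10 * 1024 * 1024 // 10,485,760 bytes

// A 10 MiB recording encodes to 13,981,016 characters, which fits under the cap.
const encodedTenMiB = base64Length(tenMiB)

// A 15,000,000-character string decodes to at most 11,250,000 bytes.
const capInBytes = maxDecodedBytes(15_000_000)
```

So the proposed limit leaves a little headroom above 10 MiB of raw audio rather than matching it exactly.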
In `@test/renderer/lib/audioInputSupport.test.ts`:
- Around line 12-22: Add a negative test that asserts isAudioAttachment returns
false for non-audio MIME types by creating a file via createFile with a
non-audio mimeType (e.g., application/pdf, text/plain, or image/png) and
expecting the result toBe(false); place the new test alongside the existing
'detects audio attachments from mime type' test and name it something like
'returns false for non-audio attachments' to clearly cover the negative case.
---
Outside diff comments:
In `@src/shared/modelConfigDefaults.ts`:
- Around line 8-18: DEFAULT_MODEL_SPEECH_RECOGNITION is defined but not included
in DEFAULT_MODEL_CAPABILITY_FALLBACKS, leaving fallback-derived configs with
speechRecognition undefined; update the DEFAULT_MODEL_CAPABILITY_FALLBACKS
Object.freeze block to include a speechRecognition property set to
DEFAULT_MODEL_SPEECH_RECOGNITION so fallback resolution returns false by default
(modify the DEFAULT_MODEL_CAPABILITY_FALLBACKS object where contextLength,
maxTokens, vision, functionCall, and reasoning are defined).
---
Nitpick comments:
In `@src/renderer/api/ModelClient.ts`:
- Around line 222-237: The conditional spread for the optional filename in
transcribeAudio is verbose; replace the ternary spread (...(filename ? {
filename } : {})) with a concise short-circuit spread like ...(filename && {
filename }) when building the payload for modelsTranscribeAudioRoute.name so the
filename is included only when defined.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f530e40c-8f2f-4439-8f75-4caa292bfd90
📒 Files selected for processing (68)
- docs/features/voice-input-transcription/plan.md
- docs/features/voice-input-transcription/spec.md
- docs/features/voice-input-transcription/tasks.md
- src/main/presenter/agentRuntimePresenter/compactionService.ts
- src/main/presenter/agentRuntimePresenter/contextBuilder.ts
- src/main/presenter/agentRuntimePresenter/index.ts
- src/main/presenter/configPresenter/index.ts
- src/main/presenter/configPresenter/modelCapabilities.ts
- src/main/presenter/configPresenter/modelConfig.ts
- src/main/presenter/llmProviderPresenter/aiSdk/messageMapper.ts
- src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts
- src/main/presenter/llmProviderPresenter/baseProvider.ts
- src/main/presenter/llmProviderPresenter/index.ts
- src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts
- src/main/routes/config/configRouteSupport.ts
- src/main/routes/models/modelRouteHandler.ts
- src/renderer/api/ModelClient.ts
- src/renderer/src/components/chat/ChatInputBox.vue
- src/renderer/src/components/chat/ChatInputToolbar.vue
- src/renderer/src/components/chat/ChatStatusBar.vue
- src/renderer/src/components/chat/composables/useAudioRecorder.ts
- src/renderer/src/components/chat/composables/useSpeechRecognition.ts
- src/renderer/src/components/chat/composables/useVoiceInput.ts
- src/renderer/src/components/settings/ModelConfigDialog.vue
- src/renderer/src/i18n/da-DK/chat.json
- src/renderer/src/i18n/da-DK/settings.json
- src/renderer/src/i18n/en-US/chat.json
- src/renderer/src/i18n/en-US/settings.json
- src/renderer/src/i18n/fa-IR/chat.json
- src/renderer/src/i18n/fa-IR/settings.json
- src/renderer/src/i18n/fr-FR/chat.json
- src/renderer/src/i18n/fr-FR/settings.json
- src/renderer/src/i18n/he-IL/chat.json
- src/renderer/src/i18n/he-IL/settings.json
- src/renderer/src/i18n/ja-JP/chat.json
- src/renderer/src/i18n/ja-JP/settings.json
- src/renderer/src/i18n/ko-KR/chat.json
- src/renderer/src/i18n/ko-KR/settings.json
- src/renderer/src/i18n/pt-BR/chat.json
- src/renderer/src/i18n/pt-BR/settings.json
- src/renderer/src/i18n/ru-RU/chat.json
- src/renderer/src/i18n/ru-RU/settings.json
- src/renderer/src/i18n/zh-CN/chat.json
- src/renderer/src/i18n/zh-CN/settings.json
- src/renderer/src/i18n/zh-HK/chat.json
- src/renderer/src/i18n/zh-HK/settings.json
- src/renderer/src/i18n/zh-TW/chat.json
- src/renderer/src/i18n/zh-TW/settings.json
- src/renderer/src/lib/audioInputSupport.ts
- src/renderer/src/pages/ChatPage.vue
- src/renderer/src/pages/NewThreadPage.vue
- src/shared/contracts/domainSchemas.ts
- src/shared/contracts/routes.ts
- src/shared/contracts/routes/models.routes.ts
- src/shared/modelConfigDefaults.ts
- src/shared/types/core/chat-message.ts
- src/shared/types/presenters/legacy.presenters.d.ts
- src/shared/types/presenters/llmprovider.presenter.d.ts
- test/main/presenter/agentRuntimePresenter/contextBuilder.test.ts
- test/main/presenter/llmProviderPresenter.test.ts
- test/main/presenter/llmProviderPresenter/aiSdkMessageMapper.test.ts
- test/main/presenter/llmProviderPresenter/openAICompatibleProvider.test.ts
- test/main/presenter/llmProviderPresenter/openAIResponsesProvider.test.ts
- test/renderer/components/ChatInputBox.test.ts
- test/renderer/components/ChatInputToolbar.test.ts
- test/renderer/components/ModelConfigDialog.test.ts
- test/renderer/composables/useSpeechRecognition.test.ts
- test/renderer/lib/audioInputSupport.test.ts
👮 Files not reviewed due to content moderation or server errors (4)
- src/main/presenter/agentRuntimePresenter/contextBuilder.ts
- test/main/presenter/agentRuntimePresenter/contextBuilder.test.ts
- src/main/presenter/agentRuntimePresenter/index.ts
- src/main/presenter/agentRuntimePresenter/compactionService.ts
20260514_173950.mp4
20260514_174252.mp4
Summary by CodeRabbit
New Features
Documentation
Localization