Pull request overview
This PR refactors the voice dictation feature to simplify the codebase before implementing local whisper support. The changes consolidate multiple scattered files and components into a unified architecture, replacing localStorage-based settings with backend configuration storage.
Changes:
- Consolidated audio transcription endpoints from `/audio/*` to `/dictation/*` with unified provider handling
- Replaced localStorage settings persistence with backend config API (`voice_dictation_provider`)
- Simplified UI components by merging 5 separate dictation components into a single `DictationSettings` component
- Removed the `VOICE_DICTATION_ELEVENLABS_ENABLED` feature flag
- Replaced complex waveform visualizer with simple "Recording..." indicator
- Removed 378 lines from `useWhisper.ts` and replaced with 249-line simplified `useAudioRecorder.ts`
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| ui/desktop/src/updates.ts | Removed ElevenLabs feature flag |
| ui/desktop/src/hooks/useWhisper.ts | Deleted, replaced by useAudioRecorder.ts |
| ui/desktop/src/hooks/useDictationSettings.ts | Deleted, settings now in backend config |
| ui/desktop/src/hooks/dictationConstants.ts | Deleted, constants moved to backend |
| ui/desktop/src/hooks/useAudioRecorder.ts | New simplified audio recording hook using backend API |
| ui/desktop/src/components/settings/dictation/* | Five component files deleted, replaced with single DictationSettings.tsx |
| ui/desktop/src/components/settings/dictation/DictationSettings.tsx | New unified settings component with provider selection and API key management |
| ui/desktop/src/components/settings/chat/ChatSettingsSection.tsx | Updated imports and reorganized settings layout |
| ui/desktop/src/components/ChatInput.tsx | Updated to use new hook, removed WaveformVisualizer, simplified recording UI |
| ui/desktop/src/api/* | Generated types and SDK methods for new dictation endpoints |
| ui/desktop/openapi.json | Added new dictation endpoint schemas |
| crates/goose-server/src/routes/mod.rs | Replaced audio module with dictation module |
| crates/goose-server/src/routes/dictation.rs | New unified backend handling both OpenAI and ElevenLabs with provider-agnostic API |
| crates/goose-server/src/routes/audio.rs | Deleted old implementation |
| crates/goose-server/src/openapi.rs | Updated OpenAPI schema definitions |

    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,

The sampleRate: 44100 constraint was removed from the audio configuration. While the browser will use its default sample rate, this could result in inconsistent audio quality across different browsers and devices. Consider keeping an explicit sample rate to ensure consistent transcription quality.

Suggested change:

    - autoGainControl: true,
    + autoGainControl: true,
    + sampleRate: 44100,

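To make the trade-off in this comment concrete, here is a minimal sketch of building the `audio` constraints with an optional explicit sample rate. The helper name `buildAudioConstraints` and the local `AudioTrackOptions` type are hypothetical and not part of this PR; the field names mirror the constraint keys shown in the diff.

```typescript
// Local stand-in for the browser's audio track constraint shape, so the sketch
// type-checks without the DOM lib. Field names mirror the keys in the diff above.
interface AudioTrackOptions {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
  sampleRate?: number;
}

// Hypothetical helper (not in the PR): builds the `audio` constraints and adds
// sampleRate only when the caller asks for one.
function buildAudioConstraints(sampleRate?: number): { audio: AudioTrackOptions } {
  const audio: AudioTrackOptions = {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  };
  if (sampleRate !== undefined) {
    // A bare value is a preference; the browser may still choose another rate.
    audio.sampleRate = sampleRate;
  }
  return { audio };
}
```

In a browser, the result could be passed to `navigator.mediaDevices.getUserMedia(buildAudioConstraints(44100))`. Because a bare `sampleRate` value is treated as a preference rather than a guarantee, a sketch like this reduces variation across devices but does not eliminate it.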

    pub fn routes(state: Arc<AppState>) -> Router {
        Router::new()
            .route("/dictation/transcribe", post(transcribe_dictation))
            .route("/dictation/config", get(get_dictation_config))
            .with_state(state)
    }

The old audio.rs tests were removed but no replacement tests were added for the new dictation.rs module. Consider adding tests to verify the transcription endpoints and provider configuration logic work correctly.

      <b>Chat</b> {'>'} <b>Voice Dictation.</b>
      </p>
    - ) : dictationSettings.provider === null ? (
    + ) : dictationProvider === null ? (

Variable 'dictationProvider' cannot be of type null, but it is compared to an expression of type null.

Suggested change:

    - ) : dictationProvider === null ? (
    + ) : dictationProvider === undefined ? (

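The warning implies that the variable's type excludes `null`, so a strict `=== null` check can never succeed when the value is actually `undefined`. As an alternative to switching the comparison, a loose `!= null` check covers both cases. The sketch below is illustrative only: the `Provider` string values and the `hasProvider` helper are assumptions, not code from this PR.

```typescript
// Assumed provider identifiers; the PR mentions OpenAI and ElevenLabs,
// but these exact string values are hypothetical.
type Provider = "openai" | "elevenlabs";

// Hypothetical type guard: treats both null and undefined as "no provider selected".
function hasProvider(provider: Provider | null | undefined): provider is Provider {
  // Loose inequality with null is false for exactly null and undefined.
  return provider != null;
}

// Strict equality distinguishes the two "empty" values, which is why the
// original `=== null` comparison misses an undefined provider.
const unset: Provider | undefined = undefined;
const strictNullCheckMatches = (unset as unknown) === null;
```

Here `strictNullCheckMatches` is `false`, while `hasProvider(unset)` is also `false`, meaning the guard correctly reports that no provider is set.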
we talked about cleaning this up @Abhijay007 & @lifeizhou-ap - here's my take. my next plan is to build support for local whisper.
lifeizhou-ap left a comment
The cleanup is great! It also uses `dictation/config` to check whether the API key is configured, which is much faster than rendering the config secret key!
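A rough sketch of the kind of check described here follows. The response schema of `/dictation/config` is not shown in this excerpt, so the `DictationConfig` shape, the `configured` field, and the `isProviderConfigured` helper are all invented for illustration and should not be read as the actual API.

```typescript
// Hypothetical response shape for GET /dictation/config.
// The real schema is not shown in this PR excerpt; this is an assumption.
interface ProviderStatus {
  configured: boolean;
}
type DictationConfig = Record<string, ProviderStatus | undefined>;

// Hypothetical helper: reports whether a provider has its API key configured,
// based only on the status flags and without reading the secret itself.
function isProviderConfigured(config: DictationConfig, provider: string): boolean {
  const status = config[provider];
  return status !== undefined && status.configured === true;
}
```

The point of such a design is that the UI only needs a boolean per provider, so it can avoid fetching or rendering the secret key value at all.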
Co-authored-by: Douwe Osinga <douwe@squareup.com>
Signed-off-by: Harrison <hcstebbins@gmail.com>
Summary
I was going to look at making whisper run locally, but noticed that the current flows are, eh, a mess, so cleaning that up first.