🤖 feat: voice input mode with OpenAI transcription #836
Merged
- Add Ctrl+D / Cmd+D keybind to toggle voice recording
- Add mic button next to send button (hidden on mobile or when no OpenAI key)
- Add command palette command for voice input toggle
- Record audio via MediaRecorder, transcribe via Whisper API
- Show recording indicator while capturing, spinner while transcribing
- Append dictated text to existing input
- Handle errors with user-friendly toast messages

Requires OpenAI API key to be configured in Settings > Providers.

_Generated with mux_

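For reviewers, the "append dictated text to existing input" behavior could look roughly like the sketch below. The helper name `appendTranscript` and the spacing rule (one space unless the input already ends in whitespace) are assumptions for illustration, not taken from the PR.

```typescript
// Hypothetical helper: name and spacing rule are assumptions, not the PR's code.
function appendTranscript(existing: string, transcript: string): string {
  const addition = transcript.trim();
  // Nothing usable was dictated: leave the input untouched.
  if (addition === "") return existing;
  // Empty input: the transcript becomes the whole input.
  if (existing === "") return addition;
  // Add a single space unless the input already ends with whitespace.
  const separator = /\s$/.test(existing) ? "" : " ";
  return existing + separator + addition;
}
```

The point is only that dictation extends what the user already typed instead of replacing it.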
- Show overlay during both recording AND transcribing states (prevents jarring snap-back to empty textarea when waiting for API)
- Change colors from red (error-like) to blue (recording) and amber (transcribing)
- Disable overlay button while transcribing to prevent double-clicks

- Space key during recording: stops and sends immediately
- Ctrl+D/Cmd+D: stops recording, keeps text in input (existing)
- Reduced border from border-2 to border (less crowded near controls)
- Updated overlay text to show both shortcuts

- Auto-focus the recording overlay button so spacebar works
- Add mb-1 margin to prevent border touching controls below

- Add onSend callback to useVoiceInput options
- Add stopListeningAndSend method that sets a flag before stopping
- When transcription completes, if flag was set, call onSend
- Use setTimeout(0) to let React flush state update before sending
- Simplifies ChatInput code by moving logic into the hook

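A minimal sketch of the flag-then-defer pattern described in this commit, written without React so it can run standalone. The class name `SendOnStop`, the `handleTranscript` entry point, and the injectable `defer` option are assumptions for illustration; only `onSend`, the flag, and the `setTimeout(0)` deferral come from the commit message.

```typescript
type Defer = (fn: () => void) => void;

interface SendOnStopOptions {
  onTranscript: (text: string) => void; // e.g. appends text to the input state
  onSend: () => void;                   // submits the message
  defer?: Defer;                        // assumption: injectable so tests can run synchronously
}

class SendOnStop {
  private shouldSend = false;
  private readonly options: SendOnStopOptions;
  private readonly defer: Defer;

  constructor(options: SendOnStopOptions) {
    this.options = options;
    // Default mirrors the commit: setTimeout(0) lets the state update land first.
    this.defer = options.defer ?? ((fn: () => void) => { setTimeout(fn, 0); });
  }

  /** Record whether the upcoming transcript should be sent right away. */
  stop(send = false): void {
    this.shouldSend = send;
  }

  /** Called once the transcription result is available. */
  handleTranscript(text: string): void {
    const send = this.shouldSend;
    this.shouldSend = false; // one-shot flag: reset before doing anything else
    this.options.onTranscript(text);
    if (send) this.defer(this.options.onSend);
  }
}
```

Deferring `onSend` matters because the send handler reads the input state; calling it in the same tick as `onTranscript` could send the text from before the dictated words were appended.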
- Reduce gap between control rows from gap-1 to gap-0.5
- Reduce vertical wrap gap from gap-y-2 to gap-y-1
- Reduce send button padding from px-2 py-1 to px-1.5 py-0.5

- Change ToggleGroup padding from px-2 py-1 to px-1.5 py-0.5
- Keeps mode selector and send button visually consistent

User education:
- Show mic button even without OpenAI key (disabled with tooltip)
- Tooltip explains: 'Configure in Settings → Providers'
- Toast error when trying to use keybind/command without key

DRY improvements:
- Extract WaveformBars component for reusable animated bars
- Remove unused Web Speech API error message mappings

Code quality:
- Add isApiKeySet to hook result for explicit checking
- shouldShowUI now only checks platform support, not API key
- Verified no race conditions in hook logic

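The split between `shouldShowUI` (platform support only) and `isApiKeySet` could drive the mic button roughly as sketched below. The function name `getMicButtonView`, the `MicButtonView` shape, and the tooltip wording other than 'Configure in Settings → Providers' are assumptions for illustration.

```typescript
interface MicButtonView {
  visible: boolean;
  disabled: boolean;
  tooltip: string;
}

// Hypothetical helper: derives the button presentation from the two hook flags.
function getMicButtonView(shouldShowUI: boolean, isApiKeySet: boolean): MicButtonView {
  // Platform does not support voice input: render nothing.
  if (!shouldShowUI) {
    return { visible: false, disabled: true, tooltip: "" };
  }
  // Supported but no key: keep the button visible so the user learns how to enable it.
  if (!isApiKeySet) {
    return {
      visible: true,
      disabled: true,
      tooltip: "Voice input requires an OpenAI API key. Configure in Settings → Providers",
    };
  }
  return { visible: true, disabled: false, tooltip: "Voice input" };
}
```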
- Replace isListening/isTranscribing booleans with single state enum
- Merge stopListening/stopListeningAndSend into stop(options?)
- Rename methods: startListening→start, toggleListening→toggle
- Consolidate callback refs into single callbacksRef object
- Move platform checks (isMobile, isSupported) to module scope
- Simplify VoiceInputButton with STATE_CONFIG lookup table
- Inline simple callbacks in ChatInput (no separate handlers)

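As a rough sketch of the single-state refactor: one union type replaces the two booleans, and a lookup table keyed by that state replaces per-state branching in the button. The three state names and the blue/amber/disabled-while-transcribing behavior come from this PR; the field names, labels, CSS class names, and `nextStateOnToggle` are assumptions.

```typescript
// A string union plays the role of the "state enum".
type VoiceState = "idle" | "recording" | "transcribing";

interface ButtonConfig {
  label: string;      // assumed wording
  colorClass: string; // assumed class names; blue/amber per the earlier commit
  disabled: boolean;
}

// One entry per state; Record<VoiceState, ...> makes a missing state a compile error.
const STATE_CONFIG: Record<VoiceState, ButtonConfig> = {
  idle: { label: "Start voice input", colorClass: "text-muted", disabled: false },
  recording: { label: "Stop recording", colorClass: "text-blue-500", disabled: false },
  transcribing: { label: "Transcribing...", colorClass: "text-amber-500", disabled: true },
};

// Hypothetical toggle transition: ignored while a transcription is in flight.
function nextStateOnToggle(state: VoiceState): VoiceState {
  switch (state) {
    case "idle":
      return "recording";
    case "recording":
      return "transcribing";
    case "transcribing":
      return "transcribing";
  }
}
```

With two booleans, `isListening && isTranscribing` is representable even though it should never happen; the union rules that combination out by construction.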
- Pressing space on empty chat input starts voice recording (convenient alternative to Cmd+D)
- Pressing escape during recording cancels without transcribing
- Add cancel() method to voice hook that sets flag to skip transcribe
- Updated overlay text to show all shortcuts

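The cancel flag can be pictured with the standalone sketch below. It assumes that stopping and cancelling both end up in the same "recorder stopped" handler, and that the flag is what tells them apart. The class `VoiceSession`, its method names other than `cancel()`, and the `Uint8Array` chunk type are assumptions for illustration.

```typescript
type Transcribe = (chunks: Uint8Array[]) => void;

class VoiceSession {
  private wasCancelled = false;
  private chunks: Uint8Array[] = [];
  private readonly transcribe: Transcribe;

  constructor(transcribe: Transcribe) {
    this.transcribe = transcribe;
  }

  start(): void {
    this.wasCancelled = false; // a new recording always starts un-cancelled
    this.chunks = [];
  }

  addChunk(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  /** Normal stop: the captured audio goes on to transcription. */
  stop(): void {
    this.handleRecorderStopped();
  }

  /** Escape path: set the flag first, then stop like any other recording. */
  cancel(): void {
    this.wasCancelled = true;
    this.handleRecorderStopped();
  }

  // Shared stop handler; the flag decides whether audio is transcribed or dropped.
  private handleRecorderStopped(): void {
    const captured = this.chunks;
    this.chunks = [];
    if (this.wasCancelled || captured.length === 0) return;
    this.transcribe(captured);
  }
}
```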
- Add focus:outline-none to recording overlay button
- Update tooltip to document all shortcuts:
  - Space on empty input to start
  - Cmd+D anytime to toggle
  - Space during recording to send
  - Escape to cancel

- Add window-level keydown listener active only during recording
- Space and Escape work even if overlay button loses focus
- Removed redundant local onKeyDown and auto-focus from button

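A sketch of a keydown listener that exists only while recording, in the attach-and-return-cleanup shape that a `useEffect` body would use. The `KeyTarget` and `KeyEventLike` interfaces are minimal stand-ins (assumptions) so the example runs outside a browser; in the app the target would be `window`. `bindRecordingKeys` and the handler names are hypothetical.

```typescript
interface KeyEventLike {
  key: string;
  preventDefault(): void;
}
type KeyListener = (event: KeyEventLike) => void;

// Minimal subset of the EventTarget API that this sketch relies on.
interface KeyTarget {
  addEventListener(type: "keydown", listener: KeyListener): void;
  removeEventListener(type: "keydown", listener: KeyListener): void;
}

interface RecordingKeyHandlers {
  onStopAndSend(): void; // Space during recording
  onCancel(): void;      // Escape during recording
}

// Returns a cleanup function; when not recording, nothing is attached.
function bindRecordingKeys(
  target: KeyTarget,
  isRecording: boolean,
  handlers: RecordingKeyHandlers,
): () => void {
  if (!isRecording) return () => {};

  const listener: KeyListener = (event) => {
    if (event.key === " ") {
      event.preventDefault(); // keep the space out of the textarea
      handlers.onStopAndSend();
    } else if (event.key === "Escape") {
      event.preventDefault();
      handlers.onCancel();
    }
  };

  target.addEventListener("keydown", listener);
  return () => target.removeEventListener("keydown", listener);
}
```

Because the listener sits on the window-level target, it does not depend on the overlay button holding focus, which is why the local `onKeyDown` and auto-focus could be removed.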
- Add providersConfig option to setupSimpleChatStory helper
- Add VoiceInputNoApiKey story showing disabled mic with tooltip
- Documents user education for missing OpenAI key

Ensures start() is a no-op on mobile even if somehow called directly, complementing the UI-layer shouldShowUI guards.
Merge command palette toggle and global recording keybinds into single useEffect with shared cleanup.
- Rename isMobile to HAS_TOUCH_DICTATION with clear doc comment
- Remove screen size check (iPads have dictation regardless of size)
- Add section headers for visual organization
- Extract releaseStream helper to reduce duplication
- Improve variable names (recorder, chunks, buffer)
- Add early returns to reduce nesting in transcribe()
- Rename refs for clarity (shouldSendRef, wasCancelledRef)
- Better comments explaining the state machine and logic

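A helper like `releaseStream` plausibly stops every track on the captured stream so the browser releases the microphone. The sketch below is a guess at its shape, not the PR's code: the structural `TrackSource` interface is an assumption that stands in for the browser's `MediaStream` (whose `getTracks()` returns tracks with a `stop()` method) so the example can run without a DOM.

```typescript
// Structural stand-ins for MediaStreamTrack / MediaStream (assumption for testability).
interface StoppableTrack {
  stop(): void;
}
interface TrackSource {
  getTracks(): StoppableTrack[];
}

// Stops all tracks if a stream exists; safe to call when nothing is being captured.
function releaseStream(stream: TrackSource | null | undefined): void {
  stream?.getTracks().forEach((track) => track.stop());
}
```

Centralizing this in one helper means the stop, cancel, and error paths can all release the microphone the same way.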
40957e4 to 7e1294b
Closed
Adds voice dictation capability to the chat input using OpenAI's gpt-4o-transcribe model.
Features
UI States
Implementation
- `useVoiceInput` hook with clean state enum (idle/recording/transcribing)
- `VoiceInputButton` floating component
- `WaveformBars` reusable animated component
- `voice:transcribe` for backend API calls

Files Changed
- `src/browser/hooks/useVoiceInput.ts` - Core hook
- `src/browser/components/ChatInput/VoiceInputButton.tsx` - Button component
- `src/browser/components/ChatInput/WaveformBars.tsx` - Animation component
- `src/browser/components/ChatInput/index.tsx` - Integration
- `src/node/services/ipcMain.ts` - Backend transcription handler
- `src/common/constants/ipc-constants.ts` - IPC channel
- `src/common/types/ipc.ts` - Type definitions

_Generated with `mux`_