🤖 feat: voice input mode with OpenAI transcription #836

ammar-agent · 2025-12-02T06:07:41Z

Adds voice dictation capability to the chat input using OpenAI's gpt-4o-transcribe model.

Features

Voice recording via MediaRecorder API (webm/opus format)
OpenAI transcription via backend IPC (API key stays server-side)
Recording overlay replaces textarea with animated waveform visualization
Multiple shortcuts:
- Space on empty input → start recording
- Space during recording → stop and send immediately
- Ctrl+D / Cmd+D → toggle recording anytime
- Escape → cancel recording (discard audio)
Global keybinds during recording work regardless of focus
User education when OpenAI key not configured (disabled button with tooltip)
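
The shortcut rules above can be sketched as a pure function that maps a key press to an action. This is only an illustration: `resolveVoiceShortcut`, `KeyInfo`, and `VoiceAction` are hypothetical names, not identifiers from this PR, and the real handling lives in the ChatInput component and the hook.

```typescript
// Hypothetical sketch of the shortcut table; names are not from the PR.
type VoiceState = "idle" | "recording" | "transcribing";
type VoiceAction = "start" | "stopAndSend" | "toggle" | "cancel" | null;

// Minimal subset of KeyboardEvent fields needed for the decision.
interface KeyInfo {
  key: string; // KeyboardEvent.key, e.g. " ", "d", "Escape"
  ctrlKey: boolean;
  metaKey: boolean;
}

function resolveVoiceShortcut(
  e: KeyInfo,
  state: VoiceState,
  inputIsEmpty: boolean,
): VoiceAction {
  // Ctrl+D / Cmd+D toggles recording regardless of input contents.
  if ((e.ctrlKey || e.metaKey) && e.key.toLowerCase() === "d") return "toggle";

  if (state === "recording") {
    if (e.key === " ") return "stopAndSend"; // stop and send immediately
    if (e.key === "Escape") return "cancel"; // discard audio
    return null;
  }

  // Space only starts recording when the input is empty, so normal typing is unaffected.
  if (state === "idle" && e.key === " " && inputIsEmpty) return "start";
  return null;
}
```

Keeping the decision separate from the event wiring makes each row of the shortcut list easy to unit test.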

UI States

State	Appearance
Idle	Subtle gray mic icon in textarea corner
Recording	Blue border overlay with animated waveform
Transcribing	Amber border overlay, waiting for API
No API Key	Disabled mic with explanatory tooltip
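
A lookup table is one way to keep these appearances in a single place. The sketch below is an assumption-laden illustration: the Tailwind class strings, the labels, and the `getButtonConfig` helper are hypothetical, while the `STATE_CONFIG` name, the disabled transcribing button, and the tooltip wording come from the commit notes.

```typescript
// Illustrative only: class names and labels are guesses, not the PR's values.
type VoiceState = "idle" | "recording" | "transcribing";

interface ButtonConfig {
  label: string;
  className: string;
  disabled: boolean;
  tooltip?: string;
}

const STATE_CONFIG: Record<VoiceState, ButtonConfig> = {
  idle: { label: "Start voice input", className: "text-gray-400", disabled: false },
  recording: { label: "Recording", className: "border border-blue-500", disabled: false },
  // Disabled while waiting for the API to prevent double-clicks.
  transcribing: { label: "Transcribing", className: "border border-amber-500", disabled: true },
};

// The missing-key case is driven by a separate flag rather than the state enum.
const NO_API_KEY_CONFIG: ButtonConfig = {
  label: "Voice input unavailable",
  className: "text-gray-400 opacity-50",
  disabled: true,
  tooltip: "Configure in Settings → Providers",
};

function getButtonConfig(state: VoiceState, isApiKeySet: boolean): ButtonConfig {
  return isApiKeySet ? STATE_CONFIG[state] : NO_API_KEY_CONFIG;
}
```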

Implementation

useVoiceInput hook with clean state enum (idle / recording / transcribing)
VoiceInputButton floating component
WaveformBars reusable animated component
IPC channel voice:transcribe for backend API calls
Hidden on mobile (native keyboards have built-in dictation)
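
The `idle / recording / transcribing` enum can be read as a small state machine. The following is a minimal sketch of the transitions implied by the description, assuming hypothetical event names (`start`, `stop`, `cancel`, `transcribed`, `failed`); the actual hook drives these transitions from MediaRecorder and IPC callbacks rather than a reducer.

```typescript
// Hypothetical transition function; event names are assumptions.
type VoiceState = "idle" | "recording" | "transcribing";
type VoiceEvent = "start" | "stop" | "cancel" | "transcribed" | "failed";

function nextState(state: VoiceState, event: VoiceEvent): VoiceState {
  if (state === "idle") {
    // Only an explicit start leaves idle.
    return event === "start" ? "recording" : state;
  }
  if (state === "recording") {
    if (event === "stop") return "transcribing"; // audio is sent for transcription
    if (event === "cancel") return "idle"; // audio is discarded, no API call
    return state;
  }
  // transcribing: return to idle whether the API call succeeded or failed.
  return event === "transcribed" || event === "failed" ? "idle" : state;
}
```

A single enum rules out impossible combinations, such as recording and transcribing at the same time, that two independent booleans would allow.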

Files Changed

src/browser/hooks/useVoiceInput.ts - Core hook
src/browser/components/ChatInput/VoiceInputButton.tsx - Button component
src/browser/components/ChatInput/WaveformBars.tsx - Animation component
src/browser/components/ChatInput/index.tsx - Integration
src/node/services/ipcMain.ts - Backend transcription handler
src/common/constants/ipc-constants.ts - IPC channel
src/common/types/ipc.ts - Type definitions

Generated with mux

- Add Ctrl+D / Cmd+D keybind to toggle voice recording
- Add mic button next to send button (hidden on mobile or when no OpenAI key)
- Add command palette command for voice input toggle
- Record audio via MediaRecorder, transcribe via Whisper API
- Show recording indicator while capturing, spinner while transcribing
- Append dictated text to existing input
- Handle errors with user-friendly toast messages

Requires OpenAI API key to be configured in Settings > Providers.

_Generated with mux_

- Show overlay during both recording AND transcribing states (prevents jarring snap-back to empty textarea when waiting for API)
- Change colors from red (error-like) to blue (recording) and amber (transcribing)
- Disable overlay button while transcribing to prevent double-clicks

- Space key during recording: stops and sends immediately
- Ctrl+D/Cmd+D: stops recording, keeps text in input (existing)
- Reduced border from border-2 to border (less crowded near controls)
- Updated overlay text to show both shortcuts

- Auto-focus the recording overlay button so spacebar works
- Add mb-1 margin to prevent border touching controls below

- Add onSend callback to useVoiceInput options
- Add stopListeningAndSend method that sets a flag before stopping
- When transcription completes, if flag was set, call onSend
- Use setTimeout(0) to let React flush state update before sending
- Simplifies ChatInput code by moving logic into the hook
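
The flag-then-defer pattern in this commit might look roughly like the sketch below. It is framework-free for clarity: `createSendOnStop`, `onTranscript`, and the injectable `defer` are hypothetical, and the `stop(options?)` shape follows the later refactor rather than this commit's `stopListeningAndSend`. In the real hook the flag would live in a ref and the transcript would be written into React state.

```typescript
// Hypothetical sketch of "set a flag before stopping, send after transcription".
interface SendOnStopOptions {
  onTranscript: (text: string) => void; // e.g. append text to the input state
  onSend: () => void;
  // Injectable for testing; defaults to setTimeout(fn, 0) as in the commit note.
  defer?: (fn: () => void) => void;
}

function createSendOnStop(opts: SendOnStopOptions) {
  let shouldSend = false;
  const defer =
    opts.defer ??
    ((fn: () => void) => {
      setTimeout(fn, 0);
    });

  return {
    // Called when the user stops recording; records whether to send afterwards.
    stop(options?: { send?: boolean }): void {
      shouldSend = options?.send === true;
    },
    // Called when the transcription result arrives.
    handleTranscript(text: string): void {
      opts.onTranscript(text);
      if (shouldSend) {
        shouldSend = false;
        // Deferring gives the state update from onTranscript a chance to
        // flush before the send handler reads the input.
        defer(opts.onSend);
      }
    },
  };
}
```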

- Reduce gap between control rows from gap-1 to gap-0.5
- Reduce vertical wrap gap from gap-y-2 to gap-y-1
- Reduce send button padding from px-2 py-1 to px-1.5 py-0.5

- Change ToggleGroup padding from px-2 py-1 to px-1.5 py-0.5
- Keeps mode selector and send button visually consistent

User education:
- Show mic button even without OpenAI key (disabled with tooltip)
- Tooltip explains: 'Configure in Settings → Providers'
- Toast error when trying to use keybind/command without key

DRY improvements:
- Extract WaveformBars component for reusable animated bars
- Remove unused Web Speech API error message mappings

Code quality:
- Add isApiKeySet to hook result for explicit checking
- shouldShowUI now only checks platform support, not API key
- Verified no race conditions in hook logic

- Replace isListening/isTranscribing booleans with single state enum
- Merge stopListening/stopListeningAndSend into stop(options?)
- Rename methods: startListening→start, toggleListening→toggle
- Consolidate callback refs into single callbacksRef object
- Move platform checks (isMobile, isSupported) to module scope
- Simplify VoiceInputButton with STATE_CONFIG lookup table
- Inline simple callbacks in ChatInput (no separate handlers)

- Pressing space on empty chat input starts voice recording (convenient alternative to Cmd+D)
- Pressing escape during recording cancels without transcribing
- Add cancel() method to voice hook that sets flag to skip transcribe
- Updated overlay text to show all shortcuts

- Add focus:outline-none to recording overlay button
- Update tooltip to document all shortcuts:
  - Space on empty input to start
  - Cmd+D anytime to toggle
  - Space during recording to send
  - Escape to cancel

- Add window-level keydown listener active only during recording
- Space and Escape work even if overlay button loses focus
- Removed redundant local onKeyDown and auto-focus from button
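
A listener that exists only while recording could be wired as sketched below. `attachRecordingKeybinds`, `KeyTarget`, and `KeyLike` are hypothetical names; the target is abstracted behind a small interface so the sketch runs outside a browser, where `window` would normally fill that role. The returned function is the kind of cleanup a `useEffect` would return.

```typescript
// Hypothetical sketch; in the browser the target would be `window`.
type VoiceState = "idle" | "recording" | "transcribing";

interface KeyLike {
  key: string;
  preventDefault(): void;
}

interface KeyTarget {
  addEventListener(type: "keydown", listener: (e: KeyLike) => void): void;
  removeEventListener(type: "keydown", listener: (e: KeyLike) => void): void;
}

function attachRecordingKeybinds(
  target: KeyTarget,
  state: VoiceState,
  handlers: { stopAndSend: () => void; cancel: () => void },
): () => void {
  // Outside the recording state nothing is registered, so normal typing is untouched.
  if (state !== "recording") return () => {};

  const onKeyDown = (e: KeyLike): void => {
    if (e.key === " ") {
      e.preventDefault(); // keep the space out of whatever element has focus
      handlers.stopAndSend();
    } else if (e.key === "Escape") {
      e.preventDefault();
      handlers.cancel();
    }
  };

  target.addEventListener("keydown", onKeyDown);
  // Cleanup: remove the exact same function reference that was added.
  return () => target.removeEventListener("keydown", onKeyDown);
}
```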

- Add providersConfig option to setupSimpleChatStory helper
- Add VoiceInputNoApiKey story showing disabled mic with tooltip
- Documents user education for missing OpenAI key

Ensures start() is a no-op on mobile even if somehow called directly, complementing the UI-layer shouldShowUI guards.

Merge command palette toggle and global recording keybinds into single useEffect with shared cleanup.

- Rename isMobile to HAS_TOUCH_DICTATION with clear doc comment
- Remove screen size check (iPads have dictation regardless of size)
- Add section headers for visual organization
- Extract releaseStream helper to reduce duplication
- Improve variable names (recorder, chunks, buffer)
- Add early returns to reduce nesting in transcribe()
- Rename refs for clarity (shouldSendRef, wasCancelledRef)
- Better comments explaining the state machine and logic
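
A `releaseStream` helper of the kind mentioned above would typically stop every track on the captured stream so the browser's microphone indicator turns off. The sketch below assumes that behavior; the helper's real signature is not shown in this PR text, and `StreamLike` / `TrackLike` are stand-in interfaces covering the `getTracks()` and `stop()` methods that `MediaStream` and `MediaStreamTrack` provide.

```typescript
// Assumed shape of the helper; the PR's actual implementation may differ.
interface TrackLike {
  stop(): void;
}

interface StreamLike {
  getTracks(): TrackLike[];
}

// Stops all tracks on the stream, if there is one. Safe to call from every
// exit path (stop, cancel, error) without checking for null first.
function releaseStream(stream: StreamLike | null | undefined): void {
  if (!stream) return;
  for (const track of stream.getTracks()) {
    track.stop();
  }
}
```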

ammar-agent added 18 commits December 1, 2025 23:30

fix: auto-focus recording button for spacebar, add margin

5914ed1

- Auto-focus the recording overlay button so spacebar works
- Add mb-1 margin to prevent border touching controls below

style: reduce vertical space in chat controls

53cefb3

- Reduce gap between control rows from gap-1 to gap-0.5
- Reduce vertical wrap gap from gap-y-2 to gap-y-1
- Reduce send button padding from px-2 py-1 to px-1.5 py-0.5

style: make toggle group match send button size

bc400fb

- Change ToggleGroup padding from px-2 py-1 to px-1.5 py-0.5
- Keeps mode selector and send button visually consistent

fix: remove ugly focus ring, improve voice tooltip

f82b49f

- Add focus:outline-none to recording overlay button
- Update tooltip to document all shortcuts:
  - Space on empty input to start
  - Cmd+D anytime to toggle
  - Space during recording to send
  - Escape to cancel

fix: use gpt-4o-transcribe model instead of whisper-1

91c65e7

fix: global keybinds during recording work regardless of focus

92e31f3

- Add window-level keydown listener active only during recording
- Space and Escape work even if overlay button loses focus
- Removed redundant local onKeyDown and auto-focus from button

test: add Storybook story for voice input without API key

5d71600

- Add providersConfig option to setupSimpleChatStory helper
- Add VoiceInputNoApiKey story showing disabled mic with tooltip
- Documents user education for missing OpenAI key

fix: add defense-in-depth mobile check in voice start()

fe15e4d

Ensures start() is a no-op on mobile even if somehow called directly, complementing the UI-layer shouldShowUI guards.

refactor: consolidate voice input useEffects

8c56c9c

Merge command palette toggle and global recording keybinds into single useEffect with shared cleanup.

fix: make E2E test more specific for OpenAI provider button

7e1294b

ammar-agent force-pushed the voice-input-mode branch from 40957e4 to 7e1294b Compare December 2, 2025 06:22

ammario linked an issue Dec 2, 2025 that may be closed by this pull request

Voice to text input #521

Closed

ammario merged commit 2820a98 into main Dec 2, 2025
13 checks passed

ammario deleted the voice-input-mode branch December 2, 2025 06:27
