feat(chat): add voice input with real-time transcription #3105

Merged
guitavano merged 5 commits into main from feat/voice-input-chat
Apr 13, 2026

@guitavano guitavano commented Apr 13, 2026

Summary

  • Adds a microphone button to the chat input (only shown when the browser supports SpeechRecognition)
  • Real-time speech-to-text using the browser-native Web Speech API — no backend changes required
  • Live waveform visualizer driven by the Web Audio API (AnalyserNode) animates while listening
  • Transcribed text is appended to any existing editor content (not replaced), so the user can mix typing and dictation freely
  • Cancel (×) and confirm (✓) controls match the existing input style
  • Proper microphone permission handling: getUserMedia is called upfront to trigger the browser prompt; if denied, the mic button turns red with a descriptive tooltip and clicking it re-prompts
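
The support check and the denied-permission state described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `getSpeechRecognitionCtor`, `classifyMicError`, `SpeechScope`, and `MicState` are hypothetical names. The only browser facts it relies on are that some browsers expose the recognizer as `webkitSpeechRecognition` and that `getUserMedia` rejects with an error named `NotAllowedError` when the user denies access.

```typescript
// Hypothetical helpers; only the probed globals and the error name are real.
type SpeechRecognitionCtor = new () => unknown;

interface SpeechScope {
  SpeechRecognition?: SpeechRecognitionCtor;
  webkitSpeechRecognition?: SpeechRecognitionCtor;
}

/** Returns the recognition constructor if the scope exposes one, else null. */
function getSpeechRecognitionCtor(scope: SpeechScope): SpeechRecognitionCtor | null {
  // Prefer the unprefixed name, fall back to the vendor-prefixed one.
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

type MicState = "denied" | "unavailable";

/** Maps a rejected getUserMedia call to a UI state for the mic button. */
function classifyMicError(err: unknown): MicState {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  // NotAllowedError means the user (or a policy) refused the permission prompt.
  return name === "NotAllowedError" ? "denied" : "unavailable";
}
```

In a browser, the first helper would be called with the global `window` object (cast as needed), and the mic button rendered only when the result is not `null`. The second helper would sit in the `catch` branch around `navigator.mediaDevices.getUserMedia({ audio: true })`.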

Test plan

  • Click the mic button — browser should ask for microphone permission
  • Deny permission — mic button should turn red with tooltip "Microphone access denied — click to try again"
  • Grant permission — waveform overlay should appear and animate while speaking
  • Speak — interim transcript (dimmed) should appear live, final transcript (solid) should accumulate
  • Press ✓ — transcribed text should be inserted at the cursor / appended to existing text in the editor
  • Press × — overlay should dismiss with no changes to the editor
  • Type something first, then use voice — transcribed text should be appended after existing content
  • Test on Firefox (Speech API unsupported) — mic button should not appear

Made with Cursor


Summary by cubic

Add voice input to chat with real-time transcription that types directly into the editor and shows a live waveform. During recording the editor locks; cancel restores previous text, confirm keeps the transcription.

  • New Features

    • Mic button shows only when SpeechRecognition is available; hidden on unsupported browsers.
    • Transcript appears live in the editor while recording; input is disabled to prevent edits.
    • Bottom bar switches to waveform with ×/✓; cancel restores prior content, confirm keeps the text.
    • Permission flow via getUserMedia; denied state turns the mic red with a tooltip and allows retry.
    • New use-voice-input hook and VoiceWaveform; TiptapInput adds appendText(), syncVoiceText(), and restoreContent().
  • Bug Fixes

    • Added Web Speech API global types in globals.d.ts for SpeechRecognition and related interfaces to fix TypeScript builds and noUncheckedIndexedAccess issues.
    • Prevented losing last interim words on stop and ensured a space is added before appended voice text when baseline content exists.
    • Removed unused VoiceInputOverlay export.
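
The space-before-voice-text fix listed above can be illustrated with a small pure function. This is a sketch under assumptions: `joinVoiceText` is a hypothetical name, and the real `appendText()` / `syncVoiceText()` methods on `TiptapInput` may handle whitespace differently.

```typescript
/**
 * Joins the editor's baseline content with dictated text.
 * Hypothetical helper; not taken from the PR's source.
 */
function joinVoiceText(baseline: string, voiceText: string): string {
  const spoken = voiceText.trim();
  if (spoken === "") return baseline; // nothing was dictated
  if (baseline === "") return spoken; // no separator needed at the start
  // Avoid a double space when the baseline already ends in whitespace.
  const separator = /\s$/.test(baseline) ? "" : " ";
  return baseline + separator + spoken;
}
```

For example, `joinVoiceText("Hello", "world")` yields `"Hello world"` rather than the concatenated `"Helloworld"`.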

Written for commit ec23fa4. Summary will update on new commits.

@github-actions

🧪 Benchmark

Should we run the Virtual MCP strategy benchmark for this PR?

React with 👍 to run the benchmark.

Reaction | Action
👍 | Run quick benchmark (10 & 128 tools)

Benchmark will run on the next push after you react.

github-actions bot commented Apr 13, 2026

Release Options

Suggested: Minor (2.261.0) — based on feat: prefix

React with an emoji to override the release type:

Reaction | Type | Next Version
👍 | Prerelease | 2.260.3-alpha.1
🎉 | Patch | 2.260.3
❤️ | Minor | 2.261.0
🚀 | Major | 3.0.0

Current version: 2.260.2

Note: If multiple reactions exist, the smallest bump wins. If no reactions, the suggested bump is used (default: patch).

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/mesh/src/web/components/chat/input.tsx">

<violation number="1" location="apps/mesh/src/web/components/chat/input.tsx:334">
P1: Transcription can be lost on confirm because `appendText` is called while `TiptapInput` is unmounted in recording mode.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Transcript now appears live in the normal text area instead of a
separate overlay. The bottom bar switches to waveform + accept/decline
during recording. Waveform uses chart-2 color and reads low-mid
frequency bins where speech energy lives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
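
Reading only the low-mid frequency bins, as the commit message above describes, might look roughly like this. The sketch assumes the input is the `Uint8Array` that `AnalyserNode.getByteFrequencyData()` fills with values from 0 to 255. `speechBarLevels` and its `startBin` / `endBin` parameters are hypothetical, and the bin range actually used by `VoiceWaveform` is not stated in this PR.

```typescript
/**
 * Averages a sub-range of frequency bins into `barCount` bar heights in [0, 1].
 * Hypothetical helper; bin bounds are caller-supplied assumptions.
 */
function speechBarLevels(
  bins: Uint8Array,
  barCount: number,
  startBin: number,
  endBin: number, // exclusive
): number[] {
  const lo = Math.max(0, startBin);
  const hi = Math.min(bins.length, endBin);
  const span = hi - lo;
  const levels: number[] = [];
  if (barCount <= 0 || span <= 0) return levels;

  for (let bar = 0; bar < barCount; bar++) {
    // Each bar covers a contiguous slice of the selected range (at least one bin).
    const from = lo + Math.floor((bar * span) / barCount);
    const to = Math.max(from + 1, lo + Math.floor(((bar + 1) * span) / barCount));
    let sum = 0;
    for (let i = from; i < to; i++) sum += bins[i] ?? 0;
    // Byte frequency data tops out at 255, so this normalizes to [0, 1].
    levels.push(sum / ((to - from) * 255));
  }
  return levels;
}
```

Each bin spans `sampleRate / fftSize` Hz, so the bin indices that correspond to speech depend on the analyser's configuration. A caller would pick `startBin` and `endBin` accordingly and redraw the bars on each animation frame.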

@cubic-dev-ai cubic-dev-ai bot left a comment

2 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/mesh/src/web/components/chat/input.tsx">

<violation number="1" location="apps/mesh/src/web/components/chat/input.tsx:337">
P1: Use the final text returned by `stopRecording()` when confirming, otherwise the last dictated words can be lost.</violation>

<violation number="2" location="apps/mesh/src/web/components/chat/input.tsx:354">
P2: Prefix dictated text with a space when baseline content is non-empty so transcription truly appends instead of concatenating words.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
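
The first violation above concerns interim words that have not yet been finalized when the user confirms. One way to picture the fix is a buffer whose stop method returns everything heard, including pending interim text. This is a hypothetical sketch: `TranscriptBuffer`, `update`, `preview`, and `stop` are illustrative names, not the `use-voice-input` hook's actual API.

```typescript
/** Joins two fragments with a single space, ignoring empty sides. */
function appendWords(head: string, tail: string): string {
  const a = head.trim();
  const b = tail.trim();
  return a && b ? `${a} ${b}` : a || b;
}

/** Hypothetical transcript accumulator; not the PR's implementation. */
class TranscriptBuffer {
  private finalText = "";
  private interimText = "";

  /** Finalized chunks accumulate; interim text replaces the previous interim. */
  update(finalChunk: string, interim: string): void {
    if (finalChunk !== "") {
      this.finalText = appendWords(this.finalText, finalChunk);
    }
    this.interimText = interim;
  }

  /** Text to show live while recording. */
  preview(): string {
    return appendWords(this.finalText, this.interimText);
  }

  /** Stops and returns all text, promoting pending interim words to final. */
  stop(): string {
    const result = this.preview();
    this.finalText = "";
    this.interimText = "";
    return result;
  }
}
```

A confirm handler that inserts the return value of `stop()`, rather than reading previously rendered state, would keep the last interim words.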

rafavalls and others added 2 commits April 13, 2026 14:18
Use the final text returned by stopRecording() to guarantee interim
words captured in the ref are committed. Add a space before voice
text when baseline content is non-empty to avoid word concatenation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@viktormarinho viktormarinho left a comment

lgtm

@guitavano guitavano merged commit a4ac87e into main Apr 13, 2026
15 checks passed
@guitavano guitavano deleted the feat/voice-input-chat branch April 13, 2026 18:09
tlgimenes pushed a commit that referenced this pull request Apr 13, 2026
* feat(chat): add voice input with real-time transcription and waveform

Made-with: Cursor

* fix(chat): add Web Speech API global types and fix noUncheckedIndexedAccess

Made-with: Cursor

* fix(chat): rework voice input to type directly into textarea

Transcript now appears live in the normal text area instead of a
separate overlay. The bottom bar switches to waveform + accept/decline
during recording. Waveform uses chart-2 color and reads low-mid
frequency bins where speech energy lives.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(chat): prevent last words being lost and fix space before voice text

Use the final text returned by stopRecording() to guarantee interim
words captured in the ref are committed. Add a space before voice
text when baseline content is non-empty to avoid word concatenation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(chat): remove unused VoiceInputOverlay export

Made-with: Cursor

---------

Co-authored-by: rafavalls <valls@deco.cx>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
tlgimenes pushed a commit that referenced this pull request Apr 14, 2026