feat: voice-typing using external Whisper / ALM API #9264
heimoshuiyu wants to merge 7 commits into anomalyco:dev
Conversation
|
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please see CONTRIBUTING.md for details. |
|
The following comment was made by an LLM; it may be inaccurate: Potential Duplicate Found: PR #3827: Add voice-to-text transcription feature. Why it's related: This PR is already referenced in the current PR's description as a related issue. It appears to be a previous attempt or related work on voice-to-text transcription functionality. The current PR (#9264) may be an updated implementation or continuation of this feature. |
|
See also https://github.com/goodroot/hyprwhspr, which I've started using for system-wide STT, not just in opencode |
|
@calebdw Great project. However, integrating voice input in OpenCode is necessary because it can capture context to significantly improve transcription accuracy. |
Force-pushed from 596c1f5 to a3b6a2c
|
rebased, ready to merge |
Force-pushed from a3b6a2c to 7200cbc
|
I cannot wait to see it working |
|
This works great. I set it up with a local speaches server and it works perfectly, using the model Systran/faster-distil-whisper-large-v3. |
|
@Mikec78660 Glad you got it working! Could you share what issues you encountered during configuration? We should show a toast message when configuration errors occur. |
|
@heimoshuiyu I had https instead of http in the url for my STT server, like an idiot. I just copied and pasted my server name into the url field and didn't even notice. But it was weird because the Record "button" was just ghosted out and would not let me press it, as if I had not set up voice at all. I was expecting it would give me some sort of error, like letting me hit the button and then getting an error that the server was unreachable or something. But it was completely my mistake that I even had an issue. Here is my entry in opencode.json |
move whisper config into config
document whisper voice config
remove tui voice enabled
Fix voice error handling and whisper context
When voice transcription completes, addPart now checks if the current selection is within the prompt editor. If the selection is outside the editor (e.g., user clicked on an assistant message during recording), it focuses the editor and restores the cursor position from prompt.cursor() before inserting the transcribed text. This prevents transcription results from being inserted into unintended locations like assistant messages. Also fixes cursor position logic to prefer real DOM position when selection is inside the editor, only falling back to prompt.cursor() when selection is outside.
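A rough sketch of the behavior described above: only the names addPart and prompt.cursor() come from the commit message; promptEditor, restoreCursor, and insertAtCursor are hypothetical stand-ins for opencode's actual editor internals, not the PR's real code.

```ts
// Hypothetical sketch, not the actual implementation.
const promptEditor = document.getElementById("prompt-editor") as HTMLElement

const prompt = {
  // Last cursor offset recorded while the editor was focused (stubbed here).
  cursor: (): number => 0,
}

function restoreCursor(offset: number) {
  // Place the caret back at `offset` inside the editor (simplified).
  const range = document.createRange()
  range.setStart(promptEditor.firstChild ?? promptEditor, offset)
  range.collapse(true)
  const sel = window.getSelection()
  sel?.removeAllRanges()
  sel?.addRange(range)
}

function insertAtCursor(text: string) {
  const sel = window.getSelection()
  if (!sel || sel.rangeCount === 0) return
  sel.getRangeAt(0).insertNode(document.createTextNode(text))
}

function addPart(transcribedText: string) {
  const sel = window.getSelection()
  const insideEditor =
    sel?.anchorNode != null && promptEditor.contains(sel.anchorNode)

  if (!insideEditor) {
    // The user clicked elsewhere (e.g. an assistant message) while recording:
    // refocus the editor and restore the saved cursor position before inserting,
    // so the transcription never lands outside the prompt.
    promptEditor.focus()
    restoreCursor(prompt.cursor())
  }
  // When the selection is already inside the editor, the real DOM position is used.
  insertAtCursor(transcribedText)
}
```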
Force-pushed from 7200cbc to e271bb1
mohamedbouddi7777-dev left a comment
Okay, hello. In the name of God, the Most Gracious, the Most Merciful; we place our trust in God. Post it in the right place, meaning here; I mean, don't post these things in Gmail.
|
@Mikec78660 Thank you for sharing. I just fixed an issue where the grayed-out RECORD button in the TUI was unresponsive when clicked. |
- Your work (PR anomalyco#9264, agent sidebar) preserved
- Conflicts in prompt-input.tsx, server.ts, icon.tsx resolved in favor of our changes
|
@heimoshuiyu Have you considered adding wake word capability? |
|
@Mikec78660 That would probably be a separate PR. Whisper and ALM do not have wake-word functionality; to support a wake word, it may be necessary to use other, smaller models to continuously monitor the microphone. |
Force-pushed from 00637c0 to 71e0ba2
Force-pushed from f1ae801 to 08fa7f7
This pull request implements a voice-typing feature. It uses an external Whisper API or a multimodal large model (such as gpt-4o or qwen3-omni) for voice input, and it uses the last message as a prompt (context) to improve contextual recognition accuracy (key!).
Related issues and PR:
This feature follows the frontend-backend separation design of opencode.
Two types of speech recognition services can be configured, with whisper as the default: a Whisper-compatible transcription API, or a multimodal large model (ALM).
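As a rough illustration (not the actual opencode.json schema; the field names here are assumptions), the two variants might be modeled like this:

```ts
// Hypothetical shape of the two configuration variants; field names are
// illustrative assumptions, not the actual opencode config schema.
type VoiceConfig =
  | {
      provider: "whisper"  // default: Whisper-compatible transcription endpoint
      url: string          // e.g. "http://localhost:8000/v1/audio/transcriptions"
      model: string        // e.g. "Systran/faster-distil-whisper-large-v3"
      apiKey?: string
    }
  | {
      provider: "alm"      // multimodal large model that accepts audio input
      url: string          // OpenAI-compatible endpoint
      model: string        // e.g. "gpt-4o" or "Qwen/Qwen3-Omni-30B-A3B-Instruct"
      apiKey?: string
    }
```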
It uses the last assistant message in the current session plus the text in the input box as the prompt for speech transcription. This contextual understanding lets you dictate special terms such as code paths and variable names directly!
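A minimal sketch of what such a prompt-assisted call could look like, assuming an OpenAI/Whisper-compatible /v1/audio/transcriptions endpoint; the transcribe helper and the way the context string is assembled are illustrative assumptions, not the PR's exact code:

```ts
// Sketch of prompt-assisted transcription against a Whisper-compatible API.
async function transcribe(
  audio: Blob,
  lastAssistantMessage: string,
  currentInput: string,
  cfg: { url: string; model: string; apiKey?: string },
): Promise<string> {
  const form = new FormData()
  form.append("file", audio, "recording.wav")
  form.append("model", cfg.model)
  // Whisper's `prompt` field biases recognition toward the given vocabulary,
  // which is what helps code paths and variable names transcribe correctly.
  form.append("prompt", `${lastAssistantMessage}\n${currentInput}`)

  const res = await fetch(cfg.url, {
    method: "POST",
    headers: cfg.apiKey ? { Authorization: `Bearer ${cfg.apiKey}` } : {},
    body: form,
  })
  if (!res.ok) throw new Error(`transcription failed: ${res.status}`)
  const data = (await res.json()) as { text: string }
  return data.text
}
```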
For testing convenience, I have set up a web frontend with voice input at https://d3ir6x3lfy3u68.cloudfront.net, and replaced the hardcoded https://app.opencode.ai in opencode (in the third commit, "Add web deploy skill and configurable web proxy").
Disclaimer: most of the code was vibe-coded and then roughly checked by me, so this might be a relatively rough implementation (or even just a POC). Suggestions for improvement are welcome, or feel free to take this idea and implement it yourself.
External resources:
Qwen/Qwen3-Omni-30B-A3B-Instruct model on https://cloud.siliconflow.cn / https://cloud.siliconflow.com
Here is the demo:
tui.mp4
app.mp4