

Voice-Command

CI · License: Apache 2.0 · Python 3.11+

Talk to your AI. Hear it work.

Voice-Command lets you voice-control your AI end-to-end. You say what you want done — Claude chat or another AI does it, using whatever tools, connectors, and MCPs it has access to — and narrates what it's doing as it goes. The "Command" isn't a euphemism. Anything the AI can do, you can ask for out loud: search the web, check your calendar, send email through your connectors, write or fix code, edit files on your computer, run shell commands, kick off automations. If it can do it typed, you can do it spoken.

Under the hood it uses faster-whisper to understand what you say (running fully on your own computer — your voice doesn't go to the cloud) and edge-tts to speak responses back. It also reads the feel of how you say things — excited, hesitant, frustrated — and passes that along so the AI can respond more naturally.


🔒 Stays on your computer

Voice-Command and all its dependencies are local tools that run on your hardware. Speech-to-text, text-to-speech, audio capture, audio playback — all of it happens on your machine. Voice-Command itself never reaches out to the internet on its own. Your AI may reach out (Claude pings Anthropic, ChatGPT pings OpenAI, and so on), and the AI may use other tools that reach out (web search, email connectors), but the voice layer adds zero outbound traffic. For a fully-offline setup, pair it with a local model in LM Studio.


🖥️ Platform support

Currently Windows-only. For Linux, see the upstream AIWander/voice repo — a community fork there adds Linux support. macOS support is planned.


Works with

Voice-Command is a STDIO MCP server, so it plugs into any AI client that speaks MCP. That includes:

  • Claude (chat) — Claude Desktop and the web app
  • Cowork — Claude's desktop agent
  • Claude Code — the CLI coding agent
  • Codex CLI — OpenAI / GPT
  • Gemini CLI — Google
  • LM Studio — for running local models (Llama, Qwen, Mistral, whatever you've loaded up)
  • Anything else that can call a STDIO MCP server — the protocol is the only requirement

It doesn't care which model is on the other end. If your AI of choice can call MCP tools, you can talk to it.


How a turn works

  1. You hear a series of beeps. That's the AI's "I'm listening, your turn" cue.
  2. You talk. Say what you want done. Anything your AI has the tools to handle counts.
  3. The AI works — and tells you out loud what it's doing as it goes. ("Checking your calendar… found three events tomorrow… drafting the reply…")
  4. You hear the beeps again. The AI's done with that turn. Your move.

One thing to know up front: audio is half-duplex — only one side talks at a time. You can't cut the AI off mid-sentence with your voice; once it's talking or working, the only way to interrupt is to click or tap directly in the AI's UI. The beeps are the only voice handoff signal.


How to end a session

Just tell the AI you're done — "I'm done talking," "let's talk later," "bye for now," anything in that family. Or hit the stop button in your AI's UI. Both work; saying it out loud is more graceful.


What you can actually ask for

Anything your AI has the tools to do. A few examples to give you the shape of it:

  • "Check my calendar for tomorrow and read me what's on it." → uses your Google Calendar connector
  • "Search the web for the latest on [topic] and summarize." → uses web search
  • "Read me the README in this folder." → uses local filesystem
  • "Find the file we were editing yesterday and fix the bug we talked about." → uses filesystem + memory of past chats
  • "Send an email to Sarah saying I'll be ten minutes late." → uses your Gmail connector
  • "Run the deploy script and tell me when it's done." → uses shell access

The voice layer doesn't add capabilities — it just changes how you reach them. Whatever connectors, MCPs, and tools you've already hooked up to your AI all work the same. You're just using your mouth instead of your keyboard.


Pairs nicely with

Voice-Command is most useful when your AI also has hands. These three companion MCPs are all local tools that live on your computer, all callable by voice once Voice-Command is wired up:

  • ops — file and shell operations: read/write files, run commands, manage processes
  • hands — browser automation, Windows UI control, vision/OCR
  • workflow — API discovery and replay, credential vault, scheduled flows

Install any combination. Voice-Command is the mouth and ears; these are the rest of the body. All of them run locally, none of them reach out unless the AI explicitly asks them to.


The easy way to install: ask your AI to do it

This is the whole point. You shouldn't need a CS degree to get this running.

If you have Claude Desktop (with ops installed), Cowork, Claude Code, Codex CLI, or Gemini CLI open right now, copy this and paste it to your AI:

https://github.com/AIWander/Voice-Command — Can you install this MCP for us to use here, set up the voice listening server, and make me a .bat to launch it. Walk me through any restart or step I need to do. Tell me clearly when everything's installed and we're ready to talk.

Your AI will:

  1. Grab the right voice-mcp.exe for your computer (ARM64 or x64) from the latest release
  2. Drop it somewhere sensible (usually C:\CPC\servers\)
  3. Wire it into your AI client's MCP config file — your existing setup is preserved, and a timestamped backup is made first, so nothing breaks
  4. Clone this repo and install the Python pieces
  5. Write you a START_VOICE_SERVER.bat you can double-click whenever you want to talk
  6. Walk you through restarting your AI client and starting the listener
  7. Tell you when everything's ready

Then you're talking. Literally.
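
You don't need to write any of this yourself, but for the curious: the START_VOICE_SERVER.bat from step 5 is just a couple of lines. A minimal sketch, assuming the .bat sits in the cloned repo folder:

@echo off
REM Launch the Voice-Command listening server from the folder this .bat lives in
cd /d "%~dp0"
python voice_server.py
pause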

After install — starting voice mode is just asking for it. Once everything's running and you've restarted your AI client, you don't need to paste anything else. On Claude chat or Cowork, just ask the AI "let's talk" or "start voice mode" — it'll fire up the listening loop and you'll hear the beeps. Same on other MCP-capable clients.

Don't forget the connector toggle. In Claude Desktop and Cowork, MCP connectors have an on/off switch in Settings → Connectors. Make sure the voice MCP entry is toggled on after the restart, otherwise the AI won't see the speak/listen tools.

No Python on the machine? Voice-Command's listening server runs on Python 3.11. If you don't have Python installed, your AI can fetch it for you using ops — just ask. Without Python, the AI can still talk to you (text-to-speech works), but it can't hear you (no listening server); full two-way voice needs Python. On Windows ARM64 you'll specifically want x64 Python 3.11, since some dependencies don't ship ARM64 wheels yet.

Don't have an operator MCP yet? ops is the recommended one — public, lightweight, does file/shell work for any AI you want to give hands to. Install ops first, then come back and paste the prompt above. If you have local, programmer, or another operator MCP already, those work too.

If your AI doesn't have access to your filesystem and shell, scroll down to Manual installation below.


What it looks like when it's running

📸 Screenshot of the voice listening server in action — coming soon. Once you've installed Voice-Command, the server window will look something like this, with a beep cue when it's your turn and a live RMS readout while you talk.


What you'll need on your computer

  • Windows 10 or 11 (Linux/macOS support not yet — see Platform support above)
  • Python 3.11 or 3.12 — download here (some dependencies don't ship Python 3.13 wheels yet; see Troubleshooting below)
  • A microphone — built-in or USB, doesn't need to be fancy
  • PortAudio — a library that lets Python use your mic; usually installs automatically
  • ffmpeg — for playing back the AI's voice; free, grab it here

If any of those words look scary, don't worry — your AI can handle all of this for you using the prompt at the top.


Manual installation (if you'd rather drive yourself)

Clone the repo, then:

pip install -r requirements.txt

PortAudio

pip install pyaudio usually just works on Windows. If it complains, grab a wheel from the unofficial PyAudio wheels page.
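
Once pyaudio imports cleanly, a quick sanity check (a sketch using PyAudio's standard device-query calls) confirms Python can actually see your microphone:

import pyaudio

# Ask PortAudio for the default input device; an error here means
# Python can't see any microphone
pa = pyaudio.PyAudio()
try:
    info = pa.get_default_input_device_info()
    print("Default mic:", info["name"])
finally:
    pa.terminate()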

ffmpeg

winget install Gyan.FFmpeg, or download from ffmpeg.org. Make sure it's on your PATH, or set the VOICE_FFMPEG_PATH environment variable to point at it.
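
To confirm Python will be able to find it, a small sketch (the env-var-before-PATH precedence here is an assumption, but it covers both setups described above):

import os
import shutil

# Prefer an explicit VOICE_FFMPEG_PATH, otherwise search PATH
ffmpeg = os.environ.get("VOICE_FFMPEG_PATH") or shutil.which("ffmpeg")
print(ffmpeg or "ffmpeg not found -- install it or set VOICE_FFMPEG_PATH")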

Setting up on Linux? The upstream AIWander/voice repo has a Linux fork with the equivalent install steps.


Running it

Start the voice server:

python voice_server.py

It runs at http://localhost:5123. You can also just double-click START_VOICE_SERVER.bat.

You mostly won't touch the endpoints directly — the AI calls them for you — but here they are:

  • GET /status — health check
  • POST /listen?timeout=30 — records, transcribes, reads emotion

Optional knobs you can pass to /listen:

  • skip_emotion=true — don't bother with emotion detection
  • skip_filter=true — turn off noise filtering
  • silence_timeout=4.0 — how long of a pause before it stops listening
  • min_speech_duration=3.0 — how long you need to talk before it'll consider stopping
  • rms_threshold=100 — how loud counts as "talking" (20–500)
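
If you do want to poke the endpoints by hand, here's a minimal sketch using the requests library (pip install requests). The exact shape of the response JSON is an assumption, so treat the printout as exploratory:

import requests

BASE = "http://localhost:5123"

# Health check -- answers as long as voice_server.py is running
print(requests.get(f"{BASE}/status").json())

# Record one turn: up to 30s total, skip emotion detection,
# stop after a 2-second pause
resp = requests.post(
    f"{BASE}/listen",
    params={"timeout": 30, "skip_emotion": "true", "silence_timeout": 2.0},
)
print(resp.json())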

Tweaking the defaults

Drop a file called voice.config.toml next to the script (or set VOICE_CONFIG_PATH to point at it):

[listen]
silence_timeout_secs = 4.0
min_speech_duration_secs = 3.0
rms_threshold = 100
noise_filter_enabled = true
pre_record_enabled = true

It looks for config in this order:

  1. VOICE_CONFIG_PATH environment variable
  2. ./voice.config.toml (current directory)
  3. ~/.config/voice/voice.config.toml
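
In code terms, that lookup is roughly this (a sketch of the order above, not the server's actual source):

import os
from pathlib import Path

def find_config() -> Path | None:
    # 1. An explicit VOICE_CONFIG_PATH override wins
    override = os.environ.get("VOICE_CONFIG_PATH")
    if override:
        return Path(override)
    # 2. voice.config.toml in the current directory
    local = Path("voice.config.toml")
    if local.exists():
        return local
    # 3. The per-user config directory
    user = Path.home() / ".config" / "voice" / "voice.config.toml"
    return user if user.exists() else None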

How an MCP client connects to it

server.py is a thin wrapper that exposes three tools to any MCP client:

  • speak — say something out loud
  • listen_for_speech — listen for what the user says
  • start_voice_mode — kick off a back-and-forth conversation
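
On the wire these are ordinary MCP tool calls, JSON-RPC over stdio. A sketch of what a speak call looks like (the envelope is standard MCP; the "text" argument name is an assumption):

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "speak",
    "arguments": { "text": "Build finished. Two tests failing." }
  }
}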

For everyday use, there's a Rust version of that wrapper (voice-mcp.exe) that's faster and more stable. It comes as release downloads (ARM64 + x64). Add this to your client's MCP server config (e.g. claude_desktop_config.json for Claude Desktop):

{
  "mcpServers": {
    "voice": {
      "command": "path/to/voice-mcp.exe"
    }
  }
}

The Python server.py works as a fallback if you'd rather not use the binary.
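
If you go that route, the config entry points at Python instead of the binary (a sketch; adjust the path to wherever you cloned the repo, and the bare "python" command assumes it's on your PATH):

{
  "mcpServers": {
    "voice": {
      "command": "python",
      "args": ["path/to/server.py"]
    }
  }
}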


Building voice-mcp from source

The Rust source for voice-mcp.exe lives in voice-mcp/ at the root of this repo. To build it yourself instead of grabbing the prebuilt binary from releases:

cd voice-mcp
cargo build --release

The compiled binary lands in voice-mcp/target/release/voice-mcp.exe. Move it wherever you like and point claude_desktop_config.json at it.

You'll need a Rust toolchain installed. Building takes a couple of minutes on a modern machine. The release workflow on tag push builds both ARM64 and x64 Windows binaries automatically.


What's in the box

  • voice_server.py — the standalone HTTP server that does the actual listening, transcribing, and tone-reading
  • server.py — the MCP wrapper your AI client talks to; calls voice_server.py for input and edge-tts for output
  • response_analyzer.py — separate analyzer that reads emotion from text the AI says back
  • emotion_config.json — knobs for the response analyzer (which words count as excited, hedging, etc.)
  • play_audio.ps1 — Windows audio playback helper
  • START_VOICE_SERVER.bat — Windows launcher

Environment variables

  • VOICE_CONFIG_PATH — where your voice.config.toml lives (default: auto-discovered)
  • VOICE_FFMPEG_PATH — where ffmpeg lives (default: found via PATH)
  • VOICE_EMOTION_LOG_DIR — where emotion logs get written (default: ~/.voice/logs/)

Troubleshooting

pip install pyaudio fails on Windows ARM64. PyAudio doesn't ship native ARM64 wheels yet. Install x64 Python 3.11 alongside your ARM64 Python (Windows 11 ARM64 runs x64 Python under emulation just fine), and use the x64 interpreter to run voice_server.py. Or grab a precompiled ARM64 wheel from the unofficial PyAudio wheels page.

ffmpeg not found. Either add ffmpeg to your PATH or set the VOICE_FFMPEG_PATH environment variable to the full path of ffmpeg.exe. On Windows, winget install Gyan.FFmpeg is the easiest install.

Python 3.13 wheel mismatch. faster-whisper and pyaudio don't have Python 3.13 wheels yet at the time of writing. Stick with Python 3.11 or 3.12 until the dependency tree catches up.

Microphone permission denied. Check Windows Settings → Privacy & security → Microphone, and make sure both "Microphone access" and "Let apps access your microphone" are on. If you launched voice_server.py from a terminal, that terminal needs mic permission too.

MCP connector toggle is off. In Claude Desktop, Settings → Connectors → make sure the voice (Rust) or voice-command (Python) toggle is ON. If neither is on, your AI won't see the tools.

voice-command MCP can't reach the listener. The MCP wrappers (voice-mcp.exe or server.py) talk to voice_server.py over localhost:5123. Make sure the listening server window is open and showing output. Restart START_VOICE_SERVER.bat if it's quiet.

Whisper model download stalls on first run. The first time voice_server.py starts, faster-whisper downloads the Whisper model (~150 MB for the base model). This can stall on slow connections. Run voice_server.py once at install time and let the download finish before trying voice mode in your AI client.
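
You can also pre-warm the model from a Python shell without starting the server at all (assuming the base model mentioned above; constructing the model object is what triggers the one-time download):

from faster_whisper import WhisperModel

# Downloads the model to the local cache on first run, then loads it
model = WhisperModel("base")
print("Model cached and ready")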


License

Apache 2.0 — see LICENSE.
