Talk to your AI. Hear it work.
Voice-Command lets you voice-control your AI end-to-end. You say what you want done — Claude chat or another AI does it, using whatever tools, connectors, and MCPs it has access to — and narrates what it's doing as it goes. The "Command" isn't a euphemism. Anything the AI can do, you can ask for out loud: search the web, check your calendar, send email through your connectors, write or fix code, edit files on your computer, run shell commands, kick off automations. If it can do it typed, you can do it spoken.
Under the hood it uses faster-whisper to understand what you say (running fully on your own computer — your voice doesn't go to the cloud) and edge-tts to speak responses back. It also reads the feel of how you say things — excited, hesitant, frustrated — and passes that along so the AI can respond more naturally.
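In library terms, that's faster-whisper for the hearing side and edge-tts for the speaking side. A simplified sketch of the round trip (not the project's actual code; the model size, voice name, and file names here are placeholder choices):

```python
import asyncio
from faster_whisper import WhisperModel
import edge_tts

# Speech-to-text with faster-whisper: the model runs locally on your machine.
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, _info = model.transcribe("what_you_said.wav")
heard = " ".join(segment.text.strip() for segment in segments)

# Text-to-speech with edge-tts: synthesize the reply to a file for local playback.
async def speak(reply: str) -> None:
    await edge_tts.Communicate(reply, voice="en-US-AriaNeural").save("reply.mp3")

asyncio.run(speak(f"You said: {heard}"))
```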
Voice-Command and all its dependencies are local tools that run on your hardware. Speech-to-text, text-to-speech, audio capture, audio playback — all of it happens on your machine. Voice-Command itself never reaches out to the internet on its own. Your AI may reach out (Claude pings Anthropic, ChatGPT pings OpenAI, and so on), and the AI may use other tools that reach out (web search, email connectors), but the voice layer adds zero outbound traffic. For a fully-offline setup, pair it with a local model in LM Studio.
Currently Windows-only. For Linux, see the upstream AIWander/voice repo — there's a community fork there with Linux support. macOS support is coming.
Voice-Command is a STDIO MCP server, so it plugs into any AI client that speaks MCP. That includes:
- Claude (chat) — Claude Desktop and the web app
- Cowork — Claude's desktop agent
- Claude Code — the CLI coding agent
- Codex CLI — OpenAI / GPT
- Gemini CLI — Google
- LM Studio — for running local models (Llama, Qwen, Mistral, whatever you've loaded up)
- Anything else that can call a STDIO MCP server — the protocol is the only requirement
It doesn't care which model is on the other end. If your AI of choice can call MCP tools, you can talk to it.
- You hear a series of beeps. That's the AI's "I'm listening, your turn" cue.
- You talk. Say what you want done. Anything your AI has the tools to handle counts.
- The AI works — and tells you out loud what it's doing as it goes. ("Checking your calendar… found three events tomorrow… drafting the reply…")
- You hear the beeps again. The AI's done with that turn. Your move.
One thing to know up front: the audio flow is one-way at a time. You can't cut the AI off mid-sentence with your voice — once it's talking or working, the only way to interrupt is to click or tap directly in the AI's UI. The beeps are the only voice handoff signal.
Just tell the AI you're done — "I'm done talking," "let's talk later," "bye for now," anything in that family. Or hit the stop button in your AI's UI. Both work; saying it out loud is more graceful.
Anything your AI has the tools to do. A few examples to give you the shape of it:
- "Check my calendar for tomorrow and read me what's on it." → uses your Google Calendar connector
- "Search the web for the latest on [topic] and summarize." → uses web search
- "Read me the README in this folder." → uses local filesystem
- "Find the file we were editing yesterday and fix the bug we talked about." → uses filesystem + memory of past chats
- "Send an email to Sarah saying I'll be ten minutes late." → uses your Gmail connector
- "Run the deploy script and tell me when it's done." → uses shell access
The voice layer doesn't add capabilities — it just changes how you reach them. Whatever connectors, MCPs, and tools you've already hooked up to your AI all work the same. You're just using your mouth instead of your keyboard.
Voice-Command is most useful when your AI also has hands. These three companion MCPs are all local tools that live on your computer, all callable by voice once Voice-Command is wired up:
- `ops` — file and shell operations: read/write files, run commands, manage processes
- `hands` — browser automation, Windows UI control, vision/OCR
- `workflow` — API discovery and replay, credential vault, scheduled flows
Install any combination. Voice-Command is the mouth and ears; these are the rest of the body. All of them run locally, none of them reach out unless the AI explicitly asks them to.
This is the whole point. You shouldn't need a CS degree to get this running.
If you have Claude Desktop (with `ops` installed), Cowork, Claude Code, Codex CLI, or Gemini CLI open right now, copy this and paste it to your AI:
`https://github.com/AIWander/Voice-Command` — Can you install this MCP for us to use here, set up the voice listening server, and make me a `.bat` to launch it. Walk me through any restart or step I need to do. Tell me clearly when everything's installed and we're ready to talk.
Your AI will:
- Grab the right `voice-mcp.exe` for your computer (ARM64 or x64) from the latest release
- Drop it somewhere sensible (usually `C:\CPC\servers\`)
- Wire it into your AI client's MCP config file — your existing setup is preserved, and a timestamped backup is made first, so nothing breaks
- Clone this repo and install the Python pieces
- Write you a `START_VOICE_SERVER.bat` you can double-click whenever you want to talk
- Walk you through restarting your AI client and starting the listener
- Tell you when everything's ready
Then you're talking. Literally.
After install — starting voice mode is just asking for it. Once everything's running and you've restarted your AI client, you don't need to paste anything else. On Claude chat or Cowork, just ask the AI "let's talk" or "start voice mode" — it'll fire up the listening loop and you'll hear the beeps. Same on other MCP-capable clients.
Don't forget the connector toggle. In Claude Desktop and Cowork, MCP connectors have an on/off switch in Settings → Connectors. Make sure the voice MCP entry is toggled on after the restart, otherwise the AI won't see the speak/listen tools.
No Python on the machine? Voice-Command's listening server runs on Python 3.11. If you don't have Python installed, your AI can fetch it for you using `ops` — just ask. Without Python, the AI can still talk to you (text-to-speech works), but it can't hear you (no listening server); you need Python for the full two-way setup. On Windows ARM64 you'll specifically want x64 Python 3.11, since some dependencies don't ship ARM64 wheels yet.
Don't have an operator MCP yet? `ops` is the recommended one — public, lightweight, does file/shell work for any AI you want to give hands to. Install `ops` first, then come back and paste the prompt above. If you have `local`, `programmer`, or another operator MCP already, those work too.
If your AI doesn't have access to your filesystem and shell, scroll down to Manual installation below.
📸 Screenshot of the voice listening server in action — coming soon. Once you've installed Voice-Command, the server window will look something like this, with a beep cue when it's your turn and a live RMS readout while you talk.
- Windows 10 or 11 (Linux/macOS support not yet — see Platform support above)
- Python 3.11 or 3.12 — download here (3.13 isn't supported yet; see Troubleshooting below)
- A microphone — built-in or USB, doesn't need to be fancy
- PortAudio — a library that lets Python use your mic; usually installs automatically
- ffmpeg — for playing back the AI's voice; free, grab it here
If any of those words look scary, don't worry — your AI can handle all of this for you using the prompt at the top.
Clone the repo, then:
```
pip install -r requirements.txt
```
`pip install pyaudio` usually just works on Windows. If it complains, grab a wheel from the unofficial PyAudio wheels page.
winget install Gyan.FFmpeg, or download from ffmpeg.org. Make sure it's on your PATH, or set the VOICE_FFMPEG_PATH environment variable to point at it.
Setting up on Linux? The upstream `AIWander/voice` repo has a Linux fork with the equivalent install steps.
Start the voice server:
```
python voice_server.py
```
It runs at http://localhost:5123. You can also just double-click `START_VOICE_SERVER.bat`.
You mostly won't touch the endpoints directly — the AI calls them for you — but here they are:
| Endpoint | What it does |
|---|---|
| `GET /status` | Health check |
| `POST /listen?timeout=30` | Records, transcribes, reads emotion |
Optional knobs you can pass to /listen:
- `skip_emotion=true` — don't bother with emotion detection
- `skip_filter=true` — turn off noise filtering
- `silence_timeout=4.0` — how long of a pause before it stops listening
- `min_speech_duration=3.0` — how long you need to talk before it'll consider stopping
- `rms_threshold=100` — how loud counts as "talking" (20–500)
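A quick way to poke at these endpoints yourself is a couple of lines of Python. This is a rough smoke-test sketch; the exact shape of the JSON the server returns is an assumption here, so print it and look:

```python
import requests

BASE = "http://localhost:5123"

# Health check: should respond as long as voice_server.py is running.
print(requests.get(f"{BASE}/status", timeout=5).json())

# One listen turn with a couple of the optional knobs tweaked.
resp = requests.post(
    f"{BASE}/listen",
    params={
        "timeout": 30,            # overall cap on the listen window
        "silence_timeout": 2.5,   # stop sooner after you pause
        "skip_emotion": "true",   # skip emotion detection for a faster turn
    },
    timeout=90,                   # give the HTTP call room for the full recording
)
print(resp.json())                # response fields aren't documented above, so inspect them
```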
Drop a file called voice.config.toml next to the script (or set VOICE_CONFIG_PATH to point at it):
```toml
[listen]
silence_timeout_secs = 4.0
min_speech_duration_secs = 3.0
rms_threshold = 100
noise_filter_enabled = true
pre_record_enabled = true
```
It looks for config in this order:
1. `VOICE_CONFIG_PATH` environment variable
2. `./voice.config.toml` (current directory)
3. `~/.config/voice/voice.config.toml`
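The discovery itself is just a first-match-wins walk over those three locations, roughly like this (an illustrative sketch, not the code in voice_server.py):

```python
import os
from pathlib import Path

def find_config() -> Path | None:
    """Return the first voice.config.toml that exists, or None to use defaults."""
    candidates = [
        os.environ.get("VOICE_CONFIG_PATH"),                       # 1. explicit override
        Path.cwd() / "voice.config.toml",                          # 2. current directory
        Path.home() / ".config" / "voice" / "voice.config.toml",   # 3. user config dir
    ]
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None
```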
`server.py` is a thin wrapper that exposes three tools to any MCP client:
- `speak` — say something out loud
- `listen_for_speech` — listen for what the user says
- `start_voice_mode` — kick off a back-and-forth conversation
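If you're curious what a wrapper like that looks like, here's a minimal STDIO MCP server in the same spirit, written against the official Python MCP SDK's FastMCP helper. This is an illustration only, not necessarily how the repo's server.py is actually structured:

```python
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("voice-command")

@mcp.tool()
def listen_for_speech(timeout: int = 30) -> str:
    """Ask the local listening server for one transcribed utterance."""
    resp = requests.post(
        "http://localhost:5123/listen",
        params={"timeout": timeout},
        timeout=timeout + 30,
    )
    return resp.text

@mcp.tool()
def speak(text: str) -> str:
    """Say something out loud (stub: the real wrapper synthesizes and plays audio)."""
    return f"(spoke: {text})"

# start_voice_mode, the third tool, would loop speak/listen until the user says goodbye.

if __name__ == "__main__":
    mcp.run()  # serves the tools over STDIO, which is what MCP clients expect
```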
For everyday use, there's a Rust version of that wrapper (voice-mcp.exe) that's faster and more stable. It comes as release downloads (ARM64 + x64). Add this to your client's MCP server config (e.g. claude_desktop_config.json for Claude Desktop):
```json
{
  "mcpServers": {
    "voice": {
      "command": "path/to/voice-mcp.exe"
    }
  }
}
```
The Python `server.py` works as a fallback if you'd rather not use the binary.
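If you do fall back to the Python wrapper, the config entry is the same shape, just pointing at Python. A sketch, with the path being whatever your clone actually lives at:

```json
{
  "mcpServers": {
    "voice-command": {
      "command": "python",
      "args": ["C:/path/to/Voice-Command/server.py"]
    }
  }
}
```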
The Rust source for voice-mcp.exe lives in voice-mcp/ at the root of this repo. To build it yourself instead of grabbing the prebuilt binary from releases:
```
cd voice-mcp
cargo build --release
```
The compiled binary lands in `voice-mcp/target/release/voice-mcp.exe`. Move it wherever you like and point `claude_desktop_config.json` at it.
You'll need a Rust toolchain installed. Building takes a couple of minutes on a modern machine. The release workflow on tag push builds both ARM64 and x64 Windows binaries automatically.
- `voice_server.py` — the standalone HTTP server that does the actual listening, transcribing, and tone-reading
- `server.py` — the MCP wrapper your AI client talks to; calls `voice_server.py` for input and edge-tts for output
- `response_analyzer.py` — separate analyzer that reads emotion from text the AI says back
- `emotion_config.json` — knobs for the response analyzer (which words count as excited, hedging, etc.)
- `play_audio.ps1` — Windows audio playback helper
- `START_VOICE_SERVER.bat` — Windows launcher
| Variable | What it's for | Default |
|---|---|---|
| `VOICE_CONFIG_PATH` | Where your `voice.config.toml` lives | Auto-discovered |
| `VOICE_FFMPEG_PATH` | Where ffmpeg lives | Found via PATH |
| `VOICE_EMOTION_LOG_DIR` | Where emotion logs get written | `~/.voice/logs/` |
pip install pyaudio fails on Windows ARM64.
PyAudio doesn't ship native ARM64 wheels yet. Install x64 Python 3.11 alongside your ARM64 Python (Windows 11 ARM64 runs x64 Python under emulation just fine), and use the x64 interpreter to run voice_server.py. Or grab a precompiled ARM64 wheel from the unofficial PyAudio wheels page.
ffmpeg not found.
Either add ffmpeg to your PATH or set the VOICE_FFMPEG_PATH environment variable to the full path of ffmpeg.exe. On Windows, winget install Gyan.FFmpeg is the easiest install.
Python 3.13 wheel mismatch.
faster-whisper and pyaudio don't have Python 3.13 wheels yet at the time of writing. Stick with Python 3.11 or 3.12 until the dependency tree catches up.
Microphone permission denied.
Check Windows Settings → Privacy & security → Microphone, and make sure both "Microphone access" and "Let apps access your microphone" are on. If you launched voice_server.py from a terminal, that terminal needs mic permission too.
MCP connector toggle is off.
In Claude Desktop, Settings → Connectors → make sure the voice (Rust) or voice-command (Python) toggle is ON. If neither is on, your AI won't see the tools.
voice-command MCP can't reach the listener.
The MCP wrappers (voice-mcp.exe or server.py) talk to voice_server.py over localhost:5123. Make sure the listening server window is open and showing output. Restart START_VOICE_SERVER.bat if it's quiet.
Whisper model download stalls on first run.
The first time voice_server.py starts, faster-whisper downloads the Whisper model (~150 MB for the base model). This can stall on slow connections. Run voice_server.py once at install time and let the download finish before trying voice mode in your AI client.
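If you'd rather warm that cache up front yourself, instantiating the model once from faster-whisper does the download (assuming the default base model; the server may be configured for a different size):

```python
from faster_whisper import WhisperModel

# Creating the model pulls it into the local cache; later runs of
# voice_server.py reuse the cached copy without touching the network.
WhisperModel("base", device="cpu", compute_type="int8")
```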
Apache 2.0 — see LICENSE.