Talk to your AI. Hear it work.
Voice-Command lets you voice-control your AI end-to-end. You say what you want done — Claude chat or another AI does it, using whatever tools, connectors, and MCPs it has access to — and narrates what it's doing as it goes. The "Command" isn't a euphemism. Anything the AI can do, you can ask for out loud: search the web, check your calendar, send email through your connectors, write or fix code, edit files on your computer, run shell commands, kick off automations. If it can do it typed, you can do it spoken.
Under the hood it uses faster-whisper to understand what you say (running fully on your own computer — your voice doesn't go to the cloud) and edge-tts to speak responses back. It also reads the feel of how you say things — excited, hesitant, frustrated — and passes that along so the AI can respond more naturally.
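In library terms, that's faster-whisper for the hearing side and edge-tts for the speaking side. A simplified sketch of the round trip (not the project's actual code; the model size, voice name, and file names here are placeholder choices):

```python
import asyncio
from faster_whisper import WhisperModel
import edge_tts

# Speech-to-text with faster-whisper: the model runs locally on your machine.
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, _info = model.transcribe("what_you_said.wav")
heard = " ".join(segment.text.strip() for segment in segments)

# Text-to-speech with edge-tts: synthesize the reply to a file for local playback.
async def speak(reply: str) -> None:
    await edge_tts.Communicate(reply, voice="en-US-AriaNeural").save("reply.mp3")

asyncio.run(speak(f"You said: {heard}"))
```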
Voice-Command and all its dependencies are local tools that run on your hardware. Speech-to-text, text-to-speech, audio capture, audio playback — all of it happens on your machine. Voice-Command itself never reaches out to the internet on its own. Your AI may reach out (Claude pings Anthropic, ChatGPT pings OpenAI, and so on), and the AI may use other tools that reach out (web search, email connectors), but the voice layer adds zero outbound traffic. For a fully-offline setup, pair it with a local model in LM Studio.
Currently Windows-only. For Linux, see the upstream AIWander/voice repo — there's a community fork there with Linux support. macOS support is coming.
Voice-Command is a STDIO MCP server, so it plugs into any AI client that speaks MCP. That includes:
- Claude (chat) — Claude Desktop and the web app
- Cowork — Claude's desktop agent
- Claude Code — the CLI coding agent
- Codex CLI — OpenAI / GPT
- Gemini CLI — Google
- LM Studio — for running local models (Llama, Qwen, Mistral, whatever you've loaded up)
- Anything else that can call a STDIO MCP server — the protocol is the only requirement
It doesn't care which model is on the other end. If your AI of choice can call MCP tools, you can talk to it.
- You hear a series of beeps. That's the AI's "I'm listening, your turn" cue.
- You talk. Say what you want done. Anything your AI has the tools to handle counts.
- The AI works — and tells you out loud what it's doing as it goes. ("Checking your calendar… found three events tomorrow… drafting the reply…")
- You hear the beeps again. The AI's done with that turn. Your move.
One thing to know up front: the audio flow is one-way at a time. You can't cut the AI off mid-sentence with your voice — once it's talking or working, the only way to interrupt is to click or tap directly in the AI's UI. The beeps are the only voice handoff signal.
Just tell the AI you're done — "I'm done talking," "let's talk later," "bye for now," anything in that family. Or hit the stop button in your AI's UI. Both work; saying it out loud is more graceful.
Anything your AI has the tools to do. A few examples to give you the shape of it:
- "Check my calendar for tomorrow and read me what's on it." → uses your Google Calendar connector
- "Search the web for the latest on [topic] and summarize." → uses web search
- "Read me the README in this folder." → uses local filesystem
- "Find the file we were editing yesterday and fix the bug we talked about." → uses filesystem + memory of past chats
- "Send an email to Sarah saying I'll be ten minutes late." → uses your Gmail connector
- "Run the deploy script and tell me when it's done." → uses shell access
The voice layer doesn't add capabilities — it just changes how you reach them. Whatever connectors, MCPs, and tools you've already hooked up to your AI all work the same. You're just using your mouth instead of your keyboard.
Voice-Command is most useful when your AI also has hands. These three companion MCPs are all local tools that live on your computer, all callable by voice once Voice-Command is wired up:
- `ops` — file and shell operations: read/write files, run commands, manage processes
- `hands` — browser automation, Windows UI control, vision/OCR
- `workflow` — API discovery and replay, credential vault, scheduled flows
Install any combination. Voice-Command is the mouth and ears; these are the rest of the body. All of them run locally, none of them reach out unless the AI explicitly asks them to.
This is the whole point. You shouldn't need a CS degree to get this running.
If you have Claude Desktop (with `ops` installed), Cowork, Claude Code, Codex CLI, or Gemini CLI open right now, copy this and paste it to your AI:
`https://github.com/AIWander/Voice-Command` — Can you install this MCP for us to use here, set up the voice listening server, and make me a `.bat` to launch it. Walk me through any restart or step I need to do. Tell me clearly when everything's installed and we're ready to talk.
Your AI will:
- Grab the right `voice-mcp.exe` for your computer (ARM64 or x64) from the latest release
- Drop it somewhere sensible (usually `C:\CPC\servers\`)
- Wire it into your AI client's MCP config file — your existing setup is preserved, and a timestamped backup is made first, so nothing breaks
- Clone this repo and install the Python pieces
- Write you a `START_VOICE_SERVER.bat` you can double-click whenever you want to talk
- Walk you through restarting your AI client and starting the listener
- Tell you when everything's ready
Then you're talking. Literally.
After install — starting voice mode is just asking for it. Once everything's running and you've restarted your AI client, you don't need to paste anything else. On Claude chat or Cowork, just ask the AI "let's talk" or "start voice mode" — it'll fire up the listening loop and you'll hear the beeps. Same on other MCP-capable clients.
Don't forget the connector toggle. In Claude Desktop and Cowork, MCP connectors have an on/off switch in Settings → Connectors. Make sure the voice MCP entry is toggled on after the restart, otherwise the AI won't see the speak/listen tools.
No Python on the machine? Voice-Command's listening server runs on Python 3.11. If you don't have Python installed, your AI can fetch it for you using `ops` — just ask. Without Python, the AI can still talk to you (text-to-speech works), but it can't hear you (no listening server); you need Python for the full two-way setup. On Windows ARM64 you'll specifically want x64 Python 3.11, since some dependencies don't ship ARM64 wheels yet.
Don't have an operator MCP yet? `ops` is the recommended one — public, lightweight, does file/shell work for any AI you want to give hands to. Install `ops` first, then come back and paste the prompt above. If you have `local`, `programmer`, or another operator MCP already, those work too.
If your AI doesn't have access to your filesystem and shell, scroll down to Manual installation below.
📸 Screenshot of the voice listening server in action — coming soon. Once you've installed Voice-Command, the server window will look something like this, with a beep cue when it's your turn and a live RMS readout while you talk.
- Windows 10 or 11 (Linux/macOS support not yet — see Platform support above)
- Python 3.11 or 3.12 — download here (3.13 isn't supported yet; see Troubleshooting below)
- A microphone — built-in or USB, doesn't need to be fancy
- PortAudio — a library that lets Python use your mic; usually installs automatically
- ffmpeg — for playing back the AI's voice; free, grab it here
If any of those words look scary, don't worry — your AI can handle all of this for you using the prompt at the top.
Clone the repo, then:
```
pip install -r requirements.txt
```
`pip install pyaudio` usually just works on Windows. If it complains, grab a wheel from the unofficial PyAudio wheels page.
winget install Gyan.FFmpeg, or download from ffmpeg.org. Make sure it's on your PATH, or set the VOICE_FFMPEG_PATH environment variable to point at it.
Setting up on Linux? The upstream `AIWander/voice` repo has a Linux fork with the equivalent install steps.
Start the voice server:
```
python voice_server.py
```
It runs at http://localhost:5123. You can also just double-click `START_VOICE_SERVER.bat`.
You mostly won't touch the endpoints directly — the AI calls them for you — but here they are:
| Endpoint | What it does |
|---|---|
| `GET /status` | Health check |
| `POST /listen?timeout=30` | Records, transcribes, reads emotion |
Optional knobs you can pass to /listen:
- `skip_emotion=true` — don't bother with emotion detection
- `skip_filter=true` — turn off noise filtering
- `silence_timeout=4.0` — how long of a pause before it stops listening
- `min_speech_duration=3.0` — how long you need to talk before it'll consider stopping
- `rms_threshold=100` — how loud counts as "talking" (20–500)
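A quick way to poke at these endpoints yourself is a couple of lines of Python. This is a rough smoke-test sketch; the exact shape of the JSON the server returns is an assumption here, so print it and look:

```python
import requests

BASE = "http://localhost:5123"

# Health check: should respond as long as voice_server.py is running.
print(requests.get(f"{BASE}/status", timeout=5).json())

# One listen turn with a couple of the optional knobs tweaked.
resp = requests.post(
    f"{BASE}/listen",
    params={
        "timeout": 30,            # overall cap on the listen window
        "silence_timeout": 2.5,   # stop sooner after you pause
        "skip_emotion": "true",   # skip emotion detection for a faster turn
    },
    timeout=90,                   # give the HTTP call room for the full recording
)
print(resp.json())                # response fields aren't documented above, so inspect them
```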
Drop a file called voice.config.toml next to the script (or set VOICE_CONFIG_PATH to point at it):
```toml
[listen]
silence_timeout_secs = 4.0
min_speech_duration_secs = 3.0
rms_threshold = 100
noise_filter_enabled = true
pre_record_enabled = true
```
It looks for config in this order:
1. `VOICE_CONFIG_PATH` environment variable
2. `./voice.config.toml` (current directory)
3. `~/.config/voice/voice.config.toml`
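The discovery itself is just a first-match-wins walk over those three locations, roughly like this (an illustrative sketch, not the code in voice_server.py):

```python
import os
from pathlib import Path

def find_config() -> Path | None:
    """Return the first voice.config.toml that exists, or None to use defaults."""
    candidates = [
        os.environ.get("VOICE_CONFIG_PATH"),                       # 1. explicit override
        Path.cwd() / "voice.config.toml",                          # 2. current directory
        Path.home() / ".config" / "voice" / "voice.config.toml",   # 3. user config dir
    ]
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None
```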
`server.py` is a thin wrapper that exposes three tools to any MCP client:
- `speak` — say something out loud
- `listen_for_speech` — listen for what the user says
- `start_voice_mode` — kick off a back-and-forth conversation
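If you're curious what a wrapper like that looks like, here's a minimal STDIO MCP server in the same spirit, written against the official Python MCP SDK's FastMCP helper. This is an illustration only, not necessarily how the repo's server.py is actually structured:

```python
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("voice-command")

@mcp.tool()
def listen_for_speech(timeout: int = 30) -> str:
    """Ask the local listening server for one transcribed utterance."""
    resp = requests.post(
        "http://localhost:5123/listen",
        params={"timeout": timeout},
        timeout=timeout + 30,
    )
    return resp.text

@mcp.tool()
def speak(text: str) -> str:
    """Say something out loud (stub: the real wrapper synthesizes and plays audio)."""
    return f"(spoke: {text})"

# start_voice_mode, the third tool, would loop speak/listen until the user says goodbye.

if __name__ == "__main__":
    mcp.run()  # serves the tools over STDIO, which is what MCP clients expect
```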
For everyday use, there's a Rust version of that wrapper (voice-mcp.exe) that's faster and more stable. It comes as release downloads (ARM64 + x64). Add this to your client's MCP server config (e.g. claude_desktop_config.json for Claude Desktop):
```json
{
  "mcpServers": {
    "voice": {
      "command": "path/to/voice-mcp.exe"
    }
  }
}
```
The Python `server.py` works as a fallback if you'd rather not use the binary.
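If you do fall back to the Python wrapper, the config entry is the same shape, just pointing at Python. A sketch, with the path being whatever your clone actually lives at:

```json
{
  "mcpServers": {
    "voice-command": {
      "command": "python",
      "args": ["C:/path/to/Voice-Command/server.py"]
    }
  }
}
```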
The Rust source for voice-mcp.exe lives in voice-mcp/ at the root of this repo. To build it yourself instead of grabbing the prebuilt binary from releases:
```
cd voice-mcp
cargo build --release
```
The compiled binary lands in `voice-mcp/target/release/voice-mcp.exe`. Move it wherever you like and point `claude_desktop_config.json` at it.
You'll need a Rust toolchain installed. Building takes a couple of minutes on a modern machine. The release workflow on tag push builds both ARM64 and x64 Windows binaries automatically.
- `voice_server.py` — the standalone HTTP server that does the actual listening, transcribing, and tone-reading
- `server.py` — the MCP wrapper your AI client talks to; calls `voice_server.py` for input and edge-tts for output
- `response_analyzer.py` — separate analyzer that reads emotion from text the AI says back
- `emotion_config.json` — knobs for the response analyzer (which words count as excited, hedging, etc.)
- `play_audio.ps1` — Windows audio playback helper
- `START_VOICE_SERVER.bat` — Windows launcher
| Variable | What it's for | Default |
|---|---|---|
| `VOICE_CONFIG_PATH` | Where your `voice.config.toml` lives | Auto-discovered |
| `VOICE_FFMPEG_PATH` | Where ffmpeg lives | Found via PATH |
| `VOICE_EMOTION_LOG_DIR` | Where emotion logs get written | `~/.voice/logs/` |
pip install pyaudio fails on Windows ARM64.
PyAudio doesn't ship native ARM64 wheels yet. Install x64 Python 3.11 alongside your ARM64 Python (Windows 11 ARM64 runs x64 Python under emulation just fine), and use the x64 interpreter to run voice_server.py. Or grab a precompiled ARM64 wheel from the unofficial PyAudio wheels page.
ffmpeg not found.
Either add ffmpeg to your PATH or set the VOICE_FFMPEG_PATH environment variable to the full path of ffmpeg.exe. On Windows, winget install Gyan.FFmpeg is the easiest install.
Python 3.13 wheel mismatch.
faster-whisper and pyaudio don't have Python 3.13 wheels yet at the time of writing. Stick with Python 3.11 or 3.12 until the dependency tree catches up.
Microphone permission denied.
Check Windows Settings → Privacy & security → Microphone, and make sure both "Microphone access" and "Let apps access your microphone" are on. If you launched voice_server.py from a terminal, that terminal needs mic permission too.
MCP connector toggle is off.
In Claude Desktop, Settings → Connectors → make sure the voice (Rust) or voice-command (Python) toggle is ON. If neither is on, your AI won't see the tools.
voice-command MCP can't reach the listener.
The MCP wrappers (voice-mcp.exe or server.py) talk to voice_server.py over localhost:5123. Make sure the listening server window is open and showing output. Restart START_VOICE_SERVER.bat if it's quiet.
Whisper model download stalls on first run.
The first time voice_server.py starts, faster-whisper downloads the Whisper model (~150 MB for the base model). This can stall on slow connections. Run voice_server.py once at install time and let the download finish before trying voice mode in your AI client.
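If you'd rather warm that cache up front yourself, instantiating the model once from faster-whisper does the download (assuming the default base model; the server may be configured for a different size):

```python
from faster_whisper import WhisperModel

# Creating the model pulls it into the local cache; later runs of
# voice_server.py reuse the cached copy without touching the network.
WhisperModel("base", device="cpu", compute_type="int8")
```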
Apache 2.0 — see LICENSE.