Voice input/output for Claude Desktop — real-time speech-to-text via faster-whisper with biquad noise filtering and rule-based emotion detection, paired with text-to-speech via edge-tts. Runs as a local HTTP server on localhost:5123 and exposes MCP tools for Claude Desktop integration.
If you already have an MCP with local filesystem access installed in Claude Desktop, or you're sitting in Claude Code / Codex CLI / Gemini CLI, paste this prompt:

> https://github.com/AIWander/voice — Can you install this MCP for us to use here and the voice listening server, and make me a new `.bat` to call it and direct me to do what I need to do to get both sides running, then we can have a talk.
Your AI will:

- Download the matching pre-built `voice-mcp.exe` for your architecture (ARM64 or x64) from releases/latest.
- Drop it somewhere sensible (usually `C:\CPC\servers\`).
- Edit your `claude_desktop_config.json` to register it as an MCP server (your existing servers are preserved; a timestamped backup is made first).
- Clone this repo and install the Python listener requirements (`pip install -r requirements.txt`).
- Write you a `START_VOICE_SERVER.bat` that launches `python voice_server.py` on boot.
- Tell you to quit and reopen Claude Desktop, start the listener, and you're talking.
If your AI doesn't have filesystem/shell access, fall back to the manual install below.
- Speech-to-text via faster-whisper (base model, int8 quantized)
- Biquad noise filtering — 80 Hz highpass + 7.5 kHz lowpass removes hum and hiss
- Emotion detection — rule-based classifier from audio features (energy, pitch variance, ZCR, spectral centroid)
- Triple beep indicator when recording starts
- Real-time level monitoring — RMS printed per chunk
- Configurable — silence timeout, RMS threshold, min speech duration via query params or TOML config
- End-phrase stripping — automatically removes "send this", "done", "stop", etc.
- Response emotion analysis — text-based hedge/excitement/engagement scoring
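To make the noise-filtering feature concrete, here is a minimal, dependency-free sketch of an 80 Hz highpass + 7.5 kHz lowpass biquad chain using the standard RBJ cookbook coefficient formulas. This is an illustration of the technique, not the actual filter code in `voice_server.py`:

```python
import math

def biquad_coeffs(kind: str, fc: float, fs: float, q: float = 0.707):
    """RBJ cookbook biquad coefficients for a highpass or lowpass filter."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    if kind == "highpass":
        b0, b1, b2 = (1 + cosw) / 2, -(1 + cosw), (1 + cosw) / 2
    else:  # lowpass
        b0, b1, b2 = (1 - cosw) / 2, 1 - cosw, (1 - cosw) / 2
    a0, a1, a2 = 1 + alpha, -2 * cosw, 1 - alpha
    return [b0 / a0, b1 / a0, b2 / a0], [a1 / a0, a2 / a0]

def biquad_filter(samples, b, a):
    """Direct Form I: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

fs = 16000
# One second of "audio": 50 Hz mains hum plus a 1 kHz speech-band tone
t = [i / fs for i in range(fs)]
noisy = [math.sin(2 * math.pi * 50 * ti) + math.sin(2 * math.pi * 1000 * ti) for ti in t]

hp_b, hp_a = biquad_coeffs("highpass", 80.0, fs)   # removes hum below 80 Hz
lp_b, lp_a = biquad_coeffs("lowpass", 7500.0, fs)  # removes hiss above 7.5 kHz
cleaned = biquad_filter(biquad_filter(noisy, hp_b, hp_a), lp_b, lp_a)
```

The 50 Hz hum is attenuated to roughly a third of its amplitude while the 1 kHz content passes through essentially untouched.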
- Python 3.11+
- PortAudio (system library, required by PyAudio)
- ffmpeg (for MP3→WAV conversion in TTS playback)
- A working microphone
```
pip install -r requirements.txt
```

PyAudio (needs the PortAudio system library):

- Windows: `pip install pyaudio` usually works. If not, download from PyAudio wheels.
- macOS: `brew install portaudio && pip install pyaudio`
- Linux: `sudo apt install portaudio19-dev && pip install pyaudio`

ffmpeg:

- Windows: `winget install Gyan.FFmpeg` or download from ffmpeg.org. Ensure `ffmpeg` is on your PATH, or set the `VOICE_FFMPEG_PATH` environment variable.
- macOS: `brew install ffmpeg`
- Linux: `sudo apt install ffmpeg`
```shell
# Start the voice server
python voice_server.py

# Server runs on http://localhost:5123
# Endpoints:
#   GET  /status            - Health check
#   POST /listen?timeout=30 - Record + transcribe + emotion
#     &skip_emotion=true        - Skip emotion detection
#     &skip_filter=true         - Skip noise filtering
#     &silence_timeout=4.0      - Silence cutoff seconds
#     &min_speech_duration=3.0  - Min speech before checking silence
#     &rms_threshold=100        - Loudness floor (20-500)
```

On Windows, you can also double-click `START_VOICE_SERVER.bat`.
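A quick client-side sketch using only the standard library, building a `/listen` request from the query parameters documented above (the server must be running for the commented-out POST to succeed):

```python
from urllib.parse import urlencode

BASE = "http://localhost:5123"

def listen_url(timeout: int = 30, **params) -> str:
    """Build a /listen URL from the documented query parameters."""
    query = {"timeout": timeout, **params}
    return f"{BASE}/listen?{urlencode(query)}"

url = listen_url(timeout=30, silence_timeout=4.0, rms_threshold=100)

# With the server running, POST it and decode the JSON reply:
#   import json, urllib.request
#   req = urllib.request.Request(url, method="POST")
#   result = json.load(urllib.request.urlopen(req))
```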
Create `voice.config.toml` in the repo directory (or set the `VOICE_CONFIG_PATH` environment variable) to override defaults:
```toml
[listen]
silence_timeout_secs = 4.0
min_speech_duration_secs = 3.0
rms_threshold = 100
noise_filter_enabled = true
pre_record_enabled = true
```

Config lookup order:

1. `VOICE_CONFIG_PATH` environment variable
2. `./voice.config.toml` (current directory)
3. `~/.config/voice/voice.config.toml`
`server.py` provides MCP tool wrappers (`speak`, `listen_for_speech`, `start_voice_mode`) for Claude Desktop integration via the FastMCP framework.
For production Claude Desktop use, companion Rust MCP binaries (`voice-mcp.exe`) are available as release assets (ARM64 + x64). These wrap the same voice server with Rust-native MCP transport and checkpoint recovery. Add to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "voice": {
      "command": "path/to/voice-mcp.exe"
    }
  }
}
```

The Python `server.py` serves as a pure-Python MCP fallback if you prefer not to use the Rust binary.
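If you go the pure-Python route, the fallback entry would plausibly look like this (the interpreter and script path below are placeholders for your own install locations, not values from this repo):

```json
{
  "mcpServers": {
    "voice": {
      "command": "python",
      "args": ["path/to/server.py"]
    }
  }
}
```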
- `voice_server.py` — Standalone HTTP server (faster-whisper STT + noise filter + emotion detection)
- `server.py` — MCP tool wrapper that delegates to voice_server.py for STT and uses edge-tts for TTS
- `response_analyzer.py` — Text-based response emotion analysis (separate from audio emotion)
- `emotion_config.json` — Thresholds and word lists for response analyzer
- `play_audio.ps1` — PowerShell audio playback helper (Windows)
- `START_VOICE_SERVER.bat` — Windows launcher script
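To illustrate the shape of the text-based response analysis, here is a toy scorer. The word lists and scoring below are invented for the example; the real thresholds and word lists live in `emotion_config.json`:

```python
# Toy illustration only: not the word lists shipped in emotion_config.json
HEDGES = {"maybe", "perhaps", "might", "possibly", "somewhat"}
EXCITEMENT = {"great", "amazing", "love", "fantastic", "wow"}

def score_response(text: str) -> dict:
    """Score a response for hedging and excitement as word-list hit rates."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    n = max(len(words), 1)
    return {
        "hedge": sum(w in HEDGES for w in words) / n,
        "excitement": sum(w in EXCITEMENT for w in words) / n,
        "exclamations": text.count("!"),
    }
```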
| Variable | Purpose | Default |
|---|---|---|
| `VOICE_CONFIG_PATH` | Path to voice.config.toml | Auto-discovered |
| `VOICE_FFMPEG_PATH` | Path to ffmpeg binary | Auto-discovered via PATH |
| `VOICE_EMOTION_LOG_DIR` | Directory for emotion analysis logs | `~/.voice/logs/` |
Apache 2.0 — see LICENSE