Voice Server

Voice input/output for Claude Desktop — real-time speech-to-text via faster-whisper with biquad noise filtering and rule-based emotion detection, paired with text-to-speech via edge-tts. Runs as a local HTTP server on localhost:5123 and exposes MCP tools for Claude Desktop integration.

Install — just ask your AI

If you already have local installed in Claude Desktop, or you're sitting in Claude Code / Codex CLI / Gemini CLI, paste this prompt:

https://github.com/AIWander/voice — Can you install this MCP for us to use here and the voice listening server, and make me a new .bat to call it and direct me to do what I need to do to get both sides running, then we can have a talk.

Your AI will:

Download the matching pre-built voice-mcp.exe for your architecture (ARM64 or x64) from releases/latest.
Drop it somewhere sensible (usually C:\CPC\servers\).
Edit your claude_desktop_config.json to register it as an MCP server (your existing servers are preserved; a timestamped backup is made first).
Clone this repo and install the Python listener requirements (pip install -r requirements.txt).
Write you a START_VOICE_SERVER.bat that launches python voice_server.py on boot.
Tell you to quit and reopen Claude Desktop and start the listener. After that, you're talking.

If your AI doesn't have filesystem/shell access, fall back to the manual install below.

Features

Speech-to-text via faster-whisper (base model, int8 quantized)
Biquad noise filtering — 80 Hz highpass + 7.5 kHz lowpass removes hum and hiss
Emotion detection — rule-based classifier from audio features (energy, pitch variance, ZCR, spectral centroid)
Triple beep indicator when recording starts
Real-time level monitoring — RMS printed per chunk
Configurable — silence timeout, RMS threshold, min speech duration via query params or TOML config
End-phrase stripping — automatically removes "send this", "done", "stop", etc.
Response emotion analysis — text-based hedge/excitement/engagement scoring
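
To make the noise-filtering feature concrete, here is a minimal sketch of an 80 Hz highpass followed by a 7.5 kHz lowpass built from biquad sections. It is not taken from voice_server.py. The coefficient formulas are the widely used Audio EQ Cookbook ones, and the 16 kHz sample rate, the Q of 0.7071, and all function names are assumptions made for illustration.

```python
import math


def biquad_coeffs(kind, cutoff_hz, sample_rate, q=0.7071):
    """Return normalized (b0, b1, b2, a1, a2) for a second-order filter.

    Formulas follow the Audio EQ Cookbook; q=0.7071 (Butterworth) is an
    assumption, not a value read from the project.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "highpass":
        b0 = (1.0 + cos_w0) / 2.0
        b1 = -(1.0 + cos_w0)
    elif kind == "lowpass":
        b0 = (1.0 - cos_w0) / 2.0
        b1 = 1.0 - cos_w0
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    b2 = b0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    return (b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0)


def biquad_filter(samples, coeffs):
    """Apply one biquad section (direct form I) to a sequence of floats."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def denoise(samples, sample_rate=16000):
    """Chain the 80 Hz highpass and 7.5 kHz lowpass named in the feature list.

    The 16 kHz default sample rate is an assumption.
    """
    highpass = biquad_coeffs("highpass", 80.0, sample_rate)
    lowpass = biquad_coeffs("lowpass", 7500.0, sample_rate)
    return biquad_filter(biquad_filter(samples, highpass), lowpass)


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

A single second-order section rolls off at 12 dB per octave, so mains hum just below 80 Hz is reduced rather than eliminated. A steeper filter would need cascaded sections.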

Requirements

Python 3.11+
PortAudio (system library, required by PyAudio)
ffmpeg (for MP3→WAV conversion in TTS playback)
A working microphone

Manual installation

pip install -r requirements.txt

PortAudio

Windows: pip install pyaudio usually works. If not, download from PyAudio wheels.
macOS: brew install portaudio && pip install pyaudio
Linux: sudo apt install portaudio19-dev && pip install pyaudio

ffmpeg

Windows: winget install Gyan.FFmpeg or download from ffmpeg.org. Ensure ffmpeg is on your PATH, or set the VOICE_FFMPEG_PATH environment variable.
macOS: brew install ffmpeg
Linux: sudo apt install ffmpeg

Quick Start

# Start the voice server
python voice_server.py

# Server runs on http://localhost:5123
# Endpoints:
#   GET  /status              - Health check
#   POST /listen?timeout=30   - Record + transcribe + emotion
#        &skip_emotion=true   - Skip emotion detection
#        &skip_filter=true    - Skip noise filtering
#        &silence_timeout=4.0 - Silence cutoff seconds
#        &min_speech_duration=3.0 - Min speech before checking silence
#        &rms_threshold=100   - Loudness floor (20-500)

On Windows, you can also double-click START_VOICE_SERVER.bat.
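
The endpoints listed above can also be called from a script. The sketch below uses only the Python standard library. The paths and query parameters come from the list above, while the helper names are hypothetical. The response format is not documented here, so the body is returned as raw text.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5123"


def build_listen_url(timeout=30, **options):
    """Build the /listen URL, rendering booleans as 'true'/'false'."""
    params = {"timeout": timeout}
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[key] = value
    return f"{BASE_URL}/listen?{urllib.parse.urlencode(params)}"


def get_status():
    """GET /status and return the raw response body."""
    with urllib.request.urlopen(f"{BASE_URL}/status", timeout=5) as resp:
        return resp.read().decode("utf-8")


def listen(timeout=30, **options):
    """POST /listen and return the raw response body.

    The extra 30 seconds on the socket timeout is an arbitrary allowance
    for transcription time, not a documented value.
    """
    url = build_listen_url(timeout, **options)
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout + 30) as resp:
        return resp.read().decode("utf-8")


# Example (requires the voice server to be running):
#   print(get_status())
#   print(listen(timeout=30, silence_timeout=4.0, skip_emotion=True))
```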

Configuration

Create voice.config.toml in the repo directory (or set VOICE_CONFIG_PATH env var) to override defaults:

[listen]
silence_timeout_secs = 4.0
min_speech_duration_secs = 3.0
rms_threshold = 100
noise_filter_enabled = true
pre_record_enabled = true
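
One plausible reading of how the three numeric settings interact is sketched below: chunks at or above rms_threshold count as speech, and silence is only timed once min_speech_duration_secs of speech has accumulated. This is an interpretation of the setting descriptions, not the logic in voice_server.py, and the function names and the 16-bit mono PCM format are assumptions.

```python
import math
import struct


def chunk_rms(chunk: bytes) -> float:
    """RMS of a chunk of 16-bit little-endian mono PCM (assumed format)."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def stop_index(
    rms_values,
    chunk_secs,
    silence_timeout_secs=4.0,
    min_speech_duration_secs=3.0,
    rms_threshold=100,
):
    """Return the index of the chunk at which recording would stop, or None.

    Silence is only counted after enough speech has been heard, and any
    loud chunk resets the silence timer.
    """
    speech_secs = 0.0
    silent_secs = 0.0
    for index, level in enumerate(rms_values):
        if level >= rms_threshold:
            speech_secs += chunk_secs
            silent_secs = 0.0
        elif speech_secs >= min_speech_duration_secs:
            silent_secs += chunk_secs
            if silent_secs >= silence_timeout_secs:
                return index
    return None
```

Under this reading, raising rms_threshold makes the recorder treat more background noise as silence, and raising min_speech_duration_secs protects against cutting off a speaker who pauses early.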

Config lookup order:

VOICE_CONFIG_PATH environment variable
./voice.config.toml (current directory)
~/.config/voice/voice.config.toml

MCP Integration

server.py provides MCP tool wrappers (speak, listen_for_speech, start_voice_mode) for Claude Desktop integration via the FastMCP framework.

For production Claude Desktop use, companion Rust MCP binaries (voice-mcp.exe) are available as release assets (ARM64 + x64). These wrap the same voice server with Rust-native MCP transport and checkpoint recovery. Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "voice": {
      "command": "path/to/voice-mcp.exe"
    }
  }
}

The Python server.py serves as a pure-Python MCP fallback if you prefer not to use the Rust binary.
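
If you would rather script the config edit than do it by hand, the sketch below adds the voice entry while keeping existing servers and writing a timestamped backup first, mirroring what the AI-assisted install describes. The function name and backup naming scheme are hypothetical, and the location of claude_desktop_config.json is left to the caller because it is not stated here.

```python
import json
import shutil
import time
from pathlib import Path


def register_voice_server(config_path, exe_path):
    """Add a "voice" MCP server entry, preserving any existing entries.

    Returns the backup path, or None if there was no config file to back up.
    """
    config_path = Path(config_path)
    config = {}
    backup_path = None

    if config_path.exists():
        # Copy the current file aside before changing anything.
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup_path = config_path.with_name(f"{config_path.name}.{stamp}.bak")
        shutil.copy2(config_path, backup_path)
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Only the "voice" key is touched; other servers stay as they were.
    config.setdefault("mcpServers", {})["voice"] = {"command": str(exe_path)}
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return backup_path
```

Quit and reopen Claude Desktop after the change so the new server entry is picked up.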

Architecture

voice_server.py — Standalone HTTP server (faster-whisper STT + noise filter + emotion detection)
server.py — MCP tool wrapper that delegates to voice_server.py for STT and uses edge-tts for TTS
response_analyzer.py — Text-based response emotion analysis (separate from audio emotion)
emotion_config.json — Thresholds and word lists for response analyzer
play_audio.ps1 — PowerShell audio playback helper (Windows)
START_VOICE_SERVER.bat — Windows launcher script

Environment Variables

| Variable | Purpose | Default |
| --- | --- | --- |
| `VOICE_CONFIG_PATH` | Path to `voice.config.toml` | Auto-discovered |
| `VOICE_FFMPEG_PATH` | Path to ffmpeg binary | Auto-discovered via PATH |
| `VOICE_EMOTION_LOG_DIR` | Directory for emotion analysis logs | `~/.voice/logs/` |

License

Apache 2.0 — see LICENSE
