Give Claude Code a voice.
Hear spoken summaries after every response: zero friction, multiple TTS backends.
Features
- Automatic voice feedback: Claude speaks a summary after every response
- Multi-backend TTS: Qwen3-TTS, Fish Speech, Chatterbox (GPU), Kokoro (CPU), pocket-tts (zero setup)
- Auto-detection: picks the best available backend automatically
- Slash commands: control voice, backend, and personality on the fly
- 9 voices: cross-backend voice mapping between Kokoro and pocket-tts
- Zero-config fallback: pocket-tts auto-starts via uvx, nothing to install
- Smart GPU awareness: skips GPU backends when your GPU is busy
- Voice personality: set prompts like "be chill" or "be upbeat"
> [!NOTE]
> The entire pipeline is hands-free. Once installed, Claude automatically includes voice summaries; no prompting required.
```
$ claude

You: refactor the auth module to use JWT tokens

Claude: I've refactored the authentication module...
[... full response ...]

Done! I refactored auth to use JWT. Changed 3 files:
auth.py, middleware.py, and config.py. All tests pass.

──────────────────────── Speaking...
```
The summary is extracted by the stop hook and spoken aloud through your chosen TTS backend.
In auto mode (default), cc-vox tries Qwen3-TTS → Fish Speech → Chatterbox → Kokoro → pocket-tts and uses the first available. GPU backends are skipped when GPU utilization exceeds the threshold (default 80%).
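In sketch form, that auto-detection is a priority walk with a GPU gate. The snippet below is illustrative only: the function names, the `nvidia-smi` probe, and the availability set are assumptions for the sake of the example, not cc-vox's actual internals.

```python
import shutil
import subprocess

# Priority order from the README: best quality first, zero-setup last.
PRIORITY = ["qwen3-tts", "fish-speech", "chatterbox", "kokoro", "pocket-tts"]
GPU_BACKENDS = {"qwen3-tts", "fish-speech", "chatterbox"}
GPU_THRESHOLD = 80  # percent; the real threshold is the GPU_THRESHOLD env var

def gpu_busy(threshold: int = GPU_THRESHOLD) -> bool:
    """Probe GPU utilization via nvidia-smi; no GPU counts as busy."""
    if shutil.which("nvidia-smi") is None:
        return True
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if out.returncode != 0:
        return True
    return int(out.stdout.split()[0]) > threshold

def select_backend(available: set[str], busy: bool = False) -> str:
    """First available backend in priority order, skipping GPU ones when busy."""
    for name in PRIORITY:
        if busy and name in GPU_BACKENDS:
            continue
        if name in available:
            return name
    return "pocket-tts"  # zero-setup fallback always exists
```

With a busy GPU, `select_backend({"kokoro", "fish-speech"}, busy=True)` skips Fish Speech and returns `"kokoro"`, which matches the fallback behavior described above.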
```
claude plugin marketplace add BestSithInEU/cc-vox
claude plugin install voice
```

**Option A: pocket-tts** (zero setup; auto-starts via uvx, nothing to install)
> [!TIP]
> Just use Claude Code: pocket-tts will auto-download and start on first speech. No Docker, no GPU needed.
Optionally pre-download the model:
```
hf download kyutai/pocket-tts
```

**Option B: Kokoro** (recommended; great quality, CPU-only Docker)
```
docker run -d --name kokoro \
  -p 32612:8880 \
  ghcr.io/remsky/kokoro-fastapi-cpu:latest
```

> [!TIP]
> Kokoro offers the best balance of quality and simplicity. One command, CPU-only, great results.
**Option C: Qwen3-TTS** (best quality, voice cloning; requires NVIDIA GPU)
```
# Clone the server and start via Docker Compose
cd tools/tts && git clone https://github.com/ValyrianTech/Qwen3-TTS_server qwen3-tts
docker compose -f tts/docker-compose.yml --profile gpu up -d qwen3-tts
```

Supports voice cloning: upload a reference audio clip to create custom voices:
```
curl -X POST http://localhost:32614/upload_audio/ \
  -F "audio_file_label=my_voice" \
  -F "file=@reference.wav"
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with 8GB+ VRAM. Supports 10 languages.
**Option D: Fish Speech** (high quality; requires NVIDIA GPU)
```
# Download the model (0.5B params, 13 languages)
hf download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini

# Start the container
docker run -d --name fish-speech \
  --gpus all \
  -p 32611:7860 \
  -v ./checkpoints:/app/checkpoints \
  fishaudio/fish-speech:latest
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with Docker GPU support configured. The openaudio-s1-mini model is licensed CC-BY-NC-SA-4.0.
**Option E: Chatterbox** (voice cloning; requires NVIDIA GPU)
```
docker run -d --name chatterbox \
  --gpus all \
  -p 32613:4123 \
  travisvn/chatterbox-tts-api:latest
```

> [!IMPORTANT]
> Requires an NVIDIA GPU with 4-8GB VRAM. OpenAI-compatible API.
```
claude   # Voice feedback is automatic!
```

Voice feedback is automatic: Claude speaks a summary after each response.
| Command | Effect |
|---|---|
| `/voice:speak` | Enable voice |
| `/voice:speak stop` | Disable voice |
| `/voice:speak af_bella` | Change voice |
| `/voice:speak prompt be chill` | Set voice personality |
| `/voice:speak prompt` | Clear personality |
| `/voice:speak backend kokoro` | Force backend |
| `/voice:speak backend auto` | Auto-detect (default) |
| `/voice:speak speed 1.3` | Adjust speech speed (kokoro) |
| `/voice:speak max_sentences 4` | Longer summaries |
| `/voice:speak fallback on` | Try other backends if the forced one is down |
Voice names work across all backends: cc-vox auto-maps between Kokoro and pocket-tts names.
| Kokoro name | pocket-tts alias | Gender | Accent |
|---|---|---|---|
| af_heart ★ | alba | F | American |
| af_bella | azure | F | American |
| af_nicole | fantine | F | American |
| af_sarah | cosette | F | American |
| af_sky | eponine | F | American |
| am_adam | marius | M | American |
| am_michael | jean | M | American |
| bf_emma | azelma | F | British |
| bm_george | (none) | M | British |

★ default voice
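In code, the mapping is just a two-way lookup built from the table above. This is an illustrative sketch; `resolve_voice` is a hypothetical helper, not cc-vox's actual API.

```python
# Kokoro name -> pocket-tts alias, per the voice table above.
# bm_george has no pocket-tts equivalent, so it is omitted.
KOKORO_TO_POCKET = {
    "af_heart": "alba",
    "af_bella": "azure",
    "af_nicole": "fantine",
    "af_sarah": "cosette",
    "af_sky": "eponine",
    "am_adam": "marius",
    "am_michael": "jean",
    "bf_emma": "azelma",
}
POCKET_TO_KOKORO = {v: k for k, v in KOKORO_TO_POCKET.items()}

def resolve_voice(name: str, backend: str) -> str:
    """Translate a voice name into the form the target backend expects.

    Unknown names pass through unchanged, so either naming scheme works
    regardless of which backend ends up selected.
    """
    if backend == "pocket-tts":
        return KOKORO_TO_POCKET.get(name, name)
    return POCKET_TO_KOKORO.get(name, name)
```

So `/voice:speak azure` and `/voice:speak af_bella` would select the same voice whichever backend is active.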
Configuration lives in `~/.claude/cc-vox.toml`:

```toml
[core]
enabled = true
voice = "af_heart"   # see voices above
backend = "auto"     # auto | kokoro | fish-speech | pocket-tts | chatterbox | qwen3-tts

[tuning]
speed = 1.0          # 0.5-2.0 (kokoro only)
max_sentences = 2    # max sentences in spoken summary (1-10)
fallback = true      # try other backends when the forced one is down

[style]
prompt = "be upbeat and encouraging"
```

| Setting | Default | Description |
|---|---|---|
| `tuning.speed` | `1.0` | Speech speed 0.5-2.0 (kokoro only) |
| `tuning.max_sentences` | `2` | Max sentences in spoken summary (1-10) |
| `tuning.fallback` | `true` | Try other backends when the forced one is down |
| Variable | Default | Description |
|---|---|---|
| `TTS_BACKEND` | `auto` | Override backend: `auto`, `qwen3-tts`, `fish-speech`, `chatterbox`, `kokoro`, `pocket-tts` |
| `KOKORO_PORT` | `32612` | Kokoro Docker port |
| `FISH_SPEECH_PORT` | `32611` | Fish Speech Docker port |
| `CHATTERBOX_PORT` | `32613` | Chatterbox Docker port |
| `QWEN3_TTS_PORT` | `32614` | Qwen3-TTS Docker port |
| `TTS_PORT` | `8000` | pocket-tts port |
| `GPU_THRESHOLD` | `80` | GPU utilization % above which GPU backends are skipped |
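Resolving a backend's base URL from these variables is a one-liner over `os.environ`. The mapping below mirrors the table above, but the helper name is made up for illustration.

```python
import os

# Override env var + default port per backend, per the table above.
PORTS = {
    "kokoro": ("KOKORO_PORT", 32612),
    "fish-speech": ("FISH_SPEECH_PORT", 32611),
    "chatterbox": ("CHATTERBOX_PORT", 32613),
    "qwen3-tts": ("QWEN3_TTS_PORT", 32614),
    "pocket-tts": ("TTS_PORT", 8000),
}

def backend_url(backend: str) -> str:
    """Base URL for a backend, honoring its port-override env var."""
    var, default = PORTS[backend]
    return f"http://localhost:{int(os.environ.get(var, default))}"
```

For example, with `KOKORO_PORT=9999` exported, `backend_url("kokoro")` points at port 9999 instead of the default 32612.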
```
cc-vox/
├── hooks/                          # Claude Code hook scripts
│   ├── hooks.json                  # Hook registration manifest
│   ├── user_prompt_submit_hook.py  # ① Injects voice reminder at turn start
│   ├── post_tool_use_hook.py       # ② Brief nudge after tool calls
│   ├── stop_hook.py                # ③ Extracts summary → calls say
│   ├── voice_common.py             # Config parsing (TOML) & reminders
│   ├── session.py                  # Session JSONL file I/O
│   ├── summarize.py                # Headless Claude fallback
│   └── tts/                        # TTS backend package
│       ├── __init__.py             # Registry + select_backend()
│       ├── _protocol.py            # TTSBackend Protocol
│       ├── voices.py               # Voice catalog (single source of truth)
│       ├── kokoro.py               # Kokoro backend
│       ├── fish_speech.py          # Fish Speech backend
│       ├── chatterbox.py           # Chatterbox backend
│       ├── qwen3_tts.py            # Qwen3-TTS backend
│       ├── pocket_tts.py           # pocket-tts backend
│       ├── _playback.py            # Audio playback + locking
│       └── _session_state.py       # Session sentinel files
├── commands/
│   └── speak.md                    # /voice:speak slash command definition
├── scripts/
│   └── say                         # Thin TTS CLI (uses tts package)
├── docs/                           # Zensical documentation
├── assets/                         # SVG diagrams & logos
│   ├── logo-dark.svg               # Animated logo (dark mode)
│   ├── logo-light.svg              # Animated logo (light mode)
│   ├── flow.svg                    # Pipeline flow diagram
│   ├── architecture.svg            # Component architecture diagram
│   ├── backends.svg                # Backend comparison cards
│   └── sequence.svg                # Sequence diagram
├── .claude-plugin/
│   ├── plugin.json                 # v2.0.0 plugin manifest
│   └── marketplace.json            # Distribution metadata
├── zensical.toml                   # Documentation config
├── LICENSE                         # MIT
└── README.md
```
| Feature | cc-vox | Manual TTS | No voice |
|---|---|---|---|
| Automatic speech after every response | ✅ | ❌ (manual) | ❌ |
| Multiple TTS backends | ✅ 5 backends | ❌ | ❌ |
| Auto-detects best backend | ✅ | ❌ | ❌ |
| Zero-setup option | ✅ pocket-tts | ❌ | ✅ |
| GPU-aware routing | ✅ | ❌ | ❌ |
| Voice personality prompts | ✅ | ❌ | ❌ |
| Cross-backend voice mapping | ✅ | ❌ | ❌ |
| Slash command control | ✅ | ❌ | ❌ |
| Setup time | ~2 min | 30+ min | 0 min |
**No audio output**

- Check that voice is enabled: run `/voice:speak` in Claude Code
- Verify your TTS backend is running:

```
# Kokoro
curl http://localhost:32612/v1/audio/speech -X POST -d '{}' 2>/dev/null && echo "OK" || echo "Not running"

# Fish Speech
curl http://localhost:32611 2>/dev/null && echo "OK" || echo "Not running"
```

- Check your system audio output device
- Try forcing a backend: `/voice:speak backend pocket-tts`
**Docker container won't start**

```
# Check if the port is already in use
lsof -i :32612   # Kokoro
lsof -i :32611   # Fish Speech

# Check Docker logs
docker logs kokoro
docker logs fish-speech
```

**Fish Speech skipped (GPU threshold)**
cc-vox checks GPU utilization before using Fish Speech. If your GPU is busy (default >80%), it falls back to Kokoro or pocket-tts.
```
# Check current GPU usage
nvidia-smi

# Raise the threshold
export GPU_THRESHOLD=95
```

**Voice sounds wrong or uses wrong backend**
```
# Force a specific backend
/voice:speak backend kokoro

# Check which backend is being used (verbose mode)
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro directly"
```

**Does it work offline?**
Yes: if you run Kokoro or Fish Speech locally via Docker, everything stays on your machine. pocket-tts also runs locally via uvx.
**Can I add custom voices?**
The voice list is currently fixed to the 9 voices that map cleanly across backends. Custom voice support depends on the backend you're using; Fish Speech, Qwen3-TTS, and Chatterbox support voice cloning natively.
**Does it slow down Claude?**
No. TTS runs asynchronously after Claude finishes responding. The only overhead is a small system prompt injection (~50 tokens) to remind Claude to include a voice summary. With fallback = true (default), if your forced backend goes down, cc-vox transparently tries the next available backend.
**Can I use it with other AI coding tools?**
cc-vox is built specifically for Claude Code's hook system. The say script can be used standalone, but the automatic hook integration is Claude Code-specific.
**How do I uninstall?**
```
claude plugin uninstall voice

# Optionally remove Docker containers
docker rm -f kokoro fish-speech
```

For local development:

```
# Run with the local plugin directory
claude --plugin-dir ~/Documents/Projects/cc-vox

# Test the say script directly
./scripts/say --voice af_heart "Hello, testing voice output"

# Force a specific backend
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro"

# Test with custom speed
./scripts/say --voice af_heart --speed 1.3 "Testing faster speech"
```

Contributions are welcome! Here's how to get started:
- Fork the repository
- Clone your fork and set up the development environment:

```
git clone https://github.com/<your-username>/cc-vox.git
cd cc-vox
claude --plugin-dir .
```

- Make your changes; follow the existing code style
- Test with at least one TTS backend running
- Submit a PR with a clear description of your changes
> [!NOTE]
> Adding a new backend = create one file in `hooks/tts/` + one registry line in `__init__.py`. See the Adding a Backend guide.
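A new backend therefore only needs to satisfy the backend Protocol and get one registry entry. The skeleton below is illustrative: the real `TTSBackend` Protocol lives in hooks/tts/_protocol.py and may declare different members, and `MyBackend` and `REGISTRY` are hypothetical names.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TTSBackend(Protocol):
    """Assumed shape of a backend; see hooks/tts/_protocol.py for the real one."""
    name: str
    def is_available(self) -> bool: ...
    def synthesize(self, text: str, voice: str) -> bytes: ...

class MyBackend:
    """Minimal skeleton for a hypothetical new backend."""
    name = "my-backend"

    def is_available(self) -> bool:
        # Probe your TTS server here, e.g. an HTTP health endpoint.
        return False

    def synthesize(self, text: str, voice: str) -> bytes:
        # Call your server and return raw audio bytes for playback.
        raise NotImplementedError

# Then one registry line in hooks/tts/__init__.py, e.g.:
# REGISTRY["my-backend"] = MyBackend()
```

Because the Protocol is structural, `MyBackend` needs no inheritance; implementing the right members is enough for the registry and `select_backend()` to pick it up.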
Based on the original voice plugin by pchalasani, which pioneered the hook-based voice feedback architecture for Claude Code. cc-vox extends it with multi-backend TTS support and auto-detection.
MIT License · Made with ❤️ by BestSithInEU