
BestSithInEU/cc-vox


cc-vox


Give Claude Code a voice.
Hear spoken summaries after every response: zero friction, multiple TTS backends.




✨ Features

  • 🔊 Automatic voice feedback - Claude speaks a summary after every response
  • 🎯 Multi-backend TTS - Qwen3-TTS, Fish Speech, Chatterbox (GPU), Kokoro (CPU), pocket-tts (zero setup)
  • 🔄 Auto-detection - Picks the best available backend automatically
  • 🎛️ Slash commands - Control voice, backend, personality on the fly
  • 🗣️ 9 voices - Cross-backend voice mapping between Kokoro and pocket-tts
  • ⚡ Zero config fallback - pocket-tts auto-starts via uvx, nothing to install
  • 🧠 Smart GPU awareness - Skips GPU backends when your GPU is busy
  • 🎭 Voice personality - Set prompts like "be chill" or "be upbeat"

🔍 How It Works

cc-vox flow diagram

Note

The entire pipeline is hands-free. Once installed, Claude automatically includes voice summaries; no prompting is required.


Demo

┌─────────────────────────────────────────────────────────────────┐
│ $ claude                                                        │
│                                                                 │
│ You: refactor the auth module to use JWT tokens                 │
│                                                                 │
│ Claude: I've refactored the authentication module...            │
│ [... full response ...]                                         │
│                                                                 │
│ 📢 Done! I refactored auth to use JWT. Changed 3 files:        │
│    auth.py, middleware.py, and config.py. All tests pass.       │
│                                                                 │
│ 🔊 ████████████████████░░░░ Speaking...                        │
└─────────────────────────────────────────────────────────────────┘

The 📢 summary is extracted by the stop hook and spoken aloud through your chosen TTS backend.
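
The extraction step can be sketched roughly as follows. This is an illustration, not the plugin's actual `stop_hook.py`: the function name `extract_summary`, the assumption that the summary runs from the last 📢 marker to the next blank line, and the naive sentence splitter are all hypothetical.

```python
import re
from typing import Optional

MARKER = "\U0001F4E2"  # the 📢 marker that precedes the spoken summary


def extract_summary(response: str, max_sentences: int = 2) -> Optional[str]:
    """Return the text after the last marker, capped at max_sentences.

    Hypothetical helper: assumes the summary runs from the marker to the
    next blank line (or to the end of the response).
    """
    _, found, tail = response.rpartition(MARKER)
    if not found:
        return None  # no marker, so there is nothing to speak
    # Keep only the paragraph that follows the marker and collapse whitespace.
    paragraph = tail.strip().split("\n\n", 1)[0]
    text = " ".join(paragraph.split())
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences]) or None
```

With the demo response above and the default `max_sentences = 2`, this sketch would keep only the first two sentences of the summary; raising the limit lets the file list and test status through as well.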


🎙️ Backends

TTS backends

In auto mode (default), cc-vox tries Qwen3-TTS → Fish Speech → Chatterbox → Kokoro → pocket-tts and uses the first available. GPU backends are skipped when GPU utilization exceeds the threshold (default 80%).
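
The auto-mode rule can be sketched as a short priority walk. This is a minimal illustration under stated assumptions, not the plugin's `select_backend()`: the names `pick_backend` and `is_up` are hypothetical, the health check is supplied by the caller, and the set of GPU backends is taken from the Quick Start options that list an NVIDIA GPU requirement.

```python
from typing import Callable, Optional

# Auto-mode priority as described in the paragraph above.
PRIORITY = ["qwen3-tts", "fish-speech", "chatterbox", "kokoro", "pocket-tts"]
# Backends that the Quick Start describes as requiring an NVIDIA GPU.
GPU_BACKENDS = {"qwen3-tts", "fish-speech", "chatterbox"}


def pick_backend(
    is_up: Callable[[str], bool],
    gpu_utilization: Optional[float],
    gpu_threshold: float = 80.0,
) -> Optional[str]:
    """Return the first usable backend, or None if nothing is reachable.

    is_up is a caller-supplied health check (hypothetical); gpu_utilization
    is a percentage, or None when no GPU reading is available.
    """
    gpu_busy = gpu_utilization is not None and gpu_utilization > gpu_threshold
    for name in PRIORITY:
        if name in GPU_BACKENDS and gpu_busy:
            continue  # skip GPU backends while the GPU is above the threshold
        if is_up(name):
            return name
    return None
```

For example, with Fish Speech and Kokoro both running, `pick_backend` returns `"fish-speech"` at 10% GPU utilization and `"kokoro"` at 95%.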


🚀 Quick Start

1. Install the plugin

claude plugin marketplace add BestSithInEU/cc-vox
claude plugin install voice

2. Pick a backend

Option A: Zero setup - pocket-tts auto-starts via uvx, nothing to install

[!TIP] Just use Claude Code. pocket-tts will auto-download and start on first speech. No Docker, no GPU needed.

Optionally pre-download the model:

hf download kyutai/pocket-tts

Option B: Kokoro ⭐ recommended - great quality, CPU-only Docker

docker run -d --name kokoro \
  -p 32612:8880 \
  ghcr.io/remsky/kokoro-fastapi-cpu:latest

[!TIP] Kokoro offers the best balance of quality and simplicity. One command, CPU-only, great results.

Option C: Qwen3-TTS ⭐ - best quality, voice cloning, requires NVIDIA GPU

# Clone the server and start via Docker Compose
cd tools/tts && git clone https://github.com/ValyrianTech/Qwen3-TTS_server qwen3-tts
docker compose -f tts/docker-compose.yml --profile gpu up -d qwen3-tts

Supports voice cloning β€” upload a reference audio clip to create custom voices:

curl -X POST http://localhost:32614/upload_audio/ \
  -F "audio_file_label=my_voice" \
  -F "file=@reference.wav"

[!IMPORTANT] Requires an NVIDIA GPU with 8GB+ VRAM. Supports 10 languages.

Option D: Fish Speech - high quality, requires NVIDIA GPU

# Download the model (0.5B params, 13 languages)
hf download fishaudio/openaudio-s1-mini --local-dir checkpoints/openaudio-s1-mini

# Start the container
docker run -d --name fish-speech \
  --gpus all \
  -p 32611:7860 \
  -v ./checkpoints:/app/checkpoints \
  fishaudio/fish-speech:latest

[!IMPORTANT] Requires an NVIDIA GPU with Docker GPU support configured. The openaudio-s1-mini model is licensed CC-BY-NC-SA-4.0.

Option E: Chatterbox - voice cloning, requires NVIDIA GPU

docker run -d --name chatterbox \
  --gpus all \
  -p 32613:4123 \
  travisvn/chatterbox-tts-api:latest

[!IMPORTANT] Requires an NVIDIA GPU with 4-8GB VRAM. OpenAI-compatible API.

3. Start using Claude

claude  # Voice feedback is automatic!

💬 Usage

Voice feedback is automatic. Claude speaks a summary after each response.

Slash Commands

/voice:speak                  Enable voice
/voice:speak stop             Disable voice
/voice:speak af_bella         Change voice
/voice:speak prompt be chill  Set voice personality
/voice:speak prompt           Clear personality
/voice:speak backend kokoro   Force backend
/voice:speak backend auto     Auto-detect (default)
/voice:speak speed 1.3        Adjust speech speed (kokoro)
/voice:speak max_sentences 4  Longer summaries
/voice:speak fallback on      Try other backends if forced one is down

Voices

Voice names work across all backends: cc-vox auto-maps between Kokoro and pocket-tts names.

| Kokoro name | pocket-tts alias | Gender | Accent |
|---|---|---|---|
| af_heart ★ | alba | F | American |
| af_bella | azure | F | American |
| af_nicole | fantine | F | American |
| af_sarah | cosette | F | American |
| af_sky | eponine | F | American |
| am_adam | marius | M | American |
| am_michael | jean | M | American |
| bf_emma | azelma | F | British |
| bm_george | - | M | British |

★ default voice
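
The name mapping in the table can be expressed as a pair of lookup dictionaries. The sketch below copies the table's data; the helper `resolve_voice` and its fallback behaviour (unknown names and voices without an alias fall back to the default voice) are assumptions for illustration, not the behaviour of the plugin's `voices.py`. It only covers the two naming schemes in the table.

```python
# Kokoro name -> pocket-tts alias, copied from the table above.
# bm_george has no pocket-tts alias.
KOKORO_TO_POCKET = {
    "af_heart": "alba",
    "af_bella": "azure",
    "af_nicole": "fantine",
    "af_sarah": "cosette",
    "af_sky": "eponine",
    "am_adam": "marius",
    "am_michael": "jean",
    "bf_emma": "azelma",
    "bm_george": None,
}
POCKET_TO_KOKORO = {
    alias: name for name, alias in KOKORO_TO_POCKET.items() if alias
}
DEFAULT_VOICE = "af_heart"


def resolve_voice(requested: str, backend: str) -> str:
    """Translate a voice name from either scheme into the one backend expects."""
    kokoro_name = POCKET_TO_KOKORO.get(requested, requested)
    if kokoro_name not in KOKORO_TO_POCKET:
        kokoro_name = DEFAULT_VOICE  # assumption: unknown names use the default
    if backend == "pocket-tts":
        # Assumption: a voice without an alias uses the default voice's alias.
        return KOKORO_TO_POCKET[kokoro_name] or KOKORO_TO_POCKET[DEFAULT_VOICE]
    return kokoro_name
```

For example, `resolve_voice("af_bella", "pocket-tts")` gives `"azure"`, and `resolve_voice("azure", "kokoro")` gives `"af_bella"`.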


⚙️ Configuration

~/.claude/cc-vox.toml

[core]
enabled = true
voice = "af_heart"       # see Voices above
backend = "auto"         # auto | kokoro | fish-speech | pocket-tts | chatterbox | qwen3-tts

[tuning]
speed = 1.0              # 0.5-2.0 (kokoro only)
max_sentences = 2        # max sentences in spoken summary (1-10)
fallback = true          # try other backends when forced one is down

[style]
prompt = "be upbeat and encouraging"

Settings

| Setting | Default | Description |
|---|---|---|
| tuning.speed | 1.0 | Speech speed 0.5-2.0 (kokoro only) |
| tuning.max_sentences | 2 | Max sentences in spoken summary (1-10) |
| tuning.fallback | true | Try other backends when forced one is down |

Environment Variables

| Variable | Default | Description |
|---|---|---|
| TTS_BACKEND | auto | Override backend: auto, qwen3-tts, fish-speech, chatterbox, kokoro, or pocket-tts |
| KOKORO_PORT | 32612 | Kokoro Docker port |
| FISH_SPEECH_PORT | 32611 | Fish Speech Docker port |
| CHATTERBOX_PORT | 32613 | Chatterbox Docker port |
| QWEN3_TTS_PORT | 32614 | Qwen3-TTS Docker port |
| TTS_PORT | 8000 | pocket-tts port |
| GPU_THRESHOLD | 80 | GPU % above which Fish Speech is skipped |
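
Reading these variables with their documented defaults might look like the sketch below. The helper names `read_env` and `base_url` are hypothetical, and the assumption that every backend is reached on `localhost` comes only from the curl examples in this README.

```python
import os
from typing import Mapping

# Variable name -> default, copied from the table above.
ENV_DEFAULTS = {
    "TTS_BACKEND": "auto",
    "KOKORO_PORT": "32612",
    "FISH_SPEECH_PORT": "32611",
    "CHATTERBOX_PORT": "32613",
    "QWEN3_TTS_PORT": "32614",
    "TTS_PORT": "8000",
    "GPU_THRESHOLD": "80",
}


def read_env(env: Mapping[str, str] = os.environ) -> dict:
    """Return every documented setting, converting numeric values to int."""
    settings = {}
    for name, default in ENV_DEFAULTS.items():
        value = env.get(name, default)
        settings[name] = value if name == "TTS_BACKEND" else int(value)
    return settings


def base_url(port: int) -> str:
    # Assumption: backends listen on localhost, as in the curl examples.
    return f"http://localhost:{port}"
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.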

🏗️ Architecture

cc-vox architecture


🔀 Sequence Diagram

cc-vox sequence diagram


📁 Project Structure

cc-vox/
├── hooks/                              # Claude Code hook scripts
│   ├── hooks.json                      # Hook registration manifest
│   ├── user_prompt_submit_hook.py      # ① Injects 📢 reminder at turn start
│   ├── post_tool_use_hook.py           # ② Brief nudge after tool calls
│   ├── stop_hook.py                    # ③ Extracts summary → calls say
│   ├── voice_common.py                 # Config parsing (TOML) & reminders
│   ├── session.py                      # Session JSONL file I/O
│   ├── summarize.py                    # Headless Claude fallback
│   └── tts/                            # TTS backend package
│       ├── __init__.py                 # Registry + select_backend()
│       ├── _protocol.py                # TTSBackend Protocol
│       ├── voices.py                   # Voice catalog (single source of truth)
│       ├── kokoro.py                   # Kokoro backend
│       ├── fish_speech.py              # Fish Speech backend
│       ├── chatterbox.py               # Chatterbox backend
│       ├── qwen3_tts.py                # Qwen3-TTS backend
│       ├── pocket_tts.py               # pocket-tts backend
│       ├── _playback.py                # Audio playback + locking
│       └── _session_state.py           # Session sentinel files
├── commands/
│   └── speak.md                        # /voice:speak slash command definition
├── scripts/
│   └── say                             # Thin TTS CLI (uses tts package)
├── docs/                               # Zensical documentation
├── assets/                             # SVG diagrams & logos
│   ├── logo-dark.svg                   # Animated logo (dark mode)
│   ├── logo-light.svg                  # Animated logo (light mode)
│   ├── flow.svg                        # Pipeline flow diagram
│   ├── architecture.svg                # Component architecture diagram
│   ├── backends.svg                    # Backend comparison cards
│   └── sequence.svg                    # Sequence diagram
├── .claude-plugin/
│   ├── plugin.json                     # v2.0.0 plugin manifest
│   └── marketplace.json                # Distribution metadata
├── zensical.toml                       # Documentation config
├── LICENSE                             # MIT
└── README.md

🤔 Why cc-vox?

| Feature | cc-vox | Manual TTS | No voice |
|---|---|---|---|
| Automatic speech after every response | ✅ | ❌ manual | ❌ |
| Multiple TTS backends | ✅ 5 backends | ⚠️ 1 at a time | - |
| Auto-detects best backend | ✅ | ❌ | - |
| Zero-setup option | ✅ pocket-tts | ❌ | - |
| GPU-aware routing | ✅ | ❌ | - |
| Voice personality prompts | ✅ | ❌ | - |
| Cross-backend voice mapping | ✅ | ❌ | - |
| Slash command control | ✅ | ❌ | - |
| Setup time | ~2 min | 30+ min | 0 min |

🔧 Troubleshooting

No audio output
  1. Check that voice is enabled: run /voice:speak in Claude Code
  2. Verify your TTS backend is running:
    # Kokoro
    curl http://localhost:32612/v1/audio/speech -X POST -d '{}' 2>/dev/null && echo "OK" || echo "Not running"
    
    # Fish Speech
    curl http://localhost:32611 2>/dev/null && echo "OK" || echo "Not running"
  3. Check system audio output device
  4. Try forcing a backend: /voice:speak backend pocket-tts

Docker container won't start

# Check if port is already in use
lsof -i :32612  # Kokoro
lsof -i :32611  # Fish Speech

# Check Docker logs
docker logs kokoro
docker logs fish-speech

Fish Speech skipped (GPU threshold)

cc-vox checks GPU utilization before using Fish Speech. If your GPU is busy (default >80%), it falls back to Kokoro or pocket-tts.

# Check current GPU usage
nvidia-smi

# Raise the threshold
export GPU_THRESHOLD=95
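
The same check can be scripted. The sketch below is an assumption about how such a check could work, not the plugin's own code: it calls `nvidia-smi` with its `--query-gpu=utilization.gpu` CSV output and compares the busiest GPU against the threshold. The function names are hypothetical.

```python
import subprocess
from typing import Optional

# Ask nvidia-smi for one bare utilization number per GPU, one per line.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(output: str) -> Optional[int]:
    """Return the highest utilization (%) across GPUs, or None if unparseable."""
    values = [int(token) for token in output.split() if token.isdigit()]
    return max(values) if values else None


def gpu_utilization() -> Optional[int]:
    """Query nvidia-smi; None means no usable reading was obtained."""
    try:
        result = subprocess.run(
            QUERY, capture_output=True, text=True, timeout=5, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return None  # nvidia-smi missing, failed, or timed out
    return parse_utilization(result.stdout)


def gpu_is_busy(threshold: int = 80) -> bool:
    """True only when a reading exists and exceeds the threshold."""
    usage = gpu_utilization()
    return usage is not None and usage > threshold
```

Treating a missing reading as "not busy" is a design choice in this sketch; a stricter variant could treat it as "no GPU available" and skip GPU backends entirely.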

Voice sounds wrong or uses wrong backend

# Force a specific backend
/voice:speak backend kokoro

# Check which backend is being used (verbose mode)
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro directly"

❓ FAQ

Does it work offline?

Yes. If you run Kokoro or Fish Speech locally via Docker, everything stays on your machine. pocket-tts also runs locally via uvx.

Can I add custom voices?

The voice list is currently fixed to the 9 voices that map cleanly across backends. Custom voice support depends on the backend you're using; Fish Speech supports voice cloning natively.

Does it slow down Claude?

No. TTS runs asynchronously after Claude finishes responding. The only overhead is a small system prompt injection (~50 tokens) to remind Claude to include a voice summary. With fallback = true (default), if your forced backend goes down, cc-vox transparently tries the next available backend.

Can I use it with other AI coding tools?

cc-vox is built specifically for Claude Code's hook system. The say script can be used standalone, but the automatic hook integration is Claude Code-specific.

How do I uninstall?

claude plugin uninstall voice
# Optionally remove Docker containers
docker rm -f kokoro fish-speech

🛠️ Development

# Run with local plugin directory
claude --plugin-dir ~/Documents/Projects/cc-vox

# Test say script directly
./scripts/say --voice af_heart "Hello, testing voice output"

# Force a specific backend
TTS_BACKEND=kokoro ./scripts/say "Testing Kokoro"

# Test with custom speed
./scripts/say --voice af_heart --speed 1.3 "Testing faster speech"

🀝 Contributing

Contributions are welcome! Here's how to get started:

  1. Fork the repository
  2. Clone your fork and set up the development environment:
    git clone https://github.com/<your-username>/cc-vox.git
    cd cc-vox
    claude --plugin-dir .
  3. Make your changes, following the existing code style
  4. Test with at least one TTS backend running
  5. Submit a PR with a clear description of your changes

Note

Adding a new backend = create one file in hooks/tts/ + one registry line in __init__.py. See the Adding a Backend guide.
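
As a rough picture of what "one file plus one registry line" could look like, here is a self-contained sketch. Only the name `TTSBackend` comes from the project structure above; the method names (`is_available`, `synthesize`), the `REGISTRY` dictionary, the `register` helper, and the toy `EchoBackend` are all hypothetical, and the real protocol in `hooks/tts/_protocol.py` may differ. Consult the Adding a Backend guide for the actual interface.

```python
from typing import Protocol


class TTSBackend(Protocol):
    """Hypothetical shape of a backend; the real protocol may differ."""

    name: str

    def is_available(self) -> bool: ...

    def synthesize(self, text: str, voice: str) -> bytes: ...


# Hypothetical registry keyed by backend name.
REGISTRY: dict[str, TTSBackend] = {}


def register(backend: TTSBackend) -> None:
    REGISTRY[backend.name] = backend


class EchoBackend:
    """Toy backend for illustration: returns the text as UTF-8 bytes."""

    name = "echo"

    def is_available(self) -> bool:
        return True

    def synthesize(self, text: str, voice: str) -> bytes:
        return f"[{voice}] {text}".encode("utf-8")


register(EchoBackend())  # the "one registry line"
```

Because `TTSBackend` is a structural `Protocol`, `EchoBackend` does not need to inherit from it; matching attribute and method signatures is enough for type checkers.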


🙏 Credits

Based on the original voice plugin by pchalasani, which pioneered the hook-based voice feedback architecture for Claude Code. cc-vox extends it with multi-backend TTS support and auto-detection.


MIT License · Made with 🔊 by BestSithInEU
