This is a fork of the original Resemble AI Chatterbox TTS with enhanced features for Windows users!
- Automatic environment setup with CUDA 11.8 support
- No technical knowledge required - just run one-click-installer.bat or install.bat
- Instant launcher - double-click run_tts.bat to start
What the original didn't have:
- Smart Text Chunking: Automatically splits long documents at sentence boundaries
- Parallel Processing: Processes multiple chunks simultaneously for 4x faster generation
- Seamless Audio Stitching: Combines chunks into one cohesive audio file
- Progress Tracking: Real-time progress indicators during generation
- Voice Consistency: Maintains the same cloned voice across all chunks
- Configurable Batch Size: Adjust parallel processing for your hardware
Perfect for: Articles, books, scripts, documentation, or any text longer than 300 characters!
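As a rough illustration of sentence-boundary chunking, here is a minimal sketch. It is not the fork's actual implementation: the function name `chunk_text`, the `max_chars` parameter, and the punctuation-based split rule are all assumptions made for this example.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Group sentences into chunks of at most max_chars where possible.

    Hypothetical helper: a sentence longer than max_chars is kept whole
    as its own chunk rather than being cut mid-sentence.
    """
    # Heuristic split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits, so keep growing the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Calling `chunk_text(article, max_chars=300)` returns the chunks in reading order, so they can be synthesized one by one and joined afterwards.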
Download one-click-installer.bat and just run it, or clone the repository and install manually:
git clone https://github.com/Saganaki22/chatterbox-WebUI
cd chatterbox-WebUI
install.bat
run_tts.bat
Then open: http://127.0.0.1:7860/
That's it!
I've provided 11 high-quality English voice samples to get you started; find them in /samples/*

We're excited to introduce Chatterbox, Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out.
If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service (link). It delivers reliable performance with ultra-low latency of under 200 ms, ideal for production use in agents, applications, or interactive media.
- SoTA zero-shot TTS - State-of-the-art voice cloning
- 0.5B Llama backbone - Powerful language model foundation
- Unique exaggeration/intensity control - First open-source TTS with emotion control
- Ultra-stable with alignment-informed inference - Consistent, high-quality output
- Trained on 0.5M hours of cleaned data
- Watermarked outputs - Built-in Perth watermarking for responsible AI
- Easy voice conversion - Simple reference audio upload
- Outperforms ElevenLabs in side-by-side comparisons
- The default settings (exaggeration=0.5, cfg_weight=0.5) work well for most prompts.
- If the reference speaker has a fast speaking style, lowering cfg_weight to around 0.3 can improve pacing.
- For expressive or dramatic speech, try lower cfg_weight values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher.
- Higher exaggeration tends to speed up speech; reducing cfg_weight helps compensate with slower, more deliberate pacing.
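The tips above can be captured as a small set of presets. This is only a sketch: the preset names and the `synthesize` helper are hypothetical, and it assumes `model.generate` accepts `exaggeration` and `cfg_weight` as keyword arguments, as the parameter names in the tips suggest.

```python
# Hypothetical preset names; the numeric values come from the tips above.
PRESETS = {
    "default": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "fast_speaker": {"exaggeration": 0.5, "cfg_weight": 0.3},
    "expressive": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def synthesize(model, text, style="default", audio_prompt_path=None):
    """Call model.generate with the keyword arguments of the chosen preset."""
    if style not in PRESETS:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(PRESETS)}")
    kwargs = dict(PRESETS[style])  # copy so the preset table stays untouched
    if audio_prompt_path is not None:
        kwargs["audio_prompt_path"] = audio_prompt_path
    # Assumption: generate() takes these names as keyword arguments.
    return model.generate(text, **kwargs)
```

With a loaded model, `synthesize(model, text, style="expressive")` would request more dramatic delivery while lowering cfg_weight to offset the faster pacing.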
- Use the Long Form Content tab for documents, articles, or scripts
- Adjust Batch Size (1-8) based on your GPU memory
- Set Chunk Size (100-500 chars) for optimal sentence splitting
- Upload reference audio once - it applies to all chunks automatically
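To make the Batch Size setting concrete, the sketch below runs a synthesis callback over a list of chunks with a bounded number of workers while preserving chunk order. It is an illustration, not the WebUI's code: `generate_chunks`, `synth`, and `on_progress` are hypothetical names, threads merely stand in for however the fork parallelizes work, and whether concurrent calls to a single model instance are safe on your GPU is something to verify.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_chunks(chunks, synth, batch_size=4, on_progress=None):
    """Run synth(chunk) for every chunk, at most batch_size at a time.

    chunks must be a list; results come back in the same order.
    """
    if not 1 <= batch_size <= 8:
        raise ValueError("batch_size must be between 1 and 8")
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        # pool.map yields results in input order, so stitching stays correct
        # even when a later chunk finishes before an earlier one.
        for done, audio in enumerate(pool.map(synth, chunks), start=1):
            results.append(audio)
            if on_progress is not None:
                on_progress(done, len(chunks))
    return results
```

If each result is a 2-D tensor of shape (channels, samples), which is an assumption based on how the output is passed to torchaudio's save function, the pieces could then be joined along the time axis with `torch.cat(results, dim=-1)`.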
If you prefer manual installation or need to customize:
# Create virtual environment
python -m venv myenv
myenv\Scripts\activate.bat
# Install PyTorch with CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install ChatterboxTTS and Gradio
pip install chatterbox-tts gradio
# Run the WebUI
python gradio_tts_app.py
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
model = ChatterboxTTS.from_pretrained(device="cuda")
text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)
# Voice cloning with reference audio
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
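The snippet above hardcodes device="cuda". On a machine without a working CUDA setup, a small helper like the one below can pick the device instead. This is a sketch under the assumption that `ChatterboxTTS.from_pretrained` also accepts "cpu" as a device; `pick_device` is a hypothetical name.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable CUDA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch the model cannot load at all; returning "cpu"
        # simply lets the caller fail later with a clearer import error.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

It would be used as `model = ChatterboxTTS.from_pretrained(device=pick_device())`; expect CPU generation to be much slower than on a GPU.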
- Quick TTS generation for short texts
- Voice cloning with reference audio upload
- Real-time parameter adjustment
- Example prompts included
- Process documents, articles, books
- Smart sentence-boundary chunking
- Parallel processing for speed
- Progress tracking with real-time updates
- Automatic audio stitching
- Voice consistency across all chunks
- File cleanup and management
- Tips for best results
- Parameter explanations
This fork is based on the original Resemble AI Chatterbox TTS. All core TTS functionality and model weights remain unchanged - we've simply added Windows-friendly installation and an enhanced web interface with long-form processing capabilities.
Every audio file generated by Chatterbox includes Resemble AI's Perth (Perceptual Threshold) Watermarker - imperceptible neural watermarks that survive MP3 compression, audio editing, and common manipulations while maintaining nearly 100% detection accuracy.
Join us on Discord and let's build something awesome together!
Don't use this model to do bad things. Prompts are sourced from freely available data on the internet.