FastAPI server for Chatterbox TTS with voice cloning support.
- Text-to-Speech Synthesis: Generate speech from text using ChatterboxTurboTTS
- Voice Cloning: Clone voices from reference audio samples (>5s recommended)
- Voice Library: Manage multiple voice profiles with aliases
- Long Text Support: Automatic sentence-based chunking for long text inputs
- Multiple Formats: Output as MP3 (default) or WAV
- Docker Ready: Easy deployment with Docker and docker-compose
On macOS, running the API directly on the host is usually much faster than
Docker: PyTorch can use MPS (Apple GPU) when DEVICE=auto, whereas a
container on Docker Desktop is Linux on a virtual CPU with no Metal/MPS,
so synthesis is CPU-only and pays VM overhead. Use Docker when you need Linux
deployment parity or server environments; for daily use on a Mac, prefer uv.
Prerequisites: Python 3.11+, uv, and ffmpeg
(brew install ffmpeg on macOS).
# Clone the repository
git clone https://github.com/YOUR_USERNAME/fast-chatterbox.git
cd fast-chatterbox
# Install dependencies
uv sync
# Run the server (MPS on Apple Silicon when DEVICE=auto)
uv run uvicorn app.main:app --reload
# Or: bash scripts/dev.sh (same, loads .env from repo root)

The API will be available at http://localhost:8000. Confirm GET /health shows your device (e.g. mps on Apple Silicon when the model is ready).
First startup downloads model weights through
ChatterboxTurboTTS.from_pretrained(...). Internet is required once; after that, weights are cached locally.
# Clone the repository
git clone https://github.com/YOUR_USERNAME/fast-chatterbox.git
cd fast-chatterbox
# Start with docker-compose
docker compose up -d
# Or build and run manually
docker build -t fast-chatterbox .
docker run -p 8000:8000 fast-chatterbox

The API will be available at http://localhost:8000.
Verify the server:
curl http://localhost:8000/ping
curl http://localhost:8000/health

For a Mac you use daily, you can install Fast-Chatterbox as a permanent, auto-starting background service. This runs the native uv stack (giving you MPS / Apple GPU speed) in the background, automatically restarts if it crashes, and starts on boot.
Installation (One-time setup):
- Install prerequisites:
  brew install ffmpeg uv
- Install Python dependencies:
  uv sync
- Create your .env file (copy from .env.example and ensure HF_TOKEN is set).
- Free up port 8000 if Docker is currently using it:
  docker compose down
- Warm up the model cache to prevent first-boot download timeouts:
  uv run python generate_turbo.py --text "warmup"
- Install and start the daemon (prompts for password):
  bash scripts/install-launchd.sh

If that prints Bootstrap failed: 5, the plist could not be registered in the system domain (common when the clone lives under iCloud Drive / cloud-synced Documents, or a launchd quirk on some setups). Do one of: move the repo to a path like ~/dev/Chatter-Fast-Chatter-Box on a local APFS volume, or install the per-user service instead (no sudo):

  bash scripts/uninstall-launchd.sh
  bash scripts/install-launchagent.sh

A user LaunchAgent starts at login and avoids /Library/LaunchDaemons/.
The API is now running at http://localhost:8000 and will automatically start when your Mac boots (or at login, if you used install-launchagent.sh). Check health with curl http://localhost:8000/health (expect "device": "mps" on Apple Silicon).
Managing the Service:
- View Logs:
  tail -f ~/Library/Logs/fast-chatterbox/stdout.log ~/Library/Logs/fast-chatterbox/stderr.log
- Check Status (LaunchDaemon):
  sudo launchctl print system/com.fastchatterbox.server | head
- Check Status (LaunchAgent):
  launchctl print gui/$(id -u)/com.fastchatterbox.server | head
- Restart (LaunchDaemon):
  sudo launchctl kickstart -k system/com.fastchatterbox.server
- Restart (LaunchAgent):
  launchctl kickstart -k gui/$(id -u)/com.fastchatterbox.server
- Stop (LaunchDaemon, until next boot):
  sudo launchctl bootout system/com.fastchatterbox.server
- Stop (LaunchAgent):
  launchctl bootout gui/$(id -u)/com.fastchatterbox.server
- Uninstall completely (LaunchDaemon):
  bash scripts/uninstall-launchd.sh
- Uninstall LaunchAgent only:
  bash scripts/uninstall-launchagent.sh
(Note: For local development with hot-reload instead of the background daemon, run bash scripts/dev.sh)
- Start the server (Docker or local development from Quick Start).
- Confirm readiness:
  GET /ping should return quickly. GET /health should report "status": "healthy".
- Generate your first audio file:
curl -X POST http://localhost:8000/synthesize \
-F "text=Hello from Fast-Chatterbox" \
  --output speech.mp3

- Play the output:
  - macOS: open speech.mp3
  - Linux: xdg-open speech.mp3
If you need a specific voice, list available options with:
curl http://localhost:8000/voices

Use this prompt with Claude Code when you want it to deploy and run Fast-Chatterbox locally on a MacBook:
You are in the Fast-Chatterbox repository on macOS.
Goal:
- Run Fast-Chatterbox locally and verify it works end-to-end.
Please do the following:
1) Check prerequisites and install missing tools:
- Homebrew (if needed)
- uv
- ffmpeg
2) Install project dependencies with uv.
3) Start the API server locally (uv run uvicorn app.main:app --reload).
4) Verify startup using:
- GET /ping
- GET /health
5) Run a synthesis request and save output to speech.mp3.
6) Confirm the output file exists and report exact commands used.
7) If anything fails, diagnose the root cause and fix it before continuing.
Constraints:
- Use safe, non-destructive commands only.
- Explain each step briefly.
- Stop only after the server is running and synthesis succeeds.
Once running, access the interactive API docs:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Generate speech from text.
Parameters (form-data):
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | - | Text to synthesize (max 10,000 chars) |
| voice | string | No | dan | Voice name or alias from library |
| output_format | string | No | mp3 | Output format: mp3 or wav |
| max_sentences_per_chunk | int | No | 3 | Max sentences per audio chunk (1-50) |
| max_chunk_chars | int | No | 320 | Max characters per chunk (50-1000) |
| chunk_gap_ms | int | No | 120 | Gap between chunks in milliseconds |
| reference_audio | file | No | - | Upload custom reference audio for cloning |
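The chunking parameters work together: long text is split into sentences, and consecutive sentences are packed into one chunk until either max_sentences_per_chunk or max_chunk_chars would be exceeded; chunk_gap_ms of silence is placed between the synthesized chunks. The sketch below only illustrates that greedy-packing idea (it is not the server's actual implementation) and assumes a simple regex-based sentence split:

```python
import re

def chunk_text(text: str, max_sentences: int = 3, max_chars: int = 320) -> list[str]:
    """Illustrative greedy sentence packing (not the server's exact algorithm)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        # Flush the current chunk when adding this sentence would exceed a limit.
        if current and (len(current) >= max_sentences or len(candidate) > max_chars):
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```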
Examples:
# Basic synthesis (uses default voice "dan")
curl -X POST http://localhost:8000/synthesize \
-F "text=Hello, how are you today?" \
--output speech.mp3
# Use a specific voice from the library
curl -X POST http://localhost:8000/synthesize \
-F "text=Welcome to our podcast!" \
-F "voice=huberman" \
--output speech.mp3
# Upload custom reference audio for voice cloning
curl -X POST http://localhost:8000/synthesize \
-F "text=This will sound like the uploaded voice" \
-F "reference_audio=@my_voice_sample.wav" \
--output speech.mp3
# Get WAV output instead of MP3
curl -X POST http://localhost:8000/synthesize \
-F "text=High quality audio output" \
-F "output_format=wav" \
--output speech.wav
# Long text with custom chunking
curl -X POST http://localhost:8000/synthesize \
-F "text=This is a very long text that will be automatically chunked..." \
-F "max_chunk_chars=200" \
-F "chunk_gap_ms=150" \
  --output speech.mp3

Response:
- Content-Type: audio/mpeg (MP3) or audio/wav (WAV)
- Content-Disposition: attachment; filename="speech.mp3"
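The same request can be made programmatically. A minimal sketch using the requests library (an assumption; any HTTP client works) with the form fields documented above:

```python
import requests

# POST the documented form fields to /synthesize and save the returned audio.
resp = requests.post(
    "http://localhost:8000/synthesize",
    data={"text": "Hello from Fast-Chatterbox", "voice": "dan", "output_format": "mp3"},
    timeout=300,  # long texts can take a while, especially on CPU
)
resp.raise_for_status()
with open("speech.mp3", "wb") as f:
    f.write(resp.content)
```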
List all available voices in the library.
curl http://localhost:8000/voices

Response:
{
"voices": [
{
"name": "dan_carlin",
"filename": "dan_carlin.wav",
"file_size": 3840078,
"created": "2025-03-29T12:00:00",
"exists": true
}
],
"count": 6,
"default_voice": "dan_carlin"
}

Get information about a specific voice.
curl http://localhost:8000/voices/dan

Response:
{
"name": "dan_carlin",
"filename": "dan_carlin.wav",
"file_size": 3840078,
"created": "2025-03-29T12:00:00",
"exists": true
}

Upload a new voice to the library.
curl -X POST http://localhost:8000/voices \
-F "voice_name=my_custom_voice" \
-F "voice_file=@voice_sample.wav"Response:
{
"message": "Voice uploaded successfully",
"voice": {
"name": "my_custom_voice",
"filename": "my_custom_voice.wav",
"file_size": 5000000
}
}
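Uploading can also be scripted; a minimal sketch with the requests library (an assumption), sending the sample as multipart form data like the curl call above:

```python
import requests

# Add a reference sample to the voice library via multipart form upload.
with open("voice_sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/voices",
        data={"voice_name": "my_custom_voice"},
        files={"voice_file": ("voice_sample.wav", f, "audio/wav")},
    )
resp.raise_for_status()
print(resp.json()["message"])
```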
Delete a voice from the library.

curl -X DELETE http://localhost:8000/voices/my_custom_voice

Set the default voice for synthesis.
curl -X POST http://localhost:8000/voices/default \
-F "voice_name=huberman"Download a voice file.
curl http://localhost:8000/voices/dan/download --output dan_voice.wav

Check server health and model status.
curl http://localhost:8000/health

Response:
{
"status": "healthy",
"model_loaded": true,
"device": "mps",
"default_voice": "dan_carlin",
"error": null
}

Simple connectivity check.
curl http://localhost:8000/ping

Response:
{
"status": "ok",
"message": "Server is running"
}

The server comes with pre-configured voices:
| Name | Alias | File | Description |
|---|---|---|---|
| dan_carlin | dan | dan_carlin.wav | Default voice (Dan Carlin style) |
| donald_trump | donald | donald_trump.wav | Trump-style voice |
| donald_trump_2 | donald_2 | donald_trump_2.wav | Trump-style variant 2 |
| donald_trump_3 | donald_3 | donald_trump_3.wav | Trump-style variant 3 |
| andrew_huberman | huberman | andrew_huberman.wav | Huberman-style voice |
| snoop_dogg | snoop | snoop_dogg.wav | Snoop Dogg-style voice |
Use either the full name or the alias in API calls.
Create a .env file (copy from .env.example):
# Server Configuration
HOST=0.0.0.0
PORT=8000
# TTS Configuration
MAX_SENTENCES_PER_CHUNK=3 # Sentences per TTS chunk (1-50)
MAX_CHUNK_CHARS=320 # Characters per chunk for long text
CHUNK_GAP_MS=120 # Silence between chunks (milliseconds)
# Device: auto, cuda, mps, or cpu
DEVICE=auto
# CPU: thread budget for PyTorch / OpenMP (0 = use all logical CPUs, 1–256 to set a cap)
TORCH_NUM_THREADS=0
# Default voice (name or alias)
DEFAULT_VOICE=dan
# Output format: mp3 or wav
DEFAULT_OUTPUT_FORMAT=mp3

Device options:
- auto - Automatically select the best available device (cuda > mps > cpu)
- cuda - Force NVIDIA GPU
- mps - Apple Silicon Metal (only when running natively on macOS, not inside Docker on Mac, where the guest is Linux and typically cpu)
- cpu - Force CPU (slowest but most compatible)
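auto follows the cuda > mps > cpu preference order. A minimal sketch of that resolution logic (illustrative only, not the server's exact code), assuming PyTorch:

```python
import torch

def resolve_device(setting: str = "auto") -> str:
    """Map the DEVICE setting to a concrete torch device string."""
    if setting != "auto":
        return setting  # honor an explicit cuda / mps / cpu choice
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```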
On CPU, synthesis speed depends on how many threads OpenMP, MKL, and PyTorch
use. By default, TORCH_NUM_THREADS=0 picks all logical CPUs the process
is allowed to use. Set a positive number (1–256) to cap usage if you are
running other services on the same host.
- Linux / macOS (native): the default is usually optimal; you can still cap with TORCH_NUM_THREADS=4 if needed.
- Docker Desktop (Mac/Windows): the container only sees the CPUs you assign to the Docker VM. Increase that under Settings → Resources if generation is still slow, and keep TORCH_NUM_THREADS=0 so the app uses all of them.
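Under the hood, this kind of setting maps onto PyTorch's thread controls. A minimal sketch of how the cap could be applied at startup (an illustration, not the server's exact code):

```python
import os
import torch

# TORCH_NUM_THREADS=0 means "use every logical CPU the process may use";
# a positive value caps PyTorch's intra-op thread pool.
budget = int(os.getenv("TORCH_NUM_THREADS", "0"))
if budget > 0:
    torch.set_num_threads(budget)
```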
Add a project .env (copy from .env.example) and set HF_TOKEN with a
Hugging Face read token so the
container can authenticate with the Hub (better rate limits and faster
first-time downloads). docker-compose.yml passes HF_TOKEN into the service
using Compose’s .env substitution; the file is not copied into the image.
The image is built with uv.lock in the build context and uv sync --frozen so
resemble-perth resolves exactly as on your machine (see resemble-perth from
tool.uv.sources in pyproject.toml). Do not add uv.lock to
.dockerignore or the container may install the wrong perth and fail model
load with 'NoneType' object is not callable. The first image build is slower
because the Dockerfile installs a compiler toolchain needed for
praat-parselmouth (a dependency of Git-based resemble-perth on Linux).
# Build and run
docker compose up -d
# View logs
docker compose logs -f
# Stop
docker compose down

For CUDA support, modify docker-compose.yml:
services:
  fast-chatterbox:
    build: .
    ports:
      - "8000:8000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Then run:
docker compose up -d

The default configuration mounts local directories:
- ./voices - Voice library (persist your custom voices)
- ./outputs - Generated audio files
- Named volume hf-cache - Caches Hugging Face downloads under /root/.cache/huggingface so model weights survive docker compose down and container rebuilds
The original generate_turbo.py script is available for command-line usage:
# Basic usage with default voice
uv run python generate_turbo.py --text "Hello world"
# With specific reference audio
uv run python generate_turbo.py \
--ref voices/dan_carlin.wav \
--text "Custom voice synthesis"
# Save to specific file
uv run python generate_turbo.py \
--text "Hello world" \
--out outputs/my_speech.wav
# Long text with chunking options
uv run python generate_turbo.py \
--text-file long_text.txt \
--max-chunk-chars 280 \
  --chunk-gap-ms 150

# Install dev dependencies
uv sync --group dev
# Run with hot reload
uv run uvicorn app.main:app --reload
# Or: bash scripts/dev.sh
# Run on specific port
uv run uvicorn app.main:app --port 8080

If the model fails to load:
- Check available memory (model requires ~4GB RAM)
- If you see 'NoneType' object is not callable right after download: the resemble-perth (watermark) library must be the version from this repo's pyproject.toml / uv.lock (GitHub resemble-ai/Perth, not a broken PyPI install). Re-run uv sync and rebuild the Docker image.
- Try forcing CPU mode: DEVICE=cpu in .env
- Check logs: docker compose logs -f
- Verify startup sequence: GET /ping should return immediately; GET /health should move from initializing to healthy
- Warm up the model manually to confirm download/auth works:
  uv run python generate_turbo.py --text "test"
If startup fails before /health is available (for example
ModuleNotFoundError), the model initialization step never runs. Fix import
errors first, then restart the server so the model can download/load.
- First run may be slow while model artifacts are downloaded.
- If your network is restricted, run once on an unrestricted connection.
- Re-running after a completed download should be much faster because cached artifacts are reused.
Ensure ffmpeg is installed:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt-get install ffmpeg
# Docker (already included in image)
# No action needed

Check available voices:
curl http://localhost:8000/voices

Make sure you're using the correct name or alias.
MIT