Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice cloned speech anywhere the OpenAI API is used (e.g. Open WebUI, AnythingLLM, etc.)


Chatterbox TTS API


FastAPI-powered REST API for Chatterbox TTS, providing OpenAI-compatible text-to-speech endpoints with voice cloning capabilities and additional features on top of the chatterbox-tts base package.

Features

πŸš€ OpenAI-Compatible API - Drop-in replacement for OpenAI's TTS API
⚑ FastAPI Performance - High-performance async API with automatic documentation
🎨 React Frontend - Includes an optional, ready-to-use web interface
🎭 Voice Cloning - Use your own voice samples for personalized speech
🎀 Voice Library Management - Upload, manage, and use custom voices by name
πŸ“ Smart Text Processing - Automatic chunking for long texts
πŸ“Š Real-time Status - Monitor TTS progress, statistics, and request history
🐳 Docker Ready - Full containerization with persistent voice storage
βš™οΈ Configurable - Extensive environment variable configuration
πŸŽ›οΈ Parameter Control - Real-time adjustment of speech characteristics
πŸ“š Auto Documentation - Interactive API docs at /docs and /redoc
πŸ”§ Type Safety - Full Pydantic validation for requests and responses
🧠 Memory Management - Advanced memory monitoring and automatic cleanup

⚑️ Quick Start

git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
uv sync
uv run main.py

Tip

Install uv (if you haven't already) with: curl -LsSf https://astral.sh/uv/install.sh | sh

Local Installation with Python 🐍

Option A: Using uv (Recommended - Faster & Better Dependencies)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies with uv (automatically creates venv)
uv sync

# Copy and customize environment variables
cp .env.example .env

# Start the API with FastAPI
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
uv run main.py

πŸ’‘ Why uv? Users report better compatibility with chatterbox-tts, 25-40% faster installs, and superior dependency resolution. See migration guide β†’

Option B: Using pip (Traditional)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Setup environment β€” using Python 3.11
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and customize environment variables
cp .env.example .env

# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3

# Start the API with FastAPI
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
python main.py

Ran into issues? Check the troubleshooting section

🐳 Docker (Recommended)

# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Use Docker-optimized environment variables
cp .env.example.docker .env  # Docker-specific paths, ready to use
# Or: cp .env.example .env    # Local development paths, needs customization

# Choose your deployment method:

# API Only (default)
docker compose -f docker/docker-compose.yml up -d             # Standard (pip-based)
docker compose -f docker/docker-compose.uv.yml up -d          # uv-optimized (faster builds)
docker compose -f docker/docker-compose.gpu.yml up -d         # Standard + GPU
docker compose -f docker/docker-compose.uv.gpu.yml up -d      # uv + GPU (recommended for GPU users)
docker compose -f docker/docker-compose.cpu.yml up -d         # CPU-only

# API + Frontend (add --profile frontend to any of the above)
docker compose -f docker/docker-compose.yml --profile frontend up -d             # Standard + Frontend
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d         # GPU + Frontend
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up -d      # uv + GPU + Frontend

# Watch the logs as it initializes (the first use of TTS takes the longest)
docker logs chatterbox-tts-api -f

# Test the API
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Chatterbox TTS!"}' \
  --output test.wav

πŸš€ Running with the Web UI (Full Stack)

This project includes an optional React-based web UI. Use Docker Compose profiles to easily opt in or out of the frontend:

With Docker Compose Profiles

# API only (default behavior)
docker compose -f docker/docker-compose.yml up -d

# API + Frontend + Web UI (with --profile frontend)
docker compose -f docker/docker-compose.yml --profile frontend up -d

# Or use the convenient helper script for fullstack:
python start.py fullstack

# Same pattern works with all deployment variants:
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d    # GPU + Frontend
docker compose -f docker/docker-compose.uv.yml --profile frontend up -d     # uv + Frontend
docker compose -f docker/docker-compose.cpu.yml --profile frontend up -d    # CPU + Frontend

Local Development

For local development, you can run the API and frontend separately:

# Start the API first (follow earlier instructions)
# Then run the frontend:
cd frontend && npm install && npm run dev

Click the link printed by Vite to access the web UI.

Build for Production

Build the frontend for production deployment:

cd frontend && npm install && npm run build

You can then access it directly from your local file system at frontend/dist/index.html.

Port Configuration

  • API Only: Accessible at http://localhost:4123 (direct API access)
  • With Frontend: Web UI at http://localhost:4321, API requests routed via proxy

The frontend uses a reverse proxy to route requests, so when running with --profile frontend, the web interface will be available at http://localhost:4321 while the API runs behind the proxy.

Screenshots of Frontend (Web UI)

(Screenshots: the frontend web UI in dark and light mode, both idle and while processing, plus full-page views of the web UI in each theme.)

API Usage

Basic Text-to-Speech (Default Voice)

This endpoint works for both the API-only and full-stack setups.

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here"}' \
  --output speech.wav
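
Because the endpoints mirror OpenAI's TTS API, the official OpenAI Python client can be pointed at the local server directly. A minimal sketch, assuming the openai package is installed, that "my-custom-voice" is a voice you have added to the voice library (see Voice Library Management below), and that any placeholder API key is acceptable:

from openai import OpenAI

# Point the official OpenAI client at the local server; the key is unused but required by the SDK
client = OpenAI(base_url="http://localhost:4123/v1", api_key="none")

response = client.audio.speech.create(
    model="tts-1",               # accepted for OpenAI compatibility
    voice="my-custom-voice",     # assumes this voice exists in your voice library
    input="Hello from the OpenAI client!",
)

# The response body is the generated audio
with open("openai_client_output.wav", "wb") as f:
    f.write(response.content)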

Using Custom Parameters (JSON)

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3, "temperature": 0.9}' \
  --output dramatic.wav

Custom Voice Upload

Upload your own voice sample for personalized speech:

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Hello with my custom voice!" \
  -F "exaggeration=0.8" \
  -F "voice_file=@my_voice.mp3" \
  --output custom_voice_speech.wav

With Custom Parameters and Voice Upload

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Dramatic speech!" \
  -F "exaggeration=1.2" \
  -F "cfg_weight=0.3" \
  -F "temperature=0.9" \
  -F "voice_file=@dramatic_voice.wav" \
  --output dramatic.wav

Voice Library Management

Store and manage custom voices by name for reuse across requests:

# Upload a voice to the library
curl -X POST http://localhost:4123/v1/voices \
  -F "voice_file=@my_voice.wav" \
  -F "name=my-custom-voice"

# Use the voice by name in speech generation
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello with my custom voice!", "voice": "my-custom-voice"}' \
  --output custom_voice_output.wav

# List all available voices
curl http://localhost:4123/v1/voices
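
The same workflow from Python with requests, as a sketch that mirrors the curl commands above (endpoint paths and form field names are taken from those commands):

import requests

BASE = "http://localhost:4123"

# Upload a voice sample to the library under a reusable name
with open("my_voice.wav", "rb") as voice_file:
    upload = requests.post(
        f"{BASE}/v1/voices",
        files={"voice_file": ("my_voice.wav", voice_file, "audio/wav")},
        data={"name": "my-custom-voice"},
    )
upload.raise_for_status()

# List the voices now available
print(requests.get(f"{BASE}/v1/voices").json())

# Generate speech with the stored voice by name
speech = requests.post(
    f"{BASE}/v1/audio/speech",
    json={"input": "Hello with my custom voice!", "voice": "my-custom-voice"},
)
with open("custom_voice_output.wav", "wb") as f:
    f.write(speech.content)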

πŸ”§ Complete Voice Library Documentation β†’

🎡 Real-time Audio Streaming

The API supports multiple streaming formats for lower latency and better user experience:

  • Raw Audio Streaming: Traditional audio chunks (WAV format)
  • Server-Side Events (SSE): OpenAI-compatible format with base64-encoded audio chunks

Quick Start

# Basic audio streaming
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "This streams in real-time!"}' \
  --output streaming.wav

# SSE streaming (OpenAI compatible)
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"input": "This streams as Server-Side Events!", "stream_format": "sse"}' \
  --no-buffer

# Real-time playback
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "Play as it generates!"}' \
  | ffplay -f wav -i pipe:0 -autoexit -nodisp

The complete streaming documentation (linked in the Python examples section below) covers:

  • Advanced chunking strategies (sentence, paragraph, word, fixed)
  • Quality presets (fast, balanced, high)
  • Configurable parameters and performance tuning
  • Real-time progress monitoring
  • Python, JavaScript, and cURL examples
  • Integration patterns for different use cases

Key Benefits:

  • ⚑ Lower latency - Start hearing audio in 1-2 seconds
  • 🎯 Better UX - No waiting for complete generation
  • πŸ’Ύ Memory efficient - Process chunks individually
  • πŸŽ›οΈ Configurable - Choose speed vs quality trade-offs

🐍 Python Examples

Default Voice (JSON)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Upload Endpoint (Default Voice)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech/upload",
    data={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Custom Voice Upload

import requests

with open("my_voice.mp3", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/v1/audio/speech/upload",
        data={
            "input": "Hello with my custom voice!",
            "exaggeration": 0.8,
            "temperature": 1.0
        },
        files={
            "voice_file": ("my_voice.mp3", voice_file, "audio/mpeg")
        }
    )

with open("custom_output.wav", "wb") as f:
    f.write(response.content)

Basic Streaming Example

import requests

# Stream audio generation in real-time
response = requests.post(
    "http://localhost:4123/v1/audio/speech/stream",
    json={
        "input": "This will stream as it's generated!",
        "exaggeration": 0.8
    },
    stream=True  # Enable streaming mode
)

with open("streaming_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
            print(f"Received chunk: {len(chunk)} bytes")

SSE Streaming Example (OpenAI Compatible)

import requests
import json
import base64

# Stream audio using the Server-Sent Events (SSE) format
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "This streams as Server-Side Events!",
        "stream_format": "sse",
        "exaggeration": 0.8
    },
    stream=True,
    headers={'Accept': 'text/event-stream'}
)

audio_chunks = []

for line in response.iter_lines(decode_unicode=True):
    if line.startswith('data: '):
        event_data = line[6:]  # Remove 'data: ' prefix

        try:
            event = json.loads(event_data)

            if event.get('type') == 'speech.audio.delta':
                # Decode base64 audio chunk
                audio_data = base64.b64decode(event['audio'])
                audio_chunks.append(audio_data)
                print(f"Received audio chunk: {len(audio_data)} bytes")

            elif event.get('type') == 'speech.audio.done':
                usage = event.get('usage', {})
                print(f"Complete! Tokens: {usage.get('total_tokens', 0)}")
                break
        except (KeyError, ValueError):  # skip malformed or non-JSON events
            continue

print(f"Received {len(audio_chunks)} audio chunks")

πŸ“š Complete Streaming Examples & Documentation β†’

Including real-time playback, progress monitoring, custom voice uploads, and advanced integration patterns.

Voice File Requirements

Supported Formats:

  • MP3 (.mp3)
  • WAV (.wav)
  • FLAC (.flac)
  • M4A (.m4a)
  • OGG (.ogg)

Requirements:

  • Maximum file size: 10MB
  • Recommended duration: 10-30 seconds of clear speech
  • Avoid background noise for best results
  • Higher quality audio produces better voice cloning

πŸŽ›οΈ Configuration

The project provides two environment example files:

  • .env.example - For local development (uses ./models, ./voice-sample.mp3)
  • .env.example.docker - For Docker deployment (uses /cache, /app/voice-sample.mp3)

Choose the appropriate one for your setup:

# For local development
cp .env.example .env

# For Docker deployment
cp .env.example.docker .env

Key environment variables (see the example files for full list):

Variable            Default              Description
PORT                4123                 API server port
EXAGGERATION        0.5                  Emotion intensity (0.25-2.0)
CFG_WEIGHT          0.5                  Pace control (0.0-1.0)
TEMPERATURE         0.8                  Sampling randomness (0.05-5.0)
VOICE_SAMPLE_PATH   ./voice-sample.mp3   Voice sample for cloning
DEVICE              auto                 Device (auto/cuda/mps/cpu)
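
Once the server is running, you can confirm which values it actually picked up via the /config endpoint (listed in the API reference below). A minimal sketch; the exact response shape depends on the server version:

import requests

# Inspect the configuration the running server loaded from the environment / .env
current_config = requests.get("http://localhost:4123/config").json()
for key, value in current_config.items():
    print(f"{key}: {value}")
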
🎭 Voice Cloning

Replace the default voice sample:

# Replace the default voice sample
cp your-voice.mp3 voice-sample.mp3

# Or set a custom path
echo "VOICE_SAMPLE_PATH=/path/to/your/voice.mp3" >> .env

For best results:

  • Use 10-30 seconds of clear speech
  • Avoid background noise
  • Prefer WAV or high-quality MP3

🐳 Docker Deployment

Development

docker compose -f docker/docker-compose.yml up

Production

# Create production environment
cp .env.example.docker .env
nano .env  # Set production values

# Deploy
docker compose -f docker/docker-compose.yml up -d

With GPU Support

# Use GPU-enabled compose file
# Ensure NVIDIA Container Toolkit is installed
docker compose -f docker/docker-compose.gpu.yml up -d

πŸ“š API Reference

API Endpoints

Endpoint                      Method   Description
/audio/speech                 POST     Generate speech from text (complete)
/audio/speech/upload          POST     Generate speech with voice upload
/audio/speech/stream          POST     Stream speech generation (docs)
/audio/speech/stream/upload   POST     Stream speech with voice upload (docs)
/health                       GET      Health check and status
/config                       GET      Current configuration
/v1/models                    GET      Available models (OpenAI compat)
/status                       GET      TTS processing status & progress
/status/progress              GET      Real-time progress (lightweight)
/status/statistics            GET      Processing statistics
/status/history               GET      Recent request history
/info                         GET      Complete API information
/docs                         GET      Interactive API documentation
/redoc                        GET      Alternative API documentation
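
A quick reachability check from Python: hit the health endpoint and the OpenAI-compatible model listing, which is roughly what clients such as Open WebUI do when connecting (a sketch; the model listing is assumed to follow the OpenAI "data" layout):

import requests

BASE = "http://localhost:4123"

# Health check
print(requests.get(f"{BASE}/health").json())

# OpenAI-compatible model listing (assumed to use the standard {"data": [{"id": ...}]} layout)
models = requests.get(f"{BASE}/v1/models").json()
print([m.get("id") for m in models.get("data", [])])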

Parameters Reference

Speech Generation Parameters

Exaggeration (0.25-2.0)

  • 0.3-0.4: Professional, neutral
  • 0.5: Default balanced
  • 0.7-0.8: More expressive
  • 1.0+: Very dramatic

CFG Weight (0.0-1.0)

  • 0.2-0.3: Faster speech
  • 0.5: Default pace
  • 0.7-0.8: Slower, deliberate

Temperature (0.05-5.0)

  • 0.4-0.6: More consistent
  • 0.8: Default balance
  • 1.0+: More creative/random

Stream Format

  • audio: Raw audio streaming (default)
  • sse: Server-Side Events with base64-encoded audio chunks (OpenAI compatible)
🧠 Memory Management

The API includes advanced memory management to prevent memory leaks and optimize performance:

Memory Management Features

  • Automatic Cleanup: Periodic garbage collection and tensor cleanup
  • CUDA Memory Management: Automatic GPU cache clearing
  • Memory Monitoring: Real-time memory usage tracking
  • Manual Controls: API endpoints for manual cleanup operations

Memory Configuration

Variable                    Default   Description
MEMORY_CLEANUP_INTERVAL     5         Clean up memory every N requests
CUDA_CACHE_CLEAR_INTERVAL   3         Clear CUDA cache every N requests
ENABLE_MEMORY_MONITORING    true      Enable detailed memory logging

Memory Monitoring Endpoints

# Get memory status
curl http://localhost:4123/memory

# Trigger manual cleanup
curl "http://localhost:4123/memory?cleanup=true&force_cuda_clear=true"

# Reset memory tracking (with confirmation)
curl -X POST "http://localhost:4123/memory/reset?confirm=true"

Real-time Status Tracking

Monitor TTS processing in real-time:

# Check current processing status
curl "http://localhost:4123/v1/status/progress"

# Get detailed status with memory and stats
curl "http://localhost:4123/v1/status?include_memory=true&include_stats=true"

# View processing statistics
curl "http://localhost:4123/v1/status/statistics"

# Check request history
curl "http://localhost:4123/v1/status/history?limit=5"

# Get comprehensive API information
curl "http://localhost:4123/info"

Status Response Example:

{
  "is_processing": true,
  "status": "generating_audio",
  "current_step": "Generating audio for chunk 2/4",
  "current_chunk": 2,
  "total_chunks": 4,
  "progress_percentage": 50.0,
  "duration_seconds": 2.5,
  "text_preview": "Your text being processed..."
}

See Status API Documentation for complete details.
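
A client can surface this progress by polling the lightweight progress endpoint while a long generation runs in another thread or process. A sketch, with field names taken from the example response above:

import time
import requests

# Poll the lightweight progress endpoint until the current generation finishes
while True:
    progress = requests.get("http://localhost:4123/v1/status/progress").json()
    if not progress.get("is_processing"):
        break
    print(f"{progress.get('progress_percentage', 0):.0f}% - {progress.get('current_step', '')}")
    time.sleep(0.5)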

Memory Testing

Run the memory management test suite:

# Test memory patterns and cleanup
python tests/test_memory.py  # or: uv run tests/test_memory.py

# Monitor memory during testing
watch -n 1 'curl -s http://localhost:4123/memory | jq .memory_info'

Memory Optimization Tips

For High-Volume Production:

MEMORY_CLEANUP_INTERVAL=3
CUDA_CACHE_CLEAR_INTERVAL=2
ENABLE_MEMORY_MONITORING=false  # Reduce logging overhead
MAX_CHUNK_LENGTH=200             # Smaller chunks for less memory usage

For Development/Debugging:

MEMORY_CLEANUP_INTERVAL=1
CUDA_CACHE_CLEAR_INTERVAL=1
ENABLE_MEMORY_MONITORING=true

Memory Leak Prevention:

  • Tensors are automatically moved to CPU before deletion
  • Gradient tracking is disabled during inference
  • Audio chunks are cleaned up after concatenation
  • CUDA cache is periodically cleared
  • Python garbage collection is triggered regularly

πŸ§ͺ Testing

Run the test script to verify the API functionality:

python tests/test_api.py

The test script will:

  • Test health check endpoint
  • Test models endpoint
  • Test API documentation endpoints (new!)
  • Generate speech for various text lengths
  • Test custom parameter validation
  • Test error handling with validation
  • Save generated audio files as test_output_*.wav

⚑ Performance

FastAPI Benefits:

  • Async support: Better concurrent request handling
  • Faster serialization: JSON responses ~25% faster than Flask
  • Type validation: Pydantic models prevent invalid requests
  • Auto documentation: No manual API doc maintenance

Hardware Recommendations:

  • CPU: Works but slower, reduce chunk size for better memory usage
  • GPU: Recommended for production, significantly faster
  • Memory: 4GB minimum, 8GB+ recommended
  • Concurrency: Async support allows better multi-request handling

πŸ”§ Troubleshooting

Common Issues

CUDA/CPU Compatibility Error

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

This happens because chatterbox-tts models require PyTorch with CUDA support, even when running on CPU. Solutions:

# Option 1: Use default setup (now includes CUDA-enabled PyTorch)
docker compose -f docker/docker-compose.yml up -d

# Option 2: Use explicit CUDA setup (traditional)
docker compose -f docker/docker-compose.gpu.yml up -d

# Option 3: Use uv + GPU setup (recommended for GPU users)
docker compose -f docker/docker-compose.uv.gpu.yml up -d

# Option 4: Use CPU-only setup (may have compatibility issues)
docker compose -f docker/docker-compose.cpu.yml up -d

# Option 5: Clear model cache and retry with CUDA-enabled setup
docker volume rm chatterbox-tts-api_chatterbox-models
docker compose -f docker/docker-compose.yml up -d --build

# Option 6: Try uv for better dependency resolution
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123

For local development, install PyTorch with CUDA support:

# With pip
pip uninstall torch torchvision torchaudio
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install chatterbox-tts

# With uv (handles this automatically)
uv sync

Windows users installing with pip and running into issues:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
pip install --force-reinstall typing_extensions

Port conflicts

# Change port
echo "PORT=4124" >> .env

GPU not detected

# Force CPU mode
echo "DEVICE=cpu" >> .env

Out of memory

# Reduce chunk size
echo "MAX_CHUNK_LENGTH=200" >> .env

Model download fails

# Clear cache and retry
rm -rf models/
uvicorn app.main:app --host 0.0.0.0 --port 4123  # or: uv run main.py

FastAPI startup issues

# Check if uvicorn is installed
uvicorn --version

# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug

# Alternative startup method
python main.py

πŸ’» Development

Project Structure

This project follows a clean, modular architecture for maintainability:

app/                     # FastAPI backend application
β”œβ”€β”€ __init__.py           # Main package
β”œβ”€β”€ config.py            # Configuration management
β”œβ”€β”€ main.py              # FastAPI application
β”œβ”€β”€ models/              # Pydantic models
β”‚   β”œβ”€β”€ requests.py      # Request models
β”‚   └── responses.py     # Response models
β”œβ”€β”€ core/                # Core functionality
β”‚   β”œβ”€β”€ memory.py        # Memory management
β”‚   β”œβ”€β”€ text_processing.py # Text processing utilities
β”‚   └── tts_model.py     # TTS model management
└── api/                 # API endpoints
    β”œβ”€β”€ router.py        # Main router
    └── endpoints/       # Individual endpoint modules
        β”œβ”€β”€ speech.py    # TTS endpoint
        β”œβ”€β”€ health.py    # Health check
        β”œβ”€β”€ models.py    # Model listing
        β”œβ”€β”€ memory.py    # Memory management
        └── config.py    # Configuration

frontend/                # React frontend application
β”œβ”€β”€ src/
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ nginx.conf          # Integrated proxy configuration
└── package.json

docker/                  # Docker files consolidated
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ Dockerfile.uv       # uv-optimized image
β”œβ”€β”€ Dockerfile.gpu      # GPU-enabled image
β”œβ”€β”€ Dockerfile.cpu      # CPU-only image
β”œβ”€β”€ Dockerfile.uv.gpu   # uv + GPU image
β”œβ”€β”€ docker-compose.yml  # Standard deployment
β”œβ”€β”€ docker-compose.uv.yml # uv deployment
β”œβ”€β”€ docker-compose.gpu.yml # GPU deployment
β”œβ”€β”€ docker-compose.uv.gpu.yml # uv + GPU deployment
└── docker-compose.cpu.yml # CPU-only deployment

tests/                   # Test suite
β”œβ”€β”€ test_api.py         # API tests
└── test_memory.py      # Memory tests

main.py                  # Main entry point
start.py                 # Development helper script

Quick Start Scripts

# Development mode with auto-reload
python start.py dev

# Production mode
python start.py prod

# Full Stack mode with UI (using Docker)
python start.py fullstack

# Run tests
python start.py test

# View project structure
python start.py info

Local Development

# Install in development mode (pip)
pip install -e .

# Or with uv (basic development tools)
uv sync

# Or with test dependencies (for contributors)
uv sync --group test

# Start with auto-reload (FastAPI development)
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload

# Or use the main script
python main.py

# Or use the development helper
python start.py dev

Testing

# Run API tests
python tests/test_api.py  # or: uv run tests/test_api.py

# Run memory tests
python tests/test_memory.py

# Test specific endpoint
curl http://localhost:4123/health

# Check API documentation
curl http://localhost:4123/openapi.json

FastAPI Development Features

  • Auto-reload: Use --reload flag for development
  • Interactive docs: Visit /docs for live API testing
  • Type hints: Full IDE support with Pydantic models
  • Validation: Automatic request/response validation
  • Modular structure: Easy to extend and maintain

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Ensure FastAPI docs are updated
  6. Submit a pull request


πŸ”— Integrations

Open WebUI

Tip

Customize available voices first by using the frontend at http://localhost:4321

To use Chatterbox TTS API with Open WebUI, follow these steps:

  • Open the Admin Panel and go to Settings -> Audio
  • Set your TTS Settings to match the following:
    • Text-to-Speech Engine: OpenAI
    • API Base URL: http://localhost:4123/v1 # alternatively, try http://host.docker.internal:4123/v1
    • API Key: none
    • TTS Model: tts-1 or tts-1-hd
    • TTS Voice: Name of the voice you've cloned (can also include aliases, defined in the frontend)
    • Response splitting: Paragraphs

(Screenshot: Open WebUI audio settings configured for the Chatterbox TTS API)