<a href="https://colab.research.google.com/github/cmanderskronquist/getting_started_with_ai/blob/main/chatterbox_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎙️ Chatterbox TTS & Voice Conversion - Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bhootnaat22/chatterbox-colab/blob/master/chatterbox_colab.ipynb)

Welcome to Chatterbox - Resemble AI's open-source TTS and Voice Conversion model!

This notebook provides:
- 🎯 **Text-to-Speech (TTS)** with emotion control
- 🔄 **Voice Conversion** to change voice characteristics
- 🌐 **Live Gradio Interface** accessible via public URL
- ⚡ **GPU acceleration** when available

## Features:
- Zero-shot TTS with custom voice cloning
- Emotion/exaggeration control
- Voice conversion between different speakers
- Built-in watermarking for responsible AI
- Easy-to-use web interface

## 🚀 Setup and Installation

First, let's install all the required dependencies:

In [1]:
# Install system dependencies
!apt-get update
!apt-get install -y python3.11 python3.11-venv
!apt-get install -y -qq build-essential python3.11-dev pkg-config cmake ffmpeg \
                     libffi-dev libssl-dev libsndfile1
!apt-get install -y -qq cargo rustc

# Create a fresh Python 3.11 virtual environment (-f ignores a missing directory)
!rm -rf /content/.chatterbox
!python3.11 -m venv /content/.chatterbox
# Each "!" command runs in its own shell, so "source .../bin/activate" would not persist.
# Call the venv's interpreter by its full path instead.
!/content/.chatterbox/bin/python --version
!/content/.chatterbox/bin/python -m pip install -q --upgrade pip setuptools wheel

# Install Python dependencies into the virtual environment
!/content/.chatterbox/bin/python -m pip install numpy
!/content/.chatterbox/bin/python -m pip install chatterbox-tts gradio

print("✅ Installation complete!")

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libpython3.11-minimal libpython3.11-stdlib python3.11-minimal
Suggested packages:
  binfmt-support
The following NEW packages will be installed:
  libpython3.11-minimal libpython3.11-stdlib python3.11 python3.11-minimal
  python3.11-venv
0 upgraded, 5 newly installed, 0 to remove and 38 not upgraded.
Need to get 8,279 kB of archives.
After this operation, 24.4 MB of additional disk space will be used.
Get:1 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy/main amd64 libpython3.11-minimal amd64 3.11.13-1+jammy1 [887 kB]
Get:2 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy/main amd64 python3.11-minimal amd64 3.11.13-1+jammy1 [2,356 kB]
Get:3 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy/main amd64 libpython3.11-stdlib amd64 3.11.13-1+jammy1 [1,925 kB]
Get:4 https://ppa.launchpadcontent.net/deadsnak
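
Because the packages are installed into `/content/.chatterbox` rather than into the notebook kernel, anything that needs them has to be run with that environment's interpreter. The sketch below shows one way to do this from Python instead of `!` commands. The helper names `venv_python` and `run_in_venv` are hypothetical and not part of the notebook; the `bin/python` layout assumes a Linux host such as Colab.

```python
import subprocess
from pathlib import Path


def venv_python(venv_dir):
    """Return the path to the interpreter inside a Linux virtual environment."""
    return Path(venv_dir) / "bin" / "python"


def run_in_venv(venv_dir, *args):
    """Run the venv's interpreter with the given arguments and return its stdout."""
    python = venv_python(venv_dir)
    if not python.exists():
        raise FileNotFoundError(f"No interpreter at {python}; was the venv created?")
    result = subprocess.run(
        [str(python), *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, `run_in_venv("/content/.chatterbox", "--version")` would report the interpreter version of the environment created above.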

## 🔧 Device Configuration

Let's check what hardware we have available:

In [2]:
import torch
import platform

# Check available devices
if torch.cuda.is_available():
    device = "cuda"
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"🚀 GPU Available: {gpu_name}")
    print(f"💾 GPU Memory: {gpu_memory:.1f} GB")
elif torch.backends.mps.is_available():
    device = "mps"
    print("🍎 Apple MPS Available")
else:
    device = "cpu"
    print("💻 Using CPU")

print(f"🎯 Selected device: {device}")
print(f"🐍 Python version: {platform.python_version()}")
print(f"🔥 PyTorch version: {torch.__version__}")

🚀 GPU Available: Tesla T4
💾 GPU Memory: 14.7 GB
🎯 Selected device: cuda
🐍 Python version: 3.12.11
🔥 PyTorch version: 2.8.0+cu126


## 📱 Download Gradio App

Let's download the optimized Gradio application:

In [3]:
# Download the Colab-optimized Gradio app
!wget -q https://raw.githubusercontent.com/bhootnaat22/chatterbox-colab/master/colab_gradio_app.py

print("✅ Gradio app downloaded!")

✅ Gradio app downloaded!


## 🌐 Launch Gradio Interface

Now let's launch the interactive web interface!

**Note:** The first run downloads the model weights (about 3.2 GB, per the file sizes in the output below), which may take a few minutes.
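
To estimate the download size, the file sizes reported in the progress lines of a first run can be summed. The sizes below are copied from this notebook's sample output and assume decimal units (1 GB = 1000 MB); they may differ for other model releases.

```python
# File sizes in megabytes, copied from the download progress lines of a first run.
checkpoint_sizes_mb = {
    "ve.safetensors": 5.70,
    "t3_cfg.safetensors": 2130.0,
    "s3gen.safetensors": 1060.0,
    "tokenizer.json": 0.0255,
    "conds.pt": 0.107,
}

# Convert the total from megabytes to gigabytes.
total_gb = sum(checkpoint_sizes_mb.values()) / 1000
print(f"Approximate download: {total_gb:.1f} GB")
```

With these numbers the total comes to roughly 3.2 GB, almost all of it in the two largest checkpoints.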

In [None]:
# Launch the Gradio app
!/content/.chatterbox/bin/python colab_gradio_app.py

  from pkg_resources import resource_filename
🎯 Using device: cuda
🚀 Starting Chatterbox Gradio App...
📱 This may take a few minutes on first run (downloading models)
* Running on local URL:  http://0.0.0.0:7860
* Running on public URL: https://1d91165e994925ad1d.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
🔄 Loading TTS model...
ve.safetensors: 100% 5.70M/5.70M [00:01<00:00, 2.91MB/s]
t3_cfg.safetensors: 100% 2.13G/2.13G [01:49<00:00, 19.5MB/s]
s3gen.safetensors: 100% 1.06G/1.06G [01:44<00:00, 10.1MB/s]
tokenizer.json: 25.5kB [00:00, 69.2MB/s]
conds.pt: 100% 107k/107k [00:00<00:00, 258kB/s]
loaded PerthNet (Implicit) at step 250,000
✅ TTS model loaded successfully!
TTS Error: 
Traceback (most recent call last):
  File "/content/.chatterbox/lib/python3.11/site-packages/librosa/core/audio.py", line 176, in load
  

## 📖 Usage Instructions

### Text-to-Speech (TTS)
1. **Enter text** you want to synthesize (max 300 characters)
2. **Upload reference audio** (optional) to clone a specific voice
3. **Adjust parameters:**
   - **Exaggeration**: Controls emotion intensity (0.5 = neutral)
   - **CFG/Pace**: Controls generation pace and quality
   - **Temperature**: Controls randomness in generation
4. **Click Generate** to create speech
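
Text longer than the 300-character limit in step 1 has to be split before synthesis. The following is a minimal sketch of one way to do that, breaking at sentence boundaries where possible. The function name `split_text` is hypothetical, and this is not necessarily how the Gradio app handles long input.

```python
import re

MAX_CHARS = 300  # per-request limit stated in step 1 above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting audio concatenated.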

### Voice Conversion (VC)
1. **Upload source audio** you want to convert
2. **Upload target voice** (optional) to specify the target voice
3. **Click Submit** to convert the voice
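
The same steps can be scripted without the web interface. The sketch below assumes the `chatterbox.vc.ChatterboxVC` class and its `generate(audio=..., target_voice_path=...)` call as used in the upstream example script; treat those names as assumptions and check them against the installed package. The wrapper `convert_voice` and the default output file name are hypothetical.

```python
from pathlib import Path


def convert_voice(source_path, target_voice_path=None,
                  out_path="vc_output.wav", device="cuda"):
    """Convert the voice in source_path, optionally toward target_voice_path."""
    # Validate inputs first so a bad path fails before the model is loaded.
    for path in (source_path, target_voice_path):
        if path is not None and not Path(path).is_file():
            raise FileNotFoundError(f"Audio file not found: {path}")

    # Imported lazily: these packages only exist where chatterbox-tts is installed.
    import torchaudio as ta
    from chatterbox.vc import ChatterboxVC  # assumed module path

    model = ChatterboxVC.from_pretrained(device)
    wav = model.generate(audio=source_path, target_voice_path=target_voice_path)
    ta.save(out_path, wav, model.sr)
    return out_path
```

Like the rest of the Chatterbox code, this has to run with the interpreter of the environment where `chatterbox-tts` is installed.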

### Tips for Best Results
- Use clear, high-quality reference audio (3-10 seconds)
- For dramatic speech: lower CFG (~0.3) + higher exaggeration (~0.7)
- For natural speech: keep default settings (exaggeration=0.5, CFG=0.5)
- Reference audio should match the desired speaking style
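
The two parameter combinations above can be kept as named presets. This is a sketch only: the preset names and the helper `tts_kwargs` are hypothetical, and the keyword names `exaggeration`, `cfg_weight`, and `audio_prompt_path` are assumed to match the parameters of `ChatterboxTTS.generate`.

```python
# Values taken from the tips above; the preset names are hypothetical.
PRESETS = {
    "natural": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "dramatic": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def tts_kwargs(preset="natural", reference_audio=None):
    """Build keyword arguments for a TTS call from a named preset."""
    if preset not in PRESETS:
        raise ValueError(f"Unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    kwargs = dict(PRESETS[preset])  # copy so the preset itself is not mutated
    if reference_audio is not None:
        kwargs["audio_prompt_path"] = reference_audio  # assumed keyword name
    return kwargs
```

Under these assumptions, a call would look like `wav = model.generate(text, **tts_kwargs("dramatic"))`.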

## 🎵 Quick Test Examples

Want to test the models quickly? Run these examples:

In [None]:
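# Quick TTS test
# Note: this cell runs in the notebook kernel, not in the .chatterbox virtual
# environment, so it only works if chatterbox-tts is also installed for the
# kernel's Python. It reuses `device` from the Device Configuration cell.
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS
from IPython.display import Audio, display

print("Loading TTS model...")
model = ChatterboxTTS.from_pretrained(device=device)

text = "Hello! This is Chatterbox TTS running in Google Colab. How does it sound?"
print(f"Generating speech for: '{text}'")

# Generate with the default voice and save at the model's sample rate
wav = model.generate(text)
ta.save("test_output.wav", wav, model.sr)

print("✅ TTS test complete! Audio saved as 'test_output.wav'")

# Play the audio in Colab
display(Audio("test_output.wav"))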

## 🔧 Troubleshooting

### Common Issues:

1. **Out of Memory Error**:
   - Restart runtime: `Runtime → Restart Runtime`
   - Use shorter text inputs
   - Try CPU mode if GPU memory is insufficient

2. **Model Download Issues**:
   - Check internet connection
   - Restart and try again
   - Models are ~3.2 GB total

3. **Audio Quality Issues**:
   - Use high-quality reference audio (16kHz+)
   - Keep reference audio 3-10 seconds long
   - Adjust exaggeration and CFG parameters

4. **Gradio Interface Not Loading**:
   - Wait for model initialization to complete
   - Check the public URL in the output
   - Try refreshing the browser
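
The reference-audio recommendations in item 3 can be checked before uploading a clip. The sketch below uses only the standard-library `wave` module, so it assumes an uncompressed PCM WAV file; the function name `check_reference_wav` and its thresholds (taken from the list above) are illustrative.

```python
import wave


def check_reference_wav(path, min_seconds=3.0, max_seconds=10.0, min_rate=16000):
    """Return a list of warnings for an uncompressed PCM WAV reference clip."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        duration = wav_file.getnframes() / rate

    problems = []
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    if not (min_seconds <= duration <= max_seconds):
        problems.append(
            f"duration {duration:.1f} s is outside {min_seconds}-{max_seconds} s"
        )
    return problems
```

An empty list means the clip meets both recommendations. Other formats such as MP3 would need a different reader.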

### Performance Tips:
- GPU: ~10-30 seconds per generation
- CPU: ~1-3 minutes per generation
- Shorter texts generate faster
- First generation takes longer (model loading)
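
To see how these figures apply to a particular runtime, individual steps can be timed. Below is a small sketch using only the standard library; the context manager name `timed` is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print how long the enclosed block took and optionally record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.1f} s")
        if results is not None:
            results[label] = elapsed
```

Wrapping the model load and each generation call separately, for example `with timed("first generation"):`, shows how much of the first run is spent loading the model.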

## 🎉 Enjoy Chatterbox!

You now have a fully functional TTS and Voice Conversion system running in Google Colab!

### Links:
- 🏠 [Chatterbox GitHub](https://github.com/resemble-ai/chatterbox)
- 🤗 [Hugging Face Space](https://huggingface.co/spaces/ResembleAI/Chatterbox)
- 🎵 [Demo Samples](https://resemble-ai.github.io/chatterbox_demopage/)
- 💬 [Discord Community](https://discord.gg/rJq9cRJBJ6)

### Made with ❤️ by [Resemble AI](https://resemble.ai)

---
*Remember to use this technology responsibly and ethically!*