# 🎤 ChatterBox TTS - Simple & Reliable Colab Edition

**A streamlined approach to running ResembleAI's ChatterBox TTS in Google Colab**

## ✨ Features
- 🎯 **Zero-shot TTS**: Generate speech from any text
- 🎭 **Voice Cloning**: Clone voices from audio samples
- 🎨 **Emotion Control**: Adjust expressiveness and intensity
- 🚀 **GPU Acceleration**: Fast generation with Colab's GPUs
- 🌐 **Web Interface**: Beautiful Gradio UI
- 🔧 **Simplified Setup**: Works with Colab's existing environment

## 🚀 Quick Start
1. **Enable GPU**: Runtime → Change runtime type → GPU
2. **Run all cells** below in order
3. **Access the interface** through the Gradio link
4. **Start generating speech**!

---

## 📦 Step 1: Environment Setup & Dependencies

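This step installs ChatterBox TTS on top of Colab's preinstalled PyTorch. Note that the `chatterbox-tts` package declares its own pinned dependencies, which can include a specific PyTorch version, so pip may replace some preinstalled packages. If that happens, restart the runtime (Runtime → Restart runtime) before running Step 2.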

In [None]:
import os
import sys
import subprocess
import importlib

print("🔍 Checking Colab environment...")
print(f"Python version: {sys.version}")

# Check if we're in Colab
try:
    import google.colab
    IN_COLAB = True
    print("✅ Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("⚠️ Not running in Colab - some features may not work")

# Check existing PyTorch installation
try:
    import torch
    print(f"✅ PyTorch {torch.__version__} already installed")
    print(f"🎮 CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"📱 GPU: {torch.cuda.get_device_name(0)}")
        print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
except ImportError:
    print("❌ PyTorch not found - installing...")
    !pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

print("\n🔧 Installing core dependencies...")

In [None]:
# Install essential packages that work well with Colab
packages_to_install = [
    "gradio>=4.0.0",
    "soundfile",
    "librosa",
    "numpy>=1.24.0",
    "transformers>=4.40.0",
    "accelerate",
    "safetensors",
    "omegaconf",
    "einops"
]

print("📦 Installing compatible packages...")
for package in packages_to_install:
    try:
        subprocess.run([sys.executable, "-m", "pip", "install", package], 
                      check=True, capture_output=True, text=True)
        print(f"✅ {package}")
    except subprocess.CalledProcessError as e:
        print(f"⚠️ {package} - {e.stderr.strip()[:100]}...")

print("\n🎤 Installing ChatterBox TTS...")
# Try multiple installation methods for ChatterBox TTS
chatterbox_installed = False

# Method 1: Try PyPI first
try:
    subprocess.run([sys.executable, "-m", "pip", "install", "chatterbox-tts"], 
                  check=True, capture_output=True, text=True)
    print("✅ ChatterBox TTS installed from PyPI")
    chatterbox_installed = True
except subprocess.CalledProcessError:
    print("⚠️ PyPI installation failed, trying GitHub...")

# Method 2: Try GitHub if PyPI fails
if not chatterbox_installed:
    try:
        subprocess.run([sys.executable, "-m", "pip", "install", 
                       "git+https://github.com/resemble-ai/chatterbox.git"], 
                      check=True, capture_output=True, text=True)
        print("✅ ChatterBox TTS installed from GitHub")
        chatterbox_installed = True
    except subprocess.CalledProcessError:
        print("❌ GitHub installation also failed")

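# Method 3: Clone the repository and install it from source if both fail
if not chatterbox_installed:
    print("🔧 Attempting manual installation...")
    if not os.path.isdir("/tmp/chatterbox"):  # reuse a clone left over from a previous run
        subprocess.run(["git", "clone", "https://github.com/resemble-ai/chatterbox.git", "/tmp/chatterbox"])
    result = subprocess.run([sys.executable, "-m", "pip", "install", "-e", "/tmp/chatterbox"],
                            capture_output=True, text=True)
    chatterbox_installed = result.returncode == 0  # only report success if pip actually succeeded
    print("✅ ChatterBox TTS installed from source" if chatterbox_installed else "❌ Manual installation failed")

if chatterbox_installed:
    print("\n🎉 Installation complete!")
else:
    print("\n❌ ChatterBox TTS could not be installed; see the messages above.")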
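The install methods above form a fallback chain: try each source in order, stop at the first one that works, and only report success when a command actually exited cleanly. A minimal sketch of that pattern, assuming the same pip commands as Methods 1 and 2; `install_first_available`, `run_quietly` and `INSTALL_ATTEMPTS` are hypothetical helper names, not part of ChatterBox.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence, Tuple

# (label, command) pairs tried in order; these mirror Methods 1 and 2 above.
INSTALL_ATTEMPTS: List[Tuple[str, List[str]]] = [
    ("PyPI", [sys.executable, "-m", "pip", "install", "chatterbox-tts"]),
    ("GitHub", [sys.executable, "-m", "pip", "install",
                "git+https://github.com/resemble-ai/chatterbox.git"]),
]


def run_quietly(cmd: Sequence[str]) -> bool:
    """Run a command with captured output; True only if it exits with status 0."""
    return subprocess.run(list(cmd), capture_output=True, text=True).returncode == 0


def install_first_available(
    attempts: Sequence[Tuple[str, Sequence[str]]],
    runner: Callable[[Sequence[str]], bool] = run_quietly,
) -> Optional[str]:
    """Try each (label, command) in order and stop at the first success.

    Returns the label of the attempt that succeeded, or None if every attempt
    failed, so the caller never has to assume success.
    """
    for label, cmd in attempts:
        if runner(cmd):
            return label
    return None
```

In the notebook this would be called as `source = install_first_available(INSTALL_ATTEMPTS)` followed by `chatterbox_installed = source is not None`. The injectable `runner` keeps the ordering logic testable without touching the network.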

## 🧪 Step 2: Test Installation & Compatibility

Let's verify everything is working correctly.

In [None]:
def test_import(module_name, friendly_name=None):
    """Test if a module can be imported successfully"""
    if friendly_name is None:
        friendly_name = module_name
    
    try:
        module = importlib.import_module(module_name)
        version = getattr(module, '__version__', 'unknown')
        print(f"✅ {friendly_name}: {version}")
        return True, module
    except Exception as e:
        print(f"❌ {friendly_name}: {str(e)[:100]}...")
        return False, None

print("🔍 Testing all imports...")
print("=" * 50)

# Test core dependencies
success_count = 0
total_tests = 0

modules_to_test = [
    ("torch", "PyTorch"),
    ("torchaudio", "TorchAudio"),
    ("transformers", "Transformers"),
    ("gradio", "Gradio"),
    ("soundfile", "SoundFile"),
    ("librosa", "Librosa"),
    ("numpy", "NumPy")
]

for module_name, friendly_name in modules_to_test:
    success, _ = test_import(module_name, friendly_name)
    if success:
        success_count += 1
    total_tests += 1

# Test ChatterBox TTS specifically
print("\n🎤 Testing ChatterBox TTS...")
try:
    from chatterbox.tts import ChatterboxTTS
    print("✅ ChatterBox TTS: Import successful")
    success_count += 1
except Exception as e:
    print(f"❌ ChatterBox TTS: {str(e)[:100]}...")
total_tests += 1

# Test GPU availability
print("\n🎮 GPU Status:")
try:
    import torch
    if torch.cuda.is_available():
        print(f"✅ CUDA available: {torch.cuda.get_device_name(0)}")
        print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
        print(f"🔧 CUDA Version: {torch.version.cuda}")
    else:
        print("⚠️ CUDA not available - will use CPU (slower)")
except Exception:
    print("❌ Could not check GPU status")

print("\n" + "=" * 50)
print(f"📊 Test Results: {success_count}/{total_tests} passed")

if success_count == total_tests:
    print("🎉 All tests passed! Ready to proceed.")
elif success_count == total_tests - 1:
    print("⚠️ One check failed. If it is ChatterBox TTS, the interface in Step 3 will not work; otherwise it may still run.")
else:
    print("⚠️ Several checks failed. Expect errors in Step 3.")
    print("💡 Try restarting the runtime (Runtime → Restart runtime) and running all cells again.")
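The checks above read each module's `__version__`, which some packages do not define (hence `unknown`). An alternative sketch reads installed distribution metadata with the standard library's `importlib.metadata`. It takes pip distribution names, not import names: `chatterbox-tts` is the PyPI name used in Step 1, while the import name is `chatterbox`. `installed_versions` is a hypothetical helper, and it assumes a GitHub or source install registers under the same distribution name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each pip distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(
    ["torch", "torchaudio", "transformers", "gradio", "chatterbox-tts"]
).items():
    print(f"{'✅' if version else '❌'} {name}: {version or 'not installed'}")
```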

## 🚀 Step 3: Launch ChatterBox TTS Interface

Create and launch the Gradio web interface.

In [None]:
import gradio as gr
import torch
import torchaudio
import numpy as np
import tempfile
import os
from pathlib import Path

# Global variables for model
model = None
model_loaded = False

def load_model():
    """Load the ChatterBox TTS model"""
    global model, model_loaded
    
    if model_loaded:
        return "✅ Model already loaded!"
    
    try:
        print("🔄 Loading ChatterBox TTS model...")
        from chatterbox.tts import ChatterboxTTS
        
        # Determine device
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"🎮 Using device: {device}")
        
        # Load model
        model = ChatterboxTTS.from_pretrained(device=device)
        model_loaded = True
        
        return f"✅ Model loaded successfully on {device}!"
        
    except Exception as e:
        error_msg = f"❌ Failed to load model: {str(e)}"
        print(error_msg)
        return error_msg

def generate_speech(text, audio_file=None, exaggeration=0.5, cfg_weight=0.5):
    """Generate speech from text with optional voice cloning"""
    global model, model_loaded
    
    if not model_loaded or model is None:
        return None, "❌ Please load the model first!"
    
    if not text.strip():
        return None, "❌ Please enter some text to synthesize!"
    
    try:
        print(f"🎤 Generating speech for: '{text[:50]}...'")
        
        # Generate speech
        if audio_file is not None:
            # Voice cloning mode
            print("🎭 Using voice cloning mode")
            wav = model.generate(
                text, 
                audio_prompt_path=audio_file,
                exaggeration=exaggeration,
                cfg_weight=cfg_weight
            )
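        else:
            # Standard TTS mode (built-in default voice).
            # Assumption based on the current chatterbox source: generate() keeps the last
            # reference clip's conditioning on the model, so after a voice-cloning run this
            # branch may keep using the cloned voice until the model is reloaded.
            print("🎯 Using standard TTS mode")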
            wav = model.generate(
                text,
                exaggeration=exaggeration,
                cfg_weight=cfg_weight
            )
        
        # Save to temporary file
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
            torchaudio.save(tmp_file.name, wav, model.sr)
            
        success_msg = f"✅ Generated {wav.shape[1]/model.sr:.1f}s of audio"
        print(success_msg)
        
        return tmp_file.name, success_msg
        
    except Exception as e:
        error_msg = f"❌ Generation failed: {str(e)}"
        print(error_msg)
        return None, error_msg

print("🎨 Creating Gradio interface...")
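`generate_speech()` hands Gradio a file path rather than raw audio, and its status message computes duration as number of samples divided by sample rate. A standard-library sketch of those two steps, useful for checking the file handoff without loading the model. It is a stand-in, not what `torchaudio.save` does internally: it writes 16-bit PCM, whereas torchaudio may pick a different sample encoding for float tensors. `save_wav_16bit` is a hypothetical name, and 24 kHz is only an example rate.

```python
import math
import tempfile
import wave
from typing import Sequence, Tuple


def save_wav_16bit(samples: Sequence[float], sample_rate: int) -> Tuple[str, float]:
    """Write mono float samples in [-1.0, 1.0] to a temporary 16-bit PCM WAV file.

    Returns (path, duration_in_seconds). Like generate_speech(), the file is created
    with delete=False so another component (e.g. Gradio) can read it afterwards.
    """
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # avoid integer overflow on loud samples
        pcm += int(round(clipped * 32767)).to_bytes(2, "little", signed=True)

    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name  # only reserve the name; write after the handle is closed

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))

    # Same arithmetic as wav.shape[1] / model.sr in generate_speech()
    return path, len(samples) / sample_rate


# Example: half a second of a 440 Hz tone at an example rate of 24 kHz
rate = 24000
tone = [0.2 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
path, seconds = save_wav_16bit(tone, rate)
print(path, f"{seconds:.1f}s")
```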

In [None]:
# Create the Gradio interface
with gr.Blocks(title="ChatterBox TTS - Simple Colab Edition", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🎤 ChatterBox TTS - Simple & Reliable
    
    **State-of-the-art Text-to-Speech and Voice Cloning**
    
    Generate natural-sounding speech from text, or clone voices from audio samples!
    """)
    
    with gr.Row():
        with gr.Column():
            # Model loading section
            gr.Markdown("### 🔧 Model Setup")
            load_btn = gr.Button("🚀 Load ChatterBox TTS Model", variant="primary", size="lg")
            load_status = gr.Textbox(label="Status", interactive=False, value="Click to load model...")
            
            # Text input
            gr.Markdown("### 📝 Text Input")
            text_input = gr.Textbox(
                label="Text to synthesize",
                placeholder="Enter the text you want to convert to speech...",
                lines=3,
                value="Hello! This is ChatterBox TTS, a state-of-the-art text-to-speech model. I can generate natural-sounding speech from any text you provide."
            )
            
            # Voice cloning section
            gr.Markdown("### 🎭 Voice Cloning (Optional)")
            audio_input = gr.Audio(
                label="Reference audio for voice cloning",
                type="filepath",
                sources=["upload", "microphone"]
            )
            gr.Markdown("*Upload 3-10 seconds of clear speech to clone a voice*")
            
            # Advanced settings
            with gr.Accordion("⚙️ Advanced Settings", open=False):
                exaggeration = gr.Slider(
                    minimum=0.0,
                    maximum=1.0,
                    value=0.5,
                    step=0.1,
                    label="Exaggeration (emotion intensity)",
                    info="Higher values = more expressive speech"
                )
                cfg_weight = gr.Slider(
                    minimum=0.0,
                    maximum=1.0,
                    value=0.5,
                    step=0.1,
                    label="CFG Weight (speech pacing)",
                    info="Lower values = slower, more deliberate speech"
                )
        
        with gr.Column():
            # Generation section
            gr.Markdown("### 🎵 Generated Audio")
            generate_btn = gr.Button("🎤 Generate Speech", variant="primary", size="lg")
            generation_status = gr.Textbox(label="Generation Status", interactive=False)
            
            audio_output = gr.Audio(
                label="Generated Speech",
                type="filepath",
                interactive=False
            )
            
            # Tips section
            gr.Markdown("""
            ### 💡 Tips for Best Results
            
            **General Use:**
            - Default settings (0.5, 0.5) work well for most text
            - If the reference voice speaks quickly, try lowering CFG weight to ~0.3
            
            **Expressive Speech:**
            - Lower CFG weight (~0.3) + higher exaggeration (~0.7+)
            - Higher exaggeration tends to speed up speech; a lower CFG weight slows it back down
            
            **Voice Cloning:**
            - Use 3-10 seconds of clear, high-quality audio
            - Single speaker, minimal background noise
            - The model will mimic the reference voice's characteristics
            """)
    
    # Event handlers
    load_btn.click(
        fn=load_model,
        outputs=load_status
    )
    
    generate_btn.click(
        fn=generate_speech,
        inputs=[text_input, audio_input, exaggeration, cfg_weight],
        outputs=[audio_output, generation_status]
    )

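# Launch the interface
print("🚀 Launching Gradio interface...")
gr.close_all()  # stop apps from earlier runs of this cell so port 7860 is free again
demo.launch(
    share=True,       # also create a temporary public *.gradio.live link
    debug=True,       # keep the cell running and print errors in its output; interrupt the cell to stop the app
    show_error=True,  # show Python errors in the web UI as well
    server_port=7860
)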

## 📖 Usage Guide

### 🚀 Getting Started
1. **Load Model**: Click "🚀 Load ChatterBox TTS Model" (first time takes ~2-3 minutes)
2. **Enter Text**: Type your text in the input box
3. **Generate**: Click "🎤 Generate Speech"
4. **Download**: Use the download button on the output audio player to save your generated speech

### 🎭 Voice Cloning
1. **Upload Reference**: Use a 3-10 second audio clip of the target voice
2. **Quality Matters**: Clear speech, minimal background noise
3. **Generate**: The output will mimic the reference voice characteristics
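The 3-10 second guideline can be checked before uploading. A minimal sketch using the standard library's `wave` module, which reads uncompressed WAV only (MP3 and other formats would need a library such as librosa, installed in Step 1). The thresholds come from the guideline above; `reference_clip_warnings` and the demo helper `write_silent_wav` are hypothetical names, and the check says nothing about speech clarity or background noise.

```python
import os
import tempfile
import wave
from typing import List


def reference_clip_warnings(path: str, min_s: float = 3.0, max_s: float = 10.0) -> List[str]:
    """Return warnings about a WAV reference clip's length; an empty list means it is in range."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    if duration < min_s:
        return [f"clip is {duration:.1f}s; aim for at least {min_s:.0f}s of clear speech"]
    if duration > max_s:
        return [f"clip is {duration:.1f}s; consider trimming it to {max_s:.0f}s or less"]
    return []


def write_silent_wav(path: str, seconds: float, rate: int = 16000) -> None:
    """Demo helper: write mono 16-bit silence so the check can be tried without a recording."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp_dir:
    for length in (1.0, 5.0, 12.0):
        clip = os.path.join(tmp_dir, f"clip_{length:.0f}s.wav")
        write_silent_wav(clip, length)
        print(length, reference_clip_warnings(clip) or "OK")
```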

### ⚙️ Parameter Guide
- **Exaggeration (0.0-1.0)**: Controls emotion intensity and expressiveness
  - `0.0-0.3`: Calm, neutral speech
  - `0.4-0.6`: Natural expressiveness (recommended)
  - `0.7-1.0`: Highly expressive, dramatic speech

- **CFG Weight (0.0-1.0)**: Classifier-free guidance weight; controls speech pacing and how closely the output follows the text
  - `0.0-0.3`: Slower, more deliberate speech
  - `0.4-0.6`: Natural pacing (recommended)
  - `0.7-1.0`: Faster, more direct speech
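The ranges above can be turned into a small lookup, for instance to show a readable description next to the sliders. A minimal sketch: `describe_settings` and `band_index` are hypothetical helpers, they only accept the 0.0-1.0 slider range used in this notebook, and values that fall between the documented bands (such as 0.35) are assigned to the middle band by choice.

```python
from typing import Dict, Tuple

# Band labels copied from the Parameter Guide, lowest range first.
EXAGGERATION_BANDS: Tuple[str, str, str] = (
    "Calm, neutral speech",
    "Natural expressiveness (recommended)",
    "Highly expressive, dramatic speech",
)
CFG_WEIGHT_BANDS: Tuple[str, str, str] = (
    "Slower, more deliberate speech",
    "Natural pacing (recommended)",
    "Faster, more direct speech",
)


def band_index(value: float) -> int:
    """Return 0 for 0.0-0.3, 1 for 0.4-0.6 (and values in between), 2 for 0.7-1.0."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a slider value in [0.0, 1.0], got {value}")
    v = round(value, 2)  # absorb float noise such as 0.1 + 0.2 == 0.30000000000000004
    if v <= 0.3:
        return 0
    if v < 0.7:
        return 1
    return 2


def describe_settings(exaggeration: float, cfg_weight: float) -> Dict[str, str]:
    """Describe slider values using the Parameter Guide's wording."""
    return {
        "exaggeration": EXAGGERATION_BANDS[band_index(exaggeration)],
        "cfg_weight": CFG_WEIGHT_BANDS[band_index(cfg_weight)],
    }


print(describe_settings(0.7, 0.3))
```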

### 🎯 Optimization Tips
- **For Podcasts/Narration**: `exaggeration=0.4, cfg_weight=0.5`
- **For Character Voices**: `exaggeration=0.7, cfg_weight=0.3`
- **For News/Professional**: `exaggeration=0.3, cfg_weight=0.6`
- **For Audiobooks**: `exaggeration=0.5, cfg_weight=0.4`
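These presets map directly onto the `exaggeration` and `cfg_weight` keyword arguments that `generate_speech()` passes to `model.generate()`. One way to keep them in one place is a dictionary plus a small accessor; the preset keys and `preset_kwargs` are hypothetical names, and the values are copied from the list above.

```python
from typing import Dict

# Values copied from the Optimization Tips list; the keys are hypothetical names.
PRESETS: Dict[str, Dict[str, float]] = {
    "podcast": {"exaggeration": 0.4, "cfg_weight": 0.5},
    "character": {"exaggeration": 0.7, "cfg_weight": 0.3},
    "news": {"exaggeration": 0.3, "cfg_weight": 0.6},
    "audiobook": {"exaggeration": 0.5, "cfg_weight": 0.4},
}


def preset_kwargs(name: str, **overrides: float) -> Dict[str, float]:
    """Return a new kwargs dict for a preset, with optional per-call overrides."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    unknown = set(overrides) - set(PRESETS[name])
    if unknown:
        raise TypeError(f"unsupported overrides: {sorted(unknown)}")
    return {**PRESETS[name], **overrides}


print(preset_kwargs("audiobook"))
print(preset_kwargs("character", cfg_weight=0.25))
```

In the notebook this could be used as `model.generate(text, **preset_kwargs("news"))`. Returning a fresh dictionary means per-call overrides never modify the shared presets.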

---

## 🔧 Troubleshooting

### ❌ Common Issues & Solutions

**"Model failed to load"**
- ✅ Restart runtime: Runtime → Restart Runtime
- ✅ Re-run all cells from Step 1
- ✅ Check GPU is enabled: Runtime → Change runtime type → GPU

**"Generation failed"**
- ✅ Make sure model is loaded first
- ✅ Check text input is not empty
- ✅ Try shorter text (< 200 characters)
- ✅ Restart runtime if memory issues occur
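Rather than shortening the text by hand, long input can be split into pieces under the suggested 200-character limit and generated one piece at a time. A minimal sketch that breaks at sentence-ending punctuation and falls back to word boundaries for very long sentences. The regex sentence split is a naive heuristic (it also splits after abbreviations such as "Dr."), the 200-character default is this notebook's suggestion rather than a documented model limit, and `chunk_text` is a hypothetical helper.

```python
import re
from typing import List


def _split_long(sentence: str, max_chars: int) -> List[str]:
    """Break one over-long sentence at the last space before max_chars (mid-word if there is none)."""
    parts: List[str] = []
    while len(sentence) > max_chars:
        cut = sentence.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars
        parts.append(sentence[:cut].rstrip())
        sentence = sentence[cut:].lstrip()
    if sentence:
        parts.append(sentence)
    return parts


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for piece in _split_long(sentence, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)  # current is never empty here
                current = piece
    if current:
        chunks.append(current)
    return chunks


sample = ("ChatterBox can read long passages. " * 8).strip()
for i, chunk in enumerate(chunk_text(sample, max_chars=80)):
    print(i, len(chunk), chunk)
```

Each chunk can then be passed to `model.generate()` separately. Since `generate_speech()` treats the result as a tensor with samples along dimension 1 (it reads `wav.shape[1]`), the pieces could be joined with `torch.cat(wavs, dim=1)`.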

**"Slow generation"**
- ✅ Ensure GPU is enabled and detected
- ✅ Check Step 2 shows CUDA available
- ✅ Free GPU memory held by anything else in this runtime (restart the runtime if unsure)

**"Voice cloning not working"**
- ✅ Use 3-10 second audio clips
- ✅ Ensure clear speech, single speaker
- ✅ Try WAV format for best results
- ✅ Check audio file uploaded successfully

### 🆘 Emergency Reset
If nothing works:
1. Runtime → Restart Runtime
2. Edit → Clear all outputs
3. Run all cells again from the beginning

### 📊 Performance Notes
- **First load**: ~2-3 minutes (downloads the model checkpoints from Hugging Face, a few GB in total)
- **Generation time**: 5-15 seconds per sentence (GPU)
- **Memory usage**: ~2-3GB GPU memory
- **Audio quality**: 24 kHz output, watermarked for responsible AI use
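To tell whether generation speed is in line with the figures above, it helps to normalize by audio length: the real-time factor (RTF) is generation time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. A minimal sketch; `time_generation` and `fake_generate` are hypothetical helpers, and the callable passed in must return the number of samples it generated.

```python
import time
from typing import Callable, Tuple


def time_generation(generate: Callable[[], int], sample_rate: int) -> Tuple[float, float, float]:
    """Time one generation call.

    `generate` must return the number of audio samples it produced.
    Returns (elapsed_seconds, audio_seconds, real_time_factor).
    """
    start = time.perf_counter()
    num_samples = generate()
    elapsed = time.perf_counter() - start
    if num_samples <= 0:
        raise ValueError("generate() produced no audio; RTF is undefined")
    audio_seconds = num_samples / sample_rate
    return elapsed, audio_seconds, elapsed / audio_seconds


def fake_generate() -> int:
    """Stand-in for the model: sleeps briefly and 'produces' 2 s of audio at 24 kHz."""
    time.sleep(0.05)
    return 48000


elapsed, audio_s, rtf = time_generation(fake_generate, 24000)
print(f"{elapsed:.2f}s to generate {audio_s:.1f}s of audio (RTF {rtf:.3f})")
```

With the real model, the call would look like `time_generation(lambda: model.generate(text).shape[1], model.sr)`, reusing the `model.generate()` call and `model.sr` attribute from Step 3.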

---

## 🔗 Resources

- **Original Model**: [ResembleAI ChatterBox](https://github.com/resemble-ai/chatterbox)
- **Demo Page**: [Official Samples](https://resemble-ai.github.io/chatterbox_demopage/)
- **Hugging Face**: [Model Card](https://huggingface.co/ResembleAI/chatterbox)
- **Discord**: [Join Community](https://discord.gg/rJq9cRJBJ6)

**Built with ❤️ for the open-source community**

*This notebook uses a simplified approach that works with Colab's existing environment for maximum reliability.*