# 🎙️ ChatterBox TTS - Kaggle Demo

Welcome to the **ChatterBox TTS** demonstration notebook! This notebook showcases Resemble AI's state-of-the-art open-source text-to-speech model.

## 🌟 Key Features
- **Zero-shot TTS**: Generate speech from any text without training
- **Voice Cloning**: Clone voices from short audio samples
- **Emotion Control**: Adjust speech intensity and exaggeration
- **High Quality**: Resemble AI reports that listeners preferred it over leading commercial TTS systems in side-by-side evaluations
- **MIT Licensed**: Free for commercial use

## 📋 What You'll Learn
1. How to set up ChatterBox TTS in Kaggle
2. Basic text-to-speech generation
3. Advanced parameter tuning
4. Voice cloning techniques
5. Error handling and troubleshooting

## 🔧 Environment Setup

First, let's install all the required dependencies. This may take a few minutes on first run.

In [None]:
# Install ChatterBox TTS and dependencies
import subprocess
import sys

def install_package(package):
    """Install a package with proper error handling"""
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"✅ Successfully installed {package}")
    except subprocess.CalledProcessError as e:
        print(f"❌ Failed to install {package}: {e}")
        return False
    return True

# Core dependencies
packages = [
    "chatterbox-tts",
    "librosa",
    "IPython"
]

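print("🚀 Installing ChatterBox TTS and dependencies...")
# Track failures so the final message reflects what actually happened
failed = [package for package in packages if not install_package(package)]

if failed:
    print(f"\n⚠️ Installation finished with errors: {', '.join(failed)}")
else:
    print("\n✨ Installation complete!")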
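
A successful `pip install` does not always mean the module can be imported in the running kernel, for example after a dependency conflict. Below is a minimal sketch of a post-install check. It assumes the import names `chatterbox`, `librosa`, and `IPython` correspond to the packages installed above, and `missing_modules` is a hypothetical helper, not part of ChatterBox.

```python
import importlib.util

def missing_modules(module_names):
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Import names assumed to correspond to the pip packages installed above
missing = missing_modules(["chatterbox", "librosa", "IPython"])
if missing:
    print(f"Missing modules: {missing}")
else:
    print("All modules are importable")
```

If anything is reported missing, re-running the install cell or restarting the kernel is a reasonable first step.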

## 📦 Import Libraries & Device Detection

Let's import the necessary libraries and detect the best available compute device.

In [None]:
import torch
import torchaudio
import librosa
import numpy as np
import time
import warnings
from pathlib import Path
from IPython.display import Audio, display

# Import ChatterBox TTS
try:
    from chatterbox.tts import ChatterboxTTS
    print("✅ ChatterBox TTS imported successfully!")
except ImportError as e:
    print(f"❌ Failed to import ChatterBox TTS: {e}")
    print("Please ensure the installation completed successfully.")

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Device detection with fallback
def detect_device():
    """Detect the best available device for inference"""
    if torch.cuda.is_available():
        device = "cuda"
        gpu_name = torch.cuda.get_device_name(0)
        gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
        print(f"🚀 CUDA GPU detected: {gpu_name} ({gpu_memory:.1f}GB)")
    elif torch.backends.mps.is_available():
        device = "mps"
        print("🍎 Apple MPS detected")
    else:
        device = "cpu"
        print("💻 Using CPU (GPU not available)")
    
    return device

device = detect_device()
print(f"\n🎯 Selected device: {device}")

## 🤖 Load ChatterBox TTS Model

Now let's load the pre-trained ChatterBox TTS model. This will download the model weights on first run.

In [None]:
def load_chatterbox_model(device):
    """Load ChatterBox TTS model with error handling"""
    try:
        print("📥 Loading ChatterBox TTS model...")
        print("⏳ This may take a few minutes on first run (downloading ~2GB of model weights)")
        
        start_time = time.time()
        model = ChatterboxTTS.from_pretrained(device=device)
        load_time = time.time() - start_time
        
        print(f"✅ Model loaded successfully in {load_time:.1f} seconds!")
        print(f"🎵 Sample rate: {model.sr} Hz")
        
        return model
        
    except Exception as e:
        print(f"❌ Failed to load model: {e}")
        print("\n🔧 Troubleshooting tips:")
        print("1. Ensure you have internet access for model download")
        print("2. Check if you have sufficient disk space (~2GB)")
        print("3. Try restarting the kernel if you encounter memory issues")
        return None

# Load the model
model = load_chatterbox_model(device)

if model is not None:
    print("\n🎉 ChatterBox TTS is ready to generate speech!")
else:
    print("\n⚠️ Model loading failed. Please check the error messages above.")

## 🎤 Basic Text-to-Speech Generation

Let's start with some basic TTS examples using the default voice.

In [None]:
def generate_speech(model, text, save_path=None, **kwargs):
    """Generate speech from text with timing and error handling"""
    if model is None:
        print("❌ Model not loaded. Please run the model loading cell first.")
        return None
    
    try:
        print(f"🎯 Generating speech for: '{text[:50]}{'...' if len(text) > 50 else ''}'")
        
        start_time = time.time()
        wav = model.generate(text, **kwargs)
        generation_time = time.time() - start_time
        
        # Calculate audio duration
        audio_duration = wav.shape[-1] / model.sr
        rtf = generation_time / audio_duration  # Real-time factor
        
        print(f"✅ Generated {audio_duration:.1f}s of audio in {generation_time:.1f}s (RTF: {rtf:.2f}x)")
        
        # Save if path provided
        if save_path:
            torchaudio.save(save_path, wav, model.sr)
            print(f"💾 Saved to: {save_path}")
        
        return wav
        
    except Exception as e:
        print(f"❌ Generation failed: {e}")
        return None

# Sample texts for demonstration
sample_texts = [
    "Hello! Welcome to ChatterBox TTS, the state-of-the-art open source text-to-speech system.",
    "The quick brown fox jumps over the lazy dog. This pangram contains every letter of the alphabet.",
    "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole filled with worms and oozy smells."
]

# Generate speech for the first sample
if model is not None:
    print("🎵 Generating basic TTS example...\n")
    
    wav = generate_speech(model, sample_texts[0], save_path="basic_example.wav")
    
    if wav is not None:
        print("\n🔊 Click play to listen:")
        display(Audio(wav.squeeze().numpy(), rate=model.sr))
else:
    print("⚠️ Please load the model first before generating speech.")
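
`generate_speech` reports a real-time factor (RTF): generation time divided by audio duration, so a value below 1.0 means the audio was produced faster than it plays back. Here is a minimal sketch of that arithmetic with a guard for empty output. The helper name `real_time_factor` and the sample numbers are illustrative, not measurements.

```python
def real_time_factor(generation_seconds, num_samples, sample_rate):
    """Return generation time divided by audio duration, or None for empty audio."""
    if num_samples <= 0 or sample_rate <= 0:
        return None
    audio_seconds = num_samples / sample_rate
    return generation_seconds / audio_seconds

# Illustrative numbers: 48,000 samples at 24,000 Hz is 2.0 s of audio
print(real_time_factor(1.0, 48_000, 24_000))  # 0.5
```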

## 🎛️ Advanced Parameter Tuning

ChatterBox TTS offers several parameters to control the quality and characteristics of the generated speech.

In [None]:
def demonstrate_parameters(model, text):
    """Demonstrate different parameter settings"""
    if model is None:
        print("❌ Model not loaded.")
        return

    print("🎛️ Exploring different parameter settings...\n")

    # Parameter configurations to test
    configs = [
        {"name": "Default", "exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8},
        {"name": "Calm & Controlled", "exaggeration": 0.2, "cfg_weight": 0.7, "temperature": 0.6},
        {"name": "Expressive & Dynamic", "exaggeration": 0.8, "cfg_weight": 0.3, "temperature": 0.9},
        {"name": "Dramatic & Intense", "exaggeration": 1.0, "cfg_weight": 0.2, "temperature": 1.0}
    ]

    for i, config in enumerate(configs):
        print(f"\n📊 Configuration {i+1}: {config['name']}")
        print(f"   Exaggeration: {config['exaggeration']}, CFG Weight: {config['cfg_weight']}, Temperature: {config['temperature']}")

        wav = generate_speech(
            model, text,
            save_path=f"param_demo_{i+1}.wav",
            exaggeration=config['exaggeration'],
            cfg_weight=config['cfg_weight'],
            temperature=config['temperature']
        )

        if wav is not None:
            print(f"🔊 {config['name']} version:")
            display(Audio(wav.squeeze().numpy(), rate=model.sr))

# Demonstrate parameter variations
demo_text = "This is a demonstration of ChatterBox TTS with different parameter settings. Notice how the emotion and intensity change!"

if model is not None:
    demonstrate_parameters(model, demo_text)
else:
    print("⚠️ Please load the model first.")
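
The four configurations above are hand-picked. To explore the settings more systematically, one option is to build a small grid and pass each entry to `generate_speech`. The sketch below assumes the keyword names `exaggeration`, `cfg_weight`, and `temperature` used in this notebook. `build_param_grid` is a hypothetical helper and the values are arbitrary.

```python
from itertools import product

def build_param_grid(exaggerations, cfg_weights, temperatures):
    """Return one keyword-argument dict per combination of the given values."""
    return [
        {"exaggeration": e, "cfg_weight": c, "temperature": t}
        for e, c, t in product(exaggerations, cfg_weights, temperatures)
    ]

# Arbitrary example values: 2 x 2 x 1 = 4 combinations
grid = build_param_grid([0.3, 0.7], [0.3, 0.7], [0.8])
print(len(grid))  # 4

# Each entry can be unpacked as generate_speech(model, text, **params)
for params in grid:
    print(params)
```

Keeping the grid small matters in practice, since every combination triggers a full generation pass.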

## 🎭 Voice Cloning with Audio Prompts

One of ChatterBox's most powerful features is voice cloning from short audio samples.

In [None]:
def create_sample_audio_prompt():
    """Create a sample audio prompt for voice cloning demonstration"""
    print("🎤 Creating sample audio prompt...")

    # Generate a reference audio using the default voice
    if model is not None:
        reference_text = "Hello, this is a reference voice sample for cloning."
        reference_wav = model.generate(reference_text)

        # Save as reference
        torchaudio.save("reference_voice.wav", reference_wav, model.sr)
        print("✅ Sample reference audio created: reference_voice.wav")

        print("\n🔊 Reference voice:")
        display(Audio(reference_wav.squeeze().numpy(), rate=model.sr))

        return "reference_voice.wav"
    return None

def demonstrate_voice_cloning(model, reference_path, target_text):
    """Demonstrate voice cloning with audio prompt"""
    if model is None or not Path(reference_path).exists():
        print("❌ Model not loaded or reference audio not found.")
        return

    print(f"\n🎭 Cloning voice from: {reference_path}")
    print(f"📝 Target text: '{target_text}'")

    try:
        # Generate with voice cloning
        cloned_wav = model.generate(
            target_text,
            audio_prompt_path=reference_path,
            exaggeration=0.5
        )

        # Save cloned audio
        torchaudio.save("cloned_voice.wav", cloned_wav, model.sr)
        print("✅ Voice cloning successful!")

        print("\n🔊 Cloned voice speaking new text:")
        display(Audio(cloned_wav.squeeze().numpy(), rate=model.sr))

    except Exception as e:
        print(f"❌ Voice cloning failed: {e}")

# Voice cloning demonstration
if model is not None:
    print("🎭 Voice Cloning Demonstration\n")

    # Create sample reference audio
    reference_path = create_sample_audio_prompt()

    if reference_path:
        # Clone voice with new text
        new_text = "Now I'm speaking completely different words, but with the same voice characteristics!"
        demonstrate_voice_cloning(model, reference_path, new_text)
else:
    print("⚠️ Please load the model first.")
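
When you swap in your own recording as the reference, it can help to confirm the file is readable and check its length before passing it as `audio_prompt_path`. The sketch below uses Python's standard `wave` module and assumes an integer PCM WAV file; float-encoded WAVs are rejected by `wave`. `wav_duration_seconds` is a hypothetical helper, and the silent test clip exists only to exercise it.

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Return the duration of an integer-PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

# Write one second of 16-bit mono silence to a temporary file to try the helper
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)        # 2 bytes per sample = 16-bit PCM
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(wav_duration_seconds(demo_path))  # 1.0
```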

## 📊 Interactive Text Generation

Try generating speech with your own text input!

In [None]:
def interactive_tts_generation():
    """Interactive TTS generation with user input"""
    if model is None:
        print("❌ Model not loaded. Please run the model loading cell first.")
        return

    print("🎤 Interactive TTS Generation")
    print("Enter your text below and run the cell to generate speech!\n")

    # User input text (modify this to test different texts)
    user_text = "Welcome to the future of text-to-speech technology! ChatterBox TTS brings your words to life with incredible realism and emotion."

    # Customizable parameters
    exaggeration = 0.6  # 0.0 to 1.0+ (higher = more expressive)
    cfg_weight = 0.5    # 0.0 to 1.0 (higher = more controlled)
    temperature = 0.8   # 0.0 to 1.0+ (higher = more varied)

    print(f"📝 Text: '{user_text}'")
    print(f"🎛️ Parameters: exaggeration={exaggeration}, cfg_weight={cfg_weight}, temperature={temperature}")

    # Generate speech
    wav = generate_speech(
        model, user_text,
        save_path="interactive_output.wav",
        exaggeration=exaggeration,
        cfg_weight=cfg_weight,
        temperature=temperature
    )

    if wav is not None:
        print("\n🔊 Generated audio:")
        display(Audio(wav.squeeze().numpy(), rate=model.sr))

        print("\n💡 Tips for better results:")
        print("• Modify the 'user_text' variable above with your own text")
        print("• Adjust 'exaggeration' for more/less emotional speech")
        print("• Increase 'cfg_weight' for more controlled, stable output")
        print("• Adjust 'temperature' for more/less variation in speech")

# Run interactive generation
interactive_tts_generation()

## 🔧 System Diagnostics & Troubleshooting

Check your system status and get help with common issues.

In [None]:
def system_diagnostics():
    """Run system diagnostics to identify potential issues"""
    print("🔍 Running System Diagnostics...\n")

    # Check PyTorch installation
    print(f"🐍 Python version: {sys.version.split()[0]}")
    print(f"🔥 PyTorch version: {torch.__version__}")
    print(f"🎵 TorchAudio version: {torchaudio.__version__}")

    # Check device availability
    print(f"\n💻 Device Information:")
    print(f"   Current device: {device}")
    print(f"   CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"   CUDA version: {torch.version.cuda}")
        print(f"   GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")

    # Check model status
    print(f"\n🤖 Model Status:")
    if model is not None:
        print("   ✅ Model loaded successfully")
        print(f"   📊 Sample rate: {model.sr} Hz")
        print(f"   🎯 Device: {model.device}")
    else:
        print("   ❌ Model not loaded")

    print("\n📋 Common Issues & Solutions:")
    print("\n1. 🚫 'CUDA out of memory' error:")
    print("   • Restart kernel and try again")
    print("   • Use CPU instead: device='cpu'")
    print("   • Generate shorter text segments")

    print("\n2. 🐌 Slow generation on CPU:")
    print("   • This is normal - CPU inference is slower")
    print("   • Consider using GPU if available")
    print("   • Generate shorter texts for faster results")

# Run diagnostics
system_diagnostics()
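
Both troubleshooting lists suggest generating shorter text segments. One way to do that is to split long input at sentence boundaries and generate each piece separately. This is a minimal sketch: `split_into_chunks` is a hypothetical helper and the `max_chars` default of 200 is an arbitrary assumption, not a limit documented by ChatterBox.

```python
import re

def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "First sentence. Second sentence! Third one? Fourth."
print(split_into_chunks(text, max_chars=35))
# ['First sentence. Second sentence!', 'Third one? Fourth.']
```

Each chunk can then be passed to `generate_speech` in turn, and the resulting tensors joined along the last dimension with `torch.cat`.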

## 📁 File Management

List and manage your generated audio files.

In [None]:
import os
from pathlib import Path

def list_generated_files():
    """List all generated audio files"""
    print("📁 Generated Audio Files:\n")

    wav_files = list(Path('.').glob('*.wav'))

    if wav_files:
        for i, file_path in enumerate(wav_files, 1):
            file_size = file_path.stat().st_size / 1024  # KB
            print(f"{i}. {file_path.name} ({file_size:.1f} KB)")

        print(f"\n📊 Total files: {len(wav_files)}")
        total_size = sum(f.stat().st_size for f in wav_files) / 1024
        print(f"📦 Total size: {total_size:.1f} KB")

        print("\n💾 Download Instructions:")
        print("In Kaggle, you can download files by:")
        print("1. Going to the 'Output' tab on the right")
        print("2. Finding your .wav files in the file list")
        print("3. Clicking the download button next to each file")
    else:
        print("No .wav files found. Generate some audio first!")

    return wav_files

# List generated files
generated_files = list_generated_files()
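
Downloading files one at a time gets tedious once several clips have been generated. The sketch below bundles every `.wav` file in a directory into a single archive using only the standard library. `zip_wav_files` and the archive name `generated_audio.zip` are hypothetical choices for illustration.

```python
import zipfile
from pathlib import Path

def zip_wav_files(directory=".", archive_name="generated_audio.zip"):
    """Bundle every .wav file in directory into one zip; return (archive path, file count)."""
    directory = Path(directory)
    archive_path = directory / archive_name
    wav_files = sorted(directory.glob("*.wav"))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for wav_file in wav_files:
            # Store each file under its bare name so the archive has no nested folders
            archive.write(wav_file, arcname=wav_file.name)
    return archive_path, len(wav_files)

archive_path, count = zip_wav_files()
print(f"Archived {count} file(s) into {archive_path}")
```

The archive is written next to the audio files, so it should appear in the same file list as the individual `.wav` files.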

## 🎉 Conclusion & Next Steps

Congratulations! You've successfully explored ChatterBox TTS capabilities.

### 🌟 What You've Learned

✅ **Setup & Installation**: How to install and configure ChatterBox TTS in Kaggle  
✅ **Basic TTS**: Generate speech from text with default settings  
✅ **Parameter Tuning**: Control emotion, stability, and variation in speech  
✅ **Voice Cloning**: Clone voices using audio prompts  
✅ **Troubleshooting**: Handle common issues and optimize performance  

### 🚀 Next Steps

1. **Experiment with Your Own Content**:
   - Try different text styles (poetry, dialogue, technical content)
   - Upload your own reference audio for voice cloning
   - Test various parameter combinations
2. **Integration Ideas**:
   - Build a chatbot with voice responses
   - Create audiobooks from text
   - Generate voiceovers for videos
   - Develop accessibility tools

### 📚 Resources

- **GitHub Repository**: [https://github.com/resemble-ai/chatterbox](https://github.com/resemble-ai/chatterbox)
- **Hugging Face Demo**: [https://huggingface.co/spaces/ResembleAI/Chatterbox](https://huggingface.co/spaces/ResembleAI/Chatterbox)
- **Discord Community**: [https://discord.gg/XqS7RxUp](https://discord.gg/XqS7RxUp)

---

**Happy voice synthesis! 🎤✨**

*Made with ♥️ by [Resemble AI](https://resemble.ai)*