# 🎙️ ChatterBox TTS - Gradio UI Demo (Google Colab)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Wamp1re-Ai/chatterbox/blob/master/chatterbox_colab_gradio.ipynb)

Welcome to the **ChatterBox TTS Gradio Interface** on Google Colab! This notebook demonstrates how to run the enhanced Gradio UI with public URL sharing.

## 🌟 What You'll Get
- **🌐 Live Gradio Interface**: Professional web UI with public URL
- **🎤 Text-to-Speech**: Generate speech from any text
- **🎭 Voice Cloning**: Clone voices using audio samples
- **🔗 Public URL**: Share the interface with anyone worldwide
- **🎛️ Advanced Controls**: Emotion, stability, and variation settings
- **🚀 GPU Acceleration**: Fast generation with Colab's free GPU

## 🚀 Quick Start
1. **Enable GPU**: Runtime → Change runtime type → GPU
2. **Run all cells**: Runtime → Run all (Ctrl+F9)
3. **Wait for launch**: Look for the public URL in the output
4. **Share & Test**: Use the URL to access from anywhere

---

## ⚙️ Setup & Installation

First, let's check the environment and install all dependencies.

In [None]:
# Check environment and GPU availability
import torch
import subprocess
import sys
import os

print("🔍 Environment Check")
print("=" * 40)
print(f"🐍 Python: {sys.version.split()[0]}")
print(f"🔥 PyTorch: {torch.__version__}")
print(f"🖥️ CUDA Available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")
    device = "cuda"
else:
    print("💻 Using CPU (consider enabling GPU in Runtime settings)")
    device = "cpu"

print(f"\n🎯 Selected Device: {device}")
print()

# Check if we're in Colab
try:
    import google.colab
    IN_COLAB = True
    print("✅ Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("ℹ️ Not running in Colab")

In [None]:
# Install ChatterBox TTS and Gradio
print("📦 Installing Dependencies")
print("=" * 40)
print("⏳ This may take 3-5 minutes...")
print()

# Install packages with progress
packages = [
    "gradio>=4.0.0",
    "chatterbox-tts",
    "librosa>=0.9.0"
]

for i, package in enumerate(packages, 1):
    print(f"📥 [{i}/{len(packages)}] Installing {package}...")
    try:
        result = subprocess.run([
            sys.executable, "-m", "pip", "install", package, "--quiet"
        ], check=True, capture_output=True, text=True)
        print(f"✅ {package} installed successfully")
    except subprocess.CalledProcessError as e:
        print(f"❌ Failed to install {package}")
        print(f"Error: {e.stderr}")

print()
print("🎉 Installation complete!")

# Restart the runtime in Colab so the newly installed packages are picked up
if IN_COLAB:
    print("🔄 Restarting runtime to load new packages...")
    print("⚠️ Runtime will restart automatically. Please run the next cell after restart.")
    os.kill(os.getpid(), 9)  # Kill the kernel process; Colab restarts it automatically

## 🔄 After Runtime Restart

**Run this cell after the runtime restarts** to load the installed packages and set up the environment.

In [None]:
# Load packages after restart
import gradio as gr
import torch
import torchaudio
import numpy as np
import time
import random
import warnings
from pathlib import Path
from datetime import datetime

# Suppress warnings
warnings.filterwarnings('ignore')

# Configuration
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MAX_TEXT_LENGTH = 500
model = None

print("📦 Packages loaded successfully!")
print(f"🎯 Device: {DEVICE}")

if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
else:
    print("💻 Using CPU")

In [None]:
# Load ChatterBox TTS model
def load_chatterbox_model():
    """Load ChatterBox TTS model"""
    global model
    
    if model is not None:
        return model, "✅ Model already loaded"
    
    try:
        from chatterbox.tts import ChatterboxTTS
        
        print("📥 Loading ChatterBox TTS model...")
        print("⏳ First run: downloading ~2GB model weights...")
        if DEVICE == "cuda":
            print("🚀 Using GPU acceleration for faster loading!")
        
        start_time = time.time()
        model = ChatterboxTTS.from_pretrained(device=DEVICE)
        load_time = time.time() - start_time
        
        status = f"✅ Model loaded in {load_time:.1f}s\n"
        status += f"🎵 Sample rate: {model.sr} Hz\n"
        status += f"🎯 Device: {model.device}"
        
        print(status)
        return model, status
        
    except Exception as e:
        error_msg = f"❌ Model loading failed: {str(e)}"
        print(error_msg)
        return None, error_msg

# Load the model
print("🤖 Loading ChatterBox TTS Model")
print("=" * 40)
model, status = load_chatterbox_model()
print("\n" + "=" * 40)
if model:
    print("🎉 Ready to generate speech!")
else:
    print("❌ Model loading failed. Check the error above.")

## 🎛️ Gradio Interface Functions

Define the core functions for the Gradio interface.

In [None]:
# TTS generation and helper functions
def generate_speech(text, reference_audio, exaggeration, cfg_weight, temperature, seed, preset):
    """Generate TTS audio"""
    
    if not text or not text.strip():
        return None, "❌ Please enter some text to synthesize"
    
    if len(text) > MAX_TEXT_LENGTH:
        return None, f"❌ Text too long. Maximum {MAX_TEXT_LENGTH} characters."
    
    if model is None:
        return None, "❌ Model not loaded. Please run the model loading cell."
    
    try:
        # Apply preset if selected
        if preset != "Custom":
            preset_configs = get_preset_configs()
            if preset in preset_configs:
                config = preset_configs[preset]
                exaggeration = config["exaggeration"]
                cfg_weight = config["cfg_weight"]
                temperature = config["temperature"]
        
        # Set seed (the Number input may yield a float or None, so normalize to int)
        seed = int(seed) if seed else 0
        if seed == 0:
            seed = random.randint(1, 1000000)
        torch.manual_seed(seed)
        
        # Generate
        start_time = time.time()
        
        params = {
            "exaggeration": exaggeration,
            "cfg_weight": cfg_weight,
            "temperature": temperature
        }
        
        if reference_audio:
            params["audio_prompt_path"] = reference_audio
        
        wav = model.generate(text, **params)
        generation_time = time.time() - start_time
        
        # Move to CPU (a no-op if already there) before converting to numpy
        audio_np = wav.squeeze(0).detach().cpu().numpy()
        duration = len(audio_np) / model.sr
        rtf = generation_time / duration
        
        status = f"✅ Generated {duration:.1f}s audio in {generation_time:.1f}s\n"
        status += f"📊 RTF: {rtf:.2f}x | Seed: {seed}\n"
        status += f"🎛️ exag={exaggeration:.1f}, cfg={cfg_weight:.1f}, temp={temperature:.1f}"
        
        return (model.sr, audio_np), status
        
    except Exception as e:
        return None, f"❌ Generation failed: {str(e)}"

def get_preset_configs():
    """Preset configurations"""
    return {
        "Neutral": {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8},
        "Calm & Controlled": {"exaggeration": 0.2, "cfg_weight": 0.7, "temperature": 0.6},
        "Expressive & Dynamic": {"exaggeration": 0.8, "cfg_weight": 0.3, "temperature": 0.9},
        "Dramatic & Intense": {"exaggeration": 1.2, "cfg_weight": 0.2, "temperature": 1.0},
        "Robotic & Stable": {"exaggeration": 0.1, "cfg_weight": 0.8, "temperature": 0.5},
        "Creative & Varied": {"exaggeration": 0.7, "cfg_weight": 0.4, "temperature": 1.2}
    }

def apply_preset(preset):
    """Apply preset configuration"""
    if preset == "Custom":
        return 0.5, 0.5, 0.8
    configs = get_preset_configs()
    if preset in configs:
        config = configs[preset]
        return config["exaggeration"], config["cfg_weight"], config["temperature"]
    return 0.5, 0.5, 0.8

# Sample texts
SAMPLE_TEXTS = [
    "Hello! Welcome to ChatterBox TTS, running on Google Colab with GPU acceleration!",
    "The quick brown fox jumps over the lazy dog. This pangram contains every letter.",
    "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole.",
    "To be or not to be, that is the question. Whether 'tis nobler in the mind.",
    "It was the best of times, it was the worst of times, it was the age of wisdom.",
    "Space: the final frontier. These are the voyages of the starship Enterprise."
]

def load_sample_text(sample_choice):
    """Load sample text"""
    if sample_choice == "Custom":
        return ""
    try:
        index = int(sample_choice.split(".")[0]) - 1
        if 0 <= index < len(SAMPLE_TEXTS):
            return SAMPLE_TEXTS[index]
    except (ValueError, AttributeError):
        pass
    return ""

print("✅ TTS functions loaded!")

In [None]:
# Create Gradio interface
def create_gradio_interface():
    """Create the Gradio interface"""
    
    css = """
    .gradio-container { max-width: 1200px !important; }
    .header-text { text-align: center; color: #2d5aa0; margin-bottom: 20px; }
    """
    
    with gr.Blocks(css=css, title="ChatterBox TTS - Colab") as demo:
        
        gr.HTML("""
        <div class="header-text">
            <h1>🎙️ ChatterBox TTS Studio</h1>
            <p>🚀 <strong>Google Colab Edition</strong> with GPU Acceleration</p>
            <p>State-of-the-art Text-to-Speech • Voice Cloning • Public URL Sharing</p>
        </div>
        """)
        
        with gr.Row():
            with gr.Column(scale=2):
                gr.Markdown("### 📝 Text Input")
                
                sample_dropdown = gr.Dropdown(
                    choices=["Custom"] + [f"{i+1}. {text[:50]}..." for i, text in enumerate(SAMPLE_TEXTS)],
                    value="Custom",
                    label="Quick Sample Texts"
                )
                
                text_input = gr.Textbox(
                    value="Hello! Welcome to ChatterBox TTS, running on Google Colab with GPU acceleration!",
                    label=f"Text to Synthesize (Max {MAX_TEXT_LENGTH} chars)",
                    placeholder="Enter your text here...",
                    lines=4
                )
                
                reference_audio = gr.Audio(
                    sources=["upload", "microphone"],
                    type="filepath",
                    label="Reference Audio (Optional) - For voice cloning"
                )
                
                gr.Markdown("### 🎛️ Generation Settings")
                
                preset_dropdown = gr.Dropdown(
                    choices=["Custom"] + list(get_preset_configs().keys()),
                    value="Neutral",
                    label="Preset Configurations"
                )
                
                with gr.Row():
                    exaggeration_slider = gr.Slider(
                        0.0, 2.0, step=0.1, value=0.5,
                        label="Exaggeration (Emotion)"
                    )
                    cfg_weight_slider = gr.Slider(
                        0.0, 1.0, step=0.05, value=0.5,
                        label="CFG Weight (Control)"
                    )
                
                with gr.Row():
                    temperature_slider = gr.Slider(
                        0.1, 2.0, step=0.1, value=0.8,
                        label="Temperature (Variation)"
                    )
                    seed_input = gr.Number(
                        value=0,
                        label="Seed (0=random)"
                    )
                
                generate_btn = gr.Button(
                    "🎵 Generate Speech",
                    variant="primary",
                    size="lg"
                )
            
            with gr.Column(scale=1):
                gr.Markdown("### 🔊 Generated Audio")
                
                audio_output = gr.Audio(
                    label="Generated Speech",
                    show_download_button=True
                )
                
                status_output = gr.Textbox(
                    label="Generation Status",
                    lines=8,
                    interactive=False
                )
                
                gr.Markdown("### ⚡ Quick Actions")
                with gr.Row():
                    clear_btn = gr.Button("🗑️ Clear", size="sm")
                    info_btn = gr.Button("ℹ️ Info", size="sm")
        
        # Event handlers
        sample_dropdown.change(load_sample_text, [sample_dropdown], [text_input])
        preset_dropdown.change(apply_preset, [preset_dropdown], [exaggeration_slider, cfg_weight_slider, temperature_slider])
        generate_btn.click(generate_speech, [text_input, reference_audio, exaggeration_slider, cfg_weight_slider, temperature_slider, seed_input, preset_dropdown], [audio_output, status_output])
        clear_btn.click(lambda: (None, ""), [], [audio_output, status_output])
        
        def show_info():
            info = f"🎙️ ChatterBox TTS - Colab Edition\n\n"
            info += f"🖥️ Device: {DEVICE}\n"
            if model:
                info += f"🎵 Sample Rate: {model.sr} Hz\n"
                info += f"🎯 Model Device: {model.device}\n"
            info += "\n📊 Available Presets:\n"
            for name, config in get_preset_configs().items():
                info += f"• {name}: exag={config['exaggeration']}, cfg={config['cfg_weight']}, temp={config['temperature']}\n"
            return info
        
        info_btn.click(show_info, [], [status_output])
    
    return demo

print("✅ Gradio interface ready!")

## 🚀 Launch Gradio Interface with Public URL

**🎯 This is the main cell!** Run this to launch the Gradio interface with a public URL.

### 🌐 Public URL Benefits
- **Share instantly**: Send the URL to anyone
- **No installation**: Works on any device with a browser
- **Collaborative**: Multiple people can use it simultaneously
- **Secure**: HTTPS encryption and temporary URLs


In [None]:
# Launch the Gradio interface
if model is not None:
    print("🚀 Launching ChatterBox TTS Gradio Interface...")
    print("📡 Creating public URL for worldwide access...")
    print("⏳ This may take a moment...")
    print()
    
    # Create and launch the interface
    demo = create_gradio_interface()
    
    # Launch with public URL sharing
    demo.queue()              # Enable request queuing
    demo.launch(
        share=True,           # 🌐 Enable public URL sharing
        debug=False,          # Disable debug mode for cleaner output
        show_error=True,      # Show detailed error messages
        server_name="0.0.0.0",   # Allow external connections
        server_port=7860,     # Default port
        max_threads=4,        # Cap the worker thread pool
        quiet=False           # Show launch information
    )
else:
    print("❌ Cannot launch interface - model not loaded!")
    print("💡 Please run the model loading cell above first.")
    print("🔄 If the model failed to load, try restarting the runtime.")

## 🎉 Success! Interface Launched

If everything worked correctly, you should see output above with:

### 🔗 URLs Generated
1. **Local URL**:  (Colab internal)
2. **🌐 Public URL**:  ← **Share this one!**

### 📤 How to Share & Test

1. **📋 Copy the public URL** from the output above
2. **🔗 Share it with anyone** - they can access immediately
3. **📱 Test on different devices** - phones, tablets, computers
4. **👥 Collaborate in real-time** - multiple users can use it

### 🎯 Interface Features

- **🎤 Text-to-Speech**: Type any text and generate natural speech
- **🎭 Voice Cloning**: Upload audio samples to clone voices
- **🎛️ Advanced Controls**: Fine-tune emotion, stability, and variation
- **🎲 Presets**: Quick settings for different speech styles
- **📝 Sample Texts**: Built-in examples for immediate testing
- **🔊 Audio Playback**: Listen and download generated speech
- **🚀 GPU Acceleration**: Fast generation with Colab's GPU
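
The interface rejects input longer than `MAX_TEXT_LENGTH` (500 characters), so longer passages have to be split before they are pasted in. The following is a minimal sketch of one way to do that, breaking at sentence boundaries. `split_text` is a hypothetical helper for illustration and is not part of ChatterBox or of this notebook.

```python
import re

MAX_TEXT_LENGTH = 500  # same limit the notebook enforces


def split_text(text, max_len=MAX_TEXT_LENGTH):
    """Split text into chunks of at most max_len characters, preferring sentence ends."""
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_len:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately through the interface. Splitting at sentence ends rather than at a fixed character count keeps each chunk a complete utterance.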

### 💡 Pro Tips

- **🎯 Best Results**: Use clear, punctuated text (avoid very long sentences)
- **🎤 Voice Cloning**: Upload 3-10 seconds of clear, single-speaker audio
- **🎛️ Parameters**: Start with presets, then fine-tune manually
- **⏰ URL Expiry**: Public URLs are temporary; the launch output states when the link expires (72 hours in many Gradio releases)
- **🔄 Keep Running**: Keep this Colab tab open while others use the URL
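
The 3-10 second guideline for reference audio can be checked before uploading. The sketch below uses only Python's standard `wave` module, so it assumes the clip is an uncompressed PCM WAV file; other formats (MP3, FLAC) would need a different reader. `wav_duration_seconds` and `check_reference_clip` are hypothetical helper names, not part of ChatterBox.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 10.0  # range suggested in the tips above


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference_clip(path):
    """Return (ok, message) for a candidate voice-cloning reference clip."""
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        return False, f"Too short: {duration:.1f}s (aim for at least {MIN_SECONDS:.0f}s)"
    if duration > MAX_SECONDS:
        return False, f"Too long: {duration:.1f}s (aim for at most {MAX_SECONDS:.0f}s)"
    return True, f"OK: {duration:.1f}s"
```

This only checks length; it cannot tell whether the recording is clear or contains a single speaker.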

### 🐛 Troubleshooting

- **No URL generated**: Check if the model loaded successfully above
- **Interface not responding**: Restart runtime and run all cells again
- **Slow generation**: Normal on CPU, much faster with GPU enabled
- **Audio issues**: Check browser permissions and audio settings
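
If the model or the interface fails after a restart, one thing worth ruling out is a package that did not install, since the install cell reports failures but keeps going. A minimal sketch of such a check, using the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and the module list is taken from the imports used in this notebook:

```python
import importlib.util

# Top-level modules this notebook imports
REQUIRED_MODULES = ["torch", "torchaudio", "gradio", "chatterbox", "librosa"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable")
```

`find_spec` only locates a module without importing it, so the check is fast and does not load the model or touch the GPU.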

---

**🎙️ Enjoy creating amazing voices with ChatterBox TTS!**

**🔗 Share the public URL** and let others experience the power of AI voice synthesis!