# 🎙️ ChatterBox TTS - Gradio UI Demo (Kaggle)

Welcome to the **ChatterBox TTS Gradio Interface** on Kaggle! This notebook demonstrates how to run the enhanced Gradio UI with public URL sharing.

## 🌟 What You'll Get
- **🌐 Live Gradio Interface**: Professional web UI with public URL
- **🎤 Text-to-Speech**: Generate speech from text (up to 500 characters per request)
- **🎭 Voice Cloning**: Clone voices using audio samples
- **🔗 Public URL**: Share the interface with anyone
- **🎛️ Advanced Controls**: Emotion, stability, and variation settings

## 🚀 Quick Start
1. Run all cells in order
2. Wait for the Gradio interface to launch
3. Use the public URL to access from anywhere
4. Share the URL for collaborative testing

---

## 📦 Install Dependencies

First, let's install ChatterBox TTS and Gradio with all required dependencies.
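Because `install_package` only prints a message when pip fails, it can help to confirm afterwards which versions actually ended up installed. Below is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the requirement strings are assumed to be simple `name>=version` specs like the ones in the package list.

```python
import re
from importlib.metadata import PackageNotFoundError, version

def installed_versions(requirements):
    """Map each requirement's distribution name to its installed version (None if missing)."""
    result = {}
    for req in requirements:
        # Strip a version specifier such as ">=4.0.0" to get the bare distribution name
        name = re.split(r"[<>=!~\s\[]", req, maxsplit=1)[0]
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

requirements_to_check = ["gradio>=4.0.0", "torch>=2.0.0", "chatterbox-tts"]
for name, ver in installed_versions(requirements_to_check).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```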

In [None]:
# Install ChatterBox TTS and Gradio
import subprocess
import sys

def install_package(package):
    """Install a package with pip (quietly) and report success or failure"""
    print(f"📥 Installing {package}...")
    try:
        result = subprocess.run([
            sys.executable, "-m", "pip", "install", package, "--quiet"
        ], capture_output=True, text=True, check=True)
        print(f"✅ {package} installed successfully")
        return True
    except subprocess.CalledProcessError as e:
        print(f"❌ Failed to install {package}: {e}")
        if e.stderr:
            print(f"Error details: {e.stderr}")
        return False

# Core packages for the Gradio interface
packages = [
    "gradio>=4.0.0",
    "torch>=2.0.0",
    "torchaudio>=2.0.0",
    "librosa>=0.9.0",
    "numpy>=1.21.0",
    "chatterbox-tts"
]

print("üöÄ Installing dependencies for ChatterBox TTS Gradio UI...")
print("‚è≥ This may take a few minutes...")
print()

success_count = 0
for package in packages:
    if install_package(package):
        success_count += 1

print()
print(f"üìä Installation Summary: {success_count}/{len(packages)} packages installed")

if success_count == len(packages):
    print("üéâ All dependencies installed successfully\!")
else:
    print("‚ö†Ô∏è Some packages failed to install. The interface may not work properly.")

print("
‚ú® Ready to launch Gradio interface\!")

## 🎛️ Enhanced Gradio Interface

Before building the interface, configure the runtime (device selection and the text-length limit), load the ChatterBox model, and define the speech-generation functions the UI will call.

In [None]:
# Enhanced Gradio Interface for ChatterBox TTS
import gradio as gr
import torch
import torchaudio
import numpy as np
import time
import random
import warnings
from pathlib import Path
from datetime import datetime

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Global configuration
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MAX_TEXT_LENGTH = 500
model = None

print(f"üñ•Ô∏è Device: {DEVICE}")
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"üöÄ GPU: {gpu_name} ({gpu_memory:.1f}GB)")
else:
    print("üíª Using CPU (no GPU available)")

In [None]:
# Load ChatterBox TTS model
def load_chatterbox_model():
    """Load ChatterBox TTS model with error handling"""
    global model
    
    if model is not None:
        return model, "✅ Model already loaded"
    
    try:
        from chatterbox.tts import ChatterboxTTS
        
        print("üì• Loading ChatterBox TTS model...")
        print("‚è≥ This may take a few minutes on first run (downloading ~2GB)")
        
        start_time = time.time()
        model = ChatterboxTTS.from_pretrained(device=DEVICE)
        load_time = time.time() - start_time
        
        status = f"✅ Model loaded successfully in {load_time:.1f}s\n"
        status += f"🎵 Sample rate: {model.sr} Hz\n"
        status += f"🎯 Device: {model.device}"
        
        print(status)
        return model, status
        
    except Exception as e:
        error_msg = f"❌ Failed to load model: {str(e)}\n\n"
        error_msg += "🔧 Troubleshooting:\n"
        error_msg += "1. Ensure internet access is enabled (on Kaggle: notebook settings) for the model download\n"
        error_msg += "2. Check available disk space (~2GB needed)\n"
        error_msg += "3. Try restarting if memory issues occur"
        print(error_msg)
        return None, error_msg

# Load the model immediately
model, status = load_chatterbox_model()
print("\n" + "="*50)
print(status)
print("="*50)
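The cell above caches the model in a global variable. The same load-once behaviour can be sketched with `functools.lru_cache`, which has a useful property here: it only stores successful return values, so a failed download is retried on the next call instead of being cached. `make_cached_loader` and `fake_factory` are hypothetical names; in the notebook the factory would be something like `lambda: ChatterboxTTS.from_pretrained(device=DEVICE)`.

```python
import functools
import time

def make_cached_loader(factory):
    """Wrap a zero-argument model factory so it runs at most once per successful load.

    functools.lru_cache stores only return values: if factory() raises,
    nothing is cached and the next call tries again.
    """
    @functools.lru_cache(maxsize=1)
    def load():
        start = time.time()
        obj = factory()
        print(f"Loaded in {time.time() - start:.1f}s")
        return obj
    return load

# Hypothetical stand-in for ChatterboxTTS.from_pretrained(device=DEVICE)
calls = []
def fake_factory():
    calls.append(1)
    return object()

load_model = make_cached_loader(fake_factory)
first = load_model()
second = load_model()
print(first is second, len(calls))  # True 1
```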

In [None]:
# Core TTS generation function
def generate_speech(text, reference_audio, exaggeration, cfg_weight, temperature, seed, preset):
    """Generate TTS audio with comprehensive error handling"""
    
    if not text or not text.strip():
        return None, "❌ Please enter some text to synthesize"
    
    if len(text) > MAX_TEXT_LENGTH:
        return None, f"❌ Text too long. Maximum {MAX_TEXT_LENGTH} characters allowed."
    
    if model is None:
        return None, "❌ Model not loaded. Please check the model loading cell above."
    
    try:
        # `preset` is not re-applied here: selecting a preset already copies its
        # values into the sliders (see apply_preset), so reading the sliders keeps
        # any manual fine-tuning made after choosing a preset.
        
        # Set seed for reproducibility (gr.Number may return a float, or None if cleared)
        seed = int(seed or 0)
        if seed == 0:
            seed = random.randint(1, 1000000)
        torch.manual_seed(seed)
        
        # Generate audio
        start_time = time.time()
        
        generation_params = {
            "exaggeration": exaggeration,
            "cfg_weight": cfg_weight,
            "temperature": temperature
        }
        
        if reference_audio is not None:
            generation_params["audio_prompt_path"] = reference_audio
        
        wav = model.generate(text, **generation_params)
        generation_time = time.time() - start_time
        
        # Convert to a NumPy array for Gradio (.cpu() is a no-op if the tensor is already on CPU)
        audio_np = wav.squeeze(0).detach().cpu().numpy()
        
        # Calculate stats
        duration = len(audio_np) / model.sr
        rtf = generation_time / duration
        
        status_msg = f"✅ Generated {duration:.1f}s audio in {generation_time:.1f}s\n"
        status_msg += f"📊 Real-time factor: {rtf:.2f}x\n"
        status_msg += f"🎲 Seed used: {seed}\n"
        status_msg += f"🎛️ Settings: exag={exaggeration:.1f}, cfg={cfg_weight:.1f}, temp={temperature:.1f}"
        
        return (model.sr, audio_np), status_msg
        
    except Exception as e:
        error_msg = f"❌ Generation failed: {str(e)}"
        return None, error_msg

# Preset configurations
def get_preset_configs():
    """Get preset configurations for different use cases"""
    return {
        "Neutral": {"exaggeration": 0.5, "cfg_weight": 0.5, "temperature": 0.8},
        "Calm & Controlled": {"exaggeration": 0.2, "cfg_weight": 0.7, "temperature": 0.6},
        "Expressive & Dynamic": {"exaggeration": 0.8, "cfg_weight": 0.3, "temperature": 0.9},
        "Dramatic & Intense": {"exaggeration": 1.2, "cfg_weight": 0.2, "temperature": 1.0},
        "Robotic & Stable": {"exaggeration": 0.1, "cfg_weight": 0.8, "temperature": 0.5},
        "Creative & Varied": {"exaggeration": 0.7, "cfg_weight": 0.4, "temperature": 1.2}
    }

# Apply preset function
def apply_preset(preset):
    """Apply preset configuration"""
    if preset == "Custom":
        return 0.5, 0.5, 0.8
    
    configs = get_preset_configs()
    if preset in configs:
        config = configs[preset]
        return config["exaggeration"], config["cfg_weight"], config["temperature"]
    
    return 0.5, 0.5, 0.8

print("✅ TTS functions loaded successfully!")
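`generate_speech` folds two small calculations into its status message: choosing a seed (0 means "pick one at random") and the real-time factor, i.e. generation time divided by audio duration, where values below 1 mean faster than real time. Pulled out as pure helpers they can be checked without loading the model. This is a sketch: `resolve_seed` and `generation_stats` are hypothetical names, and the numbers in the usage lines are illustrative, not measurements.

```python
import random

def resolve_seed(seed, rng=random):
    """Treat 0 or an empty field (None) as 'pick a random seed'; otherwise use it as an int."""
    seed = int(seed or 0)  # gr.Number may deliver floats such as 42.0
    return rng.randint(1, 1_000_000) if seed == 0 else seed

def generation_stats(num_samples, sample_rate, generation_time):
    """Return (duration_s, real_time_factor); an RTF below 1 means faster than real time."""
    duration = num_samples / sample_rate
    if duration == 0:
        raise ValueError("empty audio")
    return duration, generation_time / duration

print(resolve_seed(42.0))                      # 42
print(generation_stats(48_000, 24_000, 1.0))   # (2.0, 0.5)
```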

## 🌐 Build the Gradio Interface

Next, build the interface itself: a few sample texts for quick testing, then the Blocks layout with its event handlers. The launch happens in the following section.

In [None]:
# Sample texts for quick testing
SAMPLE_TEXTS = [
    "Hello! Welcome to ChatterBox TTS, the state-of-the-art open source text-to-speech system.",
    "The quick brown fox jumps over the lazy dog. This pangram contains every letter of the alphabet.",
    "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole filled with worms.",
    "To be or not to be, that is the question. Whether 'tis nobler in the mind to suffer.",
    "It was the best of times, it was the worst of times, it was the age of wisdom.",
    "Space: the final frontier. These are the voyages of the starship Enterprise."
]

def load_sample_text(sample_choice):
    """Load a sample text for quick testing"""
    if sample_choice == "Custom":
        return ""
    
    try:
        index = int(sample_choice.split(".")[0]) - 1
        if 0 <= index < len(SAMPLE_TEXTS):
            return SAMPLE_TEXTS[index]
    except (ValueError, AttributeError):  # non-numeric label, or None from a cleared dropdown
        pass
    
    return ""

print("✅ Sample texts loaded!")
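`load_sample_text` recovers the sample by parsing the number at the front of the dropdown label. The alternative sketched here keeps a dictionary from label to full text, so no parsing is needed and unknown labels (or `None`) simply map to an empty string. It also adds `...` only to previews that are actually truncated. `build_sample_choices`, `lookup_sample_text`, and `EXAMPLE_TEXTS` are hypothetical names chosen so they don't clash with the notebook's own.

```python
EXAMPLE_TEXTS = [
    "The quick brown fox jumps over the lazy dog.",
    "It was the best of times, it was the worst of times, it was the age of wisdom.",
]

def build_sample_choices(texts, preview_len=50):
    """Return (dropdown labels, {label: full text}), with 'Custom' mapped to ''."""
    lookup = {"Custom": ""}
    for i, text in enumerate(texts, start=1):
        # Only append "..." when the preview actually cuts the text short
        preview = text if len(text) <= preview_len else text[:preview_len] + "..."
        lookup[f"{i}. {preview}"] = text
    return list(lookup), lookup

choices, sample_lookup = build_sample_choices(EXAMPLE_TEXTS)

def lookup_sample_text(sample_choice):
    """Return the full text for a dropdown label, or '' for unknown labels and None."""
    return sample_lookup.get(sample_choice, "")

print(choices[1])  # 1. The quick brown fox jumps over the lazy dog.
```

In the interface, `choices` would be passed to `gr.Dropdown(choices=...)` and `lookup_sample_text` to `sample_dropdown.change(fn=...)`.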

In [None]:
# Create the Gradio interface
def create_gradio_interface():
    """Create the main Gradio interface"""
    
    # Custom CSS for better styling
    css = """
    .gradio-container {
        max-width: 1200px !important;
    }
    .header-text {
        text-align: center;
        color: #2d5aa0;
        margin-bottom: 20px;
    }
    """
    
    with gr.Blocks(css=css, title="ChatterBox TTS Studio") as demo:
        
        # Header
        gr.HTML("""
        <div class="header-text">
            <h1>üéôÔ∏è ChatterBox TTS Studio</h1>
            <p>State-of-the-art Text-to-Speech powered by Resemble AI</p>
            <p><strong>Features:</strong> Zero-shot TTS ‚Ä¢ Voice Cloning ‚Ä¢ Emotion Control ‚Ä¢ Public URL Sharing</p>
        </div>
        """)
        
        with gr.Row():
            with gr.Column(scale=2):
                gr.Markdown("### üìù Text Input")
                
                # Sample text selector
                sample_dropdown = gr.Dropdown(
                    choices=["Custom"] + [f"{i+1}. {text[:50]}..." for i, text in enumerate(SAMPLE_TEXTS)],
                    value="Custom",
                    label="Quick Sample Texts - Select a sample or choose 'Custom' to write your own"
                )
                
                # Text input
                text_input = gr.Textbox(
                    value="Hello! Welcome to ChatterBox TTS, the state-of-the-art open source text-to-speech system.",
                    label=f"Text to Synthesize (Maximum {MAX_TEXT_LENGTH} characters)",
                    placeholder="Enter your text here...",
                    lines=4,
                    max_lines=8
                )
                
                # Reference audio for voice cloning
                reference_audio = gr.Audio(
                    sources=["upload", "microphone"],
                    type="filepath",
                    label="Reference Audio (Optional) - Upload or record audio to clone the voice"
                )
                
                gr.Markdown("### üéõÔ∏è Generation Settings")
                
                # Preset selector
                preset_dropdown = gr.Dropdown(
                    choices=["Custom"] + list(get_preset_configs().keys()),
                    value="Neutral",
                    label="Preset Configurations - Choose a preset or select 'Custom' for manual control"
                )
                
                with gr.Row():
                    exaggeration_slider = gr.Slider(
                        0.0, 2.0, step=0.1, value=0.5,
                        label="Exaggeration - Emotion intensity (0.5 = neutral, higher = more expressive)"
                    )
                    cfg_weight_slider = gr.Slider(
                        0.0, 1.0, step=0.05, value=0.5,
                        label="CFG Weight - Generation control (higher = more stable)"
                    )
                
                with gr.Row():
                    temperature_slider = gr.Slider(
                        0.1, 2.0, step=0.1, value=0.8,
                        label="Temperature - Variation control (higher = more diverse)"
                    )
                    seed_input = gr.Number(
                        value=0,
                        label="Seed - Random seed (0 = random)"
                    )
                
                # Generate button
                generate_btn = gr.Button(
                    "🎵 Generate Speech",
                    variant="primary",
                    size="lg"
                )
            
            with gr.Column(scale=1):
                gr.Markdown("### üîä Generated Audio")
                
                # Audio output
                audio_output = gr.Audio(
                    label="Generated Speech",
                    show_download_button=True
                )
                
                # Status display
                status_output = gr.Textbox(
                    label="Generation Status",
                    lines=8,
                    max_lines=12,
                    interactive=False
                )
                
                # Quick actions
                gr.Markdown("### ⚡ Quick Actions")
                with gr.Row():
                    clear_btn = gr.Button("🗑️ Clear", size="sm")
                    info_btn = gr.Button("ℹ️ Info", size="sm")
        
        # Event handlers
        sample_dropdown.change(
            fn=load_sample_text,
            inputs=[sample_dropdown],
            outputs=[text_input]
        )
        
        preset_dropdown.change(
            fn=apply_preset,
            inputs=[preset_dropdown],
            outputs=[exaggeration_slider, cfg_weight_slider, temperature_slider]
        )
        
        generate_btn.click(
            fn=generate_speech,
            inputs=[
                text_input,
                reference_audio,
                exaggeration_slider,
                cfg_weight_slider,
                temperature_slider,
                seed_input,
                preset_dropdown
            ],
            outputs=[audio_output, status_output]
        )
        
        clear_btn.click(
            fn=lambda: (None, ""),
            inputs=[],
            outputs=[audio_output, status_output]
        )
        
        def show_info():
            info = "🎙️ ChatterBox TTS Studio\n\n"
            info += f"🖥️ Device: {DEVICE}\n"
            if model is not None:
                info += f"🎵 Sample Rate: {model.sr} Hz\n"
                info += f"🎯 Model Device: {model.device}\n"
            info += "\n📊 Preset Guide:\n"
            for name, config in get_preset_configs().items():
                info += f"• {name}: exag={config['exaggeration']}, cfg={config['cfg_weight']}, temp={config['temperature']}\n"
            return info
        
        info_btn.click(
            fn=show_info,
            inputs=[],
            outputs=[status_output]
        )
    
    return demo

print("✅ Gradio interface builder defined!")

## 🚀 Launch with Public URL

**This is the main cell!** Run this to launch the Gradio interface with a public URL that you can share.
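Re-running the launch cell while an earlier server is still alive can fail because port 7860 is already taken. Calling `demo.close()` on the previous `demo` frees it; another option, sketched below with the standard `socket` module, is to probe for the first port that can currently be bound. `find_free_port` is a hypothetical helper, and a port found free can still be grabbed by another process before Gradio binds it. Gradio's documentation also states that when `server_port` is left as `None`, it searches for an available port starting at 7860 on its own.

```python
import socket

def find_free_port(start=7860, attempts=20, host="0.0.0.0"):
    """Return the first port in [start, start + attempts) that can currently be bound on host."""
    for port in range(start, min(start + attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # in use, e.g. by a server from an earlier run of the launch cell
            return port
    raise OSError(f"no free port found starting at {start}")

print(find_free_port())
```

The result would then be passed as `server_port=find_free_port()` in the `demo.launch(...)` call.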

In [None]:
# Launch the Gradio interface with public URL
if model is not None:
    print("🚀 Launching ChatterBox TTS Gradio Interface...")
    print("📡 Creating public URL for easy sharing...")
    print()
    
    # Create the interface
    demo = create_gradio_interface()
    
    # Enable the request queue (Gradio 4 removed launch(enable_queue=...)),
    # then launch with public URL sharing
    demo.queue()
    demo.launch(
        share=True,             # 🌐 Create public URL
        server_name="0.0.0.0",  # Listen on all interfaces
        server_port=7860,       # Gradio's default port
        show_error=True,        # Show detailed errors
        quiet=False,            # Show launch info
        debug=False,            # Disable debug mode
        max_threads=4           # Cap the number of worker threads
    )
else:
    print("❌ Cannot launch interface - model not loaded")
    print("💡 Please run the model loading cell above first")

## 🎉 Success!

If everything worked correctly, the launch cell prints two links:

1. **🔗 Local URL**: for use inside the Kaggle session
2. **🌐 Public URL**: a shareable link ending in `.gradio.live`

### 📤 How to Share

1. **Copy the public URL** from the launch cell's output
2. **Share it with anyone**; they can open it as long as this notebook session keeps running
3. **No installation required** for users
4. **Works on any device** with a web browser

### 🎯 Features Available

- **🎤 Text-to-Speech**: Enter any text and generate speech
- **🎭 Voice Cloning**: Upload audio to clone voices
- **🎛️ Advanced Controls**: Adjust emotion, stability, variation
- **🎲 Presets**: Quick settings for different styles
- **📝 Sample Texts**: Built-in examples for testing
- **🔊 Audio Playback**: Listen and download generated audio

### 💡 Tips for Best Results

- **Text**: Use clear, well-punctuated sentences
- **Voice Cloning**: Upload 3-10 seconds of clear speech
- **Parameters**: Start with presets, then fine-tune manually
- **Sharing**: The public URL is temporary; it stops working when the notebook session ends, and Gradio prints the link's expiry at launch (72 hours in Gradio 4.x)
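The 3-10 second guideline for voice cloning is easy to check before uploading. Below is a minimal sketch using the standard `wave` module; it assumes an uncompressed PCM WAV file (the `wave` module cannot read MP3 or other compressed formats), and `check_reference_clip` is a hypothetical helper whose default bounds simply mirror the tip above.

```python
import os
import tempfile
import wave

def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def check_reference_clip(path, min_s=3.0, max_s=10.0):
    """Return a short message saying whether the clip length falls in the suggested range."""
    duration = wav_duration_seconds(path)
    if duration < min_s:
        return f"{duration:.1f}s: shorter than the suggested {min_s:.0f}-{max_s:.0f}s"
    if duration > max_s:
        return f"{duration:.1f}s: longer than the suggested {min_s:.0f}-{max_s:.0f}s; consider trimming"
    return f"{duration:.1f}s: within the suggested range"

# Demo: write 5 s of 16-bit mono silence at 16 kHz to a temporary file
path = os.path.join(tempfile.mkdtemp(), "reference.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16_000)
    wf.writeframes(b"\x00\x00" * 16_000 * 5)
print(check_reference_clip(path))  # 5.0s: within the suggested range
```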

---

**🎙️ Enjoy creating amazing voices with ChatterBox TTS!**