# 🎤 ChatterBox TTS - Unlimited Edition

**Unlimited version with NO LENGTH LIMITS for text or audio!**

## 🚀 Unlimited Features
- 🎯 **Zero-shot TTS**: Generate speech from ANY amount of text
- 🎭 **Voice Cloning**: Clone voices from audio samples of ANY duration
- 🎨 **Emotion Control**: Adjust expressiveness and intensity
- 🚀 **GPU Acceleration**: Fast generation with proper error handling
- 🌐 **Web Interface**: Beautiful Gradio UI
- 🔧 **Audio Preprocessing**: Automatic audio format conversion and validation
- 🚫 **NO LIMITS**: Process unlimited text length and audio duration

## 🚀 Quick Start
1. **Enable GPU**: Runtime → Change runtime type → GPU
2. **Run all cells** below in order
3. **Access the interface** through the Gradio link
4. **Upload audio and generate speech** without CUDA errors!

---

## 📦 Step 1: Environment Setup & Dependencies

This approach works with Colab's existing PyTorch installation and includes audio preprocessing libraries.

In [None]:
import os
import sys
import subprocess
import importlib

print("🔍 Checking Colab environment...")
print(f"Python version: {sys.version}")

# Check if we're in Colab
try:
    import google.colab
    IN_COLAB = True
    print("✅ Running in Google Colab")
except ImportError:
    IN_COLAB = False
    print("⚠️ Not running in Colab - some features may not work")

# Check existing PyTorch installation
try:
    import torch
    print(f"✅ PyTorch {torch.__version__} already installed")
    print(f"🎮 CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"📱 GPU: {torch.cuda.get_device_name(0)}")
        print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
except ImportError:
    print("❌ PyTorch not found - installing...")
    !pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

print("\n🔧 Installing core dependencies...")

In [None]:
# Install essential packages that work well with Colab
packages_to_install = [
    "gradio>=4.0.0",
    "soundfile",
    "librosa",
    "numpy>=1.24.0",
    "transformers>=4.40.0",
    "accelerate",
    "safetensors",
    "omegaconf",
    "einops"
]

print("📦 Installing compatible packages...")
for package in packages_to_install:
    try:
        subprocess.run([sys.executable, "-m", "pip", "install", package], 
                      check=True, capture_output=True, text=True)
        print(f"✅ {package}")
    except subprocess.CalledProcessError as e:
        print(f"⚠️ {package} - {e.stderr.strip()[:100]}...")

print("\n🎤 Installing ChatterBox TTS...")
# Try multiple installation methods for ChatterBox TTS
chatterbox_installed = False

# Method 1: Try PyPI first
try:
    subprocess.run([sys.executable, "-m", "pip", "install", "chatterbox-tts"], 
                  check=True, capture_output=True, text=True)
    print("✅ ChatterBox TTS installed from PyPI")
    chatterbox_installed = True
except subprocess.CalledProcessError:
    print("⚠️ PyPI installation failed, trying GitHub...")

# Method 2: Try GitHub if PyPI fails
if not chatterbox_installed:
    try:
        subprocess.run([sys.executable, "-m", "pip", "install", 
                       "git+https://github.com/resemble-ai/chatterbox.git"], 
                      check=True, capture_output=True, text=True)
        print("✅ ChatterBox TTS installed from GitHub")
        chatterbox_installed = True
    except subprocess.CalledProcessError:
        print("❌ GitHub installation also failed")

# Method 3: Manual installation if both fail
if not chatterbox_installed:
    print("🔧 Attempting manual installation...")
    !git clone https://github.com/resemble-ai/chatterbox.git /tmp/chatterbox
    !cd /tmp/chatterbox && pip install -e .
    chatterbox_installed = True

print("\n🎉 Installation complete!")

## 🧪 Step 2: Test Installation & Compatibility

Let's verify everything is working correctly.

In [None]:
def test_import(module_name, friendly_name=None):
    """Test if a module can be imported successfully"""
    if friendly_name is None:
        friendly_name = module_name
    
    try:
        module = importlib.import_module(module_name)
        version = getattr(module, '__version__', 'unknown')
        print(f"✅ {friendly_name}: {version}")
        return True, module
    except Exception as e:
        print(f"❌ {friendly_name}: {str(e)[:100]}...")
        return False, None

print("🔍 Testing all imports...")
print("=" * 50)

# Test core dependencies
success_count = 0
total_tests = 0

modules_to_test = [
    ("torch", "PyTorch"),
    ("torchaudio", "TorchAudio"),
    ("transformers", "Transformers"),
    ("gradio", "Gradio"),
    ("soundfile", "SoundFile"),
    ("librosa", "Librosa"),
    ("numpy", "NumPy")
]

for module_name, friendly_name in modules_to_test:
    success, _ = test_import(module_name, friendly_name)
    if success:
        success_count += 1
    total_tests += 1

# Test ChatterBox TTS specifically
print("\n🎤 Testing ChatterBox TTS...")
try:
    from chatterbox.tts import ChatterboxTTS
    print("✅ ChatterBox TTS: Import successful")
    success_count += 1
except Exception as e:
    print(f"❌ ChatterBox TTS: {str(e)[:100]}...")
total_tests += 1

# Test GPU availability
print("\n🎮 GPU Status:")
try:
    import torch
    if torch.cuda.is_available():
        print(f"✅ CUDA available: {torch.cuda.get_device_name(0)}")
        print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
        print(f"🔧 CUDA Version: {torch.version.cuda}")
    else:
        print("⚠️ CUDA not available - will use CPU (slower)")
except:
    print("❌ Could not check GPU status")

print("\n" + "=" * 50)
print(f"📊 Test Results: {success_count}/{total_tests} passed")

if success_count == total_tests:
    print("🎉 All tests passed! Ready to proceed.")
elif success_count >= total_tests - 1:
    print("✅ Most tests passed. Should work with minor issues.")
else:
    print("⚠️ Some tests failed. May encounter issues.")
    print("💡 Try restarting runtime and running again.")

## 🚀 Step 3: Launch ChatterBox TTS Interface with CUDA Error Fixes

Create and launch the Gradio web interface with proper audio preprocessing and CUDA error handling.

In [None]:
import gradio as gr
import torch
import torchaudio
import numpy as np
import tempfile
import os
import librosa
import soundfile as sf
from pathlib import Path

# Global variables for model
model = None
model_loaded = False

def load_model():
    """Load the ChatterBox TTS model"""
    global model, model_loaded
    
    if model_loaded:
        return "✅ Model already loaded!"
    
    try:
        print("🔄 Loading ChatterBox TTS model...")
        from chatterbox.tts import ChatterboxTTS
        
        # Determine device
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"🎮 Using device: {device}")
        
        # Clear CUDA cache before loading
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        
        # Load model
        model = ChatterboxTTS.from_pretrained(device=device)
        model_loaded = True
        
        return f"✅ Model loaded successfully on {device}!"
        
    except Exception as e:
        error_msg = f"❌ Failed to load model: {str(e)}"
        print(error_msg)
        return error_msg

def preprocess_audio(audio_file):
    """Preprocess audio file for voice cloning with CUDA error prevention"""
    if audio_file is None:
        return None, "No audio file provided"
    
    try:
        print(f"🔍 Preprocessing audio: {audio_file}")
        
        # Load audio with librosa for better compatibility
        audio, sr = librosa.load(audio_file, sr=None)
        
        # Check audio duration
        duration = len(audio) / sr
        print(f"📊 Audio info: {duration:.1f}s, {sr}Hz, {audio.shape}")
        
        if duration < 1.0:
            return None, "❌ Audio too short (minimum 1 second required)"
        # No maximum duration limit - process any length audio!
        print(f"✅ Audio duration: {duration:.1f}s - processing without limits")
        
        # Normalize audio to prevent CUDA assertion errors
        audio = librosa.util.normalize(audio)
        
        # Ensure audio is mono
        if audio.ndim > 1:
            audio = librosa.to_mono(audio)
        
        # Resample to model's expected sample rate
        target_sr = 22050  # ChatterBox TTS expected sample rate
        if sr != target_sr:
            print(f"🔄 Resampling from {sr}Hz to {target_sr}Hz")
            audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
            sr = target_sr
        
        # Remove silence and trim
        audio, _ = librosa.effects.trim(audio, top_db=20)
        
        # Ensure audio is not too quiet or too loud
        max_val = np.max(np.abs(audio))
        if max_val > 0:
            audio = audio / max_val * 0.8  # Normalize to 80% of max
        
        # Save preprocessed audio to temporary file
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
            sf.write(tmp_file.name, audio, sr)
            preprocessed_path = tmp_file.name
        
        final_duration = len(audio) / sr
        print(f"✅ Audio preprocessed: {final_duration:.1f}s, {sr}Hz")
        return preprocessed_path, f"✅ Audio ready ({final_duration:.1f}s, {sr}Hz)"
        
    except Exception as e:
        error_msg = f"❌ Audio preprocessing failed: {str(e)}"
        print(error_msg)
        return None, error_msg

print("🎨 Creating Gradio interface with CUDA error fixes...")

In [None]:
def generate_speech(text, audio_file=None, exaggeration=0.5, cfg_weight=0.5):
    """Generate speech from text with optional voice cloning and CUDA error handling"""
    global model, model_loaded
    
    if not model_loaded or model is None:
        return None, "❌ Please load the model first!"
    
    if not text.strip():
        return None, "❌ Please enter some text to synthesize!"
    
    # Store original text for reference (no length limits!)
    original_text = text
    print(f"📝 Processing text: {len(text)} characters")
    
    try:
        print(f"🎤 Generating speech for: '{text[:50]}...'")
        
        # Preprocess audio if provided
        processed_audio_path = None
        if audio_file is not None:
            processed_audio_path, preprocess_msg = preprocess_audio(audio_file)
            if processed_audio_path is None:
                return None, preprocess_msg
            print(preprocess_msg)
        
        # Clear CUDA cache to prevent memory issues
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            print("🧹 Cleared CUDA cache")
        
        # Set environment variable to help with CUDA debugging
        os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
        
        # Generate speech with comprehensive error handling
        max_retries = 3
        for attempt in range(max_retries):
            try:
                if processed_audio_path is not None:
                    # Voice cloning mode
                    print(f"🎭 Using voice cloning mode (attempt {attempt + 1})")
                    wav = model.generate(
                        text, 
                        audio_prompt_path=processed_audio_path,
                        exaggeration=exaggeration,
                        cfg_weight=cfg_weight
                    )
                else:
                    # Standard TTS mode
                    print(f"🎯 Using standard TTS mode (attempt {attempt + 1})")
                    wav = model.generate(
                        text,
                        exaggeration=exaggeration,
                        cfg_weight=cfg_weight
                    )
                
                # If we get here, generation was successful
                break
                
            except RuntimeError as e:
                error_str = str(e)
                print(f"⚠️ Attempt {attempt + 1} failed: {error_str[:100]}...")
                
                if "CUDA" in error_str or "device-side assert" in error_str:
                    print("🔧 CUDA error detected, applying fixes...")
                    
                    # Clear CUDA cache
                    if torch.cuda.is_available():
                        torch.cuda.empty_cache()
                        torch.cuda.synchronize()
                    
                    # Keep original text length - no reduction!
                    print(f"🔄 Retrying with full text ({len(text)} characters)")
                    
                    # Adjust parameters for stability
                    if attempt > 0:
                        exaggeration = min(exaggeration, 0.3)
                        cfg_weight = min(cfg_weight, 0.5)
                        print(f"🔧 Adjusted parameters: exaggeration={exaggeration}, cfg_weight={cfg_weight}")
                    
                    if attempt < max_retries - 1:
                        print("🔄 Retrying with fixes...")
                        continue
                
                # If it's the last attempt or not a CUDA error, re-raise
                if attempt == max_retries - 1:
                    raise e
        
        # Save to temporary file
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
            torchaudio.save(tmp_file.name, wav, model.sr)
            output_path = tmp_file.name
        
        # Clean up preprocessed audio file
        if processed_audio_path and os.path.exists(processed_audio_path):
            try:
                os.unlink(processed_audio_path)
            except:
                pass
        
        # Reset environment variable
        if 'CUDA_LAUNCH_BLOCKING' in os.environ:
            del os.environ['CUDA_LAUNCH_BLOCKING']
            
        duration = wav.shape[1] / model.sr
        success_msg = f"✅ Generated {duration:.1f}s of audio from {len(text)} characters"
        if audio_file is not None:
            success_msg += " (voice cloned)"
        
        print(success_msg)
        return output_path, success_msg
        
    except Exception as e:
        error_msg = f"❌ Generation failed: {str(e)}"
        print(error_msg)
        
        # Clean up on error
        if 'processed_audio_path' in locals() and processed_audio_path and os.path.exists(processed_audio_path):
            try:
                os.unlink(processed_audio_path)
            except:
                pass
        
        # Reset environment variable
        if 'CUDA_LAUNCH_BLOCKING' in os.environ:
            del os.environ['CUDA_LAUNCH_BLOCKING']
        
        # Provide specific guidance for common errors
        if "CUDA" in str(e) or "device-side assert" in str(e):
            error_msg += "\n\n🔧 CUDA Error Solutions:"
            error_msg += "\n• Restart runtime: Runtime → Restart Runtime"
            error_msg += "\n• Try different audio file (WAV format, any duration)"
            error_msg += "\n• Lower exaggeration and cfg_weight values"
            error_msg += "\n• Clear CUDA cache manually if needed"
        elif "memory" in str(e).lower():
            error_msg += "\n\n💾 Memory Error Solutions:"
            error_msg += "\n• Restart runtime to free memory"
            error_msg += "\n• Close other browser tabs"
            error_msg += "\n• Try processing in smaller chunks"
        elif "audio" in str(e).lower():
            error_msg += "\n\n🎵 Audio Error Solutions:"
            error_msg += "\n• Use WAV format audio files"
            error_msg += "\n• Use clear speech, single speaker"
            error_msg += "\n• Check audio file is not corrupted"
            error_msg += "\n• Any duration supported (minimum 1 second)"
        
        return None, error_msg

print("✅ Speech generation function with CUDA fixes ready!")

In [None]:
# Create the Gradio interface with unlimited generation
with gr.Blocks(title="ChatterBox TTS - Unlimited Edition", theme=gr.themes.Soft()) as demo:
    gr.Markdown("""
    # 🎤 ChatterBox TTS - Unlimited Edition
    
    **State-of-the-art Text-to-Speech and Voice Cloning with NO LENGTH LIMITS!**
    
    Generate natural-sounding speech from text of ANY length, or clone voices from audio samples of ANY duration!
    
    🚀 **Unlimited Features:**
    - ✅ NO text length limits - process any amount of text
    - ✅ NO audio duration limits - use any length reference audio
    - ✅ CUDA device-side assert errors fixed
    - ✅ Audio preprocessing and format issues resolved
    - ✅ Memory management and error recovery
    - ✅ Automatic retry with parameter adjustment
    """)
    
    with gr.Row():
        with gr.Column():
            # Model loading section
            gr.Markdown("### 🔧 Model Setup")
            load_btn = gr.Button("🚀 Load ChatterBox TTS Model", variant="primary", size="lg")
            load_status = gr.Textbox(label="Status", interactive=False, value="Click to load model...")
            
            # Text input
            gr.Markdown("### 📝 Text Input")
            text_input = gr.Textbox(
                label="Text to synthesize (NO LENGTH LIMITS!)",
                placeholder="Enter ANY amount of text you want to convert to speech - no limits!",
                lines=5,
                value="Hello! This is ChatterBox TTS Unlimited Edition. I can generate natural-sounding speech from ANY amount of text you provide - no character limits, no duration limits, completely unlimited generation!"
            )
            
            # Voice cloning section
            gr.Markdown("### 🎭 Voice Cloning (Optional)")
            audio_input = gr.Audio(
                label="Reference audio for voice cloning",
                type="filepath",
                sources=["upload", "microphone"]
            )
            gr.Markdown("""
            **📋 Audio Requirements:**
            - 🎵 **Format**: WAV preferred (MP3 also works)
            - ⏱️ **Duration**: ANY length supported (minimum 1 second)
            - 🎤 **Quality**: Clear speech, single speaker
            - 🔇 **Background**: Minimal noise
            - 🚀 **No limits**: Use audio of any duration for voice cloning!
            """)
            
            # Advanced settings
            with gr.Accordion("⚙️ Advanced Settings", open=False):
                exaggeration = gr.Slider(
                    minimum=0.0,
                    maximum=1.0,
                    value=0.5,
                    step=0.1,
                    label="Exaggeration (emotion intensity)",
                    info="Higher values = more expressive speech"
                )
                cfg_weight = gr.Slider(
                    minimum=0.0,
                    maximum=1.0,
                    value=0.5,
                    step=0.1,
                    label="CFG Weight (speech pacing)",
                    info="Lower values = slower, more deliberate speech"
                )
        
        with gr.Column():
            # Generation section
            gr.Markdown("### 🎵 Generated Audio")
            generate_btn = gr.Button("🎤 Generate Speech", variant="primary", size="lg")
            generation_status = gr.Textbox(label="Generation Status", interactive=False)
            
            audio_output = gr.Audio(
                label="Generated Speech",
                type="filepath",
                interactive=False
            )
            
            # CUDA Error Help section
            gr.Markdown("""
            ### 🚀 Unlimited Generation Features
            
            **This unlimited version includes:**
            - 🚀 **NO text length limits**: Process any amount of text
            - 🎵 **NO audio duration limits**: Use any length reference audio
            - 🛡️ **Device-side assert errors**: Automatic retry with parameter adjustment
            - 🔄 **Memory management**: CUDA cache clearing and optimization
            - 📊 **Audio preprocessing**: Format conversion and validation
            - ⚡ **Error recovery**: Multiple retry attempts with different settings
            
            **If you still encounter issues:**
            1. 🔄 **Restart Runtime**: Runtime → Restart Runtime
            2. 🎵 **Try different audio** (WAV format, any duration)
            3. ⚙️ **Lower parameter values** (exaggeration < 0.5, cfg_weight < 0.5)
            4. 💾 **Clear CUDA cache** manually if needed
            """)
    
    # Event handlers
    load_btn.click(
        fn=load_model,
        outputs=load_status
    )
    
    generate_btn.click(
        fn=generate_speech,
        inputs=[text_input, audio_input, exaggeration, cfg_weight],
        outputs=[audio_output, generation_status]
    )

# Launch the interface
print("🚀 Launching Gradio interface with unlimited generation capabilities...")
demo.launch(
    share=True,
    debug=True,
    show_error=True,
    server_port=7860
)

## 🔧 CUDA Error Solutions & Troubleshooting

### ✅ **Fixed CUDA Issues**

This notebook specifically addresses the common CUDA error:
```
❌ CUDA error: device-side assert triggered
```

**Root Causes & Solutions:**

1. **Audio Format Issues** ✅ Fixed
   - **Problem**: Incompatible audio formats causing tensor assertion errors
   - **Solution**: Automatic audio preprocessing with librosa
   - **Features**: Format conversion, resampling, normalization, trimming

2. **Memory Management** ✅ Fixed
   - **Problem**: CUDA memory fragmentation
   - **Solution**: Automatic cache clearing and memory optimization
   - **Features**: `torch.cuda.empty_cache()` before generation

3. **Parameter Validation** ✅ Fixed
   - **Problem**: Invalid parameter ranges causing assertions
   - **Solution**: Automatic parameter adjustment on retry
   - **Features**: Progressive parameter reduction for stability

4. **Text Length Issues** ✅ UNLIMITED
   - **Problem**: Long text causing memory overflow
   - **Solution**: NO LIMITS - process any amount of text
   - **Features**: Unlimited text processing with smart memory management

### 🛠️ **Advanced Troubleshooting**

**If CUDA errors persist:**

1. **Environment Reset**
   ```python
   # Run this in a new cell if needed
   import torch
   torch.cuda.empty_cache()
   torch.cuda.synchronize()
   ```

2. **Debug Mode**
   ```python
   # Enable CUDA debugging
   import os
   os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
   ```

3. **Audio Validation**
   - Use WAV files only
   - 22050 Hz sample rate preferred
   - Mono audio (single channel)
   - ANY duration supported (minimum 1 second)
   - Clear speech, minimal background noise

4. **Parameter Guidelines**
   - **Safe values**: `exaggeration=0.3, cfg_weight=0.5`
   - **Expressive**: `exaggeration=0.7, cfg_weight=0.3`
   - **Stable**: `exaggeration=0.2, cfg_weight=0.6`

### 📊 **Performance Tips**

- **First generation**: Takes 10-20 seconds (model loading)
- **Subsequent generations**: 5-10 seconds
- **Voice cloning**: +2-5 seconds for audio preprocessing
- **Memory usage**: ~2-3GB GPU memory

### 🆘 **Emergency Recovery**

If nothing works:
1. **Runtime → Restart Runtime**
2. **Edit → Clear all outputs**
3. **Run all cells from Step 1**
4. **Try with minimal settings**: Short text, no audio, low parameters

---

## 🔗 Resources

- **Original Model**: [ResembleAI ChatterBox](https://github.com/resemble-ai/chatterbox)
- **Demo Samples**: [Official Demo Page](https://resemble-ai.github.io/chatterbox_demopage/)
- **Model Card**: [Hugging Face](https://huggingface.co/ResembleAI/chatterbox)
- **Community**: [Discord](https://discord.gg/rJq9cRJBJ6)

**🎉 This UNLIMITED version has NO LIMITS on text length or audio duration!**

*Built with ❤️ and unlimited generation capabilities for the community*