# 🎤 ChatterBox Unlimited - Colab Edition

Welcome to **ChatterBox Unlimited**! This notebook provides a powerful Gradio interface for ResembleAI's state-of-the-art ChatterBox TTS model.

## ✨ Features
- 🎯 **Zero-shot TTS**: Generate speech from any text
- 🎭 **Voice Cloning**: Clone voices from audio samples
- 🎨 **Emotion Control**: Adjust expressiveness
- 🚀 **GPU Acceleration**: Fast generation with Colab's GPUs
- 🌐 **Web Interface**: Beautiful Gradio UI

## 🚀 Instructions
1. **Enable GPU**: Go to Runtime → Change runtime type → GPU
2. **Run all cells** below in order
3. **Access the interface** through the Gradio link
4. **Start generating speech**!

---

## 📦 Step 1: Install Dependencies

This cell installs the required packages, including PyTorch built for CUDA 12.1 and ChatterBox TTS.

In [None]:
# Install UV package manager for faster installations
!pip install uv

# CRITICAL: Uninstall any existing PyTorch installations to avoid conflicts
print("🧹 Cleaning existing PyTorch installations...")
!pip uninstall torch torchvision torchaudio -y
# uv never prompts for confirmation, so no -y flag is needed here
!uv pip uninstall torch torchvision torchaudio

# Install compatible PyTorch ecosystem versions (MUST be installed together)
print("🔧 Installing PyTorch ecosystem with matched versions...")
!uv pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121

# Install a compatible NumPy 1.x version (>=1.24, <2.0)
print("📊 Installing NumPy...")
!uv pip install "numpy>=1.24.0,<2.0.0"

# Install transformers with proper version that includes is_quanto_available
print("🤖 Installing Transformers with proper version...")
!uv pip install "transformers>=4.44.0" "accelerate>=0.21.0"

# Install other ML dependencies with compatible versions
print("🧠 Installing ML dependencies...")
!uv pip install "diffusers>=0.25.0" "omegaconf>=2.3.0"

# Install audio processing dependencies
print("🎵 Installing audio processing libraries...")
!uv pip install soundfile librosa resampy

# Install Gradio for UI
print("🌐 Installing Gradio...")
!uv pip install "gradio>=4.0.0"

# Install ResembleAI specific dependencies
print("🎤 Installing ResembleAI dependencies...")
!uv pip install conformer resemble-perth s3tokenizer

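# Install ChatterBox TTS (PyPI first, GitHub as a fallback)
# `!` commands never raise Python exceptions, so a try/except cannot detect a failed install.
# IPython stores the exit status of the last `!` command in `_exit_code`; check that instead.
print("🎯 Installing ChatterBox TTS...")
!uv pip install chatterbox-tts
if _exit_code == 0:
    print("✅ ChatterBox TTS installed from PyPI")
else:
    print("⚠️ PyPI installation failed, trying GitHub...")
    !uv pip install git+https://github.com/resemble-ai/chatterbox.git
    if _exit_code == 0:
        print("✅ ChatterBox TTS installed from GitHub")
    else:
        print("❌ ChatterBox TTS installation failed from both PyPI and GitHub")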

# Verify installation
print("\n🧪 Testing PyTorch installation...")
try:
    import torch
    import torchvision
    print(f"🔥 PyTorch version: {torch.__version__}")
    print(f"👁️ TorchVision version: {torchvision.__version__}")
    print(f"🎮 CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        print(f"   GPU: {torch.cuda.get_device_name(0)}")
        print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
        print(f"   CUDA Version: {torch.version.cuda}")
    
    # Test torchvision operations to ensure compatibility
    print("🧪 Testing TorchVision compatibility...")
    import torchvision.ops
    # Test a simple operation that uses the nms operator
    boxes = torch.tensor([[0, 0, 1, 1], [0.5, 0.5, 1.5, 1.5]], dtype=torch.float32)
    scores = torch.tensor([0.9, 0.8], dtype=torch.float32)
    keep = torchvision.ops.nms(boxes, scores, 0.5)
    print("✅ TorchVision NMS operator working correctly!")
    
except Exception as e:
    print(f"❌ PyTorch/TorchVision compatibility issue: {e}")
    print("🔄 This requires runtime restart...")

# Test transformers version
try:
    import transformers
    print(f"🤖 Transformers version: {transformers.__version__}")
    
    # Test the specific function that was causing issues
    from transformers.utils import is_quanto_available
    print("✅ is_quanto_available function found!")
except ImportError as e:
    print(f"❌ Transformers issue: {e}")
    print("🔄 This may require runtime restart...")

# Test ChatterBox TTS
try:
    from chatterbox.tts import ChatterboxTTS
    print("✅ ChatterBox TTS imported successfully!")
except ImportError as e:
    print(f"❌ ChatterBox TTS import failed: {e}")
    print("🔄 This may require runtime restart...")

print("\n🎉 Installation complete! If you see any errors above, restart runtime and re-run.")

## 🔄 Step 2: Restart Runtime (If Needed)

Run the dependency check below. If it reports any CUDA or dependency conflicts, restart the runtime and re-run from Step 1.

In [None]:
# Comprehensive dependency check
import importlib

def check_dependency(module_name, import_path=None):
    """Import a module and print its version; return True if the import succeeds."""
    try:
        if import_path:
            module = importlib.import_module(import_path)
        else:
            module = importlib.import_module(module_name)
        
        version = getattr(module, '__version__', 'unknown')
        print(f"✅ {module_name}: {version}")
        return True
    except ImportError as e:
        print(f"❌ {module_name}: {e}")
        return False

print("🔍 Checking all dependencies...")
print("=" * 50)

# Check core dependencies
dependencies_ok = True
dependencies_ok &= check_dependency("torch")
dependencies_ok &= check_dependency("torchvision")
dependencies_ok &= check_dependency("transformers")
dependencies_ok &= check_dependency("gradio")
dependencies_ok &= check_dependency("numpy")
dependencies_ok &= check_dependency("soundfile")
dependencies_ok &= check_dependency("librosa")

# Test PyTorch/TorchVision compatibility (critical test)
try:
    import torch
    import torchvision.ops
    # Test the NMS operator that was causing issues
    boxes = torch.tensor([[0, 0, 1, 1]], dtype=torch.float32)
    scores = torch.tensor([0.9], dtype=torch.float32)
    keep = torchvision.ops.nms(boxes, scores, 0.5)
    print("✅ PyTorch/TorchVision compatibility: Working")
except Exception as e:
    print(f"❌ PyTorch/TorchVision compatibility: {e}")
    dependencies_ok = False

# Check the specific function that was causing issues
try:
    from transformers.utils import is_quanto_available
    print("✅ transformers.utils.is_quanto_available: Available")
except ImportError:
    print("❌ transformers.utils.is_quanto_available: Missing (needs transformers>=4.44.0)")
    dependencies_ok = False

# Check ChatterBox TTS
try:
    from chatterbox.tts import ChatterboxTTS
    print("✅ ChatterBox TTS: Available")
except ImportError as e:
    print(f"❌ ChatterBox TTS: {e}")
    dependencies_ok = False

# Check GPU
try:
    import torch
    if torch.cuda.is_available():
        print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
        print(f"   CUDA Version: {torch.version.cuda}")
    else:
        print("⚠️ GPU: Not available (will use CPU - slower generation)")
except Exception:
    print("❌ GPU: Cannot check")

print("=" * 50)
if dependencies_ok:
    print("🎉 All dependencies are working correctly!")
    print("🚀 You can proceed to the next step.")
else:
    print("⚠️ Some dependencies have issues.")
    print("🔄 SOLUTION: Runtime -> Restart Runtime, then re-run from Step 1")
    print("💡 If issues persist, try running Step 1 twice before restarting.")

## 📥 Step 3: Download Repository

Clone the ChatterBox Unlimited repository with the Gradio interface.

In [None]:
# Clone the repository
!git clone https://github.com/Wamp1re-Ai/Chatterbox-Unlimited-Colab.git
%cd Chatterbox-Unlimited-Colab

# List files
!ls -la

## 🚀 Step 4: Launch ChatterBox TTS Interface

This cell starts the Gradio web interface. The model weights (~5GB) download automatically on first run.
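
If the launcher's output is captured as text (for example, when `main.py` is started from a script rather than with `!`), the public link can be pulled out with a regular expression. This is only a sketch: the helper name `find_public_url` is hypothetical, and the pattern assumes the log line has the shape quoted in the launch cell ("Running on public URL: https://xxxxx.gradio.live").

```python
import re
from typing import Optional

# Assumed log format: "Running on public URL: https://xxxxx.gradio.live"
_PUBLIC_URL = re.compile(r"Running on public URL:\s*(https://\S+\.gradio\.live)")


def find_public_url(log_text: str) -> Optional[str]:
    """Return the first Gradio share URL found in log_text, or None if absent."""
    match = _PUBLIC_URL.search(log_text)
    return match.group(1) if match else None


sample_log = (
    "Running on local URL:  http://127.0.0.1:7860\n"
    "Running on public URL: https://xxxxx.gradio.live\n"
)
print(find_public_url(sample_log))  # https://xxxxx.gradio.live
```

Returning `None` instead of raising lets a caller keep polling the log until the link appears.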

In [None]:
# Launch the ChatterBox TTS interface
!python main.py --share --port 7860

# Note: The interface will be available at the Gradio public link
# Look for the line that says "Running on public URL: https://xxxxx.gradio.live"

## 🎯 How to Use

Once the interface is running:

### Basic Text-to-Speech
1. **Load Model**: Click "Load ChatterBox TTS Model" (first time may take a few minutes)
2. **Enter Text**: Type your text in the input box
3. **Adjust Settings**:
   - **Exaggeration**: 0.0-1.0 (emotion intensity)
   - **CFG Weight**: 0.0-1.0 (speech pacing)
4. **Generate**: Click "🎤 Generate Speech"
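
The same steps can also be sketched without the web UI. The example below assumes the `ChatterboxTTS.from_pretrained`, `generate(..., exaggeration=, cfg_weight=)`, and `model.sr` usage shown in the upstream ChatterBox README, which may differ between releases. The helper names `validate_settings` and `synthesize` are hypothetical, and the 0.0-1.0 limits are simply the ranges listed above, not limits enforced by the library.

```python
def validate_settings(exaggeration: float = 0.5, cfg_weight: float = 0.5) -> dict:
    """Check both values against the 0.0-1.0 range listed in this notebook."""
    for name, value in (("exaggeration", exaggeration), ("cfg_weight", cfg_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {"exaggeration": exaggeration, "cfg_weight": cfg_weight}


def synthesize(text: str, out_path: str = "output.wav", **settings) -> str:
    """Generate speech for text and save it as a WAV file.

    Assumption: API usage follows the upstream README. The first call
    downloads the model weights.
    """
    # Heavy imports are deferred so validate_settings works on any machine.
    import torch
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, **validate_settings(**settings))
    torchaudio.save(out_path, wav, model.sr)
    return out_path


# Only the validation step runs here; synthesize() needs the Step 1 install.
print(validate_settings())
```

After Step 1 has completed, a call such as `synthesize("Hello from Colab!", exaggeration=0.5, cfg_weight=0.5)` would write `output.wav` to the working directory.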

### Voice Cloning
1. **Upload Reference Audio**: 3-10 seconds of clear speech
2. **Enter Text**: What you want the cloned voice to say
3. **Generate**: The output will mimic the reference voice
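
Before uploading, it can help to confirm that a reference clip falls inside the suggested 3-10 second window. The sketch below uses only the standard-library `wave` module, so it handles uncompressed PCM WAV files only; other formats would need a library such as `soundfile` or `librosa`. Both function names are hypothetical.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_good_reference_length(
    seconds: float, shortest: float = 3.0, longest: float = 10.0
) -> bool:
    """True when the clip length is inside the 3-10 second window suggested above."""
    return shortest <= seconds <= longest
```

For example, `is_good_reference_length(wav_duration_seconds("reference.wav"))` returns `False` for a one-second clip. Here `reference.wav` is a placeholder file name.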

### Tips for Best Results
- **General Use**: Default settings (0.5, 0.5) work well
- **Expressive Speech**: Lower CFG (~0.3) + higher exaggeration (~0.7+)
- **Voice Cloning**: Use clear, high-quality reference audio
- **GPU Acceleration**: Colab's GPU will make generation much faster!
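
The tips above can be captured as named presets so the values are not retyped each time. This is a minimal sketch: `PRESETS` and `settings_for` are hypothetical names, and the "expressive" entry uses 0.7 and 0.3 as one reading of "~0.7+" and "~0.3".

```python
# Values taken from the tips above; "expressive" picks one point in the suggested range.
PRESETS = {
    "general": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "expressive": {"exaggeration": 0.7, "cfg_weight": 0.3},
}


def settings_for(style: str) -> dict:
    """Return a copy of the settings for a named style."""
    try:
        return dict(PRESETS[style])
    except KeyError:
        raise ValueError(
            f"unknown style {style!r}; choose from {sorted(PRESETS)}"
        ) from None


print(settings_for("expressive"))  # {'exaggeration': 0.7, 'cfg_weight': 0.3}
```

Returning a copy means a caller can adjust one value for a single run without changing the preset itself.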

---

## 📝 Notes
- Generated audio includes watermarking for responsible AI use
- First model load downloads ~5GB of weights
- Colab sessions time out after a period of inactivity
- For extended use, consider Colab Pro for longer sessions
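
Since the first model load downloads about 5GB, a quick free-space check can rule out one cause of a failed download. The sketch below is an assumption-laden convenience: `has_room_for_download` is a hypothetical helper, and it checks the disk that holds the current directory, which may not be the disk that holds the download cache in every environment.

```python
import shutil


def has_room_for_download(path: str = ".", needed_gb: float = 5.0) -> bool:
    """True when the disk holding path has at least needed_gb gigabytes free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


print("Enough free space:", has_room_for_download())
```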

## 🔧 Troubleshooting

### Common Issues and Solutions

**❌ `cannot import name 'is_quanto_available' from 'transformers.utils'`**
- **Solution**: Restart runtime and re-run Step 1
- **Cause**: Outdated transformers version
- **Fix**: The notebook now installs transformers>=4.44.0

**❌ `operator torchvision::nms does not exist`**
- **Solution**: Restart runtime and re-run Step 1
- **Cause**: PyTorch/TorchVision version mismatch
- **Fix**: The notebook now uninstalls old versions and installs matched versions
- **Critical**: PyTorch ecosystem must be installed together
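
One way to confirm the matched set is to compare the installed package metadata against the versions pinned in Step 1. The sketch below is illustrative: the function names are hypothetical, and the pins are copied from the Step 1 install command.

```python
from importlib import metadata

# Pins copied from the Step 1 install command.
EXPECTED = {"torch": "2.3.1", "torchvision": "0.18.1", "torchaudio": "2.3.1"}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu121'."""
    return version.split("+", 1)[0]


def find_mismatches(installed: dict, expected: dict = EXPECTED) -> dict:
    """Map each missing or mismatched package to (installed, expected)."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) is None or base_version(installed[name]) != wanted
    }


def installed_versions(names) -> dict:
    """Read installed versions from package metadata without importing the packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# An empty dict means every pinned package is present at the pinned version.
print(find_mismatches(installed_versions(EXPECTED)))
```

Because `importlib.metadata` reads what is installed on disk, this reports the result of the latest install even when an older `torch` is still loaded in the running session, which is the situation a runtime restart resolves.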

**❌ CUDA version conflicts**
- **Solution**: The notebook installs compatible PyTorch versions
- **If issues persist**: Run Step 1 twice, then restart the runtime

**❌ ChatterBox TTS import fails**
- **Solution**: The notebook tries both PyPI and GitHub installation
- **Manual fix**: `!pip install git+https://github.com/resemble-ai/chatterbox.git`

**⚠️ Slow generation**
- **Check**: Make sure GPU is enabled (Runtime → Change runtime type → GPU)
- **Verify**: Step 2 should show your GPU name

**🔄 General troubleshooting steps:**
1. Restart runtime: Runtime → Restart Runtime
2. Re-run all cells from Step 1
3. If Step 2 shows issues, repeat steps 1-2
4. Check that GPU is enabled in runtime settings

## 🔗 Links
- [GitHub Repository](https://github.com/Wamp1re-Ai/Chatterbox-Unlimited-Colab)
- [ResembleAI ChatterBox](https://github.com/resemble-ai/chatterbox)
- [Hugging Face Model](https://huggingface.co/ResembleAI/chatterbox)

**Enjoy creating amazing speech with ChatterBox Unlimited! 🎉**