<a href="https://colab.research.google.com/github/Wamp1re-Ai/Chatterbox-TTS-Extended/blob/main/chatterbox_tts_extended_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎧 Chatterbox TTS Extended - Google Colab Notebook

This notebook allows you to run [Chatterbox TTS Extended](https://github.com/Wamp1re-Ai/Chatterbox-TTS-Extended) directly in Google Colab.

## Setup

First, we'll clone the repository and install the required dependencies using `uv` for faster installation.

In [None]:
# Clone the repository
!git clone https://github.com/Wamp1re-Ai/Chatterbox-TTS-Extended.git
%cd Chatterbox-TTS-Extended

In [None]:
# Install uv package manager for faster dependency installation
!pip install uv

# Install PyTorch, torchaudio, and torchvision with compatible versions first
!uv pip install torch==2.7.0 torchaudio==2.7.0 torchvision --index-url https://download.pytorch.org/whl/cu128

# Install transformers with a compatible version
!uv pip install transformers==4.46.3

# Install other dependencies from requirements file
!uv pip install -r requirements.txt

# Install additional dependencies that might be needed
!uv pip install nltk

# Download NLTK tokenizer data (newer NLTK releases look for 'punkt_tab')
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')

# Add the current directory to Python path
import sys
import os
sys.path.append(os.getcwd())
sys.path.append(os.path.join(os.getcwd(), 'chatterbox/src'))
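
Before moving on, it can help to confirm that the packages installed above are visible to the interpreter. The following is a minimal sketch using only the standard library. The `REQUIRED` list and the `find_missing` helper are illustrative names, not part of the repository, and the list is an assumption based on the install commands in this section.

```python
import importlib.util

# Assumed package list, taken from the install cell above; adjust it if
# requirements.txt pulls in other top-level packages you care about.
REQUIRED = ["torch", "torchaudio", "transformers", "nltk"]


def find_missing(packages):
    """Return the top-level packages that cannot be located on sys.path."""
    missing = []
    for name in packages:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages were found.")
```

If anything is reported missing, re-running the install cell (or restarting the runtime first) is usually the simplest fix.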

## Usage

Now let's use Chatterbox TTS Extended to generate speech from text.

In [None]:
import torch
import torchaudio
import nltk
from IPython.display import Audio, display
import numpy as np

try:
    from chatterbox.src.chatterbox.tts import ChatterboxTTS
    print("Successfully imported ChatterboxTTS")
except ImportError as e:
    print(f"Failed to import ChatterboxTTS: {e}")
    # Try alternative import paths
    try:
        from src.chatterbox.tts import ChatterboxTTS
        print("Successfully imported ChatterboxTTS from src.chatterbox.tts")
    except ImportError as e2:
        print(f"Failed to import ChatterboxTTS from src.chatterbox.tts: {e2}")
        raise e
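
The nested `try`/`except` above tries two module paths in order. The same idea can be sketched as a small loop over candidate paths, which stays flat if more fallbacks are ever needed. `import_first` is a hypothetical helper introduced here for illustration; it is not part of Chatterbox TTS Extended.

```python
import importlib


def import_first(candidates):
    """Import and return the first module in `candidates` that loads.

    Raises ImportError listing every failure if none can be imported.
    """
    errors = []
    for path in candidates:
        try:
            return importlib.import_module(path)
        except ImportError as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError("No candidate could be imported:\n" + "\n".join(errors))


# Hypothetical usage with the two module paths tried in the cell above:
# tts_module = import_first(["chatterbox.src.chatterbox.tts", "src.chatterbox.tts"])
# ChatterboxTTS = tts_module.ChatterboxTTS
```

Collecting the individual error messages keeps the final exception informative, since each candidate may fail for a different reason.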

In [None]:
# Select device: CUDA if available, else CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Initialize the Chatterbox TTS model
try:
    model = ChatterboxTTS.from_pretrained(device)
    model = model.to(device)
    model.eval()
    print(f"Model loaded on device: {getattr(model, 'device', 'unknown')}")
except Exception as e:
    print(f"Failed to load model: {e}")
    raise

In [None]:
def generate_speech(text, exaggeration=0.5, temperature=0.8, cfg_weight=0.5):
    """
    Generate speech from text using Chatterbox TTS Extended
    
    Args:
        text (str): The text to convert to speech
        exaggeration (float): Emotion exaggeration (0.0 to 2.0; values above 1.0 are clamped to 1.0 below)
        temperature (float): Sampling temperature (0.01 to 5.0)
        cfg_weight (float): Classifier-free guidance weight (0.1 to 1.0)
    
    Returns:
        tuple: (waveform, sample_rate)
    """
    try:
        with torch.no_grad():
            # Generate audio
            wav = model.generate(
                text,
                exaggeration=min(exaggeration, 1.0),
                temperature=temperature,
                cfg_weight=cfg_weight,
                apply_watermark=False
            )
        
        return wav, model.sr
    except Exception as e:
        print(f"Error generating speech: {e}")
        raise

# Example usage
text = "Hello, welcome to Chatterbox TTS Extended running on Google Colab! This is a demonstration of text-to-speech synthesis."

try:
    wav, sr = generate_speech(text)
    print(f"Generated audio with shape: {wav.shape} and sample rate: {sr}")
    
    # Save the audio
    output_path = "generated_speech.wav"
    torchaudio.save(output_path, wav, sr)
    print(f"Audio saved to: {output_path}")
    
    # Play the audio
    display(Audio(output_path, rate=sr))
except Exception as e:
    print(f"Failed to generate speech: {e}")
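
The docstring of `generate_speech` lists a range for each parameter. A minimal sketch of clamping user-supplied values into those ranges before calling the function might look like the following. `PARAM_RANGES` and `clamp_params` are illustrative names, and the ranges are copied from the docstring, so treat them as documented guidance and not as limits known to be enforced by the model.

```python
# Ranges copied from the generate_speech docstring (an assumption about
# sensible inputs, not limits verified against the model).
PARAM_RANGES = {
    "exaggeration": (0.0, 2.0),
    "temperature": (0.01, 5.0),
    "cfg_weight": (0.1, 1.0),
}


def clamp_params(**params):
    """Clamp each known parameter into its documented range."""
    clamped = {}
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise KeyError(f"Unknown parameter: {name}")
        low, high = PARAM_RANGES[name]
        clamped[name] = max(low, min(high, float(value)))
    return clamped


print(clamp_params(exaggeration=2.5, temperature=0.0, cfg_weight=0.5))
```

Note that `generate_speech` as written additionally caps `exaggeration` at 1.0, so values between 1.0 and 2.0 pass this check but have no extra effect.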

## Advanced Usage

You can customize the speech generation by adjusting various parameters:

In [None]:
# More emotional speech (generate_speech caps exaggeration at 1.0, so 1.5 is treated as 1.0)
emotional_text = "Wow, this is incredible! I can't believe how realistic this text-to-speech sounds!"

try:
    wav_emotional, sr_emotional = generate_speech(emotional_text, exaggeration=1.5, temperature=0.9)
    
    output_path_emotional = "generated_speech_emotional.wav"
    torchaudio.save(output_path_emotional, wav_emotional, sr_emotional)
    print("Emotional speech generated!")
    display(Audio(output_path_emotional, rate=sr_emotional))
except Exception as e:
    print(f"Failed to generate emotional speech: {e}")

In [None]:
# More monotone speech
monotone_text = "This is a monotone voice with low emotion and high CFG weight for more literal speech."

try:
    wav_monotone, sr_monotone = generate_speech(monotone_text, exaggeration=0.2, temperature=0.3, cfg_weight=0.8)
    
    output_path_monotone = "generated_speech_monotone.wav"
    torchaudio.save(output_path_monotone, wav_monotone, sr_monotone)
    print("Monotone speech generated!")
    display(Audio(output_path_monotone, rate=sr_monotone))
except Exception as e:
    print(f"Failed to generate monotone speech: {e}")
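
The two cells above differ only in their parameter values, so one possible way to organize them is a small table of named presets. This is a sketch, not part of the repository: `PRESETS` and `preset_kwargs` are hypothetical names, and the values simply mirror the calls above (with `cfg_weight=0.5` as the default for the emotional example).

```python
# Values mirror the two example cells above; the preset names are illustrative.
PRESETS = {
    "emotional": {"exaggeration": 1.5, "temperature": 0.9, "cfg_weight": 0.5},
    "monotone": {"exaggeration": 0.2, "temperature": 0.3, "cfg_weight": 0.8},
}


def preset_kwargs(name, **overrides):
    """Return keyword arguments for a named preset, with optional overrides."""
    if name not in PRESETS:
        raise ValueError(f"Unknown preset {name!r}; choose from {sorted(PRESETS)}")
    kwargs = dict(PRESETS[name])  # copy so the shared preset is never mutated
    kwargs.update(overrides)
    return kwargs


# Hypothetical usage with the generate_speech function defined earlier:
# wav, sr = generate_speech("Some text", **preset_kwargs("monotone"))
print(preset_kwargs("monotone", temperature=0.5))
```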

## Troubleshooting

If you encounter any issues:

1. Make sure you're using a GPU runtime in Colab (Runtime → Change runtime type → GPU)
2. If you get out-of-memory errors, try:
   - Reducing the length of the input text
   - Switching to a CPU runtime (slower, but it uses system RAM instead of the more limited GPU memory)
3. For best results, use clear, properly punctuated English text
4. If you encounter import errors, try restarting the runtime (Runtime → Restart session, labelled Restart runtime in older versions of Colab) and running the cells again
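
For the out-of-memory case in item 2, one way to reduce input length is to split long text into sentence-sized chunks and generate each chunk separately. The sketch below uses a simple regular-expression sentence splitter from the standard library; it is an assumption for illustration and is not the chunking logic used by Chatterbox TTS Extended itself. `chunk_text` and `max_chars` are hypothetical names.

```python
import re


def chunk_text(text, max_chars=300):
    """Group sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole in its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical usage with the generate_speech function defined earlier:
# for i, chunk in enumerate(chunk_text(long_text)):
#     wav, sr = generate_speech(chunk)
#     torchaudio.save(f"chunk_{i}.wav", wav, sr)
```

Splitting on sentence boundaries keeps each chunk a natural unit of speech, which tends to sound better than cutting at a fixed character count.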

## Conclusion

You've successfully run Chatterbox TTS Extended in Google Colab! You can now generate high-quality speech from text with various emotional expressions and styles.

For more information about the parameters and advanced usage, check out the [Chatterbox TTS Extended GitHub repository](https://github.com/Wamp1re-Ai/Chatterbox-TTS-Extended).