<a href="https://colab.research.google.com/github/Wamp1re-Ai/Chatterbox-TTS-Extended/blob/main/chatterbox_tts_extended_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎧 Chatterbox TTS Extended - Google Colab Notebook

This notebook allows you to run [Chatterbox TTS Extended](https://github.com/Wamp1re-Ai/Chatterbox-TTS-Extended) directly in Google Colab.

## Setup

First, we'll clone the repository and install the required dependencies using `uv` for faster installation.

In [None]:
# Clone the repository
!git clone https://github.com/Wamp1re-Ai/Chatterbox-TTS-Extended.git
%cd Chatterbox-TTS-Extended

In [None]:
# Install uv package manager for faster dependency installation
!pip install uv

# Install dependencies from requirements file
!uv pip install -r requirements.txt

# Install CUDA 12.8 builds of torch and torchaudio, plus NLTK for sentence tokenization
!uv pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128
!uv pip install nltk

# Download NLTK data
import nltk
nltk.download('punkt')
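
Before moving on, it can be worth confirming that the installs above actually landed in the runtime. The sketch below uses only the standard library to report which top-level packages cannot be found. `missing_packages` is a hypothetical helper name, not part of the repository.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of top-level package names that cannot be located."""
    return [name for name in names if find_spec(name) is None]

# Packages this notebook imports later on
required = ["torch", "torchaudio", "nltk"]
missing = missing_packages(required)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are importable.")
```

If anything is reported missing, re-running the install cell (or restarting the runtime and running it again) is usually the first thing to try.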

## Usage

Now let's use Chatterbox TTS Extended to generate speech from text.

In [None]:
import torch
import torchaudio
import nltk
import sys
import os

# Add the repository root and its chatterbox/src directory to the Python path
sys.path.append(os.getcwd())
sys.path.append(os.path.join(os.getcwd(), 'chatterbox/src'))

from chatterbox.src.chatterbox.tts import ChatterboxTTS
from IPython.display import Audio, display
import numpy as np

In [None]:
# Select device: CUDA if available, otherwise CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Initialize the Chatterbox TTS model
model = ChatterboxTTS.from_pretrained(device)
model = model.to(device)
model.eval()

print("Model loaded successfully!")

In [None]:
def generate_speech(text, exaggeration=0.5, temperature=0.8, cfg_weight=0.5):
    """
    Generate speech from text using Chatterbox TTS Extended
    
    Args:
        text (str): The text to convert to speech
        exaggeration (float): Emotion exaggeration (0.0 to 2.0)
        temperature (float): Sampling temperature (0.01 to 5.0)
        cfg_weight (float): Classifier-free guidance weight (0.1 to 1.0)
    
    Returns:
        tuple: (waveform, sample_rate)
    """
    with torch.no_grad():
        # Generate audio
        wav = model.generate(
            text,
            exaggeration=exaggeration,
            temperature=temperature,
            cfg_weight=cfg_weight,
            apply_watermark=False
        )
    
    return wav, model.sr

# Example usage
text = "Hello, welcome to Chatterbox TTS Extended running on Google Colab! This is a demonstration of text-to-speech synthesis."
wav, sr = generate_speech(text)

print(f"Generated audio with shape: {wav.shape} and sample rate: {sr}")

# Save the audio
output_path = "generated_speech.wav"
torchaudio.save(output_path, wav, sr)
print(f"Audio saved to: {output_path}")

# Play the audio
display(Audio(output_path, rate=sr))
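
The docstring of `generate_speech` lists a range for each parameter, but nothing in the function enforces it. One option is to clamp values before the call. The following is a minimal sketch under that assumption: `PARAM_RANGES` and `clamp_params` are hypothetical names, and the ranges are copied from the docstring rather than from the library itself.

```python
# Ranges copied from the generate_speech docstring (not verified against the library)
PARAM_RANGES = {
    "exaggeration": (0.0, 2.0),
    "temperature": (0.01, 5.0),
    "cfg_weight": (0.1, 1.0),
}

def clamp_params(**params):
    """Clamp each known parameter into its documented range (hypothetical helper)."""
    clamped = {}
    for name, value in params.items():
        if name not in PARAM_RANGES:
            raise KeyError(f"Unknown parameter: {name}")
        low, high = PARAM_RANGES[name]
        clamped[name] = min(max(float(value), low), high)
    return clamped

# Out-of-range values are pulled back to the nearest bound
safe = clamp_params(exaggeration=2.5, temperature=0.0, cfg_weight=0.5)
print(safe)
```

The result can be passed straight through as `generate_speech(text, **safe)`.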

## Advanced Usage

You can customize the speech generation by adjusting various parameters:

In [None]:
# More emotional speech
emotional_text = "Wow, this is incredible! I can't believe how realistic this text-to-speech sounds!"
wav_emotional, sr_emotional = generate_speech(emotional_text, exaggeration=1.5, temperature=0.9)

output_path_emotional = "generated_speech_emotional.wav"
torchaudio.save(output_path_emotional, wav_emotional, sr_emotional)
print("Emotional speech generated!")
display(Audio(output_path_emotional, rate=sr_emotional))

In [None]:
# More monotone speech
monotone_text = "This is a monotone voice with low emotion and high CFG weight for more literal speech."
wav_monotone, sr_monotone = generate_speech(monotone_text, exaggeration=0.2, temperature=0.3, cfg_weight=0.8)

output_path_monotone = "generated_speech_monotone.wav"
torchaudio.save(output_path_monotone, wav_monotone, sr_monotone)
print("Monotone speech generated!")
display(Audio(output_path_monotone, rate=sr_monotone))
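
The two cells above differ only in the parameter values they pass, so the combinations can be collected as named presets. This is a sketch, not a feature of the repository: `PRESETS` and `preset_kwargs` are hypothetical names, and the values are simply those used above plus the `generate_speech` defaults.

```python
# Parameter combinations taken from the cells above, plus the function defaults
PRESETS = {
    "default": {"exaggeration": 0.5, "temperature": 0.8, "cfg_weight": 0.5},
    "emotional": {"exaggeration": 1.5, "temperature": 0.9, "cfg_weight": 0.5},
    "monotone": {"exaggeration": 0.2, "temperature": 0.3, "cfg_weight": 0.8},
}

def preset_kwargs(name, **overrides):
    """Return a copy of the named preset with any keyword overrides applied."""
    if name not in PRESETS:
        raise ValueError(f"Unknown preset {name!r}; choose from {sorted(PRESETS)}")
    return {**PRESETS[name], **overrides}

# Start from the emotional preset but raise the temperature slightly
kwargs = preset_kwargs("emotional", temperature=1.0)
print(kwargs)
```

In the notebook this would be used as `generate_speech(text, **preset_kwargs("monotone"))`.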

## Batch Generation

Generate multiple speech samples at once:

In [None]:
texts = [
    "This is the first sentence.",
    "Here is another example of generated speech.",
    "Batch processing allows generating multiple audios efficiently."
]

# Generate multiple audios at once
with torch.no_grad():
    wavs = model.generate_batch(
        texts,
        exaggeration=0.7,
        temperature=0.7,
        cfg_weight=0.5,
        apply_watermark=False
    )

print(f"Generated {len(wavs)} audio samples")

# Save and play each audio
for i, wav in enumerate(wavs):
    output_path = f"batch_generated_{i}.wav"
    torchaudio.save(output_path, wav, model.sr)
    print(f"Audio {i+1} saved to: {output_path}")
    display(Audio(output_path, rate=model.sr))
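
If `generate_batch` is not available in the installed version, or a batch runs out of memory, one fallback is to generate the texts one at a time. The sketch below assumes only a callable that maps one text to one result, such as `generate_speech` defined earlier. `generate_each` and `fake_generate` are hypothetical names used for illustration, and failures are recorded rather than stopping the run.

```python
def generate_each(texts, generate_fn):
    """Call generate_fn on each text in order, collecting results and errors."""
    results, errors = [], []
    for index, text in enumerate(texts):
        try:
            results.append((index, generate_fn(text)))
        except Exception as exc:  # keep going so one bad input does not end the run
            errors.append((index, exc))
    return results, errors

# Stand-in generator for illustration; replace with generate_speech in the notebook
def fake_generate(text):
    if not text.strip():
        raise ValueError("empty text")
    return f"<audio for {len(text)} characters>"

results, errors = generate_each(["First sentence.", "   ", "Third sentence."], fake_generate)
print(f"{len(results)} succeeded, {len(errors)} failed")
```

Each entry in `results` keeps the index of its source text, so output files can still be named consistently when some inputs fail.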

## Troubleshooting

If you encounter any issues:

1. Make sure you're using a GPU runtime in Colab (Runtime → Change runtime type → GPU)
2. If you get out-of-memory errors, try:
   - Reducing the length of the input text
   - Using a CPU runtime (slower, but it uses system RAM instead of the more limited GPU memory)
3. For best results, use clear, properly punctuated English text
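
Reducing the input length does not have to mean cutting the text: a long passage can be split at sentence boundaries into shorter chunks, each generated separately. The following is a minimal sketch that uses a regular expression as a simple stand-in for a real sentence tokenizer. `chunk_text` and its default 200-character limit are illustrative assumptions, not values from the repository.

```python
import re

def chunk_text(text, max_chars=200):
    """Group sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Naive split after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

long_text = "This is the first sentence. Here is a second one! And a third? Finally, a fourth."
for chunk in chunk_text(long_text, max_chars=40):
    print(chunk)
```

Each chunk can then be passed to `generate_speech` on its own, which keeps the memory needed per call bounded by the chunk size rather than the full text.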

## Conclusion

You've successfully run Chatterbox TTS Extended in Google Colab! You can now generate high-quality speech from text with various emotional expressions and styles.

For more information about the parameters and advanced usage, check out the [Chatterbox TTS Extended GitHub repository](https://github.com/Wamp1re-Ai/Chatterbox-TTS-Extended).