# Voice Cloning with Coqui XTTS v2 (Google Colab)

This notebook is designed for the **free Google Colab GPU runtime** and is beginner-friendly.
It will help you:
1. Install Coqui TTS (XTTS v2)
2. Verify GPU access
3. Upload a short voice sample
4. Synthesize your text in the cloned voice
5. Save and download `output.wav`

## 0) Enable GPU Runtime (Required)
In Colab, go to:
**Runtime → Change runtime type → Hardware accelerator → GPU**

Then run the next cell. If it raises an error, confirm that the hardware accelerator is set to GPU, reconnect the runtime, and run the cell again.

In [None]:
# Check GPU status
import torch
!nvidia-smi -L

if not torch.cuda.is_available():
    raise RuntimeError("GPU not detected. In Colab, enable GPU from Runtime > Change runtime type.")

print("✅ GPU is available:", torch.cuda.get_device_name(0))

## 1) Install dependencies
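This installs PyTorch built against **CUDA 11.8** (these wheels run on Colab's T4 GPUs), the Coqui `TTS` package that provides XTTS v2, and `soundfile`. The original `TTS` package on PyPI declares support only for Python versions below 3.12, so if the install fails or XTTS v2 is missing on a Python 3.12 runtime, the community-maintained `coqui-tts` fork is the usual substitute.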


In [None]:
# Install PyTorch (CUDA 11.8) + XTTS v2 dependencies for Colab Python 3.12
!pip -q install --upgrade pip
!pip -q install --upgrade --index-url https://download.pytorch.org/whl/cu118 torch torchvision torchaudio
!pip -q install --upgrade TTS soundfile

print("✅ Installation complete (PyTorch cu118 + XTTS v2 deps)")
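
Because `pip -q` hides most output, it can help to confirm what was actually installed. The sketch below is not part of the original notebook; `installed_versions` is a hypothetical helper that reads package metadata with the standard library, so it does not import the (large) packages themselves.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} by reading installed package metadata."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions


# Package names taken from the install cell above
for name, version in installed_versions(["torch", "torchaudio", "TTS", "soundfile"]).items():
    print(f"{name}: {version or 'not installed'}")
```

If `torch` was already imported before the upgrade, restarting the runtime is a reasonable way to make sure the session uses the newly installed wheels.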


## 2) Upload your voice sample
Tips for best quality:
- Use a **clear** recording (5–20 seconds)
- Prefer **.wav** format (mono, minimal background noise)
- English cloning works well; XTTS also supports multiple languages

In [None]:
from google.colab import files

uploaded = files.upload()
if not uploaded:
    raise RuntimeError("No file uploaded. Please upload a voice sample.")

voice_sample_path = next(iter(uploaded.keys()))
print(f"✅ Uploaded voice sample: {voice_sample_path}")
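
The quality tips above (mono, 5 to 20 seconds) can be checked before synthesis. This is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM `.wav` files; other formats raise an error and would need a different reader. `describe_wav` and `check_sample` are hypothetical helper names, not part of Coqui TTS.

```python
import wave


def describe_wav(path):
    """Read channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return {"channels": channels, "sample_rate": sample_rate, "duration_s": duration}


def check_sample(info, min_s=5.0, max_s=20.0):
    """Return a list of warnings; an empty list means the tips are met."""
    warnings = []
    if info["channels"] != 1:
        warnings.append(f"expected mono, found {info['channels']} channels")
    if not min_s <= info["duration_s"] <= max_s:
        warnings.append(
            f"duration {info['duration_s']:.1f}s is outside {min_s:.0f}-{max_s:.0f}s"
        )
    return warnings
```

In the notebook, calling `check_sample(describe_wav(voice_sample_path))` after the upload cell would list any tips the recording misses. The warnings are advisory only, and a sample outside these bounds may still work.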

## 3) Set text and language
Use a supported language code such as `en`, `es`, `fr`, `de`, `it`, `pt`, `hi`, etc.

In [None]:
# Edit these values before running
text_to_speak = "Hello! This is a cloned voice generated with Coqui XTTS version 2 in Google Colab."
language_code = "en"  # Example: en, es, fr, de, it, pt, hi

print("Text:", text_to_speak)
print("Language:", language_code)
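
A stray space or capital letter in the language code is an easy mistake to make when editing the cell. The sketch below normalizes the code and reports whether it is one of the codes named in this notebook. `LISTED_CODES` and `normalize_language_code` are hypothetical names, and the set is deliberately limited to the examples above, so a `False` result does not mean the model rejects the code.

```python
# Codes named in this notebook's text; NOT the model's full language list.
LISTED_CODES = {"en", "es", "fr", "de", "it", "pt", "hi"}


def normalize_language_code(code, known_codes=LISTED_CODES):
    """Strip and lower-case the code, and report whether it is in known_codes."""
    cleaned = code.strip().lower()
    return cleaned, cleaned in known_codes


cleaned, is_listed = normalize_language_code(" EN ")
print(cleaned, is_listed)  # prints: en True
```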

## 4) Load XTTS v2 model and generate cloned speech
This step downloads model files the first time.
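
On the first download, the XTTS v2 model may pause on an interactive prompt asking you to accept the Coqui Public Model License. The sketch below assumes the installed `TTS` release honors the `COQUI_TOS_AGREED` environment variable, in which case setting it before the model is loaded skips the prompt. Setting it signals that you agree to the license terms, so read them first.

```python
import os

# Assumption: the installed TTS release checks this variable before prompting.
# Setting it to "1" states that you have read and accepted the model license.
os.environ["COQUI_TOS_AGREED"] = "1"
```

If the variable has no effect in your installed version, answering the prompt in the cell output achieves the same result.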

In [None]:
import torch
from TTS.api import TTS

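# Use the GPU when available; CPU inference works but is much slower
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)

# Downloads the XTTS v2 model files on first use, then moves the model to the device
model_name = "tts_models/multilingual/multi-dataset/xtts_v2"
tts = TTS(model_name).to(device)

# speaker_wav is the uploaded reference recording; the result is written to file_path
output_path = "output.wav"
tts.tts_to_file(
    text=text_to_speak,
    speaker_wav=voice_sample_path,
    language=language_code,
    file_path=output_path,
)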

print(f"✅ Done! Generated file: {output_path}")

## 5) Play and download `output.wav`

In [None]:
from IPython.display import Audio, display
from google.colab import files

display(Audio("output.wav", autoplay=False))
files.download("output.wav")