# Yoruba TTS Training on Google Colab

This notebook sets up and trains a VITS-based text-to-speech (TTS) model for the Yoruba language.

**Features:**
- GPU-accelerated training
- Automatic setup and dependency installation
- Training progress monitoring
- Audio synthesis testing

**Note:** Make sure to enable GPU runtime: `Runtime > Change runtime type > GPU`

## 1. Setup Environment

In [1]:
# Check GPU availability
!nvidia-smi

Sat Nov 29 16:26:18 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   69C    P8             11W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                
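The same check can be run from Python, which makes it easy to stop early when no GPU is attached. This is a minimal sketch using only the standard library; the helper name `gpu_summary` is hypothetical and not part of the repository.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one line per visible GPU, or None if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No GPU detected: enable it under Runtime > Change runtime type")
```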

In [7]:
# Clone the repository
!git clone https://github.com/T-ultrafast/Naija_tts.git
%cd Naija_tts

Cloning into 'Naija_tts'...
remote: Enumerating objects: 31, done.
remote: Counting objects: 100% (31/31), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 31 (delta 7), reused 26 (delta 5), pack-reused 0 (from 0)
Receiving objects: 100% (31/31), 123.91 KiB | 4.77 MiB/s, done.
Resolving deltas: 100% (7/7), done.
/content/Naija_tts


In [None]:
# Install dependencies
!pip install --upgrade pip
!pip install -q TTS==0.22.0 torch torchaudio librosa accelerate einops
!pip install -q flask

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
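Because the install runs with `-q`, a failed build can go unnoticed until a later cell raises `ModuleNotFoundError: No module named 'TTS'`. The sketch below reports which distributions are actually installed; the helper name `installed_versions` is hypothetical. One possible cause of a failed install is a Python version outside the range the pinned release supports, but that is an assumption the log above does not confirm, so the Python version is printed for reference only.

```python
import importlib.metadata
import sys


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
for name, version in installed_versions(["TTS", "torch", "torchaudio", "librosa"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

If `TTS` is reported as not installed, rerun the install cell without `-q` to see the full error.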

## 2. Verify Setup

In [2]:
# List files (run this after the clone cell so the working directory is Naija_tts)
!ls -la

total 16
drwxr-xr-x 1 root root 4096 Jul 15 13:41 .
drwxr-xr-x 1 root root 4096 Nov 29 16:56 ..
drwxr-xr-x 1 root root 4096 Jul 15 13:41 .config
drwxr-xr-x 1 root root 4096 Jul 15 13:41 sample_data


In [3]:
# Check metadata file (expects the working directory to be Naija_tts)
!head -5 metadata_yor.csv

head: cannot open 'metadata_yor.csv' for reading: No such file or directory
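The listing above shows only `.config` and `sample_data`, which is Colab's default `/content` directory, so these cells appear to have run before the repository was cloned. A small sketch that prints the working directory before previewing the file makes this kind of mistake easier to spot; `preview_file` is a hypothetical helper, and no assumption is made about the column layout of `metadata_yor.csv`.

```python
import os


def preview_file(path, n=5):
    """Return the first n lines of path, or None if the file does not exist."""
    if not os.path.isfile(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        # zip stops after n lines without reading the rest of the file
        return [line.rstrip("\n") for _, line in zip(range(n), f)]


print(f"Working directory: {os.getcwd()}")
lines = preview_file("metadata_yor.csv")
if lines is None:
    print("metadata_yor.csv not found here; run `%cd Naija_tts` after cloning")
else:
    print("\n".join(lines))
```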


## 3. Update Config for GPU Training

In [8]:
# Check that train_vits.py will not force CPU training and that PyTorch sees the GPU
import torch

# Read the current train_vits.py (the file is inspected, not modified)
with open('train_vits.py', 'r') as f:
    content = f.read()

# The Trainer should auto-detect the GPU; a hard-coded use_cuda=False would override that
if 'use_cuda=False' in content:
    print("Warning: train_vits.py sets use_cuda=False; change it to use_cuda=True before training")
else:
    print("train_vits.py is ready for training")
print(f"CUDA available: {torch.cuda.is_available()}")

train_vits.py is ready for training
CUDA available: True


In [9]:
!ls -R

.:
app.py		       naija_formatter.py      train_vits.py
check_config.py        naija_xtts_config.yaml  wavs
metadata_ljspeech.csv  static		       Yoruba
metadata_ljspeech.txt  templates	       yoruba_characters.txt
metadata_yor.csv       test_inference.py       yoruba_tts_colab.ipynb

./static:
script.js  style.css

./templates:
index.html

./Yoruba:
git  yo_ng_female

./Yoruba/yo_ng_female:
LICENSE  line_index.tsv  make_manifest.py  metadata.csv


## 4. Start Training

**Note:** Training will take several hours. You can monitor progress in the output below.

In [17]:
# Start training (this will run for a long time)
!python train_vits.py

Traceback (most recent call last):
  File "/content/Naija_tts/train_vits.py", line 3, in <module>
    from TTS.config import load_config
ModuleNotFoundError: No module named 'TTS'


## 5. Monitor Training (Optional)

You can check training logs and checkpoints while training is running.

In [10]:
# List output directories
!ls -lh out/naija_xtts_yor/

ls: cannot access 'out/naija_xtts_yor/': No such file or directory


In [11]:
# View latest training log (update the directory name to match your run)
!tail -50 out/naija_xtts_yor/*/trainer_0_log.txt

tail: cannot open 'out/naija_xtts_yor/*/trainer_0_log.txt' for reading: No such file or directory
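The shell glob fails outright when no run directory exists yet. A Python version can pick the most recent log and report cleanly when there is none. This is a sketch only: `tail_latest_log` is a hypothetical helper, and the log file name is taken from the cell above.

```python
import glob
import os
from collections import deque


def tail_latest_log(root, n=50, name="trainer_0_log.txt"):
    """Return the last n lines of the newest log under root/*/, or None."""
    logs = glob.glob(os.path.join(root, "*", name))
    if not logs:
        return None
    latest = max(logs, key=os.path.getmtime)
    with open(latest, "r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the final n lines while streaming the file
        return list(deque(f, maxlen=n))


tail = tail_latest_log("out/naija_xtts_yor")
print("".join(tail) if tail is not None else "No training log yet")
```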


## 6. Test Inference

After training (or using an existing checkpoint), test the model.

In [12]:
# Find the latest checkpoint
import glob
import os

latest_checkpoint = None  # stays None when no run or checkpoint exists

checkpoint_dirs = glob.glob('out/naija_xtts_yor/*/')
if checkpoint_dirs:
    latest_dir = max(checkpoint_dirs, key=os.path.getmtime)
    checkpoints = glob.glob(os.path.join(latest_dir, 'checkpoint_*.pth'))
    if checkpoints:
        latest_checkpoint = max(checkpoints, key=os.path.getmtime)
        print(f"Latest checkpoint: {latest_checkpoint}")
    else:
        print("No checkpoints found yet")
else:
    print("No training runs found")

No training runs found


In [13]:
# Test synthesis
import os

from TTS.utils.synthesizer import Synthesizer
from naija_formatter import naija_formatter
import TTS.tts.datasets
from IPython.display import Audio

# Register formatter
TTS.tts.datasets.naija = naija_formatter

# Update these paths to match your latest checkpoint
MODEL_PATH = latest_checkpoint  # Use the checkpoint found above
CONFIG_PATH = os.path.join(os.path.dirname(latest_checkpoint), 'config.json')

# Load model
synthesizer = Synthesizer(
    tts_checkpoint=MODEL_PATH,
    tts_config_path=CONFIG_PATH,
    use_cuda=True,  # Use GPU for inference
)

# Synthesize
text = "Bawo ni, se dada ni?"
print(f"Synthesizing: {text}")
wav = synthesizer.tts(text)

# Play audio
Audio(wav, rate=synthesizer.output_sample_rate)

ModuleNotFoundError: No module named 'TTS'

## 7. Save Checkpoint to Google Drive (Optional)

To preserve your trained model, save it to Google Drive.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Copy checkpoints to Drive
!mkdir -p /content/drive/MyDrive/yoruba_tts_checkpoints
!cp -r out/naija_xtts_yor/* /content/drive/MyDrive/yoruba_tts_checkpoints/
print("Checkpoints saved to Google Drive!")
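The shell version prints a success message even when `cp` fails, for example when no training output exists. The sketch below performs the same copy with `shutil.copytree` and reports what was actually copied; `backup_runs` is a hypothetical helper, and the paths are the ones used in the cells above.

```python
import os
import shutil


def backup_runs(src, dst):
    """Copy src into dst; return the number of files copied (0 if src is missing)."""
    if not os.path.isdir(src):
        return 0
    # dirs_exist_ok lets repeated backups update an existing destination
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(len(files) for _, _, files in os.walk(src))


copied = backup_runs("out/naija_xtts_yor", "/content/drive/MyDrive/yoruba_tts_checkpoints")
print(f"Copied {copied} files" if copied else "Nothing to copy: no training output found")
```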

## 8. Run Web Interface (Optional)

You can run the Flask web interface in Colab using ngrok for public access.

In [None]:
# Install pyngrok
!pip install -q pyngrok

In [None]:
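# Update app.py to use the latest checkpoint
# Then start the Flask app with ngrok
import os
import threading

from pyngrok import ngrok

# Start Flask in a background thread so this cell does not block
def run_flask():
    os.system('python app.py')

thread = threading.Thread(target=run_flask, daemon=True)
thread.start()

# Create ngrok tunnel (ngrok may require an authtoken; if so, call
# ngrok.set_auth_token(...) with your own token first)
public_url = ngrok.connect(5000)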
print(f"\n🌐 Web Interface URL: {public_url}")
print("Click the link above to access your Yoruba TTS web interface!")
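The tunnel is opened immediately after the Flask thread starts, so the public URL may briefly return errors while the app is still loading the model. One way to avoid that is to wait until the local port accepts connections. This is a sketch under the assumption that `app.py` listens on port 5000, as the `ngrok.connect(5000)` call suggests; `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet: pause briefly and try again
            time.sleep(interval)
    return False
```

Calling `wait_for_port(5000)` before `ngrok.connect(5000)` and opening the tunnel only when it returns `True` keeps the printed URL from pointing at a server that is not yet up.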