# F5-TTS Vietnamese Training on Google Colab

This notebook fine-tunes F5-TTS on Vietnamese data in Google Colab while you develop locally in VS Code and sync code changes through GitHub.

## Setup Instructions:
1. Upload this notebook to Google Colab
2. Ensure your Vietnamese dataset is committed to the GitHub repository cloned in section 2 (no Google Drive upload is needed)
3. Run the cells in order
4. Monitor training progress via TensorBoard

## 1. Check GPU and System Info

In [None]:
# Check GPU availability and specs
!nvidia-smi

# Check Python version
import sys
print(f"Python version: {sys.version}")

# Check available disk space
!df -h

## 2. Setup F5-TTS Environment

In [None]:
# Clone your F5-TTS repository. The upstream URL below is a placeholder:
# replace it with your own fork, which must contain data/vietnamese_char/
!git clone https://github.com/SWivid/F5-TTS.git
%cd F5-TTS

# Install F5-TTS in development mode
!pip install -e .

# Install additional dependencies for training
!pip install tensorboard accelerate transformers

# Install compatible numpy version
!pip install "numpy<2.0"

# Verify installation
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

## 3. Verify Data (Already included in repository)

In [None]:
# Data is already included in the GitHub repository
# No need to mount Google Drive or copy data separately

# Verify data is available
!ls -la data/vietnamese_char/
!head -5 data/vietnamese_char/sample.csv

# Check number of audio files
!echo "Number of audio files:"
!ls data/vietnamese_char/wavs/ | wc -l
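
The shell checks above only list files. A minimal Python sketch for cross-checking the metadata against the audio folder could look like the following. It assumes a pipe-delimited CSV with a header row and one `audio file|transcript` pair per line, which may not match your `sample.csv`; `find_dataset_problems` is a hypothetical helper, not part of F5-TTS.

```python
import csv
from pathlib import Path


def find_dataset_problems(csv_path, wav_dir, delimiter="|"):
    """Return (missing_audio, empty_text) lists for a metadata CSV.

    Assumes the first row is a header and every following row is
    <audio file name or path><delimiter><transcript>.
    """
    wav_dir = Path(wav_dir)
    missing_audio, empty_text = [], []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            name = row[0].strip()
            text = delimiter.join(row[1:]).strip()
            # Only the file name is used, so rows may hold "wavs/x.wav" or "x.wav"
            if not (wav_dir / Path(name).name).is_file():
                missing_audio.append(name)
            if not text:
                empty_text.append(name)
    return missing_audio, empty_text
```

In this notebook it could be called as `find_dataset_problems("data/vietnamese_char/sample.csv", "data/vietnamese_char/wavs")`; both returned lists should be empty before training starts.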

## 4. Download Pretrained Models

In [None]:
# Create checkpoint directory
!mkdir -p ckpts

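# Download the F5-TTS base model for fine-tuning.
# The training command in section 7 does not reference this file; if your
# finetune_cli.py accepts a --pretrain option, point it at this path.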
!wget -O ckpts/F5TTS_Base_model.pt "https://huggingface.co/SWivid/F5-TTS/resolve/main/F5TTS_Base/model_1200000.pt"

# Verify download
!ls -lh ckpts/
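
`ls -lh` shows the file size but does not stop the notebook when the download was cut short. A small sketch of an explicit check follows; `check_download` is a hypothetical helper and the 100 MB default threshold is an arbitrary assumption, chosen only because a model checkpoint should be far larger than an error page.

```python
from pathlib import Path


def check_download(path, min_size_mb=100):
    """Return the file size in MB, raising if the file is missing or suspiciously small."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{p} was not downloaded")
    size_mb = p.stat().st_size / 1024**2
    if size_mb < min_size_mb:
        raise ValueError(f"{p} is only {size_mb:.1f} MB; the download may be incomplete")
    return size_mb
```

For example, `check_download("ckpts/F5TTS_Base_model.pt")` would raise before training starts if `wget` left an empty or truncated file behind.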

## 5. Configure Training (Colab-Optimized)

In [None]:
# Check total GPU memory to pick a starting batch size
import torch

if torch.cuda.is_available():
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"Total GPU memory: {gpu_memory:.1f} GB")

    # Thresholds sit slightly below the nominal card sizes because the
    # reported total is lower (a 16 GB card reports roughly 15.8 GB)
    if gpu_memory >= 24:  # e.g. A100
        batch_size = 12000
        max_samples = 64
    elif gpu_memory >= 15.5:  # e.g. V100 (16 GB)
        batch_size = 8000
        max_samples = 48
    else:  # e.g. T4
        batch_size = 6000
        max_samples = 32

    print(f"Recommended batch_size_per_gpu: {batch_size}")
    print(f"Recommended max_samples: {max_samples}")
else:
    print("No GPU available!")
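
With `--batch_size_type frame`, the batch size counts mel-spectrogram frames rather than utterances, so it helps to translate it into seconds of audio. The sketch below assumes 24 kHz audio and a hop length of 256 samples; both values are assumptions to check against your model configuration, and `frames_to_seconds` is a hypothetical helper.

```python
SAMPLE_RATE = 24_000  # assumed target sample rate in Hz
HOP_LENGTH = 256      # assumed mel-spectrogram hop length in samples


def frames_to_seconds(frames, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert a number of mel frames into seconds of audio."""
    return frames * hop_length / sample_rate


# Frame budgets recommended by the cell above
for name, frames in [("T4", 6000), ("V100", 8000), ("A100", 12000)]:
    print(f"{name}: {frames} frames ~ {frames_to_seconds(frames):.0f} s of audio per batch")
```

Under these assumptions a budget of 6000 frames holds 64 seconds of audio per step, which gives a feel for how many utterances of your dataset fit into one batch.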

## 6. Start TensorBoard (Optional - for monitoring)

In [None]:
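# Load the TensorBoard extension
%load_ext tensorboard

# Show TensorBoard inline; it keeps refreshing while training runs.
# If no runs appear, check where the trainer writes its event files
# (upstream F5-TTS may use runs/ instead of logs/) and adjust --logdir.
%tensorboard --logdir logs/ --port 6006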

## 7. Start Training

In [None]:
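# Fine-tuning command for Colab.
# The batch_size_per_gpu and max_samples values below match the T4
# recommendation; replace them with the values printed in section 5.
# Note: upstream finetune_cli.py may restrict --exp_name to its built-in
# model names (e.g. F5TTS_Base); the custom name below assumes your fork accepts it.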

!accelerate launch \
    --mixed_precision=fp16 \
    src/f5_tts/train/finetune_cli.py \
    --exp_name F5TTS_vietnamese_colab \
    --learning_rate 5e-5 \
    --batch_size_per_gpu 6000 \
    --batch_size_type frame \
    --max_samples 32 \
    --grad_accumulation_steps 2 \
    --max_grad_norm 1 \
    --epochs 50 \
    --num_warmup_updates 5000 \
    --save_per_updates 10000 \
    --keep_last_n_checkpoints 3 \
    --last_per_updates 2000 \
    --dataset_name vietnamese \
    --finetune \
    --tokenizer char \
    --logger tensorboard \
    --log_samples
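
`--num_warmup_updates`, `--save_per_updates` and `--last_per_updates` are counted in optimizer updates, so it is worth estimating how many updates the run will perform before choosing them. The following is a rough sketch, assuming 24 kHz audio with a hop length of 256 and a hypothetical 10-hour dataset; `estimate_updates` is an illustrative helper, not part of F5-TTS.

```python
import math


def estimate_updates(audio_hours, batch_frames, grad_accum, epochs,
                     sample_rate=24_000, hop_length=256):
    """Roughly estimate optimizer updates for frame-based batching.

    Assumes every batch is filled up to batch_frames. Real batches are
    rarely full, so the true count is likely somewhat higher.
    """
    total_frames = audio_hours * 3600 * sample_rate / hop_length
    batches_per_epoch = math.ceil(total_frames / batch_frames)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return updates_per_epoch * epochs


# Hypothetical 10-hour dataset with the settings from the command above
total = estimate_updates(audio_hours=10, batch_frames=6000, grad_accum=2, epochs=50)
print(f"Estimated updates: {total}")
print(f"Warmup share of training: {5000 / total:.0%}")
```

For this hypothetical dataset the estimate is 14100 updates, so 5000 warmup updates would cover about a third of training and `--save_per_updates 10000` would trigger only once. Smaller values may suit small datasets better.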

## 8. Save and Backup Models

In [None]:
# Compress and download checkpoints directly
!tar -czf vietnamese_model_checkpoints.tar.gz ckpts/

# Also save logs
!tar -czf training_logs.tar.gz logs/

# Download to local machine
from google.colab import files
files.download('vietnamese_model_checkpoints.tar.gz')
files.download('training_logs.tar.gz')

# Optional: Save to Google Drive as backup
# from google.colab import drive
# drive.mount('/content/drive')
# !cp vietnamese_model_checkpoints.tar.gz "/content/drive/MyDrive/"
# !cp training_logs.tar.gz "/content/drive/MyDrive/"

print("Models ready for download!")
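
The `tar` command above archives everything under `ckpts/`, including the pretrained base model downloaded in section 4. If a smaller archive is preferred, a sketch using Python's `tarfile` module with a filter could look like this; `archive_without` is a hypothetical helper.

```python
import tarfile
from pathlib import Path


def archive_without(src_dir, archive_path, exclude_names=()):
    """Create a .tar.gz of src_dir, skipping entries whose name is in exclude_names."""
    exclude = set(exclude_names)

    def keep(member):
        # Returning None from the filter drops the member from the archive
        return None if Path(member.name).name in exclude else member

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name, filter=keep)
    return archive_path
```

For example, `archive_without("ckpts", "vietnamese_model_checkpoints.tar.gz", exclude_names=["F5TTS_Base_model.pt"])` keeps the fine-tuned checkpoints and leaves out the base model, which can be downloaded again at any time.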

## 9. Update Code from Local Development (Run when needed)

In [None]:
# Pull latest changes from your GitHub repository
# Run this cell when you've made changes locally and pushed to GitHub

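# /content is Colab's default working directory; the absolute path works
# even when the notebook is already inside the repository
%cd /content/F5-TTS
!git pull origin main

# The editable install picks up source changes automatically. Reinstall only
# if dependencies or package metadata changed. --force-reinstall is avoided
# because it also reinstalls dependencies and can undo the numpy<2.0 pin.
!pip install -e .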

print("Code updated from GitHub repository!")