# StyleTTS2 Training on Google Colab
## Train Cigdem TTS Model

This notebook helps you train your Turkish TTS model on Google Colab with a free GPU.

## Step 1: Check GPU and Set Up Environment

In [None]:
# Check GPU availability
!nvidia-smi
import torch
print(f"\nPyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Step 2: Clone StyleTTS2 Repository

In [None]:
# Clone your repository
!git clone https://github.com/ElkhanAbbasov/TTS_Cigdem.git
%cd TTS_Cigdem

# Verify we're in the right place
!pwd
!ls -la

## Step 3: Install Dependencies

In [None]:
# Install required packages (INCLUDING einops-exts and click)
!pip install -q SoundFile torchaudio munch pydub pyyaml librosa nltk matplotlib accelerate transformers einops einops-exts tqdm click
!pip install -q git+https://github.com/resemble-ai/monotonic_align.git

print("✅ All dependencies installed!")
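Some of these packages import under a different name than their pip name (`SoundFile` as `soundfile`, `pyyaml` as `yaml`, `einops-exts` as `einops_exts`), and `pip install -q` hides most errors. A minimal sketch that lists which modules Python still cannot find; the module list is an assumption derived from the pip commands above, not something the repository defines:

```python
import importlib.util

# Import names for the packages installed above (several differ from their pip names).
REQUIRED_MODULES = [
    "soundfile", "torchaudio", "munch", "pydub", "yaml", "librosa", "nltk",
    "matplotlib", "accelerate", "transformers", "einops", "einops_exts",
    "tqdm", "click", "monotonic_align",
]


def missing_modules(names):
    """Return the module names that importlib cannot locate in this environment."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", gaps if gaps else "none")
```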

## Step 4: Mount Google Drive (for saving checkpoints)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Create checkpoint backup folder in Drive
!mkdir -p '/content/drive/MyDrive/Cigdem_TTS_Checkpoints'
print("✅ Google Drive mounted!")

## Step 4.5: Free Up Disk Space (Critical!)

Colab has limited disk space. We need to clean up to make room for training checkpoints.

In [None]:
# Check disk space and clean up unnecessary files
print("💾 Disk Space Check:")
!df -h /content

print("\n🧹 Cleaning up to free space...")

# Clean pip cache
!pip cache purge

# Clean apt cache
!sudo apt-get clean

# Remove unnecessary files
!rm -rf /root/.cache/*
!rm -rf /tmp/*

print("\n💾 After cleanup:")
!df -h /content

print("\n⚠️ IMPORTANT: Colab has limited space (~70GB free)")
print("   Checkpoints are ~400MB each")
print("   With epochs=100 and save_freq=5 (set in Step 6), that is ~20 checkpoints (~8GB) if none are deleted")
print("   Make sure to back up to Google Drive regularly!")
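The storage warning comes down to simple arithmetic: with a checkpoint every `save_freq` epochs, about `epochs / save_freq` files of roughly 400 MB each pile up unless old ones are deleted. A small sketch of that estimate, assuming the ~400 MB figure quoted above (an estimate, not a measurement), compared with the free space `shutil.disk_usage` reports:

```python
import os
import shutil


def checkpoint_budget(epochs, save_freq, ckpt_mb=400):
    """Rough count and total size (MB) of checkpoints if none are ever deleted."""
    n_ckpts = epochs // save_freq  # about one checkpoint every `save_freq` epochs
    return n_ckpts, n_ckpts * ckpt_mb


def fits_on_disk(path, epochs, save_freq, ckpt_mb=400):
    """Return (fits, needed_mb, free_mb) for the filesystem that holds `path`."""
    _, needed_mb = checkpoint_budget(epochs, save_freq, ckpt_mb)
    free_mb = shutil.disk_usage(path).free / 1024 / 1024
    return needed_mb <= free_mb, needed_mb, free_mb


if __name__ == "__main__":
    target = "/content" if os.path.isdir("/content") else "."
    fits, needed_mb, free_mb = fits_on_disk(target, epochs=100, save_freq=5)
    print(f"Need ~{needed_mb} MB, {free_mb:.0f} MB free -> {'OK' if fits else 'prune old checkpoints'}")
```

For 100 epochs with `save_freq` 5 this gives 20 checkpoints, about 8,000 MB, which is why the backup step keeps only the two newest checkpoints locally.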

## Step 5: Download Required Pretrained Models

StyleTTS2 requires base pretrained models for fine-tuning.

In [None]:
# Download pretrained models from StyleTTS2
import os

print("📥 Downloading pretrained StyleTTS2 models...")
print("This may take 5-10 minutes depending on connection speed.\n")

# Create Models directory if it doesn't exist
os.makedirs("Models", exist_ok=True)

# Download the pretrained model (LibriTTS base model)
# This is the base model needed for fine-tuning
print("Downloading base pretrained model...")
model_url = "https://huggingface.co/yl4579/StyleTTS2-LibriTTS/resolve/main/Models/LibriTTS/epochs_2nd_00020.pth"
model_path = "Models/LibriTTS_pretrained.pth"

def download_ok(path, min_bytes=100_000):
    # wget/curl can leave an empty or truncated file behind, so check the size, not just existence
    return os.path.exists(path) and os.path.getsize(path) > min_bytes

# wget works directly with HuggingFace "resolve" URLs (no gdown needed)
!wget -q --show-progress -O {model_path} {model_url}

if not download_ok(model_path):
    print("❌ wget download failed! Trying curl instead...")
    !curl -L -o {model_path} {model_url}

if download_ok(model_path):
    print(f"✅ Base model downloaded: {model_path}")
    print(f"   Size: {os.path.getsize(model_path) / 1024 / 1024:.1f} MB")
    print("\n✅ Pretrained model ready for fine-tuning!")
else:
    print("\n⚠️ Manual download needed:")
    print(f"   1. Download from: {model_url}")
    print(f"   2. Upload to: {model_path}")

## Step 5.5: Verify Training Data

Check that the training list exists and that every audio file it references can be loaded.

In [None]:
# Check if training data exists
import os

train_list = "Data/my_train_list.txt"
if os.path.exists(train_list):
    with open(train_list, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    print(f"✅ Training data found: {len(lines)} samples")
    print("\nFirst 3 samples:")
    for line in lines[:3]:
        print(f"  {line.strip()}")
    
    # Quick check that all audio files can be loaded
    print("\n🔍 Checking all audio files...")
    import soundfile as sf
    all_valid = True
    for i, line in enumerate(lines, 1):
        path = line.strip().split('|')[0]
        try:
            wave, sr = sf.read(path)
            print(f"  ✓ {i}: {path} ({sr} Hz)")
        except Exception as e:
            print(f"  ✗ {i}: {path} - ERROR: {e}")
            all_valid = False
    
    if all_valid:
        print("\n✅ All audio files can be loaded!")
    else:
        print("\n⚠️ Some audio files have issues!")
else:
    print("❌ Training data not found!")
    print("\n📂 Current directory:", os.getcwd())
    print("\n📋 Directory contents:")
    !ls -la
    print("\n📋 Looking for Data folder:")
    !ls -la Data/ 2>/dev/null || echo "Data folder not found"
    print("\n⚠️ Make sure the repository was cloned correctly!")
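The check above only confirms that each file decodes. A stricter sketch also flags malformed list lines (the three-field `path|text|speaker` layout is the upstream StyleTTS2 convention), sample rates other than 24 kHz (upstream StyleTTS2 works at 24 kHz and resamples other rates on load, so treat a mismatch as a warning and confirm the rate in `Configs/config_ft.yml`), and clips that are suspiciously short or long. The 0.5 s and 15 s limits are illustrative assumptions, not values from this repository:

```python
import os

EXPECTED_SR = 24000  # assumption: upstream StyleTTS2 sample rate; check Configs/config_ft.yml


def audit_train_list(list_path, expected_sr=EXPECTED_SR, min_sec=0.5, max_sec=15.0):
    """Return (line_number, problem) pairs for a `path|text|speaker` training list.

    Reads WAV headers with the standard-library `wave` module, so only PCM WAV is
    understood; float WAV, other formats and missing files are reported as unreadable.
    """
    import wave  # local import: the cell above binds a variable named `wave`

    problems = []
    with open(list_path, encoding="utf-8") as f:
        for line_no, raw in enumerate(f, 1):
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split("|")
            if len(fields) < 3:
                problems.append((line_no, f"expected path|text|speaker, got {len(fields)} field(s)"))
                continue
            path, text = fields[0], fields[1]
            if not text.strip():
                problems.append((line_no, "empty transcription"))
            try:
                with wave.open(path, "rb") as w:  # parses the header, no full decode
                    sr, frames = w.getframerate(), w.getnframes()
            except (OSError, EOFError, wave.Error) as e:
                problems.append((line_no, f"unreadable {path}: {e}"))
                continue
            if sr != expected_sr:
                problems.append((line_no, f"{sr} Hz instead of {expected_sr} Hz"))
            duration = frames / sr if sr else 0.0
            if not min_sec <= duration <= max_sec:
                problems.append((line_no, f"{duration:.2f} s is outside [{min_sec}, {max_sec}] s"))
    return problems


if __name__ == "__main__" and os.path.exists("Data/my_train_list.txt"):
    issues = audit_train_list("Data/my_train_list.txt")
    for line_no, issue in issues:
        print(f"line {line_no}: {issue}")
    print(f"{len(issues)} issue(s) found")
```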

## Step 6: Configure Training Settings

Update the config to use the pretrained model and prepare for training.

In [None]:
# Configure training settings - AUTO-DETECT GPU TYPE
import yaml
import os
import torch

config_path = "Configs/config_ft.yml"
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

# Set the pretrained model path
pretrained_model_path = "Models/LibriTTS_pretrained.pth"

if os.path.exists(pretrained_model_path):
    print(f"✅ Found pretrained model: {pretrained_model_path}")
    config['pretrained_model'] = pretrained_model_path
    config['second_stage_load_pretrained'] = True
    config['load_only_params'] = True  # Load only model weights, not optimizer state
    print("   Will load pretrained weights for fine-tuning")
else:
    print(f"⚠️ Pretrained model not found: {pretrained_model_path}")
    print("   Please run the pretrained-model download cell first")
    config['second_stage_load_pretrained'] = False
    config['pretrained_model'] = ""

# 🔧 AUTO-DETECT GPU AND OPTIMIZE SETTINGS
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"
gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9 if torch.cuda.is_available() else 0

print(f"\n🎮 GPU Detection:")
print(f"   Type: {gpu_name}")
print(f"   Memory: {gpu_memory:.1f} GB")

# Optimize settings based on GPU
if "A100" in gpu_name or gpu_memory > 70:
    # A100 (40 or 80 GB) or any GPU with more than 70 GB: maximum performance
    config['batch_size'] = 4
    config['max_len'] = 300
    print("\n⚡ A100-class GPU detected - using HIGH PERFORMANCE settings:")
    print("   Batch size: 4 (4 samples per step instead of 1)")
    print("   Max length: 300 (longer sequences)")
elif "T4" in gpu_name or "L4" in gpu_name or (15 < gpu_memory < 30):
    # T4 (~16 GB), L4 (~24 GB), or another 15-30 GB GPU: balanced settings
    config['batch_size'] = 1
    config['max_len'] = 200
    print("\n⚙️ T4/L4-class GPU detected - using BALANCED settings:")
    print("   Batch size: 1 (stable training)")
    print("   Max length: 200 (memory-efficient)")
else:
    # Unknown or smaller GPU: Conservative settings
    config['batch_size'] = 1
    config['max_len'] = 150
    print("\n🔧 Using CONSERVATIVE settings for safety:")
    print("   Batch size: 1")
    print("   Max length: 150")

config['epochs'] = 100
config['save_freq'] = 5  # Save less frequently to save disk space

# Save updated config
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False)

print(f"\n📋 Training Configuration:")
print(f"  Pretrained model: {config.get('pretrained_model', 'None')}")
print(f"  Load pretrained: {config.get('second_stage_load_pretrained', False)}")
print(f"  Batch size: {config.get('batch_size', 'N/A')}")
print(f"  Max length: {config.get('max_len', 'N/A')}")
print(f"  Epochs: {config.get('epochs', 'N/A')}")
print(f"  Save frequency: {config.get('save_freq', 'N/A')} epochs")
print(f"  Log dir: {config.get('log_dir', 'N/A')}")

## Step 7: Start Training

**IMPORTANT:** 
- Free Colab sessions last ~12 hours max
- Training will run continuously
- Checkpoints are saved every 5 epochs (`save_freq` above) to Models/Cigdem_TTS/
- Back up to Google Drive regularly!
- If disconnected, follow the "If Colab Disconnects" notes at the end of this notebook
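Because the Step 6 cell rewrites `Configs/config_ft.yml` in place, re-reading the file before launching can catch a missing pretrained model or a bad value early. A minimal sketch that checks only the keys this notebook sets; what counts as "sane" here is an assumption, not a rule enforced by `train_finetune.py`:

```python
import os


def check_finetune_config(config):
    """Return human-readable problems in a loaded config_ft.yml dict (empty list = looks sane)."""
    problems = []
    for key in ("batch_size", "max_len", "epochs", "save_freq"):
        value = config.get(key)
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            problems.append(f"{key} should be a positive integer, got {value!r}")
    if config.get("second_stage_load_pretrained"):
        path = config.get("pretrained_model", "")
        if not path or not os.path.isfile(path):
            problems.append(f"pretrained_model not found: {path!r}")
    if not config.get("log_dir"):
        problems.append("log_dir is empty")
    return problems


if __name__ == "__main__" and os.path.exists("Configs/config_ft.yml"):
    import yaml  # PyYAML, installed in Step 3

    with open("Configs/config_ft.yml", encoding="utf-8") as f:
        issues = check_finetune_config(yaml.safe_load(f))
    print("\n".join(issues) if issues else "Config looks sane")
```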

In [None]:
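# Start training - runs continuously until complete or stopped
# Unless this repo's train_finetune.py adds auto-resume, it starts from the checkpoint in
# `pretrained_model` (see "If Colab Disconnects" at the end to continue an interrupted run)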
!python train_finetune.py --config_path Configs/config_ft.yml
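Colab runs one cell at a time, so while the cell above is busy, the backup (Step 8) and log (Step 9) cells just wait in the queue. One workaround, a sketch rather than how this notebook is meant to be run, is to start `train_finetune.py` as a background process with Python's `subprocess` module and send its console output to a file; `train_stdout.log` is a hypothetical name. Use it instead of the `!python` cell, not in addition to it, or two trainings will run at once:

```python
import subprocess
import sys


def launch_in_background(cmd, log_path):
    """Start `cmd` without blocking the cell; stdout and stderr are appended to `log_path`."""
    with open(log_path, "ab") as log_file:
        # Popen hands the child its own copy of the file descriptor,
        # so closing ours when the `with` block ends does not stop the logging.
        proc = subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)
    return proc


if __name__ == "__main__" and "google.colab" in sys.modules:
    proc = launch_in_background(
        # -u disables Python's output buffering so the log file updates promptly
        [sys.executable, "-u", "train_finetune.py", "--config_path", "Configs/config_ft.yml"],
        "train_stdout.log",  # hypothetical file name
    )
    print(f"Training started in the background (PID {proc.pid})")
```

`proc.poll()` returns `None` while training is still running and the exit code afterwards; `proc.terminate()` stops it. The process lives only as long as the Colab runtime does.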

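## Step 8: Back Up Checkpoints to Google Drive (Run periodically)

**Run this cell periodically (e.g., every 2-3 hours).** Note that Colab executes one cell at a time: while the training cell above is running, this cell waits in the queue, so it only runs once training finishes or is interrupted (Runtime → Interrupt execution).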

In [None]:
# Back up the latest checkpoint to Google Drive, then prune old local copies
import glob
import os

backup_dir = '/content/drive/MyDrive/Cigdem_TTS_Checkpoints'
os.makedirs(backup_dir, exist_ok=True)  # Create backup directory if it doesn't exist

# Check what checkpoints exist
checkpoint_dir = "Models/Cigdem_TTS"
if os.path.exists(checkpoint_dir):
    # Sorted by name; the last entry is the newest as long as epoch numbers are zero-padded
    checkpoints = sorted(glob.glob(os.path.join(checkpoint_dir, "epoch_*.pth")))

    if checkpoints:
        latest_checkpoint = checkpoints[-1]

        print(f"📦 Found {len(checkpoints)} checkpoints")
        print(f"✅ Backing up LATEST: {os.path.basename(latest_checkpoint)}")

        # Copy latest checkpoint to Google Drive under a fixed name (overwritten on each run)
        !cp '{latest_checkpoint}' '{backup_dir}/latest_checkpoint.pth'

        # Also save with epoch number for tracking (these copies accumulate in Drive)
        !cp '{latest_checkpoint}' '{backup_dir}/'

        # Copy training log
        !cp '{checkpoint_dir}/train.log' '{backup_dir}/' 2>/dev/null || echo "No log yet"

        print("\n📂 Google Drive contents:")
        !ls -lh '{backup_dir}/'

        print(f"\n✅ Latest checkpoint backed up!")
        print(f"   File: {os.path.basename(latest_checkpoint)}")
        print(f"   Also saved as: latest_checkpoint.pth (easy to find!)")

        # Free Colab disk space by keeping only the last 2 checkpoints locally.
        # Note: older checkpoints are deleted even if this cell never copied them to Drive.
        old_checkpoints = checkpoints[:-2]
        if old_checkpoints:
            print(f"\n🧹 Cleaning up {len(old_checkpoints)} old local checkpoints...")
            for ckpt in old_checkpoints:
                os.remove(ckpt)
                print(f"   Deleted: {os.path.basename(ckpt)}")

    else:
        print("⚠️ No checkpoints found yet")
        print("   Training hasn't saved any checkpoints yet")
else:
    print("❌ Checkpoint directory not found")
    print(f"   Looking for: {checkpoint_dir}")
    !pwd
    !ls -la Models/ 2>/dev/null || echo "Models folder not found"
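The backup cell above (and the download cell later on) treats the last entry of a `sorted()` list as the newest checkpoint. That string sort is only correct while epoch numbers are zero-padded (upstream StyleTTS2 writes names like `epoch_2nd_00005.pth`; check `Models/Cigdem_TTS/` to confirm); without padding, `epoch_9.pth` sorts after `epoch_10.pth`. A sketch that compares the number right before `.pth` numerically instead:

```python
import os
import re

_EPOCH_RE = re.compile(r"(\d+)\.pth$")


def epoch_of(path):
    """Return the integer right before '.pth' in the file name, or -1 if there is none."""
    match = _EPOCH_RE.search(os.path.basename(path))
    return int(match.group(1)) if match else -1


def latest_checkpoint(paths):
    """Pick the checkpoint with the highest epoch number (None for an empty list)."""
    return max(paths, key=epoch_of, default=None)
```

To use it, replace the `[-1]` lookup on the sorted list with `latest_checkpoint(checkpoints)`.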

## Step 9: View Training Progress

In [None]:
# Display last 20 lines of training log
!tail -n 20 Models/Cigdem_TTS/train.log

## Step 10: Download Results (After epoch 20+)

Download the latest checkpoint and the training log to test locally.

In [None]:
# Download the latest checkpoint from Google Drive
import glob
import os
from google.colab import files

backup_dir = '/content/drive/MyDrive/Cigdem_TTS_Checkpoints'

# First, check what's in Google Drive
print("📂 Checkpoints in Google Drive:")
!ls -lh '{backup_dir}/'

# Find the latest checkpoint (highest epoch number). glob is used because a quoted
# pattern like '.../epoch_*.pth' is passed to ls literally and never matches.
drive_checkpoints = sorted(glob.glob(os.path.join(backup_dir, "epoch_*.pth")))

if drive_checkpoints:
    latest_checkpoint = drive_checkpoints[-1]
    print(f"\n📥 Downloading latest checkpoint: {os.path.basename(latest_checkpoint)}")

    # Copy to local for download (saved locally as best_checkpoint.pth)
    !cp '{latest_checkpoint}' ./best_checkpoint.pth
    !cp '{backup_dir}/train.log' ./train.log 2>/dev/null || echo "No log"

    # Download
    if os.path.exists('best_checkpoint.pth'):
        files.download('best_checkpoint.pth')
        print("✅ Downloaded!")

    if os.path.exists('train.log'):
        files.download('train.log')
        print("✅ Training log downloaded!")
else:
    print("❌ No checkpoints found in Google Drive!")
    print("   Make sure you ran Step 8 to back up checkpoints")
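To confirm which epoch the downloaded file actually holds, you can load it and look at its keys. A minimal sketch, assuming the checkpoint is a dict with an `epoch` entry and a `net` entry mapping module names to state dicts, which is how upstream StyleTTS2 saves them; the function reports whatever keys are really there, so it stays useful if this repository's format differs:

```python
import os


def summarize_checkpoint(state):
    """Summarize a loaded checkpoint dict: top-level keys, epoch, and module names under 'net'."""
    summary = {"keys": sorted(state.keys())}
    if "epoch" in state:
        summary["epoch"] = state["epoch"]
    if isinstance(state.get("net"), dict):
        summary["modules"] = sorted(state["net"].keys())
    return summary


if __name__ == "__main__" and os.path.exists("best_checkpoint.pth"):
    import torch

    # weights_only=False unpickles arbitrary objects: only do this for a file you trained yourself
    state = torch.load("best_checkpoint.pth", map_location="cpu", weights_only=False)
    print(summarize_checkpoint(state))
```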

---

## 📝 Important Notes:

### Training Process:
- Training runs **continuously** once started (Step 7)
- Goes from epoch 1 → 100 automatically
- Each epoch takes ~2-3 minutes
- **Don't re-run Step 7** unless training stops!

### When to Test:
- **Epoch 20+**: First test (may still be noisy)
- **Epoch 30+**: Better quality expected
- **Epoch 50+**: Good quality for your voice

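### If Colab Disconnects:
1. Re-run the setup steps (Steps 1-5.5): a disconnect wipes the local clone, the installed packages, and the downloaded pretrained model.
2. Restore checkpoints from Google Drive (leave the `*` outside quotes so the shell expands it):
   ```python
   !mkdir -p Models/Cigdem_TTS
   !cp /content/drive/MyDrive/Cigdem_TTS_Checkpoints/epoch_*.pth Models/Cigdem_TTS/
   ```
3. Re-run Step 6, then point `pretrained_model` in `Configs/config_ft.yml` at the restored checkpoint. Step 6 sets it to the LibriTTS base model, so unless this repository's `train_finetune.py` adds its own auto-resume, training would start over from that model. (In upstream StyleTTS2, setting `load_only_params: false` also restores the optimizer state and epoch counter.)
4. Re-run Step 7 (training cell).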
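Pointing the config at a restored checkpoint can be scripted. A minimal sketch to run after Step 6 (which resets `pretrained_model` to the LibriTTS model) and before starting training; it assumes upstream StyleTTS2's meaning of `load_only_params` and zero-padded, name-sortable checkpoint files, so check both against this repository:

```python
import glob
import os


def resume_settings(config, checkpoint_path):
    """Return a copy of `config` that fine-tunes from `checkpoint_path` instead of the LibriTTS model."""
    updated = dict(config)
    updated["pretrained_model"] = checkpoint_path
    updated["second_stage_load_pretrained"] = True
    # Assumption (upstream StyleTTS2): False also restores optimizer state and the epoch counter
    updated["load_only_params"] = False
    return updated


if __name__ == "__main__" and os.path.exists("Configs/config_ft.yml"):
    import yaml  # PyYAML, installed in Step 3

    restored = sorted(glob.glob("Models/Cigdem_TTS/epoch_*.pth"))
    if restored:
        with open("Configs/config_ft.yml", encoding="utf-8") as f:
            cfg = resume_settings(yaml.safe_load(f), restored[-1])
        with open("Configs/config_ft.yml", "w", encoding="utf-8") as f:
            yaml.dump(cfg, f, default_flow_style=False)
        print(f"Config now resumes from {restored[-1]}")
```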