# 🎤 StarGANv2-VC Training on Google Colab
## Train Your Voice Conversion Model with FREE GPU!

### 📋 What This Notebook Does:
1. ✅ Sets up StarGANv2-VC on Colab GPU
2. ✅ Uploads your prepared training data
3. ✅ Downloads pretrained models
4. ✅ Trains with GPU-tuned settings (much faster than CPU!)
5. ✅ Downloads trained model back to your computer

### ⚡ Expected Training Time:
- **CPU (your PC)**: ~150 hours (6 days)
- **GPU (Colab)**: ~15-25 hours (1 day)
- **Speed Improvement**: roughly 6-10x faster (150 h vs 15-25 h)! 🚀

### 🎯 Steps:
1. Click **Runtime → Change runtime type → GPU (T4)**
2. Run all cells in order
3. Upload your `MyVoice` data (as a ZIP, or via Google Drive) when prompted
4. Let it train!
5. Download your trained model

---

## 1️⃣ Check GPU Availability

In [None]:
import torch

# Check GPU
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"✅ GPU Available: {gpu_name}")
    print(f"✅ GPU Memory: {gpu_memory:.2f} GB")
    print(f"✅ CUDA Version: {torch.version.cuda}")
    print("\n🚀 Perfect! You'll get 10-20x faster training!")
else:
    print("❌ No GPU detected!")
    print("⚠️  Go to Runtime → Change runtime type → Select 'GPU'")
    raise RuntimeError("Please enable GPU in Colab settings!")

## 2️⃣ Install Dependencies

In [None]:
# Install required packages
!pip install -q librosa==0.11.0
!pip install -q phonemizer==3.3.0
!pip install -q munch==4.0.0
!pip install -q pyyaml==6.0.2
!pip install -q tqdm==4.67.1
!pip install -q soundfile==0.13.0
!pip install -q git+https://github.com/resemble-ai/monotonic_align.git
!pip install -q --no-build-isolation parallel_wavegan==0.6.1

print("\n✅ All dependencies installed!")
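`pip install -q` hides most of its output, so a failed or skipped install is easy to miss. The `--no-build-isolation` build of `parallel_wavegan` is one likely example. The minimal sketch below uses the standard library's `importlib.metadata` to compare what is installed against the pins above. The `PINNED` dict is just a copy of those pins. `torch` is left out because Colab ships with it preinstalled.

```python
from importlib.metadata import PackageNotFoundError, version

# Copy of the pins from the pip cell above (PyPI distribution names)
PINNED = {
    "librosa": "0.11.0",
    "phonemizer": "3.3.0",
    "munch": "4.0.0",
    "pyyaml": "6.0.2",
    "tqdm": "4.67.1",
    "soundfile": "0.13.0",
    "parallel_wavegan": "0.6.1",
}

def check_pins(pinned):
    """Return {name: (installed_version or None, matches_pin)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (installed, installed == wanted)
    return report

for name, (installed, ok) in check_pins(PINNED).items():
    status = "✅" if ok else "⚠️ "
    print(f"{status} {name}: installed={installed}, pinned={PINNED[name]}")
```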

## 3️⃣ Clone StarGANv2-VC Repository

In [None]:
import os

# Clone repository
if not os.path.exists('StarGANv2-VC'):
    !git clone https://github.com/yl4579/StarGANv2-VC.git
    print("✅ Repository cloned!")
else:
    print("✅ Repository already exists!")

# Change to repo directory
%cd StarGANv2-VC

## 4️⃣ Download Pretrained Models

In [None]:
import gdown
import os

# Create directories
os.makedirs('Models', exist_ok=True)
os.makedirs('Vocoder', exist_ok=True)
os.makedirs('Utils/ASR', exist_ok=True)
os.makedirs('Utils/JDC', exist_ok=True)

print("📥 Downloading pretrained models...\n")

# Download StarGANv2-VC pretrained model (VCTK20 - Award Winner)
if not os.path.exists('Models/epoch_00150.pth'):
    print("⬇️  Downloading StarGANv2-VC pretrained model (1.5GB)...")
    gdown.download('https://drive.google.com/uc?id=1WARohzB7EHiQJfNDNMmWOJHe0SbPiLcY', 
                   'Models/epoch_00150.pth', quiet=False)
    print("✅ StarGANv2-VC model downloaded!\n")
else:
    print("✅ StarGANv2-VC model already exists!\n")

# Download ParallelWaveGAN vocoder
if not os.path.exists('Vocoder/checkpoint-400000steps.pkl'):
    print("⬇️  Downloading ParallelWaveGAN vocoder (23MB)...")
    gdown.download('https://drive.google.com/uc?id=1xhkLRy_0-S2tsZMFF3kFLZVJ-_Bxa7RY',
                   'Vocoder/checkpoint-400000steps.pkl', quiet=False)
    print("✅ Vocoder downloaded!\n")
else:
    print("✅ Vocoder already exists!\n")

# Download ASR model
if not os.path.exists('Utils/ASR/epoch_00100.pth'):
    print("⬇️  Downloading ASR model (87MB)...")
    gdown.download('https://drive.google.com/uc?id=1EwI7xAB2Ql1B2E8Xp8BKPcqrx1mSLJJr',
                   'Utils/ASR/epoch_00100.pth', quiet=False)
    print("✅ ASR model downloaded!\n")
else:
    print("✅ ASR model already exists!\n")

# Download ASR config
if not os.path.exists('Utils/ASR/config.yml'):
    print("⬇️  Downloading ASR config...")
    gdown.download('https://drive.google.com/uc?id=1f7hgaEZqfaO3vmO_l3gSnFVTDAqIYtXF',
                   'Utils/ASR/config.yml', quiet=False)
    print("✅ ASR config downloaded!\n")
else:
    print("✅ ASR config already exists!\n")

# Download F0 model
if not os.path.exists('Utils/JDC/bst.t7'):
    print("⬇️  Downloading F0 model (21MB)...")
    gdown.download('https://drive.google.com/uc?id=1DPwiDg2oaPww29u4NQ97YL0oXUEeyayW',
                   'Utils/JDC/bst.t7', quiet=False)
    print("✅ F0 model downloaded!\n")
else:
    print("✅ F0 model already exists!\n")

print("\n🎉 All pretrained models ready!")
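A Google Drive link can be rate-limited or no longer shared. When that happens, the download may be missing, truncated, or an HTML error page instead of a checkpoint. The quick sanity check below catches those cases. The minimum sizes are hypothetical thresholds, set well below the sizes quoted in the download messages. The HTML test is a heuristic that only looks at the first bytes of each file.

```python
import os

# Files written by the download cell above. The minimum sizes (bytes) are
# hypothetical sanity thresholds, far below the sizes quoted in its messages.
MODEL_FILES = {
    "Models/epoch_00150.pth": 1_000_000,
    "Vocoder/checkpoint-400000steps.pkl": 1_000_000,
    "Utils/ASR/epoch_00100.pth": 1_000_000,
    "Utils/ASR/config.yml": 10,
    "Utils/JDC/bst.t7": 1_000_000,
}

def looks_like_html(path):
    """Heuristic: does the file start like an HTML page instead of model data?"""
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))

def verify_files(spec):
    """Return a list of problems; an empty list means every file passed."""
    problems = []
    for path, min_bytes in spec.items():
        if not os.path.exists(path):
            problems.append(f"{path}: missing")
        elif os.path.getsize(path) < min_bytes:
            problems.append(f"{path}: only {os.path.getsize(path)} bytes")
        elif looks_like_html(path):
            problems.append(f"{path}: looks like an HTML page, not a model file")
    return problems

issues = verify_files(MODEL_FILES)
print("\n".join(f"⚠️  {p}" for p in issues) if issues else "✅ All model files look plausible")
```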

## 5️⃣ Upload Your Training Data

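### 📤 Upload your prepared data:
You have **TWO OPTIONS**. Either way, you should end up with a `Data/MyVoice/` folder inside the cloned repo.

Each line of `train_list.txt` and `val_list.txt` has the form `path|speaker_label`. The paths must be valid on Colab relative to the repo root (for example `Data/MyVoice/...`), not paths from your PC. With `num_domains: 1`, every label should be `0`.

#### **Option A: Upload ZIP file** (Recommended - Faster!)
1. On your PC, create a ZIP of the `MyVoice` folder (inside `Data/`) so that the archive's top-level entry is `MyVoice/`
2. Upload the ZIP using the file upload button below

#### **Option B: Copy from Google Drive**
1. Upload your `MyVoice` folder to the root of your Google Drive (`MyDrive/MyVoice`)
2. Uncomment and run the Option B cell below, which mounts Drive and copies the folder into `Data/`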

In [None]:
# OPTION A: Upload ZIP file
from google.colab import files
import zipfile
import os

print("📤 Please upload your MyVoice.zip file...")
print("(The ZIP should contain a top-level MyVoice/ folder with train_list.txt, val_list.txt, and all .wav files)\n")

uploaded = files.upload()

# Extract the ZIP into Data/, so its MyVoice/ folder becomes Data/MyVoice/
os.makedirs('Data', exist_ok=True)
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        print(f"\n📦 Extracting {filename}...")
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('Data/')
        print("✅ Data extracted!")
        os.remove(filename)  # Clean up
        break

def count_entries(path):
    """Count non-empty lines in a file list."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())

# Verify data
train_list = 'Data/MyVoice/train_list.txt'
val_list = 'Data/MyVoice/val_list.txt'
if os.path.exists(train_list) and os.path.exists(val_list):
    print("\n✅ Training data verified!")
    print(f"   📊 Training samples: {count_entries(train_list)}")
    print(f"   📊 Validation samples: {count_entries(val_list)}")
else:
    print("❌ train_list.txt or val_list.txt not found in Data/MyVoice! Please check your ZIP structure.")
    print(f"   Data/ currently contains: {os.listdir('Data')}")

In [None]:
# OPTION B: Copy from Google Drive (alternative to Option A)
# Uncomment and run this cell if you prefer Google Drive

# from google.colab import drive
# drive.mount('/content/drive')

# # Copy MyDrive/MyVoice to Data/MyVoice (create Data/ first so cp nests the folder inside it)
# !mkdir -p Data
# !cp -r /content/drive/MyDrive/MyVoice Data/
# print("✅ Data copied from Google Drive!")
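Problems in the file lists tend to surface only as errors deep inside training. Common ones are Windows paths left over from your PC and labels that don't fit `num_domains`. The sketch below is a pre-flight check that assumes the `path|speaker_label` format described above. Three parts are assumptions rather than documented loader behaviour: the Windows-path test is a heuristic (a backslash or drive colon), blank lines are flagged because the loader may not skip them, and `num_domains=1` simply mirrors the config created in step 6.

```python
import os

def validate_list(list_path, num_domains=1):
    """Check a file list with one `wav_path|speaker_label` entry per line.

    Returns human-readable problems; an empty list means none were found.
    """
    problems = []
    with open(list_path, encoding="utf-8") as f:
        for line_no, raw in enumerate(f, start=1):
            line = raw.strip()
            if not line:
                problems.append(f"line {line_no}: blank line (may break the data loader)")
                continue
            parts = line.split("|")
            if len(parts) != 2:
                problems.append(f"line {line_no}: expected 'path|label', got {line!r}")
                continue
            wav_path, label = parts
            # Heuristic for paths copied from a Windows PC (backslashes or a drive letter)
            if "\\" in wav_path or ":" in wav_path:
                problems.append(f"line {line_no}: Windows-style path {wav_path!r}")
            elif not os.path.exists(wav_path):
                problems.append(f"line {line_no}: file not found {wav_path!r}")
            if not label.isdigit() or int(label) >= num_domains:
                problems.append(f"line {line_no}: label {label!r} not in 0..{num_domains - 1}")
    return problems

# Paths are checked relative to the current directory, i.e. the repo root after %cd
for list_file in ("Data/MyVoice/train_list.txt", "Data/MyVoice/val_list.txt"):
    if os.path.exists(list_file):
        issues = validate_list(list_file, num_domains=1)
        print(f"{list_file}: {len(issues)} problem(s)")
        for issue in issues[:10]:
            print("   ", issue)
```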

## 6️⃣ Create GPU-Optimized Configuration

In [None]:
import os

# Create Configs directory
os.makedirs('Configs', exist_ok=True)

# GPU-optimized configuration
config = """log_dir: "Models/MyVoice_BestQuality"
save_freq: 2  # Save every 2 epochs (same as the repo's default config)
device: "cuda"  # GPU enabled for 10-20x faster training!
epochs: 150  # Same as the default VCTK20 config
batch_size: 8  # Increased for GPU (original paper used 5)
pretrained_model: "Models/epoch_00150.pth"  # Use award-winning VCTK20 pretrained model
load_only_params: true  # Load model weights only (not optimizer state) for transfer learning
fp16_run: true  # Mixed precision for faster GPU training
num_workers: 2  # Data loading workers (only takes effect if train.py reads this key)

train_data: "Data/MyVoice/train_list.txt"
val_data: "Data/MyVoice/val_list.txt"

F0_path: "Utils/JDC/bst.t7"
ASR_config: "Utils/ASR/config.yml"
ASR_path: "Utils/ASR/epoch_00100.pth"

preprocess_params:
  sr: 24000
  spect_params:
    n_fft: 2048
    win_length: 1200
    hop_length: 300

model_params:
  dim_in: 64
  style_dim: 64
  latent_dim: 16
  num_domains: 1  # Single speaker (your voice)
  max_conv_dim: 512
  n_repeat: 4
  w_hpf: 0
  F0_channel: 256

loss_params:
  g_loss:
    lambda_sty: 1.
    lambda_cyc: 5.
    lambda_ds: 1.
    lambda_norm: 1.
    lambda_asr: 10.
    lambda_f0: 5.
    lambda_f0_sty: 0.1
    lambda_adv: 2.
    lambda_adv_cls: 0.5
    norm_bias: 0.5
  d_loss:
    lambda_reg: 1.
    lambda_adv_cls: 0.1
    lambda_con_reg: 10.
  
  adv_cls_epoch: 50
  con_reg_epoch: 30

optimizer_params:
  lr: 0.0001
  beta1: 0.0
  beta2: 0.99
  weight_decay: 0.0001
"""

# Write config file
with open('Configs/my_voice_config.yml', 'w') as f:
    f.write(config)

print("✅ GPU-optimized configuration created!")
print("\n📋 Configuration highlights:")
print("   🚀 Device: GPU (CUDA)")
print("   🚀 Batch size: 8 (vs 2 on CPU)")
print("   🚀 Mixed precision: Enabled")
print("   🚀 Expected speedup: 10-20x faster!")
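Keys added to this file, such as `num_workers`, only take effect if the training code actually reads them. The rough check below lists top-level keys whose quoted name never appears in `train.py`. It is a string-matching heuristic that assumes options are read as `config['key']` or `config.get('key')`. Treat its output as a prompt to read `train.py`, not as proof. Nested keys (for example under `optimizer_params`) are not checked.

```python
import os
import re

def top_level_keys(config_text):
    """Top-level YAML keys: lines that start at column 0 with `name:`."""
    return re.findall(r"^([A-Za-z_]\w*)\s*:", config_text, flags=re.MULTILINE)

def unused_config_keys(config_text, source_paths):
    """Heuristic: top-level keys whose quoted name never appears in the given sources."""
    source = ""
    for path in source_paths:
        with open(path, encoding="utf-8") as f:
            source += f.read()
    return [key for key in top_level_keys(config_text)
            if f"'{key}'" not in source and f'"{key}"' not in source]

if os.path.exists("train.py") and os.path.exists("Configs/my_voice_config.yml"):
    with open("Configs/my_voice_config.yml") as f:
        print("Possibly ignored keys:", unused_config_keys(f.read(), ["train.py"]))
```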

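## 7️⃣ Fix PyTorch 2.6+ Compatibility (if needed)

PyTorch 2.6 changed the default of `torch.load`'s `weights_only` argument from `False` to `True`. As a result, `torch.load` refuses checkpoints that pickle objects outside its allowlist. The cell below adds `weights_only=False` to the repo's `torch.load` calls. Only do this for checkpoints from sources you trust, because unpickling can execute code.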

In [None]:
# Fix torch.load() for PyTorch >= 2.6, where weights_only defaults to True
import glob
import re

# torch.load(<args>) with no nested parentheses and no weights_only argument yet
TORCH_LOAD_CALL = re.compile(r"torch\.load\(((?:(?!weights_only)[^()])+)\)")

def fix_torch_load(filename):
    """Add weights_only=False to simple torch.load(...) calls; return how many were patched."""
    with open(filename, 'r') as f:
        content = f.read()

    # e.g. torch.load(path, map_location="cpu") -> torch.load(path, map_location="cpu", weights_only=False)
    new_content, n_patched = TORCH_LOAD_CALL.subn(r"torch.load(\1, weights_only=False)", content)

    if n_patched:
        with open(filename, 'w') as f:
            f.write(new_content)
    return n_patched

# Patch the training scripts and any helper modules under Utils/ (safe to rerun)
files_to_fix = glob.glob('*.py') + glob.glob('Utils/**/*.py', recursive=True)
total = 0
for file in files_to_fix:
    n = fix_torch_load(file)
    if n:
        print(f"✅ Fixed {n} torch.load call(s) in {file}")
    total += n

if total == 0:
    print("✓ No unpatched torch.load calls found")

print("\n✅ PyTorch compatibility ensured!")
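An automatic patch only handles the call patterns it knows about, so some `torch.load(...)` calls may remain unpatched. A call with nested parentheses, such as `torch.load(os.path.join(...))`, is one example. The small sketch below lists every remaining call without `weights_only` by matching parentheses, so you can patch those by hand. Parentheses inside string literals can confuse it, so treat it as a helper rather than a parser.

```python
import os
import re
from pathlib import Path

def find_unpatched_torch_load(root="."):
    """Return (file, line_no, call_text) for each torch.load(...) call lacking weights_only."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for match in re.finditer(r"torch\.load\(", text):
            # Walk forward to the parenthesis that closes this call
            depth, i = 1, match.end()
            while i < len(text) and depth:
                if text[i] == "(":
                    depth += 1
                elif text[i] == ")":
                    depth -= 1
                i += 1
            call = text[match.start():i]
            if "weights_only" not in call:
                line_no = text.count("\n", 0, match.start()) + 1
                hits.append((str(path), line_no, call))
    return hits

if os.path.exists("train.py"):  # only scan from the repo root
    remaining = find_unpatched_torch_load()
    for file, line_no, call in remaining:
        print(f"⚠️  {file}:{line_no}: {call}")
    if not remaining:
        print("✅ Every torch.load call passes weights_only")
```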

## 8️⃣ Start Training! 🚀

### ⚡ Training will take approximately:
- **T4 GPU**: ~15-20 hours for 150 epochs
- **V100 GPU**: ~10-15 hours for 150 epochs
- **A100 GPU**: ~8-12 hours for 150 epochs

### ⚠️ Important Notes:
1. **Keep this tab open** (Colab disconnects after ~90 min of inactivity)
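2. **Colab free tier**: Max 12 hours runtime (consider Colab Pro for 24h). That is shorter than the T4 estimate above, so on the free tier expect to resume at least once.
3. **Checkpoints are saved every 2 epochs**, but only to the Colab VM's local disk, which is wiped when the runtime disconnects. To be able to resume, download them or copy them to Google Drive before the session ends.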
4. **Monitor GPU usage** to ensure it's being utilized

In [None]:
# Create output directory
!mkdir -p Models/MyVoice_BestQuality

# Start training
print("🚀 Starting training with GPU acceleration!\n")
print("📊 You can monitor progress below...\n")
print("="*70)

!python train.py --config_path ./Configs/my_voice_config.yml
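Checkpoints on the Colab VM disappear with the runtime. The minimal sketch below mirrors `*.pth` files and `train.log` to a mounted Google Drive folder, copying only files that are new or changed (by size and modification time). Because `!python train.py` blocks the notebook, the helper `start_periodic_backup` runs the copy from a daemon thread. Start it in a cell before the training cell; it should keep running while the shell command runs. The Drive path is a hypothetical example. A checkpoint still being written when a backup fires may be copied partially; it is copied again on a later pass once its size or timestamp changes.

```python
import glob
import os
import shutil
import threading

def backup_checkpoints(src_dir, dst_dir):
    """Copy *.pth files and train.log from src_dir to dst_dir if they are new or changed.

    Returns the names of the files copied by this call.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    candidates = glob.glob(os.path.join(src_dir, "*.pth")) + [os.path.join(src_dir, "train.log")]
    for src in candidates:
        if not os.path.isfile(src):
            continue
        dst = os.path.join(dst_dir, os.path.basename(src))
        # Same size and a copy at least as new (1 s slack for coarse timestamps) means "unchanged"
        unchanged = (os.path.exists(dst)
                     and os.path.getsize(dst) == os.path.getsize(src)
                     and os.path.getmtime(dst) >= os.path.getmtime(src) - 1)
        if not unchanged:
            shutil.copy2(src, dst)  # copy2 also copies the modification time
            copied.append(os.path.basename(src))
    return copied

def start_periodic_backup(src_dir, dst_dir, interval_s=900):
    """Call backup_checkpoints every interval_s seconds from a daemon thread.

    Returns a threading.Event; call .set() on it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        while not stop.wait(interval_s):
            try:
                backup_checkpoints(src_dir, dst_dir)
            except OSError as err:  # e.g. a temporary Drive hiccup; try again next round
                print(f"⚠️  Backup failed: {err}")

    threading.Thread(target=loop, daemon=True).start()
    return stop

# Example with a hypothetical Drive folder (mount Drive first and run this
# before the training cell, since that cell blocks the notebook):
# stop_backup = start_periodic_backup("Models/MyVoice_BestQuality",
#                                     "/content/drive/MyDrive/StarGANv2-VC-backup")
```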

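## 9️⃣ Monitor Training Progress

A notebook kernel runs one cell at a time, so these cells wait until the training cell above finishes or is interrupted. To watch progress live, launch training as a background process instead.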

In [None]:
# Check training log
!tail -n 50 Models/MyVoice_BestQuality/train.log

In [None]:
# List saved checkpoints
!ls -lh Models/MyVoice_BestQuality/*.pth 2>/dev/null || echo "No checkpoints yet"
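With `save_freq: 2` and `epochs: 150`, a full run writes 75 checkpoints to the VM's disk. If space runs low, the sketch below keeps only the newest few `epoch_XXXXX.pth` files. By default it is a dry run that only reports what it would delete. The file-name pattern assumes the zero-padded `epoch_XXXXX.pth` naming used elsewhere in this notebook.

```python
import os
import re

CKPT_PATTERN = re.compile(r"epoch_(\d+)\.pth$")

def prune_checkpoints(model_dir, keep=3, dry_run=True):
    """Delete all but the `keep` highest-numbered epoch_XXXXX.pth files in model_dir.

    With dry_run=True nothing is deleted. Returns the names that were (or would be) deleted.
    """
    ckpts = []
    for name in os.listdir(model_dir):
        match = CKPT_PATTERN.match(name)
        if match:
            ckpts.append((int(match.group(1)), name))
    ckpts.sort()  # oldest epoch first
    to_delete = [name for _, name in (ckpts[:-keep] if keep > 0 else ckpts)]
    if not dry_run:
        for name in to_delete:
            os.remove(os.path.join(model_dir, name))
    return to_delete

if os.path.isdir("Models/MyVoice_BestQuality"):
    print("Would delete:", prune_checkpoints("Models/MyVoice_BestQuality", keep=3))
```

To actually free the space, call it again with `dry_run=False` once the list looks right.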

In [None]:
# Monitor GPU usage
!nvidia-smi
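Because the training cell blocks the notebook, the monitoring cells above only run after it returns. One alternative is to start training as a background process with `subprocess.Popen` and poll its log, as in the minimal sketch below. `train_stdout.log` and the `RUN_IN_BACKGROUND` flag are hypothetical names. The `-u` flag turns off Python's output buffering so the log updates promptly. Use this instead of the training cell in step 8, not in addition to it.

```python
import os
import subprocess
import sys

def launch_in_background(cmd, log_path):
    """Start cmd without blocking the notebook; stdout and stderr go to log_path."""
    with open(log_path, "w") as log_file:
        # The child process keeps its own handle to the log after ours is closed
        return subprocess.Popen(cmd, stdout=log_file, stderr=subprocess.STDOUT)

RUN_IN_BACKGROUND = False  # set to True to use this instead of the training cell

if RUN_IN_BACKGROUND and os.path.exists("train.py"):
    train_proc = launch_in_background(
        [sys.executable, "-u", "train.py", "--config_path", "./Configs/my_voice_config.yml"],
        "train_stdout.log",
    )
    print(f"🚀 Training started (PID {train_proc.pid}); check it with !tail -n 20 train_stdout.log")
    # train_proc.poll() returns None while training is still running
```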

## 🔟 Download Trained Model

### After training completes, download your model!

In [None]:
import os
import re
import zipfile
from google.colab import files

print("📦 Preparing your trained model for download...\n")

model_dir = 'Models/MyVoice_BestQuality'
zip_filename = 'MyVoice_Trained_Models.zip'

# Zip only the newest checkpoint plus train.log: that is all you need for inference
# or resuming, while one checkpoint every save_freq epochs adds up to a very large ZIP
ckpts = sorted(f for f in os.listdir(model_dir) if re.fullmatch(r'epoch_\d+\.pth', f))
if not ckpts:
    raise FileNotFoundError(f"No epoch_XXXXX.pth checkpoints in {model_dir} yet")
latest = ckpts[-1]  # zero-padded epoch numbers sort correctly as strings

with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
    for filename in (latest, 'train.log'):
        file_path = os.path.join(model_dir, filename)
        if os.path.exists(file_path):
            zipf.write(file_path, filename)  # store without the folder prefix
            print(f"   ✓ Added: {filename}")

print(f"\n✅ Created {zip_filename}")
print(f"📊 Size: {os.path.getsize(zip_filename) / 1024**2:.2f} MB\n")

# Download
print("⬇️  Downloading to your computer...")
files.download(zip_filename)

print("\n🎉 Download complete!")
print("\n📝 Next steps:")
print("   1. Extract the ZIP file")
print(f"   2. Copy {latest} to your StarGANv2-VC/Models/MyVoice_BestQuality/ folder")
print("      (a separate folder, so it cannot clash with the VCTK20 Models/epoch_00150.pth)")
print("   3. Use it for voice conversion!")

## 🔄 Resume Training (If Disconnected)

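### If Colab disconnects, you can resume training:

In a new session, rerun steps 1-7 first. Rerunning the original config would start again from the VCTK20 weights (`pretrained_model: "Models/epoch_00150.pth"` with `load_only_params: true`) and ignore your checkpoints. The cell below therefore writes `Configs/my_voice_resume.yml`, which points `pretrained_model` at your newest checkpoint and sets `load_only_params: false`. Depending on how `train.py` counts epochs, it may run the full `epochs` again after resuming; lower `epochs` in the resume config if needed.

In [None]:
# First, re-upload the checkpoint ZIP from the download step (if you have one)
from google.colab import files
import glob
import os
import zipfile
import yaml

model_dir = 'Models/MyVoice_BestQuality'
print("📤 Upload your previous checkpoint ZIP (if you have one)...")
uploaded = files.upload()

# Extract checkpoints
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall(model_dir)
        print("✅ Checkpoints restored!")
        os.remove(filename)

# Point the config at the newest checkpoint (falls back to the original config if none)
config_path = 'Configs/my_voice_config.yml'
ckpts = sorted(glob.glob(os.path.join(model_dir, 'epoch_*.pth')))
if ckpts:
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    cfg['pretrained_model'] = ckpts[-1]
    cfg['load_only_params'] = False  # also restore the saved training state
    config_path = 'Configs/my_voice_resume.yml'
    with open(config_path, 'w') as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    print(f"🔁 Resuming from {ckpts[-1]}")

# Resume training
!python train.py --config_path {config_path}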

---

## 🎊 Congratulations!

### You've successfully trained your voice conversion model!

### 📊 What You Achieved:
- ✅ Trained with award-winning INTERSPEECH 2021 architecture
- ✅ Used transfer learning from VCTK20 pretrained model
- ✅ Optimized for best quality with GPU acceleration
- ✅ 10-20x faster than CPU training!

### 🎤 Next Steps:
1. Download your trained model (epoch_00150.pth)
2. Test it using the inference notebook
3. Convert any voice to your voice!

### 📚 Resources:
- [StarGANv2-VC GitHub](https://github.com/yl4579/StarGANv2-VC)
- [Original Paper](https://arxiv.org/abs/2107.10394)
- [Demo Samples](https://starganv2-vc.github.io/)

---

**🚀 Happy Voice Converting! 🎤✨**