# Anti-Aliasing Neural Network Training on Google Colab

This notebook enables GPU-accelerated training of anti-aliasing RNN models.

**Expected speedup:** roughly 16-32x (from 32 min/epoch on CPU to ~1-2 min/epoch on a T4 GPU)
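
As a quick sanity check, the speedup range implied by the quoted per-epoch timings can be worked out directly. The helper name `speedup_range` is illustrative only and is not part of the repository.

```python
def speedup_range(cpu_min_per_epoch, gpu_fast_min, gpu_slow_min):
    """Return (lowest, highest) speedup implied by the per-epoch timings."""
    return cpu_min_per_epoch / gpu_slow_min, cpu_min_per_epoch / gpu_fast_min

# Timings quoted above: 32 min/epoch on CPU, 1-2 min/epoch on a T4 GPU.
low, high = speedup_range(32, 1, 2)
print(f"Speedup: {low:.0f}x to {high:.0f}x")
```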

## Setup Overview

1. Mount Google Drive for persistent storage
2. Clone repository from GitHub
3. Install dependencies
4. Create symlinks for weights/audio/checkpoints
5. Configure wandb for remote monitoring
6. Configure and run training
7. Inspect saved checkpoints

## Before Running

Ensure your Google Drive has this structure:
```
Google Drive/
└── AA_Neural/
    ├── weights/
    │   └── NAM/
    │       └── Marshall JCM 800 2203/
    │           └── JCM800 2203 - P5 B5 M5 T5 MV7 G10 - AZG - 700.nam
    ├── audio_data/
    │   └── val_input.wav
    └── checkpoints/
        (created automatically)
```

## 1. Mount Google Drive

This mounts your Google Drive to `/content/drive` for persistent storage.

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Verify the AA_Neural folder exists
drive_base = '/content/drive/MyDrive/AA_Neural'
assert os.path.exists(drive_base), f"ERROR: {drive_base} not found. Create it and upload weights/audio_data first."
assert os.path.exists(f'{drive_base}/weights'), f"ERROR: {drive_base}/weights not found."
assert os.path.exists(f'{drive_base}/audio_data'), f"ERROR: {drive_base}/audio_data not found."

print("\u2713 Google Drive mounted successfully")
print(f"\u2713 Found {drive_base}")
print(f"\u2713 Found {drive_base}/weights")
print(f"\u2713 Found {drive_base}/audio_data")
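
The assertions above stop at the first missing folder. As a minimal sketch, assuming the Drive layout described earlier, the check could instead collect every missing path and report them together. The helper `find_missing` and the `REQUIRED` list are hypothetical names introduced for illustration.

```python
import os

def find_missing(base, required):
    """Return the required sub-paths that do not exist under base."""
    return [rel for rel in required if not os.path.exists(os.path.join(base, rel))]

# Paths taken from the Drive layout described in "Before Running".
REQUIRED = [
    'weights/NAM/Marshall JCM 800 2203',
    'audio_data/val_input.wav',
]

missing = find_missing('/content/drive/MyDrive/AA_Neural', REQUIRED)
if missing:
    print("Missing from Drive:")
    for rel in missing:
        print(f"  {rel}")
else:
    print("All required paths found")
```

Reporting all gaps in one pass saves a mount-and-rerun cycle for each forgotten upload.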

## 2. Clone Repository

Clone the repository from GitHub. Update the URL below with your repo.

In [None]:
import os

# Configuration
REPO_URL = 'https://github.com/YOUR_USERNAME/dafx25_antialiasing_neural.git'  # UPDATE THIS
REPO_DIR = '/content/dafx25_antialiasing_neural'

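# Remove existing clone if present. Leave the directory first so that
# re-running this cell does not delete the current working directory.
if os.path.exists(REPO_DIR):
    print(f"Removing existing {REPO_DIR}")
    os.chdir('/content')
    !rm -rf {REPO_DIR}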

# Clone repository
print(f"Cloning {REPO_URL}...")
!git clone --recurse-submodules {REPO_URL} {REPO_DIR}

# Change to repo directory
os.chdir(REPO_DIR)
print(f"\n\u2713 Repository cloned to {REPO_DIR}")
print(f"\u2713 Current directory: {os.getcwd()}")

# Verify OpenAmp submodule
assert os.path.exists('OpenAmp/Open_Amp/amp_model.py'), "ERROR: OpenAmp submodule not loaded correctly"
print("\u2713 OpenAmp submodule loaded")
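
The file-existence assertion checks one known file. A more general check, sketched here under the assumption that `git` is on the path, parses the output of `git submodule status`, where a leading `-` marks a submodule that has not been initialized. The function names are illustrative.

```python
import subprocess

def uninitialized_submodules(status_output):
    """Return paths of submodules that `git submodule status` marks with '-'."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith('-'):
            # Line format for an uninitialized submodule: "-<sha1> <path>"
            paths.append(line[1:].split()[1])
    return paths

def check_submodules(repo_dir):
    """Run `git submodule status` in repo_dir and list uninitialized submodules."""
    result = subprocess.run(
        ['git', 'submodule', 'status', '--recursive'],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return uninitialized_submodules(result.stdout)

# Parsing demo on a hand-written status listing (the hashes are made up).
sample = "-abc123 OpenAmp\n def456 other (v1.0)\n"
print(uninitialized_submodules(sample))
```

If the returned list is non-empty, running `git submodule update --init --recursive` inside the clone fetches the missing submodules.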

## 3. Install Dependencies

Install required Python packages. Colab already has PyTorch, so we only install additional dependencies.

In [None]:
# Install core dependencies
!pip install -q pytorch-lightning wandb auraloss neural-amp-modeler librosa

# Verify critical imports
import torch
import pytorch_lightning as pl
import wandb

print("\u2713 Dependencies installed")
print(f"\u2713 PyTorch version: {torch.__version__}")
print(f"\u2713 PyTorch Lightning version: {pl.__version__}")
print(f"\u2713 CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"\u2713 GPU: {torch.cuda.get_device_name(0)}")
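
To see which packages are actually missing before or after the `pip install`, a small sketch using `importlib.util.find_spec` can help. The import names below are assumptions derived from the pip package names. In particular, `nam` is assumed to be the import name of `neural-amp-modeler`.

```python
import importlib.util

def missing_modules(module_names):
    """Return the top-level module names the import system cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

# Assumed import names for the packages installed above.
EXPECTED = ['torch', 'pytorch_lightning', 'wandb', 'auraloss', 'nam', 'librosa']
print("Missing modules:", missing_modules(EXPECTED) or "none")
```

Unlike a bare `import`, this does not execute the packages, so it is cheap to run at the top of a fresh session.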

## 4. Link Drive Storage

Create symlinks so the training script reads weights/audio from Drive and saves checkpoints to Drive.

In [None]:
import os
import shutil

drive_base = '/content/drive/MyDrive/AA_Neural'
repo_dir = '/content/dafx25_antialiasing_neural'

# Ensure we're in the repo directory
os.chdir(repo_dir)

# Create symlinks for persistent storage
symlinks = [
    ('weights', f'{drive_base}/weights'),
    ('audio_data', f'{drive_base}/audio_data'),
    ('lightning_logs', f'{drive_base}/checkpoints'),  # Redirect PL checkpoints
]

for link_name, target in symlinks:
    link_path = f'{repo_dir}/{link_name}'
    
    # Remove existing file/dir/symlink
    if os.path.islink(link_path):
        os.unlink(link_path)
    elif os.path.isdir(link_path):
        shutil.rmtree(link_path)
    elif os.path.exists(link_path):
        os.remove(link_path)
    
    # Create target directory if it doesn't exist
    os.makedirs(target, exist_ok=True)
    
    # Create symlink
    os.symlink(target, link_path)
    print(f"\u2713 Linked {link_name} -> {target}")

# Verify critical files exist
test_file = 'weights/NAM/Marshall JCM 800 2203/JCM800 2203 - P5 B5 M5 T5 MV7 G10 - AZG - 700.nam'
assert os.path.exists(test_file), f"ERROR: {test_file} not found. Check that Drive has weights/NAM/Marshall JCM 800 2203/ directory."
print(f"\u2713 Verified model file accessible: {test_file}")

test_audio = 'audio_data/val_input.wav'
assert os.path.exists(test_audio), f"ERROR: {test_audio} not found. Check that Drive has audio_data/val_input.wav file."
print(f"\u2713 Verified audio file accessible: {test_audio}")

print(f"\u2713 Checkpoints will save to {drive_base}/checkpoints")
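
The replace-then-link logic in the loop above can be factored into a helper and tried in a throwaway directory before touching the real repo. This is a minimal sketch, not the repository's own code. The name `relink` is hypothetical.

```python
import os
import shutil
import tempfile

def relink(link_path, target):
    """Point link_path at target, replacing whatever is at link_path."""
    if os.path.islink(link_path):
        os.unlink(link_path)          # also handles dangling symlinks
    elif os.path.isdir(link_path):
        shutil.rmtree(link_path)      # destructive: removes the whole directory
    elif os.path.exists(link_path):
        os.remove(link_path)
    os.makedirs(target, exist_ok=True)
    os.symlink(target, link_path)

# Exercise the helper in a temporary directory instead of the real repo.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'drive', 'weights')
    link = os.path.join(tmp, 'weights')
    os.makedirs(link)                 # a pre-existing directory gets replaced
    relink(link, target)
    relink(link, target)              # safe to call again on an existing link
    print(os.path.islink(link), os.path.realpath(link) == os.path.realpath(target))
```

Checking `os.path.islink` first matters: a dangling symlink reports `False` for `os.path.exists`, so it would otherwise be skipped and `os.symlink` would fail.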

## 5. Configure wandb

Log in to Weights & Biases for experiment tracking. Get your API key from https://wandb.ai/authorize

In [None]:
import wandb

# Log in to wandb
# Option 1: Interactive login (will prompt for API key)
wandb.login()

# Option 2: Programmatic login (uncomment and add your key)
# wandb.login(key='your-api-key-here')

print("\u2713 wandb configured")
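
Pasting an API key into a cell risks leaking it when the notebook is shared. One alternative, sketched below, sets the `WANDB_API_KEY` and `WANDB_MODE` environment variables that wandb reads, and falls back to offline mode when no key is available. The helper `configure_wandb_env` is a hypothetical name. Where the key comes from (for example Colab's Secrets panel) is left to the reader.

```python
import os

def configure_wandb_env(api_key=None, env=os.environ):
    """Set wandb environment variables; fall back to offline mode without a key."""
    if api_key:
        env['WANDB_API_KEY'] = api_key
        env['WANDB_MODE'] = 'online'
    else:
        env['WANDB_MODE'] = 'offline'
    return env['WANDB_MODE']

# Demo with a plain dict so the real environment is left untouched.
demo_env = {}
print(configure_wandb_env(None, env=demo_env))
```

Because processes started with `!` inherit the notebook's environment, variables set this way also reach the training script. Runs recorded in offline mode can be uploaded later with `wandb sync`.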

## 6. Training Configuration

Set training parameters. Modify these as needed.

In [None]:
# Training configuration
CONFIG_IDX = 3          # Model config (3 = JCM800 NAM model)
MAX_EPOCHS = 100        # Number of epochs to train
USE_WANDB = True        # Enable wandb logging

print("Training configuration:")
print(f"  Config index: {CONFIG_IDX}")
print(f"  Max epochs: {MAX_EPOCHS}")
print(f"  wandb logging: {USE_WANDB}")

## 7. Run Training

Execute the training script. Progress will stream to this notebook.

**Expected time:** ~2-3 hours for 100 epochs on T4 GPU

**Monitoring:** 
- Live metrics visible in this notebook
- Full dashboard at wandb.ai
- Checkpoints auto-save to Drive

In [None]:
import os

# Ensure we're in the repo directory
os.chdir('/content/dafx25_antialiasing_neural')

# Build command
cmd_parts = [
    'python train.py',
    f'--config {CONFIG_IDX}',
    f'--max_epochs {MAX_EPOCHS}',
]

if USE_WANDB:
    cmd_parts.append('--wandb')
else:
    cmd_parts.append('--no-wandb')

cmd = ' '.join(cmd_parts)

print(f"Executing: {cmd}")
print("="*80)

# Run training (this will take a while)
!{cmd}
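
Building the command as an argument list rather than a joined string avoids quoting problems if a value ever contains spaces. The sketch below uses only the flags that appear in the cell above. The function name `build_train_command` is illustrative.

```python
import shlex

def build_train_command(config_idx, max_epochs, use_wandb):
    """Build the train.py argument list using the flags from the cell above."""
    args = ['python', 'train.py',
            '--config', str(config_idx),
            '--max_epochs', str(max_epochs)]
    args.append('--wandb' if use_wandb else '--no-wandb')
    return args

args = build_train_command(3, 100, True)
print(shlex.join(args))  # shell-quoted form, useful for logging
```

The same list can be passed to `subprocess.run(args, check=True)`, which raises `CalledProcessError` on a non-zero exit code, whereas the `!` shell escape does not stop the notebook on failure. How the child's output is displayed may differ between the two approaches.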

## 8. Post-Training

After training completes, checkpoints are in Google Drive.

In [None]:
# List saved checkpoints
import os
import glob

checkpoints_base = '/content/drive/MyDrive/AA_Neural/checkpoints'

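# Find all checkpoint versions, sorted numerically so that
# version_10 comes after version_9 rather than after version_1
def _version_number(path):
    suffix = path.rsplit('_', 1)[-1]
    return int(suffix) if suffix.isdigit() else -1

versions = sorted(glob.glob(f'{checkpoints_base}/version_*'), key=_version_number)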

if versions:
    latest_version = versions[-1]
    print(f"Found {len(versions)} training run(s)")
    print(f"\nLatest run: {latest_version}")
    
    # List checkpoints in latest run
    checkpoints = glob.glob(f'{latest_version}/checkpoints/*.ckpt')
    if checkpoints:
        print(f"\nCheckpoints ({len(checkpoints)}):")
        for ckpt in sorted(checkpoints):
            size_mb = os.path.getsize(ckpt) / (1024*1024)
            print(f"  {os.path.basename(ckpt)} ({size_mb:.1f} MB)")
    
    # Check for best model export
    best_export = f'{latest_version}/best_export'
    if os.path.exists(best_export):
        print(f"\n✓ Best model exported to: {best_export}")
        exported_files = glob.glob(f'{best_export}/*')
        for f in exported_files:
            print(f"  {os.path.basename(f)}")
else:
    print(f"No training runs found under {checkpoints_base}. Training may not have completed.")

print(f"\nAll results saved to Google Drive: {checkpoints_base}")
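
To resume or evaluate from the most recent checkpoint of a run, one option is to pick the `.ckpt` file with the latest modification time. This is a minimal sketch that assumes the `checkpoints/*.ckpt` layout used in the cell above. The helper name and the example run path are hypothetical.

```python
import glob
import os

def newest_checkpoint(run_dir):
    """Return the most recently modified .ckpt under run_dir/checkpoints, or None."""
    candidates = glob.glob(os.path.join(run_dir, 'checkpoints', '*.ckpt'))
    return max(candidates, key=os.path.getmtime) if candidates else None

# Example path only; substitute the run directory printed by the cell above.
run_dir = '/content/drive/MyDrive/AA_Neural/checkpoints/version_0'
print(newest_checkpoint(run_dir))
```

Modification time identifies the latest file written, not necessarily the best-scoring one, so the exported best model (when present) remains the better choice for inference.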