# Harmonica Training on Google Colab

This notebook sets up and trains the Harmonica voice cloning system on Colab.

**Requirements:**
- Google Colab with GPU runtime (T4 or better)
- Google Drive for checkpoint storage

## 1. Setup

In [None]:
# Check GPU
!nvidia-smi
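
If no GPU is attached, `nvidia-smi` only prints an error and the notebook carries on. A minimal sketch of a programmatic check follows. The helper name `gpu_visible` is hypothetical and not part of Harmonica; it relies only on `nvidia-smi -L`, which lists the attached GPUs.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU, e.g. "GPU 0: Tesla T4 (UUID: ...)"
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if not gpu_visible():
    print("No GPU detected. In Colab: Runtime > Change runtime type > select a GPU.")
```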

In [None]:
# Mount Google Drive for persistent storage
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Clone repository (or copy from Drive); "yourusername" below is a placeholder for the repository owner
!git clone https://github.com/yourusername/harmonica.git
%cd harmonica

In [None]:
# Install dependencies
!pip install -e ".[dev]" -q

In [None]:
# Verify installation
import torch
from harmonica.utils.device import device_info

print(f"PyTorch version: {torch.__version__}")
print(f"Device info: {device_info()}")

## 2. Download and Preprocess Data

In [None]:
# Download LJSpeech dataset
!python scripts/download_data.py --dataset ljspeech --output-dir ./data
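
Before spending time on preprocessing, it can help to confirm the download is complete. The sketch below assumes the standard LJSpeech-1.1 layout: a pipe-delimited `metadata.csv` next to a `wavs/` directory. The helper `check_ljspeech` is hypothetical and does not depend on anything in `scripts/`.

```python
from pathlib import Path


def check_ljspeech(root):
    """Return (number of metadata rows, list of clip ids with no .wav file)."""
    root = Path(root)
    meta = root / "metadata.csv"
    wav_dir = root / "wavs"
    if not meta.is_file() or not wav_dir.is_dir():
        raise FileNotFoundError(f"{root} does not look like an LJSpeech-1.1 directory")
    with meta.open(encoding="utf-8") as f:
        # Each row is "id|transcription|normalized transcription"
        ids = [line.split("|", 1)[0] for line in f if line.strip()]
    missing = [i for i in ids if not (wav_dir / f"{i}.wav").is_file()]
    return len(ids), missing


root = Path("./data/LJSpeech-1.1")
if root.exists():
    n_rows, missing = check_ljspeech(root)
    print(f"{n_rows} metadata rows, {len(missing)} missing wav files")
```

The full dataset has 13,100 clips, so a much smaller row count or a non-empty `missing` list suggests an interrupted download.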

In [None]:
# Preprocess (encode audio to codec tokens)
!python scripts/preprocess.py --dataset ljspeech --data-dir ./data/LJSpeech-1.1 --output-dir ./cache/ljspeech

## 3. Training

In [None]:
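# Set up checkpoint directory on Google Drive for persistence
import os
os.makedirs('/content/drive/MyDrive/harmonica_checkpoints', exist_ok=True)
os.makedirs('experiments', exist_ok=True)  # the symlink's parent directory must exist
# -n replaces an existing symlink on re-run instead of nesting a new link inside the Drive folder
!ln -sfn /content/drive/MyDrive/harmonica_checkpoints ./experiments/checkpoints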
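
The same link can be created from Python, which makes it easier to refuse to overwrite a real directory. This is a sketch only; `link_checkpoint_dir` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def link_checkpoint_dir(target, link):
    """Point `link` at `target`, creating parent directories as needed."""
    target = Path(target)
    link = Path(link)
    target.mkdir(parents=True, exist_ok=True)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        # Safe to replace: removes only the link, never the Drive contents
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink; move it first")
    link.symlink_to(target, target_is_directory=True)
    return link


# Usage in this notebook (paths taken from the cell above):
# link_checkpoint_dir('/content/drive/MyDrive/harmonica_checkpoints',
#                     './experiments/checkpoints')
```

Raising on an existing real directory is deliberate: local checkpoints already written there would otherwise be hidden or lost.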

In [None]:
# Train AR model
!python scripts/train.py \
    --config configs/experiment/baseline.yaml \
    --model-type ar \
    --device cuda

In [None]:
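# Resume training after a disconnect.
# If the runtime was recycled, only Drive contents persist: re-run the Setup, data,
# and checkpoint-symlink cells before this one.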
!python scripts/train.py \
    --config configs/experiment/baseline.yaml \
    --model-type ar \
    --device cuda \
    --resume
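
Before resuming, it is worth confirming that a checkpoint actually reached Drive. The sketch below assumes checkpoints use the `.pt` extension, as `checkpoint_best.pt` does later in this notebook. How `--resume` chooses which file to load is not shown here, so this only verifies that something exists. `latest_checkpoint` is a hypothetical helper.

```python
from pathlib import Path


def latest_checkpoint(ckpt_dir, pattern="*.pt"):
    """Return the most recently modified checkpoint file, or None if there is none."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return None
    candidates = [p for p in ckpt_dir.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


ckpt = latest_checkpoint("/content/drive/MyDrive/harmonica_checkpoints")
print(f"Latest checkpoint: {ckpt}" if ckpt else "No checkpoint found; training would start from scratch.")
```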

## 4. Generate Samples

In [None]:
# Synthesize speech
!python scripts/synthesize.py \
    --text "Hello, this is a test of the voice cloning system." \
    --ar-checkpoint experiments/checkpoints/checkpoint_best.pt \
    --output test_output.wav

In [None]:
# Play audio
from IPython.display import Audio
Audio('test_output.wav')
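
A quick sanity check on the output file can catch empty or truncated audio before listening. This sketch uses the standard-library `wave` module and assumes `synthesize.py` writes PCM WAV; `wave` raises `wave.Error` on other encodings such as 32-bit float. `wav_summary` is a hypothetical helper.

```python
import wave
from pathlib import Path


def wav_summary(path):
    """Return channel count, sample rate, and duration in seconds of a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


if Path("test_output.wav").is_file():
    print(wav_summary("test_output.wav"))
```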

## 5. Evaluation

In [None]:
# Run evaluation on test sentences
!python scripts/evaluate.py \
    --ar-checkpoint experiments/checkpoints/checkpoint_best.pt \
    --output-dir ./eval/output

In [None]:
# View results
import json
with open('eval/output/results.json') as f:
    results = json.load(f)
print(json.dumps(results['statistics'], indent=2))
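
If evaluation failed partway, `results.json` may be absent or lack the `statistics` key, and the cell above stops with a traceback. A more forgiving variant is sketched below. It assumes only what the cell above uses, namely a top-level JSON object with a `statistics` entry; `load_statistics` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_statistics(path):
    """Return the 'statistics' entry of a results file, or None if unavailable."""
    path = Path(path)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        results = json.load(f)
    return results.get("statistics")


stats = load_statistics("eval/output/results.json")
if stats is None:
    print("No statistics found; check the evaluation cell's output for errors.")
else:
    print(json.dumps(stats, indent=2))
```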

## 6. Verify Checkpoints on Drive

In [None]:
# Checkpoints are written to Drive through the symlink created in Section 3, so no copy step is needed.
# List them to confirm they are there:
!ls -la /content/drive/MyDrive/harmonica_checkpoints/