# Harmonica VCTK Training (Colab)

This notebook clones the repository, installs its dependencies, and runs VCTK training (AR and NAR) on a Colab GPU.
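
Since every later step passes `--device cuda`, it can help to confirm that the runtime actually has a GPU before cloning anything. The following is a minimal sketch, not part of the repo: `gpu_visible` is a hypothetical helper that only assumes the standard `nvidia-smi` tool is on the path when a GPU runtime is attached.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per device, e.g. "GPU 0: ...".
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if not gpu_visible():
    print("No GPU detected. In Colab: Runtime > Change runtime type, then pick a GPU.")
```

If the message is printed, switch the runtime type before running the remaining cells.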

In [None]:
# Clone repo (replace `yourusername` with the account that hosts the repo)
!git clone https://github.com/yourusername/harmonica.git
%cd harmonica

In [None]:
# Install dependencies
!pip install -e ".[dev]"

## Download VCTK (optional)

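If the dataset is already in your Google Drive, mount the drive (Option A) and point `DATA_DIR` at it in the preprocessing cell.
Otherwise, download it (Option B). The corpus is large, so the download can take a while.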
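
Choosing between the Drive copy and the local download can also be done programmatically. The sketch below assumes the two example paths used in this notebook; `first_existing_dir` is a hypothetical helper, not something the repo provides.

```python
from pathlib import Path


def first_existing_dir(candidates):
    """Return the first candidate that is an existing directory, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return str(path)
    return None


# Both paths are the examples from this notebook; adjust them to your layout.
resolved_data_dir = first_existing_dir([
    "/content/drive/MyDrive/VCTK-Corpus-0.92",
    "./data/VCTK-Corpus-0.92",
])
print("Resolved VCTK path:", resolved_data_dir)
```

A result of `None` means neither location exists yet, so run Option A or Option B first. To use the result, assign it to `DATA_DIR` in the preprocessing cell.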

In [None]:
# Option A: Mount Google Drive (if dataset is there)
from google.colab import drive
drive.mount('/content/drive')
# Example path: /content/drive/MyDrive/VCTK-Corpus-0.92

In [None]:
# Option B: Download VCTK (large, may take time)
!python scripts/download_data.py --dataset vctk --output-dir ./data

## Preprocess and Train (with cache detection)

Set `DATA_DIR` below to your VCTK path (the Drive path if you used Option A).
If `./cache/vctk/metadata.pt` already exists, preprocessing is skipped.
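
If a previously built cache lives in Drive, restoring it first lets the check above skip preprocessing. This is a minimal sketch under stated assumptions: `restore_cache` is a hypothetical helper, the Drive path is the example used in this notebook, and `metadata.pt` is treated as the only marker of a complete cache.

```python
import shutil
from pathlib import Path


def restore_cache(src_dir, dst_dir, marker="metadata.pt"):
    """Copy src_dir into dst_dir when dst lacks the marker file and src has it.

    Returns True if dst_dir contains the marker afterwards, else False.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    if (dst / marker).exists():
        return True  # local cache already present, nothing to copy
    if not (src / marker).exists():
        return False  # no usable cache at the source either
    # Creates dst (and any missing parents) and merges into it if it exists.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return True


# Example Drive location (an assumption); change it to where your cache lives.
have_cache = restore_cache("/content/drive/MyDrive/vctk_cache", "./cache/vctk")
print("Cache available:", have_cache)
```

When this prints `False`, the preprocessing cell builds the cache from `DATA_DIR` as usual.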

In [None]:
# Set path to VCTK data
DATA_DIR = "./data/VCTK-Corpus-0.92"

# Optional: if you have cached tokens in Drive, copy them into ./cache/vctk
# Example:
# !mkdir -p ./cache/vctk
# !cp -r /content/drive/MyDrive/vctk_cache/* ./cache/vctk/

# Preprocess only if cache is missing
import os
if not os.path.exists("./cache/vctk/metadata.pt"):
    !python scripts/preprocess.py --dataset vctk --data-dir "$DATA_DIR" --output-dir ./cache/vctk --device cuda
else:
    print("Cache found, skipping preprocessing.")

In [None]:
# Train the autoregressive (AR) model
!python scripts/train.py --config configs/experiment/vctk_base.yaml --model-type ar --device cuda

In [None]:
# Train the non-autoregressive (NAR) model
!python scripts/train.py --config configs/experiment/vctk_base.yaml --model-type nar --device cuda