# Wide & Deep IRT — Colab Training

This notebook runs the PyTorch Lightning trainer defined in `src/wd_irt/train.py` on Google Colab. Before starting, upload or clone the repository (including the canonical EDM Cup artifacts under `data/`) into `/content/deepkt-irt`, or adjust `REPO_DIR` below to match your path.

In [None]:
# Optional: verify GPU availability
!nvidia-smi || echo "GPU not available. Enable it via Runtime > Change runtime type."

In [None]:
from pathlib import Path

# Path where you cloned/unzipped the repo contents.
REPO_DIR = Path("/content/deepkt-irt")
assert REPO_DIR.exists(), "Upload or clone the repo to /content/deepkt-irt (or update REPO_DIR)."

%cd {REPO_DIR}

In [None]:
# Install Python dependencies; requirements.txt already lists torch, lightning, pandas, etc.
!pip install -r requirements.txt

## Launch Training

The config file references canonical data produced by the preprocessing pipeline:

- `data/interim/edm_cup_2023_42_events.parquet`
- `data/splits/edm_cup_2023_42.json`
- Raw CSVs under `data/raw/edm_cup_2023/` (assignment details, relationships, unit test scores, problem details)

If those files live on Google Drive, mount it (`from google.colab import drive; drive.mount('/content/drive')`) and copy or symlink them into `/content/deepkt-irt/data/` so the paths above resolve. Otherwise, upload them directly to `/content/deepkt-irt/data/` before executing the training cell.
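
A quick preflight check can catch a missing artifact before the trainer starts. The sketch below assumes the three paths listed above are relative to the repository root. `missing_artifacts` is a hypothetical helper written for this notebook, not something the repository provides, and it only checks that the raw CSV directory exists, not which files are inside it.

```python
from pathlib import Path

# Paths taken from the list above, relative to the repository root.
REQUIRED_ARTIFACTS = (
    "data/interim/edm_cup_2023_42_events.parquet",
    "data/splits/edm_cup_2023_42.json",
    "data/raw/edm_cup_2023",  # directory of raw CSVs; individual files are not checked
)


def missing_artifacts(repo_dir, required=REQUIRED_ARTIFACTS):
    """Return the required paths that do not exist under repo_dir (hypothetical helper)."""
    root = Path(repo_dir)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumed repo location; keep it in sync with REPO_DIR.
    missing = missing_artifacts("/content/deepkt-irt")
    if missing:
        print("Missing artifacts:", ", ".join(missing))
    else:
        print("All expected artifacts are present.")
```

If the list is non-empty, fix the paths (or the config) before launching training, since the run would otherwise fail only after dependencies and the model have loaded.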

In [None]:
# Run the Wide & Deep IRT trainer.
# Adjust --config if you create variants or point to different dataset splits.
!python -m src.wd_irt.train --config configs/wd_irt_edm.yaml

## After Training

- Checkpoints and metrics are written under `reports/checkpoints/wd_irt_edm/` and `reports/metrics/` (configurable via `configs/wd_irt_edm.yaml`).
- Use additional cells to run evaluation/export scripts once they’re implemented (e.g., exporting `item_params.parquet`).
- Download artifacts or copy them back to Drive for later analysis.
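
Copying artifacts back to Drive might look like the sketch below. It assumes Drive is already mounted at `/content/drive` and that the output directories are the defaults named above. The destination folder `MyDrive/wd_irt_runs` and the helper `copy_artifacts` are hypothetical names chosen for illustration.

```python
import shutil
from pathlib import Path

# Default output locations named above; change them if the config overrides them.
DEFAULT_OUTPUT_DIRS = ("reports/checkpoints/wd_irt_edm", "reports/metrics")


def copy_artifacts(repo_dir, dest_dir, subdirs=DEFAULT_OUTPUT_DIRS):
    """Copy each existing output directory under repo_dir into dest_dir.

    Returns the relative paths that were copied. Directories that do not
    exist yet (for example, because training has not run) are skipped.
    """
    repo, dest = Path(repo_dir), Path(dest_dir)
    copied = []
    for rel in subdirs:
        src = repo / rel
        if not src.is_dir():
            continue
        # dirs_exist_ok=True allows copying into an existing destination,
        # overwriting files that share a name with an earlier copy.
        shutil.copytree(src, dest / rel, dirs_exist_ok=True)
        copied.append(rel)
    return copied


if __name__ == "__main__":
    # Assumed locations; adjust both to your setup.
    done = copy_artifacts("/content/deepkt-irt", "/content/drive/MyDrive/wd_irt_runs")
    print("Copied:", done)
```

Because the relative layout is preserved, the copied folder can be dropped back into a fresh clone of the repository for later analysis.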