# RAID AI Detector (Colab)

This notebook sets up the repo, installs dependencies, and runs training or tuning in Colab.

In [None]:
# Clone the repo (skip if already in your Drive)
REPO_URL = "https://github.com/epicliem/aidetector.git"
REPO_DIR = "aidetector"

import os
if not os.path.exists(REPO_DIR):
    !git clone {REPO_URL}

%cd {REPO_DIR}

Replace `REPO_URL` above with your GitHub repo URL. If you're using Google Drive instead, mount Drive and `cd` to the repo folder.
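
If the repo lives in Drive, the mount-and-`cd` step might look like the sketch below. It assumes the repo sits at the top level of My Drive under the same folder name as `REPO_DIR`. `repo_path_on_drive` is a hypothetical helper, and the `google.colab` import is guarded so the cell does nothing outside Colab.

```python
import os

def repo_path_on_drive(repo_dir, drive_root="/content/drive/MyDrive"):
    """Build the repo path under the Drive mount (assumed folder layout)."""
    return f"{drive_root}/{repo_dir}"

try:
    from google.colab import drive  # only importable inside Colab
except ImportError:
    drive = None

if drive is not None:
    drive.mount("/content/drive")  # prompts for authorization
    os.chdir(repo_path_on_drive("aidetector"))
    print("Working directory:", os.getcwd())
```

Adjust `drive_root` if you keep the repo in a subfolder.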

In [None]:
# Install dependencies
!pip -q install -r requirements.txt

## TPU setup (Colab)

Switch to a TPU runtime (`Runtime → Change runtime type → TPU`). Install the regular dependencies first (the cell above), then run the next cell to install `torch_xla` and select the TPU backend.

In [None]:
# If you're on TPU, install torch_xla (this will override the torch version)
# You may need to restart the runtime after this install.
!pip -q install torch==2.1.0 torch_xla==2.1.0 -f https://storage.googleapis.com/libtpu-releases/index.html

import os
os.environ["PJRT_DEVICE"] = "TPU"
# os.environ["XLA_USE_BF16"] = "1"  # optional: lower memory, faster
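
A quick sanity check after the install can save a failed training run. The sketch below only reports whether `torch_xla` can be found and what `PJRT_DEVICE` is set to; it does not touch the TPU itself. `xla_status` is a hypothetical helper name.

```python
import importlib.util
import os

def xla_status():
    """Report whether torch_xla is importable and which PJRT device is selected."""
    return {
        "torch_xla_installed": importlib.util.find_spec("torch_xla") is not None,
        "pjrt_device": os.environ.get("PJRT_DEVICE"),
    }

print(xla_status())
```

On a working TPU setup you'd expect `torch_xla_installed` to be `True` and `pjrt_device` to be `TPU`.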

Optional: set `HF_TOKEN` so Hugging Face downloads are authenticated. Unauthenticated requests are rate-limited more strictly.

In [None]:
import os

# Uncomment and paste your token:
# os.environ["HF_TOKEN"] = "YOUR_TOKEN_HERE"
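
Rather than pasting the token into the notebook, you could keep it in Colab's Secrets panel. The sketch below assumes a secret named `HF_TOKEN` with notebook access enabled; `load_hf_token` is a hypothetical helper, and it falls back quietly when run outside Colab or when the secret is missing.

```python
import os

def load_hf_token():
    """Copy HF_TOKEN from Colab secrets into the environment if it isn't set.

    Returns True when HF_TOKEN ends up set, False otherwise.
    """
    if os.environ.get("HF_TOKEN"):
        return True
    try:
        from google.colab import userdata  # only importable inside Colab
        token = userdata.get("HF_TOKEN")
    except Exception:
        # Not in Colab, secret missing, or notebook access not granted.
        return False
    if token:
        os.environ["HF_TOKEN"] = token
        return True
    return False

print("HF_TOKEN set:", load_hf_token())
```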

Optional: create run configs.

- `colab.yaml`: smaller run to sanity check.
- `tpu.yaml`: full run on TPU.
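
Both files set only a handful of keys, which suggests they are layered over defaults from `config/config.yaml`. That is an assumption; check how `scripts/train.py` loads configs before relying on it. If it does merge, the override would behave roughly like this sketch. `deep_merge` and the default values are hypothetical and for illustration only.

```python
import copy

def deep_merge(base, override):
    """Return a new dict: base with override applied, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Hypothetical defaults, for illustration only (not the repo's real values).
defaults = {
    "dataset": {"max_rows": None},
    "training": {"batch_size": 32, "num_epochs": 3, "device": "cuda"},
    "mining": {"enabled": True},
}
colab_override = {
    "dataset": {"max_rows": 20000},
    "training": {"batch_size": 4, "num_epochs": 1},
    "mining": {"enabled": False},
}

effective = deep_merge(defaults, colab_override)
print(effective["training"])
```

Keys the override doesn't mention (here `device`) keep their default value.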

In [None]:
import yaml

colab_cfg = {
    "dataset": {"max_rows": 20000},
    "training": {"batch_size": 4, "num_epochs": 1},
    "mining": {"enabled": False},
}

with open("config/colab.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(colab_cfg, f, sort_keys=False)

print("Wrote config/colab.yaml")

In [None]:
import yaml

# TPU config for a bigger run
# - device: xla
# - no max_rows cap
# - keep mining enabled

big_tpu_cfg = {
    "dataset": {"max_rows": None},
    "training": {
        "device": "xla",
        "xla_distributed": True,
        "xla_cores": 8,
        "batch_size": 8,
        "num_epochs": 2,
    },
    "mining": {"enabled": True, "start_epoch": 2},
}

with open("config/tpu.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(big_tpu_cfg, f, sort_keys=False)

print("Wrote config/tpu.yaml")

## H100 Configs

If you're using an external H100 instance, use `config/h100.yaml` and `config/h100_tuning.yaml`. In Colab, H100 isn't typically available, so these are just listed here for convenience.

In [None]:
# H100 configs (for external GPU instances)
# These are generated locally; copy the files to your instance if needed.
!ls -lah config/h100.yaml config/h100_tuning.yaml

## Train

In [None]:
# Full config (GPU/CPU)
!python scripts/train.py --config config/config.yaml

# Smaller Colab config:
# !python scripts/train.py --config config/colab.yaml

# TPU big run:
# !python scripts/train.py --config config/tpu.yaml

## Hyperparameter Tuning

Note: keep `training.xla_distributed: false` for tuning so metrics can be returned per trial.
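
One way to get a TPU config that is safe for tuning is to copy the training config and switch off the distributed flag. The sketch below works on the same dict used for `config/tpu.yaml`; `make_tuning_config` is a hypothetical helper, and it leaves every other key (including `xla_cores`) untouched.

```python
import copy

def make_tuning_config(cfg):
    """Return a copy of cfg with training.xla_distributed set to False."""
    tuned = copy.deepcopy(cfg)
    tuned.setdefault("training", {})["xla_distributed"] = False
    return tuned

tpu_cfg = {
    "dataset": {"max_rows": None},
    "training": {
        "device": "xla",
        "xla_distributed": True,
        "xla_cores": 8,
        "batch_size": 8,
        "num_epochs": 2,
    },
    "mining": {"enabled": True, "start_epoch": 2},
}

tpu_tuning_cfg = make_tuning_config(tpu_cfg)
print(tpu_tuning_cfg["training"])
```

You could then write `tpu_tuning_cfg` out with `yaml.safe_dump`, as in the config cells, to a file of your choosing (for example a hypothetical `config/tpu_tuning.yaml`) and pass that to `scripts/tune.py`.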

In [None]:
# Full config (GPU/CPU)
!python scripts/tune.py --config config/config.yaml

# Smaller Colab config:
# !python scripts/tune.py --config config/colab.yaml

# TPU tuning: config/tpu.yaml sets xla_distributed: true, so don't use it as-is.
# Set training.xla_distributed to false in it (or in a copy) before running:
# !python scripts/tune.py --config config/tpu.yaml

## TensorBoard

In [None]:
%load_ext tensorboard
%tensorboard --logdir outputs