# Colab Fine-Tuning (LoRA)

This notebook runs a small LoRA fine-tune with `train_lora_hunyuanvideo.py` on preprocessed data and frames. Run it from the repo root.
It expects:
- Model checkpoint in `/content/drive/My Drive/HunyuanVideo-diffusers` (or a subfolder with `model_index.json`).
- Dataset manifest at `/content/drive/My Drive/data/dataset.jsonl` (from the preprocessing/frames notebook).

## Mount Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Paths

In [None]:
from pathlib import Path
MOUNT = Path('/content/drive') / 'My Drive'
MODEL_DIR = MOUNT / 'HunyuanVideo-diffusers'
DATASET_JSONL = MOUNT / 'data' / 'dataset.jsonl'
OUTPUT_DIR = MOUNT / 'outputs' / 'lora'
print('Model dir:', MODEL_DIR)
print('Dataset:', DATASET_JSONL)
print('Output dir:', OUTPUT_DIR)
print('Repo root (current dir):', Path('.').resolve())
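
Because the checkpoint may sit in a subfolder rather than directly in `MODEL_DIR`, a small helper can locate the folder that actually contains `model_index.json`. This is a minimal sketch: `find_model_root` is a hypothetical name, not part of the repo, and it simply picks the first match in sorted order if several subfolders qualify.

```python
from pathlib import Path
from typing import Optional


def find_model_root(base: Path) -> Optional[Path]:
    """Return the folder holding model_index.json: `base` itself or a folder below it.

    Hypothetical helper for illustration; returns None when nothing is found.
    """
    if not base.is_dir():
        return None
    if (base / "model_index.json").is_file():
        return base
    # Sort the matches so the choice is deterministic when several subfolders qualify.
    for candidate in sorted(base.rglob("model_index.json")):
        return candidate.parent
    return None
```

One way to use it is `MODEL_DIR = find_model_root(MODEL_DIR) or MODEL_DIR`, which keeps the original path when no `model_index.json` is found so the later sanity check still reports the problem.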

## Install dependencies
Installs the repo's `requirements.txt` when it is present, otherwise a minimal set of packages. Pin exact versions if you need reproducible runs.

In [None]:
%%bash
set -euo pipefail
python -V
pip install -U pip setuptools wheel >/dev/null 2>&1 || true
# Use requirements.txt when present; otherwise fall back to a minimal set of packages
if [ -f requirements.txt ]; then
  pip install -r requirements.txt
else
  pip install 'diffusers>=0.30' 'transformers>=4.43' 'accelerate>=0.30' safetensors Pillow
fi
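
After installing, it can help to confirm which versions actually ended up in the environment. The sketch below uses the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and `torch` is included on the assumption that the training script needs it (it is not in the fallback list above, since Colab normally ships it).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# 'torch' is an assumption; the other names come from the fallback install line.
PACKAGES = ["torch", "diffusers", "transformers", "accelerate", "safetensors", "Pillow"]
for name, version in installed_versions(PACKAGES).items():
    print(f"{name:>14}: {version or 'NOT INSTALLED'}")
```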


## Sanity checks
Check that the model folder and dataset manifest exist, and show the first 60 lines of the training script's help.

In [None]:
from pathlib import Path
assert Path('train_lora_hunyuanvideo.py').exists(), 'Run this notebook from the repo root.'
print('Model folder exists:', MODEL_DIR.exists())
print('Dataset manifest exists:', Path(DATASET_JSONL).exists())
!python train_lora_hunyuanvideo.py --help | sed -n '1,60p'
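
Beyond checking that the manifest exists, a quick pass over its contents can catch malformed lines before a long training run. The sketch below assumes only that the file is JSON Lines (one JSON value per line). It makes no assumption about which fields the training script expects, and `summarize_manifest` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(path):
    """Parse a JSON Lines file and return (record_count, key_counts, bad_line_numbers)."""
    count, keys, bad = 0, Counter(), []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad.append(lineno)  # remember where parsing failed
                continue
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())  # count how often each field appears
    return count, keys, bad
```

Calling `summarize_manifest(DATASET_JSONL)` and printing the result shows how many records parsed, which keys appear and how often, and which line numbers failed to parse. Compare the keys against what `--help` says the script expects.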

## Train (LoRA)
Small example settings suited for Colab or limited resources. Adjust as needed.

In [None]:
import subprocess
from pathlib import Path
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
args = [
  'python','train_lora_hunyuanvideo.py',
  '--model_id', str(MODEL_DIR),
  '--dataset', str(DATASET_JSONL),
  '--output_dir', str(OUTPUT_DIR),
  '--resolution','384',
  '--num_frames','8',
  '--rank','4',
  '--alpha','8',
  '--lr','1e-4',
  '--batch_size','1',
  '--max_steps','200',
  '--gradient_accumulation','4',
]
import shlex  # quotes arguments that contain spaces, such as paths under 'My Drive'
print('Running:', shlex.join(args))
subprocess.run(args, check=True)
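
To reason about these settings before adjusting them, the arithmetic can be spelled out. This sketch assumes the flags follow common conventions: that `--gradient_accumulation` multiplies the per-step batch, and that the LoRA update is scaled by `alpha / rank`. The training script may define either differently, and whether `--max_steps` counts optimizer updates or micro-batches is not stated here, so both readings are shown.

```python
# Values copied from the example arguments above.
batch_size = 1
gradient_accumulation = 4
max_steps = 200
rank, alpha = 4, 8

# Samples contributing to each optimizer update (assumed convention).
effective_batch = batch_size * gradient_accumulation

# Common LoRA scaling factor; the script may use a different rule.
lora_scale = alpha / rank

# Two readings of --max_steps; which one applies depends on the script.
samples_if_optimizer_steps = max_steps * effective_batch
samples_if_micro_steps = max_steps * batch_size

print("effective batch size:", effective_batch)
print("LoRA scale (alpha / rank):", lora_scale)
print("samples seen if max_steps counts optimizer updates:", samples_if_optimizer_steps)
print("samples seen if max_steps counts micro-batches:", samples_if_micro_steps)
```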

## Artifacts
List the saved LoRA weight files and print their paths.

In [None]:
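from pathlib import Path
# Search recursively, since weights may be saved in subfolders of OUTPUT_DIR
weights = sorted(Path(OUTPUT_DIR).glob('**/lora_weights.pt'))
for p in weights:
    print('Saved:', p)
print('Done.' if weights else f'No lora_weights.pt found under {OUTPUT_DIR}')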
