# DaZZLeD Hash Center Training Notebook (ResNet + Counterfactual VAE)

**Goal:** Train the ResNet Hash Center model from `resnet.tex` with counterfactual VAE, CF‑SimCLR, DHD, PGD, and TTC checks.

**Runtime:** Set Colab to GPU before running training cells.

**Note:** Step 4 needs trained VAE weights. If you do not have them yet, run Step 3 first. For a quick run without the VAE, pass `--counterfactual-mode aug` in Step 4 instead of `--counterfactual-mode vae`.
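
To confirm that the GPU runtime is actually active before starting a long training cell, a quick check along these lines may help. It is a minimal sketch that only looks for the `nvidia-smi` tool; `gpu_visible` is a hypothetical helper and is not part of the DaZZLeD repo.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi -L` runs successfully, else False."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA tooling on PATH, so the runtime is most likely CPU-only.
        return False
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `gpu_visible()` in a cell and getting `False` suggests the runtime type still needs to be switched to GPU.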


## 0. Mount Google Drive


In [None]:
from google.colab import drive
from pathlib import Path

drive.mount('/content/drive')

DRIVE_ROOT = Path("/content/drive/MyDrive/dazzled")
DATA_ROOT = DRIVE_ROOT / "data"
OUTPUT_ROOT = DRIVE_ROOT / "outputs"
OUTPUT_ROOT.mkdir(parents=True, exist_ok=True)


## 1. Setup & Installation


In [None]:
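import os

# Clone once, then enter ml-core. The absolute path keeps this cell safe to re-run
# (assumes Colab's default /content working directory).
REPO_DIR = '/content/DaZZLeD'
if not os.path.exists(REPO_DIR):
    !git clone https://github.com/D13ya/DaZZLeD.git {REPO_DIR}
%cd {REPO_DIR}/ml-core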

!pip install -q -r requirements.txt


## 2. Build Manifest (Optional)

If you already have a manifest at `/content/drive/MyDrive/dazzled/manifests/train.txt`, you can skip this.


In [None]:
from pathlib import Path

DATA_ROOT = Path("/content/drive/MyDrive/dazzled/data")
MANIFEST = Path("/content/drive/MyDrive/dazzled/manifests/train.txt")
MANIFEST.parent.mkdir(parents=True, exist_ok=True)

exts = {".jpg", ".jpeg", ".png", ".bmp"}
paths = [str(p) for p in DATA_ROOT.rglob("*") if p.suffix.lower() in exts]
MANIFEST.write_text("\n".join(paths))  # one image path per line
print(f"Wrote {len(paths)} lines to {MANIFEST}")
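
Before training, it can be worth checking that every manifest entry still points at a real file, since Drive paths move easily. The sketch below assumes the one-path-per-line format written by the cell above; `check_manifest` is a hypothetical helper, not something the repo provides.

```python
from pathlib import Path


def check_manifest(manifest_path):
    """Split manifest entries into (existing, missing) lists of path strings."""
    lines = Path(manifest_path).read_text().splitlines()
    # Ignore blank lines and surrounding whitespace.
    entries = [line.strip() for line in lines if line.strip()]
    existing = [p for p in entries if Path(p).is_file()]
    missing = [p for p in entries if not Path(p).is_file()]
    return existing, missing
```

For example, `existing, missing = check_manifest(MANIFEST)` followed by `print(len(existing), len(missing))` gives a quick count of usable and stale entries.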


## 3. Train Counterfactual VAE (Save Weights)

This produces the `--counterfactual-weights` file used by HashNet.


In [None]:
!python training/train_counterfactual_vae.py \
  --data-list /content/drive/MyDrive/dazzled/manifests/train.txt \
  --epochs 10 \
  --batch-size 128 \
  --checkpoint-dir /content/drive/MyDrive/dazzled/outputs/cf_vae \
  --domain-mode regex \
  --domain-regex "(ffhq|openimages|openimg|mobileview)" \
  --cache-ram
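
The `--domain-regex` value is easier to reason about with a small example. The sketch below shows one way a regex like this could map an image path to a domain label. It assumes the first capture group found in the lowercased path is the label, which may not be what `train_counterfactual_vae.py` actually does; `domain_of` and `domain_counts` are hypothetical helpers.

```python
import re
from collections import Counter

# Same pattern as the --domain-regex flag above.
DOMAIN_RE = re.compile(r"(ffhq|openimages|openimg|mobileview)")


def domain_of(path, pattern=DOMAIN_RE):
    """Return the first captured domain name in `path`, or None if none matches."""
    # Lowercasing is an assumption made here for convenience.
    match = pattern.search(path.lower())
    return match.group(1) if match else None


def domain_counts(paths):
    """Count how many paths fall into each domain (None = unmatched)."""
    return Counter(domain_of(p) for p in paths)
```

Running `domain_counts(paths)` on the manifest entries shows whether any images fall outside the listed domains, which would appear under the `None` key.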


## 4. Train HashNet (ResNet + Hash Centers + CF/DHD/PGD)

This uses the VAE weights from Step 3 and writes checkpoints to Drive.


In [None]:
!python training/train_hashnet.py \
  --data-list /content/drive/MyDrive/dazzled/manifests/train.txt \
  --epochs 10 \
  --batch-size 256 \
  --center-mode hadamard \
  --center-neg-k 0 \
  --counterfactual-mode vae \
  --counterfactual-weights /content/drive/MyDrive/dazzled/outputs/cf_vae/cf_vae_final.safetensors \
  --checkpoint-dir /content/drive/MyDrive/dazzled/outputs/hashnet \
  --domain-mode regex \
  --domain-regex "(ffhq|openimages|openimg|mobileview)" \
  --ttc-check \
  --cache-ram
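
The flag `--center-mode hadamard` suggests that hash centers are taken from the rows of a Hadamard matrix, a common choice because any two distinct rows differ in exactly half of their positions. The sketch below illustrates that idea with the Sylvester construction in plain Python. It is an assumption about what the flag means, not the code in `train_hashnet.py`, and `hadamard`, `hash_centers`, and `hamming` are hypothetical names.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def hash_centers(num_centers, hash_dim):
    """Pick up to 2 * hash_dim centers in {-1, +1}^hash_dim from H and -H."""
    h = hadamard(hash_dim)
    pool = h + [[-x for x in row] for row in h]
    if num_centers > len(pool):
        raise ValueError("at most 2 * hash_dim centers are available")
    return pool[:num_centers]


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))
```

With a 128-bit code, as used in the inference step, `hamming(c[0], c[1])` for `c = hash_centers(4, 128)` is 64, so the centers start out well separated.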


## 5. List Checkpoints


In [None]:
from pathlib import Path

CKPT_DIR = Path("/content/drive/MyDrive/dazzled/outputs/hashnet")
ckpts = sorted(CKPT_DIR.glob("*.safetensors"))
print(f"Found {len(ckpts)} checkpoints")
for ckpt in ckpts:
    print(ckpt.name)


## 6. TTC Inference (Production-Style)

Run the standalone TTC inference script on a sample image.


In [None]:
from pathlib import Path

CKPT_DIR = Path("/content/drive/MyDrive/dazzled/outputs/hashnet")
IMAGE_PATH = "/content/drive/MyDrive/dazzled/data/ffhq/224/00000.jpg"  # TODO: set a real path

ckpts = sorted(CKPT_DIR.glob("*.safetensors"))
if not ckpts:
    raise FileNotFoundError(f"No checkpoints in {CKPT_DIR}")

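# Pick the newest file by modification time; name order can misplace e.g. epoch 10 before epoch 9.
checkpoint = str(max(ckpts, key=lambda p: p.stat().st_mtime))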
print(f"Using checkpoint: {checkpoint}")

!python inference.py \
  --image "{IMAGE_PATH}" \
  --checkpoint "{checkpoint}" \
  --backbone resnet50 \
  --hash-dim 128 \
  --proj-dim 512 \
  --ttc-views 8 \
  --stability-threshold 0.9 \
  --hamming-threshold 10
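
The flags `--ttc-views`, `--stability-threshold`, and `--hamming-threshold` hint at a multi-view consistency check. The sketch below shows one plausible reading: hash several views of the image, take a bitwise majority vote, and report the fraction of views that stay within a Hamming radius of that consensus. This is an illustration only. `inference.py` may define these quantities differently, and all function names here are hypothetical.

```python
def majority_hash(view_hashes):
    """Bitwise majority vote over equal-length 0/1 hashes (ties resolve to 1)."""
    n = len(view_hashes)
    return [1 if 2 * sum(bits) >= n else 0 for bits in zip(*view_hashes)]


def hamming(a, b):
    """Number of positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def stability(view_hashes, hamming_threshold):
    """Return (consensus hash, fraction of views within the Hamming threshold)."""
    consensus = majority_hash(view_hashes)
    close = sum(hamming(h, consensus) <= hamming_threshold for h in view_hashes)
    return consensus, close / len(view_hashes)
```

Under this reading, a result would count as stable when the returned fraction is at least the stability threshold, for example 0.9 with 8 views and a Hamming threshold of 10.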
