# MonReader

Our company develops innovative Artificial Intelligence and Computer Vision solutions that revolutionize industries.

* Machines that can see: we pack our solutions into small yet intelligent devices that can be easily integrated into your existing data flow.
* Computer vision for everyone: our devices can recognize faces, estimate age and gender, classify clothing types and colors, identify everyday objects and detect motion.
* Technical consultancy: we help you identify use cases of artificial intelligence and computer vision in your industry.

Artificial intelligence is the technology of today, not the future.

MonReader is a new mobile document digitization experience for the blind, for researchers, and for anyone else in need of fully automatic, fast, high-quality bulk document scanning. It is a mobile app: all the user needs to do is flip pages, and MonReader handles everything else. It detects page flips from the low-resolution camera preview and takes a high-resolution picture of the document, recognizes its corners and crops it accordingly, dewarps the cropped document to obtain a bird's-eye view, sharpens the contrast between the text and the background, and finally recognizes the text with formatting kept intact. The recognized text is then further corrected by MonReader's ML-powered redactor.

The dataset was collected from page-flipping videos recorded on smartphones. The videos were clipped into short segments, and each segment was labelled as flipping or not flipping. The extracted frames were then saved to disk in sequential order with the following naming structure: VideoID_FrameNumber

* Goal(s):   
Predict if the page is being flipped using a single image.

In [2]:
import os
from collections import Counter

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, transforms

#train_flip_dir = '../data/raw/training/flip'


This is a guided tour of the code in the cells below: what each chunk does and why it is written that way. It traces three things: (1) how `ImageFolder` + `DataLoader` ingest the train/test folders, (2) how the **training** set is split into **train/20% validation** (stratified), and (3) how **VGG16** is built and trained in two phases.

---

## 0) Reproducibility & config

* `set_seed(...)` locks down Python/Torch (and NumPy if present) plus CuDNN determinism so the split and training are repeatable.
* `Config` holds hyperparameters (classes, image size, batch size, workers) and—crucially—the two folder paths:

  * `TRAIN_DIR = "/path/to/dataset/training"`
  * `TEST_DIR  = "/path/to/dataset/testing"`

Folder layout expected:

```
training/
  flip/
  notflip/
testing/
  flip/
  notflip/
```

`ImageFolder` infers the class label from subfolder names (`flip` and `notflip`), mapping them to integer ids in `class_to_idx`.
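
As a rough illustration of how that mapping comes about, the sketch below mimics alphabetical class discovery using only the standard library. The temporary directory and the helper name `find_classes_sketch` are assumptions made for this example; this is not torchvision's own code, only an imitation of its sorted-by-folder-name behavior.

```python
import os
import tempfile
from typing import Dict


def find_classes_sketch(root: str) -> Dict[str, int]:
    """Map each subfolder name to an integer id, in alphabetical order."""
    class_names = sorted(e.name for e in os.scandir(root) if e.is_dir())
    return {name: idx for idx, name in enumerate(class_names)}


# Build a throwaway directory tree shaped like the expected training folder.
with tempfile.TemporaryDirectory() as root:
    for name in ("notflip", "flip"):  # creation order does not matter
        os.makedirs(os.path.join(root, name))
    class_to_idx = find_classes_sketch(root)

print(class_to_idx)
```

Because the ids follow the sorted folder names, renaming a class folder can silently change which integer means "flip", so it is worth printing `class_to_idx` once before training.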

---

## 1) Transforms (ImageNet normalization is key)

Two transform pipelines:

* `train_transforms`: random, label-safe augmentation for generalization

  * `Resize(int(224*1.14))` then `RandomResizedCrop(224)`: standard ImageNet-ish policy
  * `RandomRotation(8)`, `ColorJitter(...)`: small perturbations
  * `Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])`: **must** match ImageNet or the pretrained filters won’t “see” properly

* `eval_transforms`: deterministic for validation/test

  * `Resize` → `CenterCrop` → `Normalize`

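Note that the labels describe whether a page is being flipped, not the orientation of the image. The code still **avoids** horizontal/vertical flips in training augs as a conservative choice, since mirrored or upside-down frames are unlikely in real phone captures.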
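
To make the resize and normalization numbers concrete, here is a small pure-Python sketch of the arithmetic behind `Resize(int(224*1.14))` and `Normalize(...)`. The helper `normalize_pixel` is a hypothetical name for this example; it applies the same per-channel `(x - mean) / std` formula to a single RGB pixel rather than to a tensor.

```python
IMG_SIZE = 224
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# Resize target for the short side: int() truncates 255.36, so this is 255.
resize_side = int(IMG_SIZE * 1.14)


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to an RGB pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


# A pixel exactly at the ImageNet mean maps to zero in every channel.
mean_pixel = normalize_pixel(IMAGENET_MEAN)
# A pure white pixel (1.0 after scaling to [0, 1]) lands a bit above 2.
white_pixel = normalize_pixel([1.0, 1.0, 1.0])

print(resize_side, mean_pixel, [round(v, 3) for v in white_pixel])
```

The point of the sketch is that normalized inputs are centered near zero with roughly unit spread, which is the input range the pretrained filters were fitted to.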

---

## 2) ImageFolder ingestion + stratified 80/20 split

### Build two ImageFolder views on the **same training directory**

```python
train_full_eval = ImageFolder(TRAIN_DIR, transform=eval_transforms)
train_full_aug  = ImageFolder(TRAIN_DIR, transform=train_transforms)
```

Why two? You want **the same files**, but with different transform policies:

* the **train** subset uses augmentations (`train_full_aug`)
* the **val** subset uses deterministic transforms (`train_full_eval`)

### Pull labels and make a stratified split

```python
all_targets = train_full_eval.targets
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.20, random_state=seed)
train_idx, val_idx = next(splitter.split(range(len(all_targets)), all_targets))
```

* `ImageFolder.targets` is a list of class ids (e.g., 0 for `flip`, 1 for `notflip`).
* `StratifiedShuffleSplit` guarantees the 80/20 split preserves the **class ratio** (your dataset is ~49/51 already).
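
If scikit-learn is not installed (the import cell further down fails with `ModuleNotFoundError`), a stratified split can be approximated with the standard library. The sketch below is a swapped-in technique, not scikit-learn's implementation: the function name `stratified_split` and the toy labels are assumptions, and it will not reproduce the same indices as `StratifiedShuffleSplit`.

```python
import random
from collections import defaultdict


def stratified_split(targets, val_fraction=0.20, seed=42):
    """Return (train_idx, val_idx), holding out about val_fraction of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # shuffle within the class only
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy labels: 49 images of class 0 and 51 of class 1.
toy_targets = [0] * 49 + [1] * 51
train_idx, val_idx = stratified_split(toy_targets)
print(len(train_idx), len(val_idx))
```

Splitting each class separately is what keeps the class ratio close to the original in both subsets; the resulting index lists can be passed to `Subset` in the same way.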

### Apply the **same index split** to both views via `Subset`

```python
train_ds = Subset(train_full_aug, train_idx)   # training with augs
val_ds   = Subset(train_full_eval, val_idx)    # validation, no randomness
```

This pattern ensures consistency: you’re selecting the same filenames for train vs val, only changing the transform policy.
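
The "two views, one index split" idea can be sketched without any libraries. `ToyFolder` and `ToySubset` below are hypothetical stand-ins for `ImageFolder` and `Subset`, and the string "transforms" are placeholders; the sketch only shows that each subset reads from the shared file list through its own transform.

```python
class ToyFolder:
    """Hypothetical stand-in for ImageFolder: shared samples, its own transform."""

    def __init__(self, samples, transform):
        self.samples = samples
        self.transform = transform

    def __getitem__(self, idx):
        return self.transform(self.samples[idx])


class ToySubset:
    """Hypothetical stand-in for Subset: exposes only the chosen indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.dataset[self.indices[i]]


files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
aug_view = ToyFolder(files, transform=lambda f: f"augmented({f})")
eval_view = ToyFolder(files, transform=lambda f: f"plain({f})")

train_part = ToySubset(aug_view, [0, 1, 3, 4])  # training indices
val_part = ToySubset(eval_view, [2])            # validation indices
print(train_part[0], val_part[0])
```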

### Testing dataset

```python
test_ds = ImageFolder(TEST_DIR, transform=eval_transforms)
```

Testing images get the deterministic pipeline; no augmentation.

---

## 3) DataLoaders for IO, batching, and shuffling

```python
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True,  ...)
val_loader   = DataLoader(val_ds,   batch_size=32, shuffle=False, ...)
test_loader  = DataLoader(test_ds,  batch_size=32, shuffle=False, ...)
```

* Training loader shuffles each epoch (good!).
* Validation/test loaders don’t shuffle, giving stable evaluation.
* `num_workers` runs image loading and augmentation in background processes, and `pin_memory` speeds up host→GPU transfers.
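
For a feel of what batching means in numbers, this sketch computes how many mini-batches one epoch yields and how large the final batch is. The sample count of 1000 is a made-up figure, and the helper names are assumptions; the `drop_last` flag mirrors the `DataLoader` option of the same name, which defaults to keeping the incomplete last batch.

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of mini-batches one pass over the data yields."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples: int, batch_size: int) -> int:
    """Size of the final batch when incomplete batches are kept."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


# Hypothetical split size; the real count depends on the dataset.
n_train = 1000
print(batches_per_epoch(n_train, 32), last_batch_size(n_train, 32))
```

The smaller final batch is the reason epoch metrics are usually accumulated per sample rather than averaged per batch.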

---

## 4) Build VGG16 for binary classification

### Load pretrained backbone

```python
vgg = models.vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
```

This pulls in the ImageNet-trained filters (the “visual grammar”).

### Replace the classifier head

The script uses a **lighter**, two-layer head (recommended for small, binary tasks):

```python
vgg.classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512*7*7, 512),
    nn.ReLU(inplace=True),
    nn.Dropout(0.4),
    nn.Linear(512, 2)   # flip vs notflip
)
```

* VGG16’s feature extractor outputs `512x7x7` for 224×224 inputs → flatten to 25088.
* Smaller head = fewer parameters and less overfitting than the giant 4096→4096 original.
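
The size difference between the two heads can be checked with plain arithmetic. The sketch below counts weights and biases of the fully connected layers only (ReLU and Dropout have no parameters); `linear_params` is a helper name introduced for this example.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


flat = 512 * 7 * 7  # 25088 features after flattening

# Original VGG16 classifier: 25088 -> 4096 -> 4096 -> 1000
original_head = (
    linear_params(flat, 4096)
    + linear_params(4096, 4096)
    + linear_params(4096, 1000)
)
# Replacement head from the snippet above: 25088 -> 512 -> 2
light_head = linear_params(flat, 512) + linear_params(512, 2)

print(f"original: {original_head:,}")
print(f"light:    {light_head:,}")
print(f"ratio:    {original_head / light_head:.1f}x")
```

Under these counts the original head has roughly 124M parameters and the lighter one roughly 12.8M, so the replacement head is close to ten times smaller.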

---

## 5) Training in two phases (best practice for transfer learning)

### Phase 1: “Linear probe”

```python
for p in vgg.features.parameters():
    p.requires_grad = False
optimizer = AdamW(head_params, lr=1e-3, weight_decay=1e-4)
```

* **Freeze** all convolutional blocks → only train your new classifier head.
* Use a higher LR for the randomly initialized head.
* Train a few epochs; this teaches the head to map the already-good features to your two classes.

### Phase 2: Gradual unfreeze (fine-tune the highest-level features)

```python
for p in vgg.features[-10:].parameters():   # Block 5 plus the last conv of Block 4
    p.requires_grad = True

optimizer = AdamW([
  {"params": backbone_params, "lr": 2e-5},  # tiny LR for pretrained layers
  {"params": head_params,     "lr": 1e-4},  # bigger LR for head
], weight_decay=1e-4)
```

* Unfreeze the **last conv block** (Block 5; the `features[-10:]` slice also reaches the final conv layer of Block 4) so it can adapt from “ImageNet semantics” to “page-flip semantics.”
* Use **discriminative LRs**: small for backbone, larger for the head, to avoid wrecking pretrained filters.
* Keep best validation checkpoint.
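
To see which layers a slice such as `features[-10:]` actually covers, the module indices can be derived from the VGG16 layer configuration. This is a sketch under the assumption of the plain (no batch norm) VGG16 layout, where every conv is followed by a ReLU and each block ends with a max-pool; `block_ranges` is a name made up for this example.

```python
# VGG16 layout (configuration "D"): numbers are conv output channels,
# "M" marks the max-pooling layer that closes a block.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def block_ranges(cfg):
    """Return ({block: (first_index, last_index)}, total_module_count).

    Assumes conv + ReLU pairs (two module slots per conv entry) and one
    slot per pooling entry.
    """
    ranges, index, start, block = {}, 0, 0, 1
    for entry in cfg:
        if entry == "M":
            ranges[block] = (start, index)  # the pool sits at `index`
            index += 1
            start, block = index, block + 1
        else:
            index += 2  # conv followed by ReLU
    return ranges, index


ranges, num_modules = block_ranges(VGG16_CFG)
first_unfrozen = num_modules - 10  # where features[-10:] begins
print(ranges[5], ranges[4], first_unfrozen)
```

Under this layout Block 5 spans indices 24 to 30, while `features[-10:]` starts at index 21, the last conv of Block 4. Slicing with `features[24:]` would unfreeze Block 5 only.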

### AMP for speed

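`autocast` + `GradScaler` turn on mixed precision when CUDA is available (both are created with `enabled=use_amp`). This typically speeds up training on GPUs, and the gradient scaling helps keep float16 gradients from underflowing.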
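
Why scaling matters can be shown with the standard library alone, since `struct` can round-trip a value through IEEE 754 half precision. The gradient value and the scale factor below are illustrative assumptions, not values taken from the training run, and the sketch only imitates the idea behind loss scaling rather than what `GradScaler` does internally.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


tiny_grad = 1e-8   # hypothetical gradient, below float16's smallest subnormal
scale = 65536.0    # illustrative scale factor (2**16)

unscaled = to_half(tiny_grad)                   # underflows to zero
recovered = to_half(tiny_grad * scale) / scale  # survives, then is unscaled

print(unscaled, recovered)
```

Multiplying before the cast moves the value into float16's representable range, and dividing afterwards in higher precision recovers it to within float16's rounding error.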

---

## 6) Evaluation helpers

* `run_one_epoch(...)` handles a full pass (train or eval) and returns average loss + accuracy.
* `validate(...)` just calls the same loop with `no_grad()` and eval mode.
* There’s a **test evaluation** function you run **once at the end** (after model selection and tuning are done) to get a clean, unbiased final score.
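
The per-epoch averages returned by `run_one_epoch(...)` are weighted by batch size. The sketch below reproduces that bookkeeping on made-up per-batch numbers; `aggregate` and the toy values are assumptions for illustration only.

```python
def aggregate(batches):
    """Combine per-batch (mean_loss, num_correct, batch_size) into epoch metrics."""
    loss_sum, correct, total = 0.0, 0, 0
    for mean_loss, num_correct, batch_size in batches:
        loss_sum += mean_loss * batch_size  # undo the per-batch mean
        correct += num_correct
        total += batch_size
    return loss_sum / total, correct / total


# Hypothetical numbers: two full batches of 32 and a final batch of 8.
toy_batches = [(0.50, 24, 32), (0.40, 26, 32), (1.00, 4, 8)]
avg_loss, accuracy = aggregate(toy_batches)
naive_loss = sum(b[0] for b in toy_batches) / len(toy_batches)

print(round(avg_loss, 4), round(accuracy, 4), round(naive_loss, 4))
```

Averaging the three batch means directly would overweight the small last batch, which is why the loop multiplies each batch loss by its size before summing.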

---

## TL;DR (the flow in one breath)

* Read folders with `ImageFolder` → pull labels → **stratified 80/20 split** using indices.
* Wrap those subsets in `DataLoader`s (train shuffles, val/test don’t).
* Load **ImageNet-pretrained VGG16**, swap in a small **2-class** head.
* **Phase 1:** freeze conv layers and train only the head.
* **Phase 2:** unfreeze the last conv block and fine-tune with a small LR.
* Track validation, save best checkpoint, and only then run the **test** evaluation once.



In [1]:
# libraries to import
from __future__ import annotations
import os
import random
from dataclasses import dataclass
from typing import Tuple, List

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms, models

# For a clean, stratified split
from sklearn.model_selection import StratifiedShuffleSplit

ModuleNotFoundError: No module named 'sklearn'

In [None]:
# Reproducibility helper
def set_seed(seed: int = 42) -> None:
    """Set seeds for python, numpy (if used), and torch to have reproducible splits/training."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except Exception:
        pass
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make CuDNN deterministic (slightly slower but reproducible)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

In [None]:
@dataclass
class Config:
    # shape final classifier to flip vs notflip
    num_classes: int = 2
    img_size: int = 224               # VGG16 default input size
    # pass to every DataLoader to set how many images per step
    batch_size: int = 32
    # set default amount of workers (background processes to load/augment data per DataLoader)
    num_workers: int = 4
    pin_memory: bool = True
    seed: int = 42 # choosing seed for reproducible results

    # file system pointers to the training and testing images, fed into torchvision.datasets.ImageFolder
    # training root containing one subfolder per class
    TRAIN_DIR: str = "/path/to/dataset/training"
    # held out test root, not touched during training/hyperparameter tuning
    TEST_DIR: str  = "/path/to/dataset/testing"

cfg = Config()
set_seed(cfg.seed)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


In [None]:
# Define two transform pipelines 
#   - training (with light augmentation)
#   - validation/testing 
# VGG16 was pretrained on ImageNet with these stats.
# Keeping these values preserves the meaning of its filters.
# RGB mean/std used when VGG16 was trained on ImageNet
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD  = [0.229, 0.224, 0.225]
"""
    # A light set of augmentations to improve generalization while preserving label semantics.
    # 1. Scales the short side to int(224 * 1.14) = 255 pixels.
    # 2. Randomly chooses a crop, then resizes that crop to 224x224, to train the model to be robust to framing/zoom changes.
    # 3. Rotates the image by up to 8 degrees, a small label-preserving perturbation.
    # 4. ColorJitter for mild brightness/contrast/saturation/hue changes to help resist lighting and white-balance quirks
    #    from phone cameras.
    # 5. Converts a PIL image to a PyTorch tensor shaped [C,H,W].
    # 6. Normalizes each channel with the ImageNet mean and std.

    Net effect is a moderate, label-safe augmentation.
    """
train_transforms = transforms.Compose([
    transforms.Resize(int(cfg.img_size * 1.14)),    # 1      
    transforms.RandomResizedCrop(cfg.img_size),     # 2          
    transforms.RandomRotation(8),   # 3    
    transforms.ColorJitter(0.1, 0.1, 0.1, 0.05),    # 4
    transforms.ToTensor(),      # 5
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),      # 6 
    ])

# Validation/Test should be deterministic and comparable across epochs
eval_transforms = transforms.Compose([
    transforms.Resize(int(cfg.img_size * 1.14)),
    transforms.CenterCrop(cfg.img_size),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

In [None]:
# We instantiate TWO ImageFolder objects pointing to the SAME training directory:
#   - one with train_transforms / training (random augments)
#   - one with eval_transforms / validation (deterministic transforms)
# Then we apply the SAME index split (train_idx / val_idx) to both via Subset.
# This pattern keeps transform policies separate while reusing the same file list.

# create an ImageFolder that reads file paths under TRAIN_DIR and applies deterministic transforms (resize, center crop, normalize)
train_full_eval = datasets.ImageFolder(cfg.TRAIN_DIR, transform=eval_transforms)
# create 2nd ImageFolder over the same directory but apply random augmentations (crop,rotation,jitter) for training
train_full_aug  = datasets.ImageFolder(cfg.TRAIN_DIR, transform=train_transforms)

# grab numeric labels for every image to feed the stratified splitter
all_targets: List[int] = train_full_eval.targets

# make a stratified 80/20 split on the indices of all_targets
# Stratified split: hold out 20% of training images as validation while preserving class ratios
split_ratio = StratifiedShuffleSplit(n_splits=1, test_size=0.20, random_state=cfg.seed)
train_idx, val_idx = next(split_ratio.split(X=range(len(all_targets)), y=all_targets))

# wrap the augmented dataset in a Subset that exposes only the training indices
# Result is a training dataset that applies random transform on the training images. 
training_dataset = Subset(train_full_aug, train_idx)

# Do the same wrap on validation 
validation_dataset   = Subset(train_full_eval, val_idx) 

# Build the final testing dataset (never augmented)
# Testing directory is separate and already labeled by subfolders
test_ds = datasets.ImageFolder(cfg.TEST_DIR, transform=eval_transforms)

# check to see how many images ended up in each split 
print(f"Training images: {len(training_dataset)} | Validation Images: {len(validation_dataset)} | Testing Images: {len(test_ds)}")
# check to see class names and numeric mapping
print(f"Classes: {train_full_eval.classes} (class_to_idx={train_full_eval.class_to_idx})")


NameError: name 'datasets' is not defined

In [None]:
"""
Here I implement the DataLoaders, which turn my Datasets into mini-batches, shuffle them,
and use background worker processes to load/transform images in parallel.
Batching: packs single (image, label) samples into [B,C,H,W] tensors so the model trains in vectorized chunks.
Shuffling: randomizes sample order each epoch (train only) to stabilize SGD and reduce bias from data order.
Data plumbing: feeds the right split (train/val/test) to the model with the right behavior (augment+shuffle for train; deterministic for eval).
"""
# DataLoader handles batching, shuffling (for training), and parallel disk I/O.
train_loader = DataLoader(
    training_dataset,
    batch_size=cfg.batch_size,
    shuffle=True,                 # shuffle only for training
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_memory,
    persistent_workers=True
)

val_loader = DataLoader(
    validation_dataset,
    batch_size=cfg.batch_size,
    shuffle=False,      # False for reproducible and comparable metrics across runs
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_memory,
    persistent_workers=True
)

test_loader = DataLoader(
    test_ds,
    batch_size=cfg.batch_size,
    shuffle=False,
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_memory,
    persistent_workers=True
)

In [None]:
# Build VGG16 backbone and adapt it for 2-class output
# Simple, stacked 3x3 conv blocks → predictable behavior & easy fine-tuning

# Load ImageNet-pretrained VGG16, transfer learning
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Option for binary tasks: Replace the whole classifier with a lighter head to reduce parameters and improve generalization.
# VGG16 feature extractor ends with 512 feature maps at spatial size 7x7 (for 224x224 inputs).
# Flattened dimension = 512 * 7 * 7 = 25088.
# Original head: 25088-4096-4096-1000 with ReLUs & Dropout, high overfitting risk on this small dataset.
# Chosen new head: 25088-512-2 with ReLU & Dropout; with a binary task and limited data I don't need the original classifier's roughly 124M parameters.
# Backbone already encodes rich features, a compact head is enough to separate two classes.

vgg.classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 512),
    nn.ReLU(inplace=True),
    nn.Dropout(0.4),
    nn.Linear(512, cfg.num_classes)
)
# moves weights to GPU if available else CPU
vgg = vgg.to(device)

# Phase 1: Linear Probe
# Freeze the convolutional blocks and train only the new classifier head to see quickly if this head
# can separate classes with generic ImageNet features. 
# Freeze the backbone by telling autograd not to store gradients for any parameters inside vgg.features
for parameter in vgg.features.parameters():
    parameter.requires_grad = False

# Optimizer & loss: use a higher LR for the head since it's randomly initialized
# The backbone now has requires_grad=False, so this filter keeps only the trainable params (the classifier head)
head_params = [param for param in vgg.parameters() if param.requires_grad]

# Only the head's parameters are optimized with AdamW; the higher LR lets the randomly initialized head move quickly to a good region
# Later, during fine-tuning, param groups give the backbone a tiny LR and the head a larger LR
optimizer = torch.optim.AdamW(head_params, lr=1e-3, weight_decay=1e-4)
# multi class classification loss
criterion = nn.CrossEntropyLoss()

""" 
Code walkthrough during training:
1. Forward: images > frozen backbone > features > trainable head > logits.
2. Loss: criterion(logits, targets) computes a scalar loss.
3. Backward: only head params accumulate gradients (backbone params don't).
4. Step: optimizer.step() updates the head only.
5. Result: learn a linear/nonlinear separator on top of fixed features, which is fast and low-variance.
"""

# Mixed-precision speeds up training on GPUs without numeric instability.
use_amp = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


In [None]:
# Create two helper functions for training and validating. 

# function for training to do a full pass over a DataLoader.
# Runs the model forward, computes the loss, does backprop + optimizer step, and aggregates avg loss and accuracy
def run_one_epoch(model: nn.Module, loader: DataLoader, train: bool=True) -> Tuple[float, float]:
    """
    Process one full pass over a dataset split (training or validation) and returns metrics. 
    1. Sets the mode, toggle between training / evaluation
    2. Iterates batches, pulls image/label from DataLoader
    3. Moves to device
    4. Forward pass (optionally AMP under autocast), computes logits and loss
    5. Optimizes if and only if training
        - zeros gradients, backprops with loss scaling, steps the optimizer via the scaler
    6. Keeps running sum of losses and correct predictions, then computes avg loss and accuracy over entire epoch.
    Returns (avg_loss, accuracy).
    """
    # Toggle model's mode, if True enable training mode: Dropout active, BatchNorm uses batch stats.
    # If False eval mode: Dropout off, BatchNorm uses running stats.
    model.train(train)

    # Initialize counters for dataset metrics
    total, correct_pred, loss_sum = 0,0,0.0
    # Iterate over the DataLoader, each iteration yields a mini batch
    for image, label in loader:
        image, label = image.to(device), label.to(device)   # move tensors to selected device

        # autocast enables mixed precision only when CUDA is available (use_amp)
        with torch.cuda.amp.autocast(enabled=use_amp):
            logits = model(image)   # forward pass
            loss = criterion(logits, label) # compute CrossEntropyLoss 
        # Only do optimizer during training
        if train:
            # clear old gradients
            optimizer.zero_grad(set_to_none=True)
            # backward pass with loss scaling (AMP)
            scaler.scale(loss).backward() 
            scaler.step(optimizer)
            scaler.update() # adjust scaling factor for the next iteration
        # accumulate sum of losses over examples
        loss_sum += loss.item() * label.size(0)
        predictions = logits.argmax(dim=1) # argmax over class dimension to get predicted class indices
        # count how many predictions matched ground truth in this batch
        correct_pred += (predictions == label).sum().item()
        total += label.size(0)
    # return avg loss and accuracy
    return loss_sum / total, correct_pred / total

# Switch model to eval mode, disable gradient tracking 
def validate(model: nn.Module) -> Tuple[float, float]:
    model.eval() # put model in evaluation mode
    # Disable gradient tracking
    with torch.no_grad():
        # run run_one_epoch func using validation loader
        return run_one_epoch(model, val_loader, train=False)

In [None]:
# Phase 1: linear-probe warmup
# Keep the VGG backbone frozen and train only the new classifier head for a few epochs.
# Goal: quickly learn a decision boundary on top of the generic ImageNet features, check that learning
# is happening and get a stable point before unfreezing conv layers. 
""" 
Reasons: 
- Training only the small head first avoids blasting the pretrained conv filters with large random gradients.
- Fewer trainable parameters leads to a quick convergence signal. 
- Diagnostics: If the head can't reach decent validation accuracy with the backbone frozen, then the issue is likely in the
transforms/labels/splits, not in the fine-tuning.
"""
EPOCHS_HEAD = 5 # short warmup to train the new classifier head
for epoch in range(EPOCHS_HEAD):
    training_loss, training_accuracy = run_one_epoch(vgg, train_loader, train=True)
    validation_loss, validation_accuracy = validate(vgg)
    print(f"[Head] Epoch {epoch+1:02d} | Training Accuracy: {training_accuracy:.4f} | Validation Accuracy: {validation_accuracy:.4f} ")

In [None]:
# Phase 2: unfreeze last conv block(s) and fine tune. 
""" 
After linear probe, start to unfreeze small part of the ImageNet backbone and fine tune gently with small LR. 
Adapt high-level ImageNet features to my (flip vs notflip) specific task without wrecking the useful low-level
filters the model already learned. 
"""

# Turn training back on for roughly the last VGG conv block (5): features[-10:] covers Block 5 plus the last conv of Block 4.
# The deepest conv layers encode the most task-specific abstractions and benefit the most from fine-tuning;
# earlier layers are left frozen to avoid overfitting.
for layer in vgg.features[-10:].parameters():
    layer.requires_grad = True

# Give different learning rates, a smaller LR for the backbone and a higher LR for the head classifier
backbone_parameters = [layer for layer in vgg.features.parameters() if layer.requires_grad]
head_parameters = [layer for layer in vgg.classifier.parameters() if layer.requires_grad]

# Rebuild the optimizer with the two different LRs
optimizer = torch.optim.AdamW([
    {'params': backbone_parameters, 'lr':2e-5}, # small LR for the pretrained conv
    {'params': head_parameters, 'lr': 1e-4},
], weight_decay=1e-4)

# Fine-tune for 10 epochs, slightly longer than the head-only warmup
epochs_fine_tuning = 10
# Initialize variables to track the best validation accuracy and the corresponding model checkpoint
best_validation_accuracy = 0.0
best_checkpoint = None

# standard train and validation loop with the partially unfrozen model
for epoch in range(epochs_fine_tuning): 
    training_loss, training_accuracy = run_one_epoch(vgg, train_loader, train=True)
    validation_loss, validation_accuracy = validate(vgg)
    print(f"[Fine Tuning] Epoch {epoch+1:02d} | Training Accuracy {training_accuracy:.4f} | Validation Accuracy {validation_accuracy:.4f}")

    # Save the best model by validation accuracy.
    # Clone each tensor: state_dict() returns references to the live weights,
    # so an uncloned copy would keep changing as training continues.
    if validation_accuracy > best_validation_accuracy:
        best_validation_accuracy = validation_accuracy
        best_checkpoint = {k: v.detach().clone() for k, v in vgg.state_dict().items()}



# Restore best val checkpoint (if any)
if best_checkpoint is not None:
    vgg.load_state_dict(best_checkpoint)
    print(f"Restored best model with Val Acc = {best_validation_accuracy:.4f}")



In [None]:
"""
VGG16 Transfer Learning Pipeline (PyTorch)
-----------------------------------------
This script shows how to:
  1) Load folder-structured image data using `ImageFolder` + `DataLoader`.
  2) Split the training set into Train / Validation (20% val) **stratified** by class.
  3) Build a VGG16 model with ImageNet weights and adapt it for a binary task.
  4) (Optional) Train in two phases: linear probe (freeze backbone) and fine-tune.

Folder structure expected (example):
  dataset_root/
    training/
      flip/
        img001.jpg
        ...
      notflip/
        imgXYZ.jpg
        ...
    testing/
      flip/
        ...
      notflip/
        ...

Replace the `TRAIN_DIR` and `TEST_DIR` paths below with your actual paths.
"""

from __future__ import annotations
import os
import random
from dataclasses import dataclass
from typing import Tuple, List

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms, models

# For a clean, stratified split
from sklearn.model_selection import StratifiedShuffleSplit


# ---------------------------
# 0) Reproducibility helpers
# ---------------------------

def set_seed(seed: int = 42) -> None:
    """Set seeds for python, numpy (if used), and torch to have reproducible splits/training."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except Exception:
        pass
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make CuDNN deterministic (slightly slower but reproducible)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# ---------------------------
# 1) Config
# ---------------------------
@dataclass
class Config:
    num_classes: int = 2               # flip vs notflip
    img_size: int = 224               # VGG16 default input size
    batch_size: int = 32
    num_workers: int = 4
    pin_memory: bool = True
    seed: int = 42

    # file system pointers to the training and testing images, fed into torchvision.datasets.ImageFolder
    # training root containing one subfolder per class
    TRAIN_DIR: str = "/path/to/dataset/training"
    # held out test root, not touched during training/hyperparameter tuning
    TEST_DIR: str  = "/path/to/dataset/testing"

cfg = Config()
set_seed(cfg.seed)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


# ---------------------------------------------
# 2) Define transforms (ImageNet normalization)
# ---------------------------------------------
# IMPORTANT: VGG16 was pretrained on ImageNet with these stats.
# Keeping them preserves the meaning of its filters.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD  = [0.229, 0.224, 0.225]

train_transforms = transforms.Compose([
    # A light set of augmentations to improve generalization while preserving label semantics.
    transforms.Resize(int(cfg.img_size * 1.14)),          # resize short side similarly to torchvision presets
    transforms.RandomResizedCrop(cfg.img_size),            # random crop to target size
    transforms.RandomRotation(8),                          # SMALL rotations; mirror flips are deliberately left out
    transforms.ColorJitter(0.1, 0.1, 0.1, 0.05),           # mild color/brightness changes
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

# Validation/Test should be deterministic and comparable across epochs
eval_transforms = transforms.Compose([
    transforms.Resize(int(cfg.img_size * 1.14)),
    transforms.CenterCrop(cfg.img_size),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])


# --------------------------------------------------------------------
# 3) Build datasets with ImageFolder and make a stratified train/val split
# --------------------------------------------------------------------
# We instantiate TWO ImageFolder objects pointing to the SAME training directory:
#   - one with train_transforms
#   - one with eval_transforms
# Then we apply the SAME index split (train_idx / val_idx) to both via Subset.
# This pattern keeps transform policies separate while reusing the same file list.

train_full_eval = datasets.ImageFolder(cfg.TRAIN_DIR, transform=eval_transforms)
train_full_aug  = datasets.ImageFolder(cfg.TRAIN_DIR, transform=train_transforms)

# The targets are accessible on ImageFolder via `.targets` (a list of int class labels per image)
all_targets: List[int] = train_full_eval.targets

# Stratified split: hold out 20% of training images as validation while preserving class ratios
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.20, random_state=cfg.seed)
train_idx, val_idx = next(splitter.split(X=range(len(all_targets)), y=all_targets))

# Subset the two datasets with identical indices
train_ds = Subset(train_full_aug, train_idx)  # augmentations enabled
val_ds   = Subset(train_full_eval, val_idx)    # eval transforms (no random aug)

# Build the final testing dataset (never augmented)
# Testing directory is separate and already labeled by subfolders
test_ds = datasets.ImageFolder(cfg.TEST_DIR, transform=eval_transforms)

print(f"Train images: {len(train_ds)} | Val images: {len(val_ds)} | Test images: {len(test_ds)}")
print(f"Classes: {train_full_eval.classes} (class_to_idx={train_full_eval.class_to_idx})")


# -------------------------------------
# 4) Create DataLoaders for each split
# -------------------------------------
# DataLoader handles batching, shuffling (for training), and parallel disk I/O.
train_loader = DataLoader(
    train_ds,
    batch_size=cfg.batch_size,
    shuffle=True,                 # shuffle only for training
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_memory,
    persistent_workers=True
)

val_loader = DataLoader(
    val_ds,
    batch_size=cfg.batch_size,
    shuffle=False,
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_memory,
    persistent_workers=True
)

test_loader = DataLoader(
    test_ds,
    batch_size=cfg.batch_size,
    shuffle=False,
    num_workers=cfg.num_workers,
    pin_memory=cfg.pin_memory,
    persistent_workers=True
)


# ------------------------------------------------------
# 5) Build VGG16 backbone and adapt it for 2-class output
# ------------------------------------------------------
# Why VGG16?
# - Simple, stacked 3x3 conv blocks → predictable behavior & easy fine-tuning
# - Strong ImageNet-pretrained features for edges/textures/parts

# Load ImageNet-pretrained VGG16
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Option A: Keep the original (big) classifier and just replace the last layer.
#   Pros: quick setup, strong capacity. Cons: very large parameter count (can overfit small datasets).
# vgg.classifier[-1] = nn.Linear(vgg.classifier[-1].in_features, cfg.num_classes)

# Option B (RECOMMENDED for binary tasks): Replace the whole classifier with a lighter head
#   to reduce parameters and improve generalization.
# VGG16 feature extractor ends with 512 feature maps at spatial size 7x7 (for 224x224 inputs).
# Flattened dimension = 512 * 7 * 7 = 25088.

vgg.classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 512),
    nn.ReLU(inplace=True),
    nn.Dropout(0.4),
    nn.Linear(512, cfg.num_classes)
)

vgg = vgg.to(device)

# Phase 1: "Linear probe" — freeze convolutional blocks and train only the new classifier head.
for p in vgg.features.parameters():
    p.requires_grad = False

# Optimizer & loss: use a higher LR for the head since it's randomly initialized
head_params = [p for p in vgg.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(head_params, lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

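# (Optional) Mixed precision speeds up training on CUDA GPUs. GradScaler scales the loss so that
# small fp16 gradients do not underflow. Both are disabled automatically on CPU.
# Note: recent PyTorch releases deprecate torch.cuda.amp.* in favor of the torch.amp.* equivalents.
use_amp = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)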


# ---------------------------------------
# 6) Training & evaluation helper routines
# ---------------------------------------

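def run_one_epoch(model: nn.Module, loader: DataLoader, train: bool = True) -> Tuple[float, float]:
    """Run a single epoch over a DataLoader.

    Uses the module-level `optimizer`, `criterion`, `scaler`, and `device`.
    For evaluation (train=False), call it inside `torch.no_grad()` as `validate` does.
    Returns (avg_loss, accuracy).
    """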
    model.train(train)
    total, correct, loss_sum = 0, 0, 0.0

    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)

        with torch.cuda.amp.autocast(enabled=use_amp):
            logits = model(images)
            loss = criterion(logits, targets)

        if train:
            optimizer.zero_grad(set_to_none=True)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

        # Stats
        loss_sum += loss.item() * targets.size(0)
        preds = logits.argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.size(0)

    return loss_sum / total, correct / total
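

# Why the loop weights each batch loss by targets.size(0): CrossEntropyLoss returns a
# per-batch mean, so a plain average of batch means would over-weight a short final batch.
# A tiny sketch with made-up numbers (`weighted_mean_loss` is an illustrative helper,
# not used by the pipeline):
def weighted_mean_loss(batch_losses, batch_sizes):
    """Average per-batch mean losses, weighting each by its number of samples."""
    total = sum(batch_sizes)
    return sum(loss * n for loss, n in zip(batch_losses, batch_sizes)) / total


# Two batches with mean losses 1.0 and 3.0 and sizes 30 and 10:
# the plain mean of batch means is 2.0, while the per-sample mean is 1.5.
assert weighted_mean_loss([1.0, 3.0], [30, 10]) == 1.5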


def validate(model: nn.Module) -> Tuple[float, float]:
    model.eval()
    with torch.no_grad():
        return run_one_epoch(model, val_loader, train=False)


# -------------------------------------
# 7) Phase 1 training: head only (few epochs)
# -------------------------------------
EPOCHS_HEAD = 5  # short warmup to train the new classifier head
for epoch in range(EPOCHS_HEAD):
    tr_loss, tr_acc = run_one_epoch(vgg, train_loader, train=True)
    va_loss, va_acc = validate(vgg)
    print(f"[Head] Epoch {epoch+1:02d} | Train Acc {tr_acc:.4f} | Val Acc {va_acc:.4f}")


# --------------------------------------------------------
# 8) Phase 2: unfreeze last conv block(s) and fine-tune
# --------------------------------------------------------
# Strategy: unfreeze the deepest block (Block 5) first. Use a **smaller** LR so we gently
# adapt high-level features from ImageNet to your specific task (flip vs notflip).

# VGG16 features layout (for reference):
# vgg.features holds 31 layers (indices 0-30). Block 5 is indices 24-30: three Conv2d+ReLU
# pairs followed by a MaxPool2d. Slicing with [-10:] would also unfreeze conv4_3 (index 21).
for p in vgg.features[24:].parameters():
    p.requires_grad = True
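
# Where does each conv block start inside vgg.features? A pure-Python sketch that rebuilds
# the layer order from the standard VGG16 layout ("M" = max-pool; every conv is followed by
# a ReLU). `VGG16_LAYOUT` and `block_start_indices` are illustrative names, and the sketch
# assumes the non-batch-norm variant loaded above.
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def block_start_indices(layout):
    """Return the index in `features` at which each conv block begins."""
    starts, idx, new_block = [], 0, True
    for item in layout:
        if new_block:
            starts.append(idx)
            new_block = False
        if item == "M":
            idx += 1          # one MaxPool2d layer closes the block
            new_block = True
        else:
            idx += 2          # Conv2d + ReLU
    return starts


# The last entry is the first layer of Block 5.
print(block_start_indices(VGG16_LAYOUT))  # [0, 5, 10, 17, 24]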

# Use discriminative learning rates: smaller for backbone, higher for head
backbone_params = [p for p in vgg.features.parameters() if p.requires_grad]
head_params     = [p for p in vgg.classifier.parameters() if p.requires_grad]

optimizer = torch.optim.AdamW([
    {"params": backbone_params, "lr": 2e-5},   # tiny LR for pretrained conv layers
    {"params": head_params,     "lr": 1e-4},   # slightly larger LR for classifier
], weight_decay=1e-4)

EPOCHS_FT = 10
best_val_acc = 0.0
best_state = None

for epoch in range(EPOCHS_FT):
    tr_loss, tr_acc = run_one_epoch(vgg, train_loader, train=True)
    va_loss, va_acc = validate(vgg)
    print(f"[FT ] Epoch {epoch+1:02d} | Train Acc {tr_acc:.4f} | Val Acc {va_acc:.4f}")

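    # Save the best model by validation accuracy. Clone the tensors: state_dict() returns
    # references to the live parameters, which later epochs would otherwise overwrite.
    if va_acc > best_val_acc:
        best_val_acc = va_acc
        best_state = {k: v.detach().clone() for k, v in vgg.state_dict().items()}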

# Restore best val checkpoint (if any)
if best_state is not None:
    vgg.load_state_dict(best_state)
    print(f"Restored best model with Val Acc = {best_val_acc:.4f}")


# ---------------------------------------
# 9) Final evaluation on the held-out test set
# ---------------------------------------
# NOTE: Only run this once you've finalized hyperparameters using the validation set.
# This ensures the test performance is an unbiased estimate.

def evaluate_on_test(model: nn.Module) -> Tuple[float, float]:
    model.eval()
    with torch.no_grad():
        total, correct, loss_sum = 0, 0, 0.0
        for images, targets in test_loader:
            images, targets = images.to(device), targets.to(device)
            logits = model(images)
            loss = criterion(logits, targets)
            loss_sum += loss.item() * targets.size(0)
            preds = logits.argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.size(0)
        return loss_sum / total, correct / total

# Example (comment out if you don't want to evaluate yet):
# test_loss, test_acc = evaluate_on_test(vgg)
# print(f"[TEST] Loss {test_loss:.4f} | Acc {test_acc:.4f}")


# ---------------------------------------
# 10) Tips and gotchas
# ---------------------------------------
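# - RandomHorizontalFlip/VerticalFlip mirror the image; they do NOT change the label, because
#   the target is whether a page is being turned, not the orientation of the image.
#   Use them with care: mirrored text and a reversed turn direction may not resemble real frames.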
# - Monitor per-class metrics (precision/recall/F1) to ensure neither class lags.
# - If you see overfitting:
#     * Increase dropout in the classifier head
#     * Add mild augmentations (ColorJitter, RandomPerspective)
#     * Increase weight decay, or reduce LR
# - For speed: AMP is enabled automatically when CUDA is available; increase num_workers if
#   data loading (disk reads or CPU transforms) is the bottleneck.
# - For fairness across backbones (ResNet/EfficientNet/MobileNet): reuse the SAME transforms and schedules.
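
# Sketch for the per-class metrics tip above: precision/recall/F1 from plain lists of
# integer labels. `per_class_prf` is a hypothetical helper; one way to feed it is to extend
# two lists with preds.tolist() and targets.tolist() inside an evaluation loop.
def per_class_prf(targets, preds, num_classes=2):
    """Return {class_index: (precision, recall, f1)}; undefined ratios become 0.0."""
    report = {}
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(targets, preds) if t == c and p == c)
        fp = sum(1 for t, p in zip(targets, preds) if t != c and p == c)
        fn = sum(1 for t, p in zip(targets, preds) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1)
    return report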

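if __name__ == "__main__":
    # Both training phases above run at module level; only the test evaluation is commented out.
    print("Training complete. Uncomment the test-set evaluation above to score the held-out set.")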
