This notebook classifies 2024 and 2025 images of public personalities with ResNeXt-50 (32x4d), using a model trained on images of the same personalities from the pre-COVID era.

1. Mount Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


2. Inspect the pre-COVID dataset, which serves as the training set.

For each class folder, the cell reports the number of images, the image sizes, the channel modes, and the file formats.

In [None]:
import os
from collections import Counter
from PIL import Image

base_path = "/content/drive/MyDrive/Applied Machine Learning/Preprocessing/Training-Validation Set/Sampled"
IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tif', '.tiff'}

total_images = 0
folder_stats = {}

for folder_name in sorted(os.listdir(base_path)):
    folder_path = os.path.join(base_path, folder_name)
    if not os.path.isdir(folder_path):
        continue

    dims_counter = Counter()
    channel_counter = Counter()
    format_counter = Counter()
    count = 0

    for fname in os.listdir(folder_path):
        _, ext = os.path.splitext(fname.lower())
        if ext not in IMAGE_EXTS:
            continue
        img_path = os.path.join(folder_path, fname)
        try:
            with Image.open(img_path) as img:
                width, height = img.size
                mode = img.mode
                fmt = img.format  # Format like JPEG, PNG, etc.

            dims_counter[(width, height)] += 1
            channel_counter[mode] += 1
            format_counter[fmt] += 1
            count += 1
        except Exception as e:
            print(f"Error reading {img_path}: {e}")
            continue

    folder_stats[folder_name] = {
        "num_images": count,
        "dims_counter": dims_counter,
        "channel_counter": channel_counter,
        "format_counter": format_counter
    }
    total_images += count

# Display results
for folder_name, stats in folder_stats.items():
    print(f"Folder: {folder_name}")
    print(f"  Number of images: {stats['num_images']}")

    if stats["dims_counter"]:
        common_dims, common_dims_count = stats["dims_counter"].most_common(1)[0]
        print(f"  Most common size: {common_dims} (n={common_dims_count})")
        print(f"  Unique sizes: {len(stats['dims_counter'])}")
    else:
        print("  No valid image sizes found.")

    print(f"  Channel modes: {dict(stats['channel_counter'])}")
    print(f"  Image formats: {dict(stats['format_counter'])}")
    print("-" * 40)

print(f"Total number of images across all folders: {total_images}")

Folder: Barack Obama
  Number of images: 300
  Most common size: (224, 224) (n=300)
  Unique sizes: 1
  Channel modes: {'RGB': 300}
  Image formats: {'PNG': 300}
----------------------------------------
Folder: Bill Gates
  Number of images: 300
  Most common size: (224, 224) (n=300)
  Unique sizes: 1
  Channel modes: {'RGB': 300}
  Image formats: {'PNG': 300}
----------------------------------------
Folder: Donald Trump
  Number of images: 300
  Most common size: (224, 224) (n=300)
  Unique sizes: 1
  Channel modes: {'RGB': 300}
  Image formats: {'PNG': 300}
----------------------------------------
Folder: Elon Musk
  Number of images: 300
  Most common size: (224, 224) (n=300)
  Unique sizes: 1
  Channel modes: {'RGB': 300}
  Image formats: {'PNG': 300}
----------------------------------------
Folder: Jeff Bezos
  Number of images: 300
  Most common size: (224, 224) (n=300)
  Unique sizes: 1
  Channel modes: {'RGB': 300}
  Image formats: {'PNG': 300}
---------------------------------

3. Inspect the 2024 and 2025 dataset, which serves as the test set.

The cell reports the number of images, the image sizes, the channel modes, the file formats, and the file extensions found in the test folder.

In [None]:
import os
from collections import Counter
from PIL import Image

folder_path = "/content/drive/MyDrive/Applied Machine Learning/Preprocessing/Testing Set/Unlabelled Test Data/Test Data"

# Common image extensions to consider
IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.bmp', '.gif', '.tif', '.tiff', '.webp'}

dims_counter   = Counter()
mode_counter   = Counter()  # channel modes (RGB/L/RGBA/CMYK/…)
format_counter = Counter()  # file formats (JPEG/PNG/…)
ext_counter    = Counter()  # file extensions seen on disk
count = 0

for fname in os.listdir(folder_path):
    fpath = os.path.join(folder_path, fname)
    if not os.path.isfile(fpath):
        continue
    _, ext = os.path.splitext(fname.lower())
    if ext not in IMAGE_EXTS:
        continue

    try:
        with Image.open(fpath) as img:
            w, h   = img.size         # width, height
            mode   = img.mode         # RGB, L, RGBA, etc.
            fmt    = img.format       # JPEG, PNG, etc. (from header)
        dims_counter[(w, h)] += 1
        mode_counter[mode]   += 1
        format_counter[fmt]  += 1
        ext_counter[ext]     += 1
        count += 1
    except Exception as e:
        print(f"Error reading {fpath}: {e}")

# ---- Display ----
print(f"Folder: {folder_path}")
print(f"  Number of images: {count}")

if dims_counter:
    (common_dims, common_n) = dims_counter.most_common(1)[0]
    print(f"  Most common size: {common_dims} (n={common_n})")
    print(f"  Unique sizes: {len(dims_counter)}")
else:
    print("  No valid image sizes found.")

print(f"  Channel modes: {dict(mode_counter)}")      # e.g., {'RGB': 238, 'L': 2}
print(f"  Image formats: {dict(format_counter)}")    # e.g., {'JPEG': 200, 'PNG': 40}
print(f"  File extensions: {dict(ext_counter)}")     # e.g., {'.jpg': 200, '.png': 40}

print(f"\nTotal number of images across this folder: {count}")


Folder: /content/drive/MyDrive/Applied Machine Learning/Preprocessing/Testing Set/Unlabelled Test Data/Test Data
  Number of images: 240
  Most common size: (224, 224) (n=240)
  Unique sizes: 1
  Channel modes: {'RGB': 240}
  Image formats: {'PNG': 240}
  File extensions: {'.png': 240}

Total number of images across this folder: 240


4. We compute the mean and standard deviation from the training set and use them to normalize both training and test data. We then report statistics at three stages: before normalization, after normalization on a single batch, and after normalization over the full datasets.
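
As a minimal sketch of the bookkeeping behind these statistics, the snippet below accumulates per-channel sums and sums of squares batch by batch and recovers the standard deviation from the identity std = sqrt(E[x^2] - E[x]^2). It uses NumPy and random data in place of the image tensors; the array shape, the batch size of 4, and all variable names are illustrative assumptions, not values from this notebook.

```python
import numpy as np

# Stand-in for a dataset of images scaled to [0, 1): (N, C, H, W).
rng = np.random.default_rng(0)
images = rng.random((10, 3, 8, 8))

channel_sum = np.zeros(3)   # running sum of pixel values per channel
channel_sq = np.zeros(3)    # running sum of squared pixel values per channel
pixels = 0                  # number of pixels seen per channel

# Walk the data in batches, as a DataLoader would.
for start in range(0, len(images), 4):
    x = images[start:start + 4]
    bs, c, h, w = x.shape
    pixels += bs * h * w
    channel_sum += x.sum(axis=(0, 2, 3))
    channel_sq += (x ** 2).sum(axis=(0, 2, 3))

stream_mean = channel_sum / pixels
# Clip at zero before the square root to guard against tiny negative
# values caused by floating-point rounding.
stream_std = np.sqrt(np.maximum(channel_sq / pixels - stream_mean ** 2, 0.0))

# Reference values computed in one pass (population std, ddof=0).
direct_mean = images.mean(axis=(0, 2, 3))
direct_std = images.std(axis=(0, 2, 3))

# Normalizing with these statistics centres each channel at 0 with unit std.
normalized = (images - stream_mean[None, :, None, None]) / stream_std[None, :, None, None]
```

The streaming result is a population standard deviation, whereas `torch.Tensor.std` applies Bessel's correction by default, so a one-batch check with `.std()` can differ slightly from the streaming value even on identical data.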

In [None]:
import os, glob
from typing import List, Tuple
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, transforms
from PIL import Image
from tqdm import tqdm

# ==== PATHS ====
train_dir = "/content/drive/MyDrive/Applied Machine Learning/Preprocessing/Training-Validation Set/Sampled"
test_dir  = "/content/drive/MyDrive/Applied Machine Learning/Preprocessing/Testing Set/Unlabelled Test Data/Test Data"  # single folder with images

# ==== BASIC TRANSFORMS ====
to_tensor = transforms.ToTensor()  # leaves values in [0,1] float

# ==== CUSTOM DATASET FOR UNLABELLED TEST IMAGES ====
class UnlabeledImageFolder(Dataset):
    def __init__(self, root: str, transform=None, exts=(".jpg",".jpeg",".png",".bmp",".tif",".tiff",".webp")):
        self.root = root
        self.transform = transform
        self.files: List[str] = []
        for e in exts:
            self.files += glob.glob(os.path.join(root, f"*{e}"))
        self.files.sort()
        if len(self.files) == 0:
            raise FileNotFoundError(f"No image files found in {root}")
    def __len__(self): return len(self.files)
    def __getitem__(self, idx: int):
        path = self.files[idx]
        img = Image.open(path).convert("RGB")  # force 3 channels
        if self.transform: img = self.transform(img)
        return img, -1

# ==== 1) PRE-NORMALIZATION STATS FROM TRAIN ONLY ====
train_ds_raw = datasets.ImageFolder(train_dir, transform=to_tensor)
train_loader_raw = DataLoader(train_ds_raw, batch_size=64, shuffle=False, num_workers=2, pin_memory=True)

channel_sum = torch.zeros(3)
channel_sq  = torch.zeros(3)
pixels = 0

for x, _ in tqdm(train_loader_raw, desc="Calculating TRAIN pre-norm stats"):
    bs, c, h, w = x.shape
    pixels += bs*h*w
    channel_sum += x.sum(dim=[0,2,3])
    channel_sq  += (x**2).sum(dim=[0,2,3])

train_mean = channel_sum / pixels
train_std  = (channel_sq / pixels - train_mean**2).clamp(min=0).sqrt()  # clamp guards against tiny negative variance from rounding

print("Pre-normalization TRAIN mean:", train_mean.tolist())
print("Pre-normalization TRAIN std: ", train_std.tolist())

# ==== 2) DEFINE NORMALIZE USING TRAIN STATS ====
normalize = transforms.Normalize(mean=train_mean.tolist(), std=train_std.tolist())

# ==== 3) NORMALIZED DATASETS ====
train_ds = datasets.ImageFolder(train_dir, transform=transforms.Compose([to_tensor, normalize]))
test_ds  = UnlabeledImageFolder(test_dir, transform=transforms.Compose([to_tensor, normalize]))

train_loader = DataLoader(train_ds, batch_size=64, shuffle=False, num_workers=2, pin_memory=True)
test_loader  = DataLoader(test_ds, batch_size=64, shuffle=False, num_workers=2, pin_memory=True)

# ---- one-batch post-norm stats (TRAIN) ----
one_batch = next(iter(train_loader))[0]  # images only
print("One-batch normalized TRAIN mean:", one_batch.mean(dim=[0,2,3]).tolist())
print("One-batch normalized TRAIN std: ", one_batch.std(dim=[0,2,3]).tolist())

# ---- helper for full-dataset post-norm stats ----
def full_stats(loader, desc: str) -> Tuple[torch.Tensor, torch.Tensor]:
    ch_sum = torch.zeros(3)
    ch_sq  = torch.zeros(3)
    pix = 0
    for x, _ in tqdm(loader, desc=desc):
        bs, c, h, w = x.shape
        pix += bs*h*w
        ch_sum += x.sum(dim=[0,2,3])
        ch_sq  += (x**2).sum(dim=[0,2,3])
    mean = ch_sum / pix
    std  = (ch_sq / pix - mean**2).clamp(min=0).sqrt()  # clamp guards against tiny negative variance from rounding
    return mean, std

# ---- full-dataset post-norm stats ----
train_norm_mean, train_norm_std = full_stats(train_loader, "Full normalized TRAIN stats")
test_norm_mean,  test_norm_std  = full_stats(test_loader,  "Full normalized TEST stats")

print("\nFull normalized TRAIN mean:", train_norm_mean.tolist())
print("Full normalized TRAIN std: ", train_norm_std.tolist())
print("Full normalized TEST mean:",  test_norm_mean.tolist())
print("Full normalized TEST std: ",  test_norm_std.tolist())


Calculating TRAIN pre-norm stats: 100%|██████████| 38/38 [00:15<00:00,  2.43it/s]


Pre-normalization TRAIN mean: [0.6058028340339661, 0.4570693373680115, 0.40498921275138855]
Pre-normalization TRAIN std:  [0.25304171442985535, 0.21776995062828064, 0.2149660289287567]
One-batch normalized TRAIN mean: [-0.1684441864490509, -0.2336142659187317, -0.37765467166900635]
One-batch normalized TRAIN std:  [1.0338082313537598, 0.987051784992218, 0.9565902948379517]


Full normalized TRAIN stats: 100%|██████████| 38/38 [00:24<00:00,  1.57it/s]
Full normalized TEST stats: 100%|██████████| 4/4 [00:01<00:00,  2.63it/s]


Full normalized TRAIN mean: [2.2213475858734455e-07, -3.4643679214241274e-07, -2.275843229426755e-07]
Full normalized TRAIN std:  [0.9999992251396179, 1.0000005960464478, 1.0000005960464478]
Full normalized TEST mean: [-0.6195100545883179, -0.3517855107784271, -0.10748446732759476]
Full normalized TEST std:  [1.2255595922470093, 1.2330907583236694, 1.2600317001342773]





5. Train ResNeXt-50 (32x4d) on the pre-COVID images and use it to classify the 2024 and 2025 images of the same personalities.
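
The training cell logs three selective-prediction metrics at a confidence threshold of 0.75: Coverage, SelAcc, and HCER. As a rough illustration of how they behave, the sketch below applies the same definitions to a hand-made toy example. The probabilities, labels, and the three-class setup are invented for illustration and are not results from this notebook.

```python
import numpy as np

def selective_metrics(probs, preds, targets, tau=0.75):
    """Coverage, selective accuracy, and high-confidence error rate at tau."""
    max_conf = probs.max(axis=1)
    covered = max_conf >= tau                     # predictions the model is confident about
    coverage = covered.mean()                     # fraction of samples that are covered
    # Accuracy restricted to the covered samples (0.0 if none are covered).
    sel_acc = (preds[covered] == targets[covered]).mean() if covered.any() else 0.0
    # Fraction of ALL samples that are both wrong and confident.
    hcer = ((preds != targets) & covered).mean()
    return float(coverage), float(sel_acc), float(hcer)

# Toy softmax outputs for four samples and three classes (illustrative only).
probs = np.array([
    [0.90, 0.05, 0.05],   # confident and correct
    [0.80, 0.15, 0.05],   # confident but wrong (true class is 1)
    [0.40, 0.35, 0.25],   # unsure and correct
    [0.10, 0.30, 0.60],   # unsure and wrong (true class is 0)
])
preds = probs.argmax(axis=1)
targets = np.array([0, 1, 0, 0])

coverage, sel_acc, hcer = selective_metrics(probs, preds, targets)
print(coverage, sel_acc, hcer)  # 0.5 0.5 0.25
```

Two of the four samples clear the threshold, one of those two is correct, and one of the four is a confident mistake. Note that HCER is normalized by the total number of samples, not by the covered subset.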

In [None]:
# ============================================
# 8-person Face ID — ResNeXt-50 (32x4d) + (Aida, Lion)
# Prints Train/Test Acc, saves surname CMs + random surname% confidence overlays
# Aida loaded dynamically from official repo (no pip packaging required)
# ============================================

import os, csv, math, random, re, sys, pathlib, importlib.util, subprocess
import numpy as np, pandas as pd
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import models, transforms, datasets

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import cv2

# -----------------------------
# Paths
# -----------------------------
train_root = "/content/drive/MyDrive/Applied Machine Learning/Preprocessing/Training-Validation Set/Sampled"
test_root  = "/content/drive/MyDrive/Applied Machine Learning/Preprocessing/Testing Set/Unlabelled Test Data/Test Data"
test_csv   = "/content/drive/MyDrive/Applied Machine Learning/Preprocessing/Testing Set/Unlabelled Test Data/Test.csv"

out_confmats   = "/content/drive/MyDrive/Applied Machine Learning/Results/CNNs/ResNeXt-50 (32x4d)/Confusion Matrices"
out_confimgs   = "/content/drive/MyDrive/Applied Machine Learning/Results/CNNs/ResNeXt-50 (32x4d)/Confidence Images"
out_metricsdir = "/content/drive/MyDrive/Applied Machine Learning/Results/CNNs/ResNeXt-50 (32x4d)/Performance Matrices"
os.makedirs(out_confmats, exist_ok=True)
os.makedirs(out_confimgs, exist_ok=True)
os.makedirs(out_metricsdir, exist_ok=True)
metrics_csv = os.path.join(out_metricsdir, "ResNeXt-50(32x4d)_RIC.csv")

# -----------------------------
# Reproducibility
# -----------------------------
SEED = 1906525
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED); torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# -----------------------------
# Normalization
# -----------------------------
MEAN = [0.6058028340339661, 0.4570693373680115, 0.40498921275138855]
STD  = [0.25304171442985535, 0.21776995062828064, 0.2149660289287567]

to_tensor_norm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=MEAN, std=STD),
])

# -----------------------------
# Datasets & Dataloaders
# -----------------------------
train_ds = datasets.ImageFolder(root=train_root, transform=to_tensor_norm)
class_to_idx = train_ds.class_to_idx
idx_to_class = {v:k for k,v in class_to_idx.items()}

def surname_of(full_name: str) -> str:
    parts = full_name.strip().split()
    return parts[-1] if parts else full_name

def initials_of(full_name: str) -> str:
    parts = full_name.strip().split()
    return f"{parts[0][0]}{parts[-1][0]}".upper() if len(parts)>=2 else full_name[:2].upper()

surname_ticks = [surname_of(idx_to_class[i]) for i in range(len(idx_to_class))]

train_loader       = DataLoader(train_ds, batch_size=32, shuffle=True,  num_workers=2, pin_memory=True)
train_eval_loader  = DataLoader(train_ds, batch_size=64, shuffle=False, num_workers=2, pin_memory=True)

class CSVTestDataset(Dataset):
    def __init__(self, root_dir, csv_path, transform=None, class_to_idx=None):
        self.root_dir = root_dir
        self.transform = transform
        df = pd.read_csv(csv_path)
        assert {"Image Name","Class Name"}.issubset(df.columns), "CSV must have 'Image Name' and 'Class Name'"
        self.samples = []
        for _, row in df.iterrows():
            img_name = str(row["Image Name"])
            cls_name = str(row["Class Name"])
            p = os.path.join(root_dir, img_name)
            if not os.path.exists(p):
                alt = os.path.join(root_dir, cls_name, img_name)
                if os.path.exists(alt): p = alt
            self.samples.append((p, class_to_idx[cls_name], img_name, cls_name))
    def __len__(self): return len(self.samples)
    def __getitem__(self, i):
        p, y, img_name, cls_name = self.samples[i]
        im = Image.open(p).convert("RGB")
        if self.transform: im = self.transform(im)
        return im, y, p, img_name, cls_name

test_ds = CSVTestDataset(test_root, test_csv, transform=to_tensor_norm, class_to_idx=class_to_idx)
test_loader = DataLoader(test_ds, batch_size=64, shuffle=False, num_workers=2, pin_memory=True)

# -----------------------------
# ResNeXt-50 (32x4d)
# -----------------------------
from torchvision import models

def make_model(num_classes=8):
    try:
        weights = models.ResNeXt50_32X4D_Weights.IMAGENET1K_V2  # or .DEFAULT
        net = models.resnext50_32x4d(weights=weights)           # 224x224 default recipe
    except AttributeError:
        # torchvision < 0.13 has no weights enums or `weights=` argument;
        # pretrained=True loads the original (V1) ImageNet weights there
        net = models.resnext50_32x4d(pretrained=True)
    in_features = net.fc.in_features
    net.fc = nn.Linear(in_features, num_classes)
    return net.to(device)

# -----------------------------
# Lion optimizer (inline, per paper rule)
# -----------------------------
class Lion(optim.Optimizer):
    def __init__(self, params, lr=1e-4, betas=(0.9, 0.99), weight_decay=0.0):
        defaults = dict(lr=lr, betas=betas, weight_decay=weight_decay)
        super().__init__(params, defaults)
    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr = group['lr']; wd = group['weight_decay']; beta1, beta2 = group['betas']
            for p in group['params']:
                if p.grad is None: continue
                g = p.grad
                state = self.state[p]
                if len(state) == 0:
                    state['exp_avg'] = torch.zeros_like(p)
                m = state['exp_avg']
                if wd != 0:  # decoupled weight decay
                    p.mul_(1 - lr * wd)
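                # Lion rule: step along sign(beta1*m + (1-beta1)*g) without mutating m ...
                update = m.mul(beta1).add(g, alpha=1 - beta1)
                p.add_(update.sign(), alpha=-lr)
                # ... then refresh the momentum with beta2
                m.mul_(beta2).add_(g, alpha=1 - beta2)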
        return loss

# -----------------------------
# Aida optimizer: dynamic import from official repo
# -----------------------------
def load_aida_class():
    repo_url = "https://github.com/guoqiang-x-zhang/AidaOptimizer"
    clone_dir = "/content/AidaOptimizer"
    if not os.path.isdir(clone_dir):
        subprocess.check_call(["git", "clone", "--depth", "1", repo_url, clone_dir])
    # find a Python file that defines class Aida
    def find_aida_py(root):
        for p in pathlib.Path(root).rglob("*.py"):
            try:
                txt = p.read_text(errors="ignore")
                if re.search(r"class\s+Aida\b", txt):
                    return str(p)
            except Exception:
                pass
        return None
    aida_py = find_aida_py(clone_dir)
    if aida_py is None:
        raise ImportError("Aida class not found in the cloned repo.")
    spec = importlib.util.spec_from_file_location("aida_module", aida_py)
    aida_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(aida_module)
    # try direct name
    if hasattr(aida_module, "Aida"):
        return aida_module.Aida
    # fallback: search namespace
    for name, obj in aida_module.__dict__.items():
        if name.lower() == "aida" and isinstance(obj, type):
            return obj
    raise ImportError("Aida class symbol not exported.")

AidaOpt = load_aida_class()

# -----------------------------
# Optimizer factory
# -----------------------------
def make_optimizer(name, params, lr=1e-4, weight_decay=1e-4):
    n = name.lower()
    if n == "adam":
        return optim.Adam(params, lr=lr, weight_decay=weight_decay)
    if n == "adamw":
        return optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    if n in ("rmsprop", "rms"):
        return optim.RMSprop(params, lr=lr, weight_decay=weight_decay, momentum=0.9)
    if n == "lion":
        return Lion(params, lr=lr, weight_decay=weight_decay)
    if n == "aida":
        try:
            return AidaOpt(params, lr=lr, weight_decay=weight_decay)
        except TypeError:
            return AidaOpt(params, lr=lr)  # if signature differs
    raise ValueError(f"Unknown optimizer: {name}")

# -----------------------------
# Train / Eval helpers
# -----------------------------
criterion = nn.CrossEntropyLoss()
softmax  = nn.Softmax(dim=1)

def train_one_epoch(model, loader, optimizer):
    model.train()
    total, correct = 0, 0
    for x, y in loader:
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        optimizer.zero_grad(set_to_none=True)
        logits = model(x)
        loss = criterion(logits, y)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            preds = logits.argmax(1)
            correct += (preds == y).sum().item()
            total   += y.size(0)
    return correct/total  # train accuracy

@torch.no_grad()
def eval_model(model, loader):
    model.eval()
    total, correct = 0, 0
    all_probs, all_preds, all_tgts, all_paths, all_names, all_cls = [], [], [], [], [], []
    for batch in loader:
        if len(batch) == 5:
            x, y, paths, names, clsnames = batch
        else:
            x, y = batch; paths = [""]*x.size(0); names = [""]*x.size(0); clsnames = [""]*x.size(0)
        x = x.to(device, non_blocking=True); y = y.to(device, non_blocking=True)
        logits = model(x)
        probs = softmax(logits)
        preds = probs.argmax(1)
        correct += (preds == y).sum().item(); total += y.size(0)
        all_probs.append(probs.cpu()); all_preds.append(preds.cpu()); all_tgts.append(y.cpu())
        all_paths.extend(list(paths)); all_names.extend(list(names)); all_cls.extend(list(clsnames))
    all_probs = torch.cat(all_probs).numpy(); all_preds = torch.cat(all_preds).numpy(); all_tgts = torch.cat(all_tgts).numpy()
    return (correct/total), all_probs, all_preds, all_tgts, all_paths, all_names, all_cls

def format_acc_for_csv(x: float) -> float:
    if x < 1.0:
        return float(f"{x:.3g}") if x != 0 else 0.0
    return float(f"{x:.2f}")

def selective_metrics(probs_np, preds_np, tgts_np, tau=0.75):
    maxc = probs_np.max(axis=1)
    covered = maxc >= tau
    coverage = covered.mean()
    sel_acc = (preds_np[covered] == tgts_np[covered]).mean() if covered.sum()>0 else 0.0
    hcer = ((preds_np != tgts_np) & (maxc >= tau)).mean()
    return coverage, sel_acc, hcer

def mccp(probs_np, preds_np, tgts_np):
    mask = preds_np == tgts_np
    return float(probs_np.max(axis=1)[mask].mean()) if mask.sum()>0 else 0.0

def chcr_from_cm(cm, idx_to_class):
    diag = np.diag(cm); n_per_class = cm.sum(axis=1)
    full = [i for i,c in enumerate(diag) if c == n_per_class[i]]
    if len(full) == len(diag): return "all"
    if len(full) == len(diag)-1:
        suffered = [i for i in range(len(diag)) if i not in full][0]
        return f"Exc {initials_of(idx_to_class[suffered])}({int(diag[suffered])})"
    maxc = diag.max()
    winners = [i for i,c in enumerate(diag) if c == maxc]
    return ", ".join(f"{initials_of(idx_to_class[i])}({int(diag[i])})" for i in winners)

def save_confusion_matrix(cm, surname_ticks, save_path):
    plt.figure(figsize=(6.5,5.6), dpi=160)
    ax = sns.heatmap(cm, annot=True, fmt="d", cbar=False)
    ax.set_xlabel("Predicted", fontweight="bold")
    ax.set_ylabel("Actual",   fontweight="bold")
    ax.set_xticklabels(surname_ticks, rotation=45, ha="right", fontweight="bold")
    ax.set_yticklabels(surname_ticks, rotation=0,   fontweight="bold")
    plt.tight_layout(); plt.savefig(save_path); plt.close()

def save_confidence_face_image_random(paths, preds_np, tgts_np, probs_np, idx_to_class, save_path_png):
    correct_idx = np.where(preds_np == tgts_np)[0]
    if correct_idx.size == 0:
        blank = np.zeros((224,224,3), dtype=np.uint8)
        cv2.putText(blank, "No correct face found", (10,120), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255,255,255), 1, cv2.LINE_AA)
        cv2.imwrite(save_path_png, blank); return False
    rng = np.random.default_rng()  # deliberately non-deterministic for variety
    i = int(rng.choice(correct_idx))
    path = paths[i]; pred = int(preds_np[i]); conf = float(probs_np[i, pred])
    if not os.path.exists(path): return False
    img = cv2.imread(path)
    if img is None: return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40,40))
    if len(faces) == 0:
        h,w = img.shape[:2]; side = int(min(h,w)*0.35); x=(w-side)//2; y=(h-side)//2
        faces = [(x,y,side,side)]
    fx,fy,fw,fh = sorted(faces, key=lambda r: r[2]*r[3], reverse=True)[0]
    cv2.rectangle(img, (fx,fy), (fx+fw,fy+fh), (0,255,0), 2)
    surname = surname_of(idx_to_class[pred])
    label = f"{surname} {int(round(conf*100))}%"
    ty = max(fy-10, 20)
    cv2.putText(img, label, (fx,ty), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0,255,0), 2, cv2.LINE_AA)
    cv2.imwrite(save_path_png, img)
    return True

# Initialize metrics CSV
if not os.path.exists(metrics_csv):
    with open(metrics_csv, "w", newline="") as f:
        csv.writer(f).writerow(
            ["Optimizer", "Epochs", "Tr.A", "Te.A", "Cor.Cl", "CHCR",
             "Coverage@0.75", "SelAcc@0.75", "HCER@0.75", "MCCP"]
        )

# -----------------------------
# Grid and constants
# -----------------------------
optimizers = ["Adam", "AdamW", "RMSProp", "Lion", "Aida"]
epochs_grid = [5, 10, 20, 40, 60]
LR = 1e-4
WD = 1e-4
TAU = 0.75

for opt_name in optimizers:
    for E in epochs_grid:
        print(f"\n=== Run: {opt_name} for {E} epochs ===")
        model = make_model(num_classes=len(class_to_idx))
        opt = make_optimizer(opt_name, model.parameters(), lr=LR, weight_decay=WD)

        # ---- epoch loop: print Train/Test accuracy ----
        for ep in range(1, E+1):
            tr_acc = train_one_epoch(model, train_loader, opt)
            te_acc, _, _, _, _, _, _ = eval_model(model, test_loader)
            print(f"Epoch {ep:>2}/{E}: Train Acc={tr_acc:.4f} | Test Acc={te_acc:.4f}")

        # ---- final evals for logging/plots ----
        train_acc_eval, _, _, _, _, _, _ = eval_model(model, train_eval_loader)
        test_acc, probs_np, preds_np, tgts_np, paths, names, cls_names = eval_model(model, test_loader)
        correct_cls = int((preds_np == tgts_np).sum())

        # Confusion matrix (surname axes, bold)
        labels = list(range(len(class_to_idx)))
        cm = confusion_matrix(tgts_np, preds_np, labels=labels)
        cm_path = os.path.join(out_confmats, f"{opt_name}_{E}.png")
        save_confusion_matrix(cm, surname_ticks, cm_path)

        # Confidence image (random correct example, surname + %)
        confimg_path = os.path.join(out_confimgs, f"{opt_name}_{E}.png")
        # fix accidental '}' if path was pasted twice
        if confimg_path.endswith("}"): confimg_path = confimg_path[:-1]
        ok = save_confidence_face_image_random(paths, preds_np, tgts_np, probs_np, idx_to_class, confimg_path)
        if not ok:
            blank = np.zeros((224,224,3), dtype=np.uint8)
            cv2.putText(blank, "No correct face found", (10,120), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255,255,255), 1, cv2.LINE_AA)
            cv2.imwrite(confimg_path, blank)

        # Selective metrics & MCCP
        coverage, sel_acc, hcer = selective_metrics(probs_np, preds_np, tgts_np, tau=TAU)
        mccp_val = mccp(probs_np, preds_np, tgts_np)

        # CHCR tag
        chcr = chcr_from_cm(cm, idx_to_class)

        # Rounding rules for Tr.A / Te.A
        TrA = format_acc_for_csv(float(train_acc_eval))
        TeA = format_acc_for_csv(float(test_acc))

        with open(metrics_csv, "a", newline="") as f:
            csv.writer(f).writerow([
                opt_name, E, TrA, TeA, correct_cls, chcr,
                float(f"{coverage:.4f}"), float(f"{sel_acc:.4f}"), float(f"{hcer:.4f}"),
                float(f"{mccp_val:.4f}")
            ])

print("\nAll runs complete.")
print(f"Confusion matrices -> {out_confmats}")
print(f"Confidence images  -> {out_confimgs}")
print(f"Metrics CSV        -> {metrics_csv}")


=== Run: Adam for 5 epochs ===
Downloading: "https://download.pytorch.org/models/resnext50_32x4d-1a0047aa.pth" to /root/.cache/torch/hub/checkpoints/resnext50_32x4d-1a0047aa.pth


100%|██████████| 95.8M/95.8M [00:00<00:00, 176MB/s]


Epoch  1/5: Train Acc=0.7479 | Test Acc=0.5958
Epoch  2/5: Train Acc=0.9883 | Test Acc=0.6042
Epoch  3/5: Train Acc=0.9983 | Test Acc=0.6125
Epoch  4/5: Train Acc=0.9983 | Test Acc=0.6042
Epoch  5/5: Train Acc=0.9983 | Test Acc=0.6042

=== Run: Adam for 10 epochs ===
Epoch  1/10: Train Acc=0.7412 | Test Acc=0.5875
Epoch  2/10: Train Acc=0.9875 | Test Acc=0.6042
Epoch  3/10: Train Acc=0.9958 | Test Acc=0.5917
Epoch  4/10: Train Acc=0.9979 | Test Acc=0.5792
Epoch  5/10: Train Acc=0.9992 | Test Acc=0.6000
Epoch  6/10: Train Acc=1.0000 | Test Acc=0.6708
Epoch  7/10: Train Acc=0.9996 | Test Acc=0.6708
Epoch  8/10: Train Acc=1.0000 | Test Acc=0.6583
Epoch  9/10: Train Acc=1.0000 | Test Acc=0.6208
Epoch 10/10: Train Acc=1.0000 | Test Acc=0.6250

=== Run: Adam for 20 epochs ===
Epoch  1/20: Train Acc=0.7733 | Test Acc=0.6042
Epoch  2/20: Train Acc=0.9904 | Test Acc=0.6125
Epoch  3/20: Train Acc=0.9983 | Test Acc=0.6167
Epoch  4/20: Train Acc=0.9996 | Test Acc=0.6167
Epoch  5/20: Train Acc=0.99

	add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
	add_(Tensor other, *, Number alpha = 1) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:1691.)
  grad.add_(group['weight_decay'], p.data)


Epoch  1/5: Train Acc=0.5700 | Test Acc=0.4708
Epoch  2/5: Train Acc=0.9150 | Test Acc=0.5583
Epoch  3/5: Train Acc=0.9858 | Test Acc=0.5792
Epoch  4/5: Train Acc=0.9967 | Test Acc=0.5875
Epoch  5/5: Train Acc=0.9988 | Test Acc=0.5917

=== Run: Aida for 10 epochs ===
Epoch  1/10: Train Acc=0.5513 | Test Acc=0.4208
Epoch  2/10: Train Acc=0.9250 | Test Acc=0.5833
Epoch  3/10: Train Acc=0.9846 | Test Acc=0.5667
Epoch  4/10: Train Acc=0.9958 | Test Acc=0.5708
Epoch  5/10: Train Acc=0.9996 | Test Acc=0.5750
Epoch  6/10: Train Acc=1.0000 | Test Acc=0.5958
Epoch  7/10: Train Acc=1.0000 | Test Acc=0.5958
Epoch  8/10: Train Acc=1.0000 | Test Acc=0.5917
Epoch  9/10: Train Acc=0.9996 | Test Acc=0.6083
Epoch 10/10: Train Acc=0.9996 | Test Acc=0.6042

=== Run: Aida for 20 epochs ===
Epoch  1/20: Train Acc=0.5425 | Test Acc=0.4292
Epoch  2/20: Train Acc=0.9171 | Test Acc=0.5875
Epoch  3/20: Train Acc=0.9808 | Test Acc=0.5958
Epoch  4/20: Train Acc=0.9962 | Test Acc=0.5833
Epoch  5/20: Train Acc=0.99