We used the "Birds_25" dataset for this assignment.
In Parts 1 and 2, we resized the images to 128x128 pixels to keep memory usage manageable.
For Parts 3, 4 and 5, we used 224x224 pixels.
We split the data into training, validation, and test sets: the `train` folder is used for training, and the `valid` folder is split in half into validation and test sets.

In [18]:
#versions

Python        : 3.11.4
OpenCV        : 4.11.0
NumPy         : 1.24.1
scikit-learn  : 1.6.1
scikit-image  : 0.25.2
PyTorch       : 2.5.1+cu121
TorchVision   : 0.20.1+cu121
tqdm          : 4.67.1
Matplotlib    : 3.6.2


In [None]:
# --- PyTorch & TorchVision (CUDA 12.1 build) ---
pip install --index-url https://download.pytorch.org/whl/cu121 \
    torch==2.5.1+cu121 \
    torchvision==0.20.1+cu121

# --- Remaining libraries (exact versions) ---
pip install \
    opencv-python==4.11.0.0 \
    numpy==1.24.1 \
    scikit-learn==1.6.1 \
    scikit-image==0.25.2 \
    tqdm==4.67.1 \
    matplotlib==3.6.2

In [1]:
# Standard library 
import os
import time

# Third-party: general utilities
import cv2
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

# scikit-learn stack
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-image
from skimage.feature import hog

# PyTorch & TorchVision
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms


#### LOADING THE DATASET FOR PART 1 & 2 & BONUS

We load the entire dataset, resized to 128x128 for speed and memory usage (loading it at 256x256 raised a memory error). We chose the full dataset at a lower resolution over a subset at a higher resolution because we observed that it gave better results.
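A rough back-of-the-envelope check shows why resolution matters here. The sketch below assumes 8-bit, 3-channel images as returned by `cv2.imread`, and takes the image counts (30,000 training images, 3,749 + 3,750 validation/test images) from the progress bars in the Part 1 output. `raw_array_gib` is a hypothetical helper, not part of the notebook.

```python
def raw_array_gib(n_images, side, channels=3, bytes_per_value=1):
    """Size in GiB of a dense (n_images, side, side, channels) array."""
    return n_images * side * side * channels * bytes_per_value / 2**30

# Image counts read off the tqdm progress bars (assumption: no unreadable files)
n_total = 30_000 + 3_749 + 3_750

for side in (128, 256):
    print(f"{side}x{side}: {raw_array_gib(n_total, side):.2f} GiB")
```

Doubling the side length quadruples the footprint. The actual peak is likely higher than these figures, because `np.array(X)` copies the Python list of images into a new array while the list is still alive.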

In [2]:
#Load dataset for part 1&2
IMAGE_SIZE = (128, 128)

def load_images_from_folder(folder):
    X, y = [], []
    for label in os.listdir(folder):
        class_path = os.path.join(folder, label)
        for file in os.listdir(class_path):
            img_path = os.path.join(class_path, file)
            img = cv2.imread(img_path)
            if img is not None:
                img = cv2.resize(img, IMAGE_SIZE)
                X.append(img)
                y.append(label)
    return np.array(X), np.array(y)

train_path = "Birds_25/train"
valid_path = "Birds_25/valid"

X_train, y_train = load_images_from_folder(train_path)
X_valid, y_valid_full = load_images_from_folder(valid_path)
X_val, X_test, y_val, y_test = train_test_split(X_valid, y_valid_full, test_size=0.5, stratify=y_valid_full, random_state=42)

### Functions used in Part 1 & 2 & Bonus

In [3]:
#Part1 & 2 & Bonus functions
def extract_color_hist(img):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [8,8,8], [0,180,0,256,0,256])
    return hist.flatten()

def extract_hog(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return hog(gray, pixels_per_cell=(16, 16), cells_per_block=(2, 2), feature_vector=True)

def extract_sift(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kp, des = sift.detectAndCompute(gray, None)
    if des is None:
        return np.zeros(128)
    return np.mean(des, axis=0)

def extract_features(X, method):
    features = []
    for img in tqdm(X):
        if method == 'color_hist':
            features.append(extract_color_hist(img))
        elif method == 'hog':
            features.append(extract_hog(img))
        elif method == 'sift':
            features.append(extract_sift(img))
    return np.array(features)

def train_and_val(X_train, y_train, X_val, y_val, model):
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    acc = accuracy_score(y_val, preds)
    prec = precision_score(y_val, preds, average='macro', zero_division=0)
    rec = recall_score(y_val, preds, average='macro', zero_division=0)
    f1 = f1_score(y_val, preds, average='macro', zero_division=0)
    return acc, prec, rec, f1

def evaluate(model, X, y):
    pred = model.predict(X)
    return (
        accuracy_score(y, pred),
        precision_score(y, pred, average='macro', zero_division=0),
        recall_score(y, pred, average='macro', zero_division=0),
        f1_score(y, pred, average='macro', zero_division=0),
    )

#---------------------

def reduce_features(method, X_train, X_val, y_train=None, n_components=100):
    if method == 'pca':
        reducer = PCA(n_components=n_components)
    elif method == 'selectkbest':
        reducer = SelectKBest(score_func=f_classif, k=n_components)
    else:
        raise ValueError("Unknown method")

    if method == 'selectkbest':
        X_train_red = reducer.fit_transform(X_train, y_train)
    else:
        X_train_red = reducer.fit_transform(X_train)

    X_val_red  = reducer.transform(X_val)
    return X_train_red, X_val_red, reducer 

#---------------------

#Part1 & 2 variables
features_to_try = ['color_hist', 'hog', 'sift']
models_to_try = {
    'SVM': SVC(),
    'RandomForest': RandomForestClassifier(random_state=42),
    'MLP': MLPClassifier(max_iter=500)
}

### PART 1

In this part, we looked at different ways to get features from images and used them with some standard machine learning models.
Features we tried: Color Histograms, Histogram of Oriented Gradients (HOG), and SIFT.
Models we tried: Support Vector Machine (SVM), Random Forest, and Multilayer Perceptron (MLP).
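To make the three descriptors concrete, the arithmetic below works out the feature-vector length each one produces for a 128x128 image with the settings used in this notebook. It assumes scikit-image's default of 9 orientation bins for `hog` (the notebook does not set `orientations`); the function names are illustrative only.

```python
def color_hist_dim(bins=(8, 8, 8)):
    """Length of a flattened 3-D HSV histogram."""
    h, s, v = bins
    return h * s * v

def hog_dim(image_side=128, cell=16, block=2, orientations=9):
    """Length of a HOG vector for a square image with overlapping blocks (stride of one cell)."""
    cells_per_side = image_side // cell           # 128 / 16 = 8
    blocks_per_side = cells_per_side - block + 1  # 8 - 2 + 1 = 7
    return blocks_per_side ** 2 * block ** 2 * orientations

def sift_mean_dim():
    """A mean over SIFT descriptors keeps the 128-D descriptor length."""
    return 128

print(color_hist_dim(), hog_dim(), sift_mean_dim())  # → 512 1764 128
```

Note that averaging SIFT descriptors discards how many keypoints were found and where, which may partly explain its weak results.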

In [4]:
#Part1

for feat in features_to_try:
    X_train_feat = extract_features(X_train, feat)
    X_val_feat   = extract_features(X_val,   feat)
    X_test_feat  = extract_features(X_test,  feat)

    for name, base_model in models_to_try.items():
        #scale only for models that need it
        model = make_pipeline(StandardScaler(), clone(base_model)) \
                if name in ("SVM", "MLP") else clone(base_model)

        model.fit(X_train_feat, y_train)

        val_acc, val_prec, val_rec, val_f1   = evaluate(model, X_val_feat,  y_val)
        test_acc, test_prec, test_rec, test_f1 = evaluate(model, X_test_feat, y_test)

        print(f"{feat} | {name}"
              f" → VAL  A:{val_acc:.4f} P:{val_prec:.4f} R:{val_rec:.4f} F1:{val_f1:.4f}"
              f" | TEST A:{test_acc:.4f} P:{test_prec:.4f} R:{test_rec:.4f} F1:{test_f1:.4f}")

100%|█████████████████████████████████████████████████████████████████████████| 30000/30000 [00:02<00:00, 12503.82it/s]
100%|███████████████████████████████████████████████████████████████████████████| 3749/3749 [00:00<00:00, 12795.44it/s]
100%|███████████████████████████████████████████████████████████████████████████| 3750/3750 [00:00<00:00, 13204.95it/s]


color_hist | SVM → VAL  A:0.4823 P:0.4988 R:0.4823 F1:0.4843 | TEST A:0.4891 P:0.5141 R:0.4891 F1:0.4935
color_hist | RandomForest → VAL  A:0.8634 P:0.8659 R:0.8634 F1:0.8634 | TEST A:0.8547 P:0.8567 R:0.8547 F1:0.8547
color_hist | MLP → VAL  A:0.7399 P:0.7427 R:0.7399 F1:0.7405 | TEST A:0.7296 P:0.7328 R:0.7296 F1:0.7299


100%|███████████████████████████████████████████████████████████████████████████| 30000/30000 [01:24<00:00, 357.03it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3749/3749 [00:10<00:00, 357.29it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3750/3750 [00:10<00:00, 353.43it/s]


hog | SVM → VAL  A:0.3609 P:0.3649 R:0.3609 F1:0.3572 | TEST A:0.3523 P:0.3558 R:0.3523 F1:0.3475
hog | RandomForest → VAL  A:0.2601 P:0.2615 R:0.2601 F1:0.2556 | TEST A:0.2536 P:0.2540 R:0.2536 F1:0.2493
hog | MLP → VAL  A:0.2577 P:0.2567 R:0.2577 F1:0.2565 | TEST A:0.2272 P:0.2291 R:0.2272 F1:0.2278


100%|███████████████████████████████████████████████████████████████████████████| 30000/30000 [01:37<00:00, 308.71it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3749/3749 [00:11<00:00, 315.35it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3750/3750 [00:12<00:00, 302.83it/s]


sift | SVM → VAL  A:0.3035 P:0.2983 R:0.3036 F1:0.2965 | TEST A:0.2971 P:0.2970 R:0.2971 F1:0.2932
sift | RandomForest → VAL  A:0.2507 P:0.2434 R:0.2507 F1:0.2413 | TEST A:0.2416 P:0.2433 R:0.2416 F1:0.2367
sift | MLP → VAL  A:0.2611 P:0.2558 R:0.2611 F1:0.2553 | TEST A:0.2488 P:0.2486 R:0.2488 F1:0.2459


### Part 1 Results

### Part 1 – Raw handcrafted features

| Model | Feature    | Split | Acc        | Prec   | Rec    | F1     |
| ----- | ---------- | ----- | ---------- | ------ | ------ | ------ |
| SVM   | color hist | VAL   | 0.4823     | 0.4988 | 0.4823 | 0.4843 |
| SVM   | color hist | TEST  | 0.4891     | 0.5141 | 0.4891 | 0.4935 |
| SVM   | HOG        | VAL   | 0.3609     | 0.3649 | 0.3609 | 0.3572 |
| SVM   | HOG        | TEST  | 0.3523     | 0.3558 | 0.3523 | 0.3475 |
| SVM   | SIFT       | VAL   | 0.3035     | 0.2983 | 0.3036 | 0.2965 |
| SVM   | SIFT       | TEST  | 0.2971     | 0.2970 | 0.2971 | 0.2932 |
| RF    | color hist | VAL   | 0.8634     | 0.8659 | 0.8634 | 0.8634 |
| RF    | color hist | TEST  | **0.8547** | 0.8567 | 0.8547 | 0.8547 |
| RF    | HOG        | VAL   | 0.2601     | 0.2615 | 0.2601 | 0.2556 |
| RF    | HOG        | TEST  | 0.2536     | 0.2540 | 0.2536 | 0.2493 |
| RF    | SIFT       | VAL   | 0.2507     | 0.2434 | 0.2507 | 0.2413 |
| RF    | SIFT       | TEST  | 0.2416     | 0.2433 | 0.2416 | 0.2367 |
| MLP   | color hist | VAL   | 0.7399     | 0.7427 | 0.7399 | 0.7405 |
| MLP   | color hist | TEST  | 0.7296     | 0.7328 | 0.7296 | 0.7299 |
| MLP   | HOG        | VAL   | 0.2577     | 0.2567 | 0.2577 | 0.2565 |
| MLP   | HOG        | TEST  | 0.2272     | 0.2291 | 0.2272 | 0.2278 |
| MLP   | SIFT       | VAL   | 0.2611     | 0.2558 | 0.2611 | 0.2553 |
| MLP   | SIFT       | TEST  | 0.2488     | 0.2486 | 0.2488 | 0.2459 |

**Best Part 1 result**
`RandomForest + color histogram` – **Test Acc 0.8547**, Prec 0.8567, Rec 0.8547, F1 0.8547.

**Comments**

* Colour information is highly discriminative for these bird classes.
* RandomForest handles the 512-bin histogram superbly; SVM/MLP benefit less.
* Texture/shape-only features (HOG, SIFT-mean) perform poorly across models.
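The recall column mirrors accuracy in nearly every row. That is what macro averaging gives when every class has the same number of samples, which the stratified split approximately ensures. The small pure-Python sketch below uses toy labels (not notebook data) to illustrate the relationship; `macro_recall` is a hypothetical helper that mimics `recall_score(..., average='macro')`.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    hits, support = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)

# Balanced classes: macro recall and accuracy coincide
balanced_true = ["a"] * 4 + ["b"] * 4
balanced_pred = ["a", "a", "a", "b", "b", "b", "a", "a"]

# Imbalanced classes: always predicting the majority class inflates accuracy only
imbalanced_true = ["a"] * 6 + ["b"] * 2
imbalanced_pred = ["a"] * 8

print(accuracy(balanced_true, balanced_pred), macro_recall(balanced_true, balanced_pred))
print(accuracy(imbalanced_true, imbalanced_pred), macro_recall(imbalanced_true, imbalanced_pred))
```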

### PART 2

In this part, we applied two dimensionality-reduction methods, PCA and SelectKBest, to the Part 1 features and trained the same standard machine learning models on the reduced features.

In [5]:
#Part2

print("\n=== PART 2: FEATURE REDUCTION (PCA & SelectKBest) ===")
for feat in features_to_try:
    X_train_feat = extract_features(X_train, feat)
    X_val_feat   = extract_features(X_val,   feat)
    X_test_feat  = extract_features(X_test,  feat)

    # -------- 1. scale BEFORE dimensionality-reduction --------
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train_feat)
    X_val_scaled   = scaler.transform(X_val_feat)
    X_test_scaled  = scaler.transform(X_test_feat)

    n_comp = min(100, X_train_scaled.shape[1])

    # -- PCA --
    X_train_pca, X_val_pca, pca = reduce_features(
        'pca', X_train_scaled, X_val_scaled, n_components=n_comp)
    X_test_pca = pca.transform(X_test_scaled)

    for name, base_model in models_to_try.items():
        model = clone(base_model)
        model.fit(X_train_pca, y_train)
        val_metrics  = evaluate(model, X_val_pca,  y_val)
        test_metrics = evaluate(model, X_test_pca, y_test)
        print(f"PCA | {feat} | {name} → VAL {val_metrics} | TEST {test_metrics}")

    # -- SelectKBest --
    X_train_kb, X_val_kb, kb = reduce_features(
        'selectkbest', X_train_scaled, X_val_scaled, y_train, n_components=n_comp)
    X_test_kb = kb.transform(X_test_scaled)

    for name, base_model in models_to_try.items():
        model = clone(base_model)
        model.fit(X_train_kb, y_train)
        val_metrics  = evaluate(model, X_val_kb,  y_val)
        test_metrics = evaluate(model, X_test_kb, y_test)
        print(f"KBest | {feat} | {name} → VAL {val_metrics} | TEST {test_metrics}")


=== PART 2: FEATURE REDUCTION (PCA & SelectKBest) ===


100%|█████████████████████████████████████████████████████████████████████████| 30000/30000 [00:02<00:00, 12450.20it/s]
100%|███████████████████████████████████████████████████████████████████████████| 3749/3749 [00:00<00:00, 12673.06it/s]
100%|███████████████████████████████████████████████████████████████████████████| 3750/3750 [00:00<00:00, 12498.56it/s]


PCA | color_hist | SVM → VAL (0.45558815684182447, 0.46888496732742824, 0.45561521252796416, 0.456501437947503) | TEST (0.4672, 0.4846156414452627, 0.46720000000000006, 0.46996938467172605)
PCA | color_hist | RandomForest → VAL (0.7959455854894638, 0.7986149178347363, 0.7959284116331097, 0.7964245009893173) | TEST (0.7826666666666666, 0.785090827847431, 0.7826666666666666, 0.7828678387221011)




PCA | color_hist | MLP → VAL (0.6190984262469992, 0.6233526967313631, 0.6191230425055929, 0.6187049263301689) | TEST (0.5933333333333334, 0.5946348817551257, 0.5933333333333333, 0.592242827798504)
KBest | color_hist | SVM → VAL (0.38783675646839155, 0.41761989445736125, 0.3878335570469798, 0.39356752061376915) | TEST (0.3864, 0.4168012694641683, 0.3864000000000001, 0.3924319939831774)
KBest | color_hist | RandomForest → VAL (0.7967457988797012, 0.8006116485896917, 0.7967266219239373, 0.7974758764403621) | TEST (0.7858666666666667, 0.7889542959002622, 0.7858666666666665, 0.7862688334478087)




KBest | color_hist | MLP → VAL (0.5398773006134969, 0.5443093037077852, 0.5398747203579418, 0.5404985665232754) | TEST (0.5450666666666667, 0.5514478552103956, 0.5450666666666667, 0.5469690509704026)


100%|███████████████████████████████████████████████████████████████████████████| 30000/30000 [01:22<00:00, 361.65it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3749/3749 [00:10<00:00, 363.45it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3750/3750 [00:10<00:00, 363.64it/s]


PCA | hog | SVM → VAL (0.36729794611896505, 0.3681230504830001, 0.36732348993288594, 0.36407602716451615) | TEST (0.3525333333333333, 0.35330240012527847, 0.35253333333333337, 0.3485918752735299)
PCA | hog | RandomForest → VAL (0.29101093624966656, 0.28964130154612056, 0.2910514541387025, 0.28713335835583165) | TEST (0.2677333333333333, 0.26487010574748593, 0.2677333333333333, 0.2623671370539299)
PCA | hog | MLP → VAL (0.29047746065617497, 0.28574084108665637, 0.2905109619686801, 0.2853721901041897) | TEST (0.2792, 0.27632838560815626, 0.2792, 0.27442189385342175)
KBest | hog | SVM → VAL (0.20832221925846892, 0.20741942318545648, 0.20835794183445192, 0.2041001929418394) | TEST (0.20586666666666667, 0.20722003873055703, 0.2058666666666667, 0.2016435692999753)
KBest | hog | RandomForest → VAL (0.19338490264070418, 0.19230498529844742, 0.19343355704697987, 0.1898292671889895) | TEST (0.18693333333333334, 0.1847153107300893, 0.1869333333333333, 0.18308630918433647)
KBest | hog | MLP → VAL 

100%|███████████████████████████████████████████████████████████████████████████| 30000/30000 [01:31<00:00, 327.91it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3749/3749 [00:11<00:00, 327.87it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3750/3750 [00:11<00:00, 326.74it/s]


PCA | sift | SVM → VAL (0.3019471859162443, 0.2965584900270573, 0.30195973154362415, 0.29496141585127506) | TEST (0.2936, 0.293053414128187, 0.2936, 0.28941021409590145)
PCA | sift | RandomForest → VAL (0.23019471859162444, 0.22448338382269395, 0.23017807606263982, 0.22191772430614448) | TEST (0.2272, 0.2227943549011901, 0.22719999999999999, 0.22035506856874038)
PCA | sift | MLP → VAL (0.25606828487596694, 0.2525284188825261, 0.2560733780760626, 0.25164908790087975) | TEST (0.23866666666666667, 0.2385806795966235, 0.23866666666666667, 0.2375066808323777)
KBest | sift | SVM → VAL (0.2795412109895972, 0.2732238204169349, 0.27954541387024606, 0.27257367073423194) | TEST (0.2808, 0.2805076524325657, 0.2808, 0.2768984049758962)
KBest | sift | RandomForest → VAL (0.2328620965590824, 0.2220992484957962, 0.23285369127516778, 0.2230359717251616) | TEST (0.22773333333333334, 0.2242214496336504, 0.22773333333333334, 0.22174061640884574)
KBest | sift | MLP → VAL (0.25073352894105094, 0.24436153745

### Part 2

| Model | Feature    | Reduction | Split | Acc        | Prec   | Rec    | F1     |
| ----- | ---------- | --------- | ----- | ---------- | ------ | ------ | ------ |
| SVM   | color hist | PCA       | VAL   | 0.4556     | 0.4689 | 0.4556 | 0.4565 |
| SVM   | color hist | PCA       | TEST  | 0.4672     | 0.4846 | 0.4672 | 0.4700 |
| SVM   | color hist | KBest     | VAL   | 0.3878     | 0.4176 | 0.3878 | 0.3936 |
| SVM   | color hist | KBest     | TEST  | 0.3864     | 0.4168 | 0.3864 | 0.3924 |
| SVM   | HOG        | PCA       | VAL   | 0.3673     | 0.3681 | 0.3673 | 0.3641 |
| SVM   | HOG        | PCA       | TEST  | 0.3525     | 0.3533 | 0.3525 | 0.3486 |
| SVM   | HOG        | KBest     | VAL   | 0.2083     | 0.2074 | 0.2084 | 0.2041 |
| SVM   | HOG        | KBest     | TEST  | 0.2059     | 0.2072 | 0.2059 | 0.2016 |
| SVM   | SIFT       | PCA       | VAL   | 0.3019     | 0.2966 | 0.3020 | 0.2950 |
| SVM   | SIFT       | PCA       | TEST  | 0.2936     | 0.2931 | 0.2936 | 0.2894 |
| SVM   | SIFT       | KBest     | VAL   | 0.2795     | 0.2732 | 0.2795 | 0.2726 |
| SVM   | SIFT       | KBest     | TEST  | 0.2808     | 0.2805 | 0.2808 | 0.2769 |
| RF    | color hist | PCA       | VAL   | 0.7959     | 0.7986 | 0.7959 | 0.7964 |
| RF    | color hist | PCA       | TEST  | 0.7827     | 0.7851 | 0.7827 | 0.7829 |
| RF    | color hist | KBest     | VAL   | 0.7967     | 0.8006 | 0.7967 | 0.7975 |
| RF    | color hist | KBest     | TEST  | **0.7859** | 0.7890 | 0.7859 | 0.7863 |
| RF    | HOG        | PCA       | VAL   | 0.2910     | 0.2896 | 0.2911 | 0.2871 |
| RF    | HOG        | PCA       | TEST  | 0.2677     | 0.2649 | 0.2677 | 0.2624 |
| RF    | HOG        | KBest     | VAL   | 0.1934     | 0.1923 | 0.1934 | 0.1898 |
| RF    | HOG        | KBest     | TEST  | 0.1869     | 0.1847 | 0.1869 | 0.1831 |
| RF    | SIFT       | PCA       | VAL   | 0.2302     | 0.2245 | 0.2302 | 0.2219 |
| RF    | SIFT       | PCA       | TEST  | 0.2272     | 0.2228 | 0.2272 | 0.2204 |
| RF    | SIFT       | KBest     | VAL   | 0.2329     | 0.2221 | 0.2329 | 0.2230 |
| RF    | SIFT       | KBest     | TEST  | 0.2277     | 0.2242 | 0.2277 | 0.2217 |
| MLP   | color hist | PCA       | VAL   | 0.6191     | 0.6234 | 0.6191 | 0.6187 |
| MLP   | color hist | PCA       | TEST  | 0.5933     | 0.5946 | 0.5933 | 0.5922 |
| MLP   | color hist | KBest     | VAL   | 0.5399     | 0.5443 | 0.5399 | 0.5405 |
| MLP   | color hist | KBest     | TEST  | 0.5451     | 0.5514 | 0.5451 | 0.5470 |
| MLP   | HOG        | PCA       | VAL   | 0.2905     | 0.2857 | 0.2905 | 0.2854 |
| MLP   | HOG        | PCA       | TEST  | 0.2792     | 0.2763 | 0.2792 | 0.2744 |
| MLP   | HOG        | KBest     | VAL   | 0.1675     | 0.1645 | 0.1675 | 0.1647 |
| MLP   | HOG        | KBest     | TEST  | 0.1693     | 0.1679 | 0.1693 | 0.1671 |
| MLP   | SIFT       | PCA       | VAL   | 0.2561     | 0.2525 | 0.2561 | 0.2516 |
| MLP   | SIFT       | PCA       | TEST  | 0.2387     | 0.2386 | 0.2387 | 0.2375 |
| MLP   | SIFT       | KBest     | VAL   | 0.2507     | 0.2444 | 0.2508 | 0.2451 |
| MLP   | SIFT       | KBest     | TEST  | 0.2427     | 0.2374 | 0.2427 | 0.2382 |

**Best Part 2 result**
`RandomForest + color histogram + KBest` – **Test Acc 0.7859**, Prec 0.7890, Rec 0.7859, F1 0.7863.

**Comments**

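* Dimensionality reduction helps **none** of the colour-hist models reach Part 1’s peak. KBest edges out PCA only for Random Forest (0.7859 vs 0.7827 test accuracy); for SVM and MLP, PCA retains more signal.
* For HOG, PCA gives small gains for every model (largest for MLP, 0.2272 → 0.2792 test accuracy), while KBest hurts. For SIFT, both reductions score slightly below the raw feature. All of these results remain low.
* Trees likely suffer because PCA mixes the colour bins into components that are harder to split on.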
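The scale, reduce, classify sequence used in this part can also be expressed as a single scikit-learn pipeline, which guarantees that the scaler and reducer are fitted on training data only. The sketch below is an alternative formulation, not the code that produced the table; it runs on small synthetic data, and `make_reduced_model` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def make_reduced_model(reduction, n_components, classifier):
    """Chain StandardScaler -> reducer -> classifier into one estimator."""
    if reduction == "pca":
        reducer = PCA(n_components=n_components)
    elif reduction == "selectkbest":
        reducer = SelectKBest(score_func=f_classif, k=n_components)
    else:
        raise ValueError(f"Unknown reduction: {reduction}")
    return make_pipeline(StandardScaler(), reducer, classifier)

# Synthetic stand-in data: 200 samples, 30 features, 3 classes
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 30))
y_demo = rng.integers(0, 3, size=200)
X_demo[:, 0] += y_demo  # make one feature informative

for reduction in ("pca", "selectkbest"):
    model = make_reduced_model(reduction, 10, RandomForestClassifier(random_state=0))
    model.fit(X_demo[:150], y_demo[:150])  # labels are passed on to SelectKBest
    print(reduction, model.score(X_demo[150:], y_demo[150:]))
```

A single estimator like this can also be handed to cross-validation utilities without refitting the scaler by hand.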

### Bonus: Training with all extracted features at once

In [6]:
#PART-1 BONUS: train / test on ALL features at once
print("\n=== ALL FEATURES COMBINED ===")

feat_arrays = {f: {
        'train': extract_features(X_train, f),
        'val'  : extract_features(X_val,   f),
        'test' : extract_features(X_test,  f)}
    for f in features_to_try}

X_train_all = np.hstack([feat_arrays[f]['train'] for f in features_to_try])
X_val_all   = np.hstack([feat_arrays[f]['val']   for f in features_to_try])
X_test_all  = np.hstack([feat_arrays[f]['test']  for f in features_to_try])

scaler = StandardScaler()
X_train_all = scaler.fit_transform(X_train_all)
X_val_all   = scaler.transform(X_val_all)
X_test_all  = scaler.transform(X_test_all)

for name, base_model in models_to_try.items():
    model = clone(base_model)
    model.fit(X_train_all, y_train)

    val_metrics  = evaluate(model, X_val_all,  y_val)
    test_metrics = evaluate(model, X_test_all, y_test)

    print(f"ALL | {name}"
          f" → VAL {val_metrics} | TEST {test_metrics}")



=== ALL FEATURES COMBINED ===


100%|█████████████████████████████████████████████████████████████████████████| 30000/30000 [00:02<00:00, 12853.91it/s]
100%|███████████████████████████████████████████████████████████████████████████| 3749/3749 [00:00<00:00, 12496.63it/s]
100%|███████████████████████████████████████████████████████████████████████████| 3750/3750 [00:00<00:00, 12395.79it/s]
100%|███████████████████████████████████████████████████████████████████████████| 30000/30000 [01:23<00:00, 360.89it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3749/3749 [00:10<00:00, 363.53it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3750/3750 [00:10<00:00, 362.36it/s]
100%|███████████████████████████████████████████████████████████████████████████| 30000/30000 [01:34<00:00, 317.26it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 3749/3749 [00:11<00:00, 326.96it/s]
100%|███████████████████████████████████

ALL | SVM → VAL (0.5222726060282742, 0.5281307982236062, 0.5222872483221477, 0.5225300534365175) | TEST (0.5152, 0.5211580813079298, 0.5152, 0.5149782806843413)
ALL | RandomForest → VAL (0.6497732728727661, 0.6522608164710885, 0.649786129753915, 0.6477732555419344) | TEST (0.6464, 0.647913621599414, 0.6464, 0.6431685982738545)
ALL | MLP → VAL (0.5038676980528141, 0.5046204590431036, 0.5038944071588367, 0.503565808948682) | TEST (0.5056, 0.5067068877027162, 0.5056, 0.5055219705923668)


### Bonus - All features concatenated

| Model | Split | Acc        | Prec   | Rec    | F1     |
| ----- | ----- | ---------- | ------ | ------ | ------ |
| SVM   | VAL   | 0.5223     | 0.5281 | 0.5223 | 0.5225 |
| SVM   | TEST  | 0.5152     | 0.5212 | 0.5152 | 0.5150 |
| RF    | VAL   | 0.6498     | 0.6523 | 0.6498 | 0.6478 |
| RF    | TEST  | **0.6464** | 0.6479 | 0.6464 | 0.6432 |
| MLP   | VAL   | 0.5039     | 0.5046 | 0.5039 | 0.5036 |
| MLP   | TEST  | 0.5056     | 0.5067 | 0.5056 | 0.5055 |

**Best Bonus result**
`RandomForest (all features)` – **Test Acc 0.6464**, Prec 0.6479, Rec 0.6464, F1 0.6432.

**Comment**
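Concatenating heterogeneous features boosts SVM a bit (0.5152 vs 0.4891 test accuracy) but hurts RF and MLP compared with colour-hist alone; for those two models, the many weakly informative HOG and SIFT dimensions appear to outweigh the added signal.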

### Comparison of Part 1, 2 and Bonus

### SVM

| Setting                     | Test Acc   |
| --------------------------- | ---------- |
| Part 1 – color hist         | 0.4891     |
| Part 1 – HOG                | 0.3523     |
| Part 1 – SIFT               | 0.2971     |
| Part 2 – color hist + PCA   | 0.4672     |
| Part 2 – color hist + KBest | 0.3864     |
| Part 2 – HOG + PCA          | 0.3525     |
| Part 2 – HOG + KBest        | 0.2059     |
| Part 2 – SIFT + PCA         | 0.2936     |
| Part 2 – SIFT + KBest       | 0.2808     |
| Bonus – all features        | **0.5152** |

### Random Forest

| Setting                     | Test Acc   |
| --------------------------- | ---------- |
| Part 1 – color hist         | **0.8547** |
| Part 1 – HOG                | 0.2536     |
| Part 1 – SIFT               | 0.2416     |
| Part 2 – color hist + PCA   | 0.7827     |
| Part 2 – color hist + KBest | 0.7859     |
| Part 2 – HOG + PCA          | 0.2677     |
| Part 2 – HOG + KBest        | 0.1869     |
| Part 2 – SIFT + PCA         | 0.2272     |
| Part 2 – SIFT + KBest       | 0.2277     |
| Bonus – all features        | 0.6464     |

### MLP

| Setting                     | Test Acc   |
| --------------------------- | ---------- |
| Part 1 – color hist         | **0.7296** |
| Part 1 – HOG                | 0.2272     |
| Part 1 – SIFT               | 0.2488     |
| Part 2 – color hist + PCA   | 0.5933     |
| Part 2 – color hist + KBest | 0.5451     |
| Part 2 – HOG + PCA          | 0.2792     |
| Part 2 – HOG + KBest        | 0.1693     |
| Part 2 – SIFT + PCA         | 0.2387     |
| Part 2 – SIFT + KBest       | 0.2427     |
| Bonus – all features        | 0.5056     |

---

### General comments

* **Colour histogram + RandomForest (Part 1)** is the overall champion (85 % accuracy).
* Dimensionality reduction **rarely beats** the raw feature when the raw feature is already informative and compact (colour hist).
* **Feature fusion** (Bonus) helps SVM a little but harms RF/MLP—highlighting the need for careful normalisation/selection when merging heterogeneous descriptors.
* Future gains likely require hyper-parameter tuning, better encodings for HOG/SIFT (e.g., BoVW/VLAD), or moving to deep features.
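To illustrate the BoVW idea mentioned above, which replaces the SIFT mean with a histogram of visual-word assignments, here is a minimal NumPy sketch. It assumes a codebook has already been learned (for example by k-means over training descriptors); both the codebook and the descriptors below are random stand-ins, and `bovw_histogram` is a hypothetical helper.

```python
import numpy as np

def bovw_histogram(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    k = codebook.shape[0]
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(k)  # image with no keypoints
    # Squared Euclidean distance between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 128))     # stand-in for learned visual words
descriptors = rng.normal(size=(50, 128))  # stand-in for one image's SIFT descriptors
hist = bovw_histogram(descriptors, codebook)
print(hist.shape, hist.sum())
```

Unlike the mean descriptor, the histogram keeps information about which kinds of local patches occur in the image and how often.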

## PART 3, 4, 5

For Parts 3 and 4, we used `resnet18`, `mobilenet_v3_small`, and `shufflenet_v2_x0_5` (with `IMAGENET1K_V1` weights for Part 3).
For Part 5, we tried the following custom models:

| Model                 | Convolutional backbone                                                                                                                             | Head                                                    |
| --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------- |
| **custom\_small**     | 3 blocks → <br>• Conv 3×3 @32 → BN → ReLU → MaxPool 2×2 <br>• Conv 3×3 @64 → BN → ReLU → MaxPool 2×2 <br>• Conv 3×3 @128 → BN → ReLU → MaxPool 2×2 | Global-Avg-Pool → Dropout 0.30 → FC 256 → ReLU → FC 25  |
| **custom\_medium**    | 4 blocks (filters 32 → 64 → 128 → 256) – each: Conv 3×3 → BN → ReLU → MaxPool 2×2                                                                  | Global-Avg-Pool → Dropout 0.30 → FC 512 → ReLU → FC 25  |
| **custom\_deep\_bn**  | 5 blocks (32 / 64 / 128 / 256 / 512) with BN after every conv                                                                                      | Global-Avg-Pool → Dropout 0.40 → FC 512 → ReLU → FC 25  |
| **custom\_ultra\_bn** | 7 blocks (32 / 64 / 128 / 256 / 512 / 512 / 512) – deeper & wider, BN on all layers                                                                | Global-Avg-Pool → Dropout 0.50 → FC 1024 → ReLU → FC 25 |

*All conv layers use padding = 1, keeping spatial size before each 2×2 pooling step.*
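As an illustration of how a row of this table maps to code, here is a minimal PyTorch sketch of `custom_small`. It follows the table (three Conv 3×3 → BN → ReLU → MaxPool 2×2 blocks with padding 1, then Global-Avg-Pool → Dropout 0.30 → FC 256 → ReLU → FC 25). The class name `CustomSmall`, the helper `conv_block`, and details the table does not specify, such as using `nn.AdaptiveAvgPool2d` for the global pooling, are assumptions rather than the notebook's actual implementation.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Conv 3x3 (padding 1) -> BatchNorm -> ReLU -> MaxPool 2x2."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class CustomSmall(nn.Module):
    def __init__(self, num_classes=25):
        super().__init__()
        # Three blocks with 32, 64 and 128 filters
        self.features = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling
            nn.Flatten(),
            nn.Dropout(0.30),
            nn.Linear(128, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

model = CustomSmall().eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 3, 224, 224))
print(logits.shape)
```

With a 224x224 input, each pooling step halves the spatial size (224 → 112 → 56 → 28), so the global average pool reduces a 128×28×28 map to a 128-D vector.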


We used the Adam optimizer with a learning rate of 0.0001.
We resized the images to 224x224 and used the full dataset.

In [6]:
#load data for part3
data_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225])
])

train_dataset = datasets.ImageFolder("Birds_25/train", transform=data_transforms)
valid_dataset = datasets.ImageFolder("Birds_25/valid", transform=data_transforms)

# Split valid into val and test
val_size = len(valid_dataset) // 2
test_size = len(valid_dataset) - val_size
val_dataset, test_dataset = random_split(
    valid_dataset,
    [val_size, test_size],
    generator=torch.Generator().manual_seed(42)
)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)
test_loader = DataLoader(test_dataset, batch_size=32)
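Note that `random_split` shuffles without regard to class, whereas Parts 1 and 2 used a stratified split, so the validation and test sets here need not have identical class proportions. If matching proportions were wanted, one option is to compute stratified indices and wrap them in `torch.utils.data.Subset`. The sketch below is a pure-Python illustration under that assumption; `stratified_halves` is a hypothetical helper, and the toy label list stands in for `valid_dataset.targets`.

```python
import random
from collections import defaultdict

def stratified_halves(labels, seed=42):
    """Split sample indices into two halves with near-equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    first, second = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        half = len(indices) // 2
        first.extend(indices[:half])
        second.extend(indices[half:])
    return first, second

toy_labels = [0] * 10 + [1] * 6 + [2] * 4
val_idx, test_idx = stratified_halves(toy_labels)

# With the real data, the index lists could be used as, for example:
#   val_dataset  = torch.utils.data.Subset(valid_dataset, val_idx)
#   test_dataset = torch.utils.data.Subset(valid_dataset, test_idx)
print(len(val_idx), len(test_idx))
```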


In [7]:
# Part 3&4&5 functions
def fine_tune(model_name, train_loader, val_loader, num_classes, pretrained=True):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    weights = 'IMAGENET1K_V1' if pretrained else None

    if model_name == "resnet18":
        model = models.resnet18(weights=weights)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    elif model_name == "mobilenet_v3_small":
        model = models.mobilenet_v3_small(weights=weights)
        model.classifier[3] = nn.Linear(model.classifier[3].in_features, num_classes)
    elif model_name == "shufflenet_v2_x0_5":
        model = models.shufflenet_v2_x0_5(weights=weights)
        model.fc = nn.Linear(model.fc.in_features, num_classes)

    else:
        raise ValueError(f"Unknown model name: {model_name}")

    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-4)

    train_losses, val_losses = [], []
    val_accs, val_precs, val_recs = [], [], []
    best_loss = float('inf')
    
    best_path = f"{model_name}_{'pretrained' if pretrained else 'scratch'}_best.pth"
    
    patience, trigger = 5, 0  #early stopping

    for epoch in range(10):
        # ------- TRAIN -------
        model.train()
        running_loss = 0.0

        for inputs, labels in tqdm(train_loader,
                                   desc=f'E{epoch+1:02d} ▸ train',
                                   leave=False):
            inputs, labels = inputs.to(device), labels.to(device)

            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        avg_train_loss = running_loss / len(train_loader)
        train_losses.append(avg_train_loss)

        # ------- VALIDATE -------
        # evaluate_loader switches to eval mode and returns the mean batch loss
        # plus macro-averaged metrics in a single pass over the validation set
        avg_val_loss, acc, prec, rec, _ = evaluate_loader(model, val_loader, criterion)

        val_losses.append(avg_val_loss)
        val_accs.append(acc)
        val_precs.append(prec)
        val_recs.append(rec)

        print(f'Epoch {epoch+1}: Train Loss: {avg_train_loss:.4f}  |  Val Loss: {avg_val_loss:.4f}')

        # ------- EARLY-STOP -------
        if avg_val_loss < best_loss:
            best_loss = avg_val_loss
            torch.save(model.state_dict(), best_path)
            trigger = 0
        else:
            trigger += 1
            if trigger >= patience:
                print('Early stopping')
                break


    # ------- LOSS CURVE --------
    plt.figure(figsize=(6,4))
    plt.plot(train_losses, label='Train loss')
    plt.plot(val_losses,   label='Val loss')
    plt.xlabel('Epoch'); plt.ylabel('Loss'); plt.legend()
    plt.title(f'{model_name} - loss')
    plt.savefig(f"{model_name}_{'pretrained' if pretrained else 'scratch'}_loss.png")
    plt.clf()

    # ------- METRIC CURVES -------
    plt.figure(figsize=(6,4))
    plt.plot(val_accs,  label='Accuracy')
    plt.plot(val_precs, label='Precision')
    plt.plot(val_recs,  label='Recall')
    plt.xlabel('Epoch'); plt.ylabel('Score'); plt.legend()
    plt.title(f'{model_name} - val metrics')
    plt.savefig(f"{model_name}_{'pretrained' if pretrained else 'scratch'}_metrics.png")
    plt.close()  # close the figure instead of leaving an empty one for the notebook to echo

    
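    # Restore the best checkpoint (lowest validation loss) before testing.
    # weights_only=True loads only tensors and avoids the torch.load FutureWarning.
    model.load_state_dict(torch.load(best_path, map_location=device, weights_only=True))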
    test_loss, test_acc, test_prec, test_rec, test_f1 = evaluate_loader(
            model, test_loader, criterion)

    print(f"[{model_name} | {'pre' if pretrained else 'scratch'}] "
          f"TEST  Loss={test_loss:.4f}  "
          f"Acc={test_acc:.4f}  "
          f"P={test_prec:.4f}  R={test_rec:.4f}  F1={test_f1:.4f}")

    
    
def evaluate_loader(model, loader, criterion):
    model.eval()
    device = next(model.parameters()).device

    tot_loss, y_true, y_pred = 0.0, [], []
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            tot_loss += criterion(out, y).item()

            y_true.extend(y.cpu().numpy())
            y_pred.extend(out.argmax(1).cpu().numpy())

    acc  = accuracy_score (y_true, y_pred)
    prec = precision_score(y_true, y_pred, average='macro', zero_division=0)
    rec  = recall_score   (y_true, y_pred, average='macro', zero_division=0)
    f1   = f1_score       (y_true, y_pred, average='macro', zero_division=0)

    return tot_loss / len(loader), acc, prec, rec, f1
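
The early-stopping rule in the training loop can be looked at in isolation. The sketch below is a minimal, framework-free version of the same logic; `EarlyStopper` is a hypothetical helper, not part of the notebook, and `patience=5` is an assumption for `fine_tune` (the value is only visible in `train_custom`, although the pre-trained ResNet-18 log, with its best epoch at 4 and a stop at epoch 9, is consistent with it).

```python
class EarlyStopper:
    """Patience-based early stopping on a loss that should decrease."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.trigger = 0  # epochs since the last improvement

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.trigger = 0
            return False
        self.trigger += 1
        return self.trigger >= self.patience


# Validation losses copied from the pre-trained ResNet-18 log in Part 3.
resnet18_val = [0.1469, 0.1028, 0.1064, 0.0851, 0.1165,
                0.1056, 0.1356, 0.1054, 0.1348]

stopper = EarlyStopper(patience=5)  # assumed patience, see the note above
stopped_at = None
for epoch, loss in enumerate(resnet18_val, start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(f"best epoch: {stopper.best_epoch}, stopped at: {stopped_at}")
```

The checkpoint written at the best epoch is the one that gets evaluated on the test set, so the epochs after it only cost time.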


In [8]:
#Part3
for model_name in ["resnet18", "mobilenet_v3_small", "shufflenet_v2_x0_5"]:
    print(f"\nTraining {model_name}")
    fine_tune(model_name, train_loader, val_loader, num_classes=25, pretrained=True)


Training resnet18

Epoch 1: Train Loss: 0.3919  |  Val Loss: 0.1469
Epoch 2: Train Loss: 0.0644  |  Val Loss: 0.1028
Epoch 3: Train Loss: 0.0288  |  Val Loss: 0.1064
Epoch 4: Train Loss: 0.0293  |  Val Loss: 0.0851
Epoch 5: Train Loss: 0.0239  |  Val Loss: 0.1165
Epoch 6: Train Loss: 0.0268  |  Val Loss: 0.1056
Epoch 7: Train Loss: 0.0177  |  Val Loss: 0.1356
Epoch 8: Train Loss: 0.0187  |  Val Loss: 0.1054
Epoch 9: Train Loss: 0.0148  |  Val Loss: 0.1348
Early stopping

[resnet18 | pre] TEST  Loss=0.0940  Acc=0.9699  P=0.9704  R=0.9701  F1=0.9700

Training mobilenet_v3_small

Epoch 1: Train Loss: 0.8392  |  Val Loss: 0.2190
Epoch 2: Train Loss: 0.2002  |  Val Loss: 0.1334
Epoch 3: Train Loss: 0.1167  |  Val Loss: 0.0996
Epoch 4: Train Loss: 0.0718  |  Val Loss: 0.1030
Epoch 5: Train Loss: 0.0477  |  Val Loss: 0.0792
Epoch 6: Train Loss: 0.0374  |  Val Loss: 0.0839
Epoch 7: Train Loss: 0.0284  |  Val Loss: 0.0757
Epoch 8: Train Loss: 0.0240  |  Val Loss: 0.0899
Epoch 9: Train Loss: 0.0191  |  Val Loss: 0.0789
Epoch 10: Train Loss: 0.0178  |  Val Loss: 0.0795

[mobilenet_v3_small | pre] TEST  Loss=0.0724  Acc=0.9771  P=0.9776  R=0.9771  F1=0.9772

Training shufflenet_v2_x0_5

Epoch 1: Train Loss: 2.4090  |  Val Loss: 1.3623
Epoch 2: Train Loss: 1.0178  |  Val Loss: 0.6807
Epoch 3: Train Loss: 0.6196  |  Val Loss: 0.4783
Epoch 4: Train Loss: 0.4677  |  Val Loss: 0.3927
Epoch 5: Train Loss: 0.3778  |  Val Loss: 0.3406
Epoch 6: Train Loss: 0.3202  |  Val Loss: 0.3108
Epoch 7: Train Loss: 0.2776  |  Val Loss: 0.2859
Epoch 8: Train Loss: 0.2460  |  Val Loss: 0.2656
Epoch 9: Train Loss: 0.2161  |  Val Loss: 0.2490
Epoch 10: Train Loss: 0.1873  |  Val Loss: 0.2442

[shufflenet_v2_x0_5 | pre] TEST  Loss=0.2472  Acc=0.9211  P=0.9227  R=0.9217  F1=0.9217

### Part 3 Results

| Model                  | Best val loss | Test acc   | Fit pattern                                                                                              | Key note                                                                                                   |
| ---------------------- | ------------- | ---------- | -------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| **ResNet-18**          | ≈ 0.085       | **97.0 %** | **Slight over-fit** – train loss dives < 0.03 while val loss oscillates 0.08–0.14; early-stop at epoch 9 | Good accuracy, but gap hints at mild memorisation; would benefit from lower LR or stronger regularisation. |
| **MobileNet V3-small** | **0.075**     | **97.7 %** | **Well-balanced** – train & val curves fall together and flatten; acc/prec/rec rise steadily to ≈ 0.98   | Best accuracy-to-size ratio; curves show stable convergence with minimal over-fit.                         |
| **ShuffleNet V2 0.5×** | 0.24          | 92.1 %     | **Under-fit** – losses drop smoothly but remain higher; small gap between train & val                    | Lightweight backbone needs more epochs or wider (1.0×) variant to reach parity.                            |

#### Observations

* **Transfer-learning effectiveness:** After a single epoch, validation loss is already well below the chance-level cross-entropy for 25 classes (ln 25 ≈ 3.22): 0.15 for ResNet-18, 0.22 for MobileNet and 1.36 for ShuffleNet. The pre-trained features are reused successfully.
* **Capacity vs. data size:** ResNet (≈ 11 M params) fits quickly and even risks over-fitting; ShuffleNet (< 1 M) is still improving after 10 epochs and has not yet caught up on the 25-class bird dataset; MobileNet (≈ 2.5 M) hits the sweet spot.
* **Metric consistency:** Accuracy, precision and recall curves almost overlap for every model, which suggests that the classes are balanced and that no single class dominates performance.
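
A useful reference point for the transfer-learning observation is the loss of a classifier that guesses uniformly over the 25 classes, which is ln 25 under cross-entropy. The short sketch below compares the first-epoch validation losses from the Part 3 logs with that baseline; the dictionary is just a hand-copied summary of those log lines.

```python
import math

num_classes = 25
chance_loss = math.log(num_classes)  # cross-entropy of a uniform prediction

# First-epoch validation losses copied from the Part 3 logs.
first_epoch_val = {
    "resnet18": 0.1469,
    "mobilenet_v3_small": 0.2190,
    "shufflenet_v2_x0_5": 1.3623,
}

print(f"chance-level loss: {chance_loss:.3f}")
for name, loss in first_epoch_val.items():
    print(f"{name:20s} {loss:.4f}  ({loss / chance_loss:.0%} of chance)")
```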

### Loss And Metric Graphs For Part 3

ResNet-18:

<img src="resnet18_pretrained_loss.png" alt="resnet18 loss" width="400">

<img src="resnet18_pretrained_metrics.png" alt="resnet18 metrics" width="400">

MobileNet V3-small:

<img src="mobilenet_v3_small_pretrained_loss.png" alt="mobilenet loss" width="400">

<img src="mobilenet_v3_small_pretrained_metrics.png" alt="mobilenet metrics" width="400">

ShuffleNet V2 0.5×:

<img src="shufflenet_v2_x0_5_pretrained_loss.png" alt="shufflenet loss" width="400">

<img src="shufflenet_v2_x0_5_pretrained_metrics.png" alt="shufflenet metrics" width="400">

## PART 4

In [9]:
#Part4
for model_name in ["resnet18", "mobilenet_v3_small", "shufflenet_v2_x0_5"]:
    print(f"\nTraining {model_name} from scratch")
    fine_tune(model_name, train_loader, val_loader, num_classes=25, pretrained=False)


Training resnet18 from scratch

Epoch 1: Train Loss: 1.9645  |  Val Loss: 1.5323
Epoch 2: Train Loss: 1.1455  |  Val Loss: 1.0387
Epoch 3: Train Loss: 0.7901  |  Val Loss: 0.9047
Epoch 4: Train Loss: 0.5786  |  Val Loss: 0.7053
Epoch 5: Train Loss: 0.4149  |  Val Loss: 0.6535
Epoch 6: Train Loss: 0.3051  |  Val Loss: 0.6652
Epoch 7: Train Loss: 0.2089  |  Val Loss: 0.6484
Epoch 8: Train Loss: 0.1362  |  Val Loss: 0.7337
Epoch 9: Train Loss: 0.1125  |  Val Loss: 0.7550
Epoch 10: Train Loss: 0.0940  |  Val Loss: 0.9196

[resnet18 | scratch] TEST  Loss=0.6337  Acc=0.8152  P=0.8362  R=0.8169  F1=0.8162

Training mobilenet_v3_small from scratch

Epoch 1: Train Loss: 2.5286  |  Val Loss: 2.2155
Epoch 2: Train Loss: 2.0527  |  Val Loss: 1.9430
Epoch 3: Train Loss: 1.8016  |  Val Loss: 1.7066
Epoch 4: Train Loss: 1.6163  |  Val Loss: 1.5688
Epoch 5: Train Loss: 1.4543  |  Val Loss: 1.4868
Epoch 6: Train Loss: 1.3221  |  Val Loss: 1.4160
Epoch 7: Train Loss: 1.2022  |  Val Loss: 1.3614
Epoch 8: Train Loss: 1.0937  |  Val Loss: 1.2594
Epoch 9: Train Loss: 0.9948  |  Val Loss: 1.2955
Epoch 10: Train Loss: 0.9055  |  Val Loss: 1.2785

[mobilenet_v3_small | scratch] TEST  Loss=1.2697  Acc=0.5981  P=0.6108  R=0.5989  F1=0.6016

Training shufflenet_v2_x0_5 from scratch

Epoch 1: Train Loss: 2.8826  |  Val Loss: 2.6289
Epoch 2: Train Loss: 2.5423  |  Val Loss: 2.4117
Epoch 3: Train Loss: 2.3069  |  Val Loss: 2.1829
Epoch 4: Train Loss: 2.1206  |  Val Loss: 2.0291
Epoch 5: Train Loss: 1.9813  |  Val Loss: 1.8946
Epoch 6: Train Loss: 1.8739  |  Val Loss: 1.8138
Epoch 7: Train Loss: 1.7726  |  Val Loss: 1.7310
Epoch 8: Train Loss: 1.6870  |  Val Loss: 1.6778
Epoch 9: Train Loss: 1.6168  |  Val Loss: 1.6248
Epoch 10: Train Loss: 1.5450  |  Val Loss: 1.5723

[shufflenet_v2_x0_5 | scratch] TEST  Loss=1.6068  Acc=0.4829  P=0.4892  R=0.4849  F1=0.4801

### Part 4 (vs Part 3)

| Model                  | **Pre-trained (P3)**            | **Scratch (P4)**          | Δ Accuracy | Curve behaviour                                                                                                                                                                                |
| ---------------------- | ------------------------------- | ------------------------- | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **ResNet-18**          | **97.0 %** acc, val loss ≈ 0.09 | 81.5 % acc, val loss ≈ 0.65 | **-15.5 pp** | P3: train loss drops to ≈ 0.03 by epoch 3, small val oscillations → slight over-fit.<br>P4: steady descent, then val loss plateaus near 0.65 (epochs 5 to 7) and climbs while train loss keeps falling → over-fit sets in. |
| **MobileNet V3-small** | **97.7 %** acc, val loss ≈ 0.08 | 59.8 % acc, val loss ≈ 1.3  | **-37.9 pp** | P3: train & val curves almost overlap; smooth ↑ to 0.98 acc.<br>P4: steady fall in loss, but val loss is still ≈ 1.3 at epoch 10 (train ≈ 0.9); metrics climb only to ≈ 0.6 ⇒ under-fit. |
| **ShuffleNet V2-0.5×** | **92.1 %** acc, val loss ≈ 0.25 | 48.3 % acc, val loss ≈ 1.6  | **-43.8 pp** | P3: slow but monotonic improvement to 0.92 acc.<br>P4: both losses fall but remain high; metrics < 0.5 even after 10 epochs ⇒ clear under-fit. |
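
The Δ Accuracy column can be recomputed directly from the `TEST` lines of the Part 3 and Part 4 logs. The sketch below does that in percentage points; the `test_acc` dictionary is a hand-copied summary of those lines, not something the notebook produces.

```python
# (pre-trained, scratch) test accuracies copied from the TEST lines of the logs.
test_acc = {
    "resnet18": (0.9699, 0.8152),
    "mobilenet_v3_small": (0.9771, 0.5981),
    "shufflenet_v2_x0_5": (0.9211, 0.4829),
}

# Difference in percentage points: negative means training from scratch is worse.
delta_pp = {
    name: round((scratch - pre) * 100, 1)
    for name, (pre, scratch) in test_acc.items()
}

for name, delta in delta_pp.items():
    print(f"{name:20s} {delta:+.1f} pp")
```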

#### Key observations

* **Transfer learning payoff**

  * Pre-trained weights give a **+16 – 44 pp** boost in test accuracy and reach useful performance in < 3 epochs (ResNet, MobileNet).
  * From-scratch models are still learning basic features after 10 epochs; MobileNet & ShuffleNet haven’t converged.

* **Capacity vs. data size**

  * Our bird dataset (25 classes, 1.5 k images each) is large enough to fine-tune, but **not large enough to train deep CNNs from random init** without many more epochs/regularisation.
  * Smaller backbones (ShuffleNet-0.5×) suffer most: insufficient representational power plus no pre-learned features.

* **Curve patterns**

  * **Pre-trained**: rapid drop in both train & val loss, early stabilisation; accuracy / precision / recall curves almost identical → balanced class performance.
  * **Scratch**: losses descend slowly; val curves lag behind train, signalling under-fit at same epoch budget. ResNet eventually starts over-fitting (val loss ticks up after epoch 7).
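
The over-fitting onset mentioned for the from-scratch ResNet-18 can be read off the logged losses: the epoch with the lowest validation loss, and how the train/validation gap evolves after it. The sketch below uses the values from the Part 4 log; the variable names are illustrative only.

```python
# Per-epoch losses copied from the ResNet-18 from-scratch log in Part 4.
train = [1.9645, 1.1455, 0.7901, 0.5786, 0.4149,
         0.3051, 0.2089, 0.1362, 0.1125, 0.0940]
val = [1.5323, 1.0387, 0.9047, 0.7053, 0.6535,
       0.6652, 0.6484, 0.7337, 0.7550, 0.9196]

# Epoch (1-based) with the lowest validation loss: the checkpoint that is kept.
best_epoch = min(range(len(val)), key=val.__getitem__) + 1

# Generalisation gap per epoch: positive once validation loss exceeds train loss.
gaps = [v - t for t, v in zip(train, val)]

print(f"best epoch: {best_epoch}")
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch:02d}  gap={gap:+.4f}")
```

A gap that keeps growing after the best epoch, as it does here, is the usual signature of over-fitting rather than under-fitting.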

*(Precision, recall and F1 are macro-averaged over the 25 classes; accuracy is the plain fraction of correct predictions.)*
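
To make the averaging explicit, here is a small pure-Python sketch of macro precision and recall, mirroring `average='macro', zero_division=0` as used in `evaluate_loader`. The helper names and the toy labels are made up for illustration. It also shows why near-identical accuracy and recall curves hint at balanced classes: macro recall equals accuracy when every class has the same number of samples, and drifts away from it when they do not.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_recall(y_true, y_pred):
    """Unweighted mean of per-class precision and recall (0 when undefined)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        actual = sum(1 for t in y_true if t == c)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Balanced toy labels: two samples per class.
y_true_bal, y_pred_bal = [0, 0, 1, 1], [0, 1, 1, 1]
acc_bal = accuracy(y_true_bal, y_pred_bal)
_, rec_bal = macro_precision_recall(y_true_bal, y_pred_bal)

# Imbalanced toy labels: six samples of class 0, two of class 1.
y_true_imb = [0] * 6 + [1] * 2
y_pred_imb = [0] * 6 + [0, 1]
acc_imb = accuracy(y_true_imb, y_pred_imb)
prec_imb, rec_imb = macro_precision_recall(y_true_imb, y_pred_imb)

print(f"balanced:   acc={acc_bal:.3f}  macro recall={rec_bal:.3f}")
print(f"imbalanced: acc={acc_imb:.3f}  macro recall={rec_imb:.3f}  macro precision={prec_imb:.3f}")
```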

### Loss And Metric Graphs For Part 4

ResNet-18:

<img src="resnet18_scratch_loss.png" alt="resnet18 loss" width="400">

<img src="resnet18_scratch_metrics.png" alt="resnet18 metrics" width="400">

MobileNet V3-small:

<img src="mobilenet_v3_small_scratch_loss.png" alt="mobilenet loss" width="400">

<img src="mobilenet_v3_small_scratch_metrics.png" alt="mobilenet metrics" width="400">

ShuffleNet V2 0.5×:

<img src="shufflenet_v2_x0_5_scratch_loss.png" alt="shufflenet loss" width="400">

<img src="shufflenet_v2_x0_5_scratch_metrics.png" alt="shufflenet metrics" width="400">

## PART 5

In [10]:
#functions
class CustomCNN(nn.Module):
    def __init__(self, cfg, num_classes):
        super().__init__()
        in_c = 3
        layers = []
        for f in cfg["filters"]:
            layers += [
                nn.Conv2d(in_c, f, 3, padding=1),
                nn.BatchNorm2d(f) if cfg.get("batch_norm", True) else nn.Identity(),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2)
            ]
            in_c = f
        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        fc_in = in_c
        self.classifier = nn.Sequential(
            nn.Dropout(cfg.get("dropout", 0.0)),
            nn.Linear(fc_in, cfg.get("fc", 256)),
            nn.ReLU(inplace=True),
            nn.Linear(cfg.get("fc", 256), num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x).flatten(1)
        return self.classifier(x)


def train_custom(cfg, train_loader, val_loader, test_loader, num_classes):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = CustomCNN(cfg, num_classes).to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-4)

    train_losses, val_losses = [], []
    val_accs, val_precs, val_recs = [], [], []

    best_loss, trigger, patience = float("inf"), 0, 5
    tag = cfg["name"]
    best_path = f"{tag}_best.pth"

    for epoch in range(10):
        # ----- train -----
        model.train()
        tl = 0.0
        for x, y in tqdm(train_loader, desc=f"E{epoch+1:02d} ▸ train", leave=False):
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            tl += loss.item()
        train_losses.append(tl / len(train_loader))

        # ----- validate (loss and metrics in a single pass) -----
        # evaluate_loader sets eval mode and returns the mean batch loss
        # along with the metrics, so val_loader is only iterated once.
        vl, acc, prec, rec, _ = evaluate_loader(model, val_loader, criterion)
        val_losses.append(vl)
        val_accs.append(acc)
        val_precs.append(prec)
        val_recs.append(rec)

        print(f"[{tag}] epoch {epoch+1:02d}  train={train_losses[-1]:.4f}  "
              f"val={vl:.4f}  acc={acc:.4f}  prec={prec:.4f}  rec={rec:.4f}")

        if vl < best_loss:
            best_loss, trigger = vl, 0
            torch.save(model.state_dict(), best_path)
        else:
            trigger += 1
            if trigger >= patience:
                print(f"[{tag}] early stop")
                break

    # ----- loss curve -----
    plt.plot(train_losses, label="train")
    plt.plot(val_losses, label="val")
    plt.title(f"{tag} loss")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.savefig(f"{tag}_loss.png")
    plt.close()  # close the figure instead of leaving an empty one for the notebook to echo

    # ----- metric curves -----
    plt.plot(val_accs, label="accuracy")
    plt.plot(val_precs, label="precision")
    plt.plot(val_recs, label="recall")
    plt.title(f"{tag} metrics")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("score")
    plt.savefig(f"{tag}_metrics.png")
    plt.close()  # close the figure instead of leaving an empty one for the notebook to echo

    # ----- test -----
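    # weights_only=True loads only tensors and avoids the torch.load FutureWarning.
    model.load_state_dict(torch.load(best_path, map_location=device, weights_only=True))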
    test_loss, test_acc, test_prec, test_rec, test_f1 = evaluate_loader(model, test_loader, criterion)
    print(f"[{tag}] TEST loss={test_loss:.4f}  acc={test_acc:.4f}  "
          f"P={test_prec:.4f}  R={test_rec:.4f}  F1={test_f1:.4f}")


In [11]:
# ---------- Part 5 ----------
cfg_list = [
    # "batch_norm" is omitted, so CustomCNN uses its default (True) for all three configs.
    {"name": "custom_small",   "filters": [32, 64, 128],           "fc": 256, "dropout": 0.3},
    {"name": "custom_medium",  "filters": [32, 64, 128, 256],      "fc": 512, "dropout": 0.3},
    {"name": "custom_deep_bn", "filters": [32, 64, 128, 256, 512], "fc": 512, "dropout": 0.4},
]
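
Each block in `CustomCNN` keeps the spatial size in its 3x3 convolution (padding 1) and halves it in `MaxPool2d(2)`, so the number of entries in `filters` decides how small the feature map is before global average pooling. The sketch below traces that for the three configurations, assuming the 224x224 input stated at the top of the notebook; `feature_map_sizes` is a hypothetical helper written only for this illustration.

```python
def feature_map_sizes(input_size, num_blocks):
    """Spatial size after each conv block (3x3 conv with padding 1, then 2x2 max-pool)."""
    sizes = [input_size]
    for _ in range(num_blocks):
        sizes.append(sizes[-1] // 2)  # MaxPool2d(2) floors odd sizes by default
    return sizes


filters_by_cfg = {
    "custom_small": [32, 64, 128],
    "custom_medium": [32, 64, 128, 256],
    "custom_deep_bn": [32, 64, 128, 256, 512],
}

sizes_by_cfg = {
    name: feature_map_sizes(224, len(filters))
    for name, filters in filters_by_cfg.items()
}

for name, sizes in sizes_by_cfg.items():
    print(f"{name:15s} {' -> '.join(str(s) for s in sizes)}")
```

The shallowest network averages over a 28x28 map of 128 channels, while the deepest of the three averages over 7x7 with 512 channels, which gives the classifier a much richer descriptor.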

In [12]:
print(f"\n*** training {cfg_list[0]['name']} ***")
train_custom(cfg_list[0], train_loader, val_loader, test_loader, num_classes=25)


*** training custom_small ***

[custom_small] epoch 01  train=2.9397  val=2.6309  acc=0.2285  prec=0.2427  rec=0.2248
[custom_small] epoch 02  train=2.5771  val=2.4408  acc=0.2536  prec=0.2747  rec=0.2531
[custom_small] epoch 03  train=2.4395  val=2.3067  acc=0.3040  prec=0.3391  rec=0.3020
[custom_small] epoch 04  train=2.3657  val=2.3223  acc=0.2872  prec=0.3028  rec=0.2844
[custom_small] epoch 05  train=2.3085  val=2.1957  acc=0.3328  prec=0.3495  rec=0.3310
[custom_small] epoch 06  train=2.2611  val=2.1355  acc=0.3403  prec=0.3541  rec=0.3378
[custom_small] epoch 07  train=2.2260  val=2.0981  acc=0.3595  prec=0.3646  rec=0.3567
[custom_small] epoch 08  train=2.1909  val=2.0456  acc=0.3685  prec=0.3808  rec=0.3657
[custom_small] epoch 09  train=2.1538  val=2.0199  acc=0.3883  prec=0.3963  rec=0.3859
[custom_small] epoch 10  train=2.1315  val=1.9978  acc=0.3915  prec=0.4036  rec=0.3893

[custom_small] TEST loss=2.0293  acc=0.3619  P=0.3732  R=0.3633  F1=0.3599

In [13]:
print(f"\n*** training {cfg_list[1]['name']} ***")
train_custom(cfg_list[1], train_loader, val_loader, test_loader, num_classes=25)


*** training custom_medium ***

[custom_medium] epoch 01  train=2.6670  val=2.3232  acc=0.3000  prec=0.3139  rec=0.2985
[custom_medium] epoch 02  train=2.2600  val=2.1127  acc=0.3552  prec=0.3711  rec=0.3546
[custom_medium] epoch 03  train=2.1232  val=2.0084  acc=0.3739  prec=0.4285  rec=0.3701
[custom_medium] epoch 04  train=2.0020  val=1.9408  acc=0.3947  prec=0.4429  rec=0.3924
[custom_medium] epoch 05  train=1.9211  val=1.7956  acc=0.4403  prec=0.4521  rec=0.4367
[custom_medium] epoch 06  train=1.8473  val=1.7407  acc=0.4600  prec=0.4631  rec=0.4566
[custom_medium] epoch 07  train=1.7834  val=1.7187  acc=0.4675  prec=0.4868  rec=0.4657
[custom_medium] epoch 08  train=1.7392  val=1.6346  acc=0.4848  prec=0.4960  rec=0.4810
[custom_medium] epoch 09  train=1.6836  val=1.6016  acc=0.4923  prec=0.5028  rec=0.4896
[custom_medium] epoch 10  train=1.6278  val=1.5059  acc=0.5269  prec=0.5407  rec=0.5246

[custom_medium] TEST loss=1.5295  acc=0.5208  P=0.5369  R=0.5221  F1=0.5105

In [14]:
print(f"\n*** training {cfg_list[2]['name']} ***")
train_custom(cfg_list[2], train_loader, val_loader, test_loader, num_classes=25)


*** training custom_deep_bn ***

[custom_deep_bn] epoch 01  train=2.4326  val=2.0471  acc=0.3619  prec=0.3689  rec=0.3590
[custom_deep_bn] epoch 02  train=1.9252  val=1.7928  acc=0.4432  prec=0.4875  rec=0.4390
[custom_deep_bn] epoch 03  train=1.7023  val=1.7530  acc=0.4491  prec=0.5001  rec=0.4441
[custom_deep_bn] epoch 04  train=1.5246  val=1.4821  acc=0.5464  prec=0.5953  rec=0.5448
[custom_deep_bn] epoch 05  train=1.3906  val=1.3485  acc=0.5757  prec=0.6484  rec=0.5745
[custom_deep_bn] epoch 06  train=1.2682  val=1.1804  acc=0.6416  prec=0.6694  rec=0.6383
[custom_deep_bn] epoch 07  train=1.1601  val=1.2162  acc=0.6307  prec=0.6867  rec=0.6302
[custom_deep_bn] epoch 08  train=1.0791  val=0.9996  acc=0.6925  prec=0.7157  rec=0.6897
[custom_deep_bn] epoch 09  train=1.0029  val=0.9492  acc=0.7152  prec=0.7402  rec=0.7129
[custom_deep_bn] epoch 10  train=0.9267  val=1.0547  acc=0.6792  prec=0.7262  rec=0.6779

[custom_deep_bn] TEST loss=0.9136  acc=0.7181  P=0.7439  R=0.7197  F1=0.7182

In [16]:
ultra_cfg = {
    "name": "custom_ultra_bn",
    "filters": [32, 64, 128, 256, 512, 512, 512],
    "fc": 1024,
    "dropout": 0.5,
    "batch_norm": True
}

In [17]:
train_custom(ultra_cfg, train_loader, val_loader, test_loader, num_classes=25)

                                                                                                                       

[custom_ultra_bn] epoch 01  train=1.9295  val=1.2776  acc=0.5907  prec=0.6141  rec=0.5871

[custom_ultra_bn] epoch 02  train=1.0813  val=0.9085  acc=0.7147  prec=0.7380  rec=0.7143

[custom_ultra_bn] epoch 03  train=0.7451  val=0.7303  acc=0.7736  prec=0.8052  rec=0.7724

[custom_ultra_bn] epoch 04  train=0.5487  val=0.5490  acc=0.8328  prec=0.8362  rec=0.8324

[custom_ultra_bn] epoch 05  train=0.4013  val=0.6539  acc=0.7925  prec=0.8310  rec=0.7927

[custom_ultra_bn] epoch 06  train=0.2968  val=0.5982  acc=0.8261  prec=0.8451  rec=0.8251

[custom_ultra_bn] epoch 07  train=0.2174  val=0.6359  acc=0.8256  prec=0.8394  rec=0.8251

[custom_ultra_bn] epoch 08  train=0.1738  val=0.4475  acc=0.8733  prec=0.8783  rec=0.8736

[custom_ultra_bn] epoch 09  train=0.1374  val=0.5176  acc=0.8581  prec=0.8675  rec=0.8593

[custom_ultra_bn] epoch 10  train=0.1129  val=0.4929  acc=0.8693  prec=0.8804  rec=0.8692


  model.load_state_dict(torch.load(best_path))


[custom_ultra_bn] TEST loss=0.4049  acc=0.8819  P=0.8884  R=0.8817  F1=0.8832
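
The `torch.load(best_path)` line above suggests that a best checkpoint is restored before testing. Which criterion `train_custom` uses to pick that checkpoint is not visible in this section, so the following is only a minimal sketch: it parses the logged epochs for `custom_ultra_bn` (copied from the output above) and checks which epoch wins under two plausible criteria, lowest validation loss and highest validation accuracy. The variable names and the regular expression are illustrative assumptions, not part of the notebook's code.

```python
import re

# Epoch log copied from the custom_ultra_bn run above.
log = """\
[custom_ultra_bn] epoch 01  train=1.9295  val=1.2776  acc=0.5907  prec=0.6141  rec=0.5871
[custom_ultra_bn] epoch 02  train=1.0813  val=0.9085  acc=0.7147  prec=0.7380  rec=0.7143
[custom_ultra_bn] epoch 03  train=0.7451  val=0.7303  acc=0.7736  prec=0.8052  rec=0.7724
[custom_ultra_bn] epoch 04  train=0.5487  val=0.5490  acc=0.8328  prec=0.8362  rec=0.8324
[custom_ultra_bn] epoch 05  train=0.4013  val=0.6539  acc=0.7925  prec=0.8310  rec=0.7927
[custom_ultra_bn] epoch 06  train=0.2968  val=0.5982  acc=0.8261  prec=0.8451  rec=0.8251
[custom_ultra_bn] epoch 07  train=0.2174  val=0.6359  acc=0.8256  prec=0.8394  rec=0.8251
[custom_ultra_bn] epoch 08  train=0.1738  val=0.4475  acc=0.8733  prec=0.8783  rec=0.8736
[custom_ultra_bn] epoch 09  train=0.1374  val=0.5176  acc=0.8581  prec=0.8675  rec=0.8593
[custom_ultra_bn] epoch 10  train=0.1129  val=0.4929  acc=0.8693  prec=0.8804  rec=0.8692
"""

# Hypothetical pattern matching the log format shown above.
pattern = re.compile(r"epoch (\d+)\s+train=([\d.]+)\s+val=([\d.]+)\s+acc=([\d.]+)")

# Each row: (epoch, train_loss, val_loss, val_acc)
rows = [(int(m[1]), float(m[2]), float(m[3]), float(m[4]))
        for m in pattern.finditer(log)]

best_by_loss = min(rows, key=lambda r: r[2])  # lowest validation loss
best_by_acc = max(rows, key=lambda r: r[3])   # highest validation accuracy

print("best epoch by val loss:", best_by_loss[0])
print("best epoch by val acc :", best_by_acc[0])
```

For this run both criteria select epoch 8 (val loss 0.4475, val acc 0.8733), so the choice of criterion would not change which weights are evaluated on the test set. Training loss keeps falling after that epoch while validation loss does not improve, which is the usual sign of over-fitting.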



### Part 5

| Model                 | Params\* | Best Val Acc | Test Acc | Convergence notes                                                                                                                             |
| --------------------- | -------- | ------------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| **custom\_small**     | ≈ 0.5 M  | 0.39         | 0.36     | Very slow learning; loss still > 2.0 after 10 epochs. Under-capacity for 25-class task.                                                       |
| **custom\_medium**    | ≈ 1.5 M  | 0.53         | 0.52     | Steady but shallow improvement; begins to capture some class structure.                                                                       |
| **custom\_deep\_bn**  | ≈ 3.5 M  | 0.71         | 0.72     | Depth + BN clearly help: val loss falls below 1.0 and test metrics reach \~72-74 %. Minor over-fit after epoch 9.                              |
| **custom\_ultra\_bn** | ≈ 6 M    | **0.87**     | **0.88** | Adds two extra 512-channel blocks & bigger FC. Rapid drop to < 0.5 val loss, peaks \~88 % macro metrics. Best trade-off inside custom family. |

\*rough estimate from channel sizes.
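
A rough estimate of this kind can be reproduced with a few lines of arithmetic. The sketch below counts only convolution weights and biases for the `custom_ultra_bn` filter list. It assumes 3x3 kernels, an RGB input (3 channels), and one convolution per block; none of these are confirmed in this section, and batch-norm and fully connected parameters are left out. `conv_param_count` is a hypothetical helper, not a function from the notebook.

```python
def conv_param_count(filters, in_channels=3, kernel_size=3, bias=True):
    """Count weights (+ biases) of a plain stack of conv layers.

    Assumes one conv per entry in `filters`, each with a square
    `kernel_size` x `kernel_size` kernel (an assumption, see lead-in).
    """
    total = 0
    c_in = in_channels
    for c_out in filters:
        total += kernel_size * kernel_size * c_in * c_out  # kernel weights
        if bias:
            total += c_out                                 # one bias per output channel
        c_in = c_out
    return total


# Filter list taken from ultra_cfg above.
ultra_filters = [32, 64, 128, 256, 512, 512, 512]
n_conv = conv_param_count(ultra_filters)
print(f"conv parameters: {n_conv / 1e6:.2f} M")  # conv parameters: 6.29 M
```

Under these assumptions the convolutional stack alone accounts for about 6.3 M parameters, in line with the ≈ 6 M in the table. The exact total depends on the classifier head, which is why the figure is only a rough estimate.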

#### Key observations

* **Capacity drives accuracy** – each step up in model size lifts best val accuracy by roughly 14-18 pp (0.39 → 0.53 → 0.71 → 0.87), with “ultra” surpassing 85 %.
* **Batch-norm + deeper stacks** are crucial; without them (small / medium) training is slow and under-fits.
* **custom\_ultra\_bn vs. pretrained baselines**

  * Still \~10 pp below MobileNet V3-small pretrained (≈ 98 %), but it **narrows the gap** to ImageNet transfer using only in-domain learning.
  * Shows that a purpose-built, moderately deep network can approach good accuracy when transfer weights are unavailable.
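
The size of each step can be checked directly from the table. The sketch below uses the best validation accuracies and the `custom_ultra_bn` test accuracy as reported there, plus the approximate 98 % quoted for the pretrained MobileNet V3-small; the variable names are illustrative.

```python
# Best validation accuracy per custom model, copied from the table above.
best_val_acc = {
    "custom_small": 0.39,
    "custom_medium": 0.53,
    "custom_deep_bn": 0.71,
    "custom_ultra_bn": 0.87,
}

names = list(best_val_acc)

# Gain in percentage points between consecutive model sizes.
gains_pp = [round((best_val_acc[b] - best_val_acc[a]) * 100)
            for a, b in zip(names, names[1:])]
print(gains_pp)  # [14, 18, 16]

# Remaining gap between custom_ultra_bn (test acc) and the pretrained baseline.
pretrained_acc = 0.98   # approximate value quoted in the text
ultra_test_acc = 0.88   # from the table
gap_pp = round((pretrained_acc - ultra_test_acc) * 100)
print(gap_pp)  # 10
```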
  
  
### Loss And Metric Graphs For Part 5

Small:

<img src="custom_small_loss.png" alt="small" width="400">

<img src="custom_small_metrics_metrics.png" alt="small" width="400">

Medium:

<img src="custom_medium_loss.png" alt="med" width="400">

<img src="custom_medium_metrics.png" alt="med" width="400">

Deep:

<img src="custom_deep_bn_loss.png" alt="deep" width="400">

<img src="custom_deep_bn_metrics.png" alt="deep" width="400">

Ultra Deep:

<img src="custom_ultra_bn_loss.png" alt="ultra" width="400">

<img src="custom_ultra_bn_metrics_metrics.png" alt="ultra" width="400">


  
# OVERALL

Across the project we moved from classic computer-vision pipelines to deep learning and saw clear differences in accuracy between the approaches:

1. **Hand-crafted features (Parts 1 & 2):** colour histograms with a Random Forest peaked at \~85 % test accuracy; all other feature/model combinations stayed ≤ 73 %. Neither dimensionality reduction nor feature fusion beat the raw colour-histogram baseline.

2. **Pre-trained CNNs (Part 3):** fine-tuning ImageNet models raised performance sharply: MobileNet V3-small reached \~98 % accuracy with minimal training.

3. **Same CNNs from scratch (Part 4):** without transfer learning, accuracy fell sharply (ResNet-18 ≈ 81 %, MobileNet ≈ 60 %, ShuffleNet ≈ 48 %), confirming the value of pre-training.

4. **Custom CNNs (Part 5):** progressively deeper scratch-built nets improved from 36 % (small) to 88 % (ultra) test accuracy. Our “custom\_ultra\_bn” narrowed the gap to pre-trained models but still trailed the pre-trained MobileNet V3-small by \~10 pp.

**Bottom line:** colour histograms + trees are strong among shallow methods, but pre-trained deep networks were the strongest option in our experiments; deep models trained from scratch need substantial depth (and likely more data / augmentation) to compete.