# Week 7 - Convolutional Neural Networks (Image Classification) - **Solutions**

This notebook contains **one possible set of solutions** for the Week 7 lab.

We will:

- Load the **CIFAR-10** dataset using 🤗 `datasets` and prepare PyTorch `DataLoader`s.
- Build a **PCA + Logistic Regression** baseline and evaluate it (accuracy, precision, recall, F1, confusion matrix).
- Implement a **simple CNN** that barely beats (or even underperforms) the baseline.
- Implement a **stronger CNN** that clearly outperforms the baseline.
- Train a CNN on **data-augmented images** and inspect augmentations.
- Add an **advanced CNN feature (Global Average Pooling)** and compare its performance.


In [None]:
import math
import random
from collections import defaultdict

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix

from datasets import load_dataset
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


## 1. Loading CIFAR-10 and Visualizing Samples

We will use the **CIFAR-10** dataset from 🤗 `datasets`.  
Each example is a `32×32` RGB image from one of 10 classes.

In [None]:
# Load CIFAR-10 from Hugging Face Datasets
cifar10 = load_dataset("cifar10")

# For faster experiments, we can optionally subsample
MAX_TRAIN = 20000   # out of 50k
MAX_TEST = 4000     # out of 10k

train_split = cifar10["train"].shuffle(seed=42).select(range(MAX_TRAIN))
test_split = cifar10["test"].shuffle(seed=42).select(range(MAX_TEST))

class_labels = cifar10["train"].features["label"].names
print("Classes:", class_labels)

# Standard per-channel normalization values for CIFAR-10
mean = [0.4914, 0.4822, 0.4465]
std = [0.2470, 0.2435, 0.2616]

basic_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std),
])

class HFCIFAR10(Dataset):
    def __init__(self, hf_split, transform=None):
        self.hf_split = hf_split
        self.transform = transform

    def __len__(self):
        return len(self.hf_split)

    def __getitem__(self, idx):
        example = self.hf_split[idx]
        img = example["img"]  # PIL Image
        label = example["label"]
        if self.transform is not None:
            img = self.transform(img)
        return img, label

train_dataset = HFCIFAR10(train_split, transform=basic_transform)
test_dataset = HFCIFAR10(test_split, transform=basic_transform)

BATCH_SIZE = 128

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)

len(train_dataset), len(test_dataset)


In [None]:
# Visualize a batch of training images (unnormalized for display)
inv_transform = transforms.Normalize(
    mean=[-m/s for m, s in zip(mean, std)],
    std=[1/s for s in std],
)

def show_batch(loader, n_images=16):
    images, labels = next(iter(loader))
    images = images[:n_images]
    labels = labels[:n_images]

    images = inv_transform(images)  # roughly undo normalization

    # Smallest square grid that fits n_images; unused cells are hidden below
    grid_size = math.ceil(math.sqrt(n_images))
    fig, axes = plt.subplots(grid_size, grid_size, figsize=(6, 6), squeeze=False)
    for i, ax in enumerate(axes.flatten()):
        if i >= n_images:
            ax.axis("off")
            continue
        img = images[i].permute(1, 2, 0).numpy()
        img = np.clip(img, 0, 1)
        ax.imshow(img)
        ax.set_title(class_labels[labels[i].item()], fontsize=8)
        ax.axis("off")
    plt.tight_layout()
    plt.show()

show_batch(train_loader, n_images=16)
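
The `inv_transform` above reuses `transforms.Normalize` with `mean=-m/s` and `std=1/s`. A minimal sketch of why that undoes the original normalization, in plain Python on a single made-up RGB pixel (the `normalize` helper and the pixel values are illustrative and not part of the lab code):

```python
# Per-channel CIFAR-10 statistics, copied from the notebook.
mean = [0.4914, 0.4822, 0.4465]
std = [0.2470, 0.2435, 0.2616]

def normalize(pixel, mean, std):
    """Apply (x - m) / s to each channel of one RGB pixel."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]

# Parameters of the inverse transform: Normalize(mean=-m/s, std=1/s).
inv_mean = [-m / s for m, s in zip(mean, std)]
inv_std = [1 / s for s in std]

pixel = [0.2, 0.5, 0.9]                      # hypothetical pixel in [0, 1], as ToTensor produces
z = normalize(pixel, mean, std)              # forward normalization
restored = normalize(z, inv_mean, inv_std)   # same formula, inverse parameters

# (z - (-m/s)) / (1/s) = z * s + m = x, so the pixel is recovered
print("normalized:", z)
print("restored  :", restored)
```

The recovery is exact up to floating-point rounding, which is why `show_batch` still clips the result to `[0, 1]` before plotting.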


## 2. PCA + Logistic Regression Baseline (Solution)

To build a strong but simple baseline:

1. **Flatten** each image to a vector of size `32×32×3 = 3072`.
2. Fit **PCA** on training vectors (e.g., 100 components).
3. Train **Logistic Regression** on the PCA features.
4. Evaluate using accuracy, precision, recall, F1 and the confusion matrix.

In [None]:
def evaluate_classification(y_true, y_pred):
    """Compute and print accuracy, macro precision/recall/F1, and confusion matrix."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)
    cm = confusion_matrix(y_true, y_pred)
    print(f"Accuracy : {acc:.4f}")
    print(f"Precision: {prec:.4f}")
    print(f"Recall   : {rec:.4f}")
    print(f"F1-score : {f1:.4f}")
    print("Confusion matrix (rows=true, cols=pred):")
    print(cm)
    return {
        "accuracy": acc,
        "precision": prec,
        "recall": rec,
        "f1": f1,
        "cm": cm,
    }


def pca_logistic_baseline(train_dataset, test_dataset, n_components=100, max_iter=1000):
    # Convert datasets to numpy arrays
    def dataset_to_numpy(ds):
        X_list, y_list = [], []
        for img, y in DataLoader(ds, batch_size=256):
            # undo normalization for PCA stability (optional)
            img = inv_transform(img)
            X_list.append(img.view(img.size(0), -1).numpy())
            y_list.append(y.numpy())
        X = np.concatenate(X_list, axis=0)
        y = np.concatenate(y_list, axis=0)
        return X, y

    X_train, y_train = dataset_to_numpy(train_dataset)
    X_test, y_test = dataset_to_numpy(test_dataset)

    print("Train X shape:", X_train.shape)
    print("Test  X shape:", X_test.shape)

    # 1) Fit PCA
    pca = PCA(n_components=n_components, random_state=42)
    X_train_pca = pca.fit_transform(X_train)
    X_test_pca = pca.transform(X_test)

    # 2) Fit Logistic Regression on PCA features
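    # `multi_class` is deprecated in recent scikit-learn releases; with the
    # lbfgs solver a multinomial (softmax) model is already fit on multiclass targets.
    clf = LogisticRegression(
        max_iter=max_iter,
        solver="lbfgs",
    )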
    clf.fit(X_train_pca, y_train)

    # 3) Evaluate
    y_pred = clf.predict(X_test_pca)
    metrics = evaluate_classification(y_test, y_pred)
    return metrics

pca_metrics = pca_logistic_baseline(train_dataset, test_dataset, n_components=100)
print("\nPCA + Logistic Regression baseline accuracy:", pca_metrics["accuracy"])
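
`evaluate_classification` reports macro-averaged precision, recall and F1: each metric is computed per class and then averaged with equal weight per class. A sketch of that computation from a confusion matrix in plain Python, assuming the same convention as above (rows are true classes, columns are predictions). The `macro_metrics` helper and the 3-class matrix are made up for illustration:

```python
def macro_metrics(cm):
    """Accuracy and macro precision/recall/F1 from a confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0   # mirrors zero_division=0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical 3-class confusion matrix (10 samples per true class).
toy_cm = [
    [9, 1, 0],
    [2, 6, 2],
    [1, 3, 6],
]
toy_metrics = macro_metrics(toy_cm)
for name, value in toy_metrics.items():
    print(f"{name:9s}: {value:.4f}")
```

Note that macro F1 is the mean of the per-class F1 scores, not the F1 of the macro precision and macro recall, so it can differ slightly from both.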


## 3. Simple CNN (Solution)

We now implement a small CNN that may **struggle** to significantly beat the baseline:

- Two convolutional layers with small numbers of filters.
- A small fully-connected head.
- Trained for only a few epochs.

This is intentionally *under-powered* so we can appreciate why a good baseline is important.

In [None]:
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 32x32 -> 16x16
        x = self.pool(F.relu(self.conv2(x)))   # 16x16 -> 8x8
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


def train_epoch(model, loader, optimizer, criterion):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)
        preds = outputs.argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return running_loss / total, correct / total


def evaluate_model(model, loader, criterion=None):
    model.eval()
    all_preds = []
    all_labels = []
    running_loss = 0.0
    total = 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            if criterion is not None:
                loss = criterion(outputs, labels)
                running_loss += loss.item() * inputs.size(0)
                total += labels.size(0)
            preds = outputs.argmax(dim=1)
            all_preds.append(preds.cpu())
            all_labels.append(labels.cpu())

    all_preds = torch.cat(all_preds).numpy()
    all_labels = torch.cat(all_labels).numpy()
    metrics = evaluate_classification(all_labels, all_preds)
    if criterion is not None and total > 0:
        metrics["loss"] = running_loss / total
    return metrics


def train_model(model, train_loader, test_loader, epochs=5, lr=1e-3):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(1, epochs + 1):
        train_loss, train_acc = train_epoch(model, train_loader, optimizer, criterion)
        print(f"Epoch {epoch:02d} | train_loss={train_loss:.4f} | train_acc={train_acc:.4f}")

    print("\nFinal evaluation on test set:")
    test_metrics = evaluate_model(model, test_loader, criterion)
    print("Test accuracy:", test_metrics["accuracy"])
    return test_metrics

simple_cnn = SimpleCNN(num_classes=len(class_labels))
simple_metrics = train_model(simple_cnn, train_loader, test_loader, epochs=3, lr=1e-3)
print("\nSimple CNN test accuracy:", simple_metrics["accuracy"], "(baseline was ~", pca_metrics["accuracy"], ")")
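
In `train_epoch` and `evaluate_model`, each batch loss is multiplied by `inputs.size(0)` before being summed. `nn.CrossEntropyLoss` returns the batch mean by default, so this weighting recovers a per-sample average even when the last batch is smaller (with 20,000 training images and a batch size of 128, the final batch holds 32 images). A small sketch with made-up loss values shows the difference; the function names are illustrative:

```python
def mean_of_batch_means(batches):
    """Naive average: every batch counts equally, regardless of its size."""
    return sum(loss for loss, _ in batches) / len(batches)

def sample_weighted_mean(batches):
    """Average used in the notebook: weight each batch mean by its size."""
    running_loss = sum(loss * size for loss, size in batches)
    total = sum(size for _, size in batches)
    return running_loss / total

# Hypothetical (batch mean loss, batch size) pairs; the last batch is partial.
batches = [(1.0, 128), (1.0, 128), (4.0, 32)]

naive = mean_of_batch_means(batches)
weighted = sample_weighted_mean(batches)
print(f"mean of batch means : {naive:.4f}")
print(f"sample-weighted mean: {weighted:.4f}")

# Size of the final training batch in this notebook's setup
print("last batch size:", 20000 % 128)
```

The naive average lets the small final batch pull the reported loss upward, while the weighted version matches the mean over all samples.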


## 4. Stronger CNN (Solution)

Now we build a **deeper CNN** with more channels and layers:

- 3 convolutional blocks (Conv → BatchNorm → ReLU → MaxPool).
- A larger fully-connected head with dropout.

We train it for more epochs, expecting better performance than both the PCA baseline and the simple CNN.

In [None]:
class ProperCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),   # 32 -> 16

            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),   # 16 -> 8

            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),   # 8 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

proper_cnn = ProperCNN(num_classes=len(class_labels))
proper_metrics = train_model(proper_cnn, train_loader, test_loader, epochs=8, lr=1e-3)
print("\nProper CNN test accuracy:", proper_metrics["accuracy"])
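
The `# 32 -> 16` style comments and the `128 * 4 * 4` input size of the first linear layer follow from standard output-size arithmetic. The sketch below traces the spatial size through three Conv(3×3, padding 1) + MaxPool(2, 2) blocks and counts learnable parameters for `SimpleCNN` and `ProperCNN` as defined in this notebook, assuming 10 classes. The helper names are illustrative:

```python
def out_size(size, kernel, padding=0, stride=1):
    """Output width/height of a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(c_in, c_out, kernel):
    """Weights plus one bias per output channel."""
    return c_in * c_out * kernel * kernel + c_out

def bn_params(channels):
    """Learnable scale and shift per channel (running stats are buffers)."""
    return 2 * channels

def linear_params(n_in, n_out):
    return n_in * n_out + n_out

# Spatial size after each Conv(3x3, padding=1) + MaxPool(2, 2) block
size = 32
sizes = [size]
for _ in range(3):
    size = out_size(size, kernel=3, padding=1)   # "same" convolution
    size = out_size(size, kernel=2, stride=2)    # pooling halves the size
    sizes.append(size)
print("spatial sizes:", sizes)

simple_total = (
    conv_params(3, 16, 3) + conv_params(16, 32, 3)
    + linear_params(32 * 8 * 8, 128) + linear_params(128, 10)
)
proper_total = (
    conv_params(3, 32, 3) + bn_params(32)
    + conv_params(32, 64, 3) + bn_params(64)
    + conv_params(64, 128, 3) + bn_params(128)
    + linear_params(128 * 4 * 4, 256) + linear_params(256, 10)
)
print("SimpleCNN parameters:", simple_total)
print("ProperCNN parameters:", proper_total)
```

In both models most parameters sit in the first fully-connected layer rather than in the convolutions.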


## 5. CNN with Data Augmentation (Solution)

To improve generalization we apply common **data augmentation** techniques:

- Random horizontal flips.
- Random crops with padding.

We keep the test transform unchanged.

In [None]:
augment_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std),
])

aug_train_dataset = HFCIFAR10(train_split, transform=augment_transform)
aug_train_loader = DataLoader(aug_train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)

print("Original train size:", len(train_dataset))
print("Augmented train size (same images, different transforms each epoch):", len(aug_train_dataset))

aug_cnn = ProperCNN(num_classes=len(class_labels))
aug_metrics = train_model(aug_cnn, aug_train_loader, test_loader, epochs=8, lr=1e-3)
print("\nAugmented CNN test accuracy:", aug_metrics["accuracy"])
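
To make the two augmentations concrete, here is a plain-Python sketch of a horizontal flip followed by a padded random crop on a tiny single-channel image stored as a list of rows. It is only meant to illustrate the idea behind `RandomHorizontalFlip` and `RandomCrop(32, padding=4)`, not to reproduce torchvision's implementation. The function names and the 4×4 toy image are assumptions made for this example:

```python
import random

def hflip(img):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]

def pad(img, padding, fill=0):
    """Surround the image with `padding` pixels of `fill` on every side."""
    width = len(img[0]) + 2 * padding
    border = [[fill] * width for _ in range(padding)]
    middle = [[fill] * padding + list(row) + [fill] * padding for row in img]
    return border + middle + [row[:] for row in border]

def random_crop(img, size, padding, rng):
    """Pad the image, then cut a size x size window at a random offset."""
    padded = pad(img, padding)
    top = rng.randint(0, len(padded) - size)       # randint bounds are inclusive
    left = rng.randint(0, len(padded[0]) - size)
    return [row[left:left + size] for row in padded[top:top + size]]

def augment(img, rng, size, padding, p_flip=0.5):
    """Flip with probability p_flip, then take a padded random crop."""
    if rng.random() < p_flip:
        img = hflip(img)
    return random_crop(img, size, padding, rng)

rng = random.Random(0)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 "image"
for _ in range(3):
    for row in augment(image, rng, size=4, padding=1):
        print(row)
    print()
```

Each call returns an image of the original size, shifted by up to `padding` pixels in each direction and possibly mirrored, which is why the dataset length stays the same while the model sees a different variant every epoch.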


## 6. Advanced CNN Feature: Global Average Pooling (Solution)

A common architectural pattern in modern CNNs is **Global Average Pooling (GAP)**:

- Replace large fully-connected layers on top of feature maps with
  a global average over spatial dimensions (`H×W → 1×1` per channel).
- This reduces the number of parameters and can act as a form of regularization.

We build a model that uses GAP before the final classifier.

In [None]:
class GAPCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
        )
        self.gap = nn.AdaptiveAvgPool2d((1, 1))  # global average pooling
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.gap(x)              # (B, 128, 1, 1)
        x = x.view(x.size(0), -1)    # (B, 128)
        x = self.classifier(x)
        return x

gap_cnn = GAPCNN(num_classes=len(class_labels))
gap_metrics = train_model(gap_cnn, aug_train_loader, test_loader, epochs=8, lr=1e-3)
print("\nGAP CNN test accuracy:", gap_metrics["accuracy"])
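
A small sketch of what GAP computes and how much it shrinks the classifier head, in plain Python. The two-channel feature map is made up for illustration, and the parameter counts assume the `ProperCNN` and `GAPCNN` heads defined in this notebook with 10 classes:

```python
def global_average_pool(feature_maps):
    """Average each channel's H x W map down to a single number."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]

def linear_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out

# Hypothetical feature maps: 2 channels, each 2x2.
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled = global_average_pool(maps)
print("pooled features:", pooled)  # one value per channel

# Classifier head sizes
fc_head = linear_params(128 * 4 * 4, 256) + linear_params(256, 10)  # ProperCNN
gap_head = linear_params(128, 10)                                   # GAPCNN
print("ProperCNN head parameters:", fc_head)
print("GAPCNN head parameters   :", gap_head)
```

Because the pooled vector has one entry per channel regardless of `H` and `W`, the `GAPCNN` head stays at 128 inputs even though its final feature maps are 8×8 (it uses two pooling layers instead of three).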


## 7. Discussion - Sample Answers

**1. Did the deeper CNN outperform the baseline models?**  
Yes. In typical runs, the **PCA + Logistic Regression** baseline reaches around *40–45%* test accuracy, the **simple CNN** reaches *similar or slightly better* performance, while the **Proper CNN** and **augmented CNN** usually reach *60–75%* depending on hyper-parameters and training time. This clearly shows the benefit of more expressive models for image data.

**2. How did data augmentation affect performance?**  
Data augmentation usually **improves generalization**. Even though the training loss might decrease more slowly, the augmented CNN often gets **higher test accuracy** than the same architecture trained on non-augmented images. The model sees more varied versions of the same objects (different crops, flips), so it becomes less sensitive to small perturbations.

**3. What effect did Global Average Pooling have?**  
Global Average Pooling removes large fully-connected layers over spatial maps and replaces them with a simple average. This:
- **Reduces parameters**, making the model lighter and less prone to overfitting.
- Forces the network to learn more **global, translation-invariant** features.

In practice the GAP model often performs similarly to or slightly better than the non-GAP CNN, but with fewer parameters.

**4. Why is it important to compare against simple baselines?**  
Without a baseline we might celebrate a CNN that reaches, say, 45% accuracy, without realizing that a **simple PCA + Logistic Regression** already gets 40–45%. Baselines:
- Provide a **sanity check** (are we really learning something?).
- Help us detect **bugs** or ineffective architectures.
- Make it easier to justify the **extra complexity** of deep models when they actually provide gains.
