# CIFAR‑10 & MNIST – Custom CNN vs AlexNet

This notebook follows the assignment requirements:
* Load and preprocess CIFAR‑10 (30 % per class) and MNIST.
* Implement **one custom CNN** and **one AlexNet‑style** network in pure TF/Keras.
* Compare accuracy, training speed, and model size.
* Identify weak classes → targeted augmentation → retrain.

Run locally, ideally with GPU support. TensorFlow places operations on the first
visible GPU by default and falls back to the CPU when it finds none.

In [1]:
import os
import random

import numpy as np
import tensorflow as tf
from tensorflow import keras

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

print("TensorFlow version:", tf.__version__)
print("GPU available?", tf.config.list_physical_devices('GPU'))

TensorFlow version: 2.10.0
GPU available? []


## 1 · Data loading & exploration
* CIFAR‑10 – 50 000 train / 10 000 test – **use 30 % per class** (1 500 training images per class, 15 000 in total) to save time; the test set is kept whole.
* MNIST – keep full dataset (it is already small).

`load_and_subsample_cifar10(per_class_pct)` returns `(x_train, y_train), (x_test, y_test)`, with pixel values scaled to [0, 1] and labels flattened to 1‑D.
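The per-class down-sampling can be checked on synthetic labels before touching the real data. The sketch below is a NumPy-only illustration of the same idea, not the notebook's function; the helper name `stratified_indices` is hypothetical. It keeps a fixed fraction of every class and confirms that the class balance is preserved.

```python
import numpy as np

def stratified_indices(labels, fraction, seed=42):
    """Return sorted indices that keep `fraction` of the samples of every class."""
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)   # positions of this class
        k = int(len(cls_idx) * fraction)          # samples to keep for this class
        keep.append(rng.choice(cls_idx, size=k, replace=False))
    return np.sort(np.concatenate(keep))

# Synthetic stand-in for CIFAR-10 training labels: 10 classes, 5 000 samples each.
labels = np.repeat(np.arange(10), 5000)
idx = stratified_indices(labels, 0.3)
counts = np.bincount(labels[idx], minlength=10)
print(len(idx), counts.tolist())
```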

In [2]:
from collections import defaultdict

def load_and_subsample_cifar10(per_class_pct: float = 0.3):
    (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
    x_train, y_train = x_train.astype("float32") / 255.0, y_train.flatten()
    x_test, y_test   = x_test.astype("float32") / 255.0,  y_test.flatten()

    if per_class_pct < 1.0:
        # Stratified down‑sampling
        indices_by_class = defaultdict(list)
        for idx, label in enumerate(y_train):
            indices_by_class[label].append(idx)
        chosen_idx = []
        for cls, idxs in indices_by_class.items():
            k = int(len(idxs) * per_class_pct)
            chosen_idx.extend(random.sample(idxs, k))
        x_train, y_train = x_train[chosen_idx], y_train[chosen_idx]

    return (x_train, y_train), (x_test, y_test)

# Load MNIST (grayscale → add a trailing channel axis so images have shape (28, 28, 1), as Conv2D expects)
(x_mnist_train, y_mnist_train), (x_mnist_test, y_mnist_test) = keras.datasets.mnist.load_data()
x_mnist_train = np.expand_dims(x_mnist_train, -1).astype("float32") / 255.0
x_mnist_test  = np.expand_dims(x_mnist_test,  -1).astype("float32") / 255.0
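`np.expand_dims(..., -1)` yields single-channel arrays of shape `(N, 28, 28, 1)`. If a model is built for three-channel input, one option is to repeat the grayscale channel. That's an assumption here, not something the notebook does. A small sketch on dummy data:

```python
import numpy as np

# Dummy stand-in for a few MNIST images (uint8, 28x28); not real data.
fake_mnist = np.random.default_rng(0).integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis and scale to [0, 1], as the cell above does.
x_gray = np.expand_dims(fake_mnist, -1).astype("float32") / 255.0

# Optional: replicate the single channel three times for RGB-shaped models.
x_rgb = np.repeat(x_gray, 3, axis=-1)

print(x_gray.shape, x_rgb.shape)  # (4, 28, 28, 1) (4, 28, 28, 3)
```

Because both model builders take `input_shape` as an argument, keeping one channel and passing a single-channel shape is the cheaper choice.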

### Visual sanity check (optional)

In [3]:
import matplotlib.pyplot as plt

def plot_samples(x, y, class_names, n=25):
    idxs = np.random.choice(len(x), n, replace=False)
    plt.figure(figsize=(6,6))
    for i, idx in enumerate(idxs):
        plt.subplot(5,5,i+1)
        plt.imshow(x[idx])
        plt.title(class_names[y[idx]])
        plt.axis('off')
    plt.tight_layout()

CIFAR10_CLASSES = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']
# plot_samples(x_train, y_train, CIFAR10_CLASSES)  # uncomment once x_train / y_train have been loaded (section 5)

## 2 · tf.data pipelines
We build reusable input pipelines with optional augmentation layers. Augmentations
are **off** for baseline training and selectively toggled later.

In [4]:
BATCH_SIZE = 128
AUTOTUNE = tf.data.AUTOTUNE

def make_dataset(images, labels, training=False, augment=False):
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    if training:
        ds = ds.shuffle(buffer_size=len(images), seed=SEED)
    def _preprocess(image, label):
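        image = tf.image.resize_with_pad(image, 32, 32)  # rescales 28×28 MNIST to 32×32; CIFAR‑10 keeps its size
        if augment:
            # training=True keeps the random layers active regardless of call context
            image = data_augmentation(image, training=True)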
        return image, label
    ds = ds.map(_preprocess, num_parallel_calls=AUTOTUNE)
    return ds.batch(BATCH_SIZE).prefetch(AUTOTUNE)

# Augmentation layers applied when augment=True (can be revised for the targeted-augmentation step)
data_augmentation = keras.Sequential([
    keras.layers.RandomFlip("horizontal"),
    keras.layers.RandomRotation(0.1),
    keras.layers.RandomZoom(0.1),
])
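Targeted augmentation can also be done offline, before the arrays reach `make_dataset`. The NumPy sketch below is one hypothetical way to do it, not the Keras-layer approach defined above. It appends horizontally flipped copies only for the classes listed in `weak_classes`, a made-up argument whose values would come from the confusion-matrix analysis.

```python
import numpy as np

def oversample_with_flips(images, labels, weak_classes):
    """Append horizontally flipped copies of the images in `weak_classes`.

    images: array of shape (N, H, W, C); labels: array of shape (N,).
    """
    mask = np.isin(labels, list(weak_classes))
    flipped = images[mask][:, :, ::-1, :]          # reverse the width axis
    new_images = np.concatenate([images, flipped], axis=0)
    new_labels = np.concatenate([labels, labels[mask]], axis=0)
    return new_images, new_labels

# Tiny synthetic example: 6 images, classes 0..2, class 2 treated as weak.
rng = np.random.default_rng(0)
images = rng.random((6, 32, 32, 3), dtype=np.float32)
labels = np.array([0, 1, 2, 0, 1, 2])

aug_x, aug_y = oversample_with_flips(images, labels, {2})
print(aug_x.shape, np.bincount(aug_y).tolist())  # (8, 32, 32, 3) [2, 2, 4]
```

Oversampling changes the class balance, so the per-class metrics of a retrained model are best compared against the baseline on the untouched test set.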

## 3 · Model builders
We create two functions:
* `build_custom_cnn()` – your own compact network
* `build_alexnet()` – Keras recreation of 2012 AlexNet (adapted to 32×32 inputs)

Both accept `input_shape` & `num_classes` so they can work on CIFAR‑10 **and** MNIST.
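Model size, one of the comparison criteria, can be estimated by hand before anything is built. The sketch below counts trainable parameters for the layer sizes used in `build_custom_cnn` on a 32×32×3 input with 10 classes. The helper names are illustrative, and `model.count_params()` in Keras remains the authoritative number.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Two 2x2 max-pools reduce 32 -> 16 -> 8, so Flatten sees 8 * 8 * 64 values.
# Pooling and dropout layers have no trainable parameters.
layer_counts = [
    conv_params(3, 3, 32),
    conv_params(3, 32, 32),
    conv_params(3, 32, 64),
    conv_params(3, 64, 64),
    dense_params(8 * 8 * 64, 256),
    dense_params(256, 10),
]
total = sum(layer_counts)
print(total)                          # 1116970
print(f"{total * 4 / 1e6:.1f} MB")    # 4.5 MB as float32 weights
```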

In [5]:
from tensorflow.keras import layers, models

def build_custom_cnn(input_shape, num_classes):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation='relu', padding='same')(inputs)
    x = layers.Conv2D(32, 3, activation='relu', padding='same')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.25)(x)

    x = layers.Conv2D(64, 3, activation='relu', padding='same')(x)
    x = layers.Conv2D(64, 3, activation='relu', padding='same')(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Dropout(0.25)(x)

    x = layers.Flatten()(x)
    x = layers.Dense(256, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return models.Model(inputs, outputs, name="Custom_CNN")


def build_alexnet(input_shape, num_classes):
    inputs = layers.Input(shape=input_shape)
    # Layer 1: 32×32 → 8×8 (stride 4), pooled to 4×4
    x = layers.Conv2D(96, kernel_size=11, strides=4, activation='relu', padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    # padding='same' on every pool: with the default 'valid', the last pool would
    # receive a 1×1 map on 32×32 inputs and the model would fail to build.
    x = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(x)
    # Layer 2: 4×4, pooled to 2×2
    x = layers.Conv2D(256, kernel_size=5, activation='relu', padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(x)
    # Layer 3‑5: 2×2, pooled to 1×1
    x = layers.Conv2D(384, kernel_size=3, activation='relu', padding='same')(x)
    x = layers.Conv2D(384, kernel_size=3, activation='relu', padding='same')(x)
    x = layers.Conv2D(256, kernel_size=3, activation='relu', padding='same')(x)
    x = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(x)
    # FC layers
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(4096, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return models.Model(inputs, outputs, name="AlexNet")
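A 32×32 input shrinks very quickly through an AlexNet-style stack, so it helps to trace the spatial size layer by layer. The sketch below applies the usual output-size formulas for `'same'` and `'valid'` padding to the kernel sizes and strides used in `build_alexnet`. The `trace` helper is hypothetical. It compares `'valid'` pooling (the Keras default for `MaxPooling2D`) with `'same'` pooling.

```python
import math

def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv/pool layer along one dimension."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # 'valid'

def trace(n, pool_padding):
    """Follow one spatial dimension through the AlexNet-style stack."""
    stack = [
        (11, 4, "same"),        # conv 1
        (3, 2, pool_padding),   # pool 1
        (5, 1, "same"),         # conv 2
        (3, 2, pool_padding),   # pool 2
        (3, 1, "same"),         # conv 3
        (3, 1, "same"),         # conv 4
        (3, 1, "same"),         # conv 5
        (3, 2, pool_padding),   # pool 3
    ]
    sizes = []
    for kernel, stride, padding in stack:
        n = out_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes

print(trace(32, "valid"))  # [8, 3, 3, 1, 1, 1, 1, 0]
print(trace(32, "same"))   # [8, 4, 4, 2, 2, 2, 2, 1]
```

A final size of 0 means the last pooling layer has nothing left to pool, so a 32×32 model with `'valid'` pooling is not expected to build. With `'same'` pooling, a 1×1×256 feature map reaches `Flatten`.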

## 4 · Training utilities
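EarlyStopping and ReduceLROnPlateau callbacks (both monitor `val_loss` by default), plus a helper
`compile_and_train()` that compiles the model with Adam and returns the training history.
Evaluation metrics are obtained separately with `model.evaluate()`.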
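The effect of the `patience` setting is easiest to see on a toy sequence of validation losses. The function below is a simplified plain-Python imitation of the stopping rule, not Keras's actual implementation: it ignores `min_delta`, for instance. The loss values are made up.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs fail to improve
    on the best validation loss seen so far.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch  # ran out of epochs without stopping

# Made-up validation losses: improvement stalls after epoch 3.
losses = [1.9, 1.5, 1.2, 1.3, 1.25, 1.4, 1.35]
print(simulate_early_stopping(losses, patience=3))  # (6, 3)
```

With `restore_best_weights=True`, the weights kept after stopping are those from the best epoch (epoch 3 in this toy run), not those from the last epoch trained.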

In [6]:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

EPOCHS = 50

def compile_and_train(model, train_ds, val_ds, run_name="run"):
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    cbs = [
        EarlyStopping(patience=8, restore_best_weights=True),
        ReduceLROnPlateau(patience=4, factor=0.3, verbose=1)
    ]
    history = model.fit(train_ds, epochs=EPOCHS, validation_data=val_ds, callbacks=cbs)
    return history

## 5 · Baseline experiments
Run *Custom CNN* and *AlexNet* on:
* **CIFAR‑10 30 %**
* **MNIST**

Use the cells below as templates – uncomment / duplicate as needed.
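Early stopping needs a validation set, and using the test split for that purpose makes the reported test accuracy optimistic. One option, sketched below on the assumption that a 10 % hold-out is acceptable, is to carve a validation split out of the training data. `train_val_split` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.1, seed=42):
    """Shuffle once and hold out `val_fraction` of the samples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# Synthetic stand-in for the sub-sampled CIFAR-10 training arrays.
x = np.zeros((1000, 32, 32, 3), dtype="float32")
y = np.arange(1000) % 10
(x_tr, y_tr), (x_val, y_val) = train_val_split(x, y)
print(x_tr.shape, x_val.shape)  # (900, 32, 32, 3) (100, 32, 32, 3)
```

The two resulting pairs can be passed to `make_dataset` in place of the train and test arrays, leaving the test set for the final `evaluate()` call only.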

In [7]:
# Example – CIFAR‑10
(x_train, y_train), (x_test, y_test) = load_and_subsample_cifar10(0.3)
train_ds = make_dataset(x_train, y_train, training=True)
test_ds  = make_dataset(x_test,  y_test)

custom_model = build_custom_cnn((32,32,3), 10)
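# Note: the test split doubles as validation data here, so early stopping sees it
# and the test accuracy printed below is optimistic.
history_custom = compile_and_train(custom_model, train_ds, test_ds)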
custom_eval = custom_model.evaluate(test_ds, verbose=0)
print("Custom CNN – test accuracy:", custom_eval[1])


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Epoch 1/50 … Epoch 17/50 (per-epoch metrics not captured)

KeyboardInterrupt: 

In [None]:
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns


def plot_confusion_matrix(model, ds, class_names):
    y_true, y_pred = [], []
    for images, labels in ds:
        preds = model.predict_on_batch(images)  # lighter than predict() when called once per batch
        y_true.extend(labels.numpy())
        y_pred.extend(np.argmax(preds, axis=1))
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(8,6))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=class_names, yticklabels=class_names)
    plt.xlabel("Predicted"); plt.ylabel("True")
    plt.show()
    print(classification_report(y_true, y_pred, target_names=class_names))
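To turn the confusion matrix into a shortlist of weak classes, per-class recall can be read off its rows. The following NumPy sketch is one way to do that. `weakest_classes` is a hypothetical helper, and the matrix is made up for illustration. It assumes every class has at least one true sample, since otherwise the division fails.

```python
import numpy as np

def weakest_classes(cm, class_names, k=3):
    """Return the k classes with the lowest recall as (name, recall) pairs.

    cm[i, j] counts samples of true class i predicted as class j,
    the layout produced by sklearn's confusion_matrix.
    """
    cm = np.asarray(cm, dtype=float)
    recall = np.diag(cm) / cm.sum(axis=1)   # correct / all true samples, per class
    order = np.argsort(recall)[:k]          # lowest recall first
    return [(class_names[i], round(float(recall[i]), 3)) for i in order]

# Made-up 3-class confusion matrix, for illustration only.
cm = [[50, 5, 5],
      [10, 30, 20],
      [2, 3, 55]]
print(weakest_classes(cm, ["cat", "dog", "ship"], k=2))
# [('dog', 0.5), ('cat', 0.833)]
```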
