## 🔢 Data Loading and Preprocessing

This section is responsible for loading, normalizing, and optionally sampling the **MNIST** and **CIFAR-10** datasets.

### 📥 Datasets Used

- **MNIST**: A dataset of grayscale handwritten digit images with shape **28×28×1**.
- **CIFAR-10**: A dataset of RGB color images across 10 object classes with shape **32×32×3**.

### ⚙️ Preprocessing Workflow

The preprocessing code performs the following tasks:

- **Normalization**:
  Pixel values are scaled from the original **[0, 255]** range to **[0, 1]**. This is a common practice that tends to improve the stability and performance of neural network training.

- **Random Sampling**:
  A fixed fraction of each split (**50%** here, the default `sample_fraction=0.5`) is drawn at random with `sklearn.utils.resample`. With its default arguments, `resample` draws with replacement and does not stratify by class, so the subset may contain duplicates and class proportions are only approximately preserved. Sampling is useful for:
  - Reducing dataset size
  - Speeding up training for experimentation
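
As a rough illustration of normalization combined with per-class sampling, the sketch below scales pixel values to [0, 1] and draws the same fraction from every class without replacement. `normalize_images` and `sample_per_class` are hypothetical helpers written for this example, not functions from the notebook, and the random toy arrays only stand in for the real datasets.

```python
import numpy as np

def normalize_images(images):
    """Scale uint8 pixel values from [0, 255] to float32 values in [0, 1]."""
    return images.astype("float32") / 255.0

def sample_per_class(images, labels, fraction=0.2, seed=42):
    """Keep `fraction` of the samples of every class, drawn without replacement."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).reshape(-1)
    keep = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        n_keep = max(1, int(len(cls_idx) * fraction))  # at least one sample per class
        keep.append(rng.choice(cls_idx, size=n_keep, replace=False))
    keep = rng.permutation(np.concatenate(keep))  # shuffle so classes are interleaved
    return images[keep], labels[keep]

# Toy data (assumption): 100 random 28x28 "images", 10 per class for 10 classes.
toy_images = np.random.default_rng(0).integers(0, 256, size=(100, 28, 28)).astype(np.uint8)
toy_labels = np.repeat(np.arange(10), 10)

x_small, y_small = sample_per_class(normalize_images(toy_images), toy_labels, fraction=0.2)
print(x_small.shape, np.bincount(y_small))
```

Because the fraction is applied class by class, every class keeps the same share of its samples, which plain random sampling over the whole dataset does not guarantee.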

### 🧹 Dataset-Specific Preprocessing

- **MNIST**:
  - Training and test sets are sampled first, then reshaped to add a channel dimension, converting each image from **(28, 28)** to **(28, 28, 1)**.
  - Pixel values are normalized and labels are one-hot encoded with `to_categorical`.

- **CIFAR-10**:
  - Images already have shape **(32, 32, 3)**, so no reshape is needed; they are sampled and normalized in the same way as MNIST.
  - Labels, which load with shape **(N, 1)**, are one-hot encoded with `to_categorical` into shape **(N, 10)**.
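
The shape changes described above can be checked on small placeholder arrays. This sketch uses plain NumPy; `one_hot` is a hypothetical stand-in for Keras' `to_categorical`, and the zero-filled arrays only mimic the shapes returned by the dataset loaders.

```python
import numpy as np

def one_hot(labels, num_classes):
    """One-hot encode integer labels of shape (N,) or (N, 1) into (N, num_classes)."""
    labels = np.asarray(labels, dtype="int64").reshape(-1)  # flatten (N, 1) -> (N,)
    return np.eye(num_classes, dtype="float32")[labels]

# MNIST-like images arrive without a channel axis: (N, 28, 28).
mnist_like = np.zeros((4, 28, 28), dtype=np.uint8)
mnist_ready = mnist_like.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# CIFAR-10-like labels arrive as a column vector: (N, 1).
cifar_labels = np.array([[3], [0], [9], [3]])
cifar_onehot = one_hot(cifar_labels, num_classes=10)

print(mnist_ready.shape)            # (4, 28, 28, 1)
print(cifar_onehot.shape)           # (4, 10)
print(cifar_onehot.argmax(axis=1))  # [3 0 9 3]
```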

### 🧾 Output Shapes

At the end of preprocessing, the dataset shapes are printed to verify the data pipeline is functioning as expected.


In [12]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist, cifar10
from sklearn.utils import resample
from tensorflow.keras.utils import to_categorical

def sample_dataset(X, y, sample_fraction=0.5, random_state=42):
    """Draw a random subset holding `sample_fraction` of the samples in (X, y).

    Note: `resample` defaults to replace=True and stratify=None, so samples are
    drawn with replacement and class balance is not enforced.
    """
    n_samples = int(len(X) * sample_fraction)
    X_sampled, y_sampled = resample(X, y, n_samples=n_samples, random_state=random_state)
    return X_sampled, y_sampled

# Load and sample MNIST
(x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = mnist.load_data()
x_train_mnist, y_train_mnist = sample_dataset(x_train_mnist, y_train_mnist)
x_test_mnist, y_test_mnist = sample_dataset(x_test_mnist, y_test_mnist)

# Preprocess MNIST
x_train_mnist = x_train_mnist.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test_mnist = x_test_mnist.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train_mnist = to_categorical(y_train_mnist, 10)
y_test_mnist = to_categorical(y_test_mnist, 10)

# Load and sample CIFAR-10
(x_train_cifar, y_train_cifar), (x_test_cifar, y_test_cifar) = cifar10.load_data()
x_train_cifar, y_train_cifar = sample_dataset(x_train_cifar, y_train_cifar)
x_test_cifar, y_test_cifar = sample_dataset(x_test_cifar, y_test_cifar)

# Preprocess CIFAR-10
x_train_cifar = x_train_cifar.astype('float32') / 255.0
x_test_cifar = x_test_cifar.astype('float32') / 255.0
y_train_cifar = to_categorical(y_train_cifar, 10)
y_test_cifar = to_categorical(y_test_cifar, 10)

print("MNIST Train Shape:", x_train_mnist.shape)
print("MNIST Test Shape:", x_test_mnist.shape)
print("CIFAR-10 Train Shape:", x_train_cifar.shape)
print("CIFAR-10 Test Shape:", x_test_cifar.shape)


MNIST Train Shape: (30000, 28, 28, 1)
MNIST Test Shape: (5000, 28, 28, 1)
CIFAR-10 Train Shape: (25000, 32, 32, 3)
CIFAR-10 Test Shape: (5000, 32, 32, 3)


## 🧠 Custom CNN Architecture Definition

This section defines a **custom Convolutional Neural Network (CNN)** using Keras' `Sequential` API. The architecture is designed to be flexible and efficient for image classification tasks such as MNIST and CIFAR-10.

### 🏗️ Model Structure

The model is composed of **three convolutional blocks** followed by **fully connected layers**, structured as follows:

#### 🔹 Convolutional Blocks (Feature Extraction)
Each block performs:
- **Convolution**: Extracts spatial features using filters of size *(3×3)* (configurable).
- **Batch Normalization**: Normalizes activations to stabilize training and accelerate convergence.
- **LeakyReLU Activation**: Introduces non-linearity with a small non-zero slope for negative inputs, which reduces the risk of "dead" neurons that always output zero.
- **Max Pooling**: Reduces spatial dimensions by selecting max values in a *(2×2)* window.
- **Dropout**: Randomly disables neurons during training to prevent overfitting (default: 30%).
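
To make two of these operations concrete, here is a small NumPy sketch of LeakyReLU and 2×2 max pooling on a single-channel feature map. The helper names are hypothetical, and the slope of 0.3 is assumed to match the Keras `LeakyReLU` default; the notebook itself relies on the Keras layers.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.3):
    """Pass positive values through; scale negative values by `negative_slope`."""
    return np.where(x > 0, x, negative_slope * x)

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array; odd trailing rows/cols are dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Group pixels into non-overlapping 2x2 blocks, then take the max of each block.
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([[ 1.0, -2.0,  3.0,  0.0],
                 [-1.0,  4.0, -5.0,  2.0],
                 [ 0.5,  0.5, -1.0, -1.0],
                 [-3.0,  2.0, -2.0, -4.0]])

activated = leaky_relu(fmap)
pooled = max_pool_2x2(activated)
print(pooled)
```

The bottom-right block contains only negative inputs, so its pooled value is a scaled negative number rather than zero, which is the practical difference from plain ReLU.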

The number of filters **doubles** at each block:
- Block 1: `base_filters` (default: 32)
- Block 2: `base_filters × 2`
- Block 3: `base_filters × 4`

#### 🔸 Fully Connected Layers (Classification)
After flattening the feature maps:
- A **Dense layer** with `dense_units` (default: 128) and **ReLU** activation.
- A final **Dropout layer**.
- An output **Dense layer** with `num_classes` neurons and **softmax** activation for multi-class classification.
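
The effect of the three blocks on tensor shapes can be traced with a few lines of arithmetic. The sketch below assumes `padding='same'` convolutions (spatial size unchanged) and 2×2 max pooling with stride 2 and no padding (size floor-divided by 2), matching the layer arguments in the model code; `trace_custom_cnn_shapes` is a hypothetical helper, not part of the notebook.

```python
def trace_custom_cnn_shapes(input_shape, base_filters=32, num_blocks=3):
    """Return the (height, width, channels) shape after each conv block, plus the flattened size."""
    height, width, _ = input_shape
    shapes = []
    for block in range(num_blocks):
        filters = base_filters * (2 ** block)    # 32, 64, 128 with the defaults
        height, width = height // 2, width // 2  # 'same' conv keeps size; 2x2 pooling halves it
        shapes.append((height, width, filters))
    flattened = height * width * shapes[-1][2]
    return shapes, flattened

for name, shape in [("MNIST", (28, 28, 1)), ("CIFAR-10", (32, 32, 3))]:
    shapes, flattened = trace_custom_cnn_shapes(shape)
    print(name, shapes, "flatten ->", flattened)
```

With the defaults, the dense layer therefore receives 1,152 inputs for MNIST and 2,048 for CIFAR-10.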

### 🧪 Compilation

- **Optimizer**: `Adam`, known for adaptive learning rates and good default performance.
- **Loss Function**: `categorical_crossentropy`, suitable for one-hot encoded multi-class labels.
- **Metric**: `accuracy`, to track classification performance.
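
For intuition about the loss, categorical cross-entropy for a single one-hot label reduces to the negative log of the probability assigned to the true class. The sketch below uses made-up probability vectors and a hypothetical function name; Keras additionally clips probabilities away from zero and averages the loss over the batch.

```python
import math

def categorical_crossentropy(one_hot_label, predicted_probs):
    """Cross-entropy between a one-hot label and a probability vector for one sample."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, predicted_probs) if t > 0)

label = [0, 0, 1, 0]                   # true class is index 2
confident = [0.05, 0.05, 0.85, 0.05]   # most probability mass on the true class
uncertain = [0.25, 0.25, 0.25, 0.25]   # uniform guess

print(round(categorical_crossentropy(label, confident), 4))
print(round(categorical_crossentropy(label, uncertain), 4))
```

The confident prediction yields the smaller loss, which is the behaviour the optimizer exploits during training.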

### 🧩 Customizable Parameters

| Parameter        | Description                                                          | Default |
|------------------|----------------------------------------------------------------------|---------|
| `base_filters`   | Number of filters in the first conv layer (doubled in each block)    | 32      |
| `kernel_size`    | Size of the convolution filters                                      | (3, 3)  |
| `dropout_rate`   | Dropout rate applied after each pooling layer and the dense layer    | 0.3     |
| `dense_units`    | Units in the dense layer before output                               | 128     |

This architecture provides a balance between **depth, performance, and regularization**, making it suitable for both small and medium-scale image classification tasks.


In [13]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Dense, Dropout,
                                     Flatten, BatchNormalization, LeakyReLU)
from tensorflow.keras.optimizers import Adam

def build_custom_cnn(input_shape, num_classes,
                     base_filters=32,
                     kernel_size=(3, 3),
                     dropout_rate=0.3,
                     dense_units=128):
    """Build and compile a three-block CNN with a dense classification head."""

    model = Sequential()

    model.add(Input(shape=input_shape))

    # Block 1
    model.add(Conv2D(base_filters, kernel_size=kernel_size, padding='same'))
    model.add(BatchNormalization())
    model.add(LeakyReLU())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(dropout_rate))

    # Block 2
    model.add(Conv2D(base_filters * 2, kernel_size=kernel_size, padding='same'))
    model.add(BatchNormalization())
    model.add(LeakyReLU())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(dropout_rate))

    # Block 3
    model.add(Conv2D(base_filters * 4, kernel_size=kernel_size, padding='same'))
    model.add(BatchNormalization())
    model.add(LeakyReLU())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(dropout_rate))

    # Classification head
    model.add(Flatten())

    model.add(Dense(dense_units, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(optimizer=Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

    return model


## 🧪 Hyperparameter Configuration Grid

To evaluate the impact of various design choices, a list of **hyperparameter configurations** is defined. Each configuration alters the custom CNN by changing:

- **Dropout Rate**: Controls the fraction of neurons randomly deactivated during training. Higher values regularize more strongly.
- **Dense Units**: Sets the number of neurons in the dense (fully connected) layer, influencing model capacity.
- **Kernel Size**: Defines the size of the convolutional filters, affecting receptive field and feature resolution.

The activation function is not part of the grid: `build_custom_cnn` fixes it to LeakyReLU in the convolutional blocks and ReLU in the dense layer. Only the first configuration is enabled in the run shown here; the others are commented out.
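
If a full grid over the dropout, dense-unit, and kernel-size settings is wanted rather than a hand-written list, the combinations could be generated programmatically. This is a sketch only: `make_param_grid` is a hypothetical helper, and the candidate values are the ones that appear in the configurations listed in the next cell.

```python
from itertools import product

def make_param_grid(dropout_rates, dense_units, kernel_sizes):
    """Return one config dict per combination of the given values."""
    return [
        {"dropout_rate": d, "dense_units": u, "kernel_size": k}
        for d, u, k in product(dropout_rates, dense_units, kernel_sizes)
    ]

param_grid = make_param_grid(
    dropout_rates=[0.3, 0.5, 0.7],
    dense_units=[64, 128, 256],
    kernel_sizes=[(3, 3), (5, 5)],
)
print(len(param_grid))   # 3 * 3 * 2 = 18 configurations
print(param_grid[0])
```

Each dict has the same keys as the entries of `param_sets`, so it could be fed to the training loop unchanged, at the cost of many more training runs.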

In [14]:
# List of hyperparameter configs to test (only the first is enabled)
param_sets = [
    {'dropout_rate': 0.5, 'dense_units': 128, 'kernel_size': (3, 3)},
    # {'dropout_rate': 0.3, 'dense_units': 256, 'kernel_size': (3, 3)},
    # {'dropout_rate': 0.7, 'dense_units': 64,  'kernel_size': (5, 5)},
    # {'dropout_rate': 0.5, 'dense_units': 128, 'kernel_size': (3, 3)},
]


## 📊 Model Training and Evaluation Loop

This section implements a loop that **trains and evaluates multiple CNN models** across two datasets: **MNIST** and **CIFAR-10**. For each dataset:

- The arrays prepared in the data-loading step (already normalized, reshaped, and one-hot encoded) are selected together with the matching input shape.
- A set of custom CNN architectures is trained using the predefined hyperparameter configurations.
- Each model is trained for **20 epochs** with a **batch size of 64**, using **20% of training data for validation**.
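
The step counts in the training log can be sanity-checked from these settings. Keras' `validation_split` holds out the last fraction of the provided samples, and the remaining samples are divided into batches. The helper below is a hypothetical illustration of that arithmetic, using the 30,000 MNIST and 25,000 CIFAR-10 training samples printed by the data-loading cell.

```python
import math

def steps_per_epoch(num_samples, validation_split, batch_size):
    """Return (training samples, validation samples, batches per epoch)."""
    num_train = int(num_samples * (1.0 - validation_split))
    num_val = num_samples - num_train
    return num_train, num_val, math.ceil(num_train / batch_size)

print(steps_per_epoch(30000, 0.2, 64))   # MNIST subset
print(steps_per_epoch(25000, 0.2, 64))   # CIFAR-10 subset
```

The first result gives 375 batches per epoch, which matches the `375/375` progress counter in the MNIST log.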

### 🧾 Metrics Collected

For each model configuration, the following are recorded:

- **Training History**: Tracks training and validation accuracy/loss over epochs.
- **Test Accuracy and Loss**: Evaluated on the hold-out test set to measure generalization.
- **Model Labels**: Descriptive tags containing parameter settings for clarity in comparison plots.

These results enable in-depth comparison of how architectural and training changes affect performance across datasets.
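
Once a history dictionary has been stored, the best validation epoch can be pulled out with a few lines. `best_epoch` is a hypothetical helper, and the example values are the validation metrics of the first four epochs copied from the MNIST log shown further down.

```python
def best_epoch(history_dict, metric="val_accuracy"):
    """Return (1-based epoch, value) at which `metric` is highest."""
    values = history_dict[metric]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

history_example = {
    "val_accuracy": [0.2235, 0.9407, 0.9602, 0.9663],
    "val_loss": [3.5734, 0.1974, 0.1271, 0.1019],
}
print(best_epoch(history_example))
```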


In [15]:
# Containers for per-dataset training histories, plot labels, and test metrics

histories = {"MNIST": [], "CIFAR-10": []}
labels = {"MNIST": [], "CIFAR-10": []}
test_results = {"MNIST": [], "CIFAR-10": []}

for dataset in ['MNIST', 'CIFAR-10']:
    if dataset == 'MNIST':
        x_train, y_train = x_train_mnist, y_train_mnist
        x_test, y_test = x_test_mnist, y_test_mnist
        input_shape = (28, 28, 1)
        num_classes = 10
    else:
        x_train, y_train = x_train_cifar, y_train_cifar
        x_test, y_test = x_test_cifar, y_test_cifar
        input_shape = (32, 32, 3)
        num_classes = 10

    for i, params in enumerate(param_sets):
        print(f"\nTraining {dataset} model {i+1} with params: {params}")

        model = build_custom_cnn(
            input_shape=input_shape,
            num_classes=num_classes,
            kernel_size=params['kernel_size'],
            dropout_rate=params['dropout_rate'],
            dense_units=params['dense_units'],
        )

        history = model.fit(
            x_train, y_train,
            epochs=20,
            batch_size=64,
            validation_split=0.2,
            verbose=2
        )


        history_dict = history.history
        histories[dataset].append(history_dict)

        # Generate and store label
        label = f"Dropout:{params['dropout_rate']}, Dense:{params['dense_units']}, Kernel:{params['kernel_size']}"
        labels[dataset].append(label)

        # Evaluate and store results
        test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
        test_results[dataset].append((test_loss, test_acc))
        print(f"Test accuracy for {dataset} model {i+1}: {test_acc:.4f}")



Training MNIST model 1 with params: {'dropout_rate': 0.5, 'dense_units': 128, 'kernel_size': (3, 3)}
Epoch 1/20
375/375 - 15s - 40ms/step - accuracy: 0.4576 - loss: 1.5447 - val_accuracy: 0.2235 - val_loss: 3.5734
Epoch 2/20
375/375 - 13s - 34ms/step - accuracy: 0.8193 - loss: 0.5617 - val_accuracy: 0.9407 - val_loss: 0.1974
Epoch 3/20
375/375 - 13s - 36ms/step - accuracy: 0.8846 - loss: 0.3734 - val_accuracy: 0.9602 - val_loss: 0.1271
Epoch 4/20
375/375 - 13s - 35ms/step - accuracy: 0.9062 - loss: 0.3080 - val_accuracy: 0.9663 - val_loss: 0.1019
Epoch 5/20
375/375 - 17s - 44ms/step - accuracy: 0.9222 - loss: 0.2532 - val_accuracy: 0.9730 - val_loss: 0.0900
Epoch 6/20
375/375 - 14s - 37ms/step - accuracy: 0.9323 - loss: 0.2203 - val_accuracy: 0.9760 - val_loss: 0.0766
Epoch 7/20
375/375 - 14s - 37ms/step - accuracy: 0.9416 - loss: 0.1996 - val_accuracy: 0.9767 - val_loss: 0.0754
Epoch 8/20
375/375 - 13s - 36ms/step - accuracy: 0.9474 - loss: 0.1788 - val_accuracy: 0.9768 - val_loss: 0

In [16]:
import pandas as pd

# Collect the test accuracy of every trained model into one table

all_results = []

for dataset in ['MNIST', 'CIFAR-10']:
    for i, (label, (loss, acc)) in enumerate(zip(labels[dataset], test_results[dataset])):
        all_results.append({
            "Dataset": dataset,
            "Model #": i + 1,
            "Params": label,
            "Test Accuracy": acc
        })

# Convert to DataFrame for a nice tabular view
df_results = pd.DataFrame(all_results)

# Print the table sorted by Dataset and Model #
print(df_results.sort_values(by=["Dataset", "Model #"]).to_string(index=False))


 Dataset  Model #                                Params  Test Accuracy
CIFAR-10        1 Dropout:0.5, Dense:128, Kernel:(3, 3)         0.6240
   MNIST        1 Dropout:0.5, Dense:128, Kernel:(3, 3)         0.9928


In [17]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def plot_results(dataset_name, histories, labels):
    """Plot validation accuracy and loss per epoch for each trained model of one dataset."""
    num_models = len(histories)
    epochs = range(1, len(histories[0]['val_accuracy']) + 1)

    # Create subplots for Accuracy and Loss
    fig = make_subplots(rows=1, cols=2, subplot_titles=("Validation Accuracy", "Validation Loss"))

    # Plot Accuracy
    for i in range(num_models):
        fig.add_trace(
            go.Scatter(x=list(epochs),
                       y=histories[i]['val_accuracy'],
                       mode='lines+markers',
                       name=labels[i],
                       legendgroup='accuracy'),
            row=1, col=1
        )

    # Plot Loss
    for i in range(num_models):
        fig.add_trace(
            go.Scatter(x=list(epochs),
                       y=histories[i]['val_loss'],
                       mode='lines+markers',
                       name=labels[i],
                       legendgroup='loss',
                       showlegend=False),  # Hide duplicate legends
            row=1, col=2
        )

    fig.update_layout(
        title_text=f"Training Results for {dataset_name}",
        width=1200,
        height=500,
        xaxis_title="Epoch",
        yaxis_title="Accuracy",
        xaxis2_title="Epoch",
        yaxis2_title="Loss"
    )
    fig.show()


In [18]:

# Call the function for each dataset
plot_results("MNIST", histories["MNIST"], labels["MNIST"])
plot_results("CIFAR-10", histories["CIFAR-10"], labels["CIFAR-10"])


## 🧠 AlexNet-Inspired CNN Architecture

This section defines a custom implementation of the **AlexNet architecture**, adapted for smaller input sizes like MNIST and CIFAR-10 by adjusting kernel sizes, strides, and the number of neurons in dense layers.

### 🔧 Architecture Overview

- **Convolutional Layers (5 Total)**:
  - Each layer applies 3×3 convolutional filters (stride 1, `same` padding) to extract increasingly abstract features.
  - Local Response Normalization (LRN) is applied after the first two convolutional layers, normalizing each activation by its neighbouring channels in a way loosely modeled on biological lateral inhibition.
  - ReLU activation functions introduce non-linearity.
  - Max pooling follows the first, second, and fifth convolutional layers to reduce spatial dimensions.

- **Fully Connected Layers**:
  - Two dense layers with **512 units** each and ReLU activation.
  - **Dropout (0.5)** is applied to both dense layers to reduce overfitting.
  - The final output layer uses **softmax** for multi-class classification.
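
Local Response Normalization divides each activation by a power of the summed squares of nearby channels at the same spatial position. The NumPy sketch below is an illustrative re-implementation, not the TensorFlow kernel; its default arguments are assumed to mirror those of `tf.nn.local_response_normalization` (`depth_radius=5`, `bias=1`, `alpha=1`, `beta=0.5`), which the notebook uses without overriding.

```python
import numpy as np

def local_response_norm(x, depth_radius=5, bias=1.0, alpha=1.0, beta=0.5):
    """Normalize each value by the squared activations of neighbouring channels (last axis)."""
    squared = np.square(x)
    num_channels = x.shape[-1]
    sqr_sum = np.zeros_like(x)
    for d in range(num_channels):
        lo = max(0, d - depth_radius)
        hi = min(num_channels, d + depth_radius + 1)
        sqr_sum[..., d] = squared[..., lo:hi].sum(axis=-1)
    return x / (bias + alpha * sqr_sum) ** beta

# One pixel with 4 channels; a radius of 1 means each channel sees itself and its direct neighbours.
pixel = np.array([[[[1.0, 2.0, 3.0, 4.0]]]])
normalized = local_response_norm(pixel, depth_radius=1)
print(normalized)
```

Channels surrounded by large activations are damped more strongly, which is the competition effect the architecture overview refers to.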

### 📌 Adaptations

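The original AlexNet used much larger inputs (224×224), larger early kernels (11×11 with stride 4, then 5×5), and 4096-unit dense layers. This version keeps the original filter counts (96, 256, 384, 384, 256) but is **scaled down** to fit smaller images (e.g., 28×28, 32×32) by using 3×3 kernels with stride 1 and 512-unit dense layers, while preserving core design elements such as depth and normalization.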
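
To see how the scaled-down network treats small inputs, the spatial size can be traced layer by layer. The sketch assumes the layer arguments used in the `build_alexnet` cell: 3×3 `same` convolutions with stride 1, 2×2 pooling with stride 2 after the first two convolutions, and 2×2 pooling with stride 1 after the fifth. `pool_output_size` and `trace_alexnet_shapes` are hypothetical helpers.

```python
def pool_output_size(size, pool_size, stride):
    """Output length of an unpadded ('valid') pooling window along one axis."""
    return (size - pool_size) // stride + 1

def trace_alexnet_shapes(input_shape):
    """Follow (height, width, channels) through the scaled-down AlexNet-style stack."""
    height, width, _ = input_shape
    # (filters, pooling after the conv as (pool_size, stride), or None for no pooling)
    layers_spec = [(96, (2, 2)), (256, (2, 2)), (384, None), (384, None), (256, (2, 1))]
    shapes = []
    for filters, pooling in layers_spec:
        # 3x3 convolutions with stride 1 and 'same' padding keep height and width.
        if pooling is not None:
            height = pool_output_size(height, *pooling)
            width = pool_output_size(width, *pooling)
        shapes.append((height, width, filters))
    return shapes, height * width * shapes[-1][2]

for name, shape in [("MNIST", (28, 28, 1)), ("CIFAR-10", (32, 32, 3))]:
    shapes, flattened = trace_alexnet_shapes(shape)
    print(name, shapes[-1], "flatten ->", flattened)
```

Under these assumptions the first dense layer sees 9,216 features for MNIST and 12,544 for CIFAR-10.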


In [19]:
import tensorflow as tf
from tensorflow.keras import models, layers
from tensorflow.keras.layers import Input

def build_alexnet(input_shape, num_classes):
    """Build a scaled-down AlexNet-style CNN; the model is returned uncompiled."""
    model = models.Sequential()
    model.add(Input(shape=input_shape))

    # 1. First Convolutional Block
    model.add(layers.Conv2D(96, kernel_size=3, strides=1, padding='same'))
    model.add(layers.Lambda(tf.nn.local_response_normalization))
    model.add(layers.Activation('relu'))
    model.add(layers.MaxPooling2D(pool_size=2, strides=2))

    # 2. Second Convolutional Block
    model.add(layers.Conv2D(256, kernel_size=3, strides=1, padding='same'))
    model.add(layers.Lambda(tf.nn.local_response_normalization))
    model.add(layers.Activation('relu'))
    model.add(layers.MaxPooling2D(pool_size=2, strides=2))

    # 3. Third Convolutional Layer
    model.add(layers.Conv2D(384, kernel_size=3, strides=1, padding='same'))
    model.add(layers.Activation('relu'))

    # 4. Fourth Convolutional Layer
    model.add(layers.Conv2D(384, kernel_size=3, strides=1, padding='same'))
    model.add(layers.Activation('relu'))

    # 5. Fifth Convolutional Layer
    model.add(layers.Conv2D(256, kernel_size=3, strides=1, padding='same'))
    model.add(layers.Activation('relu'))
    model.add(layers.MaxPooling2D(pool_size=2, strides=1))

    # Flatten and Fully Connected Layers
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(512, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation='softmax'))

    return model



In [20]:
histories = {"MNIST": [], "CIFAR-10": []}
labels = {"MNIST": [], "CIFAR-10": []}
test_results = {"MNIST": [], "CIFAR-10": []}

for dataset in ['MNIST', 'CIFAR-10']:
    if dataset == 'MNIST':
        x_train, y_train = x_train_mnist, y_train_mnist
        x_test, y_test = x_test_mnist, y_test_mnist
        input_shape = (28, 28, 1)
        num_classes = 10
    else:
        x_train, y_train = x_train_cifar, y_train_cifar
        x_test, y_test = x_test_cifar, y_test_cifar
        input_shape = (32, 32, 3)
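        num_classes = 10

    # NOTE: this loop calls build_custom_cnn, exactly as in the earlier training cell,
    # so the log and table below describe the custom CNN. build_alexnet is not
    # invoked here and would also need model.compile(...) before training.
    for i, params in enumerate(param_sets):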
        print(f"\nTraining {dataset} model {i+1} with params: {params}")

        model = build_custom_cnn(
            input_shape=input_shape,
            num_classes=num_classes,
            kernel_size=params['kernel_size'],
            dropout_rate=params['dropout_rate'],
            dense_units=params['dense_units'],
        )

        history = model.fit(
            x_train, y_train,
            epochs=20,
            batch_size=64,
            validation_split=0.2,
            verbose=2
        )


        history_dict = history.history
        histories[dataset].append(history_dict)

        # Generate and store label
        label = f"Dropout:{params['dropout_rate']}, Dense:{params['dense_units']}, Kernel:{params['kernel_size']}"
        labels[dataset].append(label)

        # Evaluate and store results
        test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
        test_results[dataset].append((test_loss, test_acc))
        print(f"Test accuracy for {dataset} model {i+1}: {test_acc:.4f}")



Training MNIST model 1 with params: {'dropout_rate': 0.5, 'dense_units': 128, 'kernel_size': (3, 3)}
Epoch 1/20
375/375 - 15s - 39ms/step - accuracy: 0.4578 - loss: 1.5432 - val_accuracy: 0.1247 - val_loss: 5.1685
Epoch 2/20
375/375 - 12s - 32ms/step - accuracy: 0.7979 - loss: 0.6151 - val_accuracy: 0.9522 - val_loss: 0.1656
Epoch 3/20
375/375 - 12s - 32ms/step - accuracy: 0.8825 - loss: 0.3853 - val_accuracy: 0.9633 - val_loss: 0.1228
Epoch 4/20
375/375 - 12s - 32ms/step - accuracy: 0.9116 - loss: 0.2928 - val_accuracy: 0.9673 - val_loss: 0.1072
Epoch 5/20
375/375 - 12s - 32ms/step - accuracy: 0.9235 - loss: 0.2569 - val_accuracy: 0.9732 - val_loss: 0.0913
Epoch 6/20
375/375 - 12s - 33ms/step - accuracy: 0.9346 - loss: 0.2198 - val_accuracy: 0.9777 - val_loss: 0.0811
Epoch 7/20
375/375 - 12s - 33ms/step - accuracy: 0.9410 - loss: 0.2058 - val_accuracy: 0.9787 - val_loss: 0.0713
Epoch 8/20
375/375 - 13s - 35ms/step - accuracy: 0.9442 - loss: 0.1875 - val_accuracy: 0.9805 - val_loss: 0

In [21]:
all_results = []

for dataset in ['MNIST', 'CIFAR-10']:
    for i, (label, (loss, acc)) in enumerate(zip(labels[dataset], test_results[dataset])):
        all_results.append({
            "Dataset": dataset,
            "Model #": i + 1,
            "Params": label,
            "Test Accuracy": acc
        })

# Convert to DataFrame for a nice tabular view
df_results = pd.DataFrame(all_results)

# Print the table sorted by Dataset and Model #
print(df_results.sort_values(by=["Dataset", "Model #"]).to_string(index=False))
# Plot results
plot_results("MNIST", histories["MNIST"], labels["MNIST"])
plot_results("CIFAR-10", histories["CIFAR-10"], labels["CIFAR-10"])

 Dataset  Model #                                Params  Test Accuracy
CIFAR-10        1 Dropout:0.5, Dense:128, Kernel:(3, 3)         0.6028
   MNIST        1 Dropout:0.5, Dense:128, Kernel:(3, 3)         0.9902
