# MNIST Digit Classification with CNN Variants Based on LeNet-5

This notebook classifies handwritten digits from the Reduced MNIST dataset. Starting from a LeNet-5-style baseline, we train ten Convolutional Neural Network (CNN) variants, each of which changes one aspect of the baseline: the number of filters, the activation function, the number of layers, the pooling method, the regularization technique, the optimizer, or the kernel size.

In [1]:
# Import necessary libraries
import os
import cv2
import keras
import time
import numpy as np
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, AveragePooling2D, MaxPooling2D, Dropout, BatchNormalization
from keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam, SGD

2025-03-24 22:05:47.626267: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.

## 1. Data Loading and Preprocessing

In this section, we load the Reduced MNIST dataset from the training and testing directories. Each image is a 28x28 grayscale file stored in a folder named after its digit (0-9), so the folder name serves as the label. Because the images are read folder by folder, we shuffle them with a fixed seed so that samples are not ordered by class, and then add a channel dimension for CNN input.

In [2]:
# Define paths to training and testing data directories
train_data_dir = './Reduced MNIST Data/Reduced Training data'
test_data_dir = './Reduced MNIST Data/Reduced Testing data'

# Get list of subdirectories (each representing a digit class)
train_class_dirs = os.listdir(train_data_dir)
test_class_dirs = os.listdir(test_data_dir)

print("Training class directories:", train_class_dirs)
print("Testing class directories:", test_class_dirs)

# Initialize lists to store images and labels
train_images = []
train_labels = []

# Load training data
for digit_class in train_class_dirs:
    class_path = os.path.join(train_data_dir, digit_class)
    for image_file in os.listdir(class_path):
        image_path = os.path.join(class_path, image_file)
        # Read image in grayscale (0-255 pixel values)
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        train_images.append(image)
        train_labels.append(int(digit_class))  # Folder name is the label; store it as an integer

# Convert lists to NumPy arrays
train_images = np.array(train_images)
train_labels = np.array(train_labels)

print("Training images shape:", train_images.shape)
print("Training labels shape:", train_labels.shape)

# Load testing data
test_images = []
test_labels = []

for digit_class in test_class_dirs:
    class_path = os.path.join(test_data_dir, digit_class)
    for image_file in os.listdir(class_path):
        image_path = os.path.join(class_path, image_file)
        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        test_images.append(image)
        test_labels.append(int(digit_class))  # Folder name is the label; store it as an integer

# Convert lists to NumPy arrays
test_images = np.array(test_images)
test_labels = np.array(test_labels)

print("Testing images shape:", test_images.shape)
print("Testing labels shape:", test_labels.shape)

# Shuffle training and testing data for randomness
train_images, train_labels = shuffle(train_images, train_labels, random_state=4)
test_images, test_labels = shuffle(test_images, test_labels, random_state=4)

# Reshape data for CNN input (add channel dimension)
def reshape_for_cnn(images_train, images_test):
    """Reshape image arrays to include a channel dimension for CNN input."""
    train_cnn_input = images_train.reshape(images_train.shape[0], 28, 28, 1)
    test_cnn_input = images_test.reshape(images_test.shape[0], 28, 28, 1)
    return train_cnn_input, test_cnn_input

train_cnn_input, test_cnn_input = reshape_for_cnn(train_images, test_images)

Training class directories: ['0', '7', '5', '2', '6', '8', '4', '9', '3', '1']
Testing class directories: ['0', '7', '5', '2', '6', '8', '4', '9', '3', '1']
Training images shape: (10000, 28, 28)
Training labels shape: (10000,)
Testing images shape: (2000, 28, 28)
Testing labels shape: (2000,)
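
The loading cell passes raw 0-255 pixel values to the models, and the results in this notebook were obtained that way. A common alternative is to scale pixels to the range [0, 1] before training. The sketch below shows that step on synthetic data; `prepare_inputs` is a hypothetical helper that is not part of the notebook, and whether scaling helps here would need to be checked by rerunning the experiments.

```python
import numpy as np

def prepare_inputs(images, labels):
    """Hypothetical helper: scale uint8 images to [0, 1] and cast labels to int."""
    x = images.astype("float32") / 255.0    # 0-255 grayscale -> 0.0-1.0
    x = x.reshape(-1, 28, 28, 1)            # add the channel dimension
    y = np.asarray(labels).astype("int64")  # folder name '7' -> integer 7
    return x, y

# Synthetic stand-in for the real image arrays (assumption: same shape and dtype)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array(['0', '7', '5', '2'])

x, y = prepare_inputs(fake_images, fake_labels)
print(x.shape, x.dtype, y)
```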


## 2. Model Training and Evaluation Functions

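We define helper functions that train a CNN model and evaluate it on the test set, recording training time in seconds, testing time in milliseconds, and test accuracy. Every model variant is run through these functions with the same defaults (10 epochs, batch size 64, no early stopping).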

In [3]:
def train_and_evaluate_cnn(model, train_data, train_labels, test_data, test_labels, 
                          epochs=10, batch_size=64, use_early_stopping=False, verbose=0):
    """Train and evaluate a CNN model, returning performance metrics."""
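    # Set up callbacks (e.g., early stopping).
    # No validation data is passed to fit(), so the callback monitors
    # training accuracy ('accuracy'), not held-out performance.
    callbacks = []
    if use_early_stopping:
        early_stop = EarlyStopping(monitor='accuracy', patience=3, restore_best_weights=True)
        callbacks.append(early_stop)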
    
    # Train the model and measure training time
    start_time = time.time()
    history = model.fit(
        train_data, 
        to_categorical(train_labels), 
        epochs=epochs, 
        batch_size=batch_size,
        shuffle=True, 
        verbose=verbose, 
        callbacks=callbacks
    )
    training_time = time.time() - start_time
    
    # Evaluate the model and measure testing time
    start_time = time.time()
    test_loss, test_accuracy = model.evaluate(test_data, to_categorical(test_labels), verbose=0)
    testing_time = (time.time() - start_time) * 1000  # Convert to milliseconds
    
    return training_time, testing_time, test_accuracy, history

def print_performance_metrics(model_name, training_time, testing_time, test_accuracy):
    """Display performance metrics in a formatted manner."""
    print(f"----- {model_name} -----")
    print(f"Training Time: {training_time:.2f} seconds")
    print(f"Testing Time: {testing_time:.2f} milliseconds")
    print(f"Test Accuracy: {test_accuracy * 100:.2f}%")
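
The helpers above time training and evaluation with `time.time()`. For short intervals such as the evaluation step, `time.perf_counter()` is a monotonic, higher-resolution alternative from the standard library. The following is a minimal sketch of a reusable timing wrapper; `timed_call` is a hypothetical name, and the `sum` call stands in for `model.fit` or `model.evaluate`.

```python
import time

def timed_call(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()            # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in the notebook this would wrap model.fit or model.evaluate
total, seconds = timed_call(sum, range(1_000_000))
print(f"result={total}, elapsed={seconds * 1000:.2f} ms")
```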

## 3. CNN Model Variants

We define a LeNet-5-style baseline and ten variants, each of which changes one aspect of the baseline:
- Base LeNet-5 model
- Increased number of filters
- Tanh activation function
- ELU activation function
- Fewer layers
- Additional layer
- MaxPooling instead of AveragePooling
- Dropout regularization
- Batch normalization
- SGD optimizer with momentum
- Smaller kernel size (3x3)

In [4]:
# Base LeNet-5 Model
def create_base_lenet5():
    """Create the LeNet-5-style baseline (ReLU activations, Adam optimizer, 28x28 input)."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1), padding='valid'),
        AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'),
        Conv2D(16, (5, 5), activation='relu', padding='valid'),
        AveragePooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'),
        Flatten(),
        Dense(120, activation='relu'),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 1: Increased Number of Filters
def create_increased_filters():
    """Increase the number of filters in convolutional layers."""
    model = Sequential([
        Conv2D(18, (5, 5), activation='relu', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Conv2D(24, (5, 5), activation='relu'),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='relu'),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 2: Tanh Activation Function
def create_tanh_activation():
    """Use tanh activation instead of ReLU."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='tanh', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Conv2D(16, (5, 5), activation='tanh'),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='tanh'),
        Dense(84, activation='tanh'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 3: ELU Activation Function
def create_elu_activation():
    """Use ELU activation instead of ReLU."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='elu', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Conv2D(16, (5, 5), activation='elu'),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='elu'),
        Dense(84, activation='elu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 4: Fewer Layers
def create_fewer_layers():
    """Remove one dense layer from the base model."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Conv2D(16, (5, 5), activation='relu'),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(84, activation='relu'),  # Removed 120-unit layer
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 5: Additional Layer
def create_additional_layer():
    """Add an extra dense layer to the base model."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Conv2D(16, (5, 5), activation='relu'),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='relu'),
        Dense(84, activation='relu'),
        Dense(42, activation='relu'),  # Additional layer
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 6: MaxPooling Instead of AveragePooling
def create_max_pooling():
    """Replace AveragePooling with MaxPooling."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(16, (5, 5), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='relu'),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 7: Dropout Regularization
def create_dropout_regularization():
    """Add dropout layers for regularization."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Dropout(0.25),
        Conv2D(16, (5, 5), activation='relu'),
        AveragePooling2D((2, 2)),
        Dropout(0.25),
        Flatten(),
        Dense(120, activation='relu'),
        Dropout(0.5),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 8: Batch Normalization
def create_batch_normalization():
    """Add batch normalization layers."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),
        BatchNormalization(),
        AveragePooling2D((2, 2)),
        Conv2D(16, (5, 5), activation='relu'),
        BatchNormalization(),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='relu'),
        BatchNormalization(),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 9: SGD Optimizer with Momentum
def create_sgd_optimizer():
    """Use SGD optimizer with momentum instead of Adam."""
    model = Sequential([
        Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Conv2D(16, (5, 5), activation='relu'),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='relu'),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9), 
                 loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# Variant 10: Smaller Kernel Size (3x3)
def create_smaller_kernel():
    """Use 3x3 kernels instead of 5x5."""
    model = Sequential([
        Conv2D(6, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        AveragePooling2D((2, 2)),
        Conv2D(16, (3, 3), activation='relu'),
        AveragePooling2D((2, 2)),
        Flatten(),
        Dense(120, activation='relu'),
        Dense(84, activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
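
To see how kernel size affects model size, the sketch below computes the Flatten output size and the trainable parameter count for the baseline layout and the 3x3 variant. It assumes 'valid' padding and 2x2 pooling with stride 2, as in the definitions above; the function names are hypothetical. Note that the 3x3 variant ends up with more parameters than the baseline, because its larger feature maps feed a wider first dense layer.

```python
def valid_out(size, window, stride=1):
    """Output side length of a 'valid' convolution or pooling window."""
    return (size - window) // stride + 1

def flatten_size_and_params(kernel, filters=(6, 16), dense=(120, 84, 10), size=28):
    """Return (Flatten output size, trainable parameter count)."""
    params, channels = 0, 1
    for f in filters:
        params += f * (kernel * kernel * channels + 1)  # conv weights + biases
        size = valid_out(size, kernel)                  # valid convolution
        size = valid_out(size, 2, stride=2)             # 2x2 pooling, stride 2
        channels = f
    flat = size * size * channels
    units = flat
    for d in dense:
        params += d * (units + 1)                       # dense weights + biases
        units = d
    return flat, params

base = flatten_size_and_params(5)     # (256, 44426)
small = flatten_size_and_params(3)    # (400, 60074)
print("5x5 kernels:", base)
print("3x3 kernels:", small)
```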

## 4. Experiment Runner and Results

We train each variant once with the default settings and record its training time, testing time, and test accuracy.

In [5]:
def run_cnn_experiments(train_images, train_labels, test_images, test_labels):
    """Run experiments for all CNN variants and collect results."""
    # Reshape data for CNN input
    train_cnn_data, test_cnn_data = reshape_for_cnn(train_images, test_images)
    
    # Define all model variants
    model_variants = {
        "Base LeNet-5 Model": create_base_lenet5,
        "Increased Number of Filters": create_increased_filters,
        "Tanh Activation Function": create_tanh_activation,
        "ELU Activation Function": create_elu_activation,
        "Fewer Layers": create_fewer_layers,
        "Additional Layer": create_additional_layer,
        "MaxPooling Instead of AveragePooling": create_max_pooling,
        "Dropout Regularization": create_dropout_regularization,
        "Batch Normalization": create_batch_normalization,
        "SGD Optimizer with Momentum": create_sgd_optimizer,
        "Smaller Kernel Size (3x3)": create_smaller_kernel
    }
    
    # Store results
    experiment_results = []
    
    # Run each experiment
    for variant_name, create_model_fn in model_variants.items():
        model = create_model_fn()
        training_time, testing_time, test_accuracy, history = train_and_evaluate_cnn(
            model, train_cnn_data, train_labels, test_cnn_data, test_labels
        )
        print_performance_metrics(variant_name, training_time, testing_time, test_accuracy)
        print("\n")
        experiment_results.append({
            "name": variant_name,
            "training_time": training_time,
            "testing_time": testing_time,
            "test_accuracy": test_accuracy
        })
    
    return experiment_results

# Execute experiments
results = run_cnn_experiments(train_images, train_labels, test_images, test_labels)


----- Base LeNet-5 Model -----
Training Time: 12.45 seconds
Testing Time: 339.65 milliseconds
Test Accuracy: 96.20%


----- Increased Number of Filters -----
Training Time: 20.08 seconds
Testing Time: 413.23 milliseconds
Test Accuracy: 97.35%


----- Tanh Activation Function -----
Training Time: 12.35 seconds
Testing Time: 417.19 milliseconds
Test Accuracy: 96.90%


----- ELU Activation Function -----
Training Time: 12.88 seconds
Testing Time: 312.66 milliseconds
Test Accuracy: 97.75%


----- Fewer Layers -----
Training Time: 10.32 seconds
Testing Time: 288.19 milliseconds
Test Accuracy: 97.05%


----- Additional Layer -----
Training Time: 10.73 seconds
Testing Time: 301.06 milliseconds
Test Accuracy: 97.15%


----- MaxPooling Instead of AveragePooling -----
Training Time: 10.85 seconds
Testing Time: 306.11 milliseconds
Test Accuracy: 96.25%


----- Dropout Regularization -----
Training Time: 11.61 seconds
Testing Time: 296.35 milliseconds
Test Accuracy: 98.05%


----- Batch Normalizat
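
The per-variant blocks above are easier to compare when ranked. The sketch below sorts the eight results that appear in full in the output by test accuracy. The values are transcribed from that output; the variants from Batch Normalization onward are omitted because the output is cut off. In the notebook itself, the same ranking could be built from the `results` list.

```python
# (name, training time in s, testing time in ms, test accuracy in %)
shown_results = [
    ("Base LeNet-5 Model", 12.45, 339.65, 96.20),
    ("Increased Number of Filters", 20.08, 413.23, 97.35),
    ("Tanh Activation Function", 12.35, 417.19, 96.90),
    ("ELU Activation Function", 12.88, 312.66, 97.75),
    ("Fewer Layers", 10.32, 288.19, 97.05),
    ("Additional Layer", 10.73, 301.06, 97.15),
    ("MaxPooling Instead of AveragePooling", 10.85, 306.11, 96.25),
    ("Dropout Regularization", 11.61, 296.35, 98.05),
]

# Rank by test accuracy, highest first
ranked = sorted(shown_results, key=lambda r: r[3], reverse=True)
for name, train_s, test_ms, acc in ranked:
    print(f"{name:<38} {acc:6.2f}%  {train_s:6.2f} s  {test_ms:7.2f} ms")
```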

## 5. Conclusion

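This notebook evaluates multiple CNN architectures derived from LeNet-5 on the Reduced MNIST dataset, comparing training time, testing time, and accuracy across changes to activation functions, layer counts, pooling methods, regularization, optimizers, and kernel sizes. In the output shown above, which is cut off after the dropout variant, test accuracy ranges from 96.20% (baseline) to 98.05% (dropout regularization), and training takes roughly 10 to 20 seconds per model without a GPU. Each variant was trained only once for 10 epochs, so differences of a fraction of a percentage point may reflect run-to-run variation rather than a real effect of the architectural change.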