# MNIST CNN Image Classification

This Jupyter notebook implements a Convolutional Neural Network (CNN) for classifying handwritten digits from the MNIST dataset using PyTorch.

## Overview

1. **Data Loading & Preprocessing**
   - Loads the MNIST dataset and splits it into training, validation, and test sets.
   - Reshapes the image data into tensors with shape `(batch_size, 1, 28, 28)` to match the CNN input requirements.
   - Normalizes and batches the data for efficient training.

2. **Model Architecture**
   - Implements a CNN using PyTorch's `nn.Sequential`, consisting of:
     - Two convolutional layers with ReLU activation
     - MaxPooling layers to reduce spatial size
     - Dropout for regularization
     - Fully connected (linear) layers for classification
   - A custom `Flatten` module is used to reshape feature maps into flat vectors before feeding into the fully connected layers.

3. **Training Procedure**
   - Uses Stochastic Gradient Descent (SGD) with optional momentum and Nesterov acceleration.
   - Trains the model over several epochs, printing the training and validation loss/accuracy at each epoch.

4. **Evaluation**
   - Evaluates the model's performance on a held-out test set and reports final accuracy and loss.


For detailed information about SGD optimizer parameters and configuration options, refer to the [PyTorch SGD documentation](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html).
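
As a rough illustration of what the momentum and Nesterov options change, the sketch below applies the update rule described in the PyTorch SGD documentation (with dampening and weight decay left at zero) to a one-dimensional toy objective f(w) = w². The function names `sgd_step` and `minimize` and the toy objective are illustrative assumptions, not part of the notebook.

```python
def sgd_step(w, grad, buf, lr=0.01, momentum=0.9, nesterov=False):
    """One SGD update for a scalar parameter (dampening = 0, no weight decay).

    Returns the updated parameter and the updated momentum buffer.
    """
    buf = momentum * buf + grad  # accumulate the velocity
    # Nesterov adds a look-ahead term to the raw gradient;
    # classical momentum steps along the buffer itself.
    step = grad + momentum * buf if nesterov else buf
    return w - lr * step, buf


def minimize(nesterov, steps=200, lr=0.1):
    """Minimize the toy objective f(w) = w**2 starting from w = 1."""
    w, buf = 1.0, 0.0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w, buf = sgd_step(w, grad, buf, lr=lr, nesterov=nesterov)
    return w


print("classical momentum:", minimize(nesterov=False))
print("Nesterov momentum: ", minimize(nesterov=True))
```

With these toy settings both runs approach the minimum at w = 0. The Nesterov variant takes a larger first step (to 0.62 rather than 0.8) because it adds `momentum * buf` on top of the gradient.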

## Library Dependencies

The notebook relies on NumPy, PyTorch, scikit-learn's `train_test_split`, and the `preprocessing_data` module, which provides `get_MNIST_data`:

In [2]:
import numpy as np
from preprocessing_data import get_MNIST_data
import torch
import torch.nn.functional as F
import torch.nn as nn
from sklearn.model_selection import train_test_split

### Loading the data

In [3]:
train_x, train_y, test_x, test_y = get_MNIST_data()

**`Flatten`**
 A custom PyTorch layer that reshapes a 4D input tensor of shape `(batch_size, channels, height, width)` into a 2D tensor of shape `(batch_size, -1)` by flattening all dimensions except the batch size. This is useful when transitioning from convolutional layers to fully connected layers.


In [4]:
class Flatten(nn.Module):
    """A custom PyTorch layer that flattens all non-batch dimensions of its input.
    
    This layer transforms a multi-dimensional input tensor into a 2D tensor
    where the first dimension is the batch size and the second dimension
    contains all other dimensions flattened into one dimension.
    
    Args:
        None
        
    Returns:
        torch.Tensor: Flattened output tensor of shape (batch_size, -1)
    """

    def forward(self, input):
        return input.view(input.size(0), -1)
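
A quick shape check, assuming PyTorch is installed: the `view` call used in `Flatten` and PyTorch's built-in `nn.Flatten` (whose default `start_dim=1` also preserves the batch dimension) should give the same result on a dummy batch. The tensor sizes are chosen to match the feature maps this notebook's CNN produces before its first linear layer.

```python
import torch
import torch.nn as nn

# Dummy feature maps: batch of 32, 64 channels, 5x5 spatial size
features = torch.randn(32, 64, 5, 5)

flat_view = features.view(features.size(0), -1)  # what Flatten.forward does
flat_builtin = nn.Flatten()(features)            # built-in equivalent

print(flat_view.shape)                       # torch.Size([32, 1600])
print(torch.equal(flat_view, flat_builtin))  # True
```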


**`batchify_data(x_data, y_data, batch_size)`**
  Splits feature/label arrays into a list of batches of the given size, each a dictionary with `'x'` (float32) and `'y'` (int64) tensors. Any remainder that does not fill a complete batch is dropped so that all batches have the same size.


In [5]:
def batchify_data(x_data, y_data, batch_size):
    """Creates batches of data for mini-batch training with PyTorch.

    This function takes input data and corresponding labels, converts them into NumPy arrays,
    and creates batches of a specified size. The function ensures all batches are of equal size
    by dropping any remaining samples that don't fit into a complete batch.

    Args:
        x_data (array-like): Input features/data points to be batched.
                            Can be a list, NumPy array, or similar structure.
        y_data (array-like): Labels/targets corresponding to x_data.
                            Must have the same length as x_data.
        batch_size (int): Number of samples per batch. Should be > 0.

    Returns:
        list[dict]: A list of dictionaries, where each dictionary represents a batch
                   containing:
                   - 'x': torch.Tensor of shape (batch_size, *feature_dims) with dtype float32
                   - 'y': torch.Tensor of shape (batch_size,) with dtype int64
    """
    # Convert inputs to NumPy arrays for consistent handling
    x_data = np.array(x_data)  # Ensure it's a single NumPy array
    y_data = np.array(y_data)  # Ensure it's a single NumPy array
    # Calculate number of complete batches (dropping remainder)
    N = (len(x_data) // batch_size) * batch_size
    batches = []
    for i in range(0, N, batch_size):
        batches.append({
            'x': torch.tensor(x_data[i:i+batch_size], dtype=torch.float32),
            'y': torch.tensor(y_data[i:i+batch_size], dtype=torch.long),
        })
    return batches
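
To see how much data the drop-remainder rule discards, the sketch below repeats the arithmetic from `batchify_data` without building any tensors. The helper name `batch_plan` is hypothetical, and the sample counts assume the standard MNIST split of 60,000 training and 10,000 test images, with 10% of the training set held out for validation as in `main()`; `get_MNIST_data` may return different sizes.

```python
def batch_plan(n_samples, batch_size):
    """Return (number of full batches, number of dropped samples)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_batches = n_samples // batch_size
    return n_batches, n_samples - n_batches * batch_size


# Assumed sample counts (standard MNIST sizes, 90/10 train/validation split)
for name, n in [("train", 54_000), ("validation", 6_000), ("test", 10_000)]:
    n_batches, dropped = batch_plan(n, 32)
    print(f"{name}: {n_batches} batches, {dropped} samples dropped")
```

Under these assumptions each split loses 16 samples, which is negligible here but can matter for small datasets.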

**`compute_accuracy(predictions, y)`**
  Computes classification accuracy by comparing model predictions with ground truth labels.

In [6]:
def compute_accuracy(predictions, y):
    """
    Computes the classification accuracy between predicted and true labels.

    Calculates the accuracy as the fraction of predictions that exactly match the labels.
    Both inputs are converted to NumPy arrays for computation.

    Args:
        predictions (torch.Tensor): Model predictions.
                                  Shape: (batch_size,)
                                  dtype: torch.long
        y (torch.Tensor): Ground truth labels.
                         Shape: (batch_size,)
                         dtype: torch.long

    Returns:
        float: Classification accuracy as a decimal between 0.0 and 1.0.
               - 1.0 means all predictions are correct
               - 0.0 means all predictions are wrong
    """
    return np.mean(np.equal(predictions.numpy(), y.numpy()))
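
A small worked example of the same computation on plain NumPy arrays (the digit values are made up for illustration). Seven of the eight predictions match, so the accuracy is 7/8. The second half illustrates why averaging per-batch accuracies, as `run_epoch` does, gives the same number when all batches have equal size.

```python
import numpy as np

predictions = np.array([7, 2, 1, 0, 4, 1, 4, 9])
labels = np.array([7, 2, 1, 0, 4, 1, 9, 9])

# Fraction of positions where prediction and label agree
accuracy = np.mean(np.equal(predictions, labels))
print(accuracy)  # 0.875

# Two equal-sized batches of 4: accuracies 1.0 and 0.75 average to 0.875
per_batch = [np.mean(predictions[i:i + 4] == labels[i:i + 4]) for i in (0, 4)]
print(np.mean(per_batch))  # 0.875
```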

**`run_epoch(batches, model, optimizer)`**
  Performs one full pass (epoch) over the given dataset (training or validation). Computes average loss and accuracy.
  - If model is in training mode (`model.train()`), it performs backpropagation and optimization.
  - If in evaluation mode (`model.eval()`), it only computes predictions and loss without updating weights.


In [7]:
def run_epoch(batches, model, optimizer):
    """
    Processes one epoch of data through the model in either training or evaluation mode.

    This function handles both training and evaluation passes through the data:
    - In training mode (model.training=True): performs backpropagation and parameter updates
    - In evaluation mode (model.training=False): only computes forward pass and metrics

    Args:
        batches (list): List of dictionaries, each containing:
                       - 'x': torch.Tensor of shape (batch_size, *feature_dims)
                       - 'y': torch.Tensor of shape (batch_size,) with class labels
        model (torch.nn.Module): The neural network model to train/evaluate
        optimizer (torch.optim.Optimizer): The optimizer for updating model parameters
                                         (used only in training mode)

    Returns:
        tuple: Contains:
            - avg_loss (float): Mean cross-entropy loss across all batches
            - avg_accuracy (float): Mean prediction accuracy across all batches

    Note:
        - In training mode, gradients are zeroed before each batch
        - Cross-entropy loss is used for classification
        - Predictions are made using argmax on model outputs
        - Accuracy is computed using exact matches between predictions and labels
    """

    # Initialize lists to store per-batch metrics
    losses = []
    batch_accuracies = []

    # Check if we're in training mode
    is_training = model.training

    # Process each batch
    for batch in batches:
        # Extract features and labels
        x, y = batch['x'], batch['y']

        # Forward pass and loss; autograd tracking is only needed when training
        with torch.set_grad_enabled(is_training):
            out = model(x)
            loss = F.cross_entropy(out, y)
        losses.append(loss.item())

        # Compute predictions and accuracy
        predictions = torch.argmax(out, dim=1)  # dim=1 is the class dimension
        batch_accuracies.append(compute_accuracy(predictions, y))

        # Perform optimization step if in training mode
        if is_training:
            optimizer.zero_grad()    # Clear previous gradients
            loss.backward()          # Compute gradients
            optimizer.step()         # Update parameters

    # Compute epoch-level metrics
    avg_loss = np.mean(losses)
    avg_accuracy = np.mean(batch_accuracies)
    return avg_loss, avg_accuracy
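
For intuition about the numbers `run_epoch` reports, here is a dependency-free sketch of the two per-sample quantities involved: the cross-entropy of the softmax over the logits, and the argmax prediction. It is meant to mirror what `F.cross_entropy` computes for a single sample; the helper names and logit values are illustrative assumptions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def predict(logits):
    """Index of the largest logit (the argmax over classes)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Uniform logits over 10 classes: the loss equals ln(10), about 2.3026
uniform = [0.0] * 10
print(cross_entropy(uniform, 3))

# A confident, correct prediction for class 7 gives a loss close to zero
confident = [0.0] * 10
confident[7] = 8.0
print(predict(confident), cross_entropy(confident, 7))
```

A 10-class classifier with near-uniform outputs therefore starts around ln 10 ≈ 2.30, which is a useful baseline when reading the per-epoch losses.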


**`train_model(train_data, dev_data, model, ...)`**

  Main training loop:
  - Iterates through a fixed number of epochs.
  - Runs training and validation in each epoch using `run_epoch`.
  - Saves the model after each epoch.
  - Returns the final validation accuracy.

In [8]:
def train_model(train_data,
                dev_data,
                model,
                lr=0.01,
                momentum=0.9,
                nesterov=False):
    """
    Trains a PyTorch model using SGD optimization and validates its performance.

    This function implements a training loop that:
    1. Initializes an SGD optimizer with specified parameters
    2. Trains for 10 epochs, evaluating after each epoch
    3. Saves the model after each epoch
    4. Returns the final validation accuracy

    Args:
        train_data (list): List of training data batches, where each batch is a dict
                          containing 'x' (features) and 'y' (labels) tensors
        dev_data (list): List of validation data batches, similar structure to train_data
        model (torch.nn.Module): The PyTorch model to train
        lr (float, optional): Learning rate for SGD optimizer. Defaults to 0.01
        momentum (float, optional): Momentum factor for SGD optimizer. Defaults to 0.9
        nesterov (bool, optional): Whether to use Nesterov momentum. Defaults to False

    Returns:
        float: The validation accuracy from the final epoch
    """

    # Initialize SGD optimizer with specified parameters
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=lr,
                                momentum=momentum,
                                nesterov=nesterov)
    # Train for 10 epochs
    for epoch in range(1, 11):
        print("-------------\nEpoch {}:\n".format(epoch))


        # Training phase
        loss, acc = run_epoch(train_data, model.train(), optimizer)
        print('Train loss: {:.6f} | Train accuracy: {:.6f}'.format(loss, acc))

        # Validation phase
        val_loss, val_acc = run_epoch(dev_data, model.eval(), optimizer)
        print('Val loss:   {:.6f} | Val accuracy:   {:.6f}'.format(val_loss, val_acc))

        # Save current model state
        torch.save(model, 'mnist_model_fully_connected.pt')

    return val_acc
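
`torch.save(model, ...)` pickles the whole module object, which ties the saved file to the exact class definitions (including the custom `Flatten`). A commonly recommended alternative is to save only the `state_dict`. The sketch below assumes PyTorch is installed and uses a tiny stand-in model and an in-memory buffer instead of a file; `make_model` is a hypothetical helper, not part of the notebook.

```python
import io

import torch
import torch.nn as nn


def make_model():
    # Tiny stand-in model; the notebook would rebuild its CNN here instead
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


model = make_model()

buffer = io.BytesIO()                   # stands in for a file path
torch.save(model.state_dict(), buffer)  # save parameters only
buffer.seek(0)

restored = make_model()                 # same architecture, fresh weights
restored.load_state_dict(torch.load(buffer))
restored.eval()

x = torch.randn(3, 4)
print(torch.equal(model(x), restored(x)))  # True
```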

**`main()`**
  Coordinates the full training workflow:
  - Loads and reshapes the MNIST dataset.
  - Splits the data into training, validation, and test sets.
  - Shuffles the training data to avoid training bias.
  - Defines the CNN model using `nn.Sequential`.
  - Trains the model with `train_model`.
  - Evaluates final model performance on the test set using `run_epoch`.

In [9]:
def main():
    """
    Main function for training and evaluating a CNN model on the MNIST dataset.

    Workflow:
    1. Data Loading and Preprocessing:
       - Loads MNIST data
       - Reshapes data into proper image format (1x28x28)
       - Splits data into train/validation/test sets
       - Applies random shuffling to training data
       - Creates batches for all datasets

    2. Model Architecture:
       Creates a CNN with the following structure:
       - Conv2D(1→32, 3x3) → ReLU → MaxPool(2x2)
       - Conv2D(32→64, 3x3) → ReLU → MaxPool(2x2)
       - Flatten
       - Linear(1600→128) → Dropout(0.5)
       - Linear(128→10)

    3. Training and Evaluation:
       - Trains model using SGD with Nesterov momentum
       - Evaluates on test set
       - Prints final test loss and accuracy

    Data Dimensions:
        - Input images: (batch_size, 1, 28, 28)
        - Output: 10 classes (digits 0-9)
        - Batch size: 32

    Note:
        - Uses 10% of training data for validation
        - Applies shuffling only to training data
        - Test set remains unshuffled for reproducibility
    """

    # Load and prepare MNIST dataset
    num_classes = 10
    X_train, y_train, X_test, y_test = get_MNIST_data()

    # Reshape flat images into 4D format (batch_size, channels, height, width)
    X_train = np.reshape(X_train, (X_train.shape[0], 1, 28, 28))
    X_test = np.reshape(X_test, (X_test.shape[0], 1, 28, 28))

    # Create validation split (90% train, 10% validation)
    X_train, X_val, y_train, y_val = train_test_split(X_train,
                                                      y_train,
                                                      test_size=0.1)
    # Shuffle training data (train_test_split above already shuffles by default)
    permutation = np.random.permutation(len(X_train))
    X_train = X_train[permutation]
    y_train = np.asarray(y_train)[permutation]

    # Create batched datasets
    batch_size = 32
    train_batches = batchify_data(X_train, y_train, batch_size)
    dev_batches = batchify_data(X_val, y_val, batch_size)
    test_batches = batchify_data(X_test, y_test, batch_size)

    # Define CNN model architecture
    model = nn.Sequential(
        # First convolutional block
        nn.Conv2d(1, 32, (3, 3)),     # Input: 28x28x1  → Output: 26x26x32
        nn.ReLU(),
        nn.MaxPool2d((2, 2)),         # Output: 13x13x32

        # Second convolutional block
        nn.Conv2d(32, 64, (3, 3)),    # Output: 11x11x64
        nn.ReLU(),
        nn.MaxPool2d((2, 2)),         # Output: 5x5x64

        # Fully connected layers
        Flatten(),                    # Output: 1600 (5*5*64)
        nn.Linear(1600, 128),
        nn.Dropout(0.5),              # Prevent overfitting
        nn.Linear(128, num_classes),  # Output: 10 classes
    )

    # Train model
    train_model(train_batches, dev_batches, model, nesterov=True)

    # Evaluate on test set
    loss, accuracy = run_epoch(test_batches, model.eval(), None)
    print('Test loss:  {:.6f} | Test accuracy:  {:.6f}'.format(loss, accuracy))
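
The shape comments in the model definition can be double-checked with the usual output-size formula for convolution and pooling layers, out = floor((in + 2*padding - kernel) / stride) + 1. The helper below is a hypothetical stand-alone check, assuming PyTorch's defaults for these layers (no padding and stride 1 for `nn.Conv2d`, stride equal to the kernel size for `nn.MaxPool2d`).

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28
size = out_size(size, kernel=3)            # Conv2d(1, 32, 3x3)  -> 26
size = out_size(size, kernel=2, stride=2)  # MaxPool2d(2x2)      -> 13
size = out_size(size, kernel=3)            # Conv2d(32, 64, 3x3) -> 11
size = out_size(size, kernel=2, stride=2)  # MaxPool2d(2x2)      -> 5

flat_features = 64 * size * size           # channels * height * width
print(size, flat_features)                 # 5 1600
```

The result matches the `1600` input features expected by the first `nn.Linear` layer.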


### Results

Running `main()` prints the training and validation loss and accuracy after each epoch, followed by the loss and accuracy on the held-out test set.

In [10]:
main()

-------------
Epoch 1:

Train loss: 0.242198 | Train accuracy: 0.922922
Val loss:   0.069764 | Val accuracy:   0.978108
-------------
Epoch 2:

Train loss: 0.080397 | Train accuracy: 0.975752
Val loss:   0.048263 | Val accuracy:   0.986297
-------------
Epoch 3:

Train loss: 0.060442 | Train accuracy: 0.981476
Val loss:   0.041506 | Val accuracy:   0.985963
-------------
Epoch 4:

Train loss: 0.048583 | Train accuracy: 0.984903
Val loss:   0.038915 | Val accuracy:   0.987467
-------------
Epoch 5:

Train loss: 0.040266 | Train accuracy: 0.987496
Val loss:   0.038104 | Val accuracy:   0.988636
-------------
Epoch 6:

Train loss: 0.034403 | Train accuracy: 0.989460
Val loss:   0.034159 | Val accuracy:   0.989806
-------------
Epoch 7:

Train loss: 0.030224 | Train accuracy: 0.990164
Val loss:   0.033432 | Val accuracy:   0.989472
-------------
Epoch 8:

Train loss: 0.026691 | Train accuracy: 0.991497
Val loss:   0.034987 | Val accuracy:   0.989305
-------------
Epoch 9:

Train loss: 0.02

# Conclusion

## Model Performance Analysis

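The logged epochs show the CNN learning quickly and then leveling off:
- Training loss falls every epoch, from 0.242 after epoch 1 to 0.027 after epoch 8
- Validation accuracy rises from 97.8% after epoch 1 to about 98.9% by epoch 5 and changes little afterwards
- Test accuracy exceeds 98%, indicating that the model generalizes well to unseen data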
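
To make the epoch-by-epoch trend concrete, the short script below tabulates the validation metrics copied from the epoch 1 to 8 output above (the log is cut off partway through epoch 9, so that epoch is left out) and reports where each metric was best.

```python
# Validation metrics for epochs 1-8, copied from the training log
val_loss = [0.069764, 0.048263, 0.041506, 0.038915,
            0.038104, 0.034159, 0.033432, 0.034987]
val_acc = [0.978108, 0.986297, 0.985963, 0.987467,
           0.988636, 0.989806, 0.989472, 0.989305]

# Epochs are 1-indexed in the log, list positions are 0-indexed
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1

print(f"lowest validation loss:      {min(val_loss):.6f} (epoch {best_loss_epoch})")
print(f"highest validation accuracy: {max(val_acc):.6f} (epoch {best_acc_epoch})")
```

In this run the highest validation accuracy occurs at epoch 6 and the lowest validation loss at epoch 7, so the later logged epochs bring little further gain on the validation set.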

## Comparative Analysis

Compared with the traditional classification models implemented in `mnist_classification_part1.ipynb`, the CNN architecture performs better:
- Better accuracy metrics
- More effective feature extraction through convolutional layers, which exploit the spatial structure of the images

This improvement in performance validates the choice of using CNNs for image classification tasks.