**LinkedIn Profile:** [Matencio Montana](https://www.linkedin.com/in/montana-matencio-b01111376)  
**Contact:** [montana.matencio@gmail.com](mailto:montana.matencio@gmail.com)

**MNIST Image Classification using CNN**

This project focuses on building and evaluating a deep learning model for classifying handwritten digits from the MNIST dataset using a Convolutional Neural Network (CNN).

**Project Goal:**
The primary objective is to develop an accurate and efficient image classification model that can correctly identify handwritten digits (0-9).

**Technologies & Methodologies:**

- Python: The core programming language.

- PyTorch: The deep learning framework used to build and train the CNN.

- TorchVision: Utilized for easily accessing and transforming the MNIST dataset.

- Convolutional Neural Networks (CNN): A specialized type of neural network highly effective for image processing tasks, employed here for feature extraction and classification.

    - nn.Conv2d: For applying convolutional filters to input images.

    - nn.BatchNorm2d: Implemented for normalizing layer outputs, improving training stability and convergence.

    - nn.ReLU: The activation function used to introduce non-linearity into the model.

    - nn.MaxPool2d: For downsampling feature maps, which reduces computational cost and adds a degree of translation invariance that can help limit overfitting.

    - nn.Flatten: To convert the 2D feature maps into a 1D vector for the fully connected layer.

    - nn.Linear: The final fully connected layer for classification.

    - nn.Dropout: Implemented for regularization to reduce overfitting by randomly setting a fraction of input units to zero at each update during training.

- nn.CrossEntropyLoss: The loss function used for multi-class classification problems. It takes raw logits and applies log-softmax internally, so the model does not need a final softmax layer.

- torch.optim.AdamW: An adaptive optimization algorithm used to update model weights, with decoupled weight decay (applied directly to the weights rather than added to the gradient) for regularization.

- lr_scheduler.CosineAnnealingLR: A learning rate scheduler that decays the learning rate from its initial value toward a minimum (eta_min) along a cosine curve, taking larger steps early in training and smaller, finer steps near the end.

- NumPy & Random: Used for setting random seeds to ensure reproducibility.

- GPU Acceleration: Leveraged torch.cuda when available for faster model training.
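When the scheduler is stepped once per epoch, the cosine annealing schedule can be written in closed form as η_t = η_min + ½(η_max − η_min)(1 + cos(π·t / T_max)). The plain-Python sketch below evaluates that formula with the values used later in this notebook (initial lr = 1e-4, eta_min = 1e-5, T_max = 5). The function name `cosine_annealing_lr` is hypothetical and not part of PyTorch; the rounded values should line up with the "Current Learning Rate" lines printed during training.

```python
import math

def cosine_annealing_lr(epoch, base_lr, eta_min, t_max):
    """Closed-form cosine annealing learning rate after `epoch` scheduler steps (hypothetical helper)."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

base_lr, eta_min, t_max = 1e-4, 1e-5, 5
for epoch in range(1, t_max + 1):
    lr = cosine_annealing_lr(epoch, base_lr, eta_min, t_max)
    print(f"after epoch {epoch}: lr = {lr:.6f}")
```

The learning rate only reaches eta_min after the final (fifth) scheduler step, so every epoch of training runs at a rate above the floor.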

**Data Source:**

The model is trained and tested on the MNIST dataset, a widely used benchmark dataset of handwritten digits, readily available through torchvision.datasets.

**Notebook Structure:**

- Environment Setup: Setting random seeds for reproducibility and configuring device (CPU/GPU).

- Data Loading & Preprocessing: Loading the MNIST dataset and applying the ToTensor() transformation, which converts each image to a float tensor scaled to [0, 1].

- DataLoader Setup: Creating data loaders for efficient batch processing of training and testing data.

- Model Definition: Building the MNIST_CNN architecture with convolutional, batch normalization, activation, pooling, dropout, flatten, and linear layers.

- Hyperparameter Initialization: Setting up input_channels, hidden_channels, output_size, images_size, and dropout_prob.

- Loss Function and Optimizer: Defining nn.CrossEntropyLoss and torch.optim.AdamW.

- Training Loop: Implementing the training_loop function, which iterates through batches, performs the forward pass, computes the loss, backpropagates, and updates the weights.

- Testing Loop: Implementing the test_loop function for evaluating model performance (accuracy and average loss) on the test set.

- Learning Rate Scheduling: Utilizing lr_scheduler.CosineAnnealingLR to dynamically adjust the learning rate during training.

- Model Training & Evaluation: Executing the training and testing loops over several epochs, displaying progress and performance metrics.

**Reproducibility:**
- Random seeds are set at the beginning of the script to ensure the reproducibility of results.

In [1]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.optim import lr_scheduler

In [2]:
import random
import numpy as np
import os # To set environment variables, useful for some libraries

def set_seed(seed):
    """
    Sets the random seed for reproducibility across different libraries.
    """
    # 1. Set seed for Python's built-in random module
    random.seed(seed)

    # 2. Set seed for NumPy
    np.random.seed(seed)

    # 3. Set seed for PyTorch (CPU and GPU)
    torch.manual_seed(seed) # Seeds the RNG on all devices (CPU and CUDA)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed) # Explicitly seed every GPU (redundant after manual_seed, but harmless)

    # 4. Ensure deterministic behavior for CuDNN (GPU operations)
    #    This can sometimes slightly slow down training, but ensures exact reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False # Disable CuDNN auto-tuner for deterministic ops

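    # 5. Set PYTHONHASHSEED (affects hash() of str/bytes and therefore set iteration order).
    #    Note: this only affects Python processes started after this point; hash
    #    randomization in the running interpreter is fixed at startup.
    os.environ['PYTHONHASHSEED'] = str(seed)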

    print(f"Random seed set to {seed} for all relevant libraries.")

MY_RANDOM_SEED = 42
set_seed(MY_RANDOM_SEED)

Random seed set to 42 for all relevant libraries.
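Because hash randomization is fixed when the interpreter starts, the `os.environ['PYTHONHASHSEED'] = str(seed)` line in set_seed only influences Python processes launched afterwards. The minimal stdlib sketch below, assuming a standard CPython interpreter, starts two fresh child interpreters with the same PYTHONHASHSEED and checks that they agree on the hash of a string. The helper `child_hash` is hypothetical.

```python
import os
import subprocess
import sys

def child_hash(text, hash_seed):
    """Return hash(text) as computed by a fresh Python process with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

# Two separate processes started with the same seed agree on the hash of a string.
first = child_hash("mnist", 42)
second = child_hash("mnist", 42)
print(first == second)  # True
```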


In [3]:
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(f"Using {device} device")

Using cuda device


In [4]:
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)
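For a single-channel image such as an MNIST digit, ToTensor() turns the 28×28 grid of 0 to 255 pixel values into a float tensor of shape (1, 28, 28) with values scaled to [0, 1]. The plain-Python sketch below mimics only that scaling and channel-first layout on a tiny 2×2 image; `to_tensor_like` is a hypothetical stand-in for illustration, not the torchvision implementation.

```python
def to_tensor_like(image):
    """Mimic ToTensor() for a grayscale image given as rows of 0-255 integers.

    Returns a nested list with shape (1, H, W): one channel, values scaled to [0, 1].
    """
    return [[[pixel / 255.0 for pixel in row] for row in image]]

tiny_digit = [[0, 128],
              [255, 64]]
tensor_like = to_tensor_like(tiny_digit)
print(len(tensor_like), len(tensor_like[0]), len(tensor_like[0][0]))  # 1 2 2
print(tensor_like[0][1][0])  # 1.0
```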

In [5]:
batch_size = 128

training_dataloader = DataLoader(training_data, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=batch_size, shuffle=False)

for X, y in training_dataloader:
    print(X.shape)
    print(y.shape)
    break

torch.Size([128, 1, 28, 28])
torch.Size([128])
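The printed shapes follow the (batch, channels, height, width) layout. Since 60,000 and 10,000 are not multiples of 128, the last batch of each loader is smaller, and DataLoader keeps that partial batch by default (drop_last=False). The quick arithmetic check below, using a hypothetical helper `batch_layout`, also reproduces the sample counters printed every 100 batches in the training log.

```python
import math

def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when drop_last=False."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(batch_layout(60_000, 128))  # (469, 96)
print(batch_layout(10_000, 128))  # (79, 16)

# Samples seen after batches 0, 100, 200, 300, 400 (all full batches of 128)
print([(batch + 1) * 128 for batch in range(0, 469, 100)])
```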


In [6]:
class MNIST_CNN(nn.Module):
    def __init__(self, input_channels, hidden_channels, output_size, images_size, dropout_prob):
        super().__init__()
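        self.cnn = nn.Sequential(
            nn.Conv2d(input_channels, hidden_channels, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),  # padding=1 keeps 28x28
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # halves the spatial size: 28x28 -> 14x14
            nn.Dropout(dropout_prob),
            nn.Flatten(),
            # Parentheses matter: `*` and `//` evaluate left to right, so without them
            # the computed size is wrong whenever an image side is odd.
            nn.Linear(hidden_channels * (images_size[0] // 2) * (images_size[1] // 2), output_size)
        )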

    def forward(self, x):
        logits = self.cnn(x)
        return logits
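The input size of the final nn.Linear layer comes from tracing shapes through the network: a 3×3 convolution with stride 1 and padding 1 keeps the spatial size, since out = ⌊(in + 2·padding − kernel) / stride⌋ + 1 = in, and the 2×2 max pool with stride 2 halves it. The plain-Python sketch below (helper names are hypothetical) checks the 32 × 14 × 14 = 6272 features for MNIST and shows why `images_size[0] // 2` needs to be grouped in parentheses when an image side is odd.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension for Conv2d/MaxPool2d (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(channels, height, width):
    """Features after Conv2d(k=3, s=1, p=1) followed by MaxPool2d(k=2, s=2)."""
    h = conv_out(conv_out(height, 3, 1, 1), 2, 2)
    w = conv_out(conv_out(width, 3, 1, 1), 2, 2)
    return channels * h * w

print(flattened_features(32, 28, 28))  # 6272

# Without parentheses, `*` and `//` evaluate left to right:
unparenthesized = 32 * 27 // 2 * 27 // 2
parenthesized = 32 * (27 // 2) * (27 // 2)
print(unparenthesized, parenthesized, flattened_features(32, 27, 27))  # 5832 5408 5408
```

For 28×28 inputs both forms happen to give 6272, which is why the ungrouped expression does not fail on MNIST.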

In [7]:
input_channels = 1
hidden_channels = 32
output_size = 10
images_size = (28,28)
dropout_prob = 0.3

model_CNN = MNIST_CNN(input_channels, hidden_channels, output_size, images_size, dropout_prob).to(device)
criterion_CNN = nn.CrossEntropyLoss() # We use CrossEntropyLoss as we are solving a classification problem
optimizer_CNN = torch.optim.AdamW(model_CNN.parameters(), lr=0.0001, weight_decay=0.0001) 

In [8]:
def training_loop(dataloader, model, criterion, optimizer):
    size = len(dataloader.dataset)
    total_samples_processed_in_epoch = 0  # Tracks the total samples processed in this epoch
    model.train()  # Training mode: dropout active, batch norm uses batch statistics
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        pred = model(X)
        loss = criterion(pred, y)

        loss.backward()        # Backpropagation: compute gradients
        optimizer.step()       # Update weights and biases
        optimizer.zero_grad()  # Reset gradients before the next batch

        # Accumulate the number of samples processed in the current batch
        total_samples_processed_in_epoch += len(X)

        if batch % 100 == 0:
            loss_val = loss.item()
            print(f"loss: {loss_val:>7f}  [{total_samples_processed_in_epoch:>5d}/{size:>5d}]")

def test_loop(dataloader, model, criterion):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)  # len(DataLoader) is the number of batches, not samples
    model.eval()  # Inference mode: dropout disabled, batch norm uses running statistics
    total_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)

            pred = model(X)
            total_loss += criterion(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    avg_loss = total_loss / num_batches  # Mean of the per-batch mean losses
    accuracy = correct / size
    print(f"Test Error: \n Accuracy: {(100*accuracy):>0.1f}%, Avg loss: {avg_loss:>8f} \n")
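test_loop reports the average loss as the mean of per-batch mean losses. Because the last test batch holds only 16 of the 10,000 images, it gets the same weight as a full batch of 128, which can shift the reported value slightly. A sample-weighted average avoids this. The plain-Python sketch below compares the two using made-up per-batch losses (illustrative numbers, not results from this notebook); the helper names are hypothetical.

```python
def mean_of_batch_means(batch_losses):
    """Average of per-batch mean losses, as test_loop computes it."""
    return sum(batch_losses) / len(batch_losses)

def sample_weighted_mean(batch_losses, batch_sizes):
    """Per-sample average: weight each batch mean by its number of samples."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# Illustrative values: 78 full batches of 128 plus a final batch of 16 (10,000 samples).
batch_sizes = [128] * 78 + [16]
batch_losses = [0.10] * 78 + [0.90]  # hypothetical: the small last batch happens to be hard

print(round(mean_of_batch_means(batch_losses), 4))
print(round(sample_weighted_mean(batch_losses, batch_sizes), 4))
```

In the PyTorch loop, the same idea amounts to accumulating `criterion(pred, y).item() * len(X)` and dividing by the dataset size instead of the number of batches.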

In [9]:
epochs = 5
scheduler_CNN = lr_scheduler.CosineAnnealingLR(optimizer_CNN, T_max=epochs, eta_min=1e-5)
for iteration in range(epochs):
    print(f"Epoch {iteration+1}\n-------------------------------")
    training_loop(training_dataloader,model_CNN,criterion_CNN,optimizer_CNN)
    test_loop(test_dataloader,model_CNN,criterion_CNN)
    scheduler_CNN.step()
    #display the current learning rate 
    current_lr = optimizer_CNN.param_groups[0]['lr']
    print(f"Current Learning Rate: {current_lr:.6f}")
print("Done!")

Epoch 1
-------------------------------
loss: 2.373308  [  128/60000]
loss: 0.433661  [12928/60000]
loss: 0.342856  [25728/60000]
loss: 0.229722  [38528/60000]
loss: 0.287554  [51328/60000]
Test Error: 
 Accuracy: 94.7%, Avg loss: 0.206569 

Current Learning Rate: 0.000091
Epoch 2
-------------------------------
loss: 0.232835  [  128/60000]
loss: 0.411702  [12928/60000]
loss: 0.232680  [25728/60000]
loss: 0.200513  [38528/60000]
loss: 0.082251  [51328/60000]
Test Error: 
 Accuracy: 96.0%, Avg loss: 0.145730 

Current Learning Rate: 0.000069
Epoch 3
-------------------------------
loss: 0.129178  [  128/60000]
loss: 0.100402  [12928/60000]
loss: 0.132377  [25728/60000]
loss: 0.097645  [38528/60000]
loss: 0.123066  [51328/60000]
Test Error: 
 Accuracy: 96.8%, Avg loss: 0.116954 

Current Learning Rate: 0.000041
Epoch 4
-------------------------------
loss: 0.133229  [  128/60000]
loss: 0.133255  [12928/60000]
loss: 0.190191  [25728/60000]
loss: 0.157921  [38528/60000]
loss: 0.096495  [5