# Convolutional Neural Networks (CNN)
### Introduction
This notebook implements the LeNet-5 convolutional neural network (CNN) architecture to classify images from the MNIST dataset. The MNIST dataset consists of 28x28 grayscale images of handwritten digits (0-9). The goal is to train a model that accurately classifies these images into one of the 10 digit classes.

### Implementation
LeNet-5 is modelled and trained with the PyTorch library. The following hyperparameters must be defined:
- `batch_size`: Number of samples per batch.
- `num_classes`: Number of output classes (10 for digits 0-9).
- `learning_rate`: Learning rate for the optimizer.
- `num_epochs`: Number of times the entire dataset is passed through the network.

In [1]:
# Load in relevant libraries, and alias where appropriate
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define relevant variables for the ML task
batch_size = 64
num_classes = 10
learning_rate = 0.01
num_epochs = 20

# Device will determine whether to run the training on GPU or CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### Data Loading and Transformation
We load the MNIST dataset and apply transformations to resize the images to 32x32 (as required by LeNet-5), convert them to tensors, and normalize them.

In [2]:
#Loading the dataset and preprocessing
train_dataset = datasets.MNIST(root = './MNIST',
                               train = True,
                               transform = transforms.Compose([
                               transforms.Resize((32,32)),
                               transforms.ToTensor(),
                               transforms.Normalize(mean = (0.1307,), std = (0.3081,))]),
                               download = True)

test_dataset = datasets.MNIST(root = './MNIST',
                              train = False,
                              transform = transforms.Compose([
                              transforms.Resize((32,32)),
                              transforms.ToTensor(),
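                              # Note: these values appear to be the test-split statistics; reusing the
                              # training mean/std (0.1307, 0.3081) for both splits is the more common convention.
                              transforms.Normalize(mean = (0.1325,), std = (0.3105,))]),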
                              download=True)

train_loader = DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
# Shuffling is not needed for evaluation; accuracy does not depend on sample order
test_loader = DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = False)
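
To make the normalization step concrete, here is a minimal sketch of the arithmetic that `transforms.Normalize` applies to each pixel after `ToTensor` has scaled it to the range [0, 1]. The helper name `normalize` is hypothetical and works on plain floats rather than tensors; the mean and standard deviation are the training values used above.

```python
def normalize(pixel, mean=0.1307, std=0.3081):
    """Standardize one pixel value in [0, 1]: subtract the mean, divide by the std."""
    return (pixel - mean) / std

background = normalize(0.0)  # black background pixel
stroke = normalize(1.0)      # fully white stroke pixel

print(f"background -> {background:.4f}")  # about -0.4242
print(f"stroke     -> {stroke:.4f}")      # about 2.8215
```

After this step the mostly black background sits slightly below zero and the pen strokes take large positive values, so the inputs are roughly zero-centred with unit variance.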

### Apply the CNN (LeNet-5) model
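We define the LeNet-5 model, which consists of two convolutional layers followed by three fully connected layers. The baseline version uses `Tanh` as the activation function, `MaxPool2d` for pooling, and `Softmax` in the final layer for classification; the code in this section implements a modified version, with the changes listed after the diagram of the original model.

Note:
- Each convolutional block has 3 parts: convolution, non-linear activation function, pooling
- The LeNet-5 architecture is usually described as having 7 layers: 3 convolutional, 2 subsampling (pooling) and 2 fully connected layers. The third convolution applies a 5x5 kernel to a 5x5 input, so it is equivalent to a fully connected layer and is implemented here as `nn.Linear(400, 120)`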

![LeNet-5](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*fDvp2DDqNMPEUkmp6kijJw.jpeg)

#### Original Model:
<img src=https://miro.medium.com/v2/resize:fit:1100/format:webp/1*zxrGm9YBq__CPE3EUZKDOQ.jpeg width="700">

After some modifications to the model, I improved its accuracy from **~95%** to **~99%**. The modifications are:
- Use `ReLU` as the activation function for layers 1-6
- Remove `Softmax` from the output layer

The code in this section also applies `BatchNorm2d` after each convolution.

#### After Modifications:
<img src=https://i.imgur.com/Hwgffby.jpeg width="700">

*Note: If the image does not load, it can be viewed at i.imgur.com/Hwgffby.jpeg*

In [3]:
#Defining the convolutional neural network
class LeNet5(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=0),
            nn.BatchNorm2d(6),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(400, 120)
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(120, 84)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(84, num_classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        out = self.relu(out)
        out = self.fc1(out)
        out = self.relu1(out)
        out = self.fc2(out)
        return out
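
The `400` input features of the first linear layer follow from the layer sizes above. As a rough check, the sketch below traces the spatial size through both convolution and pooling stages using the standard output-size formula `(size + 2 * padding - kernel) // stride + 1`. The helper names `out_size` and `flat_features` are hypothetical and only illustrate the arithmetic.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def flat_features(input_size, channels=16):
    """Number of features entering the first linear layer of the model above."""
    size = out_size(input_size, kernel=5)      # conv 5x5, stride 1
    size = out_size(size, kernel=2, stride=2)  # max pool 2x2
    size = out_size(size, kernel=5)            # conv 5x5, stride 1
    size = out_size(size, kernel=2, stride=2)  # max pool 2x2
    return channels * size * size

print(flat_features(32))  # 400: 32 -> 28 -> 14 -> 10 -> 5, then 16 * 5 * 5
print(flat_features(28))  # 256: 28 -> 24 -> 12 -> 8 -> 4, then 16 * 4 * 4
```

This is why the images are resized to 32x32: feeding the native 28x28 images would produce 256 features and `nn.Linear(400, 120)` would raise a shape error.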

### Model, Loss Function, and Optimizer
We instantiate the model, define the loss function as `CrossEntropyLoss`, and use the `Adam` optimizer.

In [4]:
model = LeNet5(num_classes).to(device)

#Setting the loss function
cost = nn.CrossEntropyLoss()

#Setting the optimizer with the model parameters and learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

#Number of batches per epoch, used to average the loss during training
total_step = len(train_loader)
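
`CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally, which is why the model's `forward` returns the output of the last linear layer directly. The sketch below reproduces the per-sample computation in plain Python; the function name `cross_entropy_from_logits` is hypothetical, and it handles a single sample, whereas the PyTorch loss averages over the batch by default.

```python
import math

def cross_entropy_from_logits(logits, label):
    """Per-sample cross-entropy: -log(softmax(logits)[label]).

    Uses the log-sum-exp trick (subtract the max) to avoid overflow.
    """
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

logits = [2.0, 0.5, -1.0]
loss_correct = cross_entropy_from_logits(logits, label=0)  # label matches the largest logit
loss_wrong = cross_entropy_from_logits(logits, label=2)    # label matches the smallest logit

print(f"loss when class 0 is correct: {loss_correct:.4f}")
print(f"loss when class 2 is correct: {loss_wrong:.4f}")
```

The loss is small when the largest logit belongs to the true class and grows as the true class's logit falls behind the others.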

### Training the Model
We train the model for the specified number of epochs. During each epoch, we:
- Forward pass: Compute the model’s predictions.
- Compute the loss.
- Backward pass: Compute gradients.
- Update the model parameters using the optimizer.
- Track the running loss and accuracy.

In [5]:
total_step = len(train_loader)

for epoch in range(num_epochs):
    model.train()  # make sure BatchNorm layers are in training mode
    correct = 0
    total = 0
    running_loss = 0.0

    for i, (images, labels) in enumerate(train_loader):  
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = cost(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Track loss
        running_loss += loss.item()
        
        # Track accuracy
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    # Calculate average loss and accuracy
    avg_loss = running_loss / total_step
    accuracy = 100 * correct / total
    
    print('Epoch [{}/{}], Loss: {:.4f}, Accuracy: {:.2f}%'
          .format(epoch+1, num_epochs, avg_loss, accuracy))

Epoch [1/20], Loss: 0.1480, Accuracy: 95.53%
Epoch [2/20], Loss: 0.0666, Accuracy: 98.06%
Epoch [3/20], Loss: 0.0546, Accuracy: 98.44%
Epoch [4/20], Loss: 0.0515, Accuracy: 98.58%
Epoch [5/20], Loss: 0.0452, Accuracy: 98.75%
Epoch [6/20], Loss: 0.0459, Accuracy: 98.78%
Epoch [7/20], Loss: 0.0387, Accuracy: 98.95%
Epoch [8/20], Loss: 0.0362, Accuracy: 99.04%
Epoch [9/20], Loss: 0.0358, Accuracy: 99.06%
Epoch [10/20], Loss: 0.0345, Accuracy: 99.09%
Epoch [11/20], Loss: 0.0333, Accuracy: 99.13%
Epoch [12/20], Loss: 0.0299, Accuracy: 99.22%
Epoch [13/20], Loss: 0.0294, Accuracy: 99.25%
Epoch [14/20], Loss: 0.0332, Accuracy: 99.23%
Epoch [15/20], Loss: 0.0245, Accuracy: 99.37%
Epoch [16/20], Loss: 0.0292, Accuracy: 99.35%
Epoch [17/20], Loss: 0.0268, Accuracy: 99.36%
Epoch [18/20], Loss: 0.0267, Accuracy: 99.35%
Epoch [19/20], Loss: 0.0276, Accuracy: 99.36%
Epoch [20/20], Loss: 0.0233, Accuracy: 99.45%
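
Note that the printed loss is averaged over batches (`running_loss / total_step`), while the accuracy is averaged over samples (`correct / total`). The sketch below works out how many batches that is, assuming the standard MNIST split of 60,000 training and 10,000 test images and the `DataLoader` default of keeping the last partial batch.

```python
import math

batch_size = 64
train_images, test_images = 60_000, 10_000  # standard MNIST split sizes

train_steps = math.ceil(train_images / batch_size)  # batches per training epoch
test_steps = math.ceil(test_images / batch_size)    # batches per test pass
last_train_batch = train_images - (train_steps - 1) * batch_size

print(train_steps)       # 938
print(test_steps)        # 157
print(last_train_batch)  # 32 images in the final, partial training batch
```

Because the final batch holds only 32 images, it carries the same weight in the averaged loss as a full batch of 64; with 938 batches the effect on the reported value is negligible.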


### Testing the Model
We evaluate the model on the test dataset without computing gradients to save memory. We calculate the accuracy of the model on the test images.

In [6]:
# Test the model
# Switch to evaluation mode so that BatchNorm uses its running statistics,
# and disable gradient computation (for memory efficiency)
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

Accuracy of the network on the 10000 test images: 98.85 %
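
The accuracy computation above takes the index of the largest logit in each row as the predicted class and compares it with the label. A minimal plain-Python sketch of the same logic, using a made-up batch of three-class logits (the `accuracy` helper and the sample values are illustrative only):

```python
def accuracy(batch_logits, labels):
    """Percentage of rows whose largest logit sits at the labelled index."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return 100 * correct / len(labels)

batch_logits = [
    [0.1, 2.3, -1.0],  # predicted class 1
    [1.5, 0.2, 0.3],   # predicted class 0
    [-0.5, 0.0, 0.4],  # predicted class 2
    [0.9, 1.1, 0.2],   # predicted class 1
]
labels = [1, 0, 1, 1]

print(accuracy(batch_logits, labels))  # 75.0 (three of four predictions match)
```

No softmax is needed here: softmax preserves the ordering of the logits, so the index of the maximum is the same before and after it.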


### Results
With the modified LeNet-5 model, an accuracy of 98.85% is achieved on the MNIST test dataset.

For comparison:
- KNN model (k=3) accuracy: 96.33%
- MLP model (n=128) accuracy: 96.75%

### Discussion
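1. Activation function:
- ReLU outputs zero for negative inputs and the input itself for positive inputs, so its gradient is 1 for every positive input. This helps maintain gradient flow during backpropagation and typically makes CNN training faster and more effective.
- Tanh outputs values between -1 and 1 and saturates for large positive or negative inputs, where its gradient approaches zero. This can slow down the training process.

2. Softmax:
- `CrossEntropyLoss` already applies log-softmax to its input, so keeping an explicit `Softmax` in the output layer applies softmax twice. The second softmax only sees values between 0 and 1, which flattens the predicted distribution and shrinks the gradients, and this can negatively impact the model's performance.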
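
Both points can be checked numerically. The sketch below is plain Python with hypothetical helper names: it compares the derivatives of tanh and ReLU at a few inputs, then applies softmax once and twice to the same made-up logits.

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which shrinks toward 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Gradient comparison: tanh saturates, ReLU does not
for x in (0.5, 2.0, 5.0):
    print(f"x={x}: tanh'={tanh_grad(x):.5f}, relu'={relu_grad(x):.1f}")

# Softmax applied once versus twice to the same logits
logits = [4.0, 1.0, 0.0]
once = softmax(logits)
twice = softmax(once)
print("softmax once :", [round(p, 3) for p in once])
print("softmax twice:", [round(p, 3) for p in twice])
```

The tanh gradient falls well below 0.001 at x = 5 while the ReLU gradient stays at 1. After the second softmax the confident prediction is pulled toward a uniform distribution; with 10 classes its top probability cannot exceed e / (e + 9), roughly 0.23, so the loss can never approach zero.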

### Conclusion
The use of ReLU activation functions and the reliance on `CrossEntropyLoss` to handle Softmax contribute to the modified model's better performance, raising accuracy from ~95% to ~99% (98.85% on the test set).