# Feedforward classification of the NMIST data
### Advanced Deep Learning 2024
This notebook was written originally Jon Sporring (mailto:sporring@di.ku.dk) and heavily inspired by https://clay-atlas.com/us/blog/2021/04/22/pytorch-en-tutorial-4-train-a-model-to-classify-mnist.

We consider the Modified National Institute of Standards and Technology database of handwritten digits (MNIST): http://yann.lecun.com/exdb/mnist/

## Installs

On non-colab system, is usually good to make an environment and install necessary tools there. E.g., anaconda->jupyter->terminal create an environment, if you have not already, and activate it:
```
conda create -n adl python=3.9
conda activate adl
```
then install missing packages such as:
```
conda install ipykernel torch matplotlib torchmetrics scikit-image jpeg
conda install -c conda-forge segmentation-models-pytorch ipywidgets
```
and if you want to add it to jupyter's drop-down menu
```
ipython kernel install --user --name=adl
```
Now reload the jupyter-notebook's homepage and make a new or load an existing file. On colab, the tools have to be installed everytime

In [19]:
try:
  import google.colab
  IN_COLAB = True
except:
  IN_COLAB = False
if IN_COLAB:
    !pip3 install torch matplotlib torchmetrics scikit-image segmentation-models-pytorch

## Imports

In [20]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as dset
from torchvision import datasets, transforms

## Set global device

In [21]:
# GPU
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
print('GPU State:', device)

GPU State: cuda:0


## Functions

In [22]:
def training_loop(model, loss, optimizer, loader, epochs, verbose=True, device=device):
    """
    Run training of a model given a loss function, optimizer and a set of training and validation data.
    """

    # Train
    for epoch in range(epochs):
        running_loss = 0.0

        for times, data in enumerate(loader):
            inputs, labels = data[0].to(device), data[1].to(device)
            # inputs = inputs.view(inputs.shape[0], -1)

            # Zero the parameter gradients
            optimizer.zero_grad()

            # Foward + backward + optimize
            outputs = model(inputs)
            loss_tensor = loss(outputs, labels)
            loss_tensor.backward()
            optimizer.step()

            # Print statistics
            running_loss += loss_tensor.item()
            if verbose:
                if times % 100 == 99 or times+1 == len(loader):
                    print('[%d/%d, %d/%d] loss: %.3f' % (epoch+1, epochs, times+1, len(loader), running_loss/2000))

In [23]:
def evaluate_model(model, loader, device=device):
    """
    Evaluate a model 'model' on all batches of a torch DataLoader 'data_loader'.

    Returns: the total number of correct classifications,
             the total number of images
             the list of the per class correct classification,
             the list of the per class total number of images.
    """

    # Test
    correct = 0
    total = 0

    with torch.no_grad():
        for data in loader:
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            # inputs = inputs.view(inputs.shape[0], -1)

            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    class_correct = [0 for i in range(10)]
    class_total = [0 for i in range(10)]

    with torch.no_grad():
        for data in loader:
            inputs, labels = data[0].to(device), data[1].to(device)
            # inputs = inputs.view(inputs.shape[0], -1)

            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            c = (predicted == labels).squeeze()
            for i in range(10):
                label = labels[i]
                class_correct[label] += c[i].item()
                class_total[label] += 1

    return (correct, total, class_correct, class_total)


## Main program

In [24]:
# Transform
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,)),]
)

In [25]:
# Data
trainSet = datasets.MNIST(root='MNIST', download=True, train=True, transform=transform)
testSet = datasets.MNIST(root='MNIST', download=True, train=False, transform=transform)
trainLoader = dset.DataLoader(trainSet, batch_size=64, shuffle=True)
testLoader = dset.DataLoader(testSet, batch_size=64, shuffle=False)

In [35]:
trainSet[0][0].shape

torch.Size([1, 28, 28])

In [26]:
# Model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size =(3, 3), stride =(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size =2, stride=2, padding=0, dilation=1, ceil_mode=False),
            nn.Conv2d (16, 32, kernel_size =(3, 3), stride =(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size =2, stride=2, padding=0, dilation=1, ceil_mode=False),
            nn.Flatten(start_dim=1, end_dim =-1),
            nn.Linear(800, out_features =10, bias=True),
            nn.LogSoftmax(dim=1)
        )

    def forward(self, input):
        return self.main(input)


net = Net().to(device)
print(net)

Net(
  (main): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
    (4): ReLU()
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=800, out_features=10, bias=True)
    (8): LogSoftmax(dim=1)
  )
)


In [27]:
# Parameters
epochs = 4
lr = 0.002
loss = nn.NLLLoss()
optimizer = optim.SGD(net.parameters(), lr=0.002, momentum=0.9)

# Train
print('Training on %d images' % trainSet.data.shape[0])
training_loop(net, loss, optimizer, trainLoader, epochs)
print('Training Finished.\n')

# Test
correct, total, class_correct, class_total = evaluate_model(net, testLoader)
print('Accuracy of the network on the %d test images: %d %%' % (testSet.data.shape[0], (100*correct / total)))
for i in range(10):

    print('Accuracy of %d: %3f' % (i, (class_correct[i]/class_total[i])))

Training on 60000 images


[1/4, 100/938] loss: 0.095
[1/4, 200/938] loss: 0.125
[1/4, 300/938] loss: 0.143
[1/4, 400/938] loss: 0.157
[1/4, 500/938] loss: 0.170
[1/4, 600/938] loss: 0.181
[1/4, 700/938] loss: 0.192
[1/4, 800/938] loss: 0.201
[1/4, 900/938] loss: 0.210
[1/4, 938/938] loss: 0.213
[2/4, 100/938] loss: 0.008
[2/4, 200/938] loss: 0.015
[2/4, 300/938] loss: 0.022
[2/4, 400/938] loss: 0.029
[2/4, 500/938] loss: 0.035
[2/4, 600/938] loss: 0.042
[2/4, 700/938] loss: 0.048
[2/4, 800/938] loss: 0.054
[2/4, 900/938] loss: 0.060
[2/4, 938/938] loss: 0.062
[3/4, 100/938] loss: 0.005
[3/4, 200/938] loss: 0.010
[3/4, 300/938] loss: 0.015
[3/4, 400/938] loss: 0.021
[3/4, 500/938] loss: 0.026
[3/4, 600/938] loss: 0.030
[3/4, 700/938] loss: 0.035
[3/4, 800/938] loss: 0.039
[3/4, 900/938] loss: 0.044
[3/4, 938/938] loss: 0.045
[4/4, 100/938] loss: 0.004
[4/4, 200/938] loss: 0.008
[4/4, 300/938] loss: 0.013
[4/4, 400/938] loss: 0.017
[4/4, 500/938] loss: 0.020
[4/4, 600/938] loss: 0.024
[4/4, 700/938] loss: 0.028
[

Accuracy of the network on the 10000 test images: 97 %   
Accuracy of 0: 0.993421   
Accuracy of 1: 0.983784   
Accuracy of 2: 0.988372   
Accuracy of 3: 0.961538   
Accuracy of 4: 0.988701   
Accuracy of 5: 0.976190   
Accuracy of 6: 0.968750   
Accuracy of 7: 0.945122   
Accuracy of 8: 0.965035   
Accuracy of 9: 0.976048   

- Why must the Linear layer be preceded by Flatten?   
    A: The outputs of the last convolutional layer are usually not an one dimensional tensor,   
    but a three dimensional tensor. However, the Linear layer only takes an one dimensional tensor as an input. 
- Why does it have 800 input features?  
    A: All the calculations are accorded to the equations in https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html and https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html.  
    Input image shape: 1x28x28  
    After the first convolutional layer: 16x26x26  
    After the first pooling layer: 16x13x13  
    After the second convolutional layer: 32x11x11  
    After the second pooling layer: 32x5x5  
    Flattened: 32x5x5 = 800  
    The flattened tensor has 800 elements, therefore, the input of the linear layer has 800 features.  
- How well does it classify digits as compared to the feed-forward network?   
    Better  
- What are its number parameters relative to the feed-forward network?  
    The formula to calculate the number of parameters in a convolutional layer is:  
    Number of Parameters=(Kernel Height×Kernel Width×Input Channels+1)×Output Channels  
    The number of parameters:  
    (3*3*1+1)*16+(3*3*16+1)*32+800*10+10 = 12810, fewer than that of the feed-forward network.


In [36]:
print('Parameters:', sum(p.numel() for p in net.parameters()))



Parameters: 12810
