<a href="https://colab.research.google.com/github/Angus-Eastell/Intro_to_AI/blob/main/9_2_cnns_for_mnist.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convolutional neural networks for MNIST

Below, I've coded up an utterly dreadful CNN for MNIST.  It's really, really bad (45% accuracy after 5 epochs).

You should try to improve the model!  Here are some key suggestions:
* Try a deeper network with more convolutional layers.
* Try some padding.
* Try some layers with `stride=2`.
* Try some pooling layers.
* Try changing the optimizer to Adam.
* Try tuning the learning rate.
* Try taking inspiration from the LeNet-5 architecture in the notes.
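
As one hedged illustration of how a few of these suggestions combine, the sketch below stacks padded convolutions, a `stride=2` convolution and a max-pooling layer, then checks the output shape on a dummy batch. The channel counts, the layer order, and the names `sketch_model` and `sketch_opt` are assumptions chosen for illustration; this is not a tuned architecture and not the LeNet-5 from the notes.

```python
import torch
import torch.nn as nn

# Illustrative only: the layer sizes below are assumptions, not a tuned architecture.
sketch_model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),             # [N, 16, 28, 28]; padding keeps H, W
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # [N, 32, 14, 14]; stride=2 halves H, W
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                            # [N, 32, 7, 7]
    nn.Conv2d(32, 10, kernel_size=3, padding=1),            # [N, 10, 7, 7]
    nn.AdaptiveAvgPool2d(1),                                # [N, 10, 1, 1]
)

dummy = torch.zeros(4, 1, 28, 28)        # a fake batch of 4 MNIST-sized images
logits = sketch_model(dummy).flatten(1)  # [4, 10]
print(logits.shape)

# Switching to Adam with a candidate learning rate is a one-line change.
sketch_opt = torch.optim.Adam(sketch_model.parameters(), lr=1e-3)
```

Running a dummy batch through the model like this is a cheap way to catch shape mistakes before starting a training run.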

Other points:
* I'd advise keeping the final `nn.AdaptiveAvgPool2d(1)` layer.
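* If your model is too deep (especially with many `stride=2` or pooling layers), you might find it failing: each such layer roughly halves the feature map, and a 28x28 MNIST image can only be halved a few times before it becomes smaller than the kernel.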
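
The last point comes down to shape arithmetic. For a convolution or pooling layer with kernel size `k`, stride `s` and padding `p` (and dilation 1), the PyTorch docs give the output size as `floor((H + 2p - k) / s) + 1`. The helper below applies that formula to a 28-pixel-wide image; `out_size` is a hypothetical name for this sketch, not a PyTorch function, and the choice of four unpadded 3x3 `stride=2` layers is just an assumption for illustration.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (dilation 1), following the PyTorch docs formula."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28  # MNIST images are 28 x 28
sizes = [size]
for _ in range(4):
    # One 3x3 layer with stride=2 and no padding
    size = out_size(size, kernel_size=3, stride=2)
    sizes.append(size)

print(sizes)  # [28, 13, 6, 2, 0]
```

By the fourth layer the 2x2 feature map is smaller than the 3x3 kernel, so the computed size drops to 0; in practice PyTorch refuses to apply such a layer and raises an error instead.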

You might well want to check out the docs for:
* [Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
* [MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#maxpool2d)
* [AvgPool2d](https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html#avgpool2d)

The notebook will be faster on GPU, but it's still perfectly fine on CPU.  Remember, to switch to GPU, go to "Runtime" -> "Change Runtime Type".

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
torch.manual_seed(0)

# Check whether we have a GPU.  Use it if we do.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')



# MNIST train and test datasets.  I'm not going to talk about these in this course;
# you should just be able to follow "recipes" online.
train_dataset = torchvision.datasets.MNIST(root='data',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='data',
                                          train=False,
                                          transform=transforms.ToTensor())

# MNIST train and test data loaders.  I'm not going to talk about these in this course.
# However, note that I'm using a much bigger batch size at test-time.  That's
# because at training time, we have to backprop, so we have to save all the
# intermediate variables, which takes a lot of memory.  We don't have to do that
# at test-time.
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=100,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=1000,
                                          shuffle=False)




In [None]:
############################################
#### Tweak the network architecture !!! ####
############################################
input_size = 784
hidden_size = 500
num_classes = 10

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=100, kernel_size=5, padding=2),  # Output shape is [N, 100, 28, 28]
    nn.ReLU(),  # Output shape is [N, 100, 28, 28]
    nn.BatchNorm2d(100),  # num_features = 100
    nn.MaxPool2d(kernel_size=2, stride=2, padding=0),  # Output shape is [N, 100, 14, 14]
    nn.Conv2d(in_channels=100, out_channels=100, kernel_size=5),  # Output shape is [N, 100, 10, 10] (no padding)
    nn.ReLU(),
    nn.BatchNorm2d(100),
    nn.Conv2d(in_channels=100, out_channels=100, kernel_size=5, stride=1, padding=1),  # Output shape is [N, 100, 8, 8]
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # Output shape is [N, 100, 4, 4]
    nn.Conv2d(in_channels=100, out_channels=10, kernel_size=3, padding=1),  # Output shape is [N, 10, 4, 4]
    nn.BatchNorm2d(10),  # num_features = 10
    nn.AdaptiveAvgPool2d(1)  # Output shape is [N, 10, 1, 1]; does global average pooling
).to(device)

#################################
#### Tweak the optimizer !!! ####
#################################
opt = torch.optim.Adam(model.parameters(), lr=0.001)

def train():
    # Does one training epoch (i.e. one pass over the data.)
    for images, labels in train_loader:
        # Move tensors to the configured device.
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        logits = model(images).squeeze((-1, -2)) #output shape is [N, 10]

        # Backpropagation and optimization
        loss = nn.functional.cross_entropy(logits, labels)
        loss.backward()
        opt.step()
        opt.zero_grad()

def test(epoch):
    # Do one pass over the test data.
    # Use eval mode so BatchNorm normalizes with its running statistics
    # rather than the statistics of each test batch.
    model.eval()
    # In the test phase, we don't need to compute gradients (for memory efficiency)
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in test_loader:
            # Move tensors to the configured device
            images = images.to(device)
            labels = labels.to(device)

            # Forward pass
            logits = model(images).squeeze((-1, -2))

            # Compute total correct so far
            predicted = torch.argmax(logits, -1)
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
        print(f'Test accuracy after {epoch+1} epochs: {100 * correct / total} %')
    # Switch back to training mode for the next epoch.
    model.train()


# Run training
for epoch in range(5):
    train()
    test(epoch)


Test accuracy after 1 epochs: 99.34 %
Test accuracy after 2 epochs: 99.43 %
Test accuracy after 3 epochs: 99.54 %
Test accuracy after 4 epochs: 99.5 %
Test accuracy after 5 epochs: 99.54 %
