Edited by: Ali Jaabous [612979], Daniel Christoph [589721]

In this part of the exercise we repeat the experiment from the first part, this time using PyTorch.

In [1]:
# load packages
import numpy as np

import torch
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

In [2]:
# load datasets
train_data = MNIST(root="./", train=True, transform=None, target_transform=None, download=True)
test_data = MNIST(root="./", train=False, transform=None, target_transform=None, download=True)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:01<00:00, 9637620.37it/s] 


Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 12537331.18it/s]

Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz



100%|██████████| 1648877/1648877 [00:00<00:00, 9952841.84it/s]


Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 19864993.50it/s]

Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw






First we want to experiment with torch's nn module, which provides the building blocks for the layers of a neural network.
Refer to this tutorial for more info: https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html

Instantiate a convolutional layer with kernel size 3, stride 1 and zero padding of 0. nn modules accept torch tensors instead of numpy arrays as inputs; these can easily be constructed with torch.tensor() or torch.from_numpy(). Note that the torch conv layer (unlike the one we built in the previous exercise) requires the number of input and output channels to be specified as well.
Since MNIST images are grayscale and stored without a channel axis, we need to add a channel dimension of size 1 to our tensors; this can be done with the unsqueeze method.

After the instantiation, create a dummy input and forward it through the conv layer.
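As a minimal sketch of the conversion described above, the snippet below turns a numpy array into a tensor and adds the missing dimensions with unsqueeze. The random array is only a stand-in for one 28x28 MNIST image and is an assumption made for illustration.

```python
import numpy as np
import torch

# stand-in for one 28x28 grayscale MNIST image (hypothetical random data)
image = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# from_numpy shares memory with the array; .float() creates a float32 copy
tensor = torch.from_numpy(image).float()
print(tensor.shape)  # torch.Size([28, 28])

# add the channel dimension, then the batch dimension
tensor = tensor.unsqueeze(0).unsqueeze(0)
print(tensor.shape)  # torch.Size([1, 1, 28, 28])
```

The resulting layout (batch_size, channels, height, width) is the one nn.Conv2d expects.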

In [3]:
# build and test Conv2d here
import torch.nn as nn

# instantiate the convolutional layer (1 input channel, 1 output channel)
conv_layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)

# dummy input
dummy_input = torch.tensor([[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]], dtype=torch.float32)

# application
output = conv_layer(dummy_input)

# output
print(output)

tensor([[[[0.2917]]]], grad_fn=<ConvolutionBackward0>)


Your next task is to rebuild the simple_cnn from the previous exercise. This time, add a MaxPooling "layer" with kernel size 2 and stride 2 after the ReLU activation (https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html). Set the kernel size of the conv layer to 3, the stride to 1 and the padding to 0. Calculate the required input dimension of the fully connected (also called linear) layer, given that the input has shape (batch_size, 1, 28, 28). You can do that by hand or simply forward a dummy input through the convolution and MaxPooling layers.
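The calculation by hand can be sketched as follows. For dilation 1 and the default floor rounding, both the convolution and the pooling window produce floor((size + 2 * padding - kernel_size) / stride) + 1 outputs along each spatial axis. The helper name conv_output_size is hypothetical and not part of PyTorch.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output size along one spatial axis of a convolution or pooling window."""
    return (size + 2 * padding - kernel_size) // stride + 1


after_conv = conv_output_size(28, kernel_size=3, stride=1, padding=0)  # 26
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)     # 13
out_channels = 1
fc_inputs = out_channels * after_pool * after_pool                     # 169
print(after_conv, after_pool, fc_inputs)
```

With one output channel, the flattened feature vector therefore has 1 * 13 * 13 = 169 entries.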

In [6]:
class simple_cnn(torch.nn.Module):
    def __init__(self):
        super(simple_cnn, self).__init__()

        # Definition of the different layers
        self.conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # output (batch_size, 1, 13, 13) -> 169 features after flattening
        self.fc = nn.Linear(169, 10)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)  # flatten to (batch_size, 169)
        x = self.fc(x)
        x = self.softmax(x)
        return x

In [7]:
# instantiate the CNN
cnn = simple_cnn()

# load the first training image and add batch and channel dimensions -> (1, 1, 28, 28)
sample = train_data[0][0]
sample = np.array(sample)
sample = torch.tensor(sample).float().unsqueeze(0).unsqueeze(0)

# forward, check output dimensions
output = cnn(sample)
print(output)

tensor([[1.3106e-12, 6.5234e-27, 9.1693e-05, 4.3099e-21, 1.1046e-26, 5.9528e-18,
         2.3054e-19, 3.5924e-25, 1.1400e-15, 9.9991e-01]],
       grad_fn=<SoftmaxBackward0>)


In [8]:
# we already met this function
def label_one_hot_encoding(x, dim=10):
    output = np.zeros(dim)
    output[x] = 1
    return output
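For a whole batch of labels, the same encoding can be produced in one step by indexing an identity matrix. This is only a sketch; the helper name batch_one_hot is hypothetical and not part of the exercise.

```python
import numpy as np


def batch_one_hot(labels, dim=10):
    """Hypothetical helper: one-hot encode a batch of integer labels."""
    labels = np.asarray(labels)
    # row i of the identity matrix is the one-hot vector for class i
    return np.eye(dim)[labels]


encoded = batch_one_hot([3, 0, 9])
print(encoded.shape)  # (3, 10)
```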

Now we want to train our simple_cnn. Similar to the previous exercise, build a training loop, but use the PyTorch SGD optimizer to update the weights (we no longer need to calculate the gradients manually). Try out different learning rates and monitor the loss to identify convergence.

Refer to this tutorial: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#
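One possible way to monitor the loss is to record it at every step and compare the mean over the two most recent windows. The sketch below is a heuristic, not a prescribed method: the helper names moving_average and has_converged, the window size and the tolerance are all assumptions, and the loss history is made up for illustration.

```python
def moving_average(values, window):
    """Mean of the last `window` values (or of all values if fewer exist)."""
    recent = values[-window:]
    return sum(recent) / len(recent)


def has_converged(losses, window=100, tol=1e-3):
    """Heuristic: the mean loss of the last two windows differs by less than tol."""
    if len(losses) < 2 * window:
        return False
    previous = moving_average(losses[:-window], window)
    current = moving_average(losses, window)
    return abs(previous - current) < tol


# hypothetical loss history: falls quickly, then flattens out
losses = [1.0 / (step + 1) for step in range(50)] + [0.02] * 200
still_falling = has_converged(losses[:60], window=20)
flattened = has_converged(losses, window=20)
print(still_falling, flattened)
```

In a training loop, the history would be filled with losses.append(loss.item()) after each optimizer step.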

In [9]:
cnn = simple_cnn()
batch_size = 15
learning_rate = 0.3

loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(cnn.parameters(), lr = learning_rate)

num_samples = len(train_data)
num_epochs = 4

for epoch in range(num_epochs):
    for i in range(num_samples // batch_size):
        # load and build input and target tensors here
        start_idx = i * batch_size
        end_idx = (i + 1) * batch_size

        images_batch = []
        labels_batch = []

        for j in range(start_idx, end_idx):
            img, label = train_data[j]
            img = np.array(img) / 255.0  # scale pixel values to [0, 1]
            images_batch.append(img)
            labels_batch.append(label_one_hot_encoding(label))

        input_batch = np.stack(images_batch)
        target_batch = np.stack(labels_batch)

        input_batch_tensor = torch.tensor(input_batch).float()
        input_batch_tensor = input_batch_tensor.view(-1, 1, 28, 28) # -> reshape to (batch_size, 1, 28, 28)
        target_batch_tensor = torch.tensor(target_batch).float()
        # perform forward / obtain prediction
        pred = cnn(input_batch_tensor)

        # calculate loss
        loss = loss_fn(pred, target_batch_tensor)

        # perform backprop and gradient descent
        optimizer.zero_grad() # zeros gradients from previous step
        loss.backward()
        optimizer.step()
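The manual index arithmetic in the loop above can also be delegated to the DataLoader class that the notebook imports but does not use. The sketch below uses a synthetic TensorDataset as a stand-in for MNIST, which is an assumption made so the example runs without a download. With the real dataset, the images would first have to be converted to tensors, for example via a transform such as torchvision.transforms.ToTensor().

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# synthetic stand-in for MNIST: 60 random "images" with random labels
images = torch.rand(60, 1, 28, 28)
labels = torch.randint(0, 10, (60,))
dataset = TensorDataset(images, labels)

# the loader handles batching and (optionally) shuffling
loader = DataLoader(dataset, batch_size=15, shuffle=True)

batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(len(batch_shapes))  # 4 batches of 15 samples each
print(batch_shapes[0])    # ((15, 1, 28, 28), (15,))
```

Shuffling each epoch changes the order in which samples reach the optimizer, which the fixed index loop does not do.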


Finally, as before, compute the accuracy on the test data. Try to train multiple models with different specs and learning rates in order to achieve the highest accuracy.
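The accuracy is the fraction of samples whose predicted class (the argmax over the output vector) matches the label. As a small sketch on made-up predictions, it can be computed for a whole batch at once; the helper name batch_accuracy and the example values are assumptions.

```python
import torch


def batch_accuracy(pred, labels):
    """Fraction of rows in `pred` whose argmax matches the integer label."""
    pred_labels = torch.argmax(pred, dim=1)
    return (pred_labels == labels).float().mean().item()


# hypothetical two-class predictions for four samples
pred = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = torch.tensor([1, 0, 0, 0])
print(batch_accuracy(pred, labels))  # 0.75
```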

In [10]:
# set model to eval mode; this disables dropout layers etc.
# Not relevant for our model, but still good practice.
cnn.eval()

num_test_samples = len(test_data)
correct_predictions = 0
# no gradients are needed for evaluation
with torch.no_grad():
    for i in range(num_test_samples):
        # load one test image and scale pixel values to [0, 1]
        img, label = test_data[i]
        img = np.array(img) / 255.0

        # build the input tensor of shape (1, 1, 28, 28)
        input_tensor = torch.tensor(img).float().view(-1, 1, 28, 28)

        # forward
        pred = cnn(input_tensor)
        pred_label = torch.argmax(pred, dim=1).item()
        print(f"Predicted: {pred_label} -> Actual {label}")

        # update the count of correct predictions
        if pred_label == label:
            correct_predictions += 1
        # running value: correct so far / total number of test samples,
        # so it only equals the accuracy after the last sample
        accuracy = correct_predictions / num_test_samples
        print(f"Accuracy: {accuracy}")
# print accuracy
print(f"Final Accuracy: {accuracy}")

Predicted: 7 -> Actual 7
Accuracy: 0.0001
Predicted: 2 -> Actual 2
Accuracy: 0.0002
Predicted: 1 -> Actual 1
Accuracy: 0.0003
Predicted: 0 -> Actual 0
Accuracy: 0.0004
Predicted: 4 -> Actual 4
Accuracy: 0.0005
Predicted: 1 -> Actual 1
Accuracy: 0.0006
Predicted: 4 -> Actual 4
Accuracy: 0.0007
Predicted: 9 -> Actual 9
Accuracy: 0.0008
Predicted: 6 -> Actual 5
Accuracy: 0.0008
Predicted: 9 -> Actual 9
Accuracy: 0.0009
Predicted: 0 -> Actual 0
Accuracy: 0.001
Predicted: 6 -> Actual 6
Accuracy: 0.0011
Predicted: 9 -> Actual 9
Accuracy: 0.0012
Predicted: 0 -> Actual 0
Accuracy: 0.0013
Predicted: 1 -> Actual 1
Accuracy: 0.0014
Predicted: 5 -> Actual 5
Accuracy: 0.0015
Predicted: 9 -> Actual 9
Accuracy: 0.0016
Predicted: 7 -> Actual 7
Accuracy: 0.0017
Predicted: 3 -> Actual 3
Accuracy: 0.0018
Predicted: 4 -> Actual 4
Accuracy: 0.0019
Predicted: 9 -> Actual 9
Accuracy: 0.002
Predicted: 6 -> Actual 6
Accuracy: 0.0021
Predicted: 6 -> Actual 6
Accuracy: 0.0022
Predicted: 5 -> Actual 5
Accuracy: 0