In this self study you should experiment with convolutional neural networks using PyTorch. In the last self study session we only made limited use of PyTorch (only using it for calculating gradients), but in this self study we will take advantage of much more of its functionality.

In particular, we will work with the _torch.nn_ module provided by PyTorch. A short introduction to this module and how to define neural networks in PyTorch can be found at

* https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html#sphx-glr-beginner-blitz-neural-networks-tutorial-py

* https://pytorch.org/tutorials/beginner/nn_tutorial.html

For this self study you may either go through these tutorials before working on the notebook or consult themt when needed as you move forward in the notebook. The former tutorial is part of a general tutorial package to PyTorch, which can be found at (this also includes a nice introduction to tensors in PyTorch)

* https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

First we import relevant modules:

In [26]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data.sampler import SubsetRandomSampler
from matplotlib import pyplot as plt
import numpy as np

## Loading the data

As last time we will be working with the MNIST data set: The MNIST database consists of grey scale images of handwritten digits. Each image is of size $28\times 28$; see figure below for an illustration. The data set is divided into a training set consisting of $60000$ images and a test set with $10000$ images; in both
data sets the images are labeled with the correct digits. If interested you can find more information about the MNIST data set at http://yann.lecun.com/exdb/mnist/, including accuracy results for various machine learning methods.

![MNIST DATA](images/MNIST-dataset.png)

For this self study, we will be a bit more careful with our data. Specifically, we will divide the data into a training, validation, and test, and use the training and validation set for model learning (in the previous self study we did not have a validation set). 

The data set is created by setting aside a randomly chosen subset of the data, where the splitting point is found using the help function *split_indicies* below.

In [2]:
def split_indicies(n, val_pct):
    # Size of validation set
    n_val = int(n*val_pct)
    # Random permutation
    idxs = np.random.permutation(n)
    # Return first indexes for the validation set
    return idxs[n_val:], idxs[:n_val]

# Load the data
train_dataset = datasets.MNIST('../data', train=True, download=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ]))

# Get the indicies for the training data and test data (the validation set will consists of 20% of the data)
train_idxs, val_idxs = split_indicies(len(train_dataset), 0.2)

# Define samplers (used by Dataloader) to the two sets of indicies
train_sampler = SubsetRandomSampler(train_idxs)
val_sampler = SubsetRandomSampler(val_idxs)

# Specify data loaders for our training and test set (same functionality as in the previous self study)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, sampler=train_sampler)
val_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, sampler=val_sampler)

print(f"Number of training examples: {len(train_idxs)}")
print(f"Number of validation examples: {len(val_idxs)}")

Number of training examples: 48000
Number of validation examples: 12000


The test set is loaded in the usual fashion

In [3]:
test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=64, shuffle=True)

## Specifying the model

When using the _torch.nn_ for specifying our model we subclass the _nn.Module_. The model thus holds all the parameters of the model (see the _init_ function) as well as a specification of the forward step. We don't have to keep track of the backward pass, as PyTorch handles this for us.

In [4]:
class MNIST_CNN(nn.Module):

    def __init__(self):
        super().__init__()

        # Define a convolution operator with 1 input channel, 15 output channels and a kernel size of 5x5
        self.conv1 = nn.Conv2d(1, 15, 5)
        # Since we are not doing padding (see Lecture 2, Slide 38) the width of the following layer is reduced; for
        # each channel the resulting dimension is 24x24. We feed the resulting representation through a linear 
        # layer, giving 10 values as output - one for each digit.
        self.fc = nn.Linear(15 * 24 * 24, 10)
        self.out = None

    def forward(self, xb):

        # Reshape the input tensor; '-1' indicates that PyTorch will fill-in this 
        # dimension, whereas the '1' indicates that we only have one color channel. 
        xb = xb.view(-1, 1, 28, 28)
        # Apply convolution and pass the result through a ReLU function
        xb = F.relu(self.conv1(xb))
        # Reshape the representation
        xb = xb.view(-1, 15*24*24)
        # Apply the linear layer
        xb = self.fc(xb)
        # and set the result as the output. Note that we don't take a softmax as this is handled internally in the 
        # loss function defined below.
        self.out = xb

        return xb

## Learning and evaluating the model

For learning the model, we will use the following function which performs one iteration over the training data. The function also takes an _epoch_ argument, but this is only used for reporting on the learning progress

In [40]:
def train(model, train_loader, loss_fn, epoch, optimizer):
    losses = []
    # Tell PyTorch that this function is part of the training
    model.train()

    # As optimizer we use stochastic gradient descent as defined by PyTorch. PyTorch also includes a variety 
    # of other optimizers 
    # learning_rate = 0.01
    opt = optimizer #torch.optim.SGD(model.parameters(), lr=learning_rate)

    # Iterate over the training set, one batch at the time, as in the previous self sudy
    for batch_idx, (data, target) in enumerate(train_loader):
        # Get the prediction
        y_pred = model(data)
        
        # Remember to zero the gradients so that they don't accumulate
        opt.zero_grad()

        # Calculate the loss and and the gradients  
        loss = loss_fn(y_pred, target)
        loss.backward()

        # Optimize the parameters by taking one 'step' with the optimizer
        opt.step()

        losses.append(loss.item())

        # For every 10th batch we output a bit of info
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.sampler),
                       100. * batch_idx * len(data) / len(train_loader.sampler), loss.item()))
    return losses

In the end, we also want to validate our model. To do this we define the function below, which takes a data_loader (either the validation or test set) and reports the model's accuracy and loss on that data set.

In [37]:
def test_model(model, data_loader, loss_fn):
    # Tell PyTorch that we are performing evaluation
    model.eval()

    losses = []
    
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in data_loader:
            output = model(data)
            test_loss += loss_fn(output, target).item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
            losses.append(loss_fn(output, target).item())

    test_loss /= len(data_loader.dataset)

    print('\nTest/validation set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(data_loader.sampler),
        100. * correct / len(data_loader.sampler)))
    
    return losses

## A couple of helper functions

Learning a deep neural network can be time consuming, and it might therefore be nice to be able to save and load previously learned models (see also https://pytorch.org/tutorials/beginner/saving_loading_models.html).

In [7]:
def save_model(file_name, model):
    torch.save(model, file_name)

def load_model(file_name):
    model = torch.load(file_name)
    model.eval()
    return model

## Wrapping things up

Finally, we will do the actual learning of the model.

In [8]:
# The number of passes that will be made over the training set
num_epochs = 2
# torch.nn defines several useful loss-functions, which we will take advantage of here (see Lecture 1, Slide 11, Log-loss).
loss_fn = nn.CrossEntropyLoss()

In [39]:
# Instantiate the model class
model = MNIST_CNN()
# and get some information about the structure
lr = 0.001
# optimizer = torch.optim.SGD(model.parameters(), lr=lr)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
print('Model structure:')
print(model)

Model structure:
MNIST_CNN(
  (conv1): Conv2d(1, 15, kernel_size=(5, 5), stride=(1, 1))
  (fc): Linear(in_features=8640, out_features=10, bias=True)
)


### Iterate over the data set

We iterate over the data set for *num_epochs* number of iterations. At each iteration we also calculate the loss/accuracy on the validation set.

In [38]:
test_losses = []
train_losses = []

for i in range(num_epochs):
    train_losses.extend(train(model, train_loader, loss_fn, i, optimizer))
    # Evaluate the model on the test set
    test_losses.extend(test_model(model, val_loader, loss_fn))



Test/validation set: Average loss: 0.0002, Accuracy: 11746/12000 (98%)


Test/validation set: Average loss: 0.0002, Accuracy: 11753/12000 (98%)



After learning we evaluate the model on the _test set_ and save the resulting structure.

[0.030578596517443657,
 0.02941614855080843,
 0.030515209461251896,
 0.03012579120695591,
 0.0400108203291893,
 0.03934938584764799,
 0.04414943924971989,
 0.045835102908313274,
 0.04174212138685915,
 0.042880857922136785,
 0.039880626665597614,
 0.04093884187750518,
 0.03951198700815439,
 0.04565300825717194,
 0.04438287559896707,
 0.04522982978960499,
 0.04541637085597305,
 0.045437384552011885,
 0.0462404487066363,
 0.04491249811835587,
 0.04419171273530949,
 0.042644134341654455,
 0.04328655309813178,
 0.04405432027609398,
 0.04492973316460848,
 0.045708483120856375,
 0.04830278970163177,
 0.04733036614821425,
 0.047394613221544646,
 0.04895336177820961,
 0.04970504965392813,
 0.048992630356224254,
 0.04802137277455944,
 0.047431183841956014,
 0.04714020083525351,
 0.04678032917177512,
 0.04614332386267346,
 0.04545639394047229,
 0.04478350637528377,
 0.044576360401697455,
 0.044359094145276196,
 0.044907007565987964,
 0.04547312927194113,
 0.045310407390140674,
 0.0445406480795807

In [11]:
# Evaluate the model on the test set
test_loss = test_model(model, test_loader, loss_fn)
# Save the model
save_model('data/conv.pt', model)


Test/validation set: Average loss: 0.0022, Accuracy: 9624/10000 (96%)



## Exercises

1. Familiarize yourself with the code above and consult the PyTorch documentation when needed.


2. Experiment with different NN architectures (also varying the convolutional parameters: size, stride, padding, etc) and observe the effect wrt. the loss/accuracy on the training and validation dataset (training, validation). Note that when adding new layers (including dropout [Lecture 2, Slide 13], pooling, etc.) you need to make sure that the dimensionality of the layers match up. **IMPORTANT:** ignore the test set at this stage (i.e., comment out the relevant lines above) so that the results for the test set do not influence your model choice.


3. In the model above we use a simple gradient descent learning scheme. Try other types of optimizers (see https://pytorch.org/docs/stable/optim.html) and analyze the effect.


4. Lastly, save your best model and results. At the next lecture we will then see who got the best results :-) Note that for this to be meaningful it is important that you have not relied on the test set while doing model learning/selection.


5. If you feel adventurous, try investigating some of the other datasets that come prepacakged with PyTorch (see https://pytorch.org/vision/0.8/datasets.html). For instnce, for FashionMNIST you only need to change the dataloader from datasets.MNIST to datasets.FashionMNIST.