<img src="https://upload.wikimedia.org/wikipedia/commons/6/63/ETH_Z%C3%BCrich_wordmark.svg" width="200" height="200" align="left">
<br />
<br /><br />
<div align="right"> <b/> FS 2022
<br />
    
## <div align="center"> Project & Seminars: Python for Science & Machine Learning

---

# <div align="center"> Exercise 7th week: Machine Learning with PyTorch

In this exercise we will do basic image classification on the popular CIFAR-10 dataset using the PyTorch framework.

## 1. Load PyTorch Packages and Dataset
First we load PyTorch packages and some helper libraries that you already know. `torch` is the core package of PyTorch. `torchvision` is PyTorch's computer vision package and contains some popular datasets, models and image transformations. 

In [1]:
# PyTorch packages
import torch
import torchvision

# helper libraries
import numpy as np
import matplotlib.pyplot as plt

print(torch.__version__)


Now we are ready to load and play with the dataset.

In [None]:
# the training set is used to train the model
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)
# the test set is used to test the accuracy of the model
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True)

Now you can see the downloaded dataset in the data folder.

## 2. Explore the Data

It is useful to have a basic familiarity with the data structure. Let's explore the format of the dataset before training the model. The official CIFAR-10 website, which is easy to find with a search engine, lists basic information such as the classes, the number of images and their resolution. Let's verify this information.

**Exercise 2.1: Number of images**  
Print the length of both train and test set. What is the total number of images?

In [None]:
# INSERT CODE HERE

Now try to print their shapes by using dot notation: `.data.shape`

In [None]:
# INSERT CODE HERE

**Exercise 2.2: Format**  
Print the first element of each set via indexing like a list. Then print the type for one element of your choice using `type()`. What do you notice about the format?

In [None]:
# INSERT CODE HERE

**Exercise 2.3: Labels**  
Let's obtain the labels for the trainset.

In [None]:
labels = trainset.targets

`labels` is a list. Use indexing to print the first 10.

In [None]:
# INSERT CODE HERE

As you can see, the labels are integers. Let's find out what each of them means.

In [None]:
trainset.class_to_idx

As we have seen above, the dataset elements are tuples of (PIL image, label). In a notebook, a PIL image is displayed automatically when it is the last expression in a cell, so indexing is enough:

In [None]:
trainset[42][0]

Now combine your new knowledge to display the first cat in the training set.

In [None]:
# INSERT CODE HERE

**Exercise 2.4: PIL to numpy**  
Convert the cat image to a numpy array and print its shape and data type using dot notation: `.shape` and `.dtype`

In [None]:
cat = # INSERT CODE HERE
# INSERT CODE HERE

**Exercise 2.5: Plot the image**  
From the previous exercise we can see that the image resolution is 32x32 with 3 color channels. Each pixel is represented as three 8-bit integers with values from 0 to 255 for its red, green and blue values. Let's have a look at the array itself to verify this. Execute the following line to see a part of it:

In [None]:
cat

Now plot the image using pyplot's `imshow` function.

In [None]:
# INSERT CODE HERE

plt.show()

**Exercise 2.6 (Optional): Grid plot**  
Let's use a for loop to plot the first 25 images of the training set in a 5x5 grid. Use pyplot's `subplot` function for this and don't forget to convert the PIL images to numpy arrays. Try to keep it simple, only 2 lines of code are missing.

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    # INSERT CODE HERE
    
    
    plt.xticks([])
    plt.yticks([])
plt.show()

Can you classify them all by eye? The difficulty varies a lot. Trying some of the harder examples makes me appreciate how well neural nets work.

## 3. Dataloaders  
In this section we prepare the dataloaders used for training the model. Dataloaders are an essential part of the training pipeline: they take care of batching, shuffling and serving the data for us.  
  
For both PyTorch and TensorFlow we usually work with tensors. They have a lot in common with numpy arrays. Torch tensors and numpy arrays can easily be converted from one to the other.  
  
Tensors can be seen simply as a generalization of vectors and matrices. While vectors are 1D grids of numbers and matrices are 2D grids, tensors are grids with an arbitrary number of dimensions. Vectors are first order tensors and matrices are second order tensors.  
  
A batch of images is usually represented as a 4th order tensor with the shape (batch size, number of color channels, height, width). The batch size will become relevant later when we train a model. You are already familiar with the other 3 values from the previous exercises with numpy.  
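To make the (batch size, channels, height, width) layout concrete, here is a small numpy sketch, assuming 10 dummy images stored in numpy's usual (height, width, channels) order. In PyTorch itself, `torch.from_numpy(array)` and `tensor.numpy()` convert between the two types; the sketch sticks to numpy so it runs without PyTorch installed.

```python
import numpy as np

# hypothetical placeholder for 10 CIFAR-10 images in (height, width, channels) layout
images_hwc = np.zeros((10, 32, 32, 3), dtype=np.uint8)

# move the channel axis in front of height and width;
# the leading axis stays the batch dimension
batch_nchw = images_hwc.transpose(0, 3, 1, 2)

print(batch_nchw.shape)  # (10, 3, 32, 32)
print(batch_nchw.ndim)   # 4, i.e. a 4th order tensor
```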
  
**Exercise 3.1: Tensor format**  
Let's transform the images to tensors.

In [None]:
# we do this so we have to type less
import torchvision.transforms as transforms

# define the transformation
transform = transforms.Compose([transforms.ToTensor()])

# reload the dataset with the transformation applied
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)


Print the first element of the training set and compare it with the previous exercise. Then print the tensor's shape and datatype using `.shape` and `.dtype` like with the numpy array in exercise 2.4. What do you notice?

In [None]:
# INSERT CODE HERE

Indeed, conversion to a tensor also changes the values to floats in the range [0, 1] and moves the channel dimension in front of the spatial dimensions, i.e. from (height, width, channels) to (channels, height, width). This is also described in the official documentation:  
https://pytorch.org/vision/stable/generated/torchvision.transforms.ToTensor.html  
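
For a uint8 image such as the CIFAR-10 ones, the conversion described above can be sketched in numpy as follows. `to_tensor_like` is a hypothetical helper written for illustration only; it is not part of torchvision and ignores the other input types `ToTensor` accepts.

```python
import numpy as np

def to_tensor_like(img_hwc):
    """Hypothetical helper: mimic ToTensor for a uint8 (H, W, C) array."""
    # move channels first: (H, W, C) -> (C, H, W)
    chw = img_hwc.transpose(2, 0, 1)
    # rescale the 0..255 integers to float32 values in [0, 1]
    return chw.astype(np.float32) / 255.0

# a random 32x32 RGB image with the same layout and dtype as CIFAR-10
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)

out = to_tensor_like(img)
print(out.shape, out.dtype)  # (3, 32, 32) float32
```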
  


**Exercise 3.2: Normalization**  
Depending on the application, the input to neural networks is often normalized so that all features lie in similar ranges. For each channel, normalization is the following transformation: `out = (in - mean) / std`, where `std` is the standard deviation. A common choice is to simply use a mean and a standard deviation of 0.5 for all channels, which maps the range [0, 1] to [-1, 1]:


In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # (mean), (std)

# let's test the normalization
transform(np.array([[[0,0.5,1]]],dtype='float32'))
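
The same per-channel formula can be written with numpy broadcasting. This sketch assumes a (channels, height, width) array, as produced by `ToTensor`, and reproduces the result of the cell above: 0 maps to -1, 0.5 to 0 and 1 to 1. It illustrates the formula only, not torchvision's implementation.

```python
import numpy as np

# per-channel mean and std, as passed to transforms.Normalize above
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# a (C, H, W) image with values in [0, 1]: 3 channels of a single pixel
img = np.array([0.0, 0.5, 1.0], dtype=np.float32).reshape(3, 1, 1)

# reshape mean/std to (C, 1, 1) so they broadcast over height and width
normalized = (img - mean[:, None, None]) / std[:, None, None]
print(normalized.ravel())  # [-1.  0.  1.]
```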

Maybe we can do better. Let's calculate CIFAR-10's mean and std.

In [None]:
print((trainset.data.mean(axis=(0,1,2))/255).round(4))
print((trainset.data.std(axis=(0,1,2))/255).round(4))

Use the result above to write your own transformation below with modified normalization.

In [None]:
# define the transformation
transform = # INSERT CODE HERE

# reload the dataset with the transformation applied
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)

To train neural networks we use dataloaders, which automatically shuffle the images, apply the dataset's transformation and serve the images in batches. Note that we only shuffle the training set: the order of the test images does not affect the measured accuracy.

In [None]:
# the batch size represents how many images we use per training iteration
batch_size = 10

testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)
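
Conceptually, shuffling and batching boil down to reordering the sample indices and slicing them into chunks. The sketch below uses a hypothetical `simple_batches` helper to show the idea; the real `torch.utils.data.DataLoader` additionally loads the samples, applies the transformation, stacks them into tensors and can use worker processes. With 50000 training images and `batch_size = 10`, one epoch consists of 5000 batches.

```python
import random

def simple_batches(dataset_size, batch_size, shuffle, seed=None):
    """Hypothetical helper: yield lists of sample indices, one list per batch."""
    indices = list(range(dataset_size))
    if shuffle:
        # random order; a training loop would reshuffle at every epoch
        random.Random(seed).shuffle(indices)
    for start in range(0, dataset_size, batch_size):
        # the last batch is smaller if dataset_size is not divisible by batch_size
        yield indices[start:start + batch_size]

batches = list(simple_batches(50000, batch_size=10, shuffle=True, seed=0))
print(len(batches))  # 5000
```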


## 4. Model

It is time for us to define our model. For this, we choose the famous LeNet-5, adapted to CIFAR-10. For now, you don't need to understand the details. We will have a closer look during the next lecture.

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    # here we define the layers that we will use
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10) # 10 outputs
    # here we define the forward pass
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.flat(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
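
To see where the `16 * 5 * 5` in `fc1` comes from, we can track the spatial size of the feature maps. The sketch assumes the standard output-size formula for convolution and pooling layers without dilation, `(size + 2 * padding - kernel) // stride + 1`. `conv_out` is a hypothetical helper; `nn.Conv2d` with kernel size 5 uses stride 1 and no padding by default, and `nn.MaxPool2d(2, 2)` has kernel size 2 and stride 2.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Hypothetical helper: spatial output size of a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # pool:  28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # pool:  10 -> 5

# conv2 has 16 output channels, so the flattened vector has 16 * 5 * 5 entries
print(16 * size * size)  # 400
```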

## 5. Learning Rate Range Test
We do not yet know what learning rate we should choose for training. The ideal learning rate can vary by many orders of magnitude depending on many factors, such as the model, the data, the optimizer and the batch size. Guessing randomly is inefficient and time consuming. Therefore, we will use a learning rate range test to narrow it down: we start with a tiny learning rate, increase it exponentially after every batch and record the loss.  
  
Since our model is very small and GPU availability on Colab is limited, we will stick to CPU for now.

In [None]:
# we create our neural network
net = LeNet()


import torch.optim as optim

# this is our loss
criterion = nn.CrossEntropyLoss()

# this is our optimizer: stochastic gradient descent with momentum
# notice we set our starting learning rate to 1e-4, you can change it
optimizer = optim.SGD(net.parameters(), lr=1e-4, momentum=0.9)

# collect [learning rate, loss] for every training iteration
metrics = [[],[]]

# we use a maximum of 2 epochs to prevent it from taking too long
for epoch in range(2):
    for i, data in enumerate(trainloader, 0):
        # get the inputs from data, which is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward pass
        outputs = net(inputs)
        # calculate loss
        loss = criterion(outputs, labels)
        # backpropagation
        loss.backward()
        optimizer.step()

        # print our progress every 1000 batches
        if i % 1000 == 999:
            print(f'Epoch {epoch+1} batch {i+1} complete.')
        
        # access our learning rate hyperparameter
        for pg in optimizer.param_groups:
            # update metrics
            metrics[0].append(pg['lr'])
            metrics[1].append(loss.item())
            # exponentially increase learning rate
            pg['lr'] *= 1.001 # you can change this!
        # early abort if learning rate gets too big
        if metrics[0][-1] > 0.1: # you can change this!
            break 
    else:
        continue
    print('Finished early!')
    break
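
How long does the range test run? Since the learning rate is multiplied by a constant factor after every batch, the number of batches until the early-abort threshold is reached follows from a logarithm. The sketch plugs in the values used in the cell above; if you change them, the numbers change accordingly.

```python
import math

start_lr = 1e-4  # initial learning rate of the range test
factor = 1.001   # multiplicative increase per batch
max_lr = 0.1     # early-abort threshold

# smallest number of increases k with start_lr * factor**k > max_lr
k = math.ceil(math.log(max_lr / start_lr) / math.log(factor))
print(k)  # 6912

batches_per_epoch = 50000 // 10  # training images / batch size
print(k / batches_per_epoch)     # about 1.38 epochs
```

With the default values the test aborts in the second epoch, after roughly 6900 batches, so `Finished early!` should be printed.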



**Exercise 5.1: Plot metrics**  
It's time to plot the metrics with x-axis as learning rate and y-axis as loss. We want to use a `semilogx` plot because the learning rate is increasing exponentially. Since the data is very noisy, we also want to plot a moving average.

In [None]:
cumsum = np.cumsum(np.insert(np.array(metrics[1]), 0, 0))
width = 20
moving_average = (cumsum[width:] - cumsum[:-width])/width

plt.figure(figsize=(10,6))
# INSERT CODE HERE


plt.ylim(0,4) # feel free to modify this as needed
plt.xlabel('Learning Rate')
plt.ylabel('Loss')
plt.show()
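
The first three lines of the cell above compute the moving average with a cumulative-sum trick: the sum of any window of `width` consecutive losses is the difference of two cumulative sums. The sketch below checks this on synthetic noisy data against a convolution with a box filter. Note that the result has `len(metrics[1]) - width + 1` entries, so the learning rates on the x-axis need to be trimmed to the same length when plotting, for example with `metrics[0][width - 1:]`.

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.normal(2.0, 0.5, size=200)  # synthetic noisy loss values
width = 20

# cumulative-sum trick: window sum = cumsum[end] - cumsum[start]
cumsum = np.cumsum(np.insert(losses, 0, 0))
moving_average = (cumsum[width:] - cumsum[:-width]) / width

# reference: the same average as a convolution with a box filter
reference = np.convolve(losses, np.ones(width) / width, mode='valid')

print(moving_average.shape)                    # (181,)
print(np.allclose(moving_average, reference))  # True
```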

Even though the data is noisy, it should give us some key information. We see at which learning rate the loss starts to decrease and at which point it stops decreasing and then increases again, perhaps even diverges. A learning rate in the middle of the downward slope might be a good starting point.

## 6. Training

It is time to train our neural network. This should take 3-4 minutes.


In [None]:
net = LeNet()

import time
import torch.optim as optim

# this is our loss
criterion = nn.CrossEntropyLoss()

# this is our optimizer: stochastic gradient descent with momentum
optimizer = optim.SGD(net.parameters(), lr=0.002, momentum=0.9)

start_time = time.time()
for epoch in range(5):  # loop over the dataset every epoch
    # to collect training statistics
    running_loss = 0.0
    correct = 0
    total = 0
    for i, data in enumerate(trainloader, 0):
        # get the inputs from data, which is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward pass
        outputs = net(inputs)
        # calculate loss
        loss = criterion(outputs, labels)
        # backpropagation
        loss.backward()
        optimizer.step()

        # update training statistics
        running_loss += loss.item()
        predicted = torch.argmax(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # print training statistics after every epoch (loss averaged over the batches)
    print(f'Epoch {epoch+1} complete. Loss: {running_loss / len(trainloader):.3f}, '+
          f'training accuracy: {100 * correct / total:.2f}%')


print('Finished training in', round(time.time()-start_time,3),'seconds')
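
The accuracy bookkeeping in the training loop takes the index of the largest output as the predicted class and counts matches with the labels. The numpy sketch below, with made-up logits for four images and three classes, mirrors `torch.argmax(outputs, 1)` and `(predicted == labels).sum().item()`.

```python
import numpy as np

# hypothetical logits for a batch of 4 images and 3 classes
outputs = np.array([[ 2.0, 0.1, -1.0],
                    [ 0.3, 1.5,  0.2],
                    [-0.5, 0.0,  0.9],
                    [ 1.2, 1.1,  0.0]])
labels = np.array([0, 1, 1, 1])

# the predicted class is the index of the largest logit in each row
predicted = np.argmax(outputs, axis=1)       # [0 1 2 0]
correct = int((predicted == labels).sum())   # 2
total = labels.shape[0]                      # 4
print(f'accuracy: {100 * correct / total:.1f}%')  # accuracy: 50.0%
```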

**Exercise 6.1: Test accuracy**  
It is time to test our true accuracy on the test set. Finish the code below. 5 lines of code are missing. Each can be copied from the training loop above.  
*Tip: we only need the forward pass.*


In [None]:
correct = 0
total = 0
# since we are not training, we don't want to evaluate the gradients
with torch.no_grad():
    for data in testloader:
        # INSERT CODE HERE
        

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total}%')


**Exercise 6.2: Logits and softmax**  
Let's look at a single output.

In [None]:
images, labels = next(iter(testloader))
with torch.no_grad():
    outputs = net(images)

`outputs` now contains the outputs for the first test batch. Print the output for the first image only.

In [None]:
# INSERT CODE HERE

These are called logits. By themselves, they are not very meaningful for humans. The softmax function turns the logits into pseudo-probabilities: it exponentiates each logit and divides by the sum of all exponentials, so that the values lie in the range [0, 1] and sum to 1. Use torch's softmax functional to turn the outputs into probabilities and print the probabilities for the first image.  
https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html


In [None]:
# INSERT CODE HERE
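
For intuition, the softmax formula `p_i = exp(z_i) / sum_j exp(z_j)` can be written in a few lines of numpy. This is only a sketch for illustration with made-up logits; the exercise above asks for torch's built-in version. Subtracting the largest logit first is a common trick that leaves the result unchanged but avoids overflow in `exp`.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    # subtracting the max does not change the result but avoids overflow in exp
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical logits for 3 classes
probs = softmax(logits)
print(probs.round(3))  # [0.659 0.242 0.099]
print(probs.sum())
```

The largest logit gets the largest probability, so the predicted class does not change; softmax only rescales the outputs into a more interpretable form.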
