# Before you start
1. **Don't edit this file, make a copy first:**
  * Click on File -> Save a copy in Drive

2. Also do the following:
  * Click on Runtime -> Change runtime type -> Make sure hardware accelerator is set to GPU

3. I recommend going through this notebook TWICE. The first time, just try to get a feel for the overall structure of the code, and how it's similar to FastAI. The second time, try unhiding the functions to get a feel for how PyTorch differs from FastAI.

# Introduction to PyTorch

This workshop is going to be a runthrough of creating a deep learning system with PyTorch.

I've tried to simplify things as much as possible, wrapping the code into functions. The code should hopefully feel similar to the FastAI code from last week.

I've hidden the code for the functions so you can get a feel for how the whole system works and how it compares to the FastAI approach. Once you're comfortable with the structure of the code, I really recommend opening up the hidden functions and going through them with your demonstrator to see how PyTorch works.

You should find the notebook is structured as follows:
1. Creating the dataset
2. Creating a model
3. Training and testing the model
4. Evaluating the model


# Library Imports
PyTorch is called "torch" when importing

In [None]:
import torch
from torch import nn
from torch import optim
from torchvision import datasets, models
from torchvision import transforms as T
import torchvision.transforms.functional as F
import torch.backends.cudnn as cudnn

import matplotlib.pyplot as plt
import sklearn.metrics as skMet
import numpy as np
from tqdm.notebook import tqdm

# Dataset functions
(Open up to see what the dataset function contains)

## create_dataset()

First we put our transforms into a form PyTorch can use by composing them with `T.Compose()`. Note that we only apply our own transforms to the train data: their goal is just to make the model generalize better, so we don't need them on the test set. We also apply 2 other transforms to both sets, `T.ToTensor()` and `T.Normalize()`. These help prep the data for the model. Feel free to ask your demonstrators if you want the details!

Next we create a *dataset*. This is basically an object that tells us *where* our data is located and can load *individual* images.

Finally we create a *dataloader*. This is an object that can load *batches* (sets) of data. 
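
To make the dataset/dataloader distinction concrete, here is a minimal sketch that uses random tensors in place of real images. The sizes (100 samples, batch size 32) and the use of `TensorDataset` are assumptions for illustration only; the function below uses `datasets.CIFAR10` instead.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data (made up): 100 fake "images" with 3 channels, 32x32 pixels, labels 0-9
fake_images = torch.randn(100, 3, 32, 32)
fake_labels = torch.randint(0, 10, (100,))

# A dataset loads INDIVIDUAL items by index
toy_dataset = TensorDataset(fake_images, fake_labels)
image, label = toy_dataset[0]
print(image.shape)          # torch.Size([3, 32, 32])

# A dataloader groups items into BATCHES
toy_loader = DataLoader(toy_dataset, batch_size=32, shuffle=True)
batch_images, batch_labels = next(iter(toy_loader))
print(batch_images.shape)   # torch.Size([32, 3, 32, 32])
print(len(toy_loader))      # 4 batches: 32 + 32 + 32 + 4 images
```

Note that the batch dimension comes first, and that the last batch is smaller when the dataset size isn't a multiple of the batch size.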

In [None]:
def create_dataset(dataset, transforms, batch_size):
    # Define transforms for the train and test set
    train_transforms = T.Compose(transforms + 
                                [T.ToTensor(),
                                 T.Normalize([0.485, 0.456, 0.406], 
                                             [0.229, 0.224, 0.225])])

    test_transforms = T.Compose([T.ToTensor(),
                                 T.Normalize([0.485, 0.456, 0.406], 
                                             [0.229, 0.224, 0.225])])

    # Create dataset
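    if dataset == "CIFAR10":
        train_dataset = datasets.CIFAR10('data/train', train=True, transform=train_transforms, download=True)
        test_dataset = datasets.CIFAR10('data/test', train=False, transform=test_transforms, download=True)
    else:
        # Fail early with a clear message instead of an undefined variable further down
        raise ValueError(f"Unsupported dataset: {dataset}")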

    # Create dataloaders
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    return train_dataset, test_dataset, train_loader, test_loader

# Dataset
Last time, to create our dataset with FastAI, we used the following code:


```
data_bunch = (
    fastai.vision.ImageList.from_folder(path=Path('hotdog-not-hotdog-dataset'))
    .random_split_by_pct()
    .label_from_folder()
    .transform(size=224)
    .databunch()
)
```

We get the data with `fastai.vision.ImageList.from_folder(path)`, split into train and test with `random_split_by_pct()`, get labels with `label_from_folder()`, transform (resize) our data with `transform(size=224)` and get our data into the proper format with `databunch()`.
<br></br>

---

<br> 
With PyTorch, we need a few things. We could make our own dataset if we wanted, but that's a bit complicated. Instead, we can use one of the datasets provided for us by PyTorch (through torchvision). We're going to use CIFAR10, which we'll look into soon. 

We also need to tell it what transforms to do. Here we're just going to do a random flip and a random rotation.

Finally, we're going to tell the dataset how many images to pass to the model at once (this is called the batch size). This is a bit more abstract but can change how the model performs (feel free to try changing this number to see what effect it has).
</br>

In [None]:
transforms = [
    T.RandomRotation(30),
    T.RandomHorizontalFlip()
]

dataset = "CIFAR10"

train_dataset, test_dataset, train_loader, test_loader = create_dataset(
    dataset = dataset, transforms = transforms, batch_size = 32)

## Dataset exploration

Here, we visualize the data and some of its properties to get a better idea of what our dataset consists of. 

It's also good to do this as it can help you check that all your dataset code and transforms have worked correctly.

This shows what the 10 classes are:

In [None]:
train_dataset.classes

Here we visualize a sample of the images that make up the dataset:

In [None]:
inputs, labels = next(iter(train_loader))

# Fix image scaling (ask demos if you want the details)
inv_norm = T.Normalize(
    mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
    std=[1/0.229, 1/0.224, 1/0.225])
inputs = inv_norm(inputs)

for im, label in zip(inputs, labels):
    plt.imshow(im.permute(1, 2, 0))
    plt.title(train_dataset.classes[label])
    plt.show()

Notice that the images are really blurry. This is because CIFAR10 is a dataset of small images (each one is only 32x32 pixels). You should still be able to make out what each image is, and hopefully this puts into context where the model could make mistakes.

You can also see that some of the images are rotated by the transform we applied.
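
For anyone curious about the "fix image scaling" step in the cell above: normalizing computes `(x - mean) / std` per channel, so normalizing a second time with mean `-mean/std` and std `1/std` gets the original pixel values back. The sketch below checks this with plain tensor arithmetic on a random image; the `normalize` helper is a hypothetical stand-in that mirrors the arithmetic of `T.Normalize`, not the library function itself.

```python
import torch

# The per-channel statistics used in this notebook, shaped to broadcast over (C, H, W)
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(x, mean, std):
    # Hypothetical helper: same per-channel arithmetic as T.Normalize
    return (x - mean) / std

original = torch.rand(3, 32, 32)             # random "image" with values in [0, 1]
normalized = normalize(original, mean, std)  # what the model sees

# Undo it: (y - (-mean/std)) / (1/std) = y * std + mean
restored = normalize(normalized, -mean / std, 1 / std)

print(torch.allclose(restored, original, atol=1e-5))
```

Without this step, `plt.imshow` would be given values outside the range it expects and the colours would look wrong.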

# Model functions
(Open up to see what the model function contains)

## create_model()

Here we do 2 main things. The first is setting `param.requires_grad = True` for all the parameters in the model. This tells PyTorch to compute gradients for, and therefore update, ALL the weights (parameters = weights) in the model. If we only wanted to update some of the weights, we could set `requires_grad = False` on the ones we want to leave unchanged.

The second change we make is changing the size of the final layer. The models we use are often designed for ImageNet, which has 1000 different classes to predict. CIFAR10 only has 10 classes however, so we need to change the final layer to match this.
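
As a rough sketch of both ideas, the example below freezes every weight in a tiny model and then swaps in a new final layer, so only the new layer gets trained. `TinyNet` and its layer sizes are made up for illustration (a stand-in for a real pretrained network with an `fc` layer); note that it freezes the weights, whereas `create_model()` below sets them all to be updated.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network with an `fc` final layer."""
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 16)
        self.fc = nn.Linear(16, 1000)   # pretend this was built for 1000 classes

    def forward(self, x):
        return self.fc(torch.relu(self.body(x)))

toy_model = TinyNet()

# Freeze everything: no gradients are computed, so these weights never change
for param in toy_model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized for 10 classes.
# Newly created layers have requires_grad=True by default.
toy_model.fc = nn.Linear(toy_model.fc.in_features, 10)

trainable = [name for name, p in toy_model.named_parameters() if p.requires_grad]
print(trainable)                              # ['fc.weight', 'fc.bias']
print(toy_model(torch.randn(4, 8)).shape)     # torch.Size([4, 10])
```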

In [None]:
def create_model(model_type, dataset):
    model = model_type

    # Set all weights to be updated
    for param in model.parameters():
        param.requires_grad = True

    # Change size of final layer
    if dataset == "CIFAR10":
        in_ftrs = model.fc.in_features  # number of inputs to the existing final layer
        model.fc = nn.Linear(in_ftrs, 10)  # new final layer with 10 outputs, one per class

    return model

# Model

Now we need to create our model. In FastAI, we did this by calling: 

```
learner = fastai.vision.cnn_learner(
    data=data_bunch, 
    base_arch=fastai.vision.models.resnet50,
    pretrained=True, 
    metrics=fastai.vision.accuracy # this is a function which computes accuracy
)
```

In this code, we're creating a CNN (a more complex model type that handles images well) with `fastai.vision.cnn_learner()`. This model needs `data_bunch` to make sure the model is properly sized to work with the dataset. The model is going to have the same structure as a famous model called resnet50, be pretrained, and report the accuracy as its metric.
<br></br>

---

<br> 
In PyTorch, we need everything but the accuracy. This is because we're going to have to compute the accuracy ourselves later; we can't just tell the model to record it automatically.

In [None]:
model = create_model(model_type = models.resnet50(pretrained=True), dataset = dataset)

# GPU functions
(Open up to see what the GPU initialization function contains)

## setup_GPU()

This is pretty straightforward, just a couple of lines to make sure the GPU will be properly optimized, along with checking the GPU is actually available.

We also call `.to(device)` here, which is how we tell PyTorch what our model should run on. If there is a GPU available, the device should be `cuda:0`, otherwise it will be the `cpu`.
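
Here is a small sketch of how `.to(device)` behaves, using a made-up one-layer model and a random input. One detail worth knowing: calling `.to(device)` on a model moves it in place, but calling it on a tensor returns a new tensor, so the result has to be assigned.

```python
import torch
from torch import nn

# Use the GPU if there is one, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

toy_model = nn.Linear(4, 2)   # made-up model for illustration
toy_model.to(device)          # modules are moved in place

x = torch.randn(3, 4)
x = x.to(device)              # tensors are NOT moved in place: keep the returned tensor

# The model and its input must be on the same device for this call to work
out = toy_model(x)
print(next(toy_model.parameters()).device, x.device, out.shape)
```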

In [None]:
%%capture
def setup_GPU():
    torch.cuda.empty_cache()
    cudnn.benchmark = True  # Optimise for hardware

    # Check that the GPU is available
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # Make sure the model uses the GPU (note: this uses the global `model` created above)
    model.to(device)

    return device

# Initialize GPU
In PyTorch we need to explicitly write code to use the GPU, so we need to first setup the GPU. If there isn't a GPU that can be used, `device` will instead just be set to CPU (and the model will train slower).

In [None]:
device = setup_GPU()

# Train/Test functions 
(Open up to see what the train/test functions contain)

## train()

Okay, this is going to be the most complicated part of our PyTorch system, but if we break it down, you'll find that it's actually quite straightforward.

1. We loop over all of the data using `for i, (inputs, labels) in enumerate(train_loader, 0):`.
2. We make sure the data is on the GPU so it can be used by the model with `inputs, labels = inputs.to(device), labels.to(device)`. This is similar to `model.to(device)`, if you saw that earlier.
3. Clear the gradients left over from the previous batch with `optimizer.zero_grad()`.
4. Pass the inputs through the model with `outputs = model(inputs)`.
5. Compute the loss with `loss = loss_fn(outputs, labels)`.
6. Do gradient descent with `loss.backward()` (computes the gradients) and `optimizer.step()` (updates the weights).

You can see that the fundamental process here, cutting all the PyTorch-specific fluff, is:

> Pass data through model -> Compute the loss -> Update the model's weights


For all the data in the dataset.

And that's all there is to it! The only other thing we do is also make sure to track our loss by keeping a running total.
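
The steps above can be sketched on a toy problem small enough to run in a second on the CPU. Everything here is an assumption for illustration: the random data, the rule that the class is 1 when the first feature is positive, the one-layer model, and the use of plain `optim.SGD` (the notebook itself uses Adam further down).

```python
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Made-up data: 64 points with 4 features; class 1 when the first feature is positive
X = torch.randn(64, 4)
y = (X[:, 0] > 0).long()
toy_loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

toy_model = nn.Linear(4, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(toy_model.parameters(), lr=0.5)

epoch_losses = []
for epoch in range(5):
    running_loss = 0.0
    for inputs, labels in toy_loader:        # 1. loop over the data
        optimizer.zero_grad()                # 3. clear old gradients
        outputs = toy_model(inputs)          # 4. forward pass
        loss = loss_fn(outputs, labels)      # 5. compute the loss
        loss.backward()                      # 6a. compute gradients
        optimizer.step()                     # 6b. update the weights
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(toy_loader))  # average loss per batch

print(epoch_losses)   # the values should shrink from epoch to epoch
```

Step 2 (moving data to the GPU) is left out because this toy example stays on the CPU.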

In [None]:
def train(model, train_loader, loss_fn, optimizer, device):
    model.train() # puts the model in training mode
    running_loss = 0
    with tqdm(total=len(train_loader)) as pbar: # Creates progress bar, can ignore
        for i, (inputs, labels) in enumerate(train_loader, 0): # loops through training data
            inputs, labels = inputs.to(device), labels.to(device) # puts the data on the GPU

            # forward + backward + optimize                                          
            optimizer.zero_grad() # clear the gradients in model parameters
            outputs = model(inputs) # forward pass and get predictions
            loss = loss_fn(outputs, labels) # calculate loss
            loss.backward() # calculates the gradient of the loss w.r.t. every parameter that has requires_grad=True
            optimizer.step() # updates the parameters given to the optimizer using the gradients just computed

            running_loss += loss.item() # sum total loss in current epoch for print later

            pbar.update(1) #increment our progress bar

    return running_loss/len(train_loader) # returns the average training loss per batch for the epoch

## test()

The test function is very similar to the train function, so if you can understand that, you can understand this too. I'll run through the changes here:

1. Tell the model to not bother storing gradients (what we need for weight updates) with `with torch.no_grad():`.
2. We don't want to update the model, so we don't perform gradient descent. This means we don't need to run `loss.backward()` and `optimizer.step()` here.

We also want to track the accuracy, so we have some code to do that too.
1. Take the highest probability class as the prediction by the model. We do this by just taking the index of the max output with `_, predicted = torch.max(outputs, 1)`.
2. Track the number of correct predictions with `correct += (predicted == labels).sum().item()`.
3. Track the total number of predictions with `total += labels.size(0)`.

At the end we can get the accuracy by dividing the two (`correct/total`). 
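
Here is a tiny worked example of those three steps, using hand-written model outputs for 4 images and 3 classes (the numbers are made up for illustration):

```python
import torch

# One row per image, one column per class; a higher score means more confident
outputs = torch.tensor([[2.0, 0.5, 0.1],
                        [0.2, 0.3, 3.0],
                        [1.0, 2.0, 0.0],
                        [0.1, 0.2, 0.3]])
labels = torch.tensor([0, 2, 0, 2])

# torch.max along dimension 1 returns (max values, index of each max)
_, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
total = labels.size(0)

print(predicted.tolist())   # [0, 2, 1, 2]
print(correct, total)       # 3 4
print(correct / total)      # 0.75
```

The third image is the only mistake: the model scored class 1 highest, but the true label is 0.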

In [None]:
def test(model, test_loader, loss_fn, device):
    model.eval() # puts the model in test mode
    running_loss = 0
    total = 0
    correct = 0
    
    with torch.no_grad(): # save memory by not saving gradients which we don't need 
        with tqdm(total=len(test_loader)) as pbar:
            for images, labels in iter(test_loader):
                images, labels = images.to(device), labels.to(device) # put the data on the GPU
                outputs = model(images) # passes the images to the model and gets an output: a raw score for each class (higher = more likely)

                test_loss = loss_fn(outputs, labels) # calculates test_loss from model predictions and true labels
                running_loss += test_loss.item()

                _, predicted = torch.max(outputs, 1) # takes the highest-scoring class as the predicted label
                correct += (predicted == labels).sum().item() # sums the number of correct predictions
                total += labels.size(0) # sums the number of predictions
                
        
                pbar.update(1)

        return running_loss/len(test_loader), correct/total # return loss value, accuracy

# Training and testing 

To train the model in FastAI, it was a really simple, one line call:
```
learner.fit_one_cycle(cyc_len=5, max_lr=1e-4)
```
This would train our model for 5 epochs with a maximum learning rate of 1e-4 (the one-cycle schedule varies the learning rate during training).
<br></br>

---
<br> 
Things are a bit more complicated in PyTorch. We need to define a lot more things ourselves, but this also gives us a lot more power and choice over the training.

First we have to define what our loss function is going to be. This is the actual mathematical function that tells the model how right/wrong it is, and what we want to minimize.

Next, we define our optimizer. This is basically what performs gradient descent and updates the weights of our model. Just know that there are a few different strategies for the best way to update the weights, and here we use 'Adam'. As the optimizer is updating the weights, it also needs the learning rate.

Finally, we want to train and test for a specific number of epochs. We do this by looping over a train and test function. These functions are where all the magic really happens, and I really recommend trying to dive into how these functions actually work (Don't be shy to get your demonstrator to help you walk through them!).

The only other thing we're doing here is storing all the losses and accuracies so we can plot them later.
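
To give a feel for what the loss function measures, the sketch below compares `nn.CrossEntropyLoss` (with its default settings) against a by-hand calculation for a single example: apply softmax to the model's raw scores, then take the negative log of the probability assigned to the correct class. The scores are made up for illustration.

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()

# Made-up raw scores for one image and 3 classes; the model favours class 0
logits = torch.tensor([[2.0, 1.0, 0.1]])

# Case 1: the true class is 0, so the model is right
loss_right = loss_fn(logits, torch.tensor([0]))

# The same value by hand: -log(probability of the correct class)
probs = torch.softmax(logits, dim=1)
manual = -torch.log(probs[0, 0])

# Case 2: the true class is 2, so the model is wrong and the loss is larger
loss_wrong = loss_fn(logits, torch.tensor([2]))

print(loss_right.item(), manual.item(), loss_wrong.item())
```

The more probability the model puts on the wrong classes, the bigger the loss, which is exactly what the optimizer then tries to reduce.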

In [None]:
# Loss functions
loss_fn = nn.CrossEntropyLoss()

# Optimizer that performs gradient descent
optimizer = optim.Adam(model.parameters(), lr=1e-4)

train_loss_list = []
test_loss_list = []
acc_list = []

Note that if you want, you can rerun the code block below to train your model further without restarting.

In [None]:
# Train and test loop
total_epoch = 10

for epoch in range(total_epoch): # loops through number of epochs
  train_loss = train(model, train_loader, loss_fn, optimizer, device) 
  test_loss, accuracy = test(model, test_loader, loss_fn, device) 
  print("Epoch: {}/{}, Training Loss: {}, Test Loss: {}, Test Accuracy: {}".format(
      epoch+1, total_epoch, train_loss, test_loss, accuracy))
  print('-' * 20)

  train_loss_list.append(train_loss)
  test_loss_list.append(test_loss)
  acc_list.append(accuracy)

print("Finished Training")

#Save the model after we're done
torch.save(model.state_dict(), 'trained_model')

# Evaluation functions
(Open up to see what the evaluation functions contain)

## Plot functions

For this we use matplotlib. Most of the functions here should be pretty self-explanatory; it's really all just setting attributes of the plot.

In [None]:
def plot_losses(train_loss, test_loss):
    plt.plot(train_loss, label='Train')
    plt.plot(test_loss, label='Test')
    plt.legend()
    plt.grid(True)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Model losses')
    plt.show()

In [None]:
def plot_accuracy(test_accuracy):
    plt.plot(test_accuracy)
    plt.grid(True)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')  # values are fractions (correct/total), not percentages
    plt.title('Model accuracy')
    plt.show()

## Confusion matrix value generator

To make the confusion matrix, we need the true labels and predicted labels of every datapoint in 2 lists. To make these lists we're going to basically do a test loop, but instead of computing loss and accuracy we're going to store the labels and model predictions into 2 lists `true_label` and `pred_label`.

In [None]:
def getConfMatValues(model, loader, device):
    model.eval() 

    true_label = np.array([])
    pred_label = np.array([])
    with torch.no_grad():
        with tqdm(total=len(loader)) as pbar:
            # loops through the data in the given loader
            for inputs, labels in loader:
                # Store true labels
                true_label = np.append(true_label, labels.flatten().int().numpy())

                inputs, labels = inputs.to(device), labels.to(device)  
                outputs = model(inputs)

                # store predictions
                _, predicted = torch.max(outputs, 1)
                pred_label = np.append(pred_label, predicted.to('cpu').flatten().int().numpy())

                pbar.update(1)

    return true_label, pred_label

## Top losses

(I'm gonna be honest, this is a bit of a monstrosity. This is something I had to whip up really quickly for an assignment and it shows. If you ever want to do this yourself, I would really recommend looking up a better way of doing it. - Jason) 

The quick summary is that this is basically the test loop we've already seen, but it also keeps an ordered list of the model's top n worst predictions. A new entry is added to the list when the model predicts the class incorrectly and its prediction probability is higher than a value already in the list. The entry is inserted at the correct position so that the probabilities stay in descending order. Finally, the last entry in the list is removed if the length of the list goes above n.
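
For comparison, here is one possible shorter way to get the same kind of ranking, offered as a sketch rather than a drop-in replacement for the function below. It assumes you have already collected the model outputs and true labels into two tensors; the function name `top_n_confident_errors` and the example values are made up for illustration.

```python
import torch

def top_n_confident_errors(logits, targets, n):
    """Hypothetical helper: indices of the n wrong predictions the model was most sure of."""
    probs = torch.softmax(logits, dim=1)         # raw scores -> probabilities per row
    conf, pred = probs.max(dim=1)                # confidence and predicted class per row
    wrong_idx = torch.nonzero(pred != targets).flatten()  # rows the model got wrong
    k = min(n, wrong_idx.numel())
    top_conf, order = torch.topk(conf[wrong_idx], k)      # sorted, most confident first
    chosen = wrong_idx[order]
    return chosen, pred[chosen], top_conf

# Made-up outputs for 4 images and 3 classes
logits = torch.tensor([[4.0, 0.0, 0.0],    # predicts class 0 (very confident)
                       [0.0, 1.0, 0.0],    # predicts class 1
                       [0.0, 0.0, 2.0],    # predicts class 2
                       [3.0, 0.0, 0.0]])   # predicts class 0
targets = torch.tensor([1, 1, 0, 0])       # rows 0 and 2 are wrong

idx, pred, conf = top_n_confident_errors(logits, targets, n=2)
print(idx.tolist())    # [0, 2]
print(pred.tolist())   # [0, 2]
```

Like the function below, this ranks mistakes by how confident the model was in its wrong answer, rather than by the loss value itself.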

In [None]:
def top_n_errors(model, device, loader, n):
    model.eval()
    
    top_class_ims = [0.]
    top_class_pr = [0.]
    labels = [0.]
    ground_truth = [0.]
    
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(loader):
            target = target.type(torch.LongTensor)
            outputs = model(data.to(device))
            
            for i, out in enumerate(outputs):
                pr_vec = torch.nn.functional.softmax(out, dim=0)  # dim must be given explicitly; `out` is 1-D here

                prediction_pr, prediction = torch.max(pr_vec, 0, keepdim=True)
                prediction_pr = prediction_pr.item()
                
                if prediction != target[i].to(device):
                    if prediction_pr > top_class_pr[-1]:
                        for j, pr in enumerate(top_class_pr):
                            if prediction_pr > pr:
                                top_class_pr.insert(j, prediction_pr)
                                top_class_ims.insert(j, inv_norm(data[i]).permute(1, 2, 0))
                                labels.insert(j, prediction)
                                ground_truth.insert(j, target[i].to(device))
                                break
                
                if len(top_class_pr) > n:
                    top_class_pr.pop()
                    top_class_ims.pop()
                    labels.pop()
                    ground_truth.pop()
                    
        return top_class_ims, labels, top_class_pr, ground_truth

# Evaluation

Finally, we want to know how well our model has really done!


## Plotting

We can get a rough idea of the loss and accuracy of the model from the info we printed above, but just printing a bunch of numbers doesn't really make any trends in our loss/accuracy too clear. A better way would be to plot them.

Here we have one plot with training and testing loss and another plot with test accuracy. It's good to have train and test on the same plot, as then you can check for overfitting. You'll know this is happening because the test loss will be increasing while the train loss is still decreasing.
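
The same check can be done in code. The sketch below uses made-up loss values (they are NOT results from this notebook) to find the epoch with the lowest test loss and the epochs that show the overfitting pattern described above.

```python
# Made-up loss values for illustration only (not results from this notebook)
example_train_loss = [1.20, 0.90, 0.70, 0.55, 0.45, 0.38]
example_test_loss = [1.25, 1.00, 0.85, 0.80, 0.84, 0.91]

# Epoch (counting from 1) where the test loss was lowest
best_epoch = 1 + min(range(len(example_test_loss)), key=lambda e: example_test_loss[e])

# Epochs where the test loss went up while the train loss kept going down
overfit_epochs = [
    e + 1
    for e in range(1, len(example_test_loss))
    if example_test_loss[e] > example_test_loss[e - 1]
    and example_train_loss[e] < example_train_loss[e - 1]
]

print(best_epoch)       # 4
print(overfit_epochs)   # [5, 6]
```

With your own run, you could pass `train_loss_list` and `test_loss_list` through the same logic.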

In [None]:
plot_losses(train_loss = train_loss_list, test_loss = test_loss_list)
print('')
plot_accuracy(test_accuracy = acc_list)

## Confusion matrix

For image classification tasks, especially one without too many classes like this, it can be helpful to see exactly where the model is getting things wrong. This is where the confusion matrix can help. It tells us what the model predicts vs what the actual class is, letting us pinpoint where the model is making its mistakes.

FastAI can deal with the confusion matrix for us really nicely. All we need to do is call: 
```
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
```

Because PyTorch is less self-contained than FastAI, we need to do a bit more to plot our confusion matrix. We can still use a package like scikit-learn to make the plotting easier, but we now need to get the model's predictions ourselves.
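
To see what a confusion matrix actually contains, here is a small sketch that builds one by hand with NumPy from made-up labels for 3 classes. Rows are the true class and columns are the predicted class, which is the same layout scikit-learn uses.

```python
import numpy as np

# Made-up true and predicted labels for 6 images and 3 classes
true = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])

# conf_mat[i, j] counts images of true class i that were predicted as class j
conf_mat = np.zeros((3, 3), dtype=int)
np.add.at(conf_mat, (true, pred), 1)

print(conf_mat)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Correct predictions sit on the diagonal, so accuracy is diagonal sum / total
accuracy = np.trace(conf_mat) / conf_mat.sum()
```

Everything off the diagonal is a mistake: for example, the 1 in the bottom-left corner is an image of class 2 that the model called class 0.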

In [None]:
true_label, pred_label = getConfMatValues(model, test_loader, device)

skMet.ConfusionMatrixDisplay.from_predictions(true_label, 
                                              pred_label, 
                                              display_labels = train_dataset.classes, 
                                              xticks_rotation = 'vertical')

## Top Losses

Another interesting way to visualize where the model went wrong is to plot the datapoints that gave the highest loss. These are basically the images that the model was most confident in, but got wrong. Here we'll take the 5 most wrong images.

Again, in FastAI we can just call a function:
```
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(5)
```

But in PyTorch things are more complicated. We need to write the code ourselves to find and plot the top losses.

In [None]:
top_class_ims, predicted_labels, top_class_pr, true_labels = top_n_errors(model, device, test_loader, 5)

for i in range(5):
    plt.imshow(top_class_ims[i])
    plt.title('True class: %s \nPredicted class: %s \nProbability: %.8f'%(
        train_dataset.classes[true_labels[i]], train_dataset.classes[predicted_labels[i]], top_class_pr[i]))
    plt.show()
    print('')

# And that's it!

You've now hopefully seen how the basic structure of deep learning in PyTorch closely mirrors the structure in FastAI. 

If you haven't yet and feel up to it, I recommend going through this tutorial again and trying to unhide all the functions, to really engage with what PyTorch is all about and what makes it different to FastAI. This should give you a starting point for any deep learning project you feel like trying yourself.
<br></br>
If you want to go further on your own, the 2 main things you'll want to learn first are [how to create your own custom dataset](https://medium.com/analytics-vidhya/creating-a-custom-dataset-and-dataloader-in-pytorch-76f210a1df5d) and [how to create your own model](https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html).

Also, check out [Anaconda](https://www.anaconda.com/) and [Weights & Biases](https://wandb.ai/site), which make Python package management and logging way easier.
