<a href="https://colab.research.google.com/github/SelAw432/DeepLearning/blob/main/Practical2%20/WhenThingsGoWrong.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Learning Practical 2 - Part 2 - When things go wrong!
---

## Author : Amir Atapour-Abarghouei, amir.atapour-abarghouei@durham.ac.uk

This notebook will provide you with an exercise in identifying issues when building and training a simple neural network.

Copyright (c) 2023 Amir Atapour-Abarghouei, UK.

License : LGPL - http://www.gnu.org/licenses/lgpl.html

We will keep the entire setup and training loop in a single cell to make things easier. We will be using the AckBinks dataset we saw during the lecture.

There should be nothing new or surprising here, but make sure you go through and understand every part of the code. Ask questions if there is anything you don't understand.

In [None]:
!pip install livelossplot --quiet

import torch
import torch.nn as nn
import torchvision
from livelossplot import PlotLosses
import os.path

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# get the dataset:
if os.path.isdir('AckBinks'):
    print("Dataset has already been downloaded...")
else:
    !wget -q -O AckBinks.zip https://github.com/atapour/dl-pytorch/blob/main/2.Datasets/AckBinks/AckBinks.zip?raw=true
    !unzip -q AckBinks.zip
    !rm AckBinks.zip

# transform images for training:
train_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(128),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

print('created transforms for training set!')

# create train dataset:
train_dataset = torchvision.datasets.ImageFolder('AckBinks/train', train_transform)
print(f"There are {len(train_dataset)} images in the training set!")

class_names = train_dataset.classes
print(*class_names, sep = ", ")

# create dataloaders:
train_loader = torch.utils.data.DataLoader(train_dataset,
    batch_size=8, shuffle=True, num_workers=2)

# we will use a ResNet model here - you will learn about these later
# but you can see how easy CNNs are to use in PyTorch
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
num_infea = model.fc.in_features
model.fc = nn.Linear(num_infea, 1)
model = model.to(device)

# create the optimiser:
optimiser = torch.optim.Adam(model.parameters(), lr=0.001)

# since this is a binary problem, we will use binary cross entropy as the loss function;
# BCEWithLogitsLoss applies the sigmoid internally, so the model outputs raw logits
criterion = nn.BCEWithLogitsLoss()

# initialising epoch variable
epoch = 0

# to plot losses
liveloss = PlotLosses()

# to keep the logs for loss plots
logs = {}

# main training loop:
for epoch in range(5):

    for j, batch in enumerate(train_loader):

        x, y = batch
        x, y = x.to(device), y.to(device)

        output = model(x)
        y = y.reshape((y.shape[0], 1)).float()
        loss = criterion(output, y)

        model.zero_grad()
        optimiser.step()

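        # output holds raw logits (BCEWithLogitsLoss applies the sigmoid internally),
        # so a logit of 0 corresponds to a probability of 0.5
        # this is calculating the accuracy for the last batch
        # accumulating values or a running mean would be better
        y_pred = (output > 0).float()
        accuracy = (y == y_pred).sum().item() / y.size(0)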

    print(epoch)
    logs['Loss'] = loss.item()
    logs['Accuracy'] = accuracy
    liveloss.update(logs)
    liveloss.send()
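An aside on the loss used above: `nn.BCEWithLogitsLoss` expects raw logits rather than probabilities. The short sketch that follows uses made-up logits and targets (not our data) to check that it matches `nn.BCELoss` applied to sigmoid outputs, and that a probability threshold of 0.5 corresponds to a logit threshold of 0.

```python
import torch
import torch.nn as nn

# A few made-up raw model outputs (logits) and binary targets, for illustration only
logits = torch.tensor([[-2.0], [0.0], [0.3], [3.0]])
targets = torch.tensor([[0.0], [1.0], [0.0], [1.0]])

# BCEWithLogitsLoss applies the sigmoid internally...
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
# ...so it should match BCELoss computed on sigmoid probabilities
loss_on_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
print(loss_with_logits.item(), loss_on_probs.item())

# sigmoid(0) = 0.5, so thresholding logits at 0 is the same as
# thresholding probabilities at 0.5
pred_from_logits = logits > 0
pred_from_probs = torch.sigmoid(logits) > 0.5
print(torch.equal(pred_from_logits, pred_from_probs))  # True

# Thresholding logits at 0.5 is stricter: it means p > sigmoid(0.5), roughly 0.62
print((logits > 0.5).squeeze().tolist())  # [False, False, False, True]
```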

It should be clear from the plots that the model is not training properly. The question is why!

We can be fairly confident that the model is not the problem: it is built into PyTorch (via torchvision), and off-the-shelf architectures in stable versions of deep learning frameworks rarely go wrong.

The dataset is fine, so there must be something wrong with our training process.
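One quick experiment to test the hypothesis that the training process is at fault is to check whether a single optimisation step actually changes the weights. The following is a minimal sketch on a small stand-in network rather than our ResNet; the helper `parameters_changed`, the `toy_*` names and the random data are all hypothetical, introduced only for this illustration (they are named so as not to overwrite `model` or `optimiser` in the notebook).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def parameters_changed(net, step_fn):
    """Hypothetical helper: run step_fn() once and report whether any parameter of net changed."""
    before = [p.detach().clone() for p in net.parameters()]
    step_fn()
    return any(not torch.equal(b, p.detach()) for b, p in zip(before, net.parameters()))

# Small stand-in model and random data, for illustration only
toy_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
toy_optimiser = torch.optim.Adam(toy_model.parameters(), lr=0.001)
toy_criterion = nn.BCEWithLogitsLoss()
toy_x = torch.randn(16, 4)
toy_y = torch.randint(0, 2, (16, 1)).float()

def step_without_backward():
    toy_optimiser.zero_grad()
    toy_loss = toy_criterion(toy_model(toy_x), toy_y)  # forward pass only
    toy_optimiser.step()  # no .grad exists, so Adam skips every parameter

def step_with_backward():
    toy_optimiser.zero_grad()
    toy_loss = toy_criterion(toy_model(toy_x), toy_y)
    toy_loss.backward()   # populates .grad for every parameter
    toy_optimiser.step()

changed_without = parameters_changed(toy_model, step_without_backward)
changed_with = parameters_changed(toy_model, step_with_backward)
print(changed_without, changed_with)  # False True
```

The same snapshot-and-compare idea can be applied to any model: if the weights never move, the optimiser has nothing to work with.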

For the model to train, its gradients must be calculated. So let's look at the gradients of one of our layers:

In [None]:
print(model.conv1.weight.grad)

You should be getting a `None` value. Is that expected? Why is this happening? What could be the cause?

It might help to understand how gradients are calculated in PyTorch. Have a look at the following tutorial, as it might shed some light on the issue:

https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
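As a minimal illustration of how autograd fills in gradients, independent of our model, consider a toy tensor (the names `w_demo`, `x_demo` and `out_demo` are made up for this sketch). Its `.grad` attribute stays `None` until a backward pass is run.

```python
import torch

# A leaf tensor that autograd will track because requires_grad=True
w_demo = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x_demo = torch.tensor([4.0, 5.0, 6.0])

print(w_demo.grad)  # None: no backward pass has happened yet

# The forward pass records operations in the graph but does not compute gradients
out_demo = (w_demo * x_demo).sum()
print(w_demo.grad)  # still None

# backward() traverses the graph and accumulates d(out_demo)/d(w_demo) into w_demo.grad
out_demo.backward()
print(w_demo.grad)  # tensor([4., 5., 6.])
```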

By this point, you should be familiar with the scientific method of debugging.

And of course there are lots of resources (e.g. tutorials, documentation, discussion forums) to help you find issues. For instance, take a look at the following thread. It does not describe exactly the issue we have, but it should give you an idea of what has happened:

https://discuss.pytorch.org/t/how-to-print-the-computed-gradient-values-for-a-network/34179
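A common way to inspect gradients is to loop over `named_parameters()` and look at each parameter's `.grad`. The following is a minimal sketch on a small stand-in network; `demo_model`, its random data and the helper `report_gradients` are hypothetical and used only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
demo_x = torch.randn(8, 4)
demo_y = torch.randint(0, 2, (8, 1)).float()

def report_gradients(net):
    """Hypothetical helper: print each parameter's name, shape and gradient norm (or None)."""
    for name, param in net.named_parameters():
        if param.grad is None:
            print(f"{name}: shape {tuple(param.shape)}, grad is None")
        else:
            print(f"{name}: shape {tuple(param.shape)}, grad norm {param.grad.norm().item():.4f}")

loss_demo = nn.BCEWithLogitsLoss()(demo_model(demo_x), demo_y)
report_gradients(demo_model)  # every grad is None before backward()
loss_demo.backward()
report_gradients(demo_model)  # now every parameter has a gradient
```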



Hopefully, by this point, you will have resolved the issue and can now see gradients.

Try to access the gradients of the parameters of our network and plot a histogram of the gradients to see how large or small they are.

In [None]:
# access gradient and try to plot histogram
# ...
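One possible approach is to flatten every parameter's gradient into a single 1-D tensor and bin it. The sketch below uses a small stand-in network rather than our ResNet (`hist_model` and its random data are hypothetical), and uses `torch.histc` so that it needs nothing beyond PyTorch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hist_model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
hist_x = torch.randn(32, 10)
hist_y = torch.randint(0, 2, (32, 1)).float()

# One forward/backward pass so that every parameter has a .grad
nn.BCEWithLogitsLoss()(hist_model(hist_x), hist_y).backward()

# Flatten every parameter's gradient into one 1-D tensor
all_grads = torch.cat([p.grad.detach().flatten()
                       for p in hist_model.parameters() if p.grad is not None])

print(f"{all_grads.numel()} gradient values, "
      f"min {all_grads.min().item():.2e}, max {all_grads.max().item():.2e}")

# With the default min=0 and max=0, histc bins between the data's own minimum and maximum
counts = torch.histc(all_grads, bins=20)
print(counts)
```

To draw the plot itself, matplotlib's `plt.hist(all_grads.cpu().numpy(), bins=50)` works on the same flattened tensor; for our ResNet, the same loop over `model.parameters()` applies once gradients are actually being computed.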