Assigment by: Hans Martin Aannestad

# Getting acquainted with pre-trained deep neural networks and PyTorch

### General instructions

Each week you will be given an assignment related to the associated module. You have roughly one week to complete and submit each of them. There are 3 weekly group sessions available to help you complete the assignments, you are invited to attend one of them each week (please choose one group session and stick to it unless exceptional case). Attendance is not mandatory but recommended. However, **assignments are graded** each week and not submitting them or submitting them after the deadline will give you no points.

This jupyter notebook is the first weekly assignment of this course. We strongly recommend you to use anaconda to manage your python packages as well as for installing jupyter if it is not already done. For this course we will use PyTorch so you probably need to install some of the packages imported in the first cell of this notebook. 

If you are new to jupyter notebook please follow this [link](https://realpython.com/jupyter-notebook-introduction/) or any beginner's tutorial that would suit you well.

In this notebook you will be guided since it is the first assignment. At the beginning of this notebook, most cells just need to be run and you just have to read (and click on the links). Later, you will have to complete the code (when there are "TODOS" and "..."), answer the questions (you can either create a new cell below the questions or answer directly in the same cell as the questions) and submit an **archive** to MittUiB containing
- your completed notebook
- the file '``list_labels.txt``'
- your images (in the '``imgs``' folder)

Please include **your name** in your submitted file so that graders can easily distinguish between all students files

Submit your archive by **Sunday 31st, 23:59.**

### Introduction

This week we will get a quick idea of what a deep neural network looks like and what it is capable of when it comes to image classification tasks. To do so we will play with a pre-trained neural network (ResNet101). In addition we will make our first steps in PyTorch.

## Contents:

1. Deep neural network in PyTorch

  1.1 Pre-trained deep neural network models available in PyTorch  
  1.2. ResNet architecture and ResNet101 model  
  1.3 Neural network implementation in PyTorch
  1.4 Define a preprocess pipeline using PyTorch's transforms
  
2. Making predictions with a pre-trained network

  2.1 Load and preprocess an image  
  2.2 Extract labels with which ResNet was trained  
  2.3 Interpreting the output
  2.4 Top-1 and Top-5 errors

  
3. Playing with the ResNet model

4. Good to know

In [12]:
import pandas as pd

In [13]:
from torchvision import models

In [14]:
import torch
from torchvision import models
from torchvision import transforms
from PIL import Image
from os import listdir

## 1. Deep neural network in PyTorch

### 1.1 Pre-trained deep neural network models available in PyTorch

As written in the documentation:

> The [torchvision.models](https://pytorch.org/docs/stable/torchvision/models.html#torchvision-models) subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection and video classification. \[...\] It provides pre-trained models,

In [5]:
print(dir(models))
print(len(dir(models)))

['AlexNet', 'DenseNet', 'GoogLeNet', 'GoogLeNetOutputs', 'Inception3', 'InceptionOutputs', 'MNASNet', 'MobileNetV2', 'ResNet', 'ShuffleNetV2', 'SqueezeNet', 'VGG', '_GoogLeNetOutputs', '_InceptionOutputs', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_utils', 'alexnet', 'densenet', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'detection', 'googlenet', 'inception', 'inception_v3', 'mnasnet', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'mobilenet', 'mobilenet_v2', 'quantization', 'resnet', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext50_32x4d', 'segmentation', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0', 'shufflenetv2', 'squeezenet', 'squeezenet1_0', 'squeezenet1_1', 'utils', 'vgg', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'video', 'wide_resnet101_2', 'wide_r

### 1.2 ResNet architecture and ResNet101 model

[ResNet](https://arxiv.org/abs/1512.03385) is a deep residual neural network that aims at classifying images. We can get an idea of its architecture by simply using ``print``

In [15]:
resnet = models.resnet101(pretrained=True)   # 101 means that we choose the ResNet architecture with 101 layers
print(resnet)        # Instance of a ResNet model

track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats

**QUESTIONS**

1. If we want to use 256x256 RGB images as input, what should be the dimension of the input layer of the neural network? 
2. If we have 1000 different labels (e.g cat, dog, mouse, goose, etc) what should be the dimension of the output layer of the neural network

1. ANSWER: (width x height x RGB channels) = (256 x 256 x 3) dimensional
2. ANSWER: Output layer contains the label, one-hot encoded (one for each cat, dog, mouse, goose, etc), so n different labels gives n dimensions, here 1000.

### 1.3 Neural network implementation in PyTorch

In PyTorch all neural networks should be a class that is itself a subclass of the [PyTorch's torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module) class

In the next 2 cells we'll show that [ResNet](https://pytorch.org/docs/stable/torchvision/models.html#id10) class is indeed a subclass of ``torch.nn.Module``

In [19]:
print(type(resnet))  # Find the python class of this instance (models.resnet.ResNet)

<class 'torchvision.models.resnet.ResNet'>


Now that we know that our model is an instance of the class ``torchvision.models.resnet.ResNet`` we can check if this is a subclass of [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module)

In [20]:
resnet_class = models.resnet.ResNet 
print(issubclass(resnet_class, torch.nn.Module))  # Show that the ResNet class is indeed a subclass of 'torch.nn.Module'

True


### 1.4 Define a preprocess pipeline using PyTorch's transforms

The [torchvision.transforms](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision-transforms) module can easily performs the most common image transformations such as [resize](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Resize), [normalize](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Normalize), etc. 

In addition, this module allow us to quickly define pipelines of basic preprocessing functions using the [transforms.Compose](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Compose) method.

Thus in the following cell we define the pre-processing transformations that will be used on our input images.

In [21]:
preprocessor = transforms.Compose([
    transforms.Resize(256),     # Resize to a 256x256 image
    transforms.CenterCrop(224), # Crop the center (usually where the interesting object is)
    transforms.ToTensor(),      # PyTorch's counterpart of Numpy's arrays
    transforms.Normalize(       # Normalize input the same way ResNet training inputs were normalized 
    mean=[0.485, 0.456, 0.406], ### Mean given to match what was presented to ResNet during training
    std=[0.229, 0.224, 0.225]   ### Same here
)])

## 2. Making predictions with a pre-trained network

### 2.1 Load and preprocess an image 

In PyTorch, the data are stored in [tensors](https://pytorch.org/docs/stable/tensors.html#torch.Tensor). This is the counterpart of Numpy's arrays and most of the methods that are available with numpy arrays are also available with tensors. (e.g 
[size](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.size), 
[amax](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.amax), 
[argmax](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.argmax), 
[sort](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.sort), 
[abs](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.abs), 
[cos](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.cos), 
[sum](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.sum) etc.)

In [22]:
img = Image.open("imgs/Bobby.jpeg")
img_t = preprocessor(img)             # Preprocess our image using our preprocessor ('t' stands for 'tensor')
batch_t = torch.unsqueeze(img_t, 0)   # Reshape so that it is a batch (of size 1) as required by the model 

### 2.2 Extract labels with which ResNet was trained

In [23]:
# Read all the labels with which ResNet was trained and store them in the list 'labels'
with open('list_labels.txt') as f:
    labels = [line.strip() for line in f.readlines()]

As stated in the PyTorch documentation: 

> "Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use [model.train()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train) or [model.eval()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval) (from the [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module)) as appropriate.

In [24]:
resnet.eval()  # Pytorch method to indicate that we are now using the model to make predictions and not to train it anymore 
print(" ")

 


Now we are ready to make some predictions on our images! 
Let's show the output of the resnet model given our image of Bobby the Golden Retriever

**QUESTION** 

1. Print the dimension of the tensor ``out`` using the [Tensor.size()](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.size) method
2. Does it match your previous answer about the output dimension? 

In [28]:
out = resnet(batch_t)
#print(out)

print("\n Size: ", out.size())


 Size:  torch.Size([1, 1000])


### 2.3 Interpreting the output

You don't know what to do with that right? How do you know if this output tensor means that the image is a dog or a cat or something else? 

Well that's actually simple. The first idea would be to find the most activated output unit, that is to say, the index of max value and find the label with the corresponding index. To do so we use the [torch.max](https://pytorch.org/docs/stable/generated/torch.max.html?highlight=max#torch.max) function




In [29]:
_, index = torch.max(out, 1)
print(
    "Index: ",index,  
    "\nLabel: ", labels[index], 
    "\nOutput value: ", out[0, index]
    ) 

Index:  tensor([207]) 
Label:  golden retriever 
Output value:  tensor([15.6744], grad_fn=<IndexBackward>)


But now the question is how to interpret this output value? How can we say if the model hesitates between this label and another one? 

We would like to convert this tensor value into something that could be interpreted as the confidence that the model has in its prediction. To do so, we use the [softmax](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.softmax) function which normalizes our outputs to \[0, 1\] and divide by the sum. 

For more information about the SoftMax function, watch the videos by Andrew Ng: 
- [Softmax Regression (C2W3L08)](https://www.youtube.com/watch?v=LLux1SW--oM)
- [Training Softmax Classifier (C2W3L09)](https://www.youtube.com/watch?v=ueO_Ph0Pyqk)

**QUESTION** 

1. Find the index corresponding to the max value of ``out`` **Hint:** Look at the previous cell 

In [31]:

_, index = torch.max(out, 1)
confidences = torch.nn.functional.softmax(out, dim=1)[0]
percentages = confidences * 100
print(
    "Label: ",labels[index[0]], 
    "\nConfidence: ", round(percentages[index[0]].item(), 2), "%")



Label:  golden retriever 
Confidence:  96.29 %


### 2.4 Top-1 and Top-5 errors

When evaluating a image classifier we often use the terms *Top-1 error* and *Top-5 error* 

f the classifier’s top guess is the correct answer (e.g., the highest score is for the “dog” class, and the test image is actually of a dog), then the correct answer is said to be in the Top-1.

If the correct answer is at least among the classifier’s top 5 guesses, it is said to be in the Top-5.

The top-1 score is the conventional accuracy, that is to say its checks if the top class (the one having the highest confidence) is the same as the target label. This is what we have done in the cell above

On the other hand, the top-5 score checks if the target label is one of your top 5 predictions (the 5 ones with the highest confidences). To do so we use the [torch.sort](https://pytorch.org/docs/stable/generated/torch.sort.html#torch-sort) function

**QUESTIONS**

1. Complete the code below **Hint:** Look at how we preprocessed the first image Bobby 
2. Does the model seem confident about the first prediction?

2. Answer: 96.29 % for the first prediction (golden retriever) and near zero for the rest, seems highly confident.

In [34]:
num_preds = 5

img = Image.open("imgs/golden_retriever_online.jpeg")
img_t = preprocessor(img) 
batch_t = torch.unsqueeze(img_t, 0)

out = resnet(batch_t)

percentages = confidences * 100
_, indices = torch.sort(out, descending=True)

results = [(labels[idx], round(percentages[idx].item(), 2)) for idx in indices[0][:num_preds]]
for i_pred in range(num_preds):
    print(
        i_pred, ": ",
        "\nLabel: ", results[i_pred][0], 
        "\nConfidence: ",  results[i_pred][1],"%"
        )

0 :  
Label:  golden retriever 
Confidence:  96.29 %
1 :  
Label:  cocker spaniel, English cocker spaniel, cocker 
Confidence:  0.28 %
2 :  
Label:  tennis ball 
Confidence:  0.12 %
3 :  
Label:  Pembroke, Pembroke Welsh corgi 
Confidence:  0.0 %
4 :  
Label:  Irish setter, red setter 
Confidence:  0.03 %


## 3. Playing with the ResNet model

Put all the images that you want in the 'imgs/' (could be personal pictures or taken from the internet)

**QUESTIONS**

1. Complete the code below so that for each image it prints the 5 best guests according to the model
2. When the image is a dog, what are usually the 1st, 2nd, 3rd guesses? 
3. Use one of your personal picture of an object whose label is in the list of labels.
4. Try to find an image on the web whose label is in the list of labels but whose corresponding prediction is wrong. How can you try to make it difficult for the model to recognize the object? 
5. Try to find an image on the web whose label is NOT in the list of labels with which the model was trained. Look at the output, is it consistent even though it is necesseraly wrong? 

In [None]:
2. Answer: For images of dogs the list of guesses seems to be consistently different types of dogs

In [36]:
# ------------------------------
# Load inputs
# ------------------------------

# Load all the images in the 'imgs/' folder
list_img_t = []                  # Where input tensors will be stored
path_imgs = 'imgs/'   
list_files = listdir('imgs/')    # Find all filenames in the 'imgs/' folder
for f in list_files:
    img = Image.open(path_imgs + f)
    img = img.convert('RGB')  # Because some of the images are in the RGBA format while ResNet requires a RGB format
    img_t = preprocessor(img)
    list_img_t.append(torch.unsqueeze(img_t, 0) )

# ------------------------------
# Make predictions
# ------------------------------
num_preds = 5
for i, batch_t in enumerate(list_img_t):
    print("\n ====== ", list_files[i], " ====== ")

    out = resnet(batch_t)
    confidences = torch.nn.functional.softmax(out, dim=1)[0]
    percentages = confidences * 100

    _, indices = torch.sort(out, descending=True)
    results = [(labels[idx], round(percentages[idx].item(), 2)) for idx in indices[0][:num_preds]]
    for i_pred in range(num_preds):
        print(
            i_pred, ": ",
            "\nLabel: ", results[i_pred][0], 
            "\nConfidence: ",  results[i_pred][1],"%"
            )



0 :  
Label:  Eskimo dog, husky 
Confidence:  20.56 %
1 :  
Label:  chow, chow chow 
Confidence:  20.46 %
2 :  
Label:  Samoyed, Samoyede 
Confidence:  20.06 %
3 :  
Label:  malamute, malemute, Alaskan malamute 
Confidence:  14.73 %
4 :  
Label:  golden retriever 
Confidence:  8.75 %

0 :  
Label:  golden retriever 
Confidence:  96.29 %
1 :  
Label:  Labrador retriever 
Confidence:  2.81 %
2 :  
Label:  cocker spaniel, English cocker spaniel, cocker 
Confidence:  0.28 %
3 :  
Label:  redbone 
Confidence:  0.21 %
4 :  
Label:  tennis ball 
Confidence:  0.12 %

0 :  
Label:  dingo, warrigal, warragal, Canis dingo 
Confidence:  28.91 %
1 :  
Label:  chow, chow chow 
Confidence:  8.2 %
2 :  
Label:  Siberian husky 
Confidence:  7.86 %
3 :  
Label:  Eskimo dog, husky 
Confidence:  6.67 %
4 :  
Label:  Pembroke, Pembroke Welsh corgi 
Confidence:  5.65 %

0 :  
Label:  golden retriever 
Confidence:  97.04 %
1 :  
Label:  cocker spaniel, English cocker spaniel, cocker 
Confidence:  0.48 %
2 :

## 4. Good to know
- In PyTorch, data are stored in [tensors](https://pytorch.org/docs/stable/tensors.html#torch.Tensor). This is the counterpart of Numpy's array and most of the methods that are available with numpy arrays are also available with tensors. (e.g 
[size](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.size), 
[amax](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.amax), 
[argmax](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.argmax), 
[sort](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.sort), 
[abs](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.abs), 
[cos](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.cos), 
[sum](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.sum) etc.)
- In PyTorch all neural networks should be a class that is itself a subclass of the PyTorch's [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#module) class
- There are many well-known deep neural network architectures available in the [torchvision.models](https://pytorch.org/docs/stable/torchvision/models.html?highlight=models) sub-package. 
  - For each of these architecture a pre-trained model is available. 
  - Some of them such as the ResNet architecture even have multiple pre-trained model instances of different depths. Thus from the [ResNet](https://pytorch.org/docs/stable/torchvision/models.html#id10) class, we have [resnet18](https://pytorch.org/docs/stable/torchvision/models.html#torchvision.models.resnet18), [resnet50](https://pytorch.org/docs/stable/torchvision/models.html#torchvision.models.resnet50), [resnet101](https://pytorch.org/docs/stable/torchvision/models.html#torchvision.models.resnet101), etc.
- During the preprocessing, we can use the [torchvision.transforms](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision-transforms) module to perform the most common image transformations
- Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, we use [model.train()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train) or [model.eval()](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval)
- Top-1 and Top-5 scores are commenly used in image classification
- When there are more than 2 possible classes we often use the [SoftMax]((https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.softmax)) in the output layer to convert the output tensor vales into confidence values.