## Visualization of CNN: Grad-CAM
* **Objective**: Convolutional Neural Networks are widely used in computer vision. They are powerful for processing grid-like data. However, we hardly know how and why they work, because they cannot be decomposed into individually intuitive components. In this assignment, we use Grad-CAM, which highlights the regions of the input image that were important for the neural network's prediction.


* NB: if `PIL` is not installed, try `conda install pillow`.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, datasets, transforms
import matplotlib.pyplot as plt
import pickle
import urllib.request
import matplotlib as mpl
import numpy as np
from PIL import Image
import cv2

%matplotlib inline

### Download the Model
We provide a pretrained `ResNet-34` model for the `ImageNet` classification dataset.
* **ImageNet**: A large dataset of photographs with 1 000 classes.
* **ResNet-34**: A deep architecture for image classification.

In [None]:
resnet34 = models.resnet34(pretrained=True) # recent torchvision versions deprecate `pretrained` in favor of the `weights` argument
resnet34.eval() # set the model to evaluation mode

![ResNet34](https://miro.medium.com/max/1050/1*Y-u7dH4WC-dXyn9jOG4w0w.png)


The input image must be of size 3x224x224.

The network starts with a convolution layer followed by max pooling.
It then applies 4 ResNet stages (`layer1` to `layer4`), each made of several residual blocks.

The output of the last stage (`layer4`) is of size 512x7x7.

Average pooling is applied to this output to obtain a 1D array of 512 features, which is fed to a linear layer that outputs 1000 values (one per class). There is no softmax here: the outputs are already the raw class scores.
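
The shape bookkeeping of this final stage can be checked with a minimal NumPy sketch. The random feature maps, the weight matrix `W` and the bias `b` below are hypothetical stand-ins, not the pretrained parameters; only the shapes mirror the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the output of the last stage: (channels, height, width).
feature_maps = rng.standard_normal((512, 7, 7))

# Global average pooling: one scalar per channel, giving a vector of 512 features.
pooled = feature_maps.mean(axis=(1, 2))

# Hypothetical linear layer mapping 512 features to 1000 raw class scores (no softmax).
W = rng.standard_normal((1000, 512)) * 0.01
b = np.zeros(1000)
scores = W @ pooled + b

print(pooled.shape, scores.shape)
```

Because the pooling is a plain spatial mean, each of the 49 positions of a feature map contributes equally to the class scores, which is part of what makes the 7x7 maps a natural place to look for localization.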

In [None]:
classes = pickle.load(urllib.request.urlopen('https://gist.githubusercontent.com/yrevar/6135f1bd8dcf2e0cc683/raw/d133d61a09d7e5a3b36b8c111a8dd5c4b5d560ee/imagenet1000_clsid_to_human.pkl'))

# classes is a dictionary mapping each class index to its name
print(classes)

### Input Images
We provide 20 images from ImageNet (the download link is on the course webpage; they can also be downloaded directly with the `wget` command used below).<br>
To use the pretrained resnet34 model, the input image must be resized to `(224, 224)` and normalized using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.

In [None]:
def preprocess_image(dir_path):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    dataset = datasets.ImageFolder(dir_path, transforms.Compose([
            transforms.Resize(256), # resize the shorter side to 256 pixels
            transforms.CenterCrop(224), # crop the central 224x224 region
            transforms.ToTensor(), # convert the PIL image to a tensor with values in [0, 1]
            normalize])) # normalize each channel

    return (dataset)
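
To make the normalization step concrete, here is a small NumPy sketch of the per-channel arithmetic. It is only an illustration: the constant 3x2x2 image is a hypothetical input, and the code mimics the formula `(x - mean) / std` rather than calling torchvision.

```python
import numpy as np

# Per-channel statistics, reshaped to broadcast over (channels, height, width).
mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

# Hypothetical 3x2x2 image with values in [0, 1], as a tensor conversion would produce.
image = np.full((3, 2, 2), 0.5)

# Normalization: subtract the channel mean, divide by the channel standard deviation.
normalized = (image - mean) / std

# Inverse operation, useful to display a normalized image again.
restored = normalized * std + mean

print(normalized[:, 0, 0])
```

The inverse step is what one would apply to show the 224x224 network input behind a heatmap instead of reloading the file from disk.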

In [None]:
# The images should be in a *sub*-folder of "data/" (ex: data/TP2_images/images.jpg) and *not* directly in "data/"!
# otherwise the function won't find them

import os
os.makedirs("data/TP2_images", exist_ok=True) # does not fail if the folders already exist
!cd data/TP2_images && wget "https://www.lri.fr/~gcharpia/deeppractice/2023/TP2/TP2_images.zip" && unzip TP2_images.zip
dir_path = "data/" 
dataset = preprocess_image(dir_path)

In [None]:
# show the original image
index = 0
input_image = Image.open(dataset.imgs[index][0]).convert('RGB')
plt.imshow(input_image)

In [None]:
output = resnet34(dataset[index][0].view(1, 3, 224, 224))
values, indices = torch.topk(output, 3)
print("Top 3-classes:", indices[0].numpy(), [classes[x] for x in indices[0].numpy()])
print("Raw class scores:", values[0].detach().numpy())
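
Since the network outputs raw scores, they can be turned into probabilities with a softmax when that is easier to interpret. The sketch below is a NumPy illustration with hypothetical score values; Grad-CAM itself works on the raw scores.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

raw_scores = np.array([12.3, 9.8, 7.1, -2.0])  # hypothetical raw class scores
probabilities = softmax(raw_scores)

print(probabilities)
```

The ranking of the classes is unchanged by the softmax, so the top-3 classes are the same whether they are chosen from the raw scores or from the probabilities.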

### Grad-CAM 
* **Overview:** Given an image and a category (‘tiger cat’) as input, we forward-propagate the image through the model to obtain the `raw class scores` before softmax. The gradient signal is set to zero for all classes except the desired class (tiger cat), for which it is set to 1. This signal is then backpropagated to the `rectified convolutional feature map` of interest, from which we compute the coarse Grad-CAM localization (blue heatmap).


* **To Do**: Define your own function Grad_CAM to achieve the visualization of the given images. For each image, choose the top-3 possible labels as the desired classes. Compare the heatmaps of the three classes, and conclude. 


* **To be submitted within 2 weeks**: this notebook, **cleaned** (i.e. without results, for file size reasons: `menu > kernel > restart and clean`), in a state ready to be executed (if one just presses 'Enter' till the end, one should obtain all the results for all images) with a few comments at the end. No additional report, just the notebook!


* **Hints**: 
 + We need to record the output and grad_output of the feature maps to compute Grad-CAM. PyTorch provides hooks for this purpose. Read the [hook](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks) tutorial carefully. 
 + The pretrained resnet34 model has no activation function after its last layer: its output is the `raw class scores`, which you can use directly. 
 + The feature maps are of size 7x7, so your heatmap will have the same size. To visualize it better, project the heatmap onto the resized image (224x224, before normalization, not the original one). The function [`torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html?highlight=interpolate#torch.nn.functional.interpolate) may help.  
 + Here is the link to the paper [Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/pdf/1610.02391.pdf)
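
For reference, the two quantities computed by Grad-CAM can be written as follows, in the notation of the paper linked above: $A^k$ is the $k$-th feature map of the chosen layer, $y^c$ is the raw score of class $c$, and $Z$ is the number of spatial positions (here $7 \times 7 = 49$).

$$
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left( \sum_k \alpha_k^c \, A^k \right)
$$

The coefficient $\alpha_k^c$ measures how important feature map $k$ is for class $c$, and the ReLU keeps only the regions with a positive influence on that class.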

Class: ‘pug, pug-dog’ | Class: ‘tabby, tabby cat’
- | - 
![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/dog.jpg)| ![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/cat.jpg)

# Our implementation

In [None]:
#Functions used in the backward and forward hooks, to record the gradient and the values at the output of layer 4.
def get_grad(module,grad_input,grad_output,list_grad):
  #grad_output[0] is the gradient of the backpropagated signal with respect to the output of the hooked layer.
  list_grad.append(grad_output[0].detach().squeeze().numpy())

def get_value(module,inp,out,list_value):
  list_value.append(out.detach().squeeze().numpy())
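
As a self-contained illustration of how forward and backward hooks record activations and gradients, here is a minimal sketch on a toy network. The network `toy_model`, its layer sizes and the chosen class index are hypothetical, and `register_full_backward_hook` is used here as one possible option; this is not the notebook's ResNet-34 setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network: conv features -> global average pooling -> linear scores.
toy_model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 5),
)
toy_model.eval()
target_layer = toy_model[1]  # layer whose feature maps we want to inspect

activations, gradients = [], []

# Forward hook signature: (module, inputs, output).
forward_handle = target_layer.register_forward_hook(
    lambda module, inputs, output: activations.append(output.detach()))
# Backward hook signature: (module, grad_input, grad_output); grad_output[0] is the
# gradient of the backpropagated signal with respect to the layer's output.
backward_handle = target_layer.register_full_backward_hook(
    lambda module, grad_input, grad_output: gradients.append(grad_output[0].detach()))

scores = toy_model(torch.randn(1, 3, 8, 8))
one_hot = torch.zeros_like(scores)
one_hot[0, 2] = 1.0  # select class 2 (arbitrary choice for the illustration)
scores.backward(one_hot)

# Remove the hooks once the values have been recorded.
forward_handle.remove()
backward_handle.remove()

print(activations[0].shape, gradients[0].shape)
```

In this toy network the score is a linear function of the pooled features, so the recorded gradient is constant over each feature map and equal to the corresponding linear weight divided by the number of positions.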

In [None]:
class GradCam:
  def __init__(self,model,classes):
    self.model = model
    self.classes = classes
    self.list_grad = []#list to store the gradients recorded at layer 4 for the top 3 classes.
    self.list_value =  []#list to store the values output by layer 4.
    self.labels = []#list to keep the labels of the top 3 classes.
    self.importance_feature = np.zeros((3,512,1,1))#array that will contain the importance coefficients of each feature map.
    self.heat_map = np.zeros((3,7,7))
    self.hooks = []

  def set_hook(self):
    self.hooks.append(self.model.layer4.register_backward_hook((lambda module,inp,out : get_grad(module,inp,out,list_grad=self.list_grad))))
    self.hooks.append(self.model.layer4.register_forward_hook((lambda module,inp,out : get_value(module,inp,out,list_value=self.list_value))))

  def made_map(self,input):
    output = self.model(input)#Forward pass with the model given to the constructor: raw class scores of size (1,1000).
    _, indices = torch.topk(output, 3)
    indices = indices.numpy().reshape(-1)

    for indice in indices:
      self.labels.append(self.classes[indice])#Store the label of the class.

      #One-hot signal: 1 for the desired class, 0 for all the others.
      grad = torch.zeros(1,1000)
      grad[:,indice] = 1
      output.backward(grad,retain_graph=True)#Triggers the backward hook, which stores the gradient.

    for i in range(len(self.list_grad)):
      #Importance coefficients of each feature map (formula given in class): spatial mean of its gradient. [:,None,None] gives an array of size (512,1,1).
      self.importance_feature[i]=(self.list_grad[i].mean(axis=(1,2)))[:,None,None]
      #Heat map before interpolation: weighted sum of the feature maps (there is a single forward pass, hence list_value[0]).
      self.heat_map[i] =(self.importance_feature[i]*self.list_value[0]).sum(axis=0)


  def remove_hook(self):
    for hook in self.hooks:
      hook.remove()
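
The two steps that turn gradients and activations into a heatmap can be followed by hand on a tiny example. The sketch below uses hypothetical values for 2 feature maps of size 2x2 instead of 512 maps of size 7x7; the arithmetic is the same.

```python
import numpy as np

# Hypothetical activations and gradients for 2 feature maps of size 2x2.
activations = np.array([[[1.0, 2.0], [3.0, 4.0]],
                        [[1.0, 0.0], [0.0, 1.0]]])
gradients = np.array([[[0.5, 0.5], [0.5, 0.5]],
                      [[-1.0, -1.0], [-1.0, -1.0]]])

# Importance of each feature map: spatial mean of its gradient.
weights = gradients.mean(axis=(1, 2))  # shape (2,)

# Weighted sum of the feature maps, then ReLU to keep positive evidence only.
raw_map = (weights[:, None, None] * activations).sum(axis=0)
heat_map = np.maximum(raw_map, 0.0)

print(weights)   # [ 0.5 -1. ]
print(heat_map)  # [[0.  1. ] [1.5 1. ]]
```

The top-left position is pushed to zero because the second feature map, which has a negative importance, is active there.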


In [None]:
def Grad_CAM(model,image,input):
  img = np.array(Image.open(image).convert('RGB'))
  img = np.float32(cv2.resize(img, (224, 224)))/255
  input = input.view(1, 3, 224, 224)
  
  #Build the heatmaps of size 7x7 for the top 3 classes.
  grad_cam = GradCam(model,classes)
  grad_cam.set_hook()
  grad_cam.made_map(input)
  grad_cam.remove_hook()

  #Interpolate each heatmap to get an image of size 224x224, apply a colormap and ensure that the result is in RGB format.
  colored_maps = []
  for heat_map in grad_cam.heat_map:
    heat_map = torch.from_numpy(heat_map/np.max(heat_map))[None,None]
    heat_map = F.relu(F.interpolate(heat_map,size=(224,224),mode='bilinear',align_corners=False)).squeeze().numpy()
    heat_map = cv2.applyColorMap(np.uint8(255 * heat_map), cv2.COLORMAP_JET)
    colored_maps.append(cv2.cvtColor(heat_map, cv2.COLOR_BGR2RGB))
  heat_map_1, heat_map_2, heat_map_3 = colored_maps

  alpha=0.6
  
  #Overlay image and heatmaps
  plt.figure(figsize=(20,20))
  plt.subplot(1,4,1)
  plt.imshow(img)
  plt.subplot(1,4,2)
  plt.title(grad_cam.labels[0].split(',')[0])
  plt.imshow(img)
  plt.imshow(heat_map_1,alpha=alpha)
  plt.subplot(1,4,3)
  plt.title(grad_cam.labels[1].split(',')[0])
  plt.imshow(img)
  plt.imshow(heat_map_2,alpha=alpha)
  plt.subplot(1,4,4)
  plt.title(grad_cam.labels[2].split(',')[0])
  plt.imshow(img)
  plt.imshow(heat_map_3,alpha=alpha)

  plt.show()
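
The upsampling step can be isolated in a short sketch. It is a variant under stated assumptions rather than the function above: the 7x7 map is random (hypothetical), the ReLU is applied before the normalization, and the maximum is clamped to avoid a division by zero when the map has no positive value.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical coarse heatmap of size 7x7, with positive and negative values.
coarse = torch.randn(7, 7)

# Keep positive evidence only, then rescale to [0, 1] (the clamp guards against a zero maximum).
coarse = F.relu(coarse)
coarse = coarse / coarse.max().clamp(min=1e-8)

# interpolate expects a (batch, channels, height, width) tensor.
upsampled = F.interpolate(coarse[None, None], size=(224, 224),
                          mode="bilinear", align_corners=False)[0, 0]

print(upsampled.shape, float(upsampled.min()), float(upsampled.max()))
```

Bilinear interpolation only averages neighbouring values, so the upsampled map stays within [0, 1] and can be passed directly to a colormap.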


## 20 Examples

In [None]:
#Compute and display Grad-CAM on the 20 images.
for i in range(20):
  Grad_CAM(resnet34,dataset.imgs[i][0],dataset[i][0])

## Commented pictures

In [None]:
Grad_CAM(resnet34,dataset.imgs[2][0],dataset[2][0])

In the first three guesses, one can see that the neural network clearly identifies the position of the dog. The differences lie in the regions that were mostly exploited. 
The third one mainly focuses on the head of the dog. This part is indeed very similar to that of a Shih-tzu. 
The second one focuses mainly on the back of the dog, which is uniformly brown, as it is for the Old English sheepdog.
The first one focuses on the head and the front of the body of the dog. Tibetan terriers often have brown fur on top and white fur elsewhere. Grad-CAM makes us wonder whether this characteristic is what made the network choose this breed.

In [None]:
Grad_CAM(resnet34,dataset.imgs[7][0],dataset[7][0])

This example demonstrates the usefulness of such algorithms. One can clearly see that the changes in the output come from the object the network is looking at: when the heatmap is centered on the head of the first dog, the label is "great dane" (image 1), and when it is centered on the head of the second dog, the label is "Chesapeake Bay Retriever" (image 3). In both cases, however, the network looks only at the head of a dog, so it is not surprising that it lacks the information needed for a confident prediction.

In [None]:
Grad_CAM(resnet34,dataset.imgs[15][0],dataset[15][0])

In this example, the second highest score is for the "cowboy boot" class. Indeed, the heat map shows that the activation is maximal in an area with a kind of "L" shape, like a cowboy boot. This gives a plausible explanation of why the model assigns such a score here.