## Visualization of CNN: Grad-CAM
* **Objective**: Convolutional Neural Networks are widely used in computer vision and are powerful for processing grid-like data. However, we hardly know how and why they work, because they cannot be decomposed into individually intuitive components. In this assignment, we use Grad-CAM, which highlights the regions of the input image that were important for the network's prediction.


* NB: if `PIL` is not installed, try `conda install pillow`.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, datasets, transforms
import matplotlib.pyplot as plt
import pickle
import urllib.request
import cv2
import numpy as np
from PIL import Image

%matplotlib inline

### Download the Model
We provide a `ResNet-34` model pretrained on the `ImageNet` classification dataset.
* **ImageNet**: a large dataset of photographs with 1,000 classes.
* **ResNet-34**: a deep residual architecture for image classification.

In [None]:
resnet34 = models.resnet34(pretrained=True)
resnet34.eval() # set the model to evaluation mode

![ResNet34](https://miro.medium.com/max/1050/1*Y-u7dH4WC-dXyn9jOG4w0w.png)


The input image must be of size 3x224x224.

The network starts with a convolution layer followed by max pooling, then four ResNet stages (`layer1` to `layer4`).

The output of the last ResNet stage has size 512x7x7.

Average pooling is applied to this output to obtain a 1D vector of 512 features, which is fed to a linear layer that outputs 1000 values (one per class). There is no softmax: the outputs are already the raw class scores!

In [None]:
classes = pickle.load(urllib.request.urlopen('https://gist.githubusercontent.com/yrevar/6135f1bd8dcf2e0cc683/raw/d133d61a09d7e5a3b36b8c111a8dd5c4b5d560ee/imagenet1000_clsid_to_human.pkl'))

# `classes` is a dictionary mapping each class index (0 to 999) to its human-readable name
print(classes)

### Input Images
We provide 20 images from ImageNet (download them from the course webpage, or directly with the command below).

To use the pretrained resnet34 model, each input image must be resized (here: shorter side resized to 256 pixels, then center-cropped to 224x224) and normalized with `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.

In [None]:
def preprocess_image(dir_path):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    dataset = datasets.ImageFolder(dir_path, transforms.Compose([
            transforms.Resize(256),      # resize the shorter side to 256 pixels
            transforms.CenterCrop(224),  # crop the central 224x224 region
            transforms.ToTensor(),       # convert the PIL image to a float tensor in [0, 1]
            normalize]))                 # normalize each channel with the ImageNet mean/std

    return dataset

In [None]:
# The images must be in a *sub*-folder of "data/" (e.g. data/TP2_images/image.jpg), not directly in "data/":
# ImageFolder treats each sub-folder as a class and would otherwise find no images.

import os
os.makedirs("data/TP2_images", exist_ok=True)  # no error if the cell is run again
!cd data/TP2_images && wget "https://www.lri.fr/~gcharpia/deeppractice/2023/TP2/TP2_images.zip" && unzip -o TP2_images.zip
dir_path = "data/"
dataset = preprocess_image(dir_path)

In [None]:
# show the original image
index = 5
input_image = Image.open(dataset.imgs[index][0]).convert('RGB')
plt.imshow(input_image)

In [None]:
output = resnet34(dataset[index][0].view(1, 3, 224, 224))
values, indices = torch.topk(output, 3)
print("Top 3-classes:", indices[0].numpy(), [classes[x] for x in indices[0].numpy()])
print("Raw class scores:", values[0].detach().numpy())
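Because resnet34 returns raw scores, a softmax is only needed if probabilities are wanted for display; Grad-CAM itself works on the raw score. Softmax is monotonic, so it does not change which classes are in the top 3. A short sketch, assuming random numbers in place of the real model output:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.randn(1, 1000)  # stand-in for the raw class scores returned by resnet34
values, indices = torch.topk(output, 3)

probs = F.softmax(output, dim=1)  # convert raw scores to probabilities over the 1000 classes
p_values, p_indices = torch.topk(probs, 3)

print(torch.equal(indices, p_indices))  # True: softmax preserves the ranking
print(round(probs.sum().item(), 4))     # 1.0
```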

### Grad-CAM
* **Overview:** Given an image and a category (e.g. ‘tiger cat’) as input, we forward-propagate the image through the model to obtain the `raw class scores` before softmax. The gradients are set to zero for all classes except the desired class (tiger cat), which is set to 1. This signal is then backpropagated to the `rectified convolutional feature maps` of interest, which are combined to compute the coarse Grad-CAM localization map.


* **To Do**: Define your own function Grad_CAM to visualize the given images. For each image, choose the top-3 predicted labels as the desired classes. Compare the heatmaps of the three classes and draw conclusions.


* **To be submitted within 2 weeks**: this notebook, **cleaned** (i.e. without outputs, to keep the file small: `Kernel > Restart & Clear Output`), in a state ready to be executed (running every cell in order should produce all the results for all images), with a few comments at the end. No additional report, just the notebook!


* **Hints**: 
 + To implement Grad-CAM, we need to record the output of the chosen feature maps and the gradient with respect to them (`grad_output`). PyTorch provides hooks for this purpose (e.g. `register_forward_hook` and `register_full_backward_hook` on an `nn.Module`). Read the tutorial on [hooks](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks) carefully.
 + The pretrained resnet34 model has no activation function after its last layer, so its output is indeed the `raw class scores` and can be used directly.
 + The feature maps are 7x7, so the heatmap has the same size. Upsample it onto the resized 224x224 image (not the original image, and before normalization) to make it easier to inspect. The function [`torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html?highlight=interpolate#torch.nn.functional.interpolate) may help.
 + Paper: [Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/pdf/1610.02391.pdf)
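The hook and interpolation hints can be tried on a toy network first (hypothetical, for illustration only, not ResNet-34): a forward hook stores the feature maps, a backward hook stores the gradient of one raw class score with respect to them, and `F.interpolate` upsamples the 7x7 map to 224x224. The sketch places `register_full_backward_hook` on a convolution whose output is not modified in place; in ResNet's BasicBlock the residual is added in place right after `bn2`, a pattern that can make full backward hooks on that module raise an error.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# Toy network standing in for ResNet-34 (hypothetical, for illustration only)
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),  # net[2]: the layer we inspect
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 5),
).eval()
target = net[2]

saved = {}
def forward_hook(module, inputs, output):
    saved["feat"] = output.detach()          # feature maps A^k, shape (1, 16, 7, 7)
def backward_hook(module, grad_input, grad_output):
    saved["grad"] = grad_output[0].detach()  # dy^c/dA^k, same shape as the features

h_f = target.register_forward_hook(forward_hook)
h_b = target.register_full_backward_hook(backward_hook)
try:
    scores = net(torch.randn(1, 3, 7, 7))   # raw scores, shape (1, 5)
    net.zero_grad()
    scores[0, 2].backward()                 # backpropagate the score of class 2 only
finally:
    h_f.remove()                            # remove hooks so later calls are unaffected
    h_b.remove()

alpha = saved["grad"].mean(dim=(2, 3), keepdim=True)             # one weight per map, (1, 16, 1, 1)
cam = F.relu((alpha * saved["feat"]).sum(dim=1, keepdim=True))  # weighted sum + ReLU, (1, 1, 7, 7)
cam = F.interpolate(cam, size=(224, 224), mode="bilinear", align_corners=False)
print(saved["feat"].shape, saved["grad"].shape, cam.shape)
```

Backpropagating `scores[0, 2]` alone is equivalent to the one-hot gradient signal described in the overview.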

Class: ‘pug, pug-dog’ | Class: ‘tabby, tabby cat’
- | - 
![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/dog.jpg)| ![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/cat.jpg)
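The Grad-CAM overview corresponds to the following two equations from the paper linked in the hints, where $y^c$ is the raw score of class $c$, $A^k$ is the $k$-th feature map of the chosen layer, and $Z$ is the number of spatial positions (for ResNet-34's last block: 512 maps of size $7 \times 7$, so $Z = 49$):

$$
\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j} \frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)
$$

Setting the gradient to 1 for class $c$ and 0 for every other class amounts to backpropagating $y^c$ alone.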

In [None]:
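def Grad_Cam(model, image, category):
  """Grad-CAM heatmap (224x224, values in [0, 1]) of class `category` for `image`,
  computed on the output of model.layer4[2].bn2."""
  feats = []
  grads = []
  def hook1(module, grad_input, grad_output):
    grads.append(grad_output[0].detach())  # gradient of the class score w.r.t. the bn2 output
  def hook2(module, inputs, output):
    feats.append(output.detach())          # bn2 activations, shape (1, 512, 7, 7)
  layer = model.layer4[2].bn2
  # The older (non-full) backward hook is kept: the residual is added in place right after
  # bn2, which can make register_full_backward_hook raise an error on this module.
  h1 = layer.register_backward_hook(hook1)
  h2 = layer.register_forward_hook(hook2)
  try:
    out = model(image)
    model.zero_grad()
    # Backpropagating the raw score alone = one-hot gradient (1 for `category`, 0 elsewhere)
    out[0, category].backward()
  finally:
    h1.remove()
    h2.remove()
  grads = grads[0][0].numpy()  # (512, 7, 7)
  feats = feats[0][0].numpy()  # (512, 7, 7)
  W = np.mean(grads, axis=(1, 2))                     # one weight per map: spatially averaged gradient
  hp = np.maximum(np.tensordot(W, feats, axes=1), 0)  # ReLU of the weighted sum of maps, (7, 7)
  hp = torch.from_numpy(hp).float()[None, None]       # (1, 1, 7, 7) for interpolate
  hp = F.interpolate(hp, size=(224, 224), mode='bilinear', align_corners=False)
  hp = hp[0, 0].numpy()
  return hp / (hp.max() + 1e-8)  # scale to [0, 1]; epsilon avoids 0/0 for an all-zero map

model = resnet34  # reuse the pretrained model loaded above (already in eval mode)
crop = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224)])
for i in range(len(dataset)):
  # Same resize + crop as the network input, without normalization, so the heatmap aligns
  image = np.asarray(crop(Image.open(dataset.imgs[i][0]).convert('RGB')), dtype=np.float32) / 255
  input_tensor = dataset[i][0].view(1, 3, 224, 224)
  with torch.no_grad():
    out = model(input_tensor)
  values, indices = torch.topk(out, 3)
  fig, axs = plt.subplots(1, 4, figsize=(12, 3))
  axs[0].imshow(image)
  axs[0].set_title('Original image ' + str(i + 1))
  for j in range(1, 4):
    cat = int(indices[0, j - 1])
    hp = Grad_Cam(model, input_tensor, cat)
    # applyColorMap returns BGR: convert to RGB before blending with the RGB image
    heat = cv2.cvtColor(cv2.applyColorMap(np.uint8(255 * hp), cv2.COLORMAP_JET), cv2.COLOR_BGR2RGB)
    heat = np.float32(heat) / 255
    new_image = np.uint8(255 * cv2.addWeighted(heat, 0.5, image, 0.5, 0.0))
    axs[j].imshow(new_image)
    axs[j].set_title(classes[cat].split(',')[0])
  plt.show()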


Selvaraju et al. ("Grad-CAM", 2019) apply Grad-CAM to the last convolutional layer. In this experiment I used the ResNet-34 architecture and applied Grad-CAM to the last BatchNorm layer (`layer4[2].bn2`), treating it as the last convolutional layer. When I compared the heatmaps produced on this final `bn2` layer with those produced on the last `conv2` layer, the `bn2` layer gave much better results.
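A short derivation (a sketch added for illustration, not taken from the paper) relates the two choices. Assume `bn2` runs in eval mode, so it applies a fixed per-channel affine map to the `conv2` output $x^k$: $A^k = s_k x^k + b_k$, with $s_k = \gamma_k / \sqrt{\sigma_k^2 + \epsilon} \neq 0$ and $b_k = \beta_k - s_k \mu_k$ (learned scale $\gamma_k$ and shift $\beta_k$, running mean $\mu_k$ and variance $\sigma_k^2$). The gradients and the Grad-CAM weights then differ only by the scale $s_k$:

$$
\frac{\partial y^c}{\partial x^k_{ij}} = s_k\,\frac{\partial y^c}{\partial A^k_{ij}}
\;\Rightarrow\;
\alpha_k^{c,\text{conv}} = s_k\,\alpha_k^{c,\text{bn}},
\qquad
\sum_k \alpha_k^{c,\text{conv}} x^k = \sum_k \alpha_k^{c,\text{bn}} \left(A^k - b_k\right) = \sum_k \alpha_k^{c,\text{bn}} A^k - \sum_k \alpha_k^{c,\text{bn}} b_k
$$

Under these assumptions, the two maps differ by the constant offset $\sum_k \alpha_k^{c,\text{bn}} b_k$ applied before the ReLU, so the ReLU keeps a different set of pixels and the max normalization then rescales them differently. This explains why the two heatmaps can look different; it does not by itself say which one is better.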

In the results above, the top-1 prediction and its heatmap are, in almost every case, the most accurate and the most closely related to the actual image content. In some cases, however, the second or third prediction is better than the first. For most samples, the three heatmaps show that the network looks at almost the same region for all three predictions: the predicted animals belong to the same mammalian family and share similar traits.

Some examples, however, give unexpected results. For image 8, the three heatmaps do not focus on the same locations: for the first two predictions the network concentrates on the foreground, whereas for the third it concentrates on the animal's back. Some misidentifications also occurred. For instance, in image 13 the network fails to detect the animal, as the heatmaps show: for all three classes it focuses on border regions outside the animal's actual position.

In conclusion, Grad-CAM is a useful tool for understanding the network's internal behavior and the factors it considers when making a prediction. By examining the heatmaps, we can see how the network's predictions change and how they relate to specific parts of the input image. In this task, Grad-CAM was key to revealing how the network distinguishes between animal species and breeds within the same family, by exposing the distinct traits it focuses on for each prediction.