
## Visualization of CNN: Grad-CAM
* **Objective**: Convolutional neural networks (CNNs) are widely used in computer vision and are powerful for processing grid-like data. However, we hardly know how and why they work, because they cannot be decomposed into individually intuitive components. In this assignment, we use Grad-CAM, which highlights the regions of the input image that were important for the network's prediction.


* NB: if `PIL` is not installed, try `conda install pillow`.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, datasets, transforms
import matplotlib.pyplot as plt
import pickle
import urllib.request
import numpy as np
from PIL import Image

%matplotlib inline

### Download the Model
We provide a pretrained `ResNet-34` model for the `ImageNet` classification dataset.
* **ImageNet**: A large dataset of photographs with 1 000 classes.
* **ResNet-34**: A deep architecture for image classification.

In [None]:
resnet34 = models.resnet34(pretrained=True)
resnet34.eval() # set the model to evaluation mode

![ResNet34](https://miro.medium.com/max/1050/1*Y-u7dH4WC-dXyn9jOG4w0w.png)


The input image must be of size 3x224x224.

The network starts with a convolution layer followed by max pooling, then four groups of residual blocks (`layer1` to `layer4`).

The output of the last residual block is of size 512x7x7.

Average pooling reduces this output to a vector of 512 features, which is fed to a linear layer that outputs 1000 values (one per class). The model has no final softmax, so its outputs are already the raw class scores.
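
The 7x7 spatial size follows from the strides along the network. Below is a minimal sketch of that arithmetic, assuming the kernel, stride and padding values of the standard torchvision ResNet-34 definition; the helper `conv_out` is illustrative and not part of any library. `layer1` is omitted because it keeps the resolution unchanged.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) of the layers that reduce the resolution.
# Values assumed from the standard torchvision ResNet-34 definition.
downsampling_layers = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]

size = 224
sizes = {}
for name, kernel, stride, padding in downsampling_layers:
    size = conv_out(size, kernel, stride, padding)
    sizes[name] = size
    print(f"{name}: {size}x{size}")
```

Each of the five layers halves the resolution, so the overall reduction factor is 32 and a 224x224 input ends up as a 7x7 map.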

In [None]:
classes = pickle.load(urllib.request.urlopen('https://gist.githubusercontent.com/yrevar/6135f1bd8dcf2e0cc683/raw/d133d61a09d7e5a3b36b8c111a8dd5c4b5d560ee/imagenet1000_clsid_to_human.pkl'))

# classes is a dictionary mapping each class index to its name
print(classes)

### Input Images
We provide 20 images from ImageNet (download them from the course webpage, or directly with the command in the cell below).<br>
To use the pretrained resnet34 model, the input image should be resized to `(224, 224)` and normalized using `mean = [0.485, 0.456, 0.406]` and `std = [0.229, 0.224, 0.225]`.
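
As an illustration of what this normalization does, here is a small NumPy sketch, assuming a channel-first float image with values in [0, 1]. The function names `normalize` and `denormalize` are hypothetical helpers, and the random array merely stands in for a real image.

```python
import numpy as np

# ImageNet channel statistics quoted above
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(image_chw):
    """Normalize a float image of shape (3, H, W) with values in [0, 1]."""
    return (image_chw - mean[:, None, None]) / std[:, None, None]

def denormalize(tensor_chw):
    """Invert normalize(), e.g. to display a preprocessed image."""
    return tensor_chw * std[:, None, None] + mean[:, None, None]

rng = np.random.default_rng(0)
image = rng.random((3, 224, 224), dtype=np.float32)  # stand-in for a real image
normalized = normalize(image)
restored = denormalize(normalized)
print(normalized.shape, float(np.abs(restored - image).max()))
```

The inverse transform is useful when a heatmap has to be drawn over the image that the network actually received.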

In [None]:
# Preprocessing the images
def preprocess_image(dir_path):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

    dataset = datasets.ImageFolder(dir_path, transforms.Compose([
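            transforms.Resize(256), # resize the shorter side to 256 pixels
            transforms.CenterCrop(224), # crop the central 224x224 region
            transforms.ToTensor(), # convert the PIL image to a tensor with values in [0, 1]
            normalize])) # normalize each channel with the ImageNet statistics

    return dataset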

In [None]:
# The images should be in a *sub*-folder of "data/" (ex: data/TP2_images/images.jpg) and *not* directly in "data/"!
# otherwise the function won't find them

import os
os.makedirs("data/TP2_images", exist_ok=True)  # does not fail if the folders already exist
!cd data/TP2_images && wget "https://www.lri.fr/~gcharpia/deeppractice/2023/TP2/TP2_images.zip" && unzip TP2_images.zip
dir_path = "data/"
# Delete the zip file, which is not needed anymore (relative path, so this also works outside Colab)
os.remove("data/TP2_images/TP2_images.zip")
dataset = preprocess_image(dir_path)

In [None]:
# Show the original image
index = 5
input_image = Image.open(dataset.imgs[index][0]).convert('RGB')
plt.imshow(input_image)

In [None]:
# Forward pass without tracking gradients, since only the scores are needed here
with torch.no_grad():
    output = resnet34(dataset[index][0].view(1, 3, 224, 224))
values, indices = torch.topk(output, 3)
# Indices of the top three predicted classes
top_3_classes = indices[0].numpy()
# Labels of the top three predicted classes
top_3_names = [classes[x] for x in top_3_classes]
print("Top 3-classes:", top_3_classes, top_3_names)
print("Raw class scores:", values[0].numpy())
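
The values printed above are raw scores, not probabilities. If probabilities are wanted, a softmax can be applied afterwards. The sketch below uses NumPy and made-up scores (not results from the model) to show that the softmax leaves the ranking of the classes unchanged.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1D array of raw scores."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

# Made-up raw class scores for illustration only
raw_scores = np.array([12.3, 10.1, 9.8, 2.0])
probs = softmax(raw_scores)

# Indices of the three largest scores, best first
top3_by_score = np.argsort(raw_scores)[::-1][:3]
top3_by_prob = np.argsort(probs)[::-1][:3]
print(top3_by_score, top3_by_prob)
```

Because the ranking is preserved, choosing the top-3 classes from the raw scores or from the probabilities gives the same classes.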

### Grad-CAM
* **Overview:** Given an image and a category (e.g. ‘tiger cat’) as input, we forward-propagate the image through the model to obtain the `raw class scores` before softmax. The gradient signal at the output is set to zero for all classes except the desired class (tiger cat), for which it is set to 1. This signal is then backpropagated to the `rectified convolutional feature map` of interest, from which we compute the coarse Grad-CAM localization (blue heatmap).


* **To Do**: Define your own function Grad_CAM to achieve the visualization of the given images. For each image, choose the top-3 possible labels as the desired classes. Compare the heatmaps of the three classes, and conclude.


* **To be submitted within 2 weeks**: this notebook, **cleaned** (i.e. without results, for file size reasons: `menu > kernel > restart and clean`), in a state ready to be executed (if one just presses 'Enter' till the end, one should obtain all the results for all images) with a few comments at the end. No additional report, just the notebook!


* **Hints**:
 + We need to record the output and grad_output of the feature maps to achieve Grad-CAM. In PyTorch, hooks are provided for this purpose. Read the tutorial on [hooks](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks) carefully.
 + The pretrained model resnet34 doesn't have an activation function after its last layer, so its output is the `raw class scores`, which you can use directly.
 + The size of the feature maps is 7x7, so your heatmap will have the same size. You need to project the heatmap onto the resized image (224x224, not the original one, before the normalization) for easier inspection. The function [`torch.nn.functional.interpolate`](https://pytorch.org/docs/stable/nn.functional.html?highlight=interpolate#torch.nn.functional.interpolate) may help.
 + Here is a link to the paper: [Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/pdf/1610.02391.pdf)
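
The hints above can be combined into a hook-based sketch. This is only one possible way to do it, shown on a hypothetical `TinyCNN` stand-in so that no pretrained weights are needed. The function name `grad_cam` is illustrative, and recording the gradient with a tensor hook registered inside the forward hook is an assumption of this sketch, not a requirement of the assignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """Hypothetical stand-in for ResNet-34: conv features, global average pooling, linear head."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.fc(x)       # raw class scores, no softmax

def grad_cam(model, target_layer, image, class_idx):
    """Return a heatmap in [0, 1] with the spatial size of `image`."""
    store = {}

    def save_gradient(grad):
        store["gradients"] = grad

    def forward_hook(module, inputs, output):
        store["activations"] = output
        # Tensor hook: called with d(score)/d(output) during backward
        output.register_hook(save_gradient)

    handle = target_layer.register_forward_hook(forward_hook)
    try:
        scores = model(image)
        model.zero_grad()
        # Backpropagate only the score of the desired class
        scores[0, class_idx].backward()
    finally:
        handle.remove()

    activations = store["activations"].detach()                 # (1, C, h, w)
    weights = store["gradients"].mean(dim=(2, 3), keepdim=True)  # (1, C, 1, 1)
    cam = F.relu((weights * activations).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)  # small constant avoids division by zero
    return cam[0, 0]

torch.manual_seed(0)
model = TinyCNN().eval()
image = torch.rand(1, 3, 32, 32)  # random stand-in for a preprocessed image
heatmap = grad_cam(model, model.features, image, class_idx=2)
print(heatmap.shape)
```

For the model of this notebook, a call such as `grad_cam(resnet34, resnet34.layer4, image, class_idx)` should play the same role, although that particular call is not tested here.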

Class: ‘pug, pug-dog’ | Class: ‘tabby, tabby cat’
- | -
![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/dog.jpg)| ![alt](https://raw.githubusercontent.com/jacobgil/pytorch-grad-cam/master/examples/cat.jpg)

In [None]:
import cv2


# Get the parameters of the last fully-connected layer as a list
params = list(resnet34.fc.parameters())
# Store the weights of the last fully-connected layer in a numpy array for easy indexing by output class.
# The shape of weight is (1000, 512)
weight = np.squeeze(params[0].data.numpy())
print('weight.shape', weight.shape)


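# Custom Grad-CAM implementation function.
# Because ResNet-34 ends with global average pooling followed by a single linear layer,
# the gradient of a class score with respect to a feature map is the corresponding
# fully-connected weight divided by the number of spatial positions (7*7).
# The Grad-CAM channel weights are therefore the fc weights up to a constant factor,
# which the min-max scaling below removes.
# Note that the final ReLU of Grad-CAM is not applied here.
def return_CAM(feature_conv, weight, class_idx):
  """
  Generate the class activation map for one class and up-sample it to 224x224.
  Arguments:
  feature_conv: the feature maps of the last convolutional layer of the model, as a numpy array of shape (1, 512, 7, 7)
  weight: the weights of the last fully-connected layer, of shape (1000, 512)
  class_idx: the index of the class for which we want to generate the heatmap
  Returns a list containing one uint8 array of shape (224, 224).
  """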


  # We process one image at a time, so the output of the last convolutional layer has shape (1, 512, 7, 7)
  bz, nc, h, w = feature_conv.shape

  # The heatmap is upscaled to 224x224 so that it can be superimposed on the input image
  size_upsample = (224, 224)

  output_cam = []

  # The shape of beforeDot is (512, 49)
  beforeDot = feature_conv.reshape((nc, h*w))

  # The weights of the desired class in the last fully-connected layer are multiplied with beforeDot.
  # This gives a feature map in which only the activations relevant to the desired class are highlighted.
  # The shape of cam is (512,) x (512, 49) = (49,)
  cam = np.matmul(weight[class_idx], beforeDot)

  # cam is reshaped to (7, 7)
  cam = cam.reshape(h, w)
  # Scale the values of cam to the range 0 to 255
  cam = cam - np.min(cam)
  cam_img = cam / np.max(cam)
  cam_img = np.uint8(255 * cam_img)
  # Resize cam_img to the target size
  output_cam.append(cv2.resize(cam_img, size_upsample))
  return output_cam

# Defining a hook function to get the feature maps of the last convolutional layer
def hook_fn(module, input, output):
    # Get the feature maps of the last convolutional layer
    global feature_maps
    feature_maps = output
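
Using the fully-connected weights in place of gradients is justified for a network that ends with global average pooling and one linear layer. The following NumPy sketch checks this numerically on a toy model with random values (all sizes and names are assumptions made for illustration): the averaged gradients equal the class weights divided by the number of spatial positions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_channels, h, w, num_classes = 4, 3, 3, 5
A = rng.random((num_channels, h, w))                  # toy feature maps
W = rng.standard_normal((num_classes, num_channels))  # toy fc weights
b = rng.standard_normal(num_classes)                  # toy fc bias

def class_scores(feature_maps):
    """Global average pooling followed by a linear layer."""
    pooled = feature_maps.mean(axis=(1, 2))
    return W @ pooled + b

c = 2  # class of interest

# Gradient of the class score w.r.t. every feature map value (central finite differences)
eps = 1e-6
grad = np.zeros_like(A)
for idx in np.ndindex(A.shape):
    plus, minus = A.copy(), A.copy()
    plus[idx] += eps
    minus[idx] -= eps
    grad[idx] = (class_scores(plus)[c] - class_scores(minus)[c]) / (2 * eps)

alpha = grad.mean(axis=(1, 2))                       # Grad-CAM channel weights
cam = np.tensordot(W[c], A, axes=1)                  # weight-based map, shape (h, w)
grad_cam_map = np.maximum(np.tensordot(alpha, A, axes=1), 0)  # Grad-CAM with ReLU

print(np.allclose(alpha, W[c] / (h * w), atol=1e-6))
```

The two maps therefore differ only by the constant factor `1 / (h * w)` and by the ReLU, which sets negative values to zero.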



In [None]:
# There are 20 pictures in the test directory
for index in range(20):

  # Load the image and apply the same resize and centre crop as the model input,
  # so that the displayed image is aligned with the heatmap
  pil_img = Image.open(dataset.imgs[index][0]).convert('RGB')
  pil_img = transforms.CenterCrop(224)(transforms.Resize(256)(pil_img))
  rgb_img = np.float32(np.array(pil_img))/255 # Convert the image to a float32 numpy array in [0, 1]


  # Plotting the original image
  fig, axs = plt.subplots(1, 4, figsize=(20, 5))
  axs[0].imshow(rgb_img)
  axs[0].set_title('Original Image')

  image = dataset[index][0].view(1,3,224,224)

  # Attach the hook to the last convolutional block, so that a single forward pass
  # gives both the class scores and the feature maps
  handle = resnet34.layer4[2].register_forward_hook(hook_fn)
  with torch.no_grad():
    output = resnet34(image)
  # Detach the hook after getting the feature maps
  handle.remove()

  values, indices = torch.topk(output, 3)
  # Getting the top 3 predicted class indices
  top_3_classes = indices[0].numpy()
  # Getting the top 3 predicted class names
  top_3_names = [classes[x] for x in top_3_classes]


  for n, i in enumerate(top_3_classes, start=1):

    # We have to specify the target (i) we want to generate the class activation map for.
    grayscale_cam = return_CAM(feature_maps.detach().numpy(), weight, i) # generate the CAM for the input image
    heatmap = cv2.applyColorMap(grayscale_cam[0], cv2.COLORMAP_JET) # generate the heatmap after processing the CAM
    heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB) # OpenCV returns BGR, matplotlib expects RGB
    # Project the heatmap of the target layer onto the input image after pre-processing (224x224x3)
    # A small fraction of the heatmap colour gradient is superimposed on the image
    projection = rgb_img + heatmap*0.003
    projection = projection - np.min(projection)
    projection = projection / np.max(projection)
    projection = np.uint8(255 * projection)

    # Plot figure
    axs[n].imshow(projection)
    axs[n].set_title(top_3_names[n-1])


**Comments on the observation of our heatmaps**:


We can see that our custom Grad-CAM algorithm places the red/yellow/blue circular highlights around the object relatively accurately. For example, in the 2nd row of the images above, all three heatmaps put the red highlight on the animal, regardless of the label (porcupine, marmoset, sloth bear).


In addition, our Grad-CAM algorithm shifts its focus towards the relevant object depending on the class. For example, the 6th row of the images above shows two cats: based on our manual visual comparison of the photo with Google Images, the left cat appears to be an Egyptian cat and the right cat a tiger cat. When the predicted label is "Egyptian cat", the heatmap is red on the head of the left cat. When the predicted label is "tiger cat", the red region is concentrated on the hip of the right cat, an area with distinct black and yellow stripes that make the cat look like a tiger (we think this is where the name "tiger cat" comes from).


Similarly, the 8th row of the images above shows two dogs: based on our manual visual comparison of the photo with Google Images, the top dog appears to be a Chesapeake Bay retriever and the bottom dog a Great Dane. The heatmap is red on the face of the bottom dog when the class is Great Dane, and on the face of the top dog when the class is Chesapeake Bay retriever.

These examples suggest that the model classifies the images by focusing on the relevant regions, which supports the explainability of its predictions.