
## XAI Workshop

This is a tutorial on commonly used XAI techniques for image classification.

We use the Python programming language for this tutorial, so we expect you to have sufficient experience with Python. We also expect good proficiency with deep learning frameworks; we will be using ```pytorch``` for all of the coding tasks below.

You will implement three algorithms, each defined as a class: RISE, CAM and GradCAM, as discussed in the lecture. Some snippets have already been written to help you. The saliency maps you generate should have values between $0$ and $1$, with $0$ marking the least important regions and $1$ the most important. RISE explains all classes at once, so the map it returns has shape ```(number of classes, H, W)```; CAM and GradCAM take the class to explain as an argument, so their maps naturally have shape ```(H, W)```. The CAM and GradCAM maps must be returned as NumPy arrays rather than torch tensors. RISE should return a torch tensor, because the cell that calls it converts the result with ```.cpu().numpy()```. Write your code between the ```###START CODE HERE###``` and ```###END CODE HERE###``` markers.
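One simple way to meet the $[0, 1]$ requirement is min-max normalization. The helper below is a minimal sketch under that assumption; `normalize_map` is a hypothetical name, not part of ```xai_utils```. For the RISE output of shape ```(number of classes, H, W)``` you may prefer to apply it to each class map separately so that every map spans $[0, 1]$.

```python
import numpy as np


def normalize_map(saliency, eps=1e-8):
    """Min-max normalize a saliency map (any shape) to the range [0, 1].

    Hypothetical helper for illustration; not part of xai_utils.
    """
    saliency = np.asarray(saliency, dtype=np.float32)
    low, high = saliency.min(), saliency.max()
    # eps avoids a division by zero when the map is constant
    return (saliency - low) / (high - low + eps)


example = normalize_map(np.array([[2.0, 4.0], [6.0, 10.0]]))
print(example)
```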

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torchvision
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
from skimage.transform import resize

We have created a module with auxiliary functions for tasks such as sampling an image and viewing a saliency map.

In [None]:
!git clone https://github.com/agarwalsiddhant10/xai_isact_2021.git
import xai_isact_2021.xai_utils as xai_utils

In [None]:
# Loading a random image
random_image_path = xai_utils.sample_random_image()
image_tensor = xai_utils.read_tensor(random_image_path).float().cuda()

# Creating an instance of a pretrained resnet50 model
model = torchvision.models.resnet50(True)
model.eval().cuda()

# View the predictions on the sampled image
logits = model(image_tensor)
class_to_explain = np.argmax(logits.cpu().detach().numpy())
xai_utils.tensor_imshow(image_tensor, logits)

### 1. Randomized Input Sampling for Explanation (RISE)
[[Paper](https://arxiv.org/pdf/1806.07421.pdf)]

The figure below gives a brief overview of the algorithm.
![RISE](https://raw.githubusercontent.com/agarwalsiddhant10/xai_isact_2021/main/imgs/rise.png)

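The first step is to generate random masks. RISE samples binary grids of size $(s, s)$, in which each cell is kept with probability $p_1$, then bilinearly upsamples each grid to slightly more than the input size and crops it at a random offset, so that the mask edges fall at different positions in different masks. We have already written the code for this step.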
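To make the sizes used in ```generate_masks``` concrete, here is the arithmetic for the values used later in this notebook (a 224x224 input and $s = 7$). The random generator and the all-zeros placeholder for the upsampled grid are only for illustration.

```python
import numpy as np

# Values used later in this notebook: 224x224 input, 7x7 random grid
input_size = np.array([224, 224])
s = 7

# Size of one grid cell once upsampled to the input resolution
cell_size = np.ceil(input_size / s).astype(int)
# Upsample to one extra cell so that a shifted crop still covers the whole input
up_size = (s + 1) * cell_size

# Random shift in [0, cell_size), then crop back to the input size
rng = np.random.default_rng(0)
x = rng.integers(0, cell_size[0])
y = rng.integers(0, cell_size[1])
upsampled = np.zeros(tuple(up_size))  # placeholder for an upsampled grid
crop = upsampled[x:x + input_size[0], y:y + input_size[1]]
print(cell_size, up_size, (x, y), crop.shape)
```

Because the shift is smaller than one cell and the upsampled grid is one cell larger than the input, every crop has the full input size.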

The next step is to apply these masks to the input, compute the class scores for each masked image, and use these scores as weights to linearly combine the masks.
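Following the RISE paper, the saliency map for class $c$ is estimated as a score-weighted sum of the masks, normalized by the number of masks $N$ and the expected mask value $\mathbb{E}[M]$, which equals $p_1$ for these masks:

$$S_c = \frac{1}{\mathbb{E}[M]\, N} \sum_{i=1}^{N} f_c(I \odot M_i)\, M_i$$

The sketch below applies this estimator with NumPy. The scoring function `toy_model` is hypothetical (a stand-in for the network's class scores, e.g. softmax probabilities), and the masks are plain per-pixel Bernoulli masks without the upsampling step, to keep it short. It illustrates the combination step only and is not a drop-in ```forward``` for the class below.

```python
import numpy as np


def rise_saliency(scores, masks, p1):
    """Score-weighted sum of masks, normalized by N * p1.

    scores: (N, CL) class scores f_c(I * M_i) of each masked image
    masks:  (N, H, W) masks M_i
    returns (CL, H, W) saliency maps
    """
    N, H, W = masks.shape
    sal = scores.T @ masks.reshape(N, H * W)  # (CL, H*W): sum_i f_c(I * M_i) * M_i
    return sal.reshape(-1, H, W) / (N * p1)


def toy_model(images):
    """Hypothetical stand-in for a classifier: class 0 'looks at' the
    top-left 2x2 block, class 1 at the bottom-right 2x2 block."""
    return np.stack([images[:, :2, :2].mean(axis=(1, 2)),
                     images[:, -2:, -2:].mean(axis=(1, 2))], axis=1)


rng = np.random.default_rng(0)
N, p1 = 4000, 0.5
image = np.ones((8, 8))
masks = (rng.random((N, 8, 8)) < p1).astype(np.float64)  # no upsampling, for brevity
scores = toy_model(image[None] * masks)                  # (N, 2)
sal = rise_saliency(scores, masks, p1)                   # (2, 8, 8)
```

Pixels inside the block a class depends on receive a higher saliency for that class than pixels elsewhere.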

In [None]:
class RISE(nn.Module):
    def __init__(self, model, input_size):
        super(RISE, self).__init__()
        self.model = model
        self.input_size = input_size

    def generate_masks(self, N, s, p1):
        '''
        Args:
                N (int): Number of masks
                s (int): Size of the random mask (this will later be upscaled to the input size)
                p1 (float): probability that a grid cell is kept (unmasked)
                
        Note: We have written this function. Do have a look at the different operations for 
        generating the random masks.
        '''
        
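        # Size of one grid cell after upsampling, and size of the upsampled grid
        # (one extra cell so that a randomly shifted crop still covers the input)
        cell_size = np.ceil(np.array(self.input_size) / s).astype(int)
        up_size = (s + 1) * cell_size

        # Create N random binary grids: each cell is 1 (unmasked) with probability p1
        grid = np.random.rand(N, s, s) < p1
        grid = grid.astype('float32')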

        # Create an empty numpy array to store the upsampled masks
        self.masks = np.empty((N, *self.input_size))

        # Iterate over the masks
        for i in tqdm(range(N), desc='Generating filters'):
            # Random shifts
            x = np.random.randint(0, cell_size[0])
            y = np.random.randint(0, cell_size[1])
            # Linear upsampling and cropping
            self.masks[i, :, :] = resize(grid[i], up_size, order=1, mode='reflect',
                                         anti_aliasing=False)[x:x + self.input_size[0], y:y + self.input_size[1]]
            
        # Convert the masks to torch tensor and store it
        self.masks = self.masks.reshape(-1, 1, *self.input_size)
        self.masks = torch.from_numpy(self.masks).float()
        self.masks = self.masks.cuda()
        self.N = N
        self.p1 = p1

    def forward(self, x):
        with torch.no_grad():
            ###START CODE HERE###

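            # Apply each of the N masks to the image (element-wise product)
            
            # Compute the predictions for the masked images
            # (tip: process them in batches to avoid running out of GPU memory)
            
            # Number of classes
            
            # Linearly combine the masks using the prediction scores as weights

            # Final processing. Remember, the saliency maps must have shape (CL, H, W),
            # where CL is the number of classes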

            ###END CODE HERE###

Once the RISE class is defined, it can easily be used to compute the saliency maps, as shown in the following code. We will display the saliency map that explains the top predicted class.

In [None]:
explainer = RISE(model, (224, 224))
explainer.generate_masks(1000, 7, 0.1)
salmaps = explainer(image_tensor)
salmaps = salmaps.cpu().numpy()

xai_utils.show_saliency(image_tensor, salmaps[class_to_explain])

### 2. Class Activation Mapping
[[Paper](http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf)]

Unlike RISE, CAM is a white-box explanation technique: it uses the weights and activations of the network to compute the importance map.

![CAM](https://raw.githubusercontent.com/agarwalsiddhant10/xai_isact_2021/main/imgs/cam.png)

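As shown in the figure, CAM takes the activations of the last convolutional feature map and weights each channel by the corresponding weight of the linear layer that follows it (the layer used for classification). CAM therefore only applies to architectures in which the last feature map is followed by global average pooling and a single linear classifier, as in ResNet-50, where ```layer4``` feeds ```avgpool``` and then ```fc```. This dependence on the architecture is a typical drawback of white-box techniques.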
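In the notation of the CAM paper, with $f_k(x, y)$ the activation of channel $k$ at position $(x, y)$ and $w_k^c$ the linear-layer weight connecting channel $k$ to class $c$, the class activation map is:

$$M_c(x, y) = \sum_k w_k^c\, f_k(x, y)$$

The NumPy sketch below computes this weighted sum for a single image on made-up toy arrays; `cam_from_features` is a hypothetical helper. For the notebook's ResNet-50 with a 224x224 input, the features would be the 2048x7x7 output of ```layer4``` and the weight matrix ```model.fc.weight``` of shape (1000, 2048), and the resulting 7x7 map would still need to be upsampled to the image size.

```python
import numpy as np


def cam_from_features(features, fc_weight, class_idx):
    """CAM for one image (hypothetical helper).

    features:  (K, h, w) activations of the last convolutional feature map
    fc_weight: (num_classes, K) weight matrix of the final linear layer
    returns an (h, w) map normalized to [0, 1]
    """
    cam = np.tensordot(fc_weight[class_idx], features, axes=1)  # sum_k w_k^c * f_k(x, y)
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)


# Toy example: two channels, each active in a different corner
features = np.zeros((2, 3, 3))
features[0, 0, 0] = 1.0  # channel 0 fires in the top-left corner
features[1, 2, 2] = 1.0  # channel 1 fires in the bottom-right corner
fc_weight = np.array([[1.0, 0.0],   # class 0 relies on channel 0
                      [0.0, 1.0]])  # class 1 relies on channel 1

cam0 = cam_from_features(features, fc_weight, 0)
cam1 = cam_from_features(features, fc_weight, 1)
```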

You will need a hook to save the activations during the forward pass. Hooks are functions that PyTorch calls during the forward or the backward pass, so there are two kinds of hooks, one for each pass. CAM only requires a forward hook. In PyTorch, a forward hook is registered on a module (an ```nn.Module``` such as ```model.layer4```, which you can look up by name with ```dict(model.named_modules())[feature_name]```) using ```<module>.register_forward_hook(<hook>)```. The hook is then called as ```hook(module, input, output)``` each time the module's forward pass finishes.
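The sketch below shows a forward hook on a tiny network with the layout CAM needs (convolutional features, then global average pooling, then one linear layer). The network is hypothetical and only stands in for ResNet-50; in the notebook, the hooked module would be ```model.layer4```.

```python
import torch
import torch.nn as nn

# Tiny CNN: conv features -> global average pooling -> linear classifier
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
)

saved = {}


def save_activation(module, inputs, output):
    # Called after module.forward; keep a detached copy of the feature map
    saved["features"] = output.detach()


# Forward hooks are registered on modules (nn.Module), not on tensors
handle = model[1].register_forward_hook(save_activation)

x = torch.randn(1, 3, 8, 8)
logits = model(x)
handle.remove()  # stop recording once the activations are saved
```

Keeping the returned handle lets you remove the hook later, which avoids stacking duplicate hooks when a cell is re-run.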

In [None]:
class CAM(nn.Module):
    def __init__(self, model, feature_name):
        super(CAM, self).__init__()
        self.model = model
        self.feature_name = feature_name

        ###START CODE HERE###
        # Create and register a forward hook for saving the activations of the final feature map

        # Extract the weights of the final linear layer (for ResNet-50: model.fc.weight, shape (1000, 2048))
        
        ###END CODE HERE###

    
    def forward(self, input_tensor, class_idx):
        ###START CODE HERE###

        # Forward Pass

        # Construct the CAM as a weighted sum of the activation channels, using the weights of class_idx
        
        # Upsample the CAM to the image size, normalize it to [0, 1] and return it as a NumPy array
        
        ###END CODE HERE###


Now run the following cell to check that your class works.

In [None]:
explainer = CAM(model, 'layer4')
cam = explainer(image_tensor, class_to_explain)
xai_utils.show_saliency(image_tensor, cam)

CAM opened up a new family of techniques that build saliency maps from a network's internal activations, and it is the simplest of them. GradCAM is an extension of CAM that no longer requires a specific architecture.

### 3. GradCAM
[[Paper](https://arxiv.org/pdf/1610.02391.pdf)]

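GradCAM is similar to CAM, but instead of weighting the activations with the weights of the linear layer, it weights them with gradients: the score of the explained class is backpropagated to the chosen feature map, and the gradients are averaged over the spatial positions to give one weight per channel. A ReLU is then applied to the weighted sum so that only regions with a positive influence on the class are kept.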
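In the notation of the GradCAM paper, with $A^k$ the $k$-th channel of the chosen feature map, $y^c$ the (pre-softmax) score of class $c$ and $Z$ the number of spatial positions:

$$\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A^k_{ij}}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$

The NumPy sketch below applies these two formulas to activations and gradients that are assumed to be already available; `gradcam_from_grads` is a hypothetical helper and the arrays are made-up toy values.

```python
import numpy as np


def gradcam_from_grads(activations, gradients):
    """Grad-CAM map from (K, h, w) activations and gradients (hypothetical helper).

    Returns an (h, w) map scaled to [0, 1].
    """
    alpha = gradients.mean(axis=(1, 2))             # (K,) spatially averaged gradients
    cam = np.tensordot(alpha, activations, axes=1)  # sum_k alpha_k * A^k
    cam = np.maximum(cam, 0.0)                      # ReLU: keep positive evidence only
    return cam / (cam.max() + 1e-8)


# Toy example: channel 0 supports the class, channel 1 counts against it
activations = np.zeros((2, 3, 3))
activations[0, 0, 0] = 1.0
activations[1, 2, 2] = 1.0
gradients = np.stack([np.ones((3, 3)), -np.ones((3, 3))])

out = gradcam_from_grads(activations, gradients)
```

The region driven by the negatively weighted channel is set to zero by the ReLU instead of appearing as important.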

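The class will therefore look very similar to the CAM class. Since GradCAM also uses gradients, you need both a forward hook (to save the activations) and a backward hook (to save the gradients of the class score with respect to those activations). A backward hook can be registered with ```<module>.register_full_backward_hook(<hook>)```, where the hook is called as ```hook(module, grad_input, grad_output)```.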
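The sketch below registers both hooks on a tiny hypothetical network whose ReLU output stands in for ResNet-50's ```layer4```. Because this network ends with global average pooling and a single linear layer, every gradient of the class score with respect to the hooked activations equals the corresponding linear-layer weight divided by the number of spatial positions (64 here), so GradCAM reduces to a rescaled CAM followed by a ReLU in this case.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
)
target = model[1]  # stand-in for model.layer4
saved = {}


def forward_hook(module, inputs, output):
    saved["activations"] = output.detach()


def backward_hook(module, grad_input, grad_output):
    # grad_output[0] is d(score)/d(module output), same shape as the activations
    saved["gradients"] = grad_output[0].detach()


handles = [target.register_forward_hook(forward_hook),
           target.register_full_backward_hook(backward_hook)]

x = torch.randn(1, 3, 8, 8)
logits = model(x)
class_idx = 1
model.zero_grad()
logits[0, class_idx].backward()  # backpropagate only the explained class score
for h in handles:
    h.remove()

alpha = saved["gradients"].mean(dim=(2, 3))[0]                                     # (K,)
gradcam = torch.relu((alpha[:, None, None] * saved["activations"][0]).sum(dim=0))  # (h, w)
```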

In [None]:
class GradCAM(nn.Module):
    def __init__(self, model, feature_name):
        super(GradCAM, self).__init__()
        self.model = model
        self.feature_name = feature_name

        ###START CODE HERE###
        # Create and register the forward and backward hooks

        ###END CODE HERE###

    
    def forward(self, input_tensor, class_idx):
        ###START CODE HERE###

        # Forward pass

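        # Backpropagate the score of class_idx through the model

        # Weight each activation channel by its spatially averaged gradient and sum over channels
        
        # Apply ReLU over the obtained weighted sum
        
        # Upsample the GradCAM map to the image size, normalize it to [0, 1] and return it as a NumPy array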
    
        ###END CODE HERE###

Now run the cell below to check your GradCAM class.

In [None]:
explainer = GradCAM(model, 'layer4')
gradcam = explainer(image_tensor, class_to_explain)
xai_utils.show_saliency(image_tensor, gradcam)