# Vanilla Backpropagation Saliency Map 

#### Link to Readme section:    

https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/blob/main/README.md#saliency-maps

#### Citations:

- https://towardsdatascience.com/saliency-map-using-pytorch-68270fe45e80

**Motivation**: A deep learning model is largely a black box: it is hard to see how the model arrives at a prediction from its input, and interpretability tends to drop as the model grows more complex. A saliency map offers a partial view into the model. It measures the spatial support of a particular class in an image, that is, which pixels most influence the score the model assigns to that class.

#### 1. Initial Set-Up

This cell imports everything the notebook needs. `torch` and `torchvision` are used to load the model and define the image transforms, `PIL` opens the sample image, and `matplotlib.pyplot` visualizes the saliency map.

In [None]:
import os
import time
import copy
import csv

from pprint import pprint

import torch
from torchvision import transforms, models
from torch.utils.data import DataLoader

from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score

from PIL import Image
import matplotlib.pyplot as plt

Next, we load the trained ResNet50 weights, move the model to the available device, and switch it to evaluation mode with `model.eval()`.

In [None]:
data_dir = 'data/face_images_80_10_10'
print(f'using {data_dir} as data folder')

num_classes = 7

# print if running on gpu or on cpu
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f'Using {device} device')

# load trained model
model = models.resnet50(num_classes=num_classes)
model.load_state_dict(torch.load('main_resnet50/FEC_resnet50_trained_model.pt', map_location=device))

model = model.to(device)

# switch to evaluation mode so batch-norm layers use their running statistics
model.eval()
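
Evaluation mode matters here because the saliency map is computed from a single forward and backward pass, so that pass should be deterministic. The sketch below illustrates the effect on a hypothetical toy network (`toy` is a stand-in introduced only for this example, not the ResNet50 above), using a dropout layer because its train/eval difference is easy to observe.

```python
import torch
from torch import nn

# Hypothetical toy network, standing in for the trained ResNet50.
toy = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

toy.train()  # training mode: dropout randomly zeroes activations
toy.eval()   # evaluation mode: dropout becomes the identity

with torch.no_grad():
    first = toy(x)
    second = toy(x)

print(toy.training)                # False
print(torch.equal(first, second))  # True: repeated passes agree in eval mode
```
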

#### 2. Perform Image Pre-Processing and Data Augmentation

This cell defines the pre-processing and data augmentation pipelines, one per split (training, validation and testing).

Training images are resized, rotated by a random angle of up to 10 degrees, randomly flipped horizontally, and jittered in brightness, contrast and saturation. They are then normalized with the ImageNet mean and standard deviation.

Validation and testing images are only resized and normalized. Only the `test` pipeline is applied to the sample image later in this notebook.

In [None]:
# transformations to apply to images
# data augmentation and normalization for training
# just normalization for validation and testing
# https://pytorch.org/vision/stable/transforms.html
input_size = 224
data_transforms = {
	'train': transforms.Compose([
		transforms.Resize(size=(input_size, input_size)),
		# transforms.Grayscale(), (cannot use greyscale with resnet)
		# rotation augmentation
		transforms.RandomRotation(10),
		# random flip augmentation
		transforms.RandomHorizontalFlip(),
		# jitter brightness, contrast, saturation augmentation
		transforms.ColorJitter(brightness=0.2, contrast=0.1, saturation=0.1, hue=0),
		# convert to tensor and normalize
		transforms.ToTensor(),
		# use ImageNet standard mean and std dev for transfer learning
		transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
	]),
	'val': transforms.Compose([
		transforms.Resize(size=(input_size, input_size)),
		transforms.ToTensor(),
		# use ImageNet standard mean and std dev for transfer learning
		transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
	]),
	'test': transforms.Compose([
		transforms.Resize(size=(input_size, input_size)),
		transforms.ToTensor(),
		# use ImageNet standard mean and std dev for transfer learning
		transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
	])
}

#### 3. Perform Backpropagation

After transforming the image, we add a batch dimension, because the model expects a 4-dimensional tensor of shape (batch size, channels, height, width). We then call `requires_grad_()` on the image so that autograd records gradients with respect to its pixels. Finally, we run a forward pass, pick the score of the predicted class, and backpropagate from that score; the resulting gradient is stored in `image.grad`.

In [None]:
# Open a sample image; convert to RGB so it always has 3 channels
image = Image.open('data/face_images_unsplit/happy/AF01HAHR.JPG').convert('RGB')

# Apply the test-time transforms to the image
image = data_transforms['test'](image)

# Add a batch dimension: the model expects a 4-dimensional tensor
# of shape (batch_size, channels, height, width)
image = image.reshape(1, 3, input_size, input_size)

# Move the image to the same device as the model
image = image.to(device)

# Track gradients with respect to the input pixels
image.requires_grad_()

# Forward pass: one row of class scores for the single image
output = model(image)

# Pick the score of the predicted (highest-scoring) class
output_idx = output.argmax()
output_max = output[0, output_idx]

# Backpropagate to get the gradient of that score with respect to the image
output_max.backward()
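
The same steps can be tried without the trained weights or the dataset. The following is a minimal sketch under stated assumptions: `toy_model` is a hypothetical tiny classifier and `toy_image` is random noise, both introduced only for illustration, so the resulting map is meaningless as an explanation. What it shows is the mechanics: the gradient of the top class score has the same shape as the input batch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tiny classifier standing in for the trained ResNet50.
toy_model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 7),  # 7 classes, matching num_classes in the notebook
)
toy_model.eval()

# Random stand-in for a preprocessed image: (channels, height, width).
toy_image = torch.rand(3, 8, 8)

# Add the batch dimension and ask autograd to track the input.
toy_batch = toy_image.unsqueeze(0).requires_grad_()

scores = toy_model(toy_batch)            # shape (1, 7)
top_class = scores.argmax(dim=1).item()  # index of the highest score
scores[0, top_class].backward()          # d(top score) / d(input)

# Reduce the per-channel gradient to one value per pixel.
toy_saliency = toy_batch.grad.abs().max(dim=1).values.squeeze(0)
print(toy_batch.grad.shape, toy_saliency.shape)
```

Using `unsqueeze(0)` instead of a hard-coded `reshape` is a design choice of this sketch; it adds the batch dimension without assuming the image size.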

#### 4. Visualize Saliency Map

Now we can visualize the gradient with matplotlib. Because the gradient has one value per color channel, we first take the maximum absolute value across the three channels at each pixel position. This yields a single-channel map that can be shown as a heatmap next to the original image.
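
Written as a formula, the channel reduction looks as follows. The notation is an assumption introduced here for illustration: $S_c(I)$ is the model's score for the predicted class $c$ on input image $I$, $k$ indexes the color channel, and $(i, j)$ is the pixel position.

```latex
M_{i,j} = \max_{k \in \{1, 2, 3\}} \left| \frac{\partial S_c(I)}{\partial I_{k,i,j}} \right|
```

Each entry $M_{i,j}$ is non-negative, and a larger value means that a small change to that pixel changes the class score more.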

In [None]:
# Take the absolute gradient and keep the maximum over the channels at each pixel.
# The channel axis is dim=1; recall the shape (batch_size, channels, height, width)
saliency, _ = torch.max(image.grad.abs(), dim=1)
saliency = saliency.reshape(input_size, input_size)

# Drop the batch dimension from the image
image = image.reshape(-1, input_size, input_size)

# Build the inverse of the ImageNet normalization:
# mean' = -mean / std and std' = 1 / std for each channel
inverse_normalize = transforms.Normalize(
    mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
    std=[1/0.229, 1/0.224, 1/0.225]
)
# Undo the normalization for viewing; clamp to [0, 1] to absorb rounding error
with torch.no_grad():
    image = inverse_normalize(image).clamp(0, 1)

# plot
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image.cpu().numpy().transpose(1, 2, 0))
ax[0].axis('off')
ax[1].imshow(saliency.cpu(), cmap='hot')
ax[1].axis('off')
fig.suptitle('The Image and Its Saliency Map')
plt.tight_layout()
plt.show()

<div align="center">
<img src="https://git.cs.vt.edu/sdeepti/facial-expression-recognition/-/raw/main/Images/vanilla_saliency_map.png">
</div>
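
The `inverse_normalize` transform in the visualization cell relies on an identity: normalizing with mean `-mean / std` and standard deviation `1 / std` undoes a normalization with `mean` and `std`. The sketch below checks this numerically. It is an illustration under stated assumptions: `normalize` is a hypothetical helper that reproduces the per-channel arithmetic `(x - mean) / std` with plain tensor broadcasting, and `pixels` is random data rather than a real image.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])


def normalize(x, mean, std):
    # Hypothetical helper: per-channel (x - mean) / std on a (C, H, W) tensor.
    return (x - mean[:, None, None]) / std[:, None, None]


torch.manual_seed(0)
pixels = torch.rand(3, 4, 4)  # stand-in image with values in [0, 1)

normalized = normalize(pixels, mean, std)
# Inverse: (n - (-mean/std)) / (1/std) = n * std + mean
restored = normalize(normalized, -mean / std, 1.0 / std)

print(torch.allclose(restored, pixels, atol=1e-6))
```
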