---
layout: post
title:  "Interpretable AI: CAM"
date:   2023-06-07 10:14:54 +0700
categories: MachineLearning
---

# Introduction

CAM (Class Activation Mapping) is a technique for understanding how and why a deep learning model arrived at its prediction. It adds transparency to applications, which is especially helpful in healthcare, finance, and autonomous vehicles, where safety, compliance, and robustness are important. It does so by visualizing the decision of a convolutional neural network on the input image. Roughly speaking, it shows where the model was looking when it made the decision. In other words, CAM provides a spatial map of the features/pixels that matter most for the task at hand, which gives a partial explanation of the prediction. It is easy to imagine that a heatmap showing where the decision was focused would be a great aid to doctors in medical imaging tasks.

# CAM

CAM's authors argue that the convolutional units in a CNN are the part that actually localizes objects in the image, despite never having been instructed to do so explicitly. This ability is diluted in the final fully connected layers. To avoid this, some newer network architectures were designed to be fully convolutional. Of those, some use a global average pooling layer, which acts as a structural regularizer and helps prevent overfitting. The authors show that, with some tweaking, such a network can retain its ability to localize discriminative regions.

The authors use a network architecture similar to GoogLeNet: mostly convolutional layers and then, just before the final softmax output, a global average pooling (GAP) layer followed by a fully connected layer. They then project the weights of the fully connected layer back onto the last convolutional feature maps, hence the name class activation mapping. For a given class, the map is the sum of those feature maps, each weighted by the fully connected weight that links it to that class.
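
The projection can be written as a weighted sum over the channels of the last convolutional feature maps. Below is a minimal NumPy sketch of that idea, not the authors' code. The function name `class_activation_map`, the toy shapes, and the random values are assumptions made for illustration, and the bias term of the fully connected layer is ignored.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Project the FC weights of one class back onto the conv feature maps.

    feature_maps: array of shape (H, W, K), output of the last conv layer
    fc_weights:   array of shape (K, C), weights of the layer that follows GAP
    class_idx:    index of the class to explain
    """
    # Weighted sum over the K channels at every spatial location -> (H, W)
    return feature_maps @ fc_weights[:, class_idx]

# Toy shapes and random values, chosen for illustration only
rng = np.random.default_rng(0)
feature_maps = rng.random((7, 7, 16))
fc_weights = rng.normal(size=(16, 10))

cam = class_activation_map(feature_maps, fc_weights, class_idx=3)

# The class score computed the usual way: GAP first, then the FC layer
gap = feature_maps.mean(axis=(0, 1))   # shape (K,)
score = gap @ fc_weights[:, 3]

# Both steps are linear, so the score equals the spatial mean of the map
print(np.isclose(score, cam.mean()))
```

The final check is the reason the map is meaningful: each location's value in the map is exactly that location's contribution to the class score.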

<img width="1217" alt="Screenshot 2023-06-07 at 14 14 26" src="https://github.com/FlyingWhalesHQ/flying-whales-blog/assets/7457301/2c55180f-428a-419d-8b79-666dbd3b8a0b">

The authors also distinguish between global average pooling and global max pooling. Global average pooling encourages the network to identify the whole extent of an object, whereas global max pooling only identifies a single discriminative part.
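
The intuition behind this difference can be seen on a single toy feature map. This is only an illustrative sketch: the 4x4 values and the helper names `gap` and `gmp` are made up, and a real network has many channels and learned weights.

```python
import numpy as np

# Toy 4x4 feature map: one strong peak plus weaker responses
# on the rest of the object
fmap = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.9, 0.0],
    [0.0, 0.5, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

def gap(f):
    """Global average pooling: every location contributes."""
    return f.mean()

def gmp(f):
    """Global max pooling: only the strongest location contributes."""
    return f.max()

# Suppress every response except the peak
peak_only = np.where(fmap == fmap.max(), fmap, 0.0)

print(gap(fmap), gap(peak_only))  # GAP drops when the weaker responses vanish
print(gmp(fmap), gmp(peak_only))  # GMP stays the same
```

Under average pooling the pooled value rewards activating on all parts of the object, while under max pooling everything except the peak is irrelevant to the output.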


The example below uses a pretrained VGG16. Since VGG16 has no global average pooling layer before its classifier, the channel weights are obtained from the gradients of the class score (the Grad-CAM variant of the idea) rather than read directly from a fully connected layer.

```python
import numpy as np
import cv2
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# K.gradients only works in graph mode, so eager execution is disabled
tf.compat.v1.disable_eager_execution()

# Load VGG16 model
model = VGG16(weights='imagenet')

# Load the image
img_path = '/kaggle/input/photo2monet-examples/cat-dog.jpg'  # Insert your image path here
original_img = cv2.imread(img_path)
height, width, _ = original_img.shape  # OpenCV arrays are (rows, cols, channels)

img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Get the predictions
preds = model.predict(x)

# Take the topmost class index
top = np.argmax(preds[0])

# Take the score of the top class and the final convolutional layer
output = model.output[:, top]
last_conv_layer = model.get_layer('block5_conv3')

# Compute the gradient of the class output value with respect to the feature map
grads = K.gradients(output, last_conv_layer.output)[0]

# Pool the gradients over all the axes leaving out the channel dimension
pooled_grads = K.mean(grads, axis=(0, 1, 2))

# Evaluate the pooled gradients and the feature map for our image
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([x])

# Weigh each of the 512 channels of the feature map by its pooled gradient
weighted_feature_map = conv_layer_output_value * pooled_grads_value

# Average the weighted feature map along the channel dimension resulting in a heat map of size 14x14
heatmap = np.mean(weighted_feature_map, axis=-1)

# Keep only positive evidence and normalize the heat map to values between 0 and 1
heatmap = np.maximum(heatmap, 0)
max_value = np.max(heatmap)
if max_value > 0:  # avoid dividing by zero when the map is empty
    heatmap /= max_value

# Resize heatmap to original image size (cv2.resize expects (width, height))
heatmap = cv2.resize(heatmap, (width, height))

# Plot the original image and the heatmap
plt.figure()
plt.imshow(cv2.cvtColor(original_img, cv2.COLOR_BGR2RGB))  # OpenCV loads images as BGR
plt.title('Original Image')

plt.figure()
plt.imshow(heatmap)
plt.title('Class Activation Map')
plt.show()
```

![cat-dog](https://github.com/FlyingWhalesHQ/flying-whales-blog/assets/7457301/473b5b41-c0a4-4361-922d-fb27b6263f10)

![CAM](https://github.com/FlyingWhalesHQ/flying-whales-blog/assets/7457301/eae97d23-ac18-487d-b23d-90e8bbfd3015)
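
A heatmap is often easier to read when it is blended on top of the photo instead of shown next to it. The following is a minimal NumPy sketch of one way to do that. The function `overlay_heatmap`, its `alpha` parameter, and the stand-in image are hypothetical and only serve to make the snippet runnable on its own.

```python
import numpy as np

def overlay_heatmap(image_rgb, heatmap, alpha=0.4):
    """Blend a [0, 1] heatmap into an RGB uint8 image as a red tint.

    image_rgb: uint8 array of shape (H, W, 3)
    heatmap:   float array of shape (H, W), values in [0, 1]
    alpha:     maximum blending strength of the heatmap
    """
    image = image_rgb.astype(np.float32)
    red = np.zeros_like(image)
    red[..., 0] = 255.0
    # Per-pixel blend weight: stronger activation means more red
    weight = (alpha * np.clip(heatmap, 0.0, 1.0))[..., None]
    blended = (1.0 - weight) * image + weight * red
    return blended.astype(np.uint8)

# Stand-in data: a gray image and a square "hot" region
image_rgb = np.full((224, 224, 3), 128, dtype=np.uint8)
heatmap = np.zeros((224, 224), dtype=np.float32)
heatmap[80:140, 80:140] = 1.0

overlay = overlay_heatmap(image_rgb, heatmap)
print(overlay.shape, overlay.dtype)
```

With the variables from the example above, the call would be `overlay_heatmap(cv2.cvtColor(original_img, cv2.COLOR_BGR2RGB), heatmap)`, since the heatmap has already been resized to the original image size.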