# Class activation map

CNNs are a little less of a black box than FCNs.

One reason is a technique called Class Activation Map (CAM), which helps developers explain why the neural network made a particular decision.

It can help us debug problems with predictions because it shows which parts of an image played the biggest role in the decision.

It produces a 2D grid of scores associated with a specific output class, computed for every location of the input image.

The simplest form of CAM applies to CNN architectures that use GlobalAveragePooling (ResNet, Xception, ...).

In [None]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import zoom
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Model
from tensorflow.keras.applications import ResNet50

Load ResNet50.

In [None]:
res_model = ResNet50()

Load and preprocess image.

In [None]:
img = cv2.imread('dog.jpg')
img = cv2.resize(img, (224,224))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img)

In [None]:
X = np.expand_dims(img, axis=0)
X = tf.keras.applications.resnet50.preprocess_input(X)
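
For intuition about what `preprocess_input` does here: as far as we know, ResNet50 (and VGG16) use the "caffe" convention, which reorders RGB to BGR and subtracts a per-channel ImageNet mean without any scaling. The NumPy sketch below mimics that on a toy batch. `caffe_preprocess` is a hypothetical helper name and the mean values are assumed to match the ones Keras uses, so treat it as an illustration, not a replacement for the library call.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match Keras' "caffe" mode).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(rgb_batch):
    """Hypothetical helper: RGB uint8 batch -> zero-centered BGR float32 batch."""
    x = rgb_batch.astype(np.float32)
    x = x[..., ::-1]              # RGB -> BGR
    return x - IMAGENET_BGR_MEAN  # subtract the mean of each channel

demo = np.full((1, 2, 2, 3), 128, dtype=np.uint8)  # toy gray "image"
demo_out = caffe_preprocess(demo)
print(demo_out.shape, demo_out.dtype)
print(demo_out[0, 0, 0])
```

Note that the output is no longer a displayable image, which is why the notebook keeps the original `img` around for plotting.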

List the layer names to find the last convolutional layer.

In [None]:
for l in res_model.layers:
    print(l.name)

Use the last convolutional layer's output and the predictions as the outputs of the CAM model.

In [None]:
conv_output = res_model.get_layer("conv5_block3_out").output
pred_output = res_model.get_layer("predictions").output
model = Model(res_model.input, outputs=[conv_output, pred_output])

Make a prediction.

In [None]:
features, results = model.predict(X)
tf.keras.applications.resnet50.decode_predictions(results)

The last convolutional layer before GlobalAveragePooling produces 2048 feature maps of size 7×7.

In [None]:
print(features.shape)
print(results.shape)

### Class activations
To generate a class activation map, we want to see which image features matter most for the output probabilities.

Each filter is tuned to look for a specific set of features, which are learned during training.

The dense layer decides how much weight to give each feature for each class.

In [None]:
plt.figure(figsize=(16, 16))
for i in range(36):
    plt.subplot(6, 6, i + 1)
    plt.imshow(img)
    heatmap = cv2.resize(features[0, :,:,i], (img.shape[1], img.shape[0]))
    plt.imshow(heatmap, cmap='jet', alpha=0.3)

Global average pooling collapses each output feature map of the last convolutional layer to a single value, and these values go to the dense layer for prediction.

The dense layer assigns a weight to each of those features (for each of the 1000 classes in this case).

So we need to get the weights of the last dense layer and compute their dot product with the features from the last convolutional layer.
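
Why the dot product is the right thing to look at: averaging is linear, so pooling first and then applying the dense layer gives the same pre-softmax score as applying the dense weights at every location and averaging afterwards. The toy check below uses small random arrays as stand-ins for the real tensors; all names and sizes in it are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, K, C = 7, 7, 16, 5            # toy sizes; ResNet50 has K=2048, C=1000
feats = rng.normal(size=(H, W, K))  # stand-in for the last conv output
w_toy = rng.normal(size=(K, C))     # stand-in for the Dense kernel
b_toy = rng.normal(size=C)          # stand-in for the Dense bias

# Path 1: what the network does (pool, then Dense, before softmax).
pooled = feats.mean(axis=(0, 1))    # shape (K,)
logits = pooled @ w_toy + b_toy     # shape (C,)

# Path 2: per-location class scores first, then average them.
cam_all = feats @ w_toy             # shape (H, W, C): one map per class
logits_from_cam = cam_all.mean(axis=(0, 1)) + b_toy

print(np.allclose(logits, logits_from_cam))
```

In other words, each class map is a spatial breakdown of that class's score, which is what makes it usable as an explanation.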

In [None]:
# weights from the last Dense layer
w = model.get_layer("predictions").weights[0]
print(w.shape)

CNN features for the image.

In [None]:
features_for_img = features[0, :,:,:]
features_for_img.shape

The feature maps are only 7×7, so scale them up to match the dimensions of the image.

In [None]:
%%time
features_for_img_scaled = zoom(features_for_img, (224/7, 224/7, 1), order=2)
features_for_img_scaled.shape

Select the weights used for the predicted class.

In [None]:
target = np.argmax(results, axis=1).squeeze()
print(target)
weights = w[:, target]
weights.shape
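
The same column indexing works for any class, not only the top prediction, which is handy when we want to compare maps for the runner-up classes. A small sketch on toy data, where `top_k_classes` is a hypothetical helper and the arrays are made up for illustration:

```python
import numpy as np

def top_k_classes(probs, k=3):
    """Hypothetical helper: indices of the k largest scores, best first."""
    return np.argsort(probs)[::-1][:k]

toy_probs = np.array([0.05, 0.60, 0.10, 0.25])       # toy softmax output
toy_w = np.arange(8 * 4, dtype=float).reshape(8, 4)  # toy Dense kernel (features, classes)

for cls in top_k_classes(toy_probs, k=2):
    class_weights = toy_w[:, cls]  # one weight per feature map for this class
    print(cls, class_weights.shape)
```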

Calculate the class activation map as the dot product of the scaled convolutional features and the weights for one class.

The dot product yields a scalar value at each pixel.

That value is larger when the image contains a particular feature at that location and the feature is also weighted heavily for the class.

In [None]:
cam = np.dot(features_for_img_scaled, weights)
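
Upscaling every channel before the dot product is the expensive part of this pipeline. Because both steps are linear, the order can be swapped: project to a single small map first, then upscale only that map. The sketch below checks this on toy data using nearest-neighbour upscaling via `np.kron`, an assumption made to keep the example free of extra dependencies. With the spline interpolation used by `zoom`, the same argument should hold approximately, but we have not verified exact equality.

```python
import numpy as np

rng = np.random.default_rng(1)
feats = rng.normal(size=(7, 7, 32))  # toy conv output (32 channels instead of 2048)
class_w = rng.normal(size=32)        # toy weights for one class
scale = 4                            # toy factor; the notebook uses 224 / 7 = 32

def upscale_nearest(a, s):
    """Repeat every spatial cell s times along height and width."""
    block = np.ones((s, s) + (1,) * (a.ndim - 2))
    return np.kron(a, block)

# Order used in the notebook: upscale every channel, then project.
cam_slow = upscale_nearest(feats, scale) @ class_w
# Cheaper order: project to one 7x7 map, then upscale it.
cam_fast = upscale_nearest(feats @ class_w, scale)

print(cam_slow.shape, np.allclose(cam_slow, cam_fast))
```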

Show the class activation map.

In [None]:
plt.figure(figsize=(12, 12))
plt.imshow(img)
plt.imshow(cam, cmap='jet', alpha=0.5)

## Grad-CAM

Grad-CAM is a more general way to create activation maps: it uses gradients, so it also works for architectures without global average pooling.

https://arxiv.org/abs/1610.02391
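
To see in what sense this is a generalization: for a network that ends in global average pooling plus one dense layer, the Grad-CAM map equals the CAM map up to a constant factor. The toy NumPy check below assumes a linear head without softmax, and every name in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, K = 7, 7, 8
feats = rng.normal(size=(H, W, K))  # toy last-conv activations
class_w = rng.normal(size=K)        # toy Dense weights for one class

def class_score(f):
    """GAP followed by a linear unit for one class (no softmax)."""
    return f.mean(axis=(0, 1)) @ class_w

# Analytic gradient of the score w.r.t. every activation: w_k / (H * W).
grads_toy = np.broadcast_to(class_w / (H * W), feats.shape)

# Finite-difference check of one entry, to back up the analytic formula.
eps = 1e-6
bumped = feats.copy()
bumped[3, 4, 5] += eps
numeric = (class_score(bumped) - class_score(feats)) / eps

# Grad-CAM channel weights are the spatially averaged gradients.
alpha = grads_toy.mean(axis=(0, 1))  # equals class_w / (H * W)
grad_cam_toy = feats @ alpha         # Grad-CAM map before ReLU
cam_toy = feats @ class_w            # plain CAM map

print(np.isclose(numeric, grads_toy[3, 4, 5], atol=1e-5))
print(np.allclose(grad_cam_toy * H * W, cam_toy))
```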

In [None]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import zoom
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Model
from tensorflow.keras.applications.vgg16 import VGG16

In [None]:
vgg_model = VGG16(weights="imagenet")

VGG16 has multiple classifier (dense) layers and does not contain global average pooling.

In [None]:
tf.keras.utils.plot_model(vgg_model, show_shapes=True)

In [None]:
for l in vgg_model.layers:
    print(l.name)

Take the last convolutional layer.

In [None]:
conv_output = 'block5_conv3'

Process the image for prediction.

In [None]:
img = cv2.imread('elephant.jpg')
img = cv2.resize(img, (224, 224))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(12, 12))
plt.imshow(img)

In [None]:
X = np.expand_dims(img, axis=0)
X = tf.keras.applications.vgg16.preprocess_input(X)

In [None]:
results = vgg_model.predict(X)
tf.keras.applications.vgg16.decode_predictions(results)

Again, we need to create a model that maps the image to the activations of the last convolutional layer and to the predictions.

In [None]:
# use a new name so the layer-name string in conv_output is not overwritten
last_conv_output = vgg_model.get_layer(conv_output).output
pred_output = vgg_model.get_layer('predictions').output
grad_model = Model(vgg_model.input, outputs=[last_conv_output, pred_output])

Compute the gradient of the predicted class score with respect to the activations of the last convolutional layer.

In [None]:
with tf.GradientTape() as tape:
    last_conv_layer_output, preds = grad_model(X)
    pred_index = tf.argmax(preds[0])
    class_channel = preds[:, pred_index]
# gradient of the output neuron with regard to the output feature map of the last conv layer
grads = tape.gradient(class_channel, last_conv_layer_output)
print(f'gradient shape: {grads.shape}')

Get the average intensity of the gradient for each feature map.

In [None]:
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
print(f'pooled gradients shape: {pooled_grads.shape}')

Weight the convolution outputs with the computed gradients.

In [None]:
last_conv_layer_output = np.squeeze(last_conv_layer_output.numpy())
for i in range(pooled_grads.shape[-1]):
    last_conv_layer_output[:, :, i] *= pooled_grads[i]
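
The loop above can also be written as a single broadcast multiplication, since a vector with one entry per channel lines up with the last axis of the activation tensor. A quick check on toy arrays (the shapes mimic what we would expect from `block5_conv3` for a 224×224 input, and the values are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(3)
acts = rng.normal(size=(14, 14, 512)).astype(np.float32)  # toy conv activations
pooled = rng.normal(size=512).astype(np.float32)          # toy pooled gradients

# Loop version, one channel at a time.
looped = acts.copy()
for i in range(pooled.shape[-1]):
    looped[:, :, i] *= pooled[i]

# Broadcasting version: the (512,) vector is applied along the last axis.
vectorized = acts * pooled

print(np.allclose(looped, vectorized))
```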

Average all feature channels into a single channel.

In [None]:
heatmap = np.mean(last_conv_layer_output, axis = -1)

Clip negative values and normalize to the range 0..1 for easier visualization.

In [None]:
heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
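
The division above assumes the heatmap has at least one positive value. If its maximum is exactly 0, the result contains NaNs. A defensive variant in plain NumPy could look like the sketch below, where `normalize_heatmap` is a hypothetical helper and returning an all-zero map in the degenerate case is just one possible choice.

```python
import numpy as np

def normalize_heatmap(h, eps=1e-8):
    """Hypothetical helper: ReLU, then scale to [0, 1].

    Returns an all-zero map when no value is positive.
    """
    h = np.maximum(h, 0.0)  # keep only positive evidence for the class
    peak = h.max()
    return h / peak if peak > eps else np.zeros_like(h)

mixed = normalize_heatmap(np.array([[-1.0, 2.0], [4.0, 0.5]]))
empty = normalize_heatmap(np.array([[-1.0, -2.0], [0.0, -0.5]]))
print(mixed)
print(empty)
```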

In [None]:
plt.figure(figsize=(12, 12))
plt.imshow(img)
cam = cv2.resize(heatmap.numpy(), (img.shape[1], img.shape[0]))
plt.imshow(cam, cmap='jet', alpha=0.3)