# Appendix B: Computer Vision & Model Interpretability

## 1️⃣ Overview

In **Chapter 7**, we trained a powerful image classifier using Transfer Learning with **InceptionResNetV2**. High accuracy alone is not enough: in high-stakes applications such as medical imaging or autonomous driving, we also need to know *why* the model made a specific decision.

This appendix provides the technical implementation for **Grad-CAM**, a technique that highlights the regions of an image that influenced the model's prediction the most.

**Key Concepts:**
* **Nested Models:** When a Keras `Applications` model is used as the backbone of a `Sequential` model, the entire network appears as a single layer. We'll learn how to "unwrap" it to access its internal convolutional layers.
* **Gradient Tape:** Using TensorFlow's automatic differentiation to compute the gradients of the predicted class score with respect to the feature maps.
* **Heatmap Visualization:** Overlaying the activation map on the original image.
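The `tf.GradientTape` behaviour that Grad-CAM relies on is that gradients can be taken with respect to an *intermediate* tensor, as long as that tensor was computed inside the tape from something being watched. A minimal sketch with toy values (these tensors are illustrative and not part of the Chapter 7 model):

```python
import tensorflow as tf

# A constant is not watched automatically, so we watch it explicitly.
x = tf.constant([1.0, 2.0, 3.0])

with tf.GradientTape() as tape:
    tape.watch(x)
    h = tf.square(x)             # intermediate tensor, plays the role of the feature maps A^k
    y = tf.reduce_sum(3.0 * h)   # scalar "class score" computed from h

# dy/dh is 3 for every element of h, independent of the values in x
grad_h = tape.gradient(y, h)
print(grad_h.numpy())  # [3. 3. 3.]
```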

---

## 2️⃣ Theoretical Refresher: Grad-CAM

**Grad-CAM** works by answering this question: *"How much would the prediction score change if I changed the values in this specific feature map?"*

1.  **Forward Pass:** Run the image through the model to get the prediction.
2.  **Backward Pass:** Calculate the gradient of the predicted class score $y^c$ with respect to the feature maps $A^k$ of the last convolutional layer.
3.  **Global Average Pooling:** Average these gradients over the spatial dimensions (height and width) to get one weight $\alpha_k^c$ per feature map. This weight measures how important map $k$ is for class $c$.
4.  **Weighted Combination:** Combine the feature maps using these weights and apply a ReLU to get a single 2D heatmap:
    $$ L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right) $$
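Step 3 written out: with $Z$ the number of spatial positions $(i, j)$ in the feature map (for `conv_7b` on a $299 \times 299$ input, $Z = 8 \times 8 = 64$), the weight of channel $k$ for class $c$ is the average of the gradients over those positions:

$$ \alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A^k_{ij}} $$

In the original Grad-CAM paper, $y^c$ is the class score *before* the softmax. Using the softmax probability, as the code in this appendix does, also works, but its gradients can become very small when the predicted probability is close to 1.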

We use **ReLU** because we are only interested in features that have a *positive* influence on the class of interest.
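Steps 3 and 4 can be sketched in plain NumPy. Here random arrays are hypothetical stand-ins for one image's feature maps and their gradients; in the real pipeline both come from the model via `tf.GradientTape`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: feature maps A with shape (H, W, K) and the
# gradients dy^c/dA of the class score with respect to them (same shape).
H, W, K = 8, 8, 16
A = rng.random((H, W, K)).astype(np.float32)
grads = rng.standard_normal((H, W, K)).astype(np.float32)

# Step 3: alpha_k is the spatial average of the gradients, one weight per channel.
alpha = grads.mean(axis=(0, 1))                           # shape (K,)

# Step 4: weighted sum over channels, then ReLU.
cam = np.maximum(np.einsum("hwk,k->hw", A, alpha), 0.0)   # shape (H, W)

# Normalize to [0, 1] for display, guarding against an all-zero map.
cam = cam / cam.max() if cam.max() > 0 else cam
print(cam.shape)  # (8, 8)
```

The `einsum` is the same weighted channel sum that the TensorFlow code below expresses as a matrix product.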

## 3️⃣ Setup and Model Loading

We will recreate the **InceptionResNetV2** transfer learning model used in Chapter 7.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, Model
from tensorflow.keras.applications import InceptionResNetV2
import numpy as np
import matplotlib.pyplot as plt
import cv2

# 1. Recreate the Model from Chapter 7
def get_model():
    # Define the base model (Pretrained)
    base_model = InceptionResNetV2(include_top=False, pooling='avg', weights='imagenet', input_shape=(299, 299, 3))
    base_model.trainable = False
    
    # Define the wrapper model
    model = models.Sequential([
        base_model,
        layers.Dropout(0.4),
        layers.Dense(200, activation='softmax') # Assuming 200 classes like TinyImageNet
    ])
    return model

model = get_model()
model.summary()
```

### 3.1 The Problem: Nested Models

Look at the summary above. The entire InceptionResNetV2 architecture is hidden inside a single layer named `inception_resnet_v2`. 

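To perform Grad-CAM, we need the output of the **last convolutional layer**, which in InceptionResNetV2 is named `conv_7b`. We can fetch the layer object itself with `model.get_layer('inception_resnet_v2').get_layer('conv_7b')`, but its output tensor belongs to the inner model's graph, so building a model from the outer model's input to that tensor typically fails with a "Graph disconnected" error.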
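Before unwrapping, it can help to confirm that `conv_7b` really exists inside the nested model and that it produces the $8 \times 8$ grid used later. This sketch assumes `weights=None` to skip the ImageNet download; layer names and output shapes do not depend on the weights:

```python
import tensorflow as tf

# weights=None builds the same architecture without downloading the ImageNet
# weights; layer names and output shapes do not depend on the weights.
base = tf.keras.applications.InceptionResNetV2(
    include_top=False, pooling="avg", weights=None, input_shape=(299, 299, 3)
)
outer = tf.keras.Sequential([
    base,
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(200, activation="softmax"),
])

# The outer model lists the whole backbone as a single layer...
print([layer.name for layer in outer.layers])

# ...but the inner model can still be searched by name.
conv = outer.get_layer(base.name).get_layer("conv_7b")
print(conv.output.shape)  # (None, 8, 8, 1536)
```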

**Solution:** We must "unwrap" the model by creating a new Functional API model that explicitly connects the internal layers.

```python
def unwrap_model(nested_model, base_name='inception_resnet_v2'):
    # 1. Access the inner base model
    inception = nested_model.get_layer(base_name)

    # 2. Start from the input and output of the inner model
    inp = inception.input
    x = inception.output

    # 3. Re-apply the layers that follow the base model in the outer model.
    #    Iterating by position avoids relying on auto-generated names such as
    #    'dropout' / 'dense', which become 'dropout_1', 'dense_1', ... if the
    #    cell is run more than once.
    start = nested_model.layers.index(inception) + 1
    for layer in nested_model.layers[start:]:
        x = layer(x)

    # 4. Create a new flat model that shares the same layers and weights
    return Model(inputs=inp, outputs=x)

unwrapped_model = unwrap_model(model)

# Now all the internal layers are accessible.
# The full summary is huge, so we only print the names of the last few layers.
for layer in unwrapped_model.layers[-5:]:
    print(layer.name)
```
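A quick way to check that unwrapping does not change the model is to compare the predictions of the nested and flat versions. The sketch below does this on a hypothetical toy backbone (`toy_base`, with an inner layer `last_conv`) so it runs in seconds; `unwrap` is a generic helper written for this illustration that re-applies every layer after the backbone in order:

```python
import numpy as np
from tensorflow.keras import layers, models, Model

# Hypothetical tiny backbone standing in for InceptionResNetV2.
inp = layers.Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, activation="relu", name="last_conv")(inp)
x = layers.GlobalAveragePooling2D()(x)
toy_base = Model(inp, x, name="toy_base")

nested = models.Sequential([toy_base, layers.Dropout(0.4), layers.Dense(5, activation="softmax")])

def unwrap(nested_model, base_name):
    base = nested_model.get_layer(base_name)
    x = base.output
    # Re-apply every layer that comes after the backbone, in order.
    start = nested_model.layers.index(base) + 1
    for layer in nested_model.layers[start:]:
        x = layer(x)
    return Model(inputs=base.input, outputs=x)

flat = unwrap(nested, "toy_base")

img = np.random.default_rng(0).random((2, 32, 32, 3)).astype("float32")
# Dropout is inactive at inference and the layers are shared, so both models agree.
same = np.allclose(nested.predict(img, verbose=0), flat.predict(img, verbose=0), atol=1e-6)
print(same)  # True

# The inner conv layer is now directly reachable from the flat model.
print(flat.get_layer("last_conv").output.shape)  # (None, 30, 30, 8)
```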

## 4️⃣ Implementing Grad-CAM

We need a special sub-model that outputs two things:
1.  The activations of the last convolutional layer (`conv_7b`).
2.  The final predictions of the model.

We will compute the gradients of (2) with respect to (1).

```python
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # 1. Create a model that maps the input image to the activations
    #    of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output]
    )

    # 2. Record operations for automatic differentiation.
    #    Watching the input explicitly keeps the whole forward pass on the tape,
    #    so the gradient is available even if the frozen base model's
    #    variables are not watched automatically.
    img_tensor = tf.convert_to_tensor(img_array)
    with tf.GradientTape() as tape:
        tape.watch(img_tensor)
        last_conv_layer_output, preds = grad_model(img_tensor)

        # If no specific class index is provided, use the predicted class
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]

    # 3. Compute gradients of the class score w.r.t. the feature map
    grads = tape.gradient(class_channel, last_conv_layer_output)

    # 4. Pool the gradients over batch, height and width (Global Average Pooling).
    #    This gives one weight per filter (channel) of the conv layer.
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    # 5. Multiply each channel in the feature map by its weight
    last_conv_layer_output = last_conv_layer_output[0]

    # Matrix multiplication: (H, W, Channels) @ (Channels, 1) -> (H, W, 1)
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)

    # 6. Apply ReLU (keep only features with a positive influence),
    #    then normalize to [0, 1]. divide_no_nan returns 0 instead of NaN
    #    when the map is all zeros.
    heatmap = tf.maximum(heatmap, 0)
    heatmap = tf.math.divide_no_nan(heatmap, tf.reduce_max(heatmap))

    return heatmap.numpy()
```
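As a sanity check of the pooling step, consider a head that is only global average pooling followed by a linear layer with no softmax. Then $y^c = \sum_k w_k^c \frac{1}{Z}\sum_{i,j} A^k_{ij}$, so every gradient $\partial y^c / \partial A^k_{ij}$ equals $w_k^c / Z$ and the pooled Grad-CAM weight should be exactly $w_k^c / Z$; this is why Grad-CAM is described as a generalisation of CAM. The sketch below checks this on a hypothetical toy model (`toy_conv` and `toy_logits` are illustrative names) using the same gradient and pooling steps as `make_gradcam_heatmap`:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical toy network: conv -> global average pooling -> linear logits (no softmax).
inp = layers.Input(shape=(16, 16, 3))
feat = layers.Conv2D(4, 3, activation="relu", name="toy_conv")(inp)
pooled = layers.GlobalAveragePooling2D()(feat)
logits = layers.Dense(3, name="toy_logits")(pooled)
toy = Model(inp, logits)

grad_model = Model(toy.inputs, [toy.get_layer("toy_conv").output, toy.output])

img = tf.convert_to_tensor(np.random.default_rng(1).random((1, 16, 16, 3)).astype("float32"))
class_idx = 2

with tf.GradientTape() as tape:
    tape.watch(img)
    conv_out, preds = grad_model(img)
    score = preds[:, class_idx]

grads = tape.gradient(score, conv_out)                 # shape (1, 14, 14, 4)
alpha = tf.reduce_mean(grads, axis=(0, 1, 2)).numpy()  # one weight per channel

# Analytic expectation: w_k^c / Z with Z = 14 * 14 spatial positions.
w = toy.get_layer("toy_logits").get_weights()[0]       # Dense kernel, shape (4, 3)
expected = w[:, class_idx] / (14 * 14)
print(np.allclose(alpha, expected, rtol=1e-4, atol=1e-8))  # True
```

The Chapter 7 model ends in a softmax, so its pooled weights will not match this closed form; the check only holds for a purely linear head.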

## 5️⃣ Visualization

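We will generate a dummy image (since the dataset is not loaded here) and visualize the heatmap. Because the input is random noise and the new `Dense` head is untrained, the resulting heatmap carries no real meaning; this cell only demonstrates the mechanics. In a real scenario, you would pass an actual photo, such as a cat or a dog.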
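To use a real photo instead of noise, the image has to be resized to $299 \times 299$ and scaled to $[-1, 1]$. A minimal sketch, where `load_for_inception` is a hypothetical helper and the temporary PNG only exists so the example runs without a dataset:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from PIL import Image

def load_for_inception(path, size=(299, 299)):
    """Hypothetical helper: load an image file as a (1, H, W, 3) batch in [-1, 1]."""
    img = tf.keras.utils.load_img(path, target_size=size)   # PIL image, RGB
    arr = tf.keras.utils.img_to_array(img)                  # float32, values in [0, 255]
    # InceptionResNetV2 preprocessing scales x / 127.5 - 1
    return tf.keras.applications.inception_resnet_v2.preprocess_input(arr[np.newaxis, ...])

# Usage sketch: write a random temporary image so the example is self-contained.
tmp_path = os.path.join(tempfile.mkdtemp(), "example.png")
pixels = np.random.default_rng(0).integers(0, 256, (240, 320, 3), dtype=np.uint8)
Image.fromarray(pixels).save(tmp_path)

batch = load_for_inception(tmp_path)
print(batch.shape)  # (1, 299, 299, 3)
```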

For a $299 \times 299$ input, the `conv_7b` layer in InceptionResNetV2 produces an $8 \times 8$ output grid (with 1536 channels). The heatmap has the same $8 \times 8$ size, so we must upsample it before overlaying it on the original $299 \times 299$ image.

```python
# Generate a dummy image
# InceptionResNetV2 expects values in [-1, 1]
img_size = (299, 299)
dummy_img = np.random.rand(1, *img_size, 3).astype('float32') * 2 - 1 

# Specify the layer name for InceptionResNetV2
last_conv_layer_name = "conv_7b"

# Generate Heatmap
heatmap = make_gradcam_heatmap(dummy_img, unwrapped_model, last_conv_layer_name)

# Display
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow((dummy_img[0] + 1) / 2) # Rescale back to [0, 1] for display
plt.title("Input Image")
plt.axis('off')

plt.subplot(1, 2, 2)
plt.matshow(heatmap, fignum=0)
plt.title("Grad-CAM Heatmap")
plt.colorbar()
plt.show()
```

### 5.1 Superimposing the Heatmap

To make the result interpretable, we merge the heatmap with the original image.

```python
def superimpose_heatmap(img, heatmap, alpha=0.4):
    # Rescale heatmap to the range 0-255
    heatmap = np.uint8(255 * heatmap)

    # Apply the jet colormap (OpenCV returns colors in BGR order)
    jet = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
    # Convert to RGB so Matplotlib displays the expected colors
    jet = cv2.cvtColor(jet, cv2.COLOR_BGR2RGB)

    # Resize heatmap to match image size (cv2 expects (width, height))
    jet = cv2.resize(jet, (img.shape[1], img.shape[0]))

    # Superimpose
    # Convert the preprocessed image from [-1, 1] back to [0, 255]
    img_uint8 = np.uint8((img + 1) / 2 * 255)
    
    superimposed_img = jet * alpha + img_uint8
    superimposed_img = np.clip(superimposed_img, 0, 255).astype('uint8')

    return superimposed_img

final_img = superimpose_heatmap(dummy_img[0], heatmap)

plt.figure(figsize=(6, 6))
plt.imshow(final_img)
plt.title("Superimposed Grad-CAM")
plt.axis('off')
plt.show()
```

## 6️⃣ Summary

* **Model Unwrapping:** When using Keras `Sequential` models containing other models (like `InceptionResNetV2`), we must explicitly reconstruct the graph to access internal layers.
* **Feature Importance:** Grad-CAM uses the gradients flowing into the final convolutional layer to weight its feature maps by how strongly they increase the score of a specific class.
* **Localization:** By upsampling the coarse $8 \times 8$ heatmap to the image size ($299 \times 299$), we can roughly localize the object that triggered the classification. This gives a form of weakly supervised localization from a classifier trained without any bounding-box labels.