# Model Interpretability in Deep Learning

Deep learning models are powerful but often work as **black boxes**, making it hard to understand how they make predictions. Model interpretability aims to make these models more **transparent and explainable**.

In this notebook, we’ll cover:
- Saliency Maps
- Grad-CAM
- Integrated Gradients
- Why interpretability is important


## 🎯 Objective
- Understand how deep models make predictions.
- Visualize which regions in an image most influence model decisions.
- Build trust and accountability in AI models.

In [1]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.applications import VGG16
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.vgg16 import preprocess_input

## 🧩 Load Pre-trained Model

In [2]:
model = VGG16(weights='imagenet')
model.summary()

## 🖼️ Load and Preprocess Image

In [3]:
img_path = tf.keras.utils.get_file('elephant.jpg', 'https://i.imgur.com/Bvro0YD.png')
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

plt.imshow(image.load_img(img_path))
plt.axis('off')
plt.title('Original Image')
plt.show()

## 🔍 Saliency Maps
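Saliency maps show which pixels most influence the prediction. They are computed as the gradient of the predicted class score with respect to the input image: a large absolute gradient means that a small change to that pixel would change the score substantially.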

In [4]:
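img_tensor = tf.convert_to_tensor(x)
with tf.GradientTape() as tape:
    # The input is a constant tensor, so the tape must be told to track it.
    tape.watch(img_tensor)
    preds = model(img_tensor)
    # Index and score of the highest-scoring class.
    top_index = tf.argmax(preds[0])
    top_class = preds[:, top_index]

# Gradient of the top class score with respect to every input value, shape (224, 224, 3).
grads = tape.gradient(top_class, img_tensor)[0]
# Collapse the colour channels: keep the largest absolute gradient per pixel.
saliency = np.max(np.abs(grads), axis=-1)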

plt.imshow(saliency, cmap='hot')
plt.axis('off')
plt.title('Saliency Map')
plt.show()
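
Raw saliency values are often dominated by a handful of very large gradients, which can leave the rest of the map nearly black. The sketch below is one optional way to prepare a gradient array for display, using NumPy only. The helper name `saliency_to_display` and the `clip_percentile` parameter are illustrative assumptions, not part of TensorFlow, and the toy gradient is random data rather than model output.

```python
import numpy as np

def saliency_to_display(grads, clip_percentile=99.0):
    """Collapse an (H, W, C) gradient array to an (H, W) map scaled to [0, 1].

    Hypothetical helper for illustration only.
    """
    # Per-pixel importance: largest absolute gradient over the colour channels.
    sal = np.max(np.abs(grads), axis=-1)
    # Clip extreme values so a few outlier pixels do not wash out the rest.
    upper = np.percentile(sal, clip_percentile)
    sal = np.clip(sal, 0.0, upper)
    # Scale to [0, 1]; the epsilon guards against an all-zero gradient.
    return sal / (upper + 1e-8)

# Toy gradient standing in for `grads` from the cell above (random values).
rng = np.random.default_rng(0)
toy_grads = rng.normal(size=(4, 4, 3))
toy_map = saliency_to_display(toy_grads)
print(toy_map.shape, float(toy_map.min()), float(toy_map.max()))
```

In the notebook, something like `plt.imshow(saliency_to_display(np.asarray(grads)), cmap='hot')` should work with the `grads` computed above.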

## 🔥 Grad-CAM Visualization
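Grad-CAM (Gradient-weighted Class Activation Mapping) highlights **which regions** of an image most influence a prediction. It weights each feature map of the last convolutional layer by the average gradient of the class score with respect to that map, then sums the weighted maps into a coarse heatmap.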
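
For reference, the computation is usually written as follows. The notation is introduced here for illustration: $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ the score for class $c$, and $Z$ the number of spatial positions in the feature map.

$$
\alpha_k^c = \frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial y^c}{\partial A^k_{ij}},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)
$$

The original formulation takes $y^c$ before the softmax; the cell below differentiates the model's softmax output instead, which is a common simplification.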

In [5]:
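# Model that returns both the last conv feature maps and the final predictions.
grad_model = tf.keras.models.Model(
    model.inputs, [model.get_layer('block5_conv3').output, model.output])

with tf.GradientTape() as tape:
    conv_outputs, predictions = grad_model(img_tensor)
    loss = predictions[:, top_index]

# Gradient of the class score with respect to each feature-map activation.
grads = tape.gradient(loss, conv_outputs)
# One weight per channel: the gradient averaged over batch and spatial axes.
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
conv_outputs = conv_outputs[0]
# Weighted sum of the feature maps across channels.
heatmap = tf.reduce_sum(tf.multiply(pooled_grads, conv_outputs), axis=-1)
# ReLU, then scale to [0, 1]; the epsilon avoids division by zero.
heatmap = np.maximum(heatmap, 0)
heatmap = heatmap / (np.max(heatmap) + 1e-8)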

plt.matshow(heatmap)
plt.title('Grad-CAM Heatmap')
plt.axis('off')
plt.show()
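
To make the arithmetic in the cell above easier to follow, here is a minimal NumPy sketch of the same steps on a hand-made 2×2 example. The function name `grad_cam_from_arrays` and the toy values are assumptions for illustration; nothing here comes from the VGG16 model.

```python
import numpy as np

def grad_cam_from_arrays(feature_maps, grads):
    """Grad-CAM heatmap from (H, W, K) feature maps and same-shaped gradients.

    Hypothetical helper for illustration only.
    """
    # Channel weights: average each channel's gradient over the spatial grid.
    weights = grads.mean(axis=(0, 1))                            # shape (K,)
    # Weighted sum of the feature maps across channels.
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))   # shape (H, W)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1]; the epsilon avoids dividing by zero for an all-zero map.
    return cam / (cam.max() + 1e-8)

# Toy example: 2x2 spatial grid, 2 channels (hand-picked values).
feature_maps = np.array([[[1.0, 0.0], [0.0, 1.0]],
                         [[2.0, 0.0], [0.0, 3.0]]])
# Channel 0 has positive gradients everywhere, channel 1 negative.
grads = np.ones((2, 2, 2)) * np.array([1.0, -1.0])
toy_cam = grad_cam_from_arrays(feature_maps, grads)
print(toy_cam)  # approximately [[0.5, 0.], [1., 0.]]
```

Positions where only the negatively weighted channel is active end up at zero after the ReLU, which is why Grad-CAM heatmaps show only evidence in favour of the class.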

## 🧠 Overlay Grad-CAM on Original Image

In [6]:
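import cv2
# OpenCV loads images in BGR channel order.
img_orig = cv2.imread(img_path)
# `heatmap` is already a NumPy array here, so it has no .numpy() method;
# np.asarray works for both arrays and eager tensors. cv2.resize expects (width, height).
heatmap_resized = cv2.resize(np.asarray(heatmap, dtype=np.float32), (img_orig.shape[1], img_orig.shape[0]))
heatmap_colored = cv2.applyColorMap(np.uint8(255 * heatmap_resized), cv2.COLORMAP_JET)
# Blend 60% original image with 40% heatmap.
superimposed_img = cv2.addWeighted(img_orig, 0.6, heatmap_colored, 0.4, 0)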

plt.imshow(cv2.cvtColor(superimposed_img, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.title('Grad-CAM Overlay')
plt.show()

## 📈 Integrated Gradients (Optional)
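Integrated Gradients attributes a prediction to input features by accumulating gradients along a straight-line path from a baseline (typically an uninformative reference such as an all-zero input) to the actual input.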
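
Written out, with notation introduced here for illustration ($F$ is the model's score for the class of interest, $x$ the input, $x'$ the baseline, and $m$ the number of interpolation steps), the attribution for feature $i$ and its usual Riemann-sum approximation are:

$$
\mathrm{IG}_i(x) = (x_i - x'_i)\int_{0}^{1}\frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\,d\alpha
\;\approx\; (x_i - x'_i)\,\frac{1}{m}\sum_{k=1}^{m}\frac{\partial F\big(x' + \tfrac{k}{m}(x - x')\big)}{\partial x_i}
$$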

In [7]:
try:
    from tf_explain.core.integrated_gradients import IntegratedGradients
    ig = IntegratedGradients()
    grid = ig.explain((x, None), model, class_index=int(top_index))
    plt.imshow(grid)
    plt.axis('off')
    plt.title('Integrated Gradients')
    plt.show()
except Exception as e:
    print('Integrated Gradients not available:', e)
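
If `tf_explain` is not installed, the idea can still be illustrated from scratch. The following is a minimal NumPy sketch on a toy function, not the `tf_explain` implementation: `integrated_gradients`, `toy_model`, and `toy_grad` are hypothetical names, and the midpoint Riemann sum is one of several reasonable ways to approximate the integral.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    grad_fn(point) must return the gradient of the model output at `point`.
    """
    # Midpoints of n_steps equal sub-intervals of [0, 1].
    alphas = (np.arange(n_steps) + 0.5) / n_steps
    total = np.zeros_like(x, dtype=float)
    for alpha in alphas:
        # Point on the straight line from the baseline to the input.
        point = baseline + alpha * (x - baseline)
        total += grad_fn(point)
    avg_grad = total / n_steps
    # Scale the averaged gradient by the input's distance from the baseline.
    return (x - baseline) * avg_grad

# Toy "model" F(x) = sum(x**2), whose gradient is 2x.
def toy_model(x):
    return float(np.sum(x ** 2))

def toy_grad(x):
    return 2.0 * x

x_toy = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x_toy)
attributions = integrated_gradients(toy_grad, x_toy, baseline)
# Completeness check: attributions should sum to F(x) - F(baseline).
print(attributions, attributions.sum(), toy_model(x_toy) - toy_model(baseline))
# attributions are approximately [1., 4., 9.]; both sums are approximately 14
```

For the VGG16 model in this notebook, `grad_fn` would presumably wrap a `tf.GradientTape` call like the one in the saliency cell, at the cost of one forward and backward pass per step.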

## ✅ Summary
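- **Saliency Maps** score individual pixels by the gradient of the class score with respect to the input.
- **Grad-CAM** localizes the most influential image regions using gradients at the last convolutional layer.
- **Integrated Gradients** attribute the prediction to input features by accumulating gradients along a path from a baseline to the input.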

These techniques make deep models more **interpretable**, which supports **transparency and trust** in AI systems.