# Visualizing Features: Seeing What CNNs See

**Module 3.3, Lesson 1** | CourseAI

In this notebook you will open a pretrained ResNet-18 and see what it actually learned. Three techniques, each answering a different question:

1. **Filter Visualization** — What does this layer look for?
2. **Activation Maps** — What did this layer find in this image?
3. **Grad-CAM** — What in this image mattered for this prediction?

Boilerplate for image loading and display is provided. Your job is to write the visualization code at the `# TODO` markers.

---

## Setup

Run this cell to import everything and set up display utilities.

In [None]:
import torch
import torch.nn.functional as F
import torchvision.models as models
from torchvision.models import ResNet18_Weights
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import requests
from io import BytesIO

# Use GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

## Load Pretrained ResNet-18

The same model you used in Transfer Learning. Load it and put it in eval mode.

In [None]:
# Define the weights once and reuse them for the model and the class labels
weights = ResNet18_Weights.IMAGENET1K_V1
model = models.resnet18(weights=weights)
model.eval()
model = model.to(device)

# ImageNet class labels
categories = weights.meta['categories']

print(f'Model loaded: {sum(p.numel() for p in model.parameters()):,} parameters')
print(f'Number of classes: {len(categories)}')

## Image Loading Utilities

These helpers load and preprocess images for ResNet-18. Nothing to modify here.

In [None]:
# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

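def load_image_from_url(url):
    """Download an image from a URL and return both PIL and tensor versions."""
    # Wikimedia may reject requests without a User-Agent; this value is a placeholder
    headers = {'User-Agent': 'CourseAI-notebook/1.0'}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on 403/404 instead of handing an error page to PIL
    img_pil = Image.open(BytesIO(response.content)).convert('RGB')
    img_tensor = preprocess(img_pil).unsqueeze(0).to(device)  # [1, 3, 224, 224]
    return img_pil, img_tensor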

def show_image(img_pil, title=None):
    """Display a PIL image."""
    plt.figure(figsize=(6, 6))
    plt.imshow(img_pil)
    if title:
        plt.title(title, fontsize=14)
    plt.axis('off')
    plt.show()

def predict_top_k(img_tensor, k=5):
    """Get top-k predictions for an image tensor."""
    with torch.no_grad():
        output = model(img_tensor)
        probs = F.softmax(output, dim=1)
        top_probs, top_indices = probs.topk(k, dim=1)
    results = []
    for i in range(k):
        results.append((categories[top_indices[0, i].item()], top_probs[0, i].item()))
    return results

print('Utilities loaded.')

## Load a Sample Image

We'll use a photo of a golden retriever. Feel free to try your own images later.

In [None]:
# Sample image: golden retriever
img_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/bd/Golden_Retriever_Dukedestiny01_dv.jpg/800px-Golden_Retriever_Dukedestiny01_dv.jpg'
img_pil, img_tensor = load_image_from_url(img_url)

# Show the image and top predictions
show_image(img_pil, 'Sample Image')
print('Top-5 predictions:')
for label, prob in predict_top_k(img_tensor):
    print(f'  {label}: {prob:.1%}')

---

## Part 1: Filter Visualization

**Question this answers:** *What does this layer look for?*

ResNet-18's first layer (`model.conv1`) has 64 filters of size 7x7x3. Each filter operates on RGB input, so we can display each one as a tiny 7x7 color image.

### TODO: Visualize conv1 Filters

Access `model.conv1.weight.data`, normalize the filters to [0, 1], and display them in an 8x8 grid.
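
To see why each filter gets its own normalization, here is a small sketch on made-up data. The two toy "filters" below are illustrative values, not real conv1 weights, and `minmax` is a helper name chosen for this example. One filter has a much larger value range than the other, so a single shared min and max flattens the smaller one to a near-uniform gray.

```python
import numpy as np

def minmax(x, eps=1e-8):
    """Rescale an array linearly so its values span (approximately) [0, 1]."""
    return (x - x.min()) / (x.max() - x.min() + eps)

# Two toy "filters" (made-up values): same pattern, very different scale
big = np.array([[-1.0, 0.0, 1.0]])
small = np.array([[-0.01, 0.0, 0.01]])
filters = np.stack([big, small])            # shape [2, 1, 3]

# Global normalization: one min/max shared by all filters
global_norm = minmax(filters)

# Per-filter normalization: each filter uses its own min/max
per_filter = np.stack([minmax(f) for f in filters])

print('global, small filter:    ', global_norm[1].round(3))
print('per-filter, small filter:', per_filter[1].round(3))
```

With global scaling the small filter's values all land close to 0.5, so its pattern would be invisible in the grid. Per-filter scaling stretches it across the full range.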

In [None]:
# TODO: Get the conv1 filter weights
# Hint: model.conv1.weight.data has shape [64, 3, 7, 7]
filters = model.conv1.weight.data.clone().cpu()

# TODO: Normalize each filter to [0, 1] for display
# Hint: subtract the min, then divide by the range (max - min)
# Be careful — normalize each filter independently, not all at once
for i in range(filters.shape[0]):
    f = filters[i]
    filters[i] = (f - f.min()) / (f.max() - f.min() + 1e-8)

# Display as 8x8 grid
fig, axes = plt.subplots(8, 8, figsize=(12, 12))
for i, ax in enumerate(axes.flat):
    if i < filters.shape[0]:
        # TODO: Display filter i
        # Hint: filters[i] has shape [3, 7, 7] — you need to permute to [7, 7, 3] for matplotlib
        ax.imshow(filters[i].permute(1, 2, 0).numpy())
    ax.axis('off')

fig.suptitle('Conv1 Filters (64 filters, each 7x7x3)', fontsize=16)
plt.tight_layout()
plt.show()

### What Do You See?

Look at the grid above. You should see:
- **Oriented edges** at different angles (horizontal, vertical, diagonal)
- **Color gradients** — transitions between complementary colors
- **High-frequency patterns** — alternating light/dark stripes

Notice what you do **NOT** see: no cats, no dogs, no objects. A 7x7 patch cannot contain a recognizable object. These are the "questions" the first layer asks at every spatial position: *"Is there a vertical edge here? A color boundary? A diagonal gradient?"*

This confirms what you learned in Transfer Learning: conv1 features are **universal**. Edge detectors work for any image domain.
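
To make the "is there a vertical edge here?" idea concrete, the sketch below slides a hand-built edge detector over a synthetic image. It assumes a Sobel kernel as a stand-in for a learned filter (conv1's real filters are 7x7x3 and learned from data), and `correlate2d_valid` is a helper written for this example.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum elementwise products.

    This is the cross-correlation that conv layers compute.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Synthetic image: dark left half, bright right half -> one vertical edge
image = np.zeros((5, 8))
image[:, 4:] = 1.0

# Hand-built vertical-edge detector (Sobel-x kernel)
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

response_v = correlate2d_valid(image, sobel_x)      # vertical-edge detector
response_h = correlate2d_valid(image, sobel_x.T)    # horizontal-edge detector

print(response_v)
print(response_h)
```

The vertical-edge response is nonzero only at positions whose window straddles the edge, and the horizontal-edge detector stays silent everywhere. Each conv1 filter behaves like one such detector, applied at every position.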

---

## Part 2: Activation Maps via Hooks

**Question this answers:** *What did this layer find in this image?*

To see what a layer produces for a specific image, we need to capture its output during a forward pass. PyTorch **hooks** let us do this without modifying the model.

### Learn: Forward Hooks

A forward hook is a callback that fires every time a layer completes its forward pass. It receives the layer's output, which we can store for inspection.
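
Before using hooks on ResNet-18, it can help to watch one fire on a tiny network. The sketch below uses a hypothetical three-layer `toy` model (not ResNet-18) and a `record_shape` callback invented for this example. It shows that the hook runs once per forward pass and stops running after `remove()`.

```python
import torch
import torch.nn as nn

# Tiny stand-in network (hypothetical, not ResNet-18)
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, stride=2, padding=1),
)

calls = []  # one entry per time the hook fires

def record_shape(module, inputs, output):
    """Forward hook: note the shape of the layer's output."""
    calls.append(tuple(output.shape))

# Attach the hook to the first conv layer
handle = toy[0].register_forward_hook(record_shape)

x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    toy(x)
    toy(x)          # the hook fires again on the second forward pass

handle.remove()     # after removal the hook no longer fires
with torch.no_grad():
    toy(x)

print(calls)
```

Three forward passes ran, but only the two before `handle.remove()` were recorded. Forgetting to remove hooks is a common source of stale or duplicated captures in notebooks.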

In [None]:
# Example: capture layer2's output
activation = {}

def hook_fn(name):
    """Return a hook function that stores the output under the given name."""
    def hook(module, input, output):
        activation[name] = output.detach().cpu()
    return hook

# Register the hook
hook_handle = model.layer2.register_forward_hook(hook_fn('layer2'))

# Run a forward pass — the hook fires automatically
with torch.no_grad():
    _ = model(img_tensor)

# Check what we captured
print(f'Captured activation shape: {activation["layer2"].shape}')
print('Expected: [1, 128, 28, 28] — 128 channels at 28x28 spatial resolution')

# Clean up
hook_handle.remove()

### TODO: Capture Activations at Three Depths

Register hooks on `model.conv1`, `model.layer2`, and `model.layer4`. Run a single forward pass. Then display a grid of activation maps from each layer.

**Expected shapes:**
- conv1: [1, 64, 112, 112]
- layer2: [1, 128, 28, 28]
- layer4: [1, 512, 7, 7]
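
These spatial sizes follow from the standard output-size formula for convolution and pooling layers. The sketch below is plain Python and assumes the usual torchvision ResNet-18 layout: a 7x7 stride-2 `conv1`, a 3x3 stride-2 max pool, and a stride-2 3x3 convolution at the start of `layer2`, `layer3`, and `layer4`. The helper name `conv_out` is made up for illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

sizes = {}
size = 224                                              # input height/width

size = conv_out(size, kernel=7, stride=2, padding=3)    # conv1
sizes['conv1'] = size
size = conv_out(size, kernel=3, stride=2, padding=1)    # maxpool
sizes['maxpool'] = size
size = conv_out(size, kernel=3, stride=1, padding=1)    # layer1 keeps resolution
sizes['layer1'] = size

# layer2, layer3, layer4 each halve the resolution with a stride-2 conv
for name in ['layer2', 'layer3', 'layer4']:
    size = conv_out(size, kernel=3, stride=2, padding=1)
    sizes[name] = size

for name, s in sizes.items():
    print(f'{name}: {s}x{s}')
```

If a hook reports a shape that disagrees with this arithmetic, the most likely causes are a different input size or a hook registered on the wrong module.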

In [None]:
# TODO: Register hooks on three layers
activations = {}

def make_hook(name):
    def hook(module, input, output):
        activations[name] = output.detach().cpu()
    return hook

hooks = [
    model.conv1.register_forward_hook(make_hook('conv1')),
    model.layer2.register_forward_hook(make_hook('layer2')),
    model.layer4.register_forward_hook(make_hook('layer4')),
]

# Forward pass — all hooks fire
with torch.no_grad():
    _ = model(img_tensor)

# Verify shapes
for name, act in activations.items():
    print(f'{name}: {act.shape}')

# Remove hooks
for h in hooks:
    h.remove()

### Display Activation Maps

Now visualize a selection of activation maps from each layer. We'll show 16 channels (4x4 grid) from each depth.
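
The first 16 channels are an arbitrary slice. As one possible alternative (not part of the original exercise), the sketch below ranks channels by mean activation and keeps the strongest ones. It runs on a random stand-in tensor so it works on its own; with real data you would use a captured tensor such as `activations['layer4']` instead.

```python
import torch

torch.manual_seed(0)

# Stand-in for a captured activation of shape [1, C, H, W]
act = torch.rand(1, 32, 7, 7)
act[0, 5] += 2.0                     # make channel 5 clearly the most active

# Rank channels by their mean activation over the spatial dimensions
mean_per_channel = act[0].mean(dim=(1, 2))       # shape [32]
top_vals, top_idx = mean_per_channel.topk(4)

# Keep only the strongest channels, preserving the batch dimension
selected = act[:, top_idx]                       # shape [1, 4, 7, 7]

print('most active channels:', top_idx.tolist())
print('selected shape:', tuple(selected.shape))
```

A tensor selected this way has the `[1, C, H, W]` layout that `show_activation_grid` expects, but the grid titles would then count from 0 rather than show the original channel indices, so keep `top_idx` around when interpreting the result.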

In [None]:
def show_activation_grid(act_tensor, layer_name, num_channels=16):
    """Display a grid of activation maps from a layer.
    
    act_tensor: shape [1, C, H, W]
    """
    act = act_tensor[0]  # remove batch dim -> [C, H, W]
    num_channels = min(num_channels, act.shape[0])
    cols = 4
    rows = (num_channels + cols - 1) // cols
    
    fig, axes = plt.subplots(rows, cols, figsize=(12, 3 * rows))
    for i, ax in enumerate(axes.flat):
        if i < num_channels:
            ax.imshow(act[i].numpy(), cmap='viridis')
            ax.set_title(f'Channel {i}', fontsize=9)
        ax.axis('off')
    
    spatial = f'{act.shape[1]}x{act.shape[2]}'
    fig.suptitle(f'{layer_name} — {act.shape[0]} channels at {spatial}', fontsize=14)
    plt.tight_layout()
    plt.show()

# Show activation maps at each depth
show_activation_grid(activations['conv1'], 'conv1 (edges, gradients)')
show_activation_grid(activations['layer2'], 'layer2 (textures, patterns)')
show_activation_grid(activations['layer4'], 'layer4 (abstract features)')

### Interpret What You See

Compare the three grids:

- **conv1** (112x112): Sharp, spatially detailed. You can see the edges and boundaries of the original image. Different channels respond to different edge orientations.

- **layer2** (28x28): Less spatially precise, more abstract. Channels respond to textures and local patterns rather than simple edges.

- **layer4** (7x7): Abstract blobs. Individual channels are **not** recognizable as objects. The representation is distributed across all 512 channels — no single channel encodes "dog" or "grass."

This is the feature hierarchy in action: **spatial resolution shrinks, channel count grows, representations go from concrete to abstract**. Exactly what you were told — now confirmed by observation.

---

## Part 3: Grad-CAM

**Question this answers:** *What in this image mattered for this prediction?*

Activation maps are class-agnostic — they show the same thing regardless of which class the model predicts. Grad-CAM adds **class specificity** by using gradients to weight the activation maps.

### The Algorithm

1. Forward pass → capture last conv layer activations
2. Backward pass → compute gradients of class score w.r.t. those activations  
3. Global average pool the gradients → one weight per channel
4. Weighted sum of activation maps → single spatial heatmap
5. ReLU → keep only positive contributions
6. Upsample and overlay on the input image
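
Steps 3 to 5 are plain arithmetic, which is easiest to check on numbers small enough to verify by hand. The sketch below uses NumPy with hand-picked activations and gradients (2 channels, 2x2 spatial) standing in for the outputs of steps 1 and 2. These values are invented for illustration, and the upsampling in step 6 is omitted.

```python
import numpy as np

# Hand-picked stand-ins for steps 1 and 2 (2 channels, 2x2 spatial)
activations = np.array([[[1.0, 0.0], [0.0, 2.0]],
                        [[0.0, 3.0], [1.0, 0.0]]])      # shape [C, H, W]
gradients = np.array([[[1.0, 0.0], [0.0, 1.0]],
                      [[-2.0, 0.0], [0.0, -2.0]]])      # d(score)/d(activations)

# Step 3: global average pool the gradients -> one weight per channel
weights = gradients.mean(axis=(1, 2))

# Step 4: weighted sum of activation maps over the channel axis
cam = (weights[:, None, None] * activations).sum(axis=0)

# Step 5: ReLU keeps only locations with positive evidence for the class
cam = np.maximum(cam, 0.0)

# Normalize to [0, 1] for display
if cam.max() > 0:
    cam = cam / cam.max()

print('channel weights:', weights)
print(cam)
```

Channel 1 gets a negative weight, so the locations where it fires are pushed below zero by the weighted sum and then removed by the ReLU. Only the locations supported by channel 0 survive in the heatmap.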

### TODO: Implement Grad-CAM

Fill in the TODO sections below. The scaffold handles hooks and display — you write the core computation.

In [None]:
def grad_cam(model, img_tensor, target_class=None):
    """Compute Grad-CAM for a given image and target class.
    
    Args:
        model: pretrained model (in eval mode)
        img_tensor: preprocessed image tensor [1, 3, 224, 224]
        target_class: class index (int). If None, uses the predicted class.
    
    Returns:
        cam: numpy array of shape [224, 224], values in [0, 1]
        target_class: the class index the heatmap was computed for
    """
    # Storage for activations and gradients
    stored = {}
    
    def forward_hook(module, input, output):
        stored['activations'] = output
    
    def backward_hook(module, grad_input, grad_output):
        stored['gradients'] = grad_output[0]
    
    # Register hooks on the last conv layer (layer4)
    fhook = model.layer4.register_forward_hook(forward_hook)
    bhook = model.layer4.register_full_backward_hook(backward_hook)
    
    # --- Step 1: Forward pass ---
    output = model(img_tensor)
    
    # Determine target class
    if target_class is None:
        target_class = output.argmax(dim=1).item()
    
    # --- Step 2: Backward pass ---
    # TODO: Zero gradients, then backward from the target class score
    # Hint: output[0, target_class].backward()
    model.zero_grad()
    class_score = output[0, target_class]
    class_score.backward()
    
    # --- Step 3: Global average pool the gradients ---
    # TODO: Compute channel weights by averaging gradients over spatial dimensions
    # stored['gradients'] has shape [1, 512, 7, 7]
    # You want one weight per channel -> shape [1, 512]
    # Hint: .mean(dim=[2, 3])
    gradients = stored['gradients']  # [1, 512, 7, 7]
    weights = gradients.mean(dim=[2, 3])  # [1, 512]
    
    # --- Step 4: Weighted sum of activation maps ---
    # TODO: Multiply each activation map by its weight and sum across channels
    # stored['activations'] has shape [1, 512, 7, 7]
    # Result should be shape [1, 1, 7, 7]
    # Hint: (weights.unsqueeze(-1).unsqueeze(-1) * activations).sum(dim=1, keepdim=True)
    activations = stored['activations']  # [1, 512, 7, 7]
    cam = (weights.unsqueeze(-1).unsqueeze(-1) * activations).sum(dim=1, keepdim=True)
    
    # --- Step 5: ReLU ---
    # TODO: Apply ReLU to keep only positive contributions
    cam = F.relu(cam)
    
    # --- Step 6: Upsample to input size ---
    cam = F.interpolate(cam, size=(224, 224), mode='bilinear', align_corners=False)
    cam = cam.squeeze().detach().cpu().numpy()
    
    # Normalize to [0, 1]
    if cam.max() > 0:
        cam = cam / cam.max()
    
    # Clean up hooks
    fhook.remove()
    bhook.remove()
    
    return cam, target_class

print('grad_cam function defined.')

### Display Utility: Heatmap Overlay

In [None]:
def show_grad_cam(img_pil, cam, class_name, ax=None):
    """Overlay Grad-CAM heatmap on the original image."""
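    # Apply the same resize + center crop the model saw, so the heatmap lines up
    # with the pixels it was computed from (a plain resize would squash the image)
    img_resized = transforms.CenterCrop(224)(transforms.Resize(256)(img_pil))
    img_np = np.array(img_resized).astype(np.float32) / 255.0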
    
    # Create heatmap
    heatmap = plt.cm.jet(cam)[:, :, :3]  # RGB from colormap
    
    # Overlay: blend original image with heatmap
    overlay = 0.5 * img_np + 0.5 * heatmap
    overlay = np.clip(overlay, 0, 1)
    
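    # Remember whether we created the figure, because ax is reassigned below
    created_fig = ax is None
    if created_fig:
        fig, ax = plt.subplots(1, 1, figsize=(6, 6))
    
    ax.imshow(overlay)
    ax.set_title(f'Grad-CAM: {class_name}', fontsize=12)
    ax.axis('off')
    
    if created_fig:
        plt.show()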

print('Display utility loaded.')

### Run Grad-CAM on the Sample Image

Apply Grad-CAM for the top predicted class. The heatmap should highlight the dog, not the background.

In [None]:
# Grad-CAM for the top predicted class
cam, pred_class = grad_cam(model, img_tensor)
class_name = categories[pred_class]
print(f'Predicted class: {class_name} (index {pred_class})')

# Show the result
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
axes[0].imshow(img_pil.resize((224, 224)))
axes[0].set_title('Original', fontsize=12)
axes[0].axis('off')
show_grad_cam(img_pil, cam, class_name, ax=axes[1])
plt.tight_layout()
plt.show()

### TODO: Class-Specific Heatmaps

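Grad-CAM is class-specific. For the same image, compute heatmaps for the **top-2 predicted classes** and display them side by side. How much the heatmaps differ depends on the classes: two unrelated classes usually light up different regions, while two visually similar classes (for example, two retriever breeds) may produce nearly the same heatmap.

For example, if the top prediction is "golden retriever" and the second is "tennis ball," the heatmaps should highlight different parts of the image.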

In [None]:
# Get top-2 predictions
top_preds = predict_top_k(img_tensor, k=2)
print(f'Top-2: {top_preds}')

# TODO: Compute Grad-CAM for each of the top-2 classes
# Hint: use the target_class parameter of grad_cam()
# You need to find the class index from the category name
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Original image
axes[0].imshow(img_pil.resize((224, 224)))
axes[0].set_title('Original', fontsize=12)
axes[0].axis('off')

# Grad-CAM for each top prediction
for i, (label, prob) in enumerate(top_preds):
    class_idx = categories.index(label)
    cam_i, _ = grad_cam(model, img_tensor, target_class=class_idx)
    show_grad_cam(img_pil, cam_i, f'{label} ({prob:.1%})', ax=axes[i + 1])

plt.tight_layout()
plt.show()

### Interpret the Results

Compare the two heatmaps. For different class labels, Grad-CAM can highlight **different spatial regions** of the same image. The activations are identical in both runs; only the gradients change. Different class scores produce different gradient signals, which weight the same activation maps differently. If your two heatmaps look almost the same, the top-2 classes probably rely on the same evidence, which is common for closely related breeds.

This is the key difference from activation maps: activation maps are the same regardless of which class you ask about. Grad-CAM is class-specific.
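
The claim that only the gradients change can be checked on a toy model. The sketch below assumes a small stand-in network with the same overall shape as ResNet's tail (conv features, global average pool, linear classifier). The `features` and `head` modules are invented for this example and have random weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for "conv features -> global average pool -> linear classifier"
features = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 3))

x = torch.randn(1, 3, 8, 8)
feats = features(x)            # one set of activation maps, shape [1, 4, 8, 8]
scores = head(feats)           # class scores, shape [1, 3]

# Gradients of two different class scores w.r.t. the SAME activations
grad_a = torch.autograd.grad(scores[0, 0], feats, retain_graph=True)[0]
grad_b = torch.autograd.grad(scores[0, 1], feats)[0]

# Step 3 of Grad-CAM: one weight per channel, computed separately per class
weights_a = grad_a.mean(dim=(2, 3))
weights_b = grad_b.mean(dim=(2, 3))

print('channel weights, class 0:', weights_a)
print('channel weights, class 1:', weights_b)
print('same weights?', torch.allclose(weights_a, weights_b))
```

Both classes share `feats`, yet they end up with different channel weights and therefore different heatmaps. In this toy setup the weights are simply the classifier's row for that class divided by the number of spatial positions, which is what you would expect from a global-average-pool head.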

---

## Part 4: Try Multiple Images

Apply Grad-CAM to different images to build your intuition. Pay attention to where the model focuses.

In [None]:
# A few diverse images to try
image_urls = {
    'cat': 'https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg',
    'car': 'https://upload.wikimedia.org/wikipedia/commons/thumb/a/a4/2019_Toyota_Corolla_Hybrid_1.8.jpg/1280px-2019_Toyota_Corolla_Hybrid_1.8.jpg',
    'bird': 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/45/Eopsaltria_australis_-_Mogo_Campground.jpg/1280px-Eopsaltria_australis_-_Mogo_Campground.jpg',
}

fig, axes = plt.subplots(len(image_urls), 2, figsize=(12, 6 * len(image_urls)))

for row, (name, url) in enumerate(image_urls.items()):
    try:
        pil_img, tensor_img = load_image_from_url(url)
        cam_result, pred_idx = grad_cam(model, tensor_img)
        pred_name = categories[pred_idx]
        
        axes[row, 0].imshow(pil_img.resize((224, 224)))
        axes[row, 0].set_title(f'{name} (original)', fontsize=12)
        axes[row, 0].axis('off')
        
        show_grad_cam(pil_img, cam_result, pred_name, ax=axes[row, 1])
    except Exception as e:
        print(f'Could not load {name}: {e}')
        axes[row, 0].text(0.5, 0.5, f'Failed to load {name}', ha='center', va='center')
        axes[row, 0].axis('off')
        axes[row, 1].axis('off')

plt.tight_layout()
plt.show()

---

## Part 5: Context vs Object — Where Does the Model Focus?

This is the most practically important part of the notebook.

A classic failure mode is **shortcut learning**: a model achieves high accuracy by relying on **spurious correlations** in the data rather than on features of the actual object. Grad-CAM can reveal this.

### The Husky vs Wolf Problem (Conceptual)

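In a well-known example from the LIME paper (Ribeiro et al., 2016), a classifier trained to distinguish "wolf" from "husky" appeared to perform well, but explanations of its predictions showed it was looking at the *background*, not the animal: snow meant wolf. The wolf photos in the training data happened to have snowy backgrounds, so background correlated with label. The predictions were often right; the reasoning was broken. That study used LIME rather than Grad-CAM, but Grad-CAM exposes the same kind of failure.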

We cannot reproduce that exact failure with a pretrained ImageNet model (it was trained on diverse data, not biased wolf/husky photos). But we **can** use Grad-CAM to ask an important question about any image: **is the model looking at the object, or at the context?** The answer might surprise you.

In [None]:
# Two images where background context is prominent:
# A husky by a lake, and a bird against foliage
test_urls = {
    'husky (with lake background)': 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Le%C3%AFka_au_bord_du_Lac_de_Laffrey.jpg/1280px-Le%C3%AFka_au_bord_du_Lac_de_Laffrey.jpg',
    'bird (with foliage background)': 'https://upload.wikimedia.org/wikipedia/commons/thumb/4/45/Eopsaltria_australis_-_Mogo_Campground.jpg/1280px-Eopsaltria_australis_-_Mogo_Campground.jpg',
}

for name, url in test_urls.items():
    try:
        pil_img, tensor_img = load_image_from_url(url)
        
        # Get predictions
        preds = predict_top_k(tensor_img, k=5)
        print(f'\n{name}')
        print('Top-5 predictions:')
        for label, prob in preds:
            print(f'  {label}: {prob:.1%}')
        
        # Grad-CAM for top prediction
        cam_result, pred_idx = grad_cam(model, tensor_img)
        
        fig, axes = plt.subplots(1, 2, figsize=(12, 6))
        axes[0].imshow(pil_img.resize((224, 224)))
        axes[0].set_title(f'{name}', fontsize=12)
        axes[0].axis('off')
        show_grad_cam(pil_img, cam_result, categories[pred_idx], ax=axes[1])
        plt.suptitle('Does the model focus on the animal or the background?', fontsize=14)
        plt.tight_layout()
        plt.show()
    except Exception as e:
        print(f'Could not load {name}: {e}')

### Key Takeaway

Look carefully at where the heatmaps landed. Did the model always focus on the animal? Or did it sometimes attend to the background context (water, foliage, ground)?

A pretrained ImageNet model is trained on diverse data, so it generally does a good job focusing on the object. But even here you may see some attention to context. For models trained on **smaller, biased datasets**, shortcut learning is far more likely — and far more dangerous.

**Correct prediction does not mean correct reasoning.** Visualization is a debugging tool — use it to verify that your model learned what you intended, not a shortcut. In the next lesson, you will fine-tune your own model and use Grad-CAM to check what it actually learned.
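
Eyeballing heatmaps works for a handful of images. For a rough number, one option is to measure how much of the heatmap falls inside a box around the object. The sketch below is an assumption-laden illustration: `cam_fraction_in_box` is a hypothetical helper, the heatmap is synthetic, and the box is hand-specified, since this notebook has no bounding-box annotations.

```python
import numpy as np

def cam_fraction_in_box(cam, box):
    """Fraction of total heatmap mass that falls inside a rectangle.

    cam: 2D array of non-negative values (e.g. a Grad-CAM heatmap)
    box: (top, left, bottom, right) in pixels, bottom/right exclusive
    """
    top, left, bottom, right = box
    total = cam.sum()
    if total == 0:
        return 0.0
    return float(cam[top:bottom, left:right].sum() / total)

# Synthetic heatmap standing in for a real Grad-CAM result
cam = np.zeros((224, 224))
cam[50:150, 60:160] = 1.0      # strong response on the "object"
cam[200:224, :] = 0.25         # weaker response along the bottom "background"

object_box = (50, 60, 150, 160)    # hypothetical hand-drawn box around the object
fraction = cam_fraction_in_box(cam, object_box)
print(f'Heatmap mass inside the object box: {fraction:.1%}')
```

A low fraction on correctly classified images is a hint, not proof, that the model leans on context. It is worth confirming by looking at the overlays themselves.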

---

## Stretch: Grad-CAM on a Wrong Prediction

If you have extra time, try finding an image where the model is **wrong**. Run Grad-CAM and see if the heatmap reveals why the model made the mistake. This is the most realistic use case — debugging production failures.

In [None]:
# TODO (stretch): Find an image the model gets wrong and run Grad-CAM on it.
# Try unusual angles, occluded objects, or ambiguous scenes.
# Does the heatmap explain the mistake?

# your_url = '...'
# your_pil, your_tensor = load_image_from_url(your_url)
# cam_result, pred_idx = grad_cam(model, your_tensor)
# show_grad_cam(your_pil, cam_result, categories[pred_idx])
# plt.show()

---

## Summary

You now have three tools for understanding CNN behavior:

| Technique | Question | Scope |
|-----------|----------|-------|
| **Filter Visualization** | What does this layer look for? | Model-level (same for all images) |
| **Activation Maps** | What did this layer find in this image? | Input-specific, class-agnostic |
| **Grad-CAM** | What in this image mattered for this prediction? | Input-specific AND class-specific |

The feature hierarchy is real — you saw it. Visualization is a debugging tool — not just a pretty picture.