# Coding Assignment 4: Advanced Computer Vision with CNNs

**Name:** [Your Name Here]  
**Student ID:** [Your Student ID]  
**Date:** [Today's Date]  

## Overview

Welcome to advanced computer vision! In this assignment, you'll implement CNN techniques that underpin many modern vision systems: ResNet-style architectures, transfer learning, model interpretability, and deployment-oriented optimization.

**Learning Goals:**
- Implement ResNet-style architectures with skip connections
- Apply transfer learning with pre-trained models
- Work with complex color image datasets (CIFAR-10)
- Analyze model interpretability using Grad-CAM
- Optimize models for deployment and efficiency
- Reflect on ethics and bias in computer vision systems

**Estimated Time:** 3-4 hours

## Setup and Imports

In [None]:
# Core libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, SubsetRandomSampler
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models

# Data manipulation and visualization
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import pandas as pd

# Machine learning utilities
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Progress tracking and utilities
from tqdm import tqdm
import time
import os
import warnings
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')

# Reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check for GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✅ Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
print(f"PyTorch version: {torch.__version__}")
print("Setup complete!")

---

# Part 1: Advanced CNN Architecture (45 minutes)

**Goal:** Implement ResNet-style architectures with skip connections and modern techniques

## 1.1 Understanding ResNet Architecture

ResNet (Residual Networks) made very deep networks trainable by adding skip connections, which ease optimization and mitigate vanishing gradients. Instead of learning the target mapping `H(x)` directly, a residual block learns the residual `F(x) = H(x) - x` and outputs `F(x) + x`, which is easier to optimize.

**Key Components:**
- **Skip Connections:** `y = F(x) + x`
- **Batch Normalization:** Normalizes layer inputs
- **ReLU Activation:** Non-linear activation function
- **Downsampling:** Reduces spatial dimensions while increasing channels

In [None]:
# Visualize the difference between traditional and residual learning
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Traditional CNN block
ax1.text(0.5, 0.9, 'Input x', ha='center', va='center', fontsize=12, 
         bbox=dict(boxstyle='round', facecolor='lightblue'))
ax1.text(0.5, 0.7, 'Conv + BN + ReLU', ha='center', va='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='lightgreen'))
ax1.text(0.5, 0.5, 'Conv + BN + ReLU', ha='center', va='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='lightgreen'))
ax1.text(0.5, 0.3, 'Learn H(x)', ha='center', va='center', fontsize=12,
         bbox=dict(boxstyle='round', facecolor='orange'))
ax1.text(0.5, 0.1, 'Output H(x)', ha='center', va='center', fontsize=12,
         bbox=dict(boxstyle='round', facecolor='lightcoral'))

# Add arrows
for y in [0.8, 0.6, 0.4, 0.2]:
    ax1.arrow(0.5, y, 0, -0.15, head_width=0.03, head_length=0.02, 
              fc='black', ec='black')

ax1.set_xlim(0, 1)
ax1.set_ylim(0, 1)
ax1.axis('off')
ax1.set_title('Traditional CNN Block', fontsize=14, fontweight='bold')

# Residual block
ax2.text(0.3, 0.9, 'Input x', ha='center', va='center', fontsize=12,
         bbox=dict(boxstyle='round', facecolor='lightblue'))
ax2.text(0.3, 0.7, 'Conv + BN + ReLU', ha='center', va='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='lightgreen'))
ax2.text(0.3, 0.5, 'Conv + BN', ha='center', va='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='lightgreen'))
ax2.text(0.7, 0.3, '+', ha='center', va='center', fontsize=20, fontweight='bold')
ax2.text(0.3, 0.3, 'Learn F(x)', ha='center', va='center', fontsize=10,
         bbox=dict(boxstyle='round', facecolor='orange'))
ax2.text(0.7, 0.1, 'Output F(x) + x', ha='center', va='center', fontsize=12,
         bbox=dict(boxstyle='round', facecolor='lightcoral'))

# Main path arrows
for y in [0.8, 0.6, 0.4]:
    ax2.arrow(0.3, y, 0, -0.15, head_width=0.02, head_length=0.02,
              fc='blue', ec='blue')

# Skip connection
ax2.arrow(0.4, 0.9, 0.25, 0, head_width=0.02, head_length=0.02,
          fc='red', ec='red', linestyle='--', linewidth=2)
ax2.arrow(0.65, 0.9, 0, -0.55, head_width=0.02, head_length=0.02,
          fc='red', ec='red', linestyle='--', linewidth=2)
ax2.arrow(0.65, 0.35, 0.03, -0.02, head_width=0.02, head_length=0.02,
          fc='red', ec='red', linestyle='--', linewidth=2)

# Final arrow
ax2.arrow(0.7, 0.25, 0, -0.1, head_width=0.02, head_length=0.02,
          fc='black', ec='black')

ax2.text(0.55, 0.95, 'Skip Connection', ha='center', va='center', fontsize=10,
         color='red', fontweight='bold')

ax2.set_xlim(0, 1)
ax2.set_ylim(0, 1)
ax2.axis('off')
ax2.set_title('Residual Block', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("🔑 Key Advantages of Skip Connections:")
print("   • Mitigate the vanishing gradient problem")
print("   • Enable training of very deep networks (100+ layers)")
print("   • Improve gradient flow during backpropagation")
print("   • Allow identity mappings when needed")

## 1.2 Implementing Residual Blocks
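
The hints in this section rely on specific kernel, stride, and padding choices. A small sketch of the standard convolution output-size formula (for dilation 1) shows why a 3x3 conv with `padding=1` preserves spatial size at stride 1, and why a 1x1 conv with the same stride keeps the shortcut aligned with the main path. The helper name `conv_output_size` is made up for illustration.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a convolution along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# 3x3 conv, stride 1, padding 1: spatial size is preserved
print(conv_output_size(32, kernel=3, stride=1, padding=1))
# 3x3 conv, stride 2, padding 1: the main path is halved
print(conv_output_size(32, kernel=3, stride=2, padding=1))
# 1x1 conv, stride 2, padding 0: the shortcut is halved to match
print(conv_output_size(32, kernel=1, stride=2, padding=0))
```

If the main path and the shortcut produced different sizes, the addition `out + identity` would fail with a shape mismatch.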

Let's implement the core building blocks of ResNet:

In [None]:
class BasicBlock(nn.Module):
    """Basic residual block for ResNet-18 and ResNet-34"""
    
    expansion = 1  # Output channels multiplier
    
    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        
        # TODO: Implement the first convolutional layer
        # HINT: Use 3x3 conv, specified stride, padding=1, bias=False
        # HINT: self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=?, stride=?, padding=?, bias=?)
        self.conv1 = None  # Your code here
        
        # TODO: Implement batch normalization for first conv
        # HINT: BatchNorm2d takes the number of output channels
        self.bn1 = None  # Your code here
        
        # TODO: Implement the second convolutional layer
        # HINT: Use 3x3 conv, stride=1, padding=1, bias=False
        # HINT: Both input and output channels should be out_channels
        self.conv2 = None  # Your code here
        
        # TODO: Implement batch normalization for second conv
        self.bn2 = None  # Your code here
        
        # Downsample for skip connection when dimensions don't match
        self.downsample = downsample
        self.stride = stride
        
    def forward(self, x):
        # Store input for skip connection
        identity = x
        
        # TODO: Implement the main path
        # HINT: Follow this sequence: conv1 -> batch_norm1 -> relu -> conv2 -> batch_norm2
        # HINT: Use F.relu() for activation
        # HINT: Structure: out = self.conv1(x), then out = self.bn1(out), etc.
        
        # Step 1: First convolution and batch norm
        out = None  # Apply conv1 to x
        out = None  # Apply bn1 to out  
        out = None  # Apply ReLU to out
        
        # Step 2: Second convolution and batch norm (no ReLU yet!)
        out = None  # Apply conv2 to out
        out = None  # Apply bn2 to out
        
        # TODO: Apply downsample to identity if needed
        # HINT: Check if self.downsample is not None, then apply it to identity
        if self.downsample is not None:
            identity = None  # Apply downsample to identity
        
        # TODO: Add skip connection and apply final ReLU
        # HINT: Add identity to out, then apply ReLU to the result
        out = None  # Add skip connection: out + identity
        out = None  # Apply final ReLU
        
        return out

# Test the basic block (this will fail until you implement the TODOs above)
print("🧱 Testing Basic ResNet Block Implementation:")
print("⚠️ This will fail until you implement the TODOs above!")

try:
    test_block = BasicBlock(64, 64)
    test_input = torch.randn(1, 64, 32, 32)
    test_output = test_block(test_input)
    print(f"✅ SUCCESS! Input shape: {test_input.shape}")
    print(f"✅ SUCCESS! Output shape: {test_output.shape}")
    print(f"✅ SUCCESS! Parameters: {sum(p.numel() for p in test_block.parameters()):,}")
    
    # Validation check
    if test_output.shape == test_input.shape:
        print("🎯 EXCELLENT! Skip connection preserved input dimensions")
    else:
        print("❌ ERROR: Output shape doesn't match input shape")
        
except Exception as e:
    print(f"❌ Implementation incomplete: {e}")
    print("💡 Make sure to replace all 'None' with proper implementations")
    print("💡 Check that your conv layers and batch norms are correctly defined")

## 1.3 Complete Advanced CNN Architecture
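
When wiring the stages together, it can help to know which feature-map shapes to expect. The sketch below traces channels and spatial size through the stem and the four stages, assuming the configuration given in the hints (3x3 convs with `padding=1`, stage widths 64/128/256/512, strides 1/2/2/2). `trace_shapes` is a hypothetical helper, not part of the assignment code.

```python
def trace_shapes(input_size=32, widths=(64, 128, 256, 512), strides=(1, 2, 2, 2)):
    """Return (name, channels, spatial size) after the stem and each stage."""
    size = input_size  # the 3x3, stride-1, padding-1 stem keeps the input size
    shapes = [("stem", widths[0], size)]
    for i, (channels, stride) in enumerate(zip(widths, strides), start=1):
        # Only the first block of a stage strides; for a 3x3 conv with
        # padding=1 the output size is (size - 1) // stride + 1.
        size = (size - 1) // stride + 1
        shapes.append((f"layer{i}", channels, size))
    return shapes


for name, channels, size in trace_shapes():
    print(f"{name:7s} -> {channels:3d} x {size} x {size}")
```

The last stage ends at 512 channels on a 4x4 grid, which global average pooling reduces to a 512-dimensional vector for the classifier.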

Now let's build a complete ResNet-style architecture for CIFAR-10:

In [None]:
class AdvancedCNN(nn.Module):
    """ResNet-style CNN for CIFAR-10 classification"""
    
    def __init__(self, block=BasicBlock, layers=(2, 2, 2, 2), num_classes=10):
        super(AdvancedCNN, self).__init__()
        
        self.in_channels = 64
        
        # TODO: Initial convolution layer
        # HINT: For CIFAR-10, use a 3x3 conv instead of the 7x7 conv used by ImageNet ResNets
        # HINT: Use stride=1, padding=1, bias=False to preserve spatial dimensions
        # HINT: Input channels=3 (RGB), output channels=64
        self.conv1 = None  # Your code here
        
        # TODO: Batch normalization for initial conv
        self.bn1 = None  # Your code here
        
        # TODO: Create residual layers using _make_layer helper
        # HINT: Each layer contains multiple residual blocks
        # HINT: layer1: 64 channels, layers[0] blocks, stride=1
        # HINT: layer2: 128 channels, layers[1] blocks, stride=2 (downsamples)
        # HINT: layer3: 256 channels, layers[2] blocks, stride=2 (downsamples)  
        # HINT: layer4: 512 channels, layers[3] blocks, stride=2 (downsamples)
        self.layer1 = None  # Your code here - use self._make_layer(block, 64, layers[0], stride=1)
        self.layer2 = None  # Your code here - use self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = None  # Your code here
        self.layer4 = None  # Your code here
        
        # TODO: Global average pooling - reduces feature maps to 1x1
        # HINT: Use nn.AdaptiveAvgPool2d((1, 1)) to get 1x1 output regardless of input size
        self.avgpool = None  # Your code here
        
        # TODO: Final classifier layer
        # HINT: Input features = 512 * block.expansion, output = num_classes
        self.fc = None  # Your code here
        
        # TODO: Dropout for regularization
        # HINT: Use nn.Dropout with p=0.1
        self.dropout = None  # Your code here
        
        # Initialize weights
        self._initialize_weights()
        
    def _make_layer(self, block, out_channels, blocks, stride=1):
        """Create a layer with multiple residual blocks"""
        downsample = None
        
        # TODO: Create downsample layer if needed
        # HINT: Downsample needed when stride != 1 OR when channels don't match
        # HINT: Check: stride != 1 or self.in_channels != out_channels * block.expansion
        # HINT: Downsample uses 1x1 conv + batch norm to match dimensions
        if None:  # Your condition here
            downsample = nn.Sequential(
                None,  # 1x1 conv: self.in_channels -> out_channels * block.expansion, stride=stride
                None   # Batch norm for out_channels * block.expansion
            )
        
        layers = []
        
        # TODO: First block (may need downsampling)
        # HINT: Use block(self.in_channels, out_channels, stride, downsample)
        layers.append(None)  # Your code here
        
        # Update in_channels for subsequent blocks
        self.in_channels = out_channels * block.expansion
        
        # TODO: Add remaining blocks
        # HINT: Loop from 1 to blocks (since first block already added)
        # HINT: Remaining blocks: block(self.in_channels, out_channels) - no stride or downsample
        for _ in range(1, blocks):
            layers.append(None)  # Your code here
        
        return nn.Sequential(*layers)
    
    def _initialize_weights(self):
        """Initialize network weights using He initialization"""
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
    
    def forward(self, x):
        # TODO: Implement forward pass
        # HINT: Follow this sequence: initial conv -> residual layers -> global pooling -> classifier
        
        # TODO: Initial convolution block
        # HINT: conv1 -> bn1 -> relu
        x = None  # Apply conv1
        x = None  # Apply bn1  
        x = None  # Apply ReLU
        
        # TODO: Pass through residual layers
        # HINT: Apply layer1, layer2, layer3, layer4 in sequence
        x = None  # Apply layer1
        x = None  # Apply layer2
        x = None  # Apply layer3
        x = None  # Apply layer4
        
        # TODO: Global average pooling
        # HINT: Reduces spatial dimensions to 1x1
        x = None  # Apply avgpool
        
        # TODO: Flatten for classifier
        # HINT: Use torch.flatten(x, 1) to flatten all dims except batch
        x = None  # Flatten
        
        # TODO: Apply dropout and final classifier
        x = None  # Apply dropout
        x = None  # Apply fc layer
        
        return x
    
    def count_parameters(self):
        """Count total trainable parameters"""
        return sum(p.numel() for p in self.parameters() if p.requires_grad)

# Test the advanced CNN architecture (will fail until BasicBlock and AdvancedCNN are implemented)
print("🏗️ Testing Advanced CNN Architecture")
print("⚠️ This requires both BasicBlock and AdvancedCNN to be implemented!")

try:
    advanced_cnn = AdvancedCNN().to(device)
    
    # Test forward pass
    test_input = torch.randn(1, 3, 32, 32).to(device)
    test_output = advanced_cnn(test_input)
    
    print(f"✅ SUCCESS! Architecture created successfully!")
    print(f"✅ Input shape: {test_input.shape}")
    print(f"✅ Output shape: {test_output.shape}")
    print(f"✅ Total parameters: {advanced_cnn.count_parameters():,}")
    
    # Validation checks
    if test_output.shape == (1, 10):
        print("🎯 EXCELLENT! Output has correct shape for CIFAR-10 (batch_size, 10)")
    else:
        print(f"❌ ERROR: Expected output shape (1, 10), got {test_output.shape}")
        
    param_count = advanced_cnn.count_parameters()
    if 1000000 < param_count < 12000000:  # Reasonable range for ResNet
        print(f"🎯 EXCELLENT! Parameter count {param_count:,} is in reasonable range")
    else:
        print(f"⚠️ WARNING: Parameter count {param_count:,} seems unusual")
    
    print(f"\n📋 Architecture Summary:")
    print(f"   • Initial conv: 3 → 64 channels")
    print(f"   • Layer 1: 64 → 64 channels (2 blocks)")
    print(f"   • Layer 2: 64 → 128 channels (2 blocks, downsample)")
    print(f"   • Layer 3: 128 → 256 channels (2 blocks, downsample)")
    print(f"   • Layer 4: 256 → 512 channels (2 blocks, downsample)")
    print(f"   • Global average pooling + classifier")
    
except Exception as e:
    print(f"❌ Implementation incomplete: {e}")
    print("💡 Common issues:")
    print("   • Make sure BasicBlock is implemented first")
    print("   • Check that all 'None' are replaced with proper code")
    print("   • Verify layer dimensions and channel counts")
    print("   • Ensure _make_layer method is correctly implemented")

## 1.4 Architecture Comparison
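
Before measuring anything, the parameter counts of the three variants compared in this section can be estimated by hand. The sketch below is an analytic count under stated assumptions: bias-free convolutions, two learnable parameters per BatchNorm channel, a 3x3 stem, and a 1x1 conv + BatchNorm shortcut whenever the stride or width changes. `count_resnet_params` is a hypothetical helper, and a completed `AdvancedCNN` should only match it if it follows the hints exactly.

```python
def count_resnet_params(layers, num_classes=10, widths=(64, 128, 256, 512)):
    """Analytic parameter count for a BasicBlock ResNet with a 3x3 stem."""

    def conv(c_in, c_out, k):
        return c_in * c_out * k * k  # bias=False

    def bn(c):
        return 2 * c  # weight and bias per channel

    total = conv(3, widths[0], 3) + bn(widths[0])  # stem
    in_ch = widths[0]
    for stage, (out_ch, n_blocks) in enumerate(zip(widths, layers)):
        stage_stride = 1 if stage == 0 else 2
        for block in range(n_blocks):
            stride = stage_stride if block == 0 else 1
            total += conv(in_ch, out_ch, 3) + bn(out_ch)   # conv1 + bn1
            total += conv(out_ch, out_ch, 3) + bn(out_ch)  # conv2 + bn2
            if stride != 1 or in_ch != out_ch:
                total += conv(in_ch, out_ch, 1) + bn(out_ch)  # shortcut
            in_ch = out_ch
    total += in_ch * num_classes + num_classes  # fully connected layer
    return total


for name, layers in [("Shallow", [1, 1, 1, 1]),
                     ("Standard", [2, 2, 2, 2]),
                     ("Deep", [3, 4, 6, 3])]:
    print(f"{name:9s} {count_resnet_params(layers):>12,}")
```

Comparing these numbers with `count_parameters()` on your own models is a quick way to catch a missing shortcut or an accidental `bias=True`.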

Let's compare different architectural choices:

In [None]:
# TODO: Architecture comparison experiment
print("🔍 Architecture Comparison Challenge")
print("⚠️  Complete this section after implementing BasicBlock and AdvancedCNN")

# TODO: Create different architecture variants for comparison
# HINT: Use different layer configurations: [1,1,1,1], [2,2,2,2], [3,4,6,3]
# HINT: Compare shallow vs standard vs deep ResNet architectures

print("🎯 Your Task:")
print("1. Create 3 different AdvancedCNN models with varying depths")
print("2. Count parameters for each model")
print("3. Measure inference time for each model")
print("4. Analyze the trade-offs between model complexity and speed")
print("5. Create visualizations comparing the architectures")

print("\n💡 Implementation Guide:")
print("   • Shallow ResNet: AdvancedCNN(layers=[1, 1, 1, 1])")
print("   • Standard ResNet: AdvancedCNN(layers=[2, 2, 2, 2])")  
print("   • Deep ResNet: AdvancedCNN(layers=[3, 4, 6, 3])")

print("\n📊 What to analyze:")
print("   • Parameter count scaling with depth")
print("   • Inference time vs model size")
print("   • Memory usage considerations")
print("   • Accuracy vs efficiency trade-offs")

print("\n🎨 Suggested visualizations:")
print("   • Bar charts comparing parameter counts")
print("   • Scatter plot of inference time vs parameters")
print("   • Architecture diagram showing layer depths")

# TODO: Implement this comparison after completing the basic architecture
try:
    # Test if AdvancedCNN is properly implemented
    test_model = AdvancedCNN(layers=[1, 1, 1, 1])
    print("✅ Ready for architecture comparison!")
    
    # Your comparison code will go here
    # architectures = {
    #     'Shallow ResNet': AdvancedCNN(layers=[1, 1, 1, 1]),
    #     'Standard ResNet': AdvancedCNN(layers=[2, 2, 2, 2]),
    #     'Deep ResNet': AdvancedCNN(layers=[3, 4, 6, 3]),
    # }
    
except Exception as e:
    print(f"⚠️ AdvancedCNN not ready: {e}")
    print("💡 Complete the architecture implementation first")

---

# Part 2: CIFAR-10 Classification Challenge (60 minutes)

**Goal:** Work with complex color image dataset and implement comprehensive data augmentation

## 2.1 CIFAR-10 Dataset Exploration

CIFAR-10 is significantly more challenging than MNIST:
- **32×32 color images** (vs 28×28 grayscale)
- **10 diverse classes** (airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, trucks)
- **Real-world objects** with varying poses, lighting, backgrounds

In [None]:
# Load CIFAR-10 dataset
print("📦 Loading CIFAR-10 Dataset")

# Basic transforms for exploration
basic_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

# Load CIFAR-10 train and test sets (already implemented for you)
train_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=basic_transform
)

test_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=basic_transform
)

# CIFAR-10 class names
cifar10_classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']

print(f"Training samples: {len(train_dataset):,}")
print(f"Test samples: {len(test_dataset):,}")
print(f"Image shape: {train_dataset[0][0].shape}")
print(f"Classes: {cifar10_classes}")

# Analyze class distribution
# Read the stored labels directly; indexing the dataset would decode and
# transform all 50,000 images just to get their labels.
train_labels = train_dataset.targets
class_counts = np.bincount(train_labels)

print(f"\n📊 Class Distribution:")
for i, (class_name, count) in enumerate(zip(cifar10_classes, class_counts)):
    print(f"   {i}: {class_name:12} - {count:,} samples")

In [None]:
# Visualize CIFAR-10 samples
def show_cifar_samples():
    """Display sample images from each CIFAR-10 class"""
    
    # Load dataset without normalization for visualization
    viz_dataset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=False,
        transform=transforms.ToTensor()
    )
    
    fig, axes = plt.subplots(2, 10, figsize=(20, 8))
    fig.suptitle('CIFAR-10 Dataset Samples', fontsize=16, fontweight='bold')
    
    # Find samples for each class
    class_samples = {i: [] for i in range(10)}
    
    for idx, (image, label) in enumerate(viz_dataset):
        if len(class_samples[label]) < 2:
            class_samples[label].append((image, idx))
        
        # Stop when we have enough samples
        if all(len(samples) >= 2 for samples in class_samples.values()):
            break
    
    # Display samples (already implemented for you)
    for class_idx in range(10):
        for sample_idx in range(2):
            image, orig_idx = class_samples[class_idx][sample_idx]
            
            # Convert tensor to numpy for matplotlib
            img_np = image.permute(1, 2, 0).numpy()
            
            axes[sample_idx, class_idx].imshow(img_np)
            axes[sample_idx, class_idx].set_title(
                f'{cifar10_classes[class_idx]}\n#{orig_idx}', fontsize=10
            )
            axes[sample_idx, class_idx].axis('off')
    
    plt.tight_layout()
    plt.show()

show_cifar_samples()

print("🔍 Key Observations:")
print("   • Much more complex than MNIST digits")
print("   • Significant intra-class variation (different poses, colors)")
print("   • Challenging inter-class similarities (cat vs dog, deer vs horse)")
print("   • Real-world objects with backgrounds and occlusion")
print("   • This is why we need advanced CNN architectures!")

## 2.2 Advanced Data Augmentation Pipeline
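
Every transform pipeline in this section ends with `Normalize`, so it is useful to know what value range to expect afterwards. The sketch below applies `(x - mean) / std` to the extreme pixel values 0 and 1 using the CIFAR-10 statistics quoted in this assignment. The helper `normalized_bounds` is made up for illustration.

```python
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2023, 0.1994, 0.2010)


def normalized_bounds(mean, std):
    """Smallest and largest value any channel can take after Normalize."""
    lows = [(0.0 - m) / s for m, s in zip(mean, std)]   # pixel value 0
    highs = [(1.0 - m) / s for m, s in zip(mean, std)]  # pixel value 1
    return min(lows), max(highs)


low, high = normalized_bounds(CIFAR10_MEAN, CIFAR10_STD)
print(f"normalized range: [{low:.3f}, {high:.3f}]")
```

Normalized pixels therefore fall between roughly -2.43 and 2.75. A sample whose range is still [0, 1] usually means `Normalize` was left out of the pipeline.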

Data augmentation is crucial for CIFAR-10 to achieve good generalization:

In [None]:
# Define comprehensive data augmentation
print("🎨 Implementing Advanced Data Augmentation")

# TODO: Create training transforms with augmentation
# HINT: Data augmentation helps the model generalize by showing it variations of the training data
# HINT: Common augmentations: flip, rotate, translate, color changes, erasing
# HINT: Flip, rotation, translation, and color jitter are applied to the PIL image, before ToTensor()
# HINT: RandomErasing only accepts tensors, so it must come after ToTensor() (here: after Normalize())

# HINT: CIFAR-10 normalization values:
# mean = (0.4914, 0.4822, 0.4465) - computed from training set
# std = (0.2023, 0.1994, 0.2010) - computed from training set

train_transform = transforms.Compose([
    # TODO: Add horizontal flip augmentation
    # HINT: Use transforms.RandomHorizontalFlip(p=0.5) for 50% chance
    None,  # Your code here
    
    # TODO: Add rotation augmentation  
    # HINT: Use transforms.RandomRotation(degrees=10) for ±10 degree rotation
    None,  # Your code here
    
    # TODO: Add translation augmentation
    # HINT: Use transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)) for ±10% translation
    None,  # Your code here
    
    # TODO: Add color jitter augmentation
    # HINT: Use transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)
    None,  # Your code here
    
    # TODO: Convert to tensor
    # HINT: Use transforms.ToTensor()
    None,  # Your code here
    
    # TODO: Normalize with CIFAR-10 statistics
    # HINT: Use transforms.Normalize(mean, std) with values above
    None,  # Your code here
    
    # TODO: Add random erasing augmentation (optional but effective)
    # HINT: Use transforms.RandomErasing(p=0.1, scale=(0.02, 0.25)) to randomly erase patches
    # HINT: This transform operates on tensors, which is why it comes last
    None   # Your code here
])

# TODO: Create test transforms (no augmentation for consistent evaluation)
# HINT: Test set should only have ToTensor() and Normalize() - no augmentation!
test_transform = transforms.Compose([
    # TODO: Convert to tensor
    None,  # Your code here
    
    # TODO: Normalize with same statistics as training
    None   # Your code here
])

# Test your transforms
print("🧪 Testing Transform Implementations...")

try:
    # Create augmented datasets
    train_dataset_aug = torchvision.datasets.CIFAR10(
        root='./data', train=True, transform=train_transform
    )

    test_dataset_clean = torchvision.datasets.CIFAR10(
        root='./data', train=False, transform=test_transform
    )
    
    # Test that transforms work
    sample_img, sample_label = train_dataset_aug[0]
    
    print(f"✅ SUCCESS! Augmented dataset created")
    print(f"✅ Training transforms: {len([t for t in train_transform.transforms if t is not None])} operations")
    print(f"✅ Test transforms: {len([t for t in test_transform.transforms if t is not None])} operations")
    print(f"✅ Sample image shape: {sample_img.shape}")
    print(f"✅ Sample image range: [{sample_img.min():.3f}, {sample_img.max():.3f}]")
    
    # Validation checks
    if sample_img.shape == (3, 32, 32):
        print("🎯 EXCELLENT! Sample has correct CIFAR-10 dimensions")
    else:
        print(f"❌ ERROR: Expected (3, 32, 32), got {sample_img.shape}")
        
    if -3 < sample_img.min() < 0 < sample_img.max() < 3:
        print("🎯 EXCELLENT! Normalization applied correctly (values in expected range)")
    else:
        print(f"⚠️ WARNING: Unexpected normalized range [{sample_img.min():.3f}, {sample_img.max():.3f}]")
        print("   Expected values within roughly [-2.43, 2.75] after normalization")
        
except Exception as e:
    print(f"❌ Implementation incomplete: {e}")
    print("💡 Common issues:")
    print("   • Replace all 'None' with proper transform implementations")
    print("   • Make sure ToTensor() and Normalize() are included")
    print("   • Check that normalization values are tuples: (mean1, mean2, mean3), (std1, std2, std3)")
    print("   • Ensure transforms are in logical order")

In [None]:
# Visualize augmentation effects
def visualize_augmentation():
    """Show the effect of data augmentation on sample images"""
    
    # Get original image
    original_dataset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=False,
        transform=transforms.ToTensor()
    )
    
    # Sample image
    img_idx = 100
    original_img, label = original_dataset[img_idx]
    
    fig, axes = plt.subplots(2, 8, figsize=(20, 8))
    fig.suptitle(f'Data Augmentation Effects - {cifar10_classes[label]}', 
                 fontsize=16, fontweight='bold')
    
    # Show original
    axes[0, 0].imshow(original_img.permute(1, 2, 0))
    axes[0, 0].set_title('Original', fontweight='bold')
    axes[0, 0].axis('off')
    
    # Show augmented versions: one pipeline per augmentation so each effect is visible in isolation
    augmentations = [
        transforms.Compose([transforms.RandomHorizontalFlip(p=1.0), transforms.ToTensor()]),
        transforms.Compose([transforms.RandomRotation(degrees=15), transforms.ToTensor()]),
        transforms.Compose([transforms.ColorJitter(brightness=0.3), transforms.ToTensor()]),
        transforms.Compose([transforms.ColorJitter(contrast=0.3), transforms.ToTensor()]),
        transforms.Compose([transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)), transforms.ToTensor()]),
        transforms.Compose([transforms.ColorJitter(saturation=0.3), transforms.ToTensor()]),
        # RandomErasing only works on tensors, so ToTensor() must come first
        transforms.Compose([transforms.ToTensor(), transforms.RandomErasing(p=1.0)])
    ]
    
    aug_names = ['Horizontal Flip', 'Rotation', 'Brightness', 'Contrast', 
                 'Translation', 'Saturation', 'Random Erasing']
    
    # Convert to PIL for augmentation
    pil_img = transforms.ToPILImage()(original_img)
    
    for i, (aug, name) in enumerate(zip(augmentations, aug_names)):
        aug_img = aug(pil_img)
        
        axes[0, i+1].imshow(aug_img.permute(1, 2, 0))
        axes[0, i+1].set_title(name, fontsize=10)
        axes[0, i+1].axis('off')
    
    # Show multiple random augmentations
    full_aug = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=10),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.ToTensor()
    ])
    
    for i in range(8):
        aug_img = full_aug(pil_img)
        axes[1, i].imshow(aug_img.permute(1, 2, 0))
        axes[1, i].set_title(f'Combined Aug {i+1}', fontsize=10)
        axes[1, i].axis('off')
    
    plt.tight_layout()
    plt.show()

visualize_augmentation()

print("✨ Augmentation Benefits:")
print("   • Increases effective dataset size")
print("   • Improves model robustness to variations")
print("   • Reduces overfitting on training data")
print("   • Simulates real-world image variations")

## 2.3 Training Setup and Execution
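
The training function in this section pairs SGD with a step-decay schedule (`step_size=7`, `gamma=0.1`). As a rough sketch of what that schedule produces over the default 20 epochs, the closed form below assumes the scheduler is stepped exactly once at the end of every epoch. `step_lr` is a hypothetical helper that mirrors the decay rule, not the PyTorch scheduler itself.

```python
def step_lr(initial_lr: float, epoch: int, step_size: int = 7, gamma: float = 0.1) -> float:
    """Learning rate in effect during `epoch` (0-indexed) under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


schedule = [step_lr(0.1, epoch) for epoch in range(20)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.4f}")
```

Epochs 1 to 7 run at 0.1, epochs 8 to 14 at 0.01, and epochs 15 to 20 at 0.001. The `learning_rates` entry in the training history should show the same three plateaus.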

Now let's train our advanced CNN on CIFAR-10:

In [None]:
def train_model(model, train_loader, val_loader, num_epochs=20, learning_rate=0.1):
    """Comprehensive training function with learning rate scheduling"""
    
    # TODO: Setup training components
    # HINT: You need loss function, optimizer, and learning rate scheduler
    
    # TODO: Define loss function for multi-class classification
    # HINT: Use nn.CrossEntropyLoss() for CIFAR-10 classification
    criterion = None  # Your code here
    
    # TODO: Define optimizer
    # HINT: Use optim.SGD with momentum=0.9, weight_decay=5e-4 for good results
    # HINT: SGD(model.parameters(), lr=learning_rate, momentum=?, weight_decay=?)
    optimizer = None  # Your code here
    
    # TODO: Define learning rate scheduler
    # HINT: Use optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
    # HINT: This reduces LR by 10x every 7 epochs
    scheduler = None  # Your code here
    
    # Training history tracking
    history = {
        'train_loss': [], 'train_acc': [],
        'val_loss': [], 'val_acc': [],
        'learning_rates': []
    }
    
    print(f"🚀 Training {model.__class__.__name__}")
    print(f"Epochs: {num_epochs}, Initial LR: {learning_rate}")
    print(f"Optimizer: SGD with momentum, Weight decay: 5e-4")
    print("=" * 60)
    
    best_val_acc = 0.0
    
    for epoch in range(num_epochs):
        # TODO: TRAINING PHASE
        # HINT: Set model to training mode for dropout/batch norm
        model.train()  # Already implemented for you
        
        train_loss = 0.0
        train_correct = 0
        train_total = 0
        
        train_pbar = tqdm(train_loader, desc=f'Epoch {epoch+1:2d}/{num_epochs} [Train]')
        
        for batch_idx, (data, target) in enumerate(train_pbar):
            data, target = data.to(device), target.to(device)
            
            # TODO: Training step - implement the core training loop
            # HINT: 1. Zero gradients, 2. Forward pass, 3. Compute loss, 4. Backward pass, 5. Update weights
            
            # Step 1: Zero gradients
            None  # Your code here - use optimizer
            
            # Step 2: Forward pass
            output = None  # Your code here - pass data through model
            
            # Step 3: Compute loss
            loss = None  # Your code here - use criterion
            
            # Step 4: Backward pass
            None  # Your code here - compute gradients
            
            # Step 5: Update weights
            None  # Your code here - use optimizer
            
            # Statistics tracking (implemented for you)
            train_loss += loss.item()
            predicted = output.argmax(dim=1)
            train_total += target.size(0)
            train_correct += (predicted == target).sum().item()
            
            # Update progress bar
            if batch_idx % 50 == 0:
                current_acc = 100. * train_correct / train_total
                train_pbar.set_postfix({
                    'Loss': f'{loss.item():.4f}',
                    'Acc': f'{current_acc:.2f}%',
                    'LR': f'{optimizer.param_groups[0]["lr"]:.6f}'
                })
        
        # TODO: VALIDATION PHASE
        # HINT: Set model to evaluation mode and use torch.no_grad() for efficiency
        
        # Set model to evaluation mode
        model.eval()  # Already implemented
        
        val_loss = 0.0
        val_correct = 0
        val_total = 0
        
        # TODO: Disable gradient computation for validation
        # HINT: Use 'with torch.no_grad():' context manager
        with None:  # Your code here
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                
                # TODO: Forward pass only (no training)
                output = None  # Your code here - model forward pass
                val_loss += criterion(output, target).item()
                
                # Calculate accuracy
                _, predicted = torch.max(output.data, 1)
                val_total += target.size(0)
                val_correct += (predicted == target).sum().item()
        
        # Calculate epoch metrics
        epoch_train_loss = train_loss / len(train_loader)
        epoch_train_acc = 100. * train_correct / train_total
        epoch_val_loss = val_loss / len(val_loader)
        epoch_val_acc = 100. * val_correct / val_total
        
        # Store history
        history['train_loss'].append(epoch_train_loss)
        history['train_acc'].append(epoch_train_acc)
        history['val_loss'].append(epoch_val_loss)
        history['val_acc'].append(epoch_val_acc)
        history['learning_rates'].append(optimizer.param_groups[0]['lr'])
        
        # TODO: Update learning rate
        # HINT: Use scheduler.step() to update learning rate according to schedule
        None  # Your code here
        
        # Save best model (implemented for you)
        if epoch_val_acc > best_val_acc:
            best_val_acc = epoch_val_acc
            torch.save(model.state_dict(), 'best_model.pth')
        
        # Print epoch summary
        print(f"Epoch {epoch+1:2d}: Train Loss: {epoch_train_loss:.4f}, Train Acc: {epoch_train_acc:.2f}%, "
              f"Val Loss: {epoch_val_loss:.4f}, Val Acc: {epoch_val_acc:.2f}%")
    
    print(f"\n🎯 Training Complete!")
    print(f"Best Validation Accuracy: {best_val_acc:.2f}%")
    
    return history, best_val_acc

# TODO: Create data loaders
# HINT: Data loaders handle batching and shuffling for training

batch_size = 128

# TODO: Create train/validation split
# HINT: Use 90% for training, 10% for validation
# HINT: Use torch.utils.data.random_split(dataset, [train_size, val_size])

train_size = None  # Calculate as int(0.9 * len(train_dataset_aug))
val_size = None    # Calculate as len(train_dataset_aug) - train_size

# TODO: Split the augmented training dataset
# HINT: train_subset, val_subset = torch.utils.data.random_split(train_dataset_aug, [train_size, val_size])
train_subset, val_subset = None, None  # Your code here

# TODO: Create data loaders
# HINT: Use DataLoader(dataset, batch_size, shuffle=True/False, num_workers=2)
# HINT: Training data should be shuffled, validation/test data should not
train_loader = None  # Your code here - DataLoader for train_subset, shuffle=True
val_loader = None    # Your code here - DataLoader for val_subset, shuffle=False  
test_loader = None   # Your code here - DataLoader for test_dataset_clean, shuffle=False

print("🧪 Testing Data Loader Setup...")

try:
    print(f"📦 Data Loaders Created:")
    print(f"   Training batches: {len(train_loader)}")
    print(f"   Validation batches: {len(val_loader)}")
    print(f"   Test batches: {len(test_loader)}")
    print(f"   Batch size: {batch_size}")
    
    # Test a batch
    sample_batch = next(iter(train_loader))
    print(f"✅ Sample batch shape: {sample_batch[0].shape}")
    print(f"✅ Sample labels shape: {sample_batch[1].shape}")
    
    # Validation checks
    if sample_batch[0].shape[0] == batch_size:
        print("🎯 EXCELLENT! Batch size is correct")
    else:
        print(f"⚠️ Note: First batch has {sample_batch[0].shape[0]} samples, expected {batch_size}")
        
except Exception as e:
    print(f"❌ Data loader setup incomplete: {e}")
    print("💡 Common issues:")
    print("   • Make sure train_dataset_aug and test_dataset_clean are defined")
    print("   • Check that train_size and val_size calculations are correct")
    print("   • Verify DataLoader parameters are properly set")
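
The 90/10 split, the shuffle settings, and the five-step training update described in this cell can be sketched on synthetic data. Everything prefixed with `toy_` is a placeholder invented for illustration (random tensors instead of CIFAR-10, a single linear layer instead of `AdvancedCNN`), so treat it as a minimal sketch rather than the assignment solution.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-in for CIFAR-10: 200 random "images" with 10 classes
toy_dataset = TensorDataset(torch.randn(200, 3, 32, 32), torch.randint(0, 10, (200,)))

# 90/10 train/validation split
toy_train_size = int(0.9 * len(toy_dataset))
toy_val_size = len(toy_dataset) - toy_train_size
toy_train, toy_val = random_split(toy_dataset, [toy_train_size, toy_val_size])

# Shuffle only the training data
toy_train_loader = DataLoader(toy_train, batch_size=32, shuffle=True)
toy_val_loader = DataLoader(toy_val, batch_size=32, shuffle=False)

# Tiny placeholder model (not the assignment's AdvancedCNN)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9)

toy_model.train()
for toy_x, toy_y in toy_train_loader:
    toy_optimizer.zero_grad()                  # 1. clear gradients from the previous step
    toy_out = toy_model(toy_x)                 # 2. forward pass
    toy_loss = toy_criterion(toy_out, toy_y)   # 3. compute loss
    toy_loss.backward()                        # 4. backpropagate
    toy_optimizer.step()                       # 5. update weights

toy_model.eval()
toy_val_correct = 0
with torch.no_grad():                          # no gradient tracking during validation
    for toy_x, toy_y in toy_val_loader:
        toy_val_correct += (toy_model(toy_x).argmax(dim=1) == toy_y).sum().item()

print(f"train/val sizes: {len(toy_train)}/{len(toy_val)}, val correct: {toy_val_correct}")
```

With random labels the validation count should hover around chance; the point is the order of the calls, not the score.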

In [None]:
# TODO: Train your advanced CNN on CIFAR-10
print("🔥 Advanced CNN Training Challenge")
print("⚠️  This section requires all previous components to be implemented!")

print("🎯 Your Training Task:")
print("1. Create a fresh AdvancedCNN model")
print("2. Train it using your train_model function")
print("3. Monitor training progress and validation accuracy")
print("4. Analyze the results and training curves")
print("5. Compare with transfer learning approaches")

print("\n⚙️ Training Parameters to Experiment With:")
print("   • Number of epochs: Start with 10, increase to 50+ for best results")
print("   • Learning rate: Try 0.1, 0.01, 0.001")
print("   • Batch size: 64, 128, 256")
print("   • Optimization: SGD vs Adam")

print("\n📊 What to Monitor:")
print("   • Training vs validation accuracy (watch for overfitting)")
print("   • Loss curves (should decrease over time)")
print("   • Learning rate schedule effects")
print("   • Time per epoch and total training time")

print("\n🎯 Expected Performance:")
print("   • Random baseline: ~10% (1/10 classes)")
print("   • Simple CNN: ~60-70%")
print("   • Well-tuned ResNet: ~80-90%")
print("   • State-of-the-art: ~95%+")

# TODO: Implement your training here
try:
    # Test if all components are ready
    print("\n🧪 Testing Component Readiness...")
    
    # Check model
    advanced_cnn = AdvancedCNN().to(device)
    print("✅ AdvancedCNN model ready")
    
    # Check data loaders
    if 'train_loader' in locals() and train_loader is not None:
        print("✅ Data loaders ready")
    else:
        print("❌ Data loaders not ready - implement data loading section first")
        
    # Check training function
    if 'train_model' in locals():
        print("✅ Training function defined")
    else:
        print("❌ Training function not ready - implement training section first")
    
    print("\n🚀 Ready to train! Add your training code here:")
    print("# Your training code:")
    print("# start_time = time.time()")
    print("# history, best_acc = train_model(")
    print("#     advanced_cnn, train_loader, val_loader,")
    print("#     num_epochs=10, learning_rate=0.1")
    print("# )")
    print("# training_time = time.time() - start_time")
    print("# print(f'Training completed in {training_time:.1f} seconds')")
    print("# print(f'Best validation accuracy: {best_acc:.2f}%')")
    
    # TODO: Uncomment and modify the training code above
    # Remember: Training will take 15-30 minutes depending on your hardware
    
    print("\n💡 Training Tips:")
    print("   • Use GPU if available for faster training")
    print("   • Start with fewer epochs to test your implementation")
    print("   • Save model checkpoints during training")
    print("   • Monitor for overfitting (val_acc plateau while train_acc increases)")
    print("   • Try different hyperparameters if results are poor")
    
except Exception as e:
    print(f"❌ Components not ready: {e}")
    print("💡 Complete the previous sections first:")
    print("   • BasicBlock implementation")
    print("   • AdvancedCNN implementation")
    print("   • Data augmentation and loading")
    print("   • Training function implementation")
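
To see what "learning rate schedule effects" look like before a long run, the schedule can be stepped on its own. The sketch below assumes a `StepLR` schedule, which may differ from the scheduler configured in `train_model`; the `sched_` names and the tiny linear model are placeholders.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model; only its parameters matter for this illustration
sched_model = nn.Linear(4, 2)
sched_optimizer = optim.SGD(sched_model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# Assumed schedule for illustration: multiply the LR by 0.1 every 3 epochs
sched = optim.lr_scheduler.StepLR(sched_optimizer, step_size=3, gamma=0.1)

lr_history = []
for epoch in range(9):
    lr_history.append(sched_optimizer.param_groups[0]['lr'])  # LR used during this epoch
    sched_optimizer.step()   # stands in for a full epoch of training updates
    sched.step()             # advance the schedule once per epoch

print([f"{lr:.4f}" for lr in lr_history])
```

Recording the rate before calling `sched.step()` mirrors how `history['learning_rates']` is filled in the training function.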

---

# Part 3: Transfer Learning & Fine-tuning (45 minutes)

**Goal:** Apply transfer learning with pre-trained models for efficient training

Transfer learning reuses features a model has already learned on ImageNet, so it typically reaches good CIFAR-10 accuracy in far fewer epochs than training from scratch.
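
The core mechanics (freeze what is already trained, then swap in a new output layer) can be sketched without downloading any weights. The `toy_net` below is a hypothetical stand-in for a pre-trained network, not a torchvision model, and its layer sizes are arbitrary.

```python
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network: a "backbone" plus a 1000-class head
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
toy_net = nn.Sequential(toy_backbone, nn.Linear(8, 1000))

# Feature extraction: freeze everything that is already trained
for param in toy_net.parameters():
    param.requires_grad = False

# Replace the head; a freshly created layer is trainable by default
toy_net[1] = nn.Linear(toy_net[1].in_features, 10)

toy_trainable = sum(p.numel() for p in toy_net.parameters() if p.requires_grad)
toy_total = sum(p.numel() for p in toy_net.parameters())
print(f"{toy_trainable:,} / {toy_total:,} parameters trainable")
```

The order matters: freezing after replacing the head would freeze the new layer as well.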

In [None]:
# Setup for transfer learning
print("🔄 Setting up Transfer Learning")

def create_pretrained_model(model_name='resnet18', num_classes=10, feature_extract=True):
    """Create a pre-trained model for transfer learning"""
    
    model = None
    
    if model_name == 'resnet18':
        # TODO: Load pre-trained ResNet18
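        # HINT: Use models.resnet18(weights=models.ResNet18_Weights.DEFAULT) to load ImageNet weights
        # HINT: On torchvision < 0.13, use the older models.resnet18(pretrained=True)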
        model = None  # Your code here
        
        # TODO: Freeze parameters for feature extraction
        # HINT: If feature_extract=True, set param.requires_grad = False for all parameters
        # HINT: Use: for param in model.parameters(): param.requires_grad = False
        if feature_extract:
            # Your code here - loop through model.parameters() and freeze them
            pass
        
        # TODO: Replace final layer for CIFAR-10 (10 classes instead of 1000)
        # HINT: Get input features: num_features = model.fc.in_features
        # HINT: Replace: model.fc = nn.Linear(num_features, num_classes)
        num_features = None  # Get model.fc.in_features
        model.fc = None      # Create new Linear layer
        
    elif model_name == 'vgg16':
        # TODO: Load pre-trained VGG16
        # HINT: Use models.vgg16(weights=models.VGG16_Weights.DEFAULT)
        # HINT: On torchvision < 0.13, use the older models.vgg16(pretrained=True)
        model = None  # Your code here
        
        # TODO: Freeze feature extraction layers
        # HINT: For VGG, freeze model.features.parameters() if feature_extract=True
        if feature_extract:
            # Your code here - freeze model.features.parameters()
            pass
        
        # TODO: Replace final classifier layer
        # HINT: VGG classifier is model.classifier[6] (the last layer)
        # HINT: Replace: model.classifier[6] = nn.Linear(4096, num_classes)
        model.classifier[6] = None  # Your code here
    
    return model

# TODO: Create different transfer learning approaches
# HINT: Feature extraction = freeze backbone, only train classifier (faster)
# HINT: Fine-tuning = train all layers (slower, potentially better)

print("🔧 Creating Transfer Learning Models...")

try:
    transfer_models = {
        # TODO: Create ResNet18 with feature extraction
        # HINT: Use create_pretrained_model('resnet18', num_classes=10, feature_extract=True)
        'ResNet18 Feature Extraction': None,  # Your code here
        
        # TODO: Create ResNet18 with fine-tuning  
        # HINT: Use create_pretrained_model('resnet18', num_classes=10, feature_extract=False)
        'ResNet18 Fine-tuning': None,  # Your code here
        
        # TODO: Create VGG16 with feature extraction
        'VGG16 Feature Extraction': None  # Your code here
    }

    # Move models to device and analyze
    print("📊 Transfer Learning Model Analysis:")
    print("=" * 70)
    
    for name, model in transfer_models.items():
        if model is not None:
            model = model.to(device)
            
            # Count trainable vs total parameters
            trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
            total_params = sum(p.numel() for p in model.parameters())
            
            print(f"{name:<28} | {trainable_params:>11,} / {total_params:>11,} trainable params")
        else:
            print(f"{name:<28} | NOT IMPLEMENTED")

    print(f"\n✅ Transfer learning models created")
    print(f"📝 Key Concepts:")
    print(f"   • Feature Extraction: Only train final layer (faster, less overfitting)")
    print(f"   • Fine-tuning: Train all layers (slower, potentially better performance)")
    print(f"   • Pre-trained models provide learned features from ImageNet")
    
except Exception as e:
    print(f"❌ Transfer learning setup incomplete: {e}")
    print("💡 Common issues:")
    print("   • Replace all 'None' with proper implementations")
    print("   • Make sure to import torchvision.models as models")
    print("   • Check that ImageNet weights are requested when loading the model")
    print("   • Verify parameter freezing loop syntax")

In [None]:
# Compare transfer learning approaches
print("⚖️ Comparing Transfer Learning Approaches")
print("=" * 50)

# Modified training function for transfer learning
def train_transfer_model(model, train_loader, val_loader, model_name, num_epochs=5):
    """Train transfer learning model with appropriate learning rate"""
    
    # Different learning rates for different approaches
    if 'Feature Extraction' in model_name:
        learning_rate = 0.001  # Higher LR: the frozen backbone is not updated
    else:
        learning_rate = 0.0001  # Lower LR for fine-tuning pre-trained weights
    
    # Optimize only the parameters that are not frozen
    trainable_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = optim.Adam(trainable_params, lr=learning_rate)
    
    criterion = nn.CrossEntropyLoss()
    
    print(f"\n🔄 Training {model_name}")
    print(f"Learning Rate: {learning_rate}, Epochs: {num_epochs}")
    
    history = {'train_acc': [], 'val_acc': []}
    
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_correct = 0
        train_total = 0
        
        for data, target in tqdm(train_loader, desc=f'Epoch {epoch+1}'):
            data, target = data.to(device), target.to(device)
            
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            
            _, predicted = torch.max(output.data, 1)
            train_total += target.size(0)
            train_correct += (predicted == target).sum().item()
        
        # Validation phase
        model.eval()
        val_correct = 0
        val_total = 0
        
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                
                _, predicted = torch.max(output.data, 1)
                val_total += target.size(0)
                val_correct += (predicted == target).sum().item()
        
        train_acc = 100. * train_correct / train_total
        val_acc = 100. * val_correct / val_total
        
        history['train_acc'].append(train_acc)
        history['val_acc'].append(val_acc)
        
        print(f"   Epoch {epoch+1}: Train Acc: {train_acc:.2f}%, Val Acc: {val_acc:.2f}%")
    
    return history, val_acc

# TODO: Train and compare transfer learning models
# Note: Using fewer epochs for demonstration (increase for better results)
transfer_results = {}

# Train each transfer learning approach
for name, model in list(transfer_models.items())[:2]:  # Train first 2 models
    if model is None:
        print(f"   ⚠️ Skipping {name}: model not implemented yet")
        continue
    start_time = time.time()
    history, final_acc = train_transfer_model(
        model, train_loader, val_loader, name, num_epochs=3
    )
    training_time = time.time() - start_time
    
    transfer_results[name] = {
        'history': history,
        'final_accuracy': final_acc,
        'training_time': training_time
    }
    
    print(f"   ✅ Completed in {training_time:.1f}s, Final Accuracy: {final_acc:.2f}%")

print(f"\n📊 Transfer Learning Results Summary:")
for name, results in transfer_results.items():
    print(f"   {name}: {results['final_accuracy']:.2f}% in {results['training_time']:.1f}s")
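
One way to confirm that feature extraction really leaves the backbone untouched is to compare weights before and after an optimizer step. The sketch below uses two placeholder linear layers (`demo_backbone`, `demo_head`) rather than a real pre-trained network, and passes only the trainable parameters to Adam.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Placeholder two-part model: frozen "backbone" and trainable "head"
demo_backbone = nn.Linear(16, 8)
demo_head = nn.Linear(8, 10)
for p in demo_backbone.parameters():
    p.requires_grad = False
demo_model = nn.Sequential(demo_backbone, nn.ReLU(), demo_head)

# Hand the optimizer only the parameters that will actually be updated
demo_optimizer = optim.Adam(
    [p for p in demo_model.parameters() if p.requires_grad], lr=1e-3
)

# Snapshot the weights before the update
backbone_before = [p.detach().clone() for p in demo_backbone.parameters()]
head_before = [p.detach().clone() for p in demo_head.parameters()]

# One training step on random data
demo_x, demo_y = torch.randn(4, 16), torch.randint(0, 10, (4,))
demo_optimizer.zero_grad()
nn.functional.cross_entropy(demo_model(demo_x), demo_y).backward()
demo_optimizer.step()

backbone_changed = any(
    not torch.equal(b, p) for b, p in zip(backbone_before, demo_backbone.parameters())
)
head_changed = any(
    not torch.equal(b, p) for b, p in zip(head_before, demo_head.parameters())
)
print(f"backbone changed: {backbone_changed}, head changed: {head_changed}")
```

The same check works on a real model by snapshotting a backbone layer such as the first convolution.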

In [None]:
# Visualize transfer learning comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Learning curves comparison
colors = ['blue', 'red', 'green']
for i, (name, results) in enumerate(transfer_results.items()):
    epochs = range(1, len(results['history']['val_acc']) + 1)
    ax1.plot(epochs, results['history']['val_acc'],
             color=colors[i], linewidth=2, marker='o',
             label=f"{name} ({results['final_accuracy']:.1f}%)", alpha=0.8)

ax1.set_title('Transfer Learning: Validation Accuracy', fontweight='bold')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Validation Accuracy (%)')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 100)

# Performance vs Training Time
names = list(transfer_results.keys())
accuracies = [transfer_results[name]['final_accuracy'] for name in names]
times = [transfer_results[name]['training_time'] for name in names]

scatter = ax2.scatter(times, accuracies, s=100, c=colors[:len(names)], alpha=0.7)
for i, name in enumerate(names):
    ax2.annotate(name, (times[i], accuracies[i]),
                 xytext=(5, 5), textcoords='offset points', fontsize=10)

ax2.set_title('Performance vs Training Time', fontweight='bold')
ax2.set_xlabel('Training Time (seconds)')
ax2.set_ylabel('Final Validation Accuracy (%)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("🔍 Transfer Learning Insights:")
print("   • Feature extraction is faster but may have lower accuracy")
print("   • Fine-tuning takes longer but can achieve better performance")
print("   • Pre-trained models provide significant advantage over training from scratch")
print("   • Choose approach based on dataset size and computational budget")

---

# Part 4: Model Analysis & Interpretability (30 minutes)

**Goal:** Analyze what our CNN learns and implement Grad-CAM for visual explanations

Understanding what CNNs learn is crucial for building trust and debugging models.
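
As a rough preview of the computation, Grad-CAM weights each feature map by the average gradient of the chosen class score with respect to that map, sums the weighted maps, and keeps only the positive part. The sketch below uses a hypothetical two-stage toy network and `torch.autograd.grad` instead of hooks, so it illustrates the arithmetic rather than the hook-based class in this assignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical miniature CNN split into a conv part and a classifier part
toy_features = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
toy_classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 10))

toy_image = torch.randn(1, 3, 8, 8)

toy_activations = toy_features(toy_image)        # [1, 4, 8, 8] feature maps
toy_scores = toy_classifier(toy_activations)     # [1, 10] class scores
toy_class = toy_scores.argmax(dim=1).item()

# Gradient of the chosen class score with respect to the feature maps
toy_grads = torch.autograd.grad(toy_scores[0, toy_class], toy_activations)[0]

# One importance weight per channel: global average of the gradients
toy_weights = toy_grads.mean(dim=(2, 3), keepdim=True)

# Weighted sum over channels, then ReLU to keep positive evidence only
toy_cam = F.relu((toy_weights * toy_activations).sum(dim=1))[0].detach()
if toy_cam.max() > 0:
    toy_cam = toy_cam / toy_cam.max()            # scale to [0, 1]

print(toy_cam.shape, float(toy_cam.min()), float(toy_cam.max()))
```

On an untrained toy network the heatmap is meaningless; only the shapes and value range carry over to the real model.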

In [None]:
# Implement Grad-CAM for CNN interpretability
class GradCAM:
    """Gradient-weighted Class Activation Mapping"""
    
    def __init__(self, model, target_layer_name):
        self.model = model
        self.target_layer_name = target_layer_name
        self.gradients = None
        self.activations = None
        
        # Register hooks (implemented for you)
        self._register_hooks()
    
    def _register_hooks(self):
        """Register forward and backward hooks"""
        
        def forward_hook(module, input, output):
            self.activations = output
        
        def backward_hook(module, grad_input, grad_output):
            self.gradients = grad_output[0]
        
        # Find target layer and register hooks
        target_layer = dict(self.model.named_modules())[self.target_layer_name]
        target_layer.register_forward_hook(forward_hook)
        # register_backward_hook is deprecated; the "full" variant reports correct gradients
        target_layer.register_full_backward_hook(backward_hook)
    
    def generate_cam(self, input_tensor, class_idx=None):
        """Generate Grad-CAM heatmap"""
        
        # TODO: Forward pass to get model output
        # HINT: Set model to eval mode and pass input through model
        self.model.eval()
        output = None  # Your code here - forward pass
        
        # TODO: Get target class index
        # HINT: If class_idx is None, use torch.argmax(output) to get predicted class
        if class_idx is None:
            class_idx = None  # Your code here
        
        # TODO: Backward pass to compute gradients
        # HINT: Zero gradients first, then backpropagate from target class
        # HINT: Use output[0, class_idx].backward() to get gradients for specific class
        self.model.zero_grad()
        None  # Your code here - backward pass
        
        # TODO: Compute Grad-CAM using gradients and activations
        # HINT: self.gradients and self.activations were captured by hooks
        
        # Remove batch dimension
        gradients = self.gradients[0]    # Shape: [channels, height, width]
        activations = self.activations[0]  # Shape: [channels, height, width]
        
        # TODO: Compute importance weights by global average pooling of gradients
        # HINT: Use torch.mean(gradients, dim=(1, 2)) to average over spatial dimensions
        weights = None  # Your code here - global average pool
        
        # TODO: Compute weighted combination of activation maps
        # HINT: Initialize cam as zeros with same spatial size as activations
        # HINT: Loop through channels and add: cam += weight[i] * activations[i]
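        # Allocate on the same device as the activations to avoid CPU/GPU mismatches
        cam = torch.zeros(activations.shape[1:], dtype=torch.float32,
                          device=activations.device)  # [height, width]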
        
        # Your code here - weighted combination
        for i, w in enumerate(weights):
            cam += None  # w * activations[i]
        
        # TODO: Apply ReLU and normalize
        # HINT: Use F.relu(cam) to remove negative values
        # HINT: Normalize by dividing by torch.max(cam) if max > 0
        cam = None  # Apply ReLU
        if torch.max(cam) > 0:
            cam = None  # Normalize by max value
        
        return cam.detach().cpu().numpy()

print("🔍 Testing Grad-CAM Implementation...")

try:
    # Test Grad-CAM (requires a trained model)
    print("⚠️  Grad-CAM requires a trained model to work properly")
    print("✅ Grad-CAM class structure implemented")
    print("💡 This will be tested after training your model")
    
    print("\n🧠 Grad-CAM Concept:")
    print("   • Grad-CAM shows which image regions influence predictions")
    print("   • Uses gradients flowing back to target layer")
    print("   • Weights activation maps by their importance")
    print("   • Helps understand what the CNN is 'looking at'")
    
except Exception as e:
    print(f"❌ Grad-CAM implementation incomplete: {e}")
    print("💡 Common issues:")
    print("   • Replace all 'None' with proper implementations")
    print("   • Check tensor dimensions and operations")
    print("   • Make sure to use F.relu and torch.max correctly")

In [None]:
# Apply Grad-CAM to analyze model decisions
def visualize_gradcam_results(model, test_loader, num_samples=8):
    """Visualize Grad-CAM results for sample predictions"""
    
    # Initialize Grad-CAM
    gradcam = GradCAM(model, 'layer4.1.conv2')  # Target last conv layer
    
    # Get sample images
    model.eval()
    images, labels, predictions, cams = [], [], [], []
    
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(test_loader):
            data, target = data.to(device), target.to(device)
            
            for i in range(min(num_samples, data.size(0))):
                if len(images) >= num_samples:
                    break
                
                # Get single image
                img = data[i:i+1]
                label = target[i].item()
                
                # Get prediction
                output = model(img)
                pred = torch.argmax(output, dim=1).item()
                
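                # Generate Grad-CAM (the backward pass needs autograd, which the
                # surrounding torch.no_grad() block would otherwise disable)
                with torch.enable_grad():
                    cam = gradcam.generate_cam(img, class_idx=pred)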
                
                # Store results
                # Denormalize image for visualization
                img_denorm = img[0].cpu()
                img_denorm = img_denorm * torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)
                img_denorm = img_denorm + torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
                img_denorm = torch.clamp(img_denorm, 0, 1)
                
                images.append(img_denorm.permute(1, 2, 0).numpy())
                labels.append(label)
                predictions.append(pred)
                cams.append(cam)
            
            if len(images) >= num_samples:
                break
    
    # TODO: Visualize results
    fig, axes = plt.subplots(3, num_samples, figsize=(20, 10))
    fig.suptitle('Grad-CAM Analysis: What Does the CNN See?', fontsize=16, fontweight='bold')
    
    for i in range(num_samples):
        # Original image
        axes[0, i].imshow(images[i])
        axes[0, i].set_title(f'Original\nTrue: {cifar10_classes[labels[i]]}', fontsize=10)
        axes[0, i].axis('off')
        
        # Grad-CAM heatmap
        im1 = axes[1, i].imshow(cams[i], cmap='jet', alpha=0.7)
        axes[1, i].set_title(f'Grad-CAM\nPred: {cifar10_classes[predictions[i]]}', fontsize=10)
        axes[1, i].axis('off')
        
        # Overlay
        axes[2, i].imshow(images[i])
        # Resize CAM to match image size
        cam_resized = np.array(Image.fromarray(cams[i]).resize((32, 32)))
        axes[2, i].imshow(cam_resized, cmap='jet', alpha=0.4)
        
        correct = '✓' if labels[i] == predictions[i] else '✗'
        axes[2, i].set_title(f'Overlay {correct}', fontsize=10)
        axes[2, i].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    # Analyze results
    correct_predictions = sum(1 for i in range(num_samples) if labels[i] == predictions[i])
    print(f"\n📊 Analysis Results:")
    print(f"   Accuracy on samples: {correct_predictions}/{num_samples} ({100*correct_predictions/num_samples:.1f}%)")
    print(f"   ✅ Red areas show where the CNN focuses attention")
    print(f"   🔍 Does the model focus on relevant object parts?")
    
    return images, labels, predictions, cams

# Apply Grad-CAM analysis
print("🎯 Applying Grad-CAM Analysis...")
gradcam_results = visualize_gradcam_results(advanced_cnn, test_loader, num_samples=6)

print("\n🧠 Interpretability Insights:")
print("   • Grad-CAM shows which image regions influence predictions")
print("   • Helps identify if model focuses on relevant features")
print("   • Can reveal biases or shortcut learning")
print("   • Useful for debugging and building trust in AI systems")
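
The overlay step depends on stretching a low-resolution heatmap to the image size. As an alternative to the PIL resize used above, the sketch below upsamples with `F.interpolate`; `cam_small` is a random placeholder, not a real Grad-CAM output.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
cam_small = torch.rand(4, 4)   # placeholder for a low-resolution Grad-CAM map

# interpolate expects [batch, channels, height, width], so add two leading dims
cam_full = F.interpolate(cam_small[None, None], size=(32, 32),
                         mode='bilinear', align_corners=False)[0, 0]

print(tuple(cam_small.shape), '->', tuple(cam_full.shape))
```

Bilinear interpolation keeps values within the original range, so a map normalized to [0, 1] stays normalized after resizing.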

---

# Part 5: Optimization & Deployment (30 minutes)

**Goal:** Optimize models for real-world deployment with quantization and pruning
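
To make the float32 to int8 idea concrete, the sketch below quantizes a random weight vector by hand using an affine mapping and then restores it. This shows only the underlying arithmetic under assumed settings (per-tensor range, 256 levels); it is not the code path PyTorch's quantization utilities use internally.

```python
import torch

torch.manual_seed(0)
weights_fp32 = torch.randn(1000)

# Affine mapping of the observed float range onto the 256 int8 levels
qmin, qmax = -128, 127
scale = (weights_fp32.max() - weights_fp32.min()) / (qmax - qmin)
zero_point = int(torch.round(qmin - weights_fp32.min() / scale))

# Quantize: scale, shift, round, and clamp into the int8 range
weights_int8 = torch.clamp(
    torch.round(weights_fp32 / scale) + zero_point, qmin, qmax
).to(torch.int8)

# Dequantize: undo the shift and scale
weights_restored = (weights_int8.float() - zero_point) * scale

bytes_fp32 = weights_fp32.numel() * weights_fp32.element_size()
bytes_int8 = weights_int8.numel() * weights_int8.element_size()
max_error = (weights_fp32 - weights_restored).abs().max().item()
print(f"{bytes_fp32} bytes -> {bytes_int8} bytes, max round-trip error {max_error:.4f}")
```

Storage drops by a factor of four, and the round-trip error stays within about half a quantization step (`scale / 2`).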

In [None]:
# Model optimization techniques
print("⚡ Model Optimization for Deployment")

def analyze_model_size(model, model_name):
    """Analyze model size and complexity"""
    
    # TODO: Count total parameters
    # HINT: Use sum(p.numel() for p in model.parameters()) to count all parameters
    total_params = None  # Your code here
    
    # TODO: Estimate model size in MB
    # HINT: Assuming float32 (4 bytes per parameter): total_params * 4 / (1024 * 1024)
    model_size_mb = None  # Your code here
    
    # Measure inference time (implemented for you)
    model.eval()
    dummy_input = torch.randn(1, 3, 32, 32).to(device)
    
    # Warm up
    for _ in range(10):
        with torch.no_grad():
            _ = model(dummy_input)
    
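    # Measure inference time (synchronize so queued GPU work is included)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    start_time = time.perf_counter()
    with torch.no_grad():
        for _ in range(100):
            _ = model(dummy_input)
    if device.type == 'cuda':
        torch.cuda.synchronize()
    avg_inference_time = (time.perf_counter() - start_time) / 100 * 1000  # ms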
    
    return {
        'name': model_name,
        'parameters': total_params,
        'size_mb': model_size_mb,
        'inference_time_ms': avg_inference_time
    }

# TODO: Implement dynamic quantization
def quantize_model(model):
    """Apply dynamic quantization to reduce model size"""
    
    # TODO: Move model to CPU for quantization
    # HINT: Use model.cpu() since quantization typically works on CPU
    model_cpu = None  # Your code here
    
    # TODO: Apply dynamic quantization
    # HINT: Use torch.quantization.quantize_dynamic()
    # HINT: Target layer types: {nn.Linear} (dynamic quantization does not convert nn.Conv2d layers)
    # HINT: Use dtype=torch.qint8 for 8-bit quantization
    quantized_model = None  # Your code here - torch.quantization.quantize_dynamic(...)
    
    return quantized_model

print("🧪 Testing Model Optimization...")

try:
    # This will work once you have a trained model
    print("⚠️  Model optimization requires a trained model")
    print("💡 Complete the training sections first, then return here")
    
    # Placeholder analysis (will work with dummy model)
    print("\n📊 Optimization Concepts:")
    print("   • Quantization: Reduce precision (float32 → int8) for smaller models")
    print("   • Pruning: Remove less important weights/neurons")
    print("   • Knowledge Distillation: Train smaller model to mimic larger one")
    print("   • Mobile deployment: Trade accuracy for speed/size")
    
    print("\n🎯 Expected Benefits:")
    print("   • Model size reduction: up to ~4x for the layers that are quantized (float32 → int8)")
    print("   • Inference speedup: often 1.5-3x on CPU, depending on hardware and layer mix")
    print("   • Memory usage: Significantly reduced")
    print("   • Accuracy loss: Usually <1-2% if done carefully")

except Exception as e:
    print(f"❌ Optimization implementation incomplete: {e}")
    print("💡 Common issues:")
    print("   • Make sure to implement parameter counting")
    print("   • Check quantization function implementation")
    print("   • Verify that model is moved to CPU for quantization")
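
The "4 bytes per parameter" estimate counts learnable parameters only. Layers such as batch norm also store buffers (running statistics) that are saved with the model, so a slightly more careful estimate sums both. The helper name `tensor_bytes` and the toy network below are illustrative assumptions.

```python
import torch.nn as nn

def tensor_bytes(tensors):
    """Total storage in bytes for an iterable of tensors."""
    return sum(t.numel() * t.element_size() for t in tensors)

# Small placeholder network with a batch-norm layer
size_demo = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

param_count = sum(p.numel() for p in size_demo.parameters())
param_bytes = tensor_bytes(size_demo.parameters())
buffer_bytes = tensor_bytes(size_demo.buffers())

print(f"{param_count} parameters, {param_bytes} parameter bytes, {buffer_bytes} buffer bytes")
print(f"approx. size: {(param_bytes + buffer_bytes) / (1024 * 1024):.4f} MB")
```

For large CNNs the buffers are a small fraction of the total, so the simpler parameter-only estimate is usually close enough.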

In [None]:
# Create deployment pipeline
print("🚀 Creating Deployment Pipeline")

class CNNPredictor:
    """Production-ready CNN predictor"""
    
    def __init__(self, model_path, use_quantized=False):
        self.device = torch.device('cpu')  # CPU for deployment
        self.use_quantized = use_quantized
        
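        # Load trained weights, then optionally quantize the loaded model
        self.model = AdvancedCNN()
        self.model.load_state_dict(torch.load(model_path, map_location=self.device))
        if use_quantized:
            # quantize_model is the helper defined in the optimization cell above
            self.model = quantize_model(self.model)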
        
        self.model.eval()
        
        # Preprocessing pipeline
        self.transform = transforms.Compose([
            transforms.Resize((32, 32)),
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
        ])
        
        # Class names
        self.classes = cifar10_classes
    
    def predict(self, image):
        """Make prediction on a single image"""
        
        # TODO: Preprocess image
        if isinstance(image, np.ndarray):
            image = Image.fromarray(image)
        
        # Apply transforms
        input_tensor = self.transform(image).unsqueeze(0)  # Add batch dimension
        
        # Make prediction
        with torch.no_grad():
            output = self.model(input_tensor)
            probabilities = F.softmax(output, dim=1)
            predicted_class = torch.argmax(probabilities, dim=1).item()
            confidence = probabilities[0, predicted_class].item()
        
        return {
            'class': self.classes[predicted_class],
            'class_id': predicted_class,
            'confidence': confidence,
            'probabilities': probabilities[0].numpy()
        }
    
    def predict_batch(self, images):
        """Make predictions on a batch of images"""
        
        batch_tensor = torch.stack([self.transform(img) for img in images])
        
        with torch.no_grad():
            outputs = self.model(batch_tensor)
            probabilities = F.softmax(outputs, dim=1)
            predicted_classes = torch.argmax(probabilities, dim=1)
        
        results = []
        for i in range(len(images)):
            results.append({
                'class': self.classes[predicted_classes[i].item()],
                'class_id': predicted_classes[i].item(),
                'confidence': probabilities[i, predicted_classes[i]].item(),
                'probabilities': probabilities[i].numpy()
            })
        
        return results

# Test deployment pipeline
print("🧪 Testing Deployment Pipeline...")

try:
    # Create predictor with quantized model
    predictor = CNNPredictor('best_model.pth', use_quantized=True)
    
    # Get test image
    test_image, test_label = test_dataset_clean[0]
    
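    # Undo the dataset normalization before converting to a PIL image;
    # the predictor's own transform applies Normalize again
    mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
    std = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)
    test_image_pil = transforms.ToPILImage()(torch.clamp(test_image * std + mean, 0, 1))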
    
    # Make prediction
    result = predictor.predict(test_image_pil)
    
    print(f"✅ Deployment pipeline working!")
    print(f"   Predicted: {result['class']} (confidence: {result['confidence']:.3f})")
    print(f"   Actual: {cifar10_classes[test_label]}")
    
except Exception as e:
    print(f"⚠️ Deployment test failed: {e}")
    print("   This is expected until training has saved best_model.pth and quantize_model is implemented")

print("\n🏭 Deployment Considerations:")
print("   • Model quantization reduces size and improves speed")
print("   • CPU deployment is common for edge devices")
print("   • Preprocessing pipeline must match training")
print("   • Consider model versioning and A/B testing")
print("   • Monitor model performance in production")
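
The point about matching preprocessing is easy to get wrong in one specific way: normalizing an image that was already normalized. The sketch below uses the CIFAR-10 statistics from this notebook with plain tensor arithmetic; the helper names `normalize` and `denormalize` are made up for illustration.

```python
import torch

CIFAR10_MEAN = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
CIFAR10_STD = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)

def normalize(img):
    """Apply the same per-channel normalization used during training."""
    return (img - CIFAR10_MEAN) / CIFAR10_STD

def denormalize(img):
    """Invert normalize() to recover pixel values in [0, 1]."""
    return img * CIFAR10_STD + CIFAR10_MEAN

torch.manual_seed(0)
raw = torch.rand(3, 32, 32)      # pixel values in [0, 1], like ToTensor() output
once = normalize(raw)            # what the model expects
twice = normalize(once)          # the mistake: normalizing already-normalized data

round_trip_ok = torch.allclose(denormalize(once), raw, atol=1e-6)
double_differs = not torch.allclose(once, twice)
print(f"round trip recovers input: {round_trip_ok}, double normalization differs: {double_differs}")
```

A predictor that accepts raw images should therefore be fed raw images, and any tensor taken from a normalized dataset needs `denormalize` first.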

---

# Part 6: Critical Analysis & Ethics (30 minutes)

**Goal:** Reflect on the broader implications of computer vision systems

## 6.1 Technical Analysis

**TODO: Answer these questions based on your experiments:**

**1. How did ResNet-style skip connections improve your model compared to a simple CNN?**

[TODO: Compare the training behavior, final accuracy, and convergence speed. Discuss why skip connections help with very deep networks.]

**2. What were the advantages and disadvantages of transfer learning vs training from scratch?**

**Advantages of Transfer Learning:**
[TODO: List benefits like faster convergence, better performance with limited data, reduced computational cost]

**Disadvantages of Transfer Learning:**
[TODO: List limitations like domain mismatch, potential negative transfer, less interpretability]

**3. What did Grad-CAM reveal about your model's decision-making process?**

[TODO: Analyze whether the model focused on relevant object parts, identify any concerning patterns or biases, discuss reliability of visual explanations]
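
If you want a number to support your visual impression, one rough option is to measure how much of the heatmap's activation falls inside a region you consider relevant. The sketch below assumes a non-negative 2D heatmap given as a list of rows and a hand-chosen rectangle; CIFAR-10 does not ship bounding boxes, so the region is your own judgment call. The function name `heatmap_mass_in_region` and the toy heatmap are hypothetical and for illustration only.

```python
def heatmap_mass_in_region(heatmap, top, left, bottom, right):
    """Fraction of total activation inside rows [top, bottom) and columns [left, right)."""
    total = sum(sum(row) for row in heatmap)
    if total == 0:
        return 0.0  # an all-zero heatmap has no activation to attribute
    inside = sum(sum(row[left:right]) for row in heatmap[top:bottom])
    return inside / total

# Toy 4x4 heatmap (illustrative values, not real Grad-CAM output)
toy_heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.4, 0.0],
    [0.0, 0.4, 0.4, 0.0],
    [0.2, 0.0, 0.0, 0.2],
]
focus = heatmap_mass_in_region(toy_heatmap, top=1, left=1, bottom=3, right=3)
print(f"fraction of activation in region: {focus:.2f}")
```

A low fraction on correctly classified images may suggest the model relies on background cues, but treat this as a prompt for closer inspection and not as a verdict.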

**4. How did model quantization affect performance and efficiency?**

[TODO: Compare model size, inference speed, and accuracy. Discuss trade-offs and when quantization is appropriate]
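
To keep the comparison consistent, it can help to reduce your measurements to three numbers: size reduction, speedup, and accuracy drop. The sketch below is one way to do that under stated assumptions: each model is described by a dictionary with hypothetical keys `'size_mb'`, `'latency_ms'`, and `'accuracy'` (as a fraction between 0 and 1). The values shown are placeholders only; replace them with what you measured in Part 5.

```python
def compare_models(baseline, optimized):
    """Compare two dicts with 'size_mb', 'latency_ms', and 'accuracy' keys."""
    return {
        # Percentage of the baseline size that was saved
        'size_reduction_pct': 100.0 * (1 - optimized['size_mb'] / baseline['size_mb']),
        # How many times faster the optimized model is
        'speedup': baseline['latency_ms'] / optimized['latency_ms'],
        # Accuracy lost, in percentage points
        'accuracy_drop_pts': 100.0 * (baseline['accuracy'] - optimized['accuracy']),
    }

# Placeholder numbers only; replace with your own measurements
fp32 = {'size_mb': 40.0, 'latency_ms': 20.0, 'accuracy': 0.90}
int8 = {'size_mb': 10.0, 'latency_ms': 10.0, 'accuracy': 0.89}
report = compare_models(fp32, int8)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Reporting all three together makes the trade-off explicit: a large size reduction is only worthwhile if the accuracy drop is acceptable for your application.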

## 6.2 Ethical Considerations in Computer Vision

**5. What types of bias might exist in computer vision systems like the ones you built?**

**Data Bias:**
[TODO: Discuss how training data composition affects model behavior, underrepresentation of certain groups or scenarios]

**Algorithmic Bias:**
[TODO: Discuss how model architecture and training procedures might introduce bias]

**Evaluation Bias:**
[TODO: Discuss how evaluation metrics and test sets might not capture all relevant performance aspects]
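
One concrete way to see evaluation bias is to compare overall accuracy with per-class accuracy, since a single aggregate number can hide a class the model handles poorly. The sketch below is a minimal illustration using toy labels, not real results; `per_class_accuracy` is a hypothetical helper, and in the notebook you could pass `cifar10_classes` as `class_names`.

```python
from collections import defaultdict

def per_class_accuracy(true_labels, predicted_labels, class_names):
    """Return {class_name: accuracy} for each class present in true_labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {class_names[c]: correct[c] / total[c] for c in sorted(total)}

# Toy labels for illustration (not real results)
names = ['airplane', 'automobile', 'bird']
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
by_class = per_class_accuracy(y_true, y_pred, names)
print(f"overall accuracy: {overall:.2f}")
print(by_class)
```

In this toy case the overall accuracy looks reasonable while every example of the smallest class is misclassified, which is the kind of gap an aggregate metric alone would not reveal.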

**6. Consider three real-world applications of computer vision. What are the potential risks and benefits?**

**Application 1: Medical Image Diagnosis**
Benefits: [TODO: List potential benefits like faster diagnosis, consistency, accessibility]
Risks: [TODO: List potential risks like misdiagnosis, overreliance on AI, bias against certain populations]

**Application 2: Autonomous Vehicles**
Benefits: [TODO: List benefits like reduced accidents, accessibility for people with disabilities, efficiency]
Risks: [TODO: List risks like algorithmic bias, accountability issues, job displacement]

**Application 3: Surveillance Systems**
Benefits: [TODO: List benefits like crime prevention, security, efficiency]
Risks: [TODO: List risks like privacy violations, false positives, authoritarian use]

**7. How can we make computer vision systems more fair and trustworthy?**

**Technical Solutions:**
[TODO: Discuss diverse training data, bias detection methods, robust evaluation, interpretability tools]

**Policy Solutions:**
[TODO: Discuss regulation, accountability frameworks, transparency requirements, public oversight]

**Social Solutions:**
[TODO: Discuss diverse development teams, community input, education, democratic governance]

**8. What future developments in computer vision excite you most? What concerns you most?**

**Exciting Developments:**
[TODO: Discuss potential positive applications, technological advances, societal benefits]

**Concerning Developments:**
[TODO: Discuss potential negative applications, risks, societal concerns]

## Summary and Reflection

### What You've Accomplished

Congratulations! In this advanced assignment, you have:

**Mastered advanced CNN architectures** with ResNet-style skip connections  
**Applied transfer learning** for efficient training and better performance  
**Worked with complex datasets** requiring sophisticated preprocessing  
**Implemented model interpretability** using Grad-CAM visualizations  
**Optimized models for deployment** with quantization techniques  
**Reflected critically** on ethics and bias in computer vision systems  

### Key Takeaways

**TODO: Write 4-5 key insights from this assignment:**

1. [TODO: Your first key takeaway about advanced CNN architectures and their capabilities]
2. [TODO: Your second key takeaway about transfer learning and its practical benefits]
3. [TODO: Your third key takeaway about model interpretability and explainable AI]
4. [TODO: Your fourth key takeaway about deployment optimization and real-world constraints]
5. [TODO: Your fifth key takeaway about ethics and responsibility in AI development]

### Comparison to Previous Assignments

**TODO: Compare this advanced CNN work to your previous assignments:**

**Evolution from Linear Models (CA1) to CNNs:**
[TODO: Discuss the progression in complexity and capability]

**Advancement from Basic CNNs (IC) to Advanced Techniques:**
[TODO: Compare simple MNIST CNNs to complex CIFAR-10 architectures with transfer learning]

### Looking Forward

**TODO: What aspects of computer vision would you like to explore further?**

[TODO: Mention interests in object detection, semantic segmentation, generative models, video analysis, or domain-specific applications]

### Final Reflection

**TODO: Write a comprehensive reflection (200-300 words) on your experience with advanced computer vision:**

[TODO: Your final reflection here - discuss the complexity of modern AI systems, the importance of responsible development, what surprised you about transfer learning and interpretability, how this connects to real-world AI applications, and your thoughts on the future of computer vision technology]

---

**Assignment Complete!**

Make sure to:
1. Complete all TODO sections
2. Test your implementations thoroughly
3. Answer all reflection questions thoughtfully
4. Save your notebook and export as PDF
5. Submit both .ipynb and .pdf files
6. Include your name and student ID at the top