# 👁️ Computer Vision with Deep Learning

## From CNNs to Modern Object Detection

**What You'll Learn:**
- Convolutional Neural Networks (CNNs) from scratch
- Image classification with modern architectures
- Transfer learning and fine-tuning
- Object detection (YOLO, Faster R-CNN)
- Image segmentation
- Visualization techniques

**Prerequisites:** Deep Learning Fundamentals (Notebook 06)

---

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import cv2

# Deep learning
try:
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.utils.data import Dataset, DataLoader
    import torchvision
    from torchvision import transforms, models
    TORCH_AVAILABLE = True
    print(f"PyTorch version: {torch.__version__}")
    print(f"Torchvision version: {torchvision.__version__}")
except ImportError:
    print("PyTorch not installed. Install with: pip install torch torchvision")
    TORCH_AVAILABLE = False

# Check for GPU
if TORCH_AVAILABLE:
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

# Plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)

print("✓ Libraries imported successfully!")

---

## Part 1: Understanding Convolutions

### 1.1 2D Convolution from Scratch

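**Convolution operation** (strictly speaking this is cross-correlation, since the kernel is not flipped; it is the convention deep learning libraries use):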
```
(I * K)[i, j] = Σ_m Σ_n I[i+m, j+n] · K[m, n]
```

In [None]:
def conv2d_numpy(image, kernel, stride=1, padding=0):
    """
    2D convolution from scratch using NumPy.
    
    Args:
        image: (H, W) input image
        kernel: (K, K) convolution kernel
        stride: Stride for convolution
        padding: Zero padding
    
    Returns:
        output: Convolved image
    """
    # Add padding
    if padding > 0:
        image = np.pad(image, padding, mode='constant')
    
    H, W = image.shape
    K = kernel.shape[0]
    
    # Output dimensions
    out_H = (H - K) // stride + 1
    out_W = (W - K) // stride + 1
    
    # Initialize output
    output = np.zeros((out_H, out_W))
    
    # Perform convolution
    for i in range(0, out_H):
        for j in range(0, out_W):
            # Extract region
            region = image[i*stride:i*stride+K, j*stride:j*stride+K]
            
            # Element-wise multiplication and sum
            output[i, j] = np.sum(region * kernel)
    
    return output

# Example: Edge detection
# Create sample image
image = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
], dtype=np.float32)

# Sobel edge detection kernels
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

# Apply convolution
edges_x = conv2d_numpy(image, sobel_x, stride=1, padding=1)
edges_y = conv2d_numpy(image, sobel_y, stride=1, padding=1)
edges = np.sqrt(edges_x**2 + edges_y**2)

# Visualize
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

axes[0].imshow(image, cmap='gray')
axes[0].set_title('Original Image')
axes[0].axis('off')

axes[1].imshow(edges_x, cmap='gray')
axes[1].set_title('Vertical Edges (Sobel X)')
axes[1].axis('off')

axes[2].imshow(edges_y, cmap='gray')
axes[2].set_title('Horizontal Edges (Sobel Y)')
axes[2].axis('off')

axes[3].imshow(edges, cmap='gray')
axes[3].set_title('Combined Edges')
axes[3].axis('off')

plt.tight_layout()
plt.show()

print("\nConvolution Output Shape:", edges.shape)
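
The output dimensions computed inside `conv2d_numpy` follow the usual rule `out = (N + 2P - K) // S + 1`, where `N` is the unpadded input size. A minimal sketch of that arithmetic is below; `conv_output_size` is a hypothetical helper name introduced for illustration, not part of the notebook.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - k) // stride + 1

# 7x7 image, 3x3 Sobel kernel, padding=1, stride=1 (the example above)
print(conv_output_size(7, 3, stride=1, padding=1))   # 7: output matches input size

# Without padding, one border pixel is lost on each side
print(conv_output_size(7, 3))                        # 5

# Stride 2 halves the resolution of a 28x28 input
print(conv_output_size(28, 3, stride=2, padding=1))  # 14
```

This is why `edges.shape` above is `(7, 7)`: a 3x3 kernel with `padding=1` and `stride=1` preserves the input size.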

### 1.2 Common Convolution Kernels

In [None]:
# Common kernels for image processing
kernels = {
    'Identity': np.array([[0, 0, 0],
                          [0, 1, 0],
                          [0, 0, 0]]),
    
    'Sharpen': np.array([[ 0, -1,  0],
                         [-1,  5, -1],
                         [ 0, -1,  0]]),
    
    'Blur': np.ones((3, 3)) / 9,
    
    'Edge Detection': np.array([[-1, -1, -1],
                                [-1,  8, -1],
                                [-1, -1, -1]]),
    
    'Emboss': np.array([[-2, -1, 0],
                        [-1,  1, 1],
                        [ 0,  1, 2]]),
}

# Apply all kernels to a sample image
# Create more interesting sample image
sample = np.zeros((15, 15))
sample[3:7, 3:12] = 1
sample[8:12, 3:7] = 1
sample[8:12, 8:12] = 1

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

axes[0].imshow(sample, cmap='gray')
axes[0].set_title('Original')
axes[0].axis('off')

for idx, (name, kernel) in enumerate(kernels.items(), 1):
    result = conv2d_numpy(sample, kernel, padding=1)
    axes[idx].imshow(result, cmap='gray')
    axes[idx].set_title(name)
    axes[idx].axis('off')

plt.tight_layout()
plt.show()
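
One way to read these kernels is through the sum of their weights. A kernel whose weights sum to 1 roughly preserves overall brightness, while one that sums to 0 responds only to local changes and returns 0 on flat regions. The sketch below re-declares three of the kernels so it runs on its own; the uniform gray patch is a made-up test input.

```python
import numpy as np

kernels = {
    'Sharpen': np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    'Blur': np.ones((3, 3)) / 9,
    'Edge Detection': np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float),
}

flat_patch = np.full((3, 3), 0.5)  # a uniform gray region (illustrative input)

# Response of each kernel at the center of the flat patch
responses = {name: float(np.sum(flat_patch * k)) for name, k in kernels.items()}
kernel_sums = {name: float(k.sum()) for name, k in kernels.items()}

for name in kernels:
    print(f"{name:15s} sum={kernel_sums[name]:.2f}  flat response={responses[name]:.2f}")
```

Sharpen and Blur return the patch value unchanged (0.5), while Edge Detection returns 0 because there is no edge to detect.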

---

## Part 2: Building a CNN from Scratch

### 2.1 Simple CNN for MNIST

In [None]:
if TORCH_AVAILABLE:
    class SimpleCNN(nn.Module):
        """Simple CNN for digit recognition."""
        
        def __init__(self):
            super(SimpleCNN, self).__init__()
            
            # Convolutional layers
            self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
            self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
            
            # Batch normalization
            self.bn1 = nn.BatchNorm2d(32)
            self.bn2 = nn.BatchNorm2d(64)
            self.bn3 = nn.BatchNorm2d(128)
            
            # Pooling
            self.pool = nn.MaxPool2d(2, 2)
            
            # Fully connected layers
            self.fc1 = nn.Linear(128 * 3 * 3, 256)
            self.fc2 = nn.Linear(256, 10)
            
            # Dropout
            self.dropout = nn.Dropout(0.5)
        
        def forward(self, x):
            # Conv block 1: 28x28 -> 14x14
            x = self.conv1(x)
            x = self.bn1(x)
            x = F.relu(x)
            x = self.pool(x)
            
            # Conv block 2: 14x14 -> 7x7
            x = self.conv2(x)
            x = self.bn2(x)
            x = F.relu(x)
            x = self.pool(x)
            
            # Conv block 3: 7x7 -> 3x3
            x = self.conv3(x)
            x = self.bn3(x)
            x = F.relu(x)
            x = self.pool(x)
            
            # Flatten
            x = x.view(x.size(0), -1)
            
            # Fully connected
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout(x)
            x = self.fc2(x)
            
            return x
    
    # Create model
    model = SimpleCNN()
    
    # Print architecture
    print(model)
    
    # Count parameters
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"\nTotal parameters: {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,}")
    
    # Test forward pass
    dummy_input = torch.randn(4, 1, 28, 28)  # Batch of 4 images
    output = model(dummy_input)
    print(f"\nInput shape: {dummy_input.shape}")
    print(f"Output shape: {output.shape}")
else:
    print("PyTorch not available")
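
The parameter count printed above can be checked by hand. The sketch below traces the spatial size through the three pooling steps and sums the learnable parameters per layer. The helper names are hypothetical and the layer sizes are copied from `SimpleCNN`. BatchNorm running statistics are buffers rather than parameters, so they are not counted.

```python
def conv_params(in_ch, out_ch, k):
    """k*k*in_ch weights per filter, plus one bias per output channel."""
    return (k * k * in_ch + 1) * out_ch

def linear_params(in_f, out_f):
    """Dense weight matrix plus one bias per output."""
    return in_f * out_f + out_f

def bn_params(ch):
    """Learnable scale (gamma) and shift (beta) per channel."""
    return 2 * ch

# A 3x3 conv with padding=1 keeps the spatial size;
# MaxPool2d(2, 2) halves it with floor division.
size = 28
for _ in range(3):
    size = size // 2
print(size)  # 3, so the flattened vector has 128 * 3 * 3 = 1152 values

total = (
    conv_params(1, 32, 3) + conv_params(32, 64, 3) + conv_params(64, 128, 3)
    + bn_params(32) + bn_params(64) + bn_params(128)
    + linear_params(128 * size * size, 256) + linear_params(256, 10)
)
print(f"{total:,}")  # 390,858
```

Most of the parameters (295,168) sit in `fc1`, which is typical for small CNNs that flatten into a dense layer.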

### 2.2 Visualizing Feature Maps

In [None]:
if TORCH_AVAILABLE:
    def visualize_feature_maps(model, image, layer_num=1):
        """
        Visualize feature maps from a convolutional layer.
        
        Args:
            model: CNN model
            image: Input image tensor (1, C, H, W)
            layer_num: Which conv layer to visualize (1, 2, or 3)
        """
        model.eval()
        
        # Forward pass and extract features
        with torch.no_grad():
            x = image
            
            # Go through layers up to desired layer
            if layer_num >= 1:
                x = model.conv1(x)
                x = model.bn1(x)
                x = F.relu(x)
                if layer_num == 1:
                    features = x
                x = model.pool(x)
            
            if layer_num >= 2:
                x = model.conv2(x)
                x = model.bn2(x)
                x = F.relu(x)
                if layer_num == 2:
                    features = x
                x = model.pool(x)
            
            if layer_num >= 3:
                x = model.conv3(x)
                x = model.bn3(x)
                x = F.relu(x)
                if layer_num == 3:
                    features = x
        
        # Plot feature maps
        features = features.squeeze(0).cpu()  # Remove batch dim
        num_features = min(16, features.shape[0])  # Plot first 16
        
        fig, axes = plt.subplots(4, 4, figsize=(12, 12))
        axes = axes.flatten()
        
        for i in range(num_features):
            axes[i].imshow(features[i], cmap='viridis')
            axes[i].set_title(f'Filter {i+1}')
            axes[i].axis('off')
        
        for i in range(num_features, 16):
            axes[i].axis('off')
        
        plt.suptitle(f'Feature Maps from Conv Layer {layer_num}', fontsize=16)
        plt.tight_layout()
        plt.show()
    
    # Create a random input (replace with a real image tensor of shape (1, 1, 28, 28))
    test_image = torch.randn(1, 1, 28, 28)
    
    # Visualize different layers
    for layer in [1, 2, 3]:
        visualize_feature_maps(model, test_image, layer_num=layer)
else:
    print("PyTorch not available")
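
Re-implementing the forward pass by hand, as above, only works when the layer order is known. A more general alternative is to capture intermediate outputs with forward hooks. The sketch below uses a tiny stand-in `nn.Sequential` model (not the notebook's `SimpleCNN`) so that it runs on its own; the names `captured` and `make_hook` are illustrative.

```python
import torch
import torch.nn as nn

# Tiny stand-in model, used only to keep the example self-contained
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
)

captured = {}

def make_hook(name):
    # A forward hook receives (module, inputs, output) after the module runs
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

handles = [
    net[1].register_forward_hook(make_hook('block1')),
    net[4].register_forward_hook(make_hook('block2')),
]

net.eval()
with torch.no_grad():
    net(torch.randn(1, 1, 28, 28))

# Remove hooks so later forward passes are unaffected
for handle in handles:
    handle.remove()

for name, fmap in captured.items():
    print(name, tuple(fmap.shape))
```

The captured tensors can then be plotted with the same grid code used in `visualize_feature_maps`.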

---

## Part 3: Transfer Learning with Pre-trained Models

### 3.1 Loading Pre-trained ResNet

In [None]:
if TORCH_AVAILABLE:
    # Load pre-trained ResNet
    from torchvision.models import resnet50, ResNet50_Weights
    
    # Load with pre-trained weights
    weights = ResNet50_Weights.DEFAULT
    resnet = resnet50(weights=weights)
    
    # Set to eval mode
    resnet.eval()
    
    # Get preprocessing transforms
    preprocess = weights.transforms()
    
    print("ResNet-50 loaded successfully!")
    print(f"\nTotal parameters: {sum(p.numel() for p in resnet.parameters()):,}")
    
    # Show architecture summary
    print("\nArchitecture:")
    for name, module in resnet.named_children():
        print(f"{name}: {module.__class__.__name__}")
else:
    print("PyTorch not available")

### 3.2 Fine-tuning for Custom Dataset

In [None]:
if TORCH_AVAILABLE:
    def create_transfer_learning_model(num_classes, freeze_backbone=True):
        """
        Create transfer learning model from ResNet.
        
        Args:
            num_classes: Number of classes for new task
            freeze_backbone: Whether to freeze pre-trained weights
        
        Returns:
            model: Modified ResNet model
        """
        # Load pre-trained model
        model = resnet50(weights=ResNet50_Weights.DEFAULT)
        
        # Freeze backbone if requested
        if freeze_backbone:
            for param in model.parameters():
                param.requires_grad = False
        
        # Replace final layer
        num_features = model.fc.in_features
        model.fc = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(num_features, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes)
        )
        
        return model
    
    # Example: Create model for 10-class problem
    transfer_model = create_transfer_learning_model(num_classes=10, freeze_backbone=True)
    
    # Count trainable parameters
    trainable = sum(p.numel() for p in transfer_model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in transfer_model.parameters())
    
    print(f"Total parameters: {total:,}")
    print(f"Trainable parameters: {trainable:,}")
    print(f"Frozen parameters: {total - trainable:,}")
    print(f"\nTrainable: {100 * trainable / total:.2f}%")
    
    # Test forward pass
    dummy = torch.randn(2, 3, 224, 224)
    output = transfer_model(dummy)
    print(f"\nOutput shape: {output.shape}")  # (2, 10)
else:
    print("PyTorch not available")
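
With `freeze_backbone=True`, only the new classification head should be trainable, so the trainable count printed above can be sanity-checked by hand. The sketch assumes `model.fc.in_features` is 2048, as it is for torchvision's ResNet-50; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Dense weight matrix plus one bias per output."""
    return in_features * out_features + out_features

num_features = 2048   # assumed value of model.fc.in_features for ResNet-50
num_classes = 10

# Dropout and ReLU have no learnable parameters, so only the two Linear layers count
head_params = linear_params(num_features, 512) + linear_params(512, num_classes)
print(f"Trainable head parameters: {head_params:,}")  # 1,054,218
```

If the printed "Trainable parameters" value differs from this, some backbone weights are probably still unfrozen.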

### 3.3 Image Classification Pipeline

In [None]:
if TORCH_AVAILABLE:
    def classify_image(model, image_path, class_names=None):
        """
        Classify an image using pre-trained model.
        
        Args:
            model: PyTorch model
            image_path: Path to image file
            class_names: List of class names (optional)
        
        Returns:
            predictions: Top predictions with probabilities
        """
        # Load and preprocess image
        from PIL import Image
        
        img = Image.open(image_path).convert('RGB')
        
        # Define transforms
        transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]
            )
        ])
        
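        # Transform image and move it to the same device as the model
        model_device = next(model.parameters()).device
        img_tensor = transform(img).unsqueeze(0).to(model_device)  # Add batch dimension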
        
        # Make prediction
        model.eval()
        with torch.no_grad():
            output = model(img_tensor)
            probabilities = F.softmax(output, dim=1)
        
        # Get the top predictions (at most 5; fewer if the model has fewer classes)
        k = min(5, probabilities.size(1))
        top5_prob, top5_indices = torch.topk(probabilities, k)
        top5_prob = top5_prob.squeeze(0).cpu().numpy()
        top5_indices = top5_indices.squeeze(0).cpu().numpy()
        
        # Format results as (class name, probability) pairs
        results = []
        for idx, prob in zip(top5_indices, top5_prob):
            class_name = class_names[idx] if class_names else f"Class {idx}"
            results.append((class_name, float(prob)))
        
        # Visualize
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
        
        # Show image
        ax1.imshow(img)
        ax1.set_title('Input Image')
        ax1.axis('off')
        
        # Show predictions
        classes = [r[0] for r in results]
        probs = [r[1] for r in results]
        
        y_pos = np.arange(len(classes))
        ax2.barh(y_pos, probs)
        ax2.set_yticks(y_pos)
        ax2.set_yticklabels(classes)
        ax2.invert_yaxis()
        ax2.set_xlabel('Probability')
        ax2.set_title('Top 5 Predictions')
        ax2.set_xlim([0, 1])
        
        plt.tight_layout()
        plt.show()
        
        return results
    
    print("Image classification function defined.")
    print("\nUsage:")
    print("results = classify_image(resnet, 'path/to/image.jpg')")
else:
    print("PyTorch not available")
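
The last two steps of `classify_image`, softmax followed by top-k selection, can be illustrated without a model. This NumPy sketch uses made-up logits for six classes. It mirrors what `F.softmax` and `torch.topk` compute, but it is not the PyTorch implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1, -1.0, 3.0, 0.5])  # made-up scores for 6 classes
probs = softmax(logits)

k = 3
topk_indices = np.argsort(probs)[::-1][:k]  # indices of the k largest probabilities

for idx in topk_indices:
    print(f"Class {idx}: {probs[idx]:.3f}")
```

Softmax preserves the ordering of the logits, so the top-k classes are the same whether they are selected before or after normalization.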

---

## Part 4: Data Augmentation

### 4.1 Standard Augmentations

In [None]:
if TORCH_AVAILABLE:
    # Define augmentation pipeline
    train_transforms = transforms.Compose([
        transforms.Resize(256),
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(
            brightness=0.2,
            contrast=0.2,
            saturation=0.2,
            hue=0.1
        ),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        ),
    ])
    
    # Validation transforms (no augmentation)
    val_transforms = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        ),
    ])
    
    # Visualize augmentations
    def show_augmentations(image_path, transform, num_augmentations=8):
        """Show multiple augmented versions of an image."""
        from PIL import Image
        
        img = Image.open(image_path).convert('RGB')
        
        fig, axes = plt.subplots(2, 4, figsize=(16, 8))
        axes = axes.flatten()
        
        # Original
        axes[0].imshow(img)
        axes[0].set_title('Original')
        axes[0].axis('off')
        
        # Build the augmentation pipeline once. The `transform` argument is not
        # applied here because it ends with ToTensor/Normalize, which would
        # distort the displayed colors; the same random augmentations are
        # re-created without those two steps.
        aug_transform = transforms.Compose([
            transforms.Resize(256),
            transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomRotation(degrees=15),
            transforms.ColorJitter(0.2, 0.2, 0.2, 0.1),
        ])
        
        # Augmented versions (capped at the 8 panels of the 2x4 grid)
        for i in range(1, min(num_augmentations, len(axes))):
            aug_img = aug_transform(img)
            axes[i].imshow(aug_img)
            axes[i].set_title(f'Augmented {i}')
            axes[i].axis('off')
        
        plt.tight_layout()
        plt.show()
    
    print("Data augmentation transforms defined.")
    print("\nTo visualize:")
    print("show_augmentations('path/to/image.jpg', train_transforms)")
else:
    print("PyTorch not available")
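
An alternative to rebuilding the pipeline without `Normalize` is to invert the normalization before plotting. The sketch below shows the arithmetic in NumPy with the ImageNet statistics used above. `denormalize` is a hypothetical helper, and the random array stands in for a real image.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def denormalize(chw_array):
    """Invert Normalize(mean, std) on a (3, H, W) array; return (H, W, 3) in [0, 1]."""
    restored = chw_array * IMAGENET_STD + IMAGENET_MEAN
    return np.clip(restored, 0.0, 1.0).transpose(1, 2, 0)

# Round trip on a random stand-in image with values in [0, 1]
rng = np.random.default_rng(0)
original = rng.random((3, 4, 4))
normalized = (original - IMAGENET_MEAN) / IMAGENET_STD
recovered = denormalize(normalized)

print(np.allclose(recovered, original.transpose(1, 2, 0)))  # True
```

For a tensor produced by `train_transforms`, calling `.cpu().numpy()` first gives the `(3, H, W)` array this helper expects, and the result can be passed to `plt.imshow`.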

---

## Part 5: Training Loop

### 5.1 Complete Training Pipeline

In [None]:
if TORCH_AVAILABLE:
    def train_one_epoch(model, train_loader, criterion, optimizer, device):
        """Train for one epoch."""
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        
        from tqdm import tqdm
        pbar = tqdm(train_loader, desc='Training')
        
        for inputs, targets in pbar:
            inputs, targets = inputs.to(device), targets.to(device)
            
            # Zero gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            
            # Backward pass
            loss.backward()
            
            # Gradient clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            
            # Update weights
            optimizer.step()
            
            # Statistics
            running_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            
            # Update progress bar
            pbar.set_postfix({
                'loss': running_loss / total,
                'acc': 100. * correct / total
            })
        
        epoch_loss = running_loss / total
        epoch_acc = correct / total
        
        return epoch_loss, epoch_acc
    
    def validate(model, val_loader, criterion, device):
        """Validate model."""
        model.eval()
        running_loss = 0.0
        correct = 0
        total = 0
        
        with torch.no_grad():
            for inputs, targets in val_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                
                running_loss += loss.item() * inputs.size(0)
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()
        
        val_loss = running_loss / total
        val_acc = correct / total
        
        return val_loss, val_acc
    
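    def train_model(model, train_loader, val_loader, epochs=10, lr=0.001, device=None):
        """Complete training pipeline."""
        # Default to the GPU when one is available, otherwise fall back to the CPU
        if device is None:
            device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model = model.to(device)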
        
        # Loss and optimizer
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
        
        # Learning rate scheduler
        scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
        
        # Training history
        history = {
            'train_loss': [], 'train_acc': [],
            'val_loss': [], 'val_acc': []
        }
        
        best_val_acc = 0.0
        
        for epoch in range(epochs):
            print(f"\nEpoch {epoch+1}/{epochs}")
            print("-" * 50)
            
            # Train
            train_loss, train_acc = train_one_epoch(
                model, train_loader, criterion, optimizer, device
            )
            
            # Validate
            val_loss, val_acc = validate(model, val_loader, criterion, device)
            
            # Update scheduler
            scheduler.step()
            
            # Save history
            history['train_loss'].append(train_loss)
            history['train_acc'].append(train_acc)
            history['val_loss'].append(val_loss)
            history['val_acc'].append(val_acc)
            
            # Save best model
            if val_acc > best_val_acc:
                best_val_acc = val_acc
                torch.save(model.state_dict(), 'best_model.pth')
                print(f"✓ New best model saved (val_acc: {val_acc:.4f})")
            
            # Print epoch summary
            print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
            print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
            print(f"LR: {scheduler.get_last_lr()[0]:.6f}")
        
        return history
    
    print("Training pipeline defined.")
    print("\nUsage:")
    print("history = train_model(model, train_loader, val_loader, epochs=10)")
else:
    print("PyTorch not available")
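
The learning rates printed by `train_model` should follow the cosine annealing curve. The sketch below evaluates the closed-form expression with the defaults used above (`lr=0.001`, `T_max=epochs=10`) and an assumed `eta_min` of 0. `cosine_annealing_lr` is a hypothetical helper, not the PyTorch scheduler itself.

```python
import math

def cosine_annealing_lr(step, base_lr=0.001, t_max=10, eta_min=0.0):
    """Closed-form cosine annealing: learning rate after `step` scheduler steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

schedule = [cosine_annealing_lr(step) for step in range(11)]

for step, lr in enumerate(schedule):
    print(f"after {step:2d} scheduler steps: lr = {lr:.6f}")
```

The rate starts at `base_lr`, reaches half of it midway, and decays to `eta_min` after `t_max` steps. Because `train_model` calls `scheduler.step()` before printing, the value shown for epoch 1 corresponds to one completed step, not to the initial rate.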

### 5.2 Visualizing Training Progress

In [None]:
def plot_training_history(history):
    """Plot training and validation metrics."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    epochs = range(1, len(history['train_loss']) + 1)
    
    # Plot loss
    ax1.plot(epochs, history['train_loss'], 'b-', label='Train Loss', linewidth=2)
    ax1.plot(epochs, history['val_loss'], 'r-', label='Val Loss', linewidth=2)
    ax1.set_xlabel('Epoch', fontsize=12)
    ax1.set_ylabel('Loss', fontsize=12)
    ax1.set_title('Training and Validation Loss', fontsize=14)
    ax1.legend(fontsize=10)
    ax1.grid(True, alpha=0.3)
    
    # Plot accuracy
    ax2.plot(epochs, [100*acc for acc in history['train_acc']], 
             'b-', label='Train Acc', linewidth=2)
    ax2.plot(epochs, [100*acc for acc in history['val_acc']], 
             'r-', label='Val Acc', linewidth=2)
    ax2.set_xlabel('Epoch', fontsize=12)
    ax2.set_ylabel('Accuracy (%)', fontsize=12)
    ax2.set_title('Training and Validation Accuracy', fontsize=14)
    ax2.legend(fontsize=10)
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print best results
    best_val_acc = max(history['val_acc'])
    best_epoch = history['val_acc'].index(best_val_acc) + 1
    
    print(f"\nBest Validation Accuracy: {100*best_val_acc:.2f}% (Epoch {best_epoch})")

print("Plotting function defined.")
print("Usage: plot_training_history(history)")
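
Besides plotting, the history dictionary can be inspected numerically. The sketch below uses made-up numbers in the same format `train_model` returns. It finds the epoch with the lowest validation loss and tracks the gap between validation and training loss, which tends to widen once the model starts to overfit.

```python
history = {  # made-up values, same keys as the dictionary returned by train_model
    'train_loss': [0.90, 0.60, 0.40, 0.25, 0.15],
    'val_loss':   [0.95, 0.70, 0.55, 0.56, 0.62],
    'train_acc':  [0.70, 0.80, 0.87, 0.92, 0.96],
    'val_acc':    [0.68, 0.77, 0.82, 0.82, 0.80],
}

val_loss = history['val_loss']
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1  # 1-based
gaps = [v - t for t, v in zip(history['train_loss'], val_loss)]

print(f"Lowest validation loss at epoch {best_epoch}")  # 3
for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch}: val_loss - train_loss = {gap:.2f}")
```

In this made-up run the training loss keeps falling after epoch 3 while the validation loss rises, so later checkpoints would likely generalize worse than the one saved at epoch 3.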

---

## Part 6: Grad-CAM Visualization

### 6.1 Class Activation Mapping

In [None]:
if TORCH_AVAILABLE:
    class GradCAM:
        """Gradient-weighted Class Activation Mapping."""
        
        def __init__(self, model, target_layer):
            self.model = model
            self.target_layer = target_layer
            self.gradients = None
            self.activations = None
            
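            # Register hooks (register_backward_hook is deprecated in favor of
            # register_full_backward_hook, available in PyTorch >= 1.8)
            target_layer.register_forward_hook(self.save_activation)
            target_layer.register_full_backward_hook(self.save_gradient)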
        
        def save_activation(self, module, input, output):
            self.activations = output.detach()
        
        def save_gradient(self, module, grad_input, grad_output):
            self.gradients = grad_output[0].detach()
        
        def generate_cam(self, input_image, target_class=None):
            """
            Generate Grad-CAM heatmap.
            
            Args:
                input_image: Input tensor (1, C, H, W)
                target_class: Target class index (None = predicted class)
            
            Returns:
                cam: Heatmap (H, W)
            """
            # Forward pass
            output = self.model(input_image)
            
            # Get target class (as a plain int so the indexed output is a scalar)
            if target_class is None:
                target_class = output.argmax(dim=1).item()
            
            # Zero gradients
            self.model.zero_grad()
            
            # Backward pass
            output[0, target_class].backward()
            
            # Get weights (global average pooling of gradients)
            weights = self.gradients.mean(dim=(2, 3), keepdim=True)
            
            # Weighted combination of activation maps
            cam = (weights * self.activations).sum(dim=1, keepdim=True)
            
            # ReLU (only positive influence)
            cam = F.relu(cam)
            
            # Normalize to [0, 1]; the epsilon avoids division by zero for an all-zero map
            cam = cam.squeeze()
            cam = cam - cam.min()
            cam = cam / (cam.max() + 1e-8)
            
            return cam.cpu().numpy()
    
    def visualize_gradcam(model, image_tensor, original_image, target_layer):
        """Visualize Grad-CAM on an image."""
        # Create Grad-CAM
        gradcam = GradCAM(model, target_layer)
        
        # Generate CAM
        cam = gradcam.generate_cam(image_tensor)
        
        # Resize CAM to match image
        cam_resized = cv2.resize(cam, (original_image.width, original_image.height))
        
        # Create heatmap
        heatmap = cv2.applyColorMap(np.uint8(255 * cam_resized), cv2.COLORMAP_JET)
        heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)
        
        # Overlay on original
        img_array = np.array(original_image)
        overlaid = cv2.addWeighted(img_array, 0.6, heatmap, 0.4, 0)
        
        # Plot
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
        
        axes[0].imshow(original_image)
        axes[0].set_title('Original Image')
        axes[0].axis('off')
        
        axes[1].imshow(heatmap)
        axes[1].set_title('Grad-CAM Heatmap')
        axes[1].axis('off')
        
        axes[2].imshow(overlaid)
        axes[2].set_title('Overlay')
        axes[2].axis('off')
        
        plt.tight_layout()
        plt.show()
    
    print("Grad-CAM visualization defined.")
    print("\nUsage:")
    print("# For ResNet, use layer4 as target layer")
    print("visualize_gradcam(model, image_tensor, original_image, model.layer4)")
else:
    print("PyTorch not available")
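
The core of `generate_cam` is a few lines of array arithmetic. The NumPy sketch below reproduces those steps on made-up activations and gradients for a single image, with no model involved, so the shapes at each stage are easy to follow.

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.random((4, 7, 7))       # (C, H, W) feature maps, made up
gradients = rng.normal(size=(4, 7, 7))    # d(score)/d(activations), made up

# Global average pooling of the gradients: one importance weight per channel
weights = gradients.mean(axis=(1, 2))     # shape (C,)

# Weighted sum of the feature maps over the channel axis
cam = np.tensordot(weights, activations, axes=(0, 0))  # shape (H, W)

# Keep only positive evidence, then scale to [0, 1]
cam = np.maximum(cam, 0)
cam = cam - cam.min()
cam = cam / (cam.max() + 1e-8)

print(cam.shape)  # (7, 7)
```

The resulting map is at the resolution of the target layer (7x7 for `layer4` of a ResNet on 224x224 inputs), which is why `visualize_gradcam` resizes it before overlaying.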

---

## 📝 Summary

### Key Concepts

1. **Convolutional Layers:**
   - Extract spatial features from images
   - Share weights across spatial locations
   - Build hierarchical representations

2. **CNN Architecture:**
   - Conv layers: Feature extraction
   - Pooling: Dimensionality reduction
   - Batch norm: Stabilize training
   - Fully connected: Classification

3. **Modern Architectures:**
   - VGG: Deep networks with small filters
   - ResNet: Skip connections enable very deep networks
   - Inception: Multi-scale feature extraction
   - EfficientNet: Compound scaling

4. **Transfer Learning:**
   - Use pre-trained weights
   - Freeze backbone, train classifier
   - Fine-tune entire network
   - Much faster than training from scratch

5. **Data Augmentation:**
   - Prevent overfitting
   - Increase effective dataset size
   - Improve generalization

6. **Visualization:**
   - Feature maps: What filters learn
   - Grad-CAM: Where model looks
   - Helps debug and interpret models

### Interview Questions

1. **Why use convolution instead of fully connected layers for images?**
   - Parameter sharing (fewer parameters)
   - Translation equivariance: a shifted input produces a correspondingly shifted feature map
   - Preserve spatial structure

2. **What does pooling do and why is it useful?**
   - Reduces spatial dimensions
   - Provides a degree of local translation invariance
   - Reduces computation

3. **Explain skip connections in ResNet.**
   - Add input directly to output: y = F(x) + x
   - Mitigates vanishing gradients
   - Easier to optimize (identity easy to learn)
   - Enables very deep networks (100+ layers)

4. **What is transfer learning and when should you use it?**
   - Use pre-trained model weights
   - Useful when labeled data is limited
   - Faster training, better performance
   - Fine-tune on your specific task

5. **How does Grad-CAM work?**
   - Use gradients to weight activation maps
   - Shows which regions influenced prediction
   - Helps interpret and debug models
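
The parameter-sharing argument from question 1 can be made concrete with a little arithmetic. The sketch compares a fully connected layer and a 3x3 convolution that both map a 28x28 grayscale image to 32 feature maps of size 28x28. The sizes are chosen for illustration.

```python
# Layer mapping a 28x28 grayscale image to 32 feature maps of size 28x28
in_pixels = 28 * 28
out_values = 32 * 28 * 28

# Fully connected: every output value has its own weight for every input pixel
fc_params = in_pixels * out_values + out_values

# Convolution: 32 kernels of size 3x3x1, shared across all spatial positions
conv_params = (3 * 3 * 1) * 32 + 32

print(f"Fully connected: {fc_params:,}")    # 19,694,080
print(f"Convolutional:   {conv_params:,}")  # 320
```

The convolution needs roughly 60,000 times fewer parameters here because the same small kernels are reused at every position.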

### Next Steps

- **Advanced Topics:** See [DEEP_LEARNING_ARCHITECTURES.md](../DEEP_LEARNING_ARCHITECTURES.md)
- **Modern Techniques:** See [MODERN_ML_AI_TECHNIQUES_2024_2025.md](../MODERN_ML_AI_TECHNIQUES_2024_2025.md)
- **Projects:** 
  - Image classification on custom dataset
  - Object detection with YOLO
  - Image segmentation
  - Style transfer

---

**Congratulations!** You've worked through the core of computer vision with CNNs: convolutions, CNN architectures, transfer learning, data augmentation, and visualization techniques.

**Next:** [11 - MLOps & Production Deployment](./11_mlops_production.ipynb)