# 📚 PyTorch Practice Notebook - Lecture 3: Convolutional Neural Networks

**Based on:** SAIR PyTorch Mastery - Lecture 3: Convolutional Neural Networks - Vision & Beyond

**Instructions:** Complete the exercises below to test your understanding of CNNs and computer vision with PyTorch. Try to solve them without looking at the original notebook first!

**Time Estimate:** 3-4 hours

## 🆕 Enhanced Features:
- Mathematical foundation exercises
- Visualization and interpretation tasks
- Debugging CNN architectures
- Performance analysis
- Sudanese context applications

## 🔧 Setup & Imports

Run this cell first to set up your environment.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt
import time
import os
from pathlib import Path
from PIL import Image
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# For reproducibility
torch.manual_seed(42)
np.random.seed(42)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## 🆕 NEW: Debugging Exercise 0 - Find the CNN Bugs!

**Task:** This CNN class has multiple bugs. Identify and fix them all.

In [None]:
# =========== BUGGY CNN - FIND AND FIX ALL BUGS! ===========
class BuggyCNN(nn.Module):
    """CNN with multiple bugs - fix them all!"""
    
    def __init__(self, num_classes=10):
        # BUG 1: Missing super().__init__()
        
        # BUG 2: No padding with 5x5 kernels, so every conv shrinks the feature map by 4 pixels
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5, padding=0)  # 32 -> 28
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, padding=0)  # 28 -> 24
        
        # BUG 3: Wrong calculation of flattened size
        # For 32x32 input with kernel=5, stride=1, padding=0:
        # conv1: 32 -> 28, conv2: 28 -> 24
        # After maxpool (2x): 24 -> 12
        # So flattened size should be 64 * 12 * 12 = 9216
        self.fc1 = nn.Linear(64 * 28 * 28, 256)  # Wrong!
        
        # BUG 4: Wrong number of classes parameter
        self.fc2 = nn.Linear(256, 100)  # Should be num_classes
        
        # BUG 5: Missing activation functions
        # Should add ReLU
        
        # BUG 6: Missing pooling layers
        
    def forward(self, x):
        # BUG 7: Convolutions are chained with no activation or pooling
        x = self.conv1(x)
        x = self.conv2(x)  # Should apply ReLU after each conv and pool before flattening
        
        # BUG 8: Wrong reshape dimensions
        batch_size = x.size(0)
        x = x.view(batch_size, -1)  # This is correct but after wrong flattening calc
        
        # BUG 9: Missing activation functions
        x = self.fc1(x)
        x = self.fc2(x)
        
        # BUG 10 (trick question): returning raw logits is correct here, because
        # nn.CrossEntropyLoss applies log-softmax internally
        return x

# =========== YOUR FIXED VERSION ===========
class FixedCNN(nn.Module):
    """Your fixed version of the buggy CNN"""
    
    def __init__(self, num_classes=10):
        super().__init__()
        # TODO: Fix all bugs
        
    def forward(self, x):
        # TODO: Fix forward pass
        pass

# Test with a sample input
print("Testing Buggy CNN:")
test_input = torch.randn(4, 3, 32, 32)  # batch_size=4, RGB, 32x32
try:
    # Construction itself fails without super().__init__(), so it belongs inside the try
    buggy_model = BuggyCNN()
    output = buggy_model(test_input)
    print(f"Buggy output shape: {output.shape}")
except Exception as e:
    print(f"Buggy model error: {e}")

print("\nTesting Fixed CNN:")
fixed_model = FixedCNN()
# TODO: Test your fixed model

## 🎯 Exercise 1: Mathematical Foundations & Manual Implementation

### Part A: Manual 2D Convolution

**Task:** Implement 2D convolution from scratch without using PyTorch's `nn.Conv2d`.
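
One way to picture the sliding window before writing the loops: each output position reads a kernel-sized window of the padded image whose start index is the output index times the stride. The sketch below lists those windows along a single axis in plain Python. `window_starts` is a hypothetical helper name, not part of the exercise, and it assumes zero padding has already been added on both sides.

```python
def window_starts(size, kernel, stride=1, padding=0):
    """Return (start, end) index pairs, in padded coordinates, for one axis."""
    padded = size + 2 * padding
    n_out = (padded - kernel) // stride + 1
    return [(i * stride, i * stride + kernel) for i in range(n_out)]

# 5-pixel axis, 3-wide kernel, no padding: three windows
print(window_starts(5, 3))                  # [(0, 3), (1, 4), (2, 5)]
# Stride 2 skips every other start position
print(window_starts(5, 3, stride=2))        # [(0, 3), (2, 5)]
# Padding 1 keeps the output as long as the input
print(len(window_starts(5, 3, padding=1)))  # 5
```

When you compare your result against PyTorch, keep in mind that `nn.Conv2d` and `F.conv2d` compute cross-correlation (the kernel is not flipped), so multiply each window by the kernel as-is.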

In [None]:
# =========== YOUR CODE HERE ===========
def manual_conv2d(image, kernel, stride=1, padding=0):
    """
    Perform 2D convolution manually.
    
    Args:
        image: 2D numpy array (H, W)
        kernel: 2D numpy array (kH, kW)
        stride: Stride value
        padding: Padding value
    
    Returns:
        output: 2D numpy array
    """
    # TODO: Add padding if specified
    
    # TODO: Get dimensions
    
    # TODO: Calculate output dimensions
    
    # TODO: Create output array
    
    # TODO: Perform convolution (sliding window)
    
    return output

# Test your implementation
print("Testing Manual Convolution:")

# Create test image
test_image = np.array([
    [1, 2, 3, 0, 1],
    [4, 5, 6, 1, 2],
    [7, 8, 9, 2, 3],
    [0, 1, 2, 3, 4],
    [1, 2, 3, 4, 5]
])

# Test with different kernels
identity_kernel = np.array([
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0]
])

edge_kernel = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1]
])

blur_kernel = np.ones((3, 3)) / 9

# TODO: Test your function with different kernels
# Compare with PyTorch's implementation for verification
# =====================================================

### Part B: Convolution Mathematics - Shape Calculations

**Task:** Create a function that calculates output dimensions for convolutional layers.
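
As a worked instance of the output-size formula given in the docstring below, take an illustrative case that is not one of the test cells: input 32, kernel 3, stride 2, padding 1, dilation 2. The numbers are chosen only to exercise the floor and the dilation term.

$$
\text{output} = \left\lfloor \frac{32 + 2\cdot 1 - 2\cdot(3-1) - 1}{2} \right\rfloor + 1 = \left\lfloor \frac{29}{2} \right\rfloor + 1 = 14 + 1 = 15
$$

With dilation 1 the same layer would give $\lfloor 31/2 \rfloor + 1 = 16$, so dilation widens the kernel's footprint and shrinks the output.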

In [None]:
# =========== YOUR CODE HERE ===========
def calculate_output_size(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """
    Calculate output size for convolution.
    
    Formula: output = floor((input + 2*padding - dilation*(kernel-1) - 1) / stride) + 1
    
    Args:
        input_size: Input dimension (H or W)
        kernel_size: Kernel dimension
        stride: Stride value
        padding: Padding value
        dilation: Dilation value
    
    Returns:
        output_size: Calculated output dimension
    """
    # TODO: Implement the formula
    pass

def calculate_cnn_output_shape(input_shape, conv_layers):
    """
    Calculate final output shape after multiple convolutional layers.
    
    Args:
        input_shape: Tuple (C, H, W)
        conv_layers: List of dicts with layer parameters
            Example: [{'type': 'conv', 'out_channels': 32, 'kernel': 3, 'stride': 1, 'padding': 1},
                      {'type': 'pool', 'kernel': 2, 'stride': 2}]
    
    Returns:
        output_shape: Tuple (C, H, W)
    """
    # TODO: Track shape through layers
    pass

# Test cases
print("Test Cases for Shape Calculations:")
print("="*50)

# Test 1: Simple convolution
test1 = calculate_output_size(32, 3, stride=1, padding=1)
print(f"Test 1 - Input 32, kernel 3, stride 1, padding 1: {test1} (expected: 32)")

# Test 2: With pooling
test2 = calculate_output_size(32, 2, stride=2)  # MaxPool2d(2)
print(f"Test 2 - Input 32, kernel 2, stride 2 (pooling): {test2} (expected: 16)")

# Test 3: Complex CNN
layers = [
    {'type': 'conv', 'out_channels': 32, 'kernel': 3, 'stride': 1, 'padding': 1},
    {'type': 'pool', 'kernel': 2, 'stride': 2},
    {'type': 'conv', 'out_channels': 64, 'kernel': 3, 'stride': 1, 'padding': 1},
    {'type': 'pool', 'kernel': 2, 'stride': 2},
]

input_shape = (3, 32, 32)  # CIFAR-10
output_shape = calculate_cnn_output_shape(input_shape, layers)
print(f"\nComplex CNN shape calculation:")
print(f"Input shape: {input_shape}")
print(f"Output shape: {output_shape}")
print(f"Flattened size: {output_shape[0] * output_shape[1] * output_shape[2]}")
# =====================================================

### 🆕 NEW: Part C: Kernel Visualization Challenge

**Task:** Create a visualization tool that shows what different kernels do to images.
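
A useful property to check before plotting anything is the sum of a kernel's weights. On a perfectly flat image patch the response is the patch value times that sum, so kernels that sum to 1 leave flat regions unchanged, while kernels that sum to 0 respond only where the intensity changes. The sketch below verifies this in plain Python for some of the kernels used in the next cell. The helper names are illustrative only.

```python
kernels = {
    "identity": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    "edge_detection": [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]],
    "sobel_x": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "sharpen": [[0, -1, 0], [-1, 5, -1], [0, -1, 0]],
    "gaussian_blur": [[w / 16 for w in row] for row in [[1, 2, 1], [2, 4, 2], [1, 2, 1]]],
}

def kernel_sum(kernel):
    """Sum of all weights in a 2D kernel given as nested lists."""
    return sum(sum(row) for row in kernel)

def response_on_flat_patch(kernel, value):
    """Kernel response when every pixel under the kernel equals `value`."""
    return sum(w * value for row in kernel for w in row)

for name, k in kernels.items():
    print(f"{name:15s} sum={kernel_sum(k):+.2f}  "
          f"flat-patch response={response_on_flat_patch(k, 10):+.2f}")
```

The edge and Sobel kernels print a flat-patch response of zero, which is why their outputs look black everywhere except at edges.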

In [None]:
class KernelVisualizer:
    """Visualize different convolution kernels"""
    
    def __init__(self):
        self.kernels = {
            'identity': np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]]),
            'edge_detection': np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]),
            'sobel_x': np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]),
            'sobel_y': np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]]),
            'sharpen': np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]]),
            'box_blur': np.ones((3, 3)) / 9,
            'gaussian_blur': np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16,
        }
    
    def visualize_kernels(self):
        """Visualize all kernels"""
        # TODO: Create a grid plot of all kernels
        pass
    
    def apply_to_image(self, image, kernel_name):
        """Apply kernel to image and show result"""
        # TODO: Apply convolution and show before/after
        pass
    
    def create_custom_kernel(self, weights):
        """Create and test a custom kernel"""
        # TODO: Allow user to create custom kernels
        pass

# Test the visualizer
print("Testing Kernel Visualizer:")
visualizer = KernelVisualizer()
visualizer.visualize_kernels()

## 🏗️ Exercise 2: Building CNN Architectures

### Part A: Build SimpleCNN from Specifications

**Task:** Build a CNN based on these specifications:

**Requirements:**
1. Input: 32x32 RGB images (CIFAR-10)
2. Architecture:
   - Conv1: 32 filters, 3x3, padding=1 → ReLU → BatchNorm → MaxPool(2)
   - Conv2: 64 filters, 3x3, padding=1 → ReLU → BatchNorm → MaxPool(2)
   - Conv3: 128 filters, 3x3, padding=1 → ReLU → BatchNorm → MaxPool(2)
   - Flatten
   - FC1: 256 units → ReLU → Dropout(0.3)
   - FC2: 128 units → ReLU → Dropout(0.3)
   - Output: 10 units (softmax in loss)

3. Total parameters should be less than 500,000
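
A quick budget check helps before writing the model. The arithmetic sketch below assumes default biases on every conv and linear layer and counts only learnable parameters (BatchNorm running statistics are buffers, so they are excluded). Read literally, the layer list above comes to 652,426 parameters, of which FC1 alone accounts for 524,544. Meeting the 500,000 target therefore seems to require shrinking what feeds FC1, for example a narrower FC1 or fewer features at the flatten step. The helper names are illustrative.

```python
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out   # weights + biases

def bn_params(c):
    return 2 * c                           # learnable scale and shift per channel

def linear_params(n_in, n_out):
    return n_in * n_out + n_out            # weights + biases

layers = {
    "conv1": conv_params(3, 32, 3),
    "bn1": bn_params(32),
    "conv2": conv_params(32, 64, 3),
    "bn2": bn_params(64),
    "conv3": conv_params(64, 128, 3),
    "bn3": bn_params(128),
    # Three MaxPool(2) steps: 32 -> 16 -> 8 -> 4, so flatten gives 128 * 4 * 4
    "fc1": linear_params(128 * 4 * 4, 256),
    "fc2": linear_params(256, 128),
    "out": linear_params(128, 10),
}

total = sum(layers.values())
for name, n in layers.items():
    print(f"{name:6s}{n:>9,}")
print(f"total {total:>9,}")  # total   652,426
```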

In [None]:
# =========== YOUR CODE HERE ===========
class SimpleCNN(nn.Module):
    """Your implementation of SimpleCNN"""
    
    def __init__(self, num_classes=10):
        super().__init__()
        # TODO: Implement the architecture
        
    def forward(self, x):
        # TODO: Implement forward pass
        pass
    
    def count_parameters(self):
        """Count total and trainable parameters"""
        total = sum(p.numel() for p in self.parameters())
        trainable = sum(p.numel() for p in self.parameters() if p.requires_grad)
        return total, trainable

# Test your model
print("Testing SimpleCNN:")
model = SimpleCNN()

# Test forward pass
test_input = torch.randn(4, 3, 32, 32)  # batch_size=4
output = model(test_input)
print(f"Input shape: {test_input.shape}")
print(f"Output shape: {output.shape}")

# Count parameters
total_params, trainable_params = model.count_parameters()
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
print(f"Under 500,000? {total_params < 500000}")

# Print model architecture
print("\nModel Architecture:")
print(model)
# ===========================================

### Part B: Build LeNet (Adapted for CIFAR-10)

**Task:** Implement the LeNet-5 architecture adapted for 32x32 RGB images.

In [None]:
class LeNet(nn.Module):
    """LeNet-5 adapted for CIFAR-10 (3-channel 32x32 inputs)"""

    def __init__(self, num_classes=10):
        super().__init__()

        # TODO: Implement LeNet-5 architecture for CIFAR-10
        # Original LeNet-5 (for 32x32 images):
        # 1. Conv2d: 1 input channel → 6 output channels, kernel=5x5
        # 2. Tanh activation
        # 3. AvgPool2d: kernel=2x2, stride=2
        # 4. Conv2d: 6 input channels → 16 output channels, kernel=5x5
        # 5. Tanh activation
        # 6. AvgPool2d: kernel=2x2, stride=2
        # 7. Flatten
        # 8. Linear: ? features → 120 units
        # 9. Tanh activation
        # 10. Linear: 120 → 84 units
        # 11. Tanh activation
        # 12. Linear: 84 → num_classes

        # Hints for CIFAR-10 adaptation:
        # - CIFAR-10 has 3 input channels (RGB) not 1 (grayscale)
        # - Input size is 32x32 (same as original LeNet paper)
        # - Need to calculate the flattened size after conv/pool layers

        # TODO: Define the convolutional layers (feature extractor)
        self.features = nn.Sequential(
            # Layer 1
            # TODO: First convolutional layer

            # TODO: Activation function (Tanh)

            # TODO: Pooling layer

            # Layer 2
            # TODO: Second convolutional layer

            # TODO: Activation function (Tanh)

            # TODO: Pooling layer
        )

        # TODO: Calculate the flattened size
        # After first conv (32x32 → ?x?):
        # output_size = floor((input_size + 2*padding - kernel_size) / stride) + 1
        # After first pool: ?x? → ?x?
        # After second conv: ?x? → ?x?
        # After second pool: ?x? → ?x?
        # flattened_size = channels * height * width

        # TODO: Define the fully connected layers (classifier)
        self.classifier = nn.Sequential(
            # TODO: Flatten layer

            # TODO: First fully connected layer

            # TODO: Activation function

            # TODO: Second fully connected layer

            # TODO: Activation function

            # TODO: Output layer
        )

    def forward(self, x):
        # TODO: Implement the forward pass
        # Hint: Pass through features, then classifier
        pass

## 🔄 Exercise 3: Complete Training & Evaluation


**Task:** Write a complete training loop for cats vs dogs with proper validation and monitoring.

In [None]:
# Load the cats vs dogs dataset
# Prepare the dataset and dataloaders
# Define the model
# Train and evaluate the model
# Plot the results
# Save the model
# [BONUS] Test the model on new images and visualize feature maps

## 🧪 Challenge Problems

### Challenge 1: Optimize CNN for Mobile Deployment

**Task:** Create a lightweight CNN for mobile deployment in Sudanese farms.
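
To get a feel for why depthwise separable convolutions (one of the techniques listed in the cell below) help with a tight parameter budget, the sketch below compares weight counts for a single 3x3 layer. Biases are ignored and the channel sizes are an arbitrary illustration, not a recommended design.

```python
def standard_conv_weights(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_weights(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise

c_in, c_out, k = 64, 128, 3
std = standard_conv_weights(c_in, c_out, k)
sep = depthwise_separable_weights(c_in, c_out, k)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
# standard: 73,728  separable: 8,768  ratio: 8.4x
```

In PyTorch the depthwise step is typically written as `nn.Conv2d(c_in, c_in, k, groups=c_in)` and the pointwise step as `nn.Conv2d(c_in, c_out, 1)`.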

In [None]:
class MobileSudaneseCNN(nn.Module):
    """Lightweight CNN for mobile deployment"""
    
    def __init__(self, num_classes=5):
        super().__init__()
        
        # TODO: Design a CNN with:
        # 1. Less than 100,000 parameters
        # 2. Fast inference on mobile CPU
        # 3. Good accuracy for crop classification
        
        # Techniques to consider:
        # - Depthwise separable convolutions
        # - Bottleneck layers
        # - Reduced channel counts
        # - Efficient activation functions
        
    def forward(self, x):
        pass
    
    def benchmark(self, input_size=(1, 3, 224, 224)):
        """Benchmark model performance"""
        # TODO: Measure parameters, FLOPs, inference time
        pass

print("Testing Mobile CNN:")
mobile_cnn = MobileSudaneseCNN()
mobile_cnn.benchmark()

### Challenge 2: Multi-Task CNN

**Task:** Create a CNN that performs multiple tasks for Sudanese agriculture.
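
The combined loss for such a model is usually a weighted sum of one loss per task. The sketch below works through that arithmetic for a single sample in plain Python so the pieces are visible. The weights are placeholder assumptions, and the dictionary keys simply mirror the outputs returned by `forward` in the cell below. In PyTorch the per-task terms would normally come from `F.cross_entropy` and `F.mse_loss`.

```python
import math

def cross_entropy(logits, target_index):
    """Cross-entropy for one sample, computed from raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def squared_error(pred, target):
    """Squared error for one scalar prediction."""
    return (pred - target) ** 2

def combined_loss(pred, target, w_crop=1.0, w_health=1.0, w_yield=0.5):
    """Weighted sum of the three task losses (weights are placeholders)."""
    return (
        w_crop * cross_entropy(pred["crop_type"], target["crop_type"])
        + w_health * cross_entropy(pred["health_status"], target["health_status"])
        + w_yield * squared_error(pred["yield"], target["yield"])
    )

# Uninformative logits: 5 crop classes, 3 health classes, yield off by 1.0
pred = {"crop_type": [0.0] * 5, "health_status": [0.0] * 3, "yield": 2.0}
target = {"crop_type": 1, "health_status": 0, "yield": 3.0}
loss = combined_loss(pred, target)
print(f"combined loss: {loss:.4f}")
```

Because the regression term is in the units of the yield target, it can dominate the classification terms unless the target is normalized or its weight is reduced.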

In [None]:
class MultiTaskAgricultureCNN(nn.Module):
    """CNN for multiple agricultural tasks"""
    
    def __init__(self):
        super().__init__()
        
        # Shared backbone
        self.backbone = nn.Sequential(
            # TODO: Shared convolutional layers
        )
        
        # Task-specific heads
        self.crop_classifier = nn.Sequential(
            # TODO: Classify crop type (5 classes)
        )
        
        self.health_classifier = nn.Sequential(
            # TODO: Classify health status (3 classes)
        )
        
        self.yield_regressor = nn.Sequential(
            # TODO: Predict yield (continuous value)
        )
        
    def forward(self, x):
        features = self.backbone(x)
        
        crop_pred = self.crop_classifier(features)
        health_pred = self.health_classifier(features)
        yield_pred = self.yield_regressor(features)
        
        return {
            'crop_type': crop_pred,
            'health_status': health_pred,
            'yield': yield_pred
        }
    
    def multi_task_loss(self, predictions, targets):
        """Compute combined loss for all tasks"""
        # TODO: Weighted combination of:
        # 1. Cross-entropy for crop classification
        # 2. Cross-entropy for health classification
        # 3. MSE for yield regression
        pass

print("Testing Multi-Task CNN:")
# TODO: Implement and test

## 📊 Assessment Questions

Answer these questions in markdown cells:

### Q1: What's the key difference between a dense layer and a convolutional layer? When would you use each?

### Q2: How does padding affect convolution output size and feature learning?

### Q3: Why do CNNs use small kernel sizes (3x3) instead of large ones?

### Q4: What's the purpose of pooling layers in CNNs? What are the trade-offs between max pooling and average pooling?

### Q5: How does BatchNorm help with CNN training? Why does it behave differently during training vs inference?

### Q6: Explain the concept of "receptive field" in CNNs. How does it change through the network?

### Q7: What's the difference between Conv1D, Conv2D, and Conv3D? Give real-world examples for each.

### 🆕 Q8: Design a CNN architecture for classifying Sudanese traditional clothing. What considerations would you make?

### 🆕 Q9: How would you optimize a CNN for deployment on mobile phones in rural Sudan?

### 🆕 Q10: Create a debugging checklist for when your CNN isn't learning (low accuracy).




**You're ready for Lecture 4: Transfer Learning & Advanced Architectures!** 🎉

## 💡 Tips for Success

1. **Start Simple**: Begin with manual convolution, then use PyTorch layers
2. **Visualize Everything**: Use the visualization tools to understand what's happening
3. **Test Shapes**: Always print tensor shapes between layers
4. **Consider Sudanese Context**: Think about real applications in Sudan
5. **Benchmark**: Compare different architectures and techniques
6. **🆕 Debug Systematically**: When something doesn't work, check shapes, devices, gradients
7. **🆕 Think About Deployment**: Consider computational constraints

## 🤝 Need Help?

- Review Lecture 3 notebook for concepts
- Use PyTorch documentation for specific APIs
- Test with small examples first
- Visualize intermediate results
- 🆕 Create minimal reproducible examples when debugging
- 🆕 Benchmark different approaches to find optimal solutions

### Very Important Note:
# Go to Chapter 11 of Hands-On Machine Learning with Scikit-Learn and PyTorch by Aurélien Géron, solve the exercises at the end of the chapter, and add your solutions to this notebook as well.