<a href="https://colab.research.google.com/github/your-username/pytorch-for-deeplearning/blob/main/notebooks/04_convolutional_neural_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 4: Convolutional Neural Networks

This notebook explores convolutional neural networks (CNNs), the foundation of modern computer vision.

## Learning Objectives
- Understand convolutional layers and their parameters
- Learn about pooling operations
- Build complete CNN architectures
- Train a CNN on image classification
- Visualize learned features

## Setup and Installation

In [None]:
# Install and import necessary libraries
try:
    import torch
    print(f"PyTorch version: {torch.__version__}")
except ImportError:
    !pip install torch torchvision torchaudio
    import torch
    print(f"PyTorch installed. Version: {torch.__version__}")

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

import torchvision
import torchvision.transforms as transforms

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, classification_report

# Set device and random seed
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
torch.manual_seed(42)
np.random.seed(42)

# Set matplotlib style
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)

## 1. Convolution Operation Basics

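How convolution works at the lowest level: a small kernel slides over the input and, at each position, outputs the sum of elementwise products. Note that `F.conv2d` computes cross-correlation (the kernel is not flipped), which is the operation deep learning libraries call convolution.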
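
To make the sliding-window arithmetic concrete before handing it to PyTorch, here is a minimal dependency-free sketch. The helper name `cross_correlate2d` is hypothetical (it is not a PyTorch function); it assumes no padding and a stride of 1, and uses the same 5x5 input and two of the kernels from the cell below.

```python
def cross_correlate2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of elementwise products between the kernel and the window at (i, j)
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [1, 2, 1, 0, 1],
]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
edge_vertical = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

identity_out = cross_correlate2d(image, identity)
vertical_out = cross_correlate2d(image, edge_vertical)

print(identity_out)   # the central 3x3 patch of the input
print(vertical_out)   # right column minus left column, summed over each window
```

The identity kernel simply copies the centre of each window, which is why its output is the inner 3x3 patch of the input; the values should match what `F.conv2d` prints for the same kernels.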

In [None]:
print("=== Basic Convolution Operation ===")

# Create a simple 5x5 input
input_2d = torch.tensor([
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [1, 2, 1, 0, 1]
], dtype=torch.float32)

print(f"Input (5x5):")
print(input_2d)

# Add batch and channel dimensions: (batch_size, channels, height, width)
input_4d = input_2d.unsqueeze(0).unsqueeze(0)  # Shape: (1, 1, 5, 5)
print(f"\nInput shape: {input_4d.shape}")

# Create different kernels
kernels = {
    'identity': torch.tensor([[0, 0, 0], [0, 1, 0], [0, 0, 0]]),
    'edge_horizontal': torch.tensor([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]),
    'edge_vertical': torch.tensor([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),
    'sharpen': torch.tensor([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
}

# Apply convolutions
results = {}
for name, kernel in kernels.items():
    # Add dimensions to kernel: (out_channels, in_channels, height, width)
    kernel_4d = kernel.unsqueeze(0).unsqueeze(0).float()
    
    # Apply convolution
    output = F.conv2d(input_4d, kernel_4d, padding=0)
    results[name] = output.squeeze()  # Remove batch and channel dims
    
    print(f"\n{name.title()} kernel:")
    print(kernel)
    print(f"Output (3x3):")
    print(results[name])

# Visualize
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# Original input
im0 = axes[0].imshow(input_2d.numpy(), cmap='viridis')
axes[0].set_title('Original Input', fontsize=14, fontweight='bold')
plt.colorbar(im0, ax=axes[0])

# Kernel outputs
for i, (name, output) in enumerate(results.items(), 1):
    im = axes[i].imshow(output.numpy(), cmap='viridis')
    axes[i].set_title(f'{name.title()} Filter', fontsize=14, fontweight='bold')
    plt.colorbar(im, ax=axes[i])

# Remove empty subplot
axes[5].remove()

plt.tight_layout()
plt.show()

## 2. Convolutional Layers

Building and understanding Conv2d layers.
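
The output size and parameter count of a `Conv2d` layer follow from simple arithmetic. The sketch below works through both for the four configurations used in this section. The helper names `conv_output_size` and `conv_param_count` are hypothetical, and the formulas assume square kernels, a dilation of 1, and a bias term.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size along one dimension (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Each output channel owns one (in_channels x k x k) kernel, plus one bias."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# (in_channels, out_channels, kernel_size, stride, padding), as in conv_configs
configs = [
    (3, 16, 3, 1, 1),
    (3, 16, 5, 1, 2),
    (3, 16, 3, 2, 1),
    (3, 32, 3, 1, 0),
]

summary = []
for cin, cout, k, s, p in configs:
    out_size = conv_output_size(32, k, stride=s, padding=p)
    params = conv_param_count(cin, cout, k)
    summary.append((out_size, params))
    print(f"k={k} s={s} p={p}: {out_size}x{out_size} output, {params:,} parameters")
```

Note that the parameter count depends only on the channel counts and kernel size, not on the input resolution, stride, or padding.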

In [None]:
print("=== Convolutional Layers ===")

# Create different convolutional layers
conv_configs = [
    {'in_channels': 3, 'out_channels': 16, 'kernel_size': 3, 'stride': 1, 'padding': 1},
    {'in_channels': 3, 'out_channels': 16, 'kernel_size': 5, 'stride': 1, 'padding': 2},
    {'in_channels': 3, 'out_channels': 16, 'kernel_size': 3, 'stride': 2, 'padding': 1},
    {'in_channels': 3, 'out_channels': 32, 'kernel_size': 3, 'stride': 1, 'padding': 0},
]

# Create sample input (batch_size=4, channels=3, height=32, width=32)
sample_input = torch.randn(4, 3, 32, 32)
print(f"Input shape: {sample_input.shape}")

for i, config in enumerate(conv_configs):
    conv = nn.Conv2d(**config)
    output = conv(sample_input)
    
    print(f"\nConv {i+1} - {config}")
    print(f"  Parameter count: {sum(p.numel() for p in conv.parameters()):,}")
    print(f"  Weight shape: {conv.weight.shape}")
    print(f"  Bias shape: {conv.bias.shape if conv.bias is not None else 'None'}")
    print(f"  Output shape: {output.shape}")
    
    # Calculate expected output size
    h_out = (32 + 2*config['padding'] - config['kernel_size']) // config['stride'] + 1
    w_out = (32 + 2*config['padding'] - config['kernel_size']) // config['stride'] + 1
    print(f"  Expected spatial: {h_out}×{w_out}")

## 3. Pooling Operations

Understanding max pooling, average pooling, and adaptive pooling.
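
As a quick illustration of what these layers compute, the following sketch pools a 4x4 grid by hand. The `pool2d` helper is hypothetical and only covers the non-overlapping case (stride equal to window size); the PyTorch layers in the cell below handle general strides and padding.

```python
def pool2d(grid, size, reduce_fn):
    """Non-overlapping pooling (stride == size) over a 2D nested list."""
    pooled = []
    for i in range(0, len(grid) - size + 1, size):
        row = []
        for j in range(0, len(grid[0]) - size + 1, size):
            # Collect the size x size window whose top-left corner is (i, j)
            window = [grid[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

max_pooled = pool2d(grid, 2, max)    # keeps the largest value in each 2x2 window
avg_pooled = pool2d(grid, 2, mean)   # keeps the average of each 2x2 window
global_avg = mean([v for row in grid for v in row])  # what a 1x1 adaptive average pool returns

print(max_pooled)
print(avg_pooled)
print(global_avg)
```

Global average pooling is the special case where the window covers the whole feature map, collapsing each channel to a single number.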

In [None]:
print("=== Pooling Operations ===")

# Create sample feature map
feature_map = torch.randn(1, 64, 16, 16)  # (batch, channels, height, width)
print(f"Input feature map shape: {feature_map.shape}")

# Different pooling operations
pooling_ops = {
    'MaxPool2d(2, 2)': nn.MaxPool2d(kernel_size=2, stride=2),
    'MaxPool2d(3, 2, padding=1)': nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    'AvgPool2d(2, 2)': nn.AvgPool2d(kernel_size=2, stride=2),
    'AdaptiveMaxPool2d(8, 8)': nn.AdaptiveMaxPool2d((8, 8)),
    'AdaptiveAvgPool2d(1, 1)': nn.AdaptiveAvgPool2d((1, 1)),  # Global pooling
}

for name, pool_op in pooling_ops.items():
    output = pool_op(feature_map)
    reduction = feature_map.numel() / output.numel()
    print(f"\n{name}:")
    print(f"  Output shape: {output.shape}")
    print(f"  Size reduction: {reduction:.1f}x")

# Demonstrate pooling on actual values
print("\n=== Pooling Example on 4x4 input ===")
small_input = torch.tensor([
    [[[ 1,  2,  3,  4],
      [ 5,  6,  7,  8],
      [ 9, 10, 11, 12],
      [13, 14, 15, 16]]]
], dtype=torch.float32)

print(f"Input (1x1x4x4):")
print(small_input.squeeze().int())

# Max pooling 2x2
max_pool = nn.MaxPool2d(2, 2)
max_output = max_pool(small_input)
print(f"\nMax pool 2x2 output (1x1x2x2):")
print(max_output.squeeze().int())

# Average pooling 2x2
avg_pool = nn.AvgPool2d(2, 2)
avg_output = avg_pool(small_input)
print(f"\nAvg pool 2x2 output (1x1x2x2):")
print(avg_output.squeeze())

## 4. CNN Architecture Design

Building complete CNN models from scratch.
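
The `SimpleCNN` defined in this section stacks three 3x3 conv blocks (3 to 32 to 64 to 128 channels, each followed by batch normalization) and a 128 to 64 to 10 classifier. As a sanity check on the parameter count the notebook prints, the sketch below tallies the learnable parameters by hand. The helper names are hypothetical; batch norm is counted as one scale and one shift per channel, since its running statistics are buffers rather than parameters.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def batchnorm_params(channels):
    """Learnable scale (weight) and shift (bias) per channel."""
    return 2 * channels


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


channels = [3, 32, 64, 128]
conv_total = sum(
    conv_params(cin, cout, 3) + batchnorm_params(cout)
    for cin, cout in zip(channels, channels[1:])
)
classifier_total = linear_params(128, 64) + linear_params(64, 10)
total = conv_total + classifier_total

print(f"Conv blocks: {conv_total:,}")
print(f"Classifier:  {classifier_total:,}")
print(f"Total:       {total:,}")
```

Pooling, ReLU, dropout, and flatten layers contribute no parameters, so they do not appear in the tally.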

In [None]:
class SimpleCNN(nn.Module):
    """A simple CNN for CIFAR-10 classification"""
    
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        
        # Convolutional layers
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # 32x32 -> 16x16
        )
        
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # 16x16 -> 8x8
        )
        
        self.conv_block3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2)  # 8x8 -> 4x4
        )
        
        # Classifier
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((1, 1)),  # Global average pooling
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(128, 64),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes)
        )
    
    def forward(self, x):
        # Feature extraction
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = self.conv_block3(x)
        
        # Classification
        x = self.classifier(x)
        return x

# Create and analyze the model
model = SimpleCNN(num_classes=10).to(device)
print("=== Simple CNN Architecture ===")
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# Test forward pass (eval mode: dropout is off and BatchNorm statistics are not updated)
model.eval()
sample_input = torch.randn(4, 3, 32, 32).to(device)  # CIFAR-10 input size
with torch.no_grad():
    output = model(sample_input)
    
print(f"\nSample input shape: {sample_input.shape}")
print(f"Model output shape: {output.shape}")
print(f"Output (logits): {output[0]}")

## 5. Visualizing Intermediate Features

Understanding what CNNs learn at different layers.

In [None]:
# Create a hook to capture intermediate activations
class FeatureExtractor:
    def __init__(self, model, layer_name):
        self.features = None
        self.hook = self._get_layer(model, layer_name).register_forward_hook(self.hook_fn)
    
    def _get_layer(self, model, layer_name):
        """Get layer by name"""
        for name, layer in model.named_modules():
            if name == layer_name:
                return layer
        raise ValueError(f"Layer {layer_name} not found")
    
    def hook_fn(self, module, input, output):
        self.features = output.detach()
    
    def close(self):
        self.hook.remove()

# Create sample data (a simple synthetic image)
def create_sample_image():
    """Create a simple synthetic image with patterns"""
    img = torch.zeros(1, 3, 32, 32)
    
    # Add some patterns
    # Vertical lines in red channel
    img[0, 0, :, [5, 15, 25]] = 1.0
    
    # Horizontal lines in green channel
    img[0, 1, [8, 16, 24], :] = 1.0
    
    # Diagonal pattern in blue channel
    for i in range(32):
        img[0, 2, i, i] = 1.0
        if i < 31:
            img[0, 2, i, i+1] = 0.5
            img[0, 2, i+1, i] = 0.5
    
    return img

# Create sample input
sample_img = create_sample_image().to(device)

print("=== Feature Visualization ===")
print(f"Sample image shape: {sample_img.shape}")

# Extract features from different layers
layer_names = ['conv_block1.0', 'conv_block2.0', 'conv_block3.0']  # Conv layers
extractors = []
features = {}

try:
    # Set up feature extractors
    for layer_name in layer_names:
        extractor = FeatureExtractor(model, layer_name)
        extractors.append(extractor)
    
    # Forward pass to extract features
    model.eval()
    with torch.no_grad():
        _ = model(sample_img)
    
    # Collect features
    for i, extractor in enumerate(extractors):
        features[layer_names[i]] = extractor.features
        print(f"Features from {layer_names[i]}: {extractor.features.shape}")
    
    # Visualize original image and features
    fig, axes = plt.subplots(2, 4, figsize=(16, 8))
    
    # Original image (convert to displayable format)
    img_display = sample_img[0].permute(1, 2, 0).cpu().numpy()
    axes[0, 0].imshow(img_display)
    axes[0, 0].set_title('Original Image', fontweight='bold')
    axes[0, 0].axis('off')
    
    # Show first few feature maps from each layer
    for i, (layer_name, feat) in enumerate(features.items()):
        if i < 3:  # Show first 3 layers
            # Show first feature map
            feature_map = feat[0, 0].cpu().numpy()  # First batch, first channel
            axes[0, i+1].imshow(feature_map, cmap='viridis')
            axes[0, i+1].set_title(f'{layer_name}\n(1st feature map)', fontweight='bold')
            axes[0, i+1].axis('off')
            
            # Show another feature map if available
            if feat.shape[1] > 1:
                feature_map2 = feat[0, 1].cpu().numpy()  # First batch, second channel
                axes[1, i+1].imshow(feature_map2, cmap='viridis')
                axes[1, i+1].set_title(f'{layer_name}\n(2nd feature map)', fontweight='bold')
                axes[1, i+1].axis('off')
    
    # Remove empty subplots
    axes[1, 0].remove()
    
    plt.tight_layout()
    plt.show()

finally:
    # Clean up hooks
    for extractor in extractors:
        extractor.close()

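print("\nObservations:")
print("- If the model has not been trained yet, these maps come from randomly initialized filters")
print("- In a trained CNN, early layers tend to detect simple features (edges, patterns)")
print("- Deeper layers combine simpler features into more complex patterns")
print("- Feature maps become smaller but more numerous with depth")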

## 6. Training a CNN

A complete training example on the CIFAR-10 dataset, with a synthetic-data fallback if the download fails.
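
The training function in this section pairs Adam (initial learning rate 0.001) with `StepLR(step_size=3, gamma=0.1)`. The sketch below reproduces that schedule with plain arithmetic, assuming `scheduler.step()` is called once at the end of every epoch. The `step_lr` helper is hypothetical and only mirrors the closed-form rule `base_lr * gamma ** (epoch // step_size)`.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate in effect during a 0-indexed epoch under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate used during each of the first six epochs
schedule = [step_lr(0.001, epoch, step_size=3, gamma=0.1) for epoch in range(6)]

for epoch, lr in enumerate(schedule, start=1):
    print(f"Epoch {epoch}: lr = {lr:.6f}")
```

Under these settings the first three epochs all train at the initial rate, so a three-epoch run never trains at the decayed rate; the first drop takes effect from the fourth epoch onward.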

In [None]:
# Download CIFAR-10 dataset
print("=== Downloading CIFAR-10 Dataset ===")

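# Data transforms
# Note: the Normalize values below are the widely used ImageNet channel statistics,
# not statistics computed from CIFAR-10; they serve as a reasonable default here.
transform_train = transforms.Compose([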
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

# Download datasets
try:
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                          download=True, transform=transform_train)
    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                         download=True, transform=transform_test)
    
    # Create data loaders
    trainloader = DataLoader(trainset, batch_size=32, shuffle=True, num_workers=2)
    testloader = DataLoader(testset, batch_size=32, shuffle=False, num_workers=2)
    
    print(f"Training samples: {len(trainset)}")
    print(f"Test samples: {len(testset)}")
    print(f"Number of classes: {len(trainset.classes)}")
    print(f"Classes: {trainset.classes}")
    
    # Show sample images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)
    
    # Denormalize for display
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    images_denorm = images * std + mean
    images_denorm = torch.clamp(images_denorm, 0, 1)
    
    # Plot sample images
    fig, axes = plt.subplots(2, 8, figsize=(16, 4))
    for i in range(16):
        row, col = i // 8, i % 8
        img = images_denorm[i].permute(1, 2, 0)
        axes[row, col].imshow(img)
        axes[row, col].set_title(f'{trainset.classes[labels[i]]}')
        axes[row, col].axis('off')
    
    plt.tight_layout()
    plt.show()

except Exception as e:
    print(f"Could not download CIFAR-10: {e}")
    print("Creating synthetic data for demonstration...")
    
    # Create synthetic data if CIFAR-10 download fails
    train_data = torch.randn(1000, 3, 32, 32)
    train_labels = torch.randint(0, 10, (1000,))
    test_data = torch.randn(200, 3, 32, 32)
    test_labels = torch.randint(0, 10, (200,))
    
    trainloader = DataLoader(TensorDataset(train_data, train_labels), 
                           batch_size=32, shuffle=True)
    testloader = DataLoader(TensorDataset(test_data, test_labels), 
                          batch_size=32, shuffle=False)
    
    print("Using synthetic data for demonstration")

In [None]:
# Training function
def train_model(model, trainloader, testloader, num_epochs=5):
    """Train the CNN model"""
    
    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    
    # Training history (per-epoch accuracies)
    train_accs = []
    test_accs = []
    
    print(f"Starting training for {num_epochs} epochs...")
    print(f"Device: {device}")
    
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        running_loss = 0.0
        correct_train = 0
        total_train = 0
        
        for i, (inputs, labels) in enumerate(trainloader):
            inputs, labels = inputs.to(device), labels.to(device)
            
            # Zero gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            
            # Backward pass
            loss.backward()
            optimizer.step()
            
            # Statistics
            running_loss += loss.item()
            predicted = outputs.argmax(dim=1)  # Class with the highest logit
            total_train += labels.size(0)
            correct_train += (predicted == labels).sum().item()
            
            # Print progress
            if i % 200 == 199:
                print(f'  Batch [{i+1:4d}] Loss: {running_loss/200:.4f}')
                running_loss = 0.0
        
        # Calculate training accuracy
        train_acc = 100 * correct_train / total_train
        train_accs.append(train_acc)
        
        # Evaluation phase (the test set doubles as a validation set in this example)
        model.eval()
        correct_test = 0
        total_test = 0
        
        with torch.no_grad():
            for inputs, labels in testloader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                _, predicted = torch.max(outputs, 1)
                total_test += labels.size(0)
                correct_test += (predicted == labels).sum().item()
        
        test_acc = 100 * correct_test / total_test
        test_accs.append(test_acc)
        
        # Record the learning rate used this epoch, then update the schedule
        current_lr = optimizer.param_groups[0]['lr']
        scheduler.step()
        
        print(f'Epoch [{epoch+1}/{num_epochs}] '
              f'Train Acc: {train_acc:.2f}% '
              f'Test Acc: {test_acc:.2f}% '
              f'LR: {current_lr:.6f}')
    
    return train_accs, test_accs

# Create and train model
model = SimpleCNN(num_classes=10).to(device)
train_accs, test_accs = train_model(model, trainloader, testloader, num_epochs=3)

print("\n=== Training Complete ===")
print(f"Final train accuracy: {train_accs[-1]:.2f}%")
print(f"Final test accuracy: {test_accs[-1]:.2f}%")

## 7. Practice Exercises

In [None]:
# Exercise 1: Design a deeper CNN
print("Exercise 1: Deeper CNN Architecture")

class DeeperCNN(nn.Module):
    """A deeper VGG-style CNN: plain stacked conv blocks with strided downsampling (no skip connections)"""
    
    def __init__(self, num_classes=10):
        super(DeeperCNN, self).__init__()
        
        # Initial conv
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )
        
        # Conv blocks with increasing channels
        self.conv2 = self._make_layer(64, 64, 2)
        self.conv3 = self._make_layer(64, 128, 2, stride=2)  # Downsample
        self.conv4 = self._make_layer(128, 256, 2, stride=2)  # Downsample
        self.conv5 = self._make_layer(256, 512, 2, stride=2)  # Downsample
        
        # Global average pooling + classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)
        self.dropout = nn.Dropout(0.5)
        
    def _make_layer(self, in_channels, out_channels, num_blocks, stride=1):
        layers = []
        # First block (might downsample)
        layers.append(nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1))
        layers.append(nn.BatchNorm2d(out_channels))
        layers.append(nn.ReLU(inplace=True))
        
        # Remaining blocks
        for _ in range(1, num_blocks):
            layers.append(nn.Conv2d(out_channels, out_channels, 3, padding=1))
            layers.append(nn.BatchNorm2d(out_channels))
            layers.append(nn.ReLU(inplace=True))
        
        return nn.Sequential(*layers)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
        x = self.fc(x)
        
        return x

# Analyze the deeper model
deeper_model = DeeperCNN().to(device)
print(f"Deeper CNN parameters: {sum(p.numel() for p in deeper_model.parameters()):,}")

# Test forward pass (eval mode: dropout disabled, BatchNorm uses running statistics)
deeper_model.eval()
with torch.no_grad():
    test_input = torch.randn(2, 3, 32, 32).to(device)
    output = deeper_model(test_input)
    print(f"Input shape: {test_input.shape}")
    print(f"Output shape: {output.shape}")

print("\nComparison:")
simple_params = sum(p.numel() for p in model.parameters())
deeper_params = sum(p.numel() for p in deeper_model.parameters())
print(f"Simple CNN:  {simple_params:,} parameters")
print(f"Deeper CNN:  {deeper_params:,} parameters")
print(f"Ratio: {deeper_params/simple_params:.1f}x more parameters")

In [None]:
# Exercise 2: Kernel visualization
print("Exercise 2: Learned Kernel Visualization")

def visualize_kernels(model, layer_name, num_kernels=8):
    """Visualize learned kernels from a conv layer"""
    
    # Get the layer
    layer = None
    for name, module in model.named_modules():
        if name == layer_name and isinstance(module, nn.Conv2d):
            layer = module
            break
    
    if layer is None:
        print(f"Layer {layer_name} not found or not a Conv2d layer")
        return
    
    # Get kernels
    kernels = layer.weight.data.cpu()  # Shape: (out_channels, in_channels, H, W)
    print(f"Kernel shape: {kernels.shape}")
    
    # Normalize kernels for visualization
    kernels_norm = (kernels - kernels.min()) / (kernels.max() - kernels.min())
    
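    # Plot first few kernels on a grid sized to fit them (4 per row)
    num_show = min(num_kernels, kernels.shape[0])
    num_rows = max(1, (num_show + 3) // 4)
    fig, axes = plt.subplots(num_rows, 4, figsize=(12, 3 * num_rows))
    axes = axes.flatten()
    for ax in axes:
        ax.axis('off')  # Hide every axis up front, including unused ones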
    
    for i in range(num_show):
        kernel = kernels_norm[i]
        
        if kernel.shape[0] == 3:  # RGB channels
            # Show as RGB image
            kernel_rgb = kernel.permute(1, 2, 0)
            axes[i].imshow(kernel_rgb)
        else:
            # Show first channel as grayscale
            axes[i].imshow(kernel[0], cmap='gray')
        
        axes[i].set_title(f'Kernel {i+1}')
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

# Visualize kernels from the first layer
print("First layer kernels (after enough training these often resemble edge and color detectors):")
try:
    visualize_kernels(model, 'conv_block1.0', num_kernels=8)
except Exception as e:
    print(f"Could not visualize kernels: {e}")
    print("Check that the layer name exists in the model and that plotting works in this environment")

# Show kernel statistics
print("\nKernel Statistics:")
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        weights = module.weight.data
        print(f"{name:15} - Shape: {tuple(weights.shape)}, "
              f"Mean: {weights.mean().item():.4f}, "
              f"Std: {weights.std().item():.4f}")

## Summary

In this notebook, we covered:

1. **Convolution Basics**: Understanding how convolution operations work
2. **Conv2d Layers**: Parameters, input/output shapes, and configurations
3. **Pooling Operations**: Max pooling, average pooling, and adaptive pooling
4. **CNN Architecture**: Building complete convolutional neural networks
5. **Feature Visualization**: Understanding what CNNs learn at different layers
6. **Training Example**: Complete training pipeline on CIFAR-10
7. **Practice Exercises**: A deeper architecture and kernel visualization

### Key Concepts
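- **Translation Equivariance and Invariance**: Convolution responds to a feature wherever it appears (equivariance); pooling makes the result approximately insensitive to small shifts (invariance)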
- **Parameter Sharing**: Same kernel applied across spatial dimensions
- **Hierarchical Features**: Simple to complex feature detection
- **Spatial Reduction**: Pooling reduces spatial dimensions while keeping the strongest (max) or average response in each window
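
The first two concepts can be checked with a tiny one-dimensional sketch: because the same kernel is applied at every position, shifting the input shifts the response by the same amount, and a global max over the response no longer depends on where the feature sat. The `cross_correlate1d` helper and the signals are illustrative assumptions, not part of the notebook's code.

```python
def cross_correlate1d(signal, kernel):
    """Valid 1D cross-correlation: the same kernel is reused at every position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


kernel = [-1, 1]                       # responds positively to a rising edge
signal = [0, 0, 1, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 1, 0, 0]     # same pattern, moved two steps to the right

response = cross_correlate1d(signal, kernel)
shifted_response = cross_correlate1d(shifted, kernel)

print(response)
print(shifted_response)

# Shifting the input shifts the response by the same two steps
print(shifted_response[2:] == response[:-2])

# A global max pool gives the same answer for both positions
print(max(response) == max(shifted_response))
```

In a real CNN the invariance is only approximate, since local pooling windows and image borders break exact shift symmetry.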

### CNN Design Principles
1. **Early Layers**: Detect low-level features (edges, textures)
2. **Middle Layers**: Combine features into patterns and shapes
3. **Deep Layers**: High-level semantic features
4. **Pooling**: Gradually reduce spatial dimensions
5. **Channels**: Increase feature maps as you go deeper
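
Principles 4 and 5 can be traced numerically for the `SimpleCNN` from this notebook, where each block applies a 3x3 convolution with padding 1 followed by a 2x2 max pool with stride 2. The sketch below is a hand calculation under those assumptions; the `output_size` helper is hypothetical.

```python
def output_size(size, kernel_size, stride=1, padding=0):
    """Output size along one dimension for a conv or pooling window (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


shape = (3, 32, 32)  # (channels, height, width) of a CIFAR-10 image
trace = [shape]

for out_channels in (32, 64, 128):
    _, h, w = shape
    # 3x3 conv with padding 1 keeps the spatial size
    h, w = output_size(h, 3, padding=1), output_size(w, 3, padding=1)
    # 2x2 max pool with stride 2 halves it
    h, w = output_size(h, 2, stride=2), output_size(w, 2, stride=2)
    shape = (out_channels, h, w)
    trace.append(shape)

for channels, h, w in trace:
    print(f"{channels:3d} channels x {h:2d} x {w:2d} = {channels * h * w:,} activations")
```

Each block doubles the channels while quartering the spatial area, so the number of activations per image halves from block to block after the first.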

### Best Practices
- Use batch normalization for stable training
- Apply data augmentation to reduce overfitting
- Use appropriate pooling strategies
- Consider residual connections for very deep networks
- Monitor both training and validation accuracy

### Next Steps
- Experiment with different CNN architectures (ResNet, DenseNet, etc.)
- Try transfer learning with pre-trained models
- Explore advanced techniques like attention mechanisms
- Apply CNNs to different domains (medical imaging, satellite imagery, etc.)
- Move on to the next notebook: Advanced Neural Network Architectures