# Lab 3: Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are the backbone of modern computer vision. In this lab, you'll learn how CNNs process images, build classic architectures, and apply transfer learning.

## Learning Objectives

By the end of this lab, you will:
- Understand convolution operations and feature maps
- Build CNNs from scratch
- Implement famous architectures (LeNet, VGG, ResNet)
- Apply transfer learning with pre-trained models
- Use data augmentation for better generalization
- Visualize CNN features and activations
- Build custom image classifiers

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import torch
import torch.nn as nn
import torch.nn.functional as F

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
np.random.seed(42)
tf.random.set_seed(42)

## Part 1: Understanding Convolutions

A **convolution** slides a filter (kernel) over an image to produce a feature map.

### Operation:
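For a 2D convolution with a $k_h \times k_w$ filter $W$, bias $b$, and input $X$ (stride 1, no padding):
$$Y[i,j] = \sum_{m=0}^{k_h-1} \sum_{n=0}^{k_w-1} X[i+m,\, j+n] \cdot W[m,n] + b$$
Strictly speaking, this is *cross-correlation*, because the kernel is not flipped. Deep-learning libraries still call it convolution. The next cell uses `scipy.signal.convolve2d`, which computes true convolution and so flips the kernel. As a result, the asymmetric edge filters there respond with the opposite sign. The symmetric blur and sharpen kernels are unaffected.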

### Key Parameters:
- **Kernel size**: Size of the filter (e.g., 3×3, 5×5)
- **Stride**: Step size when sliding the filter
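- **Padding**: Adding a border (usually zeros) around the input. `'same'` padding preserves spatial dimensions at stride 1, while `'valid'` adds no border and shrinks the output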
- **Channels**: Number of filters (output feature maps)
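The formula and the parameters above can be made concrete with a minimal NumPy sketch. It handles a single channel with zero padding only, and `conv2d_naive` is a hypothetical helper, not a library function. Along each axis, the output size is $\lfloor (n + 2p - k)/s \rfloor + 1$ for input size $n$, kernel size $k$, padding $p$ and stride $s$.

```python
import numpy as np

def conv2d_naive(X, W, b=0.0, stride=1, padding=0):
    """Single-channel CNN-style convolution (cross-correlation: kernel not flipped)."""
    if padding > 0:
        X = np.pad(X, padding, mode='constant')  # zero border on all four sides
    kh, kw = W.shape
    # Output size: floor((n + 2p - k) / s) + 1; padding is already included in X.shape
    out_h = (X.shape[0] - kh) // stride + 1
    out_w = (X.shape[1] - kw) // stride + 1
    Y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = X[i * stride:i * stride + kh, j * stride:j * stride + kw]
            Y[i, j] = np.sum(patch * W) + b  # sum_m sum_n X[i+m, j+n] * W[m, n] + b
    return Y

X = np.arange(25, dtype=float).reshape(5, 5)  # X[r, c] = 5r + c
W = np.array([[1.0, 0.0],
              [0.0, -1.0]])                    # computes X[i, j] - X[i+1, j+1]

print(conv2d_naive(X, W).shape)               # (4, 4)
print(conv2d_naive(X, W, stride=2).shape)     # (2, 2)
print(conv2d_naive(X, W, padding=1).shape)    # (6, 6)
print(conv2d_naive(X, W)[0, 0])               # 0 - 6 = -6.0
```

Because `X` increases by 6 along every diagonal step, every unpadded output equals -6. Stride and padding change only how many positions the filter visits.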

In [None]:
# Visualize convolution operation
from scipy.signal import convolve2d

# Load sample image
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
sample_image = X_train[0].astype(float) / 255.0

# Define filters
filters = {
    'Vertical Edge': np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]),
    'Horizontal Edge': np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]]),
    'Blur': np.ones((3, 3)) / 9,
    'Sharpen': np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
}

# Apply filters
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].imshow(sample_image, cmap='gray')
axes[0, 0].set_title('Original Image')
axes[0, 0].axis('off')

for idx, (name, kernel) in enumerate(filters.items(), 1):
    # convolve2d flips the kernel (true convolution); CNN layers do not
    filtered = convolve2d(sample_image, kernel, mode='same', boundary='symm')
    ax = axes[idx // 3, idx % 3]
    ax.imshow(filtered, cmap='gray')
    ax.set_title(name)
    ax.axis('off')

axes[1, 2].axis('off')
plt.tight_layout()
plt.show()

print("Convolution filters detect different features:")
print("- Edge detectors find boundaries")
print("- Blur smooths the image")
print("- Sharpen enhances details")

## Part 2: CNN Architecture Components

### Typical CNN Architecture:
1. **Convolutional layers**: Extract features
2. **Activation (ReLU)**: Non-linearity
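3. **Pooling layers**: Downsample feature maps, which reduces computation and the parameter count of later dense layers (pooling itself has no learnable parameters)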
4. **Fully connected layers**: Classification

### Pooling Operations:
- **Max Pooling**: Takes maximum value in region
- **Average Pooling**: Takes average value
- **Global Average Pooling**: One value per feature map
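The three pooling operations can be sketched in NumPy on a single (H, W, C) feature map. `pool2d` and `global_avg_pool` are illustrative helpers, not library functions. The last lines show why global average pooling is lighter than `Flatten`. The example uses a Dense(64) head on a 3×3×64 map, which is the shape the simple MNIST model below reaches before `Flatten`.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode='max'):
    """Pool an (H, W, C) map with a size x size window; leftover rows/cols are dropped."""
    h, w, c = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w, c), dtype=float)
    reduce = np.max if mode == 'max' else np.mean
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size, :]
            out[i, j, :] = reduce(window, axis=(0, 1))  # one value per channel
    return out

def global_avg_pool(x):
    """Average each channel over all spatial positions: (H, W, C) -> (C,)."""
    return x.mean(axis=(0, 1))

fmap = np.arange(16, dtype=float).reshape(4, 4, 1)  # rows: 0-3, 4-7, 8-11, 12-15
max_pooled = pool2d(fmap, mode='max')   # [[5, 7], [13, 15]]
avg_pooled = pool2d(fmap, mode='avg')   # [[2.5, 4.5], [10.5, 12.5]]
gap = global_avg_pool(fmap)             # [7.5]

# Parameters of a Dense(64) head on a 3x3x64 feature map (weights + biases)
flatten_params = (3 * 3 * 64) * 64 + 64   # 36,928
gap_params = 64 * 64 + 64                 # 4,160
print(max_pooled[:, :, 0], avg_pooled[:, :, 0], gap, flatten_params, gap_params)
```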

In [None]:
# Simple CNN for MNIST
model_simple_cnn = keras.Sequential([
    # First conv block
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    
    # Second conv block
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Third conv block
    layers.Conv2D(64, (3, 3), activation='relu'),
    
    # Dense layers
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model_simple_cnn.summary()
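The shapes and the parameter total reported by `summary()` can be checked by hand. A conv layer has $(k \cdot k \cdot C_{in} + 1) \cdot C_{out}$ parameters: the weights plus one bias per filter. Pooling, `Flatten` and `Dropout` have none. The helper names below are illustrative, and the sketch assumes the Keras defaults used above (biases on, `'valid'` padding).

```python
def conv2d_params(k, c_in, c_out):
    """k x k kernel over c_in channels, plus one bias, for each of c_out filters."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return (n_in + 1) * n_out

def out_size(n, k, stride=1):
    """Spatial size after an unpadded conv or pooling window: floor((n - k) / s) + 1."""
    return (n - k) // stride + 1

size = 28
size = out_size(size, 3)      # Conv2D(32)    -> 26
size = out_size(size, 2, 2)   # MaxPooling2D  -> 13
size = out_size(size, 3)      # Conv2D(64)    -> 11
size = out_size(size, 2, 2)   # MaxPooling2D  -> 5
size = out_size(size, 3)      # Conv2D(64)    -> 3
flat = size * size * 64       # Flatten       -> 576

total = (conv2d_params(3, 1, 32)      # 320
         + conv2d_params(3, 32, 64)   # 18,496
         + conv2d_params(3, 64, 64)   # 36,928
         + dense_params(flat, 64)     # 36,928
         + dense_params(64, 10))      # 650
print(f"{total:,}")  # 93,322
```

Nearly 40% of the weights sit in the first Dense layer after `Flatten`, which is one reason global average pooling is often preferred.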

In [None]:
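# Prepare MNIST data for CNN: add a channel axis and scale to [0, 1]
# (float32 uses half the memory of the float64 that plain division produces)
X_train_cnn = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test_cnn = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0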

# Compile and train
model_simple_cnn.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

history_cnn = model_simple_cnn.fit(
    X_train_cnn, y_train,
    batch_size=128,
    epochs=10,
    validation_split=0.1,
    verbose=1
)

# Evaluate
test_loss, test_acc = model_simple_cnn.evaluate(X_test_cnn, y_test, verbose=0)
print(f"\nTest Accuracy: {test_acc:.4f}")

## Part 3: Famous CNN Architectures

### LeNet-5 (1998)
- One of the first successful CNNs
- Handwritten digit recognition
- 7 layers (not counting the input)

### AlexNet (2012)
- Won the ImageNet (ILSVRC) 2012 challenge by a wide margin
- Deeper network with ReLU, trained on GPUs
- Dropout for regularization

### VGG (2014)
- Very deep for its time (16 or 19 weight layers)
- Small 3×3 filters throughout
- Simple, uniform architecture
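Some arithmetic shows why small filters pay off. Two stacked 3×3 convolutions see a 5×5 input region, and three see 7×7. With $C$ input and output channels, they use $18C^2$ and $27C^2$ weights, compared with $25C^2$ and $49C^2$ for a single 5×5 or 7×7 layer (biases ignored). The stacks also add a nonlinearity between layers. The helper names below are hypothetical, and $C = 256$ is an arbitrary choice.

```python
def conv_weights(k, channels):
    """Weights of one k x k conv layer with `channels` inputs and outputs (biases ignored)."""
    return k * k * channels * channels

def stacked_receptive_field(k, n_layers):
    """Input region seen by n stacked stride-1 k x k convolutions."""
    return 1 + n_layers * (k - 1)

C = 256  # arbitrary channel count for the comparison
# receptive field, weights of the 3x3 stack, weights of the single large filter
print(stacked_receptive_field(3, 2), 2 * conv_weights(3, C), conv_weights(5, C))  # 5 1179648 1638400
print(stacked_receptive_field(3, 3), 3 * conv_weights(3, C), conv_weights(7, C))  # 7 1769472 3211264
```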

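### ResNet (2015)
- Residual connections: each block outputs $F(x) + x$
- Enables training of 100+ layer networks (152 layers on ImageNet)
- The identity shortcut gives gradients a direct path, which eases the optimization of very deep networks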

In [None]:
# LeNet-5 implementation
def create_lenet():
    model = keras.Sequential([
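        # 'same' padding reproduces LeNet-5's map sizes (28x28 -> 14x14 -> 10x10 -> 5x5),
        # since the original network took 32x32 inputs
        layers.Conv2D(6, kernel_size=(5, 5), activation='tanh', padding='same', input_shape=(28, 28, 1)),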
        layers.AveragePooling2D(pool_size=(2, 2)),
        layers.Conv2D(16, kernel_size=(5, 5), activation='tanh'),
        layers.AveragePooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(120, activation='tanh'),
        layers.Dense(84, activation='tanh'),
        layers.Dense(10, activation='softmax')
    ])
    return model

# VGG-like (simplified)
def create_vgg_style():
    model = keras.Sequential([
        # Block 1
        layers.Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(28, 28, 1)),
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        # Block 2
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        # Dense layers
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(10, activation='softmax')
    ])
    return model

# ResNet-style block
def resnet_block(x, filters, kernel_size=3, stride=1):
    # Main path
    fx = layers.Conv2D(filters, kernel_size, strides=stride, padding='same')(x)
    fx = layers.BatchNormalization()(fx)
    fx = layers.Activation('relu')(fx)
    fx = layers.Conv2D(filters, kernel_size, padding='same')(fx)
    fx = layers.BatchNormalization()(fx)
    
    # Shortcut
    if stride != 1 or x.shape[-1] != filters:
        x = layers.Conv2D(filters, 1, strides=stride, padding='same')(x)
        x = layers.BatchNormalization()(x)
    
    # Add and activate
    out = layers.Add()([fx, x])
    out = layers.Activation('relu')(out)
    return out

def create_resnet_style():
    inputs = keras.Input(shape=(28, 28, 1))
    
    x = layers.Conv2D(64, 3, padding='same')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    
    x = resnet_block(x, 64)
    x = resnet_block(x, 128, stride=2)
    x = resnet_block(x, 256, stride=2)
    
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(10, activation='softmax')(x)
    
    return keras.Model(inputs=inputs, outputs=outputs)

print("Created architectures: LeNet, VGG-style, and ResNet-style")
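A toy NumPy version shows why the shortcut in `resnet_block` helps. To keep it short, the sketch replaces convolutions with dense matrices and omits batch normalization. If the residual branch $F$ outputs zero, the block returns its input unchanged, so stacking extra blocks need not make the network worse. A plain block in the same state outputs zeros. The Jacobian of $x + F(x)$ is $I + \partial F/\partial x$, and the identity term is the direct gradient path that the skip connection provides.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def plain_block(x, W1, W2):
    """Two dense layers with ReLU and no shortcut: relu(W2 @ relu(W1 @ x))."""
    return relu(W2 @ relu(W1 @ x))

def residual_block(x, W1, W2):
    """The same layers plus an identity shortcut: relu(F(x) + x)."""
    fx = W2 @ relu(W1 @ x)
    return relu(fx + x)

rng = np.random.default_rng(0)
d = 8
x = rng.random(d)              # non-negative input, like a post-ReLU activation
W1 = rng.normal(size=(d, d))
W2 = np.zeros((d, d))          # residual branch switched off

print(np.allclose(residual_block(x, W1, W2), x))   # True: the block passes x through
print(np.allclose(plain_block(x, W1, W2), 0.0))    # True: the plain block outputs zeros
```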

In [None]:
# Compare architectures
architectures = {
    'LeNet': create_lenet(),
    'VGG-style': create_vgg_style(),
    'ResNet-style': create_resnet_style()
}

results = {}

for name, model in architectures.items():
    print(f"\nTraining {name}...")
    
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    history = model.fit(
        X_train_cnn[:10000], y_train[:10000],  # Subset for speed
        batch_size=128,
        epochs=5,
        validation_split=0.2,
        verbose=0
    )
    
    results[name] = history
    
    # Count parameters
    params = model.count_params()
    print(f"{name}: {params:,} parameters, Final val acc: {history.history['val_accuracy'][-1]:.4f}")

## Part 4: Data Augmentation

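**Data augmentation** applies random, label-preserving transformations to the training images, so the model sees a slightly different version of each image every epoch:
- Rotation
- Translation
- Scaling (zoom)
- Flipping (only when it preserves the label: a mirrored cat is still a cat, but a mirrored digit may not be a valid digit)
- Color jittering
- Cropping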

### Benefits:
- Better generalization
- Reduces overfitting
- Acts as regularization
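Two of these transforms, translation with zero fill and horizontal flipping, fit in a short NumPy sketch. The helpers `translate`, `random_translate` and `random_hflip` are hypothetical. The Keras preprocessing layers in the next cell handle batching, interpolation for rotation and zoom, and fill modes for you.

```python
import numpy as np

def translate(img, dy, dx):
    """Shift a 2D image by (dy, dx) pixels; vacated pixels are filled with 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted completely out of frame
    src_y, dst_y = slice(max(0, -dy), h - max(0, dy)), slice(max(0, dy), h - max(0, -dy))
    src_x, dst_x = slice(max(0, -dx), w - max(0, dx)), slice(max(0, dx), w - max(0, -dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def random_translate(img, max_shift, rng):
    """Shift by a random integer amount in [-max_shift, max_shift] along each axis."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return translate(img, int(dy), int(dx))

def random_hflip(img, rng, p=0.5):
    """Mirror left-right with probability p (use only when the label is unchanged)."""
    return img[:, ::-1] if rng.random() < p else img

img = np.zeros((5, 5))
img[1, 1] = 1.0
print(np.argwhere(translate(img, 2, 1)))  # [[3 2]]: moved down 2, right 1

rng = np.random.default_rng(0)
batch = [random_translate(img, 1, rng) for _ in range(4)]  # a new random variant each call
```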

In [None]:
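# Create data augmentation pipeline (random transforms, active only when training=True)
data_augmentation = keras.Sequential([
    layers.RandomRotation(0.1),          # factor is a fraction of 2*pi: up to about ±36 degrees
    layers.RandomTranslation(0.1, 0.1),  # shift up to ±10% of height and width
    layers.RandomZoom(0.1),              # zoom in or out by up to 10%
])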

# Visualize augmentations
sample = X_train_cnn[0:1]

fig, axes = plt.subplots(2, 5, figsize=(15, 6))

for i, ax in enumerate(axes.ravel()):
    if i == 0:
        augmented = sample
        title = 'Original'
    else:
        augmented = data_augmentation(sample, training=True)
        title = f'Augmented {i}'
    
    ax.imshow(augmented[0, :, :, 0], cmap='gray')
    ax.set_title(title)
    ax.axis('off')

plt.tight_layout()
plt.show()

In [None]:
# Model with augmentation (the augmentation layers are active only during fit)
model_with_aug = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    data_augmentation,
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model_with_aug.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

history_aug = model_with_aug.fit(
    X_train_cnn, y_train,
    batch_size=128,
    epochs=10,
    validation_split=0.1,
    verbose=1
)

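# Compare against the un-augmented model from Part 2 (same architecture and validation split)
print(f"Final val accuracy without augmentation: {history_cnn.history['val_accuracy'][-1]:.4f}")
print(f"Final val accuracy with augmentation:    {history_aug.history['val_accuracy'][-1]:.4f}")
print("Augmentation is expected to narrow the gap between training and validation accuracy (less overfitting).")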

## Part 5: Transfer Learning

**Transfer learning** uses pre-trained models as starting points:

### Strategies:
1. **Feature extraction**: Freeze pre-trained layers, train only new layers
2. **Fine-tuning**: Unfreeze some layers and train with small learning rate

### Benefits:
- Faster training
- Better performance with less data
- Leverage knowledge from ImageNet (1.4M images)
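The feature-extraction strategy can be illustrated without a real network. In this NumPy toy, a fixed random projection is a hypothetical stand-in for a frozen pre-trained base, and the data are synthetic clusters, not a real dataset. Only a softmax classification head is trained by gradient descent, and the base weights are never touched. Fine-tuning would additionally update the last part of the base with a small learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen pre-trained base: a fixed random projection + ReLU
W_base = rng.normal(size=(20, 64))
W_base_before = W_base.copy()

def frozen_features(x):
    return np.maximum(x @ W_base, 0.0)

# Synthetic 3-class data (illustration only)
n, k = 300, 3
centers = rng.normal(scale=3.0, size=(k, 20))
y = rng.integers(0, k, size=n)
X = centers[y] + rng.normal(size=(n, 20))

# Feature extraction: run the frozen base once, then train only a softmax head
feats = frozen_features(X)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)  # standardize for a stable step size
W_head, b_head = np.zeros((feats.shape[1], k)), np.zeros(k)
onehot = np.eye(k)[y]
losses = []
for step in range(200):
    logits = feats @ W_head + b_head
    logits -= logits.max(axis=1, keepdims=True)        # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    losses.append(-np.mean(np.log(probs[np.arange(n), y])))
    grad = (probs - onehot) / n                         # d(mean cross-entropy)/d(logits)
    W_head -= 0.02 * feats.T @ grad                     # only head parameters are updated
    b_head -= 0.02 * grad.sum(axis=0)

print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print("base weights unchanged:", np.array_equal(W_base, W_base_before))
```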

In [None]:
# Load CIFAR-10 for transfer learning demo
(X_train_cifar, y_train_cifar), (X_test_cifar, y_test_cifar) = keras.datasets.cifar10.load_data()

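# Normalize to [0, 1] (float32 halves memory vs. the float64 that plain division produces)
X_train_cifar = X_train_cifar.astype('float32') / 255.0
X_test_cifar = X_test_cifar.astype('float32') / 255.0
# Labels arrive with shape (N, 1); flatten to (N,) for sparse_categorical_crossentropy
y_train_cifar = y_train_cifar.squeeze()
y_test_cifar = y_test_cifar.squeeze()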

cifar_classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                'dog', 'frog', 'horse', 'ship', 'truck']

# Visualize samples
fig, axes = plt.subplots(2, 5, figsize=(15, 6))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(X_train_cifar[i])
    ax.set_title(cifar_classes[y_train_cifar[i]])
    ax.axis('off')
plt.tight_layout()
plt.show()

print(f"CIFAR-10: {X_train_cifar.shape[0]} training images, {X_test_cifar.shape[0]} test images")

In [None]:
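# Transfer learning with MobileNetV2
# Keras ships ImageNet weights for input sizes 96, 128, 160, 192 and 224, and a 32x32
# input would shrink to a 1x1 feature map, so upsample CIFAR images to 96x96 first
base_model = MobileNetV2(
    input_shape=(96, 96, 3),
    include_top=False,   # drop the 1000-class ImageNet classifier
    weights='imagenet'
)

# Freeze base model (feature extraction)
base_model.trainable = False

# Add custom classification head
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Resizing(96, 96)(inputs)
# MobileNetV2 expects pixels in [-1, 1]; the images above are in [0, 1]
x = layers.Rescaling(2.0, offset=-1.0)(x)
x = base_model(x, training=False)  # keep BatchNorm in inference mode, also during fine-tuning
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(10, activation='softmax')(x)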

model_transfer = keras.Model(inputs, outputs)

# Compile
model_transfer.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

print(f"Total parameters: {model_transfer.count_params():,}")
print(f"Trainable parameters: {sum([tf.size(v).numpy() for v in model_transfer.trainable_variables]):,}")

In [None]:
# Train with transfer learning (feature extraction)
history_transfer = model_transfer.fit(
    X_train_cifar[:5000], y_train_cifar[:5000],  # Subset for demo
    batch_size=64,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

# Fine-tuning: Unfreeze some layers
base_model.trainable = True
# Freeze early layers, train later ones
for layer in base_model.layers[:-20]:
    layer.trainable = False

# Recompile with lower learning rate
model_transfer.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),  # Lower LR
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

print(f"\nFine-tuning: Trainable parameters: {sum([tf.size(v).numpy() for v in model_transfer.trainable_variables]):,}")

# Continue training (fine-tuning)
history_finetune = model_transfer.fit(
    X_train_cifar[:5000], y_train_cifar[:5000],
    batch_size=64,
    epochs=5,
    validation_split=0.2,
    verbose=1
)

print("\nTransfer learning stages:")
print("1. Feature extraction: Train only new layers")
print("2. Fine-tuning: Unfreeze and train some base layers with low LR")

## Part 6: Visualizing CNN Features

Understanding what CNNs learn helps interpret and debug models.

In [None]:
# Visualize intermediate activations
layer_outputs = [layer.output for layer in model_simple_cnn.layers[:6]]  # First 6 layers
activation_model = keras.Model(inputs=model_simple_cnn.input, outputs=layer_outputs)

# Get activations for a test image
test_image = X_test_cnn[0:1]
activations = activation_model.predict(test_image, verbose=0)

# Visualize first conv layer activations
first_layer_activation = activations[0]
n_features = min(16, first_layer_activation.shape[-1])

fig, axes = plt.subplots(4, 4, figsize=(12, 12))
for i, ax in enumerate(axes.ravel()):
    if i < n_features:
        ax.imshow(first_layer_activation[0, :, :, i], cmap='viridis')
        ax.set_title(f'Channel {i}')
    ax.axis('off')
plt.suptitle('First Convolutional Layer Activations')
plt.tight_layout()
plt.show()

print("Each filter learns to detect different features!")
print("Early layers: edges, textures")
print("Later layers: complex patterns, objects")
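The idea that early layers see edges and later layers see larger patterns can be quantified with the *receptive field*, the patch of input pixels a unit depends on. The sketch below applies the standard recurrence $r_l = r_{l-1} + (k_l - 1)\,j_{l-1}$ and $j_l = j_{l-1}\,s_l$, starting from $r_0 = j_0 = 1$, to the layers of `model_simple_cnn`. Here $k$ is the kernel size, $s$ the stride, and $j$ the spacing between adjacent units measured in input pixels. The helper is hypothetical.

```python
def receptive_field(layer_specs):
    """Receptive field (in input pixels) of the last layer in a stack of (kernel, stride) layers."""
    r, j = 1, 1
    for k, s in layer_specs:
        r = r + (k - 1) * j  # each layer widens the view by (k - 1) steps of the current jump
        j = j * s            # striding spreads neighbouring units further apart
    return r

# model_simple_cnn: conv 3x3, pool 2x2/2, conv 3x3, pool 2x2/2, conv 3x3
simple_cnn = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
for depth in (1, 3, 5):
    rf = receptive_field(simple_cnn[:depth])
    print(f"after layer {depth}: {rf}x{rf} pixels")
```

A first-layer unit sees only a 3×3 patch, enough for an edge or a corner. A unit in the third conv layer sees 18×18 of the 28×28 digit, enough for whole strokes and loops.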

## Part 7: PyTorch CNN Implementation

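Let's implement the same architecture as `model_simple_cnn` in PyTorch for comparison. Keep two differences in mind. First, PyTorch expects channels-first tensors of shape (N, C, H, W), whereas the Keras arrays above are (N, H, W, C). Second, this model returns raw logits (no softmax), which is what `nn.CrossEntropyLoss` expects.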

In [None]:
# PyTorch CNN (same layers as model_simple_cnn)
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)   # 28x28 -> 26x26
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # 13x13 -> 11x11
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3)  # 5x5 -> 3x3
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 3 * 3, 64)
        self.fc2 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        # x: (N, 1, 28, 28)
        x = self.pool(F.relu(self.conv1(x)))   # -> (N, 32, 13, 13)
        x = self.pool(F.relu(self.conv2(x)))   # -> (N, 64, 5, 5)
        x = F.relu(self.conv3(x))              # -> (N, 64, 3, 3)
        x = torch.flatten(x, 1)                # -> (N, 576), keeping the batch dimension
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)                        # raw logits; no softmax
        return x

# Create model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_pytorch = SimpleCNN().to(device)

print(model_pytorch)
print(f"\nDevice: {device}")
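The cell above only builds the model. Below is a minimal training and evaluation sketch; the helper names `to_torch_dataset`, `train_one_epoch` and `evaluate` are hypothetical. It converts Keras-style NHWC arrays to PyTorch's NCHW layout and uses `nn.CrossEntropyLoss`, which takes the raw logits that `SimpleCNN` returns. The last lines smoke-test the helpers on random MNIST-shaped data with a tiny stand-in model.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def to_torch_dataset(images_nhwc, labels):
    """Convert (N, H, W, C) float arrays and integer labels to a dataset of (C, H, W) tensors."""
    x = torch.from_numpy(np.asarray(images_nhwc, dtype=np.float32)).permute(0, 3, 1, 2)
    y = torch.from_numpy(np.asarray(labels, dtype=np.int64))  # CrossEntropyLoss needs int64 targets
    return TensorDataset(x, y)

def train_one_epoch(model, loader, optimizer, device):
    model.train()                      # enables dropout
    loss_fn = nn.CrossEntropyLoss()    # applies log-softmax to the logits internally
    total_loss, total = 0.0, 0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * xb.size(0)
        total += xb.size(0)
    return total_loss / total          # mean training loss over the epoch

@torch.no_grad()
def evaluate(model, loader, device):
    model.eval()                       # disables dropout
    correct, total = 0, 0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        correct += (model(xb).argmax(dim=1) == yb).sum().item()
        total += yb.size(0)
    return correct / total

# Smoke test on random data shaped like MNIST (stand-in for real images and labels)
rng = np.random.default_rng(0)
demo_ds = to_torch_dataset(rng.random((64, 28, 28, 1)), rng.integers(0, 10, 64))
demo_loader = DataLoader(demo_ds, batch_size=16, shuffle=True)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.Adam(demo_model.parameters(), lr=1e-3)
demo_loss = train_one_epoch(demo_model, demo_loader, opt, torch.device('cpu'))
demo_acc = evaluate(demo_model, demo_loader, torch.device('cpu'))
```

To train the notebook's model, pass `to_torch_dataset(X_train_cnn, y_train)` to a `DataLoader`, along with `model_pytorch`, an optimizer built from `model_pytorch.parameters()`, and `device`.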

## Key Takeaways

1. **Convolutions** detect spatial patterns through learned filters
2. **Pooling** reduces spatial dimensions and parameters
3. **CNNs** learn hierarchical features (edges → textures → objects)
4. **Famous architectures** provide proven building blocks
5. **Data augmentation** improves generalization
6. **Transfer learning** leverages pre-trained models
7. **Feature extraction** trains only new layers
8. **Fine-tuning** adapts pre-trained features
9. **Visualization** helps understand learned features
10. **Both Keras and PyTorch** are excellent for CNNs

## CNN Design Best Practices

1. **Start with transfer learning** if possible
2. **Use 3×3 convolutions** (VGG insight)
3. **Add batch normalization** after conv layers
4. **Use ReLU** activation
5. **Apply data augmentation** for limited data
6. **Global average pooling** instead of flatten when possible
7. **Gradually increase channels** as you go deeper
8. **Add dropout** before final layers
9. **Use residual connections** for deep networks
10. **Monitor validation metrics** to prevent overfitting

## Exercises

1. **Custom Architecture**: Design your own CNN architecture for CIFAR-10
2. **Visualization**: Visualize filters from different layers
3. **Grad-CAM**: Implement class activation maps to see what the model looks at
4. **Transfer Learning**: Try different pre-trained models (ResNet, EfficientNet)
5. **Object Detection**: Explore YOLO or Faster R-CNN basics
6. **Style Transfer**: Implement neural style transfer
7. **Custom Dataset**: Build a classifier for your own images

## Next Steps

In Lab 4, we'll explore:
- Recurrent Neural Networks (RNNs)
- Long Short-Term Memory (LSTM)
- Attention mechanisms
- Sequence-to-sequence models

Great work! You now understand how CNNs revolutionized computer vision.