# Computer Vision with Convolutional Networks

Convolutional neural networks (CNNs) remain the workhorse of vision tasks because they exploit spatial locality and translation equivariance, with pooling adding a degree of translation invariance. This notebook bridges beginner fundamentals and the attention-heavy models you will build later, with an emphasis on diagnostics and transfer learning.

_Environment note:_ Network access is disabled here, so examples reflect APIs and best practices current through October 2024.

## Learning Objectives

- Explain how convolution, pooling, and normalization create feature hierarchies.
- Build and visualize activations of a small CNN in PyTorch.
- Simulate transfer learning by freezing backbones and attaching custom heads.
- Prepare for transformer-based vision models by practicing residual patterns and diagnostic plots.

## Feature Hierarchies at a Glance

- Early layers capture simple edges and textures.
- Intermediate layers assemble motifs (corners, contours).
- Deep layers respond to semantic regions of the input.

These hierarchies rest on two inductive biases, weight sharing and local connectivity, which keep CNNs competitive with attention models such as ViT, particularly when training data is limited.
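
Both biases can be checked numerically. The cell below is a small sketch: it compares the parameter count of a 3x3 convolution with that of a fully connected layer producing an output of the same size (computed analytically rather than allocated), then verifies translation equivariance by shifting the input with `torch.roll`. The `padding_mode="circular"` choice is an assumption made so that equivariance holds exactly at the borders; with the zero padding used elsewhere in this notebook it holds only away from the image edges.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Weight sharing + local connectivity: the same 3x3 kernels are reused at every
# position, so the parameter count does not depend on the image size.
conv = nn.Conv2d(3, 32, kernel_size=3, padding=1, padding_mode="circular")
conv_params = sum(p.numel() for p in conv.parameters())  # 32*3*3*3 weights + 32 biases

# A dense layer mapping a 3x32x32 image to a 32x32x32 output needs one weight per
# (input, output) pair plus one bias per output; computed here without allocating it.
in_features, out_features = 3 * 32 * 32, 32 * 32 * 32
dense_params = in_features * out_features + out_features

print(f"conv params:  {conv_params:,}")
print(f"dense params: {dense_params:,}")

# Translation equivariance: shifting the input shifts the output by the same amount.
x = torch.randn(1, 3, 32, 32)
shift = (5, 7)
conv_of_shifted = conv(torch.roll(x, shifts=shift, dims=(2, 3)))
shifted_conv = torch.roll(conv(x), shifts=shift, dims=(2, 3))
equivariant = torch.allclose(conv_of_shifted, shifted_conv, atol=1e-5)
print("equivariant:", equivariant)
```

The counts come out to 896 for the convolution versus 100,696,064 for the dense layer, which is the practical payoff of sharing one small kernel across all positions.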

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

model = SmallCNN()
dummy_images = torch.randn(4, 3, 32, 32)
logits = model(dummy_images)
print(logits.shape)
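
To see where the classifier's `64 * 8 * 8` input size comes from, and how the hierarchy widens each unit's view of the image, the sketch below rebuilds the same layer stack as `SmallCNN.features` (so the cell runs on its own) and traces the output shape and theoretical receptive field after every layer. It uses the standard recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride, and j the cumulative stride ("jump"); it assumes square kernels and dilation 1, which holds for this model.

```python
import torch
import torch.nn as nn

# Same layer stack as SmallCNN.features, rebuilt so this cell is self-contained.
features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

def first(value):
    # Conv2d stores kernel_size/stride as tuples; MaxPool2d keeps plain ints.
    return value if isinstance(value, int) else value[0]

x = torch.randn(4, 3, 32, 32)
rf, jump = 1, 1  # a single input pixel sees itself; cumulative stride starts at 1
trace = []
with torch.no_grad():
    for layer in features:
        x = layer(x)
        if isinstance(layer, (nn.Conv2d, nn.MaxPool2d)):
            k, s = first(layer.kernel_size), first(layer.stride)
            rf = rf + (k - 1) * jump
            jump = jump * s
        trace.append((type(layer).__name__, tuple(x.shape), rf))

for name, shape, field in trace:
    print(f"{name:<12} out={shape}  receptive_field={field}")
```

The final map is 64 x 8 x 8, which is why the classifier's first `nn.Linear` expects 4,096 inputs; a 64x64 image would instead produce a 64 x 16 x 16 map and a shape mismatch. Each final position sees a 10x10 patch of the input, compared with 3x3 after the first convolution.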


### Inspecting Feature Maps

Forward hooks let you capture intermediate activations for diagnostics. They are especially useful when a training run underperforms, because they let you check whether the network is extracting meaningful structure.

In [None]:
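feature_maps = {}

def save_activation(name):
    # Forward hooks receive (module, inputs, output); store a detached copy of the output.
    def hook(module, _input, output):
        feature_maps[name] = output.detach()
    return hook

# Keep the handles so the hooks can be removed after inspection;
# otherwise they stay attached and fire on every later forward pass.
handles = [
    model.features[0].register_forward_hook(save_activation("conv1")),  # Conv2d(3, 32), pre-BatchNorm
    model.features[4].register_forward_hook(save_activation("conv2")),  # Conv2d(32, 64), pre-BatchNorm
]
with torch.no_grad():
    _ = model(dummy_images)
for handle in handles:
    handle.remove()

for name, activation in feature_maps.items():
    print(name, activation.shape)

# Plot the first four conv1 channels for the first image in the batch.
fig, axes = plt.subplots(1, 4, figsize=(8, 2))
for idx, ax in enumerate(axes):
    ax.imshow(feature_maps["conv1"][0, idx].cpu(), cmap="viridis")
    ax.axis("off")
fig.suptitle("First-layer feature maps for a random-noise input")
plt.show()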
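
Images of feature maps are hard to compare across runs, so a numeric summary per layer can help. The sketch below attaches hooks to each `nn.ReLU` and records the mean, standard deviation, fraction of zero outputs, and number of channels that are zero for every input in the batch. Treating an always-zero channel as "dead" is a common heuristic rather than a fixed rule, and the small model and `relu_<index>` names are hypothetical stand-ins chosen so the cell runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

stats = {}

def record_stats(name):
    def hook(module, _input, output):
        out = output.detach()
        # Per-channel fraction of exactly-zero outputs, averaged over batch and space.
        zero_frac = (out == 0).float().mean(dim=(0, 2, 3))
        stats[name] = {
            "mean": out.mean().item(),
            "std": out.std().item(),
            "zero_fraction": zero_frac.mean().item(),
            "dead_channels": int((zero_frac == 1.0).sum().item()),
        }
    return hook

# Hook every ReLU, naming it by its position in the Sequential.
handles = [
    module.register_forward_hook(record_stats(f"relu_{idx}"))
    for idx, module in enumerate(model)
    if isinstance(module, nn.ReLU)
]
with torch.no_grad():
    model(torch.randn(16, 3, 32, 32))
for handle in handles:
    handle.remove()

for name, s in stats.items():
    print(f"{name}: mean={s['mean']:.3f} std={s['std']:.3f} "
          f"zero_frac={s['zero_fraction']:.2f} dead={s['dead_channels']}")
```

On a real dataset, a zero fraction that climbs toward 1.0 during training, or a growing count of dead channels, is worth investigating (for example, by checking the learning rate or initialization).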


## Mini Task – Freeze the Backbone

Simulate transfer learning: freeze the convolutional backbone, leave the classifier trainable, and report the number of parameters still being updated.

Complete the task before expanding the hidden solution.

In [None]:
def freeze_backbone(model: nn.Module):
    # TODO: set requires_grad appropriately
    raise NotImplementedError


In [None]:
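def freeze_backbone(model: nn.Module):
    # Stop gradient updates for the convolutional feature extractor...
    for param in model.features.parameters():
        param.requires_grad = False
    # ...while keeping the classifier head trainable.
    for param in model.classifier.parameters():
        param.requires_grad = True

freeze_backbone(model)
# Only the classifier's Linear layers remain trainable: (4096*128 + 128) + (128*10 + 10) = 525,706.
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable_params}")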
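
Freezing with `requires_grad = False` stops gradient updates, but it does not fully freeze BatchNorm: its running mean and variance are buffers that are updated on every forward pass in training mode, regardless of `requires_grad`. The sketch below demonstrates this on a tiny hypothetical backbone and shows that switching the frozen part to eval mode, one common remedy, stops the updates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)
for param in backbone.parameters():
    param.requires_grad = False

bn = backbone[1]
x = torch.randn(4, 3, 16, 16) * 3 + 1  # shifted input so the batch statistics are non-trivial

# Training mode: running statistics are updated even though no parameter requires grad.
before = bn.running_mean.clone()
backbone.train()
backbone(x)
changed_in_train = not torch.equal(before, bn.running_mean)

# Eval mode: BatchNorm normalizes with its stored statistics and leaves them untouched.
before = bn.running_mean.clone()
backbone.eval()
backbone(x)
changed_in_eval = not torch.equal(before, bn.running_mean)

print("running_mean changed in train mode:", changed_in_train)
print("running_mean changed in eval mode: ", changed_in_eval)
```

Note that `model.train()` recursively switches every submodule back to training mode, so when training `SmallCNN` with a frozen backbone you would call `model.features.eval()` again after each `model.train()`.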


## Transfer Learning Pattern

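Load a pretrained backbone, freeze its weights, and train a new head on your dataset; once the head has converged, you can optionally unfreeze the last few blocks and fine-tune them as well. The snippet below uses torchvision's `resnet18`; adapt the head architecture to match your task.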

In [None]:
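try:
    import torchvision.models as models
except ImportError:
    models = None

if models is not None:
    try:
        # Downloads ImageNet weights on first use, or loads them from the local cache.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    except Exception as exc:  # e.g. no network access and no cached weights
        print(f"Could not load pretrained weights ({exc}); falling back to random initialization")
        backbone = models.resnet18(weights=None)

    # Freeze every pretrained parameter.
    for param in backbone.parameters():
        param.requires_grad = False

    # Replace the final layer; newly created modules have requires_grad=True by default.
    backbone.fc = nn.Sequential(
        nn.Linear(backbone.fc.in_features, 256),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(256, 10),
    )
    trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
    print(f"Transfer learning model ready; trainable parameters: {trainable}")
else:
    print("torchvision not available; install it to run the transfer learning example")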
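
When training a model like this, it is common to give the optimizer only the parameters that should change. This keeps optimizer state (such as momentum buffers) limited to the head and makes the intent explicit. The sketch below runs one SGD step with a small hypothetical stand-in backbone, so no weights need to be downloaded, and confirms that the frozen layer is untouched while the head moves. The learning rate, momentum, and random batch are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone (no download required).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(16, 10)
model = nn.Sequential(backbone, head)

for param in backbone.parameters():
    param.requires_grad = False

# Pass only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

frozen_before = backbone[0].weight.clone()
head_before = head.weight.clone()

images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

model.train()
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()

print("loss:", round(loss.item(), 4))
print("backbone unchanged:", torch.equal(frozen_before, backbone[0].weight))
print("head updated:      ", not torch.equal(head_before, head.weight))
```

With the torchvision model above, the same filter `[p for p in backbone.parameters() if p.requires_grad]` selects exactly the parameters of the new `backbone.fc` head.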


## Comprehensive Exercise – Configurable CNN Builder

Implement `build_cnn(config)`, which reads channel widths, kernel sizes, an optional batch-norm flag, and an optional residual flag from a configuration dictionary and returns an `nn.Sequential` model. Instantiate a baseline and a deeper configuration and compare their parameter counts.

In [None]:
def build_cnn(config):
    # TODO: construct sequential CNN based on configuration dictionary
    raise NotImplementedError


In [None]:
def conv_block(in_c, out_c, kernel_size, use_bn=True):
    # Conv -> (BatchNorm) -> ReLU; padding=kernel_size // 2 preserves spatial size for odd kernels.
    layers = [nn.Conv2d(in_c, out_c, kernel_size=kernel_size, padding=kernel_size // 2), nn.ReLU(inplace=True)]
    if use_bn:
        layers.insert(1, nn.BatchNorm2d(out_c))
    return nn.Sequential(*layers)

class ResidualConvBlock(nn.Module):
    # Identity shortcut, so the block's input and output must have the same channel count.
    def __init__(self, channels, kernel_size=3, use_bn=True):
        super().__init__()
        self.block = nn.Sequential(
            conv_block(channels, channels, kernel_size, use_bn),
            conv_block(channels, channels, kernel_size, use_bn),
        )

    def forward(self, x):
        return x + self.block(x)

def build_cnn(config):
    channels = config["channels"]
    kernel_sizes = config.get("kernel_sizes", [3] * len(channels))
    if len(kernel_sizes) != len(channels):
        raise ValueError("kernel_sizes must have one entry per channel width")
    use_bn = config.get("use_batchnorm", True)
    residual = config.get("residual", False)

    layers = []
    in_c = 3
    for idx, out_c in enumerate(channels):
        k = kernel_sizes[idx]
        layers.append(conv_block(in_c, out_c, k, use_bn))
        # This solution adds residual blocks only from the second stage onward.
        if residual and idx > 0:
            layers.append(ResidualConvBlock(out_c, kernel_size=k, use_bn=use_bn))
        layers.append(nn.MaxPool2d(2))  # halve the spatial resolution at each stage
        in_c = out_c
    # Global average pooling makes the linear head independent of the input resolution.
    layers.append(nn.AdaptiveAvgPool2d((1, 1)))
    layers.append(nn.Flatten())
    layers.append(nn.Linear(in_c, config.get("num_classes", 10)))
    return nn.Sequential(*layers)

baseline_cfg = {"channels": [32, 64], "kernel_sizes": [3, 3], "use_batchnorm": True, "residual": False}
deeper_cfg = {"channels": [32, 64, 128], "kernel_sizes": [3, 3, 3], "use_batchnorm": True, "residual": True}

baseline = build_cnn(baseline_cfg)
deeper = build_cnn(deeper_cfg)

print("Baseline params:", sum(p.numel() for p in baseline.parameters()))
print("Deeper params:", sum(p.numel() for p in deeper.parameters()))

# Sanity check: both models map a batch of 32x32 images to 10 class logits.
sample = torch.randn(4, 3, 32, 32)
print(baseline(sample).shape, deeper(sample).shape)
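
`ResidualConvBlock` uses an identity shortcut, so it can only be placed where the channel count is unchanged, and it applies ReLU inside the branch before the addition. He et al. (2015) also describe a projection shortcut, a 1x1 convolution that matches channels and resolution, and apply the final ReLU after the addition. The sketch below is one way to write such a block; the class name `ProjectionResidualBlock` is hypothetical and not part of the exercise's required API. Running it on two input sizes also shows that the `nn.AdaptiveAvgPool2d` head, unlike `SmallCNN`'s fixed `nn.Linear(64 * 8 * 8, ...)`, accepts different resolutions.

```python
import torch
import torch.nn as nn

class ProjectionResidualBlock(nn.Module):
    """Residual block whose shortcut becomes a 1x1 conv when the shape changes."""

    def __init__(self, in_c, out_c, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_c, out_c, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_c),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_c, out_c, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_c),
        )
        if stride != 1 or in_c != out_c:
            # Projection shortcut: match channels and spatial size of the body's output.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_c, out_c, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_c),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        # ReLU after the addition, as in the original ResNet basic block.
        return torch.relu(self.body(x) + self.shortcut(x))

net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    ProjectionResidualBlock(32, 64, stride=2),   # 32 -> 64 channels, half resolution
    ProjectionResidualBlock(64, 128, stride=2),  # 64 -> 128 channels, half resolution
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(128, 10),
)

output_shapes = {}
with torch.no_grad():
    for size in (32, 64):
        output_shapes[size] = tuple(net(torch.randn(2, 3, size, size)).shape)
        print(size, output_shapes[size])
```

Because the shortcut changes shape together with the body, residual connections can sit at every stage, including the ones where width and resolution change.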


## Further Reading

- PyTorch vision tutorial: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
- He et al. (2015) “Deep Residual Learning for Image Recognition”
- Albumentations and torchvision transforms for richer augmentations
- Papers With Code Vision benchmarks for architecture inspiration