# Day 20 — "Modern CNN Architectures: VGG → ResNet → EfficientNet → ConvNeXt"

CNNs evolved by solving concrete problems: VGG showed that depth matters, ResNet made very deep networks trainable by fixing gradient flow, EfficientNet scaled depth, width, and resolution together, and ConvNeXt modernized CNN design with lessons from Transformers.

## 1. VGG — Depth as Power

- Stack many 3×3 convs (stride 1, same padding) + max pooling.
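- Two stacked 3×3 convs cover a 5×5 receptive field and three cover 7×7, with fewer weights and more nonlinearities than one large kernel.
- Limitations: huge parameter count (about 138M for VGG-16, mostly in the fully connected layers), vanishing gradients as depth grows, and slow training and inference.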
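
The receptive-field claim above can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with equal input and output channels and ignores biases. The helper names and the channel width of 64 are illustrative choices, not something defined elsewhere in this notebook.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 convs with the same kernel size."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (biases ignored) for convs with `channels` inputs and outputs."""
    return num_layers * kernel * kernel * channels * channels


C = 64  # hypothetical channel width, chosen only for illustration
for n, big in [(2, 5), (3, 7)]:
    rf = stacked_receptive_field(n)
    stacked = conv_weights(3, C, n)
    single = conv_weights(big, C)
    print(f"{n} x 3x3: RF {rf}x{rf}, {stacked} weights | one {big}x{big}: {single} weights")
```

At equal receptive field the stacked 3×3 layers use fewer weights (18C² vs 25C², and 27C² vs 49C²), which is the trade VGG exploits.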

## 2. ResNet — Residual Learning

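- Each block outputs `H(x) = F(x) + x`, so its layers only learn the residual `F(x) = H(x) - x` while the identity path keeps gradients alive.
- Backward, with `y = F(x) + x`: `dL/dx = dL/dy * (I + dF/dx)` → the identity term lets gradients bypass layers even when `dF/dx` is small.
- Enables 50/101/152-layer networks and smooths the loss landscape.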
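
To see why the `I + dF/dx` term matters, here is a toy scalar sketch rather than a real network. It assumes every layer has the same small local derivative `dF/dx = 0.01`, then multiplies the per-layer factors across 50 layers with and without the identity path. The function name and the numbers are illustrative assumptions.

```python
from __future__ import annotations


def chain_gradient(local_grads: list[float], residual: bool) -> float:
    """Multiply per-layer factors: dF/dx for plain layers, 1 + dF/dx for residual ones."""
    total = 1.0
    for g in local_grads:
        total *= (1.0 + g) if residual else g
    return total


depth = 50
local_grads = [0.01] * depth  # assumed tiny dF/dx in every layer (toy value)

plain = chain_gradient(local_grads, residual=False)
skip = chain_gradient(local_grads, residual=True)
print(f"plain stack:    {plain:.3e}")
print(f"residual stack: {skip:.3f}")
```

The plain product collapses toward zero, while the residual product stays of order one. Real networks have matrix Jacobians and nonlinearities, so this only illustrates the mechanism.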

## 3. EfficientNet — Compound Scaling

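- Scale depth, width, and resolution together with a single coefficient φ: depth ∝ α^φ, width ∝ β^φ, resolution ∝ γ^φ (subject to `α ⋅ β^2 ⋅ γ^2 ≈ 2`, so FLOPs roughly double per unit of φ).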
- Uses depthwise separable convs, SE blocks, Swish activation.
- Strong accuracy per parameter and per FLOP at publication, from the mobile-sized B0 up to the server-sized B7.
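
The compound-scaling rule can be sketched numerically. The snippet assumes the base coefficients reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15). The function names are hypothetical, and the outputs are multipliers relative to a baseline network rather than the exact published B1 to B7 configurations.

```python
from __future__ import annotations

# Depth, width, and resolution bases as reported in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: float) -> tuple[float, float, float]:
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi: float) -> float:
    """FLOPs grow linearly with depth and quadratically with width and resolution."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, "
          f"FLOPs x{flops_multiplier(phi):.2f}")
```

With these values `α ⋅ β^2 ⋅ γ^2` is about 1.92, so each increment of φ costs a little under twice the FLOPs.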

## 4. ConvNeXt — Transformer-Inspired CNN

- Borrow ViT design ideas (GELU, LayerNorm, large kernels, clean stages).
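- Keeps CNN locality while matching the ImageNet accuracy of comparable Vision Transformers such as Swin.
- Simplified blocks (fewer activation and normalization layers per block) scale cleanly to larger models.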
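
To make the block design concrete, the sketch below counts parameters in one ConvNeXt-style block. It assumes the layout described in the ConvNeXt paper: a 7×7 depthwise conv, LayerNorm, a 1×1 expansion by 4, GELU, a 1×1 projection, a learnable per-channel scale, and a residual add. The function name and dictionary keys are illustrative.

```python
from __future__ import annotations


def convnext_block_params(channels: int, kernel: int = 7, expansion: int = 4) -> dict[str, int]:
    """Per-layer parameter counts (weights + biases) for a ConvNeXt-style block."""
    hidden = expansion * channels
    return {
        "depthwise_conv": kernel * kernel * channels + channels,  # one k x k filter per channel
        "layer_norm": 2 * channels,                               # scale and shift
        "pointwise_expand": channels * hidden + hidden,           # 1x1 conv, C -> 4C
        "pointwise_project": hidden * channels + channels,        # 1x1 conv, 4C -> C
        "layer_scale": channels,                                  # per-channel residual scale
    }


counts = convnext_block_params(96)  # 96 channels, the first-stage width of ConvNeXt-T
total = sum(counts.values())
for layer, n in counts.items():
    print(f"{layer:<18}{n:>7}  ({100 * n / total:.1f}%)")
print(f"{'total':<18}{total:>7}")
```

Despite the large 7×7 kernel, the depthwise conv holds only a small share of the parameters; most sit in the two 1×1 layers, mirroring the MLP in a Transformer block.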

## 5. Python — Architecture Summary

`days/day20/code/architecture_summary.py` prints key ideas for each architecture.

```python
from __future__ import annotations

import sys
from pathlib import Path


def find_repo_root(marker: str = "days") -> Path:
    """Walk up from the current directory until a folder containing `marker` is found."""
    path = Path.cwd()
    while path != path.parent:
        if (path / marker).exists():
            return path
        path = path.parent
    raise RuntimeError("Run this notebook from inside the repository tree.")


# Make the repository importable so `days.day20...` resolves from any subfolder.
REPO_ROOT = find_repo_root()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

from days.day20.code.architecture_summary import ARCHS

for arch in ARCHS:
    print(f"{arch.name}: {arch.key_idea} → solves {arch.solves}")
```


```text
VGG: Stack many 3x3 convs to go deep → solves Need for depth
ResNet: Skip connections with residual blocks → solves Vanishing gradients
EfficientNet: Compound scaling of depth/width/resolution → solves Inefficient scaling
ConvNeXt: Transformer-inspired CNN redesign → solves Outdated CNN design
```
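
The contents of `architecture_summary.py` are not shown here. As a rough guide, a module that produces the output above could look like the sketch below. The `Architecture` dataclass name and the list layout are assumptions; only the `name`, `key_idea`, and `solves` attributes and the printed strings come from the cell above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:  # hypothetical class name; the real module may differ
    name: str
    key_idea: str
    solves: str


ARCHS = [
    Architecture("VGG", "Stack many 3x3 convs to go deep", "Need for depth"),
    Architecture("ResNet", "Skip connections with residual blocks", "Vanishing gradients"),
    Architecture("EfficientNet", "Compound scaling of depth/width/resolution", "Inefficient scaling"),
    Architecture("ConvNeXt", "Transformer-inspired CNN redesign", "Outdated CNN design"),
]

lines = [f"{a.name}: {a.key_idea} → solves {a.solves}" for a in ARCHS]
print("\n".join(lines))
```

A frozen dataclass keeps each entry immutable, which suits a small lookup table like this one.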


## 6. Visualization — Parameter Comparison

`days/day20/code/visualizations.py` plots sample parameter counts for VGG/ResNet/EfficientNet/ConvNeXt.

```python
from days.day20.code.visualizations import plot_param_comparison

RUN_ANIMATIONS = False

if RUN_ANIMATIONS:
    print('Saved parameter plot →', plot_param_comparison())
else:
    print('Set RUN_ANIMATIONS = True to regenerate Day 20 figures in days/day20/outputs/.')
```


```text
Set RUN_ANIMATIONS = True to regenerate Day 20 figures in days/day20/outputs/.
```
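
For a quick look without generating the figure, the comparison can be approximated as a text bar chart. This sketch does not use `plot_param_comparison`. The specific variants and their approximate ImageNet-1k parameter counts are assumptions chosen for illustration and may differ from the sample values the module plots.

```python
from __future__ import annotations

# Approximate parameter counts in millions (assumed representative variants).
PARAMS_M = {
    "VGG-16": 138.4,
    "ResNet-50": 25.6,
    "EfficientNet-B0": 5.3,
    "ConvNeXt-T": 28.6,
}


def text_bars(values: dict[str, float], width: int = 40) -> list[str]:
    """Render one '#' bar per entry, scaled so the largest value spans `width` characters."""
    largest = max(values.values())
    rows = []
    for name, value in values.items():
        bar = "#" * max(1, round(width * value / largest))
        rows.append(f"{name:<16}{bar} {value:.1f}M")
    return rows


print("\n".join(text_bars(PARAMS_M)))
```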


## 7. Architecture Comparison (Big Picture)

| Model | Key Idea | Solves |
| --- | --- | --- |
| VGG | Go deeper with stacks of 3×3 convs | Limited representation |
| ResNet | Residual skip connections | Vanishing gradients |
| EfficientNet | Compound scaling of depth/width/resolution | Inefficient scaling |
| ConvNeXt | Transformer-inspired CNN redesign | Outdated CNN design |


## 8. Mini Exercises

1. Compare feature maps from VGG vs ResNet at equal depth.
2. Remove the skip connections from a deep ResNet and compare its training curves with the original.
3. Train EfficientNet-B0 vs ResNet-50 on a small dataset; compare accuracy/efficiency.
4. Replace ReLU+BN with GELU+LN in a CNN block; test optimization stability.
5. Visualize receptive fields in ConvNeXt blocks.

## 9. Key Takeaways

- VGG: depth is powerful but parameter-heavy.
- ResNet: identity paths solved gradient issues.
- EfficientNet: balanced scaling beats brute force.
- ConvNeXt: CNNs can be modernized to match Transformers.
- Architecture design combines geometry, gradients, and optimization lessons.

> Every great architecture responds to a limitation of the previous one.