
# Day 21 — "UNet, FPN & Encoder–Decoder Architectures for Dense Prediction"

Dense prediction requires a network to understand the whole scene and still deliver pixel-level answers. Encoder–decoder architectures resolve the tension between global semantics and local detail by compressing, reasoning, and then expanding feature maps with careful skip connections.


In [4]:
# Ensure repo root is on sys.path for local imports
import sys
from pathlib import Path

repo_root = Path.cwd()
if not (repo_root / 'days').exists():
    # Walk upward to find the repo root
    for parent in Path.cwd().resolve().parents:
        if (parent / 'days').exists():
            repo_root = parent
            break

sys.path.insert(0, str(repo_root))
print(f'Using repo root: {repo_root}')


Using repo root: /media/abdul-aziz/sdb7/masters_research/math_course_dlcv



## 1. Core Intuition — From Classification to Dense Prediction

- Classification answers *what* is present; dense prediction answers *what is at every pixel*.
- Deep layers capture meaning but lose detail, while shallow layers keep detail but lack meaning.
- Successful dense predictors combine deep semantic context with shallow spatial precision.



## 2. Encoder–Decoder Blueprint

**Encoder (down path)**

- Convolutions + stride/pooling reduce resolution.
- Receptive field grows, semantics strengthen.

**Decoder (up path)**

- Upsampling or transposed convolutions recover spatial resolution.
- Skip connections inject spatial clues so localization remains sharp.

> Compress → understand → expand → localize. This template appears in UNet, FPN, SegNet, DeepLab decoders, diffusion U-Nets, and more.
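
To make the compress-then-expand template concrete, the sketch below traces spatial size and receptive field through a toy encoder. It assumes each stage is two same-padded 3×3 convolutions followed by a 2×2 max-pool; the function name `trace_encoder` and the stage layout are illustrative assumptions, not part of the course code.

```python
def trace_encoder(size, num_stages, kernel=3, convs_per_stage=2):
    """Track spatial size and receptive field through conv + 2x2 pooling stages.

    Hypothetical layout: `convs_per_stage` same-padded stride-1 convs, then a 2x2 max-pool.
    """
    rf = 1    # receptive field of one feature, measured in input pixels
    jump = 1  # distance in input pixels between neighbouring features
    rows = []
    for stage in range(num_stages):
        for _ in range(convs_per_stage):
            rf += (kernel - 1) * jump  # a stride-1 conv widens the field by (k - 1) * jump
        rows.append((stage, size, rf))  # state just before pooling (the skip source)
        rf += (2 - 1) * jump            # 2x2 pooling also widens the field
        jump *= 2                       # stride 2 doubles the step between features
        size //= 2                      # and halves the resolution
    return rows


rows = trace_encoder(size=256, num_stages=4)
for stage, size, rf in rows:
    print(f"stage {stage}: {size}x{size} map, receptive field {rf}px")

# The decoder revisits the same resolutions in reverse order, one skip per scale.
decoder_sizes = [size for _, size, _ in reversed(rows)]
print(decoder_sizes)
```

Under these assumptions, resolution halves at every stage while the receptive field more than doubles, which is the detail-versus-meaning trade-off the skip connections are meant to repair.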



## 3. UNet — Precise Localization via Skip Connections

- Originated in medical imaging where each pixel matters.
- Concatenates shallow encoder features with decoder features at the same scale.
- Preserves boundary sharpness, trains well with limited data, and keeps gradient paths short.
- Remains a standard backbone for medical segmentation, satellite change detection, and diffusion models.
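
A minimal NumPy sketch of one UNet-style skip follows. The decoder map is upsampled to the encoder's resolution and the two are concatenated along the channel axis. The helper names (`upsample_nearest`, `unet_skip`) and the channel counts are assumptions chosen for illustration; a real UNet would follow the concatenation with learned convolutions.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) array."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def unet_skip(decoder_feat, encoder_feat):
    """Upsample the decoder map and concatenate it with the same-scale encoder map."""
    up = upsample_nearest(decoder_feat)
    assert up.shape[1:] == encoder_feat.shape[1:], "spatial sizes must match"
    return np.concatenate([encoder_feat, up], axis=0)  # stack along channels


rng = np.random.default_rng(0)
encoder_feat = rng.standard_normal((64, 32, 32))   # shallow: fine detail
decoder_feat = rng.standard_normal((128, 16, 16))  # deep: coarse semantics
fused = unet_skip(decoder_feat, encoder_feat)
print(fused.shape)  # (192, 32, 32)
```

Concatenation leaves the shallow features untouched in the first 64 channels, so the following convolution can decide how much boundary detail to keep.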



## 4. FPN — Multi-Scale Feature Fusion

- Objects appear at many scales. FPN fuses features from every backbone stage.
- Bottom-up CNN produces \(C_2, C_3, C_4, C_5\). Top-down pathway upsamples and adds lateral skips to yield \(P_2, P_3, P_4, P_5\).
- Results in high-resolution, high-semantic maps ready for detection and instance segmentation heads (Faster/Mask R-CNN, RetinaNet, etc.).
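
The top-down pathway can be sketched in NumPy as below. This is a simplified illustration: random matrices stand in for the learned 1×1 lateral convolutions, upsampling is nearest-neighbour, and the 3×3 smoothing convolution that FPN applies after each merge is omitted. The function names and channel counts are assumptions.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution on a (C, H, W) array: a per-pixel linear map over channels."""
    return np.einsum("oc,chw->ohw", weight, x)


def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) array."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fpn_top_down(features, out_channels=8, seed=0):
    """Map [C2, C3, C4, C5] (fine to coarse) to [P2, P3, P4, P5]."""
    rng = np.random.default_rng(seed)
    # Lateral 1x1 convs bring every stage to the same channel count.
    laterals = [
        conv1x1(c, 0.1 * rng.standard_normal((out_channels, c.shape[0])))
        for c in features
    ]
    pyramid = [laterals[-1]]  # P5 starts from the coarsest stage
    for lateral in reversed(laterals[:-1]):
        # Upsample the coarser level and add the same-scale lateral features.
        pyramid.append(lateral + upsample_nearest(pyramid[-1]))
    return pyramid[::-1]  # return fine to coarse: P2..P5


rng = np.random.default_rng(1)
backbone = [
    rng.standard_normal((16, 64, 64)),  # C2
    rng.standard_normal((32, 32, 32)),  # C3
    rng.standard_normal((64, 16, 16)),  # C4
    rng.standard_normal((128, 8, 8)),   # C5
]
p_levels = fpn_top_down(backbone)
print([p.shape for p in p_levels])
```

Every output level shares the same channel count, which is what lets a single detection head run unchanged across all scales.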



## 5. UNet vs FPN — Design Differences

| Aspect | UNet | FPN |
| --- | --- | --- |
| Primary task | Semantic segmentation | Detection & instance segmentation |
| Skip strategy | Concatenate encoder + decoder maps | Add lateral features |
| Decoder depth | Symmetric, full decoder | Lightweight top-down pyramid |
| Spatial precision | Very high | High |
| Semantic consistency | Medium | Very strong |
| Typical domains | Medical, remote sensing, diffusion | Detection, panoptic segmentation |
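
One way to relate the two skip strategies in the table: concatenation followed by a 1×1 convolution can reproduce addition exactly when its weight is two stacked identity matrices, so concatenation is the more general (and more expensive) choice. The small NumPy check below illustrates this; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
channels = 4
enc = rng.standard_normal((channels, 8, 8))  # encoder (lateral) features
dec = rng.standard_normal((channels, 8, 8))  # decoder (top-down) features

# FPN-style fusion: element-wise addition.
added = enc + dec

# UNet-style fusion: concatenate, then mix channels with a 1x1 convolution.
concatenated = np.concatenate([enc, dec], axis=0)      # (2C, H, W)
identity = np.eye(channels)
weight = np.concatenate([identity, identity], axis=1)  # (C, 2C) = [I | I]
mixed = np.einsum("oc,chw->ohw", weight, concatenated)

print(np.allclose(mixed, added))  # True
```

With learned weights instead of `[I | I]`, the concatenation path can weight encoder and decoder channels differently at the cost of twice the input channels.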



## 6. Upsampling & Transposed Conv Intuition

- Nearest/bilinear interpolation: fast baselines with no learnable weights.
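- Transposed convolution: learned upsampling, equivalent to the backward pass of a strided convolution (its gradient w.r.t. the input); powerful, but when the kernel size is not divisible by the stride, output pixels receive uneven numbers of contributions and checkerboard artifacts can appear.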
- Resize + convolution blocks are now common for stable, expressive decoders.
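
The checkerboard effect can be seen in one dimension by counting how many kernel taps land on each output position. The sketch below implements a plain 1D transposed convolution in NumPy (the helper `transposed_conv1d` is written for this illustration) and feeds it all-ones inputs and kernels, so the output is exactly the overlap count.

```python
import numpy as np


def transposed_conv1d(x, kernel, stride):
    """Scatter each input value, scaled by the kernel, into a strided output."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        start = i * stride
        out[start:start + len(kernel)] += value * kernel
    return out


x = np.ones(4)
uneven = transposed_conv1d(x, np.ones(3), stride=2)  # kernel 3 is not divisible by stride 2
even = transposed_conv1d(x, np.ones(4), stride=2)    # kernel 4 is divisible by stride 2

print(uneven.tolist())  # [1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]
print(even.tolist())    # [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]
```

The alternating 2, 1, 2, 1 pattern in the first output is the 1D analogue of a checkerboard; choosing a kernel size divisible by the stride, or resizing first and then convolving, gives uniform interior coverage.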



## 7. Encoder–Decoder Gradient Flow

Skip connections shorten gradient paths, decoders send localization feedback to encoders, and supervision at full resolution gives every pixel a direct training signal. This is one reason UNet-style models often converge quickly even on small datasets.



## 8. Applications

- Semantic segmentation (UNet, DeepLab-UNet, HRNet decoders).
- Object detection & instance segmentation (ResNet+FPN, ConvNeXt+FPN).
- Change detection and depth/surface prediction.
- Diffusion models and autoencoders rely on UNet variants for denoising.



## 9. Python — Blueprint Summaries

`days/day21/code/encoder_decoder.py` lists representative encoder–decoder families.


In [5]:
from days.day21.code.encoder_decoder import summarize_blueprints

for bp in summarize_blueprints():
    print(f"{bp.name}: {bp.description} skip → {bp.skip_strategy} excels at → {bp.excels_at}")

UNet: Symmetric encoder–decoder with spatial skip concatenations skip → Concatenate shallow + deep feature maps excels at → Pixel-precise segmentation and diffusion backbones
FPN: Top-down pyramid that fuses multi-scale backbone features skip → Add lateral features across resolutions excels at → Detection/instance segmentation needing multi-scale context
SegNet: Encoder–decoder that remembers pooling indices skip → Unpool using saved max-pooling masks excels at → Semantic segmentation when memory is limited



## 10. Visualization — Encoder–Decoder + FPN Diagram

`days/day21/code/visualizations.py` renders a simple schematic comparing UNet-style skip concatenation with the FPN pyramid.


In [6]:
from days.day21.code.visualizations import create_encoder_decoder_diagram

RUN_FIGURES = False

if RUN_FIGURES:
    path = create_encoder_decoder_diagram()
    # A bare expression inside an if-block is not echoed by the notebook, so print it.
    print(path)
else:
    print("Set RUN_FIGURES = True to regenerate Day 21 figures inside days/day21/outputs/.")

Set RUN_FIGURES = True to regenerate Day 21 figures inside days/day21/outputs/.



## 11. Mini Exercises

1. Implement a tiny UNet and visualize encoder/decoder feature maps.
2. Replace concatenation with addition in UNet skips and inspect boundary quality.
3. Build an FPN on top of ResNet-50 and compare detection performance vs. a plain head.
4. Compare bilinear vs. transposed convolution upsampling artifacts.
5. Train a Siamese UNet for change detection on satellite imagery.



## 12. Key Takeaways

- Dense prediction needs both global context and local precision.
- Encoder–decoder blueprints achieve this by compressing then expanding with skip connections.
- UNet prioritizes pixel-accurate segmentation; FPN prioritizes multi-scale semantics.
- Upsampling choices and skip designs shape gradient flow and final quality.
- These ideas power modern segmentation, detection, depth, and diffusion models.
