# Day 24 — "Training Strategies for Dense Prediction (LR Schedules, Augmentation, Curriculum)"

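Dense prediction tasks such as segmentation benefit from a deliberate learning-rate schedule, spatially consistent augmentation, and a gradual increase in difficulty. Together these choices determine how stable the early updates are and how well the model refines fine detail late in training.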


In [1]:
# Ensure repo root is on sys.path for local imports
import sys
from pathlib import Path

repo_root = Path.cwd()
if not (repo_root / "days").exists():
    for parent in Path.cwd().resolve().parents:
        if (parent / "days").exists():
            repo_root = parent
            break

sys.path.insert(0, str(repo_root))
print(f"Using repo root: {repo_root}")


Using repo root: /media/abdul-aziz/sdb7/masters_research/math_course_dlcv


## 1. Core Intuition

- Dense prediction is sensitive to learning rate, data diversity, and training order.
- LR schedules steer optimization from coarse structure to fine boundaries.
- Augmentations must preserve pixel alignment.
- Curriculum learning can stabilize early training on a non-convex loss by presenting easier samples or objectives first.


## 2. Learning-Rate Schedules

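- **Step decay**: multiplies the LR by a fixed factor at set intervals; simple, but each drop is abrupt.
- **Cosine annealing**: decays the LR smoothly along a half cosine, giving a gentle final refinement phase.
- **One-cycle**: ramps the LR up to a peak for fast exploration, then anneals it back down within a single run.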
- **Warmup**: avoids early instability in deep or BN-heavy networks.
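
The step and cosine rules above can be written in a few lines. This is a minimal sketch, not the code in `days/day24/code/training_strategies.py`: the names `step_decay_lr` and `cosine_lr`, their argument order, and the choice to normalize progress by `total_steps - 1` are assumptions made for illustration.

```python
import math


def step_decay_lr(base_lr, gamma, step_size, step):
    """Multiply the LR by gamma once every step_size steps (hypothetical helper)."""
    return base_lr * gamma ** (step // step_size)


def cosine_lr(max_lr, min_lr, step, total_steps):
    """Half-cosine from max_lr at step 0 to min_lr at the last step (hypothetical helper)."""
    # Assumption: progress is normalized so the final step lands exactly on min_lr.
    progress = step / max(1, total_steps - 1)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Step decay with a drop every 2 steps, so the drops are visible in 6 steps.
step_lrs = [step_decay_lr(1e-3, 0.1, 2, t) for t in range(6)]

# Cosine annealing from 1e-3 to 1e-6 over 5 steps.
cosine_lrs = [cosine_lr(1e-3, 1e-6, t, 5) for t in range(5)]

print("step:  ", step_lrs)
print("cosine:", cosine_lrs)
```

With these arguments the cosine values agree, up to floating-point rounding, with the five-step cosine output printed in Section 7, which suggests the course helper uses a similar formula.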


## 3. Warmup

Warmup increases LR gradually to prevent early divergence. It is especially helpful with large batches, mixed precision, or transformer-like backbones.
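
A linear warmup can be sketched as a scale factor that grows to 1 and then stays there. The function name `linear_warmup_lr` and the `(step + 1) / warmup_steps` convention are assumptions for illustration; other implementations start the ramp at zero instead.

```python
def linear_warmup_lr(base_lr, warmup_steps, step):
    """Ramp linearly up to base_lr over warmup_steps, then hold (hypothetical helper)."""
    if warmup_steps <= 0:
        return base_lr
    # Assumption: step 0 already uses 1/warmup_steps of the base LR, not zero.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


# Three warmup steps followed by two steps at the full learning rate.
warmup_lrs = [linear_warmup_lr(1e-3, 3, t) for t in range(5)]
print(warmup_lrs)
```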


## 4. Data Augmentation (Geometry Matters)

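Spatial augmentations must be applied **to both images and masks**, using the same random parameters for each pair:

- flips, crops, rotation, scale (resample masks with nearest-neighbor interpolation so labels stay discrete)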
- elastic deformation for medical imagery

Photometric augmentations apply **only to images** (color jitter, noise, blur).
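
The split between spatial and photometric transforms might look like the sketch below. It assumes a single-channel image stored as a 2D NumPy array in `[0, 1]` and an integer mask of the same shape; `augment_pair` and its parameters are hypothetical names, not part of the course code.

```python
import numpy as np


def augment_pair(image, mask, rng, crop=(4, 4)):
    """Apply identical geometry to image and mask; brightness change to image only."""
    # Spatial: one coin flip decides the horizontal flip for BOTH arrays.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        mask = mask[:, ::-1]

    # Spatial: one crop window, shared by image and mask.
    h, w = mask.shape
    ch, cw = crop
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    image = image[top:top + ch, left:left + cw]
    mask = mask[top:top + ch, left:left + cw]

    # Photometric: brightness gain on the image only; labels are untouched.
    gain = rng.uniform(0.8, 1.2)
    image = np.clip(image * gain, 0.0, 1.0)
    return image, mask


rng = np.random.default_rng(0)
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 3:7] = 1
# Toy image: bright (0.75) where the mask is 1, dark (0.25) elsewhere.
image = mask.astype(np.float32) * 0.5 + 0.25

aug_image, aug_mask = augment_pair(image, mask, rng)
# Alignment check: bright pixels should still coincide with mask == 1.
print(np.array_equal(aug_image > 0.45, aug_mask.astype(bool)))
```

Drawing the random parameters once and reusing them for both arrays is what keeps the pair aligned; sampling them separately for image and mask would silently corrupt the labels.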


## 5. Curriculum Learning

- **Sample curriculum**: start with easy cases, introduce hard scenes later.
- **Resolution curriculum**: train low-res first, then higher-res for detail.
- **Loss curriculum**: start with BCE, then add Dice/IoU for shape fidelity.
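
A loss curriculum can be sketched as a weight on the Dice term that ramps from 0 to 1. The start epoch, ramp length, and all function names below are illustrative assumptions; the source does not specify when or how quickly Dice should be introduced.

```python
import numpy as np


def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy on predicted probabilities."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


def soft_dice_loss(prob, target, eps=1e-7):
    """1 - soft Dice coefficient, computed on probabilities."""
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps))


def dice_weight(epoch, start_epoch=5, ramp_epochs=5):
    """0 before start_epoch, then a linear ramp to 1 over ramp_epochs (assumed schedule)."""
    return float(min(1.0, max(0.0, (epoch - start_epoch) / ramp_epochs)))


def curriculum_loss(prob, target, epoch):
    """BCE throughout; the Dice term is phased in by dice_weight."""
    return bce_loss(prob, target) + dice_weight(epoch) * soft_dice_loss(prob, target)


prob = np.array([[0.9, 0.2], [0.7, 0.1]])
target = np.array([[1.0, 0.0], [1.0, 0.0]])
for epoch in (0, 7, 10):
    print(epoch, dice_weight(epoch), curriculum_loss(prob, target, epoch))
```

Ramping the weight rather than switching it on in one epoch avoids a sudden jump in the loss value, which makes training curves easier to read.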


## 6. Batch Size, Accumulation, AMP

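- Dense prediction typically uses high-resolution inputs, so memory limits force small batches.
- Use gradient accumulation to simulate a larger effective batch: sum gradients over several micro-batches before each optimizer step.
- Mixed precision (AMP) reduces memory use and speeds up training; it relies on loss scaling to keep small FP16 gradients from underflowing.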
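
The accumulation idea can be checked on a toy problem. The sketch below uses a linear model with mean squared error in NumPy (an assumption chosen to keep the example framework-free) and shows that averaging gradients over equal-sized micro-batches reproduces the full-batch gradient.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of mean squared error for a linear model y_hat = x @ w."""
    residual = x @ w - y
    return 2.0 * x.T @ residual / len(y)


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)

# Reference: gradient computed on the full batch of 8 samples.
full_grad = mse_grad(w, x, y)

# Accumulation: 4 micro-batches of 2 samples each.
accum_steps = 4
accum_grad = np.zeros_like(w)
for xb, yb in zip(np.split(x, accum_steps), np.split(y, accum_steps)):
    # Divide by accum_steps so the accumulated sum is a mean over micro-batches.
    accum_grad += mse_grad(w, xb, yb) / accum_steps

print(np.allclose(full_grad, accum_grad))
```

The equivalence holds here because the micro-batches are the same size and the model has no batch-dependent layers. Layers such as BatchNorm still compute statistics per micro-batch, so accumulation does not fully emulate a large batch for them.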


## 7. Python — LR Schedule Demos

`days/day24/code/training_strategies.py` provides simple schedule generators.


In [2]:
from days.day24.code.training_strategies import (
    step_decay,
    cosine_annealing,
    one_cycle,
    warmup_linear,
)

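# Note: assuming the arguments are (base_lr, gamma, step_size, total_steps),
# a step size of 20 exceeds the 5 steps generated, so no drop appears in the output.
print("Step:", step_decay(1e-3, 0.1, 20, 5))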
print("Cosine:", cosine_annealing(1e-3, 1e-6, 5))
print("One-cycle:", one_cycle(1e-3, 1e-5, 5))
print("Warmup:", warmup_linear(1e-3, 3, 5))


Step: [SchedulePoint(step=0, lr=0.001), SchedulePoint(step=1, lr=0.001), SchedulePoint(step=2, lr=0.001), SchedulePoint(step=3, lr=0.001), SchedulePoint(step=4, lr=0.001)]
Cosine: [SchedulePoint(step=0, lr=0.001), SchedulePoint(step=1, lr=0.0008536998372026805), SchedulePoint(step=2, lr=0.0005005000000000001), SchedulePoint(step=3, lr=0.00014730016279731955), SchedulePoint(step=4, lr=1e-06)]
One-cycle: [SchedulePoint(step=0, lr=1e-05), SchedulePoint(step=1, lr=0.001), SchedulePoint(step=2, lr=0.00067), SchedulePoint(step=3, lr=0.00034), SchedulePoint(step=4, lr=1.0000000000000026e-05)]
Warmup: [SchedulePoint(step=0, lr=0.0003333333333333333), SchedulePoint(step=1, lr=0.0006666666666666666), SchedulePoint(step=2, lr=0.001), SchedulePoint(step=3, lr=0.001), SchedulePoint(step=4, lr=0.001)]


## 8. Visualization — LR Schedule Comparison

`days/day24/code/visualizations.py` plots step, cosine, one-cycle, and warmup curves.


In [3]:
from days.day24.code.visualizations import plot_lr_schedules

RUN_FIGURES = False

if RUN_FIGURES:
    plot_lr_schedules()
else:
    print("Set RUN_FIGURES = True to regenerate Day 24 figures inside days/day24/outputs/.")


Set RUN_FIGURES = True to regenerate Day 24 figures inside days/day24/outputs/.


## 9. Default Training Recipe (UNet/FPN)

- Optimizer: AdamW
- LR: 3e-4
- Scheduler: Cosine + warmup
- Loss: BCE + Dice
- Augmentation: flip + crop + scale
- AMP: on
- Batch size: as large as memory allows (use gradient accumulation to reach a larger effective batch)
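
The "Cosine + warmup" entry in the recipe can be sketched as a single multiplier applied to the base LR of 3e-4. The warmup length (100 steps) and total length (1000 steps) are illustrative assumptions, as is the name `warmup_cosine_factor`; the recipe does not specify them.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps, min_factor=0.0):
    """LR multiplier: linear warmup to 1.0, then cosine decay toward min_factor."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_factor + 0.5 * (1.0 - min_factor) * (1.0 + math.cos(math.pi * progress))


base_lr = 3e-4          # from the recipe above
warmup_steps = 100      # assumed for illustration
total_steps = 1000      # assumed for illustration

lrs = [base_lr * warmup_cosine_factor(t, warmup_steps, total_steps)
       for t in range(total_steps)]
print(lrs[0], lrs[warmup_steps - 1], lrs[-1])
```

If the training loop is written in PyTorch (the source does not say), a function of this shape is what `torch.optim.lr_scheduler.LambdaLR` accepts as `lr_lambda`, since that scheduler multiplies the optimizer's initial LR by the returned factor.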


## 10. Mini Exercises

1. Compare fixed LR vs cosine schedule on the same model.
2. Disable augmentation and track overfitting.
3. Try warmup vs no warmup and compare early stability.
4. Train low-res then high-res (resolution curriculum).
5. Compare one-cycle vs cosine on your dataset.


## 11. Key Takeaways

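- Training strategy (schedule, augmentation, ordering) determines which regions of the loss landscape the optimizer reaches.
- LR schedules control stability and refinement.
- Augmentation teaches invariance and robustness, provided image and mask stay aligned.
- Curriculum learning can ease non-convex optimization by ordering difficulty.
- A well-tuned training recipe often matters as much as a more elaborate architecture.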
