
# Day 22 — "Loss Functions for Dense Prediction (CE, Dice, IoU, Focal)"

Dense prediction tasks such as segmentation are usually class-imbalanced: background pixels far outnumber foreground pixels. The loss function decides which mistakes matter: per-pixel errors, overlap quality, hard pixels, or boundary behavior.


In [1]:
# Ensure repo root is on sys.path for local imports
import sys
from pathlib import Path

repo_root = Path.cwd()
if not (repo_root / "days").exists():
    for parent in Path.cwd().resolve().parents:
        if (parent / "days").exists():
            repo_root = parent
            break

sys.path.insert(0, str(repo_root))
print(f"Using repo root: {repo_root}")

Using repo root: /media/abdul-aziz/sdb7/masters_research/math_course_dlcv



## 1. Core Intuition — Imbalance Rules Dense Prediction

- Pixel-wise accuracy is misleading when the background dominates.
- Dense losses must reward overlap, punish false negatives, and focus on hard pixels.
- Loss choice shapes the final mask quality (boundaries, holes, small objects).



## 2. Why Accuracy Fails

If the foreground covers only 5% of the pixels, predicting background everywhere yields 95% accuracy and a useless mask. Dense prediction needs overlap-aware losses that emphasize the rare foreground.
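To make the argument concrete, here is a minimal sketch on a made-up mask. The 100x100 size and the 5% foreground strip are illustrative assumptions, not data from the course repo. It compares pixel accuracy with the Dice coefficient for an all-background prediction.

```python
import numpy as np

# Hypothetical 100x100 ground-truth mask with a 5% foreground strip.
gt = np.zeros((100, 100), dtype=np.uint8)
gt[:5, :] = 1  # 500 of 10,000 pixels are foreground

# A "model" that predicts background everywhere.
pred = np.zeros_like(gt)

# Fraction of pixels where prediction and label agree.
accuracy = float((pred == gt).mean())

# Hard Dice on the foreground class; eps avoids 0/0 when both masks are empty.
eps = 1e-7
intersection = float((pred * gt).sum())
dice = (2.0 * intersection + eps) / (float(pred.sum()) + float(gt.sum()) + eps)

print(f"accuracy = {accuracy:.2f}")
print(f"dice     = {dice:.4f}")
```

Accuracy comes out at 0.95 while Dice is essentially zero, because every correct pixel is a true negative.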



## 3. Cross-Entropy (CE / BCE)

- Per-pixel classification loss.
- Works well as a per-pixel classifier, but has no notion of global shape, and the mean over pixels is dominated by the majority class.

BCE formula (binary), for a pixel with label y ∈ {0, 1} and predicted foreground probability p:

L_BCE = -[y log(p) + (1 - y) log(1 - p)]
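The formula can be sketched in NumPy as follows. This is an illustrative version, not the implementation in `days/day22/code/losses.py`. The name `bce_sketch`, the clipping constant `eps`, and the averaging over pixels are assumptions made for the example.

```python
import numpy as np

def bce_sketch(p, y, eps=1e-7):
    """Mean binary cross-entropy over all pixels (illustrative sketch)."""
    # Clip probabilities away from 0 and 1 so log() stays finite.
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(y, dtype=np.float64)
    per_pixel = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(per_pixel.mean())

# One foreground pixel, three background pixels.
y = np.array([1.0, 0.0, 0.0, 0.0])
confident_right = np.array([0.9, 0.1, 0.1, 0.1])
misses_foreground = np.array([0.1, 0.1, 0.1, 0.1])

loss_right = bce_sketch(confident_right, y)
loss_miss = bce_sketch(misses_foreground, y)
print("BCE (all pixels right):      ", loss_right)
print("BCE (foreground pixel missed):", loss_miss)
```

Because the loss is averaged over every pixel, the single missed foreground pixel is diluted by the three correct background pixels. With far more background, that dilution becomes severe.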



## 4. Dice Loss — Overlap Matters

Dice coefficient:

Dice = 2|A ∩ B| / (|A| + |B|)

Dice loss:

L_Dice = 1 - Dice

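Dice ignores true negatives: correctly predicted background pixels appear in neither the numerator nor the denominator. This makes it well suited to imbalanced segmentation, where background would otherwise dominate the loss.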
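A minimal soft Dice sketch, assuming probabilities in [0, 1] and a small smoothing constant `eps` (both assumptions; the repo's `dice_loss` may differ). The example pads the same prediction with extra background to show that true negatives do not change the loss.

```python
import numpy as np

def soft_dice_loss_sketch(p, y, eps=1e-7):
    """1 - soft Dice, where |A ∩ B| is approximated by sum(p * y)."""
    p = np.asarray(p, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    intersection = np.sum(p * y)
    denom = np.sum(p) + np.sum(y)
    return float(1.0 - (2.0 * intersection + eps) / (denom + eps))

# 100 pixels, 5 foreground; prediction hits 3 of them plus 1 false positive.
y_small = np.zeros(100)
y_small[:5] = 1.0
p_small = np.zeros(100)
p_small[[0, 1, 2, 10]] = 1.0

# Same masks padded with 900 extra correctly predicted background pixels.
y_padded = np.concatenate([y_small, np.zeros(900)])
p_padded = np.concatenate([p_small, np.zeros(900)])

loss_small = soft_dice_loss_sketch(p_small, y_small)
loss_padded = soft_dice_loss_sketch(p_padded, y_padded)
print("Dice loss (100 pixels): ", loss_small)
print("Dice loss (1000 pixels):", loss_padded)
```

Both losses are about 1 - 6/9 ≈ 0.333: adding background that is predicted correctly leaves Dice unchanged, whereas pixel accuracy would rise.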



## 5. IoU (Jaccard) Loss

IoU = |A ∩ B| / |A ∪ B|, with loss L_IoU = 1 - IoU.

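IoU and Dice are monotonically related (IoU = Dice / (2 - Dice)), so they rank predictions the same way. However, IoU is never larger than Dice and penalizes partial overlap more heavily. As a loss, IoU can be harder to optimize.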
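The relationship between the two overlap scores can be checked numerically. The sketch below uses hard (binary) masks and assumes neither mask is empty. The helper name `overlap_scores` is made up for this example.

```python
import numpy as np

def overlap_scores(pred, gt):
    """Return (dice, iou) for two non-empty binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    return float(dice), float(iou)

# 5 foreground pixels; prediction hits 3 of them plus 1 false positive.
gt = np.zeros(100, dtype=bool)
gt[:5] = True
pred = np.zeros(100, dtype=bool)
pred[[0, 1, 2, 10]] = True

dice, iou = overlap_scores(pred, gt)
print("Dice:", dice)
print("IoU: ", iou)
print("Dice / (2 - Dice):", dice / (2.0 - dice))
```

Here Dice is 6/9 and IoU is 3/6, so the same prediction incurs a larger IoU loss (0.5) than Dice loss (about 0.33).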



## 6. Focal Loss — Emphasize Hard Pixels

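Focal loss downweights easy (already well-classified) pixels by scaling cross-entropy with a modulating factor:

L_FL = -(1 - p_t)^gamma log(p_t)

Here p_t is the predicted probability of the true class (p_t = p if y = 1, otherwise 1 - p), and gamma >= 0 controls how strongly easy pixels are suppressed. Setting gamma = 0 recovers cross-entropy.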

Useful for extreme class imbalance and small objects.
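A per-pixel sketch of focal loss, written in terms of the true-class probability p_t (p for a foreground pixel, 1 - p for a background pixel). It omits the optional class-balancing weight alpha, and the function name and default `gamma=2.0` are assumptions for illustration rather than the repo's `focal_loss`.

```python
import numpy as np

def focal_loss_sketch(p, y, gamma=2.0, eps=1e-7):
    """Per-pixel binary focal loss without the alpha weighting term."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(y, dtype=np.float64)
    # Probability assigned to the true class of each pixel.
    p_t = np.where(y == 1, p, 1.0 - p)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# Two foreground pixels: one easy (p = 0.9), one hard (p = 0.1).
y = np.array([1.0, 1.0])
p = np.array([0.9, 0.1])

per_pixel_ce = focal_loss_sketch(p, y, gamma=0.0)  # gamma = 0 is plain CE
per_pixel_fl = focal_loss_sketch(p, y, gamma=2.0)

print("CE per pixel:   ", per_pixel_ce)
print("Focal per pixel:", per_pixel_fl)
print("Focal / CE:     ", per_pixel_fl / per_pixel_ce)
```

With gamma = 2, the easy pixel keeps about 1% of its cross-entropy loss while the hard pixel keeps about 81%, so hard pixels dominate the gradient.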



## 7. Combined Losses (Best Practice)

Common recipes:

- BCE + Dice for binary segmentation.
- CE + Dice for multi-class segmentation.
- Focal + Dice for small objects or hard boundaries.
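A combined loss is typically a weighted sum of the individual terms. The sketch below pairs BCE with soft Dice. The helper names and the weights `w_bce` and `w_dice` are hypothetical, and equal weighting is just a common starting point rather than a recommendation from this course.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def soft_dice_loss(p, y, eps=1e-7):
    """1 - soft Dice, using sum(p * y) as the soft intersection."""
    p = np.asarray(p, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(1.0 - (2.0 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps))

def bce_dice_sketch(p, y, w_bce=1.0, w_dice=1.0):
    """Weighted sum of BCE and soft Dice loss (weights are assumptions)."""
    return w_bce * bce(p, y) + w_dice * soft_dice_loss(p, y)

# 1% foreground; the prediction is "almost certainly background" everywhere.
y = np.zeros(1000)
y[:10] = 1.0
p = np.full(1000, 0.01)

bce_term = bce(p, y)
dice_term = soft_dice_loss(p, y)
combined = bce_dice_sketch(p, y)
print("BCE:       ", bce_term)
print("Dice loss: ", dice_term)
print("BCE + Dice:", combined)
```

On this example BCE alone is small (about 0.056) even though every foreground pixel is missed, while the Dice term stays near 0.99. The sum keeps the per-pixel gradient signal from BCE and still flags the missing overlap.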



## 8. Python — Loss Function Demos (NumPy)

`days/day22/code/losses.py` provides NumPy implementations of BCE, Dice, IoU, and Focal losses.


In [2]:
from days.day22.code.losses import (
    binary_cross_entropy,
    dice_loss,
    iou_loss,
    focal_loss,
    make_synthetic_batch,
)

pred, gt = make_synthetic_batch()
print("BCE:", binary_cross_entropy(pred, gt))
print("Dice:", dice_loss(pred, gt))
print("IoU:", iou_loss(pred, gt))
print("Focal:", focal_loss(pred, gt))
print("BCE + Dice:", binary_cross_entropy(pred, gt) + dice_loss(pred, gt))

BCE: 0.9983115196228027
Dice: 0.8606790900230408
IoU: 0.9251236319541931
Focal: 0.14754727482795715
BCE + Dice: 1.8589906096458435



## 9. Visualization — Loss Curves Under Imbalance

`days/day22/code/visualizations.py` plots how different losses behave as foreground confidence changes.


In [3]:
from days.day22.code.visualizations import plot_loss_curves

RUN_FIGURES = False

if RUN_FIGURES:
    plot_loss_curves()
else:
    print("Set RUN_FIGURES = True to regenerate Day 22 figures inside days/day22/outputs/.")

Set RUN_FIGURES = True to regenerate Day 22 figures inside days/day22/outputs/.



## 10. Which Loss to Use?

| Task | Recommended Loss |
| --- | --- |
| Binary segmentation | BCE + Dice |
| Multi-class segmentation | CE + Dice |
| Medical imaging | Dice / Tversky |
| Remote sensing | Dice + Focal |
| Small objects | Focal + Dice |
| Instance segmentation | Focal + IoU |
| Change detection | BCE + Dice |



## 11. Mini Exercises

1. Train UNet with BCE only vs BCE+Dice — compare masks.
2. Replace Dice with IoU and observe convergence speed.
3. Increase focal gamma from 1 to 3 and inspect boundary quality.
4. Plot gradient magnitudes for BCE vs Dice.
5. Create extreme imbalance (1% foreground) and compare losses.



## 12. Key Takeaways

- Dense prediction is imbalanced; accuracy alone is misleading.
- Dice and IoU measure overlap with the foreground and ignore true negatives.
- Focal loss emphasizes hard pixels and small objects.
- Combining losses often yields the best results.
- Loss functions decide what the model cares about.
