# Day 55 - "Calibration, Confidence & Proper Scoring Rules"

Calibration measures whether predicted confidence matches empirical accuracy.


In [1]:
# Ensure repo root is on sys.path for local imports
import sys
from pathlib import Path

repo_root = Path.cwd()
if not (repo_root / "days").exists():
    for parent in Path.cwd().resolve().parents:
        if (parent / "days").exists():
            repo_root = parent
            break

sys.path.insert(0, str(repo_root))
print(f"Using repo root: {repo_root}")


Using repo root: /media/abdul-aziz/sdb7/masters_research/math_course_dlcv


## 1. Calibration vs Accuracy
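A model is calibrated when, among predictions made with confidence p, roughly a fraction p turn out to be correct. Accuracy only counts how often predictions are right, so a model can be accurate and still be over- or under-confident.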
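
To make the distinction concrete, here is a minimal sketch with made-up numbers. The two predictors, `conf_calibrated` and `conf_overconfident`, are hypothetical: they share the same correctness record (and so the same accuracy) but report different confidence.

```python
import numpy as np

# Hypothetical record of 10 predictions: 1 = correct, 0 = wrong.
correct = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])

# Two made-up predictors that report one confidence value for every prediction.
conf_calibrated = np.full(10, 0.8)      # confidence equals the observed accuracy
conf_overconfident = np.full(10, 0.99)  # confidence well above the observed accuracy

accuracy = correct.mean()

# Calibration gap: distance between mean confidence and observed accuracy.
gap_calibrated = abs(conf_calibrated.mean() - accuracy)
gap_overconfident = abs(conf_overconfident.mean() - accuracy)

print("accuracy:", accuracy)
print("gap (calibrated):", gap_calibrated)
print("gap (overconfident):", gap_overconfident)
```

Both predictors have identical accuracy, yet only the first reports a confidence that matches how often it is right.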


In [2]:
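import numpy as np

# Toy data: each entry summarizes one group of predictions by its
# mean confidence and its observed accuracy.
conf = np.array([0.2, 0.4, 0.6, 0.8, 0.9])
acc = np.array([0.1, 0.35, 0.55, 0.75, 0.85])

# Five equal-width confidence bins on [0, 1].
edges = np.linspace(0, 1, 6)
ece = 0.0
for i in range(len(edges) - 1):
    # Half-open bins [lo, hi); the last bin also includes conf == 1.0.
    last = i == len(edges) - 2
    upper = (conf <= edges[i + 1]) if last else (conf < edges[i + 1])
    mask = (conf >= edges[i]) & upper
    if mask.any():
        # Weight each bin's |accuracy - confidence| gap by its share of entries.
        ece += mask.mean() * abs(acc[mask].mean() - conf[mask].mean())
print('ECE:', ece)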


ECE: 0.06000000000000002
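
The cell above works on pre-aggregated confidence and accuracy values. In practice, ECE is usually computed from per-sample confidences and 0/1 correctness flags. The following is a minimal sketch of that variant; the function name `expected_calibration_error`, the choice of 10 bins, and the synthetic data are assumptions for illustration, not part of the course module.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitize against the interior edges only, so bin ids run from 0 to
    # n_bins - 1 and confidence == 1.0 lands in the last bin.
    bin_ids = np.digitize(confidences, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the bin's share of samples
    return ece

# Synthetic data (assumption): confidences drawn uniformly from [0.5, 1.0].
rng = np.random.default_rng(0)
confidences = rng.uniform(0.5, 1.0, size=5000)

# Calibrated case: each prediction is correct with probability equal to its confidence.
correct_calibrated = rng.random(5000) < confidences
# Overconfident case: true success probability is 0.15 below the stated confidence.
correct_overconfident = rng.random(5000) < (confidences - 0.15)

ece_calibrated = expected_calibration_error(confidences, correct_calibrated)
ece_overconfident = expected_calibration_error(confidences, correct_overconfident)
print("ECE (calibrated):", ece_calibrated)
print("ECE (overconfident):", ece_overconfident)
```

The binned estimate depends on the number of bins and on how many samples fall into each one, so values from different binning schemes are not directly comparable.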


## 2. Reusable Module
Run the demo module from the repository root:
```bash
python -m days.day55.code.calibration_demo
```


## 3. Visualization
Run the visualization script to generate plots in `days/day55/outputs/`. The import and `main()` call in the cell below are commented out, so uncomment them first.


In [3]:
# from days.day55.code.visualizations import main
# main()


## 4. Key Takeaways
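- Calibration measures whether stated confidence can be trusted: predictions made with confidence p should be correct about a fraction p of the time.
- Accuracy does not imply calibration: a highly accurate model can still be systematically over- or under-confident.
- Temperature scaling divides the logits by a single scalar T fitted on held-out data; it can reduce miscalibration without changing the predicted class.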
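
The temperature scaling takeaway can be sketched as follows. This is a minimal illustration under stated assumptions: the logits are synthetic and deliberately inflated to be overconfident, the helper names (`softmax`, `nll`, `fit_temperature`) are hypothetical, and a simple grid search stands in for the gradient-based fit that is often used. The fit minimizes negative log-likelihood, which is a proper scoring rule.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature=1.0):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    probs = softmax(logits / temperature)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs + 1e-12))

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature on a grid that minimizes NLL on held-out data."""
    if grid is None:
        grid = np.linspace(0.5, 5.0, 91)  # candidate temperatures, step 0.05
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic held-out set (assumption): 3 classes, Gaussian logits with a boost
# on the true class, then scaled by 4 to make the model overconfident.
rng = np.random.default_rng(0)
n, k = 2000, 3
labels = rng.integers(0, k, size=n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), labels] += 1.5
logits *= 4.0

T = fit_temperature(logits, labels)
nll_before = nll(logits, labels)
nll_after = nll(logits, labels, T)

# Dividing by a positive scalar preserves the ordering of logits,
# so the predicted class (and the accuracy) stays the same.
preds_before = logits.argmax(axis=1)
preds_after = (logits / T).argmax(axis=1)

print("fitted T:", T)
print("NLL before:", nll_before, "after:", nll_after)
print("predictions unchanged:", np.array_equal(preds_before, preds_after))
```

A fitted T above 1 softens the probabilities, which is the expected direction for an overconfident model; a T below 1 would sharpen them instead.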
