# Metrics and Calibration

Definitions:
- **Accuracy** = (TP+TN)/(TP+TN+FP+FN)
- **Precision** = TP/(TP+FP)
- **Recall** = TP/(TP+FN)
- **F1** = 2PR/(P+R), where P is precision and R is recall
- **ROC AUC**: probability that a randomly chosen positive is scored above a randomly chosen negative
- **Log Loss** = -mean[y log p + (1-y) log (1-p)]
- **Brier Score** = mean[(p - y)^2]
- **Calibration MAE** = mean|p - y|
- **ECE@K**: average over K bins of |confidence - accuracy|, weighted by each bin's share of samples
- **MCE@K**: maximum over K bins of |confidence - accuracy|
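
As a rough illustration of the threshold-based and probabilistic definitions above, the sketch below computes them with NumPy on a tiny hand-made example. The function name `binary_metrics`, the 0.5 decision threshold, the `eps` clipping for log loss, and the demo arrays are all assumptions made for this example, not part of any artifact.

```python
import numpy as np

def binary_metrics(y, p, threshold=0.5, eps=1e-12):
    """Hypothetical helper: metrics for binary labels y and probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    y_hat = (p >= threshold).astype(float)  # assumed decision rule

    # Confusion-matrix counts
    tp = float(np.sum((y_hat == 1) & (y == 1)))
    tn = float(np.sum((y_hat == 0) & (y == 0)))
    fp = float(np.sum((y_hat == 1) & (y == 0)))
    fn = float(np.sum((y_hat == 0) & (y == 1)))

    # Fall back to 0.0 when a denominator is empty
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum > 0 else 0.0

    # ROC AUC as a pairwise ranking probability; ties count as 0.5.
    # This is O(n_pos * n_neg), so it only suits small inputs.
    pos, neg = p[y == 1], p[y == 0]
    if len(pos) and len(neg):
        diff = pos[:, None] - neg[None, :]
        auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    else:
        auc = float('nan')

    pc = np.clip(p, eps, 1 - eps)  # keep log() finite
    return {
        'accuracy': (tp + tn) / len(y),
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'roc_auc': auc,
        'log_loss': float(-np.mean(y * np.log(pc) + (1 - y) * np.log(1 - pc))),
        'brier': float(np.mean((p - y) ** 2)),
        'calibration_mae': float(np.mean(np.abs(p - y))),
    }

# Made-up demo data, for illustration only
y_demo = np.array([1, 0, 1, 1, 0, 0])
p_demo = np.array([0.9, 0.2, 0.6, 0.4, 0.7, 0.1])
metrics = binary_metrics(y_demo, p_demo)
for name, value in metrics.items():
    print(f'{name}: {value:.4f}')
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, while ROC AUC, log loss, Brier score, and calibration MAE use the probabilities directly.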

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

# Update this path to your artifact if available
proba_csv = Path('artifacts/predictions_example.csv')
if not proba_csv.exists():
    print('Update `proba_csv` to point to a predictions artifact CSV with y_true,y_pred_proba.')
else:
    df = pd.read_csv(proba_csv)
    y = df['y_true'].to_numpy()
    p = df['y_pred_proba'].to_numpy()
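    # Clip away from exactly 0 and 1: with right=True, np.digitize assigns
    # p == 0 to bin 0, which the bin loops below would silently skip
    p = np.clip(p, 1e-12, 1.0 - 1e-12)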

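    def ece(y, p, n_bins=10):
        """Return (ECE, MCE) over n_bins equal-width probability bins."""
        edges = np.linspace(0, 1, n_bins + 1)
        # Bin b covers (edges[b-1], edges[b]]; clip so p == 0 lands in bin 1
        ids = np.clip(np.digitize(p, edges, right=True), 1, n_bins)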
        e = 0.0
        mce = 0.0
        for b in range(1, n_bins+1):
            m = ids == b
            if not m.any():
                continue
            conf = p[m].mean()
            acc  = y[m].mean()
            w = m.mean()
            e += w * abs(conf - acc)
            mce = max(mce, abs(conf - acc))
        return e, mce

    e10, m10 = ece(y, p, 10)
    e15, _   = ece(y, p, 15)
    print(f'ECE@10={e10:.4f}  ECE@15={e15:.4f}  MCE@10={m10:.4f}')

    bins = np.linspace(0,1,11)
    # Clip bin ids to 1..10 so p == 0 is counted in the first bin
    ids = np.clip(np.digitize(p, bins, right=True), 1, 10)
    centers = 0.5*(bins[:-1]+bins[1:])
    obs = [y[ids==i].mean() if (ids==i).any() else np.nan for i in range(1,11)]

    plt.figure()
    plt.plot([0,1],[0,1],'--')
    plt.plot(centers, obs, marker='o')
    plt.xlabel('Predicted probability')
    plt.ylabel('Observed frequency')
    plt.title('Reliability diagram')
    plt.grid(True)
    plt.show()

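A well-calibrated model follows the diagonal. A curve below the diagonal means the predicted probabilities exceed the observed frequencies (overconfidence in the positive class); a curve above it means they fall short (underconfidence). If the deviation is material, try recalibrating on held-out data with Platt scaling or isotonic regression.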
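
To make the recalibration step concrete, here is a minimal NumPy-only sketch of a Platt-style fit: a two-parameter logistic map `sigmoid(a * logit(p) + b)` trained by gradient descent on log loss. It is a variant under stated assumptions, not a definitive implementation: classic Platt scaling is fit on raw classifier scores, whereas this sketch uses the logit of the predicted probability as the score. The names `fit_platt`, `apply_platt`, and `log_loss`, the learning rate, the iteration count, and the synthetic overconfident data are all hypothetical choices for the example. In practice the parameters should be fit on a held-out calibration set rather than on the data being evaluated.

```python
import numpy as np

def _logit(p, eps=1e-12):
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def fit_platt(p, y, lr=0.1, n_iter=5000):
    """Fit a, b in sigmoid(a * logit(p) + b) by gradient descent on log loss."""
    z = _logit(p)
    y = np.asarray(y, dtype=float)
    a, b = 1.0, 0.0  # start from the identity map
    for _ in range(n_iter):
        q = _sigmoid(a * z + b)
        grad = q - y  # derivative of log loss w.r.t. the pre-sigmoid value
        a -= lr * np.mean(grad * z)
        b -= lr * np.mean(grad)
    return a, b

def apply_platt(p, a, b):
    return _sigmoid(a * _logit(p) + b)

# Synthetic data: labels drawn from true_p, predictions made overconfident
# by doubling the logit. For illustration only.
rng = np.random.default_rng(0)
true_p = rng.uniform(0.05, 0.95, size=5000)
y_syn = (rng.random(5000) < true_p).astype(float)
p_over = _sigmoid(2.0 * _logit(true_p))

a, b = fit_platt(p_over, y_syn)
p_cal = apply_platt(p_over, a, b)
print(f'a={a:.3f}  b={b:.3f}')
print(f'log loss before={log_loss(y_syn, p_over):.4f}  after={log_loss(y_syn, p_cal):.4f}')
```

Because the synthetic predictions double the true logit, the fitted slope should land near 0.5 and the intercept near 0, shrinking the predictions back toward the observed frequencies. Isotonic regression is the non-parametric alternative: it fits any monotone map instead of a sigmoid, at the cost of needing more calibration data.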