# Probability Calibration

Probability calibration adjusts a model's predicted probabilities so that they match the observed frequency of the event. It matters when we need the probability of the event itself rather than only its predicted class.

## Calibration Curve

`calibration_curve` computes the true and predicted probabilities for a calibration curve.

The method assumes the inputs come from a binary classifier and discretizes the [0, 1] interval into bins.

Calibration curves may also be referred to as reliability diagrams.

In [1]:
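import numpy as np
from sklearn.calibration import calibration_curve

# True binary labels and the predicted probability of the positive class
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9, 1.0])
# prob_true: fraction of positives per bin; prob_pred: mean predicted probability per bin
prob_true, prob_pred = calibration_curve(y_true, y_pred, n_bins=3)
prob_true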

array([0. , 0.5, 1. ])

In [2]:
prob_pred

array([0.2  , 0.525, 0.85 ])
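To make the binning concrete, the sketch below reproduces the two arrays above with plain NumPy. It assumes equal-width bins over [0, 1] and skips empty bins; the names `edges`, `bin_ids`, `manual_true` and `manual_pred` are illustrative and not part of scikit-learn.

```python
import numpy as np

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0.1, 0.2, 0.3, 0.4, 0.65, 0.7, 0.8, 0.9, 1.0])
n_bins = 3

# Equal-width bin edges over [0, 1]: 0, 1/3, 2/3, 1
edges = np.linspace(0.0, 1.0, n_bins + 1)
# Using only the interior edges gives bin indices 0..n_bins-1,
# and a prediction of exactly 1.0 falls in the last bin
bin_ids = np.digitize(y_pred, edges[1:-1])

manual_true, manual_pred = [], []
for b in range(n_bins):
    mask = bin_ids == b
    if mask.any():  # skip bins that contain no predictions
        manual_true.append(y_true[mask].mean())  # fraction of positives in the bin
        manual_pred.append(y_pred[mask].mean())  # mean predicted probability in the bin

manual_true = np.array(manual_true)
manual_pred = np.array(manual_pred)
print(manual_true)
print(manual_pred)
```

A well-calibrated model has `manual_true` close to `manual_pred` in every bin, which is the diagonal of a reliability diagram.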

## Calibrated Classifier CV

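`CalibratedClassifierCV` calibrates a classifier's probabilities with isotonic regression (`method="isotonic"`) or logistic regression (`method="sigmoid"`, the default), using cross-validation so that the base estimator and the calibrator are fitted on separate data.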

In [3]:
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.calibration import CalibratedClassifierCV

In [4]:
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=42)
base_clf = GaussianNB()

In [5]:
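# Pass the estimator positionally: the keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2
calibrated_clf = CalibratedClassifierCV(base_clf, cv=3)
calibrated_clf.fit(X, y)
len(calibrated_clf.calibrated_classifiers_)  # one calibrated classifier per CV fold, so 3 here
calibrated_clf.predict_proba(X)[:5, :]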

array([[0.11009913, 0.88990087],
       [0.07226373, 0.92773627],
       [0.92831861, 0.07168139],
       [0.9283446 , 0.0716554 ],
       [0.07186091, 0.92813909]])
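The two calibration methods can be compared on held-out data. The following is a minimal sketch, assuming a larger synthetic dataset (1000 samples, an arbitrary choice) and the Brier score as the comparison metric; neither choice comes from the example above, and the resulting scores depend on the installed scikit-learn version.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumed setup: a larger synthetic dataset with a held-out test split
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scores = {}
for method in ("sigmoid", "isotonic"):
    # The estimator is passed positionally so the call works across scikit-learn versions
    clf = CalibratedClassifierCV(GaussianNB(), method=method, cv=3)
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class
    # Brier score: mean squared error between probabilities and labels (lower is better)
    scores[method] = brier_score_loss(y_test, proba)

for method, score in scores.items():
    print(f"{method}: Brier score = {score:.4f}")
```

Isotonic regression is non-parametric and tends to need more calibration data than the sigmoid method, so the sigmoid default is often the safer choice on small datasets.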

In [10]:
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=42)
X_train, X_calib, y_train, y_calib = train_test_split(X, y, random_state=42)
base_clf = GaussianNB()
base_clf.fit(X_train, y_train)
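# cv="prefit" reuses the already fitted base_clf, so fit() only trains the calibrator on the held-out data
# (newer scikit-learn releases deprecate cv="prefit" in favor of wrapping the model in FrozenEstimator)
calibrated_clf = CalibratedClassifierCV(base_clf, cv="prefit")
calibrated_clf.fit(X_calib, y_calib)
len(calibrated_clf.calibrated_classifiers_)  # 1: a single calibrator on the prefit model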
calibrated_clf.predict_proba([[-0.5, 0.5]])

array([[0.93677315, 0.06322685]])