# Comparing the old and new implementations of P-$\lambda$'s `uncertainty-calibration`

Purpose: This notebook shows that the two implementations are equivalent on some test cases.

Background: [P-lambda's calibrator](https://github.com/p-lambda/verified_calibration) (NeurIPS 2019) is a great package. However, its calibrator class stores callables as members, so a trained calibrator cannot be pickled for future use. We rewrote it to make it picklable.
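
To make the pickling problem concrete, here is a minimal sketch. It is not the package's actual code: `ClosureCalibrator`, `ParamCalibrator`, and `can_pickle` are hypothetical stand-ins, and the bin edges and means are illustrative values. The first class keeps its fitted mapping as a lambda (a callable member), while the second keeps only arrays and rebuilds the mapping at call time.

```python
import pickle

import numpy as np


class ClosureCalibrator:
    """Hypothetical stand-in: stores the fitted mapping as a callable member."""

    def fit(self, bin_edges, bin_means):
        edges = np.asarray(bin_edges, dtype=float)
        means = np.asarray(bin_means, dtype=float)
        # The lambda is a local object, which pickle cannot serialize.
        self._calibrate = lambda p: means[np.searchsorted(edges, p)]
        return self

    def calibrate(self, probs):
        return self._calibrate(np.asarray(probs, dtype=float))


class ParamCalibrator:
    """Hypothetical stand-in: stores only parameters (plain arrays)."""

    def fit(self, bin_edges, bin_means):
        self.edges = np.asarray(bin_edges, dtype=float)
        self.means = np.asarray(bin_means, dtype=float)
        return self

    def calibrate(self, probs):
        # Rebuild the mapping from the stored parameters at call time.
        idx = np.searchsorted(self.edges, np.asarray(probs, dtype=float))
        return self.means[idx]


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Illustrative values only: upper bin edges and per-bin means.
edges, means = [0.25, 0.5, 0.75, 1.0], [0.1, 0.4, 0.6, 0.9]
closure_cal = ClosureCalibrator().fit(edges, means)
param_cal = ParamCalibrator().fit(edges, means)

print(can_pickle(closure_cal))  # the lambda member blocks pickling
print(can_pickle(param_cal))    # plain arrays pickle without trouble
```

Both stand-ins compute the same mapping for inputs in [0, 1]; only the way the fitted state is stored differs, and that is what decides whether the object can be serialized.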

In [1]:
import calibration as cal

In [2]:
raw_probs = [0.61051559, 0.00047493709, 0.99639291, 0.00021221573, 0.99599433, 0.0014127002, 0.0028262993]
labels = [1,0,1,0,1,0,0]

import numpy as np
# Turn into a two-column array, where the i-th column is the probability of the i-th class
raw_probs = np.array(raw_probs) 
raw_probs = np.vstack((raw_probs, 1-raw_probs)).T

print(raw_probs)

[[6.10515590e-01 3.89484410e-01]
 [4.74937090e-04 9.99525063e-01]
 [9.96392910e-01 3.60709000e-03]
 [2.12215730e-04 9.99787784e-01]
 [9.95994330e-01 4.00567000e-03]
 [1.41270020e-03 9.98587300e-01]
 [2.82629930e-03 9.97173701e-01]]


# Do the old way and new way give the same result?

In [3]:
num_bins = 4

## The new output 

In [4]:
calibrator_new = cal.PlattBinnerMarginalCalibrator(7, num_bins=num_bins, way="new")  # note way="new"
calibrator_new.train_calibration(raw_probs, labels)


self._k: 2
labels_one_hot: [[0. 1.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [1. 0.]]
c: 0
probs_c [6.1051559e-01 4.7493709e-04 9.9639291e-01 2.1221573e-04 9.9599433e-01
 1.4127002e-03 2.8262993e-03]
labels_c [0. 1. 0. 1. 0. 1. 1.]
logistic_regressor_c: LogisticRegression(C=10000000000.0) [[-4.66513617]] [-7.47650288]
platt_probs_c: [6.95504760e-05 1.00000000e+00 2.31271468e-15 1.00000000e+00
 3.77816773e-15 1.00000000e+00 9.99999998e-01]
binned_values: [[2.3127146823566147e-15, 3.7781677339412375e-15], [6.955047601333205e-05, 0.9999999976968412], [0.9999999999994453, 0.9999999999099543], [0.9999999999999871]]
bins [3.477523800855511e-05, 0.9999999988033977, 0.9999999999997162, 1.0]
bin_means_c [3.04544121e-15 5.00034774e-01 1.00000000e+00 1.00000000e+00]
c: 1
probs_c [0.38948441 0.99952506 0.00360709 0.99978778 0.00400567 0.9985873
 0.9971737 ]
labels_c [1. 0. 1. 0. 1. 0. 0.]
logistic_regressor_c: LogisticRegression(C=10000000000.0) [[-4.66513617]] [7.47650288]
platt_probs_c: [9.

In [5]:
calibrated_probs = calibrator_new.calibrate(raw_probs)  # expect only 4 distinct values per class because num_bins = 4
print(calibrated_probs)
for c in range(2):
    assert len(np.unique(calibrated_probs[:,c])) == num_bins

c: 0
probs_c: [6.1051559e-01 4.7493709e-04 9.9639291e-01 2.1221573e-04 9.9599433e-01
 1.4127002e-03 2.8262993e-03]
logistic_regressor_c: LogisticRegression(C=10000000000.0) [[-4.66513617]] [-7.47650288]
platt_probs_c: [6.95504760e-05 1.00000000e+00 2.31271468e-15 1.00000000e+00
 3.77816773e-15 1.00000000e+00 9.99999998e-01]
bin_means_c: [3.04544121e-15 5.00034774e-01 1.00000000e+00 1.00000000e+00]
bins_c: [3.477523800855511e-05, 0.9999999988033977, 0.9999999999997162, 1.0]
bin_indices: [1 2 0 3 0 2 1]
calibrated_probs[:, c] [5.00034774e-01 1.00000000e+00 3.04544121e-15 1.00000000e+00
 3.04544121e-15 1.00000000e+00 5.00034774e-01]
c: 1
probs_c: [0.38948441 0.99952506 0.00360709 0.99978778 0.00400567 0.9985873
 0.9971737 ]
logistic_regressor_c: LogisticRegression(C=10000000000.0) [[-4.66513617]] [7.47650288]
platt_probs_c: [9.99930450e-01 5.54652828e-13 1.00000000e+00 1.29226372e-14
 1.00000000e+00 9.00457819e-11 2.30315874e-09]
bin_means_c: [2.83787733e-13 1.19660226e-09 9.99965225e-01 

In [None]:
calibrator_new._logistic_regressors[0].__dict__
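
The debug output above suggests how the binning step yields at most `num_bins` distinct values per class: the Platt-scaled probabilities are sorted and split into roughly equal-sized groups, the upper edge of each bin is the midpoint between neighbouring groups (with the last edge at 1.0), and each probability is replaced by the mean of its bin. The following is a minimal sketch of that idea, inferred from the printed `binned_values`, `bins`, and `bin_means_c`. The helper names `fit_equal_mass_bins` and `apply_bins` and the toy scores are assumptions, and the package's own code may handle ties and edge cases differently.

```python
import numpy as np


def fit_equal_mass_bins(probs, num_bins):
    """Return (upper_edges, bin_means) for roughly equal-sized bins.

    Assumes len(probs) >= num_bins so that no bin is empty.
    """
    sorted_probs = np.sort(np.asarray(probs, dtype=float))
    groups = np.array_split(sorted_probs, num_bins)
    # Upper edge of each bin: midpoint between its largest value and the
    # smallest value of the next bin; the last edge is fixed at 1.0.
    upper_edges = [(groups[i][-1] + groups[i + 1][0]) / 2.0
                   for i in range(num_bins - 1)] + [1.0]
    bin_means = np.array([g.mean() for g in groups])
    return np.array(upper_edges), bin_means


def apply_bins(probs, upper_edges, bin_means):
    """Map each probability in [0, 1] to the mean of the bin it falls into."""
    idx = np.searchsorted(upper_edges, np.asarray(probs, dtype=float))
    return bin_means[idx]


# Toy scores, not taken from the notebook.
scores = np.array([0.05, 0.10, 0.30, 0.35, 0.60, 0.70, 0.95])
edges, means = fit_equal_mass_bins(scores, num_bins=4)
binned = apply_bins(scores, edges, means)
print(np.unique(binned))
```

Because every output is one of the stored bin means, the fitted state is just two arrays per class, which is the kind of parameter-only representation that pickles cleanly.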

## The old output 

In [None]:
calibrator_old = cal.PlattBinnerMarginalCalibrator(7, num_bins=num_bins, way="old")  # note way="old"
calibrator_old.train_calibration(raw_probs, labels)
calibrated_probs = calibrator_old.calibrate(raw_probs)  # expect only 4 distinct values per class because num_bins = 4
print(calibrated_probs)
for c in range(2):
    assert len(np.unique(calibrated_probs[:,c])) == num_bins

# However, the probabilities in each row do not always add up to 1
print(np.sum(calibrated_probs, axis=1))

## Conclusion: The two implementations give the same results, but the per-sample probabilities do not always add up to 1.
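
Each class column is calibrated on its own (the debug output loops over `c`), so nothing forces a row to sum to 1. The sketch below uses toy values, not the notebook's output, to show the effect and one possible post-processing step. Rescaling the rows is an assumption that is not part of the notebook, and it may change how well each individual column is calibrated.

```python
import numpy as np

# Toy per-class calibrated values, not the notebook's output. Each column is
# assumed to have been calibrated independently, so rows need not sum to 1.
calibrated = np.array([
    [0.50, 0.60],
    [1.00, 0.00],
    [0.00, 0.90],
])
row_sums = calibrated.sum(axis=1)
print(row_sums)

# Optional post-processing (an assumption, not the notebook's method):
# rescale each row to sum to 1. Rows that sum to 0 would need special care.
normalized = calibrated / calibrated.sum(axis=1, keepdims=True)
print(normalized.sum(axis=1))
```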

# Now the new calibrator can be pickled because it contains only parameters

In [None]:
import pickle
with open("calibrator_new.pkl", "wb") as f:
    pickle.dump(calibrator_new, f)

## Versus the old way, which cannot be pickled because it contains callables

In [None]:
# Expected to fail: the old calibrator holds callables
with open("calibrator_old.pkl", "wb") as f:
    pickle.dump(calibrator_old, f)