
========================================
Comparison of Calibration of Classifiers
========================================
Well-calibrated classifiers are probabilistic classifiers for which the output
of the predict_proba method can be directly interpreted as a confidence level.
For instance, a well-calibrated (binary) classifier should classify the samples
such that, among the samples to which it gave a predict_proba value close to
0.8, approximately 80% actually belong to the positive class.

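
As a quick numerical illustration of this definition, here is a minimal sketch on synthetic data (an assumption: no classifier is involved). Labels are drawn so that each sample is positive with exactly its predicted probability, which makes the predictions calibrated by construction; a second, overconfident set of predictions is obtained by stretching the same scores away from 0.5. The helper `fraction_positive_near` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predicted probabilities of the positive class.
prob = rng.uniform(0.0, 1.0, size=200_000)
# Labels drawn so that P(y = 1 | prob) == prob: calibrated by construction.
y_calibrated = (rng.uniform(size=prob.size) < prob).astype(int)
# Overconfident variant: the same scores stretched away from 0.5 towards 0 or 1.
prob_overconfident = np.clip(0.5 + 1.5 * (prob - 0.5), 0.0, 1.0)


def fraction_positive_near(p, y, target, width=0.05):
    """Hypothetical helper: fraction of positives among samples with p near target."""
    mask = np.abs(p - target) < width
    return y[mask].mean()


print(fraction_positive_near(prob, y_calibrated, 0.8))
print(fraction_positive_near(prob_overconfident, y_calibrated, 0.8))
```

Among samples scored close to 0.8, the calibrated predictions yield roughly 80% positives, while the overconfident ones yield noticeably fewer (around 70% under this particular stretch).
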
LogisticRegression returns well-calibrated predictions because it directly
optimizes log-loss. In contrast, the other methods return biased probabilities,
with a different bias for each method:

* GaussianNB (Gaussian naive Bayes) tends to push probabilities to 0 or 1
  (note the counts in the histograms). This is mainly because it assumes that
  features are conditionally independent given the class, which is not the
  case in this dataset, since it contains 2 redundant features.
* RandomForestClassifier shows the opposite behavior: the histograms show
  peaks at approximately 0.2 and 0.9 probability, while probabilities close to
  0 or 1 are very rare. An explanation for this is given by Niculescu-Mizil and
  Caruana [1]_: "Methods such as bagging and random forests that average
  predictions from a base set of models can have difficulty making predictions
  near 0 and 1 because variance in the underlying base models will bias
  predictions that should be near zero or one away from these values. Because
  predictions are restricted to the interval [0,1], errors caused by variance
  tend to be one-sided near zero and one. For example, if a model should
  predict p = 0 for a case, the only way bagging can achieve this is if all
  bagged trees predict zero. If we add noise to the trees that bagging is
  averaging over, this noise will cause some trees to predict values larger
  than 0 for this case, thus moving the average prediction of the bagged
  ensemble away from 0. We observe this effect most strongly with random
  forests because the base-level trees trained with random forests have
  relatively high variance due to feature subsetting." As a result, the
  calibration curve shows a characteristic sigmoid shape, indicating that the
  classifier could trust its "intuition" more and typically return
  probabilities closer to 0 or 1.
* Support Vector Classification (SVC), implemented here with LinearSVC, shows
  an even more sigmoid curve than the RandomForestClassifier, which is typical
  for maximum-margin methods (compare Niculescu-Mizil and Caruana [1]_): these
  focus on hard samples that are close to the decision boundary (the support
  vectors). Since LinearSVC has no predict_proba method, its
  decision_function output is min-max scaled to [0, 1] before plotting.

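The one-sided effect described in the quotation can be reproduced with a toy simulation. This is a minimal sketch under an assumed noise model, not how RandomForestClassifier actually computes its votes: each hypothetical base model predicts the true probability plus Gaussian noise, clipped to [0, 1], and the ensemble averages these predictions. The function `bagged_prediction` and its parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def bagged_prediction(true_p, n_trees=100, noise_std=0.2):
    """Hypothetical toy ensemble: the average of n_trees noisy base predictions.

    Each base model predicts true_p plus Gaussian noise; the result is clipped
    to [0, 1] because a base model cannot output an invalid probability.
    """
    base = np.clip(true_p + rng.normal(0.0, noise_std, size=n_trees), 0.0, 1.0)
    return base.mean()


for true_p in (0.0, 0.5, 1.0):
    print(true_p, round(bagged_prediction(true_p), 3))
```

For a true probability of 0, clipping turns every negative error into 0 while positive errors survive, so the average stays above 0 (its expected value is noise_std/sqrt(2*pi), about 0.08 for noise_std=0.2); symmetrically, the average stays below 1 for a true probability of 1. Near 0.5 the clipping rarely matters and the average stays close to the true value.
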
.. topic:: References:

    .. [1] Predicting Good Probabilities with Supervised Learning,
          A. Niculescu-Mizil & R. Caruana, ICML 2005


In [None]:
print(__doc__)

Author: Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
License: BSD Style.

In [None]:
import numpy as np
np.random.seed(0)

In [None]:
import matplotlib.pyplot as plt

In [None]:
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.calibration import calibration_curve

In [None]:
X, y = datasets.make_classification(n_samples=100000, n_features=20,
                                    n_informative=2, n_redundant=2)
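
The naive Bayes explanation in the introduction relies on the redundant features not being independent of the informative ones. According to the scikit-learn documentation, make_classification generates redundant features as random linear combinations of the informative features. A minimal sketch to check this, assuming shuffle=False (the call above keeps the default shuffle=True, which permutes the columns) so that the first two columns are informative, the next two redundant, and the remaining ones noise; the `_demo` variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification

# shuffle=False keeps the column order: informative, then redundant, then noise.
X_demo, _ = make_classification(n_samples=1000, n_features=20,
                                n_informative=2, n_redundant=2,
                                shuffle=False, random_state=0)
informative = X_demo[:, :2]
redundant = X_demo[:, 2:4]
noise = X_demo[:, 4]

# Least-squares fit (no intercept) of the redundant columns on the informative ones.
coef, *_ = np.linalg.lstsq(informative, redundant, rcond=None)
redundant_error = np.abs(informative @ coef - redundant).max()

# The same fit for a pure noise column, for contrast.
coef_noise, *_ = np.linalg.lstsq(informative, noise, rcond=None)
noise_error = np.abs(informative @ coef_noise - noise).max()

print(redundant_error, noise_error)
```

The redundant columns are reproduced up to floating-point error, whereas the noise column is not; a model that treats all features as conditionally independent therefore counts the same evidence several times.
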

In [None]:
train_samples = 100  # Samples used for training the models

In [None]:
X_train = X[:train_samples]
X_test = X[train_samples:]
y_train = y[:train_samples]
y_test = y[train_samples:]

Create classifiers

In [None]:
lr = LogisticRegression()
gnb = GaussianNB()
svc = LinearSVC(C=1.0)
rfc = RandomForestClassifier()
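
Only some of these estimators expose predict_proba, which is why the plotting loop below needs a fallback to decision_function. A minimal check using the same hasattr test as the loop, assuming a recent scikit-learn version:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC

estimators = {
    "LogisticRegression": LogisticRegression(),
    "GaussianNB": GaussianNB(),
    "LinearSVC": LinearSVC(C=1.0),
    "RandomForestClassifier": RandomForestClassifier(),
}
# For these unfitted estimators, hasattr simply checks whether the class
# defines the method; fitting is not required.
has_proba = {name: hasattr(est, "predict_proba")
             for name, est in estimators.items()}
print(has_proba)
```

Only LinearSVC lacks predict_proba; it provides decision_function instead, whose signed margins are not probabilities.
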

Plot calibration curves

In [None]:
plt.figure(figsize=(10, 10))
ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
ax2 = plt.subplot2grid((3, 1), (2, 0))

In [None]:
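# Diagonal reference: a perfectly calibrated classifier lies on y = x.
ax1.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")
for clf, name in [(lr, 'Logistic'),
                  (gnb, 'Naive Bayes'),
                  (svc, 'Support Vector Classification'),
                  (rfc, 'Random Forest')]:
    clf.fit(X_train, y_train)
    if hasattr(clf, "predict_proba"):
        prob_pos = clf.predict_proba(X_test)[:, 1]
    else:  # LinearSVC has no predict_proba: use decision_function instead
        prob_pos = clf.decision_function(X_test)
        # Min-max scale the scores into [0, 1]; this preserves their order
        # but does not turn them into calibrated probabilities.
        prob_pos = \
            (prob_pos - prob_pos.min()) / (prob_pos.max() - prob_pos.min())
    # Bin predictions into 10 equal-width bins and compare, per bin, the
    # mean predicted value with the observed fraction of positives.
    fraction_of_positives, mean_predicted_value = \
        calibration_curve(y_test, prob_pos, n_bins=10)
    ax1.plot(mean_predicted_value, fraction_of_positives, "s-", label=name)
    # Histogram of the predicted values for this classifier.
    ax2.hist(prob_pos, range=(0, 1), bins=10, label=name,
             histtype="step", lw=2)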
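
With its default strategy="uniform", calibration_curve splits [0, 1] into n_bins equal-width bins and, for each non-empty bin, returns the fraction of positives and the mean predicted value. A minimal re-implementation sketch following that description, checked against scikit-learn on synthetic data; the helper `manual_calibration_curve` is hypothetical, and the handling of values lying exactly on a bin edge is an assumption (with continuous random scores such ties essentially never occur).

```python
import numpy as np
from sklearn.calibration import calibration_curve


def manual_calibration_curve(y_true, y_prob, n_bins=10):
    """Hypothetical re-implementation of the 'uniform' binning strategy."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Index (0 .. n_bins - 1) of the equal-width bin each prediction falls into.
    bin_ids = np.searchsorted(edges[1:-1], y_prob)
    counts = np.bincount(bin_ids, minlength=n_bins)
    positives = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
    prob_sums = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)
    nonempty = counts > 0  # empty bins are dropped from the output
    fraction_of_positives = positives[nonempty] / counts[nonempty]
    mean_predicted_value = prob_sums[nonempty] / counts[nonempty]
    return fraction_of_positives, mean_predicted_value


rng = np.random.default_rng(0)
y_prob = rng.uniform(size=1000)
y_true = (rng.uniform(size=1000) < y_prob).astype(int)

frac_manual, mean_manual = manual_calibration_curve(y_true, y_prob, n_bins=10)
frac_sk, mean_sk = calibration_curve(y_true, y_prob, n_bins=10)
print(np.allclose(frac_manual, frac_sk), np.allclose(mean_manual, mean_sk))
```

Because empty bins are dropped, both returned arrays can be shorter than n_bins, so a classifier that concentrates its predictions in a few bins produces fewer markers on the calibration plot.
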

In [None]:
ax1.set_ylabel("Fraction of positives")
ax1.set_ylim([-0.05, 1.05])
ax1.legend(loc="lower right")
ax1.set_title('Calibration plots (reliability curve)')

In [None]:
ax2.set_xlabel("Mean predicted value")
ax2.set_ylabel("Count")
ax2.legend(loc="upper center", ncol=2)

In [None]:
plt.tight_layout()
plt.show()