
<br>
================<br>
Precision-Recall<br>
================<br>
Example of the Precision-Recall metric used to evaluate classifier output quality.<br>
Precision-Recall is a useful measure of success of prediction when the<br>
classes are very imbalanced. In information retrieval, precision is a<br>
measure of result relevancy, while recall is a measure of how many truly<br>
relevant results are returned.<br>
The precision-recall curve shows the tradeoff between precision and<br>
recall for different thresholds. A high area under the curve represents<br>
both high recall and high precision, where high precision relates to a<br>
low false positive rate, and high recall relates to a low false negative<br>
rate. High scores for both show that the classifier is returning accurate<br>
results (high precision), as well as returning a majority of all positive<br>
results (high recall).<br>
A system with high recall but low precision returns many results, but most of<br>
its predicted labels are incorrect when compared to the true labels. A<br>
system with high precision but low recall is just the opposite, returning very<br>
few results, but most of its predicted labels are correct when compared to the<br>
true labels. An ideal system with high precision and high recall will<br>
return many results, with all results labeled correctly.<br>
Precision (:math:`P`) is defined as the number of true positives (:math:`T_p`)<br>
over the number of true positives plus the number of false positives<br>
(:math:`F_p`).<br>
:math:`P = \frac{T_p}{T_p+F_p}`<br>
Recall (:math:`R`) is defined as the number of true positives (:math:`T_p`)<br>
over the number of true positives plus the number of false negatives<br>
(:math:`F_n`).<br>
:math:`R = \frac{T_p}{T_p + F_n}`<br>
These quantities are also related to the :math:`F_1` score, which is defined<br>
as the harmonic mean of precision and recall.<br>
:math:`F_1 = 2\frac{P \times R}{P+R}`<br>
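
As a quick illustration of the three definitions above, here is a minimal sketch that computes them from raw counts. The helper name `precision_recall_f1`, the example counts, and the convention of returning 0.0 when a denominator is zero are all assumptions made for this illustration, not part of scikit-learn.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw confusion counts.

    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.800 R=0.667 F1=0.727
```
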
Note that precision does not necessarily decrease as recall increases. The<br>
definition of precision (:math:`\frac{T_p}{T_p + F_p}`) shows that lowering<br>
the threshold of a classifier may increase the denominator, by increasing the<br>
number of results returned. If the threshold was previously set too high, the<br>
new results may all be true positives, which will increase precision. If the<br>
previous threshold was about right or too low, further lowering the threshold<br>
will introduce false positives, decreasing precision.<br>
Recall is defined as :math:`\frac{T_p}{T_p+F_n}`, where :math:`T_p+F_n` does<br>
not depend on the classifier threshold. This means that lowering the classifier<br>
threshold may increase recall, by increasing the number of true positive<br>
results. It is also possible that lowering the threshold may leave recall<br>
unchanged, while the precision fluctuates.<br>
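
The threshold behaviour described above can be checked on a small hand-made example. The labels and scores below are hypothetical, and the helper `precision_recall_at` (including its choice to report precision as 1.0 when nothing is predicted positive) is an assumption of this sketch.

```python
import numpy as np

# Hypothetical toy data: true labels and classifier scores
y_true = np.array([0, 0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])


def precision_recall_at(threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    y_pred = y_score >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    # Assumption: precision is taken as 1.0 when no sample is predicted positive
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn)
    return float(precision), float(recall)


# Lower the threshold step by step and record both metrics
results = {t: precision_recall_at(t) for t in (0.8, 0.65, 0.5, 0.35, 0.2)}
for t, (p, r) in results.items():
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

In this toy run, going from 0.65 to 0.5 leaves recall unchanged while precision drops, and going from 0.5 to 0.35 raises both recall and precision, since the newly returned sample is a true positive.
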
The relationship between recall and precision can be observed in the<br>
stairstep area of the plot: at the edges of these steps, a small change<br>
in the threshold considerably reduces precision, with only a minor gain in<br>
recall.<br>
**Average precision** (AP) summarizes such a plot as the weighted mean of<br>
precisions achieved at each threshold, with the increase in recall from the<br>
previous threshold used as the weight:<br>
:math:`\text{AP} = \sum_n (R_n - R_{n-1}) P_n`<br>
where :math:`P_n` and :math:`R_n` are the precision and recall at the<br>
nth threshold. A pair :math:`(R_k, P_k)` is referred to as an<br>
*operating point*.<br>
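
The sum above can be evaluated by hand from the operating points returned by `precision_recall_curve`. This is a minimal sketch on hypothetical toy data; it relies on `precision_recall_curve` returning recall in decreasing order, which is why the differences are negated.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical toy data: true labels and classifier scores
y_true = np.array([0, 0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# recall is returned in decreasing order, so np.diff(recall) is <= 0;
# negating the sum gives sum_n (R_n - R_{n-1}) * P_n
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap_sklearn = average_precision_score(y_true, y_score)

print(f"manual AP={ap_manual:.4f}, average_precision_score={ap_sklearn:.4f}")
```
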
AP and the trapezoidal area under the operating points<br>
(:func:`sklearn.metrics.auc`) are common ways to summarize a precision-recall<br>
curve that lead to different results. Read more in the<br>
:ref:`User Guide <precision_recall_f_measure_metrics>`.<br>
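
To see that the two summaries generally disagree, the sketch below computes both on the same hypothetical toy data. The trapezoidal rule interpolates linearly between operating points, whereas AP uses a step-wise sum, so the numbers differ whenever precision changes between consecutive recall levels.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Hypothetical toy data: true labels and classifier scores
y_true = np.array([0, 0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])

precision, recall, _ = precision_recall_curve(y_true, y_score)

ap = average_precision_score(y_true, y_score)  # step-wise summary
trapezoid = auc(recall, precision)             # linear interpolation

print(f"AP={ap:.4f}, trapezoidal area={trapezoid:.4f}")
```
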
Precision-recall curves are typically used in binary classification to study<br>
the output of a classifier. In order to extend the precision-recall curve and<br>
average precision to multi-class or multi-label classification, it is necessary<br>
to binarize the output. One curve can be drawn per label, but one can also draw<br>
a precision-recall curve by considering each element of the label indicator<br>
matrix as a binary prediction (micro-averaging).<br>
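
Micro-averaging can be sketched on a tiny example: binarize the labels, then treat every entry of the indicator matrix as one binary prediction. The labels and the random scores below are hypothetical and serve only to show that `average="micro"` matches scoring the flattened arrays.

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.preprocessing import label_binarize

# Hypothetical multi-class labels, binarized into a (6, 3) indicator matrix
y = np.array([0, 1, 2, 2, 1, 0])
Y = label_binarize(y, classes=[0, 1, 2])

# Hypothetical per-class scores (random, for illustration only)
rng = np.random.RandomState(0)
scores = rng.rand(6, 3)

# Micro-averaging pools all (sample, label) pairs into one binary problem
ap_micro = average_precision_score(Y, scores, average="micro")
ap_flat = average_precision_score(Y.ravel(), scores.ravel())

print(f"micro AP={ap_micro:.4f}, AP on flattened arrays={ap_flat:.4f}")
```
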
.. note::<br>
    See also :func:`sklearn.metrics.average_precision_score`,<br>
             :func:`sklearn.metrics.recall_score`,<br>
             :func:`sklearn.metrics.precision_score`,<br>
             :func:`sklearn.metrics.f1_score`<br>
<br>
In binary classification settings<br>
--------------------------------------------------------<br>
<br>
Create simple data<br>
..................<br>
<br>
Try to differentiate the first two classes of the iris data

In [None]:
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
import numpy as np

In [None]:
iris = datasets.load_iris()
X = iris.data
y = iris.target

Add noisy features

In [None]:
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

Limit to the first two classes, and split into training and test

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X[y < 2], y[y < 2],
                                                    test_size=.5,
                                                    random_state=random_state)

Create a simple classifier

In [None]:
classifier = svm.LinearSVC(random_state=random_state)
classifier.fit(X_train, y_train)
y_score = classifier.decision_function(X_test)

Compute the average precision score<br>
...................................

In [None]:
from sklearn.metrics import average_precision_score
average_precision = average_precision_score(y_test, y_score)

In [None]:
print('Average precision-recall score: {0:0.2f}'.format(
      average_precision))

Plot the Precision-Recall curve<br>
................................

In [None]:
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import PrecisionRecallDisplay
import matplotlib.pyplot as plt

In [None]:
# plot_precision_recall_curve was removed in scikit-learn 1.2;
# PrecisionRecallDisplay.from_estimator (available since 1.0) replaces it.
disp = PrecisionRecallDisplay.from_estimator(classifier, X_test, y_test)
disp.ax_.set_title('2-class Precision-Recall curve: '
                   'AP={0:0.2f}'.format(average_precision))

In multi-label settings<br>
------------------------<br>
<br>
Create multi-label data, fit, and predict<br>
...........................................<br>
<br>
We create a multi-label dataset to illustrate precision-recall in<br>
multi-label settings

In [None]:
from sklearn.preprocessing import label_binarize

Use label_binarize to mimic a multi-label setting

In [None]:
Y = label_binarize(y, classes=[0, 1, 2])
n_classes = Y.shape[1]

Split into training and test

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.5,
                                                    random_state=random_state)

We use OneVsRestClassifier for multi-label prediction

In [None]:
from sklearn.multiclass import OneVsRestClassifier

Run classifier

In [None]:
classifier = OneVsRestClassifier(svm.LinearSVC(random_state=random_state))
classifier.fit(X_train, Y_train)
y_score = classifier.decision_function(X_test)

The average precision score in multi-label settings<br>
....................................................

In [None]:
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score

For each class

In [None]:
precision = dict()
recall = dict()
average_precision = dict()
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test[:, i],
                                                        y_score[:, i])
    average_precision[i] = average_precision_score(Y_test[:, i], y_score[:, i])

A "micro-average": quantifying score on all classes jointly

In [None]:
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test.ravel(),
    y_score.ravel())
average_precision["micro"] = average_precision_score(Y_test, y_score,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.2f}'
      .format(average_precision["micro"]))

Plot the micro-averaged Precision-Recall curve<br>
...............................................<br>


In [None]:
plt.figure()
plt.step(recall['micro'], precision['micro'], where='post')

In [None]:
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.ylim([0.0, 1.05])
plt.xlim([0.0, 1.0])
plt.title(
    'Average precision score, micro-averaged over all classes: AP={0:0.2f}'
    .format(average_precision["micro"]))

Plot Precision-Recall curve for each class and iso-f1 curves<br>
.............................................................<br>


In [None]:
from itertools import cycle
# setup plot details
colors = cycle(['navy', 'turquoise', 'darkorange', 'cornflowerblue', 'teal'])

In [None]:
plt.figure(figsize=(7, 8))
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))

In [None]:
lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.2f})'
              ''.format(average_precision["micro"]))

In [None]:
for i, color in zip(range(n_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.2f})'
                  ''.format(i, average_precision[i]))

In [None]:
fig = plt.gcf()
fig.subplots_adjust(bottom=0.25)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))

In [None]:
plt.show()
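
The iso-f1 curves drawn above use `y = f_score * x / (2 * x - f_score)`. This follows from solving F1 = 2PR / (P + R) for P at a fixed F1, with recall on the x-axis and precision on the y-axis. The sketch below is a small numerical check of that rearrangement; the value 0.6 and the variable names are chosen only for illustration.

```python
import numpy as np

f_score = 0.6  # illustrative F1 level
recall = np.linspace(0.01, 1)

# Precision that yields the chosen F1 at each recall value
precision = f_score * recall / (2 * recall - f_score)

# Keep only the part of the curve that lies inside the unit square
valid = (precision >= 0) & (precision <= 1)
p, r = precision[valid], recall[valid]

# Recompute F1 from the retained (recall, precision) pairs
f1 = 2 * p * r / (p + r)
print(np.allclose(f1, f_score))
```
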