## Accuracy

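$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.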

```python
# Ground-truth labels and the classifier's guesses (1 = positive, 0 = negative)
labels = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
guesses = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0]

true_positives = 0
true_negatives = 0
false_positives = 0
false_negatives = 0

# Tally each (label, guess) pair into one of the four confusion-matrix cells
for label, guess in zip(labels, guesses):
    if label == 1 and guess == 1:
        true_positives += 1
    elif label == 1 and guess == 0:
        false_negatives += 1
    elif label == 0 and guess == 1:
        false_positives += 1
    elif label == 0 and guess == 0:
        true_negatives += 1

# Accuracy: fraction of all predictions that were correct
accuracy = (true_positives + true_negatives) / (
    true_positives + true_negatives + false_positives + false_negatives
)

print(accuracy)
```

```
0.3
```
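As a cross-check, the same four counts can be tallied in one pass with `collections.Counter` over `(label, guess)` pairs. This is a minimal sketch; `confusion_counts` is a hypothetical helper written for this example, not part of any library, and it assumes labels and guesses are strictly 0/1.

```python
from collections import Counter


def confusion_counts(labels, guesses):
    """Return (tp, tn, fp, fn) for binary 0/1 labels and guesses.

    Hypothetical helper for illustration only.
    """
    # Keys are (label, guess) tuples; a Counter returns 0 for missing keys
    pairs = Counter(zip(labels, guesses))
    return pairs[(1, 1)], pairs[(0, 0)], pairs[(0, 1)], pairs[(1, 0)]


labels = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
guesses = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0]
tp, tn, fp, fn = confusion_counts(labels, guesses)
print(tp, tn, fp, fn)  # 3 0 3 4
```

With these counts, accuracy is (3 + 0) / 10 = 0.3, matching the loop above.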


## Recall (a.k.a. Sensitivity)

$$\text{Recall} = \frac{TP}{TP + FN}$$

Accuracy can be an extremely misleading statistic when one class is much rarer than the other. Consider an algorithm that tries to predict whether there will be over 3 feet of snow on the ground tomorrow. We can write a very accurate classifier right now: always predict False. Because days with that much snow are rare, this classifier will almost always be right. But it never finds the information we’re actually interested in: the snow days themselves.

In this situation, the statistic that would be helpful is recall. Recall measures the fraction of relevant items (actual positives) that your classifier found. In this example, recall is the number of snow days the algorithm correctly predicted divided by the total number of actual snow days. The always-False classifier finds none of them, so its recall is 0.
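To make the snow example concrete, here is a minimal sketch on hypothetical data: 100 days, of which 2 are snow days. The data and the variable names (`snow_labels`, `always_false`, and so on) are assumptions for illustration, not a real dataset. The always-False classifier gets high accuracy but zero recall.

```python
# Hypothetical data: 100 days, only 2 of which had over 3 feet of snow (1 = snow day)
snow_labels = [1] * 2 + [0] * 98
# Classifier that always predicts "no snow" (False / 0)
always_false = [0] * len(snow_labels)

pairs = list(zip(snow_labels, always_false))
tp = sum(1 for y, p in pairs if y == 1 and p == 1)
tn = sum(1 for y, p in pairs if y == 0 and p == 0)
fn = sum(1 for y, p in pairs if y == 1 and p == 0)

snow_accuracy = (tp + tn) / len(pairs)  # correct predictions / all predictions
snow_recall = tp / (tp + fn)            # snow days found / actual snow days

print(snow_accuracy)  # 0.98
print(snow_recall)    # 0.0
```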

```python
# Recall: fraction of actual positives the classifier found
recall = true_positives / (true_positives + false_negatives)
print(recall)
```

```
0.42857142857142855
```


## Precision (a.k.a. Positive Predictive Value)

$$\text{Precision} = \frac{TP}{TP + FP}$$

Unfortunately, recall isn’t a perfect statistic either. For example, we could create a snow day classifier that always returns True. This would have low accuracy, but its recall would be 1 because it would find every snow day.

This is where precision comes in. Precision measures the fraction of the classifier’s positive predictions that were actually correct. The algorithm that predicts every day is a snow day has a recall of 1, but very low precision: it catches every snow day, but most of its predictions are false positives.
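The always-True case can be sketched the same way. The data is again hypothetical (100 days, 2 of them snow days), and the variable names are illustrative assumptions. Recall is perfect, but precision is tiny.

```python
# Hypothetical data: 100 days, only 2 of which were snow days (1 = snow day)
snow_labels = [1] * 2 + [0] * 98
# Classifier that always predicts "snow" (True / 1)
always_true = [1] * len(snow_labels)

pairs = list(zip(snow_labels, always_true))
tp = sum(1 for y, p in pairs if y == 1 and p == 1)
fp = sum(1 for y, p in pairs if y == 0 and p == 1)
fn = sum(1 for y, p in pairs if y == 1 and p == 0)

snow_recall = tp / (tp + fn)     # 2 / 2: every snow day is found
snow_precision = tp / (tp + fp)  # 2 / 100: most "snow" predictions are wrong

print(snow_recall)     # 1.0
print(snow_precision)  # 0.02
```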

```python
# Precision: fraction of positive predictions that were correct
precision = true_positives / (true_positives + false_positives)
print(precision)
```

```
0.5
```
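The precision formula divides by TP + FP, which is 0 for a classifier that never predicts True (the always-False snow classifier, for instance); plain Python then raises `ZeroDivisionError`. A minimal guard is sketched below. `safe_ratio` is a hypothetical helper, and returning 0.0 in that case is one convention among several. scikit-learn's `precision_score` exposes a `zero_division` parameter for the same situation.

```python
def safe_ratio(numerator, denominator, zero_division=0.0):
    """Return numerator / denominator, or zero_division if the denominator is 0.

    Hypothetical helper for illustration only.
    """
    if denominator == 0:
        return zero_division
    return numerator / denominator


# A classifier with no positive predictions has TP = FP = 0
print(safe_ratio(0, 0 + 0))  # 0.0
# With this notebook's counts (TP = 3, FP = 3) it matches the normal formula
print(safe_ratio(3, 3 + 3))  # 0.5
```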


## F1 Score

F1 score is the harmonic mean of precision and recall.

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F1 score combines precision and recall into a single statistic. We use the harmonic mean rather than the arithmetic mean because the harmonic mean is pulled toward the smaller of the two values: the F1 score stays low whenever either precision or recall is low, and it drops to 0 when one of them is 0, no matter how high the other is.
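A quick numeric comparison shows why the harmonic mean is used. The inputs are hypothetical: precision 0.02 and recall 1.0 are what an always-True classifier would score on 100 days containing 2 snow days. The arithmetic mean rewards that useless classifier with a middling score; the harmonic mean does not.

```python
# Hypothetical scores of an always-True classifier (2 snow days out of 100)
precision_snow = 0.02
recall_snow = 1.0

arithmetic_mean = (precision_snow + recall_snow) / 2
harmonic_mean = 2 * (precision_snow * recall_snow) / (precision_snow + recall_snow)

print(round(arithmetic_mean, 3))  # 0.51
print(round(harmonic_mean, 3))    # 0.039
```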

```python
# F1 score: harmonic mean of precision and recall
f_1 = 2 * ((precision * recall) / (precision + recall))
print(f_1)
```

```
0.4615384615384615
```
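As a check on this value, substituting the definitions of precision and recall into the F1 formula and simplifying gives an equivalent form in terms of the raw counts. With this example's counts (TP = 3, FP = 3, FN = 4, which follow from the labels and guesses in the first cell), it reproduces the printed result:

$$F_1 = \frac{2\,TP}{2\,TP + FP + FN} = \frac{2 \cdot 3}{2 \cdot 3 + 3 + 4} = \frac{6}{13} \approx 0.4615$$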


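The decision to use precision, recall, or F1 score ultimately comes down to the context of your classification. Maybe you don’t care if your classifier has a lot of false positives; in that case, precision doesn’t matter as much, and recall is the statistic to watch. If false positives are costly, precision matters more, and the F1 score is a reasonable choice when you need a balance of both.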

## Scikit-learn

```python
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

# Each function takes (y_true, y_pred); results should match the manual calculations above
print(accuracy_score(labels, guesses))
print(recall_score(labels, guesses))
print(precision_score(labels, guesses))
print(f1_score(labels, guesses))
```

```
0.3
0.42857142857142855
0.5
0.4615384615384615
```
