In [1]:
import pandas as pd
from sklearn.metrics import confusion_matrix

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee', 'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee', 'coffee', 'coffee', 'no coffee', 'no coffee'],
})
df

| | actual | prediction |
|---|---|---|
| 0 | coffee | no coffee |
| 1 | no coffee | no coffee |
| 2 | no coffee | coffee |
| 3 | coffee | coffee |
| 4 | coffee | coffee |
| 5 | coffee | coffee |
| 6 | no coffee | no coffee |
| 7 | coffee | no coffee |


In [2]:
pd.crosstab(df.actual, df.prediction)

| actual \ prediction | coffee | no coffee |
|---|---|---|
| coffee | 3 | 2 |
| no coffee | 1 | 2 |


In [3]:
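# List the negative class first so that row/column 0 is 'no coffee'
# and row/column 1 is 'coffee' (rows are actual, columns are predicted)
confusion_matrix(df.actual, df.prediction,
                 labels=('no coffee', 'coffee'))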

array([[2, 1],
       [2, 3]])

The matrix here represents the 4 possible outcomes of our classification task. Rows are the actual values and columns are the predicted values:

c[0,0]: There are 2 **True Negatives**, where we predicted the people don't like coffee and they really don't

c[0,1]: There is 1 **False Positive**, where we predicted the person likes coffee but they really don't

c[1,0]: There are 2 **False Negatives**, where we predicted those people don't like coffee, but they really do

c[1,1]: There are 3 **True Positives**, that is, 3 people really do like coffee and we predicted they do
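
The four cells described above can also be pulled out by name. The following is a minimal sketch that rebuilds the same data as plain lists; the variable names `tn`, `fp`, `fn`, and `tp` are chosen here for illustration and do not appear in the original notebook.

```python
from sklearn.metrics import confusion_matrix

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# Listing the negative class first lays the matrix out as [[TN, FP], [FN, TP]]
c = confusion_matrix(actual, prediction, labels=['no coffee', 'coffee'])

# ravel() flattens the matrix row by row, so the counts unpack in this order
tn, fp, fn, tp = c.ravel()

print(f'TN={tn}, FP={fp}, FN={fn}, TP={tp}')
# prints: TN=2, FP=1, FN=2, TP=3
```

This unpacking order only holds when the negative class is listed first in `labels`.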

_____________________________________

Here we are treating liking coffee as the positive case and not liking coffee as the negative case. This choice is arbitrary: we could just as well have chosen not liking coffee as the positive case and liking coffee as the negative case.

Either way, when discussing classification model performance, you'll see one outcome classified as positive and the other as negative.
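
To see what changes when the roles are swapped, the sketch below rebuilds the matrix with `'no coffee'` treated as the positive class. It assumes the same eight observations as the example data; the `_flipped` names are illustrative only.

```python
from sklearn.metrics import confusion_matrix

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# Now 'coffee' is listed first, so it plays the role of the negative class
# and 'no coffee' plays the role of the positive class
c_flipped = confusion_matrix(actual, prediction, labels=['coffee', 'no coffee'])
tn_flipped, fp_flipped, fn_flipped, tp_flipped = c_flipped.ravel()

# The same 8 predictions, relabeled: 3 TN, 2 FP, 1 FN, 2 TP
print(c_flipped)
```

The predictions themselves have not changed, only the names attached to each cell: the true positives and true negatives trade places, and so do the false positives and false negatives.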

### Baseline

- For a classification problem, a common choice for the baseline model is a model that simply predicts the most common class every single time.

In [5]:
df.actual.value_counts()

coffee       5
no coffee    3
Name: actual, dtype: int64

In our example, there are 5 coffee drinkers and 3 non-coffee drinkers, so our baseline model would be to predict that someone likes coffee every single time.

In [6]:
df['baseline_prediction'] = 'coffee'
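
Rather than hard-coding `'coffee'`, the most common class can be looked up from the data. This is a small sketch using only the `actual` column from the example; `most_common_class` is a name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
})

# idxmax() returns the index label (the class name) with the highest count
most_common_class = df.actual.value_counts().idxmax()

# The baseline predicts that one class for every observation
df['baseline_prediction'] = most_common_class

print(most_common_class)
# prints: coffee
```

scikit-learn's `DummyClassifier` with `strategy='most_frequent'` implements the same idea as a fitted model, which can be convenient when the baseline needs to slot into an existing pipeline.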

### Common Evaluation Metrics

---
<div class="alert alert-info">

Now that we have introduced the idea of a confusion matrix, we can discuss some metrics that are derived from it.

#### Accuracy

Accuracy is the number of times we predicted correctly divided by the total number of observations. Put another way:

$\frac{TP + TN}{TP + TN + FP + FN}$

In our example above, this would be

$\frac{3 + 2}{3 + 2 + 1 + 2} = \frac{5}{8} = 0.625$

So our model's overall accuracy is 62.5%.

Accuracy is a good, easy-to-understand metric, but it can fail to capture the whole picture when the classes in the original dataset are not evenly distributed.
</div>
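
To illustrate how accuracy can mislead on unevenly distributed classes, here is a sketch on made-up data. The 95/5 split below is hypothetical and is not part of the coffee dataset above. A model that always predicts the majority class scores well on accuracy while never identifying a single member of the minority class.

```python
import pandas as pd

# Hypothetical, deliberately imbalanced data: 95 non-drinkers, 5 drinkers
imbalanced = pd.DataFrame({'actual': ['no coffee'] * 95 + ['coffee'] * 5})

# A "model" that always predicts the majority class
imbalanced['prediction'] = 'no coffee'

# Share of all observations predicted correctly
accuracy = (imbalanced.prediction == imbalanced.actual).mean()

# Share of the actual coffee drinkers that were predicted correctly
drinkers = imbalanced[imbalanced.actual == 'coffee']
drinkers_found = (drinkers.prediction == drinkers.actual).mean()

print(f'accuracy: {accuracy:.2%}')
print(f'coffee drinkers identified: {drinkers_found:.2%}')
# prints:
# accuracy: 95.00%
# coffee drinkers identified: 0.00%
```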

In [7]:
model_accuracy = (df.prediction == df.actual).mean()
baseline_accuracy = (df.baseline_prediction == df.actual).mean()

print(f'   model accuracy: {model_accuracy:.2%}')
print(f'baseline accuracy: {baseline_accuracy:.2%}')

   model accuracy: 62.50%
baseline accuracy: 62.50%
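
As a cross-check, the model's accuracy can be recomputed from the confusion-matrix counts and with scikit-learn's `accuracy_score`. This sketch rebuilds the example data as plain lists; the names `accuracy_from_counts` and `accuracy_sklearn` are introduced here for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# Negative class first, so ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(
    actual, prediction, labels=['no coffee', 'coffee']).ravel()

# (TP + TN) / (TP + TN + FP + FN)
accuracy_from_counts = (tp + tn) / (tp + tn + fp + fn)

# The same quantity computed directly from the labels
accuracy_sklearn = accuracy_score(actual, prediction)

print(f'{accuracy_from_counts:.2%} {accuracy_sklearn:.2%}')
# prints: 62.50% 62.50%
```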


In [9]:
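# Recall: of the people who actually like coffee (the positive class),
# what share did we predict correctly?
subset = df[df.actual == 'coffee']

model_recall = (subset.prediction == subset.actual).mean()
baseline_recall = (subset.baseline_prediction == subset.actual).mean()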

print(f'   model recall: {model_recall:.2%}')
print(f'baseline recall: {baseline_recall:.2%}')

   model recall: 60.00%
baseline recall: 100.00%
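
In terms of the confusion matrix, recall is $\frac{TP}{TP + FN}$. The sketch below checks the model's figure both ways, using the counts from the matrix above and scikit-learn's `recall_score` with `pos_label='coffee'`. The data is rebuilt as plain lists so the block runs on its own.

```python
from sklearn.metrics import recall_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# Counts read off the confusion matrix: 3 true positives, 2 false negatives
tp, fn = 3, 2
recall_from_counts = tp / (tp + fn)

# pos_label tells scikit-learn which string is the positive class
recall_sklearn = recall_score(actual, prediction, pos_label='coffee')

print(f'{recall_from_counts:.2%} {recall_sklearn:.2%}')
# prints: 60.00% 60.00%
```

The baseline's 100% recall follows from the same formula: it predicts coffee for everyone, so it has no false negatives.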


In [10]:
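# Precision: of the people we predicted like coffee,
# what share actually do?
subset = df[df.prediction == 'coffee']
model_precision = (subset.prediction == subset.actual).mean()

# The baseline predicts 'coffee' for everyone, so this subset is the whole dataset
subset = df[df.baseline_prediction == 'coffee']
baseline_precision = (subset.baseline_prediction == subset.actual).mean()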

print(f'model precision: {model_precision:.2%}')
print(f'baseline precision: {baseline_precision:.2%}')


model precision: 75.00%
baseline precision: 62.50%
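
In terms of the confusion matrix, precision is $\frac{TP}{TP + FP}$. As a cross-check, the sketch below reproduces both figures with scikit-learn's `precision_score`. The data is rebuilt as plain lists, and `baseline_prediction` here is simply `'coffee'` repeated for every observation.

```python
from sklearn.metrics import precision_score

actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']

# The baseline predicts the most common class for everyone
baseline_prediction = ['coffee'] * len(actual)

# Model: 3 true positives out of 4 'coffee' predictions
model_precision = precision_score(actual, prediction, pos_label='coffee')

# Baseline: 5 true positives out of 8 'coffee' predictions
baseline_precision = precision_score(actual, baseline_prediction, pos_label='coffee')

print(f'model precision: {model_precision:.2%}')
print(f'baseline precision: {baseline_precision:.2%}')
# prints:
# model precision: 75.00%
# baseline precision: 62.50%
```

Because the baseline labels everyone as a coffee drinker, its precision equals the share of coffee drinkers in the data, which is also its accuracy.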
