# Classification Model Evaluation

Common ways of evaluating a __classification__ model's performance.
> A model is an algorithm/classifier that is fit to the training set.
 https://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html
 
1. __Confusion matrix__: a cross-tabulation of a model's predictions against the actual outcomes.
https://en.wikipedia.org/wiki/Confusion_matrix
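At its core, a confusion matrix is a count of every (actual, predicted) pair. Below is a minimal pure-Python sketch using the coffee data from the next cells. `cross_tabulate` is a hypothetical helper name, and the layout (rows = actual, columns = predicted) is chosen to match scikit-learn's `confusion_matrix`.

```python
from collections import Counter

# Same toy data as the coffee example: actual vs. predicted labels per person.
actual = ['coffee', 'no coffee', 'no coffee', 'coffee',
          'coffee', 'coffee', 'no coffee', 'coffee']
prediction = ['no coffee', 'no coffee', 'coffee', 'coffee',
              'coffee', 'coffee', 'no coffee', 'no coffee']


def cross_tabulate(actual, prediction, labels):
    """Hypothetical helper: count (actual, predicted) pairs.

    Rows follow `labels` for the actual outcome, columns follow `labels`
    for the prediction.
    """
    counts = Counter(zip(actual, prediction))
    return [[counts[(a, p)] for p in labels] for a in labels]


labels = ['coffee', 'no coffee']
matrix = cross_tabulate(actual, prediction, labels)
print(matrix)  # [[3, 2], [1, 2]]
```

Every person lands in exactly one cell, so the cells always sum to the number of rows (8 here).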

In [1]:
import pandas as pd
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import confusion_matrix

In [2]:
# A simplified example of model evaluation, to understand the
# fundamental tools used to evaluate models.
df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee',
                   'coffee', 'coffee', 'coffee', 'no coffee',
                   'no coffee'],
})
print("Our model predicts whether or not someone likes coffee.")
df

Our model predicts whether or not someone likes coffee.


| |actual|prediction|
|:---|:---|:---|
|0|coffee|no coffee|
|1|no coffee|no coffee|
|2|no coffee|coffee|
|3|coffee|coffee|
|4|coffee|coffee|
|5|coffee|coffee|
|6|no coffee|no coffee|
|7|coffee|no coffee|


In [5]:
# A confusion matrix with pandas: rows are predictions, columns are actual outcomes.
pd.crosstab(df.prediction, df.actual)

|prediction (rows) / actual (columns)|coffee|no coffee|
|:---|:---|:---|
|coffee|3|1|
|no coffee|2|2|


In [6]:
# confusion_matrix(y_true, y_pred): rows are actual outcomes, columns are
# predictions, so this is the transpose of the crosstab above.
confusion_matrix(df.actual, df.prediction)

array([[3, 2],
       [1, 2]])
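The pandas and scikit-learn tables hold the same counts with the axes swapped. A small check, rebuilding the same `df`; passing `labels=['coffee', 'no coffee']` is an explicit choice here (it matches the default sorted order) so the row/column order is not left implicit.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee',
                   'coffee', 'coffee', 'no coffee', 'no coffee'],
})

# pandas: rows = prediction, columns = actual (same call as In [5]).
ct = pd.crosstab(df.prediction, df.actual)

# scikit-learn: rows = actual (y_true), columns = prediction (y_pred).
cm = confusion_matrix(df.actual, df.prediction,
                      labels=['coffee', 'no coffee'])

# Same counts, transposed.
print(np.array_equal(ct.to_numpy().T, cm))  # True
```

When reading off "top right" or "bottom left", it matters which of the two layouts you are looking at.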

> Working through this simple example, I understand the contents and layout of a confusion matrix!

|Position in the crosstab|Outcome|Prediction|Actual|# of People|
|:---|:---|:---|:---|:---|
|Top Left|True Positive|coffee|coffee|3|
|Bottom Right|True Negative|no coffee|no coffee|2|
|Top Right|False Positive/Type I Error|coffee|no coffee|1|
|Bottom Left|False Negative/Type II Error|no coffee|coffee|2|
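The four counts in the table can be pulled straight out of scikit-learn. This sketch assumes 'coffee' is the positive class (as in the table; scikit-learn does not decide this for you). With `labels=[negative, positive]`, a binary confusion matrix is `[[TN, FP], [FN, TP]]`, so `.ravel()` returns the counts in that order. With the default sorted labels, 'no coffee' would end up in the "positive" slot instead.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee',
                   'coffee', 'coffee', 'no coffee', 'no coffee'],
})

# Order labels as [negative, positive] so ravel() gives TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(
    df.actual, df.prediction, labels=['no coffee', 'coffee']
).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=2 FP=1 FN=2
```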


|Outcome|Plain English|Real-life outcome if put into production|
|:---|:---|:---|
|True Positive|Jarvis (our coffee model) predicts a person likes coffee and they do like coffee.|Customer gets coffee. OK.|
|True Negative|Jarvis predicts a person does not like coffee and they do not like coffee.|Customer does not get coffee. OK.|
|False Positive|Jarvis predicts a person likes coffee and they do not like coffee.|Customer gets coffee they didn't ask for. Awkward...|
|False Negative|Jarvis predicts a person does not like coffee and they do like coffee.|Customer doesn't get coffee they wanted. Karen transforms into Godzilla.|
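To see which of the four outcomes each person falls into, every row can be labeled directly. A minimal sketch, again assuming 'coffee' is the positive class; the `outcome` column is a hypothetical addition for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee',
                   'coffee', 'coffee', 'no coffee', 'no coffee'],
})

# Positive class = 'coffee'.
actual_pos = df.actual == 'coffee'
pred_pos = df.prediction == 'coffee'

# First matching condition wins; anything left over is a false negative.
df['outcome'] = np.select(
    [actual_pos & pred_pos,      # predicted coffee, likes coffee
     ~actual_pos & ~pred_pos,    # predicted no coffee, doesn't like it
     ~actual_pos & pred_pos],    # predicted coffee, doesn't like it
    ['TP', 'TN', 'FP'],
    default='FN',                # predicted no coffee, likes coffee
)
print(df.outcome.value_counts())
```

The counts per label reproduce the "# of People" column above.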

## Baseline model
__DummyClassifier__ is a classifier that makes predictions using simple rules. This classifier is useful as a simple baseline to compare with other (real) classifiers.

https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html#sklearn.dummy.DummyClassifier
<div class="alert alert-block alert-danger">
Do not use it for real problems.
</div>

In [None]:
# Create a dummy classifier with strategy='most_frequent'.
# It always predicts the most frequent class in y, i.e. the ACTUAL outcomes.
baseline_classifier = DummyClassifier(strategy='most_frequent')

# Fit the dummy classifier. DummyClassifier ignores X (df.prediction is only
# a placeholder here) and learns the class frequencies of y (df.actual).
baseline_classifier.fit(df.prediction, df.actual)
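Because a `DummyClassifier` never looks at the features, what you pass as `X` does not change its predictions. A quick check under that claim: two arbitrary placeholder feature matrices (all zeros and random noise, both made up for illustration) are fit against the same `y`.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee',
                   'coffee', 'coffee', 'no coffee', 'no coffee'],
})

# Two unrelated feature matrices, one row per person.
rng = np.random.default_rng(0)
X_zeros = np.zeros((len(df), 1))
X_noise = rng.normal(size=(len(df), 3))

# Same strategy and same y (the actual outcomes) in both fits.
pred_zeros = (DummyClassifier(strategy='most_frequent')
              .fit(X_zeros, df.actual).predict(X_zeros))
pred_noise = (DummyClassifier(strategy='most_frequent')
              .fit(X_noise, df.actual).predict(X_noise))

# Identical predictions: the features never influence the dummy.
print((pred_zeros == pred_noise).all())  # True
```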

In [18]:
# The dummy classifier Pokémon evolves into its final form, Dummy Model.
# The model predicts the most frequent class in df.actual (the y it was fit on),
# not in df.prediction. df.actual has 5 'coffee' and 3 'no coffee',
# so the classifier predicts that everyone likes coffee. If this
# model were used in an ML product, it would give everyone coffee.
# Oprah would be proud.
baseline_classifier.predict(df.prediction) # EVERYONE LIKES COFFEE!!!
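The constant answer comes from `y`, not from the model's predictions: `df.prediction` is actually a 4/4 tie, while `df.actual` is 5 'coffee' to 3 'no coffee'. A short sketch that compares both against the fitted dummy's `class_prior_` attribute (the class frequencies it learned from `y`).

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee',
                   'coffee', 'coffee', 'no coffee', 'no coffee'],
})

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(df[['prediction']], df.actual)

# Class counts in y (what the dummy learns from) vs. in the model's predictions.
actual_counts = df.actual.value_counts()          # coffee: 5, no coffee: 3
prediction_counts = df.prediction.value_counts()  # a 4/4 tie

# 'most_frequent' returns the mode of y.
majority = actual_counts.idxmax()
print(majority)               # coffee
print(baseline.class_prior_)  # [0.625 0.375]
```

`class_prior_` follows the sorted class order (`'coffee'`, `'no coffee'`), so it reads as 5/8 and 3/8.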

array(['coffee', 'coffee', 'coffee', 'coffee', 'coffee', 'coffee',
       'coffee', 'coffee'], dtype='<U6')

In [19]:
# But IRL it is right only 5/8 times (62.5% accuracy): once for each person who actually likes coffee.
baseline_classifier.score(df.prediction, df.actual)

0.625
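The baseline is only useful when it is compared with the real model on the same metric. A sketch using `sklearn.metrics.accuracy_score` (share of rows where prediction equals actual) for both the coffee model's predictions and the dummy's.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

df = pd.DataFrame({
    'actual': ['coffee', 'no coffee', 'no coffee', 'coffee',
               'coffee', 'coffee', 'no coffee', 'coffee'],
    'prediction': ['no coffee', 'no coffee', 'coffee', 'coffee',
                   'coffee', 'coffee', 'no coffee', 'no coffee'],
})

baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(df[['prediction']], df.actual)

# Accuracy = (TP + TN) / total.
model_accuracy = accuracy_score(df.actual, df.prediction)
baseline_accuracy = accuracy_score(df.actual,
                                   baseline.predict(df[['prediction']]))
print(model_accuracy, baseline_accuracy)  # 0.625 0.625
```

On these eight rows the model's accuracy, (3 + 2) / 8, equals the baseline's, so by accuracy alone it is no better than giving everyone coffee.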