# Accuracy Metrics Lab

## Measuring Classifier Performance Metrics

## Objective:

* Create and interpret a Confusion Matrix.
* Calculate Precision, and identify use cases for this metric.
* Calculate Recall, and identify use cases for this metric.
* Calculate Accuracy, and identify use cases for this metric.
* Calculate F-score, and identify use cases for this metric.

## Measuring Classifier Performance

For this lab, we're going to focus on the different ways we can measure the performance of a classifier.  We'll focus on the following metrics:

* **_Precision_**
* **_Recall_**
* **_Accuracy_**
* **_F1-Score_**

In order to calculate these different metrics, we'll be creating a **_Confusion Matrix_**.  This will allow us to keep track of whether the model got each question right or wrong, as well as _how_ the model got it right or wrong (more on this below).

## The Dataset

We'll quickly create a Decision Tree Classifier to make predictions for a dataset we're already familiar with: The _Titanic Dataset_.  

Run the cell below to import the data.

In [1]:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

raw_df = pd.read_csv("titanic.csv")
raw_df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,male,35.0,0,0,373450,8.05,,S


Next, we'll clean the dataset by removing unneeded columns.  We'll also binary-encode the `'Sex'` column, as well as one-hot encode the 'Embarked' column (creating a binary-encoded column for each possible category in the `'Embarked'` column).  

We'll also remove null values, and then store the target column in a separate column and remove it from the dataset. 

In [2]:
# Drop unneeded columns
intermediate_df1 = raw_df.drop(['PassengerId', 'Ticket', 'Cabin'], axis=1, inplace=False)

# Binary encode Sex column
intermediate_df1['Sex'] = intermediate_df1['Sex'].map({'male': 0, 'female': 1})

# Remove Null values
intermediate_df2 = intermediate_df1.dropna(inplace=False)

# one-hot encode Embarked column
intermediate_df3 = pd.get_dummies(intermediate_df2)

# Store labels separately and drop from dataframe
target = intermediate_df3['Survived']
clean_df = intermediate_df3.drop('Survived', axis=1, inplace=False)
clean_df.head()

Unnamed: 0,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked_C,Embarked_Q,Embarked_S
0,3,0,22.0,1,0,7.25,0,0,1
1,1,1,38.0,1,0,71.2833,1,0,0
2,3,1,26.0,0,0,7.925,0,0,1
3,1,1,35.0,1,0,53.1,0,0,1
4,3,0,35.0,0,0,8.05,0,0,1


## Modeling

Now that our data is ready, we'll split our data into training and testing sets. 

In [3]:
X_train, X_test, y_train, y_test = train_test_split(clean_df, target)

Now that we have split our data into training and testing sets, we'll fit a **_Decision Tree Classifier_** and use it to make predictions on our testing set.  

In [4]:
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

Now that we have some predictions, we can start the lab in earnest. 

We'll start by building a **_Confusion Matrix_**.

## Confusion Matrix

Let's start by examining the first 10 predictions from the model, and seeing how it compares to the the ground truth labels stored in `x_test`.  

In [20]:
list_y_test = list(y_test)

for ind, val in enumerate(preds[:10]):
    print('prediction for {}: {}'.format(ind + 1, val))
    print('    actual for {}: {}'.format(ind + 1, list_y_test[ind]))

prediction for 1: 1
    actual for 1: 1
prediction for 2: 1
    actual for 2: 1
prediction for 3: 0
    actual for 3: 0
prediction for 4: 0
    actual for 4: 1
prediction for 5: 1
    actual for 5: 0
prediction for 6: 0
    actual for 6: 0
prediction for 7: 1
    actual for 7: 1
prediction for 8: 0
    actual for 8: 0
prediction for 9: 1
    actual for 9: 1
prediction for 10: 1
    actual for 10: 1


There are four different outcomes possible here.  Let's examine each of them:

* **_True Positive_**: Predicted a 1, and it's actually a 1. (Examples 1, 2, 7, 9, and 10)
* **_True Negative_**: Predicted a 0, and it's actually a 0. (Examples 3, 6, and 8)
* **_False Positive_**: Predicted a 1, and it's actually a 0. (Example 5)
* **_False Negative_**: Predicted a 0, and it's actually a 1. (Example 4)

A **_Confusion Matrix_** is a tally of the total numbers of each type of outcome.  Take a look at a picture of a sample confusion matrix below:

<center><img src='confusion_matrix.png'></center>

A python dictionary is the perfect data type for building a confusion matrix.  Below, we'll write a function that takes in the predictions and labels, and returns the confusion matrix as a dictionary. 

In [23]:
def confusion_matrix(predicted, actual):
    cm = {
        'TP': 0,
        'TN': 0,
        'FP': 0,
        'FN': 0
    }
    
    for ind, val, in enumerate(list(actual)):
        pred = predicted[ind]
        if val == 0:
            if pred == 0:
                cm['TN'] += 1
            else:
                cm['FN'] += 1
        else:
            if pred == 1:
                cm['TP'] += 1
            else:
                cm['FP'] += 1
    return cm

cm = confusion_matrix(preds, y_test)
cm

{'FN': 18, 'FP': 19, 'TN': 87, 'TP': 54}

## Accuracy Metrics

Now that we have our confusion matrix, we can start calculating the accuracy metrics we're interested in. 

### Precision
**_Precision_** measures the percentage of positive predictions that were correct.  The equation for calculating Precision is:
<br>
<br>
<center>
$$
\normalsize 
Precision = \frac{True\ Positives}{True\ Positives + False\ Positives}
$$  
</center>


Write a function that takes in a confusion matrix and returns the precision score in the cell below.

In [27]:
def precision(cm):
    return cm['TP'] / (cm['TP'] + cm['FP'])

precision_score = precision(cm)
print('Precision: {}'.format(precision_score))
print('Expected:  0.7397260273972602')

Precision: 0.7397260273972602
Expected:  0.7397260273972602


### Recall
**_Recall_** measures the number of percentage of positive cases that our model caught, out of all positive cases. The equation for calculating recall is:
<br>
<br>
<center>
$$
\normalsize 
Recall = \frac{True\ Positives}{True\ Positives + False\ Negatives}
$$  
</center>

Write a function that takes in a confusion matrix and returns the recall score in the cell below.

In [31]:
def recall(cm):
    return cm['TP'] / (cm['TP'] + cm['FN'])

recall_score = recall(cm)
print('Recall:    {}'.format(recall_score))
print('Expected:  0.75')

Recall:    0.75
Expected:  0.75


### Accuracy
**_Accuracy_** measures the percentage of predictions we got correct, out of all possible predictions.  The Equation for calculating Accuracy is:
<br>
<br>
<center>
$$
\normalsize 
Accuracy = \frac{True\ Positives + True\ Negatives}{True\ Positives + True\ Negatives + False\ Positives + False\ Negatives}
$$  
</center>

Write a function that takes in a confusion matrix and returns the Accuracy score in the cell below.

In [35]:
def accuracy(cm):
    return (cm['TP'] + cm['TN']) / (cm['TP'] + cm['TN'] + cm['FP'] + cm['FN'])

accuracy_score = accuracy(cm)
print('Accuracy:    {}'.format(accuracy_score))
print('Expected:    0.7921348314606742')

Accuracy:    0.7921348314606742
Expected:    0.7921348314606742


### F1-Score
**_F1-Score_** is the _harmonic mean_ of precision and recall.  Because this is a weighted average of precision and recall, it is generally considered the most accurate measure of test performance.  The equation for calculating F1-Score is:
<br>
<br>
<center>
$$
\normalsize 
F1 = 2 * \frac{Precision * Recall}{Precision + Recall}
$$  
</center>

Write a function that takes in a confusion matrix and returns the F1-score in the cell below.

In [39]:
def f1_score(cm):
    p = precision(cm)
    r = recall(cm)
    return 2 * p * r / (p + r)

f1 = f1_score(cm)
print('f1-score:    {}'.format(f1))
print('Expected:    0.7448275862068966')

f1-score:    0.7448275862068966
Expected:    0.7448275862068966


There you have it--now you know how to calculate the most important accuracy metrics for classification using a confusion matrix, as well as an intuition for which metrics are most useful depending on the problem you're trying to solve!

## Summary

In this lab, we learned how to:
* Create a **_Confusion Matrix_**
* What **_True Positive, True Negative, False Positive,_** and **_False Negative_** mean in classification
* What **_Precision_** is, and how to calculate it
* What **_Recall_** is, and how to calculate it
* What **_Accuracy_** is, and how to calculate it
* What **_F1-Score_** is, and how to calculate it