# Other Classification Performance Measures

So far we've been introduced to the concept of classification and we've learned a basic algorithm, $k$-nearest neighbors.

In this notebook we'll discuss additional performance measures for classification problems.

## What We'll Accomplish

- Introduce the confusion matrix
- Discuss precision, recall, the false positive rate, and the true positive rate
- Look at ROC curves and the area beneath them

Let's go!

In [1]:
# to get the iris data
from sklearn.datasets import load_iris

# for data handling 
import pandas as pd
import numpy as np

# for plotting
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")

## Accuracy Isn't Everything

Sometimes accuracy is a misleading measure. Suppose your dataset has an extreme split, say $90\%$ in class $0$ and $10\%$ in class $1$. In that instance we could build a $90\%$ accurate classifier by just labeling everything as $0$. However, that classifier would not correctly identify a single class $1$ observation. This would be awful if, for instance, class $1$ represented a person who has a disease.

It is thus important to consider other performance measures when deciding if a classifier is good.
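To make the point concrete, here is a minimal sketch with made-up labels (the arrays `y_true` and `y_pred` are hypothetical, not the iris data). A classifier that always predicts $0$ on a $90/10$ split reaches $90\%$ accuracy while detecting none of the class $1$ observations.

```python
import numpy as np

# hypothetical labels: 90 observations of class 0, 10 of class 1
y_true = np.array([0] * 90 + [1] * 10)

# a "classifier" that labels everything as 0
y_pred = np.zeros(len(y_true), dtype=int)

# fraction of all observations labeled correctly
accuracy = np.mean(y_true == y_pred)

# fraction of the actual class 1 observations that were detected
detected = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)

print("accuracy:", accuracy)
print("fraction of class 1 detected:", detected)
```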

### Now Things Start To Get Confusing

Additional performance measures are derived from the confusion matrix, pictured for binary problems below.

<img src="conf_mat.png" alt="Confusion Matrix Image" style="width:50%;">

Here the diagonal of the box represents data points that were correctly predicted by the algorithm, while the off-diagonal represents points that were incorrectly predicted. Each box of the confusion matrix contains a count of how the algorithm sorted the data. For instance, the TP box holds the total number of correct positive classifications (observations correctly classified as $1$) the algorithm made. (<i>Note that you can extend the confusion matrix to a multiclass problem by just adding rows and columns accordingly. However, we'll lose the true positive/true negative nomenclature</i>.)
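As a rough illustration of the multiclass case, the sketch below builds a $3 \times 3$ confusion matrix with `sklearn`'s `confusion_matrix` function from hypothetical labels (`y_true_toy` and `y_pred_toy` are invented for this example). Rows correspond to the actual class and columns to the predicted class, so the diagonal still counts the correct predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical three-class labels, made up for illustration
y_true_toy = [0, 0, 1, 1, 2, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0, 2]

# rows are the actual classes, columns are the predicted classes
cm = confusion_matrix(y_true_toy, y_pred_toy)
print(cm)

# the diagonal holds the correctly classified counts for each class
n_correct = np.trace(cm)
print("correct predictions:", n_correct, "out of", cm.sum())
```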

Two popular measures derived from the confusion matrix are the algorithm's <i>precision</i> and <i>recall</i>:

$$
\text{precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}, \text{ out of all points predicted to be class } 1, \text{ what fraction were actually class } 1.
$$
$$
\text{recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \text{ out of all the actual data points in class } 1 \text{, what fraction did the algorithm correctly predict?}
$$

You can think of precision as how much you should trust the algorithm when it says something is class $1$. Recall estimates the probability that the algorithm correctly detects class $1$ data points.
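The two formulas translate directly into code. The sketch below uses hypothetical counts (`tp`, `fp`, and `fn` are made up for illustration and do not come from the iris classifier).

```python
# hypothetical confusion matrix counts
tp = 40   # predicted 1, actually 1
fp = 10   # predicted 1, actually 0
fn = 20   # predicted 0, actually 1

# out of everything predicted to be class 1, the fraction that really is class 1
precision = tp / (tp + fp)

# out of everything that really is class 1, the fraction that was predicted as class 1
recall = tp / (tp + fn)

print("precision:", round(precision, 4))
print("recall:", round(recall, 4))
```

With these counts the algorithm is right $80\%$ of the time when it says class $1$, but it only finds two thirds of the actual class $1$ points.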

You've likely heard of these types of measures in all the news stories about COVID-19 tests, as they are quite important in the field of public health.

Let's examine the training precision and recall for a virginica classifier using the iris data.

In [2]:
## Load the data
iris = load_iris()
iris_df = pd.DataFrame(iris['data'],columns = ['sepal_length','sepal_width','petal_length','petal_width'])

## Create a virginica variable
## this will be our target
iris_df['virginica'] = 0 
iris_df.loc[iris['target'] == 2,'virginica'] = 1

X = iris_df[['sepal_length','sepal_width','petal_length','petal_width']].to_numpy()
y = iris_df['virginica'].to_numpy()

In [3]:
from sklearn.model_selection import train_test_split

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.25, 
                                                    random_state=111,
                                                    stratify=y)

Now we'll build a $k$-nearest neighbor classifier using $k=10$. We'll then examine the confusion matrix on the training data.

In [5]:
## import Nearest Neighbors
from sklearn.neighbors import KNeighborsClassifier

In [6]:
## Make the model object
knn = KNeighborsClassifier(n_neighbors = 10)

In [7]:
## Fit the model object
knn.fit(X_train,y_train)

KNeighborsClassifier(n_neighbors=10)

In [8]:
## get the predictions
y_train_pred = knn.predict(X_train)

In [9]:
## now we can import the confusion matrix
## function from sklearn
from sklearn.metrics import confusion_matrix

Confusion matrix docs, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html">https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html</a>.

In [10]:
confusion_matrix(y_train, y_train_pred)

array([[73,  2],
       [ 3, 34]])

In this example we have $73$ true negatives (TN), $2$ false positives (FP), $3$ false negatives (FN) and $34$ true positives (TP).
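`sklearn` puts the actual classes on the rows and the predicted classes on the columns, so for a binary problem the four counts can be unpacked in one line with `.ravel()`. Here is a small sketch that retypes the matrix printed above as `conf_mat_train` (a name introduced only for this illustration) and uses the counts to recover the training accuracy.

```python
import numpy as np

# the training confusion matrix from the output above, typed in by hand
conf_mat_train = np.array([[73, 2],
                           [3, 34]])

# ravel flattens row by row: [[tn, fp], [fn, tp]] -> tn, fp, fn, tp
tn, fp, fn, tp = conf_mat_train.ravel()

# accuracy is the diagonal total divided by the grand total
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("tn:", tn, "fp:", fp, "fn:", fn, "tp:", tp)
print("training accuracy:", np.round(accuracy, 4))
```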

### You Code

You write code to calculate the precision and recall for this classifier.

#### Calculate the Recall and Precision by Hand

Using the output of the `sklearn` `confusion_matrix` function, calculate the precision and recall by hand.

In [None]:
## Code here





In [None]:
## Print out the precision and recall here




#### Use `sklearn` to Calculate the Precision and Recall

Look at the following documentation to figure out how you can use `sklearn` to calculate the recall or precision score.

- `precision_score` docs, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html">https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html</a>
- `recall_score` docs, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html">https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html</a>.

In [None]:
## Code here
## Import the functions from sklearn





In [None]:
## Code here
## calculate the recall and precision here





### The ROC Curve

Another way to measure the performance of your classifier is the <i>receiver operating characteristic</i> (ROC) curve.

This plots the <i>true positive rate</i> (tpr), i.e. the recall, against the <i>false positive rate</i> (fpr). Returning to the confusion matrix, the fpr is:
$$
\text{fpr} = \frac{\text{FP}}{\text{FP}+\text{TN}} = 1 - \frac{\text{TN}}{\text{FP} + \text{TN}} = 1 - \text{specificity},
$$
where specificity, a term used a lot in public health, is another name for the true negative rate. Public Health so hot right now.
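As a quick check of that identity, here is a sketch using two counts from the training confusion matrix earlier in the notebook ($73$ true negatives and $2$ false positives). The variable names are only for illustration.

```python
# counts taken from the training confusion matrix
tn = 73   # actual 0, predicted 0
fp = 2    # actual 0, predicted 1

# false positive rate: fraction of actual 0s that were labeled 1
fpr = fp / (fp + tn)

# specificity (true negative rate): fraction of actual 0s that were labeled 0
specificity = tn / (fp + tn)

print("fpr:", round(fpr, 4))
print("specificity:", round(specificity, 4))
print("fpr + specificity:", round(fpr + specificity, 4))
```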

Let's plot an ROC curve for our $k$-nearest neighbors model.

#### Prediction Probabilities

In order to get arrays of true positive and false positive rates for a particular classifier you need a vector of probabilities (the probability that each observation is the class of interest). You then calculate the true and false positive rates for various probability cutoffs.

Let's see what we mean.

In [None]:
## This gets the probabilities

## Note this is common for most of sklearn's classification algorithms
probs = knn.predict_proba(X_train)

In [None]:
## The first column is the probability that the observation is 0
## The second column is the probability that the observation is 1
probs
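One thing worth noticing about these probabilities: with `sklearn`'s default uniform weights, a $k$-nearest neighbors classifier estimates the probability of class $1$ as the fraction of the $k$ neighbors that are in class $1$, so only the values $0, 1/k, 2/k, \dots, 1$ can occur. The sketch below checks this on synthetic data (the data, `toy_knn`, and the choice $k=5$ are assumptions made for the example, not part of the iris analysis).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# synthetic two-feature data with a noisy binary label
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 2))
y_toy = (X_toy[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)

# fit a k-nearest neighbors classifier with the default uniform weights
k = 5
toy_knn = KNeighborsClassifier(n_neighbors=k).fit(X_toy, y_toy)

# probability of class 1 for each observation
toy_probs = toy_knn.predict_proba(X_toy)[:, 1]

# the distinct probabilities are all multiples of 1/k
distinct = np.unique(toy_probs)
print(distinct)
```

This suggests that only a handful of cutoffs actually change the predictions of a $k$-nearest neighbors model.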

In [None]:
## Now let's calculate the TPR and FPR for a
## cutoff of .4
cutoff = .4

## predict 1 unless the probability of class 1 is below the cutoff
y_train_pred = np.ones(len(y_train))
y_train_pred[probs[:,1] < cutoff] = 0

conf_mat = confusion_matrix(y_train, y_train_pred)
tp = conf_mat[1,1]
tn = conf_mat[0,0]
fn = conf_mat[1,0]
fp = conf_mat[0,1]

## report both rates as percentages, rounded to two decimal places
print("The false positive rate is",np.round(100*fp/(tn+fp),2))
print("The true positive rate is",np.round(100*tp/(tp+fn),2))

We can now build a loop to get the tprs and fprs for different cutoffs.

In [None]:
cutoffs = np.arange(0,1.01,.01)

tprs = []
fprs = []

for cutoff in cutoffs:
    y_pred = np.ones(len(probs))
    y_pred[probs[:,1] < cutoff] = 0 

    # tpr = tp/(tp + fn)
    tpr = np.sum(y_pred[y_train == 1])/np.sum(y_train == 1)

    # fpr = fp/(fp + tn)
    fpr = np.sum(y_pred[y_train == 0])/np.sum(y_train==0)
    
    tprs.append(tpr)
    fprs.append(fpr)

In [None]:
plt.figure(figsize=(12,8))

plt.plot(fprs,tprs)

plt.xlabel("False Positive Rate",fontsize=16)
plt.ylabel("True Positive Rate",fontsize=16)

plt.xticks(fontsize=12)
plt.yticks(fontsize=12)

plt.title("ROC Curve", fontsize=18)

plt.show()

### You Code

Go to the documentation for the `sklearn` function `roc_curve`, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html">https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html</a>, and figure out how you can use it instead of a for loop to plot the ROC curve.

In [None]:
## Import roc_curve here





In [None]:
## use roc_curve to get the fpr, tpr and cutoffs
## for the training data and the corresponding probabilities






In [None]:
## Plot the corresponding ROC Curve here





#### AUC - (Do this in the breakout session)

ROC curves come with an additional measure called AUC (area under the curve). An AUC of $1$ corresponds to a perfect classifier, while an AUC of $.5$ is what you'd expect from random guessing (for a binary classifier). So what is a good AUC? It's hard to say for a single classifier, but AUC can be used to compare multiple classifiers. For example, if you're choosing between a classifier with AUC $.8$ and one with AUC $.85$, you'd go with the one that has AUC $.85$.

Let's see how to calculate AUC with `sklearn`.

In [None]:
## import the function from sklearn
from sklearn.metrics import roc_auc_score

`roc_auc_score` docs, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html">https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html</a>.

In [None]:
## put in the true classes along with the 
## probability scores.
roc_auc_score(y_train,probs[:,1])
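To see the two reference values mentioned above, the following sketch scores synthetic labels twice with `roc_auc_score`: once with scores that separate the classes perfectly and once with scores drawn at random. The labels, scores, and seed are made up for this illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# synthetic binary labels
rng = np.random.default_rng(440)
y_fake = rng.integers(0, 2, size=5000)

# scores that rank every class 1 observation above every class 0 observation
perfect_scores = y_fake.astype(float)

# scores that carry no information about the label
random_scores = rng.random(5000)

auc_perfect = roc_auc_score(y_fake, perfect_scores)
auc_random = roc_auc_score(y_fake, random_scores)

print("AUC with perfect scores:", auc_perfect)
print("AUC with random scores:", np.round(auc_random, 3))
```

The first AUC is exactly $1$, and the second should land close to $.5$.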

Compare the training set AUC scores of $k$-nearest neighbors classifiers with $k=1$ all the way to $k=20$. Which one performs best on the training set?

In [None]:
## Code here
## Write a loop for getting the aucs for different
## values of k here
for k in range(1,21):
    pass ## replace this line with your code
    
    
    

In [None]:
## Plot the training AUC score as a function of
## the number of neighbors here





In [None]:
## Plot the roc_curve of the model with best
## training AUC here






That's it for this notebook!

The important takeaway from this notebook is that we need to be careful about which performance metric we use for classification problems. Anytime you work on a classification project, think about what the end goal is for either yourself or your use case. That should inform which performance measure you prioritize.

Next time we'll learn about logistic regression!

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2021.

Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).