# Visualizing the Confusion Matrix, ROC Curve, and Precision-Recall Curve

Author: Dr. Elaina A. Hyde

---

The interactive visualization below lets you see how the confusion matrix, ROC curve, and precision-recall curve interact.

The model is a logistic regression fit on the cancer vs. healthy data. The raw datapoints are shown along with the prediction curve (right panel). 

You can change the threshold and see how it affects the confusion matrix, as well as where that threshold is on the corresponding ROC or precision-recall curve (left panel). 
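
To make the link between the threshold and the confusion matrix concrete, here is a minimal sketch. The labels and probabilities are made-up illustration values, not the notebook's data, and `confusion_at` is a hypothetical helper name. Each threshold produces one confusion matrix, which in turn gives one (FPR, TPR) point on the ROC curve.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.8, 0.9])

def confusion_at(threshold):
    """Label 1 when the predicted probability is at or above the threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tn, fp, fn, tp

for t in (0.25, 0.5, 0.75):
    tn, fp, fn, tp = confusion_at(t)
    tpr = tp / (tp + fn)  # y-coordinate of this threshold on the ROC curve
    fpr = fp / (fp + tn)  # x-coordinate of this threshold on the ROC curve
    print(f"threshold={t:.2f}  TN={tn} FP={fp} FN={fn} TP={tp}  "
          f"ROC point=({fpr:.2f}, {tpr:.2f})")
```

Raising the threshold trades false positives for false negatives, so the point moves toward the lower-left corner of the ROC plot.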


In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

plt.style.use('fivethirtyeight')

from ipywidgets import *
from IPython.display import display

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

---

**The visualization allows you to change 3 variables:**
- **spread**: the dispersion of the data (impacts the signal and how well the classifier can discriminate between the points).
- **threshold**: the decision threshold for labeling 1 vs. 0.
- **cancer %**: the percentage of datapoints that are cancer rather than healthy. This helps show the effect of class imbalance on classifier performance and metrics.
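
The three controls can be imitated outside the widget with a small simulation. This is only a sketch under stated assumptions, not the code in `roc_plotter.py`: the two classes are drawn from Gaussians centered at -1 and +1, `spread` is taken to be their standard deviation (so its scale differs from the slider's), and `simulate` and `fit_and_score` are hypothetical helper names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def simulate(spread=1.0, cancer_pct=0.5, n=400, seed=0):
    """Draw one feature for healthy (0) and cancer (1) points (assumed Gaussian)."""
    rng = np.random.default_rng(seed)
    n_cancer = int(round(n * cancer_pct))
    n_healthy = n - n_cancer
    x = np.concatenate([rng.normal(-1.0, spread, n_healthy),
                        rng.normal(1.0, spread, n_cancer)])
    y = np.concatenate([np.zeros(n_healthy, dtype=int),
                        np.ones(n_cancer, dtype=int)])
    return x.reshape(-1, 1), y

def fit_and_score(spread, cancer_pct, threshold=0.5):
    """Fit a logistic regression and summarize it at the given threshold."""
    X, y = simulate(spread, cancer_pct)
    prob = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    y_pred = (prob >= threshold).astype(int)
    return {"auc": roc_auc_score(y, prob),
            "positive_rate": y.mean(),
            "predicted_positive_rate": y_pred.mean()}

for spread, cancer_pct in [(0.5, 0.5), (3.0, 0.5), (1.0, 0.1)]:
    print(spread, cancer_pct, fit_and_score(spread, cancer_pct))
```

A larger spread makes the classes overlap more and lowers the AUC, while a small cancer percentage shifts how many points the model labels 1 at a fixed threshold.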

In [3]:
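import importlib.util

# Load roc_plotter.py from the working directory.
# (The imp module used previously was removed in Python 3.12.)
spec = importlib.util.spec_from_file_location('plotter', 'roc_plotter.py')
plotter = importlib.util.module_from_spec(spec)
spec.loader.exec_module(plotter)
ROCLogisticPlotter = plotter.ROCLogisticPlotter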

roc_plotter = ROCLogisticPlotter()
roc_plotter.preconstruct_data()
roc_plotter.roc_interact()

Constructing data


*[Interactive widget: sliders for spread, threshold, and cancer %, followed by the plot panels. It renders only in a live notebook.]*

---

### Relevant classification metrics

This reference table describes some of the important metrics displayed in the visualization above.

| Metric | Description |
|---|---|
|**TPR/RECALL** | The true positive rate, also known as **sensitivity** or **recall**: the fraction of actual members of a class that the classifier correctly identifies. This is the y-axis on the ROC curve. <br><br>`Recall = True Positives / (True Positives + False Negatives)`<br><br>A recall of 1 indicates that the classifier correctly predicted all observations of the class. A recall of 0 means the classifier predicted all observations of the current class incorrectly.|
|**FPR** | The false positive rate: the fraction of actual class 0 observations that the model incorrectly predicts as 1. This is the x-axis on the ROC curve.<br><br>`FPR = False Positives / (False Positives + True Negatives)`<br><br>|
|**PRECISION** | The fraction of observations predicted to be in a class that actually belong to it, i.e. the ability of the classifier to avoid labeling members of another class as the current class. <br><br>`Precision = True Positives / (True Positives + False Positives)`<br><br>_A precision score of 1 indicates that every observation labeled as the current class truly belongs to it. A precision score of 0 means that none of the observations labeled as the current class belong to it._ |
|**AUC** | The area under the curve: this can refer to either the ROC curve or the precision-recall curve. For the ROC curve, an area of 0.50 is the baseline: it is the area obtained when the classifier predicts at chance. An AUC of 1.0 is a perfect model, where the classifier never makes a mistake. For the precision-recall curve, the chance baseline is instead the fraction of positive observations in the data, so it moves with class imbalance. <br><br>|
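
The formulas in the table can be checked against scikit-learn on a tiny example. The labels and probabilities below are made-up illustration values, not the notebook's data; the sketch computes each metric from raw counts at a 0.5 threshold and then the area under both curves with `roc_curve`, `precision_recall_curve`, and `auc`.

```python
import numpy as np
from sklearn.metrics import (auc, precision_recall_curve, precision_score,
                             recall_score, roc_curve)

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.3, 0.45, 0.7, 0.4, 0.6, 0.8, 0.9])
y_pred = (y_prob >= 0.5).astype(int)

# Confusion-matrix counts computed by hand.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))

recall = tp / (tp + fn)      # TPR
fpr = fp / (fp + tn)         # FPR
precision = tp / (tp + fp)

# The hand-computed values agree with scikit-learn's scorers.
print(recall, recall_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(fpr)

# Area under each curve, sweeping over all thresholds.
fpr_curve, tpr_curve, _ = roc_curve(y_true, y_prob)
roc_auc = auc(fpr_curve, tpr_curve)
prec_curve, rec_curve, _ = precision_recall_curve(y_true, y_prob)
pr_auc = auc(rec_curve, prec_curve)
print(f"ROC AUC = {roc_auc:.4f}, PR AUC = {pr_auc:.4f}")
```

Note that `auc` here uses the trapezoidal rule; for the precision-recall curve, `average_precision_score` is a common alternative summary that can give a slightly different number.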

In [4]:
from sklearn.metrics import classification_report

In [5]:
print(classification_report(roc_plotter.currents['y_true'], roc_plotter.currents['y_pred']))

             precision    recall  f1-score   support

        0.0       0.69      0.66      0.67        50
        1.0       0.67      0.70      0.69        50

avg / total       0.68      0.68      0.68       100
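
The report treats each class in turn as the "current class". As a rough check, the sketch below rebuilds labels from confusion-matrix counts that are consistent with the rounded numbers above. The counts (33 true negatives, 17 false positives, 15 false negatives, 35 true positives) are inferred from the report, not read from `roc_plotter.currents`, so treat them as an assumption.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Counts inferred from the rounded report above (an assumption, not raw data).
tn, fp, fn, tp = 33, 17, 15, 35

# Rebuild label vectors that produce exactly these counts.
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

report = {}
for label in (0, 1):
    # pos_label selects which class plays the role of "positive".
    p = precision_score(y_true, y_pred, pos_label=label)
    r = recall_score(y_true, y_pred, pos_label=label)
    f = f1_score(y_true, y_pred, pos_label=label)
    report[label] = (p, r, f)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For class 0, the "true positives" are the 33 healthy points labeled healthy, which is why its recall (33 / 50) differs from the recall of class 1 (35 / 50).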

