# Plotting Receiver Operating Characteristic (ROC) Curves Demo
<hr>

Author: ***Willa Potosnak***  &lt;wpotosna@andrew.cmu.edu&gt;

## Contents
### 1. [Introduction](#introduction) 

### 2. [Import Model Results](#import) 
       
### 3. [Plotting ROC Curves with Parametric Confidence Intervals](#para)

####   &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;   3.1 [Plotting with bokeh](#parabokeh)
####   &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;   3.2 [Plotting with matplotlib](#paramat)

### 4. [Plotting ROC Curves with Bootstrap (Non-parametric) Confidence Intervals](#nonpara)
#### &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;   4.1 [Plotting with bokeh](#nonparabokeh)
####   &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;   4.2 [Plotting with matplotlib](#nonparamat)

<hr>


<a id='introduction'></a>

## 1. Introduction

Classification models that output scores, referred to as rating classifiers, do not assign crisp binary decisions, or labels, to samples. A discrimination threshold must therefore be applied to the scores to obtain binary decisions. Receiver Operating Characteristic (ROC) curve analysis is commonly used to evaluate binary classification model performance by computing the rates of true positive and true negative classifications at each possible discrimination threshold. The ROC curve plots the true positive rate (TPR) versus the false positive rate (FPR) (or the true negative rate (TNR) versus the false negative rate (FNR)) and provides a means to compare classifiers by their ability to discriminate between positive and negative samples. Common summary performance metrics for ROC curve analysis include the area under the curve (AUC) as well as the specificity and sensitivity of the classifier.
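
To make the threshold sweep concrete, here is a minimal NumPy sketch that computes the TPR and FPR at every distinct score and integrates the curve with the trapezoidal rule. The helper names `roc_points` and `auc_trapezoid` and the toy data are illustrative assumptions, not part of the `roc_curve` package.

```python
import numpy as np

def roc_points(truth, scores):
    """Return (fpr, tpr, thresholds), sweeping every distinct score as a threshold."""
    truth = np.asarray(truth, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    # Candidate thresholds: each distinct score, highest first, preceded by +inf
    # so the curve starts at (0, 0) with no sample called positive.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    n_pos = truth.sum()
    n_neg = (~truth).sum()
    # A sample is called positive when its score is >= the threshold.
    called_pos = scores[None, :] >= thresholds[:, None]
    tpr = (called_pos & truth).sum(axis=1) / n_pos
    fpr = (called_pos & ~truth).sum(axis=1) / n_neg
    return fpr, tpr, thresholds

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy example (hypothetical data): two negatives and two positives.
toy_truth = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_fpr, toy_tpr, toy_thresholds = roc_points(toy_truth, toy_scores)
toy_auc = auc_trapezoid(toy_fpr, toy_tpr)
print(toy_fpr, toy_tpr, toy_auc)
```

Lowering the threshold can only add positive calls, so both rates are non-decreasing along the sweep and the curve runs from (0, 0) to (1, 1).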

The `ROC_Curve_bokeh` and `ROC_Curve_matplotlib` classes allow a user to plot ROC curves using bokeh and matplotlib, respectively. Both classes can plot either the true positive rate versus the false positive rate (`direction='TPRvsFPR'`) or the true negative rate versus the false negative rate (`direction='TNRvsFNR'`). The x-axis can be displayed on either a linear (`x_scale='linear'`) or log10 (`x_scale='log'`) scale. Plot formatting (e.g., text size and the x- and y-ranges) can also be customized.

<a id='import'></a>

## 2. Import Model Results
Create pandas DataFrame objects for the truth labels, classifier scores, and cross-validation folds (if applicable).

In [None]:
import numpy as np
import os
import pandas as pd
import sys

sys.path.append('../')

In [None]:
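# The score files are expected in the current working directory
score_file_path = os.getcwd()
# Pass the path directly so pandas opens and closes each file itself
classifier_1_scores = pd.read_csv(os.path.join(score_file_path, 'classifier_1_scores.csv'))
classifier_2_scores = pd.read_csv(os.path.join(score_file_path, 'classifier_2_scores.csv'))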

classifier_1_truth = pd.DataFrame(classifier_1_scores, columns=['truth'])
classifier_1_predictions = pd.DataFrame(classifier_1_scores, columns=['predictions'])
classifier_1_folds = pd.DataFrame(classifier_1_scores, columns=['fold'])

classifier_2_truth = pd.DataFrame(classifier_2_scores, columns=['truth'])
classifier_2_predictions = pd.DataFrame(classifier_2_scores, columns=['predictions'])
classifier_2_folds = pd.DataFrame(classifier_2_scores, columns=['fold'])

<a id='para'></a>

## 3. Plotting ROC Curves with Parametric Confidence Intervals
The true label DataFrame(s), prediction DataFrame(s), fold DataFrame(s), legend label(s), and line color(s) must each be stored in a Python list.
Each list item corresponds to the results of one classifier, so all of the lists must follow the same classifier order.
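
How the classes compute the parametric interval is not described here. One common approach when cross-validation folds are available is to build one ROC curve per fold, interpolate every curve onto a shared FPR grid, and form a normal-approximation band from the mean and standard error across folds. The sketch below illustrates that idea only. The helpers `empirical_roc` and `parametric_roc_band`, the 101-point grid, and the synthetic data are all assumptions rather than the package's implementation.

```python
from statistics import NormalDist

import numpy as np

def empirical_roc(truth, scores):
    """FPR and TPR at every distinct score threshold, starting from (0, 0)."""
    truth = np.asarray(truth, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    truth, scores = truth[order], scores[order]
    # Keep one point per run of tied scores (the last index of each run).
    last_of_run = np.r_[np.nonzero(np.diff(scores))[0], truth.size - 1]
    tps = np.cumsum(truth)[last_of_run]
    fps = np.cumsum(~truth)[last_of_run]
    return np.r_[0.0, fps / (~truth).sum()], np.r_[0.0, tps / truth.sum()]

def parametric_roc_band(truth, scores, folds, alpha=0.05):
    """Mean ROC curve across folds with a pointwise normal-approximation band."""
    truth = np.asarray(truth, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    folds = np.asarray(folds)
    fpr_grid = np.linspace(0.0, 1.0, 101)  # assumed grid resolution
    curves = []
    for fold in np.unique(folds):
        mask = folds == fold
        fpr, tpr = empirical_roc(truth[mask], scores[mask])
        curves.append(np.interp(fpr_grid, fpr, tpr))
    curves = np.asarray(curves)
    mean = curves.mean(axis=0)
    # Standard error across folds; needs at least two folds.
    sem = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower = np.clip(mean - z * sem, 0.0, 1.0)
    upper = np.clip(mean + z * sem, 0.0, 1.0)
    return fpr_grid, mean, lower, upper

# Synthetic stand-in for one classifier's results (hypothetical data).
demo_rng = np.random.default_rng(0)
demo_truth = demo_rng.integers(0, 2, size=300)
demo_scores = demo_truth + demo_rng.normal(size=300)
demo_folds = np.arange(300) % 5
grid, mean_tpr, lower_tpr, upper_tpr = parametric_roc_band(demo_truth, demo_scores, demo_folds)
```

With only a handful of folds, a Student's t quantile would give a wider and more honest band than the normal quantile used here.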

In [None]:
truth = [classifier_1_truth, classifier_2_truth]
preds = [classifier_1_predictions, classifier_2_predictions]
folds = [classifier_1_folds, classifier_2_folds]
labels = ['classifier 1', 'classifier 2']
colors = ['blue', 'red']

<a id='parabokeh'></a>

### 3.1 Plotting with bokeh

In [None]:
from roc_curve.plotting import ROC_Curve_bokeh

plt = ROC_Curve_bokeh(axis_label_size='22', axis_tick_size='16', legend_text_size='18')

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(0,1), yrange=(0,1), direction='TPRvsFPR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0)

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(-2.8,0), yrange=(0,1),  direction='TPRvsFPR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0)

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(0,1), yrange=(0,1), direction='TNRvsFNR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0)

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(-2.8,0), yrange=(0,1), direction='TNRvsFNR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0)

<a id='paramat'></a>

### 3.2 Plotting with matplotlib

In [None]:
from roc_curve.plotting import ROC_Curve_matplotlib

plt = ROC_Curve_matplotlib(axis_label_size='22', axis_tick_size='16', legend_text_size='18')

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(0,1), yrange=(0,1), direction='TPRvsFPR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0, save_dir=None)

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(0.001, 1), yrange=(0,1),  direction='TPRvsFPR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0, save_dir=None)

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(0,1), yrange=(0,1), direction='TNRvsFNR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0, save_dir=None)

plt.plot(true_labels=truth, predictions=preds, folds=folds, xrange=(0.001, 1), yrange=(0,1), direction='TNRvsFNR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='parametric', alpha=0.05, bootstrap_iters=None, random_seed=0, save_dir=None)

<a id='nonpara'></a>

## 4. Plotting ROC Curves with Bootstrap (Non-parametric) Confidence Intervals
The true label DataFrame(s), prediction DataFrame(s), legend label(s), and line color(s) must each be stored in a Python list, and `folds` must be set to `None`. Each list item corresponds to the results of one classifier. The cell below keeps only the samples from fold 1 of each classifier.
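
For readers unfamiliar with bootstrap intervals, the following is a minimal sketch of a pointwise percentile bootstrap for a ROC curve: resample the (label, score) pairs with replacement, recompute the curve, and take the `alpha/2` and `1 - alpha/2` quantiles of the TPR at each FPR on a shared grid. The parameter names mirror the `alpha`, `bootstrap_iters`, and `random_seed` arguments used in this demo, but the helper functions, the grid, and the synthetic data are assumptions, and the classes' internal procedure may differ.

```python
import numpy as np

def empirical_roc(truth, scores):
    """FPR and TPR at every distinct score threshold, starting from (0, 0)."""
    truth = np.asarray(truth, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    truth, scores = truth[order], scores[order]
    # Keep one point per run of tied scores (the last index of each run).
    last_of_run = np.r_[np.nonzero(np.diff(scores))[0], truth.size - 1]
    tps = np.cumsum(truth)[last_of_run]
    fps = np.cumsum(~truth)[last_of_run]
    return np.r_[0.0, fps / (~truth).sum()], np.r_[0.0, tps / truth.sum()]

def bootstrap_roc_band(truth, scores, alpha=0.05, bootstrap_iters=100, random_seed=0):
    """Pointwise percentile-bootstrap band for the TPR on a fixed FPR grid."""
    truth = np.asarray(truth, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    if truth.all() or not truth.any():
        raise ValueError("both classes must be present")
    rng = np.random.default_rng(random_seed)
    fpr_grid = np.linspace(0.0, 1.0, 101)  # assumed grid resolution
    curves = []
    while len(curves) < bootstrap_iters:
        # Resample sample indices with replacement.
        idx = rng.integers(0, truth.size, size=truth.size)
        if truth[idx].all() or not truth[idx].any():
            continue  # skip resamples that lost one of the classes
        fpr, tpr = empirical_roc(truth[idx], scores[idx])
        curves.append(np.interp(fpr_grid, fpr, tpr))
    curves = np.asarray(curves)
    lower = np.quantile(curves, alpha / 2.0, axis=0)
    upper = np.quantile(curves, 1.0 - alpha / 2.0, axis=0)
    return fpr_grid, lower, upper

# Synthetic stand-in for one classifier's results (hypothetical data).
boot_rng = np.random.default_rng(1)
boot_truth = boot_rng.integers(0, 2, size=200)
boot_scores = boot_truth + boot_rng.normal(size=200)
boot_grid, boot_lower, boot_upper = bootstrap_roc_band(boot_truth, boot_scores)
```

Because the band comes from empirical quantiles, it needs no distributional assumption and no fold structure, which is consistent with `folds=None` in the calls that follow. More iterations than the 100 used in this demo give smoother quantile estimates at the cost of run time.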

In [None]:
# Row positions of the samples that belong to cross-validation fold 1
fold_1_rows_1 = np.where(classifier_1_folds == 1)[0]
fold_1_rows_2 = np.where(classifier_2_folds == 1)[0]
truth = [classifier_1_truth.iloc[fold_1_rows_1], classifier_2_truth.iloc[fold_1_rows_2]]
preds = [classifier_1_predictions.iloc[fold_1_rows_1], classifier_2_predictions.iloc[fold_1_rows_2]]
labels = ['classifier_1', 'classifier_2']
colors = ['blue', 'red']

<a id='nonparabokeh'></a>

### 4.1 Plotting with bokeh

In [None]:
from roc_curve.plotting import ROC_Curve_bokeh

plt = ROC_Curve_bokeh(axis_label_size='22', axis_tick_size='16', legend_text_size='18')

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(0, 1), yrange=(0,1),  direction='TPRvsFPR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0)

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(-2.8,0), yrange=(0,1),  direction='TPRvsFPR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0)

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(0, 1), yrange=(0,1),  direction='TNRvsFNR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0)

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(-2.8,0), yrange=(0,1),  direction='TNRvsFNR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='bottom_right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0)

<a id='nonparamat'></a>

### 4.2 Plotting with matplotlib

In [None]:
from roc_curve.plotting import ROC_Curve_matplotlib

plt = ROC_Curve_matplotlib(axis_label_size='22', axis_tick_size='16', legend_text_size='18')

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(0,1), yrange=(0,1), direction='TPRvsFPR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0, save_dir=None)

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(0.001, 1), yrange=(0,1),  direction='TPRvsFPR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0, save_dir=None)

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(0,1), yrange=(0,1), direction='TNRvsFNR', 
         x_scale='linear', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0, save_dir=None)

plt.plot(true_labels=truth, predictions=preds, folds=None, xrange=(0.001, 1), yrange=(0,1), direction='TNRvsFNR', 
         x_scale='log', line_width=4, line_dash='solid', line_color=colors, legend_label=labels, 
         legend_location='lower right', ci_method='bootstrap', alpha=0.05, bootstrap_iters=100, random_seed=0, save_dir=None)