## This Jupyter notebook implements an ROC curve analysis, a common way to evaluate classification models.

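A receiver operating characteristic (ROC) curve shows how a classifier performs as its decision threshold varies. Each threshold gives one point on the curve: the True Positive Rate (TPR, the fraction of actual positives predicted positive) plotted against the False Positive Rate (FPR, the fraction of actual negatives predicted positive). Instead of changing the decision threshold by hand and recording TPR and FPR at each step, the scikit-plot function plot_roc_curve() computes and plots the whole curve automatically.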
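To make the manual procedure concrete, here is a minimal sketch on a made-up binary example. The labels, scores, and the helper `tpr_fpr_at` are hypothetical and not part of this notebook. The sketch sweeps the threshold over every distinct score, records TPR and FPR, and checks the result against scikit-learn's `roc_curve`, which performs the same sweep.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical binary labels and classifier scores, made up for illustration
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])


def tpr_fpr_at(threshold, y_true, scores):
    """Hypothetical helper: (TPR, FPR) when predicting positive for scores >= threshold."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    fp = np.sum(y_pred & (y_true == 0))
    fn = np.sum(~y_pred & (y_true == 1))
    tn = np.sum(~y_pred & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)


# Manual sweep: every distinct score is a candidate threshold, highest first
thresholds = np.sort(np.unique(scores))[::-1]
manual = [tpr_fpr_at(t, y_true, scores) for t in thresholds]
manual_tpr = np.array([m[0] for m in manual])
manual_fpr = np.array([m[1] for m in manual])

# roc_curve does the same sweep; drop_intermediate=False keeps every point
fpr, tpr, thr = roc_curve(y_true, scores, drop_intermediate=False)

# Columns: threshold, FPR, TPR
print(np.column_stack([thresholds, manual_fpr, manual_tpr]))
```

`roc_curve` prepends an extra threshold above the highest score, which produces the (0, 0) starting point of the curve; that is why a comparison with the manual sweep should skip its first element.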

### Import packages, load a toy dataset, and split the data into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np

X, y = load_wine(return_X_y = True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .3)
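The wine dataset has three classes and 178 samples, so the ROC analysis later is one-vs-rest: each class in turn is treated as the positive class. A plain random split can leave the classes unevenly represented in the test set. The following sketch shows a stratified split, assuming `stratify=y` and a fixed `random_state` (neither is in the original cell) are acceptable for this analysis.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# stratify=y keeps class proportions similar in both splits, so each class
# has positive samples in the test set (an ROC curve for a class with no
# positive test samples is undefined). random_state=0 is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

print("classes:     ", np.unique(y))
print("train counts:", np.bincount(y_train))
print("test counts: ", np.bincount(y_test))
```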

### Instantiate three separate models and fit them to the training data. Print the test-set accuracy of each model.

In [None]:
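# Support vector classifier; probability=True enables predict_proba(), which the ROC curve needs
clf1 = SVC(gamma='scale', probability=True)
clf1.fit(X_train, y_train)

svc_score = clf1.score(X_test, y_test)

clf2 = RandomForestClassifier(n_estimators=100)
clf2.fit(X_train, y_train)

rfc_score = clf2.score(X_test, y_test)

# multi_class is omitted: 'auto' is already the default (and the parameter is deprecated
# in scikit-learn 1.5+). A larger max_iter gives lbfgs more room to converge on the unscaled features.
clf3 = LogisticRegression(solver='lbfgs', max_iter=5000)
clf3.fit(X_train, y_train)

log_reg_score = clf3.score(X_test, y_test)

# score() returns mean accuracy on the test set
print(f'SVC accuracy: {svc_score:.3f}, RFC accuracy: {rfc_score:.3f}, LogReg accuracy: {log_reg_score:.3f}')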
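Accuracy reflects a single decision rule, whereas ROC AUC summarizes performance over all thresholds. A possible extension, not in the original notebook, scores all three models with scikit-learn's `roc_auc_score`. It assumes that a one-vs-rest AUC averaged over the classes is the comparison you want. The `random_state` and `max_iter` values are arbitrary choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "svc": SVC(gamma="scale", probability=True, random_state=0),
    "rfc": RandomForestClassifier(n_estimators=100, random_state=0),
    "logreg": LogisticRegression(max_iter=5000),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)  # shape (n_test, 3): one column per class
    # multi_class="ovr": one-vs-rest AUC per class, then an unweighted (macro) mean
    auc_scores[name] = roc_auc_score(y_test, proba, multi_class="ovr")
    acc = model.score(X_test, y_test)
    print(f"{name}: accuracy={acc:.3f}, macro OvR AUC={auc_scores[name]:.3f}")
```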

### Scikit-plot, a useful visualization package, is not preinstalled in SageMaker, so install it into the current notebook's kernel.

In [None]:
%pip install scikit-plot

### Finally, set the figure parameters and use the plot_roc_curve() function to plot the ROC curves for the SVC model.

In [None]:
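import scikitplot as skplt
import matplotlib.pyplot as plt

# Figure settings: a 10 x 10 inch figure at 300 dpi with 12 pt text
plt.rcParams['figure.figsize'] = (10, 10)
plt.rcParams['figure.dpi'] = 300
plt.rcParams['font.size'] = 12

y_true = y_test
# Class probabilities from the SVC, one column per class (requires probability=True)
y_probas = clf1.predict_proba(X_test)

# Draws one-vs-rest ROC curves for each class plus micro- and macro-averaged curves.
# Newer scikit-plot releases deprecate this function in favor of skplt.metrics.plot_roc.
skplt.metrics.plot_roc_curve(y_true, y_probas)

plt.show()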
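If scikit-plot is unavailable, you can build a roughly equivalent plot with scikit-learn and matplotlib alone. This sketch assumes one-vs-rest curves per class plus a micro-average, which is similar to, but not necessarily identical to, what `plot_roc_curve()` draws. It refits the SVC so that the block runs on its own, and the `random_state` values are arbitrary.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = SVC(gamma="scale", probability=True, random_state=0).fit(X_train, y_train)
y_probas = clf.predict_proba(X_test)  # columns follow clf.classes_

classes = clf.classes_
# One-vs-rest: column k of y_bin is 1 where the true class is classes[k]
y_bin = label_binarize(y_test, classes=classes)

fig, ax = plt.subplots(figsize=(6, 6))
roc_auc = {}
for k, cls in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, k], y_probas[:, k])
    roc_auc[cls] = auc(fpr, tpr)
    ax.plot(fpr, tpr, label=f"class {cls} (AUC = {roc_auc[cls]:.2f})")

# Micro-average: pool every (sample, class) pair into one binary problem
fpr_micro, tpr_micro, _ = roc_curve(y_bin.ravel(), y_probas.ravel())
roc_auc["micro"] = auc(fpr_micro, tpr_micro)
ax.plot(fpr_micro, tpr_micro, linestyle=":",
        label=f"micro-average (AUC = {roc_auc['micro']:.2f})")

ax.plot([0, 1], [0, 1], "k--", lw=1)  # chance-level diagonal
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("One-vs-rest ROC curves (SVC)")
ax.legend(loc="lower right")
plt.show()
```

The dashed diagonal marks a classifier that guesses at random. Curves that bow toward the top-left corner indicate better separation of that class from the rest.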