# ROC and AUC 

This notebook covers the basics of Receiver Operating Characteristic (ROC) curves and the area under the curve (AUC).

Source: "ROC and AUC, Clearly Explained!", StatQuest with Josh Starmer https://www.youtube.com/watch?v=4jRBRDbJemM

Data Science Project Steps:

    1) Load the data

    2) Define appropriate X and y

    3) Normalize the data

    4) Split the data into train and test sets

    5) Fit a model

    6) Predict
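The six steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's own pipeline: the binary target (class 2 versus the rest), `StandardScaler`, and the 50/50 split are assumptions chosen for the example. The sketch also splits before normalizing so that the scaler is fitted on training rows only, which is a deliberate deviation from the order listed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1) Load the data
data = load_iris()

# 2) Define X and y (assumed binary target: is the flower class 2?)
X = data.data
y = (data.target == 2).astype(int)

# 4) Split first, so the scaler below only sees training rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# 3) Normalize using statistics learned from the training set
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# 5) Fit a model
model = LogisticRegression().fit(X_train_s, y_train)

# 6) Predict hard labels and positive-class probabilities
y_pred = model.predict(X_test_s)
y_prob = model.predict_proba(X_test_s)[:, 1]
```

The probabilities in `y_prob` are what a ROC curve is built from, since they can be cut at any threshold.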

## Part 1: Model Confusion Matrix


### A confusion matrix for a binary model
![Binary Classification Confusion Matrix](supporting_images/Binary_Classifiction_Confusion_Matrix.jpg)
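As a small illustration, a binary confusion matrix can be computed with scikit-learn's `confusion_matrix`. The labels below are made up for the example. Note that scikit-learn puts actual classes in rows and predicted classes in columns, ordered by `labels`, which may differ from the layout in the figure.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 4 actual positives and 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Rows are actual classes, columns are predicted classes, in the order [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For a 2x2 matrix, ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)  # 5 1 1 3
```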


### A confusion matrix for a multi-categorical model
![Multi-Categorical Classification Confusion Matrix](supporting_images/Multi_Categorical_Models.png)


## Part 2: Sensitivity and Specificity

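Sensitivity (the true positive rate) is the proportion of actual positives (samples that belong to a class) that the model correctly identifies: TP / (TP + FN). Specificity (the true negative rate) is the proportion of actual negatives (samples that do not belong to a class) that the model correctly identifies: TN / (TN + FP):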

![Sensitivity and Specificity Calculations](supporting_images/Sensitivity_and_Specificity.png)
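A worked example of the two calculations, using hypothetical confusion-matrix counts (the numbers are made up for illustration):

```python
# Hypothetical counts from a binary confusion matrix
tp, fn = 3, 1   # actual positives: 3 caught, 1 missed
tn, fp = 5, 1   # actual negatives: 5 rejected correctly, 1 false alarm

# Sensitivity (true positive rate): share of actual positives found
sensitivity = tp / (tp + fn)

# Specificity (true negative rate): share of actual negatives rejected
specificity = tn / (tn + fp)

print(f"sensitivity = {sensitivity:.3f}")  # sensitivity = 0.750
print(f"specificity = {specificity:.3f}")  # specificity = 0.833
```

The false positive rate plotted on the x-axis of a ROC curve is `1 - specificity`.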


### Calculating Sensitivity
![Calculating Sensitivity](supporting_images/Sensitivity_Categorical_Models.png)


### Calculating Specificity
![Calculating Specificity](supporting_images/Specificity_Categorical_Models.png)
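For a model with more than two categories, sensitivity and specificity are computed per class by treating that class as "positive" and all others as "negative". A minimal sketch with made-up labels, assuming scikit-learn's row = actual, column = predicted layout:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical three-class labels
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

tp = np.diag(cm)                # correct predictions for each class
fn = cm.sum(axis=1) - tp        # rest of each row: class members that were missed
fp = cm.sum(axis=0) - tp        # rest of each column: wrongly assigned to the class
tn = cm.sum() - (tp + fn + fp)  # everything else

sensitivity = tp / (tp + fn)    # one value per class
specificity = tn / (tn + fp)

print(cm)
print(sensitivity)
print(specificity)
```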



## Part 3: ROC and AUC

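Let's evaluate a classification model across its decision thresholds. A ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at each threshold, and the AUC summarizes the whole curve as a single number.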
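To make the role of the threshold concrete, the sketch below sweeps three thresholds over a set of made-up scores and reports the true and false positive rates at each one. The helper `rates_at` is hypothetical and exists only for this illustration.

```python
import numpy as np

# Hypothetical true labels and model scores (higher score = more likely positive)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95])

def rates_at(threshold):
    """Return (true positive rate, false positive rate) at a given threshold."""
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

for t in (0.2, 0.5, 0.8):
    tpr_t, fpr_t = rates_at(t)
    print(f"threshold={t:.1f}  TPR={tpr_t:.1f}  FPR={fpr_t:.1f}")
```

Lowering the threshold catches more positives but also raises more false alarms; each (FPR, TPR) pair is one point on the ROC curve.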


In [20]:
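# Imports for loading the data, building a DataFrame, and fitting a model
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split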

In [23]:
iris = datasets.load_iris()

X = iris.data
y = iris.target
# Combine features and target into a single DataFrame
iris = pd.DataFrame(data=np.c_[X, y],
                    columns=iris['feature_names'] + ['target'])

In [25]:
iris.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |


In [9]:
X = iris

array([[5.1, 3.5],
       [4.9, 3. ],
       [4.7, 3.2],
       [4.6, 3.1],
       [5. , 3.6],
       [5.4, 3.9],
       [4.6, 3.4],
       [5. , 3.4],
       [4.4, 2.9],
       [4.9, 3.1],
       [5.4, 3.7],
       [4.8, 3.4],
       [4.8, 3. ],
       [4.3, 3. ],
       [5.8, 4. ],
       [5.7, 4.4],
       [5.4, 3.9],
       [5.1, 3.5],
       [5.7, 3.8],
       [5.1, 3.8],
       [5.4, 3.4],
       [5.1, 3.7],
       [4.6, 3.6],
       [5.1, 3.3],
       [4.8, 3.4],
       [5. , 3. ],
       [5. , 3.4],
       [5.2, 3.5],
       [5.2, 3.4],
       [4.7, 3.2],
       [4.8, 3.1],
       [5.4, 3.4],
       [5.2, 4.1],
       [5.5, 4.2],
       [4.9, 3.1],
       [5. , 3.2],
       [5.5, 3.5],
       [4.9, 3.6],
       [4.4, 3. ],
       [5.1, 3.4],
       [5. , 3.5],
       [4.5, 2.3],
       [4.4, 3.2],
       [5. , 3.5],
       [5.1, 3.8],
       [4.8, 3. ],
       [5.1, 3.8],
       [4.6, 3.2],
       [5.3, 3.7],
       [5. , 3.3],
       [7. , 3.2],
       [6.4, 3.2],
       [6.9,

In [10]:
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle

from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from numpy import interp  # scipy.interp is deprecated in favor of numpy.interp
from sklearn.metrics import roc_auc_score

In [16]:
# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Encode the classes
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]

# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)
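As a quick illustration of the encoding step, `label_binarize` turns a vector of class labels into one indicator column per class, which is why `n_classes` can be read from `y.shape[1]`. The four labels below are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical labels for four samples
y_small = np.array([0, 1, 2, 1])

# One column per class; a 1 marks the class each sample belongs to
y_bin = label_binarize(y_small, classes=[0, 1, 2])
print(y_bin)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]
```

Each column can then be treated as its own binary problem, giving one ROC curve per class.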

## Create the classifier and a scoring array

In [17]:
# Learn to predict each class against the others (one-vs-rest)
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                 random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)


Compute the ROC curve and AUC for each class, along with the micro-average across all classes

In [19]:
# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
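To show what `roc_curve` and `auc` return, here is a tiny example with four made-up samples. It also recomputes the area by hand with the trapezoidal rule, which is how `auc` integrates the curve; the manual formula is included only as a cross-check.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical labels and scores for four samples
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per distinct threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Area under the curve as computed by scikit-learn
area_sklearn = auc(fpr, tpr)

# Same area by hand: sum of trapezoids between consecutive ROC points
area_manual = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

print(area_sklearn, area_manual)  # 0.75 0.75
```

An AUC of 1.0 would mean every positive scores higher than every negative, while 0.5 corresponds to random ranking.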