
# **Receiver Operating Characteristic (ROC)**

While the ROC curve is a helpful diagnostic tool, it can be difficult to compare classifiers based on their curves alone. <br>
Instead, the area under the curve can be calculated to give a single score for a classifier across all threshold values. <br>
This is called the ROC area under the curve, ROC AUC, or sometimes ROCAUC. <br>
The score is a value between 0.0 and 1.0, with **1.0 indicating a perfect classifier** and 0.5 corresponding to a classifier with no skill.<br>

This single score can be used to compare binary classifier models directly.
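
As a minimal sketch of what "area under the curve" means in practice, the cell below builds the ROC curve for a tiny hand-made example and integrates it, then compares the result with `roc_auc_score`. The labels and scores are illustrative values chosen for this sketch, not data from this notebook.

```python
# minimal sketch: ROC AUC as the area under the ROC curve
from sklearn.metrics import auc, roc_auc_score, roc_curve

# toy labels and predicted scores (illustrative values only)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# false positive rate and true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# trapezoidal area under the (fpr, tpr) points
area = auc(fpr, tpr)
# the same quantity computed directly from labels and scores
direct = roc_auc_score(y_true, y_score)

print('Area under ROC curve %.3f' % area)
print('roc_auc_score        %.3f' % direct)
```

Three of the four positive/negative pairs in this toy example are ranked correctly (the positive scored 0.35 falls below the negative scored 0.4), so both values come to 0.75.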

In [None]:
from IPython.display import Image
def page(num):
    return Image("image/"+str(num)+ ".png" , width=640)

In [None]:
# Clone the entire repo.
!git clone -l -s https://github.com/cagBRT/Data.git cloned-repo
%cd cloned-repo

In [None]:
Image("images/AUC.png", width=640)

The ROC AUC should be used with caution. <br>

For imbalanced classification with a severe skew and few examples of the minority class, the ROC AUC can be misleading. <br>

This is because, with so few minority examples, a small number of correct or incorrect predictions can result in a large change in the ROC curve or the ROC AUC score.
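
The effect can be illustrated with a small synthetic sketch. The helper `auc_after_one_miss` is hypothetical (it is not part of scikit-learn or this notebook), and the scores are made up: every positive example starts out ranked above every negative, then a single positive is pushed to the bottom of the ranking.

```python
# sketch: how much one misranked positive changes ROC AUC
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_after_one_miss(n_neg, n_pos):
    """Hypothetical helper: ROC AUC before and after one positive is misranked."""
    y_true = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])
    # negatives get scores in [0.1, 0.5], positives all get 0.9 (perfect ranking)
    neg_scores = np.linspace(0.1, 0.5, n_neg)
    pos_scores = np.full(n_pos, 0.9)
    before = roc_auc_score(y_true, np.concatenate([neg_scores, pos_scores]))
    # push a single positive below every negative
    pos_scores[0] = 0.0
    after = roc_auc_score(y_true, np.concatenate([neg_scores, pos_scores]))
    return before, after

# severe skew: 995 negatives, 5 positives
skewed = auc_after_one_miss(n_neg=995, n_pos=5)
# balanced: 500 negatives, 500 positives
balanced = auc_after_one_miss(n_neg=500, n_pos=500)

print('Skewed   ROC AUC before %.3f, after %.3f' % skewed)
print('Balanced ROC AUC before %.3f, after %.3f' % balanced)
```

With only five positives, one misranked example drops the score from 1.0 to 0.8, while the same single error in the balanced case moves it only to 0.998.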

In [None]:
# example of a roc auc for a predictive model
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from numpy import where
from matplotlib import pyplot

In [None]:
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


**Create a dataset**

In [None]:
# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

**Plot the dataset**

In [None]:
for class_value in range(2):
  # get row indexes for samples with this class
  row_ix = where(y == class_value)[0]
  # create scatter of these samples (first two features only)
  pyplot.scatter(X[row_ix, 0], X[row_ix, 1])
# show the plot
pyplot.show()

**Create a Dummy classifier model and train it**

In [None]:
# no skill model, stratified random class predictions
model = DummyClassifier(strategy='stratified')
model.fit(trainX, trainy)
yhat = model.predict_proba(testX)
pos_probs = yhat[:, 1]

**Calculate the ROC AUC score**

In [None]:
# calculate roc auc
roc_auc = roc_auc_score(testy, pos_probs)
print('No Skill ROC AUC %.3f' % roc_auc)
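
Because the stratified dummy model predicts classes at random, a single run can land slightly above or below 0.5. The sketch below repeats the no-skill model over several seeds to show that the score averages out near 0.5. The choice of 30 seeds and the variable names are assumptions made for illustration; the dataset and split reuse the settings from the cells above.

```python
# sketch: average the no-skill ROC AUC over several random seeds
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# same dataset and split as above
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

scores = []
for seed in range(30):
    # stratified random predictions, seeded so each run is repeatable
    dummy = DummyClassifier(strategy='stratified', random_state=seed)
    dummy.fit(trainX, trainy)
    pos_probs = dummy.predict_proba(testX)[:, 1]
    scores.append(roc_auc_score(testy, pos_probs))

mean_auc = mean(scores)
print('Mean No Skill ROC AUC over %d seeds: %.3f' % (len(scores), mean_auc))
```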

**Create and train a logistic regression model**

In [None]:
# skilled model
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
yhat = model.predict_proba(testX)
pos_probs = yhat[:, 1]
# calculate roc auc
roc_auc = roc_auc_score(testy, pos_probs)
print('Logistic ROC AUC %.3f' % roc_auc)

    
Examples of other classification models that can be used in place of logistic regression:

>  KNeighborsClassifier(3),<br>
  SVC(kernel="linear", C=0.025, random_state=42, probability=True),<br>
  SVC(gamma=2, C=1, random_state=42, probability=True),<br>
  GaussianProcessClassifier(1.0 * RBF(1.0), random_state=42),<br>
  DecisionTreeClassifier(max_depth=5, random_state=42),<br>
  RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1, random_state=42),<br>
  MLPClassifier(alpha=1, max_iter=1000, random_state=42),<br>
  AdaBoostClassifier(random_state=42),<br>
  GaussianNB(),<br>
  QuadraticDiscriminantAnalysis()<br>
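
One way to compare several of these models is to fit each on the same split and score it with `roc_auc_score`. The sketch below does this for three of the listed classifiers. The `candidates` and `results` dictionaries and their labels are names chosen for this illustration, and the dataset settings are copied from the cells above.

```python
# sketch: compare a few classifiers by ROC AUC on the same train/test split
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# same dataset and split as above
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

# a subset of the classifiers listed above
candidates = {
    'KNN': KNeighborsClassifier(3),
    'Decision Tree': DecisionTreeClassifier(max_depth=5, random_state=42),
    'Gaussian NB': GaussianNB(),
}

results = {}
for name, clf in candidates.items():
    clf.fit(trainX, trainy)
    # probability of the positive class for each test sample
    pos_probs = clf.predict_proba(testX)[:, 1]
    results[name] = roc_auc_score(testy, pos_probs)
    print('%s ROC AUC %.3f' % (name, results[name]))
```

Any estimator that exposes `predict_proba` can be added to the dictionary in the same way; the `SVC` entries in the list set `probability=True` for that reason.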