<a href="https://colab.research.google.com/github/cagBRT/Data/blob/main/Imbalanced_Datasets_4a.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Receiver Operating Characteristic (ROC) Curves and ROC AUC**

Most imbalanced classification problems involve two classes:<br>
- a negative class with the majority of examples <br>
- a positive class with a minority of examples.

Two diagnostic tools that help in the interpretation of binary (two-class) classification predictive models are:<br>

- ROC Curves <br> 
- Precision-Recall curves.

A ROC curve plots the true positive rate against the false positive rate across decision thresholds, while a Precision-Recall curve plots precision against recall.<br>
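
Each point on a ROC curve corresponds to one decision threshold applied to the predicted scores. The sketch below computes that point by hand for a few thresholds. The labels, the scores, and the helper name `roc_point` are invented for illustration and are not part of the notebook's dataset.

```python
import numpy as np

# Hypothetical labels and positive-class scores, invented for illustration only
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9])

def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when every score >= threshold is predicted positive."""
    predicted_pos = scores >= threshold
    tp = np.sum(predicted_pos & (y_true == 1))  # positives correctly flagged
    fp = np.sum(predicted_pos & (y_true == 0))  # negatives wrongly flagged
    tpr = tp / np.sum(y_true == 1)  # true positive rate (recall)
    fpr = fp / np.sum(y_true == 0)  # false positive rate
    return fpr, tpr

# lowering the threshold moves the point up and to the right
for threshold in (0.75, 0.5, 0.25):
    fpr, tpr = roc_point(y_true, scores, threshold)
    print('threshold=%.2f  FPR=%.2f  TPR=%.2f' % (threshold, fpr, tpr))
```

`roc_curve` from scikit-learn performs the same kind of sweep over the distinct score values and returns the resulting arrays of false positive and true positive rates.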

ROC AUC and Precision-Recall AUC provide scores that summarize the curves and can be used to compare classifiers.<br>

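ROC Curves and ROC AUC can be optimistic on severely imbalanced classification problems with few samples of the minority class, because the false positive rate is computed over the large majority class and barely moves even when the model makes many false positives.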

These plots summarize the performance of *binary classification models* on the positive class. 
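
One way to see the optimism of ROC AUC under severe imbalance is to compute both ROC AUC and Precision-Recall AUC for the same model. The sketch below assumes a synthetic dataset with roughly a 99:1 class ratio (the `weights` argument and the stratified split are choices made for this illustration, not part of the notebook's dataset). Exact scores depend on the data and the random seeds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

# assumed setup: synthetic data with roughly 99% negatives and 1% positives
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.99, 0.01], random_state=1)
# stratify so both halves keep some minority-class examples
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2, stratify=y)

# fit a model and keep the positive-class probabilities
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
pos_probs = model.predict_proba(testX)[:, 1]

# area under the ROC curve
roc_auc = roc_auc_score(testy, pos_probs)
# area under the precision-recall curve
precision, recall, _ = precision_recall_curve(testy, pos_probs)
pr_auc = auc(recall, precision)

print('ROC AUC: %.3f' % roc_auc)
print('PR AUC:  %.3f' % pr_auc)
```

On data this skewed, a large gap between the two scores is a hint that the ROC view is flattering the model's performance on the minority class.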

In [None]:
# example of a ROC curve for a predictive model
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from matplotlib import pyplot

In [None]:
# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

In [None]:
# fit a model
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
# predict probabilities
yhat = model.predict_proba(testX)
# retrieve just the probabilities for the positive class
pos_probs = yhat[:, 1]

**Plot the ROC Curve**

The upper-left corner of the plot, at (0, 1), represents perfect skill. <br>

If a model has no skill at class prediction, its curve follows the diagonal line from the lower left to the upper right. <br>

If the curve falls below the diagonal line, the model is worse than a no-skill model.
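
These three reference cases can be checked numerically through the area under each curve. The sketch below uses a tiny set of labels and scores invented for illustration: scores that rank every positive above every negative, a constant score for every example, and the perfect scores reversed.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# hypothetical labels and scores, invented for illustration only
y_true = np.array([0, 0, 0, 1, 1])
perfect = np.array([0.1, 0.2, 0.3, 0.8, 0.9])  # every positive outranks every negative
no_skill = np.full(5, 0.5)                     # same score for every example
inverted = 1.0 - perfect                       # every negative outranks every positive

perfect_auc = roc_auc_score(y_true, perfect)
no_skill_auc = roc_auc_score(y_true, no_skill)
inverted_auc = roc_auc_score(y_true, inverted)

print('Perfect:  %.1f' % perfect_auc)    # curve passes through the upper-left corner
print('No skill: %.1f' % no_skill_auc)   # curve lies on the diagonal
print('Inverted: %.1f' % inverted_auc)   # curve lies below the diagonal
```

The areas come out as 1.0, 0.5, and 0.0 respectively, matching the corner, the diagonal, and the region below the diagonal.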


In [None]:
# plot no skill roc curve
pyplot.plot([0, 1], [0, 1], linestyle='--', label='No Skill')
# calculate roc curve for model
fpr, tpr, _ = roc_curve(testy, pos_probs)
# plot model roc curve
pyplot.plot(fpr, tpr, marker='.', label='Logistic')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()
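
The curve can be summarized in a single number with ROC AUC. The sketch below repeats the dataset and model setup from the cells above so that it runs on its own, then scores the model next to a no-skill baseline. The constant-score baseline (`no_skill_probs`) is an assumption made for this illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# same dataset, split, and model as in the cells above
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
pos_probs = model.predict_proba(testX)[:, 1]

# assumed no-skill baseline: the same score for every test example
no_skill_probs = [0.5] * len(testy)

# area under each ROC curve
no_skill_auc = roc_auc_score(testy, no_skill_probs)
model_auc = roc_auc_score(testy, pos_probs)
print('No Skill ROC AUC: %.3f' % no_skill_auc)
print('Logistic ROC AUC: %.3f' % model_auc)
```

A score near 0.5 matches the diagonal no-skill line in the plot, and a score closer to 1.0 means the curve bows toward the upper-left corner.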