<a href="https://colab.research.google.com/github/cagBRT/Data/blob/main/Imbalanced_Datasets_4a.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone -l -s https://github.com/cagBRT/Machine-Learning.git cloned-repo
%cd cloned-repo

In [None]:
from IPython.display import Image

# **Receiver Operating Characteristic (ROC) Curves and AUC**

Most imbalanced classification problems involve two classes:<br>
- a negative case with the majority of examples <br>
- a positive case with a minority of examples.

Two diagnostic tools that help in the interpretation of binary (two-class) classification predictive models are:<br>

- ROC Curves <br>
- Precision-Recall curves.

When we train a classification model, it outputs a probability for each example rather than a hard class label. In this notebook, the running example is the probability that an applicant will repay a loan.

ROC Curves and Precision-Recall Curves provide a diagnostic tool for binary classification models.<br>

ROC AUC and Precision-Recall AUC provide scores that summarize the curves and can be used to compare classifiers.<br>

ROC Curves and ROC AUC can be optimistic on severely imbalanced classification problems with few samples of the minority class.

In [None]:
Image("images/ClassModel1.png", width=600)

The probabilities range between 0 and 1. The higher the value, the more likely the person is to repay the loan.

The next step is to choose a threshold that turns each probability into a class: “will repay” or “won’t repay”.<br>
All predictions at or above the threshold are classified as “will repay”.<br>
All predictions below the threshold are classified as “won’t repay”.
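The thresholding rule can be sketched in a few lines. The probabilities and the 0.5 cut-off below are hypothetical values chosen only for illustration; they do not come from the model trained later in this notebook.

```python
# Hypothetical predicted probabilities of repayment for six applicants.
probs = [0.92, 0.15, 0.50, 0.49, 0.73, 0.08]
threshold = 0.5  # assumed cut-off, for illustration only

# Predictions at or above the threshold become "will repay".
labels = ["will repay" if p >= threshold else "won't repay" for p in probs]

for p, label in zip(probs, labels):
    print(f"{p:.2f} -> {label}")
```

Note that the applicant at exactly 0.50 lands in the “will repay” group because the rule is "at or above".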

**Summary of TP and TN**

In [None]:
Image("images/ClassModel2.png", width=600)

**The False Positive Rate and the True Positive Rate both decrease as the threshold increases**

In [None]:
Image("images/ClassModel3.png")

**The ROC Curve**<br>

To plot the ROC curve, we calculate the TPR and FPR at many different thresholds, where TPR = TP / (TP + FN) and FPR = FP / (FP + TN). <br>

For each threshold, we plot the FPR on the x-axis and the TPR on the y-axis. We then join the points with a line.
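As a minimal sketch of that procedure, the snippet below computes (FPR, TPR) by hand at a handful of thresholds. The labels, scores, thresholds, and the helper `rates_at` are all made up for illustration.

```python
# Hypothetical true labels (1 = positive) and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

def rates_at(threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives

# Each threshold yields one point on the ROC curve.
thresholds = [0.0, 0.25, 0.5, 0.75, 1.01]
roc_points = [rates_at(t) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, roc_points):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The lowest threshold labels everything positive, giving the point (1, 1); a threshold above every score labels everything negative, giving (0, 0). Both rates shrink as the threshold rises.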



In [None]:
Image("images/ROCCurve.png")

In [None]:
Image("images/ROCTypicalGraph.png")

The point on an ROC curve closest to (0.0,1.0) theoretically identifies the ideal classification threshold. <br>

However, several other real-world issues influence the selection of the ideal classification threshold. For example, perhaps false negatives cause far more pain than false positives.
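One way to locate the point closest to (0.0, 1.0) is to measure the Euclidean distance from each ROC point to that corner. The sketch below uses scikit-learn's `roc_curve` on hypothetical labels and scores; it treats both error types as equally costly, which is exactly the assumption the paragraph above warns may not hold in practice.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical true labels (1 = positive) and predicted probabilities.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, scores)

# Distance from every ROC point to the perfect-skill corner (FPR=0, TPR=1).
distances = np.sqrt(fpr ** 2 + (1 - tpr) ** 2)
best = np.argmin(distances)

print(f"threshold={thresholds[best]:.2f}  FPR={fpr[best]:.2f}  TPR={tpr[best]:.2f}")
```

If false negatives are more costly than false positives, a lower threshold than the one selected here may be preferable even though it sits farther from the corner.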
<br>

A numerical metric called AUC summarizes the ROC curve into a single floating-point value.

These plots summarize the performance of *binary classification models* on the positive class.

In [None]:
# example of a ROC curve for a predictive model
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve
from matplotlib import pyplot

In [None]:
from sklearn.metrics import confusion_matrix
#importing accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import classification_report

In [None]:
# generate 2 class dataset
X, y = make_classification(n_samples=10000, n_classes=2, n_features=4, random_state=1, weights=[0.99])
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)

In [None]:
# fit a model
model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
# predict class labels (uses the default 0.5 probability threshold)
y_pred = model.predict(testX)

# predict class probabilities
yhat = model.predict_proba(testX)
# retrieve just the probabilities for the positive class
pos_probs = yhat[:, 1]

In [None]:
confusion = confusion_matrix(testy, y_pred)
print('Confusion Matrix\n')
print(confusion)

**Plot the ROC Curve**

The upper-left corner of the plot represents perfect skill. <br>

If a model has no skill at class prediction, its curve will follow the diagonal line from lower left to upper right. <br>

If the performance falls below the diagonal line, it is worse than a no skill model.


In [None]:
# plot no skill roc curve
pyplot.plot([0, 1], [0, 1], linestyle='--', label='No Skill')
# calculate roc curve for model
fpr, tpr, _ = roc_curve(testy, pos_probs)
# plot model roc curve
pyplot.plot(fpr, tpr, marker='.', label='Logistic')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

# Area Under the ROC Curve

In [None]:
Image("images/AUC.png")

The area under the ROC curve is called the “**Area Under the Curve (AUC)**”.

It is used to evaluate the performance of a classification model. <br>
The higher the AUC, the better the model is at distinguishing between classes.

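AUC represents the probability that a random positive (green) example is positioned to the right of a random negative (red) example when examples are ordered by predicted score; in other words, that the model scores the positive example higher than the negative one.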
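This ranking interpretation can be checked directly. The sketch below, using hypothetical labels and scores, counts the fraction of (positive, negative) pairs in which the positive example receives the higher score (ties counted as half) and compares it with scikit-learn's `roc_auc_score`.

```python
from itertools import product
from sklearn.metrics import roc_auc_score

# Hypothetical true labels (1 = positive) and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

pos = [s for y, s in zip(y_true, scores) if y == 1]
neg = [s for y, s in zip(y_true, scores) if y == 0]

# A pair is a "win" when the positive outranks the negative; ties count as half.
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
pairwise_auc = wins / (len(pos) * len(neg))

sklearn_auc = roc_auc_score(y_true, scores)
print(pairwise_auc, sklearn_auc)
```

Here 12 of the 16 positive-negative pairs are ordered correctly, so both calculations give 0.75.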

AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.

AUC is desirable for the following two reasons:

**1. AUC is scale-invariant**. It measures how well predictions are ranked, rather than their absolute values.<br>

**2. AUC is classification-threshold-invariant**. It measures the quality of the model's predictions irrespective of what classification threshold is chosen.

However, both these reasons come with caveats, which may limit the usefulness of AUC in certain use cases:

**Scale invariance is not always desirable**. For example, sometimes we really do need well calibrated probability outputs, and AUC won’t tell us about that.
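To make the calibration caveat concrete, the sketch below distorts a set of hypothetical scores with a strictly increasing transform (cubing). The ranking is unchanged, so AUC stays the same, while the Brier score, one common measure of how close probabilities are to the outcomes, changes. The data and the choice of transform are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

# Hypothetical true labels (1 = positive) and predicted probabilities.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

# Cubing is strictly increasing on [0, 1], so the ordering of examples is preserved.
squashed = scores ** 3

auc_original = roc_auc_score(y_true, scores)
auc_squashed = roc_auc_score(y_true, squashed)
brier_original = brier_score_loss(y_true, scores)
brier_squashed = brier_score_loss(y_true, squashed)

print(f"AUC:   {auc_original:.3f} vs {auc_squashed:.3f}")
print(f"Brier: {brier_original:.3f} vs {brier_squashed:.3f}")
```

AUC cannot tell these two sets of scores apart, even though they would lead to different decisions wherever the probability values themselves matter.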

**Classification-threshold invariance is not always desirable**. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn't a useful metric for this type of optimization.

# How to improve AUC?
Improving AUC generally means improving the overall performance of the classifier. Several measures are worth experimenting with; which ones help depends on the problem and the data.<br>
>(1) Feature normalization and scaling. This mainly improves the performance of linear models such as logistic regression.<br>
(2) Address class imbalance. Many classification problems have imbalanced classes. Setting class weights, or oversampling the minority class or undersampling the majority class, can help.<br>
(3) Optimize other scores. Defining the right score for the problem and optimizing it can improve prediction performance.<br>
(4) Explore different models. Among the candidate classification models, choose the one that performs best on the problem.<br>
(5) Tune the hyperparameters through grid search. Grid search is an automated way to tune hyperparameters.<br>
(6) Error analysis. Go back to the false positive and false negative cases and find the reasons for them.<br>
(7) Include more features or fewer features.<br>
(8) There is also research on optimizing AUC directly by investigating the relationship between AUC and error rate, or between AUC and the model, which leads to a more direct but also more complicated analysis.
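Measures (1), (2), and (5) can be combined in a single experiment. The sketch below is one possible setup, not a recommended recipe: it scales the features, then grid-searches the regularization strength `C` and the `class_weight` setting of a logistic regression, using ROC AUC as the selection score. The dataset size, class ratio, and grid values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced dataset (roughly 90% negative), for illustration only.
X, y = make_classification(n_samples=2000, n_classes=2, n_features=4,
                           weights=[0.9], random_state=1)

# (1) scale the features, then fit a logistic regression.
pipeline = make_pipeline(StandardScaler(),
                         LogisticRegression(solver='lbfgs', max_iter=1000))

# (2) try class weighting and (5) tune C through grid search.
param_grid = {
    'logisticregression__C': [0.01, 0.1, 1.0, 10.0],
    'logisticregression__class_weight': [None, 'balanced'],
}

# Select the combination with the best cross-validated ROC AUC.
search = GridSearchCV(pipeline, param_grid, scoring='roc_auc', cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated ROC AUC: {search.best_score_:.3f}")
```

Whether any of these settings actually improves AUC depends on the data, so the cross-validated score should be compared against an untuned baseline.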

High AUC means your algorithm does a good job at *ranking* the test data, with most negative cases at one end of a scale and positive cases at the other.

https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5


To visualize how AUC is affected by different levels of separation, the probability distributions of the positive and negative classes are plotted below. As the overlap between the two classes increases, they become harder to separate and AUC decreases, down to 0.5 when the separation is no better than random. <br>

Interestingly, **a classifier whose ROC curve lies in the bottom-right corner, with AUC < 0.5, can become a good one after its predictions are reversed.**
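The effect of reversing predictions is easy to verify. In the sketch below, the hypothetical `bad_scores` rank the classes mostly backwards; replacing each score `s` with `1 - s` flips the ranking, and the AUC becomes one minus its original value.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical true labels (1 = positive).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
# Hypothetical scores from a model that ranks the classes mostly backwards.
bad_scores = [0.9, 0.7, 0.65, 0.6, 0.4, 0.3, 0.2, 0.1]

# Reverse the predictions: a high score becomes a low one and vice versa.
flipped_scores = [1 - s for s in bad_scores]

auc_bad = roc_auc_score(y_true, bad_scores)
auc_flipped = roc_auc_score(y_true, flipped_scores)

print(f"original AUC: {auc_bad:.2f}, reversed AUC: {auc_flipped:.2f}")
```

An AUC of exactly 0.5 gains nothing from reversal, since one minus 0.5 is still 0.5.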



In [None]:
!wget https://sinyi-chou.github.io/images/classification/prob_dist_animation.gif
Image(open('prob_dist_animation.gif','rb').read(), width=600)

In [None]:
Image("images/multiple_ROCs_plot.png", width=600)

AUC is a threshold-free metric that measures the overall performance of a binary classifier.

AUC is defined directly only for binary classification. For multiclass classification, one-vs-rest AUC is an option: compute the AUC of each class against all the others and average the results.
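scikit-learn's `roc_auc_score` supports this averaging through its `multi_class='ovr'` option. The sketch below applies it to a synthetic three-class problem; the dataset parameters are arbitrary assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic three-class dataset, for illustration only.
X, y = make_classification(n_samples=1500, n_classes=3, n_features=6,
                           n_informative=4, random_state=1)
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.5, random_state=2, stratify=y)

model = LogisticRegression(solver='lbfgs', max_iter=1000)
model.fit(trainX, trainy)

# One column of probabilities per class; each row sums to 1.
probs = model.predict_proba(testX)

# Score each class against the rest, then take the unweighted (macro) average.
ovr_auc = roc_auc_score(testy, probs, multi_class='ovr', average='macro')
print(f"one-vs-rest macro AUC: {ovr_auc:.3f}")
```

With `average='macro'` every class counts equally regardless of its size; `average='weighted'` weights each class by its number of examples instead.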

AUC is a good metric when the rank of output probabilities is of interest.

**Although AUC is powerful, it is not a cure-all. AUC is not suitable for heavily imbalanced class distributions or when the goal is to have well-calibrated probabilities.**

A model that maximizes AUC gives equal weight to the positive and the negative class.
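For heavily imbalanced data, it can help to report a precision-recall summary next to ROC AUC. The sketch below rebuilds a dataset with the same settings used earlier in this notebook and compares `roc_auc_score` with `average_precision_score` (a summary of the precision-recall curve). The stratified split is an added assumption so that both halves contain minority examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced dataset: roughly 99% of examples belong to the negative class.
X, y = make_classification(n_samples=10000, n_classes=2, n_features=4,
                           random_state=1, weights=[0.99])
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.5, random_state=2, stratify=y)

model = LogisticRegression(solver='lbfgs')
model.fit(trainX, trainy)
pos_probs = model.predict_proba(testX)[:, 1]

roc_auc = roc_auc_score(testy, pos_probs)
avg_precision = average_precision_score(testy, pos_probs)
# A no-skill model's average precision is about the positive-class fraction.
baseline = testy.mean()

print(f"ROC AUC:           {roc_auc:.3f} (no-skill baseline 0.5)")
print(f"Average precision: {avg_precision:.3f} (no-skill baseline {baseline:.3f})")
```

On data this skewed, the two numbers can tell quite different stories, which is why ROC AUC alone may look optimistic.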