# ROC curve. What is it and how can you interpret it?


<!-- We will say that a binary classifier is an algorithm that classifies elements from a set into one of two groups on the basis of a classification rule. For instance, a machine learning algorithm that classifies if an email is spam or not is a binary classifier. Another example would be a medical test that says if a patient is positive or negative to a certain disease acts as a binary classifier. For the rest of the notebook, we will stick to only classifying if an element from the set is positive or negative. -->

Learning Objectives for this Jupyter Notebook:

* LO1: Define ROC curves and recall their purpose in evaluating classification algorithms.
* LO2: Interpret ROC curves by understanding how changes in threshold values affect sensitivity and specificity.


Before explaining the ROC, let's clarify some terms!

## Classifiers

Let's start with classification problems involving two classes. Each instance is assigned either a positive or negative class label. A *classification model*, also called a *classifier*, maps instances to predicted classes. 
<!-- Some models give a continuous output, like an estimate of an instance's class membership probability. We can set different thresholds to predict class membership. Other models directly assign a discrete class label, indicating the predicted class of the instance. -->

For instance, a machine learning algorithm that decides whether an email is spam or not is a classifier. Another example is a medical test that reports whether a patient is positive or negative for a certain disease; it acts as a binary classifier. For the rest of the notebook, we will only classify elements as positive or negative.

In [None]:
# Example of a binary classifier. Just run the cell and see what is classified.
# Imagine you want to classify if a patient is positive (has diabetes) or
# negative (does not have diabetes) based on their glucose concentration

glucose_concentration = [116,141,200,239,103,140,98,202,134,129,124]

# 1 means positive and 0 means negative
actual_instance = [1,1,1,1,0,0,0,1,0,1,1]

threshold = 140
predicted_instance = [1 if glucose > threshold else 0 for glucose in glucose_concentration]
print("actual instances:    ", actual_instance)
print("predicted instances: ", predicted_instance)

From the example above, we can see that some patients were correctly classified as positive (diabetic) or negative (not diabetic), whereas others were misclassified.


## Classification Outcomes
Given a classifier and an instance, there are four possible outcomes:

1. ***True Positive (TP)***: The instance is positive (e.g., the patient has diabetes) and is correctly classified as positive.
2. ***False Negative (FN)***: The instance is positive but is incorrectly classified as negative.
3. ***True Negative (TN)***: The instance is negative (e.g., the patient does not have diabetes) and is correctly classified as negative.
4. ***False Positive (FP)***: The instance is negative but is incorrectly classified as positive.


## Confusion Matrix
Given a classifier and a set of instances (the test set), we can construct a two-by-two confusion matrix (also called a contingency table) to represent the outcomes for the set of instances. This matrix forms the basis for many common evaluation metrics.

Here is the structure of a confusion matrix:


![ROC_Confusion_Matrix.png](Images/ROC_Confusion_Matrix.png)
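
As a minimal sketch of how the four outcomes can be counted and arranged into a two-by-two matrix, the cell below uses a small made-up set of labels. The lists `actual` and `predicted` and the helper `confusion_counts` are hypothetical and not part of this notebook. The layout chosen here (rows = actual class, columns = predicted class) is an assumption and may differ from the figure above.

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, TN, FP for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1      # positive, predicted positive
        elif a == 1 and p == 0:
            counts["FN"] += 1      # positive, predicted negative
        elif a == 0 and p == 0:
            counts["TN"] += 1      # negative, predicted negative
        else:
            counts["FP"] += 1      # negative, predicted positive
    return counts


# Hypothetical toy labels, chosen only for illustration
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)

# Assumed layout: first row = actual positives, second row = actual negatives
matrix = [[counts["TP"], counts["FN"]],
          [counts["FP"], counts["TN"]]]

print(counts)
print(matrix)
```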

## Exercise 1

For the example above, can you calculate the TP, FP, FN, TN?

In [None]:
# Run this cell to see the solution for exercise 1

%run ./Solutions/solution_exercise_1.py

# How can we evaluate a classifier?

There are several metrics that could indicate if a classifier is good or not. Some of them are:

Accuracy = $\frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$


True positive rate = $\frac{\text{Positives correctly classified}}{\text{Total positives}} = \frac{TP}{TP + FN}$


False positive rate = $\frac{\text{Negatives incorrectly classified}}{\text{Total negatives}} = \frac{FP}{TN + FP}$

Sensitivity = True positive rate

Specificity = 1 - False positive rate
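
The formulas above can be sketched in a few lines of Python. The function name `classification_metrics` and the counts passed to it are hypothetical values chosen for illustration. The sketch assumes there is at least one positive and one negative instance, since otherwise the rates would divide by zero.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute the metrics defined above from the four outcome counts."""
    total = tp + fn + tn + fp
    tpr = tp / (tp + fn)   # true positive rate
    fpr = fp / (tn + fp)   # false positive rate
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tpr,
        "fpr": fpr,
        "sensitivity": tpr,        # sensitivity = TPR
        "specificity": 1 - fpr,    # specificity = 1 - FPR
    }


# Hypothetical counts, not taken from the notebook's data
metrics = classification_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```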

# ROC curve

An ROC graph is a two-dimensional graph in which the true positive rate (TPR) is plotted on the Y axis and the false positive rate (FPR) is plotted on the X axis. An ROC curve plots TPR against FPR at different classification thresholds. A classification threshold is the cut-off point applied to a classifier's output to decide which class is predicted for each instance. An ROC graph therefore depicts the relative trade-off between benefits (true positives) and costs (false positives).

Below is an example of an ROC curve:
 
![ROC_picture_gdev.png](Images/ROC_picture_gdev.png)

Here is another example of an ROC curve:

![Graph_from_ML_course.png](Images/Graph_from_ML_course.png)

*Graph taken from the machine learning course CSE2510*
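
To make the role of the threshold concrete, here is a rough sketch that reuses the glucose data from the first code cell and computes one (FPR, TPR) point per threshold. The helper `roc_point` and the list of thresholds are assumptions made for this illustration. As in the first cell, a patient is predicted positive when the glucose value is strictly greater than the threshold.

```python
def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) when score > threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


# Same data as in the first code cell of this notebook
glucose_concentration = [116, 141, 200, 239, 103, 140, 98, 202, 134, 129, 124]
actual_instance = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]

# Hypothetical thresholds; each one yields a single point in ROC space
for threshold in [90, 110, 130, 140, 150, 250]:
    fpr, tpr = roc_point(glucose_concentration, actual_instance, threshold)
    print(f"threshold={threshold:3d}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A very low threshold labels every patient as positive, and a very high one labels every patient as negative, so the points move between the two corners of the plot as the threshold changes.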

# ROC Space

Let's analyze some key points in the ROC space to understand different classifier behaviors:

* **Lower Left Point (0,0)**: This point represents the strategy of never issuing a positive classification. Such a classifier commits no false positive errors but also gains no true positives. Essentially, it predicts all instances as negative.


* **Upper Left Point (0,1)**: This point represents perfect classification. The classifier achieves a true positive rate (TPR) of 1 and a false positive rate (FPR) of 0, meaning it correctly identifies all positive instances without any false positives. This is depicted as point ***D*** in the figure below, indicating perfect performance.


* **Northwest Direction in ROC Space**: Informally, one point in ROC space is better than another if it lies to the northwest of it, meaning it has a higher true positive rate, a lower false positive rate, or both.


* **Diagonal Line (y = x)**: This line represents the strategy of randomly guessing the class. For example, if a classifier guesses the positive class 70% of the time, it is expected to correctly identify 70% of the positives (TPR = 0.7), but it will also label 70% of the negatives as positive (FPR = 0.7), yielding the point (0.7, 0.7) in ROC space. Point ***C*** in the figure below represents such a classifier, which performs essentially like random guessing.


* **Lower Right Triangle**: Any classifier that appears in the lower right triangle performs worse than random guessing. This region is typically empty in ROC graphs. If we negate a classifier—that is, reverse its classification decisions on every instance—its true positive classifications become false negative mistakes, and its false positives become true negatives. Therefore, any classifier producing a point in the lower right triangle can be negated to produce a point in the upper left triangle. Point ***E*** performs much worse than random and is the negation of point ***B***.


* **Classifiers on the Diagonal**: A classifier that lies on the diagonal (y = x) has no information about the class. Its performance is equivalent to random guessing.


* **Classifiers Below the Diagonal**: A classifier below the diagonal may be said to have useful information but is applying it incorrectly. Negating such a classifier can yield a useful classifier with performance above the diagonal.

![ROC_space.png](Images/ROC_space.png)
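
The effect of negating a classifier can be checked with a short sketch. The labels below are made up for illustration, and `roc_point_from_predictions` is a hypothetical helper. Flipping every prediction maps the point (FPR, TPR) to (1 - FPR, 1 - TPR), which moves a point from below the diagonal to above it.

```python
def roc_point_from_predictions(actual, predicted):
    """Return (FPR, TPR) for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    positives = sum(actual)
    negatives = len(actual) - positives
    return fp / negatives, tp / positives


# Hypothetical labels: this classifier is wrong more often than right
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 0, 1, 1, 1, 1, 0]

# Negation: reverse the decision on every instance
negated = [1 - p for p in predicted]

point = roc_point_from_predictions(actual, predicted)
negated_point = roc_point_from_predictions(actual, negated)

print("original:", point)
print("negated: ", negated_point)
```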

## Exercise 2

1. Which classifier is the best among A, B, C, D, and E, based on the ROC space figure above?
2. How is a point in the ROC space determined to be better than another point?
3. What does the point (1,1) represent in the ROC space?

In [None]:
# Run this cell to see the solution for exercise 2

%run ./Solutions/solution_exercise_3.py

# Let's see some ROC curves

Let's look at a simple example. Imagine we have a binary classifier for detecting whether someone has Covid. We look at only two "features" (or characteristics) of a patient: their temperature and their oxygen level. Below is a plot of the patients represented as data points. The red points represent infected people and the blue points represent non-infected people.

![Covid_distribution.png](Images/Covid_distribution.png)

We need to distinguish the positive points from the negative ones. We could use a ***decision boundary***.
 
A decision boundary is like a rule or line that helps you decide which category (positive or negative) a new patient belongs to based on their characteristics. Unfortunately, there is no line that perfectly separates the positives from the negatives.

Let's use a simple binary classifier that determines whether a patient is positive or negative. This classification model uses the decision boundary line y = x - threshold. If a point is above the line, it is classified as negative; otherwise it is classified as positive. The figure below contains two plots: the left one displays the data points with the decision boundary, and the right one displays the ROC curve with the operating point.

![covid_static.png](Images/covid_static.png)
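
The rule described above can be sketched as follows. The notebook does not state which feature is on which axis, so the points here are generic, made-up (x, y) pairs, and the function `classify` is a hypothetical helper rather than the code used to produce the figure.

```python
def classify(x, y, threshold):
    """Return 0 (negative) if the point is above y = x - threshold, else 1 (positive)."""
    return 0 if y > x - threshold else 1


# Hypothetical data points in an arbitrary two-feature space
points = [(1.0, 3.0), (2.0, 0.5), (4.0, 1.0), (3.0, 3.5)]

for threshold in [1.0, -3.0]:
    predictions = [classify(x, y, threshold) for x, y in points]
    print(f"threshold={threshold:5.1f}  predictions={predictions}")
```

Changing the threshold shifts the line up or down, which changes how many points fall on the positive side and therefore moves the operating point along the ROC curve.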

# How can we evaluate an ROC curve?

To compare classifiers, we may want to reduce ROC performance to a single scalar value representing expected performance. A common method is to calculate the *area under the ROC curve*, abbreviated ***AUC***.

Since the AUC is a portion of the area of the unit square, its value is always between 0 and 1. However, random guessing produces the diagonal line between (0,0) and (1,1), and the area under that line is 0.5, so no realistic classifier should have an AUC less than 0.5.

The AUC has an important statistical property: the AUC of a classifier is equivalent to the *probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance*.

![AUC_ML.png](Images/AUC_ML.png)

Remember! The AUC offers a convenient summary metric to compare overall performance across different classifiers. 
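
The ranking interpretation of the AUC can be checked numerically. The sketch below reuses the glucose data from the first code cell and computes the AUC in two ways: as the fraction of (positive, negative) pairs in which the positive has the higher score, and as the trapezoidal area under the ROC points. Both function names are hypothetical, and the trapezoid version treats a score greater than or equal to the threshold as positive, which is an assumption of this sketch.

```python
def auc_by_ranking(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_trapezoid(scores, labels):
    """Area under the ROC points obtained by sweeping over all scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # trapezoid between two points
    return area


# Same data as in the first code cell of this notebook
glucose_concentration = [116, 141, 200, 239, 103, 140, 98, 202, 134, 129, 124]
actual_instance = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]

print("AUC (ranking):  ", auc_by_ranking(glucose_concentration, actual_instance))
print("AUC (trapezoid):", auc_by_trapezoid(glucose_concentration, actual_instance))
```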

## Exercise 3

For the following ROC curve, calculate the AUC.

![AUC_Calculate.png](Images/AUC_Calculate.png)

In [None]:
# Run this cell to see the solution for exercise 3

%run ./Solutions/solution_exercise_4.py

## Exercise 4

We have seen that, for a binary classifier, we can use accuracy or the ROC curve to evaluate its performance. However, the ROC curve is often preferred to accuracy. Can you come up with an example of a binary classifier with a high accuracy but a low TPR?

In [None]:
# Run this cell to see the solution for exercise 4

%run ./Solutions/solution_exercise_5.txt

# Static visualisations for ROC curve

Below are some images of static visualisations for ROC curves.

The class distribution plot shows the distributions of the positive (orange) and negative (black) instances. Both are normal distributions, and the sliders represent the parameters of each distribution. The green classification threshold splits the instances into positive and negative: everything to the left of the threshold is classified as negative, and everything to the right as positive.


The ROC curve is displayed on the right side of the plot. The green operating point is the point on the ROC curve that corresponds to the current threshold. The Confusion Matrix plot displays the four classification outcomes for that threshold. Finally, the AUC plot illustrates the area under the curve and the Accuracy plot shows the accuracy of the model.

![static_1.png](Images/static_1.png)
![static_2.png](Images/static_2.png)
![static_3.png](Images/static_3.png)
![static_4.png](Images/static_4.png)
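
For two normally distributed classes, the operating point at a given threshold can be computed directly from the cumulative distribution functions. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The means and standard deviations are hypothetical values, not the ones used in the images above.

```python
from statistics import NormalDist

# Hypothetical class distributions (parameters chosen only for illustration)
negative = NormalDist(mu=0.0, sigma=1.0)
positive = NormalDist(mu=2.0, sigma=1.0)


def rates_at(threshold):
    """Return (FPR, TPR) when values to the right of the threshold are positive."""
    tpr = 1.0 - positive.cdf(threshold)   # share of positives right of the threshold
    fpr = 1.0 - negative.cdf(threshold)   # share of negatives right of the threshold
    return fpr, tpr


for threshold in [-1.0, 0.0, 1.0, 2.0, 3.0]:
    fpr, tpr = rates_at(threshold)
    print(f"threshold={threshold:4.1f}  FPR={fpr:.3f}  TPR={tpr:.3f}")
```

Moving the threshold to the right lowers both rates, which traces the ROC curve from the upper right corner towards the lower left corner.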

# Summary

We've covered receiver operating characteristic (ROC) curves in some depth. We learned that they plot how often we mistakenly assign a positive label (the false positive rate) against how often we correctly assign a positive label (the true positive rate). Each point on the graph corresponds to one threshold that was applied.

We also saw how the area under the curve (AUC) can give us an idea of how reliant our model is on having the perfect decision threshold. It is also a handy measure for comparing two models with one another. Congratulations!