# ROC-AUC

```{figure} https://upload.wikimedia.org/wikipedia/commons/3/36/Roc-draft-xkcd-style.svg
:align: center
```

ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) is a commonly used evaluation metric for binary classification models. 

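It measures how well a classifier separates the positive class from the negative class. The two rates it is built from (defined below) are each computed within a single class, so the score is not directly affected by the ratio of positives to negatives. This makes it more informative than plain accuracy on imbalanced datasets, although a precision-recall curve can be more revealing when positives are very rare.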

The ROC-AUC score quantifies the area under the ROC curve, which plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various classification thresholds.


1. True Positive Rate (TPR)
    $$
    \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}
    $$
    
2. False Positive Rate (FPR)
    $$
    \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}
    $$
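
To make the two formulas concrete, the sketch below counts the four confusion-matrix cells at a single threshold and plugs them into the TPR and FPR definitions. The helper `tpr_fpr`, the rule "predict positive when score >= threshold", and the example scores are illustrative assumptions. They are not fixed by the metric itself.

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR), predicting positive when score >= threshold.

    Hypothetical helper for illustration. Assumes 0/1 labels and that both
    classes are present (otherwise a denominator would be zero).
    """
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1  # positive, correctly flagged
        elif label == 1:
            fn += 1  # positive, missed
        elif predicted_positive:
            fp += 1  # negative, wrongly flagged
        else:
            tn += 1  # negative, correctly rejected
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels and scores, chosen so the threshold makes one mistake per class
y_true = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.2, 0.45, 0.7, 0.5, 0.9, 0.1, 0.6, 0.4, 0.2, 0.75]

tpr, fpr = tpr_fpr(y_true, scores, threshold=0.5)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")  # TPR = 0.80, FPR = 0.20
```

Lowering the threshold flags more samples as positive, so both rates can only stay the same or rise. Raising it can only lower them. That trade-off is what the ROC curve traces.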
 

Here's how ROC-AUC works:

1. Receiver Operating Characteristic (ROC) Curve:

    * The ROC curve is created by plotting the True Positive Rate (TPR) on the y-axis and the False Positive Rate (FPR) on the x-axis at different classification thresholds.
    
    * The ROC curve visually illustrates the trade-off between TPR and FPR as you adjust the classification threshold.
    
    <br>
    
2. Area Under the Curve (AUC):
    * The ROC-AUC score quantifies the area under the ROC curve.
    * A perfect classifier has an AUC of 1, while a random classifier has an AUC of 0.5.
    * An AUC value greater than 0.5 indicates that the model is better than random at distinguishing between the classes.

    <br>

3. Interpretation:
    * A higher ROC-AUC score generally indicates a better classifier.
    * The ROC-AUC score summarizes the ROC curve across all thresholds, so it does not depend on any single threshold choice. This makes it useful for comparing different classifiers or models.
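
The curve-building and area steps above can be sketched by hand. Treat each distinct score as a threshold, collect the resulting (FPR, TPR) points, and integrate them with the trapezoidal rule. This is a minimal NumPy sketch that assumes the same "score >= threshold means positive" rule. The function names and scores are hypothetical, and scikit-learn's `roc_curve` and `roc_auc_score` are the practical choice.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return FPR and TPR arrays using every distinct score as a threshold.

    A sample is predicted positive when its score >= threshold. The curve
    starts at (0, 0), i.e. a threshold above every score.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)

    fpr, tpr = [0.0], [0.0]
    # Walk thresholds from the highest score down to the lowest;
    # the lowest threshold labels everything positive, giving (1, 1).
    for threshold in np.unique(scores)[::-1]:
        predicted_positive = scores >= threshold
        tpr.append(np.sum(predicted_positive & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_positive & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Hypothetical labels and scores for illustration
y_true = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.2, 0.45, 0.7, 0.5, 0.9, 0.1, 0.6, 0.4, 0.2, 0.75]

fpr, tpr = roc_points(y_true, scores)
print(f"AUC (trapezoidal) = {trapezoid_auc(fpr, tpr):.4f}")  # AUC (trapezoidal) = 0.9600
```

Tied scores share a single threshold, so the curve takes a diagonal step through them. scikit-learn's `roc_curve` also groups tied scores this way.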

<div>
<img src="attachment:image.png" width="500"/>
</div>



Here's how to compute the ROC-AUC score using Python and scikit-learn:

```python
from sklearn.metrics import roc_auc_score

# True labels and predicted probabilities for the positive class
true_labels = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
predicted_probabilities = [0.2, 0.8, 0.7, 0.3, 0.9, 0.1, 0.6, 0.4, 0.2, 0.75]

# Calculate the ROC-AUC score
roc_auc = roc_auc_score(true_labels, predicted_probabilities)

print(f"ROC-AUC Score: {roc_auc:.4f}")
```

```
ROC-AUC Score: 1.0000
```
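
ROC-AUC also has a pairwise reading. It equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below checks this by brute force over all (positive, negative) pairs. `auc_by_pairs` is a hypothetical helper, not a scikit-learn function. In the toy data above, every positive (lowest score 0.6) outscores every negative (highest score 0.4), which is why the score is exactly 1.

```python
from itertools import product


def auc_by_pairs(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


true_labels = [0, 1, 1, 0, 1, 0, 1, 0, 0, 1]
predicted_probabilities = [0.2, 0.8, 0.7, 0.3, 0.9, 0.1, 0.6, 0.4, 0.2, 0.75]

print(auc_by_pairs(true_labels, predicted_probabilities))  # 1.0
# A constant score ties every pair, which gives the "random" baseline
print(auc_by_pairs(true_labels, [0.5] * 10))  # 0.5
```

The pairwise loop is quadratic in the number of samples, so it is only a cross-check for small data. The sorted-threshold approach behind `roc_auc_score` scales much better.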


One more example, this time scoring a logistic regression model trained on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Generate a synthetic dataset for demonstration (replace with your data)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a binary classification model (e.g., Logistic Regression)
model = LogisticRegression(random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)

# Predicted probability of the positive class (column 1) for each test sample
y_probabilities = model.predict_proba(X_test)[:, 1]

# Calculate the ROC-AUC score
roc_auc = roc_auc_score(y_test, y_probabilities)

print(f"ROC-AUC Score: {roc_auc:.4f}")
```

```
ROC-AUC Score: 0.9142
```
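
ROC-AUC depends only on how the scores rank the test examples, so any strictly increasing transformation of the scores leaves it unchanged. For a binary `LogisticRegression`, `predict_proba(...)[:, 1]` is the logistic sigmoid of `decision_function(...)`, so both should give the same score up to floating-point rounding. The sketch below rebuilds the same synthetic setup. It also computes the area from `roc_curve` plus `auc`, which exposes the curve's points if you want to plot them. Variable names such as `margin_scores` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic setup as the example above
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression(random_state=42).fit(X_train, y_train)

proba_scores = model.predict_proba(X_test)[:, 1]  # probabilities in [0, 1]
margin_scores = model.decision_function(X_test)   # unbounded log-odds

auc_from_proba = roc_auc_score(y_test, proba_scores)
auc_from_margin = roc_auc_score(y_test, margin_scores)

# roc_curve returns the (FPR, TPR) points; auc integrates them with the trapezoidal rule
fpr, tpr, thresholds = roc_curve(y_test, proba_scores)
auc_from_curve = auc(fpr, tpr)

print(f"predict_proba:     {auc_from_proba:.4f}")
print(f"decision_function: {auc_from_margin:.4f}")
print(f"roc_curve + auc:   {auc_from_curve:.4f}")
```

This is also why the ROC-AUC score can be computed from raw model margins. No calibrated probabilities are required, only a consistent ranking.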
