# Short explanation of ROC and AUC, and how to apply them to a two-class dataset

````ROC Curves```` summarize the trade-off between the true positive rate and false positive rate for a predictive model using different probability thresholds. 

In simple terms, a predictive model outputs a probability for each sample, and a probability threshold turns that probability into a class label. Each threshold produces its own ````true positive```` and ````false positive predictions````, so evaluating a threshold means building a ````Confusion Matrix```` and counting the true and false values for it. Repeating this for many thresholds yields a large number of ````Confusion Matrices```` that are hard to compare. Instead, the ````Receiver Operating Characteristic (ROC)```` curve can be computed, which summarizes the ````False Positive Rate```` (FPR) and ````True Positive Rate```` (TPR) across all thresholds. These are plotted on the X-axis and Y-axis respectively.

$\text{TPR (Sensitivity)} = \frac{TP}{TP + FN}$

$\text{FPR (1 - Specificity)} = \frac{FP}{TN + FP}$
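As a minimal sketch of the two formulas, the snippet below counts TP, FP, TN and FN for a single threshold and derives TPR and FPR from them. The helper name `rates_at_threshold` and the toy labels and scores are made up for illustration, and the sketch assumes a sample is predicted positive when its score is greater than or equal to the threshold.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (TPR, FPR), predicting positive when score >= threshold (assumed rule)."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    # Assumes at least one positive and one negative label are present.
    tpr = tp / (tp + fn)
    fpr = fp / (tn + fp)
    return tpr, fpr


# Hypothetical toy data: 4 positives and 4 negatives.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

tpr, fpr = rates_at_threshold(y_true, scores, threshold=0.5)
print('TPR = %.2f, FPR = %.2f' % (tpr, fpr))  # TPR = 0.75, FPR = 0.25
```

At a threshold of 0.5, three of the four positives are recovered and one of the four negatives is flagged by mistake.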

Plotting a point for each ````threshold```` and connecting the points with a line gives the ````ROC```` graph. Depending on how many ````False Positives```` can be accepted, one can then choose the ````optimal```` threshold.
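The threshold sweep described above can be sketched in plain Python. The helpers `roc_points` and `pick_threshold` are hypothetical names, the data is a toy example, and the selection rule (highest TPR among thresholds whose FPR stays within a budget) is only one possible reading of "optimal".

```python
def roc_points(y_true, scores):
    """Return (threshold, FPR, TPR) tuples, one per distinct score, highest threshold first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((threshold, fp / n_neg, tp / n_pos))
    return points


def pick_threshold(points, max_fpr):
    """Pick the point with the highest TPR whose FPR does not exceed max_fpr."""
    allowed = [p for p in points if p[1] <= max_fpr]
    # max() keeps the first best entry, i.e. the one with the highest threshold.
    return max(allowed, key=lambda p: p[2])


# Hypothetical toy data: 4 positives and 4 negatives.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

points = roc_points(y_true, scores)
chosen = pick_threshold(points, max_fpr=0.25)
print('threshold = %.2f, FPR = %.2f, TPR = %.2f' % chosen)
```

With an FPR budget of 0.25, this toy example selects the threshold 0.7, which reaches a TPR of 0.75 without any false positives.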

Now we can consider the concept of the ````Area Under the Curve (AUC)````, which measures the whole area under the ````ROC```` curve and takes values between 0.0 and 1.0. ````AUC```` makes it easy to compare ````ROC```` curves. If the ````AUC```` of the ````ROC```` curve of one model is greater than the ````AUC```` of the ````ROC```` curve of another model, one can conclude that the first model is better.
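To make the area idea concrete, here is a small sketch that integrates two ROC curves with the trapezoidal rule and compares the results. The function name `trapezoid_auc` and both sets of curve points are invented for illustration; they are not taken from the models used later.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a ROC curve given as points sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        mean_height = (tpr[i] + tpr[i - 1]) / 2
        area += width * mean_height
    return area


# Hypothetical curve A: rises quickly towards the top-left corner.
fpr_a = [0.0, 0.0, 0.25, 0.5, 1.0]
tpr_a = [0.0, 0.75, 0.75, 1.0, 1.0]

# Hypothetical curve B: the diagonal from (0, 0) to (1, 1).
fpr_b = [0.0, 0.5, 1.0]
tpr_b = [0.0, 0.5, 1.0]

auc_a = trapezoid_auc(fpr_a, tpr_a)
auc_b = trapezoid_auc(fpr_b, tpr_b)
print('AUC A = %.3f, AUC B = %.3f' % (auc_a, auc_b))  # AUC A = 0.906, AUC B = 0.500
```

Curve A encloses more area than the diagonal curve B, which is what the comparison rule above relies on.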

## Creating a dataset with 2000 samples and 2 classes

In [1]:
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=2000, n_classes=2, n_features=10, random_state=8)

## Split the data

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=.75, random_state=8)

## Create and fit classification models: ````RandomForest```` as RF and ````LogisticRegression```` as LR

In [3]:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(max_features=5, n_estimators=500)
rf_model.fit(X_train, y_train)

lr_model = LogisticRegression(solver='lbfgs')
lr_model.fit(X_train, y_train)

## Create prediction probabilities

### No skill prediction

In [4]:
ns_probs = [0 for _ in range(len(y_test))]

### ````RandomForest```` prediction

In [5]:
rf_probs = rf_model.predict_proba(X_test)

### ````LogisticRegression```` prediction

In [6]:
lr_probs = lr_model.predict_proba(X_test)

### Keep probabilities for the positive outcome only

In [7]:
lr_probs = lr_probs[:, 1]
rf_probs = rf_probs[:, 1]

## Calculate ````AUC```` scores

In [8]:
from sklearn.metrics import roc_curve, roc_auc_score

ns_auc = roc_auc_score(y_test, ns_probs)
rf_auc = roc_auc_score(y_test, rf_probs)
lr_auc = roc_auc_score(y_test, lr_probs)

## Printing AUC scores:

In [9]:
print('No-Skill Prediction: AUROC = %.3f' % (ns_auc))
print('Random Forest: AUROC = %.3f' % (rf_auc))
print('LogisticRegression: AUROC = %.3f' % (lr_auc))

No-skill prediction: every sample gets the same score, so the classes cannot be separated and the AUC score is 0.5, the same as random guessing.
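One way to see why a constant score lands on 0.5 is the pairwise view of AUC: the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. The sketch below uses a hypothetical helper `auc_by_pairs` and toy labels; it illustrates this equivalent formulation and is not how the notebook computes its scores.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 4 positives and 4 negatives.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]

constant_scores = [0 for _ in range(len(y_true))]  # same idea as ns_probs
perfect_scores = [float(label) for label in y_true]  # positives always score higher

print('Constant scores: AUC = %.3f' % auc_by_pairs(y_true, constant_scores))  # 0.500
print('Perfect scores:  AUC = %.3f' % auc_by_pairs(y_true, perfect_scores))   # 1.000
```

Every pair is a tie under constant scores, so each contributes one half and the total is exactly 0.5.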

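RandomForest scores 0.959 and is the best model, outperforming LogisticRegression, which scores 0.935. The exact RandomForest value can vary slightly between runs because the model is created without a fixed ````random_state````.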

## Calculate ````ROC```` curves

In [10]:
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)

## Plotting the ROC curve

In [13]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
plt.plot(ns_fpr, ns_tpr, linestyle='--', label='No skill prediction (AUC = %0.3f)' % ns_auc)
plt.plot(rf_fpr, rf_tpr, marker='.', label='Random Forest (AUC = %0.3f)' % rf_auc)
plt.plot(lr_fpr, lr_tpr, marker='.', label='LogisticRegression (AUC = %0.3f)' % lr_auc)


plt.title('ROC Plot')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()  
plt.show()