# Area under the ROC curve

We will follow an example from the scikit-learn documentation: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn import datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import KNeighborsClassifier

In [None]:
# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
print(y)

In [None]:
# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)

In [None]:
y_test[:5]

In the next cell we one-hot encode the target with label_binarize.

In [None]:
y_test = label_binarize(y_test, classes=[0, 1, 2])
n_classes = y_test.shape[1]
print(y_test[:5])
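To make the encoding concrete, here is a minimal sketch on a made-up label vector (the `toy_labels` array is hypothetical and not part of the iris split). It shows that each label becomes a row with a single 1 in the column of its class.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical toy labels, chosen only for illustration
toy_labels = np.array([2, 0, 1, 2])

# One column per entry in `classes`, in the order given
toy_onehot = label_binarize(toy_labels, classes=[0, 1, 2])

print(toy_onehot)        # one row per sample, one column per class
print(toy_onehot.shape)  # (n_samples, n_classes)
```

Note that with only two classes, label_binarize returns a single column rather than two, which is worth keeping in mind for a binary problem.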

In [None]:
n_classes

Pay close attention to the next cell:

Instead of predict, we use predict_proba, which returns one probability per class. The ROC curve is built by sweeping a decision threshold over these scores, so hard class labels would not be enough.

In [None]:
# Learn to predict each class
classifier = KNeighborsClassifier(n_neighbors=50)
classifier.fit(X_train, y_train)
y_score = classifier.predict_proba(X_test)
print(y_score[:5])
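The following sketch inspects what predict_proba returns for this kind of model. It rebuilds the same split and classifier under different variable names (`knn`, `proba`, `counts` are illustrative names, not part of the notebook) and assumes the default uniform neighbor weights, in which case each probability is the fraction of the 50 neighbors that belong to that class.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=50)  # default weights="uniform"
knn.fit(Xa_train, ya_train)

proba = knn.predict_proba(Xa_test)  # shape (n_samples, n_classes)

# Each row is a probability distribution over the classes
row_sums = proba.sum(axis=1)

# With uniform weights, probability * k is the neighbor count per class
counts = proba * 50

print(proba.shape)
print(row_sums[:5])
print(counts[:5])
```

Because the scores can only take 51 distinct values (0/50 up to 50/50), the resulting ROC curve has relatively few distinct points.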

For each class, we compute the ROC curve, treating that class as positive and the other two as negative (one-vs-rest).

In [None]:
# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
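To see what roc_curve and auc compute, here is a small sketch on four made-up samples (the `toy_true` and `toy_scores` values are hypothetical). Each threshold yields one (FPR, TPR) point, and auc integrates the resulting curve with the trapezoidal rule.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Hypothetical binary labels and scores, for illustration only
toy_true = np.array([0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8])

toy_fpr, toy_tpr, toy_thresholds = roc_curve(toy_true, toy_scores)
toy_auc = auc(toy_fpr, toy_tpr)

print(toy_fpr)  # false positive rate at each threshold
print(toy_tpr)  # true positive rate at each threshold
print(toy_auc)  # → 0.75

# roc_auc_score computes the same area directly from labels and scores
print(roc_auc_score(toy_true, toy_scores))  # → 0.75
```

An area of 0.75 here means that three of the four (positive, negative) pairs are ranked correctly by the scores.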

In [None]:
fpr

In [None]:
for i in range(n_classes):
    plt.figure(figsize=(10,5))
    plt.plot(fpr[i], tpr[i],'o--', label='ROC curve (area = %0.2f)' % roc_auc[i])
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve: class ' + str(i))
    plt.legend(loc="lower right")
    plt.show()
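The three per-class areas can also be summarized in a single number. The sketch below is one possible way to do it, not part of the original example: it assumes the macro average (the plain mean of the per-class areas) is the summary of interest and uses roc_auc_score with multi_class="ovr". All variable names are illustrative, and the split and classifier are rebuilt so the block runs on its own.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import roc_curve, auc, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize

iris = datasets.load_iris()
Xm_train, Xm_test, ym_train, ym_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0
)

knn_m = KNeighborsClassifier(n_neighbors=50).fit(Xm_train, ym_train)
scores_m = knn_m.predict_proba(Xm_test)

# Per-class one-vs-rest areas, computed as in the loop above
ym_test_bin = label_binarize(ym_test, classes=[0, 1, 2])
per_class_auc = []
for k in range(3):
    fpr_k, tpr_k, _ = roc_curve(ym_test_bin[:, k], scores_m[:, k])
    per_class_auc.append(auc(fpr_k, tpr_k))

# Macro-averaged one-vs-rest AUC; takes the original integer labels
macro_auc = roc_auc_score(ym_test, scores_m, multi_class="ovr", average="macro")

print(per_class_auc)
print(macro_auc)
```

The macro average weighs every class equally regardless of how many samples it has; average="weighted" is an alternative when class sizes differ.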

# Exercise

In groups, find a binary classification dataset, train a model, and plot the ROC curve.

Some candidate datasets:
- Titanic
- NBA (lesson 19)
- Breast Cancer Wisconsin
- Heart disease https://www.kaggle.com/ronitf/heart-disease-uci
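As a possible starting point, here is a minimal sketch of the binary case. It assumes the copy of the Breast Cancer Wisconsin dataset bundled with scikit-learn (load_breast_cancer) and an arbitrary choice of n_neighbors=15; both are illustrative choices, and all variable names are hypothetical. In the binary case there is no need for label_binarize: a single ROC curve is computed from the scores of the positive class.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_breast_cancer()
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    data.data, data.target, test_size=0.5, random_state=0
)

# n_neighbors=15 is an arbitrary choice for this sketch
model = KNeighborsClassifier(n_neighbors=15)
model.fit(Xb_train, yb_train)

# Column 1 holds the probability of model.classes_[1], i.e. label 1
pos_scores = model.predict_proba(Xb_test)[:, 1]

fpr_b, tpr_b, thresholds_b = roc_curve(yb_test, pos_scores)
auc_b = auc(fpr_b, tpr_b)

print(f"AUC = {auc_b:.3f}")
```

The arrays `fpr_b` and `tpr_b` can then be passed to plt.plot in the same way as in the per-class plotting loop earlier in the notebook.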

Afterwards, we will share and discuss the results as a whole group.