# Diagnostic Curves

This notebook contains additional problems on diagnostic curves for classifiers. This material corresponds to `Lectures/Supervised Learning/Classification/5. Diagnostic Curves`.

In [3]:
## For data handling
import pandas as pd
import numpy as np

## For plotting
import matplotlib.pyplot as plt
import seaborn as sns

## This sets the plot style
## to have a grid on a white background
sns.set_style("whitegrid")

##### 1. Cancer ROC curve

Build a logistic regression classifier on the Wisconsin cancer data set in `sklearn`.

Create a validation set then plot the ROC curve for this classifier on that set. Use this, and any other plots you would like, to choose a probability cutoff for the classifier.
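As a rough illustration of what goes into an ROC curve, the sketch below sweeps a grid of probability cutoffs and records the true positive rate and false positive rate at each one. The function name `roc_points` and the toy arrays `toy_y` and `toy_prob` are hypothetical and are not the cancer data. The last line picks a cutoff by maximizing TPR minus FPR (Youden's J statistic), which is only one possible criterion and may not be the right one when false negatives are much more costly than false positives.

```python
import numpy as np


def roc_points(y_true, prob_pos, cutoffs):
    """Return (fprs, tprs), predicting class 1 when prob_pos >= cutoff."""
    y_true = np.asarray(y_true)
    prob_pos = np.asarray(prob_pos)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fprs, tprs = [], []
    for cutoff in cutoffs:
        pred = (prob_pos >= cutoff).astype(int)
        # True positives and false positives at this cutoff
        tp = np.sum((pred == 1) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        tprs.append(tp / n_pos)
        fprs.append(fp / n_neg)
    return np.array(fprs), np.array(tprs)


# Hypothetical toy labels and predicted probabilities (not the cancer data)
toy_y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
toy_prob = np.array([0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.95])

cutoffs = np.linspace(0, 1, 101)
fprs, tprs = roc_points(toy_y, toy_prob, cutoffs)

# One possible cutoff rule: maximize Youden's J = TPR - FPR
best_cutoff = cutoffs[np.argmax(tprs - fprs)]
```

For the exercise itself, the same function could be called with the validation labels and `lgr_pred_prob[:, 1]`, and `plt.plot(fprs, tprs)` would then draw the curve.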

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

In [2]:
cancer = load_breast_cancer()

X = cancer['data']
y = cancer['target']

## sklearn codes malignant as 0 and benign as 1,
## so flip the labels so that
## 1 is malignant and
## 0 is benign
y = 1 - y

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                       shuffle = True,
                                                       random_state = 232,
                                                       test_size=.2,
                                                       stratify = y)

In [4]:
xtr_train, xtr_val, ytr_train, ytr_val = train_test_split(X_train, y_train,
                                                          shuffle = True,
                                                          random_state = 232,
                                                          test_size=.2,
                                                          stratify = y_train)

In [5]:
## code here
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix


In [7]:
## code here
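## Scale the features, fitting the scaler on the training split only
scale = StandardScaler()
xtr_train_scaled = scale.fit_transform(xtr_train)

## Fit the logistic regression on the scaled training data
lgr = LogisticRegression()

lgr.fit(xtr_train_scaled, ytr_train)

## Predicted probabilities on the validation set;
## column 1 holds the probability of class 1 (malignant)
lgr_pred_prob = lgr.predict_proba(scale.transform(xtr_val))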

In [None]:
## code here
tprs = []
fprs = []





In [None]:
## code here




In [None]:
## code here




In [None]:
## code here




##### 2. Cancer Gains/Lift Curve

<i>Note that this question is a hypothetical exercise and not an endorsement of this approach to health care</i>.

Imagine that you work for a clinic looking to provide underserved populations with free healthcare. This clinic has the resources to provide free breast cancer treatment to $x\%$ of its care population. Ideally, they would like to target this care to those whose scans indicate they most likely need it. A gains and lift chart could help clinic managers estimate what percent of the people who need care would receive it if the clinic treated the individuals with the highest predicted probability of a malignant tumor.


Using the logistic regression model you fit above, produce a gains and lift chart for this classifier using the validation set.
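A minimal sketch of the calculation behind such a chart follows, under stated assumptions: observations are ranked from highest to lowest predicted probability, gains at each step is the share of all positives found among the top-ranked observations, and lift is gains divided by the fraction of the population targeted (the share that random targeting would be expected to capture). The function name `gains_and_lift`, the `n_bins` parameter, and the toy arrays are hypothetical, and the lecture's exact definitions may differ slightly.

```python
import numpy as np


def gains_and_lift(y_true, prob_pos, n_bins=10):
    """Return (targeted, gains, lift) over n_bins cumulative groups."""
    y_true = np.asarray(y_true)
    prob_pos = np.asarray(prob_pos)
    # Rank observations from highest to lowest predicted probability
    order = np.argsort(-prob_pos, kind="stable")
    y_sorted = y_true[order]
    n = len(y_sorted)
    # Number of observations targeted at each cumulative step
    counts = np.ceil(np.arange(1, n_bins + 1) * n / n_bins).astype(int)
    targeted = counts / n
    # Positives captured among the top-ranked observations
    captured = np.array([y_sorted[:c].sum() for c in counts])
    gains = captured / y_true.sum()
    # Lift compares the model's gains with random targeting
    lift = gains / targeted
    return targeted, gains, lift


# Hypothetical toy labels and predicted probabilities (not the cancer data)
toy_y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
toy_prob = np.array([0.05, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.95])

targeted, gains, lift = gains_and_lift(toy_y, toy_prob, n_bins=4)
```

In the clinic scenario, `targeted` plays the role of $x\%$: reading `gains` at that value estimates the share of malignant cases that would receive care. For the exercise, the inputs would be the validation labels and `lgr_pred_prob[:, 1]`.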

In [None]:
## code here




In [None]:
## code here




In [None]:
## code here




In [None]:
## code here




In [None]:
## code here




--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2022.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).