In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["savefig.dpi"] = 300
plt.rcParams["savefig.bbox"] = 'tight'
import sklearn
sklearn.set_config(print_changed_only=True)
np.set_printoptions(precision=3, suppress=True)
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import scale, StandardScaler

### Step 1: An Implementation of Confusion Matrix
- Load the breast cancer dataset using `load_breast_cancer()` from `from sklearn.datasets import load_breast_cancer`
- Split the data into train and test datasets
- Fit a logistic regression model
- Predict the target variable values for the test dataset
- Create the confusion matrix using `confusion_matrix` from `from sklearn.metrics import confusion_matrix`
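- Plot the confusion matrix using `plot_confusion_matrix` from `from sklearn.metrics import plot_confusion_matrix` (this function was removed in scikit-learn 1.2; on newer versions use `ConfusionMatrixDisplay` instead)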
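
The steps above can be sketched as follows. This is one possible solution, not the reference one: the `StandardScaler` step, the stratified split, and `random_state=0` are assumptions added so the solver converges and the result is reproducible. The plot uses `ConfusionMatrixDisplay`, which works on scikit-learn versions that no longer ship `plot_confusion_matrix`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0
)

# Scaling is an assumption of this sketch; it helps the default solver converge.
logreg = make_pipeline(StandardScaler(), LogisticRegression())
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Draw the same matrix as a labelled heat map.
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=data.target_names)
disp.plot()
```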

### Step 2: Problems with accuracy and unbalanced data
- Create a numpy array of 100 zeros, set the first 10 elements to 1, and name it `y_true`. These are the true labels.
- Create a numpy array of 100 zeros and name it `y_pred_1`. This will be the first prediction set for the true labels.
- Copy `y_true` to a new variable called `y_pred_2` and set the elements between 10 and 20 to 1. This will be the second prediction set for the true labels.
- Copy `y_true` to a new variable called `y_pred_3` and replace the elements between 5 and 15 with 1 minus their current value, which flips those labels. This will be the third prediction set for the true labels.

- Check the accuracy score for each prediction set using `accuracy_score` from `sklearn.metrics`

- Create the confusion matrix and plot it for each prediction set

- Print the classification report for each prediction set using `classification_report` from `sklearn.metrics`
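
A minimal sketch of this step is shown below. It assumes that "between 10 and 20" means the slice `[10:20]` and "between 5 and 15" means `[5:15]`, and that the first 10 true labels are the positive class. The `zero_division=0` argument is an addition that silences the warning for `y_pred_1`, which never predicts the positive class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 10 positives followed by 90 negatives (assumed reading of the exercise).
y_true = np.zeros(100, dtype=int)
y_true[:10] = 1

y_pred_1 = np.zeros(100, dtype=int)   # always predicts the majority class

y_pred_2 = y_true.copy()
y_pred_2[10:20] = 1                   # adds 10 false positives

y_pred_3 = y_true.copy()
y_pred_3[5:15] = 1 - y_pred_3[5:15]   # flips 5 positives and 5 negatives

predictions = {"y_pred_1": y_pred_1, "y_pred_2": y_pred_2, "y_pred_3": y_pred_3}
accuracies = {}
for name, y_pred in predictions.items():
    accuracies[name] = accuracy_score(y_true, y_pred)
    print(name, "accuracy:", accuracies[name])
    print(confusion_matrix(y_true, y_pred))
    print(classification_report(y_true, y_pred, zero_division=0))
```

All three prediction sets get 90 of 100 labels right, so accuracy alone cannot tell them apart; the confusion matrices and per-class precision and recall can.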

### Step 3: Impact of Changing the Prediction Threshold
- Load the breast cancer data set
- Split the data into train and test datasets
- Fit a logistic regression model, predict the test dataset and print the classification report
- Make another prediction using a threshold of 0.85 and print the classification report. Observe the difference.
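
One way to apply a custom threshold is to compare the predicted probability of class 1 against 0.85 instead of calling `predict`, which for a binary logistic regression corresponds to a 0.5 cutoff. The sketch below assumes the threshold applies to class 1 (benign in this dataset); the scaler and `random_state=0` are additions for convergence and reproducibility.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

logreg = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

# Default decision rule: class 1 when its probability exceeds 0.5.
y_pred_default = logreg.predict(X_test)
print(classification_report(y_test, y_pred_default))

# Stricter rule: class 1 only when its probability exceeds 0.85.
proba_pos = logreg.predict_proba(X_test)[:, 1]
y_pred_085 = (proba_pos > 0.85).astype(int)
print(classification_report(y_test, y_pred_085))
```

Raising the threshold can only remove class 1 predictions, so recall for class 1 cannot increase while its precision typically does.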

### Step 4: Precision recall curves
- Run the first cell below to import the mammography data set
- Split the dataset into train and test datasets
- Fit an `SVC()` model on the train dataset
- Test the model on the test dataset
- Now create a pipeline of a standard scaler and SVC with `C=1000` and `gamma=0.01`
- Fit the model on the train dataset and test it on the test dataset. Can you see any difference?
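
The comparison between a default `SVC()` and a scaled pipeline might look like the sketch below. The mammography data is loaded in a separate cell, so this example substitutes a hypothetical imbalanced dataset from `make_classification`; the sample count, feature counts, and class weights are illustrative assumptions, not properties of the real data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the mammography data (assumption): roughly 5% positives.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Default SVC on the raw features.
svc = SVC().fit(X_train, y_train)
svc_score = svc.score(X_test, y_test)
print("SVC accuracy:", svc_score)

# Scaled features with the parameters given in the exercise.
scaled_svc = make_pipeline(StandardScaler(), SVC(C=1000, gamma=0.01))
scaled_svc.fit(X_train, y_train)
scaled_score = scaled_svc.score(X_test, y_test)
print("Scaled SVC accuracy:", scaled_score)
```

Because the classes are imbalanced, `score` (accuracy) can look high for both models even when the minority class is handled poorly, which motivates the precision-recall analysis that follows.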

- Now create a grid search, with `C=np.logspace(-3,3,7)` and `gamma=np.logspace(-6,0,7)`
- When creating the grid search, use `scoring='average_precision'`
- Fit the grid object on the train dataset
- Print the best parameter values and the best score
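
A possible grid search setup is sketched below. It assumes the search runs over the scaler plus SVC pipeline, so the parameter names carry the `svc__` prefix that `make_pipeline` generates; it again uses a hypothetical synthetic dataset in place of the mammography data, and `cv=5` is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for the mammography data (assumption): roughly 5% positives.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

param_grid = {
    "svc__C": np.logspace(-3, 3, 7),
    "svc__gamma": np.logspace(-6, 0, 7),
}

# average_precision ranks models by the area under the precision-recall curve,
# computed from the SVC decision function rather than hard predictions.
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid,
    scoring="average_precision",
    cv=5,
)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best average precision:", grid.best_score_)
```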

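- Use `from sklearn.metrics import plot_precision_recall_curve, precision_recall_fscore_support` (`plot_precision_recall_curve` was removed in scikit-learn 1.2; on newer versions use `PrecisionRecallDisplay` instead)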
- Plot the precision and recall curve for the SVC model you created above
- Find the precision and recall for your prediction of the test dataset and mark it on your precision and recall curve

- Repeat the step above for a random forest classifier. Choose `max_features=2`
- Show both SVC and random forest classifier curves on the same plot

### Step 5: ROC Curve
- Use `from sklearn.metrics import plot_roc_curve` (this function was removed in scikit-learn 1.2; on newer versions use `RocCurveDisplay` instead)
- Create the confusion matrix for both SVC and RF models
- Plot the ROC curve for both classifiers

### Step 6: Confusion Matrix for Multi-class
- Load the digits dataset using `from sklearn.datasets import load_digits`
- Split the dataset into train and test datasets
- Fit a logistic regression model on the train dataset
- Make predictions for the test dataset
- Print the accuracy, the confusion matrix, and the classification report for the predictions
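
For the multi-class case the same metric functions apply unchanged; the confusion matrix simply grows to one row and column per digit. In the sketch below, the scaler, `max_iter=1000`, the stratified split, and `random_state=0` are assumptions added for convergence and reproducibility.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
# Off-diagonal entries show which digits are mistaken for which.
print(cm)
# Per-digit precision, recall, and F1.
print(classification_report(y_test, y_pred))
```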

### Step 7: ROC AUC with cross-validation
- Run the cell below to create a random dataset of blobs
- Split the dataset into train and test datasets
- Use cross validation with SVC
- Use cross validation with `scoring='roc_auc'`
- Print the scores for the regular cross validation and cross validation with ROC
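
The two cross-validation runs differ only in the `scoring` argument. The sketch below generates its own blobs because the data cell is not shown here; the blob sizes, `cluster_std` values, `random_state`, and `cv=5` are hypothetical choices for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

# Hypothetical imbalanced blobs: 500 points in class 0, 100 in class 1.
X, y = make_blobs(n_samples=(500, 100), cluster_std=[7.0, 2.0], random_state=22)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Default scoring for a classifier is accuracy.
acc_scores = cross_val_score(SVC(), X_train, y_train, cv=5)

# roc_auc scores the ranking produced by decision_function instead.
auc_scores = cross_val_score(SVC(), X_train, y_train, cv=5, scoring="roc_auc")

print("Accuracy per fold:", acc_scores)
print("ROC AUC per fold: ", auc_scores)
```

Comparing the two arrays shows how a threshold-independent metric can tell a different story from accuracy on imbalanced data.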