# SI 618 Classification III (Evaluation and Application)
### Dr. Chris Teplovs, School of Information, University of Michigan
Copyright &copy; 2024.  This notebook may not be shared outside of the course without permission.
### Please ensure you have this version:
Version 2024.11.14.2.CT

In this notebook, we will review and dive deeper into the evaluation of classifiers.

First, let's import all the functionality we'll need in this notebook:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier # to compare with logistic regression

from sklearn.metrics import auc, confusion_matrix, classification_report, precision_recall_curve, roc_curve, roc_auc_score

Next, let's read the data file, taken from Kaggle's "Titanic - Machine Learning from Disaster" competition (you might need to adjust the path to point to where you saved the data file):

In [None]:
titanic = pd.read_csv('../data/titanic.csv')

Let's do a train-test split, holding out 20% of the rows as our test set. In some cases below we'll use cross-validation on the training set only. This is unnecessarily limiting, but it's included for demonstration purposes. I'm also dropping many columns to keep the analysis simple, at the expense of accuracy (i.e., I'm throwing away a lot of potentially useful information).

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    # Aggressively drop columns for now.  Note the resultant decrease in accuracy
    titanic.drop(['Survived', 'PassengerId', 'Name', 'Ticket', 'Cabin', 'Embarked'], axis=1), 
    titanic['Survived'], 
    test_size=0.2, 
    random_state=42)  

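Next, let's preprocess our data: impute missing numeric values with the median, then scale the numeric variables. We'll one-hot encode `Sex`, our only remaining string variable. Note that the `ColumnTransformer` below drops any column it is not told about (its default is `remainder='drop'`), so `Pclass`, `SibSp`, and `Parch` are also left out of the model: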

In [None]:
num_pipeline = Pipeline([
    ('impute', SimpleImputer(strategy='median')),  # fill missing values with the column median
    ('scale', StandardScaler()),                   # standardize to mean 0, standard deviation 1
])

In [None]:
preprocessing_pipeline = ColumnTransformer([
    ('num', num_pipeline, ['Age', 'Fare']),
    ('cat', OneHotEncoder(), ['Sex'])])
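To see what these steps do, here is a minimal sketch on a tiny DataFrame whose values are invented for illustration (the names `toy`, `toy_num`, and `toy_pre` are hypothetical). It mirrors the layout above: median imputation then standardization for `Age` and `Fare`, and one-hot encoding for `Sex`. `sparse_threshold=0` is added only so the result is always a dense array that is easy to print.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented toy data; Age has one missing value
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 40.0],
    "Fare": [7.25, 71.28, 8.05, 53.10],
    "Sex": ["male", "female", "female", "male"],
})

toy_num = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # NaN -> median of [22, 30, 40] = 30
    ("scale", StandardScaler()),                   # (x - mean) / std, per column
])
toy_pre = ColumnTransformer(
    [
        ("num", toy_num, ["Age", "Fare"]),
        ("cat", OneHotEncoder(), ["Sex"]),  # categories sorted: female, male
    ],
    sparse_threshold=0,  # always return a dense array
)

out = toy_pre.fit_transform(toy)
# Columns: scaled Age, scaled Fare, Sex=female, Sex=male
print(out.round(3))
```

The imputed `Age` in the second row ends up identical to the real value 30 in the third row, and each scaled column has mean 0.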

Fit and transform the training data, then use the fitted pipeline to transform the test data (i.e., we never call `fit` on the test data):

In [None]:
X_train_prepared = preprocessing_pipeline.fit_transform(X_train)
X_test_prepared = preprocessing_pipeline.transform(X_test)
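Fitting only on the training data matters because the learned statistics (here, the scaler's mean and standard deviation) are part of the model. A minimal sketch with made-up one-feature arrays shows that `transform` rescales the test data with the training statistics, not its own:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the test values are larger than anything seen in training
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[40.0], [50.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean_ and scale_ from train only
test_scaled = scaler.transform(test)        # reuses the training mean and std

print(scaler.mean_, scaler.scale_)  # training mean 20 and population std
print(test_scaled.ravel())          # both values well above 0
```

Had we called `fit_transform` on the test set instead, its two values would be centred on their own mean (45) and come out as -1 and 1, hiding the fact that they lie far outside the training range.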

Let's fit a logistic regression classifier and print its accuracy on the test set:

In [None]:
clf = LogisticRegression()

clf.fit(X_train_prepared, y_train)
print(clf.score(X_test_prepared, y_test))
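For scikit-learn classifiers, `score` returns mean accuracy. The sketch below uses synthetic data from `make_classification` (not the Titanic data) to check that it agrees with `accuracy_score` and with the plain fraction of correct predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary data, for illustration only
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
y_pred = model.predict(X_te)

acc_score = model.score(X_te, y_te)        # built-in mean accuracy
acc_metric = accuracy_score(y_te, y_pred)  # same metric computed from predictions
acc_manual = np.mean(y_pred == y_te)       # fraction of correct predictions
print(acc_score, acc_metric, acc_manual)
```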

As a reminder, here are some definitions from last class:

$\text{accuracy} = \frac{\text{True Positives + True Negatives}}{\text{All Samples}}$

$\text{precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}}$

$\text{true positive rate} = \text{recall} = \text{sensitivity} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}}$

$\text{F1} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision + Recall}}$

$ \text{false positive rate} = \text{fall-out} = \frac{\text{False Positives}}{\text{False Positives + True Negatives}}$

$ \text{specificity} = \frac{\text{True Negatives}}{\text{True Negatives + False Positives}}$

$ \text{false positive rate} = 1 - \text{specificity}$
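A worked example makes these formulas concrete. The counts below are hypothetical, chosen only so the arithmetic is easy to follow:

```python
# Hypothetical confusion-matrix counts, for illustration only
TP, FN, FP, TN = 40, 20, 10, 80

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)              # true positive rate / sensitivity
f1 = 2 * precision * recall / (precision + recall)
fpr = FP / (FP + TN)                 # false positive rate / fall-out
specificity = TN / (TN + FP)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"F1={f1:.3f} FPR={fpr:.3f} specificity={specificity:.3f}")
```

With these counts, accuracy and precision are both 0.8 while recall is only about 0.67. F1 (8/11, about 0.73) is the harmonic mean of precision and recall, so it sits closer to the lower of the two. The false positive rate (1/9) equals 1 minus the specificity (8/9).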

In [None]:
y_test.value_counts()
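The class counts matter because accuracy should be judged against a trivial baseline: a "classifier" that always predicts the most common class. scikit-learn's `DummyClassifier` provides this. The sketch below uses made-up labels rather than `y_test`:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up labels: 6 "did not survive" (0) and 4 "survived" (1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
X = np.zeros((len(y), 1))  # features are ignored by this baseline

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.predict(X))   # always predicts the majority class, 0
print(baseline.score(X, y))  # accuracy equals the majority-class share
```

The same reasoning applies to our test set: a useful model should beat the share of the largest class reported by `value_counts()`.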

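The confusion matrix helps us understand the relationships between the different outcomes. scikit-learn's `confusion_matrix` puts the true labels in rows and the predicted labels in columns, both sorted by label value. For our binary problem (0 = did not survive, 1 = survived, with 1 as the positive class), the layout is:
```
[[TN FP]
 [FN TP]]
```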
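Because scikit-learn sorts the labels, row 0 and column 0 belong to class 0 ("did not survive"). A small sketch with made-up labels shows how to unpack the four counts with `ravel()`:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = survived (positive class), 0 = did not survive
y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # rows = true label (0, 1); columns = predicted label (0, 1)

tn, fp, fn, tp = cm.ravel()  # row-major order: [0,0], [0,1], [1,0], [1,1]
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```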


In [None]:
print(confusion_matrix(y_test, clf.predict(X_test_prepared)))

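Note that in the confusion matrix above, the first row is the "did not survive" class (i.e., Survived = 0) and the second row is the "survived" class. The classification report computes precision and recall separately for each class, treating each class in turn as the positive one, which is why each class gets its own values: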

In [None]:
print(classification_report(y_test, clf.predict(X_test_prepared)))
# Note that in binary classification, recall of the positive class is also known as “sensitivity”; recall of the negative class is “specificity”.
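To see where the per-class rows of the report come from, the sketch below uses made-up labels and asks `precision_score` and `recall_score` to treat each class as positive via `pos_label`. The recall for class 0 is TN / (TN + FP), i.e., the specificity mentioned in the comment above.

```python
from sklearn.metrics import precision_score, recall_score

# Made-up labels: 1 = survived, 0 = did not survive
y_true = [0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0, 0]

per_class = {}
for label in (0, 1):
    # pos_label picks which class counts as "positive" for these binary metrics
    p = precision_score(y_true, y_pred, pos_label=label)
    r = recall_score(y_true, y_pred, pos_label=label)
    per_class[label] = (p, r)
    print(f"class {label}: precision={p:.3f} recall={r:.3f}")
```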

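Let's look at the predicted probabilities for each class. With `method="predict_proba"`, `cross_val_predict` returns out-of-fold probabilities: each training example's probabilities come from a model fit on the other folds, never on that example. Each row has two columns, P(Survived = 0) and P(Survived = 1):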

In [None]:
y_probabilities = cross_val_predict(clf, X_train_prepared, y_train, cv=3,
                                    method="predict_proba")

In [None]:
print(y_probabilities)

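We can make the output easier to read by rounding to 3 decimal places and suppressing scientific notation. `np.printoptions` is a context manager, so these settings apply only inside the `with` block:

In [None]:
with np.printoptions(precision=3, suppress=True):
    print(y_probabilities)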
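To make `cross_val_predict` less of a black box, here is a sketch of roughly what it does, using synthetic data from `make_classification` in place of `X_train_prepared` and `y_train`. For a classifier with `cv=3`, scikit-learn splits with stratified 3-fold cross-validation without shuffling, so a manual loop over `StratifiedKFold(n_splits=3)` should reproduce its output:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic data stands in for the prepared Titanic training set
X, y = make_classification(n_samples=150, n_features=4, random_state=0)
model = LogisticRegression()

# Manual out-of-fold loop: each fold is predicted by a model fit on the other folds
manual = np.zeros((len(y), 2))
for train_idx, val_idx in StratifiedKFold(n_splits=3).split(X, y):
    fold_model = LogisticRegression().fit(X[train_idx], y[train_idx])
    manual[val_idx] = fold_model.predict_proba(X[val_idx])

auto = cross_val_predict(model, X, y, cv=3, method="predict_proba")
print(np.allclose(manual, auto))  # expected to match
```

Each row of the result sums to 1, and column 1 holds the probability of the positive class, which is what the next cell extracts as the score.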

In [None]:
y_scores = y_probabilities[:, 1]  # score = probability of the positive class (Survived = 1)
fpr, tpr, roc_thresholds = roc_curve(y_train, y_scores, pos_label=1)
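Each point on the ROC curve comes from one threshold: predict "positive" when the score is at least that threshold, then compute the true and false positive rates. The sketch below does this by hand on made-up labels and scores. `roc_points` is a hypothetical helper, not a scikit-learn function. `roc_curve` uses the same `score >= threshold` rule but also adds an extra starting threshold so the curve begins at (0, 0), and by default it drops thresholds that do not change the curve's shape.

```python
import numpy as np

def roc_points(y_true, scores, thresholds):
    """Hypothetical helper: (fpr, tpr) arrays, predicting positive when score >= threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))  # positives correctly flagged
        fp = np.sum(pred & (y_true == 0))  # negatives wrongly flagged
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return np.array(fpr), np.array(tpr)

# Made-up labels and scores
fpr_demo, tpr_demo = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8],
                                thresholds=[0.8, 0.4, 0.35, 0.1])
print(fpr_demo)
print(tpr_demo)
```

Lowering the threshold can only move the point up or to the right, which is why an ROC curve never turns back on itself.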

In [None]:
def plot_roc_curve(fpr, tpr, label=None):
    plt.plot(fpr, tpr, linewidth=2, label=label)  # the ROC curve itself
    plt.plot([0, 1], [0, 1], 'k--')               # dashed diagonal: a random classifier
    plt.axis([0, 1, 0, 1])
    plt.xlabel('False Positive Rate (Fall-Out)', fontsize=14)
    plt.ylabel('True Positive Rate (Recall)', fontsize=14)
    plt.grid(True)

plot_roc_curve(fpr, tpr)
plt.show()

In [None]:
print(f"ROC AUC score: {roc_auc_score(y_train, y_scores):.2f}")
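ROC AUC has a useful interpretation: it is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as half. A brute-force sketch over all positive/negative pairs, on made-up data (`pairwise_auc` is a hypothetical helper, and its quadratic loop is only meant for illustration):

```python
from itertools import product

def pairwise_auc(y_true, scores):
    """Hypothetical helper: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and scores: 3 of the 4 pairs are ranked correctly
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 therefore means the scores rank positives above negatives no better than chance, which matches the dashed diagonal in the ROC plot.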

In [None]:
precisions, recalls, pr_thresholds = precision_recall_curve(y_train, y_scores)
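`precision_recall_curve` returns one more precision and recall value than thresholds: the last point (precision 1, recall 0) has no threshold, which is why the plotting code slices with `[:-1]`. A sketch on made-up data checks this and recomputes the remaining entries by hand, using the rule "predict positive when score >= threshold":

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up labels and scores
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

prec_demo, rec_demo, thr_demo = precision_recall_curve(y_true, scores)
print(len(prec_demo), len(rec_demo), len(thr_demo))  # one extra precision/recall value
print(prec_demo[-1], rec_demo[-1])                   # the final point: 1.0 and 0.0

# Recompute precision and recall at each returned threshold
manual_prec = [y_true[scores >= t].mean() for t in thr_demo]
manual_rec = [y_true[scores >= t].sum() / y_true.sum() for t in thr_demo]
print(np.allclose(prec_demo[:-1], manual_prec), np.allclose(rec_demo[:-1], manual_rec))
```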

In [None]:
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # precision_recall_curve returns one more precision/recall value than thresholds,
    # so drop the last value to align the arrays with the thresholds
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
    plt.legend(loc="center right", fontsize=16)
    plt.xlabel("Threshold", fontsize=16)
    plt.grid(True)

plt.figure(figsize=(8, 4))
plot_precision_recall_vs_threshold(precisions, recalls, pr_thresholds)
plt.show()
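One way to use this plot is to pick a decision threshold other than the default 0.5, for example the lowest threshold that reaches a target precision. A minimal sketch on made-up data follows. `threshold_for_precision` is a hypothetical helper; in practice you would choose the threshold from training (or cross-validated) scores, as we have here, and only then apply it to the test set.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_precision(y_true, scores, target):
    """Hypothetical helper: lowest threshold with precision >= target, or None."""
    prec, _, thr = precision_recall_curve(y_true, scores)
    meets = prec[:-1] >= target  # align precisions with thresholds
    if not meets.any():
        return None
    return thr[np.argmax(meets)]  # argmax gives the first True, i.e. the lowest threshold

# Made-up labels and scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
chosen_threshold = threshold_for_precision(y_true, scores, target=0.6)
print(chosen_threshold)

# Apply the chosen threshold instead of the default 0.5
custom_pred = (np.asarray(scores) >= chosen_threshold).astype(int)
print(custom_pred)
```

Precision is not guaranteed to rise monotonically with the threshold, so a higher threshold than the one returned can still fall below the target.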

In [None]:
def plot_precision_vs_recall(precisions, recalls):
    plt.plot(recalls, precisions, "b-", linewidth=2)
    plt.xlabel("Recall", fontsize=16)
    plt.ylabel("Precision", fontsize=16)
    plt.axis([0, 1, 0, 1])
    plt.grid(True)

plt.figure(figsize=(8, 6))
plot_precision_vs_recall(precisions, recalls)

In [None]:
print(f"Precision-recall AUC: {auc(recalls, precisions):.2f}")
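`auc(recalls, precisions)` applies the trapezoidal rule, which linearly interpolates between points on the precision-recall curve. scikit-learn's documentation notes that this interpolation can be overly optimistic and offers `average_precision_score`, a step-wise summary, as an alternative. A sketch comparing the two on made-up data:

```python
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Made-up labels and scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

prec_demo, rec_demo, _ = precision_recall_curve(y_true, scores)
trapezoid_pr_auc = auc(rec_demo, prec_demo)   # linear interpolation between curve points
ap = average_precision_score(y_true, scores)  # sum of (R_n - R_{n-1}) * P_n over thresholds
print(f"trapezoidal PR AUC: {trapezoid_pr_auc:.3f}")
print(f"average precision:  {ap:.3f}")
```

On this tiny example the two summaries differ (about 0.79 and 0.83), so it is worth stating which one you report.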

# BREAK

In this part of today's class, we're going to learn about [Kaggle competitions](https://www.kaggle.com/competitions/), and we are going to start one in class: https://www.kaggle.com/competitions/titanicL.  Download the data and start your work by creating new cells below.  See if you can get a classifier to run!
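As a possible starting point, here is a minimal sketch that bundles the preprocessing and classifier from this notebook into a single pipeline and builds a submission table. It assumes the competition follows the standard Titanic format (a CSV with `PassengerId` and `Survived` columns) and that the downloaded files are called `train.csv` and `test.csv`; both are assumptions, so check the competition's data and evaluation pages. `make_submission` is a hypothetical helper, not part of any library.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_submission(train_df, test_df):
    """Hypothetical helper: fit on train_df and return PassengerId/Survived predictions for test_df."""
    numeric = ["Age", "Fare"]
    categorical = ["Sex"]
    preprocess = ColumnTransformer([
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
        # handle_unknown="ignore" avoids errors if the test data has an unseen category
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
    model.fit(train_df[numeric + categorical], train_df["Survived"])
    preds = model.predict(test_df[numeric + categorical])
    return pd.DataFrame({"PassengerId": test_df["PassengerId"], "Survived": preds})

# Usage (file names are assumptions; adjust them to match your download):
# train_df = pd.read_csv("train.csv")
# test_df = pd.read_csv("test.csv")
# make_submission(train_df, test_df).to_csv("submission.csv", index=False)
```

Keeping the preprocessing inside the `Pipeline` means the imputation medians and scaling statistics are learned from the training data only and are applied automatically at prediction time, including to any missing values in the competition's test file.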