In this notebook, we're going to talk about ways of digging into the utility of a machine learning model beyond its accuracy.

# Important: Run this code cell each time you start a new session!

In [None]:
!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install scikit-learn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import sklearn

# Can We Trust Machine Learning Models?

Quantifying the performance of a machine learning model can only go so far in giving us confidence that the model is ready to be shared or deployed. When we don't know how a model is making its predictions, it can be challenging to trust the decisions that it makes.

Understanding how a model is arriving at its decisions can bring about the following benefits:
* **Transparency:** Transparency helps build trust and credibility in the model's results. Users can better understand the reasoning behind the predictions and have more confidence in the model's recommendations.
* **Regulatory compliance:** In some industries, such as finance or healthcare, there are strict regulations and requirements for model transparency and interpretability. Interpretable models ensure compliance with these regulations, providing explanations and justifications for their predictions.
* **Error debugging and improvement:** By understanding why certain predictions are incorrect, we can diagnose errors made by the model and make targeted improvements to enhance its accuracy and reliability.
* **Domain expertise:** If we already have an idea of which features should be particularly informative for our target problem, seeing that the model also considers those features important could give us confidence that the model is learning useful information.
* **Knowledge extraction:** By understanding how features are weighted, we can also generate new domain-specific insights for further investigation.
* **Bias detection and fairness:** If the model puts undue importance on a particular feature, then it may be biased to make incorrect decisions on specific categories of data. Such biases can lead to discrimination or unfair treatment based on certain attributes, so identifying these issues is the first step towards improving the fairness of the model.

# Interpretability vs. Explainability

There are two terms that many people use to talk about how a model arrives at its decisions: ***interpretability*** and ***explainability***. Some people use these terms interchangeably, and it is hard to find a definitive consensus across different fields. Here are definitions taken from [Cynthia Rudin](https://www.nature.com/articles/s42256-019-0048-x):

* If a model is **interpretable**, then we can explain how it **generally makes decisions** by inspecting its inner workings. We can think of this as an **internal** or **a priori** form of understanding decision making.
* If a model is **explainable**, then we can explain how it makes **decisions for individual inputs** by inspecting its inner workings, by using statistical methods, or by using a second model altogether. We can think of this as an **external** or **a posteriori** form of understanding decision making.

Most interpretable models are also explainable, but the two concepts do not always go hand in hand. Let's talk about some examples:
* Logistic regressions are very easy to interpret and explain because we can look at model coefficients and calculate predictions ourselves.
* Decision trees are pretty easy to explain because we can trace a series of decisions along its branches. We can also interpret how a decision tree makes decisions in general by looking at the tree in its totality, but that can get cumbersome if the tree is quite large.
* Deep learning models are not interpretable because of their mathematical complexity. However, significant research has gone into making these models at least explainable.

You might notice that these models are roughly listed in decreasing order of interpretability, increasing order of complexity, and increasing order of potential accuracy. This is a common trade-off in data science, and it has led many researchers to investigate other ways of deciphering machine learning models.
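
To make the logistic regression bullet above concrete, here is a minimal sketch on hypothetical toy data. The data, the variable names, and the random seed are all assumptions made for illustration and are not part of today's dataset. The sketch fits a model, prints its coefficients, and then recomputes the predicted probabilities by hand with the sigmoid function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two features, where the label depends mostly on the first
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y = (2.0 * x[:, 0] - 0.5 * x[:, 1] + noise > 0).astype(int)

# Create and train the model
model = LogisticRegression()
model.fit(x, y)

# Interpret: each coefficient is the change in log-odds per unit change of a feature
print('Coefficients:', model.coef_[0])
print('Intercept:', model.intercept_[0])

# Explain: reproduce the predicted probabilities ourselves
log_odds = x @ model.coef_[0] + model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-log_odds))
sklearn_proba = model.predict_proba(x)[:, 1]
print('Manual calculation matches scikit-learn:', np.allclose(manual_proba, sklearn_proba))
```

Because the whole model is just a weighted sum passed through a sigmoid, the same few lines explain any individual prediction.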

In the following notebooks, we will examine various techniques that give us more insight into the inner workings of the machine learning models we create.

# Setup for Today's Notebooks

In all of the notebooks today, we are going to investigate the quality of machine learning models trained on a toy dataset provided by `scikit-learn` called the Diabetes Dataset. This dataset contains baseline measurements collected from 442 diabetic patients. The dataset is already tabular, so we will not need to do nearly as much work to process the data as we have done in the past.

The ten features for our dataset are listed below:

| Feature Name | Description |
|--------------|-------------|
| age | Age in years |
| sex | Biological sex |
| bmi | Body mass index |
| bp | Average blood pressure |
| s1 tc | Total serum cholesterol |
| s2 ldl | Low-density lipoproteins |
| s3 hdl | High-density lipoproteins |
| s4 tch | Total cholesterol / HDL |
| s5 ltg | Log of serum triglycerides |
| s6 glu | Blood sugar |

The creators of this dataset have already normalized all of the features: each one is mean-centered and scaled, so the values fall roughly between -0.2 and 0.2.

The label for our dataset is "a quantitative measure of disease progression one year after baseline". This value ranges from 25 to 346.
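
If you would like to verify these ranges yourself, the short sketch below loads the dataset and prints the smallest and largest values of the features and the label. The variable names are chosen here purely for illustration.

```python
from sklearn import datasets

# Load the features and the label as pandas objects
diabetes = datasets.load_diabetes(as_frame=True)
features = diabetes.data
target = diabetes.target

# Number of patients and number of features
print(f'Shape of the feature table: {features.shape}')

# Smallest and largest value across all ten features
feature_min = features.min().min()
feature_max = features.max().max()
print(f'Feature range: {feature_min:0.3f} to {feature_max:0.3f}')

# Smallest and largest value of the label
print(f'Label range: {target.min():0.0f} to {target.max():0.0f}')
```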

In [None]:
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.metrics import accuracy_score, f1_score

# Load the dataset
diabetes_dataset = datasets.load_diabetes(as_frame=True)
df = diabetes_dataset.frame

# Rename the features for clarity
df = df.rename(columns={'s1': 'total serum cholesterol',
                        's2': 'low-density lipoproteins',
                        's3': 'high-density lipoproteins',
                        's4': 'total cholesterol / HDL ratio',
                        's5': 'log of serum triglycerides',
                        's6': 'blood sugar'})
df.head()

We are going to examine two different models trained on this dataset:
1. A linear regression model that predicts the disease progression score
2. A decision tree classification model that predicts whether the disease progression score is above 150, which is roughly the mean score across the dataset.
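
As a quick check on the threshold used for the second model, the sketch below compares 150 against the mean of the label and reports what fraction of patients fall above it. The variable names are illustrative assumptions.

```python
from sklearn import datasets

# Load only the label column
target = datasets.load_diabetes(as_frame=True).target

# The cut-off used to turn the regression label into a binary one
threshold = 150

# Compare the threshold against the centre of the label distribution
target_mean = target.mean()
print(f'Mean score: {target_mean:0.2f}')
print(f'Median score: {target.median():0.2f}')

# Fraction of patients that would be labeled True (score above the threshold)
positive_fraction = (target > threshold).mean()
print(f'Fraction above {threshold}: {positive_fraction:0.2f}')
```

A fraction far from 0.5 would mean the two classes are imbalanced, which matters when we read accuracy scores later on.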

We will train both of these models using an 80%-20% train-test split.

The code blocks below will be replicated in the upcoming notebooks so that we can examine models trained on this dataset.

In [None]:
def generate_regressor(orig_df):
    """
    Train and test a regression model on the input DataFrame, returning the
    regressor, the feature names, and a dictionary of all the relevant data
    orig_df: the input DataFrame
    """
    # Set random number generator so the results are always the same
    np.random.seed(42)

    # Get the names of the features
    feature_names = orig_df.columns.tolist()
    feature_names.remove('target')

    # Split the data into train and test sets
    train_df, test_df = train_test_split(orig_df, test_size=0.2)

    # Separate features from labels
    x_train = train_df.drop('target', axis=1).values
    y_train = train_df['target'].values
    x_test = test_df.drop('target', axis=1).values
    y_test = test_df['target'].values

    # Create and train the model
    regr = LinearRegression()
    regr.fit(x_train, y_train)

    # Use the model to predict on the test set
    y_pred = regr.predict(x_test)

    # Create a nested dictionary of all the data for easier retrieval
    data = {'train': {'x': x_train, 'y': y_train},
            'test': {'x': x_test, 'y': y_test},
            'pred': {'x': x_test, 'y': y_pred}}
    return regr, feature_names, data

In [None]:
_, _, data = generate_regressor(df)
y_test = data['test']['y']
y_pred = data['pred']['y']
print(f'Mean absolute error: {mean_absolute_error(y_test, y_pred):0.2f}')
print(f'Coefficient of determination: {r2_score(y_test, y_pred):0.2f}')

In [None]:
def generate_classifier(orig_df):
    """
    Train and test a classification model on the input DataFrame, returning the
    classifier, the feature names, and a dictionary of all the relevant data
    orig_df: the input DataFrame, with the original continuous 'target' column
    """
    # Set random number generator so the results are always the same
    np.random.seed(42)

    # Copy the DataFrame since we will be modifying it
    df = orig_df.copy()

    # Get the names of the features
    feature_names = df.columns.tolist()
    feature_names.remove('target')

    # Turn the label into a binary variable
    df['target'] = df['target'] > 150

    # Split the data into train and test sets
    train_df, test_df = train_test_split(df, test_size=0.2)

    # Separate features from labels
    x_train = train_df.drop('target', axis=1).values
    y_train = train_df['target'].values
    x_test = test_df.drop('target', axis=1).values
    y_test = test_df['target'].values

    # Create and train the model
    clf = DecisionTreeClassifier()
    clf.fit(x_train, y_train)

    # Use the model to predict on the test set
    y_pred = clf.predict(x_test)

    # Create a nested dictionary of all the data for easier retrieval
    data = {'train': {'x': x_train, 'y': y_train},
            'test': {'x': x_test, 'y': y_test},
            'pred': {'x': x_test, 'y': y_pred}}
    return clf, feature_names, data

In [None]:
_, _, data = generate_classifier(df)
y_test = data['test']['y']
y_pred = data['pred']['y']
print(f"Accuracy: {accuracy_score(y_test, y_pred):0.2f}")
print(f"F1 Score: {f1_score(y_test, y_pred):0.2f}")
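
Accuracy and F1 score each compress the results into a single number. A confusion matrix shows which kinds of mistakes the classifier makes. The sketch below is a minimal, self-contained example. It assumes `random_state=42` for both the split and the tree in place of the notebook's `np.random.seed`, so the counts may not match the cell above exactly.

```python
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the data and turn the label into a binary variable
x, y = datasets.load_diabetes(return_X_y=True)
y_binary = y > 150

# Assumption: a fixed random_state stands in for the notebook's np.random.seed
x_train, x_test, y_train, y_test = train_test_split(
    x, y_binary, test_size=0.2, random_state=42)

# Create and train the model
clf = DecisionTreeClassifier(random_state=42)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Rows are true classes and columns are predicted classes, ordered [False, True]
cm = confusion_matrix(y_test, y_pred, labels=[False, True])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f'True negatives: {tn}, false positives: {fp}')
print(f'False negatives: {fn}, true positives: {tp}')
```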