# 03: PDPs and ICE Plots

Partial dependence plots (PDPs) and individual conditional expectation (ICE) plots are common techniques for model explainability. They show the impact that a feature or pair of features has on a model's predictions.

## Imports

In [None]:
# If you're running this on colab, then you can uncomment the below command to
# install the pmlb library.
# !pip install pmlb

In [None]:
from dataclasses import dataclass, field
from itertools import product

import altair as alt
import numpy as np
import pandas as pd
import pmlb

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

In [None]:
# If you're running this code locally, then you can uncomment this to automatically
# save the chart data in files, rather than including the data in the spec. 

# !mkdir -p data
# alt.data_transformers.enable('json', prefix='data/altair-data')

## Data Preparation and Modeling

For this lab, we'll use the same dataset as in the second lab, which describes a telephone service provider's customers. Each instance is a customer. The target is whether the customer churns, that is, switches providers. We'll drop the columns we previously found to be redundant.

In [None]:
df = pmlb.fetch_data('churn')
df.drop(columns=['total day charge', 'total eve charge', 'total night charge'], inplace=True)

In [None]:
df.head()

To prepare for modeling, we split the dataset into training and test sets and separate the features from the labels.

In [None]:
df_train, df_test = train_test_split(df, test_size=0.25)

In [None]:
X_train = df_train.drop(columns=['target'])
y_train = df_train['target'].values

X_test = df_test.drop(columns=['target'])
y_test = df_test['target'].values

In [None]:
param_grid = {
    'n_estimators': [100],
    'criterion': ['entropy'],
    'bootstrap': [True],
    'max_features': ['sqrt', 1.0],
    'max_depth': [6, 12],
    'min_samples_split': [2, 8],
    'class_weight': ['balanced', None]
}

cv = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, scoring='f1', n_jobs=-1)

cv.fit(X_train, y_train)

In [None]:
cv.best_params_

In [None]:
cv.best_score_

In [None]:
model = cv.best_estimator_

## One-way Partial Dependence and ICE Plots

### sklearn

In [None]:
PartialDependenceDisplay.from_estimator(
    model,
    X=X_train,
    features=['total day minutes'],
    kind='both',
    grid_resolution=20
)

In [None]:
PartialDependenceDisplay.from_estimator(
    model,
    X=X_train,
    features=['total day minutes'],
    kind='average',
    grid_resolution=20,
    percentiles=(0, 1)
)
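
If you want the numbers behind plots like these rather than the figure, scikit-learn also provides `sklearn.inspection.partial_dependence`. The sketch below is a minimal illustration on synthetic data: the `make_classification` dataset and the small forest are stand-ins, not the churn model. The key that holds the grid depends on the scikit-learn version, so the code checks for both names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in data; not the churn dataset used in this lab.
X_toy, y_toy = make_classification(n_samples=200, n_features=4, random_state=0)
toy_model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_toy, y_toy)

# kind='both' returns the averaged curve and the per-instance curves.
result = partial_dependence(
    toy_model, X_toy, features=[0], kind='both', grid_resolution=20
)

# Newer scikit-learn releases call the grid 'grid_values'; older ones use 'values'.
grid = (result['grid_values'] if 'grid_values' in result else result['values'])[0]

pdp_curve = result['average'][0]      # shape: (number of grid points,)
ice_curves = result['individual'][0]  # shape: (number of instances, number of grid points)
```

Averaging `ice_curves` over the instance axis should reproduce `pdp_curve`, which is the relationship the implementation exercises below rely on.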

### Implementation

**Exercises 1-3:**

First, we will write a function to calculate the partial dependence and ICE lines for a list of features. The output of this function will be a dictionary that maps from the name of the feature to a `FeatureData` instance. We'll do this in 3 steps.

1. Calculate the values for the feature that we will evaluate the model at. `resolution` is the number of points we want to sample. [`np.linspace`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) might be helpful.
2. Calculate the ICE lines. To do this, you'll need to loop over the values, set all instances to have that value for the feature, and have the model generate probabilities for those instances.
3. Calculate the partial dependence line. This is the mean of the ICE lines. Also calculate the mean-centered partial dependence line by making the partial dependence line have a mean of 0. Create a `FeatureData` instance and add it to the `one_way_data` dictionary.
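
Before running these steps against the real model, it can help to try them on a toy function whose partial dependence is known in advance. The sketch below assumes a hypothetical `toy_predict` function in place of `model.predict_proba` and a NumPy array in place of the dataframe. Because the toy function is additive, the PDP for the first feature should be `2 * value + mean(x1)`.

```python
import numpy as np

def toy_predict(X):
    # Hypothetical stand-in for model.predict_proba(...)[:, 1]: 2 * x0 + x1.
    return 2 * X[:, 0] + X[:, 1]

rng = np.random.default_rng(0)
X_toy = rng.uniform(size=(50, 2))

# Step 1: evenly spaced values between the feature's min and max.
values = np.linspace(X_toy[:, 0].min(), X_toy[:, 0].max(), num=5)

# Step 2: one row of predictions per grid value, then transpose so that
# each row is one instance's ICE line.
ice = []
for v in values:
    X_mod = X_toy.copy()
    X_mod[:, 0] = v
    ice.append(toy_predict(X_mod))
ice = np.array(ice).T  # shape: (50 instances, 5 grid values)

# Step 3: the PDP is the mean ICE line; subtract its mean to center it.
pdp_line = ice.mean(axis=0)
mean_centered = pdp_line - pdp_line.mean()

# Known answer for this additive toy function.
expected = 2 * values + X_toy[:, 1].mean()
```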

In [None]:
@dataclass
class FeatureData:
    # name of the feature
    feature: str
    
    # 1D array of feature values that we will check the model at
    values: np.ndarray
    
    # 2D array of the ICE lines. The shape will be (number of instances, number of values)
    ice: np.ndarray
    
    # 1D array containing the partial dependence
    pd: np.ndarray
    
    # 1D array containing the mean-centered partial dependence
    mean_centered_pd: np.ndarray
            
def calculate_one_way(df, model, features, resolution):
    one_way_data = {}

    for feature in features:
        # 1
        values = np.linspace(df[feature].min(), df[feature].max(), num=resolution)

        # 2
        ice = []

        original_values = df[feature]

        for x in values:
            df[feature] = x
            predictions = model.predict_proba(df)[:,1]
            ice.append(predictions)

        ice = np.array(ice).T

        df[feature] = original_values

        # 3
        pd = ice.mean(axis=0)
        mean_centered_pd = pd - pd.mean()

        one_way_data[feature] = FeatureData(feature, values, ice, pd, mean_centered_pd)
    
    return one_way_data

# for simplicity, we'll remove categorical features
categories = ['state', 'area code', 'phone number', 'international plan', 'voice mail plan']
features = [f for f in X_train.columns if f not in categories]
one_way_data = calculate_one_way(X_train, model, features, 20)

In [None]:
one_way_data['total day minutes']

Altair works best with pandas dataframes that are in [long-form](https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data) or [tidy](https://r4ds.had.co.nz/tidy-data.html) format. The `prepare_dataframe_one_way` function below puts all of the ICE and PDP data into one dataframe.
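
As a small illustration of the difference, the sketch below reshapes a made-up wide-form ICE table (one row per instance, one column per grid value) into long form with `DataFrame.melt`. The numbers and the `instance` column name are hypothetical. This is an alternative to the approach used in `prepare_dataframe_one_way`, not a description of it.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-form ICE data: 2 instances (rows) x 3 grid values (columns).
grid_values = [0.0, 1.0, 2.0]
wide = pd.DataFrame(
    np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]),
    columns=grid_values,
)

# Long form: one row per (instance, grid value) pair.
long = (
    wide.rename_axis('instance')
    .reset_index()
    .melt(id_vars='instance', var_name='x', value_name='y')
)
```

Each row of `long` now holds a single observation, which is the shape Altair encodings expect.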

In [None]:
def prepare_dataframe_one_way(one_way_data, ice_instances=100):
    dfs = []

    # loop over the features in one_way_data
    for feature, info in one_way_data.items():
        # create a dataframe for the PDP line
        dfs.append(pd.DataFrame({
            'feature': [info.feature] * len(info.values),
            'x': info.values,
            'y': info.pd,
            'kind': ['pdp'] * len(info.values)
        }))

        # create a dataframe for the ICE lines
        dfs.append(pd.DataFrame({
            'feature': [feature] * ice_instances,
            'x': [info.values for _ in range(ice_instances)],
            'y': info.ice[:ice_instances].tolist(),
            'kind': ['ice'] * ice_instances
        }).explode(['x', 'y']).reset_index())

    return pd.concat(dfs)

In [None]:
df1 = prepare_dataframe_one_way(one_way_data)

The index column is used to identify which original instance an ICE line represents. If two ICE rows have the same feature and index, then they are a part of the same ICE line.

In [None]:
df1
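
The `index` column comes from calling `explode` and then `reset_index`. The sketch below shows the mechanism on two made-up ICE lines. Exploding several columns at once requires pandas 1.3 or newer.

```python
import pandas as pd

# Two hypothetical ICE lines, stored one per row as lists.
nested = pd.DataFrame({
    'x': [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]],
    'y': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
})

# explode() repeats each row's index label once per list element;
# reset_index() then moves those labels into an 'index' column.
exploded = nested.explode(['x', 'y']).reset_index()
```

Rows that share an `index` value came from the same original row, so they belong to the same ICE line.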

Here's how we can filter the dataframe to get the PDP line for the "total day minutes" feature.

In [None]:
df_tdm_pdp = df1[(df1['feature'] == 'total day minutes') & (df1['kind'] == 'pdp')]
df_tdm_pdp

Here's how we can filter the dataframe to get all of the ICE lines for the "total day minutes" feature.

In [None]:
df_tdm_ice = df1[(df1['feature'] == 'total day minutes') & (df1['kind'] == 'ice')]
df_tdm_ice

**Exercise 4:**

Using `df_tdm_pdp`, create a PDP for the "total day minutes" feature.

In [None]:
tdm_pdp = alt.Chart(df_tdm_pdp).mark_line().encode(
    x=alt.X('x', title='total day minutes'),
    y=alt.Y('y', title='churn probability')
)

tdm_pdp

**Exercise 5:**

Using `df_tdm_ice`, create an ICE plot for the "total day minutes" feature. 

Hint: if you want to have multiple lines all with the same color, then you can use the `detail` encoding.

In [None]:
tdm_ice = alt.Chart(df_tdm_ice).mark_line(color='#d3d3d3', opacity=0.4).encode(
    detail='index',
    x=alt.X('x', title='total day minutes'),
    y=alt.Y('y', title='churn probability')
)

tdm_ice

We can then layer the PDP over the ICE plot:

In [None]:
tdm_ice + tdm_pdp

Rather than filtering the dataframe ahead of time, we can also filter the data using `transform_filter`. In the example below, we pass the charts the entire `df1` dataframe and then select the "total day minutes" feature and the plot kind using `transform_filter`.

In [None]:
tdm_pdp = alt.Chart(df1).mark_line().encode(
    x=alt.X('x', title='total day minutes'),
    y=alt.Y('y', title='churn probability')
).transform_filter(
    (alt.datum.feature == 'total day minutes') & (alt.datum.kind == 'pdp')
)

tdm_ice = alt.Chart(df1).mark_line(color='#d3d3d3', opacity=0.4).encode(
    detail='index',
    x=alt.X('x', title='total day minutes'),
    y=alt.Y('y', title='churn probability')
).transform_filter(
    (alt.datum.feature == 'total day minutes') & (alt.datum.kind == 'ice')
)

tdm_ice + tdm_pdp

Here's another way to do the same thing, this time building both charts from a shared base chart.

In [None]:
base = alt.Chart(df1).mark_line().encode(
    x=alt.X('x', title='total day minutes'),
    y=alt.Y('y', title='churn probability')
).transform_filter(alt.datum.feature == 'total day minutes')

tdm_pdp = base.transform_filter(alt.datum.kind == 'pdp')

tdm_ice = base.encode(
    opacity=alt.value(0.4),
    color=alt.value('#d3d3d3'),
    detail='index',
).transform_filter(alt.datum.kind == 'ice')

tdm_ice + tdm_pdp

**Exercise 6:**

Finish the function below to show all of the plots in a grid. The `show_pdp` and `show_ice` parameters should determine what kind of plots to show. You can assume that either one or both will be true.

Hints:
- Recall from the end of lab 02 that faceting needs to happen after layering, using the `.facet()` method.
- Recall from lab 02 that we can use `resolve_scale()` to control whether the charts' x scales are "shared" or "independent".

In [None]:
def plot_one_way_grid(df, show_pdp=True, show_ice=True):
    base = alt.Chart(df).mark_line().encode(
        x=alt.X('x', title=None),
        y=alt.Y('y', title='churn probability')
    ).properties(
        width=175,
        height=125
    )

    pdp = base.transform_filter(alt.datum.kind == 'pdp')

    ice = base.encode(
        opacity=alt.value(0.4),
        color=alt.value('#d3d3d3'),
        detail='index',
    ).transform_filter(alt.datum.kind == 'ice')
    
    if show_pdp and show_ice:
        chart = ice + pdp
    elif show_pdp:
        chart = pdp
    else:
        chart = ice
    
    return chart.facet('feature', columns=4).resolve_scale(x='independent')
    
plot_one_way_grid(df1, show_ice=True, show_pdp=True)

**Exercise 7:**

Update the `plot_one_way_grid` function so that when you hover over a line in one of the charts, it highlights that same instance in all of the other charts.

In [None]:
def plot_one_way_grid(df, show_pdp=True, show_ice=True):
    base = alt.Chart(df).mark_line().encode(
        x=alt.X('x', title=None),
        y=alt.Y('y', title='churn probability')
    ).properties(
        width=175,
        height=125
    )

    pdp = base.transform_filter(alt.datum.kind == 'pdp')

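    # Altair 4 API. In Altair 5+, the equivalents are
    # alt.selection_point(fields=['index'], empty=False, on='mouseover') and .add_params(brush).
    brush = alt.selection_multi(fields=['index'], empty='none', on='mouseover')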
    
    ice = base.encode(
        opacity=alt.value(0.4),
        color=alt.condition(brush, alt.value('red'), alt.value('#d3d3d3')),
        detail='index',
    ).transform_filter(
        alt.datum.kind == 'ice'
    ).add_selection(brush)
    
    if show_pdp and show_ice:
        chart = ice + pdp
    elif show_pdp:
        chart = pdp
    else:
        chart = ice
    
    return chart.facet('feature', columns=4).resolve_scale(x='independent')
    
plot_one_way_grid(df1, show_ice=True, show_pdp=True)

## Two-way Partial Dependence Plots

Calculating all of the possible two-way PDPs would take too long, so instead we'll just calculate a few of them individually. We'll use the `TwoWayData` class to store the necessary information about a two-way PDP.
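
To get a rough sense of the cost, the sketch below counts the model calls needed for every pair of features. The feature count is an assumed number for illustration, so substitute your own. Each call scores the whole dataset once.

```python
from math import comb

# Assumed numbers for illustration only; substitute your own.
n_features = 12   # hypothetical count of numeric features
resolution = 20   # grid points per feature, as in the one-way section

n_pairs = comb(n_features, 2)      # unordered feature pairs
calls_per_pair = resolution ** 2   # one predict_proba call per grid cell
total_calls = n_pairs * calls_per_pair
```

Under these assumptions there are 66 pairs and 26,400 calls in total, compared with 400 calls for a single pair.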

In [None]:
@dataclass
class TwoWayData:
    x: FeatureData
    y: FeatureData
    df: pd.DataFrame

**Exercise 8:**

Finish the `calculate_two_way` function below, which should calculate the data for a two-way PDP for the given pair of features. `TwoWayData.df` will be a dataframe with three columns: x, y, and prediction. Each row of this dataframe is one grid spot in the two-way PDP.

In [None]:
def calculate_two_way(data, model, x_feature, y_feature):
    
    x_val = []
    y_val = []
    avg_prediction = []
    
    # save the original columns once so they can be restored at the end
    original_x = data[x_feature.feature]
    original_y = data[y_feature.feature]
    
    for x in x_feature.values:
        data[x_feature.feature] = x

        for y in y_feature.values:
            data[y_feature.feature] = y
            
            # average prediction with both features fixed at (x, y)
            predictions = model.predict_proba(data)[:,1]
            avg_prediction.append(predictions.mean())
            
            x_val.append(x)
            y_val.append(y)
            
    data[x_feature.feature] = original_x
    data[y_feature.feature] = original_y
    
    df = pd.DataFrame({
        'x': x_val,
        'y': y_val,
        'prediction': avg_prediction
    })
    
    return TwoWayData(x_feature, y_feature, df)

In [None]:
day_eve_mins = calculate_two_way(
    X_train, model,
    one_way_data['total day minutes'],
    one_way_data['total eve minutes']
)

**Exercise 9:**

Finish the `plot_two_way` function below, which creates a two-way PDP.

Hint: The [heatmap example](https://altair-viz.github.io/gallery/simple_heatmap.html) is a useful reference.

In [None]:
def plot_two_way(data):
    return alt.Chart(data.df).mark_rect().encode(
        x=alt.X('x:O', title=data.x.feature, axis=alt.Axis(format='.2f')),
        y=alt.Y('y:O', title=data.y.feature, sort='descending', axis=alt.Axis(format='.2f')),
        color='prediction',
    )

In [None]:
plot_two_way(day_eve_mins)

In [None]:
day_mins_service_calls = calculate_two_way(
    X_train, model,
    one_way_data['total day minutes'],
    one_way_data['number customer service calls'],
)


plot_two_way(day_mins_service_calls)

In [None]:
day_mins_vmail = calculate_two_way(
    X_train, model,
    one_way_data['total day minutes'],
    one_way_data['number vmail messages'],
)

plot_two_way(day_mins_vmail)

### Showing interaction between features

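See the [Feature Interaction chapter](https://christophm.github.io/interpretable-ml-book/interaction.html) in Molnar's book. Here, we estimate the interaction at each grid spot as the mean-centered two-way partial dependence minus the sum of the two mean-centered one-way partial dependences. If the two features did not interact, this difference would be zero everywhere.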

In [None]:
def calculate_interaction(two_way_data):
    df = two_way_data.df
    
    df['mean_centered_prediction'] = df['prediction'] - df['prediction'].mean()
    
    expected = []
    
    for x in two_way_data.x.mean_centered_pd:
        for y in two_way_data.y.mean_centered_pd:
            expected.append(x + y)
            
    df['interaction'] = df['mean_centered_prediction'] - np.array(expected)
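
One way to sanity-check this calculation is to apply the same arithmetic to toy functions where the answer is known. The sketch below uses made-up data and two hypothetical functions of two features: an additive one, where the interaction should vanish, and a multiplicative one, where it should not. It mirrors the logic of `calculate_interaction` with plain NumPy arrays rather than `TwoWayData`.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(size=200)   # toy feature values, not the churn data
b = rng.uniform(size=200)
xs = np.linspace(0, 1, 5)   # grid for feature a
ys = np.linspace(0, 1, 5)   # grid for feature b

def interaction(f):
    # One-way partial dependence: fix one feature, average over the other.
    pd_a = np.array([f(x, b).mean() for x in xs])
    pd_b = np.array([f(a, y).mean() for y in ys])
    # Two-way partial dependence: with only two features, nothing is left to average.
    pd_ab = np.array([[f(x, y) for y in ys] for x in xs])
    # Mean-center everything, then subtract the additive expectation.
    expected = (pd_a - pd_a.mean())[:, None] + (pd_b - pd_b.mean())[None, :]
    return (pd_ab - pd_ab.mean()) - expected

additive = interaction(lambda x, y: x ** 2 + 3 * y)
multiplicative = interaction(lambda x, y: x * y)
```

The `additive` grid is zero up to floating-point error, while `multiplicative` has clearly nonzero entries at the corners of the grid.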

In [None]:
def plot_interaction(data):
    abs_interaction = data.df['interaction'].abs().max()
    
    return alt.Chart(data.df).mark_rect().encode(
        x=alt.X('x:O', title=data.x.feature, axis=alt.Axis(format='.2f')),
        y=alt.Y('y:O', title=data.y.feature, sort='descending', axis=alt.Axis(format='.2f')),
        color=alt.Color(
            'interaction',
            scale=alt.Scale(scheme='brownbluegreen', domainMid=0, domain=[-abs_interaction, abs_interaction]),
        ),
        
    )

In [None]:
calculate_interaction(day_mins_service_calls)
plot_interaction(day_mins_service_calls)

In [None]:
calculate_interaction(day_eve_mins)
plot_interaction(day_eve_mins)