# Local interpretability with Scikit-Learn

In this notebook, we will evaluate the contribution of each feature to the predicted target value for a single observation.

Based on [Christoph Molnar's book](https://christophm.github.io/interpretable-ml-book/limo.html#effect-plot)

In [None]:
# Packages

import matplotlib.pyplot as plt
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

## Load Data

In [None]:
# Load the California housing data from Scikit-learn
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X = X.drop(columns=["Latitude", "Longitude", "AveBedrms"])

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0,
)

# Scale data
scaler = MinMaxScaler().set_output(transform="pandas").fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [None]:
print('X_train.shape:', X_train.shape)
print('X_test.shape:', X_test.shape)
print('y_train.shape:', y_train.shape)
print('y_test.shape:', y_test.shape)

## Model a Linear Regression

In [None]:
linreg = LinearRegression().fit(X_train, y_train)

In [None]:
coeffs = pd.Series(linreg.coef_, index=linreg.feature_names_in_)
coeffs

## Effect plots

The effect of a feature is its coefficient multiplied by the feature value. Computing the effects for every observation shows how much each feature contributes to the predictions across the dataset.
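As a minimal sketch of this computation, the toy example below uses made-up coefficients and feature values (illustrative assumptions, not results from the fitted model). Multiplying a pandas Series of coefficients by a DataFrame matches the Series index against the column labels, so each column is scaled by its own coefficient.

```python
import pandas as pd

# Hypothetical coefficients, indexed by feature name
demo_coeffs = pd.Series({"MedInc": 4.0, "HouseAge": 0.5})

# Hypothetical scaled feature values for two observations
demo_X = pd.DataFrame({"MedInc": [0.25, 0.5], "HouseAge": [1.0, 0.0]})

# Each column is multiplied by the coefficient with the same label
demo_effects = demo_coeffs * demo_X
print(demo_effects)
```

Each row of `demo_effects` holds the effects for one observation, which is the layout the box plot of effects relies on.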

In [None]:
# multiply the coefficients by the feature values

effects = coeffs * X_test

# plot the effects
effects.boxplot(figsize=(8,6))
plt.ylabel("Effects (coeff x feature)");

## Local interpretability

Let's take an individual observation and evaluate how its feature values contribute towards house price.

In [None]:
# Let's look at a few observations

X_test.tail()

In [None]:
# Helper functions

def get_observation_values(obv):
    """Return the feature values of observation `obv` (an index label of X_test)."""
    return X_test.loc[obv]

def compute_effects(obv):
    """Compute the effect (coefficient x feature value) of each feature for observation `obv`."""
    return coeffs * get_observation_values(obv)

def plot_effect(obv, obv_effect):
    """Plot the effects of a single observation as a bar chart."""
    obv_effect.plot.bar()

    plt.axhline(y=0, color='r', linestyle='-')
    plt.ylabel("Coefficient x feature value")
    plt.title(f"Local interpretability for {obv}")
    plt.show()

def show_observation_in_effects(obv_effect):
    """Overlay an individual observation on the box plot of all effects."""
    # Use the same numeric x positions for the boxes and the scatter points
    positions = range(len(effects.columns))

    effects.boxplot(figsize=(8,6), positions=positions)
    plt.scatter(positions, obv_effect, marker='o', color="r", s=50)
    plt.ylabel("Effects (coeff x feature)")
    plt.show()

In [None]:
# Case: observation 12156

obs = 12156

get_observation_values(obs)

In [None]:
effect = compute_effects(obs)
effect

In [None]:
# Plot effects
plot_effect(obs, effect)

In [None]:
# Overlay the individual observation on the effects
show_observation_in_effects(effect)

In [None]:
# Case: observation 2445

obs = 2445

get_observation_values(obs)

In [None]:
# Plot effects
effect = compute_effects(obs)
plot_effect(obs, effect)

In [None]:
# Overlay the individual observation on the effects
show_observation_in_effects(effect)