## Explain regression model predictions

[original notebook link](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb)

### Introduction

This notebook illustrates how to use interpret-community to help interpret regression model predictions at training time. It demonstrates the API calls needed to obtain the global and local interpretations along with an interactive visualization dashboard for discovering patterns in data and explanations.

Three tabular data explainers are demonstrated:

- TabularExplainer (SHAP)
- MimicExplainer (global surrogate)
- PFIExplainer
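
To make the "global surrogate" idea behind MimicExplainer concrete, here is a minimal sketch in plain Python. It is not the MimicExplainer implementation: it only shows an interpretable model (a one-variable least-squares line) being fit to the predictions of a black-box function rather than to the true labels. The names `black_box` and `fit_line` are hypothetical and exist only for this illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Hypothetical black box: mostly linear, with a small jump at x = 5.
def black_box(x):
    return 3.0 * x + 1.0 if x < 5 else 3.0 * x + 2.0


xs = [i * 0.5 for i in range(20)]

# The surrogate is trained on the black box's predictions, not on true labels.
predictions = [black_box(x) for x in xs]
slope, intercept = fit_line(xs, predictions)

# The surrogate's coefficient is then read as an approximate explanation
# of how the black box responds to the input.
print(slope, intercept)
```

The surrogate is only as trustworthy as its fit to the black box, so in practice one would also check how closely its predictions track the original model's.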

![](https://github.com/interpretml/interpret-community/raw/f5e7bfa82d2036b578dc35102c28fdad0e61b127/notebooks/img/interpretability-architecture.png)

This notebook predicts Boston housing prices with scikit-learn and runs the model explainer locally:

1. Train a GradientBoosting regression model using scikit-learn.
2. Run 'explain_global' and 'explain_local' on the full dataset in local mode, which doesn't contact any Azure services.
3. Visualize the global and local explanations with the visualization dashboard.



In [3]:
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor

# Explainers:
# 1. SHAP Tabular Explainer
from interpret.ext.blackbox import TabularExplainer

# OR

# 2. Mimic Explainer
#from interpret.ext.blackbox import MimicExplainer
# You can use one of the following four interpretable models as a global surrogate to the black box model
from interpret.ext.glassbox import LGBMExplainableModel
from interpret.ext.glassbox import LinearExplainableModel
from interpret.ext.glassbox import SGDExplainableModel
from interpret.ext.glassbox import DecisionTreeExplainableModel

# OR

# 3. PFI Explainer
#from interpret.ext.blackbox import PFIExplainer
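
PFI stands for permutation feature importance. As a rough, library-independent sketch of the general technique (not PFIExplainer's own code), the idea is to shuffle one feature column at a time and measure how much the model's error increases. The toy model, the data, and the helper name `permutation_importance` are all hypothetical.

```python
import random


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Increase in mean absolute error when each feature column is shuffled."""
    rng = random.Random(seed)

    def mae(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mae(rows)
    importances = []
    for j in range(n_features):
        # Shuffle column j only, leaving every other column untouched.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mae(shuffled) - baseline)
    return importances


# Hypothetical toy model: strong dependence on feature 0, weak on feature 1,
# none on feature 2.
def toy_predict(row):
    return 5.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [toy_predict(r) for r in rows]

scores = permutation_importance(toy_predict, rows, targets, n_features=3)
print(scores)
```

A feature the model ignores gets an importance of zero, because shuffling it leaves every prediction unchanged.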

### Load the Boston house price data

In [8]:
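# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older scikit-learn release.
boston_data = datasets.load_boston()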

print(boston_data.keys())
print(boston_data['data'].shape)
print(boston_data['target'].shape)

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])
(506, 13)
(506,)


In [10]:
# Split data into train and test
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(boston_data.data, boston_data.target, test_size=0.2, random_state=0)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

(404, 13) (102, 13) (404,) (102,)


### Train a GradientBoosting regression model, which you want to explain

In [11]:
reg = GradientBoostingRegressor(n_estimators=100, max_depth=4, learning_rate=0.1, loss='huber', random_state=1)

model = reg.fit(x_train, y_train)

### Explain predictions on your local machine

In [12]:
explainer = TabularExplainer(model=model, initialization_examples=x_train, features=boston_data['feature_names'])

Setting feature_perturbation = "tree_path_dependent" because no background data was given.
The sklearn.ensemble.gradient_boosting module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.ensemble. Anything that cannot be imported from sklearn.ensemble is now part of the private API.


Explain overall model predictions (global explanation)

In [15]:
# Passing in the training set as evaluation examples - note the evaluation data must be a representative sample of the original data
# x_test can be passed instead: with fewer examples, explanations are faster to compute, although they may be less accurate
global_explanation = explainer.explain_global(x_train)

In [14]:
global_explanation

<interpret_community.explanation.explanation.DynamicGlobalExplanation at 0x1a45f230788>

In [22]:
global_explanation.data()

{'mli': [],
 'names': ['CRIM',
  'ZN',
  'INDUS',
  'CHAS',
  'NOX',
  'RM',
  'AGE',
  'DIS',
  'RAD',
  'TAX',
  'PTRATIO',
  'B',
  'LSTAT'],
 'scores': [0.4526565971899683,
  0.006759614161038798,
  0.1816447278339545,
  0.0140680470787089,
  0.4379426598563117,
  2.283003782486396,
  0.42845572503497653,
  0.540561891733403,
  0.06370554223187269,
  0.321632885465318,
  0.8796515522662416,
  0.2493846900227646,
  4.56663862102852]}

In [23]:
global_explanation.get_ranked_global_names()

In [24]:
global_explanation.get_ranked_global_values()

In [25]:
global_explanation.global_importance_rank

In [26]:
# Print out a dictionary that holds the sorted feature importance names and values
print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))

global importance rank: {'LSTAT': 4.56663862102852, 'RM': 2.283003782486396, 'PTRATIO': 0.8796515522662416, 'DIS': 0.540561891733403, 'CRIM': 0.4526565971899683, 'NOX': 0.4379426598563117, 'AGE': 0.42845572503497653, 'TAX': 0.321632885465318, 'B': 0.2493846900227646, 'INDUS': 0.1816447278339545, 'RAD': 0.06370554223187269, 'CHAS': 0.0140680470787089, 'ZN': 0.006759614161038798}
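
The ranked dictionary above can be reproduced from the unsorted 'names' and 'scores' lists returned by `global_explanation.data()`. The sketch below uses plain Python and the values copied from that output. It illustrates the relationship between the two outputs and is not how the library builds the dictionary internally.

```python
# Names and scores copied from the global_explanation.data() output.
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
         'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
scores = [0.4526565971899683, 0.006759614161038798, 0.1816447278339545,
          0.0140680470787089, 0.4379426598563117, 2.283003782486396,
          0.42845572503497653, 0.540561891733403, 0.06370554223187269,
          0.321632885465318, 0.8796515522662416, 0.2493846900227646,
          4.56663862102852]

# Sort (name, score) pairs by descending score.
ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
importance_dict = dict(ranked)

top_three = [name for name, _ in ranked[:3]]
print(top_three)  # ['LSTAT', 'RM', 'PTRATIO']
```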


### Explain overall model predictions as a collection of local (instance-level) explanations


In [35]:
print(len(global_explanation.local_importance_values))

print(len(global_explanation.local_importance_values[0]))

404
13
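
The output shows one row of 13 local importance values for each of the 404 evaluation examples. A common convention for SHAP-based explainers, and an assumption here about how the global scores relate to these local values, is that a feature's global importance is the mean of the absolute local values in its column. The sketch below uses a small hypothetical matrix, not the Boston data, and the helper name `mean_abs_importance` is made up for illustration.

```python
def mean_abs_importance(local_values):
    """Average the absolute local importance of each feature over all instances."""
    n_rows = len(local_values)
    n_features = len(local_values[0])
    return [sum(abs(row[j]) for row in local_values) / n_rows
            for j in range(n_features)]


# Hypothetical local importance values: 4 instances x 3 features.
local_values = [
    [1.0, -0.5, 0.0],
    [-2.0, 0.5, 0.0],
    [3.0, -1.5, 0.5],
    [-2.0, 1.5, 0.5],
]

global_values = mean_abs_importance(local_values)
print(global_values)  # [2.0, 1.0, 0.25]
```

Taking absolute values matters: feature 0 pushes predictions up for some instances and down for others, so a plain mean would cancel to zero and hide its influence.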


### Generate local explanations

Explain local data points (individual instances)


In [None]:
# You can pass a specific data point or a group of data points to the explain_local function

# E.g., Explain the first data point in the test set
local_explanation = explainer.explain_local(x_test[0,:])

# E.g., Explain the first five data points in the test set
# local_explanation_group = explainer.explain_local(x_test[0:5,:])
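
For a SHAP-based explainer, the local importance values of one instance are additive: they sum to the difference between the model's prediction for that instance and a baseline prediction. The brute-force sketch below shows this property with exact Shapley values on a hypothetical three-feature model. It is only an illustration under simplifying assumptions (a single all-zeros baseline, enumeration of every feature ordering), not the tree-based algorithm that TabularExplainer selects for a gradient boosting model.

```python
from itertools import permutations


def exact_shapley(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings.

    Features that have not yet been "switched on" keep their baseline value.
    """
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = instance[j]
            value = predict(current)
            contributions[j] += value - previous
            previous = value
    return [c / len(orderings) for c in contributions]


# Hypothetical model with an interaction between features 0 and 1.
def toy_predict(x):
    return 2.0 * x[0] + x[0] * x[1] + 3.0 * x[2]


instance = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]

phi = exact_shapley(toy_predict, instance, baseline)
print(phi)       # [3.0, 1.0, 3.0]
print(sum(phi))  # 7.0, equal to toy_predict(instance) - toy_predict(baseline)
```

The interaction term `x[0] * x[1]` contributes 2.0 in total and is split equally between features 0 and 1, which is why feature 0 receives 3.0 rather than 2.0.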