# SHAP Tabular Explainer via Contextual AI

This tutorial demonstrates how to generate explanations using SHAP's tabular explainer, as implemented in the Contextual AI library. To recap, the main steps for generating explanations are:

1. Get an explainer via the `ExplainerFactory` class
2. Build the tabular explainer
3. Call `explain_instance`


## Credits
1. Pramodh, Manduri <manduri.pramodh@sap.com>

In [1]:
# Obtain data
import pandas as pd
import io
import shap
from sklearn.model_selection import train_test_split

#df_data = pd.read_csv(io.StringIO(data), header = 0)
#df_data = pd.read_csv('HousePrices_HalfMil.csv', header = 0)
df_data = pd.read_csv('train.csv', header = 0, nrows=300)

# Get predictor and target
X = df_data.drop("Prices", axis=1).fillna(value=0)
y = df_data["Prices"].fillna(value=0)
train_X, test_X, train_y, test_y = train_test_split(X.values, y.values,
                                                    test_size=0.25)
        
# Train regression
from sklearn.linear_model import Lasso
alpha_list = [0.01, 0.1, 1, 2, 5, 10]
model_list = []
r2_list = []
for alpha in alpha_list:
    lm = Lasso(alpha=alpha)
    lm.fit(train_X, train_y)
    model_list.append(lm)
    # Model quality: R^2 on the held-out test split
    r2 = lm.score(test_X, test_y)
    r2_list.append(r2)
    print('Alpha: %s. R2: %s' % (alpha, r2))

# Keep the model with the highest test R^2
index = r2_list.index(max(r2_list))
lm = model_list[index]

Alpha: 0.01. R2: 0.9999999999521786
Alpha: 0.1. R2: 0.9999999981643364
Alpha: 1. R2: 0.9999998122204277
Alpha: 2. R2: 0.9999990721384927
Alpha: 5. R2: 0.9999941923311907
Alpha: 10. R2: 0.999976755354787
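
The score printed for each `alpha` is the coefficient of determination, which is what a scikit-learn regressor's `score` method returns. As a rough illustration of what that number means and how the best model is picked, the following sketch recomputes R² by hand on toy values; the helper names `r_squared` and `best_index` and the sample numbers are made up for this example and are not part of the tutorial's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def best_index(scores):
    """Index of the highest score (first one wins on ties)."""
    return scores.index(max(scores))


# Toy values, invented for illustration only
y_true = [1.0, 2.0, 3.0]
perfect = r_squared(y_true, [1.0, 2.0, 3.0])    # predictions match exactly
mean_only = r_squared(y_true, [2.0, 2.0, 2.0])  # always predicts the mean
chosen = best_index([mean_only, perfect])
```

A model that only predicts the mean scores 0 and a perfect model scores 1, so the values close to 1 above indicate that the fitted models explain nearly all the variance in the test split.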


In [2]:
feature_names = X.columns.tolist()
clf = lm
clf_fn = lm.predict

limit_size = 50
print('Subsetting training data to %s to speed up. ' % limit_size)
train_X = train_X[:limit_size]

Subsetting training data to 50 to speed up. 


In [3]:
import os
import json
import sys
sys.path.append('../../../')
from xai.compiler.base import Configuration, Controller

The sklearn.metrics.classification module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.


In [4]:
json_config = 'shap-tabular-regressor-feature-importance.json'
with open(json_config) as file:
    config = json.load(file)
config

{'name': 'Report for Housing Prices Half Mil Feature Importance Ranking',
 'overview': True,
 'content_table': True,
 'contents': [{'title': 'Feature Importance Ranking with Housing Prices data-set',
   'desc': 'This section provides the Feature Importance of model',
   'sections': [{'title': 'Feature Importance Analysis',
     'desc': 'This section provides the analysis on feature',
     'component': {'_comment': 'refer to document section xxxx',
      'class': 'FeatureImportanceRanking',
      'attr': {'trained_model': 'var:clf',
       'train_data': 'var:train_X',
       'feature_names': 'var:feature_names',
       'method': 'shap',
       'mode': 'regression'}}}]}],
 'writers': [{'class': 'Pdf',
   'attr': {'name': 'housingprices-regression-feature-importance-report'}}]}
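
The configuration requests a `FeatureImportanceRanking` with `method` set to `shap` and `mode` set to `regression`. For intuition about what such a ranking can be based on, the sketch below uses the closed form that SHAP values take for a linear model when features are treated as independent: the contribution of feature j for one row is `coef_j * (x_j - mean_j)`, and a common global importance score is the mean absolute contribution per feature. This is an assumption-laden illustration, not the library's implementation; the function names, feature names, coefficients, and data are all hypothetical.

```python
def linear_shap_values(coefs, X):
    """SHAP values of a linear model under feature independence:
    phi[i][j] = coefs[j] * (X[i][j] - mean of column j)."""
    n_rows, n_cols = len(X), len(coefs)
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    return [[coefs[j] * (row[j] - means[j]) for j in range(n_cols)]
            for row in X]


def rank_features(names, shap_values):
    """Rank features by mean absolute SHAP value, largest first."""
    n_rows = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_rows
        for j, name in enumerate(names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical model and data, for illustration only
names = ["area", "age", "noise"]
coefs = [2.0, -0.5, 0.0]
X = [[1.0, 10.0, 5.0],
     [3.0, 20.0, 7.0],
     [5.0, 30.0, 9.0]]

phi = linear_shap_values(coefs, X)
ranking = rank_features(names, phi)
```

For each row, the contributions sum to the difference between that row's prediction and the average prediction, which is the property that makes SHAP values additive explanations.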

In [5]:
controller = Controller(config=Configuration(config, locals()))
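
Passing `locals()` to `Configuration` gives the compiler access to the notebook's variables, which is presumably how entries such as `'var:clf'` and `'var:train_X'` in the JSON are tied to the objects defined earlier. The sketch below shows one way such a lookup could work. It is a hypothetical illustration, not the library's code: the function `resolve_vars` and the sample namespace are invented here.

```python
def resolve_vars(node, namespace, prefix="var:"):
    """Recursively replace 'var:<name>' strings with namespace[<name>]."""
    if isinstance(node, dict):
        return {k: resolve_vars(v, namespace, prefix) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_vars(v, namespace, prefix) for v in node]
    if isinstance(node, str) and node.startswith(prefix):
        name = node[len(prefix):]
        if name not in namespace:
            raise KeyError("config refers to undefined variable %r" % name)
        return namespace[name]
    # Plain values (e.g. 'shap', 'regression') are passed through unchanged
    return node


# Hypothetical stand-ins for the notebook's variables
namespace = {"clf": object(), "feature_names": ["a", "b"]}
attr = {
    "trained_model": "var:clf",
    "feature_names": "var:feature_names",
    "method": "shap",
    "mode": "regression",
}
resolved = resolve_vars(attr, namespace)
```

Under this reading, `clf`, `train_X`, and `feature_names` need to exist in the notebook before the `Controller` is created, because the configuration refers to them by name.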

In [6]:
controller.render()

Setting feature_perturbation = "tree_path_dependent" because no background data was given.


HBox(children=(FloatProgress(value=0.0, max=50.0), HTML(value='')))

l1_reg="auto" is deprecated and in the next version (v0.29) the behavior will change from a conditional use of AIC to simply "num_features(10)"!

### Results

In [7]:
from pprint import pprint
pprint("report generated : %s/housingprices-regression-feature-importance-report.pdf" % os.getcwd())

('report generated : '
 '/Users/i062308/Development/Explainable_AI/tutorials/compiler/housingprices_halfmil/housingprices-regression-feature-importance-report.pdf')