<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Prepping-the-data" data-toc-modified-id="Prepping-the-data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Prepping the data</a></span><ul class="toc-item"><li><span><a href="#Read-the-file" data-toc-modified-id="Read-the-file-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Read the file</a></span></li><li><span><a href="#Split-features-and-targets" data-toc-modified-id="Split-features-and-targets-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Split features and targets</a></span></li><li><span><a href="#Remove-unneeded-columns" data-toc-modified-id="Remove-unneeded-columns-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Remove unneeded columns</a></span></li><li><span><a href="#Renaming-columns---optional" data-toc-modified-id="Renaming-columns---optional-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Renaming columns - optional</a></span></li><li><span><a href="#Convert-categorical-features" data-toc-modified-id="Convert-categorical-features-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Convert categorical features</a></span></li></ul></li><li><span><a href="#Split-the-data-into-train-and-test" data-toc-modified-id="Split-the-data-into-train-and-test-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Split the data into train and test</a></span></li><li><span><a href="#Train-a-simple-regressor-model" data-toc-modified-id="Train-a-simple-regressor-model-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Train a simple regressor model</a></span></li><li><span><a href="#Run-predictions" data-toc-modified-id="Run-predictions-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Run predictions</a></span></li><li><span><a href="#Using-the-actual-shapash-package" data-toc-modified-id="Using-the-actual-shapash-package-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Using the actual shapash package</a></span><ul class="toc-item"><li><span><a href="#Declare-the-SmartExplainer-object" 
data-toc-modified-id="Declare-the-SmartExplainer-object-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Declare the SmartExplainer object</a></span></li><li><span><a href="#Compile-Model,-Dataset,-Encoders" data-toc-modified-id="Compile-Model,-Dataset,-Encoders-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Compile Model, Dataset, Encoders</a></span></li><li><span><a href="#Display-output-as-interactive-app" data-toc-modified-id="Display-output-as-interactive-app-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>Display output as interactive app</a></span></li><li><span><a href="#Show-feature-explainability" data-toc-modified-id="Show-feature-explainability-5.4"><span class="toc-item-num">5.4&nbsp;&nbsp;</span>Show feature explainability</a></span></li><li><span><a href="#Generate-the-Shapash-Report" data-toc-modified-id="Generate-the-Shapash-Report-5.5"><span class="toc-item-num">5.5&nbsp;&nbsp;</span>Generate the Shapash Report</a></span></li></ul></li></ul></div>

# Simple Model Explainability - Shapash

In [1]:
# shapash install info
# https://shapash.readthedocs.io/en/latest/installation-instructions/index.html
# https://github.com/MAIF/shapash

from shapash.explainer.smart_explainer import SmartExplainer
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.model_selection import train_test_split
from lightgbm import LGBMRegressor
import os
import warnings


# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

## Prepping the data

### Read the file

In [2]:
# Let's get some publicly available data
# download https://www.kaggle.com/shwetabh123/mall-customers
input_df = pd.read_csv('Mall_Customers.csv')
print(input_df.columns)
print(input_df.shape)
print(input_df.head(2))

Index(['CustomerID', 'Genre', 'Age', 'Annual Income (k$)',
       'Spending Score (1-100)'],
      dtype='object')
(200, 5)
   CustomerID Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1  Male   19                  15                      39
1           2  Male   21                  15                      81


### Split features and targets

In [3]:
# Split into features and targets
y_df = input_df['Spending Score (1-100)'].to_frame()
X_df = input_df[input_df.columns.difference(['Spending Score (1-100)'])]
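`Index.difference` returns the remaining labels sorted, which is why `Genre` ends up last in the feature list printed below. The short pandas sketch uses a two-row stand-in with the same column names as `Mall_Customers.csv` (values copied from the `head(2)` output above) and compares it with `drop(columns=...)`, which keeps the file's original order. Either order works for training, as long as train and test data share it.

```python
import pandas as pd

# Two-row stand-in with the same columns as Mall_Customers.csv (values from head(2))
df = pd.DataFrame({
    "CustomerID": [1, 2],
    "Genre": ["Male", "Male"],
    "Age": [19, 21],
    "Annual Income (k$)": [15, 15],
    "Spending Score (1-100)": [39, 81],
})
target = "Spending Score (1-100)"

# Index.difference sorts the remaining labels alphabetically
sorted_cols = list(df.columns.difference([target]))
print(sorted_cols)  # ['Age', 'Annual Income (k$)', 'CustomerID', 'Genre']

# drop(columns=...) removes the target but keeps the original column order
kept_cols = list(df.drop(columns=[target]).columns)
print(kept_cols)  # ['CustomerID', 'Genre', 'Age', 'Annual Income (k$)']
```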

### Remove unneeded columns

In [4]:
# Remove CustomerID: it is only a row identifier and carries no predictive signal
X_df = X_df.drop(columns=['CustomerID'])
print(X_df.columns)

Index(['Age', 'Annual Income (k$)', 'Genre'], dtype='object')


### Renaming columns - optional

In [5]:
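# Rename the Annual Income, Genre (to Gender) and Spending Score columns.
# The outputs further down were recorded with the mistyped key "Annual_income (k$)",
# which matched no column, so they still show "Annual Income (k$)".
X_df = X_df.rename(columns={"Annual Income (k$)": "Income", "Genre": "Gender"})
y_df = y_df.rename(columns={"Spending Score (1-100)": "Spending_Score"})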
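`DataFrame.rename` ignores mapping keys that match no column by default, so a mistyped key such as `"Annual_income (k$)"` (the actual column is `"Annual Income (k$)"`) fails silently. The sketch below, on a one-row toy frame, shows that passing `errors="raise"` turns such a typo into a `KeyError`.

```python
import pandas as pd

X = pd.DataFrame({"Age": [19], "Annual Income (k$)": [15], "Genre": ["Male"]})

# Default errors="ignore": the mistyped key matches nothing and is skipped silently
silent = X.rename(columns={"Annual_income (k$)": "Income", "Genre": "Gender"})
print(list(silent.columns))  # ['Age', 'Annual Income (k$)', 'Gender']

# errors="raise": the same typo raises a KeyError instead
raised = False
try:
    X.rename(columns={"Annual_income (k$)": "Income"}, errors="raise")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# With the exact column name, every key matches and the rename succeeds
fixed = X.rename(columns={"Annual Income (k$)": "Income", "Genre": "Gender"}, errors="raise")
print(list(fixed.columns))  # ['Age', 'Income', 'Gender']
```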

### Convert categorical features

In [6]:
# Gender is stored as text (object dtype), so encode every object column as integers
categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df = encoder.transform(X_df)
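`OrdinalEncoder` replaces each category with an integer. Because the fitted encoder is later passed to `xpl.compile(preprocessing=encoder)`, Shapash can map those integers back to labels, which is why the contributions table shows `Female`/`Male` rather than numbers. The sketch below is a hand-rolled pandas illustration of that encode/decode round trip, not category_encoders' implementation; `ordinal_encode` and `ordinal_decode` are hypothetical helpers, and numbering from 1 in order of first appearance is an assumption about the library's default.

```python
import pandas as pd

def ordinal_encode(s: pd.Series):
    """Hypothetical helper: map each category to 1..n in order of first appearance."""
    _, uniques = pd.factorize(s)  # unique values in order of first appearance
    mapping = {cat: i + 1 for i, cat in enumerate(uniques)}
    return s.map(mapping), mapping

def ordinal_decode(encoded: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: invert the mapping to recover the original labels."""
    inverse = {code: cat for cat, code in mapping.items()}
    return encoded.map(inverse)

gender = pd.Series(["Male", "Male", "Female", "Male"], name="Gender")
encoded, mapping = ordinal_encode(gender)
decoded = ordinal_decode(encoded, mapping)

print(mapping)           # {'Male': 1, 'Female': 2}
print(encoded.tolist())  # [1, 1, 2, 1]
print(decoded.tolist())  # ['Male', 'Male', 'Female', 'Male']
```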

## Split the data into train and test

In [7]:
# Split the dataset into train and test sets.
# With only 200 rows, an 80/20 split is the more common choice; this example uses 60/40.
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.60, random_state=1)
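To make the split ratio concrete, the sketch below runs `train_test_split` on a hypothetical 200-row stand-in (the same row count as `Mall_Customers.csv`) and prints how many rows land in each set for the 60/40 split used here and for the more usual 80/20.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the same number of rows as Mall_Customers.csv
X = pd.DataFrame({"feature": range(200)})
y = pd.DataFrame({"target": range(200)})

sizes = {}
for train_size in (0.60, 0.80):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_size, random_state=1)
    sizes[train_size] = (len(X_tr), len(X_te))
    print(f"train_size={train_size}: {len(X_tr)} train rows, {len(X_te)} test rows")
# train_size=0.6: 120 train rows, 80 test rows
# train_size=0.8: 160 train rows, 40 test rows
```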

## Train a simple regressor model

In [8]:
# Train a simple LightGBM regressor with 50 boosting rounds (trees)
regressor = LGBMRegressor(n_estimators=50).fit(Xtrain, ytrain)

## Run predictions

In [9]:
# Keep Xtest's index so each prediction stays aligned with its row
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)
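Passing `index=Xtest.index` matters because `train_test_split` shuffles rows, so `Xtest` carries non-consecutive index labels, and pandas aligns frames on labels rather than position. The sketch below uses a three-row frame whose index labels and ages come from the `to_pandas` output further down (predictions rounded; `preds` stands in for `regressor.predict`).

```python
import numpy as np
import pandas as pd

# Small shuffled test frame (index labels and ages from the to_pandas output below)
X_test = pd.DataFrame({"Age": [27, 65, 49]}, index=[58, 40, 34])
preds = np.array([66.8, 37.5, 25.5])  # stand-in for regressor.predict(X_test)

# Without index=..., pandas assigns a fresh 0..n-1 index that no longer matches X_test
unaligned = pd.DataFrame(preds, columns=["pred"])
print(X_test.join(unaligned))  # every pred is NaN: labels 0, 1, 2 don't exist in X_test

# Reusing X_test.index keeps each prediction attached to its own row
aligned = pd.DataFrame(preds, columns=["pred"], index=X_test.index)
print(X_test.join(aligned))
```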

## Using the actual shapash package

### Declare the SmartExplainer object

In [10]:
# Step 1: Declare SmartExplainer Object
# https://shapash.readthedocs.io/en/latest/autodocs/shapash.explainer.html
xpl = SmartExplainer()

### Compile Model, Dataset, Encoders

In [11]:
# Step 2: Compile Model, Dataset, Encoders
xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder,  # Optional: lets Shapash use the encoder's inverse_transform to show original labels
    y_pred=y_pred,  # Optional: predictions to display alongside the explanations
)

Backend: Shap TreeExplainer


### Display output as interactive app

In [12]:
# Step 3: Display output
# The app keeps running after this cell finishes; stop it with app.kill() when you are done
app = xpl.run_app()

INFO:root:Your Shapash application run on http://DESKTOP-7JL51FR:8050/


Dash is running on http://0.0.0.0:8050/

INFO:root:Use the method .kill() to down your app.
INFO:shapash.webapp.smart_app:Dash is running on http://0.0.0.0:8050/

 * Serving Flask app 'shapash.webapp.smart_app' (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


### Show feature explainability

In [13]:
xpl.to_pandas(max_contrib=3).head()



          pred  feature_1  value_1  contribution_1           feature_2  value_2  contribution_2  feature_3  value_3  contribution_3
58   66.797417        Age       27       14.763532  Annual Income (k$)       46         2.33574     Gender   Female       -0.260188
40   37.507324        Age       65      -14.006882  Annual Income (k$)       38        1.103943     Gender   Female         0.45193
34   25.459063        Age       49      -15.790038  Annual Income (k$)       33        -9.17716     Gender   Female        0.467928
102  40.810482        Age       67      -13.319123  Annual Income (k$)       62        4.965449     Gender     Male       -0.794178
184  27.366683        Age       41      -19.319643  Annual Income (k$)       99       -4.422505     Gender   Female        1.150498
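Because the model has only three features and `max_contrib=3`, each row lists all of its contributions. SHAP values are additive: prediction = base value (the explainer's expected output) + sum of the feature contributions. So `pred` minus the three contributions should come out as the same constant on every row. The sketch below checks this using the five rows copied from the table above; the constant is whatever base value the explainer used, and small differences come from the 6-decimal rounding.

```python
import pandas as pd

# Values copied from the to_pandas(max_contrib=3) output above
rows = pd.DataFrame(
    {
        "pred": [66.797417, 37.507324, 25.459063, 40.810482, 27.366683],
        "contribution_1": [14.763532, -14.006882, -15.790038, -13.319123, -19.319643],
        "contribution_2": [2.33574, 1.103943, -9.17716, 4.965449, -4.422505],
        "contribution_3": [-0.260188, 0.45193, 0.467928, -0.794178, 1.150498],
    },
    index=[58, 40, 34, 102, 184],
)

contrib_cols = ["contribution_1", "contribution_2", "contribution_3"]

# pred = base value + sum(contributions)  =>  base value = pred - sum(contributions)
base_value = rows["pred"] - rows[contrib_cols].sum(axis=1)
print(base_value.round(4))  # every row gives roughly 49.9583
```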


### Generate the Shapash Report

In [14]:
# Step 4: Generate the Shapash Report
# https://github.com/MAIF/shapash/blob/master/tutorial/report/shapash_report_example.py

xpl.generate_report(
    output_file='medium_spending_scores_report2.html',
    project_info_file='shapash_project_info.yml',
    x_train=Xtrain,
    y_train=ytrain,
    y_test=ytest,
    title_story="Spending Scores Report",
    title_description="""This is a simple example.
        It was generated using the Shapash library.""",
    metrics=[
        {
            'path': 'sklearn.metrics.mean_absolute_error',
            'name': 'Mean absolute error',
        }
    ]
)

INFO:papermill:Input Notebook:  C:\Users\dawne\anaconda3\envs\shapash\Lib\site-packages\shapash\report\base_report.ipynb
INFO:papermill:Output Notebook: C:\Users\dawne\AppData\Local\Temp\tmpodvwfajs\base_report.ipynb
INFO:blib2to3.pgen2.driver:Generating grammar tables from C:\Users\dawne\AppData\Roaming\Python\Python37\site-packages\blib2to3\Grammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to C:\Users\dawne\AppData\Local\black\black\Cache\21.7b0\Grammar3.7.11.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such file or directory: 'C:\\Users\\dawne\\AppData\\Local\\black\\black\\Cache\\21.7b0\\tmpjt931kcq'
INFO:blib2to3.pgen2.driver:Generating grammar tables from C:\Users\dawne\AppData\Roaming\Python\Python37\site-packages\blib2to3\PatternGrammar.txt
INFO:blib2to3.pgen2.driver:Writing grammar tables to C:\Users\dawne\AppData\Local\black\black\Cache\21.7b0\PatternGrammar3.7.11.final.0.pickle
INFO:blib2to3.pgen2.driver:Writing failed: [Errno 2] No such

Executing:   0%|          | 0/15 [00:00<?, ?cell/s]

INFO:papermill:Executing notebook with kernel: python3
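In the `metrics` argument of `generate_report`, each metric is given as a dotted import path plus a display name. Shapash's own loading code is not shown in this notebook; the sketch below only assumes the usual approach of importing the module part with `importlib` and fetching the function by name, then checks that the result is scikit-learn's mean absolute error, the average of |y_true - y_pred|. `resolve_metric` and the toy numbers are hypothetical.

```python
import importlib

from sklearn.metrics import mean_absolute_error

def resolve_metric(path: str):
    """Hypothetical helper: turn 'package.module.function' into the function object."""
    module_name, func_name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), func_name)

metric = resolve_metric("sklearn.metrics.mean_absolute_error")
print(metric is mean_absolute_error)  # True

# Toy values: absolute errors are 6, 11 and 0, so MAE = 17 / 3
y_true = [40, 80, 10]
y_pred = [46, 69, 10]
print(metric(y_true, y_pred))  # about 5.667
```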


In [15]:
# Step 5 (optional): from training to deployment with a SmartPredictor object
# predictor = xpl.to_smartpredictor()
