# Shapash Model Overview

https://shapash.readthedocs.io/en/latest/

## With this tutorial you: 
Understand how to create a Shapash SmartPredictor to make predictions and get local explanations in production, using a simple use case.

This tutorial describes the steps from training the model to deploying the Shapash SmartPredictor. A more detailed tutorial covers the SmartPredictor object in more depth.

## Contents:

Build a Regressor

Compile Shapash SmartExplainer

From Shapash SmartExplainer to SmartPredictor

Save Shapash SmartPredictor Object in pickle file

Make a prediction

## Load The DataSet


In [1]:
import seaborn as sns
df=sns.load_dataset('tips')
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


## Split the dataset into features (X) and target (y)

In [2]:
y=df['tip']
X=df[df.columns.difference(['tip'])]

In [3]:
X.head()

Unnamed: 0,day,sex,size,smoker,time,total_bill
0,Sun,Female,2,No,Dinner,16.99
1,Sun,Male,3,No,Dinner,10.34
2,Sun,Male,3,No,Dinner,21.01
3,Sun,Male,2,No,Dinner,23.68
4,Sun,Female,4,No,Dinner,24.59
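
The columns of `X` above appear in alphabetical order rather than in the order of `df`, because `Index.difference` sorts its result. The following is a minimal sketch on a tiny hand-made frame (the data are illustrative, not the tips dataset) comparing it with `DataFrame.drop`, which keeps the original order.

```python
import pandas as pd

# Tiny illustrative frame; the values are made up for the example.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34],
        "tip": [1.01, 1.66],
        "sex": ["Female", "Male"],
        "size": [2, 3],
    }
)

# Index.difference returns the remaining labels sorted.
sorted_cols = df[df.columns.difference(["tip"])].columns.tolist()

# DataFrame.drop keeps the remaining columns in their original order.
original_order = df.drop(columns=["tip"]).columns.tolist()

print(sorted_cols)
print(original_order)
```

Either order works for training, as long as the same order is used when predicting.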


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB


# Apply LabelEncoder to Categorical Columns

In [5]:
columns=['sex','smoker','day','time']
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

# Take an explicit copy so the assignments below do not trigger SettingWithCopyWarning.
X = X.copy()

for column in columns:
    # fit_transform refits `le` on every column, so it only remembers the last mapping.
    X[column] = le.fit_transform(X[column])

In [6]:
X.head()

Unnamed: 0,day,sex,size,smoker,time,total_bill
0,2,0,2,0,0,16.99
1,2,1,3,0,0,10.34
2,2,1,3,0,0,21.01
3,2,1,2,0,0,23.68
4,2,0,4,0,0,24.59
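
Reusing one `LabelEncoder` for every column means only the last fitted mapping is kept. If the codes need to be decoded later, one option is to keep one encoder per column. This is a minimal sketch on a small hand-made frame; the `encoders` dictionary is a name chosen for the illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small illustrative frame using the same category labels as the tips dataset.
X = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Female"],
        "day": ["Sun", "Sat", "Thur", "Fri"],
    }
)

# Keep one fitted encoder per column so each mapping can be inverted later.
encoders = {}
for column in ["sex", "day"]:
    encoder = LabelEncoder()
    X[column] = encoder.fit_transform(X[column])
    encoders[column] = encoder

# LabelEncoder assigns codes in sorted label order: Fri=0, Sat=1, Sun=2, Thur=3.
decoded_days = encoders["day"].inverse_transform(X["day"])
print(X)
print(list(decoded_days))
```

The sorted label order explains why `Sun` is encoded as 2 and `Female` as 0 in the table above.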


## Train Test split

In [7]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,train_size=0.75,random_state=1)

## Applying RandomForestRegressor On The Training Data

In [8]:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=200).fit(X_train,y_train)
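
Before explaining a model, it is worth checking how well it performs on the held-out data. The sketch below assumes synthetic data shaped roughly like the tips problem (the coefficients and noise level are invented) and scores a `RandomForestRegressor` with mean absolute error and R².

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the tips data: the relationship below is an assumption.
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=200)
size = rng.integers(1, 6, size=200)
tip = 0.15 * total_bill + 0.2 * size + rng.normal(scale=0.3, size=200)

X = np.column_stack([total_bill, size])
X_train, X_test, y_train, y_test = train_test_split(
    X, tip, train_size=0.75, random_state=1
)

# Fixing random_state makes the forest reproducible from run to run.
regressor = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_train, y_train)

# Score on the test split only, never on the training data.
y_pred = regressor.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}, R2: {r2:.3f}")
```

The same two metric calls can be applied to the `regressor`, `X_test` and `y_test` defined in this tutorial.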

# Let's Understand Our Model With Shapash
In this section, we use the SmartExplainer object from Shapash.

It helps users understand how the model works on the specified data.
This object should be used only for the data mining step. Shapash provides another object, the SmartPredictor, for deployment.

In [9]:
from shapash.explainer.smart_explainer import SmartExplainer

In [10]:
xpl = SmartExplainer(model=regressor)

In [11]:
xpl.compile(X_test)

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x000002025C5D9250>


In [12]:
xpl

<shapash.explainer.smart_explainer.SmartExplainer at 0x20262feae90>

# Let's Understand the results of your trained model
We can then get a first overview of the model's explanations by launching the Shapash web app with run_app.

Further down, the summary is limited to the 3 most contributive features for each prediction.
Shapash can also display more readable feature labels for operational use; none are supplied here, so the raw column names are shown.

In [13]:
app = xpl.run_app(title_story='Tips Dataset')



INFO:root:Your Shapash application run on http://DESKTOP-G5S0J8E:8050/
INFO:root:Use the method .kill() to down your app.


In [14]:
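from shapash.utils.load_smartpredictor import load_smartpredictor

# Convert the SmartExplainer into a SmartPredictor for deployment
predictor = xpl.to_smartpredictor()
# Save the SmartPredictor to a pickle file, then load it back
predictor.save('./predictor.pkl')
predictor_load = load_smartpredictor('./predictor.pkl')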
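
After any save and load, a useful sanity check is that the restored object behaves like the original. The sketch below does not call Shapash: it uses the standard `pickle` module on a small scikit-learn model as a stand-in, with synthetic data and a temporary file path chosen for the example.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data and a small model used only as a picklable stand-in.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = X[:, 0] + 2 * X[:, 1]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Write the model to a temporary pickle file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    with path.open("wb") as f:
        pickle.dump(model, f)
    with path.open("rb") as f:
        restored = pickle.load(f)

# The restored model should give exactly the same predictions.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```

As with any pickle file, only load files from a source you trust.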

# Make a prediction with your SmartPredictor
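To make new predictions and summarize the local explainability of your model on new datasets, use the add_input method of the SmartPredictor.

The add_input method is the first step: it adds a dataset for prediction and explainability.
It checks the structure of the dataset, and of the prediction and the contributions if they are specified.
It applies the preprocessing specified at initialisation and reorders the features to match the order used by the model (see the documentation of this method).
In API mode, this method can handle dictionary data, which can be received from a GET or a POST request.

## Add data
The x input of add_input doesn't have to be encoded when a preprocessing object was given to Shapash, because add_input then applies it. In this tutorial the categorical columns were label-encoded by hand and no preprocessing was passed to the SmartExplainer, so the already-encoded X is supplied.
Note that the call below passes the observed tips y as ypred, so the tip column in the following outputs contains those observed values, not predictions computed by the model.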
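
In API mode the input typically arrives as a JSON body. The sketch below is not Shapash's internal code: it only illustrates, with plain pandas, the kind of check and reordering described above. The `payload` string and the `feature_order` list are assumptions made for the example, using the encoded values of the first row of X.

```python
import json

import pandas as pd

# Feature order assumed to match the columns the model was trained on.
feature_order = ["day", "sex", "size", "smoker", "time", "total_bill"]

# Hypothetical request body; keys arrive in arbitrary order.
payload = '{"total_bill": 16.99, "size": 2, "sex": 0, "smoker": 0, "day": 2, "time": 0}'
record = json.loads(payload)

# Check the structure before building the frame.
missing = set(feature_order) - set(record)
if missing:
    raise ValueError(f"missing features: {sorted(missing)}")

# Build a one-row DataFrame and reorder its columns to the model's order.
x_new = pd.DataFrame([record])[feature_order]
print(x_new)
```

A frame built this way has the same column layout as the X used elsewhere in this tutorial.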

In [15]:
predictor_load.add_input(x=X, ypred=y)

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x000002026754C410>


In [16]:
detailed_contributions = predictor_load.detail_contributions()

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x000002026754C410>


In [17]:
detailed_contributions.head()

Unnamed: 0,tip,day,sex,size,smoker,time,total_bill
0,1.01,0.039089,0.099318,-0.018564,-0.035033,-0.002448,0.222412
1,1.66,0.080101,-0.022067,-0.008183,-0.060481,-0.020191,-1.191057
2,3.5,0.041035,-0.01844,-0.012122,-0.02724,0.01289,0.461699
3,3.31,0.071011,-0.015454,-0.027504,-0.004976,0.012226,0.119769
4,3.61,0.034589,0.052169,-0.042771,-0.006804,-0.005541,-0.300319
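
Each row of this table holds one contribution per feature. SHAP-style contributions are additive: a reference value plus the sum of a row's contributions gives the model's output for that row. The sketch below illustrates that property on a linear model, where contributions have a simple closed form. It is a swapped-in technique for illustration, not the TreeExplainer computation run above, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data; the coefficients are invented for the example.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# Reference value: the average prediction over the dataset.
base_value = model.predict(X).mean()

# For a linear model, a feature's contribution relative to the average
# prediction is coef * (value - feature mean).
contributions = model.coef_ * (X - X.mean(axis=0))

# Adding the contributions back to the reference value recovers each prediction.
reconstructed = base_value + contributions.sum(axis=1)
print(np.allclose(reconstructed, model.predict(X)))
```

A positive contribution pushes the prediction above the reference value and a negative one pushes it below.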


# Summarize explainability of the predictions
You can use the summarize method to summarize your local explainability.

This summary can be configured with the modify_mask method so that the explainability meets your operational needs.

In [18]:
predictor_load.modify_mask(max_contrib=3)
explanation = predictor_load.summarize()
explanation.head()

Unnamed: 0,tip,feature_1,value_1,contribution_1,feature_2,value_2,contribution_2,feature_3,value_3,contribution_3
0,1.01,total_bill,16.99,0.222412,sex,0.0,0.099318,day,2.0,0.039089
1,1.66,total_bill,10.34,-1.191057,day,2.0,0.080101,smoker,0.0,-0.060481
2,3.5,total_bill,21.01,0.461699,day,2.0,0.041035,smoker,0.0,-0.02724
3,3.31,total_bill,23.68,0.119769,day,2.0,0.071011,size,2.0,-0.027504
4,3.61,total_bill,24.59,-0.300319,sex,0.0,0.052169,size,4.0,-0.042771
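
In the output above, the three retained features are those with the largest absolute contributions, ordered from largest to smallest. The sketch below reproduces that selection with plain pandas on the first two rows of the contributions table shown earlier. The `top_contributions` helper is hypothetical, written for this illustration, and is not part of Shapash.

```python
import pandas as pd

# First two rows of the detailed contributions shown earlier in this tutorial.
contributions = pd.DataFrame(
    {
        "day": [0.039089, 0.080101],
        "sex": [0.099318, -0.022067],
        "size": [-0.018564, -0.008183],
        "smoker": [-0.035033, -0.060481],
        "time": [-0.002448, -0.020191],
        "total_bill": [0.222412, -1.191057],
    }
)


def top_contributions(row: pd.Series, k: int = 3) -> list:
    """Return the k (feature, contribution) pairs with the largest absolute value."""
    order = row.abs().sort_values(ascending=False).index[:k]
    return [(name, float(row[name])) for name in order]


# One list of (feature, contribution) pairs per row.
top3 = [top_contributions(row) for _, row in contributions.iterrows()]
for pairs in top3:
    print(pairs)
```

The selected features match feature_1 to feature_3 for rows 0 and 1 of the summary.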
