# Shapash model in production - Overview

<b>With this tutorial you:</b><br />
Learn how to create a Shapash SmartPredictor that makes predictions and provides local explanations in production,
using a simple use case.<br />

This tutorial describes the different steps from training the model to Shapash SmartPredictor deployment.
A separate, more detailed tutorial covers the SmartPredictor object in depth.

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- From Shapash SmartExplainer to SmartPredictor
- Save Shapash SmartPredictor Object in a pickle file
- Make a prediction

Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)

In [1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

## Step 1 : Exploration and training of the model

### Building a Supervised Model

In this section, we train a supervised Machine Learning model on the House Prices data.

In [2]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')

In [3]:
y_df=house_df['SalePrice'].to_frame()
X_df=house_df[house_df.columns.difference(['SalePrice'])]

#### Preprocessing step 

Encoding Categorical Features

In [4]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(cols=categorical_features,
                         handle_unknown='ignore',
                         return_df=True).fit(X_df)

X_df=encoder.transform(X_df)
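To make the encoding step concrete, here is a minimal sketch of what an ordinal encoding does: each category is replaced by an integer code, and the mapping can be inverted to recover the original labels. The helper `fit_ordinal_mapping` and the sample values are hypothetical, and the codes assigned by `category_encoders` may differ from the ones in this sketch.

```python
# Illustrative sketch only: a hand-rolled ordinal encoding,
# not the category_encoders implementation.
def fit_ordinal_mapping(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


# Hypothetical sample of a categorical column.
sale_condition = ["Normal", "Abnormal Sale", "Normal", "Partial"]

mapping = fit_ordinal_mapping(sale_condition)
encoded = [mapping[value] for value in sale_condition]

# The inverse mapping turns codes back into readable labels.
inverse = {code: label for label, code in mapping.items()}
decoded = [inverse[code] for code in encoded]

print(encoded)                    # [1, 2, 1, 3]
print(decoded == sale_condition)  # True
```

Being able to invert the encoding is what lets explanations show the original category labels rather than integer codes.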

#### Train / Test Split

In [5]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

#### Model Fitting

In [6]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain, ytrain)

In [7]:
y_pred = pd.DataFrame(regressor.predict(Xtest), columns=['pred'], index=Xtest.index)

### Understand my model with shapash

In this section, we use the SmartExplainer object from Shapash.
- It allows users to understand how the model works with the specified data.
- This object is intended only for the data mining step. Shapash provides another object for deployment.
- This tutorial does not explore the possibilities of the SmartExplainer; other tutorials cover them.

#### Declare and Compile SmartExplainer 

In [8]:
from shapash.explainer.smart_explainer import SmartExplainer

#### Use wording on feature names to better understand results

Here, we use a wording to rename our feature labels with more understandable terms. This makes our local explainability more operational and understandable for users.
- To do this, we use the house_dict dictionary, which maps a description to each feature.
- We can then pass it as the features_dict parameter of the SmartExplainer.

In [9]:
xpl = SmartExplainer(features_dict=house_dict)
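As an illustration of the idea behind a features dictionary, the sketch below maps technical column names to readable labels. The two entries are an assumed excerpt (the column names and labels both appear in this tutorial's outputs, but their pairing here is not copied from `house_dict`), the helper `label_for` is hypothetical, and falling back to the raw column name is a choice made for this sketch.

```python
# Assumed excerpt of a features dictionary: technical name -> readable label.
features_dict = {
    "TotalBsmtSF": "Total square feet of basement area",
    "YearBuilt": "Original construction date",
}


def label_for(feature, features_dict):
    """Return the readable label, or the raw name when none is defined."""
    return features_dict.get(feature, feature)


columns = ["TotalBsmtSF", "YearBuilt", "YrSold"]
labels = [label_for(column, features_dict) for column in columns]

print(labels)
# ['Total square feet of basement area', 'Original construction date', 'YrSold']
```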

**compile()**<br /> This method is the first step to understanding the model and its predictions.<br /> It sorts the
contributions, reverses the preprocessing steps and performs all the calculations needed to
display plots quickly and summarize explanations efficiently. (see SmartExplainer documentation and tutorials)

In [10]:
xpl.compile(
            x=Xtest,
            model=regressor,
            preprocessing=encoder, # Optional: compile step can use inverse_transform method
            y_pred=y_pred # Optional
            )

Backend: Shap TreeExplainer


#### Understand results of your trained model

Then, we can easily get a first summary of the explanation of the model results.
- Here, we chose to get the 3 most contributive features for each prediction.
- We used a wording to make feature names more understandable in an operational context.

In [11]:
xpl.to_pandas(max_contrib=3).head()

| | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 259 | 209141.256921 | Ground living area square feet | 1792 | 13710.4 | Overall material and finish of the house | 7 | 12776.3 | Total square feet of basement area | 963 | -5103.03 |
| 268 | 178734.474531 | Ground living area square feet | 2192 | 29747.0 | Overall material and finish of the house | 5 | -26151.3 | Overall condition of the house | 8 | 9190.84 |
| 289 | 113950.84457 | Overall material and finish of the house | 5 | -24730.0 | Ground living area square feet | 900 | -16342.6 | Total square feet of basement area | 882 | -5922.64 |
| 650 | 74957.162142 | Overall material and finish of the house | 4 | -33927.7 | Ground living area square feet | 630 | -23234.4 | Total square feet of basement area | 630 | -11687.9 |
| 1234 | 135305.2435 | Overall material and finish of the house | 5 | -25445.7 | Ground living area square feet | 1188 | -11476.6 | Condition of sale | Abnormal Sale | -5071.82 |
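In the output above, the three features of each row appear in decreasing order of absolute contribution. The sketch below reproduces that ranking for one row in plain Python. The helper `top_contributions` is hypothetical and is not Shapash's implementation; the first three values come from row 259 above, while the last two entries are made up to show what gets dropped.

```python
def top_contributions(contributions, max_contrib=3):
    """Keep the max_contrib features with the largest absolute contribution."""
    ranked = sorted(
        contributions.items(),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    return ranked[:max_contrib]


row_259 = {
    "Total square feet of basement area": -5103.03,
    "Year sold (made-up value)": 17.5,
    "Ground living area square feet": 13710.4,
    "Street type (made-up value)": 0.0,
    "Overall material and finish of the house": 12776.3,
}

top = top_contributions(row_259, max_contrib=3)
for feature, contribution in top:
    print(f"{feature}: {contribution}")
# Ground living area square feet: 13710.4
# Overall material and finish of the house: 12776.3
# Total square feet of basement area: -5103.03
```

Ranking by absolute value keeps strong negative contributions in the summary alongside strong positive ones.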


## Step 2 : SmartPredictor in production

### Switch from SmartExplainer to SmartPredictor

When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to easily switch from a SmartExplainer to a SmartPredictor.
- SmartPredictor allows you to make predictions, and to detail and summarize contributions on new data automatically.
- It keeps only the attributes needed for deployment, which makes it lighter than the SmartExplainer object.
- SmartPredictor performs additional consistency checks before deployment.
- SmartPredictor allows you to configure how explanations are summarized to suit your use cases.
- It can be used through an API or in batch mode.

In [12]:
predictor = xpl.to_smartpredictor()

#### Save and Load your SmartPredictor

You can easily save your SmartPredictor object to a pickle file and load it back later.

#### Save your SmartPredictor to a Pickle File

In [13]:
predictor.save('./predictor.pkl')

#### Load your SmartPredictor from a Pickle File

In [14]:
from shapash.utils.load_smartpredictor import load_smartpredictor

In [15]:
predictor_load = load_smartpredictor('./predictor.pkl')
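For readers unfamiliar with pickle, the sketch below shows a generic save and load round trip with Python's standard `pickle` module. It uses a plain dictionary as a hypothetical stand-in for a predictor's state; it does not show what `predictor.save` writes internally, which this tutorial does not detail.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state a deployable object might hold.
state = {
    "features_dict": {"YearBuilt": "Original construction date"},
    "mask_params": {"max_contrib": 3},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictor.pkl")

    # Serialize the object to disk.
    with open(path, "wb") as file:
        pickle.dump(state, file)

    # Load it back. Only unpickle files that come from a trusted source.
    with open(path, "rb") as file:
        restored = pickle.load(file)

print(restored == state)  # True
```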

### Make a prediction with your SmartPredictor

To make new predictions and summarize the local explainability of your model on new datasets, you can use the add_input method of the SmartPredictor.
- The add_input method is the first step: it adds a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if specified.
- It applies the preprocessing specified at initialization and reorders the features to match the order used by the model. (see the documentation of this method)
- In API mode, this method can handle dictionary data, which can be received from a GET or a POST request.
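The structure check and the reordering described above can be pictured with a small plain-Python sketch. The function `prepare_row`, the three-feature model and the payload are hypothetical (the `YrSold` value is made up); this illustrates the idea of validating a request payload and aligning it with the model's feature order, not how `add_input` is implemented.

```python
def prepare_row(payload, expected_features):
    """Check that all expected features are present, then order them."""
    missing = [name for name in expected_features if name not in payload]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    # Rebuild the row in the order the model expects; extra keys are ignored.
    return {name: payload[name] for name in expected_features}


# Feature order assumed for a hypothetical three-feature model.
expected_features = ["TotalBsmtSF", "YearBuilt", "YrSold"]

# Dictionary as it might arrive from a POST request, in arbitrary key order.
payload = {"YrSold": 2008, "TotalBsmtSF": 856, "YearBuilt": 2003}

row = prepare_row(payload, expected_features)
print(list(row))  # ['TotalBsmtSF', 'YearBuilt', 'YrSold']
```

Failing fast on a missing feature is useful in API mode, where a malformed request should be rejected before any prediction is attempted.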

#### Add data

In [16]:
predictor_load.add_input(x=X_df, ypred=y_df)

#### Make prediction

Then, we can check that ypred is the one given to the add_input method by inspecting the attribute data["ypred"]. If ypred is not specified, it is computed automatically.

In [17]:
predictor_load.data["ypred"].head()

| Id | SalePrice |
|---|---|
| 1 | 208500 |
| 2 | 181500 |
| 3 | 223500 |
| 4 | 140000 |
| 5 | 250000 |


#### Get detailed explainability associated with the prediction

You can use the method detail_contributions to see the detailed contributions of each of your features for each row of your new dataset.
- For classification problems, it automatically associates contributions with the right predicted label. 
- The predicted label can be computed automatically by the method, or you can specify a ypred with the add_input method.
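This tutorial is a regression case, so the classification behaviour described above is not exercised. As a rough picture of it, the sketch below picks the contributions that belong to the predicted label, using a supplied label when there is one and the most probable label otherwise. All names and numbers are hypothetical, and this is not Shapash's code.

```python
def select_contributions(contributions_by_label, probabilities, ypred=None):
    """Return the label to explain and the contributions attached to it."""
    if ypred is not None:
        label = ypred  # label supplied by the caller
    else:
        label = max(probabilities, key=probabilities.get)  # most probable label
    return label, contributions_by_label[label]


# Hypothetical per-label contributions for one row of a binary classifier.
contributions_by_label = {
    "low": {"feature_a": -0.20, "feature_b": 0.05},
    "high": {"feature_a": 0.20, "feature_b": -0.05},
}
probabilities = {"low": 0.3, "high": 0.7}

label, contributions = select_contributions(contributions_by_label, probabilities)
print(label, contributions)
# high {'feature_a': 0.2, 'feature_b': -0.05}

forced_label, forced = select_contributions(
    contributions_by_label, probabilities, ypred="low"
)
print(forced_label, forced)
# low {'feature_a': -0.2, 'feature_b': 0.05}
```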

In [18]:
detailed_contributions = predictor_load.detail_contributions()

In [19]:
detailed_contributions.head()

| Id | SalePrice | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 208500 | -1104.994176 | 1281.445856 | 0.0 | 375.679661 | 12.259902 | 157.224629 | -233.02542 | -738.445396 | -59.294761 | ... | -104.645827 | -351.621116 | 0.0 | -498.228775 | -5165.503476 | 0.0 | -944.040092 | 3870.961681 | 2219.313761 | 17.478037 |
| 2 | 181500 | 1629.056157 | -683.689921 | 0.0 | 127.17779 | 8.045214 | 166.542629 | -1112.62348 | 5781.667847 | -76.735634 | ... | -229.800309 | -217.52546 | 0.0 | -546.01397 | 2783.676113 | 0.0 | 2388.081749 | 340.160052 | -4310.04167 | 413.350114 |
| 3 | 223500 | -1321.131971 | -556.399274 | 0.0 | 361.547835 | 10.474788 | 197.200789 | -531.98803 | 61.498784 | -84.596547 | ... | -91.178667 | -323.291461 | 0.0 | -178.77262 | -5157.340087 | 0.0 | -919.487875 | 3877.023405 | 2141.713106 | -72.948518 |
| 4 | 140000 | -991.578703 | 20.078384 | 0.0 | 310.409913 | 9.720006 | 226.625042 | -502.49935 | -3170.028608 | -95.892612 | ... | -89.323269 | -344.762081 | 0.0 | -608.018763 | -5882.195324 | 0.0 | -853.089042 | -3740.803412 | -4930.879276 | 555.382978 |
| 5 | 250000 | -8807.740743 | -1061.015056 | 0.0 | -1580.357621 | 7.868458 | 124.925471 | -237.640755 | -2109.988854 | -95.455721 | ... | -481.118248 | -384.072813 | 0.0 | -4071.644413 | -4866.753785 | 0.0 | 270.88213 | 2394.736098 | 1533.333106 | -233.439195 |


### Summarize explainability of the predictions

- You can use the summarize method to summarize your local explainability.
- This summary can be configured with the modify_mask method so that the explainability meets your operational needs.
- When you initialize the SmartPredictor, you can also specify:
>- postprocessing: to apply a wording to several values of your dataset.
>- label_dict: to rename your label for classification problems.
>- features_dict: to rename your features.
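To give a feel for what configuring a summary can mean, the sketch below filters one row of contributions before display: some features are hidden, small contributions are dropped, and the number of displayed features is capped. The function `summarize_row` and its parameter names are illustrative assumptions to be checked against the Shapash documentation; the contribution values are those of Id 1 in the detailed output above, rounded to two decimals.

```python
def summarize_row(contributions, features_to_hide=(), threshold=None, max_contrib=None):
    """Filter one row of contributions, then rank what remains."""
    items = [
        (name, value)
        for name, value in contributions.items()
        if name not in features_to_hide
    ]
    if threshold is not None:
        # Drop contributions whose absolute value is below the threshold.
        items = [(name, value) for name, value in items if abs(value) >= threshold]
    items.sort(key=lambda item: abs(item[1]), reverse=True)
    return items if max_contrib is None else items[:max_contrib]


# Contributions for Id 1, rounded to two decimals.
row_1 = {
    "1stFlrSF": -1104.99,
    "2ndFlrSF": 1281.45,
    "TotalBsmtSF": -5165.50,
    "YearBuilt": 3870.96,
    "YearRemodAdd": 2219.31,
    "Street": 0.0,
}

summary = summarize_row(
    row_1,
    features_to_hide={"YearRemodAdd"},
    threshold=2000,
    max_contrib=3,
)
print(summary)
# [('TotalBsmtSF', -5165.5), ('YearBuilt', 3870.96)]
```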

In [20]:
predictor_load.modify_mask(max_contrib=3)

In [21]:
explanation = predictor_load.summarize()

For example, here, we chose to build a summary with the 3 most contributive features for each row of the dataset.
- As you can see below, the wording defined in the first step of this tutorial has been kept by the SmartPredictor and is used by the summarize method.

In [22]:
explanation.head()

| | SalePrice | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 208500 | Overall material and finish of the house | 7 | 8248.82 | Total square feet of basement area | 856 | -5165.5 | Original construction date | 2003 | 3870.96 |
| 2 | 181500 | Overall material and finish of the house | 6 | -14555.9 | Ground living area square feet | 1262 | -10016.3 | Overall condition of the house | 8 | 6899.3 |
| 3 | 223500 | Ground living area square feet | 1786 | 15708.3 | Overall material and finish of the house | 7 | 11084.5 | Size of garage in square feet | 608 | 5998.61 |
| 4 | 140000 | Overall material and finish of the house | 7 | 8188.35 | Size of garage in square feet | 642 | 6651.57 | Total square feet of basement area | 756 | -5882.2 |
| 5 | 250000 | Overall material and finish of the house | 8 | 58568.4 | Ground living area square feet | 2198 | 16891.9 | Size of garage in square feet | 836 | 15161.9 |
