# Shapash model in production - Overview

<b>With this tutorial you will:</b><br />
Understand how to create a Shapash SmartPredictor to make predictions and get explanations in production,
using a simple use case.<br />

A more detailed tutorial goes further to help you get started with the SmartPredictor object.

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- Convert the Shapash SmartExplainer to a SmartPredictor
- Save the Shapash SmartPredictor object in a pickle file
- Make a prediction

Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)

In [1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

## Building a Supervised Model

In this section, we train a supervised machine learning model on the House Prices data.

In [4]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')

In [5]:
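# Target: the sale price, kept as a one-column DataFrame
y_df = house_df['SalePrice'].to_frame()
# Features: every column except the target
X_df = house_df[house_df.columns.difference(['SalePrice'])]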

#### Encoding Categorical Features 

Categorical features must be encoded before the training step, so we first apply a preprocessing step to the data.

In [7]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df=encoder.transform(X_df)
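
To make the encoding step concrete, here is a minimal pure-Python sketch of what an ordinal encoding does to one categorical column, and how the mapping can be inverted. It is an illustration only, not the category_encoders implementation: the helper names `fit_ordinal`, `transform` and `inverse_transform`, the sample values, and the choice of -1 for unseen categories are all assumptions of this sketch.

```python
# Illustrative sketch only: not how category_encoders is implemented.

def fit_ordinal(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # codes start at 1
    return mapping


def transform(values, mapping, unknown=-1):
    """Replace categories by their codes; unseen categories get `unknown` (assumption)."""
    return [mapping.get(value, unknown) for value in values]


def inverse_transform(codes, mapping):
    """Recover the original categories from their integer codes."""
    reverse = {code: value for value, code in mapping.items()}
    return [reverse.get(code) for code in codes]


# Hypothetical values for a single categorical column.
street = ["Pave", "Grvl", "Pave", "Pave"]
mapping = fit_ordinal(street)
codes = transform(street, mapping)
print(mapping)                            # {'Pave': 1, 'Grvl': 2}
print(codes)                              # [1, 2, 1, 1]
print(inverse_transform(codes, mapping))  # ['Pave', 'Grvl', 'Pave', 'Pave']
```

An invertible encoding matters later in the tutorial: the fitted encoder is passed to the explainer so that encoded values can be mapped back to readable categories.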

#### Train / Test Split

In [8]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

#### Model Fitting

In [9]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain,ytrain)

In [10]:
y_pred = pd.DataFrame(regressor.predict(Xtest),columns=['pred'],index=Xtest.index)

## Understand my model with shapash

- In this section, we use Shapash's SmartExplainer object, which helps users understand how the model works on a given dataset.
- This object should be used only for the data exploration step. Shapash provides another object for deployment.
- This tutorial does not explore the possibilities of the SmartExplainer; other tutorials cover them.

#### Declare and Compile SmartExplainer 

In [11]:
from shapash.explainer.smart_explainer import SmartExplainer

In [12]:
xpl = SmartExplainer()

In [31]:
xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    y_pred=y_pred # Optional
)

Backend: Shap TreeExplainer


#### Convert SmartExplainer to SmartPredictor

- When you are satisfied with your results and the explainability provided by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to switch easily from a SmartExplainer to a SmartPredictor.
- SmartPredictor lets you not only understand your model's results but also produce those results automatically on new data.
- It makes new predictions and summarizes explainability as you configured it, to fit your operational needs.
- SmartPredictor keeps only the necessary attributes, which makes it lighter and more consistent than SmartExplainer in a deployment context.
- SmartPredictor can be used through an API or in batch mode.

In [14]:
predictor = xpl.to_smartpredictor()

## Save and Load your Predictor

You can easily save your SmartPredictor object to a pickle file and load it back.

#### Save your predictor to a pickle file

In [15]:
predictor.save('./predictor.pkl')

#### Load your predictor from a pickle file

In [17]:
from shapash.utils.load_smartpredictor import load_smartpredictor

In [18]:
predictor_load = load_smartpredictor('./predictor.pkl')
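
As a rough illustration of what saving to and loading from a pickle file involves, the sketch below round-trips a plain dictionary with Python's standard `pickle` module. The dictionary is a hypothetical stand-in for a predictor's state, not Shapash's actual file layout; with a real SmartPredictor, keep using `predictor.save` and `load_smartpredictor` as shown above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state a predictor might carry.
state = {"features": ["OverallQual", "GrLivArea"], "max_contrib": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictor_state.pkl")

    # Serialize the object to disk.
    with open(path, "wb") as handle:
        pickle.dump(state, handle)

    # Read it back, as a separate process or service would.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == state)  # True
```

Because unpickling can execute arbitrary code, only load pickle files that come from a source you trust.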

## Make a prediction with your Predictor

- To make new predictions and summarize the local explainability of your model on new datasets, use the add_input method of the SmartPredictor.
- The add_input method is the first step: it adds a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if they are specified.
- It applies the preprocessing specified at initialisation and reorders the features to match the order used by the model (see the documentation of this method).
- In API mode, this method can handle dictionary data, which can be received from a GET or a POST request.
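
In API mode, the steps above could be wrapped in a small handler like the sketch below. The function name `explain_payload` and the shape of the returned dictionary are hypothetical; the sketch only assumes that `add_input` accepts a dictionary, as stated above, and that `predict` and `summarize` return pandas DataFrames, as in the outputs shown in this tutorial.

```python
def explain_payload(predictor, payload):
    """Return the prediction and summarized explanation for one request.

    `predictor` is a loaded SmartPredictor; `payload` is a dictionary mapping
    feature names to values, e.g. the parsed body of a POST request.
    """
    # Register the new data: structure checks, preprocessing, feature ordering.
    predictor.add_input(x=payload)

    # Both calls are assumed to return a pandas DataFrame.
    prediction = predictor.predict()
    explanation = predictor.summarize()

    # Convert the DataFrames to lists of dictionaries, a convenient shape
    # for building an API response.
    return {
        "prediction": prediction.to_dict(orient="records"),
        "explanation": explanation.to_dict(orient="records"),
    }


# Usage, assuming `predictor_load` from the previous section and a hypothetical
# `payload` dictionary holding the features of one house:
# result = explain_payload(predictor_load, payload)
```

Keeping the handler free of any web framework makes it easy to reuse the same function in batch jobs or behind whichever API layer you deploy.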

#### Add data

In [19]:
predictor_load.add_input(x=X_df, ypred=y_df)

#### Make prediction

Use the predict method of the SmartPredictor to make predictions on the new data you added with add_input.

In [20]:
prediction = predictor_load.predict()

In [21]:
prediction.head()

| Id | ypred |
|---|---|
| 1 | 206462.878757 |
| 2 | 181533.722748 |
| 3 | 223035.016548 |
| 4 | 163889.216253 |
| 5 | 253226.682587 |


#### Get detailed explainability associated with the prediction

- You can use the detail_contributions method to see the detailed contribution of each feature for each row of your new dataset.
- For classification problems, it automatically associates the contributions with the predicted label.
- The predicted label can be computed automatically with the predict method, or you can supply a ypred in the add_input method.
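
For a classification model, each class has its own set of contributions, and the explanation of interest is the one for the label being predicted. The sketch below illustrates that selection in plain Python. It is not Shapash's implementation: the helper `select_contributions`, the class names, the feature values and the probabilities are all hypothetical.

```python
def select_contributions(contributions_by_label, probabilities, ypred=None):
    """Return (label, contributions) for the label being explained.

    If `ypred` is given it is used as is; otherwise the most probable label is used.
    """
    label = ypred if ypred is not None else max(probabilities, key=probabilities.get)
    return label, contributions_by_label[label]


# Hypothetical per-class contributions and class probabilities for one row.
contributions_by_label = {
    "cheap": {"OverallQual": -0.30, "GrLivArea": -0.10},
    "expensive": {"OverallQual": 0.30, "GrLivArea": 0.10},
}
probabilities = {"cheap": 0.2, "expensive": 0.8}

# No ypred supplied: the most probable label is explained.
print(select_contributions(contributions_by_label, probabilities))
# ('expensive', {'OverallQual': 0.3, 'GrLivArea': 0.1})

# ypred supplied: that label is explained instead.
print(select_contributions(contributions_by_label, probabilities, ypred="cheap"))
# ('cheap', {'OverallQual': -0.3, 'GrLivArea': -0.1})
```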

In [22]:
detailed_contributions = predictor_load.detail_contributions()

In [23]:
detailed_contributions.head()

| Id | ypred | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 206462.878757 | -1104.994176 | 1281.445856 | 0.0 | 375.679661 | 12.259902 | 157.224629 | -233.02542 | -738.445396 | -59.294761 | ... | -104.645827 | -351.621116 | 0.0 | -498.228775 | -5165.503476 | 0.0 | -944.040092 | 3870.961681 | 2219.313761 | 17.478037 |
| 2 | 181533.722748 | 2249.403962 | -655.861167 | 0.0 | 123.907278 | -9.270166 | 139.43186 | 2699.247506 | 5102.469936 | -84.771341 | ... | -153.842142 | -236.526862 | 0.0 | -705.112993 | 2988.981279 | 0.0 | 2090.785074 | 323.902986 | -3861.776078 | 424.382977 |
| 3 | 223035.016548 | -1426.795115 | -616.113112 | 0.0 | 369.536957 | 9.210944 | 199.213726 | 1032.288162 | -92.179454 | -93.16931 | ... | -91.178667 | -280.832451 | 0.0 | -324.734175 | -5338.340597 | 0.0 | -777.746743 | 3837.761102 | 2192.921648 | -98.965041 |
| 4 | 163889.216253 | -653.873832 | 121.459865 | 0.0 | 307.677892 | 9.720006 | 252.786934 | -530.156452 | -2987.649814 | -77.039912 | ... | -114.608224 | -338.435699 | 0.0 | -635.065828 | -6548.453864 | 0.0 | -974.50314 | -3386.36121 | -5232.537839 | 1633.763619 |
| 5 | 253226.682587 | -9531.577733 | -1097.620788 | 0.0 | -1574.988323 | 7.453569 | 130.470247 | 623.939546 | -2396.572526 | -92.929525 | ... | -481.118248 | -366.250007 | 0.0 | -4733.60306 | -4675.706762 | 0.0 | 165.653455 | 2334.652063 | 1355.358932 | -395.126541 |


#### Summarize explainability of the predictions

- You can use the summarize method to summarize your local explainability.
- This summary can be configured with the modify_mask method so that the explainability fits your operational needs.
- You can also specify:
  - a postprocessing, when you initialize your SmartPredictor, to apply a wording to several values of your dataset.
  - a label_dict to rename your labels in classification problems (during the initialisation of your SmartPredictor).
  - a features_dict to rename your features.

In [27]:
predictor_load.modify_mask(max_contrib=5)

In [28]:
explanation = predictor_load.summarize()

For example, here we chose to build a summary with only the 5 most contributive features for each row.

In [29]:
explanation.head()

|  | ypred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 | feature_4 | value_4 | contribution_4 | feature_5 | value_5 | contribution_5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 206462.878757 | OverallQual | 7 | 8248.82 | TotalBsmtSF | 856 | -5165.5 | YearBuilt | 2003 | 3870.96 | BsmtUnfSF | 150 | 3769.64 | GarageArea | 548 | 3107.92 |
| 2 | 181533.722748 | OverallQual | 6 | -14419.4 | GrLivArea | 1262 | -9238.07 | OverallCond | 8 | 6371.61 | BsmtFinSF1 | 978 | 5102.47 | Fireplaces | 1 | 4450.28 |
| 3 | 223035.016548 | GrLivArea | 1786 | 15880.4 | OverallQual | 7 | 9651.28 | GarageArea | 608 | 6259.46 | TotalBsmtSF | 920 | -5338.34 | YearBuilt | 2001 | 3837.76 |
| 4 | 163889.216253 | TotalBsmtSF | 756 | -6548.45 | YearRemodAdd | 1970 | -5232.54 | GarageArea | 642 | 4384.29 | OverallQual | 7 | 4330.48 | MSSubClass | 3 | -3676.29 |
| 5 | 253226.682587 | OverallQual | 8 | 55722.1 | GrLivArea | 2198 | 17176.5 | GarageArea | 836 | 14907.7 | 1stFlrSF | 1145 | -9531.58 | LotArea | 14260 | 8143.13 |
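
To see what `max_contrib=5` amounts to, the sketch below ranks contributions for the first row (Id 1) by absolute value and keeps the five largest. The numbers are copied from the outputs above (a subset of the features, rounded to two decimals). The function `top_contributions` is a hypothetical helper, not Shapash's implementation, and ranking by absolute value is an assumption that happens to reproduce the summary row shown.

```python
def top_contributions(contributions, max_contrib=5):
    """Return the `max_contrib` features with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:max_contrib]


# Contributions for Id 1, taken from the tables above (subset, rounded).
row_1 = {
    "1stFlrSF": -1104.99,
    "2ndFlrSF": 1281.45,
    "TotalBsmtSF": -5165.50,
    "YearBuilt": 3870.96,
    "YearRemodAdd": 2219.31,
    "OverallQual": 8248.82,
    "BsmtUnfSF": 3769.64,
    "GarageArea": 3107.92,
}

for feature, contribution in top_contributions(row_1):
    print(f"{feature}: {contribution:.2f}")
# OverallQual: 8248.82
# TotalBsmtSF: -5165.50
# YearBuilt: 3870.96
# BsmtUnfSF: 3769.64
# GarageArea: 3107.92
```

Both positive and negative contributions can appear in the summary, since a large negative contribution explains the prediction as much as a large positive one.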
