# Shapash model in production - Overview

<b>With this tutorial you:</b><br />
Understand how to create a Shapash SmartPredictor to make predictions and get explanations in production,
using a simple use case.<br />

A more detailed tutorial goes further to help you get started with the SmartPredictor object.

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- Compile Shapash SmartExplainer to SmartPredictor
- Save Shapash SmartPredictor Object in a pickle file
- Make a prediction

Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)

In [2]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

## Building a Supervised Model

In this section, we train a supervised Machine Learning model on the House Prices data.

In [3]:
from shapash.data.data_loader import data_loading
house_df, house_dict = data_loading('house_prices')

In [4]:
y_df=house_df['SalePrice'].to_frame()
X_df=house_df[house_df.columns.difference(['SalePrice'])]

#### Encoding Categorical Features 

We need to preprocess the data to handle categorical features before the training step.

In [5]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df=encoder.transform(X_df)
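To make the encoding step concrete, here is a minimal sketch of what ordinal encoding does, written with plain pandas on toy data. It illustrates the idea only and is not the implementation used by `category_encoders`; the `toy` DataFrame and its values are hypothetical, and the numbering (starting at 1, in order of first appearance) is an assumption made for the illustration.

```python
import pandas as pd

# Toy data (hypothetical values, not the House Prices dataset).
toy = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave"],
    "LotArea": [8450, 9600, 11250],
})

# Select the non-numeric columns, which play the role of categorical features.
categorical = list(toy.select_dtypes(exclude="number").columns)

encoded = toy.copy()
mappings = {}
for col in categorical:
    # Assign 1, 2, 3, ... to the categories in order of first appearance.
    categories = pd.unique(toy[col])
    mappings[col] = {cat: code for code, cat in enumerate(categories, start=1)}
    encoded[col] = toy[col].map(mappings[col])

print(mappings)
print(encoded)
```

Each text category is replaced by an integer, while numeric columns such as `LotArea` are left untouched.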



#### Train / Test Split

In [6]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)

#### Model Fitting

In [7]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain,ytrain)

In [8]:
y_pred = pd.DataFrame(regressor.predict(Xtest),columns=['pred'],index=Xtest.index)

## Understand my model with Shapash

- In this section, we use the SmartExplainer object from Shapash, which lets users understand how the model works on the specified dataset.
- This object should be used only during the data mining step. Shapash provides another object for deployment.
- This tutorial does not explore the possibilities of the SmartExplainer; other tutorials do, and you can check them.

#### Declare and Compile SmartExplainer 

In [9]:
from shapash.explainer.smart_explainer import SmartExplainer

In [10]:
xpl = SmartExplainer()

In [11]:
xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    y_pred=y_pred # Optional
)

Backend: Shap TreeExplainer


#### Compile SmartExplainer to SmartPredictor

- When you are satisfied with your results and the explainability given by Shapash, you can use the SmartPredictor object for deployment.
- In this section, we learn how to switch easily from a SmartExplainer to a SmartPredictor.
- SmartPredictor lets you not only understand the results of your models but also produce those results automatically on new data.
- It makes new predictions and summarizes the explainability you configured, so that it fits your operational needs.
- SmartPredictor keeps only the necessary attributes, which makes it lighter and more consistent than SmartExplainer in a deployment context.
- SmartPredictor can be used through an API or in batch mode.

In [12]:
predictor = xpl.to_smartpredictor()

## Save and Load your Predictor

You can easily save your SmartPredictor object to a pickle file and load it back.

#### Save your predictor in a Pickle File

In [13]:
predictor.save('./predictor.pkl')

#### Load your predictor from a Pickle File

In [14]:
from shapash.utils.load_smartpredictor import load_smartpredictor

In [15]:
predictor_load = load_smartpredictor('./predictor.pkl')
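For readers unfamiliar with pickle, the sketch below shows a generic save and load round trip using only the Python standard library. It is an illustration, not Shapash's own implementation: the helpers `save_object` and `load_object` are hypothetical names, and a plain dictionary stands in for the predictor.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Serialize any picklable object to a file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_object(path):
    """Load an object back from a pickle file."""
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with open(path, "rb") as handle:
        return pickle.load(handle)


# A plain dict stands in for the predictor (hypothetical content).
stand_in = {"features": ["OverallQual", "GrLivArea"], "max_contrib": 5}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "predictor.pkl"
    save_object(stand_in, path)
    restored = load_object(path)

print(restored == stand_in)
```

In a deployment context, the practical consequence is that the pickle file should come from a trusted source and be loaded with library versions compatible with the ones used to save it.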

## Make a prediction with your Predictor

- To make new predictions and summarize the local explainability of your model on new datasets, use the add_input method of the SmartPredictor.
- The add_input method is the first step: it adds a dataset for prediction and explainability.
- It checks the structure of the dataset, and of the prediction and the contributions if they are specified.
- It applies the preprocessing specified at initialisation and reorders the features to match the order used by the model (see the documentation of this method).
- In API mode, this method can handle data passed as dictionaries, such as data received from a GET or a POST request.
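As a rough sketch of the API mode described above, the function below turns a JSON request body into a prediction and its summary. The function name `explain_request`, the flat payload layout (feature name mapped to value), and the shape of the returned dictionary are assumptions made for illustration; only `add_input`, `data["ypred"]` and `summarize` come from this tutorial. `ypred` is deliberately left unset so that the predictor computes it.

```python
import json


def explain_request(predictor, body: str) -> dict:
    """Turn a JSON request body into a prediction and its local explanation.

    `predictor` is expected to behave like a loaded SmartPredictor.
    """
    # Parse the request body into a plain dict (assumed: feature name -> value).
    payload = json.loads(body)

    # add_input checks the data, applies the preprocessing and reorders features.
    # ypred is not passed, so the predictor is expected to compute it.
    predictor.add_input(x=payload)

    prediction = predictor.data["ypred"]
    summary = predictor.summarize()

    # Convert the DataFrames to lists of records, which are easy to serialize.
    return {
        "prediction": prediction.to_dict(orient="records"),
        "explanation": summary.to_dict(orient="records"),
    }
```

A web framework handler would call something like `explain_request(predictor_load, request_body)` and serialize the result. Keeping the function independent of any framework makes it easy to test with a stand-in predictor.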

#### Add data

In [16]:
predictor_load.add_input(x=X_df, ypred=y_df)

#### Make prediction

We can check that ypred is the one passed to the add_input method by inspecting the attribute data["ypred"]. If ypred is not specified, the method computes it automatically.

In [17]:
predictor_load.data["ypred"]

.. table:: 

    +---------+
    |SalePrice|
    +=========+
    |   208500|
    +---------+
    |   181500|
    +---------+
    |   223500|
    +---------+
    |   140000|
    +---------+
    |   250000|
    +---------+


#### Get detailed explainability associated to the prediction

- Use the detail_contributions method to see the detailed contribution of each feature for each row of your new dataset.
- For classification problems, it automatically associates the contributions with the predicted label.
- The predicted label can be computed automatically with the predict method, or you can specify a ypred in the add_input method.
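For the classification case, the idea of associating contributions with the predicted label can be sketched as follows. This is an illustration with toy data, not Shapash's implementation: the function name `contributions_for_predicted_label`, the labels, and all values are hypothetical, and it assumes one contributions table per label.

```python
import pandas as pd


def contributions_for_predicted_label(contributions_by_label, predicted):
    """Pick, for each row, the contributions computed for its predicted label."""
    rows = [
        contributions_by_label[label].loc[idx]
        for idx, label in predicted.items()
    ]
    # Each selected row keeps its original index as its name.
    return pd.DataFrame(rows)


# Toy per-label contributions (hypothetical values).
contributions_by_label = {
    "cheap": pd.DataFrame(
        {"OverallQual": [0.4, -0.3], "GrLivArea": [0.1, -0.2]}, index=[10, 11]
    ),
    "expensive": pd.DataFrame(
        {"OverallQual": [-0.4, 0.3], "GrLivArea": [-0.1, 0.2]}, index=[10, 11]
    ),
}
# Predicted label for each row.
predicted = pd.Series(["cheap", "expensive"], index=[10, 11])

selected = contributions_for_predicted_label(contributions_by_label, predicted)
print(selected)
```

Row 10 takes its contributions from the "cheap" table and row 11 from the "expensive" table, so each row is explained with respect to the label that was actually predicted for it.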

In [18]:
detailed_contributions = predictor_load.detail_contributions()

In [20]:
detailed_contributions.head()

.. table:: 

    +---------+--------+--------+---------+------------+--------+--------+------------+----------+----------+------------+------------+------------+------------+--------+---------+----------+----------+----------+----------+-------------+---------+---------+-----------+-----------+----------+----------+--------+----------+----------+----------+------------+----------+----------+-----------+---------+--------+-------+---------+----------+------------+-----------+-----------+---------+-------+---------+--------+------------+----------+--------+----------+----------+-------+-------+------------+-----------+-----------+-----------+----------+--------+--------+---------+-------------+--------+-----------+------+------------+-----------+---------+----------+---------+------------+-------+
    |SalePrice|1stFlrSF|2ndFlrSF|3SsnPorch|BedroomAbvGr|BldgType|BsmtCond|BsmtExposure|BsmtFinSF1|BsmtFinSF2|BsmtFinType1|BsmtFinType2|BsmtFullBath|BsmtHalfBath|BsmtQual|BsmtUnfSF|CentralAir|Co

#### Summarize explainability of the predictions

- Use the summarize method to summarize your local explainability.
- This summary can be configured with the modify_mask method so that the explainability satisfies your operational needs.
- You can also specify:
>- a postprocessing when you initialize your SmartPredictor, to apply a wording to several values of your dataset.
>- a label_dict to rename your labels in classification problems (when initializing your SmartPredictor).
>- a features_dict to rename your features.

In [21]:
predictor_load.modify_mask(max_contrib=5)

In [22]:
explanation = predictor_load.summarize()

For example, here we chose to build a summary with only the 5 most contributive features of the dataset.
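To make "most contributive" concrete, the sketch below keeps, for each row, the k features with the largest absolute contribution. It assumes that features are ranked by absolute contribution, which is consistent with the ordering visible in the summary output of this tutorial, but it is not Shapash's implementation; the function name `top_contributions` and the toy values are hypothetical.

```python
import pandas as pd


def top_contributions(contributions: pd.DataFrame, k: int) -> pd.DataFrame:
    """For each row, keep the k features with the largest absolute contribution."""
    records = []
    for _, row in contributions.iterrows():
        # Rank features by magnitude while keeping the sign of each contribution.
        order = row.abs().sort_values(ascending=False).index
        ranked = row.reindex(order).head(k)
        record = {}
        for rank, (feature, value) in enumerate(ranked.items(), start=1):
            record[f"feature_{rank}"] = feature
            record[f"contribution_{rank}"] = value
        records.append(record)
    return pd.DataFrame(records, index=contributions.index)


# Toy contributions (hypothetical values, not produced by the model above).
toy = pd.DataFrame(
    {
        "OverallQual": [8000.0, -14000.0],
        "GrLivArea": [1200.0, -10000.0],
        "YearBuilt": [3900.0, 500.0],
    },
    index=[1, 2],
)

print(top_contributions(toy, k=2))
```

A negative contribution can rank above a positive one, because the ranking uses the magnitude of the effect rather than its direction.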

In [23]:
explanation.head()

.. table:: 

    +---------+-----------+-------+--------------+-----------+-------+--------------+-----------+-------+--------------+------------+-------+--------------+------------+-------+--------------+
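    |SalePrice| feature_1 |value_1|contribution_1| feature_2 |value_2|contribution_2| feature_3 |value_3|contribution_3| feature_4  |value_4|contribution_4| feature_5  |value_5|contribution_5|
    +=========+===========+=======+==============+===========+=======+==============+===========+=======+==============+============+=======+==============+============+=======+==============+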
    |   208500|OverallQual|      7|        8248.8|TotalBsmtSF|    856|       -5165.5|YearBuilt  |   2003|        3871.0|BsmtUnfSF   |    150|        3769.6|GarageArea  |    548|        3107.9|
    +---------+-----------+-------+--------------+-----------+-------+--------------+-----------+-------+--------------+------------+-------+--------------+------------+-------+--------------+
    |   181500|OverallQual|      6|      -14555.9|GrLivArea  |   1262|      -10016.3|OverallCond|      8|        6899.3|BsmtFinSF1  |    978|        5781.7|YearRemodAdd|   1976|       -4310.0|
    +---------+-------