# From model training to deployment - an introduction to the SmartPredictor object

Shapash provides a SmartPredictor object to make predictions and compute local explainability for operational needs in a deployment context.
It gives a simple, synthetic explanation of your model's prediction results. <br />
SmartPredictor lets users configure the summary as they wish. <br />
It is an object dedicated to deployment, lighter than the SmartExplainer object and with additional consistency checks. <br />
SmartPredictor can be used with an API or in batch mode. <br />

In this tutorial, we go further to help you get started with the SmartPredictor object of Shapash.

Contents:
- Build a SmartPredictor
- Save and load a SmartPredictor
- Add input
- Use label and wording
- Summarize explanation

We use Kaggle's [Titanic](https://www.kaggle.com/c/titanic) dataset.

## Step 1: Exploration and training of the model

### Import Dataset

First, we need to import a dataset. Here we use the well-known Titanic dataset from Kaggle.

In [1]:
import numpy as np
import pandas as pd
from category_encoders import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import shap

In [2]:
from shapash.explainer.smart_explainer import SmartExplainer
from shapash.explainer.smart_predictor import SmartPredictor
from shapash.utils.load_smartpredictor import load_smartpredictor
from shapash.data.data_loader import data_loading

In [3]:
titan_df, titan_dict = data_loading('titanic')
del titan_df['Name']

In [4]:
titan_df.head()

| PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | Title |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | Third class | male | 22.0 | 1 | 0 | 7.25 | Southampton | Mr |
| 2 | 1 | First class | female | 38.0 | 1 | 0 | 71.28 | Cherbourg | Mrs |
| 3 | 1 | Third class | female | 26.0 | 0 | 0 | 7.92 | Southampton | Miss |
| 4 | 1 | First class | female | 35.0 | 1 | 0 | 53.1 | Southampton | Mrs |
| 5 | 0 | Third class | male | 35.0 | 0 | 0 | 8.05 | Southampton | Mr |


### Create Classification Model

In this section, we train a supervised machine learning model on our data. In this example, we face a classification problem.

In [5]:
y = titan_df['Survived']
X = titan_df.drop('Survived', axis=1)

In [6]:
varcat=['Pclass','Sex','Embarked','Title']

#### Encoding Categorical Features 

We need to preprocess our data to handle categorical features before the training step.

In [7]:
categ_encoding = OrdinalEncoder(cols=varcat,
                                handle_unknown='ignore',
                                return_df=True).fit(X)
X = categ_encoding.transform(X)
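
To make the encoding step more concrete, here is a minimal pure-Python sketch of what ordinal encoding does to one categorical column. The helpers `fit_ordinal` and `transform_ordinal` are hypothetical and are not part of category_encoders; the code `-1` for unseen categories is an assumption chosen for illustration, not a statement about how `handle_unknown='ignore'` behaves.

```python
# Minimal sketch of ordinal encoding (hypothetical helpers, not the
# category_encoders implementation).

def fit_ordinal(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # codes start at 1
    return mapping


def transform_ordinal(values, mapping, unknown=-1):
    """Replace each category by its code; unseen categories get `unknown` (an assumption)."""
    return [mapping.get(value, unknown) for value in values]


sex_column = ["male", "female", "female", "female", "male"]
sex_mapping = fit_ordinal(sex_column)
encoded = transform_ordinal(sex_column + ["unknown"], sex_mapping)

print(sex_mapping)  # {'male': 1, 'female': 2}
print(encoded)      # [1, 2, 2, 2, 1, -1]
```

The important point for the rest of the tutorial is that the mapping is learned once and then reused, so new data seen in deployment is encoded the same way as the training data.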

#### Train Test split + Random Forest fit

In [8]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.75, random_state=1)

rf = RandomForestClassifier(n_estimators=100,min_samples_leaf=3)
rf.fit(Xtrain, ytrain)

RandomForestClassifier(min_samples_leaf=3)

In [9]:
ypred=pd.DataFrame(rf.predict(Xtest),columns=['pred'],index=Xtest.index)

### Explore your trained model results with SmartExplainer

Once the training step is done, we can initialize our SmartExplainer object.

In [10]:
from shapash.explainer.smart_explainer import SmartExplainer

SmartExplainer takes only the dictionaries it needs to describe the model features.

#### Use Label and Wording

Here, we use labels and wording to get more understandable explainability.
- features_dict: lets users rename the features of their dataset
- label_dict: lets users rename the predicted labels in classification problems
- postprocessing: lets users apply wording to the values of selected features

In [11]:
feature_dict = {'Pclass': 'Ticket class',
 'Sex': 'Sex',
 'Age': 'Age',
 'SibSp': 'Relatives such as brother or wife',
 'Parch': 'Relatives like children or parents',
 'Fare': 'Passenger fare',
 'Embarked': 'Port of embarkation',
 'Title': 'Title of passenger'}

In [12]:
label_dict = {0: "Not Survived",1: "Survived"}

In [13]:
postprocessing = {"Pclass": {'type': 'transcoding', 'rule': { 'First class' : '1st class', 'Second class' : '2nd class', "Third class" : "3rd class"}}}
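
The combined effect of these three dictionaries can be sketched in plain Python. The function `apply_wording` below is hypothetical and only illustrates the idea of renaming features, labels, and values; it is not how Shapash applies them internally, and the sample record is made up for the example.

```python
# Illustrative sketch only: apply_wording is a hypothetical helper.
features_dict = {"Pclass": "Ticket class", "Title": "Title of passenger"}
label_dict = {0: "Not Survived", 1: "Survived"}
postprocessing = {
    "Pclass": {
        "type": "transcoding",
        "rule": {"First class": "1st class",
                 "Second class": "2nd class",
                 "Third class": "3rd class"},
    }
}


def apply_wording(record, prediction):
    """Return the renamed label and a record with renamed features and values."""
    worded = {}
    for feature, value in record.items():
        step = postprocessing.get(feature)
        if step is not None and step["type"] == "transcoding":
            # Values without a rule are left unchanged.
            value = step["rule"].get(value, value)
        # Features without an entry in features_dict keep their name.
        worded[features_dict.get(feature, feature)] = value
    return label_dict.get(prediction, prediction), worded


label, worded = apply_wording(
    {"Pclass": "First class", "Title": "Mrs", "Sex": "female"}, 1
)
print(label)   # Survived
print(worded)  # {'Ticket class': '1st class', 'Title of passenger': 'Mrs', 'Sex': 'female'}
```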

#### Define a SmartExplainer

Initialize our SmartExplainer object with the wording defined above.

In [14]:
xpl = SmartExplainer(label_dict = label_dict, features_dict=feature_dict)

Then, we need to use the compile method of the SmartExplainer object.<br /> This method is the first step to understand the model and its predictions. It sorts the contributions, reverses the preprocessing steps, and performs all the calculations necessary for
a quick display of plots and an efficient summary of the explanation (see the documentation on the SmartExplainer object and the associated tutorials to go further).

In [15]:
xpl.compile(
    x=Xtest,
    model=rf,
    preprocessing=categ_encoding,
    y_pred=ypred,
    postprocessing = postprocessing
)

Backend: Shap TreeExplainer
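
The "reverse preprocessing" mentioned above is what lets summaries show values such as `female` instead of an integer code. A minimal sketch of the idea follows, assuming hypothetical ordinal mappings; the integer codes and the helper `inverse_ordinal` are illustrative and are not taken from the fitted encoder or from Shapash.

```python
# Sketch of reversing an ordinal encoding (hypothetical codes and helper).
mappings = {
    "Sex": {"male": 1, "female": 2},
    "Pclass": {"Third class": 1, "First class": 2, "Second class": 3},
}


def inverse_ordinal(encoded_row, mappings):
    """Turn integer codes back into their original categories."""
    decoded = {}
    for feature, value in encoded_row.items():
        mapping = mappings.get(feature)
        if mapping is None:
            decoded[feature] = value  # numeric feature, nothing to reverse
        else:
            inverse = {code: category for category, code in mapping.items()}
            decoded[feature] = inverse.get(value, value)
    return decoded


decoded = inverse_ordinal({"Pclass": 2, "Sex": 2, "Age": 38.0}, mappings)
print(decoded)  # {'Pclass': 'First class', 'Sex': 'female', 'Age': 38.0}
```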


#### Understand results of your trained model with SmartExplainer

Then, we can easily get a first summary of the explanation of the model results.
- Here, we chose to get the 3 most contributive features for each prediction.
- We used wording to make feature names more understandable in an operational setting.
- We renamed the predicted labels with more interpretable ones.
- We chose to apply postprocessing to some values of our features.

In [16]:
xpl.to_pandas(max_contrib=3).head()

|  | pred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 863 | Survived | Sex | female | 0.198753 | Title of passenger | Mrs | 0.168618 | Ticket class | 1st class | 0.136428 |
| 224 | Not Survived | Title of passenger | Mr | 0.0906221 | Sex | male | 0.0774233 | Passenger fare | 7.9 | 0.0654413 |
| 85 | Survived | Title of passenger | Miss | 0.201242 | Sex | female | 0.182428 | Ticket class | 2nd class | 0.0914471 |
| 681 | Survived | Title of passenger | Miss | 0.189312 | Sex | female | 0.154496 | Port of embarkation | Queenstown | 0.115719 |
| 536 | Survived | Title of passenger | Miss | 0.173563 | Ticket class | 2nd class | 0.145691 | Sex | female | 0.110403 |


## Step 2: SmartPredictor in production

- to_smartpredictor() is a method that creates a SmartPredictor object.
- It allows users to switch from a SmartExplainer used for data mining to a SmartPredictor.
- SmartPredictor takes only the necessary attributes, which makes it lighter and more consistent than SmartExplainer in a deployment context.
- The SmartPredictor object is specific to deployment.
- In this section, we will learn how to initialize a SmartPredictor.
- SmartPredictor lets you not only understand the results of your models but also produce those results on new data automatically.
- It makes new predictions and summarizes explainability as you configured it, to fit your operational needs.
- SmartPredictor can be used with an API or in batch mode.
- It handles both DataFrames and dictionaries as input data.

### Switch from SmartExplainer Object to SmartPredictor Object

In [17]:
predictor = xpl.to_smartpredictor()

#### Save your predictor to a pickle file

In [18]:
predictor.save('./predictor.pkl')

#### Load your predictor from a pickle file

In [19]:
predictor_load = load_smartpredictor('./predictor.pkl')
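
For readers unfamiliar with pickle files, the round trip can be sketched with the standard library alone. The dictionary `state` below is a hypothetical stand-in for the information a deployment object carries; it does not reflect the actual attributes of a SmartPredictor.

```python
import os
import pickle
import tempfile

# Hypothetical, simplified state standing in for a deployment object.
state = {
    "label_dict": {0: "Not Survived", 1: "Survived"},
    "features_dict": {"Pclass": "Ticket class"},
    "mask_params": {"max_contrib": 3},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictor.pkl")
    # Serialize the object to disk ...
    with open(path, "wb") as handle:
        pickle.dump(state, handle)
    # ... and read it back, for example in another process or service.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == state)  # True
```

Since unpickling can execute arbitrary code, only load pickle files that come from a source you trust.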

### Make a prediction with your SmartPredictor

- Once our SmartPredictor has been initialized, we can easily apply predictions and summaries to new datasets.
- First, we have to specify a new dataset, which can be a pandas.DataFrame or a dictionary (useful when you decide to use an API in your deployment process).
- We will use the add_input method of the SmartPredictor (see the documentation for this method).

#### Add data

In [20]:
person_x = {'Pclass': 'First class',
 'Sex': 'female',
 'Age': 36,
 'SibSp': 1,
 'Parch': 0,
 'Fare': 7.25,
 'Embarked': 'Cherbourg',
 'Title': 'Miss'}

In [21]:
predictor_load.add_input(x=person_x)

If you don't specify a ypred in the add_input method, SmartPredictor uses its predict method to automatically assign the predicted value to ypred.
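
When the input arrives as a dictionary, for instance from an API payload, it is worth validating it before prediction. The sketch below shows one kind of consistency check one might run on such a payload; `check_input` is a hypothetical helper and does not describe the checks that SmartPredictor actually performs.

```python
# Hypothetical consistency check on a dictionary input.
EXPECTED_COLUMNS = ["Pclass", "Sex", "Age", "SibSp",
                    "Parch", "Fare", "Embarked", "Title"]


def check_input(record, expected_columns):
    """Reject records with missing or unexpected features, then fix the column order."""
    missing = [col for col in expected_columns if col not in record]
    extra = [col for col in record if col not in expected_columns]
    if missing or extra:
        raise ValueError(f"missing features: {missing}, unexpected features: {extra}")
    return {col: record[col] for col in expected_columns}


payload = {"Title": "Miss", "Sex": "female", "Age": 36, "SibSp": 1,
           "Parch": 0, "Fare": 7.25, "Embarked": "Cherbourg",
           "Pclass": "First class"}
row = check_input(payload, EXPECTED_COLUMNS)
print(list(row))  # keys come back in the order the model expects
```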

#### Make prediction

We can then check the attribute data["ypred"] to see that ypred was computed automatically in the add_input method, using our trained model and the new dataset.

In [22]:
predictor_load.data["ypred"]

|  | ypred | proba |
|---|---|---|
| 0 | Survived | 0.702935 |


We can also use the predict_proba method of the SmartPredictor to automatically compute the probabilities associated with each possible label, given our model and the new dataset.

In [23]:
prediction_proba = predictor_load.predict_proba()

In [24]:
prediction_proba

|  | class_0 | class_1 |
|---|---|---|
| 0 | 0.297065 | 0.702935 |
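
The link between these probabilities and the `ypred` and `proba` columns shown earlier can be illustrated in a few lines: the predicted label is the class with the highest probability, and `proba` is that probability. This is a sketch of the relationship visible in the outputs above, using the values from this example, and not a description of Shapash internals.

```python
label_dict = {0: "Not Survived", 1: "Survived"}
probabilities = {0: 0.297065, 1: 0.702935}  # class_0 and class_1 from the table above

# The predicted class is the one with the highest probability.
predicted_class = max(probabilities, key=probabilities.get)

print(label_dict[predicted_class])     # Survived
print(probabilities[predicted_class])  # 0.702935
```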


### Get detailed explainability associated with the prediction

- You can use the detail_contributions method to see the detailed contributions of each feature for each row of your new dataset.
- For classification problems, it automatically associates contributions with the right predicted label (as you can see below).
- The predicted label can be computed automatically with the predict method, or you can specify a ypred in the add_input method.

In [25]:
detailed_contributions = predictor_load.detail_contributions()

Notice that ypred has already been renamed with the value we gave in label_dict.

In [26]:
detailed_contributions

|  | ypred | proba | Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | Title |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Survived | 0.702935 | 0.0987605 | 0.160839 | -0.0124442 | 0.00474909 | -0.0083478 | -0.103306 | 0.0186102 | 0.174 |


### Summarize explainability of the predictions

- You can use the summarize method to summarize your local explainability.
- This summary can be configured with the modify_mask method so that the explainability satisfies your operational needs.
- You can also specify:
>- a postprocessing when you initialize your SmartPredictor, to apply wording to several values of your dataset.
>- a label_dict to rename your labels in classification problems (during the initialization of your SmartPredictor).
>- a features_dict to rename your features.

Here, we use the modify_mask method to keep only the 3 most contributive features in our explainability.

In [27]:
predictor_load.modify_mask(max_contrib=3)

In [28]:
explanation = predictor_load.summarize()

- Notice in the summary that the mapping dictionary given to the SmartExplainer object renames the 'Title' feature to 'Title of passenger'.
- Feature values are also reworded as configured: First class would be displayed as 1st class whenever Ticket class is among the displayed features.
- Our explainability is focused on the 3 most contributive features.

In [29]:
explanation

|  | ypred | proba | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Survived | 0.702935 | Title of passenger | Miss | 0.174 | Sex | female | 0.160839 | Passenger fare | 7.25 | -0.103306 |
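
The selection made by `max_contrib=3` can be reproduced by hand from the detailed contributions shown earlier. The sketch below assumes that features are ranked by the absolute value of their contribution, which matches the output above; `top_contributions` is a hypothetical helper, not a Shapash function.

```python
# Contributions for the predicted class, copied from the detail_contributions output.
contributions = {
    "Pclass": 0.0987605, "Sex": 0.160839, "Age": -0.0124442,
    "SibSp": 0.00474909, "Parch": -0.0083478, "Fare": -0.103306,
    "Embarked": 0.0186102, "Title": 0.174,
}


def top_contributions(contributions, max_contrib):
    """Keep the max_contrib features with the largest absolute contribution."""
    ranked = sorted(contributions.items(),
                    key=lambda item: abs(item[1]), reverse=True)
    return ranked[:max_contrib]


print(top_contributions(contributions, 3))
# [('Title', 0.174), ('Sex', 0.160839), ('Fare', -0.103306)]
```

Ranking on the absolute value explains why Fare, with a negative contribution, appears ahead of Pclass.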


### Configure your summary easily

#### If the contributions wanted are those associated with class 0 (more useful in multiclass classification)


You can easily change the ypred or the x given to add_input to make a new prediction and a new summary of your explainability.

You can specify a ypred to get the explainability for the label of your choice instead of the predicted one.

In [30]:
# Build the one-row ypred from a list rather than a set, which recent pandas versions reject
predictor_load.add_input(x=person_x, ypred=pd.DataFrame([0]))

In [31]:
predictor_load.modify_mask(max_contrib=3)

In [32]:
explanation = predictor_load.summarize()

Here, we changed ypred from the predicted label 1 to 0, which automatically gives us the feature contributions associated with the chosen label.

In [33]:
explanation

|  | 0 | proba | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Not Survived | 0.297065 | Title of passenger | Miss | -0.174 | Sex | female | -0.160839 | Passenger fare | 7.25 | 0.103306 |
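
Comparing this output with the earlier summary for the label Survived shows a simple pattern in this binary example: each contribution changes sign and the two probabilities sum to one. The sketch below only checks that pattern on the values displayed in this tutorial; it is an observation about these outputs, not a description of how Shapash computes them.

```python
# Top-3 contributions and probability for class 1, from the earlier summary.
class_1 = {"Title of passenger": 0.174, "Sex": 0.160839, "Passenger fare": -0.103306}
proba_1 = 0.702935

# For the other class of a binary problem, flip the sign and take the complement.
class_0 = {feature: -value for feature, value in class_1.items()}
proba_0 = 1 - proba_1

print(class_0)
# {'Title of passenger': -0.174, 'Sex': -0.160839, 'Passenger fare': 0.103306}
print(round(proba_0, 6))  # 0.297065
```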


#### If users want to hide one feature and display only positive contributions

- The modify_mask method lets us configure the explainability to satisfy our needs in an operational process.
- Here, we choose to hide some features from our explainability and keep only those with positive contributions.

In [34]:
predictor_load.modify_mask(features_to_hide=["Fare"], positive=True)

In [35]:
explanation = predictor_load.summarize()

In [36]:
explanation

|  | 0 | proba | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 |
|---|---|---|---|---|---|---|---|---|
| 0 | Not Survived | 0.297065 | Age | 36 | 0.0124442 | Relatives like children or parents | 0 | 0.0083478 |


#### If users want to display only contributions with a minimum impact

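Here, we chose to show only the features whose contribution is greater than 0.01. In the output below, Fare is still hidden and only positive contributions appear, which suggests that the settings from the previous modify_mask call remain in effect.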

In [37]:
predictor_load.modify_mask(threshold=0.01)

In [38]:
explanation = predictor_load.summarize()

In [39]:
explanation

|  | 0 | proba | feature_1 | value_1 | contribution_1 |
|---|---|---|---|---|---|
| 0 | Not Survived | 0.297065 | Age | 36 | 0.0124442 |
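
The last two summaries can be reproduced with a small filtering function. This is a minimal sketch under stated assumptions: `apply_mask` is a hypothetical helper, the threshold is assumed to apply to the absolute value of a contribution, and the class 0 contributions are assumed to be the class 1 contributions with the sign flipped, which is consistent with the outputs shown in this tutorial.

```python
# Assumed class 0 contributions (sign-flipped class 1 values from detail_contributions).
class_0 = {
    "Pclass": -0.0987605, "Sex": -0.160839, "Age": 0.0124442,
    "SibSp": -0.00474909, "Parch": 0.0083478, "Fare": 0.103306,
    "Embarked": -0.0186102, "Title": -0.174,
}


def apply_mask(contributions, features_to_hide=(), positive=None, threshold=None):
    """Filter contributions, then sort them by decreasing absolute value."""
    kept = []
    for feature, value in contributions.items():
        if feature in features_to_hide:
            continue  # explicitly hidden feature
        if positive is True and value <= 0:
            continue  # keep only positive contributions
        if positive is False and value >= 0:
            continue  # keep only negative contributions
        if threshold is not None and abs(value) < threshold:
            continue  # contribution too small to display
        kept.append((feature, value))
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


step_1 = apply_mask(class_0, features_to_hide=["Fare"], positive=True)
step_2 = apply_mask(class_0, features_to_hide=["Fare"], positive=True, threshold=0.01)

print(step_1)  # [('Age', 0.0124442), ('Parch', 0.0083478)]
print(step_2)  # [('Age', 0.0124442)]
```

Passing all three settings in the second call mirrors the observed output, where the earlier restrictions still apply once the threshold is added.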
