# Shapash in Jupyter - Overview

<b>In this tutorial you will:</b><br />
Learn how Shapash works in a Jupyter Notebook
through a simple use case<br />

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- Convert Shapash SmartExplainer to SmartPredictor
- Save the Shapash SmartPredictor object to a pickle file
- Make a prediction

Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)

In [1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

## Building a Supervised Model

In [2]:
import sys
# Only needed when running from a local checkout of shapash rather than an installed package
sys.path.insert(0,'/home/78257d/shapash/')
from shapash.explainer.smart_predictor import SmartPredictor
from shapash.explainer.smart_explainer import SmartExplainer
from shapash.data.data_loader import data_loading
from shapash.utils.load_smartpredictor import load_smartpredictor
# Load the house prices data and the dictionary of feature labels
house_df, house_dict = data_loading('house_prices')

In [3]:
y_df=house_df['SalePrice'].to_frame()
X_df=house_df[house_df.columns.difference(['SalePrice'])]

In [4]:
house_df.head()

| Id | MSSubClass | MSZoning | LotArea | Street | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | ... | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2-Story 1946 & Newer | Residential Low Density | 8450 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | Warranty Deed - Conventional | Normal Sale | 208500 |
| 2 | 1-Story 1946 & Newer All Styles | Residential Low Density | 9600 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Veenker | ... | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | Warranty Deed - Conventional | Normal Sale | 181500 |
| 3 | 2-Story 1946 & Newer | Residential Low Density | 11250 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | Warranty Deed - Conventional | Normal Sale | 223500 |
| 4 | 2-Story 1945 & Older | Residential Low Density | 9550 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Corner lot | Gentle slope | Crawford | ... | 272 | 0 | 0 | 0 | 0 | 2 | 2006 | Warranty Deed - Conventional | Abnormal Sale | 140000 |
| 5 | 2-Story 1946 & Newer | Residential Low Density | 14260 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Northridge | ... | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | Warranty Deed - Conventional | Normal Sale | 250000 |


#### Encoding Categorical Features 

In [5]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df=encoder.transform(X_df)
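
As a rough illustration of what ordinal encoding does, the sketch below maps each category of one column to an integer and back, in plain Python. It is not the `category_encoders` implementation: the helper names `fit_ordinal`, `transform`, and `inverse_transform` are hypothetical, and sending unseen categories to 0 is an assumption made for this example. The sample values are the `LotShape` entries of the five rows displayed above.

```python
def fit_ordinal(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def transform(values, mapping, unknown=0):
    """Replace each category by its integer code (assumed: unseen -> `unknown`)."""
    return [mapping.get(value, unknown) for value in values]


def inverse_transform(codes, mapping):
    """Map integer codes back to the original categories."""
    reverse = {code: category for category, code in mapping.items()}
    return [reverse.get(code) for code in codes]


lot_shape = [
    "Regular",
    "Regular",
    "Slightly irregular",
    "Slightly irregular",
    "Slightly irregular",
]
mapping = fit_ordinal(lot_shape)
codes = transform(lot_shape, mapping)
decoded = inverse_transform(codes, mapping)
```

Keeping the mapping around is what lets a later step turn encoded values back into readable categories, which is the role the fitted `encoder` plays when it is passed as `preprocessing`.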

#### Train / Test Split

In [6]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)
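
To make the split concrete, here is a minimal sketch of a seeded 75/25 split over row positions using only the standard library. The function name `split_indices` is hypothetical, and the shuffle is not scikit-learn's algorithm, so it will not select the same rows as `train_test_split` even with the same seed.

```python
import random


def split_indices(n_rows, train_size=0.75, seed=1):
    """Shuffle row positions with a fixed seed, then cut at the train fraction."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)
    n_train = int(n_rows * train_size)
    return positions[:n_train], positions[n_train:]


# With 8 rows, 6 positions go to training and 2 to testing
train_idx, test_idx = split_indices(8)
```

Fixing the seed makes the split reproducible, which is the purpose of `random_state=1` in the cell above.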

#### Model Fitting

In [7]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain,ytrain)

In [8]:
y_pred = pd.DataFrame(regressor.predict(Xtest),columns=['pred'],index=Xtest.index)

## Understand the Model with Shapash

#### Declare and Compile SmartExplainer 

In [9]:
from shapash.explainer.smart_explainer import SmartExplainer

In [10]:
xpl = SmartExplainer(features_dict=house_dict) # Optional parameter: dict mapping feature names to readable labels

In [11]:
xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder, # Optional: compile step can use inverse_transform method
    y_pred=y_pred # Optional
)

Backend: Shap TreeExplainer


#### Convert SmartExplainer to SmartPredictor

In [12]:
predictor = xpl.to_smartpredictor()

## Save and Load your Predictor

#### Save your predictor to a pickle file

In [13]:
predictor.save('./predictor.pkl')

#### Load your predictor from a pickle file

In [15]:
predictor_load = load_smartpredictor('./predictor.pkl')
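
The save and load steps rely on pickle serialization, as the headings indicate. The sketch below shows a plain pickle round trip with the standard library. The dictionary `state` is a hypothetical stand-in for the predictor's contents, not the actual structure of a SmartPredictor, and the temporary directory is used only so the example leaves no file behind.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state a predictor object might carry
state = {
    "features": ["LotArea", "YrSold"],
    "mask_params": {"max_contrib": 10},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "predictor.pkl")

    # Save: serialize the object to a binary file
    with open(path, "wb") as handle:
        pickle.dump(state, handle)

    # Load: rebuild an equal but independent object from the file
    with open(path, "rb") as handle:
        restored = pickle.load(handle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.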

## Make a prediction with your Predictor

#### Add data

In [19]:
predictor_load.add_input(x=X_df, ypred=y_df)

#### Make prediction

In [24]:
prediction = predictor_load.predict()

In [25]:
prediction.head()

| Id | ypred |
|---|---|
| 1 | 206462.878757 |
| 2 | 181127.963794 |
| 3 | 221478.052244 |
| 4 | 184788.423141 |
| 5 | 256637.518234 |


#### Get detailed explainability associated with the prediction

In [None]:
detailed_contributions = predictor_load.detail_contributions()

In [None]:
detailed_contributions.head()
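
The compile step reports SHAP's TreeExplainer as the backend. SHAP contributions are additive: for one row, a base value plus the sum of the per-feature contributions gives the model output. The sketch below illustrates that relation, assuming the detailed contributions are such SHAP values. The function name `reconstruct_prediction`, the base value, the feature names, and the contribution values are all hypothetical.

```python
import math


def reconstruct_prediction(base_value, contributions):
    """Add per-feature contributions to the base value to recover the prediction."""
    return base_value + sum(contributions.values())


base_value = 180000.0  # hypothetical average model output
contributions = {      # hypothetical per-feature contributions for one row
    "feature_a": 8000.0,
    "feature_b": -5000.0,
    "feature_c": 3500.0,
}
prediction = reconstruct_prediction(base_value, contributions)
```

Read this way, a positive contribution pushes the predicted price above the base value and a negative one pulls it below.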

#### Summarize explainability of the predictions

In [None]:
predictor_load.modify_mask(max_contrib=10)

In [26]:
explanation = predictor_load.summarize()

In [27]:
explanation.head()

|  | ypred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 | ... | contribution_30 | feature_31 | value_31 | contribution_31 | feature_32 | value_32 | contribution_32 | feature_33 | value_33 | contribution_33 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 206462.878757 | Overall material and finish of the house | 7 | 8248.82 | Total square feet of basement area | 856 | -5165.5 | Original construction date | 2003 | 3870.96 | ... | 334.984 | Garage quality | 0 | 304.462 | Half baths above grade | 0 | 286.121 | Lot configuration | 0 | -276.762 |
| 2 | 181127.963794 | Overall material and finish of the house | 6 | -14555.9 | Ground living area square feet | 1262 | -10016.3 | Overall condition of the house | 8 | 6899.3 | ... | 343.581 | Exterior covering on house | 0 | 340.65 | Original construction date | 0 | 340.16 | Interior finish of the garage? | 0 | 335.442 |
| 3 | 221478.052244 | Ground living area square feet | 1786 | 15708.3 | Overall material and finish of the house | 7 | 11084.5 | Size of garage in square feet | 608 | 5998.61 | ... | -323.291 | Masonry veneer area in square feet | 0 | -295.708 | Garage quality | 0 | 290.116 | Physical locations within Ames city limits | 0 | 260.384 |
| 4 | 184788.423141 | Overall material and finish of the house | 7 | 8188.35 | Size of garage in square feet | 642 | 6651.57 | Total square feet of basement area | 756 | -5882.2 | ... | 345.697 | Screen porch area in square feet | 0 | -344.762 | Year garage was built | 0 | 315.665 | Bedrooms above grade | 0 | 310.41 |
| 5 | 256637.518234 | Overall material and finish of the house | 8 | 58568.4 | Ground living area square feet | 2198 | 16891.9 | Size of garage in square feet | 836 | 15161.9 | ... | -361.637 | Full bathrooms above grade | 0 | -309.068 | Wood deck area in square feet | 0 | 270.882 | Masonry veneer type | 0 | 266.871 |
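
In the summary output, the features of each row appear in decreasing order of absolute contribution. The sketch below mimics that layout for a single row in plain Python. It is not Shapash's implementation: `summarize_row` is a hypothetical helper, and treating `max_contrib` as a simple cap on the number of features kept is an assumption. The sample values are taken from the first row of the output.

```python
def summarize_row(contributions, values, max_contrib=None):
    """Rank features by absolute contribution and flatten them into numbered columns."""
    ranked = sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)
    if max_contrib is not None:
        ranked = ranked[:max_contrib]  # keep only the strongest contributions
    summary = {}
    for rank, name in enumerate(ranked, start=1):
        summary[f"feature_{rank}"] = name
        summary[f"value_{rank}"] = values[name]
        summary[f"contribution_{rank}"] = contributions[name]
    return summary


# Deliberately unordered input, using values from the first row of the output
contributions = {
    "Lot configuration": -276.762,
    "Original construction date": 3870.96,
    "Overall material and finish of the house": 8248.82,
    "Total square feet of basement area": -5165.5,
}
values = {
    "Lot configuration": 0,
    "Original construction date": 2003,
    "Overall material and finish of the house": 7,
    "Total square feet of basement area": 856,
}
summary = summarize_row(contributions, values, max_contrib=2)
```

Ranking by absolute value keeps strong negative drivers, such as the basement area here, next to the strong positive ones instead of pushing them to the end.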
