# Shapash in Jupyter - Overview

<b>In this tutorial, you will:</b><br />
Learn how Shapash works in a Jupyter Notebook through a simple use case.<br />

Contents:
- Build a Regressor
- Compile Shapash SmartExplainer
- Convert the Shapash SmartExplainer to a SmartPredictor
- Save the Shapash SmartPredictor object to a pickle file
- Make a prediction

Data from Kaggle [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)

In [1]:
import pandas as pd
from category_encoders import OrdinalEncoder
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split

## Building a Supervised Model

In [2]:
# Assumes Shapash is installed in the environment, so no sys.path changes are needed
from shapash.explainer.smart_predictor import SmartPredictor
from shapash.explainer.smart_explainer import SmartExplainer
from shapash.data.data_loader import data_loading
from shapash.utils.load_smartpredictor import load_smartpredictor

# Load the house prices dataset and its dictionary of column labels
house_df, house_dict = data_loading('house_prices')

In [3]:
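# Target: the sale price, kept as a one-column DataFrame
y_df = house_df['SalePrice'].to_frame()
# Features: every column except the target (difference() returns them sorted by name)
X_df = house_df[house_df.columns.difference(['SalePrice'])]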

In [4]:
house_df.head()

| Id | MSSubClass | MSZoning | LotArea | Street | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | ... | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2-Story 1946 & Newer | Residential Low Density | 8450 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 2 | 2008 | Warranty Deed - Conventional | Normal Sale | 208500 |
| 2 | 1-Story 1946 & Newer All Styles | Residential Low Density | 9600 | Paved | Regular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Veenker | ... | 0 | 0 | 0 | 0 | 0 | 5 | 2007 | Warranty Deed - Conventional | Normal Sale | 181500 |
| 3 | 2-Story 1946 & Newer | Residential Low Density | 11250 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Inside lot | Gentle slope | College Creek | ... | 0 | 0 | 0 | 0 | 0 | 9 | 2008 | Warranty Deed - Conventional | Normal Sale | 223500 |
| 4 | 2-Story 1945 & Older | Residential Low Density | 9550 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Corner lot | Gentle slope | Crawford | ... | 272 | 0 | 0 | 0 | 0 | 2 | 2006 | Warranty Deed - Conventional | Abnormal Sale | 140000 |
| 5 | 2-Story 1946 & Newer | Residential Low Density | 14260 | Paved | Slightly irregular | Near Flat/Level | All public Utilities (E,G,W,& S) | Frontage on 2 sides of property | Gentle slope | Northridge | ... | 0 | 0 | 0 | 0 | 0 | 12 | 2008 | Warranty Deed - Conventional | Normal Sale | 250000 |


#### Encoding Categorical Features 

In [5]:
from category_encoders import OrdinalEncoder

categorical_features = [col for col in X_df.columns if X_df[col].dtype == 'object']

encoder = OrdinalEncoder(
    cols=categorical_features,
    handle_unknown='ignore',
    return_df=True).fit(X_df)

X_df=encoder.transform(X_df)
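
To make the encoding step concrete, here is a minimal plain-Python sketch of what ordinal encoding does conceptually: each distinct category is mapped to an integer, and the mapping can be reversed. The helpers `fit_ordinal`, `encode`, and `decode` are hypothetical names used for illustration only; they are not part of `category_encoders`, whose integer assignment and handling of unknown values may differ.

```python
def fit_ordinal(values):
    """Assign 1, 2, 3, ... to categories in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def encode(values, mapping, unknown=-1):
    """Replace each category by its integer code (unknown categories get `unknown`)."""
    return [mapping.get(value, unknown) for value in values]


def decode(codes, mapping):
    """Reverse the encoding: turn integer codes back into category labels."""
    inverse = {code: category for category, code in mapping.items()}
    return [inverse.get(code) for code in codes]


# LotShape values of the five rows displayed above
lot_shapes = ["Regular", "Regular", "Slightly irregular",
              "Slightly irregular", "Slightly irregular"]

mapping = fit_ordinal(lot_shapes)
codes = encode(lot_shapes, mapping)
restored = decode(codes, mapping)

print(mapping)
print(codes)
print(restored == lot_shapes)
```

Because the mapping is reversible, the fitted encoder can later be handed to Shapash so that explanations display the original labels rather than integer codes.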



#### Train / Test Split

In [6]:
Xtrain, Xtest, ytrain, ytest = train_test_split(X_df, y_df, train_size=0.75, random_state=1)
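
As an illustration of what `train_size=0.75` and `random_state=1` mean, the following plain-Python sketch shuffles a list of row identifiers with a fixed seed and cuts it at 75%. The helper `simple_split` is hypothetical and does not reproduce scikit-learn's exact shuffling; it only shows that a fixed seed makes the split reproducible.

```python
import random


def simple_split(ids, train_size=0.75, seed=1):
    """Shuffle the ids with a fixed seed, then cut at the train_size fraction."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


ids = list(range(1, 21))  # 20 hypothetical row identifiers
train_ids, test_ids = simple_split(ids)
again_train, again_test = simple_split(ids)

print(len(train_ids), len(test_ids))
print(train_ids == again_train)  # same seed, same split
```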

#### Model Fitting

In [7]:
regressor = LGBMRegressor(n_estimators=200).fit(Xtrain,ytrain)

In [8]:
y_pred = pd.DataFrame(regressor.predict(Xtest),columns=['pred'],index=Xtest.index)

## Understand the Model with Shapash

#### Declare and Compile SmartExplainer 

In [9]:
from shapash.explainer.smart_explainer import SmartExplainer

In [10]:
xpl = SmartExplainer()

In [11]:
xpl.compile(
    x=Xtest,
    model=regressor,
    preprocessing=encoder, # Optional: lets compile use the encoder's inverse_transform method
    y_pred=y_pred # Optional: predictions computed above
)

Backend: Shap TreeExplainer


#### Convert SmartExplainer to SmartPredictor

In [12]:
predictor = xpl.to_smartpredictor()

## Save and Load your Predictor

#### Save your predictor to a pickle file

In [13]:
predictor.save('./predictor.pkl')

#### Load your predictor from a pickle file

In [14]:
predictor_load = load_smartpredictor('./predictor.pkl')
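
The save and load steps above follow the usual pickle round trip. The sketch below shows that pattern with the standard library only, assuming (as the "pickle file" wording suggests) that the predictor is serialized with Python's `pickle`. The dictionary `toy_state` is a hypothetical stand-in for a predictor's state, not a Shapash object.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the state a predictor needs at prediction time
toy_state = {
    "features": ["OverallQual", "GrLivArea", "YearBuilt"],
    "label": "SalePrice",
    "n_estimators": 200,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_predictor.pkl")

    # Save: serialize the object to a binary file
    with open(path, "wb") as handle:
        pickle.dump(toy_state, handle)

    # Load: rebuild an equal object from the file
    with open(path, "rb") as handle:
        restored_state = pickle.load(handle)

print(restored_state == toy_state)
```

Only load pickle files that come from a trusted source, since unpickling can execute arbitrary code.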

## Make a prediction with your Predictor

#### Add data

In [15]:
predictor_load.add_input(x=X_df, ypred=y_df)

#### Make prediction

In [16]:
prediction = predictor_load.predict()

In [17]:
prediction.head()

| Id | ypred |
| --- | --- |
| 1 | 206462.878757 |
| 2 | 181127.963794 |
| 3 | 221478.052244 |
| 4 | 184788.423141 |
| 5 | 256637.518234 |


#### Get the detailed explainability associated with the prediction

In [18]:
detailed_contributions = predictor_load.detail_contributions()

In [19]:
detailed_contributions.head()

| Id | ypred | 1stFlrSF | 2ndFlrSF | 3SsnPorch | BedroomAbvGr | BldgType | BsmtCond | BsmtExposure | BsmtFinSF1 | BsmtFinSF2 | ... | SaleType | ScreenPorch | Street | TotRmsAbvGrd | TotalBsmtSF | Utilities | WoodDeckSF | YearBuilt | YearRemodAdd | YrSold |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 206462.878757 | -1104.994176 | 1281.445856 | 0.0 | 375.679661 | 12.259902 | 157.224629 | -233.02542 | -738.445396 | -59.294761 | ... | -104.645827 | -351.621116 | 0.0 | -498.228775 | -5165.503476 | 0.0 | -944.040092 | 3870.961681 | 2219.313761 | 17.478037 |
| 2 | 181127.963794 | 1629.056157 | -683.689921 | 0.0 | 127.17779 | 8.045214 | 166.542629 | -1112.62348 | 5781.667847 | -76.735634 | ... | -229.800309 | -217.52546 | 0.0 | -546.01397 | 2783.676113 | 0.0 | 2388.081749 | 340.160052 | -4310.04167 | 413.350114 |
| 3 | 221478.052244 | -1321.131971 | -556.399274 | 0.0 | 361.547835 | 10.474788 | 197.200789 | -531.98803 | 61.498784 | -84.596547 | ... | -91.178667 | -323.291461 | 0.0 | -178.77262 | -5157.340087 | 0.0 | -919.487875 | 3877.023405 | 2141.713106 | -72.948518 |
| 4 | 184788.423141 | -991.578703 | 20.078384 | 0.0 | 310.409913 | 9.720006 | 226.625042 | -502.49935 | -3170.028608 | -95.892612 | ... | -89.323269 | -344.762081 | 0.0 | -608.018763 | -5882.195324 | 0.0 | -853.089042 | -3740.803412 | -4930.879276 | 555.382978 |
| 5 | 256637.518234 | -8807.740743 | -1061.015056 | 0.0 | -1580.357621 | 7.868458 | 124.925471 | -237.640755 | -2109.988854 | -95.455721 | ... | -481.118248 | -384.072813 | 0.0 | -4071.644413 | -4866.753785 | 0.0 | 270.88213 | 2394.736098 | 1533.333106 | -233.439195 |
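
Each row above holds one contribution per feature. With a SHAP backend such as the TreeExplainer reported earlier, contributions are additive: a base value (the model's average output) plus the sum of a row's contributions gives that row's prediction. The sketch below illustrates this with made-up numbers; `base_value` and the three contributions are hypothetical and are not taken from the model above.

```python
import math

# Hypothetical values, chosen only to illustrate additivity
base_value = 180000.0
contributions = {
    "OverallQual": 8000.0,
    "TotalBsmtSF": -5000.0,
    "YearBuilt": 4000.0,
}

# Prediction = base value + sum of the per-feature contributions
prediction = base_value + sum(contributions.values())

print(prediction)
print(math.isclose(prediction, 187000.0))
```

A positive contribution pushes the predicted price above the base value, and a negative one pulls it below.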


#### Summarize the explainability of the predictions

In [21]:
predictor_load.modify_mask(max_contrib=10)

In [22]:
explanation = predictor_load.summarize()

In [23]:
explanation.head()

|  | ypred | feature_1 | value_1 | contribution_1 | feature_2 | value_2 | contribution_2 | feature_3 | value_3 | contribution_3 | ... | contribution_7 | feature_8 | value_8 | contribution_8 | feature_9 | value_9 | contribution_9 | feature_10 | value_10 | contribution_10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 206462.878757 | OverallQual | 7 | 8248.82 | TotalBsmtSF | 856 | -5165.5 | YearBuilt | 2003 | 3870.96 | ... | 2219.31 | MSSubClass | 5 | 2069.88 | BsmtFinType1 |  | 1756.7 | OverallCond |  | -1507.9 |
| 2 | 181127.963794 | OverallQual | 6 | -14555.9 | GrLivArea | 1262 | -10016.3 | OverallCond | 8 | 6899.3 | ... | 2783.68 | Neighborhood | 298 | 2753.12 | WoodDeckSF | 1262.0 | 2388.08 | 1stFlrSF |  | 1629.06 |
| 3 | 221478.052244 | GrLivArea | 1786 | 15708.3 | OverallQual | 7 | 11084.5 | GarageArea | 608 | 5998.61 | ... | 2141.71 | BsmtFullBath | 5 | 1806.24 | OverallCond |  | -1630.02 | BsmtFinType1 |  | 1440.23 |
| 4 | 184788.423141 | OverallQual | 7 | 8188.35 | GarageArea | 642 | 6651.57 | TotalBsmtSF | 756 | -5882.2 | ... | 2969.71 | MSSubClass | 5 | 2767.33 | OverallCond |  | -1875.09 | Neighborhood |  | 1585.21 |
| 5 | 256637.518234 | OverallQual | 8 | 58568.4 | GrLivArea | 2198 | 16891.9 | GarageArea | 836 | 15161.9 | ... | -4866.75 | KitchenQual | 12 | -4611.78 | MoSold | 9.0 | -4240.05 | TotRmsAbvGrd |  | -4071.64 |
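
In the summary above, features appear ordered by the absolute size of their contribution, and `max_contrib` caps how many are kept per row. The following sketch reproduces that selection in plain Python on a few contributions copied from the detailed contributions of Id 1. The helper `top_contributions` is hypothetical and only approximates what the summary does; it is not Shapash's implementation.

```python
def top_contributions(contributions, max_contrib):
    """Keep the max_contrib features with the largest absolute contribution."""
    ranked = sorted(contributions.items(),
                    key=lambda item: abs(item[1]),
                    reverse=True)
    return ranked[:max_contrib]


# A subset of the contributions listed for Id 1 in the detailed table
row_1 = {
    "1stFlrSF": -1104.994176,
    "2ndFlrSF": 1281.445856,
    "TotalBsmtSF": -5165.503476,
    "WoodDeckSF": -944.040092,
    "YearBuilt": 3870.961681,
    "YearRemodAdd": 2219.313761,
}

top_3 = top_contributions(row_1, max_contrib=3)
for feature, contribution in top_3:
    print(feature, round(contribution, 2))
```

Sorting on the absolute value keeps strong negative effects, such as `TotalBsmtSF` here, alongside strong positive ones.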
