# Use Shapash Webapp with Eurybia


**In this tutorial, you will**
learn to use Eurybia and the Shapash Webapp to understand your datadrift classifier.

Contents:
- Build a model to deploy
- Validate data between the training dataset and the production dataset
- Generate a report
- Run the Webapp


Data from Kaggle [Titanic](https://www.kaggle.com/c/titanic)

**Requirements notice**: the following tutorial may use third-party modules not included in Eurybia.  
You can find them all in one file [on our GitHub repository](https://github.com/MAIF/eurybia/blob/master/requirements.dev.txt), or you can manually install any that you are missing.

In [1]:
from category_encoders import OrdinalEncoder
import catboost
from sklearn.model_selection import train_test_split

## Building a Supervised Model


In [3]:
from eurybia.data.data_loader import data_loading





In [4]:
titan_df = data_loading('titanic')

In [5]:
features = ['Pclass', 'Age', 'Embarked', 'Sex', 'SibSp', 'Parch', 'Fare']
features_to_encode = ['Pclass', 'Embarked', 'Sex']

In [6]:
encoder = OrdinalEncoder(cols=features_to_encode)
encoder.fit(titan_df[features], verbose=False) 

In [7]:
titan_df_encoded = encoder.transform(titan_df[features])
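
To make the encoding step concrete, the sketch below mimics what an ordinal encoder does to a single categorical column: each distinct category is mapped to an integer. The helper names `fit_ordinal_mapping` and `apply_ordinal_mapping` are hypothetical. Numbering categories from 1 in order of first appearance, and using -1 for unseen values, are assumptions made for illustration; `category_encoders.OrdinalEncoder` may differ in detail.

```python
def fit_ordinal_mapping(values):
    """Assign an integer code to each distinct category, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1  # assumption: codes start at 1
    return mapping


def apply_ordinal_mapping(values, mapping, unknown=-1):
    """Replace each category by its code; unseen categories get `unknown` (assumed)."""
    return [mapping.get(value, unknown) for value in values]


# A few Embarked values, plus one category that was not seen during fitting.
embarked = ["Southampton", "Cherbourg", "Southampton", "Southampton"]
mapping = fit_ordinal_mapping(embarked)
encoded = apply_ordinal_mapping(embarked + ["Queenstown"], mapping)
print(mapping)
print(encoded)
```

Fitting on the baseline data and reusing the same mapping later is what keeps the codes consistent between the training and production datasets.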

In [8]:
X_train, X_test, y_train, y_test = train_test_split(
    titan_df_encoded,
    titan_df['Survived'].to_frame(),
    test_size=0.2,
    random_state=11
)

In [9]:
# Positions of the categorical columns, passed to catboost.Pool as cat_features
indice_cat = [
    i for i, feature in enumerate(titan_df_encoded.columns)
    if feature in features_to_encode
]

In [10]:
model = catboost.CatBoostClassifier(
    loss_function="Logloss",
    eval_metric="Logloss",
    learning_rate=0.143852,
    iterations=500,
    l2_leaf_reg=15,
    max_depth=4,
)

In [11]:
train_pool_cat = catboost.Pool(data=X_train, label= y_train, cat_features = indice_cat)
test_pool_cat = catboost.Pool(data=X_test, label=y_test, cat_features = indice_cat) 

In [12]:
model.fit(train_pool_cat, eval_set=test_pool_cat, silent=True)
y_pred = model.predict(X_test)

## Creating a fake production dataset



In [13]:
import random

In [14]:
df_production = titan_df.copy()

In [15]:
df_production['Age'] = df_production['Age'].apply(lambda x: random.randrange(10, 76)).astype(float)
df_production['Fare'] = df_production['Fare'].apply(lambda x: random.randrange(1, 100)).astype(float)
list_sex= ["male", "female"]
df_production['Sex'] = df_production['Sex'].apply(lambda x: random.choice(list_sex))
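
Because the values above are drawn from the global `random` module without a seed, each run produces a different production dataset. Below is a minimal sketch of a reproducible variant that uses a dedicated `random.Random` instance. The helper `fake_production_values` and the seed value are hypothetical and not part of the original tutorial.

```python
import random


def fake_production_values(n_rows, seed=11):
    """Draw fake Age, Fare and Sex values with a private, seeded generator."""
    rng = random.Random(seed)  # does not touch the global random state
    ages = [float(rng.randrange(10, 76)) for _ in range(n_rows)]   # 10..75
    fares = [float(rng.randrange(1, 100)) for _ in range(n_rows)]  # 1..99
    sexes = [rng.choice(["male", "female"]) for _ in range(n_rows)]
    return ages, fares, sexes


first = fake_production_values(5)
second = fake_production_values(5)
print(first == second)  # same seed, same values
```

The returned lists could then be assigned to the corresponding columns, for example `df_production['Age'] = ages`, provided `n_rows` equals `len(df_production)`.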

In [16]:
df_baseline = titan_df[features]
df_current = df_production[features]

In [17]:
df_current.head()

| PassengerId | Pclass | Age | Embarked | Sex | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| 1 | Third class | 36.0 | Southampton | male | 1 | 0 | 57.0 |
| 2 | First class | 11.0 | Cherbourg | female | 1 | 0 | 94.0 |
| 3 | Third class | 12.0 | Southampton | female | 0 | 0 | 25.0 |
| 4 | First class | 60.0 | Southampton | female | 1 | 0 | 94.0 |
| 5 | Third class | 22.0 | Southampton | female | 0 | 0 | 84.0 |


In [18]:
df_baseline.head()

| PassengerId | Pclass | Age | Embarked | Sex | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| 1 | Third class | 22.0 | Southampton | male | 1 | 0 | 7.25 |
| 2 | First class | 38.0 | Cherbourg | female | 1 | 0 | 71.28 |
| 3 | Third class | 26.0 | Southampton | female | 0 | 0 | 7.92 |
| 4 | First class | 35.0 | Southampton | female | 1 | 0 | 53.1 |
| 5 | Third class | 35.0 | Southampton | male | 0 | 0 | 8.05 |


## Use Eurybia for data validation

In [19]:
from eurybia import SmartDrift

In [20]:
sd = SmartDrift(df_current=df_current, df_baseline=df_baseline, deployed_model=model, encoding=encoder)

In [21]:
%time sd.compile(full_validation=True)

INFO: Shap explainer type - <shap.explainers._tree.TreeExplainer object at 0x11d28cad0>
CPU times: user 3.6 s, sys: 690 ms, total: 4.29 s
Wall time: 848 ms
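
The datadrift classifier is a binary classifier trained to tell baseline rows from current rows: the better it separates them, the stronger the drift. As a rough illustration of that idea (not Eurybia's implementation), the sketch below scores a single feature by the AUC obtained when the raw feature value is used to rank rows, using the five `Fare` values displayed in the two tables above. The function `single_feature_auc` is hypothetical. A value of 0.5 means the two samples are indistinguishable on that feature, while values near 0 or 1 mean they are easy to separate.

```python
def single_feature_auc(baseline, current):
    """AUC of ranking rows by feature value, with 'current' as the positive class.

    Computed as the share of (current, baseline) pairs where the current
    value is larger; ties count for one half.
    """
    wins = 0.0
    for c in current:
        for b in baseline:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(current) * len(baseline))


# Fare values of the first five passengers, copied from the tables above.
fare_baseline = [7.25, 71.28, 7.92, 53.1, 8.05]
fare_current = [57.0, 94.0, 25.0, 94.0, 84.0]

auc = single_feature_auc(fare_baseline, fare_current)
print(f"AUC on Fare: {auc:.2f}")  # 0.88
```

Five rows are far too few to conclude anything; the point is only that randomly redrawn fares sit well above most of the original ones, which is the kind of signal a drift classifier picks up.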


In [22]:
sd.generate_report(
    output_file='report_titanic.html',
    title_story="Data validation",
    title_description="""Titanic Data validation"""
)

## Launch the Shapash Webapp from SmartDrift

After the compile step, you can launch a Shapash Webapp directly from your SmartDrift object. It gives you access to several dynamic plots that help you understand where drift has been detected in your data.

For more information on the Shapash Webapp, see https://github.com/MAIF/shapash.

In [23]:
app = sd.xpl.run_app(title_story='Eurybia datadrift classifier')

INFO:root:Your Shapash application run on http://PMP01204:8050/
INFO:root:Use the method .kill() to down your app.


**Stop the Webapp after using it**

In [24]:
app.kill()