# Use Shapash Webapp with Eurybia


<b>In this tutorial you will:</b><br />
Learn how to use Eurybia together with the Shapash web app to explore the datadrift classifier<br />

Contents:
- Build a model to deploy
- Validate data between the training dataset and the production dataset
- Generate a report
- Run the web app


Data from Kaggle [Titanic](https://www.kaggle.com/c/titanic)<br />

In [1]:
import pandas as pd
from category_encoders import OrdinalEncoder
import catboost
from eurybia.core.smartdrift import SmartDrift
from sklearn.model_selection import train_test_split

## Building a Supervised Model


In [2]:
from eurybia.data.data_loader import data_loading

In [3]:
titan_df = data_loading('titanic')

In [4]:
features = ['Pclass', 'Age', 'Embarked', 'Sex', 'SibSp', 'Parch', 'Fare']
features_to_encode = ['Pclass', 'Embarked', 'Sex']

In [5]:
encoder = OrdinalEncoder(cols=features_to_encode)
encoder.fit(titan_df[features]) 



OrdinalEncoder(cols=['Pclass', 'Embarked', 'Sex'],
               mapping=[{'col': 'Pclass', 'data_type': dtype('O'),
                         'mapping': Third class     1
First class     2
Second class    3
NaN            -2
dtype: int64},
                        {'col': 'Embarked', 'data_type': dtype('O'),
                         'mapping': Southampton    1
Cherbourg      2
Queenstown     3
NaN           -2
dtype: int64},
                        {'col': 'Sex', 'data_type': dtype('O'),
                         'mapping': male      1
female    2
NaN      -2
dtype: int64}])

In [6]:
titan_df_encoded = encoder.transform(titan_df[features])

In [7]:
X_train, X_test, y_train, y_test = train_test_split(
    titan_df_encoded,
    titan_df['Survived'].to_frame(),
    test_size=0.2,
    random_state=11
)

In [8]:
# Positions of the categorical columns, passed to catboost.Pool as cat_features
indice_cat = [
    i for i, feature in enumerate(titan_df_encoded.columns)
    if feature in features_to_encode
]

In [9]:
model = catboost.CatBoostClassifier(
    loss_function="Logloss",
    eval_metric="Logloss",
    learning_rate=0.143852,
    iterations=500,
    l2_leaf_reg=15,
    max_depth=4,
)

In [10]:
train_pool_cat = catboost.Pool(data=X_train, label=y_train, cat_features=indice_cat)
test_pool_cat = catboost.Pool(data=X_test, label=y_test, cat_features=indice_cat)

In [11]:
model.fit(train_pool_cat, eval_set=test_pool_cat, silent=True)
y_pred = model.predict(X_test)

## Creating a fake dataset as a production dataset



In [12]:
import random

In [13]:
df_production = titan_df.copy()

In [14]:
# Simulate drift: overwrite Age, Fare and Sex with random values (the original value x is ignored)
df_production['Age'] = df_production['Age'].apply(lambda x: random.randrange(10, 76)).astype(float)
df_production['Fare'] = df_production['Fare'].apply(lambda x: random.randrange(1, 100)).astype(float)
list_sex = ["male", "female"]
df_production['Sex'] = df_production['Sex'].apply(lambda x: random.choice(list_sex))
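Because the simulated values come from `random`, every run of the notebook produces a different production dataset, so the drift figures will vary slightly from one run to the next. The sketch below shows one way to make such draws repeatable by seeding the generator. The helper name `simulate_ages` and the seed value are hypothetical and are not part of the original notebook.

```python
import random


def simulate_ages(n_rows, seed):
    """Draw n_rows integer ages in [10, 75], returned as floats, from a seeded generator."""
    rng = random.Random(seed)  # local generator: the global random state is left untouched
    return [float(rng.randrange(10, 76)) for _ in range(n_rows)]


first_run = simulate_ages(5, seed=11)
second_run = simulate_ages(5, seed=11)

# The same seed yields the same sequence of draws.
print(first_run == second_run)
```

In the notebook itself, calling `random.seed(...)` with a fixed value before the cell above should have the same effect on the existing lambdas, since they use the module-level generator.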

In [15]:
df_baseline = titan_df[features]
df_current = df_production[features]

In [16]:
df_current.head()

| PassengerId | Pclass | Age | Embarked | Sex | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| 1 | Third class | 56.0 | Southampton | female | 1 | 0 | 38.0 |
| 2 | First class | 51.0 | Cherbourg | female | 1 | 0 | 84.0 |
| 3 | Third class | 24.0 | Southampton | female | 0 | 0 | 45.0 |
| 4 | First class | 41.0 | Southampton | male | 1 | 0 | 38.0 |
| 5 | Third class | 32.0 | Southampton | female | 0 | 0 | 1.0 |


In [17]:
df_baseline.head()

| PassengerId | Pclass | Age | Embarked | Sex | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| 1 | Third class | 22.0 | Southampton | male | 1 | 0 | 7.25 |
| 2 | First class | 38.0 | Cherbourg | female | 1 | 0 | 71.28 |
| 3 | Third class | 26.0 | Southampton | female | 0 | 0 | 7.92 |
| 4 | First class | 35.0 | Southampton | female | 1 | 0 | 53.1 |
| 5 | Third class | 35.0 | Southampton | male | 0 | 0 | 8.05 |


## Use Eurybia for data validation

In [18]:
from eurybia import SmartDrift

In [19]:
SD = SmartDrift(df_current=df_current, df_baseline=df_baseline, deployed_model=model, encoding=encoder)

In [20]:
%time SD.compile(full_validation=True)

Backend: Shap TreeExplainer
CPU times: user 5.47 s, sys: 490 ms, total: 5.96 s
Wall time: 3.16 s
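The datadrift classifier that Eurybia builds during `compile` is a binary classifier trained to predict whether a row belongs to the baseline or to the current dataset. Its AUC serves as the drift measure: close to 0.5 means the two datasets are indistinguishable, close to 1 means they are easy to tell apart. The sketch below illustrates only that idea on a single numeric column using the standard library. It is not Eurybia's implementation, and the function name `pairwise_auc`, the value ranges and the seed are assumptions made for the example.

```python
import random


def pairwise_auc(baseline_scores, current_scores):
    """AUC of a score separating 'current' (positive) from 'baseline' (negative).

    Computed as the fraction of (current, baseline) pairs in which the current
    score is higher, with ties counted as half a win.
    """
    wins = 0.0
    for c in current_scores:
        for b in baseline_scores:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(current_scores) * len(baseline_scores))


rng = random.Random(0)  # hypothetical seed, used only for repeatability
baseline_fare = [rng.uniform(5, 60) for _ in range(200)]
same_fare = list(baseline_fare)                        # no drift: identical values
shifted_fare = [fare + 100 for fare in baseline_fare]  # strong drift: every value is higher

print(pairwise_auc(baseline_fare, same_fare))     # 0.5: the two sets cannot be told apart
print(pairwise_auc(baseline_fare, shifted_fare))  # 1.0: the two sets are perfectly separable
```

In this tutorial the simulated `Age`, `Fare` and `Sex` columns play the role of `shifted_fare`, so the drift reported by Eurybia should be well above the no-drift level.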


In [21]:
SD.generate_report(
    output_file='../output/report_titanic.html',
    title_story="Data validation",
    title_description="Titanic Data validation",
)



Report saved to ./../output/report_titanic.html.

## Launch the Shapash WebApp from SmartDrift

After the compile step, you can launch a Shapash WebApp directly from your SmartDrift object. It gives you access to several dynamic plots that help you understand where drift has been detected in your data. <br/>
For more information on the Shapash WebApp, see https://github.com/MAIF/shapash

In [22]:
app = SD.xpl.run_app(title_story='Eurybia datadrift classifier')

Dash is running on http://0.0.0.0:8050/





INFO:root:Your Shapash application run on http://maitrejinx-Latitude-E5570:8050/
INFO:root:Use the method .kill() to down your app.
INFO:shapash.webapp.smart_app:Dash is running on http://0.0.0.0:8050/



 * Serving Flask app "shapash.webapp.smart_app" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


**Stop the WebApp after using it**

In [24]:
app.kill()