The workshop consists of three notebooks, each covering a different stage:

1. Dataset Generation. The first notebook generates the dataset used to train the model. We create a Robust Test Suite to check that the generated dataset meets certain conditions.
2. Model Training (This Notebook). The second notebook trains the model. We create a Robust Test Suite to check that the trained model meets certain conditions.
3. Model Inference. The last notebook uses mercury.monitoring to monitor data drift and to estimate the model's performance when labels are not available.

In [1]:
import pandas as pd
import os
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score, precision_score

SEED = 23

pd.set_option('display.max_colwidth', None)

## Read Dataset

Let's read the dataset that we generated in the first notebook.

In [2]:
!ls dataset/

all.csv     schema.json test.csv    train.csv


In [3]:
path_dataset = "./dataset/"

df_train = pd.read_csv(path_dataset + "train.csv")
df_test = pd.read_csv(path_dataset + "test.csv")

In [4]:
df_train.head()

Unnamed: 0,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,PAY_5,...,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default.payment.next.month
0,360000.0,1,1,2,29,-1,2,-1,0,0,...,8336.0,5878.0,12903.0,17.0,10859.0,380.0,187.0,12966.0,1038.0,1
1,160000.0,2,1,2,28,-1,-1,-1,-1,-1,...,47340.0,2473.0,4120.0,34604.0,37836.0,47340.0,2473.0,4120.0,6302.0,0
2,240000.0,2,1,2,39,0,-1,0,0,0,...,195242.0,187710.0,171828.0,238861.0,6678.0,7537.0,10555.0,5223.0,5829.0,0
3,50000.0,2,1,2,23,0,0,0,0,-1,...,-2897.0,48211.0,48154.0,2500.0,3002.0,1500.0,52000.0,1900.0,1800.0,0
4,230000.0,2,2,2,28,0,0,2,2,2,...,204067.0,200720.0,212403.0,18150.0,7200.0,7500.0,0.0,15079.0,8000.0,0


## Train Model

In [5]:
label = "default.payment.next.month"
features = [c for c in df_train.columns if c!=label]

In [16]:
X_train = df_train[features]
y_train = df_train[label]

X_test = df_test[features]
y_test = df_test[label]

#model = DecisionTreeClassifier(random_state=SEED)
model = DecisionTreeClassifier(
    max_depth=6, class_weight="balanced", min_samples_split=15, min_samples_leaf=15, random_state=SEED
)
model = model.fit(X_train, y_train)

## Evaluation

In [17]:
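# Predict once and reuse the result for every label-based metric
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

acc_test = accuracy_score(y_test, y_pred)
auc_test = roc_auc_score(y_test, y_score)
f1_score_test = f1_score(y_test, y_pred)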

print("accuracy: ", acc_test)
print("AUC: ", auc_test)
print("F1: ", f1_score_test)

accuracy:  0.6547891467799934
AUC:  0.6258748474154183
F1:  0.3025099075297226


## Robust Model Test Suite

As we did when creating the dataset, we will create a `TestSuite` using [mercury.robust](https://bbva.github.io/mercury-robust/). This time the suite targets the trained model and contains the following tests:

- [ModelSimplicityChecker](https://bbva.github.io/mercury-robust/reference/model_tests/#mercury.robust.model_tests.ModelSimplicityChecker): Checks whether a simple baseline model, trained on the same dataset, achieves similar or better performance on a test dataset than the trained model.
- `CohortPerformanceTest` (imported from `mercury.robust.data_tests`): Checks that a metric computed separately for each group of a column does not differ between groups by more than a threshold. Here we compare precision across the `SEX` groups.
- [DriftMetricResistanceTest](https://bbva.github.io/mercury-robust/reference/model_tests/#mercury.robust.model_tests.DriftMetricResistanceTest): Checks how robust a trained model is to drift in its input data.
- [TreeCoverageTest](https://bbva.github.io/mercury-robust/reference/model_tests/#mercury.robust.model_tests.TreeCoverageTest): Checks whether a given test dataset covers at least a minimum fraction of the branches of a tree.
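
To make the idea behind the simplicity check concrete, here is a minimal sketch that uses only scikit-learn. It is not the `mercury.robust` implementation: the helper `beats_baseline`, the synthetic data and the exact pass rule (model AUC must exceed baseline AUC by more than `threshold`) are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: this is NOT the workshop dataset)
X, y = make_classification(n_samples=2000, n_features=10, random_state=23)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=23)


def beats_baseline(model, baseline, X_tr, y_tr, X_te, y_te, threshold=0.02):
    """Hypothetical helper: fit both models on the same training data and
    report whether `model` beats `baseline` by more than `threshold` AUC."""
    model.fit(X_tr, y_tr)
    baseline.fit(X_tr, y_tr)
    auc_model = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    auc_base = roc_auc_score(y_te, baseline.predict_proba(X_te)[:, 1])
    return (auc_model - auc_base) > threshold, auc_model, auc_base


passed, auc_model, auc_base = beats_baseline(
    DecisionTreeClassifier(max_depth=6, random_state=23),
    LogisticRegression(solver="liblinear"),
    X_tr, y_tr, X_te, y_te,
)
print("passed:", passed, "| model AUC:", auc_model, "| baseline AUC:", auc_base)
```

If the check does not pass, the simpler baseline is probably good enough, and the extra complexity of the model is hard to justify.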

In [18]:
# Load Data Schema
from mercury.dataschema import DataSchema
schema = DataSchema.load(path_dataset + "schema.json")

In [19]:
from mercury.robust.model_tests import (
    ModelSimplicityChecker,
    DriftMetricResistanceTest,
    TreeCoverageTest
)
from mercury.robust.data_tests import CohortPerformanceTest
from mercury.robust import TestSuite

def create_model_test_suite(
    model, 
    X_train, 
    y_train,
    X_test,
    y_test,
    schema,
    add_tree_coverage_test=False
):
    
    model_tests = []
    
    # Model Simplicity Checker
    model_simplicity_checker = ModelSimplicityChecker(
        model = model,
        X_train = X_train,
        y_train = y_train,
        X_test = X_test,
        y_test = y_test,
        threshold = 0.02,
        eval_fn = roc_auc_score,
        ignore_feats=label,
        dataset_schema=schema,
        baseline_model=LogisticRegression(solver='liblinear', class_weight='balanced')
    )
    model_tests.append(model_simplicity_checker)
    
    # Cohort Performance Test
    group = "SEX"
    def eval_precision(df):
        return precision_score(df[label], df["prediction"])

    # Build a frame with the group column, the predictions and the true labels.
    # It is only used by the cohort performance test.
    # .values avoids index misalignment if X_test / y_test lack a default index.
    df_test_pred = pd.DataFrame()
    df_test_pred[group] = X_test[group].values
    df_test_pred["prediction"] = model.predict(X_test)
    df_test_pred[label] = y_test.values
    cohort_perf_test = CohortPerformanceTest(
        name="precision_by_gender_check",
        base_dataset=df_test_pred,
        group_col=group,
        eval_fn=eval_precision,
        threshold=0.05,
        threshold_is_percentage=False
    )
    model_tests.append(cohort_perf_test)
    
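    # One DriftMetricResistanceTest per payment-related variable:
    # BILL_AMT* and PAY_AMT* are shifted by their first quartile, PAY_* by 2
    for f in features:
        drift_args = None
        if ('BILL_AMT' in f) or ('PAY_AMT' in f):
            # Use the X_train argument instead of the global df_train
            drift_args = {'cols': [f], 'force': X_train[f].quantile(q=0.25)}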
        elif 'PAY_' in f:
            drift_args = {'cols': [f], 'force': 2}
        if drift_args is not None:
            model_tests.append(DriftMetricResistanceTest(
                model = model,
                X = X_test,
                Y = y_test,
                drift_type = 'shift_drift',
                drift_args = drift_args,
                tolerance = 0.05,
                eval=accuracy_score,
                name="drift resistance " + f
            ))
        
    # Tree Coverage Test (only if specified)
    if add_tree_coverage_test:
        tree_coverage_test = TreeCoverageTest(model, X_test, threshold_coverage=.75)
        model_tests.append(tree_coverage_test)
    
    # Create Suite
    test_suite = TestSuite(
        tests=model_tests
    )
    
    return test_suite
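
The drift resistance tests in the function above shift one column at a time and compare the metric before and after. The sketch below reproduces that idea by hand on toy data. It assumes that `shift_drift` adds the constant `force` to the selected column and that the test compares the absolute metric difference with `tolerance`; the helper `shift_and_score` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumption: a stand-in for the workshop dataset)
rng = np.random.default_rng(23)
X = pd.DataFrame({
    "PAY_0": rng.integers(-1, 3, size=500),   # values in {-1, 0, 1, 2}
    "AGE": rng.integers(21, 60, size=500),
})
y = (X["PAY_0"] >= 1).astype(int)  # label depends only on PAY_0

toy_model = DecisionTreeClassifier(max_depth=3, random_state=23).fit(X, y)


def shift_and_score(model, X, y, col, force):
    """Hypothetical helper: add `force` to one column and compare accuracy."""
    X_drifted = X.copy()
    X_drifted[col] = X_drifted[col] + force
    base = accuracy_score(y, model.predict(X))
    drifted = accuracy_score(y, model.predict(X_drifted))
    return base, drifted, abs(base - drifted)


base, drifted, diff = shift_and_score(toy_model, X, y, "PAY_0", force=2)
passed = diff <= 0.05  # same tolerance as in the suite above
print("no drift:", base, "| drifted:", drifted, "| diff:", diff, "| passed:", passed)
```

Because the toy label depends entirely on `PAY_0`, shifting that column changes accuracy sharply and the check fails, which is the kind of sensitivity the test is meant to surface.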

In [20]:
test_suite = create_model_test_suite(
    model, 
    X_train, 
    y_train,
    X_test,
    y_test,
    schema,
    add_tree_coverage_test=True
)
test_results = test_suite.run()
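
Since we enabled the tree coverage test, it helps to see what "coverage" of a tree can mean. The sketch below measures the fraction of leaves reached by an evaluation set using scikit-learn only. This is one plausible definition chosen for illustration; the exact definition used by `TreeCoverageTest` may differ (for example, it may count branches rather than leaves), and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the workshop dataset)
X, y = make_classification(n_samples=1000, n_features=8, random_state=23)
X_fit, y_fit = X[:800], y[:800]
X_eval = X[800:]

tree = DecisionTreeClassifier(
    max_depth=6, min_samples_leaf=15, random_state=23
).fit(X_fit, y_fit)

# In scikit-learn's tree structure, leaves have no children (children_left == -1)
all_leaves = np.flatnonzero(tree.tree_.children_left == -1)

# apply() returns, for each sample, the index of the leaf it ends up in
reached_leaves = np.unique(tree.apply(X_eval))

leaf_coverage = len(reached_leaves) / len(all_leaves)
print(f"{len(reached_leaves)} of {len(all_leaves)} leaves reached "
      f"(coverage = {leaf_coverage:.2f})")
```

A low value would mean that parts of the tree are never exercised by the evaluation data, so the metrics say nothing about how those parts behave.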

In [21]:
test_suite.get_results_as_df()

Unnamed: 0,name,state,error,info
0,ModelSimplicityChecker,TestState.SUCCESS,,"{'metric_model': 0.6007770883539053, 'metric_baseline_model': 0.5715376853312442}"
1,precision_by_gender_check,TestState.SUCCESS,,"{'metric_by_group': [0.2283653846153846, 0.20241691842900303]}"
2,drift resistance PAY_0,TestState.FAIL,Test failed. The metric of drifted dataset has changed above tolerance = 0.050 (diff = 0.153),"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.5017979731938542, 'metric_diff': 0.15299117358613923}"
3,drift resistance PAY_2,TestState.SUCCESS,,"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.651193200392285, 'metric_diff': 0.0035959463877084374}"
4,drift resistance PAY_3,TestState.SUCCESS,,"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.6547891467799934, 'metric_diff': 0.0}"
5,drift resistance PAY_4,TestState.FAIL,Test failed. The metric of drifted dataset has changed above tolerance = 0.050 (diff = 0.225),"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.42955214122262175, 'metric_diff': 0.2252370055573717}"
6,drift resistance PAY_5,TestState.FAIL,Test failed. The metric of drifted dataset has changed above tolerance = 0.050 (diff = 0.087),"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.5675057208237986, 'metric_diff': 0.08728342595619487}"
7,drift resistance PAY_6,TestState.SUCCESS,,"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.6547891467799934, 'metric_diff': 0.0}"
8,drift resistance BILL_AMT1,TestState.FAIL,Test failed. The metric of drifted dataset has changed above tolerance = 0.050 (diff = 0.067),"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.7221314154952599, 'metric_diff': 0.0673422687152665}"
9,drift resistance BILL_AMT2,TestState.SUCCESS,,"{'metric_no_drifted': 0.6547891467799934, 'metric_drifted': 0.6547891467799934, 'metric_diff': 0.0}"
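
Note that `drift resistance BILL_AMT1` fails even though accuracy on the drifted data is higher (about 0.722 versus 0.655): judging by the reported `metric_diff`, the test reacts to the size of the change, not its direction. The sketch below shows one way to list failed tests and flag those where the metric went up. It rebuilds three rows by hand with values copied from the table above, and it assumes the `state` column can be compared through its string form.

```python
import pandas as pd

# Three rows copied from the results table above; the flat column layout is an
# assumption made for illustration (the real frame keeps the metrics in `info`).
results = pd.DataFrame({
    "name": [
        "drift resistance PAY_0",
        "drift resistance PAY_2",
        "drift resistance BILL_AMT1",
    ],
    "state": ["TestState.FAIL", "TestState.SUCCESS", "TestState.FAIL"],
    "metric_no_drifted": [0.6547891467799934] * 3,
    "metric_drifted": [0.5017979731938542, 0.651193200392285, 0.7221314154952599],
})

# Signed difference: positive means the metric improved under drift
results["signed_diff"] = results["metric_drifted"] - results["metric_no_drifted"]

# Keep only the failed tests (string comparison also works for enum values)
failed = results[results["state"].astype(str).str.endswith("FAIL")]

# Failed tests where the metric went up rather than down
improved_but_failed = failed.loc[failed["signed_diff"] > 0, "name"].tolist()
print(failed[["name", "signed_diff"]])
print("failed with a higher metric:", improved_but_failed)
```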


## Save Model

In [24]:
path_models = "./models/"

# exist_ok=True makes this a no-op when the directory already exists
os.makedirs(path_models, exist_ok=True)

from joblib import dump
dump(model, path_models + 'model.joblib')

['./models/model.joblib']

In [25]:
import pickle

# Store the feature list so that later steps can reuse the same columns
with open(path_models + "features.pkl", "wb") as fp:
    pickle.dump(features, fp)
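
As a quick sanity check, it can be worth confirming that the saved artifacts load back correctly. The sketch below does a round trip in a temporary directory with a toy model, so it does not touch `./models/`. The toy data and the feature names `f0` to `f3` are hypothetical; only `joblib.dump`, `joblib.load` and `pickle` are used.

```python
import os
import pickle
import tempfile

from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy model and feature list (assumption: stand-ins for `model` and `features`)
X, y = make_classification(n_samples=200, n_features=4, random_state=23)
feature_names = [f"f{i}" for i in range(4)]
clf = DecisionTreeClassifier(max_depth=3, random_state=23).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")
    features_path = os.path.join(tmp_dir, "features.pkl")

    # Save both artifacts, as done in the cells above
    dump(clf, model_path)
    with open(features_path, "wb") as fp:
        pickle.dump(feature_names, fp)

    # Load them back, as an inference step would
    restored_model = load(model_path)
    with open(features_path, "rb") as fp:
        restored_features = pickle.load(fp)

# The restored model should give exactly the same predictions
same_predictions = bool((restored_model.predict(X) == clf.predict(X)).all())
print("features:", restored_features, "| same predictions:", same_predictions)
```

Keep in mind that both joblib and pickle files should only be loaded from trusted sources, because loading them can execute arbitrary code.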