# Monitoring ML Training Pipeline: Drifts

**Quick recap:**
- Goal:
    - Build a classification model for loan eligibility that predicts whether a loan should be granted or refused
    - Introduce autonomous monitoring checkpoints orchestrated with Airflow DAGs
- Downloaded raw data: `raw/12196ecaa65e4831987aee4bfced5f60_2015-01-01_2015-05-31.csv`
- Preprocessed the data into:
    - training dataset: `preprocessed/12196ecaa65e4831987aee4bfced5f60_training.csv`
    - test dataset: `preprocessed/12196ecaa65e4831987aee4bfced5f60_inference.csv`

- Trained and deployed the model
    - JobID: `12196ecaa65e4831987aee4bfced5f60`
    - Missing values: `12196ecaa65e4831987aee4bfced5f60_missing_values_model.pkl`
    - Purpose to Integer: `12196ecaa65e4831987aee4bfced5f60_purpose_to_int_model.json`
    - Prediction model: `12196ecaa65e4831987aee4bfced5f60_rf.pkl`

**Next steps:**
- data quality check
- data drift check
- model drift check
- comparative model evaluation

<div>
<img src="../images/monitored_pipeline.png" width="850"/>
</div>
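The checkpoints above could be wired into the DAG through a branching step. The function below is a minimal sketch of the decision logic such a step might use, for example as the callable of an Airflow `BranchPythonOperator`, which follows the task_id its callable returns. The task IDs and the order of the checks are assumptions for illustration, not taken from the project's actual DAG. `data_quality_ok` could be the `retrain` value returned by `check_data_quality` below, which is `True` when every quality condition passes.

```python
def choose_next_task(data_quality_ok: bool, data_drift: bool, model_drift: bool = False) -> str:
    """Return the task_id of the next (hypothetical) DAG step from the monitoring results."""
    if not data_quality_ok:
        # The new batch failed the quality checks: alert instead of training on it.
        return "alert_data_quality"
    if data_drift or model_drift:
        # Drift detected: retrain, then compare the new model against the deployed one.
        return "retrain_model"
    # Nothing significant changed: keep the deployed model.
    return "skip_retraining"


print(choose_next_task(data_quality_ok=True, data_drift=False, model_drift=True))  # retrain_model
```

Checking data quality first is a deliberate choice in this sketch: drift statistics computed on a broken batch would be misleading anyway.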

**Tools**
- deepchecks

## Motivation
In practice, data science teams spend most of their time developing new models and have little to no time to check daily on the models already in production. Hence the need for an automated monitoring system that alerts the team whenever a significant change starts to impact the prediction performance of their models.

### What to monitor in production?

**Data quality issues**

Data can go through many processing steps before being fed to a model. It may come from multiple sources with different formats, and those formats can change over time: fields get renamed, new categories appear, and so on. Any of these changes can significantly impact model performance.
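Some of these structural changes can be caught before any statistical test. The pandas sketch below compares a reference batch with a current one and reports missing or new columns (a renamed field shows up as one of each), dtype changes, and unseen categories. The `compare_schemas` helper and the `purpose`/`amount` columns are hypothetical illustrations, not part of this project's code.

```python
import pandas as pd


def compare_schemas(ref: pd.DataFrame, cur: pd.DataFrame, cat_cols: list) -> dict:
    """Summarize structural differences between a reference batch and a current batch."""
    shared = [col for col in ref.columns if col in cur.columns]
    report = {
        # Columns that disappeared, e.g. dropped or renamed upstream.
        "missing_columns": sorted(set(ref.columns) - set(cur.columns)),
        # Columns that were not there before, e.g. the new name of a renamed field.
        "new_columns": sorted(set(cur.columns) - set(ref.columns)),
        # Columns whose type changed, e.g. numbers now arriving as strings.
        "dtype_changes": {
            col: (str(ref[col].dtype), str(cur[col].dtype))
            for col in shared
            if ref[col].dtype != cur[col].dtype
        },
        "new_categories": {},
    }
    for col in cat_cols:
        if col in shared:
            # Categories present in the current batch but never seen in the reference.
            unseen = set(cur[col].dropna()) - set(ref[col].dropna())
            if unseen:
                report["new_categories"][col] = sorted(map(str, unseen))
    return report


# Hypothetical loan batches: "amount" was renamed and a new "purpose" value appeared.
ref = pd.DataFrame({"purpose": ["car", "house"], "amount": [1000, 2000]})
cur = pd.DataFrame({"purpose": ["car", "education"], "loan_amount": [1500, 2500]})
print(compare_schemas(ref, cur, cat_cols=["purpose"]))
```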

**Data drift & Concept drift**

In the real world, data is always changing. Changes can stem from shifts in business behavior, such as a company operating in a new region or introducing a new product, from new competitors in the market, from social trends, or from global events affecting an entire industry. These changes can alter the distribution of the data relative to what it was at the last training session, so the data the model was trained on becomes less and less representative of the business problem over time.

*Data drift* happens when the probability distribution of the input features, P(X), changes over time. This can be due to a change in the data structure or to a change in the real world.

*Concept drift* happens when the conditional distribution of the output given the inputs, P(Y|X), changes over time: the same inputs now lead to different outcomes. It can be caused by changes in the data structure or by a shift in real-world behavior, and it degrades prediction quality until the model is retrained on data that reflects the new relationship. A typical example is digital marketing, where a common metric such as click-through rate (CTR) can change drastically when new competitors enter the market. Concept drift can be *gradual*, *sudden*, *in blips* or *recurring*.
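The difference between the two kinds of drift can be shown with a toy NumPy simulation. Assume, hypothetically, that loans are approved when a single score `x` exceeds a threshold, and that the deployed model learned the original threshold of 0 exactly. Shifting P(X) leaves this model's accuracy intact, while moving the threshold (changing P(Y|X)) does not. This is an idealized case: real models rarely learn the concept perfectly, so data drift often hurts them too, for example by pushing inputs into regions that were sparse in the training data.

```python
import numpy as np

rng = np.random.default_rng(0)


def true_label(x: np.ndarray, threshold: float) -> np.ndarray:
    # The hypothetical "concept": approve when the score exceeds the threshold.
    return (x > threshold).astype(int)


def model_predict(x: np.ndarray) -> np.ndarray:
    # A model that learned the original concept (threshold 0.0) perfectly.
    return (x > 0.0).astype(int)


n = 10_000
x_ref = rng.normal(0.0, 1.0, n)         # reference P(X)
x_data_drift = rng.normal(1.0, 1.0, n)  # P(X) shifted, P(Y|X) unchanged
x_concept = rng.normal(0.0, 1.0, n)     # P(X) unchanged, P(Y|X) changed (threshold 0.5)

acc_ref = (model_predict(x_ref) == true_label(x_ref, 0.0)).mean()
acc_data_drift = (model_predict(x_data_drift) == true_label(x_data_drift, 0.0)).mean()
acc_concept = (model_predict(x_concept) == true_label(x_concept, 0.5)).mean()

print(f"mean(X): ref={x_ref.mean():.2f}, data drift={x_data_drift.mean():.2f}, concept drift={x_concept.mean():.2f}")
print(f"accuracy: ref={acc_ref:.3f}, data drift={acc_data_drift:.3f}, concept drift={acc_concept:.3f}")
```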

**Model drift**

Model drift happens when the predictive power of the last trained model deteriorates over time on new data. It is usually a consequence of data and concept drift. In other situations, it can also be due to a model that was not properly safeguarded against bias or overfitting. In our case, we assume that the model was sufficiently stabilized during training.

**Statistical metrics**
- [Kolmogorov-Smirnov (K-S) test](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test)
- [Cramér's V](https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V)
- [Predictive Power Score](https://docs.deepchecks.com/en/stable/checks_gallery/tabular/train_test_validation/plot_feature_label_correlation_change.html#)
- Outlier detection
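To make the first two metrics concrete, here is a small NumPy/pandas sketch of how each compares a reference sample with a current one: the K-S statistic for numerical features and Cramér's V for categorical ones. It is illustrative only; deepchecks' drift checks may use different variants (for example, a bias-corrected Cramér's V, or the Earth Mover's Distance mentioned in the model drift code below). The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def ks_statistic(ref, cur) -> float:
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    ref = np.sort(np.asarray(ref, dtype=float))
    cur = np.sort(np.asarray(cur, dtype=float))
    grid = np.concatenate([ref, cur])
    # Empirical CDF at each grid point = fraction of the sample <= that point.
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    cdf_cur = np.searchsorted(cur, grid, side="right") / cur.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))


def cramers_v(ref, cur) -> float:
    """Plain Cramér's V between 'which sample' and 'category' (0 = same mix, 1 = disjoint)."""
    sample = np.concatenate([np.zeros(len(ref), dtype=int), np.ones(len(cur), dtype=int)])
    values = np.concatenate([np.asarray(ref, dtype=object), np.asarray(cur, dtype=object)])
    table = pd.crosstab(sample, values).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts if the category mix did not depend on the sample.
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


rng = np.random.default_rng(42)
print(ks_statistic(rng.normal(0, 1, 2000), rng.normal(0, 1, 2000)))  # same distribution
print(ks_statistic(rng.normal(0, 1, 2000), rng.normal(1, 1, 2000)))  # mean shifted by 1
print(cramers_v(["car", "house"] * 50, ["car", "house"] * 50))       # 0.0: identical category mix
```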

**Resources**
- [How to monitor ML models in production](https://deepchecks.com/how-to-monitor-ml-models-in-production/)

In [None]:
import warnings
warnings.filterwarnings('ignore')

import datetime
import traceback
import os
import json
import pickle
import pandas as pd
import numpy as np
from sklearn.ensemble import BaseEnsemble

from deepchecks.tabular import Dataset
from deepchecks.tabular import Suite
from deepchecks.tabular.checks import WholeDatasetDrift, DataDuplicates, NewLabelTrainTest, TrainTestFeatureDrift, TrainTestLabelDrift
from deepchecks.tabular.checks import FeatureLabelCorrelation, FeatureLabelCorrelationChange, ConflictingLabels, OutlierSampleDetection 
from deepchecks.tabular.checks import WeakSegmentsPerformance, RocReport, ConfusionMatrixReport, TrainTestPredictionDrift, CalibrationScore, BoostingOverfit

import sys
from importlib import reload
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'dags', 'src'))

import helpers
import config
import preprocess

reload(helpers)
reload(config)
reload(preprocess)

In [None]:
#### drifts.py methods ####
def check_data_quality(df:pd.DataFrame, predictors:list, target:str, job_id:str):
    """
    Check the data quality of a single dataset with a deepchecks suite.
    A report will be saved in the results directory.
    :param df: dataframe to check
    :param predictors: predictors to check
    :param target: target variable
    :param job_id: job ID
    :return: dict with the suite result ("report") and "retrain", which is True only if every quality condition passed
    """
    features = [col for col in predictors if col in df.columns]
    cat_features = [col for col in config.CAT_VARS if col in df.columns]
    dataset = Dataset(df, label=target, features=features, cat_features=cat_features, datetime_name=config.DATETIME_VARS[0])
    retrain_suite = Suite("data quality",
        DataDuplicates().add_condition_ratio_less_or_equal(0.3), #Checks for duplicate samples in the dataset
        ConflictingLabels().add_condition_ratio_of_conflicting_labels_less_or_equal(0), #Find samples which have the exact same features' values but different labels
        FeatureLabelCorrelation().add_condition_feature_pps_less_than(0.9), #Return the PPS (Predictive Power Score) of all features in relation to the label
        OutlierSampleDetection(outlier_score_threshold=0.7).add_condition_outlier_ratio_less_or_equal(0.1), #Detects outliers in a dataset using the LoOP algorithm
    )
    r = retrain_suite.run(dataset)
    try:
        r.save_as_html(f"{config.PATH_DIR_RESULTS}/{job_id}_data_quality_report.html")
        print("[INFO] Data quality report saved as {}".format(f"{config.PATH_DIR_RESULTS}/{job_id}_data_quality_report.html"))
    except Exception as e:
        print(f"[WARNING][DRIFTS.CHECK_DATA_QUALITY] {traceback.format_exc()}")
    return {"report": r, "retrain": r.passed()}

def check_data_drift(ref_df:pd.DataFrame, cur_df:pd.DataFrame, predictors:list, target:str, job_id:str):
    """
    Check for data drift between two datasets and decide whether to retrain the model.
    A report will be saved in the results directory.
    :param ref_df: Reference dataset
    :param cur_df: Current dataset
    :param predictors: Predictors to check for drift
    :param target: Target variable to check for drift
    :param job_id: Job ID
    :return: dict with the suite result ("report") and "retrain", which is True if any check failed or could not run
    """
    ref_features = [col for col in predictors if col in ref_df.columns]
    cur_features = [col for col in predictors if col in cur_df.columns]
    ref_cat_features = [col for col in config.CAT_VARS if col in ref_df.columns]
    cur_cat_features = [col for col in config.CAT_VARS if col in cur_df.columns]
    ref_dataset = Dataset(ref_df, label=target, features=ref_features, cat_features=ref_cat_features, datetime_name=config.DATETIME_VARS[0])
    cur_dataset = Dataset(cur_df, label=target, features=cur_features, cat_features=cur_cat_features, datetime_name=config.DATETIME_VARS[0])
    
    suite = Suite("data drift",
        NewLabelTrainTest(),
        WholeDatasetDrift().add_condition_overall_drift_value_less_than(0.01), #0.2
        FeatureLabelCorrelationChange().add_condition_feature_pps_difference_less_than(0.05), #0.2
        TrainTestFeatureDrift().add_condition_drift_score_less_than(0.01), #0.1
        TrainTestLabelDrift().add_condition_drift_score_less_than(0.01) #0.1
    )
    r = suite.run(ref_dataset, cur_dataset)
    retrain = (len(r.get_not_ran_checks())>0) or (len(r.get_not_passed_checks())>0)
    
    try:
        r.save_as_html(f"{config.PATH_DIR_RESULTS}/{job_id}_data_drift_report.html")
        print("[INFO] Data drift report saved as {}".format(f"{config.PATH_DIR_RESULTS}/{job_id}_data_drift_report.html"))
    except Exception as e:
        print(f"[WARNING][DRIFTS.CHECK_DATA_DRIFT] {traceback.format_exc()}")
    return {"report": r, "retrain": retrain}

def check_model_drift(ref_df:pd.DataFrame, cur_df:pd.DataFrame, model:BaseEnsemble, predictors:list, target:str, job_id:str):
    """
    Using the same pre-trained model, compare its performance and prediction distribution on two datasets and decide whether to retrain the model.
    A report will be saved in the results directory.
    :param ref_df: Reference dataset
    :param cur_df: Current dataset
    :param model: Pre-trained model. Only scikit-learn and xgboost models are supported.
    :param predictors: Predictors to check for drift
    :param target: Target variable to check for drift
    :param job_id: Job ID
    :return: dict with the suite result ("report") and "retrain", which is True if any check failed or could not run
    """
    ref_features = [col for col in predictors if col in ref_df.columns]
    cur_features = [col for col in predictors if col in cur_df.columns]
    ref_cat_features = [col for col in config.CAT_VARS if col in ref_df.columns]
    cur_cat_features = [col for col in config.CAT_VARS if col in cur_df.columns]
    ref_dataset = Dataset(ref_df, label=target, features=ref_features, cat_features=ref_cat_features, datetime_name=config.DATETIME_VARS[0])
    cur_dataset = Dataset(cur_df, label=target, features=cur_features, cat_features=cur_cat_features, datetime_name=config.DATETIME_VARS[0])
    
    suite = Suite("model drift",
        # For each class, plot the ROC curve, compute the AUC score and display the optimal threshold cutoff point.
        RocReport().add_condition_auc_greater_than(0.7),
        # Compute the prediction drift between the reference and current datasets:
        # Cramér's V for categorical output, Earth Mover's Distance for numerical output.
        TrainTestPredictionDrift().add_condition_drift_score_less_than(max_allowed_categorical_score=0.1),
    )
    r = suite.run(ref_dataset, cur_dataset, model)
    retrain = (len(r.get_not_ran_checks())>0) or (len(r.get_not_passed_checks())>0)
    try:
        r.save_as_html(f"{config.PATH_DIR_RESULTS}/{job_id}_model_drift_report.html")
        print("[INFO] Model drift report saved as {}".format(f"{config.PATH_DIR_RESULTS}/{job_id}_model_drift_report.html"))
    except Exception as e:
        print(f"[WARNING][DRIFTS.CHECK_MODEL_DRIFT] {traceback.format_exc()}")
    
    return {"report": r, "retrain": retrain}

In [None]:
job_id1 = "12196ecaa65e4831987aee4bfced5f60"
job_id2 = "a6f0952cd9b54e319ac4fbcef223556c" 
job_id3 = "aa4c3eaadb02409281b589829e3c9370"
filename1 = f"../dags/data/raw/{job_id1}_2015-01-01_2015-05-31.csv"
filename2 = f"../dags/data/raw/{job_id2}_2015-01-01_2015-05-31.csv"
filename3 = f"../dags/data/raw/{job_id3}_2015-06-01_2015-12-31.csv"

df1 = pd.read_csv(filename1)
df2 = pd.read_csv(filename2)
df3 = pd.read_csv(filename3)

tdf1 = pd.read_csv(f"../dags/data/preprocessed/{job_id1}_training.csv")
vdf1 = pd.read_csv(f"../dags/data/preprocessed/{job_id1}_inference.csv")
vdf2 = preprocess.preprocess_data(df=df2, mode="inference", job_id=job_id2, rescale=False, ref_job_id=job_id1)
vdf3 = preprocess.preprocess_data(df=df3, mode="inference", job_id=job_id3, rescale=False, ref_job_id=job_id1)

# Use context managers so the files are closed after loading.
with open("../dags/models/deploy_report.json", "r") as f:
    deploy_report = json.load(f)
with open(f"../dags/models/{deploy_report['prediction_model']}", "rb") as f:
    pred_model = pickle.load(f)

In [26]:
dq_chk1 = check_data_quality(df1, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id1)
dq_chk2 = check_data_quality(df2, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id2)
dq_chk3 = check_data_quality(df3, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id3)

[INFO] Data quality report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/8a0281dfd8be44aba405a02c2adc8622_data_quality_report.html


[INFO] Data quality report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/a763ea0e608c4fc2a090f365669da986_data_quality_report.html


[INFO] Data quality report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/c4930da6c81a4fcdaf708453c1bad93d_data_quality_report.html


In [17]:
print(f"retrain: {dq_chk1['retrain']}")
dq_chk1['report']

retrain: True


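Two of the conditions in this suite are easy to reproduce by hand, which helps when reading the report. The pandas sketch below computes a duplicate ratio (compare with `DataDuplicates`, which the suite requires to be at most 0.3) and a conflicting-label ratio (compare with `ConflictingLabels`, required to be 0). The helper names and toy columns are hypothetical, and deepchecks' exact definitions may differ in details such as which columns are compared.

```python
import pandas as pd


def duplicate_ratio(df: pd.DataFrame, features: list, target: str) -> float:
    """Share of rows that exactly repeat an earlier row (features and label)."""
    return float(df.duplicated(subset=features + [target]).mean())


def conflicting_label_ratio(df: pd.DataFrame, features: list, target: str) -> float:
    """Share of rows whose feature values also occur with a different label."""
    # Number of distinct labels observed for each unique combination of feature values.
    n_labels = df.groupby(features, dropna=False)[target].transform("nunique")
    return float((n_labels > 1).mean())


# Hypothetical toy batch: rows 0 and 1 are duplicates, rows 2 and 3 conflict.
df = pd.DataFrame({
    "income": [10, 10, 20, 20, 30],
    "term": [36, 36, 60, 60, 36],
    "approved": [1, 1, 0, 1, 1],
})
print(duplicate_ratio(df, ["income", "term"], "approved"))          # 0.2
print(conflicting_label_ratio(df, ["income", "term"], "approved"))  # 0.4
```

Under these simplified definitions, this toy batch would pass the duplicates condition (0.2 <= 0.3) but fail the conflicting-labels condition (0.4 > 0).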

In [91]:
# compare raw data
dd_1_2 = check_data_drift(ref_df=df1, cur_df=df2, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id2)
dd_1_3 = check_data_drift(ref_df=df1, cur_df=df3, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id3)

# compare preprocessed datasets
dd_1_2b = check_data_drift(ref_df=vdf1, cur_df=vdf2, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id2+"_b")
dd_1_3b = check_data_drift(ref_df=vdf1, cur_df=vdf3, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id3+"_b")

[INFO] Data drift report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/a763ea0e608c4fc2a090f365669da986_data_drift_report.html


[INFO] Data drift report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/c4930da6c81a4fcdaf708453c1bad93d_data_drift_report.html


[INFO] Data drift report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/a763ea0e608c4fc2a090f365669da986_b_data_drift_report.html


[INFO] Data drift report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/c4930da6c81a4fcdaf708453c1bad93d_b_data_drift_report.html


In [96]:
print(f"retrain: {dd_1_2b['retrain']}")
dd_1_2b['report']

retrain: False



In [97]:
print(f"retrain: {dd_1_3b['retrain']}")
dd_1_3b['report']

retrain: True


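`WholeDatasetDrift` looks at all features jointly rather than one at a time; deepchecks describes it as training a model to distinguish between the two datasets. The scikit-learn sketch below shows that *domain classifier* idea: label reference rows 0 and current rows 1, train a classifier to tell them apart, and read its held-out ROC AUC. It assumes numeric, already-encoded features without missing values; the function name, the random-forest settings, the toy columns and the `max(2*AUC - 1, 0)` rescaling are assumptions for illustration, not deepchecks' implementation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def domain_classifier_drift(ref: pd.DataFrame, cur: pd.DataFrame, features: list, seed: int = 0) -> float:
    """Multivariate drift score in [0, 1] from a reference-vs-current classifier."""
    X = pd.concat([ref[features], cur[features]], ignore_index=True)
    # Domain label: 0 = reference row, 1 = current row.
    y = np.concatenate([np.zeros(len(ref), dtype=int), np.ones(len(cur), dtype=int)])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=seed)
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    # AUC 0.5 (indistinguishable) maps to 0, AUC 1.0 (perfectly separable) maps to 1.
    return float(max(2 * auc - 1, 0.0))


rng = np.random.default_rng(0)


def toy_batch(income_mean: float, n: int = 1000) -> pd.DataFrame:
    # Hypothetical loan features.
    return pd.DataFrame({"income": rng.normal(income_mean, 10, n), "term": rng.choice([36, 60], n)})


ref = toy_batch(50)
print(domain_classifier_drift(ref, toy_batch(50), ["income", "term"]))  # same distribution
print(domain_classifier_drift(ref, toy_batch(65), ["income", "term"]))  # shifted income
```

A useful side effect of this approach is that the classifier's feature importances point to the features driving the drift.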

In [115]:
md_1_2 = check_model_drift(ref_df=vdf1, cur_df=vdf2, model=pred_model, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id2)
md_1_3 = check_model_drift(ref_df=vdf1, cur_df=vdf3, model=pred_model, predictors=config.PREDICTORS, target=config.TARGET, job_id=job_id3)

[INFO] Model drift report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/a763ea0e608c4fc2a090f365669da986_model_drift_report.html


[INFO] Model drift report saved as /Users/girabawe/Projects/ProjectPro/project_01_model-testing/dags/results/c4930da6c81a4fcdaf708453c1bad93d_model_drift_report.html


In [117]:
print(f"retrain: {md_1_3['retrain']}")
md_1_3['report']

retrain: False


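`check_model_drift` delegates the performance side to deepchecks' `RocReport`, with a minimum AUC of 0.7. The scikit-learn sketch below illustrates the same idea without deepchecks: score a fitted binary classifier on a held-out reference batch and on a current batch, and flag retraining when the current AUC falls below `min_auc` or drops by more than `max_drop`. The function, the synthetic batches and the `max_drop=0.05` margin are hypothetical. It does not reproduce `TrainTestPredictionDrift`, which compares prediction distributions rather than accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def auc_drift(model, X_ref, y_ref, X_cur, y_cur, min_auc: float = 0.7, max_drop: float = 0.05) -> dict:
    """Compare a fitted binary classifier's ROC AUC on a held-out reference batch and a current batch."""
    auc_ref = roc_auc_score(y_ref, model.predict_proba(X_ref)[:, 1])
    auc_cur = roc_auc_score(y_cur, model.predict_proba(X_cur)[:, 1])
    # Retrain if the current AUC is too low in absolute terms or dropped too much.
    retrain = bool(auc_cur < min_auc or auc_ref - auc_cur > max_drop)
    return {"auc_ref": float(auc_ref), "auc_cur": float(auc_cur), "retrain": retrain}


rng = np.random.default_rng(0)


def make_batch(n: int, informative: int):
    """Synthetic batch whose label depends on feature `informative` plus noise."""
    X = rng.normal(size=(n, 2))
    y = (X[:, informative] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y


X_train, y_train = make_batch(4000, informative=0)
X_ref, y_ref = make_batch(2000, informative=0)    # held-out batch, same concept as training
X_same, y_same = make_batch(2000, informative=0)  # new batch, no drift
X_new, y_new = make_batch(2000, informative=1)    # concept drift: label now follows feature 1

model = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0).fit(X_train, y_train)
print(auc_drift(model, X_ref, y_ref, X_same, y_same))
print(auc_drift(model, X_ref, y_ref, X_new, y_new))
```

The reference AUC is measured on held-out data rather than on the training set on purpose: in-sample AUC is optimistic, and comparing against it would make almost any new batch look like a drop.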