# ML Monitoring - Feature importance weighted drift detection and automated retraining

## Overview

Over time, every model's performance decays because of data drift and concept drift. One common remedy is to detect drift and retrain the model automatically. However, drift does not affect model performance equally across features. In this project, we combine each feature's importance with the drift score from a statistical test to decide whether to retrain the model. The model is therefore retrained when features with high importance drift, or when features with low importance drift strongly enough to outweigh their low importance.
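Below is a minimal sketch of this rule for distance-based tests. It assumes that larger distances mean more drift and uses a threshold of 0.1, the value the notebook uses later. The feature names and numbers are invented. Each distance is scaled by the feature's importance relative to the most important feature, so a low-importance feature only triggers retraining when its drift is large:

```python
# Hypothetical features: name -> (raw importance, drift distance); values are invented
features = {
    "important_feature": (2.0, 0.15),   # top importance, moderate drift
    "minor_feature": (0.4, 0.30),       # low importance, moderate drift
    "minor_but_shifted": (0.4, 0.90),   # low importance, strong drift
}
THRESHOLD = 0.1  # distance threshold used by the notebook

def weighted_drift(features, threshold=THRESHOLD):
    """Weight each drift distance by importance relative to the top feature."""
    max_importance = max(importance for importance, _ in features.values())
    result = {}
    for name, (importance, distance) in features.items():
        score = (importance / max_importance) * distance
        result[name] = (score, score > threshold)
    return result

for name, (score, detected) in weighted_drift(features).items():
    print(f"{name}: weighted score={score:.3f}, drift={detected}")
```

Here `important_feature` (0.15) and `minor_but_shifted` (0.18) cross the threshold, but `minor_feature` (0.06) does not.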

## Dataset:

* Contains warehouse demand data from 2017-01-01 to 2020-11-15
* Uses preprocessed data
* No data leakage: missing values were filled without fitting any transformation on the data

## Assumptions

* Initial model deployment in March 2019
* True labels (actual demand) become available over the weekend
* The model is monitored on a weekly basis
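Weekly monitoring groups production rows into Monday-to-Sunday windows. This is the same arithmetic that `simulate_Weekly_experiment` uses further down. Here is a standard-library sketch, with `week_window` as a hypothetical helper name:

```python
from datetime import date, timedelta

def week_window(day):
    """Return the (Monday, Sunday) pair of the week containing `day`."""
    start = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    end = start + timedelta(days=6)
    return start, end

print(week_window(date(2018, 12, 26)))
```

For 2018-12-26 this returns 2018-12-24 to 2018-12-30, which is the window printed for experiment week 1.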

## Libraries

In [3]:
import pandas as pd
import numpy as np
from datetime import date,datetime,timedelta
from sklearn.model_selection import train_test_split
from rdt import HyperTransformer
import xgboost as xgb
import optuna
import traceback
import shap
import pickle
import logging
from importlib import reload
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error, mean_absolute_error, r2_score
from evidently import ColumnMapping
from evidently.analyzers.stattests import StatTest
from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataDriftTab, CatTargetDriftTab, NumTargetDriftTab, RegressionPerformanceTab
from evidently.options import DataDriftOptions
from evidently.model_profile import Profile
from evidently.model_profile.sections import DataDriftProfileSection, NumTargetDriftProfileSection
import plotly.graph_objects as go
import plotly.figure_factory as ff
import plotly.express as px
import os
import json
import mlflow
from functools import wraps
from mlflow.tracking import MlflowClient
%matplotlib inline

In [4]:
# set up logging
reload(logging)
logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.INFO, datefmt='%I:%M:%S')

In [5]:
# mlflow runner
def mlflow_runner(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # experiment_name must be passed as a keyword argument
        experiment_name = kwargs['experiment_name']
        try:
            # set_experiment also creates the experiment if it does not exist yet
            mlflow.set_experiment(experiment_name)
        except Exception:
            logging.error(f'Could not set experiment {experiment_name}')
            logging.error('Stack trace:{}'.format(traceback.format_exc()))
            mlflow.create_experiment(experiment_name)
            mlflow.set_experiment(experiment_name)
        # run the wrapped function inside its own MLflow run
        with mlflow.start_run():
            return func(*args, **kwargs)

    return wrapper
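Because `mlflow_runner` looks up `kwargs['experiment_name']`, decorated functions such as `fit_model` must receive `experiment_name` as a keyword argument. If it is passed positionally, the lookup raises `KeyError` before any run starts. The following stripped-down sketch shows the same decorator pattern without MLflow. `fake_run`, `runner`, and `train` are hypothetical stand-ins:

```python
from contextlib import contextmanager
from functools import wraps

runs = []  # experiment names seen by the stand-in run context

@contextmanager
def fake_run(experiment_name):
    # Hypothetical stand-in for mlflow.set_experiment + mlflow.start_run
    runs.append(experiment_name)
    yield

def runner(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        experiment_name = kwargs["experiment_name"]  # KeyError if passed positionally
        with fake_run(experiment_name):
            return func(*args, **kwargs)
    return wrapper

@runner
def train(x, experiment_name=None):
    return x * 2

print(train(3, experiment_name="demo"))
try:
    train(3, "demo")
except KeyError as err:
    print("KeyError:", err)
```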

## Data

### Load data

In [6]:
df = pd.read_csv('cpp_demand_forecasting_clean_data_v2.csv', index_col=0)

In [7]:
logging.info(f'dataset shape : {df.shape}')

03:05:15 INFO:dataset shape : (13315, 33)


In [8]:
# set random state
rng = np.random.RandomState(0)

In [9]:
df.head(2)

| ID | date | warehouse_ID | Latitude | Longitude | Product_Type | year | month | is_weekend | is_warehouse_closed | daily_dispatch_count | ... | statewise_land_area_per_sqmile | statewise_population_per_sqmile | statewise_geographic_region | geographic_region_division | statewise_median_age | statewise_median_household_income | total_count | yearly_count | monthly_count | weekly_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x2710 | 2017-01-01 | WH_0x3e9 | 41.681471 | -72.794746 | Type_A | 2017 | 1 | Yes | No | 5.5 | ... | 4842 | 744.722016 | new_england | north_east | 41.2 | 78833 | 0 | 0 | 0 | 0 |
| 0x33e6 | 2017-01-01 | WH_0x3ea | 38.749077 | -105.18306 | Type_A | 2017 | 1 | Yes | No | 6.1 | ... | 103642 | 56.078318 | mountain | west | 37.3 | 41053 | 0 | 0 | 0 | 0 |


In [10]:
df.date = pd.to_datetime(df.date)

**Data available at initial model deployment: 2017-01-01 up to (but not including) 2019-03-01**

In [11]:
intial_df = df[df.date < datetime(2019,3,1)]

In [12]:
# Split the dataset
train_df, test_df = train_test_split(intial_df,
                                     test_size=0.3,
                                     shuffle=False,
                                     random_state=rng)
eval_df, serve_df = train_test_split(test_df,
                                     test_size=0.5,
                                     shuffle=False,
                                     random_state=rng)

In [13]:
logging.info(f'train_df shape : {train_df.shape}')
logging.info(f'eval_df shape : {eval_df.shape}')
logging.info(f'serve_df shape : {serve_df.shape}')

03:05:22 INFO:train_df shape : (2794, 33)
03:05:22 INFO:eval_df shape : (599, 33)
03:05:22 INFO:serve_df shape : (599, 33)
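With `shuffle=False`, `train_test_split` keeps chronological order. For a float `test_size` and no `train_size`, scikit-learn sizes the test part as `ceil(test_size * n)` and gives the rest to training. The logged shapes imply 3,992 rows before the split (2,794 + 599 + 599). The arithmetic below reproduces the three sizes:

```python
from math import ceil

def split_sizes(n_samples, test_size):
    """Mirror scikit-learn's sizing for a float test_size with train_size=None."""
    n_test = ceil(test_size * n_samples)
    return n_samples - n_test, n_test

n_total = 2794 + 599 + 599            # rows before the split, from the logged shapes
n_train, n_test = split_sizes(n_total, 0.3)
n_eval, n_serve = split_sizes(n_test, 0.5)
print(n_train, n_eval, n_serve)       # 2794 599 599
```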


### Data preprocessing

In [14]:
def split_features_target(df):
    target =  'daily_dispatch_count'
    drop_feats = ['weekly_dispatch_count']
    y = df[target]
    X = df.drop(columns = drop_feats+[target])
    return X,y

In [35]:
X,y = split_features_target(train_df)
X_eval,y_eval = split_features_target(eval_df)
X_serve,y_serve = split_features_target(serve_df)

### Data transformation

In [36]:
ht = HyperTransformer()

In [37]:
ht.detect_initial_config(data=X)

Detecting a new config from the data ... SUCCESS
Setting the new config ... SUCCESS
Config:
{
    "sdtypes": {
        "date": "datetime",
        "warehouse_ID": "categorical",
        "Latitude": "numerical",
        "Longitude": "numerical",
        "Product_Type": "categorical",
        "year": "numerical",
        "month": "numerical",
        "is_weekend": "categorical",
        "is_warehouse_closed": "categorical",
        "week": "numerical",
        "state": "categorical",
        "county": "categorical",
        "state_cases": "numerical",
        "state_deaths": "numerical",
        "county_cases": "numerical",
        "county_deaths": "numerical",
        "day_of_week": "numerical",
        "days_since_warehouse_started": "numerical",
        "state_abbr": "categorical",
        "is_holiday": "numerical",
        "statewise_population": "numerical",
        "statewise_land_area_per_sqmile": "numerical",
        "statewise_population_per_sqmile": "numerical",
        "statew

In [38]:
ht.fit(X)
X_transformed = ht.transform(X)
X_eval_transformed = ht.transform(X_eval)
X_serve_transformed = ht.transform(X_serve)

The data contains 116 new categories that were not seen in the original data (examples: {'WH_0x3ed'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.
The data contains 116 new categories that were not seen in the original data (examples: {'florida'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.
The data contains 116 new categories that were not seen in the original data (examples: {'miami-dade'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.
The data contains 116 new categories that were not seen in the original data (examples: {'FL'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.
The data contains 116 new categories that were not seen in the original data (examples: {'mid_atlantic'}). Assigning 

## Modelling

In [39]:
### Hypertuning

def objective(trial, data=X_transformed, target=y):
    # search space; suggest_float(..., log=True) replaces the deprecated suggest_loguniform
    param = {
        'n_estimators': trial.suggest_int('n_estimators', 1, 500),
        'max_depth': trial.suggest_int('max_depth', 2, 20),
        'lambda': trial.suggest_float('lambda', 1e-3, 10.0, log=True),
        'alpha': trial.suggest_float('alpha', 1e-3, 10.0, log=True),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 300),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.5, log=True),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.3, 0.9),
        'subsample': trial.suggest_float('subsample', 0.4, 0.9),
    }
    # fixed settings: an int seed, n_jobs instead of the deprecated nthread,
    # and early stopping configured in the constructor
    model = xgb.XGBRegressor(**param, random_state=0, n_jobs=-1, early_stopping_rounds=20)
    # train on the function arguments rather than on globals
    model.fit(data, target, eval_set=[(X_eval_transformed, y_eval)], verbose=False)
    preds = model.predict(X_eval_transformed)
    # maximise R^2 on the evaluation split
    return r2_score(y_eval, preds)

# show only warnings and errors from Optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

In [57]:
@mlflow_runner
def fit_model(X,y, X_test,y_test, params=None,mlflow_log = False, experiment_name=None):
    if params:
        reg = xgb.XGBRegressor(**params)
    else:
        reg = xgb.XGBRegressor()
    reg.fit(X,y,eval_set=[(X_test,y_test)],early_stopping_rounds=10,verbose=False)
    if mlflow_log:
        mlflow.log_dict(reg.get_params(),'xgboost_regressor_params')
        
    return reg

In [58]:
reg = fit_model(X_transformed,y, X_eval_transformed,y_eval, mlflow_log=True,experiment_name='warehouse_demand_forecasting')

`early_stopping_rounds` in `fit` method is deprecated for better compatibility with scikit-learn, use `early_stopping_rounds` in constructor or`set_params` instead.


#### Model evaluation

In [59]:
def rewrite_name(df_type, error_type):
    # e.g. ('train', 'rmse') -> 'train_rmse'
    return f"{df_type}_{error_type}"

def get_scores(y_actual, y_predicted, df_type='train'):
    mse = mean_squared_error(y_actual, y_predicted)
    # RMSE as sqrt(MSE); the `squared` argument is deprecated in recent scikit-learn
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_actual, y_predicted)
    # MAPE is returned as a fraction (0.08 == 8%)
    mape = mean_absolute_percentage_error(y_actual, y_predicted)
    return {rewrite_name(df_type, 'rmse'): rmse, rewrite_name(df_type, 'mse'): mse,
            rewrite_name(df_type, 'mae'): mae, rewrite_name(df_type, 'mape'): mape}

def dataset_results(X,y, model, df_type):

    y_pred = model.predict(X)
    
    return get_scores(y,y_pred,df_type)
    

#### Train score

In [60]:
dataset_results(X_transformed,y, reg, 'train')

{'train_rmse': 0.8588220704988938,
 'train_mse': 0.7375753487760068,
 'train_mae': 0.6316703897078888,
 'train_mape': 0.0849176099824632}

#### Eval score

In [61]:
dataset_results(X_eval_transformed,y_eval, reg, 'eval')

{'eval_rmse': 0.9780817870850148,
 'eval_mse': 0.9566439822274162,
 'eval_mae': 0.7291432013694751,
 'eval_mape': 0.11933018414067095}

#### Serve score

In [62]:
dataset_results(X_serve_transformed,y_serve, reg, 'serve')

{'serve_rmse': 1.013378544545335,
 'serve_mse': 1.0269360745448215,
 'serve_mae': 0.7789813618827144,
 'serve_mape': 0.16712777485409916}
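These metrics follow the standard definitions. Note that scikit-learn's `mean_absolute_percentage_error` returns a fraction rather than a percentage, so a `serve_mape` of 0.167 means roughly 16.7%. A hand-computed check on invented values:

```python
import numpy as np

y_true = np.array([2.0, 4.0, 5.0])   # invented actuals
y_pred = np.array([3.0, 4.0, 3.0])   # invented predictions
errors = y_pred - y_true

mse = np.mean(errors ** 2)                        # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)                               # square root of MSE
mae = np.mean(np.abs(errors))                     # (1 + 0 + 2) / 3
mape = np.mean(np.abs(errors) / np.abs(y_true))   # (0.5 + 0 + 0.4) / 3, a fraction
print(mse, rmse, mae, mape)
```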

#### Final model trained using data from 2017-01-01 to 2019-03-01

In [101]:
final_X, final_y = split_features_target(intial_df)
ht.fit(final_X)
final_X_transform = ht.transform(final_X)
reg = fit_model(final_X_transform,final_y, X_eval_transformed,y_eval, mlflow_log=True,experiment_name='warehouse_demand_forecasting')


`early_stopping_rounds` in `fit` method is deprecated for better compatibility with scikit-learn, use `early_stopping_rounds` in constructor or`set_params` instead.



### Feature attribution weighted drift detection

In [102]:
def get_feature_importance(model, X):
    # X: transformed feature DataFrame on which SHAP importance is computed

    # Explain model predictions using shap library:
    explainer = shap.TreeExplainer(model)
    shap_scores = explainer.shap_values(X)

    # mean of absolute shap values for every feature
    mean_abs_shap_values = pd.Series(
        np.abs(shap_scores).mean(axis=0),
        index=X.columns).sort_values(ascending=False)
    return mean_abs_shap_values

In [103]:
def get_relative_feature_importance_to_max(feature_importance):
    
    # maximum feature importance
    max_feature_importance = feature_importance.max()

    # relative_feature_importance_to_max is relative feature importance w.r.t max_feature_importance
    relative_feature_importance_to_max = feature_importance / max_feature_importance
    return relative_feature_importance_to_max.to_dict()
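`get_feature_importance` ranks features by mean absolute SHAP value. `get_relative_feature_importance_to_max` then divides by the largest of those means, so the top feature gets 1.0. Here is a toy numpy version of both steps. The SHAP matrix is invented (rows are samples, columns are features):

```python
import numpy as np

feature_names = ["Product_Type", "is_weekend", "month"]
# Invented SHAP values: 3 samples (rows) x 3 features (columns)
shap_scores = np.array([
    [2.0, -0.5, 0.1],
    [-3.0, 0.3, 0.0],
    [1.0, -0.1, -0.2],
])

mean_abs = np.abs(shap_scores).mean(axis=0)  # mean |SHAP| per feature
relative = mean_abs / mean_abs.max()         # top feature becomes 1.0
for name, raw, rel in zip(feature_names, mean_abs, relative):
    print(f"{name}: mean|SHAP|={raw:.3f}, relative={rel:.3f}")
```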

In [104]:
numerical_features = [
    'statewise_population_per_sqmile', 'state_cases', 'state_deaths'
]
numerical_features_value = [
    feature + '.value' for feature in numerical_features
]
categorical_features = [
    'warehouse_ID', 'Product_Type', 'is_weekend', 'is_warehouse_closed',
    'state', 'day_of_week', 'is_holiday', 'county'
]
categorical_features_value = [
    feature + '.value' for feature in categorical_features
]
column_mapping = ColumnMapping(id='ID',
                               datetime='date',
                               numerical_features=numerical_features_value,
                               categorical_features=categorical_features_value,
                               task='regression',
                               target='daily_dispatch_count',
                               prediction='y_pred')

In [105]:
def get_drift_profile(reference, production, profile, mlflow_log=False):
    # build an Evidently profile with a single section and compute it
    drift_profile = Profile(sections=[profile()])
    drift_profile.calculate(reference,
                            production,
                            column_mapping=column_mapping)
    drift_profile_report = json.loads(drift_profile.json())

    if mlflow_log:
        if profile == DataDriftProfileSection:
            name = 'data_drift_profile'
        elif profile == NumTargetDriftProfileSection:
            name = 'num_target_drift_profile'
        else:
            # fall back to the section class name for any other profile
            name = profile.__name__
        mlflow.log_dict(drift_profile_report, f"{name}.json")

    return drift_profile_report

In [106]:
def get_data_drift(drift_profile):

    drifts = []
    for feature in column_mapping.numerical_features + column_mapping.categorical_features:
        drifts.append(
            (feature, drift_profile['data_drift']['data']['metrics'][feature]['drift_score'], 
             drift_profile['data_drift']['data']['metrics'][feature]['stattest_name'], 
             drift_profile['data_drift']['data']['metrics'][feature]['drift_detected']))
    return  pd.DataFrame(
        drifts,
        columns=['feature', 'drift_score', 'stattest_name', 'drift_detected'])

In [107]:
def get_weighted_data_drift_score(drift, feature_importance, relative_feature_importance_to_max_dict, threshold, stattest_type):
    
    #feature_importance
    drift['feature_importance'] = drift['feature'].map(feature_importance)
    
    # map relative feature importance
    drift['relative_feature_importance'] = drift['feature'].map(relative_feature_importance_to_max_dict)
    if stattest_type == 'p_value':
        # relative feature importance weighted drift score
        inverse_feature_importance_wrt_max_feature_importance = drift.relative_feature_importance.max() / drift.relative_feature_importance
        drift['feature_importance_weighted_drift_score'] = (drift.drift_score * inverse_feature_importance_wrt_max_feature_importance).replace(np.inf, 1)

        # drift detection based on the weighted drift score based on threshold
        drift['feature_importance_weighted_drift_detected'] = drift[
            'feature_importance_weighted_drift_score'] < threshold
    else:
        # relative feature importance weighted drift score
        drift['feature_importance_weighted_drift_score'] = drift[
            'relative_feature_importance'] * drift['drift_score']

        # drift detection based on the weighted drift score based on threshold
        drift['feature_importance_weighted_drift_detected'] = drift[
            'feature_importance_weighted_drift_score'] > threshold

    return drift
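For p-value tests the comparison runs the other way, because small p-values indicate drift. `get_weighted_data_drift_score` therefore multiplies each p-value by the inverse of its relative importance. A feature with half the top importance needs a p-value twice as small to fall below 0.05. The numpy sketch below applies the same formula to invented values. It also mirrors the notebook's replacement of infinite scores (from zero-importance features) with 1:

```python
import numpy as np

relative_importance = np.array([1.0, 0.5, 0.0])  # invented; top feature is 1.0
p_values = np.array([0.04, 0.04, 0.001])          # invented drift p-values
threshold = 0.05

with np.errstate(divide="ignore"):                # 1.0 / 0.0 -> inf without a warning
    inverse_importance = relative_importance.max() / relative_importance
weighted = p_values * inverse_importance          # 0.04, 0.08, inf
weighted[np.isinf(weighted)] = 1.0                # zero-importance features never trigger
detected = weighted < threshold
print(weighted, detected)
```

Both of the first two features have p = 0.04, but only the most important one is flagged.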

In [108]:
def is_weighted_data_drift_detected(data_drift_profile, production,  model):
    drift_detected = False
    
    feature_importance = get_feature_importance(model, production)
    # relative feature importance based on mean shap values
    relative_feature_importance_to_max_dict = get_relative_feature_importance_to_max(feature_importance)
    drift_scores = get_data_drift(data_drift_profile)
    stattest_type = drift_scores['stattest_name'].iloc[0].split()[1]
    if stattest_type == 'p_value':
        threshold = 0.05
    else:
        threshold = 0.1
    weighted_drift_scores = get_weighted_data_drift_score(drift_scores, feature_importance, relative_feature_importance_to_max_dict,
                                                     threshold, stattest_type)

    # drift is detected if at least one feature exceeds its weighted threshold
    drift_detected = bool(
        weighted_drift_scores['feature_importance_weighted_drift_detected'].any())
    return drift_detected, weighted_drift_scores
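`get_data_drift` reads each feature's entry under `drift_profile['data_drift']['data']['metrics']`. `is_weighted_data_drift_detected` then takes the second word of the first feature's `stattest_name` (for example, `'distance'` in `'Jensen-Shannon distance'`) to pick the threshold. This assumes every feature uses the same family of test. The sketch below repeats those steps on a hand-written dictionary that contains only the keys the notebook reads. The drift values are invented, and real Evidently profiles contain more fields:

```python
# Hand-written stand-in for the parsed Evidently profile (values invented)
profile = {
    "data_drift": {"data": {"metrics": {
        "Product_Type.value": {"drift_score": 0.83,
                               "stattest_name": "Jensen-Shannon distance",
                               "drift_detected": True},
        "is_weekend.value": {"drift_score": 0.04,
                             "stattest_name": "Jensen-Shannon distance",
                             "drift_detected": False},
    }}}
}

def extract_drift_rows(profile, features):
    """Return (feature, drift_score, stattest_name, drift_detected) tuples."""
    metrics = profile["data_drift"]["data"]["metrics"]
    return [(f, metrics[f]["drift_score"], metrics[f]["stattest_name"],
             metrics[f]["drift_detected"]) for f in features]

rows = extract_drift_rows(profile, ["Product_Type.value", "is_weekend.value"])
stattest_type = rows[0][2].split()[1]   # 'distance'
threshold = 0.05 if stattest_type == "p_value" else 0.1
print(stattest_type, threshold)
```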

In [109]:
def overlay_distribution(data1, data2, feature,  opacity=0.5, mlflow_log = False):
    normalization_type = 'probability density'
    fig = go.Figure()
    fig.add_trace(go.Histogram(x=data1, name="training",histnorm=normalization_type))
    fig.add_trace(go.Histogram(x=data2, name="production",histnorm=normalization_type))
    feature = feature.split('.')[0]
    # Overlay both histograms
    fig.update_layout(barmode='overlay', title=feature+" Distribution",
                      xaxis_title=feature,
                      yaxis_title=normalization_type,)
    # Reduce opacity to see both histograms
    fig.update_traces(opacity=opacity)
    if mlflow_log:
        mlflow.log_figure(fig, f"{feature}_Distribution.html")
        
    fig.show()

In [110]:
def display_drifted_features(reference, production, drift_df, mlflow_log=False):
    # reverse-transform so the plots show the original (untransformed) values
    reference_rdt = ht.reverse_transform(reference)
    production_rdt = ht.reverse_transform(production)
    for index, row in drift_df.iterrows():
        feature = row['feature']
        raw_feature = feature.split('.')[0]  # e.g. 'Product_Type.value' -> 'Product_Type'
        if row['feature_importance_weighted_drift_detected']:
            logging.info(f"drift detected for {feature}")
            logging.info(f"drift_score : {row['drift_score']}")
            logging.info(f"stattest_name : {row['stattest_name']}")
            logging.info(f"raw_feature_importance : {row['feature_importance']}")
            logging.info(f"relative_feature_importance : {row['relative_feature_importance']}")
            logging.info(f"feature_importance_weighted_drift_score : {row['feature_importance_weighted_drift_score']}")
            overlay_distribution(reference_rdt[raw_feature], production_rdt[raw_feature], feature,
                                 opacity=.4, mlflow_log=mlflow_log)
        # log the drift statistics of every monitored feature, drifted or not
        if mlflow_log:
            mlflow.log_param(f"{feature}_feature_importance_weighted_drift_detected", row['feature_importance_weighted_drift_detected'])
            mlflow.log_param(f"{feature}_drift_score", row['drift_score'])
            mlflow.log_param(f"{feature}_stattest_name", row['stattest_name'])
            mlflow.log_param(f"{feature}_raw_feature_importance", row['feature_importance'])
            mlflow.log_param(f"{feature}_relative_feature_importance", row['relative_feature_importance'])
            mlflow.log_param(f"{feature}_feature_importance_weighted_drift_score", row['feature_importance_weighted_drift_score'])

    

In [111]:
def display_target_drift(y_reference, y_production, target_name, target_drift_detected, drift_score, stattest_type, mlflow_log=False):
    logging.info(f"Target drift detected for {target_name}")
    logging.info(f"drift_score : {drift_score}")
    logging.info(f"stattest_name : {stattest_type}")
    if mlflow_log:
        mlflow.log_param(f"{target_name}_drift_detected", target_drift_detected)
        mlflow.log_param(f"{target_name}_target_drift_score", drift_score)
        mlflow.log_param(f"{target_name}_target_stattest_name", stattest_type)
    overlay_distribution(y_reference, y_production, target_name, opacity=.4, mlflow_log=mlflow_log)
            

#### Test for target drift

In [112]:
def determine_test_type(reference):

    num_data = reference.shape[0]

    if num_data <= 1000:
        return 'p_value'
    else:
        return 'distance'
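`determine_test_type` appears to mirror Evidently's default behaviour. Evidently uses statistical tests (p-values) when the reference data has at most 1,000 rows and distance metrics above that. `test_target_drift` then compares the score in opposite directions for the two families. The standalone sketch below takes a row count instead of a DataFrame. `stattest_family` and `target_drift_decision` are hypothetical names:

```python
def stattest_family(n_reference_rows):
    """p-value tests for small reference sets, distance metrics otherwise."""
    return "p_value" if n_reference_rows <= 1000 else "distance"

def target_drift_decision(drift_score, n_reference_rows):
    """True when the score crosses the notebook's threshold for its test family."""
    if stattest_family(n_reference_rows) == "p_value":
        return drift_score <= 0.05  # small p-value: distributions differ
    return drift_score >= 0.1       # large distance: distributions differ

print(target_drift_decision(0.85, 2794))  # distance family
print(target_drift_decision(0.20, 500))   # p-value family
```

The weekly simulation uses the 2,794-row training set as its reference, so the distance branch applies there. This matches the `stattest_name : distance` lines in the logs.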

In [113]:
def test_target_drift(X_reference, y_reference, X_production, y_production, mlflow_log=False):
    target_drift_detected = False
    # get target drift report (only numerical targets, i.e. regression, are handled)
    if column_mapping.task == 'regression':
        profile = NumTargetDriftProfileSection
        drift_name = 'num_target_drift'
    else:
        raise NotImplementedError('Only regression targets are supported')
    target_drift_profile = get_drift_profile(X_reference.join(y_reference), X_production.join(y_production), profile, mlflow_log)

    drift_score = target_drift_profile[drift_name]['data']['metrics']['target_drift']
    target_name = target_drift_profile[drift_name]['data']['utility_columns']['target']
    stattest_type = determine_test_type(X_reference)
    if stattest_type == 'p_value':
        # small p-value -> the target distribution has shifted
        target_drift_detected = drift_score <= 0.05
    else:
        # large distance -> the target distribution has shifted
        target_drift_detected = drift_score >= 0.1

    if mlflow_log:
        display_target_drift(y_reference, y_production, target_name, target_drift_detected, drift_score, stattest_type, mlflow_log=True)
    return target_drift_detected
      

In [114]:
def test_for_drift(X_reference, y_reference, X_production, y_production, model, mlflow_log = False):

    
    target_drift_detected = test_target_drift(X_reference, y_reference, X_production, y_production, mlflow_log)

    logging.info(f"target drift detect: {target_drift_detected}")
    # get data drift report
    data_drift_profile = get_drift_profile(X_reference, X_production,
                                           DataDriftProfileSection, mlflow_log)
    # data drift detected?
    drift_detected, drift_df = is_weighted_data_drift_detected(data_drift_profile,
                                                     X_production,  model)
    
    if drift_detected:
        logging.info('data set drift detected')
        display_drifted_features(X_reference, X_production, drift_df, mlflow_log)
    else:
        logging.info('drift not detected')
    drift_detected = drift_detected or target_drift_detected
    return drift_detected

In [115]:
@mlflow_runner
def test_mlflow(experiment_name):
    # Log parameters
    mlflow.log_param("begin", datetime(2020,1,2))
    mlflow.log_param("end", datetime(2022,1,2))
    mlflow.log_dict(X_transformed.sample(500).to_dict(), 'input_features.json')
    mlflow.log_dict(X_eval_transformed.sample(500).to_dict(), 'target.json')
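    # sample row labels once so that features and targets stay aligned
    ref_idx = X_transformed.sample(500, random_state=0).index
    prod_idx = X_eval_transformed.sample(500, random_state=0).index
    return test_for_drift(X_transformed.loc[ref_idx], y.loc[ref_idx],
                          X_eval_transformed.loc[prod_idx], y_eval.loc[prod_idx],
                          reg, mlflow_log=True)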


In [116]:
#test_mlflow(experiment_name = 'Data Drift Evaluation with Evidently')

In [117]:
@mlflow_runner
def simulate_Weekly_experiment(df, model, preds, experiment_name, week_num):
    # Get the Monday-Sunday week containing the earliest remaining row
    row = df.iloc[0]
    dt = row['date']
    start = dt - timedelta(days=dt.weekday())
    end = start + timedelta(days=6)
    sub_df = df.set_index('date')[start:end]
    sub_df = sub_df.reset_index().set_index('ID')
    # keep only the rows after this week for the next iteration
    df = df[(df.date.dt.date > end.date())]
    print(f'Experiment week {week_num}')
    print('---------------------------------')
    print(f"Experiment start date : {start.to_pydatetime()}")
    print(f"Experiment end_date : {end.to_pydatetime()}")
    mlflow.log_param('experiment_week', week_num)
    mlflow.log_param('week_begin_experiment_start_date', start)
    mlflow.log_param('week_end_experiment_end_date', end)

    # Transform sub_df to X, y
    X_prod, y_prod = split_features_target(sub_df)
    X_prod_transformed = ht.transform(X_prod)
    mlflow.log_dict(X_prod_transformed.to_dict(), 'input_features.json')
    mlflow.log_dict(y_prod.to_dict(), 'target.json')
    # weekly predictions
    weekly_predictions = model.predict(X_prod_transformed)

    mlflow.log_dict(pd.Series(weekly_predictions, name=y_prod.name, index=y_prod.index).to_dict(), 'weekly_predictions.json')
    # score this week's predictions against the actual demand
    scores_dict = get_scores(y_prod, weekly_predictions, "weekly")
    mlflow.log_metrics(scores_dict)
    preds.extend(weekly_predictions.tolist())
    ht.fit(X_prod)
    # compare this week against the training reference, using the model being monitored
    if test_for_drift(X_transformed, y, X_prod_transformed, y_prod, model, mlflow_log=True):
        logging.info('Retraining model')
        model.fit(X_prod_transformed, y_prod)
    else:
        logging.info('No drift detected')
    return df, preds

In [118]:
def simulate_model(df, model, experiment_name):
    
    preds = []
    df = df.reset_index()
    current_df = df
    week_num = 1
    while current_df.shape[0] > 0:

        current_df, preds = simulate_Weekly_experiment(current_df, model, preds, experiment_name = experiment_name, week_num=week_num)
        week_num +=1
    return df,preds   

In [119]:
df, preds = simulate_model(serve_df, reg, experiment_name='weekly_ml_monitoring')

Experiment week 1
---------------------------------
Experiment start date : 2018-12-24 00:00:00
Experiment end_date : 2018-12-30 00:00:00


03:55:18 INFO:Target drift detected for for daily_dispatch_count
03:55:18 INFO:drift_score : 0.8506966925281619
03:55:18 INFO:stattest_name : distance


03:55:18 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:18 INFO:data set drift detected
03:55:19 INFO:drift detected for Product_Type.value
03:55:19 INFO:drift_score : 0.8325546111576977
03:55:19 INFO:stattest_name : Jensen-Shannon distance
03:55:19 INFO:raw_feature_importance : 2.3783957958221436
03:55:19 INFO:relative_feature_importance : 1.0
03:55:19 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:19 INFO:drift detected for is_weekend.value
03:55:19 INFO:drift_score : 0.8325546111576977
03:55:19 INFO:stattest_name : Jensen-Shannon distance
03:55:19 INFO:raw_feature_importance : 0.4681791365146637
03:55:19 INFO:relative_feature_importance : 0.196846604347229
03:55:19 INFO:feature_importance_weighted_drift_score : 0.1638855481400204


03:55:19 INFO:Retraining model


Experiment week 2
---------------------------------
Experiment start date : 2018-12-31 00:00:00
Experiment end_date : 2019-01-06 00:00:00



The data contains 38 new categories that were not seen in the original data (examples: {'WH_0x3ea', 'WH_0x3e9'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.


The data contains 46 new categories that were not seen in the original data (examples: {'No'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.


The data contains 38 new categories that were not seen in the original data (examples: {'connecticut', 'colorado'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.


The data contains 38 new categories that were not seen in the original data (examples: {'teller', 'hartford'}). Assigning them random values. If you want to model new categories, please fit the transformer again with the new data.


The data contains 38 new categories that were not seen in the original data 

03:55:19 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:20 INFO:data set drift detected
03:55:20 INFO:drift detected for Product_Type.value
03:55:20 INFO:drift_score : 0.8325546111576977
03:55:20 INFO:stattest_name : Jensen-Shannon distance
03:55:20 INFO:raw_feature_importance : 2.5258266925811768
03:55:20 INFO:relative_feature_importance : 1.0
03:55:20 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:20 INFO:Retraining model


Experiment week 3
---------------------------------
Experiment start date : 2019-01-07 00:00:00
Experiment end_date : 2019-01-13 00:00:00


03:55:21 INFO:Target drift detected for for daily_dispatch_count
03:55:21 INFO:drift_score : 0.7807266090424902
03:55:21 INFO:stattest_name : distance


03:55:21 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:21 INFO:data set drift detected
03:55:21 INFO:drift detected for Product_Type.value
03:55:21 INFO:drift_score : 0.8325546111576977
03:55:21 INFO:stattest_name : Jensen-Shannon distance
03:55:21 INFO:raw_feature_importance : 1.535874605178833
03:55:21 INFO:relative_feature_importance : 1.0
03:55:21 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:21 INFO:Retraining model


Experiment week 4
---------------------------------
Experiment start date : 2019-01-14 00:00:00
Experiment end_date : 2019-01-20 00:00:00


03:55:22 INFO:Target drift detected for for daily_dispatch_count
03:55:22 INFO:drift_score : 0.7568025391727874
03:55:22 INFO:stattest_name : distance


03:55:22 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:22 INFO:data set drift detected
03:55:22 INFO:drift detected for Product_Type.value
03:55:22 INFO:drift_score : 0.8325546111576977
03:55:22 INFO:stattest_name : Jensen-Shannon distance
03:55:22 INFO:raw_feature_importance : 1.6264139413833618
03:55:22 INFO:relative_feature_importance : 1.0
03:55:22 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:22 INFO:Retraining model


Experiment week 5
---------------------------------
Experiment start date : 2019-01-21 00:00:00
Experiment end_date : 2019-01-27 00:00:00


03:55:23 INFO:Target drift detected for for daily_dispatch_count
03:55:23 INFO:drift_score : 0.739195939029973
03:55:23 INFO:stattest_name : distance


03:55:23 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:23 INFO:data set drift detected
03:55:23 INFO:drift detected for Product_Type.value
03:55:23 INFO:drift_score : 0.8325546111576977
03:55:23 INFO:stattest_name : Jensen-Shannon distance
03:55:23 INFO:raw_feature_importance : 1.5279386043548584
03:55:23 INFO:relative_feature_importance : 1.0
03:55:23 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:24 INFO:Retraining model


Experiment week 6
---------------------------------
Experiment start date : 2019-01-28 00:00:00
Experiment end_date : 2019-02-03 00:00:00


03:55:24 INFO:Target drift detected for for daily_dispatch_count
03:55:24 INFO:drift_score : 0.6941825430893434
03:55:24 INFO:stattest_name : distance


03:55:24 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:24 INFO:data set drift detected
03:55:25 INFO:drift detected for Product_Type.value
03:55:25 INFO:drift_score : 0.8325546111576977
03:55:25 INFO:stattest_name : Jensen-Shannon distance
03:55:25 INFO:raw_feature_importance : 1.5493420362472534
03:55:25 INFO:relative_feature_importance : 1.0
03:55:25 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:25 INFO:Retraining model


Experiment week 7
---------------------------------
Experiment start date : 2019-02-04 00:00:00
Experiment end_date : 2019-02-10 00:00:00


03:55:25 INFO:Target drift detected for for daily_dispatch_count
03:55:25 INFO:drift_score : 0.6648371414155752
03:55:25 INFO:stattest_name : distance


03:55:25 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:26 INFO:data set drift detected
03:55:26 INFO:drift detected for Product_Type.value
03:55:26 INFO:drift_score : 0.8325546111576977
03:55:26 INFO:stattest_name : Jensen-Shannon distance
03:55:26 INFO:raw_feature_importance : 1.768558144569397
03:55:26 INFO:relative_feature_importance : 1.0
03:55:26 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:26 INFO:Retraining model


Experiment week 8
---------------------------------
Experiment start date : 2019-02-11 00:00:00
Experiment end_date : 2019-02-17 00:00:00


03:55:27 INFO:Target drift detected for for daily_dispatch_count
03:55:27 INFO:drift_score : 0.602266990111311
03:55:27 INFO:stattest_name : distance


03:55:27 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:27 INFO:data set drift detected
03:55:27 INFO:drift detected for Product_Type.value
03:55:27 INFO:drift_score : 0.8325546111576977
03:55:27 INFO:stattest_name : Jensen-Shannon distance
03:55:27 INFO:raw_feature_importance : 1.8100552558898926
03:55:27 INFO:relative_feature_importance : 1.0
03:55:27 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:27 INFO:Retraining model


Experiment week 9
---------------------------------
Experiment start date : 2019-02-18 00:00:00
Experiment end_date : 2019-02-24 00:00:00


03:55:28 INFO:Target drift detected for for daily_dispatch_count
03:55:28 INFO:drift_score : 0.6454393335295251
03:55:28 INFO:stattest_name : distance


03:55:28 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:28 INFO:data set drift detected
03:55:28 INFO:drift detected for Product_Type.value
03:55:28 INFO:drift_score : 0.8325546111576977
03:55:28 INFO:stattest_name : Jensen-Shannon distance
03:55:28 INFO:raw_feature_importance : 1.8101354837417603
03:55:28 INFO:relative_feature_importance : 1.0
03:55:28 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:29 INFO:Retraining model


Experiment week 10
---------------------------------
Experiment start date : 2019-02-25 00:00:00
Experiment end_date : 2019-03-03 00:00:00


03:55:29 INFO:Target drift detected for for daily_dispatch_count
03:55:29 INFO:drift_score : 0.6019716597946714
03:55:29 INFO:stattest_name : distance


03:55:29 INFO:target drift detect: True

ntree_limit is deprecated, use `iteration_range` or model slicing instead.

03:55:29 INFO:data set drift detected
03:55:30 INFO:drift detected for Product_Type.value
03:55:30 INFO:drift_score : 0.8325546111576977
03:55:30 INFO:stattest_name : Jensen-Shannon distance
03:55:30 INFO:raw_feature_importance : 1.7997283935546875
03:55:30 INFO:relative_feature_importance : 1.0
03:55:30 INFO:feature_importance_weighted_drift_score : 0.8325546111576977


03:55:30 INFO:Retraining model

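The `Product_Type.value` drift score is 0.8325546111576977 in every week above. That number equals √(ln 2), the upper bound of the Jensen-Shannon distance when natural logarithms are used (the default in `scipy.spatial.distance.jensenshannon`). The bound is reached only when the reference and current distributions share no categories at all, so a score stuck at this ceiling means completely non-overlapping windows rather than a gradual shift. The following pure-Python check assumes the natural-log convention. The helper `js_distance` is written here for illustration and is not Evidently's implementation.

```python
import math
from typing import Sequence


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance with natural logs (square root of the JS divergence)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x == 0 contribute nothing to the KL divergence.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)


# Two category distributions with no category in common: the largest possible shift.
disjoint = js_distance([0.6, 0.4, 0.0, 0.0], [0.0, 0.0, 0.3, 0.7])
print(disjoint, math.sqrt(math.log(2)))  # both ≈ 0.8325546111576977

# Partially overlapping distributions score strictly lower.
print(js_distance([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```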

#### Drift detection graphs

In [120]:
import mlflow
expirement_results = mlflow.search_runs(experiment_names=['weekly_ml_monitoring'])

In [121]:
expirement_results.set_index('params.week_end_experiment_end_date',inplace=True)
expirement_results.sort_index(ascending=True,inplace=True)

In [122]:
expirement_results.head(2)

params.week_end_experiment_end_date,run_id,experiment_id,status,artifact_uri,start_time,end_time,metrics.weekly_mse,metrics.weekly_rmse,metrics.weekly_mae,metrics.weekly_mape,...,params.county.value_raw_feature_importance,params.Product_Type.value_drift_score,params.warehouse_ID.value_drift_score,params.state_deaths.value_feature_importance_weighted_drift_score,params.state.value_feature_importance_weighted_drift_score,params.warehouse_ID.value_relative_feature_importance,params.state_cases.value_feature_importance_weighted_drift_detected,tags.mlflow.source.type,tags.mlflow.user,tags.mlflow.source.name
2018-12-30 00:00:00,9e0596649a7348ab840732222cc4fd05,2,FINISHED,file:///C:/Users/shree/projects/ML_Monitoring/...,2022-06-27 10:25:18.399000+00:00,2022-06-27 10:25:19.498000+00:00,0.048557,0.220356,0.179265,0.034397,...,0.0,0.8325546111576977,0.8325546111576977,0.0,0.0,0.0072659021243453,False,LOCAL,shree,d:\anaconda\envs\anaconda_env\lib\site-package...
2018-12-30 00:00:00,3df74c0de9f24d5e9eeb01dfbf1a7448,2,FINISHED,file:///C:/Users/shree/projects/ML_Monitoring/...,2022-06-27 10:05:43.254000+00:00,2022-06-27 10:05:44.644000+00:00,2.14723,1.465343,1.083843,0.186534,...,0.0,0.8325546111576977,0.8325546111576977,0.0,0.0,0.0131383063271641,False,LOCAL,shree,d:\anaconda\envs\anaconda_env\lib\site-package...


In [123]:
expirement_results.artifact_uri.iloc[0]

'file:///C:/Users/shree/projects/ML_Monitoring/Feature_importance_weighted_drift_detection_and_automated_retraining/mlruns/2/9e0596649a7348ab840732222cc4fd05/artifacts'

In [124]:
all_cols = expirement_results.columns
drift_detection_cols = all_cols[all_cols.str.endswith('drift_detected')]
drift_score_cols = all_cols[all_cols.str.endswith('drift_score')]

In [125]:
# MLflow returns params as strings ('True'/'False') or None; only 'True' counts as a detection
for col in drift_detection_cols:
    expirement_results[col] = expirement_results[col].astype(str).eq('True').astype(int)
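MLflow stores run params as strings. Each flag column therefore comes back holding `'True'`, `'False'`, or `None` for runs that never logged that param. Casting those strings straight to `bool` gives the wrong answer, because any non-empty string is truthy. The following pandas example uses made-up values:

```python
import pandas as pd

# Illustrative flag values as MLflow returns them: strings, plus None for missing params.
flags = pd.Series(["True", "False", None])

# Naive cast: the non-empty string 'False' is truthy, so it turns into 1.
naive = flags.astype(bool).astype(int).to_list()

# Explicit comparison: only the string 'True' counts as a detection.
explicit = flags.astype(str).eq("True").astype(int).to_list()

print(naive)     # [1, 1, 0]
print(explicit)  # [1, 0, 0]
```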

In [126]:
for col in drift_score_cols:
    expirement_results[col] = expirement_results[col].replace([None],0).astype(float)

In [127]:
drift_detected_cols = []
for col in drift_detection_cols:
    if expirement_results[col].sum() > 0:
        drift_detected_cols.append(col)
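Once the flags are 0/1 integers, the loop above amounts to summing each flag column and keeping the columns with a positive total. Below is a vectorized sketch. The column names mimic the MLflow naming used in this notebook, and the numbers are illustrative only.

```python
import pandas as pd

# Illustrative 0/1 drift flags, one row per weekly run.
flags = pd.DataFrame(
    {
        "params.daily_dispatch_count_drift_detected": [1, 1, 1],
        "params.state.value_feature_importance_weighted_drift_detected": [0, 0, 0],
        "params.is_weekend.value_feature_importance_weighted_drift_detected": [1, 0, 0],
    }
)

# Number of runs in which each flag fired; keep the columns that fired at least once.
hits = flags.sum()
fired_cols = hits[hits > 0].index.to_list()
print(fired_cols)
```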


In [128]:
drift_detected_cols

['params.daily_dispatch_count_drift_detected',
 'params.Product_Type.value_feature_importance_weighted_drift_detected',
 'params.is_weekend.value_feature_importance_weighted_drift_detected']

In [129]:
# 'params.<feature>[.value]_..._drift_detected' -> '<feature>'
drift_detected_features = pd.Series(drift_detected_cols).map(lambda x: x.split('.')[1]).str.replace('_drift_detected', '', regex=False).to_list()

In [130]:
# Keep every logged column whose name mentions one of the drifted features
drift_detected_experiment_cols = all_cols[[any(feature in col for feature in drift_detected_features) for col in all_cols]]
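Matching with `feature in col` is a plain substring test. A short feature name can therefore pull in columns of other features that merely start with it. This notebook logs `state`, `state_cases` and `state_deaths`, for example. A stricter approach anchors the match on the `params.<feature>` prefix and on the suffixes that appear in the logged column names. In the sketch below, the helper `columns_for_feature` and the suffix pattern are assumptions inferred from the column names shown above, not part of the original pipeline.

```python
import re
from typing import Iterable, List

# Suffixes observed in the logged MLflow param names (assumed to be the complete set).
_SUFFIX = (
    r"(?:(?:target_|feature_importance_weighted_)?(?:drift_score|drift_detected|stattest_name)"
    r"|raw_feature_importance|relative_feature_importance)"
)


def columns_for_feature(columns: Iterable[str], feature: str) -> List[str]:
    """Return only the columns logged for exactly `feature` (hypothetical helper)."""
    pattern = re.compile(rf"^params\.{re.escape(feature)}(?:\.value)?_{_SUFFIX}$")
    return [col for col in columns if pattern.match(col)]


cols = [
    "params.state.value_feature_importance_weighted_drift_score",
    "params.state_deaths.value_feature_importance_weighted_drift_score",
    "params.state_cases.value_feature_importance_weighted_drift_detected",
    "params.daily_dispatch_count_target_drift_score",
]

print([c for c in cols if "state" in c])   # substring test: all three state* columns
print(columns_for_feature(cols, "state"))  # only the 'state' column
```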

In [131]:
drift_detected_experiment_results = expirement_results[drift_detected_experiment_cols]

In [132]:
fig = px.line(drift_detected_experiment_results,  y='params.daily_dispatch_count_target_drift_score', markers=True)
fig.show()

In [133]:
fig = px.line(drift_detected_experiment_results,  y='params.Product_Type.value_feature_importance_weighted_drift_detected', markers=True)
fig.show()

In [134]:
fig = px.line(drift_detected_experiment_results,  y='params.daily_dispatch_count_drift_detected', markers=True)
fig.show()

In [136]:
fig = px.line(drift_detected_experiment_results,  y='params.Product_Type.value_drift_score', markers=True)
fig.show()
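The run table holds more than one run per week-end date. The `head(2)` output, for instance, shows two runs for 2018-12-30 with different start times. As a result, each plotted line can get several points at the same x-value. One option is to keep only the most recent run per week before plotting. The sketch below uses illustrative values and assumes the `start_time` column returned by `mlflow.search_runs` is carried along, which the plotting frame above does not include.

```python
import pandas as pd

# Illustrative run table: two runs logged for the same week, plus one other week.
runs = pd.DataFrame(
    {
        "start_time": pd.to_datetime(
            ["2022-06-27 10:25", "2022-06-27 10:05", "2022-06-27 10:30"]
        ),
        "drift_score": [0.9, 0.8, 0.7],
    },
    index=pd.Index(
        ["2018-12-30 00:00:00", "2018-12-30 00:00:00", "2019-01-06 00:00:00"],
        name="params.week_end_experiment_end_date",
    ),
)

# Order by start time, then keep the last (most recent) run for each week-end date.
latest = runs.sort_values("start_time")
latest = latest[~latest.index.duplicated(keep="last")].sort_index()

print(latest["drift_score"].to_list())  # [0.9, 0.7]
```

The deduplicated frame can then be passed to `px.line` in the same way as the cells above.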

In [137]:
drift_detected_experiment_results

params.week_end_experiment_end_date,params.daily_dispatch_count_drift_detected,params.is_weekend.value_stattest_name,params.Product_Type.value_feature_importance_weighted_drift_detected,params.Product_Type.value_relative_feature_importance,params.is_weekend.value_drift_score,params.is_weekend.value_feature_importance_weighted_drift_score,params.daily_dispatch_count_target_drift_score,params.is_weekend.value_feature_importance_weighted_drift_detected,params.is_weekend.value_relative_feature_importance,params.is_weekend.value_raw_feature_importance,params.daily_dispatch_count_target_stattest_name,params.Product_Type.value_feature_importance_weighted_drift_score,params.Product_Type.value_raw_feature_importance,params.Product_Type.value_stattest_name,params.Product_Type.value_drift_score
2018-12-30 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.163886,0.850697,1,0.196846604347229,0.4681791365146637,distance,0.832555,2.378395795822144,Jensen-Shannon distance,0.832555
2018-12-30 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.117613,0.850697,1,0.1412676572799682,0.2465317100286483,distance,0.832555,1.7451391220092771,Jensen-Shannon distance,0.832555
2019-01-06 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.760357,0,0.0,0.0,distance,0.832555,2.5258266925811768,Jensen-Shannon distance,0.832555
2019-01-06 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.760357,0,0.0,0.0,distance,0.832555,2.5119287967681885,Jensen-Shannon distance,0.832555
2019-01-13 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.780727,0,0.0,0.0,distance,0.832555,1.535874605178833,Jensen-Shannon distance,0.832555
2019-01-13 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.780727,0,0.0,0.0,distance,0.832555,1.511664867401123,Jensen-Shannon distance,0.832555
2019-01-20 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.756803,0,0.0,0.0,distance,0.832555,1.6264139413833618,Jensen-Shannon distance,0.832555
2019-01-20 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.756803,0,0.0,0.0,distance,0.832555,1.6264139413833618,Jensen-Shannon distance,0.832555
2019-01-27 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.739196,0,0.0,0.0,distance,0.832555,1.5279386043548584,Jensen-Shannon distance,0.832555
2019-01-27 00:00:00,1,Jensen-Shannon distance,1,1.0,0.832555,0.0,0.739196,0,0.0,0.0,distance,0.832555,1.5279386043548584,Jensen-Shannon distance,0.832555
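The logged columns are consistent with a simple rule. Relative feature importance is the raw importance divided by the largest raw importance in that run. The weighted drift score is the feature's drift score multiplied by its relative importance. For the first 2018-12-30 row, 0.4681791365146637 / 2.378395795822144 ≈ 0.19685 and 0.8325546 × 0.19685 ≈ 0.16389, which matches the `is_weekend.value` columns. The sketch below reproduces that arithmetic. It is inferred from the logged numbers rather than taken from the notebook's implementation. It also assumes `Product_Type.value` is the top feature in that run (its relative importance is 1.0) and reuses the full-precision drift score 0.8325546111576977 for `is_weekend.value`, whose value the table shows rounded.

```python
from typing import Dict


def weighted_drift_scores(
    raw_importance: Dict[str, float], drift_score: Dict[str, float]
) -> Dict[str, float]:
    """drift_score * raw_importance / max(raw_importance), per feature.

    Inferred from the logged MLflow params; not the notebook's own code.
    """
    top = max(raw_importance.values())
    return {
        name: drift_score[name] * (raw_importance[name] / top)
        for name in raw_importance
    }


# Raw importances from the first 2018-12-30 row of the table above.
raw = {"Product_Type.value": 2.378395795822144, "is_weekend.value": 0.4681791365146637}
drift = {"Product_Type.value": 0.8325546111576977, "is_weekend.value": 0.8325546111576977}

scores = weighted_drift_scores(raw, drift)
print({k: round(v, 6) for k, v in scores.items()})
# {'Product_Type.value': 0.832555, 'is_weekend.value': 0.163886}
```

Under this rule a low-importance feature needs a much larger drift score to reach the same weighted score as the top feature, which is the behaviour described in the Overview.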
