# **MLOps: From Model to Production**

## Why track model performance?

> ### Changes in data that impact model performance
>> The data source changes: data types, format, schema, etc.
>> Data drift caused by changes in behaviour, e.g., the rate at which spammers send emails per minute
> ### Model performance changes over time
>> Drift in the target variable
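
A change in the data source can often be caught before any model metric moves. The sketch below is a minimal illustration, not part of Evidently: the helper name `schema_diff` and the toy columns are assumptions made for this example. It compares column names and dtypes of a reference and a current DataFrame.

```python
import pandas as pd


def schema_diff(reference: pd.DataFrame, current: pd.DataFrame) -> dict:
    """Compare column names and dtypes of two DataFrames (hypothetical helper)."""
    ref_types = reference.dtypes.astype(str).to_dict()
    cur_types = current.dtypes.astype(str).to_dict()
    shared = ref_types.keys() & cur_types.keys()
    return {
        # Columns the model was trained on that no longer arrive
        "missing": sorted(ref_types.keys() - cur_types.keys()),
        # Columns that appeared in the new data
        "added": sorted(cur_types.keys() - ref_types.keys()),
        # Columns present in both datasets but with a different dtype
        "dtype_changed": {
            col: (ref_types[col], cur_types[col])
            for col in sorted(shared)
            if ref_types[col] != cur_types[col]
        },
    }


# Toy data: the rate column silently turned into text and a column was swapped
reference = pd.DataFrame({"emails_per_minute": [1.0, 2.0], "sender": ["a", "b"]})
current = pd.DataFrame({"emails_per_minute": ["1", "2"], "country": ["PT", "DE"]})

report = schema_diff(reference, current)
print(report)
```

A check like this only covers the "data source changed" case; drift in the values themselves needs a statistical comparison.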



<img src="https://drive.google.com/uc?export=view&id=1zwSOMIcSUoT9j9CkpB0tJvNo8aYJoWKo">

## In this tutorial we will explore dashboards to monitor

*   Model Performance
*   Target Drift
*   Data Drift

---
## Objectives for today's session
1. Understand the benefits of monitoring your machine learning model
2. Learn what can be monitored
3. Move from theory to practice

# Machine Learning (ML) Lifecycle

<img src="https://drive.google.com/uc?export=view&id=1daEp_7S7UD3tdaBxWDoerjsVCwawV3qx">

---
<img src="https://drive.google.com/uc?export=view&id=1xvOHmPctwxiz5fELKITnKPgVCPUNH8cI">

# Monitor your models 💪😼

> ## In general, ML models do not provide 100% correct results
>> Model validation should be statistical in nature, unlike the binary pass/fail tests of traditional software development
>
> How do we decide whether the model is good enough for deployment?
>> - based on the nature of the model, e.g., for ensemble models, track adjustments to parameters and/or hyperparameters
>> - decide on thresholds for acceptable values, e.g., compare with the performance of previous model runs
>> - sometimes performance cannot be judged as a "whole", so slices of the test data are used to validate model performance for specific relevant features, e.g., gender and/or country
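
The threshold and slicing ideas above can be sketched in a few lines of pandas. This is an illustration under stated assumptions: the function names, the column names (`target`, `prediction`, `country`) and the threshold value are made up for the example and do not come from any library.

```python
import pandas as pd


def mae(frame: pd.DataFrame) -> float:
    """Mean absolute error between the 'prediction' and 'target' columns."""
    return float((frame["prediction"] - frame["target"]).abs().mean())


def validate(frame: pd.DataFrame, slice_column: str, max_mae: float) -> dict:
    """Check the error overall and per slice against one threshold."""
    overall = mae(frame)
    per_slice = {str(key): mae(group) for key, group in frame.groupby(slice_column)}
    failing = sorted(key for key, value in per_slice.items() if value > max_mae)
    return {
        "overall": overall,
        "per_slice": per_slice,
        "failing_slices": failing,
        "passed": bool(overall <= max_mae) and not failing,
    }


# Toy test set: the overall error looks acceptable, one country does not
test_data = pd.DataFrame({
    "country": ["PT", "PT", "DE", "DE"],
    "target": [10.0, 12.0, 20.0, 30.0],
    "prediction": [11.0, 11.0, 26.0, 22.0],
})

# Hypothetical threshold, e.g. derived from a previous model run
result = validate(test_data, slice_column="country", max_mae=5.0)
print(result)
```

Here the overall error stays below the threshold while one slice exceeds it, which is the situation the last bullet describes.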

<img src="https://drive.google.com/uc?export=view&id=1O9cYp7OnqUOTJcLW4BVMvPtVnRmOuB3s">

## This hands-on tutorial uses the open source library Evidently

> Evidently helps evaluate machine learning models during validation and monitor them in production. The tool generates interactive visual reports and JSON profiles from pandas DataFrames or CSV files.
>> Licensed under the Apache License, Version 2.0 (the "License")
>>
>> See more at: https://github.com/evidentlyai/
>>
>> Documentation https://docs.evidentlyai.com/quick-start

## Many other tools are available.
> See some examples in the list below
>> *   Neptune https://ui.neptune.ai/
>> *   Fiddler https://www.fiddler.ai/ml-monitoring
>> *   Amazon SageMaker Model Monitor https://aws.amazon.com/sagemaker/model-monitor/

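# Install evidently
1. pip install evidently

For Jupyter Notebook, also install and enable the notebook extension:
1. jupyter nbextension install --sys-prefix --symlink --overwrite --py evidently
2. jupyter nbextension enable evidently --py --sys-prefix

Note: the code in this tutorial uses the Dashboard and Profile API of early Evidently releases. Later releases replaced this API, so an older version of the package may be needed to run the cells as written.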

In [1]:
pip install evidently



# Evidently dashboards

## Requirements
**In general for all dashboards**
> * You need to have input features, and target and/or prediction columns available.
> * You will need two datasets. The reference dataset serves as a benchmark. We analyze the change by comparing the current production data to the reference data
> * You can potentially choose any two datasets for comparison. But keep in mind that only the reference dataset will be used as a basis for comparison


## **1. Data Drift**
* The report detects changes in feature distributions. 
* Performs a suitable statistical test for numerical and categorical features
* Plots feature values and distributions for the two datasets

**Requirements**
> * You will need two datasets. The reference dataset serves as a benchmark. We analyze the change by comparing the current production data to the reference data
> * The dataset should include the features you want to evaluate for drift. The structure (column names) of both datasets should be identical
> * The DateTime column is the only exception. If available, it can be used as the x-axis in the plots
> * All feature columns analyzed for drift should have the numerical type (np.number)
> * Categorical data can be encoded as numerical labels and specified in the column_mapping

More https://docs.evidentlyai.com/reports/data-drift
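
To make "a suitable statistical test" concrete for a numerical feature, here is a minimal sketch of a two-sample Kolmogorov-Smirnov check written with NumPy only. It is an illustration, not Evidently's implementation: the function names are assumptions, and the decision rule uses the standard large-sample critical value for a 5% significance level (coefficient 1.358).

```python
import numpy as np


def ks_statistic(reference, current) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    reference = np.sort(np.asarray(reference, dtype=float))
    current = np.sort(np.asarray(current, dtype=float))
    # The gap can only change at observed values, so evaluate both CDFs there
    grid = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, grid, side="right") / reference.size
    cdf_cur = np.searchsorted(current, grid, side="right") / current.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))


def drift_detected(reference, current, c_alpha: float = 1.358) -> bool:
    """Large-sample KS decision rule; c_alpha=1.358 corresponds to alpha=0.05."""
    n, m = len(reference), len(current)
    critical_value = c_alpha * np.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical_value


rng = np.random.default_rng(0)
reference_feature = rng.normal(loc=0.0, scale=1.0, size=500)
shifted_feature = rng.normal(loc=1.5, scale=1.0, size=500)  # simulated drift

print(drift_detected(reference_feature, shifted_feature))
print(drift_detected(reference_feature, reference_feature))
```

Categorical features need a different test (for example a chi-squared test on category counts), which is why the numerical and categorical columns are declared separately.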

## **2. Regression Performance**
* The report analyzes the performance of a regression model
* Works for a single model or helps compare two models
* Displays a variety of plots related to the performance and errors
* Helps explore areas of under- and overestimation

More https://docs.evidentlyai.com/reports/reg-performance
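
For orientation, the headline numbers of a regression report can be reproduced by hand. The sketch below computes the mean error, mean absolute error and mean absolute percentage error with NumPy. The function name is hypothetical, the error is assumed to be prediction minus target, and the targets are assumed to be non-zero.

```python
import numpy as np


def regression_metrics(target, prediction) -> dict:
    """Basic regression quality metrics (assumes non-zero targets)."""
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    # Assumed sign convention: a positive error is an overestimation
    error = prediction - target
    return {
        "mean_error": float(error.mean()),
        "mean_abs_error": float(np.abs(error).mean()),
        "mean_abs_perc_error": float(100 * np.mean(np.abs(error) / np.abs(target))),
    }


metrics = regression_metrics(target=[100, 200, 400], prediction=[110, 180, 400])
print(metrics)
```

A mean error far from zero hints at systematic over- or underestimation, while the absolute metrics describe the typical size of the error.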

<img src="https://drive.google.com/uc?export=view&id=1uhV4KW6oXfHsGQXAmBLjFDC_MSjQa3Ck">

## **3. Categorical Target Drift**
* The report explores the changes in the categorical target function (prediction)
* Performs a suitable statistical test to compare target (prediction) distribution
* Plots the relations between each individual feature and the target (prediction)

More https://docs.evidentlyai.com/reports/categorical-target-drift

## **4. Numerical Target Drift**
* The report explores the changes in the numerical target function (prediction).  
* Performs a suitable statistical test to compare target (prediction) distribution
* Calculates the correlations between the feature and the target (prediction)
* Plots the relations between each individual feature and the target (prediction)

More https://docs.evidentlyai.com/reports/num-target-drift
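
The correlation step can be illustrated with plain pandas. This is a sketch under assumptions: the helper name and the toy columns are invented for the example, and Pearson correlation is used because it is the pandas default.

```python
import pandas as pd


def target_correlations(frame: pd.DataFrame, features: list, target: str) -> pd.Series:
    """Pearson correlation of each feature with the target column."""
    return frame[features].corrwith(frame[target])


features = ["temp", "hum"]

reference = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0, 4.0],
    "hum": [4.0, 3.0, 2.0, 1.0],
    "cnt": [10.0, 20.0, 30.0, 40.0],
})
# In the current data the relation between temp and the target has flipped
current = pd.DataFrame({
    "temp": [1.0, 2.0, 3.0, 4.0],
    "hum": [1.0, 2.0, 3.0, 4.0],
    "cnt": [40.0, 30.0, 20.0, 10.0],
})

correlation_shift = (
    target_correlations(current, features, "cnt")
    - target_correlations(reference, features, "cnt")
)
print(correlation_shift)
```

A large shift for a feature suggests that its relation to the target changed between the two periods, even if the feature's own distribution did not.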

## **5. Classification Performance**
* The report analyzes the performance of a classification model
* Works for a single model or helps compare two models
* Works for binary and multi-class classification 
* Displays a variety of plots related to the model performance 
* Helps explore regions where the model makes different types of errors

**Requirements**
> * You need to have input features, and both target and prediction columns available
> * You can use either numerical labels like "0", "1", "2" or class names like "virginica", "setosa", "versicolor" inside the target and prediction columns. The labels should be the same for the target and predictions
> * Column order in binary classification: class order matters. The tool expects the target (so-called positive) class to be the first in the column_mapping['prediction'] list


More https://docs.evidentlyai.com/reports/classification-performance
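
The requirement that target and prediction share the same labels is easy to verify before building a report. The check below is a small sketch. The helper name and the toy data are assumptions for illustration.

```python
import pandas as pd


def unexpected_labels(frame: pd.DataFrame, target: str = "target",
                      prediction: str = "prediction") -> list:
    """Predicted labels that never occur in the target column."""
    return sorted(set(frame[prediction].unique()) - set(frame[target].unique()))


frame = pd.DataFrame({
    "target": ["setosa", "versicolor", "virginica"],
    # One label differs only by capitalisation, a common source of mismatches
    "prediction": ["setosa", "versicolor", "Virginica"],
})

print(unexpected_labels(frame))  # ['Virginica']
```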

<img src="https://drive.google.com/uc?export=view&id=1n-ShAEy-nJOoYdqKKlYxK0revno2Kpz_">



## **6. Probabilistic Classification Performance**
* The report analyzes the performance of a probabilistic classification model
* Works for a single model or helps compare two models
* Works for binary and multi-class classification
* Displays a variety of plots related to the model performance
* Helps explore regions where the model makes different types of errors

More https://docs.evidentlyai.com/reports/probabilistic-classification-performance

# Import required libraries

In [2]:
import pandas as pd
import numpy as np

from sklearn import ensemble
from sklearn.ensemble import RandomForestRegressor

from evidently.dashboard import Dashboard
from evidently.tabs import DataDriftTab, NumTargetDriftTab, RegressionPerformanceTab

from evidently.model_profile import Profile
from evidently.profile_sections import RegressionPerformanceProfileSection

# Regression Performance Dashboard

### Dataset - Bike Sharing Dataset

More information about the dataset can be found in UCI machine learning repository: 

https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Acknowledgement: Fanaee-T, Hadi, and Gama, Joao, 'Event labeling combining ensemble detectors and background knowledge', Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg


In [3]:
# Load dataset
raw_data = pd.read_csv(
    'Bike-Sharing-Dataset/day.csv',
    header = 0,
    sep = ',',
    parse_dates=['dteday']
    )

In [4]:
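# Take Reference dataset and Production Dataset
# .copy() makes independent DataFrames, so adding a 'prediction' column later
# does not trigger pandas' SettingWithCopyWarning
ref_data = raw_data[:120].copy()
prod_data = raw_data[120:150].copy()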

In [5]:
ref_data.head()

,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


# Regression Model

In [6]:
# Required variables to train Random Forest Model
target = 'cnt'

# Splitting the features into two lists is helpful for the Dashboard
# For the Random Forest Model this is not relevant
numerical_features = ['mnth', 'temp', 'atemp', 'hum', 'windspeed']
categorical_features = ['season', 'holiday', 'weekday', 'workingday', 'weathersit']

# Features to be used by the random forest model
features = numerical_features + categorical_features

In [7]:
# Build and train the Random Forest Model
# Use the reference data
model = RandomForestRegressor(random_state = 0)

model.fit(
    ref_data[features],
    ref_data[target]
    )

RandomForestRegressor(random_state=0)

In [8]:
# Predict for reference and production data
ref_data['prediction']  = model.predict(ref_data[features])
prod_data['prediction'] = model.predict(prod_data[features])

# Regression Performance Report

The **keys** of the column_mapping dict configure the information required for the dashboard

- **target** - name of the model's target variable
- **prediction** - name of the column that holds the model predictions
- **datetime** - name of the column that gives the observations an ordering in time (if one exists)
- **numerical_features** - names of the numeric columns used to predict
- **categorical_features** - names of the categorical columns used to predict


In [9]:
column_mapping = {}

column_mapping['target'] = target
column_mapping['prediction'] = 'prediction'

# If available, it can be used as the x-axis in the dashboard plots
column_mapping['datetime'] = 'dteday'

# Inform the Dashboard which are 
# the numerical and categorical features used in the model
column_mapping['numerical_features'] = numerical_features
column_mapping['categorical_features'] = categorical_features

The tab imported below selects the Regression Performance Dashboard
> from evidently.tabs import RegressionPerformanceTab

Provide the requirements
1. Reference and production data (ref_data & prod_data)
2. The target, numerical and categorical features (column_mapping)
3. Optional, but relevant for this dataset: the date of the event for the x-axis of the plots (column_mapping)

In [10]:
dashboard = Dashboard(tabs=[RegressionPerformanceTab])

In [11]:
dashboard.calculate(
    ref_data,
    prod_data,
    column_mapping=column_mapping
    )

The dashboard presents the performance of the model as a whole and also for its "tails"

- OVER (top 5% of the predictions with overestimation)
- UNDER (top 5% of the predictions with underestimation)
- MAJORITY (the remaining 90%)

For each feature
* For the numerical features, it shows the mean value per group
* For the categorical features, it shows the most common value
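
The grouping described above can be approximated by hand to see what the dashboard summarises. The sketch below is not Evidently's code: the function name, the toy data and the sign convention (error = prediction minus target) are assumptions, and the tails are cut at the 5% and 95% quantiles of the error.

```python
import numpy as np
import pandas as pd


def error_groups(frame: pd.DataFrame, target: str, prediction: str,
                 tail: float = 0.05) -> pd.Series:
    """Label each row as UNDER, OVER or MAJORITY based on its error."""
    # Assumed sign convention: a positive error is an overestimation
    error = frame[prediction] - frame[target]
    low, high = error.quantile(tail), error.quantile(1 - tail)
    labels = np.where(error <= low, "UNDER",
                      np.where(error >= high, "OVER", "MAJORITY"))
    return pd.Series(labels, index=frame.index, name="group")


# Toy data: errors run from -50 to 49, one numerical and one categorical feature
frame = pd.DataFrame({
    "cnt": np.zeros(100),
    "prediction": np.arange(100) - 50.0,
    "temp": np.arange(100, dtype=float),
    "season": [1] * 60 + [2] * 40,
})

groups = error_groups(frame, target="cnt", prediction="prediction")

# Numerical feature: mean value per group
mean_temp = frame.groupby(groups)["temp"].mean()
# Categorical feature: most common value per group
top_season = frame.groupby(groups)["season"].agg(lambda s: s.mode().iloc[0])

print(groups.value_counts())
print(mean_temp)
print(top_season)
```

If a feature's mean or most common value differs clearly between the tails and the majority, that feature is a good place to start looking for the cause of the errors.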

In [12]:
# See the dashboard in the notebook
#dashboard.show()

In [13]:
# Save the dashboard as an HTML file (the 'reports' directory must already exist)
dashboard.save('reports/bike_sharing_demand_model_perfomance.html')

# Data drift dashboard in Jupyter notebook

Data from https://www.kaggle.com/c/bike-sharing-demand/data

## Bicycle Demand Data

In [14]:
raw_data = pd.read_csv(
    'bike-sharing-demand/train.csv',
    header=0,
    sep=',',
    parse_dates=['datetime'],
    index_col='datetime'
    )

In [15]:
raw_data.head()

datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40
2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32
2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,3,10,13
2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,0,1,1


## Regression Model
### Feature engineering

In [16]:
# A DatetimeIndex exposes these components directly
raw_data['month'] = raw_data.index.month
raw_data['hour'] = raw_data.index.hour
raw_data['weekday'] = raw_data.index.weekday + 1

In [17]:
raw_data.head()

datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count,month,hour,weekday
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16,1,0,6
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40,1,1,6
2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32,1,2,6
2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,3,10,13,1,3,6
2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,0,1,1,1,4,6


### Model training

In [18]:
target = 'count'
prediction = 'prediction'
numerical_features = ['temp', 'atemp', 'humidity', 'windspeed', 'hour', 'weekday']
categorical_features = ['season', 'holiday', 'workingday']

In [19]:
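# Reference data and current data
# .copy() avoids pandas' SettingWithCopyWarning when the 'prediction' column is added later
reference = raw_data.loc['2011-01-01 00:00:00':'2011-01-28 23:00:00'].copy()
current = raw_data.loc['2011-01-29 00:00:00':'2011-02-28 23:00:00'].copy()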

In [20]:
reference.head()

datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count,month,hour,weekday
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16,1,0,6
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40,1,1,6
2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32,1,2,6
2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,3,10,13,1,3,6
2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,0,1,1,1,4,6


In [21]:
regressor = ensemble.RandomForestRegressor(random_state = 0, n_estimators = 50)
regressor.fit(reference[numerical_features + categorical_features], reference[target])

RandomForestRegressor(n_estimators=50, random_state=0)

In [22]:
# Predictions for reference and current dataset
ref_prediction = regressor.predict(reference[numerical_features + categorical_features])
current_prediction = regressor.predict(current[numerical_features + categorical_features])

In [23]:
reference['prediction'] = ref_prediction
current['prediction'] = current_prediction

### Model Performance

In [24]:
column_mapping = {}

column_mapping['target'] = target
column_mapping['prediction'] = prediction
column_mapping['numerical_features'] = numerical_features
column_mapping['categorical_features'] = categorical_features

In [25]:
regression_perfomance_dashboard = Dashboard(tabs=[RegressionPerformanceTab])
regression_perfomance_dashboard.calculate(reference, None, column_mapping=column_mapping)

In [26]:
#regression_perfomance_dashboard.show()

In [27]:
regression_perfomance_dashboard.save('reports/regression_performance_at_training.html')

### Drifts in slices

- Instead of using all the current data at once, compare the reference with slices of it, e.g., one week at a time
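
Slicing by time does not have to be done by typing date ranges by hand. As a sketch, assuming a DataFrame with a DatetimeIndex and the `count` and `prediction` columns used in this notebook, pandas can cut the data into consecutive 7-day windows. The function name and the synthetic data are made up for the example.

```python
import numpy as np
import pandas as pd


def mae_per_window(frame: pd.DataFrame, freq: str = "7D",
                   target: str = "count", prediction: str = "prediction") -> pd.Series:
    """Mean absolute error for consecutive time windows of a DatetimeIndex."""
    abs_error = (frame[prediction] - frame[target]).abs()
    return abs_error.groupby(pd.Grouper(freq=freq)).mean()


# Synthetic hourly data for two weeks; the error grows in the second week
index = pd.Timestamp("2011-01-29") + pd.to_timedelta(np.arange(14 * 24), unit="h")
current_example = pd.DataFrame(
    {"count": 10.0, "prediction": [12.0] * 168 + [15.0] * 168},
    index=index,
)

weekly_mae = mae_per_window(current_example)
print(weekly_mae)
```

Each window could equally be passed to a dashboard as the current dataset, which is what the following cells do with explicit date ranges.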

### Week 1

In [28]:
regression_perfomance_dashboard.calculate(
    reference,
    current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'],
    column_mapping=column_mapping
    )

In [29]:
#regression_perfomance_dashboard.show()

In [30]:
regression_perfomance_dashboard.save('reports/regression_performance_after_week1.html')

### **Now, let's look at another Dashboard**

### Instead of the model performance, let's explore the changes in the numerical target (prediction)

In [31]:
target_drift_dashboard = Dashboard(tabs=[NumTargetDriftTab])
target_drift_dashboard.calculate(
    reference,
    current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'],
    column_mapping=column_mapping
    )

In [32]:
#target_drift_dashboard.show()

In [33]:
target_drift_dashboard.save('reports/target_drift_after_week1.html')

### Week 2

In [34]:
regression_perfomance_dashboard.calculate(
    reference,
    current.loc['2011-02-08 00:00:00':'2011-02-14 23:00:00'],
    column_mapping=column_mapping
    )

In [35]:
#regression_perfomance_dashboard.show()

In [36]:
regression_perfomance_dashboard.save('reports/regression_performance_after_week2.html')

In [37]:
target_drift_dashboard.calculate(
    reference,
    current.loc['2011-02-08 00:00:00':'2011-02-14 23:00:00'],
    column_mapping=column_mapping
    )

In [38]:
#target_drift_dashboard.show()

In [39]:
target_drift_dashboard.save('reports/target_drift_after_week2.html')

### Week 3

In [40]:
regression_perfomance_dashboard.calculate(
    reference,
    current.loc['2011-02-15 00:00:00':'2011-02-21 23:00:00'],
    column_mapping=column_mapping
    )

In [41]:
#regression_perfomance_dashboard.show()

In [42]:
regression_perfomance_dashboard.save('reports/regression_performance_after_week3.html')

In [43]:
target_drift_dashboard.calculate(
    reference,
    current.loc['2011-02-15 00:00:00':'2011-02-21 23:00:00'],
    column_mapping=column_mapping
    )


In [44]:
#target_drift_dashboard.show()

In [45]:
target_drift_dashboard.save('reports/target_drift_after_week3.html')

## Data Drift

In [46]:
column_mapping = {}
column_mapping['numerical_features'] = numerical_features

In [47]:
data_drift_dashboard = Dashboard(tabs=[DataDriftTab])

data_drift_dashboard.calculate(
    reference,
    current.loc['2011-01-29 00:00:00':'2011-02-07 23:00:00'],
    column_mapping=column_mapping
    )

In [48]:
#data_drift_dashboard.show()

In [49]:
data_drift_dashboard.save("reports/data_drift_dashboard_after_week1.html")

# **Profiles** are best for integration into prediction pipelines or with external visualization tools.

# Regression Model Profile

In [50]:
# Same as we did at the beginning of our session

# Load dataset
raw_data = pd.read_csv(
    'Bike-Sharing-Dataset/day.csv',
    header = 0,
    sep = ',',
    parse_dates=['dteday']
    )
# Take Reference dataset and Production Dataset
# (.copy() avoids pandas' SettingWithCopyWarning when adding 'prediction' below)
ref_data = raw_data[:120].copy()
prod_data = raw_data[120:150].copy()
# Required variables to train Random Forest Model
target = 'cnt'

# Splitting the features into two lists is helpful for the Dashboard
# For the Random Forest Model this is not relevant
numerical_features = ['mnth', 'temp', 'atemp', 'hum', 'windspeed']
categorical_features = ['season', 'holiday', 'weekday', 'workingday', 'weathersit']

# Features to be used by the random forest model
features = numerical_features + categorical_features
# Build and train the Random Forest Model
# Use the reference data
model = RandomForestRegressor(random_state = 0)

model.fit(
    ref_data[features],
    ref_data[target]
    )
# Predict for reference and production data
ref_data['prediction']  = model.predict(ref_data[features])
prod_data['prediction'] = model.predict(prod_data[features])



column_mapping = {}

column_mapping['target'] = target
column_mapping['prediction'] = 'prediction'
column_mapping['datetime'] = 'dteday'

column_mapping['numerical_features'] = numerical_features
column_mapping['categorical_features'] = categorical_features

In [51]:
bike_regression_performance_profile = Profile(sections=[RegressionPerformanceProfileSection])

In [52]:
bike_regression_performance_profile.calculate(ref_data, prod_data, column_mapping=column_mapping)

In [53]:
regression_profile = bike_regression_performance_profile.json()

In [54]:
regression_profile

'{"regression_performance": {"name": "regression_performance", "datetime": "2021-11-16 17:27:48.588139", "data": {"utility_columns": {"date": "dteday", "id": null, "target": "cnt", "prediction": "prediction", "drift_conf_level": 0.95, "drift_features_share": 0.5, "nbinsx": null, "xbins": null}, "cat_feature_names": ["season", "holiday", "weekday", "workingday", "weathersit"], "num_feature_names": ["mnth", "temp", "atemp", "hum", "windspeed"], "target_names": null, "metrics": {"reference": {"mean_error": 8.929166666666674, "mean_abs_error": 125.09516666666663, "mean_abs_perc_error": 8.495802832467792, "error_std": 186.45387709836987, "abs_error_std": 138.0765912530898, "abs_perc_error_std": 16.117011043699662, "error_normality": {"order_statistic_medians_x": [-2.526542275665766, -2.197894402213753, -2.0086641993623844, -1.8721280960359694, -1.7635663945500484, -1.672523510822877, -1.5935482122864373, -1.5234210952712135, -1.4600748144725424, -1.40209915455854, -1.3484871031580363, -1.29
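
Because the profile is plain JSON, a pipeline step can read it with the standard library and react to a metric. The sketch below uses a small stand-in string rather than the real output. The nesting (`regression_performance` → `data` → `metrics` → `reference`) follows the output shown above, while the `current` block, its value, the function name and the tolerance are assumptions made for illustration.

```python
import json

# Stand-in for the profile string. Only the nesting mirrors the output above;
# the "current" block and its value are hypothetical.
profile_json = json.dumps({
    "regression_performance": {
        "data": {
            "metrics": {
                "reference": {"mean_abs_error": 125.1},
                "current": {"mean_abs_error": 300.0},
            }
        }
    }
})


def metric_degraded(profile: str, metric: str = "mean_abs_error",
                    tolerance: float = 1.5) -> bool:
    """True if the current metric exceeds the reference by the given factor."""
    metrics = json.loads(profile)["regression_performance"]["data"]["metrics"]
    return metrics["current"][metric] > tolerance * metrics["reference"][metric]


if metric_degraded(profile_json):
    print("Model error grew beyond the tolerated factor, consider retraining")
```

In a real pipeline the string would come from the profile's json() call, and the alert could trigger a retraining job or a notification instead of a print.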

<img src="https://drive.google.com/uc?export=view&id=11tmHnX541zcLehbxThu8bgSxths9YoCt">