<a href="https://colab.research.google.com/github/Malachyiii/Model-Performance-Estimation/blob/main/Model_Performance_Estimation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

## ***Hey, did you change your runtime to GPU?***


1.   Select the "Runtime" menu from the toolbar
2.   Choose "Change runtime type"
3.   Set "Hardware accelerator" to GPU


### Purpose Statement
The goal of this notebook is to explore Model Performance Estimation using the [NannyML](https://nannyml.readthedocs.io/en/main/how_it_works/performance_estimation.html#direct-loss-estimation-dle) package.


## Background

We are not always lucky enough to have continuous ground truth for our trained models. The question therefore becomes: how can we tell if our model is still performing well long after training is complete?

Direct loss estimation (DLE) is one possible answer. It's fairly simple in concept, and this notebook walks through a simple example using a toy dataset. It is a 3-step process:


1.   Train your model (the child model)
2.   Train a second model (the nanny model) using the loss from the first model as the target, and the child model's features and predictions as features
3.   Once your model is in production, use the nanny model to estimate the loss of the production model
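
To make the three steps concrete, here is a minimal NumPy sketch on made-up data. Everything in it is an assumption for illustration: the toy data generator, the helper names (`make_data`, `child_design`, `nanny_design`, `least_squares`), and the choice of plain least-squares models for both child and nanny. It is not how NannyML implements DLE, just the same idea in miniature.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, x2_low, x2_high):
    """Toy data: the noise level grows with the second feature x2."""
    x1 = rng.uniform(-1.0, 1.0, n)
    x2 = rng.uniform(x2_low, x2_high, n)
    y = 3.0 * x1 + 2.0 * x1**2 + rng.normal(0.0, 1.0, n) * (0.5 + x2)
    return np.column_stack([x1, x2]), y

def child_design(X):
    """Inputs of the child model: x1, x1^2, x2 and an intercept."""
    return np.column_stack([X[:, 0], X[:, 0] ** 2, X[:, 1], np.ones(len(X))])

def nanny_design(X, pred):
    """Inputs of the nanny model: child features, child prediction, intercept."""
    return np.column_stack([X, pred, np.ones(len(X))])

def least_squares(A, target):
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

# Step 1: train the child model where ground truth is available
X_ref, y_ref = make_data(20_000, 0.0, 1.0)
child = least_squares(child_design(X_ref), y_ref)

# Step 2: train the nanny model on the child's absolute error
pred_ref = child_design(X_ref) @ child
nanny = least_squares(nanny_design(X_ref, pred_ref), np.abs(y_ref - pred_ref))

# Step 3: "production" data has drifted towards noisier inputs
X_prod, y_prod = make_data(20_000, 1.0, 2.0)
pred_prod = child_design(X_prod) @ child
estimated_mae = float(np.mean(nanny_design(X_prod, pred_prod) @ nanny))

# Only possible here because the toy data still has its targets
realized_mae = float(np.mean(np.abs(y_prod - pred_prod)))
reference_mae = float(np.mean(np.abs(y_ref - pred_ref)))
print(f"reference {reference_mae:.2f}, estimated {estimated_mae:.2f}, realized {realized_mae:.2f}")
```

The nanny never sees `y_prod`, yet its average predicted error should land close to the realized MAE, because the error size depends on a feature the nanny can observe.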

In this notebook we will train a simple neural network on our dataset, then use NannyML to do Direct Loss Estimation on the model for an unseen test set. We can then compare what NannyML thought the loss would be to the actual loss (since we have the true values for the test set).

# Package Installation

First we have to install the NannyML package, then we need to upgrade some of the packages that come preinstalled in Colab.

In [1]:
%%capture
!pip install nannyml scikeras
!pip install --upgrade tensorflow
!pip install --upgrade scikit-learn
!pip install -U matplotlib

# !! AFTER RUNNING THIS YOU MUST RESTART !!

# Set Up

In [2]:
import nannyml as nml

#Importing basic handling functions
import numpy as np
import pandas as pd
import sys
import math

#Specialty data wrangling functions
from pandas.api.types import is_datetime64_any_dtype as is_datetime
from pandas.api.types import is_number

#importing our necessary tensorflow functions
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, losses
from tensorflow.keras.models import Model
from scikeras.wrappers import KerasRegressor

from sklearn.preprocessing import OneHotEncoder

In [3]:
#We create our model args here
class args:
  #Overall training arguments
  batch_size = 32
  epochs = 500

  #Model arguments
  activation = 'relu'
  dropout = 0.1
  optimizer = 'adam'
  loss = losses.MeanAbsoluteError() #loss = mean(abs(y_true - y_pred))

  #Creating a callback function
  callback = tf.keras.callbacks.EarlyStopping(
                                  monitor='val_loss',
                                  min_delta=0,
                                  patience=20,
                                  verbose=0,
                                  mode='min',
                                  baseline=None,
                                  restore_best_weights=True
                                  )
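
If the patience setting is unfamiliar, the plain-Python sketch below shows the idea behind it. It is a simplified illustration, not Keras's implementation: the function name `early_stopping_epoch` and the loss values are made up, and it only covers the `mode='min'` case used above.

```python
def early_stopping_epoch(val_losses, patience=20, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once `patience` epochs in a row fail to improve on the
    best loss seen so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch

val_history = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stop_epoch, best_epoch = early_stopping_epoch(val_history, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

With `restore_best_weights=True`, the model ends up with the weights from the best epoch (epoch 2 in this toy run) rather than from the epoch where training stopped.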

In [4]:
#Test if Tensorflow is using the appropriate chip
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  print('GPU device not found')
else:
  print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


# Data Import and Cleaning

In [5]:
#Read in our toy data set
traindf, testdf, test_targets = nml.datasets.load_synthetic_car_price_dataset()

#It comes with some prediction columns, but we'll drop them so we can make our own
traindf = traindf.drop("y_pred", axis = 1)
testdf = testdf.drop("y_pred", axis = 1)

#Let's do a quick inspection
print("The shape of the training set is: ", traindf.shape)
print()
print(traindf.head())
print()
print("The shape of the test set is: ", testdf.shape)
print()
print(testdf.head())

The shape of the training set is:  (60000, 9)

   car_age  km_driven  price_new  accident_count  door_count      fuel  \
0     15.0   144020.0    42810.0             4.0         3.0    diesel   
1     12.0    57078.0    31835.0             3.0         3.0  electric   
2      2.0    76288.0    31851.0             3.0         5.0    diesel   
3      7.0    97593.0    29288.0             2.0         3.0  electric   
4     13.0     9985.0    41350.0             1.0         5.0    diesel   

  transmission  y_true                timestamp  
0    automatic   569.0  2017-01-24 08:00:00.000  
1    automatic  4277.0  2017-01-24 08:00:33.600  
2    automatic  7011.0  2017-01-24 08:01:07.200  
3       manual  5576.0  2017-01-24 08:01:40.800  
4    automatic  6456.0  2017-01-24 08:02:14.400  

The shape of the test set is:  (60000, 8)

   car_age  km_driven  price_new  accident_count  door_count      fuel  \
0      9.0    96276.0    36603.0             4.0         4.0       gas   
1     12.0    25

## Cleaning

In [6]:
#we are just gonna drop all the date columns, as we're not doing a time series model
train = traindf.drop("timestamp", axis = 1)
test = testdf.drop("timestamp", axis = 1)

#next we need to clean up our string columns
if (train.dtypes == 'object').any():

  #We will create a one-hot encoder that drops a column only if it encounters binary (ex: Male/Female) data
  #Any category must cover at least 1% of the rows to get its own column, with a max of 20 categories per feature
  enc = OneHotEncoder(drop = 'if_binary', min_frequency=0.01, max_categories=20)

  #Now we one-hot encode the data, extract the groups, add the encoded data to the data frame, and drop the string columns
  encoded = pd.DataFrame(enc.fit_transform(train.select_dtypes(include='object')).toarray(), columns = list(enc.get_feature_names_out()))
  train = train.select_dtypes(exclude='object')

  train = pd.concat([train, encoded], axis = 1)

  #Repeat for test
  encoded = pd.DataFrame(enc.transform(test.select_dtypes(include='object')).toarray(), columns = list(enc.get_feature_names_out()))
  test = test.select_dtypes(exclude='object')

  test = pd.concat([test, encoded], axis = 1)
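
To see what `drop = 'if_binary'` does to the two string columns, here is a small plain-Python sketch. It is an illustration of the idea only, not scikit-learn's implementation: the helper `one_hot_if_binary` is made up, and it ignores `min_frequency` and `max_categories`.

```python
def one_hot_if_binary(column_name, values):
    """One-hot encode a list of strings, dropping the first category
    when the column has exactly two categories."""
    categories = sorted(set(values))
    kept = categories[1:] if len(categories) == 2 else categories
    names = [f"{column_name}_{cat}" for cat in kept]
    rows = [[1.0 if value == cat else 0.0 for cat in kept] for value in values]
    return names, rows

fuel_names, fuel_rows = one_hot_if_binary(
    "fuel", ["diesel", "electric", "gas", "diesel"])
trans_names, trans_rows = one_hot_if_binary(
    "transmission", ["automatic", "manual", "automatic", "manual"])

print(fuel_names)   # ['fuel_diesel', 'fuel_electric', 'fuel_gas']
print(trans_names)  # ['transmission_manual']
print(trans_rows)   # [[0.0], [1.0], [0.0], [1.0]]
```

A binary column needs only one indicator, since the second would just be its mirror image. Columns with three or more categories keep one indicator per category.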

In [7]:
#put the target column last and make it a float
train = np.asarray(pd.concat([train.drop("y_true", axis = 1),train["y_true"]], axis = 1)).astype('float32')

test = np.asarray(pd.concat([test, test_targets], axis = 1)).astype('float32')

In [8]:
#Now we create a normalization layer to standardize the data
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(train)

#The decoder allows us to change the predictions back into the original price scale at the end
decoder = tf.keras.layers.Normalization(axis=None, invert=True)
decoder.adapt(train[:, -1])
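
The arithmetic behind the normalizer and the decoder is just standardization and its inverse. Here is a NumPy sketch of that round trip, using the five `y_true` values from the preview above. The helper names `encode` and `decode` are made up for illustration, and this is not the Keras layer itself.

```python
import numpy as np

prices = np.array([569.0, 4277.0, 7011.0, 5576.0, 6456.0], dtype="float32")

# The statistics are learned once, from the training targets only
mean, std = prices.mean(), prices.std()

def encode(values):
    """Standardize to zero mean and unit variance."""
    return (values - mean) / std

def decode(values):
    """Invert the standardization to get back to the original units."""
    return values * std + mean

scaled = encode(prices)
restored = decode(scaled)
print(scaled.round(2))
print(restored.round(0))
```

Because the network is trained on the standardized target, its raw predictions are in standard-deviation units. Decoding turns them back into prices, and the MAE is only meaningful on that scale.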

In [9]:
#Now we normalize each of our data sets and split the Target from the Data
train = normalizer(train).numpy()
trainx = train[:, :-1]
trainy = train[:, -1]

test = normalizer(test).numpy()
testx = test[:, :-1]
testy = test[:, -1]


In [10]:
#Now we fill the na values with -1
trainx = np.nan_to_num(trainx, copy=True, nan=-1.0, posinf=-1.0, neginf=-1.0)
testx = np.nan_to_num(testx, copy=True, nan=-1.0, posinf=-1.0, neginf=-1.0)
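
As a quick check of what this call does, here is a tiny example on a made-up row containing one of each problem value:

```python
import numpy as np

row = np.array([0.3, np.nan, np.inf, -np.inf, -1.7])

# NaN, +inf and -inf are all replaced by -1.0; finite values are untouched
filled = np.nan_to_num(row, copy=True, nan=-1.0, posinf=-1.0, neginf=-1.0)
print(filled.tolist())  # [0.3, -1.0, -1.0, -1.0, -1.7]
```

Since the fill happens after standardization, a value of -1 reads as "one standard deviation below the training mean" for that feature.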

# Model Creation and Fitting

In [11]:
# First we get the number of columns from the training data
size = trainx.shape[1]

#Because we are using the KerasRegressor wrapper function, we must define the model and compile it in a single function
def base_model():
  model = tf.keras.Sequential([
    layers.Input(shape=(size,)),
    
    #We will define three different layers of neurons, each with a dropout layer
    layers.Dense(units = 2 * size, activation = args.activation),
    layers.Dropout(args.dropout),
    
    #Block 2, half the size of block 1
    layers.Dense(units = size, activation = args.activation),
    layers.Dropout(args.dropout),
    
    #Block 3, half the size of block 2
    layers.Dense(units = round(0.5 * size), activation = args.activation),
    layers.Dropout(args.dropout),
    
    #Output layer is a single unit with no activation because we are expecting to
    #produce a regression
    layers.Dense(units = 1)
  ])
  #Printing the summary so we can review for diagnostic purposes
  #(summary() prints by itself and returns None, so no print() wrapper is needed)
  model.summary()

  #compile the model with the optimizer and loss, then return it
  model.compile(optimizer= args.optimizer, loss= args.loss)
  return model

In [None]:
#Now we instantiate our model. We are using the KerasRegressor wrapper in this
#notebook. Normally all of these parameters would be passed to .fit
model = KerasRegressor(model=base_model, 
                       #here we pass our training args
                       epochs=args.epochs,
                       batch_size = args.batch_size,
                       callbacks=[args.callback],

                       #The KerasRegressor wrapper for sklearn uses the fit__ prefix to denote fitting args
                       fit__validation_split = 0.25) 

#Now we fit our model; scikeras keeps the training history on the fitted wrapper (model.history_) for plotting later
model.fit(trainx, trainy)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 18)                180       
                                                                 
 dropout (Dropout)           (None, 18)                0         
                                                                 
 dense_1 (Dense)             (None, 9)                 171       
                                                                 
 dropout_1 (Dropout)         (None, 9)                 0         
                                                                 
 dense_2 (Dense)             (None, 4)                 40        
                                                                 
 dropout_2 (Dropout)         (None, 4)                 0         
                                                                 
 dense_3 (Dense)             (None, 1)                 5
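
The parameter counts in the summary can be checked by hand. The sketch below assumes `size = 9`, which is what the first layer's 18 units (2 × `size`) imply. The helper `dense_param_count` is made up for illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

size = 9  # number of input features implied by the summary
widths = [2 * size, size, round(0.5 * size), 1]

param_counts = []
n_inputs = size
for n_units in widths:
    param_counts.append(dense_param_count(n_inputs, n_units))
    n_inputs = n_units  # the next layer takes this layer's outputs

print(widths)        # [18, 9, 4, 1]
print(param_counts)  # [180, 171, 40, 5]
```

Note that `round(0.5 * 9)` gives 4, not 5, because Python rounds halves to the nearest even integer. Dropout layers have no trainable parameters, which is why they show 0.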

# Model Performance Evaluation

In [None]:
#In order to monitor our performance and make an estimation, we need to do predictions on our sets
traindf["y_pred"] = decoder(model.predict(trainx).flatten()).numpy()
testdf["y_pred"] = decoder(model.predict(testx).flatten()).numpy()

Now we are going to create our Direct Loss Estimation. Even though we did not create a time series model, we'll be bringing our timestamps back in. Remember that the idea of DLE is that we have ground truth over a certain period, deploy the model, then look for a change or drift over time.

In [None]:
#Now we create our Direct Loss Estimation Object
estimator = nml.DLE(
    #What columns are we using
    feature_column_names=['car_age', 'km_driven', 'price_new', 'accident_count',
                          'door_count', 'fuel', 'transmission'],
    #What is the name of the prediction and truth columns
    y_pred='y_pred',
    y_true='y_true',
    #Where is our timestamp
    timestamp_column_name='timestamp',
    #We will be using mean absolute error
    metrics=['mae'],
    #How many observations will be used to aggregate and compute the mean
    chunk_size=6000,
)

#Now we fit to our training data
estimator.fit(traindf)

#and make our estimation on the test data, then convert that to a dataframe
est_perf = estimator.estimate(testdf)
est_perf_data = est_perf.to_df()
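
To make `chunk_size` concrete: metrics are reported per block of consecutive rows, so 60000 rows with `chunk_size=6000` give 10 chunks per dataset. Here is a NumPy sketch of a per-chunk MAE on made-up numbers. The helper `mae_per_chunk` is hypothetical, and how it treats a trailing partial chunk is a choice made here, not a statement about NannyML.

```python
import numpy as np

def mae_per_chunk(y_true, y_pred, chunk_size):
    """Mean absolute error over consecutive blocks of `chunk_size` rows.
    A trailing partial block is kept as its own, smaller chunk."""
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    starts = range(0, len(abs_err), chunk_size)
    return np.array([abs_err[s:s + chunk_size].mean() for s in starts])

y_true = np.array([10.0, 12.0, 11.0, 20.0, 22.0, 21.0])
y_pred = np.array([11.0, 12.0, 10.0, 17.0, 25.0, 24.0])

chunk_mae = mae_per_chunk(y_true, y_pred, chunk_size=3)
print(chunk_mae)
```

The first chunk averages absolute errors of 1, 0 and 1, and the second averages 3, 3 and 3, so the jump between chunks is exactly the kind of change a per-chunk plot makes visible.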

In [None]:
#Now we can plot what our metric looks like
fig = est_perf.filter(metrics=['mae']).plot()
fig.show()

As we can see above, our DLE estimator has noticed a big change in the Mean Absolute Error of our model: the estimated Mean Absolute Error drops suddenly after the model is deployed. Let's take a look at the actual test y values and see if the estimate is right.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from sklearn.metrics import mean_absolute_error
import matplotlib.pyplot as plt

# add ground truth to analysis
analysis_full = pd.concat([testdf, test_targets], axis = 1)
df_all = pd.concat([traindf, analysis_full]).reset_index(drop=True)
df_all['timestamp'] = pd.to_datetime(df_all['timestamp'])
# calculate actual MAE
target_col = estimator.y_true
pred_score_col = 'y_pred'
actual_performance = []
for idx in est_perf_data.index:
    start_date, end_date = est_perf_data.loc[idx, ('chunk', 'start_date')], est_perf_data.loc[idx, ('chunk', 'end_date')]
    sub = df_all[df_all['timestamp'].between(start_date, end_date)]
    actual_perf = mean_absolute_error(sub[target_col], sub[pred_score_col])
    est_perf_data.loc[idx, ('mae', 'realized')] = actual_perf
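# plot
# with 60000 training rows and chunk_size=6000, chunks 0-9 cover the training period,
# so chunk 10 is the first analysis chunk
first_analysis = est_perf_data[('chunk', 'start_date')].values[10]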
plt.figure(figsize=(10,5))
plt.plot(est_perf_data[('chunk', 'start_date')], est_perf_data[('mae', 'value')], label='estimated MAE')
plt.plot(est_perf_data[('chunk', 'start_date')], est_perf_data[('mae', 'realized')], label='actual MAE')
plt.xticks(rotation=90)
plt.axvline(x=first_analysis, label='First analysis chunk', linestyle=':', color='grey')
plt.ylabel('MAE')
plt.legend()
plt.show()

It looks like our DLE did a really good job of estimating the loss! Going forward we can use this to see if our model's performance drifts. No ground truth required!