# From Notebook to ModelOp Center:

## Training, Evaluating, and Conforming a Model for Deployment
In this notebook, we demonstrate the process of
1. training a model,
2. evaluating its performance,
3. saving it for later use, and
4. conforming it to ModelOp Center (MOC) standards.

More specifically, we will train a linear regression predictor on the Ames Housing Data dataset.

**I - Model Training**

Let's load in the necessary libraries. We will be using `sklearn` to train the model, and `aequitas` for bias detection.

In [1]:
import pickle
import pandas
import numpy
import copy

from aequitas.bias import Bias
from aequitas.group import Group
from aequitas.preprocessing import preprocess_input_df

from sklearn import set_config
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

set_config(display='diagram')

The **Ames Housing Data** dataset can be found [at this link](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data). Download the train dataset and load it into a Pandas DataFrame. We use the train file exclusively because every row in it has an actual `SalePrice` value, the ground truth we need for the monitoring capabilities of ModelOp Center. To showcase those monitoring capabilities, we will also add a randomly generated feature, `gender`, to each row.

In [2]:
df = pandas.read_csv('./house_price_data.csv')
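The randomly generated `gender` feature mentioned above is not created in any cell of this notebook. A minimal sketch of one way to add it is shown here on a toy DataFrame; the label values, the seed, and the name `toy_df` are assumptions made for illustration only.

```python
import numpy
import pandas

# Toy stand-in for the housing DataFrame (values are illustrative).
toy_df = pandas.DataFrame({
    'Id': [1, 2, 3, 4],
    'SalePrice': [208500, 181500, 223500, 140000],
})

# A seeded generator keeps the synthetic column reproducible.
rng = numpy.random.default_rng(seed=0)

# Assumed labels: the notebook does not say which values `gender` takes.
toy_df['gender'] = rng.choice(['female', 'male'], size=len(toy_df))
```

Because the column is random, it carries no signal about `SalePrice`; it only gives the monitoring tools a group attribute to work with.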

In [3]:
df.columns.values

array(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2',
       'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl',
       'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea',
       'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
       'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2',
       'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC',
       'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
       'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu',
       'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageCars',
       'GarageArea', 'GarageQual', 'GarageCond', 'Pav

Let's look at the top of the data:

In [4]:
df.head()

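| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |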


Before proceeding with model development, we will split the original dataset into two sets: a **baseline** set which will be used as a reference set, and a **sample** set which will mimic input data to the model once the model is in use.

We'll also prepare a `_scored` version of each DataFrame for later use during our MOC monitoring phase. In those DataFrames, we'll mainly be using the `ground_truth` column, which in our case is `SalePrice`, and the `predictions` column (to be added later) to compare drift, bias, and other metrics that we want to monitor over the lifecycle of the model.

In [5]:
df_baseline, df_sample = train_test_split(df, train_size=0.8, random_state=0)

df_baseline_scored = df_baseline.copy(deep=True)
df_sample_scored = df_sample.copy(deep=True)

df_baseline.to_json('df_baseline.json', orient='records', lines=True)
df_sample.to_json('df_sample.json', orient='records', lines=True)
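To make the split and the JSON-lines format concrete, here is a small sketch on a toy DataFrame (all names and values are illustrative). It writes to an in-memory string rather than a file, and shows that the records round-trip while the original row index does not.

```python
import io

import pandas
from sklearn.model_selection import train_test_split

# Toy stand-in for the housing data: 10 rows.
toy_df = pandas.DataFrame({
    'Id': range(1, 11),
    'Street': ['Pave'] * 10,
    'SalePrice': [100000 + 5000 * i for i in range(10)],
})

# Same call as in the notebook: 80% baseline, 20% sample.
toy_baseline, toy_sample = train_test_split(toy_df, train_size=0.8, random_state=0)

# With no path argument, to_json returns the JSON-lines text (one record per line).
json_lines = toy_baseline.to_json(orient='records', lines=True)

# Reading it back yields the same records with a fresh 0..n-1 index.
restored = pandas.read_json(io.StringIO(json_lines), orient='records', lines=True)
```

Since `orient='records'` drops the index, any later code that needs the original row labels should rely on the `Id` column instead.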

We will have to **clean up** the data. There are quite a few null values in the dataset and, although effort to properly impute and clean data is necessary, we will only apply simple imputations for the sake of this demonstration. Note that these steps will also be necessary once we want to write code that conforms to ModelOp standards.

In [6]:
numerical_features = []
categorical_features = []
for i,j in zip(df.dtypes.index, df.dtypes.values):
    if j=='object':
        categorical_features.append(i)
    else:
        numerical_features.append(i)
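As an aside, pandas can do the same dtype-based split with `select_dtypes`. The sketch below runs on a toy DataFrame with illustrative values; it treats every non-numeric column as categorical, which matches the loop above only when the data holds just numeric and string columns.

```python
import pandas

# Toy frame with two numeric columns and one string column.
toy_df = pandas.DataFrame({
    'LotArea': [8450, 9600],
    'MSZoning': ['RL', 'RM'],
    'LotFrontage': [65.0, None],
})

# Numeric columns (integers and floats).
toy_numerical = toy_df.select_dtypes(include='number').columns.tolist()

# Everything else is treated as categorical.
toy_categorical = toy_df.select_dtypes(exclude='number').columns.tolist()
```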

**Categorical Features**

In [7]:
print(categorical_features)

['MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu', 'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive', 'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition']


**Numerical Features**

In [8]:
print(numerical_features)

['Id', 'MSSubClass', 'LotFrontage', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd', 'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal', 'MoSold', 'YrSold', 'SalePrice']


In [9]:
# imputing missing GarageYrBlt values with the corresponding YearBuilt values
df_baseline.loc[:,'GarageYrBlt'] = df_baseline.loc[:,'GarageYrBlt'].fillna(df_baseline.loc[:,'YearBuilt'])
df_sample.loc[:,'GarageYrBlt'] = df_sample.loc[:,'GarageYrBlt'].fillna(df_sample.loc[:,'YearBuilt'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)


In [10]:
# imputing all missing values in numerical features with 0
for col in numerical_features:
    df_baseline.loc[:, col] = df_baseline.loc[:, col].fillna(0)
    df_sample.loc[:, col] = df_sample.loc[:, col].fillna(0)

In [11]:
# imputing all missing values in categorical features with 'None'
for col in categorical_features:
    df_baseline.loc[:, col] = df_baseline.loc[:, col].fillna('None')
    df_sample.loc[:, col] = df_sample.loc[:, col].fillna('None')
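The three imputation rules above can be checked on a toy DataFrame. The sketch below uses made-up values and works on an explicit `.copy()` of a slice, which is one common way to avoid the copy-of-a-slice warning shown earlier; whether that is the right fix for the real data is left as an assumption.

```python
import numpy
import pandas

toy = pandas.DataFrame({
    'YearBuilt': [2003, 1976, 1915],
    'GarageYrBlt': [2003.0, numpy.nan, numpy.nan],
    'LotFrontage': [65.0, numpy.nan, 60.0],
    'Alley': ['Grvl', None, None],
})

# An explicit copy makes the subset an independent DataFrame.
subset = toy.iloc[:2].copy()

# Rule 1: a missing garage year falls back to the year the house was built.
subset['GarageYrBlt'] = subset['GarageYrBlt'].fillna(subset['YearBuilt'])

# Rule 2: remaining numerical gaps become 0.
subset['LotFrontage'] = subset['LotFrontage'].fillna(0)

# Rule 3: categorical gaps become the string 'None'.
subset['Alley'] = subset['Alley'].fillna('None')
```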

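Our data still contains columns that must not be used as model inputs: `Id` is only a row identifier, and `SalePrice` is the target we want to predict. We exclude both from the list of predictive features now.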

In [12]:
predictive_features = [
    f for f in list(df.columns.values)
    if f not in ['Id', 'SalePrice']
]

Everything looks good; we'll proceed with model training. We need to specify the **predictor** and **response** variables for each of the training and test sets. We'll set those by filtering the baseline and sample sets.

In [13]:
X_train = df_baseline[predictive_features]
X_test = df_sample[predictive_features]

y_train = df_baseline['SalePrice']
y_test = df_sample['SalePrice']

We will train a **Lasso** linear regression model. Since our data contains categorical features, we will need to one-hot encode them using `pandas.get_dummies()`.

In [14]:
# One hot encoding with pandas.get_dummies()
X_train = pandas.get_dummies(X_train, columns=categorical_features)
X_test = pandas.get_dummies(X_test, columns=categorical_features)

# The final list of encoded columns
train_encoded_columns = X_train.columns

# filling in any missing encoded columns with 0s
for col in train_encoded_columns:
    if col not in X_test.columns:
        X_test[col] = numpy.zeros(X_test.shape[0])

# restricting X_test columns to only be final list of encoded columns
X_test = X_test[train_encoded_columns]

# Saving the final list of encoded columns
with open('train_encoded_columns.pickle', 'wb') as f:
    pickle.dump(train_encoded_columns, f)
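The fill-then-restrict steps above can also be written as a single `reindex` call. The sketch below shows this on toy data (column values are illustrative): a category seen only in training is added as a column of zeros, and a category seen only at test time is dropped.

```python
import pandas

toy_train = pandas.DataFrame({'LotArea': [8450, 9600, 11250],
                              'Street': ['Pave', 'Grvl', 'Pave']})
toy_test = pandas.DataFrame({'LotArea': [9550, 14260],
                             'Street': ['Pave', 'Dirt']})

X_train_toy = pandas.get_dummies(toy_train, columns=['Street'])
X_test_toy = pandas.get_dummies(toy_test, columns=['Street'])

# Align the test columns to the training columns:
# 'Street_Grvl' is added and filled with 0, 'Street_Dirt' is dropped.
X_test_aligned = X_test_toy.reindex(columns=X_train_toy.columns, fill_value=0)
```

Dropping unseen categories silently is convenient for a demo, but in production it may be worth logging them, since they can be an early sign of data drift.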

Let's fit the model to the training data.

In [15]:
lasso = LassoCV()
lasso.fit(X_train, y_train)

In [16]:
lasso.score(X_train, y_train)

0.7708447275907354
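For readers new to `LassoCV`: it selects the regularization strength by cross-validation and exposes the chosen value as `alpha_`. The sketch below uses synthetic data (not the housing data) in which only the first two of ten features matter; the data, the seed, and `cv=5` are assumptions made for illustration.

```python
import numpy
from sklearn.linear_model import LassoCV

# Synthetic data: only the first two of ten features influence the target.
rng = numpy.random.default_rng(0)
X_toy = rng.normal(size=(200, 10))
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + rng.normal(scale=0.1, size=200)

# 5-fold cross-validation over an automatically generated grid of alphas.
toy_model = LassoCV(cv=5, random_state=0).fit(X_toy, y_toy)

chosen_alpha = toy_model.alpha_     # regularization strength picked by CV
coefficients = toy_model.coef_      # one coefficient per feature
train_r2 = toy_model.score(X_toy, y_toy)
```

As with `lasso.score` above, `score` returns the R2 on the data it is given, so on training data it is an optimistic estimate.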

**II - Model Evaluation**

Before saving our trained model for further use, let's take a look at some performance metrics. We will evaluate the model on both the training and test sets; we want to see a stable performance between the two.  

For repeatability, let's define a function which computes multiple metrics at once.

In [17]:
def compute_metrics(y, y_preds):
    """
    A function to evaluate a regression model.
    
    param: y: true (ground truth) values
    param: y_preds: predicted values (as predicted by model)
    
    return: multiple regression performance metrics
    """
    
    return {
        'Mean Absolute Error' : mean_absolute_error(y, y_preds),
        'Root Mean Squared Error' : mean_squared_error(y, y_preds) ** 0.5,
        'R2 Score' : r2_score(y, y_preds)
    }
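A quick sanity check of these metrics on three hand-picked values (the numbers are illustrative, and the function is repeated so the snippet runs on its own):

```python
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

def compute_metrics(y, y_preds):
    """Same metrics as in the notebook cell above."""
    return {
        'Mean Absolute Error': mean_absolute_error(y, y_preds),
        'Root Mean Squared Error': mean_squared_error(y, y_preds) ** 0.5,
        'R2 Score': r2_score(y, y_preds),
    }

y_true_toy = [3, 5, 7]
y_pred_toy = [2, 5, 9]

toy_metrics = compute_metrics(y_true_toy, y_pred_toy)
# Errors are 1, 0 and 2, so:
#   MAE  = (1 + 0 + 2) / 3 = 1.0
#   RMSE = sqrt((1 + 0 + 4) / 3), about 1.291
#   R2   = 1 - 5 / 8 = 0.375  (residual sum of squares 5, total sum of squares 8)
```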

Let's compute predictions on both training and test sets:

In [18]:
y_train_preds = lasso.predict(X_train)
y_test_preds = lasso.predict(X_test)

In [19]:
# one row of metrics per data set
performance_df = pandas.DataFrame(
    data=[
        compute_metrics(y_train, y_train_preds),
        compute_metrics(y_test, y_test_preds)
    ],
    columns=['Mean Absolute Error', 'Root Mean Squared Error', 'R2 Score'],
    index=['Training Set', 'Test Set']
)

Let's look at how our model performed:

In [20]:
performance_df

| | Mean Absolute Error | Root Mean Squared Error | R2 Score |
|---|---|---|---|
| Training Set | 24576.407762 | 37561.947354 | 0.770845 |
| Test Set | 27946.1833 | 58544.859989 | 0.503682 |


There is quite a difference in performance between the training set and the test set (an R2 score of about 0.77 versus 0.50), which suggests some amount of overfitting. Further model improvements are needed to achieve more accurate predictions. For now, we will content ourselves with this model and use it to produce new predictions.

**III - Saving and Loading the Trained Model**

Now that the model is **trained** and **evaluated**, we save it in a binary format. It will later be loaded and used to make new predictions.

In [21]:
with open('lasso.pickle', 'wb') as f:
    pickle.dump(lasso, f)

The model can be reloaded on demand as follows:

In [22]:
with open('lasso.pickle', 'rb') as f:
    lasso_loaded = pickle.load(f)

Predictions can be produced on-demand by calling the `predict()` function:

In [23]:
new_preds = lasso_loaded.predict(X_test)
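To confirm that a reloaded model behaves like the original, we can compare their predictions. The sketch below does this with a small `Lasso` model on toy data and a temporary directory; the file name and data are illustrative and unrelated to `lasso.pickle`.

```python
import os
import pickle
import tempfile

import numpy
from sklearn.linear_model import Lasso

# Tiny toy regression problem: y = 2x + 1.
X_toy = numpy.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = numpy.array([1.0, 3.0, 5.0, 7.0])
toy_model = Lasso(alpha=0.01).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_lasso.pickle')

    # `with` closes the file handle even if pickling fails.
    with open(path, 'wb') as f:
        pickle.dump(toy_model, f)
    with open(path, 'rb') as f:
        restored_model = pickle.load(f)

# The restored model should reproduce the original predictions.
same_predictions = numpy.allclose(toy_model.predict(X_toy),
                                  restored_model.predict(X_toy))
```

Note that a pickled scikit-learn model should be loaded with the same scikit-learn version that saved it; loading across versions is not guaranteed to work.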

Before heading into the next section, let's append our predictions to our `_scored` DataFrames and save them. Once again, these data sets will be used mainly for monitoring purposes. For regression models specifically, MOC expects the ground truth (actual) column to be named `ground_truth` and the predictions column to be named `predictions`. We'll make those changes and save our data sets.

In [25]:
df_baseline_scored['predictions'] = y_train_preds
df_baseline_scored = df_baseline_scored.rename(columns={'SalePrice':'ground_truth'})

df_sample_scored['predictions'] = y_test_preds
df_sample_scored = df_sample_scored.rename(columns={'SalePrice':'ground_truth'})
                                           
df_baseline_scored.to_json('df_baseline_scored.json', orient='records', lines=True)
df_sample_scored.to_json('df_sample_scored.json', orient='records', lines=True)

**IV - Evaluating Bias on Protected Classes**

*Work in progress*

**V - Conforming Model Code to MOC Requirements**

Conformance is best demonstrated through example. Let's look at the code below:

In [32]:
import pandas
import numpy
import pickle
import copy
from aequitas.preprocessing import preprocess_input_df
from aequitas.group import Group
from aequitas.bias import Bias

# modelop.init
def begin():
    # both objects are read inside action(), so both must be global
    global lasso_model, train_encoded_columns
    
    # load pickled Lasso linear regression model
    with open('lasso.pickle', 'rb') as f:
        lasso_model = pickle.load(f)
    # load train_encoded_columns
    with open('train_encoded_columns.pickle', 'rb') as f:
        train_encoded_columns = pickle.load(f)

# modelop.score
def action(data):
    # Turn data into DataFrame
    df = pandas.DataFrame(data)
    
    predictive_features = ['MSSubClass', 'MSZoning', 'LotFrontage',
                           'LotArea', 'Street', 'Alley', 'LotShape',
                           'LandContour', 'Utilities', 'LotConfig',
                           'LandSlope', 'Neighborhood', 'Condition1',
                           'Condition2', 'BldgType', 'HouseStyle',
                           'OverallQual', 'OverallCond', 'YearBuilt',
                           'YearRemodAdd', 'RoofStyle', 'RoofMatl', 
                           'Exterior1st', 'Exterior2nd', 'MasVnrType',
                           'MasVnrArea', 'ExterQual', 'ExterCond',
                           'Foundation', 'BsmtQual', 'BsmtCond',
                           'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
                           'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF',
                           'TotalBsmtSF', 'Heating', 'HeatingQC',
                           'CentralAir', 'Electrical', '1stFlrSF',
                           '2ndFlrSF', 'LowQualFinSF', 'GrLivArea',
                           'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
                           'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr',
                           'KitchenQual', 'TotRmsAbvGrd', 'Functional',
                           'Fireplaces', 'FireplaceQu', 'GarageType',
                           'GarageYrBlt', 'GarageFinish', 'GarageCars',
                           'GarageArea', 'GarageQual', 'GarageCond',
                           'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
                           'EnclosedPorch', '3SsnPorch', 'ScreenPorch',
                           'PoolArea', 'PoolQC', 'Fence', 'MiscFeature',
                           'MiscVal', 'MoSold', 'YrSold', 'SaleType',
                           'SaleCondition']
    
    categorical_features = ['MSZoning', 'Street', 'Alley', 'LotShape',
                            'LandContour', 'Utilities', 'LotConfig',
                            'LandSlope', 'Neighborhood', 'Condition1',
                            'Condition2', 'BldgType', 'HouseStyle',
                            'RoofStyle', 'RoofMatl', 'Exterior1st', 
                            'Exterior2nd', 'MasVnrType', 'ExterQual',
                            'ExterCond', 'Foundation', 'BsmtQual',
                            'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 
                            'BsmtFinType2', 'Heating', 'HeatingQC',
                            'CentralAir', 'Electrical', 'KitchenQual',
                            'Functional', 'FireplaceQu', 'GarageType',
                            'GarageFinish', 'GarageQual', 'GarageCond',
                            'PavedDrive', 'PoolQC', 'Fence',
                            'MiscFeature', 'SaleType', 'SaleCondition']
    
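    # imputing missing values
    # GarageYrBlt falls back to YearBuilt, as it did during training
    df.loc[:,'GarageYrBlt'] = df.loc[:,'GarageYrBlt'].fillna(df.loc[:,'YearBuilt'])
    for col in predictive_features:
        if df.loc[:,col].isna().sum()>0:
            if df.loc[:,col].dtype=='object':
                df.loc[:,col] = df.loc[:,col].fillna('None')
            else:
                df.loc[:,col] = df.loc[:,col].fillna(0)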
    
    # one-hot encode
    df = pandas.get_dummies(df, columns=categorical_features)

    # filling in any missing encoded columns with 0s
    for col in train_encoded_columns:
        if col not in df.columns:
            df[col] = numpy.zeros(df.shape[0])

    # restricting columns to only be final list of encoded columns
    df = df[train_encoded_columns]
    
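    df['predictions'] = lasso_model.predict(df)
    
    # MOC expects the action function to be a generator ("yield") function.
    # We return here so the result can be inspected directly in this notebook;
    # for deployment, replace the return with the yield below.
    return df.to_dict(orient='records')
    # yield df.to_dict(orient='records')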

# modelop.metrics
def metrics(data):
    pass

There are four main sections that are standard to almost any model in MOC:
1. Library imports
2. `init` function
3. `score` function
4. `metrics` function

**Library** imports are always at the top. We don't need to include all libraries that we used for training and model evaluation. We just need the libraries for processing and scoring.

The **`init`** function runs once per deployment, and is used to load and persist into memory any variable that needs to be accessed at scoring time. For example, the init function is where we load the saved model binary. We make the variable global so it can be accessed from the scoring function. In our example, we also included the `train_encoded_columns` as this information will not change per prediction and only needs to be instantiated once.

The **`score`** function is the function that runs anytime we make a scoring (prediction) request. This is where we put our prediction code. We have to remember to include any steps that were not captured by the pipeline, such as feature engineering or re-encoding.
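The example cell uses `return` while its comment notes that MOC expects a `yield`. When testing a yield-based scoring function locally, the call produces a generator that has to be consumed. The sketch below shows this with a placeholder scoring rule; the function name `action_yielding` and the rule itself are hypothetical and stand in for the Lasso model.

```python
import pandas

def action_yielding(data):
    """Hypothetical yield-based variant of a scoring function."""
    df = pandas.DataFrame(data)
    # Placeholder rule standing in for lasso_model.predict(...)
    df['predictions'] = df['LotArea'] * 10
    yield df.to_dict(orient='records')

toy_records = [{'LotArea': 8450}, {'LotArea': 9600}]

# Calling the function only creates a generator; next() runs it
# up to the first yield and hands back the scored records.
toy_scores = next(action_yielding(toy_records))
```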

The **`metrics`** function is where model evaluation is carried out. In our example, this is the place where we replicate the calculations of Group and/or Bias metrics.
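The `metrics` function in the example is left as a stub. As a rough sketch of what it could look like for regression performance (not the Group or Bias calculations), the version below reads the `ground_truth` and `predictions` columns and yields a dictionary. The output keys and the use of `yield` here are assumptions, not confirmed MOC requirements.

```python
import pandas
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# modelop.metrics
def metrics(data):
    """Hypothetical metrics function computing regression performance."""
    df = pandas.DataFrame(data)
    y = df['ground_truth']
    y_preds = df['predictions']

    # Key names are illustrative; adapt them to what your monitors expect.
    yield {
        'MAE': mean_absolute_error(y, y_preds),
        'RMSE': mean_squared_error(y, y_preds) ** 0.5,
        'R2': r2_score(y, y_preds),
    }

# Toy scored records using the column names MOC expects for regression.
toy_scored = [
    {'ground_truth': 3, 'predictions': 2},
    {'ground_truth': 5, 'predictions': 5},
    {'ground_truth': 7, 'predictions': 9},
]
toy_result = next(metrics(toy_scored))
```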

Let us test our source code to see if we missed anything. We will load input data and scored input data to test both the scoring and metrics functions:

In [33]:
test_sample = pandas.read_json('df_baseline.json', orient='records', lines=True)
metrics_sample = pandas.read_json('df_baseline_scored.json', orient='records', lines=True)

Let's check that the **`init`** function can load the trained model binary:

In [34]:
begin()

No errors from the **`init`** function. Let's make a call to the **`score`** function on input data:

In [35]:
scores = action(test_sample)

We have a set of scores! Finally, let's call the **`metrics`** function on scored data:

*Work in progress*