CS4001/4042 Assignment 1, Part B, Q4
---

Model degradation is a common issue when deploying machine learning models (including neural networks) in the real world. New data points could exhibit a different pattern from older data points due to factors such as changes in government policy or market sentiment. For instance, housing prices in Singapore have been increasing, and the Singapore government has introduced 3 rounds of cooling measures in recent years (16 December 2021, 30 September 2022, 27 April 2023).

In such situations, the distribution of the new data points could differ from the original data distribution on which the models were trained. Recall that machine learning models usually assume that the test distribution is similar to the training distribution. When this assumption is violated, model performance is adversely impacted. In the last part of this assignment, we will investigate to what extent model degradation has occurred.
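
To make the effect of a violated train/test assumption concrete, here is a minimal synthetic sketch. It does not use the HDB data: it assumes a hypothetical one-feature dataset in which the "new" period has the same feature distribution as the "old" period but a shifted relationship between feature and target. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical "old" period: y = 2x + noise
x_old = rng.uniform(0, 10, size=2000)
y_old = 2.0 * x_old + rng.normal(0, 1, size=2000)

# Hypothetical "new" period: same P(X), but the whole relationship has moved up
x_new = rng.uniform(0, 10, size=2000)
y_new = 2.0 * x_new + 4.0 + rng.normal(0, 1, size=2000)

# Fit a straight line on the first 1500 old points only
slope, intercept = np.polyfit(x_old[:1500], y_old[:1500], deg=1)


def predict(x):
    return slope * x + intercept


# Evaluate on held-out old data and on the new period
r2_old = r2(y_old[1500:], predict(x_old[1500:]))
r2_new = r2(y_new, predict(x_new))
print(f"R2 on held-out old data: {r2_old:.3f}")
print(f"R2 on new data:          {r2_new:.3f}")
```

The model is unchanged between the two evaluations; only the data-generating process differs, which is the situation the rest of this notebook investigates on real data.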




---



Your co-investigators used a linear regression model to rapidly test several combinations of train/test splits and shared their findings with you in a brief report attached in Appendix A below. You wish to investigate whether your deep learning model corroborates their findings.

In [189]:
!pip install alibi-detect



In [190]:
SEED = 42

import os

import random
random.seed(SEED)

import numpy as np
np.random.seed(SEED)

import pandas as pd

from alibi_detect.cd import TabularDrift

> Evaluate your model from B1 on data from year 2022 and report the test R2.

In [191]:
#installations and imports required to produce model from B1

!pip install pytorch_tabular[extra]

from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
)



In [192]:
df = pd.read_csv('hdb_price_prediction.csv')

# TODO: Enter your code here

#obtaining our training and validation data
train_df = df[df["year"] <= 2019]
val_df = df[df["year"] == 2020]


#setting up our DataConfig
data_config = DataConfig(
    target=["resale_price"],
    continuous_cols=["dist_to_nearest_stn", "dist_to_dhoby", "degree_centrality", "eigenvector_centrality", "remaining_lease_years", "floor_area_sqm"],
    categorical_cols=["month", "town", "flat_model_type", "storey_range"],
)


#setting up TrainerConfig
trainer_config = TrainerConfig(
    auto_lr_find = True,
    batch_size=1024,
    max_epochs=50,
  )


#setting up CategoryEmbeddingModelConfig
model_config = CategoryEmbeddingModelConfig(
    task="regression",
    layers="50",  # single hidden layer with 50 neurons (layers is a "-"-separated string, e.g. "128-64-32")
)


#setting up OptimizerConfig
optimizer_config = OptimizerConfig(
    optimizer="Adam",
  )


#creating our model
tabular_model = TabularModel(
    data_config=data_config,
    model_config = model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)

tabular_model.fit(train = train_df, validation = val_df)

2023-10-03 08:59:22,104 - {pytorch_tabular.tabular_model:105} - INFO - Experiment Tracking is turned off
INFO:lightning_fabric.utilities.seed:Global seed set to 42
2023-10-03 08:59:22,181 - {pytorch_tabular.tabular_model:473} - INFO - Preparing the DataLoaders
2023-10-03 08:59:22,191 - {pytorch_tabular.tabular_datamodule:290} - INFO - Setting up the datamodule for regression task
2023-10-03 08:59:22,618 - {pytorch_tabular.tabular_model:521} - INFO - Preparing the Model: CategoryEmbeddingModel
2023-10-03 08:59:22,675 - {pytorch_tabular.tabular_model:268} - INFO - Preparing the Trainer
  rank_zero_deprecation(
INFO:pytorch_lightning.trainer.c

Finding best initial lr:   0%|          | 0/100 [00:00<?, ?it/s]

  rank_zero_warn(
  rank_zero_warn(
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_steps=100` reached.
INFO:pytorch_lightning.tuner.lr_finder:Learning rate set to 0.5754399373371567
INFO:pytorch_lightning.utilities.rank_zero:Restoring states from the checkpoint path at /content/.lr_find_b0e90ab0-fdc7-4588-a83f-71254b6d4295.ckpt
INFO:pytorch_lightning.utilities.rank_zero:Restored all states from the checkpoint file at /content/.lr_find_b0e90ab0-fdc7-4588-a83f-71254b6d4295.ckpt
2023-10-03 08:59:27,153 - {pytorch_tabular.tabular_model:575} - INFO - Suggested LR: 0.5754399373371567. For plot and detailed analysis, use `find_learning_rate` method.
2023-10-03 08:59:27,155 - {pytorch_tabular.tabular_model:582} - INFO - Training Started
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK

2023-10-03 09:00:09,778 - {pytorch_tabular.tabular_model:584} - INFO - Training the model completed
2023-10-03 09:00:09,782 - {pytorch_tabular.tabular_model:1258} - INFO - Loading the best model
  rank_zero_deprecation(


<pytorch_lightning.trainer.trainer.Trainer at 0x78cc6a53c100>

In [193]:
from sklearn.metrics import mean_squared_error, r2_score

test_df = df[df["year"] == 2022]

#getting our prediction for 2022
pred_df_2022 = tabular_model.predict(test = test_df)
y = pred_df_2022["resale_price"]
pred = pred_df_2022["resale_price_prediction"]

#calculating R-squared
r_squared = r2_score(y, pred)
print("R-squared: ", r_squared)

R-squared:  0.43884718152026214


> Evaluate your model from B1 on data from year 2023 and report the test R2.

In [194]:
# TODO: Enter your code here

test_df = df[df["year"] == 2023]

#getting our prediction for 2023
pred_df_2023 = tabular_model.predict(test = test_df)
y = pred_df_2023["resale_price"]
pred = pred_df_2023["resale_price_prediction"]

#calculating R-squared
r_squared = r2_score(y, pred)
print("R-squared: ", r_squared)

R-squared:  0.16212581893466183


> Did model degradation occur for the deep learning model?


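Yes, model degradation did occur. The model's test R-squared falls to about 0.439 on 2022 data and to about 0.162 on 2023 data. This mirrors the pattern in the linear regression report in Appendix A, where test R2 drops to 0.41 for 2022 and 0.10 for 2023.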
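
For reference, the value reported by `r2_score` is the standard coefficient of determination. Writing $y_i$ for the actual resale prices in the test year, $\hat{y}_i$ for the model's predictions and $\bar{y}$ for the mean actual price in that test year (this notation is introduced here for illustration):

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

A value of 1 means perfect prediction, while a value of 0 means the model does no better than always predicting the mean price of the test year. Read this way, an R-squared of about 0.16 on 2023 means the model explains only a small part of the variation in 2023 prices.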



---



Model degradation could be caused by [various data distribution shifts](https://huyenchip.com/2022/02/07/data-distribution-shifts-and-monitoring.html#data-shift-types): covariate shift (features), label shift and/or concept drift (altered relationship between features and labels).
There are various conflicting terminologies in the [literature](https://www.sciencedirect.com/science/article/pii/S0950705122002854#tbl1). Let’s stick to this reference for this assignment.
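
As a compact summary of the three shift types, stated as I understand the definitions in the first reference above (the notation is an assumption for illustration, with $X$ the features and $Y$ the label), the joint distribution can be factorised in two ways:

$$
P(X, Y) = P(Y \mid X)\,P(X) = P(X \mid Y)\,P(Y)
$$

1. Covariate shift: $P(X)$ changes while $P(Y \mid X)$ stays the same.
1. Label shift: $P(Y)$ changes while $P(X \mid Y)$ stays the same.
1. Concept drift: $P(Y \mid X)$ changes while $P(X)$ stays the same.

In practice more than one of these can occur at the same time, so the categories are best read as idealised cases.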

> Using the **Alibi Detect** library, apply the **TabularDrift** function with the training data (year 2019 and before) used as the reference and **detect which features have drifted** in the 2023 test dataset. Before running the statistical tests, ensure you **sample 1000 data points** each from the train and test data. Do not use the whole train/test data. (Hint: use this example as a guide https://docs.seldon.io/projects/alibi-detect/en/stable/examples/cd_chi2ks_adult.html)


In [195]:
from alibi_detect.cd import TabularDrift

train_df = df[df["year"] <= 2019].drop("year", axis=1)
test_df = df[df["year"] == 2023].drop("year", axis=1)


#sampling 1000 from train and test, we remove the target col (resale_price)
train_df_1000 = train_df.sample(1000, random_state = 1).drop("resale_price", axis=1)
test_df_1000 = test_df.sample(1000, random_state = 2).drop("resale_price", axis=1)


#creating our dictionary for index of categories
categorical_cols=["month", "town", "flat_model_type", "storey_range"]
categories_per_feature = {}
for i in categorical_cols:
  idx = train_df_1000.columns.get_loc(i)
  categories_per_feature[idx] = None


#initialising the detector
cd = TabularDrift(train_df_1000.values, p_val=0.05, categories_per_feature=categories_per_feature)


#detecting drift in test data
fpreds = cd.predict(test_df_1000.values, drift_type='feature')
labels = ['No!', 'Yes!']
feature_names = list(train_df_1000.columns)

for f in range(cd.n_features):
    # features listed in categories_per_feature use Chi2, all others use K-S
    stat = 'Chi2' if f in categories_per_feature else 'K-S'
    fname = feature_names[f]
    is_drift = fpreds['data']['is_drift'][f]
    stat_val, p_val = fpreds['data']['distance'][f], fpreds['data']['p_val'][f]
    print(f'{fname} -- Drift? {labels[is_drift]} -- {stat} {stat_val:.3f} -- p-value {p_val:.3f}')


month -- Drift? Yes! -- Chi2 460.854 -- p-value 0.000
town -- Drift? Yes! -- Chi2 43.199 -- p-value 0.013
full_address -- Drift? No! -- K-S 0.058 -- p-value 0.066
nearest_stn -- Drift? No! -- K-S 0.044 -- p-value 0.279
dist_to_nearest_stn -- Drift? Yes! -- K-S 0.082 -- p-value 0.002
dist_to_dhoby -- Drift? Yes! -- K-S 0.069 -- p-value 0.016
degree_centrality -- Drift? No! -- K-S 0.028 -- p-value 0.817
eigenvector_centrality -- Drift? No! -- K-S 0.053 -- p-value 0.116
flat_model_type -- Drift? Yes! -- Chi2 82.560 -- p-value 0.000
remaining_lease_years -- Drift? Yes! -- K-S 0.149 -- p-value 0.000
floor_area_sqm -- Drift? No! -- K-S 0.060 -- p-value 0.052
storey_range -- Drift? No! -- Chi2 18.672 -- p-value 0.229
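
To give a feel for what the K-S column above measures, here is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of the reference and test samples. This is a hand-written illustration on synthetic data, not Alibi Detect's own implementation, and the function name `ks_statistic` is hypothetical.

```python
import numpy as np


def ks_statistic(ref, test):
    """Largest absolute difference between the two empirical CDFs."""
    ref = np.sort(np.asarray(ref, dtype=float))
    test = np.sort(np.asarray(test, dtype=float))
    # Evaluate both empirical CDFs at every observed value
    grid = np.concatenate([ref, test])
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    cdf_test = np.searchsorted(test, grid, side="right") / test.size
    return float(np.max(np.abs(cdf_ref - cdf_test)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)

# Same distribution as the reference: the statistic should be small
same = ks_statistic(reference, rng.normal(0.0, 1.0, size=1000))

# Mean shifted by half a standard deviation: the statistic should be larger
shifted = ks_statistic(reference, rng.normal(0.5, 1.0, size=1000))

print(f"K-S, same distribution:    {same:.3f}")
print(f"K-S, shifted distribution: {shifted:.3f}")
```

The statistic lies between 0 (identical empirical distributions) and 1 (no overlap at all). The p-value then indicates how surprising a gap of that size would be if both samples came from the same distribution.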


> Assuming that the flurry of housing measures have made an impact on the relationship between all the features and resale_price (i.e. P(Y|X) changes), which type of data distribution shift possibly led to model degradation?


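Since the relationship P(Y|X) has changed, the shift is most likely <b>"Concept Drift"</b>, which is defined as a change in P(Y|X) while P(X) stays the same. Note that the TabularDrift results above flag several features as drifted, so P(X) is not perfectly stable here and some covariate shift is likely present as well. Under the stated assumption, however, concept drift is the shift that most plausibly led to model degradation.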



> From your analysis via TabularDrift, which features contribute to this shift?


From my analysis, the features that TabularDrift flags as drifted (p-value below 0.05) are:

1. month
1. town
1. dist_to_nearest_stn
1. dist_to_dhoby
1. flat_model_type
1. remaining_lease_years
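
The list above can be reproduced programmatically. The sketch below hard-codes the feature names and the rounded p-values printed in the TabularDrift output earlier (the rounding is an assumption of this example; in the notebook the values would come from `fpreds['data']['p_val']`), then filters by the same 0.05 threshold and checks which drifted features are actually inputs to the model.

```python
# Feature names and p-values copied from the printed TabularDrift output
# (p-values are rounded to 3 decimals, as printed)
feature_names = [
    "month", "town", "full_address", "nearest_stn",
    "dist_to_nearest_stn", "dist_to_dhoby", "degree_centrality",
    "eigenvector_centrality", "flat_model_type",
    "remaining_lease_years", "floor_area_sqm", "storey_range",
]
p_vals = [0.000, 0.013, 0.066, 0.279, 0.002, 0.016,
          0.817, 0.116, 0.000, 0.000, 0.052, 0.229]

P_VAL_THRESHOLD = 0.05  # same threshold passed to TabularDrift

# Columns used by the model, as listed in DataConfig
model_features = {
    "dist_to_nearest_stn", "dist_to_dhoby", "degree_centrality",
    "eigenvector_centrality", "remaining_lease_years", "floor_area_sqm",
    "month", "town", "flat_model_type", "storey_range",
}

# Keep features whose p-value falls below the threshold
drifted = [name for name, p in zip(feature_names, p_vals) if p < P_VAL_THRESHOLD]

# Of those, keep only the ones the model actually consumes
drifted_model_inputs = [name for name in drifted if name in model_features]

print("Drifted features:", drifted)
print("Drifted model inputs:", drifted_model_inputs)
```

All six drifted features are model inputs. `full_address` and `nearest_stn` were also tested, but they are not part of `DataConfig`, so they would not affect the model's predictions either way.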


> Suggest 1 way to address model degradation and implement it, showing improved test R2 for year 2023.


One way to address model degradation is regular model retraining. We should frequently update the model with fresh data, such as data from recent years, to keep it relevant. Data distributions can change over time due to factors such as seasonality, evolving user preferences, or external events. A model trained only on historical data may become less accurate as it encounters new, unseen patterns.
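
As a sketch of how such retraining could be organised, the helper below builds a time-based split for any target year: the year before the test year is used for validation and everything earlier for training. The function name `temporal_split` and the toy `years` array are hypothetical and only illustrate the idea.

```python
import numpy as np


def temporal_split(years, test_year):
    """Return boolean masks (train, val, test) for a time-based split.

    Validation is the year immediately before test_year;
    training is every year before the validation year.
    """
    years = np.asarray(years)
    val_year = test_year - 1
    train_mask = years < val_year
    val_mask = years == val_year
    test_mask = years == test_year
    return train_mask, val_mask, test_mask


# Toy example: one entry per row of a hypothetical dataset
years = np.array([2017, 2018, 2019, 2020, 2021, 2022, 2023, 2023])
train_mask, val_mask, test_mask = temporal_split(years, test_year=2023)

print("train years:", years[train_mask])
print("val years:  ", years[val_mask])
print("test years: ", years[test_mask])
```

For `test_year=2023` this gives training data up to 2021, validation on 2022 and testing on 2023. With a DataFrame, the masks could be computed from `df["year"]` and applied as `df[train_mask]`, and the split re-run whenever a new year of data arrives.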

In [196]:
# TODO: Enter your code here

#since I have proposed regular model retraining, the new model is trained on data that includes more recent years
train_df = df[df["year"] <= 2021]  #initially it was <= 2019, now it's <= 2021
val_df = df[df["year"] == 2022]  #validation also moves from 2020 to 2022
test_df = df[df["year"] == 2023]


#setting up our DataConfig
data_config2 = DataConfig(
    target=["resale_price"],
    continuous_cols=["dist_to_nearest_stn", "dist_to_dhoby", "degree_centrality", "eigenvector_centrality", "remaining_lease_years", "floor_area_sqm"],
    categorical_cols=["month", "town", "flat_model_type", "storey_range"],
)


#setting up TrainerConfig
trainer_config2 = TrainerConfig(
    auto_lr_find = True,
    batch_size=1024,
    max_epochs=50,
  )


#setting up CategoryEmbeddingModelConfig
model_config2 = CategoryEmbeddingModelConfig(
    task="regression",
    layers="50",  # single hidden layer with 50 neurons (layers is a "-"-separated string, e.g. "128-64-32")
)


#setting up OptimizerConfig
optimizer_config2 = OptimizerConfig(
    optimizer="Adam",
  )


#creating our model
tabular_model2 = TabularModel(
    data_config=data_config2,
    model_config = model_config2,
    optimizer_config=optimizer_config2,
    trainer_config=trainer_config2,
)

tabular_model2.fit(train = train_df, validation = val_df)

2023-10-03 09:00:11,860 - {pytorch_tabular.tabular_model:105} - INFO - Experiment Tracking is turned off
INFO:lightning_fabric.utilities.seed:Global seed set to 42
2023-10-03 09:00:11,917 - {pytorch_tabular.tabular_model:473} - INFO - Preparing the DataLoaders
2023-10-03 09:00:11,930 - {pytorch_tabular.tabular_datamodule:290} - INFO - Setting up the datamodule for regression task
2023-10-03 09:00:12,460 - {pytorch_tabular.tabular_model:521} - INFO - Preparing the Model: CategoryEmbeddingModel
2023-10-03 09:00:12,526 - {pytorch_tabular.tabular_model:268} - INFO - Preparing the Trainer
  rank_zero_deprecation(
INFO:pytorch_lightning.trainer.c

Finding best initial lr:   0%|          | 0/100 [00:00<?, ?it/s]

  rank_zero_warn(
  rank_zero_warn(
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_steps=100` reached.
INFO:pytorch_lightning.tuner.lr_finder:Learning rate set to 0.5754399373371567
INFO:pytorch_lightning.utilities.rank_zero:Restoring states from the checkpoint path at /content/.lr_find_ba1a1e5c-e261-4db3-8b66-2f0880c6ac81.ckpt
INFO:pytorch_lightning.utilities.rank_zero:Restored all states from the checkpoint file at /content/.lr_find_ba1a1e5c-e261-4db3-8b66-2f0880c6ac81.ckpt
2023-10-03 09:00:17,976 - {pytorch_tabular.tabular_model:575} - INFO - Suggested LR: 0.5754399373371567. For plot and detailed analysis, use `find_learning_rate` method.
2023-10-03 09:00:17,980 - {pytorch_tabular.tabular_model:582} - INFO - Training Started
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK

2023-10-03 09:01:38,626 - {pytorch_tabular.tabular_model:584} - INFO - Training the model completed
2023-10-03 09:01:38,630 - {pytorch_tabular.tabular_model:1258} - INFO - Loading the best model
  rank_zero_deprecation(


<pytorch_lightning.trainer.trainer.Trainer at 0x78cbdef26890>

In [197]:
#testing new model to see if R-squared improves (old r2 score = 0.16212581893466183)

#getting our prediction for 2023 on new model
pred_df_2023_new = tabular_model2.predict(test = test_df)
y2 = pred_df_2023_new["resale_price"]
pred2 = pred_df_2023_new["resale_price_prediction"]

#calculating R-squared
r_squared2 = r2_score(y2, pred2)
print("R-squared: ", r_squared2)

R-squared:  0.3905472678875198


As we can see, with a model that is trained on more recent data, the test R-squared for 2023 increases from about 0.162 to about 0.391. This shows the importance of regular model retraining.

### Appendix A



Here are our results from a linear regression model. We used StandardScaler for continuous variables and OneHotEncoder for categorical variables.

While 2021 data can be predicted well, test R2 dropped rapidly for 2022 and 2023.

| Training set | Test set | Test R2  |
|--------------|----------|----------|
| Year <= 2020 | 2021     | 0.76     |
| Year <= 2020 | **2022** | **0.41** |
| Year <= 2020 | **2023** | **0.10** |



Similarly, a model trained on 2017 data can predict 2018-2021 well (with slight degradation in performance for 2021), but its test R2 drops drastically in 2022 and 2023.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2017         | 2018     | 0.90    |
|              | 2019     | 0.89    |
|              | 2020     | 0.87    |
|              | 2021     | 0.72    |
|              | **2022**     | **0.37**    |
|              | **2023**     | **0.09**    |

With the test set fixed at year 2021, training on data from 2017-2020 still works well on the test data, with minimal degradation. Training sets closer to year 2021 generally do better.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2020         | 2021     | 0.81    |
| 2019         | 2021     | 0.75    |
| 2018         | 2021     | 0.73    |
| 2017         | 2021     | 0.72    |