# Question B4 (10 marks)

Model degradation is a common issue when deploying machine learning models (including neural networks) in the real world. New data points may exhibit different patterns from older data points due to factors such as changes in government policy or market sentiment. For instance, housing prices in Singapore have been rising, and the Singapore government has introduced 3 rounds of cooling measures over the past few years (16 December 2021, 30 September 2022, 27 April 2023).

In such situations, the distribution of the new data points could differ from the original data distribution on which the models were trained. Recall that machine learning models typically assume that the test distribution is similar to the training distribution; when this assumption is violated, model performance is likely to suffer. In the last part of this assignment, we investigate to what extent model degradation has occurred.




---



Your co-investigators used a linear regression model to rapidly test several combinations of train/test splits and shared their findings in a brief report, attached as Appendix A below. You wish to investigate whether your deep learning model corroborates their findings.

In [1]:
SEED = 42

import os

import random
random.seed(SEED)

import numpy as np
np.random.seed(SEED)

import pandas as pd

from alibi_detect.cd import TabularDrift

In [2]:
# Import the sklearn metrics functions
from sklearn.metrics import mean_squared_error, r2_score

# Import the torch library and the tabular module
import torch
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
)

In [3]:
# Replicate the B1 model

# Load the dataset
df = pd.read_csv('hdb_price_prediction.csv')

# Dividing the dataset into train, validation and test sets by applying the given conditions
train_df = df[df['year'] <= 2019]  # Training data includes entries from year 2019 and before
validation_df = df[df['year'] == 2020]  # Validation data includes entries from year 2020
test_df = df[df['year'] == 2021]  # Test data includes entries from year 2021

# Define the target variable, as well as the names of the continuous and categorical variables
target = ['resale_price']  # Target variable

# Column type variables from the assignment pdf file
categorical_cols = ['month', 'town', 'flat_model_type', 'storey_range']  # Categorical columns
continuous_cols = ['dist_to_nearest_stn', 'dist_to_dhoby', 'degree_centrality', 'eigenvector_centrality', 'remaining_lease_years', 'floor_area_sqm']  # Continuous columns

# Define the data configuration
data_config = DataConfig(
    target=target,  # Target variable
    continuous_cols=continuous_cols,  # Continuous variables
    categorical_cols=categorical_cols  # Categorical variables
)

# Define the trainer configuration
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Automatically tune the learning rate
    batch_size=1024,  # Set batch_size to be 1024
    max_epochs=50  # Train for at most 50 epochs
)

# Define the model configuration
model_config = CategoryEmbeddingModelConfig(
    task="regression",  # Regression task
    layers="50",  # 1 hidden layer containing 50 neurons
    learning_rate=0.01  # Initial learning rate; replaced by the LR finder's suggestion since auto_lr_find=True
)

# Define the optimiser configuration
optimizer_config = OptimizerConfig(
    optimizer="Adam"  # Choose Adam optimiser
)

# Initialise the model and put all the configs together
model = TabularModel(
    data_config=data_config,  # Data configuration
    model_config=model_config,  # Model configuration
    optimizer_config=optimizer_config,  # Optimiser configuration
    trainer_config=trainer_config  # Trainer configuration
)

# Train the model
model.fit(train=train_df, validation=validation_df)

# Test the model
test_predictions = model.predict(test_df)

# Evaluate the model
result = model.evaluate(test_df)

test_predictions

Seed set to 42


GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory /Users/kristiyancholakov/Programming/PyCharmProjects/SC4001/Programming Assignment/saved_models exists and is not empty.
/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_steps=100` reached.
Learning rate set to 0.5754399373371567
Restoring states from the checkpoint path at /Users/kristiyancholakov/Programming/PyCharmProjects/SC4001/Programming Assignment/.lr_find_b060cf0a-3323-41c7-8e92-c19d9616d928.ckpt
Restored all states from the checkpoint at /Users/kristiyancholakov/Programming/PyCharmProjects/SC4001/Programming Assignment/.lr_find_b060cf0a-3323-41c7-8e92-c19d9616d928.ckpt



/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.


| | resale_price_prediction |
|---|---|
| 87370 | 150961.671875 |
| 87371 | 177636.234375 |
| 87372 | 303759.500000 |
| 87373 | 297804.281250 |
| 87374 | 272824.218750 |
| ... | ... |
| 116422 | 567490.875000 |
| 116423 | 561628.375000 |
| 116424 | 629146.125000 |
| 116425 | 672985.875000 |


1. Evaluate your model from B1 on data from year 2022 and report the test R2.

In [4]:
df = pd.read_csv('hdb_price_prediction.csv')

# Get the data for year 2022
test_df_22 = df[df['year'] == 2022]

# Test the model
test_predictions_22 = model.predict(test_df_22)

# Define the targets and predictions
y_true = test_df_22['resale_price']
y_pred = test_predictions_22['resale_price_prediction']  # Prediction column returned by model.predict

# Calculate the R2 score
r2_22 = r2_score(y_true, y_pred)
print(f'Test R2 for year 2022: {r2_22}')

Test R2 for year 2022: 0.3750455022462872


2. Evaluate your model from B1 on data from year 2023 and report the test R2.

In [5]:
# Get the data for year 2023
test_df_23 = df[df['year'] == 2023]

# Test the model
test_predictions_23 = model.predict(test_df_23)

# Define the targets and predictions
y_true = test_df_23['resale_price']
y_pred = test_predictions_23['resale_price_prediction']  # Prediction column returned by model.predict

# Calculate the R2 score
r2_23 = r2_score(y_true, y_pred)
print(f'Test R2 for year 2023: {r2_23}')

Test R2 for year 2023: 0.08798522105802631
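Questions 1 and 2 repeat the same filter, predict and score steps for each year. The small helper below computes the test R2 for several years at once. It is hypothetical and not part of pytorch_tabular. It assumes the predict function returns a DataFrame with a `resale_price_prediction` column, as the B1 output above shows. In the notebook it would be called as `r2_by_year(model.predict, df, [2022, 2023])`. The demo uses a stub in place of the real model.

```python
from typing import Callable, Dict, Iterable

import pandas as pd
from sklearn.metrics import r2_score


def r2_by_year(
    predict_fn: Callable[[pd.DataFrame], pd.DataFrame],
    data: pd.DataFrame,
    years: Iterable[int],
    target_col: str = "resale_price",
    pred_col: str = "resale_price_prediction",
) -> Dict[int, float]:
    """Return {year: test R2}, predicting on each year's rows separately."""
    scores = {}
    for year in years:
        year_df = data[data["year"] == year]
        preds = predict_fn(year_df)
        scores[year] = float(r2_score(year_df[target_col], preds[pred_col]))
    return scores


# Hypothetical stand-in for model.predict: always predicts 2 * x
def stub_predict(frame: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"resale_price_prediction": 2 * frame["x"]}, index=frame.index)


per_year_demo = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "resale_price": [2.0, 4.0, 6.0, 3.0, 5.0, 7.0],  # 2023 prices follow a different rule
})
per_year_scores = r2_by_year(stub_predict, per_year_demo, [2022, 2023])
print(per_year_scores)  # {2022: 1.0, 2023: 0.625}
```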


3. Did model degradation occur for the deep learning model?


In [6]:
model_degradation = f"""
Yes, model degradation has occurred for the deep learning model. Its test R2 falls to {r2_22:.4f} on the 2022 data and further to {r2_23:.4f} on the 2023 data, so by 2023 its predictions explain only a small fraction of the variance in resale prices. R2 measures the proportion of the variance in the target that is explained by the model's predictions, so this drop indicates a steadily weakening ability to generalise to newer data. These values closely match the linear regression results in Appendix A (0.41 for 2022 and 0.10 for 2023 when trained on data up to 2020), so the deep learning model corroborates our co-investigators' findings.
"""
print(f'Observation: {model_degradation}')


Observation: 
Yes, model degradation has occurred for the deep learning model. Its test R2 falls to 0.3750 on the 2022 data and further to 0.0880 on the 2023 data, so by 2023 its predictions explain only a small fraction of the variance in resale prices. R2 measures the proportion of the variance in the target that is explained by the model's predictions, so this drop indicates a steadily weakening ability to generalise to newer data. These values closely match the linear regression results in Appendix A (0.41 for 2022 and 0.10 for 2023 when trained on data up to 2020), so the deep learning model corroborates our co-investigators' findings.
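For reference, R2 = 1 - SS_res / SS_tot. Here SS_res is the sum of squared prediction errors, and SS_tot is the sum of squared deviations of the targets from their mean. A model that always predicts the mean scores 0, and a model that does worse than that scores below 0. The sketch below computes R2 by hand and checks it against sklearn's `r2_score`. The prices are made-up illustrative values, not HDB data.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_manual(y_true, y_pred) -> float:
    """R2 = 1 - SS_res / SS_tot (fraction of the target's variance explained by the predictions)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared deviations from the mean
    return float(1.0 - ss_res / ss_tot)


# Made-up prices for illustration only
prices = np.array([300_000.0, 450_000.0, 600_000.0, 750_000.0])
close_preds = np.array([310_000.0, 440_000.0, 610_000.0, 740_000.0])
mean_preds = np.full_like(prices, prices.mean())  # always predicts the average price

print(r2_manual(prices, close_preds), r2_score(prices, close_preds))  # both about 0.9964
print(r2_manual(prices, mean_preds))  # 0.0
```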





---



4. Model degradation could be caused by [various data distribution shifts](https://huyenchip.com/2022/02/07/data-distribution-shifts-and-monitoring.html#data-shift-types): covariate shift (features), label shift and/or concept drift (altered relationship between features and labels).
The terminology for these shifts is not used consistently across the [literature](https://www.sciencedirect.com/science/article/pii/S0950705122002854#tbl1); for this assignment, we stick to the definitions in the first link.
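Following that reference, each shift type is defined by which factor of the joint distribution $P(X, Y)$ changes while the other factor stays fixed:

$$
\begin{aligned}
P(X, Y) &= P(Y \mid X)\,P(X) = P(X \mid Y)\,P(Y) \\
\text{Covariate shift:} &\quad P(X) \text{ changes, } P(Y \mid X) \text{ stays the same} \\
\text{Label shift:} &\quad P(Y) \text{ changes, } P(X \mid Y) \text{ stays the same} \\
\text{Concept drift:} &\quad P(Y \mid X) \text{ changes, } P(X) \text{ stays the same}
\end{aligned}
$$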

> Using the **Alibi Detect** library, apply the **TabularDrift** function with the training data (year 2019 and before) used as the reference and **detect which features have drifted** in the 2023 test dataset. Before running the statistical tests, ensure you **sample 1000 data points** each from the train and test data. Do not use the whole train/test data. (Hint: use this example as a guide https://docs.seldon.io/projects/alibi-detect/en/stable/examples/cd_chi2ks_adult.html)


In [7]:
# Define all column names (categorical + continuous)
all_cols = categorical_cols + continuous_cols

# Define the features and target variables for the whole dataset
X = df[all_cols]
y = df['resale_price']

# Randomly sample 1000 data points from each of the train and test sets, as required (fixed random_state for reproducibility)
train_sample = train_df.sample(n=1000, random_state=SEED)
test_sample = test_df_23.sample(n=1000, random_state=SEED)

# Define the reference variables
X_ref = train_sample[all_cols].values
y_ref = train_sample[target].values

# Define the test variables
X_test = test_sample[all_cols].values
y_test = test_sample[target].values

# Map the index of each categorical column to None so that TabularDrift infers its categories
# from the reference data and applies a Chi-squared test; the remaining columns get a K-S test
categories_per_feature = {i: None for i, col in enumerate(all_cols) if col in categorical_cols}

# Initialize the TabularDrift detector with a p-value threshold
cd = TabularDrift(X_ref, p_val=.05, categories_per_feature=categories_per_feature)

# Predict drift on the test dataset
predict = cd.predict(X_test)
labels = ['No!', 'Yes!']
print('Drift? {}'.format(labels[predict['data']['is_drift']]))
print("Threshold:", predict['data']['threshold'])


# Detect and print drifted features
feature_predict = cd.predict(X_test, drift_type='feature')

for feature in range(cd.n_features):
    stat = 'Chi2' if feature in list(categories_per_feature.keys()) else 'K-S'
    fname = X.columns.values[feature]
    is_drift = feature_predict['data']['is_drift'][feature]
    stat_val, p_val = feature_predict['data']['distance'][feature], feature_predict['data']['p_val'][feature]
    print(f'{fname:<20} \t Drift? {labels[is_drift]} \t {stat:<8} {stat_val:.3f} \t p-value {p_val:.3f}')

Drift? Yes!
Threshold: 0.005
month                	 Drift? Yes! 	 Chi2     430.336 	 p-value 0.000
town                 	 Drift? No! 	 Chi2     33.178 	 p-value 0.127
flat_model_type      	 Drift? Yes! 	 Chi2     62.122 	 p-value 0.001
storey_range         	 Drift? Yes! 	 Chi2     27.842 	 p-value 0.010
dist_to_nearest_stn  	 Drift? No! 	 K-S      0.035 	 p-value 0.561
dist_to_dhoby        	 Drift? No! 	 K-S      0.059 	 p-value 0.059
degree_centrality    	 Drift? No! 	 K-S      0.038 	 p-value 0.455
eigenvector_centrality 	 Drift? No! 	 K-S      0.056 	 p-value 0.084
remaining_lease_years 	 Drift? Yes! 	 K-S      0.163 	 p-value 0.000
floor_area_sqm       	 Drift? Yes! 	 K-S      0.062 	 p-value 0.041
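The threshold printed above (0.005) is the batch-level threshold. It equals 0.05 divided by the 10 features, which is a Bonferroni correction and TabularDrift's default. The per-feature flags instead compare each p-value with the uncorrected 0.05. That is why storey_range (p = 0.010) and floor_area_sqm (p = 0.041) are marked as drifted even though they would not pass the corrected threshold.

The sketch below reproduces this decision logic with SciPy. It uses Chi-squared on category counts and two-sample K-S on continuous values. It is our own approximation, not TabularDrift's source, so its p-values may differ slightly, and the data is synthetic.

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp


def univariate_drift(x_ref, x_test, categorical_idx, p_val=0.05):
    """Per-feature drift tests in the spirit of TabularDrift (a sketch, not its implementation)."""
    n_features = x_ref.shape[1]
    results = []
    for f in range(n_features):
        ref_col, test_col = x_ref[:, f], x_test[:, f]
        if f in categorical_idx:
            # Categories come from the reference sample; other test values are not counted here
            cats = np.unique(ref_col)
            table = np.array([
                [np.sum(ref_col == c) for c in cats],
                [np.sum(test_col == c) for c in cats],
            ])
            stat, p, _, _ = chi2_contingency(table)
            name = "Chi2"
        else:
            ks = ks_2samp(ref_col.astype(float), test_col.astype(float))
            stat, p = ks[0], ks[1]
            name = "K-S"
        # Feature-level decision: compare with the uncorrected p_val
        results.append((f, name, float(stat), float(p), bool(p < p_val)))
    # Batch-level decision: Bonferroni correction divides p_val by the number of tests
    threshold = p_val / n_features
    batch_drift = any(p < threshold for _, _, _, p, _ in results)
    return results, threshold, batch_drift


# Synthetic demo: feature 0 categorical (shifted), feature 1 unchanged, feature 2 mean-shifted
rng = np.random.default_rng(42)
n = 1000
x_ref = np.column_stack([
    rng.choice([0, 1, 2], size=n, p=[0.5, 0.3, 0.2]),
    rng.normal(0.0, 1.0, size=n),
    rng.normal(0.0, 1.0, size=n),
])
x_new = np.column_stack([
    rng.choice([0, 1, 2], size=n, p=[0.2, 0.3, 0.5]),
    rng.normal(0.0, 1.0, size=n),
    rng.normal(0.5, 1.0, size=n),
])
results, threshold, batch_drift = univariate_drift(x_ref, x_new, categorical_idx={0})
for f, name, stat, p, drift in results:
    print(f"feature {f}: {name:<4} stat={stat:.3f} p={p:.3g} drift={drift}")
print("batch threshold:", threshold, "batch drift:", batch_drift)
```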


5. Assuming that the flurry of housing measures has made an impact on the relationship between all the features and resale_price (i.e. P(Y|X) changes), which type of data distribution shift possibly led to model degradation?


In [8]:
observation_q5 = """
Under the stated assumption that the housing measures have changed the relationship between the features and resale_price (i.e. P(Y|X) has changed), the shift that most plausibly led to model degradation is concept drift. Flats with the same characteristics (town, flat type, floor area, remaining lease, etc.) now sell at different prices than they did during the training period, likely because of the new cooling measures. Note that TabularDrift only tests the marginal distribution of each feature, P(X), so it cannot detect concept drift directly. It did, however, flag drift in several features (month, flat_model_type, storey_range, remaining_lease_years and floor_area_sqm), so P(X) has changed as well, and some covariate shift is likely present alongside the concept drift.
"""
print(f'Observation: {observation_q5}')



Observation: 
Under the stated assumption that the housing measures have changed the relationship between the features and resale_price (i.e. P(Y|X) has changed), the shift that most plausibly led to model degradation is concept drift. Flats with the same characteristics (town, flat type, floor area, remaining lease, etc.) now sell at different prices than they did during the training period, likely because of the new cooling measures. Note that TabularDrift only tests the marginal distribution of each feature, P(X), so it cannot detect concept drift directly. It did, however, flag drift in several features (month, flat_model_type, storey_range, remaining_lease_years and floor_area_sqm), so P(X) has changed as well, and some covariate shift is likely present alongside the concept drift.
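The toy example below shows how concept drift on its own can degrade a model. It uses hypothetical synthetic data, not the HDB dataset. P(X) is identical in both periods, but the slope and intercept linking x to y change. A linear model fitted on the old period loses its accuracy on the new period; here its R2 even turns negative. Refitting on new-period data restores accuracy, which is the idea behind the retraining in question 7.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)


def sample_period(n, slope, intercept):
    """Draw x from the same P(X) in every period; only P(Y|X) (slope, intercept) varies."""
    x = rng.uniform(0, 10, size=(n, 1))
    y = slope * x[:, 0] + intercept + rng.normal(0, 1, size=n)
    return x, y


# "Old" regime used for training, and a "new" regime after a hypothetical policy change
x_old_train, y_old_train = sample_period(2000, slope=2.0, intercept=0.0)
x_old_test, y_old_test = sample_period(2000, slope=2.0, intercept=0.0)
x_new_train, y_new_train = sample_period(2000, slope=3.0, intercept=5.0)
x_new_test, y_new_test = sample_period(2000, slope=3.0, intercept=5.0)

old_model = LinearRegression().fit(x_old_train, y_old_train)
r2_old_on_old = r2_score(y_old_test, old_model.predict(x_old_test))
r2_old_on_new = r2_score(y_new_test, old_model.predict(x_new_test))

# Retrain on data from the new regime
new_model = LinearRegression().fit(x_new_train, y_new_train)
r2_new_on_new = r2_score(y_new_test, new_model.predict(x_new_test))

print(f"old model, old data: {r2_old_on_old:.3f}")
print(f"old model, new data: {r2_old_on_new:.3f}")  # negative: worse than predicting the mean
print(f"retrained model, new data: {r2_new_on_new:.3f}")
```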



6. From your analysis via TabularDrift, which features contribute to this shift?


In [9]:
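observation_q6 = """
TabularDrift compares only the marginal distribution of each feature between the reference data (2019 and before) and the 2023 sample, so it cannot attribute a change in P(Y|X) to individual features. What it does show is which feature distributions have shifted. At the per-feature level (p < 0.05), month, flat_model_type, storey_range, remaining_lease_years and floor_area_sqm have drifted; month, flat_model_type and remaining_lease_years remain significant even under the stricter Bonferroni-corrected threshold of 0.005. These are the features contributing to the detected shift. In contrast, town, dist_to_nearest_stn, dist_to_dhoby, degree_centrality and eigenvector_centrality show no significant drift.
"""
print(f'Observation: {observation_q6}')

Observation: 
TabularDrift compares only the marginal distribution of each feature between the reference data (2019 and before) and the 2023 sample, so it cannot attribute a change in P(Y|X) to individual features. What it does show is which feature distributions have shifted. At the per-feature level (p < 0.05), month, flat_model_type, storey_range, remaining_lease_years and floor_area_sqm have drifted; month, flat_model_type and remaining_lease_years remain significant even under the stricter Bonferroni-corrected threshold of 0.005. These are the features contributing to the detected shift. In contrast, town, dist_to_nearest_stn, dist_to_dhoby, degree_centrality and eigenvector_centrality show no significant drift.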



7. Suggest 1 way to address model degradation and implement it, showing improved test R2 for year 2023.


In [10]:
observation_q7 = """
Since we have identified concept drift as the main cause of model degradation, one way to address it is to retrain the model on more recent data so that it learns the updated relationship between the features and the target variable. Concretely, we retrain the model on all data from 2022 and before and test it on the 2023 data. The 2020-2022 data is likely to be more similar to the 2023 data than the data from 2019 and before. As a result, the retrained model should capture the new patterns better and improve its predictive accuracy.
"""
print(f'Observation: {observation_q7}')

Observation: 
Since we have identified concept drift as the main cause of model degradation, one way to address it is to retrain the model on more recent data so that it learns the updated relationship between the features and the target variable. Concretely, we retrain the model on all data from 2022 and before and test it on the 2023 data. The 2020-2022 data is likely to be more similar to the 2023 data than the data from 2019 and before. As a result, the retrained model should capture the new patterns better and improve its predictive accuracy.
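Question 7 retrains on an expanding window, using every year up to 2022. The third table in Appendix A instead compares single training years at increasing distance from the test year. The hypothetical helper below makes this choice explicit through a `window` argument. With the notebook's `df`, `year_split(df, 2023)` selects the same rows as `df_train_22` and `test_df_23`. Whether a short sliding window or the full history works better for the HDB data would need to be checked empirically.

```python
import pandas as pd


def year_split(data, test_year, window=None, year_col="year"):
    """Return (train, test): test on test_year, train on earlier years only.

    window=None -> expanding window: every year before test_year.
    window=k    -> sliding window: only the k most recent years before test_year.
    """
    train = data[data[year_col] < test_year]
    if window is not None:
        train = train[train[year_col] >= test_year - window]
    test = data[data[year_col] == test_year]
    return train, test


split_demo = pd.DataFrame({"year": list(range(2017, 2024)), "resale_price": range(7)})
train_all, test_part = year_split(split_demo, 2023)
train_recent, _ = year_split(split_demo, 2023, window=2)
print(sorted(train_all["year"].tolist()))     # [2017, 2018, 2019, 2020, 2021, 2022]
print(sorted(train_recent["year"].tolist()))  # [2021, 2022]
```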



In [11]:
# Previously, we trained the model on data from 2019 and before. We will now train the model on data from 2022 and before, and test it on data from 2023.
df_train_22 = df[df['year'] <= 2022]

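# Train the model on the new training data. No validation set is passed, so pytorch_tabular
# holds out a random split of the training data (DataConfig's validation_split) for validation
model.fit(train=df_train_22)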

# Test the model on the 2023 data
test_predictions_23 = model.predict(test_df_23)

# Evaluate the model
result = model.evaluate(test_df_23)

Seed set to 42


GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory /Users/kristiyancholakov/Programming/PyCharmProjects/SC4001/Programming Assignment/saved_models exists and is not empty.
/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.
/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.


`Trainer.fit` stopped: `max_steps=100` reached.
Learning rate set to 0.6918309709189363
Restoring states from the checkpoint path at /Users/kristiyancholakov/Programming/PyCharmProjects/SC4001/Programming Assignment/.lr_find_36b74edf-d9a0-4b1b-a72f-1f503256cfac.ckpt
Restored all states from the checkpoint at /Users/kristiyancholakov/Programming/PyCharmProjects/SC4001/Programming Assignment/.lr_find_36b74edf-d9a0-4b1b-a72f-1f503256cfac.ckpt



/opt/anaconda3/envs/sc4001_assignment/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=9` in the `DataLoader` to improve performance.


In [12]:
# Compute the R2 score

# Define the targets and predictions
y_true = test_df_23['resale_price']
y_pred = test_predictions_23['resale_price_prediction']  # Prediction column returned by model.predict

print(f"Test R2 on the 2023 data (trained on data up to 2022): {r2_score(y_true, y_pred)}")

Test R2 on the 2023 data (trained on data up to 2022): 0.5259192150847298


In [13]:
observation_q7_2 = f"""
Retraining on all data from 2022 and before raises the 2023 test R2 from {r2_23:.4f} to {r2_score(y_true, y_pred):.4f}. This suggests that including the more recent data helps the model capture the new relationship between the features and resale prices, which substantially improves its predictive accuracy on 2023.
"""
print(f'Observation: {observation_q7_2}')

Observation: 
Retraining on all data from 2022 and before raises the 2023 test R2 from 0.0880 to 0.5259. This suggests that including the more recent data helps the model capture the new relationship between the features and resale prices, which substantially improves its predictive accuracy on 2023.



### Appendix A



Here are our results from a linear regression model. We used StandardScaler for continuous variables and OneHotEncoder for categorical variables.
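The report describes the baseline only as a linear regression, with StandardScaler for continuous variables and OneHotEncoder for categorical variables. A minimal scikit-learn sketch of such a pipeline follows. The co-investigators' actual code is not shown, so two choices here are our assumptions: the `ColumnTransformer` layout and `handle_unknown="ignore"`. The latter lets test years contain categories unseen in training. The demo data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_linear_baseline(categorical_cols, continuous_cols):
    """Linear regression on standardised continuous and one-hot categorical features."""
    preprocess = ColumnTransformer([
        ("continuous", StandardScaler(), continuous_cols),
        # Assumption: categories unseen at training time are encoded as all zeros instead of raising
        ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("regressor", LinearRegression())])


# Synthetic demo data (hypothetical): price depends on floor area and a per-town premium
rng = np.random.default_rng(0)
n = 500
baseline_demo = pd.DataFrame({
    "town": rng.choice(["A", "B", "C"], size=n),
    "floor_area_sqm": rng.uniform(40, 140, size=n),
})
town_premium = baseline_demo["town"].map({"A": 0.0, "B": 50_000.0, "C": 100_000.0})
baseline_demo["resale_price"] = (
    3_000 * baseline_demo["floor_area_sqm"] + town_premium + rng.normal(0, 10_000, size=n)
)

features = ["town", "floor_area_sqm"]
baseline = make_linear_baseline(categorical_cols=["town"], continuous_cols=["floor_area_sqm"])
baseline.fit(baseline_demo[features], baseline_demo["resale_price"])
baseline_r2 = baseline.score(baseline_demo[features], baseline_demo["resale_price"])  # in-sample R2
print(f"In-sample R2: {baseline_r2:.3f}")

# A town never seen during training still gets a prediction instead of an error
unseen = baseline.predict(pd.DataFrame({"town": ["D"], "floor_area_sqm": [90.0]}))
print(unseen)
```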

While 2021 data can be predicted well, test R2 dropped rapidly for 2022 and 2023.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| Year <= 2020 | 2021     | 0.76    |
| Year <= 2020 | **2022**     | 0.41    |
| Year <= 2020 | **2023**     | **0.10**   |



Similarly, a model trained on 2017 data can predict 2018-2021 well (with slight degradation in performance for 2021), but its performance drops drastically for 2022 and 2023.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2017         | 2018     | 0.90    |
|              | 2019     | 0.89    |
|              | 2020     | 0.87    |
|              | 2021     | 0.72    |
|              | **2022**     | **0.37**    |
|              | **2023**     | **0.09**    |

With the test set fixed at year 2021, training on data from 2017-2020 still works well on the test data, with minimal degradation. Training sets closer to year 2021 generally do better.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2020         | 2021     | 0.81    |
| 2019         | 2021     | 0.75    |
| 2018         | 2021     | 0.73    |
| 2017         | 2021     | 0.72    |