# Question B4 (10 marks)

Model degradation is a common issue when deploying machine learning models (including neural networks) in the real world. New data points could exhibit a different pattern from older data points due to factors such as changes in government policy or market sentiment. For instance, housing prices in Singapore have been increasing, and the Singapore government has introduced three rounds of cooling measures in recent years (16 December 2021, 30 September 2022, 27 April 2023).

In such situations, the distribution of the new data points could differ from the original data distribution on which the models were trained. Recall that machine learning models often assume that the test distribution is similar to the training distribution. When this assumption is violated, model performance is adversely affected. In the last part of this assignment, we will investigate to what extent model degradation has occurred.




---



Your co-investigators used a linear regression model to rapidly test several combinations of train/test splits and shared their findings with you in a brief report, attached in Appendix A below. You wish to investigate whether your deep learning model corroborates their findings.

In [None]:
!pip install alibi-detect

In [None]:
SEED = 42

import os

import random
random.seed(SEED)

import numpy as np
np.random.seed(SEED)

import pandas as pd

from alibi_detect.cd import TabularDrift

1. Evaluate your model from B1 on data from year 2022 and report the test R2.

In [None]:
df = pd.read_csv('hdb_price_prediction.csv')

# TODO: Enter your code here
import pytorch_tabular
from sklearn.metrics import r2_score

model = pytorch_tabular.tabular_model.TabularModel.load_model('models/b1')

test_data = df[df['year'] == 2022]
predictions = model.predict(test_data)

print("R2: ", r2_score(test_data['resale_price'], predictions['resale_price_prediction']))
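
For reference, the test R2 reported by `r2_score` is the coefficient of determination. In the formula below, $y_i$ are the actual resale prices in the test set, $\hat{y}_i$ are the model's predictions, and $\bar{y}$ is the mean of the actual test prices:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A value of 1 means perfect prediction, 0 matches always predicting the test-set mean, and negative values are possible when the model does worse than that baseline.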

2. Evaluate your model from B1 on data from year 2023 and report the test R2.

In [None]:
# TODO: Enter your code here
test_data = df[df['year'] == 2023]
predictions = model.predict(test_data)

print("R2: ", r2_score(test_data['resale_price'], predictions['resale_price_prediction']))

3. Did model degradation occur for the deep learning model?


In [None]:
# YOUR ANSWER HERE
"""

"""



---



4. Model degradation could be caused by [various data distribution shifts](https://huyenchip.com/2022/02/07/data-distribution-shifts-and-monitoring.html#data-shift-types): covariate shift (a change in the distribution of the features), label shift (a change in the distribution of the labels) and/or concept drift (an altered relationship between features and labels).
The terminology in the [literature](https://www.sciencedirect.com/science/article/pii/S0950705122002854#tbl1) is not consistent. Let’s stick to the first reference for this assignment.

> Using the **Alibi Detect** library, apply the **TabularDrift** function with the training data (year 2019 and before) used as the reference and **detect which features have drifted** in the 2023 test dataset. Before running the statistical tests, ensure you **sample 1000 data points** each from the train and test data. Do not use the whole train/test data. (Hint: use this example as a guide https://docs.seldon.io/projects/alibi-detect/en/stable/examples/cd_chi2ks_adult.html)


In [2]:
# YOUR CODE HERE
target = ["resale_price"]
categorical_cols = ["month", "town", "flat_model_type", "storey_range"]
continuous_cols = [
    "dist_to_nearest_stn",
    "dist_to_dhoby",
    "degree_centrality",
    "eigenvector_centrality",
    "remaining_lease_years",
    "floor_area_sqm",
]

# Extract unique categories for each categorical column
category_map = {}
for i, col in enumerate(categorical_cols):
    category_map[i] = df[col].unique().tolist()

n_train = 1000
n_test = 1000

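# Reference data: training years (2019 and before). Test data: year 2023.
X_train = df[df["year"] <= 2019]
X_test = df[df["year"] == 2023]

# Draw a random sample of 1000 rows from each split.
# (Slicing the first rows, e.g. X_train[:n_train], is not a random sample.)
X_train = X_train.sample(n=n_train, random_state=SEED)
X_test = X_test.sample(n=n_test, random_state=SEED)

# Labels are taken from the same sampled rows as the features
y_ref = X_train[target].values
y_test = X_test[target].values

X_ref = X_train[categorical_cols + continuous_cols].values
X_test = X_test[categorical_cols + continuous_cols].values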

categories_per_feature = {f: None for f in list(category_map.keys())}
cd = TabularDrift(
    X_ref, p_val=0.05, categories_per_feature=categories_per_feature
)

predictions = cd.predict(X_test)
labels = ['No','Yes']
print('Drift? {}'.format(labels[predictions['data']['is_drift']]))

fpreds = cd.predict(X_test, drift_type='feature')
feature_names = categorical_cols + continuous_cols

for f in range(cd.n_features):
    stat = 'Chi2' if f in list(categories_per_feature.keys()) else 'K-S'
    fname = feature_names[f]
    is_drift = fpreds['data']['is_drift'][f]
    stat_val, p_val = fpreds['data']['distance'][f], fpreds['data']['p_val'][f]
    print(f'{fname} -- Drift? {labels[is_drift]} -- {stat} {stat_val:.3f} -- p-value {p_val:.3f}')
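
The loop above reports a Chi2 statistic for categorical features and a K-S statistic for continuous ones. As a rough illustration of what the K-S number measures, here is a minimal NumPy sketch of the two-sample K-S statistic, i.e. the largest gap between two empirical CDFs. The helper `ks_statistic` and the synthetic arrays are hypothetical stand-ins, not part of Alibi Detect or the HDB dataset, and the sketch does not compute the p-value that `TabularDrift` also returns.

```python
import numpy as np

def ks_statistic(ref, test):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    ref = np.sort(np.asarray(ref, dtype=float))
    test = np.sort(np.asarray(test, dtype=float))
    # Evaluate both empirical CDFs at every observed value
    grid = np.concatenate([ref, test])
    cdf_ref = np.searchsorted(ref, grid, side="right") / ref.size
    cdf_test = np.searchsorted(test, grid, side="right") / test.size
    return float(np.max(np.abs(cdf_ref - cdf_test)))

rng = np.random.default_rng(42)
# Synthetic stand-ins for one continuous feature (assumption: not the HDB data)
reference = rng.normal(loc=90.0, scale=20.0, size=1000)
same_dist = rng.normal(loc=90.0, scale=20.0, size=1000)
shifted = rng.normal(loc=110.0, scale=20.0, size=1000)

ks_same = ks_statistic(reference, same_dist)
ks_shifted = ks_statistic(reference, shifted)
print(f"K-S (same distribution):    {ks_same:.3f}")
print(f"K-S (shifted distribution): {ks_shifted:.3f}")
```

A statistic near 0 means the two samples have nearly identical empirical distributions, while a value near 1 means they barely overlap.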

5. Assuming that the flurry of housing measures has affected the relationship between all the features and resale_price (i.e. P(Y|X) changes), which type of data distribution shift possibly led to model degradation?


In [3]:
# YOUR ANSWER HERE
"""
Concept drift. If the housing measures change the relationship between the features and resale_price, then P(Y|X) changes: a flat with the same features (X) now sells at a different price (Y). The model learned the old relationship from its training data, so its predictions degrade even though the set of input features (X) used by the model did not change. This is a case of concept drift, which possibly led to the model degradation.
"""

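To make the P(Y|X) argument concrete, here is a minimal sketch on synthetic data. It assumes a single feature with a linear price relationship, which is a simplification and not the HDB model. The feature distribution is identical in both periods; only the intercept of the relationship changes, yet R2 for a model fitted on the old period collapses on the new one. All names (`make_data`, `r2`) are hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination, computed by hand."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)

def make_data(n, slope, intercept):
    # P(X) is the same in every period: uniform on [0, 10]
    x = rng.uniform(0.0, 10.0, size=n)
    y = slope * x + intercept + rng.normal(0.0, 1.0, size=n)
    return x, y

# Old period: training data and a held-out test set from the same relationship
x_old, y_old = make_data(1000, slope=2.0, intercept=5.0)
x_old_test, y_old_test = make_data(1000, slope=2.0, intercept=5.0)
# New period: same P(X), but P(Y|X) has shifted upwards (assumed intercept change)
x_new, y_new = make_data(1000, slope=2.0, intercept=15.0)

coef = np.polyfit(x_old, y_old, deg=1)  # fit a line on the old period only
r2_old = r2(y_old_test, np.polyval(coef, x_old_test))
r2_new = r2(y_new, np.polyval(coef, x_new))
print(f"R2 on old-period test data: {r2_old:.2f}")
print(f"R2 on new-period data:      {r2_new:.2f}")
```

Because the inputs are drawn from the same distribution in both periods, the drop in R2 here comes entirely from the changed relationship between x and y.
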
6. From your analysis via TabularDrift, which features contribute to this shift?


In [5]:
# YOUR ANSWER HERE
[
    "town",
    "flat_model_type",
    "storey_range",
    "dist_to_dhoby",
    "eigenvector_centrality",
    "remaining_lease_years",
    "floor_area_sqm"
]

7. Suggest 1 way to address model degradation and implement it, showing improved test R2 for year 2023.


In [None]:
# YOUR CODE HERE
# Retrain on the most recent full year before 2023 (i.e. 2022). Appendix A
# shows that training sets closer to the test year generally do better, so
# 2022 data is likely more representative of 2023 than older data.
model.fit(train=df[df['year'] == 2022])

test_data = df[df['year'] == 2023]

predictions = model.predict(test_data)

print("R2: ", r2_score(test_data['resale_price'], predictions['resale_price_prediction']))
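
The idea behind retraining on recent data can be sketched on synthetic data. The example below is a sketch under stated assumptions, not a reproduction of the HDB results: it assumes a single feature and a price level that rises by a fixed amount each year after 2020, and it uses a plain least-squares line instead of the B1 model. The helpers `simulate_year`, `fit_on` and `r2` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_year(year, n=500):
    """Synthetic data; assumption: the price level rises each year after 2020."""
    x = rng.uniform(0.0, 10.0, size=n)
    level = 5.0 + 4.0 * max(0, year - 2020)
    y = 2.0 * x + level + rng.normal(0.0, 1.0, size=n)
    return x, y

def r2(y_true, y_pred):
    """Coefficient of determination, computed by hand."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

data = {year: simulate_year(year) for year in range(2017, 2024)}

def fit_on(years):
    """Fit a least-squares line on the pooled data of the given years."""
    xs = np.concatenate([data[yr][0] for yr in years])
    ys = np.concatenate([data[yr][1] for yr in years])
    return np.polyfit(xs, ys, deg=1)

x_2023, y_2023 = data[2023]
stale_model = fit_on(range(2017, 2020))   # trained on 2017-2019 only
recent_model = fit_on([2022])             # retrained on the latest full year

r2_stale = r2(y_2023, np.polyval(stale_model, x_2023))
r2_recent = r2(y_2023, np.polyval(recent_model, x_2023))
print(f"2023 R2, trained on 2017-2019: {r2_stale:.2f}")
print(f"2023 R2, trained on 2022:      {r2_recent:.2f}")
```

In this toy setting the retrained model still lags the 2023 price level by one year of drift, so retraining narrows the gap without removing it.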

### Appendix A



Here are our results from a linear regression model. We used StandardScaler for continuous variables and OneHotEncoder for categorical variables.

While 2021 data can be predicted well, test R2 dropped rapidly for 2022 and 2023.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| Year <= 2020 | 2021     | 0.76    |
| Year <= 2020 | **2022**     | 0.41    |
| Year <= 2020 | **2023**     | **0.10**   |



Similarly, a model trained on 2017 data can predict 2018-2021 well (with slight degradation in performance for 2021), but its test R2 drops drastically for 2022 and 2023.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2017         | 2018     | 0.90    |
|              | 2019     | 0.89    |
|              | 2020     | 0.87    |
|              | 2021     | 0.72    |
|              | **2022**     | **0.37**    |
|              | **2023**     | **0.09**    |

With the test set fixed at year 2021, training on data from 2017-2020 still works well on the test data, with minimal degradation. Training sets closer to year 2021 generally do better.

| Training set | Test set | Test R2 |
|--------------|----------|---------|
| 2020         | 2021     | 0.81    |
| 2019         | 2021     | 0.75    |
| 2018         | 2021     | 0.73    |
| 2017         | 2021     | 0.72    |