## Topic 1: Data Preprocessing & Evaluation


### 1.1 Evaluation Metrics (MAPE) (Solution for Q1.1)
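Key Concept: RMSE is the standard metric. MAPE is trickier: it divides by the actual rating, so an actual rating of 0 causes a division by zero (NumPy returns inf or nan with a RuntimeWarning rather than raising an error). We handle this by clamping the denominator with a small epsilon or by masking out the zero entries.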

In [24]:
import numpy as np

def MAPE(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0
    return mape
actual_ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.5])
predicted_ratings = np.array([3.8, 3.2, 4.8, 2.2, 4.2])

print("MAPE:", MAPE(actual_ratings, predicted_ratings))


MAPE: 6.4666666666666694


### (Solution for Q1.2)

In [26]:
import numpy as np
import pandas as pd

def MAPE_ignore_zero(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = (y_true != 0)
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100.0

actual_ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.5, 0.0])
predicted_ratings = np.array([3.8, 3.2, 4.8, 2.2, 4.2, 1.0])
print("Tolerant MAPE (ignore zeros):", MAPE_ignore_zero(actual_ratings, predicted_ratings))




Tolerant MAPE (ignore zeros): 6.4666666666666694
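
The Key Concept also mentions an epsilon as an alternative to masking. A minimal sketch of that variant follows; the name MAPE_eps and the default eps=1e-8 are assumptions for illustration, not part of the original solution.

```python
import numpy as np

def MAPE_eps(y_true, y_pred, eps=1e-8):
    # Hypothetical helper: clamp the denominator away from zero instead of masking
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.maximum(np.abs(y_true), eps)
    return np.mean(np.abs(y_true - y_pred) / denom) * 100.0

actual_ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.5])
predicted_ratings = np.array([3.8, 3.2, 4.8, 2.2, 4.2])

# No zeros present: identical to the plain MAPE
mape_no_zero = MAPE_eps(actual_ratings, predicted_ratings)

# One zero rating: the result is finite but dominated by that single entry
mape_with_zero = MAPE_eps(np.append(actual_ratings, 0.0),
                          np.append(predicted_ratings, 1.0))

print("MAPE (eps):", mape_no_zero)
print("MAPE (eps, with a zero rating):", mape_with_zero)
```

With a tiny eps, a single zero rating contributes error/eps to the mean, so the score explodes. That is why masking is often the more interpretable choice for rating data.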


### 1.2 Data Loading & ID Encoding
Key Concept: Recommender models (SVD/Neural) index their factor or embedding matrices by ID, so User and Item IDs must be contiguous integers starting from 0. We use LabelEncoder for this.

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# 1. Load Data (Example)
# df = pd.read_csv('filepath.csv')
# Assume df has columns: 'user_id', 'item_id', 'rating'

# 2. Encode IDs (CRITICAL STEP)
# We combine train and test sets to ensure the encoder knows ALL users/items
user_le = LabelEncoder()
all_users = df['user_id'].unique()
user_le.fit(all_users)
df['user_id'] = user_le.transform(df['user_id'])
n_users = len(user_le.classes_)

item_le = LabelEncoder()
all_items = df['item_id'].unique()
item_le.fit(all_items)
df['item_id'] = item_le.transform(df['item_id'])
n_items = len(item_le.classes_)

# 3. Split Data
X = df[['user_id', 'item_id']].values
y = df['rating'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

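Note: as written, this cell raises "NameError: name 'df' is not defined" because the pd.read_csv line is commented out. Load a DataFrame with user_id, item_id, and rating columns before running it.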
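
To see what the encoding step produces, here is a minimal sketch on a toy DataFrame. The raw IDs and ratings are hypothetical, and the encoded values are written to new columns (user_idx, item_idx, names chosen for illustration) so the original IDs remain visible.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: raw IDs are sparse integers / strings
df = pd.DataFrame({
    "user_id": [1001, 1001, 2050, 37, 2050],
    "item_id": ["b", "a", "b", "c", "a"],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.5],
})

user_le = LabelEncoder()
item_le = LabelEncoder()

# LabelEncoder sorts the unique values and maps them to 0..n-1
df["user_idx"] = user_le.fit_transform(df["user_id"])
df["item_idx"] = item_le.fit_transform(df["item_id"])

n_users = len(user_le.classes_)
n_items = len(item_le.classes_)

# The mapping is reversible, which is needed to report recommendations by raw ID
original_users = user_le.inverse_transform(df["user_idx"])

print(df)
print("n_users:", n_users, "n_items:", n_items)
```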

## Topic 2: Baseline Methods

### 2.1 Median-Based Recommender (Solution for Q2.1)
Key Concept: Instead of the mean, we predict with the median (computed below with pandas' groupby(...).median()), which is robust to outliers.
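
A quick illustration of the robustness claim, using a hypothetical list of ratings in which one value is an outlier:

```python
import numpy as np

# Hypothetical ratings from one user; the 1.0 is an outlier
ratings = np.array([4.0, 4.0, 5.0, 4.0, 1.0])

mean_rating = np.mean(ratings)      # pulled down by the outlier
median_rating = np.median(ratings)  # stays at the typical value

print("mean:", mean_rating)
print("median:", median_rating)
```

The single low rating drags the mean to 3.6, while the median stays at 4.0.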

In [27]:

train_url = 'https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/train.csv'
test_url  = 'https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/test.csv'

In [41]:
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.metrics import mean_squared_error

# ======================================================
# Load dataset
# ======================================================
train = pd.read_csv('https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/train.csv')
test = pd.read_csv('https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/test.csv')

# ======================================================
# 1. USER MEDIAN RECOMMENDER
# ======================================================

class UserMedianRS(BaseEstimator):

    def fit(self, train_df):
        self.user_median = train_df.groupby("user_id")["rating"].median().to_dict()
        self.global_median = train_df["rating"].median()
        return self

    def predict(self, df):
        preds = []
        for uid in df["user_id"]:
            if uid in self.user_median:
                preds.append(self.user_median[uid])
            else:
                preds.append(self.global_median)
        return np.array(preds)


# ======================================================
# 2. ITEM MEDIAN RECOMMENDER
# ======================================================

class ItemMedianRS(BaseEstimator):

    def fit(self, train_df):
        self.item_median = train_df.groupby("movie_id")["rating"].median().to_dict()
        self.global_median = train_df["rating"].median()
        return self

    def predict(self, df):
        preds = []
        for mid in df["movie_id"]:
            if mid in self.item_median:
                preds.append(self.item_median[mid])
            else:
                preds.append(self.global_median)
        return np.array(preds)


# ======================================================
# 3. FITTING + PREDICTING + RMSE
# ======================================================

def rmse(true, pred):
    return np.sqrt(mean_squared_error(true, pred))


# ----- User Median -----
user_model = UserMedianRS()
user_model.fit(train)

user_pred = user_model.predict(test)
user_rmse = rmse(test["rating"], user_pred)

# ----- Item Median -----
item_model = ItemMedianRS()
item_model.fit(train)

item_pred = item_model.predict(test)
item_rmse = rmse(test["rating"], item_pred)

print(f"User Median RS RMSE: {user_rmse:.4f}")
print(f"Item Median RS RMSE: {item_rmse:.4f}")


User Median RS RMSE: 1.0840
Item Median RS RMSE: 1.1090
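
The predict loops above can also be written without an explicit Python loop. The sketch below is one possible vectorized form of the user-median rule with the same fallback to the global median; the function name predict_user_median and the toy data are assumptions for illustration.

```python
import pandas as pd

def predict_user_median(train_df, test_df):
    # Per-user medians as a Series indexed by user_id
    user_median = train_df.groupby("user_id")["rating"].median()
    global_median = train_df["rating"].median()
    # map() yields NaN for unseen users; fillna() applies the global fallback
    return test_df["user_id"].map(user_median).fillna(global_median).to_numpy()

# Hypothetical toy data: user 7 never appears in training
toy_train = pd.DataFrame({"user_id": [0, 0, 0, 1, 1],
                          "rating":  [4.0, 5.0, 1.0, 2.0, 3.0]})
toy_test = pd.DataFrame({"user_id": [0, 1, 7]})

toy_pred = predict_user_median(toy_train, toy_test)
print(toy_pred)
```

The same pattern applies to the item-median model by grouping on movie_id instead.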


### 2.2 Hybrid Baseline (Average of User & Item Models) (Solution for Q2.2)
Key Concept: Combine predictions from two models to reduce variance.

In [42]:
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.metrics import mean_squared_error

# ======================================================
# Load dataset
# ======================================================
train = pd.read_csv('https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/train.csv')
test = pd.read_csv('https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/test.csv')

# ======================================================
# 1. Average User–Item Median Recommender
# ======================================================

class AveUserItemMedianRS(BaseEstimator):

    def fit(self, train_df):
        self.user_median = train_df.groupby("user_id")["rating"].median().to_dict()
        self.item_median = train_df.groupby("movie_id")["rating"].median().to_dict()
        self.global_median = train_df["rating"].median()

        return self


    def predict(self, df):
        preds = []

        for uid, mid in zip(df["user_id"], df["movie_id"]):
            if uid in self.user_median:
                u_med = self.user_median[uid]
            else:
                u_med = self.global_median

            if mid in self.item_median:
                i_med = self.item_median[mid]
            else:
                i_med = self.global_median

            preds.append(0.5 * (u_med + i_med))

        return np.array(preds)


# ======================================================
# 2. RMSE Helper
# ======================================================

def rmse(true, pred):
    return np.sqrt(mean_squared_error(true, pred))


# ======================================================
# 3. FIT, PREDICT, EVALUATE
# ======================================================

model = AveUserItemMedianRS()
model.fit(train)

pred = model.predict(test)
score = rmse(test["rating"], pred)

print(f"AveUserItemMedianRS RMSE: {score:.4f}")


AveUserItemMedianRS RMSE: 1.0032


## Topic 3: Matrix Factorization (SVD)

### 3.1 Standard SVD (ALS with Ridge) (Solution for Q3.1)
Key Concept: Alternating Least Squares (ALS). Fix the user matrix $P$ and solve for the item matrix $Q$; then fix $Q$ and solve for $P$. With one side fixed, each row update is a ridge regression.
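
The alternation can be sketched in plain NumPy. This is an illustrative implementation under stated assumptions, not the TabRS code: the function names, the toy data, and the exact form of the penalty (an unscaled lam times the squared norms) are all choices made here, and TabRS may scale the regularization differently.

```python
import numpy as np

def ridge_rows(fixed, solve_idx, fixed_idx, y, n_rows, lam):
    """With `fixed` held constant, solve one ridge regression per row."""
    K = fixed.shape[1]
    out = np.zeros((n_rows, K))              # rows with no ratings stay at zero
    for row in np.unique(solve_idx):
        mask = solve_idx == row
        A = fixed[fixed_idx[mask]]           # factors of the rated counterparts
        b = y[mask]
        # Closed-form ridge solution: (A^T A + lam I)^{-1} A^T b
        out[row] = np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
    return out

def objective(P, Q, X, y, lam):
    pred = np.sum(P[X[:, 0]] * Q[X[:, 1]], axis=1)
    return np.sum((y - pred) ** 2) + lam * (np.sum(P ** 2) + np.sum(Q ** 2))

def als(X, y, n_users, n_items, K=2, lam=0.1, n_iter=5, seed=0):
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, K))
    Q = 0.1 * rng.standard_normal((n_items, K))
    history = []
    for _ in range(n_iter):
        P = ridge_rows(Q, X[:, 0], X[:, 1], y, n_users, lam)   # fix Q, solve P
        Q = ridge_rows(P, X[:, 1], X[:, 0], y, n_items, lam)   # fix P, solve Q
        history.append(objective(P, Q, X, y, lam))
    return P, Q, history

# Toy (user, item) pairs and ratings, for illustration only
X = np.array([[0, 0], [0, 2], [1, 1], [1, 2], [2, 1], [2, 3], [3, 0], [3, 3]])
y = np.array([3.0, 5.0, 5.0, 3.0, 4.0, 2.0, 1.0, 3.0])

P, Q, history = als(X, y, n_users=4, n_items=4)
print([round(h, 4) for h in history])
```

Because each half-step exactly minimizes the regularized objective over its own block, the recorded objective never increases from one iteration to the next.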

In [43]:
!wget -O TabRS.py https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/src/TabRS.py
from TabRS import SVD





In [45]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from TabRS import SVD

def rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))


train = pd.read_csv("https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/train.csv")
test = pd.read_csv("https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/test.csv")

X_train = train[["user_id", "movie_id"]].values
y_train = train["rating"].values

X_test  = test[["user_id", "movie_id"]].values
y_test  = test["rating"].values


# ======================================================
# Initialize SVD Model
# ======================================================
n_users = train["user_id"].max() + 1
n_items = train["movie_id"].max() + 1

model = SVD(
    n_users=n_users,
    n_items=n_items,
    K=10,
    lam=0.02,
    iterNum=10,
    verbose=1
)

# ======================================================
# Fit SVD
# ======================================================
model.fit(X_train, y_train)

# ======================================================
# Predict on test set
# ======================================================
pred = model.predict(X_test)

# ======================================================
# Compute RMSE
# ======================================================
score = rmse(y_test, pred)

print(f"Reg-SVD Test RMSE: {score:.4f}")


Fitting Reg-SVD: K: 10, lam: 0.02000
RegSVD-ALS: 0; obj: 1.307; rmse:1.138, diff: 1140.396
RegSVD-ALS: 1; obj: 0.762; rmse:0.873, diff: 0.545
RegSVD-ALS: 2; obj: 0.760; rmse:0.872, diff: 0.003
RegSVD-ALS: 3; obj: 0.759; rmse:0.871, diff: 0.000
RegSVD-ALS: 4; obj: 0.759; rmse:0.871, diff: 0.000
Reg-SVD Test RMSE: 0.9711


### 3.2 Overfitting vs Underfitting (Solution for Q3.2)

Concept: If the training RMSE drops significantly but the validation RMSE remains high, the model is overfitting.

Solution: Increase the regularization strength ($\lambda$). Decreasing model complexity (a lower $K$) is another option, but it may help little here because $K = 3$ is already small.
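
The comparison can be made concrete with the numbers printed in Section 3.1, using the test RMSE there as a stand-in for a validation score. The helper name relative_gap is an assumption for illustration, and how large a gap counts as overfitting is a judgment call rather than a fixed rule.

```python
def relative_gap(train_rmse, val_rmse):
    # Fraction by which held-out error exceeds training error
    return (val_rmse - train_rmse) / train_rmse

# Values from the Reg-SVD log in Section 3.1: final training rmse and test RMSE
train_rmse = 0.871
heldout_rmse = 0.9711

gap = relative_gap(train_rmse, heldout_rmse)
print(f"Held-out RMSE is {gap:.1%} above training RMSE")
```

Tracking this gap while varying $\lambda$ or $K$ shows whether a change narrows it (less overfitting) or raises both errors together (underfitting).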

### 3.3 Hyperparameter Tuning (Solution for Q3.3)

In [46]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.base import BaseEstimator, RegressorMixin

# -------------------------------------------------------------------
# Load SVD implementation
# -------------------------------------------------------------------
!wget -O TabRS.py https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/src/TabRS.py
from TabRS import SVD


# RMSE helper (not passed to GridSearchCV below, which uses the built-in 'neg_root_mean_squared_error' scorer)
def rmse_score(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))


# -------------------------------------------------------------------
# Synthetic dataset
# -------------------------------------------------------------------
data = {
    'user_id': [0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10],
    'item_id': [0,2,1,2,1,3,1,3,2,3,2,3,4,5,4,5,6,7,6,7,8,9],
    'rating':  [3,5,5,3,4,2,1,3,4,5,2,3,3,4,4,5,2,3,3,4,4,5]
}
df = pd.DataFrame(data)

X = df[['user_id','item_id']].values
y = df['rating'].values

n_users = df.user_id.max()+1
n_items = df.item_id.max()+1


# -------------------------------------------------------------------
# Wrapper class to make SVD compatible with GridSearchCV
# -------------------------------------------------------------------
class SVD_Wrapper(BaseEstimator, RegressorMixin):
    def __init__(self, K=10, lam=0.01):
        self.K = K
        self.lam = lam

    def fit(self, X, y):
        self.model = SVD(
            n_users=n_users,
            n_items=n_items,
            K=self.K,
            lam=self.lam,
            iterNum=10,
            verbose=0
        )
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)


# -------------------------------------------------------------------
# Hyperparameter Grid
# -------------------------------------------------------------------
param_grid = {
    'K': [2, 5, 10],
    'lam': [0.01, 0.03, 0.05]
}

grid = GridSearchCV(
    SVD_Wrapper(),
    param_grid,
    cv=3,
    scoring='neg_root_mean_squared_error'
)

# -------------------------------------------------------------------
# Fit GridSearchCV
# -------------------------------------------------------------------
grid.fit(X, y)

print("Best params:", grid.best_params_)
print("Best RMSE:", -grid.best_score_)

print("\nCV Results:")
cv_results = pd.DataFrame(grid.cv_results_)
print(cv_results[['param_K','param_lam','mean_test_score']])



Best params: {'K': 2, 'lam': 0.05}
Best RMSE: 1.1006769121106124

CV Results:
   param_K  param_lam  mean_test_score
0        2       0.01        -1.352196
1        2       0.03        -1.631848
2        2       0.05        -1.100677
3        5       0.01        -1.286755
4        5       0.03        -1.438986
5        5       0.05        -1.262419
6       10       0.01        -1.426531
7       10       0.03        -1.391376
8       10       0.05        -1.189849


### 3.4 Huber SVD (Solution for Q3.4)
Key Concept: Replace Ridge with HuberRegressor to handle outliers (noise) better.
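
To see why the Huber loss is less sensitive to outliers, the sketch below compares it with the squared loss using the textbook definition: quadratic for small residuals, linear beyond a threshold delta. The function name huber_loss and the sample residuals are illustrative. Note that scikit-learn's HuberRegressor also estimates a scale parameter, so its epsilon is not exactly this delta.

```python
import numpy as np

def huber_loss(residual, delta=1.35):
    r = np.abs(np.asarray(residual, dtype=float))
    quadratic = 0.5 * r ** 2                # used where |r| <= delta
    linear = delta * (r - 0.5 * delta)      # used where |r| > delta
    return np.where(r <= delta, quadratic, linear)

# Hypothetical residuals; 4.0 plays the role of an outlier
residuals = np.array([0.5, 1.0, 4.0])

huber_vals = huber_loss(residuals)
squared_vals = 0.5 * residuals ** 2

print("Huber:  ", huber_vals)
print("Squared:", squared_vals)
```

For small residuals the two losses agree. For the outlier the Huber penalty grows linearly (about 4.49) instead of quadratically (8.0), so a single noisy rating pulls the fitted factors less.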

In [50]:
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

class Huber_SVD:

    def __init__(self, n_users, n_items, K=3, lam=0.1, delta=1.35, n_iter=10):
        self.n_users = n_users
        self.n_items = n_items
        self.K = K
        self.lam = lam
        self.delta = delta
        self.n_iter = n_iter

    def fit(self, X, y):

        # initialize latent factors
        self.P = 0.1 * np.random.randn(self.n_users, self.K)
        self.Q = 0.1 * np.random.randn(self.n_items, self.K)

        # build user/item index lists
        self.user_ratings = {}
        self.item_ratings = {}

        for (u, i), r in zip(X, y):
            self.user_ratings.setdefault(u, []).append((i, r))
            self.item_ratings.setdefault(i, []).append((u, r))

        # ALS
        for it in range(self.n_iter):

            # ---- Update user factors ----
            for u in range(self.n_users):

                if u not in self.user_ratings:
                    continue

                items = [i for (i, _) in self.user_ratings[u]]
                ratings = np.array([r for (_, r) in self.user_ratings[u]])

                Q_sub = self.Q[items]

                # Huber regression
                huber = HuberRegressor(
                    epsilon=self.delta,
                    alpha=self.lam,
                    fit_intercept=False,
                    max_iter=200,
                    warm_start=True
                )

                try:
                    huber.fit(Q_sub, ratings)
                    self.P[u] = huber.coef_
                except Exception:  # e.g. ValueError when the L-BFGS solver fails; fall back to Ridge
                    ridge = Ridge(alpha=self.lam, fit_intercept=False)
                    ridge.fit(Q_sub, ratings)
                    self.P[u] = ridge.coef_

            # ---- Update item factors ----
            for i in range(self.n_items):

                if i not in self.item_ratings:
                    continue

                users = [u for (u, _) in self.item_ratings[i]]
                ratings = np.array([r for (_, r) in self.item_ratings[i]])

                P_sub = self.P[users]

                huber = HuberRegressor(
                    epsilon=self.delta,
                    alpha=self.lam,
                    fit_intercept=False,
                    max_iter=200,
                    warm_start=True
                )

                try:
                    huber.fit(P_sub, ratings)
                    self.Q[i] = huber.coef_
                except Exception:  # e.g. ValueError when the L-BFGS solver fails; fall back to Ridge
                    ridge = Ridge(alpha=self.lam, fit_intercept=False)
                    ridge.fit(P_sub, ratings)
                    self.Q[i] = ridge.coef_

            print(f"Iteration {it+1}/{self.n_iter} done.")

        return self

    def predict(self, X):
        preds = []
        for u, i in X:
            preds.append(np.dot(self.P[u], self.Q[i]))
        return np.array(preds)


In [53]:
# Load Netflix dataset
train = pd.read_csv("https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/train.csv")
test  = pd.read_csv("https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/netflix/test.csv")

X_train = train[['user_id','movie_id']].values
y_train = train['rating'].values

X_test = test[['user_id','movie_id']].values
y_test = test['rating'].values

n_users = train.user_id.max()+1
n_items = train.movie_id.max()+1


# -----------------------------------------------------
# (1) λ=0.1, K=3, δ=1.35
# -----------------------------------------------------
model1 = Huber_SVD(n_users, n_items, K=3, lam=0.1, delta=1.35, n_iter=3)
model1.fit(X_train, y_train)
pred1 = model1.predict(X_test)
print("RMSE (λ=0.1, K=3, δ=1.35):", rmse(y_test, pred1))


# -----------------------------------------------------
# (2) λ=0.3, K=5, δ=1.5
# -----------------------------------------------------
model2 = Huber_SVD(n_users, n_items, K=5, lam=0.3, delta=1.5, n_iter=3)
model2.fit(X_train, y_train)
pred2 = model2.predict(X_test)
print("RMSE (λ=0.3, K=5, δ=1.5):", rmse(y_test, pred2))


Iteration 1/3 done.
Iteration 2/3 done.
Iteration 3/3 done.
RMSE (λ=0.1, K=3, δ=1.35): 1.5267917045891513
Iteration 1/3 done.
Iteration 2/3 done.
Iteration 3/3 done.
RMSE (λ=0.3, K=5, δ=1.5): 1.2830423596328324


## Topic 4: Neural Recommender Systems

In [55]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/refs/heads/main/dataset/udemy/udemy_clean.csv')

### 4.1 Preprocessing Side Info (Solution for Q4.1)
Key Concept: Neural nets can take more than just IDs. We need to scale dense features (here User_vote, Total_hours, Lecture) and encode categorical ones (here Instructor, Level).

In [56]:
# Categorical and dense features
cat_features = ['Instructor', 'Level']
dense_features = ['User_vote', 'Total_hours', 'Lecture']

from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

label_encoders = {}

for col in cat_features:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col].astype(str))
    label_encoders[col] = le

scaler = StandardScaler()
df[dense_features] = scaler.fit_transform(df[dense_features])
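
The cell above fits the scaler on the full DataFrame. When a held-out test set is used, a common alternative is to split first and fit the scaler on the training rows only, so that test statistics do not leak into preprocessing. A minimal sketch with hypothetical values follows; the toy numbers and the names train_part and test_part are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

dense_features = ['User_vote', 'Total_hours', 'Lecture']

# Hypothetical toy values using the same column names as the Udemy data
toy = pd.DataFrame({
    'User_vote':   [10.0, 200.0, 35.0, 4000.0, 12.0, 90.0, 640.0, 7.0],
    'Total_hours': [1.5, 12.0, 3.0, 40.0, 2.0, 8.5, 20.0, 0.5],
    'Lecture':     [8.0, 60.0, 15.0, 220.0, 10.0, 45.0, 120.0, 5.0],
})

train_part, test_part = train_test_split(toy, test_size=0.25, random_state=42)
train_part, test_part = train_part.copy(), test_part.copy()

scaler = StandardScaler()
# Learn mean and std from the training rows only ...
train_part[dense_features] = scaler.fit_transform(train_part[dense_features])
# ... and reuse those statistics on the test rows
test_part[dense_features] = scaler.transform(test_part[dense_features])

print(train_part[dense_features].mean())
```

After this, the training columns have mean zero by construction, while the test columns generally do not, which is expected.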


### 4.2 Neural Architecture (PlainRS) (Solution for Q4.2)




In [58]:
import tensorflow as tf
from tensorflow.keras import layers, Model

class PlainRS(Model):
    def __init__(self, n_instructor, n_level, dense_dim):
        super().__init__()

        # Embeddings
        self.instructor_emb = layers.Embedding(
            input_dim=n_instructor,
            output_dim=50,
            embeddings_initializer='uniform'
        )

        self.level_emb = layers.Embedding(
            input_dim=n_level,
            output_dim=30,
            embeddings_initializer='uniform'
        )

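        # Dense features: (User_vote, Total_hours, Lecture)
        # NOTE: this layer is defined but never applied in call();
        # the scaled dense features are concatenated as-is
        self.dense_input = layers.Dense(dense_dim)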

        # Combined layers
        self.concat = layers.Concatenate()
        self.hidden = layers.Dense(64, activation='relu')
        self.out_layer = layers.Dense(1)

    def call(self, inputs):
        instructor_id, level_id, dense_feats = inputs

        ins_vec = self.instructor_emb(instructor_id)
        lvl_vec = self.level_emb(level_id)

        x = self.concat([ins_vec, lvl_vec, dense_feats])
        x = self.hidden(x)
        out = self.out_layer(x)
        return out
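
The forward pass of PlainRS up to the hidden layer can be traced with plain NumPy to check shapes: an embedding lookup is row indexing into a table, and the concatenation yields 50 + 30 + 3 = 83 features per example. The table sizes, IDs, and random values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_instructor, n_level, dense_dim = 100, 4, 3      # hypothetical sizes

# Stand-ins for the two embedding tables (small random values)
instructor_table = rng.uniform(-0.05, 0.05, size=(n_instructor, 50))
level_table = rng.uniform(-0.05, 0.05, size=(n_level, 30))

# A batch of 4 examples
instructor_id = np.array([3, 17, 17, 99])
level_id = np.array([0, 2, 1, 3])
dense_feats = rng.standard_normal((4, dense_dim))

ins_vec = instructor_table[instructor_id]   # (4, 50): one row per ID
lvl_vec = level_table[level_id]             # (4, 30)

# Same role as layers.Concatenate() in the model
x = np.concatenate([ins_vec, lvl_vec, dense_feats], axis=1)
print(x.shape)
```

Examples that share an instructor ID receive the identical embedding row, which is how the model shares information across courses by the same instructor.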


In [59]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv('https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/main/dataset/udemy/udemy_clean.csv')

# Categorical encoders
ins_enc = LabelEncoder()
lvl_enc = LabelEncoder()

df['Instructor_enc'] = ins_enc.fit_transform(df['Instructor'])
df['Level_enc'] = lvl_enc.fit_transform(df['Level'])

# Dense features
dense_feats = ['User_vote', 'Total_hours', 'Lecture']
scaler = StandardScaler()
df[dense_feats] = scaler.fit_transform(df[dense_feats])

# Train-test split
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Prepare TF tensors
X_train = (
    train_df['Instructor_enc'].values,
    train_df['Level_enc'].values,
    train_df[dense_feats].values
)
y_train = train_df['Rating'].values

X_test = (
    test_df['Instructor_enc'].values,
    test_df['Level_enc'].values,
    test_df[dense_feats].values
)
y_test = test_df['Rating'].values


In [60]:
model = PlainRS(
    n_instructor=df['Instructor_enc'].nunique(),
    n_level=df['Level_enc'].nunique(),
    dense_dim=len(dense_feats)
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss='mse',
    metrics=[tf.keras.metrics.MeanAbsoluteError(), tf.keras.metrics.RootMeanSquaredError()]
)

callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_root_mean_squared_error',
    patience=5,
    restore_best_weights=True
)

history = model.fit(
    X_train, y_train,
    validation_split=0.1,
    epochs=200,
    batch_size=128,
    callbacks=[callback],
    verbose=1
)


Epoch 1/200
56/56 ━━━━━━━━━━━━━━━━━━━━ 5s 12ms/step - loss: 13.3214 - mean_absolute_error: 3.5229 - root_mean_squared_error: 3.6285 - val_loss: 0.7471 - val_mean_absolute_error: 0.6545 - val_root_mean_squared_error: 0.8644
Epoch 2/200
56/56 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - loss: 0.6393 - mean_absolute_error: 0.6043 - root_mean_squared_error: 0.7938 - val_loss: 0.2964 - val_mean_absolute_error: 0.4344 - val_root_mean_squared_error: 0.5444
Epoch 3/200
56/56 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - loss: 0.2475 - mean_absolute_error: 0.3793 - root_mean_squared_error: 0.4973 - val_loss: 0.2309 - val_mean_absolute_error: 0.3839 - val_root_mean_squared_error: 0.4806
Epoch 4/200
56/56 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - loss: 0.1707 - mean_absolute_error: 0.3142 - root_mean_squared_error: 0.4131 - val_loss: 0.2035 - val_mean_absolute_error: 0.3530 - val_root_mean_squared_er

In [61]:
# Use distinct names so the rmse() helper defined earlier is not overwritten
loss, test_mae, test_rmse = model.evaluate(X_test, y_test, verbose=0)
print(f"Test MAE:  {test_mae:.4f}")
print(f"Test RMSE: {test_rmse:.4f}")


Test MAE:  0.3502
Test RMSE: 0.4553


### 4.3 Neural Architecture (DizzyRS) (Solution for Q4.3)


In [62]:
# DizzyRS implementation (TensorFlow / Keras)
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# -----------------------
# 1) Utilities
# -----------------------
class RMSE(tf.keras.metrics.Metric):
    def __init__(self, name="rmse", **kwargs):
        super().__init__(name=name, **kwargs)
        self.sse = self.add_weight(name="sse", initializer="zeros")
        self.n = self.add_weight(name="n", initializer="zeros")
    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        se = tf.reduce_sum(tf.square(y_true - y_pred))
        self.sse.assign_add(se)
        self.n.assign_add(tf.cast(tf.size(y_true), tf.float32))
    def result(self):
        return tf.sqrt(self.sse / (self.n + 1e-12))
    def reset_state(self):
        # reset_state is the current Keras method name (reset_states is deprecated)
        self.sse.assign(0.0)
        self.n.assign(0.0)

def rmse_np(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred)**2))

# -----------------------
# 2) Load & preprocess data (Q4 dataset)
# -----------------------
url = 'https://raw.githubusercontent.com/statmlben/CUHK-STAT3009/refs/heads/main/dataset/udemy/udemy_clean.csv'
df = pd.read_csv(url)

# features
cat_features = ['Instructor', 'Level']
dense_features = ['User_vote', 'Total_hours', 'Lecture']

# Encode categoricals
label_encoders = {}
for col in cat_features:
    le = LabelEncoder()
    df[col + '_enc'] = le.fit_transform(df[col].astype(str))
    label_encoders[col] = le

# Standardize dense features
scaler = StandardScaler()
df[dense_features] = scaler.fit_transform(df[dense_features])

# target: accept either capitalization of the rating column
if 'rating' in df.columns:
    y_col = 'rating'
elif 'Rating' in df.columns:
    y_col = 'Rating'
else:
    raise ValueError("Cannot find rating column; please inspect dataset.")

# prepare inputs
X_instructor = df['Instructor_enc'].values.astype('int32')
X_level = df['Level_enc'].values.astype('int32')
X_dense = df[dense_features].values.astype('float32')
y = df[y_col].values.astype('float32')

# train/val/test split
idx = np.arange(len(df))
train_idx, test_idx = train_test_split(idx, test_size=0.15, random_state=42)
train_idx, val_idx = train_test_split(train_idx, test_size=0.15, random_state=42)

def make_ds(idxs, batch=256, shuffle=False):
    d = tf.data.Dataset.from_tensor_slices((
        {
            'instr': X_instructor[idxs],
            'lvl'  : X_level[idxs],
            'dense': X_dense[idxs]
        },
        y[idxs]
    ))
    if shuffle:
        d = d.shuffle(10000, seed=42)
    d = d.batch(batch).prefetch(tf.data.AUTOTUNE)
    return d

batch_size = 256
train_ds = make_ds(train_idx, batch=batch_size, shuffle=True)
val_ds   = make_ds(val_idx, batch=batch_size, shuffle=False)
test_ds  = make_ds(test_idx, batch=batch_size, shuffle=False)

n_instructor = int(df['Instructor_enc'].nunique())
n_level = int(df['Level_enc'].nunique())
n_dense = len(dense_features)

# -----------------------
# 3) DizzyRS model
# -----------------------
from tensorflow.keras import layers, Model

class DizzyRS(Model):
    def __init__(self,
                 n_instructor,
                 n_level,
                 n_dense,
                 instr_emb_dim=50,
                 level_emb_dim_left=30,
                 level_emb_dim_right=30,
                 dense_proj_dim=32,
                 hidden_units=64,
                 dropout=0.2):
        super().__init__()
        # left branch embeddings
        self.instr_emb = layers.Embedding(input_dim=n_instructor, output_dim=instr_emb_dim, name='instr_emb')
        self.level_emb_left = layers.Embedding(input_dim=n_level, output_dim=level_emb_dim_left, name='level_emb_left')
        # dense projection
        self.dense_proj = layers.Dense(dense_proj_dim, activation='relu', name='dense_proj')
        # left branch head
        self.left_concat = layers.Concatenate(name='left_concat')
        self.left_hidden = layers.Dense(hidden_units, activation='relu', name='left_hidden')
        self.left_out = layers.Dense(1, name='left_out')  # produces out1

        # right branch (separate level embedding)
        self.level_emb_right = layers.Embedding(input_dim=n_level, output_dim=level_emb_dim_right, name='level_emb_right')
        self.right_proj = layers.Dense(16, activation='relu', name='right_proj')
        self.right_out = layers.Dense(1, name='right_out')  # produces out2

        # optional dropout
        self.dropout = layers.Dropout(dropout)

    def call(self, inputs, training=False):
        instr = inputs['instr']    # shape (batch,)
        lvl   = inputs['lvl']      # shape (batch,)
        dense = inputs['dense']    # shape (batch, n_dense)

        # left branch
        instr_v = self.instr_emb(instr)           # (batch, instr_emb_dim)
        lvl_v_left = self.level_emb_left(lvl)     # (batch, level_emb_dim_left)
        dense_proj = self.dense_proj(dense)       # (batch, dense_proj_dim)
        left_feat = self.left_concat([instr_v, lvl_v_left, dense_proj])
        left_feat = self.dropout(left_feat, training=training)
        left_h = self.left_hidden(left_feat)
        out1 = self.left_out(left_h)              # (batch, 1)

        # right branch
        lvl_v_right = self.level_emb_right(lvl)   # (batch, level_emb_dim_right)
        right_h = self.right_proj(lvl_v_right)
        out2 = self.right_out(right_h)            # (batch, 1)

        # sum outputs and squeeze
        out = out1 + out2
        return tf.squeeze(out, axis=-1)

# -----------------------
# 4) Instantiate, compile, train
# -----------------------
model = DizzyRS(n_instructor=n_instructor,
                n_level=n_level,
                n_dense=n_dense,
                instr_emb_dim=50,
                level_emb_dim_left=30,
                level_emb_dim_right=30,
                dense_proj_dim=32,
                hidden_units=64,
                dropout=0.2)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='mse',
    metrics=[RMSE(), tf.keras.metrics.MeanAbsoluteError(name='mae')]
)

# callbacks: early stopping on validation RMSE (metric name 'val_rmse')
earlystop = tf.keras.callbacks.EarlyStopping(
    monitor='val_rmse',
    patience=5,
    mode='min',
    restore_best_weights=True,
    verbose=1
)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_rmse', factor=0.5, patience=3, verbose=1, min_lr=1e-6)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=100,
    callbacks=[earlystop, reduce_lr],
    verbose=2
)

# -----------------------
# 5) Evaluate
# -----------------------
res = model.evaluate(test_ds, return_dict=True)
print("Test results:", res)

# -----------------------
# 6) Example predictions on a small batch
# -----------------------
for batch_x, batch_y in test_ds.take(1):
    preds = model.predict(batch_x)
    print("sample preds:", preds[:8])
    print("sample truths:", batch_y.numpy()[:8])
    break


Epoch 1/100
28/28 - 3s - 111ms/step - loss: 8.0214 - mae: 2.5962 - rmse: 2.8322 - val_loss: 1.1463 - val_mae: 0.8770 - val_rmse: 1.0707 - learning_rate: 1.0000e-03
Epoch 2/100
28/28 - 0s - 10ms/step - loss: 0.8149 - mae: 0.6512 - rmse: 0.9027 - val_loss: 0.4387 - val_mae: 0.5403 - val_rmse: 0.6624 - learning_rate: 1.0000e-03
Epoch 3/100
28/28 - 0s - 8ms/step - loss: 0.3950 - mae: 0.4981 - rmse: 0.6285 - val_loss: 0.3049 - val_mae: 0.4295 - val_rmse: 0.5522 - learning_rate: 1.0000e-03
Epoch 4/100
28/28 - 0s - 8ms/step - loss: 0.3001 - mae: 0.4258 - rmse: 0.5478 - val_loss: 0.2511 - val_mae: 0.3904 - val_rmse: 0.5011 - learning_rate: 1.0000e-03
Epoch 5/100
28/28 - 0s - 14ms/step - loss: 0.2413 - mae: 0.3803 - rmse: 0.4912 - val_loss: 0.2180 - val_mae: 0.3620 - val_rmse: 0.4669 - learning_rate: 1.0000e-03
Epoch 6/100
28/28 - 1s - 21ms/step - loss: 0.1888 - mae: 0.3329 - rmse: 0.4345 - val_loss: 0.2038 - val_mae: 0.3511 - val_rmse: 0.4514 - learning_rate: 1.0000e-03
Epoch 7/100
28/28 - 1s 