# Two towers approach

The Two-Towers Neural Network is a deep-learning approach to recommendation. Its architecture has two distinct input "towers": one for users and one for items. Each tower processes its own input (user or item features) through shared or separate neural layers, and the model learns to predict user-item interactions.

This approach is particularly powerful for large-scale recommendation systems as it can learn complex, non-linear relationships between users and items.
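As a rough illustration of the general idea, the sketch below runs a forward pass through two small towers in NumPy and scores each user-item pair with a dot product. The layer sizes, the random weights, the helper names (`init_tower`, `tower_forward`) and the dot-product scoring are all assumptions made for this example; it is not the model trained later in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_tower(in_dim, hidden_dim, out_dim):
    """Create random weights for a small two-layer MLP tower (illustrative only)."""
    return {
        "w1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(scale=0.1, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def tower_forward(params, x):
    """Map a batch of raw features to embeddings: Dense + ReLU, then a linear Dense."""
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    return hidden @ params["w2"] + params["b2"]

# Hypothetical sizes: 20 user features, 15 item features, 8-dimensional embeddings.
user_tower = init_tower(20, 32, 8)
item_tower = init_tower(15, 32, 8)

user_batch = rng.normal(size=(4, 20))
item_batch = rng.normal(size=(4, 15))

user_emb = tower_forward(user_tower, user_batch)  # shape (4, 8)
item_emb = tower_forward(item_tower, item_batch)  # shape (4, 8)

# One score per (user, item) pair: the row-wise dot product of the two embeddings.
scores = np.sum(user_emb * item_emb, axis=1)      # shape (4,)
```

Because each tower only sees its own input, item embeddings can be computed once and reused for every user, which is one reason this layout is popular at large scale.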


## Setup

In [1]:
%%bash
DATA_FOLDER="../data"
if [ ! -d "$DATA_FOLDER" ]; then
    wget --no-check-certificate "https://drive.usercontent.google.com/download?id=1qe5hOSBxzIuxBb1G_Ih5X-O65QElollE&export=download&confirm=t&uuid=b2002093-cc6e-4bd5-be47-9603f0b33470" -O KuaiRec.zip
    unzip KuaiRec.zip -d "$DATA_FOLDER"
    rm KuaiRec.zip
fi

In [2]:
import json

import pandas as pd
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from keras.layers import Concatenate, Dense, Dropout, Input
from keras.models import Model
from keras.optimizers import Adam
from keras.saving import load_model
from sklearn.metrics import mean_absolute_error, root_mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder, StandardScaler

In [3]:
# Load the datasets.
data = "../data/KuaiRec 2.0/data"
interactions = pd.read_csv(f"{data}/big_matrix.csv")
categories = pd.read_csv(f"{data}/item_categories.csv")
user_features = pd.read_csv(f"{data}/user_features.csv")
item_features = pd.read_csv(f"{data}/item_daily_features.csv")

In [4]:
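# Drop rows with missing values, exact duplicates, and negative timestamps.
interactions = interactions.dropna()
interactions = interactions.drop_duplicates()
interactions = interactions[interactions["timestamp"] >= 0]

# Keep only the user columns used as model inputs and mark missing values with -1.
# The explicit copy avoids modifying a slice of the original frame.
user_features = user_features[["user_active_degree", "follow_user_num"] + [f"onehot_feat{x}" for x in range(18)]].copy()
user_features = user_features.fillna(-1)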

# Aggregate the daily item statistics into one row per video.
item_features = item_features.groupby("video_id").agg({
    "play_cnt": "sum",
    "share_cnt": "sum",
    "download_cnt": "sum",
    "comment_cnt": "sum",
    "upload_type": "first",
    "author_id": "first",
    "video_duration": "first"
})
item_features["video_duration"] = item_features["video_duration"].fillna(item_features["video_duration"].median())
# Drop rows with missing values and exact duplicates.
item_features = item_features.dropna()
item_features = item_features.drop_duplicates()

# Drop rows with missing values and exact duplicates.
categories = categories.dropna()
categories = categories.drop_duplicates()
# Parse the "feat" column from a JSON string into a list of category ids.
categories["feat"] = categories["feat"].apply(json.loads)
all_categories = set(cat for sublist in categories["feat"] for cat in sublist)
# Convert multi-label categories to binary columns.
# The assignment aligns on the index: item_features is indexed by video_id, while
# categories keeps its default row index, so row i is assumed to describe video i.
for cat in all_categories:
    item_features[f"cat_{cat}"] = categories["feat"].apply(lambda x: int(cat in x))
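The multi-label encoding above can be tried on a toy table. The sketch below is only an illustration under assumptions: the three rows are made up, and it indexes the categories by a `video_id` column (assumed to exist) so that the binary columns are matched to videos by id and not by row position.

```python
import json

import pandas as pd

# Toy stand-in for the category file: "feat" holds a JSON-encoded list of category ids.
categories = pd.DataFrame({
    "video_id": [10, 11, 12],
    "feat": ["[1, 3]", "[3]", "[2, 1]"],
})
categories["feat"] = categories["feat"].apply(json.loads)

# Index by video_id so the new columns align with an item table by id.
feat_by_video = categories.set_index("video_id")["feat"]
all_categories = sorted({cat for cats in feat_by_video for cat in cats})

# One binary column per category: 1 if the video carries that category, else 0.
multi_hot = pd.DataFrame({
    f"cat_{cat}": feat_by_video.apply(lambda cats, c=cat: int(c in cats))
    for cat in all_categories
})
print(multi_hot)
```

Assigning such an id-indexed frame to an item table that is also indexed by `video_id` keeps the alignment correct even when some videos have been dropped.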

## Training the model

### Preparing the train and test data

In [5]:
# Encode the labels of string columns.
item_features["upload_type"] = LabelEncoder().fit_transform(item_features["upload_type"])
user_features["user_active_degree"] = LabelEncoder().fit_transform(user_features["user_active_degree"])

In [6]:
# Standardize features by removing the mean and scaling to unit variance.
user_features = StandardScaler().fit_transform(user_features)
item_features = StandardScaler().fit_transform(item_features)
watch_ratios = StandardScaler().fit_transform(interactions[["watch_ratio"]].values)
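Standardization subtracts a mean and divides by a standard deviation, and predictions can be mapped back to the original scale only with the same two statistics. The NumPy sketch below illustrates this with made-up target values; the `scale` and `unscale` helpers are hypothetical names, and the population standard deviation is used, which is also what `StandardScaler` computes.

```python
import numpy as np

# Toy targets standing in for train and test watch ratios (not real data).
y_train_raw = np.array([0.2, 0.8, 1.0, 1.6, 2.4])
y_test_raw = np.array([0.5, 1.5, 3.0])

# The statistics come from the training data only.
mean, std = y_train_raw.mean(), y_train_raw.std()

def scale(values):
    """Standardize values with the training mean and standard deviation."""
    return (values - mean) / std

def unscale(values):
    """Map standardized values back to the original scale."""
    return values * std + mean

y_train_scaled = scale(y_train_raw)
y_test_scaled = scale(y_test_raw)  # same statistics as the training targets

# Round trip: inverting with the same statistics recovers the raw values.
recovered = unscale(y_test_scaled)
```

Reusing one set of statistics for scaling and for inversion keeps targets and predictions on a common scale when metrics are computed.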


In [7]:
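# Build the training inputs by looking up each interaction's user and video rows.
# The scaled feature tables are NumPy arrays, so the ids are used as row positions.
user_features_train = user_features[interactions["user_id"]]
item_features_train = item_features[interactions["video_id"]]
y_train = watch_ratios

# Load the test interactions and clean them the same way as the training interactions.
test_df = pd.read_csv(f"{data}/small_matrix.csv")
test_df = test_df.dropna()
test_df = test_df.drop_duplicates()
test_df = test_df[test_df["timestamp"] >= 0]

user_features_test = user_features[test_df["user_id"]]
item_features_test = item_features[test_df["video_id"]]
# The test targets are standardized with their own scaler, fitted on the test watch ratios.
y_test = StandardScaler().fit_transform(test_df[["watch_ratio"]].values)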
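Indexing a NumPy array with ids works only while every id equals its row position. A more defensive variant, sketched here on toy data under the assumption that the feature table is a DataFrame indexed by `video_id`, looks rows up by label and skips interactions whose video has no features.

```python
import pandas as pd

# Toy item table indexed by video_id; id 2 is missing, as it could be after dropna().
item_table = pd.DataFrame(
    {"play_cnt": [100, 250, 40], "video_duration": [12.0, 30.5, 8.0]},
    index=pd.Index([0, 1, 3], name="video_id"),
)

# Toy interactions: each row references one user and one video.
toy_interactions = pd.DataFrame({"user_id": [7, 7, 9, 9], "video_id": [3, 0, 1, 2]})

# Keep only interactions whose video has features, then look rows up by label.
known = toy_interactions[toy_interactions["video_id"].isin(item_table.index)]
item_rows = item_table.loc[known["video_id"]].to_numpy()
```

With positional indexing, the request for video 3 would fail or return the wrong row here, because the table has only three rows.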


### Model definition

In [8]:
# Number of input features per user and per video.
n_user_features = user_features_train.shape[1]
n_video_features = item_features_train.shape[1]

input_user = Input(shape=(n_user_features,))
input_video = Input(shape=(n_video_features,))

# The two inputs are concatenated and passed through one shared stack of dense layers.
combined = Concatenate()([input_user, input_video])
x = Dense(128, activation="relu")(combined)
x = Dropout(0.3)(x)
x = Dense(64, activation="relu")(x)
x = Dense(32, activation="relu")(x)
# Single linear output: the standardized watch ratio.
output = Dense(1)(x)

model = Model(inputs=[input_user, input_video], outputs=output)

model.compile(optimizer=Adam(), loss="mse", metrics=["mae"])

# Halve the learning rate after 2 epochs without improvement; stop after 3 and keep the best weights.
lr_scheduler = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6, verbose=1)
early_stopping = EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
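To see how the learning-rate schedule reacts to a stalled validation loss, the following pure-Python sketch mimics the plateau rule with the settings used above. It is a simplified stand-in, not Keras code: `simulate_reduce_on_plateau` is a hypothetical helper, cooldown is ignored, and `min_delta=1e-4` is assumed to match the callback's default. The input values are the first four `val_loss` figures from this notebook's training run.

```python
def simulate_reduce_on_plateau(val_losses, lr=1e-3, factor=0.5, patience=2,
                               min_lr=1e-6, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified, no cooldown)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Meaningful improvement: remember it and reset the patience counter.
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

lrs = simulate_reduce_on_plateau([0.9359, 0.9324, 0.9327, 0.9339])
print(lrs)
```

Epochs 3 and 4 both fail to beat the best loss from epoch 2, so the rate is halved at the end of epoch 4.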


### Training

In [9]:
model.fit(
    x=[user_features_train, item_features_train],
    y=y_train,
    validation_data=([user_features_test, item_features_test], y_test),
    epochs=5,
    batch_size=128,
    callbacks=[lr_scheduler, early_stopping]
)

model.save("two-towers.keras")


Epoch 1/5
90352/90352 ━━━━━━━━━━━━━━━━━━━━ 107s 1ms/step - loss: 0.9465 - mae: 0.3329 - val_loss: 0.9359 - val_mae: 0.2884 - learning_rate: 0.0010
Epoch 2/5
90352/90352 ━━━━━━━━━━━━━━━━━━━━ 101s 1ms/step - loss: 0.9274 - mae: 0.3236 - val_loss: 0.9324 - val_mae: 0.2885 - learning_rate: 0.0010
Epoch 3/5
90352/90352 ━━━━━━━━━━━━━━━━━━━━ 103s 1ms/step - loss: 0.9539 - mae: 0.3235 - val_loss: 0.9327 - val_mae: 0.2888 - learning_rate: 0.0010
Epoch 4/5
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
90352/90352 ━━━━━━━━━━━━━━━━━━━━ 106s 1ms/step - loss: 0.9153 - mae: 0.3224 - val_loss: 0.9339 - val_mae: 0.2902 - learning_rate: 0.0010
Epoch 5/5
90352/90352 ━━━━━━━━━━━━━━━━━━━━ 101s 1ms/step

## Benchmarks

In [10]:
model = load_model("two-towers.keras")

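# Re-fit a scaler on the training watch ratios to map standardized values back to watch ratios.
inverse_scaler = StandardScaler().fit(interactions[["watch_ratio"]].values)

# y_test was standardized with a scaler fitted on the test set, so this inverse transform
# recovers the raw test watch ratios only if the train and test statistics match.
y_true = inverse_scaler.inverse_transform(y_test).flatten()

predictions = model.predict([user_features_test, item_features_test])
y_pred = inverse_scaler.inverse_transform(predictions).flatten()

140456/140456 ━━━━━━━━━━━━━━━━━━━━ 48s 342us/step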


In [11]:
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print("📊 Model Evaluation Metrics:")
print(f"➡️  MAE  (Mean Absolute Error)    : {round(mae, 4)}")
print(f"➡️  RMSE (Root Mean Squared Error): {round(rmse, 4)}")
print(f"➡️  R² Score                      : {round(r2, 4)}")

📊 Model Evaluation Metrics:
➡️  MAE  (Mean Absolute Error)    : 0.4844
➡️  RMSE (Root Mean Squared Error): 1.6211
➡️  R² Score                      : 0.0676
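For reference, the three metrics can be written out directly from their definitions. The sketch below uses made-up values, not KuaiRec data, and `regression_metrics` is a hypothetical helper that is meant to mirror the scikit-learn functions used above for one-dimensional inputs.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R² from two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy watch ratios and predictions.
toy_true = [0.5, 1.0, 1.5, 2.0]
toy_pred = [0.7, 0.9, 1.4, 2.4]
toy_mae, toy_rmse, toy_r2 = regression_metrics(toy_true, toy_pred)

# Predicting the mean of the targets for every row gives an R² of zero.
_, _, baseline_r2 = regression_metrics(toy_true, [1.25] * 4)
```

An R² close to zero therefore means the model does little better than always predicting the average watch ratio.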


## Conclusion

In this notebook, we implemented and evaluated a **Two Towers Neural Network** for collaborative filtering. The architecture takes user data and item data as two separate inputs; in this implementation, the two feature vectors are concatenated and passed through a shared stack of dense layers to predict the watch ratio.

### Insights from the Evaluation:
- **MAE**: The **Mean Absolute Error** of **0.4844** indicates that, on average, the predicted watch ratios differ from the actual watch ratios by about 0.48. This shows a moderate level of prediction accuracy, with room for further improvement.

- **RMSE**: The **Root Mean Squared Error** of **1.6211** is more than three times the MAE, which indicates that a small number of predictions are far from the actual watch ratios. It is slightly better than the previous result, so the model reduces larger errors somewhat, but it could still be optimized further.

- **R² Score**: The **R² Score** of **0.0676** implies that the model explains about **6.8%** of the variance in the actual watch ratios. While slightly improved, this low value still points to potential underfitting and the need for better representation of the data’s underlying patterns.
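The gap between MAE and RMSE can be illustrated with two hypothetical error profiles that share the same MAE. The numbers below are invented for the example and `mae_rmse` is a helper defined only here.

```python
import math

def mae_rmse(errors):
    """Return (MAE, RMSE) for a list of prediction errors."""
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse

# Every prediction is off by 0.5.
uniform_errors = [0.5] * 100
# Most predictions are exact, but a few are far off.
heavy_tail_errors = [0.0] * 95 + [10.0] * 5

uniform_mae, uniform_rmse = mae_rmse(uniform_errors)
tail_mae, tail_rmse = mae_rmse(heavy_tail_errors)
print(uniform_mae, uniform_rmse)
print(tail_mae, tail_rmse)
```

Both profiles have an MAE of 0.5, but squaring the errors makes the RMSE of the heavy-tailed profile several times larger, which is the pattern seen in the metrics above.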

The **Two Towers Neural Network** model demonstrates **moderate performance** based on the evaluation metrics. While the **MAE** shows reasonably close predictions, the **RMSE** and particularly the **R² Score** suggest that the model struggles to fully capture the complexity and variance in the data.

To enhance the model’s performance, the following steps may be considered:
- Further **hyperparameter tuning** (e.g., learning rate, number of layers, neurons per layer)
- Applying **regularization techniques** to improve generalization
- Introducing **advanced architectures**, such as attention mechanisms
- Exploring **hybrid recommender systems** that combine collaborative and content-based features

Overall, the **Two Towers architecture** provides a solid baseline for collaborative filtering, but there is significant room for optimization to improve both accuracy and generalization. Continued experimentation with more complex models and refined training strategies could lead to better recommendation outcomes.
