# Two towers implementation

The Two Towers architecture is a popular neural network approach for recommender systems, especially when both user and item features are available. Here’s why it is used and how it works:

**1. Flexible Feature Integration:**  
The Two Towers model allows you to incorporate a wide range of user and item features (such as demographics, content tags, or behavioral data), not just interaction histories. This flexibility helps the model learn richer representations.

**2. Scalability:**  
By learning separate embeddings for users and items, the Two Towers approach enables efficient retrieval of recommendations, even in large-scale systems. After training, you can precompute embeddings and use fast similarity search for recommendations.

**3. Generalization:**  
Unlike traditional collaborative filtering, which relies solely on the interaction matrix, Two Towers can generalize to new users or items if their features are known, making it suitable for cold-start scenarios.

**4. Architecture Overview:**  
- **User Tower:** Processes user features and outputs a user embedding.
- **Item Tower:** Processes item features and outputs an item embedding.
- The similarity (often dot product or cosine similarity) between user and item embeddings predicts the likelihood of interaction.

**5. Real-World Use:**  
This architecture is widely used in industry (e.g., YouTube, Google, TikTok) for large-scale personalized recommendations.

In [3]:
import pandas as pd
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.layers import Concatenate, Dense, Dropout, Input
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import mean_absolute_error, root_mean_squared_error, r2_score
import json
import load_data
import numpy as np

2025-05-17 19:57:53.188577: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-05-17 19:57:56.320521: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1747504677.901126  213913 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747504678.428914  213913 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1747504679.868216  213913 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking 

In [None]:
small_matrix, big_matrix, item_categories, item_features, social_network, user_features, captions = load_data.load_data()

item_features = item_features_agg = item_features.groupby("video_id").agg({
    "play_cnt": "sum",
    "share_cnt": "sum",
    "download_cnt": "sum",
    "comment_cnt": "sum",
    "upload_type": "first",
    "author_id": "first",
    "video_duration": "first"
})
item_features["video_duration"] = item_features["video_duration"].fillna(item_features["video_duration"].median())
item_features = item_features.dropna()
item_features = item_features.drop_duplicates()
all_categories = set(cat for sublist in item_categories["feat"] for cat in sublist)
for cat in all_categories:
    item_features[f'cat_{cat}'] = item_categories["feat"].apply(lambda x: int(cat in x))


In [None]:
# Improved implementation of the Two Towers model

# User tower
user_input = Input(shape=(user_features.shape[1],), name='user_input')
user_tower = Dense(128, activation='relu')(user_input)
user_tower = Dropout(0.3)(user_tower)
user_tower = Dense(64, activation='relu')(user_tower)
user_tower = Dense(32, activation='relu')(user_tower)
user_tower = Dropout(0.2)(user_tower)

# Item tower
item_input = Input(shape=(item_features.shape[1],), name='item_input')
item_tower = Dense(128, activation='relu')(item_input)
item_tower = Dropout(0.3)(item_tower)
item_tower = Dense(64, activation='relu')(item_tower)
item_tower = Dense(32, activation='relu')(item_tower)
item_tower = Dropout(0.2)(item_tower)

# Concatenate towers
merged = Concatenate()([user_tower, item_tower])
merged = Dense(64, activation='relu')(merged)
merged = Dropout(0.2)(merged)
merged = Dense(32, activation='relu')(merged)
output = Dense(1, activation='linear', name='output')(merged)

model = Model(inputs=[user_input, item_input], outputs=output)
model.compile(optimizer=Adam(learning_rate=1e-3), loss='mse', metrics=['mae'])

# Callbacks for better training
lr_scheduler = ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6, verbose=1
)
early_stopping = EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True, verbose=1
)

In [None]:
# Encode the labels of string columns
item_features["upload_type"] = LabelEncoder().fit_transform(item_features["upload_type"])
user_features["user_active_degree"] = LabelEncoder().fit_transform(user_features["user_active_degree"])
# Standardize features by removing the mean and scaling to unit variance
user_features = StandardScaler().fit_transform(user_features)
item_features = StandardScaler().fit_transform(item_features)
watch_ratios = StandardScaler().fit_transform(big_matrix[["watch_ratio"]].values)

# Split into train and test

user_features_train = user_features[big_matrix["user_id"]]
item_features_train = item_features[big_matrix["video_id"]]
y_train = watch_ratios


user_features_test = user_features[small_matrix["user_id"]]
item_features_test = item_features[small_matrix["video_id"]]
y_test = StandardScaler().fit_transform(small_matrix[["watch_ratio"]].values)

In [None]:
model.fit(
    x=[user_features_train, item_features_train],
    y=y_train,
    validation_data=([user_features_test, item_features_test], y_test),
    epochs=10,
    batch_size=128,
    callbacks=[lr_scheduler, early_stopping]
)

model.save()

In [None]:

# Load the model
two_tower_model = load_model("two-towers.keras", custom_objects={'tf': tf})

inverse_scaler = StandardScaler().fit(big_matrix[["watch_ratio"]].values)

y_true = inverse_scaler.inverse_transform(y_test).flatten()

predictions = model.predict([user_features_test, item_features_test])
y_pred = inverse_scaler.inverse_transform(predictions)


In [16]:
# Evaluate the model
mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print(f"Evaluate the model")
print(f"True values: {y_true[:5]}")
print(f"Predicted values: {y_pred[:5]}")
print(f"MAE: {mae}")
print(f"RMSE: {rmse}")
print(f"R2: {r2}")


Evaluate the model
True values: [0.71234272 2.17924934 2.37223467 0.51962865 0.33643166]
Predicted values: [[1.4275737 ]
 [1.4261898 ]
 [0.90504426]
 [1.2559516 ]
 [0.9100992 ]]
MAE: 0.5125399806729959
RMSE: 1.6252481923620308
R2: 0.06278182109012642
