#### What is the Two-Tower Model?

The Two-Tower model is a deep learning recommendation model that is widely used for matching user embeddings with item embeddings. It is designed to efficiently handle large-scale recommendation problems. It consists of two separate neural networks (towers):

- User Tower: Maps user features (such as demographics, interactions, etc.) to an embedding space.
- Item Tower: Maps item features (such as product description, category, etc.) to an embedding space.

The output of each tower is a low-dimensional embedding, one for the user and one for the item. The similarity between the two embeddings gives the matching score, typically computed as a dot product or cosine similarity.

This model is particularly effective for retrieval tasks in large catalogs, such as recommending relevant items to users from a vast pool.
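
The scoring step described above can be sketched in a few lines of NumPy. This is a minimal illustration with untrained, randomly initialized towers; the helper names (`make_tower`, `tower_forward`, `l2_normalize`) and all sizes are assumptions made for the example, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_tower(in_dim, hidden_dim, out_dim):
    """Return randomly initialized weights for a one-hidden-layer tower."""
    return {
        "W1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)),
        "W2": rng.normal(scale=0.1, size=(hidden_dim, out_dim)),
    }

def tower_forward(features, params):
    """Map a batch of feature rows to embeddings: ReLU hidden layer, linear output."""
    hidden = np.maximum(features @ params["W1"], 0.0)
    return hidden @ params["W2"]

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so that dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

# Hypothetical sizes: 4 user features, 6 item features, shared 8-dim embedding space
user_tower = make_tower(in_dim=4, hidden_dim=16, out_dim=8)
item_tower = make_tower(in_dim=6, hidden_dim=16, out_dim=8)

user_features = rng.normal(size=(3, 4))  # 3 users
item_features = rng.normal(size=(5, 6))  # 5 items

user_emb = tower_forward(user_features, user_tower)  # shape (3, 8)
item_emb = tower_forward(item_features, item_tower)  # shape (5, 8)

# Matching scores for every user-item pair
dot_scores = user_emb @ item_emb.T
cos_scores = l2_normalize(user_emb) @ l2_normalize(item_emb).T
print(dot_scores.shape, cos_scores.shape)
```

The towers never see each other's inputs, so item embeddings can be computed once and reused for every user.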

In [11]:
import numpy as np
import pandas as pd

# Define the number of users and items
n_users = 1000
n_items = 500

# Generate random data for users (age, gender, and number of interactions)
user_ids = np.arange(n_users)
user_ages = np.random.randint(18, 65, size=n_users)
user_genders = np.random.choice([0, 1], size=n_users)  # 0: Female, 1: Male
user_interactions = np.random.randint(1, 5, size=n_users)

# Generate random data for items (category and price)
item_ids = np.arange(n_items)
item_categories = np.random.randint(0, 10, size=n_items)  # 10 different categories
item_prices = np.random.uniform(5, 200, size=n_items)

# Create Pandas DataFrames for easy viewing
user_data = pd.DataFrame({'user_id': user_ids, 'age': user_ages, 'gender': user_genders, 'interactions': user_interactions})
item_data = pd.DataFrame({'item_id': item_ids, 'category': item_categories, 'price': item_prices})

# Generate random user-item interactions with ratings
n_interactions = 10000  # Total number of interactions to generate
interactions = pd.DataFrame({
    'user_id': np.random.randint(0, n_users, n_interactions),
    'item_id': np.random.randint(0, n_items, n_interactions),
    'rating': np.random.randint(1, 6, n_interactions)  # Ratings between 1 and 5
})

interactions.sort_values(['user_id', 'item_id'], inplace=True)
# Preview the sorted interactions (long format: one row per user-item rating)
interactions.head()


|      | user_id | item_id | rating |
|------|---------|---------|--------|
| 7198 | 0       | 28      | 2      |
| 9810 | 0       | 35      | 4      |
| 2124 | 0       | 46      | 4      |
| 1984 | 0       | 62      | 1      |
| 1099 | 0       | 83      | 2      |


#### Matrix Factorization (MF) Model Implementation

Matrix Factorization (MF) is a popular technique for recommendation systems where the goal is to decompose the user-item interaction matrix into two low-rank matrices representing the users and items.
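
To make the decomposition concrete, here is a minimal NumPy sketch that learns user factors `P` and item factors `Q` by stochastic gradient descent so that `P @ Q.T` approximates the observed ratings. The toy ratings, sizes, and the `sgd_epoch` helper are made up for illustration. Surprise's `SVD`, used below, additionally learns user and item bias terms, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2  # toy sizes; k is the number of latent factors

P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors

# Observed (user, item, rating) triples; values are made up for illustration
observed = [(0, 1, 4.0), (0, 3, 2.0), (1, 1, 5.0), (2, 0, 3.0), (3, 4, 1.0)]

def sgd_epoch(P, Q, observed, lr=0.05, reg=0.02):
    """One SGD pass on squared error with L2 regularization; returns the mean squared error."""
    sq_err = 0.0
    for u, i, r in observed:
        err = r - P[u] @ Q[i]
        sq_err += err ** 2
        p_old = P[u].copy()  # keep the pre-update user factors for the item update
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * p_old - reg * Q[i])
    return sq_err / len(observed)

losses = [sgd_epoch(P, Q, observed) for _ in range(200)]
R_hat = P @ Q.T  # predicted rating for every user-item pair, observed or not
print(R_hat.shape)
```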

Pros and Cons of Matrix Factorization:

Pros:

- Captures latent factors that may not be apparent from the original data.
- Simple to implement and effective when enough interaction data is available.

Cons:

- Struggles with cold-start: users or items with few or no interactions get poorly estimated factors.
- Plain MF cannot use side features (e.g., user demographics, item metadata).

##### Implementation:

In [12]:
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# Convert the generated data into the format required by Surprise
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(interactions[['user_id', 'item_id', 'rating']], reader)

# Train-test split
trainset, testset = train_test_split(data, test_size=0.25)

# Matrix Factorization using SVD
svd_model = SVD()
svd_model.fit(trainset)
predictions = svd_model.test(testset)

# Evaluate with RMSE
rmse = accuracy.rmse(predictions)

RMSE: 1.4748


#### Collaborative Filtering (CF) Model Implementation

- User-Based Collaborative Filtering: A user is recommended items that were liked by other users with similar rating histories.

- Item-Based Collaborative Filtering: A user is recommended items that are similar to the items they liked previously.
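
Both variants can be illustrated on a toy rating matrix. The sketch below is a simplification: it treats unrated entries as zeros when computing cosine similarity, whereas library implementations may restrict the computation to co-rated entries. The matrix values and the helper names `cosine_sim_matrix` and `predict_user_based` are assumptions made for the example.

```python
import numpy as np

# Toy rating matrix (rows: users, columns: items); 0 means "not rated". Values are made up.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim_matrix(M, eps=1e-12):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    unit = M / (norms + eps)
    return unit @ unit.T

def predict_user_based(R, user, item, sim):
    """Similarity-weighted average of the other users' ratings for `item`."""
    rated = R[:, item] > 0
    rated[user] = False  # exclude the target user
    weights = sim[user, rated]
    if weights.sum() == 0:
        return float(R[R > 0].mean())  # fall back to the global mean rating
    return float(weights @ R[rated, item] / weights.sum())

user_sim = cosine_sim_matrix(R)    # user-user similarities (user-based CF)
item_sim = cosine_sim_matrix(R.T)  # item-item similarities (item-based CF)

pred = predict_user_based(R, user=0, item=2, sim=user_sim)
print(round(pred, 3))
```

User 0 is far more similar to user 1 than to user 2, so the prediction is pulled toward user 1's low rating for item 2.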

Pros and Cons of Collaborative Filtering:

Pros:

- User-based CF is intuitive and works well for items with a lot of user interactions.
- Item-based CF is good when items are frequently interacted with by users and we want to find similarities between items.

Cons:

- Struggles with the cold-start problem.
- Computationally expensive for large datasets.

In [13]:
# User-Based CF Implementation:
# Compute similarity between users using Cosine similarity
from surprise import KNNBasic

# User-based collaborative filtering
sim_options = {'name': 'cosine', 'user_based': True}  # Use cosine similarity and user-based
ub_cf_model = KNNBasic(sim_options=sim_options)
ub_cf_model.fit(trainset)
predictions_ub_cf = ub_cf_model.test(testset)

# Evaluate with RMSE
rmse_ub_cf = accuracy.rmse(predictions_ub_cf)


Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 1.7446


In [14]:
# Item-Based CF Implementation:
# Compute similarity between items using Cosine similarity
# Item-based collaborative filtering
sim_options = {'name': 'cosine', 'user_based': False}  # Use cosine similarity and item-based
ib_cf_model = KNNBasic(sim_options=sim_options)
ib_cf_model.fit(trainset)
predictions_ib_cf = ib_cf_model.test(testset)

# Evaluate with RMSE
rmse_ib_cf = accuracy.rmse(predictions_ib_cf)


Computing the cosine similarity matrix...
Done computing similarity matrix.
RMSE: 1.7215


#### Two-Tower Model Implementation using NumPy
In this Two-Tower model, we create separate embeddings for users and items from randomly initialized weights and use the dot product of the two embeddings as the matching score.
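
For a dot-product score trained with squared error, the gradients take a simple form. The following is a sketch assuming no bias terms; the symbols $\mathbf{u}$ (user-tower output), $\mathbf{v}$ (item-tower output), $r_{ui}$ (observed rating), $\hat{r}_{ui}$ (predicted rating), and $\eta$ (learning rate) are notation introduced here.

$$
\hat{r}_{ui} = \mathbf{u}^\top \mathbf{v}, \qquad
L = \tfrac{1}{2}\,(\hat{r}_{ui} - r_{ui})^2, \qquad
\frac{\partial L}{\partial \mathbf{u}} = (\hat{r}_{ui} - r_{ui})\,\mathbf{v}, \qquad
\frac{\partial L}{\partial \mathbf{v}} = (\hat{r}_{ui} - r_{ui})\,\mathbf{u}
$$

A gradient-descent step is then $\mathbf{u} \leftarrow \mathbf{u} - \eta\,\partial L / \partial \mathbf{u}$ and $\mathbf{v} \leftarrow \mathbf{v} - \eta\,\partial L / \partial \mathbf{v}$, applied only to the trainable entries of each vector (embeddings), not to fixed input features.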


Pros and Cons of Two-Tower Model:

Pros:

- Handles both cold-start and rich feature data (user/item attributes) well.
- Efficient for large-scale retrieval tasks.

Cons:

- Requires extensive training data to learn good embeddings.
- More computationally intensive to train compared to CF.

##### Implementation:

In [42]:
import numpy as np
import pandas as pd

# Step 1: Add new user and item features

# Define the number of users and items
n_users = 1000
n_items = 500

# User data (with more features)
user_ids = np.arange(n_users)
user_ages = np.random.randint(18, 65, size=n_users)
user_genders = np.random.choice([0, 1], size=n_users)  # 0: Female, 1: Male
user_interactions = np.random.randint(1, 5, size=n_users)
user_regions = np.random.randint(0, 5, size=n_users)  # Assume 5 regions
user_device_types = np.random.choice([0, 1, 2], size=n_users)  # 0: Desktop, 1: Mobile, 2: Tablet

# Item data (with more features)
item_ids = np.arange(n_items)
item_categories = np.random.randint(0, 10, size=n_items)  # 10 categories
item_subcategories = np.random.randint(0, 5, size=n_items)  # 5 subcategories
item_prices = np.random.uniform(5, 200, size=n_items)
item_brands = np.random.randint(0, 50, size=n_items)  # 50 brands
item_discounts = np.random.uniform(0, 0.5, size=n_items)  # Discount percentage between 0 and 50%

# Create Pandas DataFrames
user_data = pd.DataFrame({
    'user_id': user_ids,
    'age': user_ages,
    'gender': user_genders,
    'interactions': user_interactions,
    'region': user_regions,
    'device_type': user_device_types
})

item_data = pd.DataFrame({
    'item_id': item_ids,
    'category': item_categories,
    'subcategory': item_subcategories,
    'price': item_prices,
    'brand': item_brands,
    'discount': item_discounts
})

# Reuse the `interactions` DataFrame generated earlier so that every model
# is trained and evaluated on the same user-item ratings

# Step 2: Define a NumPy Two-Tower model with categorical embeddings and extra features

class TwoTowerModel:
    """Simplified two-tower scorer without hidden layers.

    Each tower concatenates scaled numeric features, learned categorical
    embeddings, and a learned ID embedding. The predicted rating is the dot
    product of the two tower outputs, so both towers must have the same size.
    """

    def __init__(self, user_data, item_data, embedding_dim, category_dim, subcategory_dim, brand_dim, region_dim, device_dim, lr=0.001):
        self.lr = lr

        # Fixed numeric inputs, scaled to roughly [0, 1]. Rows are looked up by
        # position, so user_id and item_id must run from 0 to n - 1 as generated above.
        self.user_numeric = user_data[['age', 'interactions', 'gender']].to_numpy(dtype=float) * np.array([0.01, 0.1, 0.1])
        self.item_numeric = item_data[['price', 'discount']].to_numpy(dtype=float) * np.array([1 / 200, 1.0])

        # Integer codes of the categorical features
        self.user_region = user_data['region'].to_numpy()
        self.user_device = user_data['device_type'].to_numpy()
        self.item_category = item_data['category'].to_numpy()
        self.item_subcategory = item_data['subcategory'].to_numpy()
        self.item_brand = item_data['brand'].to_numpy()

        # Trainable ID embeddings for users and items
        self.user_embeddings = np.random.randn(len(user_data), embedding_dim) * 0.01
        self.item_embeddings = np.random.randn(len(item_data), embedding_dim) * 0.01

        # Trainable embeddings for categorical variables
        self.category_embeddings = np.random.randn(10, category_dim) * 0.01  # 10 categories
        self.subcategory_embeddings = np.random.randn(5, subcategory_dim) * 0.01  # 5 subcategories
        self.brand_embeddings = np.random.randn(50, brand_dim) * 0.01  # 50 brands
        self.region_embeddings = np.random.randn(5, region_dim) * 0.01  # 5 regions
        self.device_embeddings = np.random.randn(3, device_dim) * 0.01  # 3 device types

        # End offsets of each block inside the concatenated tower outputs
        self.user_splits = np.cumsum([self.user_numeric.shape[1], region_dim, device_dim])
        self.item_splits = np.cumsum([self.item_numeric.shape[1], category_dim, subcategory_dim, brand_dim])

        user_dim = self.user_splits[-1] + embedding_dim
        item_dim = self.item_splits[-1] + embedding_dim
        if user_dim != item_dim:
            raise ValueError(f"Tower outputs must have equal size for a dot product, got {user_dim} and {item_dim}.")

    def get_user_embedding(self, user_id):
        # User tower output: [numeric features | region | device | user ID embedding]
        return np.concatenate([
            self.user_numeric[user_id],
            self.region_embeddings[self.user_region[user_id]],
            self.device_embeddings[self.user_device[user_id]],
            self.user_embeddings[user_id]
        ])

    def get_item_embedding(self, item_id):
        # Item tower output: [numeric features | category | subcategory | brand | item ID embedding]
        return np.concatenate([
            self.item_numeric[item_id],
            self.category_embeddings[self.item_category[item_id]],
            self.subcategory_embeddings[self.item_subcategory[item_id]],
            self.brand_embeddings[self.item_brand[item_id]],
            self.item_embeddings[item_id]
        ])

    def forward(self, user_id, item_id):
        # Matching score: dot product of the two tower outputs (a scalar)
        return float(np.dot(self.get_user_embedding(user_id), self.get_item_embedding(item_id)))

    def train(self, user_item_pairs, ratings, epochs=10):
        for epoch in range(epochs):
            total_loss = 0.0
            for (user_id, item_id), rating in zip(user_item_pairs, ratings):
                u = self.get_user_embedding(user_id)
                v = self.get_item_embedding(item_id)
                error = np.dot(u, v) - rating
                total_loss += error ** 2

                # Gradient step for the loss 0.5 * error**2 with respect to each tower output
                step_u = self.lr * error * v
                step_v = self.lr * error * u

                # Apply each slice to the trainable block it came from;
                # the numeric feature slices are inputs, not parameters
                n, r, d = self.user_splits
                self.region_embeddings[self.user_region[user_id]] -= step_u[n:r]
                self.device_embeddings[self.user_device[user_id]] -= step_u[r:d]
                self.user_embeddings[user_id] -= step_u[d:]

                n, c, s, b = self.item_splits
                self.category_embeddings[self.item_category[item_id]] -= step_v[n:c]
                self.subcategory_embeddings[self.item_subcategory[item_id]] -= step_v[c:s]
                self.brand_embeddings[self.item_brand[item_id]] -= step_v[s:b]
                self.item_embeddings[item_id] -= step_v[b:]

            print(f"Epoch {epoch + 1}, Loss: {total_loss:.4f}")

# Example initialization and training
# user_data, item_data, and interactions are the DataFrames defined above
user_item_pairs = list(zip(interactions['user_id'], interactions['item_id']))
ratings = interactions['rating'].values

# With these sizes both towers output 19 values:
# user: 3 numeric + 3 region + 3 device + 10 ID; item: 2 numeric + 2 category + 3 subcategory + 2 brand + 10 ID
embedding_dim = 10
category_dim = 2
subcategory_dim = 3
brand_dim = 2
region_dim = 3
device_dim = 3

# Initialize the Two-Tower model with categorical embeddings
tt_model = TwoTowerModel(
    user_data, item_data, embedding_dim, category_dim, subcategory_dim, brand_dim, region_dim, device_dim, lr=0.001
)

# Train the model
tt_model.train(user_item_pairs, ratings, epochs=10)

#### Fit the Data for All Three Models
We now apply all three models, MF, CF (user-based and item-based), and the Two-Tower model, to the same user-item interaction data and compare them on ranking quality, measured with NDCG.
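
NDCG is a ranking metric that is normally computed per user and then averaged, rather than over one global list. Below is a minimal NumPy sketch of that protocol, assuming linear gains and a log2 discount. The helper names `dcg_at_k`, `ndcg_at_k`, and `mean_ndcg_per_user` and the toy arrays are illustrative, not part of the notebook.

```python
import numpy as np

def dcg_at_k(relevance, k):
    """Discounted cumulative gain of the first k relevance values, in the given order."""
    rel = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel * discounts))

def ndcg_at_k(y_true, y_score, k=10):
    """DCG of the score-induced ranking divided by the DCG of the ideal ranking."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(y_score)[::-1]          # highest score first
    ideal_dcg = dcg_at_k(np.sort(y_true)[::-1], k)
    return dcg_at_k(y_true[order], k) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg_per_user(user_ids, y_true, y_score, k=10):
    """Average NDCG@k over users, each user's items ranked separately."""
    user_ids = np.asarray(user_ids)
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    values = [
        ndcg_at_k(y_true[user_ids == u], y_score[user_ids == u], k)
        for u in np.unique(user_ids)
    ]
    return float(np.mean(values))

# Toy data: two users, hypothetical ratings and model scores
users = [0, 0, 0, 1, 1]
true_ratings = [5, 3, 1, 2, 4]
good_scores = [0.9, 0.5, 0.1, 0.2, 0.8]  # ranks every user's items correctly
bad_scores = [0.1, 0.5, 0.9, 0.8, 0.2]   # ranks every user's items in reverse

print(mean_ndcg_per_user(users, true_ratings, good_scores))
print(mean_ndcg_per_user(users, true_ratings, bad_scores))
```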


In [43]:
from sklearn.metrics import ndcg_score

# Prepare actual and predicted ratings for NDCG evaluation.
# These pairs include the ones the models were trained on, so the scores are optimistic.
y_true = np.asarray(ratings, dtype=float)
y_pred_mf = np.array([svd_model.predict(user_id, item_id).est for user_id, item_id in user_item_pairs])
y_pred_ub_cf = np.array([ub_cf_model.predict(user_id, item_id).est for user_id, item_id in user_item_pairs])
y_pred_ib_cf = np.array([ib_cf_model.predict(user_id, item_id).est for user_id, item_id in user_item_pairs])

# Two-Tower score: dot product of the user-tower and item-tower outputs (one scalar per pair)
y_pred_tt = np.array([
    float(np.dot(tt_model.get_user_embedding(user_id), tt_model.get_item_embedding(item_id)))
    for user_id, item_id in user_item_pairs
])

# Evaluate NDCG for all models.
# Note: this treats all interactions as a single ranked list rather than one list per user.
ndcg_mf = ndcg_score([y_true], [y_pred_mf])
ndcg_ub_cf = ndcg_score([y_true], [y_pred_ub_cf])
ndcg_ib_cf = ndcg_score([y_true], [y_pred_ib_cf])
ndcg_tt = ndcg_score([y_true], [y_pred_tt])

print(f"NDCG MF: {ndcg_mf}")
print(f"NDCG User-based CF: {ndcg_ub_cf}")
print(f"NDCG Item-based CF: {ndcg_ib_cf}")
print(f"NDCG Two-Tower: {ndcg_tt}")

#### When to Use Two-Tower Model

- When to Use:
    - When you have rich user and item feature data and want embeddings that are computed from those features.
    - Useful for cold-start problems where collaborative filtering may fail.
    - Efficient for retrieval tasks where you need to match users to a large number of items.

- When Not to Use:
    - When you have limited feature data or sparse interactions.
    - For small datasets, simpler models like matrix factorization or collaborative filtering may be sufficient and more computationally efficient.
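
The retrieval efficiency mentioned above comes from the fact that item embeddings can be precomputed once and then scored against a user embedding with a single matrix-vector product. Below is a minimal sketch with random stand-in embeddings; the function `top_k_items` and all sizes are assumptions made for illustration. Large production catalogs often replace the exact scan with an approximate nearest-neighbor index.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim, k = 500, 19, 5

# Stand-ins for item-tower outputs, computed once offline
item_matrix = rng.normal(size=(n_items, dim))

def top_k_items(user_vec, item_matrix, k):
    """Return the indices of the k items with the highest dot-product score, best first."""
    scores = item_matrix @ user_vec
    candidates = np.argpartition(-scores, k - 1)[:k]    # unordered top-k in linear time
    return candidates[np.argsort(-scores[candidates])]  # sort only the k winners

user_vec = rng.normal(size=dim)  # stand-in for one user-tower output
top = top_k_items(user_vec, item_matrix, k)
print(top)
```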