*UE Learning from User-generated Data, CP MMS, JKU Linz 2024*
# Exercise 6: Content-based Filtering

In this exercise, we delve into content-based filtering, a type of recommender system that makes recommendations by utilizing the features of items and a profile of the user's preferences. Unlike collaborative filtering, which relies on user-item interactions, content-based filtering focuses on the properties of the items themselves to make recommendations. This approach is particularly useful when we have detailed metadata about items or when dealing with cold-start problems related to new users or items.

Examples of item content properties for a music track are: artist name, track title, year of release, genre, and the audio itself (remember that in the collaborative filtering scenario we only worked with item ids, without caring what the items actually were). Complex information such as the audio is usually represented in the form of (item) embeddings: high-dimensional vectors that capture the characteristics of each item. By comparing these embeddings, we can identify items that are similar to those a user has liked in the past and recommend them accordingly.
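As a small illustration of comparing items by their content, the sketch below encodes hypothetical genre tags as multi-hot vectors and finds the track most similar to a liked one with cosine similarity. The track names, tags, and the `multi_hot`/`most_similar` helpers are made up for this example; the notebook itself works with precomputed audio embeddings rather than genre tags.

```python
import numpy as np

# Hypothetical toy metadata: each track is tagged with a set of genres.
genres = ["rock", "punk", "shoegaze", "pop"]
tracks = {
    "track_a": {"rock", "punk"},
    "track_b": {"rock", "shoegaze"},
    "track_c": {"pop"},
}

def multi_hot(tags: set) -> np.ndarray:
    """Encode a set of genre tags as a 0/1 vector over the `genres` vocabulary."""
    return np.array([1.0 if g in tags else 0.0 for g in genres])

vectors = {name: multi_hot(tags) for name, tags in tracks.items()}

def most_similar(liked: str, vectors: dict):
    """Return the other item whose vector has the highest cosine similarity to `liked`."""
    q = vectors[liked]
    best, best_score = None, -np.inf
    for name, v in vectors.items():
        if name == liked:
            continue
        score = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        if score > best_score:
            best, best_score = name, score
    return best, best_score

result = most_similar("track_a", vectors)
print(result)
```

Here `track_b` wins because it shares the "rock" tag with `track_a`, while `track_c` shares nothing and scores 0. Embeddings work the same way, only with dense learned dimensions instead of hand-made tags.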

Please consult the lecture slides on content-based filtering for a recap.

Make sure to rename the notebook according to the convention:

LUD24_ex06_k<font color='red'><Matr. Number\></font>_<font color='red'><Surname-Name\></font>.ipynb

for example:

LUD24_ex06_k000007_Bond_James.ipynb

## Implementation
In this exercise, you will implement two content-based filtering algorithms using item embeddings. We provide the embeddings for each item, and your task is to find items most similar to a user's consumption history. You will then evaluate the performance of your algorithms using the normalized Discounted Cumulative Gain (nDCG) metric across different user groups.

Please **only use libraries already imported in the notebook**.

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from rec import inter_matr_implicit
from tqdm import tqdm
from typing import Callable, List

## Data Overview and Loading

This exercise utilizes a dataset that consists of user-item interactions and item embeddings. The dataset is split into several files:

* `*.user`: Contains information about users.
* `*.item`: Contains information about items.
* `*.inter_train`: Contains user-item interactions used for training the recommender system.
* `*.inter_test`: Contains user-item interactions held out for testing the recommender system.
* `*.id_musicnn`: Contains embeddings for each item.
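The notebook turns the interaction files into users × items matrices with `inter_matr_implicit` from the course's `rec` module, whose code is not shown here. Assuming it produces a binary matrix indexed as `matrix[user_id, item_id]` (which is how the evaluation code later reads it), a hypothetical stand-in could look like this; `build_implicit_matrix` and the toy data are illustrative only.

```python
import numpy as np
import pandas as pd

def build_implicit_matrix(inters: pd.DataFrame, n_users: int, n_items: int) -> np.ndarray:
    """Hypothetical helper: turn (user_id, item_id, listening_events) rows into a
    binary users x items matrix (1 = user interacted with the item at least once)."""
    matrix = np.zeros((n_users, n_items), dtype=np.int8)
    # Row index = user_id, column index = item_id (assumes contiguous 0-based ids).
    matrix[inters["user_id"].to_numpy(), inters["item_id"].to_numpy()] = 1
    return matrix

toy_inters = pd.DataFrame({
    "user_id": [0, 0, 1, 2],
    "item_id": [1, 3, 0, 3],
    "listening_events": [3, 1, 7, 2],
})
toy_matrix = build_implicit_matrix(toy_inters, n_users=3, n_items=4)
print(toy_matrix)
```

In this sketch `listening_events` is ignored, so a track played once and a track played many times both become a 1.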

Let's start by loading these files and taking a closer look at their content.

In [2]:
def read(dataset, file):
    return pd.read_csv(dataset + '/' + dataset + '.' + file, sep='\t')

# Load User Data
users = read('lfm-tiny-tunes', 'user')
print("Users Data Head:")
print(users.head())

# Load Item Data
items = read('lfm-tiny-tunes', 'item')
print("\nItem Data Head:")
print(items.head())

# Load Training Interactions
train_inters = read('lfm-tiny-tunes', 'inter_train')
print("\nTraining Interactions Head:")
print(train_inters.head())

# Load Testing Interactions
test_inters = read('lfm-tiny-tunes', 'inter_test')
print("\nTesting Interactions Head:")
print(test_inters.head())

# Load Embeddings
embedding = read('lfm-tiny-tunes', 'id_musicnn')
print("\nEmbeddings Head:")
print(embedding.head())

train_interaction_matrix = inter_matr_implicit(users, items, train_inters, 'lfm-tiny-tunes')
test_interaction_matrix = inter_matr_implicit(users, items, test_inters, 'lfm-tiny-tunes')

Users Data Head:
   user_id country  age_at_registration gender    registration_date
0        0      RU                   25      m  2006-06-12 13:25:12
1        1      US                   23      m  2005-08-18 15:25:41
2        2      FR                   25      m  2006-02-26 22:39:03
3        3      DE                    2      m  2007-02-28 10:12:13
4        4      UA                   23      n  2007-10-09 15:21:20

Item Data Head:
                artist                                   track  item_id
0           Black Flag                              Rise Above        0
1                 Blur  For Tomorrow - 2012 Remastered Version        1
2          Damien Rice                            Moody Mooday        2
3                 Muse                            Feeling Good        3
4  My Bloody Valentine                                    Soon        4

Training Interactions Head:
   user_id  item_id  listening_events
0      510       50                 3
1      510      324  

## Item Similarity Calculation

Here you can experiment with the cosine similarity between two item embeddings to get a feel for what sensible inputs and outputs look like. Cosine similarity is the cosine of the angle between two vectors, i.e. their dot product divided by the product of their lengths; it ranges from -1 to 1 and is often used to measure item similarity in recommender systems. We will use it later.
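To make the definition concrete, the following sketch computes cosine similarity directly from the formula (dot product over the product of the norms) for the same two vectors as the cell below, and checks it against scikit-learn. The helper name `cosine` is only for this illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """cos(x, y) = (x . y) / (||x|| * ||y||) for two 1D vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([0.0, 0.1, 0.1, -0.1])

manual = cosine(x, y)
library = cosine_similarity(x.reshape(1, -1), y.reshape(1, -1))[0, 0]
print(manual, library)

# Cosine similarity ignores vector length: scaling x leaves the score unchanged.
scaled_matches = np.isclose(cosine(5 * x, y), manual)
print(scaled_matches)
```

The scale invariance shown in the last line is worth remembering: only the direction of an embedding matters for this measure.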

In [3]:
embedding_x = np.array([0.1, 0.2, 0.3, 0.4]).reshape(1, -1)
embedding_y = np.array([0.0, 0.1, 0.1, -0.1]).reshape(1, -1)
similarity = cosine_similarity(embedding_x, embedding_y)
print(f"Cosine Similarity: {similarity}")
assert -1 <= similarity <= 1, "Cosine similarity is out of bounds."

Cosine Similarity: [[0.10540926]]


## <font color='red'>TASK 1/3</font>: Implementing the Average Embedding Similarity Recommender

The idea of the first content-based recommender is to use the embeddings of the items the user has already consumed to create a single representation of the user's taste, simply by averaging them. The recommendation score of each item in the collection is then the cosine similarity between the user's profile and that item's embedding. The highest-scoring items not present in the user's history are selected for recommendation.
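The user profile here is the element-wise mean of the seen items' embeddings. Because cosine similarity depends only on the direction of a vector, dividing the sum by the number of seen items does not change any score, as this toy check with random embeddings shows; the mean is nevertheless the quantity the task asks for. The helper `cosine_to_all` and the random data are illustrative assumptions.

```python
import numpy as np

def cosine_to_all(profile: np.ndarray, items: np.ndarray) -> np.ndarray:
    """Cosine similarity between one profile vector and every row of `items`."""
    return items @ profile / (np.linalg.norm(items, axis=1) * np.linalg.norm(profile))

rng = np.random.default_rng(0)
item_matrix = rng.normal(size=(6, 4))   # toy catalogue: 6 items, 4-dim embeddings
seen = [0, 2, 5]                        # toy consumption history

mean_profile = item_matrix[seen].mean(axis=0)
sum_profile = item_matrix[seen].sum(axis=0)   # equals len(seen) * mean_profile

same_scores = np.allclose(cosine_to_all(mean_profile, item_matrix),
                          cosine_to_all(sum_profile, item_matrix))
print(same_scores)
```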

First, familiarize yourself with the concept of item embeddings, which are vector representations capturing the essential features or qualities of each item.

After that, develop a function that takes a user's interaction history (items they have already interacted with/seen) and the item embeddings as input. This function should:

* Create the user's profile embedding as described above and calculate the cosine similarity between the profile and every item in the collection.
* Rank the items based on their similarity scores.
* To ensure that only unseen items are recommended, set the similarity scores of the items the user has already interacted with to **0** as a floating point number. This will effectively remove them from consideration during the ranking process.
* Recommend the top-K most similar items that the user has not interacted with yet.
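The steps above can be sketched end to end on a toy NumPy catalogue. This is a minimal illustration, assuming row `i` of `item_matrix` holds the embedding of item `i`; the notebook's version works on the unsorted `embedding` DataFrame instead, and `avg_embedding_topk` is a hypothetical name.

```python
import numpy as np

def avg_embedding_topk(seen: list, item_matrix: np.ndarray, top_k: int) -> np.ndarray:
    """Toy average-embedding recommender; row i of `item_matrix` is item i."""
    profile = item_matrix[seen].mean(axis=0)                        # user profile
    norms = np.linalg.norm(item_matrix, axis=1) * np.linalg.norm(profile)
    scores = item_matrix @ profile / norms                          # cosine to every item
    scores[seen] = 0.0                                              # exclude seen items
    # Stable sort on negated scores: highest score first, ties broken by lower id.
    return np.argsort(-scores, kind="stable")[:top_k]

item_matrix = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.7, 0.7],
    [-1.0, 0.0],
])
recs = avg_embedding_topk([0], item_matrix, top_k=2)
print(recs)
```

Note a consequence of the "set to 0" rule: a seen item only drops out of the top-K as long as enough unseen items score above zero. With `top_k=3` in this toy example, seen item 0 would tie with item 2 at 0.0 and could appear in the list.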

In [4]:
def calculate_user_profile_embedding(seen_item_ids: list, item_embeddings: pd.DataFrame) -> np.ndarray:
    """
    Calculates the average embedding of items a user has interacted with to create a user profile embedding.

    Parameters:
    - seen_item_ids - list[int], IDs of items already seen by the user, used to filter the embeddings;
    - item_embeddings - pd.DataFrame, Unsorted DataFrame containing item_id and item embeddings as separate columns;

    Returns:
    - np.ndarray: 1D numpy array, representing the user's average embedding profile.
    """
    
    user_profile_embedding = None
    
    # TODO: YOUR IMPLEMENTATION
    # Keep only the embedding rows of items in the user's history (without the id column).
    seen_embeddings = item_embeddings[item_embeddings['item_id'].isin(seen_item_ids)].drop(columns='item_id')
    if seen_embeddings.empty:
        # No history: return a zero vector; cosine_similarity then scores every item 0.
        return np.zeros(seen_embeddings.shape[1])
    # The profile is the element-wise *mean* (not the sum) of the seen embeddings, as a 1D array.
    user_profile_embedding = seen_embeddings.mean(axis=0).to_numpy()

    return user_profile_embedding


def average_embedding_similarity_rec(seen_item_ids: list, item_embeddings: pd.DataFrame, _calculate_user_profile_embedding: Callable[[List[int], pd.DataFrame], np.ndarray], top_k: int=10) -> np.ndarray:
    """
    Recommends items to a user based on the average embedding similarity of items they have interacted with.
    It computes the cosine similarity between the user profile and all other items, recommending
    the top-K most similar items that the user has not yet interacted with.

    seen_item_ids - list[int], ids of items already seen by the user (to exclude from recommendation);
    item_embeddings - pd.DataFrame, Unsorted DataFrame containing item_id and item embeddings as separate columns;
    _calculate_user_profile_embedding - function, function to calculate the user's average embedding profile;
    top_k - int, number of recommendations per user to be returned;

    returns - 1D np.ndarray, array of IDs of the top-K recommended items, sorted by decreasing similarity
            to the user's average embedding profile;
    """

    recommended_item_ids = None
    user_profile_embedding = None

    # TODO: YOUR IMPLEMENTATION
    user_profile_embedding = _calculate_user_profile_embedding(seen_item_ids, item_embeddings)
    # cosine_similarity expects 2D inputs: (1, dim) for the profile and (n_items, dim) for the catalogue.
    user_profile_embedding_reshaped = np.asarray(user_profile_embedding, dtype=float).reshape(1, -1)
    embed_values = item_embeddings.drop(columns='item_id').values
    similarities = cosine_similarity(user_profile_embedding_reshaped, embed_values)
    # Label the scores with item ids, because the embedding DataFrame is not sorted by item_id.
    similarities_df = pd.DataFrame(similarities, columns=item_embeddings['item_id'])
    # Exclude already seen items by setting their score to 0.0.
    similarities_df.loc[:, seen_item_ids] = 0.0

    recommended_item_ids = similarities_df.T.nlargest(top_k, 0).index.values

    return recommended_item_ids

In [5]:
user_id_example = users['user_id'].iloc[1]
seen_item_ids = train_inters[train_inters['user_id'] == user_id_example]['item_id'].values.tolist()
recommended_items = average_embedding_similarity_rec(seen_item_ids, embedding, calculate_user_profile_embedding, top_k=5)
print(f"Recommended Items for User {user_id_example}: {recommended_items}")

Recommended Items for User 1: [244  53 259 201 293]


## <font color='red'>TASK 2/3</font>: Implementing the Aggregated Item Similarity Recommender

In this task, you will implement what we call the Aggregated Item Similarity Recommender. It also builds on item embeddings and similarity calculations, but instead of creating a single user profile, it considers the individual similarities between each item the user has interacted with and all other items. This can yield a more diverse set of recommendations that captures different aspects of the user's preferences.

* For each item the user has interacted with, calculate its cosine similarity to all other items in the dataset using their embeddings. Make sure to sort the item embeddings by `item_id` before calling `_compute_aggregated_scores`.
* For each item that could potentially be recommended, combine the similarity scores it received from each of the items the user has already interacted with. This combined score will reflect the overall similarity of the potential recommendation to the user's preferences as expressed through their past interactions. Keep track of the highest similarity score encountered for each potential recommendation.
* To ensure that only unseen items are recommended, set the aggregated similarity scores of the items the user has already interacted with to **0** as a floating point number. This will effectively remove them from consideration during the ranking process.
* Rank the items based on their aggregated similarity scores. Recommend the top-K items that the user has not interacted with yet.
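The bullets describe combining, for each candidate item, the similarities it receives from every seen item; the `compute_aggregated_scores` cell sums them, while the text also mentions keeping track of the highest score. The toy sketch below contrasts both aggregations on hand-made 2D embeddings (row `i` = item `i`, an assumption for illustration) to show that the choice can change the ranking.

```python
import numpy as np

def normalize_rows(m: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

item_matrix = np.array([
    [1.0, 0.0],   # item 0 (seen)
    [0.0, 1.0],   # item 1 (seen)
    [0.7, 0.7],   # item 2: halfway between both seen items
    [0.6, 0.8],   # item 3
    [1.0, 0.1],   # item 4: very close to item 0 only
])
seen = [0, 1]

unit = normalize_rows(item_matrix)
sim = unit[seen] @ unit.T            # shape (len(seen), n_items): one row per seen item

sum_scores = sim.sum(axis=0)         # combine by summing over the seen items
max_scores = sim.max(axis=0)         # or keep only the highest similarity per item
for scores in (sum_scores, max_scores):
    scores[seen] = 0.0               # exclude items already in the history (in place)

top_sum = np.argsort(-sum_scores, kind="stable")[:2]
top_max = np.argsort(-max_scores, kind="stable")[:2]
print(top_sum, top_max)
```

Summing favours items that are moderately similar to many seen items (item 2), whereas the maximum favours items that closely match a single seen item (item 4).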

In [6]:
def compute_aggregated_scores(seen_item_ids: list, item_embeddings: pd.DataFrame) -> pd.DataFrame:
    """
    Computes aggregated similarity scores for all items in the embedding DataFrame, comparing them against items the user has seen.
    
    seen_item_ids - list[int], ids of items already seen by the user (to exclude from recommendation);
    item_embeddings - pd.DataFrame, Sorted DataFrame containing item_id and item embeddings as separate columns;
    
    returns - pd.DataFrame, indexed by item_id with a single 'score' column holding each item's summed similarity
            to the seen items, with higher scores indicating higher similarity;
    """
    
    # TODO: YOUR IMPLEMENTATION
    seen_embeddings = item_embeddings[item_embeddings['item_id'].isin(seen_item_ids)]
    similarity_scores = cosine_similarity(seen_embeddings.drop(columns='item_id'), item_embeddings.drop(columns='item_id'))
    scores_agg = similarity_scores.sum(axis=0)
    recommendation_scores = pd.DataFrame(scores_agg, index=item_embeddings['item_id'], columns=['score'])

    return recommendation_scores


def aggregated_item_similarity_rec(seen_item_ids: list, item_embeddings: pd.DataFrame, _compute_aggregated_scores: Callable[[List[int], pd.DataFrame], np.ndarray], top_k: int=10) -> np.ndarray:
    """
    Recommends items to a user based on the items they have already seen, by sorting the calculated similarity scores
    and selecting the top-k items.

    seen_item_ids - list[int], ids of items already seen by the user (to exclude from recommendation);
    item_embeddings - pd.DataFrame, Unsorted DataFrame containing item_id and item embeddings as separate columns;
    _compute_aggregated_scores - function, function to compute aggregated similarity scores for all items;
    top_k - int, number of recommendations per user to be returned;

    returns - 1D np.ndarray, array of IDs of the top-K recommended items, sorted by decreasing
            aggregated similarity score;
    """

    # TODO: YOUR IMPLEMENTATION
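    # Sort the embeddings by item_id first, as compute_aggregated_scores expects a sorted DataFrame.
    sorted_embeddings = item_embeddings.sort_values('item_id').reset_index(drop=True)
    recommendation_scores = _compute_aggregated_scores(seen_item_ids, sorted_embeddings)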
    recommendation_scores.loc[seen_item_ids, 'score'] = 0.0
    recommended_item_ids = recommendation_scores['score'].nlargest(top_k).index.values

    return recommended_item_ids

In [7]:
user_id_example = users['user_id'].iloc[1]
seen_items_user_example = np.where(train_interaction_matrix[user_id_example, :] > 0)[0]
recommended_items = aggregated_item_similarity_rec(seen_items_user_example.tolist(), embedding, compute_aggregated_scores, top_k=5)
print(f"Recommended Items for User {user_id_example}: {recommended_items}")

Recommended Items for User 1: [164 244 259 201  85]


## <font color='red'>TASK 3/3</font>: Evaluating Recommendations with nDCG

In this task, you will evaluate the performance of the content-based filtering algorithms you have implemented, alongside collaborative filtering approaches and a popularity baseline. Specifically, you will use the normalized Discounted Cumulative Gain (nDCG) metric to assess how effective each recommender system is across different user groups based on their interaction levels.
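For reference, nDCG@k divides the discounted gain of the recommended ranking, the sum of rel_i / log2(i + 1) over ranks i = 1..k, by the same quantity for the best possible ranking. The sketch below computes it by hand for one toy user and checks it against scikit-learn's `ndcg_score`. The test row, the ranking, and the `dcg_at_k` helper are illustrative; the evaluation cell later instead passes a binary score vector without `k`, in which case scikit-learn averages the gains of tied items (its default `ignore_ties=False`).

```python
import numpy as np
from sklearn.metrics import ndcg_score

def dcg_at_k(relevance_in_rank_order: np.ndarray, k: int) -> float:
    """DCG@k = sum of rel_i / log2(i + 1) over ranks i = 1..k."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)[:k]
    ranks = np.arange(1, len(rel) + 1)
    return float(np.sum(rel / np.log2(ranks + 1)))

test_row = np.array([0, 1, 0, 0, 1, 0])   # toy held-out items of one user: 1 and 4
ranking = np.array([4, 2, 3])             # toy top-3 recommendation list
k = len(ranking)

dcg = dcg_at_k(test_row[ranking], k)
idcg = dcg_at_k(np.sort(test_row)[::-1], k)   # best possible ordering of the test items
manual_ndcg = dcg / idcg

# scikit-learn needs one score per item: decreasing scores for the list, 0 elsewhere.
scores = np.zeros(len(test_row))
scores[ranking] = np.arange(k, 0, -1)         # 3, 2, 1
sk_ndcg = ndcg_score(test_row.reshape(1, -1), scores.reshape(1, -1), k=k)
print(manual_ndcg, sk_ndcg)
```

Only one of the two held-out items is retrieved (at rank 1), so the score lands around 0.61 rather than 1.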

You will compare the following recommender systems:

* Average Embedding Similarity Recommender (Avg_Item_Embd)
* Aggregated Item Similarity Recommender (Aggr_Item_Sim)
* Singular Value Decomposition (SVD)
* Item K-Nearest Neighbors (ItemKNN)
* Top Popular (TopPop)
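Of these, TopPop is the simplest baseline. The notebook imports it as `recTopKPop` from the course's `rec` module, whose code is not shown; the following hypothetical `top_pop_sketch` only illustrates the idea, assuming popularity means the number of users who interacted with an item and that items the user has already seen are excluded.

```python
import numpy as np

def top_pop_sketch(inter_matrix: np.ndarray, user_id: int, top_k: int) -> np.ndarray:
    """Hypothetical TopPop: rank items by how many users interacted with them,
    skipping items the given user has already seen."""
    popularity = (inter_matrix > 0).sum(axis=0).astype(float)
    popularity[inter_matrix[user_id] > 0] = -np.inf   # never recommend seen items
    return np.argsort(-popularity, kind="stable")[:top_k]

toy = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
])
pop_recs = top_pop_sketch(toy, user_id=0, top_k=2)
print(pop_recs)
```

Using `-np.inf` rather than 0 guarantees seen items sink to the bottom even when unseen items have zero popularity. Every user receives the same list apart from this filtering, which is why TopPop is a useful non-personalized reference point.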

In [8]:
from rec import svd_decompose, svd_recommend_to_list
from rec import recTopK
from rec import recTopKPop
from sklearn.metrics import ndcg_score

In [9]:
def evaluate_ndcg_by_user_groups(user_groups: dict, recommenders: dict, train_interaction_matrix: np.ndarray, test_interaction_matrix: np.ndarray,
                                 U: np.ndarray, V: np.ndarray, item_embeddings: pd.DataFrame, _calculate_user_profile_embedding, 
                                 _compute_aggregated_scores, topK: int=10, n_neighbors: int=5) -> pd.DataFrame:
    """
    Evaluates recommender systems across user groups, calculating average nDCG scores.

    user_groups - dict, keys - names of the user groups (str), values - lists of user IDs belonging to each group;
    recommenders - dict, keys - names of recommenders (str), values - recommender functions;
    train_interaction_matrix - 2D np.ndarray (users x items), interaction matrix from the training set;
    test_interaction_matrix - 2D np.ndarray (users x items), interaction matrix from the test set;
    U, V - 2D np.ndarray, matrices resulting from SVD decomposition of the interaction matrix;
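    item_embeddings - pd.DataFrame, DataFrame containing item IDs and their embeddings;
    _calculate_user_profile_embedding - function, passed on to the Avg_Item_Embd recommender;
    _compute_aggregated_scores - function, passed on to the Aggr_Item_Sim recommender;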
    topK - int, number of top recommendations to consider for evaluation;
    n_neighbors - int, number of neighbors for ItemKNN recommender;

    returns - pd.DataFrame, with columns: 'User Group', 'Recommender', 'Average nDCG', containing evaluation results;
    """
    results = []

    for group_name, users in user_groups.items():
        for recommender_name, recommender_func in tqdm(recommenders.items(), desc=f'Evaluating {group_name} Users'):
            nDCG_scores = []
            for user_id in users:
                seen_items = np.where(train_interaction_matrix[user_id, :] > 0)[0]  # Items already interacted with by the user

                if recommender_name == 'SVD':
                    recommendations = recommender_func(user_id, seen_items.tolist(), U, V, topK)
                elif recommender_name == 'ItemKNN':
                    recommendations = recommender_func(train_interaction_matrix, user_id, topK, n_neighbors)
                elif recommender_name == 'TopPop':
                    recommendations = recommender_func(train_interaction_matrix, user_id, topK)
                elif recommender_name == 'Avg_Item_Embd':
                    recommendations = recommender_func(seen_items.tolist(), item_embeddings, _calculate_user_profile_embedding, topK)
                elif recommender_name == 'Aggr_Item_Sim':
                    recommendations = recommender_func(seen_items.tolist(), item_embeddings, _compute_aggregated_scores, topK)
                else:
                    raise NotImplementedError(f'Recommender {recommender_name} not implemented.')

                if not isinstance(recommendations, np.ndarray):
                    recommendations = np.array(recommendations)

                # Calculate nDCG
                true_relevance = test_interaction_matrix[user_id, :].reshape(1, -1)
                predicted_scores = np.zeros((1, train_interaction_matrix.shape[1]))
                predicted_scores[0, recommendations] = 1
                nDCG_score = ndcg_score(true_relevance, predicted_scores)
                nDCG_scores.append(nDCG_score)

            avg_nDCG = np.mean(nDCG_scores)
            results.append({'User Group': group_name, 'Recommender': recommender_name, 'Average nDCG': avg_nDCG})

    return pd.DataFrame(results)

Here, you will implement a function that evaluates the recommenders across user groups defined by their interaction levels. Split the users into two groups: one with low interaction levels (at or below a given threshold) and one with high interaction levels (above the threshold). The function should then call `evaluate_ndcg_by_user_groups` to calculate the average nDCG score of each recommender in each user group. Make sure to use only the variables and parameters passed to the function.
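The split itself can be sketched on a toy interaction matrix. This assumes, as the evaluation code does, that row `i` of the training matrix belongs to `user_id` `i`; looking the counts up by `user_id` keeps the groups correct even if the user table is not ordered by id. The toy data is illustrative only.

```python
import numpy as np
import pandas as pd

train = np.array([
    [1, 0, 1, 0, 0, 0],   # user 0: 2 interactions
    [1, 1, 1, 1, 1, 1],   # user 1: 6 interactions
    [0, 0, 0, 0, 0, 1],   # user 2: 1 interaction
])
user_info = pd.DataFrame({"user_id": [0, 1, 2]})
threshold = 5

counts = (train > 0).sum(axis=1)                    # interactions per matrix row
interactions = counts[user_info["user_id"].to_numpy()]
groups = {
    "Low Interaction": user_info.loc[interactions <= threshold, "user_id"].to_list(),
    "High Interaction": user_info.loc[interactions > threshold, "user_id"].to_list(),
}
print(groups)
```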

In [10]:
def evaluate_recommenders(user_info: pd.DataFrame, parameters: dict, recommender: dict, user_threshold: int) -> (pd.DataFrame, dict):
    """
    Evaluates recommenders across user groups based on interaction levels.

    Splits users into low and high interaction groups based on a threshold and calculates
    average nDCG scores for each recommender within each group.

    user_info - pd.DataFrame, DataFrame containing user information.
    parameters - dict, Dictionary containing data and parameters for evaluation, including:
        train_interaction_matrix - 2D np.ndarray, test_interaction_matrix - 2D np.ndarray,
        U - 2D np.ndarray, V - 2D np.ndarray, item_embeddings - pd.DataFrame,
        _calculate_user_profile_embedding - function, _compute_aggregated_scores - function,
        topK - int, n_neighbors - int.
    recommender - dict, Dictionary of recommender functions, with keys as recommender names
                        and values as the corresponding functions.
    user_threshold - int, Threshold for dividing users into low and high interaction groups.

    returns - tuple:
        pd.DataFrame, DataFrame containing evaluation results with columns: 'User Group',
            'Recommender', 'Average nDCG'.
        dict, Dictionary containing the user groups with keys 'Low Interaction' and
            'High Interaction', and values as lists of user IDs.

    """

    evaluation_results_df = None

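    # Number of training interactions per user; matrix row i belongs to user_id i.
    interaction_counts = np.sum(parameters['train_interaction_matrix'] > 0, axis=1)
    user_info['interactions'] = interaction_counts[user_info['user_id'].to_numpy()]

    # Collect user_id values (not DataFrame index labels) for each group.
    user_groups = {
        'Low Interaction': user_info.loc[user_info['interactions'] <= user_threshold, 'user_id'].to_list(),
        'High Interaction': user_info.loc[user_info['interactions'] > user_threshold, 'user_id'].to_list()
    }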

    # TODO: YOUR IMPLEMENTATION
    evaluation_results_df = evaluate_ndcg_by_user_groups(user_groups, 
                                                         recommender,
                                                         parameters['train_interaction_matrix'], 
                                                         parameters['test_interaction_matrix'], 
                                                         parameters['U'], 
                                                         parameters['V'], 
                                                         parameters['item_embeddings'], 
                                                         parameters['_calculate_user_profile_embedding'], 
                                                         parameters['_compute_aggregated_scores'], 
                                                         parameters['topK'], 
                                                         parameters['n_neighbors'])

    return evaluation_results_df, user_groups

The following cell evaluates the implemented recommenders on the given dataset and displays a DataFrame with the average nDCG score of each recommender in each user group. It also shows what the expected inputs look like. For a correct evaluation, the code below must run without errors and output the nDCG scores as described.

In [11]:
# Define recommenders with correct parameters
recommenders = {
    'Avg_Item_Embd': average_embedding_similarity_rec,
    'Aggr_Item_Sim': aggregated_item_similarity_rec,
    'SVD': svd_recommend_to_list,
    'ItemKNN': recTopK,
    'TopPop': recTopKPop
}

U, V = svd_decompose(train_interaction_matrix)

data = {
    'train_interaction_matrix': train_interaction_matrix,
    'test_interaction_matrix': test_interaction_matrix,
    'U': U,
    'V': V,
    'item_embeddings': embedding,
    '_calculate_user_profile_embedding': calculate_user_profile_embedding,
    '_compute_aggregated_scores': compute_aggregated_scores,
    'topK': 10,
    'n_neighbors': 5}

evaluation_results_df, user_groups = evaluate_recommenders(users, data, recommenders, user_threshold=5)
print(f"Number of Users with low interaction levels: {len(user_groups['Low Interaction'])}")
print(f"Number of Users with high interaction levels: {len(user_groups['High Interaction'])}")
print(evaluation_results_df)

Evaluating Low Interaction Users: 100%|██████████| 5/5 [00:43<00:00,  8.68s/it]
Evaluating High Interaction Users: 100%|██████████| 5/5 [01:30<00:00, 18.12s/it]

Number of Users with low interaction levels: 560
Number of Users with high interaction levels: 655
         User Group    Recommender  Average nDCG
0   Low Interaction  Avg_Item_Embd      0.174140
1   Low Interaction  Aggr_Item_Sim      0.173580
2   Low Interaction            SVD      0.223840
3   Low Interaction        ItemKNN      0.252406
4   Low Interaction         TopPop      0.199557
5  High Interaction  Avg_Item_Embd      0.219581
6  High Interaction  Aggr_Item_Sim      0.220410
7  High Interaction            SVD      0.278832
8  High Interaction        ItemKNN      0.326684
9  High Interaction         TopPop      0.266074





In [49]:
# The end.