# Experimentation and Results

## Objective of the project 

This study conducts a comparative analysis of these three models, focusing on their
accuracy, computational complexity, scalability, and effectiveness in handling data
sparsity and dynamically changing environments. By evaluating these aspects, the research
aims to identify the operational strengths and weaknesses of each model and to provide
clear insights that can guide the development and deployment of future recommender systems.
Through this comparative framework, we aim to determine which model, and under what
conditions, provides the most reliable and robust recommendations, and thereby to
contribute to the optimization of digital services.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
from collections import Counter, defaultdict
from surprise import Dataset, Reader, KNNBasic, SVD, CoClustering, accuracy
from surprise.model_selection import train_test_split
from surprise.accuracy import rmse, mae


In [3]:
links_df = pd.read_csv('MovieLens_100k/links.csv')
movies_df = pd.read_csv('MovieLens_100k/movies.csv')
ratings_df = pd.read_csv('MovieLens_100k/ratings.csv')
tags_df = pd.read_csv('MovieLens_100k/tags.csv')

datasets = {
    "Links": links_df,
    "Movies": movies_df,
    "Ratings": ratings_df,
    "Tags": tags_df
}

datasets_info = {name: df.head() for name, df in datasets.items()}
datasets_info

{'Links':    movieId  imdbId   tmdbId
 0        1  114709    862.0
 1        2  113497   8844.0
 2        3  113228  15602.0
 3        4  114885  31357.0
 4        5  113041  11862.0,
 'Movies':    movieId                               title  \
 0        1                    Toy Story (1995)   
 1        2                      Jumanji (1995)   
 2        3             Grumpier Old Men (1995)   
 3        4            Waiting to Exhale (1995)   
 4        5  Father of the Bride Part II (1995)   
 
                                         genres  
 0  Adventure|Animation|Children|Comedy|Fantasy  
 1                   Adventure|Children|Fantasy  
 2                               Comedy|Romance  
 3                         Comedy|Drama|Romance  
 4                                       Comedy  ,
 'Ratings':    userId  movieId  rating  timestamp
 0       1        1     4.0  964982703
 1       1        3     4.0  964981247
 2       1        6     4.0  964982224
 3       1       47     5.0  9

## Dataset structure

In [4]:
# Check for missing values in each dataset
missing_values = {name: df.isnull().sum() for name, df in datasets.items()}

# Print the information about missing values
for name, missing in missing_values.items():
    print(f"Missing values in {name} dataset:\n{missing}\n")

Missing values in Links dataset:
movieId    0
imdbId     0
tmdbId     8
dtype: int64

Missing values in Movies dataset:
movieId    0
title      0
genres     0
dtype: int64

Missing values in Ratings dataset:
userId       0
movieId      0
rating       0
timestamp    0
dtype: int64

Missing values in Tags dataset:
userId       0
movieId      0
tag          0
timestamp    0
dtype: int64



In [5]:
# Print the shape of each DataFrame
for name, df in datasets.items():
    print(f"The shape of the {name} DataFrame is: {df.shape}")

The shape of the Links DataFrame is: (9742, 3)
The shape of the Movies DataFrame is: (9742, 3)
The shape of the Ratings DataFrame is: (100836, 4)
The shape of the Tags DataFrame is: (3683, 4)
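
Because data sparsity is one of the comparison axes, it can help to quantify how sparse the user-item matrix is. The following is a minimal sketch on a small hypothetical ratings frame rather than the real file; `rating_density` is an illustrative helper and is not part of the notebook.

```python
import pandas as pd

def rating_density(ratings: pd.DataFrame) -> float:
    """Fraction of (user, movie) pairs that carry a rating."""
    n_users = ratings["userId"].nunique()
    n_items = ratings["movieId"].nunique()
    return len(ratings) / (n_users * n_items)

# Toy data (hypothetical): 3 users, 4 movies, 5 ratings.
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 30, 40],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})
density = rating_density(toy)  # 5 ratings out of 3 * 4 possible pairs
```

Applied to `ratings_df`, the same call would give the density of the MovieLens rating matrix; one minus that value is its sparsity.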


In [6]:
distribution_of_ratings = ratings_df.groupby('rating').size().reset_index(name='count')
distribution_of_ratings

| rating | count |
|---|---|
| 0.5 | 1370 |
| 1.0 | 2811 |
| 1.5 | 1791 |
| 2.0 | 7551 |
| 2.5 | 5550 |
| 3.0 | 20047 |
| 3.5 | 13136 |
| 4.0 | 26818 |
| 4.5 | 8551 |
| 5.0 | 13211 |


# Collaborative Filtering Algorithms

### For collaborative filtering, we implemented the three algorithms listed below:

### a. KNNBasic (K-Nearest Neighbors)
The KNNBasic algorithm leverages the k-nearest neighbors technique to predict user ratings
based on the weighted average of ratings from similar users or items. For KNN, we have chosen 3 different similarity measures to test: Pearson, Pearson baseline and Mean squared difference. Refer to the technical report for more detail. 
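
The weighted-average idea can be sketched in a few lines of NumPy. This is an illustrative simplification, not the Surprise implementation: `knn_predict` and `msd_similarity` are hypothetical helpers, the similarities and ratings are invented, and returning `nan` when no neighbour has a positive similarity is an assumption made for the sketch.

```python
import numpy as np

def knn_predict(similarities, neighbor_ratings, k=40):
    """Weighted average of the k most similar neighbours' ratings."""
    sims = np.asarray(similarities, dtype=float)
    ratings = np.asarray(neighbor_ratings, dtype=float)
    # Keep the k neighbours with the largest similarity.
    top = np.argsort(sims)[::-1][:k]
    sims, ratings = sims[top], ratings[top]
    # Use only positive similarities as weights.
    mask = sims > 0
    if not mask.any():
        return float("nan")  # assumption: no usable neighbour
    return float(np.dot(sims[mask], ratings[mask]) / sims[mask].sum())

def msd_similarity(a, b):
    """Mean-squared-difference similarity, 1 / (msd + 1), over common ratings."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 / (np.mean((a - b) ** 2) + 1.0)

# Two nearest neighbours (similarities 0.9 and 0.5) rated the item 4.0 and 3.0.
pred = knn_predict([0.9, 0.1, 0.5], [4.0, 2.0, 3.0], k=2)
sim = msd_similarity([4.0, 3.0], [5.0, 3.0])
```

The neighbour with the higher similarity pulls the estimate toward its own rating, which is why the choice of similarity measure matters for this algorithm.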

### b. SVD (Singular Value Decomposition)
SVD is a matrix factorization technique that decomposes the user-item rating matrix into
latent factors and predicts ratings from these latent factors.
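
As a rough illustration of prediction from latent factors, the sketch below uses the common biased formulation (global mean plus user bias, item bias, and the dot product of the factor vectors) and one stochastic gradient step. The function names, learning rate, regularization value, and toy numbers are all assumptions chosen for illustration, not values taken from the notebook.

```python
import numpy as np

def svd_predict(mu, b_u, b_i, p_u, q_i):
    """Estimated rating: global mean + user bias + item bias + factor dot product."""
    return mu + b_u + b_i + float(np.dot(q_i, p_u))

def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, lr=0.005, reg=0.02):
    """One stochastic gradient step on the regularized squared error."""
    err = r_ui - svd_predict(mu, b_u, b_i, p_u, q_i)
    b_u_new = b_u + lr * (err - reg * b_u)
    b_i_new = b_i + lr * (err - reg * b_i)
    p_u_new = p_u + lr * (err * q_i - reg * p_u)
    q_i_new = q_i + lr * (err * p_u - reg * q_i)
    return b_u_new, b_i_new, p_u_new, q_i_new

# Hypothetical two-factor example on a [0, 1] rating scale.
p_u = np.array([0.1, -0.2])
q_i = np.array([0.3, 0.4])
before = svd_predict(0.7, 0.05, -0.02, p_u, q_i)
b_u, b_i, p_u2, q_i2 = sgd_step(0.9, 0.7, 0.05, -0.02, p_u, q_i)
after = svd_predict(0.7, b_u, b_i, p_u2, q_i2)  # moves toward the true rating 0.9
```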

### c. CoClustering
CoClustering simultaneously clusters users and items to uncover hidden relationships
in the data, which are then used to predict ratings.
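
One common way to turn co-clusters into a rating estimate is to start from the mean rating of the co-cluster that contains the (user, item) pair and correct it by how far the user and the item deviate from their own cluster means. The sketch below assumes those averages have already been computed; `cocluster_predict` and the numbers are hypothetical.

```python
def cocluster_predict(cocluster_mean, user_mean, user_cluster_mean,
                      item_mean, item_cluster_mean):
    """Co-cluster mean, shifted by the user's and the item's deviation
    from the means of their respective clusters."""
    return (cocluster_mean
            + (user_mean - user_cluster_mean)
            + (item_mean - item_cluster_mean))

# Hypothetical averages on a [0, 1] rating scale.
est = cocluster_predict(
    cocluster_mean=0.70,
    user_mean=0.80, user_cluster_mean=0.72,   # user rates above their cluster
    item_mean=0.60, item_cluster_mean=0.65,   # item is rated below its cluster
)
```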

### Useful functions 

In [7]:
def get_top_n(predictions, n=10):
    """Return the top-N recommendation for each user from a set of predictions."""
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Map the predictions to only the top N items
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n

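def get_top_n_recommendations(user_id, n=10):
    """Return the n unrated movies with the highest predicted rating as (title, estimate) pairs.

    Relies on a fitted Surprise model bound to the global name `model`.
    """
    # Get a list of all movies in the dataset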
    all_movies = movies_df['movieId'].unique()
    
    # Get movies that the user has already rated
    rated_movies = ratings_df[ratings_df['userId'] == user_id]['movieId'].tolist()
    
    # Predict ratings for all movies the user hasn't rated yet
    predictions = []
    for movie_id in set(all_movies) - set(rated_movies):
        pred = model.predict(uid=user_id, iid=movie_id)
        predictions.append((movie_id, pred.est))
    
    # Sort the predictions by estimated rating in descending order and select the top N
    top_n = sorted(predictions, key=lambda x: x[1], reverse=True)[:n]
    
    # Map the movie IDs back to titles
    top_n_movies = [(movies_df[movies_df['movieId'] == mid]['title'].values[0], est) for mid, est in top_n]
    
    return top_n_movies

def precision_recall_at_k(predictions, k=10, threshold=0.7):
    """Return per-user precision@k and recall@k dictionaries."""
    # Group (estimated, true) rating pairs by user
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))
    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():
        # Rank each user's items by estimated rating, highest first
        user_ratings.sort(key=lambda x: x[0], reverse=True)
        # Relevant items: true rating at or above the threshold
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)
        # Recommended items: top-k items whose estimate reaches the threshold
        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])
        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold)) for (est, true_r) in user_ratings[:k])
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0
    return precisions, recalls

def compute_rmse(predictions):
    """Compute Root Mean Squared Error (RMSE)."""
    mse = np.mean([(true_r - est) ** 2 for (_, _, true_r, est, _) in predictions])
    rmse = np.sqrt(mse)
    return rmse

def compute_mae(predictions):
    """Compute Mean Absolute Error (MAE)."""
    mae = np.mean([abs(true_r - est) for (_, _, true_r, est, _) in predictions])
    return mae
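
To see how the precision and recall computation behaves, here is a self-contained check on four hand-made predictions for a single hypothetical user. The function is a condensed copy of `precision_recall_at_k`, and the tuples mimic the `(uid, iid, true_r, est, details)` layout; the data are invented for illustration.

```python
from collections import defaultdict

def precision_recall_at_k(predictions, k=10, threshold=0.7):
    """Per-user precision@k and recall@k (condensed copy for this example)."""
    by_user = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        by_user[uid].append((est, true_r))
    precisions, recalls = {}, {}
    for uid, pairs in by_user.items():
        pairs.sort(key=lambda x: x[0], reverse=True)
        n_rel = sum(t >= threshold for _, t in pairs)
        n_rec_k = sum(e >= threshold for e, _ in pairs[:k])
        n_hit = sum(t >= threshold and e >= threshold for e, t in pairs[:k])
        precisions[uid] = n_hit / n_rec_k if n_rec_k else 0
        recalls[uid] = n_hit / n_rel if n_rel else 0
    return precisions, recalls

toy = [
    ("u1", "a", 0.9, 0.80, None),  # relevant and recommended
    ("u1", "b", 0.4, 0.75, None),  # recommended but not relevant
    ("u1", "c", 0.8, 0.60, None),  # relevant but outside the top 2
    ("u1", "d", 0.2, 0.30, None),  # neither
]
precisions, recalls = precision_recall_at_k(toy, k=2)
```

With `k=2`, one of the two recommended items is relevant and one of the two relevant items is recommended, so both metrics come out at one half. On ratings rescaled to [0, 1] from the 0.5 to 5.0 star range, a threshold of 0.7 corresponds to 0.5 + 0.7 × 4.5 = 3.65 stars, so ratings of 4.0 stars and above count as relevant.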

### In this part, we use the Surprise library, renowned for its robust implementation of various collaborative filtering algorithms, to evaluate different recommendation system models. Specifically, we implement KNNBasic, SVD, and CoClustering algorithms, chosen for their widespread recognition and effectiveness in collaborative filtering tasks. 

In [17]:
import time
import pandas as pd
from surprise import Dataset, Reader, KNNBasic, SVD, CoClustering, accuracy
from surprise.model_selection import train_test_split


#Function to evaluate a model with a given algorithm and similarity measure
def evaluate_algorithm(algo_name, similarity_measure, train_set, test_set, user_based=True):
    if algo_name == 'KNNBasic':
        sim_options = {
            'name': similarity_measure,
            'user_based': user_based
        }
        model = KNNBasic(sim_options=sim_options)
    elif algo_name == 'SVD':
        model = SVD()
    elif algo_name == 'CoClustering':
        model = CoClustering()
    else:
        raise ValueError(f"Unknown algorithm: {algo_name}")

    # Measure start time
    start_time = time.time()
    
    # Train the model
    model.fit(train_set)
    
    # Make predictions on the test set
    predictions = model.test(test_set)

    # Measure end time
    end_time = time.time()
    # Calculate running time
    running_time = end_time - start_time
    
    # Evaluate accuracy
    rmse_score = accuracy.rmse(predictions, verbose=False)
    mae_score = accuracy.mae(predictions, verbose=False)
    
    # Compute precision and recall
    precisions, recalls = precision_recall_at_k(predictions, k=10, threshold=0.7)
    precision_avg = sum(prec for prec in precisions.values()) / len(precisions)
    recall_avg = sum(rec for rec in recalls.values()) / len(recalls)
    
    return algo_name, similarity_measure, user_based, rmse_score, mae_score, precision_avg, recall_avg, running_time

# Function to evaluate all scenarios
def evaluate_all_scenarios(train_set, test_set, scenario_name):
    results_combined = []
    for algo_name, similarity_measure in algorithms:
        for user_based in [True, False]:
            algo_name, similarity_measure, user_based, rmse_score, mae_score, precision_avg, recall_avg, running_time = evaluate_algorithm(algo_name, similarity_measure, train_set, test_set, user_based)
            results_combined.append({
                'Scenario': scenario_name,
                'Algorithm': algo_name,
                'Similarity Measure': similarity_measure if similarity_measure else 'N/A',
                'User-Based': user_based,
                'RMSE': rmse_score,
                'MAE': mae_score,
                'Precision@10': precision_avg,
                'Recall@10': recall_avg,
                'Running Time (s)': running_time
            })
    return results_combined

# Rescale ratings to [0, 1] (min-max), then create a Surprise dataset
ratings_df['rating'] = (ratings_df['rating'] - ratings_df['rating'].min()) / (ratings_df['rating'].max() - ratings_df['rating'].min())

reader = Reader(rating_scale=(ratings_df['rating'].min(), ratings_df['rating'].max()))
data = Dataset.load_from_df(ratings_df[['userId', 'movieId', 'rating']], reader)


# Split the data into training and test sets
train_set, test_set = train_test_split(data, test_size=0.20)

# Output to check
train_set.n_ratings, len(test_set)

# List of algorithms and their similarity measures to evaluate
algorithms = [
    ('KNNBasic', 'pearson'),
    ('KNNBasic', 'pearson_baseline'),
    ('KNNBasic', 'msd'),
    ('SVD', None),  # SVD does not use similarity measures
    ('CoClustering', None)  # CoClustering does not use similarity measures
]

# Evaluate normal scenario
results_normal = evaluate_all_scenarios(train_set, test_set, "Normal")

# Simulation for sparse data
sparse_ratings_df = ratings_df.sample(frac=0.1, random_state=42)  # Keeping only 10% of the data
sparse_data = Dataset.load_from_df(sparse_ratings_df[['userId', 'movieId', 'rating']], reader)
sparse_train_set, sparse_test_set = train_test_split(sparse_data, test_size=0.20)

# Evaluate sparse data scenario
results_sparse = evaluate_all_scenarios(sparse_train_set, sparse_test_set, "Sparse")

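# Simulation for cold start problem (new users)
# Note: sampling the userId column draws from rating rows rather than distinct users,
# so users with many ratings are more likely to be selected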
new_user_ratings_df = ratings_df[ratings_df['userId'].isin(ratings_df['userId'].sample(frac=0.1, random_state=42))]

# Create a Surprise dataset for new user data
new_user_data = Dataset.load_from_df(new_user_ratings_df[['userId', 'movieId', 'rating']], reader)
new_user_train_set, new_user_test_set = train_test_split(new_user_data, test_size=0.20)

# Evaluate new user data scenario
results_new_user = evaluate_all_scenarios(new_user_train_set, new_user_test_set, "New User")

Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Computing the pearson similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Compu

In [19]:
results_CF1 = pd.DataFrame(results_normal + results_sparse + results_new_user)
results_CF1

| Scenario | Algorithm | Similarity Measure | User-Based | RMSE | MAE | Precision@10 | Recall@10 | Running Time (s) |
|---|---|---|---|---|---|---|---|---|
| Normal | KNNBasic | pearson | True | 0.217433 | 0.167403 | 0.681419 | 0.482391 | 1.577275 |
| Normal | KNNBasic | pearson | False | 0.214941 | 0.166982 | 0.517631 | 0.389731 | 14.983212 |
| Normal | KNNBasic | pearson_baseline | True | 0.217501 | 0.167079 | 0.676271 | 0.487729 | 1.601246 |
| Normal | KNNBasic | pearson_baseline | False | 0.204038 | 0.154259 | 0.609774 | 0.461591 | 12.312512 |
| Normal | KNNBasic | msd | True | 0.213605 | 0.163704 | 0.672722 | 0.488166 | 1.281166 |
| Normal | KNNBasic | msd | False | 0.20216 | 0.15562 | 0.522313 | 0.41725 | 10.44312 |
| Normal | SVD | N/A | True | 0.19944 | 0.153197 | 0.636856 | 0.450793 | 1.219783 |
| Normal | SVD | N/A | False | 0.199464 | 0.153466 | 0.636138 | 0.449077 | 1.084938 |
| Normal | CoClustering | N/A | True | 0.558738 | 0.512389 | 0.111245 | 0.032385 | 2.333833 |
| Normal | CoClustering | N/A | False | 0.561689 | 0.513963 | 0.144278 | 0.04202 | 2.31691 |


### Observation

User-based algorithms generally outperformed item-based algorithms across the different
scenarios tested. In the normal scenario, for example, the user-based KNNBasic variants
reached higher Precision@10 and Recall@10 than their item-based counterparts and ran roughly
eight to ten times faster, although the item-based variants had slightly lower RMSE and MAE.
KNNBasic with Pearson similarity in a user-based setting demonstrated competitive performance,
particularly in the normal and new user scenarios, where it achieved high Precision@10 and
Recall@10. SVD delivered strong results across all scenarios, achieving the lowest RMSE and
MAE values together with high Precision@10 and Recall@10. This consistent performance,
combined with relatively short running times, underscores SVD's robustness and reliability
as a recommendation algorithm for this dataset. CoClustering lagged well behind in the normal
scenario, with an RMSE of about 0.56 and Precision@10 below 0.15.

# Graph-based Algorithms

### For the graph-based models, we have implemented 3 algorithms as well: LightGCN, Graph Attention Network, GraphSAGE
Graph Construction: In the graph construction phase, nodes are created to represent both users
and movies, while edges represent the interactions between these users and movies based on
their ratings. To facilitate this, numpy is utilized for constructing the adjacency matrix, which
captures the user-movie interaction graph in a structured form. This adjacency matrix serves
as the foundation for various graph-based algorithms, allowing us to represent the relationships
between users and movies effectively. Additionally, TensorFlow is employed to handle the graph
representation and computation, leveraging its powerful capabilities for efficient processing and
model training in subsequent stages. 
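
The graph construction described above can be sketched with NumPy on a toy graph. The helper names and the toy indices are hypothetical. The symmetric normalization step is an assumption added here because it is commonly paired with graph convolution models; the notebook itself builds only the unnormalized 0/1 matrix.

```python
import numpy as np

def build_bipartite_adjacency(user_idx, item_idx, num_users, num_items):
    """Symmetric 0/1 adjacency over user nodes followed by item nodes."""
    n = num_users + num_items
    adj = np.zeros((n, n), dtype=np.float32)
    u = np.asarray(user_idx)
    i = np.asarray(item_idx) + num_users  # item nodes come after user nodes
    adj[u, i] = 1.0
    adj[i, u] = 1.0
    return adj

def symmetric_normalize(adj):
    """Return D^{-1/2} A D^{-1/2}, leaving isolated nodes as zero rows."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

# Toy interactions: user 0 rated items 0 and 1, user 1 rated item 1.
adj = build_bipartite_adjacency([0, 0, 1], [0, 1, 1], num_users=2, num_items=2)
norm_adj = symmetric_normalize(adj)
```

A dense matrix is fine for a toy graph, but its memory grows with the square of the number of nodes, so a sparse representation is usually preferable at full dataset scale.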

### In this first part of the code, we implement the helper functions and setup code that all three algorithms share, to avoid code redundancy.

In [37]:
import numpy as np
import pandas as pd
import tensorflow as tf
import time
from collections import defaultdict
from sklearn.preprocessing import LabelEncoder

# Load the dataset
ratings = pd.read_csv('MovieLens_100k/ratings.csv')
movies = pd.read_csv('MovieLens_100k/movies.csv')

# Data Preprocessing
user_encoder = LabelEncoder()
item_encoder = LabelEncoder()

ratings['user'] = user_encoder.fit_transform(ratings['userId'])
ratings['item'] = item_encoder.fit_transform(ratings['movieId'])

num_users = ratings['user'].nunique()
num_items = ratings['item'].nunique()

# Normalize the ratings 
ratings['rating'] = (ratings['rating'] - ratings['rating'].min()) / (ratings['rating'].max() - ratings['rating'].min())

def get_sparse_data(ratings, frac=0.1):
    sparse_ratings_df = ratings.sample(frac=frac, random_state=42)  
    return sparse_ratings_df

def get_new_user_data(ratings, frac=0.1):
    new_user_ratings_df = ratings[ratings['userId'].isin(ratings['userId'].sample(frac=frac, random_state=42))]
    return new_user_ratings_df

def evaluate_model(model, train_data, test_data, adj_matrix=None):
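    # Prepare the training data using the label-encoded indices, which match the
    # embedding table sizes and the indices used for evaluation below
    user_indices = train_data['user'].values
    item_indices = train_data['item'].values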
    labels = train_data['rating'].values

    dataset = tf.data.Dataset.from_tensor_slices((user_indices, item_indices, labels))
    dataset = dataset.shuffle(buffer_size=len(train_data)).batch(256)

    # Training loop
    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.MeanSquaredError()  

    @tf.function
    def train_step(user_indices, item_indices, labels):
        with tf.GradientTape() as tape:
            if adj_matrix is not None:
                scores = model(user_indices, item_indices, adj_matrix)
            else:
                scores = model(user_indices, item_indices)
            loss = loss_fn(labels, scores)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        return loss

    num_epochs = 10
    start_time = time.time()

    for epoch in range(num_epochs):
        for batch in dataset:
            user_indices_batch, item_indices_batch, labels_batch = batch
            loss = train_step(user_indices_batch, item_indices_batch, labels_batch)
        print(f'Epoch {epoch}, Loss: {loss.numpy()}')

    training_time = time.time() - start_time

    # Evaluating the model
    predictions = []

    test_user_indices = test_data['user'].values
    test_item_indices = test_data['item'].values
    test_labels = test_data['rating'].values

    for (user_index, item_index, label) in zip(test_user_indices, test_item_indices, test_labels):
        user_index_tensor = tf.constant([user_index])
        item_index_tensor = tf.constant([item_index])
        if adj_matrix is not None:
            score = model(user_index_tensor, item_index_tensor, adj_matrix).numpy()[0]
        else:
            score = model(user_index_tensor, item_index_tensor).numpy()[0]
        predictions.append((user_index, item_index, label, score, 0))

    return predictions, training_time

def compute_metrics(predictions):
    def get_top_n(predictions, n=10):
        top_n = defaultdict(list)
        for uid, iid, true_r, est, _ in predictions:
            top_n[uid].append((iid, est))
        for uid, user_ratings in top_n.items():
            user_ratings.sort(key=lambda x: x[1], reverse=True)
            top_n[uid] = user_ratings[:n]
        return top_n

    def precision_recall_at_k(predictions, k=10, threshold=0.7): 
        user_est_true = defaultdict(list)
        for uid, _, true_r, est, _ in predictions:
            user_est_true[uid].append((est, true_r))
        precisions = dict()
        recalls = dict()
        for uid, user_ratings in user_est_true.items():
            user_ratings.sort(key=lambda x: x[0], reverse=True)
            n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)
            n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])
            n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold)) for (est, true_r) in user_ratings[:k])
            precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0
            recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0
        return precisions, recalls

    def compute_rmse(predictions):
        mse = np.mean([(true_r - est) ** 2 for (_, _, true_r, est, _) in predictions])
        rmse = np.sqrt(mse)
        return rmse

    def compute_mae(predictions):
        mae = np.mean([abs(true_r - est) for (_, _, true_r, est, _) in predictions])
        return mae

    rmse = compute_rmse(predictions)
    mae = compute_mae(predictions)
    precisions, recalls = precision_recall_at_k(predictions, k=10)

    precision_at_10 = np.mean(list(precisions.values()))
    recall_at_10 = np.mean(list(recalls.values()))

    return rmse, mae, precision_at_10, recall_at_10

# Function to run evaluation for each model and scenario
def run_evaluation(model, model_name, train_data, test_data, adj_matrix=None):
    predictions, training_time = evaluate_model(model, train_data, test_data, adj_matrix)
    rmse, mae, precision_at_10, recall_at_10 = compute_metrics(predictions)
    results = pd.DataFrame({
        "Scenario": [scenario],
        "Algorithm": [model_name],
        "RMSE": [rmse],
        "MAE": [mae],
        "Precision@10": [precision_at_10],
        "Recall@10": [recall_at_10],
        "Running Time (s)": [training_time]
    })
    return results

## LightGCN
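
LightGCN is usually described as propagating embeddings over the normalized user-item graph and averaging the outputs of all layers, with no weight matrices or non-linearities. The following is a minimal NumPy sketch of that propagation on a two-node toy graph; `lightgcn_propagate` and the toy inputs are illustrative and do not come from the notebook.

```python
import numpy as np

def lightgcn_propagate(norm_adj, embeddings, num_layers=3):
    """Average the embeddings obtained after 0..num_layers propagation steps."""
    layer_out = np.asarray(embeddings, dtype=float)
    total = layer_out.copy()
    for _ in range(num_layers):
        # Neighbourhood aggregation only: no weights, no activation.
        layer_out = norm_adj @ layer_out
        total += layer_out
    return total / (num_layers + 1)

# Toy graph (hypothetical): one user (node 0) connected to one item (node 1).
# Both nodes have degree 1, so the normalized adjacency equals the raw one.
norm_adj = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
initial = np.array([[1.0, 0.0],   # user embedding
                    [0.0, 1.0]])  # item embedding
final = lightgcn_propagate(norm_adj, initial, num_layers=2)
score = float(final[0] @ final[1])  # dot-product score for the (user, item) pair
```

After propagation the user and item embeddings have mixed, so their dot product is no longer zero as it was for the initial embeddings.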

In [15]:
# Evaluate LightGCN
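class LightGCN(tf.keras.Model):
    """Scores user-item pairs by the dot product of their learned embeddings.

    As written, no propagation over the user-item graph is applied.
    """
    def __init__(self, num_users, num_items, embedding_dim):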
        super(LightGCN, self).__init__()
        self.user_embedding = tf.keras.layers.Embedding(num_users, embedding_dim)
        self.item_embedding = tf.keras.layers.Embedding(num_items, embedding_dim)

    def call(self, user_indices, item_indices):
        user_embeddings = self.user_embedding(user_indices)
        item_embeddings = self.item_embedding(item_indices)
        scores = tf.reduce_sum(user_embeddings * item_embeddings, axis=1)
        return scores

embedding_dim = 64
lightgcn_model = LightGCN(num_users, num_items, embedding_dim)


# Evaluate normal scenario
scenario = "Normal"
train_data = ratings.sample(frac=0.8, random_state=42)
test_data = ratings.drop(train_data.index)
results_lightGCN_normal = run_evaluation(lightgcn_model, "LightGCN", train_data, test_data)

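# Evaluate sparse scenario
# Note: lightgcn_model is reused here, so it starts from the weights learned in the previous scenario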
scenario = "Sparse"
train_data = get_sparse_data(ratings, frac=0.1)
test_data = ratings.drop(train_data.index)
results_lightGCN_sparse = run_evaluation(lightgcn_model, "LightGCN", train_data, test_data)

# Evaluate new user scenario
scenario = "New User"
train_data = get_new_user_data(ratings, frac=0.1)
test_data = ratings.drop(train_data.index)
results_lightGCN_new_user = run_evaluation(lightgcn_model, "LightGCN", train_data, test_data)

# Combine LightGCN results into a single DataFrame
results_lightGCN_combined = pd.concat([results_lightGCN_normal, results_lightGCN_sparse, results_lightGCN_new_user], ignore_index=True)

Epoch 0, Loss: 0.28932985663414
Epoch 1, Loss: 0.0534636490046978
Epoch 2, Loss: 0.01657233014702797
Epoch 3, Loss: 0.01574024371802807
Epoch 4, Loss: 0.04002286493778229
Epoch 5, Loss: 0.010989277623593807
Epoch 6, Loss: 0.01989801414310932
Epoch 7, Loss: 0.01790313608944416
Epoch 8, Loss: 0.00961360614746809
Epoch 9, Loss: 0.002892544027417898

Epoch 0, Loss: 0.005441933870315552
Epoch 1, Loss: 0.0037565233651548624
Epoch 2, Loss: 0.0032107247970998287
Epoch 3, Loss: 0.002524585695937276
Epoch 4, Loss: 0.0008240296156145632
Epoch 5, Loss: 0.0012967386282980442
Epoch 6, Loss: 0.0004003034846391529
Epoch 7, Loss: 0.0006772354827262461
Epoch 8, Loss: 0.0005122995935380459
Epoch 9, Loss: 0.0004288715135771781

Epoch 0, Loss: 0.0146847078576684
Epoch 1, Loss: 0.008437825366854668
Epoch 2, Loss: 0.011311208829283714
Epoch 3, Loss: 0.007394261192530394
Epoch 4, Loss: 0.007581524085253477
Epoch 5, Loss: 0.010632302612066269
Epoch 6, Loss: 0.009195016697049141
Epoch 7, Loss: 0.004009094089269638
Epoch 8, Loss: 0.00395465362817049
Epoch 9, Loss: 0.0026527103036642075


In [16]:
results_lightGCN_combined

| Scenario | Algorithm | RMSE | MAE | Precision@10 | Recall@10 | Running Time (s) |
|---|---|---|---|---|---|---|
| Normal | LightGCN | 0.248914 | 0.184007 | 0.670826 | 0.427905 | 20.89531 |
| Sparse | LightGCN | 0.142379 | 0.091555 | 0.946684 | 0.309876 | 2.923885 |
| New User | LightGCN | 0.151234 | 0.110233 | 0.844048 | 0.594584 | 23.099947 |


### Observation



## Graph Attention Network (GAT)
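
In a graph attention network, each node's attention weights are usually normalized over its graph neighbours only. The following is a minimal NumPy sketch of that masking step, assuming raw attention scores are already available; `masked_attention` is a hypothetical helper and is not part of the notebook's TensorFlow layer.

```python
import numpy as np

def masked_attention(scores, adj):
    """Row-wise softmax over scores, restricted to the neighbours marked in adj."""
    masked = np.where(adj > 0, scores, -np.inf)
    # Subtract the row maximum for numerical stability; rows without
    # neighbours have a maximum of -inf, which is replaced by 0.
    row_max = masked.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(masked - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    # Rows without neighbours keep all-zero attention weights.
    return np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)

# Toy graph (hypothetical): node 0 is linked to nodes 1 and 2, node 1 to node 0,
# and node 2 has no outgoing links in this directed toy example.
scores = np.zeros((3, 3))
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 0, 0]])
alpha = masked_attention(scores, adj)
```

With equal raw scores, node 0 splits its attention evenly between its two neighbours, node 1 puts all of its weight on node 0, and the isolated row stays at zero.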

In [17]:
# Evaluate GAT
class GraphAttentionLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim, attn_heads=1, dropout_rate=0.0, **kwargs):
        super(GraphAttentionLayer, self).__init__(**kwargs)
        self.output_dim = output_dim
        self.attn_heads = attn_heads
        self.dropout_rate = dropout_rate
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.attn_kernels = []
        self.attn_self_kernels = []

        for _ in range(attn_heads):
            self.attn_kernels.append(tf.keras.layers.Dense(output_dim, use_bias=False))
            self.attn_self_kernels.append(tf.keras.layers.Dense(1, use_bias=False))

    def call(self, inputs):
        # adj_matrix is unpacked but not used below: attention is computed over all rows in the batch
        features, adj_matrix = inputs
        attn_outs = []

        for kernel, self_kernel in zip(self.attn_kernels, self.attn_self_kernels):
            attn_out = kernel(features)
            attn_self = self_kernel(features)
            attn_all = tf.add(attn_self, tf.transpose(attn_self))
            attn_all = tf.nn.leaky_relu(attn_all)
            attn_all = tf.nn.softmax(attn_all, axis=-1)
            attn_all = self.dropout(attn_all)
            node_features = tf.matmul(attn_all, attn_out)
            attn_outs.append(node_features)

        return tf.concat(attn_outs, axis=-1)

class GATModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim, attn_heads=1, dropout_rate=0.0):
        super(GATModel, self).__init__()
        self.user_embedding = tf.keras.layers.Embedding(num_users, embedding_dim)
        self.item_embedding = tf.keras.layers.Embedding(num_items, embedding_dim)
        self.gat_layer = GraphAttentionLayer(embedding_dim, attn_heads, dropout_rate)

    def call(self, user_indices, item_indices, adj_matrix):
        user_embeddings = self.user_embedding(user_indices)
        item_embeddings = self.item_embedding(item_indices)
        all_embeddings = tf.concat([user_embeddings, item_embeddings], axis=0)
        all_embeddings = self.gat_layer([all_embeddings, adj_matrix])
        user_embeddings = all_embeddings[:tf.shape(user_indices)[0]]
        item_embeddings = all_embeddings[tf.shape(user_indices)[0]:]
        scores = tf.reduce_sum(user_embeddings * item_embeddings, axis=1)
        return scores

# Build the symmetric user-item adjacency matrix; item nodes are offset by num_users
num_nodes = num_users + num_items
adj_matrix = np.zeros((num_nodes, num_nodes), dtype=np.float32)
user_nodes = ratings['user'].to_numpy()
item_nodes = ratings['item'].to_numpy() + num_users
adj_matrix[user_nodes, item_nodes] = 1
adj_matrix[item_nodes, user_nodes] = 1

adj_matrix = tf.convert_to_tensor(adj_matrix, dtype=tf.float32)

embedding_dim = 64
attn_heads = 8
dropout_rate = 0.5
gat_model = GATModel(num_users, num_items, embedding_dim, attn_heads, dropout_rate)

# Combine results for GAT

# Evaluate normal scenario
scenario = "Normal"
train_data = ratings.sample(frac=0.8, random_state=42)
test_data = ratings.drop(train_data.index)
results_GAT_normal = run_evaluation(gat_model, "GAT", train_data, test_data, adj_matrix)

# Evaluate sparse scenario
scenario = "Sparse"
train_data = get_sparse_data(ratings, frac=0.1)
test_data = ratings.drop(train_data.index)
results_GAT_sparse = run_evaluation(gat_model, "GAT", train_data, test_data, adj_matrix)

# Evaluate new user scenario
scenario = "New User"
train_data = get_new_user_data(ratings, frac=0.1)
test_data = ratings.drop(train_data.index)
results_GAT_new_user = run_evaluation(gat_model, "GAT", train_data, test_data, adj_matrix)

# Combine GAT results into a single DataFrame
results_GAT_combined = pd.concat([results_GAT_normal, results_GAT_sparse, results_GAT_new_user], ignore_index=True)

2024-07-22 19:41:57.120941: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence

Epoch 0, Loss: 0.04164614528417587
Epoch 1, Loss: 0.09281927347183228
Epoch 2, Loss: 0.048704296350479126
Epoch 3, Loss: 0.07605619728565216
Epoch 4, Loss: 0.047069624066352844
Epoch 5, Loss: 0.06208371743559837
Epoch 6, Loss: 0.030390137806534767
Epoch 7, Loss: 0.04275280982255936
Epoch 8, Loss: 0.032630305737257004
Epoch 9, Loss: 0.04514634609222412

Epoch 0, Loss: 0.04317385330796242
Epoch 1, Loss: 0.04331973195075989
Epoch 2, Loss: 0.03145625814795494
Epoch 3, Loss: 0.02483309432864189
Epoch 4, Loss: 0.028472645208239555
Epoch 5, Loss: 0.04274964705109596
Epoch 6, Loss: 0.026976875960826874
Epoch 7, Loss: 0.0423617921769619
Epoch 8, Loss: 0.03205025941133499
Epoch 9, Loss: 0.02661043405532837

Epoch 0, Loss: 0.057373929768800735
Epoch 1, Loss: 0.03648923709988594
Epoch 2, Loss: 0.02544747292995453
Epoch 3, Loss: 0.05476325377821922
Epoch 4, Loss: 0.035788439214229584
Epoch 5, Loss: 0.06346458941698074
Epoch 6, Loss: 0.030774442479014397
Epoch 7, Loss: 0.03495458513498306
Epoch 8, Loss: 0.04711348935961723
Epoch 9, Loss: 0.033315509557724


In [18]:
results_GAT_combined

| | Scenario | Algorithm | RMSE | MAE | Precision@10 | Recall@10 | Running Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Normal | GAT | 2.197966 | 1.59185 | 0.581629 | 0.54181 | 234.961458 |
| 1 | Sparse | GAT | 2.779026 | 1.96898 | 0.607473 | 0.210916 | 33.093867 |
| 2 | New User | GAT | 1.605601 | 1.060572 | 0.614286 | 0.474437 | 209.951633 |
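
The attention step visible at the end of the GAT layer (LeakyReLU on the raw scores, a softmax over each row, then a weighted sum of node features) can be illustrated on a toy graph. The following is a minimal NumPy sketch, not the notebook's TensorFlow layer: the function name `masked_attention`, the toy inputs, and the explicit masking of non-neighbors are assumptions made for illustration, since the part of the layer that builds the scores is not shown in this section.

```python
import numpy as np

def masked_attention(scores, adj, features, negative_slope=0.2):
    """Single-head attention: LeakyReLU -> softmax over neighbors -> weighted sum."""
    scores = np.where(scores > 0, scores, negative_slope * scores)  # LeakyReLU
    # Non-neighbors get -inf so that they receive zero weight after the softmax.
    masked = np.where(adj > 0, scores, -np.inf)
    # Subtract the row maximum for numerical stability (every row needs a neighbor).
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights, weights @ features

# Toy graph: node 0 is linked to nodes 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
scores = np.array([[0.0, 1.0, 1.0],
                   [2.0, 0.0, 0.0],
                   [-1.0, 0.0, 0.0]])
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [2.0, 2.0]])

weights, out = masked_attention(scores, adj, features)
print(weights)
print(out)
```

Node 0 has two neighbors with equal scores, so it averages their features; nodes 1 and 2 each have a single neighbor and simply copy its features.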


# GraphSAGE (SAmple and aggreGatE)

In [19]:
# Evaluate GraphSAGE
class GraphSAGELayer(tf.keras.layers.Layer):
    def __init__(self, output_dim, aggregator_type='mean', dropout_rate=0.0, **kwargs):
        super(GraphSAGELayer, self).__init__(**kwargs)
        self.output_dim = output_dim
        self.aggregator_type = aggregator_type
        self.dropout_rate = dropout_rate
        self.dense_self = tf.keras.layers.Dense(output_dim, use_bias=False)
        self.dense_neighbor = tf.keras.layers.Dense(output_dim, use_bias=False)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.act = tf.nn.relu

    def call(self, inputs):
        features, adj_matrix = inputs
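        if self.aggregator_type == 'mean':
            # With a binary adjacency matrix this matmul sums neighbor features;
            # row-normalize adj_matrix beforehand to obtain a true mean.
            neighbor_features = tf.matmul(adj_matrix, features)
            node_features = self.dense_self(features) + self.dense_neighbor(neighbor_features)
        elif self.aggregator_type == 'max':
            # Element-wise max over each node's neighbors. Non-neighbors are masked with -inf.
            # The (N, N, D) intermediate is memory-hungry, so this only suits small graphs.
            is_neighbor = tf.expand_dims(adj_matrix > 0, axis=-1)
            masked = tf.where(is_neighbor, tf.expand_dims(features, axis=0), float('-inf'))
            neighbor_features = tf.reduce_max(masked, axis=1)
            # Nodes without neighbors would end up at -inf; use zeros for them instead.
            neighbor_features = tf.where(tf.math.is_inf(neighbor_features), tf.zeros_like(neighbor_features), neighbor_features)
            node_features = self.dense_self(features) + self.dense_neighbor(neighbor_features)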
        elif self.aggregator_type == 'lstm':
            raise NotImplementedError("LSTM aggregator is not implemented in this example.")
        else:
            raise ValueError(f"Unknown aggregator type: {self.aggregator_type}")
        
        node_features = self.act(node_features)
        node_features = self.dropout(node_features)
        return node_features

class GraphSAGEModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim, aggregator_type='mean', dropout_rate=0.0):
        super(GraphSAGEModel, self).__init__()
        # Keep the graph sizes on the model instead of relying on module-level globals.
        self.num_users = num_users
        self.num_items = num_items
        self.user_embedding = tf.keras.layers.Embedding(num_users, embedding_dim)
        self.item_embedding = tf.keras.layers.Embedding(num_items, embedding_dim)
        self.graphsage_layer = GraphSAGELayer(embedding_dim, aggregator_type, dropout_rate)

    def call(self, user_indices, item_indices, adj_matrix):
        # Propagate over the full user-item graph, then gather the rows for the requested pairs.
        all_user_embeddings = self.user_embedding(tf.range(self.num_users))
        all_item_embeddings = self.item_embedding(tf.range(self.num_items))
        all_embeddings = tf.concat([all_user_embeddings, all_item_embeddings], axis=0)
        all_embeddings = self.graphsage_layer([all_embeddings, adj_matrix])
        user_embeddings = tf.gather(all_embeddings, user_indices)
        # Item nodes are offset by num_users in the joint node index.
        item_embeddings = tf.gather(all_embeddings, item_indices + self.num_users)
        scores = tf.reduce_sum(user_embeddings * item_embeddings, axis=1)
        return scores

embedding_dim = 64
aggregator_type = 'mean'
dropout_rate = 0.5
graphsage_model = GraphSAGEModel(num_users, num_items, embedding_dim, aggregator_type, dropout_rate)

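# Evaluate GraphSAGE in the Normal, Sparse, and New User scenarios.
# Note: the same graphsage_model instance is passed to every scenario, so unless run_evaluation
# re-initializes it, weights learned in one scenario carry over to the next.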

# Evaluate normal scenario
scenario = "Normal"
train_data = ratings.sample(frac=0.8, random_state=42)
test_data = ratings.drop(train_data.index)
results_SAGE_normal = run_evaluation(graphsage_model, "GraphSAGE", train_data, test_data, adj_matrix)

# Evaluate sparse scenario
scenario = "Sparse"
train_data = get_sparse_data(ratings, frac=0.1)
test_data = ratings.drop(train_data.index)
results_SAGE_sparse = run_evaluation(graphsage_model, "GraphSAGE", train_data, test_data, adj_matrix)

# Evaluate new user scenario
scenario = "New User"
train_data = get_new_user_data(ratings, frac=0.1)
test_data = ratings.drop(train_data.index)
results_SAGE_new_user = run_evaluation(graphsage_model, "GraphSAGE", train_data, test_data, adj_matrix)

# Combine GraphSAGE results into a single DataFrame
results_SAGE_combined = pd.concat([results_SAGE_normal, results_SAGE_sparse, results_SAGE_new_user], ignore_index=True)

2024-07-22 20:43:35.659895: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence

Epoch 0, Loss: 0.06893366575241089
Epoch 1, Loss: 0.08900823444128036
Epoch 2, Loss: 0.12095379084348679
Epoch 3, Loss: 0.0843137726187706
Epoch 4, Loss: 0.05222022160887718
Epoch 5, Loss: 0.05716760456562042
Epoch 6, Loss: 0.0774926245212555
Epoch 7, Loss: 0.05163055285811424
Epoch 8, Loss: 0.05806591734290123
Epoch 9, Loss: 0.026816928759217262

Epoch 0, Loss: 0.0421975702047348
Epoch 1, Loss: 0.028855610638856888
Epoch 2, Loss: 0.030069472268223763
Epoch 3, Loss: 0.024326035752892494
Epoch 4, Loss: 0.01785353012382984
Epoch 5, Loss: 0.028327492997050285
Epoch 6, Loss: 0.019512180238962173
Epoch 7, Loss: 0.022672701627016068
Epoch 8, Loss: 0.022627104073762894
Epoch 9, Loss: 0.022172890603542328

Epoch 0, Loss: 0.03333196043968201
Epoch 1, Loss: 0.02856939472258091
Epoch 2, Loss: 0.042769815772771835
Epoch 3, Loss: 0.03738968074321747
Epoch 4, Loss: 0.04058655723929405
Epoch 5, Loss: 0.02843991480767727
Epoch 6, Loss: 0.03035759925842285
Epoch 7, Loss: 0.03030986897647381
Epoch 8, Loss: 0.029295336455106735
Epoch 9, Loss: 0.026935644447803497


In [21]:
results_SAGE_combined

| | Scenario | Algorithm | RMSE | MAE | Precision@10 | Recall@10 | Running Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Normal | GraphSAGE | 0.233555 | 0.180065 | 0.650521 | 0.443586 | 1135.601531 |
| 1 | Sparse | GraphSAGE | 0.227016 | 0.176313 | 0.774862 | 0.230562 | 123.121829 |
| 2 | New User | GraphSAGE | 0.421851 | 0.375382 | 0.142857 | 0.007143 | 1141.266339 |
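
The `'mean'` aggregator in `GraphSAGELayer` multiplies the adjacency matrix by the feature matrix, which yields a mean only if each adjacency row sums to one. The sketch below shows row normalization on a toy graph with NumPy. `row_normalize` is a hypothetical helper introduced here for illustration, and whether `adj_matrix` is normalized elsewhere in the notebook is not visible in this section.

```python
import numpy as np

def row_normalize(adj):
    """Divide each row by its degree so that adj_norm @ X averages neighbor features."""
    degree = adj.sum(axis=1, keepdims=True)
    # Isolated nodes (degree 0) keep an all-zero row instead of dividing by zero.
    return np.divide(adj, degree, out=np.zeros_like(adj), where=degree > 0)

# Toy graph: node 0 has neighbors 1 and 2, node 1 has neighbor 0, node 2 is isolated.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 0, 0]], dtype=float)
features = np.array([[1.0, 1.0],
                     [2.0, 0.0],
                     [4.0, 6.0]])

summed = adj @ features                    # sum of neighbor features
averaged = row_normalize(adj) @ features   # mean of neighbor features
print(summed[0], averaged[0])
```

With an unnormalized matrix, nodes with many interactions receive proportionally larger aggregated vectors, which may or may not be the intended behavior.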


# Hypergraph-based Models 

### For the hypergraph-based models, we implemented two algorithms: HyperGCN, an extension of graph convolutional networks to hypergraphs, and Node2Vec.

This part gathers the code that the two algorithms have in common, notably the construction of the hypergraph, in which each hyperedge groups the users who gave the same rating to a movie (together with that movie's node).
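
To make the construction concrete, the following sketch builds hyperedges from a handful of made-up ratings and then expands each hyperedge into pairwise links (a clique expansion), which is one common way to obtain an ordinary adjacency structure from a hypergraph. It uses only the standard library; the toy ratings and the helper names `build_hyperedges` and `clique_expansion` are illustrative assumptions, not part of the notebook.

```python
from collections import defaultdict
from itertools import combinations

def build_hyperedges(ratings):
    """Group users by (movie, rating); each group plus the movie node forms one hyperedge."""
    edges = defaultdict(set)
    for user_id, movie_id, rating in ratings:
        key = f"movie_{movie_id}_rating_{rating}"
        edges[key].update({f"user_{user_id}", f"movie_{movie_id}"})
    return dict(edges)

def clique_expansion(edges):
    """Link every pair of nodes that share a hyperedge; the weight counts shared hyperedges."""
    weights = defaultdict(int)
    for nodes in edges.values():
        for a, b in combinations(sorted(nodes), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical (user, movie, rating) triples.
toy_ratings = [(1, 10, 4.0), (2, 10, 4.0), (3, 10, 2.0), (1, 20, 4.0)]

edges = build_hyperedges(toy_ratings)
pairs = clique_expansion(edges)

for name, nodes in edges.items():
    print(name, sorted(nodes))
print(len(pairs), "pairwise links")
```

Users 1 and 2 end up linked because they gave movie 10 the same rating, while user 3, who rated it differently, is linked only to the movie node.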

In [32]:
import pandas as pd
from sklearn.model_selection import train_test_split
import hypernetx as hnx
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers
from collections import defaultdict
from joblib import Parallel, delayed
import time
from sklearn.preprocessing import MinMaxScaler
from scipy.sparse import csr_matrix
import matplotlib.pyplot as plt


# Normalize ratings
scaler = MinMaxScaler()
ratings_df['rating'] = scaler.fit_transform(ratings_df[['rating']])

# Split the data into training and test sets
train, test = train_test_split(ratings_df, test_size=0.2, random_state=42)

# Build the hypergraph
edges = defaultdict(list)
for _, row in train.iterrows():
    user_node = f'user_{row["userId"]}'
    movie_node = f'movie_{row["movieId"]}'
    rating = row["rating"]
    hyperedge = f'{movie_node}_rating_{rating}'
    edges[hyperedge].append(user_node)
    edges[hyperedge].append(movie_node)

H = hnx.Hypergraph(edges)
print(f"Hypergraph created with {len(H.nodes)} nodes and {len(H.edges)} edges.")
print(f"Number of nodes in hypergraph: {len(H.nodes)}")
print(f"Sample nodes: {list(H.nodes)[:5]}")

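# Note: get_sparse_data and get_new_user_data are defined further down in this cell,
# so on a fresh kernel these two calls must run after those definitions.
sparse_data = get_sparse_data(ratings_df)
new_user_data = get_new_user_data(ratings_df)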

# Create adjacency matrix for hypergraph
def create_hypergraph_adjacency_matrix(hypergraph):
    node_list = list(hypergraph.nodes)
    node_idx = {node: idx for idx, node in enumerate(node_list)}
    n = len(node_list)
    
    data = []
    row = []
    col = []

    for edge in hypergraph.edges:
        edge_nodes = list(hypergraph.edges[edge])
        for i in range(len(edge_nodes)):
            for j in range(i + 1, len(edge_nodes)):
                node_i = node_idx[edge_nodes[i]]
                node_j = node_idx[edge_nodes[j]]
                data.append(1)
                row.append(node_i)
                col.append(node_j)
                data.append(1)
                row.append(node_j)
                col.append(node_i)

    adj_matrix = csr_matrix((data, (row, col)), shape=(n, n))
    print(f"Adjacency matrix created with shape {adj_matrix.shape}")
    return adj_matrix, node_idx

adj_matrix, node_to_idx = create_hypergraph_adjacency_matrix(H)
adj_matrix_sparse = tf.sparse.SparseTensor(indices=np.array([adj_matrix.nonzero()[0], adj_matrix.nonzero()[1]]).T,
                                           values=adj_matrix.data.astype(np.float32),
                                           dense_shape=adj_matrix.shape)
adj_matrix_sparse = tf.sparse.reorder(adj_matrix_sparse)


# Evaluation metrics functions
def precision_recall_at_k(predictions, k=10, threshold=0.5):
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))
    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():
        user_ratings.sort(key=lambda x: x[0], reverse=True)
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)
        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])
        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold)) for (est, true_r) in user_ratings[:k])
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 0
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 0
    return precisions, recalls

def compute_rmse(predictions):
    mse = np.mean([(true_r - est) ** 2 for (_, _, true_r, est, _) in predictions])
    rmse = np.sqrt(mse)
    return rmse

def compute_mae(predictions):
    mae = np.mean([abs(true_r - est) for (_, _, true_r, est, _) in predictions])
    return mae

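# Functions to generate sparse and new user data
def get_sparse_data(ratings, frac=0.1):
    # Keep a random fraction of the rating rows (10% by default) to simulate sparse data.
    sparse_ratings_df = ratings.sample(frac=frac, random_state=42)
    return sparse_ratings_df

def get_new_user_data(ratings, frac=0.1):
    # Note: this samples userIds from rating rows, not from distinct users, so active users
    # are drawn repeatedly and the selection can cover most of the user base.
    # Sampling from ratings['userId'].drop_duplicates() would select a fraction of distinct users.
    new_user_ratings_df = ratings[ratings['userId'].isin(ratings['userId'].sample(frac=frac, random_state=42))]
    return new_user_ratings_df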

def evaluate_model(test, embeddings, user_mapping, movie_mapping, scenario, algorithm):
    def predict_rating(user, movie):
        if user in user_mapping and movie in movie_mapping:
            user_idx = user_mapping[user]
            movie_idx = movie_mapping[movie]
            if user_idx >= embeddings.shape[0] or movie_idx >= embeddings.shape[0]:
                return 0
            user_emb = embeddings[user_idx]
            movie_emb = embeddings[movie_idx]
            return np.dot(user_emb, movie_emb)
        else:
            return 0

    predictions = []
    for _, row in test.iterrows():
        uid = row['userId']
        mid = row['movieId']
        true_r = row['rating']
        est = predict_rating(uid, mid)
        predictions.append((uid, mid, true_r, est, None))

    rmse = compute_rmse(predictions)
    mae = compute_mae(predictions)
    precisions, recalls = precision_recall_at_k(predictions, k=10)

    avg_precision = np.mean(list(precisions.values()))
    avg_recall = np.mean(list(recalls.values()))

    results = pd.DataFrame({
        'Scenario': [scenario],
        'Algorithm': [algorithm],
        'RMSE': [rmse],
        'MAE': [mae],
        'Precision@10': [avg_precision],
        'Recall@10': [avg_recall]
    })
    
    return results


Hypergraph created with 9593 nodes and 26961 edges.
Number of nodes in hypergraph: 9593
Sample nodes: ['user_509.0', 'movie_7347.0', 'user_380.0', 'user_274.0', 'user_474.0']
Adjacency matrix created with shape (9593, 9593)
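
As a worked example of the ranking metrics, the sketch below computes Precision@k and Recall@k for a single user with the same counting rule as `precision_recall_at_k`: an item is relevant if its true rating reaches the threshold and recommended if its estimate does. The scores are made up, and `precision_recall_for_user` is a hypothetical single-user variant written for illustration.

```python
def precision_recall_for_user(user_ratings, k=10, threshold=0.5):
    """user_ratings: list of (estimated, true) rating pairs for one user."""
    ranked = sorted(user_ratings, key=lambda pair: pair[0], reverse=True)
    # Relevant items over the user's whole test set.
    n_rel = sum(true_r >= threshold for _, true_r in ranked)
    # Recommended items among the top k by estimated rating.
    n_rec_k = sum(est >= threshold for est, _ in ranked[:k])
    # Items in the top k that are both recommended and relevant.
    n_rel_and_rec_k = sum(
        est >= threshold and true_r >= threshold for est, true_r in ranked[:k]
    )
    precision = n_rel_and_rec_k / n_rec_k if n_rec_k else 0.0
    recall = n_rel_and_rec_k / n_rel if n_rel else 0.0
    return precision, recall

# Made-up (estimated, true) pairs on the normalized 0-1 rating scale.
toy = [(0.9, 1.0), (0.8, 0.25), (0.7, 0.75), (0.2, 0.75), (0.15, 1.0)]
precision, recall = precision_recall_for_user(toy, k=3)
print(precision, recall)
```

Two of the three top-ranked items are relevant, giving a precision of 2/3, while only two of the four relevant items appear in the top three, giving a recall of 1/2.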


# HyperGCN

In [13]:
class ImprovedHyperGCN(tf.keras.Model):
    def __init__(self, num_nodes, num_features, hidden_dims, output_dim, dropout_rate):
        super(ImprovedHyperGCN, self).__init__()
        self.embedding = layers.Embedding(num_nodes, num_features)
        self.hidden_layers = [layers.Dense(dim, activation='relu', kernel_regularizer=regularizers.l2(0.01)) for dim in hidden_dims]
        self.output_layer = layers.Dense(output_dim, activation='linear')
        self.dropout = layers.Dropout(dropout_rate)

    def call(self, adj_matrix):
        x = self.embedding(tf.range(adj_matrix.shape[0]))
        x = tf.sparse.sparse_dense_matmul(adj_matrix, x)
        for layer in self.hidden_layers:
            x = layer(x)
            x = self.dropout(x)
        x = self.output_layer(x)
        return x

def train_hypergcn_model(H, train, test, scenario):
    # Work on copies so the index columns added below do not trigger
    # pandas' SettingWithCopyWarning or mutate the caller's DataFrames.
    train = train.copy()
    test = test.copy()

    user_ids = train['userId'].unique()
    movie_ids = train['movieId'].unique()

    user_mapping = {user_id: idx for idx, user_id in enumerate(user_ids)}
    movie_mapping = {movie_id: idx + len(user_ids) for idx, movie_id in enumerate(movie_ids)}

    train['user_idx'] = train['userId'].map(user_mapping)
    train['movie_idx'] = train['movieId'].map(movie_mapping)

    test['user_idx'] = test['userId'].map(user_mapping)
    test['movie_idx'] = test['movieId'].map(movie_mapping)

    num_nodes = len(user_ids) + len(movie_ids)
    num_features = 128
    hidden_dims = [256, 128]
    output_dim = 1
    dropout_rate = 0.5

    model = ImprovedHyperGCN(num_nodes, num_features, hidden_dims, output_dim, dropout_rate)
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

    def pairwise_hinge_loss(positive_scores, negative_scores, margin=1.0):
        return tf.reduce_mean(tf.maximum(0.0, margin - positive_scores + negative_scores))

    def improved_sample_negative_edges(train_df, user_mapping, movie_mapping, num_neg_samples):
        users = train_df['userId'].unique()
        movies = train_df['movieId'].unique()
        positive_samples = train_df[['userId', 'movieId']].values
        positive_samples_set = set((user_mapping[user], movie_mapping[movie]) for user, movie in positive_samples)
        
        neg_samples = []
        while len(neg_samples) < num_neg_samples:
            user = np.random.choice(users)
            user_idx = user_mapping[user]
            negative_movies = np.setdiff1d(movies, train_df[train_df['userId'] == user]['movieId'].values)
            neg_movie = np.random.choice(negative_movies)
            neg_movie_idx = movie_mapping[neg_movie]
            if (user_idx, neg_movie_idx) not in positive_samples_set:
                neg_samples.append((user_idx, neg_movie_idx))
        
        neg_df = pd.DataFrame(neg_samples, columns=['user_idx', 'movie_idx'])
        print(f"Generated {len(neg_df)} negative samples")
        return neg_df

    positive_samples = train[['user_idx', 'movie_idx']]
    neg_train_samples = improved_sample_negative_edges(train, user_mapping, movie_mapping, num_neg_samples=len(positive_samples))
    print(f"Number of positive samples: {len(positive_samples)}")
    print(f"Number of negative samples: {len(neg_train_samples)}")

    epochs = 200
    batch_size = 256

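    def train_step(model, adj_matrix, optimizer, positive_samples, negative_samples):
        # Note: scores are computed from the raw embedding table; the model's call()
        # (and therefore adj_matrix and the dense layers) is not used in this step.
        # Each step processes all samples at once, so batch_size above is unused.
        with tf.GradientTape() as tape: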
            positive_user_embeddings = model.embedding(positive_samples['user_idx'].values)
            positive_movie_embeddings = model.embedding(positive_samples['movie_idx'].values)
            negative_user_embeddings = model.embedding(negative_samples['user_idx'].values)
            negative_movie_embeddings = model.embedding(negative_samples['movie_idx'].values)

            positive_scores = tf.reduce_sum(positive_user_embeddings * positive_movie_embeddings, axis=1)
            negative_scores = tf.reduce_sum(negative_user_embeddings * negative_movie_embeddings, axis=1)

            loss = pairwise_hinge_loss(positive_scores, negative_scores)

        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        return loss

    start_time = time.time()
    for epoch in range(epochs):
        loss = train_step(model, adj_matrix_sparse, optimizer, positive_samples, neg_train_samples)
        if epoch % 10 == 0:
            print(f'Epoch {epoch}, Loss: {loss.numpy()}')

    end_time = time.time()
    embeddings = model.embedding(tf.range(num_nodes)).numpy()

    results = evaluate_model(test, embeddings, user_mapping, movie_mapping, scenario, "HyperGCN")
    results['Running Time (s)'] = end_time - start_time
    
    return results

# Evaluate HyperGCN model for different scenarios
results_hypergcn_normal = train_hypergcn_model(H, train, test, "Normal")
results_hypergcn_sparse = train_hypergcn_model(H, sparse_data, test, "Sparse")
results_hypergcn_new_user = train_hypergcn_model(H, new_user_data, test, "New User")

# Combine HyperGCN results into a single DataFrame
results_hypergcn_combined = pd.concat([results_hypergcn_normal, results_hypergcn_sparse, results_hypergcn_new_user], ignore_index=True)

Generated 80668 negative samples
Number of positive samples: 80668
Number of negative samples: 80668
Epoch 0, Loss: 0.999934732913971
Epoch 10, Loss: 0.9869195818901062
Epoch 20, Loss: 0.9526710510253906
Epoch 30, Loss: 0.8124383091926575
Epoch 40, Loss: 0.3995513319969177
Epoch 50, Loss: 0.1945350468158722
Epoch 60, Loss: 0.11867774277925491
Epoch 70, Loss: 0.07247839868068695
Epoch 80, Loss: 0.04291127249598503
Epoch 90, Loss: 0.024636704474687576
Epoch 100, Loss: 0.013938989490270615
Epoch 110, Loss: 0.007689888123422861
Epoch 120, Loss: 0.004110455513000488
Epoch 130, Loss: 0.0021072046365588903
Epoch 140, Loss: 0.001034366781823337
Epoch 150, Loss: 0.00048456471995450556
Epoch 160, Loss: 0.000214907675399445
Epoch 170, Loss: 8.409914880758151e-05
Epoch 180, Loss: 2.6276418793713674e-05
Epoch 190, Loss: 5.5467430684075225e-06
Generated 10084 negative samples
Number of positive samples: 10084
Number of negative samples: 10084
Epoch 0, Loss: 1.0000203847885132
Epoch 10, Loss: 0.96293

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  train['user_idx'] = train['userId'].map(user_mapping)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  train['movie_idx'] = train['movieId'].map(movie_mapping)


Generated 100654 negative samples
Number of positive samples: 100654
Number of negative samples: 100654
Epoch 0, Loss: 1.0000200271606445
Epoch 10, Loss: 0.9884659051895142
Epoch 20, Loss: 0.9515051245689392
Epoch 30, Loss: 0.7592405676841736
Epoch 40, Loss: 0.3240206837654114
Epoch 50, Loss: 0.1749018430709839
Epoch 60, Loss: 0.11328203231096268
Epoch 70, Loss: 0.07312606275081635
Epoch 80, Loss: 0.04645015299320221
Epoch 90, Loss: 0.028919674456119537
Epoch 100, Loss: 0.017635205760598183
Epoch 110, Loss: 0.010489451698958874
Epoch 120, Loss: 0.006125546991825104
Epoch 130, Loss: 0.0034394057001918554
Epoch 140, Loss: 0.001914128428325057
Epoch 150, Loss: 0.001048172009177506
Epoch 160, Loss: 0.0005439634551294148
Epoch 170, Loss: 0.0002834537881426513
Epoch 180, Loss: 0.00014076723891776055
Epoch 190, Loss: 6.63048485876061e-05
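
The loss values in this log come from the pairwise hinge loss defined in `train_hypergcn_model`, that is, the mean of max(0, margin - s_pos + s_neg) over paired positive and negative scores. A small standard-library sketch with made-up scores shows how individual pairs contribute; it mirrors the formula but is not the notebook's TensorFlow code.

```python
def pairwise_hinge_loss(positive_scores, negative_scores, margin=1.0):
    """Mean of max(0, margin - s_pos + s_neg) over paired scores."""
    losses = [
        max(0.0, margin - pos + neg)
        for pos, neg in zip(positive_scores, negative_scores)
    ]
    return sum(losses) / len(losses)

# All scores at zero: every pair violates the margin by exactly 1.
loss_at_zero = pairwise_hinge_loss([0.0, 0.0], [0.0, 0.0])
# One pair satisfies the margin, the other is ranked the wrong way round.
loss_mixed = pairwise_hinge_loss([2.0, 0.5], [0.0, 1.0])
# Every positive beats its paired negative by at least the margin.
loss_separated = pairwise_hinge_loss([3.0, 2.5], [0.5, 1.0])

print(loss_at_zero, loss_mixed, loss_separated)
```

A loss close to 1.0 at epoch 0 is consistent with scores near zero at the start of training. A loss approaching zero only says that each training positive outscores its paired negative by the margin; it does not by itself indicate accuracy on held-out ratings.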


In [14]:
results_hypergcn_combined

| | Scenario | Algorithm | RMSE | MAE | Precision@10 | Recall@10 | Running Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Normal | HyperGCN | 1.348297 | 1.068175 | 0.866068 | 0.516942 | 76.816076 |
| 1 | Sparse | HyperGCN | 0.586816 | 0.491858 | 0.812171 | 0.343434 | 16.090587 |
| 2 | New User | HyperGCN | 1.832062 | 1.540105 | 0.865797 | 0.606913 | 94.246901 |
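
The New User scenario depends on how its users are selected. `get_new_user_data` samples `userId` values from rating rows, so a user with many ratings is far more likely to be picked than a user with few. The sketch below contrasts that with sampling distinct users, using only the standard library; the toy data and the function names are assumptions for illustration.

```python
import random

def users_from_row_sample(ratings, frac, seed=42):
    """Sample rating rows, then collect the users that appear in them."""
    rng = random.Random(seed)
    rows = rng.sample(ratings, round(frac * len(ratings)))
    return {user for user, _ in rows}

def users_from_user_sample(ratings, frac, seed=42):
    """Sample distinct users directly."""
    rng = random.Random(seed)
    users = sorted({user for user, _ in ratings})
    return set(rng.sample(users, round(frac * len(users))))

# Hypothetical data: 20 users with 50 ratings each, as (user, movie) pairs.
ratings = [(user, movie) for user in range(20) for movie in range(50)]

by_rows = users_from_row_sample(ratings, frac=0.1)
by_users = users_from_user_sample(ratings, frac=0.1)

print(len(by_rows), "users selected via row sampling")
print(len(by_users), "users selected via user sampling")
```

Sampling 10% of the rows touches many more than 10% of the users, which may explain why the New User run in the log above reports about as many positive samples as the full dataset.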


# Node2Vec

In [43]:
import pandas as pd
from sklearn.model_selection import train_test_split
import hypernetx as hnx
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers
from collections import defaultdict
from joblib import Parallel, delayed
import time
from sklearn.preprocessing import MinMaxScaler
from scipy.sparse import csr_matrix
import matplotlib.pyplot as plt

class Node2Vec(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim):
        super(Node2Vec, self).__init__()
        self.embedding = layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim,
                                          embeddings_initializer=tf.keras.initializers.RandomNormal(stddev=1.0),
                                          embeddings_regularizer=tf.keras.regularizers.l2(1e-6))
        self.dropout = layers.Dropout(0.5)
        self.batch_norm = layers.BatchNormalization()
        self.dense1 = layers.Dense(256, activation='relu')
        self.dense2 = layers.Dense(128, activation='relu')
        self.dense3 = layers.Dense(1, activation='linear')
    
    def call(self, inputs):
        x = self.embedding(inputs)
        x = tf.reduce_mean(x, axis=1)  # Mean pooling
        x = self.batch_norm(x)
        x = self.dropout(x)
        x = self.dense1(x)
        x = self.dense2(x)
        return self.dense3(x)

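def train_node2vec_model(H, train, test, scenario):
    # Work on copies so the caller's DataFrames are not modified in place;
    # this also avoids pandas' SettingWithCopyWarning when a slice is passed in.
    train = train.copy()
    test = test.copy()

    user_ids = train['userId'].unique()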
    movie_ids = train['movieId'].unique()

    user_mapping = {user_id: idx for idx, user_id in enumerate(user_ids)}
    movie_mapping = {movie_id: idx + len(user_ids) for idx, movie_id in enumerate(movie_ids)}

    train['user_idx'] = train['userId'].map(user_mapping)
    train['movie_idx'] = train['movieId'].map(movie_mapping)

    test['user_idx'] = test['userId'].map(user_mapping)
    test['movie_idx'] = test['movieId'].map(movie_mapping)

    # Generate random walks from the hypergraph
    def random_walk(hypergraph, start_node, walk_length):
        walk = [start_node]
        while len(walk) < walk_length:
            cur = walk[-1]
            neighbors = list(hypergraph.neighbors(cur))
            if neighbors:
                walk.append(np.random.choice(neighbors))
            else:
                break
        return [str(node) for node in walk]

    def generate_walks(hypergraph, num_walks, walk_length):
        print("Generating random walks...")
        nodes = list(hypergraph.nodes)
        walks = Parallel(n_jobs=-1)(delayed(random_walk)(hypergraph, np.random.choice(nodes), walk_length) for _ in range(num_walks))
        print("Random walks generation completed.")
        return walks

    num_walks = 150  # Total number of walks, each started from a randomly chosen node
    walk_length = 80  # Length of each walk
    dimensions = 128  # Dimension of the embeddings
    window_size = 5  # Window size for context-target pairs
    epochs = 200  # Number of training epochs
    learning_rate = 0.0001  # Adjusted learning rate

    walks = generate_walks(H, num_walks, walk_length)

    # Convert walks to integer indices
    node_to_idx = {node: idx for idx, node in enumerate(H.nodes)}
    walks_indices = [[node_to_idx[node] for node in walk if node in node_to_idx] for walk in walks]
    vocab_size = len(node_to_idx)

    X = []
    y = []
    for walk in walks_indices:
        if len(walk) > window_size:
            for i in range(len(walk) - window_size):
                context = walk[i:i + window_size]
                target = walk[i + window_size]
                X.append(context)
                y.append(target)

    X = np.array(X)
    y = np.array(y)

    if X.size == 0 or y.size == 0:
        print("No data generated for training. Check the random walk and context-target extraction steps.")
        return pd.DataFrame({
            'Scenario': [scenario],
            'Algorithm': ['Node2Vec'],
            'RMSE': [None],
            'MAE': [None],
            'Precision@10': [None],
            'Recall@10': [None],
            'Running Time (s)': [None]
        })
    else:
        dataset = tf.data.Dataset.from_tensor_slices((X, y)).batch(256).shuffle(buffer_size=1024).repeat()
        steps_per_epoch = len(X) // 256
        if steps_per_epoch == 0:
            steps_per_epoch = 1

        lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
            initial_learning_rate=learning_rate,
            decay_steps=100000,
            decay_rate=0.96,
            staircase=True)

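        model = Node2Vec(vocab_size, dimensions)
        # The target y is the integer index of the next node in the walk, so this fits a
        # regression on node indices; 'accuracy' is not a meaningful metric for such a target.
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule), loss='mean_squared_error', metrics=['accuracy'])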
        print("Starting model training...")
        start_time = time.time()
        history = model.fit(dataset, epochs=epochs, steps_per_epoch=steps_per_epoch)
        end_time = time.time()
        print(f"Training completed in {end_time - start_time:.2f} seconds")

        embeddings = model.embedding.get_weights()[0]

    results = evaluate_model(test, embeddings, user_mapping, movie_mapping, scenario, "Node2Vec")
    results['Running Time (s)'] = end_time - start_time
    
    return results

# Evaluate Node2Vec model for different scenarios
results_node2vec_normal = train_node2vec_model(H, train, test, "Normal")
results_node2vec_sparse = train_node2vec_model(H, sparse_data, test, "Sparse")
results_node2vec_new_user = train_node2vec_model(H, new_user_data, test, "New User")

# Combine Node2Vec results into a single DataFrame
results_node2vec_combined = pd.concat([results_node2vec_normal, results_node2vec_sparse, results_node2vec_new_user], ignore_index=True)

Generating random walks...
Random walks generation completed.
Starting model training...
Epoch 1/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - accuracy: 0.0014 - loss: 2365155.5000
Epoch 2/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.0000e+00 - loss: 2346714.0000
Epoch 3/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.0000e+00 - loss: 2304159.7500
Epoch 4/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.0000e+00 - loss: 2129557.2500
Epoch 5/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.0000e+00 - loss: 2310509.7500
Epoch 6/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.0000e+00 - loss: 2206428.2500
Epoch 7/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.0000e+00 - loss: 2121127.7500
Epoch 8/200
[output truncated]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  train['user_idx'] = train['userId'].map(user_mapping)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  train['movie_idx'] = train['movieId'].map(movie_mapping)


Generating random walks...
Random walks generation completed.
Starting model training...
Epoch 1/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 4s 19ms/step - accuracy: 0.0024 - loss: 2329312.2500
Epoch 2/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.0000e+00 - loss: 2253533.0000
Epoch 3/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - accuracy: 0.0000e+00 - loss: 2202216.5000
Epoch 4/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - accuracy: 0.0000e+00 - loss: 2297614.7500
Epoch 5/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 14ms/step - accuracy: 0.0000e+00 - loss: 2299053.7500
Epoch 6/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 18ms/step - accuracy: 0.0000e+00 - loss: 2255951.2500
Epoch 7/200
43/43 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - accuracy: 0.0000e+00 - loss: 2140327.7500
[output truncated]

In [44]:
results_node2vec_combined

| | Scenario | Algorithm | RMSE | MAE | Precision@10 | Recall@10 | Running Time (s) |
|---|---|---|---|---|---|---|---|
| 0 | Normal | Node2Vec | 8.286368 | 6.071646 | 0.844083 | 0.358547 | 143.922792 |
| 1 | Sparse | Node2Vec | 8.070935 | 5.747824 | 0.828152 | 0.344028 | 150.034234 |
| 2 | New User | Node2Vec | 8.21136 | 6.095754 | 0.829264 | 0.372612 | 144.782401 |
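
For reference, the context/target windowing used inside `train_node2vec_model` can be reproduced in isolation on a toy walk. The sketch below mirrors that loop; the helper name `context_target_pairs` and the toy node indices are illustrative assumptions, not part of the notebook.

```python
def context_target_pairs(walk, window_size):
    """Slide a fixed-size window over one walk.

    Each run of `window_size` consecutive nodes is a context, and the
    node immediately after that run is its target.
    """
    pairs = []
    # range() is empty when the walk has window_size nodes or fewer,
    # so short walks contribute no pairs.
    for i in range(len(walk) - window_size):
        context = walk[i:i + window_size]
        target = walk[i + window_size]
        pairs.append((context, target))
    return pairs


# Hypothetical walk of seven node indices.
toy_walk = [10, 11, 12, 13, 14, 15, 16]
pairs = context_target_pairs(toy_walk, window_size=5)
for context, target in pairs:
    print(context, "->", target)
```

With the settings used above (walks of length 80, window size 5), a full-length walk yields 75 pairs. Assuming all 150 walks reach full length, that gives 11,250 pairs, and 11,250 // 256 = 43, which is consistent with the 43 steps per epoch shown in the training log.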
