# Sequential recommender
This task uses a sequential model for recommendation. It works on a pre-processed version of the dataset, to which we added embeddings generated from the book descriptions and reduced with PCA. The order of interactions is implicit, because our dataset does not contain interaction timestamps. An alternative version with augmented timestamps was also tested; as the timestamps were generated randomly, it made no significant difference to the final results.
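
To illustrate why randomly generated timestamps add no real signal, here is a minimal sketch of such an augmentation. The function name `add_random_timestamps`, the `time` field, and the epoch-second bounds are assumptions made for illustration; this is not the code that produced `Ratings_merged_emb_time.csv`.

```python
import random


def add_random_timestamps(rows, start=946684800, end=1104537600, seed=42):
    """Return copies of `rows` with a hypothetical random 'time' field added.

    `start` and `end` are arbitrary epoch-second bounds chosen for this sketch.
    """
    rng = random.Random(seed)
    return [{**row, "time": rng.randint(start, end)} for row in rows]


rows = [
    {"user": 67544, "item": 440214009, "label": 7},
    {"user": 67544, "item": 688077080, "label": 8},
    {"user": 219008, "item": 679405135, "label": 4},
]
stamped = add_random_timestamps(rows)

# Sorting by (user, time) gives an arbitrary per-user order,
# because the timestamps are unrelated to the real reading order.
ordered = sorted(stamped, key=lambda r: (r["user"], r["time"]))
```

Because each timestamp is drawn independently of the interaction, the resulting per-user sequence is as arbitrary as the row order itself.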

In [1]:
import numpy as np
import pandas as pd

# ratings_df = pd.read_csv('../data/Ratings_merged_emb_time.csv', delimiter=',')
ratings_df = pd.read_csv('../data/df_with_emb_cleaned.csv', delimiter=',')
ratings_df.head()

Unnamed: 0,user,item,label,Age,pca_dim_1,pca_dim_2,pca_dim_3,pca_dim_4,pca_dim_5,pca_dim_6,pca_dim_7,pca_dim_8,pca_dim_9,pca_dim_10,Year
0,67544.0,440214009,7.0,30.0,0.098736,-0.016876,-0.064115,-0.043559,0.011505,0.062887,0.004183,0.003117,0.043744,0.009762,1993.0
1,67544.0,688077080,8.0,30.0,-0.06332,-0.03906,0.001323,-0.025799,0.026691,0.008113,0.002605,-0.110809,0.049842,-0.035884,1989.0
2,219008.0,679405135,4.0,60.0,0.031406,0.060715,0.08135,-0.115592,0.063185,0.093926,0.010901,0.0281,-0.013741,-0.079226,1996.0
3,219008.0,446519723,7.0,60.0,0.150597,-0.103935,0.003405,0.022489,-0.018591,-0.1361,0.041277,0.053403,0.011589,0.037841,1995.0
4,219008.0,140096361,6.0,60.0,-0.054594,0.001724,-0.063644,-0.011405,-0.014985,0.116361,0.030079,-0.026273,0.054377,0.051752,1987.0


Data preparation consists of assigning each feature column to a type, categorical (sparse) or non-categorical (dense), and to an owner, user or item. RNN4Rec uses a recurrent neural network under the hood, here with LSTM layers.
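
Before building the dataset, it can help to check that the column specification is consistent. The helper below is a hypothetical sanity check, not part of libreco; it assumes that every feature column should have exactly one type (sparse or dense) and exactly one owner (user or item).

```python
def check_feature_columns(user_col, item_col, sparse_col, dense_col):
    """Raise ValueError if the column specification is inconsistent."""
    both_types = set(sparse_col) & set(dense_col)
    if both_types:
        raise ValueError(f"columns listed as sparse and dense: {sorted(both_types)}")

    both_owners = set(user_col) & set(item_col)
    if both_owners:
        raise ValueError(f"columns listed as user and item: {sorted(both_owners)}")

    typed = set(sparse_col) | set(dense_col)
    owned = set(user_col) | set(item_col)
    if typed != owned:
        # Symmetric difference: columns that have a type but no owner, or vice versa.
        raise ValueError(f"columns missing a type or an owner: {sorted(typed ^ owned)}")
    return True


pca_cols = [f"pca_dim_{i}" for i in range(1, 11)]
columns_ok = check_feature_columns(
    user_col=["Age"],
    item_col=["Year"] + pca_cols,
    sparse_col=[],
    dense_col=["Age", "Year"] + pca_cols,
)
```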

In [2]:
from libreco.algorithms import RNN4Rec
from libreco.data import DatasetFeat
from libreco.data import random_split

# specify complete columns information
sparse_col = []
dense_col = ["Age", "Year", "pca_dim_1", "pca_dim_2", "pca_dim_3", "pca_dim_4", "pca_dim_5", "pca_dim_6", "pca_dim_7", "pca_dim_8", "pca_dim_9", "pca_dim_10"]
user_col = ["Age"]
item_col = ["Year", "pca_dim_1", "pca_dim_2", "pca_dim_3", "pca_dim_4", "pca_dim_5", "pca_dim_6", "pca_dim_7", "pca_dim_8", "pca_dim_9", "pca_dim_10"]

train_df, eval_df = random_split(ratings_df, test_size=0.2)

train_data, data_info = DatasetFeat.build_trainset(
    train_df, user_col, item_col, sparse_col, dense_col
)

eval_data = DatasetFeat.build_evalset(eval_df)

model = RNN4Rec(
    task="ranking",
    rnn_type="lstm",
    data_info=data_info,
    loss_type="focal",
    embed_size=16,
    n_epochs=2,
    lr=3e-3,
    lr_decay=False,
    reg=None,
    batch_size=64,
    sampler="popular",
    num_neg=1,
    hidden_units=(110, 32),
    recent_num=10,
    tf_sess_config=None,
)
model.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    shuffle=False,
    eval_data=eval_data,
    metrics=["ndcg", "precision"],
)

2024-08-11 19:18:40.404611: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Instructions for updating:
non-resource variables are not supported in the long term


Training start time: 2024-08-11 19:18:49
total params: 995,276 | embedding params: 879,788 | network params: 115,488


2024-08-11 19:18:51.746971: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
train: 100%|██████████| 257/257 [00:08<00:00, 31.07it/s]


Epoch 1 elapsed: 8.304s
	 train_loss: 0.0866


eval_listwise: 100%|██████████| 243/243 [00:00<00:00, 1769.29it/s]


	 eval ndcg@10: 0.0018
	 eval precision@10: 0.0004


train: 100%|██████████| 257/257 [00:06<00:00, 38.72it/s]


Epoch 2 elapsed: 6.642s
	 train_loss: 0.0846


eval_listwise: 100%|██████████| 243/243 [00:00<00:00, 2888.49it/s]

	 eval ndcg@10: 0.0082
	 eval precision@10: 0.0016
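
The `recent_num=10` argument in the cell above limits the behavior sequence to each user's most recent items. As a rough illustration of that idea, and not libreco's internal implementation, the sketch below keeps the last `recent_num` items per user, assuming the input order is the interaction order. The function name `recent_item_sequences` is hypothetical.

```python
from collections import defaultdict, deque


def recent_item_sequences(interactions, recent_num=10):
    """Map each user to a list of their last `recent_num` items, oldest first.

    `interactions` is an iterable of (user, item) pairs in interaction order.
    """
    # A bounded deque silently drops the oldest item once it is full.
    history = defaultdict(lambda: deque(maxlen=recent_num))
    for user, item in interactions:
        history[user].append(item)
    return {user: list(items) for user, items in history.items()}


interactions = [(1, item) for item in range(15)] + [(2, 100)]
sequences = recent_item_sequences(interactions, recent_num=10)
```

With implicit ordering, as in this notebook, the "most recent" items are simply the last rows seen for each user, which is one reason why random timestamps cannot be expected to improve the model.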





In [4]:
from libreco.evaluation import evaluate

eval_result = evaluate(model, eval_data, neg_sampling=True, metrics=["ndcg", "precision", "recall"])
print(f"Evaluation Results:\n{eval_result}")

eval_listwise: 100%|██████████| 243/243 [00:00<00:00, 2327.08it/s]

Evaluation Results:
{'ndcg': 0.00815473617915307, 'precision': 0.001646090534979424, 'recall': 0.008916323731138546}
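
To make the reported metrics easier to interpret, here is a minimal sketch of precision@k, recall@k and NDCG@k for a single user with binary relevance. These are the common textbook definitions; libreco's exact implementation (for example, how it averages over users) may differ, and the function names are chosen for illustration only.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k slots filled with relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["a", "b", "c", "d"]
relevant = {"b", "x"}
p = precision_at_k(recommended, relevant, k=4)
r = recall_at_k(recommended, relevant, k=4)
n = ndcg_at_k(recommended, relevant, k=4)
```

Under these definitions, a precision@10 near 0.0016 means that relevant items almost never appear among the top ten recommendations.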



