# User-to-user collaborative filtering

The user-to-user collaborative filtering approach bases recommendations on similarities between users, measured by how they rated the items. It works like item-to-item filtering, except that the roles of the matrix rows and columns are swapped: similarities are computed between users instead of between items.\
A user-item interaction matrix is built, and cosine similarities between the users are computed from it.
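The similarity step can be illustrated with a minimal NumPy sketch on a hypothetical 4×4 ratings matrix (users as rows, items as columns, 0 meaning "not rated"). This is not LibRecommender's implementation, only the underlying idea. Passing the transposed matrix `R.T` to the same function would give item-to-item similarities instead.

```python
import numpy as np

# Hypothetical toy interaction matrix: rows are users, columns are items,
# entries are ratings (0 = not rated).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def user_cosine_similarity(ratings: np.ndarray) -> np.ndarray:
    """Return the user-by-user cosine similarity matrix."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no ratings
    normalized = ratings / norms  # every user row now has unit length
    return normalized @ normalized.T  # dot products of unit vectors = cosines

sim = user_cosine_similarity(R)
print(np.round(sim, 3))
```

Users 0 and 1 rated the same items with similar values, so their similarity is high, while users 0 and 3 barely overlap.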

In [1]:
import numpy as np
import pandas as pd

**Preprocessing**\
For this task, we only need the ratings table.

In [2]:
ratings_df = pd.read_csv('../data/Ratings.csv', delimiter=';', dtype={'User-ID': np.int32, 'ISBN': str, 'Rating': np.int8})
ratings_df.columns = ['user', 'item', 'label']
ratings_df.head()

|   | user | item | label |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |


Then we drop the rows with:
- missing values (every column is required)
- zero ratings (they carry no explicit rating information)

In [3]:
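print('Total ratings:', ratings_df.shape[0])

# Keep only non-zero ratings
ratings_df = ratings_df[ratings_df['label'] > 0]
# dropna() returns a new DataFrame, so the result must be assigned back
ratings_df = ratings_df.dropna()
ratings_df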

Total ratings: 1149780


|   | user | item | label |
|---|---|---|---|
| 1 | 276726 | 0155061224 | 5 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |
| 6 | 276736 | 3257224281 | 8 |
| 7 | 276737 | 0600570967 | 6 |
| ... | ... | ... | ... |
| 1149773 | 276704 | 0806917695 | 5 |
| 1149775 | 276704 | 1563526298 | 9 |
| 1149777 | 276709 | 0515107662 | 10 |
| 1149778 | 276721 | 0590442449 | 10 |


After experimenting with a straightforward sparse-matrix implementation (the UserRecommender class), we switched to LibRecommender, which performs far better thanks to a number of optimizations.\
The steps performed below are:
- splitting the data into train and eval sets
- converting the data into LibRecommender's input format
- creating and training a UserCF model that uses the 5 nearest neighbors (`k_sim=5`)
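LibRecommender's `UserCF` performs the neighbor search internally and its exact scoring formula is not shown in this notebook. As a rough illustration of what `k_sim` controls, here is a from-scratch NumPy sketch of one common variant (an assumption, not the library's code): a user's unrated items are scored by the similarity-weighted sum of the ratings given by its `k_sim` most similar users. The function name `recommend_for_user` and the toy matrix are hypothetical.

```python
import numpy as np

def recommend_for_user(ratings, user, k_sim=5, n_rec=3):
    """Rank items `user` has not rated, using its k_sim most similar users."""
    # Cosine similarity between `user` and every user (rows scaled to unit length)
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # users without ratings keep a zero vector
    normalized = ratings / norms
    sims = normalized @ normalized[user]
    sims[user] = -np.inf  # a user is not its own neighbor

    # Keep the k_sim most similar users (never more than the other users available)
    n_neighbors = min(k_sim, ratings.shape[0] - 1)
    neighbors = np.argsort(sims)[::-1][:n_neighbors]

    # Score = similarity-weighted sum of the neighbors' ratings for each item
    scores = sims[neighbors] @ ratings[neighbors]
    scores[ratings[user] > 0] = -np.inf  # skip items the user already rated

    ranked = np.argsort(scores)[::-1][:n_rec]
    # Only return items rated by at least one neighbor with positive similarity
    return [int(i) for i in ranked if scores[i] > 0]

# Hypothetical toy matrix: rows are users, columns are items, 0 = not rated
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

print(recommend_for_user(R, user=1, k_sim=2))  # → [1]
print(recommend_for_user(R, user=1, k_sim=3))  # → [1, 2]
```

With `k_sim=2`, item 2 gets no score because neither of the two closest users rated it; widening the neighborhood to 3 users brings it in. This is the trade-off the `k_sim` setting controls.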

In [4]:
from libreco.algorithms import UserCF
from libreco.data import DatasetPure
from libreco.data import random_split

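# Keep the columns LibRecommender expects, in order: user, item, label
ratings_df = ratings_df[["user", "item", "label"]]
# Hold out 20% of the ratings for evaluation
train_df, eval_df = random_split(ratings_df, test_size=0.2)

# Convert the DataFrames into LibRecommender's dataset format
train_data, data_info = DatasetPure.build_trainset(train_df)
eval_data = DatasetPure.build_evalset(eval_df)

# Build and train the model, using the 5 most similar users (k_sim=5)
model = UserCF(task="ranking",
               data_info=data_info,
               k_sim=5,
               sim_type="cosine",
               min_common=1)

model.fit(train_data, neg_sampling=True)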


Training start time: 2024-08-11 20:54:29
Final block size and num: (2847, 24)
sim_matrix elapsed: 16.725s
sim_matrix, shape: (68312, 68312), num_elements: 5471892, density: 0.1173 %


top_k: 100%|██████████| 68312/68312 [00:01<00:00, 42068.13it/s]


We evaluate the model with three ranking metrics: precision, recall and NDCG (Normalized Discounted Cumulative Gain).
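For reference, here is a minimal sketch of the three metrics for a single user. It assumes binary relevance (an item counts as relevant if it appears in the user's held-out ratings) and a cutoff `k`. LibRecommender's own implementation may differ in details, and a dataset-level score is typically the average over evaluated users. The function name is hypothetical.

```python
import numpy as np

def precision_recall_ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance precision@k, recall@k and NDCG@k for a single user."""
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = [1.0 if item in relevant else 0.0 for item in top_k]

    precision = sum(hits) / k
    recall = sum(hits) / len(relevant) if relevant else 0.0

    # DCG: each hit is discounted by log2(rank + 1), with ranks starting at 1
    dcg = sum(h / np.log2(pos + 2) for pos, h in enumerate(hits))
    # Ideal DCG: all relevant items (up to k) placed at the top of the list
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return float(precision), float(recall), float(ndcg)

p, r, n = precision_recall_ndcg_at_k(["a", "b", "c", "d"], {"b", "e"}, k=4)
print(p, r, round(n, 4))  # → 0.25 0.5 0.3869
```

Precision divides by the list length `k`, recall by the number of relevant items. So with many candidate items and few held-out ratings per user, both values tend to be small, which is consistent with the low scores reported below.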

In [5]:
from libreco.evaluation import evaluate

eval_result = evaluate(model, eval_data, neg_sampling=True, metrics=["precision", "recall", "ndcg"])
print(f"Evaluation Results:\n{eval_result}")

no suitable recommendation for user 1722, return default recommendation
no suitable recommendation for user 3824, return default recommendation
no suitable recommendation for user 36445, return default recommendation
no suitable recommendation for user 34473, return default recommendation
no suitable recommendation for user 66407, return default recommendation
no suitable recommendation for user 17719, return default recommendation
no suitable recommendation for user 916, return default recommendation
no suitable recommendation for user 30015, return default recommendation
no suitable recommendation for user 21184, return default recommendation
no suitable recommendation for user 1826, return default recommendation


eval_listwise: 100%|██████████| 16407/16407 [00:14<00:00, 1095.85it/s]


Evaluation Results:
{'precision': 0.005582982873163893, 'recall': 0.020491124062358475, 'ndcg': 0.0834280508171489}


Some users received no personalized recommendations (the model fell back to default recommendations) because they had too few ratings. We therefore decided to exclude users with fewer than 5 ratings for this task.

**Removing users with <5 ratings**

In [9]:
# Number of ratings per user (a Series indexed by user ID)
rating_count = ratings_df["user"].value_counts()
# Users with fewer than 5 ratings are excluded
u_threshold = rating_count[rating_count < 5].index
ratings_df = ratings_df[~ratings_df["user"].isin(u_threshold)]

print('Total ratings (users rated <5 books excluded):', ratings_df.shape[0])

Total ratings (users rated <5 books excluded): 329336
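The same filter can also be written with `groupby(...).transform("size")`, which attaches each user's rating count to every row, so no separate list of excluded users is needed. A sketch on a hypothetical toy table with the same column names:

```python
import pandas as pd

# Hypothetical toy ratings table with the same columns as ratings_df
df = pd.DataFrame({
    "user": [1, 1, 1, 1, 1, 2, 2, 3],
    "item": ["a", "b", "c", "d", "e", "a", "b", "c"],
    "label": [5, 7, 8, 6, 9, 4, 10, 3],
})

MIN_RATINGS = 5
# Rating count of each row's user, aligned with the original rows
per_user = df.groupby("user")["item"].transform("size")
filtered = df[per_user >= MIN_RATINGS]
print(filtered)
```

Only user 1, who has 5 ratings, remains. Both versions select the same rows; the `transform` form is simply a single boolean mask.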


In [10]:
train_df, eval_df = random_split(ratings_df, test_size=0.2)

train_data, data_info = DatasetPure.build_trainset(train_df)
eval_data = DatasetPure.build_evalset(eval_df)

# Build and train the model; note that k_sim is raised from 5 to 10 in this run
model = UserCF(task="ranking", 
               data_info=data_info,
               k_sim=10, 
               sim_type="cosine", 
               min_common=1)

model.fit(train_data, neg_sampling=True)

Training start time: 2024-08-11 20:55:56
Final block size and num: (12019, 1)
sim_matrix elapsed: 0.854s
sim_matrix, shape: (12019, 12019), num_elements: 2366310, density: 1.6381 %


top_k: 100%|██████████| 12019/12019 [00:00<00:00, 20190.77it/s]


In [11]:
from libreco.evaluation import evaluate

eval_result = evaluate(model, eval_data, neg_sampling=True, metrics=["precision", "recall", "ndcg"])
print(f"Evaluation Results:\n{eval_result}")

eval_listwise: 100%|██████████| 9492/9492 [00:08<00:00, 1099.43it/s]


Evaluation Results:
{'precision': 0.01145174884112937, 'recall': 0.02694686251132423, 'ndcg': 0.05798763795606977}


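The precision is still low, but it roughly doubled compared with the run that included users with fewer than 5 ratings (0.0115 vs 0.0056), and recall also rose (0.027 vs 0.020). NDCG, however, dropped (0.058 vs 0.083). Since the second run also uses `k_sim=10` instead of 5 and a different train/eval split, the change cannot be attributed to the user filter alone.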
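The relative change between the two runs can be computed directly from the metric values printed by the two evaluation cells:

```python
# Metric values copied from the two evaluation outputs above
before = {"precision": 0.005582982873163893,
          "recall": 0.020491124062358475,
          "ndcg": 0.0834280508171489}
after = {"precision": 0.01145174884112937,
         "recall": 0.02694686251132423,
         "ndcg": 0.05798763795606977}

# Ratio > 1 means the metric improved after excluding users with <5 ratings
ratios = {name: after[name] / before[name] for name in before}
for name, ratio in ratios.items():
    print(f"{name}: x{ratio:.2f}")
```

Precision comes out at about 2×, recall at about 1.3×, and NDCG below 1.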