# Fetching the data

Here we clone the LibRecommender repository from GitHub. The sample data used below ships with it.

In [7]:
!git clone https://github.com/massquantity/LibRecommender.git

Cloning into 'LibRecommender'...
remote: Enumerating objects: 9346, done.
remote: Counting objects: 100% (9346/9346), done.
remote: Compressing objects: 100% (2618/2618), done.
remote: Total 9346 (delta 6401), reused 9270 (delta 6357), pack-reused 0
Receiving objects: 100% (9346/9346), 11.30 MiB | 12.00 MiB/s, done.
Resolving deltas: 100% (6401/6401), done.


In [5]:
cd LibRecommender

/content/LibRecommender/LibRecommender


In [6]:
pwd

'/content/LibRecommender/LibRecommender'

## Installation of requirements

In [7]:
!pip install .

Processing /content/LibRecommender/LibRecommender
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: LibRecommender
  Building wheel for LibRecommender (pyproject.toml) ... done
  Created wheel for LibRecommender: filename=LibRecommender-1.4.0-cp310-cp310-linux_x86_64.whl size=2095039 sha256=24dd63096d45d8a64d426f6c4706125ffc9a830615a6ce426e5930ada47b8091
  Stored in directory: /tmp/pip-ephem-wheel-cache-36ytfc5e/wheels/32/32/7b/217d1af97d891e0f50b19e88caf6967c887e8e6c6762752c5b
Successfully built LibRecommender
Installing collected packages: LibRecommender
Successfully installed LibRecommender-1.4.0


In [8]:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import LightGCN  # pure data, algorithm LightGCN
from libreco.evaluation import evaluate

Instructions for updating:
non-resource variables are not supported in the long term


## Loading the sample data

In [9]:
data = pd.read_csv("examples/sample_data/sample_movielens_rating.dat", sep="::",
                   names=["user", "item", "label", "time"],
                   engine="python")  # a multi-character separator needs the Python parser


In [10]:
data.head()

|   | user | item | label | time |
|---|------|------|-------|------|
| 0 | 5488 | 488 | 3 | 959821507 |
| 1 | 5698 | 1682 | 4 | 958593778 |
| 2 | 3116 | 260 | 5 | 969465180 |
| 3 | 1425 | 1354 | 5 | 1024172237 |
| 4 | 4808 | 540 | 2 | 962954668 |
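
Each line of the `.dat` file has the form `user::item::label::time`. As a minimal sketch of what that format means (standard library only, and not how pandas parses it internally), the first rows shown above can be split by hand. The `Rating` type and the `parse_rating_line` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user: int
    item: int
    label: int
    time: int


def parse_rating_line(line: str) -> Rating:
    """Split one 'user::item::label::time' line into typed fields."""
    user, item, label, time = line.strip().split("::")
    return Rating(int(user), int(item), int(label), int(time))


# The first rows from data.head(), written back in the raw file format.
raw_lines = [
    "5488::488::3::959821507",
    "5698::1682::4::958593778",
    "3116::260::5::969465180",
]
ratings = [parse_rating_line(line) for line in raw_lines]
print(ratings[0])
```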


## Data check

Check how many ratings each user has made and how many ratings each item has received.

In [26]:
data.shape

(100000, 4)

In [27]:
# Checking users and how many ratings they have made
data['user'].value_counts()

4169    233
4277    178
1680    165
889     154
1181    148
       ... 
3459      1
4197      1
5635      1
4992      1
3802      1
Name: user, Length: 5958, dtype: int64

In [29]:
data['item'].value_counts()

2858    372
260     296
2028    293
1196    293
1210    273
       ... 
966       1
2175      1
2674      1
632       1
3601      1
Name: item, Length: 3312, dtype: int64
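
`value_counts()` tallies how often each ID occurs and sorts the result by frequency. A rough standard-library equivalent is sketched here on a small made-up list of user IDs (the list is illustrative and not taken from the dataset). It also shows how to pick out the users with a single rating, which form the long tail seen above.

```python
from collections import Counter

# Made-up user column: one entry per rating.
user_ids = [4169, 4277, 4169, 889, 4169, 4277]

counts = Counter(user_ids)

# (user, n_ratings) pairs, most active users first.
ranked = counts.most_common()

# Users who appear exactly once.
single_rating_users = [user for user, n in counts.items() if n == 1]

print(ranked)
print(single_rating_users)
```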

## Data split and Train

Here we first split the data into training, evaluation, and test sets, and then train the model.

In [11]:
train_data, eval_data, test_data = random_split(data, multi_ratios=[0.8, 0.1, 0.1])
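
To make the 80/10/10 split concrete, here is a minimal sketch of a ratio-based random split over row indices. It is not LibRecommender's implementation, and the library may apply extra steps such as filtering users or items that are absent from the training part. The `split_indices` helper and its `seed` parameter are hypothetical.

```python
import random


def split_indices(n_rows, ratios, seed=42):
    """Shuffle row indices and cut them into consecutive chunks by ratio."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    splits, start = [], 0
    for i, ratio in enumerate(ratios):
        # The last chunk takes the remainder so every row is used exactly once.
        is_last = i == len(ratios) - 1
        end = n_rows if is_last else start + int(round(ratio * n_rows))
        splits.append(indices[start:end])
        start = end
    return splits


train_idx, eval_idx, test_idx = split_indices(100, [0.8, 0.1, 0.1])
print(len(train_idx), len(eval_idx), len(test_idx))
```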

In [12]:
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)
print(data_info)  # n_users: 5894, n_items: 3253, data density: 0.4172 %

n_users: 5894, n_items: 3253, data density: 0.4172 %
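
The density figure is the share of all possible user-item pairs that actually have a rating. The short check below assumes the training set holds 80,000 interactions (80% of the 100,000 rows) with no duplicate user-item pairs. That count is an assumption and is not printed anywhere above.

```python
n_users = 5894
n_items = 3253
n_interactions = 80_000  # assumption: 0.8 * 100,000 rows, no duplicate pairs

# Observed interactions divided by the size of the full user-item matrix.
density_pct = 100 * n_interactions / (n_users * n_items)
print(f"data density: {density_pct:.4f} %")
```

In other words, roughly 99.6% of the user-item matrix is empty, which is typical for recommendation data.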


In [13]:
lightgcn = LightGCN(
    task="ranking",
    data_info=data_info,
    loss_type="bpr",
    embed_size=16,
    n_epochs=3,
    lr=1e-3,
    batch_size=2048,
    num_neg=1,
    device="cuda",
)

In [15]:
lightgcn.fit(
    train_data,
    neg_sampling=True,
    verbose=2,
    eval_data=eval_data,
    metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
)

Training start time: 2024-02-13 14:15:58


train: 100%|██████████| 40/40 [00:01<00:00, 21.52it/s]


Epoch 1 elapsed: 1.865s
	 train_loss: 0.6926


eval_pointwise: 100%|██████████| 3/3 [00:00<00:00, 245.55it/s]
eval_listwise: 100%|██████████| 1823/1823 [00:00<00:00, 2463.61it/s]


	 eval log_loss: 0.6931
	 eval roc_auc: 0.5293
	 eval precision@10: 0.0012
	 eval recall@10: 0.0050
	 eval ndcg@10: 0.0054


train: 100%|██████████| 40/40 [00:02<00:00, 15.07it/s]


Epoch 2 elapsed: 2.667s
	 train_loss: 0.6916


eval_pointwise: 100%|██████████| 3/3 [00:00<00:00, 437.10it/s]
eval_listwise: 100%|██████████| 1823/1823 [00:00<00:00, 2698.93it/s]


	 eval log_loss: 0.6921
	 eval roc_auc: 0.6767
	 eval precision@10: 0.0056
	 eval recall@10: 0.0238
	 eval ndcg@10: 0.0255


train: 100%|██████████| 40/40 [00:01<00:00, 23.46it/s]


Epoch 3 elapsed: 1.715s
	 train_loss: 0.6858


eval_pointwise: 100%|██████████| 3/3 [00:00<00:00, 358.22it/s]
eval_listwise: 100%|██████████| 1823/1823 [00:00<00:00, 2715.34it/s]


	 eval log_loss: 0.6867
	 eval roc_auc: 0.7587
	 eval precision@10: 0.0083
	 eval recall@10: 0.0382
	 eval ndcg@10: 0.0384
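
The training loss starts at 0.6926, very close to ln 2 ≈ 0.6931. That is the value the pairwise BPR loss takes when a positive and a negative item receive the same score, so after only three epochs the model has barely begun to separate them. The sketch below uses the standard BPR formulation, -log(sigmoid(pos - neg)), which is assumed here to correspond to `loss_type="bpr"`. The `bpr_loss` helper is hypothetical.

```python
import math


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """BPR loss for one (positive, negative) pair: -log(sigmoid(pos - neg))."""
    diff = pos_score - neg_score
    # log(1 + exp(-diff)) is algebraically equal to -log(sigmoid(diff)).
    return math.log1p(math.exp(-diff))


untrained = bpr_loss(0.0, 0.0)  # equal scores: the model cannot tell them apart
separated = bpr_loss(2.0, 0.0)  # positive item scored clearly higher

print(round(untrained, 4), round(separated, 4))
```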


## Evaluation

In [16]:
evaluate(
    model=lightgcn,
    data=test_data,
    neg_sampling=True,
    metrics=["loss", "roc_auc", "precision", "recall", "ndcg"],
)

eval_pointwise: 100%|██████████| 3/3 [00:00<00:00, 340.26it/s]
eval_listwise: 100%|██████████| 1846/1846 [00:00<00:00, 2536.51it/s]


{'loss': 0.6867100704501925,
 'roc_auc': 0.7587469661146322,
 'precision': 0.008098591549295776,
 'recall': 0.03754641802918924,
 'ndcg': 0.03670004789053784}
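
For readers unfamiliar with the ranking metrics reported here, the following is a minimal sketch of precision@k, recall@k, and NDCG@k for a single user with binary relevance. These are common textbook definitions. LibRecommender's exact definitions and averaging may differ, and the `relevant` set below is made up.

```python
import math


def precision_recall_ndcg_at_k(recommended, relevant, k):
    """Compute precision@k, recall@k and NDCG@k for one user."""
    top_k = recommended[:k]
    hits = [1 if item in relevant else 0 for item in top_k]

    precision = sum(hits) / k
    recall = sum(hits) / len(relevant) if relevant else 0.0

    # DCG discounts each hit by log2 of its 1-based rank plus one.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    # Ideal DCG: all relevant items placed at the top of the list.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg


recommended = [1196, 2858, 260, 2997, 608, 3578, 1210]
relevant = {260, 1210, 50}  # hypothetical held-out items for this user

p, r, n = precision_recall_ndcg_at_k(recommended, relevant, k=5)
print(p, r, n)
```

With k=5, only item 260 is a hit, so precision is 1/5 and recall is 1/3. NDCG is lower still because the hit sits at rank 3 rather than rank 1.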

## Trying out the model

Here we try out the model in four ways:

- Check whether item 110 is a good recommendation for user 2211
- Recommend 7 items to user 2211
- Make a cold-start prediction
- Make a cold-start recommendation

In [17]:
lightgcn.predict(user=2211, item=110)

array([0.53209186], dtype=float32)

In [18]:
lightgcn.recommend_user(user=2211, n_rec=7)

{2211: array([1196, 2858,  260, 2997,  608, 3578, 1210])}

In [19]:
lightgcn.predict(user="ccc", item="not item", cold_start="average")

Detect 1 unknown interaction(s), position: [0]


array([0.5050815], dtype=float32)

In [20]:
lightgcn.recommend_user(user="are we good?", n_rec=7, cold_start="popular")

Detect unknown user: are we good?


{'are we good?': array([ 593, 1641, 1527,  919,  919, 2174,  593])}
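
The idea behind a popularity-based cold-start fallback can be sketched as follows: when a user is unknown, return the most frequently rated items instead of personalised ones. This is an illustration under stated assumptions, not LibRecommender's implementation. The repeated IDs in the output above suggest the library samples from popular items, whereas this sketch takes a deterministic top-n. The function name, the `personalised` lookup, and the toy interactions are all hypothetical.

```python
from collections import Counter


def recommend_with_fallback(user, personalised, interactions, n_rec):
    """Return stored recommendations for known users, popular items otherwise."""
    if user in personalised:
        return personalised[user][:n_rec]

    # Unknown user: rank items by how many interactions they received.
    item_counts = Counter(item for _, item in interactions)
    return [item for item, _ in item_counts.most_common(n_rec)]


# Toy (user, item) interactions and precomputed recommendations.
interactions = [(1, 2858), (2, 2858), (3, 260), (1, 260), (2, 1196), (3, 2858)]
personalised = {2211: [1196, 2858, 260]}

known = recommend_with_fallback(2211, personalised, interactions, n_rec=2)
unknown = recommend_with_fallback("are we good?", personalised, interactions, n_rec=2)
print(known, unknown)
```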