Authors' Note: Original code is from Microsoft Recommenders and was modified to meet the objectives of the project.


<i>Copyright (c) Recommenders contributors.</i>

<i>Licensed under the MIT License.</i>

Argyriou, A., González-Fierro, M., & Zhang, L. (2020). Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems. Companion Proceedings of the Web Conference 2020, 50–51. https://doi.org/10.1145/3366424.3382692

## Bayesian Personalized Ranking (BPR)

The model implementation comes from [Cornac](https://github.com/PreferredAI/cornac), a framework for recommender systems that focuses on models leveraging auxiliary data such as item text, item images, and social networks.

### Global Settings and Imports

In [1]:
import os
import sys
import cornac
import pandas as pd

from recommenders.datasets import movielens
from recommenders.datasets.python_splitters import python_random_split
from recommenders.evaluation.python_evaluation import map, ndcg_at_k, precision_at_k, recall_at_k
from recommenders.models.cornac.cornac_utils import predict_ranking
from recommenders.utils.timer import Timer
from recommenders.utils.constants import SEED
from recommenders.utils.notebook_utils import store_metadata

print(f"System version: {sys.version}")
print(f"Cornac version: {cornac.__version__}")


System version: 3.9.21 (main, Dec 11 2024, 16:35:24) [MSC v.1929 64 bit (AMD64)]
Cornac version: 2.3.0


In [None]:
# top k items to recommend
TOP_K = 10

# Model parameters
NUM_FACTORS = 200
NUM_EPOCHS = 100

### Load and split data

The data is randomly split into training and test sets at a 75/25 ratio.


In [242]:
own_data = pd.read_csv(r'P:\pCloud Offline\PLUS\2nd Sem\Privacy Engineering\Data\Results\own_k15_20_20.csv')
own_data = own_data.drop('timestamp', axis=1)
own_data.head()

Unnamed: 0,userID,itemID,rating
0,1,103365,1
1,1,120650,1
2,1,152587,1
3,1,165073,1
4,1,165887,1


In [243]:
train, test = python_random_split(own_data, 0.75)
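
To make the split concrete, here is a minimal standard-library sketch of a row-level random split at a given ratio. It is an illustration only, not the code behind `python_random_split`; the helper `random_split` and the toy interactions are hypothetical. One consequence of any row-level split is that a user is not guaranteed to appear in both sets.

```python
import random


def random_split(rows, ratio=0.75, seed=42):
    """Shuffle rows and cut them into (train, test) at the given ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


# Toy (userID, itemID, rating) interactions, invented for illustration.
interactions = [(u, i, 1) for u in range(1, 5) for i in range(100, 104)]
train_rows, test_rows = random_split(interactions, ratio=0.75)
print(len(train_rows), len(test_rows))  # 12 4
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing metrics between experiments.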

### Cornac Dataset

To work with models implemented in Cornac, we need to construct an object of the [Dataset](https://cornac.readthedocs.io/en/latest/data.html#module-cornac.data.dataset) class.

In [244]:
train_set = cornac.data.Dataset.from_uir(train.itertuples(index=False), seed=SEED)

print('Number of users: {}'.format(train_set.num_users))
print('Number of items: {}'.format(train_set.num_items))

Number of users: 3034
Number of items: 454
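
`Dataset.from_uir` consumes (user, item, rating) triples in that order, so the column order of `train` matters. The user and item counts printed above are the numbers of distinct IDs seen in those triples. The sketch below shows one way such an ID-to-index mapping could be built with plain Python; `build_index_maps` is a hypothetical helper for illustration, not Cornac's implementation, and the third triple is invented.

```python
def build_index_maps(triples):
    """Map raw user and item IDs to contiguous integer indices."""
    user_index, item_index = {}, {}
    for user_id, item_id, _rating in triples:
        # setdefault only assigns a new index the first time an ID is seen.
        user_index.setdefault(user_id, len(user_index))
        item_index.setdefault(item_id, len(item_index))
    return user_index, item_index


# First two triples mirror the data preview; the third is made up.
triples = [(1, 103365, 1), (1, 120650, 1), (2, 103365, 1)]
user_index, item_index = build_index_maps(triples)
print(len(user_index), len(item_index))  # 2 2
```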


### Train the BPR model

In [245]:
bpr = cornac.models.BPR(
    k=NUM_FACTORS,
    max_iter=NUM_EPOCHS,
    learning_rate=0.01,
    lambda_reg=0.001,
    verbose=True,
    seed=SEED
)

In [246]:
with Timer() as t:
    bpr.fit(train_set)
print("Took {} seconds for training.".format(t))

100%|██████████| 100/100 [00:04<00:00, 21.55it/s, correct=98.50%, skipped=8.11%]

Optimization finished!
Took 4.6698 seconds for training.
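
For intuition about what training optimizes: BPR (Rendle et al., 2009) learns user and item factor vectors so that, for a user u, an observed item i scores higher than an unobserved item j. It does so by maximizing ln σ(x_uij) with x_uij = p_u · (q_i − q_j), plus L2 regularization. The sketch below applies one stochastic gradient step for a single (u, i, j) triple. It is a simplified illustration under these assumptions, not Cornac's actual training code; the function name `bpr_sgd_step` is hypothetical, and the defaults simply reuse the `learning_rate` and `lambda_reg` values from this notebook.

```python
import math
import random


def bpr_sgd_step(p_u, q_i, q_j, learning_rate=0.01, lambda_reg=0.001):
    """One in-place SGD update for (user u, observed item i, unobserved item j).

    Returns the score difference x_uij computed before the update.
    """
    # x_uij = p_u . (q_i - q_j)
    x_uij = sum(pu * (qi - qj) for pu, qi, qj in zip(p_u, q_i, q_j))
    # d/dx ln(sigmoid(x)) = 1 - sigmoid(x) = 1 / (1 + exp(x))
    g = 1.0 / (1.0 + math.exp(x_uij))
    for f in range(len(p_u)):
        pu, qi, qj = p_u[f], q_i[f], q_j[f]
        p_u[f] += learning_rate * (g * (qi - qj) - lambda_reg * pu)
        q_i[f] += learning_rate * (g * pu - lambda_reg * qi)
        q_j[f] += learning_rate * (-g * pu - lambda_reg * qj)
    return x_uij


rng = random.Random(0)
k = 8  # number of latent factors (toy value; the notebook uses NUM_FACTORS)
p_u = [rng.gauss(0, 0.1) for _ in range(k)]
q_i = [rng.gauss(0, 0.1) for _ in range(k)]
q_j = [rng.gauss(0, 0.1) for _ in range(k)]

first = bpr_sgd_step(p_u, q_i, q_j)
for _ in range(500):
    last = bpr_sgd_step(p_u, q_i, q_j)
print(last > first)  # the observed item pulls ahead of the unobserved one
```

In a full training loop, the (u, i, j) triples are sampled repeatedly from the interaction data, with j drawn from items the user has not interacted with.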





### Prediction and Evaluation

Now that the model is trained, we can produce ranked recommendation lists.

Because BPR is designed for item ranking, we measure performance using ranking metrics only.

In [247]:
with Timer() as t:
    all_predictions = predict_ranking(bpr, train, usercol='userID', itemcol='itemID', remove_seen=True)
print("Took {} seconds for prediction.".format(t))

Took 1.7702 seconds for prediction.
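
With `remove_seen=True`, items a user already interacted with in the training data are excluded from that user's predictions. The sketch below illustrates the idea for one user with a factorization model: score each item by a dot product, skip seen items, and sort by score. The function `rank_unseen_items` and the toy vectors are hypothetical and only illustrate the concept; unlike this sketch, the cutoff at the top k items in this notebook happens later, in the evaluation functions.

```python
def rank_unseen_items(user_vec, item_vecs, seen_items, top_k=10):
    """Score items by dot product and return the top_k items not in seen_items."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
        if item_id not in seen_items
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy 2-factor item vectors, invented for illustration.
item_vecs = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0], "d": [0.9, 0.1]}
ranked = rank_unseen_items([1.0, 0.2], item_vecs, seen_items={"a"}, top_k=2)
print(ranked)  # ['d', 'b']
```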


In [None]:
k = TOP_K  # number of top-ranked recommendations to evaluate (not the k of k-anonymization)
eval_map = map(test, all_predictions, col_prediction='prediction', k=k)
eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=k)
eval_precision = precision_at_k(test, all_predictions, col_prediction='prediction', k=k)
eval_recall = recall_at_k(test, all_predictions, col_prediction='prediction', k=k)

print("MAP:\t%f" % eval_map,
      "NDCG:\t%f" % eval_ndcg,
      "Precision@K:\t%f" % eval_precision,
      "Recall@K:\t%f" % eval_recall, sep='\n')

MAP:	0.650337
NDCG:	0.757584
Precision@K:	0.541061
Recall@K:	0.746705
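
To help interpret these numbers, the sketch below computes precision@k, recall@k, and NDCG@k for a single user using common textbook definitions with binary relevance. It is an illustration under that assumption, not the Recommenders implementation, whose exact conventions (for example, how per-user values are averaged) may differ. The function name and toy data are hypothetical.

```python
import math


def precision_recall_ndcg_at_k(ranked_items, relevant_items, k=10):
    """Return (precision@k, recall@k, NDCG@k) for one user, binary relevance."""
    hits = [1 if item in relevant_items else 0 for item in ranked_items[:k]]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant_items)
    # DCG discounts each hit by the log of its 1-based rank plus one.
    dcg = sum(h / math.log2(pos + 2) for pos, h in enumerate(hits))
    # Ideal DCG: all relevant items placed at the top of the list.
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1 / math.log2(pos + 2) for pos in range(ideal_hits))
    return precision, recall, dcg / idcg


ranked_list = ["d", "b", "c"]   # model output, best first (toy data)
held_out = {"d", "c"}           # the user's test-set items (toy data)
precision, recall, ndcg = precision_recall_ndcg_at_k(ranked_list, held_out, k=2)
print(precision, recall, round(ndcg, 4))  # 0.5 0.5 0.6131
```

Here only one of the two relevant items makes the top 2, so precision and recall are both 0.5, while NDCG rewards the hit for sitting at rank 1.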


## References

1. Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2009, June). BPR: Bayesian personalized ranking from implicit feedback. https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf
2. Pan, R., Zhou, Y., Cao, B., Liu, N. N., Lukose, R., Scholz, M., & Yang, Q. (2008, December). One-class collaborative filtering. https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/04781145.pdf
3. **Cornac** - A Comparative Framework for Multimodal Recommender Systems. https://cornac.preferred.ai/