
State-of-The-Art Rating-based RECOmmendation system: pytorch lightning implementation

KyleOng/starreco


starreco

starreco stands for State-of-The-Art Rating-based RECOmmendation system.

starreco is a PyTorch Lightning implementation of a series of state-of-the-art (SOTA) deep learning, rating-based recommendation systems. The repository also forms part of the literature review for the author's master's thesis.

Features

  • More than 20 recommendation models drawn from 20 publications.
  • Built on top of PyTorch Lightning.
  • GPU-accelerated execution.
  • Reduced memory usage for large sparse matrices.
  • Simple, understandable code.
  • Easy to extend and reuse.
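
To illustrate why sparse storage matters for rating data, here is a minimal sketch in plain Python. It is not the storage scheme used by starreco (the README does not describe it); the toy ratings and variable names are assumptions made for illustration only.

```python
# Toy ratings as (user, item, rating) triples; most user-item pairs are unobserved.
# These values are made up for illustration.
ratings = [(0, 1, 5.0), (0, 3, 3.0), (2, 0, 4.0), (3, 3, 1.0)]
num_users, num_items = 4, 5

# Dense layout: one cell per user-item pair, with 0.0 standing in for "no rating".
dense = [[0.0] * num_items for _ in range(num_users)]
for user, item, rating in ratings:
    dense[user][item] = rating

# Sparse layout: keep only the observed entries, keyed by (user, item).
sparse = {(user, item): rating for user, item, rating in ratings}

dense_cells = num_users * num_items   # grows with users x items
sparse_cells = len(sparse)            # grows with the number of ratings only
density = sparse_cells / dense_cells
print(f"dense cells: {dense_cells}, stored entries: {sparse_cells}, density: {density:.0%}")
```

The dense layout scales with the product of users and items, while the sparse layout scales with the number of observed ratings, which is usually far smaller.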

See Getting Started below to begin.

Research Models

| Research model | Description | Reference |
| --- | --- | --- |
| MF | Matrix Factorization | [1] |
| GMF | Generalized Matrix Factorization | [2] |
| MLP | Multilayer Perceptrons | [2] |
| NeuMF | Neural Matrix Factorization | [2] |
| FM | Factorization Machine | [3] |
| NeuFM | Neural Factorization Machine | [4] |
| WDL | Wide & Deep Learning | [5] |
| DeepFM | Deep Factorization Machine | [6] |
| xDeepFM | Extreme Deep Factorization Machine | [7] |
| FGCNN | Feature Generation by using Convolutional Neural Network | [8] |
| ONCF | Outer Product-based Neural Collaborative Filtering | [9] |
| CNNDCF | Convolutional Neural Network based Deep Collaborative Filtering | [10] |
| ConvMF | Convolutional Matrix Factorization | [11] |
| AutoRec | AutoRec | [12] |
| DeepRec | DeepRec | [13] |
| CFN | Collaborative Filtering Network | [14] |
| CDAE | Collaborative Denoising AutoEncoder | [15] |
| CCAE | Collaborative Convolutional AutoEncoder | [16] |
| SDAECF | Stacked Denoising AutoEncoder for Collaborative Filtering | [17] |
| mDACF | marginalized Denoising AutoEncoder Collaborative Filtering | [18] |
| GMF++ | Generalized Matrix Factorization ++ | [19] |
| MLP++ | Multilayer Perceptrons ++ | [19] |
| NeuMF++ | Neural Matrix Factorization ++ | [20] |
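
As a rough illustration of the simplest model in the table, the sketch below trains a tiny matrix factorization (MF) model with stochastic gradient descent in plain Python. It predicts a rating as the dot product of a user vector and an item vector. This is not starreco's implementation: the function name `train_mf`, the toy ratings, and all hyperparameter values are assumptions made for this example.

```python
import random

def train_mf(ratings, num_users, num_items, dim=2, lr=0.05,
             weight_decay=0.01, epochs=200, seed=0):
    """Fit user/item latent vectors so that dot(P[u], Q[i]) approximates r."""
    rng = random.Random(seed)
    # Small positive initial values; one latent vector per user and per item.
    P = [[rng.uniform(0.0, 0.1) for _ in range(dim)] for _ in range(num_users)]
    Q = [[rng.uniform(0.0, 0.1) for _ in range(dim)] for _ in range(num_items)]
    history = []  # mean squared error measured during each epoch
    for _ in range(epochs):
        squared_error = 0.0
        for u, i, r in ratings:
            pred = sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            squared_error += err * err
            for k in range(dim):
                p_uk, q_ik = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * q_ik - weight_decay * p_uk)
                Q[i][k] += lr * (err * p_uk - weight_decay * q_ik)
        history.append(squared_error / len(ratings))
    return P, Q, history

# Hypothetical (user, item, rating) triples for three users and three items.
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q, history = train_mf(toy_ratings, num_users=3, num_items=3)
print(f"MSE in first epoch: {history[0]:.3f}, in last epoch: {history[-1]:.3f}")
```

The neural models in the table (GMF, MLP, NeuMF and others) replace or augment this dot product with learned layers, as described in their respective references.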

Datasets

  • MovieLens Dataset: A movie rating dataset collected from the MovieLens website by the GroupLens Research Project at the University of Minnesota. The datasets were collected over various periods of time, depending on their size. The MovieLens 1M dataset has been chosen. It contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000.

  • Book-Crossing Dataset: The Book-Crossing (BX) dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) of the Book-Crossing community, with kind permission from Ron Hornbaker, CTO of Humankind Systems. It contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.
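
For orientation, the MovieLens 1M `ratings.dat` file stores one rating per line as `UserID::MovieID::Rating::Timestamp`. The sketch below parses a few lines in that format and works out how sparse the rating matrix is from the figures quoted above. It is independent of starreco's own data loading, which may work differently; the `Rating` type, the `parse_rating_line` helper, and the sample lines are illustrative assumptions.

```python
from typing import NamedTuple

class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: float
    timestamp: int

def parse_rating_line(line: str) -> Rating:
    """Parse one 'UserID::MovieID::Rating::Timestamp' line."""
    user_id, item_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(item_id), float(rating), int(timestamp))

# Illustrative sample lines in the ratings.dat format.
sample = """1::1193::5::978300760
1::661::3::978302109
2::1357::5::978298709"""

parsed = [parse_rating_line(line) for line in sample.splitlines()]
print(parsed[0])

# Density of the full rating matrix, using the counts quoted in this README
# (the movie count is approximate, so the result is approximate too).
num_ratings, num_users, num_movies = 1_000_209, 6_040, 3_900
density = num_ratings / (num_users * num_movies)
print(f"approximate density: {density:.2%}")
```

Only about 4% of the possible user-movie pairs carry a rating, which is why sparse storage pays off on this dataset.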

Getting Started

Installation

Create virtual environment

python3 -m virtualenv env # Python 3.6 and above

Activate virtual environment

source env/bin/activate # Linux
env\Scripts\activate # Windows

Clone and install necessary python packages

git clone https://github.com/KyleOng/star-reco
cd star-reco
pip install -r requirements.txt

Example

import os

import torch
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_lightning.callbacks import ModelCheckpoint

from starreco.modules import *
from starreco.data import *
    
# data module
data_module = StarDataModule("ml-1m")
data_module.setup()
    
# module
module = MF([data_module.dataset.rating.num_users, data_module.dataset.rating.num_items],
            lr = 0.007629571188584098,
            weight_decay = 1.0643056040513936e-05)

# setup
# checkpoint callback
current_version = max(0, len(list(os.walk("checkpoints/mf")))-1)
checkpoint_callback = ModelCheckpoint(dirpath = f"checkpoints/mf/version_{current_version}",
                                      monitor = "val_loss",
                                      filename = "mf-{epoch:02d}-{train_loss:.4f}-{val_loss:.4f}")
# logger
logger = TensorBoardLogger("training_logs", name = "mf")
# trainer
trainer = Trainer(logger = logger,
                  gpus = -1 if torch.cuda.is_available() else None, 
                  max_epochs = 100, 
                  progress_bar_refresh_rate = 2,
                  callbacks=[checkpoint_callback])
trainer.fit(module, data_module)

# evaluate
module_test = MF.load_from_checkpoint(checkpoint_callback.best_model_path)
trainer.test(module_test, datamodule = data_module)
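
The `current_version` expression in the example above can be puzzling at first sight. The sketch below reproduces it in a temporary directory to show how it behaves: `os.walk` yields one entry per directory it visits, including the root, so subtracting one gives the number of directories beneath the root. The helper name `next_version` and the directory names are assumptions made for this illustration.

```python
import os
import tempfile

def next_version(root: str) -> int:
    # Same expression as in the example: count every directory os.walk visits
    # (the root included), subtract one for the root, and never go below zero.
    return max(0, len(list(os.walk(root))) - 1)

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "checkpoints", "mf")

    # The root does not exist yet: os.walk yields nothing, so the result is 0.
    missing = next_version(root)

    os.makedirs(os.path.join(root, "version_0"))
    after_one = next_version(root)

    os.makedirs(os.path.join(root, "version_1"))
    after_two = next_version(root)

    # Caveat: os.walk is recursive, so nested directories are counted too.
    os.makedirs(os.path.join(root, "version_1", "extra"))
    with_nested = next_version(root)

print(missing, after_one, after_two, with_nested)
```

Because the count is recursive, the numbering only matches the number of previous runs when the version directories contain no subdirectories of their own.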

References

[1] Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30-37.

[2] He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T. S. (2017, April). Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web (pp. 173-182).

[3] Rendle, S. (2010, December). Factorization machines. In 2010 IEEE International Conference on Data Mining (pp. 995-1000). IEEE.

[4] He, X., & Chua, T. S. (2017, August). Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR conference on Research and Development in Information Retrieval (pp. 355-364).

[5] Cheng, H. T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., ... & Shah, H. (2016, September). Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems (pp. 7-10).

[6] Guo, H., Tang, R., Ye, Y., Li, Z., & He, X. (2017). DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247.

[7] Lian, J., Zhou, X., Zhang, F., Chen, Z., Xie, X., & Sun, G. (2018, July). xdeepfm: Combining explicit and implicit feature interactions for recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1754-1763).

[8] Liu, B., Tang, R., Chen, Y., Yu, J., Guo, H., & Zhang, Y. (2019, May). Feature generation by convolutional neural network for click-through rate prediction. In The World Wide Web Conference (pp. 1119-1129).

[9] He, X., Du, X., Wang, X., Tian, F., Tang, J., & Chua, T. S. (2018). Outer product-based neural collaborative filtering. arXiv preprint arXiv:1808.03912.

[10] Wu, Y., Wei, J., Yin, J., Liu, X., & Zhang, J. (2020). Deep Collaborative Filtering Based on Outer Product. IEEE Access, 8, 85567-85574.

[11] Kim, D., Park, C., Oh, J., Lee, S., & Yu, H. (2016, September). Convolutional matrix factorization for document context-aware recommendation. In Proceedings of the 10th ACM conference on recommender systems (pp. 233-240).

[12] Sedhain, S., Menon, A. K., Sanner, S., & Xie, L. (2015, May). Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th international conference on World Wide Web (pp. 111-112).

[13] Kuchaiev, O., & Ginsburg, B. (2017). Training deep autoencoders for collaborative filtering. arXiv preprint arXiv:1708.01715.

[14] Strub, F., Mary, J., & Gaudel, R. (2016). Hybrid collaborative filtering with autoencoders. arXiv preprint arXiv:1603.00806.

[15] Wu, Y., DuBois, C., Zheng, A. X., & Ester, M. (2016, February). Collaborative denoising auto-encoders for top-n recommender systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (pp. 153-162).

[16] Zhang, S. Z., Li, P. H., & Chen, X. N. (2019, December). Collaborative Convolution AutoEncoder for Recommendation Systems. In Proceedings of the 2019 8th International Conference on Networks, Communication and Computing (pp. 202-207).

[17] Strub, F., & Mary, J. (2015, December). Collaborative filtering with stacked denoising autoencoders and sparse inputs. In NIPS workshop on machine learning for eCommerce.

[18] Li, S., Kawale, J., & Fu, Y. (2015, October). Deep collaborative filtering via marginalized denoising auto-encoder. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (pp. 811-820).

[19] Liu, Y., Wang, S., Khan, M. S., & He, J. (2018). A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering. Big Data Mining and Analytics, 1(3), 211-221.

[20] To be published.
