NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation

NewsRecLib is a library based on PyTorch Lightning and Hydra for the development and evaluation of neural news recommenders (NNR). The framework is highly configurable and modularized, decoupling core model components from one another. It enables running experiments from a single configuration file that drives the pipeline from dataset selection and loading to model evaluation. NewsRecLib provides implementations of several neural news recommenders, training methods, standard evaluation benchmarks, hyperparameter optimization algorithms, extensive logging functionalities, and evaluation metrics (ranging from accuracy-based to beyond-accuracy performance evaluation).

The foremost goals of NewsRecLib are to promote reproducible research and rigorous experimental evaluation.

[Figure: NewsRecLib schema]

Installation

NewsRecLib requires Python version 3.9 or later.

NewsRecLib requires PyTorch, PyTorch Lightning, and TorchMetrics version 2.0 or later. If you want to use NewsRecLib with a GPU, please ensure that CUDA or cudatoolkit version 11.8 is installed.

Install from source

CONDA

    git clone https://github.com/andreeaiana/newsreclib.git
    cd newsreclib
    conda create --name newsreclib_env python=3.9
    conda activate newsreclib_env
    pip install -e .
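
After installation, it can help to confirm that the environment matches the requirements stated above. The following is a minimal sketch using only the Python standard library; it is not part of NewsRecLib. The distribution names in `PACKAGES` are assumptions (for example, PyTorch Lightning may be installed as `lightning` or `pytorch-lightning`), so adjust them to your environment.

```python
import sys
from importlib import metadata

# Assumed distribution names; adjust them to match your environment.
PACKAGES = ("torch", "lightning", "torchmetrics")


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Return True when the interpreter meets the stated Python 3.9 minimum."""
    return tuple(version_info[:2]) >= minimum


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

The script only reports what it finds; comparing the reported versions against the requirements is left to the reader.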

Quick Start

NewsRecLib's entry point is the function train, which accepts a configuration file that drives the entire experiment.

Basic Configuration

The following example shows how to train an NRMS model on the MINDsmall dataset with the original configurations (i.e., news encoder contextualizing pretrained embeddings, model trained by optimizing cross-entropy loss), using an existing configuration file.

    python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent

In the basic experiment, the experiment configuration specifies only the required hyperparameter values that are not set in the configurations of the corresponding modules.

    defaults:
        - override /data: mind_rec_bert_sent.yaml
        - override /model: nrms.yaml
        - override /callbacks: default.yaml
        - override /logger: many_loggers.yaml
        - override /trainer: gpu.yaml
    data:
        dataset_size: "small"
    model:
        use_plm: False
        pretrained_embeddings_path: ${paths.data_dir}MINDsmall_train/transformed_word_embeddings.npy
        embed_dim: 300
        num_heads: 15
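
Conceptually, the experiment file is layered on top of the module configurations: keys set in the experiment take precedence, and everything else keeps its module default. The sketch below illustrates that layering with plain dictionaries. It is a simplified model, not Hydra's actual composition logic, and the contents of `model_defaults` (including `dropout_probability`) are hypothetical placeholders rather than values from the library's `nrms.yaml`.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which values from `override` take precedence over `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys replace whatever was there.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical module defaults; the real values live in the library's config files.
model_defaults = {"model": {"use_plm": True, "dropout_probability": 0.2}}

# Values taken from the experiment configuration shown above.
experiment = {"model": {"use_plm": False, "embed_dim": 300, "num_heads": 15}}

config = deep_merge(model_defaults, experiment)
print(config)
```

Here `use_plm` ends up as `False` because the experiment sets it, while the hypothetical `dropout_probability` default is carried over unchanged.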

For training the NRMS model on the MINDlarge dataset, execute the following command:

    python newsreclib/train.py experiment=nrms_mindlarge_pretrainedemb_celoss_bertsent

To see how configuration files change when moving from the smaller to the larger dataset, compare the nrms_mindsmall_pretrainedemb_celoss_bertsent and nrms_mindlarge_pretrainedemb_celoss_bertsent experiment configurations.

Note: The same procedure applies for the advanced configuration shown below.

Advanced Configuration

The advanced scenario depicts a more complex experimental setting. Users can override any of the predefined module configurations from the main experiment configuration file. The following code snippet shows how to train an NRMS model with a PLM-based news encoder and a supervised contrastive loss objective instead of the default settings.

    python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent

This is achieved by creating an experiment configuration file with the following specifications:

    defaults:
        - override /data: mind_rec_bert_sent.yaml
        - override /model: nrms.yaml
        - override /callbacks: default.yaml
        - override /logger: many_loggers.yaml
        - override /trainer: gpu.yaml
    data:
        dataset_size: "small"
        use_plm: True
        tokenizer_name: "roberta-base"
        tokenizer_use_fast: True
        tokenizer_max_len: 96
    model:
        loss: "sup_con_loss"
        temperature: 0.1
        use_plm: True
        plm_model: "roberta-base"
        frozen_layers: [0, 1, 2, 3, 4, 5, 6, 7]
        pretrained_embeddings_path: None
        embed_dim: 768
        num_heads: 16

Alternatively, configurations can be overridden from the command line, as follows:

    python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent data.batch_size=128
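
A command-line override such as `data.batch_size=128` addresses a nested key by its dotted path. The sketch below shows the idea on a plain dictionary. It is a simplified illustration, not Hydra's override parser (which supports a much richer grammar), and the starting `batch_size` of 64 is a hypothetical value.

```python
def apply_override(config, override):
    """Apply one `dotted.key=value` string to a nested dict, in place."""
    dotted_key, _, raw_value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Walk down the path, creating intermediate mappings when absent.
        node = node.setdefault(name, {})
    # Simplification: only integers are converted; other values stay strings.
    node[leaf] = int(raw_value) if raw_value.lstrip("-").isdigit() else raw_value
    return config


# Hypothetical starting configuration for illustration only.
config = {"data": {"dataset_size": "small", "batch_size": 64}}
apply_override(config, "data.batch_size=128")
print(config["data"]["batch_size"])
```

Only the addressed key changes; sibling keys such as `dataset_size` are left untouched.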

Features

Contributing

We welcome all contributions to NewsRecLib! You can get involved by contributing code, improving the documentation, or reporting and investigating bugs and issues.

Resources

This repository was inspired by:

Other useful repositories:

License

NewsRecLib is released under the MIT License.

Citation

We did our best to provide all the bibliographic information of the methods, models, datasets, and techniques available in NewsRecLib to credit their authors. Please remember to cite them if you use NewsRecLib in your research.

If you use NewsRecLib, please cite the following publication:

@inproceedings{iana2023newsreclib,
  title={NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation},
  author={Iana, Andreea and Glava{\v{s}}, Goran and Paulheim, Heiko},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={296--310},
  year={2023}
}