Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity

This repository contains the code used for the experiments in "Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity" published at SIGIR 2022 (available here).

Citation

If you use this code to produce results for your scientific publication, or if you share a copy or fork, please refer to our SIGIR 2022 paper:

@inproceedings{oosterhuis2022plrank,
  Author = {Oosterhuis, Harrie},
  Booktitle = {Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR`22)},
  Organization = {ACM},
  Title = {Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity},
  Year = {2022}
}

License

The contents of this repository are licensed under the MIT license. If you modify its contents in any way, please link back to this repository.

Usage

This code requires Python 3 and the numpy and tensorflow packages; make sure they are installed.
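
As a quick sanity check before running anything, the following minimal sketch reports which of the required packages cannot be found in the current Python environment. It is not part of this repository; the helper name `missing_packages` is made up for illustration, and it only checks that the packages can be located, not which versions are installed.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["numpy", "tensorflow"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("numpy and tensorflow can both be imported.")
```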

A file is required that describes the location and details of the LTR datasets available on the system. An example file is available for the Yahoo! Webscope, MSLR-Web30k, and Istella datasets. Copy the file:

cp example_datasets_info.txt local_dataset_info.txt

Open this copy and edit the paths to the folders where the train/test/vali files are placed. (Note that the Istella dataset does not have a validation set by default; I recommend partitioning 10% of the training data to create a validation set.)
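
One possible way to partition off such a validation set is sketched below. It is not part of this repository and rests on two assumptions: that the training file is in the common LETOR/SVMlight text format, where every line contains a `qid:<id>` token, and that the split should be made per query so that all documents of a query end up on the same side. The function names and the fixed seed are illustrative choices.

```python
import random


def query_id(line):
    """Extract the query id from a LETOR-style line such as '2 qid:17 1:0.3 2:0.0'."""
    for token in line.split():
        if token.startswith("qid:"):
            return token[len("qid:"):]
    raise ValueError("no qid token found in line: %r" % line)


def split_by_query(lines, vali_fraction=0.1, seed=0):
    """Split lines into (train, vali) lists, keeping each query's documents together."""
    lines = [line for line in lines if line.strip()]  # ignore blank lines
    # Collect the distinct query ids in order of first appearance.
    qids = list(dict.fromkeys(query_id(line) for line in lines))
    # Pick roughly vali_fraction of the queries (at least one, if any exist).
    n_vali = min(len(qids), max(1, round(len(qids) * vali_fraction)))
    vali_qids = set(random.Random(seed).sample(qids, n_vali))
    train = [line for line in lines if query_id(line) not in vali_qids]
    vali = [line for line in lines if query_id(line) in vali_qids]
    return train, vali
```

To use it, read the lines of the original training file, call `split_by_query`, and write the two returned lists to a new training file and a validation file; then point the dataset info file at the folder that holds them.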

Here are some command-line examples that illustrate how the results in the paper can be replicated. First create a folder to store the resulting models:

mkdir local_output

The experiments are all based on run.py, which takes the following flags: --loss indicates the loss to use, one of PL_rank_2, PL_rank_3 or stochasticrank_PL (the losses from the SIGIR'21 PL-Rank paper are also implemented); --cutoff indicates the top-k that is being optimized, e.g. 5 for DCG@5; --num_samples sets the number of samples to use per gradient estimation (or dynamic for a dynamic strategy); --dataset indicates the dataset name, e.g. Webscope_C14_Set1. The following command optimizes DCG@5 with PL-Rank-3 and 100 samples on the Yahoo! dataset:

python3 run.py local_output/yahoo_ndcg5_dynamic_plrank2.txt --num_samples 100 --loss PL_rank_3 --cutoff 5 --dataset Webscope_C14_Set1
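
To run several configurations in a row, the command lines can be generated with a small script. The sketch below is not part of this repository; it only uses the flags described above, and the helper name `build_commands` and the output file naming scheme are made up for illustration. It prints the commands instead of executing them.

```python
import itertools
import shlex


def build_commands(losses, cutoffs, num_samples, dataset, output_dir="local_output"):
    """Build one run.py command (as an argument list) per loss/cutoff combination."""
    commands = []
    for loss, cutoff in itertools.product(losses, cutoffs):
        # Hypothetical naming scheme; any writable path works as the first argument.
        output_path = "%s/%s_%s_cutoff%s_samples%s.txt" % (
            output_dir, dataset, loss, cutoff, num_samples)
        commands.append([
            "python3", "run.py", output_path,
            "--num_samples", str(num_samples),
            "--loss", loss,
            "--cutoff", str(cutoff),
            "--dataset", dataset,
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["PL_rank_2", "PL_rank_3"], [5, 10], 100, "Webscope_C14_Set1"):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or each argument list can be passed to `subprocess.run(cmd, check=True)` to execute the runs one after another.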
