Federated Online Learning to Rank with Evolution Strategies: A Reproducibility Study

This repo contains the code used to run the experiments for the paper 'Federated Online Learning to Rank with Evolution Strategies: A Reproducibility Study', submitted to ECIR 2021.

Our work aims to reproduce the original work, Federated Online Learning to Rank with Evolution Strategies (FOLTR-ES). The original repository can be found at: https://github.com/facebookresearch/foltr-es

Here are a few steps to reproduce our experiments.

Set up the Python environment

Create a conda environment for running this code using the commands below.

conda create --name federated python=3.6
source activate federated
# assuming you want to checkout the repo in the current directory
git clone https://github.com/ielab/foltr.git && cd foltr
pip install -r requirements.txt

Download datasets

In the paper, four datasets are used: MQ2007, MQ2008, MSLR-WEB10K and Yahoo! Webscope.

MQ2007/2008 can be downloaded from the Microsoft Research website.
MSLR-WEB10K can be downloaded from the Microsoft Research website.
Yahoo! Webscope can be downloaded from the Yahoo! Webscope program; in our paper we used Set 1 of the C14B dataset.

After downloading the data files, unpack them into the ./code-and-results/data folder.
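The exact file names and sub-folder layout that the scripts expect inside ./code-and-results/data are not spelled out here. As rough orientation only: MQ2007/2008, MSLR-WEB10K and the Yahoo! set are distributed as SVMlight/LETOR-style text files with one query-document pair per line (`<label> qid:<id> <feature>:<value> ...`, optionally followed by a `#` comment). A minimal sketch of reading such lines, assuming that format; `parse_letor_line` and `group_by_query` are hypothetical helpers, not functions from this repo:

```python
from typing import Dict, Iterable, List, Tuple

Features = Dict[int, float]


def parse_letor_line(line: str) -> Tuple[int, str, Features]:
    """Parse one LETOR/SVMlight-style line (assumed format, see lead-in)."""
    # Drop any trailing comment, e.g. the docid metadata in MQ2007/2008 files.
    body = line.split("#", 1)[0].strip()
    tokens = body.split()
    label = int(tokens[0])                 # graded relevance label
    qid = tokens[1].split(":", 1)[1]       # "qid:10" -> "10"
    features: Features = {}
    for tok in tokens[2:]:
        fid, value = tok.split(":", 1)     # "3:0.25" -> (3, 0.25)
        features[int(fid)] = float(value)
    return label, qid, features


def group_by_query(lines: Iterable[str]) -> Dict[str, List[Tuple[int, Features]]]:
    """Collect (label, features) pairs per query id, skipping blank lines."""
    queries: Dict[str, List[Tuple[int, Features]]] = {}
    for line in lines:
        if not line.strip():
            continue
        label, qid, feats = parse_letor_line(line)
        queries.setdefault(qid, []).append((label, feats))
    return queries
```

For example, `group_by_query(open(path))` on one fold file would give the candidate documents per query; keeping features sparse as dicts avoids assuming a fixed feature count, which differs across the four datasets.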

Reproducing results

The main functions for our methods are stored in the ./code-and-results/foltr folder. The main functions for the original FOLTR-ES methods are stored in the ./code-and-results/foltr-original folder.

To reproduce our experimental results, set the corresponding parameters and run the file ./code-and-results/foltr_reproduce_run.py:

python foltr_reproduce_run.py
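FOLTR-ES trains a ranker by having each client evaluate a randomly perturbed copy of the shared model on its own interactions and send back a metric value, from which the server estimates a gradient with evolution strategies. The sketch below shows only a generic ES round of that shape, under stated assumptions: a linear ranker stored as a NumPy vector, one Gaussian perturbation per client, and a client reward such as mean MaxRR. Anything beyond the plain estimator (for example how client reports are protected, or which optimizer the server uses) is omitted; `es_round` and its parameters are hypothetical and not taken from ./code-and-results/foltr.

```python
import numpy as np


def es_round(weights, client_metric_fns, sigma=0.1, lr=0.05, rng=None):
    """One federated evolution-strategies round (hypothetical sketch).

    weights: current linear ranker parameters (1-D array).
    client_metric_fns: one callable per client, mapping a parameter vector
        to a scalar reward (assumed to be e.g. mean MaxRR on that client's queries).
    """
    rng = np.random.default_rng() if rng is None else rng
    estimates = []
    for metric_fn in client_metric_fns:
        # Each client perturbs the shared model with Gaussian noise...
        eps = rng.standard_normal(weights.shape)
        reward = metric_fn(weights + sigma * eps)
        # ...and its reward-weighted noise is a gradient estimate of the
        # Gaussian-smoothed objective.
        estimates.append(reward * eps / sigma)
    # The server averages the client estimates and takes an ascent step.
    return weights + lr * np.mean(estimates, axis=0)
```

Averaging over more clients per round reduces the variance of this estimate, at the cost of collecting more interactions before each update.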

Citation

If you use this code to produce results for your scientific publication, or if you share a copy or fork, please cite our ECIR 2021 paper:

@inproceedings{wang2021federated,
	author = {Wang, Shuyi and Zhuang, Shengyao and Zuccon, Guido},
	booktitle = {European Conference on Information Retrieval},
	title = {Federated Online Learning to Rank with Evolution Strategies: A Reproducibility Study},
	year = {2021}}

OLTR baselines

We use Pairwise Differentiable Gradient Descent (PDGD) as the OLTR baseline. This method was proposed by Oosterhuis and de Rijke at CIKM 2018: https://dl.acm.org/doi/pdf/10.1145/3269206.3271686.

Our implementation of PDGD is in a separate GitHub repository: https://github.com/ArvinZhuang/OLTR. The run script that reproduces the PDGD results presented in our paper is experiments/run_PDGD_batch_update.py.
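PDGD displays rankings sampled from a Plackett-Luce distribution over the ranker's document scores, so that each document is placed with probability given by a softmax over the documents not yet ranked. A minimal sketch of just that sampling step, assuming scores are given as a 1-D array; `sample_plackett_luce` is a hypothetical name, not code from the ArvinZhuang/OLTR repository, and PDGD's click-based preference inference and gradient weighting are not shown:

```python
import numpy as np


def sample_plackett_luce(scores, k, rng=None):
    """Sample a top-k ranking (document indices) without replacement."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    remaining = list(range(len(scores)))
    ranking = []
    for _ in range(min(k, len(scores))):
        logits = scores[remaining]
        # Numerically stable softmax over the documents still unranked.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        pick = rng.choice(len(remaining), p=probs)
        ranking.append(remaining.pop(int(pick)))
    return ranking
```

Sampling rather than sorting lets lower-scored documents occasionally reach the top positions, which exposes them to clicks.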

Results and Figures

Our experiment result files and the code to reproduce the plots in the paper are in the folder ./code-and-results/results/.

Result files for experiments in each research question can be found at ./code-and-results/results/foltr-results. Result files for OLTR baselines can be found at ./code-and-results/results/PDGD.
Figures for RQ1 are in ./code-and-results/results/figures/RQ1 folder. Figures for RQ2 are in ./code-and-results/results/figures/RQ2 folder. Figures for RQ3 and RQ4 are in ./code-and-results/results/figures/RQ3-4 folder.

Figures for RQ1: performance of FOLTR-ES across datasets (averaged on all dataset splits)

(a) Mean batch MaxRR for MQ2007 (averaged on all dataset splits) (b) Mean batch MaxRR for MQ2008 (averaged on all dataset splits) (c) Mean batch MaxRR for MSLR10k (averaged on all dataset splits) (d) Mean batch MaxRR for Yahoo
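MaxRR is commonly defined as the reciprocal rank of the highest-ranked clicked document for a query (0 when nothing is clicked), and "mean batch MaxRR" averages it over the queries in a batch. A minimal sketch under that assumption, with clicks given as booleans in rank order; the function names are hypothetical and the repo's own evaluation code may differ in details:

```python
from typing import Sequence


def max_rr(clicks: Sequence[bool]) -> float:
    """Reciprocal rank of the top-most clicked position (0.0 if no click)."""
    for rank, clicked in enumerate(clicks, start=1):
        if clicked:
            return 1.0 / rank
    return 0.0


def mean_batch_max_rr(batch: Sequence[Sequence[bool]]) -> float:
    """Average MaxRR over the queries of one (non-empty) batch."""
    return sum(max_rr(clicks) for clicks in batch) / len(batch)
```

The figures for MQ2007, MQ2008 and MSLR10k additionally average these per-batch values over all dataset splits.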

Figures for RQ2: performance of FOLTR-ES with respect to number of clients

(a) Mean batch MaxRR for MQ2007 (averaged on all dataset splits) (b) Mean batch MaxRR for MQ2008 (averaged on all dataset splits) (c) Mean batch MaxRR for MSLR10k (averaged on all dataset splits) (d) Mean batch MaxRR for Yahoo

Figures for RQ3: performance of FOLTR-ES and PDGD across datasets

(a) Mean batch MaxRR for MQ2007 (averaged on all dataset splits) (b) Mean batch MaxRR for MQ2008 (averaged on all dataset splits) (c) Mean batch MaxRR for MSLR10k (averaged on all dataset splits) (d) Mean batch MaxRR for Yahoo

Figures for RQ4: performance of FOLTR-ES in terms of online nDCG@10

(a) Mean batch nDCG@10 for MQ2007 (averaged on all dataset splits) (b) Mean batch nDCG@10 for MQ2008 (averaged on all dataset splits) (c) Mean batch nDCG@10 for MSLR10k (averaged on all dataset splits) (d) Mean batch nDCG@10 for Yahoo

Figures for RQ4: performance of FOLTR-ES and PDGD in terms of offline nDCG@10

(a) Mean batch nDCG@10 for MQ2007 (averaged on all dataset splits) (b) Mean batch nDCG@10 for MQ2008 (averaged on all dataset splits) (c) Mean batch nDCG@10 for MSLR10k (averaged on all dataset splits) (d) Mean batch nDCG@10 for Yahoo
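For the RQ4 plots, nDCG@10 can be computed with one common formulation: exponential gain 2^rel - 1, a log2(rank + 1) position discount, and normalisation by the DCG of the ideal ordering of the same labels. In OLTR, online nDCG is typically measured on the rankings shown to users during training, and offline nDCG on the learned ranker's rankings of held-out test queries. The sketch assumes graded relevance labels listed in ranked order for all candidate documents of a query; the gain, discount and handling of queries without relevant documents in the repo's own code may differ, and the function names are hypothetical:

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG over the first k positions; index i corresponds to rank i + 1."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(list(relevances)[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    # Queries with no relevant documents are scored 0 here (an assumption).
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0
```

Mean batch nDCG@10 would then be the average of `ndcg_at_k` over the queries in a batch, analogous to mean batch MaxRR.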
