OSIRRC Docker Image for Anserini+BM25PRF

This readme is heavily based on (i.e., copied from) the Anserini readme.

This is the Docker image that implements BM25 + Pseudo Relevance Feedback (PRF) [1] with Anserini [2]. The image conforms to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019.

This image is available on Docker Hub.

This image implements BM25 + Pseudo Relevance Feedback (PRF) with Anserini.

Supported test collections: robust04
Supported hooks: init, index, search and train

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python run.py prepare \
  --repo osirrc2019/anserini-bm25prf \
  --tag latest \
  --collections robust04=/path/to/disk45=trectext

The following jig command can be used to perform a retrieval run on the robust04 test collection with the default hyper-parameters:

python run.py search \
  --repo osirrc2019/anserini-bm25prf \
  --output out \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04 \
  --top_k 1000

The following jig command can be used to tune the hyper-parameters. Note that the grid search may take several hours.

python run.py train \
   --repo osirrc2019/anserini-bm25prf \
   --tag latest \
   --topic topics/topics.robust04.txt \
   --qrels $(pwd)/qrels/qrels.robust04.txt \
   --validation_split $(pwd)/sample_training_validation_query_ids/robust04_validation.txt \
   --test_split $(pwd)/sample_training_validation_query_ids/robust04_test.txt \
   --model_folder $(pwd)/trained \
   --collection robust04
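
The grid search mentioned above can be pictured with the minimal sketch below. It is not the image's tuning code: the candidate values in `GRID`, the helper names, and `toy_evaluate` are all hypothetical. It only illustrates why the number of retrieval runs grows multiplicatively with each tuned hyper-parameter, and how the best setting is picked by a validation score.

```python
from itertools import product

# Hypothetical candidate values; the grid actually used by the image may differ.
GRID = {
    "k1": [0.7, 0.9, 1.1],
    "b": [0.2, 0.4, 0.6],
    "num_new_terms": [20, 40],
    "new_term_weight": [0.1, 0.2],
}


def iter_settings(grid):
    """Yield one dict per combination of candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, evaluate):
    """Return the setting with the highest score.

    `evaluate` is a caller-supplied function that maps a setting to a score,
    for example MAP measured on the validation topics only.
    """
    return max(iter_settings(grid), key=evaluate)


def toy_evaluate(setting):
    # Stand-in for a real retrieval run plus evaluation (assumption).
    return -abs(setting["b"] - 0.4) - abs(setting["k1"] - 0.9)


settings = list(iter_settings(GRID))
print(len(settings))  # 3 * 3 * 2 * 2 = 36 retrieval runs
best = grid_search(GRID, toy_evaluate)
print(best["k1"], best["b"])
```

Each extra hyper-parameter multiplies the number of runs, which is consistent with a full grid taking hours. Selecting on the validation split and reporting on the test split keeps the reported score from being tuned on the same topics.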

Expected Results on TREC 2004 Robust

The following numbers should be reproducible using the scripts provided by the jig.

BM25+PRF with Default Hyper-parameters

Hyper-parameters: k1=0.9 b=0.4 k1_prf=0.9 b_prf=0.4 num_new_terms=20 num_docs=10 new_term_weight=0.2
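
For readers unfamiliar with `k1` and `b`: in BM25, `k1` controls how quickly repeated occurrences of a term stop adding to the score, and `b` controls how strongly long documents are penalized. The sketch below shows one common BM25 formulation (with a Lucene-style smoothed idf); it is an illustration only, and the exact variant used inside this image may differ. That `k1_prf` and `b_prf` play the same roles in the feedback stage is an assumption based on their names.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=0.9, b=0.4):
    """Score one query term against one document (one common BM25 variant)."""
    # Smoothed inverse document frequency; always positive.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: equals 1.0 when b = 0 or the document has average length.
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # Term-frequency saturation governed by k1.
    return idf * tf * (k1 + 1) / (tf + k1 * norm)


# Same term frequency, but the second document is twice the average length.
short_doc = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, doc_freq=10, num_docs=1000)
long_doc = bm25_term_score(tf=1, doc_len=200, avg_doc_len=100, doc_freq=10, num_docs=1000)
print(short_doc > long_doc)  # True: with b > 0 the longer document scores lower
```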

Command:

python run.py search \
  --repo osirrc2019/anserini-bm25prf \
  --output out \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04

Metric	Score
MAP	0.2928
P@30	0.3438
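
As a reminder of what the two metrics measure, here is a minimal sketch using their standard definitions. This document does not say which evaluation tool produces the reported numbers, so the code below is illustrative and not the jig's evaluation code; the function names and toy document ids are made up for the example. MAP is the mean of `average_precision` over all topics, and P@30 is `precision_at_k` with `k=30`, again averaged over topics.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at the rank of each relevant document."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant_ids)


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
ap = average_precision(ranked, relevant)   # (1/1 + 2/3) / 2
p_at_2 = precision_at_k(ranked, relevant, 2)
print(round(ap, 4), p_at_2)
```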

Tuning BM25+PRF

Command:

python run.py train \
   --repo osirrc2019/anserini-bm25prf \
   --tag latest \
   --topic topics/topics.robust04.txt \
   --qrels $(pwd)/qrels/qrels.robust04.txt \
   --validation_split $(pwd)/sample_training_validation_query_ids/robust04_validation.txt \
   --test_split $(pwd)/sample_training_validation_query_ids/robust04_test.txt \
   --model_folder $(pwd)/trained \
   --collection robust04

Tuned Hyper-parameters:

Parameter	k1	b	k1_prf	b_prf	num_new_terms	num_docs	new_term_weight
Value	0.9	0.2	0.9	0.6	40	10	0.1

BM25+PRF with Tuned Hyper-parameters

Hyper-parameters: k1=0.9 b=0.2 k1_prf=0.9 b_prf=0.6 num_new_terms=40 num_docs=10 new_term_weight=0.1
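
To make the three feedback parameters concrete, the sketch below shows a generic, deliberately simplified form of pseudo relevance feedback. It is not Anserini's BM25PRF implementation: it ranks candidate terms simply by how many feedback documents contain them, and the interpretation of `num_docs` (feedback documents), `num_new_terms` (expansion terms), and `new_term_weight` (weight of each added term) is an assumption based on the parameter names. The function name and toy documents are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, num_docs=10, num_new_terms=20,
                 new_term_weight=0.2):
    """Return {term: weight} for the expanded query.

    ranked_docs: list of token lists, best first, from an initial retrieval run.
    """
    feedback = ranked_docs[:num_docs]
    # Count in how many feedback documents each term occurs.
    doc_counts = Counter()
    for tokens in feedback:
        doc_counts.update(set(tokens))
    original = set(query_terms)
    candidates = [(t, c) for t, c in doc_counts.items() if t not in original]
    # Highest document count first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    weights = {term: 1.0 for term in query_terms}
    for term, _ in candidates[:num_new_terms]:
        weights[term] = new_term_weight
    return weights


docs = [
    ["hubble", "telescope", "space"],
    ["hubble", "space", "nasa"],
    ["cooking", "pasta"],
]
expanded = expand_query(["hubble"], docs, num_docs=2, num_new_terms=1)
print(expanded)  # {'hubble': 1.0, 'space': 0.2}
```

Under this reading, the tuned setting adds more expansion terms (40 instead of 20) but gives each one less weight (0.1 instead of 0.2).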

Command:

python run.py search \
  --repo osirrc2019/anserini-bm25prf \
  --output out \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04 \
  --opts k1=0.9 b=0.2 k1_prf=0.9 b_prf=0.6 num_new_terms=40 num_docs=10 new_term_weight=0.1

Metric	Score
MAP	0.2916
P@30	0.3396

Note that the tuned hyper-parameters score slightly lower than the defaults here: MAP drops from 0.2928 to 0.2916 and P@30 from 0.3438 to 0.3396.

References

[1] Stephen E. Robertson and Karen Spärck Jones. Simple, proven approaches to text retrieval. University of Cambridge Computer Laboratory, 1994.

[2] Peilin Yang, Hui Fang, and Jimmy Lin. Anserini: Enabling the Use of Lucene for Information Retrieval Research. SIGIR 2017.
