
Anserini Regression Experiments

Internally at Waterloo, tuna.cs.uwaterloo.ca is used for the development of Anserini and is set up to run the regression experiments described here. The regression script src/main/python/run_regression.py runs end-to-end regression experiments for various collections. Each regression includes:

  • Building the index from scratch.
  • Running all retrieval runs described in the Anserini documentation.
  • Verifying results against effectiveness figures stored in src/main/resources/regression/.
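
The verification step can be pictured as a comparison between freshly computed metrics and stored reference values. The sketch below is only an illustration of that idea: `verify_effectiveness`, the dictionary layout, and the tolerance are assumptions made for this example, not the actual logic or file format used by run_regression.py.

```python
def verify_effectiveness(expected, actual, tolerance=1e-4):
    """Return a list of (run, metric, expected, actual) mismatches.

    Hypothetical helper: both arguments map run name -> {metric: value}.
    """
    failures = []
    for run, metrics in expected.items():
        for metric, want in metrics.items():
            got = actual.get(run, {}).get(metric)
            # A missing value or one outside the tolerance counts as a failure.
            if got is None or abs(got - want) > tolerance:
                failures.append((run, metric, want, got))
    return failures


# Made-up numbers for illustration only, not real Anserini results.
expected = {"bm25": {"map": 0.2500, "p30": 0.3100}}
actual = {"bm25": {"map": 0.2500, "p30": 0.3000}}
print(verify_effectiveness(expected, actual))
```

An empty list means every stored figure was reproduced within the tolerance; any entry in the list points at the run and metric that drifted.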

We keep a change log whenever effectiveness changes or when new regressions are added.

Requirements

Python>=2.6 or Python>=3.5

pip install -r src/main/python/requirements.txt

Note that the Oracle JVM is necessary to replicate our regression results; there are known issues with OpenJDK (see this and this).
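
Because results depend on the JVM vendor, it can help to check which Java is on the path before launching a long run. This is a minimal sketch, assuming that OpenJDK builds identify themselves with the string "OpenJDK" in the output of `java -version`; both helper names are hypothetical and not part of Anserini.

```python
import subprocess


def is_openjdk(version_text):
    """Hypothetical check: True if the `java -version` text mentions OpenJDK."""
    return "openjdk" in version_text.lower()


def java_version_text():
    """Return the output of `java -version`, which Java prints to stderr."""
    proc = subprocess.Popen(["java", "-version"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return out.decode("utf-8", "replace")
```

Calling `is_openjdk(java_version_text())` before starting a regression gives an early warning that the effectiveness figures may not match.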

Invocations

tl;dr: copy and paste the following lines into a console on tuna to run the regressions without building indexes from scratch:

nohup python src/main/python/run_regression.py --collection disk12 >& log.disk12 &
nohup python src/main/python/run_regression.py --collection robust04 >& log.robust04 &
nohup python src/main/python/run_regression.py --collection robust05 >& log.robust05 &
nohup python src/main/python/run_regression.py --collection core17 >& log.core17 &
nohup python src/main/python/run_regression.py --collection core18 >& log.core18 &

nohup python src/main/python/run_regression.py --collection mb11 >& log.mb11 &
nohup python src/main/python/run_regression.py --collection mb13 >& log.mb13 &

nohup python src/main/python/run_regression.py --collection wt10g >& log.wt10g &
nohup python src/main/python/run_regression.py --collection gov2 >& log.gov2 &
nohup python src/main/python/run_regression.py --collection cw09b >& log.cw09b &
nohup python src/main/python/run_regression.py --collection cw12b13 >& log.cw12b13 &
nohup python src/main/python/run_regression.py --collection cw12 >& log.cw12 &

nohup python src/main/python/run_regression.py --collection car17v1.5 >& log.car17v1.5 &
nohup python src/main/python/run_regression.py --collection car17v2.0 >& log.car17v2.0 &

Copy and paste the following lines into a console on tuna to run the regressions from the raw collection, which includes building indexes from scratch (the only difference is the additional --index option):

nohup python src/main/python/run_regression.py --collection disk12 --index >& log.disk12 &
nohup python src/main/python/run_regression.py --collection robust04 --index >& log.robust04 &
nohup python src/main/python/run_regression.py --collection robust05 --index >& log.robust05 &
nohup python src/main/python/run_regression.py --collection core17 --index >& log.core17 &
nohup python src/main/python/run_regression.py --collection core18 --index >& log.core18 &

nohup python src/main/python/run_regression.py --collection mb11 --index >& log.mb11 &
nohup python src/main/python/run_regression.py --collection mb13 --index >& log.mb13 &

nohup python src/main/python/run_regression.py --collection wt10g --index >& log.wt10g &
nohup python src/main/python/run_regression.py --collection gov2 --index >& log.gov2 &
nohup python src/main/python/run_regression.py --collection cw09b --index >& log.cw09b &
nohup python src/main/python/run_regression.py --collection cw12b13 --index >& log.cw12b13 &
nohup python src/main/python/run_regression.py --collection cw12 --index >& log.cw12 &

nohup python src/main/python/run_regression.py --collection car17v1.5 --index >& log.car17v1.5 &
nohup python src/main/python/run_regression.py --collection car17v2.0 --index >& log.car17v2.0 &

Watch out: the full cw12 regression takes a couple of days to run and generates a 12TB index!
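
The two command lists above differ only in the collection name and the optional --index flag, so they can also be generated rather than copied. The following is a sketch under that assumption: the helper names and the `COLLECTIONS` list are introduced here for illustration, and only the --collection and --index options shown above are used.

```python
# Collection names copied from the invocations listed above.
COLLECTIONS = [
    "disk12", "robust04", "robust05", "core17", "core18",
    "mb11", "mb13",
    "wt10g", "gov2", "cw09b", "cw12b13", "cw12",
    "car17v1.5", "car17v2.0",
]


def regression_command(collection, index=False):
    """Build the argument list for one regression run (hypothetical helper)."""
    cmd = ["python", "src/main/python/run_regression.py",
           "--collection", collection]
    if index:
        # --index rebuilds the index from the raw collection first.
        cmd.append("--index")
    return cmd


def shell_line(collection, index=False):
    """Format the command as the background shell line used in this document."""
    return "nohup {0} >& log.{1} &".format(
        " ".join(regression_command(collection, index)), collection)


for name in COLLECTIONS:
    print(shell_line(name, index=True))
```

Dropping `index=True` prints the first list instead; removing "cw12" from `COLLECTIONS` avoids the multi-day run when a full index rebuild is not needed.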

Details of each specific regression:

Additional Regressions
