Add regressions for DeepImpact and uniCOIL on MS MARCO passage (#1633)
lintool committed Sep 5, 2021
1 parent 9bc0a1c commit f79fb67
Showing 12 changed files with 482 additions and 5 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -52,6 +52,7 @@ For the most part, these runs are based on [_default_ parameter settings](https:
+ Regressions for [Tweets2011 (MB11 & MB12)](docs/regressions-mb11.md), [Tweets2013 (MB13 & MB14)](docs/regressions-mb13.md)
+ Regressions for Complex Answer Retrieval (CAR17): [[v1.5](docs/regressions-car17v1.5.md)] [[v2.0](docs/regressions-car17v2.0.md)] [[v2.0 with doc2query](docs/regressions-car17v2.0-doc2query.md)]
+ Regressions for MS MARCO Passage Ranking: [[base](docs/regressions-msmarco-passage.md)] [[doc2query](docs/regressions-msmarco-passage-doc2query.md)] [[docTTTTTquery](docs/regressions-msmarco-passage-docTTTTTquery.md)]
+ Regressions for MS MARCO Passage Ranking: [[DeepImpact](docs/regressions-msmarco-passage-deepimpact.md)] [[uniCOIL](docs/regressions-msmarco-passage-unicoil.md)]
+ Regressions for MS MARCO Document Ranking, Per Doc: [[base](docs/regressions-msmarco-doc.md)] [[docTTTTTquery](docs/regressions-msmarco-doc-docTTTTTquery-per-doc.md)]
+ Regressions for MS MARCO Document Ranking, Per Passage: [[base](docs/regressions-msmarco-doc-per-passage.md)] [[docTTTTTquery](docs/regressions-msmarco-doc-docTTTTTquery-per-passage.md)]
+ Regressions for the TREC 2019 Deep Learning Track (Passage): [[base](docs/regressions-dl19-passage.md)] [[docTTTTTquery](docs/regressions-dl19-passage-docTTTTTquery.md)]
8 changes: 4 additions & 4 deletions docs/experiments-msmarco-passage-deepimpact.md
@@ -15,7 +15,7 @@ We're going to use the repository's root directory as the working directory.
First, we need to download and extract the MS MARCO passage dataset with DeepImpact processing:

```bash
- wget https://git.uwaterloo.ca/jimmylin/deep-impact/raw/master/msmarco-passage-deepimpact-b8.tar -P collections/
+ wget https://git.uwaterloo.ca/jimmylin/deepimpact/raw/master/msmarco-passage-deepimpact-b8.tar -P collections/

# Alternate mirror
wget https://vault.cs.uwaterloo.ca/s/57AE5aAjzw2ox2n/download -O collections/msmarco-passage-deepimpact-b8.tar
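
# Assumed next step (folded out of this diff view; the exact command is our reconstruction):
tar xvf collections/msmarco-passage-deepimpact-b8.tar -C collections/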
@@ -51,16 +51,16 @@ The queries are already stored in the repo, so we can run retrieval directly:

```bash
target/appassembler/bin/SearchCollection -index indexes/lucene-index.msmarco-passage-deepimpact-b8 \
-  -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz \
+  -topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \
-output runs/run.msmarco-passage-deepimpact-b8.trec \
-impact -pretokenized
```

The queries are also available to download at the following locations:

```bash
- wget https://git.uwaterloo.ca/jimmylin/deep-impact/raw/master/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz -P collections/
- wget https://vault.cs.uwaterloo.ca/s/NYibRJ9bXs5PspH/download -O collections/topics.msmarco-passage.dev-subset.deep-impact.tsv.gz
+ wget https://git.uwaterloo.ca/jimmylin/deepimpact/raw/master/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz -P collections/
+ wget https://vault.cs.uwaterloo.ca/s/NYibRJ9bXs5PspH/download -O collections/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz

# MD5 checksum: 88a2987d6a25b1be11c82e87677a262e
```
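
As an optional sanity check (our addition, not part of the original instructions), the download can be verified against the checksum above:

```bash
md5sum collections/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz
# Expected: 88a2987d6a25b1be11c82e87677a262e
```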
2 changes: 2 additions & 0 deletions docs/experiments-msmarco-passage-unicoil.md
@@ -90,6 +90,8 @@ QueriesRanked: 6980
#####################
```

This corresponds to the effectiveness reported in the paper.


## Reproduction Log[*](reproducibility.md)

91 changes: 91 additions & 0 deletions docs/regressions-msmarco-passage-deepimpact.md
@@ -0,0 +1,91 @@
# Anserini: Regressions for DeepImpact on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking)

This page documents regression experiments for DeepImpact on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework.
DeepImpact is described in the following paper:

> Antonio Mallia, Omar Khattab, Nicola Tonellotto, and Torsten Suel. [Learning Passage Impacts for Inverted Indexes.](https://dl.acm.org/doi/10.1145/3404835.3463030) _SIGIR 2021_.

For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-deepimpact.md).

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-deepimpact.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

## Indexing

Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection JsonVectorCollection \
-input /path/to/msmarco-passage-deepimpact \
-index indexes/lucene-index.msmarco-passage-deepimpact.raw \
-generator DefaultLuceneDocumentGenerator \
-threads 16 -impact -pretokenized -storeRaw \
>& logs/log.msmarco-passage-deepimpact &
```
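
Because the indexing command runs in the background under `nohup`, progress can be monitored by tailing the log file named in the command above (a convenience step, not part of the original instructions):

```bash
tail -f logs/log.msmarco-passage-deepimpact
```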

The directory `/path/to/msmarco-passage-deepimpact/` should contain the compressed `jsonl` files that comprise the corpus.
See [this page](experiments-msmarco-passage-deepimpact.md) for additional details.
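
To get a feel for the input, it can help to peek at a single record; the shard name below is hypothetical, and the record shape is our assumption of what `JsonVectorCollection` expects (a document id plus a `vector` map from terms to impact weights):

```bash
# Shard name is hypothetical; adjust to the actual files in your corpus directory.
# Assumed record shape: {"id": "...", "contents": "...", "vector": {"term": weight, ...}}
zcat /path/to/msmarco-passage-deepimpact/docs00.json.gz | head -1
```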

See the explanation of [common indexing options](common-indexing-options.md) for more detail.

## Retrieval

Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/).
The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.

After indexing has completed, you should be able to perform retrieval as follows:

```
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.msmarco-passage-deepimpact.raw \
-topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \
-output runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \
-impact -pretokenized &
```

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -m map -c -m recip_rank -c -m recall.1000 -c src/main/resources/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz
```
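
If retrieval completed successfully, the `trec_eval` report should line up with the tables below, roughly as follows (output formatting approximated):

```
map                     all     0.3334
recip_rank              all     0.3386
recall_1000             all     0.9476
```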

## Effectiveness

With the above commands, you should be able to reproduce the following results:

MAP | DeepImpact|
:---------------------------------------|-----------|
[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3334 |


MRR | DeepImpact|
:---------------------------------------|-----------|
[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3386 |


R@1000 | DeepImpact|
:---------------------------------------|-----------|
[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.9476 |

The above runs are in TREC output format and evaluated with `trec_eval`.
Note that `trec_eval`'s `recip_rank` is computed over the full ranking, whereas the official MS MARCO metric is MRR@10, which is why the two MRR figures differ.
In order to reproduce the results reported in the paper, we need to convert the run to MS MARCO output format and then evaluate:

```bash
python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \
--input runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \
--output runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco --quiet

python tools/scripts/msmarco/msmarco_passage_eval.py \
collections/msmarco-passage/qrels.dev.small.tsv \
runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco
```
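
The converted file should follow the MS MARCO run format of tab-separated `qid`, `docid`, `rank` (our description, for orientation); a quick peek confirms the shape:

```bash
head -3 runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco
```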

The results should be as follows:

```
#####################
MRR @10: 0.3252764133351524
QueriesRanked: 6980
#####################
```

The final evaluation metric is very close to the one reported in the paper (0.326).
91 changes: 91 additions & 0 deletions docs/regressions-msmarco-passage-unicoil.md
@@ -0,0 +1,91 @@
# Anserini: Regressions for uniCOIL on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking)

This page documents regression experiments for uniCOIL on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework.
The uniCOIL model is described in the following paper:

> Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_.

For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-unicoil.md).

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-unicoil.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

## Indexing

Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection JsonVectorCollection \
-input /path/to/msmarco-passage-unicoil \
-index indexes/lucene-index.msmarco-passage-unicoil.raw \
-generator DefaultLuceneDocumentGenerator \
-threads 16 -impact -pretokenized -storeRaw \
>& logs/log.msmarco-passage-unicoil &
```

The directory `/path/to/msmarco-passage-unicoil/` should contain the compressed `jsonl` files that comprise the corpus.
See [this page](experiments-msmarco-passage-unicoil.md) for additional details.

See the explanation of [common indexing options](common-indexing-options.md) for more detail.

## Retrieval

Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/).
The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.

After indexing has completed, you should be able to perform retrieval as follows:

```
nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.msmarco-passage-unicoil.raw \
-topicreader TsvInt -topics src/main/resources/topics-and-qrels/topics.msmarco-passage.dev-subset.unicoil.tsv.gz \
-output runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz \
-impact -pretokenized &
```

Evaluation can be performed using `trec_eval`:

```
tools/eval/trec_eval.9.0.4/trec_eval -m map -c -m recip_rank -c -m recall.1000 -c src/main/resources/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz
```
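
As with the DeepImpact regression, the `trec_eval` report should line up with the tables below, roughly as follows (output formatting approximated):

```
map                     all     0.3574
recip_rank              all     0.3625
recall_1000             all     0.9582
```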

## Effectiveness

With the above commands, you should be able to reproduce the following results:

MAP | uniCOIL |
:---------------------------------------|-----------|
[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3574 |


MRR | uniCOIL |
:---------------------------------------|-----------|
[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.3625 |


R@1000 | uniCOIL |
:---------------------------------------|-----------|
[MS MARCO Passage: Dev](https://github.com/microsoft/MSMARCO-Passage-Ranking)| 0.9582 |

The above runs are in TREC output format and evaluated with `trec_eval`.
Note that `trec_eval`'s `recip_rank` is computed over the full ranking, whereas the official MS MARCO metric is MRR@10, which is why the two MRR figures differ.
In order to reproduce the results reported in the paper, we need to convert the run to MS MARCO output format and then evaluate:

```bash
python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \
--input runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz \
--output runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco --quiet

python tools/scripts/msmarco/msmarco_passage_eval.py \
tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt \
runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco
```

The results should be as follows:

```
#####################
MRR @10: 0.35155222404147896
QueriesRanked: 6980
#####################
```

This corresponds to the effectiveness reported in the paper.
6 changes: 6 additions & 0 deletions docs/regressions.md
@@ -56,6 +56,9 @@ nohup python src/main/python/run_regression.py --collection msmarco-passage >& l
nohup python src/main/python/run_regression.py --collection msmarco-passage-doc2query >& logs/log.msmarco-passage-doc2query &
nohup python src/main/python/run_regression.py --collection msmarco-passage-docTTTTTquery >& logs/log.msmarco-passage-docTTTTTquery &
nohup python src/main/python/run_regression.py --collection msmarco-passage-deepimpact >& logs/log.msmarco-passage-deepimpact &
nohup python src/main/python/run_regression.py --collection msmarco-passage-unicoil >& logs/log.msmarco-passage-unicoil &
nohup python src/main/python/run_regression.py --collection msmarco-doc >& logs/log.msmarco-doc &
nohup python src/main/python/run_regression.py --collection msmarco-doc-per-passage >& logs/log.msmarco-doc-per-passage &
nohup python src/main/python/run_regression.py --collection msmarco-doc-docTTTTTquery-per-doc >& logs/log.msmarco-doc-docTTTTTquery-per-doc &
@@ -121,6 +124,9 @@ nohup python src/main/python/run_regression.py --index --collection msmarco-pass
nohup python src/main/python/run_regression.py --index --collection msmarco-passage-doc2query >& logs/log.msmarco-passage-doc2query &
nohup python src/main/python/run_regression.py --index --collection msmarco-passage-docTTTTTquery >& logs/log.msmarco-passage-docTTTTTquery &
nohup python src/main/python/run_regression.py --index --collection msmarco-passage-deepimpact >& logs/log.msmarco-passage-deepimpact &
nohup python src/main/python/run_regression.py --index --collection msmarco-passage-unicoil >& logs/log.msmarco-passage-unicoil &
nohup python src/main/python/run_regression.py --index --collection msmarco-doc >& logs/log.msmarco-doc &
nohup python src/main/python/run_regression.py --index --collection msmarco-doc-per-passage >& logs/log.msmarco-doc-per-passage &
nohup python src/main/python/run_regression.py --index --collection msmarco-doc-docTTTTTquery-per-doc >& logs/log.msmarco-doc-docTTTTTquery-per-doc &
71 changes: 71 additions & 0 deletions src/main/resources/docgen/templates/msmarco-passage-deepimpact.template
@@ -0,0 +1,71 @@
# Anserini: Regressions for DeepImpact on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking)

This page documents regression experiments for DeepImpact on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework.
DeepImpact is described in the following paper:

> Antonio Mallia, Omar Khattab, Nicola Tonellotto, and Torsten Suel. [Learning Passage Impacts for Inverted Indexes.](https://dl.acm.org/doi/10.1145/3404835.3463030) _SIGIR 2021_.

For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-deepimpact.md).

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-deepimpact.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-deepimpact.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

## Indexing

Typical indexing command:

```
${index_cmds}
```

The directory `/path/to/msmarco-passage-deepimpact/` should contain the compressed `jsonl` files that comprise the corpus.
See [this page](experiments-msmarco-passage-deepimpact.md) for additional details.

See the explanation of [common indexing options](common-indexing-options.md) for more detail.

## Retrieval

Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/).
The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.

After indexing has completed, you should be able to perform retrieval as follows:

```
${ranking_cmds}
```

Evaluation can be performed using `trec_eval`:

```
${eval_cmds}
```

## Effectiveness

With the above commands, you should be able to reproduce the following results:

${effectiveness}

The above runs are in TREC output format and evaluated with `trec_eval`.
Note that `trec_eval`'s `recip_rank` is computed over the full ranking, whereas the official MS MARCO metric is MRR@10, which is why the two MRR figures differ.
In order to reproduce the results reported in the paper, we need to convert the run to MS MARCO output format and then evaluate:

```bash
python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \
--input runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz \
--output runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco --quiet

python tools/scripts/msmarco/msmarco_passage_eval.py \
collections/msmarco-passage/qrels.dev.small.tsv \
runs/run.msmarco-passage-deepimpact.deepimpact.topics.msmarco-passage.dev-subset.deepimpact.tsv.gz.msmarco
```

The results should be as follows:

```
#####################
MRR @10: 0.3252764133351524
QueriesRanked: 6980
#####################
```

The final evaluation metric is very close to the one reported in the paper (0.326).
71 changes: 71 additions & 0 deletions src/main/resources/docgen/templates/msmarco-passage-unicoil.template
@@ -0,0 +1,71 @@
# Anserini: Regressions for uniCOIL on [MS MARCO Passage](https://github.com/microsoft/MSMARCO-Passage-Ranking)

This page documents regression experiments for uniCOIL on the MS MARCO Passage Ranking Task, which is integrated into Anserini's regression testing framework.
The uniCOIL model is described in the following paper:

> Jimmy Lin and Xueguang Ma. [A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.](https://arxiv.org/abs/2106.14807) _arXiv:2106.14807_.

For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-msmarco-passage-unicoil.md).

The exact configurations for these regressions are stored in [this YAML file](../src/main/resources/regression/msmarco-passage-unicoil.yaml).
Note that this page is automatically generated from [this template](../src/main/resources/docgen/templates/msmarco-passage-unicoil.template) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

## Indexing

Typical indexing command:

```
${index_cmds}
```

The directory `/path/to/msmarco-passage-unicoil/` should contain the compressed `jsonl` files that comprise the corpus.
See [this page](experiments-msmarco-passage-unicoil.md) for additional details.

See the explanation of [common indexing options](common-indexing-options.md) for more detail.

## Retrieval

Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/main/resources/topics-and-qrels/).
The regression experiments here evaluate on the 6980 dev set questions; see [this page](experiments-msmarco-passage.md) for more details.

After indexing has completed, you should be able to perform retrieval as follows:

```
${ranking_cmds}
```

Evaluation can be performed using `trec_eval`:

```
${eval_cmds}
```

## Effectiveness

With the above commands, you should be able to reproduce the following results:

${effectiveness}

The above runs are in TREC output format and evaluated with `trec_eval`.
Note that `trec_eval`'s `recip_rank` is computed over the full ranking, whereas the official MS MARCO metric is MRR@10, which is why the two MRR figures differ.
In order to reproduce the results reported in the paper, we need to convert the run to MS MARCO output format and then evaluate:

```bash
python tools/scripts/msmarco/convert_trec_to_msmarco_run.py \
--input runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz \
--output runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco --quiet

python tools/scripts/msmarco/msmarco_passage_eval.py \
tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt \
runs/run.msmarco-passage-unicoil.unicoil.topics.msmarco-passage.dev-subset.unicoil.tsv.gz.msmarco
```

The results should be as follows:

```
#####################
MRR @10: 0.35155222404147896
QueriesRanked: 6980
#####################
```

This corresponds to the effectiveness reported in the paper.
