Add TREC CAR doc2query regression (#768)

lintool committed Aug 9, 2019
1 parent f7c134f commit 1217d475c88cc4782ff3056506afc43d71bf31fb
@@ -71,8 +71,9 @@ Note that these regressions capture the "out of the box" experience, based on [_
+ [Regressions for Tweets2013 (MB13 & MB14)](docs/regressions-mb13.md)
+ [Regressions for Complex Answer Retrieval v1.5 (CAR17)](docs/regressions-car17v1.5.md)
+ [Regressions for Complex Answer Retrieval v2.0 (CAR17)](docs/regressions-car17v2.0.md)
+ [Regressions for Complex Answer Retrieval v2.0 (CAR17) with Doc2query expansion](docs/regressions-car17v2.0-doc2query.md)
+ [Regressions for the MS MARCO Passage Task](docs/regressions-msmarco-passage.md)
+ [Regressions for the MS MARCO Passage Task with doc2query expansion](docs/regressions-msmarco-passage-doc2query.md)
+ [Regressions for the MS MARCO Passage Task with Doc2query expansion](docs/regressions-msmarco-passage-doc2query.md)
+ [Regressions for the MS MARCO Document Task](docs/regressions-msmarco-doc.md)

Other experiments:
@@ -1,4 +1,6 @@
# Anserini: Regressions for [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v1.5)
# Anserini: Regressions for [CAR17](http://trec-car.cs.unh.edu/) (v1.5)

This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v1.5).

## Indexing

@@ -11,7 +13,7 @@ lucene-index.car17v1.5.pos+docvectors+rawdocs -storePositions -storeDocvectors \
-storeRawDocs >& log.car17v1.5.pos+docvectors+rawdocs &
```

The directory `/path/to/Car17v1.5` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).
The directory `/path/to/car17v1.5` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

@@ -0,0 +1,83 @@
# Anserini: Regressions for [CAR17](http://trec-car.cs.unh.edu/) (v2.0) + Doc2query

This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v2.0), with Doc2query expansions, as proposed in the following paper:

+ Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho. [Document Expansion by Query Prediction.](https://arxiv.org/abs/1904.08375) _arxiv:1904.08375_

These experiments are integrated into Anserini's regression testing framework.
For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-doc2query.md).

## Indexing

Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection JsonCollection \
-generator LuceneDocumentGenerator -threads 30 -input \
/path/to/car17v2.0-doc2query -index \
lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -storePositions \
-storeDocvectors -storeRawDocs >& log.car17v2.0-doc2query.pos+docvectors+rawdocs \
&
```

The directory `/path/to/car17v2.0-doc2query` should be the root directory of the Complex Answer Retrieval (CAR) paragraph corpus (v2.0) that has been augmented with the Doc2query expansions, i.e., `collection_jsonl_expanded_topk10/` as described on [this page](experiments-doc2query.md).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

## Retrieval

The "benchmarkY1-test" topics and qrels (v2.0) are stored in `src/main/resources/topics-and-qrels/`, downloaded from [the CAR website](http://trec-car.cs.unh.edu/datareleases/):

+ `topics.car17v2.0.benchmarkY1test.txt`
+ `qrels.car17v2.0.benchmarkY1test.txt`

Specifically, this is the section-level passage retrieval task with automatic ground truth.

After indexing has completed, you should be able to perform retrieval as follows:

```
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt -output run.car17v2.0-doc2query.bm25.topics.car17v2.0.benchmarkY1test.txt -bm25 &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt -output run.car17v2.0-doc2query.bm25+rm3.topics.car17v2.0.benchmarkY1test.txt -bm25 -rm3 &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt -output run.car17v2.0-doc2query.bm25+ax.topics.car17v2.0.benchmarkY1test.txt -bm25 -axiom -rerankCutoff 20 -axiom.deterministic &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt -output run.car17v2.0-doc2query.ql.topics.car17v2.0.benchmarkY1test.txt -ql &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt -output run.car17v2.0-doc2query.ql+rm3.topics.car17v2.0.benchmarkY1test.txt -ql -rm3 &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt -output run.car17v2.0-doc2query.ql+ax.topics.car17v2.0.benchmarkY1test.txt -ql -axiom -rerankCutoff 20 -axiom.deterministic &
```
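
The six retrieval commands above differ only in the run tag and the model flags. As a minimal sketch (the `search_cmd` helper is hypothetical and not part of Anserini), the following prints each command instead of running it, so the pattern can be checked before launching anything:

```shell
# Hypothetical helper: print the SearchCollection command for one run.
# $1 is the run tag (e.g. bm25+rm3); remaining arguments are the model flags.
search_cmd() {
  tag="$1"; shift
  index="lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs"
  topics="topics.car17v2.0.benchmarkY1test.txt"
  printf '%s\n' "target/appassembler/bin/SearchCollection -topicreader Car -index $index -topics src/main/resources/topics-and-qrels/$topics -output run.car17v2.0-doc2query.$tag.$topics $*"
}

# Dry run: one printed command per model configuration.
search_cmd bm25     -bm25
search_cmd bm25+rm3 -bm25 -rm3
search_cmd bm25+ax  -bm25 -axiom -rerankCutoff 20 -axiom.deterministic
search_cmd ql       -ql
search_cmd ql+rm3   -ql -rm3
search_cmd ql+ax    -ql -axiom -rerankCutoff 20 -axiom.deterministic
```

Each printed line can be prefixed with `nohup` and suffixed with `&` to match the commands listed above.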

Evaluation can be performed using `trec_eval`:

```
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt run.car17v2.0-doc2query.bm25.topics.car17v2.0.benchmarkY1test.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt run.car17v2.0-doc2query.bm25+rm3.topics.car17v2.0.benchmarkY1test.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt run.car17v2.0-doc2query.bm25+ax.topics.car17v2.0.benchmarkY1test.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt run.car17v2.0-doc2query.ql.topics.car17v2.0.benchmarkY1test.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt run.car17v2.0-doc2query.ql+rm3.topics.car17v2.0.benchmarkY1test.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt run.car17v2.0-doc2query.ql+ax.topics.car17v2.0.benchmarkY1test.txt
```
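
The evaluation commands follow the same pattern: only the run tag changes. A small sketch, assuming the run files are named as in the retrieval step (the `eval_cmd` helper is hypothetical), prints the `trec_eval` invocation for each of the six runs:

```shell
# Hypothetical helper: print the trec_eval command for the run tagged $1.
eval_cmd() {
  qrels="src/main/resources/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt"
  run="run.car17v2.0-doc2query.$1.topics.car17v2.0.benchmarkY1test.txt"
  printf '%s\n' "eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank $qrels $run"
}

# Dry run over all six model configurations.
for tag in bm25 bm25+rm3 bm25+ax ql ql+rm3 ql+ax; do
  eval_cmd "$tag"
done
```

Piping the loop's output to `sh` would execute the commands once the run files exist.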

## Effectiveness

With the above commands, you should be able to replicate the following results:

MAP | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
:---------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
[TREC 2017 CAR: benchmarkY1test (v2.0)](http://trec-car.cs.unh.edu/datareleases/)| 0.1807 | 0.1521 | 0.1470 | 0.1752 | 0.1453 | 0.1339 |


RECIP_RANK | BM25 | +RM3 | +Ax | QL | +RM3 | +Ax |
:---------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
[TREC 2017 CAR: benchmarkY1test (v2.0)](http://trec-car.cs.unh.edu/datareleases/)| 0.2750 | 0.2275 | 0.2186 | 0.2653 | 0.2156 | 0.1981 |


@@ -1,4 +1,6 @@
# Anserini: Regressions for [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v2.0)
# Anserini: Regressions for [CAR17](http://trec-car.cs.unh.edu/) (v2.0)

This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v2.0).

## Indexing

@@ -11,7 +13,7 @@ lucene-index.car17v2.0.pos+docvectors+rawdocs -storePositions -storeDocvectors \
-storeRawDocs >& log.car17v2.0.pos+docvectors+rawdocs &
```

The directory `/path/to/Car17v2.0` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).
The directory `/path/to/car17v2.0` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

@@ -1,4 +1,6 @@
# Anserini: Regressions for [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v1.5)
# Anserini: Regressions for [CAR17](http://trec-car.cs.unh.edu/) (v1.5)

This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v1.5).

## Indexing

@@ -8,7 +10,7 @@ Typical indexing command:
${index_cmds}
```

The directory `/path/to/Car17v1.5` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).
The directory `/path/to/car17v1.5` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

@@ -0,0 +1,47 @@
# Anserini: Regressions for [CAR17](http://trec-car.cs.unh.edu/) (v2.0) + Doc2query

This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v2.0), with Doc2query expansions, as proposed in the following paper:

+ Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho. [Document Expansion by Query Prediction.](https://arxiv.org/abs/1904.08375) _arxiv:1904.08375_

These experiments are integrated into Anserini's regression testing framework.
For more complete instructions on how to run end-to-end experiments, refer to [this page](experiments-doc2query.md).

## Indexing

Typical indexing command:

```
${index_cmds}
```

The directory `/path/to/car17v2.0-doc2query` should be the root directory of the Complex Answer Retrieval (CAR) paragraph corpus (v2.0) that has been augmented with the Doc2query expansions, i.e., `collection_jsonl_expanded_topk10/` as described on [this page](experiments-doc2query.md).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

## Retrieval

The "benchmarkY1-test" topics and qrels (v2.0) are stored in `src/main/resources/topics-and-qrels/`, downloaded from [the CAR website](http://trec-car.cs.unh.edu/datareleases/):

+ `topics.car17v2.0.benchmarkY1test.txt`
+ `qrels.car17v2.0.benchmarkY1test.txt`

Specifically, this is the section-level passage retrieval task with automatic ground truth.

After indexing has completed, you should be able to perform retrieval as follows:

```
${ranking_cmds}
```

Evaluation can be performed using `trec_eval`:

```
${eval_cmds}
```

## Effectiveness

With the above commands, you should be able to replicate the following results:

${effectiveness}
@@ -1,4 +1,6 @@
# Anserini: Regressions for [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v2.0)
# Anserini: Regressions for [CAR17](http://trec-car.cs.unh.edu/) (v2.0)

This page documents regression experiments for the [TREC 2017 Complex Answer Retrieval (CAR)](http://trec-car.cs.unh.edu/) section-level passage retrieval task (v2.0).

## Indexing

@@ -8,7 +10,7 @@ Typical indexing command:
${index_cmds}
```

The directory `/path/to/Car17v2.0` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).
The directory `/path/to/car17v2.0` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

@@ -0,0 +1,109 @@
---
name: car17v2.0-doc2query
index_command: target/appassembler/bin/IndexCollection
index_utils_command: target/appassembler/bin/IndexUtils
search_command: target/appassembler/bin/SearchCollection
topic_root: src/main/resources/topics-and-qrels/
qrels_root: src/main/resources/topics-and-qrels/
ranking_root:
generator: LuceneDocumentGenerator
threads: 30
index_options:
  - -storePositions
  - -storeDocvectors
  - -storeRawDocs
topic_reader: Car
input_roots:
  - /tuna1/
  - /scratch2/
input: collections/car/paragraphCorpus.v2.0-expanded-topk10/
index_path: indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs
collection: JsonCollection
index_stats:
  documents: 29794697
  documents (non-empty): 29794694
  total terms: 2541082416
topics:
  - name: "[TREC 2017 CAR: benchmarkY1test (v2.0)](http://trec-car.cs.unh.edu/datareleases/)"
    path: topics.car17v2.0.benchmarkY1test.txt
    qrel: qrels.car17v2.0.benchmarkY1test.txt
evals:
  - command: eval/trec_eval.9.0.4/trec_eval
    params:
      - -c -m map # -c: average over all queries in qrels - this is critical here
    separator: "\t"
    parse_index: 2
    metric: map
    metric_precision: 4
    can_combine: true
  - command: eval/trec_eval.9.0.4/trec_eval
    params:
      - -c -m recip_rank
    separator: "\t"
    parse_index: 2
    metric: recip_rank
    metric_precision: 4
    can_combine: true
models:
  - name: bm25
    display: BM25
    params:
      - -bm25
    results:
      map:
        - 0.1807
      recip_rank:
        - 0.2750
  - name: bm25+rm3
    display: +RM3
    params:
      - -bm25
      - -rm3
    results:
      map:
        - 0.1521
      recip_rank:
        - 0.2275
  - name: bm25+ax
    display: +Ax
    params:
      - -bm25
      - -axiom
      - -rerankCutoff 20
      - -axiom.deterministic
    results:
      map:
        - 0.1470
      recip_rank:
        - 0.2186
  - name: ql
    display: QL
    params:
      - -ql
    results:
      map:
        - 0.1752
      recip_rank:
        - 0.2653
  - name: ql+rm3
    display: +RM3
    params:
      - -ql
      - -rm3
    results:
      map:
        - 0.1453
      recip_rank:
        - 0.2156
  - name: ql+ax
    display: +Ax
    params:
      - -ql
      - -axiom
      - -rerankCutoff 20
      - -axiom.deterministic
    results:
      map:
        - 0.1339
      recip_rank:
        - 0.1981
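
The `evals` entries in the config above describe how a score is pulled out of a `trec_eval` output line: split on `separator` (a tab), take the field at `parse_index`, and round to `metric_precision` digits. The sketch below illustrates that reading in shell. It assumes `parse_index` is zero-based (so index 2 is the third field) and that `trec_eval` prints tab-separated lines of the form metric, `all`, value. The `parse_score` helper and the sample input line are illustrative and not taken from the regression framework itself.

```shell
# Hypothetical helper: read trec_eval output lines on stdin and print the
# score field (zero-based index 2, i.e. awk's $3) with four decimal places.
parse_score() {
  awk -F'\t' '{ printf "%.4f\n", $3 }'
}

# Illustrative input line shaped like trec_eval output for "-m map".
printf 'map                   \tall\t0.1807\n' | parse_score
```

The printed value can then be compared against the expected numbers listed under `results` for each model.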
