Add CAR 2.0 regression (#640)

lintool committed May 11, 2019
1 parent d911bba commit 2ba2b9582ee942aee714301b78015a2ded16da8c
@@ -50,7 +50,8 @@ Note that these regressions capture the "out of the box" experience, based on [_
+ [Experiments on ClueWeb12](docs/experiments-cw12.md)
+ [Experiments on Tweets2011 (MB11 & MB12)](docs/experiments-mb11.md)
+ [Experiments on Tweets2013 (MB13 & MB14)](docs/experiments-mb13.md)
+ [Experiments on Complex Answer Retrieval (CAR) from TREC 2017 (v1.5)](docs/experiments-car17v1.5.md)
+ [Experiments on Complex Answer Retrieval v1.5 (CAR17)](docs/experiments-car17v1.5.md)
+ [Experiments on Complex Answer Retrieval v2.0 (CAR17)](docs/experiments-car17v2.0.md)
+ [Experiments on MS MARCO](docs/experiments-msmarco.md)

Additional regressions:
@@ -1,4 +1,4 @@
# Anserini: Experiments on [Car17](http://trec-car.cs.unh.edu/) (v1.5)
# Anserini: Experiments on [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v1.5)

## Indexing

@@ -11,7 +11,7 @@ lucene-index.car17v1.5.pos+docvectors+rawdocs -storePositions -storeDocvectors \
-storeRawDocs >& log.car17v1.5.pos+docvectors+rawdocs &
```

The directory `/path/to/Car17` should be the root directory of Car17 collection, i.e., `ls /path/to/Car17` should bring up a list of `.cbor` files.
The directory `/path/to/Car17v1.5` should be the root directory of the Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

@@ -20,7 +20,7 @@ For additional details, see explanation of [common indexing options](common-inde
Topics and qrels are stored in `src/main/resources/topics-and-qrels/`, downloaded from NIST:

+ `topics.car17v1.5.test200.txt`: [Topics for the test200 subset (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)
+ `qrel: qrels.car17v1.5.test200.txt`: [adhoc qrels (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)
+ `qrels.car17v1.5.test200.txt`: [adhoc qrels (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)


After indexing has completed, you should be able to perform retrieval as follows:
@@ -0,0 +1,73 @@
# Anserini: Experiments on [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v2.0)

## Indexing

Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection CarCollection \
-generator LuceneDocumentGenerator -threads 40 -input /path/to/car17v2.0 -index \
lucene-index.car17v2.0.pos+docvectors+rawdocs -storePositions -storeDocvectors \
-storeRawDocs >& log.car17v2.0.pos+docvectors+rawdocs &
```

The directory `/path/to/car17v2.0` should be the root directory of the Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

## Retrieval

The "benchmarkY1-test" topics and qrels are stored in `src/main/resources/topics-and-qrels/`, downloaded from [the CAR website](http://trec-car.cs.unh.edu/datareleases/):

+ `topics.car17v2.0.test.pages.cbor-hierarchical.txt`
+ `qrels.car17v2.0.test.pages.cbor-hierarchical.txt`


After indexing has completed, you should be able to perform retrieval as follows:

```
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.test.pages.cbor-hierarchical.txt -output run.car17v2.0.bm25.topics.car17v2.0.test.pages.cbor-hierarchical.txt -bm25 &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.test.pages.cbor-hierarchical.txt -output run.car17v2.0.bm25+rm3.topics.car17v2.0.test.pages.cbor-hierarchical.txt -bm25 -rm3 &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.test.pages.cbor-hierarchical.txt -output run.car17v2.0.bm25+ax.topics.car17v2.0.test.pages.cbor-hierarchical.txt -bm25 -axiom -rerankCutoff 20 -axiom.deterministic &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.test.pages.cbor-hierarchical.txt -output run.car17v2.0.ql.topics.car17v2.0.test.pages.cbor-hierarchical.txt -ql &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.test.pages.cbor-hierarchical.txt -output run.car17v2.0.ql+rm3.topics.car17v2.0.test.pages.cbor-hierarchical.txt -ql -rm3 &
nohup target/appassembler/bin/SearchCollection -topicreader Car -index lucene-index.car17v2.0.pos+docvectors+rawdocs -topics src/main/resources/topics-and-qrels/topics.car17v2.0.test.pages.cbor-hierarchical.txt -output run.car17v2.0.ql+ax.topics.car17v2.0.test.pages.cbor-hierarchical.txt -ql -axiom -rerankCutoff 20 -axiom.deterministic &
```
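The six retrieval commands above differ only in the model flags and the name of the output run file. As a rough sketch of that pattern (the helper `search_command` and the constant names are hypothetical and are not taken from Anserini's regression script), the commands could be assembled like this:

```python
# Hypothetical helper for illustration; not Anserini's actual generator.
SEARCH_CMD = "target/appassembler/bin/SearchCollection"
INDEX = "lucene-index.car17v2.0.pos+docvectors+rawdocs"
TOPIC_ROOT = "src/main/resources/topics-and-qrels/"
TOPICS = "topics.car17v2.0.test.pages.cbor-hierarchical.txt"

# Model name -> command-line flags, copied from the commands listed above.
MODELS = {
    "bm25": ["-bm25"],
    "bm25+rm3": ["-bm25", "-rm3"],
    "bm25+ax": ["-bm25", "-axiom", "-rerankCutoff 20", "-axiom.deterministic"],
    "ql": ["-ql"],
    "ql+rm3": ["-ql", "-rm3"],
    "ql+ax": ["-ql", "-axiom", "-rerankCutoff 20", "-axiom.deterministic"],
}


def search_command(model, flags):
    """Build one background retrieval command for the given model."""
    output = "run.car17v2.0.{}.{}".format(model, TOPICS)
    parts = [
        "nohup", SEARCH_CMD,
        "-topicreader", "Car",
        "-index", INDEX,
        "-topics", TOPIC_ROOT + TOPICS,
        "-output", output,
    ] + flags + ["&"]
    return " ".join(parts)


commands = [search_command(model, flags) for model, flags in MODELS.items()]
for command in commands:
    print(command)
```

Keeping the model flags in one mapping means the run file name and the flags cannot drift apart when a model is added.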

Evaluation can be performed using `trec_eval`:

```
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.test.pages.cbor-hierarchical.txt run.car17v2.0.bm25.topics.car17v2.0.test.pages.cbor-hierarchical.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.test.pages.cbor-hierarchical.txt run.car17v2.0.bm25+rm3.topics.car17v2.0.test.pages.cbor-hierarchical.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.test.pages.cbor-hierarchical.txt run.car17v2.0.bm25+ax.topics.car17v2.0.test.pages.cbor-hierarchical.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.test.pages.cbor-hierarchical.txt run.car17v2.0.ql.topics.car17v2.0.test.pages.cbor-hierarchical.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.test.pages.cbor-hierarchical.txt run.car17v2.0.ql+rm3.topics.car17v2.0.test.pages.cbor-hierarchical.txt
eval/trec_eval.9.0.4/trec_eval -c -m map -c -m recip_rank src/main/resources/topics-and-qrels/qrels.car17v2.0.test.pages.cbor-hierarchical.txt run.car17v2.0.ql+ax.topics.car17v2.0.test.pages.cbor-hierarchical.txt
```
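`trec_eval` prints one line per requested measure, and the regression configuration for this collection reads the score from the third tab-separated field, rounded to four decimal places. A minimal sketch of that parsing step follows; the function `parse_metric` is hypothetical, and the sample output string is an illustration built from the scores reported in this document rather than a captured log:

```python
def parse_metric(output, metric, separator="\t", parse_index=2, precision=4):
    """Return the score for `metric` from trec_eval-style output text."""
    for line in output.splitlines():
        fields = line.split(separator)
        # The measure name is padded with spaces, so strip it before comparing.
        if len(fields) > parse_index and fields[0].strip() == metric:
            return round(float(fields[parse_index]), precision)
    raise ValueError("metric {!r} not found in output".format(metric))


# Illustrative output only: measure name, the literal "all", then the score.
sample_output = (
    "map                   \tall\t0.1528\n"
    "recip_rank            \tall\t0.2294\n"
)

map_score = parse_metric(sample_output, "map")
rr_score = parse_metric(sample_output, "recip_rank")
print(map_score, rr_score)
```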

## Effectiveness

With the above commands, you should be able to replicate the following results:

MAP | BM25 | BM25+RM3 | BM25+AX | QL | QL+RM3 | QL+AX |
:---------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
benchmarkY1test | 0.1528 | 0.1270 | 0.1342 | 0.1353 | 0.1065 | 0.1054 |


RECIP_RANK | BM25 | BM25+RM3 | BM25+AX | QL | QL+RM3 | QL+AX |
:---------------------------------------|-----------|-----------|-----------|-----------|-----------|-----------|
benchmarkY1test | 0.2294 | 0.1903 | 0.1943 | 0.1989 | 0.1577 | 0.1554 |


@@ -117,6 +117,8 @@ def verify_index(yaml_data, build_index=True, dry_run=False):
            stat = line.split(':')[0]
            if stat in yaml_data['index_stats']:
                value = int(line.split(':')[1])
                if value != yaml_data['index_stats'][stat]:
                    print('{}: expected={}, actual={}'.format(stat, yaml_data['index_stats'][stat], value))
                assert value == yaml_data['index_stats'][stat]
                logger.info(line)
        logger.info('='*10+'Verifying Index Succeed'+'='*10)
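The change in this hunk prints the expected and actual value of an index statistic before the assertion fails, which makes a mismatch easier to diagnose. The standalone sketch below illustrates the same idea but collects every mismatch instead of stopping at the first one; `check_index_stats` is a hypothetical name, and the sample lines assume a `name: value` output format with one deliberately wrong value:

```python
def check_index_stats(lines, expected):
    """Compare 'name: value' lines against expected stats; return mismatches."""
    mismatches = []
    for line in lines:
        stat, _, raw_value = line.partition(":")
        if stat in expected:
            value = int(raw_value)  # int() tolerates surrounding whitespace
            if value != expected[stat]:
                mismatches.append((stat, expected[stat], value))
    return mismatches


# Expected values as recorded for the car17v2.0 index.
expected = {
    "documents": 29794689,
    "documents (non-empty)": 29791041,
    "total terms": 1249740109,
}

# Assumed output format; the last value is altered on purpose.
sample_lines = [
    "documents: 29794689",
    "documents (non-empty): 29791041",
    "total terms: 1249740100",
]

mismatches = check_index_stats(sample_lines, expected)
for stat, want, got in mismatches:
    print("{}: expected={}, actual={}".format(stat, want, got))
```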
@@ -1,4 +1,4 @@
# Anserini: Experiments on [Car17](http://trec-car.cs.unh.edu/) (v1.5)
# Anserini: Experiments on [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v1.5)

## Indexing

@@ -8,7 +8,7 @@ Typical indexing command:
${index_cmds}
```

The directory `/path/to/Car17` should be the root directory of Car17 collection, i.e., `ls /path/to/Car17` should bring up a list of `.cbor` files.
The directory `/path/to/Car17v1.5` should be the root directory of the Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

@@ -17,7 +17,7 @@ For additional details, see explanation of [common indexing options](common-inde
Topics and qrels are stored in `src/main/resources/topics-and-qrels/`, downloaded from NIST:

+ `topics.car17v1.5.test200.txt`: [Topics for the test200 subset (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)
+ `qrel: qrels.car17v1.5.test200.txt`: [adhoc qrels (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)
+ `qrels.car17v1.5.test200.txt`: [adhoc qrels (TREC 2017 Complex Answer Retrieval Track)](http://trec-car.cs.unh.edu/datareleases/v1.5/test200-v1.5.tar.xz)


After indexing has completed, you should be able to perform retrieval as follows:
@@ -0,0 +1,39 @@
# Anserini: Experiments on [TREC 2017 CAR](http://trec-car.cs.unh.edu/) (v2.0)

## Indexing

Typical indexing command:

```
${index_cmds}
```

The directory `/path/to/car17v2.0` should be the root directory of the Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).

For additional details, see explanation of [common indexing options](common-indexing-options.md).

## Retrieval

The "benchmarkY1-test" topics and qrels are stored in `src/main/resources/topics-and-qrels/`, downloaded from [the CAR website](http://trec-car.cs.unh.edu/datareleases/):

+ `topics.car17v2.0.test.pages.cbor-hierarchical.txt`
+ `qrels.car17v2.0.test.pages.cbor-hierarchical.txt`


After indexing has completed, you should be able to perform retrieval as follows:

```
${ranking_cmds}
```

Evaluation can be performed using `trec_eval`:

```
${eval_cmds}
```

## Effectiveness

With the above commands, you should be able to replicate the following results:

${effectiveness}
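The `${index_cmds}`, `${ranking_cmds}`, `${eval_cmds}`, and `${effectiveness}` placeholders in this template use the same syntax as Python's `string.Template`. The sketch below assumes that mechanism purely for illustration; the script that actually fills the template is not part of this commit, and the template text and values here are shortened, hypothetical stand-ins:

```python
from string import Template

# Shortened stand-in for the documentation template (hypothetical text).
template_text = """## Indexing

${index_cmds}

## Effectiveness

${effectiveness}
"""

# Hypothetical values; in practice they would be derived from the YAML config.
values = {
    "index_cmds": "target/appassembler/bin/IndexCollection -collection CarCollection",
    "effectiveness": "MAP | BM25 |\nbenchmarkY1test | 0.1528 |",
}

# substitute() raises KeyError if a placeholder has no value, which catches
# a template and configuration that have drifted apart.
rendered = Template(template_text).substitute(values)
print(rendered)
```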
@@ -0,0 +1,103 @@
---
name: car17v2.0
index_command: target/appassembler/bin/IndexCollection
index_utils_command: target/appassembler/bin/IndexUtils
search_command: target/appassembler/bin/SearchCollection
topic_root: src/main/resources/topics-and-qrels/
qrels_root: src/main/resources/topics-and-qrels/
ranking_root:
generator: LuceneDocumentGenerator
threads: 40
index_options:
  - -storePositions
  - -storeDocvectors
  - -storeRawDocs
topic_reader: Car
input_roots:
  - /tuna1/
  - /scratch2/
input: collections/car/paragraphCorpus.v2.0/
index_path: indexes/lucene-index.car17v2.0.pos+docvectors+rawdocs
collection: CarCollection
index_stats:
  documents: 29794689
  documents (non-empty): 29791041
  total terms: 1249740109
topics:
  - name: "benchmarkY1test"
    path: topics.car17v2.0.test.pages.cbor-hierarchical.txt
    qrel: qrels.car17v2.0.test.pages.cbor-hierarchical.txt
evals:
  - command: eval/trec_eval.9.0.4/trec_eval
    params:
      - -c -m map # -c: average over all queries in qrels - this is critical here
    separator: "\t"
    parse_index: 2
    metric: map
    metric_precision: 4
    can_combine: true
  - command: eval/trec_eval.9.0.4/trec_eval
    params:
      - -c -m recip_rank
    separator: "\t"
    parse_index: 2
    metric: recip_rank
    metric_precision: 4
    can_combine: true
models:
  - name: bm25
    params:
      - -bm25
    results:
      map:
        - 0.1528
      recip_rank:
        - 0.2294
  - name: bm25+rm3
    params:
      - -bm25
      - -rm3
    results:
      map:
        - 0.1270
      recip_rank:
        - 0.1903
  - name: bm25+ax
    params:
      - -bm25
      - -axiom
      - -rerankCutoff 20
      - -axiom.deterministic
    results:
      map:
        - 0.1342
      recip_rank:
        - 0.1943
  - name: ql
    params:
      - -ql
    results:
      map:
        - 0.1353
      recip_rank:
        - 0.1989
  - name: ql+rm3
    params:
      - -ql
      - -rm3
    results:
      map:
        - 0.1065
      recip_rank:
        - 0.1577
  - name: ql+ax
    params:
      - -ql
      - -axiom
      - -rerankCutoff 20
      - -axiom.deterministic
    results:
      map:
        - 0.1054
      recip_rank:
        - 0.1554