
Commit

Merge branch 'master' of github.com:castorini/pyserini into feature/msp-index
crystina-z committed May 18, 2022
2 parents 950fc2f + a2c4eeb commit 7b099d5
Showing 161 changed files with 7,872 additions and 1,367 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -646,6 +646,7 @@ We provide a number of [pre-built indexes](docs/prebuilt-indexes.md) that direct

## Release History

+ v0.16.1: May 12, 2022 [[Release Notes](docs/release-notes/release-notes-v0.16.1.md)]
+ v0.16.0: March 1, 2022 [[Release Notes](docs/release-notes/release-notes-v0.16.0.md)]
+ v0.15.0: January 21, 2022 [[Release Notes](docs/release-notes/release-notes-v0.15.0.md)]
+ v0.14.0: November 8, 2021 [[Release Notes](docs/release-notes/release-notes-v0.14.0.md)]
2 changes: 1 addition & 1 deletion docs/experiments-ance-prf.md
@@ -44,7 +44,7 @@ Second, it takes two more parameters, one `--ance-prf-encoder` which points to t

The Lucene index needs to be built with `--storeRaw` enabled.
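
For reference, here is a minimal indexing sketch with raw storage enabled; the collection and index paths are placeholders, so substitute the corpus you are actually indexing:

```bash
# Illustrative only: index a JSON collection with --storeRaw so the raw
# documents are available at retrieval time (paths are placeholders).
python -m pyserini.index.lucene \
  --collection JsonCollection \
  --input collections/my-corpus \
  --index indexes/lucene-index.my-corpus \
  --generator DefaultLuceneDocumentGenerator \
  --threads 8 \
  --storePositions --storeDocvectors --storeRaw
```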

To reproduce `TREC DL 2019 Passage`, use the command below, change `--ance-prf-encoder` to the path that stores the checkpoint:
To reproduce `TREC DL 2019 Passage`, use the command below, changing `--ance-prf-encoder` to the path that stores the checkpoint. (Check whether `merges.txt` exists in your checkpoint directory; if it does not, download the file from [roberta-base](https://huggingface.co/roberta-base/tree/main) and add it to the checkpoint directory.)
```
$ python -m pyserini.dsearch --topics dl19-passage \
--index msmarco-passage-ance-bf \
```
3 changes: 3 additions & 0 deletions docs/experiments-dkrr.md
@@ -136,6 +136,9 @@ The expected results are as follows, shown in the "ours" column:

For reference, reported results from the paper (Table 7) are shown in the "orig" column.

## Hybrid sparse-dense retrieval with GAR-T5

Running hybrid sparse-dense retrieval with DKRR and [GAR-T5](https://github.com/castorini/pyserini/blob/master/docs/experiments-gar-t5.md) is detailed in [experiments-gar-t5.md](https://github.com/castorini/pyserini/blob/master/docs/experiments-gar-t5.md#hybrid-sparse-dense-retrieval-with-dkrr).

## Reproduction Log[*](reproducibility.md)

93 changes: 73 additions & 20 deletions docs/experiments-gar-t5.md
@@ -10,57 +10,110 @@ Download the dataset from HuggingFace and use script to process it to a .tsv fil
```bash
export ANSERINI=<path to anserini>

python scripts/gar/query_augmentation_tsv.py --dataset <nq or trivia> --data_split <validation or test> --output_path <default is augmented_topics.tsv> --sentences <optional> --titles <optional> --answers <optional>
python scripts/gar/query_augmentation_tsv.py \
--dataset <nq or trivia> \
--data_split <validation or test> \
--output_path <default is augmented_topics.tsv> \
--sentences <optional> \
--titles <optional> \
--answers <optional>
```
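
For example, a concrete invocation for answer-augmented NQ test queries might look like the following (the `--answers` switch is assumed to select the answer feature; check the script's `--help` if its flags differ):

```bash
# Illustrative: generate answer-augmented queries for the NQ test split,
# writing them to the default augmented_topics.tsv.
python scripts/gar/query_augmentation_tsv.py \
  --dataset nq \
  --data_split test \
  --answers
```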

## Evaluation
## GAR-T5 enhanced retrieval evaluation
To evaluate the augmented queries, we concatenate and convert them into .tsv format so that we can run BM25 search with Pyserini; the resulting TREC run is then converted into .json format for evaluation.

Without specifying the output path, the default output will be an `augmented_topics.tsv` file in the working directory.

Once we have the .tsv file, we can proceed to run search and evaluation:

```bash
python -m pyserini.search --topics augmented_topics.tsv --index wikipedia-dpr --output runs/gar-bart-run.trec --batch-size 70 --threads 70

python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run --topics <nq-test or dpr-trivia-test> --index wikipedia-dpr --input runs/gar-bart-run.trec --output runs/gar-bart-run.json
python -m pyserini.search \
--topics augmented_topics.tsv \
--index wikipedia-dpr \
--output runs/gar-t5-run.trec \
--batch-size 70 \
--threads 70

python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--topics <nq-test, nq-dev, dpr-trivia-dev or dpr-trivia-test> \
--index wikipedia-dpr \
--input runs/gar-t5-run.trec \
--output runs/gar-t5-run.json
```
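
Note that the `--topics` argument in the conversion step should match the dataset and split used to generate the augmented queries; for example, for queries built from the NQ test split:

```bash
# Conversion step with --topics matching the NQ test split.
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
  --topics nq-test \
  --index wikipedia-dpr \
  --input runs/gar-t5-run.trec \
  --output runs/gar-t5-run.json
```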

To run reciprocal rank fusion (RRF), you will need all three TREC run files (answers, titles, sentences):
```bash
python -m $ANSERINI/src/main/python/fusion.py --runs <path to answers.trec> <path to sentences.trec> <path to titles.trec> --out <output path>
python -m pyserini.fusion \
--runs <path to answers.trec> <path to sentences.trec> <path to titles.trec> \
--output <output path fusion.trec>
```
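
As a concrete sketch, with illustrative per-feature run names and the fused run written to the file used in the hybrid step below:

```bash
# Per-feature run names here are illustrative; the fused output feeds the
# DKRR hybrid retrieval step later in this guide.
python -m pyserini.fusion \
  --runs runs/gar-t5-answers.trec runs/gar-t5-sentences.trec runs/gar-t5-titles.trec \
  --output runs/gar-t5-run-fusion.trec
```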

To evaluate the run:
```
python -m pyserini.eval.evaluate_dpr_retrieval --retrieval runs/gar-bart-run.json --topk 1 5 10 20 50 100 200 300 500 1000
python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/gar-t5-run.json \
--topk 1 5 10 20 50 100 200 300 500 1000
```

This should give you the topk scores as you wanted

This should give you the top-k scores shown below.

### Dev Scores from Gar-T5
### Dev Scores from GAR-T5
| Dataset | Features | Top1 | Top5 | Top10 | Top20 | Top50 | Top100 | Top200 | Top300 | Top500 | Top1000 |
|:--------:|:--------:|:-----:|:-----:|:-----:|:-----:|:-----:|:------:|:------:|:------:|:------:|:-------:|
| NQ | answer | 40.33 | 57.76 | 64.26 | 70.38 | 76.96 | 81.20 | 84.33 | 85.91 | 87.83 | 89.94 |
| NQ | sentence | 42.00 | 57.78 | 64.12 | 69.59 | 75.62 | 79.67 | 83.03 | 85.04 | 86.87 | 89.00 |
| NQ | title | 32.15 | 50.66 | 58.68 | 65.76 | 73.30 | 78.25 | 82.19 | 84.45 | 85.91 | 88.01 |
| NQ | title | 32.15 | 50.66 | 58.68 | 65.76 | 73.30 | 78.25 | 82.19 | 84.15 | 85.91 | 88.01 |
| NQ | fusion | 45.44 | 64.89 | 71.82 | 77.16 | 82.55 | 85.34 | 88.00 | 89.15 | 90.13 | 91.74 |
| TriviaQA | answer | 55.92 | 70.39 | 74.77 | 78.39 | 82.36 | 84.55 | 86.23 | 87.42 | 88.36 | 89.34 |
| TriviaQA | sentence | 49.17 | 63.30 | 68.42 | 72.57 | 77.55 | 80.67 | 83.33 | 84.93 | 86.22 | 87.78 |
| TriviaQA | title | 47.58 | 61.31 | 66.59 | 71.57 | 76.79 | 80.15 | 82.95 | 84.18 | 85.65 | 87.30 |
| TriviaQA | fusion | 59.48 | 73.43 | 77.29 | 80.43 | 83.80 | 85.60 | 87.11 | 87.81 | 88.70 | 89.68 |

### Test Scores from Gar-T5
### Test Scores from GAR-T5
| Dataset | Features | Top1 | Top5 | Top10 | Top20 | Top50 | Top100 | Top200 | Top300 | Top500 | Top1000 |
|:--------:|:--------:|:-----:|:-----:|:-----:|:-----:|:-----:|:------:|:------:|:------:|:------:|:-------:|
| NQ | answer | 40.30 | 57.51 | 64.24 | 70.11 | 77.23 | 81.75 | 85.10 | 85.79 | 88.39 | 90.80 |
| NQ | sentence | 40.30 | 57.45 | 64.27 | 69.81 | 77.34 | 81.50 | 85.26 | 85.76 | 88.12 | 90.17 |
| NQ | title | 32.11 | 51.66 | 59.47 | 66.90 | 74.85 | 79.17 | 82.96 | 84.96 | 86.70 | 88.95 |
| NQ | fusion | 45.35 | 64.63 | 71.75 | 77.17 | 83.41 | 86.90 | 89.14 | 89.67 | 91.63 | 92.91 |
| TriviaQA | answer | 55.89 | 69.57 | 73.96 | 77.95 | 82.14 | 84.76 | 86.86 | 86.97 | 88.60 | 89.56 |
| TriviaQA | sentence | 48.96 | 62.68 | 68.05 | 72.47 | 77.51 | 80.84 | 83.54 | 84.47 | 86.23 | 87.93 |
| TriviaQA | title | 47.70 | 61.28 | 66.37 | 71.24 | 76.59 | 80.04 | 82.90 | 84.51 | 85.96 | 87.64 |
| TriviaQA | fusion | 59.00 | 72.82 | 76.93 | 80.66 | 84.10 | 85.95 | 87.39 | 87.62 | 89.07 | 90.06 |
| NQ | answer | 40.30 | 57.51 | 64.24 | 70.11 | 77.23 | 81.75 | 85.10 | 86.68 | 88.39 | 90.80 |
| NQ | sentence | 40.30 | 57.45 | 64.27 | 69.81 | 77.34 | 81.50 | 85.26 | 86.73 | 88.12 | 90.17 |
| NQ | title | 32.11 | 51.66 | 59.47 | 66.90 | 74.85 | 79.17 | 82.96 | 84.65 | 86.70 | 88.95 |
| NQ | fusion | 45.35 | 64.63 | 71.75 | 77.17 | 83.41 | 86.90 | 89.14 | 90.30 | 91.63 | 92.91 |
| TriviaQA | answer | 55.89 | 69.57 | 73.96 | 77.95 | 82.14 | 84.76 | 86.86 | 87.66 | 88.60 | 89.56 |
| TriviaQA | sentence | 48.96 | 62.68 | 68.05 | 72.47 | 77.51 | 80.84 | 83.54 | 85.01 | 86.23 | 87.93 |
| TriviaQA | title | 47.70 | 61.28 | 66.37 | 71.24 | 76.59 | 80.04 | 82.90 | 84.49 | 85.96 | 87.64 |
| TriviaQA | fusion | 59.00 | 72.82 | 76.93 | 80.66 | 84.10 | 85.95 | 87.39 | 88.15 | 89.07 | 90.06 |

## Hybrid sparse-dense retrieval with DKRR

To run hybrid sparse-dense retrieval with GAR-T5 and [DKRR](https://github.com/castorini/pyserini/blob/master/docs/experiments-dkrr.md):
```
python -m pyserini.fusion \
--runs runs/gar-t5-run-fusion.trec runs/run.dpr-dkrr.trec \
--output runs/run.dkrr.gar.hybrid.trec
python -m pyserini.eval.convert_trec_run_to_dpr_retrieval_run \
--topics <nq-test, nq-dev, dpr-trivia-dev or dpr-trivia-test> \
--index wikipedia-dpr \
--input runs/run.dkrr.gar.hybrid.trec \
--output runs/run.dkrr.gar.hybrid.json
python -m pyserini.eval.evaluate_dpr_retrieval \
--retrieval runs/run.dkrr.gar.hybrid.json \
--topk 1 5 10 20 50 100 200 300 500 1000
```

The scores for this hybrid retrieval are as follows:

### Dev Scores
| Dataset | Features | Top1 | Top5 | Top10 | Top20 | Top50 | Top100 | Top200 | Top300 | Top500 | Top1000 |
|:--------:|:------------------:|:-----:|:-----:|:-----:|:-----:|:-----:|:------:|:------:|:------:|:------:|:-------:|
| NQ | hybrid (with DKRR) | 53.36 | 73.66 | 79.92 | 84.46 | 88.24 | 90.22 | 91.42 | 92.10 | 92.65 | 93.26 |
| TriviaQA | hybrid (with DKRR) | 65.81 | 79.40 | 82.34 | 84.69 | 86.87 | 88.05 | 88.99 | 89.52 | 90.05 | 90.61 |

### Test Scores
| Dataset | Features | Top1 | Top5 | Top10 | Top20 | Top50 | Top100 | Top200 | Top300 | Top500 | Top1000 |
|:--------:|:------------------:|:-----:|:-----:|:-----:|:-----:|:-----:|:------:|:------:|:------:|:------:|:-------:|
| NQ | hybrid (with DKRR) | 53.07 | 74.60 | 80.25 | 84.90 | 88.89 | 90.86 | 91.99 | 92.66 | 93.35 | 94.18 |
| TriviaQA | hybrid (with DKRR) | 64.71 | 78.62 | 82.55 | 85.01 | 87.20 | 88.41 | 89.36 | 89.85 | 90.29 | 90.83 |

## Reproduction Log[*](reproducibility.md)

+ Results reproduced by [@manveertamber](https://github.com/manveertamber) on 2022-05-04 (commit [`1facc72`](https://github.com/castorini/pyserini/commit/1facc72b3c8313149c763b76502f43352efaf974))
4 changes: 3 additions & 1 deletion docs/experiments-msmarco-doc.md
@@ -169,4 +169,6 @@ We can see that Anserini's (tuned) BM25 baseline is already much better than the
+ Results reproduced by [@AceZhan](https://github.com/AceZhan) on 2022-01-14 (commit [`68be809`](https://github.com/castorini/pyserini/commit/68be8090b8553fc6eaf352ac690a6de9d3dc82dd))
+ Results reproduced by [@jh8liang](https://github.com/jh8liang) on 2022-02-06 (commit [`e03e068`](https://github.com/castorini/pyserini/commit/e03e06880ad4f6d67a1666c1dd45ce4250adc95d))
+ Results reproduced by [@HAKSOAT](https://github.com/HAKSOAT) on 2022-03-11 (commit [`7796685`](https://github.com/castorini/pyserini/commit/77966851755163e36489544fb08f73171e98103f))
+ Results reproduced by [@jasper-xian](https://github.com/jasper-xian) on 2022-03-27 (commit [`5668edd`](https://github.com/castorini/pyserini/commit/5668edd6f1e61e9c57d600d41d3d1f58b775d371))
+ Results reproduced by [@jx3yang](https://github.com/jx3yang) on 2022-04-25 (commit [`53333e0`](https://github.com/castorini/pyserini/commit/53333e0fb77371e049e24b10da3a20646c7b5af7))
+ Results reproduced by [@alvind1](https://github.com/alvind1) on 2022-05-05 (commit [`244828f`](https://github.com/castorini/pyserini/commit/244828f6d6d70a7405e0906a700a5ce8ef0def15))