Rename dense index (castorini#359)
* rename dense index

* underscores to dashes
MXueguang committed Feb 10, 2021
1 parent b173b32 commit 9936cfc
Showing 14 changed files with 149 additions and 149 deletions.
60 changes: 30 additions & 30 deletions docs/experiments-dpr.md
@@ -12,15 +12,15 @@ You'll need a Pyserini [development installation](https://github.com/castorini/p
Run DPR retrieval with the Wikipedia brute-force index:

```bash
-$ python -m pyserini.dsearch --topics dpr_nq_test \
+$ python -m pyserini.dsearch --topics dpr-nq-test \
--index wikipedia-dpr-multi-bf \
--output runs/run.dpr.nq.multi.bf.trec \
--batch 36 --threads 12
```
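This commit renames topic keys from underscores to dashes, so older command lines with `dpr_nq_test`-style names will stop resolving. A quick way to confirm a renamed key works in your installation is to load it directly; a minimal sketch, assuming `get_topics` from `pyserini.search`:

```python
from pyserini.search import get_topics

# After this commit, topic keys use dashes; an unknown key raises an error.
topics = get_topics('dpr-nq-test')
print(f'{len(topics)} queries loaded')
```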

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_nq_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-nq-test \
--index wikipedia-dpr \
--input runs/run.dpr.nq.multi.bf.trec \
--output runs/run.dpr.nq.multi.bf.json
@@ -37,15 +37,15 @@ Top100 accuracy: 0.8609418282548477
### BM25 retrieval

```bash
-$ python -m pyserini.search --topics dpr_nq_test \
+$ python -m pyserini.search --topics dpr-nq-test \
--index wikipedia-dpr \
--output runs/run.nq-test.bm25.trec
```
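The same BM25 run can be reproduced from Python; a minimal sketch, assuming `SimpleSearcher` and the prebuilt `wikipedia-dpr` index name used by the command above:

```python
from pyserini.search import SimpleSearcher

# Load the prebuilt sparse Wikipedia index and issue a single query.
searcher = SimpleSearcher.from_prebuilt_index('wikipedia-dpr')
hits = searcher.search('who got the first nobel prize in physics', k=10)
for hit in hits[:3]:
    print(hit.docid, round(hit.score, 4))
```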


To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_nq_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-nq-test \
--index wikipedia-dpr \
--input runs/run.nq-test.bm25.trec \
--output runs/run.nq-test.bm25.json
@@ -69,13 +69,13 @@ $ python -m pyserini.hsearch dense --index wikipedia-dpr-multi-bf \
--batch-size 72 --threads 72 \
sparse --index wikipedia-dpr \
fusion --alpha 1.3 \
-run --topics dpr_nq_test \
+run --topics dpr-nq-test \
--output runs/run.nq-test.dpr.bf.bm25.trec
```
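Here `--alpha` weights the sparse (BM25) scores against the dense scores before the two ranked lists are merged. A minimal sketch of that weighted-sum interpolation, assuming per-document score dictionaries (Pyserini's exact normalization may differ):

```python
def fuse(dense_scores, sparse_scores, alpha):
    """Weighted-sum fusion over the union of retrieved docs:
    fused = dense + alpha * sparse (a sketch, not Pyserini's exact code)."""
    docs = set(dense_scores) | set(sparse_scores)
    return {d: dense_scores.get(d, 0.0) + alpha * sparse_scores.get(d, 0.0)
            for d in docs}

fused = fuse({'doc1': 80.2}, {'doc1': 12.5, 'doc2': 11.0}, alpha=1.3)
# -> {'doc1': 96.45, 'doc2': 14.3}
```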

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_nq_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-nq-test \
--index wikipedia-dpr \
--input runs/run.nq-test.dpr.bf.bm25.trec \
--output runs/run.nq-test.dpr.bf.bm25.json
@@ -93,15 +93,15 @@ Top100 accuracy: 0.8858725761772853
Run DPR retrieval with the Wikipedia brute-force index:

```bash
-$ python -m pyserini.dsearch --topics dpr_trivia_test \
+$ python -m pyserini.dsearch --topics dpr-trivia-test \
--index wikipedia-dpr-multi-bf \
--output runs/run.dpr.trivia.multi.bf.trec \
--batch 72 --threads 72
```
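The dense side can also be scripted; a minimal sketch, where the `SimpleDenseSearcher` and `DprQueryEncoder` names are assumptions based on the `pyserini.dsearch` module invoked above:

```python
from pyserini.dsearch import SimpleDenseSearcher, DprQueryEncoder

# Assumed API: encode the question, then search the brute-force DPR index.
encoder = DprQueryEncoder('facebook/dpr-question_encoder-multiset-base')
searcher = SimpleDenseSearcher.from_prebuilt_index('wikipedia-dpr-multi-bf', encoder)
hits = searcher.search('who wrote the declaration of independence', k=10)
print(hits[0].docid, hits[0].score)
```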

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_trivia_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-trivia-test \
--index wikipedia-dpr \
--input runs/run.dpr.trivia.multi.bf.trec \
--output runs/run.dpr.trivia.multi.bf.json
@@ -118,15 +118,15 @@ Top100 accuracy: 0.847874127110404
### BM25 retrieval

```bash
-$ python -m pyserini.search --topics dpr_trivia_test \
+$ python -m pyserini.search --topics dpr-trivia-test \
--index wikipedia-dpr \
--output runs/run.trivia-test.bm25.trec
```


To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_trivia_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-trivia-test \
--index wikipedia-dpr \
--input runs/run.trivia-test.bm25.trec \
--output runs/run.trivia-test.bm25.json
@@ -150,13 +150,13 @@ $ python -m pyserini.hsearch dense --index wikipedia-dpr-multi-bf \
--batch-size 72 --threads 72 \
sparse --index wikipedia-dpr \
fusion --alpha 0.95 \
-run --topics dpr_trivia_test \
+run --topics dpr-trivia-test \
--output runs/run.trivia-test.dpr.bf.bm25.trec
```

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_trivia_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-trivia-test \
--index wikipedia-dpr \
--input runs/run.trivia-test.dpr.bf.bm25.trec \
--output runs/run.trivia-test.dpr.bf.bm25.json
@@ -174,15 +174,15 @@ Top100 accuracy: 0.8654645098559179
Run DPR retrieval with the Wikipedia brute-force index:

```bash
-$ python -m pyserini.dsearch --topics dpr_wq_test \
+$ python -m pyserini.dsearch --topics dpr-wq-test \
--index wikipedia-dpr-multi-bf \
--output runs/run.dpr.wq.multi.bf.trec \
--batch 72 --threads 72
```

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_wq_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-wq-test \
--index wikipedia-dpr \
--input runs/run.dpr.wq.multi.bf.trec \
--output runs/run.dpr.wq.multi.bf.json
@@ -199,15 +199,15 @@ Top100 accuracy: 0.8297244094488189
### BM25 retrieval

```bash
-$ python -m pyserini.search --topics dpr_wq_test \
+$ python -m pyserini.search --topics dpr-wq-test \
--index wikipedia-dpr \
--output runs/run.wq-test.bm25.trec
```


To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_wq_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-wq-test \
--index wikipedia-dpr \
--input runs/run.wq-test.bm25.trec \
--output runs/run.wq-test.bm25.json
@@ -231,13 +231,13 @@ $ python -m pyserini.hsearch dense --index wikipedia-dpr-multi-bf \
--batch-size 72 --threads 72 \
sparse --index wikipedia-dpr \
fusion --alpha 0.95 \
-run --topics dpr_wq_test \
+run --topics dpr-wq-test \
--output runs/run.wq-test.dpr.bf.bm25.trec
```

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_wq_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-wq-test \
--index wikipedia-dpr \
--input runs/run.wq-test.dpr.bf.bm25.trec \
--output runs/run.wq-test.dpr.bf.bm25.json
@@ -255,15 +255,15 @@ Top100 accuracy: 0.843996062992126
Run DPR retrieval with the Wikipedia brute-force index:

```bash
-$ python -m pyserini.dsearch --topics dpr_curated_test \
+$ python -m pyserini.dsearch --topics dpr-curated-test \
--index wikipedia-dpr-multi-bf \
--output runs/run.dpr.curated.multi.bf.trec \
--batch 72 --threads 72
```

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_curated_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-curated-test \
--index wikipedia-dpr \
--input runs/run.dpr.curated.multi.bf.trec \
--output runs/run.dpr.curated.multi.bf.json
@@ -280,15 +280,15 @@ Top100 accuracy: 0.9337175792507204
### BM25 retrieval

```bash
-$ python -m pyserini.search --topics dpr_curated_test \
+$ python -m pyserini.search --topics dpr-curated-test \
--index wikipedia-dpr \
--output runs/run.curated-test.bm25.trec
```


To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_curated_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-curated-test \
--index wikipedia-dpr \
--input runs/run.curated-test.bm25.trec \
--output runs/run.curated-test.bm25.json
@@ -312,13 +312,13 @@ $ python -m pyserini.hsearch dense --index wikipedia-dpr-multi-bf \
--batch-size 72 --threads 72 \
sparse --index wikipedia-dpr \
fusion --alpha 1.05 \
-run --topics dpr_curated_test \
+run --topics dpr-curated-test \
--output runs/run.curated-test.dpr.bf.bm25.trec
```

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_curated_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-curated-test \
--index wikipedia-dpr \
--input runs/run.curated-test.dpr.bf.bm25.trec \
--output runs/run.curated-test.dpr.bf.bm25.json
@@ -336,15 +336,15 @@ Top100 accuracy: 0.9495677233429395
Run DPR retrieval with the Wikipedia brute-force index:

```bash
-$ python -m pyserini.dsearch --topics dpr_squad_test \
+$ python -m pyserini.dsearch --topics dpr-squad-test \
--index wikipedia-dpr-multi-bf \
--output runs/run.dpr.squad.multi.bf.trec \
--batch 72 --threads 72
```

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_squad_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-squad-test \
--index wikipedia-dpr \
--input runs/run.dpr.squad.multi.bf.trec \
--output runs/run.dpr.squad.multi.bf.json
@@ -361,15 +361,15 @@ Top100 accuracy: 0.6772942289498581
### BM25 retrieval

```bash
-$ python -m pyserini.search --topics dpr_squad_test \
+$ python -m pyserini.search --topics dpr-squad-test \
--index wikipedia-dpr \
--output runs/run.squad-test.bm25.trec
```


To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_squad_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-squad-test \
--index wikipedia-dpr \
--input runs/run.squad-test.bm25.trec \
--output runs/run.squad-test.bm25.json
@@ -393,13 +393,13 @@ $ python -m pyserini.hsearch dense --index wikipedia-dpr-multi-bf \
--batch-size 72 --threads 72 \
sparse --index wikipedia-dpr \
fusion --alpha 2.00 \
-run --topics dpr_squad_test \
+run --topics dpr-squad-test \
--output runs/run.squad-test.dpr.bf.bm25.trec
```

To evaluate, convert the TREC-style run file to a retrieval result file in `json` format:
```bash
-$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr_squad_test \
+$ python scripts/dpr/convert_trec_run_to_retrieval_json.py --topics dpr-squad-test \
--index wikipedia-dpr \
--input runs/run.squad-test.dpr.bf.bm25.trec \
--output runs/run.squad-test.dpr.bf.bm25.json
2 changes: 1 addition & 1 deletion docs/experiments-msmarco-doc.md
@@ -61,7 +61,7 @@ Conveniently, Pyserini already knows how to load and iterate through these pairs
We can now perform retrieval using these queries:

```bash
-python -m pyserini.search --topics msmarco_doc_dev \
+python -m pyserini.search --topics msmarco-doc-dev \
--index indexes/lucene-index-msmarco-doc \
--output runs/run.msmarco-doc.bm25tuned.txt \
--bm25 --msmarco --hits 100 --k1 4.46 --b 0.82
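The tuned `--k1 4.46 --b 0.82` values in the command above can also be set through the Python API; a minimal sketch, assuming `SimpleSearcher` and its `set_bm25` setter, with the index path from the command:

```python
from pyserini.search import SimpleSearcher

# Same tuned BM25 parameters as the CLI flags above.
searcher = SimpleSearcher('indexes/lucene-index-msmarco-doc')
searcher.set_bm25(k1=4.46, b=0.82)
hits = searcher.search('what is the most popular food in switzerland', k=100)
print(hits[0].docid, hits[0].score)
```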
2 changes: 1 addition & 1 deletion docs/experiments-msmarco-passage.md
@@ -72,7 +72,7 @@ Conveniently, Pyserini already knows how to load and iterate through these pairs
We can now perform retrieval using these queries:
```bash
-python -m pyserini.search --topics msmarco_passage_dev_subset \
+python -m pyserini.search --topics msmarco-passage-dev-subset \
--index indexes/lucene-index-msmarco-passage \
--output runs/run.msmarco-passage.bm25tuned.txt \
--bm25 --msmarco --hits 1000 --k1 0.82 --b 0.68
12 changes: 6 additions & 6 deletions docs/experiments-tct_colbert.md
@@ -11,7 +11,7 @@ You'll need a Pyserini [development installation](https://github.com/castorini/p
MS MARCO passage ranking task, dense retrieval with TCT-ColBERT, brute force index.

```bash
-$ python -m pyserini.dsearch --topics msmarco_passage_dev_subset \
+$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
--index msmarco-passage-tct_colbert-bf \
--batch-size 36 \
--threads 12 \
@@ -49,7 +49,7 @@ NOTE: Using GPU query encoding will give slightly different results. (E.g. MRR @1

MS MARCO passage ranking task, dense retrieval with TCT-ColBERT, HNSW index.
```bash
-$ python -m pyserini.dsearch --topics msmarco_passage_dev_subset \
+$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
--index msmarco-passage-tct_colbert-hnsw \
--output runs/run.msmarco-passage.tct_colbert.hnsw.tsv \
--msmarco
@@ -95,7 +95,7 @@ python -m pyserini.hsearch dense --index msmarco-passage-tct_colbert-bf \
--batch-size 36 --threads 12 \
sparse --index msmarco-passage \
fusion --alpha 0.12 \
-run --topics msmarco_passage_dev_subset \
+run --topics msmarco-passage-dev-subset \
--output runs/run.msmarco-passage.tct_colbert.bf.bm25.tsv \
--msmarco
```
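With the `--msmarco` flag, runs are written in the MS MARCO format rather than TREC. A quick sanity check of the file produced above, assuming the tab-separated `qid, docid, rank` layout of MS MARCO runs:

```python
# Peek at the first row of the MS MARCO-format run written above.
with open('runs/run.msmarco-passage.tct_colbert.bf.bm25.tsv') as f:
    qid, docid, rank = f.readline().rstrip('\n').split('\t')
    print(qid, docid, rank)
```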
@@ -137,7 +137,7 @@ python -m pyserini.hsearch dense --index msmarco-passage-tct_colbert-bf \
--batch-size 36 --threads 12 \
sparse --index msmarco-passage-expanded \
fusion --alpha 0.22 \
-run --topics msmarco_passage_dev_subset \
+run --topics msmarco-passage-dev-subset \
--output runs/run.msmarco-passage.tct_colbert.bf.doc2queryT5.tsv \
--msmarco
```
@@ -173,7 +173,7 @@ on [Hugging Face](https://huggingface.co/castorini/tct_colbert-msmarco/tree/main
MS MARCO document ranking task, dense retrieval with TCT-ColBERT trained on MS MARCO passages, brute force index.

```bash
-$ python -m pyserini.dsearch --topics msmarco_doc_dev \
+$ python -m pyserini.dsearch --topics msmarco-doc-dev \
--index msmarco-doc-tct_colbert-bf \
--encoder castorini/tct_colbert-msmarco \
--output runs/run.msmarco-doc.passage.tct_colbert.txt \
@@ -215,7 +215,7 @@ python -m pyserini.hsearch dense --index msmarco-doc-tct_colbert-bf \
--batch-size 36 --threads 12 \
sparse --index msmarco-doc-expanded-per-passage \
fusion --alpha 0.32 \
-run --topics msmarco_doc_dev \
+run --topics msmarco-doc-dev \
--output runs/run.msmarco-doc.tct_colbert.bf.doc2queryT5.tsv \
--hits 1000 --max-passage --max-passage-hits 100 \
--msmarco
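The `--max-passage --max-passage-hits 100` flags above collapse passage-level hits into a document ranking by keeping each document's best-scoring passage. A minimal sketch of that MaxP-style aggregation, assuming passage ids of the form `docid#passage` (not Pyserini's exact implementation):

```python
def max_passage(passage_hits, max_docs=100):
    """Collapse (passage_id, score) hits into a per-document ranking,
    keeping each document's best passage score."""
    best = {}
    for pid, score in passage_hits:
        docid = pid.split('#')[0]
        if docid not in best or score > best[docid]:
            best[docid] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:max_docs]
```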