Add cosDPR-distil OTF 2CR (#1723)
lintool committed Nov 30, 2023
1 parent 8436a7b commit 170e271
Showing 8 changed files with 211 additions and 33 deletions.
36 changes: 18 additions & 18 deletions docs/2cr/miracl.html
@@ -655,13 +655,13 @@ <h1 class="mb-3">MIRACL</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ MS MARCO</td>
<td>0.499</td>
<td>0.462</td>
<td>0.443</td>
<td>0.394</td>
<td>0.478</td>
<td>0.480</td>
<td>0.472</td>
<td>0.435</td>
<td>0.390</td>
<td>0.383</td>
<td>0.272</td>
<td>0.439</td>
<td>0.419</td>
@@ -673,7 +673,7 @@ <h1 class="mb-3">MIRACL</h1>
<td>0.490</td>
<td>0.444</td>
<td></td>
<td>0.422</td>
<td>0.421</td>
</tr>
<tr class="hide-table-padding">
<td colspan="22">
@@ -1136,13 +1136,13 @@ <h1 class="mb-3">MIRACL</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ MS MARCO then FT w/ all Mr. TyDi</td>
<td>0.578</td>
<td>0.592</td>
<td>0.580</td>
<td>0.281</td>
<td>0.251</td>
<td>0.384</td>
<td>0.569</td>
<td>0.301</td>
<td>0.332</td>
<td>0.329</td>
<td>0.346</td>
<td>0.500</td>
<td>0.486</td>
@@ -1154,7 +1154,7 @@ <h1 class="mb-3">MIRACL</h1>
<td>0.322</td>
<td>0.598</td>
<td></td>
<td>0.463</td>
<td>0.462</td>
</tr>
<tr class="hide-table-padding">
<td colspan="22">
@@ -1617,13 +1617,13 @@ <h1 class="mb-3">MIRACL</h1>
<td class="expand-button"></td>
<td>Hybrid of `bm25` and `mdpr-tied-pft-msmarco`</td>
<td>0.673</td>
<td>0.671</td>
<td>0.654</td>
<td>0.549</td>
<td>0.641</td>
<td>0.594</td>
<td>0.672</td>
<td>0.523</td>
<td>0.615</td>
<td>0.616</td>
<td>0.443</td>
<td>0.576</td>
<td>0.609</td>
@@ -1635,7 +1635,7 @@ <h1 class="mb-3">MIRACL</h1>
<td>0.564</td>
<td>0.611</td>
<td></td>
<td>0.580</td>
<td>0.579</td>
</tr>
<tr class="hide-table-padding">
<td colspan="22">
@@ -3486,13 +3486,13 @@ <h1 class="mb-3">MIRACL</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ MS MARCO</td>
<td>0.841</td>
<td>0.831</td>
<td>0.819</td>
<td>0.768</td>
<td>0.864</td>
<td>0.898</td>
<td>0.788</td>
<td>0.915</td>
<td>0.781</td>
<td>0.776</td>
<td>0.573</td>
<td>0.825</td>
<td>0.737</td>
@@ -3504,7 +3504,7 @@ <h1 class="mb-3">MIRACL</h1>
<td>0.898</td>
<td>0.840</td>
<td></td>
<td>0.798</td>
<td>0.797</td>
</tr>
<tr class="hide-table-padding">
<td colspan="22">
@@ -3967,13 +3967,13 @@ <h1 class="mb-3">MIRACL</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ MS MARCO then FT w/ all Mr. TyDi</td>
<td>0.795</td>
<td>0.866</td>
<td>0.848</td>
<td>0.508</td>
<td>0.471</td>
<td>0.686</td>
<td>0.798</td>
<td>0.601</td>
<td>0.635</td>
<td>0.637</td>
<td>0.584</td>
<td>0.745</td>
<td>0.718</td>
@@ -3985,7 +3985,7 @@ <h1 class="mb-3">MIRACL</h1>
<td>0.599</td>
<td>0.891</td>
<td></td>
<td>0.718</td>
<td>0.717</td>
</tr>
<tr class="hide-table-padding">
<td colspan="22">
@@ -4448,13 +4448,13 @@ <h1 class="mb-3">MIRACL</h1>
<td class="expand-button"></td>
<td>Hybrid of `bm25` and `mdpr-tied-pft-msmarco`</td>
<td>0.941</td>
<td>0.949</td>
<td>0.932</td>
<td>0.882</td>
<td>0.948</td>
<td>0.937</td>
<td>0.895</td>
<td>0.965</td>
<td>0.915</td>
<td>0.912</td>
<td>0.768</td>
<td>0.904</td>
<td>0.900</td>
@@ -4466,7 +4466,7 @@ <h1 class="mb-3">MIRACL</h1>
<td>0.948</td>
<td>0.950</td>
<td></td>
<td>0.897</td>
<td>0.895</td>
</tr>
<tr class="hide-table-padding">
<td colspan="22">
18 changes: 9 additions & 9 deletions docs/2cr/mrtydi.html
@@ -746,7 +746,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ NQ</td>
<td>0.221</td>
<td>0.255</td>
<td>0.254</td>
<td>0.243</td>
<td>0.244</td>
<td>0.281</td>
@@ -1046,7 +1046,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ MS MARCO</td>
<td>0.441</td>
<td>0.417</td>
<td>0.397</td>
<td>0.327</td>
<td>0.275</td>
<td>0.352</td>
@@ -1057,7 +1057,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td>0.310</td>
<td>0.269</td>
<td></td>
<td>0.335</td>
<td>0.333</td>
</tr>
<tr class="hide-table-padding">
<td></td>
@@ -1346,7 +1346,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ MS MARCO, FT w/ all</td>
<td>0.695</td>
<td>0.643</td>
<td>0.623</td>
<td>0.492</td>
<td>0.559</td>
<td>0.578</td>
@@ -1357,7 +1357,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td>0.891</td>
<td>0.618</td>
<td></td>
<td>0.602</td>
<td>0.600</td>
</tr>
<tr class="hide-table-padding">
<td></td>
@@ -2251,7 +2251,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ NQ</td>
<td>0.600</td>
<td>0.716</td>
<td>0.707</td>
<td>0.689</td>
<td>0.640</td>
<td>0.691</td>
@@ -2262,7 +2262,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td>0.245</td>
<td>0.455</td>
<td></td>
<td>0.580</td>
<td>0.579</td>
</tr>
<tr class="hide-table-padding">
<td></td>
@@ -2551,7 +2551,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td class="expand-button"></td>
<td>mDPR (tied encoders), pre-FT w/ MS MARCO</td>
<td>0.797</td>
<td>0.820</td>
<td>0.784</td>
<td>0.754</td>
<td>0.647</td>
<td>0.736</td>
@@ -2562,7 +2562,7 @@ <h1 class="mb-3">Mr.TyDi</h1>
<td>0.782</td>
<td>0.595</td>
<td></td>
<td>0.714</td>
<td>0.711</td>
</tr>
<tr class="hide-table-padding">
<td></td>
129 changes: 127 additions & 2 deletions docs/2cr/msmarco-v1-passage.html
@@ -5747,7 +5747,7 @@ <h1 class="mb-3">MS MARCO V1 Passage</h1>
<!-- Condition: OpenAI ada2: pre-encoded queries -->
<tr class="accordion-toggle collapsed" id="row50" data-toggle="collapse" data-parent="#row50" href="#collapse50">
<td class="expand-button"></td>
<td style="min-width: 85px"></td>
<td style="min-width: 85px">[11]</td>
<td style="min-width: 400px">OpenAI ada2: pre-encoded queries</td>
<td>0.4788</td>
<td>0.7035</td>
@@ -5856,7 +5856,7 @@ <h1 class="mb-3">MS MARCO V1 Passage</h1>
<!-- Condition: HyDE-OpenAI ada2: pre-encoded queries -->
<tr class="accordion-toggle collapsed" id="row51" data-toggle="collapse" data-parent="#row51" href="#collapse51">
<td class="expand-button"></td>
<td style="min-width: 85px"></td>
<td style="min-width: 85px">[12]</td>
<td style="min-width: 400px">HyDE-OpenAI ada2: pre-encoded queries</td>
<td>0.5125</td>
<td>0.7163</td>
@@ -5941,6 +5941,119 @@ <h1 class="mb-3">MS MARCO V1 Passage</h1>
</div>
<!-- Tabs content -->

</div></td>
</tr>
<tr><td style="border-bottom: 0"></td></tr>
<!-- Condition: cosDPR-distil: PyTorch -->
<tr class="accordion-toggle collapsed" id="row52" data-toggle="collapse" data-parent="#row52" href="#collapse52">
<td class="expand-button"></td>
<td style="min-width: 85px">[13]</td>
<td style="min-width: 400px">cosDPR-distil: PyTorch</td>
<td>0.4656</td>
<td>0.7250</td>
<td>0.8201</td>
<td></td>
<td>0.4876</td>
<td>0.7025</td>
<td>0.8533</td>
<td></td>
<td>0.3896</td>
<td>0.9796</td>
</tr>
<tr class="hide-table-padding">
<td></td>
<td colspan="11">
<div id="collapse52" class="collapse in p-3">

<!-- Tabs navs -->
<ul class="nav nav-tabs mb-3" id="row52-tabs" role="tablist">
<li class="nav-item" role="presentation">
<a class="nav-link active" id="row52-tab1-header" data-mdb-toggle="tab" href="#row52-tab1" role="tab" aria-controls="row52-tab1" aria-selected="true" style="text-transform:none">TREC 2019</a>
</li>
<li class="nav-item" role="presentation">
<a class="nav-link" id="row52-tab2-header" data-mdb-toggle="tab" href="#row52-tab2" role="tab" aria-controls="row52-tab2" aria-selected="false" style="text-transform:none">TREC 2020</a>
</li>
<li class="nav-item" role="presentation">
<a class="nav-link" id="row52-tab3-header" data-mdb-toggle="tab" href="#row52-tab3" role="tab" aria-controls="row52-tab3" aria-selected="false" style="text-transform:none">dev</a>
</li>
</ul>
<!-- Tabs navs -->

<!-- Tabs content -->
<div class="tab-content" id="row52-content">
<div class="tab-pane fade show active" id="row52-tab1" role="tabpanel" aria-labelledby="row52-tab1">
Command to generate run on TREC 2019 queries:

<blockquote class="mycode">
<pre><code>python -m pyserini.search.faiss \
--threads 16 --batch-size 512 \
--index msmarco-v1-passage.cosdpr-distil \
--topics dl19-passage \
--encoder castorini/cosdpr-distil \
--output run.msmarco-v1-passage.cosdpr-distil-pytorch.dl19.txt
</code></pre></blockquote>
Evaluation commands:

<blockquote class="mycode">
<pre><code>python -m pyserini.eval.trec_eval -c -l 2 -m map dl19-passage \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-passage \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dl19.txt
python -m pyserini.eval.trec_eval -c -l 2 -m recall.1000 dl19-passage \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dl19.txt
</code></pre>
</blockquote>

</div>
<div class="tab-pane fade" id="row52-tab2" role="tabpanel" aria-labelledby="row52-tab2">
Command to generate run on TREC 2020 queries:

<blockquote class="mycode">
<pre><code>python -m pyserini.search.faiss \
--threads 16 --batch-size 512 \
--index msmarco-v1-passage.cosdpr-distil \
--topics dl20 \
--encoder castorini/cosdpr-distil \
--output run.msmarco-v1-passage.cosdpr-distil-pytorch.dl20.txt
</code></pre></blockquote>
Evaluation commands:

<blockquote class="mycode">
<pre><code>python -m pyserini.eval.trec_eval -c -l 2 -m map dl20-passage \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl20-passage \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dl20.txt
python -m pyserini.eval.trec_eval -c -l 2 -m recall.1000 dl20-passage \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dl20.txt
</code></pre>
</blockquote>

</div>
<div class="tab-pane fade" id="row52-tab3" role="tabpanel" aria-labelledby="row52-tab3">
Command to generate run on dev queries:

<blockquote class="mycode">
<pre><code>python -m pyserini.search.faiss \
--threads 16 --batch-size 512 \
--index msmarco-v1-passage.cosdpr-distil \
--topics msmarco-passage-dev-subset \
--encoder castorini/cosdpr-distil \
--output run.msmarco-v1-passage.cosdpr-distil-pytorch.dev.txt
</code></pre></blockquote>
Evaluation commands:

<blockquote class="mycode">
<pre><code>python -m pyserini.eval.trec_eval -c -M 10 -m recip_rank msmarco-passage-dev-subset \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dev.txt
python -m pyserini.eval.trec_eval -c -m recall.1000 msmarco-passage-dev-subset \
run.msmarco-v1-passage.cosdpr-distil-pytorch.dev.txt
</code></pre>
</blockquote>

</div>
</div>
<!-- Tabs content -->

</div></td>
</tr>

@@ -5992,6 +6105,18 @@ <h1 class="mb-3">MS MARCO V1 Passage</h1>
<a href="https://aclanthology.org/D19-1410/">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.</a>
<i>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</i>, 2019.</p></li>

<li><p>[11] Jimmy Lin, Ronak Pradeep, Tommaso Teofili, and Jasper Xian.
<a href="https://arxiv.org/abs/2308.14963">Vector Search with OpenAI Embeddings: Lucene Is All You Need.</a>
<i>arXiv:2308.14963</i>, August 2023.</p></li>

<li><p>[12] Luyu Gao, Xueguang Ma, Jimmy Lin, and Jamie Callan.
<a href="https://aclanthology.org/2023.acl-long.99/">Precise Zero-Shot Dense Retrieval without Relevance Labels.</a>
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1762-1777, July 2023, Toronto, Canada.</p></li>

<li><p>[13] Xueguang Ma, Tommaso Teofili, and Jimmy Lin.
<a href="https://dl.acm.org/doi/10.1145/3583780.3615112">Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes.</a>
<i>Proceedings of the 32nd International Conference on Information and Knowledge Management (CIKM 2023)</i>, October 2023, pages 5366–5370, Birmingham, the United Kingdom.</p></li>

</ul>

<div style="padding-top: 20px"/>