Refactor Aggretriever 2CR regressions on MS MARCO (#1566)
More consistent naming of indexes and conditions.
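
For the index names that appear in the hunks below, the renaming amounts to dropping a trailing `-slim` or `-full` suffix. A minimal sketch of that pattern follows; `canonical_index_name` is a hypothetical helper written for illustration, not part of Pyserini, and it only mirrors the substitutions visible in this diff (it makes no claim that the differently named indexes are interchangeable).

```python
import re

# Hypothetical helper (not a Pyserini API): mirrors the renaming pattern seen
# in this commit's diff, where index names lose a trailing "-slim" or "-full".
_SUFFIX = re.compile(r"-(slim|full)$")


def canonical_index_name(name: str) -> str:
    """Drop a trailing -slim/-full suffix; leave any other name unchanged."""
    return _SUFFIX.sub("", name)


if __name__ == "__main__":
    for old in (
        "msmarco-v2-doc-slim",
        "msmarco-v2-doc-segmented-full",
        "msmarco-v2-passage-unicoil-0shot",
    ):
        print(old, "->", canonical_index_name(old))
```

Names without either suffix, such as the uniCOIL indexes, pass through unchanged, which matches the hunks where only the `--topics`/`--encoder` line layout changes.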
lintool committed Jul 8, 2023
1 parent d57bf4b commit 760c22a
Showing 11 changed files with 426 additions and 401 deletions.
222 changes: 114 additions & 108 deletions docs/2cr/msmarco-v1-doc.html

Large diffs are not rendered by default.

332 changes: 166 additions & 166 deletions docs/2cr/msmarco-v1-passage.html

Large diffs are not rendered by default.

50 changes: 28 additions & 22 deletions docs/2cr/msmarco-v2-doc.html
@@ -222,7 +222,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-slim \
+ --index msmarco-v2-doc \
--topics dl21 \
--output run.msmarco-v2-doc.bm25-doc-default.dl21.txt \
--bm25
@@ -245,7 +245,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-slim \
+ --index msmarco-v2-doc \
--topics msmarco-v2-doc-dev \
--output run.msmarco-v2-doc.bm25-doc-default.dev.txt \
--bm25
@@ -265,7 +265,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-slim \
+ --index msmarco-v2-doc \
--topics msmarco-v2-doc-dev2 \
--output run.msmarco-v2-doc.bm25-doc-default.dev2.txt \
--bm25
@@ -328,7 +328,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-segmented-slim \
+ --index msmarco-v2-doc-segmented \
--topics dl21 \
--output run.msmarco-v2-doc.bm25-doc-segmented-default.dl21.txt \
--bm25 --hits 10000 --max-passage-hits 1000 --max-passage
@@ -351,7 +351,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-segmented-slim \
+ --index msmarco-v2-doc-segmented \
--topics msmarco-v2-doc-dev \
--output run.msmarco-v2-doc.bm25-doc-segmented-default.dev.txt \
--bm25 --hits 10000 --max-passage-hits 1000 --max-passage
@@ -371,7 +371,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-segmented-slim \
+ --index msmarco-v2-doc-segmented \
--topics msmarco-v2-doc-dev2 \
--output run.msmarco-v2-doc.bm25-doc-segmented-default.dev2.txt \
--bm25 --hits 10000 --max-passage-hits 1000 --max-passage
@@ -434,7 +434,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-full \
+ --index msmarco-v2-doc \
--topics dl21 \
--output run.msmarco-v2-doc.bm25-rm3-doc-default.dl21.txt \
--bm25 --rm3
@@ -457,7 +457,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-full \
+ --index msmarco-v2-doc \
--topics msmarco-v2-doc-dev \
--output run.msmarco-v2-doc.bm25-rm3-doc-default.dev.txt \
--bm25 --rm3
@@ -477,7 +477,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-full \
+ --index msmarco-v2-doc \
--topics msmarco-v2-doc-dev2 \
--output run.msmarco-v2-doc.bm25-rm3-doc-default.dev2.txt \
--bm25 --rm3
@@ -540,7 +540,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-segmented-full \
+ --index msmarco-v2-doc-segmented \
--topics dl21 \
--output run.msmarco-v2-doc.bm25-rm3-doc-segmented-default.dl21.txt \
--bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
@@ -563,7 +563,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-segmented-full \
+ --index msmarco-v2-doc-segmented \
--topics msmarco-v2-doc-dev \
--output run.msmarco-v2-doc.bm25-rm3-doc-segmented-default.dev.txt \
--bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
@@ -583,7 +583,7 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-doc-segmented-full \
+ --index msmarco-v2-doc-segmented \
--topics msmarco-v2-doc-dev2 \
--output run.msmarco-v2-doc.bm25-rm3-doc-segmented-default.dev2.txt \
--bm25 --rm3 --hits 10000 --max-passage-hits 1000 --max-passage
@@ -1241,11 +1241,11 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
</div></td>
</tr>
<tr><td style="border-bottom: 0"></td></tr>
- <!-- Condition: uniCOIL (noexp): on-the-fly query inference -->
+ <!-- Condition: uniCOIL (noexp): query inference with PyTorch -->
<tr class="accordion-toggle collapsed" id="row11" data-toggle="collapse" data-parent="#row11" href="#collapse11">
<td class="expand-button"></td>
<td></td>
- <td style="min-width: 400px">uniCOIL (noexp): on-the-fly query inference</td>
+ <td style="min-width: 400px">uniCOIL (noexp): query inference with PyTorch</td>
<td>0.2589</td>
<td>0.6501</td>
<td>0.9282</td>
@@ -1286,7 +1286,8 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-doc-segmented-unicoil-noexp-0shot \
- --topics dl21 --encoder castorini/unicoil-noexp-msmarco-passage \
+ --topics dl21 \
+ --encoder castorini/unicoil-noexp-msmarco-passage \
--output run.msmarco-v2-doc.unicoil-noexp-otf.dl21.txt \
--impact --hits 10000 --max-passage-hits 1000 --max-passage
</code></pre></blockquote>
@@ -1309,7 +1310,8 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-doc-segmented-unicoil-noexp-0shot \
- --topics msmarco-v2-doc-dev --encoder castorini/unicoil-noexp-msmarco-passage \
+ --topics msmarco-v2-doc-dev \
+ --encoder castorini/unicoil-noexp-msmarco-passage \
--output run.msmarco-v2-doc.unicoil-noexp-otf.dev.txt \
--impact --hits 10000 --max-passage-hits 1000 --max-passage
</code></pre></blockquote>
@@ -1329,7 +1331,8 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-doc-segmented-unicoil-noexp-0shot \
- --topics msmarco-v2-doc-dev2 --encoder castorini/unicoil-noexp-msmarco-passage \
+ --topics msmarco-v2-doc-dev2 \
+ --encoder castorini/unicoil-noexp-msmarco-passage \
--output run.msmarco-v2-doc.unicoil-noexp-otf.dev2.txt \
--impact --hits 10000 --max-passage-hits 1000 --max-passage
</code></pre></blockquote>
@@ -1347,11 +1350,11 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>

</div></td>
</tr>
- <!-- Condition: uniCOIL (w/ doc2query-T5): on-the-fly query inference -->
+ <!-- Condition: uniCOIL (w/ doc2query-T5): query inference with PyTorch -->
<tr class="accordion-toggle collapsed" id="row12" data-toggle="collapse" data-parent="#row12" href="#collapse12">
<td class="expand-button"></td>
<td></td>
- <td style="min-width: 400px">uniCOIL (w/ doc2query-T5): on-the-fly query inference</td>
+ <td style="min-width: 400px">uniCOIL (w/ doc2query-T5): query inference with PyTorch</td>
<td>0.2720</td>
<td>0.6782</td>
<td>0.9684</td>
@@ -1392,7 +1395,8 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-doc-segmented-unicoil-0shot \
- --topics dl21 --encoder castorini/unicoil-msmarco-passage \
+ --topics dl21 \
+ --encoder castorini/unicoil-msmarco-passage \
--output run.msmarco-v2-doc.unicoil-otf.dl21.txt \
--impact --hits 10000 --max-passage-hits 1000 --max-passage
</code></pre></blockquote>
@@ -1415,7 +1419,8 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-doc-segmented-unicoil-0shot \
- --topics msmarco-v2-doc-dev --encoder castorini/unicoil-msmarco-passage \
+ --topics msmarco-v2-doc-dev \
+ --encoder castorini/unicoil-msmarco-passage \
--output run.msmarco-v2-doc.unicoil-otf.dev.txt \
--impact --hits 10000 --max-passage-hits 1000 --max-passage
</code></pre></blockquote>
@@ -1435,7 +1440,8 @@ <h1 class="mb-3">MS MARCO V2 Document</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-doc-segmented-unicoil-0shot \
- --topics msmarco-v2-doc-dev2 --encoder castorini/unicoil-msmarco-passage \
+ --topics msmarco-v2-doc-dev2 \
+ --encoder castorini/unicoil-msmarco-passage \
--output run.msmarco-v2-doc.unicoil-otf.dev2.txt \
--impact --hits 10000 --max-passage-hits 1000 --max-passage
</code></pre></blockquote>
50 changes: 28 additions & 22 deletions docs/2cr/msmarco-v2-passage.html
@@ -222,7 +222,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-slim \
+ --index msmarco-v2-passage \
--topics dl21 \
--output run.msmarco-v2-passage.bm25-default.dl21.txt \
--bm25
@@ -245,7 +245,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-slim \
+ --index msmarco-v2-passage \
--topics msmarco-v2-passage-dev \
--output run.msmarco-v2-passage.bm25-default.dev.txt \
--bm25
@@ -265,7 +265,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-slim \
+ --index msmarco-v2-passage \
--topics msmarco-v2-passage-dev2 \
--output run.msmarco-v2-passage.bm25-default.dev2.txt \
--bm25
@@ -328,7 +328,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-augmented-slim \
+ --index msmarco-v2-passage-augmented \
--topics dl21 \
--output run.msmarco-v2-passage.bm25-augmented-default.dl21.txt \
--bm25
@@ -351,7 +351,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-augmented-slim \
+ --index msmarco-v2-passage-augmented \
--topics msmarco-v2-passage-dev \
--output run.msmarco-v2-passage.bm25-augmented-default.dev.txt \
--bm25
@@ -371,7 +371,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-augmented-slim \
+ --index msmarco-v2-passage-augmented \
--topics msmarco-v2-passage-dev2 \
--output run.msmarco-v2-passage.bm25-augmented-default.dev2.txt \
--bm25
@@ -434,7 +434,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-full \
+ --index msmarco-v2-passage \
--topics dl21 \
--output run.msmarco-v2-passage.bm25-rm3-default.dl21.txt \
--bm25 --rm3
@@ -457,7 +457,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-full \
+ --index msmarco-v2-passage \
--topics msmarco-v2-passage-dev \
--output run.msmarco-v2-passage.bm25-rm3-default.dev.txt \
--bm25 --rm3
@@ -477,7 +477,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-full \
+ --index msmarco-v2-passage \
--topics msmarco-v2-passage-dev2 \
--output run.msmarco-v2-passage.bm25-rm3-default.dev2.txt \
--bm25 --rm3
@@ -540,7 +540,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-augmented-full \
+ --index msmarco-v2-passage-augmented \
--topics dl21 \
--output run.msmarco-v2-passage.bm25-rm3-augmented-default.dl21.txt \
--bm25 --rm3
@@ -563,7 +563,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-augmented-full \
+ --index msmarco-v2-passage-augmented \
--topics msmarco-v2-passage-dev \
--output run.msmarco-v2-passage.bm25-rm3-augmented-default.dev.txt \
--bm25 --rm3
@@ -583,7 +583,7 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<blockquote class="mycode">
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
- --index msmarco-v2-passage-augmented-full \
+ --index msmarco-v2-passage-augmented \
--topics msmarco-v2-passage-dev2 \
--output run.msmarco-v2-passage.bm25-rm3-augmented-default.dev2.txt \
--bm25 --rm3
@@ -1241,11 +1241,11 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
</div></td>
</tr>
<tr><td style="border-bottom: 0"></td></tr>
- <!-- Condition: uniCOIL (noexp): on-the-fly query inference -->
+ <!-- Condition: uniCOIL (noexp): query inference with PyTorch -->
<tr class="accordion-toggle collapsed" id="row11" data-toggle="collapse" data-parent="#row11" href="#collapse11">
<td class="expand-button"></td>
<td></td>
- <td style="min-width: 400px">uniCOIL (noexp): on-the-fly query inference</td>
+ <td style="min-width: 400px">uniCOIL (noexp): query inference with PyTorch</td>
<td>0.2194</td>
<td>0.5759</td>
<td>0.6991</td>
@@ -1286,7 +1286,8 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-passage-unicoil-noexp-0shot \
- --topics dl21 --encoder castorini/unicoil-noexp-msmarco-passage \
+ --topics dl21 \
+ --encoder castorini/unicoil-noexp-msmarco-passage \
--output run.msmarco-v2-passage.unicoil-noexp-otf.dl21.txt \
--hits 1000 --impact
</code></pre></blockquote>
@@ -1309,7 +1310,8 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-passage-unicoil-noexp-0shot \
- --topics msmarco-v2-passage-dev --encoder castorini/unicoil-noexp-msmarco-passage \
+ --topics msmarco-v2-passage-dev \
+ --encoder castorini/unicoil-noexp-msmarco-passage \
--output run.msmarco-v2-passage.unicoil-noexp-otf.dev.txt \
--hits 1000 --impact
</code></pre></blockquote>
@@ -1329,7 +1331,8 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-passage-unicoil-noexp-0shot \
- --topics msmarco-v2-passage-dev2 --encoder castorini/unicoil-noexp-msmarco-passage \
+ --topics msmarco-v2-passage-dev2 \
+ --encoder castorini/unicoil-noexp-msmarco-passage \
--output run.msmarco-v2-passage.unicoil-noexp-otf.dev2.txt \
--hits 1000 --impact
</code></pre></blockquote>
@@ -1347,11 +1350,11 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>

</div></td>
</tr>
- <!-- Condition: uniCOIL (w/ doc2query-T5): on-the-fly query inference -->
+ <!-- Condition: uniCOIL (w/ doc2query-T5): query inference with PyTorch -->
<tr class="accordion-toggle collapsed" id="row12" data-toggle="collapse" data-parent="#row12" href="#collapse12">
<td class="expand-button"></td>
<td></td>
- <td style="min-width: 400px">uniCOIL (w/ doc2query-T5): on-the-fly query inference</td>
+ <td style="min-width: 400px">uniCOIL (w/ doc2query-T5): query inference with PyTorch</td>
<td>0.2539</td>
<td>0.6160</td>
<td>0.7311</td>
@@ -1392,7 +1395,8 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-passage-unicoil-0shot \
- --topics dl21 --encoder castorini/unicoil-msmarco-passage \
+ --topics dl21 \
+ --encoder castorini/unicoil-msmarco-passage \
--output run.msmarco-v2-passage.unicoil-otf.dl21.txt \
--hits 1000 --impact
</code></pre></blockquote>
@@ -1415,7 +1419,8 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-passage-unicoil-0shot \
- --topics msmarco-v2-passage-dev --encoder castorini/unicoil-msmarco-passage \
+ --topics msmarco-v2-passage-dev \
+ --encoder castorini/unicoil-msmarco-passage \
--output run.msmarco-v2-passage.unicoil-otf.dev.txt \
--hits 1000 --impact
</code></pre></blockquote>
@@ -1435,7 +1440,8 @@ <h1 class="mb-3">MS MARCO V2 Passage</h1>
<pre><code>python -m pyserini.search.lucene \
--threads 16 --batch-size 128 \
--index msmarco-v2-passage-unicoil-0shot \
- --topics msmarco-v2-passage-dev2 --encoder castorini/unicoil-msmarco-passage \
+ --topics msmarco-v2-passage-dev2 \
+ --encoder castorini/unicoil-msmarco-passage \
--output run.msmarco-v2-passage.unicoil-otf.dev2.txt \
--hits 1000 --impact
</code></pre></blockquote>