Lucene index (+docvectors) of the MS MARCO V1 document corpus, with doc2query-T5 expansions.
This index was generated on 2022/05/25 at Anserini commit 30c997 on damiano with the following command:
target/appassembler/bin/IndexCollection -collection JsonCollection \
-generator DefaultLuceneDocumentGenerator -threads 7 \
-input /scratch2/collections/msmarco/msmarco-doc-docTTTTTquery/ \
-index indexes/lucene-index.msmarco-v1-doc-d2q-t5-docvectors.20220525.30c997/ \
-storeDocvectors -optimize
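The `-collection JsonCollection` setting means the files under `-input` are read as JSON documents. The block below is a rough, hypothetical sketch of what one expanded input record could look like. It assumes one JSON object per line with the `id` and `contents` fields that JsonCollection conventionally reads. It also assumes the doc2query-T5 predicted queries are simply appended to the document text. The document ID, text, queries, and the `docs00.json` file name are all invented, and this is not how the actual corpus was built.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write one JsonCollection-style record for an expanded document.
# Assumptions: "id"/"contents" field names, one JSON object per line, and predicted
# queries appended after the original text. All values below are made up.
set -euo pipefail

out_dir=$(mktemp -d)                 # stand-in for a directory passed to -input
trap 'rm -rf "$out_dir"' EXIT        # clean up the temporary directory on exit

doc_id="D0000000"                    # hypothetical document ID
body="The Manhattan Project produced the first nuclear weapons."
expansions="who ran the manhattan project what was the manhattan project"

# Expanded contents = original text followed by the (invented) predicted queries.
printf '{"id": "%s", "contents": "%s %s"}\n' "$doc_id" "$body" "$expansions" \
  > "$out_dir/docs00.json"

cat "$out_dir/docs00.json"
```

Because the expansions land in the same `contents` field as the original text, they are tokenized and counted like any other document terms.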
Note that this index stores term frequencies and docvectors, but not term positions: bag-of-words queries and relevance feedback are supported, but phrase queries are not. The raw text is not stored either.
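To see why docvectors support bag-of-words matching but not phrase queries, it helps to look at what a term-frequency vector keeps. The block below is a plain shell illustration, not Anserini or Lucene code. It uses a hypothetical helper, `tf_vector`, with a naive lowercase/whitespace tokenizer, which is an assumption and much simpler than Lucene's analyzers. Two texts with the same words in a different order yield identical vectors, so nothing in the vector can tell whether "new" and "york" were adjacent.

```shell
#!/usr/bin/env bash
# Illustration only: a term-frequency vector records each term and its count,
# but not where the term occurred in the text.
set -euo pipefail

# Hypothetical helper: print "term count" pairs, one per line, sorted by term.
# Tokenization is a naive lowercase + whitespace split, not a Lucene analyzer.
tf_vector() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | awk '{print $2, $1}'
}

a=$(tf_vector "New York pizza")
b=$(tf_vector "pizza York new")

echo "$a"
# new 1
# pizza 1
# york 1

# Same words, different order: the vectors are identical.
if [ "$a" = "$b" ]; then
  echo "identical docvectors: the phrase \"new york\" cannot be distinguished"
fi
```

Matching a phrase such as "new york" requires knowing where each term occurs, which is exactly the information a positional index would add and a docvector alone does not carry.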