msmarco-v1-doc-d2q-t5-docvectors

Lucene index (+docvectors) of the MS MARCO V1 document corpus, with doc2query-T5 expansions.

This index was generated on 2022/05/25 at Anserini commit 30c997 on damiano with the following command:

```bash
target/appassembler/bin/IndexCollection -collection JsonCollection \
  -generator DefaultLuceneDocumentGenerator -threads 7 \
  -input /scratch2/collections/msmarco/msmarco-doc-docTTTTTquery/ \
  -index indexes/lucene-index.msmarco-v1-doc-d2q-t5-docvectors.20220525.30c997/ \
  -storeDocvectors -optimize
```
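The command above reads its input as a `JsonCollection`, whose documents are JSON objects with `id` and `contents` fields. The following is a minimal sketch of what one expanded input document might look like. It assumes that doc2query-style expansion simply appends the predicted queries to the original document text. The exact preprocessing behind `msmarco-doc-docTTTTTquery` is not described in this README, and the helper `expand_document` and the sample strings are hypothetical.

```python
import json
from typing import List


def expand_document(doc_id: str, text: str, predicted_queries: List[str]) -> str:
    """Build one JsonCollection-style JSON line for an expanded document.

    Assumption: expansion = original text followed by the predicted queries,
    joined with spaces. Empty strings are skipped.
    """
    parts = [text] + predicted_queries
    contents = " ".join(p for p in parts if p)
    # JsonCollection expects an "id" and a "contents" field per document.
    return json.dumps({"id": doc_id, "contents": contents})


# Hypothetical usage: one document with two predicted queries.
line = expand_document("D1", "some passage text", ["query one", "query two"])
print(line)
```

Writing one such object per line (JSON Lines) into the `-input` directory is one way to prepare a collection for `IndexCollection`.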

Note that this index stores term frequencies along with the docvectors: bag-of-words queries and relevance feedback are supported, but not phrase queries. The raw text is not stored.
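The toy sketch below illustrates why term frequencies plus docvectors are enough for bag-of-words ranking and relevance feedback, but not for phrase queries. It is not Anserini's implementation. The corpus, the helper names, and the simplified feedback step (pooling docvector counts rather than RM3) are hypothetical, and scoring uses a standard BM25 formula with a Lucene-style idf.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Hypothetical two-document corpus for illustration only.
DOCS = {
    "d1": "new york city hotels near times square",
    "d2": "york is a city in england with new hotels",
}


def build_docvectors(docs: Dict[str, str]) -> Dict[str, Counter]:
    # A docvector here maps term -> term frequency; positions are discarded.
    return {doc_id: Counter(text.split()) for doc_id, text in docs.items()}


def bm25_scores(query: str, vectors: Dict[str, Counter],
                k1: float = 0.9, b: float = 0.4) -> Dict[str, float]:
    # Bag-of-words scoring needs only tf, df, and document lengths.
    n = len(vectors)
    avgdl = sum(sum(v.values()) for v in vectors.values()) / n
    scores = {}
    for doc_id, vec in vectors.items():
        dl = sum(vec.values())
        score = 0.0
        for term in query.split():
            tf = vec.get(term, 0)
            if tf == 0:
                continue
            df = sum(1 for v in vectors.values() if term in v)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
        scores[doc_id] = score
    return scores


def feedback_terms(doc_ids: List[str], vectors: Dict[str, Counter],
                   k: int = 3) -> List[str]:
    # Simplified pseudo-relevance feedback: pool the docvectors of the
    # feedback documents and return the k most frequent terms.
    pooled: Counter = Counter()
    for doc_id in doc_ids:
        pooled.update(vectors[doc_id])
    return [term for term, _ in pooled.most_common(k)]


def phrase_candidates(phrase: str, vectors: Dict[str, Counter]) -> Set[str]:
    # Without positions, we can only find documents containing every
    # phrase term; term order and adjacency cannot be checked.
    terms = phrase.split()
    return {d for d, v in vectors.items() if all(t in v for t in terms)}


vectors = build_docvectors(DOCS)
print(bm25_scores("new york", vectors))
print(feedback_terms(["d1"], vectors))
# Returns both d1 and d2, although only d1 contains the phrase "new york".
print(phrase_candidates("new york", vectors))
```

Both toy documents contain `new` and `york`, so their docvectors alone cannot reveal that only `d1` contains the phrase "new york"; answering that requires stored term positions.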