Pyserini: Prebuilt Indexes

Pyserini provides a number of pre-built Lucene indexes. To list what's available in code:

from pyserini.search.lucene import LuceneSearcher
LuceneSearcher.list_prebuilt_indexes()

from pyserini.index.lucene import IndexReader
IndexReader.list_prebuilt_indexes()

It's easy to initialize a searcher from a pre-built index:

searcher = LuceneSearcher.from_prebuilt_index('robust04')
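
Once a searcher is loaded, a query can be run against it. The following is a minimal sketch, assuming the `search(query, k=...)` method of `LuceneSearcher` and the `docid` and `score` attributes on each hit. The helper names `top_hits` and `demo` and the query string are illustrative and not part of Pyserini.

```python
def top_hits(searcher, query, k=10):
    """Return (docid, score) pairs for the k highest-scoring hits."""
    hits = searcher.search(query, k=k)
    return [(hit.docid, hit.score) for hit in hits]


def demo():
    # Imported lazily so that top_hits can be reused without Pyserini installed.
    from pyserini.search.lucene import LuceneSearcher

    # The first call downloads the index; later calls reuse the cached copy.
    searcher = LuceneSearcher.from_prebuilt_index('robust04')
    for docid, score in top_hits(searcher, 'hubble space telescope', k=5):
        print(f'{docid} {score:.4f}')
```

Calling `demo()` triggers the download on first use, so expect it to take a while for larger indexes.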

You can use this simple Python one-liner to download the pre-built index:

python -c "from pyserini.search.lucene import LuceneSearcher; LuceneSearcher.from_prebuilt_index('robust04')"

The downloaded index will be in ~/.cache/pyserini/indexes/.
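
To check what has already been downloaded, the cache directory can be inspected directly. This is a small sketch using only the standard library; `list_cached_indexes` is a hypothetical helper, and it assumes each downloaded index is unpacked into its own subdirectory, whose name may differ from the name passed to `from_prebuilt_index`.

```python
from pathlib import Path


def list_cached_indexes(cache_dir=None):
    """Return the sorted names of subdirectories in the index cache."""
    if cache_dir is None:
        # Default location stated above.
        root = Path.home() / '.cache' / 'pyserini' / 'indexes'
    else:
        root = Path(cache_dir)
    if not root.is_dir():
        return []  # nothing has been downloaded yet
    # Only directories are reported; loose files such as archives are skipped.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


print(list_cached_indexes())
```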

It's similarly easy to initialize an index reader from a pre-built index:

index_reader = IndexReader.from_prebuilt_index('robust04')
index_reader.stats()

The output will be:

{'total_terms': 174540872, 'documents': 528030, 'non_empty_documents': 528030, 'unique_terms': 923436}

Note that unless the underlying index was built with the -optimize option (i.e., merging all index segments into a single segment), unique_terms will show -1. Nope, that's not a bug.
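
Code that consumes these statistics should treat -1 as "unknown" rather than as a real count. The sketch below works on the plain dictionary shown above; `summarize_stats` is a hypothetical helper, and the average document length is simply `total_terms / documents`.

```python
def summarize_stats(stats):
    """Derive a few convenience values from an index stats dictionary."""
    documents = stats['documents']
    unique_terms = stats['unique_terms']
    return {
        # Average number of terms per document (0.0 for an empty index).
        'avg_doc_length': stats['total_terms'] / documents if documents else 0.0,
        # -1 means the index has multiple segments, so the count is unavailable.
        'unique_terms': None if unique_terms == -1 else unique_terms,
    }


robust04_stats = {'total_terms': 174540872, 'documents': 528030,
                  'non_empty_documents': 528030, 'unique_terms': 923436}
print(summarize_stats(robust04_stats))
```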

Below is a summary of the pre-built indexes that are currently available. Detailed configuration information for the pre-built indexes is stored in pyserini/prebuilt_index_info.py.

Standard Lucene Indexes

cacm
Lucene index of the CACM corpus
robust04 [readme]
Lucene index of TREC Disks 4 & 5 (minus Congressional Records), used in the TREC 2004 Robust Track
msmarco-passage-ltr [readme]
Lucene index of the MS MARCO passage corpus with four extra preprocessed fields for LTR
msmarco-doc-per-passage-ltr
Lucene index of the MS MARCO document per-passage corpus with four extra preprocessed fields for LTR
msmarco-document-segment-ltr
Lucene index of the MS MARCO document segmented corpus with four extra preprocessed fields for LTR
msmarco-v1-doc [readme]
Lucene index of the MS MARCO V1 document corpus.
msmarco-v1-doc-slim [readme]
Lucene index of the MS MARCO V1 document corpus ('slim' version).
msmarco-v1-doc-full [readme]
Lucene index of the MS MARCO V1 document corpus ('full' version).
msmarco-v1-doc-d2q-t5 [readme]
Lucene index of the MS MARCO V1 document corpus with doc2query-T5 expansions.
msmarco-v1-doc-d2q-t5-docvectors [readme]
Lucene index (+docvectors) of the MS MARCO V1 document corpus with doc2query-T5 expansions.
msmarco-v1-doc-segmented [readme]
Lucene index of the MS MARCO V1 segmented document corpus.
msmarco-v1-doc-segmented-slim [readme]
Lucene index of the MS MARCO V1 segmented document corpus ('slim' version).
msmarco-v1-doc-segmented-full [readme]
Lucene index of the MS MARCO V1 segmented document corpus ('full' version).
msmarco-v1-doc-segmented-d2q-t5 [readme]
Lucene index of the MS MARCO V1 segmented document corpus with doc2query-T5 expansions.
msmarco-v1-doc-segmented-d2q-t5-docvectors [readme]
Lucene index (+docvectors) of the MS MARCO V1 segmented document corpus with doc2query-T5 expansions.
msmarco-v1-passage [readme]
Lucene index of the MS MARCO V1 passage corpus.
msmarco-v1-passage-slim [readme]
Lucene index of the MS MARCO V1 passage corpus ('slim' version).
msmarco-v1-passage-full [readme]
Lucene index of the MS MARCO V1 passage corpus ('full' version).
msmarco-v1-passage-d2q-t5 [readme]
Lucene index of the MS MARCO V1 passage corpus with doc2query-T5 expansions.
msmarco-v1-passage-d2q-t5-docvectors [readme]
Lucene index (+docvectors) of the MS MARCO V1 passage corpus with doc2query-T5 expansions.
msmarco-v2-doc [readme]
Lucene index of the MS MARCO V2 document corpus.
msmarco-v2-doc-slim [readme]
Lucene index of the MS MARCO V2 document corpus ('slim' version).
msmarco-v2-doc-full [readme]
Lucene index of the MS MARCO V2 document corpus ('full' version).
msmarco-v2-doc-d2q-t5 [readme]
Lucene index of the MS MARCO V2 document corpus with doc2query-T5 expansions.
msmarco-v2-doc-d2q-t5-docvectors [readme]
Lucene index (+docvectors) of the MS MARCO V2 document corpus with doc2query-T5 expansions.
msmarco-v2-doc-segmented [readme]
Lucene index of the MS MARCO V2 segmented document corpus.
msmarco-v2-doc-segmented-slim [readme]
Lucene index of the MS MARCO V2 segmented document corpus ('slim' version).
msmarco-v2-doc-segmented-full [readme]
Lucene index of the MS MARCO V2 segmented document corpus ('full' version).
msmarco-v2-doc-segmented-d2q-t5 [readme]
Lucene index of the MS MARCO V2 segmented document corpus with doc2query-T5 expansions.
msmarco-v2-doc-segmented-d2q-t5-docvectors [readme]
Lucene index (+docvectors) of the MS MARCO V2 segmented document corpus with doc2query-T5 expansions.
msmarco-v2-passage [readme]
Lucene index of the MS MARCO V2 passage corpus.
msmarco-v2-passage-slim [readme]
Lucene index of the MS MARCO V2 passage corpus ('slim' version).
msmarco-v2-passage-full [readme]
Lucene index of the MS MARCO V2 passage corpus ('full' version).
msmarco-v2-passage-d2q-t5 [readme]
Lucene index of the MS MARCO V2 passage corpus with doc2query-T5 expansions.
msmarco-v2-passage-d2q-t5-docvectors [readme]
Lucene index (+docvectors) of the MS MARCO V2 passage corpus with doc2query-T5 expansions.
msmarco-v2-passage-augmented [readme]
Lucene index of the MS MARCO V2 augmented passage corpus.
msmarco-v2-passage-augmented-slim [readme]
Lucene index of the MS MARCO V2 augmented passage corpus ('slim' version).
msmarco-v2-passage-augmented-full [readme]
Lucene index of the MS MARCO V2 augmented passage corpus ('full' version).
msmarco-v2-passage-augmented-d2q-t5 [readme]
Lucene index of the MS MARCO V2 augmented passage corpus with doc2query-T5 expansions.
msmarco-v2-passage-augmented-d2q-t5-docvectors [readme]
Lucene index (+docvectors) of the MS MARCO V2 augmented passage corpus with doc2query-T5 expansions.
enwiki-paragraphs
Lucene index of English Wikipedia for BERTserini
zhwiki-paragraphs
Lucene index of Chinese Wikipedia for BERTserini
trec-covid-r5-abstract
Lucene index for TREC-COVID Round 5: abstract index
trec-covid-r5-full-text
Lucene index for TREC-COVID Round 5: full-text index
trec-covid-r5-paragraph
Lucene index for TREC-COVID Round 5: paragraph index
trec-covid-r4-abstract
Lucene index for TREC-COVID Round 4: abstract index
trec-covid-r4-full-text
Lucene index for TREC-COVID Round 4: full-text index
trec-covid-r4-paragraph
Lucene index for TREC-COVID Round 4: paragraph index
trec-covid-r3-abstract
Lucene index for TREC-COVID Round 3: abstract index
trec-covid-r3-full-text
Lucene index for TREC-COVID Round 3: full-text index
trec-covid-r3-paragraph
Lucene index for TREC-COVID Round 3: paragraph index
trec-covid-r2-abstract
Lucene index for TREC-COVID Round 2: abstract index
trec-covid-r2-full-text
Lucene index for TREC-COVID Round 2: full-text index
trec-covid-r2-paragraph
Lucene index for TREC-COVID Round 2: paragraph index
trec-covid-r1-abstract
Lucene index for TREC-COVID Round 1: abstract index
trec-covid-r1-full-text
Lucene index for TREC-COVID Round 1: full-text index
trec-covid-r1-paragraph
Lucene index for TREC-COVID Round 1: paragraph index
cast2019
Lucene index for TREC 2019 CaST
wikipedia-dpr [readme]
Lucene index of Wikipedia with DPR 100-word splits
wikipedia-dpr-slim [readme]
Lucene index of Wikipedia with DPR 100-word splits (slim version, document text not stored)
wikipedia-kilt-doc [readme]
Lucene index of Wikipedia snapshot used as KILT's knowledge source.
mrtydi-v1.1-arabic [readme]
Lucene index for Mr.TyDi v1.1 (Arabic).
mrtydi-v1.1-bengali [readme]
Lucene index for Mr.TyDi v1.1 (Bengali).
mrtydi-v1.1-english [readme]
Lucene index for Mr.TyDi v1.1 (English).
mrtydi-v1.1-finnish [readme]
Lucene index for Mr.TyDi v1.1 (Finnish).
mrtydi-v1.1-indonesian [readme]
Lucene index for Mr.TyDi v1.1 (Indonesian).
mrtydi-v1.1-japanese [readme]
Lucene index for Mr.TyDi v1.1 (Japanese).
mrtydi-v1.1-korean [readme]
Lucene index for Mr.TyDi v1.1 (Korean).
mrtydi-v1.1-russian [readme]
Lucene index for Mr.TyDi v1.1 (Russian).
mrtydi-v1.1-swahili [readme]
Lucene index for Mr.TyDi v1.1 (Swahili).
mrtydi-v1.1-telugu [readme]
Lucene index for Mr.TyDi v1.1 (Telugu).
mrtydi-v1.1-thai [readme]
Lucene index for Mr.TyDi v1.1 (Thai).
msmarco-passage [readme]
Lucene index of the MS MARCO passage corpus (deprecated; use msmarco-v1-passage instead).
msmarco-passage-slim [readme]
Lucene index of the MS MARCO passage corpus (slim version, document text not stored) (deprecated; use msmarco-v1-passage-slim instead).
msmarco-doc [readme]
Lucene index of the MS MARCO document corpus (deprecated; use msmarco-v1-doc instead).
msmarco-doc-slim [readme]
Lucene index of the MS MARCO document corpus (slim version, document text not stored) (deprecated; use msmarco-v1-doc-slim instead).
msmarco-doc-per-passage [readme]
Lucene index of the MS MARCO document corpus segmented into passages (deprecated; use msmarco-v1-doc-segmented instead).
msmarco-doc-per-passage-slim [readme]
Lucene index of the MS MARCO document corpus segmented into passages (slim version, document text not stored) (deprecated; use msmarco-v1-doc-segmented-slim instead).
msmarco-passage-expanded [readme]
Lucene index of the MS MARCO passage corpus with docTTTTTquery expansions (deprecated; use msmarco-v1-passage-d2q-t5 instead)
msmarco-doc-expanded-per-doc [readme]
Lucene index of the MS MARCO document corpus with per-doc docTTTTTquery expansions (deprecated; use msmarco-v1-doc-d2q-t5 instead)
msmarco-doc-expanded-per-passage [readme]
Lucene index of the MS MARCO document corpus with per-passage docTTTTTquery expansions (deprecated; use msmarco-v1-doc-segmented-d2q-t5 instead)
beir-v1.0.0-trec-covid-flat [readme]
Lucene flat index of BEIR (v1.0.0): TREC-COVID
beir-v1.0.0-bioasq-flat [readme]
Lucene flat index of BEIR (v1.0.0): BioASQ
beir-v1.0.0-nfcorpus-flat [readme]
Lucene flat index of BEIR (v1.0.0): NFCorpus
beir-v1.0.0-nq-flat [readme]
Lucene flat index of BEIR (v1.0.0): NQ
beir-v1.0.0-hotpotqa-flat [readme]
Lucene flat index of BEIR (v1.0.0): HotpotQA
beir-v1.0.0-fiqa-flat [readme]
Lucene flat index of BEIR (v1.0.0): FiQA-2018
beir-v1.0.0-signal1m-flat [readme]
Lucene flat index of BEIR (v1.0.0): Signal-1M
beir-v1.0.0-trec-news-flat [readme]
Lucene flat index of BEIR (v1.0.0): TREC-NEWS
beir-v1.0.0-robust04-flat [readme]
Lucene flat index of BEIR (v1.0.0): Robust04
beir-v1.0.0-arguana-flat [readme]
Lucene flat index of BEIR (v1.0.0): ArguAna
beir-v1.0.0-webis-touche2020-flat [readme]
Lucene flat index of BEIR (v1.0.0): Webis-Touche2020
beir-v1.0.0-cqadupstack-android-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-android
beir-v1.0.0-cqadupstack-english-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-english
beir-v1.0.0-cqadupstack-gaming-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-gaming
beir-v1.0.0-cqadupstack-gis-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-gis
beir-v1.0.0-cqadupstack-mathematica-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-mathematica
beir-v1.0.0-cqadupstack-physics-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-physics
beir-v1.0.0-cqadupstack-programmers-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-programmers
beir-v1.0.0-cqadupstack-stats-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-stats
beir-v1.0.0-cqadupstack-tex-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-tex
beir-v1.0.0-cqadupstack-unix-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-unix
beir-v1.0.0-cqadupstack-webmasters-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-webmasters
beir-v1.0.0-cqadupstack-wordpress-flat [readme]
Lucene flat index of BEIR (v1.0.0): CQADupStack-wordpress
beir-v1.0.0-quora-flat [readme]
Lucene flat index of BEIR (v1.0.0): Quora
beir-v1.0.0-dbpedia-entity-flat [readme]
Lucene flat index of BEIR (v1.0.0): DBPedia
beir-v1.0.0-scidocs-flat [readme]
Lucene flat index of BEIR (v1.0.0): SCIDOCS
beir-v1.0.0-fever-flat [readme]
Lucene flat index of BEIR (v1.0.0): FEVER
beir-v1.0.0-climate-fever-flat [readme]
Lucene flat index of BEIR (v1.0.0): Climate-FEVER
beir-v1.0.0-scifact-flat [readme]
Lucene flat index of BEIR (v1.0.0): SciFact
beir-v1.0.0-trec-covid-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): TREC-COVID
beir-v1.0.0-bioasq-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): BioASQ
beir-v1.0.0-nfcorpus-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): NFCorpus
beir-v1.0.0-nq-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): NQ
beir-v1.0.0-hotpotqa-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): HotpotQA
beir-v1.0.0-fiqa-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): FiQA-2018
beir-v1.0.0-signal1m-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): Signal-1M
beir-v1.0.0-trec-news-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): TREC-NEWS
beir-v1.0.0-robust04-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): Robust04
beir-v1.0.0-arguana-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): ArguAna
beir-v1.0.0-webis-touche2020-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): Webis-Touche2020
beir-v1.0.0-cqadupstack-android-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-android
beir-v1.0.0-cqadupstack-english-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-english
beir-v1.0.0-cqadupstack-gaming-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-gaming
beir-v1.0.0-cqadupstack-gis-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-gis
beir-v1.0.0-cqadupstack-mathematica-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-mathematica
beir-v1.0.0-cqadupstack-physics-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-physics
beir-v1.0.0-cqadupstack-programmers-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-programmers
beir-v1.0.0-cqadupstack-stats-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-stats
beir-v1.0.0-cqadupstack-tex-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-tex
beir-v1.0.0-cqadupstack-unix-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-unix
beir-v1.0.0-cqadupstack-webmasters-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-webmasters
beir-v1.0.0-cqadupstack-wordpress-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): CQADupStack-wordpress
beir-v1.0.0-quora-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): Quora
beir-v1.0.0-dbpedia-entity-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): DBPedia
beir-v1.0.0-scidocs-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): SCIDOCS
beir-v1.0.0-fever-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): FEVER
beir-v1.0.0-climate-fever-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): Climate-FEVER
beir-v1.0.0-scifact-multifield [readme]
Lucene multifield index of BEIR (v1.0.0): SciFact
hc4-v1.0-zh [readme]
Lucene index for HC4 v1.0 (Chinese).
hc4-v1.0-fa [readme]
Lucene index for HC4 v1.0 (Persian).
hc4-v1.0-ru [readme]
Lucene index for HC4 v1.0 (Russian).
neuclir22-zh [readme]
Lucene index for NeuClir '22 (Chinese).
neuclir22-fa [readme]
Lucene index for NeuClir '22 (Persian).
neuclir22-ru [readme]
Lucene index for NeuClir '22 (Russian).
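
Several entries in the list above are marked deprecated and name a replacement. For scripts that still use the old names, the mapping can be captured in a small lookup table. This is a sketch only: `DEPRECATED_INDEXES` and `resolve_index_name` are hypothetical names, and the table is transcribed from the descriptions above rather than read from Pyserini itself.

```python
import warnings

# Deprecated name -> replacement, as stated in the list above.
DEPRECATED_INDEXES = {
    'msmarco-passage': 'msmarco-v1-passage',
    'msmarco-passage-slim': 'msmarco-v1-passage-slim',
    'msmarco-doc': 'msmarco-v1-doc',
    'msmarco-doc-slim': 'msmarco-v1-doc-slim',
    'msmarco-doc-per-passage': 'msmarco-v1-doc-segmented',
    'msmarco-doc-per-passage-slim': 'msmarco-v1-doc-segmented-slim',
    'msmarco-passage-expanded': 'msmarco-v1-passage-d2q-t5',
    'msmarco-doc-expanded-per-doc': 'msmarco-v1-doc-d2q-t5',
    'msmarco-doc-expanded-per-passage': 'msmarco-v1-doc-segmented-d2q-t5',
}


def resolve_index_name(name):
    """Return the current name for an index, warning if `name` is deprecated."""
    replacement = DEPRECATED_INDEXES.get(name)
    if replacement is None:
        return name  # not in the table, so pass it through unchanged
    warnings.warn(f"'{name}' is deprecated; use '{replacement}' instead",
                  DeprecationWarning, stacklevel=2)
    return replacement


print(resolve_index_name('robust04'))
```

The resolved name can then be passed to `LuceneSearcher.from_prebuilt_index` in place of the old one.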

Lucene Impact Indexes

msmarco-v1-passage-unicoil [readme]
Lucene impact index of the MS MARCO V1 passage corpus for uniCOIL.
msmarco-v1-passage-unicoil-noexp [readme]
Lucene impact index of the MS MARCO V1 passage corpus for uniCOIL (noexp).
msmarco-v1-doc-segmented-unicoil [readme]
Lucene impact index of the MS MARCO V1 segmented document corpus for uniCOIL.
msmarco-v1-doc-segmented-unicoil-noexp [readme]
Lucene impact index of the MS MARCO V1 segmented document corpus for uniCOIL (noexp) with title prepended.
msmarco-v2-passage-unicoil-0shot [readme]
Lucene impact index of the MS MARCO V2 passage corpus for uniCOIL.
msmarco-v2-passage-unicoil-noexp-0shot [readme]
Lucene impact index of the MS MARCO V2 passage corpus for uniCOIL (noexp).
msmarco-v2-doc-segmented-unicoil-0shot [readme]
Lucene impact index of the MS MARCO V2 segmented document corpus for uniCOIL.
msmarco-v2-doc-segmented-unicoil-0shot-v2 [readme]
Lucene impact index of the MS MARCO V2 segmented document corpus for uniCOIL, with title prepended.
msmarco-v2-doc-segmented-unicoil-noexp-0shot [readme]
Lucene impact index of the MS MARCO V2 segmented document corpus for uniCOIL (noexp).
msmarco-v2-doc-segmented-unicoil-noexp-0shot-v2 [readme]
Lucene impact index of the MS MARCO V2 segmented document corpus for uniCOIL (noexp) with title prepended
msmarco-passage-deepimpact [readme]
Lucene impact index of the MS MARCO passage corpus encoded by DeepImpact
msmarco-passage-unicoil-tilde [readme]
Lucene impact index of the MS MARCO passage corpus encoded by uniCOIL-TILDE
msmarco-passage-distill-splade-max [readme]
Lucene impact index of the MS MARCO passage corpus encoded by distill-splade-max
msmarco-v2-passage-unicoil-tilde [readme]
Lucene impact index of the MS MARCO V2 passage corpus encoded by uniCOIL-TILDE
msmarco-passage-unicoil-d2q [readme]
Lucene impact index of the MS MARCO passage corpus encoded by uniCOIL-d2q (deprecated; use msmarco-v1-passage-unicoil instead).
msmarco-doc-per-passage-unicoil-d2q [readme]
Lucene impact index of the MS MARCO doc corpus per passage expansion encoded by uniCOIL-d2q (deprecated; use msmarco-v1-doc-segmented-unicoil instead).
msmarco-v2-passage-unicoil-noexp-0shot-deprecated [readme]
Lucene impact index of the MS MARCO V2 passage corpus encoded by uniCOIL (zero-shot, no expansions) (deprecated; use msmarco-v2-passage-unicoil-noexp-0shot instead).
msmarco-v2-doc-per-passage-unicoil-noexp-0shot [readme]
Lucene impact index of the MS MARCO V2 document corpus per passage encoded by uniCOIL (zero-shot, no expansions) (deprecated; use msmarco-v2-doc-segmented-unicoil-noexp-0shot instead).
beir-v1.0.0-trec-covid-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): TREC-COVID encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-bioasq-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): BioASQ encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-nfcorpus-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): NFCorpus encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-nq-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): NQ encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-hotpotqa-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): HotpotQA encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-fiqa-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): FiQA-2018 encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-signal1m-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): Signal-1M encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-trec-news-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): TREC-NEWS encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-robust04-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): Robust04 encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-arguana-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): ArguAna encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-webis-touche2020-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): Webis-Touche2020 encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-android-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-android encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-english-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-english encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-gaming-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-gaming encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-gis-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-gis encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-mathematica-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-mathematica encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-physics-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-physics encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-programmers-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-programmers encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-stats-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-stats encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-tex-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-tex encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-unix-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-unix encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-webmasters-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-webmasters encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-cqadupstack-wordpress-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): CQADupStack-wordpress encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-quora-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): Quora encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-dbpedia-entity-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): DBPedia encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-scidocs-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): SCIDOCS encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-fever-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): FEVER encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-climate-fever-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): Climate-FEVER encoded by SPLADE-distill CoCodenser-medium
beir-v1.0.0-scifact-splade_distil_cocodenser_medium [readme]
Lucene impact index of BEIR (v1.0.0): SciFact encoded by SPLADE-distill CoCodenser-medium
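
The BEIR index names in this document follow a regular pattern, `beir-v1.0.0-<dataset>-<variant>`, so they can be generated when looping over the whole benchmark. The sketch below is transcribed from the lists above; `BEIR_DATASETS`, `BEIR_VARIANTS`, and `beir_index_names` are hypothetical names, not Pyserini identifiers.

```python
CQADUPSTACK = ['android', 'english', 'gaming', 'gis', 'mathematica', 'physics',
               'programmers', 'stats', 'tex', 'unix', 'webmasters', 'wordpress']

# Dataset keys in the order they appear in this document.
BEIR_DATASETS = (
    ['trec-covid', 'bioasq', 'nfcorpus', 'nq', 'hotpotqa', 'fiqa', 'signal1m',
     'trec-news', 'robust04', 'arguana', 'webis-touche2020']
    + [f'cqadupstack-{sub}' for sub in CQADUPSTACK]
    + ['quora', 'dbpedia-entity', 'scidocs', 'fever', 'climate-fever', 'scifact']
)

# 'flat' and 'multifield' are standard Lucene indexes; the SPLADE variant is
# listed under the impact indexes.
BEIR_VARIANTS = ('flat', 'multifield', 'splade_distil_cocodenser_medium')


def beir_index_names(variant):
    """Return the pre-built index name of every BEIR dataset for one variant."""
    if variant not in BEIR_VARIANTS:
        raise ValueError(f'unknown variant: {variant!r}')
    return [f'beir-v1.0.0-{dataset}-{variant}' for dataset in BEIR_DATASETS]


for name in beir_index_names('flat')[:3]:
    print(name)
```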

Faiss Indexes

msmarco-passage-tct_colbert-hnsw
Faiss HNSW index of the MS MARCO passage corpus encoded by TCT-ColBERT
msmarco-passage-tct_colbert-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by TCT-ColBERT
msmarco-doc-tct_colbert-bf
Faiss FlatIP index of the MS MARCO document corpus encoded by TCT-ColBERT
msmarco-doc-tct_colbert-v2-hnp-bf
Faiss FlatIP index of the MS MARCO document corpus encoded by TCT-ColBERT-V2-HNP
wikipedia-dpr-multi-bf
Faiss FlatIP index of Wikipedia encoded by the DPR doc encoder trained on multiple QA datasets
wikipedia-dpr-single-nq-bf
Faiss FlatIP index of Wikipedia encoded by the DPR doc encoder trained on NQ
wikipedia-bpr-single-nq-hash
Faiss binary index of Wikipedia encoded by the BPR doc encoder trained on NQ
msmarco-passage-ance-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by the ANCE MS MARCO passage encoder
msmarco-doc-ance-maxp-bf
Faiss FlatIP index of the MS MARCO document corpus encoded by the ANCE MaxP encoder
wikipedia-ance-multi-bf
Faiss FlatIP index of Wikipedia encoded by the ANCE-multi encoder
msmarco-passage-sbert-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by the SBERT MS MARCO passage encoder
msmarco-passage-distilbert-dot-margin_mse-T2-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by the distilbert-dot-margin_mse-T2-msmarco passage encoder
msmarco-passage-distilbert-dot-tas_b-b256-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by msmarco-passage-distilbert-dot-tas_b-b256 passage encoder
msmarco-passage-tct_colbert-v2-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by the tct_colbert-v2 passage encoder
msmarco-passage-tct_colbert-v2-hn-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by the tct_colbert-v2-hn passage encoder
msmarco-passage-tct_colbert-v2-hnp-bf
Faiss FlatIP index of the MS MARCO passage corpus encoded by the tct_colbert-v2-hnp passage encoder
cast2019-tct_colbert-v2-hnsw [readme]
Faiss HNSW index of the CAsT2019 passage corpus encoded by the tct_colbert-v2 passage encoder
mrtydi-v1.1-arabic-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Arabic) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-bengali-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Bengali) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-english-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (English) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-finnish-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Finnish) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-indonesian-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Indonesian) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-japanese-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Japanese) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-korean-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Korean) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-russian-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Russian) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-swahili-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Swahili) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-telugu-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Telugu) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-thai-mdpr-nq [readme]
Faiss index for Mr.TyDi v1.1 (Thai) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
wikipedia-dpr-dkrr-nq
Faiss FlatIP index of Wikipedia DPR encoded by the retriever model from 'Distilling Knowledge from Reader to Retriever for Question Answering' trained on NQ
wikipedia-dpr-dkrr-tqa
Faiss FlatIP index of Wikipedia DPR encoded by the retriever model from 'Distilling Knowledge from Reader to Retriever for Question Answering' trained on TriviaQA
mrtydi-v1.1-arabic-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Arabic) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-bengali-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Bengali) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-english-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (English) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-finnish-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Finnish) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-indonesian-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Indonesian) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-japanese-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Japanese) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-korean-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Korean) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-russian-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Russian) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-swahili-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Swahili) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-telugu-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Telugu) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-thai-mdpr-tied-pft-msmarco [readme]
Faiss index for Mr.TyDi v1.1 (Thai) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO.
mrtydi-v1.1-arabic-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Arabic) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-bengali-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Bengali) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-english-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (English) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-finnish-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Finnish) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-indonesian-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Indonesian) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-japanese-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Japanese) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-korean-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Korean) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-russian-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Russian) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-swahili-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Swahili) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-telugu-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Telugu) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-thai-mdpr-tied-pft-nq [readme]
Faiss index for Mr.TyDi v1.1 (Thai) corpus encoded by mDPR passage encoder pre-fine-tuned on NQ.
mrtydi-v1.1-arabic-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Arabic) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-bengali-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Bengali) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-english-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (English) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-finnish-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Finnish) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-indonesian-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Indonesian) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-japanese-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Japanese) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-korean-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Korean) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-russian-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Russian) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-swahili-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Swahili) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-telugu-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Telugu) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.
mrtydi-v1.1-thai-mdpr-tied-pft-msmarco-ft-all [readme]
Faiss index for Mr.TyDi v1.1 (Thai) corpus encoded by mDPR passage encoder pre-fine-tuned on MS MARCO, then fine-tuned on all Mr.TyDi languages.