The number of queries in MSMARCO #95

mjeensung · 2022-05-01T01:19:05Z

Hi,

I have a question regarding the number of queries in MSMARCO.
According to the paper and the readme, the number of test queries in MSMARCO is 6,980.

However, when I ran the following code, I got only 43 queries:

>>> from beir.datasets.data_loader import GenericDataLoader
>>> corpus, queries, qrels = GenericDataLoader(data_folder='msmarco').load(split="test")
>>> print(len(queries))
43

On the other hand, loading the dev split gives me 6,980 queries.
Should I use the dev queries when evaluating MSMARCO instead of the test queries?

Thanks!


cadurosar · 2022-05-01T06:10:01Z

Hi,

The split you are looking for is "dev" (i.e., split="dev"). In BEIR, the MSMARCO test split corresponds to one of the TREC-DL competitions, which is why it contains far fewer queries.
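A minimal, self-contained sketch of why the chosen split changes the query count. It assumes, consistent with the numbers above, that the loader keeps only the queries that have relevance judgments in the qrels file for the requested split (a BEIR-style `qrels/<split>.tsv` with a header row and `query-id`, `corpus-id`, `score` columns). The helper `queries_for_split` and the toy data are hypothetical illustrations, not part of BEIR's API.

```python
import csv
import io
from typing import Dict


def queries_for_split(queries: Dict[str, str], qrels_tsv: str) -> Dict[str, str]:
    """Keep only queries that have at least one judgment in the given qrels TSV.

    Hypothetical helper: assumes a header row followed by
    query-id <TAB> corpus-id <TAB> score rows.
    """
    reader = csv.reader(io.StringIO(qrels_tsv), delimiter="\t")
    next(reader)  # skip the header row
    judged = {row[0] for row in reader if row}
    return {qid: text for qid, text in queries.items() if qid in judged}


# Toy data: one shared queries file, two qrels files judging different queries.
queries = {"q1": "what is bm25", "q2": "define dense retrieval", "q3": "msmarco size"}
dev_qrels = "query-id\tcorpus-id\tscore\nq1\td10\t1\nq2\td20\t1\n"
test_qrels = "query-id\tcorpus-id\tscore\nq3\td30\t2\nq3\td31\t1\n"

print(len(queries_for_split(queries, dev_qrels)))   # 2
print(len(queries_for_split(queries, test_qrels)))  # 1
```

Under this assumption, the same queries file yields different counts depending on which qrels file is used, which matches the 43 vs. 6,980 difference reported above.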

mjeensung · 2022-05-02T08:09:27Z

Thanks! It's clear now.

mjeensung closed this as completed May 2, 2022

Muennighoff mentioned this issue Aug 4, 2022

No test split for Mind Small embeddings-benchmark/mteb#28

Closed
