Adding the ANTIQUE dataset to BEIR #130

heliah · 2023-03-12T23:07:27Z

Hi,

I am one of the creators of the ANTIQUE dataset -- a passage retrieval dataset for non-factoid question answering. Please find the paper that explains the data here. I think it could be beneficial, and I would like to add it to the BEIR benchmark. Please let me know if you need me to take any action.

Thank you,
Helia Hashemi (hhashemi@cs.umass.edu)

thakur-nandan · 2023-03-14T21:42:05Z

Hi @heliah,

Thank you for sharing the ANTIQUE dataset. Interestingly, the dataset contains non-factoid questions, where retrieval models need to understand the meaning of the passages to judge their relevance to a given query.

One question on the domain of the ANTIQUE dataset: it covers Wikipedia, right?

We are currently planning the next version of the BEIR benchmark and are aiming for more diversity in terms of domains and tasks. We will reach out when we reach the dataset finalization stage.

Kind Regards,
Nandan Thakur

heliah · 2023-03-16T01:19:38Z

Thanks for your prompt reply!

No, actually: the collection and queries both come from community question answering (CQA) websites, so the dataset contains diverse non-factoid questions with open-ended answers in informal language. The queries are a good mixture, ranging from technical to opinion-based to questions about daily tasks. The dataset is unique in the sense that many queries and passages contain sarcasm, rhetorical questions, cultural context, etc. Generally speaking, most of the challenges arise in understanding (and ranking) day-to-day human interactions (and information needs).

The relevance annotations were also collected through pooling (similar to TREC), so it is among the very few (if not the only) CQA datasets with "complete" relevance annotations. The annotations are provided in four levels (from 1 to 4), and the dataset has both training and test splits.

There are already a few CQA datasets in BEIR, but they either focus on a very specific domain (e.g., financial) or only on duplicate question retrieval. So I hope ANTIQUE introduces a novel angle to the great collection of datasets in BEIR.

Please let me know if you have any other questions.

Helia Hashemi

nreimers · 2023-03-16T08:22:39Z

Sounds interesting, would be nice to have it added to BEIR.

@heliah Do you have the files in the BEIR format?

heliah · 2023-05-07T03:36:30Z

Hi Nils,

Sorry for the delayed response. Apparently, I missed your message. I converted the ANTIQUE dataset to the BEIR format. You can download beir-version.zip from https://ciir.cs.umass.edu/downloads/Antique/

Please let me know if you have any questions.

Best,
Helia
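For readers who want to inspect a BEIR-format download, here is a minimal sketch of a loader that uses only the Python standard library. It assumes the usual BEIR layout (a `corpus.jsonl`, a `queries.jsonl`, and a `qrels/<split>.tsv` with a header row); the actual contents of beir-version.zip were not checked here, and the function name `load_beir_folder` and the folder name in the usage note are hypothetical.

```python
import csv
import json
from pathlib import Path


def load_beir_folder(data_dir, split="test"):
    """Read a BEIR-style folder into (corpus, queries, qrels) dictionaries.

    Assumed layout (the standard BEIR convention, not verified against the zip):
      corpus.jsonl       one JSON object per line: "_id", "text", optional "title"
      queries.jsonl      one JSON object per line: "_id", "text"
      qrels/<split>.tsv  header row, then: query-id <TAB> corpus-id <TAB> score
    """
    data_dir = Path(data_dir)

    corpus = {}
    with open(data_dir / "corpus.jsonl", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                doc = json.loads(line)
                corpus[doc["_id"]] = {
                    "title": doc.get("title", ""),
                    "text": doc["text"],
                }

    queries = {}
    with open(data_dir / "queries.jsonl", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                query = json.loads(line)
                queries[query["_id"]] = query["text"]

    # Graded relevance labels are kept as integers, keyed by query then document.
    qrels = {}
    qrels_path = data_dir / "qrels" / f"{split}.tsv"
    with open(qrels_path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            query_id, doc_id, score = row[0], row[1], int(row[2])
            qrels.setdefault(query_id, {})[doc_id] = score

    return corpus, queries, qrels
```

After unzipping into a local folder (say, a hypothetical `antique-beir/`), `corpus, queries, qrels = load_beir_folder("antique-beir", split="test")` would return plain dictionaries. The `beir` package also provides `GenericDataLoader(data_folder=...).load(split="test")` for folders in this layout.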
