Adding the ANTIQUE dataset to BEIR #130
Comments
Hi @heliah, Thank you for sharing the ANTIQUE dataset. Interestingly, the dataset contains non-factoid questions, where retrieval models need to understand the meaning of the passages to judge their relevance to a given query. One question on the domain of the ANTIQUE dataset: it covers Wikipedia, right? We are currently planning the next version of the BEIR benchmark and are looking for more diversity in domains and tasks. We will reach out when we reach the dataset finalization stage. Kind Regards,
Thanks for your prompt reply!
No, actually, the collection and queries both come from CQA websites, so it
contains diverse non-factoid questions with open-ended answers in informal
language. The queries are a good mixture, ranging from technical to
opinion-based to questions about daily tasks. The dataset is
unique in the sense that many queries and passages contain sarcasm,
rhetorical questions, cultural context, etc. Generally speaking,
most challenges arise in understanding (and ranking) day-to-day human
interactions (and information needs).
The relevance annotations are also collected through pooling (similar to
TREC) so it's among the very few (if not the only) CQA datasets with
"complete" relevance annotations. The relevance annotations are provided in
four levels (from 1 to 4). It has both training and test splits.
There are already a few CQA datasets in BEIR, but they either focus on a
very specific domain (e.g., financial) or just focus on duplicate question
retrieval. So I hope ANTIQUE introduces a novel angle to the great
collection of datasets in BEIR.
Please let me know if you have any other questions.
Helia Hashemi
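As a rough illustration of how four-level relevance labels like the ones described above can feed a graded metric, here is a minimal nDCG sketch. It is not taken from the ANTIQUE paper or from BEIR: the function names, the `floor` parameter, and the choice to shift labels 1..4 down to gains 0..3 (so that label 1 and unjudged passages contribute nothing) are assumptions made only for this example.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (rank 1 -> log2(2))."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_ids, judgments, k=10, floor=1):
    """nDCG@k for one query.

    ranked_ids: passage ids in the order the system returned them.
    judgments:  dict passage id -> graded label (assumed to be 1..4).
    floor:      assumed lowest label; labels are shifted so it maps to gain 0.
                Unjudged passages are also given gain 0.
    """
    gains = [max(judgments.get(pid, floor) - floor, 0) for pid in ranked_ids[:k]]
    ideal = sorted((max(label - floor, 0) for label in judgments.values()),
                   reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for a single query.
judgments = {"p1": 4, "p2": 1, "p3": 3}
perfect = ndcg_at_k(["p1", "p3", "p2"], judgments)
worse = ndcg_at_k(["p2", "p3", "p1"], judgments)
```

A ranking that places the highest-labelled passages first scores 1.0, and any other ordering of the same passages scores lower.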
(In reply to Nandan Thakur's message of Tue, Mar 14, 2023, 2:42 PM.)
Sounds interesting, would be nice to have it added to BEIR. @heliah Do you have the files in the BEIR format?
Hi Nils,
Sorry for the delayed response; I missed your message. I have
converted the ANTIQUE dataset to the BEIR format. You can download
beir-version.zip from https://ciir.cs.umass.edu/downloads/Antique/
Please let me know if you have any questions.
Best,
Helia
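For readers who want to inspect a BEIR-format dataset without extra dependencies, a minimal standard-library sketch follows. The internal layout of beir-version.zip is not described in this thread, so the sketch assumes the conventional BEIR layout: `corpus.jsonl` and `queries.jsonl` keyed by `_id`, plus `qrels/<split>.tsv` with a `query-id`, `corpus-id`, `score` header. The helper names are hypothetical.

```python
import csv
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a dict keyed by each record's "_id"."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                row = json.loads(line)
                records[row["_id"]] = row
    return records


def load_qrels(path):
    """Read a tab-separated qrels file into {query_id: {doc_id: score}}."""
    qrels = {}
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return qrels


def load_beir_dir(root, split="test"):
    """Load corpus, queries, and qrels from an unzipped BEIR-style directory.

    Assumes the conventional layout; adjust the file names if the archive differs.
    """
    root = Path(root)
    corpus = load_jsonl(root / "corpus.jsonl")
    queries = load_jsonl(root / "queries.jsonl")
    qrels = load_qrels(root / "qrels" / f"{split}.tsv")
    return corpus, queries, qrels
```

After unzipping the archive, something like `corpus, queries, qrels = load_beir_dir("path/to/unzipped/folder")` would return three dictionaries, provided the files follow that layout.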
(In reply to Nils Reimers's message of Thu, Mar 16, 2023, 4:22 AM.)
Hi,
I am one of the creators of the ANTIQUE dataset, a passage retrieval dataset for non-factoid question answering. Please find the paper that explains the data here. I think it could be beneficial, and I would like to add it to the BEIR benchmark. Please let me know if you need me to take any action.
Thank you,
Helia Hashemi (hhashemi@cs.umass.edu)