
Add danFEVER #97

Merged: 9 commits merged into main on Jan 26, 2024.
Conversation

KennethEnevoldsen (Owner) commented on Jan 25, 2024:

Added DanFEVER as a retrieval dataset.

fixes #93

for claim, evidence, label_id in zip(claims, evidences, labels):
    claim_is_supported = class_labels[label_id] == "Supported"

    sim = 1 if claim_is_supported else 0  # negative for refuted claims - is that what we want?
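For readers following along, the loop above can be sketched end to end as a tiny self-contained example. The label set, toy data, and the queries/corpus/qrels structure below are assumptions for illustration, not the PR's actual code:

```python
# Hypothetical sketch: claims become queries, evidence segments become the
# corpus, and relevance is binary - 1 only when the claim is "Supported".
class_labels = ["Supported", "Refuted", "NotEnoughInfo"]  # assumed label order

# toy data, invented for illustration
claims = [
    "Copenhagen is the capital of Denmark.",
    "The moon is made of cheese.",
]
evidences = [
    "Copenhagen is Denmark's capital and largest city.",
    "The Moon is an astronomical body orbiting Earth.",
]
labels = [0, 1]  # indices into class_labels

queries, corpus, qrels = {}, {}, {}
for i, (claim, evidence, label_id) in enumerate(zip(claims, evidences, labels)):
    query_id, doc_id = f"q{i}", f"d{i}"
    queries[query_id] = claim   # the claim is the query
    corpus[doc_id] = evidence   # each evidence segment is a corpus document
    # binary relevance: 1 only when the evidence supports the claim
    sim = 1 if class_labels[label_id] == "Supported" else 0
    qrels[query_id] = {doc_id: sim}
```

With this construction, "Refuted" and "NotEnoughInfo" both collapse to relevance 0, which is exactly the point debated in the thread below.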
KennethEnevoldsen (Owner, Author) commented:

@Muennighoff - The DanFEVER dataset is similar to the FEVER dataset. I just wanted to make sure that this dataset is constructed fairly similarly to FEVER.

I use the claim as the query to all the evidence segments as the corpus. The relevance score is then determined by whether the claim is supported.

However, I am unsure if assigning 0 to "not supported" and "not enough evidence" is meaningful.

What are your thoughts?

Collaborator commented:

> However, I am unsure if assigning 0 to "not supported" and "not enough evidence" is meaningful.

If that's the same way it is done for FEVER, then I think it's okay!

KennethEnevoldsen (Owner, Author) commented on Jan 25, 2024:

I am unsure how it is done for FEVER (can I find the processing script somewhere?)

Collaborator commented:

This is what it says in BEIR, so it does seem like everything that is not the evidence is a 0:

> FEVER [60]: The Fact Extraction and VERification dataset is collected to facilitate the automatic fact checking. We utilize the original paper splits as queries Q and retrieve evidences from the pre-processed Wikipedia Abstracts (June 2017 dump) as our corpus T.

x-tabdeveloping (Collaborator) left a comment:

Looks good, feel free to merge

@KennethEnevoldsen merged commit 801753f into main on Jan 26, 2024 (4 of 6 checks passed) and deleted the add-danfever branch on Jan 26, 2024 at 08:45.
Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close the issue "DanFever, retrieval"
3 participants