
Question regarding F1 evaluation metric #5

Open

atreyasha opened this issue Mar 18, 2022 · 0 comments

atreyasha commented Mar 18, 2022

Hi @AbhilashaRavichander,

I would like to ask a question regarding the F1 evaluation metric used in your paper (similar to #3). The paper mentions that the "average of the maximum F1 from each n−1 subset" is used to calculate the F1 metric. I am slightly unsure how this works, but I think it could mean the following:

  1. For each classification output, compare the predicted label against the labels from the annotators. Compute the maximum F1 per sample (which should be the same as accuracy), as shown in the example below:

     | Sample | Predicted Label | Ann1       | Ann2       | Ann3       | Maximum F1 |
     |--------|-----------------|------------|------------|------------|------------|
     | 1      | Relevant        | Irrelevant | None       | Irrelevant | 0          |
     | 2      | Relevant        | Relevant   | Relevant   | Relevant   | 1          |
     | 3      | Irrelevant      | None       | Irrelevant | Relevant   | 1          |
  2. Take the average of all maximum F1 scores: (0 + 1 + 1)/3 = 2/3 ≈ 0.67. A short code sketch of this computation follows the list.
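
For concreteness, here is a minimal Python sketch of the interpretation described above. It assumes a single predicted label per sample, so the per-sample F1 against any one annotator reduces to exact match (0 or 1); the function names `max_f1_per_sample` and `average_max_f1` are purely illustrative and not taken from the paper or repository.

```python
from typing import List


def max_f1_per_sample(predicted: str, annotator_labels: List[str]) -> float:
    """Per-sample maximum F1 under the assumption of one predicted label per
    sample: F1 against a single annotator reduces to exact match (0 or 1), so
    the maximum over annotators is 1 if any annotator agrees, else 0."""
    return float(any(predicted == label for label in annotator_labels))


def average_max_f1(predictions: List[str], annotations: List[List[str]]) -> float:
    """Average the per-sample maximum F1 over all samples."""
    scores = [max_f1_per_sample(pred, anns)
              for pred, anns in zip(predictions, annotations)]
    return sum(scores) / len(scores)


# The three samples from the table above.
predictions = ["Relevant", "Relevant", "Irrelevant"]
annotations = [
    ["Irrelevant", "None", "Irrelevant"],  # sample 1 -> max F1 = 0
    ["Relevant", "Relevant", "Relevant"],  # sample 2 -> max F1 = 1
    ["None", "Irrelevant", "Relevant"],    # sample 3 -> max F1 = 1
]
print(average_max_f1(predictions, annotations))  # (0 + 1 + 1) / 3 ≈ 0.67
```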

Is my understanding of the evaluation metric correct?

Thank you for your time.
