feat: add `AnswerF1Evaluator` for answer evaluation #7606

leomaurodesenv · 2024-04-27T01:56:02Z

Is your feature request related to a problem? Please describe.
Based on discussion #7395, the F1-score evaluator for extractive QA algorithms is missing in Haystack 2.0.

Describe the solution you'd like
As stated by @julian-risch:

> Calculating the F1 score is a bit more complicated than the exact match because the F1 score is token based. So the evaluator first needs to tokenize the predicted answer and the ground truth answers. Then it needs to calculate precision and recall based on those tokens and then calculate the harmonic mean of those to get the final F1 score.

Thus, this issue requests an `AnswerF1Evaluator` class, similar to #7050. My proposal is to follow the traditional "formula" used in the SQuAD dataset paper (paper link). A sample script that computes it is available here:

https://github.com/huggingface/evaluate/blob/main/metrics/squad_v2/squad_v2.py
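
To make the steps quoted above concrete, here is a minimal sketch of a token-level F1 in the SQuAD style. It is only an illustration, not a proposed Haystack API: the names `normalize_answer`, `token_f1`, and `best_f1` are hypothetical, and the normalization (lowercasing, stripping punctuation and English articles, whitespace tokenization) is an assumption borrowed from the usual SQuAD evaluation convention.

```python
import re
import string
from collections import Counter
from typing import List


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other tokenizers are possible.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(predicted: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall for one answer pair."""
    pred_tokens = normalize_answer(predicted).split()
    truth_tokens = normalize_answer(ground_truth).split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0

    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(predicted: str, ground_truths: List[str]) -> float:
    """Score a prediction against several references and keep the best."""
    return max(token_f1(predicted, gt) for gt in ground_truths)
```

For example, `token_f1("the city of Berlin", "Berlin")` has precision 1/3 and recall 1, so the F1 is 0.5. An evaluator built on this idea would presumably average the per-question scores, but that aggregation is left open here.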


masci added the topic:eval label May 10, 2024
