Is your feature request related to a problem? Please describe.
Based on discussion #7395, Haystack 2.0 is missing an F1-score evaluator for extractive QA.
Describe the solution you'd like
As stated by @julian-risch:
Calculating the F1 score is a bit more complicated than the exact match because the F1 score is token based. So the evaluator first needs to tokenize the predicted answer and the ground truth answers. Then it needs to calculate precision and recall based on those tokens and then calculate the harmonic mean of those to get the final F1 score.
This issue therefore requests an AnswerF1Evaluator class, similar to #7050. My proposal is to follow the traditional formula used in the SQuAD dataset paper (paper link). Here you can see a sample computation script.
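Independently of the linked script, the steps in the quote above (tokenize, compute precision and recall over shared tokens, take the harmonic mean) could be sketched roughly as follows. This is only a minimal illustration in the spirit of SQuAD-style scoring, not the proposed `AnswerF1Evaluator` API: the function names `normalize_answer`, `token_f1`, and `best_f1` are hypothetical, and the normalization (lowercasing, dropping punctuation and English articles, whitespace tokenization) is an assumption that the final component may handle differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Hypothetical helper: lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between one predicted answer and one ground truth answer."""
    pred_tokens = normalize_answer(prediction)
    truth_tokens = normalize_answer(ground_truth)

    # If either side has no tokens, score 1.0 only when both are empty.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)

    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, ground_truths: list[str]) -> float:
    """Take the best score when several ground truth answers are acceptable."""
    return max((token_f1(prediction, gt) for gt in ground_truths), default=0.0)
```

For example, `token_f1("in Paris France", "Paris")` gives a precision of 1/3 and a recall of 1, so the F1 score is 0.5. Taking the maximum over all ground truth answers is one possible aggregation choice; an evaluator component could then average these per-question scores across the dataset.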