
Discrepancy between no_answer score definition in QuestionAnsweringHead and FARMReader #2410

Closed
tstadel opened this issue Apr 12, 2022 · 0 comments · Fixed by #2414
tstadel commented Apr 12, 2022

Describe the bug
For calculating meaningful no_answer scores, all answer candidates from all retrieved documents have to be considered.

QuestionAnsweringHead calculates the no_answer score and confidence as

`score = best_overall_positive_score - no_ans_gap`

and sorts the results according to this value. By doing this, QuestionAnsweringHead assumes that all retrieved documents have been passed along as one dataset in order to calculate a global no_answer score.
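A minimal sketch of what this global calculation amounts to (function and variable names here are illustrative, not the exact FARM implementation):

```python
# Sketch of QuestionAnsweringHead's global no_answer scoring.
# Names are illustrative; they do not mirror the FARM source exactly.
def global_no_answer_score(positive_scores, no_ans_gap):
    """positive_scores: best positive-answer score per document (all docs in one dataset),
    no_ans_gap: the model's no-answer gap computed over that whole dataset."""
    best_overall_positive_score = max(positive_scores)
    # no_answer only outranks an answer if no candidate across *all*
    # documents beats it by more than the gap.
    return best_overall_positive_score - no_ans_gap


# Example: answer candidates from three documents, passed along together
candidates = [("ans_a", 7.1), ("ans_b", 5.4), ("ans_c", 6.0)]
no_answer = ("no_answer", global_no_answer_score([s for _, s in candidates], no_ans_gap=2.5))
ranking = sorted(candidates + [no_answer], key=lambda x: x[1], reverse=True)
```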

FARMReader, however, passes each retrieved document separately to the QuestionAnsweringHead using the QAInferencer. That's why FARMReader has its own no_answer score calculation logic:

`score = float(expit(np.asarray(no_ans_score) / 8))`

This logic differs from the one in QuestionAnsweringHead, resulting in different rankings (e.g. no_answer jumps to first place in FARMReader while it would be in second place in QuestionAnsweringHead, or vice versa).
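For reference, a self-contained sketch of just this per-document calculation; the helper function name is made up for illustration, only the formula itself comes from the reader code:

```python
import numpy as np
from scipy.special import expit  # logistic sigmoid


def farm_reader_no_answer_score(no_ans_score: float) -> float:
    # Squash the raw no-answer score into (0, 1); the divisor 8 controls
    # how steeply the sigmoid saturates.
    return float(expit(np.asarray(no_ans_score) / 8))


# Each document is scored independently here, with no global
# best_overall_positive_score - no_ans_gap adjustment across documents.
print(farm_reader_no_answer_score(4.0))
```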

It is hard to say which of the two definitions is better. Some tests on SQuAD datasets, however, suggest that FARMReader's definition might have advantages over QuestionAnsweringHead's. In any case, this introduces a discrepancy between reader.eval() and pipeline.eval(), as the former uses QuestionAnsweringHead's no_answer scores and the latter FARMReader's.

Additional context
FARMReader's no_answer score logic is actually implemented in BaseReader, which means it also affects TransformersReader.

Expected behavior
There's one consistent definition of no_answer scores.

To Reproduce
E.g. use the script from #2216 (comment)
