Describe the bug
For calculating meaningful no_answer scores, all answer candidates from all retrieved documents have to be considered. QuestionAnsweringHead calculates the no_answer score and confidence in haystack/haystack/modeling/model/prediction_head.py (line 792 at commit 96a538b) and sorts the results according to this value. By doing this, QuestionAnsweringHead assumes that all the retrieved documents have been passed along as one dataset in order to calculate a global no_answer score. FARMReader, however, passes each retrieved document separately to the QuestionAnsweringHead using the QAInferencer. That is why FARMReader has its own no_answer score calculation logic in haystack/haystack/nodes/reader/base.py (line 53 at commit 96a538b).
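To make the difference concrete, here is a deliberately simplified sketch of the two aggregation strategies (this is not the actual Haystack code: the candidate texts, scores, and helper function names are made up, and the per-document gap formula only approximates BaseReader's logic):

```python
from typing import Dict, List, Tuple

# Toy per-document answer candidates: (answer_text, score).
# An empty string stands for that document's no_answer candidate.
doc_candidates: Dict[str, List[Tuple[str, float]]] = {
    "doc_1": [("Berlin", 7.0), ("", 7.4)],
    "doc_2": [("Paris", 6.8), ("", 2.0)],
}

def rank_with_global_no_answer(candidates_by_doc):
    """Rough analogue of treating all documents as one dataset: a single
    global no_answer candidate competes directly with every positive answer."""
    positives = [(text, score) for cands in candidates_by_doc.values()
                 for text, score in cands if text]
    no_answer_score = max(score for cands in candidates_by_doc.values()
                          for text, score in cands if not text)
    return sorted(positives + [("no_answer", no_answer_score)],
                  key=lambda item: item[1], reverse=True)

def rank_with_per_document_gaps(candidates_by_doc):
    """Rough analogue of scoring each document separately and merging
    afterwards: no_answer is derived from per-document gaps between the
    best positive answer and that document's no_answer candidate."""
    positives, gaps = [], []
    for cands in candidates_by_doc.values():
        best_positive = max(score for text, score in cands if text)
        no_answer = max(score for text, score in cands if not text)
        positives.extend((text, score) for text, score in cands if text)
        gaps.append(best_positive - no_answer)  # per-document "no answer gap"
    best_overall = max(score for _, score in positives)
    no_answer_score = best_overall - max(gaps)
    return sorted(positives + [("no_answer", no_answer_score)],
                  key=lambda item: item[1], reverse=True)

print(rank_with_global_no_answer(doc_candidates))   # no_answer ranks first here
print(rank_with_per_document_gaps(doc_candidates))  # no_answer ranks last here
```

With the same toy scores, the two strategies put no_answer at opposite ends of the ranking, which is exactly the kind of divergence described below.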
This logic differs from the one in QuestionAnsweringHead, resulting in different rankings (e.g. no_answer jumps to first place in FARMReader while it would be in second place in QuestionAnsweringHead, or vice versa). Whether one or the other is better is hard to say; some tests on SQuAD datasets, however, suggest that FARMReader's definition might have advantages over QuestionAnsweringHead's. In any case, this introduces a discrepancy between reader.eval() and pipeline.eval(), as the former uses QuestionAnsweringHead's no_answer scores and the latter uses FARMReader's.
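The FARMReader side of the difference can be observed directly from its predictions; a minimal sketch along these lines (model name, example documents, and query are placeholders, and the exact Answer fields may vary between Haystack versions):

```python
from haystack.nodes import FARMReader
from haystack.schema import Document

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2",
                    return_no_answer=True)

docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
]

# FARMReader feeds each document to QuestionAnsweringHead separately (via the
# QAInferencer) and then merges the per-document results with the no_answer
# logic in BaseReader, so the rank of the empty no_answer candidate below can
# differ from what QuestionAnsweringHead would produce if it saw all retrieved
# documents as one dataset.
prediction = reader.predict(query="What is the capital of Spain?",
                            documents=docs, top_k=3)
for rank, answer in enumerate(prediction["answers"], start=1):
    print(rank, answer.answer or "<no_answer>", answer.score)
```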
Additional context
FARMReader's no_answer score logic is actually implemented in BaseReader, which means it also affects TransformersReader.
Expected behavior
There's one consistent definition of no_answer scores.
To Reproduce
E.g. use the script from #2216 (comment).