Describe the bug
For calculating meaningful no_answer scores, all answer candidates from all retrieved documents have to be considered. QuestionAnsweringHead calculates the no_answer score and confidence in haystack/haystack/modeling/model/prediction_head.py (line 792 at commit 96a538b) and sorts the results according to this value. By doing this, QuestionAnsweringHead assumes that all the retrieved documents have been passed along as one dataset in order to calculate a global no_answer score. FARMReader, however, passes each retrieved document separately to the QuestionAnsweringHead using the QAInferencer. That is why FARMReader has its own no_answer score calculation logic in haystack/haystack/nodes/reader/base.py (line 53 at commit 96a538b).
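To make the difference concrete, here is a deliberately simplified sketch of the two aggregation strategies (this is not the actual Haystack code: the candidate texts, scores, and helper function names are made up, and the per-document gap formula only approximates BaseReader's logic):

```python
from typing import Dict, List, Tuple

# Toy per-document answer candidates: (answer_text, score).
# An empty string stands for that document's no_answer candidate.
doc_candidates: Dict[str, List[Tuple[str, float]]] = {
    "doc_1": [("Berlin", 7.0), ("", 7.4)],
    "doc_2": [("Paris", 6.8), ("", 2.0)],
}

def rank_with_global_no_answer(candidates_by_doc):
    """Rough analogue of treating all documents as one dataset: a single
    global no_answer candidate competes directly with every positive answer."""
    positives = [(text, score) for cands in candidates_by_doc.values()
                 for text, score in cands if text]
    no_answer_score = max(score for cands in candidates_by_doc.values()
                          for text, score in cands if not text)
    return sorted(positives + [("no_answer", no_answer_score)],
                  key=lambda item: item[1], reverse=True)

def rank_with_per_document_gaps(candidates_by_doc):
    """Rough analogue of scoring each document separately and merging
    afterwards: no_answer is derived from per-document gaps between the
    best positive answer and that document's no_answer candidate."""
    positives, gaps = [], []
    for cands in candidates_by_doc.values():
        best_positive = max(score for text, score in cands if text)
        no_answer = max(score for text, score in cands if not text)
        positives.extend((text, score) for text, score in cands if text)
        gaps.append(best_positive - no_answer)  # per-document "no answer gap"
    best_overall = max(score for _, score in positives)
    no_answer_score = best_overall - max(gaps)
    return sorted(positives + [("no_answer", no_answer_score)],
                  key=lambda item: item[1], reverse=True)

print(rank_with_global_no_answer(doc_candidates))   # no_answer ranks first here
print(rank_with_per_document_gaps(doc_candidates))  # no_answer ranks last here
```

With the same toy scores, the two strategies put no_answer at opposite ends of the ranking, which is exactly the kind of divergence described below.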
This logic differs from the one in QuestionAnsweringHead, resulting in different rankings (e.g. no_answer jumps to first place in FARMReader while it would be in second place in QuestionAnsweringHead, or vice versa). Whether one or the other is better is hard to say; some tests on SQuAD datasets, however, suggest that FARMReader's definition might have advantages over QuestionAnsweringHead's. In any case, this introduces a discrepancy between reader.eval() and pipeline.eval(), as the former uses QuestionAnsweringHead's no_answer scores and the latter uses FARMReader's.
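The FARMReader side of the difference can be observed directly from its predictions; a minimal sketch along these lines (model name, example documents, and query are placeholders, and the exact Answer fields may vary between Haystack versions):

```python
from haystack.nodes import FARMReader
from haystack.schema import Document

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2",
                    return_no_answer=True)

docs = [
    Document(content="Paris is the capital of France."),
    Document(content="Berlin is the capital of Germany."),
]

# FARMReader feeds each document to QuestionAnsweringHead separately (via the
# QAInferencer) and then merges the per-document results with the no_answer
# logic in BaseReader, so the rank of the empty no_answer candidate below can
# differ from what QuestionAnsweringHead would produce if it saw all retrieved
# documents as one dataset.
prediction = reader.predict(query="What is the capital of Spain?",
                            documents=docs, top_k=3)
for rank, answer in enumerate(prediction["answers"], start=1):
    print(rank, answer.answer or "<no_answer>", answer.score)
```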
Additional context
FARMReader's no_answer score logic is actually implemented in BaseReader, which means it also affects TransformersReader.
Expected behavior
There's one consistent definition of no_answer scores.
To Reproduce
E.g. use the script from #2216 (comment).