Add More top_k handling to EvalDocuments #1133
Conversation
Looks good to me. As discussed, I agree that the warning about fewer documents than top_k_eval_documents is nice to have, although that case does not invalidate the metrics. One question, though, about the handling of changing top_k_eval_documents. See below.
haystack/eval.py
Outdated
logger.warning(f"EvalDocuments was already run once with top_k_eval_documents={self.top_k_used} but is "
               f"being run again with top_k_eval_documents={self.top_k_eval_documents}. This will lead "
               f"to unreliable evaluation metrics")
self.inconsistent_top_k_warning = True
I was wondering whether we should call init_counts() here, so that the calculated metrics make sense at least from here on?
Yes, that's a great idea. I can imagine a user performing multiple evaluation loops with a different top_k each time. I've implemented the count reset and also removed the self.inconsistent_top_k_warning flag so that the warning can be triggered more than once.
Yes, running the evaluation multiple times with different top_k_eval_documents makes sense now. However, there might be a different problem now, depending on how we expect users to use the evaluation: top_k_used is only updated in the very first run and never later on.

If you run the evaluation with top_k_eval_documents=5 twice, the counts are aggregated, which is good, and top_k_used is set to 5. If you then run with a different top_k_eval_documents, the counts are reset, which is good too. But if you then switch back to top_k_eval_documents=5, the counts are not reset, because top_k_eval_documents is again the same as top_k_used, which is bad.

Further, from then on the counts are no longer aggregated even across multiple runs with the same top_k_eval_documents, solely because different values of top_k_eval_documents were tried out earlier. That would be confusing. Do we expect users to try out different values of top_k_eval_documents, maybe in Jupyter notebooks? If not, I am okay with the current version of the code. Otherwise, I am happy to discuss (also in a call). I would suggest simply adding the line self.top_k_used = top_k_eval_documents at the very end of the run method.
Yup, I agree with your points here. Added the line self.top_k_used = top_k_eval_documents to the end of the run method.
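Putting the pieces of this thread together, here is a minimal sketch of the agreed logic: resetting counts via init_counts() whenever top_k changes, and updating self.top_k_used at the very end of run() so that switching back to an earlier top_k is also detected. The single query_count counter is an assumption for illustration; the real EvalDocuments tracks more state.

```python
import logging

logger = logging.getLogger(__name__)


class EvalDocuments:
    """Simplified sketch of the top_k handling discussed above; not the actual haystack class."""

    def __init__(self, top_k_eval_documents=10):
        self.top_k_eval_documents = top_k_eval_documents
        self.top_k_used = 0
        self.init_counts()

    def init_counts(self):
        # Reset all aggregated evaluation counts (reduced to one counter here)
        self.query_count = 0

    def run(self, top_k_eval_documents=None):
        if top_k_eval_documents is None:
            top_k_eval_documents = self.top_k_eval_documents
        if self.top_k_used and self.top_k_used != top_k_eval_documents:
            logger.warning(
                f"EvalDocuments was last run with top_k_eval_documents={self.top_k_used} but is "
                f"being run again with top_k_eval_documents={top_k_eval_documents}. "
                f"Resetting counts so that metrics stay consistent."
            )
            self.init_counts()
        self.query_count += 1
        # ... evaluation logic would go here ...
        # Updating top_k_used on every run (not just the first) ensures that
        # switching back to a previously used top_k still triggers the reset above.
        self.top_k_used = top_k_eval_documents
```

With this version, repeated runs with the same top_k aggregate counts, and any change of top_k (including changing back) resets them.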
haystack/eval.py
Outdated
if not self.top_k_used:
    self.top_k_used = top_k_eval_documents
elif self.top_k_used != top_k_eval_documents:
    logger.warning(f"EvalDocuments was already run once with top_k_eval_documents={self.top_k_used} but is "
Minor thing: Indentation level looks odd here 🤔
Rename top_k_eval_documents and allow it to be set in EvalDocuments.run(). Perform checks to ensure that top_k_eval_documents doesn't change over the course of evaluation and that we never get a situation where EvalDocuments is passed fewer documents than its top_k.
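The "fewer documents than top_k" check mentioned in the description could look like the following hedged sketch. The function name and warning text are assumptions for illustration; the actual haystack implementation may differ.

```python
import logging

logger = logging.getLogger(__name__)


def warn_if_too_few_documents(documents, top_k_eval_documents):
    # Receiving fewer documents than top_k does not invalidate the metrics,
    # but the user should know fewer documents than expected are evaluated.
    if len(documents) < top_k_eval_documents:
        logger.warning(
            f"EvalDocuments received only {len(documents)} documents but "
            f"top_k_eval_documents={top_k_eval_documents}. Metrics remain valid, "
            f"but fewer documents than expected will be evaluated."
        )
```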