
Add More top_k handling to EvalDocuments #1133

Merged · 5 commits merged into master from eval_args on Jun 7, 2021
Conversation

brandenchan (Contributor) commented Jun 2, 2021

Rename top_k_eval_documents and allow it to be set in EvalDocuments.run(). Perform checks to ensure that top_k_eval_documents doesn't change over the course of evaluation, and that we never end up in a situation where EvalDocuments is passed fewer documents than its top_k.
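A minimal sketch of the behaviour described above. The attribute and method names follow the snippets quoted later in this thread, but the surrounding class shape is an illustration, not the exact merged Haystack code:

import logging

logger = logging.getLogger(__name__)

class EvalDocuments:
    def __init__(self, top_k_eval_documents: int = 10):
        self.top_k_eval_documents = top_k_eval_documents
        self.top_k_used = None  # top_k of the first run, used for consistency checks

    def run(self, documents, top_k_eval_documents=None, **kwargs):
        # top_k can now be overridden per call, falling back to the init value
        if top_k_eval_documents is None:
            top_k_eval_documents = self.top_k_eval_documents

        # Fewer documents than top_k does not invalidate metrics,
        # but it is worth a warning (see the review discussion below)
        if len(documents) < top_k_eval_documents:
            logger.warning(f"EvalDocuments received only {len(documents)} documents but "
                           f"top_k_eval_documents={top_k_eval_documents}")

        # Metrics aggregated across runs with different top_k values are not
        # comparable, so warn if top_k changes between runs
        if self.top_k_used is None:
            self.top_k_used = top_k_eval_documents
        elif self.top_k_used != top_k_eval_documents:
            logger.warning(f"EvalDocuments was already run with top_k_eval_documents="
                           f"{self.top_k_used} but is now being run with top_k_eval_documents="
                           f"{top_k_eval_documents}. This will lead to unreliable evaluation metrics")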

@brandenchan self-assigned this Jun 2, 2021
julian-risch (Member) left a comment

Looks good to me. As discussed, I agree that the warning about fewer documents than top_k_eval_documents is nice to have, although that case does not invalidate metrics. One question though about the handling of a changing top_k_eval_documents. See below.

haystack/eval.py Outdated
logger.warning(f"EvalDocuments was already run once with top_k_eval_documents={self.top_k_used} but is "
               f"being run again with top_k_eval_documents={self.top_k_eval_documents}. This will lead "
               f"to unreliable evaluation metrics")
self.inconsistent_top_k_warning = True
julian-risch (Member)

I was wondering whether we should call init_counts() so that the calculated metrics make sense at least from here on?

brandenchan (Contributor, Author)

Yes, that's a great idea. I can imagine a user performing multiple evaluation loops with a different top_k each time. I've implemented the count reset and also removed self.inconsistent_top_k_warning so that the warning can be triggered more than once.
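Inside run(), the reset might look roughly like this. The counter names behind init_counts() are assumptions for illustration, not the exact diff:

if self.top_k_used is None:
    self.top_k_used = top_k_eval_documents
elif self.top_k_used != top_k_eval_documents:
    logger.warning(f"EvalDocuments was already run once with top_k_eval_documents={self.top_k_used} "
                   f"but is being run again with top_k_eval_documents={top_k_eval_documents}. "
                   f"Resetting counts so that metrics stay consistent from here on")
    self.init_counts()  # zero the aggregated counters, e.g. query_count, correct_retrieval_count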

julian-risch (Member) commented Jun 4, 2021

Yes, running the evaluation multiple times with different top_k_eval_documents makes sense now. However, there might be a different problem now, depending on how we expect users to use the evaluation: top_k_used is only set in the very first run and never updated later on.

- If you run the evaluation with top_k_eval_documents=5 twice, the counts are aggregated, which is good. top_k_used is set to 5.
- If you then run with a different top_k_eval_documents, the counts are reset, which is good too.
- If you then switch back to top_k_eval_documents=5, the counts are not reset because top_k_eval_documents again matches top_k_used, which is bad.

Further, from then on the counts are no longer aggregated even for multiple runs with the same top_k_eval_documents, and that's only because different values of top_k_eval_documents were tried out earlier. That would be confusing. Do we expect users to try out different values of top_k_eval_documents? Maybe in Jupyter notebooks? If not, I am okay with the current version of the code. Otherwise, I am happy to discuss (also in a call). I would suggest simply adding the line self.top_k_used = top_k_eval_documents at the very end of the run method. See the trace sketched below.
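Under the pre-fix logic sketched earlier in this thread, the failure mode traces out like this (docs is a hypothetical stand-in for retrieved documents):

docs = ["doc"] * 10  # stand-in for retrieved documents

eval_docs = EvalDocuments(top_k_eval_documents=5)
eval_docs.run(docs)                           # top_k_used -> 5; counts aggregate
eval_docs.run(docs)                           # same top_k: counts keep aggregating
eval_docs.run(docs, top_k_eval_documents=10)  # warning + reset, but top_k_used stays 5
eval_docs.run(docs, top_k_eval_documents=5)   # 5 == top_k_used: no reset, so metrics
                                              # now mix top_k=10 and top_k=5 runs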

brandenchan (Contributor, Author)

Yup, I agree with your points here. I've added the line self.top_k_used = top_k_eval_documents to the end of the run method.
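With the fix, the end of run() would look roughly like this (same sketch assumptions as above). The assignment now runs unconditionally, so top_k_used always reflects the most recent run, and switching back to an earlier top_k also triggers the reset:

def run(self, documents, top_k_eval_documents=None, **kwargs):
    if top_k_eval_documents is None:
        top_k_eval_documents = self.top_k_eval_documents
    if self.top_k_used and self.top_k_used != top_k_eval_documents:
        logger.warning(...)  # inconsistent top_k warning as above
        self.init_counts()   # reset the aggregated counts
    # ... update counts and metrics for this batch of documents ...
    self.top_k_used = top_k_eval_documents  # always track the latest top_k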

haystack/eval.py Outdated
if not self.top_k_used:
    self.top_k_used = top_k_eval_documents
elif self.top_k_used != top_k_eval_documents:
    logger.warning(f"EvalDocuments was already run once with top_k_eval_documents={self.top_k_used} but is "
julian-risch (Member) commented Jun 4, 2021

Minor thing: Indentation level looks odd here 🤔

@brandenchan merged commit 59e3c55 into master on Jun 7, 2021
@brandenchan deleted the eval_args branch on June 7, 2021 at 10:11