Add nDCG to pipeline.eval()'s document metrics #2008
Conversation
LGTM 👍 Let's keep in mind, though, that while we add more metrics here, we also have many metrics in metrics.py: https://github.com/deepset-ai/haystack/blob/13510aa753d4e390f244398cc50654185dddbcde/haystack/modeling/evaluation/metrics.py
I understand why that's currently the case and I'm okay with it. However, it might not be obvious to users why metrics are defined and calculated in schema.py. Maybe we would like users to be able to define and use their own metrics sooner or later, for example by passing a function to calculate_metrics()? (See the sketch below.)
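To illustrate the idea, here is a rough sketch of what a user-defined metric could look like. Note that the `custom_metrics` parameter is hypothetical; `calculate_metrics()` does not accept user-supplied functions today:

```python
import numpy as np

def mean_rank(rank_retrieved_relevants: np.ndarray) -> float:
    """Example user-defined metric: mean 1-based rank of the retrieved relevant documents."""
    if len(rank_retrieved_relevants) == 0:
        return 0.0
    return float(np.mean(rank_retrieved_relevants))

# Hypothetical hook, not part of the current API:
# eval_result.calculate_metrics(custom_metrics={"mean_rank": mean_rank})
```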
@@ -947,13 +947,17 @@ def _build_document_metrics_df(
    recall_single_hit = min(num_retrieved_relevants, 1)
    precision = num_retrieved_relevants / retrieved if retrieved > 0 else 0.0
    rr = 1.0 / rank_retrieved_relevants.min() if len(rank_retrieved_relevants) > 0 else 0.0
    dcg = np.sum([1.0 / np.log2(rank+1) for rank in rank_retrieved_relevants]) if len(rank_retrieved_relevants) > 0 else 0.0
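For context, a minimal self-contained sketch of how this dcg value is typically normalized into nDCG by dividing by the ideal DCG. The function name, signature, and cutoff handling are illustrative and not the PR's actual implementation:

```python
import numpy as np

def ndcg_at_k(rank_retrieved_relevants: np.ndarray, num_relevants: int, k: int) -> float:
    """nDCG@k with binary relevance: DCG over the 1-based ranks of the
    retrieved relevant documents, normalized by the ideal DCG in which
    all relevant documents occupy the top ranks."""
    if num_relevants == 0 or len(rank_retrieved_relevants) == 0:
        return 0.0
    dcg = np.sum(1.0 / np.log2(rank_retrieved_relevants + 1))
    ideal_ranks = np.arange(1, min(num_relevants, k) + 1)  # ranks 1..min(num_relevants, k)
    idcg = np.sum(1.0 / np.log2(ideal_ranks + 1))
    return float(dcg / idcg)

# Example: 2 relevant docs retrieved at ranks 1 and 3, top-10 cutoff.
# dcg = 1/log2(2) + 1/log2(4) = 1.5; idcg = 1/log2(2) + 1/log2(3) ≈ 1.631 → nDCG ≈ 0.92
print(ndcg_at_k(np.array([1, 3]), num_relevants=2, k=10))
```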
Just to make sure we have the same understanding: the `1.0` in `[1.0 / np.log2(rank+1) ...]` here means that we assume binary relevance scores. We could allow for graded relevance scores if we put the relevance of the document retrieved at this rank there instead of `1.0`. However, we don't have graded relevance scores in Haystack (yet), so it's fine as it is. (A sketch of the graded variant follows below.)
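A short sketch of the graded-relevance variant the comment describes, with the constant `1.0` replaced by a per-document relevance grade. The function name and signature are illustrative:

```python
import numpy as np

def graded_dcg(ranks: np.ndarray, grades: np.ndarray) -> float:
    """DCG with graded relevance: grades[i] is the relevance score of the
    document retrieved at 1-based rank ranks[i]. Setting every grade to 1.0
    recovers the binary-relevance DCG from the diff above."""
    if len(ranks) == 0:
        return 0.0
    return float(np.sum(grades / np.log2(ranks + 1)))
```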
Yes, indeed. We would need to adjust the other metrics as well if we want to support graded relevance scores.
I agree, let's approach this in another PR. I've opened an issue for that: #2009
nDCG is a popular Information Retrieval metric that we should support for document-returning nodes.
It was officially requested in #925.