
Add nDCG to pipeline.eval()'s document metrics #2008

Merged: 4 commits merged into master from ndcg on Jan 14, 2022

Conversation

@tstadel (Member) commented on Jan 14, 2022:

nDCG is a popular Information Retrieval metric that we should support for document-returning nodes.
It was officially requested in #925.
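
For context, here is a minimal standalone sketch of how nDCG with binary relevance is typically computed, using the same 1.0 / np.log2(rank+1) discounting discussed in this PR; the function name and signature below are illustrative, not the merged implementation.

```python
import numpy as np

def ndcg_binary(relevant_ranks, num_relevant, top_k):
    """nDCG with binary relevance; relevant_ranks are the 1-based positions
    of the relevant documents among the retrieved results."""
    if num_relevant == 0 or len(relevant_ranks) == 0:
        return 0.0
    # DCG: each relevant hit contributes 1 / log2(rank + 1)
    dcg = np.sum([1.0 / np.log2(rank + 1) for rank in relevant_ranks])
    # Ideal DCG: all relevant documents ranked at the top positions
    idcg = np.sum([1.0 / np.log2(rank + 1) for rank in range(1, min(num_relevant, top_k) + 1)])
    return dcg / idcg

# Example: 2 of 3 relevant documents retrieved, at ranks 1 and 3, with top_k=5
print(ndcg_binary([1, 3], num_relevant=3, top_k=5))  # ~0.70
```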

@julian-risch (Member) left a comment:

LGTM 👍 Let's keep in mind, though, that while we add more metrics here, we also have many metrics in metrics.py: https://github.com/deepset-ai/haystack/blob/13510aa753d4e390f244398cc50654185dddbcde/haystack/modeling/evaluation/metrics.py. I understand why that's currently the case and I'm okay with it. However, it might not be obvious to users why metrics are defined/calculated in schema.py. Maybe we'd like users to be able to define and use their own metrics sooner or later, for example by passing a function to calculate_metrics()?

@@ -947,13 +947,17 @@ def _build_document_metrics_df(
     recall_single_hit = min(num_retrieved_relevants, 1)
     precision = num_retrieved_relevants / retrieved if retrieved > 0 else 0.0
     rr = 1.0 / rank_retrieved_relevants.min() if len(rank_retrieved_relevants) > 0 else 0.0
+    dcg = np.sum([1.0 / np.log2(rank+1) for rank in rank_retrieved_relevants]) if len(rank_retrieved_relevants) > 0 else 0.0
Member commented on the added line:

Just to make sure we have the same understanding: the `1.0` in `1.0 / np.log2(rank+1)` here means that we assume binary relevance scores. We could allow for graded relevance scores if we put the relevance of the document retrieved at this rank here instead of `1.0`. However, we don't have graded relevance scores in Haystack (yet), so it's fine as it is.

Member Author (tstadel) replied:

Yes, indeed. We would need to adjust the other metrics as well if we want to support graded relevance scores.
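
To make the graded-relevance point concrete, here is a rough sketch of how the DCG term could look with graded scores; the gain function, names, and signature are hypothetical and not part of Haystack today.

```python
import numpy as np

def dcg_graded(relevances_by_rank):
    """DCG with graded relevance: relevances_by_rank[i] is the relevance score
    of the document retrieved at rank i + 1 (0.0 for an irrelevant document)."""
    return np.sum([rel / np.log2(rank + 1)
                   for rank, rel in enumerate(relevances_by_rank, start=1)])

# With binary 0/1 scores this reduces to the 1.0 / np.log2(rank+1) term from the diff
print(dcg_graded([1.0, 0.0, 1.0]))  # 1.5
# With graded scores, a highly relevant document at rank 1 contributes more
print(dcg_graded([3.0, 0.0, 1.0]))  # 3.5
```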

@tstadel (Member Author) commented on Jan 14, 2022:

> LGTM 👍 Let's keep in mind, though, that while we add more metrics here, we also have many metrics in metrics.py: https://github.com/deepset-ai/haystack/blob/13510aa753d4e390f244398cc50654185dddbcde/haystack/modeling/evaluation/metrics.py. I understand why that's currently the case and I'm okay with it. However, it might not be obvious to users why metrics are defined/calculated in schema.py. Maybe we'd like users to be able to define and use their own metrics sooner or later, for example by passing a function to calculate_metrics()?

I agree, let's approach this in another PR. I've opened an issue for that: #2009
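
To sketch the idea tracked in #2009: a user-defined document metric could simply be a callable over the ranks of retrieved relevant documents. The custom_document_metrics parameter shown below is purely hypothetical and not an existing calculate_metrics() argument.

```python
# Hypothetical shape of a user-defined document metric: a callable that maps
# the 1-based ranks of retrieved relevant documents to a score in [0, 1].
def hit_rate_at_3(rank_retrieved_relevants):
    return 1.0 if any(rank <= 3 for rank in rank_retrieved_relevants) else 0.0

# Hypothetical usage -- calculate_metrics() does not accept custom metrics today:
# metrics = eval_result.calculate_metrics(custom_document_metrics={"hit_rate_at_3": hit_rate_at_3})
```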

@tstadel merged commit f42d2e8 into master on Jan 14, 2022
@tstadel deleted the ndcg branch on January 14, 2022 at 17:36