
Add nDCG to pipeline.eval()'s document metrics #2008

Merged: 4 commits merged into master from ndcg on Jan 14, 2022

Conversation

@tstadel (Member) commented on Jan 14, 2022:

nDCG is a popular Information Retrieval metric that we should support for document-returning nodes.
It was officially requested in #925.
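
For context, here is a minimal standalone sketch of how nDCG with binary relevance is typically computed, using the same 1.0 / np.log2(rank+1) discounting discussed in this PR; the function name and signature below are illustrative, not the merged implementation.

```python
import numpy as np

def ndcg_binary(relevant_ranks, num_relevant, top_k):
    """nDCG with binary relevance; relevant_ranks are the 1-based positions
    of the relevant documents among the retrieved results."""
    if num_relevant == 0 or len(relevant_ranks) == 0:
        return 0.0
    # DCG: each relevant hit contributes 1 / log2(rank + 1)
    dcg = np.sum([1.0 / np.log2(rank + 1) for rank in relevant_ranks])
    # Ideal DCG: all relevant documents ranked at the top positions
    idcg = np.sum([1.0 / np.log2(rank + 1) for rank in range(1, min(num_relevant, top_k) + 1)])
    return dcg / idcg

# Example: 2 of 3 relevant documents retrieved, at ranks 1 and 3, with top_k=5
print(ndcg_binary([1, 3], num_relevant=3, top_k=5))  # ~0.70
```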

@julian-risch (Member) left a comment:

LGTM 👍 Let's keep in mind, though, that while we add more metrics here, we also have many metrics in metrics.py: https://github.com/deepset-ai/haystack/blob/13510aa753d4e390f244398cc50654185dddbcde/haystack/modeling/evaluation/metrics.py. I understand why that's currently the case and I'm okay with it. However, it might not be obvious to users why metrics are defined/calculated in schema.py. Maybe we'd like users to be able to define and use their own metrics sooner or later, for example by passing a function to calculate_metrics()?

@@ -947,13 +947,17 @@ def _build_document_metrics_df(
     recall_single_hit = min(num_retrieved_relevants, 1)
     precision = num_retrieved_relevants / retrieved if retrieved > 0 else 0.0
     rr = 1.0 / rank_retrieved_relevants.min() if len(rank_retrieved_relevants) > 0 else 0.0
+    dcg = np.sum([1.0 / np.log2(rank+1) for rank in rank_retrieved_relevants]) if len(rank_retrieved_relevants) > 0 else 0.0
Member commented on the added line:

Just to make sure we have the same understanding: the `1.0` in `1.0 / np.log2(rank+1)` here means that we assume binary relevance scores. We could allow for graded relevance scores if we put the relevance of the document retrieved at this rank here instead of `1.0`. However, we don't have graded relevance scores in Haystack (yet), so it's fine as it is.

Member Author (tstadel) replied:

Yes, indeed. We would need to adjust the other metrics as well if we want to support graded relevance scores.
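
To make the graded-relevance point concrete, here is a rough sketch of how the DCG term could look with graded scores; the gain function, names, and signature are hypothetical and not part of Haystack today.

```python
import numpy as np

def dcg_graded(relevances_by_rank):
    """DCG with graded relevance: relevances_by_rank[i] is the relevance score
    of the document retrieved at rank i + 1 (0.0 for an irrelevant document)."""
    return np.sum([rel / np.log2(rank + 1)
                   for rank, rel in enumerate(relevances_by_rank, start=1)])

# With binary 0/1 scores this reduces to the 1.0 / np.log2(rank+1) term from the diff
print(dcg_graded([1.0, 0.0, 1.0]))  # 1.5
# With graded scores, a highly relevant document at rank 1 contributes more
print(dcg_graded([3.0, 0.0, 1.0]))  # 3.5
```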

@tstadel (Member Author) commented on Jan 14, 2022:

> LGTM 👍 Let's keep in mind, though, that while we add more metrics here, we also have many metrics in metrics.py: https://github.com/deepset-ai/haystack/blob/13510aa753d4e390f244398cc50654185dddbcde/haystack/modeling/evaluation/metrics.py. I understand why that's currently the case and I'm okay with it. However, it might not be obvious to users why metrics are defined/calculated in schema.py. Maybe we'd like users to be able to define and use their own metrics sooner or later, for example by passing a function to calculate_metrics()?

I agree, let's approach this in another PR. I've opened an issue for that: #2009
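
To sketch the idea tracked in #2009: a user-defined document metric could simply be a callable over the ranks of retrieved relevant documents. The custom_document_metrics parameter shown below is purely hypothetical and not an existing calculate_metrics() argument.

```python
# Hypothetical shape of a user-defined document metric: a callable that maps
# the 1-based ranks of retrieved relevant documents to a score in [0, 1].
def hit_rate_at_3(rank_retrieved_relevants):
    return 1.0 if any(rank <= 3 for rank in rank_retrieved_relevants) else 0.0

# Hypothetical usage -- calculate_metrics() does not accept custom metrics today:
# metrics = eval_result.calculate_metrics(custom_document_metrics={"hit_rate_at_3": hit_rate_at_3})
```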

@tstadel merged commit f42d2e8 into master on Jan 14, 2022
@tstadel deleted the ndcg branch on January 14, 2022 at 17:36