Fix DenseRetrievalExactSearch evaluation #154

Open · wants to merge 2 commits into main
Conversation

@NouamaneTazi (Contributor) commented on Aug 12, 2023

I noticed a problem in the way we handle queries that also exist in the retrieval corpus. By default we have ignore_identical_ids=True, which pops these duplicated queries from the results. This means some queries end up with top_k retrieved documents while others end up with only top_k-1.
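
To make the failure mode concrete, here is a minimal sketch with toy ids and scores (not the actual evaluation code):

    top_k = 3
    # Toy results as {query_id: {doc_id: score}}; q1 also exists as a corpus document.
    results = {
        "q1": {"q1": 0.99, "d1": 0.80, "d2": 0.70},
        "q2": {"d3": 0.90, "d4": 0.85, "d5": 0.60},
    }

    # ignore_identical_ids=True pops each query from its own result list.
    for qid in results:
        results[qid].pop(qid, None)

    print({qid: len(docs) for qid, docs in results.items()})
    # {'q1': 2, 'q2': 3} -> q1 is scored on top_k-1 documents, q2 on top_k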

Fixing this behaviour gives a noticeable change in scores. Here is the difference for "intfloat/e5-large" on ArguAna, evaluated with MTEB:

    from mteb import MTEB
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/e5-large", device="cuda")
    evaluation = MTEB(tasks=["ArguAna"])
    evaluation.run(model, batch_size=512 * 2, corpus_chunk_size=10000, overwrite_results=True)

Scores before fix:

INFO:mteb.evaluation.MTEB:Scores: {'ndcg_at_1': 0.27596, 'ndcg_at_3': 0.42701, 'ndcg_at_5': 0.48151, 'ndcg_at_10': 0.53452, 'ndcg_at_100': 0.57081, 'ndcg_at_1000': 0.57226, 'map_at_1': 0.27596, 'map_at_3': 0.38976, 'map_at_5': 0.41967, 'map_at_10': 0.44187, 'map_at_100': 0.4507, 'map_at_1000': 0.45077, 'recall_at_1': 0.27596, 'recall_at_3': 0.53485, 'recall_at_5': 0.66856, 'recall_at_10': 0.83073, 'recall_at_100': 0.98578, 'recall_at_1000': 0.99644, 'precision_at_1': 0.27596, 'precision_at_3': 0.17828, 'precision_at_5': 0.13371, 'precision_at_10': 0.08307, 'precision_at_100': 0.00986, 'precision_at_1000': 0.001, 'mrr_at_1': 0.28378, 'mrr_at_3': 0.39284, 'mrr_at_5': 0.42261, 'mrr_at_10': 0.44498, 'mrr_at_100': 0.45374, 'mrr_at_1000': 0.45381, 'evaluation_time': 127.59}

Scores after fix:

INFO:mteb.evaluation.MTEB:Scores: {'ndcg_at_1': 0.41963, 'ndcg_at_3': 0.57859, 'ndcg_at_5': 0.62677, 'ndcg_at_10': 0.65648, 'ndcg_at_100': 0.67739, 'ndcg_at_1000': 0.67846, 'map_at_1': 0.41963, 'map_at_3': 0.53983, 'map_at_5': 0.56664, 'map_at_10': 0.57907, 'map_at_100': 0.58407, 'map_at_1000': 0.58413, 'recall_at_1': 0.41963, 'recall_at_3': 0.69061, 'recall_at_5': 0.80725, 'recall_at_10': 0.89829, 'recall_at_100': 0.98862, 'recall_at_1000': 0.99644, 'precision_at_1': 0.41963, 'precision_at_3': 0.2302, 'precision_at_5': 0.16145, 'precision_at_10': 0.08983, 'precision_at_100': 0.00989, 'precision_at_1000': 0.001, 'mrr_at_1': 0.41963, 'mrr_at_3': 0.53983, 'mrr_at_5': 0.56664, 'mrr_at_10': 0.57907, 'mrr_at_100': 0.58407, 'mrr_at_1000': 0.58413, 'evaluation_time': 112.69}

cc @thakur-nandan

@NouamaneTazi marked this pull request as ready for review on August 12, 2023, 17:29
@Muennighoff (Contributor) left a comment:

I'm not fully understanding yet, maybe you can help me out 😅🧐

@@ -45,6 +46,9 @@ def search(self,
         logger.info("Sorting Corpus by document length (Longest first)...")

         corpus_ids = sorted(corpus, key=lambda k: len(corpus[k].get("title", "") + corpus[k].get("text", "")), reverse=True)
+        if ignore_identical_ids:
+            # We remove the query from results if it exists in corpus
+            corpus_ids = [cid for cid in corpus_ids if cid not in query_ids]
Contributor:

Doesn't this make the task "easier" by removing all other queries as options for each query?

I.e. previously, given query1, the model could wrongly retrieve query2 (if it was also in the corpus). Now the model cannot retrieve any of the other queries, which makes the task easier, assuming the answer is never another query.
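
To illustrate the concern with toy ids (an illustrative sketch, not the PR's code): the new filter drops every query id from the candidate pool, not just the id identical to the current query.

    corpus_ids = ["q1", "d1", "q2", "d2"]  # q1 and q2 are also query ids
    query_ids = ["q1", "q2"]

    corpus_ids = [cid for cid in corpus_ids if cid not in query_ids]
    print(corpus_ids)  # ['d1', 'd2'] -> when scoring q1, q2 is no longer a candidate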

Member:

I think this option was for Quora: you want to find paraphrases of queries, but not the original query itself. But this original query will always be ranked first, as it is also part of the corpus.

Contributor Author:

Which is why we have the ignore_identical_ids option, I think. This PR only tries to fix the ignore_identical_ids=True case.

Comment on lines -73 to +77

-    cos_scores_top_k_values, cos_scores_top_k_idx = torch.topk(cos_scores, min(top_k+1, len(cos_scores[1])), dim=1, largest=True, sorted=return_sorted)
+    cos_scores_top_k_values, cos_scores_top_k_idx = torch.topk(cos_scores, min(top_k, len(cos_scores[1])), dim=1, largest=True, sorted=return_sorted)
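
For context, a toy sketch of what the removed +1 oversampling did (illustrative tensors and shapes, not the actual call site):

    import torch

    top_k = 2
    cos_scores = torch.tensor([[0.9, 0.8, 0.7, 0.1]])  # one query, four documents

    # Old behaviour: fetch one extra candidate so that discarding the query's
    # own document still leaves top_k genuine results.
    values, idx = torch.topk(cos_scores, min(top_k + 1, cos_scores.shape[1]), dim=1, largest=True)
    print(idx)  # tensor([[0, 1, 2]])
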
Contributor:

You write that "some queries would have top_k retrieved documents, while others have top_k-1 retrieved documents", but didn't this +1 ensure that that does not happen, because we retrieve top_k+1 but then only allow top_k later on?

Contributor Author:

IIUC, the problem comes from this line:

    if len(result_heaps[query_id]) < top_k:

So we only keep the top_k (which sometimes includes the query among the retrieved docs).
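
For reference, a paraphrased sketch of the heap logic under discussion (reconstructed from the names above, not the exact source):

    import heapq

    def push_hit(result_heaps, query_id, corpus_id, score, top_k):
        if corpus_id != query_id:  # skip the document identical to the query
            if len(result_heaps[query_id]) < top_k:
                # Heap not yet full: keep the hit unconditionally.
                heapq.heappush(result_heaps[query_id], (score, corpus_id))
            else:
                # Heap full: replace the current minimum if this hit scores higher.
                heapq.heappushpop(result_heaps[query_id], (score, corpus_id))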

Contributor:

I see, I thought the if corpus_id != query_id: check would ensure that the query would never be added to result_heaps[query_id] 🧐

Contributor Author:

Hmm, then why do we get different results? 🧐

Contributor Author:

It's easy to check: we just have to assert that the number of results for each query is top_k. Can you check that please, @Muennighoff?
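
A check along these lines should do it (a sketch; the results mapping and top_k are assumed from context):

    for query_id, docs in results.items():
        assert len(docs) == top_k, f"{query_id}: expected {top_k} results, got {len(docs)}"
        assert query_id not in docs, f"{query_id} retrieved itself"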
