feat: retrival detail add matched_count by e06084 · Pull Request #422 · MigoXLab/dingo

e06084 · 2026-06-08T10:22:21Z

No description provided.

gemini-code-assist

Code Review

This pull request enhances MTEB search adapter traces by attaching relevant documents (qrels) to distinguish between mapped and relevant matches, updates the Semantic Scholar backend documentation, and adds corresponding unit tests. A critical review comment points out that _attach_relevant_docs will raise AttributeError on standard MTEB tasks, because it iterates over task.dataset and calls .get() on Hugging Face Dataset objects. The comment suggests retrieving qrels directly from task.relevant_docs instead.

gemini-code-assist · 2026-06-08T10:24:16Z

+            for hf_subset, splits in getattr(task, "dataset", {}).items():
+                for hf_split, data_split in splits.items():
+                    relevant_docs = data_split.get("relevant_docs", {})
+                    model.set_relevant_docs(
+                        task_name,
+                        hf_split,
+                        hf_subset,
+                        relevant_docs,
+                    )


The current implementation of _attach_relevant_docs has two critical bugs that will cause it to crash on almost all MTEB retrieval tasks:

1. AttributeError on standard tasks: For standard retrieval tasks (without subsets), task.dataset is a Hugging Face DatasetDict that maps split names to Dataset objects. getattr(task, "dataset", {}).items() therefore yields (split name, Dataset) pairs, so the variable named splits actually holds a single Dataset. The inner loop then calls splits.items() on that Dataset, which raises AttributeError: 'Dataset' object has no attribute 'items'.

2. AttributeError on Dataset.get: Even if the task has subsets and splits really is a DatasetDict, data_split is still a Hugging Face Dataset object. Calling data_split.get("relevant_docs", {}) raises AttributeError: 'Dataset' object has no attribute 'get', because Dataset has no .get() method.

Instead, you should retrieve the loaded qrels directly from task.relevant_docs (which is a standard dictionary of split -> qrels populated after task.load_data()), and get the subset name from task.hf_subset (defaulting to 'default').
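
The two failure modes can be reproduced without MTEB or datasets installed. The following is only a sketch under stated assumptions: FakeDataset is a hypothetical stand-in that, like the Hugging Face Dataset described above, exposes neither .items() nor .get(), and plain dicts play the role of DatasetDict. The helper names original_loop and failure_message are invented for illustration.

```python
class FakeDataset:
    """Hypothetical stand-in for a Hugging Face Dataset: no .items(), no .get()."""

    def __init__(self, rows):
        self.rows = rows


def original_loop(dataset):
    """Mirror the loop from the diff, collecting what it would pass on."""
    collected = []
    for hf_subset, splits in dataset.items():
        for hf_split, data_split in splits.items():
            relevant_docs = data_split.get("relevant_docs", {})
            collected.append((hf_split, hf_subset, relevant_docs))
    return collected


def failure_message(dataset):
    """Run the loop and return the AttributeError text, or None if it succeeds."""
    try:
        original_loop(dataset)
    except AttributeError as exc:
        return str(exc)
    return None


# Case 1: no subsets, so the outer loop yields (split name, dataset object).
no_subsets = {"test": FakeDataset([])}
# Case 2: subsets present, so the inner loop yields (split name, dataset object).
with_subsets = {"default": {"test": FakeDataset([])}}

print(failure_message(no_subsets))    # complains about the missing 'items'
print(failure_message(with_subsets))  # complains about the missing 'get'
```

Case 1 fails in the inner loop header and case 2 fails one line later, which matches the order in which the review lists the two bugs.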

Suggested change

-            for hf_subset, splits in getattr(task, "dataset", {}).items():
-                for hf_split, data_split in splits.items():
-                    relevant_docs = data_split.get("relevant_docs", {})
-                    model.set_relevant_docs(
-                        task_name,
-                        hf_split,
-                        hf_subset,
-                        relevant_docs,
-                    )
+            hf_subset = getattr(task, "hf_subset", "default")
+            relevant_docs_dict = getattr(task, "relevant_docs", {})
+            if relevant_docs_dict:
+                for hf_split, relevant_docs in relevant_docs_dict.items():
+                    model.set_relevant_docs(
+                        task_name,
+                        hf_split,
+                        hf_subset,
+                        relevant_docs,
+                    )
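
To see the suggested change in isolation, here is a minimal, self-contained sketch. It assumes only what the review states: that task.relevant_docs maps split names to qrels and that task.hf_subset may be absent. RecordingModel, attach_relevant_docs, the task name "DemoTask", and the qrels shape are hypothetical and exist purely for illustration; they are not taken from the PR or from MTEB.

```python
from types import SimpleNamespace


class RecordingModel:
    """Hypothetical stand-in for the search adapter; records each call."""

    def __init__(self):
        self.calls = []

    def set_relevant_docs(self, task_name, hf_split, hf_subset, relevant_docs):
        self.calls.append((task_name, hf_split, hf_subset, relevant_docs))


def attach_relevant_docs(model, task, task_name):
    """Apply the suggested change: read qrels from task.relevant_docs."""
    # Fall back to "default" when the task does not define a subset.
    hf_subset = getattr(task, "hf_subset", "default")
    relevant_docs_dict = getattr(task, "relevant_docs", {})
    # Skip tasks whose qrels are missing, empty, or None.
    if relevant_docs_dict:
        for hf_split, relevant_docs in relevant_docs_dict.items():
            model.set_relevant_docs(task_name, hf_split, hf_subset, relevant_docs)


# Stub task: split -> {query_id: {doc_id: relevance}} (shape assumed).
task = SimpleNamespace(relevant_docs={"test": {"q1": {"d1": 1}}})
model = RecordingModel()
attach_relevant_docs(model, task, "DemoTask")
print(model.calls)
```

Because both lookups go through getattr with a default, a task object that has neither attribute results in no calls rather than an exception.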

e06084 added 2 commits June 8, 2026 18:20

fix: update semantic_scholar cli example

23d9019

feat: retrival detail add matched_count

af90351

gemini-code-assist Bot reviewed Jun 8, 2026

fix comment

b65b76c

e06084 merged commit df7e1eb into MigoXLab:dev Jun 8, 2026
2 checks passed

e06084 merged 3 commits into MigoXLab:dev from e06084:dev
