[Benchmark] Support MME-Reasoning #1057

Merged — 5 commits, Jun 16, 2025

Conversation

JiakangYuan
Contributor

MME-Reasoning is a comprehensive benchmark designed to evaluate the reasoning ability of MLLMs, covering all three types of reasoning (inductive, deductive, and abductive) in its questions.

arXiv link: https://arxiv.org/pdf/2505.21327
project page: https://alpha-innovator.github.io/mmereasoning.github.io/

@kennymckormick
Member

Sample Evaluation Results:

gpt-4.1-2025-04-14:

| Category | Score |
|---|---|
| Overall | 39.6465 |
| planning and exploring | 35.3591 |
| calculation | 37.9939 |
| spatial-temporal | 42.5532 |
| causal chaining analysis | 47.2222 |
| pattern analysis | 39.3939 |
| inductive | 38.6707 |
| deductive | 40.5229 |
| abductive | 39.4472 |

Gemini-2.0-Flash:

| Category | Score |
|---|---|
| Overall | 33.5859 |
| planning and exploring | 26.7956 |
| calculation | 37.9939 |
| spatial-temporal | 38.6525 |
| causal chaining analysis | 56.9444 |
| pattern analysis | 21.8182 |
| inductive | 26.8882 |
| deductive | 42.7015 |
| abductive | 28.6643 |
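As a quick sanity check on figures like the above, the three reasoning-type scores can be averaged directly. A minimal sketch, hard-coding the gpt-4.1-2025-04-14 numbers from this comment; note that this unweighted per-type mean need not equal the reported Overall, which is aggregated over the full question set:

```python
# Toy sketch: unweighted mean of the three reasoning-type scores
# reported above for gpt-4.1-2025-04-14. The reported "Overall"
# (39.6465) is computed over all questions, so it can differ
# slightly from this simple per-type average.
type_scores = {
    "inductive": 38.6707,
    "deductive": 40.5229,
    "abductive": 39.4472,
}

mean_type_score = sum(type_scores.values()) / len(type_scores)
print(round(mean_type_score, 2))  # ≈ 39.55
```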

@kennymckormick kennymckormick merged commit 586800e into open-compass:main Jun 16, 2025
kennymckormick added a commit to hexmSeeU/VLMEvalKit that referenced this pull request Jun 16, 2025
* support mme-reasoning

* Fix Lint

---------

Co-authored-by: Haodong Duan <dhd@pku.edu.cn>
Co-authored-by: kennymckormick <dhd.efz@gmail.com>