[Benchmark] Support MME-Reasoning #1057

Merged — 5 commits, Jun 16, 2025

Conversation

JiakangYuan
Contributor

MME-Reasoning is a comprehensive benchmark designed to evaluate the reasoning ability of MLLMs, covering all three types of reasoning (inductive, deductive, and abductive) in its questions.

arXiv link: https://arxiv.org/pdf/2505.21327
project page: https://alpha-innovator.github.io/mmereasoning.github.io/

@kennymckormick
Member

Sample Evaluation Results:

gpt-4.1-2025-04-14:

| Category | Score |
|---|---|
| Overall | 39.6465 |
| planning and exploring | 35.3591 |
| calculation | 37.9939 |
| spatial-temporal | 42.5532 |
| causal chaining analysis | 47.2222 |
| pattern analysis | 39.3939 |
| inductive | 38.6707 |
| deductive | 40.5229 |
| abductive | 39.4472 |

Gemini-2.0-Flash:

| Category | Score |
|---|---|
| Overall | 33.5859 |
| planning and exploring | 26.7956 |
| calculation | 37.9939 |
| spatial-temporal | 38.6525 |
| causal chaining analysis | 56.9444 |
| pattern analysis | 21.8182 |
| inductive | 26.8882 |
| deductive | 42.7015 |
| abductive | 28.6643 |
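As a quick sanity check on figures like the above, the three reasoning-type scores can be averaged directly. A minimal sketch, hard-coding the gpt-4.1-2025-04-14 numbers from this comment; note that this unweighted per-type mean need not equal the reported Overall, which is aggregated over the full question set:

```python
# Toy sketch: unweighted mean of the three reasoning-type scores
# reported above for gpt-4.1-2025-04-14. The reported "Overall"
# (39.6465) is computed over all questions, so it can differ
# slightly from this simple per-type average.
type_scores = {
    "inductive": 38.6707,
    "deductive": 40.5229,
    "abductive": 39.4472,
}

mean_type_score = sum(type_scores.values()) / len(type_scores)
print(round(mean_type_score, 2))  # ≈ 39.55
```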

@kennymckormick kennymckormick merged commit 586800e into open-compass:main Jun 16, 2025
kennymckormick added a commit to hexmSeeU/VLMEvalKit that referenced this pull request Jun 16, 2025
* support mme-reasoning

* Fix Lint

---------

Co-authored-by: Haodong Duan <dhd@pku.edu.cn>
Co-authored-by: kennymckormick <dhd.efz@gmail.com>