Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image. However, such case studies hardly reflect the full performance of MLLMs, and a comprehensive evaluation has been lacking. In this paper, we fill this gap by presenting MME, the first MLLM evaluation benchmark. It measures both perception and cognition abilities across a total of 14 subtasks. To avoid data leakage that may arise from the direct use of public datasets for evaluation, all instruction-answer pairs are manually designed. The concise instruction design allows us to compare MLLMs fairly, instead of struggling with prompt engineering. Moreover, with such instructions, we can also easily carry out quantitative statistics. A total of 50+ advanced MLLMs are comprehensively evaluated on MME, which not only suggests that existing MLLMs still have considerable room for improvement, but also reveals potential directions for subsequent model optimization.
🔥🔥🔥 A Survey on Multimodal Large Language Models
Project Page | Paper
🔥🔥🔥 MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Project Page [This Page] | Paper
The first comprehensive evaluation benchmark for MLLMs. Now the leaderboards include 50+ advanced models, such as Qwen-VL-Max, Gemini Pro, and GPT-4V. ✨
If you want to add your model in our leaderboards, please feel free to email bradyfu24@gmail.com. We will update the leaderboards in time. ✨
Download MME 🌟🌟
The benchmark dataset is collected by Xiamen University for academic research only. You can email yongdongluo@stu.xmu.edu.cn to obtain the dataset, subject to the following requirement.
Requirement: Real names are encouraged for better academic communication. Your email suffix needs to match your affiliation, such as xx@stu.xmu.edu.cn for Xiamen University; otherwise, please explain why. Please include the information below in your application email.
Name: (tell us who you are)
Affiliation: (the name/URL of your university or company)
Job Title: (e.g., professor, PhD student, or researcher)
Email: (your email address)
How to use: (non-commercial use only)
🔥🔥🔥 Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper | Source Code
The first work to correct hallucinations in MLLMs. ✨
🔥🔥🔥 A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
Paper
The first technical report for Gemini vs GPT-4V, totaling 128 pages and completed within one week of the Gemini API opening. 🌟
📑 If you find our projects helpful to your research, please consider citing:
@article{fu2023mme,
title={MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models},
author={Fu, Chaoyou and Chen, Peixian and Shen, Yunhang and Qin, Yulei and Zhang, Mengdan and Lin, Xu and Yang, Jinrui and Zheng, Xiawu and Li, Ke and Sun, Xing and others},
journal={arXiv preprint arXiv:2306.13394},
year={2023}
}
@article{fu2024video,
title={Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis},
author={Fu, Chaoyou and Dai, Yuhan and Luo, Yongdong and Li, Lei and Ren, Shuhuai and Zhang, Renrui and Wang, Zihan and Zhou, Chenyu and Shen, Yunhang and Zhang, Mengdan and others},
journal={arXiv preprint arXiv:2405.21075},
year={2024}
}
@article{yin2023survey,
title={A Survey on Multimodal Large Language Models},
author={Yin, Shukang and Fu, Chaoyou and Zhao, Sirui and Li, Ke and Sun, Xing and Xu, Tong and Chen, Enhong},
journal={arXiv preprint arXiv:2306.13549},
year={2023}
}
@article{yin2023woodpecker,
title={Woodpecker: Hallucination Correction for Multimodal Large Language Models},
author={Yin, Shukang and Fu, Chaoyou and Zhao, Sirui and Xu, Tong and Wang, Hao and Sui, Dianbo and Shen, Yunhang and Li, Ke and Sun, Xing and Chen, Enhong},
journal={arXiv preprint arXiv:2310.16045},
year={2023}
}
@article{fu2023challenger,
title={A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise},
author={Fu, Chaoyou and Zhang, Renrui and Lin, Haojia and Wang, Zihan and Gao, Timin and Luo, Yongdong and Huang, Yubo and Zhang, Zhengye and Qiu, Longtian and Ye, Gaoxiang and others},
journal={arXiv preprint arXiv:2312.12436},
year={2023}
}
2024
- [06-06] Thanks to CMRI, JT-VL-Chat-V1.0 is added to MME. 🔥🔥
- [05-27] Thanks to Junbo Cui, MiniCPM-Llama3-V 2.5 joins MME.
- [05-18] Thanks to Chunyu Xie, 360VL is incorporated into MME.
- [04-27] Thanks to Zhe Chen, we welcome a new member InternVL-Chat-V1.5.
- [04-15] Thanks to Junbo Cui, MiniCPM-V-2 is added to MME.
- [04-10] Thanks to Wenqiao Zhang, HyperLLaVA joins our leaderboards.
- [03-14] Thanks to Muyang He, Bunny-3B takes part in MME.
- [02-23] Thanks to Jingyu Liu, ChatTruth-7B is added to MME.
- [02-07] Thanks to TsinghuaNLP, MiniCPM and OmniLMM are incorporated into our leaderboards.
- [02-05] Thanks to Haotian Liu, LLaVA-1.6 is added to MME.
- [02-05] Thanks to Bin Lin, MoE-LLaVA joins MME.
- [02-05] Thanks to Weihan Wang and Wenyi Hong, CogVLM and CogAgent take part in MME.
- [01-25] Thanks to Shijie Wang, we welcome a new member Qwen-VL-Max.
- [01-22] Thanks to Xiaoyi Dong, InternLM-XComposer2-VL joins our leaderboards.
2023
[2023-12]
- [12-31] Thanks to Dian Li, PureMM takes part in our leaderboards (updated on 2024-01-14 and 2024-01-21).
- [12-31] Thanks to Yilin Ma and Min Xu, RBDash is added to MME.
- [12-18] Thanks to Zihan Wang, our leaderboards usher in Gemini Pro.
- [12-18] Thanks to Jinze Bai, the new model Qwen-VL-Plus is added to MME.
- [12-18] Thanks to Junbum Cha, Honeybee joins our leaderboards.
- [12-12] Thanks to Yuliang Liu, Monkey-Chat takes part in MME.
- [12-12] Thanks to Junkun Yuan, we welcome a new member AGILMM.
- [12-01] Thanks to Cheng Wen, BELLE-VL is added to our leaderboards.
- [12-01] Thanks to PCI Research, TransCore-M joins MME.
[2023-11]
- [11-24] Thanks to Xiaoyi Dong, we add ShareGPT4V to our leaderboards.
- [11-24] Thanks to Muyang He, DataOptim joins MME.
- [11-24] Thanks to Zifei Shan, Kanva is added.
- [11-21] Thanks to Junke Wang, LVIS-INSTRUCT4V is added to MME.
- [11-18] Thanks to Zhenbo Luo, our leaderboards welcome a new member CVLM.
- [11-10] Thanks to Qinghao Ye, we add the new model mPLUG-Owl2 to our leaderboards.
- [11-10] Thanks to Zhibin Wang, InfMLLM joins our leaderboards (updated on 2023-12-12).
[2023-10]
- [10-29] Thanks to Jiaming Han, SPHINX is added to our leaderboards.
- [10-23] Thanks to Zihan Wang, who manually evaluated the performance of GPT-4V on our benchmark. Note that GPT-4V refuses to answer questions involving individuals, resulting in a zero score on the Celebrity subtask.
- [10-13] Thanks to Yizhou Zhou, WeMM joins our leaderboards (results updated on 2023-11-10 with a newer model).
- [10-13] Thanks to Junbo Cui, we add Muffin to our leaderboards.
- [10-13] Thanks to Jiaming Han, the results of LLaMA-Adapter V2 have been updated.
- [10-04] Thanks to Haotian Liu, the results of LLaVA have been updated.
[2023-09]
- [09-28] Thanks to Huasong Zhong, Lion is added.
- [09-27] Thanks to Xiaoyi Dong, InternLM-XComposer-VL joins our leaderboards.
- [09-05] Thanks to Jinze Bai, our leaderboards usher in Qwen-VL-Chat.
- [09-01] Thanks to Skywork Multi-Modal Group, Skywork-MM takes part in our leaderboards.
[2023-08]
- [08-28] Thanks to UCSD MLPC, we welcome BLIVA to join our leaderboards.
- [08-28] Thanks to Jianfeng Wang, GIT2 is added to our leaderboards.
- [08-28] Thanks to Yike Yuan and Songyang Zhang, the results of MiniGPT4 have been revised.
- [08-21] Thanks to Haozhe Zhao, MMICL joins our leaderboards (results updated on 2023-09-17 with an upgraded checkpoint).
- [08-13] Thanks to Zhejiang University DCD Lab, our leaderboards incorporate a new member Cheetor.
- [08-08] Thanks to Fuxiao Liu, we add LRV-Instruction to our leaderboards.
[2023-07]
- [07-28] Thanks to Yingzi Ma, his work Octopus has been added to our leaderboards.
- [07-15] Thanks to Jiani Zheng, our leaderboards welcome a new member Lynx.
- [07-12] Thanks to Ao Zhang, his work VPGTrans has been added to our leaderboards.
- [07-09] Thanks to Bo Li, we have updated the evaluation of his work Otter. It now uses the latest model OTTER-Image-MPT7B, which incorporates OpenFlamingo v2 and enhances instruction-following ability.
[2023-06]
- [06-30] Thanks to Renrui Zhang, we have updated the evaluation of his two works, LLaMA-Adapter V2 and ImageBind_LLM. The former is re-evaluated with updated model weights, and the latter is a newly added MLLM.
- [06-30] Thanks to Gen Luo, we have added the evaluation of his work LaVIN.
- [06-30] The results of other models have also been updated by retrieving the answer from the beginning of the generated response rather than from the whole response (see the sketch below). An automated evaluation script for calculating scores has been released!
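To make the extraction rule above concrete, here is a minimal sketch of how a yes/no answer can be parsed from the beginning of a generated response. The function name and the 16-character window are illustrative assumptions; this is not the released evaluation script.

```python
import re

def extract_answer(response: str) -> str:
    # Look only at the beginning of the generated response,
    # as described in the update above.
    head = response.strip().lower()[:16]
    if re.match(r"^\W*yes\b", head):
        return "yes"
    if re.match(r"^\W*no\b", head):
        return "no"
    return "other"  # unparseable answers are counted as wrong

print(extract_answer("Yes, there is a dog in the image."))  # yes
```

Reading only the beginning of the response avoids crediting a model that first answers incorrectly and then happens to mention the right word later on.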
Results of Available Models [Unavailable Version]
Leaderboards of Available Models [Unavailable Version]
Sum of the scores of all perception subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR. The full score of each subtask is 200, and that of all perception is 2000.
Sum of the scores of all cognition subtasks, including commonsense reasoning, numerical calculation, text translation, and code reasoning. The full score of each subtask is 200, and that of all cognition is 800.
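Consistent with these totals, each subtask in the MME paper is scored as accuracy plus accuracy+: accuracy is computed per question, while accuracy+ requires both questions of an image to be answered correctly, so a subtask's full score is 200. Below is a minimal sketch of that rule; the data layout and names are illustrative assumptions, not the released evaluation script.

```python
from collections import defaultdict

def subtask_score(results):
    # results: list of (image_id, is_correct) pairs; each image
    # contributes two yes/no questions in MME.
    acc = 100.0 * sum(ok for _, ok in results) / len(results)
    by_image = defaultdict(list)
    for image_id, ok in results:
        by_image[image_id].append(ok)
    acc_plus = 100.0 * sum(all(v) for v in by_image.values()) / len(by_image)
    return acc + acc_plus  # full score: 100 + 100 = 200

# Toy example: two images, one fully correct, one half correct.
demo = [("img1", True), ("img1", True), ("img2", True), ("img2", False)]
print(subtask_score(demo))  # accuracy 75.0 + accuracy+ 50.0 = 125.0
```

Summing the ten perception subtask scores gives the perception total (max 2000), and summing the four cognition subtask scores gives the cognition total (max 800).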