MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks and have shown striking emergent abilities in recent studies, such as writing poems based on an image. However, such case studies cannot fully reflect MLLM performance, and a comprehensive evaluation has been lacking. In this paper, we fill this gap by presenting MME, the first comprehensive MLLM evaluation benchmark. It measures both perception and cognition abilities across a total of 14 subtasks. To avoid the data leakage that may arise from directly using public datasets for evaluation, all instruction-answer pairs are manually designed. The concise instruction design allows us to compare MLLMs fairly, without resorting to prompt engineering, and also makes quantitative statistics easy to compute. A total of 50+ advanced MLLMs are comprehensively evaluated on MME. The results not only suggest that existing MLLMs still have considerable room for improvement, but also reveal potential directions for subsequent model optimization.

Our MLLM works

🔥🔥🔥 A Survey on Multimodal Large Language Models
Project Page | Paper

🍎 [Read our new version] (update on April 2, 2024)

Chinese version will be updated soon!
The first survey for Multimodal Large Language Models (MLLMs). ✨
Feel free to add WeChat ID (wmd_ustc) to join our MLLM communication group! 🌟

🔥🔥🔥 MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Project Page [This Page] | Paper

The first comprehensive evaluation benchmark for MLLMs. Now the leaderboards include 50+ advanced models, such as Qwen-VL-Max, Gemini Pro, and GPT-4V. ✨

If you want to add your model to our leaderboards, please feel free to email bradyfu24@gmail.com. We will update the leaderboards promptly. ✨

Download MME 🌟🌟

The benchmark dataset is collected by Xiamen University for academic research only. You can email yongdongluo@stu.xmu.edu.cn to obtain the dataset, according to the following requirement.

Requirement: A real-name system is encouraged for better academic communication. Your email suffix needs to match your affiliation, such as xx@stu.xmu.edu.cn and Xiamen University. Otherwise, you need to explain why. Please include the information below when sending your application email.

Name: (tell us who you are.)
Affiliation: (the name/url of your university or company)
Job Title: (e.g., professor, PhD, and researcher)
Email: (your email address)
How to use: (only for non-commercial use)

🔥🔥🔥 Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper | Source Code

The first work to correct hallucinations in MLLMs. ✨


🔥🔥🔥 A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
Paper

The first technical report for Gemini vs GPT-4V. A total of 128 pages. Completed within one week of the Gemini API opening. 🌟



📑 If you find our projects helpful to your research, please consider citing:

@article{fu2023mme,
  title={MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models},
  author={Fu, Chaoyou and Chen, Peixian and Shen, Yunhang and Qin, Yulei and Zhang, Mengdan and Lin, Xu and Yang, Jinrui and Zheng, Xiawu and Li, Ke and Sun, Xing and others},
  journal={arXiv preprint arXiv:2306.13394},
  year={2023}
}

@article{fu2024video,
  title={Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis},
  author={Fu, Chaoyou and Dai, Yuhan and Luo, Yongdong and Li, Lei and Ren, Shuhuai and Zhang, Renrui and Wang, Zihan and Zhou, Chenyu and Shen, Yunhang and Zhang, Mengdan and others},
  journal={arXiv preprint arXiv:2405.21075},
  year={2024}
}

@article{yin2023survey,
  title={A survey on multimodal large language models},
  author={Yin, Shukang and Fu, Chaoyou and Zhao, Sirui and Li, Ke and Sun, Xing and Xu, Tong and Chen, Enhong},
  journal={arXiv preprint arXiv:2306.13549},
  year={2023}
}

@article{yin2023woodpecker,
  title={Woodpecker: Hallucination correction for multimodal large language models},
  author={Yin, Shukang and Fu, Chaoyou and Zhao, Sirui and Xu, Tong and Wang, Hao and Sui, Dianbo and Shen, Yunhang and Li, Ke and Sun, Xing and Chen, Enhong},
  journal={arXiv preprint arXiv:2310.16045},
  year={2023}
}

@article{fu2023challenger,
  title={A challenger to gpt-4v? early explorations of gemini in visual expertise},
  author={Fu, Chaoyou and Zhang, Renrui and Lin, Haojia and Wang, Zihan and Gao, Timin and Luo, Yongdong and Huang, Yubo and Zhang, Zhengye and Qiu, Longtian and Ye, Gaoxiang and others},
  journal={arXiv preprint arXiv:2312.12436},
  year={2023}
}

News 🚀

  1. [06-06] Thanks to CMRI, JT-VL-Chat-V1.0 is added to MME. 🔥🔥
  2. [05-27] Thanks to Junbo Cui, MiniCPM-Llama3-V 2.5 joins MME.
  3. [05-18] Thanks to Chunyu Xie, 360VL is incorporated into MME.
  4. [04-27] Thanks to Zhe Chen, we welcome a new member InternVL-Chat-V1.5.
  5. [04-15] Thanks to Junbo Cui, MiniCPM-V-2 is added to MME.
  6. [04-10] Thanks to Wenqiao Zhang, HyperLLaVA joins our leaderboards.
  7. [03-14] Thanks to Muyang He, Bunny-3B takes part in MME.
  8. [02-23] Thanks to Jingyu Liu, ChatTruth-7B is added to MME.
  9. [02-07] Thanks to TsinghuaNLP, MiniCPM and OmniLMM are incorporated into our leaderboards.
  10. [02-05] Thanks to Haotian Liu, LLaVA-1.6 is added to MME.
  11. [02-05] Thanks to Bin Lin, MoE-LLaVA joins MME.
  12. [02-05] Thanks to Weihan Wang and Wenyi Hong, CogVLM and CogAgent take part in MME.
  13. [01-25] Thanks to Shijie Wang, we welcome a new member Qwen-VL-Max.
  14. [01-22] Thanks to Xiaoyi Dong, InternLM-XComposer2-VL joins our leaderboards.
2023

[2023-12]

  1. [12-31] Thanks to Dian Li, PureMM takes part in our leaderboards (updated on 2024-01-14 and 2024-01-21).
  2. [12-31] Thanks to Yilin Ma and Min Xu, RBDash is added to MME.
  3. [12-18] Thanks to Zihan Wang, our leaderboards usher in Gemini Pro.
  4. [12-18] Thanks to Jinze Bai, a new model Qwen-VL-Plus is added to MME.
  5. [12-18] Thanks to Junbum Cha, Honeybee joins our leaderboards.
  6. [12-12] Thanks to Yuliang Liu, Monkey-Chat takes part in MME.
  7. [12-12] Thanks to Junkun Yuan, we welcome a new member AGILMM.
  8. [12-01] Thanks to Cheng Wen, BELLE-VL is added to our leaderboards.
  9. [12-01] Thanks to PCI Research, TransCore-M joins MME.

[2023-11]

  1. [11-24] Thanks to Xiaoyi Dong, we add ShareGPT4V to our leaderboards.
  2. [11-24] Thanks to Muyang He, DataOptim joins MME.
  3. [11-24] Thanks to Zifei Shan, Kanva is added.
  4. [11-21] Thanks to Junke Wang, LVIS-INSTRUCT4V is added to our MME.
  5. [11-18] Thanks to Zhenbo Luo, our leaderboards welcome a new member CVLM.
  6. [11-10] Thanks to Qinghao Ye, we get a new model mPLUG-Owl2 in our leaderboards.
  7. [11-10] Thanks to Zhibin Wang, InfMLLM joins our leaderboards (updated on 2023-12-12).

[2023-10]

  1. [10-29] Thanks to Jiaming Han, SPHINX is added to our leaderboards.
  2. [10-23] Thanks to Zihan Wang, who manually evaluated the performance of GPT-4V on our benchmark. Note that GPT-4V refuses to answer questions that involve individuals, resulting in a zero score in the Celebrity subtask.
  3. [10-13] Thanks to Yizhou Zhou, WeMM joins our leaderboards (the results were refreshed on 2023-11-10 after a model update).
  4. [10-13] Thanks to Cui Junbo, we add Muffin to our leaderboards.
  5. [10-13] Thanks to Jiaming Han, the results of LLaMA-Adapter V2 have been updated.
  6. [10-04] Thanks to Haotian Liu, the results of LLaVA have been updated.

[2023-09]

  1. [09-28] Thanks to Huasong Zhong, Lion is added.
  2. [09-27] Thanks to Xiaoyi Dong, InternLM-XComposer-VL joins our leaderboards.
  3. [09-05] Thanks to Jinze Bai, our leaderboards usher in Qwen-VL-Chat.
  4. [09-01] Thanks to Skywork Multi-Modal Group, Skywork-MM takes part in our leaderboards.

[2023-08]

  1. [08-28] Thanks to UCSD MLPC, we welcome BLIVA to join our leaderboards.
  2. [08-28] Thanks to Jianfeng Wang, GIT2 is added to our leaderboards.
  3. [08-28] Thanks to Yike Yuan and Songyang Zhang, the results of MiniGPT4 have been revised.
  4. [08-21] Thanks to Haozhe Zhao, MMICL joins our leaderboards (the results were refreshed on 2023-09-17 after a checkpoint upgrade).
  5. [08-13] Thanks to Zhejiang University DCD Lab, our leaderboards incorporate a new member Cheetor.
  6. [08-08] Thanks to Fuxiao Liu, we add LRV-Instruction to our leaderboards.

[2023-07]

  1. [07-28] Thanks to Yingzi Ma, his work Octopus has been added to our leaderboards.
  2. [07-15] Thanks to Jiani Zheng, our leaderboards welcome a new member Lynx.
  3. [07-12] Thanks to Ao Zhang, his work VPGTrans has been added to our leaderboards.
  4. [07-09] Thanks to Bo Li, we have updated the evaluation of his work Otter. It uses the latest model OTTER-Image-MPT7B, which incorporates OpenFlamingo v2 and enhances instruction-following ability.

[2023-06]

  1. [06-30] Thanks to Renrui Zhang, we have updated the evaluation of his two works, i.e., LLaMA-Adapter V2 and ImageBind_LLM. The former is re-evaluated after changing the model weights, and the latter is a newly added MLLM.
  2. [06-30] Thanks to Gen Luo, we have added the evaluation of his work LaVIN.
  3. [06-30] The results of other models have also been updated: the answer is now retrieved from the beginning of each generated response rather than from the whole response. An automated evaluation script for calculating the scores has been released!
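
The last news item above says answers are read from the beginning of a generated response. The sketch below illustrates one way that could work for yes/no questions. It is not taken from the released evaluation script: the function name `extract_yes_no` and the four-character prefix window are assumptions chosen for illustration.

```python
def extract_yes_no(response: str, prefix_chars: int = 4) -> str:
    """Map a free-form model response to "yes", "no", or "other".

    Hypothetical helper: only the first `prefix_chars` characters are
    inspected, so a "yes" or "no" buried later in the text is ignored.
    """
    text = response.strip().lower()
    # Exact one-word answers need no further parsing.
    if text in ("yes", "no"):
        return text
    # Otherwise look only at the beginning of the response.
    prefix = text[:prefix_chars]
    if "yes" in prefix:
        return "yes"
    if "no" in prefix:
        return "no"
    return "other"


if __name__ == "__main__":
    for reply in ["Yes, there is a dog.", "No.", "The answer is no."]:
        print(repr(reply), "->", extract_yes_no(reply))
```

A prefix check this simple has a known weakness: a response that starts with "Not sure" would be read as "no". A stricter variant could match whole words only.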

Results of Available Models [Unavailable Version]

Leaderboards of Available Models [Unavailable Version]



Perception

Sum of the scores of all perception subtasks, including existence, count, position, color, poster, celebrity, scene, landmark, artwork, and OCR. The full score of each subtask is 200, and that of all perception is 2000.
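
The 200-point ceiling per subtask is consistent with the scoring described in the MME paper, where a subtask score is the sum of accuracy (computed per question) and accuracy+ (computed per image, requiring both of that image's questions to be answered correctly), each on a 0 to 100 scale. The following is a minimal sketch under that assumption. The record layout and the function name `subtask_score` are hypothetical and do not come from the released evaluation script.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def subtask_score(records: Iterable[Tuple[str, bool]]) -> float:
    """Return accuracy + accuracy+ for one subtask (maximum 200).

    `records` is a hypothetical layout: one (image_id, is_correct) pair
    per question, with every image contributing its questions under the
    same image_id.
    """
    records = list(records)
    if not records:
        raise ValueError("at least one answered question is required")

    # Group correctness flags by image so accuracy+ can be computed.
    per_image = defaultdict(list)
    for image_id, is_correct in records:
        per_image[image_id].append(bool(is_correct))

    # Accuracy: share of individual questions answered correctly.
    acc = 100.0 * sum(bool(ok) for _, ok in records) / len(records)
    # Accuracy+: share of images whose questions are all correct.
    acc_plus = 100.0 * sum(all(flags) for flags in per_image.values()) / len(per_image)
    return acc + acc_plus


if __name__ == "__main__":
    demo = [("img_a", True), ("img_a", True), ("img_b", True), ("img_b", False)]
    print(subtask_score(demo))  # 75.0 accuracy + 50.0 accuracy+ = 125.0
```

With ten perception subtasks each capped at 200, the perception total is capped at 2000.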

Rank Model Version Score
🏅️ Qwen-VL-Max - 1790.04
🥈 ChatTruth-7B Qwen-7B 1735.88
🥉 InternLM-XComposer2-VL InternLM2-7B 1712.00
4 PureMM Vicuna-13B 1686.52
5 Qwen-VL-Plus - 1681.25
6 InfMLLM Vicuna-13B 1673.75
7 InternVL-Chat-V1.1 LLaMA2-13B 1672.35
8 Honeybee Vicuna-13B 1661.13
9 Bunny-8B LLaMA3-8B 1644.14
10 JT-VL-Chat - 1642.51
11 360VL LLaMA3-70B 1640.86
12 InternVL-Chat-V1.5 InternLM2-20B 1637.84
13 OmniLMM Zephyr-7B-beta 1636.90
14 LLaVA-1.6 Vicuna-34B 1631.47
15 WeMM InternLM-7B 1621.66
16 MiniCPM-Llama3-V 2.5 LLaMA3-8B 1619.29
17 ShareGPT4V Vicuna-13B 1618.70
18 RBDash Vicuna-13B 1610.15
19 BELLE-VL Qwen-14B 1595.34
20 TransCore-M PCITransGPT-13B 1588.16
21 HyperLLaVA Vicuna-13B 1575.61
22 LVIS-INSTRUCT4V Vicuna-13B 1574.89
23 MindSource-VL-Chat MindSource-7B 1567.99
24 DataOptim-LLaVA Vicuna-13B 1563.56
25 SPHINX LLaMA2-13B 1560.15
26 LLaVA Vicuna-13B 1531.31
27 InternLM-XComposer-VL InternLM-7B 1528.44
28 Monkey-Chat Qwen-7B 1522.39
29 CogAgent Vicuna-7B 1497.79
30 Gemini Pro - 1496.57
31 Bunny-3B Phi-2 1488.80
32 Qwen-VL-Chat Qwen-7B 1487.57
33 MiniCPM MiniCPM-2B 1452.01
34 mPLUG-Owl2 LLaMA2-7B 1450.19
35 MiniCPM-V-2 MiniCPM-2B 1443.19
36 CogVLM Vicuna-7B 1439.07
37 MoE-LLaVA Phi-2.7B×4 1431.34
38 GPT-4V - 1409.43
39 MMICL FlanT5xxl 1381.74
40 Lynx Vicuna-7B 1373.24
41 BLIVA FlanT5xxl 1337.73
42 GIT2 VQAv2-finetuned 1332.05
43 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 1328.39
44 Cheetor Vicuna-7B 1299.97
45 LRV-Instruction LRV-7B 1299.79
46 BLIP-2 Flant5xxl 1293.84
47 Otter OTTER-Image-MPT7B 1292.26
48 Muffin Vicuna-13B 1281.02
49 InstructBLIP FlanT5xxl 1212.82
50 mPLUG-Owl LLaMA-7B 967.34
51 LaVIN LAVIN-13B 963.60
52 VPGTrans Vicuna-7B 790.45
53 ImageBind_LLM LLaMA-7B 775.77
54 VisualGLM-6B VisualGLM-6B 705.31
55 Multimodal-GPT Multimodal-GPT-9B 654.72
56 PandaGPT Vicuna-7B 642.59
57 MiniGPT-4 Vicuna-13B 581.66

Existence

Rank Model Version Score
🏅️ MiniCPM-Llama3-V 2.5 LLaMA3-8B 200.00
🥈 Otter OTTER-Image-MPT7B 195.00
🥈 Lynx Vicuna-7B 195.00
🥈 WeMM InternLM-7B 195.00
🥈 Muffin Vicuna-13B 195.00
🥈 SPHINX LLaMA2-13B 195.00
🥈 InfMLLM Vicuna-13B 195.00
🥈 LVIS-INSTRUCT4V Vicuna-13B 195.00
🥈 RBDash Vicuna-13B 195.00
🥈 InternLM-XComposer2-VL InternLM2-7B 195.00
🥈 CogVLM Vicuna-7B 195.00
🥈 ChatTruth-7B Qwen-7B 195.00
🥈 MindSource-VL-Chat MindSource-7B 195.00
🥈 MiniCPM-V-2 MiniCPM-2B 195.00
🥈 Bunny-8B LLaMA3-8B 195.00
🥉 GIT2 VQAv2-finetuned 190.00
🥉 InternLM-XComposer-VL InternLM-7B 190.00
🥉 GPT-4V - 190.00
🥉 ShareGPT4V Vicuna-13B 190.00
🥉 DataOptim-LLaVA Vicuna-13B 190.00
🥉 BELLE-VL Qwen-14B 190.00
🥉 TransCore-M PCITransGPT-13B 190.00
🥉 LLaVA-1.6 Vicuna-34B 190.00
🥉 MiniCPM MiniCPM-2B 190.00
🥉 OmniLMM Zephyr-7B-beta 190.00
🥉 HyperLLaVA Vicuna-13B 190.00
🥉 InternVL-Chat-V1.5 InternLM2-20B 190.00
🥉 360VL LLaMA3-70B 190.00
4 PureMM Vicuna-13B 188.33
5 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 185.00
5 InstructBLIP FlanT5xxl 185.00
5 LLaVA Vicuna-13B 185.00
5 LaVIN LAVIN-13B 185.00
5 mPLUG-Owl2 LLaMA2-7B 185.00
5 Monkey-Chat Qwen-7B 185.00
5 Honeybee Vicuna-13B 185.00
5 InternVL-Chat-V1.1 LLaMA2-13B 185.00
5 CogAgent Vicuna-7B 185.00
5 JT-VL-Chat - 185.00
5 Qwen-VL-Max - 183.33
6 Cheetor Vicuna-7B 180.00
6 BLIVA FlanT5xxl 180.00
6 MoE-LLaVA Phi-2.7B×4 180.00
6 Bunny-3B Phi-2 180.00
7 Qwen-VL-Plus - 175.00
7 Gemini Pro - 175.00
8 MMICL FlanT5xxl 170.00
9 LRV-Instruction LRV-7B 165.00
10 BLIP-2 Flant5xxl 160.00
11 Qwen-VL-Chat Qwen-7B 158.33
12 ImageBind_LLM LLaMA-7B 128.33
13 mPLUG-Owl LLaMA-7B 120.00
14 VisualGLM-6B VisualGLM-6B 85.00
15 PandaGPT Vicuna-7B 70.00
15 VPGTrans Vicuna-7B 70.00
16 MiniGPT-4 Vicuna-13B 68.33
17 Multimodal-GPT Multimodal-GPT-9B 61.67

Count

Rank Model Version Score
🏅️ CogAgent Vicuna-7B 180.00
🥈 InternVL-Chat-V1.5 InternLM2-20B 175.00
🥉 RBDash Vicuna-13B 173.33
🥉 InternVL-Chat-V1.1 LLaMA2-13B 173.33
🥉 JT-VL-Chat - 173.33
4 Honeybee Vicuna-13B 170.00
4 LLaVA-1.6 Vicuna-34B 170.00
4 MindSource-VL-Chat MindSource-7B 170.00
5 MiniCPM-Llama3-V 2.5 LLaMA3-8B 168.33
6 Qwen-VL-Max - 166.67
7 ShareGPT4V Vicuna-13B 165.00
7 DataOptim-LLaVA Vicuna-13B 165.00
7 TransCore-M PCITransGPT-13B 165.00
7 CogVLM Vicuna-7B 165.00
7 OmniLMM Zephyr-7B-beta 165.00
7 Bunny-8B LLaMA3-8B 165.00
8 Muffin Vicuna-13B 163.33
9 360VL LLaMA3-70B 160.80
10 MMICL FlanT5xxl 160.00
10 GPT-4V - 160.00
10 SPHINX LLaMA2-13B 160.00
10 LVIS-INSTRUCT4V Vicuna-13B 160.00
10 InternLM-XComposer2-VL InternLM2-7B 160.00
10 ChatTruth-7B Qwen-7B 160.00
10 HyperLLaVA Vicuna-13B 160.00
11 InternLM-XComposer-VL InternLM-7B 158.33
11 Bunny-3B Phi-2 158.33
12 LLaVA Vicuna-13B 155.00
12 mPLUG-Owl2 LLaMA2-7B 155.00
12 MoE-LLaVA Phi-2.7B×4 155.00
13 Qwen-VL-Plus - 153.33
14 Lynx Vicuna-7B 151.67
15 Qwen-VL-Chat Qwen-7B 150.00
15 BELLE-VL Qwen-14B 150.00
15 Monkey-Chat Qwen-7B 150.00
15 PureMM Vicuna-13B 150.00
16 InfMLLM Vicuna-13B 145.00
17 InstructBLIP FlanT5xxl 143.33
18 WeMM InternLM-7B 140.00
19 BLIVA FlanT5xxl 138.33
20 BLIP-2 Flant5xxl 135.00
21 MiniCPM-V-2 MiniCPM-2B 133.33
21 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 133.33
22 Gemini Pro - 131.67
23 MiniCPM MiniCPM-2B 130.00
24 GIT2 VQAv2-finetuned 118.33
25 LRV-Instruction LRV-7B 111.67
26 Cheetor Vicuna-7B 96.67
27 Otter OTTER-Image-MPT7B 88.33
27 LaVIN LAVIN-13B 88.33
28 VPGTrans Vicuna-7B 85.00
29 ImageBind_LLM LLaMA-7B 60.00
30 MiniGPT-4 Vicuna-13B 55.00
30 Multimodal-GPT Multimodal-GPT-9B 55.00
31 mPLUG-Owl LLaMA-7B 50.00
31 VisualGLM-6B VisualGLM-6B 50.00
31 PandaGPT Vicuna-7B 50.00

Position

Rank Model Version Score
🏅️ Qwen-VL-Max - 176.67
🥈 InfMLLM Vicuna-13B 170.00
🥉 InternVL-Chat-V1.5 InternLM2-20B 166.67
4 InternLM-XComposer2-VL InternLM2-7B 163.33
5 InternVL-Chat-V1.1 LLaMA2-13B 163.33
6 Qwen-VL-Plus - 161.67
7 ChatTruth-7B Qwen-7B 158.33
8 Honeybee Vicuna-13B 155.00
8 360VL LLaMA3-70B 155.00
9 ShareGPT4V Vicuna-13B 153.33
9 SPHINX LLaMA2-13B 153.33
10 MindSource-VL-Chat MindSource-7B 146.67
11 JT-VL-Chat - 145.00
12 RBDash Vicuna-13B 138.33
13 LLaVA-1.6 Vicuna-34B 138.33
14 TransCore-M PCITransGPT-13B 136.67
14 MiniCPM-Llama3-V 2.5 LLaMA3-8B 136.67
15 CogAgent Vicuna-7B 135.00
15 Bunny-8B LLaMA3-8B 135.00
16 LLaVA Vicuna-13B 133.33
17 OmniLMM Zephyr-7B-beta 131.67
18 BELLE-VL Qwen-14B 130.00
19 LVIS-INSTRUCT4V Vicuna-13B 128.33
19 HyperLLaVA Vicuna-13B 128.33
19 Qwen-VL-Chat Qwen-7B 128.33
19 Bunny-3B Phi-2 128.33
20 InternLM-XComposer-VL InternLM-7B 126.67
20 WeMM InternLM-7B 126.67
21 PureMM Vicuna-13B 123.33
22 DataOptim-LLaVA Vicuna-13B 121.67
23 Monkey-Chat Qwen-7B 118.33
23 MoE-LLaVA Phi-2.7B×4 118.33
24 CogVLM Vicuna-7B 103.33
25 GIT2 VQAv2-finetuned 96.67
26 GPT-4V - 95.00
27 MiniCPM MiniCPM-2B 93.33
28 Lynx Vicuna-7B 90.00
28 Gemini Pro - 90.00
29 mPLUG-Owl2 LLaMA2-7B 88.33
30 Otter OTTER-Image-MPT7B 86.67
30 LRV-Instruction LRV-7B 86.67
30 MiniCPM-V-2 MiniCPM-2B 86.67
31 MMICL FlanT5xxl 81.67
31 BLIVA FlanT5xxl 81.67
32 Cheetor Vicuna-7B 80.00
33 BLIP-2 Flant5xxl 73.33
34 InstructBLIP FlanT5xxl 66.67
34 Muffin Vicuna-13B 66.67
35 LaVIN LAVIN-13B 63.33
35 VPGTrans Vicuna-7B 63.33
36 Multimodal-GPT Multimodal-GPT-9B 58.33
37 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 56.67
38 mPLUG-Owl LLaMA-7B 50.00
38 PandaGPT Vicuna-7B 50.00
39 VisualGLM-6B VisualGLM-6B 48.33
40 ImageBind_LLM LLaMA-7B 46.67
41 MiniGPT-4 Vicuna-13B 43.33

Color

Rank Model Version Score
🏅️ Qwen-VL-Max - 176.67
🥈 InfMLLM Vicuna-13B 170.00
🥉 InternVL-Chat-V1.5 InternLM2-20B 166.67
4 InternLM-XComposer2-VL InternLM2-7B 163.33
4 InternVL-Chat-V1.1 LLaMA2-13B 163.33
5 Qwen-VL-Plus - 161.67
6 ChatTruth-7B Qwen-7B 158.33
7 Honeybee Vicuna-13B 155.00
7 360VL LLaMA3-70B 155.00
8 ShareGPT4V Vicuna-13B 153.33
8 SPHINX LLaMA2-13B 153.33
9 MindSource-VL-Chat MindSource-7B 146.67
10 JT-VL-Chat - 145.00
11 RBDash Vicuna-13B 138.33
11 LLaVA-1.6 Vicuna-34B 138.33
12 TransCore-M PCITransGPT-13B 136.67
12 MiniCPM-Llama3-V 2.5 LLaMA3-8B 136.67
13 CogAgent Vicuna-7B 135.00
13 Bunny-8B LLaMA3-8B 135.00
14 LLaVA Vicuna-13B 133.33
15 OmniLMM Zephyr-7B-beta 131.67
16 BELLE-VL Qwen-14B 130.00
17 LVIS-INSTRUCT4V Vicuna-13B 128.33
17 HyperLLaVA Vicuna-13B 128.33
17 Qwen-VL-Chat Qwen-7B 128.33
17 Bunny-3B Phi-2 128.33
18 InternLM-XComposer-VL InternLM-7B 126.67
18 WeMM InternLM-7B 126.67
19 PureMM Vicuna-13B 123.33
20 DataOptim-LLaVA Vicuna-13B 121.67
21 Monkey-Chat Qwen-7B 118.33
21 MoE-LLaVA Phi-2.7B×4 118.33
22 CogVLM Vicuna-7B 103.33
23 GIT2 VQAv2-finetuned 96.67
24 GPT-4V - 95.00
25 MiniCPM MiniCPM-2B 93.33
26 Lynx Vicuna-7B 90.00
26 Gemini Pro - 90.00
27 mPLUG-Owl2 LLaMA2-7B 88.33
28 Otter OTTER-Image-MPT7B 86.67
29 LRV-Instruction LRV-7B 86.67
30 MiniCPM-V-2 MiniCPM-2B 86.67
31 MMICL FlanT5xxl 81.67
31 BLIVA FlanT5xxl 81.67
32 Cheetor Vicuna-7B 80.00
33 BLIP-2 Flant5xxl 73.33
34 InstructBLIP FlanT5xxl 66.67
34 Muffin Vicuna-13B 66.67
35 LaVIN LAVIN-13B 63.33
35 VPGTrans Vicuna-7B 63.33
36 Multimodal-GPT Multimodal-GPT-9B 58.33
37 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 56.67
38 mPLUG-Owl LLaMA-7B 50.00
38 PandaGPT Vicuna-7B 50.00
39 VisualGLM-6B VisualGLM-6B 48.33
40 ImageBind_LLM LLaMA-7B 46.67
41 MiniGPT-4 Vicuna-13B 43.33

Poster

Rank Model Version Score
🏅️ GPT-4V - 192.18
🥈 PureMM Vicuna-13B 191.50
🥉 Qwen-VL-Max - 187.76
4 InfMLLM Vicuna-13B 183.33
5 Qwen-VL-Plus - 181.63
6 Monkey-Chat Qwen-7B 178.91
7 Qwen-VL-Chat Qwen-7B 178.57
8 360VL LLaMA3-70B 176.87
9 MiniCPM-Llama3-V 2.5 LLaMA3-8B 175.85
10 ChatTruth-7B Qwen-7B 174.15
11 InternVL-Chat-V1.5 InternLM2-20B 173.81
12 OmniLMM Zephyr-7B-beta 171.43
13 InternLM-XComposer2-VL InternLM2-7B 171.09
14 Honeybee Vicuna-13B 170.07
15 DataOptim-LLaVA Vicuna-13B 169.73
16 LLaVA-1.6 Vicuna-34B 169.39
17 ShareGPT4V Vicuna-13B 169.05
18 JT-VL-Chat - 168.71
19 CogAgent Vicuna-7B 167.35
20 Bunny-8B LLaMA3-8B 167.35
21 BELLE-VL Qwen-14B 166.33
22 RBDash Vicuna-13B 165.99
23 MiniCPM-V-2 MiniCPM-2B 165.31
24 Gemini Pro - 164.97
25 HyperLLaVA Vicuna-13B 164.97
26 SPHINX LLaMA2-13B 164.29
27 LVIS-INSTRUCT4V Vicuna-13B 162.59
28 InternLM-XComposer-VL InternLM-7B 161.90
29 InternVL-Chat-V1.1 LLaMA2-13B 161.22
30 LLaVA Vicuna-13B 160.54
31 WeMM InternLM-7B 160.54
32 TransCore-M PCITransGPT-13B 160.20
33 mPLUG-Owl2 LLaMA2-7B 160.20
34 MiniCPM MiniCPM-2B 158.50
35 MindSource-VL-Chat MindSource-7B 155.10
36 BLIVA FlanT5xxl 155.10
37 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 147.96
38 Cheetor Vicuna-7B 147.28
39 CogVLM Vicuna-7B 146.94
40 MMICL FlanT5xxl 146.26
41 BLIP-2 Flant5xxl 141.84
42 LRV-Instruction LRV-7B 139.04
43 Otter OTTER-Image-MPT7B 138.78
44 Muffin Vicuna-13B 137.76
45 mPLUG-Owl LLaMA-7B 136.05
46 Lynx Vicuna-7B 124.83
47 InstructBLIP FlanT5xxl 123.81
48 GIT2 VQAv2-finetuned 112.59
49 Bunny-3B Phi-2 108.50
50 MoE-LLaVA Phi-2.7B×4 99.32
51 VPGTrans Vicuna-7B 84.01
52 LaVIN LAVIN-13B 79.59
53 PandaGPT Vicuna-7B 76.53
54 VisualGLM-6B VisualGLM-6B 65.99
55 ImageBind_LLM LLaMA-7B 64.97
56 Multimodal-GPT Multimodal-GPT-9B 57.82
57 MiniGPT-4 Vicuna-13B 41.84

Celebrity

Rank Model Version Score
🏅️ Qwen-VL-Max - 184.12
🏅️ Qwen-VL-Plus - 184.12
🥈 PureMM Vicuna-13B 182.35
🥉 WeMM InternLM-7B 179.12
4 SPHINX LLaMA2-13B 177.94
5 ChatTruth-7B Qwen-7B 177.65
6 Honeybee Vicuna-13B 177.06
7 Bunny-8B LLaMA3-8B 175.29
8 Otter OTTER-Image-MPT7B 172.65
9 OmniLMM Zephyr-7B-beta 172.06
10 RBDash Vicuna-13B 170.00
11 360VL LLaMA3-70B 168.24
12 InfMLLM Vicuna-13B 164.41
12 mPLUG-Owl2 LLaMA2-7B 164.41
13 Cheetor Vicuna-7B 164.12
14 HyperLLaVA Vicuna-13B 162.06
15 LVIS-INSTRUCT4V Vicuna-13B 161.47
15 JT-VL-Chat - 161.47
16 LLaVA-1.6 Vicuna-34B 160.00
17 DataOptim-LLaVA Vicuna-13B 159.41
18 MiniCPM-Llama3-V 2.5 LLaMA3-8B 157.94
19 MiniCPM MiniCPM-2B 155.59
20 InternVL-Chat-V1.1 LLaMA2-13B 154.71
21 ShareGPT4V Vicuna-13B 153.82
21 InternLM-XComposer2-VL InternLM2-7B 153.82
22 LLaVA Vicuna-13B 152.94
23 InternLM-XComposer-VL InternLM-7B 150.29
24 CogAgent Vicuna-7B 147.94
24 MoE-LLaVA Phi-2.7B×4 147.94
25 Gemini Pro - 147.35
26 GIT2 VQAv2-finetuned 145.88
27 TransCore-M PCITransGPT-13B 145.29
28 Monkey-Chat Qwen-7B 142.65
29 MMICL FlanT5xxl 141.76
30 MiniCPM-V-2 MiniCPM-2B 140.88
30 BLIVA FlanT5xxl 140.88
31 InternVL-Chat-V1.5 InternLM2-20B 138.53
32 BELLE-VL Qwen-14B 136.76
32 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 136.76
33 Bunny-3B Phi-2 130.88
34 MindSource-VL-Chat MindSource-7B 126.47
35 Qwen-VL-Chat Qwen-7B 120.59
36 Lynx Vicuna-7B 118.24
37 CogVLM Vicuna-7B 115.29
38 LRV-Instruction LRV-7B 112.65
39 BLIP-2 Flant5xxl 105.59
40 InstructBLIP FlanT5xxl 101.18
41 mPLUG-Owl LLaMA-7B 100.29
42 Muffin Vicuna-13B 81.76
43 ImageBind_LLM LLaMA-7B 76.47
44 Multimodal-GPT Multimodal-GPT-9B 73.82
45 PandaGPT Vicuna-7B 57.06
46 MiniGPT-4 Vicuna-13B 54.41
47 VPGTrans Vicuna-7B 53.53
48 VisualGLM-6B VisualGLM-6B 53.24
49 LaVIN LAVIN-13B 47.35
50 GPT-4V - 0.00

Scene

Rank Model Version Score
🏅️ InfMLLM Vicuna-13B 176.75
🥈 WeMM InternLM-7B 176.25
🥉 Qwen-VL-Max - 173.00
4 ShareGPT4V Vicuna-13B 168.00
5 ChatTruth-7B Qwen-7B 167.75
6 DataOptim-LLaVA Vicuna-13B 166.50
6 HyperLLaVA Vicuna-13B 166.50
7 InternLM-XComposer2-VL InternLM2-7B 164.75
7 360VL LLaMA3-70B 164.75
8 Lynx Vicuna-7B 164.50
8 LLaVA-1.6 Vicuna-34B 164.50
9 LVIS-INSTRUCT4V Vicuna-13B 163.25
10 PureMM Vicuna-13B 162.75
11 Honeybee Vicuna-13B 162.00
12 Monkey-Chat Qwen-7B 161.75
12 MindSource-VL-Chat MindSource-7B 161.75
13 LLaVA Vicuna-13B 161.25
14 TransCore-M PCITransGPT-13B 161.00
15 RBDash Vicuna-13B 160.25
16 SPHINX LLaMA2-13B 160.00
17 InternLM-XComposer-VL InternLM-7B 159.75
18 CogVLM Vicuna-7B 159.25
19 Otter OTTER-Image-MPT7B 158.75
20 GIT2 VQAv2-finetuned 158.50
20 Bunny-3B Phi-2 158.50
21 MiniCPM MiniCPM-2B 157.75
21 JT-VL-Chat - 157.75
22 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 156.25
22 BELLE-VL Qwen-14B 156.25
23 Cheetor Vicuna-7B 156.00
24 InternVL-Chat-V1.1 LLaMA2-13B 155.75
25 InternVL-Chat-V1.5 InternLM2-20B 154.75
26 MoE-LLaVA Phi-2.7B×4 154.50
27 CogAgent Vicuna-7B 154.25
27 MiniCPM-V-2 MiniCPM-2B 154.25
28 MMICL FlanT5xxl 153.75
28 MiniCPM-Llama3-V 2.5 LLaMA3-8B 153.75
29 Bunny-8B LLaMA3-8B 153.75
30 mPLUG-Owl2 LLaMA2-7B 153.25
31 InstructBLIP FlanT5xxl 153.00
32 Qwen-VL-Chat Qwen-7B 152.25
33 BLIVA FlanT5xxl 151.50
34 Muffin Vicuna-13B 151.25
35 GPT-4V - 151.00
36 Qwen-VL-Plus - 151.00
37 LRV-Instruction LRV-7B 147.98
38 VisualGLM-6B VisualGLM-6B 146.25
38 OmniLMM Zephyr-7B-beta 146.25
39 BLIP-2 Flant5xxl 145.25
40 Gemini Pro - 144.75
41 VPGTrans Vicuna-7B 141.75
42 LaVIN LAVIN-13B 136.75
43 mPLUG-Owl LLaMA-7B 135.50
44 PandaGPT Vicuna-7B 118.00
45 ImageBind_LLM LLaMA-7B 113.25
46 MiniGPT-4 Vicuna-13B 71.75
47 Multimodal-GPT Multimodal-GPT-9B 68.00

Landmark

Rank Model Version Score
🏅️ Qwen-VL-Plus - 191.00
🥈 Qwen-VL-Max - 187.50
🥉 ChatTruth-7B Qwen-7B 185.75
4 InternVL-Chat-V1.5 InternLM2-20B 177.75
5 360VL LLaMA3-70B 177.25
5 MiniCPM-Llama3-V 2.5 LLaMA3-8B 177.25
6 Monkey-Chat Qwen-7B 176.50
7 InternLM-XComposer2-VL InternLM2-7B 176.00
8 OmniLMM Zephyr-7B-beta 175.25
9 MiniCPM-V-2 MiniCPM-2B 175.00
10 RBDash Vicuna-13B 174.25
11 ShareGPT4V Vicuna-13B 174.00
11 BELLE-VL Qwen-14B 174.00
12 JT-VL-Chat - 173.50
13 WeMM InternLM-7B 172.25
13 Honeybee Vicuna-13B 172.25
13 PureMM Vicuna-13B 172.25
14 CogAgent Vicuna-7B 172.00
14 HyperLLaVA Vicuna-13B 172.00
15 LLaVA Vicuna-13B 170.50
16 Bunny-8B LLaMA3-8B 170.25
17 SPHINX LLaMA2-13B 168.09
18 InternVL-Chat-V1.1 LLaMA2-13B 168.00
19 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 167.84
20 MiniCPM MiniCPM-2B 167.75
21 InfMLLM Vicuna-13B 166.75
22 InternLM-XComposer-VL InternLM-7B 165.25
22 LLaVA-1.6 Vicuna-34B 165.25
23 Qwen-VL-Chat Qwen-7B 164.00
24 Lynx Vicuna-7B 162.00
25 LVIS-INSTRUCT4V Vicuna-13B 161.50
26 LRV-Instruction LRV-7B 160.53
27 DataOptim-LLaVA Vicuna-13B 160.00
28 mPLUG-Owl LLaMA-7B 159.25
28 TransCore-M PCITransGPT-13B 159.25
29 Gemini Pro - 158.75
30 CogVLM Vicuna-7B 158.00
31 mPLUG-Owl2 LLaMA2-7B 157.25
32 Bunny-3B Phi-2 155.00
33 MindSource-VL-Chat MindSource-7B 150.75
34 MoE-LLaVA Phi-2.7B×4 148.25
35 Muffin Vicuna-13B 146.25
36 Cheetor Vicuna-7B 145.73
37 GIT2 VQAv2-finetuned 140.50
38 GPT-4V - 138.25
39 BLIP-2 Flant5xxl 138.00
40 Otter OTTER-Image-MPT7B 137.25
41 MMICL FlanT5xxl 136.13
42 LaVIN LAVIN-13B 93.50
43 BLIVA FlanT5xxl 89.50
44 VisualGLM-6B VisualGLM-6B 83.75
45 InstructBLIP FlanT5xxl 79.75
46 Multimodal-GPT Multimodal-GPT-9B 69.75
46 PandaGPT Vicuna-7B 69.75
47 VPGTrans Vicuna-7B 64.75
48 ImageBind_LLM LLaMA-7B 62.00
49 MiniGPT-4 Vicuna-13B 54.00

Artwork

Rank Model Version Score
🏅️ InternLM-XComposer2-VL InternLM2-7B 185.50
🥈 PureMM Vicuna-13B 183.50
🥉 InfMLLM Vicuna-13B 167.50
4 Qwen-VL-Max - 166.00
5 ChatTruth-7B Qwen-7B 159.75
6 WeMM InternLM-7B 156.00
6 Qwen-VL-Plus - 156.00
7 OmniLMM Zephyr-7B-beta 150.25
8 GPT-4V - 148.00
9 GIT2 VQAv2-finetuned 146.25
10 MiniCPM-V-2 MiniCPM-2B 145.25
11 MiniCPM-Llama3-V 2.5 LLaMA3-8B 144.50
12 Monkey-Chat Qwen-7B 144.25
13 InternVL-Chat-V1.1 LLaMA2-13B 143.50
14 InternVL-Chat-V1.5 InternLM2-20B 143.00
15 RBDash Vicuna-13B 140.50
16 BELLE-VL Qwen-14B 139.50
17 LLaVA-1.6 Vicuna-34B 139.00
18 BLIP-2 Flant5xxl 136.50
19 Gemini Pro - 135.75
20 MMICL FlanT5xxl 135.50
21 Honeybee Vicuna-13B 134.75
23 InstructBLIP FlanT5xxl 134.25
23 mPLUG-Owl2 LLaMA2-7B 134.25
24 SPHINX LLaMA2-13B 134.00
25 BLIVA FlanT5xxl 133.25
26 JT-VL-Chat - 132.75
27 Bunny-8B LLaMA3-8B 132.50
28 360VL LLaMA3-70B 131.25
29 TransCore-M PCITransGPT-13B 130.75
29 MiniCPM MiniCPM-2B 130.75
30 LVIS-INSTRUCT4V Vicuna-13B 130.25
31 Otter OTTER-Image-MPT7B 129.00
32 ShareGPT4V Vicuna-13B 128.00
33 InternLM-XComposer-VL InternLM-7B 126.25
34 Qwen-VL-Chat Qwen-7B 125.50
35 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 123.75
36 MindSource-VL-Chat MindSource-7B 119.75
37 Lynx Vicuna-7B 119.50
38 Bunny-3B Phi-2 119.25
38 HyperLLaVA Vicuna-13B 119.25
39 LLaVA Vicuna-13B 117.75
40 Muffin Vicuna-13B 116.50
41 CogAgent Vicuna-7B 116.25
42 DataOptim-LLaVA Vicuna-13B 113.75
43 Cheetor Vicuna-7B 113.50
44 MoE-LLaVA Phi-2.7B×4 105.50
45 LRV-Instruction LRV-7B 101.25
46 mPLUG-Owl LLaMA-7B 96.25
47 CogVLM Vicuna-7B 88.75
48 LaVIN LAVIN-13B 87.25
49 VPGTrans Vicuna-7B 77.25
50 VisualGLM-6B VisualGLM-6B 75.25
51 ImageBind_LLM LLaMA-7B 70.75
52 MiniGPT-4 Vicuna-13B 60.50
53 Multimodal-GPT Multimodal-GPT-9B 59.50
54 PandaGPT Vicuna-7B 51.25

OCR

Rank Model Version Score
🏅️ GPT-4V - 185.00
🏅️ Gemini Pro - 185.00
🏅️ Qwen-VL-Max - 185.00
🥈 BELLE-VL Qwen-14B 177.50
🥈 InternVL-Chat-V1.1 LLaMA2-13B 177.50
🥉 Bunny-3B Phi-2 170.00
🥉 JT-VL-Chat - 170.00
4 DataOptim-LLaVA Vicuna-13B 162.50
4 PureMM Vicuna-13B 162.50
4 ChatTruth-7B Qwen-7B 162.50
4 MindSource-VL-Chat MindSource-7B 162.50
5 TransCore-M PCITransGPT-13B 155.00
5 Honeybee Vicuna-13B 155.00
5 OmniLMM Zephyr-7B-beta 155.00
5 Bunny-8B LLaMA3-8B 155.00
6 WeMM InternLM-7B 147.50
6 Qwen-VL-Plus - 147.50
6 InternLM-XComposer2-VL InternLM2-7B 147.50
6 CogVLM Vicuna-7B 147.50
7 Qwen-VL-Chat Qwen-7B 140.00
7 LLaVA-1.6 Vicuna-34B 140.00
7 InternVL-Chat-V1.5 InternLM2-20B 140.00
7 MiniCPM-Llama3-V 2.5 LLaMA3-8B 140.00
8 LVIS-INSTRUCT4V Vicuna-13B 132.50
8 ShareGPT4V Vicuna-13B 132.50
8 MoE-LLaVA Phi-2.7B×4 132.50
8 HyperLLaVA Vicuna-13B 132.50
8 360VL LLaMA3-70B 132.50
9 LLaVA Vicuna-13B 125.00
9 InternLM-XComposer-VL InternLM-7B 125.00
10 BLIP-2 Flant5xxl 110.00
10 LRV-Instruction LRV-7B 110.00
10 InfMLLM Vicuna-13B 110.00
10 MiniCPM MiniCPM-2B 110.00
11 LaVIN LAVIN-13B 107.50
12 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 102.50
12 mPLUG-Owl2 LLaMA2-7B 102.50
12 RBDash Vicuna-13B 102.50
12 MiniCPM-V-2 MiniCPM-2B 102.50
13 Cheetor Vicuna-7B 100.00
13 MMICL FlanT5xxl 100.00
14 BLIVA FlanT5xxl 87.50
14 SPHINX LLaMA2-13B 87.50
15 Multimodal-GPT Multimodal-GPT-9B 82.50
15 ImageBind_LLM LLaMA-7B 80.00
15 Monkey-Chat Qwen-7B 80.00
15 CogAgent Vicuna-7B 80.00
16 VPGTrans Vicuna-7B 77.50
16 Lynx Vicuna-7B 77.50
17 InstructBLIP FlanT5xxl 72.50
17 Otter OTTER-Image-MPT7B 72.50
18 mPLUG-Owl LLaMA-7B 65.00
18 GIT2 VQAv2-finetuned 65.00
19 MiniGPT-4 Vicuna-13B 57.50
19 Muffin Vicuna-13B 57.50
20 PandaGPT Vicuna-7B 50.00
21 VisualGLM-6B VisualGLM-6B 42.50

Cognition

Sum of the scores of all cognition subtasks, including commonsense reasoning, numerical calculation, text translation, and code reasoning. The full score of each subtask is 200, and that of all cognition is 800.
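
As a rough illustration of the aggregation described above, the sketch below sums the four cognition subtask scores and rejects values outside the 0 to 200 range. The dictionary keys, the function name `category_total`, and the example scores are hypothetical; they are not leaderboard values.

```python
from typing import Mapping, Sequence

# Subtask names follow the list above; the snake_case keys are an assumption.
COGNITION_SUBTASKS = (
    "commonsense_reasoning",
    "numerical_calculation",
    "text_translation",
    "code_reasoning",
)
MAX_SUBTASK_SCORE = 200.0


def category_total(scores: Mapping[str, float], subtasks: Sequence[str]) -> float:
    """Sum the per-subtask scores of one category, validating each value."""
    missing = [name for name in subtasks if name not in scores]
    if missing:
        raise KeyError(f"missing subtask scores: {missing}")
    for name in subtasks:
        if not 0.0 <= scores[name] <= MAX_SUBTASK_SCORE:
            raise ValueError(f"{name} score {scores[name]} is outside 0 to 200")
    # Round to two decimals, matching the precision shown in the leaderboards.
    return round(sum(scores[name] for name in subtasks), 2)


if __name__ == "__main__":
    example = {  # made-up scores for illustration only
        "commonsense_reasoning": 150.0,
        "numerical_calculation": 100.0,
        "text_translation": 120.5,
        "code_reasoning": 80.0,
    }
    print(category_total(example, COGNITION_SUBTASKS))  # 450.5 out of 800
```

The same function would cover perception by passing the ten perception subtask names instead.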

Rank Model Version Score
🏅️ Qwen-VL-Max - 643.57
🥈 InternVL-Chat-V1.5 InternLM2-20B 550.00
🥉 InternLM-XComposer2-VL InternLM2-7B 530.71
4 GPT-4V - 517.14
5 Qwen-VL-Plus - 502.14
6 WeMM InternLM-7B 445.00
7 Gemini Pro - 436.79
8 MMICL FlanT5xxl 428.93
9 MiniCPM-V-2 MiniCPM-2B 406.07
10 Monkey-Chat Qwen-7B 401.43
11 MiniCPM-Llama3-V 2.5 LLaMA3-8B 400.71
12 LLaVA-1.6 Vicuna-34B 397.14
13 InternLM-XComposer-VL InternLM-7B 391.07
14 ChatTruth-7B Qwen-7B 387.86
15 360VL LLaMA3-70B 371.43
16 InfMLLM Vicuna-13B 368.93
17 Bunny-8B LLaMA3-8B 367.50
18 DataOptim-LLaVA Vicuna-13B 361.07
19 Qwen-VL-Chat Qwen-7B 360.71
20 PureMM Vicuna-13B 360.36
21 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 356.43
22 InternVL-Chat-V1.1 LLaMA2-13B 341.07
23 JT-VL-Chat - 339.64
24 BELLE-VL Qwen-14B 332.14
25 BLIVA FlanT5xxl 331.43
26 RBDash Vicuna-13B 330.00
27 LRV-Instruction LRV-7B 328.21
28 OmniLMM Zephyr-7B-beta 322.14
29 Cheetor Vicuna-7B 321.07
30 Honeybee Vicuna-13B 315.36
31 TransCore-M PCITransGPT-13B 314.64
32 MiniCPM MiniCPM-2B 314.29
33 CogVLM Vicuna-7B 313.21
34 mPLUG-Owl2 LLaMA2-7B 313.21
35 SPHINX LLaMA2-13B 310.00
36 Otter OTTER-Image-MPT7B 306.43
37 HyperLLaVA Vicuna-13B 304.29
38 ShareGPT4V Vicuna-13B 303.21
39 MindSource-VL-Chat MindSource-7B 301.43
40 LLaVA Vicuna-13B 295.36
41 InstructBLIP FlanT5xxl 291.79
42 BLIP-2 Flant5xxl 290.00
43 Muffin Vicuna-13B 290.00
44 Bunny-3B Phi-2 289.29
45 LVIS-INSTRUCT4V Vicuna-13B 286.79
46 mPLUG-Owl LLaMA-7B 276.07
47 CogAgent Vicuna-7B 274.64
48 MoE-LLaVA Phi-2.7B×4 262.14
49 GIT2 VQAv2-finetuned 261.79
50 LaVIN LAVIN-13B 249.64
51 VPGTrans Vicuna-7B 249.29
52 PandaGPT Vicuna-7B 228.57
53 Multimodal-GPT Multimodal-GPT-9B 226.79
54 Lynx Vicuna-7B 215.71
55 ImageBind_LLM LLaMA-7B 213.57
56 VisualGLM-6B VisualGLM-6B 181.79
57 MiniGPT-4 Vicuna-13B 144.29

Commonsense Reasoning

Rank Model Version Score
🏅️ InfMLLM Vicuna-13B 156.43
🥈 LLaVA-1.6 Vicuna-34B 152.14
🥉 360VL LLaMA3-70B 151.43
4 MiniCPM-Llama3-V 2.5 LLaMA3-8B 150.71
5 Qwen-VL-Max - 148.57
6 InternLM-XComposer2-VL InternLM2-7B 145.71
7 Qwen-VL-Plus - 142.14
7 GPT-4V - 142.14
8 WeMM InternLM-7B 140.00
8 RBDash Vicuna-13B 140.00
8 Bunny-8B LLaMA3-8B 140.00
9 InternLM-XComposer-VL InternLM-7B 138.57
10 PureMM Vicuna-13B 137.86
11 MMICL FlanT5xxl 136.43
11 BLIVA FlanT5xxl 136.43
11 MindSource-VL-Chat MindSource-7B 136.43
12 InternVL-Chat-V1.5 InternLM2-20B 135.00
13 LVIS-INSTRUCT4V Vicuna-13B 134.29
13 HyperLLaVA Vicuna-13B 134.29
14 ChatTruth-7B Qwen-7B 132.86
15 TransCore-M PCITransGPT-13B 132.14
16 Monkey-Chat Qwen-7B 131.43
17 Qwen-VL-Chat Qwen-7B 130.71
18 SPHINX LLaMA2-13B 130.00
19 InstructBLIP FlanT5xxl 129.29
19 Gemini Pro - 129.29
20 MiniCPM-V-2 MiniCPM-2B 128.57
21 LLaVA Vicuna-13B 127.86
22 BELLE-VL Qwen-14B 127.14
22 OmniLMM Zephyr-7B-beta 127.14
23 ShareGPT4V Vicuna-13B 125.71
23 CogVLM Vicuna-7B 125.71
24 DataOptim-LLaVA Vicuna-13B 123.57
24 InternVL-Chat-V1.1 LLaMA2-13B 123.57
25 Honeybee Vicuna-13B 122.86
26 JT-VL-Chat - 122.14
27 MiniCPM MiniCPM-2B 119.29
28 MoE-LLaVA Phi-2.7B×4 117.14
28 CogAgent Vicuna-7B 117.14
29 mPLUG-Owl2 LLaMA2-7B 115.71
30 Bunny-3B Phi-2 114.29
31 Lynx Vicuna-7B 110.71
32 BLIP-2 FlanT5xxl 110.00
32 Muffin Vicuna-13B 110.00
33 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 106.43
33 Otter OTTER-Image-MPT7B 106.43
34 LRV-Instruction LRV-7B 100.71
35 GIT2 VQAv2-finetuned 99.29
36 Cheetor Vicuna-7B 98.57
37 LaVIN LAVIN-13B 87.14
38 mPLUG-Owl LLaMA-7B 78.57
39 PandaGPT Vicuna-7B 73.57
40 VPGTrans Vicuna-7B 64.29
41 MiniGPT-4 Vicuna-13B 59.29
42 Multimodal-GPT Multimodal-GPT-9B 49.29
43 ImageBind_LLM LLaMA-7B 48.57
44 VisualGLM-6B VisualGLM-6B 39.29

Numerical Calculation

Rank Model Version Score
🏅️ Qwen-VL-Max - 155.00
🥈 InternLM-XComposer2-VL InternLM2-7B 137.50
🥉 GPT-4V - 130.00
4 InternVL-Chat-V1.5 InternLM2-20B 125.00
5 Qwen-VL-Plus - 85.00
6 MMICL FlanT5xxl 82.50
7 Cheetor Vicuna-7B 77.50
7 Gemini Pro - 77.50
8 Bunny-8B LLaMA3-8B 75.00
9 Otter OTTER-Image-MPT7B 72.50
9 LLaVA-1.6 Vicuna-34B 72.50
10 LRV-Instruction LRV-7B 70.00
10 InternVL-Chat-V1.1 LLaMA2-13B 70.00
10 360VL LLaMA3-70B 70.00
11 RBDash Vicuna-13B 67.50
11 JT-VL-Chat - 67.50
12 LaVIN LAVIN-13B 65.00
13 Multimodal-GPT Multimodal-GPT-9B 62.50
14 mPLUG-Owl LLaMA-7B 60.00
14 InfMLLM Vicuna-13B 60.00
14 CogVLM Vicuna-7B 60.00
15 BLIVA FlanT5xxl 57.50
15 WeMM InternLM-7B 57.50
15 Honeybee Vicuna-13B 57.50
16 ImageBind_LLM LLaMA-7B 55.00
16 InternLM-XComposer-VL InternLM-7B 55.00
16 SPHINX LLaMA2-13B 55.00
16 TransCore-M PCITransGPT-13B 55.00
16 OmniLMM Zephyr-7B-beta 55.00
17 PandaGPT Vicuna-7B 50.00
17 VPGTrans Vicuna-7B 50.00
17 GIT2 VQAv2-finetuned 50.00
17 MoE-LLaVA Phi-2.7B×4 50.00
17 MiniCPM-Llama3-V 2.5 LLaMA3-8B 50.00
18 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 47.50
18 DataOptim-LLaVA Vicuna-13B 47.50
18 BELLE-VL Qwen-14B 47.50
18 MiniCPM MiniCPM-2B 47.50
18 Bunny-3B Phi-2 47.50
19 MiniGPT-4 Vicuna-13B 45.00
19 VisualGLM-6B VisualGLM-6B 45.00
19 Muffin Vicuna-13B 45.00
19 ShareGPT4V Vicuna-13B 45.00
19 PureMM Vicuna-13B 45.00
19 CogAgent Vicuna-7B 45.00
19 MindSource-VL-Chat MindSource-7B 45.00
20 LLaVA Vicuna-13B 42.50
20 Monkey-Chat Qwen-7B 42.50
21 BLIP-2 FlanT5xxl 40.00
21 InstructBLIP FlanT5xxl 40.00
21 Qwen-VL-Chat Qwen-7B 40.00
21 LVIS-INSTRUCT4V Vicuna-13B 40.00
21 ChatTruth-7B Qwen-7B 40.00
22 HyperLLaVA Vicuna-13B 37.50
23 mPLUG-Owl2 LLaMA2-7B 35.00
24 MiniCPM-V-2 MiniCPM-2B 32.50
25 Lynx Vicuna-7B 17.50

Text Translation

Rank Model Version Score
🏅️ Qwen-VL-Plus - 185.00
🥈 InternVL-Chat-V1.5 InternLM2-20B 185.00
🥉 Qwen-VL-Max - 170.00
🥉 MiniCPM-V-2 MiniCPM-2B 170.00
4 ChatTruth-7B Qwen-7B 162.50
5 Qwen-VL-Chat Qwen-7B 147.50
5 InternLM-XComposer2-VL InternLM2-7B 147.50
6 Gemini Pro - 145.00
7 MiniCPM-Llama3-V 2.5 LLaMA3-8B 140.00
8 Monkey-Chat Qwen-7B 137.50
9 MMICL FlanT5xxl 132.50
10 WeMM InternLM-7B 130.00
11 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 112.50
11 InternLM-XComposer-VL InternLM-7B 112.50
12 DataOptim-LLaVA Vicuna-13B 110.00
13 JT-VL-Chat - 105.00
14 mPLUG-Owl2 LLaMA2-7B 102.50
14 BELLE-VL Qwen-14B 102.50
14 InternVL-Chat-V1.1 LLaMA2-13B 102.50
15 Bunny-8B LLaMA3-8B 95.00
16 PureMM Vicuna-13B 92.50
16 Bunny-3B Phi-2 92.50
17 Honeybee Vicuna-13B 87.50
17 HyperLLaVA Vicuna-13B 87.50
18 LRV-Instruction LRV-7B 85.00
19 MiniCPM MiniCPM-2B 82.50
20 mPLUG-Owl LLaMA-7B 80.00
20 ShareGPT4V Vicuna-13B 80.00
20 MindSource-VL-Chat MindSource-7B 80.00
21 LLaVA Vicuna-13B 77.50
21 VPGTrans Vicuna-7B 77.50
21 BLIVA FlanT5xxl 77.50
21 InfMLLM Vicuna-13B 77.50
21 OmniLMM Zephyr-7B-beta 77.50
22 GPT-4V - 75.00
22 SPHINX LLaMA2-13B 75.00
22 CogVLM Vicuna-7B 75.00
23 Muffin Vicuna-13B 72.50
23 RBDash Vicuna-13B 72.50
23 LLaVA-1.6 Vicuna-34B 72.50
23 360VL LLaMA3-70B 72.50
24 LVIS-INSTRUCT4V Vicuna-13B 70.00
25 GIT2 VQAv2-finetuned 67.50
26 BLIP-2 FlanT5xxl 65.00
26 InstructBLIP FlanT5xxl 65.00
27 CogAgent Vicuna-7B 62.50
28 Multimodal-GPT Multimodal-GPT-9B 60.00
29 Otter OTTER-Image-MPT7B 57.50
29 PandaGPT Vicuna-7B 57.50
29 Cheetor Vicuna-7B 57.50
29 MoE-LLaVA Phi-2.7B×4 57.50
30 TransCore-M PCITransGPT-13B 55.00
31 ImageBind_LLM LLaMA-7B 50.00
31 VisualGLM-6B VisualGLM-6B 50.00
32 LaVIN LAVIN-13B 47.50
33 Lynx Vicuna-7B 42.50
34 MiniGPT-4 Vicuna-13B 0.00

Code Reasoning

Rank Model Version Score
🏅️ GPT-4V - 170.00
🏅️ Qwen-VL-Max - 170.00
🥈 WeMM InternLM-7B 117.50
🥉 InternVL-Chat-V1.5 InternLM2-20B 105.00
4 InternLM-XComposer2-VL InternLM2-7B 100.00
4 LLaVA-1.6 Vicuna-34B 100.00
5 LLaMA-Adapter V2 LLaMA-Adapter-v2.1-7B 90.00
5 Monkey-Chat Qwen-7B 90.00
5 Qwen-VL-Plus - 90.00
6 Cheetor Vicuna-7B 87.50
7 InternLM-XComposer-VL InternLM-7B 85.00
7 Gemini Pro - 85.00
7 PureMM Vicuna-13B 85.00
8 DataOptim-LLaVA Vicuna-13B 80.00
9 MMICL FlanT5xxl 77.50
9 360VL LLaMA3-70B 77.50
10 BLIP-2 FlanT5xxl 75.00
10 InfMLLM Vicuna-13B 75.00
10 MiniCPM-V-2 MiniCPM-2B 75.00
11 LRV-Instruction LRV-7B 72.50
11 TransCore-M PCITransGPT-13B 72.50
12 Otter OTTER-Image-MPT7B 70.00
13 MiniCPM MiniCPM-2B 65.00
14 Muffin Vicuna-13B 62.50
14 OmniLMM Zephyr-7B-beta 62.50
15 ImageBind_LLM LLaMA-7B 60.00
15 BLIVA FlanT5xxl 60.00
15 mPLUG-Owl2 LLaMA2-7B 60.00
15 MiniCPM-Llama3-V 2.5 LLaMA3-8B 60.00
16 mPLUG-Owl LLaMA-7B 57.50
16 InstructBLIP FlanT5xxl 57.50
16 VPGTrans Vicuna-7B 57.50
16 Bunny-8B LLaMA3-8B 57.50
17 Multimodal-GPT Multimodal-GPT-9B 55.00
17 BELLE-VL Qwen-14B 55.00
18 ShareGPT4V Vicuna-13B 52.50
18 CogVLM Vicuna-7B 52.50
18 ChatTruth-7B Qwen-7B 52.50
19 LaVIN LAVIN-13B 50.00
19 SPHINX LLaMA2-13B 50.00
19 RBDash Vicuna-13B 50.00
19 CogAgent Vicuna-7B 50.00
20 VisualGLM-6B VisualGLM-6B 47.50
20 PandaGPT Vicuna-7B 47.50
20 LLaVA Vicuna-13B 47.50
20 Honeybee Vicuna-13B 47.50
21 Lynx Vicuna-7B 45.00
21 GIT2 VQAv2-finetuned 45.00
21 InternVL-Chat-V1.1 LLaMA2-13B 45.00
21 HyperLLaVA Vicuna-13B 45.00
21 JT-VL-Chat - 45.00
22 Qwen-VL-Chat Qwen-7B 42.50
22 LVIS-INSTRUCT4V Vicuna-13B 42.50
23 MiniGPT-4 Vicuna-13B 40.00
23 MindSource-VL-Chat MindSource-7B 40.00
24 MoE-LLaVA Phi-2.7B×4 37.50
25 Bunny-3B Phi-2 35.00
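
For readers who want to cross-check the tables in this section, here is a minimal Python sketch. It relies on two assumptions inferred from the numbers rather than stated in this section: that the overall score in the first table is the sum of the four subtask scores (Commonsense Reasoning, Numerical Calculation, Text Translation, Code Reasoning), and that the subtask tables give tied scores the same rank without skipping the next rank. The helper names `total_score` and `dense_rank` are hypothetical and are not part of any MME tooling.

```python
# Subtask scores copied from the four subtask tables above for three models.
subtask_scores = {
    "Qwen-VL-Max": {"commonsense": 148.57, "numerical": 155.00,
                    "translation": 170.00, "code": 170.00},
    "GPT-4V": {"commonsense": 142.14, "numerical": 130.00,
               "translation": 75.00, "code": 170.00},
    "Gemini Pro": {"commonsense": 129.29, "numerical": 77.50,
                   "translation": 145.00, "code": 85.00},
}


def total_score(scores):
    """Sum the subtask scores (assumed aggregation), rounded to 2 decimals."""
    return round(sum(scores.values()), 2)


def dense_rank(score_by_model):
    """Rank by descending score; ties share a rank and no rank is skipped."""
    ranks = {}
    rank = 0
    previous = None
    ordered = sorted(score_by_model.items(), key=lambda item: -item[1])
    for model, score in ordered:
        if score != previous:
            rank += 1
            previous = score
        ranks[model] = rank
    return ranks


totals = {model: total_score(s) for model, s in subtask_scores.items()}
code_ranks = dense_rank({m: s["code"] for m, s in subtask_scores.items()})

for model, total in totals.items():
    print(f"{model}: total={total:.2f}, code rank (among these three)={code_ranks[model]}")
```

For these three models the computed totals agree with the overall scores listed in the first table (643.57, 517.14 and 436.79). The ranks printed here are relative to the three sampled models only, not to the full leaderboard.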