๐Ÿฅ Extended Multilingual Multimodal Medical Exam Dataset for Visual Question Answering in Healthcare


The Extended Multilingual Multimodal Medical Exam Dataset (Extended MMMED) is the new, larger release of MMMED for evaluating Vision-Language Models (VLMs) on medical multiple-choice question answering (MCQA) tasks.

Compared to the original benchmark, this extension substantially increases the number of questions and updates the benchmark with 28 tested VLMs (general-purpose, medical-specialized, and closed-source) across Spanish, English, and Italian.

The dataset includes challenging, real-world medical content from Médico Interno Residente (MIR - Spain) and Scuole di Specializzazione in Medicina (SSM - Italy) exam settings, with heterogeneous diagnostic images and clinically grounded questions.

🔓 How to Access the Dataset

You can access the dataset via Hugging Face. Follow these steps to download it:

โš ๏ธ Disclaimer: This dataset contains medical images that may be sensitive for some users. Viewer discretion is advised, especially if the content may evoke strong emotional reactions or be distressing.

Log in (e.g., with huggingface-cli login) to access the dataset, then load it with the datasets library:

from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("praiselab-picuslab/MMMED", split="extended")
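
Before building an evaluation loop, it can help to confirm the split size and the exact field names. The snippet below is a minimal inspection sketch; it assumes nothing about the schema beyond what column_names reports, and it truncates long values so decoded images do not flood the console.

from datasets import load_dataset

ds = load_dataset("praiselab-picuslab/MMMED", split="extended")

print(f"{len(ds)} samples")
print("columns:", ds.column_names)  # check the actual field names first

# Peek at the first record, truncating long values (e.g., decoded images).
sample = ds[0]
for name, value in sample.items():
    print(f"{name}: {str(value)[:80]}")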

🌟 Key Features:

  • Languages: 🇪🇸 Spanish, 🇬🇧 English, 🇮🇹 Italian
  • Scale: 955 questions per language (2,865 total samples)
  • Medical Content: Questions drawn from Spanish (MIR) and Italian (SSM) residency exam material
  • Image Types: Diagnostic medical images (e.g., CT scans, X-rays)
  • Categories: 26 medical categories per language
  • Multimodal: Each question comes with a medical image 📸
  • Benchmarking: 28 VLMs evaluated in multilingual settings

🔄 Dataset Workflow

Here is the general workflow for building the MMMED dataset for Vision-Language Model (VLM) evaluation:

[Figure: Extended MMMED dataset construction and VLM evaluation workflow]

📊 Dataset Overview

The Extended MMMED benchmark contains 955 questions for each language and is organized into 26 medical categories per language. The table below reports updated corpus statistics used in the new study.

| Statistic | 🇪🇸 Spanish | 🇬🇧 English | 🇮🇹 Italian |
|---|---|---|---|
| # Questions | 955 | 955 | 955 |
| # Categories | 26 | 26 | 26 |
| Last Update | 2026 | 2026 | 2026 |
| Avg. Option Length | 4.20 | 3.87 | 4.03 |
| Max. Option Length | 73 | 76 | 74 |
| Total Question Tokens* | 42,262 | 41,327 | 39,716 |
| Avg. Question Length | 43.45 | 40.81 | 40.78 |
| Max. Question Length | 264 | 258 | 254 |

* Token counts are computed with the preprocessing pipeline used in this repository (spaCy-based analysis notebooks).

[Figure: Extended MMMED dataset overview]
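
The exact counting pipeline lives in the repository's analysis notebooks; purely as an illustration, the sketch below shows how question tokens could be counted with spaCy. The spaCy model name and the "question" field name are assumptions, and per-language statistics would additionally require filtering by language.

import spacy
from datasets import load_dataset

# Illustrative sketch only: the repository's spaCy-based notebooks define the
# actual preprocessing. Model name and field name below are assumptions.
nlp = spacy.load("en_core_web_sm")  # e.g., es_core_news_sm / it_core_news_sm for Spanish / Italian

ds = load_dataset("praiselab-picuslab/MMMED", split="extended")

total_tokens = 0
for sample in ds:
    total_tokens += len(nlp(sample["question"]))  # spaCy Doc length = token count

print("total question tokens:", total_tokens)
print("avg question length:", total_tokens / len(ds))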

๐Ÿ–ผ๏ธ Image Types

Categorization of Image Types in the Extended MMMED Dataset. This figure presents the four main categories of images included in the dataset and their respective distributions.

[Figure: distribution of the four main image categories]

✨ Example MMCQA

Each multimodal multiple-choice question-answer (MMCQA) pair integrates three essential components with the following structure:

  • Category: $C$
  • Question: $Q$
  • Image URL: $I$
  • Answer Options: $O$
  • Correct Answer: 💡

Here's an illustrative example of multimodal QA in three languages:

[Figure: example MMCQA in Spanish, English, and Italian]
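
For model evaluation, each record can be rendered into a letter-choice prompt following the C / Q / I / O structure above. The sketch below is a hypothetical formatter: the field names ("question", "options") and the dummy record are illustrative only, not the dataset's confirmed schema.

from string import ascii_uppercase

def build_prompt(sample: dict) -> str:
    # Render a question and its options as an A/B/C/D-style prompt.
    lines = [sample["question"], ""]
    for letter, option in zip(ascii_uppercase, sample["options"]):
        lines.append(f"{letter}. {option}")
    lines += ["", "Answer with the letter of the correct option."]
    return "\n".join(lines)

# Dummy record shaped like the structure described above (illustrative values).
dummy = {
    "question": "Which imaging finding is most consistent with the diagnosis?",
    "options": ["Pleural effusion", "Pneumothorax", "Cardiomegaly", "Atelectasis"],
}
print(build_prompt(dummy))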

๐Ÿ” VLMs Evaluated in the Extended Benchmark (28 Models)

The following table reports architecture details for all tested models.

| Model | Type | Param (B) | Language Model | Vision Model |
|---|---|---|---|---|
| medvlm-r1 | Medical | 2 | Qwen2-2B | QwenViT |
| maira-2 | Medical | 7 | Vicuna-7B-v1.5 | RAD-DINO-MAIRA-2 |
| medgemma-4b-it | Medical | 4 | Gemma-3-4B | MedSigLIP-448 |
| llava-med-v1.5-7b | Medical | 7 | Mistral-7B | CLIP ViT-L/14 |
| chexagent-8b | Medical | 8 | Phi-2-2B | SigLIP-Large |
| medgemma-27b-it | Medical | 27 | Gemma-3-27B | MedSigLIP-448 |
| minicpm-v-2.6 | General | 2.6 | Qwen2-7B | SigLIP-400M |
| paligemma-3b-mix-448 | General | 3 | Gemma-2B | SigLIP-So400m/14 |
| paligemma2-3b-mix-448 | General | 3 | Gemma-2-2B | SigLIP-So400m/14 |
| deepseek-vl2-tiny | General | 3 | DeepSeekMoE-3B | SigLIP-400M |
| qwen2.5-vl-3b | General | 3 | Qwen2.5-3B | QwenViT |
| phi-3.5-vision | General | 4 | Phi-3.5 | CLIP ViT-L/14 |
| gemma-3-4b-it | General | 4 | Gemma-3-4B | SigLIP |
| llava-v1.5-7b | General | 7 | Vicuna-7B-v1.5 | CLIP ViT-L/14 |
| deepseek-vl-7b | General | 7 | DeepSeek-LLM-7B | SigLIP + SAM |
| qwen2.5-vl-7b | General | 7 | Qwen2.5-7B | QwenViT |
| qwen2-vl-7b | General | 8 | Qwen2-7B | QwenViT |
| qwen3-vl-8b | General | 8 | Qwen3-8B | QwenViT |
| internvl2.5-8b | General | 8 | InternLM2.5-7B | InternViT-300M |
| paligemma2-10b-mix-448 | General | 10 | Gemma-2-9B | SigLIP-So400m/14 |
| pixtral-12b | General | 12 | Mistral-Nemo-12B | Pixtral ViT |
| gemma-3-27b-it | General | 27 | Gemma-3-27B | SigLIP |
| qwen3-vl-30b | General | 30 | Qwen3-30B | QwenViT |
| qwen2.5-vl-32b | General | 32 | Qwen2.5-32B | QwenViT |
| qwen2.5-vl-72b | General | 72 | Qwen2.5-72B | QwenViT |
| claude-4-sonnet | Closed | Unknown | Closed-Source | Closed-Source |
| gpt-5-mini | Closed | Unknown | Closed-Source | Closed-Source |
| gemini-2.5-flash | Closed | Unknown | Closed-Source | Closed-Source |
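
The repository's evaluation scripts are the reference for how these models were actually run. Purely as an illustration, the sketch below shows how one of the listed open models (llava-v1.5-7b, via the llava-hf checkpoint on the Hugging Face Hub) could be queried on a single image-question pair; the checkpoint name, prompt template, generation settings, and field names are assumptions, not the benchmark's exact setup.

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Illustrative only: not the benchmark's exact inference code.
model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

def answer_mcqa(image, prompt_text: str) -> str:
    # LLaVA-1.5 chat format: <image> marks where the visual input is inserted.
    prompt = f"USER: <image>\n{prompt_text}\nASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
    output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    return processor.decode(output_ids[0], skip_special_tokens=True)

# Usage (assumed field names, with the build_prompt sketch from above):
# print(answer_mcqa(sample["image"], build_prompt(sample)))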

📈 VLM Performance on MMMED

The following figure presents the overall multilingual performance trend.

[Figure: overall multilingual performance trend across the 28 evaluated VLMs]

For complete analysis outputs (tables and publication-quality figures), see:

  • Analysis/analysis_output/tables/accuracy_table.csv
  • Analysis/analysis_output/tables/summary_table.csv
  • Analysis/analysis_output/figures/
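
A quick way to explore these outputs is to load the CSV tables with pandas, as sketched below. The paths come from the listing above; the column layout of each table is not documented here, so inspect the headers before reusing the numbers.

import pandas as pd

# Paths as listed above; inspect the columns before reusing the figures.
accuracy = pd.read_csv("Analysis/analysis_output/tables/accuracy_table.csv")
summary = pd.read_csv("Analysis/analysis_output/tables/summary_table.csv")

print(accuracy.columns.tolist())
print(accuracy.head())
print(summary.head())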

๐Ÿ–‹๏ธ Original MMMED Citation

Please also cite the original work as follows:

@inproceedings{riccio2025multilingual,
  title={A Multilingual Multimodal Medical Examination Dataset for Visual Question Answering in Healthcare},
  author={Riccio, Giuseppe and Romano, Antonio and Barone, Mariano and Orlando, Gian Marco and Russo, Diego and Postiglione, Marco and La Gatta, Valerio and Moscato, Vincenzo},
  booktitle={2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS)},
  pages={435--440},
  year={2025},
  organization={IEEE Computer Society}
}

๐ŸŒ Notes

Dataset Usage: The dataset is intended for academic and research purposes only. It is not recommended for clinical decision-making or commercial use.

๐Ÿ‘จโ€๐Ÿ’ป This project was developed by Mariano Barone, Francesco Di Serio, Giuseppe Riccio, Antonio Romano, Vincenzo Moscato, and Marco Postiglione University of Naples, Federico II

📜 License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

CC BY-NC 4.0

