While recent multimodal large language models (MLLMs) have advanced automated ECG interpretation, they still face two key limitations: (1) insufficient multimodal synergy between ECG time-series signals and ECG images, and (2) limited explainability in linking diagnoses to granular waveform evidence. We introduce GEM, the first MLLM to unify ECG time series, 12-lead ECG images, and text for grounded, clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations: a dual-encoder framework that extracts complementary time-series and image features, cross-modal alignment for effective multimodal understanding, and knowledge-guided instruction generation that produces high-granularity grounding data (ECG-Grounding) linking diagnoses to measurable parameters (e.g., QRS/PR intervals). We also propose the Grounded ECG Understanding task, a clinically motivated benchmark designed to comprehensively assess an MLLM's capability in grounded ECG understanding. Experimental results on both existing benchmarks and our proposed benchmark show that GEM significantly improves predictive performance (CSN +7.4%), explainability (+22.7%), and grounding (+25.3%), making it a promising approach for real-world clinical applications.
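The dual-encoder front-end described above can be sketched in a few lines. Everything here is an illustrative placeholder, not GEM's actual architecture: the dimensions are made up, and random projections stand in for the learned ECG time-series and image encoders; only the overall pattern (encode each modality, project into a shared space, align, and concatenate) reflects the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions (not GEM's real sizes).
D_TS, D_IMG, D_SHARED = 256, 512, 128

# Random projections standing in for the learned time-series
# and image encoders' output heads.
W_ts = rng.normal(size=(D_TS, D_SHARED))
W_img = rng.normal(size=(D_IMG, D_SHARED))

def encode_and_align(ts_feat: np.ndarray, img_feat: np.ndarray) -> np.ndarray:
    """Project both modalities into a shared space, L2-normalize so
    cross-modal similarity becomes a cosine score, and concatenate
    the aligned tokens for the downstream language model."""
    ts_tok = ts_feat @ W_ts                                    # (T, D_SHARED)
    img_tok = img_feat @ W_img                                 # (P, D_SHARED)
    ts_tok /= np.linalg.norm(ts_tok, axis=-1, keepdims=True)
    img_tok /= np.linalg.norm(img_tok, axis=-1, keepdims=True)
    return np.concatenate([ts_tok, img_tok], axis=0)           # (T+P, D_SHARED)

tokens = encode_and_align(rng.normal(size=(10, D_TS)),
                          rng.normal(size=(49, D_IMG)))
print(tokens.shape)  # (59, 128)
```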
- [Sep 2025] GEM has been accepted to NeurIPS 2025! More updates coming soon.
- [Jul 2025] The full version of MIMIC-IV-ECG with beat-level features and GPT-4o interpretations has been released — check it out here!
- [Mar 2025] GEM-7B and ECG-Grounding-30k are now available.
We will continue to release more ECG-Grounding data and associated beat-level features progressively.
Stay tuned for updates!
Project Page: 📖 Page
Paper: 📄 Arxiv
Model: 🤗 GEM
Data: 🤗 ECG-Grounding
git clone https://github.com/lanxiang1017/GEM.git
bash GEM/setup.sh
Please download required data:
ECG:
Images:
- ECG-Grounding-Images (mimic_gen)
- ECG-Instruct
- ECG-Bench
After downloading all of them, organize the data as follows under ./data:
├── ecg_timeseries
└── champan-shaoxing
└── code15
└── cpsc2018
└── ptbxl
└── georgia
└── mimic-iv
├── ecg_images
└── cod15_v4
└── csn_aug_all_layout_papersize
└── csn_ori_layout_papersize
└── csn_part_noise_layout_papersize
└── gen_images
└── mimic_gen
└── mimic
└── mimic_v4
└── ptb-xl
├── ecg_bench
└── images
└── ecg-grounding-test-mimiciv.json
└── ecg-grounding-test-ptbxl.json
├── ecg_jsons
└── ECG_Grounding_30k.json
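A quick way to verify the layout above before training is to spot-check a few of the expected paths. This is a convenience sketch, not part of the repo; the entries are taken verbatim from the tree, and `DATA_ROOT` is an assumption you should adjust if your data lives elsewhere.

```python
from pathlib import Path

# Root of the layout shown above; adjust if needed.
DATA_ROOT = Path("./data")

# A representative sample of the expected entries (names copied
# verbatim from the directory tree).
EXPECTED = [
    "ecg_timeseries/ptbxl",
    "ecg_images/mimic",
    "ecg_bench/images",
    "ecg_jsons/ECG_Grounding_30k.json",
]

# Report anything that is not where the layout says it should be.
missing = [p for p in EXPECTED if not (DATA_ROOT / p).exists()]
for p in missing:
    print(f"missing: {DATA_ROOT / p}")
```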
Pretrained ECG Encoder:
- ECG-CoCa: place it in GEM/ecg_coca/open_clip/checkpoint
Pretrained MLLMs:
For training from scratch:
- step 1. specify paths in GEM/scripts/train_gem.sh
- step 2. run: bash GEM/scripts/train_gem.sh
For ECG-Grounding:
- step 1. generate interpretations: GEM/evaluation/gem_bench/bench_ecggrounding.sh
- step 2. process interpretations: GEM/gem_evaluation/process_gem_outputs.ipynb
- step 3. generate GPT evaluation reports: GEM/gem_evaluation/generate_gpt_eval.py
- step 4. process evaluation reports and get scores: GEM/gem_evaluation/process_grounding_scores.ipynb
For ECG-Bench:
- step 1. generate results: GEM/evaluation/gem_bench/bench_ecgbench.sh
- step 2. evaluate results: GEM/evaluation/evaluate_ecgbench.py
- step 3. evaluate reports: GEM/evaluation/eval_report.py
Note
- You need to specify the result paths in all evaluation scripts (for ECG-Bench, you also need to specify the path to the question files in evaluate_ecgbench.py).
- If you download our trained GEM-7B model from HuggingFace, you must set the path to ECG-CoCa in the config.json file (under "mm_ecg_tower") before using it.
- bench_ecggrounding.sh is designed to use multiple GPUs to generate interpretations simultaneously, reducing generation time. To use it, you must split the test file (ecg-grounding-test-mimiciv.json) into multiple chunks. If you prefer a simpler setup, you can use bench_ecgbench.sh instead; the core generation functions are the same. Example usage:
bash bench_ecgbench.sh -m PATH_TO_GEM -d ecg-grounding-test-mimiciv.
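Splitting the test file into per-GPU chunks, as the multi-GPU note requires, can be done with a short script like the one below. This is a minimal sketch, not repo code: the function name, chunk-file naming scheme, and output directory are all our own choices, and the only assumption about the test file is that it is a JSON array of records.

```python
import json
import math
from pathlib import Path

def split_into_chunks(src: str, n_chunks: int, out_dir: str = ".") -> list:
    """Split a JSON array of test records into n_chunks shard files
    (one per GPU worker). Returns the paths of the shards written."""
    records = json.loads(Path(src).read_text())
    size = math.ceil(len(records) / n_chunks)  # records per shard
    paths = []
    for i in range(n_chunks):
        shard = records[i * size:(i + 1) * size]
        out = Path(out_dir) / f"{Path(src).stem}_chunk{i}.json"
        out.write_text(json.dumps(shard))
        paths.append(out)
    return paths
```

Each worker invocation of bench_ecggrounding.sh would then be pointed at one of the resulting chunk files.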
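Setting the "mm_ecg_tower" entry mentioned in the note above can be done by hand or with a small script like this. The key name "mm_ecg_tower" and the checkpoint location come from this README; the demo writes a stand-in config.json to a temporary directory so it is runnable anywhere, whereas in practice you would edit the config.json shipped with the downloaded GEM-7B model instead.

```python
import json
import tempfile
from pathlib import Path

# Demo setup: create a stand-in config.json in a temp directory.
# In practice, point config_path at GEM-7B's own config.json.
tmp = Path(tempfile.mkdtemp())
config_path = tmp / "config.json"
config_path.write_text(json.dumps({"model_type": "gem"}))  # minimal stand-in

# Set the ECG-CoCa path under "mm_ecg_tower" and write the config back.
cfg = json.loads(config_path.read_text())
cfg["mm_ecg_tower"] = "GEM/ecg_coca/open_clip/checkpoint"
config_path.write_text(json.dumps(cfg, indent=2))

print(json.loads(config_path.read_text())["mm_ecg_tower"])
```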
If you find GEM helpful for your research and applications, please cite our paper:
@misc{lan2025gemempoweringmllmgrounded,
title={GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images},
author={Xiang Lan and Feng Wu and Kai He and Qinghao Zhao and Shenda Hong and Mengling Feng},
year={2025},
eprint={2503.06073},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.06073},
}
We thank the authors of PULSE and ECG-Chat for their publicly released models, datasets, and training code.
