# Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training (ICML 2024)

This is the official code for *Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training* (ICML 2024).
## Installation

Clone this repository and install the Python dependencies:

```bash
git clone https://github.com/SVT-Yang/MedST.git
pip install -r requirements.txt
```
## Datasets

We used the following datasets:

- **MIMIC-CXR**: MIMIC-CXR-JPG is the medical multimodal dataset we used for pre-training.
- **MS-CXR-T**: We used the MS-CXR-T benchmark for temporal downstream tasks.
- **RSNA**: We used stage 2 of the RSNA dataset on Kaggle.
- **COVIDx**: We used version 6 of the COVIDx dataset on Kaggle, which has three classes: no pneumonia, non-COVID-19 pneumonia, and COVID-19 pneumonia.
## Data Preprocessing

After downloading the datasets, please check that the paths in `constants.py` are correct.
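As an illustration, such a constants file usually just collects dataset root paths in one place. The variable names and locations below are assumptions for illustration, not the repository's actual contents:

```python
from pathlib import Path

# Hypothetical example of dataset path constants -- adjust to your own
# download locations; the actual names in constants.py may differ.
DATA_BASE_DIR = Path("/data")
MIMIC_CXR_DATA_DIR = DATA_BASE_DIR / "MIMIC-CXR-JPG"
MS_CXR_T_DATA_DIR = DATA_BASE_DIR / "MS-CXR-T"
RSNA_DATA_DIR = DATA_BASE_DIR / "RSNA"
COVIDX_DATA_DIR = DATA_BASE_DIR / "COVIDx"
```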
- Run `mimic_cxr.py` to get multi-view image-text pairs and temporal information.
- Run `rsna.py` and `covidx.py` to get the train/val/test splits.
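Preparing train/val/test splits amounts to shuffling the samples and partitioning them; a minimal stdlib sketch of that step (the 70/15/15 ratios and function name are assumptions, not the scripts' exact logic):

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle samples and partition them into train/val/test lists.

    Illustrative sketch only: the actual rsna.py / covidx.py scripts
    may use different ratios or predefined split files.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))  # 70/15/15 split
```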
## Pre-training

First, download the pre-trained weights we used:

- Text encoder (Bio_ClinicalBERT): download `pytorch_model.bin` from Bio_ClinicalBERT into the `/medst/emilyalsentzer/Bio_ClinicalBERT` folder.
- MGCA pre-trained weights from MGCA.

Before pre-training, please make sure all the paths are correct.
Then run this command to pre-train:

```bash
cd medst/models/medst
CUDA_VISIBLE_DEVICES=0,1 python medst_module.py --gpus 2 --strategy ddp --batch_size 10 --num_workers 8
```
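The flags above are typically parsed by the training script and forwarded to the trainer. A minimal `argparse` mirror of that CLI (hypothetical; the actual `medst_module.py` interface may define additional or different arguments):

```python
import argparse

def build_parser():
    # Hypothetical sketch of the pre-training CLI shown above.
    parser = argparse.ArgumentParser(description="MedST pre-training (sketch)")
    parser.add_argument("--gpus", type=int, default=1, help="number of GPUs to use")
    parser.add_argument("--strategy", type=str, default="ddp", help="distributed strategy")
    parser.add_argument("--batch_size", type=int, default=10, help="per-GPU batch size")
    parser.add_argument("--num_workers", type=int, default=8, help="dataloader workers")
    return parser

# Parse the same flags as the example command (minus the env variable,
# which only restricts which physical GPUs are visible to the process).
args = build_parser().parse_args(["--gpus", "2", "--strategy", "ddp", "--batch_size", "10"])
```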
Our pre-trained MedST can be found here.
## Evaluation

First, set the `path` (or `ckpt_path`) argument to the path of our pre-trained MedST model.

- Make sure the paths of the two CSV files (temporal image classification and temporal sentence similarity classification) are correct.
- Run `temporal_test.py` to get the temporal task results.
- Run `zeroshot_RSNA.py` to get the zero-shot classification results.
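Zero-shot classification of this kind generally scores each image embedding against one text embedding per class and picks the most similar class. A NumPy sketch of the idea (the embeddings below are random stand-ins, not MedST outputs):

```python
import numpy as np

def zero_shot_classify(image_embs, class_text_embs):
    """Assign each image to the class with the most similar text embedding.

    image_embs: (N, D) image embeddings.
    class_text_embs: (C, D) one text embedding per class, e.g. encoded
    from prompts such as "pneumonia" / "normal".
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    sims = img @ txt.T          # (N, C) cosine-similarity matrix
    return sims.argmax(axis=1)  # predicted class index per image

rng = np.random.default_rng(0)
preds = zero_shot_classify(rng.normal(size=(4, 128)), rng.normal(size=(2, 128)))
```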
## Fine-tuning

We use `--data_pct` to specify the portion of training data used for fine-tuning. To run all experiments for the COVIDx classification task, use:

```bash
./run_cls_covidx.sh
```
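Subsampling a fraction of the training set, as `--data_pct` does, can be sketched as follows (the function name and default fraction are illustrative, not the repository's exact implementation):

```python
import random

def subsample(train_set, data_pct=0.01, seed=42):
    """Keep a random data_pct fraction of the training set (at least 1 sample).

    Illustrative sketch of --data_pct-style subsampling with a fixed seed
    so repeated runs fine-tune on the same subset.
    """
    rng = random.Random(seed)
    n_keep = max(1, int(len(train_set) * data_pct))
    return rng.sample(train_set, n_keep)

subset = subsample(list(range(1000)), data_pct=0.1)  # keeps 100 samples
```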
## Acknowledgments

This work is built upon MGCA and TCC.
## Citation

```bibtex
@article{yang2024unlocking,
  title={Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training},
  author={Jinxia Yang and Bing Su and Wayne Xin Zhao and Ji-Rong Wen},
  journal={arXiv preprint arXiv:2405.19654},
  year={2024}
}
```