This repository contains the official implementation of VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance.
Paper: https://arxiv.org/abs/2510.06809
Echocardiography is a critical tool for detecting heart diseases, yet its steep operational difficulty causes a shortage of skilled personnel. Probe guidance systems, which assist in acquiring high-quality images, offer a promising solution to lower this operational barrier. However, robust probe guidance remains challenging due to significant individual variability. This variability manifests as differences in low-level features within two-dimensional (2D) images, which complicate image feature understanding, and differences in individual three-dimensional (3D) structures, which pose challenges for precise navigation. To address these challenges, we first propose leveraging the robust image representations learned by ultrasound foundation models from vast datasets. Yet applying these models to probe navigation is non-trivial because they lack an understanding of individual 3D structures. To this end, we design a Vision-Action Adapter (VA-Adapter) that injects the capability of understanding individual 3D structures online. Specifically, by embedding the VA-Adapter into the foundation model's image encoder, the model can infer cardiac anatomy from historical vision-action sequences, mimicking the cognitive process of a sonographer. Extensive experiments on a dataset with over 1.31M samples demonstrate that the VA-Adapter outperforms strong probe guidance models while requiring approximately 33 times fewer trained parameters.
VA-Adapter injects historical vision-action information into ultrasound foundation model encoders. The adapter enables the model to adapt robust 2D ultrasound representations to individualized 3D cardiac navigation, while keeping most foundation model parameters frozen.
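To make the idea of injecting vision-action history into an encoder more concrete, here is a minimal PyTorch sketch. It is not the implementation in models/: the class name, the cross-attention design, the bottleneck width, and the 6-dimensional action vector are all assumptions chosen for illustration. Current image tokens attend to past (image feature, action) pairs, and the result is added back through a zero-initialized projection so that the adapter starts as an identity mapping.

```python
import torch
import torch.nn as nn


class VisionActionAdapterSketch(nn.Module):
    """Hypothetical adapter (not the repository's code): current tokens attend
    to a history of (image feature, action) pairs inside a small bottleneck."""

    def __init__(self, dim=768, action_dim=6, bottleneck=64, num_heads=4):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)                # shrink encoder features
        self.action_proj = nn.Linear(action_dim, bottleneck)  # embed probe actions
        self.attn = nn.MultiheadAttention(bottleneck, num_heads, batch_first=True)
        self.up = nn.Linear(bottleneck, dim)                  # back to encoder width
        # Zero init: at the start of training the adapter leaves tokens unchanged.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, tokens, hist_feats, hist_actions):
        # tokens:       (B, N, dim)        current-frame tokens from an encoder block
        # hist_feats:   (B, T, dim)        one feature vector per past frame
        # hist_actions: (B, T, action_dim) probe action taken at each past frame
        query = self.down(tokens)
        history = self.down(hist_feats) + self.action_proj(hist_actions)
        context, _ = self.attn(query, history, history)
        return tokens + self.up(context)                      # residual injection


def freeze(module):
    """Disable gradients for every parameter of a module (e.g. a pretrained encoder)."""
    for param in module.parameters():
        param.requires_grad = False
    return module


def count_trainable(module):
    """Number of parameters that would still receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)
```

In a setup like this, only the adapter parameters would be trained while the frozen encoder supplies the 2D representation, which is one way to keep the trained-parameter count small.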
This project supports VA-Adapter training on three ultrasound foundation models:
- EchoCLIP
- BiomedCLIP
- USFM
The model wrappers are organized under models/:
- models/echoclip_adapter.py
- models/biomedclip_adapter.py
- models/usfm_adapter.py
- models/seq_model.py
The code is tested with the following core dependencies:
- Python >= 3.8
- PyTorch >= 2.1
- timm == 1.0.15
- open_clip_torch == 2.32.0
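To check an existing environment against the versions listed above, a small standard-library script can help. This is a sketch rather than part of the repository: the helper names are hypothetical, the version parsing is deliberately simple (it ignores pre-release ordering), and it assumes the PyPI distribution names `torch`, `timm`, and `open_clip_torch`.

```python
import itertools
import sys
from importlib import metadata

# Constraints copied from the dependency list; keys are assumed PyPI names.
REQUIRED = {"torch": ">=2.1", "timm": "==1.0.15", "open_clip_torch": "==2.32.0"}


def parse(version):
    """Turn '2.1.0+cu121' into (2, 1, 0); local and pre-release tags are dropped."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, spec):
    """Check an installed version string against a '>=X' or '==X' constraint."""
    op, wanted = spec[:2], parse(spec[2:])
    have = parse(installed)
    # Pad with zeros so that '2.1' and '2.1.0' compare as equal.
    width = max(len(have), len(wanted))
    have += (0,) * (width - len(have))
    wanted += (0,) * (width - len(wanted))
    if op == "==":
        return have == wanted
    if op == ">=":
        return have >= wanted
    raise ValueError(f"unsupported constraint: {spec}")


def check_environment(required=REQUIRED):
    """Return a {package: status} report for the current interpreter."""
    report = {"python": "ok" if sys.version_info >= (3, 8) else "want >=3.8"}
    for name, spec in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, spec)
        report[name] = "ok" if ok else f"found {installed}, want {spec}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```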
Install dependencies with:
pip install torch torchvision
pip install timm==1.0.15 open_clip_torch==2.32.0
pip install einops scipy matplotlib tqdm

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_echoclip.py \
--arch echoclip \
--epochs 5 \
--batch-size 256 \
--lr 1e-4 --lr_f 1e-6 \
--num-workers 8 --print-freq 50 \
--timestep 4 \
--data_root data \
--logs logs/echoclip \
--dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
--use_adapter

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_biomedclip.py \
--arch biomedclip \
--epochs 5 \
--batch-size 256 \
--lr 1e-4 --lr_f 1e-6 \
--num-workers 8 --print-freq 50 \
--timestep 4 \
--data_root data \
--logs logs/biomedclip \
--dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
--encoderpath pretrained_weights/biomed_clip.bin \
--use_adapter

Here, --encoderpath pretrained_weights/biomed_clip.bin should point to the official BiomedCLIP pretrained visual encoder weights.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_usfm.py \
--arch usfm \
--epochs 5 \
--batch-size 256 \
--lr 1e-4 --lr_f 1e-6 \
--num-workers 8 --print-freq 50 \
--timestep 4 \
--data_root data \
--logs logs/usfm \
--dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
--encoderpath pretrained_weights/USFM_latest.pth \
--use_adapter

Here, --encoderpath pretrained_weights/USFM_latest.pth should point to the official USFM pretrained weights.

Training logs and checkpoints are saved to the directory specified by --logs. The best checkpoint is saved according to validation MAE.
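The selection rule behind "best checkpoint by validation MAE" can be illustrated in plain Python. This is a sketch of the general logic only, not the code in the training scripts: the class and function names are hypothetical, and a real training loop would write the model state to disk at the point marked in the comment.

```python
def mean_absolute_error(predictions, targets):
    """Average absolute difference between paired predictions and targets."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


class BestCheckpointTracker:
    """Remembers the epoch with the lowest validation MAE seen so far."""

    def __init__(self):
        self.best_mae = float("inf")
        self.best_epoch = None

    def update(self, epoch, val_mae):
        """Return True when this epoch improves on the best MAE so far."""
        if val_mae < self.best_mae:
            self.best_mae = val_mae
            self.best_epoch = epoch
            # A real training loop would save the checkpoint here.
            return True
        return False


if __name__ == "__main__":
    tracker = BestCheckpointTracker()
    for epoch, mae in enumerate([3.0, 2.5, 2.7]):
        improved = tracker.update(epoch, mae)
        print(f"epoch {epoch}: val MAE {mae:.2f} improved={improved}")
```

Lower MAE is better, so the tracker keeps the minimum and ignores later epochs that score worse.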
If you find our project useful in your research, please consider citing:
@misc{wang2026vaadapter,
title={VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance},
author={Teng Wang and Haojun Jiang and Yuxuan Wang and Zhenguo Sun and Yujiao Deng and Shiji Song and Gao Huang},
year={2026},
eprint={2510.06809},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.06809},
}