LeapLabTHU/VA-Adapter

VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance

MICCAI 2026 Early Accept paper (Top 9%)

This repository contains the official implementation of VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance.

Paper: https://arxiv.org/abs/2510.06809

Abstract

Echocardiography is a critical tool for detecting heart disease, yet it is difficult to operate, which contributes to a shortage of skilled personnel. Probe guidance systems, which assist in acquiring high-quality images, offer a promising way to lower this operational barrier. However, robust probe guidance remains challenging because of significant variability between individuals. This variability manifests as differences in low-level features within two-dimensional (2D) images, which complicate image feature understanding, and as differences in individual three-dimensional (3D) structures, which pose challenges for precise navigation. To address these challenges, we first propose leveraging the robust image representations that ultrasound foundation models learn from vast datasets. Yet applying these models to probe navigation is non-trivial because they lack an understanding of individual 3D structures. To this end, we design a Vision-Action Adapter (VA-Adapter) that injects, online, the capability of understanding individual 3D structures. Specifically, with the VA-Adapter embedded in the foundation model's image encoder, the model can infer cardiac anatomy from historical vision-action sequences, mimicking the cognitive process of a sonographer. Extensive experiments on a dataset with over 1.31M samples demonstrate that the VA-Adapter outperforms strong probe guidance models while requiring approximately 33 times fewer trained parameters.

Method

VA-Adapter injects historical vision-action information into the image encoder of an ultrasound foundation model. This lets the model carry its robust 2D ultrasound representations over to individualized 3D cardiac navigation while most foundation model parameters stay frozen.
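
To make the idea of a small trainable module beside a frozen encoder concrete, here is a minimal PyTorch sketch. It is not the implementation in models/: the class name ToyVisionActionAdapter, the bottleneck design, the action size of 6, and the history length of 4 are all assumptions made for illustration, and the sketch conditions only on past actions rather than on full vision-action sequences.

```python
import torch
import torch.nn as nn


class ToyVisionActionAdapter(nn.Module):
    """Illustrative bottleneck adapter. Hypothetical; not the code in models/."""

    def __init__(self, dim, action_dim, bottleneck=16):
        super().__init__()
        self.action_proj = nn.Linear(action_dim, dim)  # embed each past action
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, tokens, past_actions):
        # tokens: (batch, num_tokens, dim) features from a frozen encoder block
        # past_actions: (batch, history, action_dim) earlier probe movements
        context = self.action_proj(past_actions).mean(dim=1, keepdim=True)
        return tokens + self.up(torch.relu(self.down(tokens + context)))


# Stand-in for one block of a pretrained encoder; its weights stay frozen.
frozen_block = nn.Linear(32, 32)
for p in frozen_block.parameters():
    p.requires_grad = False

adapter = ToyVisionActionAdapter(dim=32, action_dim=6)  # action size is assumed
tokens = frozen_block(torch.randn(2, 5, 32))
out = adapter(tokens, torch.randn(2, 4, 6))  # history of 4 steps is assumed

# Only the adapter contributes trainable parameters.
trainable = sum(p.numel() for p in adapter.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in frozen_block.parameters())
print(out.shape, trainable, frozen)
```

The zero-initialized up-projection is a common adapter choice: at the start of training the adapted encoder reproduces the pretrained features exactly, so the adapter can only move away from them as gradients arrive.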

Supported Foundation Models

This project supports VA-Adapter training on three ultrasound foundation models:

  • EchoCLIP
  • BiomedCLIP
  • USFM

The model wrappers are organized under models/:

  • models/echoclip_adapter.py
  • models/biomedclip_adapter.py
  • models/usfm_adapter.py
  • models/seq_model.py

Environment

The code is tested with the following core dependencies:

  • Python >= 3.8
  • PyTorch >= 2.1
  • timm == 1.0.15
  • open_clip_torch == 2.32.0

Install dependencies with:

pip install torch torchvision
pip install timm==1.0.15 open_clip_torch==2.32.0
pip install einops scipy matplotlib tqdm
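
Because timm and open_clip_torch are pinned to exact versions, it can help to confirm the installed versions before launching a long run. The following is a small sketch using only the standard library; the helper name check_pins is hypothetical and is not part of this repository.

```python
from importlib import metadata

# Pins copied from the dependency list above; the check itself is illustrative.
PINNED = {"timm": "1.0.15", "open_clip_torch": "2.32.0"}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means both pinned packages match. The get_version parameter exists only so the check can be exercised without touching the real environment.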

Training

EchoCLIP + VA-Adapter

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_echoclip.py \
  --arch echoclip \
  --epochs 5 \
  --batch-size 256 \
  --lr 1e-4 --lr_f 1e-6 \
  --num-workers 8 --print-freq 50 \
  --timestep 4 \
  --data_root data \
  --logs logs/echoclip \
  --dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
  --use_adapter

BiomedCLIP + VA-Adapter

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_biomedclip.py \
  --arch biomedclip \
  --epochs 5 \
  --batch-size 256 \
  --lr 1e-4 --lr_f 1e-6 \
  --num-workers 8 --print-freq 50 \
  --timestep 4 \
  --data_root data \
  --logs logs/biomedclip \
  --dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
  --encoderpath pretrained_weights/biomed_clip.bin \
  --use_adapter

Here, --encoderpath should point to the official BiomedCLIP pretrained visual encoder weights (pretrained_weights/biomed_clip.bin in this example).

USFM + VA-Adapter

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_usfm.py \
  --arch usfm \
  --epochs 5 \
  --batch-size 256 \
  --lr 1e-4 --lr_f 1e-6 \
  --num-workers 8 --print-freq 50 \
  --timestep 4 \
  --data_root data \
  --logs logs/usfm \
  --dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
  --encoderpath pretrained_weights/USFM_latest.pth \
  --use_adapter

Here, --encoderpath should point to the official USFM pretrained weights (pretrained_weights/USFM_latest.pth in this example).
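
The three commands differ only in the script name, --arch, --logs, and whether --encoderpath is passed. As a convenience sketch, the helper below assembles the same argument lists from one table. The function build_command is hypothetical and not part of this repository; every flag and value is copied from the commands above.

```python
import shlex

# Encoder weight paths from the commands above. The EchoCLIP command passes
# no --encoderpath, so it maps to None here.
ENCODER_PATHS = {
    "echoclip": None,
    "biomedclip": "pretrained_weights/biomed_clip.bin",
    "usfm": "pretrained_weights/USFM_latest.pth",
}


def build_command(arch):
    """Return the training command for one backbone as a list of arguments."""
    if arch not in ENCODER_PATHS:
        raise ValueError(f"unknown arch: {arch!r}")
    cmd = [
        "python", f"train_{arch}.py",
        "--arch", arch,
        "--epochs", "5",
        "--batch-size", "256",
        "--lr", "1e-4", "--lr_f", "1e-6",
        "--num-workers", "8", "--print-freq", "50",
        "--timestep", "4",
        "--data_root", "data",
        "--logs", f"logs/{arch}",
        "--dist-url", "tcp://127.0.0.1:23451", "--dist-backend", "nccl",
        "--multiprocessing-distributed", "--world-size", "1", "--rank", "0",
    ]
    encoder_path = ENCODER_PATHS[arch]
    if encoder_path is not None:
        cmd += ["--encoderpath", encoder_path]
    cmd.append("--use_adapter")
    return cmd


if __name__ == "__main__":
    for arch in ENCODER_PATHS:
        print(shlex.join(build_command(arch)))
```

The printed lines omit the CUDA_VISIBLE_DEVICES prefix, which is an environment variable rather than an argument; set it in the shell or pass it through the env parameter of subprocess.run.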

Outputs

Training logs and checkpoints are saved to the directory specified by --logs. The best checkpoint is chosen by validation mean absolute error (MAE).
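
For readers unfamiliar with this selection rule, here is a minimal pure-Python sketch of tracking the best epoch by validation MAE. It assumes that lower MAE is better and that scalar predictions are compared against scalar targets. The names mean_absolute_error and BestCheckpointTracker are hypothetical and do not come from the training scripts.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over paired scalar values."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)


class BestCheckpointTracker:
    """Remember the epoch with the lowest validation MAE seen so far."""

    def __init__(self):
        self.best_mae = float("inf")
        self.best_epoch = None

    def update(self, epoch, val_mae):
        """Return True when this epoch should overwrite the best checkpoint."""
        if val_mae < self.best_mae:
            self.best_mae = val_mae
            self.best_epoch = epoch
            return True
        return False


if __name__ == "__main__":
    tracker = BestCheckpointTracker()
    # Made-up validation MAE values, used only to exercise the tracker.
    for epoch, val_mae in enumerate([0.5, 0.7, 0.3]):
        if tracker.update(epoch, val_mae):
            print(f"epoch {epoch}: new best, would save checkpoint here")
```

In a real loop, the branch that returns True is where the model state would be written to the --logs directory.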

Reference

If you find our project useful in your research, please consider citing:

@misc{wang2026vaadapter,
      title={VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance}, 
      author={Teng Wang and Haojun Jiang and Yuxuan Wang and Zhenguo Sun and Yujiao Deng and Shiji Song and Gao Huang},
      year={2026},
      eprint={2510.06809},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.06809}, 
}
