VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance

MICCAI 2026 Early Accept paper (Top 9%)

This repository contains the official implementation of VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance.

Paper: https://arxiv.org/abs/2510.06809

Abstract

Echocardiography is a critical tool for detecting heart diseases, yet it is difficult to perform, which causes a shortage of skilled personnel. Probe guidance systems, which assist in acquiring high-quality images, offer a promising way to lower this operational barrier. However, robust probe guidance remains challenging due to significant individual variability. This variability manifests as differences in low-level features within two-dimensional (2D) images, which complicate image feature understanding, and differences in individual three-dimensional (3D) structures, which pose challenges for precise navigation. To address these challenges, we first propose leveraging the robust image representations that ultrasound foundation models learn from vast datasets. Yet applying these models to probe navigation is non-trivial because they lack an understanding of individual 3D structures. To this end, we design a Vision-Action Adapter (VA-Adapter) that injects, online, the ability to understand individual 3D structures. Specifically, by embedding the VA-Adapter into the foundation model's image encoder, the model can infer cardiac anatomy from historical vision-action sequences, mimicking the cognitive process of a sonographer. Extensive experiments on a dataset with over 1.31M samples demonstrate that the VA-Adapter outperforms strong probe guidance models while requiring approximately 33 times fewer trained parameters.

Method

VA-Adapter injects historical vision-action information into the image encoder of an ultrasound foundation model. This adapts the model's robust 2D ultrasound representations to individualized 3D cardiac navigation while keeping most foundation model parameters frozen.
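
To make the idea concrete, here is a minimal PyTorch sketch of one way an adapter could mix historical vision-action information into encoder tokens. It is an illustration only and not the implementation in models/: the class name, the bottleneck cross-attention design, the 6-dimensional action vector, and all sizes are assumptions chosen for the example.

```python
import torch
import torch.nn as nn


class VisionActionAdapterSketch(nn.Module):
    """Hypothetical adapter: current image tokens attend to past (image, action) pairs."""

    def __init__(self, dim=768, action_dim=6, bottleneck=64, heads=4):
        super().__init__()
        # Project everything into a small bottleneck to keep trained parameters few.
        self.down = nn.Linear(dim, bottleneck)
        self.hist_proj = nn.Linear(dim, bottleneck)
        self.action_proj = nn.Linear(action_dim, bottleneck)
        self.attn = nn.MultiheadAttention(bottleneck, heads, batch_first=True)
        self.up = nn.Linear(bottleneck, dim)
        # Zero-init the up-projection so the adapter starts as an identity mapping.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, tokens, hist_feats, hist_actions):
        # tokens:       (batch, num_tokens, dim)   current frame tokens
        # hist_feats:   (batch, steps, dim)        features of past frames
        # hist_actions: (batch, steps, action_dim) probe motions between frames
        query = self.down(tokens)
        history = self.hist_proj(hist_feats) + self.action_proj(hist_actions)
        fused, _ = self.attn(query, history, history)
        return tokens + self.up(fused)  # residual update of the encoder tokens


if __name__ == "__main__":
    adapter = VisionActionAdapterSketch()
    tokens = torch.randn(2, 197, 768)
    hist_feats = torch.randn(2, 4, 768)   # four past steps, chosen for illustration
    hist_actions = torch.randn(2, 4, 6)
    out = adapter(tokens, hist_feats, hist_actions)
    trainable = sum(p.numel() for p in adapter.parameters())
    print(out.shape, trainable)
```

In a setup like this, the foundation model's own weights could be frozen with `requires_grad_(False)` so that only the adapter is trained; the zero-initialized up-projection means the encoder's output is unchanged at the start of training.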

Supported Foundation Models

This project supports VA-Adapter training on three ultrasound foundation models:

EchoCLIP
BiomedCLIP
USFM

The model wrappers are organized under models/:

models/echoclip_adapter.py
models/biomedclip_adapter.py
models/usfm_adapter.py
models/seq_model.py

Environment

The code is tested with the following core dependencies:

Python >= 3.8
PyTorch >= 2.1
timm == 1.0.15
open_clip_torch == 2.32.0

Install dependencies with:

pip install torch torchvision
pip install timm==1.0.15 open_clip_torch==2.32.0
pip install einops scipy matplotlib tqdm
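
After installing, the versions listed above can be verified with a short script. The following is a sketch using only the standard library; the helper names (`parse_version`, `satisfies`, `check_environment`) are made up for this example and are not part of the repository.

```python
import re
import sys
from importlib import metadata

# Requirements copied from the list above: (operator, version).
REQUIREMENTS = {
    "torch": (">=", "2.1"),
    "timm": ("==", "1.0.15"),
    "open_clip_torch": ("==", "2.32.0"),
}


def parse_version(text):
    """Turn '2.1.0+cu121' into (2, 1, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b if op == ">=" else a == b


def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python >= 3.8 is required")
    for name, (op, required) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name} {installed} does not satisfy {op} {required}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks consistent"]:
        print(line)
```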

Training

EchoCLIP + VA-Adapter

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_echoclip.py \
  --arch echoclip \
  --epochs 5 \
  --batch-size 256 \
  --lr 1e-4 --lr_f 1e-6 \
  --num-workers 8 --print-freq 50 \
  --timestep 4 \
  --data_root data \
  --logs logs/echoclip \
  --dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
  --use_adapter
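
The launch flags above use the same names as the official PyTorch ImageNet example. If the training scripts follow that example's convention (an assumption; this README does not confirm it), --world-size counts nodes rather than processes, one process is spawned per visible GPU, and --batch-size is split evenly across the GPUs of a node. A minimal sketch of that arithmetic, with hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class ProcessLayout:
    global_rank: int        # rank of this process across all nodes
    total_processes: int    # world size after spawning one process per GPU
    per_process_batch: int  # samples each GPU handles per step


def layout_for(local_gpu, gpus_per_node, node_rank=0, num_nodes=1, batch_size=256):
    """Assumed convention: node-level rank and world size are expanded per GPU."""
    return ProcessLayout(
        global_rank=node_rank * gpus_per_node + local_gpu,
        total_processes=num_nodes * gpus_per_node,
        per_process_batch=batch_size // gpus_per_node,
    )


if __name__ == "__main__":
    # Single node, eight visible GPUs, --world-size 1 --rank 0 --batch-size 256.
    print(layout_for(local_gpu=3, gpus_per_node=8))
```

Under this assumption, the command above would run eight processes with 32 samples each per step.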

BiomedCLIP + VA-Adapter

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_biomedclip.py \
  --arch biomedclip \
  --epochs 5 \
  --batch-size 256 \
  --lr 1e-4 --lr_f 1e-6 \
  --num-workers 8 --print-freq 50 \
  --timestep 4 \
  --data_root data \
  --logs logs/biomedclip \
  --dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
  --encoderpath pretrained_weights/biomed_clip.bin \
  --use_adapter

Here, --encoderpath should point to the official BiomedCLIP pretrained visual encoder weights (pretrained_weights/biomed_clip.bin in this example).

USFM + VA-Adapter

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python train_usfm.py \
  --arch usfm \
  --epochs 5 \
  --batch-size 256 \
  --lr 1e-4 --lr_f 1e-6 \
  --num-workers 8 --print-freq 50 \
  --timestep 4 \
  --data_root data \
  --logs logs/usfm \
  --dist-url 'tcp://127.0.0.1:23451' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 \
  --encoderpath pretrained_weights/USFM_latest.pth \
  --use_adapter

Here, --encoderpath should point to the official USFM pretrained weights (pretrained_weights/USFM_latest.pth in this example).

Outputs

Training logs and checkpoints are saved to the directory specified by --logs. The checkpoint with the lowest validation MAE (mean absolute error) is kept as the best checkpoint.
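
The selection rule can be sketched as follows. This is an illustration of keeping the checkpoint with the lowest validation MAE, not the repository's code: the class name, the file name best_checkpoint.json, and the use of JSON in place of a real model checkpoint are assumptions made so the example runs with the standard library only.

```python
import json
import math
from pathlib import Path


def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over all samples."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


class BestMAETracker:
    """Hypothetical helper that overwrites the best checkpoint when MAE improves."""

    def __init__(self, log_dir):
        self.log_dir = Path(log_dir)
        self.best_mae = math.inf
        self.best_epoch = None

    def update(self, epoch, val_mae, state):
        """Save `state` if `val_mae` is a new minimum; return True when saved."""
        if val_mae >= self.best_mae:
            return False
        self.best_mae = val_mae
        self.best_epoch = epoch
        self.log_dir.mkdir(parents=True, exist_ok=True)
        # JSON stands in for a real checkpoint format in this sketch.
        with open(self.log_dir / "best_checkpoint.json", "w") as handle:
            json.dump({"epoch": epoch, "val_mae": val_mae, "state": state}, handle)
        return True


if __name__ == "__main__":
    tracker = BestMAETracker("logs/example")
    for epoch, mae in enumerate([4.0, 3.0, 3.5]):
        saved = tracker.update(epoch, mae, state={"note": "weights would go here"})
        print(epoch, mae, "saved" if saved else "kept previous best")
```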

Reference

If you find our project useful in your research, please consider citing:

@misc{wang2026vaadapter,
      title={VA-Adapter: Adapting Ultrasound Foundation Model to Echocardiography Probe Guidance}, 
      author={Teng Wang and Haojun Jiang and Yuxuan Wang and Zhenguo Sun and Yujiao Deng and Shiji Song and Gao Huang},
      year={2026},
      eprint={2510.06809},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.06809}, 
}
