AFLoc

This repository provides the official implementation of "Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis".

Key Features

  • A generalizable vision-language pre-training model for annotation-free pathology localization.
  • Multi-level semantic structure-based contrastive learning aligns multi-granularity medical concepts across reports and images.
  • Strong generalizability to multiple imaging modalities, including chest X-rays and retinal fundus images.

Details

Annotation-Free pathology Localization (AFLoc). The core strength of AFLoc lies in its multi-level semantic structure-based contrastive learning, which comprehensively aligns multi-granularity medical concepts from reports with abundant image features, so the model adapts to diverse expressions of pathologies, including unseen ones, without relying on expert image annotations. We demonstrate the proof of concept on chest X-ray images, with extensive experimental validation across 6 distinct external datasets encompassing 13 types of chest pathologies. The results show that AFLoc surpasses state-of-the-art methods in pathology localization and classification, and even outperforms the human benchmark in locating 5 different pathologies. We further verify its generalization ability by applying it to retinal fundus images. These results showcase AFLoc's versatility and underscore its suitability for clinical diagnosis in complex clinical environments.
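This README does not reproduce the training objective itself; purely as intuition, a symmetric contrastive (InfoNCE-style) loss applied at several report-image granularities could be sketched as follows. All function names, level pairings, and weights below are illustrative assumptions, not the repository's actual implementation:

import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE: paired embeddings (the diagonal) are positives.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Illustrative multi-level combination, e.g. (word, patch), (sentence, region), (report, image) pairs.
def multi_level_loss(level_pairs, weights=None):
    weights = weights or [1.0] * len(level_pairs)
    return sum(w * info_nce(i, t) for w, (i, t) in zip(weights, level_pairs))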

Quantitative pathological lesion localization results

Visualizations of pathological lesion localization

Links to download datasets

Get started

Main requirements

Our experiments are based on a server with two NVIDIA A100 80GB GPUs, 512GB of memory, and Intel(R) Xeon(R) Gold 6326 CPUs. The core Python packages are:

torch==1.8.0
pytorch-lightning==1.1.4
transformers==4.2.1
torchvision==0.9.0
omegaconf==2.0.5
pydicom
numpy
pycocotools

Installation

# create a new conda environment
conda create -n AFLoc python=3.9
conda activate AFLoc

# install torch and pytorch-lightning
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install pytorch-lightning==1.1.4

# install other packages
pip install -r requirements.txt
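As an optional sanity check (not part of the repository's instructions), you can confirm that the pinned torch build sees your GPUs:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"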

Download pretrained model & preprocessing files

You can download our pretrained model and preprocessing files from this link.

Preprocessing

python preprocess/resize.py
python preprocess/preprocess.py
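The exact preprocessing is defined by the two scripts above. As a rough illustration of the kind of step resize.py performs, resizing a DICOM chest X-ray to a fixed resolution might look like the following (the file paths, target size, and normalization are assumptions, not the script's actual parameters):

import numpy as np
import pydicom
from PIL import Image

ds = pydicom.dcmread("example.dcm")                        # hypothetical input file
arr = ds.pixel_array.astype(np.float32)
arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8)   # scale intensities to [0, 1]
img = Image.fromarray((arr * 255).astype(np.uint8))
img.resize((256, 256)).save("example.png")                 # assumed target size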

Pretraining

Update the directories to your own within afloc/constants.py. Then train AFLoc with the following command:

python train.py -c ./afloc/config.yaml --train

Inference

Update the directories to your own within eval/constants.py. Then you can run AFLoc inference on MS-CXR with the following command:

python inference.py -ds MS_CXR --gpu 0 
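Under the hood, annotation-free localization in this family of models typically scores each image patch against a text prompt embedding to form a heatmap. A minimal sketch of that idea follows; the encoders, shapes, and grid size are placeholders, not this repository's API:

import torch
import torch.nn.functional as F

# Placeholder embeddings: a 14x14 grid of patch features and one prompt feature.
patch_emb = torch.randn(196, 512)   # hypothetical image patch embeddings
text_emb = torch.randn(512)         # hypothetical text prompt embedding

sim = F.normalize(patch_emb, dim=-1) @ F.normalize(text_emb, dim=-1)
heatmap = sim.reshape(14, 14)       # higher values = more likely pathology location
heatmap = F.interpolate(heatmap[None, None], size=(224, 224), mode="bilinear")[0, 0]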

Feedback and Contact

For further questions about the code, please feel free to contact Hao Yang.

License

This project is released under the Apache License 2.0. See LICENSE for details.

Acknowledgement

Some code is adapted from GLoRIA, BioViL, and cheXlocalize. We thank the authors for making their valuable code and data publicly available.

Citation

If you find this repository useful, please consider citing this paper:

@article{afloc,
    title={Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis},
    author={Yang, Hao and Zhou, Hong-Yu and Li, Zhihuan and Gao, Yuanxu and Li, Cheng and Huang, Weijian and Liu, Jiarun and Zheng, Hairong and Zhang, Kang and Wang, Shanshan},
    journal={arXiv preprint arXiv:2401.02044},
    year={2024}
}
