Badgewho/HMDMIL

# Histomorphology-driven multi-instance learning for breast cancer WSI classification

## NEWS

## Abstract

Histomorphology is crucial in breast cancer diagnosis. However, existing whole slide image (WSI) classification methods struggle to effectively incorporate histomorphology information, limiting their ability to capture key and fine-grained pathological features. To address this limitation, we propose a novel framework that explicitly incorporates histomorphology (tumor cellularity, cellular morphology, and tissue architecture) into WSI classification. Specifically, our approach consists of three key components: (1) estimating the importance of tumor-related histomorphology information at the patch level based on medical prior knowledge; (2) generating representative cluster-level features through histomorphology-driven cluster pooling; and (3) enabling WSI-level classification through histomorphology-driven multi-instance aggregation. With the incorporation of histomorphological information, our framework strengthens the model’s ability to capture key and fine-grained pathological patterns, thereby enhancing WSI classification performance. Experimental results demonstrate its effectiveness, achieving high diagnostic accuracy for molecular subtyping and cancer subtyping.
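Components (2) and (3) can be illustrated with a minimal NumPy sketch, assuming the per-patch importance scores are already available. This is not the HMDMIL implementation. How the scores are estimated from tumor cellularity, cellular morphology, and tissue architecture is not reproduced here. The cluster labels are assumed to come from something like k-means, and the attention vector `attn_w` and the additive `alpha * score` bias are hypothetical choices.

```python
import numpy as np


def cluster_pool(features, cluster_ids, importance):
    """Pool patch features into one feature per (non-empty) cluster.

    features:    (N, D) patch embeddings from a pretrained encoder
    cluster_ids: (N,) integer cluster label per patch (e.g. from k-means)
    importance:  (N,) non-negative per-patch importance scores (hypothetical)
    returns:     (K, D) cluster features and (K,) mean importance per cluster
    """
    clusters = np.unique(cluster_ids)
    pooled = np.empty((len(clusters), features.shape[1]))
    scores = np.empty(len(clusters))
    for i, k in enumerate(clusters):
        mask = cluster_ids == k
        w = importance[mask]
        total = w.sum()
        if total > 0:
            # Importance-weighted mean of the patches in this cluster.
            pooled[i] = (w[:, None] * features[mask]).sum(axis=0) / total
        else:
            # All weights are zero: fall back to a plain mean.
            pooled[i] = features[mask].mean(axis=0)
        scores[i] = w.mean()
    return pooled, scores


def aggregate(cluster_feats, cluster_scores, attn_w, alpha=1.0):
    """Softmax attention over clusters; logits are shifted by alpha * importance."""
    logits = cluster_feats @ attn_w + alpha * cluster_scores  # (K,)
    logits = logits - logits.max()  # numerical stability
    attn = np.exp(logits) / np.exp(logits).sum()
    return attn @ cluster_feats, attn  # (D,) slide embedding, (K,) weights


# Toy example: 100 patches with 8-dim features split into 4 clusters.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
ids = rng.integers(0, 4, size=100)
imp = rng.uniform(size=100)
c_feats, c_scores = cluster_pool(feats, ids, imp)
slide_emb, attn = aggregate(c_feats, c_scores, rng.normal(size=8))
```

Weighting inside each cluster lets high-importance patches dominate the cluster feature. The additive bias then lets clusters with higher average importance receive more attention at the slide level.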

## NOTES

2025-02-27: We released the full version of HMDMIL, including the models and training scripts.

## Installation

  • Environment: CUDA 11.8 / Python 3.10
  • Create a virtual environment
> conda create -n hmdmil python=3.10 -y
> conda activate hmdmil
  • Install PyTorch 2.0.1
> pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
> pip install packaging
  • Install causal-conv1d
> pip install causal-conv1d==1.1.1
  • Install HMDMIL
> git clone https://github.com/Badgewho/HMDMIL.git
> cd HMDMIL
> pip install -r requirements.txt
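A quick check that the pinned versions actually landed in the environment can save a failed training run. The standard-library sketch below compares installed distribution versions against the pins used in the installation steps. It does not test CUDA availability. `check_pins` is a hypothetical helper and is not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins taken from the installation steps above.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "causal-conv1d": "1.1.1",
}


def check_pins(expected):
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Wheels from the CUDA index report e.g. "2.0.1+cu118"; ignore the local tag.
        if have.split("+")[0] != want:
            problems[name] = f"found {have}, expected {want}"
    return problems


if __name__ == "__main__":
    issues = check_pins(EXPECTED)
    for name, msg in issues.items():
        print(f"{name}: {msg}")
    print("environment OK" if not issues else f"{len(issues)} problem(s) found")
```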


## Repository Details

<!-- * `csv`:  Complete Cbioportal files, including the features path and data splits with 5-fold cross-validation. 
* `datasets`: The code for Dataset, you can just replace the path in Line-25. -->
* `mamba`: the original Mamba, Bi-Mamba from Vim, and our proposed SRMamba.
* `models`: supports the following models:
  - Mean pooling
  - Max pooling
  - ABMIL
  - TransMIL
  - S4MIL
  - MambaMIL
<!-- * `results`: the results on 12 datasets, including BLCA BRCA CESC CRC GBMLGG KIRC LIHC LUAD LUSC PAAD SARC UCEC. -->
* `splits`: splits for reproduction.
* `train_scripts`: training scripts for cancer subtyping (TCGA-BRCA, BRACS) and molecular subtyping (BCNB).
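For reference, the three simplest aggregators in the list fit in a few lines. This NumPy sketch shows their standard formulations, with ABMIL written as gated attention following Ilse et al. (2018). The parameter shapes and random weights are illustrative, and the repository's `models` code may differ in details such as projection layers and dropout.

```python
import numpy as np


def mean_pool(h):
    """Mean pooling: average all instance embeddings. (N, D) -> (D,)"""
    return h.mean(axis=0)


def max_pool(h):
    """Max pooling: element-wise maximum over instances. (N, D) -> (D,)"""
    return h.max(axis=0)


def gated_attention_pool(h, V, U, w):
    """Gated attention pooling in the style of ABMIL.

    a_n = softmax_n( w^T (tanh(V h_n) * sigmoid(U h_n)) )
    h: (N, D) instances; V, U: (L, D) projections; w: (L,) attention vector.
    """
    gate = 1.0 / (1.0 + np.exp(-(h @ U.T)))    # (N, L) sigmoid gate
    scores = (np.tanh(h @ V.T) * gate) @ w      # (N,) unnormalized scores
    scores = scores - scores.max()             # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()  # (N,) attention weights
    return a @ h, a                            # (D,) bag embedding, weights


# Toy bag: 50 patch embeddings of dimension 16, attention dimension 8.
rng = np.random.default_rng(0)
bag = rng.normal(size=(50, 16))
z, a = gated_attention_pool(
    bag, rng.normal(size=(8, 16)), rng.normal(size=(8, 16)), rng.normal(size=8)
)
```

Mean and max pooling treat every patch the same way. The attention weights `a` let the model decide which patches dominate the slide embedding, which is what the attention-based models in the list build on.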

## How to Train
### Prepare your data
1. Download diagnostic WSIs from [TCGA](https://portal.gdc.cancer.gov/), [BRACS](https://www.bracs.icar.cnr.it/), and [BCNB](https://bupt-ai-cz.github.io/BCNB/).
2. Use the WSI processing tool provided by [Prov-GigaPath](https://github.com/prov-gigapath/prov-gigapath) to extract pretrained features for each 256 $\times$ 256 patch at 20x magnification, and save them as one `.h5` file per WSI. This gives one `h5_files` folder that stores the `.h5` files for all WSIs of a study.
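Once the features are extracted, indexing the feature folder is a quick way to confirm that every slide in a split has a feature file. The sketch assumes the `DATA_ROOT_DIR/h5_files/<dataset>/` layout shown next and that each file stem is the slide ID. `.pt` files are accepted as well, matching the second example dataset. `index_slides` and `missing_slides` are hypothetical helpers, not functions from the repository, and no particular split-file format is assumed.

```python
from pathlib import Path

FEATURE_SUFFIXES = {".h5", ".pt"}


def index_slides(data_root, dataset):
    """Map slide ID (file stem) -> feature file path for one study."""
    folder = Path(data_root) / "h5_files" / dataset
    if not folder.is_dir():
        raise FileNotFoundError(f"no feature folder at {folder}")
    return {
        p.stem: p
        for p in sorted(folder.iterdir())
        if p.is_file() and p.suffix in FEATURE_SUFFIXES
    }


def missing_slides(index, slide_ids):
    """Slide IDs (e.g. read from a split file) that have no feature file."""
    return sorted(set(slide_ids) - set(index))
```

For example, `missing_slides(index_slides("DATA_ROOT_DIR", "dataset1"), ids)` returns the IDs from `ids` that still need feature extraction.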

The final dataset structure should be as follows:
```bash
DATA_ROOT_DIR/
    └── h5_files/
        ├── dataset1/
        │   ├── slide_1.h5
        │   ├── slide_2.h5
        │   └── ...
        └── dataset2/
            ├── slide_1.pt
            ├── slide_2.pt
            └── ...
```

Run the following command to train TCGA-BRCA cancer subtyping:
```bash
sh ./train_scripts/TCGA-BRCA.sh
```

Run the following command to train BRACS (BReAst Carcinoma Subtyping):
```bash
sh ./train_scripts/BRACS.sh
```

Run the following command to train molecular subtyping on BCNB:
```bash
sh ./train_scripts/BCNB.sh
```

## Different distributions of histomorphology-driven importance scores

[Figure: Images 1-4]

## Acknowledgements

Huge thanks to the authors of the following open-source projects:

## License & Citation
