# Histomorphology-driven multi-instance learning for breast cancer WSI classification

## NEWS

## Abstract

Histomorphology is crucial in breast cancer diagnosis. However, existing whole slide image (WSI) classification methods struggle to effectively incorporate histomorphology information, limiting their ability to capture key and fine-grained pathological features. To address this limitation, we propose a novel framework that explicitly incorporates histomorphology (tumor cellularity, cellular morphology, and tissue architecture) into WSI classification. Specifically, our approach consists of three key components: (1) estimating the importance of tumor-related histomorphology information at the patch level based on medical prior knowledge; (2) generating representative cluster-level features through histomorphology-driven cluster pooling; and (3) enabling WSI-level classification through histomorphology-driven multi-instance aggregation. With the incorporation of histomorphological information, our framework strengthens the model’s ability to capture key and fine-grained pathological patterns, thereby enhancing WSI classification performance. Experimental results demonstrate its effectiveness, achieving high diagnostic accuracy for molecular subtyping and cancer subtyping.
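The three components can be pictured with a toy, dependency-free sketch. It is not the HMDMIL implementation. The patch importance scores (component 1), the cluster assignments, and the rule that turns each cluster's mean importance into an attention logit are hypothetical stand-ins for what the framework estimates or learns. The sketch only shows how a per-patch importance signal can flow through cluster pooling (component 2) into one slide-level vector (component 3).

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_pool(features: Sequence[Vector], importance: Sequence[float],
                 clusters: Sequence[int]) -> Dict[int, Vector]:
    """Hypothetical component 2: importance-weighted mean of patch features per cluster."""
    dim = len(features[0])
    sums: Dict[int, Vector] = {}
    weights: Dict[int, float] = {}
    for feat, w, c in zip(features, importance, clusters):
        acc = sums.setdefault(c, [0.0] * dim)
        for i in range(dim):
            acc[i] += w * feat[i]
        weights[c] = weights.get(c, 0.0) + w
    # Guard against a cluster whose patches all have zero importance.
    return {c: [v / max(weights[c], 1e-12) for v in vec] for c, vec in sums.items()}


def mean_importance(importance: Sequence[float], clusters: Sequence[int]) -> Dict[int, float]:
    """Average patch importance per cluster, used here as a stand-in attention logit."""
    totals: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for w, c in zip(importance, clusters):
        totals[c] = totals.get(c, 0.0) + w
        counts[c] = counts.get(c, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


def aggregate(pooled: Dict[int, Vector], logits: Dict[int, float]):
    """Hypothetical component 3: attention-weighted sum of cluster features -> slide vector."""
    ids = sorted(pooled)
    attn = softmax([logits[c] for c in ids])
    dim = len(pooled[ids[0]])
    slide = [sum(a * pooled[c][i] for a, c in zip(attn, ids)) for i in range(dim)]
    return slide, dict(zip(ids, attn))


if __name__ == "__main__":
    # Toy bag: three 2-D patch features, assumed importance scores, and cluster ids.
    features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
    importance = [1.0, 3.0, 0.5]
    clusters = [0, 0, 1]
    pooled = cluster_pool(features, importance, clusters)
    print(pooled)  # {0: [2.5, 0.0], 1: [0.0, 2.0]}
    slide, attn = aggregate(pooled, mean_importance(importance, clusters))
    print(slide, attn)
```

In a trained model the attention logits would come from learned parameters rather than raw mean importance; the fixed rule here just keeps the example small.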

## NOTES

2025-02-27: We released the full version of HMDMIL, including models and training scripts.

## Installation

Environment: CUDA 11.8 / Python 3.10

Create a virtual environment:

```bash
conda create -n hmdmil python=3.10 -y
conda activate hmdmil
```

Install PyTorch 2.0.1:

```bash
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install packaging
```

Install causal-conv1d:

```bash
pip install causal-conv1d==1.1.1
```

Install HMDMIL:

```bash
git clone https://github.com/Badgewho/HMDMIL.git
cd HMDMIL
pip install -r requirements.txt
```
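After installation, one way to confirm that the pinned versions were actually picked up is to query the installed distribution metadata with the standard-library `importlib.metadata` module. This check is a convenience sketch and is not part of the HMDMIL repository; `PINNED` and `check_pins` are names introduced here for illustration.

```python
from importlib import metadata
from typing import Dict

# Versions pinned in the installation steps above.
PINNED: Dict[str, str] = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "causal-conv1d": "1.1.1",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Return {package: problem} for every pinned package that is missing or mismatched."""
    problems: Dict[str, str] = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Wheels from the cu118 index carry a local version tag, e.g. "2.0.1+cu118".
        if installed.split("+")[0] != expected:
            problems[name] = f"found {installed}, expected {expected}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    if issues:
        for pkg, msg in issues.items():
            print(f"{pkg}: {msg}")
    else:
        print("All pinned packages match.")
```

Run it inside the activated `hmdmil` environment. To also confirm that PyTorch can see the GPU, `python -c "import torch; print(torch.cuda.is_available())"` should print `True`.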


## Repository Details

<!-- * `csv`:  Complete Cbioportal files, including the features path and data splits with 5-fold cross-validation. 
* `datasets`: The code for Dataset, you can just replace the path in Line-25. -->
* `mamba`: includes the original Mamba, Bi-Mamba from Vim, and our proposed SRMamba.
* `models`: supports the following models:
  - Mean pooling
  - Max pooling
  - ABMIL
  - TransMIL
  - S4MIL
  - MambaMIL
<!-- * `results`: the results on 12 datasets, including BLCA BRCA CESC CRC GBMLGG KIRC LIHC LUAD LUSC PAAD SARC UCEC. -->
* `split`: data splits for reproduction.
* `train_scripts`: training scripts for cancer subtyping and molecular subtyping.
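For orientation, the three simplest aggregators in the `models` list fit in a few lines. The sketch below is a dependency-free illustration, not the repository's code: it shows mean pooling, max pooling, and the non-gated attention pooling used by ABMIL, where `a_k = softmax_k(w^T tanh(V h_k))` and the bag embedding is `z = sum_k a_k h_k`. In practice `V` and `w` are learned; here they are passed in by hand.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(bag: Sequence[Vector]) -> Vector:
    """Average each feature dimension over all instances (patches) in the bag."""
    n = len(bag)
    return [sum(col) / n for col in zip(*bag)]


def max_pool(bag: Sequence[Vector]) -> Vector:
    """Take the maximum of each feature dimension over the bag."""
    return [max(col) for col in zip(*bag)]


def abmil_pool(bag: Sequence[Vector], V: Sequence[Vector],
               w: Sequence[float]) -> Tuple[Vector, List[float]]:
    """Non-gated attention pooling: a_k = softmax_k(w^T tanh(V h_k)), z = sum_k a_k h_k."""
    logits = []
    for h in bag:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        logits.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    attn = [e / total for e in exps]
    z = [sum(a * h[i] for a, h in zip(attn, bag)) for i in range(len(bag[0]))]
    return z, attn


if __name__ == "__main__":
    bag = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]  # three 2-D patch features
    print(mean_pool(bag))  # [2.0, 2.0]
    print(max_pool(bag))   # [3.0, 4.0]
    # V maps 2-D features to a 1-D hidden space; w scores that hidden unit.
    z, attn = abmil_pool(bag, V=[[1.0, -1.0]], w=[2.0])
    print(z, attn)
```

With `V` set to zeros every logit is equal, so attention pooling reduces to mean pooling; this is a quick sanity check when debugging an aggregator.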

## How to Train
### Prepare your data
1. Download diagnostic WSIs from [TCGA](https://portal.gdc.cancer.gov/), [BRACS](https://www.bracs.icar.cnr.it/), and [BCNB](https://bupt-ai-cz.github.io/BCNB/).
2. Use the WSI processing tool provided by [Prov-GigaPath](https://github.com/prov-gigapath/prov-gigapath) to extract a pretrained feature vector for each 256 $\times$ 256 patch at 20x magnification, and save the features of each WSI as one `.h5` file. This gives one `h5_files` folder that stores the `.h5` files of all WSIs, with one subfolder per study.
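Before training, it can help to confirm that every slide ended up with a feature file in the expected place, i.e. `DATA_ROOT_DIR/h5_files/<dataset>/<slide>.h5` (the layout example also shows `.pt` files). The helper below is a hypothetical sketch, not HMDMIL's own data loader; `index_feature_files` and `FEATURE_SUFFIXES` are names introduced here for illustration.

```python
from pathlib import Path
from typing import Dict, Union

# Hypothetical: file extensions treated as per-slide feature files.
FEATURE_SUFFIXES = {".h5", ".pt"}


def index_feature_files(data_root: Union[str, Path]) -> Dict[str, Dict[str, Path]]:
    """Map each dataset folder under DATA_ROOT_DIR/h5_files/ to {slide_id: feature_path}."""
    root = Path(data_root) / "h5_files"
    index: Dict[str, Dict[str, Path]] = {}
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Keep only regular files with a feature extension; the stem is the slide id.
        index[dataset_dir.name] = {
            f.stem: f
            for f in sorted(dataset_dir.iterdir())
            if f.is_file() and f.suffix in FEATURE_SUFFIXES
        }
    return index
```

For example, `{k: len(v) for k, v in index_feature_files("DATA_ROOT_DIR").items()}` gives the number of feature files found per dataset, which can be compared against the number of downloaded WSIs.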

The final dataset directory should be structured as follows:
```bash
DATA_ROOT_DIR/
    └── h5_files/
        ├── dataset1/
        │   ├── slide_1.h5
        │   ├── slide_2.h5
        │   └── ...
        └── dataset2/
            ├── slide_1.pt
            ├── slide_2.pt
            └── ...
```

Run the following command to train on TCGA-BRCA (cancer subtyping):

```bash
sh ./train_scripts/TCGA-BRCA.sh
```

Run the following command to train on BRACS (BReAst Carcinoma Subtyping):

```bash
sh ./train_scripts/BRACS.sh
```

Run the following command to train on BCNB (molecular subtyping):

```bash
sh ./train_scripts/BCNB.sh
```

## Different distributions of histomorphology-driven importance scores

## Acknowledgements

Huge thanks to the authors of the following open-source projects:
