Histomorphology is crucial in breast cancer diagnosis. However, existing whole slide image (WSI) classification methods struggle to effectively incorporate histomorphology information, limiting their ability to capture key and fine-grained pathological features. To address this limitation, we propose a novel framework that explicitly incorporates histomorphology (tumor cellularity, cellular morphology, and tissue architecture) into WSI classification. Specifically, our approach consists of three key components: (1) estimating the importance of tumor-related histomorphology information at the patch level based on medical prior knowledge; (2) generating representative cluster-level features through histomorphology-driven cluster pooling; and (3) enabling WSI-level classification through histomorphology-driven multi-instance aggregation. With the incorporation of histomorphological information, our framework strengthens the model’s ability to capture key and fine-grained pathological patterns, thereby enhancing WSI classification performance. Experimental results demonstrate its effectiveness, achieving high diagnostic accuracy for molecular subtyping and cancer subtyping.
2025-02-27: We released the full version of HMDMIL, including models and training scripts.
- Environment: CUDA 11.8 / Python 3.10
- Create a virtual environment
> conda create -n hmdmil python=3.10 -y
> conda activate hmdmil
- Install PyTorch 2.0.1
> pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
> pip install packaging
- Install causal-conv1d
> pip install causal-conv1d==1.1.1
- Install HMDMIL
> git clone https://github.com/Badgewho/HMDMIL.git
> cd HMDMIL
> pip install -r requirements.txt
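
After installation, a quick sanity check can confirm that the pinned versions listed above are the ones actually installed. The following is a minimal sketch, not part of the repository: the `check_environment` helper and the `EXPECTED` table are illustrative names, and the CUDA check only runs if `torch` can be imported.

```python
import importlib.util
import sys
from importlib import metadata

# Versions pinned by the installation steps in this README.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "causal-conv1d": "1.1.1",
}


def check_environment(expected=EXPECTED):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != (3, 10):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found, 3.10 expected"
        )
    for dist, wanted in expected.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        # Wheels from the cu118 index report a local suffix, e.g. "2.0.1+cu118".
        if found.split("+")[0] != wanted:
            problems.append(f"{dist} {found} found, {wanted} expected")
    # Optional GPU check, skipped when torch is absent.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.version.cuda != "11.8":
            problems.append(f"torch built for CUDA {torch.version.cuda}, 11.8 expected")
        if not torch.cuda.is_available():
            problems.append("torch cannot see a CUDA device")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the pinned versions"]:
        print(line)
```

Run it inside the activated `hmdmil` environment; any printed line other than the final success message points at a step that needs repeating.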
## Repository Details
<!-- * `csv`: Complete Cbioportal files, including the features path and data splits with 5-fold cross-validation.
* `datasets`: The code for Dataset, you can just replace the path in Line-25. -->
* `mamba`: Includes the original Mamba, Bi-Mamba from Vim, and our proposed SRMamba.
* `models`: Supports the following models:
  - Mean pooling
  - Max pooling
  - ABMIL
  - TransMIL
  - S4MIL
  - MambaMIL
<!-- * `results`: the results on 12 datasets, including BLCA BRCA CESC CRC GBMLGG KIRC LIHC LUAD LUSC PAAD SARC UCEC. -->
* `splits`: Data splits for reproduction.
* `train_scripts`: Training scripts for cancer subtyping.
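
For readers new to multi-instance learning, the simplest baselines in the list above differ only in how they collapse a bag of patch features into one slide-level vector. The sketch below is illustrative and is not the code in `models`: the function names are hypothetical, and `attention_pool` uses a single linear scoring vector `w` as a simplified stand-in for the learned attention network in ABMIL.

```python
import math


def mean_pool(bag):
    """Average each feature dimension over all patches in the bag."""
    n = len(bag)
    return [sum(col) / n for col in zip(*bag)]


def max_pool(bag):
    """Keep the largest value of each feature dimension across patches."""
    return [max(col) for col in zip(*bag)]


def attention_pool(bag, w):
    """Softmax-weighted average of patches, scored by the (hypothetical) vector w."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in bag]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(bag[0])
    return [sum(a * x[d] for a, x in zip(weights, bag)) for d in range(dim)]


# A toy bag of three patches with two features each.
bag = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
print(mean_pool(bag))  # [2.0, 2.0]
print(max_pool(bag))   # [3.0, 4.0]
print(attention_pool(bag, [0.0, 1.0]))  # weights patches with a large second feature more
```

With an all-zero `w`, every patch gets the same weight and attention pooling reduces to mean pooling, which is a convenient way to check the sketch.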
## How to Train
### Prepare your data
1. Download diagnostic WSIs from [TCGA](https://portal.gdc.cancer.gov/), [BRACS](https://www.bracs.icar.cnr.it/), and [BCNB](https://bupt-ai-cz.github.io/BCNB/).
2. Use the WSI processing tool provided by [Prov-GigaPath](https://github.com/prov-gigapath/prov-gigapath) to extract pretrained features for each 256 $\times$ 256 patch (20x), and save them as one `.h5` file per WSI. This yields one `h5_files` folder storing the `.h5` files for all WSIs of a study.
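
Before training, it can help to confirm that the extracted features sit where the training code will look for them. The sketch below assumes the `DATA_ROOT_DIR/h5_files/<dataset>/` layout described in this section and counts feature files per dataset. The function name `summarize_features` and the `FEATURE_SUFFIXES` set are illustrative and not part of the repository.

```python
from collections import Counter
from pathlib import Path

# File extensions that appear in the dataset layout of this README.
FEATURE_SUFFIXES = {".h5", ".pt"}


def summarize_features(data_root):
    """Count feature files per dataset under <data_root>/h5_files/."""
    feature_dir = Path(data_root) / "h5_files"
    if not feature_dir.is_dir():
        raise FileNotFoundError(f"missing folder: {feature_dir}")
    summary = {}
    for dataset_dir in sorted(p for p in feature_dir.iterdir() if p.is_dir()):
        counts = Counter(
            f.suffix
            for f in dataset_dir.iterdir()
            if f.is_file() and f.suffix in FEATURE_SUFFIXES
        )
        summary[dataset_dir.name] = dict(counts)
    return summary
```

Calling `summarize_features("DATA_ROOT_DIR")` returns a dictionary such as `{"dataset1": {".h5": 2}}`; an empty inner dictionary signals a dataset folder with no recognised feature files.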
The final structure of the datasets should be as follows:
```bash
DATA_ROOT_DIR/
└── h5_files/
    ├── dataset1/
    │   ├── slide_1.h5
    │   ├── slide_2.h5
    │   └── ...
    └── dataset2/
        ├── slide_1.pt
        ├── slide_2.pt
        └── ...
```

Run the following command to train TCGA-BRCA cancer subtyping:
> sh ./train_scripts/TCGA-BRCA.sh

Run the following command to train BReAst Carcinoma Subtyping (BRACS):
> sh ./train_scripts/BRACS.sh

Run the following command to train molecular subtyping:
> sh ./train_scripts/BCNB.sh

Huge thanks to the authors of the following open-source projects:




