Tuo Yin, Frédéric Lifrange, Zoë Denis, Alex de Caluwé, Laurence Buisseret, Xavier Catteau, Clara Legros, Nick Reynaert, Jennifer Dhont
Overview of the proposed H-scoring framework, consisting of three modules: (1) a tumor–stroma segmentation module (TSM), (2) a nuclei segmentation module (NSM), and (3) an H-score estimation module (HEM).
On two NVIDIA RTX A6000 GPUs, with the CUDA toolkit enabled and the dependencies in requirements.txt installed, the framework estimates the H-score of a gigapixel immunohistochemistry (IHC) whole-slide image (WSI) in approximately 2 minutes.
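For readers unfamiliar with H-scoring: the quantity the framework estimates is conventionally the intensity-weighted sum of stained tumor-cell percentages, ranging from 0 to 300. A minimal sketch of that convention (the function name is illustrative and not part of this codebase):

```python
def h_score(weak_pct, moderate_pct, strong_pct):
    """Classical H-score: intensity-weighted sum of staining percentages.

    weak_pct / moderate_pct / strong_pct are the percentages (0-100) of
    tumor cells stained at intensity 1+ / 2+ / 3+. Unstained cells carry
    weight 0, so the score ranges from 0 to 300.
    """
    return 1 * weak_pct + 2 * moderate_pct + 3 * strong_pct

# Example: 30% weak, 20% moderate, 10% strong -> 30 + 40 + 30 = 100
print(h_score(30, 20, 10))  # 100
```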
Organize your training data in the dataset folder as follows:
dataset/ # All images are in patches of 512 × 512 pixels
│── TSM # Tumor-stroma segmentation dataset
│ │── train # Training set (for training and validation)
│ │ │── patches_H # H-stains of IHC patches
│ │ │── patches_ann_gray # Annotations (pixel values 0–3 correspond to four classes)
│
│── NSM # Nuclei segmentation dataset
│ │── train
│ │ │── patches # IHC patches
│ │ │── patches_H # H-stains of IHC patches
│ │ │── nuclei_mask # Annotations (pixel value 255 = nuclei, 0 = other)
│
│── HEM # H-score estimation dataset
│ │── train
│ │ │── patches_HEM # Tumor cell mask & stroma nuclei mask (pixel value 255 = target, 0 = other)
│ │ │── patch_labels.xlsx # Columns: patch, patch_NH, patch_H, patch_N
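Before launching training, it can help to verify that the folder tree matches the layout above. A small stand-alone checker (illustrative only, not part of the repository):

```python
# Sanity check for the dataset layout described in this README.
from pathlib import Path

EXPECTED_DIRS = [
    "TSM/train/patches_H",
    "TSM/train/patches_ann_gray",
    "NSM/train/patches",
    "NSM/train/patches_H",
    "NSM/train/nuclei_mask",
    "HEM/train/patches_HEM",
]

def check_dataset(root="dataset"):
    """Return the list of missing entries (empty list = layout OK)."""
    root = Path(root)
    missing = [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    # patch_labels.xlsx is a file, not a folder
    if not (root / "HEM/train/patch_labels.xlsx").is_file():
        missing.append("HEM/train/patch_labels.xlsx")
    return missing

if __name__ == "__main__":
    for entry in check_dataset():
        print("missing:", entry)
```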
python main_train_models.py --root_dir "(Root directory of the project)" --TSM_model "MoCo-SM" --NSM_model "Triple UNet" --HEM_model "VGG16Regression" --batch_size 8 --TSM_lr 0.0001 --NSM_lr 0.0001 --HEM_lr 0.001 --epochs 200
Note:
Choose the TSM model from MoCo-SM (best performance), MobileNetV3, UNet, UNet++, or DeepLabV3+.
Choose the HEM model from VGG16Regression (best performance), StainIntensityNet (computationally efficient), or RAM-CNN.
The framework accepts whole-slide images (WSIs) in .ndpi (40× magnification; 0.25 µm per pixel) or .png (20× magnification; 0.5 µm per pixel) format.
Place WSIs in .ndpi format in the data/WSIs_ndpi folder.
Place WSIs in .png format in the data/WSIs_png folder, and comment out the following line in main_inference.py (Line 26):
convert_images_to_png_infolder(root_dir)
Either use:
(1) Your own trained models (after training, model parameters are saved in the checkpoint folder), or
(2) The models trained on our internal dataset ([Download model weights]).
python main_inference.py --root_dir "/home/yin/pycharm/github_code" --TSM_model "MoCo-SM" --NSM_model "Triple UNet" --HEM_model "VGG16Regression" --Aggregation "Endtoend"
Note:
Choose TSM and HEM models in the same way as during training.
In internal validation on 100 expert-annotated WSIs, the framework achieved a Spearman's rank correlation of ρ = 0.84 (95% confidence interval [CI]: 0.77–0.89), outperforming a state-of-the-art method (ρ = 0.78, 95% CI: 0.68–0.85) and matching the inter-observer variability between two expert pathologists (ρ = 0.84, 95% CI: 0.63–0.94).
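Spearman's ρ, the validation metric above, is simply the Pearson correlation of rank-transformed values, so it rewards agreement in slide ranking rather than in absolute score. A stdlib-only sketch (in practice `scipy.stats.spearmanr` is the usual choice and also reports a p-value):

```python
def _ranks(values):
    """1-based average ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 0-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly monotone scores agree in ranking even when scales differ:
print(spearman_rho([10, 50, 120, 250], [0.1, 0.4, 0.6, 0.9]))  # 1.0
```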
Comparison of H-scores (a) and H-score rankings (b) of WSIs from the internal test set (MHC-I) as assessed by two pathologists and estimated by the proposed framework (green) and a state-of-the-art (SOTA) method (blue).
All users are responsible for reviewing the output of the developed framework to determine whether the framework meets the user’s needs and for validating and evaluating the framework before any clinical use.
If you find the framework useful for your research and applications, please cite using this BibTeX:
@article{yin2026fully,
title={Fully Automated Stain Quantification Framework for IHC Whole Slide Images in Breast Cancer},
author={Yin, Tuo and Lifrange, Fr{\'e}d{\'e}ric and Denis, Zo{\"e} and de Caluw{\'e}, Alex and Buisseret, Laurence and Catteau, Xavier and Legros, Clara and Reynaert, Nick and Dhont, Jennifer},
journal={Technology in Cancer Research \& Treatment},
volume={25},
pages={15330338251407734},
year={2026},
publisher={SAGE Publications Sage CA: Los Angeles, CA}
}
