Tuo Yin, Frédéric Lifrange, Zoë Denis, Alex de Caluwé, Laurence Buisseret, Xavier Catteau, Clara Legros, Nick Reynaert, Jennifer Dhont
Overview of the proposed H-scoring framework, consisting of three modules: (1) a tumor–stroma segmentation module (TSM), (2) a nuclei segmentation module (NSM), and (3) an H-score estimation module (HEM).
On two NVIDIA RTX A6000 GPUs, with the CUDA toolkit enabled and the dependencies in requirements.txt installed, the framework estimates the H-score of a gigapixel immunohistochemistry (IHC) whole-slide image (WSI) in approximately 2 minutes.
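For readers unfamiliar with H-scoring: the quantity the framework estimates is conventionally the intensity-weighted sum of stained tumor-cell percentages, ranging from 0 to 300. A minimal sketch of that convention (the function name is illustrative and not part of this codebase):

```python
def h_score(weak_pct, moderate_pct, strong_pct):
    """Classical H-score: intensity-weighted sum of staining percentages.

    weak_pct / moderate_pct / strong_pct are the percentages (0-100) of
    tumor cells stained at intensity 1+ / 2+ / 3+. Unstained cells carry
    weight 0, so the score ranges from 0 to 300.
    """
    return 1 * weak_pct + 2 * moderate_pct + 3 * strong_pct

# Example: 30% weak, 20% moderate, 10% strong -> 30 + 40 + 30 = 100
print(h_score(30, 20, 10))  # 100
```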
Organize your training data in the dataset folder as follows:
dataset/ # All images are in patches of 512 × 512 pixels
│── TSM # Tumor-stroma segmentation dataset
│ │── train # Training set (for training and validation)
│ │ │── patches_H # H-stains of IHC patches
│ │ │── patches_ann_gray # Annotations (pixel values 0–3 correspond to four classes)
│
│── NSM # Nuclei segmentation dataset
│ │── train
│ │ │── patches # IHC patches
│ │ │── patches_H # H-stains of IHC patches
│ │ │── nuclei_mask # Annotations (pixel value 255 = nuclei, 0 = other)
│
│── HEM # H-score estimation dataset
│ │── train
│ │ │── patches_HEM # Tumor cell mask & stroma nuclei mask (pixel value 255 = target, 0 = other)
│ │ │── patch_labels.xlsx # Columns: patch, patch_NH, patch_H, patch_N
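Before launching training, it can help to verify that the folder tree matches the layout above. A small stand-alone checker (illustrative only, not part of the repository):

```python
# Sanity check for the dataset layout described in this README.
from pathlib import Path

EXPECTED_DIRS = [
    "TSM/train/patches_H",
    "TSM/train/patches_ann_gray",
    "NSM/train/patches",
    "NSM/train/patches_H",
    "NSM/train/nuclei_mask",
    "HEM/train/patches_HEM",
]

def check_dataset(root="dataset"):
    """Return the list of missing entries (empty list = layout OK)."""
    root = Path(root)
    missing = [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    # patch_labels.xlsx is a file, not a folder
    if not (root / "HEM/train/patch_labels.xlsx").is_file():
        missing.append("HEM/train/patch_labels.xlsx")
    return missing

if __name__ == "__main__":
    for entry in check_dataset():
        print("missing:", entry)
```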
python main_train_models.py --root_dir "(Root directory of the project)" --TSM_model "MoCo-SM" --NSM_model "Triple UNet" --HEM_model "VGG16Regression" --batch_size 8 --TSM_lr 0.0001 --NSM_lr 0.0001 --HEM_lr 0.001 --epochs 200
Note:
Choose the TSM model from MoCo-SM (best performance), MobileNetV3, UNet, UNet++, or DeepLabV3+.
Choose the HEM model from VGG16Regression (best performance), StainIntensityNet (computationally efficient), or RAM-CNN.
The framework accepts whole-slide images (WSIs) in .ndpi (40× magnification; 0.25 µm per pixel) or .png (20× magnification; 0.5 µm per pixel) format.
Place WSIs in .ndpi format in the data/WSIs_ndpi folder.
Place WSIs in .png format in the data/WSIs_png folder, and comment out the following line in main_inference.py (Line 26):
convert_images_to_png_infolder(root_dir)
Either use:
(1) Your own trained models (after training, model parameters are saved in the checkpoint folder), or
(2) The models trained on our internal dataset ([Download model weights]).
python main_inference.py --root_dir "/home/yin/pycharm/github_code" --TSM_model "MoCo-SM" --NSM_model "Triple UNet" --HEM_model "VGG16Regression" --Aggregation "Endtoend"
Note:
Choose TSM and HEM models in the same way as during training.
In internal validation on 100 expert-annotated WSIs, the framework achieved a Spearman's rank correlation of ρ = 0.84 (95% confidence interval [CI]: 0.77–0.89), outperforming a state-of-the-art method (ρ = 0.78, 95% CI: 0.68–0.85) and matching the inter-observer variability between two expert pathologists (ρ = 0.84, 95% CI: 0.63–0.94).
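Spearman's ρ, the validation metric above, is simply the Pearson correlation of rank-transformed values, so it rewards agreement in slide ranking rather than in absolute score. A stdlib-only sketch (in practice `scipy.stats.spearmanr` is the usual choice and also reports a p-value):

```python
def _ranks(values):
    """1-based average ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 0-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly monotone scores agree in ranking even when scales differ:
print(spearman_rho([10, 50, 120, 250], [0.1, 0.4, 0.6, 0.9]))  # 1.0
```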
Comparison of H-scores (a) and H-score rankings (b) of WSIs from the internal test set (MHC-I) as assessed by two pathologists and estimated by the proposed framework (green) and a state-of-the-art (SOTA) method (blue).
All users are responsible for reviewing the output of the developed framework to determine whether the framework meets the user’s needs and for validating and evaluating the framework before any clinical use.
If you find the framework useful for your research and applications, please cite using this BibTeX:
@article{yin2026fully,
title={Fully Automated Stain Quantification Framework for IHC Whole Slide Images in Breast Cancer},
author={Yin, Tuo and Lifrange, Fr{\'e}d{\'e}ric and Denis, Zo{\"e} and de Caluw{\'e}, Alex and Buisseret, Laurence and Catteau, Xavier and Legros, Clara and Reynaert, Nick and Dhont, Jennifer},
journal={Technology in Cancer Research \& Treatment},
volume={25},
pages={15330338251407734},
year={2026},
publisher={SAGE Publications Sage CA: Los Angeles, CA}
}
