Paper | Supplement | Website | Demo | Dataset
This repository contains the main training code and the pretrained weights of the CVPR 2024 paper: Rotation-Agnostic Image Representation Learning for Digital Pathology
- Overview
- HistoRotate: Rotation-Agnostic Training
- PathDino: Histopathology Vision Transformer
- Dataset Preparation
- PathDino Training
- PathDino Inference on Histopathology Image
- Results
- Citation
The proposed Whole Slide Image (WSI) analysis pipeline incorporates a fast patch selection method (FPS), which efficiently selects representative patches while preserving their spatial distribution. The second component, HistoRotate, introduces a 360° rotation augmentation for training histopathology models. Unlike natural images, rotating a histopathology patch enhances learning without altering its contextual information. The third module, PathDino, is a compact histopathology transformer with only five small vision transformer blocks and ≈9 million parameters, markedly fewer than alternatives. Customized for histology images, PathDino demonstrates superior performance and mitigates overfitting, a common challenge in histology image analysis.
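The intuition behind FPS-style patch selection can be sketched as spatially stratified sampling: bucket tissue-patch coordinates into a grid and sample proportionally from each cell. The helper `select_patches` below is a hypothetical illustration of that idea, not the paper's FPS implementation:

```python
import numpy as np

def select_patches(coords, k, grid=4):
    """Spatially stratified patch selection (illustrative): bucket patch
    coordinates into a grid x grid lattice and sample from each cell in
    proportion to its occupancy, preserving the spatial distribution."""
    coords = np.asarray(coords, dtype=float)
    mins, maxs = coords.min(0), coords.max(0)
    # Map each (x, y) coordinate to a grid cell index in [0, grid).
    cells = np.floor((coords - mins) / (maxs - mins + 1e-9) * grid).astype(int)
    cell_ids = cells[:, 0] * grid + cells[:, 1]
    rng = np.random.default_rng(0)
    chosen = []
    for cid in np.unique(cell_ids):
        idx = np.flatnonzero(cell_ids == cid)
        n = max(1, round(k * len(idx) / len(coords)))  # proportional quota
        chosen.extend(rng.choice(idx, size=min(n, len(idx)), replace=False))
    return np.array(chosen[:k])

# Example: pick ~40 representative patches out of 500 candidates.
coords = np.random.default_rng(1).uniform(0, 1000, size=(500, 2))
sel = select_patches(coords, k=40)
print(len(sel))
```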
HistoRotate is a 360° rotation augmentation for training models on histopathology images. Unlike natural scenes, histopathology patches have no canonical orientation, so rotating a patch during training enhances learning without altering the contextual information.
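A minimal sketch of what a 360° rotation augmentation can look like, using PIL (this is an illustration of the idea, not the exact training transform used in the paper):

```python
import random
from PIL import Image

def histo_rotate(patch: Image.Image) -> Image.Image:
    """Rotate a histopathology patch by a random angle in [0, 360).
    Tissue patches have no canonical 'up', so arbitrary rotations are
    label-preserving; expand=False keeps the output size fixed."""
    angle = random.uniform(0, 360)
    return patch.rotate(angle, resample=Image.BILINEAR, expand=False)

# Example: augment a 512x512 patch; the output stays 512x512.
patch = Image.new("RGB", (512, 512), "white")
aug = histo_rotate(patch)
print(aug.size)
```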
PathDino is a lightweight histopathology transformer consisting of just five small vision transformer blocks. It is a customized ViT architecture, finely tuned to the nuances of histology images: it not only exhibits superior performance but also effectively reduces susceptibility to overfitting, a common challenge in histology image analysis.
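As a sanity check on the ≈9M figure, a rough back-of-the-envelope parameter count for a 5-block ViT trunk lands in that range (the embedding dimension 384 and patch size 16 below are illustrative assumptions, not the published PathDino configuration):

```python
def vit_params(depth=5, dim=384, mlp_ratio=4, patch=16, in_ch=3):
    """Rough parameter count of a ViT trunk: patch embedding plus
    `depth` transformer blocks (attention + MLP + two LayerNorms)."""
    embed = in_ch * patch * patch * dim + dim        # patch projection + bias
    attn = 4 * dim * dim + 4 * dim                   # qkv + output projection
    mlp = 2 * dim * dim * mlp_ratio + dim * mlp_ratio + dim  # two linear layers
    norms = 4 * dim                                  # 2 LayerNorms per block
    return embed + depth * (attn + mlp + norms)

print(round(vit_params() / 1e6, 1))  # → 9.2 (millions of parameters)
```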
| MV@5 vs # Params vs FLOPs | PathDino vs HIPT vs DinoSSLPath |
|---|---|
The proposed PathDino Pretraining Dataset. We extracted a total of
Overall, the patches tiled from TCGA are stored in a data directory structured as follows:

```
TCGA
├── images_1024
│   ├── 000001.jpg
│   ├── 000002.jpg
│   └── 000003.jpg
└── images_512
    ├── 000001.jpg
    ├── 000002.jpg
    └── 000003.jpg
```

Note: it is not necessary to extract every patch at both resolutions.
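Assuming the layout above, a small helper (hypothetical, not part of the repo) can gather the patch paths per resolution, skipping folders that are absent since both resolutions are not required:

```python
from pathlib import Path
import tempfile

def list_patches(root, sizes=("images_1024", "images_512")):
    """Collect sorted patch paths under the TCGA root, one list per
    resolution folder; missing folders are simply skipped."""
    root = Path(root)
    return {s: sorted((root / s).glob("*.jpg"))
            for s in sizes if (root / s).is_dir()}

# Example: build a tiny layout in a temp dir and list it.
tmp = Path(tempfile.mkdtemp())
(tmp / "images_512").mkdir()
for i in range(3):
    (tmp / "images_512" / f"{i:06d}.jpg").touch()
patches = list_patches(tmp)
print(len(patches["images_512"]))  # 3
```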
To train PathDino on a single GPU, run the following command:
```shell
python PathDino_main_512.py \
  --arch pathdino \
  --lr 0.0005 \
  --epochs 27 \
  --batch_size_per_gpu 64 \
  --data_path /path/to/data/root/dir/ \
  --output_dir /path/for/the/output/ \
  --num_workers 24
```
To train the same model in distributed multi-GPU mode, e.g., on 8 GPUs:
```shell
python -m torch.distributed.launch --nproc_per_node=8 PathDino_main_512.py \
  --lr 0.0005 \
  --epochs 27 \
  --batch_size_per_gpu 64 \
  --data_path /path/to/data/root/dir/ \
  --output_dir /path/for/the/output/ \
  --num_workers 24 \
  --host '28500'
```
To extract embeddings from histopathology images using the pretrained PathDino model:
First, download the pretrained model PathDino512.pth from the HuggingFace repo, then place it in the ./inference directory.
```python
import torch
from PIL import Image
from PathDino import get_pathDino_model

# Load the pretrained PathDino model and its input transform.
model, transformInput = get_pathDino_model(weights_path='./inference/PathDino512.pth')

# Read a histopathology patch and preprocess it.
histoImg = Image.open('./inference/img.png')
img = transformInput(histoImg)

# Extract the embedding (add a batch dimension, disable gradients).
with torch.no_grad():
    embedding = model(img.unsqueeze(0))
print(embedding.shape)
```
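Once embeddings are extracted, patch retrieval reduces to nearest-neighbor search under cosine similarity. The sketch below is generic: the 384-dimensional vectors are randomly generated stand-ins for illustration, not real PathDino outputs:

```python
import numpy as np

def cosine_topk(query, database, k=3):
    """Return the indices of the k database embeddings most similar to
    the query under cosine similarity."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity to every item
    return np.argsort(-sims)[:k]       # indices of the top-k matches

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 384))           # stand-in patch embeddings
q = db[7] + 0.01 * rng.normal(size=384)    # query close to item 7
print(cosine_topk(q, db, k=3)[0])  # 7
```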
To visualize the activation maps as animated GIFs:

```shell
python example_visualizeAttention_gif.py inference/img.png --output_dir output
```
The output activation maps will be saved in the output directory as PNG images and animated GIFs for each attention head.
The results presented in Table
@article{alfasly2023rotationagnostic,
title={Rotation-Agnostic Image Representation Learning for Digital Pathology},
author={Saghir Alfasly and Abubakr Shafique and Peyman Nejat and Jibran Khan and Areej Alsaafin and Ghazal Alabtah and H.R. Tizhoosh},
year={2023},
eprint={2311.08359},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgements
The code is built upon DINO.