PathLDM: Text conditioned Latent Diffusion Model for Histopathology

Official code for our WACV 2024 publication, PathLDM: Text conditioned Latent Diffusion Model for Histopathology. This codebase builds heavily on CompVis/latent-diffusion.

Requirements

To install the Python dependencies, run:

conda env create -f environment.yaml
conda activate ldm

Downloading + Organizing Data

tl;dr: The TCGA-BRCA image patches, captions, and tumor/TIL probabilities used in our training can be downloaded from this link. See this file for the Dataset class we use during training.

We obtained machine-readable text reports for TCGA from this repo and used GPT-3.5 to summarize them. Summaries of all BRCA reports can be found at this link.

Obtaining Tumor and TIL probabilities

We used wsinfer to obtain tumor and TIL probabilities. It works directly on the WSI files and outputs a CSV with the probabilities for each patch, but the size and magnification of its patches might differ from those of the patches extracted by DSMIL. For each 10x patch, we therefore use the average of the probabilities of the overlapping wsinfer patches.
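The averaging step can be sketched as follows. This is a minimal illustration, not the code used for the paper: the `Box` class and `average_overlapping` function are hypothetical names, all boxes are assumed to be expressed in the same pixel coordinate frame (for example, full-resolution slide coordinates), and an unweighted mean is assumed because the text does not say whether overlaps are weighted by area. Check the column names of your wsinfer CSV before adapting it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned patch in a shared pixel coordinate frame (hypothetical helper)."""
    minx: int
    miny: int
    width: int
    height: int

    def overlaps(self, other: "Box") -> bool:
        # Strict inequalities: patches that only touch at an edge do not overlap.
        return (
            self.minx < other.minx + other.width
            and other.minx < self.minx + self.width
            and self.miny < other.miny + other.height
            and other.miny < self.miny + self.height
        )


def average_overlapping(
    target: Box, predictions: Iterable[Tuple[Box, float]]
) -> Optional[float]:
    """Unweighted mean of the probabilities of all prediction boxes overlapping target.

    Returns None when no prediction box overlaps the target patch.
    """
    probs = [prob for box, prob in predictions if target.overlaps(box)]
    return mean(probs) if probs else None


# Toy example: one 10x patch and three model patches, one of which lies outside it.
patch = Box(0, 0, 1024, 1024)
model_patches = [
    (Box(0, 0, 700, 700), 0.75),
    (Box(700, 0, 700, 700), 0.25),
    (Box(1400, 0, 700, 700), 0.1),  # starts past the right edge, so it is ignored
]
print(average_overlapping(patch, model_patches))  # 0.5
```

An area-weighted mean would be a reasonable alternative when the model patches are much larger than the target patch.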

Download the WSIs

We used the DSMIL repository to extract 256 x 256 patches at 10x magnification, resulting in 3.2 million patches for TCGA-BRCA. The following steps are borrowed from the DSMIL repository.
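When matching these patches against outputs computed at another magnification, it helps to know how much of the full-resolution slide a 10x patch covers. The helper below is a small sketch of that arithmetic; `level0_footprint` is a hypothetical name, and the scan magnifications (40x and 20x) are assumed examples, since the actual value depends on each slide.

```python
def level0_footprint(patch_px: int, patch_mag: float, scan_mag: float) -> int:
    """Side length, in full-resolution pixels, covered by a square patch.

    patch_px:  patch side length in pixels at the extraction magnification
    patch_mag: magnification the patch was extracted at (e.g. 10)
    scan_mag:  native magnification of the slide (e.g. 40), an assumed input
    """
    if patch_mag <= 0 or scan_mag < patch_mag:
        raise ValueError("expected 0 < patch_mag <= scan_mag")
    return round(patch_px * scan_mag / patch_mag)


print(level0_footprint(256, 10, 40))  # 1024
print(level0_footprint(256, 10, 20))  # 512
```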

From the GDC data portal: you can use the GDC data portal with a manifest file and a configuration file. The raw WSIs take about 1 TB of disk space and may take several days to download. Please check the details regarding the use of the TCGA data portal. Alternatively, individual WSIs can be downloaded manually from the GDC data portal repository.

Prepare the patches

Once you clone the DSMIL repository, you can use the following command to extract patches from the WSIs.

$ python deepzoom_tiler.py -m 0 -b 10

Pretrained models

We provide the following trained models:

Conditioning network	Conditioning type	Modality	FID	Link
Class embedder	Tumor + TIL	Class label (4 classes)	29.45	link
OpenAI CLIP	Report + tumor + TIL	Text caption (154 tokens)	10.64	link
PLIP	Report + tumor + TIL	Text caption (154 tokens)	7.64	link
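The table lists a 4-class label for the class-conditioned model but does not say how the classes are defined. One plausible reading, shown here purely as an assumption, is that each of the tumor and TIL probabilities is binarized and the two bits are combined. The function name `tumor_til_class`, the 0.5 threshold, and the class ordering are all hypothetical; consult the Dataset class in the repository for the actual mapping.

```python
def tumor_til_class(tumor_prob: float, til_prob: float, threshold: float = 0.5) -> int:
    """Map two probabilities to one of 4 class indices (assumed scheme).

    0: low tumor, low TIL    1: low tumor, high TIL
    2: high tumor, low TIL   3: high tumor, high TIL
    """
    tumor_high = tumor_prob >= threshold
    til_high = til_prob >= threshold
    return 2 * int(tumor_high) + int(til_high)


print(tumor_til_class(0.9, 0.2))  # 2
```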

Training

To train a diffusion model, create a config file similar to this and create or update the corresponding dataloader (e.g., this). To download the frozen VAEs, follow the instructions in the original LDM repo.

Example training command:

python main.py -t --gpus 0,1 --base configs/latent-diffusion/text_cond/plip_imagenet_finetune.yaml

Sampling

This notebook shows how to sample from the text-conditioned diffusion model.

BibTeX

@InProceedings{Yellapragada_2024_WACV,
    author    = {Yellapragada, Srikar and Graikos, Alexandros and Prasanna, Prateek and Kurc, Tahsin and Saltz, Joel and Samaras, Dimitris},
    title     = {PathLDM: Text Conditioned Latent Diffusion Model for Histopathology},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {5182-5191}
}
