
Self-Supervised Vision Transformers for multi-channel single-cell images

Application of DINO to automated microscopy-derived fluorescent imaging datasets of single cells, with instructions on how to run subsequent downstream analyses with Vision Transformers (ViTs) trained on non-RGB multi-channel images. See Emerging Properties in Self-Supervised Vision Transformers for the original DINO implementation and Self-supervised vision transformers accurately decode cellular state heterogeneity for the adaptation described here. [DINO arXiv] [scDINO bioRxiv]

DINO illustration

Check out our recent publication, Cellular Architecture Shapes the Naïve T Cell Response, in Science. We used scDINO to identify distinct T cell phenotypes by examining over 30,000 single-cell crops of CD4 and CD8 T cells from healthy donors. We trained ViT-S/16 models exclusively on CD3 single-channel images; downstream analysis of the phenotypic heterogeneity was performed by clustering the CLS-Token latent space and visualising it with the TopOMetry framework [Science].

Science Sfig1A

Further demonstration of the usefulness of the DINO framework for image-based biological discovery is presented in the preprint, Unbiased single-cell morphology with self-supervised vision transformers. This work demonstrates that self-supervised vision transformers can encode cellular morphology at various scales, from subcellular to multicellular [bioRxiv].

This codebase provides:

  • Workflow to run analyses of multi-channel image datasets (non-RGB) with publicly available self-supervised Vision Transformers (DINO-ss-ViTs) from [DINO arXiv] and with scDINO (scDINO-ss-ViTs) introduced in our paper [scDINO bioRxiv]
  • Workflow to train ViTs on multi-channel single-cell images generated by automated microscopy using scDINO and subsequently run downstream analyses

Pretrained models

Publicly available ss-ViTs pretrained on ImageNet with DINO

This table is adapted from the official DINO repository. You can choose to download the weights of the pretrained backbone used for downstream tasks, or the full checkpoint containing backbone and projection head weights for both student and teacher networks. Detailed arguments and training/evaluation logs are provided. Note that DeiT-S and ViT-S names refer exactly to the same architecture.

arch             | download
DINO-ss-ViT-S/16 | backbone only · full ckpt · args · logs
DINO-ss-ViT-S/8  | backbone only · full ckpt · args · logs
DINO-ss-ViT-B/16 | backbone only · full ckpt · args · logs
DINO-ss-ViT-B/8  | backbone only · full ckpt · args · logs
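
If you only need the pretrained backbone weights, the official DINO models can also be loaded directly through torch.hub instead of downloading the checkpoints manually. The following is a minimal sketch of that route (it assumes internet access and uses the facebookresearch/dino hub entry points; it is not part of this repository):

import torch

# Download and load the public DINO backbones (backbone weights only).
dino_vits16 = torch.hub.load("facebookresearch/dino:main", "dino_vits16")  # ViT-S/16
dino_vitb8 = torch.hub.load("facebookresearch/dino:main", "dino_vitb8")    # ViT-B/8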

scDINO ss-ViTs pretrained on high-content imaging data of single immune cells

Here you can download the pretrained single-cell DINO (scDINO) ss-ViTs used in our article [scDINO bioRxiv]. The ViTs are pretrained on the Deep phenotyping PBMC Image Set by Y. Severin, a high-content imaging dataset containing labeled single-cell images of 8 different immune cell classes from multiple healthy donors. We provide the scDINO-ss-ViT-S/16 full checkpoint trained for 100 epochs.

arch               | download
scDINO-ss-ViT-S/16 | full ckpt · args · logs
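
A full checkpoint contains the weights of both the student and the teacher network. The sketch below shows one way to pull the teacher backbone out of such a checkpoint; the 'student'/'teacher' keys and the prefix stripping follow the DINO checkpoint convention, and the filename is only a placeholder:

import torch

# Placeholder filename; point this at the downloaded scDINO full checkpoint.
ckpt = torch.load("scDINO_ViT-S16_full_ckpt.pth", map_location="cpu")
teacher_state = ckpt["teacher"]
# Strip the DistributedDataParallel and projection-head prefixes used during training.
teacher_state = {k.replace("module.", "").replace("backbone.", ""): v
                 for k, v in teacher_state.items()}
# teacher_state can now be loaded into a plain ViT-S/16 with load_state_dict().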

Requirements

This codebase was developed on a Linux machine with Python 3.8, snakemake 7.20.0, torch 1.8.1 and torchvision 0.9.1, and an HPC cluster running the Slurm workload manager. All required Python packages and their corresponding versions for this setup are listed in the requirements.txt file.

Analyse non-RGB multi-channel images with pretrained ViTs

In Figure 1 of our manuscript [scDINO bioRxiv] we show how DINO-ss-ViTs can be applied to decipher stem cell heterogeneity using single-cell images derived from high-content imaging. These single-cell images are not RGB-based, but composed of several separate microscopy-derived greyscale images that were combined into one multi-channel TIFF image. To use these multi-channel input images with ViTs, we load the values of a TIFF input file as a multidimensional PyTorch tensor in the Multichannel_dataset(datasets.ImageFolder) class in compute_CLS_features.py, which is used to construct the PyTorch dataset object.
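
As a rough illustration of this loading step (a minimal sketch, not the repository's exact implementation), a torchvision ImageFolder can be given a custom loader that reads each TIFF with scikit-image and returns a channels-first float tensor:

import torch
from skimage import io
from torchvision import datasets

def tiff_loader(path):
    # Read the TIFF stack as float32 (microscopy TIFFs are often uint16).
    img = io.imread(path).astype("float32")
    tensor = torch.from_numpy(img)
    if tensor.ndim == 2:                      # single channel -> add a channel axis
        tensor = tensor.unsqueeze(0)
    elif tensor.shape[-1] < tensor.shape[0]:  # channels-last -> channels-first
        tensor = tensor.permute(2, 0, 1)
    return tensor

class MultichannelDataset(datasets.ImageFolder):
    """ImageFolder that yields (C, H, W) float tensors instead of PIL images."""
    def __init__(self, root, transform=None):
        super().__init__(root, transform=transform, loader=tiff_loader)

# usage: dataset = MultichannelDataset("path/to/single_cell_crops")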

analysis plots

Run all 3 analyses at once

To send a job to the Slurm cluster that computes the CLS Token representations, visualises their embeddings with UMAP and generates example attention images all at once, use the only_downstream_snakefile Snakemake file and the configs/only_downstream_analyses.yaml configuration file.

Example submission:

snakemake -s only_downstream_snakefile all \
  --configfile="configs/only_downstream_analyses.yaml" \
  --keep-incomplete \
  --drop-metadata \
  --keep-going \
  --cores 8 \
  --jobs 40 \
  --cluster "sbatch --time=01:00:00 \
    --gpus=1 \
    -n 8 \
    --mem-per-cpu=9000 \
    --output=slurm_output_evaluate.txt \
    --error=slurm_error_evaluate.txt" \
  --latency-wait 45

All configurations and parameters (metadata and hyperparameters) of the job can be set in the configs/only_downstream_analyses.yaml file. The results will be saved in the output_dir folder. Instead of running all 3 analyses at once, you can also run them separately by specifying the corresponding target rule in the snakemake command.

Compute [CLS] Token representations

The representation of an image is given by the output of the [CLS] Token in the form of a numeric vector with dimensionality d = 384 for ViT-S and d = 768 for ViT-B. To compute a [CLS] feature space for a given dataset, prepare the configuration variables in the downstream_analyses: subsection of configs/only_downstream_analyses.yaml.

To learn more about the args in the configuration file for the computation of the features, run:

python pyscripts/compute_CLS_features.py --help
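
For orientation, the sketch below shows what the feature computation boils down to, using a public DINO ViT-S/16 from torch.hub and a stand-in batch of tensors; note that the repository additionally handles arbitrary channel counts, which this sketch omits by using a 3-channel stand-in:

import torch

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

images = torch.randn(4, 3, 224, 224)   # stand-in for a batch of preprocessed images
with torch.no_grad():
    cls_features = model(images)        # the forward pass returns the [CLS] token
print(cls_features.shape)               # torch.Size([4, 384]) for ViT-S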

Visualise CLS token space using UMAP

To get a glimpse of the feature space, we can use the UMAP algorithm to project multidimensional vectors into a 2D embedding. The UMAP parameters can be adjusted in the downstream_analyses: umap_eval: subsection of the config file.
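
A minimal sketch of this projection step, assuming the CLS features are available as a NumPy array and that the umap-learn and matplotlib packages are installed (the parameter values are illustrative, not the pipeline's defaults):

import numpy as np
import umap
import matplotlib.pyplot as plt

# Stand-in data; replace with the real (n_cells, 384) CLS feature matrix and labels.
cls_features = np.random.rand(1000, 384)
labels = np.random.randint(0, 8, size=1000)

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0)
embedding = reducer.fit_transform(cls_features)    # (n_cells, 2)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=2, cmap="tab10")
plt.xlabel("UMAP 1"); plt.ylabel("UMAP 2")
plt.savefig("cls_umap.png", dpi=300)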

Run k-NN evaluation

To quantitatively evaluate label-specific clustering, we can run a k-NN evaluation to get a global clustering score across classes. The kNN parameters can be adjusted in the configuration file in the downstream_analyses: kNN: subsection.
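
As an illustration of what such a score measures, here is a minimal sketch using a scikit-learn k-NN classifier on the CLS feature space (the repository uses its own kNN script; the split and k value below are arbitrary assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; replace with the real CLS features and class labels.
X = np.random.rand(1000, 384)
y = np.random.randint(0, 8, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=20, weights="distance")
knn.fit(X_train, y_train)
print(f"k-NN accuracy: {knn.score(X_test, y_test):.3f}")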

Visualisation of the CLS Token-based Self-Attention Mechanism of ss-ViTs

DINO illustration

To visualise the CLS token-based self-attention of the ss-ViTs, attention maps can be generated for each image class. By default, 1 image per image class is randomly picked from the given dataset. The attention maps are saved in the attention_maps subfolder of the output_dir in the results folder, with each attention head saved as a separate image. Additionally, each channel of the original multi-channel input image is saved as a separate image.
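
For reference, the sketch below shows how the CLS token's last-layer self-attention can be extracted from a DINO ViT loaded via torch.hub and reshaped into one spatial map per head; it assumes a 3-channel 224x224 input with patch size 16 and is not the repository's plotting code:

import torch

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

img = torch.randn(1, 3, 224, 224)              # stand-in for a preprocessed image
with torch.no_grad():
    attn = model.get_last_selfattention(img)   # shape (1, heads, tokens, tokens)

n_heads = attn.shape[1]
w = h = 224 // 16                              # 14 x 14 patch grid
# Attention of the CLS token (query 0) to all patch tokens (keys 1:), per head.
cls_attention_maps = attn[0, :, 0, 1:].reshape(n_heads, h, w)
print(cls_attention_maps.shape)                # torch.Size([6, 14, 14]) for ViT-S/16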

scDINO training and evaluation on greyscale multi-channel images

immunocell_plot

To train your own Vision Transformers on a given dataset from scratch and subsequently evaluate them on downstream tasks (with an automatic train/test split), use the full_pipeline_snakefile and the configs/scDINO_full_pipeline.yaml configuration file.

Example submission:

snakemake -s full_pipeline_snakefile all \
  --configfile="configs/scDINO_full_pipeline.yaml" \
  --keep-incomplete \
  --drop-metadata \
  --cores 8 \
  --jobs 40 \
  --keep-going \
  --cluster "sbatch --time=04:00:00 \
    --gpus=2 \
    -n 8 \
    --mem-per-cpu=9000 \
    --output=slurm_output.txt \
    --error=slurm_error.txt" \
  --latency-wait 45

To reproduce the scDINO-ss-ViT-S/16 used in our manuscript, download the Deep phenotyping PBMC Image Set by Y. Severin and set the path to the dataset in the config file under dataset_dir.

License

This repository is released under the Apache 2.0 license. More information can be found in the LICENSE file.

Citation

If you find this adaptation useful for your research, please consider citing us:

@article {Pfaendler2023.01.16.524226,
	author = {Pfaendler, Ramon and Hanimann, Jacob and Lee, Sohyon and Snijder, Berend},
	title = {Self-supervised vision transformers accurately decode cellular state heterogeneity},
	year = {2023},
	doi = {10.1101/2023.01.16.524226},
	URL = {https://www.biorxiv.org/content/early/2023/01/18/2023.01.16.524226},
	eprint = {https://www.biorxiv.org/content/early/2023/01/18/2023.01.16.524226.full.pdf},
	journal = {bioRxiv}
}