An Instance Segmentation Dataset of Yeast Cells in Microstructures


Christoph Reich, Tim Prangemeier, André O. Françani & Heinz Koeppl

This repository provides the official and maintained PyTorch validation, data loading, and visualization code for the Yeast in Microstructures dataset proposed in An Instance Segmentation Dataset of Yeast Cells in Microstructures.

The dataset can be downloaded from TUdatalib by running:

wget https://tudatalib.ulb.tu-darmstadt.de/bitstream/handle/tudatalib/3799/yeast_cell_in_microstructures_dataset.zip

Update: We have released a high-resolution dataset of our microscopy images with panoptic annotations at ICCVW 2023. Check out our TYC dataset project page!

Abstract

Extracting single-cell information from microscopy data requires accurate instance-wise segmentations. Obtaining pixel-wise segmentations from microscopy imagery remains a challenging task, especially with the added complexity of microstructured environments. This paper presents a novel dataset for segmenting yeast cells in microstructures. We offer pixel-wise instance segmentation labels for both cells and trap microstructures. In total, we release 493 densely annotated microscopy images. To facilitate a unified comparison between novel segmentation algorithms, we propose a standardized evaluation strategy for our dataset. The aim of the dataset and evaluation strategy is to facilitate the development of new cell segmentation approaches.

If you use our dataset or find this research useful in your work, please cite our paper:

@inproceedings{Reich2023,
        title={{An Instance Segmentation Dataset of Yeast Cells in Microstructures}},
        author={Reich, Christoph and Prangemeier, Tim and Fran{\c{c}}ani, Andr{\'e} O and Koeppl, Heinz},
        booktitle={{International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)}},
        year={2023}
}

Table of Contents

  1. Installation
  2. Data Format
  3. Dataset Class
  4. Evaluation
  5. Visualization
  6. Additional Unlabeled Data
  7. Acknowledgements

Installation

The validation, data loading, and visualization code can be installed as a Python package by running:

pip install git+https://github.com/ChristophReich1996/Yeast-in-Microstructures-Dataset.git

All dependencies are listed in requirements.txt.

Data Format

The dataset is split into a training, validation, and test set. Please refer to the paper for more information on the split.

├── test
│     ├── bounding_boxes
│     ├── classes
│     ├── inputs
│     └── instances
├── train
│     ├── bounding_boxes
│     ├── classes
│     ├── inputs
│     └── instances
└── val
      ├── bounding_boxes
      ├── classes
      ├── inputs
      └── instances

Every subset (train, val, and test) includes four folders (inputs, instances, classes, and bounding_boxes). The inputs folder contains the input images, each of shape [128, 128]. The instances folder holds the instance maps of shape [N, 128, 128], where N is the number of instances. The classes folder holds the semantic class of each instance as a tensor of shape [N]. The bounding_boxes folder offers axis-aligned bounding boxes for each instance, with shape [N, 4 (x0y0x1y1)]. Every sample of the dataset has a .pt file in each of the four folders; a .pt file can be loaded directly as a PyTorch tensor with torch.load(...). For details on the data loading, please have a look at the dataset class implementation.
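
For a quick sanity check, a single sample can also be loaded by hand. A minimal sketch (the file name below is hypothetical; refer to the downloaded dataset for the actual naming scheme):

import torch

# Load the raw tensors of one (hypothetical) sample; the YIMDataset class
# presented below handles this for you.
instances = torch.load("train/instances/0.pt")  # [N, 128, 128] instance maps
classes = torch.load("train/classes/0.pt")  # [N] semantic classes
bounding_boxes = torch.load("train/bounding_boxes/0.pt")  # [N, 4 (x0y0x1y1)]
print(instances.shape, classes.shape, bounding_boxes.shape)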

Dataset Class

This repo includes a PyTorch dataset class implementation of the Yeast in Microstructures dataset, located in the yim_dataset.data module. The dataset class loads the dataset and returns the images, instance maps, bounding boxes, and semantic classes.

import yim_dataset
from torch import Tensor
from torch.utils.data import Dataset

# Init dataset
dataset: Dataset = yim_dataset.data.YIMDataset(path="/some_path_to_data/train", return_absolute_bounding_box=False)
# Get first sample of the dataset
image, instances, bounding_boxes, class_labels = dataset[0]  # type: Tensor, Tensor, Tensor, Tensor
# Show shapes
print(image.shape)  # [1, 128, 128]
print(instances.shape)  # [N, 128, 128]
print(bounding_boxes.shape)  # [N, 4 (xcycwh, relative format)]
print(class_labels.shape)  # [N, C=2 (trap=0 and cell=1)]

The dataset class implementation also offers support for custom Kornia data augmentations. You can pass an AugmentationSequential object to the dataset class. The following example utilizes random horizontal and vertical flipping as well as random Gaussian blur augmentations.

import kornia.augmentation
import yim_dataset
from torch.utils.data import Dataset

# Init augmentations
augmentations = kornia.augmentation.AugmentationSequential(
    kornia.augmentation.RandomHorizontalFlip(p=0.5),
    kornia.augmentation.RandomVerticalFlip(p=0.5),
    kornia.augmentation.RandomGaussianBlur(kernel_size=(31, 31), sigma=(9, 9), p=0.5),
    data_keys=["input", "bbox_xyxy", "mask"],
    same_on_batch=False,
)
# Init dataset
dataset: Dataset = yim_dataset.data.YIMDataset(path="/some_path_to_data/train", augmentations=augmentations)

Note that it is necessary to pass ["input", "bbox_xyxy", "mask"] as data keys! If a different data key configuration is given, a runtime error is raised.

For wrapping the dataset with the PyTorch DataLoader, please use the custom collate function.

from typing import List

import yim_dataset
from torch import Tensor
from torch.utils.data import Dataset, DataLoader

# Init dataset
dataset: Dataset = yim_dataset.data.YIMDataset(path="/some_path_to_data/train", return_absolute_bounding_box=False)
data_loader = DataLoader(
    dataset=dataset,
    num_workers=2,
    batch_size=2,
    drop_last=True,
    collate_fn=yim_dataset.data.collate_function_yim_dataset,
)
# Get a sample from dataloader
images, instances, bounding_boxes, class_labels = next(
    iter(data_loader))  # type: Tensor, List[Tensor], List[Tensor], List[Tensor]
# Show shapes
print(images.shape)  # [B, 1, 128, 128]
print(instances[0].shape)  # each list entry has shape [N, 128, 128]
print(bounding_boxes[0].shape)  # each list entry has shape [N, 4 (xcycwh, relative format)]
print(class_labels[0].shape)  # each list entry has shape [N, C=2 (trap=0 and cell=1)]

All Dataset Class Parameters

YIMDataset parameters:

| Parameter | Default value | Info |
| --- | --- | --- |
| path: str | - | Path to the dataset as a string. |
| augmentations: Optional[AugmentationSequential] | None | Augmentations to be used. If None, no augmentation is employed. |
| normalize: bool | True | If true, images are normalized by the given normalization function. |
| normalization_function: Callable[[Tensor], Tensor] | normalize (zero mean, unit std.) | Normalization function. |
| return_absolute_bounding_box: bool | False | If true, bounding boxes are returned in absolute format (else relative). |
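
The normalization_function parameter accepts any Callable[[Tensor], Tensor]. A minimal sketch of passing a custom function (min-max scaling is just an illustrative choice, not the package's default, which normalizes to zero mean and unit standard deviation):

import yim_dataset
from torch import Tensor
from torch.utils.data import Dataset

# Illustrative custom normalization: min-max scaling to [0, 1]
def min_max_normalize(image: Tensor) -> Tensor:
    return (image - image.min()) / (image.max() - image.min() + 1e-8)

# Init dataset with the custom normalization function
dataset: Dataset = yim_dataset.data.YIMDataset(
    path="/some_path_to_data/train",
    normalize=True,
    normalization_function=min_max_normalize,
)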

We provide a full dataset and data loader example in example_eval.py.

If this dataset class implementation is not sufficient for your application, please customize the existing code or open a pull request extending the existing implementation.

Evaluation

We propose validating segmentation predictions on our dataset using the Panoptic Quality and the cell class IoU. We implement both metrics as TorchMetrics metrics in the yim_dataset.eval module. Both metrics (PanopticQuality and CellIoU) can be used like any other TorchMetrics metric. The input to both metrics is the prediction, composed of the instance maps (list of tensors) and the semantic class predictions (list of tensors); the label is likewise composed of instance maps and semantic classes. Note that the instance maps are not allowed to overlap. Additionally, both metrics assume thresholded instance maps and hard semantic classes (no logits).
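
For reference, the Panoptic Quality here is assumed to follow the standard definition of Kirillov et al., where predicted and ground-truth segments count as matched (true positives) above an IoU of 0.5:

PQ = ( Σ_{(p, g) ∈ TP} IoU(p, g) ) / ( |TP| + ½ |FP| + ½ |FN| )

with p and g denoting predicted and ground-truth segments, respectively.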

import yim_dataset
from torchmetrics import Metric

pq: Metric = yim_dataset.eval.PanopticQuality()
cell_iou: Metric = yim_dataset.eval.CellIoU()

for images, instances, bounding_boxes, class_labels in data_loader:
    # Make prediction
    instances_pred, bounding_boxes_pred, class_labels_pred = model(
        images)  # type: List[Tensor], List[Tensor], List[Tensor]
    # Get semantic classes from one-hot vectors
    class_labels = [c.argmax(dim=-1) for c in class_labels]
    class_labels_pred = [c.argmax(dim=-1) for c in class_labels_pred]
    # Compute metrics
    pq.update(
        instances_pred=instances_pred,
        classes_pred=class_labels_pred,
        instances_target=instances,
        classes_target=class_labels,
    )
    cell_iou.update(
        instances_pred=instances_pred,
        classes_pred=class_labels_pred,
        instances_target=instances,
        classes_target=class_labels,
    )
# Compute final metric
print(f"Panoptic Quality: {pq.compute().item()}")
print(f"Cell class IoU: {cell_iou.compute().item()}")

A full working example is provided in example_eval.py.

Visualization

This implementation (yim_dataset.vis module) also includes various functions for reproducing the plots from the paper. The instance segmentation overlay (image + instance maps + BB + classes), as shown at the top, can be achieved by:

import yim_dataset
from torch import Tensor
from torch.utils.data import Dataset

# Init dataset
dataset: Dataset = yim_dataset.data.YIMDataset(path="/some_path_to_data/train", return_absolute_bounding_box=False)
# Get first sample of the dataset
image, instances, bounding_boxes, class_labels = dataset[0]  # type: Tensor, Tensor, Tensor, Tensor
# Plot 
yim_dataset.vis.plot_image_instances_bb_classes(
    image=image,
    instances=instances,
    bounding_boxes=yim_dataset.data.bounding_box_xcycwh_to_x0y0x1y1(bounding_boxes),
    class_labels=class_labels.argmax(dim=1),
    save=False,
    show=True,
    show_class_label=True,
)

All plot functions take the parameters show: bool and save: bool. If show=True, the plot is directly visualized by calling plt.show(). If you want to save the plot to a file, set save=True and provide the path and file name (file_path: str).
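
For example, saving the overlay from above to disk instead of showing it (the output file name is arbitrary) could look like this:

yim_dataset.vis.plot_image_instances_bb_classes(
    image=image,
    instances=instances,
    bounding_boxes=yim_dataset.data.bounding_box_xcycwh_to_x0y0x1y1(bounding_boxes),
    class_labels=class_labels.argmax(dim=1),
    save=True,
    show=False,
    file_path="instance_overlay.png",  # arbitrary output file name
    show_class_label=True,
)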

An example use of all visualization functions is provided in example_vis.py.

Additional Unlabeled Data

Note that additional unlabeled data from the same domain is also available. In the paper Multi-StyleGAN: Towards Image-Based Simulation of Time-Lapse Live-Cell Microscopy, we proposed an unlabeled dataset of ~9k images (sequences) of yeast cells in microstructures. The dataset is available at TUdatalib. Please cite the following paper if you use the unlabeled images in your research:

@inproceedings{Reich2021,
        title={{Multi-StyleGAN: Towards Image-Based Simulation of Time-Lapse Live-Cell Microscopy}},
        author={Reich, Christoph and Prangemeier, Tim and Wildner, Christian and Koeppl, Heinz},
        booktitle={{International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)}},
        year={2021},
        organization={Springer}
}

Acknowledgements

We thank Christoph Hoog Antink for insightful discussions, Klaus-Dieter Voss for aid with the microfluidics fabrication, Jan Basrawi for contributing to data labeling, and Robert Sauerborn for aid with setting up the project page.

Credit to TorchMetrics (Lightning AI), Kornia, and PyTorch for providing the basis of this implementation.

This work was supported by the Landesoffensive für wissenschaftliche Exzellenz as part of the LOEWE Schwerpunkt CompuGene. H.K. acknowledges support from the European Research Council (ERC) within the consolidator grant CONSYN (nr. 773196). C.R. acknowledges the support of NEC Laboratories America, Inc.