## Nucleus Segmentation with a pre-trained network

As the first step, we segment the nuclei in the input images. The nucleus segmentation will later provide the seeds for a seeded watershed that segments the full cells (discussed in more detail in a later lesson). Here, we use a model from [bioimage.io](https://bioimage.io/#/) for nucleus segmentation. This model was trained on data from the [DSB Nucleus Segmentation](https://www.kaggle.com/c/data-science-bowl-2018) challenge, which contains images quite similar to the nucleus channel here. Hence, it works well without any changes to the model.

The goal of this session is to learn how to apply a pre-trained model from [bioimage.io](https://bioimage.io/#/) using the [bioimageio.core](https://github.com/bioimage-io/core-bioimage-io-python) Python library.

Note: there are several other deep-learning-based tools for nucleus segmentation. In particular, [stardist](https://github.com/stardist/stardist) is a versatile and robust choice. The notebook `stardist_pretrained-nucleus-segmentation` demonstrates how to use it instead of the model here (work in progress!).

In [None]:
# import general purpose libraries
import os

import h5py
import napari
import numpy as np

In [None]:
# define the paths to folders with data and output.
# if you store the data somewhere else just change the 'data_folder' variable.
data_folder = "../data"
output_folder = os.path.join(data_folder, "predictions")
os.makedirs(output_folder, exist_ok=True)

### 1. Load the nucleus segmentation model

We use this model from bioimage.io: https://bioimage.io/#/?tags=nuclei&id=10.5281%2Fzenodo.5764892.
It is a U-Net that was trained to predict foreground and boundaries in microscopy images with nucleus staining.

In [None]:
# import libraries for bioimageio
import bioimageio.core
from xarray import DataArray

In [None]:
# affable-shark is the nickname of the model we want to use.
# load_resource_description downloads this model and loads it into memory
# in the representation of the bioimageio.core library
model_name = "affable-shark"
model = bioimageio.core.load_resource_description(model_name)

### 2. Check the model

Next, we run prediction on one image with this model and visualize the result with napari.

In [None]:
# we load the channel with nucleus staining for one of the test images (from hdf5)
image_path = os.path.join(data_folder, "test/gt_image_040.h5")
with h5py.File(image_path, "r") as f:
    image = f["raw/nuclei/s0"][:]

In [None]:
# run prediction with the bioimageio.core library using the prediction pipeline class,
# which applies the pre- and post-processing defined in the bioimageio model specification
# as well as the deep learning model (here: U-Net) itself
with bioimageio.core.create_prediction_pipeline(model) as pp:
    input_ = DataArray(image[None, None], dims=tuple("bcyx"))
    prediction = pp(input_)[0].squeeze().values

In [None]:
# check the prediction in napari
viewer = napari.Viewer()
viewer.add_image(image)
viewer.add_image(prediction)

### 3. Implement post-processing to get the nucleus instance segmentation

As you have seen, the model predicts foreground and nucleus boundaries. However, we want to segment individual nuclei, i.e. produce an "image" in which each nucleus is marked by a unique id. So we post-process the network predictions to get the instance segmentation:
- compute the distance map to the boundary predictions
- set it to zero outside of the predicted foreground
- find the distance maxima
- run seeded watershed from these maxima using the boundary predictions as height map

We choose this approach in order to separate touching nuclei with weak boundary evidence.
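
To build some intuition for why distance maxima make good seeds, here is a minimal sketch on synthetic data. It is not the notebook's pipeline: there is no network prediction, so two overlapping disks stand in for the predicted foreground, the distance to the background stands in for the distance to predicted boundaries, and the negated distance map is used as the height map. All `toy_*` names and the disk geometry are made up for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# synthetic "foreground": two overlapping disks that form a single connected blob
yy, xx = np.mgrid[:64, :96]
disk_a = (yy - 32) ** 2 + (xx - 32) ** 2 <= 18 ** 2
disk_b = (yy - 32) ** 2 + (xx - 62) ** 2 <= 18 ** 2
toy_foreground = disk_a | disk_b

# distance of each foreground pixel to the nearest background pixel
# (assumption: stand-in for the distance to the predicted boundaries)
toy_distances = distance_transform_edt(toy_foreground)

# one seed per local maximum of the distance map
toy_seed_points = peak_local_max(toy_distances, min_distance=5, exclude_border=False)
toy_seeds = np.zeros(toy_foreground.shape, dtype="uint32")
toy_seeds[toy_seed_points[:, 0], toy_seed_points[:, 1]] = np.arange(1, len(toy_seed_points) + 1)

# seeded watershed restricted to the foreground; the negated distance map
# acts as height map, so each seed sits at the bottom of its own basin
toy_segmentation = watershed(-toy_distances, markers=toy_seeds, mask=toy_foreground)

print("number of seeds:", len(toy_seed_points))
print("label at left center:", toy_segmentation[32, 32])
print("label at right center:", toy_segmentation[32, 62])
```

The blob is a single connected component, so connected components alone could not separate the two disks. The distance map has one maximum near each disk center, and the watershed splits the blob along the narrow waist between them.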

In [None]:
# the functions from scipy and skimage we need to implement the instance segmentation procedure
from scipy.ndimage import distance_transform_edt
from skimage.feature import peak_local_max
from skimage.filters import gaussian
from skimage.segmentation import watershed

In [None]:
# compute the distances to nearest boundaries inside the predicted foreground
foreground, boundaries = prediction
foreground = foreground > 0.5
boundary_distances = distance_transform_edt(boundaries < 0.1)
boundary_distances[~foreground] = 0
boundary_distances = gaussian(boundary_distances)

In [None]:
# find the seeds (= maxima of the distance map) and run seeded watershed
seed_points = peak_local_max(boundary_distances, min_distance=5, exclude_border=False)
seeds = np.zeros(foreground.shape, dtype="uint32")
seeds[seed_points[:, 0], seed_points[:, 1]] = np.arange(1, len(seed_points) + 1)
nucleus_segmentation = watershed(boundaries, markers=seeds, mask=foreground)

In [None]:
# check the segmentation result and visualize the intermediates
viewer = napari.Viewer()
viewer.add_image(image)
viewer.add_image(boundaries)
viewer.add_image(boundary_distances)
viewer.add_points(seed_points)
viewer.add_labels(nucleus_segmentation)

### 4. Apply to all test data

After checking the prediction and segmentation procedure on one image, we apply it to all test images.

In [None]:
from glob import glob
from tqdm import tqdm

In [None]:
# get all the test files
input_files = glob(os.path.join(data_folder, "test", "*.h5"))

In [None]:
# check what shapes we have in the images (each file is closed again after reading)
shapes = []
for path in input_files:
    with h5py.File(path, "r") as f:
        shapes.append(f["raw/nuclei/s0"].shape)
print("Image shapes:", np.unique(shapes, axis=0))

In [None]:
# check the shape that is expected by the model
print(model.inputs[0].shape)

When you run the code above, you will see that we have two different image shapes: `(930, 1024)` and `(1024, 1024)`. You will also see that the model has the following input shape description: `ParametrizedInputShape(min=[1, 1, 64, 64], step=[0, 0, 16, 16])`. This means that the spatial shape must be at least `(64, 64)` and may only grow from there in steps of `16` along each spatial axis; since `64` is itself a multiple of `16`, every valid height and width is divisible by `16`. The image height `930` is not divisible by `16` (`930 = 58 * 16 + 2`), so it is not a valid input to the model. That's why we use the function `predict_with_padding` below, which automatically pads the input image to a valid input shape, runs prediction and then crops the prediction results back to the original shape.
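
The shape arithmetic can be checked by hand. The sketch below computes the smallest valid shape for a `(930, 1024)` image and shows a pad-then-crop round trip with plain numpy. The helper `next_valid_size` and the `toy_*` names are hypothetical, and the padding scheme (reflect padding at the end of each axis) is an assumption for illustration; `predict_with_padding` may pad differently internally.

```python
import numpy as np


def next_valid_size(size, minimum=64, step=16):
    """Smallest value of the form minimum + k * step (k >= 0) that is >= size."""
    if size <= minimum:
        return minimum
    n_steps = -(-(size - minimum) // step)  # ceiling division
    return minimum + n_steps * step


toy_image = np.zeros((930, 1024), dtype="float32")

# smallest shape that satisfies min=64, step=16 along both spatial axes
toy_padded_shape = tuple(next_valid_size(s) for s in toy_image.shape)
print(toy_padded_shape)  # (944, 1024)

# pad at the end of each axis (assumed scheme, see the note above)
toy_pad_width = [(0, target - s) for s, target in zip(toy_image.shape, toy_padded_shape)]
toy_padded = np.pad(toy_image, toy_pad_width, mode="reflect")

# ... the model would be applied to toy_padded here ...

# crop the result back to the original shape
toy_cropped = toy_padded[:toy_image.shape[0], :toy_image.shape[1]]
print(toy_cropped.shape)  # (930, 1024)
```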

In [None]:
# a function that wraps the prediction and segmentation functionality we have tested above
def run_segmentation(pp, image, min_distance=5, sigma=1.0):
    input_ = DataArray(image[None, None], dims=tuple("bcyx"))
    # prediction with padding to deal with images with incompatible input shape
    prediction = bioimageio.core.predict_with_padding(pp, input_, padding={"x": 16, "y": 16})[0].squeeze().values
    foreground, boundaries = prediction
    foreground = foreground > 0.5
    boundary_distances = distance_transform_edt(boundaries < 0.1)
    boundary_distances[~foreground] = 0
    boundary_distances = gaussian(boundary_distances, sigma)
    seed_points = peak_local_max(boundary_distances, min_distance=min_distance, exclude_border=False)
    seeds = np.zeros(foreground.shape, dtype="uint32")
    seeds[seed_points[:, 0], seed_points[:, 1]] = np.arange(1, len(seed_points) + 1)
    nucleus_segmentation = watershed(boundaries, markers=seeds, mask=foreground)
    return nucleus_segmentation

In [None]:
# run segmentation for all test images
with bioimageio.core.create_prediction_pipeline(model) as pp:
    for path in tqdm(input_files, desc="Run nucleus segmentation"):
        with h5py.File(path, "r") as f:
            image = f["raw/nuclei/s0"][:]
        segmentation = run_segmentation(pp, image)
        output_path = os.path.join(output_folder, os.path.basename(path))
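        with h5py.File(output_path, "a") as f:
            key = "segmentations/nuclei/watershed_based"
            # delete the result of a previous run so that this cell can be re-executed
            if key in f:
                del f[key]
            f.create_dataset(key, data=segmentation, compression="gzip")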

### Exercises

- We have segmented the nuclei, but haven't evaluated the quality of the instance segmentation yet. For example, we can evaluate its quality by computing the average precision at 50% overlap (AP50) when comparing the ground-truth nucleus segmentation to our segmentation results. Implement the evaluation procedure:
    - First, compute the ground-truth nucleus instance segmentation by applying connected components ([skimage.measure.label](https://scikit-image.org/docs/stable/api/skimage.measure.html#skimage.measure.label)) to the semantic nucleus segmentation ground-truth, which is stored in the key `labels/infected/nuclei/s0`.
    - Then compute the AP50 score using the function [elf.evaluation.matching](https://github.com/constantinpape/elf/blob/master/elf/evaluation/matching.py#L129).
    - Hint: you can check out a similar evaluation procedure in the notebook `2_cell_segmentation/cell_segmentation.ipynb`.
- [stardist](https://github.com/stardist/stardist) is another pre-trained method that can be used to segment nuclei, and is generally more robust than the approach used here. Segment the nuclei from the test set with it, run evaluation and compare the results with our segmentations from here.
    - We are also working on an example notebook that shows how to apply stardist to this data, `stardist_pretrained-nucleus-segmentation.ipynb`, but it is not finished yet.
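
To make the AP50 metric from the first exercise concrete, here is a toy numpy sketch. It assumes the Data Science Bowl style definition `TP / (TP + FP + FN)`, where a predicted object counts as a true positive if its intersection over union (IoU) with a ground-truth object exceeds 0.5. Whether `elf.evaluation.matching` reports exactly this number should be checked against its documentation. The function `toy_ap50` and the label arrays are hypothetical and only meant to illustrate the definition.

```python
import numpy as np


def toy_ap50(gt, pred, iou_threshold=0.5):
    """TP / (TP + FP + FN) for two label images, with 0 as background."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]

    true_positives = 0
    for gt_id in gt_ids:
        gt_mask = gt == gt_id
        # only predicted objects that overlap this ground-truth object can match
        for pred_id in np.unique(pred[gt_mask]):
            if pred_id == 0:
                continue
            pred_mask = pred == pred_id
            iou = (gt_mask & pred_mask).sum() / (gt_mask | pred_mask).sum()
            # for IoU > 0.5 a ground-truth object has at most one match
            if iou > iou_threshold:
                true_positives += 1
                break

    false_positives = len(pred_ids) - true_positives
    false_negatives = len(gt_ids) - true_positives
    denominator = true_positives + false_positives + false_negatives
    return true_positives / denominator if denominator > 0 else 1.0


# two ground-truth squares of 16 pixels each
toy_gt = np.zeros((8, 8), dtype="uint32")
toy_gt[0:4, 0:4] = 1
toy_gt[4:8, 4:8] = 2

# object 1 is found well (IoU = 12 / 16), object 2 only partially (IoU = 4 / 16)
toy_pred = np.zeros((8, 8), dtype="uint32")
toy_pred[0:4, 0:3] = 1
toy_pred[6:8, 6:8] = 2

toy_score = toy_ap50(toy_gt, toy_pred)
print(toy_score)
```

Here one object is matched, one prediction is a false positive and one ground-truth object is missed, so the score is 1 / 3.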

### What's next

Next, we turn to segmenting the full cells (i.e. cytosol and nucleus). For this, we will first train a network that predicts cell foreground and boundaries in `2_cell_segmentation/torchem-train-cell-membrane-segmentation.ipynb`.