## Cell Segmentation

We now combine the nucleus segmentation with the cell foreground and boundary predictions to obtain the complete cell instance segmentation. We use a seeded watershed for this: the nucleus instances serve as seeds, the cell boundary predictions as the height map, and the cell foreground prediction as the mask. We use the watershed implementation from [skimage](https://scikit-image.org/).

The goal of this lesson is to further explore post-processing for instance segmentation and to learn how to evaluate segmentation results quantitatively.
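To make the roles of seeds, height map and mask in the seeded watershed concrete, here is a minimal pure-Python sketch on a one-row toy image. It is a simplified illustration of the idea and not the skimage implementation; the function `seeded_watershed` and the toy values are made up for this example.

```python
import heapq


def seeded_watershed(height, markers, mask):
    """Simplified seeded watershed: grow the seeds in order of increasing height, inside the mask."""
    n_rows, n_cols = len(height), len(height[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    queue = []
    counter = 0  # tie-breaker: pixels with equal height are processed first-in, first-out

    # Initialize the queue with all seed pixels that lie inside the mask.
    for r in range(n_rows):
        for c in range(n_cols):
            if markers[r][c] != 0 and mask[r][c]:
                labels[r][c] = markers[r][c]
                heapq.heappush(queue, (height[r][c], counter, r, c))
                counter += 1

    # Always continue from the lowest pixel and pass its label to unlabeled neighbors.
    while queue:
        _, _, r, c = heapq.heappop(queue)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < n_rows and 0 <= cc < n_cols
            if inside and mask[rr][cc] and labels[rr][cc] == 0:
                labels[rr][cc] = labels[r][c]
                heapq.heappush(queue, (height[rr][cc], counter, rr, cc))
                counter += 1
    return labels


# Toy data (hypothetical values): two nuclei separated by a boundary ridge.
boundaries = [[0.1, 0.1, 0.1, 0.2, 0.9, 0.2, 0.1, 0.1, 0.1]]
nuclei = [[0, 1, 0, 0, 0, 0, 0, 2, 0]]
foreground_prob = [[0.2, 0.9, 0.9, 0.9, 0.8, 0.9, 0.9, 0.9, 0.3]]

# Threshold the foreground prediction to get the binary mask.
foreground = [[p > 0.5 for p in row] for row in foreground_prob]

cells = seeded_watershed(boundaries, nuclei, foreground)
print(cells)  # [[0, 1, 1, 1, 1, 2, 2, 2, 0]]
```

Each nucleus grows into one cell, the two cells meet at the boundary ridge, and pixels outside the foreground mask stay background. The ridge pixel itself is assigned to one of the two neighboring cells; which one depends on the tie-breaking order.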

In [None]:
# General imports and functionality for network prediction and watershed.
import os

import bioimageio.core
import h5py
import napari

from skimage.segmentation import watershed
from xarray import DataArray

In [None]:
# Define the paths to folders with the data and predictions.
# If you store the data somewhere else, just change the 'data_folder' variable.

data_folder = "../data"
output_folder = os.path.join(data_folder, "predictions")

### 1. Implement Cell Segmentation

First, we implement the watershed-based cell segmentation and visually check it on a test image.

In [None]:
# We load the model we have trained in the previous notebook.
model_path = os.path.join(data_folder, "trained_models/boundary-segmentation/boundary_segmentation_model.zip")
model = bioimageio.core.load_resource_description(model_path)

In [None]:
# And load the serum channel as well as the nucleus segmentation for one of the test images.
image_path = os.path.join(data_folder, "test/gt_image_048.h5")
prediction_path = os.path.join(data_folder, "predictions/gt_image_048.h5")

with h5py.File(image_path, "r") as f:
    image = f["raw/serum_IgG/s0"][:]
    
with h5py.File(prediction_path, "r") as f:
    nuclei = f["/segmentations/nuclei/watershed_based"][:]

In [None]:
# Next, we run prediction with the cell segmentation network.
# For details on the bioimageio functionality see the previous notebook on nucleus segmentation.
with bioimageio.core.create_prediction_pipeline(model) as pp:
    input_ = DataArray(image[None, None], dims=tuple("bcyx"))
    pred = bioimageio.core.predict_with_padding(pp, input_, padding={"x": 16, "y": 16})[0].values.squeeze()

In [None]:
# Check the predictions visually.
viewer = napari.Viewer()
viewer.add_image(image)
viewer.add_image(pred)
viewer.add_labels(nuclei)

In [None]:
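# Run watershed to get the cell instance segmentation.
# The prediction has two channels: cell foreground and cell boundaries.
foreground, boundaries = pred
# Threshold the foreground prediction to get a binary mask.
foreground = foreground > 0.5
# Grow the nucleus seeds along the boundary height map, restricted to the foreground mask.
cells = watershed(boundaries, markers=nuclei, mask=foreground)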

In [None]:
# And check the result.
viewer = napari.Viewer()
viewer.add_image(image)
viewer.add_labels(cells)

### 2. Apply to Test Images

Now we apply this segmentation approach to all test images.

In [None]:
from glob import glob
from tqdm import tqdm

test_images = glob(os.path.join(data_folder, "test/*.h5"))
test_images.sort()

In [None]:
# Combine the prediction and watershed in a function.
def segment_cells(pp, image, nuclei):
    input_ = DataArray(image[None, None], dims=tuple("bcyx"))
    pred = bioimageio.core.predict_with_padding(pp, input_, padding={"x": 16, "y": 16})[0].values.squeeze()
    foreground, boundaries = pred
    foreground = foreground > 0.5
    cells = watershed(boundaries, markers=nuclei, mask=foreground)
    return cells

In [None]:
# And run this function for all test images, saving the results to hdf5.
with bioimageio.core.create_prediction_pipeline(model) as pp:
    for path in tqdm(test_images):
        out_path = os.path.join(output_folder, os.path.basename(path))
        with h5py.File(path, "r") as f:
            image = f["raw/serum_IgG/s0"][:]
        with h5py.File(out_path, "r") as f:
            nuclei = f["segmentations/nuclei/watershed_based"][:]
        cells = segment_cells(pp, image, nuclei)
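        with h5py.File(out_path, "a") as f:
            # Remove the result of a previous run; create_dataset fails if the dataset already exists.
            if "segmentations/cells/watershed_based" in f:
                del f["segmentations/cells/watershed_based"]
            f.create_dataset("segmentations/cells/watershed_based", data=cells, compression="gzip")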

### 3. Evaluate Cell Segmentation

We can now also evaluate the cell segmentation quantitatively. We use the AP50 metric for this. It measures the [precision](https://en.wikipedia.org/wiki/Precision_and_recall) of the matches between the predicted segmentation and the ground-truth segmentation, where the "50" refers to the overlap (intersection over union) threshold of 0.5 used to decide whether a predicted cell matches a ground-truth cell. This is a standard evaluation metric for instance segmentation, and we use the implementation from [elf](https://github.com/constantinpape/elf).
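To illustrate what such a matching-based precision computes, here is a minimal pure-Python sketch on a toy example. It is not the elf implementation: the function `precision_at_iou` and the toy label images are made up for this example, and the sketch assumes that label 0 is background and that a match requires an intersection over union (IoU) strictly above the threshold, which may differ from how elf treats the boundary case.

```python
from collections import Counter


def precision_at_iou(segmentation, ground_truth, threshold=0.5):
    """Fraction of predicted objects that match a ground-truth object with IoU > threshold."""
    seg = [label for row in segmentation for label in row]
    gt = [label for row in ground_truth for label in row]

    # Object sizes in pixels, ignoring the background label 0.
    seg_sizes = Counter(label for label in seg if label != 0)
    gt_sizes = Counter(label for label in gt if label != 0)

    # Number of shared pixels for every overlapping (predicted, ground-truth) pair.
    overlaps = Counter((s, g) for s, g in zip(seg, gt) if s != 0 and g != 0)

    # With a threshold of at least 0.5 and a strict comparison, each object
    # can take part in at most one match, so counting the pairs is sufficient.
    true_positives = 0
    for (s, g), intersection in overlaps.items():
        union = seg_sizes[s] + gt_sizes[g] - intersection
        if intersection / union > threshold:
            true_positives += 1

    n_predicted = len(seg_sizes)
    return true_positives / n_predicted if n_predicted > 0 else 0.0


# Toy data (hypothetical): two ground-truth cells, three predicted cells.
ground_truth = [[1, 1, 1, 1, 0, 2, 2, 2, 2, 0]]
segmentation = [[1, 1, 1, 0, 0, 2, 2, 2, 3, 0]]

score = precision_at_iou(segmentation, ground_truth)
print(score)
```

Here the predicted cells 1 and 2 each reach an IoU of 0.75 with their ground-truth cell and count as matches, while the small predicted cell 3 only reaches 0.25 and counts as a false positive, so two of the three predicted cells are matched.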

In [None]:
import numpy as np
from elf.evaluation import matching

In [None]:
predictions = glob(os.path.join(output_folder, "*.h5"))
predictions.sort()
assert len(predictions) == len(test_images)

In [None]:
evaluation_scores = []
for image_path, pred_path in zip(test_images, predictions):
    with h5py.File(image_path, "r") as f:
        ground_truth = f["labels/cells/s0"][:]
    with h5py.File(pred_path, "r") as f:
        segmentation = f["segmentations/cells/watershed_based"][:]
    evaluation_scores.append(matching(segmentation, ground_truth)["precision"])
evaluation_score = np.mean(evaluation_scores)
print("The AP50 score for the cell segmentation is", evaluation_score)

### Exercises

- If you have trained different segmentation models in the previous notebook `torchem-train-cell-membrane-segmentation`, then compare the evaluation results between them.
- [Cellpose](https://github.com/MouseLand/cellpose) is a generalist method for cell segmentation that can directly be applied to our data. Run segmentation for the test images with it and compare the evaluation scores.
    - We are also working on a notebook that shows how to apply Cellpose to this data, `cellpose_pretrained-cell-segmentation`, but it is still work in progress.

### What's next

Now that we have obtained a cell segmentation, we turn to classifying the cells as infected or non-infected in `3_cell_classification/pytorch_train-infection-classifier.ipynb`.