# STAPL-3D segmentation demo

This notebook demonstrates the core components of the STAPL-3D segmentation pipeline: **blockwise segmentation** and **zipping**.

If you have not yet followed the STAPL-3D README, please find STAPL-3D and its installation instructions [here](https://github.com/RiosGroup/STAPL3D) before starting this demo.

Because STAPL-3D is all about big datafiles, we provide small cutouts and precomputed summary data that will be downloaded while progressing through the notebook.

First, define where you want the data to be downloaded by changing *projectdir*; the default is the current demo directory.

In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import os
import yaml
import urllib.request

projectdir = '.'
dataset = 'HFK16w'

datadir = os.path.join(projectdir, dataset)
os.makedirs(datadir, exist_ok=True)
filestem = os.path.join(datadir, dataset)


In [None]:
from stapl3d.segmentation import segment
import yaml

parameter_file = '{}.yml'.format(filestem)
with open(parameter_file, 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)
cfg['segmentation']

cfg['segmentation']['compactness'] = 0.002
cfg['segmentation']['steps'] = [6, 7, 8]

# Build the block path relative to the data directory instead of a machine-specific absolute path.
filepath = os.path.join(datadir, 'blocks', 'HFK16w_shading_stitching_biasfield_00000-00240_00000-00240_00000-00106.h5')
segment.cell_segmentation(filepath, **cfg['segmentation'])


We preferably define STAPL-3D parameters in a [yaml](https://yaml.org) parameter file. It has a simple structure and can be parsed in Python and `bash`. We will download the example, read it into a dictionary, list all top-level entries and show the entry that describes the default directory structure for STAPL-3D.

In [None]:
parameter_file = '{}.yml'.format(filestem)

# Download the yml-file.
if not os.path.exists(parameter_file):
    url = 'https://surfdrive.surf.nl/files/index.php/s/Ubx9wVon5CIuIzo/download'
    urllib.request.urlretrieve(url, parameter_file)

# Load parameter file.
with open(parameter_file, 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)

# Show contents.
cfg.keys()
cfg['dirtree']


## Parallelization

We provide a preprocessed data cutout in the Imaris v5.5 file format, which is an hdf5 file with 5 dimensions. The free [Imaris Viewer](https://imaris.oxinst.com/imaris-viewer) can be used to view it, and the file format can be inspected with [HDFview](https://www.hdfgroup.org/downloads/hdfview/) or with `h5ls` or `h5py`.

We download the file and name it according to the default STAPL-3D pipeline conventions.

In [None]:
# Download the ims-file.

biasfield_stem = '{}{}{}{}'.format(
    dataset,
    cfg['shading']['postfix'],
    cfg['stitching']['postfix'],
    cfg['biasfield']['postfix'],
)
ims_filepath = os.path.join(datadir, '{}.ims'.format(biasfield_stem))

if not os.path.exists(ims_filepath):
    url = 'https://surfdrive.surf.nl/files/index.php/s/NxWhUWuLQBHPMGV/download'
    urllib.request.urlretrieve(url, ims_filepath)


We use the STAPL-3D Image class to load this file and inspect its properties. We'll also save the dimensions, the Z-dimension and the number of channels in convenience variables `dims`, `Z` and `C`.


In [None]:
from stapl3d import Image

image_in = ims_filepath
im = Image(image_in)
im.load(load_data=False)
props = im.get_props()
im.close()

dims = im.dims
Z = im.dims[im.axlab.index('z')]
C = im.dims[im.axlab.index('c')]

props


In processing the full dataset, this cutout of **106 x 1408 x 1408 x 8** would equate to a single datablock, but for this demo we will further subdivide this block to demonstrate the pipeline.

For segmentation, we use a weighted sum of the membrane channels (ch3, ch5, ch6, ch7). The weights [0.5, 0.5, 1.0, 1.0] work well for this data.
We have specified this in the parameter file HFK16w.yml:

In [None]:
import yaml

parameter_file = '{}.yml'.format(filestem)
with open(parameter_file, 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)

cfg['blocks']
cfg['dataset']


The above indicates that, in addition to the membrane sum, we generate a nuclear channel mean as well as a mean over all channels (used for generating masks). Importantly, we specify that we want to output channel 0 (DAPI), because we will use it to create a nuclear mask.

Next, we specify the shape of the processing blocks. Usually we would opt for a blocksize of ~100-200 million voxels; for this demo we choose a blocksize of 176 in *xy*, giving 64 blocks of ~6M voxels. We keep the margin similar to what we set for big datasets, as reducing it may hinder adequate analysis.
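To make the block arithmetic concrete, here is a minimal sketch of how one in-plane axis of 1408 voxels could be tiled with a blocksize of 176 plus a margin on either side. The helper `block_extents`, the margin value of 64 and the clipping of margins at the volume border are illustrative assumptions; the real values are read from `cfg['dataset']` and the real tiling is done inside STAPL-3D.

```python
def block_extents(full, bs, bm):
    """Return (start, stop) pairs along one axis, with margins clipped to [0, full]."""
    extents = []
    for start in range(0, full, bs):
        lo = max(start - bm, 0)
        hi = min(start + bs + bm, full)
        extents.append((lo, hi))
    return extents

full_xy = 1408   # in-plane size of the demo cutout
bs = 176         # in-plane blocksize from the text
bm = 64          # hypothetical margin; read the real one from cfg['dataset']['bm']

ext = block_extents(full_xy, bs, bm)
n_blocks = len(ext) ** 2          # same tiling in x and y
print(n_blocks)                   # 64
print(ext[0], ext[1], ext[-1])    # (0, 240) (112, 416) (1168, 1408)
```

Blocks at the border are smaller than interior blocks in this sketch because their margin is cut off at the edge of the volume.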

In [None]:
bs = cfg['dataset']['bs']  # blocksize
bm = cfg['dataset']['bm']  # blockmargin

# NOTE: imaris layout: zyxct
blocksize = [Z, bs, bs, C, 1]
blockmargin = [0, bm, bm, 0, 0]

blockdir = os.path.join(datadir, cfg['dirtree']['datadir']['blocks'])
block_prefix = os.path.join(blockdir, biasfield_stem)
os.makedirs(blockdir, exist_ok=True)

'Processing data in blocks of {} voxels with a margin of {} voxels'.format(blocksize, blockmargin)


Now we are ready to call the function that computes the membrane mean and splits the data into blocks at the same time. Datablocks are written to the *HFK16w/blocks/* directory and are postfixed with their voxel coordinates in the original datafile: HFK16w/blocks/HFK16w_**x-X_y-Y_z-Z**.h5.
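As an illustration of this naming convention, the coordinates can be recovered from a block filename with a regular expression. This is a sketch, not part of the STAPL-3D API: `parse_block_coords` is a hypothetical helper, and the pattern assumes zero-padded integer ranges in x, y, z order directly before the `.h5` extension.

```python
import re

# Assumed pattern: <stem>_<x>-<X>_<y>-<Y>_<z>-<Z>.h5 with zero-padded integers.
BLOCK_RE = re.compile(r'_(\d+)-(\d+)_(\d+)-(\d+)_(\d+)-(\d+)\.h5$')

def parse_block_coords(path):
    """Return {'x': (start, stop), 'y': ..., 'z': ...} parsed from a block filename."""
    match = BLOCK_RE.search(path)
    if match is None:
        raise ValueError('not a block filename: {}'.format(path))
    v = [int(g) for g in match.groups()]
    return {'x': (v[0], v[1]), 'y': (v[2], v[3]), 'z': (v[4], v[5])}

name = 'HFK16w_shading_stitching_biasfield_00000-00240_00000-00240_00000-00106.h5'
coords = parse_block_coords(name)
print(coords)  # {'x': (0, 240), 'y': (0, 240), 'z': (0, 106)}
```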

In [None]:
from stapl3d import blocks

blocks.split(image_in, parameter_file)


These are some of the files that were generated:

In [None]:
from glob import glob

filelist = glob(os.path.join(blockdir, '{}_*.h5'.format(biasfield_stem)))
filelist.sort()
len(filelist), filelist[:5]


The resulting hdf5 files have the following internal file structure:

- .h5/mean
- .h5/chan/ch00
- .h5/memb/mean
- .h5/nucl/mean

## Membrane enhancement

Before segmentation, we perform membrane enhancement.

For the demo we do not want to be dependent on the third-party [ACME](https://wiki.med.harvard.edu/SysBio/Megason/ACME) software and provide the output that otherwise results from the ACME procedure. We split it into blocks, and write it as separate datasets in the same files as the channel data.

If you have ACME installed, set an `ACME` path environment variable or point `ACMEdir` to the directory with the binaries.

In [None]:
ACMEdir = os.environ.get('ACME')

if ACMEdir:
    # Perform membrane enhancement.
    from stapl3d.segmentation import enhance
    enhance.estimate(image_in, parameter_file)
else:
    # Download precomputed membrane enhancement.
    acme_filepath = os.path.join(datadir, '{}_ACME.h5'.format(biasfield_stem))
    if not os.path.exists(acme_filepath):
        url = 'https://surfdrive.surf.nl/files/index.php/s/oQcxIocFBkaXwJe/download'
        urllib.request.urlretrieve(url, acme_filepath)

    # Split into blocks
    from stapl3d import blocks
    for ids in ['memb/preprocess', 'memb/planarity']:
        acme_image = '{}/{}'.format(acme_filepath, ids)
        output_template = '{}_{}.h5/{}'.format(block_prefix, '{}', ids)
        blocks.splitblocks(acme_image, blocksize[:3], blockmargin[:3], output_template)



## Segmentation

The segmentation is parallelized over the blocks we just created. Each of the 64 files is processed separately.
The segmentation routine has a fair number of parameters. The cell below lists all the parameters specified in the yml-file.

In [None]:
len(filelist), filelist[:5]
cfg['segmentation']


A few parameters of particular note:
- input volumes:
    - 'ids_memb_mask': 'memb/planarity'
    - 'ids_memb_chan': 'memb/mean'
    - 'ids_nucl_chan': 'chan/ch00'
    - 'ids_dset_mean': 'mean'

The following parameters can be changed to optimize segmentation or use parameters from automated fine tuning:
- membrane mask:
    - 'planarity_thr': 0.0005
- nuclei mask:
    - 'sauvola_window_size': [19, 75, 75]
    - 'dapi_thr': 5000
    - 'dapi_absmin': 1000
- peak detection:
    - 'peaks_size': [11, 19, 19]
    - 'compactness': 0.8
- watershed:
    - 'memb_sigma': 3.0
  

Next we calculate the segments for each block. Segmenting a single block takes on the order of minutes. A block of 106 x 240 x 240 voxels (including the margin) takes ~1GB of memory per process. Please set the number of processes such that you stay within the available RAM: `n_proc = 8` would be a fairly safe bet for modern systems; `n_proc = 0` uses all available processors.
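If you are unsure which value to pick, the following sketch derives a conservative process count from the ~1GB per process quoted above. The function `safe_n_proc`, the 2 GB reserve for the rest of the system and the 16 GB example machine are assumptions for illustration only.

```python
import os

def safe_n_proc(ram_gb, gb_per_proc=1.0, reserve_gb=2.0):
    """Number of worker processes that fit in RAM, capped by the CPU count."""
    by_memory = int((ram_gb - reserve_gb) // gb_per_proc)
    by_cpu = os.cpu_count() or 1
    return max(1, min(by_memory, by_cpu))

n_proc_example = safe_n_proc(ram_gb=16)   # 16 GB is a made-up example machine
print(n_proc_example)
```

The result never drops below 1, so the segmentation can still run sequentially on a small machine.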

In [None]:
n_proc = 0

from stapl3d.segmentation import segment
segment.estimate(image_in, parameter_file, n_workers=n_proc)


Report pages (pdf) have been written to the *HFK16w/blocks/* directory. Let's look at one of them:

In [None]:
from stapl3d.segmentation.segment import generate_report

block_idx = 20
generate_report('{}/memb/mean'.format(filelist[block_idx]), ioff=False)


From left to right, images are shown for:
 - the DAPI channel and the membrane mean
 - the nuclear mask and the membrane mask
 - the combined mask with the detected peaks, overlaid on the distance transform image
 - the first and the final watershed results

# Zipping

Having parallelized the segmentation process for increased analysis speed and reduced memory footprint, the need arises to reassemble the blocks into a final combined segmentation volume without seams at the block boundaries. These seams are a consequence of trivial parallelization in processing the individual blocks (i.e. without communication between the processes). They manifest through partial cells lying on the block boundaries that have been assigned different labels in different blocks. Importantly, these doubly segmented cells may not perfectly match up over the boundary. These block-boundary-segments need to be resegmented in order to complete the accurate segmentation of the full dataset. We refer to this correct reassembly of the datablocks as ‘zipping’. In short, it consists of identifying the segments lying on the boundaries, removing them, and resegmenting that space. We aimed to design the procedure such that it requires minimal computational resources and expertise (fast, with a low memory footprint, and without the need for communication between processes).

## Relabel and copy blocks
We first perform a sequential relabeling of all the blocks to make each label unique.
We copy the relabeled blocks to new datasets in the same file for writing the zip-results in-place.
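The idea behind sequential relabeling can be sketched with a cumulative offset per block. This is a toy illustration on flattened label lists and not the implementation used by `zipping.relabel`; the helper names are hypothetical.

```python
from itertools import accumulate

def label_offsets(maxlabels):
    """Offset to add to each block's foreground labels so label ranges do not overlap."""
    return [0] + list(accumulate(maxlabels))[:-1]

def relabel_block(labels, offset):
    """Shift foreground labels by offset; background (0) is left untouched."""
    return [lab + offset if lab > 0 else 0 for lab in labels]

# Three toy blocks, flattened, each labelled from 1 independently.
blocks_in = [[0, 1, 2, 2], [1, 1, 0, 3], [0, 2, 1, 0]]
maxlabels_in = [max(b) for b in blocks_in]         # [2, 3, 2]
offsets = label_offsets(maxlabels_in)              # [0, 2, 5]
blocks_out = [relabel_block(b, o) for b, o in zip(blocks_in, offsets)]
print(blocks_out)  # [[0, 1, 2, 2], [3, 3, 0, 5], [0, 7, 6, 0]]
```

After this step a label value identifies a segment across the whole dataset, which is what allows the zip-steps to hand out new labels above the current maximum.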

In [None]:
from stapl3d.segmentation import zipping

# Relabel the blocks sequentially
zipping.relabel(image_in, parameter_file)

# Copy the relabeled segmentation
zipping.copyblocks(image_in, parameter_file)

# Write a file with the maximum of all the labels in the block.
seg_pf = cfg['segmentation']['segments_ods']
zip_pf = '{}_{}'.format(cfg['relabel']['postfix'], cfg['copyblocks']['postfix'])
zip_path_int = 'segm/{}_{}'.format(seg_pf, zip_pf)

from stapl3d import get_paths
filestem = os.path.splitext(get_paths(image_in)['fname'])[0]
filename = '{}_maxlabels_{}_{}.txt'.format(filestem, seg_pf, zip_pf)
maxlabelfile = os.path.join(blockdir, filename)
maxlabels = zipping.get_maxlabels_from_attribute(filelist, zip_path_int, maxlabelfile)
'maxlabs after copy {}'.format(maxlabels)


## Zip
There is a one-liner for computing all the steps in the zip:
```
zipping.estimate(image_in, parameter_file)
```
For this demo, we will be much more verbose to illustrate the zipping process. When the functionality is called from within a Python interpreter, we use Python's multiprocessing to distribute the work. Next, we define the zipping parameters and functions. First, we set the number of processors, the block-layout and the zipping parameters.


In [None]:
import multiprocessing

# Set the maximum number of processors to use
n_proc_max = 8

# Determine the number of seams in the data.
n_seams_yx, seamgrid = zipping.get_zip_layout(image_in, blocksize)
'Zipjob: {} seamlines over X; {} seamlines over Y'.format(n_seams_yx[1], n_seams_yx[0])

# Argument list to `zipping.resegment_block_boundaries`
images_in = ['{}/{}'.format(datafile, zip_path_int) for datafile in filelist]
blocksize = blocksize[:3]
blockmargin = blockmargin[:3]
axis = 0
seamnumbers = [-1, -1, -1]
mask_dataset = ''
relabel = False
maxlabel = maxlabelfile
in_place = True
outputstem = os.path.join(blockdir, dataset)
save_steps = False
args = [images_in, blocksize, blockmargin, axis, seamnumbers,
        mask_dataset, relabel, maxlabel, in_place, outputstem, save_steps]


We define a convenience function that merges datablocks into a single volume and returns a single z-plane for display.

In [None]:
from stapl3d import blocks

import matplotlib as mpl
import matplotlib.pyplot as plt

def merge_and_slice_dset(filelist, ids, dims, bs, bm, slc=20):

    # Merge the datablocks.
    images_in=['{}/{}'.format(datafile, ids)
               for datafile in filelist]
    filename = '{}.h5/{}'.format(dataset, ids)
    outputpath=os.path.join(datadir, filename)

    blocks.mergeblocks(
        images_in=images_in,
        blocksize=[dims[0], bs, bs],
        blockmargin=[0, bm, bm],
        fullsize=dims[:3],
        outputpath=outputpath,
    )

    # Get a slice of the merged data.
    im = Image(outputpath)
    im.load()
    im.slices[0] = slice(slc, slc + 1, 1)
    data = im.slice_dataset()
    im.close()

    return data


Let's check with the membrane mean blocks. This should output an image of 1408 x 1408.

In [None]:
ids = 'memb/mean'
img = merge_and_slice_dset(filelist, ids, dims, bs, bm)

plt.imshow(img, cmap='gray', vmax=5000)
plt.show()


In the zipping procedure, we employ an order such that no block is handled by more than one process at the same time. First, the zip-lines between blocks that overlap in the Y-dimension are processed (odd and even zip-lines separately); then the X-ziplines; finally, the corners where four datablocks overlap are resegmented. For demo purposes, we keep track of the output of each step and store it in `imgs`.
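Why the even/odd split avoids conflicts can be checked with a small sketch. It assumes 8 blocks along an axis (hence 7 seams) and that resegmenting a seam only modifies the two blocks on either side of it; `seam_groups` and `touched_blocks` are hypothetical helpers, not STAPL-3D functions.

```python
def seam_groups(n_blocks):
    """Split the seams between n_blocks consecutive blocks into even and odd groups."""
    seams = range(n_blocks - 1)           # seam i lies between block i and block i + 1
    return [[s for s in seams if s % 2 == offset] for offset in (0, 1)]

def touched_blocks(seam):
    """Blocks whose data is modified when this seam is resegmented."""
    return {seam, seam + 1}

groups = seam_groups(8)                   # 8 blocks along an axis, as in this demo
print(groups)                             # [[0, 2, 4, 6], [1, 3, 5]]

for group in groups:
    touched = [touched_blocks(s) for s in group]
    # Within a group, no block is shared between two seams.
    assert sum(len(t) for t in touched) == len(set().union(*touched))
```

Seams within one group can therefore be processed in parallel without two processes writing to the same block file.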

In [None]:
import numpy as np

# Resegment zip-lines in 4 groups: horizontal/even, horizontal/odd, vertical/even, vertical/odd zip-lines.

imgs = []

for axis, n_seams in zip([1, 2], n_seams_yx):

    n_proc = min(n_proc_max, int(np.ceil(n_seams / 2)))

    for offset in [0, 1]:

        # do the zip-step
        zipping.compute_zip_step(
            args, axis, seamgrid,
            starts=[offset, 0], stops=[n_seams, 1], steps=[2, 2],
            n_proc=n_proc,
        )

        # update maxlabels
        maxlabels = zipping.get_maxlabels_from_attribute(filelist, zip_path_int, maxlabelfile)

        # keep image for display
        imgs.append(merge_and_slice_dset(filelist, zip_path_int, dims, bs, bm))

f, axs = plt.subplots(1, 4, figsize=(24, 24))
for img, ax in zip(imgs, axs):
    ax.imshow(img)
plt.show()


Newly processed zip-lines are assigned the highest labels, which appear in yellow in the viridis colormap and thereby trace the progress of the zipping process.

The zip-lines still have seams in the places where they intersect. Next we process zip-quads, in which the segments on these intersections are resegmented to finish the zip. 
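The four zip-quad groups can be illustrated with a short sketch that groups seam intersections by the parity of their indices. It assumes a 7 x 7 grid of intersections (an 8 x 8 block grid) and that each quad modifies the four blocks meeting at that intersection; the helper names are hypothetical.

```python
from itertools import product

def quad_groups(n_seams_y, n_seams_x):
    """Group seam intersections by the parity of their (y, x) seam indices."""
    grouped = {}
    for sy, sx in product(range(n_seams_y), range(n_seams_x)):
        grouped.setdefault((sy % 2, sx % 2), []).append((sy, sx))
    return grouped

def quad_blocks(sy, sx):
    """The four blocks (row, column) that meet at seam intersection (sy, sx)."""
    return {(sy + dy, sx + dx) for dy in (0, 1) for dx in (0, 1)}

quads_by_parity = quad_groups(7, 7)   # 7 x 7 intersections for an 8 x 8 block grid
for key, quads in sorted(quads_by_parity.items()):
    blocks_used = [b for q in quads for b in quad_blocks(*q)]
    assert len(blocks_used) == len(set(blocks_used))   # no block touched twice
    print(key, len(quads))
# (0, 0) 16
# (0, 1) 12
# (1, 0) 12
# (1, 1) 9
```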

In [None]:
# Resegment zip-quads in 4 groups: even/even, even/odd, odd/even, odd/odd zip-line intersections

imgs = []

for start_y in [0, 1]:

    for start_x in [0, 1]:

        # do the zip-step
        zipping.compute_zip_step(
            args, axis=0, seamgrid=seamgrid,
            starts=[start_y, start_x], stops=n_seams_yx, steps=[2, 2],
            n_proc=n_proc,
        )

        # update maxlabels
        maxlabels = zipping.get_maxlabels_from_attribute(filelist, zip_path_int, maxlabelfile)

        # keep image for display
        imgs.append(merge_and_slice_dset(filelist, zip_path_int, dims, bs, bm))

f, axs = plt.subplots(1, 4, figsize=(24, 24))
for img, ax in zip(imgs, axs):
    ax.imshow(img)
plt.show()


To visualize the segments in the more common random colors, we relabel, shuffle and plot.

In [None]:
import numpy as np
from skimage.segmentation import relabel_sequential
from skimage.color import label2rgb
from random import shuffle

img = merge_and_slice_dset(filelist, zip_path_int, dims, bs, bm)

img = relabel_sequential(img)[0]

# Foreground labels only; 0 is background.
ulabels = np.unique(img)
ulabels = ulabels[ulabels != 0]
# Shuffle the labels 1..N so that no foreground label is mapped to 0.
relabeled = list(range(1, len(ulabels) + 1))
shuffle(relabeled)

img = np.array([0] + relabeled)[img]

f = plt.figure(figsize=(12, 12))
plt.imshow(label2rgb(img))
plt.show()


In STAPL-3D, we use rich multidimensional data to obtain a robust segmentation. We can also use the information we have to perform subcellular segmentation. Here, we split segments into nucleus and membrane subsegments such that we can specifically extract intensities from the appropriate voxels for the type of staining (nuclear or membranal). In addition, the subsegmentation opens up possibilities for defining compound features that inform on internal cell structure.
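The principle of compartment-specific intensity extraction can be sketched on a toy example. This is an assumption-laden illustration, not the method used by `segment.subsegment`: the nuclear mask is made up, and everything outside the nucleus is simply treated as the membrane subsegment.

```python
import numpy as np

# Toy 2D label image with two cells (labels 1 and 2) and background 0.
labels = np.array([
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [1, 1, 1, 2, 2],
    [0, 0, 0, 2, 2],
])
# Hypothetical nuclear mask (True inside nuclei).
nucl_mask = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
], dtype=bool)

nucl = np.where(nucl_mask, labels, 0)    # nuclear subsegments keep the cell label
memb = np.where(nucl_mask, 0, labels)    # remainder of each cell

# Per-cell mean of a (synthetic) nuclear stain, measured only in nuclear voxels.
dapi = np.arange(labels.size, dtype=float).reshape(labels.shape)
means = {int(l): float(dapi[nucl == l].mean()) for l in np.unique(nucl) if l != 0}
print(means)  # {1: 6.0, 2: 11.5}
```

Because both subsegments keep the label of their parent cell, features measured in either compartment can be joined per cell afterwards.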


In [None]:
seg_path = os.path.join(datadir, '{}.h5/{}'.format(dataset, zip_path_int))

from stapl3d.segmentation import segment
segment.subsegment(image_in, parameter_file)


In [None]:
# Merge each subsegmentation volume so it can be loaded from the merged file below.
for pf in ['full', 'memb', 'nucl']:
    ids = 'segm/labels_memb_del_relabeled_fix_{}'.format(pf)
    merge_and_slice_dset(filelist, ids, dims, bs, bm)


Let's have a look at a corner of the section to visualize the subcellular compartments.

In [None]:
from stapl3d import LabelImage
from skimage.color import label2rgb

slc = 20

# get background images
ids_n = 'nucl/dapi_preprocess'
dapi = merge_and_slice_dset(filelist, ids_n, dims, bs, bm)
ids_m = 'memb/mean_smooth'
memb = merge_and_slice_dset(filelist, ids_m, dims, bs, bm)

f, axs = plt.subplots(1, 3, figsize=(24, 24))
segs = [
    'segm/labels_memb_del_relabeled_fix_full', 
    'segm/labels_memb_del_relabeled_fix_memb',
    'segm/labels_memb_del_relabeled_fix_nucl',
]
bgs = [memb, dapi, memb]

for ax, seg, bg in zip(axs, segs, bgs):
    seg_path = os.path.join(datadir, '{}.h5/{}'.format(dataset, seg))
    im = LabelImage(seg_path)
    im.load()
    im.slices[0] = slice(slc, slc + 1, 1)
    img = im.slice_dataset()
    im.close()

    img = img[:500,:500]
    bg = bg[:500,:500] * 5
    clabels = label2rgb(img, image=bg, alpha=1.0, bg_label=0)
    ax.imshow(clabels)

plt.show()
