# STAPL-3D preprocessing demo

This notebook demonstrates the core components of the STAPL3D preprocessing pipeline: **z-stack shading** correction and **3D inhomogeneity** correction. 

If you have not yet followed the STAPL-3D README, please find STAPL-3D and its installation instructions [here](https://github.com/RiosGroup/STAPL3D) before starting this demo.

Because STAPL-3D is all about big datafiles, we provide small cutouts and precomputed summary data that will be downloaded while progressing through the notebook.

First, define where you want the data to be downloaded by changing *projectdir*; the default is the current demo directory.

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import os
import yaml
import urllib.request

projectdir = '.'
dataset = 'HFK16w'

datadir = os.path.join(projectdir, dataset)
os.makedirs(datadir, exist_ok=True)
filestem = os.path.join(datadir, dataset)


We preferably define STAPL-3D parameters in a [yaml](https://yaml.org) parameter file. It has a simple structure and can be parsed in both Python and `bash`. We will download the example file, read it into a dictionary, list all top-level entries, and show the entry that describes the default directory structure for STAPL-3D.

In [2]:
parameter_file = '{}.yml'.format(filestem)

# Download the yml-file.
if not os.path.exists(parameter_file):
    url = 'https://surfdrive.surf.nl/files/index.php/s/Ubx9wVon5CIuIzo/download'
    urllib.request.urlretrieve(url, parameter_file)

# Load parameter file.
with open(parameter_file, 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)

# Show contents.
cfg.keys()
cfg['dirtree']


dict_keys(['dirtree', 'dataset', 'shading', 'stitching', 'splitchannels', 'mask', 'biasfield', 'bias_stack', 'biasfield_apply', 'ims_aggregate', 'splitblocks', 'membrane_enhancement', 'segmentation', 'segmentation_postproc', 'segmentation_gather', 'relabel', 'relabel_gather', 'copyblocks', 'copyblocks_gather', 'zipping', 'zipping_gather', 'ziplines', 'zipquads', 'zipping_postproc', 'copydataset', 'subsegment', 'mergeblocks', 'features', 'features_postproc'])

{'datadir': {'blocks': 'blocks',
  'channels': 'channels',
  'shading': 'shading',
  'mask': None,
  'biasfield': 'biasfield',
  'profiling': 'profiling',
  'jobfiles': 'jobfiles'}}

## Shading correction

Shading correction (or flatfield correction) attempts to remove the intensity gradients that may be present in the xy-plane of the z-stacks that make up the dataset. These originate from imperfections in the microscope's optics and manifest as a grid over the assembled 3D volume. Because the shading is channel-specific, STAPL-3D estimates a 2D profile for each channel separately from the data.
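As a minimal sketch of what the correction itself amounts to, the hypothetical `correct_plane` function below divides a synthetic plane by a 2D shading profile. It assumes the common flatfield convention of normalizing the profile to its maximum, so dimmer regions are scaled up; STAPL-3D's exact normalization may differ.

```python
import numpy as np


def correct_plane(plane, profile_2d):
    """Divide one xy-plane by a 2D shading profile (hypothetical sketch).

    The profile is rescaled to a maximum of 1, so the brightest
    (least shaded) pixels keep their intensity and dimmer regions
    are scaled up.
    """
    profile = np.asarray(profile_2d, dtype=float)
    profile = profile / profile.max()
    return np.asarray(plane, dtype=float) / profile


# Synthetic example: a uniform specimen imaged through optics that
# darken the field of view towards its left and right edges.
ny, nx = 4, 5
x = np.linspace(-1.0, 1.0, nx)
shading = np.tile(1.0 - 0.3 * x**2, (ny, 1))  # 1.0 at the centre, 0.7 at the edges
observed = 500.0 * shading                     # what the microscope records
corrected = correct_plane(observed, shading)
print(np.allclose(corrected, 500.0))  # True
```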

We provide a two-channel z-stack (106 x 1024 x 1024 x 2) in the data archive for demonstration purposes. These two channels were extracted from an 8-channel dataset of 262 stacks, i.e. ~0.1% of the data. The stack contains a nuclear channel (DAPI) and a membrane channel (NCAM1).
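The ~0.1% figure follows from keeping 1 of 262 stacks and 2 of 8 channels. The arithmetic below also estimates the in-memory size of the demo stack, under the assumption (not stated in the data description) that voxels are stored as 16-bit integers:

```python
# Fraction of the full dataset contained in the demo stack:
# 1 of 262 stacks and 2 of 8 channels.
n_stacks, n_channels = 262, 8
fraction = (1 / n_stacks) * (2 / n_channels)
print(f"{fraction:.3%}")  # 0.095%

# Size of the demo stack in memory, assuming 16-bit (2-byte) voxels.
nz, ny, nx, nc = 106, 1024, 1024, 2
n_bytes = nz * ny * nx * nc * 2
print(n_bytes / 2**20, "MiB")  # 424.0 MiB
```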


In [3]:
# Download the czi-file.

czi_filepath = '{}.czi'.format(filestem)
if not os.path.exists(czi_filepath):
    url = 'https://surfdrive.surf.nl/files/index.php/s/Ly85srzZmdWJCyJ/download'
    urllib.request.urlretrieve(url, czi_filepath)


We define the parameters to the shading correction module in the yaml parameter file.

In [4]:
# Show the shading parameters.
cfg['shading']


{'file_format': 'czi',
 'params': {'postfix': '_shading',
  'metric': 'median',
  'noise_threshold': 1000,
  'z_range': None,
  'quantile_threshold': 0.9,
  'polynomial_order': 3},
 'submit': {'n_workers': None,
  'channels': None,
  'array': 'channel',
  'nodes': 1,
  'tasks': 1,
  'mem': '20G',
  'wtime': '02:00:00'},
 'conda': {'env': 'stapl3d'}}

This means that, in this example, we calculate the *median* value for z-stacks concatenated over X and Y, while masking any value < *1000*. We use the *90%* of planes with the highest median intensities to calculate the 1D shading profile, which is fit with a *3rd-order* polynomial. The output files of this processing step are postfixed with *_shading*.
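To make these steps concrete, here is a simplified NumPy sketch for the *y* profile of one channel. It is not the STAPL-3D implementation: the function name and the `keep_fraction` parameter are hypothetical, and `keep_fraction=0.9` simply follows the description above of keeping the 90% brightest planes. Masked values are set to NaN so that the medians ignore them; the sketch assumes every plane keeps some values above the noise threshold.

```python
import numpy as np


def shading_profile_y(stacks, noise_threshold=1000, keep_fraction=0.9, order=3):
    """Hypothetical sketch of a normalized 1D shading profile along y.

    stacks: array of shape (n_stacks, nz, ny, nx) for one channel.
    Returns the normalized mean profile and its normalized polynomial fit.
    """
    data = np.asarray(stacks, dtype=float)
    # Mask low values (background/noise) so they do not bias the medians.
    data = np.where(data < noise_threshold, np.nan, data)
    # Concatenate the z-stacks along x and take the median over x:
    # one value per (z, y) coordinate.
    concat = np.concatenate(list(data), axis=-1)      # (nz, ny, n_stacks * nx)
    medians_zy = np.nanmedian(concat, axis=-1)        # (nz, ny)
    # Keep the brightest planes, ranked by their overall median.
    plane_medians = np.nanmedian(medians_zy, axis=1)  # (nz,)
    cutoff = np.quantile(plane_medians, 1 - keep_fraction)
    selected = medians_zy[plane_medians >= cutoff]
    # Normalize each selected plane to its maximum, then average the planes.
    normalized = selected / np.nanmax(selected, axis=1, keepdims=True)
    profile = np.nanmean(normalized, axis=0)
    # Smooth the profile with a polynomial fit of the given order.
    y = np.arange(profile.size)
    fit = np.polyval(np.polyfit(y, profile, order), y)
    return profile, fit / fit.max()


# Synthetic check: 3 stacks of 10 planes with a parabolic fall-off along y.
yy = np.arange(32)
true_profile = 1.0 - 0.4 * ((yy - 16) / 16) ** 2
stacks = 5000.0 * true_profile[None, None, :, None] * np.ones((3, 10, 32, 16))
profile, fit = shading_profile_y(stacks)
print(np.allclose(fit, true_profile))  # True
```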

The shading profiles are estimated in parallel over channels. The number of concurrent processes can be set with `n_workers` in the yml-file or passed as an argument. By default, one process per channel is used, provided that enough processors are available.
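The default worker count described above can be sketched with Python's `concurrent.futures`. Here `estimate_channel_stub` and `estimate_all` are hypothetical placeholders, not STAPL-3D functions. A thread pool keeps the sketch simple; CPU-bound estimation would normally use a process pool (`ProcessPoolExecutor`) instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def estimate_channel_stub(channel):
    """Hypothetical stand-in for the per-channel shading estimation."""
    return channel, f"shading profile for channel {channel}"


def estimate_all(n_channels, n_workers=None):
    # Default: one worker per channel, capped at the number of available CPUs.
    if n_workers is None:
        n_workers = min(n_channels, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields (channel, result) pairs in channel order.
        return dict(pool.map(estimate_channel_stub, range(n_channels)))


results = estimate_all(n_channels=8)
print(sorted(results))  # [0, 1, 2, 3, 4, 5, 6, 7]
```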

Note that for cluster deployment (SGE or SLURM), more specific settings can be made in the `submit` section of the yaml file (`cfg['shading']['submit']`).


Now run the shading estimation.

In [5]:
# non-parallelized
from stapl3d.preprocessing import shading
czi_filepath = '{}.czi'.format(filestem)
channel = 0
shading.estimate_channel(czi_filepath, channel, **cfg['shading']['params'])


AttributeError: module 'stapl3d.preprocessing.shading' has no attribute 'estimate_channel'

In [6]:
# parallelized
from stapl3d.preprocessing import shading
czi_filepath = '{}.czi'.format(filestem)
shading.estimate(czi_filepath, parameter_file)


IndexError: child index out of range

For each channel, this will write the estimated shading profile as an image (.tif) and a processing report (.pdf), as well as the calculated medians (.pickle) and a logfile (.log) to the *HFK16w/shading/* directory.

In [None]:
os.listdir(os.path.join(datadir, 'shading'))


For a single stack, these results do not look good, because the algorithm needs many z-stacks to estimate the shading profile reliably. Therefore, we provide pre-calculated medians for the full 262-stack dataset (in *HFK16w/shading_full*) to demonstrate the expected output. First, let's plot the results for a single channel:

In [None]:
# Download and extract the shading_full directory.

shading_dir = os.path.join(datadir, 'shading_full')
zipfilepath = '{}.zip'.format(shading_dir)

if not os.path.exists(shading_dir):
    url = 'https://surfdrive.surf.nl/files/index.php/s/izqKOK7kQzr1nG1/download'
    urllib.request.urlretrieve(url, zipfilepath)

    import zipfile
    with zipfile.ZipFile(zipfilepath, 'r') as zf:
        zf.extractall(datadir)

# Generate and show the report of channel 0
shading.generate_report(shading_dir, 'HFK16w_full', channel=0)


The first row shows the result of concatenating the data over *X*, i.e. yielding a median value for each *yz*-coordinate. The left plot shows the medians of the selected planes (selected using a *quantile_threshold* parameter of 0.8) in rainbow colours. The right plot shows the normalized profile with confidence intervals, as well as the normalized fit. The bottom left plot shows the median profile over *z*, with the selected planes indicated by tick marks. The bottom right image shows the 2D shading profile used to correct each plane in each z-stack of the channel. The red dashed traces indicate an (arbitrary) threshold to help flag potential issues with the data; the viridis colormap is also clipped to red at this threshold.
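One way to build a 2D profile from the two normalized 1D fits (concatenation over *X* and over *Y*) is an outer product, which assumes the shading is separable in *y* and *x*. This is an assumption about the method, and the `flag_threshold` default of 0.8 in the hypothetical `profile_2d` helper is only a stand-in for the arbitrary threshold drawn in red:

```python
import numpy as np


def profile_2d(fit_y, fit_x, flag_threshold=0.8):
    """Combine two normalized 1D fits into a 2D shading profile (sketch).

    Assumes separable shading. Returns the profile (maximum 1) and a
    boolean mask of pixels that fall below flag_threshold.
    """
    prof = np.outer(np.asarray(fit_y, dtype=float), np.asarray(fit_x, dtype=float))
    prof = prof / prof.max()
    return prof, prof < flag_threshold


prof, flagged = profile_2d([0.8, 1.0, 0.8], [0.9, 1.0, 0.9])
print(prof.shape, int(flagged.sum()))  # (3, 3) 4
```

Only the four corners (0.8 x 0.9 = 0.72) fall below the threshold in this toy example.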

Now let's run it for all channels, sending the output to PDFs in *HFK16w/shading_full*.

In [None]:
for ch in range(8):
    shading.generate_report(shading_dir, 'HFK16w_full', channel=ch, ioff=True)


## Stitching

We stitch our data with the proprietary Zeiss Zen or Arivis software packages. If you plan to use your own imaging data, please stitch it and then convert the result to an Imaris or STAPL-3D file format. For the bias field estimation demo below, we provide a downsampled image of the stitching result in hdf5 format. For the segmentation demo, we provide a cutout of the fully preprocessed file in Imaris format: *HFK16w_shading_stitching_biasfield.ims*. A free viewer for these data can be downloaded [here](https://imaris.oxinst.com/imaris-viewer).

In [None]:
# TODO: split channels?

## Inhomogeneity correction

We next correct the stitched file for inhomogeneities such as depth attenuation and uneven penetration of clearing agents and antibodies. This is done using the *N4* algorithm ([Tustison et al., 2010](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3071855), as implemented in SimpleITK) on a downsampled image. For this demo, we provide the downsampled data in an hdf5 file.

We download the data, use the STAPL-3D Image class to get some info about this image, and display the parameters.

In [None]:
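# The postfixes are stored under 'params' (see cfg['shading'] above);
# the stitching section is assumed to use the same layout.
stitch_stem = '{}{}{}'.format(
    dataset,
    cfg['shading']['params']['postfix'],
    cfg['stitching']['params']['postfix'],
)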
bfc_filepath = os.path.join(datadir, '{}.h5'.format(stitch_stem))

# Download the hdf5-file.
if not os.path.exists(bfc_filepath):
    url = 'https://surfdrive.surf.nl/files/index.php/s/WkMMCW5e4wgNUgb/download'
    urllib.request.urlretrieve(url, bfc_filepath)

# Print image info.
from stapl3d import Image
# The image is stored in the 'data' dataset inside the hdf5 file.
image_in = '{}/data'.format(bfc_filepath)
im = Image(image_in)
im.load(load_data=False)
im.get_props()
im.close()


The image dimensions are *zyxc* = *106 x 263 x 249 x 8* with voxels of *1.2 x 21.3 x 21.3* $\mu$m. This is a good size for estimating the slow variations over the volume. Default parameters are:

In [None]:
cfg['biasfield']


If an Imaris pyramid image is provided, data will be taken at `resolution_level` and further downsampled with `downsample_factors`. The `n_iterations`, `n_fitlevels` and `n_bspline_cps` parameters are passed to the SimpleITK [N4BiasFieldCorrectionImageFilter](https://simpleitk.org/doxygen/latest/html/classitk_1_1simple_1_1N4BiasFieldCorrectionImageFilter.html).
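How `downsample_factors` are applied internally is not described here; block averaging by integer factors is one common choice, sketched below as a hypothetical `downsample_mean` helper. With every factor equal to 1, it returns the data unchanged (as floats), which is why unit factors suit data that are already downsampled.

```python
import numpy as np


def downsample_mean(data, factors):
    """Block-mean downsampling by integer factors per axis (hypothetical sketch).

    Trailing voxels that do not fill a complete block are dropped.
    """
    data = np.asarray(data, dtype=float)
    # Crop each axis to a multiple of its factor.
    data = data[tuple(slice(0, (n // f) * f) for n, f in zip(data.shape, factors))]
    # Split every axis into (n_blocks, factor) and average over the factor axes.
    new_shape = []
    for n, f in zip(data.shape, factors):
        new_shape.extend([n // f, f])
    return data.reshape(new_shape).mean(axis=tuple(range(1, 2 * data.ndim, 2)))


x = np.arange(24).reshape(4, 6)
print(downsample_mean(x, (2, 3)))
```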

To use a mask in the estimation, set the `mask_in` input to either:
 - `True`, in which case the path defaults to `{dataset}{cfg['mask']['postfix']}.h5/mask`; or
 - the path to a mask image (in which the background should be `0`).

The mask image is expected to be the same size as the input image, i.e. it will also be downsampled with `downsample_factors`.
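The two `mask_in` options can be summarized by the hypothetical helper below (not part of STAPL-3D). The mask postfix is passed in explicitly because its value depends on your parameter file; `'_mask'` in the usage line is only an example value.

```python
def resolve_mask_path(mask_in, dataset, mask_postfix):
    """Hypothetical helper: return the mask path implied by `mask_in`, or None."""
    if mask_in is True:
        # Default: the 'mask' dataset in '{dataset}{mask_postfix}.h5'.
        return f"{dataset}{mask_postfix}.h5/mask"
    if mask_in:
        # An explicit path to a mask image (background == 0).
        return mask_in
    # None or False: estimate without a mask.
    return None


print(resolve_mask_path(True, "HFK16w", "_mask"))  # HFK16w_mask.h5/mask
```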

Now run the N4 algorithm on channel 4. SimpleITK will keep **all your processors busy for a while** to process a single channel. (Note that for HPC cluster deployment, there is more control: channels are distributed over separate jobs, and the number of threads used for each channel can be set separately.)

Because the hdf5 file already contains downsampled data, we set all `downsample_factors` to 1.

In [None]:
from stapl3d.preprocessing import biasfield

channel = 4
biasfield_dir = os.path.join(datadir, cfg['dirtree']['datadir']['biasfield'])

biasfield.estimate_channel(
    image_in,
    channel=channel,
    downsample_factors=[1, 1, 1, 1, 1],
    n_iterations=2,
    n_fitlevels=4,
    n_bspline_cps=[5, 5, 5],
    outputdir=biasfield_dir,
    )


Next, we (re)generate the bias field correction report to inspect the result.

In [None]:
from stapl3d.preprocessing import biasfield
biasfield.generate_report(biasfield_dir, stitch_stem, channel=channel)


The left column shows orthogonal sections of the downsampled dataset for the uncorrected (top) and corrected data (middle), as well as the estimated bias field (bottom). Plotted on the left and top of the images are profiles of the median values over the three axes. The right column offers a closer comparison of the profiles (*mean + SD*) for the corrected (green) and uncorrected (red) data. The bias field correction yields a much flatter profile along *z*, as well as along *x* and *y*. Low-frequency inhomogeneities are removed, while the detail of the specific staining is retained in the corrected data.
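To put a number on "flatter", one simple measure (hypothetical, not taken from the report) is the coefficient of variation of the per-axis median profiles. The synthetic example below applies a known depth attenuation to random staining and then divides it out again:

```python
import numpy as np


def axis_profiles(volume):
    """Median intensity profile along each axis of a zyx volume."""
    vol = np.asarray(volume, dtype=float)
    return {
        "z": np.median(vol, axis=(1, 2)),
        "y": np.median(vol, axis=(0, 2)),
        "x": np.median(vol, axis=(0, 1)),
    }


def flatness(profile):
    """Coefficient of variation: lower values mean a flatter profile."""
    return float(np.std(profile) / np.mean(profile))


# Synthetic check: uniform staining with random detail, attenuated with depth (z).
rng = np.random.default_rng(0)
nz, ny, nx = 20, 16, 16
bias = np.exp(-0.05 * np.arange(nz))[:, None, None] * np.ones((nz, ny, nx))
stain = 1000 + 50 * rng.standard_normal((nz, ny, nx))
observed = stain * bias
corrected = observed / bias
cv_before = flatness(axis_profiles(observed)["z"])
cv_after = flatness(axis_profiles(corrected)["z"])
print(cv_before > cv_after)  # True
```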

Note that above, we called the single-channel estimation function directly, bypassing the yaml file and passing the parameters as arguments. To estimate the inhomogeneity for all channels, a single call to `biasfield.estimate` suffices, after which `biasfield.apply` can be called to correct the high-resolution volumes:

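```python
# ims_file: path to the stitched Imaris file; par_file: path to the yaml parameter file
biasfield.estimate(ims_file, par_file)  # estimate the bias field for every channel
biasfield.apply(ims_file, par_file)     # correct the high-resolution volumes
```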
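Conceptually, applying the estimated field means upsampling the low-resolution bias field to the full-resolution grid and dividing it out, since N4 models the bias as multiplicative. The hypothetical sketch below uses nearest-neighbour repetition for the upsampling; the interpolation actually used by `biasfield.apply` is an assumption here, and a smooth B-spline field would usually be resampled with linear or higher-order interpolation.

```python
import numpy as np


def apply_lowres_biasfield(highres, bias_lowres, factors):
    """Divide a full-resolution zyx volume by an upsampled bias field (sketch)."""
    volume = np.asarray(highres, dtype=float)
    field = np.asarray(bias_lowres, dtype=float)
    # Nearest-neighbour upsampling: repeat each voxel f times along each axis.
    for axis, f in enumerate(factors):
        field = np.repeat(field, f, axis=axis)
    if any(fs < vs for fs, vs in zip(field.shape, volume.shape)):
        raise ValueError("upsampled bias field is smaller than the volume")
    # Crop the field to the full-resolution shape and divide out the bias.
    field = field[tuple(slice(0, n) for n in volume.shape)]
    return volume / field


bias_low = np.array([[[1.0, 2.0], [4.0, 8.0]]])  # low-resolution field, shape (1, 2, 2)
volume = np.full((2, 3, 3), 8.0)                 # full-resolution zyx volume
corrected = apply_lowres_biasfield(volume, bias_low, factors=(2, 2, 2))
print(corrected[0])
```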
