# STAPL-3D preprocessing demo

This notebook demonstrates the core components of the STAPL-3D preprocessing pipeline: **z-stack shading** correction and **3D inhomogeneity** correction.

If you came here via an unorthodox route, please find STAPL-3D and the installation instructions [here](https://github.com/RiosGroup/STAPL3D).

Because STAPL-3D is designed for big data files, we provide small cutouts and precomputed summary data. Please download [HFK16w.zip](https://surfdrive.surf.nl/files/index.php/s/Q9wRT5cyKGERxI5) (~6GB). Note that the Preprocessing and Segmentation demos use the same zip file.

First, define where you have put the data. Please change *datadir* to point to the *HFK16w* directory that you have unzipped.

In [None]:
import os

datadir = './HFK16w'
dataset = 'HFK16w'
filestem = os.path.join(datadir, dataset)


## Shading correction

Shading correction (or flatfield correction) attempts to remove the intensity gradients that may be present in the xy-plane of the z-stacks that make up the dataset. These originate from imperfections in the microscope's optics and manifest as a grid over the assembled 3D volume. Because the shading is channel-specific, STAPL-3D estimates a 2D profile for each channel separately from the data.

We define the parameters of the shading correction module in a yaml parameter file that is included in the demo data archive.

In [None]:
import yaml

parameter_file = '{}.yml'.format(filestem)
with open(parameter_file, 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)

cfg['shading_correction']


This means that, in this example, we calculate the *median* values of the z-stacks concatenated over X and over Y, while masking any value below *1000*. We use the *90%* of planes with the highest median intensities to calculate the 1D shading profile, which is fit with a *3rd-order* polynomial.
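A minimal NumPy sketch of the estimation steps described above, for the profile over *y* (concatenating over X). This is not the STAPL-3D implementation: the function name `estimate_1d_shading`, averaging the selected planes with a mean, and normalizing to the maximum are assumptions made for illustration.

```python
import numpy as np


def estimate_1d_shading(stacks, axis=2, mask_threshold=1000, quantile=0.9, order=3):
    """Hypothetical sketch: 1D shading profile from a list of (z, y, x) stacks.

    axis=2 concatenates the stacks over X and yields a profile over y;
    axis=1 concatenates over Y and yields a profile over x.
    """
    data = np.concatenate(stacks, axis=axis).astype(float)
    data[data < mask_threshold] = np.nan              # mask dim voxels
    medians = np.nanmedian(data, axis=axis)           # (z, n): one median per zy (or zx) coordinate
    plane_medians = np.nanmedian(medians, axis=1)     # overall intensity of each z-plane
    cutoff = np.nanquantile(plane_medians, 1 - quantile)
    selected = medians[plane_medians >= cutoff]       # keep the brightest `quantile` fraction of planes
    profile = np.nanmean(selected, axis=0)
    profile /= np.nanmax(profile)                     # normalize to a maximum of 1
    coords = np.arange(profile.size)
    valid = np.isfinite(profile)
    coefs = np.polynomial.polynomial.polyfit(coords[valid], profile[valid], order)
    fit = np.polynomial.polynomial.polyval(coords, coefs)
    return profile, fit / fit.max()
```

Calling it once with `axis=2` and once with `axis=1` gives the two 1D profiles (over *y* and over *x*) for a channel.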

The shading profile is estimated for each channel in parallel. Parallelization is configured in the yaml file; by default, STAPL-3D uses as many processors as there are channels in the dataset, if available. For cluster environments (SGE or SLURM), more specific configurations can be set in the yaml file:

In [None]:
cfg['shading_estimation']
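
To illustrate the default described above (one worker per channel, if enough processors are available), here is a sketch using Python's standard `concurrent.futures`. The helper `run_over_channels` is hypothetical and does not reflect how STAPL-3D dispatches its jobs; threads keep the example simple, whereas a `ProcessPoolExecutor` or separate cluster jobs are the usual choice for heavy CPU-bound work.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_over_channels(data, job, max_workers=None):
    """Hypothetical sketch: apply `job` to each channel of a (..., c) array in parallel.

    By default, use one worker per channel, capped at the number of available CPUs.
    """
    n_channels = data.shape[-1]
    n_workers = max_workers or min(n_channels, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns the results in channel order
        results = list(pool.map(lambda ch: job(data[..., ch]), range(n_channels)))
    return results, n_workers


# usage: per-channel median over z of a small (z, y, x, c) array
data = np.random.default_rng(0).random((4, 8, 8, 2))
medians, n_workers = run_over_channels(data, lambda chdata: np.median(chdata, axis=0))
```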


For demonstration purposes, the data archive includes a 2-channel z-stack (106 x 1024 x 1024 x 2). These two channels were extracted from an 8-channel dataset of 262 stacks, i.e. the cutout is ~0.1% of the data. The stack contains a nuclear channel (DAPI) and a membrane channel (NCAM1).
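The ~0.1% figure follows from taking 2 of 8 channels of 1 of 262 stacks. The sketch below also estimates the uncompressed data size, assuming 16-bit voxels (the bit depth is an assumption; tile overlap and file-format overhead are ignored), to show why the full dataset is handled through cutouts and summary data.

```python
n_stacks = 262
n_channels_total, n_channels_demo = 8, 2
shape_zyx = (106, 1024, 1024)
bytes_per_voxel = 2  # assumption: 16-bit voxels, uncompressed

# fraction of the full dataset contained in the demo cutout
fraction = (n_channels_demo / n_channels_total) / n_stacks
print(f"{fraction:.3%}")  # 0.095%

voxels_per_channel = shape_zyx[0] * shape_zyx[1] * shape_zyx[2]
cutout_bytes = voxels_per_channel * n_channels_demo * bytes_per_voxel
full_bytes = voxels_per_channel * n_channels_total * n_stacks * bytes_per_voxel
print(cutout_bytes / 2**20)  # 424.0 (MiB)
print(full_bytes / 2**30)    # 433.9375 (GiB)
```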

In [None]:
from stapl3d.preprocessing.shading import shading_correction

filepath = '{}_2ch-stack.czi'.format(filestem)
shading_correction(filepath, parameter_file)


For each channel, this writes the estimated shading profile (.tif) and a processing report (.pdf), as well as the calculated medians (.pickle) and a logfile, to the *HFK16w/shading/* directory.

In [None]:
os.listdir(os.path.join(datadir, 'shading'))


For a single stack these results do not look great, because the algorithm needs multiple z-stacks to estimate the shading profile reliably. We therefore provide precalculated medians for the full 262-stack dataset (in *HFK16w/shading_full*) to demonstrate the expected output. First, let's plot the results for a single channel:

In [None]:
from stapl3d.preprocessing.shading import generate_report

channel = 0

shading_dir = os.path.join(datadir, 'shading_full')
generate_report(shading_dir, 'HFK16w_full', channel)


The first row shows the result of concatenating the data over *X*, i.e. yielding a median value for each *yz*-coordinate. The left plot shows the medians of the planes (selected using a *quantile_threshold* parameter of 0.8) in rainbow colours. The right plot shows the normalized profile with confidence intervals, as well as the normalized fit. The bottom left plot shows the median profile over *z*, with the selected planes indicated by tick marks. The bottom right image shows the 2D shading profile used to correct each plane of each z-stack in the channel. The red dashed traces indicate an arbitrary threshold to help flag potential issues with the data; the viridis colormap is also clipped to red at this threshold.
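The sketch below shows how a 2D profile could be built from the two normalized 1D fits and used to correct each plane of a z-stack. Combining the fits with an outer product and correcting by division (the usual flatfield model, in which shading multiplies the true signal) are assumptions here; the helper names are hypothetical and do not come from STAPL-3D.

```python
import numpy as np


def shading_profile_2d(fit_y, fit_x):
    """Hypothetical: combine normalized 1D fits over y and x into a 2D (y, x) profile."""
    profile = np.outer(fit_y, fit_x)
    return profile / profile.max()  # normalize so the brightest position is 1


def correct_stack(stack, profile, eps=1e-6):
    """Divide every z-plane of a (z, y, x) stack by the 2D shading profile."""
    return stack / np.maximum(profile, eps)[np.newaxis, :, :]


# usage: a uniform 'true' signal distorted by shading is recovered exactly
fit_y = 1 - 0.2 * np.linspace(-1, 1, 8) ** 2
fit_x = 1 - 0.1 * np.linspace(-1, 1, 6) ** 2
profile = shading_profile_2d(fit_y, fit_x)
shaded = 1000.0 * np.ones((3, 8, 6)) * profile
corrected = correct_stack(shaded, profile)
```

Because the profile is normalized to 1 at its maximum, this correction leaves the brightest region of each tile unchanged and raises intensities towards the tile edges.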

Now let's run it for all channels, sending the output to PDFs in *HFK16w/shading_full*.

In [None]:
# the full dataset has 8 channels; write a PDF report for each
for ch in range(8):
    generate_report(shading_dir, 'HFK16w_full', channel=ch, ioff=True)


## Stitching

We stitch our data in the proprietary Zeiss Zen or Arivis software packages. If you plan to use your own imaging data, please stitch it and then convert the result to an Imaris or STAPL-3D file format. For the next processing steps, we provide a cutout of the stitched file in Imaris format: *HFK16w/HFK16w_bfc_block.ims*. A free viewer for these data can be downloaded [here](https://imaris.oxinst.com/imaris-viewer).

## Inhomogeneity correction

Next, we correct the stitched file for inhomogeneities such as depth attenuation and uneven penetration of clearing agents and antibodies. This is done with the *N4* algorithm (Tustison et al., 2010, as implemented in SimpleITK) on a downsampled image. For this demo, we provide the downsampled data in an HDF5 file.
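N4 models the bias field as a smooth multiplicative field, estimated in the log domain with B-splines over several fitting levels while iteratively sharpening the intensity histogram. As a much simpler stand-in that only illustrates the multiplicative model (this is *not* N4 and not what STAPL-3D runs), the sketch below fits a low-order 3D polynomial to the log intensities and divides the resulting field out. The function name `polynomial_bias_correction` is hypothetical.

```python
import numpy as np


def polynomial_bias_correction(volume, order=2, mask=None):
    """Hypothetical, simplified stand-in for N4 (not N4 itself).

    Fits a polynomial of total degree `order` in (z, y, x) to log(volume)
    inside `mask` and divides the exponentiated fit out of the volume.
    """
    if mask is None:
        mask = volume > 0
    # normalized coordinates in [-1, 1] along each axis
    z, y, x = np.meshgrid(*[np.linspace(-1, 1, n) for n in volume.shape], indexing="ij")
    terms = [(i, j, k)
             for i in range(order + 1)
             for j in range(order + 1 - i)
             for k in range(order + 1 - i - j)]
    basis = [z**i * y**j * x**k for i, j, k in terms]
    design = np.stack([b[mask] for b in basis], axis=1)
    coefs, *_ = np.linalg.lstsq(design, np.log(volume[mask]), rcond=None)
    log_bias = sum(c * b for c, b in zip(coefs, basis))
    log_bias = log_bias - log_bias[mask].mean()  # preserve the overall intensity level
    bias = np.exp(log_bias)
    return volume / bias, bias


# usage: synthetic 'staining' texture under a smooth bias field
rng = np.random.default_rng(0)
shape = (8, 16, 12)
zz, yy, xx = np.meshgrid(*[np.linspace(-1, 1, n) for n in shape], indexing="ij")
true_bias = np.exp(0.5 * zz - 0.3 * xx**2)   # depth attenuation + lateral falloff
texture = 1 + 0.2 * rng.random(shape)        # stand-in for specific staining
volume = 1000 * texture * true_bias
corrected, bias = polynomial_bias_correction(volume)
```

The low polynomial order plays the same role as the small number of B-spline control points in N4: it restricts the estimated field to slow variations, so the fine texture survives the correction.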

Let's use the STAPL-3D Image class to get some info about this image.


In [None]:
from stapl3d import Image

biasfield_dir = os.path.join(datadir, 'biasfield')
filename = '{}_biasfield.h5/data'.format(dataset)  # HDF5 file followed by the internal dataset name
image_in = os.path.join(biasfield_dir, filename)

im = Image(image_in)
im.load(load_data=False)  # read the image metadata without loading the voxel data

props = im.get_props()

im.close()

props

The image dimensions are *zyxc* = *106 x 263 x 249 x 8* with a voxel size (*zyx*) of *1.2 x 21.3 x 21.3* $\mu$m. This resolution is sufficient to capture the slow intensity variations over the volume.
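A quick check with the numbers above: multiplying shape and voxel size gives the physical extent of the downsampled volume (about 0.13 mm in *z* and about 5.6 x 5.3 mm in *yx*), and the product of the spatial dimensions gives the number of voxels per channel that N4 has to process.

```python
shape_zyxc = (106, 263, 249, 8)
voxel_size_um = (1.2, 21.3, 21.3)  # z, y, x

# physical extent of the downsampled volume along z, y, x
extent_um = [n * s for n, s in zip(shape_zyxc[:3], voxel_size_um)]
print([round(e, 1) for e in extent_um])  # [127.2, 5601.9, 5303.7]

# voxels per channel that N4 has to process
voxels_per_channel = shape_zyxc[0] * shape_zyxc[1] * shape_zyxc[2]
print(voxels_per_channel)  # 6941622
```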

Now run the N4 algorithm. SimpleITK will keep **all your processors busy for a while** when processing a single channel. (HPC cluster deployment offers more control: channels are distributed over separate jobs, and the number of threads used for each channel can be set separately.)

In [None]:
from stapl3d.preprocessing.biasfield import estimate_biasfield

channel = 4
filename = '{}_ch{:02d}_biasfield'.format(dataset, channel)
chstem = os.path.join(biasfield_dir, filename)  # output path prefix for this channel

estimate_biasfield(
    image_in,
    channel=channel,
    downsample_factors=[1, 1, 1, 1, 1],  # no further downsampling: the input is already downsampled
    n_iterations=50,
    n_fitlevels=4,
    n_bspline_cps=[5, 5, 5],  # B-spline control points per spatial dimension
    outputstem=chstem,
    )


Next, we (re)generate the bias field correction report to inspect the result.

In [None]:
# note: this import replaces the shading generate_report imported earlier
from stapl3d.preprocessing.biasfield import generate_report

generate_report(biasfield_dir, 'HFK16w', channel=channel)


The left column shows orthogonal sections of the downsampled dataset for the uncorrected (top) and corrected data (middle), as well as the estimated bias field (bottom). Plotted on the left and top of the images are profiles of the median values over the three axes. The right column offers a closer comparison of the profiles (*mean + SD*) for the corrected (green) vs uncorrected (red) data. The bias field correction yields a much flatter profile along *z* as well as in *xy*. Low-frequency inhomogeneities are removed, while the detail of the specific staining is retained in the corrected data.
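The profile comparison can also be expressed numerically. The sketch below computes mean and SD profiles along each axis and summarizes flatness as the coefficient of variation of the mean profile. This metric and the helper names are illustrative choices, not necessarily what the STAPL-3D report computes.

```python
import numpy as np


def axis_profiles(volume):
    """Mean and SD profiles of a (z, y, x) volume along each axis (collapsing the other two)."""
    profiles = {}
    for axis, name in enumerate("zyx"):
        other = tuple(a for a in range(3) if a != axis)
        profiles[name] = (volume.mean(axis=other), volume.std(axis=other))
    return profiles


def flatness(profile_mean):
    """Coefficient of variation of a mean profile: lower means flatter."""
    return profile_mean.std() / profile_mean.mean()


# usage: synthetic depth attenuation along z, before and after dividing it out
attenuation = np.exp(-np.linspace(0, 1, 20))[:, None, None]
uncorrected = 500 * attenuation * np.ones((20, 8, 8))
corrected = uncorrected / attenuation
flat_before = flatness(axis_profiles(uncorrected)["z"][0])
flat_after = flatness(axis_profiles(corrected)["z"][0])
```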