# STAPL-3D equalization assay demo

This notebook demonstrates the core components of the STAPL-3D equalization assay analysis pipeline.

If you have not yet followed the STAPL-3D README, please find STAPL-3D and its installation instructions [here](https://github.com/RiosGroup/STAPL3D) before running this demo.


Let's start with some general settings and imports.

In [None]:
# Show all output
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Imports.
import os
import yaml
import zipfile
import urllib.request

from stapl3d import equalization


First, define where you want the data to be downloaded by changing *projectdir*; default is the current demo directory.


In [None]:
projectdir = os.path.abspath('.')

dataset = 'EqualizationAssay'
datadir = os.path.join(projectdir, dataset)


Download and extract the data.

In [None]:
zipfilepath = os.path.join(projectdir, 'equalization.zip')

if not os.path.exists(zipfilepath):
    url = 'https://surfdrive.surf.nl/files/index.php/s/zgGc56IGc3atXMd/download'
    urllib.request.urlretrieve(url, zipfilepath)

if not os.path.exists(datadir):
    with zipfile.ZipFile(zipfilepath, 'r') as zf:
        zf.extractall(projectdir)


The name of the extracted dataset is *EqualizationAssay*. Change into that directory.

In [None]:
os.chdir(datadir)
f'working in directory: {os.path.abspath(".")}'


## Set up the analysis

In the recommended setup, the files are organized in a directory tree with 
- *species* as the first level
- *antibody* as the second level
- *repetitions* are saved as individual czi files.

*\<datadir\>/\<species\>/\<antibody\>/\<repetition\>.\<ext\>*

Primaries are included as a separate *species*.

*\<datadir\>/primaries/\<antibody\>/\<repetition\>.\<ext\>*

In the analysis, the directory tree is used to group files and the directory names will be used in outputs and plots.
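To illustrate how the directory names can carry the grouping information, here is a minimal sketch that splits a path of the form `<datadir>/<species>/<antibody>/<repetition>.<ext>` into its parts. The function `describe` and the example path are hypothetical; this is not how STAPL-3D parses the tree internally.

```python
from pathlib import PurePosixPath


def describe(filepath, datadir):
    """Split <datadir>/<species>/<antibody>/<repetition>.<ext> into its parts."""
    rel = PurePosixPath(filepath).relative_to(datadir)
    species, antibody, filename = rel.parts
    return {
        'species': species,
        'antibody': antibody,
        'repetition': PurePosixPath(filename).stem,
        # Primaries are stored as a separate 'species' directory.
        'primaries': species == 'primaries',
    }


# Hypothetical path, for illustration only.
datadir = '/data/EqualizationAssay'
info = describe('/data/EqualizationAssay/primaries/antibodyA/rep_1.czi', datadir)
print(info)
```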


Let's list an example for each level for the downloaded data.


In [None]:
ldir = datadir
for l in ['First', 'Second', 'Reps']:
    l1 = os.listdir(ldir)
    f'{l} level {ldir}:'
    l1
    ldir = os.path.join(ldir, l1[0])


We prepare the equalization analysis by initializing an `equaliz3r` object and pointing it to the directory we wish to analyse.

In [None]:
equaliz3r = equalization.Equaliz3r(datadir)


#### File selection

By default, the STAPL-3D equalization assay analysis pipeline expects a particular directory structure within the specified data directory: ```<datadir>/<species>/<antibody>/<repetition>.<ext>```

It will search for files in `datadir` according to this structure. It can be adapted by setting:
- `equaliz3r.filepat` 
- `equaliz3r.use_dirtree` 

followed by a call to 
`equaliz3r.set_filepaths()`
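The difference between searching the directory tree and searching `datadir` itself can be pictured with plain `glob` patterns. The sketch below is an assumption about the general idea, not the STAPL-3D implementation; `find_files` and all file names are hypothetical.

```python
import glob
import os
import tempfile


def find_files(datadir, filepat='*.*', use_dirtree=True):
    """Illustrative lookup only; not the STAPL-3D implementation."""
    if use_dirtree:
        # Look two levels down: <datadir>/<species>/<antibody>/<filepat>
        pattern = os.path.join(datadir, '*', '*', filepat)
    else:
        # Look directly in <datadir>
        pattern = os.path.join(datadir, filepat)
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny hypothetical dataset.
    leaf = os.path.join(tmp, 'speciesA', 'antibody1')
    os.makedirs(leaf)
    for name in ['rep_1.czi', 'rep_1.h5']:
        open(os.path.join(leaf, name), 'w').close()
    open(os.path.join(tmp, 'flat.czi'), 'w').close()

    tree_czi = [os.path.relpath(p, tmp) for p in find_files(tmp, '*.czi', True)]
    flat_czi = [os.path.relpath(p, tmp) for p in find_files(tmp, '*.czi', False)]

print(tree_czi)
print(flat_czi)
```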


In [None]:
# select all files (default)
equaliz3r.filepat = '*.*'
equaliz3r.set_filepaths()
equaliz3r.filepaths


In [None]:
# search for czi-files in `datadir` rather than a directory tree
equaliz3r.use_dirtree = False
equaliz3r.filepat = '*.czi'
equaliz3r.set_filepaths()
equaliz3r.filepaths


NOTE: if all files are provided in a single directory, the layout of primaries and secondaries can be specified in the yml parameter file; otherwise it has to be handled in postprocessing.

```
equalization:
    primaries:
        <antibody1>: <filestem1>
        <antibody2>: <filestem2>
        ...
    secondaries:
        <species1>:
            <antibody1>: <filestem1>
            <antibody2>: <filestem2>
            ...
        ...
```


In [None]:
# select all czi-files starting with 'MAP2_ck488' in the directory tree
equaliz3r.use_dirtree = True
equaliz3r.filepat = 'MAP2_ck488*.czi'
equaliz3r.set_filepaths()
equaliz3r.filepaths


## Basic run and report

Let's run the analysis on this small subset first.

In [None]:
equaliz3r.run()


It performs the following steps:
+ Smooth the input image

`equaliz3r.smooth()`
+ Separate into noise regions, background tissue regions, and foreground tissue regions

`equaliz3r.segment()`
+ Calculate summary measures for foreground and background, print report for each file

`equaliz3r.metrics()`
+ Gather files of individual repeats and generate a summary report

`equaliz3r.postprocess()`


An hdf5-file is created with smoothed images and segmentations in the same directory as the raw datafile.
An *equalization* subdirectory is created that will contain reports, parameter files and the results-csv.

Now look at the pdf report that is generated for one of the files.

In [None]:
im_idx = 1
filepath = equaliz3r.filepaths[im_idx]

filestem, inputs, outputs = equaliz3r._get_filepaths_inout(filepath)
equaliz3r.report(
    outputpath=None,
    ioff=False,
    name=filestem,
    filestem=filestem,
    inputs=inputs,
    outputs=outputs,
)


On the top row, the report shows the raw image with the histogram. 

The second row displays the smoothed image with tissue (blue) and noise (red) region thresholds. These thresholds are superimposed on the histogram of the smoothed image on the right. By default, these thresholds are derived by multiplying the image's *otsu* threshold (green) by tunable factors: `equaliz3r.otsu_factor_noise = 0.1` and `equaliz3r.otsu_factor_tissue = 1.1`.

The third row shows the segmentation of the noise (green), foreground (cyan) and background (magenta) regions with the histogram of the tissue region.

Metrics for this image are printed next to the histogram:
  - foreground: median value of the foreground tissue (cyan)
  - background: median value of the background tissue (magenta)
  - contrast: `C = background / foreground`
  - contrast-to-noise: `CNR = (foreground - background) / SD(noise)`
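The formulas above can be sketched with the standard library. This is a minimal illustration, not the STAPL-3D code: the function `summarize`, the toy pixel values and the use of the population standard deviation for `SD(noise)` are assumptions.

```python
from statistics import median, pstdev


def summarize(foreground_px, background_px, noise_px):
    """Compute the report metrics from three lists of pixel intensities."""
    fg = median(foreground_px)
    bg = median(background_px)
    sd = pstdev(noise_px)  # assumption: population SD of the noise region
    return {
        'foreground': fg,
        'background': bg,
        'contrast': bg / fg,       # C = background / foreground
        'cnr': (fg - bg) / sd,     # CNR = (foreground - background) / SD(noise)
    }


# Toy intensities, for illustration only.
m = summarize(
    foreground_px=[900, 1000, 1100],
    background_px=[180, 200, 220],
    noise_px=[8, 12, 8, 12],
)
print(m)
```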


These values are tabulated in a pandas dataframe.


In [None]:
equaliz3r.df

The *antibody* and *species* columns are derived from the directory tree, as are the *primaries* and *secondaries* boolean indicators.

This dataframe is also written to a csv-file in the *equalization* subdirectory for further processing.

## Picking parameters for each step

We will now go through the analysis for all files step by step with a closer look at the parameters that can be adapted. First, create a new `equaliz3r` object. We set the verbosity to 0 to reduce output clutter. Only czi-files are selected, because h5-files were created in the previous steps. 

In [None]:
equaliz3r = equalization.Equaliz3r(datadir, verbosity=0)
equaliz3r.filepat = '*.czi'
#equaliz3r.filepat = 'MAP2_ck488*.czi'
equaliz3r.set_filepaths()

# Pick image for visualization.
im_idx = 1
filepath = equaliz3r.filepaths[im_idx]


### Smooth
The first step in the analysis is smoothing the images so that the tissue boundaries can be detected. This generates an hdf5-file next to each original file, containing the smoothed image.


In [None]:
equaliz3r.smooth()


We can visualize the images in napari.

In [None]:
images = ['data', 'smooth']
labels = None

equaliz3r.view(filepath, images, labels)


##### smoothing kernel size
If the image is smoothed too much or too little to obtain a good tissue vs noise region segmentation, the `sigma` parameter can be adapted.
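To get a feeling for what a larger `sigma` does, the sketch below smooths a 1D signal with a Gaussian kernel in pure Python. It assumes that the smoothing is Gaussian, which the name `sigma` suggests but the text does not state; `gaussian_smooth` is a hypothetical helper and works on a list rather than an image.

```python
import math


def gaussian_smooth(signal, sigma):
    """Smooth a 1D list with a normalized Gaussian kernel (edges are clamped)."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    kernel = [w / total for w in kernel]

    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # clamp the index at the borders
            acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


# A single bright pixel in an otherwise empty signal.
signal = [0] * 10 + [100] + [0] * 10
narrow = gaussian_smooth(signal, sigma=1)
wide = gaussian_smooth(signal, sigma=3)
print(round(max(narrow), 2), round(max(wide), 2))
```

The larger kernel spreads the same total intensity over more pixels, so the peak is lower and small bright specks contribute less to the thresholded mask.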

In [None]:
equaliz3r.sigma = 20

# save in new dataset
equaliz3r.outputpaths['smooth']['smooth'] = equaliz3r.outputpaths['smooth']['smooth'].replace('/smooth', '/smooth20')

equaliz3r.smooth()

images = ['data', 'smooth', 'smooth20']
equaliz3r.view(filepath, images, labels)


### Segment

Next, we segment the image into regions. First, the image is separated into noise regions and tissue regions by thresholding the smoothed image. Then, foreground (signal-of-interest) is separated from background within the tissue region.


In [None]:
equaliz3r.segment()


In [None]:
images = ['data', 'smooth']
labels = ['noise_mask', 'tissue_mask']

equaliz3r.view(filepath, images, labels)


#### noise / tissue region thresholds

Thresholds applied to the smoothed image can be set by 

1. manual specification for each file (via attribute or yml), where `key` is the filename without the extension

    `equaliz3r.thresholds[key] = [1000, 2000]`


2. global specification for all files (via attribute or yml)

    `equaliz3r.threshold_noise = 1000`
    
    `equaliz3r.threshold_tissue = 2000` 


3. calculation of the otsu threshold, after which the two thresholds are computed via

    threshold_noise: `min(data) + otsu * otsu_factor_noise`

    threshold_tissue: `otsu * otsu_factor_tissue`

NOTE: The default segmentation procedure uses the otsu method to calculate the thresholds. If either `equaliz3r.threshold_noise` or `equaliz3r.threshold_tissue` is specified, these thresholds are used instead. If `equaliz3r.thresholds` is specified for a file, it will be used for that particular file.
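The precedence described in the note can be summarized in a small function. This is a sketch under stated assumptions, not the STAPL-3D implementation: `resolve_thresholds` is a hypothetical name, the otsu value is passed in rather than computed, and a value of 0 is treated as "not specified" (mirroring how the cells in this notebook reset the global thresholds to 0).

```python
def resolve_thresholds(key, data_min, otsu, thresholds=None,
                       threshold_noise=0, threshold_tissue=0,
                       otsu_factor_noise=0.1, otsu_factor_tissue=1.1):
    """Return the (noise, tissue) thresholds for the file identified by `key`."""
    thresholds = thresholds or {}

    # 1. A per-file specification takes precedence.
    if key in thresholds:
        noise, tissue = thresholds[key]
        return noise, tissue

    # 2. Otherwise use the global thresholds, if either is set (assumption: 0 = unset).
    if threshold_noise or threshold_tissue:
        return threshold_noise, threshold_tissue

    # 3. Otherwise derive both thresholds from the otsu threshold.
    return (data_min + otsu * otsu_factor_noise, otsu * otsu_factor_tissue)


# Toy values, for illustration only.
per_file = resolve_thresholds('fileA', 100, 2000, thresholds={'fileA': [1000, 2000]})
global_ = resolve_thresholds('fileB', 100, 2000, threshold_tissue=200)
derived = resolve_thresholds('fileB', 100, 2000)
print(per_file, global_, derived)
```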


The next cells demonstrate changing these parameters. We define a convenience function that saves the results under a new name, reruns the segmentation and visualizes the result next to the default segmentation.

In [None]:
# Copy the default output paths, so each run derives its names from the originals.
outputs = dict(equaliz3r.outputpaths['segment'])

def run_segment(equaliz3r, suffix, labels=('noise_mask', 'tissue_mask')):

    # save in new dataset
    for ids in ['noise_mask', 'tissue_mask', 'segmentation']:
        equaliz3r.outputpaths['segment'][ids] = outputs[ids].replace(ids, ids + suffix)

    # run
    equaliz3r.segment()

    # plot comparison (build a new list instead of mutating the default argument)
    images = ['data']
    labels = list(labels) + [f'{ids}{suffix}' for ids in labels]
    equaliz3r.view(filepath, images, labels)


##### otsu thresholding (automatic)


In [None]:
equaliz3r.thresholds = {}
equaliz3r.threshold_noise, equaliz3r.threshold_tissue = 0, 0

equaliz3r.otsu_factor_noise = 0.3
equaliz3r.otsu_factor_tissue = 1.2

run_segment(equaliz3r, '_otsu')


##### global thresholding

In [None]:
equaliz3r.thresholds = {}
equaliz3r.threshold_noise, equaliz3r.threshold_tissue = 0, 0

equaliz3r.threshold_noise = 0
equaliz3r.threshold_tissue = 200

run_segment(equaliz3r, '_global')


##### individual thresholding

In [None]:
equaliz3r.thresholds = {}
equaliz3r.threshold_noise, equaliz3r.threshold_tissue = 0, 0

some_new_thresholds = {
    'MAP2_ck488_sec_2_1': [1000, 6000],
    'MAP2_ck488_sec_2_2': [3000, 7000],
    'MAP2_ck488_sec_2_3': [1000, 6000],
    'MAP2_ck488_sec_4_1': [2000, 5000],
    'MAP2_ck488_sec_4_2': [1000, 6000],
    'MAP2_ck488_sec_4_3': [2000, 5000],
}
for k, v in some_new_thresholds.items():
    equaliz3r.thresholds[k] = v

run_segment(equaliz3r, '_individual')


#### foreground / background separation

The segmentation also separates the tissue region into foreground and background for quantification of the signal-of-interest. The associated parameters are:

  `equaliz3r.segment_quantile` The intensity quantile of the tissue region at which to make the split.

  `equaliz3r.segment_min_size` The minimum number of connected pixels in a foreground patch; smaller patches are discarded, which removes isolated noisy high-intensity pixels.
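A minimal sketch of how these two parameters interact, in pure Python on a tiny 2D grid. It is not the STAPL-3D implementation: the nearest-rank quantile, the 4-connectivity, the label values (0 = outside tissue, 1 = background, 2 = foreground) and the function names are assumptions for illustration.

```python
from collections import deque


def quantile_value(values, q):
    """Nearest-rank quantile of a non-empty list (a simple stand-in)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def split_tissue(image, tissue, quantile, min_size):
    """Label each pixel: 0 outside tissue, 1 background, 2 foreground."""
    rows, cols = len(image), len(image[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    cut = quantile_value([image[r][c] for r, c in cells if tissue[r][c]], quantile)
    fg = [[bool(tissue[r][c]) and image[r][c] >= cut for c in range(cols)]
          for r in range(rows)]

    seen = [[False] * cols for _ in range(rows)]
    for r, c in cells:
        if not fg[r][c] or seen[r][c]:
            continue
        # Collect one 4-connected foreground patch with a breadth-first search.
        patch, queue = [], deque([(r, c)])
        seen[r][c] = True
        while queue:
            y, x = queue.popleft()
            patch.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < rows and 0 <= nx < cols and fg[ny][nx] and not seen[ny][nx]:
                    seen[ny][nx] = True
                    queue.append((ny, nx))
        # Patches smaller than min_size are returned to the background.
        if len(patch) < min_size:
            for y, x in patch:
                fg[y][x] = False

    return [[(2 if fg[r][c] else 1) if tissue[r][c] else 0 for c in range(cols)]
            for r in range(rows)]


# Toy image: one isolated bright pixel (9) and one patch of three bright pixels (8).
image = [[0, 1, 1, 9],
         [1, 8, 8, 1],
         [1, 8, 1, 1]]
tissue = [[0, 1, 1, 1],
          [1, 1, 1, 1],
          [1, 1, 1, 1]]
seg = split_tissue(image, tissue, quantile=0.7, min_size=2)
for row in seg:
    print(row)
```

In this toy example the isolated bright pixel is above the quantile cut but is dropped from the foreground because its patch is smaller than `min_size`.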


First, visualize the original default segmentation:

In [None]:
# plot comparison
images = ['data']
labels = ['segmentation']
equaliz3r.view(filepath, images, labels)

f'quantile: {equaliz3r.segment_quantile}'
f'patch size: {equaliz3r.segment_min_size}'


In [None]:
equaliz3r.thresholds = {}
equaliz3r.threshold_noise, equaliz3r.threshold_tissue = 0, 0

equaliz3r.segment_quantile = 0.90
equaliz3r.segment_min_size = 7

run_segment(equaliz3r, '_fg_bg', labels=['segmentation'])


The lowered `equaliz3r.segment_quantile` setting included more pixels in the foreground, while the increased `equaliz3r.segment_min_size` retained only bigger patches.

### Metrics

To quantify the signal-of-interest in the image, there is a choice between a small number of methods and metrics. 

Metrics:
  - *foreground*: median value of the foreground tissue
  - *background*: median value of the background tissue
  - *signal-to-noise*: `SNR = foreground / SD(noise)`
  - *contrast*: `C = background / foreground`
  - *contrast-to-noise*: `CNR = (foreground - background) / SD(noise)`

Methods: `equaliz3r.methods = ['seg', 'q_base', 'q_clip', 'q_mask']`
- *seg*: median values of the foreground and the background pixels in the tissue mask
- *q_base*: quantiles of the image pixels
- *q_clip*: quantiles of the image pixels with clipping values excluded
- *q_mask*: quantiles of the image pixels in the tissue mask with clipping values excluded

In the 'quantile'-methods, foreground and background quantiles are specified through: `equaliz3r.quantiles = [0.50, 0.99]`

The preferred method, however, is to use the segmentation.
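For comparison, the idea behind the quantile-based methods can be sketched as follows. Several points are assumptions rather than facts about STAPL-3D: that the first quantile gives the background and the second the foreground, that "clipping values" are the saturated extremes of the intensity range (0 and 255 in this 8-bit toy example), and the helper names themselves.

```python
def quantile_value(values, q):
    """Nearest-rank quantile of a non-empty list (a simple stand-in)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def quantile_metrics(pixels, quantiles=(0.50, 0.99), clip_values=()):
    """Return (background, foreground) as quantiles of the retained pixels."""
    kept = [p for p in pixels if p not in clip_values]
    background = quantile_value(kept, quantiles[0])
    foreground = quantile_value(kept, quantiles[1])
    return background, foreground


# Toy 8-bit intensities with saturated pixels at both ends.
pixels = [0, 10, 20, 30, 40, 255, 255]
q_base = quantile_metrics(pixels)                        # all pixels
q_clip = quantile_metrics(pixels, clip_values=(0, 255))  # saturated pixels excluded
print(q_base, q_clip)
```

Here the saturated pixels dominate the upper quantile unless they are excluded, which is one reason the quantile-based estimates can differ from the segmentation-based ones.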




In [None]:
equaliz3r.methods = ['seg', 'q_base', 'q_clip', 'q_mask']
equaliz3r.quantiles = [0.50, 0.9]


In [None]:
equaliz3r.metrics()
equaliz3r.postprocess()
equaliz3r.df
