# STAPL-3D feature extraction demo

This notebook demonstrates the core components of the STAPL-3D feature extraction module.

If you have not yet followed the STAPL-3D README, please install STAPL-3D using the instructions [here](https://github.com/RiosGroup/STAPL3D) before running this demo.


Let's start with some general settings and imports.

In [None]:
# Show all output
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Imports.
import os
import yaml
import urllib.request
from pprint import pprint

# Yaml printing function.
def yprint(ydict):
    """Print dictionary in yaml formatting."""
    print(yaml.dump(ydict, default_flow_style=False))


First, set *projectdir* to the location where the data should be downloaded; the default is the current demo directory. The name of the dataset is *'HFK16w'* (for Human Fetal Kidney - 16 weeks). We create a directory for the dataset and change into it.

In [None]:
projectdir = '.'
dataset = 'HFK16w'

datadir = os.path.join(projectdir, dataset)

os.makedirs(datadir, exist_ok=True)
os.chdir(datadir)
f'working in directory: {os.path.abspath(".")}'


STAPL-3D parameters are preferably defined in a [yaml](https://yaml.org) parameter file. It has a simple structure and can be parsed in both Python and `bash`. We download the example file, read it into a dictionary and list its main entries.

In [None]:
parameter_file = f'{dataset}.yml'

# Download the yml-file.
if not os.path.exists(parameter_file):
    url = 'https://surfdrive.surf.nl/files/index.php/s/SAVgQDPwM4XsLlC/download'
    urllib.request.urlretrieve(url, parameter_file)

# Load parameter file.
with open(parameter_file, 'r') as ymlfile:
    cfg = yaml.safe_load(ymlfile)

# List all entries.
cfg.keys()


## Masking and distance to edge

Define where the dataset can be found:

In [None]:
image_in = f'{dataset}_shading_stitching.ims'


In the example kidney dataset, we use a distance-to-edge feature that is informative for the spatial aspects of the dataset. In particular, developmentally early structures are found in the periphery, while fully formed nephrons will be found nearer the center of the sample. Therefore, we use a distance transform on the sample mask to create a volume that indicates this distance.
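
To make the idea of a distance-to-edge volume concrete, here is a minimal brute-force sketch on a toy 2D mask. It is not the STAPL-3D implementation: the function name `distance_to_edge` and the `spacing` argument are hypothetical, and for real volumes an efficient routine such as `scipy.ndimage.distance_transform_edt` would be the usual choice.

```python
import numpy as np

def distance_to_edge(mask, spacing=None):
    """Brute-force Euclidean distance from each foreground element to the
    nearest background element (hypothetical helper, small inputs only)."""
    mask = np.asarray(mask, dtype=bool)
    spacing = np.ones(mask.ndim) if spacing is None else np.asarray(spacing, dtype=float)
    fg = np.argwhere(mask)    # coordinates inside the sample
    bg = np.argwhere(~mask)   # coordinates outside the sample
    dist = np.zeros(mask.shape, dtype=float)
    if fg.size == 0 or bg.size == 0:
        return dist
    # Pairwise offsets scaled by the element spacing, shape (n_fg, n_bg, ndim).
    offsets = (fg[:, None, :] - bg[None, :, :]) * spacing
    dist[tuple(fg.T)] = np.sqrt((offsets ** 2).sum(axis=-1)).min(axis=1)
    return dist

# Toy "sample": a 5x5 block of foreground inside a 7x7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True
dist = distance_to_edge(mask)
```

Values in `dist` are smallest at the periphery of the mask and largest at its center, which is the property the feature relies on.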

In [None]:
yprint(cfg['mask'])  # in yaml format


The input will be smoothed with a 48 µm kernel. Slicewise thresholds are generated at 1/5 of the median value of the slice intensities, with a minimum of 2000. The calculation of the distance-to-edge volume is switched on.
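
The slicewise threshold rule can be sketched as follows. This is an illustration under assumptions, not the code that `Mask3r` runs: the function name `slicewise_mask` is hypothetical, the smoothing step is left out, and the strict `>` comparison is a guess.

```python
import numpy as np

def slicewise_mask(volume, factor=0.2, minimum=2000):
    """Threshold each z-slice at factor * median, but never below minimum."""
    volume = np.asarray(volume)
    # One median per z-slice (axis 0), computed over the y and x axes.
    medians = np.median(volume, axis=(1, 2))
    thresholds = np.maximum(factor * medians, minimum)
    # Broadcast each slice's threshold over that slice (assumed strict '>').
    mask = volume > thresholds[:, None, None]
    return mask, thresholds

volume = np.stack([
    np.full((4, 4), 5000),    # dim slice: 0.2 * 5000 = 1000, raised to 2000
    np.full((4, 4), 20000),   # bright slice: 0.2 * 20000 = 4000
])
mask, thresholds = slicewise_mask(volume)
```

The minimum keeps nearly empty slices, where the median is dominated by background, from receiving a threshold close to zero.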

In [None]:
from stapl3d.preprocessing import masking

mask3r = masking.Mask3r(image_in, parameter_file, prefix=dataset)
mask3r.run()


Let's inspect the report and the volumes to validate the result:

In [None]:
# (Re)generate the report from the data and plot inline.
ipaths, opaths = mask3r.fill_paths('postprocess')
mask3r.report(outputpath=None, ioff=False, inputs=ipaths, outputs=opaths)

# Initialize viewer.
viewer_settings = {
    'title': 'STAPL3D mask3r demo',
    'axes_visible': False,
    'clim': {'mean': [0, 6000], 'smooth': [0, 6000]},
    'opacity': {'mask': 0.5},
    }
mask3r.view(settings=viewer_settings)


## Feature extraction

The STAPL-3D feature extraction module offers fast extraction of features from large amounts of data. We create a feature table for each datablock using parallel processing, then combine these feature tables while filtering out duplicates of segments that appear in multiple datablocks.
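
As a rough picture of the combine step, the sketch below concatenates two toy per-block tables and keeps one row per segment label. The tables and column names are made up, and keeping the first occurrence is an assumption; the rule STAPL-3D uses to resolve duplicates is not described here.

```python
import pandas as pd

# Hypothetical per-block feature tables; label 3 lies in the overlap
# between the two blocks and therefore appears in both.
block0 = pd.DataFrame({'label': [1, 2, 3], 'area': [120, 95, 143]})
block1 = pd.DataFrame({'label': [3, 4], 'area': [143, 88]})

combined = (
    pd.concat([block0, block1], ignore_index=True)
    .drop_duplicates(subset='label', keep='first')  # assumed: keep first copy
    .set_index('label')
    .sort_index()
)
```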

We supply the following information to the feature extractor:

In [None]:
yprint(cfg['features']['estimate'])  # in yaml format


- We specify names for the 8 channels to appear in the columns of the feature csv output.
- We extract the features from the three separate compartments we have segmented. They are named 'full', 'memb' and 'nucl' and are specified as key-value pairs where the value is the internal path of the hdf5 dataset of the blockfiles.
- We provide 'dist_to_edge' as an additional input to extract the values of this volume at the centroids of the segments. If the dist-to-edge volume has been generated from a downsampled image, the downsample factors need to be provided.
- Morphological and intensity features are chosen by either a predefined feature set ('none', 'minimal', 'medium', 'maximal') or by providing lists of features (see the [scikit-image `regionprops` documentation](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops)).
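
The role of the downsample factors can be illustrated with a small sketch: full-resolution centroid coordinates are divided by the factors to find the matching voxel in the downsampled volume. The function `sample_at_centroids` and the toy data are hypothetical, and nearest-voxel lookup via floor division is an assumption rather than the documented behaviour of the feature extractor.

```python
import numpy as np

def sample_at_centroids(volume_ds, centroids, downsample_factors):
    """Look up values of a downsampled volume at full-resolution centroids."""
    volume_ds = np.asarray(volume_ds)
    centroids = np.asarray(centroids, dtype=float)
    factors = np.asarray(downsample_factors, dtype=float)
    # Convert full-resolution (z, y, x) coordinates to downsampled indices.
    idx = np.floor(centroids / factors).astype(int)
    # Guard against centroids that fall just outside the downsampled grid.
    idx = np.clip(idx, 0, np.array(volume_ds.shape) - 1)
    return volume_ds[tuple(idx.T)]

# Toy distance volume, downsampled 4x in y and x but not in z.
dist_ds = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
centroids = [(0.0, 0.0, 0.0), (1.2, 5.0, 9.5)]
values = sample_at_centroids(dist_ds, centroids, downsample_factors=(1, 4, 4))
```

Without the factors, the second centroid would index far outside the 2 x 3 x 3 downsampled grid.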

We initialize the feature generator and show the chosen feature sets.

In [None]:
from stapl3d.segmentation import features
from importlib import reload
reload(features)

featur3r = features.Featur3r(image_in, parameter_file, prefix=dataset)
featur3r.morphological_features
featur3r.intensity_features


List all predefined feature sets.

In [None]:
for fset in ('none', 'minimal', 'medium', 'maximal'):
    featur3r.morphological_features = featur3r.intensity_features = fset
    featur3r.set_feature_set()
    print(f'Name:\n\t {fset}')
    print(f'Morphological:\n\t {featur3r.morphological_features}')
    print(f'Intensity:\n\t {featur3r.intensity_features}\n')

# Revert to 'medium'
featur3r.morphological_features = featur3r.intensity_features = 'medium'
featur3r.set_feature_set()


To speed up the demo, we restrict the extractor to the first 5 blocks and run the estimation.

In [None]:
featur3r.blocks = list(range(5))
featur3r.estimate()


A csv is generated for each block and segmented compartment:


In [None]:
from glob import glob
import pandas as pd

# Show some of the files.
filelist = glob(os.path.join(os.path.abspath('.'), 'blocks', f'{dataset}_blocks_B*.csv'))
filelist.sort()
filelist[:9]

# Show one of the dataframes.
df = pd.read_csv(filelist[0], index_col='label', header=0)
df.describe()
df.columns


To create a single cell-by-feature matrix, we use the `postprocess` method to collate the cells from all blocks. In this step we can also select features and filter cells by thresholding feature values.

In [None]:
yprint(cfg['features']['postprocess'])  # in yaml format


In [None]:
featur3r.postprocess()
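
Feature selection and threshold-based filtering of cells can be pictured with a small pandas sketch. The table, the `limits` dictionary and the `selected` list are hypothetical and do not reflect the actual layout of the `postprocess` parameters.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        'label': [1, 2, 3, 4],
        'area': [40, 150, 900, 220],
        'DAPI_mean_intensity': [310.0, 1250.0, 980.0, 1430.0],
        'eccentricity': [0.2, 0.5, 0.9, 0.4],
    }
).set_index('label')

# Hypothetical filter specification: column -> (minimum, maximum), inclusive.
limits = {'area': (50, 500)}
# Hypothetical selection of feature columns to keep.
selected = ['area', 'DAPI_mean_intensity']

# Start by keeping every cell, then apply each threshold in turn.
keep = pd.Series(True, index=toy.index)
for column, (lower, upper) in limits.items():
    keep &= toy[column].between(lower, upper)

filtered = toy.loc[keep, selected]
```

Here cells 1 and 3 are dropped because their area falls outside the allowed range, and only the two selected columns remain.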


In [None]:
# Show the dataframe.
inpath = featur3r.outputpaths['postprocess']['feature_csv']
df = pd.read_csv(inpath, index_col='label', header=0)
df.describe()
df.columns

# Plot histograms of intensity features.
cols = [col for col in df.columns if 'intensity' in col]
df.hist(column=cols, bins=100, layout=(4, 2), figsize=(16, 16))


## Backprojected visualization

In [None]:
from stapl3d import backproject

backproject3r = backproject.Backproject3r(image_in, parameter_file, prefix=dataset)
backproject3r.backproject()
backproject3r.postprocess()
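
One common way to think about backprojection is as a lookup table that paints each segment's feature value onto the voxels carrying its label. The sketch below shows that idea on a toy 2D label image; it is an assumption about the general technique, not a description of how `Backproject3r` is implemented, and the name `backproject_feature` is hypothetical.

```python
import numpy as np

def backproject_feature(labels, feature_by_label, background=0.0):
    """Map per-segment feature values onto a label image via a lookup table."""
    labels = np.asarray(labels)
    # The table must cover every label in the image and in the feature table.
    size = max(int(labels.max()), max(feature_by_label, default=0)) + 1
    lut = np.full(size, background, dtype=float)
    for label, value in feature_by_label.items():
        lut[label] = value
    # Fancy indexing replaces each label by its feature value.
    return lut[labels]

labels = np.array([[0, 1, 1],
                   [2, 2, 0]])
feature_image = backproject_feature(labels, {1: 10.0, 2: 20.0})
```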


In [None]:
images = ['area_nucl', 'SIX2_mean_intensity_nucl']
labels = ['label', 'block']

viewer_settings = {
    'title': 'STAPL3D backproject3r demo',
    'crosshairs': [int(backproject3r.blocksize[dim] / 2) for dim in 'zyx'],
    'axes_visible': False,
}

filepath = backproject3r.outputpaths['postprocess']['aggregate']
backproject3r.view(input=filepath, images=images, labels=labels, settings=viewer_settings)
