#  `infer_subc` :  a tool/framework for inferring segmented organelles from multi-channel fluorescent 3D images.

## ❹. QUANTIFICATION 📏📐🧮

SUBCELLULAR COMPONENT METRICS
-  extent 
-  size
-  shape
-  position



This notebook illustrates various "stats" routines developed to quantify and gather statistics about the segmentations.



## IMPORTS

In [1]:
# top level imports
from pathlib import Path
import os, sys


import napari

### import local python functions in ../infer_subc
sys.path.append(os.path.abspath((os.path.join(os.getcwd(), '..'))))

from infer_subc.core.file_io import (read_czi_image,
                                        export_inferred_organelle,
                                        import_inferred_organelle,
                                        export_tiff,
                                        list_image_files)

from infer_subc.core.img import *
from infer_subc.utils.stats import *
from infer_subc.utils.stats_helpers import *

from infer_subc.organelles import * 

import time
%load_ext autoreload
%autoreload 2



In [2]:
# NOTE:  these "constants" are only accurate for the testing MCZ dataset
from infer_subc.constants import (TEST_IMG_N,
                                    NUC_CH ,
                                    LYSO_CH ,
                                    MITO_CH ,
                                    GOLGI_CH ,
                                    PEROX_CH ,
                                    ER_CH ,
                                    LD_CH ,
                                    RESIDUAL_CH )              


## SETUP

In [3]:
# this will be the example image for testing the pipeline below
test_img_n = TEST_IMG_N

# build the datapath
# all the imaging data goes here.
data_root_path = Path(os.path.expanduser("~")) / "Projects/Imaging/data"

# linearly unmixed ".czi" files are here
in_data_path = data_root_path / "raw"
im_type = ".czi"

# get the list of all files
img_file_list = list_image_files(in_data_path,im_type)
test_img_name = img_file_list[test_img_n]

# save output ".tiff" files here
out_data_path = data_root_path / "out"

if not out_data_path.exists():
    out_data_path.mkdir()
    print(f"making {out_data_path}")

In [4]:
img_data,meta_dict = read_czi_image(test_img_name)

# get some top-level info about the RAW data
channel_names = meta_dict['name']
img = meta_dict['metadata']['aicsimage']
scale = meta_dict['scale']
channel_axis = meta_dict['channel_axis']

source_file = meta_dict['file_name']



In [5]:
scale[0]/scale[1], scale

(7.267318290023735,
 (0.5804527163320905, 0.07987165184837318, 0.07987165184837318))
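The ratio above says the Z step is roughly 7.3 times the XY pixel size, so voxel counts should be scaled before they are read as physical volumes. A minimal sketch of that conversion, assuming `scale` is ordered (Z, Y, X) and given in microns (the units are an assumption, and `voxels_to_volume` is a hypothetical helper, not part of `infer_subc`):

```python
import math

# (Z, Y, X) voxel spacing copied from the cell output above; units assumed to be microns
scale = (0.5804527163320905, 0.07987165184837318, 0.07987165184837318)

anisotropy = scale[0] / scale[1]   # Z step relative to the XY pixel size
voxel_volume = math.prod(scale)    # physical volume of a single voxel


def voxels_to_volume(n_voxels, scale):
    """Convert a voxel count into a physical volume (hypothetical helper)."""
    return n_voxels * math.prod(scale)


print(f"anisotropy:   {anisotropy:.3f}")
print(f"voxel volume: {voxel_volume:.6f}")
print(f"1000 voxels:  {voxels_to_volume(1000, scale):.3f}")
```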

Now get the single "optimal" slice of all our organelle channels....

## get the inferred cellmask, nuclei and cytoplasm objects

(takes < 1 sec)

Build the segmentations in order




In [6]:

###################
# CELLMASK, NUCLEI, CYTOPLASM, NUCLEUS
###################
nuclei_obj =  get_nuclei(img_data,meta_dict, out_data_path)
cellmask_obj = get_cellmask(img_data, nuclei_obj, meta_dict, out_data_path)
cyto_mask = get_cytoplasm(nuclei_obj , cellmask_obj , meta_dict, out_data_path)



loaded  inferred 3D `nuclei`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `cellmask`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `cytoplasm`  from /Users/ergonyc/Projects/Imaging/data/out 


-------------------------
## regionprops

`skimage.measure.regionprops` provides the basic tools necessary to quantify our segmentations.

First let's see what works in 3D.  

> Note: the regionprops names follow the 2D convention even for properties that are well defined in 3D, e.g. "area" is actually "volume" in 3D.
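To make the naming point concrete, here is a small sketch on a toy 3D array (the array and object sizes are made up for illustration). It shows that `area` on a 3D label image is the voxel count of each object, i.e. its volume in voxels:

```python
import numpy as np
from skimage.measure import label, regionprops

# toy 3D binary segmentation (Z, Y, X) holding two separate objects
seg = np.zeros((4, 8, 8), dtype=bool)
seg[0:2, 1:3, 1:3] = True   # 2 x 2 x 2 = 8 voxels
seg[2:4, 4:7, 4:7] = True   # 2 x 3 x 3 = 18 voxels

labels = label(seg)
props = regionprops(labels)

# "area" is the number of voxels in each object for a 3D label image
volumes = {p.label: int(p.area) for p in props}
centroids = {p.label: p.centroid for p in props}   # (z, y, x) order

print(volumes)
```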


-----------------
## basic stats

### per-organelle


- regionprops 


### summary stats

- group + aggregate:  surface_area, volume
  - median
  - mean
  - std 
  - count

- normalizers
  - CELLMASK?
  - CYTOPLASM?
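The group + aggregate step listed above can be sketched with a pandas `groupby`. The table below is toy data, and its column names (`organelle`, `volume`, `surface_area`) are assumptions chosen to match the list, not the layout that `infer_subc` writes:

```python
import pandas as pd

# toy per-object table; all values are made up for illustration
stats = pd.DataFrame({
    "organelle": ["lyso", "lyso", "lyso", "golgi", "golgi"],
    "volume": [76, 1547, 83, 2019, 61],
    "surface_area": [120.0, 900.0, 130.0, 1500.0, 95.0],
})

# one row per organelle, one (measure, statistic) column pair per aggregate
summary = stats.groupby("organelle")[["volume", "surface_area"]].agg(
    ["median", "mean", "std", "count"]
)

print(summary)
```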

### nuclei caveats
The other organelles are sensibly normalized by cytoplasm.  Does normalizing the nuclei by cytoplasm make sense, or should we use the cellmask?

Let's see which regionprops measures are sensible for 3D (volumetric) data.

In [7]:
if False:
    labels = label(nuclei_obj )
    rp = regionprops(labels, intensity_image=img_data[NUC_CH])

    supported = [] 
    unsupported = []

    for prop in rp[0]:
        try:
            rp[0][prop]
            supported.append(prop)
        except NotImplementedError:
            unsupported.append(prop)

    print("Supported properties:")
    print("  " + "\n  ".join(supported))
    print()
    print("Unsupported properties:")
    print("  " + "\n  ".join(unsupported))

Supported properties:
-  area
-  bbox
-  bbox_area
-  centroid
-  convex_area
-  convex_image
-  coords
-  equivalent_diameter
-  euler_number
-  extent
-  feret_diameter_max
-  filled_area
-  filled_image
-  image
-  inertia_tensor
-  inertia_tensor_eigvals
-  intensity_image
-  label
-  local_centroid
-  major_axis_length
-  max_intensity
-  mean_intensity
-  min_intensity
-  minor_axis_length
-  moments
-  moments_central
-  moments_normalized
-  slice
-  solidity
-  weighted_centroid
-  weighted_local_centroid
-  weighted_moments
-  weighted_moments_central
-  weighted_moments_normalized

Unsupported properties:
-  eccentricity
-  moments_hu
-  orientation
-  perimeter
-  perimeter_crofton
-  weighted_moments_hu
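Properties from the supported list can be collected into a table in one call with `skimage.measure.regionprops_table`. The sketch below uses a toy array and renames `area` to `volume`; that rename is an assumption about how one might label the 3D result, not a statement about what the `infer_subc` stats routines do internally:

```python
import numpy as np
import pandas as pd
from skimage.measure import label, regionprops_table

# toy 3D binary segmentation (Z, Y, X) holding two separate objects
seg = np.zeros((4, 8, 8), dtype=bool)
seg[0:2, 1:3, 1:3] = True
seg[2:4, 4:7, 4:7] = True

# multi-valued properties such as "centroid" expand to centroid-0, centroid-1, ...
table = pd.DataFrame(
    regionprops_table(label(seg), properties=("label", "area", "centroid"))
).rename(columns={"area": "volume"})

print(table)
```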

In [8]:
# from scipy.ndimage import find_objects
    
# labels = label(nuclei_obj ).astype("int")
# objects = find_objects(labels)

# # objects are the slices into the original array for each organelle

In [9]:
# get overall summary stats for cellmask
cm_intensity =  raw_cellmask_fromaggr(img_data, scale_min_max=False)


In [10]:
cellmask_obj.dtype

dtype('bool')

Now we want to get a list of our organelle names, segmentations, and intensities (fluorescence).

In [11]:
# names of organelles we have
organelle_names = ["nuclei","lyso", "mitochondria","golgi","peroxisome","er","lipid"]

get_methods  = [get_nuclei,
                get_lyso,
                get_mito,
                get_golgi,
                get_perox,
                get_ER,
                get_LD]

# load all the organelle segmentations
organelles = [meth(img_data,meta_dict, out_data_path) for meth in get_methods]

# get the intensities
organelle_channels = [NUC_CH, LYSO_CH,MITO_CH,GOLGI_CH,PEROX_CH,ER_CH,LD_CH]

intensities = [img_data[ch] for ch in organelle_channels]


loaded  inferred 3D `nuclei`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `lyso`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded lyso in (0.00) sec
loaded  inferred 3D `mitochondria`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `golgi`  from /Users/ergonyc/Projects/Imaging/data/out 
starting segmentation...
loaded  inferred 3D `peroxisome`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded peroxisome in (0.00) sec
loaded  inferred 3D `er`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `lipid`  from /Users/ergonyc/Projects/Imaging/data/out 



-----------------
## CONTACTS (cross-stats)

### organelle cross stats


- regionprops 



- intersect for A vs all other organelles Bi
  - regionprops on A ∩ Bi

   
- contacts?
  - dilate then intersect?
  - loop through each sub-object for each 
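The two ideas in the list above (intersect A with B, and dilate then intersect) can be sketched with `scipy.ndimage` on toy masks. This is only an illustration of the concept under simple assumptions (a one-voxel dilation with the default structuring element); it is not the implementation behind `dump_organelle_stats` or `dump_shell_cross_stats`:

```python
import numpy as np
from scipy import ndimage as ndi

a = np.zeros((3, 10, 10), dtype=bool)   # organelle A (one object)
b = np.zeros((3, 10, 10), dtype=bool)   # organelle B (three objects)
a[1, 2:5, 2:5] = True
b[1, 1, 2:4] = True       # adjacent to A, no shared voxels
b[1, 4:7, 4:7] = True     # shares one voxel with A
b[1, 8:10, 0:2] = True    # far from A

b_labels, n_b = ndi.label(b)

# overlap: voxels that belong to both A and B
overlap = a & b
overlap_voxels = int(overlap.sum())
overlap_labels = np.unique(b_labels[overlap]).tolist()

# contact: grow A by one voxel, then intersect the grown mask with B
a_grown = ndi.binary_dilation(a, iterations=1)
contact_labels = np.unique(b_labels[a_grown & b]).tolist()

print(overlap_voxels, overlap_labels, contact_labels)
```

The adjacent B object only shows up once A is dilated, which is the reason to consider a dilation step for contacts.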



In [12]:


n_files = dump_organelle_stats(organelle_names, organelles,intensities, cyto_mask, out_data_path, source_file)

getting stats for A = nuclei
  b = lyso
  b = mitochondria
  b = golgi
  b = peroxisome
  b = er
  b = lipid
getting stats for A = lyso
  b = nuclei
  b = mitochondria
  b = golgi
  b = peroxisome
  b = er
  b = lipid
getting stats for A = mitochondria
  b = nuclei
  b = lyso
  b = golgi
  b = peroxisome
  b = er
  b = lipid
getting stats for A = golgi
  b = nuclei
  b = lyso
  b = mitochondria
  b = peroxisome
  b = er
  b = lipid
getting stats for A = peroxisome
  b = nuclei
  b = lyso
  b = mitochondria
  b = golgi
  b = er
  b = lipid
getting stats for A = er
  b = nuclei
  b = lyso
  b = mitochondria
  b = golgi
  b = peroxisome
  b = lipid
getting stats for A = lipid
  b = nuclei
  b = lyso
  b = mitochondria
  b = golgi
  b = peroxisome
  b = er
dumped 42 csvs


In [13]:
n_files = dump_shell_cross_stats(organelle_names, organelles, cyto_mask, out_data_path, source_file) 


getting stats for A = nuclei
  X lyso
  X mitochondria
  X golgi
  X peroxisome
  X er
  X lipid
getting stats for A = lyso
  X nuclei
  X mitochondria
  X golgi
  X peroxisome
  X er
  X lipid
getting stats for A = mitochondria
  X nuclei
  X lyso
  X golgi
  X peroxisome
  X er
  X lipid
getting stats for A = golgi
  X nuclei
  X lyso
  X mitochondria
  X peroxisome
  X er
  X lipid
getting stats for A = peroxisome
  X nuclei
  X lyso
  X mitochondria
  X golgi
  X er
  X lipid
getting stats for A = er
  X nuclei
  X lyso
  X mitochondria
  X golgi
  X peroxisome
  X lipid
getting stats for A = lipid
  X nuclei
  X lyso
  X mitochondria
  X golgi
  X peroxisome
  X er



-----------------
##  SUMMARY STATS  
> WARNING: (🚨🚨🚨🚨 WIP)
### normalizations.

- overlaps, normalized by CYTOPLASM, A, and B
- per cell averages, medians, std, and totals

This is all pandas munging and very straightforward tabular manipulation.
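As a rough sketch of the normalizations listed above, the snippet below takes three rows (`label`, `volume`, `er_overlap`) copied from the stats table shown in this section and normalizes the overlap by A, by B, and by the cytoplasm. The two normalizer values (`cyto_voxels`, `er_voxels`) are hypothetical numbers made up for illustration:

```python
import pandas as pd

# three rows copied from the per-object stats table in this section
table = pd.DataFrame({
    "label": [1, 2, 5],
    "volume": [76, 30857, 1547],
    "er_overlap": [0, 3740, 139],
})

# hypothetical normalizers, made up for illustration
cyto_voxels = 500_000   # voxels in the cytoplasm mask
er_voxels = 60_000      # total voxels of organelle B (here: er)

# per-object: fraction of each A object that overlaps B
table["er_overlap_frac"] = table["er_overlap"] / table["volume"]

# per-cell: averages, medians, std and totals, plus the overlap normalized three ways
total_overlap = int(table["er_overlap"].sum())
per_cell = {
    "volume_mean": table["volume"].mean(),
    "volume_median": table["volume"].median(),
    "volume_std": table["volume"].std(),
    "volume_total": int(table["volume"].sum()),
    "overlap_per_cyto": total_overlap / cyto_voxels,
    "overlap_per_A": total_overlap / table["volume"].sum(),
    "overlap_per_B": total_overlap / er_voxels,
}

print(per_cell)
```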


In [14]:
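import pandas as pd  # explicit import, in case the star imports above do not expose `pd`

# NOTE: organelle_names[1] is "lyso", so despite its name `mito_table` holds the lyso stats
target = organelle_names[1]

csv_path = out_data_path / f"{source_file.stem}-{target}-stats.csv"

mito_table = pd.read_csv(csv_path)
mito_table.head()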

| Unnamed: 0.1 | Unnamed: 0 | label | max_intensity | mean_intensity | min_intensity | volume | equivalent_diameter | centroid-0 | centroid-1 | centroid-2 | ... | mitochondria_overlap | mitochondria_labels | golgi_overlap | golgi_labels | peroxisome_overlap | peroxisome_labels | er_overlap | er_labels | lipid_overlap | lipid_labels |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 6618 | 2426.684211 | 0 | 76 | 5.25539 | 1.328947 | 149.578947 | 665.907895 | ... | 0 | [] | 0 | [] | 0 | [] | 0 | [] | 0 | [] |
| 1 | 1 | 2 | 65535 | 16752.316071 | 0 | 30857 | 38.915119 | 6.45053 | 206.110121 | 517.963801 | ... | 3602 | [6, 47, 48, 49, 70, 80, 88, 114, 130, 133, 151... | 2019 | [1] | 77 | [3, 5, 6, 15, 17, 20, 21] | 3740 | [3, 19, 187, 201] | 0 | [] |
| 2 | 2 | 5 | 21639 | 10844.098901 | 2242 | 1547 | 14.349295 | 2.798319 | 370.100194 | 595.351648 | ... | 0 | [] | 0 | [] | 0 | [] | 139 | [3, 77] | 0 | [] |
| 3 | 3 | 6 | 5773 | 2776.759036 | 480 | 83 | 5.412025 | 1.445783 | 131.614458 | 569.843373 | ... | 0 | [] | 0 | [] | 0 | [] | 0 | [] | 0 | [] |
| 4 | 4 | 7 | 11263 | 3762.868852 | 482 | 61 | 4.884016 | 1.0 | 133.180328 | 604.42623 | ... | 0 | [] | 0 | [] | 0 | [] | 0 | [] | 0 | [] |


In [15]:
mito_table.volume.mean()

480.036231884058


-----------------
## DISTRIBUTION  


### Radial distribution 

### 2D projection of inferred objects (and masks, fluorescence image)

Segment the image in 3D and take the sum projection of the binary image.
Create 5 concentric rings going from the edge of the nuclei to the edge of the cellmask (ideally these will be morphed to the cellmask/nuclei shape, as done in CellProfiler).
Measure the intensity per ring divided by the ring area, including the nuclei as the center area to measure from.
The normalized measurement acts as a frequency distribution of that organelle, starting from the nuclei bin and going out to the cell membrane.
Measurements needed: mean, median, and standard deviation of the frequency distribution.

- pre-processing
  1. Make 2D sum projection of binary segmentation
  2. Create 5 (default) bins linearly between the edge of the nuclei and the edge of the cellmask. These are somewhat like rings morphed to the shape of the nuclei and cellmask, or more accurately like contour lines of the normalized radial distance between the edge of the nuclei and the edge of the cellmask.
  3. Use the nucleus + concentric rings to mask the 2D sum projection into radial distribution regions: nuclei = bin 1, ..., largest/outermost ring = bin 6. See the similar concept in CellProfiler: https://cellprofiler-manual.s3.amazonaws.com/CellProfiler-4.2.5/modules/measurement.html?highlight=distribution#module-cellprofiler.modules.measureobjectintensitydistribution
   


The logic was borrowed from CellProfiler, but the algorithm was somewhat simplified by assuming that all estimates are made over a single cellmask (single cell).   Most of the code should be capable of performing the more complicated multi-object versions as CellProfiler does.  This functionality is untested, but the source code was left in the more complex format in case it is extended to support it in the future.
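The pre-processing steps above can be sketched with two Euclidean distance transforms. This is a simplified illustration, not the code behind `get_radial_stats`: `radial_bins` is a hypothetical helper, and the normalized distance `d_nuc / (d_nuc + d_edge)` is one assumed way to obtain rings that follow both the nuclei and the cellmask shape.

```python
import numpy as np
from scipy import ndimage as ndi


def radial_bins(cellmask_2d, nuclei_2d, n_bins=5):
    """0 = background, 1 = nuclei, 2..n_bins+1 = rings from nuclei edge to cell edge."""
    cyto = cellmask_2d & ~nuclei_2d
    d_nuc = ndi.distance_transform_edt(~nuclei_2d)     # distance to the nearest nuclei pixel
    d_edge = ndi.distance_transform_edt(cellmask_2d)   # distance to the nearest background pixel
    total = d_nuc + d_edge
    norm = np.zeros_like(total)                        # near 0 at the nuclei, near 1 at the cell edge
    np.divide(d_nuc, total, out=norm, where=total > 0)
    ring = np.minimum((norm * n_bins).astype(int), n_bins - 1) + 2
    bins = np.zeros(cellmask_2d.shape, dtype=int)
    bins[cyto] = ring[cyto]
    bins[nuclei_2d] = 1
    return bins


# toy 3D segmentation: square cell, centred nucleus, one organelle blob
cellmask = np.zeros((3, 41, 41), dtype=bool)
nuclei = np.zeros_like(cellmask)
organelle = np.zeros_like(cellmask)
cellmask[:, 4:37, 4:37] = True
nuclei[:, 17:24, 17:24] = True
organelle[:, 25:28, 18:22] = True

n_bins = 5
org_proj = organelle.sum(axis=0)   # 2D sum projection of the binary segmentation
bins = radial_bins(cellmask.any(axis=0), nuclei.any(axis=0), n_bins)

# per-bin area and organelle count (index 0 = nuclei bin, then the rings)
n_pix = np.bincount(bins.ravel(), minlength=n_bins + 2)[1:]
org_cnt = np.bincount(bins.ravel(), weights=org_proj.ravel(), minlength=n_bins + 2)[1:]

# guard against empty bins so the density never divides by zero
density = np.divide(org_cnt, n_pix, out=np.zeros_like(org_cnt), where=n_pix > 0)
print(density)
```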


In [16]:
# csv_path = out_data_path / f"{o}_{meta_dict["file_name"].split('/')[-1].split('.')[0]}_stats.csv"
Path(meta_dict['file_name']).name

'ZSTACK_PBTOhNGN2hiPSCs_BR1_N19_Unmixed.czi'

In [17]:
organelles[0].shape, organelle_names[0]

((15, 768, 768), 'nuclei')

In [24]:
test_org = 3

# args: cellmask_obj and nuclei_obj are passed as loaded above
organelle_mask = cyto_mask
organelle_name = organelle_names[test_org]
organelle_obj = organelles[test_org]
organelle_img = intensities[test_org]



In [25]:
rad_stats, bin_idx = get_radial_stats(        
        cellmask_obj,
        organelle_mask,
        organelle_obj,
        organelle_img,
        organelle_name,
        nuclei_obj
        )


In [26]:
rad_stats

# check divide by zero bug for zero bins

| Unnamed: 0 | organelle | mask | bin | n_bins | n_pix | cm_vox_cnt | org_vox_cnt | org_intensity | cm_radial_cv | org_radial_cv | img_radial_cv |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | golgi | cellmask | Ctr | 5 | 43090.0 | 560133.0 | 9252.0 | 346229590.0 | 0.039089 | 2.285363 | 1.548474 |
| 1 | golgi | cellmask | 1 | 5 | 13370.0 | 133005.0 | 3378.0 | 175025329.0 | 0.238991 | 1.665067 | 1.153239 |
| 2 | golgi | cellmask | 2 | 5 | 19158.0 | 157767.0 | 3359.0 | 138790537.0 | 0.383749 | 1.471041 | 1.190105 |
| 3 | golgi | cellmask | 3 | 5 | 29521.0 | 181294.0 | 2900.0 | 86229287.0 | 0.463339 | 1.860701 | 0.70332 |
| 4 | golgi | cellmask | 4 | 5 | 75192.0 | 204117.0 | 2312.0 | 73259381.0 | 0.354069 | 2.225926 | 0.540664 |


In [27]:
viewer = napari.view_image(bin_idx)


### depth - summary
Segment the image in 3D and measure the area fraction of each organelle per Z slice.
These measurements act as a frequency distribution of that organelle, starting from the bottom of the cellmask (not including neurites) and going to the top of the cellmask.
Measurements: mean, median, and standard deviation of the frequency distribution.

- pre-processing
  1. subtract nuclei from the cellmask --> cellmask cytoplasm
  2. mask organelle channels with cellmask cytoplasm mask

- per-object measurements
  - For each Z slice in the masked binary image measure:
    1. organelle area
    2. cellmask cytoplasm area

- per-object calculations
  - For each Z slice in the masked binary image: organelle area / cellmask cytoplasm area

- per cell summary
  1. create a frequency table with the z slice number on the x axis and the area fraction on the y axis
  2. Measure the frequency distribution's mean, median, and standard deviation for each cell
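The depth recipe above can be sketched in plain NumPy on toy masks. The summary statistics here treat the per-slice area fraction as weights over the Z index; that is one assumed reading of "frequency distribution", and it is not taken from `get_depth_stats`:

```python
import numpy as np

# toy 3D masks (Z, Y, X); slice 0 is left empty on purpose
cellmask = np.zeros((5, 20, 20), dtype=bool)
nuclei = np.zeros_like(cellmask)
organelle = np.zeros_like(cellmask)
cellmask[1:5, 2:18, 2:18] = True
nuclei[2:4, 8:12, 8:12] = True
organelle[1:4, 3:6, 3:6] = True

# pre-processing: cellmask minus nuclei, then mask the organelle with it
cyto = cellmask & ~nuclei
org_area = (organelle & cyto).sum(axis=(1, 2))   # organelle area per Z slice
cyto_area = cyto.sum(axis=(1, 2))                # cytoplasm area per Z slice

# area fraction per slice, leaving empty slices at zero instead of dividing by zero
area_fraction = np.divide(
    org_area, cyto_area, out=np.zeros(len(org_area)), where=cyto_area > 0
)

# summary of the distribution over Z (assumes at least one non-empty slice)
z = np.arange(len(area_fraction))
weights = area_fraction / area_fraction.sum()
z_mean = float(np.sum(z * weights))
z_std = float(np.sqrt(np.sum(weights * (z - z_mean) ** 2)))
z_median = int(z[np.searchsorted(np.cumsum(weights), 0.5)])

print(area_fraction, z_mean, z_median, z_std)
```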

In [28]:
viewer.add_image(cellmask_obj>0)

<Image layer 'Image' at 0x2bf465db0>

In [29]:


# flattened
cellmask_proj = create_masked_depth_projection(cellmask_obj)
org_proj = create_masked_depth_projection(organelle_obj,organelle_mask.astype(bool))
img_proj = create_masked_depth_projection(organelle_img,organelle_mask.astype(bool), to_bool=False)

nucleus_proj = create_masked_depth_projection(nuclei_obj,cellmask_obj.astype(bool)) if nuclei_obj is not None else None


In [30]:
z_stats = get_depth_stats(        
        cellmask_obj,
        organelle_mask,
        organelle_obj,
        organelle_img,
        organelle_name,
        nuclei_obj
        )

In [31]:
cellmask_proj = create_masked_depth_projection(cellmask_obj)
org_proj = create_masked_depth_projection(organelle_obj,organelle_mask.astype(bool))
img_proj = create_masked_depth_projection(organelle_img,organelle_mask.astype(bool), to_bool=False)

nucleus_proj = create_masked_depth_projection(nuclei_obj,cellmask_obj.astype(bool)) if nuclei_obj is not None else None


In [32]:
z_stats

| Unnamed: 0 | organelle | mask | bin | n_bins | cm_vox_cnt | org_vox_cnt | org_intensity | nuc_vox_cnt |
|---|---|---|---|---|---|---|---|---|
| 0 | golgi | cellmask | 0 | 15 | 0 | 0 | 0 | 0 |
| 1 | golgi | cellmask | 1 | 15 | 149379 | 146 | 55451513 | 17669 |
| 2 | golgi | cellmask | 2 | 15 | 121726 | 1086 | 64558080 | 24205 |
| 3 | golgi | cellmask | 3 | 15 | 109521 | 1941 | 77464132 | 29056 |
| 4 | golgi | cellmask | 4 | 15 | 99450 | 1777 | 81142030 | 31212 |
| 5 | golgi | cellmask | 5 | 15 | 94083 | 2364 | 85255502 | 32536 |
| 6 | golgi | cellmask | 6 | 15 | 90942 | 2512 | 79605412 | 32729 |
| 7 | golgi | cellmask | 7 | 15 | 93625 | 2241 | 72520796 | 32333 |
| 8 | golgi | cellmask | 8 | 15 | 90628 | 2948 | 72623683 | 30862 |
| 9 | golgi | cellmask | 9 | 15 | 85107 | 2635 | 67477771 | 27104 |


In [33]:
n_files = dump_projection_stats(organelle_names, organelles,intensities, cellmask_obj, nuclei_obj, cyto_mask, out_data_path, source_file)

dumped 7x2 csvs


In [34]:
n_files

7

In [35]:
organelle_names

['nuclei', 'lyso', 'mitochondria', 'golgi', 'peroxisome', 'er', 'lipid']



## Zernike distributions
- get the magnitude and phase of the Zernike moments
- The Zernike features characterize the distribution of intensity across the object. For instance, Zernike 1,1 has a high value if the intensity is low on one side of the object and high on the other. The Zernike magnitude feature records the rotationally invariant magnitude of each moment, and the Zernike phase feature gives the moment's orientation.

`zernike_degree` (default = 9) chooses how many moments to calculate.


The logic was borrowed from CellProfiler, but the algorithm was greatly simplified by assuming that all estimates are made over a single cellmask (single cell).
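To illustrate what the magnitude and phase mean, here is a minimal sketch that follows the textbook definition of Zernike moments on the unit disk inscribed in a 2D image. It is not the `get_zernike_stats` or CellProfiler implementation, and its normalization and phase convention are assumptions that may differ from the values in the table below. All three function names are hypothetical.

```python
import math
import numpy as np


def zernike_indices(degree):
    """All (n, m) pairs with 0 <= m <= n <= degree and n - m even."""
    return [(n, m) for n in range(degree + 1) for m in range(n % 2, n + 1, 2)]


def radial_polynomial(n, m, r):
    """Zernike radial polynomial R_n^m evaluated at radii r."""
    out = np.zeros_like(r, dtype=float)
    for k in range((n - m) // 2 + 1):
        coef = ((-1) ** k * math.factorial(n - k)
                / (math.factorial(k)
                   * math.factorial((n + m) // 2 - k)
                   * math.factorial((n - m) // 2 - k)))
        out += coef * r ** (n - 2 * k)
    return out


def zernike_moment(img, n, m):
    """Magnitude and phase of the (n, m) moment over the disk inscribed in img."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    radius = min(cy, cx)
    y, x = np.mgrid[:h, :w]
    yn, xn = (y - cy) / radius, (x - cx) / radius   # coordinates scaled to the unit disk
    r = np.hypot(xn, yn)
    theta = np.arctan2(yn, xn)
    inside = r <= 1.0
    basis = radial_polynomial(n, m, r) * np.exp(1j * m * theta)
    pixel_area = 1.0 / radius ** 2
    z = (n + 1) / np.pi * np.sum(img[inside] * np.conj(basis[inside])) * pixel_area
    return float(np.abs(z)), float(np.angle(z))


size = 41
uniform = np.ones((size, size))
gradient = np.tile(np.linspace(0.0, 1.0, size), (size, 1))   # dark on the left, bright on the right

mag_uniform, _ = zernike_moment(uniform, 1, 1)
mag_gradient, phase_gradient = zernike_moment(gradient, 1, 1)
print(len(zernike_indices(9)), mag_uniform, mag_gradient)
```

With `zernike_degree=9` this enumeration yields 30 (n, m) pairs, and the (1, 1) magnitude is essentially zero for the uniform image but clearly non-zero for the one-sided gradient, which is the behaviour described above.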

In [38]:


stats_table,z = get_zernike_stats(        
                            cellmask_obj,
                            organelle_mask,
                            organelle_obj,
                            organelle_img,
                            organelle_name,
                            nuclei_obj,
                            zernike_degree=9
                            )




In [39]:
stats_table

| Unnamed: 0 | organelle | mask | zernike_n | zernike_m | cm_zer_mag | cm_zer_phs | obj_zer_mag | obj_zer_phs | nuc_zer_mag | nuc_zer_phs | img_zer_mag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | golgi | cellmask | 0 | 0 | 1.0 | 1.570796 | 1.0 | 1.570796 | 1.0 | 1.570796 | 1.570796 |
| 1 | golgi | cellmask | 1 | 1 | 0.353801 | 2.243572 | 0.545888 | 2.246698 | 0.290442 | 2.234844 | 2.437413 |
| 2 | golgi | cellmask | 2 | 0 | 0.445706 | -1.570796 | 0.267229 | -1.570796 | 0.769744 | -1.570796 | -1.570796 |
| 3 | golgi | cellmask | 2 | 2 | 0.111563 | -2.901157 | 0.242907 | 2.807452 | 0.08095 | 2.913656 | -2.824003 |
| 4 | golgi | cellmask | 3 | 1 | 0.33181 | -0.989106 | 0.468878 | -0.732337 | 0.457448 | -0.908659 | -0.700925 |
| 5 | golgi | cellmask | 3 | 3 | 0.02672 | -1.161429 | 0.071244 | 3.033166 | 0.022256 | -2.747794 | -1.650807 |
| 6 | golgi | cellmask | 4 | 0 | 0.027831 | 1.570796 | 0.27892 | -1.570796 | 0.417359 | 1.570796 | -1.570796 |
| 7 | golgi | cellmask | 4 | 2 | 0.12842 | 0.073232 | 0.368215 | -0.000182 | 0.187584 | -0.221252 | 0.341712 |
| 8 | golgi | cellmask | 4 | 4 | 0.029116 | 0.556027 | 0.039601 | 1.902211 | 0.006403 | -2.271546 | -0.048984 |
| 9 | golgi | cellmask | 5 | 1 | 0.138867 | 1.953233 | 0.092661 | -2.880913 | 0.4494 | 2.226373 | 2.364811 |
