#  `infer_subc` :  a tool/framework for inferring segmented organelles from multi-channel fluorescent 3D images.

## ❹. QUANTIFICATION 📏📐🧮

SUBCELLULAR COMPONENT METRICS
-  extent 
-  size
-  shape
-  position



This notebook illustrates various "stats" routines developed to quantify and gather statistics about the segmentations.



## IMPORTS

In [4]:
# top level imports
from pathlib import Path
import os, sys


import napari

### import local python functions in ../infer_subc
sys.path.append(os.path.abspath((os.path.join(os.getcwd(), '..'))))

from infer_subc.core.file_io import (read_czi_image,
                                        export_inferred_organelle,
                                        import_inferred_organelle,
                                        export_tiff,
                                        list_image_files)

from infer_subc.core.img import *
from infer_subc.utils.stats import *
from infer_subc.utils.stats_helpers import *

from infer_subc.organelles import * 

import time
%load_ext autoreload
%autoreload 2



In [5]:
# NOTE:  these "constants" are only accurate for the testing MCZ dataset
from infer_subc.constants import (TEST_IMG_N,
                                    NUC_CH ,
                                    LYSO_CH ,
                                    MITO_CH ,
                                    GOLGI_CH ,
                                    PEROX_CH ,
                                    ER_CH ,
                                    LD_CH ,
                                    RESIDUAL_CH )              


## SETUP

In [6]:
# this will be the example image for testing the pipeline below
test_img_n = TEST_IMG_N

# build the datapath
# all the imaging data goes here.
data_root_path = Path(os.path.expanduser("~")) / "Projects/Imaging/data"

# linearly unmixed ".czi" files are here
in_data_path = data_root_path / "raw"
im_type = ".czi"

# get the list of all files
img_file_list = list_image_files(in_data_path,im_type)
test_img_name = img_file_list[test_img_n]

# save output ".tiff" files here
out_data_path = data_root_path / "out"

if not Path.exists(out_data_path):
    Path.mkdir(out_data_path)
    print(f"making {out_data_path}")

In [7]:
img_data,meta_dict = read_czi_image(test_img_name)

# get some top-level info about the RAW data
channel_names = meta_dict['name']
img = meta_dict['metadata']['aicsimage']
scale = meta_dict['scale']
channel_axis = meta_dict['channel_axis']

source_file = meta_dict['file_name']



In [8]:
scale[0]/scale[1], scale

(7.267318290023735,
 (0.5804527163320905, 0.07987165184837318, 0.07987165184837318))

Now get the single "optimal" slice of all our organelle channels....

## get the inferred cellmask, nuclei and cytoplasm objects

(takes < 1 sec)

Build the segmentations in order.




In [9]:

###################
# CELLMASK, NUCLEI, CYTOPLASM, NUCLEUS
###################
nuclei_obj =  get_nuclei(img_data,meta_dict, out_data_path)
cellmask_obj = get_cellmask(img_data, nuclei_obj, meta_dict, out_data_path)
cyto_mask = get_cytoplasm(nuclei_obj , cellmask_obj , meta_dict, out_data_path)



loaded  inferred 3D `nuclei`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `cellmask`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `cytoplasm`  from /Users/ergonyc/Projects/Imaging/data/out 


-------------------------
## regionprops

`skimage.measure.regionprops` provides the basic tools necessary to quantify our segmentations.

First, let's see what works in 3D.  

> Note: the names of the regionprops correspond to the 2D analysis even for those which are well defined in 3D.  i.e. "area" is actually "volume" in 3D, etc.
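
As a minimal sketch of that naming quirk, the example below runs `regionprops` on a synthetic 3D array (not the notebook's data). The voxel size `toy_scale` is a hypothetical value standing in for the `scale` tuple read from the image metadata. For 3D input, `area` is a voxel count, so multiplying it by the voxel volume gives a physical volume.

```python
import numpy as np
from skimage.measure import label, regionprops

# synthetic 3D binary image with two separate blocks (illustrative only)
toy = np.zeros((4, 8, 8), dtype=bool)
toy[0:2, 0:2, 0:2] = True   # 2 x 2 x 2 = 8 voxels
toy[2:4, 4:7, 4:7] = True   # 2 x 3 x 3 = 18 voxels

toy_labels = label(toy)
toy_props = regionprops(toy_labels)

# "area" is the voxel count (i.e. the volume) for 3D input
voxel_counts = [int(p.area) for p in toy_props]

# hypothetical (Z, Y, X) voxel size in microns; substitute the real `scale`
toy_scale = (0.58, 0.08, 0.08)
voxel_volume = float(np.prod(toy_scale))
volumes_um3 = [n * voxel_volume for n in voxel_counts]
```

The same conversion would apply to the `volume` column produced later in this notebook, assuming the table stores raw voxel counts.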


-----------------
## basic stats

### per-organelle


- regionprops 


### summary stats

- group + aggregate:  surface_area, volume
  - median
  - mean
  - std 
  - count

- normalizers
  - CELLMASK?
  - CYTOPLASM?
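
The group-and-aggregate step listed above could be sketched with pandas as follows. The table is made up for illustration, and `cyto_volume` is an assumed number standing in for the voxel count of the cytoplasm mask; neither comes from the notebook's data.

```python
import pandas as pd

# made-up per-object measurements (one row per segmented object)
toy_stats = pd.DataFrame(
    {
        "organelle": ["lyso", "lyso", "lyso", "mito", "mito"],
        "volume": [10, 20, 30, 100, 300],
        "surface_area": [12.0, 20.0, 28.0, 90.0, 210.0],
    }
)

# summary statistics per organelle type
summary = toy_stats.groupby("organelle")[["volume", "surface_area"]].agg(
    ["median", "mean", "std", "count"]
)

# total organelle volume as a fraction of an assumed cytoplasm volume
cyto_volume = 10_000  # hypothetical voxel count of the cytoplasm mask
volume_fraction = toy_stats.groupby("organelle")["volume"].sum() / cyto_volume
```

Swapping `cyto_volume` for the cellmask voxel count gives the alternative normalizer raised in the list.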

### nuclei caveats
The other organelles are sensibly normalized by cytoplasm. Does normalizing the nuclei by cytoplasm make sense, or should we use the cellmask?

Let's see which measures are sensible for 3D (volumetric) data with regionprops.

In [10]:
if False:
    labels = label(nuclei_obj )
    rp = regionprops(labels, intensity_image=img_data[NUC_CH])

    supported = [] 
    unsupported = []

    for prop in rp[0]:
        try:
            rp[0][prop]
            supported.append(prop)
        except NotImplementedError:
            unsupported.append(prop)

    print("Supported properties:")
    print("  " + "\n  ".join(supported))
    print()
    print("Unsupported properties:")
    print("  " + "\n  ".join(unsupported))

Supported properties:
-  area
-  bbox
-  bbox_area
-  centroid
-  convex_area
-  convex_image
-  coords
-  equivalent_diameter
-  euler_number
-  extent
-  feret_diameter_max
-  filled_area
-  filled_image
-  image
-  inertia_tensor
-  inertia_tensor_eigvals
-  intensity_image
-  label
-  local_centroid
-  major_axis_length
-  max_intensity
-  mean_intensity
-  min_intensity
-  minor_axis_length
-  moments
-  moments_central
-  moments_normalized
-  slice
-  solidity
-  weighted_centroid
-  weighted_local_centroid
-  weighted_moments
-  weighted_moments_central
-  weighted_moments_normalized

Unsupported properties:
-  eccentricity
-  moments_hu
-  orientation
-  perimeter
-  perimeter_crofton
-  weighted_moments_hu

In [11]:
# from scipy.ndimage import find_objects
    
# labels = label(nuclei_obj ).astype("int")
# objects = find_objects(labels)

# # objects are the slices into the original array for each organelle

In [12]:
# get overall summary stats for cellmask
cm_intensity =  raw_cellmask_fromaggr(img_data, scale_min_max=False)


In [13]:
cellmask_obj.dtype

dtype('bool')

Now we want to get a list of our organelle names, segmentations, and intensities (fluorescence).

In [14]:
# names of organelles we have
organelle_names = ["nuc","lyso", "mito","golgi","perox","ER","LD"]

get_methods  = [get_nuclei,
                get_lyso,
                get_mito,
                get_golgi,
                get_perox,
                get_ER,
                get_LD]

# load all the organelle segmentations
organelles = [meth(img_data,meta_dict, out_data_path) for meth in get_methods]

# get the intensities
organelle_channels = [NUC_CH, LYSO_CH,MITO_CH,GOLGI_CH,PEROX_CH,ER_CH,LD_CH]

intensities = [img_data[ch] for ch in organelle_channels]


loaded  inferred 3D `nuclei`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `lyso`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded lyso in (0.00) sec
loaded  inferred 3D `mitochondria`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `golgi`  from /Users/ergonyc/Projects/Imaging/data/out 
starting segmentation...
loaded  inferred 3D `peroxisome`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded peroxisome in (0.00) sec
loaded  inferred 3D `er`  from /Users/ergonyc/Projects/Imaging/data/out 
loaded  inferred 3D `lipid`  from /Users/ergonyc/Projects/Imaging/data/out 



-----------------
## CONTACTS (cross-stats)

### organelle cross stats


- regionprops 



- intersect for A vs all other organelles Bi
  - regionprops on A ∩ Bi

   
- contacts?
  - dilate then intersect?
  - loop through each sub-object for each 
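
The "dilate then intersect" idea can be illustrated with a small sketch on synthetic masks. It is one possible reading of the question raised in the list, not the approach implemented in `infer_subc`; the arrays `org_a` and `org_b` are made up.

```python
import numpy as np
from scipy.ndimage import binary_dilation

# two synthetic organelles that touch along X but share no voxels
org_a = np.zeros((3, 5, 7), dtype=bool)
org_b = np.zeros((3, 5, 7), dtype=bool)
org_a[1, 2, 1:3] = True   # x = 1, 2
org_b[1, 2, 3:5] = True   # x = 3, 4

# strict overlap: A ∩ B is empty
overlap_voxels = int(np.count_nonzero(org_a & org_b))

# "contact": grow A by one voxel (default face-connected structuring element),
# then intersect with B
org_a_dilated = binary_dilation(org_a, iterations=1)
contact_voxels = int(np.count_nonzero(org_a_dilated & org_b))
```

Here the strict intersection is empty while the dilated intersection is not, which is the distinction between an overlap and a contact. The dilation radius (`iterations`) sets how close two objects must be to count as touching.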



In [26]:

from infer_subc.utils.stats import _assert_uint16_labels

organelle_to_colname = {"nuc":"NU", "lyso": "LY", "mito":"MY", "golgi":"GL", "perox":"PR", "ER":"ER", "LD":"LD", "cell":"CM", "cyto":"CY"}

def _make_organelle_stat_tables(
    organelle_names: List[str],
    organelles: List[np.ndarray],
    intensities: List[np.ndarray],
    nuclei_obj:np.ndarray, 
    cellmask_obj:np.ndarray,
    organelle_mask: np.ndarray, 
    out_data_path: Path, 
    source_file: Path,
    n_rad_bins: Union[int,None] = None,
    n_zernike: Union[int,None] = None,
) -> int:
    """
    get summary and all cross stats between organelles `a` and `b`
    calls `get_summary_stats_3D`
    """
    count = 0
    org_stats_tabs = []
    for j, target in enumerate(organelle_names):
        # print(f"getting stats for A = {target}")
        org_img = intensities[j]        
        org_obj = _assert_uint16_labels(organelles[j])


        # A_stats_tab, rp = get_simple_stats_3D(A,mask)
        a_stats_tab, rp = get_summary_stats_3D(org_obj, org_img, organelle_mask)
        a_stats_tab.insert(loc=0,column='organelle',value=target )
        a_stats_tab.insert(loc=0,column='ID',value=source_file.stem )

        # add the touches for all other organelles
        # loop over Bs
        merged_tabs = []
        for i, nmi in enumerate(organelle_names):
            if i != j:
                # get overall stats of intersection
                # print(f"  b = {nmi}")
                count += 1
                # add the list of touches
                b = _assert_uint16_labels(organelles[i])

                ov = []
                b_labs = []
                labs = []
                for idx, lab in enumerate(a_stats_tab["label"]):  # loop over A_objects
                    xyz = tuple(rp[idx].coords.T)
                    cmp_org = b[xyz]
                    
                    # total number of overlapping pixels
                    overlap = sum(cmp_org > 0)
                    # overlap?
                    labs_b = cmp_org[cmp_org > 0]
                    b_js = np.unique(labs_b).tolist()

                    # if overlap > 0:
                    labs.append(lab)
                    ov.append(overlap)
                    b_labs.append(b_js)

                cname = organelle_to_colname[nmi]
                # add organelle B columns to A_stats_tab
                a_stats_tab[f"{cname}_overlap"] = ov
                a_stats_tab[f"{cname}_labels"] = b_labs  # might want to make this easier for parsing later

                #####  2  ###########
                # get cross_stats

                cross_tab = get_aXb_stats_3D(org_obj, b, organelle_mask) 
                shell_cross_tab = get_aXb_stats_3D(org_obj, b, organelle_mask, use_shell_a=True)
                            
                cross_tab["organelle_b"]=nmi
                #  Merge cross_tabs and shell_cross_tabs 

                merged_tab = pd.merge(cross_tab,shell_cross_tab, on="label_")
                merged_tabs.append( merged_tab )


        #  Now append the 
        # csv_path = out_data_path / f"{source_file.stem}-{target}_shellX{nmi}-stats.csv"
        # e_stats_tab.to_csv(csv_path)
        # stack these tables for each organelle
        crossed_tab = pd.concat(merged_tabs)
        # csv_path = out_data_path / f"{source_file.stem}-{target}X{nmi}-stats.csv"
        # stats_tab.to_csv(csv_path)
        crossed_tab.insert(loc=0,column='organelle',value=target )
        crossed_tab.insert(loc=0,column='ID',value=source_file.stem )

        # now get radial stats
        rad_stats,z_stats, _ = get_radial_stats(        
                cellmask_obj,
                organelle_mask,
                org_obj,
                org_img,
                target,
                nuclei_obj,
                n_rad_bins,
                n_zernike
                )
        
        d_stats = get_depth_stats(        
                cellmask_obj,
                organelle_mask,
                org_obj,
                org_img,
                target,
                nuclei_obj
                )
                
        proj_stats = pd.merge(rad_stats, z_stats,on=["organelle","mask"])
        proj_stats = pd.merge(proj_stats, d_stats,on=["organelle","mask"])
        proj_stats.insert(loc=0,column='ID',value=source_file.stem )

        # write out files... 
        # org_stats_tabs.append(A_stats_tab)
        csv_path = out_data_path / f"{source_file.stem}-{target}-stats.csv"
        a_stats_tab.to_csv(csv_path)

        csv_path = out_data_path / f"{source_file.stem}-{target}-cross-stats.csv"
        crossed_tab.to_csv(csv_path)

        csv_path = out_data_path / f"{source_file.stem}-{target}-proj-stats.csv"
        proj_stats.to_csv(csv_path)

        count += 1

    print(f"dumped {count}x3 organelle stats csvs")
    return count



In [27]:
n_rad_bins = 5
n_zernike = 9



from infer_subc.utils.stats_helpers import make_organelle_stat_tables

In [28]:
file_count = _make_organelle_stat_tables(organelle_names,
                                         organelles,
                                         intensities,
                                         nuclei_obj,
                                         cellmask_obj,
                                         cyto_mask,
                                         out_data_path,
                                         source_file,
                                         n_rad_bins=n_rad_bins,
                                         n_zernike=n_zernike)



KeyError: 'rkey'


-----------------
##  SUMMARY STATS  
> WARNING: (🚨🚨🚨🚨 WIP)
### normalizations.

- overlaps, normalized by CYTOPLASM, A, and B
- per cell averages, medians, std, and totals

This is all pandas munging and very straightforward tabular manipulation.
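
A rough sketch of the overlap normalizations is given below. All numbers are invented, the `B_overlap` column only mimics the `{cname}_overlap` columns written by the stats tables, and `b_total_volume` and `cyto_volume` are hypothetical stand-ins for values that would come from the B table and the cytoplasm mask.

```python
import pandas as pd

# made-up per-object rows for organelle A, with voxel overlap against organelle B
toy_a = pd.DataFrame(
    {
        "ID": ["cell_1"] * 3,
        "label": [1, 2, 3],
        "volume": [100, 50, 50],
        "B_overlap": [20, 0, 10],
    }
)
b_total_volume = 60   # hypothetical total voxel count of organelle B
cyto_volume = 6_000   # hypothetical voxel count of the cytoplasm mask

# overlap normalized by A, by B, and by the cytoplasm
total_overlap = toy_a["B_overlap"].sum()
normalized = pd.Series(
    {
        "overlap_per_A": total_overlap / toy_a["volume"].sum(),
        "overlap_per_B": total_overlap / b_total_volume,
        "overlap_per_cyto": total_overlap / cyto_volume,
    }
)

# per-cell mean, median, std, and total of the per-object volumes
per_cell = toy_a.groupby("ID")["volume"].agg(["mean", "median", "std", "sum"])
```

With several cells concatenated into one table, the same `groupby("ID")` call yields one summary row per cell.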


In [20]:
target = organelle_names[1]  # "lyso"

csv_path = out_data_path / f"{source_file.stem}-{target}-stats.csv"

lyso_table = pd.read_csv(csv_path)
lyso_table.head()

Unnamed: 0.1,Unnamed: 0,ID,organelle,label,max_intensity,mean_intensity,min_intensity,volume,equivalent_diameter,centroid-0,...,MY_overlap,MY_labels,GL_overlap,GL_labels,PR_overlap,PR_labels,ER_overlap,ER_labels,LD_overlap,LD_labels
0,0,ZSTACK_PBTOhNGN2hiPSCs_BR1_N19_Unmixed,lyso,1,6618,2426.684211,0,76,5.25539,1.328947,...,0,[],0,[],0,[],0,[],0,[]
1,1,ZSTACK_PBTOhNGN2hiPSCs_BR1_N19_Unmixed,lyso,2,65535,16752.316071,0,30857,38.915119,6.45053,...,3602,"[6, 47, 48, 49, 70, 80, 88, 114, 130, 133, 151...",2019,[1],77,"[3, 5, 6, 15, 17, 20, 21]",3740,[1],0,[]
2,2,ZSTACK_PBTOhNGN2hiPSCs_BR1_N19_Unmixed,lyso,5,21639,10844.098901,2242,1547,14.349295,2.798319,...,0,[],0,[],0,[],139,[1],0,[]
3,3,ZSTACK_PBTOhNGN2hiPSCs_BR1_N19_Unmixed,lyso,6,5773,2776.759036,480,83,5.412025,1.445783,...,0,[],0,[],0,[],0,[],0,[]
4,4,ZSTACK_PBTOhNGN2hiPSCs_BR1_N19_Unmixed,lyso,7,11263,3762.868852,482,61,4.884016,1.0,...,0,[],0,[],0,[],0,[],0,[]


In [21]:
lyso_table.volume.mean()

480.036231884058


-----------------
## DISTRIBUTION  


### Radial distribution 

### 2D projection of inferred objects (and masks, fluorescence image)

Segment the image in 3D, then take the sum projection of the binary segmentation. Create 5 concentric rings running from the edge of the nuclei to the edge of the cellmask (ideally morphed to the cellmask/nuclei shape, as done in CellProfiler). Measure the intensity per ring divided by the ring area, with the nuclei included as the central region. The normalized measurements act as a frequency distribution of the organelle, from the nuclei bin out to the cell membrane. Measurements needed: the mean, median, and standard deviation of this frequency distribution.

- pre-processing
  1. Make 2D sum projection of binary segmentation
  2. Create 5 (default) bins spaced linearly between the edge of the nuclei and the edge of the cellmask. These are somewhat like rings morphed to the shape of the nuclei and cellmask, or more accurately like contour lines of the normalized radial distance between the edge of the nuclei and the edge of the cellmask.
  3. Use the nucleus plus the concentric rings to mask the 2D sum projection into radial distribution regions: nuclei = bin 1, ..., largest/outermost ring = bin 6. See the similar concept in CellProfiler: https://cellprofiler-manual.s3.amazonaws.com/CellProfiler-4.2.5/modules/measurement.html?highlight=distribution#module-cellprofiler.modules.measureobjectintensitydistribution
   


The logic was borrowed from CellProfiler, but the algorithm is somewhat simplified by assuming that all estimates are made over a single cellmask (a single cell). Most of the code should be capable of performing the more complicated multi-object versions as CellProfiler does. This functionality is untested, but the source code was left in this more complex form in case it is updated to support it in the future.
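
The binning idea can be illustrated with a deliberately simplified sketch. It swaps in Euclidean distance transforms from `scipy.ndimage` for the propagation-based distances used by CellProfiler, so the bin boundaries will not match those of `get_radial_distribution`. The masks are synthetic disks, and names such as `toy_bins` are made up for this example.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# synthetic 2D projections: a disk-shaped cell with a smaller disk-shaped nucleus
yy, xx = np.mgrid[0:101, 0:101]
r = np.hypot(yy - 50, xx - 50)
cell_2d = r <= 40
nuc_2d = r <= 10
ring_region = cell_2d & ~nuc_2d

# distance from each pixel to the nucleus, and to the outside of the cell
d_from_nuc = distance_transform_edt(~nuc_2d)
d_to_edge = distance_transform_edt(cell_2d)

# normalized radial position: near 0 at the nuclear edge, near 1 at the cell edge
norm_dist = np.zeros(cell_2d.shape, dtype=float)
norm_dist[ring_region] = d_from_nuc[ring_region] / (
    d_from_nuc[ring_region] + d_to_edge[ring_region]
)

n_bins = 5
# bin 0 = nucleus (center), bins 1..n_bins = rings, -1 = outside the cell
toy_bins = np.full(cell_2d.shape, -1, dtype=int)
toy_bins[nuc_2d] = 0
ring_idx = np.minimum((norm_dist[ring_region] * n_bins).astype(int), n_bins - 1) + 1
toy_bins[ring_region] = ring_idx

# pixels per bin, analogous to the `n_pix` column of the radial stats table
pixels_per_bin = np.bincount(toy_bins[cell_2d], minlength=n_bins + 1)
```

This layout follows the description above: the nucleus is the central region and the 5 rings give 6 regions in total. Summing an organelle's projection within each value of `toy_bins` and dividing by `pixels_per_bin` would give the per-ring normalized measurement.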


In [22]:
# csv_path = out_data_path / f"{o}_{meta_dict["file_name"].split('/')[-1].split('.')[0]}_stats.csv"
Path(meta_dict['file_name']).name

'ZSTACK_PBTOhNGN2hiPSCs_BR1_N19_Unmixed.czi'

In [23]:
organelles[0].shape, organelle_names[0]

((15, 768, 768), 'nuclei')

In [24]:
test_org = 3

# args 
cellmask_obj
nuclei_obj
organelle_mask = cyto_mask
organelle_name = organelle_names[test_org]
organelle_obj = organelles[test_org]
organelle_img = intensities[test_org]



In [25]:
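# get_radial_stats returns three values: radial stats, zernike stats, and the bin image
rad_stats, zernike_stats, bin_idx = get_radial_stats(        
        cellmask_obj,
        organelle_mask,
        organelle_obj,
        organelle_img,
        organelle_name,
        nuclei_obj
        )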


TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

In [26]:
rad_stats

# check divide by zero bug for zero bins

Unnamed: 0,organelle,mask,bin,n_bins,n_pix,cm_vox_cnt,org_vox_cnt,org_intensity,cm_radial_cv,org_radial_cv,img_radial_cv
0,golgi,cellmask,Ctr,5,43090.0,560133.0,9252.0,346229590.0,0.039089,2.285363,1.548474
1,golgi,cellmask,1,5,13370.0,133005.0,3378.0,175025329.0,0.238991,1.665067,1.153239
2,golgi,cellmask,2,5,19158.0,157767.0,3359.0,138790537.0,0.383749,1.471041,1.190105
3,golgi,cellmask,3,5,29521.0,181294.0,2900.0,86229287.0,0.463339,1.860701,0.70332
4,golgi,cellmask,4,5,75192.0,204117.0,2312.0,73259381.0,0.354069,2.225926,0.540664


In [27]:
viewer = napari.view_image(bin_idx)


### depth - summary
Segment image in 3D;
measure area fraction of each organelle per Z slice;
these measurements will act as a frequency distribution of that organelle starting from the bottom of the cellmask (not including neurites) to the top of the cellmask;
measurements: mean, median, and standard deviation of the frequency distribution	

- pre-processing
  1. subtract nuclei from the cellmask --> cellmask cytoplasm
  2. mask organelle channels with cellmask cytoplasm mask

- per-object measurements
  - For each Z slice in the masked binary image measure:
    1. organelle area
    2. cellmask cytoplasm area

- per-object calculations
  - For each Z slice in the masked binary image: organelle area / cellmask cytoplasm area

- per cell summary
  1. create a frequency table with the z slice number on the x axis and the area fraction on the y axis
  2. Measure the frequency distribution's mean, median, and standard deviation for each cell
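
The depth summary described above could be sketched as follows on synthetic masks. This is one reading of the recipe, in which the per-slice area fractions are treated as weights over the Z index. It is not the implementation in `get_depth_stats`, and the arrays `cyto_3d` and `org_3d` are invented.

```python
import numpy as np

# synthetic 3D masks (Z, Y, X): cytoplasm fills every slice, organelle only some
cyto_3d = np.ones((4, 10, 10), dtype=bool)
org_3d = np.zeros((4, 10, 10), dtype=bool)
org_3d[1, :5, :] = True   # 50 voxels in slice 1
org_3d[2, :, :] = True    # 100 voxels in slice 2

# per-slice areas, restricted to the cytoplasm mask
org_area = (org_3d & cyto_3d).sum(axis=(1, 2))
cyto_area = cyto_3d.sum(axis=(1, 2))

# area fraction per Z slice (left at 0 where the cytoplasm is empty)
area_fraction = np.divide(
    org_area, cyto_area, out=np.zeros(len(org_area), dtype=float), where=cyto_area > 0
)

# treat the fractions as a frequency distribution over the slice index
z = np.arange(len(area_fraction))
weights = area_fraction / area_fraction.sum()
z_mean = float(np.sum(z * weights))
z_std = float(np.sqrt(np.sum(weights * (z - z_mean) ** 2)))
# weighted median: first slice where the cumulative weight reaches 0.5
z_median = int(z[np.searchsorted(np.cumsum(weights), 0.5)])
```

If the intended summary is instead the plain mean, median, and standard deviation of the fractions themselves, `np.mean(area_fraction)` and its siblings would replace the weighted versions.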

In [28]:
viewer.add_image(cellmask_obj>0)

<Image layer 'Image' at 0x2bf465db0>

In [29]:


# flattened
cellmask_proj = create_masked_depth_projection(cellmask_obj)
org_proj = create_masked_depth_projection(organelle_obj,organelle_mask.astype(bool))
img_proj = create_masked_depth_projection(organelle_img,organelle_mask.astype(bool), to_bool=False)

nucleus_proj = create_masked_depth_projection(nuclei_obj,cellmask_obj.astype(bool)) if nuclei_obj is not None else None


In [30]:
z_stats = get_depth_stats(        
        cellmask_obj,
        organelle_mask,
        organelle_obj,
        organelle_img,
        organelle_name,
        nuclei_obj
        )

In [None]:
import numpy as np
from skimage.measure import regionprops_table, regionprops, mesh_surface_area, marching_cubes, label
from skimage.morphology import binary_erosion
from skimage.measure._regionprops import _props_to_dict
from typing import Tuple, Any, Union

from scipy.ndimage import maximum_position, center_of_mass
from scipy.ndimage import sum as ndi_sum
from scipy.sparse import coo_matrix

import centrosome.cpmorphology
import centrosome.propagate
import centrosome.zernike

from infer_subc.core.img import apply_mask

import pandas as pd


def get_summary_stats_3D(input_labels: np.ndarray, intensity_img, mask: np.ndarray) -> Tuple[Any, Any]:
    """Collect volumetric stats from skimage.measure.regionprops.

    Measured properties: "label", "max_intensity", "mean_intensity", "min_intensity",
    "area" (renamed to "volume"), "equivalent_diameter", "centroid", "bbox",
    "euler_number", "extent", plus "surface_area" and the extra property
    `standard_deviation_intensity`.

    Parameters
    ------------
    input_labels:
        a 3d np.ndarray image of the inferred organelle (labels or boolean)
    intensity_img:
        a 3d image of the raw intensities (fluorescence) for the organelle channel
    mask:
        a 3d image containing the mask (e.g. cytoplasm or cellmask) applied to `input_labels`

    Returns
    -------------
    pandas dataframe of stats and the regionprops object
    """
    # in case we sent a boolean mask (e.g. cyto, nucleus, cellmask)
    input_labels = _assert_uint16_labels(input_labels)
    # mask
    input_labels = apply_mask(input_labels, mask)
    # start with LABEL
    properties = ["label"]
    # add intensity:
    properties = properties + ["max_intensity", "mean_intensity", "min_intensity"]

    # arguments must be in the specified order, matching regionprops
    def standard_deviation_intensity(region, intensities):
        return np.std(intensities[region])

    extra_properties = [standard_deviation_intensity]

    # add area
    properties = properties + ["area", "equivalent_diameter"]
    #  position:
    properties = properties + ["centroid", "bbox"]  # , 'bbox', 'weighted_centroid']
    # etc
    properties = properties + ["euler_number", "extent"]  # only works for BIG organelles: 'convex_area','solidity',

    rp = regionprops(input_labels, intensity_image=intensity_img, extra_properties=extra_properties)

    props = _my_props_to_dict(
        rp, input_labels, intensity_image=intensity_img, properties=properties, extra_properties=extra_properties
    )

    props["surface_area"] = surface_area_from_props(input_labels, props)
    props_table = pd.DataFrame(props)
    props_table.rename(columns={"area": "volume"}, inplace=True)
    #  # ETC.  skeletonize via cellprofiler /Users/ahenrie/Projects/Imaging/CellProfiler/cellprofiler/modules/morphologicalskeleton.py
    #         if x.volumetric:
    #             y_data = skimage.morphology.skeletonize_3d(x_data)
    # /Users/ahenrie/Projects/Imaging/CellProfiler/cellprofiler/modules/measureobjectskeleton.py

    return props_table, rp



In [None]:

def get_simple_stats_3D(a, mask):
    """Collect volumetric stats of `a` (note: `mask` is currently unused)."""

    properties = ["label"]  # our index to organelles
    # add area
    properties = properties + ["area", "equivalent_diameter"]
    #  position:
    properties = properties + ["centroid", "bbox"]  # ,  'weighted_centroid']
    # etc
    properties = properties + ["slice"]

    # in case we sent a boolean mask (e.g. cyto, nucleus, cellmask)
    labels = _assert_uint16_labels(a)

    # props = regionprops_table(labels, intensity_image=None,
    #                             properties=properties, extra_properties=[])

    rp = regionprops(labels, intensity_image=None, extra_properties=[])
    props = _my_props_to_dict(rp, labels, intensity_image=None, properties=properties, extra_properties=None)

    stats_table = pd.DataFrame(props)
    stats_table.rename(columns={"area": "volume"}, inplace=True)

    return stats_table, rp



def get_aXb_stats_3D(a, b, mask, use_shell_a=False):
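    """
    collect volumetric stats of `a` intersect `b`;
    if `use_shell_a` is True, only the boundary shell of `a` (`a` minus its erosion) is intersected with `b`
    """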
    properties = ["label"]  # our index to organelles
    # add area
    properties = properties + ["area", "equivalent_diameter"]
    #  position:
    properties = properties + ["centroid", "bbox"]  # ,  'weighted_centroid']
    # etc
    properties = properties + ["slice"]

    # in case we sent a boolean mask (e.g. cyto, nucleus, cellmask)
    a = _assert_uint16_labels(a)
    b = _assert_uint16_labels(b)

    if use_shell_a:
        a_int_b = np.logical_and(np.logical_xor(a > 0, binary_erosion(a > 0)), b > 0)
    else:
        a_int_b = np.logical_and(a > 0, b > 0)

    labels = label(apply_mask(a_int_b, mask)).astype("int")

    props = regionprops_table(labels, intensity_image=None, properties=properties, extra_properties=None)

    props["surface_area"] = surface_area_from_props(labels, props)
    props["label_a"] = [a[s].max() for s in props["slice"]]
    props["label_b"] = [b[s].max() for s in props["slice"]]
    props_table = pd.DataFrame(props)
    props_table.rename(columns={"area": "volume"}, inplace=True)
    props_table.drop(columns="slice", inplace=True)

    return props_table





###################################
### DISTRIBUTIONAL STATS
###################################
def get_radial_stats(        
        cellmask_obj: np.ndarray,
        organelle_mask: np.ndarray,
        organelle_obj:np.ndarray,
        organelle_img: np.ndarray,
        organelle_name: str,
        nuclei_obj: np.ndarray,
        n_rad_bins: Union[int,None] = None,
        n_zernike: Union[int,None] = None,
        ):

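    """
    Compute radial distribution and zernike stats from 2D (Z) projections.

    Parameters
    ------------
    cellmask_obj:
        3d cellmask image
    organelle_mask:
        3d mask (e.g. cytoplasm) applied to the organelle object and intensity image
    organelle_obj:
        3d image of the inferred organelle
    organelle_img:
        3d intensity (fluorescence) image for the organelle
    organelle_name:
        name of the organelle, used to label the output tables
    nuclei_obj:
        3d image of the inferred nuclei
    n_rad_bins:
        number of radial bins (get_radial_distribution falls back to 5 if None)
    n_zernike:
        zernike degree passed to get_zernike_stats

    Returns
    -----------
    rstats table of radial distributions
    zstats table of zernike magnitudes and phases
    rad_bins image of the rstats bins over the cellmask_obj 

    """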


    # flattened
    cellmask_proj = create_masked_Z_projection(cellmask_obj)
    org_proj = create_masked_Z_projection(organelle_obj,organelle_mask.astype(bool))
    img_proj = create_masked_Z_projection(organelle_img,organelle_mask.astype(bool), to_bool=False)

    nucleus_proj = create_masked_Z_projection(nuclei_obj,cellmask_obj.astype(bool)) 

    radial_stats, radial_bin_mask = get_radial_distribution(cellmask_proj=cellmask_proj, 
                                                            org_proj=org_proj, 
                                                            img_proj=img_proj, 
                                                            org_name=organelle_name, 
                                                            nucleus_proj=nucleus_proj, 
                                                            n_bins=n_rad_bins
                                                            )
    
    zernike_stats = get_zernike_stats(
                                      cellmask_proj=cellmask_proj, 
                                      org_proj=org_proj, 
                                      img_proj=img_proj, 
                                      organelle_name=organelle_name, 
                                      nucleus_proj=nucleus_proj, 
                                      zernike_degree = n_zernike
                                      )

    return radial_stats,zernike_stats,radial_bin_mask
    


def get_radial_distribution(
        cellmask_proj: np.ndarray,
        org_proj: np.ndarray,
        img_proj: np.ndarray,
        org_name: str,
        nucleus_proj: Union[np.ndarray, None],
        n_bins: int = 5,
        from_edges: bool = True,
    ):
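    """Perform the radial measurements on the image set

    Parameters
    ------------
    cellmask_proj: np.ndarray
        2d projection of the cellmask
    org_proj: np.ndarray
        2d projection of the masked organelle object
    img_proj: np.ndarray
        2d projection of the masked organelle intensity image
    org_name: str
        organelle name used in the output table
    nucleus_proj: Union[np.ndarray, None]
        2d projection of the nuclei, used as the center object
    n_bins: int = 5
        number of radial bins, e.g. 6
    from_edges: bool = True
        currently unused

    Notes
    ------------
    candidate normalizers: cellmask_voxels, organelle_voxels, cellmask_and_organelle_voxels

    Returns
    -------------
    returns one statistics table + bin_indexes image array
    """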

    # other params
    bin_count = n_bins if n_bins is not None else 5
    scale_bins = True 
    keep_nuc_bins = True # this toggles whether to count things inside the nuclei mask.  
    center_on_nuc = False # choosing the edge of the nuclei or the center as the center to propagate from

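    # always define center_objects (it was previously unbound when nucleus_proj was None);
    # assumes get_normalized_distance_and_mask can accept None for the center objects
    center_objects = nucleus_proj > 0 if nucleus_proj is not None else None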

    nobjects = 1
    labels = label(cellmask_proj>0) #extent as 0,1 rather than bool    


    ################   ################
    ## define masks for computing distances
    ################   ################
    normalized_distance, good_mask, i_center, j_center = get_normalized_distance_and_mask(labels, center_objects, center_on_nuc, keep_nuc_bins)
    
    if normalized_distance is None:
        raise ValueError("get_normalized_distance_and_mask returned no normalized_distance")

    ################   ################
    ## get histograms
    ################   ################
    ngood_pixels = np.sum(good_mask)
    good_labels = labels[good_mask]
    nobjects = 1


    # protect against a None normalized_distance


    bin_indexes = (normalized_distance * bin_count).astype(int)
    bin_indexes[bin_indexes > bin_count] = bin_count # shouldn't do anything

    #                 (    i          ,         j              )
    labels_and_bins = (good_labels - 1, bin_indexes[good_mask])
    #                coo_matrix( (             data,             (i, j)    ), shape=                      )
    histogram_cmsk = coo_matrix( (cellmask_proj[good_mask], labels_and_bins), shape=(nobjects, bin_count) ).toarray()
    histogram_org = coo_matrix(  (org_proj[good_mask],      labels_and_bins), shape=(nobjects, bin_count) ).toarray()
    histogram_img = coo_matrix(  (img_proj[good_mask],      labels_and_bins), shape=(nobjects, bin_count) ).toarray()

    bin_indexes = (normalized_distance * bin_count).astype(int)

    sum_by_object_cmsk = np.sum(histogram_cmsk, 1) # flattened cellmask voxel count
    sum_by_object_org = np.sum(histogram_org, 1)  # organelle voxel count
    sum_by_object_img = np.sum(histogram_img, 1)  # image intensity projection

    # DEPRECATED: since we are NOT computing object_i by object_i (individual organelle labels)
    # sum_by_object_per_bin = np.dstack([sum_by_object] * (bin_count + 1))[0]
    # fraction_at_distance = histogram / sum_by_object_per_bin

    # number of bins.
    number_at_distance = coo_matrix(( np.ones(ngood_pixels), labels_and_bins), (nobjects, bin_count)).toarray()

    # since we aren't breaking objects apart this is just ngood_pixels

    sum_by_object = np.sum(number_at_distance, 1)

    sum_by_object_per_bin = np.dstack([sum_by_object] * (bin_count))[0]
    fraction_at_bin = number_at_distance / sum_by_object_per_bin # sums to 1.0

    # object_mask = number_at_distance > 0
    # DEPRECATED: not doing over multiple objects so don't need object mask.. or fractionals
    # mean_pixel_fraction = fraction_at_distance / ( fraction_at_bin + np.finfo(float).eps )
    # masked_fraction_at_distance = np.ma.masked_array( fraction_at_distance, ~object_mask )
    # masked_mean_pixel_fraction = np.ma.masked_array(mean_pixel_fraction, ~object_mask)

    ################   ################
    ## collect Anisotropy calculation.  + summarize
    ################   ################
    # Split each cell into eight wedges, then compute coefficient of variation of the wedges' mean intensities
    # in each ring. Compute each pixel's delta from the center object's centroid
    i, j = np.mgrid[0 : labels.shape[0], 0 : labels.shape[1]]
    imask = i[good_mask] > i_center[good_mask]
    jmask = j[good_mask] > j_center[good_mask]
    absmask = abs(i[good_mask] - i_center[good_mask]) > abs(
        j[good_mask] - j_center[good_mask]
    )
    radial_index = (
        imask.astype(int) + jmask.astype(int) * 2 + absmask.astype(int) * 4
    )

    statistics = []
    object_name = 'cellmask'
    # collect the numbers from each "bin"
    for bin in range(bin_count):
        bin_mask = good_mask & (bin_indexes == bin)
        bin_pixels = np.sum(bin_mask)

        bin_labels = labels[bin_mask]

        bin_radial_index = radial_index[bin_indexes[good_mask] == bin]
        labels_and_radii = (bin_labels - 1, bin_radial_index)
        pixel_count = coo_matrix( (np.ones(bin_pixels), labels_and_radii), (nobjects, 8) ).toarray()

        radial_counts_cmsk = coo_matrix( (cellmask_proj[bin_mask], labels_and_radii), (nobjects, 8) ).toarray()
        radial_counts = coo_matrix( (org_proj[bin_mask], labels_and_radii), (nobjects, 8) ).toarray()
        radial_values = coo_matrix( (img_proj[bin_mask], labels_and_radii), (nobjects, 8) ).toarray()

        # we might need the masked arrays for some organelles... but I think not. keeping for now
        mask = pixel_count == 0

        radial_means_cmsk = np.ma.masked_array(radial_counts_cmsk / pixel_count, mask)
        radial_cv_cmsk = np.std(radial_means_cmsk, 1) / np.mean(radial_means_cmsk, 1)
        radial_cv_cmsk[np.sum(~mask, 1) == 0] = 0
        radial_cv_cmsk.mask = np.sum(~mask, 1) == 0


        radial_means_obj = np.ma.masked_array(radial_counts / pixel_count, mask)
        radial_cv_obj = np.std(radial_means_obj, 1) / np.mean(radial_means_obj, 1)
        radial_cv_obj[np.sum(~mask, 1) == 0] = 0
        radial_cv_obj.mask = np.sum(~mask, 1) == 0

        radial_means_img = np.ma.masked_array(radial_values / pixel_count, mask)
        radial_cv_img = np.std(radial_means_img, 1) / np.mean(radial_means_img, 1)
        radial_cv_img[np.sum(~mask, 1) == 0] = 0
        radial_cv_img.mask = np.sum(~mask, 1) == 0

        bin_name = str(bin) if bin > 0 else "Ctr"

        # there's gotta be a better way to collect this stuff together... pandas?
        statistics += [
            (org_name,
                object_name,
                bin_name,
                str(bin_count),
                np.mean(number_at_distance[:, bin]), 
                np.mean(histogram_cmsk[:, bin]), 
                np.mean(histogram_org[:, bin]), 
                np.mean(histogram_img[:, bin]), 
                np.mean(radial_cv_cmsk) ,
                np.mean(radial_cv_obj) ,
                np.mean(radial_cv_img) )
        ]

    # TODO:  construct pictures of the histogram levels mapped to the bin_indexes array
    stats_tab = pd.DataFrame(statistics,columns=['organelle','mask','bin','n_bins','n_pix','cm_vox_cnt','org_vox_cnt','org_intensity','cm_radial_cv','org_radial_cv','img_radial_cv'])

    return stats_tab, bin_indexes



def create_masked_depth_projection(img_in:np.ndarray, mask:Union[np.ndarray, None]=None, to_bool:bool=True) -> np.ndarray:
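    """
    Create a masked projection onto the Z dimension: sum over Y and X for each Z slice.
    If `to_bool` is True the input is binarized first, so the result is a voxel count per slice;
    otherwise the result is the summed intensity per slice.
    """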
    img_out = img_in.astype(bool) if to_bool else img_in
    if mask is not None:
        img_out = apply_mask(img_out, mask)
    
    return img_out.sum(axis=(1,2))


def get_depth_stats(        
        cellmask_obj: np.ndarray,
        organelle_mask: np.ndarray,
        organelle_obj:np.ndarray,
        organelle_img: np.ndarray,
        organelle_name: str,
        nuclei_obj: Union[np.ndarray, None],
        ):
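    """
    Summarize the distribution along Z for a single cell: per-slice voxel counts of the
    cellmask, the organelle and (optionally) the nucleus, plus the summed organelle intensity.
    Returns a DataFrame with one row per Z slice.
    """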

    # flattened
    cellmask_proj = create_masked_depth_projection(cellmask_obj)
    org_proj = create_masked_depth_projection(organelle_obj,organelle_mask.astype(bool))
    img_proj = create_masked_depth_projection(organelle_img,organelle_mask.astype(bool), to_bool=False)

    nucleus_proj = create_masked_depth_projection(nuclei_obj,cellmask_obj.astype(bool)) if nuclei_obj is not None else None


    stats_tab = pd.DataFrame({'organelle':organelle_name,
                              'mask':'cell',
                              'bin':range(cellmask_obj.shape[0]),
                              'n_bins':cellmask_obj.shape[0],
                              'cm_vox_cnt':cellmask_proj,
                              'org_vox_cnt':org_proj,
                              'org_intensity':img_proj,
                              'nuc_vox_cnt':nucleus_proj})
    return stats_tab
    

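# Zernike routines, inspired by CellProfiler but heavily simplified
def zernicke_stat(pixels,z):
    """
    Compute the Zernike moments of the 2D array `pixels` against the stack of
    polynomials `z` (shape: rows x cols x n_moments).
    Returns (magnitude, phase); magnitudes are normalized by `pixels.sum()`.
    """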
    vr = np.sum(pixels[:,:,np.newaxis]*z.real, axis=(0,1))
    vi = np.sum(pixels[:,:,np.newaxis]*z.imag, axis=(0,1))    
    magnitude = np.sqrt(vr * vr + vi * vi) / pixels.sum()
    phase = np.arctan2(vr, vi)
    # return {"zer_mag": magnitude, "zer_phs": phase}
    return magnitude, phase



def zernike_polynomial(labels, zernike_is):
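    """
    Evaluate the Zernike polynomials listed in `zernike_is` on the pixel grid of `labels`,
    translated and scaled to the minimum enclosing circle of the first labeled object.
    """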
    # First, get a table of centers and radii of minimum enclosing
    # circles for the cellmask
    ij, r = centrosome.cpmorphology.minimum_enclosing_circle( labels )
    # Then compute x and y, the position of each labeled pixel
    # within a unit circle around the object
    iii, jjj = np.mgrid[0 : labels.shape[0], 0 : labels.shape[1]]

    # translate+scale
    iii = (iii-ij[0][0] ) / r
    jjj = (jjj-ij[0][1] ) / r

    z = centrosome.zernike.construct_zernike_polynomials(
        iii, jjj, zernike_is
    )
    return z
 

   
def get_zernike_stats(        
        cellmask_proj: np.ndarray,
        org_proj:np.ndarray,
        img_proj: np.ndarray,
        organelle_name: str,
        nucleus_proj: Union[np.ndarray, None] = None,
        zernike_degree: int = 9
        ):

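    """
    Compute Zernike magnitude and phase for the 2D projections of the cellmask, organelle
    objects, nucleus and organelle intensity of a single cell.
    Returns a DataFrame with one row per (n, m) moment.
    """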

    nobjects = 1
    labels = label(cellmask_proj>0) #extent as 0,1 rather than bool
    zernike_indexes = centrosome.zernike.get_zernike_indexes( zernike_degree + 1)


    z = zernike_polynomial(labels, zernike_indexes)

    z_cm = zernicke_stat(cellmask_proj, z)
    z_org = zernicke_stat(org_proj, z)
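    # nucleus_proj defaults to None, so guard against it before indexing
    z_nuc = zernicke_stat(nucleus_proj, z) if nucleus_proj is not None else (None, None)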
    z_img = zernicke_stat(img_proj, z)


    nm_labels = [f"{n}_{m}" for (n, m) in (zernike_indexes)]


    stats_tab = pd.DataFrame({'organelle':organelle_name,
                              'mask':'cellmask',
                              'zernike_n':zernike_indexes[:,0],
                              'zernike_m':zernike_indexes[:,1],
                                'cm_zer_mag':z_cm[0],
                                'cm_zer_phs':z_cm[1],   
                                'obj_zer_mag':z_org[0],
                                'obj_zer_phs':z_org[1],
                                'nuc_zer_mag':z_nuc[0],
                                'nuc_zer_phs':z_nuc[1],
                                'img_zer_mag':z_img[0],
                                'img_zer_phs':z_img[1]}
                                )


    return stats_tab


# untested 2D version
def get_summary_stats_2D(input_labels, intensity_img, mask):
    """collect summary stats (2D counterpart of the volumetric version)"""

    # in case we sent a boolean mask (e.g. cyto, nucleus, cellmask)
    input_labels = _assert_uint16_labels(input_labels)

    # mask
    input_labels = apply_mask(input_labels, mask)

    properties = ["label"]
    # add intensity:
    properties = properties + ["max_intensity", "mean_intensity", "min_intensity"]

    # arguments must be in the specified order, matching regionprops
    def standard_deviation_intensity(region, intensities):
        return np.std(intensities[region])

    extra_properties = [standard_deviation_intensity]

    # add area
    properties = properties + ["area", "equivalent_diameter"]
    #  position:
    properties = properties + ["centroid", "bbox"]

    #  perimeter:
    properties = properties + ["perimeter", "perimeter_crofton"]

    rp = regionprops(input_labels, intensity_image=intensity_img, extra_properties=extra_properties)
    props = _my_props_to_dict(
        rp, input_labels, intensity_image=intensity_img, properties=properties, extra_properties=extra_properties
    )
    props_table = pd.DataFrame(props)
    #  # ETC.  skeletonize via cellprofiler /Users/ahenrie/Projects/Imaging/CellProfiler/cellprofiler/modules/morphologicalskeleton.py
    #             y_data = skimage.morphology.skeletonize(x_data)
    # /Users/ahenrie/Projects/Imaging/CellProfiler/cellprofiler/modules/measureobjectskeleton.py
    return props_table, rp
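
The anisotropy step in the radial-distribution routine above splits each cell into eight wedges around a center and takes the coefficient of variation (CV) of the wedges' mean values. The following is a minimal NumPy-only sketch of that idea, not the routine itself: `wedge_index` and `wedge_cv` are hypothetical helper names, a single fixed center is assumed, and `np.bincount` stands in for the sparse `coo_matrix` accumulation.

```python
import numpy as np


def wedge_index(shape, center):
    """Assign every pixel to one of eight wedges (0..7) around `center`."""
    i, j = np.mgrid[0 : shape[0], 0 : shape[1]]
    di = i - center[0]
    dj = j - center[1]
    imask = di > 0                      # below the center row
    jmask = dj > 0                      # right of the center column
    absmask = np.abs(di) > np.abs(dj)   # closer to the vertical axis than the horizontal
    return imask.astype(int) + jmask.astype(int) * 2 + absmask.astype(int) * 4


def wedge_cv(values, wedges, mask):
    """CV (std / mean) of the per-wedge mean of `values` inside `mask`."""
    sums = np.bincount(wedges[mask], weights=values[mask], minlength=8)
    counts = np.bincount(wedges[mask], minlength=8)
    occupied = counts > 0               # ignore wedges with no pixels
    means = sums[occupied] / counts[occupied]
    mean_of_means = np.mean(means)
    return float(np.std(means) / mean_of_means) if mean_of_means > 0 else 0.0


# Toy data (assumption): a 21x21 field with the center at (10, 10)
wedges = wedge_index((21, 21), (10, 10))
everything = np.ones((21, 21), dtype=bool)

uniform = np.ones((21, 21))
ramp = np.tile(np.arange(21, dtype=float), (21, 1))  # brighter toward the right

cv_uniform = wedge_cv(uniform, wedges, everything)   # evenly spread signal
cv_ramp = wedge_cv(ramp, wedges, everything)         # lopsided signal
print(cv_uniform, cv_ramp)
```

An evenly spread signal gives a CV of zero, while a signal concentrated on one side of the center gives a positive CV.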


In [31]:
cellmask_proj = create_masked_depth_projection(cellmask_obj)
org_proj = create_masked_depth_projection(organelle_obj,organelle_mask.astype(bool))
img_proj = create_masked_depth_projection(organelle_img,organelle_mask.astype(bool), to_bool=False)

nucleus_proj = create_masked_depth_projection(nuclei_obj,cellmask_obj.astype(bool)) if nuclei_obj is not None else None
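
As a rough illustration of what the depth projection computes, here is a self-contained sketch on a toy (Z, Y, X) volume. It assumes that `apply_mask` zeroes everything outside the mask, which is approximated here with `np.where`; `depth_projection` is a hypothetical stand-in, not the library function.

```python
import numpy as np


def depth_projection(img, mask=None, to_bool=True):
    """Sum over Y and X for each Z slice, optionally binarizing and masking first."""
    out = img.astype(bool) if to_bool else img
    if mask is not None:
        out = np.where(mask, out, 0)  # assumption: stand-in for apply_mask
    return out.sum(axis=(1, 2))


# Toy volume: 3 slices of 4x4 pixels
vol = np.zeros((3, 4, 4), dtype=np.uint16)
vol[1, 1:3, 1:3] = 7   # four voxels of intensity 7 in slice 1
vol[2, 0, 0] = 5       # one voxel of intensity 5 in slice 2

mask = np.zeros(vol.shape, dtype=bool)
mask[1] = True         # keep slice 1 only

counts = depth_projection(vol)                          # voxel count per slice
masked_counts = depth_projection(vol, mask)             # voxel count inside the mask
intensity = depth_projection(vol, mask, to_bool=False)  # summed intensity inside the mask

print(counts.tolist())         # [0, 4, 1]
print(masked_counts.tolist())  # [0, 4, 0]
print(intensity.tolist())      # [0, 28, 0]
```

With `to_bool=True` the result plays the role of the `*_vox_cnt` columns, and with `to_bool=False` it plays the role of `org_intensity`.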


In [32]:
z_stats

Unnamed: 0,organelle,mask,bin,n_bins,cm_vox_cnt,org_vox_cnt,org_intensity,nuc_vox_cnt
0,golgi,cellmask,0,15,0,0,0,0
1,golgi,cellmask,1,15,149379,146,55451513,17669
2,golgi,cellmask,2,15,121726,1086,64558080,24205
3,golgi,cellmask,3,15,109521,1941,77464132,29056
4,golgi,cellmask,4,15,99450,1777,81142030,31212
5,golgi,cellmask,5,15,94083,2364,85255502,32536
6,golgi,cellmask,6,15,90942,2512,79605412,32729
7,golgi,cellmask,7,15,93625,2241,72520796,32333
8,golgi,cellmask,8,15,90628,2948,72623683,30862
9,golgi,cellmask,9,15,85107,2635,67477771,27104


In [33]:
n_files = dump_projection_stats(organelle_names, organelles,intensities, cellmask_obj, nuclei_obj, cyto_mask, out_data_path, source_file)

dumped 7x2 csvs


In [34]:
n_files

7

In [35]:
organelle_names

['nuclei', 'lyso', 'mitochondria', 'golgi', 'peroxisome', 'er', 'lipid']



## Zernike distributions
- Get the magnitude and phase of each Zernike moment.
- The Zernike features characterize the distribution of intensity across the object. For instance, Zernike 1,1 has a high value if the intensity is low on one side of the object and high on the other. The Zernike magnitude feature records the rotationally invariant magnitude of the moment, and the Zernike phase feature gives the moment's orientation.

`zernike_degree` (default = 9) chooses how many moments to calculate.


The logic was borrowed from CellProfiler, but the algorithm is greatly simplified by assuming that all estimates are made over a single cellmask (single cell).
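
To make the magnitude and phase description concrete, here is a minimal NumPy-only sketch for the single moment Zernike 1,1. It does not use `centrosome`: the polynomial is written out by hand as `x + i*y` on a unit disk, its normalization may differ from the library's, and the helper names (`unit_disk_grid`, `moment`) are hypothetical. The magnitude and phase formulas follow `zernicke_stat` above.

```python
import numpy as np


def unit_disk_grid(size):
    """Pixel coordinates scaled so the inscribed circle is the unit disk."""
    c = (size - 1) / 2.0
    i, j = np.mgrid[0:size, 0:size]
    y = (i - c) / c
    x = (j - c) / c
    return x, y, (x**2 + y**2) <= 1.0


def moment(pixels, z):
    """Magnitude and phase of one moment, as in zernicke_stat above."""
    vr = np.sum(pixels * z.real)
    vi = np.sum(pixels * z.imag)
    magnitude = np.sqrt(vr * vr + vi * vi) / pixels.sum()
    phase = np.arctan2(vr, vi)  # same argument order as zernicke_stat
    return magnitude, phase


x, y, disk = unit_disk_grid(65)
z11 = np.where(disk, x + 1j * y, 0)  # Zernike 1,1: r * exp(i * theta), zero outside the disk

uniform = disk.astype(float)          # evenly filled disk
ramp = np.where(disk, x + 1.0, 0.0)   # dim on the left, bright on the right
ramp_rot = np.rot90(ramp)             # same pattern rotated by 90 degrees

mag_uniform, _ = moment(uniform, z11)
mag_ramp, phs_ramp = moment(ramp, z11)
mag_rot, phs_rot = moment(ramp_rot, z11)

print(mag_uniform, mag_ramp, mag_rot)
print(phs_ramp, phs_rot)
```

The evenly filled disk gives a magnitude near zero, the lopsided disk gives a clearly non-zero magnitude that is unchanged by rotation, and only the phase changes when the pattern is rotated.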

In [38]:


stats_table,z = get_zernike_stats(        
                            cellmask_obj,
                            organelle_mask,
                            organelle_obj,
                            organelle_img,
                            organelle_name,
                            nuclei_obj,
                            zernike_degree=9
                            )




In [39]:
stats_table

Unnamed: 0,organelle,mask,zernike_n,zernike_m,cm_zer_mag,cm_zer_phs,obj_zer_mag,obj_zer_phs,nuc_zer_mag,nuc_zer_phs,img_zer_mag
0,golgi,cellmask,0,0,1.0,1.570796,1.0,1.570796,1.0,1.570796,1.570796
1,golgi,cellmask,1,1,0.353801,2.243572,0.545888,2.246698,0.290442,2.234844,2.437413
2,golgi,cellmask,2,0,0.445706,-1.570796,0.267229,-1.570796,0.769744,-1.570796,-1.570796
3,golgi,cellmask,2,2,0.111563,-2.901157,0.242907,2.807452,0.08095,2.913656,-2.824003
4,golgi,cellmask,3,1,0.33181,-0.989106,0.468878,-0.732337,0.457448,-0.908659,-0.700925
5,golgi,cellmask,3,3,0.02672,-1.161429,0.071244,3.033166,0.022256,-2.747794,-1.650807
6,golgi,cellmask,4,0,0.027831,1.570796,0.27892,-1.570796,0.417359,1.570796,-1.570796
7,golgi,cellmask,4,2,0.12842,0.073232,0.368215,-0.000182,0.187584,-0.221252,0.341712
8,golgi,cellmask,4,4,0.029116,0.556027,0.039601,1.902211,0.006403,-2.271546,-0.048984
9,golgi,cellmask,5,1,0.138867,1.953233,0.092661,-2.880913,0.4494,2.226373,2.364811
