Necessary imports

In [1]:
# top level imports
from pathlib import Path
import os, sys
from typing import Optional, Union, Tuple, List, Any
import numpy as np
import itertools

from infer_subc.core.file_io import (read_tiff_image,
                                        list_image_files,
                                        read_czi_image)
from infer_subc.core.img import apply_mask
from skimage.morphology import skeletonize

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

from scipy import stats
from scipy.sparse import csgraph
from infer_subc.quantification.stats import *
from infer_subc.quantification.stats import _assert_uint16_labels

# for the sake of staying within the notebook, I will copy the functions directly

# from infer_subc.quantification.stats_helpers import make_dict, multi_contact, inkeys
from scipy.sparse._coo import coo_matrix
from infer_subc.utils.batch import find_segmentation_tiff_files
from datetime import datetime
import string

from skan import csr
import time
from datetime import datetime

If using the sample data, these are the correct constants

Neuron 1: 

> ```python
> LD_CH = 0
> ER_CH = 1
> GOLGI_CH = 2
> LYSO_CH = 3
> MITO_CH = 4
> PEROX_CH = 5
> ```

Astrocyte: 

> ```python
> LD_CH = 0
> ER_CH = 1
> GOLGI_CH = 2
> LYSO_CH = 3
> MITO_CH = 4
> PEROX_CH = 5
> ```

Neuron 2: 

> ```python
> LD_CH = 6
> ER_CH = 0
> GOLGI_CH = 2
> LYSO_CH = 4
> MITO_CH = 3
> PEROX_CH = 1
> ```

IPSC: 

> ```python
> LD_CH = 0
> ER_CH = 6
> GOLGI_CH = 4
> LYSO_CH = 2
> MITO_CH = 3
> PEROX_CH = 5
> ```

In [2]:
# Establishing the constants
LD_CH = 0
ER_CH = 1
GOLGI_CH = 2
LYSO_CH = 3
MITO_CH = 4
PEROX_CH = 5

region_names = ['nuc', 'cell']

# ***SKELETONIZATION STATS NOTEBOOK ☠️***

-----

## **OBJECTIVE**

After quantifying the organelle and region segmentations, we can also skeletonize these segmentations for additional morphological analysis. The purpose of this notebook is to break down infer-subc's skeletonization analysis in real time, step-by-step. As quoted by Dakai Jin et al. in their work titled *Skeletonization*, “Skeletonization provides a simple yet compact representation of an object while capturing its essential topologic and geometric features”. It can also be used to identify and distinguish organelle shapes. Supplementary metrics, including those related to skeleton shape, length, complexity, and connectivity, are added to the quantification output. Throughout the notebook, skeletons will be analyzed in a fashion that bears some resemblance to trees or node graphs in computer science. Therefore, some of the terminology is shared with these data science concepts. However, there are various nuanced differences; some terms will have an altered definition to match the needs of infer-subc.

We aim to describe in depth the morphology and distribution of the organelle skeletons in three-dimensional anisotropic data, building off of [Skan](https://skeleton-analysis.org/stable/index.html) (skeletonization python package) while also introducing new concepts.

## **Key Terms**

**These are terms and defenitions relating to skeletonization that will be used throughout the notebook**

> **Node Related Terms**

`Point`
- Any voxel/pixel in the skeleton; each point is given an id in skan.

`Connectivity`
- A value that describes the number of neighboring points a voxel/pixel has

`Node`
- A point in a skeleton where at least one branch branch begins or ends (i.e. any point that does not have a connectivity of 2)

`Endpoint`
- a type of node that is either the beginning or end of a singular branch, all endpoints have a connectivity of one

`Junction Node`
- a type of node that is present in multiple branches, functioning as the beginning and or end for each branch it is involved with. All junction nodes have connectivity of at least 3

`Path Point`
- A point that is not a node; all path points have a connectivity of two

`node-id`
- A numeric identification number given to every point in the skeleton, (path points do have node-ids)

`Source Node`
- In theory the start of a branch. In practice, this is automatically assigned to the node with the lower node ID and is not based on direction of any kind. There is no inherent difference between a source node and a destination node.

`Destination Node`
- In theory, the end of a branch. In practice this is automatically assigned to the node with the higher node ID and is not based on direction of any kind. There is no inherent difference between a source node and a destination node.

> **Branch Related Terms**

`Branch`
- a direct pathway between two nodes (or the same node if the branch is a cycle)

`Type 0 Branch`
- a branch that starts at an endpoint and ends at an endpoint (Endpoint to Endpoint), and thus will be isolated

`Type 1 Branch`
- a branch with one endpoint and one junction node as the source and destination nodes regardless of order (Junction to Endpoint)

`Type 2 Branch`
- a branch that both starts and ends at junction nodes (Junction to Junction)

`Type 3 Branch`
- A branch that begins at a point and cycles back to the same point (Cycle). There are two types of Cycles: a connected cycle has a singular junction node that is shared with another branch (for a simple example, imagine a lollipop). An isolated cycle is exclusively composed of path points, thus having zero nodes.

`Path`
- A node to node pathway within a skeleton object (can be the same node). The pathway can include intermediate nodes in any order possible by the structure of the skeleton object. Thus, there is technically an infinite amount of paths within a skeleton object with multiple nodes. This also means that every branch is a path but not vice versa.

`Main Path`
- The longest direct path in a skeleton object between two nodes; there can be intermediate nodes but they must not repeat

> **Skeleton Object Related Terms**

`Skeleton`
- a thinned representation of a shape that reveals equidistant lines with respect to the shape's surface

`Organelle Skeleton`
- the skeletonization of the aggregate organelle segmentation

`Organelle Object`
- a component of the organelle segmentation in which all voxels are connected via contact.

`Skeleton Object`
- The skeletonization of an organelle object (in some rare cases, an organelle object can spawn multiple non connected skeleton components, these components will still be observed as one skeleton object). Because of this there is a one to one correspondence between organelle objects and skeleton objects (skeleton object IDs are identical to the organelle object they represent).

`Punctate`
- A type of skeleton object usually representing a blob-like or spherical organelle object. Absolute Punctates fall under this category; however, a skeleton object consisting of one endpoint to endpoint (Type 0) branch can also be deemed a punctate if the branch length is under a predetermined threshold. This is due to some spherical objects returning skeletons that are not absolute punctates, but endpoint to endpoint branches of minimal length.

`Absolute Punctate`
- a type of skeleton subobject only consisting of a singular node and no branches, all absolute punctates have a connectivity of zero

`Rod`
- A type of skeleton object consisting of one endpoint to endpoint (Type 0) branch of sufficient length (longer than the punctate threshold)

`Isolated Cycle`
-  A type of skeleton object consisting of one cycle (type 3) branch 

`Network`
- A type of skeleton object consisting of multiple branches


---
**Assumptions**

- Each organelle object has **one** skeleton object (one-to-one correspondance); in the case where a skeleton object is not created
an absolute punctate will be placed in the center of the organelle object (this issue is only seen with blob-like objects)

- Even if two organelle objects are in contact, their skeleton objects will be identified as **seperate** skeleton objects

- There is **no inherit difference** between a source node and destination node !Also write about why they are labled this way!

- The only circumstance of when a skeleton object will contain **zero nodes**, is in the case of an **isolated cycle** (as they will all be path points)

- In the rare case that an organelle object is skeletonized and has **multiple disconnected branches**, the object will still be
identified as a **network**
---

## summary of steps

➡️ **INPUT**

- setup

    - choose test img to observe (test_img_n = user input)
    - choose organelle to observe (org = user input)

- loading the image and the masks

- creation of the skeleton object

    - collect labeled organelle segmentation
    - define skel_plus(labeled organelle segmentation)
    - csr.Skeleton(labeled skeleton)

📝 **DEFINE MEASUREMENTS**

- branch table
- node table
- skeleton object table
- skeleton summary table

🛠️ **DEFINE FUNCTIONS**

- define skeleton metrics function
- update to get_org_morphology_3D
- update to make_all_metrics_tables
- update to batch_process_quantification
- update to batch_summary_stats

**OUTPUT** ➡️

- branch output
    - branch table column reference
    - branch table
- node output
    - node table column reference
    - node table
- skeleton object output
    - skeleton object table column reference
    - skeleton object table
    - skeleton summary output
    - skeleton summary table column reference
    - skeleton summary table
- get_org_morphology_3D output
    - get_org_morphology_3D reference

The computation and analysis in this notebook was made possible by Juan Nunez-Iglesias & skan contributors via the [Skan](https://skeleton-analysis.org/stable/) python package

###### *all of the skeleton related column labels will be identical in the other stats functions e.g. make_all_metrics_tables, batch_process_quantification (batch_summary_stats will just include summary metrics of these stats)*


## **INPUT**

### **Setup**

#### 🛑 ✍ **User Input Required**:

Please specify the following information about your data: `raw_img_type`, `data_root_path`, `raw_data_path`, `seg_data_path`, and `quant_data_path`.

In [3]:
# Organelle to observe
org = ER_CH

### USER INPUT REQUIRED ###
# If using the sample data, select which cell type you would like analyze ("neuron_1", "astrocyte", "neuron_2" or "ipsc"):
# If not using the sample data, set sample_data_type to None
sample_data_type = "astrocyte"

# If you are not using the sample data, please edit "USER SPECIFIED" as necessary.

## Define the path to the directory that contains the input image folder.
data_root_path = Path("USER SPECIFIED")

# Specify the file type of your raw data that will be analyzed. Ex) ".czi" or ".tiff"
raw_img_type = "USER SPECIFIED"

## Specify which subfolder that contains the input data and the input data file extension
raw_data_path = data_root_path / "USER SPECIFIED"

## Specify which subfolder that contains the segmentations
seg_data_path = data_root_path / "USER SPECIFIED"

## Specify the output folder to save the quantification outputs if.
## If its not already created, the code below will create it for you
quant_data_path = data_root_path / "USER SPECIFIED"
                
# These are the organelles
org_list = ['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
org_list2 = ['Lipid Droplets',
            'Endoplasmic Reticulum',
            'Golgi Apparatus',
            'Lysosomes',
            'Mitochondria',
            'Peroxisomes']

# Organelle name
org_name = org_list[org]

#### 🏃 **Run code; no user input required**

👓 **FYI**:

- A list of the images included in the raw_data_path folder is printed below for easy reference.
- If the quant_data_path folder does not exist, it will be created now.

In [4]:
# For the sake of remaining within the notebook this function will be placed here for now

def sample_input_quant(cell_type: Union[str, None]) -> tuple[Path, str, Path, Path, Path]:
    """
    automatically sets the necessary paths for sample data if cell_type in the quantification
    notebook is set equal to "neuron_1", "astrocyte", "neuron_2" or "ipsc".
    """
    cell_type_list = ["neuron_1","astrocyte","neuron_2","ipsc"]
    
    if cell_type in cell_type_list:

        sd_fol = Path(os.getcwd()).parents[1] / "sample_data"

        data_root_path = sd_fol /  f"example_{cell_type}"

        # Specify the file type of the sample data
        raw_img_type = ".tiff"

        ## Specify which subfolder that contains the input data and the input data file extension
        raw_data_path = data_root_path / "raw"

        ## Specify the location of the segmentations.
        seg_data_path = sd_fol / "example_quant" / "seg"

        # Where to output the quantification
        quant_data_path = sd_fol / "example_quant" / "quant"

        return data_root_path, raw_img_type, raw_data_path, seg_data_path, quant_data_path
    else:
        raise ValueError('Sample data file type must be "neuron_1", "astrocyte", "neuron_2" or "ipsc"')

In [5]:
# If sample_data_type is set to "neuron_1", "astrocyte", "neuron_2" or "ipsc" then the sample data is used and the directories are set
if sample_data_type != None:
    data_root_path, raw_img_type, raw_data_path, seg_data_path, quant_data_path = sample_input_quant(sample_data_type)

# Create the output directory to save the segmentation outputs in.
if not Path.exists(quant_data_path):
    Path.mkdir(quant_data_path)
    print(f"making {quant_data_path}")

# Create a list of the file paths for each image in the input folder. Select test image path.
raw_img_file_list = list_image_files(raw_data_path,raw_img_type)
pd.set_option('display.max_colwidth', None)
pd.DataFrame({"Image Name":raw_img_file_list})

Unnamed: 0,Image Name
0,c:\Users\redre\Documents\CohenLab\scohen_lab_repo\infer-subc\sample_data\example_astrocyte\raw\05052022_astro_control_2_Linear unmixing_0_cmle.ome.tiff


#### 🛑 ✍ **User Input Required**:
Use the list above to specify which image you wish to analyze:

- `test_img_n`: the index, or number, associated with your image of choice from the list above.

Follow this example's formatting

> ```python
> test_img_n = 5
> ```

In [6]:
#### USER INPUT REQUIRED ###
test_img_n = 0

### **Loading the image and masks**

#### 🏃 **Run code; no user input required**
👓 FYI: This code block reads the image and image metadata into memory. Then, the metadata is printed.

In [7]:
# Read in the image and metadata as an ndarray and dictionary from the test image selected above. 
test_img_name = raw_img_file_list[test_img_n]
img_data,meta_dict = read_czi_image(test_img_name)

# Define some of the metadata features.
channel_names = meta_dict['name']
meta = meta_dict['metadata']['aicsimage']
scale = meta_dict['scale']
channel_axis = meta_dict['channel_axis']
file_path = meta_dict['file_name']

print("Metadata information")
print(f"File path: {file_path}")
for i in list(range(len(channel_names))):
    print(f"Channel {i} name: {channel_names[i]}")
print(f"Scale (ZYX): {scale}")
print(f"Channel axis: {channel_axis}")

Metadata information
File path: c:\Users\redre\Documents\CohenLab\scohen_lab_repo\infer-subc\sample_data\example_astrocyte\raw\05052022_astro_control_2_Linear unmixing_0_cmle.ome.tiff
Channel 0 name: 05052022_astro_control_2_Linear unmixing_0_cmle.ome :: Channel:0
Channel 1 name: 05052022_astro_control_2_Linear unmixing_0_cmle.ome :: Channel:1
Channel 2 name: 05052022_astro_control_2_Linear unmixing_0_cmle.ome :: Channel:2
Channel 3 name: 05052022_astro_control_2_Linear unmixing_0_cmle.ome :: Channel:3
Channel 4 name: 05052022_astro_control_2_Linear unmixing_0_cmle.ome :: Channel:4
Channel 5 name: 05052022_astro_control_2_Linear unmixing_0_cmle.ome :: Channel:5
Scale (ZYX): (0.396091, 0.079947, 0.079947)
Channel axis: 0


In [8]:
#### USER INPUT REQUIRED ###
# These two lists must have the SAME corresponding items in the same order
org_file_names = ['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
org_channels_ordered = [LD_CH, ER_CH, GOLGI_CH, LYSO_CH, MITO_CH, PEROX_CH]

regions_file_names = ["nuc", "cell"]
mask_name = "cell"
suffix_separator = "-"

#### &#x1F3C3; **Run code; no user input required**

&#x1F453; **FYI:** This code finds the matching segmentation file from the `seg_file_path` folder.

In [9]:
# find file paths for segmentations
all_suffixes = org_file_names + regions_file_names
filez = find_segmentation_tiff_files(file_path, all_suffixes, seg_data_path, suffix_separator)

# read the segmentation and masks/regions files into memory
organelles = [read_tiff_image(filez[org]) for org in org_file_names]

regions = [read_tiff_image(filez[m]) for m in regions_file_names]

# match the intensity channels to the segmentation files
intensities = [img_data[ch] for ch in org_channels_ordered]

# specifiy the mask image
m = regions_file_names.index(mask_name)
mask = regions[m]

# print paths to matching seg files
print("The following matching files were found:")
filez

The following matching files were found:


{'raw': WindowsPath('c:/Users/redre/Documents/CohenLab/scohen_lab_repo/infer-subc/sample_data/example_astrocyte/raw/05052022_astro_control_2_Linear unmixing_0_cmle.ome.tiff'),
 'LD': WindowsPath('c:/Users/redre/Documents/CohenLab/scohen_lab_repo/infer-subc/sample_data/example_quant/seg/05052022_astro_control_2_Linear unmixing_0_cmle.ome-LD.tiff'),
 'ER': WindowsPath('c:/Users/redre/Documents/CohenLab/scohen_lab_repo/infer-subc/sample_data/example_quant/seg/05052022_astro_control_2_Linear unmixing_0_cmle.ome-ER.tiff'),
 'golgi': WindowsPath('c:/Users/redre/Documents/CohenLab/scohen_lab_repo/infer-subc/sample_data/example_quant/seg/05052022_astro_control_2_Linear unmixing_0_cmle.ome-golgi.tiff'),
 'lyso': WindowsPath('c:/Users/redre/Documents/CohenLab/scohen_lab_repo/infer-subc/sample_data/example_quant/seg/05052022_astro_control_2_Linear unmixing_0_cmle.ome-lyso.tiff'),
 'mito': WindowsPath('c:/Users/redre/Documents/CohenLab/scohen_lab_repo/infer-subc/sample_data/example_quant/seg/05052

###### The dimensions of the image in the real world will be taken into account as visualization more effectively depicts the cell as it was when the images were taken.

In [10]:
# This code is used to get the real world dimensions of the data !change name of dim variable for consistency!
scale = meta_dict['scale']

## **Creation of the skeleton object**

In [11]:
# This is the organelle segmentation
org_seg = organelles[org_channels_ordered.index(org)]

In [12]:
if org == ER_CH:
    org_seg = (org_seg > 0).astype(np.uint16)

In [13]:
def skel_plus(segmentation: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    ''' A function that generates punctate objects for the round organelle objects that lack a skeleton.

    As of 7/7/24 the original `skeletonize()` function from `skimage` labels the voxels of the skeleton with the same label,
    however the label serves no purpose to us as of now. 
    The purpose of this function is to: 

    1) Establish punctate objects in the center of organelle objects that for some reason lack a skeleton object (if necessary)
    * these objects are found to be round/spherical in every case this happens, so a punctate is appropriate
    2) Return a skeleton with float labels (due to skan requiring this) corresponding to its location in the original segmentation 
    3) Return a boolean skeleton for input in computation
    '''

    t0 = time.time()

    # This is the raw organelle skeleton, some fixing and relabeling has to be done before we can use the skeleton for computation
    skeleton = skeletonize(segmentation.astype(bool)).astype(bool)

    # All of the organelle object labels
    all_lab = set(pd.unique(segmentation.ravel()))

    # Applying the segmentation labels to the skeleton
    lab_skel = _assert_uint16_labels(skeleton * segmentation)

    # Labels present in the skeleton
    skel_lab = set(pd.unique(lab_skel.ravel()))

    t1 = time.time()

    print(f'Part One took {t1 - t0} sec(s)')

    # Checker to see if there are any objects without a skeleton
    if all_lab == skel_lab:

        t2 = time.time()
        print(f'Total Time: {t2 - t0} sec(s)')

        return lab_skel.astype(float), skeleton
    else:
        # gets a list of the missing labels
        mis_lab = all_lab - skel_lab

        t2 = time.time()
        print(f'Part Two took {t2 - t1} sec(s)')

        for label in mis_lab:
            # list of coordinates of the object's voxels
            coord_list = np.nonzero(segmentation == label)

            # The coordinate closest to the middle of the object (due to rounding)
            av_coord = np.round(np.mean(coord_list,axis = 1)).astype(int)

            #checker and result
            if segmentation[tuple(av_coord)] == label:
                lab_skel[tuple(av_coord)] = label
            else:
                print("Apperently the centermost point is not in the object???? :(")
                break

        t3 = time.time()
        print(f'Part Three took {t3 - t2} sec(s)')
        print(f'Total time: {t3 - t0}')
        
        return (lab_skel.astype(float), lab_skel.astype(bool))

In [14]:
org_skel_arr, org_skel_arr2 = skel_plus(org_seg)

Part One took 5.041286945343018 sec(s)
Total Time: 5.041286945343018 sec(s)


###### The computations will be done using the object below of the skeleton class via Skan

In [15]:
import napari
viewer = napari.Viewer()
viewer.add_image(org_seg,
                 scale = scale)

<Image layer 'org_seg' at 0x12b3dc799f0>

In [16]:
org_skel = csr.Skeleton(
    skeleton_image = org_skel_arr,
    spacing = scale,
    value_is_height = False)

## **DEFINE MEASUREMENTS**


### **Branch Table**

In [17]:
# This is a modified version of the summarize function optimized for the workflow

summary = {}
value_is_height = False
ndim = org_skel.coordinates.shape[1]
### using this .indptr method I could single out the nodes even faster
endpoints_src = org_skel.paths.indices[org_skel.paths.indptr[:-1]]
endpoints_dst = org_skel.paths.indices[org_skel.paths.indptr[1:] - 1]

# Because the values of the path points are the organelle object ids (all path points and nodes 
# in a branch should come from the same organelle object)
if not np.any(org_skel.path_stdev()):
    # checker to see if all path points and nodes come from the same object
    summary['skel-obj-id'] = org_skel.path_means().astype(int)
else:
    raise ValueError("at least one branch came from different organelle objects")
summary['node-id-src'] = endpoints_src
summary['node-id-dst'] = endpoints_dst
deg_src = org_skel.degrees[endpoints_src]
deg_dst = org_skel.degrees[endpoints_dst]
summary['deg_src'] = deg_src
summary['deg_dst'] = deg_dst
summary['branch-distance'] = org_skel.path_lengths()
kind = np.full(deg_src.shape, 2)  # default: junction-to-junction
kind[(deg_src == 1) | (deg_dst == 1)] = 1  # tip-junction
kind[(deg_src == 1) & (deg_dst == 1)] = 0  # tip-tip
kind[endpoints_src == endpoints_dst] = 3  # cycle
summary['branch-type'] = kind
for i in range(ndim):  # keep loops separate for best insertion order
    summary[f'image-coord-src-{i}'] = org_skel.coordinates[endpoints_src, i]
for i in range(ndim):
    summary[f'image-coord-dst-{i}'] = org_skel.coordinates[endpoints_dst, i]
coords_real_src = org_skel.coordinates[endpoints_src] * org_skel.spacing
for i in range(ndim):
    summary[f'coord-src-{i}'] = coords_real_src[:, i]
coords_real_dst = org_skel.coordinates[endpoints_dst] * org_skel.spacing
for i in range(ndim):
    summary[f'coord-dst-{i}'] = coords_real_dst[:, i]
summary['euclidean-distance'] = (
        np.sqrt((coords_real_dst - coords_real_src)**2
                @ np.ones(ndim + int(value_is_height)))
        )

summary['str-prop'] = summary['euclidean-distance'] / summary['branch-distance']
branch_table = pd.DataFrame(summary)

### **Node Table**

In [18]:
# A dictionary that reveals the branches a node is referenced in
# If a node is an absolute punctate it will return "NaN"
node2branches = dict()
node2obj = dict()

for branch in range(org_skel.n_paths):
    obj = branch_table.loc[branch]['skel-obj-id']
    for point in org_skel.path(branch):
        if org_skel.degrees[point] != 2:
            node2obj[point] = obj
            try:
                node2branches[point] += [branch]
            except:
                node2branches[point] = [branch]

# the point ids for the points with zero connectivity
abs_punc_ids = np.arange(len(org_skel.degrees))[org_skel.degrees == 0]
abs_punc_obj = (org_skel_arr[org_skel_arr2][abs_punc_ids]).astype(int)

for i,punc_id in enumerate(abs_punc_ids):
    node2branches[punc_id] = []
    node2obj[punc_id] = abs_punc_obj[i]

In [19]:
nodes = list(node2branches)

node_lab = []
base_n = ["Abs Punctate",
        "Endpoint",
        "Path Point"]

if np.max(org_skel.degrees) <= 2:
    node_lab = base_n[0:len(pd.unique(org_skel.degrees))]
else:
    node_lab = base_n
    for i in np.arange(3, np.max(org_skel.degrees) + 1):
        node_lab += [f"{i}-way"]
    
node_table_data = {
    "node-id": nodes,
    "node-type": np.array(node_lab)[org_skel.degrees[nodes]],
    "connectivity": org_skel.degrees[nodes],
    "image-coord-0": org_skel.coordinates[nodes,0],
    "image-coord-1": org_skel.coordinates[nodes,1],
    "image-coord-2": org_skel.coordinates[nodes,2],
    "coord-0": org_skel.coordinates[nodes,0] * scale[0],
    "coord-1": org_skel.coordinates[nodes,1] * scale[1],
    "coord-2": org_skel.coordinates[nodes,2] * scale[2],
    "branch-id(s)": [node2branches[key] for key in nodes],
    'skel-obj-id': [int(node2obj[key]) for key in nodes]
}

node_table = pd.DataFrame(data=node_table_data)

### **Skeleton Object Table**

In [20]:
# This is to prevent the workflow acting as if the absolute punctates are their own skeleton objects
if org_name != 'ER':
    obj_list = list(branch_table.groupby('skel-obj-id').groups) + list(abs_punc_obj)
else:
    obj_list = list(branch_table.groupby('skel-obj-id').groups)


#### This code was originally used to find the main path within a skeleton object, but this was before the one to one correspondence requirement.
#### Because of this a skeleton object can consist of multiple disconnected components which complicates the code, so for now main paths will
#### For now I will also be removing main paths from the visual section because for cells with a huge amount of branches it makes the workflow
#### take longer

# main_branches = branch_table[branch_table['in-main'] == True]

# main_nodes_1 = dict(main_branches[main_branches['src-deg'] == 1].groupby('skeleton-id').groups)
# main_nodes_2 = dict(main_branches[main_branches['dst-deg'] == 1].groupby('skeleton-id').groups)


# main_nodes = dict((obj,[]) for obj in obj_list)

# for obj in obj_list:
#     main_nodes[obj] = []
#     try:
#         offset1 = list(main_nodes_1[obj])
#         for off in offset1:
#             main_nodes[obj] += [tuple((
#                 main_branches['node-id-src'][off],
#                 main_branches['src-deg'][off]))]
#     except:
#         0
#     try:
#         offset2 = list(main_nodes_2[obj])
#         for off in offset2:
#             main_nodes[obj] += [tuple((
#                 main_branches['node-id-dst'][off],
#                 main_branches['dst-deg'][off]))]
#     except:
#         0
#     # Added this because of isolated cycles that are connected to other branches
#     if len(main_nodes[obj]) < 2:
#         try:
#             offset = list(main_nodes_2[obj])
#             main_nodes[obj] += [tuple((
#                 int(main_branches['node-id-src'][offset].iloc[0]),
#                 int(main_branches['src-deg'][offset].iloc[0])))]
#         except:
#             0
#         try:
#             offset = list(main_nodes_1[obj])
#             main_nodes[obj] += [tuple((
#                 int(main_branches['node-id-dst'][offset].iloc[0]),
#                 int(main_branches['dst-deg'][offset].iloc[0])))]
#         except:
#             0
#     main_nodes[obj] = sorted(main_nodes[obj])
#     # in the case of a isolated cycle that is its own object
#     if len(main_nodes[obj]) < 2:
#         main_nodes[obj] = list(main_nodes[obj]) * 2

In [21]:
# the highest value of length where a type 0 branch will still be considered punctate
p_threshold = min(scale) * 2

#dictionary that keeps track of the number of branches per each object
objfreq = dict(branch_table.groupby("skel-obj-id").count()['branch-distance'])
#dictionary that keeps track of the total length per object
sumobj = dict(branch_table.groupby("skel-obj-id").sum()['branch-distance'])
#dictonary that keeps of the average branch length per object
aveobj = dict(branch_table.groupby("skel-obj-id").mean()['branch-distance'])

obj_class = dict()
obj_class_n = dict()

for obj in objfreq:
    if objfreq[obj] == 1:
        if sumobj[obj] > p_threshold:
            obj_class[obj] = "Rod"
            obj_class_n[obj] = 1
        else:
            obj_class[obj] = "Punctate"
            obj_class_n[obj] = 0
    else:
        obj_class[obj] = "Network"
        obj_class_n[obj] = 2

In [22]:
# for some reason this was faster
skel_obj2branch = dict()
skel_obj2_mean_str = dict(branch_table.groupby('skel-obj-id').mean()['str-prop'])

for branch in range(org_skel.n_paths):
    obj = branch_table['skel-obj-id'][branch]
    try:
      skel_obj2branch[obj] += [branch]
    except:
       skel_obj2branch[obj] = [branch]

if org_name != 'ER':
  for obj in abs_punc_obj:
    skel_obj2branch[obj] = []
    obj_class[obj] = "Punctate"
    obj_class_n[obj] = 0
    skel_obj2_mean_str[obj] = "NaN"
    objfreq[obj] = 0
    aveobj[obj] = 0
    sumobj[obj] = 0

In [23]:
brh_tcounts = dict()
brh_tlist = dict()
for i in range(4):
    brh_tcounts[i] = dict((obj,0) for obj in obj_list)
    brh_tlist[i] = dict((obj,[]) for obj in obj_list)

branch_type_tables = [branch_table[np.array(branch_table["branch-type"]) == i] for i in np.arange(4)]

for i in range(4):
    groupedby_skel = branch_type_tables[i].groupby('skel-obj-id')
    for skel_obj in obj_list:
        try:
            # Although they get the branch row indexes this is fine because the branch row indexes
            # are the branch ids :)
            type_list = list(groupedby_skel.groups[skel_obj])
            brh_tlist[i][skel_obj] = type_list
            brh_tcounts[i][skel_obj] = len(type_list)
        except:
            None

In [24]:
# Code to find the highest node degree in the skeleton object
max_deg = dict([(obj,0) for obj in obj_list])

obj2brh = branch_table.groupby('skel-obj-id').groups
for obj in obj2brh:
    for branch in obj2brh[obj]:
        brh_max = np.max(org_skel.degrees[org_skel.path(branch)])
        if brh_max > max_deg[obj]:
            max_deg[obj] = brh_max

In [25]:
skel_obj2nodes = dict([(obj,[]) for obj in obj_list])
ep_count = dict([(obj,0) for obj in obj_list])
jn_count = dict([(obj,0) for obj in obj_list])
jn_table = node_table[node_table['connectivity'] > 2]
ave_jn_deg = dict([(obj,"NaN") for obj in obj_list])

for obj in obj_list:
    try:
        skel_obj2nodes[obj] = list(node_table['node-id'][node_table.groupby("skel-obj-id").groups[obj]])
    except:
        None
    try:
        ep_count[obj] = node_table[node_table['connectivity'] == 1].groupby('skel-obj-id').count()['connectivity'][obj]
    except:
        None
    try:
        jn_count[obj] = node_table[node_table['connectivity'] > 2].groupby('skel-obj-id').count()['connectivity'][obj]
        if jn_count[obj] > 0:
            ave_jn_deg[obj] = np.mean(jn_table['connectivity'][jn_table.groupby('skel-obj-id').groups[obj]])
    except:
        None
        
skel_table_data = {
        "skel-obj-id": obj_list,
        "skel-type": [obj_class[obj] for obj in obj_list],
        "skel-type-num": [obj_class_n[obj] for obj in obj_list],
        "brh-count": [objfreq[obj] for obj in obj_list],
        "branch-id(s)": [skel_obj2branch[obj] for obj in obj_list],
        "min-brh-length": [branch_table.groupby('skel-obj-id').min()['branch-distance'][obj] if objfreq[obj] != 0 else "NaN" for obj in obj_list],
        "max-brh-length": [branch_table.groupby('skel-obj-id').max()['branch-distance'][obj] if objfreq[obj] != 0 else "NaN" for obj in obj_list],
        "ave-brh-length": [aveobj[obj] for obj in obj_list],
        # the requirement is changed to > 1 because the sd of a single value is undefined
        "sd-brh-length": [branch_table.groupby('skel-obj-id').std()['branch-distance'][obj] if objfreq[obj] > 1 else "NaN" for obj in obj_list],
        "med-brh-length" : [branch_table.groupby('skel-obj-id').median()['branch-distance'][obj] if objfreq[obj] != 0 else "NaN" for obj in obj_list],
        "total-length": [sumobj[obj] for obj in obj_list],
        "brh-type-0-tot": [brh_tcounts[0][obj] for obj in obj_list],
        "brh-type-0-id": [brh_tlist[0][obj] for obj in obj_list],
        "brh-type-1-tot": [brh_tcounts[1][obj] for obj in obj_list],
        "brh-type-1-ids": [brh_tlist[1][obj] for obj in obj_list],
        "brh-type-2-tot": [brh_tcounts[2][obj] for obj in obj_list],
        "brh-type-2-ids": [brh_tlist[2][obj] for obj in obj_list],
        "brh-type-3-tot": [brh_tcounts[3][obj] for obj in obj_list],
        "brh-type-3-ids": [brh_tlist[3][obj] for obj in obj_list],
        "node-count": [len(skel_obj2nodes[obj]) for obj in obj_list],
        'ep-count': [ep_count[obj] for obj in obj_list],
        'jn-count' : [jn_count[obj] for obj in obj_list],
        'ave-jn-deg' : [ave_jn_deg[obj] for obj in obj_list],
        'max-deg' : [max_deg[obj] for obj in obj_list],
        "node-id(s)": [skel_obj2nodes[obj] for obj in obj_list],
        "mean-brh-str": [skel_obj2_mean_str[obj] for obj in obj_list]}
    

skel_table = pd.DataFrame(data=skel_table_data)

### **Skeleton Summary Table**

In [26]:
skel_sum_table_data = {

    'total-length': np.sum(org_skel.path_lengths()),
    'point-count': org_skel.graph.shape[0],
    # -- Skeleton Object section -- #

    'skel-obj-count': len(skel_table),
    'punc-count': np.sum(skel_table['skel-type-num'] == 0),
    'rod-count': np.sum(skel_table['skel-type-num'] == 1),
    'net-count': np.sum(skel_table['skel-type-num'] == 2),
    'prop-obj-punc': np.sum(skel_table['skel-type-num'] == 0) / len(skel_table),
    'prop-obj-rod': np.sum(skel_table['skel-type-num'] == 1) / len(skel_table),
    'prop-obj-net': np.sum(skel_table['skel-type-num'] == 2) / len(skel_table),
    'punc-tot-len': np.sum(skel_table[skel_table['skel-type-num'] == 0]['total-length']),
    'rod-tot-len': np.sum(skel_table[skel_table['skel-type-num'] == 1]['total-length']),
    'net-tot-len': np.sum(skel_table[skel_table['skel-type-num'] == 2]['total-length']),
    'prop-len-punc': np.sum(skel_table[skel_table['skel-type-num'] == 0]['total-length']) / np.sum(org_skel.path_lengths()),
    'prop-len-rod': np.sum(skel_table[skel_table['skel-type-num'] == 1]['total-length']) / np.sum(org_skel.path_lengths()),
    'prop-len-net': np.sum(skel_table[skel_table['skel-type-num'] == 2]['total-length']) / np.sum(org_skel.path_lengths()),
    'ave-len-obj': np.mean(skel_table['total-length']),
    'min-len-obj': np.min(skel_table['total-length']),
    'max-len-obj': np.max(skel_table['total-length']),
    'ave-brh-obj': np.mean(skel_table['brh-count']),
    'min-brh-obj': np.min(skel_table['brh-count']),
    'max-brh-obj': np.max(skel_table['brh-count']),

    # -- Branch section -- #

    'brh-count': org_skel.n_paths,
    'min-brh-len': np.min(org_skel.path_lengths()),
    'max-brh-len': np.max(org_skel.path_lengths()),
    'ave-brh-len': np.mean(org_skel.path_lengths()),
    'type-0-brhs': len(branch_table[branch_table['branch-type'] == 0]),
    'type-1-brhs': len(branch_table[branch_table['branch-type'] == 1]),
    'type-2-brhs': len(branch_table[branch_table['branch-type'] == 2]),
    'type-3-brhs': len(branch_table[branch_table['branch-type'] == 3]),
    'prop-brh-t0': len(branch_table[branch_table['branch-type'] == 0]) / len(branch_table),
    'prop-brh-t1': len(branch_table[branch_table['branch-type'] == 1]) / len(branch_table),
    'prop-brh-t2': len(branch_table[branch_table['branch-type'] == 2]) / len(branch_table),
    'prop-brh-t3': len(branch_table[branch_table['branch-type'] == 3]) / len(branch_table),
    't0-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 0]),
    't1-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 1]),
    't2-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 2]),
    't3-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 3]),
    'prop-len-t0': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 0]) / np.sum(org_skel.path_lengths()),
    'prop-len-t1': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 1]) / np.sum(org_skel.path_lengths()),
    'prop-len-t2': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 2]) / np.sum(org_skel.path_lengths()),
    'prop-len-t3': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 3]) / np.sum(org_skel.path_lengths()),
    
    # -- Node section -- #

    'node-count': len(node_table),
    'ave-deg-nodes': np.mean(node_table['connectivity']),
    'ep-count': np.sum(node_table['connectivity'] == 1),
    'jn-count': np.sum(node_table['connectivity'] > 2),
    'ap-count': np.sum(node_table['connectivity'] == 0),
    'prop-ep': np.sum(node_table['connectivity'] == 1) / len(node_table),
    'prop-jn': np.sum(node_table['connectivity'] > 2) / len(node_table),
    'prop-ap': np.sum(node_table['connectivity'] == 0) / len(node_table)
}

skel_sum_table = pd.DataFrame(data=skel_sum_table_data, index = [org_list[org]])

## **DEFINE FUNCTIONS**

### **Define Skeleton Metrics Function**

In [27]:
def skeleton_metrics(org_seg: np.ndarray,
                     org_name: str,
                     scale: Union[Tuple, None],
                     output_all_tables: bool = False):
    
    '''
    The function that returns the skeleton object table in the skeletonization portion of the workflow.
    The branch, node and summary table can also be returned if the user sets `output_all_tables` to true
    
    Parameters:

    * org_seg: the labeled organelle segmentation (int ndarray)
    * org_name: the shorted name of the organelle (str)
    * scale: the real world dimensions from the metadata of the image (Tuple, ndarray or list of 3 floats)
    * output_all_tables: an option that allows you to return the branch, node, skeleton object and skeleton summary table
    in that order (bool)

    Output:

    if output_all_tables is False (default)

    * skeleton object table (pandas table)

    if output_all_tables is False (default)

    * branch table (pandas table)
    * node table (pandas table)
    * skeleton object table (pandas table)
    * skeleton summary table (pandas table)
    
    '''

    org_skel_arr, org_skel_arr2 = skel_plus(org_seg)

    org_skel = csr.Skeleton(
    skeleton_image = org_skel_arr,
    spacing = scale,
    value_is_height = False
    )

    ##############################################################################
    # BRANCH TABLE
    ##############################################################################

    # This is a modified version of the summarize() function from skan optimized for skeleton metrics

    # This is a modified version of the summarize function optimized for the workflow

    summary = {}
    value_is_height = False
    ndim = org_skel.coordinates.shape[1]
    endpoints_src = org_skel.paths.indices[org_skel.paths.indptr[:-1]]
    endpoints_dst = org_skel.paths.indices[org_skel.paths.indptr[1:] - 1]

    # Because the values of the path points are the organelle object ids (all path points and nodes 
    # in a branch should come from the same organelle object)
    if not np.any(org_skel.path_stdev()):
        # checker to see if all path points and nodes come from the same object
        summary['skel-obj-id'] = org_skel.path_means().astype(int)
    else:
        raise ValueError("at least one branch came from different organelle objects")
    summary['node-id-src'] = endpoints_src
    summary['node-id-dst'] = endpoints_dst
    deg_src = org_skel.degrees[endpoints_src]
    deg_dst = org_skel.degrees[endpoints_dst]
    summary['deg_src'] = deg_src
    summary['deg_dst'] = deg_dst
    summary['branch-distance'] = org_skel.path_lengths()
    kind = np.full(deg_src.shape, 2)  # default: junction-to-junction
    kind[(deg_src == 1) | (deg_dst == 1)] = 1  # tip-junction
    kind[(deg_src == 1) & (deg_dst == 1)] = 0  # tip-tip
    kind[endpoints_src == endpoints_dst] = 3  # cycle
    summary['branch-type'] = kind
    for i in range(ndim):  # keep loops separate for best insertion order
        summary[f'image-coord-src-{i}'] = org_skel.coordinates[endpoints_src, i]
    for i in range(ndim):
        summary[f'image-coord-dst-{i}'] = org_skel.coordinates[endpoints_dst, i]
    coords_real_src = org_skel.coordinates[endpoints_src] * org_skel.spacing
    for i in range(ndim):
        summary[f'coord-src-{i}'] = coords_real_src[:, i]
    coords_real_dst = org_skel.coordinates[endpoints_dst] * org_skel.spacing
    for i in range(ndim):
        summary[f'coord-dst-{i}'] = coords_real_dst[:, i]
    summary['euclidean-distance'] = (
            np.sqrt((coords_real_dst - coords_real_src)**2
                    @ np.ones(ndim + int(value_is_height)))
            )

    summary['str-prop'] = summary['euclidean-distance'] / summary['branch-distance']
    branch_table = pd.DataFrame(summary)

    ##############################################################################
    # NODE TABLE
    ##############################################################################

   # A dictionary that reveals the branches a node is referenced in
    # If a node is an absolute punctate it will return "NaN"
    node2branches = dict()
    node2obj = dict()

    for branch in range(org_skel.n_paths):
        obj = branch_table.loc[branch]['skel-obj-id']
        for point in org_skel.path(branch):
            if org_skel.degrees[point] != 2:
                node2obj[point] = obj
                try:
                    node2branches[point] += [branch]
                except:
                    node2branches[point] = [branch]

    # the point ids for the points with zero connectivity
    abs_punc_ids = np.arange(len(org_skel.degrees))[org_skel.degrees == 0]
    abs_punc_obj = (org_skel_arr[org_skel_arr2][abs_punc_ids]).astype(int)

    for i,punc_id in enumerate(abs_punc_ids):
        node2branches[punc_id] = []
        node2obj[punc_id] = abs_punc_obj[i]
    
    # note that if a node has a duplicate branch listed, the branch was a cycle
    nodes = list(node2branches)

    node_lab = []
    base_n = ["Abs Punctate",
            "Endpoint",
            "Path Point"]

    if np.max(org_skel.degrees) <= 2:
        node_lab = base_n[0:len(pd.unique(org_skel.degrees))]
    else:
        node_lab = base_n
        for i in np.arange(3, np.max(org_skel.degrees) + 1):
            node_lab += [f"{i}-way"]
        
    node_table_data = {
        "node-id": nodes,
        "node-type": np.array(node_lab)[org_skel.degrees[nodes]],
        "connectivity": org_skel.degrees[nodes],
        "image-coord-0": org_skel.coordinates[nodes,0],
        "image-coord-1": org_skel.coordinates[nodes,1],
        "image-coord-2": org_skel.coordinates[nodes,2],
        "coord-0": org_skel.coordinates[nodes,0] * scale[0],
        "coord-1": org_skel.coordinates[nodes,1] * scale[1],
        "coord-2": org_skel.coordinates[nodes,2] * scale[2],
        "branch-id(s)": [node2branches[key] for key in nodes],
        'skel-obj-id': [int(node2obj[key]) for key in nodes]
    }

    node_table = pd.DataFrame(data=node_table_data)

    ##############################################################################
    # SKELETON OBJECT TABLE
    ##############################################################################

    # This is to prevent the workflow acting as if the absolute punctates are their own skeleton objects
    if org_name != 'ER':
        obj_list = list(branch_table.groupby('skel-obj-id').groups) + list(abs_punc_obj)
    else:
        obj_list = list(branch_table.groupby('skel-obj-id').groups)


    # the highest value of length where a type 0 branch will still be considered punctate
    p_threshold = min(scale) * 2

    #dictionary that keeps track of the number of branches per each object
    objfreq = dict(branch_table.groupby("skel-obj-id").count()['branch-distance'])
    #dictionary that keeps track of the total length per object
    sumobj = dict(branch_table.groupby("skel-obj-id").sum()['branch-distance'])
    #dictonary that keeps of the average branch length per object
    aveobj = dict(branch_table.groupby("skel-obj-id").mean()['branch-distance'])

    obj_class = dict()
    obj_class_n = dict()

    for obj in objfreq:
        if objfreq[obj] == 1:
            if sumobj[obj] > p_threshold:
                obj_class[obj] = "Rod"
                obj_class_n[obj] = 1
            else:
                obj_class[obj] = "Punctate"
                obj_class_n[obj] = 0
        else:
            obj_class[obj] = "Network"
            obj_class_n[obj] = 2

    # for some reason this was faster
    skel_obj2branch = dict()
    skel_obj2_mean_str = dict(branch_table.groupby('skel-obj-id').mean()['str-prop'])

    for branch in range(org_skel.n_paths):
        obj = branch_table['skel-obj-id'][branch]
        try:
            skel_obj2branch[obj] += [branch]
        except:
            skel_obj2branch[obj] = [branch]

    if org_name != 'ER':
        for obj in abs_punc_obj:
            skel_obj2branch[obj] = []
            obj_class[obj] = "Punctate"
            obj_class_n[obj] = 0
            skel_obj2_mean_str[obj] = "NaN"
            objfreq[obj] = 0
            aveobj[obj] = 0
            sumobj[obj] = 0

    
    brh_tcounts = dict()
    brh_tlist = dict()
    for i in range(4):
        brh_tcounts[i] = dict((obj,0) for obj in obj_list)
        brh_tlist[i] = dict((obj,[]) for obj in obj_list)

    branch_type_tables = [branch_table[np.array(branch_table["branch-type"]) == i] for i in np.arange(4)]

    for i in range(4):
        groupedby_skel = branch_type_tables[i].groupby('skel-obj-id')
        for skel_obj in obj_list:
            try:
                # Although they get the branch row indexes this is fine because the branch row indexes
                # are the branch ids :)
                type_list = list(groupedby_skel.groups[skel_obj])
                brh_tlist[i][skel_obj] = type_list
                brh_tcounts[i][skel_obj] = len(type_list)
            except:
                None

   # Code to find the highest node degree in the skeleton object
    max_deg = dict([(obj,0) for obj in obj_list])

    obj2brh = branch_table.groupby('skel-obj-id').groups
    for obj in obj2brh:
        for branch in obj2brh[obj]:
            brh_max = np.max(org_skel.degrees[org_skel.path(branch)])
            if brh_max > max_deg[obj]:
                max_deg[obj] = brh_max
            

    skel_obj2nodes = dict([(obj,[]) for obj in obj_list])
    ep_count = dict([(obj,0) for obj in obj_list])
    jn_count = dict([(obj,0) for obj in obj_list])
    jn_table = node_table[node_table['connectivity'] > 2]
    ave_jn_deg = dict([(obj,"NaN") for obj in obj_list])

    for obj in obj_list:
        try:
            skel_obj2nodes[obj] = list(node_table['node-id'][node_table.groupby("skel-obj-id").groups[obj]])
        except:
            None
        try:
            ep_count[obj] = node_table[node_table['connectivity'] == 1].groupby('skel-obj-id').count()['connectivity'][obj]
        except:
            None
        try:
            jn_count[obj] = node_table[node_table['connectivity'] > 2].groupby('skel-obj-id').count()['connectivity'][obj]
            if jn_count[obj] > 0:
                ave_jn_deg[obj] = np.mean(jn_table['connectivity'][jn_table.groupby('skel-obj-id').groups[obj]])
        except:
            None
            
    skel_table_data = {
            "skel-obj-id": obj_list,
            "skel-type": [obj_class[obj] for obj in obj_list],
            "skel-type-num": [obj_class_n[obj] for obj in obj_list],
            "brh-count": [objfreq[obj] for obj in obj_list],
            "branch-id(s)": [skel_obj2branch[obj] for obj in obj_list],
            "min-brh-length": [branch_table.groupby('skel-obj-id').min()['branch-distance'][obj] if objfreq[obj] != 0 else "NaN" for obj in obj_list],
            "max-brh-length": [branch_table.groupby('skel-obj-id').max()['branch-distance'][obj] if objfreq[obj] != 0 else "NaN" for obj in obj_list],
            "ave-brh-length": [aveobj[obj] for obj in obj_list],
            # the requirement is changed to > 1 because the sd of a single value is undefined
            "sd-brh-length": [branch_table.groupby('skel-obj-id').std()['branch-distance'][obj] if objfreq[obj] > 1 else "NaN" for obj in obj_list],
            "med-brh-length" : [branch_table.groupby('skel-obj-id').median()['branch-distance'][obj] if objfreq[obj] != 0 else "NaN" for obj in obj_list],
            "total-length": [sumobj[obj] for obj in obj_list],
            "brh-type-0-tot": [brh_tcounts[0][obj] for obj in obj_list],
            "brh-type-0-id": [brh_tlist[0][obj] for obj in obj_list],
            "brh-type-1-tot": [brh_tcounts[1][obj] for obj in obj_list],
            "brh-type-1-ids": [brh_tlist[1][obj] for obj in obj_list],
            "brh-type-2-tot": [brh_tcounts[2][obj] for obj in obj_list],
            "brh-type-2-ids": [brh_tlist[2][obj] for obj in obj_list],
            "brh-type-3-tot": [brh_tcounts[3][obj] for obj in obj_list],
            "brh-type-3-ids": [brh_tlist[3][obj] for obj in obj_list],
            "node-count": [len(skel_obj2nodes[obj]) for obj in obj_list],
            'ep-count': [ep_count[obj] for obj in obj_list],
            'jn-count' : [jn_count[obj] for obj in obj_list],
            'ave-jn-deg' : [ave_jn_deg[obj] for obj in obj_list],
            'max-deg' : [max_deg[obj] for obj in obj_list],
            "node-id(s)": [skel_obj2nodes[obj] for obj in obj_list],
            "mean-brh-str": [skel_obj2_mean_str[obj] for obj in obj_list]}

    skel_table = pd.DataFrame(data=skel_table_data)

    if not output_all_tables:
        return skel_table.sort_values('skel-obj-id')

    ##############################################################################
    # SKELETON SUMMARY TABLE
    ##############################################################################

    skel_sum_table_data = {

    'total-length': np.sum(org_skel.path_lengths()),
    'point-count': org_skel.graph.shape[0],

    # -- Skeleton Object section -- #

    'skel-obj-count': len(skel_table),
    'punc-count': np.sum(skel_table['skel-type-num'] == 0),
    'rod-count': np.sum(skel_table['skel-type-num'] == 1),
    'net-count': np.sum(skel_table['skel-type-num'] == 2),
    'prop-obj-punc': np.sum(skel_table['skel-type-num'] == 0) / len(skel_table),
    'prop-obj-rod': np.sum(skel_table['skel-type-num'] == 1) / len(skel_table),
    'prop-obj-net': np.sum(skel_table['skel-type-num'] == 2) / len(skel_table),
    'punc-tot-len': np.sum(skel_table[skel_table['skel-type-num'] == 0]['total-length']),
    'rod-tot-len': np.sum(skel_table[skel_table['skel-type-num'] == 1]['total-length']),
    'net-tot-len': np.sum(skel_table[skel_table['skel-type-num'] == 2]['total-length']),
    'prop-len-punc': np.sum(skel_table[skel_table['skel-type-num'] == 0]['total-length']) / np.sum(org_skel.path_lengths()),
    'prop-len-rod': np.sum(skel_table[skel_table['skel-type-num'] == 1]['total-length']) / np.sum(org_skel.path_lengths()),
    'prop-len-net': np.sum(skel_table[skel_table['skel-type-num'] == 2]['total-length']) / np.sum(org_skel.path_lengths()),
    'ave-len-obj': np.mean(skel_table['total-length']),
    'min-len-obj': np.min(skel_table['total-length']),
    'max-len-obj': np.max(skel_table['total-length']),
    'ave-brh-obj': np.mean(skel_table['brh-count']),
    'min-brh-obj': np.min(skel_table['brh-count']),
    'max-brh-obj': np.max(skel_table['brh-count']),

    # -- Branch section -- #

    'brh-count': org_skel.n_paths,
    'min-brh-len': np.min(org_skel.path_lengths()),
    'max-brh-len': np.max(org_skel.path_lengths()),
    'ave-brh-len': np.mean(org_skel.path_lengths()),
    'type-0-brhs': len(branch_table[branch_table['branch-type'] == 0]),
    'type-1-brhs': len(branch_table[branch_table['branch-type'] == 1]),
    'type-2-brhs': len(branch_table[branch_table['branch-type'] == 2]),
    'type-3-brhs': len(branch_table[branch_table['branch-type'] == 3]),
    'prop-brh-t0': len(branch_table[branch_table['branch-type'] == 0]) / len(branch_table),
    'prop-brh-t1': len(branch_table[branch_table['branch-type'] == 1]) / len(branch_table),
    'prop-brh-t2': len(branch_table[branch_table['branch-type'] == 2]) / len(branch_table),
    'prop-brh-t3': len(branch_table[branch_table['branch-type'] == 3]) / len(branch_table),
    't0-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 0]),
    't1-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 1]),
    't2-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 2]),
    't3-brh-len': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 3]),
    'prop-len-t0': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 0]) / np.sum(org_skel.path_lengths()),
    'prop-len-t1': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 1]) / np.sum(org_skel.path_lengths()),
    'prop-len-t2': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 2]) / np.sum(org_skel.path_lengths()),
    'prop-len-t3': np.sum(org_skel.path_lengths()[branch_table['branch-type'] == 3]) / np.sum(org_skel.path_lengths()),
    
    # -- Node section -- #

    'node-count': len(node_table),
    'ave-deg-nodes': np.mean(node_table['connectivity']),
    'ep-count': np.sum(node_table['connectivity'] == 1),
    'jn-count': np.sum(node_table['connectivity'] > 2),
    'ap-count': np.sum(node_table['connectivity'] == 0),
    'prop-ep': np.sum(node_table['connectivity'] == 1) / len(node_table),
    'prop-jn': np.sum(node_table['connectivity'] > 2) / len(node_table),
    'prop-ap': np.sum(node_table['connectivity'] == 0) / len(node_table)
    }

    skel_sum_table = pd.DataFrame(data=skel_sum_table_data, index = [org_name])

    return branch_table, node_table.sort_values('node-id'), skel_table.sort_values('skel-obj-id'), skel_sum_table

In [28]:
_branch_table, _node_table, _skel_table, _skel_sum_table = skeleton_metrics(organelles[org_channels_ordered.index(org)],
                org_list[org_channels_ordered.index(org)],
                scale,
                output_all_tables = True)

Part One took 10.558670282363892 sec(s)
Total Time: 10.558670282363892 sec(s)


### **Update to `get_morphology_metrics`**

In [29]:
def _get_morphology_metrics(segmentation_img: np.ndarray, 
                           seg_name: str, 
                           intensity_img, 
                           mask: np.ndarray, 
                           mask_name: str,
                           scale: Union[tuple, None]=None,
                           skel_met: bool = False):
    """
    Parameters
    ------------
    segmentation_img:
        an np.ndarray of segmented objects 
    seg_name: str
        a name or nickname (usually the segmentation file suffix) of the object being measured; this will be used for record keeping in the output table
    intensity_img:
        a single-channel np.ndarray contain gray scale values from the "raw" image the segmentation is based on; this image should be the same shape as the segmentation file
    mask:
        a binary np.ndarray mask of the area to measure from; this image should be the same shape as the segmentation file
    scale: tuple, optional
        a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)
    skel_met:
        a boolean that determines whether to include skeleton metrics


    Regionprops measurements:
    ------------------------
    'label',
    'centroid',
    'bbox',
    'area',
    'equivalent_diameter',
    'extent',
    'euler_number',
    'solidity',
    'axis_major_length',
    'min_intensity',
    'max_intensity',
    'mean_intensity'

    Additional measurements:
    -----------------------
    'standard_deviation_intensity',
    'surface_area',
    'SA_to_volume_ratio`

    Skeleton measurements (optional):
    ------------------------
    'skel-obj-id',
    'skel-type',
    'skel-brh-count',
    'skel-min-brh-length',
    'skel-max-brh-length',
    'skel-ave-brh-length',
    'skel-sd-brh-length',
    'skel-med-brh-length',
    'skel-total-length',
    'skel-node-count',
    'skel-ep-count',
    'skel-jn-count',
    'skel-ave-jn-deg',
    'skel-max-deg',
    'skel-mean-brh-str'

    Returns
    -------------
    pandas dataframe of containing regionprops measurements (columns) for each object in the segmentation image (rows) and the regionprops object
    
    """
    # dealing with numerous solidity warning from regionprops
    warnings.simplefilter("ignore")

    ###################################################
    ## MASK THE ORGANELLE OBJECTS THAT WILL BE MEASURED
    ###################################################
    input_labels = _assert_uint16_labels(segmentation_img)
    input_labels = apply_mask(input_labels, mask)

    ##########################################
    ## CREATE LIST OF REGIONPROPS MEASUREMENTS
    ##########################################
    # start with LABEL
    properties = ["label", "centroid", "bbox", "area", 
                  "equivalent_diameter", "extent", "euler_number", "solidity", "axis_major_length",
                  "min_intensity", "max_intensity", "mean_intensity"]

    #######################
    ## ADD EXTRA PROPERTIES
    #######################
    def standard_deviation_intensity(region, intensities):
        return np.std(intensities[region])

    extra_properties = [standard_deviation_intensity]

    ##################
    ## RUN REGIONPROPS
    ##################
    props = regionprops_table(input_labels, 
                           intensity_image=intensity_img, 
                           properties=properties,
                           extra_properties=extra_properties,
                           spacing=scale)
    
    # measure the mask volume as well for easier normalization in downstream functions
    mask_vol = regionprops_table(mask,properties=["area"], spacing=scale)['area'][0]

    props_table = pd.DataFrame(props)

    ##################################################################
    ## RUN SURFACE AREA FUNCTION SEPARATELY AND APPEND THE PROPS_TABLE
    ##################################################################
    surface_area_tab = pd.DataFrame(surface_area_from_props(input_labels, props, scale))

    #############################################
    ## RENAME AND ADD ADDITIONAL METADATA COLUMNS
    #############################################
    props_table.insert(0, "object", seg_name)
    props_table.rename(columns={"area": "volume"}, inplace=True)

    if scale is not None:
        round_scale = (round(scale[0], 4), round(scale[1], 4), round(scale[2], 4))
        props_table.insert(props_table.columns.get_loc('label') + 1, column="scale", value=f"{round_scale}")
    else: 
        props_table.insert(props_table.columns.get_loc('label') + 1, column="scale", value=f"{tuple(np.ones(segmentation_img.ndim))}") 

    props_table.insert(props_table.columns.get_loc('volume') + 1, "surface_area", surface_area_tab)
    props_table.insert(props_table.columns.get_loc('surface_area') + 1, "SA_to_volume_ratio", props_table["surface_area"].div(props_table["volume"]))
    props_table[f"{mask_name}_volume"] = mask_vol

    ################################################################
    ## ADD SKELETONIZATION OPTION FOR MEASURING LENGTH AND BRANCHING
    ################################################################
    if skel_met:
        skel_table = skeleton_metrics(input_labels.astype(float),
                 seg_name,
                 scale)
        
        skel_tab = skel_table[['skel-obj-id',
                                'skel-type',
                                'brh-count',
                                'min-brh-length',
                                'max-brh-length',
                                'ave-brh-length',
                                'sd-brh-length',
                                'med-brh-length',
                                'total-length',
                                'node-count',
                                'ep-count',
                                'jn-count',
                                'ave-jn-deg',
                                'max-deg',
                                'mean-brh-str']]

        props_table = props_table.merge(skel_tab, left_on = 'label', right_on = 'skel-obj-id').drop('skel-obj-id', axis = 1)

        props_table.rename(columns = dict([(name, 'skel-' + name) for name in skel_tab.columns[2:]]), inplace = True)

    # print this statement to let user known of suppressed warnings
    if Warning: print(f"Warning(s) suppressed while quantifying {seg_name}. See 'method_morphology.ipynb' notebook for more details.")

    return props_table

In [30]:
##################################
###################################
# Quantifying organelle morphology
##################################
##################################

# quantify the morphology of one or more organelle from one cell
def get_org_morphology(source_file_path: str,
                         list_obj_names: List[str],
                         list_obj_segs: List[np.ndarray],
                         list_intensity_img: List[np.ndarray],
                         list_region_names: List[str],
                         list_region_segs: List[np.ndarray],
                         mask_name: str,
                         scale: Union[tuple,None] = None):
    """
    Measure the amount, size, and shape of multiple organelles from a single cell

    Parameters:
    ----------
    source_file: str
        file path; this is used for recorder keeping of the file name in the output data tables
    list_obj_names: List[str]
        a list of object names (strings) that will be measured; this should match the order in list_obj_segs
    list_obj_segs: List[np.ndarray]
        a list of 3D (ZYX) segmentation np.ndarrays that will be measured per cell; the order should match the list_obj_names 
    list_intensity_img: List[np.ndarray]
        a list of 3D (ZYX) grayscale np.ndarrays that will be used to measure fluoresence intensity in each region and object
    list_region_names: List[str]
        a list of region names (strings); these should include the mask (entire region being measured - usually the cell) 
        and other sub-mask regions from which we can meausure the objects in (ex - nucleus, neurites, soma, etc.). It should 
        also include the centering object used when created the XY distribution bins.
        The order should match the list_region_segs
    list_region_segs: List[np.ndarray]
        a list of 3D (ZYX) binary np.ndarrays of the region masks; the order should match the list_region_names.
    mask: str
        a str of which region name (contained in the list_region_names list) should be used as the main mask (e.g., cell mask)
    scale: Union[tuple,None] = None
        a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)

    Returns:
    ----------
    Dataframe of measurements of organelle morphology

    """
    print(f"Quantifying organelle morphology from {source_file_path}.")

    # select the mask from the region list
    mask = list_region_segs[list_region_names.index(mask_name)]
    
    # empty list to collect a morphology data for each organelle
    org_tabs = []

    # loop through the list of organelles and run the get_morphology_metrics function
    for j, target in enumerate(list_obj_names):
        # select intensity image
        org_img = list_intensity_img[j]  
        
        # select segmentation and if ER, ensure it is only one object
        if target == 'ER':
            org_obj = (list_obj_segs[j] > 0).astype(np.uint16)  
        else:
            org_obj = list_obj_segs[j]
        
        # run get_morphology_metrics function to output a table of measurements
        org_metrics = _get_morphology_metrics(segmentation_img=org_obj, 
                                            seg_name=target,
                                            intensity_img=org_img, 
                                            mask=mask,
                                            mask_name=mask_name,
                                            scale=scale,
                                            skel_met = skel_met)

        # add table to list above
        org_tabs.append(org_metrics)

    # combine the lists for each organelle into one table
    final_org_tab = pd.concat(org_tabs, ignore_index=True)

    # add a new column to list the name of the image these data are derived from 
    final_org_tab.insert(loc=0,column='image_name',value=source_file_path.stem)
    

    return final_org_tab

In [31]:
### OLD FUNCTION
# def _get_org_morphology_3D(segmentation_img: np.ndarray, 
#                            seg_name: str, 
#                            intensity_img, 
#                            mask: np.ndarray, 
#                            scale: Union[tuple, None]=None,
#                            skel_met: bool = False):
#     """
#     Parameters
#     ------------
#     segmentation_img:
#         a 3D (ZYX) np.ndarray of segmented objects 
#     seg_name: str
#         a name or nickname of the object being measured; this will be used for record keeping in the output table
#     intensity_img:
#         a 3D (ZYX) np.ndarray contain gray scale values from the "raw" image the segmentation is based on )single channel)
#     mask:
#         a 3D (ZYX) binary np.ndarray mask of the area to measure from
#     scale: tuple, optional
#         a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)
#     skel_met:
#         a boolean regarding whether to include the skeleton metrics


#     Regionprops measurements:
#     ------------------------
#     ['label',
#     'centroid',
#     'bbox',
#     'area',
#     'equivalent_diameter',
#     'extent',
#     'feret_diameter_max',
#     'euler_number',
#     'convex_area',
#     'solidity',
#     'axis_major_length',
#     'axis_minor_length',
#     'max_intensity',
#     'mean_intensity',
#     'min_intensity']

#     Additional measurements:
#     -----------------------
#     ['standard_deviation_intensity',
#     'surface_area']

#     Skeleton metrics (if added):
#     ------------------------
#     ['skel-obj-id',
#     'skel-type',
#     'skel-brh-count',
#     'skel-min-brh-length',
#     'skel-max-brh-length',
#     'skel-ave-brh-length',
#     'skel-sd-brh-length',
#     'skel-med-brh-length',
#     'skel-total-length',
#     'skel-node-count',
#     'skel-ep-count',
#     'skel-jn-count',
#     'skel-ave-jn-deg',
#     'skel-max-deg',
#     'skel-mean-brh-str']


#     Returns
#     -------------
#     pandas dataframe of containing regionprops measurements (columns) for each object in the segmentation image (rows) and the regionprops object
    
#     """
#     ###################################################
#     ## MASK THE ORGANELLE OBJECTS THAT WILL BE MEASURED
#     ###################################################
#     # in case we sent a boolean mask (e.g. cyto, nucleus, cellmask)
#     input_labels = _assert_uint16_labels(segmentation_img)

#     # mask
#     input_labels = apply_mask(input_labels, mask)

#     ##########################################
#     ## CREATE LIST OF REGIONPROPS MEASUREMENTS
#     ##########################################
#     # start with LABEL
#     properties = ["label"]

#     # add position
#     properties = properties + ["centroid", "bbox"]

#     # add area
#     properties = properties + ["area", "equivalent_diameter"] # "num_pixels", 

#     # add shape measurements
#     properties = properties + ["extent", "euler_number", "solidity", "axis_major_length"] # ,"feret_diameter_max", "axis_minor_length"]

#     # add intensity values (used for quality checks)
#     properties = properties + ["min_intensity", "max_intensity", "mean_intensity"]

#     #######################
#     ## ADD EXTRA PROPERTIES
#     #######################
#     def standard_deviation_intensity(region, intensities):
#         return np.std(intensities[region])

#     extra_properties = [standard_deviation_intensity]

#     ##################
#     ## RUN REGIONPROPS
#     ##################
#     props = regionprops_table(input_labels, 
#                            intensity_image=intensity_img, 
#                            properties=properties,
#                            extra_properties=extra_properties,
#                            spacing=scale)

#     props_table = pd.DataFrame(props)
#     props_table.insert(0, "object", seg_name)
#     props_table.rename(columns={"area": "volume"}, inplace=True)

#     if scale is not None:
#         round_scale = (round(scale[0], 4), round(scale[1], 4), round(scale[2], 4))
#         props_table.insert(loc=2, column="scale", value=f"{round_scale}")
#     else: 
#         props_table.insert(loc=2, column="scale", value=f"{tuple(np.ones(segmentation_img.ndim))}") 

#     ##################################################################
#     ## RUN SURFACE AREA FUNCTION SEPARATELY AND APPEND THE PROPS_TABLE
#     ##################################################################
#     surface_area_tab = pd.DataFrame(surface_area_from_props(input_labels, props, scale))

#     props_table.insert(12, "surface_area", surface_area_tab)
#     props_table.insert(14, "SA_to_volume_ratio", props_table["surface_area"].div(props_table["volume"]))

#     ################################################################
#     ## ADD SKELETONIZATION OPTION FOR MEASURING LENGTH AND BRANCHING
#     ################################################################
#     if skel_met:
#         skel_table = skeleton_metrics(input_labels.astype(float),
#                  seg_name,
#                  dim)
        
#         skel_tab = skel_table[['skel-obj-id',
#                                 'skel-type',
#                                 'brh-count',
#                                 'min-brh-length',
#                                 'max-brh-length',
#                                 'ave-brh-length',
#                                 'sd-brh-length',
#                                 'med-brh-length',
#                                 'total-length',
#                                 'node-count',
#                                 'ep-count',
#                                 'jn-count',
#                                 'ave-jn-deg',
#                                 'max-deg',
#                                 'mean-brh-str']]

#         props_table = props_table.merge(skel_tab, left_on = 'label', right_on = 'skel-obj-id').drop('skel-obj-id', axis = 1)

#         props_table.rename(columns = dict([(name, 'skel-' + name) for name in skel_tab.columns[2:]]), inplace = True)

#     return props_table

In [32]:
# get_morph_table = _get_morphology_metrics(segmentation_img = organelles[org_channels_ordered.index(org)],
#                       seg_name = org_list[org_channels_ordered.index(org)],
#                       intensity_img = img_data[org],
#                       mask = mask,
#                       mask_name = mask_name,
#                       skel_met = True,
#                       scale = scale)

get_morph_table = _get_morphology_metrics(segmentation_img = org_seg,
                      seg_name = org_list[org_channels_ordered.index(org)],
                      intensity_img = img_data[org],
                      mask = mask,
                      mask_name = mask_name,
                      skel_met = True,
                      scale = scale)

import warnings
warnings.simplefilter('ignore')

Part One took 10.746462106704712 sec(s)
Total Time: 10.746462106704712 sec(s)


In [33]:
testa_inputs = [org_seg,
 org_list[org_channels_ordered.index(org)],
 img_data[org],
 mask,
 mask_name,
 True,
 scale]

In [34]:
testa_inputs

[array([[[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         ...,
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0]],
 
        [[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         ...,
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0]],
 
        [[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         ...,
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0]],
 
        ...,
 
        [[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         ...,
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0]],
 
        [[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],

### **Update to `make_all_metrics_tables`**

In [35]:
def _make_all_metrics_tables(source_file: str,
                             list_obj_names: List[str],
                             list_obj_segs: List[np.ndarray],
                             list_intensity_img: List[np.ndarray],
                             list_region_names: List[str],
                             list_region_segs: List[np.ndarray],
                             mask: str,
                             dist_centering_obj:str, 
                             dist_num_bins: int,
                             dist_center_on: bool=False,
                             dist_keep_center_as_bin: bool=True,
                             dist_zernike_degrees: Union[int, None]=None,
                             scale: Union[tuple,None] = None,
                             include_contact_dist:bool=True,
                             skel: List[str] = []):
    """
    Measure the composition, morphology, distribution, and contacts of multiple organelles in a cell

    Parameters:
    ----------
    source_file: str
        file path; this is used for recorder keeping of the file name in the output data tables
    list_obj_names: List[str]
        a list of object names (strings) that will be measured; this should match the order in list_obj_segs
    list_obj_segs: List[np.ndarray]
        a list of 3D (ZYX) segmentation np.ndarrays that will be measured per cell; the order should match the list_obj_names 
    list_intensity_img: List[np.ndarray]
        a list of 3D (ZYX) grayscale np.ndarrays that will be used to measure fluoresence intensity in each region and object
    list_region_names: List[str]
        a list of region names (strings); these should include the mask (entire region being measured - usually the cell) 
        and other sub-mask regions from which we can meausure the objects in (ex - nucleus, neurites, soma, etc.). It should 
        also include the centering object used when created the XY distribution bins.
        The order should match the list_region_segs
    list_region_segs: List[np.ndarray]
        a list of 3D (ZYX) binary np.ndarrays of the region masks; the order should match the list_region_names.
    mask: str
        a str of which region name (contained in the list_region_names list) should be used as the main mask (e.g., cell mask)
    dist_centering_obj:str
        a str of which region name (contained in the list_region_names list) should be used as the centering object in 
        get_XY_distribution()
    dist_num_bins: int
        the number of concentric rings to draw between the centering object and edge of the mask in get_XY_distribution()
    dist_center_on: bool=False,
        for get_XY_distribution:
        True = distribute the bins from the center of the centering object
        False = distribute the bins from the edge of the centering object
    dist_keep_center_as_bin: bool=True
        for get_XY_distribution:
        True = include the centering object area when creating the bins
        False = do not include the centering object area when creating the bins
    dist_zernike_degrees: Union[int, None]=None
        for get_XY_distribution:
        the number of zernike degrees to include for the zernike shape descriptors; if None, the zernike measurements will not 
        be included in the output
    scale: Union[tuple,None] = None
        a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)
    include_contact_dist:bool=True
        whether to include the distribution of contact sites in get_contact_metrics_3d(); True = include contact distribution
    skel: List[str] = []
        The organelles in which skeleton quantification will be ran on, (the list is empty by default)

    Returns:
    ----------
    4 Dataframes of measurements of organelle morphology, region morphology, contact morphology, and organelle/contact distributions

    """
    start = time.time()
    count = 0

    # segmentation image for all masking steps below
    mask = list_region_segs[list_region_names.index(mask)]

    ######################
    # measure cell regions
    ######################
    # create np.ndarray of intensity images
    raw_image = np.stack(list_intensity_img)
    
    # container for region data
    region_tabs = []
    for r, r_name in enumerate(list_region_names):
        region = list_region_segs[r]
        region_metrics = get_region_morphology_3D(region_seg=region, 
                                                  region_name=r_name,
                                                  channel_names=list_obj_names,
                                                  intensity_img=raw_image, 
                                                  mask=mask,
                                                  scale=scale)
        region_tabs.append(region_metrics)

    ##############################################################
    # loop through all organelles to collect measurements for each
    ##############################################################
    # containers to collect per organelle information
    org_tabs = []
    dist_tabs = []
    XY_bins = []
    XY_wedges = []

    for j, target in enumerate(list_obj_names):
        
        # organelle intensity image
        org_img = list_intensity_img[j]

        # organelle segmentation
        if target == 'ER':
            # ensure ER is only one object
            org_obj = (list_obj_segs[j] > 0).astype(np.uint16)
        else:
            org_obj = list_obj_segs[j]

        ### NRM DEBUGGING CODE ###
        print(list_obj_names)
        
        print(f'target={target} in this iteration, channel {j}')
        
        viewer.add_image(org_img,
                         name = f"{target} intensity",
                         scale = scale)
        
        viewer.add_image(org_obj,
                         name = f"{target} object (seg)",
                         scale = scale)
        
         ### ################# ###

        ##########################################################
        # measure organelle morphology & number of objs contacting
        ##########################################################
         ### If the organelle is in the skel list, then skeletonization will be ran for that organelle
        ### By default the list is empty
        if target in skel:
            skel_met = True
        else:
            skel_met = False

        if target == "ER":
            
            testb_inputs = (org_obj,
                         target,
                         org_img,
                         mask,
                         mask_name,
                         scale,
                         skel_met)
            
            print("org obj test")
            print(np.all(testa_inputs[0] == testb_inputs[0]))

            print("target test")
            print(testa_inputs[1], testb_inputs[1])
            print(testa_inputs[1] == testb_inputs[1])

            print("org img test")
            print(np.all(testa_inputs[2] == testb_inputs[2]))

            print("mask test")
            viewer.add_image(testa_inputs[3],
                             scale = scale,
                             name = "test A mask")
            viewer.add_image(testb_inputs[3],
                             scale = scale,
                             name = "test B mask")
            print(np.all(testa_inputs[3] == testb_inputs[3]))

            print("mask name test")
            print(np.all(testa_inputs[4] == testb_inputs[4]))

            print("scale test")
            print(testa_inputs[5], testb_inputs[5])
            print(testa_inputs[5] == testb_inputs[5])

            print("skel test")
            print(testa_inputs[6], testb_inputs[6])
            print(testa_inputs[6] == testb_inputs[6])
            


        
        org_metrics = _get_morphology_metrics(segmentation_img=org_obj, 
                                            seg_name=target,
                                            intensity_img=org_img, 
                                            mask = mask,
                                            mask_name = mask_name,
                                            scale=scale,
                                            skel_met = skel_met)
        


        ### org_metrics.insert(loc=0,column='cell',value=1) 
        # ^^^ saving this thought for later when someone might have more than one cell per image.
        # Not sure how they analysis process would fit in our pipelines as they exist now. 
        # Maybe here, iterating though the index of the masks above all of this and using that index as the cell number?

        org_tabs.append(org_metrics)

        ################################
        # measure organelle distribution 
        ################################
        centering = list_region_segs[list_region_names.index(dist_centering_obj)]
        XY_org_distribution, XY_bin_masks, XY_wedge_masks = get_XY_distribution(mask=mask,
                                                                                centering_obj=centering,
                                                                                obj=org_obj,
                                                                                obj_name=target,
                                                                                scale=scale,
                                                                                num_bins=dist_num_bins,
                                                                                center_on=dist_center_on,
                                                                                keep_center_as_bin=dist_keep_center_as_bin,
                                                                                zernike_degrees=dist_zernike_degrees)
        Z_org_distribution = get_Z_distribution(mask=mask, 
                                                obj=org_obj,
                                                obj_name=target,
                                                center_obj=centering,
                                                scale=scale)
        
        org_distribution_metrics = pd.merge(XY_org_distribution, Z_org_distribution,on=["object", "scale"])

        dist_tabs.append(org_distribution_metrics)
        XY_bins.append(XY_bin_masks)
        XY_wedges.append(XY_wedge_masks)

    #######################################
    # collect non-redundant contact metrics 
    #######################################
    # list the non-redundant organelle pairs
    contact_combos = list(itertools.combinations(list_obj_names, 2))

    # container to keep contact data in
    contact_tabs = []

    # loop through each pair and measure contacts
    for pair in contact_combos:
        # pair names
        a_name = pair[0]
        b_name = pair[1]

        # segmentations to measure
        if a_name == 'ER':
            # ensure ER is only one object
            a = (list_obj_segs[list_obj_names.index(a_name)] > 0).astype(np.uint16)
        else:
            a = list_obj_segs[list_obj_names.index(a_name)]
        
        if b_name == 'ER':
            # ensure ER is only one object
            b = (list_obj_segs[list_obj_names.index(b_name)] > 0).astype(np.uint16)
        else:
            b = list_obj_segs[list_obj_names.index(b_name)]
        

        if include_contact_dist == True:
            contact_tab, contact_dist_tab = get_contact_metrics_3D(a, a_name, 
                                                                   b, b_name, 
                                                                   mask, 
                                                                   scale, 
                                                                   include_dist=include_contact_dist,
                                                                   dist_centering_obj=centering,
                                                                   dist_num_bins=dist_num_bins,
                                                                   dist_zernike_degrees=dist_zernike_degrees,
                                                                   dist_center_on=dist_center_on,
                                                                   dist_keep_center_as_bin=dist_keep_center_as_bin)
            dist_tabs.append(contact_dist_tab)
        else:
            contact_tab = get_contact_metrics_3D(a, a_name, 
                                                 b, b_name, 
                                                 mask, 
                                                 scale, 
                                                 include_dist=include_contact_dist)
        contact_tabs.append(contact_tab)


    ###########################################
    # combine all tabs into one table per type:
    ###########################################
    final_org_tab = pd.concat(org_tabs, ignore_index=True)
    final_org_tab.insert(loc=0,column='image_name',value=source_file.stem)

    final_contact_tab = pd.concat(contact_tabs, ignore_index=True)
    final_contact_tab.insert(loc=0,column='image_name',value=source_file.stem)

    combined_dist_tab = pd.concat(dist_tabs, ignore_index=True)
    combined_dist_tab.insert(loc=0,column='image_name',value=source_file.stem)

    final_region_tab = pd.concat(region_tabs, ignore_index=True)
    final_region_tab.insert(loc=0,column='image_name',value=source_file.stem)

    end = time.time()
    print(f"It took {(end-start)/60} minutes to quantify one image.")
    return final_org_tab, final_contact_tab, combined_dist_tab, final_region_tab

In [36]:
# # OLD CODE

# def _make_all_metrics_tables(source_file: str,
#                              list_obj_names: List[str],
#                              list_obj_segs: List[np.ndarray],
#                              list_intensity_img: List[np.ndarray],
#                              list_region_names: List[str],
#                              list_region_segs: List[np.ndarray],
#                              mask: str,
#                              dist_centering_obj:str, 
#                              dist_num_bins: int,
#                              dist_center_on: bool=False,
#                              dist_keep_center_as_bin: bool=True,
#                              dist_zernike_degrees: Union[int, None]=None,
#                              scale: Union[tuple,None] = None,
#                              include_contact_dist:bool=True,
#                              splitter:str="_",
#                              skel: List[str] = []):
#     """
#     Measure the composition, morphology, distribution, and contacts of multiple organelles in a cell

#     Parameters:
#     ----------
#     source_file: str
#         file path; this is used for recorder keeping of the file name in the output data tables
#     list_obj_names: List[str]
#         a list of object names (strings) that will be measured; this should match the order in list_obj_segs
#     list_obj_segs: List[np.ndarray]
#         a list of 3D (ZYX) segmentation np.ndarrays that will be measured per cell; the order should match the list_obj_names 
#     list_intensity_img: List[np.ndarray]
#         a list of 3D (ZYX) grayscale np.ndarrays that will be used to measure fluoresence intensity in each region and object
#     list_region_names: List[str]
#         a list of region names (strings); these should include the mask (entire region being measured - usually the cell) 
#         and other sub-mask regions from which we can meausure the objects in (ex - nucleus, neurites, soma, etc.). It should 
#         also include the centering object used when created the XY distribution bins.
#         The order should match the list_region_segs
#     list_region_segs: List[np.ndarray]
#         a list of 3D (ZYX) binary np.ndarrays of the region masks; the order should match the list_region_names.
#     mask: str
#         a str of which region name (contained in the list_region_names list) should be used as the main mask (e.g., cell mask)
#     dist_centering_obj:str
#         a str of which region name (contained in the list_region_names list) should be used as the centering object in 
#         get_XY_distribution()
#     dist_num_bins: int
#         the number of concentric rings to draw between the centering object and edge of the mask in get_XY_distribution()
#     dist_center_on: bool=False,
#         for get_XY_distribution:
#         True = distribute the bins from the center of the centering object
#         False = distribute the bins from the edge of the centering object
#     dist_keep_center_as_bin: bool=True
#         for get_XY_distribution:
#         True = include the centering object area when creating the bins
#         False = do not include the centering object area when creating the bins
#     dist_zernike_degrees: Union[int, None]=None
#         for get_XY_distribution:
#         the number of zernike degrees to include for the zernike shape descriptors; if None, the zernike measurements will not 
#         be included in the output
#     scale: Union[tuple,None] = None
#         a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)
#     include_contact_dist:bool=True
#         whether to include the distribution of contact sites in get_contact_metrics_3d(); True = include contact distribution
#     skel: List[str] = []
#         The organelles in which skeleton quantification will be ran on, (the list is empty by default)

#     Returns:
#     ----------
#     4 Dataframes of measurements of organelle morphology, region morphology, contact morphology, and organelle/contact distributions

#     """
#     start = time.time()
#     count = 0

#     # segmentation image for all masking steps below
#     mask = list_region_segs[list_region_names.index(mask)]

#     # containers to collect per organelle information
#     org_tabs = []
#     dist_tabs = []
#     XY_bins = []
#     XY_wedges = []

#     #############################
#     # Measure Organelle Contacts 
#     #############################
#     if len(list_obj_names) >=2:
#         contact_tabs = []
#         org_dict = make_dict(obj_names=list_obj_names,
#                              obj_segs=list_obj_segs)
#         all_conts, non_red_conts=multi_contact(org_segs=org_dict,
#                                                organelles=list_obj_names,
#                                                splitter=splitter,
#                                                redundancy=False)
#         if include_contact_dist:
#             centering = list_region_segs[list_region_names.index(dist_centering_obj)]
#             for orgs, site in all_conts.items():
#                 cont_tab, dist_tab = get_contact_metrics_3D(orgs = orgs,
#                                         organelle_segs = org_dict,
#                                         mask = mask,
#                                         splitter = splitter,
#                                         scale = scale,
#                                         include_dist = include_contact_dist, 
#                                         dist_centering_obj = centering,
#                                         dist_num_bins = dist_num_bins,
#                                         dist_zernike_degrees =  dist_zernike_degrees,
#                                         dist_center_on = dist_center_on,
#                                         dist_keep_center_as_bin = dist_keep_center_as_bin)
#                 # for d_tabs,c_tabs in zip(dist_tab,cont_tab):
#                 #     dist_tabs.append(d_tabs)
#                 #     contact_tabs.append(c_tabs)
#                 dist_tabs.append(dist_tab[0])
#                 contact_tabs.append(cont_tab)
#             ##########################################
#             # Collecting empty distance table metrics
#             ##########################################
#             all_pos = []
#             for n in list(map(lambda x:x+2, (range(len(list_obj_names)-1)))):
#                 all_pos += itertools.combinations(list_obj_names, n)
#             possib = [splitter.join(cont) for cont in all_pos if not inkeys(all_conts, splitter.join(cont), splitter)]
#             del all_conts, non_red_conts
#             for con in possib:
#                 dist_tabs.append(get_empty_contact_dist_tabs(mask=mask,
#                                                              name=con,
#                                                              dist_centering_obj=centering,
#                                                              scale=scale,
#                                                              dist_zernike_degrees=dist_zernike_degrees,
#                                                              dist_center_on=dist_center_on,
#                                                              dist_keep_center_as_bin=dist_keep_center_as_bin,
#                                                              dist_num_bins=dist_num_bins))
#             del possib
#         else:
#             for orgs, site in all_conts.items():
#                 cont_tab = get_contact_metrics_3D(orgs = orgs,
#                                                    organelle_segs = org_dict,
#                                                    mask = mask,
#                                                    splitter = splitter,
#                                                    scale = scale,
#                                                    include_dist = False)
#             del all_conts, non_red_conts

#     ######################
#     # measure cell regions
#     ######################
#     # create np.ndarray of intensity images
#     raw_image = np.stack(list_intensity_img)

#     # container for region data
#     region_tabs = []
#     for r, r_name in enumerate(list_region_names):
#         region = list_region_segs[r]
#         region_metrics = get_region_morphology_3D(region_seg=region, 
#                                                   region_name=r_name,
#                                                   channel_names=list_obj_names,
#                                                   intensity_img=raw_image, 
#                                                   mask=mask,
#                                                   scale=scale)
#         region_tabs.append(region_metrics)

#     ##############################################################
#     # loop through all organelles to collect measurements for each
#     ##############################################################

#     for j, target in enumerate(list_obj_names):
#         print(target)
#         # organelle intensity image
#         org_img = list_intensity_img[j]

#         # organelle segmentation
#         if target == 'ER':
#             # ensure ER is only one object
#             org_obj = (list_obj_segs[j] > 0).astype(np.uint16)
#         else:
#             org_obj = list_obj_segs[j]

#         ##########################################################
#         # measure organelle morphology
#         ##########################################################

#         ### If the organelle is in the skel list, then skeletonization will be ran for that organelle
#         ### By default the list is empty
#         if target in skel:
#             skel_met = True
#         else:
#             skel_met = False
#         org_metrics = _get_org_morphology_3D(segmentation_img=org_obj, 
#                                             seg_name=target,
#                                             intensity_img=org_img, 
#                                             mask=mask,
#                                             scale=scale,
#                                             skel_met = skel_met)

#         ### org_metrics.insert(loc=0,column='cell',value=1) 
#         # ^^^ saving this thought for later when someone might have more than one cell per image.
#         # Not sure how they analysis process would fit in our pipelines as they exist now. 
#         # Maybe here, iterating though the index of the masks above all of this and using that index as the cell number?

#         # TODO: find a better way to quantify the number and area of contacts per organelle
#             # I think it can be done during summarizing based on the label and object values in the contact sheet
#         # for i, nmi in enumerate(list_obj_names):
#         #     if i != j:
#         #         if target == 'ER':
#         #             b = (list_obj_segs[i] > 0).astype(np.uint16)
#         #         else:
#         #             b = list_obj_segs[i]
            
#         #         ov = []
#         #         b_labs = []
#         #         labs = []
#         #         for idx, lab in enumerate(org_metrics["label"]):
#         #             xyz = tuple(rp[idx].coords.T)
#         #             cmp_org = b[xyz]
                    
#         #             # total area (in voxels or real world units) where these two orgs overlap within the cell
#         #             if scale != None:
#         #                 overlap = sum(cmp_org > 0)*scale[0]*scale[1]*scale[2]
#         #             else:
#         #                 # total number of overlapping pixels
#         #                 overlap = sum(cmp_org > 0)
#         #                 # overlap?
                    
#         #             # which b organelles are involved in that overlap
#         #             labs_b = cmp_org[cmp_org > 0]
#         #             b_js = np.unique(labs_b).tolist()

#         #             # if overlap > 0:
#         #             labs.append(lab) # labs.append(lab)
#         #             ov.append(overlap)
#         #             b_labs.append(b_js)
#         #         org_metrics[f"{nmi}_overlap"] = ov
#         #         org_metrics[f"{nmi}_labels"] = b_labs 

#         org_tabs.append(org_metrics)

#         ################################
#         # measure organelle distribution 
#         ################################
#         centering = list_region_segs[list_region_names.index(dist_centering_obj)]
#         XY_org_distribution, XY_bin_masks, XY_wedge_masks = get_XY_distribution(mask=mask,
#                                                                                 centering_obj=centering,
#                                                                                 obj=org_obj,
#                                                                                 obj_name=target,
#                                                                                 scale=scale,
#                                                                                 num_bins=dist_num_bins,
#                                                                                 center_on=dist_center_on,
#                                                                                 keep_center_as_bin=dist_keep_center_as_bin,
#                                                                                 zernike_degrees=dist_zernike_degrees)
#         Z_org_distribution = get_Z_distribution(mask=mask, 
#                                                 obj=org_obj,
#                                                 obj_name=target,
#                                                 center_obj=centering,
#                                                 scale=scale)
        
#         org_distribution_metrics = pd.merge(XY_org_distribution, Z_org_distribution,on=["object", "scale"])

#         dist_tabs.append(org_distribution_metrics)
#         XY_bins.append(XY_bin_masks)
#         XY_wedges.append(XY_wedge_masks)

#     # # list the non-redundant organelle pairs
#     # contact_combos = list(itertools.combinations(list_obj_names, 2))

#     # # container to keep contact data in
#     # contact_tabs = []

#     # # loop through each pair and measure contacts
#     # for pair in contact_combos:
#     #     # pair names
#     #     a_name = pair[0]
#     #     b_name = pair[1]

#     #     # segmentations to measure
#     #     if a_name == 'ER':
#     #         # ensure ER is only one object
#     #         a = (list_obj_segs[list_obj_names.index(a_name)] > 0).astype(np.uint16)
#     #     else:
#     #         a = list_obj_segs[list_obj_names.index(a_name)]
        
#     #     if b_name == 'ER':
#     #         # ensure ER is only one object
#     #         b = (list_obj_segs[list_obj_names.index(b_name)] > 0).astype(np.uint16)
#     #     else:
#     #         b = list_obj_segs[list_obj_names.index(b_name)]
        

#     #     if include_contact_dist == True:
#     #         contact_tab, contact_dist_tab = get_contact_metrics_3D(a, a_name, 
#     #                                                                b, b_name, 
#     #                                                                mask, 
#     #                                                                scale, 
#     #                                                                include_dist=include_contact_dist,
#     #                                                                dist_centering_obj=centering,
#     #                                                                dist_num_bins=dist_num_bins,
#     #                                                                dist_zernike_degrees=dist_zernike_degrees,
#     #                                                                dist_center_on=dist_center_on,
#     #                                                                dist_keep_center_as_bin=dist_keep_center_as_bin)
#     #         dist_tabs.append(contact_dist_tab)
#     #     else:
#     #         contact_tab = get_contact_metrics_3D(a, a_name, 
#     #                                              b, b_name, 
#     #                                              mask, 
#     #                                              scale, 
#     #                                              include_dist=include_contact_dist)
#     #     contact_tabs.append(contact_tab)


#     ###########################################
#     # combine all tabs into one table per type:
#     ###########################################
#     final_org_tab = pd.concat(org_tabs, ignore_index=True)
#     final_org_tab.insert(loc=0,column='image_name',value=source_file.stem)

#     final_contact_tab = pd.concat(contact_tabs, ignore_index=True)
#     final_contact_tab.insert(loc=0,column='image_name',value=source_file.stem)

#     combined_dist_tab = pd.concat(dist_tabs, ignore_index=True)
#     combined_dist_tab.insert(loc=0,column='image_name',value=source_file.stem)

#     final_region_tab = pd.concat(region_tabs, ignore_index=True)
#     final_region_tab.insert(loc=0,column='image_name',value=source_file.stem)

#     end = time.time()
#     print(f"It took {(end-start)/60} minutes to quantify one image.")
#     return final_org_tab, final_contact_tab, combined_dist_tab, final_region_tab

In [37]:
# get_contact_metrics_3D(orgs=orgs,
#                                                             site=site,
#                                                             HO = non_red_conts[orgs],
#                                                             organelle_segs=org_dict,
#                                                             mask=mask,
#                                                             splitter=splitter,
#                                                             scale=scale,
#                                                             include_dist=include_contact_dist,
#                                                             dist_centering_obj=x,
#                                                             dist_num_bins=dist_num_bins,
#                                                             dist_zernike_degrees=dist_zernike_degrees,
#                                                             dist_center_on=dist_center_on,
#                                                             dist_keep_center_as_bin=dist_keep_center_as_bin)

In [38]:
# reorder the organelles to match channel indicies
region_names = ['nuc', 'cell']

In [39]:
import napari
viewer = napari.Viewer()

In [40]:
# Test code to confirm the order

for i in range(len(['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox'])):
    viewer.add_image(organelles[i],
                     name = f"seg_{org_list[i]}")
    viewer.add_image(intensities[i],
                     name = f"int_{org_list[i]}")

In [41]:
test_final_org_tab, test_final_contact_tab, test_combined_dist_tab, test_final_regions_tab = _make_all_metrics_tables(source_file= test_img_name,
                                                                                                                      list_obj_names=org_list,
                                                                                                                      list_obj_segs= organelles,
                                                                                                                      list_intensity_img=intensities,
                                                                                                                      list_region_names=region_names,
                                                                                                                      list_region_segs=regions,
                                                                                                                      mask='cell',
                                                                                                                      dist_centering_obj='nuc',
                                                                                                                      dist_num_bins=5,
                                                                                                                      dist_center_on=True,
                                                                                                                      dist_keep_center_as_bin=True,
                                                                                                                      dist_zernike_degrees=9,
                                                                                                                      scale=scale,
                                                                                                                      include_contact_dist=True,
                                                                                                                      skel = ['ER'])

warnings.simplefilter('ignore')

['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
target=LD in this iteration, channel 0
['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
target=ER in this iteration, channel 1
org obj test
True
target test
ER ER
True
org img test
True
mask test
True
mask name test
True
scale test
True (0.396091, 0.079947, 0.079947)
False
skel test
(0.396091, 0.079947, 0.079947) True
False
Part One took 12.577378988265991 sec(s)
Total Time: 12.577901601791382 sec(s)
['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
target=golgi in this iteration, channel 2
['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
target=lyso in this iteration, channel 3
['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
target=mito in this iteration, channel 4
['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox']
target=perox in this iteration, channel 5


KeyboardInterrupt: 

### **Update to `_find_segmentation_tiff_files`**

In [None]:
def _find_segmentation_tiff_files(prototype:Union[Path,str],
                                  name_list:List[str], 
                                  seg_path:Union[Path,str],
                                  suffix:Union[str, None]=None):
    """
    Find the matching segmentation files to the raw image file based on the raw image file path.

    Paramters:
    ---------
    prototype:Union[Path,str]
        the file path (as a string) for one raw image file; this file should have matching segmentation 
        output files with the same file name root and different file name ending that match the strings 
        provided in name_list
    name_list:List[str]
        a list of file name endings related to what segmentation is that file
    seg_path:Union[Path,str]
        the path (as a string) to the matching segmentation files.
    suffix:Union[str, None]=None
        any additional text that exists between the file root and the name_list ending
        Ex) Prototype = "C:/Users/Shannon/Documents/Python_Scripts/Infer-subc/raw/a48hrs-Ctrl_9_Unmixing.czi"
            Name of organelle file = a48hrs-Ctrl_9_Unmixing-20230426_test_cell.tiff
            result of .stem = "a48hrs-Ctrl_9_Unmixing"
            organelle/cell area type = "cell"
            suffix = "-20230426_test_"
    
    Returns:
    ----------
    a dictionary of file paths for each image type (raw and all the different segmentations)

    """
    # raw
    prototype = Path(prototype)
    if not prototype.exists():
        print(f"bad prototype. please choose an existing `raw` file as prototype")
        return dict()

    out_files = {"raw":prototype}
    seg_path = Path(seg_path) 

    # raw
    if not seg_path.is_dir():
        print(f"bad path argument. please choose an existing path containing organelle segmentations")
        return out_files

    # segmentations
    for org_n in name_list:
        org_name = Path(seg_path) / f"{prototype.stem}{suffix}{org_n}.tiff"
        if org_name.exists():
            out_files[org_n] = org_name
        else:
            # checker for .tif files usually as a result of manual segmentations
            if (Path(seg_path) / f"{prototype.stem}{suffix}{org_n}.tif").exists():
                out_files[org_n] = Path(seg_path) / f"{prototype.stem}{suffix}{org_n}.tif"
                print(f"{org_n} had a .tif file instead")
            else:
                print(f"{org_n} .tiff file not found in {seg_path} returning")
                out_files[org_n] = None
    
    return out_files

### **Update to `batch_process_quantification`**

In [None]:
# for convex hull errors
warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", RuntimeWarning)

def _batch_process_quantification(out_file_name: str,
                                  seg_path: Union[Path,str],
                                  out_path: Union[Path, str], 
                                  raw_path: Union[Path,str], 
                                  raw_file_type: str,
                                  organelle_names: List[str],
                                  organelle_channels: List[int],
                                  region_names: List[str],
                                  masks_file_name: str,
                                  mask: str,
                                  dist_centering_obj:str, 
                                  dist_num_bins: int,
                                  dist_center_on: bool=False,
                                  dist_keep_center_as_bin: bool=True,
                                  dist_zernike_degrees: Union[int, None]=None,
                                  include_contact_dist: bool = True,
                                  scale:bool=True,
                                  seg_suffix:Union[str, None]=None,
                                  splitter: str = '_',
                                  skel: List[str] = []) -> int :
    """  
    batch process segmentation quantification (morphology, distribution, contacts); this function is currently optimized to process images from one file folder per image type (e.g., raw, segmentation)
    the output csv files are saved to the indicated out_path folder

    Parameters:
    ----------
    out_file_name: str
        the prefix to use when naming the output datatables
    seg_path: Union[Path,str]
        Path or str to the folder that contains the segmentation tiff files
    out_path: Union[Path, str]
        Path or str to the folder that the output datatables will be saved to
    raw_path: Union[Path,str]
        Path or str to the folder that contains the raw image files
    raw_file_type: str
        the file type of the raw data; ex - ".tiff", ".czi"
    organelle_names: List[str]
        a list of all organelle names that will be analyzed; the names should be the same as the suffix used to name each of the tiff segmentation files
        Note: the intensity measurements collect per region (from get_region_morphology_3D function) will only be from channels associated to these organelles 
    organelle_channels: List[int]
        a list of channel indices associated to respective organelle staining in the raw image; the indices should listed in same order in which the respective segmentation name is listed in organelle_names
    region_names: List[str]
        a list of regions, or masks, to measure; the order should correlate to the order of the channels in the "masks" output segmentation file
    masks_file_name: str
        the suffix of the "masks" segmentation file; ex- "masks_B", "masks", etc.
        this function currently does not accept indivial region segmentations 
    mask: str
        the name of the region to use as the mask when measuring the organelles; this should be one of the names listed in regions list; usually this will be the "cell" mask
    dist_centering_obj:str
        the name of the region or object to use as the centering object in the get_XY_distribution function
    dist_num_bins: int
        the number of bins for the get_XY_distribution function
    dist_center_on: bool=False,
        for get_XY_distribution:
        True = distribute the bins from the center of the centering object
        False = distribute the bins from the edge of the centering object
    dist_keep_center_as_bin: bool=True
        for get_XY_distribution:
        True = include the centering object area when creating the bins
        False = do not include the centering object area when creating the bins
    dist_zernike_degrees: Union[int, None]=None
        for get_XY_distribution:
        the number of zernike degrees to include for the zernike shape descriptors; if None, the zernike measurements will not 
        be included in the output
    include_contact_dist:bool=True
        whether to include the distribution of contact sites in get_contact_metrics_3d(); True = include contact distribution
    scale:bool=True
        a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)
    seg_suffix:Union[str, None]=None
        any additional text that is included in the segmentation tiff files between the file stem and the segmentation suffix
        TODO: this can't be None!!! need to update!!!
    skel: List[str] = []
        The organelles in which skeleton quantification will be ran on


    Returns:
    ----------
    count: int
        the number of images processed
        
    """
    start = time.time()
    count = 0

    if isinstance(raw_path, str): raw_path = Path(raw_path)
    if isinstance(seg_path, str): seg_path = Path(seg_path)
    if isinstance(out_path, str): out_path = Path(out_path)
    
    if not Path.exists(out_path):
        Path.mkdir(out_path)
        print(f"making {out_path}")
    
    # reading list of files from the raw path
    img_file_list = list_image_files(raw_path, raw_file_type)

    # list of segmentation files to collect
    segs_to_collect = organelle_names + [masks_file_name]

    # containers to collect data tabels
    org_tabs = []
    contact_tabs = []
    dist_tabs = []
    region_tabs = []

    for img_f in img_file_list:
        count = count + 1
        filez = _find_segmentation_tiff_files(img_f, segs_to_collect, seg_path, seg_suffix)

        # read in raw file and metadata
        img_data, meta_dict = read_czi_image(filez["raw"])

        # create intensities from raw file as list based on the channel order provided
        intensities = [img_data[ch] for ch in organelle_channels]

        # define the scale
        if scale is True:
            scale_tup = meta_dict['scale']
        else:
            scale_tup = None

        # load regions as a list based on order in list (should match order in "masks" file)
        # masks = read_tiff_image(filez[masks_file_name]) 
        # regions = [masks[r] for r, region in enumerate(region_names)]
        
        regions= [read_tiff_image(filez[masks_file_name])[0],
                  read_tiff_image(filez[masks_file_name])[1]]

        # store organelle images as list
        organelles = [read_tiff_image(filez[org]) for org in organelle_names]

        org_metrics, contact_metrics, dist_metrics, region_metrics = _make_all_metrics_tables(source_file=img_f,
                                                                                             list_obj_names=organelle_names,
                                                                                             list_obj_segs=organelles,
                                                                                             list_intensity_img=intensities, 
                                                                                             list_region_names=region_names,
                                                                                             list_region_segs=regions, 
                                                                                             mask=mask,
                                                                                             dist_centering_obj=dist_centering_obj,
                                                                                             dist_num_bins=dist_num_bins,
                                                                                             dist_center_on=dist_center_on,
                                                                                             dist_keep_center_as_bin=dist_keep_center_as_bin,
                                                                                             dist_zernike_degrees=dist_zernike_degrees,
                                                                                             scale=scale_tup,
                                                                                             include_contact_dist=include_contact_dist,
                                                                                             splitter=splitter,
                                                                                             skel = skel)

        org_tabs.append(org_metrics)
        contact_tabs.append(contact_metrics)
        dist_tabs.append(dist_metrics)
        region_tabs.append(region_metrics)

        end2 = time.time()
        print(f"Completed processing for {count} images in {(end2-start)/60} mins.")

    final_org = pd.concat(org_tabs, ignore_index=True)
    final_contact = pd.concat(contact_tabs, ignore_index=True)
    final_dist = pd.concat(dist_tabs, ignore_index=True)
    final_region = pd.concat(region_tabs, ignore_index=True)

    org_csv_path = out_path / f"{out_file_name}organelles.csv"
    final_org.to_csv(org_csv_path)

    contact_csv_path = out_path / f"{out_file_name}contacts.csv"
    final_contact.to_csv(contact_csv_path)

    dist_csv_path = out_path / f"{out_file_name}distributions.csv"
    final_dist.to_csv(dist_csv_path)

    region_csv_path = out_path / f"{out_file_name}regions.csv"
    final_region.to_csv(region_csv_path)

    end = time.time()
    print(f"Quantification for {count} files is COMPLETE! Files saved to '{out_path}'.")
    print(f"It took {(end - start)/60} minutes to quantify these files.")
    return count

In [None]:
seg_data_path = Path(os.path.expanduser("~")) / "OneDrive/Desktop/seg"
out_data_path = Path(os.path.expanduser("~")) / "OneDrive/Desktop/out"
raw_data_path = Path(os.path.expanduser("~")) / "OneDrive/Desktop/raw"

n_files = _batch_process_quantification(out_file_name = "skel_test_",
                                  seg_path=seg_data_path,
                                  out_path=out_data_path, 
                                  raw_path=raw_data_path, 
                                  raw_file_type=".tiff",
                                  organelle_names=org_list,
                                  organelle_channels=organelle_channels,
                                  region_names=region_names,
                                  masks_file_name="masks_B",
                                  mask="cell",
                                  dist_centering_obj="nuc", 
                                  dist_num_bins=5,
                                  dist_center_on=False,
                                  dist_keep_center_as_bin=True,
                                  dist_zernike_degrees=9,
                                  include_contact_dist=True,
                                  scale=True,
                                  seg_suffix="-",
                                  splitter='_',
                                  skel = ['ER','mito','golgi'])

### **`batch_summary_stats`**

In [None]:
def check_for_existing_combo(contact, contact_list, splitter):
    for ctc in contact_list:
        if sorted(contact) == sorted(ctc.split(splitter)):
            return(ctc.split(splitter))
    return contact

In [None]:
def _batch_summary_stats(csv_path_list: List[str],
                         out_path: str,
                         out_preffix: str,
                         splitter: str='X'):
    """" 
    csv_path_list: List[str],
        A list of path strings where .csv files to analyze are located.
    out_path: str,
        A path string where the summary data file will be output to
    out_preffix: str
        The prefix used to name the output file.    
    """
    ds_count = 0
    fl_count = 0
    ###################
    # Read in the csv files and combine them into one of each type
    ###################
    org_tabs = []
    contact_tabs = []
    dist_tabs = []
    region_tabs = []

    for loc in csv_path_list:
        print(loc)
        ds_count = ds_count + 1
        loc=Path(loc)
        files_store = sorted(loc.glob("*.csv"))
        for file in files_store:
            fl_count = fl_count + 1
            stem = file.stem

            org = "organelles"
            contacts = "contacts"
            dist = "distributions"
            regions = "regions"

            if org in stem:
                test_orgs = pd.read_csv(file, index_col=0)
                test_orgs.insert(0, "dataset", stem[:-11])
                org_tabs.append(test_orgs)
            if contacts in stem:
                test_contact = pd.read_csv(file, index_col=0)
                test_contact.insert(0, "dataset", stem[:-9])
                contact_tabs.append(test_contact)
            if dist in stem:
                test_dist = pd.read_csv(file, index_col=0)
                test_dist.insert(0, "dataset", stem[:-14])
                dist_tabs.append(test_dist)
            if regions in stem:
                test_regions = pd.read_csv(file, index_col=0)
                test_regions.insert(0, "dataset", stem[:-8])
                region_tabs.append(test_regions)
            
    org_df = pd.concat(org_tabs,axis=0, join='outer')
    contacts_df = pd.concat(contact_tabs,axis=0, join='outer')
    dist_df = pd.concat(dist_tabs,axis=0, join='outer')
    regions_df = pd.concat(region_tabs,axis=0, join='outer')
    ##########################
    # List organelles in cell
    ###########################
    all_orgs = list(set(org_df.loc[:, 'object'].tolist()))

    ###################
    # adding new metrics to the original sheets
    ###################
    # TODO: include these labels when creating the original sheets
    contact_cnt = contacts_df[["dataset", "image_name", "object", "label", "volume"]]
    ctc = contact_cnt["object"].values.tolist()
    ##############################################################################
    #  Creating New methods of storing A & B
    ###############################################################################
    # len(max(contact_cnt["object"].str.split('X'), key=len))) provides max number of organelles involved in contact
    contact_cnt[[f"org{cha}" for cha in string.ascii_uppercase[:(len(max(contact_cnt["object"].str.split(splitter), key=len)))]]] = contact_cnt["object"].str.split(splitter, expand=True)
    contact_cnt[[f"{cha}_ID" for cha in string.ascii_uppercase[:(len(max(contact_cnt["label"].str.split('_'), key=len)))]]] = contact_cnt["label"].str.split('_', expand=True)
    #iterating from a to val
    unstacked_cont = []
    for cha in string.ascii_uppercase[:len(max(contact_cnt["object"].str.split(splitter), key=len))]:
        valid = (contact_cnt[f"org{cha}"] != None) & (contact_cnt[f"{cha}_ID"] != None)
        contact_cnt[f"{cha}"] = None
        contact_cnt.loc[valid, f"{cha}"] = contact_cnt[f"org{cha}"] + "_" + contact_cnt[f"{cha}_ID"]
        contact_cnt_percell = contact_cnt[["dataset", "image_name", f"org{cha}", f"{cha}_ID", "object", "volume"]].groupby(["dataset", "image_name", f"org{cha}", f"{cha}_ID", "object"]).agg(["count", "sum"])
        contact_cnt_percell.columns = ["_".join(col_name).rstrip('_') for col_name in contact_cnt_percell.columns.to_flat_index()]
        unstacked = contact_cnt_percell.unstack(level='object')
        unstacked.columns = ["_".join(col_name).rstrip('_') for col_name in unstacked.columns.to_flat_index()]
        unstacked = unstacked.reset_index()
        for col in unstacked.columns:
            if col.startswith("volume_count_"):
                newname = col.split("_")[-1] + "_count"
                unstacked.rename(columns={col:newname}, inplace=True)
            if col.startswith("volume_sum_"):
                newname = col.split("_")[-1] + "_volume"
                unstacked.rename(columns={col:newname}, inplace=True)
        unstacked.rename(columns={f"org{cha}":"object", f"{cha}_ID":"label"}, inplace=True)
        unstacked.set_index(['dataset', 'image_name', 'object', 'label'])    
        unstacked_cont.append(unstacked)
    contact_cnt = pd.concat(unstacked_cont, axis=0).sort_index(axis=0)
    contact_cnt = contact_cnt.groupby(['dataset', 'image_name', 'object', 'label']).sum().reset_index()                 #adds together all duplicates at the index, then resets the index
    contact_cnt['label']=contact_cnt['label'].astype("Int64")  
    org_df = pd.merge(org_df, contact_cnt, how='left', on=['dataset', 'image_name', 'object', 'label'], sort=True)
    org_df[contact_cnt.columns] = org_df[contact_cnt.columns].fillna(0)

    ###################
    # summary stat group
    ###################
    group_by = ['dataset', 'image_name', 'object']
    sharedcolumns = ["SA_to_volume_ratio", "equivalent_diameter", "extent", "euler_number", "solidity", "axis_major_length"]
    ag_func_standard = ['mean', 'median', 'std']

    ###################
    # summarize shared measurements between org_df and contacts_df
    ###################
    org_cont_tabs = []
    for tab in [org_df, contacts_df]:
        tab1 = tab[group_by + ['volume']].groupby(group_by).agg(['count', 'sum'] + ag_func_standard)
        tab2 = tab[group_by + ['surface_area']].groupby(group_by).agg(['sum'] + ag_func_standard)
        tab3 = tab[group_by + sharedcolumns].groupby(group_by).agg(ag_func_standard)
        shared_metrics = pd.merge(tab1, tab2, 'outer', on=group_by)
        shared_metrics = pd.merge(shared_metrics, tab3, 'outer', on=group_by)
        org_cont_tabs.append(shared_metrics)

    org_summary = org_cont_tabs[0]
    contact_summary = org_cont_tabs[1]

    ###################
    # group metrics from regions_df similar to the above
    ###################
    regions_summary = regions_df[group_by + ['volume', 'surface_area'] + sharedcolumns].set_index(group_by)

    ###################
    # summarize extra metrics from org_df
    ###################
    columns2 = [col for col in org_df.columns if col.endswith(("_count", "_volume"))]
    contact_counts_summary = org_df[group_by + columns2].groupby(group_by).agg(['sum'] + ag_func_standard)
    org_summary = pd.merge(org_summary, contact_counts_summary, 'outer', on=group_by)#left_on=group_by, right_on=True)

    ###################
    # summarize distribution measurements
    ###################
    # organelle distributions
    hist_dfs = []
    for ind in dist_df.index:
        selection = dist_df.loc[[ind]]
        bins_df = pd.DataFrame()
        wedges_df = pd.DataFrame()
        Z_df = pd.DataFrame()

        bins_df[['bins', 'masks', 'obj']] = selection[['XY_bins', 'XY_mask_vox_cnt_perbin', 'XY_obj_vox_cnt_perbin']]
        wedges_df[['bins', 'masks', 'obj']] = selection[['XY_wedges', 'XY_mask_vox_cnt_perwedge', 'XY_obj_vox_cnt_perwedge']]
        Z_df[['bins', 'masks', 'obj']] = selection[['Z_slices', 'Z_mask_vox_cnt', 'Z_obj_vox_cnt']]

        dfs = [selection[['dataset', 'image_name', 'object']].reset_index()]
        for df, prefix in zip([bins_df, wedges_df, Z_df], ["XY_bins_", "XY_wedges_", "Z_slices_"]):
            single_df = pd.DataFrame(list(zip(df["bins"].values[0][1:-1].split(", "), 
                                            df["obj"].values[0][1:-1].split(", "), 
                                            df["masks"].values[0][1:-1].split(", "))), columns =['bins', 'obj', 'mask']).astype(int)

            if "Z_" in prefix:
                single_df =  single_df.drop(single_df[single_df['mask'] == 0].index)
                single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)
        
            single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

            # single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            # single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            # single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

            # if "Z_" in prefix:
            #     single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)


            sumstats_df = pd.DataFrame()

            s = single_df['bins'].repeat(single_df['obj_norm'])
            sumstats_df['hist_mean']=[s.mean()]
            sumstats_df['hist_median']=[s.median()]
            if single_df['obj_norm'].sum() != 0: sumstats_df['hist_mode']=[s.mode()[0]]
            else: sumstats_df['hist_mode']=['NaN']
            sumstats_df['hist_min']=[s.min()]
            sumstats_df['hist_max']=[s.max()]
            sumstats_df['hist_range']=[s.max() - s.min()]
            sumstats_df['hist_stdev']=[s.std()]
            sumstats_df['hist_skew']=[s.skew()]
            sumstats_df['hist_kurtosis']=[s.kurtosis()]
            sumstats_df['hist_var']=[s.var()]
            sumstats_df.columns = [prefix+col for col in sumstats_df.columns]
            dfs.append(sumstats_df.reset_index())
        combined_df = pd.concat(dfs, axis=1).drop(columns="index")
        hist_dfs.append(combined_df)
    dist_org_summary = pd.concat(hist_dfs, ignore_index=True)

    # nucleus distribution
    nuc_dist_df = dist_df[["dataset", "image_name", 
                        "XY_bins", "XY_center_vox_cnt_perbin", "XY_mask_vox_cnt_perbin",
                        "XY_wedges", "XY_center_vox_cnt_perwedge", "XY_mask_vox_cnt_perwedge",
                        "Z_slices", "Z_center_vox_cnt", "Z_mask_vox_cnt"]].set_index(["dataset", "image_name"])
    nuc_hist_dfs = []
    for idx in nuc_dist_df.index.unique():
        selection = nuc_dist_df.loc[idx].iloc[[0]].reset_index()
        bins_df = pd.DataFrame()
        wedges_df = pd.DataFrame()
        Z_df = pd.DataFrame()

        bins_df[['bins', 'center', 'masks']] = selection[['XY_bins', 'XY_center_vox_cnt_perbin', 'XY_mask_vox_cnt_perbin']]
        wedges_df[['bins', 'center', 'masks']] = selection[['XY_wedges', 'XY_center_vox_cnt_perwedge', 'XY_mask_vox_cnt_perwedge']]
        Z_df[['bins', 'center', 'masks']] = selection[['Z_slices', 'Z_center_vox_cnt', 'Z_mask_vox_cnt']]

        dfs = [selection[['dataset', 'image_name']]]
        for df, prefix in zip([bins_df, wedges_df, Z_df], ["XY_bins_", "XY_wedges_", "Z_slices_"]):
            single_df = pd.DataFrame(list(zip(df["bins"].values[0][1:-1].split(", "), 
                                            df["masks"].values[0][1:-1].split(", "),
                                            df["center"].values[0][1:-1].split(", "))), columns =['bins', 'mask', 'obj']).astype(int)
            # single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            # single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            # single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100
            # if "Z_" in prefix:
            #     single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)

            if "Z_" in prefix:
                single_df =  single_df.drop(single_df[single_df['mask'] == 0].index)
                single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)
        
            single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

            sumstats_df = pd.DataFrame()

            s = single_df['bins'].repeat(single_df['obj_norm'])
            sumstats_df['hist_mean']=[s.mean()]
            sumstats_df['hist_median']=[s.median()]
            if single_df['obj_norm'].sum() != 0: sumstats_df['hist_mode']=[s.mode()[0]]
            else: sumstats_df['hist_mode']=['NaN']
            sumstats_df['hist_min']=[s.min()]
            sumstats_df['hist_max']=[s.max()]
            sumstats_df['hist_range']=[s.max() - s.min()]
            sumstats_df['hist_stdev']=[s.std()]
            sumstats_df['hist_skew']=[s.skew()]
            sumstats_df['hist_kurtosis']=[s.kurtosis()]
            sumstats_df['hist_var']=[s.var()]
            sumstats_df.columns = [prefix+col for col in sumstats_df.columns]
            dfs.append(sumstats_df.reset_index())
        combined_df = pd.concat(dfs, axis=1).drop(columns="index")
        nuc_hist_dfs.append(combined_df)
    dist_center_summary = pd.concat(nuc_hist_dfs, ignore_index=True)
    dist_center_summary.insert(2, column="object", value="nuc")

    dist_summary = pd.concat([dist_org_summary, dist_center_summary], axis=0).set_index(group_by).sort_index()

    ###################
    # add normalization
    ###################
    # organelle area fraction
    area_fractions = []
    for idx in org_summary.index.unique():
        org_vol = org_summary.loc[idx][('volume', 'sum')]
        cell_vol = regions_summary.loc[idx[:-1] + ('cell',)]["volume"]
        afrac = org_vol/cell_vol
        area_fractions.append(afrac)
    org_summary[('volume', 'fraction')] = area_fractions
    # TODO: add in line to reorder the level=0 columns here

    # contact sites volume normalized
    # norm_toA_list = []
    # norm_toB_list = []
    norm_to_list = {}
    for col in contact_summary.index:
        for idx, cha in enumerate(string.ascii_uppercase[:len(max(contact_summary.index.get_level_values('object').str.split(splitter), key=len))]):
            if cha not in norm_to_list:
                norm_to_list[f"{cha}"] = []
            if ((idx+1) <= len(col[-1].split(splitter))):
                norm_to_list[f"{cha}"].append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[idx],)][('volume', 'sum')])
            else:
                norm_to_list[f"{cha}"].append(None)
    for cha in string.ascii_uppercase[:len(max(contact_summary.index.get_level_values('object').str.split(splitter), key=len))]:
        contact_summary[('volume', f'norm_to_{cha}')] = norm_to_list[f"{cha}"]
        # norm_toA_list.append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[0],)][('volume', 'sum')])
        # norm_toB_list.append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[1],)][('volume', 'sum')])
    # contact_summary[('volume', 'norm_to_A')] = norm_toA_list
    # contact_summary[('volume', 'norm_to_B')] = norm_toB_list

    # number and area of individuals organelle involved in contact
    cont_cnt = org_df[group_by]
    cont_cnt[[col.split('_')[0] for col in org_df.columns if col.endswith(("_count"))]] = org_df[[col for col in org_df.columns if col.endswith(("_count"))]].astype(bool)
    cont_cnt_perorg = cont_cnt.groupby(group_by).agg('sum')
    cont_cnt_perorg.columns = pd.MultiIndex.from_product([cont_cnt_perorg.columns, ['count_in']])
    for col in cont_cnt_perorg.columns:
        cont_cnt_perorg[(col[0], 'num_fraction_in')] = cont_cnt_perorg[col].values/org_summary[('volume', 'count')].values
    cont_cnt_perorg.sort_index(axis=1, inplace=True)
    org_summary = pd.merge(org_summary, cont_cnt_perorg, on=group_by, how='outer')


    ###################
    # flatten datasheets and combine
    # TODO: restructure this so that all of the datasheets and unstacked and then reorded based on shared level 0 columns before flattening
    ###################
    # org flattening
    org_final = org_summary.unstack(-1)
    for col in org_final.columns:
        if col[1] in ('count_in', 'num_fraction_in') or col[0].endswith(('_count', '_volume')):
            if col[2] not in col[0]:
                org_final.drop(col,axis=1, inplace=True)
    ########################################################################
    # MAKING new_col_order flexible to work with any organelle input values and combo number
    #######################################################################
    new_col_order = ['dataset', 'image_name', 'object', 'volume', 'surface_area', 'SA_to_volume_ratio', 
                     'equivalent_diameter', 'extent', 'euler_number', 'solidity', 'axis_major_length'] 
    all_combos = []
    for n in list(map(lambda x:x+2, (range(len(all_orgs)-1)))):
            for o in itertools.combinations(all_orgs, n):
                all_combos.append(check_for_existing_combo(o, ctc, splitter))
    combos = [splitter.join(cont) for cont in all_combos]
    for combo in combos:
        new_col_order += [f"{combo}", f"{combo}_count", f"{combo}_volume"]
    new_cols = org_final.columns.reindex(new_col_order, level=0)
    org_final = org_final.reindex(columns=new_cols[0])
    org_final.columns = ["_".join((col_name[-1], col_name[1], col_name[0])) for col_name in org_final.columns.to_flat_index()]

    #renaming, filling "NaN" with 0 when needed, and removing ER_std columns
    for col in org_final.columns:
        if '_count_in_' or '_fraction_in_' in col:
            org_final[col] = org_final[col].fillna(0)
        if col.endswith(("_count_volume","_sum_volume", "_mean_volume", "_median_volume")):
            org_final[col] = org_final[col].fillna(0)
        if col.endswith("_count_volume"):
            org_final.rename(columns={col:col.split("_")[0]+"_count"}, inplace=True)
        if col.startswith("ER_std_"):
            org_final.drop(columns=[col], inplace=True)
    org_final = org_final.reset_index()

    # contacts flattened
    contact_final = contact_summary.unstack(-1)
    contact_final.columns = ["_".join((col_name[-1], col_name[1], col_name[0])) for col_name in contact_final.columns.to_flat_index()]

    #renaming and filling "NaN" with 0 when needed
    for col in contact_final.columns:
        if col.endswith(("_count_volume","_sum_volume", "_mean_volume", "_median_volume")):
            contact_final[col] = contact_final[col].fillna(0)
        if col.endswith("_count_volume"):
            contact_final.rename(columns={col:col.split("_")[0]+"_count"}, inplace=True)
    contact_final = contact_final.reset_index()

    # distributions flattened
    dist_final = dist_summary.unstack(-1)
    dist_final.columns = ["_".join((col_name[1], col_name[0])) for col_name in dist_final.columns.to_flat_index()]
    dist_final = dist_final.reset_index()

    # regions flattened & normalization added
    regions_final = regions_summary.unstack(-1)
    regions_final.columns = ["_".join((col_name[1], col_name[0])) for col_name in regions_final.columns.to_flat_index()]
    regions_final['nuc_area_fraction'] = regions_final['nuc_volume'] / regions_final['cell_volume']
    regions_final = regions_final.reset_index()

    # combining them all
    combined = pd.merge(org_final, contact_final, on=["dataset", "image_name"], how="outer")
    combined = pd.merge(combined, dist_final, on=["dataset", "image_name"], how="outer")
    combined = pd.merge(combined, regions_final, on=["dataset", "image_name"], how="outer").set_index(["dataset", "image_name"])
    combined.columns = [col.replace('sum', 'total') for col in combined.columns]

    ###################
    # export summary sheets
    ###################
    org_summary.to_csv(out_path + f"/{out_preffix}per_org_summarystats.csv")
    contact_summary.to_csv(out_path + f"/{out_preffix}per_contact_summarystats.csv")
    dist_summary.to_csv(out_path + f"/{out_preffix}distribution_summarystats.csv")
    regions_summary.to_csv(out_path + f"/{out_preffix}per_region_summarystats.csv")
    combined.to_csv(out_path + f"/{out_preffix}summarystats_combined.csv")

    print(f"Processing of {fl_count} files from {ds_count} dataset(s) is complete.")
    return f"{fl_count} files from {ds_count} dataset(s) were processed"

In [None]:
_batch_summary_stats([str(out_data_path)],
                     str(out_data_path),
                     f"{datetime.today().strftime('%Y%m%d')}_skel_test",
                     splitter="_")

In [None]:
def _batch_summary_stats_debug(csv_path_list: List[str],
                         out_path: str,
                         out_preffix: str,
                         splitter: str='X'):
    """" 
    csv_path_list: List[str],
        A list of path strings where .csv files to analyze are located.
    out_path: str,
        A path string where the summary data file will be output to
    out_preffix: str
        The prefix used to name the output file.    
    """
    ds_count = 0
    fl_count = 0
    ###################
    # Read in the csv files and combine them into one of each type
    ###################
    org_tabs = []
    contact_tabs = []
    dist_tabs = []
    region_tabs = []

    for loc in csv_path_list:
        print(loc)
        ds_count = ds_count + 1
        loc=Path(loc)
        files_store = sorted(loc.glob("*.csv"))
        for file in files_store:
            fl_count = fl_count + 1
            stem = file.stem

            org = "organelles"
            contacts = "contacts"
            dist = "distributions"
            regions = "regions"

            if org in stem:
                test_orgs = pd.read_csv(file, index_col=0)
                test_orgs.insert(0, "dataset", stem[:-11])
                org_tabs.append(test_orgs)
            if contacts in stem:
                test_contact = pd.read_csv(file, index_col=0)
                test_contact.insert(0, "dataset", stem[:-9])
                contact_tabs.append(test_contact)
            if dist in stem:
                test_dist = pd.read_csv(file, index_col=0)
                test_dist.insert(0, "dataset", stem[:-14])
                dist_tabs.append(test_dist)
            if regions in stem:
                test_regions = pd.read_csv(file, index_col=0)
                test_regions.insert(0, "dataset", stem[:-8])
                region_tabs.append(test_regions)
            
    org_df = pd.concat(org_tabs,axis=0, join='outer')
    contacts_df = pd.concat(contact_tabs,axis=0, join='outer')
    dist_df = pd.concat(dist_tabs,axis=0, join='outer')
    regions_df = pd.concat(region_tabs,axis=0, join='outer')
    ##########################
    # List organelles in cell
    ###########################
    all_orgs = list(set(org_df.loc[:, 'object'].tolist()))

    ###################
    # adding new metrics to the original sheets
    ###################
    # TODO: include these labels when creating the original sheets
    contact_cnt = contacts_df[["dataset", "image_name", "object", "label", "volume"]]
    ctc = contact_cnt["object"].values.tolist()
    ##############################################################################
    #  Creating New methods of storing A & B
    ###############################################################################
    # len(max(contact_cnt["object"].str.split('X'), key=len))) provides max number of organelles involved in contact
    contact_cnt[[f"org{cha}" for cha in string.ascii_uppercase[:(len(max(contact_cnt["object"].str.split(splitter), key=len)))]]] = contact_cnt["object"].str.split(splitter, expand=True)
    contact_cnt[[f"{cha}_ID" for cha in string.ascii_uppercase[:(len(max(contact_cnt["label"].str.split('_'), key=len)))]]] = contact_cnt["label"].str.split('_', expand=True)
    #iterating from a to val
    unstacked_cont = []
    for cha in string.ascii_uppercase[:len(max(contact_cnt["object"].str.split(splitter), key=len))]:
        valid = (contact_cnt[f"org{cha}"] != None) & (contact_cnt[f"{cha}_ID"] != None)
        contact_cnt[f"{cha}"] = None
        contact_cnt.loc[valid, f"{cha}"] = contact_cnt[f"org{cha}"] + "_" + contact_cnt[f"{cha}_ID"]
        contact_cnt_percell = contact_cnt[["dataset", "image_name", f"org{cha}", f"{cha}_ID", "object", "volume"]].groupby(["dataset", "image_name", f"org{cha}", f"{cha}_ID", "object"]).agg(["count", "sum"])
        contact_cnt_percell.columns = ["_".join(col_name).rstrip('_') for col_name in contact_cnt_percell.columns.to_flat_index()]
        unstacked = contact_cnt_percell.unstack(level='object')
        unstacked.columns = ["_".join(col_name).rstrip('_') for col_name in unstacked.columns.to_flat_index()]
        unstacked = unstacked.reset_index()
        for col in unstacked.columns:
            if col.startswith("volume_count_"):
                newname = col.split("_")[-1] + "_count"
                unstacked.rename(columns={col:newname}, inplace=True)
            if col.startswith("volume_sum_"):
                newname = col.split("_")[-1] + "_volume"
                unstacked.rename(columns={col:newname}, inplace=True)
        unstacked.rename(columns={f"org{cha}":"object", f"{cha}_ID":"label"}, inplace=True)
        unstacked.set_index(['dataset', 'image_name', 'object', 'label'])    
        unstacked_cont.append(unstacked)

    return(unstacked_cont, string.ascii_uppercase[:len(max(contact_cnt["object"].str.split(splitter), key=len))])
    # contact_cnt = pd.concat(unstacked_cont, axis=0).sort_index(axis=0)
    # contact_cnt = contact_cnt.groupby(['dataset', 'image_name', 'object', 'label']).sum().reset_index()                 #adds together all duplicates at the index, then resets the index
    # contact_cnt['label']=contact_cnt['label'].astype("Int64")  
    # org_df = pd.merge(org_df, contact_cnt, how='left', on=['dataset', 'image_name', 'object', 'label'], sort=True)
    # org_df[contact_cnt.columns] = org_df[contact_cnt.columns].fillna(0)

    # ###################
    # # summary stat group
    # ###################
    # group_by = ['dataset', 'image_name', 'object']
    # sharedcolumns = ["SA_to_volume_ratio", "equivalent_diameter", "extent", "euler_number", "solidity", "axis_major_length"]
    # ag_func_standard = ['mean', 'median', 'std']

    # ###################
    # # summarize shared measurements between org_df and contacts_df
    # ###################
    # org_cont_tabs = []
    # for tab in [org_df, contacts_df]:
    #     tab1 = tab[group_by + ['volume']].groupby(group_by).agg(['count', 'sum'] + ag_func_standard)
    #     tab2 = tab[group_by + ['surface_area']].groupby(group_by).agg(['sum'] + ag_func_standard)
    #     tab3 = tab[group_by + sharedcolumns].groupby(group_by).agg(ag_func_standard)
    #     shared_metrics = pd.merge(tab1, tab2, 'outer', on=group_by)
    #     shared_metrics = pd.merge(shared_metrics, tab3, 'outer', on=group_by)
    #     org_cont_tabs.append(shared_metrics)

    # org_summary = org_cont_tabs[0]
    # contact_summary = org_cont_tabs[1]

    # ###################
    # # group metrics from regions_df similar to the above
    # ###################
    # regions_summary = regions_df[group_by + ['volume', 'surface_area'] + sharedcolumns].set_index(group_by)

    # ###################
    # # summarize extra metrics from org_df
    # ###################
    # columns2 = [col for col in org_df.columns if col.endswith(("_count", "_volume"))]
    # contact_counts_summary = org_df[group_by + columns2].groupby(group_by).agg(['sum'] + ag_func_standard)
    # org_summary = pd.merge(org_summary, contact_counts_summary, 'outer', on=group_by)#left_on=group_by, right_on=True)

    # ###################
    # # summarize distribution measurements
    # ###################
    # # organelle distributions
    # hist_dfs = []
    # for ind in dist_df.index:
    #     selection = dist_df.loc[[ind]]
    #     bins_df = pd.DataFrame()
    #     wedges_df = pd.DataFrame()
    #     Z_df = pd.DataFrame()

    #     bins_df[['bins', 'masks', 'obj']] = selection[['XY_bins', 'XY_mask_vox_cnt_perbin', 'XY_obj_vox_cnt_perbin']]
    #     wedges_df[['bins', 'masks', 'obj']] = selection[['XY_wedges', 'XY_mask_vox_cnt_perwedge', 'XY_obj_vox_cnt_perwedge']]
    #     Z_df[['bins', 'masks', 'obj']] = selection[['Z_slices', 'Z_mask_vox_cnt', 'Z_obj_vox_cnt']]

    #     dfs = [selection[['dataset', 'image_name', 'object']].reset_index()]
    #     for df, prefix in zip([bins_df, wedges_df, Z_df], ["XY_bins_", "XY_wedges_", "Z_slices_"]):
    #         single_df = pd.DataFrame(list(zip(df["bins"].values[0][1:-1].split(", "), 
    #                                         df["obj"].values[0][1:-1].split(", "), 
    #                                         df["masks"].values[0][1:-1].split(", "))), columns =['bins', 'obj', 'mask']).astype(int)

    #         if "Z_" in prefix:
    #             single_df =  single_df.drop(single_df[single_df['mask'] == 0].index)
    #             single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)
        
    #         single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
    #         single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
    #         single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

    #         # single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
    #         # single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
    #         # single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

    #         # if "Z_" in prefix:
    #         #     single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)


    #         sumstats_df = pd.DataFrame()

    #         s = single_df['bins'].repeat(single_df['obj_norm'])
    #         sumstats_df['hist_mean']=[s.mean()]
    #         sumstats_df['hist_median']=[s.median()]
    #         if single_df['obj_norm'].sum() != 0: sumstats_df['hist_mode']=[s.mode()[0]]
    #         else: sumstats_df['hist_mode']=['NaN']
    #         sumstats_df['hist_min']=[s.min()]
    #         sumstats_df['hist_max']=[s.max()]
    #         sumstats_df['hist_range']=[s.max() - s.min()]
    #         sumstats_df['hist_stdev']=[s.std()]
    #         sumstats_df['hist_skew']=[s.skew()]
    #         sumstats_df['hist_kurtosis']=[s.kurtosis()]
    #         sumstats_df['hist_var']=[s.var()]
    #         sumstats_df.columns = [prefix+col for col in sumstats_df.columns]
    #         dfs.append(sumstats_df.reset_index())
    #     combined_df = pd.concat(dfs, axis=1).drop(columns="index")
    #     hist_dfs.append(combined_df)
    # dist_org_summary = pd.concat(hist_dfs, ignore_index=True)

    # # nucleus distribution
    # nuc_dist_df = dist_df[["dataset", "image_name", 
    #                     "XY_bins", "XY_center_vox_cnt_perbin", "XY_mask_vox_cnt_perbin",
    #                     "XY_wedges", "XY_center_vox_cnt_perwedge", "XY_mask_vox_cnt_perwedge",
    #                     "Z_slices", "Z_center_vox_cnt", "Z_mask_vox_cnt"]].set_index(["dataset", "image_name"])
    # nuc_hist_dfs = []
    # for idx in nuc_dist_df.index.unique():
    #     selection = nuc_dist_df.loc[idx].iloc[[0]].reset_index()
    #     bins_df = pd.DataFrame()
    #     wedges_df = pd.DataFrame()
    #     Z_df = pd.DataFrame()

    #     bins_df[['bins', 'center', 'masks']] = selection[['XY_bins', 'XY_center_vox_cnt_perbin', 'XY_mask_vox_cnt_perbin']]
    #     wedges_df[['bins', 'center', 'masks']] = selection[['XY_wedges', 'XY_center_vox_cnt_perwedge', 'XY_mask_vox_cnt_perwedge']]
    #     Z_df[['bins', 'center', 'masks']] = selection[['Z_slices', 'Z_center_vox_cnt', 'Z_mask_vox_cnt']]

    #     dfs = [selection[['dataset', 'image_name']]]
    #     for df, prefix in zip([bins_df, wedges_df, Z_df], ["XY_bins_", "XY_wedges_", "Z_slices_"]):
    #         single_df = pd.DataFrame(list(zip(df["bins"].values[0][1:-1].split(", "), 
    #                                         df["masks"].values[0][1:-1].split(", "),
    #                                         df["center"].values[0][1:-1].split(", "))), columns =['bins', 'mask', 'obj']).astype(int)
    #         # single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
    #         # single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
    #         # single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100
    #         # if "Z_" in prefix:
    #         #     single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)

    #         if "Z_" in prefix:
    #             single_df =  single_df.drop(single_df[single_df['mask'] == 0].index)
    #             single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)
        
    #         single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
    #         single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
    #         single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

    #         sumstats_df = pd.DataFrame()

    #         s = single_df['bins'].repeat(single_df['obj_norm'])
    #         sumstats_df['hist_mean']=[s.mean()]
    #         sumstats_df['hist_median']=[s.median()]
    #         if single_df['obj_norm'].sum() != 0: sumstats_df['hist_mode']=[s.mode()[0]]
    #         else: sumstats_df['hist_mode']=['NaN']
    #         sumstats_df['hist_min']=[s.min()]
    #         sumstats_df['hist_max']=[s.max()]
    #         sumstats_df['hist_range']=[s.max() - s.min()]
    #         sumstats_df['hist_stdev']=[s.std()]
    #         sumstats_df['hist_skew']=[s.skew()]
    #         sumstats_df['hist_kurtosis']=[s.kurtosis()]
    #         sumstats_df['hist_var']=[s.var()]
    #         sumstats_df.columns = [prefix+col for col in sumstats_df.columns]
    #         dfs.append(sumstats_df.reset_index())
    #     combined_df = pd.concat(dfs, axis=1).drop(columns="index")
    #     nuc_hist_dfs.append(combined_df)
    # dist_center_summary = pd.concat(nuc_hist_dfs, ignore_index=True)
    # dist_center_summary.insert(2, column="object", value="nuc")

    # dist_summary = pd.concat([dist_org_summary, dist_center_summary], axis=0).set_index(group_by).sort_index()

    # ###################
    # # add normalization
    # ###################
    # # organelle area fraction
    # area_fractions = []
    # for idx in org_summary.index.unique():
    #     org_vol = org_summary.loc[idx][('volume', 'sum')]
    #     cell_vol = regions_summary.loc[idx[:-1] + ('cell',)]["volume"]
    #     afrac = org_vol/cell_vol
    #     area_fractions.append(afrac)
    # org_summary[('volume', 'fraction')] = area_fractions
    # # TODO: add in line to reorder the level=0 columns here

    # # contact sites volume normalized
    # # norm_toA_list = []
    # # norm_toB_list = []
    # norm_to_list = {}
    # for col in contact_summary.index:
    #     for idx, cha in enumerate(string.ascii_uppercase[:len(max(contact_summary.index.get_level_values('object').str.split(splitter), key=len))]):
    #         if cha not in norm_to_list:
    #             norm_to_list[f"{cha}"] = []
    #         if ((idx+1) <= len(col[-1].split(splitter))):
    #             norm_to_list[f"{cha}"].append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[idx],)][('volume', 'sum')])
    #         else:
    #             norm_to_list[f"{cha}"].append(None)
    # for cha in string.ascii_uppercase[:len(max(contact_summary.index.get_level_values('object').str.split(splitter), key=len))]:
    #     contact_summary[('volume', f'norm_to_{cha}')] = norm_to_list[f"{cha}"]
    #     # norm_toA_list.append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[0],)][('volume', 'sum')])
    #     # norm_toB_list.append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[1],)][('volume', 'sum')])
    # # contact_summary[('volume', 'norm_to_A')] = norm_toA_list
    # # contact_summary[('volume', 'norm_to_B')] = norm_toB_list

    # # number and area of individuals organelle involved in contact
    # cont_cnt = org_df[group_by]
    # cont_cnt[[col.split('_')[0] for col in org_df.columns if col.endswith(("_count"))]] = org_df[[col for col in org_df.columns if col.endswith(("_count"))]].astype(bool)
    # cont_cnt_perorg = cont_cnt.groupby(group_by).agg('sum')
    # cont_cnt_perorg.columns = pd.MultiIndex.from_product([cont_cnt_perorg.columns, ['count_in']])
    # for col in cont_cnt_perorg.columns:
    #     cont_cnt_perorg[(col[0], 'num_fraction_in')] = cont_cnt_perorg[col].values/org_summary[('volume', 'count')].values
    # cont_cnt_perorg.sort_index(axis=1, inplace=True)
    # org_summary = pd.merge(org_summary, cont_cnt_perorg, on=group_by, how='outer')


    # ###################
    # # flatten datasheets and combine
    # # TODO: restructure this so that all of the datasheets and unstacked and then reorded based on shared level 0 columns before flattening
    # ###################
    # # org flattening
    # org_final = org_summary.unstack(-1)
    # for col in org_final.columns:
    #     if col[1] in ('count_in', 'num_fraction_in') or col[0].endswith(('_count', '_volume')):
    #         if col[2] not in col[0]:
    #             org_final.drop(col,axis=1, inplace=True)
    # ########################################################################
    # # MAKING new_col_order flexible to work with any organelle input values and combo number
    # #######################################################################
    # new_col_order = ['dataset', 'image_name', 'object', 'volume', 'surface_area', 'SA_to_volume_ratio', 
    #                  'equivalent_diameter', 'extent', 'euler_number', 'solidity', 'axis_major_length'] 
    # all_combos = []
    # for n in list(map(lambda x:x+2, (range(len(all_orgs)-1)))):
    #         for o in itertools.combinations(all_orgs, n):
    #             all_combos.append(check_for_existing_combo(o, ctc, splitter))
    # combos = [splitter.join(cont) for cont in all_combos]
    # for combo in combos:
    #     new_col_order += [f"{combo}", f"{combo}_count", f"{combo}_volume"]
    # new_cols = org_final.columns.reindex(new_col_order, level=0)
    # org_final = org_final.reindex(columns=new_cols[0])
    # org_final.columns = ["_".join((col_name[-1], col_name[1], col_name[0])) for col_name in org_final.columns.to_flat_index()]

    # #renaming, filling "NaN" with 0 when needed, and removing ER_std columns
    # for col in org_final.columns:
    #     if '_count_in_' or '_fraction_in_' in col:
    #         org_final[col] = org_final[col].fillna(0)
    #     if col.endswith(("_count_volume","_sum_volume", "_mean_volume", "_median_volume")):
    #         org_final[col] = org_final[col].fillna(0)
    #     if col.endswith("_count_volume"):
    #         org_final.rename(columns={col:col.split("_")[0]+"_count"}, inplace=True)
    #     if col.startswith("ER_std_"):
    #         org_final.drop(columns=[col], inplace=True)
    # org_final = org_final.reset_index()

    # # contacts flattened
    # contact_final = contact_summary.unstack(-1)
    # contact_final.columns = ["_".join((col_name[-1], col_name[1], col_name[0])) for col_name in contact_final.columns.to_flat_index()]

    # #renaming and filling "NaN" with 0 when needed
    # for col in contact_final.columns:
    #     if col.endswith(("_count_volume","_sum_volume", "_mean_volume", "_median_volume")):
    #         contact_final[col] = contact_final[col].fillna(0)
    #     if col.endswith("_count_volume"):
    #         contact_final.rename(columns={col:col.split("_")[0]+"_count"}, inplace=True)
    # contact_final = contact_final.reset_index()

    # # distributions flattened
    # dist_final = dist_summary.unstack(-1)
    # dist_final.columns = ["_".join((col_name[1], col_name[0])) for col_name in dist_final.columns.to_flat_index()]
    # dist_final = dist_final.reset_index()

    # # regions flattened & normalization added
    # regions_final = regions_summary.unstack(-1)
    # regions_final.columns = ["_".join((col_name[1], col_name[0])) for col_name in regions_final.columns.to_flat_index()]
    # regions_final['nuc_area_fraction'] = regions_final['nuc_volume'] / regions_final['cell_volume']
    # regions_final = regions_final.reset_index()

    # # combining them all
    # combined = pd.merge(org_final, contact_final, on=["dataset", "image_name"], how="outer")
    # combined = pd.merge(combined, dist_final, on=["dataset", "image_name"], how="outer")
    # combined = pd.merge(combined, regions_final, on=["dataset", "image_name"], how="outer").set_index(["dataset", "image_name"])
    # combined.columns = [col.replace('sum', 'total') for col in combined.columns]

    # ###################
    # # export summary sheets
    # ###################
    # org_summary.to_csv(out_path + f"/{out_preffix}per_org_summarystats.csv")
    # contact_summary.to_csv(out_path + f"/{out_preffix}per_contact_summarystats.csv")
    # dist_summary.to_csv(out_path + f"/{out_preffix}distribution_summarystats.csv")
    # regions_summary.to_csv(out_path + f"/{out_preffix}per_region_summarystats.csv")
    # combined.to_csv(out_path + f"/{out_preffix}summarystats_combined.csv")

    # print(f"Processing of {fl_count} files from {ds_count} dataset(s) is complete.")
    # return f"{fl_count} files from {ds_count} dataset(s) were processed"

In [None]:
xx, yy = _batch_summary_stats_debug([str(out_data_path)],
                     str(out_data_path),
                     f"{datetime.today().strftime('%Y%m%d')}_skel_test",
                     splitter="_")

In [None]:
pd.concat(unstacked_cont, axis=0).sort_index(axis=0)

In [None]:
pd.concat([xx[3], xx[4]], ignore_index=True)

In [None]:
xx[1]

In [None]:
xx[0]

In [None]:
xx[2]

In [None]:
pd.merge(xx[0])

## **OUTPUT**

### **Branch Output**

#### **Column Reference for Branch Table**

- `skel-obj-id` : the numeric ID assigned to the skeleton object to which the branch belongs, this ID is also the same as the label of organelle object the skeleton object represents
- `node-id-src` : the numeric point ID assigned to the source node of the branch
- `node-id-dst` : the numeric point ID assigned to the destination node of the branch
- `deg-src` : the source node's degree of connectivity
- `deg-dst`: the destination node's degree of connectivity
- `branch-distance*` : the length of the branch
- `branch-type` : the type of branch, dependent on node behavior:
###### 0 - endpoint to endpoint | 1 - junction to endpoint | 2 - junction to junction | 3 - cycle
- `image-coord-src-0` : the z-coordinate of the source node
- `image-coord-src-1` : the y-coordinate of the source node
- `image-coord-src-2` : the x-coordinate of the source node
- `image-coord-dst-0` : the z-coordinate of the destination
- `image-coord-dst-1` : the y-coordinate of the destination
- `image-coord-dst-2` : the x-coordinate of the destination
- `coord-src-0*` : the z-coordinate of the source node
- `coord-src-1*` : the y-coordinate of the source node
- `coord-src-2*` : the x-coordinate of the source node
- `coord-dst-0*` : the z-coordinate of the destination node
- `coord-dst-1*` : the y-coordinate of the destination node
- `coord-dst-2*` : the x-coordinate of the destination node
- `euclidean-distance*` : The length of the line segment between the source and destination node
- `str-prop` : A proportion measuring the straightness of the branch, it is calculated by (euclidean-distance / branch-distance)

###### The branch table is the only table where the branch-id column is redundant as the row index value is equivalent to the branch-id

###### * Measurements affected by scale, thus values are in real world units (microns) terms  if a scale is not provided, these measurements will be in voxel terms

In [None]:
_branch_table.head(20)

In [None]:
_branch_table.tail(20)

### **Node Output**

#### **Column Reference for Node Table**

- `node-id` : the numeric point ID assigned to the node
- `node-type` : the node type dependent on the degree of connectivity
- `connectivity`: the number of points the node contacts in voxel space (degree of connectivity)
- `image-coord-0` : the z-coordinate of the node
- `image-coord-1` : the y-coordinate of the node
- `image-coord-2` : the x-coordinate of the node
- `coord-0*` : the z-coordinate of the node
- `coord-1*` : the y-coordinate of the node
- `coord-2*` : the x-coordinate of the node
- `branch-id(s)` : a list consisting of the numeric ID(s) of the branch or branches the that node is a part of
- `skel-obj-id` : the numeric ID assigned to the skeleton object of which the node belongs to, this ID is also the same as the label of organelle object the skeleton object represents

###### * Measurements affected by scale, thus values are in real world units (microns) terms  if a scale is not provided, these measurements will be in voxel terms

In [None]:
_node_table.head(20)

In [None]:
_node_table.tail(20)

### **Skeleton Object Output**

#### **Column Reference for Skeleton Object Table**

- `skel-obj-id` : the numeric ID assigned to the skeleton object, this ID is also the same as the label of organelle object this skeleton object represents
- `skel-type` : the classification of the skeleton object based on its branch behavior
- `skel-type-num` : the numeric classification of the skeleton object based on its branch behavior
- `brh-count` : the number of branches the skeleton object contains
- `branch-id(s)` : a list consisting of the numeric ID(s) of the branch or branches that the skeleton object contains
- `min-brh-length*` : the minimum branch length of the branches within the skeleton
- `max-brh-length*` : the maximum branch length of the branches within the skeleton
- `ave-brh-length*` : the mean of the branch lengths within the skeleton
- `sd-brh-length*` : the standard deviation of the branch lengths within the skeleton
- `med-brh-length*` : the median of the branch lengths within the skeleton
- `total-length*` : the sum of the branch lengths within the skeleton object
- `brh-type-0-tot` : the number of type 0 (endpoint to endpoint) branches the skeleton object contains (maximum is 1)
- `brh-type-0-ids` : the numeric ID of type 0 branch the skeleton object contains (if it contains one)
- `brh-type-1-tot` : the number of type 1 (junction to endpoint) branches the skeleton object contains
- `brh-type-1-ids` : the numeric ID(s) of the type 1 branch or branches the skeleton object contains
- `brh-type-2-tot` : the number of type 2 (junction to junction) branches the skeleton object contains
- `brh-type-2-ids` : the numeric ID(s) of the type 2 branch or branches the skeleton object contains
- `brh-type-3-tot` : the number of type 3 (cycle) branches the skeleton object contains
- `brh-type-3-ids` : the numeric ID(s) of the type 3 branch or branches the skeleton object contains
- `node-count` : the number of nodes within the skeleton object
- `ep-count` : the number of endpoints within the skeleton object
- `jn-count` : the number of junction nodes within the skeleton object
- `ave-jn-deg` : the average degree of the junction nodes in the skeleton object
- `max-deg` : the maximum degree of connectivity of the nodes in the skeleton object
- `node-id(s)` : the numeric point ID(s) of the nodes within the skeleton object
- `mean-brh-str` : the mean branch straightness proportion of the branches in the skeleton object

###### * Measurements affected by scale, thus values are in real world units (microns) terms  if a scale is not provided, these measurements will be in voxel terms

In [None]:
_skel_table.head(20)

In [None]:
_skel_table.tail(20)

### **Skeleton Summary Output**

#### **Column Reference for Skeleton Object Table**

- `total-length*` : combined length of all the branches in the organelle skeleton
- `point-count` : the amount of voxels throughout the entirety of the organelle skeleton

-- Skeleton Object section --

- `skel-obj-count` : the amount of skeleton objects in the organelle skeleton
- `punc-count` : the amount of punctates in the organelle skeleton
- `rod-count` : the amount of rods in the organelle skeleton
- `net-count` : the amount of networks in the organelle skeleton
- `prop-obj-punc` : the proportion of skeleton objects that are punctates
- `prop-obj-rod` : the proportion of skeleton objects that are rods
- `prop-obj-net` : the proportion of skeleton objects that are networks
- `punc-tot-len*` : the total length of the punctates (only the non-absolute punctates provide length)
- `rod-tot-len*` : the total length of the rods
- `net-tot-len*` : the total length of the networks
- `prop-len-punc` : the proportion of the organelle skeleton's length that are from punctates (only the non-absolute punctates provide length)
- `prop-len-rod` : the proportion of the organelle skeleton's length that are from rods
- `prop-len-net` : the proportion of the organelle skeleton's length that are from networks
- `ave-len-obj` : the average total length of the skeleton objects
- `min-len-obj` : the minimum total length of the skeleton objects
- `max-len-obj` : the maximum total length of the skeleton objects
- `ave-brh-obj` : the average amount of branches per skeleton object
- `min-brh-obj` : the minimum amount of branches per skeleton object
- `max-brh-obj` : the maximum amount of branches per skeleton object

-- Branch section --

- `brh-count` : the total amount of branches in the organelle skeleton
- `min-brh-len*` : length of the shortest branch in the organelle skeleton
- `max-brh-len*` : length of the longest branch in the organelle skeleton
- `ave-brh-len*` : the mean branch length in the organelle skeleton
- `type-0-brhs` : the amount of type 0 (endpoint to endpoint) branches
- `type-1-brhs` : the amount of type 1 (junction to endpoint) branches
- `type-2-brhs` : the amount of type 2 (junction to junction) branches
- `type-3-brhs` : the amount of type 3 (cycle) branches
- `prop-brh-t0` : the proportion of branches in the organelle skeleton that are type 0 branches
- `prop-brh-t1` : the proportion of branches in the organelle skeleton that are type 1 branches
- `prop-brh-t2` : the proportion of branches in the organelle skeleton that are type 2 branches
- `prop-brh-t3` : the proportion of branches in the organelle skeleton that are type 3 branches
- `t0-brh-len*` : combined length of all type 0 branches
- `t1-brh-len*` : combined length of all type 1 branches
- `t2-brh-len*` : combined length of all type 2 branches
- `t3-brh-len*` : combined length of all type 3 branches
- `prop-len-t0` : the proportion of the organelle skeleton's length that is from type 0 branches
- `prop-len-t1` : the proportion of the organelle skeleton's length that is from type 1 branches
- `prop-len-t2` : the proportion of the organelle skeleton's length that is from type 2 branches
- `prop-len-t3` : the proportion of the organelle skeleton's length that is from type 3 branches

-- Node section --

- `node-count` : the amount of nodes in the organelle skeleton
- `ave-deg-nodes` : the average degree of connectivity for the nodes in the organelle skeleton
- `ep-count` : the amount of endpoint nodes in the organelle skeleton
- `jn-count` : the amount of junction nodes in the organelle skeleton
- `ap-count` : the amount of absolute punctates in the organelle skeleton
- `prop-ep` : the proportion of the nodes that are endpoints
- `prop-jn` : the proportion of the nodes that are junction nodes
- `prop-ap` : the proportion of the nodes that are absolute punctates

###### * Measurements affected by scale, thus values are in real world units (microns) terms  if a scale is not provided, these measurements will be in voxel terms

In [None]:
_skel_sum_table

### **`_get_org_morphology_3D`** Output

#### **Column Reference for `_get_org_morphology_3D` output**

-- Default morphology section --

- `object`: the shorted name of the organelle being observed
- `label`: the numeric ID assigned to the organelle object
- `scale*`: the real world dimensions of each voxel in the microscopy image (ZYX in which each dimension is measured in microns)
- `centroid-0*`: the Z coordinate of the centroid (center of mass) of the organelle object
- `centroid-1*`: the Y coordinate of the centroid (center of mass) of the organelle object
- `centroid-2*`: the X coordinate of the centroid (center of mass) of the organelle object
- `bbox-0`: the minimum Z coordinate value of the cuboid that bounds the organelle object, lowest Z of the bounding box
- `bbox-1`: the minimum Y coordinate value of the cuboid that bounds the organelle object, lowest Y of the bounding box
- `bbox-2`: the minimum X coordinate value of the cuboid that bounds the organelle object, lowest X of the bounding box
- `bbox-3`: the maximum Z coordinate value of the cuboid that bounds the organelle object, highest Z of the bounding box
- `bbox-4`: the maximum Y coordinate value of the cuboid that bounds the organelle object, highest Y of the bounding box
- `bbox-5`: the maximum X coordinate value of the cuboid that bounds the organelle object, highest X of the bounding box
- `surface_area*`: the estimated area of the organelle object's surface (after triangulation and interpolation using the marching cubes method)
- `volume*`: the amount of space covered by the organelle object
- `SA_to_volume_ratio*`: the surface area divided by the volume of the organelle object, given as a ratio
- `equivalent_diameter*`: the diameter of a perfect sphere with the same volume as the organelle object
- `extent`: the proportion of the bounding cuboid/box filled by the organelle object
- `euler_number`: the value is derived from the formula ***χ = V-E+F-C*** where **V** is the number of **verticies**, **E** is the number of **edges**, **F** is the number of **faces**  and **C** is the number of **disconnected components**
- `solidity`: the proportion of the convex hull filled in by the organelle object; the convex hull being the smallest convex polygon that contains all of the voxels in the organelle object
- `axis_major_length*`: The length of the major axis of the ellipsoid that shares the same second central moments as the organelle object
- `min_intensity`: the minimum intensity/signal within the region of the organelle object
- `max_intensity`: the maximum intensity/signal within the region of the organelle object
- `mean_intensity` : the mean intensity/signal within the region of the organelle object
- `standard_deviation_intensity`: the standard deviation of the intensity/signal values within the region of the organelle object

-- Skeleton section --

- `skel-type` - the classification of the skeleton object based on its branch behavior
- `skel-brh-count` - the amount of branches the skeleton object contains
- `skel-min-brh-length*` - the minimum branch length of the branches within the skeleton
- `skel-max-brh-length*` - the maximum branch length of the branches within the skeleton
- `skel-ave-brh-length*` - the mean of the branch lengths within the skeleton
- `skel-sd-brh-length*` - the standard deviation of the branch lengths within the skeleton
- `skel-med-brh-length*` - the median of the branch lengths within the skeleton
- `skel-total-length*` - the sum of the branch lengths within the skeleton object
- `skel-node-count` - the amount of nodes within the skeleton object
- `skel-ep-count` - the amount of endpoints within the skeleton object
- `skel-jn-count` - the amount of junction nodes within the skeleton object
- `skel-ave-jn-deg` - the average degree of the junction nodes in the skeleton object
- `skel-max-deg` - the maximum degree of connectivity of the nodes in the skeleton object
- `skel-mean-brh-str` - the mean branch straightness proportion of the branches in the skeleton object

In [None]:
get_morph_table

## **CONCLUSION**

###### Although this is all only the beginning in terms of the effectiveness of skeletonization analysis for three-dimensional anisotrophic data, there is a lot of potential. With more testing, corrections as well as new ideas will emerge to further improve the project. More specifically, there could be an additional metric depicting the complexity of a network. Regardless the future looks bright for skeletonization.