# SCohenLab 2D Image Processing Workflows notebook (Simplified MCZ)

--------------
# WORKFLOWS OVERVIEW
To manage efficient batch processing and UI interaction via napari, we want to leverage the `workflow` tools of `aics-segmentation` (see also [`napari-allencell-segmenter`](https://github.com/AllenCell/napari-allencell-segmenter)). This notebook tests how to use these tools to build workflows that can be easily adjusted via a napari plugin.


## Workflow Levels

> BATCH - an experiment's worth of multi-channel images
> > IMAGE - a single multi-channel image
> > > STRUCTURE CHANNEL - a single (or aggregate) channel 
> > > > STEPS / CATEGORIES - the steps to perform
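
The four levels nest inside one another. As a minimal sketch (the class names `Batch`, `Image`, and `StructureChannel` are hypothetical and not part of `aics-segmentation` or `infer_subc_2d`), they can be modeled as plain dataclasses:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructureChannel:
    """A single (or aggregate) channel plus the names of the steps to perform on it."""

    name: str
    steps: List[str] = field(default_factory=list)


@dataclass
class Image:
    """A single multi-channel image and its structure-channel workflows."""

    path: str
    structures: List[StructureChannel] = field(default_factory=list)


@dataclass
class Batch:
    """An experiment's worth of multi-channel images."""

    images: List[Image] = field(default_factory=list)

    def n_workflows(self) -> int:
        # total number of structure-channel workflows across every image in the batch
        return sum(len(img.structures) for img in self.images)


batch = Batch(
    [
        Image("img_a.czi", [StructureChannel("nuclei"), StructureChannel("mito")]),
        Image("img_b.czi", [StructureChannel("nuclei")]),
    ]
)
print(batch.n_workflows())  # 3
```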

## "Other" organizing principles

Categories
- Pre
- Core
- Post 
- Post-post
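
One way to enforce the category order is to tag each step with its category and sort before running. This is a sketch under assumptions: `Category`, `Step`, and `run_steps` are hypothetical names, and `aics-segmentation` has its own `WorkflowStepCategory` that this does not reproduce.

```python
from enum import IntEnum
from typing import Callable, List, Tuple

import numpy as np


class Category(IntEnum):
    """Hypothetical ordering of the step categories listed above."""

    PRE = 0
    CORE = 1
    POST = 2
    POST_POST = 3


Step = Tuple[Category, Callable[[np.ndarray], np.ndarray]]


def run_steps(img: np.ndarray, steps: List[Step]) -> np.ndarray:
    """Apply steps in category order: PRE -> CORE -> POST -> POST_POST.

    sorted() is stable, so steps that share a category keep their listed order.
    """
    out = img
    for _, func in sorted(steps, key=lambda step: step[0]):
        out = func(out)
    return out


steps = [
    (Category.POST, lambda a: a.sum()),  # count foreground pixels
    (Category.PRE, lambda a: a / a.max()),  # normalize to [0, 1]
    (Category.CORE, lambda a: a > 0.5),  # threshold
]
print(run_steps(np.array([1.0, 2.0, 4.0]), steps))  # 1
```

Because the steps are sorted first, they can be declared in any order (e.g. grouped by organelle in a config file) and still run in the intended sequence.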

IO
- import: reads file and returns array image
- choose:  i.e. a single Z-slice or ROI -> returns slices, coordinates, indices
- extract: i.e. select structure channel or create aggregate signal -> returns array image
- export: creates file
- quantify:  returns table of stats
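
A sketch of the _choose_ and _extract_ verbs for the (CH,Z,X,Y) layout used below, assuming hypothetical helpers `choose_z` and `extract_aggregate`. _Choose_ returns an index rather than pixel data, so the same selection can be reused on other channels or masks:

```python
from typing import Sequence, Tuple

import numpy as np


def choose_z(z: int) -> Tuple[slice, slice, slice, slice]:
    """CHOOSE: return an index (not data) that selects one Z-slice of a (CH, Z, X, Y) array.

    slice(z, z + 1) keeps the Z axis, so indexing with it still gives a 4D array.
    """
    return (slice(None), slice(z, z + 1), slice(None), slice(None))


def extract_aggregate(img: np.ndarray, channels: Sequence[int]) -> np.ndarray:
    """EXTRACT: sum the requested channels into one aggregate channel, shape (1, Z, X, Y)."""
    return img[list(channels)].sum(axis=0, keepdims=True)


img = np.ones((7, 5, 4, 3))  # toy (CH, Z, X, Y) stack
print(img[choose_z(2)].shape)  # (7, 1, 4, 3)
print(extract_aggregate(img, range(1, 7)).shape)  # (1, 5, 4, 3)
```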





## 2D Workflow example 
- IMAGE
  - 𝟘  _import_ image from .czi file
    - input: file_path
    - output: 4D array, metadata
  - 𝟙  _extract_ optimal Z-slice from the Z-stack
    - input: 4D array (CH,Z,X,Y)
    - output: 4D array with "chosen" `z_opt` (7,1,nX,nY)
  - 𝟚  binarize STRUCTURE Channel workflows
     - 1️⃣ _SOMA_ workflow
       - input: 4D array (CH,Z,X,Y)  
       - output: SOMA mask (1,1,nX,nY)
     - 2️⃣ _NUCLEI_ workflow
       - input: 4D array (CH,Z,X,Y)  
       - output: NU object (1,1,nX,nY)
     - 3️⃣ _CYTOSOL_ workflow
       - input: SOMA_mask, NU_mask
       - output: CYTO mask (1,1,nX,nY)
     - 4️⃣ _LYSOSOMES_ workflow
        - input: 4D array (CH,Z,X,Y),CYTO_mask
        - output:  LY object
     - 5️⃣ _MITOCHONDRIA_ workflow
        - input: 4D array (CH,z_slice,X,Y),CYTO_mask
        - output:  MT object
     - 6️⃣ _GOLGI complex_ workflow
        - input: 4D array (CH,Z,X,Y),CYTO_mask
        - output:  GL object
     - 7️⃣ _PEROXISOMES_ workflow
        - input: 4D array (CH,Z,X,Y),CYTO_mask
        - output:  PX object
     - 8️⃣ _ENDOPLASMIC RETICULUM_ workflow
        - input: 4D array (CH,Z,X,Y),CYTO_mask
        - output:  ER object
     - 9️⃣ _LIPID BODY_ workflow
        - input: 4D array (CH,Z,X,Y),CYTO_mask
        - output:  LB object 
  - 𝟛  _export_ binarized organelle objects
  - 𝟜  _quantify_ binarized organelles
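
The CYTOSOL step only needs the two masks it takes as input. A minimal sketch, assuming the cytosol is simply the soma minus the nucleus (the `infer_subc_2d` implementation may add smoothing or erosion); `infer_cytosol` and `restrict_to_cytosol` are hypothetical helpers, the second showing how an organelle mask can be confined to `CYTO_mask`:

```python
import numpy as np


def infer_cytosol(soma_mask: np.ndarray, nuclei_mask: np.ndarray) -> np.ndarray:
    """CYTOSOL = soma pixels that are not nucleus pixels (a plain set difference)."""
    return np.logical_and(soma_mask, np.logical_not(nuclei_mask))


def restrict_to_cytosol(obj_mask: np.ndarray, cyto_mask: np.ndarray) -> np.ndarray:
    """Keep only the organelle pixels that fall inside the cytosol mask."""
    return np.logical_and(obj_mask, cyto_mask)


soma = np.zeros((1, 1, 5, 5), dtype=bool)
soma[..., 1:4, 1:4] = True  # 3x3 "cell"
nuclei = np.zeros_like(soma)
nuclei[..., 2, 2] = True  # 1-pixel "nucleus" in the middle
cyto = infer_cytosol(soma, nuclei)
print(cyto.sum())  # 8
```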





## CHOOSE Z workflow examples
𝟙  _CHOOSE_ optimal Z-slice
- inputs: 4D array (CH,Z,X,Y)
- EXTRACT:
  - create SIG "non-nuclei" signal by adding ch 1-6
  - create NU signal aggregate
  - return: SIG and NU aggregates
- PRE
  - NU: intensity normalization
  - NU: smoothing
- CORE
  - NU: segmentation
- POST
  - mask total signal to set segmented nuclei pixels to 0
- POST-POST
  - sum over X & Y
  - choose maximum Z
- output: Z_opt slice
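
The CHOOSE Z steps map onto a few NumPy/SciPy calls. This sketch assumes the nuclei are in channel 0 and uses a mean-value threshold as a stand-in for real nuclei segmentation; `choose_z_opt` and its defaults are hypothetical, not the notebook's settings.

```python
from typing import Sequence

import numpy as np
from scipy import ndimage as ndi


def choose_z_opt(
    img: np.ndarray, nuc_ch: int = 0, sig_chs: Sequence[int] = range(1, 7), sigma: float = 1.0
) -> int:
    """Return the Z index with the most non-nuclear signal in a (CH, Z, X, Y) array."""
    # EXTRACT: aggregate "non-nuclei" signal and the nuclei signal, each (Z, X, Y)
    sig = img[list(sig_chs)].sum(axis=0)
    nu = img[nuc_ch].astype(float)

    # PRE: min-max normalization, then Gaussian smoothing within each Z-slice (sigma 0 along Z)
    nu = (nu - nu.min()) / (np.ptp(nu) + 1e-12)
    nu = ndi.gaussian_filter(nu, sigma=(0, sigma, sigma))

    # CORE: crude global threshold standing in for a real nuclei segmentation
    nu_mask = nu > nu.mean()

    # POST: set the signal at segmented nuclei pixels to 0
    sig = np.where(nu_mask, 0, sig)

    # POST-POST: sum over X and Y, then pick the Z with the largest total
    return int(np.argmax(sig.sum(axis=(1, 2))))


img = np.zeros((7, 4, 8, 8))
img[1:7, 2] = 1.0  # moderate signal across all of z=2
img[1:7, 1, :4, :] = 5.0  # brighter signal at z=1, but only where...
img[0, 1, :4, :] = 1.0  # ...the nuclei channel says there is a nucleus
print(choose_z_opt(img))  # 2, because the z=1 signal is masked out
```

Without the POST masking step, z=1 would win because of the bright nuclear region; masking first is what makes the choice reflect cytoplasmic signal.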



## Structure Channel workflow examples
1️⃣ _SOMA_ workflow
 - inputs: 4D array (CH,Z,X,Y)
 - EXTRACT:
     - create "soma" CH aggregate
     - return: soma(1,1,nX,nY),
  - PRE
    - intensity normalization
    - smoothing
  - CORE
    - segmentation
  - POST
    - filter binarized objects
  - output:  MT object
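
A sketch of the SOMA steps using NumPy and `scipy.ndimage`, assuming a single-Z input (after CHOOSE Z), a global Otsu threshold for the CORE step, and "keep the largest object, fill its holes" as the POST filter. `infer_soma` and `otsu_threshold` are hypothetical helpers; the notebook's own functions may use different methods.

```python
from typing import Sequence

import numpy as np
from scipy import ndimage as ndi


def otsu_threshold(x: np.ndarray, nbins: int = 256) -> float:
    """Otsu's threshold: the histogram split that maximizes between-class variance."""
    hist, edges = np.histogram(x.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)  # pixel count at or below each bin
    w1 = w0[-1] - w0  # pixel count above each bin
    m0 = np.cumsum(hist * centers)  # cumulative intensity mass
    mu0 = m0 / np.maximum(w0, 1)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)
    return float(centers[np.argmax(w0 * w1 * (mu0 - mu1) ** 2)])


def infer_soma(img: np.ndarray, soma_chs: Sequence[int], sigma: float = 1.0) -> np.ndarray:
    """SOMA mask (1, 1, X, Y) from a single-Z (CH, 1, X, Y) image."""
    # EXTRACT: aggregate the chosen channels into one (X, Y) "soma" signal
    raw = img[list(soma_chs), 0].astype(float).sum(axis=0)
    # PRE: min-max normalization and Gaussian smoothing
    norm = (raw - raw.min()) / (np.ptp(raw) + 1e-12)
    smooth = ndi.gaussian_filter(norm, sigma)
    # CORE: global Otsu threshold
    mask = smooth > otsu_threshold(smooth)
    # POST: keep only the largest connected object, then fill its holes
    labels, n = ndi.label(mask)
    if n == 0:
        return mask[np.newaxis, np.newaxis]
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0  # ignore background
    soma = ndi.binary_fill_holes(labels == np.argmax(sizes))
    return soma[np.newaxis, np.newaxis]


img = np.zeros((7, 1, 32, 32))
img[1:4, 0, 8:24, 8:24] = 1.0  # a 16x16 "cell" in channels 1-3
img[1:4, 0, 2, 2] = 1.0  # a 1-pixel speck far from the cell
soma = infer_soma(img, soma_chs=range(1, 4))
print(soma.shape, bool(soma[0, 0, 16, 16]), bool(soma[0, 0, 2, 2]))  # (1, 1, 32, 32) True False
```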


5️⃣ _MITOCHONDRIA_ workflow
- inputs: 4D array (CH,Z,X,Y), CYTO_mask
- EXTRACT:
  - choose mitochondria CH
  - return: (1,1,nX,nY)
- PRE
  - intensity normalization
  - smoothing
- CORE
  - segmentation
- POST
  - filter binarized objects
- output: MT object
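
The MITOCHONDRIA workflow differs from SOMA mainly in that it works on one channel and is confined to `CYTO_mask`. In this sketch the PRE normalization clips to [mean - k_low·std, mean + k_high·std] before rescaling, the CORE step is a fixed cutoff (a placeholder for a filament-style filter such as the `filament_2d_wrapper` imported in the IMPORTS section), and the POST step drops small objects. `infer_mito`, the channel index, and all parameter values are assumptions.

```python
import numpy as np
from scipy import ndimage as ndi


def infer_mito(
    img: np.ndarray,
    cyto_mask: np.ndarray,
    mito_ch: int = 2,
    k_low: float = 1.5,
    k_high: float = 8.0,
    sigma: float = 1.0,
    cutoff: float = 0.2,
    min_size: int = 3,
) -> np.ndarray:
    """MT mask (1, 1, X, Y) from a single-Z (CH, 1, X, Y) image and a (1, 1, X, Y) CYTO mask."""
    # EXTRACT: the mitochondria channel as an (X, Y) float image
    raw = img[mito_ch, 0].astype(float)
    # PRE: clip to mean +/- k*std (bounded by the data range), rescale to [0, 1], smooth
    m, s = raw.mean(), raw.std()
    lo, hi = max(m - k_low * s, raw.min()), min(m + k_high * s, raw.max())
    norm = (np.clip(raw, lo, hi) - lo) / (hi - lo + 1e-12)
    smooth = ndi.gaussian_filter(norm, sigma)
    # CORE: fixed cutoff, restricted to pixels inside the cytosol
    mask = (smooth > cutoff) & cyto_mask[0, 0].astype(bool)
    # POST: drop connected components smaller than min_size pixels
    labels, _ = ndi.label(mask)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # label 0 is background
    return keep[labels][np.newaxis, np.newaxis]


img = np.zeros((7, 1, 32, 32))
img[2, 0, 10:13, 10:16] = 10.0  # tubule inside the cytosol
img[2, 0, 25:28, 10:16] = 10.0  # tubule outside the cytosol
cyto = np.zeros((1, 1, 32, 32), dtype=bool)
cyto[..., :20, :] = True
mt = infer_mito(img, cyto)
print(bool(mt[0, 0, 11, 12]), int(mt[0, 0, 20:].sum()))  # True 0
```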


## ❸. IMAGE PROCESSING ⚙️🩻🔬
### INFERENCE OF SUB-CELLULAR OBJECTS
The imported images have already been pre-processed to transform the 32-channel spectral measurements into "linearly unmixed" images, which estimate independently labeled sub-cellular components. These 7 channels (plus a residual "non-linear" signal) will be used to infer the shapes and extents of these sub-cellular components.
We will perform computational image analysis on the images (in 2D and 3D) to _segment_ the components of interest for measurement. In other procedures we can use these labels as "ground truth" to train machine learning models that automatically infer these objects.
The imported multi-channel image is processed pseudo-independently to achieve each of the 9 objectives stated above, i.e. inferring: NUCLEI, SOMA, CYTOSOL, LYSOSOME, MITOCHONDRIA, GOLGI COMPLEX, PEROXISOMES, ENDOPLASMIC RETICULUM, and LIPID BODIES.

### General flow for inferring objects via segmentation
- Pre-processing 🌒
- Core-processing (thresholding) 🌕
- Post-processing  🌘

### QC 🚧 WIP 🚧 




## ❹. QUANTIFICATION 📏📐🧮

SUBCELLULAR COMPONENT METRICS
-  extent 
-  size
-  shape
-  position
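
A minimal sketch of per-object metrics on a 2D mask with `scipy.ndimage`. The mapping is an assumption: size = pixel count, position = centroid, extent = bounding-box height and width, and shape = fill ratio (area / bounding-box area), which is only one of many possible shape descriptors. `quantify_objects` is a hypothetical helper that returns one dict per object, which maps directly onto the rows of a stats table:

```python
from typing import Dict, List

import numpy as np
from scipy import ndimage as ndi


def quantify_objects(mask: np.ndarray) -> List[Dict]:
    """One row of stats per connected object in a 2D binary mask."""
    labels, n = ndi.label(mask)
    if n == 0:
        return []
    idx = np.arange(1, n + 1)
    sizes = np.bincount(labels.ravel())[1:]  # pixel count per label
    centroids = ndi.center_of_mass(mask.astype(float), labels, idx)
    rows = []
    for i, (bbox, size, (cy, cx)) in enumerate(zip(ndi.find_objects(labels), sizes, centroids), start=1):
        h = bbox[0].stop - bbox[0].start  # extent: bounding-box height
        w = bbox[1].stop - bbox[1].start  # extent: bounding-box width
        rows.append(
            {
                "label": i,
                "size": int(size),  # area in pixels
                "position": (float(cy), float(cx)),  # centroid (row, col)
                "extent_hw": (h, w),
                "fill_ratio": int(size) / (h * w),  # a simple shape descriptor
            }
        )
    return rows


mask = np.zeros((10, 10), dtype=bool)
mask[1:3, 1:4] = True  # 2x3 block
mask[6:9, 6] = True  # 3x1 line
for row in quantify_objects(mask):
    print(row)
```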



### NOTE: PIPELINE TOOL AND DESIGN CHOICES?
We want to leverage the Allen Cell & Structure Segmenter. It has been wrapped as a [napari plugin](https://www.napari-hub.org/plugins/napari-allencell-segmenter), but for the workflow we are proving out here we will call the `aicssegmentation` [package](https://github.com/AllenCell/aics-segmentation) directly.

#### The Allen Cell & Structure Segmenter 
The Allen Cell & Structure Segmenter is a Python-based open source toolkit developed at the Allen Institute for Cell Science for 3D segmentation of intracellular structures in fluorescence microscope images. This toolkit brings together classic image segmentation and iterative deep learning workflows first to generate initial high-quality 3D intracellular structure segmentations and then to easily curate these results to generate the ground truths for building robust and accurate deep learning models. The toolkit takes advantage of the high replicate 3D live cell image data collected at the Allen Institute for Cell Science of over 30 endogenous fluorescently tagged human induced pluripotent stem cell (hiPSC) lines. Each cell line represents a different intracellular structure with one or more distinct localization patterns within undifferentiated hiPS cells and hiPSC-derived cardiomyocytes.

More details about Segmenter can be found at https://allencell.org/segmenter
In order to leverage the A
# IMPORTS

Import all necessary packages.

We are using `napari` for visualization, and `scipy` `ndimage` and `skimage` for analyzing the image files. The underlying data format is `numpy` `ndarray`s, and we use tools from the Allen Institute for Cell Science.


In [13]:
# top level imports
from pathlib import Path
import os, sys
from collections import defaultdict

import numpy as np
import scipy

# TODO: prune the imports; this is the big set for almost all organelles
# # function for core algorithm
from scipy import ndimage as ndi
from scipy.ndimage import median_filter
import aicssegmentation
from aicssegmentation.core.seg_dot import dot_3d_wrapper, dot_slice_by_slice, dot_2d_slice_by_slice_wrapper, dot_3d
from aicssegmentation.core.pre_processing_utils import (
    intensity_normalization,
    image_smoothing_gaussian_3d,
    image_smoothing_gaussian_slice_by_slice,
    edge_preserving_smoothing_3d,
)
from aicssegmentation.core.utils import topology_preserving_thinning, size_filter
from aicssegmentation.core.MO_threshold import MO
from aicssegmentation.core.utils import hole_filling
from aicssegmentation.core.vessel import filament_2d_wrapper, vesselnessSliceBySlice
from aicssegmentation.core.output_utils import save_segmentation, generate_segmentation_contour

from skimage import filters
from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from skimage.morphology import remove_small_objects, binary_closing, ball, dilation, remove_small_holes  # function for post-processing (size filter)
from skimage.measure import label

# # package for io 
from aicsimageio import AICSImage

import napari

### import local python functions in ../infer_subc_2d
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))

from infer_subc_2d.utils.file_io import (
    read_czi_image,
    list_image_files,
    export_ome_tiff,
    etree_to_dict,
    save_parameters,
    load_parameters,
    export_ndarray,
)
from infer_subc_2d.utils.img import *
from infer_subc_2d.organelles import *

from infer_subc_2d.constants import (
    TEST_IMG_N,
    NUC_CH,
    LYSO_CH,
    MITO_CH,
    GOLGI_CH,
    PEROXI_CH,
    ER_CH,
    LIPID_CH,
    RESIDUAL_CH,
)

%load_ext autoreload
%autoreload 2

test_img_n = TEST_IMG_N

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload



------------------------
# LOAD RAW IMAGE DATA
Identify path to _raw_ image data and load our example image


In [14]:
# build the datapath
# all the imaging data goes here.
data_root_path = Path(os.path.expanduser("~")) / "Projects/Imaging/data"

# linearly unmixed ".czi" files are here
data_path = data_root_path / "raw"
im_type = ".czi"

# get the list of all files
img_file_list = list_image_files(data_path,im_type)
test_img_name = img_file_list[test_img_n]
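
`list_image_files` comes from `infer_subc_2d.utils.file_io` and its body is not shown here. A hypothetical equivalent, assuming it just collects the files that end with the given extension, might look like this. Sorting matters because `img_file_list[test_img_n]` should pick the same file on every run, and `Path.glob()` does not guarantee any order:

```python
from pathlib import Path
from typing import List, Union


def list_image_files_sketch(data_folder: Union[str, Path], file_type: str) -> List[Path]:
    """Hypothetical stand-in for list_image_files: sorted files in data_folder ending in file_type."""
    return sorted(p for p in Path(data_folder).glob(f"*{file_type}") if p.is_file())
```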


In [15]:
img_data,meta_dict = read_czi_image(test_img_name)

# get some top-level info about the RAW data
channel_names = meta_dict['name']
img = meta_dict['metadata']['aicsimage']
scale = meta_dict['scale']
channel_axis = meta_dict['channel_axis']




In [1]:
from infer_subc_2d.workflow import InferSubC2dWorkflowEngine

NameError: name 'SegmenterFunction' is not defined

In [55]:
from aicssegmentation.workflow import (
    SegmenterFunction,
    FunctionParameter,
    WidgetType,
    WorkflowStep,
    WorkflowStepCategory,
    Workflow,
    BatchWorkflow,
    WorkflowDefinition,
    PrebuiltWorkflowDefinition,
    WorkflowEngine,
)
# ConfigurationException is raised by InferSubC2dWorkflowConfig.get_all_functions below
from aicssegmentation.workflow.workflow_config import WorkflowConfig, ConfigurationException
#from .segmenter_function import SegmenterFunction, FunctionParameter, WidgetType  # noqa F401
#from .workflow_step import WorkflowStep, WorkflowStepCategory  # noqa F401
#from .workflow import Workflow  # noqa F401
#from .batch_workflow import BatchWorkflow  # noqa F401
#from .workflow_definition import WorkflowDefinition, PrebuiltWorkflowDefinition  # noqa F401
#from .workflow_engine import  WorkflowEngine  # noqa F401

import infer_subc_2d

Try to subclass `WorkflowConfig`, `Directories`, etc. to get workflow configurations from `infer_subc_2d` (its `organelles_config` directory).

`PrebuiltWorkflowDefinition` also references `Directories`, but we don't want to use it anyway.

In [58]:
import json
from typing import List  # used in the type hints below

class InferSubC2dDirectories:
    """
    Provides safe paths to common infer-subc-2D module directories
    """

    _module_base_dir = Path(infer_subc_2d.__file__).parent

    @classmethod
    def get_assets_dir(cls) -> Path:
        """
        Path to the assets directory
        """
        return cls._module_base_dir / "assets"

    @classmethod
    def get_structure_config_dir(cls) -> Path:
        """
        Path to the structure json config directory
        """
        return cls._module_base_dir / "organelles_config"



class InferSubC2dWorkflowConfig(WorkflowConfig):
    """
    infer-subc-2D Provides access to structure workflow configuration
    """

    def __init__(self):
        self._all_functions = None
        self._available_workflow_names = None

    def get_available_workflows(self) -> List[str]:
        """
        Get the list of all workflows available through configuration
        """
        if self._available_workflow_names is None:
            json_list = sorted(InferSubC2dDirectories.get_structure_config_dir().glob("conf_*.json"))
            self._available_workflow_names = [p.stem[5:] for p in json_list]

        return self._available_workflow_names

    def get_all_functions(self) -> List[SegmenterFunction]:
        """
        Get the list of all available Functions from configuration
        """
        if self._all_functions is None:
            path = InferSubC2dDirectories.get_structure_config_dir() / "all_functions.json"

            try:
                with open(path) as file:
                    obj = json.load(file)
                    self._all_functions = self._all_functions_decoder(obj)
            except Exception as ex:
                raise ConfigurationException(f"Error reading json configuration from {path}") from ex

        return self._all_functions

    def get_workflow_definition(self, workflow_name: str) -> PrebuiltWorkflowDefinition:
        """
        Get a WorkflowDefinition for the given workflow from the corresponding
        prebuilt json structure config
        """
        if workflow_name is None or len(workflow_name.strip()) == 0:
            raise ValueError("workflow_name cannot be empty")

        if workflow_name not in self.get_available_workflows():
            raise ValueError(f"No workflow configuration available for {workflow_name}")

        path = InferSubC2dDirectories.get_structure_config_dir() / f"conf_{workflow_name}.json"

        return self.get_workflow_definition_from_config_file(path, workflow_name, prebuilt=True)


class InferSubC2dWorkflowEngine(WorkflowEngine):
    """
    infer-subc-2D workflow engine
    Use this class to access and execute aicssegmentation structure workflows
    """

    def __init__(self, workflow_config: WorkflowConfig = None):
        self._workflow_config = workflow_config or InferSubC2dWorkflowConfig()
        self._workflow_definitions = self._load_workflow_definitions()
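
`get_available_workflows` derives each workflow name from its file name by dropping the `conf_` prefix, so `conf_actb.json` becomes the workflow `actb`, and `get_workflow_definition` rebuilds the path from that name. The same rule as a standalone helper (`workflow_names_from_configs` is hypothetical, written only to make the naming convention explicit):

```python
from pathlib import Path
from typing import Iterable, List


def workflow_names_from_configs(paths: Iterable[Path]) -> List[str]:
    """Apply the get_available_workflows naming rule: conf_<name>.json -> <name>."""
    prefix = "conf_"
    return sorted(
        p.stem[len(prefix):] for p in paths if p.suffix == ".json" and p.stem.startswith(prefix)
    )


print(workflow_names_from_configs([Path("conf_actb.json"), Path("all_functions.json")]))  # ['actb']
```

Files such as `all_functions.json` sit in the same directory but are skipped because they lack the `conf_` prefix.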


In [59]:
workflow_engine = InferSubC2dWorkflowEngine()

In [64]:
len(workflow_engine.workflow_definitions)

2

In [61]:
wf = workflow_engine.get_executable_workflow_from_config_file(
        "../infer_subc_2d/organelles_config/conf_actb.json",
        img_data[0])


In [46]:

wf.__class__

aicssegmentation.workflow.workflow.Workflow

In [37]:
out = wf.execute_next()

In [48]:
out.__class__

numpy.ndarray
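
`execute_next()` returned a single `numpy.ndarray`, and `wf._results` holds a list of arrays. A toy class that mimics that observed pattern in the simplest linear case, where each step consumes the previous step's output and the outputs accumulate in a list. `ToyWorkflow` is hypothetical and much simpler than `aicssegmentation.workflow.Workflow`, whose steps can have other parent relationships:

```python
from typing import Callable, List

import numpy as np


class ToyWorkflow:
    """Hypothetical, simplified stand-in for step-by-step execution (not aicssegmentation's Workflow)."""

    def __init__(self, starting_image: np.ndarray, steps: List[Callable[[np.ndarray], np.ndarray]]):
        self.starting_image = starting_image
        self.steps = steps
        self.results: List[np.ndarray] = []  # one output per executed step

    def is_done(self) -> bool:
        return len(self.results) == len(self.steps)

    def execute_next(self) -> np.ndarray:
        # each step consumes the previous step's output (the starting image for step 0)
        if self.is_done():
            raise RuntimeError("all steps have already been executed")
        current = self.results[-1] if self.results else self.starting_image
        out = self.steps[len(self.results)](current)
        self.results.append(out)
        return out


wf_toy = ToyWorkflow(np.array([1.0, 2.0, 3.0]), [lambda a: a * 2, lambda a: a > 3])
print(wf_toy.execute_next())  # [2. 4. 6.]
print(wf_toy.execute_next())  # [False  True  True]
print(wf_toy.is_done())  # True
```

Keeping every intermediate result is what lets a UI such as the napari plugin show the output of each step as it runs.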

In [42]:
all_functions = workflow_engine._workflow_config.get_all_functions()[0]

In [45]:
wf_definition = wf.workflow_definition

In [33]:
wf._results

[array([[[5.90889040e-02, 1.61406745e-02, 2.00505273e-13, ...,
          8.13650399e-02, 5.56803144e-02, 6.12944620e-02],
         [1.00372940e-01, 2.68677066e-03, 7.44676585e-02, ...,
          6.93948751e-02, 8.73801981e-02, 9.15507078e-02],
         [5.11288447e-02, 2.50631592e-02, 6.50639612e-02, ...,
          1.00392990e-01, 8.51144885e-02, 9.76661186e-02],
         ...,
         [2.00505273e-13, 2.00505273e-13, 7.87985724e-03, ...,
          2.09528011e-02, 1.13686490e-02, 1.68223924e-02],
         [2.97349320e-02, 2.69278582e-02, 1.52785018e-02, ...,
          7.13798773e-03, 1.89878494e-02, 1.25516301e-02],
         [3.44468060e-02, 2.07322453e-02, 9.78465734e-03, ...,
          2.42410875e-02, 4.15045916e-03, 3.02963468e-02]],
 
        [[2.31784096e-02, 2.00505273e-13, 2.00505273e-13, ...,
          3.50884228e-01, 2.25307776e-01, 2.32686370e-01],
         [5.66627902e-02, 2.00505273e-13, 2.00505273e-13, ...,
          1.76945904e-01, 2.90592293e-01, 3.06311906e-01],
       