# SCohenLab 2D Image Processing notebook (Simplified MCZ)

This notebook illustrates how to read `.czi` images and process them, specifically how to select an "optimal" Z-slice for further processing.


--------------
# PIPELINE OVERVIEW
## 1. GOAL SETTING ✍

### GOAL: Infer sub-cellular components in order to understand the interactome

To measure the shape, position, size, and interaction of eight organelles/cellular components (Nuclei (NU), Lysosomes (LS), Mitochondria (MT), Golgi (GL), Peroxisomes (PO), Endoplasmic Reticulum (ER), Lipid Droplet (LD), and SOMA) during differentiation of iPSCs, in order to understand the interactome / spatiotemporal coordination.

### summary of _OBJECTIVES_ ✅
- robust inference of subcellular objects:
  -  #### 1️⃣. [Infer ***soma***](./01_infer_soma.ipynb) (🚨🚨🚨🚨 Steps 2-9 depend on establishing a good solution here.)
  -  #### 2️⃣. [infer ***nuclei*** ](./02_infer_nuclei.ipynb) 
  -  #### 3️⃣. [Infer ***cytosol***](./03_infer_cytosol.ipynb)
  -  #### 4️⃣. [Infer ***lysosomes***](./04_infer_lysosome.ipynb)
  -  #### 5️⃣. [Infer ***mitochondria***](./05_infer_mitochondria.ipynb)
  -  #### 6️⃣. [Infer ***golgi*** complex](./06_golgi.ipynb)
  -  #### 7️⃣. [Infer ***peroxisome***](./07_peroxisome.ipynb)
  -  #### 8️⃣. [Infer ***endoplasmic reticulum*** ](./08_endoplasmic_reticulum.ipynb)
  -  #### 9️⃣. [Infer ***lipid body***](./09_lipid_bodies.ipynb) 





## 2. DATA CREATION
> METHODS:📚📚
> 
> iPSC lines were prepared and imaged on Zeiss microscopes, and 32-channel multispectral images were collected.  Linear unmixing in ZEN Blue software with target emission spectra yields 8-channel image outputs.  The channels correspond to: Nuclei (NU), Lysosomes (LS), Mitochondria (MT), Golgi (GL), Peroxisomes (PO), Endoplasmic Reticulum (ER), Lipid Droplet (LD), and a “residual” signal.

> Meta-DATA 🏺 (artifacts)
> - Microscope settings
> - OME scheme
> - Experimenter observations
> - Sample, bio-replicate, image numbers, condition values, etc.
> - Dates
> - File structure, naming conventions
> - etc.





## 3. IMAGE PROCESSING  ⚙️🩻🔬
### INFERENCE OF SUB-CELLULAR OBJECTS
The imported images have already been pre-processed to transform the 32-channel spectral measurements into "linearly unmixed" images which estimate independently labeled sub-cellular components.  These 7 channels (plus a residual "non-linear" signal) will be used to infer the shapes and extents of these sub-cellular components.
A single "optimal" Z slice is chosen for each image for subsequent 2D analysis.
We will perform computational image analysis on the pictures to _segment_ (or _binarize_) the components of interest for measurement.  In other procedures we can use these labels as "ground truth" labels to train machine learning models to automatically perform the inference of these objects.
Each of the 9 objectives stated above is achieved by pseudo-independent processing of the imported multi-channel image, i.e. inferring: NUCLEI, SOMA, CYTOSOL, LYSOSOME, MITOCHONDRIA, GOLGI COMPLEX, PEROXISOMES, ENDOPLASMIC RETICULUM, and LIPID BODIES.

### General flow for inferring objects via segmentation
- Pre-processing 🌒
- Core-processing (thresholding) 🌕
- Post-processing  🌘
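
The three stages above can be sketched on a synthetic 2D image. This is a minimal illustration that calls `scipy.ndimage` and `skimage` directly rather than the notebook's own helpers; the function name `segment_2d`, the filter sizes, the plain (non-log) Li threshold, and the `min_size` value are all assumptions chosen for the toy data.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_li


def segment_2d(img, median_size=3, sigma=1.0, min_size=10):
    """Toy pre-process / threshold / post-process chain (illustrative only)."""
    img = np.asarray(img, dtype=float)

    # Pre-processing: min-max normalize to [0, 1], then median + Gaussian smoothing.
    span = img.max() - img.min()
    norm = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    smooth = ndi.gaussian_filter(ndi.median_filter(norm, size=median_size), sigma=sigma)

    # Core processing: a single global Li threshold.
    mask = smooth > threshold_li(smooth)

    # Post-processing: fill holes, then drop connected components below min_size.
    mask = ndi.binary_fill_holes(mask)
    labels, _ = ndi.label(mask)
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # label 0 is the background
    return keep[labels]


# Synthetic image: one bright square on a noisy background.
rng = np.random.default_rng(0)
toy = rng.normal(0.05, 0.01, size=(64, 64))
toy[20:40, 20:40] += 1.0
toy_mask = segment_2d(toy)
print(toy_mask.sum(), toy_mask[30, 30], toy_mask[5, 5])
```

Each organelle notebook would tune these stages separately; the point here is only the order of operations.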

### QC 🚧 WIP 🚧 




## 4. QUANTIFICATION 📏📐🧮

SUBCELLULAR COMPONENT METRICS
-  extent 
-  size
-  shape
-  position
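
As a rough illustration of how such metrics could be read off a binary segmentation, the sketch below uses `skimage.measure.regionprops` on a toy mask. Which `regionprops` attribute best corresponds to each metric listed above is an assumption here (for example, `extent` in `skimage` means object area divided by bounding-box area, which may not be the sense intended in this pipeline).

```python
import numpy as np
from skimage.measure import label, regionprops

# Toy binary mask with two rectangular "objects".
mask = np.zeros((32, 32), dtype=bool)
mask[4:10, 4:10] = True      # 6 x 6 square
mask[20:24, 10:26] = True    # 4 x 16 rectangle

props = regionprops(label(mask))
for region in props:
    print(
        region.label,
        region.area,          # size (pixel count)
        region.extent,        # area / bounding-box area
        region.eccentricity,  # one possible shape descriptor
        region.centroid,      # position (row, col)
    )
```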



### NOTE: PIPELINE TOOL AND DESIGN CHOICES?
We want to leverage the Allen Cell & Structure Segmenter.  It has been wrapped as a [napari-plugin](https://www.napari-hub.org/plugins/napari-allencell-segmenter) but for the workflow we are proving out here we will want to call the `aicssegmentation` [package](https://github.com/AllenCell/aics-segmentation) directly.

#### The Allen Cell & Structure Segmenter
The Allen Cell & Structure Segmenter is a Python-based open source toolkit developed at the Allen Institute for Cell Science for 3D segmentation of intracellular structures in fluorescence microscope images. This toolkit brings together classic image segmentation and iterative deep learning workflows first to generate initial high-quality 3D intracellular structure segmentations and then to easily curate these results to generate the ground truths for building robust and accurate deep learning models. The toolkit takes advantage of the high replicate 3D live cell image data collected at the Allen Institute for Cell Science of over 30 endogenous fluorescently tagged human induced pluripotent stem cell (hiPSC) lines. Each cell line represents a different intracellular structure with one or more distinct localization patterns within undifferentiated hiPS cells and hiPSC-derived cardiomyocytes.

More details about Segmenter can be found at https://allencell.org/segmenter
In order to leverage the A
# IMPORTS

Import all necessary packages.

We are using `napari` for visualization, and `scipy.ndimage` and `skimage` for analyzing the image files.  The underlying data format is the `numpy` `ndarray`, together with tools from the Allen Institute for Cell Science.


In [1]:
# top level imports
from pathlib import Path
import os, sys
from collections import defaultdict

import numpy as np
import scipy

from typing import Union, List, Tuple, Any
# TODO:  prune the imports.. this is the big set for almost all organelles
# # function for core algorithm
from scipy import ndimage as ndi
import aicssegmentation
from aicssegmentation.core.pre_processing_utils import ( intensity_normalization, 
                                                         image_smoothing_gaussian_slice_by_slice )

# # package for io 
from aicsimageio import AICSImage

import napari

### import local python functions in ../infer_subc_2d
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..")))

from infer_subc_2d.utils.file_io import read_czi_image, list_image_files
from infer_subc_2d.utils.img import *
from infer_subc_2d.organelles import fixed_get_optimal_Z_image, get_optimal_Z_image
from infer_subc_2d.constants import (
    TEST_IMG_N,
    NUC_CH,
    LYSO_CH,
    MITO_CH,
    GOLGI_CH,
    PEROXI_CH,
    ER_CH,
    LIPID_CH,
    RESIDUAL_CH,
    ALL_CHANNELS,
)

%load_ext autoreload
%autoreload 2

test_img_n = TEST_IMG_N


# Get and load Image for processing
For testing purposes... TODO: build a nice wrapper for this.



Read the data into memory from the `.czi` files.  (Note: there is also a 2D slice `.tif` file, read for later comparison.)  We will also collect metadata.

> the `data_path` variable should have the full path to the set of images wrapped in a `Path()`.   Below the path is built in 3 stages
> 1. my user directory "~" plus
> 2. general imaging data directory "Projects/Imaging/data" plus
> 3. "raw" where the linearly unmixed zstacks are

The image "type" is also set by `im_type = ".czi"`


In [2]:
# build the datapath
# all the imaging data goes here.
data_root_path = Path(os.path.expanduser("~")) / "Projects/Imaging/data"

# linearly unmixed ".czi" files are here
data_path = data_root_path / "raw"
im_type = ".czi"

# deprecate this
# list_img_files = lambda img_folder,f_type: [os.path.join(img_folder,f_name) for f_name in os.listdir(img_folder) if f_name.endswith(f_type)]
img_file_list = list_image_files(data_path,im_type)

test_img_name = img_file_list[test_img_n]
test_img_name

'/Users/ahenrie/Projects/Imaging/data/raw/ZSTACK_PBTOhNGN2hiPSCs_BR3_N04_Unmixed.czi'

In [3]:
img_data,meta_dict = read_czi_image(test_img_name)

# get some top-level info about the RAW data
channel_names = meta_dict['name']
img = meta_dict['metadata']['aicsimage']
scale = meta_dict['scale']
channel_axis = meta_dict['channel_axis']


  d = to_dict(os.fspath(xml), parser=parser, validate=validate)


## CHOOSE Z-SLICE

We need to choose which Z-slice to analyze for subsequent organelle segmentation.  _A priori_ we expect that the other organelles are ***NOT*** _inside_ of the nucleus, and we want to choose a Z with minimal overlap between the nucleus and other organelles.  For now we will also assume that a majority of the signals are attributable to a single well marked cell.


In the future we might develop an optimization procedure to resample along an arbitrary plane such that the total fluorescence signal is maximized and the overlap of nuclei with other organelles is minimized.   We may also want to limit ourselves to a single neuron / soma.

Maria Clara's description of how she chooses Z-slices:  "I do choose the slice which it contains all the organelle (Nuclei and Golgi are the most difficult one since show high polarity) and that it would contain the highest density of them (you can see trough the slices that some organelle show high density in specific planes)"


In [4]:
# get nuclei and normalize...
# median filter in 2D / convert to float 0-1.   get rid of the "residual"
scaling_param =  [0]   # [0] selects plain min-max normalization

nuclei = select_channel_from_raw(img_data, 0)

nuclei = intensity_normalization(nuclei, scaling_param=scaling_param)

med_filter_size = 4   
# structure_img_median_3D = ndi.median_filter(struct_img,    size=med_filter_size  )
nuclei = median_filter_slice_by_slice( 
                                                                nuclei,
                                                                size=med_filter_size  )

gaussian_smoothing_sigma = 1.34
nuclei = image_smoothing_gaussian_slice_by_slice(   nuclei, sigma=gaussian_smoothing_sigma )

#struct_obj = struct_img > filters.threshold_li(struct_img)
threshold_value_log = threshold_li_log(nuclei)

threshold_factor = 0.9 #from cellProfiler
thresh_min = .1
thresh_max = 1.
threshold = min( max(threshold_value_log*threshold_factor, thresh_min), thresh_max)

struct_obj = nuclei > threshold

###################
# mask everything and find most intense fluorescence NOT in the nuclei
ch_to_agg = ( LYSO_CH ,
                        MITO_CH ,
                        GOLGI_CH ,
                        PEROXI_CH ,
                        ER_CH ,
                        LIPID_CH )

total_florescence = img_data[ch_to_agg,].astype( np.double ).sum(axis=0)
print(total_florescence.shape)
total_ = total_florescence
total_[struct_obj] =0 
optimal_Z = total_.sum(axis=(1,2)).argmax()


intensity normalization: min-max normalization with NO absoluteintensity upper bound
(16, 768, 768)
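
The effect of masking the nucleus before picking the brightest slice can be seen on a tiny synthetic stack. This is a sketch with made-up data (three channels, four Z-slices), not the real image; the channel roles are assumptions for illustration.

```python
import numpy as np

# Synthetic stack, layout [channel, z, y, x]; channel 0 plays the role of the nuclei.
stack = np.zeros((3, 4, 8, 8))
stack[0, 1, 2:6, 2:6] = 1.0   # nucleus present only at z=1
stack[1, 1, 2:6, 2:6] = 5.0   # strong organelle signal at z=1, but inside the nucleus
stack[2, 2, 0:3, 0:3] = 2.0   # weaker organelle signal at z=2, outside the nucleus

nuc_mask = stack[0] > 0.5
total = stack[[1, 2]].sum(axis=0)                 # aggregate organelle channels -> [z, y, x]
unmasked_z = int(total.sum(axis=(1, 2)).argmax())

total[nuc_mask] = 0                               # discard signal overlapping the nucleus
masked_z = int(total.sum(axis=(1, 2)).argmax())
print(unmasked_z, masked_z)  # prints: 1 2
```

Without the mask the slice dominated by signal overlapping the nucleus wins; with the mask the choice moves to the slice whose signal lies outside it.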


Now make some functions to handle the procedure

In [5]:


def _aggregate_signal_channels(img_in:np.ndarray, chs:Union[List, Tuple], ws:Union[List, Tuple, Any]= None) -> np.ndarray:
    """ 
    return a weighted sum of the image across channels

    img_in:  np.ndarray  [ch,z,x,y]
    chs: list/tuple of channels to aggregate
    ws: list/tuple of weights for aggregation (defaults to a weight of 1.0 per channel)
    """

    n_chan = len(chs)
    if n_chan <= 1:
        return img_in[chs]

    if ws is None:
        ws = n_chan*[1.]
    img_out = np.zeros_like(img_in[0]).astype(np.double)
    for w,ch in zip(ws,chs):
        img_out += w*img_in[ch]
    return img_out
    #return img_in[ch_to_agg,].astype( np.double ).sum(axis=0)


def _choose_agg_signal_zmax(img_in,chs,ws=None,mask=None):
    """ 
    return the z index with the maximum aggregate signal

    img_in:  np.ndarray  [ch,z,x,y]
    chs: list of channels to aggregate
    ws: list of weights for aggregation
    mask: boolean mask [z,x,y] of voxels to zero out before summing

    returns z with maximum signal
    """
    # use the function arguments (not the global img_data) and pass the weights through
    total_florescence_ = _aggregate_signal_channels(img_in, chs, ws)
    if mask is not None:
        total_florescence_[mask] = 0.
    return total_florescence_.sum(axis=(1,2)).argmax()


optimal_Z = _choose_agg_signal_zmax(img_data,ch_to_agg)
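
The channel loop in the aggregation step can also be written as a single contraction. The sketch below is an alternative formulation, not the package's implementation; `weighted_channel_sum` is a hypothetical name, and it assumes the same `[ch, z, x, y]` layout.

```python
import numpy as np


def weighted_channel_sum(img, chs, ws=None):
    """Weighted sum over selected channels of a [ch, z, x, y] array (sketch)."""
    chs = list(chs)
    ws = np.ones(len(chs)) if ws is None else np.asarray(ws, dtype=float)
    # Contract the weight vector against the (selected) channel axis.
    return np.tensordot(ws, img[chs].astype(float), axes=1)


rng = np.random.default_rng(1)
demo = rng.random((4, 3, 5, 5))
fast = weighted_channel_sum(demo, (1, 2, 3), ws=(1.0, 0.5, 2.0))
slow = 1.0 * demo[1] + 0.5 * demo[2] + 2.0 * demo[3]
print(fast.shape, np.allclose(fast, slow))  # prints: (3, 5, 5) True
```

Indexing with a list always copies, so masking the result in place never touches the input array.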



In [6]:


def _select_channel_from_raw(img_in:np.ndarray, ch:Union[int, Tuple[int]]) -> np.ndarray:
    """
    Parameters:
    ------------
    img_in: np.ndarray
        a ch,z,x,y image

    ch: int
        channel to extract.

    Returns:
    -------------
        np.ndarray
    """
    return img_in[ch]


In [7]:
    
def _apply_log_li_threshold(img_in, threshold_factor=1.0, thresh_min=None, thresh_max=None):
    """return a binary mask after applying a log_li threshold
    
    Parameters:
    ------------
    img_in: np.ndarray

    threshold_factor: float=1.0  scaling value for threshold

    thresh_min: None or lower bound for the scaled threshold

    thresh_max: None or upper bound for the scaled threshold

    Returns:
    -------------
        np.ndarray (boolean mask)
    """
    #struct_obj = struct_img > filters.threshold_li(struct_img)
    threshold_value_log = threshold_li_log(img_in)
    threshold = threshold_value_log*threshold_factor

    if thresh_min is not None:
        threshold = max(threshold, thresh_min)
    if thresh_max is not None:
        threshold = min(threshold, thresh_max)
    return img_in > threshold
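
`threshold_li_log` comes from `infer_subc_2d.utils.img` and its implementation is not shown in this notebook. One plausible reading, assumed here purely for illustration, is a Li threshold computed on log-transformed intensities and mapped back to the linear scale. The sketch below implements that assumption with `skimage.filters.threshold_li`; `li_threshold_on_log` and `eps` are hypothetical names, and the factor and bounds are the values used above.

```python
import numpy as np
from skimage.filters import threshold_li


def li_threshold_on_log(img, eps=1e-6):
    """Li threshold on log intensities, returned on the linear scale.

    Hypothetical stand-in for illustration; not the infer_subc_2d implementation.
    """
    img = np.asarray(img, dtype=float)
    log_img = np.log(np.clip(img, eps, None))  # clip to avoid log(0)
    return float(np.exp(threshold_li(log_img)))


# Synthetic bimodal image: dim left half, bright right half.
rng = np.random.default_rng(2)
toy = rng.normal(0.05, 0.002, size=(32, 32))
toy[:, 16:] = rng.normal(0.8, 0.02, size=(32, 16))

raw_threshold = li_threshold_on_log(toy)
# Scale, then clamp to [thresh_min, thresh_max] as in the function above.
threshold = min(max(raw_threshold * 0.9, 0.1), 1.0)
toy_mask = toy > threshold
print(raw_threshold, threshold, toy_mask[:, 16:].all(), toy_mask[:, :16].any())
```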



In [8]:

def _find_optimal_Z(raw_img:np.ndarray, nuc_ch:int, ch_to_agg:Tuple[int]) -> int:
    """
    Procedure to infer _best_ Zslice  from linearly unmixed input.

    Parameters:
    ------------
    raw_img: np.ndarray
        a ch,z,x,y - image containing fluorescent signal

    nuc_ch: int
        channel index of the nuclei signal, used to build the nuclei mask

    ch_to_agg: Tuple[int]
        channel indices of the organelle signals to aggregate

    Returns:
    -------------
    opt_z:
        the "optimal" z-slice which has the most signal intensity for downstream 2D segmentation    
    """

    # median filter in 2D / convert to float 0-1.   get rid of the "residual"
    struct_img = _select_channel_from_raw(raw_img, nuc_ch)

    struct_img = min_max_intensity_normalization(struct_img)
    med_filter_size = 4   
    # structure_img_median_3D = ndi.median_filter(struct_img,    size=med_filter_size  )
    nuclei = median_filter_slice_by_slice( struct_img,
                                                                    size=med_filter_size  )

    gaussian_smoothing_sigma = 1.34
    nuclei = image_smoothing_gaussian_slice_by_slice(  nuclei,
                                                                                                sigma=gaussian_smoothing_sigma
                                                                                                )
    threshold_factor = 0.9 #from cellProfiler
    thresh_min = .1
    thresh_max = 1.
    struct_obj = _apply_log_li_threshold(nuclei, threshold_factor=threshold_factor, thresh_min=thresh_min, thresh_max=thresh_max)

    optimal_Z = _choose_agg_signal_zmax(raw_img,ch_to_agg, mask=struct_obj)

    return optimal_Z


In [9]:
def _fixed_find_optimal_Z(img_data):
    """
    Procedure to infer _best_ Zslice  from linearly unmixed input with fixed parameters
    """
    nuc_ch = NUC_CH
    ch_to_agg = ( LYSO_CH ,
                            MITO_CH ,
                            GOLGI_CH ,
                            PEROXI_CH ,
                            ER_CH ,
                            LIPID_CH )
    return _find_optimal_Z(img_data, nuc_ch, ch_to_agg) 

In [11]:
optimal_Z = _fixed_find_optimal_Z(img_data) 


In [13]:
from infer_subc_2d.organelles import fixed_find_optimal_Z, fixed_get_optimal_Z_image

optimal_Z = fixed_find_optimal_Z(img_data) 
single_Z_img = fixed_find_optimal_Z(img_data) 
single_Z_img

8

## create a helper function to add function entries
These helpers encode the workflow particulars for creating widgets and defining/executing workflows.    This function will be added to `infer_subc_2d.organelles_config.helper.py` 

In [22]:
import json

from infer_subc_2d.utils.directories import Directories

def _add_function_spec_to_widget_json( function_name, function_dict, json_file_name = "all_functions.json", overwrite=False):
    """ helper function to compose / update list of functions for Workflows
    """
    # read all_functions.json into dict
    path = Directories.get_structure_config_dir() / json_file_name
    try:
        with open(path) as file:
            obj = json.load(file)
    except FileNotFoundError:
        print(f"file {path} not found")
        return 
        
    # add function entry
    if function_name in obj.keys():
        print(f"function {function_name} is already in {json_file_name}")
        if not overwrite:
            return(0)
    
    obj[function_name] = function_dict    # write updated all_functions.json
        
    # re-write file
    with open(path, "w") as file:
        json.dump(obj, file, indent=4, sort_keys=True)
        
    return(1)
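
The read / update / rewrite pattern used by the helper can be tried safely against a temporary file. This is a self-contained sketch: `upsert_json_entry` is a hypothetical name, the spec contents are placeholders, and unlike the helper above it creates the file when it is missing, which is an assumption made to keep the demo standalone.

```python
import json
import tempfile
from pathlib import Path


def upsert_json_entry(path, key, value, overwrite=False):
    """Add or replace one top-level key in a JSON file; return True if written."""
    path = Path(path)
    obj = json.loads(path.read_text()) if path.exists() else {}
    if key in obj and not overwrite:
        return False
    obj[key] = value
    path.write_text(json.dumps(obj, indent=4, sort_keys=True))
    return True


with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp) / "all_functions.json"
    spec = {
        "name": "demo",
        "python::module": "demo_module",      # placeholder module name
        "python::function": "demo_function",  # placeholder function name
        "parameters": None,
    }
    first = upsert_json_entry(registry, "demo_function", spec)
    second = upsert_json_entry(registry, "demo_function", spec)  # refused: key exists
    third = upsert_json_entry(registry, "demo_function", spec, overwrite=True)
    stored = json.loads(registry.read_text())

print(first, second, third, stored["demo_function"]["parameters"])  # True False True None
```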


In [29]:
fixed_find_optimal_Z_dict =  {
        "name": "find the optimal Z slice (fixed parameters)",
        "python::module": "infer_subc_2d.organelles",
        "python::function": "fixed_find_optimal_Z",
        "parameters": None
        }

fixed_find_optimal_Z_dict

{'name': 'find the optimal Z slice (fixed parameters)',
 'python::module': 'infer_subc_2d.organelles',
 'python::function': 'fixed_find_optimal_Z',
 'parameters': None}

In [30]:
fixed_get_optimal_Z_img_dict =  {
        "name": "extract optimal Z slice (fixed parameters)",
        "python::module": "infer_subc_2d.organelles",
        "python::function": "fixed_get_optimal_Z_image",
        "parameters": None
        }

fixed_get_optimal_Z_img_dict

{'name': 'extract optimal Z slice (fixed parameters)',
 'python::module': 'infer_subc_2d.organelles',
 'python::function': 'fixed_get_optimal_Z_image',
 'parameters': None}

In [31]:
from infer_subc_2d.organelles_config.helper import add_function_spec_to_widget_json

add_function_spec_to_widget_json("fixed_find_optimal_Z",fixed_find_optimal_Z_dict, overwrite=True)
add_function_spec_to_widget_json("fixed_get_optimal_Z_img",fixed_get_optimal_Z_img_dict, overwrite=True)

function fixed_find_optimal_Z is already in all_functions.json
overwriting  fixed_find_optimal_Z
function fixed_get_optimal_Z_img is already in all_functions.json
overwriting  fixed_get_optimal_Z_img


1

In [28]:
from infer_subc_2d.constants import ALL_CHANNELS
_select_channel_from_raw_dict =  {
        "name": "select a channel ",
        "python::module": "infer_subc_2d.utils.img",
        "python::function": "select_channel_from_raw",
        "parameters": {
            "chan": {
                "data_type": "int",
                "options": ALL_CHANNELS,
                "widget_type": "drop-down"
            }
        }
}

add_function_spec_to_widget_json("select_channel_from_raw",_select_channel_from_raw_dict )

function select_channel_from_raw is already in all_functions.json


0

1

In [13]:
ch_to_agg = (1,2,3,4,5,6)
nuc_ch = 0

In [27]:
optimal_Z = _find_optimal_Z(img_data, nuc_ch, ch_to_agg) 


## TODO:  create a napari plugin to interactively find the best slice...

- minimize: overlap_metric {nuclei(ch0), all other organelles (1-6)}
- maximize: signal_pwr (ch 1-6)
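
One way these two criteria might be combined into a single per-slice score is sketched below. The scoring rule (signal outside the nucleus minus a weighted overlap term), the function name `score_z_slices`, and the `overlap_weight` parameter are all assumptions for illustration; nothing like this exists in the package yet.

```python
import numpy as np


def score_z_slices(raw_img, nuc_mask, organelle_chs, overlap_weight=1.0):
    """Per-z score: organelle signal outside the nucleus minus weighted overlap (sketch)."""
    organelles = raw_img[list(organelle_chs)].astype(float).sum(axis=0)  # [z, y, x]
    inside = np.where(nuc_mask, organelles, 0.0).sum(axis=(1, 2))    # overlap term
    outside = np.where(nuc_mask, 0.0, organelles).sum(axis=(1, 2))   # signal term
    return outside - overlap_weight * inside


# Synthetic stack, layout [channel, z, y, x]; channel 0 is the nuclei channel.
stack = np.zeros((3, 4, 8, 8))
stack[0, 1, 2:6, 2:6] = 1.0   # nucleus at z=1
stack[1, 1, 2:6, 2:6] = 5.0   # organelle signal overlapping the nucleus
stack[2, 2, 0:3, 0:3] = 2.0   # organelle signal away from the nucleus

scores = score_z_slices(stack, stack[0] > 0.5, organelle_chs=(1, 2))
best_z = int(scores.argmax())
print(scores, best_z)
```

An interactive plugin could expose `overlap_weight` as a slider and redraw the chosen slice as it changes.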

---------------------------
Please proceed to 01_infer_nuclei.ipynb


Everything below is just testing the speed of different libraries.

Visualize the raw data file with [napari](https://napari.org)

In [None]:

viewer = napari.view_image(
    img_data,
    channel_axis=0,
    name=channel_names,
    scale=scale
)
viewer.scale_bar.visible = True


In [None]:
viewer.dims.ndisplay = 3
viewer.camera.angles = (-30, 25, 120)

In [None]:
##  need to save: 

# output_path, list_of_files
viewer.dims.ndisplay = 2