# Trace blobs (e.g. nuclei, cells, ...) over the z-axis

Code written by Nathan De Fruyt (nathan.defruyt@kuleuven.be, nathan.defruyt@gmail.com). 

The algorithm is simple, but functional for minimal goals:
1. **version 3 edit:** instead of blob recognition, I threshold on the percentile of I values (i.e. also intrinsic background correction)
1. **version 4 edit:** splitting blobs that are larger than normal (95% percentile?)
1. **version 5 edit:** still to figure out splitting, but already 3D image labeling + better thresholding
1. **version 6 edit:** segment anything algorithm instead of skimage

* next, I deviated from Wim's advice and went on to work with the blob's 
    1. **center coordinates
    1. **mean intensity value
    1. **radius** (restricted to 20 pixels, as this appeared to be towards the higher end of nucleus radii -- adapt this!)

To this, the program:
1. determines **common blob labels** based on how near they are (max displacement of center = 10 pixels in x/y direction, max rise of 5 planes)
1. renders one line per blob for the plane with the **highest intensity value**

Each step (1. blob identification, 2. blob labelling, 3. summary) are rendered in separate .csv files. 
Blob identification takes the longest (a few hours for a day of pictures). The subseding steps are fast.

**Parameters can therefore be adapted** in the second and third step without consideration. 

Do think about changing parameters to the first step.

___Questions are welcome, optimization of the algorithm too.___

In [1]:
## general math and system modules/functions
from math import sqrt, atan, tan, cos, sin
import numpy as np
import os
import glob
import itertools
import tkinter as tk
from tkinter import filedialog
from tqdm import tqdm
import shutil as sh
from lxml import etree ## for parsing html (for the metadata)

## parallellization!
from multiprocessing import Pool, cpu_count

## data formatting and parsing!
import pandas as pd
from html5lib import *
import xml.etree.ElementTree as ET

## image import and processing module functions
import czifile as cfile
from skimage import data, data, measure, exposure
from skimage.measure import label, regionprops_table, regionprops
from skimage.morphology import closing
from skimage.segmentation import clear_border
from skimage.feature import blob_dog, blob_log, blob_doh
from skimage.color import rgb2gray
from skimage.filters import gaussian, laplace, threshold_otsu
import cv2 as cv
import PIL
import tifffile as tf
from scipy.spatial.distance import pdist, squareform

import string

## plotting modules
import plotly
import plotly.express as px
import matplotlib.pyplot as plt
from matplotlib.pyplot import Axes

## 1. Find data

First I adapted some existing functions to more easily check and handle data either here in jupyter notebook or to the purpose of an application. 

In [2]:
def write2tif(img, file, pixelsizes, pixelunits, channels): 
    tf.imwrite(file,
              img,
              bigtiff = False,
              photometric = 'rgb',
              planarconfig = 'separate', 
              metadata = {
                  'axes': 'TCZYXS',
                  'SignificantBits': 8,
                  'PhysicalSizeX': pixelsizes[0],
                  'PhysicalSizeXUnit': pixelunits[0],
                  'PhysicalSizeY': pixelsizes[1],
                  'PhysicalSizeYUnit': pixelunits[1],
                  'PhysicalSizeZ': pixelsizes[2],
                  'PhysicalSizeZUnit': pixelunits[2],
                  'Channel': {'Name': channels},
              })

## 2. Render ___summary___ (mean, max, ...) intensity

I want to make the process automated for each picture in the folder.
1. __load__ the file
1. __extract__ the channels (gfp and bfp in my case)
1. __threshold and detect__ nuclei in the gfp channel
1. extract object __features__ (including intensity, area, coordinates, and radius)
1. threshold on object __radius__

For all these things, I'd like to save (per pictures): 
1. the object __features__ (for all objects)
1. also include the __intensity__ in the other channels
1. possibly recognize whether the nucleus is fully __contained__ by the marker (e.g. nucleus in BAG)
1. save a __tif__ with the labeled nuclei in one channel, the original gfp values in the other channel and the bfp in yet another channels

In [3]:
def quantify_nuclei(channel1, channel2, sigma = 7, threshold1 = 99.6, threshold2 = 99.6, min_radius = 10, max_radius = 35, max_ratio = 3):
    
    ## 1) process/quantify image channel 1 - the nuclearly localized channel:
    
        ## take laplacian of the gaussian of the image - make sigma wide enough for good smoothing
    img1_lp = laplace(gaussian(channel1, sigma = sigma))

        ## threshold on percentile - very strict, yet slightly permissive threshold of 99.8
    thresh1 = np.percentile(img1_lp, threshold1)
    img1_bw = np.zeros(img1_lp.shape)
    img1_bw[img1_lp > thresh1] = 1

        ## close the edges and label adhering regions
#     img_cls = closing(img_bw)
#     img_lbl = label(img_cls)
    img1_lbl = label(img1_bw)
    
        ## subtract the background from the image (everything that's below the threshold is considered background)
    bg1 = np.mean(img1_lp < thresh1)
    img1_bg = channel1 - bg1
    img1_bg[img1_bg < 0] = 0
    print(f'    Thresholded green channel on intensity percentile (P{threshold1})')
    
        ## extract object features
    idf1 = pd.DataFrame(regionprops_table(label_image=img1_lbl, intensity_image=img1_bg, properties = ('label', 'intensity_mean', 'centroid', 'bbox', 'area')))
        ## relabel columns
    idf1 = idf1.rename(columns = {'label': "ID", 'intensity_mean': "I", 'area': "area", 'centroid-0': "z", 'centroid-1': "y", 'centroid-2': "x", 'bbox-0': "zmin", 'bbox-1': "ymin", 'bbox-2': "xmin", 'bbox-3': "zmax", 'bbox-4': "ymax", 'bbox-5': "xmax"})
        ## calculate radius
    idf1['r'] = list(map(lambda x: np.cbrt(3*x/(4*np.pi)), idf1['area']))
        ## calculate axes
    idf1['r_z'] = xyzsize[2] * (idf1['zmax'] - idf1['zmin'])
    idf1['r_y'] = xyzsize[1] * (idf1['ymax'] - idf1['ymin'])
    idf1['r_x'] = xyzsize[0] * (idf1['xmax'] - idf1['xmin'])
        ## compute a ratio as a proxy for eccentricity (major axis/minor axis)
    def r_ratio(lbl):
        sub = list(idf1.loc[idf1['ID'] == lbl, ['r_z', 'r_y', 'r_x']].iloc[0, :])
        return float(max(sub)/min(sub))
    idf1['r_ratio'] = list(map(r_ratio, idf1['ID']))
    print(f'    Calculated object intensity and shape features')
        ## calculate max area - we assume spherical object and semi-spherical objects (sometimes cells are merges...)
#     max_area = (4/3)*np.pi*(max_radius**3)
#     min_area = (4/3)*np.pi(*min_radius**3)
    
        ## now threshold objects on shape (nuclei can only be three times as long as they are wide)
    valid_nuclei = idf1[idf1['ID'].isin(idf1.ID[(idf1['r'] > min_radius) & (idf1['r'] < max_radius)])] ## first on radius
    valid_nuclei = valid_nuclei[valid_nuclei['ID'].isin(idf1.ID[(idf1['r_ratio'] < max_ratio)])] ## then on shape
    img1_selected = img1_lbl.copy()
    img1_selected[np.isin(img1_selected, valid_nuclei['ID'], invert = True)] = 0
    print(f'    Thresholded objects on radius ({min_radius} < r < {max_radius} and shape (min radius/max radius ratio = {max_ratio})')
    ## 2) threshold channel 2 - the marker channel:
    
        ## 'smooth'
    img2_lp = laplace(gaussian(channel2, sigma = sigma))
    
        ## threshold on percentile
    thresh2 = np.percentile(img2_lp, threshold2)
    img2_bw = np.zeros(img2_lp.shape)
    img2_bw[img2_lp > thresh2] = 1
    
    print(f'    Thresholded blue channel on intensity percentile (P{threshold2})')
    
    ## 3) calculate coexpression value

        ## calculate the percentage of the object that shows above-threshold expression in both channels
    object_labels = valid_nuclei['ID'].unique() ## summarize object labels to check
    coexpr = img2_bw.copy() ## make a copy of channel2 (marker) above threshold expression
    coexpr[img1_selected == 0] = 0 ## set all pixels that are not within nuclei to black
    perc_coexpr = np.zeros(img1_selected.shape) ## initialize a new array to visualize coexpression percentages
    valid_nuclei['p_coexpr'] = 0 ## initialize a column to store the coexpression percentages

    for lbl in tqdm(object_labels): ## for each object
            ## count the number of pixesl showing coexpression
        nr_contained = np.sum(coexpr[img1_selected == lbl] > 0)
            ## count the total number of pixels in the object
        total_pxls = np.sum(img1_selected == lbl)
            ## calculate the percentage
        perc_coexpr[img1_selected == lbl] = nr_contained/total_pxls
        valid_nuclei.loc[valid_nuclei['ID'] == lbl, ['p_coexpr']] = nr_contained/total_pxls

    print(f'    Calculated coexpression values for all {len(object_labels)} objects')

    ## return output
    return img1_selected, perc_coexpr, valid_nuclei

In [5]:
## don't touch this cell!!

## thresholding parameters
gfp_threshold = 99.5
bfp_threshold = 99.6
min_radius = 10
max_radius = 35
max_ratio = 5
sigma = 7

### and set folder
folder = filedialog.askdirectory()
files = glob.glob(folder + '/*.czi')

for i, file in enumerate(files):
    ## read the file
    imstacks = cfile.imread(file)
    print(f'>> Read file ({file}) {i+1}/{len(files)}\n')
    
    ## collect experimental metadata
    meta = os.path.basename(file).split('-')
    date = meta[0]
    protocol = meta[1]
    
    submeta = meta[2][:-4].split('_')
    strain = submeta[0]
    cultivation = submeta[1]
    replicate = int(submeta[2])
    
    ## collect resolution metadata
    root = ET.fromstring(cfile.CziFile(file).metadata())
        ## initialize lists
    xyzsize = list()
    xyzunit = list()
    channels = list()
        ## retrieve physical distances and units
    for axis in root[0][4][2]:
        for detail in axis:
            if detail.tag == 'Value': xyzsize.append(float(detail.text))
            if detail.tag == 'DefaultUnitFormat': xyzunit.append(detail.text)   
        ## retrieve channels
    for channel in root[0][5][0]:
        channels.append(channel.attrib.get('Name'))
    print(f'    Collected metadata')
    
    ## extract separate channels (here gfp and bfp)
    gfp = imstacks[0, 0, :imstacks.shape[2], :imstacks.shape[3], :imstacks.shape[4], 0]
    bfp = imstacks[0, 1, :imstacks.shape[2], :imstacks.shape[3], :imstacks.shape[4], 0]
    
    ## green channel: identify and label nuclei, threshold on radius, and extract object features
    gfp_selected, coexpr_nuclei, Idf = quantify_nuclei(gfp, bfp, sigma = sigma, threshold1 = gfp_threshold, threshold2 = bfp_threshold, min_radius = min_radius, max_radius = max_radius, max_ratio = max_ratio)
    nr_objects = len(np.unique(gfp_selected)) ## just for following progress
    print(f'    ->> detected {nr_objects} nucleus-like objects')
    
    ## make a new folder to store the file in
    newdir = folder + '/' + meta[2][:-4]
    if not os.path.isdir(newdir): 
        os.makedirs(newdir)
    
    ## write the stack to a tif file
    print(f'    Writing labeled nuclei channel to file\n')
    write2tif(gfp_selected,  newdir + '/labeled.tif', xyzsize, xyzunit, ['EGFP'])
    print(f'    Writing BFP channel to file\n')
    write2tif(bfp,  newdir + '/BFP.tif', xyzsize, xyzunit, ['EBFP'])
    print(f'    Writing GFP channel to file\n')
    write2tif(gfp,  newdir + '/GFP.tif', xyzsize, xyzunit, ['EGFP'])
    print(f'    Writing coexpression channel to file\n')
    write2tif(coexpr_nuclei,  newdir + '/coexpression.tif', xyzsize, xyzunit, ['coexpression'])

    ## write the quantification to a file:
    Idf.to_csv(folder + '/' + os.path.basename(file)[:-4] + '_ObjectFeatures.csv')
    
    print(f'    Finished analysis for file {file}\nFind your files in {newdir}\n')

>> Read file (E:/PhD/confocal/confocal_data/20241024-PXXX_vX-PHX9311_DiI\20241024-PXXX_vX-PHX9311_DiI.63x.7_1.czi) 1/3

    Collected metadata
    Thresholded green channel on intensity percentile (P99.5)
    Calculated object intensity and shape features
    Thresholded objects on radius (10 < r < 35 and shape (min radius/max radius ratio = 5)
    Thresholded blue channel on intensity percentile (P99.6)


MemoryError: Unable to allocate 6.24 GiB for an array with shape (253, 1820, 1820) and data type float64