<h2>3D stack to 2D MIP - Single image - Marker+ based on colocalization</h2>

This notebook converts a 3D stack (.czi or .nd2 file) into a maximum intensity projection (MIP) and allows you to:

1. Inspect your images in Napari.
2. Define regions of interest (ROIs) using labels in Napari. Store said ROIs as .tiff files if needed.
3. Predict nuclei labels and store them as .tiff files for further processing.
4. Count the cells positive for a marker based on colocalization (using a user-defined min_max intensity range).
5. Display positive cells in Napari.
6. Save the number of positive cells to a .csv file (SP_marker_+_label_coloc).

In [88]:
from pathlib import Path
from tqdm import tqdm
import tifffile
import napari
import os
import numpy as np
import pandas as pd
import pyclesperanto_prototype as cle
from utils_cellpose import list_images, read_image, save_rois, segment_nuclei_2d, segment_marker_positive_nuclei

cle.select_device("RTX")

<NVIDIA GeForce RTX 4090 on Platform: NVIDIA CUDA (1 refs)>

<h3>Define the directory where your images are stored (.nd2 or .czi files)</h3>

In [89]:
# Copy the path where your images are stored; absolute and relative paths both work, including paths on other disks
directory_path = Path("../raw_data/test_data")

# Define the channels you want to analyze using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
# Remember in Python one starts counting from 0, so your first channel will be 0
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]

# List the .czi and .nd2 files in the directory
images = list_images(directory_path)

images

['..\\raw_data\\test_data\\HI1_CONTRA_M8_S6_TR1.czi',
 '..\\raw_data\\test_data\\HI1_CONTRA_M8_S6_TR2.czi',
 '..\\raw_data\\test_data\\HI1_CONTRA_M8_S7_TR1.czi',
 '..\\raw_data\\test_data\\HI1_CONTRA_M8_S7_TR2.czi',
 '..\\raw_data\\test_data\\HI1_IPSI_M8_S6_TR1.czi',
 '..\\raw_data\\test_data\\HI1_IPSI_M8_S6_TR2.czi',
 '..\\raw_data\\test_data\\HI1_IPSI_M8_S7_TR1.czi',
 '..\\raw_data\\test_data\\HI2_CONTRA_M10_S10_TR1.czi',
 '..\\raw_data\\test_data\\HI2_CONTRA_M10_S10_TR2.czi',
 '..\\raw_data\\test_data\\HI2_IPSI_M10_S10_TR1.czi',
 '..\\raw_data\\test_data\\HI2_IPSI_M10_S10_TR2.czi',
 '..\\raw_data\\test_data\\HI3_CONTRA_M11_S10_TR1.czi',
 '..\\raw_data\\test_data\\HI3_CONTRA_M11_S10_TR2.czi',
 '..\\raw_data\\test_data\\HI3_IPSI_M11_S10_TR1.czi',
 '..\\raw_data\\test_data\\HI3_IPSI_M11_S10_TR2.czi',
 '..\\raw_data\\test_data\\SHAM1_CONTRA_M6_S11_TR1.czi',
 '..\\raw_data\\test_data\\SHAM1_CONTRA_M6_S11_TR2.czi',
 '..\\raw_data\\test_data\\SHAM1_CONTRA_M7_S11_TR1.czi',
 '..\\raw_data\\t
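The implementation of <code>list_images</code> lives in <code>utils_cellpose</code> and is not shown here. As a rough sketch of the idea, assuming it simply collects files by extension, a standard-library version could look like this (<code>list_stack_files</code> is a hypothetical name, not the notebook's helper):

```python
import tempfile
from pathlib import Path

def list_stack_files(directory, extensions=(".czi", ".nd2")):
    """Return the sorted paths (as strings) of files whose suffix is in `extensions`."""
    directory = Path(directory)
    return sorted(
        str(path)
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )

# Demonstration on a temporary folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.czi", "b.nd2", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [Path(p).name for p in list_stack_files(tmp)]

print(found)  # ['a.czi', 'b.nd2']
```

Sorting keeps the image order stable between runs, so <code>images[0]</code> always refers to the same file.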

<h3>Open each image in the directory</h3>
Select an image by changing the index within the brackets in <code>image = images[0]</code>. Setting <code>slicing_factor</code> to 2 or 4 reduces resolution but speeds up processing (check the results).

In [90]:
# Choose which image to explore (0 selects the first image in the directory)
image = images[0]

# Downsample the image to improve processing times (pixels are skipped, so resolution is lost)
slicing_factor = None  # Use 2 or 4 to downsample (None keeps full resolution)

# Segmentation type ("2D" or "3D"). 
# 2D takes a z-stack as input, performs MIP (Maximum Intensity Projection) and predicts nuclei from the resulting projection (faster, useful for single layers of cells)
# 3D is more computationally expensive. Predicts 3D nuclear volumes, useful for multilayered structures
segmentation_type = "2D"

# This is a placeholder to later choose from fine-tuned Cellpose models (default nuclei model in Cellpose 3.0)
model_name = "Cellpose"

# Generate maximum intensity projection and extract filename
img_mip, filename = read_image(image, slicing_factor)

# Show image in Napari to define ROI
viewer = napari.Viewer(ndisplay=2)
viewer.add_image(img_mip)



Image analyzed: HI1_CONTRA_M8_S6_TR1
Original Array shape: (4, 14, 3803, 2891)
MIP Array shape: (4, 3803, 2891)


<Image layer 'img_mip' at 0x22c701ccfd0>
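The shapes printed above show the z axis (14 planes) being collapsed while the 4 channels are kept. A minimal sketch of that projection, together with downsampling by slicing, is given below. It assumes a (C, Z, Y, X) axis order and that <code>slicing_factor</code> keeps every n-th pixel in Y and X; the actual <code>read_image</code> helper may differ, and <code>project_stack</code> is a hypothetical name.

```python
import numpy as np

def project_stack(stack, slicing_factor=None):
    """Downsample a (C, Z, Y, X) stack in Y/X by slicing, then take the max along Z."""
    if slicing_factor is not None:
        # Keep every n-th pixel in Y and X; this discards data, so resolution is lost
        stack = stack[..., ::slicing_factor, ::slicing_factor]
    # Collapse the Z axis (axis 1) by keeping the brightest value per pixel
    return stack.max(axis=1)

# Small synthetic stack: 4 channels, 14 z-planes, 8 x 6 pixels
rng = np.random.default_rng(0)
toy_stack = rng.integers(0, 255, size=(4, 14, 8, 6), dtype=np.uint8)

toy_mip = project_stack(toy_stack)
toy_mip_small = project_stack(toy_stack, slicing_factor=2)
print(toy_mip.shape, toy_mip_small.shape)  # (4, 8, 6) (4, 4, 3)
```

With a factor of 2 the projection has a quarter of the pixels, which is where the speed-up comes from.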

<h3>Label your regions of interest in Napari and explore the signal of your marker of interest</h3>

Make sure to set <code>n edit dim = 3</code> so the label propagates across all channels. Name your regions of interest, e.g. <code>DG</code>, <code>CA1</code>, <code>CA3</code> or <code>HIPPO</code>. If you do not draw any ROI, the entire image will be analyzed.

Finally, select the <code>img_mip</code> layer and adjust the contrast limits to choose the min_max range of intensities within which cells will be considered positive for that marker.

<video controls>
  <source src="../assets/napari_labels.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

<h3>Save user-defined label ROIs as .tiff files</h3>

In [91]:
save_rois(viewer, directory_path, filename)

No user-defined ROIs have been stored


<h3>Define your analysis parameters</h3>

Modify the values for <code>nuclei_channel</code>, <code>marker_name</code>, <code>min_max_range</code> and <code>erosion_factor</code>.

The marker <code>location</code> ("cytoplasm" or "nucleus") is read from the <code>markers</code> list defined above. For cytoplasmic markers, also set <code>cytoplasm_dilation_radius</code>.

In [92]:
# You can choose markers from the following list
markers

[('ki67', 0, 'nucleus'), ('neun', 1, 'nucleus'), ('calbindin', 2, 'cytoplasm')]

In [93]:
# Define the index of the nuclei channel (remember that Python counts from zero)
nuclei_channel = 3

# Type the marker you want to analyze from the list above
marker_name = 'neun'

# Define the intensity min_max range within which a cell is considered positive for a marker
# Useful e.g. for ignoring unspecific bright spots
min_max_range = (50, 115)

# Define your nuclei diameter, it speeds up nuclei detection, if unknown leave it as None
cellpose_nuclei_diameter = None

# Define the amount of blur applied to nuclei
# Blurs the mip_nuclei image to even out high intensity foci within the nucleus; higher values give more blur
# High values help segment sparse nuclei (CA and CTX regions) but can merge nuclei that are very close together (DG region)
gaussian_sigma = 0

# Sets the amount of erosion that is applied to areas where the marker+ signal colocalizes with nuclear signal
# The higher the value, the stricter the conditions to consider a nucleus as marker+
erosion_factor = 4

# Define the number of pixels you want to add around the nuclei to simulate the cytoplasm
cytoplasm_dilation_radius = 2

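# Retrieve the second and third values (channel and location) of the corresponding tuple in markers
for item in markers:
    if item[0] == marker_name:
        marker_channel = item[1]
        location = item[2]
        break  # Stop searching once the marker is found
else:
    # No tuple matched: fail early instead of silently reusing stale values
    raise ValueError(f"Marker '{marker_name}' not found in markers")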

# Slice the nuclei and marker stack
nuclei_img = img_mip[nuclei_channel, :, :]
marker_img = img_mip[marker_channel, :, :]
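To make the meaning of <code>min_max_range</code> concrete, the sketch below builds a boolean mask of the pixels whose intensity falls inside the range. It assumes both bounds are inclusive, which may not match what <code>segment_marker_positive_nuclei</code> does internally; <code>in_range_mask</code> and the toy values are illustrative only.

```python
import numpy as np

def in_range_mask(image, min_max_range):
    """Boolean mask of pixels whose intensity lies within [low, high] (both inclusive)."""
    low, high = min_max_range
    return (image >= low) & (image <= high)

# Made-up intensities: dim background, in-range signal and over-bright spots
toy_marker = np.array([[10, 50, 80],
                       [115, 116, 200]])

toy_mask = in_range_mask(toy_marker, (50, 115))
print(toy_mask)
# [[False  True  True]
#  [ True False False]]
```

The upper bound is what excludes the very bright pixels (116 and 200 here), which is how unspecific bright spots are ignored.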

<h3>Mask the input image with the user defined labels and extract data</h3>

In [94]:
# Construct ROI and nuclei predictions paths from directory_path above
roi_path = directory_path / "ROIs"
nuclei_preds_path = directory_path / "nuclei_preds" / segmentation_type / model_name

# Extract the experiment name from the data directory path
experiment_id = directory_path.name

# List of subfolder names
try:
    roi_names = [folder.name for folder in roi_path.iterdir() if folder.is_dir()]

except FileNotFoundError:
    roi_names = ["full_image"]
        
print(f"The following regions of interest will be analyzed: {roi_names}")

# Create an empty list to store all stats extracted from each image
stats = []

for roi_name in tqdm(roi_names):

    # Read the user defined ROIs, in case of full image analysis generate a label covering the entire image
    try:
        # Read previously defined ROIs
        user_roi = tifffile.imread(roi_path / roi_name / f"{filename}.tiff")

    except FileNotFoundError:
        # Extract the xy dimensions of the input image 
        img_shape = img_mip.shape
        img_xy_dims = img_shape[-2:]

        # Create a label covering the entire image
        user_roi = np.ones(img_xy_dims).astype(np.uint8)

    # Read previously predicted nuclei labels, if not present generate nuclei predictions and save them
    try:
        # Read the nuclei predictions per ROI
        nuclei_labels = tifffile.imread(nuclei_preds_path / roi_name / f"{filename}.tiff")

    except FileNotFoundError:

        # Slice the nuclei stack
        nuclei_img = img_mip[nuclei_channel, :, :]

        # We will create a mask where roi is greater than or equal to 1
        mask = (user_roi >= 1).astype(np.uint8)

        # Apply the mask to nuclei_img, setting all pixels outside the ROI to 0
        masked_nuclei_img = np.where(mask, nuclei_img, 0)

        # Segment nuclei and return labels
        nuclei_labels = segment_nuclei_2d(masked_nuclei_img, gaussian_sigma, cellpose_nuclei_diameter)

        # Save nuclei labels as .tiff files to reuse them later
        # Create the prediction directory (same path used for reading above) if it does not exist
        (nuclei_preds_path / roi_name).mkdir(parents=True, exist_ok=True)

        # Construct path to store
        path_to_store = nuclei_preds_path / roi_name / f"{filename}.tiff"

        # Save the nuclei label image
        tifffile.imwrite(path_to_store, nuclei_labels)

    # Add the predicted nuclei as labels into Napari
    viewer.add_labels(nuclei_labels, name=f"{roi_name}_nuclei")

    # Add the ROIs as labels into Napari
    viewer.add_labels(user_roi, name=f"{roi_name}_ROI", opacity=0.4)

    # Simulate a cytoplasm by growing the nuclei_labels
    if location == "cytoplasm":
        
        nuclei_labels = cle.dilate_labels(nuclei_labels, radius=cytoplasm_dilation_radius)
        nuclei_labels = cle.pull(nuclei_labels)

    # Select marker positive nuclei
    nuclei_and_marker, eroded_nuclei_and_marker, marker_mip, processed_region_labels = segment_marker_positive_nuclei(
        nuclei_labels, marker_img, min_max_range, erosion_factor
    )
    viewer.add_image(nuclei_and_marker, name=f"{roi_name}_{marker_name}_nuclei_coloc")
    viewer.add_image(eroded_nuclei_and_marker, name=f"{roi_name}_{marker_name}_nuclei_eroded")
    viewer.add_labels(processed_region_labels, name=f"{roi_name}_{marker_name}+_nuclei")

    # Extract your information of interest
    total_nuclei = len(np.unique(nuclei_labels)) - 1
    marker_pos_nuclei = len(np.unique(processed_region_labels)) - 1

    # Calculate "%_marker+_cells" and avoid division by zero errors
    try:
        perc_marker_pos_cells = (marker_pos_nuclei * 100) / total_nuclei
    except ZeroDivisionError:
        perc_marker_pos_cells = 0

    # Create a dictionary containing all extracted info per masked image
    stats_dict = {
                "filename": filename,
                "ROI": roi_name,
                "marker": marker_name,
                "marker_location":location,
                "total_nuclei": total_nuclei,
                "marker+_nuclei": marker_pos_nuclei,
                "%_marker+_cells": perc_marker_pos_cells,
                "nuclei_ch": nuclei_channel,
                "marker_ch": marker_channel,
                "marker_min_max": min_max_range,
                "cytoplasm_dilation":cytoplasm_dilation_radius,
                "erosion_factor": erosion_factor,
                "cellpose_nuclei_diameter": cellpose_nuclei_diameter,
                "gaussian_sigma": gaussian_sigma,
                "slicing_factor": slicing_factor
                }

    # Append the current data point to the stats list
    stats.append(stats_dict)

The following regions of interest will be analyzed: ['CA', 'DG']


100%|██████████| 2/2 [00:11<00:00,  5.57s/it]
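The loop above delegates the colocalization step to <code>segment_marker_positive_nuclei</code>, whose code is not shown in this notebook. The sketch below illustrates the general idea under stated assumptions: pixels that belong to a nucleus and fall inside the intensity range are kept, the resulting mask is eroded, and any nucleus that still overlaps the eroded mask counts as marker+. The NumPy-only erosion with a 4-connected neighbourhood and the names <code>erode</code> and <code>marker_positive_labels</code> are choices made for this example, not the helper's actual implementation.

```python
import numpy as np

def erode(mask, iterations):
    """Binary erosion with a 4-connected (cross-shaped) neighbourhood."""
    mask = mask.astype(bool)
    for _ in range(iterations):
        padded = np.pad(mask, 1, constant_values=False)
        # A pixel survives only if it and its four direct neighbours are set
        mask = (
            padded[1:-1, 1:-1]
            & padded[:-2, 1:-1] & padded[2:, 1:-1]
            & padded[1:-1, :-2] & padded[1:-1, 2:]
        )
    return mask

def marker_positive_labels(nuclei_labels, marker_img, min_max_range, erosion_factor):
    """Return the nuclei labels that still overlap in-range marker signal after erosion."""
    low, high = min_max_range
    coloc = (nuclei_labels > 0) & (marker_img >= low) & (marker_img <= high)
    eroded = erode(coloc, erosion_factor)
    positive = np.unique(nuclei_labels[eroded])
    return positive[positive != 0]

# Toy data: two 5 x 5 nuclei; only nucleus 1 is uniformly stained
toy_nuclei = np.zeros((7, 13), dtype=np.int32)
toy_nuclei[1:6, 1:6] = 1
toy_nuclei[1:6, 7:12] = 2
toy_marker = np.full((7, 13), 10)
toy_marker[1:6, 1:6] = 80   # nucleus 1: in range everywhere
toy_marker[3, 9] = 80       # nucleus 2: a single in-range pixel

toy_positive = marker_positive_labels(toy_nuclei, toy_marker, (50, 115), erosion_factor=1)
toy_total = np.unique(toy_nuclei[toy_nuclei != 0]).size
print(toy_positive, toy_total)  # [1] 2
```

The isolated pixel in nucleus 2 does not survive erosion, which is why a higher <code>erosion_factor</code> makes the criterion stricter. Counting labels after filtering out 0 also avoids relying on the background being present, unlike <code>len(np.unique(...)) - 1</code>.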


In [95]:
def unique_with_filtering(arr):
    """Return the unique values in an array, excluding the background label (0)."""
    return np.unique(arr[arr != 0])

positive_labels = unique_with_filtering(processed_region_labels)

positive_labels

array([   2,    3,    5,    7,   10,   11,   15,   20,   26,   27,   28,
         29,   33,   35,   38,   39,   41,   42,   46,   47,   48,   53,
         56,   58,   59,   62,   63,   64,   68,   71,   72,   73,   74,
         76,   77,   78,   81,   82,   83,   85,   86,   87,   89,   92,
         93,   95,   97,   98,   99,  100,  102,  103,  104,  105,  106,
        108,  109,  111,  113,  114,  116,  117,  121,  122,  124,  127,
        128,  129,  130,  131,  133,  134,  139,  141,  142,  143,  145,
        146,  148,  150,  151,  152,  153,  154,  155,  156,  158,  159,
        160,  161,  163,  164,  166,  167,  168,  171,  173,  174,  177,
        178,  183,  184,  185,  186,  188,  189,  190,  193,  194,  195,
        196,  197,  200,  202,  203,  209,  211,  212,  214,  215,  216,
        218,  219,  220,  222,  223,  227,  228,  230,  231,  232,  233,
        234,  235,  237,  238,  239,  241,  242,  243,  244,  245,  246,
        250,  252,  254,  255,  259,  261,  262,  2

In [96]:
# Get unique positive labels (after the loop, processed_region_labels and nuclei_labels hold the last ROI processed)
positive_labels = unique_with_filtering(processed_region_labels)

# Generate the label column with all labels
max_label = nuclei_labels.max()
label_column = np.arange(1, max_label + 1)

# Flag each label as True if it appears in positive_labels
channel_column = np.isin(label_column, positive_labels)

# Create the DataFrame to hold per label data
df = pd.DataFrame({
    "filename": filename,
    "ROI":roi_name,
    'label': label_column,
    marker_name: channel_column
})

df

                  filename ROI  label   neun
0     HI1_CONTRA_M8_S6_TR1  DG      1  False
1     HI1_CONTRA_M8_S6_TR1  DG      2   True
2     HI1_CONTRA_M8_S6_TR1  DG      3   True
3     HI1_CONTRA_M8_S6_TR1  DG      4  False
4     HI1_CONTRA_M8_S6_TR1  DG      5   True
...                    ... ...    ...    ...
1579  HI1_CONTRA_M8_S6_TR1  DG   1580  False
1580  HI1_CONTRA_M8_S6_TR1  DG   1581  False
1581  HI1_CONTRA_M8_S6_TR1  DG   1582  False
1582  HI1_CONTRA_M8_S6_TR1  DG   1583  False


In [97]:
max_label

1584

In [98]:
label_column

array([   1,    2,    3, ..., 1582, 1583, 1584])
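Building the label column with <code>np.arange(1, max_label + 1)</code> assumes that every integer up to <code>max_label</code> is actually present in the label image. If some labels are missing (for example after masking or filtering), those labels would be listed as negative nuclei that do not exist. A small sketch with made-up labels shows the difference between the two approaches:

```python
import numpy as np

# Toy label image in which labels 2, 3 and 5 are absent
toy_labels = np.array([[0, 1, 1],
                       [4, 4, 0],
                       [0, 6, 6]])
toy_positive = np.array([4])

# Range-based column: includes labels that are not in the image
by_range = np.arange(1, toy_labels.max() + 1)

# Presence-based column: only labels that actually occur (background 0 removed)
present = np.unique(toy_labels[toy_labels != 0])
is_positive = np.isin(present, toy_positive)

print(by_range)     # [1 2 3 4 5 6]
print(present)      # [1 4 6]
print(is_positive)  # [False  True False]
```

When the labels are contiguous, both columns are identical, so the choice only matters for label images with gaps.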

<h3>Data saving</h3>


In [99]:
# Define output folder for results
results_folder = Path("results") / experiment_id / segmentation_type / model_name

# Create the necessary folder structure if it does not exist
try:
    os.makedirs(str(results_folder))
    print(f"Output folder created: {results_folder}")
except FileExistsError:
    print(f"Output folder already exists: {results_folder}")

# Transform into a dataframe to store it as .csv later
df = pd.DataFrame(stats)

# Define the .csv path
csv_path = results_folder / "SP_marker_+_label_coloc.csv"

# Append the new data points to the .csv (re-running this cell appends the same rows again)
df.to_csv(csv_path, mode="a", index=False, header=not os.path.isfile(csv_path))

# Show the updated .csv 
csv_df = pd.read_csv(csv_path)

csv_df

Output folder already exists: results\test_data\2D\Cellpose


Unnamed: 0,filename,ROI,marker,marker_location,total_nuclei,marker+_nuclei,%_marker+_cells,nuclei_ch,marker_ch,marker_min_max,cytoplasm_dilation,erosion_factor,cellpose_nuclei_diameter,gaussian_sigma,slicing_factor
0,HI1_CONTRA_M8_S6_TR1,CA,neun,nucleus,1270,302,23.779528,3,1,"(50, 115)",2,4,,0,
1,HI1_CONTRA_M8_S6_TR1,DG,neun,nucleus,1584,881,55.618687,3,1,"(50, 115)",2,4,,0,
2,HI1_CONTRA_M8_S6_TR1,CA,neun,nucleus,1270,302,23.779528,3,1,"(50, 115)",2,4,,0,
3,HI1_CONTRA_M8_S6_TR1,DG,neun,nucleus,1584,881,55.618687,3,1,"(50, 115)",2,4,,0,
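In the table above, rows 2 and 3 repeat rows 0 and 1 because the cell was run twice and the .csv is opened in append mode. One possible way to handle this, sketched here with made-up rows, is to drop repeated entries after loading. Which columns identify a unique measurement is an assumption; here <code>filename</code>, <code>ROI</code> and <code>marker</code> are used, and analysis parameters such as <code>marker_min_max</code> could be added to the key if runs with different settings should be kept apart.

```python
import pandas as pd

# Made-up rows standing in for one run of the analysis
rows = [
    {"filename": "img_A", "ROI": "CA", "marker": "neun", "marker+_nuclei": 3},
    {"filename": "img_A", "ROI": "DG", "marker": "neun", "marker+_nuclei": 5},
]

# Appending the same run twice reproduces the duplication seen in the .csv
combined = pd.concat([pd.DataFrame(rows), pd.DataFrame(rows)], ignore_index=True)

# Keep only the most recent row for each (filename, ROI, marker) combination
key_columns = ["filename", "ROI", "marker"]
deduplicated = combined.drop_duplicates(subset=key_columns, keep="last").reset_index(drop=True)

print(len(combined), len(deduplicated))  # 4 2
```

Keeping the last occurrence means that a re-analysis with updated parameters replaces the older result for the same image, ROI and marker.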
