<h2>3D stack - Batch Processing - Marker+ based on colocalization</h2>

The following notebook processes 3D stacks (.czi or .nd2 files), generates their maximum intensity projection (MIP) and allows you to:

1. Read previously defined ROIs; if none are present, the full image is analyzed.
2. Read previously predicted nuclei labels; if none are present, they are generated.
3. Count the cells positive for a marker based on colocalization (using a user-defined threshold).
4. Save the number and % of positive cells in a .csv file (BP_marker_+_label_coloc).

In [None]:
from pathlib import Path
import tifffile
import os
from tqdm import tqdm
import numpy as np
import pandas as pd
from stardist.models import StarDist3D
import pyclesperanto_prototype as cle
from utils_stardist import get_gpu_details, list_images, read_image, extract_nuclei_stack, maximum_intensity_projection, segment_marker_positive_nuclei, segment_nuclei_3d

get_gpu_details()
cle.select_device("RTX")

<h3>Define the directory where your images are stored (.nd2 or .czi files)</h3>

In [None]:
# Copy the path where your images are stored. You can use absolute or relative paths to point to other disk locations
directory_path = Path("../raw_data/test_data")

# Define the channels you want to analyze using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
# Remember that Python starts counting from 0, so your first channel will be 0
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]

# List the .czi and .nd2 files in the directory
images = list_images(directory_path)

images
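
Because each entry in <code>markers</code> follows the <code>(channel_name, channel_nr, cellular_location)</code> structure, the channel and location of a marker can be looked up by name. A minimal sketch, assuming marker names are unique; <code>marker_lookup</code> is an illustrative name and not part of <code>utils_stardist</code>:

```python
# Same structure as the markers list defined above
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]

# Map each marker name to its (channel_nr, cellular_location) pair
marker_lookup = {name: (channel, location) for name, channel, location in markers}

# Retrieve the channel index and cellular location of one marker
channel, location = marker_lookup["calbindin"]
print(channel, location)
```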

<h3>Define your batch analysis parameters</h3>

If you have already generated nuclei predictions, make sure to input the same <code>slicing_factor</code> you used when generating them.

If you have not generated nuclei predictions before, input <code>nuclei_channel</code>, <code>n_tiles</code>, <code>segmentation_type</code> and <code>model_name</code> values.

In [None]:
# Explore each image to analyze (0 defines the first image in the directory)
image = images[0]

# Image size reduction to improve processing times (slicing discards pixels, it is not lossless compression)
slicing_factor = None # Use 2 or 4 to reduce the image size (None keeps the full resolution)

# Define the nuclei channel (remember that Python starts counting from zero)
nuclei_channel = 3
# Number of tiles per axis used during nuclei prediction to reduce GPU memory usage
n_tiles=(6,6,3)

# Segmentation type ("2D" or "3D"). 
# 2D takes a z-stack as input, performs MIP (Maximum Intensity Projection) and predicts nuclei from the resulting projection (faster, useful for single layers of cells)
# 3D is more computationally expensive. Predicts 3D nuclear volumes, useful for multilayered structures
segmentation_type = "3D"

# Nuclear segmentation model type ("Stardist")
# Choose your Stardist fine-tuned model (model_name) from stardist_models folder
model_name = "MEC0.1"

# Model loading 
model = StarDist3D(None, name=model_name, basedir='stardist_models') 
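
To illustrate what <code>slicing_factor</code> and the maximum intensity projection do to the array shape, here is a small NumPy sketch on synthetic data. It assumes the <code>(channel, z, y, x)</code> layout used later in this notebook and assumes that slicing keeps every n-th pixel along y and x; the actual behavior of <code>read_image</code> and <code>maximum_intensity_projection</code> in <code>utils_stardist</code> may differ.

```python
import numpy as np

# Synthetic stack with the (channel, z, y, x) layout used in this notebook
img = np.arange(4 * 3 * 8 * 8, dtype=np.uint16).reshape(4, 3, 8, 8)

slicing_factor = 2  # illustrative value; None would keep the full resolution

# Keep every n-th pixel along y and x (assumed behavior, not taken from read_image)
if slicing_factor is not None:
    img = img[:, :, ::slicing_factor, ::slicing_factor]

# Maximum intensity projection: collapse the z axis by taking the per-pixel maximum
img_mip = img.max(axis=1)

print(img.shape, img_mip.shape)
```

With a factor of 2 the number of pixels per plane drops to a quarter, which is why nuclei predictions generated at one slicing factor cannot be reused at another.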

Define in <code>parameters_per_marker</code> the <code>marker</code> you want to use to define your cell populations of interest, the <code>min_max</code> range of pixel intensity values and the <code>population</code> name.

In addition, set the <code>erosion_factor</code> and <code>cytoplasm_dilation_radius</code> for each marker you want to analyze.

In [None]:
# min_max range defines the pixel intensity range within which a cell is considered positive for a marker

# erosion_factor sets the amount of erosion that is applied to areas where the marker+ signal colocalizes with nuclear or cytoplasmic signal
# The higher the value, the stricter the conditions to consider a nucleus as marker+

# cytoplasm_dilation_radius sets the number of pixels you want to add around the nuclei to simulate the cytoplasm

parameters_per_marker = [{"marker": "ki67", "min_max_range": (200, 255), "population": "ki67", "erosion_factor":4, "cytoplasm_dilation_radius":0},
                      {"marker": "neun", "min_max_range": (50, 115), "population": "neun_low", "erosion_factor":4, "cytoplasm_dilation_radius":0},
                      {"marker": "neun", "min_max_range": (115, 255), "population": "neun_high", "erosion_factor":4, "cytoplasm_dilation_radius":0},
                      {"marker": "calbindin", "min_max_range": (65, 255),"population": "calbindin", "erosion_factor":4,  "cytoplasm_dilation_radius":2}]
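
The idea behind <code>min_max_range</code> can be sketched with plain NumPy: a label counts as positive when it overlaps pixels whose intensity falls inside the range. This is a simplified illustration on synthetic 2D data, not the implementation of <code>segment_marker_positive_nuclei</code>; the erosion step is omitted and <code>count_marker_positive</code> is a hypothetical helper.

```python
import numpy as np

def count_marker_positive(labels, marker_img, min_max_range):
    """Count labels overlapping pixels whose intensity lies within min_max_range."""
    low, high = min_max_range
    # Pixels whose marker intensity falls inside the (inclusive) range
    in_range = (marker_img >= low) & (marker_img <= high)
    # Labels touched by at least one in-range pixel, background (0) excluded
    positive_ids = np.unique(labels[in_range])
    positive_ids = positive_ids[positive_ids != 0]
    total = len(np.unique(labels[labels != 0]))
    return len(positive_ids), total

# Three synthetic nuclei (labels 1, 2, 3) and a matching marker intensity image
labels = np.array([[1, 1, 0, 2, 2],
                   [1, 1, 0, 2, 2],
                   [0, 0, 0, 0, 0],
                   [3, 3, 0, 0, 0]])
marker = np.array([[210, 220, 0, 60, 70],
                   [205, 230, 0, 80, 90],
                   [0, 0, 0, 0, 0],
                   [10, 20, 0, 0, 0]])

positive, total = count_marker_positive(labels, marker, (200, 255))
percentage = positive * 100 / total if total else 0
print(positive, total, percentage)
```

Defining two populations on the same marker with adjacent ranges, as done above for <code>neun_low</code> and <code>neun_high</code>, applies this selection twice with different bounds.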

<h3>Run Batch Analysis</h3>
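
The batch cell below restricts nuclei segmentation to each ROI by turning the 2D ROI label into a binary mask, repeating it along z and zeroing every pixel outside it. The following sketch reproduces that step on a small synthetic stack; the array sizes and values are made up for illustration.

```python
import numpy as np

# Synthetic nuclei stack with (z, y, x) layout, every pixel set to 7
nuclei_img = np.full((3, 4, 4), 7, dtype=np.uint8)

# 2D ROI label image: pixels >= 1 belong to the ROI
user_roi = np.zeros((4, 4), dtype=np.uint8)
user_roi[1:3, 1:3] = 1

# Binary mask of the ROI
mask = (user_roi >= 1).astype(np.uint8)

# Repeat the 2D mask once per z-slice so it matches the stack shape
mask_3d = np.tile(mask, (nuclei_img.shape[0], 1, 1))

# Keep intensities inside the ROI and set everything else to 0
masked_nuclei_img = np.where(mask_3d, nuclei_img, 0)

print(masked_nuclei_img.shape, int(masked_nuclei_img.sum()))
```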

In [None]:
# Construct ROI and nuclei predictions paths from directory_path above
roi_path = directory_path / "ROIs"
nuclei_preds_path =  directory_path / "nuclei_preds" / segmentation_type / model_name

# Extract the experiment name from the data directory path
experiment_id = directory_path.name

# List of subfolder names
try:
    roi_names = [folder.name for folder in roi_path.iterdir() if folder.is_dir()]

except FileNotFoundError:
    roi_names = ["full_image"]
        
print(f"The following regions of interest will be analyzed: {roi_names}")

for image in tqdm(images):

    # Create an empty list to store all stats extracted from each image
    stats = []

    # Read the image stack (applying the slicing factor) and extract the filename
    img, filename = read_image(image, slicing_factor)

    # Generate maximum intensity projection 
    img_mip = maximum_intensity_projection(img)

    for roi_name in roi_names:

        print(f"\nAnalyzing ROI: {roi_name}")

        # Read the user-defined ROIs; for full image analysis, generate a label covering the entire image
        try:
            # Read previously defined ROIs
            user_roi = tifffile.imread(roi_path / roi_name / f"{filename}.tiff")

        except FileNotFoundError:
            # Extract the xy dimensions of the input image 
            img_shape = img_mip.shape
            img_xy_dims = img_shape[-2:]

            # Create a label covering the entire image
            user_roi = np.ones(img_xy_dims).astype(np.uint8)

        # Read previously predicted nuclei labels, if not present generate nuclei predictions and save them
        try:
            # Read the nuclei predictions per ROI
            nuclei_labels = tifffile.imread(nuclei_preds_path / roi_name / f"{filename}.tiff")
            print(f"Pre-computed nuclei labels found for {filename}")

        except FileNotFoundError:
            print(f"Generating nuclei labels for {filename}")

            # Slice the nuclei stack
            nuclei_img = extract_nuclei_stack(img, nuclei_channel)

            # We will create a mask where roi is greater than or equal to 1
            mask = (user_roi >= 1).astype(np.uint8)

            # 3D segmentation logic, extend 2D mask across the entire stack volume
            if segmentation_type == "3D":
                # Extract the number of z-slices to extend the mask
                slice_nr = img.shape[1]
                # Extend the mask across the entire volume
                mask = np.tile(mask, (slice_nr, 1, 1))

            # Apply the mask to nuclei_img, setting all other pixels to 0
            masked_nuclei_img = np.where(mask, nuclei_img, 0)

            # Segment nuclei and return labels
            nuclei_labels = segment_nuclei_3d(masked_nuclei_img, model, n_tiles)

            # Save nuclei labels as .tiff files to reuse them later
            try:
                os.makedirs(nuclei_preds_path / roi_name, exist_ok=True)
            except Exception as e:
                print(f"Error creating directory {nuclei_preds_path / roi_name}: {e}")

            # Construct path to store
            path_to_store = nuclei_preds_path / roi_name / f"{filename}.tiff"
            print(f"Saving nuclei labels to {path_to_store}")
            try:
                tifffile.imwrite(path_to_store, nuclei_labels)
            except Exception as e:
                print(f"Error saving file {path_to_store}: {e}")

        for marker_parameters in parameters_per_marker:

            # Extract info from list of dictionaries and open marker_img
            marker_name = marker_parameters["marker"]
            min_max_range = marker_parameters["min_max_range"]
            population = marker_parameters["population"]
            erosion_factor = marker_parameters["erosion_factor"]
            cytoplasm_dilation_radius = marker_parameters["cytoplasm_dilation_radius"]

            print(f"Analyzing marker/population: {marker_name}/{population}")

            # Retrieve the second and third values (channel and location) of the corresponding tuple in markers
            for item in markers:
                if item[0] == marker_name:
                    marker_channel = item[1]
                    location = item[2]
                    break  # Stop searching once the marker is found

            # Access the corresponding marker intensity image
            marker_img = img[marker_channel, :, :, :]

            # Simulate a cytoplasm by growing the nuclei labels if the marker location is cytoplasmic
            # The dilated labels are stored separately so nuclei_labels stays unchanged for the following markers
            marker_labels = nuclei_labels
            if location == "cytoplasm":
                print(f"Generating cytoplasm labels for: {marker_name}")
                marker_labels = cle.dilate_labels(nuclei_labels, radius=cytoplasm_dilation_radius)
                marker_labels = cle.pull(marker_labels)

            # Select marker positive nuclei
            nuclei_and_marker, eroded_nuclei_and_marker, marker_mip, processed_region_labels = segment_marker_positive_nuclei(marker_labels, marker_img, min_max_range, erosion_factor)

            # Extract your information of interest (subtract 1 to exclude the background label)
            total_nuclei = len(np.unique(nuclei_labels)) - 1
            marker_pos_nuclei = len(np.unique(processed_region_labels)) - 1

            # Calculate "%_marker+_cells" and avoid division by zero errors
            try:
                perc_marker_pos_cells = (marker_pos_nuclei * 100) / total_nuclei
            except ZeroDivisionError:
                perc_marker_pos_cells = 0

            # Create a dictionary containing all extracted info per masked image
            stats_dict = {
                        "filename": filename,
                        "ROI": roi_name,
                        "population": population,
                        "marker": marker_name,
                        "marker_location":location,
                        "total_nuclei": total_nuclei,
                        "marker+_nuclei": marker_pos_nuclei,
                        "%_marker+_cells": perc_marker_pos_cells,
                        "nuclei_ch": nuclei_channel,
                        "marker_ch": marker_channel,
                        "min_max_avg_int": min_max_range,
                        "cytoplasm_dilation":cytoplasm_dilation_radius,
                        "erosion_factor": erosion_factor,
                        "slicing_factor": slicing_factor
                        }

            # Append the current data point to the stats list
            stats.append(stats_dict)

        # Define output folder for results
        results_folder = Path("results") / experiment_id / segmentation_type / model_name

        # Create the necessary folder structure if it does not exist
        results_folder.mkdir(parents=True, exist_ok=True)

    # Transform into a dataframe to store it as .csv later
    df = pd.DataFrame(stats)

    # Define the .csv path
    csv_path = results_folder / "BP_marker_+_label_coloc.csv"

    # Append the new data points to the .csv after each image (the header is written only once)
    df.to_csv(csv_path, mode="a", index=False, header=not os.path.isfile(csv_path))

# Show the updated .csv 
csv_df = pd.read_csv(csv_path)

csv_df    
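
The results are written with an append-per-image pattern: each image adds its rows to the same .csv and the header is written only when the file does not exist yet. A self-contained sketch of that pattern, using a temporary directory and made-up values (<code>example.csv</code> and the row contents are illustrative only):

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "example.csv"  # hypothetical file name

    for filename in ["image_a", "image_b"]:
        # One row of made-up stats per image
        df = pd.DataFrame([{"filename": filename, "total_nuclei": 10, "marker+_nuclei": 4}])
        # Append to the file and write the header only on the first pass
        df.to_csv(csv_path, mode="a", index=False, header=not csv_path.is_file())

    # Read everything back before the temporary directory is removed
    combined = pd.read_csv(csv_path)

print(combined)
```

Because the file is opened in append mode, running the batch analysis again with the same results folder adds the same rows a second time; remove or rename the existing .csv first if you want a fresh table.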