<h2>3D stack - Single image - Marker+ based on average intensity</h2>

This notebook processes a 3D stack (.czi or .nd2 files) and allows you to:

1. Inspect your images in Napari.
2. Define ROIs if needed.
3. Read previously defined ROIs; if none are present, the full image is analyzed.
4. Read previously predicted nuclei labels; if none are present, they are generated.
5. Plot cytoplasmic and nuclear average intensities of all markers in all ROIs, to help you set classification thresholds later.
6. Extract the number of cells positive for each marker, based on the average signal intensity within the nuclear or cytoplasmic compartment (using a user-defined min-max range).
7. Display positive cells in Napari.
8. Extract and save per-label, per-ROI, per-marker data in a .csv file (filename_per_label_avg_int.csv).
9. Extract and save the number of positive cells in a .csv file (SP_marker_+_label_avg_int.csv).

In [27]:
from pathlib import Path
import tifffile
import napari
import os
from tqdm import tqdm
import numpy as np
import pandas as pd
from skimage.measure import regionprops_table
import plotly.express as px
from stardist.models import StarDist3D
from utils_stardist import get_gpu_details, list_images, read_image, extract_nuclei_stack, maximum_intensity_projection, save_rois, simulate_cytoplasm_chunked_3d, segment_nuclei_3d

get_gpu_details()

Device name: /device:GPU:0
Device type: GPU
GPU model: device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9


<h3>Define the directory for your images (.nd2 or .czi files) and cell marker info</h3>

In [28]:
# Copy the path where your images are stored, you can use absolute or relative paths to point at other disk locations
directory_path = Path("../raw_data/test_data")

# Define the channels you want to analyze using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
# Remember in Python one starts counting from 0, so your first channel will be 0
# markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]

# List the .czi and .nd2 files found in directory_path
images = list_images(directory_path)

images

['..\\raw_data\\test_data\\HI 1  Contralateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 1.czi',
 '..\\raw_data\\test_data\\HI 1  Contralateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 2.czi',
 '..\\raw_data\\test_data\\HI 1  Contralateral Mouse 8  slide 7 Neun Red Calb Green KI67 Magenta 40x technical replica 1.czi',
 '..\\raw_data\\test_data\\HI 1  Contralateral Mouse 8 slide 7 Neun Red Calb Green KI67 Magenta 40x technical replica 2.czi',
 '..\\raw_data\\test_data\\HI 1  Ipsilateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 1.czi',
 '..\\raw_data\\test_data\\HI 1  Ipsilateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 2.czi',
 '..\\raw_data\\test_data\\HI 1 Ipsilateral Mouse 8 slide 7 Neun Red Calb Green KI67 Magenta 40x technical replica 1.czi',
 '..\\raw_data\\test_data\\HI 2  Ipsilateral Mouse 10 slide 10 Neun Red Calb Green KI67 Magenta 40x technical replica 2.
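
Because each <code>markers</code> entry is a <code>(channel_name, channel_nr, cellular_location)</code> tuple with zero-based channel numbers, a quick sanity check before running the analysis can catch typos early. The sketch below is not part of the notebook: <code>validate_markers</code> and its <code>n_channels</code> argument are hypothetical names, and the set of allowed locations is taken from the two values the notebook handles (<code>"nucleus"</code> and <code>"cytoplasm"</code>).

```python
def validate_markers(markers, n_channels):
    """Return a list of human-readable problems found in a markers list (hypothetical helper)."""
    allowed_locations = {"nucleus", "cytoplasm"}
    problems = []
    seen_names = set()
    for entry in markers:
        if len(entry) != 3:
            problems.append(f"{entry!r}: expected (channel_name, channel_nr, cellular_location)")
            continue
        name, channel_nr, location = entry
        if name in seen_names:
            problems.append(f"{name}: duplicated channel name")
        seen_names.add(name)
        # Channel numbers are zero-based, so valid values are 0 .. n_channels - 1
        if not isinstance(channel_nr, int) or not 0 <= channel_nr < n_channels:
            problems.append(f"{name}: channel {channel_nr!r} is outside 0..{n_channels - 1}")
        if location not in allowed_locations:
            problems.append(f"{name}: unknown location {location!r}")
    return problems


markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]
problems = validate_markers(markers, n_channels=4)
print(problems if problems else "markers look consistent")
```

Once an image is loaded, <code>n_channels</code> could be read from the first axis of the array; the test image in this notebook has four channels, with the nuclear stain on channel 3.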

<h3>Open each image in the directory</h3>
You can do so by changing the number within the brackets below <code>image = images[0]</code>. Make sure to input the same <code>slicing factor</code> you used when generating nuclei predictions. 

If you have not generated nuclei predictions yet, also set <code>nuclei_channel</code>, <code>n_tiles</code>, <code>segmentation_type</code> and <code>model_name</code>; these are the parameters the cell below uses to generate them.
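
For intuition about what a slicing factor does to the data, the sketch below keeps every n-th pixel along the two spatial axes using plain NumPy strides. This illustrates the general technique only and is not a copy of <code>read_image</code>, whose implementation is not shown here; <code>downsample_xy</code> is a hypothetical name and the <code>(C, Z, Y, X)</code> axis order is an assumption based on how the notebook indexes the array.

```python
import numpy as np


def downsample_xy(stack, slicing_factor=None):
    """Keep every n-th pixel in Y and X of a (C, Z, Y, X) stack (hypothetical helper)."""
    if slicing_factor is None:
        return stack  # No size reduction requested
    return stack[..., ::slicing_factor, ::slicing_factor]


# Small synthetic stack with the assumed axis order (C, Z, Y, X)
stack = np.zeros((4, 14, 400, 300), dtype=np.uint8)
print(downsample_xy(stack, None).shape)
print(downsample_xy(stack, 2).shape)
print(downsample_xy(stack, 4).shape)
```

Labels predicted on a downsampled stack only line up with an image downsampled by the same factor, which is why the same value has to be reused here.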

In [29]:
# Explore each image to analyze (0 defines the first image in the directory)
image = images[0]

# Image size reduction to improve processing times (slicing discards pixels, it is not lossless compression)
slicing_factor = None # Use 2 or 4 to downsample (None keeps the full resolution)

# Define the nuclei channel ('Remember in Python one starts counting from zero')
nuclei_channel = 3

# Number of tiles per axis used during nuclei prediction (tiling lowers GPU memory use)
n_tiles = (6, 6, 3)

# Segmentation type ("2D" or "3D"). 
# 2D takes a z-stack as input, performs MIP (Maximum Intensity Projection) and predicts nuclei from the resulting projection (faster, useful for single layers of cells)
# 3D is more computationally expensive. Predicts 3D nuclear volumes, useful for multilayered structures
segmentation_type = "3D"

# Nuclear segmentation model type ("Stardist")
# Choose your Stardist fine-tuned model (model_name) from stardist_models folder
model_name = "MEC0.1"

# Model loading 
model = StarDist3D(None, name=model_name, basedir='stardist_models') 

# Read the image stack and extract the filename
img, filename = read_image(image, slicing_factor)

# Slice the nuclei stack
nuclei_img = extract_nuclei_stack(img, nuclei_channel)

# Generate maximum intensity projection 
img_mip = maximum_intensity_projection(img)

# Show image in Napari
viewer = napari.Viewer(ndisplay=2)
viewer.add_image(img_mip)

Loading network weights from 'weights_best.h5'.
Loading thresholds from 'thresholds.json'.
Using default values: prob_thresh=0.583933, nms_thresh=0.3.


Image analyzed: HI 1  Contralateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 1
Original Array shape: (4, 14, 3803, 2891)
Compressed Array shape: (4, 14, 3803, 2891)


<Image layer 'img_mip' at 0x2f6feb4ceb0>
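
A maximum intensity projection collapses the z axis by keeping the brightest value at each xy position. Below is a minimal NumPy sketch, assuming the <code>(C, Z, Y, X)</code> axis order suggested by the array shape printed above; <code>mip_over_z</code> is a hypothetical stand-in and may differ from <code>maximum_intensity_projection</code> in <code>utils_stardist</code>.

```python
import numpy as np


def mip_over_z(stack, z_axis=1):
    """Collapse the z axis by taking the per-pixel maximum (hypothetical helper)."""
    return stack.max(axis=z_axis)


# Synthetic (C, Z, Y, X) stack: 2 channels, 3 z-slices, 2 x 2 pixels
stack = np.zeros((2, 3, 2, 2), dtype=np.uint8)
stack[0, 1, 0, 0] = 200   # bright voxel in the middle slice of channel 0
stack[1, 2, 1, 1] = 50    # dimmer voxel in the last slice of channel 1

projection = mip_over_z(stack)
print(projection.shape)      # (2, 2, 2): one 2D image per channel
print(projection[0, 0, 0])   # 200
print(projection[1, 1, 1])   # 50
```

The projection is convenient for drawing ROIs in 2D, but it discards depth information, so overlapping cells from different z-slices can merge into one bright spot.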

<h3>Label your regions of interest in Napari and explore the signal of your marker of interest</h3>

Make sure to set <code>n edit dim = 3</code> so the label propagates across all channels. Name your regions of interest, for example <code>DG</code>, <code>CA1</code>, <code>CA3</code> or <code>HIPPO</code>. If you do not draw any ROI, the entire image will be analyzed.

<video controls>
  <source src="../assets/napari_labels.mp4" type="video/mp4">
  Your browser does not support the video tag.
</video>

<h3>Save user-defined label ROIs as .tiff files</h3>

In [30]:
save_rois(viewer, directory_path, filename)

No user-defined ROIs have been stored
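
Judging from how the later cells read the files back, ROI label images are expected under <code>ROIs/{roi_name}/{filename}.tiff</code> and nuclei predictions under <code>nuclei_preds/{segmentation_type}/{model_name}/{roi_name}/{filename}.tiff</code>, both inside the data directory. The sketch below only builds those paths so the layout is easy to check; <code>expected_paths</code> is a hypothetical helper, and the layout written by <code>save_rois</code> itself is an assumption inferred from that reading code.

```python
from pathlib import Path


def expected_paths(directory_path, filename, roi_name, segmentation_type, model_name):
    """Build the ROI and nuclei-label paths read by the analysis cell (hypothetical helper)."""
    directory_path = Path(directory_path)
    roi_file = directory_path / "ROIs" / roi_name / f"{filename}.tiff"
    nuclei_file = (directory_path / "nuclei_preds" / segmentation_type
                   / model_name / roi_name / f"{filename}.tiff")
    return roi_file, nuclei_file


# "example_image" and "CA1" are placeholder names used for illustration only
roi_file, nuclei_file = expected_paths(
    "../raw_data/test_data", "example_image", "CA1", "3D", "MEC0.1"
)
print(roi_file.as_posix())
print(nuclei_file.as_posix())
```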


<h3>Extract average intensities of markers within each defined cell compartment and ROI</h3>
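
The average intensity of a label is the mean of the marker-channel voxels that fall under that label. The notebook obtains it with <code>regionprops_table</code>; the sketch below computes the same quantity with plain NumPy on a toy array so the numbers can be checked by hand. <code>mean_intensity_per_label</code> is a hypothetical name used only for this illustration.

```python
import numpy as np


def mean_intensity_per_label(label_image, intensity_image):
    """Return {label: mean intensity} for every non-zero label (hypothetical helper)."""
    labels = label_image.ravel()
    values = intensity_image.ravel().astype(np.float64)
    sums = np.bincount(labels, weights=values)   # summed intensity per label id
    counts = np.bincount(labels)                 # voxel count per label id
    return {int(lab): float(sums[lab] / counts[lab])
            for lab in np.unique(labels) if lab != 0}


# Toy 2D example: label 1 covers two pixels, label 2 covers one pixel
label_image = np.array([[1, 1, 0],
                        [0, 2, 0]])
intensity_image = np.array([[10, 30, 99],
                            [99, 80, 99]], dtype=np.uint8)

result = mean_intensity_per_label(label_image, intensity_image)
print(result)   # {1: 20.0, 2: 80.0}
```

Background pixels (label 0) are ignored, so bright signal outside any label does not influence the per-cell averages.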

In [31]:
# Add the 3D-stack into Napari
if segmentation_type == "3D":
    # Remove the 'img_mip' layer if it exists
    if 'img_mip' in viewer.layers:
        viewer.layers.remove('img_mip')
    # Add the 'img' stack
    viewer.add_image(img)

# Construct ROI and nuclei predictions paths from directory_path above
roi_path = directory_path / "ROIs"
nuclei_preds_path =  directory_path / "nuclei_preds" / segmentation_type / model_name

# Extract the experiment name from the data directory path
experiment_id = directory_path.name

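# List the ROI subfolder names; analyze the full image if no ROI folder or subfolder is found
try:
    roi_names = [folder.name for folder in roi_path.iterdir() if folder.is_dir()]
except FileNotFoundError:
    roi_names = []
if not roi_names:
    roi_names = ["full_image"]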

print(f"The following regions of interest will be analyzed: {roi_names}")

# Initialize an empty list to hold the extracted dataframes on a per ROI basis
per_roi_props = []

for roi_name in tqdm(roi_names):
    print(f"\nAnalyzing ROI: {roi_name}")

    # Initialize an empty list to hold the extracted dataframes on a per channel basis
    props_list = []

    # Read the user defined ROIs, in case of full image analysis generate a label covering the entire image
    try:
        # Read previously defined ROIs
        user_roi = tifffile.imread(roi_path / roi_name / f"{filename}.tiff")
    except FileNotFoundError:
        # Extract the xy dimensions of the input image 
        img_shape = img.shape
        img_xy_dims = img_shape[-2:]

        # Create a label covering the entire image
        user_roi = np.ones(img_xy_dims).astype(np.uint8)
 
    # Read previously predicted nuclei labels, if not present generate nuclei predictions and save them
    try:
        # Read the nuclei predictions per ROI
        nuclei_labels = tifffile.imread(nuclei_preds_path / roi_name / f"{filename}.tiff")
        print(f"Pre-computed nuclei labels found for {filename}")
    except FileNotFoundError:
        print(f"Generating nuclei labels for {filename}")

        # Create a binary mask that is 1 wherever the ROI label is greater than or equal to 1
        mask = (user_roi >= 1).astype(np.uint8)

        # 3D segmentation logic, extend 2D mask across the entire stack volume
        if segmentation_type == "3D":
            # Extract the number of z-slices to extend the mask
            slice_nr = img.shape[1]
            # Extend the mask across the entire volume
            mask = np.tile(mask, (slice_nr, 1, 1))

        # Apply the mask to nuclei_img, setting all other pixels to 0
        masked_nuclei_img = np.where(mask, nuclei_img, 0)

        # Segment nuclei and return labels
        nuclei_labels = segment_nuclei_3d(masked_nuclei_img, model, n_tiles)

        # Save nuclei labels as .tiff files to reuse them later
        try:
            os.makedirs(nuclei_preds_path / roi_name, exist_ok=True)
        except Exception as e:
            print(f"Error creating directory {nuclei_preds_path / roi_name}: {e}")

        # Construct path to store
        path_to_store = nuclei_preds_path / roi_name / f"{filename}.tiff"
        print(f"Saving nuclei labels to {path_to_store}")
        try:
            tifffile.imwrite(path_to_store, nuclei_labels)
        except Exception as e:
            print(f"Error saving file {path_to_store}: {e}")

    # Add the predicted nuclei as labels into Napari
    viewer.add_labels(nuclei_labels, name=f"{roi_name}_nuclei")

    # Add the ROIs as labels into Napari
    viewer.add_labels(user_roi, name=f"{roi_name}_ROI", opacity=0.4)

    # Create a dictionary containing all image descriptors
    descriptor_dict = {"filename": filename, "ROI": roi_name}

    # Loop through each channel and extract the average intensity within either nuclei or cytoplasmic regions
    for channel_name, ch_nr, location in tqdm(markers):
        print(f"Analyzing channel: {channel_name}")

        if location == "cytoplasm":
            print(f"Generating cytoplasm labels for: {channel_name}")
            # Simulate a cytoplasm by dilating the nuclei and subtracting the nuclei mask afterwards
            cytoplasm_labels = simulate_cytoplasm_chunked_3d(nuclei_labels, dilation_radius=2, erosion_radius=0, chunk_size=(1, 1024, 1024))
            # Add the predicted cytoplasm labels into Napari
            viewer.add_labels(cytoplasm_labels, name=f"{roi_name}_cytoplasm")

            # Extract intensity information from each marker channel
            props = regionprops_table(label_image=cytoplasm_labels,
                                      intensity_image=img[ch_nr],
                                      properties=["label", "intensity_mean"])
        elif location == "nucleus":
            # Extract intensity information from each marker channel
            props = regionprops_table(label_image=nuclei_labels,
                                      intensity_image=img[ch_nr],
                                      properties=["label", "intensity_mean"])

        # Convert to dataframe
        props_df = pd.DataFrame(props)

        # Rename the intensity_mean column to indicate the cell compartment and marker
        props_df.rename(columns={"intensity_mean": f"{location}_{channel_name}_avg_int"}, inplace=True)

        # Append each props_df to props_list
        props_list.append(props_df)

    # Initialize the df with the first df in the list
    props_df = props_list[0]
    # Start looping from the second df in the list
    for df in props_list[1:]:
        props_df = props_df.merge(df, on="label")

    # Add each key-value pair from descriptor_dict to props_df at the specified position
    insertion_position = 0
    for key, value in descriptor_dict.items():
        props_df.insert(insertion_position, key, value)
        insertion_position += 1  # Increment position to maintain the order of keys in descriptor_dict

    # Append each props_df to per_roi_props
    per_roi_props.append(props_df)

# Concatenate all per_roi_props into final_df
final_df = pd.concat(per_roi_props, ignore_index=True)


The following regions of interest will be analyzed: ['full_image']


  0%|          | 0/1 [00:00<?, ?it/s]


Analyzing ROI: full_image
Pre-computed nuclei labels found for HI 1  Contralateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 1




Analyzing channel: ki67




Analyzing channel: neun




Analyzing channel: calbindin
Generating cytoplasm labels for: calbindin


100%|██████████| 3/3 [00:05<00:00,  1.92s/it]
100%|██████████| 1/1 [00:07<00:00,  7.05s/it]
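
The cytoplasm labels are described in the code comments as the nuclei dilated and then stripped of the nuclear mask. The internals of <code>simulate_cytoplasm_chunked_3d</code> are not shown, so the following is only a 2D sketch of that idea with plain NumPy and a square neighbourhood. <code>ring_around_labels</code> is a hypothetical name, and the way the real function resolves touching nuclei or processes chunks may differ.

```python
import numpy as np


def ring_around_labels(labels, radius=1):
    """Grow each label by `radius` pixels and keep only the grown part (hypothetical helper)."""
    height, width = labels.shape
    padded = np.pad(labels, radius)  # zero padding so shifts never wrap around
    dilated = labels.copy()
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[radius + dy:radius + dy + height,
                             radius + dx:radius + dx + width]
            # Fill still-empty pixels with a neighbouring label, first come first served
            dilated = np.where(dilated == 0, shifted, dilated)
    # Remove the original nuclei so only the surrounding ring remains
    return np.where(labels > 0, 0, dilated)


nuclei = np.zeros((5, 5), dtype=np.int32)
nuclei[2, 2] = 7  # a single one-pixel "nucleus" with label 7

ring = ring_around_labels(nuclei, radius=1)
print(ring)
```

In this sketch the ring keeps the label id of its nucleus, which is what allows nuclear and cytoplasmic measurements to be joined on the <code>label</code> column, as the notebook does when merging the per-channel tables.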


<h3>Save per label per ROI per marker data</h3>

In [32]:
# Create a 'results' folder in the root directory
results_folder = Path("results") / experiment_id / segmentation_type / model_name

try:
    os.makedirs(results_folder)
    print(f"'{results_folder}' folder created successfully.")
except FileExistsError:
    print(f"'{results_folder}' folder already exists.")

# Save the df containing per_label results into a CSV file
final_df.to_csv(results_folder / f'{filename}_per_label_avg_int.csv')

'results\test_data\3D\MEC0.1' folder created successfully.


<h3>Plot average intensities</h3>

In [33]:
# Select all column names in 'final_df' that contain the substring 'avg_int'
avg_int_columns = [col for col in final_df.columns if 'avg_int' in col]

# Loop over all extracted channel average intensities
for column_name in avg_int_columns:

    # Plot the average_intensity distribution in order to make an informed decision on the threshold
    for roi_name in roi_names:

        # Filter rows in final_df where ROI matches roi_name 
        filtered_df = final_df[(final_df["ROI"] == roi_name)]

        # Get the average intensity values of the current column for this ROI
        avg_int_values = filtered_df[column_name]

        # Plot a histogram with 256 bins and a title indicating the column and ROI
        fig = px.histogram(avg_int_values, nbins=256, 
                           title=f"{column_name}_in_{roi_name}",
                           labels={'value': column_name, 'count': 'Frequency'},
                           range_x=[0, 255])
        
        # Show the plot
        fig.show() 
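
The histogram range of 0 to 255 matches the 8-bit test images. For other bit depths, the upper bound can be derived from the image dtype instead of being hard-coded. Below is a small NumPy sketch, where <code>intensity_histogram</code> is a hypothetical helper and the sample values are synthetic:

```python
import numpy as np


def intensity_histogram(values, dtype, nbins=256):
    """Histogram of average intensities over the full range of an integer dtype (hypothetical)."""
    upper = np.iinfo(dtype).max  # 255 for uint8, 65535 for uint16
    counts, edges = np.histogram(values, bins=nbins, range=(0, upper))
    return counts, edges


# Synthetic per-label averages: a dim population and a brighter one
values = np.array([12.0, 15.5, 18.0, 20.0, 120.0, 130.5, 140.0])

counts, edges = intensity_histogram(values, np.uint8)
print(edges[0], edges[-1])   # 0.0 255.0
print(int(counts.sum()))     # 7
```

A gap between the dim and bright groups in such a histogram is a natural place to put the lower bound of a positive population.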

<h3>Select cells positive for a marker based on average intensity</h3>

Define in <code>min_max_per_marker</code> the <code>marker</code> you want to use to define your cell populations of interest, the <code>min_max</code> range of avg_int and the <code>population</code> name.
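
Each entry pairs a marker with an intensity range and a population name, so one marker can appear several times with different ranges. The check below is a hedged sketch and not part of the notebook: <code>check_populations</code> is a hypothetical helper that flags unknown markers, inverted ranges and repeated population names before the classification cell runs.

```python
def check_populations(min_max_per_marker, markers):
    """Return a list of problems found in the population definitions (hypothetical helper)."""
    known_markers = {name for name, _, _ in markers}
    problems = []
    seen_populations = set()
    for entry in min_max_per_marker:
        marker, (low, high) = entry["marker"], entry["min_max"]
        population = entry["population"]
        if marker not in known_markers:
            problems.append(f"{population}: marker {marker!r} is not in markers")
        if low >= high:
            problems.append(f"{population}: min {low} is not below max {high}")
        if population in seen_populations:
            problems.append(f"{population}: population name used more than once")
        seen_populations.add(population)
    return problems


markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]
min_max_per_marker = [
    {"marker": "ki67", "min_max": (110, 255), "population": "ki67"},
    {"marker": "neun", "min_max": (20, 80), "population": "neun_low"},
    {"marker": "neun", "min_max": (80, 255), "population": "neun_high"},
    {"marker": "calbindin", "min_max": (25, 255), "population": "calbindin"},
]
print(check_populations(min_max_per_marker, markers))   # []
```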


In [34]:
# You can choose markers from the following list
markers

[('ki67', 0, 'nucleus'), ('neun', 1, 'nucleus'), ('calbindin', 2, 'cytoplasm')]

In [35]:
# Create a list of dictionaries defining the min-max avg_int range for each population
# Several populations can be defined for the same marker (e.g. neun_high and neun_low)
# Max values are set to 255 since the test input images are 8-bit; higher bit depths can result in higher max avg_int values

min_max_per_marker = [{"marker": "ki67", "min_max": (110,255), "population":"ki67"},
                      {"marker": "neun", "min_max": (20,80), "population":"neun_low"},
                      {"marker": "neun", "min_max": (80,255), "population":"neun_high"},
                      {"marker": "calbindin", "min_max": (25,255), "population":"calbindin"},]

In [36]:
# Create an empty list to store all stats extracted from each image
stats = []

for marker_analysis in min_max_per_marker:

    marker = marker_analysis["marker"]
    min_max_avg_int = marker_analysis["min_max"]
    population = marker_analysis["population"]

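    # Retrieve the column name from which the avg_int values should be read
    # (match the exact column name so markers with similar names are not confused)
    for column in avg_int_columns:
        if column in (f"nucleus_{marker}_avg_int", f"cytoplasm_{marker}_avg_int"):
            column_name = column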

    for roi_name in roi_names:

        # Retrieve the second and third values (channel and location) of the corresponding tuple in markers
        for item in markers:
            if item[0] == marker:
                channel = item[1]
                location = item[2]
                break  # Stop searching once the marker is found

        # Read the nuclei predictions per ROI
        nuclei_labels = tifffile.imread(nuclei_preds_path / roi_name / f"{filename}.tiff")

        # Filter rows in final_df where ROI matches roi_name and column_name lies strictly within the min-max range
        filtered_df = final_df[(final_df["ROI"] == roi_name) & (final_df[column_name] > min_max_avg_int[0]) & (final_df[column_name] < min_max_avg_int[1])]

        # Get the values of the 'label' column in filtered_df as a list
        label_values = filtered_df["label"].tolist()

        # Create a boolean mask where each element is True if the corresponding value in 'nuclei_labels' 
        # is found in 'label_values', and False otherwise
        mask = np.isin(nuclei_labels, label_values)

        # Use the mask to set values in 'nuclei_labels' that are not in 'label_values' to 0,
        # creating a new array 'filtered_labels' with only the specified values retained
        filtered_labels = np.where(mask, nuclei_labels, 0)

        viewer.add_labels(filtered_labels, name=f"{population}_+_in_{roi_name}")

        # Extract your information of interest
        total_nuclei = len(np.unique(nuclei_labels)) - 1
        marker_pos_nuclei = len(np.unique(filtered_labels)) - 1

        # Calculate "%_marker+_cells" and avoid division by zero errors
        try:
            perc_marker_pos_cells = (marker_pos_nuclei * 100) / total_nuclei
        except ZeroDivisionError:
            perc_marker_pos_cells = 0

        # Create a dictionary containing all extracted info per masked image
        stats_dict = {
                    "filename": filename,
                    "ROI": roi_name,
                    "based_on": column_name,
                    "marker": marker,
                    "population":population,
                    "marker_ch": channel,
                    "location": location,
                    "min_max_avg_int": min_max_avg_int,
                    "total_nuclei": total_nuclei,
                    "marker+_nuclei": marker_pos_nuclei,
                    "%_marker+_cells": perc_marker_pos_cells,
                    "slicing_factor": slicing_factor
                    }
        
        # Append the current data point to the stats list
        stats.append(stats_dict)  
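
The selection above keeps labels whose average intensity lies strictly between the two bounds, then counts the surviving labels. The sketch below replays that logic on a toy label image with NumPy only; <code>count_positive</code> is a hypothetical helper. Note that with strict comparisons a value exactly equal to a bound (for example 80, shared by <code>neun_low</code> and <code>neun_high</code>) is assigned to neither population.

```python
import numpy as np


def count_positive(nuclei_labels, avg_int_by_label, min_max):
    """Count labels whose average intensity is strictly inside min_max (hypothetical helper)."""
    low, high = min_max
    positive = [lab for lab, value in avg_int_by_label.items() if low < value < high]
    # Keep only positive labels in the image, everything else becomes background (0)
    filtered = np.where(np.isin(nuclei_labels, positive), nuclei_labels, 0)
    total = len(np.unique(nuclei_labels[nuclei_labels > 0]))
    n_positive = len(np.unique(filtered[filtered > 0]))
    percent = 100 * n_positive / total if total else 0
    return n_positive, total, percent


nuclei_labels = np.array([[1, 1, 0, 2],
                          [0, 3, 3, 0],
                          [4, 0, 0, 0]])
avg_int = {1: 15.0, 2: 45.0, 3: 80.0, 4: 200.0}   # synthetic averages

print(count_positive(nuclei_labels, avg_int, (20, 80)))    # label 2 only
print(count_positive(nuclei_labels, avg_int, (80, 255)))   # label 4 only
```

Label 3, whose average is exactly 80, is counted in neither call. If boundary values matter for your data, choose ranges whose bounds do not coincide with values you expect to observe.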

<h3>Save number of positive cells based on each marker</h3>

In [37]:
# Create a 'results' folder in the root directory
results_folder = Path("results") / experiment_id / segmentation_type / model_name

# Create the necessary folder structure if it does not exist
try:
    os.makedirs(results_folder)
    print(f"Output folder created: {results_folder}")
except FileExistsError:
    print(f"Output folder already exists: {results_folder}")

# Transform into a dataframe to store it as .csv later
df = pd.DataFrame(stats)

# Define the .csv path
csv_path = results_folder / "SP_marker_+_label_avg_int.csv"

# Append to the .csv with new data points each round
df.to_csv(csv_path, mode="a", index=False, header=not os.path.isfile(csv_path))

# Show the updated .csv 
csv_df = pd.read_csv(csv_path)

csv_df

Output folder already exists: results\test_data\3D\MEC0.1


Unnamed: 0,filename,ROI,based_on,marker,population,marker_ch,location,min_max_avg_int,total_nuclei,marker+_nuclei,%_marker+_cells,slicing_factor
0,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,full_image,nucleus_ki67_avg_int,ki67,ki67,0,nucleus,"(110, 255)",6152,455,7.395969,
1,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,full_image,nucleus_neun_avg_int,neun,neun_low,1,nucleus,"(20, 80)",6152,2030,32.997399,
2,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,full_image,nucleus_neun_avg_int,neun,neun_high,1,nucleus,"(80, 255)",6152,1286,20.903771,
3,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,full_image,cytoplasm_calbindin_avg_int,calbindin,calbindin,2,cytoplasm,"(25, 255)",6152,260,4.226268,
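
Because the results file is opened in append mode, re-running the notebook on the same image adds the same rows again. One way to keep the file tidy, sketched here under the assumption that a row is identified by <code>filename</code>, <code>ROI</code> and <code>population</code>, is to drop repeated keys after reading the file back. The helper name <code>append_stats</code>, the placeholder filename and the temporary folder are hypothetical.

```python
import os
import tempfile

import pandas as pd


def append_stats(df, csv_path):
    """Append rows to csv_path, writing the header only when the file is new (hypothetical)."""
    df.to_csv(csv_path, mode="a", index=False, header=not os.path.isfile(csv_path))


rows = pd.DataFrame([
    {"filename": "img_a", "ROI": "full_image", "population": "ki67", "marker+_nuclei": 455},
])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "SP_marker_+_label_avg_int.csv")
    append_stats(rows, csv_path)
    append_stats(rows, csv_path)  # simulate running the notebook twice on the same image

    combined = pd.read_csv(csv_path)
    # Keep the most recent row for each (filename, ROI, population) key
    deduplicated = combined.drop_duplicates(
        subset=["filename", "ROI", "population"], keep="last"
    )

print(len(combined), len(deduplicated))   # 2 1
```

Keeping the last occurrence means that a re-run with updated thresholds replaces the earlier result for the same population instead of sitting next to it.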
