<h2>3D stack to 2D MIP - Single image - Marker+ based on average intensity</h2>

This notebook converts a 3D stack (.czi or .nd2 file) into a maximum intensity projection (MIP) and allows you to:

1. Inspect your images in Napari.
2. Read the ROIs defined in notebook 1. If none are present, the full image is analyzed.
3. Read the nuclei labels predicted in notebook 1. If none are present, they are generated and saved.
4. Plot the average cytoplasmic and nuclear intensities of all markers in all ROIs, to help you choose a classification threshold.
5. Count the cells positive for a marker, based on the average signal intensity within the nuclear or cytoplasmic compartment and a user-defined threshold.
6. Display positive cells in Napari.
7. Extract and save per-label, per-ROI, per-marker data in a .csv file.
8. Extract and save the number of positive cells in a .csv file (marker_+_label_avg_int_2D.csv).
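
A MIP keeps, for every xy position, the brightest value found along the z axis. The sketch below shows this on a random toy stack. The (channel, z, y, x) axis order is an assumption based on the array shapes printed further down, and the actual projection is done inside `read_image`, which may differ in detail.

```python
import numpy as np

# Toy stack in (channel, z, y, x) order; the axis order is an assumption
rng = np.random.default_rng(0)
stack = rng.integers(0, 256, size=(4, 14, 32, 32), dtype=np.uint8)

# Collapse the z axis (axis=1) by keeping the brightest value per pixel
mip = stack.max(axis=1)

print(stack.shape, "->", mip.shape)
```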

In [43]:
from pathlib import Path
import tifffile
import napari
import os
from tqdm import tqdm
import numpy as np
import pandas as pd
from skimage.measure import regionprops_table
import plotly.express as px
from utils import list_images, read_image, simulate_cytoplasm, segment_nuclei_2d

<h3>Define the directory for your images (.nd2 or .czi files) and cell marker info</h3>

In [44]:
# Copy the path where your images are stored, ideally inside the raw_data directory
directory_path = Path("./raw_data/test_data")

# Define the channels you want to analyze using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
# Remember in Python one starts counting from 0, so your first channel will be 0
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]

# Construct ROI and nuclei predictions paths from directory_path above
roi_path = directory_path / "ROIs"
nuclei_preds_path =  directory_path / "nuclei_preds"

# List the .czi and .nd2 files found in directory_path
images = list_images(directory_path)

images

['raw_data\\test_data\\HI 1  Contralateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 1.czi',
 'raw_data\\test_data\\HI 1  Ipsilateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 1.czi']

<h3>Open each image in the directory</h3>
You can do so by changing the number within the brackets below, e.g. <code>image = images[0]</code>. Make sure to input the same <code>slicing_factor</code> you used when generating the nuclei predictions.

If you have not generated nuclei predictions, also set the <code>nuclei_channel</code>, <code>cellpose_nuclei_diameter</code> and <code>gaussian_sigma</code> values.
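
The slicing factor has to match because the nuclei labels and the MIP must have identical xy dimensions for the intensity measurements to line up. The sketch below illustrates why, assuming the factor simply keeps every n-th pixel along y and x. `downsample_xy` is a hypothetical helper; how `read_image` actually applies the factor is not shown in this notebook.

```python
import numpy as np

def downsample_xy(img, slicing_factor=None):
    """Keep every n-th pixel along y and x (hypothetical helper)."""
    if slicing_factor is None:
        return img
    return img[..., ::slicing_factor, ::slicing_factor]

toy_img = np.zeros((4, 100, 80), dtype=np.uint8)

full = downsample_xy(toy_img, None)
half = downsample_xy(toy_img, 2)

# Labels predicted on `half` cannot be overlaid on `full`: the shapes differ
print(full.shape, half.shape)
```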

In [45]:
# Explore each image to analyze (0 defines the first image in the directory)
image = images[0]

# Remember to input the same slicing factor you used when generating the nuclei predictions
slicing_factor = None # Use 2 or 4 for compression (None for lossless)

# Define the index of the nuclei channel (remember that Python counts from zero)
nuclei_channel = 3

# Define your nuclei diameter; it speeds up nuclei detection. If unknown, leave it as None
cellpose_nuclei_diameter = None

# Define the amount of blur applied to the nuclei channel
# Blurring the nuclei MIP evens out high-intensity foci within the nucleus; higher values blur more
# High values help segment sparse nuclei (CA and CTX regions) but can merge nuclei that are very close together (DG region)
gaussian_sigma = 0

# Generate maximum intensity projection and extract filename
img_mip, filename = read_image(image, slicing_factor)

# Extract filename without extension (Path.stem drops the suffix)
file_and_ext = Path(image).stem

# Show image in Napari
viewer = napari.Viewer(ndisplay=2)
viewer.add_image(img_mip)

Image analyzed: HI 1  Contralateral Mouse 8  slide 6 Neun Red Calb Green KI67 Magenta 40x technical replica 1
Original Array shape: (4, 14, 3803, 2891)
MIP Array shape: (4, 3803, 2891)


<Image layer 'img_mip' at 0x1de55456f70>

In [46]:
# List of subfolder names
try:
    roi_names = [folder.name for folder in roi_path.iterdir() if folder.is_dir()]

except FileNotFoundError:
    roi_names = ["full_image"]
        
print(f"The following regions of interest will be analyzed: {roi_names}")

The following regions of interest will be analyzed: ['CA', 'DG']


<h3>Extract average intensities of markers within each defined cell compartment and ROI</h3>

In [47]:
# Initialize an empty list to hold the extracted dataframes on a per ROI basis
per_roi_props = []

for roi_name in tqdm(roi_names):

    # Initialize an empty list to hold the extracted dataframes on a per channel basis
    props_list = []

    # Read the user defined ROIs, in case of full image analysis generate a label covering the entire image
    try:
        # Read previously defined ROIs
        user_roi = tifffile.imread(roi_path / roi_name / f"{file_and_ext}.tiff")

    except FileNotFoundError:
        # Extract the xy dimensions of the input image (from img_mip)
        img_shape = img_mip.shape
        img_xy_dims = img_shape[-2:]

        # Create a label covering the entire image
        user_roi = np.ones(img_xy_dims).astype(np.uint8)

    # Read previously predicted nuclei labels, if not present generate nuclei predictions and save them
    try:
        # Read the nuclei predictions per ROI
        nuclei_labels = tifffile.imread(nuclei_preds_path / roi_name / f"{file_and_ext}.tiff")

    except FileNotFoundError:

        # Slice the nuclei stack
        nuclei_img = img_mip[nuclei_channel, :, :]

        # Create a mask where the ROI is greater than or equal to 1
        mask = (user_roi >= 1).astype(np.uint8)

        # Apply the mask to nuclei_img, setting all pixels outside the ROI to 0
        masked_nuclei_img = np.where(mask, nuclei_img, 0)

        # Segment nuclei and return labels
        nuclei_labels = segment_nuclei_2d(masked_nuclei_img, gaussian_sigma, cellpose_nuclei_diameter)

        # Save nuclei labels as .tiff files to reuse them later
        # Create the nuclei_preds directory for this ROI if it does not exist
        os.makedirs(nuclei_preds_path / roi_name, exist_ok=True)

        # Construct the output path with the same stem that is used when reading predictions
        path_to_store = nuclei_preds_path / roi_name / f"{file_and_ext}.tiff"

        # Save the nuclei labels (label image with one integer value per nucleus)
        tifffile.imwrite(path_to_store, nuclei_labels)

    # Add the predicted nuclei as labels into Napari
    viewer.add_labels(nuclei_labels, name=f"{roi_name}_nuclei")

    # Add the ROIs as labels into Napari
    viewer.add_labels(user_roi, name=f"{roi_name}_ROI", opacity=0.4)

    # Create a dictionary containing all image descriptors
    descriptor_dict = {
                "filename": filename,
                "ROI": roi_name,
                }

    # Loop through each channel and extract the average intensity within either nuclei or cytoplasmic regions
    # Unpack each marker tuple directly (avoids shadowing the built-in `tuple`)
    for channel_name, ch_nr, location in markers:

        if location == "cytoplasm":

            # Simulate a cytoplasm by dilating the nuclei and subtracting the nuclei mask afterwards
            cytoplasm_labels = simulate_cytoplasm(nuclei_labels, dilation_radius=2, erosion_radius=0)

            # Add the simulated cytoplasm as labels into Napari
            viewer.add_labels(cytoplasm_labels, name=f"{roi_name}_cytoplasm")

            # Extract intensity information from each marker channel
            props = regionprops_table(label_image=cytoplasm_labels,
                                intensity_image=img_mip[ch_nr],
                                properties=["label", "intensity_mean"])
        
        elif location == "nucleus":

            # Extract intensity information from each marker channel
            props = regionprops_table(label_image=nuclei_labels,
                                intensity_image=img_mip[ch_nr],
                                properties=["label", "intensity_mean"])
        

        # Convert to dataframe
        props_df = pd.DataFrame(props)

        # Rename intensity_mean column to indicate the specific compartment and marker
        props_df.rename(columns={"intensity_mean": f"{location}_{channel_name}_avg_int"}, inplace=True)

        # Append each props_df to props_list
        props_list.append(props_df)

    # Merge the per-channel dataframes on "label" once all markers have been processed
    # Initialize the df with the first df in the list
    props_df = props_list[0]
    # Start looping from the second df in the list
    for df in props_list[1:]:
        props_df = props_df.merge(df, on="label")

    # Add each key-value pair from descriptor_dict to props_df at the specified position
    insertion_position = 0    
    for key, value in descriptor_dict.items():
        props_df.insert(insertion_position, key, value)
        insertion_position += 1  # Increment position to maintain the order of keys in descriptor_dict

    # Append the merged props_df of this ROI to per_roi_props
    per_roi_props.append(props_df)

final_df = pd.concat(per_roi_props, ignore_index=True)

final_df

100%|██████████| 2/2 [00:18<00:00,  9.45s/it]


Unnamed: 0,filename,ROI,label,nucleus_ki67_avg_int,nucleus_neun_avg_int,cytoplasm_calbindin_avg_int
0,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,1,93.302419,100.520161,12.485149
1,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,2,109.193662,160.524648,14.086705
2,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,3,86.524465,91.558104,15.710280
3,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,4,60.469697,15.094697,8.805755
4,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,5,86.252907,97.027132,15.851145
...,...,...,...,...,...,...
3073,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,DG,1743,94.752066,14.283747,14.752874
3074,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,DG,1744,111.054545,155.880808,30.288000
3075,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,DG,1745,136.708108,23.772973,15.714286
3076,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,DG,1746,96.565116,38.776744,20.472868


<h3>Save per label per ROI per marker data</h3>

In [48]:
# Create a 'results' folder in the root directory
results_folder = 'results'

try:
    os.makedirs(results_folder)
    print(f"'{results_folder}' folder created successfully.")
except FileExistsError:
    print(f"'{results_folder}' folder already exists.")

# Save the df containing per_label results into a CSV file
final_df.to_csv(f'./results/{filename}_per_label.csv')

'results' folder already exists.


<h3>Plot average intensities</h3>
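
The histograms in this section are drawn with `range_x=[0, 255]`, which matches 8-bit images. If your images are stored as 16-bit, mean intensities above 255 would fall outside the plotted range. The sketch below derives the range from the image dtype instead; `intensity_range` is a hypothetical helper and not part of `utils`.

```python
import numpy as np

def intensity_range(img):
    """Return the (min, max) values representable by an integer image dtype."""
    info = np.iinfo(img.dtype)
    return int(info.min), int(info.max)

img_8bit = np.zeros((4, 16, 16), dtype=np.uint8)
img_16bit = np.zeros((4, 16, 16), dtype=np.uint16)

range_8bit = intensity_range(img_8bit)
range_16bit = intensity_range(img_16bit)
print(range_8bit, range_16bit)
```

The returned tuple could be passed as `range_x=list(intensity_range(img_mip))` when plotting, provided `img_mip` has an integer dtype.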

In [49]:
# Select all column names in 'final_df' that contain the substring 'avg_int'
avg_int_columns = [col for col in final_df.columns if 'avg_int' in col]

# Loop over all extracted channel average intensities
for column_name in avg_int_columns:

    # Plot the average_intensity distribution in order to make an informed decision on the threshold
    for roi_name in roi_names:

        # Filter rows in final_df where ROI matches roi_name 
        filtered_df = final_df[(final_df["ROI"] == roi_name)]

        # Get the average intensity values of the current column for this ROI
        avg_int_values = filtered_df[column_name]

        # Plot a histogram with 256 bins and a title indicating the column and ROI
        fig = px.histogram(avg_int_values, nbins=256, 
                           title=f"{column_name}_in_{roi_name}",
                           labels={'value': column_name, 'count': 'Frequency'},
                           range_x=[0, 255])
        
        # Show the plot
        fig.show() 

<h3>Select cells positive for a marker based on average intensity</h3>

1. Define in <code>column_name</code> the column you want to use to filter your positive cells.
2. Define in <code>min_max_avg_int</code> the min and max values within which cells will be considered positive for a marker.
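
The two steps above boil down to filtering the per-label table by an intensity window and then keeping only the matching labels in the label image. A minimal sketch on toy data follows; all names prefixed with `toy_` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy per-label table with three nuclei
toy_df = pd.DataFrame({
    "label": [1, 2, 3],
    "nucleus_neun_avg_int": [40.0, 130.0, 220.0],
})

# Toy label image containing the same three labels (0 is background)
toy_labels = np.array([[0, 1, 1],
                       [2, 2, 0],
                       [3, 0, 3]])

toy_column = "nucleus_neun_avg_int"
toy_min, toy_max = (115, 255)

# Keep rows whose mean intensity lies strictly between min and max
toy_positive = toy_df[(toy_df[toy_column] > toy_min) & (toy_df[toy_column] < toy_max)]
toy_positive_labels = toy_positive["label"].tolist()

# Zero out every label that did not pass the filter
toy_filtered = np.where(np.isin(toy_labels, toy_positive_labels), toy_labels, 0)

toy_total = len(toy_df)
toy_n_positive = len(toy_positive_labels)
toy_percent = toy_n_positive * 100 / toy_total
print(toy_positive_labels, round(toy_percent, 2))
```

Both bounds are exclusive, as in the cell below, so a label whose mean intensity equals the min or max value exactly is not counted as positive.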

In [54]:
markers

[('ki67', 0, 'nucleus'), ('neun', 1, 'nucleus'), ('calbindin', 2, 'cytoplasm')]

In [None]:
#TODO: Create a dictionary with min_max_avg_int for each channel
# Give the possibility to define populations for the same marker, thinking about Neun High and Low

In [53]:
# Input the column name for the average intensity you want to use to filter your positive cells
# Here avg_int_columns[1] is used, i.e. "nucleus_neun_avg_int"
column_name = avg_int_columns[1]

# Define the (min, max) avg_int values within which cells will be considered marker+ for the selected column
min_max_avg_int = (115, 255)

# Create an empty list to store all stats extracted from each image
stats = []

for roi_name in roi_names:

    # Initialize an empty list to hold the extracted dataframes on a per channel basis
    props_list = []

    # Extract marker name from column_name (format: "<location>_<marker>_avg_int")
    # Note: this assumes the marker name itself contains no underscores
    column_info = column_name.split("_")
    marker = column_info[1]

    # Retrieve the second and third values (channel and location) of the corresponding tuple
    for item in markers:
        if item[0] == marker:
            channel = item[1]
            location = item[2]
            break  # Stop searching once the marker is found

    # Read the nuclei predictions per ROI
    nuclei_labels = tifffile.imread(nuclei_preds_path / roi_name / f"{file_and_ext}.tiff")

    # Filter rows in final_df where ROI matches roi_name and column_name lies strictly between the min and max values
    filtered_df = final_df[
        (final_df["ROI"] == roi_name)
        & (final_df[column_name] > min_max_avg_int[0])
        & (final_df[column_name] < min_max_avg_int[1])
    ]


    # Get the values of the 'label' column in filtered_df as a list
    label_values = filtered_df["label"].tolist()

    # Create a boolean mask where each element is True if the corresponding value in 'nuclei_labels' 
    # is found in 'label_values', and False otherwise
    mask = np.isin(nuclei_labels, label_values)

    # Use the mask to set values in 'nuclei_labels' that are not in 'label_values' to 0,
    # creating a new array 'filtered_labels' with only the specified values retained
    filtered_labels = np.where(mask, nuclei_labels, 0)

    viewer.add_labels(filtered_labels, name=f"{marker}_+_in_{roi_name}")

    # Extract your information of interest
    total_nuclei = len(np.unique(nuclei_labels)) - 1
    marker_pos_nuclei = len(np.unique(filtered_labels)) - 1

    # Calculate "%_marker+_cells" and avoid division by zero errors
    try:
        perc_marker_pos_cells = (marker_pos_nuclei * 100) / total_nuclei
    except ZeroDivisionError:
        perc_marker_pos_cells = 0

    # Create a dictionary containing all extracted info per masked image
    stats_dict = {
                "filename": filename,
                "ROI": roi_name,
                "based_on": column_name,
                "marker": marker,
                "marker_ch": channel,
                "location": location,
                "min_max_avg_int": min_max_avg_int,
                "total_nuclei": total_nuclei,
                "marker+_nuclei": marker_pos_nuclei,
                "%_marker+_cells": perc_marker_pos_cells,
                "slicing_factor": slicing_factor
                }
    
    # Append the current data point to stats
    stats.append(stats_dict)


<h3>Save number of positive cells based on each marker</h3>
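
The cell below appends to the same .csv on every run and writes the header only when the file does not exist yet. A side effect is that re-running the cell with unchanged settings adds identical rows, which is consistent with the repeated rows visible in the output table. The sketch below reproduces this in a temporary folder and shows one possible way to drop exact duplicates after reading; `append_rows` is a hypothetical helper.

```python
import os
import tempfile
import pandas as pd

def append_rows(df, csv_path):
    """Append df to csv_path, writing the header only for a new file."""
    df.to_csv(csv_path, mode="a", index=False,
              header=not os.path.isfile(csv_path))

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_csv = os.path.join(tmp_dir, "demo.csv")
    row = pd.DataFrame([{"ROI": "CA", "marker": "neun", "total_nuclei": 10}])

    append_rows(row, demo_csv)
    append_rows(row, demo_csv)  # Simulates re-running the cell

    combined = pd.read_csv(demo_csv)
    deduplicated = combined.drop_duplicates()

print(len(combined), len(deduplicated))
```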

In [51]:
# Define output folder for results
results_folder = "./results/"

# Create the necessary folder structure if it does not exist
try:
    os.mkdir(str(results_folder))
    print(f"Output folder created: {results_folder}")
except FileExistsError:
    print(f"Output folder already exists: {results_folder}")

# Transform into a dataframe to store it as .csv later
df = pd.DataFrame(stats)

# Define the .csv path
csv_path = "./results/marker_+_label_avg_int_2D.csv"

# Append to the .csv with new data points each round
df.to_csv(csv_path, mode="a", index=False, header=not os.path.isfile(csv_path))

# Show the updated .csv 
csv_df = pd.read_csv(csv_path)

csv_df

Output folder already exists: ./results/


Unnamed: 0,filename,ROI,based_on,marker,marker_ch,location,min_max_avg_int,total_nuclei,marker+_nuclei,%_marker+_cells,slicing_factor
0,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,cytoplasm_calbindin_avg_int,calbindin,2,cytoplasm,"(50, 65535)",1331,12,0.901578,
1,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,DG,cytoplasm_calbindin_avg_int,calbindin,2,cytoplasm,"(50, 65535)",1747,75,4.293074,
2,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,cytoplasm_calbindin_avg_int,calbindin,2,cytoplasm,"(50, 255)",1331,12,0.901578,
3,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,DG,cytoplasm_calbindin_avg_int,calbindin,2,cytoplasm,"(50, 255)",1747,75,4.293074,
4,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,CA,cytoplasm_calbindin_avg_int,calbindin,2,cytoplasm,"(50, 255)",1331,12,0.901578,
5,HI 1 Contralateral Mouse 8 slide 6 Neun Red ...,DG,cytoplasm_calbindin_avg_int,calbindin,2,cytoplasm,"(50, 255)",1747,75,4.293074,
