<h2>Single image - APOC Object Classifier training</h2>

This notebook processes multichannel 3D stacks or 2D images (<code>.czi</code>, <code>.nd2</code> files) and allows you to:

1. Inspect your images in Napari.
2. Train an Object Classifier based on signal intensity in 2D or 3D images.
3. Visualize the results after the training.
4. Correct your annotations and retrain.
5. Save the resulting classifier for use in SP and BP_Object_Classifier.ipynb.

Remember that to train your classifier you **must generate <code>full_image</code> nuclei labels first**.

In [1]:
import pyclesperanto_prototype as cle
import apoc
from pathlib import Path
import tifffile
import napari
import os
import sys
from utils_stardist import (
    get_gpu_details,
    list_images,
    read_image,
    maximum_intensity_projection,
    simulate_cytoplasm_chunked_3d,
    simulate_cell_chunked_3d,
    simulate_cytoplasm,
    simulate_cell,
)

get_gpu_details()


cle.select_device('RTX')

Device name: /device:GPU:0
Device type: GPU
GPU model: device: 0, name: NVIDIA GeForce RTX 4090 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9


<NVIDIA GeForce RTX 4090 Laptop GPU on Platform: NVIDIA CUDA (1 refs)>

<h3>Define the directory where your images are stored (.nd2 or .czi files)</h3>

In [2]:
# Copy the path where your images are stored, you can use absolute or relative paths to point at other disk locations
# At this point you should have generated the nuclei label predictions in advance
directory_path = Path("../raw_data/SMA_staining/SMA staining/")

# Define the channels for which you want to train the ObjectClassifier using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
# cellular locations can be "nucleus", "cytoplasm" or "cell" (cell being the sum volume of nucleus and cytoplasm)
# Remember in Python one starts counting from 0, so your first channel will be 0
# i.e. markers = [("ki67", 0, "nucleus"), ("neun", 1, "cell"), ("calbindin", 2, "cytoplasm")]

markers = [("sma", 2, "cytoplasm")]

# List the image files of the given format in the directory (here .ndpi; change format to match your files)
images = list_images(directory_path, format="ndpi")

images

['..\\raw_data\\SMA_staining\\SMA staining\\1361 436 MD SMA - 2016-04-15 07.36.02.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1361 436+912 MI SMA - 2016-04-15 07.45.40.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1362 436 MD SMA - 2016-04-15 07.50.57.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1362 436+912 MI SMA - 2016-04-15 08.01.21.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1363 436 MD SMA - 2016-04-15 08.12.14.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1363 436+907 MI SMA - 2016-04-15 08.17.47.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1364 436 MD SMA - 2016-04-15 08.29.51.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1364 436+907 MI SMA - 2016-04-15 09.02.39.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1368 436+907 MI SMA - 2016-04-15 09.20.09.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1368 436+912 MD SMA - 2016-04-15 09.08.14.ndpi',
 '..\\raw_data\\SMA_staining\\SMA staining\\1369 436+907 MI SMA - 2016-04-15 09.33.49.

<h3>Open each image in the directory</h3>
You can do so by changing the number within the brackets below <code>image = images[0]</code>. Match the <code>slicing factor</code> to the one you will use during your nuclei label prediction and analysis.

Choose an image to train your classifier on (0 defines the first image in the directory)

The image should contain all classes (i.e. negative, positive, high intensity, low intensity) that are present in your dataset.

Under <code>marker_name</code> input the name of the marker that you wish to load and train the classifier on. You must train the classifier for all the markers you plan to analyze later on.
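
The channel and cellular location of a marker are looked up by name in the <code>markers</code> list, so a misspelled <code>marker_name</code> is an easy mistake to make. The following is a minimal sketch of a lookup that fails with a clear message. The helper <code>find_marker</code> and the list <code>example_markers</code> are hypothetical names used for illustration and are not part of the notebook's utilities.

```python
from typing import List, Tuple

Marker = Tuple[str, int, str]  # (channel_name, channel_nr, cellular_location)
VALID_LOCATIONS = {"nucleus", "cytoplasm", "cell"}


def find_marker(markers: List[Marker], marker_name: str) -> Tuple[int, str]:
    """Return (channel_nr, cellular_location) for marker_name (hypothetical helper)."""
    for name, channel, location in markers:
        if name == marker_name:
            if location not in VALID_LOCATIONS:
                raise ValueError(f"Unknown cellular location: {location!r}")
            return channel, location
    available = [name for name, _, _ in markers]
    raise KeyError(f"Marker {marker_name!r} not found, available markers: {available}")


# Example values only, following the structure described in the notebook
example_markers = [("ki67", 0, "nucleus"), ("neun", 1, "cell"), ("calbindin", 2, "cytoplasm")]
marker_channel, location = find_marker(example_markers, "neun")
print(marker_channel, location)  # 1 cell
```

Raising an error early avoids reaching the annotation step with an undefined channel.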

In [3]:
# Choose an image on which you will train your classifier (0 defines the first image in the directory)
# The image should contain all classes (i.e. negative, positive, high intensity, low intensity) that are present in your dataset
image = images[0]

# Image size reduction (downsampling) to improve processing times (slicing, not lossless compression)
# Now, in addition to xy, you can downsample across your z-stack
# Try and use the same factors that you applied during your nuclei label prediction and analysis
slicing_factor_xy = None # Use 2 or 4 for downsampling in xy (None for lossless)
slicing_factor_z = None # Use 2 to select 1 out of every 2 z-slices

# Define the nuclei channel index (remember that Python starts counting from zero)
nuclei_channel = 0

# Segmentation type ("2D" or "3D"). 
# 2D takes a z-stack as input, performs MIP (Maximum Intensity Projection) and predicts nuclei from the resulting projection (faster, useful for single layers of cells)
# 3D is more computationally expensive. Predicts 3D nuclear volumes, useful for multilayered structures
segmentation_type = "2D"

# Nuclear segmentation model type ("Stardist")
# Choose your Stardist fine-tuned model (model_name) from stardist_models folder
model_name = "test"

# Type the ROI name you wish to load (by default it is "full_image")
# It is recommended to train the ObjectClassifier based on the full image
roi_name = "full_image"

# Choose the channel you want to use to train the ObjectClassifier for:
marker_name = "sma"

# Read image, apply slicing if needed and return filename and img as a np array
img, filename = read_image(image, slicing_factor_xy, slicing_factor_z)
# Construct ROI and nuclei predictions paths from directory_path above
roi_path = directory_path / "ROIs"
nuclei_preds_path =  directory_path / "nuclei_preds" / segmentation_type / model_name
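
Downsampling by slicing keeps every n-th pixel (or z-slice) and discards the rest, which is why it is not lossless. A minimal NumPy sketch of the idea follows. The (channel, z, y, x) axis order and the helper name <code>downsample_by_slicing</code> are assumptions made for illustration, and <code>read_image</code> may be implemented differently.

```python
from typing import Optional

import numpy as np


def downsample_by_slicing(
    stack: np.ndarray,
    slicing_factor_xy: Optional[int] = None,
    slicing_factor_z: Optional[int] = None,
) -> np.ndarray:
    """Keep every n-th z-slice and every n-th pixel in y and x (hypothetical helper).

    Assumes a (channel, z, y, x) array; None leaves that axis untouched.
    """
    step_xy = slicing_factor_xy or 1
    step_z = slicing_factor_z or 1
    return stack[:, ::step_z, ::step_xy, ::step_xy]


# Toy stack: 3 channels, 10 z-slices, 512 x 512 pixels
stack = np.zeros((3, 10, 512, 512), dtype=np.uint16)
print(downsample_by_slicing(stack, slicing_factor_xy=2, slicing_factor_z=2).shape)  # (3, 5, 256, 256)
print(downsample_by_slicing(stack).shape)  # (3, 10, 512, 512)
```

Because the nuclei labels are generated at a given resolution, the image and the labels only overlap if the same factors are used in both steps.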

Add cellular compartment labels to Napari to start annotating your data

In [4]:
# Retrieve the second and third values (channel and location) of the corresponding tuple in markers
for item in markers:
    if item[0] == marker_name:
        marker_channel = item[1]
        location = item[2]
        break  # Stop searching once the marker is found

# Close any previous Napari instances that are open, ignore WARNING messages
try:
    viewer.close()

except NameError:
    pass

except RuntimeError:
    pass

if segmentation_type == "3D":

    # Load Napari viewer
    viewer = napari.Viewer(ndisplay=2)
    # Slice marker stack
    marker_img = img[marker_channel]
    viewer.add_image(marker_img)

elif segmentation_type == "2D":

    # Load Napari viewer
    viewer = napari.Viewer(ndisplay=2)
    # Slice marker stack
    marker_img = img[marker_channel]
    viewer.add_image(marker_img)

# Load nuclei labels and transform them into cell or cytoplasm labels if necessary
try:
    # Read the nuclei predictions per ROI
    labels = tifffile.imread(nuclei_preds_path / roi_name / f"{filename}.tiff")
    print(f"Pre-computed nuclei labels found for {filename}")

except FileNotFoundError:
    sys.exit(f"Nuclei labels for filename: {filename} ROI: {roi_name} not found. Please generate them using 002_BP_Predict_nuclei_labels.ipynb")

if location == "cytoplasm":
    if segmentation_type == "3D":
        print(f"Generating {segmentation_type} cytoplasm labels for: {marker_name}")
        # Simulate a cytoplasm by dilating the nuclei and subtracting the nuclei mask afterwards
        labels = simulate_cytoplasm_chunked_3d(labels, dilation_radius=2, erosion_radius=0, chunk_size=(labels.shape[0], 1024, 1024))

    elif segmentation_type == "2D":
        print(f"Generating {segmentation_type} cytoplasm labels for: {marker_name}")
        # Simulate a cytoplasm by dilating the nuclei and subtracting the nuclei mask afterwards
        labels = simulate_cytoplasm(labels, dilation_radius=2, erosion_radius=0)

elif location == "cell":
    if segmentation_type == "3D":
        print(f"Generating {segmentation_type} cell labels for: {marker_name}")
        # Simulate a cell volume by dilating the nuclei 
        labels = simulate_cell_chunked_3d(labels, dilation_radius=2, erosion_radius=0, chunk_size=(labels.shape[0], 1024, 1024))

    elif segmentation_type == "2D":
        print(f"Generating {segmentation_type} cell labels for: {marker_name}")
        # Simulate a cell by dilating the nuclei
        labels = simulate_cell(labels, dilation_radius=2, erosion_radius=0)

viewer.add_labels(labels, opacity=0.3)

Pre-computed nuclei labels found for 1361 436 MD SMA - 2016-04-15 07.36.02
Generating 2D cytoplasm labels for: sma


<Labels layer 'labels' at 0x25b2c96fd60>
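
The cytoplasm labels above are simulated by dilating the nuclei and then subtracting the nuclei mask. The sketch below illustrates that idea in 2D with plain NumPy. It is not the implementation of <code>simulate_cytoplasm</code>: the helper names, the 4-connected growth and the tie-breaking rule (the larger label wins where two objects meet) are assumptions made for illustration.

```python
import numpy as np


def dilate_labels(labels: np.ndarray, radius: int) -> np.ndarray:
    """Grow each label into neighbouring background, one pixel per iteration."""
    out = labels.copy()
    for _ in range(radius):
        padded = np.pad(out, 1, mode="constant", constant_values=0)
        # Largest label among the 4-connected neighbours of every pixel
        neighbours = np.maximum.reduce([
            padded[:-2, 1:-1],  # neighbour above
            padded[2:, 1:-1],   # neighbour below
            padded[1:-1, :-2],  # neighbour to the left
            padded[1:-1, 2:],   # neighbour to the right
        ])
        # Only background pixels may take over a neighbour's label
        out = np.where(out == 0, neighbours, out)
    return out


def simulate_cytoplasm_sketch(nuclei: np.ndarray, dilation_radius: int = 2) -> np.ndarray:
    """Dilate the nuclei, then blank the nuclei themselves to leave a ring."""
    cell = dilate_labels(nuclei, dilation_radius)
    return np.where(nuclei > 0, 0, cell)


nuclei = np.zeros((7, 7), dtype=np.uint16)
nuclei[3, 3] = 1  # a single one-pixel nucleus
cytoplasm = simulate_cytoplasm_sketch(nuclei, dilation_radius=2)
print(cytoplasm)
```

Each ring keeps the label number of its nucleus, so measurements taken on the ring can still be assigned to the right object.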

<h2>Data Annotation in Napari</h2>

In this example we have NeuN-negative cells (label 1), low-NeuN cells (label 2) and high-NeuN cells (label 3). Follow these steps to annotate your data in Napari:

1. Navigate through your stack and choose a good representative slice, alternatively switch to 3D mode and annotate in 3D.
2. Create a new labels layer.
3. Annotate your classes, starting with negative cells (label 1). In this case we also have low-NeuN cells (label 2) and high-NeuN cells (label 3). You can use points for specificity or paint lines across the objects. Empty space is not accounted for, only the objects that your annotation touches.
4. Once you are done annotating, keep Napari open and run the next cells.
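
To make the "only the objects that your annotation touches" rule concrete, the sketch below assigns a class to every object that overlaps a painted annotation and ignores annotation pixels over empty space. It only illustrates the principle and is not APOC's internal code. The helper name <code>classes_from_annotation</code> and the majority vote used when several classes touch one object are assumptions.

```python
from typing import Dict

import numpy as np


def classes_from_annotation(labels: np.ndarray, annotation: np.ndarray) -> Dict[int, int]:
    """Map each annotated object id to the class painted on it most often."""
    object_classes = {}
    for object_id in np.unique(labels):
        if object_id == 0:
            continue  # background is never classified
        painted = annotation[(labels == object_id) & (annotation > 0)]
        if painted.size == 0:
            continue  # object not touched by any annotation
        values, counts = np.unique(painted, return_counts=True)
        object_classes[int(object_id)] = int(values[np.argmax(counts)])
    return object_classes


labels = np.array([
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0],
    [3, 3, 0, 0, 0],
])
annotation = np.zeros_like(labels)
annotation[0, 0] = 1    # a single point on object 1 (class 1)
annotation[1, 2:5] = 3  # a line over empty space and object 2 (class 3)
print(classes_from_annotation(labels, annotation))  # {1: 1, 2: 3}
```

Object 3 receives no class because nothing was painted on it, and the annotated pixel over background has no effect.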

<video controls>
  <source src="../assets/apoc_oc_annotation.mp4" type="video/mp4">
</video>

If you have already trained your classifier, skip the next couple of cells and run the last one to see how the classifier applies to other images in your dataset.

In [5]:
# Create folder structure to store resulting Object Classifiers
apoc_path = Path("APOC_ObjectClassifiers") / directory_path.name
# exist_ok=True avoids an error if the folder already exists
os.makedirs(apoc_path, exist_ok=True)

# Define the features on which the classifier will be trained (see the help of train for the full list of features)
features = 'min_intensity,max_intensity,sum_intensity,mean_intensity,standard_deviation_intensity'

cl_filename = f"./{apoc_path}/ObjClass_{segmentation_type}_ch{marker_channel}.cl"

# Create an object classifier
apoc.erase_classifier(cl_filename) # Delete the classifier file if it already exists
classifier = apoc.ObjectClassifier(cl_filename)
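
The feature string and the classifier filename follow simple patterns: a comma-separated list of feature names, and one <code>.cl</code> file per segmentation type and channel inside a folder named after the dataset. The sketch below rebuilds both from their parts. The helper name <code>classifier_filename</code> is hypothetical, and <code>PurePosixPath</code> is used only so that the printed path looks the same on every operating system.

```python
from pathlib import PurePosixPath

# Feature names taken from the cell above, joined into the comma-separated string
feature_names = [
    "min_intensity",
    "max_intensity",
    "sum_intensity",
    "mean_intensity",
    "standard_deviation_intensity",
]
features = ",".join(feature_names)


def classifier_filename(dataset_name: str, segmentation_type: str, marker_channel: int) -> str:
    """Build the classifier path for one dataset, segmentation type and channel."""
    folder = PurePosixPath("APOC_ObjectClassifiers") / dataset_name
    return str(folder / f"ObjClass_{segmentation_type}_ch{marker_channel}.cl")


print(features)
print(classifier_filename("SMA staining", "2D", 2))
# APOC_ObjectClassifiers/SMA staining/ObjClass_2D_ch2.cl
```

Since the channel number is part of the filename, training on a second marker creates a separate classifier file rather than overwriting the first one.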

If you are not happy with the classifier go back to Napari and edit the "Labels" layer with a few more annotations, then run the cells below to fetch your modifications, train the classifier again and display the updated results.

In [None]:
# Collect user input from Napari and train/retrain the ObjectClassifier based on it
user_input = viewer.layers["Labels"].data

# Train or retrain your classifier
classifier.train(features, labels, user_input, marker_img, continue_training=True)

# Print the weights of each feature in the decision process
classifier.feature_importances()

The cell below loads the pre-trained classifier from disk and applies it to the corresponding intensity channel and the labels displayed in Napari.

In [None]:
apoc_path = Path("APOC_ObjectClassifiers") / directory_path.name
cl_filename = f"./{apoc_path}/ObjClass_{segmentation_type}_ch{marker_channel}.cl"

# Reload the classifier from disk to use the latest version
classifier = apoc.ObjectClassifier(cl_filename)

# Determine object classification
result = classifier.predict(labels, marker_img)

# Show the result
viewer.add_labels(result, name='classification')
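
Besides inspecting the classification layer visually, it can help to count how many objects fall into each class. The sketch below does this on toy arrays, assuming that the prediction is an image in which every pixel of an object carries that object's class number. The helper name <code>count_objects_per_class</code> is hypothetical. With real data you would first convert the prediction to a NumPy array.

```python
from collections import Counter
from typing import Dict

import numpy as np


def count_objects_per_class(labels: np.ndarray, classification: np.ndarray) -> Dict[int, int]:
    """Count the labelled objects assigned to each class."""
    counts = Counter()
    for object_id in np.unique(labels):
        if object_id == 0:
            continue  # skip background
        class_values = classification[labels == object_id]
        # All pixels of one object are assumed to share a class, so read the first
        counts[int(class_values[0])] += 1
    return dict(counts)


# Toy data: three objects, two of class 1 and one of class 2
toy_labels = np.array([
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [3, 3, 0, 0],
])
toy_classification = np.array([
    [1, 1, 0, 2],
    [0, 0, 0, 2],
    [1, 1, 0, 0],
])
print(count_objects_per_class(toy_labels, toy_classification))  # {1: 2, 2: 1}
```

A class with very few or no objects can indicate that it needs more annotations before retraining.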