<a href="https://colab.research.google.com/github/darieyr/BDS3_2025_ML_in_bioimage_analysis/blob/main/notebooks/ua/2_2_Pure_data_processing_ua.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Analysis of microscopy images using machine-learning-based algorithms**


**15-07-2025**<br>

Created for Biological Data School (BDS^3) 2025 <br>

**Author:** *Дарина Якименко*

# Loading packages and dependencies

Most of the libraries needed for this part of the project are already preinstalled in the Google Colaboratory environment.
However, a few of them still have to be installed manually:

In [None]:
!pip install --quiet zarr
!pip install --quiet -U "xarray<2024.5.0" "bioimageio.core[torch]>=0.6.8,<0.7"


We import the packages/libraries we need into the working environment.

Brief description of the packages:



*  *matplotlib.pyplot* - a matplotlib-based interface for simple cases of programmatic plot generation and interactive exploration. It lets you build plots similar to those created in MATLAB and show them on screen, and it acts as the figure GUI manager.

*  *numpy* - the fundamental package for scientific computing. It provides N-dimensional array objects, useful linear algebra, Fourier transform and random number functions, and lets you build your own complex functions.

*  *zarr* - provides compressed, chunked, N-dimensional arrays designed for use in parallel computing.

*  *dask* - designed for parallel and distributed computing.

*  *bioimageio* - core Python utilities for bioimage.io resources (in particular, DL models).

*  *imageio* - a simple interface for reading and writing a wide range of image data, including animated images, volumetric data, and scientific formats.

*  *tifffile* - stores NumPy arrays in TIFF (Tagged Image File Format) files and reads images and metadata from the TIFF-like files used in bioimaging.



In [None]:
import matplotlib.pyplot as plt
import zarr
import dask
import dask.array as da
import numpy as np
import bioimageio.core
from bioimageio.core import Tensor, Sample, create_prediction_pipeline
from bioimageio.spec.utils import load_array
from dask import delayed
import imageio
from tifffile import imread
from google.colab import drive
from mpl_toolkits.mplot3d import Axes3D
import re
import os
import tifffile as tiff
from skimage.filters import threshold_otsu

# Functions

In [None]:
#Function to interact with a 3D image
%matplotlib inline
from ipywidgets import *

def update(image, z=0):
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    c = 0
    t = 0
    plt.imshow(image[t, c, z, :, :])
    fig.canvas.flush_events()

def update3ch(image, z=0):
    fig = plt.figure(figsize=(6, 6))
    # Stack in R, G, B order; source channels: 0 = EdU (green), 1 = gammaH2AX (red), 2 = 53BP1 (blue)
    rgb = np.stack([
        image[0, 1, z],  # channel 1 -> Red
        image[0, 0, z],  # channel 0 -> Green
        image[0, 2, z],  # channel 2 -> Blue
    ], axis=-1)

    # Min-max normalization to [0, 1]; skip the division for a constant image
    rgb = rgb.astype(np.float32)
    rgb -= rgb.min()
    if rgb.max() > 0:
        rgb /= rgb.max()

    plt.imshow(rgb)
    plt.title(f"Z = {z}")
    plt.axis("off")
    plt.show()

def update3ch_together(image, z=0):
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    channel_names = ['Green=EdU', 'Red=gammaH2AX', 'Blue=53BP1']
    for c in range(3):
        axes[c].imshow(image[0, c, z, :, :], cmap='gray')
        axes[c].set_title(f'{channel_names[c]} channel - Z={z}')
        axes[c].axis('off')

    plt.tight_layout()
    plt.show()

#Display the results of masking (probabilities prediction)
def display(i=0):
    prediction, sample, inp_id, outp_id, name = results[i]
    pred_array = np.asarray(prediction.members[outp_id].data)
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    plt.imshow(pred_array[0, 0, :, :])
    plt.title(name)
    fig.canvas.flush_events()

binarized_results = []

# Function to display and collect binarized results
def display_bin(i=0):
    prediction, sample, inp_id, outp_id, name = results[i]
    pred_array = np.asarray(prediction.members[outp_id].data)
    binary_pred = pred_array[0, 0, :, :] >= 0.5
    binarized_results.append(binary_pred)  # Append binarized 2D result (mask)

    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    plt.imshow(binary_pred[:, :])
    plt.title(name)
    fig.canvas.flush_events()
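
The display helpers above index the image as `image[t, c, z, y, x]`. The following is a minimal sketch of the channel-to-RGB step on a small synthetic array, assuming the channel order given in `channel_names` (0 = EdU/green, 1 = gammaH2AX/red, 2 = 53BP1/blue). The function name `make_rgb_plane` and the demo array are illustrative and not part of the notebook.

```python
import numpy as np

def make_rgb_plane(image, z=0, t=0):
    """Return a (Y, X, 3) float32 RGB plane from a (T, C, Z, Y, X) array."""
    rgb = np.stack([
        image[t, 1, z],  # channel 1 -> Red
        image[t, 0, z],  # channel 0 -> Green
        image[t, 2, z],  # channel 2 -> Blue
    ], axis=-1).astype(np.float32)
    # Min-max normalization; a constant image is left at zero instead of dividing by 0
    rgb -= rgb.min()
    peak = rgb.max()
    if peak > 0:
        rgb /= peak
    return rgb

# Synthetic 5D image: 1 time point, 3 channels, 2 planes of 4x4 pixels
demo = np.zeros((1, 3, 2, 4, 4), dtype=np.uint16)
demo[0, 0, 0] = 100  # EdU (green)
demo[0, 1, 0] = 400  # gammaH2AX (red)
demo_rgb = make_rgb_plane(demo, z=0)
```

Because the normalization uses the global minimum and maximum of the plane, the relative brightness between the three channels is preserved; normalizing each channel separately would be an alternative when one channel is much dimmer than the others.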


In [None]:
# Function to handle model inference
def run_model_inference(bmz_model, arr):
    # load model
    model_resource = bioimageio.core.load_description(bmz_model)

    # load model's test input image
    test_input_image = load_array(model_resource.inputs[0].test_tensor)

    # match test data type with the data type of the model input
    arr = arr.astype(test_input_image.dtype)

    # create input tensor
    input_tensor = Tensor.from_numpy(arr, dims=tuple(model_resource.inputs[0].axes))

    # create collection of tensors (sample)
    inp_id = model_resource.inputs[0].id
    outp_id = model_resource.outputs[0].id
    sample = Sample(members={inp_id: input_tensor}, stat={}, id="id")

    # The prediction_pipeline function is used to run a prediction with a given model
    # It applies the pre-processing, if indicated in the model rdf.yaml,
    # runs inference with the model and applies the post-processing, again if specified in the model rdf.yaml.
    prediction_pipeline = create_prediction_pipeline(model_resource)

    # Use the new prediction pipeline to run a prediction. The prediction pipeline returns a Sample object
    prediction = prediction_pipeline.predict_sample_without_blocking(sample)

    return prediction, sample, inp_id, outp_id

In [None]:
def create_a_mask(image, z):
    #plane = image_3D[z, :, :].compute() # convert the Dask array into a Numpy array
    prediction, sample, inp_id, outp_id = run_model_inference(bmz_model_id, image)
    name = "z:%s" % (z)
    return prediction, sample, inp_id, outp_id, name

# Processing the data with the chosen model

First, let's get access to the data on Google Drive:

In [None]:
#from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The files will be available under the path: "/content/drive/MyDrive/"

In [None]:
# Path to the data folder
folder_path = "/content/drive/MyDrive/BDS3_2025_data/ETP " #change for your path
# Select all files with the .tif extension
file_list = [f for f in os.listdir(folder_path) if f.endswith('.tif')]
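
The filter above keeps only names that end in lowercase `.tif`. As a hedged sketch, the variant below also accepts `.tiff` and ignores case; the helper name `list_tiff_files` and the temporary demo files are illustrative only.

```python
import os
import tempfile

def list_tiff_files(folder, extensions=('.tif', '.tiff')):
    """Return the sorted file names in folder whose extension matches, ignoring case."""
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(extensions)
    )

# Demonstration on empty placeholder files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a_C1.tif", "b_C1.TIF", "c_C1.tiff", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = list_tiff_files(tmp)
```

Note that the later cells build paths with a hard-coded `.tif` suffix, so switching to this variant would only make sense if those paths were adapted as well.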

In [None]:
%%time
# Extract unique base names (the part of the file name before _C1, _C2, _C3)
base_names = set()
for f in file_list:
    if '_C' in f:
        base_name = f.split('_C')[0]
        base_names.add(base_name)

# Sort base names for consistency
base_names = sorted(base_names)

# Pattern for extracting the cell-cycle phase from the file name (uses the re package)
pattern = re.compile(r'1h-1h_(.+?)_Pos')

# Data storage: maps each phase to a list of [C1, C2, C3] path triples
# (the name `dict` shadows the Python built-in; it is kept because the later cells use it)
dict = {}

for name in base_names:
    match = pattern.search(name)
    if not match:
        continue
    phase = match.group(1)

    c1_path = os.path.join(folder_path, f"{name}_C1.tif")
    c2_path = os.path.join(folder_path, f"{name}_C2.tif")
    c3_path = os.path.join(folder_path, f"{name}_C3.tif")

    if all(os.path.exists(p) for p in [c1_path, c2_path, c3_path]):
        # add the triple to the dictionary
        if phase not in dict:
            dict[phase] = []
        dict[phase].append([c1_path, c2_path, c3_path])
    else:
        print(f"Missing channel for: {name}")
# The result has the form:
# {'G': [[c1_path, c2_path, c3_path], ...], 'SIII': [...], ...}


In [None]:
print("Phases:")
phases = []  # collect the phase names once, outside the loop
for phase in dict:
    phases.append(phase)
    print(f"- {phase}")

Phases:
- SIII
- SII
- SIV-V
- SI
- G


In [None]:
for phase in dict:
    count = len(dict[phase])
    print(f"Phase '{phase}' contains {count} samples(sample_index: 0)")

Phase 'SIII' contains 1 samples(sample_index: 0)
Phase 'SII' contains 1 samples(sample_index: 0)
Phase 'SIV-V' contains 1 samples(sample_index: 0)
Phase 'SI' contains 1 samples(sample_index: 0)
Phase 'G' contains 1 samples(sample_index: 0)
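
The grouping above relies on two conventions in the file names: a `_C<number>` suffix for the channel and the phase name between `1h-1h_` and `_Pos`. The sketch below applies the same regular expression to a single name so the convention can be checked in isolation. The helper `parse_file_name` is an illustrative name, and it splits on the last `_C` (`rsplit`), which is a slightly stricter choice than the `split('_C')[0]` used in the notebook.

```python
import re

PHASE_PATTERN = re.compile(r'1h-1h_(.+?)_Pos')  # same pattern as in the grouping cell

def parse_file_name(file_name):
    """Return (base_name, phase, channel), or None if the name does not fit the convention."""
    stem = file_name[:-len('.tif')] if file_name.endswith('.tif') else file_name
    if '_C' not in stem:
        return None
    base_name, channel = stem.rsplit('_C', 1)  # split on the last "_C"
    match = PHASE_PATTERN.search(base_name)
    if match is None:
        return None
    return base_name, match.group(1), 'C' + channel

parsed = parse_file_name("2015.01.17_ETP_1h-1h_G_Pos001_S001_17_C1.tif")
unmatched = parse_file_name("README.txt")
```

Here `parsed` holds the base name, the phase `G`, and the channel `C1`, while a name that does not follow the convention yields `None` and would be skipped.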


In [None]:
# Load the first sample ([0]) and its third channel ([2]) for phase "SIII"
image_path = dict["SIII"][0][2]  # dict[phase][sample_index][channel]
our_image = imread(image_path)

In [None]:
our_image.shape

(99, 512, 512)
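
Each channel file loads as a 3D volume, here with shape (99, 512, 512), and the processing code treats the first axis as Z. The display helpers defined earlier, however, index a 5D array as `image[t, c, z, :, :]`. A minimal sketch of how three channel volumes could be combined into that layout is shown below on small synthetic arrays; `stack_channels` is an illustrative name, not part of the notebook.

```python
import numpy as np

def stack_channels(volumes):
    """Combine per-channel (Z, Y, X) volumes into one (T=1, C, Z, Y, X) array."""
    shapes = {v.shape for v in volumes}
    if len(shapes) != 1:
        raise ValueError(f"Channel volumes differ in shape: {shapes}")
    stacked = np.stack(volumes, axis=0)  # (C, Z, Y, X)
    return stacked[np.newaxis, ...]      # (1, C, Z, Y, X)

# Three synthetic channel volumes filled with 1, 2 and 3
channels = [np.full((5, 8, 8), fill, dtype=np.uint16) for fill in (1, 2, 3)]
image_5d = stack_channels(channels)
```

With the real data, the list of volumes would come from reading the three paths of one sample, for example `[imread(p) for p in dict["SIII"][0]]`, and the result could then be passed to `update3ch_together`.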

In [None]:
import time
print(time.ctime())

Wed Jul 30 08:44:10 2025


In [None]:
bmz_model_id = "affable-shark"

# Create the local and the Google Drive folder for the masks if they do not exist yet.
# (A "predictions" folder for the raw model outputs can be added to this list if needed.)
for folder_name in ["masks", "/content/drive/MyDrive/masks"]:
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)
        print(f"Folder '{folder_name}' created successfully!")
    else:
        print(f"Folder '{folder_name}' already exists.")

Folder 'masks' already exists.
Folder '/content/drive/MyDrive/masks' created successfully!


In [None]:
def process_images(image_data):
    for image in image_data:
        img = imread(image)
        print(img.shape)

        # File name without folder and extension
        sample_ch = os.path.splitext(os.path.basename(image))[0]
        print(sample_ch)

        # Save the image as Zarr and reopen it lazily as a Dask array
        zarr_path = "./zarr_files/" + sample_ch + ".zarr"
        zarr.save(zarr_path, img)
        print("Saved as Zarr:", zarr_path)
        z_store = zarr.open(zarr_path, mode='r')
        z_dask = da.from_zarr(z_store)
        print(f"Z_dask shape: {z_dask.shape}")

        # Build one delayed inference task per Z plane
        lazy_results = []
        for z in range(z_dask.shape[0]):
            r = delayed(create_a_mask)(z_dask[z, :, :], z)
            lazy_results.append(r)

        # Run all tasks; the two timestamps show how long inference takes
        print(time.ctime())
        results = dask.compute(*lazy_results)
        print(time.ctime())

        # Saving the raw predictions is currently disabled:
        #prediction_stack = np.stack(results, axis=0)
        #output_path = "./predictions/" + sample_ch + ".tif"
        #tiff.imwrite(output_path, prediction_stack.astype(np.uint8))

        # Binarize every predicted plane
        binarized_results = []
        for i in range(len(results)):
            prediction, sample, inp_id, outp_id, name = results[i]
            pred_array = np.asarray(prediction.members[outp_id].data)
            # Extract the 2D plane
            plane = pred_array[0, 0, :, :]
            # Binarize with a fixed threshold of 0.5
            # (Otsu's method is the commented-out alternative)
            #threshold = threshold_otsu(plane)
            threshold = 0.5
            binary_plane = plane >= threshold  # True (1) for foreground, False (0) for background
            binarized_results.append(binary_plane.astype(np.uint8))  # Convert to uint8 (0 and 1)

        # Stack the planes along the Z axis into a 3D mask
        mask_3d = np.stack(binarized_results, axis=0)

        # Save the mask to Google Drive
        drive_output_dir = "/content/drive/MyDrive/masks"
        os.makedirs(drive_output_dir, exist_ok=True)
        output_path = os.path.join(drive_output_dir, f"{sample_ch}.tiff")
        imageio.volsave(output_path, mask_3d.astype(np.uint8))

        # Save a second copy to the local "masks" folder
        mask_path = "./masks/" + sample_ch + ".tif"
        imageio.volsave(mask_path, mask_3d.astype(np.uint8))
        print("Saved binarized 3D mask.")

    return print("Finally!")
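
The core of the mask-building step in `process_images` is a fixed-threshold binarization of each predicted plane, followed by stacking along Z. The sketch below isolates that step on tiny hand-written probability planes; `planes_to_mask` is an illustrative helper name and the input values are made up for the demonstration.

```python
import numpy as np

def planes_to_mask(planes, threshold=0.5):
    """Binarize 2D probability planes and stack them into a (Z, Y, X) uint8 mask."""
    binary = [(np.asarray(p) >= threshold).astype(np.uint8) for p in planes]
    return np.stack(binary, axis=0)

# Two 2x2 "probability" planes; values equal to the threshold count as foreground
planes = [
    np.array([[0.1, 0.6], [0.5, 0.49]]),
    np.array([[0.9, 0.0], [0.2, 0.7]]),
]
mask = planes_to_mask(planes)
```

Since the mask only contains 0 and 1, it looks almost black in many image viewers; multiplying it by 255 before saving is one way to make the foreground visible, at the cost of no longer storing a 0/1 label image.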

In [None]:
#to call the function, you need:
process_images(dict["G"][0])
#print(shape)


(99, 512, 512)
2015.01.17_ETP_1h-1h_G_Pos001_S001_17_C1
Saved as Zarr: ./zarr_files/2015.01.17_ETP_1h-1h_G_Pos001_S001_17_C1.zarr
Z_dask shape: (99, 512, 512)
Wed Jul 30 08:45:39 2025


2025-07-30 08:45:40.924 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml
Downloading data from 'https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml' to file '/root/.cache/bioimageio/a9089c4a932954227e300606fff4c424-rdf.yaml'.
100%|█████████████████████████████████████| 4.06k/4.06k [00:00<00:00, 2.63MB/s]
Downloading data from 'https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/zero_mean_unit_variance.ijm' to file '/root/.cache/bioimageio/5ab6d42cb36097a2f1a2d6bd4b7aaa00-zero_mean_unit_variance.ijm'.
100%|██████████████████████████████████████████| 845/845 [00:00<00:00, 866kB/s]
computing SHA256 of 5ab6d42cb36097a2f1a2d6bd4b7aaa00-zero_mean_unit_variance.ijm (result: 767f2c3a50e36365c30b9e46e57fcf82e606d337e8a48d4a2440d

Wed Jul 30 09:03:20 2025
Saved binarized 3D mask.
(99, 512, 512)
2015.01.17_ETP_1h-1h_G_Pos001_S001_17_C2
Saved as Zarr: ./zarr_files/2015.01.17_ETP_1h-1h_G_Pos001_S001_17_C2.zarr
Z_dask shape: (99, 512, 512)


2025-07-30 09:03:22.614 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml
2025-07-30 09:03:22.681 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml


Wed Jul 30 09:03:22 2025


2025-07-30 09:03:44.678 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml
2025-07-30 09:03:45.095 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml
2025-07-30 09:04:04.466 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml
2025-07-30 09:04:04.915 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3

Wed Jul 30 09:20:17 2025
Saved binarized 3D mask.
(99, 512, 512)
2015.01.17_ETP_1h-1h_G_Pos001_S001_17_C3
Saved as Zarr: ./zarr_files/2015.01.17_ETP_1h-1h_G_Pos001_S001_17_C3.zarr
Z_dask shape: (99, 512, 512)


2025-07-30 09:20:19.768 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml
2025-07-30 09:20:19.788 | INFO     | bioimageio.spec._internal.io_utils:open_bioimageio_yaml:112 - loading affable-shark from https://uk1s3.embassy.ebi.ac.uk/public-datasets/bioimage.io/affable-shark/1.2/files/rdf.yaml


Wed Jul 30 09:20:19 2025


ComposerError: expected a single document in the stream
  in "/root/.cache/bioimageio/a9089c4a932954227e300606fff4c424-rdf.yaml", line 16, column 5
but found another document
  in "/root/.cache/bioimageio/a9089c4a932954227e300606fff4c424-rdf.yaml", line 17, column 5
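
The log above repeats `loading affable-shark ...` many times because `run_model_inference` loads the model description on every call, and Dask may run several of these calls at the same time. One possible, unconfirmed explanation for the `ComposerError` is that concurrent calls touched the same cached `rdf.yaml` file. The sketch below shows a generic thread-safe "load once" helper that would avoid repeated loading; `load_once` and the stand-in loader are illustrative names, and whether this removes the error in practice is an assumption that has not been tested here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_cache = {}
_cache_lock = threading.Lock()

def load_once(key, loader):
    """Call loader(key) at most once per key, even when used from many threads."""
    with _cache_lock:
        if key not in _cache:
            _cache[key] = loader(key)
        return _cache[key]

# Demonstration with a stand-in loader that counts how often it is called
call_count = {"n": 0}

def fake_loader(model_id):
    call_count["n"] += 1
    return {"id": model_id}

# 50 concurrent requests for the same model id trigger a single load
with ThreadPoolExecutor(max_workers=8) as pool:
    loaded = list(pool.map(lambda _: load_once("affable-shark", fake_loader), range(50)))
```

In the notebook, the first line of `run_model_inference` could then read `model_resource = load_once(bmz_model, bioimageio.core.load_description)`. The cache lives inside one Python process, so it helps with a thread-based Dask scheduler but not across separate worker processes.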

Loop for processing all images in the folder (this can take up to several hours):

In [None]:
for phase in dict:
    # process every sample of the phase, not only the first one
    for sample in dict[phase]:
        process_images(sample)
    print(f"Processing of phase {phase} completed")
print(f"Completed all.")
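
Because the full run can take hours, a single failing sample (such as the error shown in the earlier output) would stop the whole loop. A hedged sketch of a more forgiving batch loop is given below: it records failures and continues with the next sample. The names `run_all`, `demo_groups`, and `demo_worker` are illustrative; the stand-in worker only imitates a failure.

```python
def run_all(groups, worker):
    """Run worker(sample) for every sample of every phase and collect failures."""
    done, failed = [], []
    for phase, samples in groups.items():
        for index, sample in enumerate(samples):
            try:
                worker(sample)
                done.append((phase, index))
            except Exception as error:  # keep going and report the problem afterwards
                failed.append((phase, index, repr(error)))
    return done, failed

# Stand-in data and worker, for demonstration only
demo_groups = {"G": [["g_C1", "g_C2", "g_C3"]], "SI": [["bad"]]}

def demo_worker(sample):
    if len(sample) != 3:
        raise ValueError("expected three channel paths")

done, failed = run_all(demo_groups, demo_worker)
```

With the notebook's own objects the call would be `done, failed = run_all(dict, process_images)`, after which `failed` lists the phase, the sample index, and the error message of every sample that needs to be rerun.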

# Sources

* Microscopy data analysis: machine learning and the BioImage Archive Course, 2025
* https://matplotlib.org/stable/api/pyplot_summary.html
* https://pypi.org/project/zarr/
* https://docs.dask.org/en/stable/index.html
* https://pypi.org/project/numpy/
* https://pypi.org/project/bioimageio.core/
* https://pypi.org/project/imageio/
* https://pypi.org/project/tifffile/
