## To-do
- Now that import and metadata extraction work, the next steps are preprocessing (mostly interpolating timepoints for z-slices when the scope records time on a frame-by-frame basis) and a scheme for identifying a nuclear channel and a spot channel that tolerates switching between the two (e.g. using mCherry to segment nuclei during cycles but not at division).
- Makes sense to use dask for visualization (e.g. choosing a threshold).
- Write the DoG/segmentation function so that it can take either 2D or 3D data, with the option to segment off of a projection or off of raw 3D data.
    - Write in options for DoG and LoG segmentation algorithm with standard nuclear sizes vs box DoG/LoG vs watershed.
        - Actually, box filtering might not help much if part of the nucleus is cut off in z: unless the stack is centered on the middle of the nucleus, the BP filtering will project it into a distorted Gaussian, misplacing the centroid and botching the diameter estimate from $\sigma$. For 3D segmentation, it might be better to use a single filter to find markers and then perform a watershed.
- 3D DoG notes:
    - $\sigma_{x, y} = 8$ works perfectly to segment out nuclei during nc 13.
    - $\sigma_z$ is BP-filtered (1, 9), where 9 is the z-sigma corresponding to the whole nucleus. This allows the BP to be very permissive in z while filtering out the nuclei in x and y.
- Proposed procedure for local peak finding:
    - Run box DoG as below with permissive BP in z and LoG approximation in (x, y), only varying $\sigma$ in the latter.
    - Peak-finding on standard image (e.g. $\sigma_{x, y} = 8$), then use coordinates as initial guess for next sigma values.
- A simple BP filter followed by peak finding does a good job of finding markers. Then give the option to watershed-segment directly off of the image, off of the distance transform of an Otsu-thresholded image, or off of edge-finding.
    - For data with the mid-nuclear plane on the boundary of our z-stack, might be useful to give the option to segment in 2D, then threshold each nuclear column locally to identify the nucleus.
    - Need to write loop over timepoints, clean up small objects at each step, then commit segmentation to file.
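
The band-pass-then-peak-finding idea in the notes above can be sketched as follows. This is only an illustration on synthetic data: it uses `scipy.ndimage` rather than the `segmentation` module called later in this notebook, and the function names `dog_bandpass` and `find_markers`, the sigmas, the footprint size and the threshold are all made-up assumptions, not the pipeline's actual implementation.

```python
import numpy as np
from scipy import ndimage as ndi


def dog_bandpass(image, low_sigma, high_sigma):
    """Difference of Gaussians: blur at low_sigma minus blur at high_sigma.

    Sigmas may be given per axis, e.g. (z, y, x), so the filter can be
    permissive in z while staying selective in x and y.
    """
    image = np.asarray(image, dtype=float)
    return ndi.gaussian_filter(image, low_sigma) - ndi.gaussian_filter(
        image, high_sigma
    )


def find_markers(filtered, footprint_size, threshold):
    """Boolean array marking local maxima of `filtered` above `threshold`."""
    local_max = ndi.maximum_filter(filtered, size=footprint_size)
    return (filtered == local_max) & (filtered > threshold)


# Synthetic (z, y, x) stack with two Gaussian "nuclei"; all values are made up.
zz, yy, xx = np.mgrid[0:20, 0:64, 0:64]
centers = [(10, 20, 20), (10, 44, 44)]
stack = np.zeros((20, 64, 64))
for cz, cy, cx in centers:
    stack += np.exp(
        -(
            (zz - cz) ** 2 / (2 * 4.0**2)
            + ((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 6.0**2)
        )
    )

# Wide band in z (1 to 9), narrower band in (y, x).
filtered = dog_bandpass(stack, low_sigma=(1, 4, 4), high_sigma=(9, 8, 8))
markers = find_markers(
    filtered, footprint_size=(9, 15, 15), threshold=0.5 * filtered.max()
)
marker_coords = np.argwhere(markers)
print(marker_coords)
```

The resulting marker array could then be labelled and used to seed a watershed, either on the raw image or on a distance-transformed mask, as listed in the options above.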

In [1]:
from preprocessing.import_data import import_save_dataset

# from nuclear_segmentation import segment_nuclei
import napari

trim_series = True
lif_test_name = "test_data/2021-06-14/p2pdpwt"
lsm_test_name = "test_data/2023-04-07/p2pdp_zld-sites-ctrl_fwd_1"

(
    channels_full_dataset,
    original_global_metadata,
    original_frame_metadata,
    export_global_metadata,
    export_frame_metadata,
) = import_save_dataset(lsm_test_name, trim_series=trim_series, mode="tiff")

  warn('Due to an issue with JPype 0.6.0, reading is slower. '
  imsave(collated_data_path, channel_data, plugin="tifffile")


In [2]:
nuclear_channel = channels_full_dataset[1]

In [3]:
viewer = napari.view_image(nuclear_channel, name="Nuclear Channel")
napari.run()

In [4]:
from nuclear_segmentation import segmentation
import numpy as np
from dask.distributed import LocalCluster

In [5]:
cluster = LocalCluster(
    host="localhost",
    scheduler_port=8786,
    threads_per_worker=1,
    n_workers=8,
    memory_limit="4GB",
)

In [6]:
cluster

LocalCluster
Dashboard: http://127.0.0.1:8787/status
Scheduler: tcp://127.0.0.1:8786
Workers: 8, total threads: 8, total memory: 29.80 GiB
Status: running, using processes: True
Each worker: 1 thread, 3.73 GiB memory, local directory under /tmp/dask-scratch-space/


In [7]:
denoised = segmentation.denoise_movie_parallel(
    nuclear_channel,
    denoising="gaussian",
    denoising_sigma=3,
    address="localhost:8786",
)

mask = segmentation.binarize_movie_parallel(
    denoised,
    thresholding="global_otsu",
    closing_footprint=segmentation.ellipsoid(3, 3),
    address="localhost:8786",
)

markers = segmentation.mark_movie_parallel(
    nuclear_channel,
    mask,
    low_sigma=[3, 5.5, 5.5],
    high_sigma=[10, 14.5, 14.5],
    max_footprint=((1, 25), segmentation.ellipsoid(3, 3)),
    max_diff=1,
    address="localhost:8786",
)

marker_coords = np.array(np.nonzero(markers)).T

labels = segmentation.segment_movie_parallel(
    denoised,
    markers,
    mask,
    watershed_method="raw",
    min_size=200,
    address="localhost:8786",
)

This may cause some slowdown.
Consider scattering data ahead of time and using futures.


Using the rule of thumb $r \approx \sigma \sqrt{2}$ (2D) and $r \approx \sigma \sqrt{3}$ (3D) to set rough bounds on the kernels used for band-pass filtering seems to give an essentially perfect segmentation on this dataset.
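
As a quick sketch of how this rule of thumb turns a range of expected radii into band-pass sigmas: the helper names and the example radii below are hypothetical, and the general form $\sigma = r / \sqrt{n}$ is simply the 2D and 3D cases above written for $n$ dimensions.

```python
import math


def sigma_from_radius(radius, ndim):
    """Sigma for a blob of the given radius, using r ~ sigma * sqrt(ndim)."""
    return radius / math.sqrt(ndim)


def bandpass_sigmas(min_radius, max_radius, ndim):
    """Rough (low_sigma, high_sigma) bounds from the smallest and largest radii."""
    return sigma_from_radius(min_radius, ndim), sigma_from_radius(max_radius, ndim)


# Hypothetical radii in pixels, for illustration only.
low, high = bandpass_sigmas(8.0, 20.0, ndim=2)
print(f"low_sigma = {low:.2f}, high_sigma = {high:.2f}")
```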

In [35]:
cluster.close()

In [36]:
viewer.add_points(marker_coords)

<Points layer 'marker_coords' at 0x7ff440a35420>

In [37]:
viewer.add_labels(labels)

<Labels layer 'labels' at 0x7ff5d6bdd930>

In [19]:
from importlib import reload
from preprocessing import process_metadata

reload(process_metadata)

<module 'preprocessing.process_metadata' from '/home/ybadal/Documents/Berkeley/github_repositories/transcription_pipeline/preprocessing/process_metadata.py'>

In [29]:
from skimage.measure import regionprops_table
import pandas as pd
import trackpy as tp
from preprocessing import process_metadata


def segmentation_df(segmentation_mask, frame_metadata, *, extra_properties=tuple()):
    """
    Constructs a trackpy-compatible pandas DataFrame for tracking from a
    frame-indexed array of segmentation masks. Imaging times are interpolated
    between z-slices using the z-coordinate of each centroid.

    :param segmentation_mask: Integer-labelled segmentation, as returned by
        :func:`skimage.segmentation.watershed`.
    :type segmentation_mask: Numpy array of integers.
    :param dict frame_metadata: Dictionary of frame-by-frame metadata for all files and
        series in a dataset.
    :param extra_properties: Properties of each labelled region in the segmentation
        mask to measure and add to the DataFrame. With no extra properties, the
        DataFrame will have columns only for the frame, label, and centroid
        coordinates.
    :type extra_properties: Tuple of strings, optional.
    :return: pandas DataFrame of frame, label, centroids, and imaging time for each
        labelled region in the segmentation mask (along with other measurements
        specified by extra_properties).
    :rtype: pandas DataFrame
    """
    # Go over every frame and make a pandas-compatible dict for each labelled object
    # in the segmentation.
    movie_properties = []

    num_timepoints = segmentation_mask.shape[0]
    for i in range(num_timepoints):
        frame_properties = regionprops_table(
            segmentation_mask[i], properties=("label", "centroid") + extra_properties
        )
        num_labels = np.unique(frame_properties["label"]).size
        frame_properties["frame"] = np.full(num_labels, i + 1)

        frame_properties = pd.DataFrame.from_dict(frame_properties)
        movie_properties.append(frame_properties)

    movie_properties = pd.concat(movie_properties)
    movie_properties = movie_properties.reset_index(drop=True)  # Reset index of rows

    # Rename centroid columns
    num_dim_frame = segmentation_mask.ndim - 1
    rename_columns = {}
    spatial_axes = "zyx"
    for i in range(num_dim_frame):
        old_column_name = "".join(["centroid-", str(i)])
        new_column_name = spatial_axes[i]
        rename_columns[old_column_name] = new_column_name

    movie_properties.rename(rename_columns, axis=1, inplace=True)

    # Add imaging time for each particle
    time = process_metadata.extract_time(frame_metadata)[0]
    time_apply = lambda row: time(int(row["frame"]), row["z"])
    movie_properties["t_s"] = movie_properties.apply(time_apply, axis=1)

    # Add imaging time in number of frames for each particle
    time_frame = process_metadata.extract_renormalized_frame(frame_metadata)
    time_frame_apply = lambda row: time_frame(int(row["frame"]), row["z"])
    movie_properties["t_frame"] = movie_properties.apply(time_frame_apply, axis=1)

    return movie_properties


def link_dataframe(
    segmentation_dataframe,
    *,
    search_range,
    memory,
    pos_columns,
    t_column,
    velocity_predict=True,
    **kwargs
):
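    """
    Links segmented objects across frames with trackpy, optionally using a
    nearest-velocity predictor. Extra keyword arguments are passed on to the
    linking function.
    """
    if velocity_predict:
        pred = tp.predict.NearestVelocityPredict()
        link = pred.link_df
    else:
        link = tp.link_df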

    linked_dataframe = link(
        segmentation_dataframe,
        search_range=search_range,
        memory=memory,
        pos_columns=pos_columns,
        t_column=t_column,
        **kwargs,
    )

    # Reindex dataframe
    linked_dataframe = linked_dataframe.reset_index(drop=True)

    # Increment particle labels by 1 so that particle 0 is not confused with the
    # background label (0) when relabelling the mask
    linked_dataframe["particle"] = linked_dataframe["particle"] + 1

    return linked_dataframe


def reorder_labels(segmentation_mask, linked_dataframe):
    """
    Relabels a frame-indexed segmentation mask so that each object carries the
    'particle' ID assigned to it in the linked DataFrame.
    """
    reordered_mask = np.zeros(segmentation_mask.shape, dtype=segmentation_mask.dtype)

    # Switch labels using 'particle' column in linked dataframe
    for _, properties in linked_dataframe.iterrows():
        frame_index = int(properties["frame"]) - 1
        old_label = properties["label"]
        new_label = properties["particle"]

        # Boolean mask of the pixels belonging to this object in its frame
        object_mask = segmentation_mask[frame_index] == old_label
        reordered_mask[frame_index][object_mask] = new_label

    return reordered_mask
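
Looping over every row of the linked DataFrame can be slow for long movies. One possible alternative, sketched here under the assumption that labels are small non-negative integers, is to build a per-frame lookup table and index it with the mask. The helper `relabel_frame` and the toy arrays are hypothetical and not part of the pipeline.

```python
import numpy as np


def relabel_frame(frame_mask, old_labels, new_labels):
    """Map old labels to new labels in one frame using a lookup table.

    Labels not listed in `old_labels` (including background 0) map to 0.
    Every entry of `old_labels` must be <= frame_mask.max().
    """
    lookup = np.zeros(int(frame_mask.max()) + 1, dtype=frame_mask.dtype)
    lookup[np.asarray(old_labels, dtype=int)] = new_labels
    return lookup[frame_mask]


# Toy 2D frame with three labelled objects; values are made up.
frame = np.array([[0, 1, 1], [2, 2, 0], [0, 3, 3]])
relabelled = relabel_frame(frame, old_labels=[1, 2, 3], new_labels=[7, 5, 9])
print(relabelled)
```

In the notebook's setting, `old_labels` and `new_labels` would come from the `label` and `particle` columns of the rows belonging to a single frame.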

In [26]:
test_df = segmentation_df(labels, export_frame_metadata[1])

In [27]:
test_df

| | label | z | y | x | frame | t_s | t_frame |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 10.477820 | 190.494601 | 310.617276 | 1 | 8.282933 | 0 |
| 1 | 2 | 11.199429 | 137.553692 | 2.783446 | 1 | 8.853380 | 0 |
| 2 | 3 | 11.739154 | 54.508823 | 412.919331 | 1 | 9.280044 | 0 |
| 3 | 4 | 11.401199 | 125.459960 | 423.489407 | 1 | 9.012884 | 0 |
| 4 | 5 | 11.111701 | 159.143131 | 501.971669 | 1 | 8.784030 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 23586 | 166 | 16.139521 | 9.670222 | 477.432113 | 167 | 2849.281971 | 171 |
| 23587 | 167 | 16.984944 | 118.391468 | 2.570891 | 167 | 2849.950295 | 171 |
| 23588 | 168 | 17.528785 | 1.667377 | 142.132196 | 167 | 2850.380212 | 171 |
| 23589 | 169 | 17.717391 | 1.704348 | 334.980435 | 167 | 2850.529310 | 171 |


In [30]:
linked_df = link_dataframe(
    test_df,
    search_range=18,
    memory=3,
    pos_columns=["y", "x"],
    t_column="t_frame",
    velocity_predict=True,
)

Frame 171: 170 trajectories present.


In [31]:
linked_df

| | label | z | y | x | frame | t_s | t_frame | particle |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 10.477820 | 190.494601 | 310.617276 | 1 | 8.282933 | 0 | 1 |
| 1 | 2 | 11.199429 | 137.553692 | 2.783446 | 1 | 8.853380 | 0 | 2 |
| 2 | 3 | 11.739154 | 54.508823 | 412.919331 | 1 | 9.280044 | 0 | 3 |
| 3 | 4 | 11.401199 | 125.459960 | 423.489407 | 1 | 9.012884 | 0 | 4 |
| 4 | 5 | 11.111701 | 159.143131 | 501.971669 | 1 | 8.784030 | 0 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23586 | 166 | 16.139521 | 9.670222 | 477.432113 | 167 | 2849.281971 | 171 | 265 |
| 23587 | 167 | 16.984944 | 118.391468 | 2.570891 | 167 | 2849.950295 | 171 | 327 |
| 23588 | 168 | 17.528785 | 1.667377 | 142.132196 | 167 | 2850.380212 | 171 | 352 |
| 23589 | 169 | 17.717391 | 1.704348 | 334.980435 | 167 | 2850.529310 | 171 | 351 |


In [32]:
reordered = reorder_labels(labels, linked_df)

In [33]:
viewer.add_labels(reordered)

<Labels layer 'reordered [1]' at 0x7f1b54090fd0>

- Trackpy seems to work pretty well. At one frame (at the interface between two series) there is a slight jump in positions that it is not able to link across. This should be fixable by using the predictor method.