## To-do
- Now that we have import and metadata extraction working, we need to start preprocessing (mostly interpolating timepoints for z-slices if recorded on frame-by-frame basis by the scope) and some scheme for identification of a nuclear and a spot channel that is compatible with switching between the two channels (e.g. using mCherry to segment nuclei during cycles but not at the division).
- Makes sense to use dask for visualization (e.g. choosing a threshold).
- Write DoG/segmentation fuction so that it can take either 2D or 3D data - give the option to segment off of a projection, or off of raw 3D data.
    - Write in options for DoG and LoG segmentation algorithm with standard nuclear sizes vs box DoG/LoG vs watershed.
        - Actually, box filtering might not be very helpful if we're cutting off part of the nucleus is z - the BP filtering will project it into a distorted gaussian if we're not right in the middle of the nucleus, and then misplace the centroid and botch the diameter estimation from $\sigma$. For 3D segmentation, it might be better to use a single filter to find markers then perform a watershed.
- 3D DoG notes:
    - $\sigma_{x, y} = 8$ works perfectly to segment out nuclei during nc 13.
    - $\sigma_z$ is BP-filtered (1, 9) where 9 is the Z-sigma corresponding to the whole nucleus. This allow the BP to be very permissive in Z and filter out the nuclei in x and y.
- Proposed procedure for local peak finding:
    - Run box DoG as below with permissive BP in z and LoG approximation in (x, y), only varying $\sigma$ in the latter.
    - Peak-finding on standard image (e.g. $\sigma_{x, y} = 8$), then use coordinates as initial guess for next sigma values.
- Simple BP filter + peak finding does a good job finding markers. Give option then to watershed segment directly off of the image, off of distance-transformed otsu thresholded image, and off of edge-finding.
    - For data with the mid-nuclear plane on the boundary of our z-stack, might be useful to give the option to segment in 2D, then threshold each nuclear column locally to identify the nucleus.
    - Need to write loop over timepoints, clean up small objects at each step, then commit segmentation to file.

In [1]:
from preprocessing.import_data import import_save_dataset

# from nuclear_segmentation import segment_nuclei
import napari

trim_series = True
lif_test_name = "test_data/2021-06-14/p2pdpwt"
lsm_test_name = "test_data/2023-04-07/p2pdp_zld-sites-ctrl_fwd_1"

(
    channels_full_dataset,
    original_global_metadata,
    original_frame_metadata,
    export_global_metadata,
    export_frame_metadata,
) = import_save_dataset(lsm_test_name, trim_series=trim_series, mode="tiff")

  warn('Due to an issue with JPype 0.6.0, reading is slower. '
  imsave(collated_data_path, channel_data, plugin="tifffile")
  imsave(collated_data_path, channel_data, plugin="tifffile")


In [2]:
nuclear_channel_metadata = export_frame_metadata[1]
nuclear_channel = channels_full_dataset[1]

In [3]:
viewer = napari.view_image(nuclear_channel, name="Nuclear Channel")
napari.run()

In [4]:
from nuclear_segmentation import segmentation

from tracking import track_features

import numpy as np
from dask.distributed import LocalCluster, Client

In [5]:
cluster = LocalCluster(
    host="localhost",
    scheduler_port=8786,
    threads_per_worker=1,
    n_workers=12,
    memory_limit="4GB",
)

In [6]:
client = Client(cluster)

In [7]:
client

0,1
Connection method: Cluster object,Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status,

0,1
Dashboard: http://127.0.0.1:8787/status,Workers: 12
Total threads: 12,Total memory: 44.70 GiB
Status: running,Using processes: True

0,1
Comm: tcp://127.0.0.1:8786,Workers: 12
Dashboard: http://127.0.0.1:8787/status,Total threads: 12
Started: Just now,Total memory: 44.70 GiB

0,1
Comm: tcp://127.0.0.1:34193,Total threads: 1
Dashboard: http://127.0.0.1:41009/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:41651,
Local directory: /tmp/dask-scratch-space/worker-o6l094ta,Local directory: /tmp/dask-scratch-space/worker-o6l094ta

0,1
Comm: tcp://127.0.0.1:37613,Total threads: 1
Dashboard: http://127.0.0.1:43235/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:38943,
Local directory: /tmp/dask-scratch-space/worker-i0olwhjx,Local directory: /tmp/dask-scratch-space/worker-i0olwhjx

0,1
Comm: tcp://127.0.0.1:40383,Total threads: 1
Dashboard: http://127.0.0.1:42613/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:40031,
Local directory: /tmp/dask-scratch-space/worker-x3oevbla,Local directory: /tmp/dask-scratch-space/worker-x3oevbla

0,1
Comm: tcp://127.0.0.1:38509,Total threads: 1
Dashboard: http://127.0.0.1:35787/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:44491,
Local directory: /tmp/dask-scratch-space/worker-7_ce6_yu,Local directory: /tmp/dask-scratch-space/worker-7_ce6_yu

0,1
Comm: tcp://127.0.0.1:41029,Total threads: 1
Dashboard: http://127.0.0.1:38365/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:41481,
Local directory: /tmp/dask-scratch-space/worker-8ac22vkv,Local directory: /tmp/dask-scratch-space/worker-8ac22vkv

0,1
Comm: tcp://127.0.0.1:42443,Total threads: 1
Dashboard: http://127.0.0.1:39087/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:38295,
Local directory: /tmp/dask-scratch-space/worker-nqqx4736,Local directory: /tmp/dask-scratch-space/worker-nqqx4736

0,1
Comm: tcp://127.0.0.1:40157,Total threads: 1
Dashboard: http://127.0.0.1:39053/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:35663,
Local directory: /tmp/dask-scratch-space/worker-_rbjx17_,Local directory: /tmp/dask-scratch-space/worker-_rbjx17_

0,1
Comm: tcp://127.0.0.1:39019,Total threads: 1
Dashboard: http://127.0.0.1:34815/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:37989,
Local directory: /tmp/dask-scratch-space/worker-nh_6lrew,Local directory: /tmp/dask-scratch-space/worker-nh_6lrew

0,1
Comm: tcp://127.0.0.1:40147,Total threads: 1
Dashboard: http://127.0.0.1:43911/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:38077,
Local directory: /tmp/dask-scratch-space/worker-z530psvl,Local directory: /tmp/dask-scratch-space/worker-z530psvl

0,1
Comm: tcp://127.0.0.1:42697,Total threads: 1
Dashboard: http://127.0.0.1:37567/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:33237,
Local directory: /tmp/dask-scratch-space/worker-vvocle7o,Local directory: /tmp/dask-scratch-space/worker-vvocle7o

0,1
Comm: tcp://127.0.0.1:35273,Total threads: 1
Dashboard: http://127.0.0.1:44173/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:36087,
Local directory: /tmp/dask-scratch-space/worker-yqb3qicp,Local directory: /tmp/dask-scratch-space/worker-yqb3qicp

0,1
Comm: tcp://127.0.0.1:40397,Total threads: 1
Dashboard: http://127.0.0.1:43781/status,Memory: 3.73 GiB
Nanny: tcp://127.0.0.1:46663,
Local directory: /tmp/dask-scratch-space/worker-vk9bi84a,Local directory: /tmp/dask-scratch-space/worker-vk9bi84a


In [8]:
%%time

(
    denoised,
    denoised_futures,
    nuclear_channel_futures,
) = segmentation.denoise_movie_parallel(
    nuclear_channel,
    denoising="gaussian",
    denoising_sigma=3,
    client=client,
)

mask, mask_futures, _ = segmentation.binarize_movie_parallel(
    denoised_futures,
    thresholding="global_otsu",
    closing_footprint=segmentation.ellipsoid(3, 3),
    client=client,
    futures_in=False,
)

markers, markers_futures, _ = segmentation.mark_movie_parallel(
    *nuclear_channel_futures,  # Wrapped in list from previous parallel run, needs unpacking
    mask_futures,
    low_sigma=[3, 5.5, 5.5],
    high_sigma=[10, 14.5, 14.5],
    max_footprint=((1, 25), segmentation.ellipsoid(3, 3)),
    max_diff=1,
    client=client,
    futures_in=False,
)

marker_coords = np.array(np.nonzero(markers)).T

labels, labels_futures, _ = segmentation.segment_movie_parallel(
    denoised_futures,
    markers_futures,
    mask_futures,
    watershed_method="raw",
    min_size=200,
    client=client,
    futures_in=False,
)

segmentation_dataframe = track_features.segmentation_df(
    labels,
    nuclear_channel,
    nuclear_channel_metadata,
)

tracked_dataframe = track_features.link_df(
    segmentation_dataframe,
    search_range=18,
    adaptive_stop=1,
    adaptive_step=0.99,
    memory=1,
    pos_columns=["x", "y"],
    t_column="frame_reverse",
    velocity_predict=True,
)

centroids = np.unique(
    np.array(
        [
            [row["frame"] - 1, int(row["z"]), int(row["y"]), int(row["x"])]
            for _, row in tracked_dataframe.iterrows()
        ]
    ),
    axis=0,
)

reordered_labels, _, _ = track_features.reorder_labels_parallel(
    labels_futures,
    tracked_dataframe,
    client=client,
    futures_in=False,
    futures_out=False,
)

Frame 167: 41 trajectories present.
CPU times: user 24.4 s, sys: 40.8 s, total: 1min 5s
Wall time: 3min 29s


Using the rule of thumb $r \approx \sigma \sqrt{2} \ (2D)$ and $r \approx \sigma \sqrt{3} \ (3D)$ as rough bounds for the kernels used for band-pass filtering seems to net a perfect segmentation.

In [9]:
viewer.add_labels(reordered_labels)

<Labels layer 'reordered_labels' at 0x7f5c8c596f80>

In [41]:
cluster.close()

Mitosis detection:
- Iterate over all tracks: 
    - For all tracks expand the search radius restricted to tracks that end within a couple frames of where that track ends.
    - Use total number of nuclei to narrow down center-frame of mitosis (can also use brightness if available).
    - For all tracks that start within a few frames of mitosis, check to see if there is a nearest-neighbor with matching features (e.g. antiparallel velocity vector).
        - Add option to populate dataframe with image of nucleus, could be used to add image feature (e.g. symmetry axis) or train ML model.
    - If nearest-neighbor is found, assign division frame and parent.
    - Iterate over rows to clip parent track and add identity to daughter nuclei for compatibility with reorder_labels.
    - Run reorder_labels and visualize division times.
- Two main approaches:
    - At start point of every track, predict position in previous frame of all particles within some search radius given velocity. Predict own position. If within some collision radius, assign a mitotic event and note the labels.
    - At start point of every track, make particle-indexed array with all points within some search radius. Add (1 - dot product of velocities) as an extra coordinate and find nearest-neighbor. If within some collision radius, assign a mitotic event. This method has the advantage that we can rescale the (1 - dot product of velocities)-coordinate arbitrarily to assign whatever weight we want to antiparallel velocities.

In [104]:
tracked_dataframe

Unnamed: 0,label,z,y,x,frame,t_s,t_frame,particle,v_z,v_y,v_x
0,1,14.909759,210.269216,504.405988,167,2848.309818,171,1,0.129783,-0.251747,-0.331569
1,2,14.958818,158.286138,456.895890,167,2848.348601,171,2,-0.034603,-0.521121,0.276433
2,3,14.825406,174.249374,157.196094,167,2848.243136,171,3,0.232748,-0.625031,0.689285
3,4,14.718019,215.220187,271.829508,167,2848.158244,171,4,0.088420,-0.406058,0.359737
4,5,14.889322,239.022783,62.278645,167,2848.293663,171,5,0.174325,-0.057268,0.558899
...,...,...,...,...,...,...,...,...,...,...,...
23586,39,13.309906,242.117774,29.797088,1,10.521756,0,267,-0.124809,-0.117792,-0.028492
23587,41,13.624967,35.222641,4.736303,1,10.770818,0,140,-0.298789,3.055246,0.006839
23588,42,14.084977,113.976705,68.034117,1,11.134466,0,100,-0.184273,2.324255,0.735423
23589,43,13.532374,143.066172,246.001071,1,10.697622,0,119,-0.555679,1.383463,2.206625


In [31]:
import pandas as pd
import numpy as np


def tracks_start_end(tracked_dataframe):
    """
    Uses input `tracked_dataframe` with tracking information to construct
    sub-dataframes with entries corresponing to the start and end of a particle track
    (i.e. the first and last frames it is present in respectively) and a sub-dataframe
    of all singletons (particles not connected in any other frames).
    :param tracked_dataframe: DataFrame of measured features after tracking with
        :func:`~link_dataframe`.
    :type linked_dataframe: pandas DataFrame
    :return: Tuple(`track_first_frames`, `track_last_frames`, `track_singletons`) where
        *`track_first_frames` contains all rows in the input `tracked_dataframe`
        corresponding to the start of a particle track.
        *`track_last_frames` contains all rows in the input `tracked_dataframe`
        corresponding to the end of a particle track.
        *`track_singletons` contains all rows in the input `tracked_dataframe`
        corresponding to disconnected particles.
    :rtype: Tuple of pandas DataFrames
    """
    first_frame = []
    last_frame = []
    singletons = []

    for particle_group in tracked_dataframe.groupby("particle"):
        _, particle = particle_group
        if particle.shape[0] == 1:
            singletons.append(particle.index[0])
        else:
            first_frame.append(particle["frame"].idxmin())
            last_frame.append(particle["frame"].idxmax())

    track_first_frames = tracked_dataframe.iloc[first_frame]
    track_last_frames = tracked_dataframe.iloc[last_frame]
    track_singletons = tracked_dataframe.iloc[singletons]

    return track_first_frames, track_last_frames, track_singletons


def _find_sibling(
    tracked_dataframe, track_start_row, search_range_mitosis, antiparallel_threshold
):
    """
    Finds the index in `tracked_dataframe` of the likeliest sibling of a new particle
    (given by the row `track_start_row` of `tracked_dataframe`). This determines
    candidate siblings based on proximity (within a cuboid of dimensions set by
    `search_range_mitosis` of the particle centroid) and antiparallel velocity vectors
    as determined by a thresholded normalized dot product (an `antiparallel_threshold`
    value of 0 corresponds to perfectly antiparallel vectors, 1 corresponds to
    orthogonal vectors). Within any remaining candidates, the nearest-neighbor is
    returned.
    """
    # Select subdataframe for first frame of this particle
    frame = track_start_row["frame"]
    position = np.array([track_start_row[pos] for pos in pos_columns])
    vel_columns = ["".join(["v_", pos]) for pos in pos_columns]
    direction_vector = np.array([track_start_row[vel] for vel in vel_columns])
    direction_vector /= np.linalg.norm(direction_vector)
    frame_subdataframe = tracked_dataframe[tracked_dataframe["frame"] == frame]

    # Select for points within some search cube of the new particle
    for i, pos in enumerate(pos_columns):
        frame_subdataframe = frame_subdataframe[
            (frame_subdataframe[pos] - position[i]).abs() < search_range_mitosis[i]
        ]

    # Select for tracks with antiparallel velocities within some threshold
    candidate_velocities = frame_subdataframe[vel_columns]
    candidate_direction_vectors = candidate_velocities.divide(
        candidate_velocities.apply(np.linalg.norm, axis=1), axis=0
    )

    dot_product_complement = 1 + candidate_direction_vectors.dot(direction_vector)

    frame_subdataframe = frame_subdataframe[
        dot_product_complement < antiparallel_threshold
    ]

    # Pick nearest-neighbor of the remaining particles to find the sibling
    if not frame_subdataframe.empty:
        sibling_index = (
            (frame_subdataframe[pos_columns] - position)
            .apply(np.linalg.norm, axis=1)
            .idxmin()
        )
    else:
        sibling_index = None

    return sibling_index


def _assign_siblings(tracked_dataframe, search_range_mitosis, antiparallel_threshold):
    """
    Returns an (2, n)-shape ndarray with each element along the 0-th axis corresponding
    respectively to the index in `tracked_dataframe` of a new track and to the index
    of its sibling. This can then be used to construct lineages.
    """
    track_first_frames, _, _ = tracks_start_end(tracked_dataframe)
    track_start_index = []
    siblings = []
    for i, track_start_row in track_first_frames.iterrows():
        track_siblings = _find_sibling(
            tracked_dataframe,
            track_start_row,
            search_range_mitosis,
            antiparallel_threshold,
        )
        if track_siblings is not None:
            track_start_index.append(i)
            siblings.append(track_siblings)
    sibling_array = np.array([track_start_index, siblings]).T
    return sibling_array

In [32]:
pos_columns = ["y", "x"]
antiparallel_threshold = 0.1
search_range_mitosis = [30, 30]

In [33]:
sibling_array = _assign_siblings(
    tracked_dataframe, search_range_mitosis, antiparallel_threshold
)

In [121]:
mitosis_dataframe = tracked_dataframe.copy()

# Assign parent by indexing over dataframe
parent_index_array = []
for sibling_pair in sibling_array:
    parent_particle = mitosis_dataframe["particle"].iloc[sibling_pair[1]]
    parent_subdataframe = mitosis_dataframe["particle"] == parent_particle

    frame = mitosis_dataframe["frame"].iloc[sibling_pair[1]]
    parent_track = mitosis_dataframe["frame"].loc[parent_subdataframe] < frame

    parent_frame_series = mitosis_dataframe["frame"].iloc[parent_track.index]
    if parent_frame_series.empty:
        parent_index = np.nan
    else:
        parent_index = parent_frame_series.idxmax()
    parent_index_array.append(parent_index)

parent_index_array = np.array(parent_index_array)

# Assign new labels to siblings and clip parent track
new_label = tracked_dataframe["particle"].max() + 1
for sibling_pair in sibling_array:
    old_label = mitosis_dataframe["particle"].iloc[sibling_pair[1]]
    sibling_subdataframe = mitosis_dataframe["particle"] == old_label
    division_frame = mitosis_dataframe["frame"].iloc[sibling_pair[0]]
    sibling_new_track = (
        mitosis_dataframe["frame"].iloc[sibling_subdataframe.index] >= division_frame
    )
    sibling_new_track["particle"] = new_label
    new_label += 1

# Add parents to sibling pairs
mitosis_dataframe["parent"] = np.nan
for i, sibling_pair in enumerate(sibling_array):
    if not np.isnan(parent_index_array[i]):
        parent_label = mitosis_dataframe["particle"].iloc[int(parent_index_array[i])]
        for child in sibling_pair:
            mitosis_dataframe.loc[child, "parent"] = parent_label

mitosis_dataframe['parent'] = mitosis_dataframe['parent'].astype('Int64')

In [122]:
mitosis_dataframe[np.logical_not(np.isnan(mitosis_dataframe['parent']))]

Unnamed: 0,label,z,y,x,frame,t_s,t_frame,particle,v_z,v_y,v_x,parent
440,102,15.990534,3.605649,168.009263,165,2815.930548,169,59,0.405986,-0.117941,1.060810,59
504,168,18.465973,1.176004,143.877537,165,2817.887434,169,168,0.171907,0.011458,-0.175257,59
3356,135,13.839709,245.098607,149.497504,148,2532.141498,152,8,-0.599597,-0.339941,-0.833768,8
3393,172,16.160280,254.198500,169.970170,148,2533.975958,152,181,-0.631869,0.282688,0.299041,8
13132,62,12.629363,149.975359,32.177366,93,1617.720205,97,105,-1.249354,-0.452568,0.101457,105
...,...,...,...,...,...,...,...,...,...,...,...,...
23336,45,8.514699,40.322946,412.277578,7,135.901138,8,72,-0.496520,8.798116,-3.445656,72
23338,47,7.694001,57.290917,404.681395,7,135.252359,8,78,1.154044,-6.473850,0.321599,72
23342,51,8.241069,187.697176,440.946089,7,135.684828,8,155,-0.039536,-4.876377,-7.706510,155
23345,54,8.796984,175.387950,426.578940,7,136.124290,8,40,-0.798535,4.951389,5.289274,155


In [230]:
track_first_frames, track_last_frames, track_singletons = tracks_start_end(
    tracked_dataframe
)