# Introduction

This notebook applies the moving region segmentation analysis to a single experiment. Each experiment contains HBEC ciliated cells from the same culture, all imaged in the same concentration of medium (mucin). Within an experiment, groups of cells have been subjected to CRISPR knockdown of various genes. The groups are:

- Negative control (NT)
- fam13a gAA
- fam13a g1
- Positive control (DNA_gAA)

An experiment contains videos from up to 24 wells, and each video contains multiple replicates of the groups listed above. The videos record the movement of fluorescent beads at a frame rate of 10 fps.

NOTE: the entire experiment must be processed at once, because the threshold value is calculated globally across all the videos in a single experiment.

### Moving Region Segmentation

In this notebook, the moving regions for each well are identified and saved as arrays for further processing. The steps in this calculation are as follows:

For each video of each well, a region of interest (ROI) and a max frame are identified. The ROI marks the boundary of the well. The max frame is the maximum intensity projection along the time axis.

1. ROI calculation
    - The maximum intensity projection of the video (each pixel's brightest value across frames) is passed to the function `image.roi.circle`. This function uses the Sobel operator and a Hough transform to return a circular ROI.
2. Max frame calculation
    - For each video, take the brightest pixel across frames (so the time dimension) to obtain a static image with the maximum projection view.
    - Subtract the per-pixel median across frames (the time axis), an estimate of the static background, to emphasise the moving regions
    - Apply an Otsu threshold, computed globally across all videos in the experiment, to split the pixels into two classes and highlight trails of movement
    - Fill in the moving region by voting over overlapping patches and averaging the votes, as described in the `segment` function
3. Combine the two masks obtained above (pixels outside the ROI are set to 0) to return the moving region mask
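The max-frame and masking steps above can be illustrated with a minimal, self-contained NumPy sketch on a synthetic video. It assumes frames are stacked as `(time, height, width)`; `max_minus_median` and `circular_roi` are hypothetical helpers, the latter a stand-in for `image.roi.circle` that takes a known centre and radius rather than detecting the well, and the `> 0` threshold is a toy choice in place of the global Otsu value.

```python
import numpy as np


def max_minus_median(frames: np.ndarray) -> np.ndarray:
    """Maximum intensity projection minus the per-pixel median over time.

    `frames` has shape (time, height, width). A static background appears in most
    frames, so the median approximates it; a bead that passes a pixel only briefly
    raises that pixel's maximum but not its median.
    """
    return frames.max(axis=0) - np.median(frames, axis=0)


def circular_roi(shape, centre, radius) -> np.ndarray:
    """Boolean mask, True inside a circle (hypothetical stand-in for image.roi.circle)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - centre[0]) ** 2 + (cols - centre[1]) ** 2 <= radius ** 2


# synthetic 20-frame, 32x32 video with a constant background of 10
T, H, W = 20, 32, 32
frames = np.full((T, H, W), 10.0)
for t in range(T):
    frames[t, 16, 5 + t] = 200.0   # a bead moving one column per frame along row 16
frames[0, 0, 0] = 200.0            # a one-off bright spot outside the well

projection = max_minus_median(frames)
roi = circular_roi((H, W), centre=(16, 16), radius=12)
# combine the masks: toy threshold of 0 instead of the global Otsu value, restricted to the ROI
moving = (projection > 0) & roi
```

The bead trail survives the background subtraction, while the bright spot in the corner is removed only because it falls outside the ROI.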

# Imports

In [None]:
%reload_ext autoreload
%autoreload 2

import numpy as np
import os
from matplotlib import pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from tqdm import tqdm
import skimage.filters  # import the submodule explicitly so `skimage.filters.threshold_otsu` is available
from joblib import Parallel, delayed

from fam13a import consts, utils, track, image

# Declare constants

In [None]:
PROJ_ROOT = utils.here()
# declare the data input directory
HBEC_ROOT = os.path.join(PROJ_ROOT, 'data', 'interim', 'hbec')

print(os.listdir(HBEC_ROOT))

In [None]:
# set the experiment data to process
EXP_ID = 'N67030-59_ON'
# declare the various output directories
DATA_ROOT = os.path.join(HBEC_ROOT, EXP_ID)
EXP_ROOT = os.path.join(PROJ_ROOT, 'data', 'processed', 'hbec', EXP_ID)

ROI_OUTPUT_ROOT = os.path.join(EXP_ROOT, 'roi')
SEG_OUTPUT_ROOT = os.path.join(EXP_ROOT, 'segmented', 'movement')
NOISY_OUTPUT_ROOT = os.path.join(EXP_ROOT, 'segmented', 'noisy')
MAX_FRAME_OUTPUT_ROOT = os.path.join(EXP_ROOT, 'max_frame')

for root in [ROI_OUTPUT_ROOT, SEG_OUTPUT_ROOT, NOISY_OUTPUT_ROOT, MAX_FRAME_OUTPUT_ROOT]:
    os.makedirs(root, exist_ok=True)

# set level of parallelisation
NCPUS = 7
# define the common file extension used in the input data files
EXTENSION = '.ome.tif'

# find all relevant data files in the data directory
files = sorted([_f for _f in os.listdir(DATA_ROOT) if _f.endswith(EXTENSION)])
# remove the extension from the file names, so we keep only the part with useful information
files = [_f.split(EXTENSION)[0] for _f in files]
print(files)

# Setup

Define some helper functions for convenience and to facilitate parallelisation. These are not declared in `src`, as they are tightly coupled to this segmentation process.

In [None]:
def clean(_file):
    """Return the circular ROI mask and the background-subtracted max frame for one video."""
    frames = utils.frames_from_stack(os.path.join(DATA_ROOT, f'{_file}{EXTENSION}'))
    # maximum intensity projection across the time axis
    max_proj = frames.max(axis=0)
    # find the ROI, assuming it is a circle
    roi = image.roi.circle(max_proj)

    # the per-pixel median over time is used as an estimate of the background;
    # it is subtracted in order to emphasise the moving regions of the video
    max_frame = max_proj - np.median(frames, axis=0)
    return roi, max_frame


def segment(_file, roi, max_frame, thresh_value):
    
    # apply the global Otsu threshold and set all pixels outside the ROI to 0
    noisy_mask = (max_frame > thresh_value)
    noisy_mask[~roi] = 0
    
    # we now need to cleanup the mask, and to do this we take overlapping patches from the noisy mask
    # then for each patch if a large enough portion is segmented we set the whole patch to be segmented
    # then recombine all patches, with the overlaps giving multiple predictions (segmented or not) per pixel
    # and take the average of all those predictions
    
    # use 'reflect' mode to account for patches at the boundary of the image, want to get a consistent number
    # of overlaps for each pixel
    patched_mask = image.patch.extract(noisy_mask, image.consts.HBEC_WINDOW_SIZE,
                                       image.consts.HBEC_STEPS_SIZE, mode='reflect')
    # for each patch calculate the segmentation ratio and then threshold
    # this gives a prediction for the entire patch of segmented or not
    for idx, patch in enumerate(patched_mask):
        patched_mask[idx] = patch.sum()/patch.size > image.consts.HBEC_RATIO_PER_PATCH_THRESH
    patched_mask = patched_mask.astype(int)
    
    # merge the patches back into a single image, taking the average value for overlapping pixels
    merged = image.patch.merge(patched_mask, noisy_mask.shape, image.consts.HBEC_STEPS_SIZE, padded=True)
    # round the averaged pixel values: a pixel is segmented only if more than half of its
    # predictions mark it as segmented (np.round rounds halves to even, so exactly 0.5 becomes 0)
    merged = np.round(merged)
    
    # save the max_frame, ROI, initial segmentation, and cleaned segmentation as numpy arrays
    np.save(os.path.join(SEG_OUTPUT_ROOT, f'{_file}.npy'), merged)
    np.save(os.path.join(ROI_OUTPUT_ROOT, f'{_file}.npy'), roi)
    np.save(os.path.join(MAX_FRAME_OUTPUT_ROOT, f'{_file}.npy'), max_frame)
    np.save(os.path.join(NOISY_OUTPUT_ROOT, f'{_file}.npy'), noisy_mask)
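The patch-based cleanup in `segment` relies on `image.patch.extract` and `image.patch.merge`, whose internals are not shown here. The following is a minimal NumPy sketch of the same idea under stated assumptions: `patch_vote` is a hypothetical helper, the padding width of `window - step` and the parameter values are illustrative rather than the `HBEC_*` constants, and each patch votes on all of its pixels before the votes are averaged and rounded.

```python
import numpy as np


def patch_vote(noisy_mask: np.ndarray, window: int, step: int, ratio_thresh: float) -> np.ndarray:
    """Clean a 2D boolean mask by voting over overlapping square patches (hypothetical helper)."""
    h, w = noisy_mask.shape
    pad = window - step  # assumption: reflect-pad so pixels near the border are also covered by overlapping patches
    padded = np.pad(noisy_mask.astype(float), pad, mode='reflect')
    votes = np.zeros_like(padded)
    counts = np.zeros_like(padded)
    for r in range(0, padded.shape[0] - window + 1, step):
        for c in range(0, padded.shape[1] - window + 1, step):
            patch = padded[r:r + window, c:c + window]
            # the whole patch is predicted as segmented if a large enough portion of it is segmented
            vote = float(patch.mean() > ratio_thresh)
            votes[r:r + window, c:c + window] += vote
            counts[r:r + window, c:c + window] += 1
    # average the predictions per pixel (pixels covered by no patch stay at 0)
    averaged = votes / np.maximum(counts, 1)
    # crop back to the original image and round: exactly 0.5 rounds to 0 (half to even)
    return np.round(averaged[pad:pad + h, pad:pad + w])


blob = np.zeros((32, 32), dtype=bool)
blob[8:24, 8:24] = True          # a dense moving region
noisy = blob.copy()
noisy[2, 28] = True              # an isolated noise pixel

cleaned = patch_vote(noisy, window=8, step=4, ratio_thresh=0.3)
```

With these parameters the isolated noise pixel is removed and the 16x16 blob survives unchanged; the ring of pixels just outside the blob receives at most half of the votes and is dropped because `np.round(0.5)` is `0.0`.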

# Processing

In [None]:
# extract the ROI and max_frame from each video
# parallelise the processing across the different files/videos
# with NCPUS=7, the total time to run across 24 wells is 9 min
with Parallel(n_jobs=NCPUS, verbose=20) as par:
    cleaned = par(delayed(clean)(_file) for _file in files)

# split the ROIs and max_frames into separate lists
rois, max_frames = list(zip(*cleaned))
# convert list of max_frames into a single numpy array as required by the downstream process
max_frames = np.stack(max_frames)

In [None]:
# calculate a single global threshold over the pixels of all max_frames using Otsu's method
thr_val = skimage.filters.threshold_otsu(max_frames)
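To show why a single pooled threshold is used, here is a re-implementation of Otsu's histogram method in plain NumPy, for illustration only; it is not the scikit-image source and may differ in details such as bin handling. `otsu_threshold` is a hypothetical helper, and the synthetic stack is an assumption: two videos with bead trails and one with only background noise.

```python
import numpy as np


def otsu_threshold(values: np.ndarray, nbins: int = 256) -> float:
    """Otsu's method: the bin centre that maximises the between-class variance."""
    counts, edges = np.histogram(values.ravel(), bins=nbins)
    centres = (edges[:-1] + edges[1:]) / 2
    weight_bg = np.cumsum(counts)              # pixels in bins 0..i
    weight_fg = np.cumsum(counts[::-1])[::-1]  # pixels in bins i..end
    mean_bg = np.cumsum(counts * centres) / np.maximum(weight_bg, 1)
    mean_fg = (np.cumsum((counts * centres)[::-1]) / np.maximum(weight_fg[::-1], 1))[::-1]
    # between-class variance for a split between bin i and bin i + 1
    variance = weight_bg[:-1] * weight_fg[1:] * (mean_bg[:-1] - mean_fg[1:]) ** 2
    return float(centres[np.argmax(variance)])


rng = np.random.default_rng(0)
max_frames = rng.uniform(0, 5, size=(3, 64, 64))                   # background noise in every video
max_frames[:2, 20:30, :] = rng.uniform(50, 60, size=(2, 10, 64))   # bead trails in videos 0 and 1 only

thr_global = otsu_threshold(max_frames)   # pooled over the whole experiment
still = max_frames[2]
thr_still = otsu_threshold(still)         # per-video threshold on the video without movement
```

On the still video, a per-video threshold falls in the middle of the noise and marks roughly half of its pixels as moving, whereas the pooled threshold sits near the upper edge of the noise range, so almost none of them are.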

In [None]:
# ensure all the output directories exist
for root in [SEG_OUTPUT_ROOT, ROI_OUTPUT_ROOT, MAX_FRAME_OUTPUT_ROOT, NOISY_OUTPUT_ROOT]:
    os.makedirs(os.path.abspath(root), exist_ok=True)

In [None]:
# run the segmentation process for each ROI, max_frame pair
# parallelised across ROIs/max_frames
# with NCPUS=7, the total time to run across 24 wells is 30 sec
with Parallel(n_jobs=NCPUS, verbose=20) as par:
    par(delayed(segment)(_file, roi, max_frame, thr_val) for _file, roi, max_frame in zip(files, rois, max_frames))