# How to use this notebook

**For a test run**:
- In the "Kernel" menu, select "Restart and Clear Output".
- Then set up the cells and run them one at a time with Shift + Enter.

**If all cells are correctly set up**: just press "Run".


This notebook processes the output from CellProfiler (CSV table with the track data & measurements) and aligns the tracks according to a reference time.

The notebook walks step by step through the processing of the track tables obtained from CellProfiler.
The functions used to process the tracks are defined in the module trackprocessor.py.

# Import Required Packages

In [1]:
# import necessary packages
import os
import glob
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
from tqdm import tqdm

import networkx as nx
from networkx.drawing.nx_agraph import to_agraph 

from skimage import measure

from functools import partial
import pathos.pools as pp
import dill

import trackprocessor

In [2]:
import importlib

In [3]:
importlib.reload(trackprocessor);

# Environment configuration

In [4]:
%matplotlib notebook

In [5]:
plt.ioff() # turn interactive plotting off

In [6]:
plt.rcParams.update({'figure.max_open_warning': 0}) # ignore max plotted figures warning

In [7]:
dill.settings['recurse'] = True

# Parameter settings

## Input and output folders

In [8]:
# NOTE: do not end the path string with a separator ("\" for Windows or "/" for Linux or macOS)
# Windows example: C:\blabla\blabla
# Linux or macOS example: /home/blabla/blabla
base_input_path = r"/media/mphan/Data/Perso/Phan/LOB/NucleoTeloTrack/2020-09_RUN3_CP4.0.3/siCTRL_20190524_Pos02_cl16"
base_output_path = r"/media/mphan/Data/Perso/Phan/LOB/NucleoTeloTrack/Output3"

## Reading CSV files from the input folder

In [9]:
# Create the "Movies" subfolder (no error if it already exists)
base_output_spath = os.path.join(base_output_path,"Movies")
os.makedirs(base_output_spath, exist_ok=True)

# Find all CSV files in the input folder, including subfolders
input_files = glob.glob(os.path.join(base_input_path,"**/*.csv"),recursive=True)
print("nb. of files:",len(input_files))
# List each file with its index, shown relative to base_input_path
for i, input_file in enumerate(input_files):
    print(i,":",input_file.split(base_input_path)[1])

nb. of files: 3
0 : /Image.csv
1 : /Nuclei.csv
2 : /Telomere.csv


## State transition labels

**State labels** are the classification labels assigned by the CellProfiler pipeline.
In our case, we used 5 labels: interphase, prophase, prometaphase, metaphase and anaphase.
Each label has a unique **state number**, which is used to define the allowed **state transitions** from one state to another in the **transition rule graph (section 2.3)**.

It is also possible to use more than 5 labels, provided the transition rule graph is changed accordingly.

In [10]:
# Define state labels and their state numbers
state_labels = ["interphase","prophase","prometaphase","metaphase","anaphase"]
numbers = np.arange(len(state_labels))+1 # the number is assigned automatically in increasing order from 1
state_numbers = pd.Series(index=state_labels,data=numbers)
print(state_numbers);

interphase      1
prophase        2
prometaphase    3
metaphase       4
anaphase        5
dtype: int64
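
The label-to-number mapping above can be applied to a whole column of labels at once. The following is a minimal sketch, assuming a hypothetical series `track_labels` that holds the classification label of each time point of one track (this name is illustrative only and is not part of the notebook or of the CellProfiler output):

```python
import numpy as np
import pandas as pd

state_labels = ["interphase", "prophase", "prometaphase", "metaphase", "anaphase"]
state_numbers = pd.Series(index=state_labels, data=np.arange(len(state_labels)) + 1)

# Hypothetical labels of one track, one entry per time point
track_labels = pd.Series(["interphase", "prophase", "metaphase", "anaphase"])

# Series.map looks each label up in the index of state_numbers
track_numbers = track_labels.map(state_numbers)

# Reverse lookup (state number -> label), handy when reading results
number_to_label = pd.Series(state_numbers.index, index=state_numbers.values)

print(track_numbers.tolist())
print(number_to_label[4])
```

Keeping both directions of the mapping as `pd.Series` means that adding a sixth label only requires extending `state_labels`.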


## Transition rule graph

Below you can define the authorized transitions. For instance:

**interphase** can transition to **prophase** or **prometaphase** but not to other states (apart from staying in interphase).

You can define all authorized transitions one after another and review them in the graph, which is saved as a figure.

In [11]:
# Initialize a directed multigraph
# (OrderedMultiDiGraph keeps the insertion order of nodes; it was removed in NetworkX 3.0, where nx.MultiDiGraph can be used instead)
G=nx.OrderedMultiDiGraph()

# Add nodes
G.add_nodes_from(state_labels);

# Add self-transitions (a state may follow itself)
G.add_edges_from(list(zip(state_labels,state_labels)));

# Define transition rules
G.add_edges_from([("interphase",item) for item in ["prophase","prometaphase"]]);
G.add_edges_from([("prophase",item) for item in ["prometaphase","metaphase"]]);
G.add_edges_from([("prometaphase",item) for item in ["prophase","metaphase"]]);
G.add_edges_from([("metaphase",item) for item in ["prometaphase","anaphase"]]);
G.add_edges_from([("anaphase",item) for item in ["interphase"]]);

In [12]:
# Draw the transition rule graph and save it as a figure
fig = plt.figure(figsize=(10,7))
ax = fig.add_subplot(111)
pos = nx.circular_layout(G)
nx.draw_networkx(G, pos=pos, ax=ax, width=1, arrowsize=20, 
                 min_source_margin=50, min_target_margin=50,
                 node_shape="s", node_color="none")
fig.tight_layout() # adjust the layout before saving so that the saved image uses it
fig.savefig(os.path.join(base_output_path,"transistion_rule.png"))
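
To illustrate how such a rule graph can be used, the sketch below checks a sequence of states against the same rules and reports the steps that are not authorized. It is only an illustration: `forbidden_transitions`, `rules` and `G_demo` are hypothetical names, a plain `nx.DiGraph` is used instead of the multigraph above, and this is not necessarily how trackprocessor.py applies the rules.

```python
import networkx as nx

state_labels = ["interphase", "prophase", "prometaphase", "metaphase", "anaphase"]
rules = {
    "interphase": ["prophase", "prometaphase"],
    "prophase": ["prometaphase", "metaphase"],
    "prometaphase": ["prophase", "metaphase"],
    "metaphase": ["prometaphase", "anaphase"],
    "anaphase": ["interphase"],
}

# Rebuild the rule graph: nodes, self-transitions, then authorized transitions
G_demo = nx.DiGraph()
G_demo.add_nodes_from(state_labels)
G_demo.add_edges_from((s, s) for s in state_labels)
for src, targets in rules.items():
    G_demo.add_edges_from((src, dst) for dst in targets)

def forbidden_transitions(track, graph):
    """Return (position, from_state, to_state) for every step that has no edge in graph."""
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(track, track[1:]))
            if not graph.has_edge(a, b)]

ok_track = ["interphase", "prophase", "metaphase", "anaphase", "interphase"]
bad_track = ["interphase", "metaphase", "anaphase"]
print(forbidden_transitions(ok_track, G_demo))
print(forbidden_transitions(bad_track, G_demo))
```

The position returned for each forbidden step is the index of the state that was reached, which makes it easy to locate the offending time point in a track.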

### Test remove transition functions

## Excluded border conditions

There are two options for removing objects that touch the image border:
1. **Circle (percentage argument)**: for _CellProfiler versions <4.0_, using the measurements "AreaShape_Center_X", "AreaShape_Center_Y", "AreaShape_MinorAxisLength" and "AreaShape_MajorAxisLength".
This method approximates each object as a circle based on these measurements, and the user specifies a cutoff fraction below which the object is excluded. For instance, with the criterion "circle" and a percentage of 0.8, objects with less than 80% of their area inside the frame are excluded (i.e. more than 20% of the object lies outside the frame).

2. **Bounding box**: for _CellProfiler versions >=4.0_. The BoundingBoxMaximum and BoundingBoxMinimum coordinates were added in v4.0 and can be used to remove objects whose bounding box intersects the image boundaries.

In [13]:
# Define the condition for excluding objects at the image border
# criterion can be "bbox" or "circle"
# if criterion is "bbox": {"criterion":"bbox"}
# if criterion is "circle", you can set percentage, e.g. {"criterion":"circle", "percentage":0.8} excludes objects with less than 80% of the circle area inside the frame
exclude_borderobjs_conds = {"criterion":"circle", "bymajor":True, "percentage":0.7}
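
The bounding-box criterion can be sketched as follows. This is a hedged illustration, not the code in trackprocessor.py: the helper `touches_border`, the image size and the toy values are hypothetical, the column names are assumed to follow the CellProfiler 4 `AreaShape_BoundingBox...` naming, and treating a maximum coordinate equal to the image size as "touching" is an assumption.

```python
import pandas as pd

def touches_border(objects, image_width, image_height):
    """Boolean mask: True where an object's bounding box reaches an image edge."""
    return (
        (objects["AreaShape_BoundingBoxMinimum_X"] <= 0)
        | (objects["AreaShape_BoundingBoxMinimum_Y"] <= 0)
        | (objects["AreaShape_BoundingBoxMaximum_X"] >= image_width)
        | (objects["AreaShape_BoundingBoxMaximum_Y"] >= image_height)
    )

# Toy table with three objects in a 512 x 512 image (made-up values)
objects = pd.DataFrame({
    "ObjectNumber": [1, 2, 3],
    "AreaShape_BoundingBoxMinimum_X": [0, 40, 480],
    "AreaShape_BoundingBoxMinimum_Y": [10, 60, 200],
    "AreaShape_BoundingBoxMaximum_X": [30, 90, 512],
    "AreaShape_BoundingBoxMaximum_Y": [50, 120, 260],
})

# Keep only objects that lie fully inside the frame
kept = objects[~touches_border(objects, image_width=512, image_height=512)]
print(kept["ObjectNumber"].tolist())
```

Object 1 touches the left edge and object 3 the right edge, so only object 2 is kept.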

## Alignment conditions

The rules are applied in order of priority:
1. If the track goes through metaphase, the last metaphase is used as time point 0.
2. Otherwise, if the track starts with anaphase, the first anaphase is assigned time point 1.


**NOTE**: this alignment can be changed, e.g. for reversine experiments or to use prophase as the reference time.


In [14]:
# Define rule for aligning time points
align_conds={"state_numbers":[state_numbers["metaphase"],state_numbers["anaphase"]],
             "align_modes":["last","first"],
             "shifts":[0,1]}
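
The effect of these alignment rules can be illustrated on a toy track. The function below is a minimal sketch written for this explanation only; `align_timepoints` is a hypothetical name and the actual implementation in trackprocessor.py may differ. The rules are tried in order, and the first one whose state occurs in the track defines the reference time point.

```python
import numpy as np

def align_timepoints(states, state_numbers, align_modes, shifts):
    """Return relative time points for one track, or None if no rule applies."""
    states = np.asarray(states)
    for state, mode, shift in zip(state_numbers, align_modes, shifts):
        hits = np.flatnonzero(states == state)
        if hits.size == 0:
            continue  # this state never occurs, try the next rule
        ref = hits[-1] if mode == "last" else hits[0]
        # The reference frame receives the value `shift`
        return np.arange(len(states)) - ref + shift
    return None

# Same structure as align_conds above: metaphase = 4, anaphase = 5
conds = {"state_numbers": [4, 5], "align_modes": ["last", "first"], "shifts": [0, 1]}

track_a = [1, 2, 3, 4, 4, 5, 5]   # goes through metaphase
track_b = [5, 5, 1]               # starts with anaphase, no metaphase
track_c = [1, 1]                  # neither state present

print(align_timepoints(track_a, **conds))
print(align_timepoints(track_b, **conds))
print(align_timepoints(track_c, **conds))
```

With a shift of 0 for the last metaphase and 1 for the first anaphase, both rules place the first anaphase frame of a complete track at time point 1, so tracks aligned by either rule share the same time axis.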

## Features

In [15]:
# Define the features that will be added after alignment
features = ["ImageNumber","ObjectNumber","TrackObjects_Label",
            "AreaShape_Area",
            "AreaShape_Perimeter",
            "AreaShape_FormFactor",
            "Intensity_IntegratedIntensity_H2B_Smooth",
            "Intensity_IntegratedIntensity_TRF1_Smooth",
            "Intensity_MeanIntensity_H2B_Smooth",
            "Intensity_MeanIntensity_TRF1_Smooth",
            "Mean_Telomere_AreaShape_Area",
            "Mean_Telomere_AreaShape_Perimeter",
            "Mean_Telomere_Distance_Minimum_Nuclei",
            "Mean_Telomere_Distance_Centroid_Nuclei",
            "Mean_Telomere_Intensity_IntegratedIntensity_TRF1_Smooth",
            "Children_Telomere_Count"]

# Processing

## Process a specific file

This section is used to rerun or test a specific file. Set the following 2 cells to "Code" to run them, or to "Raw NBConvert" to skip them.

Otherwise, go to the next section.

## Process all files

You can set the minimum number of time points per track with `min_nb_timepoints`; for example, a value of 5 means that only tracks with at least 5 time points are used.

In [16]:
def compact_func(f,base_input_path,base_output_spath,
                 features,transistion_graph,
                 nrows_limit,min_nb_timepoints,
                 exclude_borderobjs_conds,align_conds):
    
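    # Configure the output path: mirror the location of the file relative to
    # base_input_path, without the ".csv" extension (e.g. "/Nuclei.csv" -> <base_output_spath>/Nuclei)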
    output_path = base_output_spath
    basestr = f.split(base_input_path)[1].split('.csv')[0]
    for name in basestr.split(os.sep):
        if name != "":
            output_path = os.path.join(output_path,name)
    
#     try:
    trackprocessor.process_data(f,output_path,features,transistion_graph,
                          nrows_limit,min_nb_timepoints,
                          exclude_borderobjs_conds,align_conds)
#     except:
#         return (False,basestr)
    
#     return (True,basestr)
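
The output path built in `compact_func` mirrors the position of each CSV file below the input folder. As an alternative sketch, the same mapping can be written with `os.path.relpath` and `os.path.splitext`, which avoids splitting on substrings of the path. The helper name `mirrored_output_path` and the example folders are hypothetical.

```python
import os

def mirrored_output_path(csv_file, input_root, output_root):
    """Map <input_root>/sub/Name.csv to <output_root>/sub/Name."""
    rel = os.path.relpath(csv_file, start=input_root)  # path below input_root
    rel_no_ext, _ = os.path.splitext(rel)               # drop the ".csv" extension
    return os.path.join(output_root, rel_no_ext)

# Made-up folders, built with os.path.join so the example is OS-independent
root_in = os.path.join("data", "in")
root_out = os.path.join("data", "out", "Movies")
csv_file = os.path.join(root_in, "sub", "Nuclei.csv")

print(mirrored_output_path(csv_file, root_in, root_out))
```

Unlike splitting on `'.csv'`, this only removes the final extension, so a folder name that happens to contain ".csv" is left intact.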

In [17]:
partial_func = partial(compact_func,
                       base_input_path=base_input_path,base_output_spath=base_output_spath,
                       features=features,transistion_graph=G,
                       nrows_limit=30,min_nb_timepoints=2,
                       exclude_borderobjs_conds=exclude_borderobjs_conds,align_conds=align_conds)

In [18]:
partial_func(input_files[1])

149
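
To process every file rather than a single one, the function can be applied in a loop that records which files succeeded, in the spirit of the commented-out `try`/`except` in `compact_func`. The sketch below is an assumption about how this could look: `run_all` and `fake_process` are hypothetical names, and a plain sequential loop is used here rather than a process pool.

```python
def run_all(files, func):
    """Apply func to each file and return (succeeded, file, error_message) per file."""
    results = []
    for f in files:
        try:
            func(f)
            results.append((True, f, ""))
        except Exception as err:
            # Record the failure and continue, so one bad file does not stop the batch
            results.append((False, f, repr(err)))
    return results

# Toy stand-in for partial_func: fails on files without track data
def fake_process(path):
    if "Image" in path:
        raise ValueError("no track columns")

report = run_all(["/Image.csv", "/Nuclei.csv"], fake_process)
for ok, name, msg in report:
    print("OK  " if ok else "FAIL", name, msg)
```

In this notebook the call would presumably be `run_all(input_files, partial_func)`; the returned list can then be used to rerun only the files that failed.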
