# Image Analysis Workspace


### Import modules

This notebook executes all functions of the image_analysis tools and allows the user to visualize and save the results. The convention for describing the workspace is as follows:

$\textbf{bold text}$  :    to describe path


$\color{blue}{blue \hspace{0.2cm} text}$  :    to describe functions


$\color{red}{red \hspace{0.2cm} text}$  :    to describe variables

The user has to run the cell below to load the image analysis tools and the other necessary packages to load and read the raw image.

In [None]:
from trackscopy_fluorescence import image_analysis_tools as im
from trackscopy_fluorescence import swim_mode_analysis_tools as sm

import os
import numpy as np
import tifffile as tf
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
import pickle

## Specify path of raw-image

The cell below loads the red channel and the green channel of the raw image. However, the analysis works on the two channels separately, so the user may load them separately in any way, as long as each channel is loaded as a numpy array.

In the cell below, an example syntax is provided for a raw image that has not been split and contains only the red and green channels.

The user has to update the path of raw image and the extensions. Then the user has to run the cell below to load it. 

Note that running the cell below will automatically create a sub-directory '$\textbf{...\\[directory containing raw image]\\results\_image\_analysis}$'
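
As a minimal sketch of the only requirement stated above (each channel arrives as a numpy array), the snippet below splits a synthetic stack laid out as (frames, channels, height, width). The helper name `split_channels` and the synthetic data are illustrative assumptions and are not part of the toolbox.

```python
import numpy as np


def split_channels(stack, red_index=0, green_index=1):
    """Return (red, green) sub-stacks from a (T, C, Y, X) array."""
    stack = np.asarray(stack)
    if stack.ndim != 4:
        raise ValueError(f"expected a 4D (T, C, Y, X) stack, got shape {stack.shape}")
    if stack.shape[1] <= max(red_index, green_index):
        raise ValueError("stack has too few channels")
    return stack[:, red_index], stack[:, green_index]


# Synthetic stand-in for a loaded raw image: 5 frames, 2 channels, 8x8 pixels
demo_stack = np.zeros((5, 2, 8, 8), dtype=np.uint16)
demo_stack[:, 1] = 7  # mark the green channel so the split can be checked
demo_red, demo_green = split_channels(demo_stack)
print(demo_red.shape, demo_green.shape)
```

If the channels sit on a different axis in the user's own files, the indexing has to be adapted accordingly.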

In [None]:
#PARENT_DIRECTORY = '///'
#SUB_DIRECTORY = '///'
#DIRECTORY = os.path.join(PARENT_DIRECTORY,SUB_DIRECTORY)

DIRECTORY = 'sample_data'

FILE_NAME = 'motility_bacteria'

ext = '.tif'
ext_out = '.tif'

full_filename = (FILE_NAME+ext)
full_file =  os.path.join(DIRECTORY,full_filename)

res_path = os.path.join(DIRECTORY,'results_image_analysis')

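# Create the results directory if it does not exist yet
os.makedirs(res_path, exist_ok=True)

# Read the stack once; axis 1 is the channel axis (0: red, 1: green)
raw_im = tf.imread(full_file)
R_im = raw_im[:,0,:,:]
G_im = raw_im[:,1,:,:]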

##  Perform image analysis for red and green channels

The user should first review the parameters in $\textbf{'....\\TrackScoPy-main\\trackscopy\_fluorescence\\default\_parameters\_image\_analysis.py'}$. Then the user has to run the two cells below to perform all the image analysis steps for both the red channel and the green channel. By default, it executes the following functions for both channels:

$\color{blue}{background\_correction( )}$ 

$\color{blue}{segmentation( )}$  
        
$\color{blue}{connected\_regions\_estimation( )}$ 
        
$\color{blue}{tracking( )}$

$\color{blue}{save\_parameters( )}$

The output 'data' (the user can change it freely to any name) will be an object (class) containing the following properties:

$\color{red}{background\_corrected\_image}$ :     Image object

$\color{red}{minimum\_projection}$ :     Numpy array 

$\color{red}{segmented\_image}$ :     Image Object

$\color{red}{connected\_position\_list}$ :    Numpy array

$\color{red}{tracklist}$ :     List of Numpy arrays (each corresponding to a trajectory)
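
To give a feeling for what a minimum projection is, here is a minimal sketch of one common way to remove a static background: take the pixel-wise minimum over all frames and subtract it from every frame. This is an assumption made for illustration and is not necessarily what $\color{blue}{background\_correction( )}$ does internally; the function name `subtract_minimum_projection` is hypothetical.

```python
import numpy as np


def subtract_minimum_projection(stack):
    """Subtract the pixel-wise temporal minimum from a (T, Y, X) stack."""
    stack = np.asarray(stack, dtype=float)
    minimum_projection = stack.min(axis=0)   # static background estimate
    corrected = stack - minimum_projection   # broadcast over the time axis
    return corrected, minimum_projection


# Synthetic data: a static background plus one bright particle in frame 2
rng = np.random.default_rng(0)
static_background = rng.uniform(10, 20, size=(6, 6))
demo_frames = np.repeat(static_background[None], 4, axis=0)
demo_frames[2, 3, 3] += 50
demo_corrected, demo_min_proj = subtract_minimum_projection(demo_frames)
print(demo_corrected[2, 3, 3])
```

Anything that moves during the recording survives the subtraction, while pixels that never change drop to zero.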

In [None]:
# Red channel

data_red = im.Image_Analysis(R_im) ## initialize
#data_red.blur_factor = 15 ## for the analysis of images provided in sample_data_drift except the calibration data, uncomment this line
data_red.Analysis()

In [None]:
# Green channel

data_green = im.Image_Analysis(G_im) ## initialize
#data_green.blur_factor = 15 ## for the analysis of images provided in sample_data_drift except the calibration data, uncomment this line
data_green.Analysis()

### Visualize the tracks in background

The user can display the trajectories to check whether the analysis worked for both the red and green channels. The syntax in the two cells below shows the trajectories starting from the origin as well as overlaid on the selected background image.

In [None]:
background_image_red = 2*data_red.background_corrected_image.im_array[-1]

im.plot_tracks_with_background(data_red.smooth_tracks,background_image_red)

In [None]:
background_image_green = 2*data_green.background_corrected_image.im_array[-1]

im.plot_tracks_with_background(data_green.smooth_tracks,background_image_green)
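
Showing trajectories "from origin" amounts to subtracting the first position of each track from all of its points. The sketch below illustrates that step only; it assumes each track is an (n, 2) array of x and y positions, which may differ from the layout used by the toolbox, and `shift_to_origin` is a hypothetical helper.

```python
import numpy as np


def shift_to_origin(tracks):
    """Translate every track so that its first point lies at (0, 0)."""
    shifted = []
    for track in tracks:
        track = np.asarray(track, dtype=float)
        shifted.append(track - track[0])
    return shifted


demo_tracks = [
    np.array([[5.0, 5.0], [6.0, 7.0], [8.0, 8.0]]),
    np.array([[20.0, 3.0], [19.0, 3.5]]),
]
demo_shifted = shift_to_origin(demo_tracks)
print(demo_shifted[0])
```

The shifted arrays can then be passed to any plotting routine to compare the shapes of trajectories regardless of where they started.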

### Note in case of error

Sometimes it might happen that one of the above steps, e.g., the tracking fails or the user is not satisfied with the result. Then, the user has to change the parameters and run the particular analysis step again. The user can have a look at the parameters that the analysis used, by the syntax $\color{red}{data.parameter\_dict}$.

Then the user can change one or more particular parameters by the syntax $\color{red}{data.[parameter] = [new \hspace{0.2cm} value]}$, for example: let's say the tracking failed for blur_factor = 5 but the user has a feeling that it should work for blur_factor = 15, then the user has to update it by $\color{red}{data.blur\_factor = 15}$ and run the analysis again by modifying the parameter after the initialization step (as shown in the cells above as comments). 

#### To see what can be done with an object, press "tab" after typing "data_red." or "data_green.".

## Classification of particles

Now that we have the tracking data for each of the detected particles in both red and green channels, we can classify them based on certain conditions imposed on their trajectories. 

Although this framework can be used in many situations where different portions of the same particle are tracked in different color channels, in this notebook we work with a very specific case, as discussed below.

## Swim mode detection

We consider the motility of a polarly flagellated soil bacterium, $\textit{Pseudomonas putida}$, which can swim in three different modes: $\textbf{push}$ (where the bacterium rotates its flagellar bundle in the counter-clockwise direction to move forward along its axis, which points from the flagellar bundle to the cell body), $\textbf{pull}$ (where the bacterium rotates its flagellar bundle in the clockwise direction to move backward with respect to its axial direction) and $\textbf{wrap}$ (where the bacterium wraps its flagellar bundle around its cell body in a screw-thread fashion to move). We track the cell body and the flagella of the bacteria in the red and green channels, respectively (as done above). Then, based on the positional alignment of the trajectories of the detected particles and their centers of mass, we can classify each bacterium by its swim-mode. See the main text of the associated paper for further details of the detection algorithm:

$\textit{[Reference: paper (arXiv OR published\ version)]}$

The rest of the notebook is designed to execute the above.
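
To make the idea of positional alignment concrete, here is a toy sketch that is not the detection algorithm of the paper. It assumes that the body axis is the vector from the flagella position (green) to the cell-body position (red), that push and pull differ by the sign of the velocity projected on that axis, that wrap shows up as nearly coinciding positions, and that a slow particle counts as passive. The function name and both thresholds are hypothetical.

```python
import numpy as np


def classify_swim_mode(body_xy, flagella_xy, velocity_xy,
                       min_speed=0.5, min_separation=2.0):
    """Toy classifier; thresholds are illustrative (pixels/frame, pixels)."""
    body = np.asarray(body_xy, dtype=float)
    flagella = np.asarray(flagella_xy, dtype=float)
    velocity = np.asarray(velocity_xy, dtype=float)

    if np.linalg.norm(velocity) < min_speed:
        return 'passive'                 # barely moving
    axis = body - flagella               # points from flagella to cell body
    if np.linalg.norm(axis) < min_separation:
        return 'wrap'                    # flagella signal sits on the body
    # Moving along the axis: push; moving against it: pull
    return 'push' if np.dot(velocity, axis) > 0 else 'pull'


print(classify_swim_mode((10, 0), (0, 0), (1, 0)))
print(classify_swim_mode((10, 0), (0, 0), (-1, 0)))
```

The actual analysis works on full trajectories and uses the parameters in the default parameter file, so this sketch only conveys the geometric intuition.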

### Specify calibration file (only for data with drift, skip cell below for data without drift)

If the images are tracked in presence of a drift (flow), the user has to perform the rest of the analysis in the co-moving frame. So, the user will need the components of the flow velocity. 
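
Moving to the co-moving frame means subtracting the displacement caused by the flow from every position, i.e. $\mathbf{r}'(t) = \mathbf{r}(t) - \mathbf{v}_{drift}\,(t - t_0)$. A minimal sketch, assuming positions in pixels, time in frames and a constant drift velocity in pixels/frame (the helper name is hypothetical):

```python
import numpy as np


def to_comoving_frame(frames, positions, drift_velocity):
    """Remove a constant drift from an (n, 2) array of positions."""
    frames = np.asarray(frames, dtype=float)
    positions = np.asarray(positions, dtype=float)
    drift = np.asarray(drift_velocity, dtype=float)
    elapsed = frames - frames[0]                 # frames since track start
    return positions - elapsed[:, None] * drift


# A passive tracer carried along at (2, 0) pixels/frame
demo_t = np.arange(4)
demo_xy = np.column_stack([2.0 * demo_t, np.zeros(4)])
demo_comoving = to_comoving_frame(demo_t, demo_xy, (2.0, 0.0))
print(demo_comoving)
```

In the co-moving frame a passive tracer stays where it started, so any remaining displacement can be attributed to swimming.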

One way of doing that is to run a calibration experiment in which the user introduces some passive, non-swimming cells/objects into the flow and tracks them to obtain the mean and standard deviation of their velocities in the red channel. The user can then generate the calibration file from this passive-tracer data, using the red channel only. The function $\color{blue}{sm.generate\_calibration( )}$ calculates the x and y components of the mean and of the standard deviation of the velocities from the tracks, in units of pixels/frame. These four values are then saved as four rows in the text file $\textbf{...\\[directory containing raw image]\\results\_image\_analysis\\calibration.txt}$. 
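
The following sketch shows how such four numbers could be obtained and stored. It is an illustration under stated assumptions, not the implementation of the toolbox: each track is assumed to be an (n, 3) array with columns (frame, x, y), the order of the four rows (mean x, mean y, std x, std y) is assumed, and `estimate_drift` is a hypothetical name.

```python
import os
import tempfile

import numpy as np


def estimate_drift(tracks):
    """Return [mean_vx, mean_vy, std_vx, std_vy] in pixels/frame."""
    velocities = []
    for track in tracks:
        track = np.asarray(track, dtype=float)
        if len(track) < 2:
            continue  # a single point carries no velocity information
        dt = np.diff(track[:, 0])               # frames between detections
        dxy = np.diff(track[:, 1:3], axis=0)    # displacement in pixels
        velocities.append(dxy / dt[:, None])
    if not velocities:
        raise ValueError("no track with at least two points")
    v = np.concatenate(velocities, axis=0)
    return np.concatenate([v.mean(axis=0), v.std(axis=0)])


# Two synthetic passive tracers advected at (2, -1) pixels/frame
demo_f = np.arange(5)
tracer_a = np.column_stack([demo_f, 2.0 * demo_f, 10.0 - demo_f])
tracer_b = np.column_stack([demo_f, 3.0 + 2.0 * demo_f, 40.0 - demo_f])
demo_calibration = estimate_drift([tracer_a, tracer_b])

# np.savetxt writes a 1D array as one value per row
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_cal_file = os.path.join(tmp_dir, 'calibration.txt')
    np.savetxt(demo_cal_file, demo_calibration)
    reloaded_calibration = np.loadtxt(demo_cal_file)
print(reloaded_calibration)
```

A large standard deviation relative to the mean would indicate that the tracers do not follow a uniform flow, in which case a single drift vector is a poor description.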

#### If the user wants to compute the calibration from the current file (i.e. the current file is the calibration recording), the two commented lines have to be uncommented. 

#### If the calibration file already exists, for the data with drift, it looks for the file and loads it.  In this case, make sure that the 1st and 3rd lines are commented out (default). 

#### Make sure to skip this step (one has to comment the whole cell out) if the data is without any flow.

In [None]:
#calibration = sm.generate_calibration(data_red.smooth_tracks)
cal_file = os.path.join(res_path,'calibration.txt')
#np.savetxt(cal_file,calibration)
calibration = np.loadtxt(cal_file)

Now, the user should review the parameters in $\textbf{'....\\TrackScoPy-main\\trackscopy\_fluorescence\\default\_parameters\_swim\_mode\_detection.py'}$. Then the user has to run the cell below to perform the swim-mode detection analysis. By default, it executes the following functions:

$\color{blue}{Analysis( )}$ 

$\color{blue}{sort\_data( )}$  
        
$\color{blue}{plot\_tracks( )}$ 

$\color{blue}{save\_parameters( )}$

The output 'analysis' (the user can change it freely to any name) will be an object (class) containing the following properties:

$\color{red}{data\_with\_id}$ :     a complete Pandas dataframe corresponding to all possible detected swim-modes

$\color{red}{sorted\_data}$ :    a list of dataframes, each corresponding to a possible swimmer sorted by ID 

$\color{red}{swimmers}$ :     a list of Objects, corresponding to the sorted data, each inheriting its corresponding dataframe

$\color{red}{parameter\_dict}$ :     a dictionary of swim-mode analysis parameters

If the user wants to save the plot from the $\color{blue}{plot\_tracks( )}$ function, the corresponding path of the image has to be specified as an input inside the function. The syntax would be $\color{blue}{plot\_tracks([path\_to\_image])}$  

### Swimmer object

As mentioned in the cell above, $\color{red}{analysis.swimmers}$ corresponds to a list of Objects. The swimmer object consists of the following properties:

$\color{red}{dataframe}$ :   the dataframe inherited from the main analysis sorted by ID

$\color{red}{mean\_speed}$ :    a float corresponding to the average velocity of the swimmer in microns/sec 

$\color{red}{run\_time}$ :     an integer corresponding to the run-time of the swimmer in frames

$\color{red}{episodes}$ :    a list of Objects, which are subsets of the swimmer sorted by a specific feature (swim-mode, orientation or flow)

$\color{red}{sorted\_episodes\_by}$ :    a string corresponding to the feature mentioned above

And it contains the following function:

$\color{blue}{extract\_episodes( )}$     default input: sort_by = 'SWIM-MODE', can be changed into 'FLOW' or 'ORIENTATION'
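
As an illustration of the unit conversion behind a speed in microns/sec, the sketch below averages the frame-to-frame step length of a track and rescales it. It assumes positions in pixels recorded at consecutive frames; `pixel_size_um` and `frames_per_second` are hypothetical parameter names, and the toolbox may compute the quantity differently.

```python
import numpy as np


def mean_speed_um_per_s(positions_px, pixel_size_um, frames_per_second):
    """Mean speed of an (n, 2) track sampled at consecutive frames."""
    positions = np.asarray(positions_px, dtype=float)
    # Step length between consecutive frames, in pixels/frame
    step_lengths = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    # pixels/frame * microns/pixel * frames/second = microns/second
    return step_lengths.mean() * pixel_size_um * frames_per_second


demo_speed = mean_speed_um_per_s([[0, 0], [3, 4], [6, 8]],
                                 pixel_size_um=0.1, frames_per_second=20)
print(demo_speed)
```

Here each step is 5 pixels long, so with 0.1 microns per pixel and 20 frames per second the result is 10 microns/sec.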

#### Episode object

As mentioned in the cell above, $\color{red}{analysis.swimmers[i].episodes}$ corresponds to a list of Objects. The episode object consists of the following properties:

$\color{red}{dataframe}$ :   the dataframe inherited from the swimmer sorted by a specific feature (swim-mode, orientation or flow)

$\color{red}{mean\_speed}$ :    a float corresponding to the average velocity of the episode in microns/sec 

$\color{red}{run\_time}$ :     an integer corresponding to the run-time of the episode in frames

$\color{red}{sorted\_by}$ :    a string corresponding to the feature mentioned above by which it has been sorted from a given swimmer

$\color{red}{identity}$ :    a string corresponding to the value of the feature (e.g., type of swim-mode like 'pull')
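
One way to picture episodes is as runs of consecutive frames that share the same value of the chosen feature. The sketch below splits a dataframe in that way with pandas. It is an assumption about what an episode means, not the code behind $\color{blue}{extract\_episodes( )}$; the column name 'SWIM-MODE' is taken from this notebook, while 'FRAME' and `split_into_episodes` are hypothetical.

```python
import pandas as pd


def split_into_episodes(df, sort_by='SWIM-MODE'):
    """Split df into runs of consecutive rows with equal df[sort_by]."""
    # A new run starts whenever the value differs from the previous row
    run_id = (df[sort_by] != df[sort_by].shift()).cumsum()
    return [episode for _, episode in df.groupby(run_id, sort=True)]


demo_df = pd.DataFrame({
    'FRAME': range(6),
    'SWIM-MODE': ['push', 'push', 'wrap', 'wrap', 'wrap', 'pull'],
})
demo_episodes = split_into_episodes(demo_df)
for episode in demo_episodes:
    # identity of the episode and its run-time in frames
    print(episode['SWIM-MODE'].iloc[0], len(episode))
```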

In case of data with drift, incorporate the calibration file (loaded in the previous step) by adding "calibration=calibration" in the "Swim_Mode_Detection" class. Otherwise, leave it out (as in the uncommented line below). Just like earlier, this is an initialization step. The user can alter any of the parameters (see the second comment) and run the analysis.

In [None]:
analysis = sm.Swim_Mode_Detection(data_red,data_green) ### initialize 
#analysis = sm.Swim_Mode_Detection(data_red,data_green,calibration=calibration) ### initialize 
#analysis.BACKGROUND_FACTOR = 0.05
analysis.Analysis()

The analysis automatically plots the trajectories classified by swim-mode detection. Blue corresponds to 'push', red corresponds to 'pull', green corresponds to 'wrap' and black corresponds to 'passive'.

## Check consistency of analysis results with observations

In [None]:
for i in range(len(analysis.swimmers)):
    print(f'Swimmer-{i}')
    for episode in analysis.swimmers[i].episodes:
        print(episode.identity)

## Visualize the swim-mode detection over the frames of image

The user can observe the motion of the pointing vector from the center of mass of the green signal to that of the red signal by $\color{blue}{analysis.write\_arrow\_image([path\_to\_video])}$. 
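
The pointing vector itself is simple to compute once both centers of mass are known. The sketch below uses an intensity-weighted center of mass on two small synthetic images; it only illustrates the geometry and makes no claim about how the toolbox obtains the centers. `center_of_mass` is a hypothetical helper written for this example.

```python
import numpy as np


def center_of_mass(image):
    """Intensity-weighted center of a 2D image, returned as (x, y)."""
    image = np.asarray(image, dtype=float)
    total = image.sum()
    if total == 0:
        raise ValueError("image has no signal")
    rows, cols = np.indices(image.shape)
    return np.array([(cols * image).sum() / total,
                     (rows * image).sum() / total])


# Synthetic signals: flagella (green) on the left, cell body (red) on the right
demo_green_im = np.zeros((9, 9))
demo_green_im[4, 2] = 1.0
demo_red_im = np.zeros((9, 9))
demo_red_im[4, 6] = 1.0

# Vector pointing from the green center of mass to the red one
demo_pointing = center_of_mass(demo_red_im) - center_of_mass(demo_green_im)
print(demo_pointing)
```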

In [None]:
#arrow_path = os.path.join(res_path,f'{FILE_NAME}_arrow_image.tiff')
#analysis.write_arrow_image(path=arrow_path)

Also, the user can save an overlay of the segmented red and green channels along with the detected swim-modes by the function $\color{blue}{analysis.write\_swim\_image([path\_to\_video])}$. 

However, these steps are optional. The cell below executes the overlay step, but it can be commented out. 

In [None]:
swim_path = os.path.join(res_path,f'{FILE_NAME}_swim_image.tiff')
analysis.write_swim_image(path=swim_path)

## Modifying default parameters and repeating analysis

The analysis is executed using the default parameters. It is recommended not to change the defaults themselves unless the type of dataset is completely different (different mutants, different size of bacteria, etc.). Instead, the user can modify one or more parameters of the analysis object and re-run it to obtain different results. The cell below is an example of how to do that; uncomment and modify it as needed.

#### As stated earlier, to see what else can be done with the object, press "tab" after typing "analysis.".

In [None]:
#analysis.BAC_SIZE = 13
#analysis.ARROW_LENGTH = 60

#analysis.Analysis()

## Modifying swim-mode analysis results

Even when the analysis parameters are correct, the analysis might detect a few outliers, which the user can modify and update to improve the statistics. Below is an example of how to do it, where a few frames of the fourth swimmer were detected as the passive swim-mode instead of the wrap swim-mode. Alternatively, the user can specify the indices explicitly and change the features, as shown in the line starting with ##.

In [None]:
#df = analysis.sorted_data[3]
#df.loc[df['SWIM-MODE'] == 'passive', 'SWIM-MODE'] = 'wrap'
#print(df)

##df.loc[[1,2,3],'SWIM-MODE'] = 'push'

#print(analysis.sorted_data[3])

#analysis.extract_swimmers()
#analysis.plot_tracks()
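
The two correction styles can be tried out on a small synthetic dataframe before touching real results. This is only a pandas illustration: the column name 'SWIM-MODE' is taken from the cell above, whereas 'FRAME' and the data are made up.

```python
import pandas as pd

demo_modes = pd.DataFrame({
    'FRAME': [0, 1, 2, 3, 4],
    'SWIM-MODE': ['wrap', 'passive', 'wrap', 'passive', 'wrap'],
})

# Style 1: relabel every row that matches a condition
by_condition = demo_modes.copy()
by_condition.loc[by_condition['SWIM-MODE'] == 'passive', 'SWIM-MODE'] = 'wrap'

# Style 2: relabel rows chosen explicitly by their index labels
by_index = demo_modes.copy()
by_index.loc[[1, 3], 'SWIM-MODE'] = 'push'

print(by_condition['SWIM-MODE'].tolist())
print(by_index['SWIM-MODE'].tolist())
```

Note that `.loc` selects by index label and not by position, so the explicit indices must be labels that exist in the dataframe being edited.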

## Save Analysis Results as a pickle file

Once the analysis meets the user's requirements after the automatic analysis and manual corrections (if required), the user can save the analysis results in two ways. The user can save the whole analysis, containing the features of the red and green channels as well as the swim-mode analysis results, as a $\textbf{.pickle}$ file by the following syntax:

$\color{blue}{analysis.save([file \hspace{0.2cm} name])}$

This will be a large file, around 2 times the size of the raw image. 

Otherwise, the user can save the parameters used for the red and green channel image analysis, parameters for the swim-mode detection along with the swim-mode detection analysis results only by the following syntax:

$\color{blue}{analysis.save\_detection\_results\_only([file \hspace{0.2cm} name])}$

For both cases, the user should not forget to run the syntax $\color{blue}{data.save\_parameters( )}$ (especially after manually correcting the parameters in the middle of analysis) before the user saves the final file. 

In [None]:
pickle_file = os.path.join(res_path,f'{FILE_NAME}_swim_mode_analysis.pickle')
analysis.save_detection_results_only(pickle_file)
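
For reading a saved file back in a later session, the standard `pickle` module is sufficient, assuming the file written above is an ordinary pickle. The round trip below uses a plain dictionary with made-up content as a stand-in, since the exact structure of the saved object is not described here.

```python
import os
import pickle
import tempfile

# Stand-in for saved results; the real file may hold different objects
demo_results = {
    'parameters': {'BAC_SIZE': 13},
    'swim_modes': ['push', 'wrap'],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_pickle = os.path.join(tmp_dir, 'demo_swim_mode_analysis.pickle')
    with open(demo_pickle, 'wb') as handle:
        pickle.dump(demo_results, handle)
    with open(demo_pickle, 'rb') as handle:
        reloaded_results = pickle.load(handle)

print(reloaded_results['swim_modes'])
```

When a pickle contains instances of custom classes, the package that defines them must be importable at load time, and only files from a trusted source should be unpickled.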

## Save Analysis Results in a separate directory

The user will have the option to save the analysis results in a separate directory 

$\textbf{...\\[directory containing raw image]\\results\_image\_analysis\\[FILE\_NAME]\_individual\_analysis}$

inside which all dataframes will be saved individually as .csv files. Also, the parameters will be saved as a .txt file. One can uncomment to execute it. If the user is working with flow data, the orientation (ornt) has to be set to 'True'.

In [None]:
#ornt = True
ornt = False
analysis.save_results_individual(res_path,FILE_NAME,ORIENTATION=ornt)

## Analyse together a group of image files

An example function is provided which executes all the steps as mentioned above. This function can be applied over a number of image files, as done in the next cell below. It iterates over a given directory to look for all sub-directories inside it, for each of which, it automatically generates the calibration file and analyzes the other substacks in that sub-directory. One can modify according to one's arrangement and nomenclature of the files and uncomment to execute it. 

In [None]:
'''def analyse_together(full_file,res_path,calibration):
    
    R_im = tf.imread(full_file)[:,0,:,:]
    G_im = tf.imread(full_file)[:,1,:,:]

    # Base name without extension; robust to dots in folder names and to Windows separators
    filename = os.path.splitext(os.path.basename(full_file))[0]
    
    data_red = im.Image_Analysis(R_im) ## initialize
    data_red.Analysis()
            
    data_green = im.Image_Analysis(G_im) ## initialize
    data_green.Analysis()
    
    if len(calibration) != 0:
        analysis = sm.Swim_Mode_Detection(data_red,data_green,calibration=calibration)
    else:
        analysis = sm.Swim_Mode_Detection(data_red,data_green)
        
    analysis.BAC_SIZE = 8

    analysis.Analysis()
    analysis.save_parameters()
    pickle_file = os.path.join(res_path,f'{filename}_swim_mode_analysis.pickle')
    analysis.save_detection_results_only(pickle_file)'''

In [None]:
'''PARENT_DIRECTORY = '///'

res = 'analysis_result_collective'
cal_filename = '00_calibration.tif'

ext = '.tif'
ext_out = '.tif'


SUB_DIRECTORIES = [x for x in next(os.walk(PARENT_DIRECTORY), (None, [], None))[1] if x!=res]

for SUB_DIRECTORY in SUB_DIRECTORIES:
    
    print(f'Sub-directory: {SUB_DIRECTORY}')
    
    DIRECTORY = os.path.join(PARENT_DIRECTORY,SUB_DIRECTORY)
    res_path = os.path.join(DIRECTORY,'results_image_analysis')

    if os.path.exists(res_path)!= True:
        
        os.mkdir(res_path)    
    
    FILE_NAMES = [x for x in next(os.walk(DIRECTORY), (None, None, []))[2] if x != cal_filename and os.path.splitext(x)[1] == ext]
    cal_file = os.path.join(DIRECTORY,cal_filename)
    caltxt_file = os.path.join(res_path,'calibration.txt')
    
    if os.path.exists(cal_file) == True:
        
        print('Generating Calibration File')
        
        if os.path.exists(caltxt_file) == True:
                calibration = np.loadtxt(caltxt_file)
        else:
            R_im = tf.imread(cal_file)[:,0,:,:]
            data_red = im.Image_Analysis(R_im) ## initialize
            data_red.Analysis()    
            calib = sm.generate_calibration(data_red.smooth_tracks)
            np.savetxt(caltxt_file,calib)
            calibration = np.loadtxt(caltxt_file)
        
    else:
        
        print('Calibration file not found')
        
        calibration = []
            
            
    for FILE_NAME in tqdm(FILE_NAMES,desc='Analysing all substacks'):
        
        print(FILE_NAME)
        
        full_file =  os.path.join(DIRECTORY,FILE_NAME)

        res_path = os.path.join(DIRECTORY,'results_image_analysis')

        if os.path.exists(res_path)!= True:

            os.mkdir(res_path)
            
        analyse_together(full_file,res_path,calibration)'''
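
The file-selection logic of the loop above can be checked on a throwaway directory tree before running the full analysis. The sketch below uses only the standard library; the helper names and the demo file names are hypothetical, while the calibration file name and the extension follow the cell above.

```python
import os
import tempfile


def list_subdirectories(parent, exclude=()):
    """Names of the direct sub-directories of parent, minus excluded ones."""
    return sorted(name for name in os.listdir(parent)
                  if os.path.isdir(os.path.join(parent, name))
                  and name not in exclude)


def list_substacks(directory, cal_filename='00_calibration.tif', ext='.tif'):
    """Image files in directory, skipping the calibration recording."""
    return sorted(name for name in os.listdir(directory)
                  if os.path.isfile(os.path.join(directory, name))
                  and name != cal_filename
                  and os.path.splitext(name)[1].lower() == ext)


with tempfile.TemporaryDirectory() as demo_parent:
    demo_sub = os.path.join(demo_parent, 'day_1')
    os.mkdir(demo_sub)
    for name in ('00_calibration.tif', 'stack_a.tif', 'stack_b.tif', 'notes.txt'):
        open(os.path.join(demo_sub, name), 'w').close()
    demo_subdirs = list_subdirectories(demo_parent)
    demo_files = list_substacks(demo_sub)

print(demo_subdirs, demo_files)
```

Comparing the extension with `os.path.splitext` keeps the leading dot, which matches the convention `ext = '.tif'` used throughout this notebook.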