Cyna Shirazinejad, 7/7/21

# Notebook 1: loading data for model generation

outline:

* load all data, including:
    * movies from AP2-tagRFP-T, tagGFP2-DNM2 cell lines
* keep only 'valid' tracks
    * 'valid' tracks are tracks that appear and disappear 
      within the bounds of the movie with no more than 2 consecutive gaps
    * tracks are characterized using AP2 as the primary channel for tracking
* create dataframes of features from tracked events, mapping fitted amplitude and position space to the target feature space
    * each track will be decomposed into features, described in the notebook
    * the number of cell line tags will be included as a label (2 or 3)
    * the experiment number will be included as a label (1-8)
    * the date of the experiment will be included as a label
    * the cmeAnalysis classification as "DNM2-positive" (cmeAnalysisDNM2+) 
      or "DNM2-negative" will be included as a label (1 or 0)
* save dataframes and tracks for future notebooks
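
The 'valid' track criterion in the outline can be sketched as follows. This is a minimal illustration, not the filtering code used by cmeAnalysis or by `display_tracks`: the function names, the 0-based frame indexing, and the boolean gap mask are all assumptions made for the example.

```python
def max_consecutive_gaps(gap_mask):
    """Return the length of the longest run of True values in a gap mask."""
    longest = current = 0
    for is_gap in gap_mask:
        current = current + 1 if is_gap else 0
        longest = max(longest, current)
    return longest

def is_valid_track(first_frame, last_frame, gap_mask, n_movie_frames, max_gap=2):
    """Hypothetical validity check (assumes 0-based frame indices).

    A track is kept if it starts after the first movie frame, ends before the
    last movie frame, and never has more than max_gap consecutive gap frames.
    """
    inside_movie = first_frame > 0 and last_frame < n_movie_frames - 1
    return inside_movie and max_consecutive_gaps(gap_mask) <= max_gap

# a track with a 2-frame gap inside a 100-frame movie passes the check
track_ok = is_valid_track(5, 40, [False, True, True, False], 100)
# a track already present in the first frame of the movie does not
track_cut = is_valid_track(0, 40, [False, False], 100)
```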

# user parameters to toggle plot-generation and/or dataframe construction and corresponding calculations

In [35]:
# set a path to the prefix of the pooled working directory with all of the data 
# the folder that contains all data for this analysis is 'ap2dynm2arcp3_project'
# (this folder, containing all raw and tracking data, is available on GitHub)
unique_user_path_tracks = '/Volumes/Google Drive/My Drive/Drubin Lab/ap2dynm2arcp3_project/tracking_data/AD_cellline_analysis_formatted/' # needs to be set for each user
unique_user_path_notebook = '/Users/cynashirazinejad/Documents/GitHub/Jin_Shirazinejad_et_al_branched_actin_manuscript/analysis'
unique_user_saved_outputs = '/Volumes/GoogleDrive/My Drive/Drubin Lab/ap2dynm2arcp3_project/stable_outputs_simplified'

# import all necessary Python modules

In [36]:
%load_ext autoreload
%autoreload 2
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
sys.path.append(unique_user_path_notebook+'/cmeAnalysisPostProcessingPythonScripts') # add custom Python scripts to the local path
import display_tracks
import merge_tools
import feature_extraction_with_buffer
import generate_index_dictionary

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [37]:
analysis_metadata = {}
analysis_metadata['path_tracks'] = unique_user_path_tracks
analysis_metadata['path_notebook'] = unique_user_path_notebook
analysis_metadata['path_outputs'] = unique_user_saved_outputs

# automatically create output directories for notebooks

In [38]:
# create the output subfolders if they do not exist yet
os.makedirs(unique_user_saved_outputs+'/plots/', exist_ok=True)
os.makedirs(unique_user_saved_outputs+'/dataframes/', exist_ok=True)

# all track feature options:

conventions:
1. intensities: fitted amplitude of fluorescence (excluding fitted local background)
2. positions: fitted positions (x,y) of two dimensional point-spread-functions per frame in track
3. voxel-width = 108 nm

features:

1. 'lifetime': time between the first and last frame of detected AP2 (seconds)
2. 'max_int_ap2': maximum intensity (a.u. fluorescence)
3. 'max_int_dnm2': maximum intensity (a.u. fluorescence)
4. 'dist_traveled_ap2': track start-to-finish net movement (pixels)
5. 'dist_traveled_dnm2': track start-to-finish net movement (pixels)
6. 'max_dist_between_ap2_dnm2': the maximum frame-to-frame separation between AP2 and DNM2 (pixels)
7. 'md_ap2': mean displacement (pixels)
8. 'md_dnm2': mean displacement (pixels)
9. 'time_to_peak_ap2': time for the intensity to reach its peak (seconds) [0 if peak is first frame]
10. 'time_to_peak_dnm2': time for the intensity to reach its peak (seconds) [0 if peak is first frame]
11. 'time_after_peak_ap2': time for intensity to decay from its peak (seconds) [0 if peak is last frame]
12. 'time_after_peak_dnm2': time for intensity to decay from its peak (seconds) [0 if peak is last frame]
13. 'time_between_peaks_ap2_dnm2': time between peaks of two channels (seconds)
14. 'avg_int_change_to_peak_ap2': average change in intensity to the peak (a.u. fluorescence) [0 if peak is first frame]
15. 'avg_int_change_to_peak_dnm2': average change in intensity to the peak (a.u. fluorescence) [0 if peak is first frame]
16. 'avg_int_change_after_peak_ap2': average change in intensity after the peak (a.u. fluorescence) [0 if peak is last frame]
17. 'avg_int_change_after_peak_dnm2': average change in intensity after the peak (a.u. fluorescence) [0 if peak is last frame]
18. 'peak_int_diff_ap2_dnm2': difference between maximum intensity of channel 0 and channel 1 (a.u. fluorescence)
19. 'ratio_max_int_ap2_dnm2': ratio between maximum intensity of channel 0 and channel 1 (unitless)
20. 'mean_ap2': average of fluorescence (a.u. fluorescence)
21. 'mean_dnm2': average of fluorescence (a.u. fluorescence)
22. 'variation_ap2': variance of fluorescence (a.u. fluorescence^2)
23. 'variation_dnm2': variance of fluorescence (a.u. fluorescence^2)
24. 'skewness_ap2': skewness of fluorescence (unitless)
25. 'skewness_dnm2': skewness of fluorescence (unitless)
26. 'kurtosis_ap2': kurtosis of fluorescence (unitless)
27. 'kurtosis_dnm2': kurtosis of fluorescence (unitless)
28. 'number_significant_dnm2': number of significant detections with p-val lower than provided threshold (counts) [p-val < 0.01]
29. 'max_consecutive_significant_dnm2': maximum number of consecutive significant detections with p-val lower than provided threshold (counts) [p-val < 0.01]
30. 'fraction_significant_dnm2': fraction of event with significant detections with p-val lower than provided threshold (unitless) [p-val < 0.01]
31. 'fraction_peak_ap2': fraction of the event where the peak is located (unitless)
32. 'fraction_peak_dnm2': fraction of the event where the peak is located (unitless)
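
To make a few of these definitions concrete, the sketch below computes the peak-related features for a single made-up intensity trace. It is illustrative only and is not the code in `feature_extraction_with_buffer`: the function name, the 1 s frame interval, and the use of the population variance are assumptions.

```python
import numpy as np

def peak_timing_features(intensity, frame_interval_s=1.0):
    """Illustrative only: peak-related features from one channel's fitted amplitudes."""
    intensity = np.asarray(intensity, dtype=float)
    peak_index = int(np.argmax(intensity))  # first frame at which the maximum occurs
    last_index = intensity.size - 1
    return {
        'max_int': float(intensity.max()),
        # 0 if the peak is the first frame
        'time_to_peak': peak_index * frame_interval_s,
        # 0 if the peak is the last frame
        'time_after_peak': (last_index - peak_index) * frame_interval_s,
        'mean': float(intensity.mean()),
        # population variance (ddof=0) is an assumption of this sketch
        'variation': float(intensity.var()),
    }

# a made-up five-frame trace that peaks in the middle frame
demo_features = peak_timing_features([1.0, 3.0, 7.0, 4.0, 2.0])
```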

In [39]:
# the physical units of each track feature
feature_units = ['seconds',
                 'a.u. fluorescence',
                 'a.u. fluorescence',
                 'pixels',
                 'pixels',
                 'pixels',
                 'pixels',
                 'pixels',
                 'seconds',
                 'seconds',
                 'seconds',
                 'seconds',
                 'seconds',
                 'a.u. fluorescence',
                 'a.u. fluorescence',
                 'a.u. fluorescence',
                 'a.u. fluorescence',
                 'a.u. fluorescence',
                 'unitless',
                 'a.u. fluorescence',
                 'a.u. fluorescence',
                 'a.u. fluorescence**2',
                 'a.u. fluorescence**2',
                 'unitless',
                 'unitless',
                 'unitless',
                 'unitless',
                 'counts',
                 'counts',
                 'unitless',
                 'unitless',
                 'unitless']

In [40]:
possible_track_features_labels = ['lifetime',
                                 'max_int_ap2',
                                 'max_int_dnm2',
                                 'dist_traveled_ap2',
                                 'dist_traveled_dnm2',
                                 'max_dist_between_ap2_dnm2',
                                 'md_ap2',
                                 'md_dnm2',
                                 'time_to_peak_ap2',
                                 'time_to_peak_dnm2',
                                 'time_after_peak_ap2',
                                 'time_after_peak_dnm2',
                                 'time_between_peaks_ap2_dnm2',
                                 'avg_int_change_to_peak_ap2',
                                 'avg_int_change_to_peak_dnm2',
                                 'avg_int_change_after_peak_ap2',
                                 'avg_int_change_after_peak_dnm2',
                                 'peak_int_diff_ap2_dnm2',
                                 'ratio_max_int_ap2_dnm2',
                                 'mean_ap2',
                                 'mean_dnm2',
                                 'variation_ap2',
                                 'variation_dnm2',
                                 'skewness_ap2',
                                 'skewness_dnm2',
                                 'kurtosis_ap2',
                                 'kurtosis_dnm2',
                                 'number_significant_dnm2',
                                 'max_consecutive_significant_dnm2',
                                 'fraction_significant_dnm2',
                                 'fraction_peak_ap2',
                                 'fraction_peak_dnm2']

In [41]:
possible_track_features = ['lifetime',
                            'max_int_ch0',
                            'max_int_ch1',
                            'dist_traveled_ch0',
                            'dist_traveled_ch1',
                            'max_dist_between_ch0_ch1',
                            'md_ch0',
                            'md_ch1',
                            'time_to_peak_ch0',
                            'time_to_peak_ch1',
                            'time_after_peak_ch0',
                            'time_after_peak_ch1',
                            'time_between_peaks_ch0_ch1',
                            'avg_int_change_to_peak_ch0',
                            'avg_int_change_to_peak_ch1',
                            'avg_int_change_after_peak_ch0',
                            'avg_int_change_after_peak_ch1',
                            'peak_int_diff_ch0_ch1',
                            'ratio_max_int_ch0_ch1',
                            'mean_ch0',
                            'mean_ch1',
                            'variation_ch0',
                            'variation_ch1',
                            'skewness_ch0',
                            'skewness_ch1',
                            'kurtosis_ch0',
                            'kurtosis_ch1',
                            'number_significant_ch1',
                            'max_consecutive_significant_ch1',
                            'fraction_significant_ch1',
                            'fraction_peak_ch0',
                            'fraction_peak_ch1']

In [42]:
analysis_metadata['feature_units'] = feature_units
analysis_metadata['possible_track_features'] = possible_track_features
analysis_metadata['possible_track_features_labels'] = possible_track_features_labels
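
The three lists above are meant to be parallel: entry i of each list describes the same feature. A small check like the one below can catch a misaligned entry. It assumes that every label is obtained from the generic feature name by substituting `ch0` with `ap2` and `ch1` with `dnm2`, which holds for the lists in this notebook; the helper `label_for` is hypothetical and the lists are shortened for the example.

```python
# assumed mapping between generic channel names and protein labels
channel_to_label = {'ch0': 'ap2', 'ch1': 'dnm2'}

def label_for(feature_name):
    """Hypothetical helper: turn a generic feature name into its protein-specific label."""
    for channel, protein in channel_to_label.items():
        feature_name = feature_name.replace(channel, protein)
    return feature_name

# shortened stand-ins for the full 32-entry lists
demo_features = ['lifetime', 'max_int_ch0', 'max_dist_between_ch0_ch1']
demo_labels = ['lifetime', 'max_int_ap2', 'max_dist_between_ap2_dnm2']
demo_units = ['seconds', 'a.u. fluorescence', 'pixels']

# the lists must have equal length and matching entries
lists_aligned = (len(demo_features) == len(demo_labels) == len(demo_units)
                 and all(label_for(f) == l for f, l in zip(demo_features, demo_labels)))
```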

# extract features from all tracks, labeled by experiment (0-7), number of imaging channels/labels, and date of experiment

In [None]:
df_merged_features, merged_all_valid_tracks = display_tracks.upload_tracks_and_metadata('/Users/cynashirazinejad/Desktop/test3',
                                                               [1],
                                                               'Cell',
                                                               possible_track_features,
                                                               possible_track_features_labels)


folders to mine:
['200804_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell004_1s', '200804_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell005_1s', '200804_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell006_1s', '200804_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell007_1s', '200804_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell010_1s', '200819_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell004_1s', '200819_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell005_1s', '200819_ap2-dnm2_ap2-dnm2_wildtype_no-treatment_Cell006_1s']


The number of tracks returned: 7510

The number of tracks returned: 7955

The number of tracks returned: 5950

The number of tracks returned: 7295



In [None]:
# save the dataframe for subsequent notebooks
# archive_name is the name of the CSV stored inside the zip archive, so it should not include the output path
compression_opts = dict(method='zip',
                        archive_name='df_merged_features.csv')

df_merged_features.to_csv(unique_user_saved_outputs+'/dataframes/df_merged_features.zip', index=False,
                          compression=compression_opts)
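
For reference, a zipped CSV written this way can be read back directly with `pd.read_csv`, which infers zip compression from the file extension when the archive holds a single file. The sketch below round-trips a small made-up dataframe through a temporary directory; the file names and values are placeholders, not the project's data.

```python
import os
import tempfile

import pandas as pd

# made-up stand-in for df_merged_features
demo_df = pd.DataFrame({'lifetime': [12.0, 45.0],
                        'max_int_ap2': [310.5, 887.25]})

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, 'df_demo.zip')
    # write a single CSV named df_demo.csv inside the zip archive
    demo_df.to_csv(zip_path, index=False,
                   compression=dict(method='zip', archive_name='df_demo.csv'))
    # compression is inferred from the '.zip' extension
    reloaded_df = pd.read_csv(zip_path)
```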

# save all valid tracks, split into chunks so that each file stays under the 100 MB size limit

In [None]:
number_of_track_splits = 20
analysis_metadata['number_of_track_splits'] = number_of_track_splits

np.save(analysis_metadata['path_outputs']+'/dataframes/analysis_metadata', analysis_metadata)
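
Because `analysis_metadata` is a dictionary, `np.save` stores it as a pickled 0-dimensional object array and appends `.npy` to the file name. A later notebook would therefore need `allow_pickle=True` and `.item()` to recover the dictionary. The sketch below shows this round trip with a made-up dictionary and a temporary directory.

```python
import os
import tempfile

import numpy as np

# made-up stand-in for analysis_metadata
demo_metadata = {'number_of_track_splits': 20,
                 'path_outputs': '/hypothetical/output/path'}

with tempfile.TemporaryDirectory() as tmp_dir:
    base_name = os.path.join(tmp_dir, 'analysis_metadata')
    np.save(base_name, demo_metadata)  # written to analysis_metadata.npy
    # allow_pickle=True is required for object arrays; .item() unwraps the dictionary
    loaded_metadata = np.load(base_name + '.npy', allow_pickle=True).item()
```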

In [None]:
# split tracks
split_valid_tracks = np.array_split(np.array(list(merged_all_valid_tracks)),number_of_track_splits)

In [None]:
# save each track array chunk
for i, track_chunk in enumerate(split_valid_tracks):
    np.save(unique_user_saved_outputs+"/dataframes/merged_all_valid_tracks_"+str(i), np.array(track_chunk))
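
A downstream notebook can rebuild the full track array by loading the chunks in order and concatenating them. The sketch below demonstrates the split, save, and reload cycle on a small made-up array in a temporary directory. The real tracks may be object arrays, which is why `allow_pickle=True` is passed on load; that detail is an assumption about the track format.

```python
import os
import tempfile

import numpy as np

# made-up stand-in for the array of valid tracks
demo_tracks = np.arange(50).reshape(25, 2)
n_splits = 4

# array_split tolerates lengths that are not divisible by n_splits
chunks = np.array_split(demo_tracks, n_splits)

with tempfile.TemporaryDirectory() as tmp_dir:
    for i, chunk in enumerate(chunks):
        np.save(os.path.join(tmp_dir, 'merged_all_valid_tracks_' + str(i)), chunk)
    # reload the chunks in the same order they were written
    reloaded_chunks = [
        np.load(os.path.join(tmp_dir, 'merged_all_valid_tracks_' + str(i) + '.npy'),
                allow_pickle=True)
        for i in range(n_splits)
    ]

merged_tracks = np.concatenate(reloaded_chunks)
```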

In [None]:
df_merged_features