# Master video notebook!
The title might be a bit ambitious, but this notebook is meant to handle all of the administrative work involved in analysing videos. Many of the functions it relies on already live in the kymo_class and plot_data files. Ultimately, the notebook is about creating a file hierarchy for the analysis files.

In step one, Dropbox is searched for information about the videos. If a video does not have a VideoInfo.txt, the program looks for a .csv file; if there is no .csv, it looks for a .xlsx file. This currently conflicts with the behaviour of the kymo_class.py file, so that file will have to be amended.
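
The fallback order described above can be illustrated with a minimal sketch. It works on a local folder rather than on Dropbox, and `find_info_file` and `INFO_PATTERNS` are hypothetical names that do not exist in amftrack; the only thing taken from this notebook is the priority order.

```python
from pathlib import Path
import tempfile

# Priority order described above: VideoInfo.txt first, then any .csv, then any .xlsx
INFO_PATTERNS = ("VideoInfo.txt", "*.csv", "*.xlsx")


def find_info_file(folder):
    """Return the highest-priority info file in `folder`, or None if there is none."""
    folder = Path(folder)
    for pattern in INFO_PATTERNS:
        matches = sorted(folder.glob(pattern))
        if matches:
            return matches[0]
    return None


# Small demonstration in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "plate.xlsx").touch()
    (Path(tmp) / "plate.csv").touch()
    chosen_name = find_info_file(tmp).name  # the .csv wins over the .xlsx
    (Path(tmp) / "VideoInfo.txt").touch()
    chosen_with_txt = find_info_file(tmp).name  # the .txt wins over both

print(chosen_name, chosen_with_txt)
```

Keeping the priority order in one tuple means the notebook and kymo_class.py could share it, which is one possible way to resolve the conflict mentioned above.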

### Let's say that there are three ways to initiate a kymograph class:
1. No info file is submitted, and the class will look for such a file itself.
2. An info file is submitted, and the class will use the data in there.
3. A kymograph is submitted, and the class will instantiate with the parameters that are passed with the kymograph.
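
The three routes listed above could be sketched as one constructor that picks a route from the arguments it receives. `KymoInitSketch` and all of its argument names are hypothetical; this is not the interface of the real class in kymo_class.py, only an illustration of the intended behaviour.

```python
class KymoInitSketch:
    """Hypothetical stand-in for the kymograph class; not the real amftrack API."""

    def __init__(self, video_folder=None, info_file=None, kymograph=None, **params):
        if kymograph is not None:
            # Route 3: a kymograph is passed in together with its parameters
            self.source = "kymograph"
            self.params = dict(params)
        elif info_file is not None:
            # Route 2: an info file is passed in and its data is used
            self.source = "info_file"
            self.params = {"info_file": info_file}
        else:
            # Route 1: nothing is passed in, so the class searches for an info file itself
            self.source = "auto_search"
            self.params = {"search_in": video_folder}


auto = KymoInitSketch(video_folder="20230313_Plate031/01/")
from_file = KymoInitSketch(info_file="20230313_Plate031.xlsx")
from_kymo = KymoInitSketch(kymograph=[[0, 1], [1, 0]], fps=20, magnification=50)
print(auto.source, from_file.source, from_kymo.source)
```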

TODO: Implement these three routes in the kymo_class.

TODO: Streamline the variable storage such that edge properties are stored in the edge_analysis class, and video properties are stored in the video_analysis class.

### Below code:
Just the import statements, plus the logging and figure-resolution settings.

In [83]:
from IPython.display import clear_output
import re
from amftrack.pipeline.development.high_mag_videos.kymo_class import *
from amftrack.pipeline.development.high_mag_videos.plot_data import (
    save_raw_data,
    plot_summary,
    read_video_data
)
import sys
import os
import imageio.v2 as imageio
import matplotlib.pyplot as plt
import cv2
from tifffile import imwrite
from tqdm import tqdm
%matplotlib widget
%load_ext autoreload
%autoreload 2
from amftrack.pipeline.functions.image_processing.extract_graph import (
    from_sparse_to_graph,
    generate_nx_graph,
    clean_degree_4,
)
import scipy
import matplotlib as mpl

from amftrack.pipeline.launching.run import (
    run_transfer,
)
from amftrack.pipeline.launching.run_super import run_parallel_transfer

import dropbox
from amftrack.util.dbx import (
    upload_folders,
    download,
    read_saved_dropbox_state,
    save_dropbox_state,
    load_dbx,
    get_dropbox_folders,
    get_dropbox_video_folders,
)
from subprocess import call
import logging
import datetime
import glob

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
mpl.rcParams['figure.dpi'] = 300


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## File declaration
As this notebook is designed to work with Snellius, two things to keep separate are the raw video files and the analysis. The raw video files are large and bulky, and not easy to flip through. Ideally, the video files are downloaded to one location and the analysis is stored in an entirely separate folder structure. That way, large-scale processing of the analysis folders remains possible when the dataset contains thousands of videos, without needing the raw video folders on hand.

### Input and output:
Please specify separately the folder where the raw video data is stored and the folder where the analysis will be stored.

In [37]:
videos_folder = "/gpfs/scratch1/shared/amftrackflow/videos/"
analysis_folder = "/gpfs/home6/svstaalduine/Analysis/"

# Module 1: Downloading videos from DropBox

## Dropbox scrounging
The code below searches Dropbox for information files on the videos and creates a list of the videos within a certain dataset. At the very least, one plate is expected to contain many videos, each labelled with a number. The code takes this list and recreates the hierarchy within the analysis folder (`analysis_folder`).

### Input and output:
The input is the top-level Dropbox folder that needs to be analysed, in addition to a folder that will store the Excel sheets downloaded for the DataFrame. The output is a DataFrame that can be filtered in the next code block to prepare for downloading.

In [69]:
# dropbox_address = "/DATA/FLUORESCENCE/DATA_NileRed/20230201_Plate552/"
dropbox_address = "/DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/"
# dropbox_address = "/DATA/TransportROOT/DATA/20230331_Plate773/"

excel_storage = "/gpfs/home6/svstaalduine/excel_storage/"

In [70]:
all_folders_drop, excel_drop, txt_drop = get_dropbox_video_folders(dropbox_address, True)

clear_output(wait=False)
print(all_folders_drop['tot_path_drop'][0])
all_folders_drop 

DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/01/Img


Unnamed: 0,folder,Plate number,Date Imaged,tot_path_drop,video
0,20230313_Plate031_01/Img/,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,01/Img/
1,20230313_Plate031_02/Img/,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,02/Img/
2,20230313_Plate031_03/Img/,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,03/Img/
3,20230313_Plate031_04/Img/,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,04/Img/
4,20230313_Plate031_05/Img/,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,05/Img/
5,20230313_Plate031_06/Img/,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,06/Img/
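
The `folder` entries in the table above follow a date, plate, video pattern. As a rough illustration of how such a name maps onto the other columns, here is a minimal sketch; `parse_video_folder` and `FOLDER_PATTERN` are hypothetical names, and this is not necessarily how `get_dropbox_video_folders` does it internally.

```python
import re

# Assumed naming scheme, read off the table above: <YYYYMMDD>_Plate<number>_<video>/Img/
FOLDER_PATTERN = re.compile(r"^(?P<date>\d{8})_Plate(?P<plate>\d+)_(?P<video>\d+)/Img/?$")


def parse_video_folder(folder):
    """Split a folder name into imaging date, plate number and video number."""
    match = FOLDER_PATTERN.match(folder)
    if match is None:
        return None  # the name does not follow the assumed scheme
    return {
        "Date Imaged": match["date"],
        "Plate number": int(match["plate"]),
        "video": int(match["video"]),
    }


parsed = parse_video_folder("20230313_Plate031_01/Img/")
print(parsed)
```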


In [71]:
# Keep only the sheets whose path contains "Plate" in both the folder and the file name
plate_pattern = re.compile(r"^.*Plate.*/.*Plate.*$", re.IGNORECASE)
excel_addresses = [entry for entry in excel_drop if plate_pattern.search(entry)]

info_addresses = []
for address in list(excel_addresses) + list(txt_drop):
    # address[6:] drops the first six characters (assumed to be the leading "/DATA/")
    # so that the local path mirrors the Dropbox hierarchy
    local_path = analysis_folder + address[6:]
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    download(address, local_path)
    info_addresses.append(local_path)
clear_output(wait=False)
print("All files downloaded!")

All files downloaded!


In [78]:
merge_frame = read_video_data(info_addresses, all_folders_drop)
print(merge_frame.columns)
merge_frame

Index(['unique_id', 'video_int', 'treatment', 'strain', 'xpos', 'ypos', 'mode',
       'binning', 'magnification', 'fps', 'time', 'file_name', 'index',
       'Plate number', 'imaging_day', 'tot_path_drop', 'video_folder',
       'plate_id_xl', 'tot_path', 'video_id', 'plate_nr'],
      dtype='object')


Unnamed: 0,unique_id,video_int,treatment,strain,xpos,ypos,mode,binning,magnification,fps,...,file_name,index,Plate number,imaging_day,tot_path_drop,video_folder,plate_id_xl,tot_path,video_id,plate_nr
0,20230313_Plate031_01,1,0Myr,C2,28520,80140,BF,2,50,20,...,20230313_Plate031,0,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,01/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/01/Img/,1,31
1,20230313_Plate031_02,2,0Myr,C2,27140,80180,BF,2,50,20,...,20230313_Plate031,1,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,02/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/02/Img/,2,31
2,20230313_Plate031_03,3,0Myr,C2,32600,86020,BF,2,50,20,...,20230313_Plate031,2,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,03/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/03/Img/,3,31
3,20230313_Plate031_04,4,0Myr,C2,35220,78840,BF,2,50,20,...,20230313_Plate031,3,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,04/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/04/Img/,4,31
4,20230313_Plate031_05,5,0Myr,C2,38860,93780,BF,2,50,20,...,20230313_Plate031,4,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,05/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/05/Img/,5,31
5,20230313_Plate031_06,6,0Myr,C2,39560,93820,BF,2,50,20,...,20230313_Plate031,5,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,06/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/06/Img/,6,31


## Dropbox filtering
If you want all videos in a plate or dataset, you can run this block and forget about it.
Otherwise, this block is where you can filter the DataFrame so that only videos with certain properties, such as imaging mode, are downloaded. A file structure is then created in the videos and analysis folders, and an info file (video_data.json in the code below) is created for every video in the analysis folder. The intent is to have a uniform analysis folder structure that works with the Morrison setup.

### Input
Use the section between the commented lines to filter the DataFrame; leave it unchanged to download everything.
### Output
Within the videos and analysis folders, a hierarchy is created that mimics the Dropbox folder structure, using the filtered DataFrame. Inside the analysis folder hierarchy, the info file for each video is generated, either from an existing VideoInfo.txt or from the Excel/CSV sheets.
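
As an illustration of what such a filter could look like, here is a minimal sketch on a small stand-in DataFrame. The column names `mode` and `magnification` are taken from the merged DataFrame above; the values `"F"` and `4` are made up for the example and do not come from the dataset.

```python
import pandas as pd

# Stand-in for merge_frame with only the columns needed for the example
merge_frame_demo = pd.DataFrame(
    {
        "unique_id": [
            "20230313_Plate031_01",
            "20230313_Plate031_02",
            "20230313_Plate031_03",
        ],
        "mode": ["BF", "F", "BF"],  # "F" is a made-up second imaging mode
        "magnification": [50, 50, 4],  # 4 is a made-up low magnification
    }
)

# Combine boolean masks with & to keep only bright-field videos at 50x or higher
is_brightfield = merge_frame_demo["mode"] == "BF"
is_high_mag = merge_frame_demo["magnification"] >= 50
download_frame_demo = merge_frame_demo[is_brightfield & is_high_mag].copy()

print(download_frame_demo["unique_id"].tolist())
```

In the notebook itself, the same masks would be applied to `merge_frame` between the two comment banners, in place of the plain `merge_frame.copy()`.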

In [79]:
#####################################################################################
### This is where you can apply the filters. Only those videos will be downloaded ###
#####################################################################################

download_frame = merge_frame.copy()

#####################################################################################
### Below code will prepare for those videos to be downloaded into videos_folder  ###
#####################################################################################

download_frame = download_frame.rename(columns={'tot_path' : 'folder'})
download_frame

Unnamed: 0,unique_id,video_int,treatment,strain,xpos,ypos,mode,binning,magnification,fps,...,file_name,index,Plate number,imaging_day,tot_path_drop,video_folder,plate_id_xl,folder,video_id,plate_nr
0,20230313_Plate031_01,1,0Myr,C2,28520,80140,BF,2,50,20,...,20230313_Plate031,0,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,01/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/01/Img/,1,31
1,20230313_Plate031_02,2,0Myr,C2,27140,80180,BF,2,50,20,...,20230313_Plate031,1,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,02/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/02/Img/,2,31
2,20230313_Plate031_03,3,0Myr,C2,32600,86020,BF,2,50,20,...,20230313_Plate031,2,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,03/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/03/Img/,3,31
3,20230313_Plate031_04,4,0Myr,C2,35220,78840,BF,2,50,20,...,20230313_Plate031,3,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,04/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/04/Img/,4,31
4,20230313_Plate031_05,5,0Myr,C2,38860,93780,BF,2,50,20,...,20230313_Plate031,4,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,05/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/05/Img/,5,31
5,20230313_Plate031_06,6,0Myr,C2,39560,93820,BF,2,50,20,...,20230313_Plate031,5,31,20230313,DATA/MYRISTATE/DATA/2_weeks/20230313_Plate031/...,06/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/06/Img/,6,31


In [103]:
for index, row in download_frame.iterrows():
    # The analysis folder sits one level above the video's "Img/" folder, hence [:-4]
    target_anals_file = f"{analysis_folder}{row['folder'][:-4]}"
    target_video_file = f"{videos_folder}{row['folder']}"

    # Mirror the Dropbox hierarchy in both the analysis and the videos folder
    os.makedirs(target_anals_file, exist_ok=True)
    os.makedirs(target_video_file, exist_ok=True)

    # Store this video's metadata next to its future analysis output
    row.to_json(f"{target_anals_file}/video_data.json", orient="index")
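
For trying out the folder mirroring without touching Snellius paths, the same idea can be sketched with `pathlib` in a temporary directory. `prepare_video_folders` is a hypothetical helper, not part of amftrack, and the one-row DataFrame is a cut-down stand-in for `download_frame`.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def prepare_video_folders(frame, videos_root, analysis_root):
    """Create mirrored folders and write one video_data.json per row."""
    written = []
    for _, row in frame.iterrows():
        video_dir = Path(videos_root) / row["folder"]
        analysis_dir = Path(analysis_root) / row["folder"]
        # The analysis folder sits one level above the trailing "Img" folder
        if analysis_dir.name == "Img":
            analysis_dir = analysis_dir.parent
        video_dir.mkdir(parents=True, exist_ok=True)
        analysis_dir.mkdir(parents=True, exist_ok=True)
        json_path = analysis_dir / "video_data.json"
        row.to_json(json_path, orient="index")
        written.append(json_path)
    return written


demo_frame = pd.DataFrame(
    {
        "unique_id": ["20230313_Plate031_01"],
        "mode": ["BF"],
        "folder": ["MYRISTATE/DATA/2_weeks/20230313_Plate031/01/Img/"],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    paths = prepare_video_folders(demo_frame, Path(tmp) / "videos", Path(tmp) / "analysis")
    with open(paths[0]) as handle:
        record = json.load(handle)  # read the metadata back as a plain dict

print(paths[0].parent.name, record["mode"])
```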

## Downloading
In this section, one block of code asks you one last time whether all of the parameters are correct. The block after that initiates Snellius jobs to download the videos in the DataFrame from Dropbox. Downloading videos is not that costly, but we still prefer it to be done as efficiently as possible.
### Input:
Nothing
### Output:
A print statement showing the DataFrame and the folders where everything will be stored.
The subsequent block of code downloads the raw video files to the videos folder.

In [99]:
run_parallel_transfer(
    "from_drop.py",
    [videos_folder],
    download_frame,
    1,
    "10:00:00",
    "transfer_test"
)
clear_output(wait=False)

print("Sent all the jobs! Use the command '$ squeue' in the terminal to see the progress")

sbatch: Single-node jobs run on a shared node by default. Add --exclusive if you want to use a node exclusively.
sbatch: A full node consists of 32 CPU cores, 229376 MiB of memory and 0 GPUs and can be shared by up to 32 jobs.
sbatch: By default shared jobs get 7168 MiB of memory per CPU core, unless explicitly overridden with --mem-per-cpu, --mem-per-gpu or --mem.
sbatch: You will be charged for 0.03125 node, based on the number of CPUs, GPUs and the amount memory that you've requested.


Submitted batch job 3003245


Submitted batch job 3003246
Submitted batch job 3003247
Submitted batch job 3003248
Submitted batch job 3003249
Submitted batch job 3003250
Submitted batch job 3003251


KeyboardInterrupt: 

# Module 2: Analysis
Now that the files have been downloaded, it's time to analyse them. In the code below, you can either do a complete survey of the analysis folder for as many videos as possible, or use the DataFrame of recently downloaded videos to filter for the videos you want to analyse.

### Input:
DataFrame filters of all videos to be analysed
### Output:
Print statements for all parameters of the analysis session that is about to take place.

In [105]:
img_infos = glob.glob(f"{analysis_folder}/**/video_data.json", recursive=True)
# Read each per-video JSON file into a one-row frame, then concatenate them once
vid_frames = [pd.read_json(address, orient='index').T for address in img_infos]
vid_anls_frame = pd.concat(vid_frames, ignore_index=True) if vid_frames else pd.DataFrame()

vid_anls_frame

Unnamed: 0,imaging_day,storage_path,plate_id,root,strain,treatment,crossing_day,video_int,time_(s),mode,...,magnification,time,file_name,index,Plate number,video_folder,plate_id_xl,folder,video_id,plate_nr
0,20230331,Dropbox\DATA\TransportROOT\DATA,20230331_Plate773,Carrot,C2,001P100N100C,20230327,68,30.0,BF,...,50.0,,,,,,,,,
1,20230331,Dropbox\DATA\TransportROOT\DATA,20230331_Plate773,Carrot,C2,001P100N100C,20230327,18,30.0,BF,...,50.0,,,,,,,,,
2,20230331,Dropbox\DATA\TransportROOT\DATA,20230331_Plate773,Carrot,C2,001P100N100C,20230327,14,30.0,BF,...,50.0,,,,,,,,,
3,20230331,Dropbox\DATA\TransportROOT\DATA,20230331_Plate773,Carrot,C2,001P100N100C,20230327,56,30.0,BF,...,50.0,,,,,,,,,
4,20230331,Dropbox\DATA\TransportROOT\DATA,20230331_Plate773,Carrot,C2,001P100N100C,20230327,6,30.0,BF,...,50.0,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69,20230313,,,,C2,0Myr,,5,,BF,...,50,30,20230313_Plate031,4,031,05/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/05/Img/,05,31
70,20230313,,,,C2,0Myr,,1,,BF,...,50,30,20230313_Plate031,0,031,01/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/01/Img/,01,31
71,20230313,,,,C2,0Myr,,4,,BF,...,50,10,20230313_Plate031,3,031,04/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/04/Img/,04,31
72,20230313,,,,C2,0Myr,,3,,BF,...,50,10,20230313_Plate031,2,031,03/Img/,20230313_Plate031,MYRISTATE/DATA/2_weeks/20230313_Plate031/03/Img/,03,31


## Run Analysis job
The code below will use the aforementioned DataFrame to initiate analysis jobs on Snellius.
### Input:
Snellius job parameters
### Output:
The analysis folder will be populated with analysis TIFFs and CSV sheets.

# Module 3: Bulk Analysis
## First part: Assemble Edge DataFrame
In this initial part of the bulk analysis, all of the analysis folders are searched for the edge data we're looking for.
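
A minimal sketch of such a search is given below, assuming each video's analysis folder holds its edge data as a CSV file. The file name `edges_data.csv`, the helper `collect_edge_tables` and the column `source_folder` are hypothetical; the actual file names written by the analysis jobs may differ.

```python
import glob
import os
import tempfile

import pandas as pd


def collect_edge_tables(analysis_root, file_name="edges_data.csv"):
    """Gather every edge table below `analysis_root` into one DataFrame."""
    pattern = os.path.join(analysis_root, "**", file_name)
    frames = []
    for path in sorted(glob.glob(pattern, recursive=True)):
        table = pd.read_csv(path)
        # Remember which analysis folder each row came from
        table["source_folder"] = os.path.dirname(path)
        frames.append(table)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Demonstration on a throwaway folder tree with two videos
with tempfile.TemporaryDirectory() as tmp:
    for video in ("01", "02"):
        video_dir = os.path.join(tmp, "20230313_Plate031", video)
        os.makedirs(video_dir)
        pd.DataFrame({"edge_name": ["(0, 1)"]}).to_csv(
            os.path.join(video_dir, "edges_data.csv"), index=False
        )
    edge_frame = collect_edge_tables(tmp)

print(len(edge_frame), list(edge_frame.columns))
```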