# Master video notebook!
This title might be a bit ambitious, but this notebook is meant to handle all of the administrative work of downloading, processing and analysing videos. The most important functions are stored in other Python files, so that this notebook remains legible. Analysis follows a hierarchy of dataset, plate, video, hypha.

BEWARE:
When saving this document through a GitHub commit, make sure to clear all outputs. This document can easily go over the 100 MB size limit of GitHub files. 
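
One way to clear outputs without opening the notebook is to edit the `.ipynb` file directly, since it is plain JSON. The helper below is a minimal sketch; the name `clear_notebook_outputs` is hypothetical and not part of amftrack.

```python
import json
from pathlib import Path


def clear_notebook_outputs(notebook_path):
    """Remove outputs and execution counts from every code cell, in place.

    Returns the file size in bytes after clearing.
    """
    path = Path(notebook_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop plots, prints, tables
            cell["execution_count"] = None  # reset the In [n] counter
    path.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return path.stat().st_size
```

Comparing the returned size against the 100 MB limit before committing gives a quick sanity check. Recent versions of `jupyter nbconvert` should also be able to do this from the command line with `--clear-output --inplace`.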

### In MODULE one,
the Dropbox is scoured for information about videos. If a video does not have a VideoInfo.txt, the program will look for a .csv; if there is no .csv, it will look for a .xlsx file. Once these files have been found, all information is merged into a pandas DataFrame and saved as a json file for the dataset and for each video. Some datasets contain thousands of videos, so scouring the Dropbox for info on all of them can take hours. Plan your analysis accordingly.
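
The lookup order described above can be sketched as follows. This is an illustration on a local folder only; the real indexing happens against Dropbox inside `index_videos_dropbox_new`, and the function name `find_metadata_file` is hypothetical.

```python
from pathlib import Path


def find_metadata_file(folder):
    """Return the preferred metadata file in `folder`, or None if there is none.

    Preference order: VideoInfo.txt, then any .csv, then any .xlsx.
    """
    folder = Path(folder)
    info_txt = folder / "VideoInfo.txt"
    if info_txt.is_file():
        return info_txt
    for suffix in (".csv", ".xlsx"):
        matches = sorted(folder.glob(f"*{suffix}"))
        if matches:
            return matches[0]  # assumption: take the first match alphabetically
    return None
```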

After scouring is complete, a final filtering step can be applied, after which the whole list of videos can be downloaded. NB: Downloads go to two places: videos are downloaded to the specified videos folder, whereas video parameters and analysis are downloaded to the specified analysis folder. This separation allows videos to be stored on larger storage drives and analysis folders on faster storage drives.

(If Snellius is still used, it is recommended to store videos on your scratch storage and analysis on your home storage. Scratch storage gets wiped every two weeks, but is much larger than home storage.)

TODO: Give options to download with SLURM job or manually.

### In MODULE two,
the downloaded videos with their respective information can be filtered, then analysed with a large SLURM job. In the future there might need to be functionality that allows processing without the use of a SLURM job. If you're reading this in 2024, you better apply for another Snellius grant!

### In MODULE three,
this is where all the bulk analysis takes place. In high_mag_analysis.py, there are a number of classes and functions that will help you parse the data into meaningful graphs. This MODULE assumes the existence of the video_data.json files that are generated partly in MODULE 1.

### Code below:
Just import statements and notebook setup.

In [None]:
from IPython.display import clear_output
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import re

import sys
import dropbox
import pandas as pd
import numpy as np
import imageio.v2 as imageio
import matplotlib.pyplot as plt
import cv2
import logging
import datetime
import glob
import json
import scipy
import matplotlib as mpl

from subprocess import call
from tifffile import imwrite
from tqdm import tqdm
from pathlib import Path
from amftrack.util.dbx import upload_folder, download, read_saved_dropbox_state, save_dropbox_state, load_dbx, get_dropbox_folders_prince, get_dropbox_video_folders, download_video_folders_drop, download_analysis_folders_drop
from amftrack.pipeline.launching.run import (
    run_transfer,
)
from amftrack.pipeline.launching.run_super import run_parallel_transfer
from amftrack.pipeline.launching.run_super import run_parallel_flows
from amftrack.pipeline.functions.transport_processing.high_mag_videos.plot_data import (
    plot_summary,
    save_raw_data,
)
from amftrack.pipeline.functions.transport_processing.high_mag_videos.high_mag_videos_fun import (
    filter_kymo
)
from amftrack.pipeline.functions.transport_processing.high_mag_videos.high_mag_analysis import (
    HighmagDataset,
    VideoDataset,
    EdgeDataset,
    index_videos_dropbox_new,
    analysis_run,
)
from amftrack.pipeline.functions.transport_processing.high_mag_videos.kymo_class import (
    KymoVideoAnalysis,
    KymoEdgeAnalysis
)

%matplotlib widget
%load_ext autoreload
%autoreload 2
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)  # logging.DEBUG is the level constant; logging.debug is a function
mpl.rcParams['figure.dpi'] = 200
print("import great success!")

## File declaration
As this notebook is designed to work with Snellius (now also on a local computer!), two items to keep separate are the raw video files and the analysis. The raw video files are large, bulky and not so easy to flip through. Ideally, the video files are downloaded to one place and the analysis is stored in a separate folder structure entirely. That way, large-scale analysis of analysis folders can happen when there are thousands of videos in the dataset, without having to keep those raw video folders on hand.

The code below prepares your folders to receive all the video info folders and raw video files.

### Input:
Please give separately the folder where raw video data is stored, and where the analysis will be stored. Also give the dropbox address of the dataset you want to analyze.

### Output:
The specified Dropbox folder will be looked through, and all relevant video information will be downloaded to an analysis folder structure identical to what is present on the Dropbox. The matching raw video folder structure will also be generated, if requested. Cache files in the form of .json files are created as well, so that the scrounging does not have to happen again next time.
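
As a rough sketch of what "identical folder structure" means here: a Dropbox path is mirrored under both local roots after dropping the leading `/DATA` (the same prefix this notebook strips later with `dropbox_address[5:]`). The function name `local_folders` is hypothetical, and the exact layout produced by amftrack may differ.

```python
from pathlib import Path, PurePosixPath


def local_folders(dropbox_path, videos_folder, analysis_folder, root="/DATA"):
    """Map a Dropbox path onto the local videos and analysis folder trees."""
    # Dropbox paths always use forward slashes, hence PurePosixPath.
    relative = PurePosixPath(dropbox_path).relative_to(root)
    video_dir = Path(videos_folder).joinpath(*relative.parts)
    analysis_dir = Path(analysis_folder).joinpath(*relative.parts)
    return video_dir, analysis_dir
```

Calling `mkdir(parents=True, exist_ok=True)` on both returned paths would then create the mirrored structure on disk.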

In [None]:
# videos_folder = "F:\\AMOLF_Data\\videos\\"
# analysis_folder = "F:\\AMOLF_Data\\analysis\\"

# videos_folder = "/gpfs/scratch1/shared/amftrackflow/videos/"
# analysis_folder = "/gpfs/home6/svstaalduine/Analysis/"

videos_folder = "/scratch-shared/amftrack/videos/"
analysis_folder = "/scratch-shared/amftrack/analysis_videos/"

In [None]:
# dropbox_address = "/DATA/FLUORESCENCE/DATA_NileRed/"        # Rachael's dataset
# dropbox_address=  "/DATA/MYRISTATE/MorrisonDATA/2_weeks/"           # Hannah's dataset
# dropbox_address = "/DATA/TransportROOT/DATA/"               # Victoria's dataset
dropbox_address = "/DATA/CocoTransport/"                    # Corentin's dataset
dropbox_address = "/DATA/FLUORESCENCE/DATA_NileRed/20221109_Plate462/" # Default video path

## Where to go?
If you want to download videos:
Use MODULE 1

If you want to analyze already downloaded videos:
Skip MODULE 1, use MODULE 2

# MODULE 1: Downloading
In this section, one block of code asks you one last time whether all of the parameters are correct. The block of code after that initiates Snellius jobs to download the videos in the DataFrame from the Dropbox. Downloading videos is not that costly, but of course we prefer it to be done as efficiently as possible.
## I'm not on Snellius! How do I download stuff?
Easy. Skip the block that launches the SLURM job and use the non-SLURM download block further down, which uses the Dropbox API directly to download all your raw data.
WARNING: This process can take quite long if you are queueing up a lot of videos. Do not use that block of code on Snellius, they will get mad at you (and prematurely stop your running program); just use the SLURM job in that case.

### Input:
Addresses of where to find the files on Dropbox, where to store videos on Snellius and where to store analysis. Using a start and end date really speeds up scrounging. I know you don't actually need to load all of them right now. I see you! Some folders on Dropbox do not have a date; these are never filtered out by start and end date.
### Output:
Print statement with the DataFrame and the folders where everything will be stored.
Subsequent block of code will download raw video files to the videos folder.
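
The date filtering rule from the Input section (undated folders always pass) can be sketched as below. This is an assumption-laden illustration, not the code inside `index_videos_dropbox_new`: the name `within_date_range`, the `YYYYMMDD` pattern and the inclusive bounds are all assumptions.

```python
import re

# Assumes folder names carry the date as eight digits, as in "20221109_Plate462".
DATE_PATTERN = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def within_date_range(folder_name, date_start=None, date_end=None):
    """Return True if the folder should be kept for the given date window."""
    match = DATE_PATTERN.search(folder_name)
    if match is None:
        return True  # undated folders are never filtered out
    date = int(match.group(1))
    if date_start is not None and date < date_start:
        return False
    if date_end is not None and date > date_end:
        return False
    return True


folders = ["20230901_Plate310", "20231002_Plate441", "Calibration"]
kept = [f for f in folders if within_date_range(f, 20230906, 20240130)]
# kept == ["20231002_Plate441", "Calibration"]
```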

In [None]:
video_param_frame = index_videos_dropbox_new(analysis_folder, videos_folder, dropbox_address, REDO_SCROUNGING=True,
                                            # date_start=20230906,
                                            # date_end=20240130,
                                            plate_names=None)
video_param_frame

In [None]:
video_param_frame['tot_path_drop'].iloc[-1]


In [None]:
#####################################################################################
### This is where you can apply the filters. Only those videos will be downloaded ###
#####################################################################################

download_frame = video_param_frame.copy()
# download_frame = download_frame[download_frame['mode'] == "F"]
# download_frame = download_frame[download_frame['unique_id']=="20221109_Plate462_17"]
# download_frame = download_frame[download_frame['unique_id']=="20230810_Plate441_090"]
# download_frame = download_frame[download_frame['unique_id']=="20201209_Plate40_22"]
# download_frame['plate_int'] = [entry.split('_')[-1] for entry in download_frame['plate_id']]
# download_frame['video_int'] = [entry.split('_')[-1] for entry in download_frame['unique_id']]

# download_frame = download_frame[download_frame['plate_int'] == "Plate558"].reset_index(drop=True)
# download_frame = download_frame[download_frame['imaging_day'] == "20201209"].reset_index(drop=True)
# download_frame = download_frame[download_frame['imaging_day'] == "20230906"].reset_index(drop=True)

# for entry in download_frame['folder']:
#     entry_list = list(entry) 
#     entry_list[58] = 'P'
#     entry="".join(entry_list)
#     print(entry)
# download_frame = download_frame[26:28]


#####################################################################################
### Below code will prepare for those videos to be downloaded into videos_folder  ###
#####################################################################################
print(f"Number of videos that will be downloaded: {len(download_frame)}")
# print(download_frame.head(5)['folder'].iloc[1])
print(download_frame['tot_path_drop'].iloc[0])

In [None]:
run_parallel_transfer(
    "from_drop_video.py",
    [videos_folder],
    download_frame,
    1,
    "10:00:00",
    "download_job"
)
# clear_output(wait=False)

print("Sent all the jobs! Use the command '$ squeue' in the terminal to see the progress")

### Download videos from Dropbox (Not a SLURM job)
This block of code can be used to download videos individually from dropbox. 
Be aware:
- This is significantly slower than launching a SLURM job
- This downloads videos sequentially, not in parallel
- If this function is running for too long on Snellius, it might get you booted from the interactive node
- Videos are large. Make sure you have the space.

In [None]:
download_video_folders_drop(download_frame, videos_folder)
clear_output(wait=False)
print("All videos downloaded!")

### Download Analysis folders from Dropbox (not a SLURM job)
Similar warnings apply as the video download function above. The file sizes for the analysis folders are, however, vastly smaller than video files. This allows for a bit more wiggle room.

In [None]:
download_analysis_folders_drop(analysis_folder, dropbox_address)
clear_output(wait=False)
print("All analysis folders downloaded!")

# Module 2: Processing

Now that the files have been downloaded, it's time to analyse them. In the code below, you can either do a complete survey of the analysis folder for as many videos as possible, or use the DataFrame of recently downloaded videos to filter for the videos you want to analyse.

It is also possible to analyse videos directly in this notebook. Be aware, again, that this is a sequential and slower analysis than running a SLURM job.

### Input:
DataFrame filters of all videos to be analysed
### Output:
Print statements for all parameters of the analysis session that is about to take place.

In [None]:
print(dropbox_address)

In [None]:
folder_filter = dropbox_address[5:]

img_infos = glob.glob(f"{analysis_folder}{folder_filter}/**/video_data.json", recursive=True)
vid_anls_frame = pd.DataFrame()
for address in img_infos:
    add_info = pd.read_json(address, orient='index').T
    
#     print("hello",add_info['analysis_folder'])
    vid_anls_frame = pd.concat([vid_anls_frame, add_info], ignore_index=True)

vid_anls_frame = vid_anls_frame.sort_values('unique_id').reset_index(drop=True)


vid_anls_frame.tail(20)

In [None]:
print(vid_anls_frame['analysis_folder'][0])

In [None]:
####################################################################################
### This is where you can apply the filters. Only those videos will be analyzed. ###
####################################################################################
analysis_frame = vid_anls_frame

analysis_frame['plate_int'] = [entry.split('_')[-1] for entry in analysis_frame['plate_id']]
# analysis_frame['video_int'] = [entry.split('_')[-1] for entry in analysis_frame['unique_id']]
# analysis_frame = vid_anls_frame[vid_anls_frame['imaging_day'].ge("20230905")].reset_index(drop=True)
# analysis_frame = vid_anls_frame[vid_anls_frame['xpos'].le(100)].reset_index(drop=True)
# analysis_frame = analysis_frame[analysis_frame['mode']=="F"]
# analysis_frame = analysis_frame[analysis_frame['plate_int'] == "Plate310"].reset_index(drop=True)
# analysis_frame = analysis_frame[analysis_frame['imaging_day'] == "20230906"].reset_index(drop=True)
# analysis_frame = analysis_frame[analysis_frame['imaging_day'] <= "20230313"].reset_index(drop=True)

# analysis_frame = analysis_frame[analysis_frame['plate_int'] == "Plate310"].reset_index(drop=True)
# analysis_frame = analysis_frame[analysis_frame['imaging_day']=="20230906"].reset_index(drop=True)

# analysis_frame = analysis_frame[analysis_frame['unique_id'] == "20230906_Plate310_006"].reset_index(drop=True)
# analysis_frame = analysis_frame[analysis_frame['video_int'].isin([1])]
# analysis_frame = analysis_frame.iloc[88:91].reset_index(drop=True)
####################################################################################
### Below code reports how many videos will be analyzed and where they live.     ###
####################################################################################

print(f"Number of videos to be analyzed: {len(analysis_frame)}")
analysis_frame['analysis_folder'].iloc[0]

## Run SLURM Analysis job
Two options: for large-scale analysis, use the block directly below, which creates a Snellius job. For small analyses, use the local analysis block further down, which runs the calculations on the current machine.
## Input:
Snellius job parameters. Running 8 jobs in parallel is too much: for reasons not yet understood, 4 of them get killed because of out-of-memory (OOM) errors. So from this point on, use at most 4.
## Output:
The analysis folder will be populated with analysis TIFFs and csv sheets. At the same time, this analysis folder will also be uploaded to the Dropbox.

In [None]:
### LARGE VIDEO ANALYSIS

nr_parallel = np.min([len(analysis_frame.index), 1])

run_parallel_flows(
    "flux_extract.py",
    [analysis_folder, 9, 0.95, 0.005, 80, dropbox_address],
    analysis_frame,
    nr_parallel,
    "2:00:00",
    "flux_extract",
    node = "fat_rome"
)
# clear_output(wait=False)

print("Sent all the jobs! Use the command '$ squeue' in the terminal to see the progress")

## Run local analysis
Here you can run a specialized analysis on a limited number of videos. It uses the same analysis frame that you use to filter videos for the SLURM jobs.

The analysis function is called with parameters that you can change to fit your needs. It goes through each row in the video analysis DataFrame and executes the analysis. NOTE: This code produces the analysis; browsing and plotting the results is what MODULE 3 is for.


In [None]:
# print(analysis_frame.iloc[26:28], "<- frame and folder ->", analysis_folder)
# print(dropbox_address)

edges_objs = analysis_run(analysis_frame.iloc[0:4], analysis_folder, videos_folder, dropbox_address,
             logging=True,                 # Print progress to console
             kymo_normalize=True,          # Normalize the kymograph for visual representation
             kymo_section_width=1.8,       # Width of kymograph lines, adjusted for magnification
             thresh_adjust=0,              # Adjustment for thresholding the frangi filter. Will be added to Otsu threshold.
             frangi_range=np.arange(10, 120, 20), # Range of distances in pixels to expect hyp
             close_size=200,               # Size of kernel to use in closing operation after thresholding frangi filter
             edge_len_min=40,              # Minimum edge length to select for in pixels
             save_edge_extraction_plot=True, # Save picture of extracted edges
             make_video=False,             # Make mp4 of raw data TIFFs
             create_snapshot=True,         # Save image of edge
             create_edge_video=False,      # Save video of edge
             photobleach_adjust=False,     # Adjust kymograph for photobleaching
             speed_ext_window_number=9,    # Size range to investigate speeds
             speed_ext_window_start = 3,   # Start size of window for GST
             speed_ext_c_thresh=0.95,      # Confidence threshold for speed determination
             speed_ext_c_falloff = 0.005,  # Confidence falloff as window size increases
             speed_ext_blur_size = 3,      # Kymograph blur Gaussian kernel size
             speed_ext_blur=True,          # Whether to preblur at all
             speed_ext_max_thresh = 80,    # Maximum expected speeds (in um/s)
             dropbox_upload=False,         # Whether to upload results to dropbox
             fourier_radius = 30,          # From what pixel radius to include the fourier spectrum during fourier analysis
             fourier_prominence = 1        # Prominence metric for finding peaks in fourier analysis
             )

In [None]:
plt.close('all')

In [None]:
plt.close('all')
fig, ax = plt.subplots(2,3)
for edge_objs in edges_objs:
    for edge in edge_objs:
        
        print(edge.edge_name)
        print(edge.space_pixel_size)
#         print(edge.filtered_left[0][0])
        if edge.edge_name==(5, 3):
            ax[0][0].imshow(edge.filtered_left[0], aspect='auto',vmin=0,vmax=255)
            ax[0][1].imshow(edge.filtered_right[0], aspect='auto',vmin=0,vmax=255)
            print("this is the min value of the kymographs: ", np.min(edge.kymos))
            print("this is the max value of the kymographs: ", np.max(edge.kymos))
            static=edge.kymos[0]-edge.filtered_left[0]-edge.filtered_right[0]
            ax[0][2].imshow(static, aspect='auto',vmin=0,vmax=255)
            ax[1][0].imshow(edge.kymos[0], aspect='auto',vmin=0,vmax=255)
            ax[1][1].imshow(edge.filtered_left[0]+edge.filtered_right[0], aspect='auto',vmin=0,vmax=255)
            ax[1][2].imshow(edge.filtered_left[0]+edge.filtered_right[0]+np.min(static), aspect='auto',vmin=0,vmax=255)
            print("this is the min signal standing still: ", np.min(static))

plt.show()

In [None]:
# for edge_group in edges_objs:
#     print(edge_group)
# print(np.array(edges_objs).flatten())
small_data_obj = HighmagDataset(analysis_frame.iloc[2:4], analysis_folder, videos_folder)

In [None]:
for edge in small_data_obj.edge_objs:
    print(edge.filtered_left[0])
#     edge.show_summary()

### Width profile Kymograph analysis
This is going to be some special code to extract multiple kymographs from the same edge, all next to each other. Requires running the previous code to get the analysis objects.

In [None]:
# edges_objs=small_data_obj.edge_objs
print([[edge.edge_name for edge in edge_list] for edge_list in edges_objs])
edge_interest = edges_objs[0][0]

width_len = 1
#TODO: Get effective mean speed calculation in here too

kymos = edge_interest.extract_multi_kymo(width_len, target_length=70)
fourier_kymos = edge_interest.fourier_kymo()
speeds, times = edge_interest.extract_speeds(15)


In [None]:
print(np.array(times).shape)
speed_max = 80
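fig, axs = plt.subplots(width_len, squeeze=False)  # always a 2D array of axes, also for width_len == 1
for i in range(width_len):
    ax = axs[i][0]  # one subplot per width section
#     ax.set_title("Speeds along the width")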
    ax.plot(times[i], np.nanmean(speeds[i][0], axis=1))
    ax.plot(times[i], np.nanmean(speeds[i][1], axis=1))
    ax.fill_between(times[i],
                       np.nanmean(speeds[i][0],axis=1) + np.nanstd(speeds[i][0], axis=1),
                       np.nanmean(speeds[i][0], axis=1) - np.nanstd(speeds[i][0],axis=1),
                       alpha=0.5,
                       facecolor='tab:blue')
    ax.fill_between(times[i],
                       np.nanmean(speeds[i][1], axis=1) + np.nanstd(speeds[i][1], axis=1),
                       np.nanmean(speeds[i][1], axis=1) - np.nanstd(speeds[i][1],axis=1),
                       alpha=0.5,
                       facecolor='tab:orange')
    ax.set_ylim((-speed_max, speed_max))
    ax.set_xlabel("time (s)")
    ax.set_ylabel(r"Velocity $(\mu m /s)$")  # raw string avoids the invalid '\m' escape

fig, ax = plt.subplots()
for i in range(1, width_len):
    ax.set_title("Speeds along the width")
    ax.scatter(i, np.mean(np.nanmean(speeds[i][0], axis=1)), c='tab:blue', label='to tip')
    ax.errorbar(i, np.mean(np.nanmean(speeds[i][0], axis=1)), np.nanstd(speeds[i][0].flatten()), capsize=5, c='tab:blue')
    ax.scatter(i, np.mean(np.nanmean(speeds[i][1], axis=1)), c='tab:orange', label='to root')
    ax.errorbar(i, np.mean(np.nanmean(speeds[i][1], axis=1)), np.nanstd(speeds[i][1].flatten()), capsize=5, c='tab:orange')
    ax.set_ylim([-15, 15])
    ax.set_xlabel("Width fraction nr")
    ax.set_ylabel(r"Velocity $(\mu m /s)$")  # raw string avoids the invalid '\m' escape
    ax.grid(True)

In [None]:
for i in range(len(kymos)):
    fig, ax = plt.subplots()
    ax.imshow(kymos[i])

# Module 3: Bulk Analysis
## First part: Assemble Edge DataFrame


In this initial part of the bulk analysis, all of the analysis folders will be looked through to find the edge data we're looking for. Additionally, there is an optional part to download the analysis folders from Dropbox back to the analysis folder we specified right at the top.

## Assuming all the analysis folders are already downloaded:
You can use the code below to read the video_data.json files that are created when the videos are indexed.

In [None]:
folder_filter = dropbox_address[5:]

img_infos = glob.glob(f"{analysis_folder}{folder_filter}/**/video_data.json", recursive=True)
vid_anls_frame = pd.DataFrame()
for address in tqdm(img_infos):
    add_info = pd.read_json(address, orient='index').T
    vid_anls_frame = pd.concat([vid_anls_frame, add_info], ignore_index=True)

vid_frame = vid_anls_frame.sort_values('unique_id').reset_index(drop=True)

In [None]:
####################################################################################
### This is where you can apply the filters. Only those videos will be analyzed. ###
####################################################################################

# analysis_frame = vid_frame[vid_frame['imaging_day'].le(20230725)].reset_index(drop=True)
analysis_frame = vid_frame

analysis_frame['plate_int'] = [entry.split('_')[-1] for entry in analysis_frame['plate_id']]
analysis_frame['video_int'] = [entry.split('_')[-1] for entry in analysis_frame['unique_id']]

analysis_frame = analysis_frame[analysis_frame['plate_int'] == "Plate310"].reset_index(drop=True)
analysis_frame = analysis_frame[analysis_frame['imaging_day']=="20230906"].reset_index(drop=True)
# analysis_frame = analysis_frame[analysis_frame['xpos'].le(100)].reset_index(drop=True)

### Run below to update analysis and videos folder entries if you've downloaded the analysis
# analysis_frame['analysis_folder'] = [str((Path(analysis_folder) / entry['folder']).parent) for index, entry in analysis_frame.iterrows()]
# analysis_frame['videos_folder'] = [str(Path(videos_folder) / entry['folder']) for index, entry in analysis_frame.iterrows()]

####################################################################################
### Below code reports how many videos will be analyzed and where they live.     ###
####################################################################################
print(len(analysis_frame))
analysis_frame['analysis_folder'].iloc[4]

In [None]:
data_obj = HighmagDataset(analysis_frame, analysis_folder, videos_folder)

In [None]:
plt.close('all')

In [None]:
edges=imageio.imread('/gpfs/home6/svstaalduine/Analysis/Cocotransport/20230901_Plate310/036/Video segmentation.png')
display(edges)

### Example code for plotting fraction bar graphs


In [None]:
speed_ranges = [edge.time_data['speed_weight_mean'].quantile(.95) - edge.time_data['speed_weight_mean'].quantile(.05) for edge in data_obj.edge_objs]
data_obj.edges_frame['speed_range'] = speed_ranges
data_int_obj = data_obj.filter_edges("coverage_tot",">=",.3)
data_int_obj = data_int_obj.filter_edges("mode","==","BF")
# speed_linear_rachael = data_int_obj.filter_edges("speed_range", "<=", 1.0)
# speed_bulk_rachael = data_int_obj.filter_edges("speed_range", ">=", 1.0)
# print(len(speed_linear_rachael.edges_frame))
# print(len(speed_bulk_rachael.edges_frame))

ax = data_int_obj.edges_frame.plot.hist(column='speed_range', figsize=(8, 4), bins=np.linspace(0, 20, 50), color='green')
# ax.set_title("Lid on speed ranges")
ax.set_xlabel(r"$\overline{v}_{95\%} - \overline{v}_{5\%}$")  # raw string keeps the LaTeX backslashes intact

### Example code for creating different plate maps
Below you can see the filtering options for different plates and the plot_plate_locs function that outputs a map with dots or arrows depending on your wishes. Current drawing modes are:
- 'scatter' for dots of the videos, separated by magnification
- 'speeds_mean' for black arrows denoting the effective mean speed of the flows
- 'speeds_both' for blue and orange arrows denoting the effective speed of flows in both directions
- 'vid_labels'  for a list of what videos were taken at each position

In [None]:
plt.close('all')
mpl.rcParams['figure.dpi'] =500
print(data_obj.video_frame.columns)

for plate_id in tqdm(data_obj.video_frame['plate_id'].unique()):
    plate_group = data_obj.filter_edges('plate_id', '==', plate_id)
    plate_group = plate_group.filter_edges('coverage_tot', '>=', 0.3)
    plate_group = plate_group.filter_edges('mode', '==', "F")
#     plate_group = plate_group.filter_edges('speed_ranges', '>=', 3.0)
    if len(plate_group.video_frame) <= 1:
        continue
    else:
        plate_group.plot_plate_locs(analysis_folder, spd_thresh=0.5, modes=['scatter', 'vid_labels'])

mpl.rcParams['figure.dpi'] = 200


### Example code 50x speed arrows

In [None]:
plt.close('all')
data_interest = data_obj.filter_edges('video_int', ">=", "081")
data_interest = data_interest.filter_edges('video_int', "<=", "082")
# data_interest = data_interest.filter_edges('mode', "==", "BF")
# data_interest = data_interest.filter_edges('imaging_day', "==", "20221109")

for vid_obj in data_interest.video_objs:
    vid_obj.plot_speed_arrows(plot_flux=True, save_im=False, video_txt_size=40)

# for edge in data_interest.edge_objs:
#     edge.direction_filter()

### Example code plot summaries of videos


In [None]:
plt.close('all')

data_obj_filt = data_obj.filter_edges("video_int", "==", "17")
# data_obj_filt = data_obj_filt.filter_edges("video_int", ">=", "071")
# data_obj_filt = data_obj_filt.filter_edges("video_int", "<=", "18")
# data_obj_filt = data_obj_filt.filter_edges("coverage_tot", ">=", 0.5)
# data_obj_filt = data_obj_filt.filter_edges("speed_range", ">=", 1.5)
# data_obj_filt = data_obj_filt.filter_edges('coverage_left', '<=', 0.3)
# data_obj_filt = data_obj_filt.filter_edges('coverage_right', '<=', 0.3)

# print(data_obj_filt)
for edge in data_obj_filt.edge_objs:
#     video.show_segmentation()
#     filter_kymo(edge)
#     print("should work")
    edge.show_summary()
    

In [None]:
kymos = data_obj_filt.extract_multi_kymo(width_len, target_length=70)
fourier_kymos = edge_interest.fourier_kymo()
speeds, times = edge_interest.extract_speeds(15)

### Example code binned violin-plot
`bin_column` names the column whose values are binned; one violin is then drawn per bin, all on the same axis.
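
For readers unfamiliar with binning, the idea can be sketched in plain Python: each edge's value in the bin column decides which group its speed lands in, and each group becomes one violin. This is only an illustration; `bin_index` and `group_by_bin` are hypothetical names, and the actual `bin_values` and `plot_violins` methods may treat bin boundaries differently.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Return i with edges[i] <= value < edges[i + 1], or None if out of range.

    The last bin also includes its right edge.
    """
    if value < edges[0] or value > edges[-1]:
        return None
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def group_by_bin(bin_values, targets, edges):
    """Collect each target under the bin that its bin_value falls into."""
    groups = [[] for _ in range(len(edges) - 1)]
    for value, target in zip(bin_values, targets):
        index = bin_index(value, edges)
        if index is not None:
            groups[index].append(target)
    return groups


widths = [1.0, 5.0, 10.0, 11.0]   # values of the bin column (made-up numbers)
speeds = [0.5, 1.5, 2.5, 9.9]     # values to plot per bin (made-up numbers)
groups = group_by_bin(widths, speeds, edges=[0, 5, 10])
# groups == [[0.5], [1.5, 2.5]]; the width of 11.0 falls outside all bins
```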

In [None]:
plt.close('all')

cover_filter_data = data_obj.filter_edges('coverage_tot', '>=', 0.5)
filter_BF = cover_filter_data.filter_edges('mode', '==', 'BF')
filter_FL = cover_filter_data.filter_edges('mode', '==', 'F')
# filter_BF = filter_BF.filter_edges('imaging_day', '>=', '20230814')
# filter_BF = cover_filter_data
bin_column = 'edge_width'

# bins = np.linspace(5, 12.5, 10)
bins = np.linspace(filter_BF.return_edge_frame()[bin_column].min(), filter_BF.return_edge_frame()[bin_column].max(), 20)
binsfl = np.linspace(filter_FL.return_edge_frame()[bin_column].min(), filter_FL.return_edge_frame()[bin_column].max(), 20)
bin_series = filter_BF.bin_values(bin_column, bins)
bin_seriesfl = filter_FL.bin_values(bin_column, binsfl)
# print(bin_series)
print(bin_series)
labels = []
fig, ax, means_right = filter_BF.plot_violins('speed_right', bins, c='tab:orange', labels=labels)
fig, ax, means_left = filter_BF.plot_violins('speed_left', bins, c='tab:blue', ax=ax, fig=fig, labels=labels)
fig, ax, means_rightfl = filter_FL.plot_violins('speed_right', binsfl, c='tab:red', labels=labels)
fig, ax, means_leftfl = filter_FL.plot_violins('speed_left', binsfl, c='tab:green', ax=ax, fig=fig, labels=labels)
# fig, ax, means_mean = filter_BF.plot_violins('speed_mean', bins, c='tab:red', ax=ax, fig=fig, labels=labels)

linear_right = np.polyfit(bins[:-1],means_right[:-1],1)
lin_right_func = np.poly1d(linear_right) 
linear_left = np.polyfit(bins[:-1],means_left[:-1],1)
lin_left_func = np.poly1d(linear_left)
linear_rightfl = np.polyfit(binsfl[:-1],means_rightfl[:-1],1)
lin_right_funcfl = np.poly1d(linear_rightfl) 
linear_leftfl = np.polyfit(binsfl[:-1],means_leftfl[:-1],1)
lin_left_funcfl = np.poly1d(linear_leftfl)
print(means_right)

ax.plot(bins, lin_right_func(bins), '--k', label='fit right')
ax.plot(bins, lin_left_func(bins), '--k', label='fit left')

ax.plot(binsfl, lin_right_funcfl(binsfl), '--k', label='fit right fl')
ax.plot(binsfl, lin_left_funcfl(binsfl), '--k', label='fit left fl')

# ax.axhline(c='black', alpha=0.5, linestyle='--')
ax.set_ylabel(r'v $(\mu m / s)$')  # raw strings avoid the invalid '\m' escape
ax.set_xlabel(r'hyphal width $(\mu m)$')
# ax.legend(*zip(*labels))



### Example code for bin-less violin plots
This can be for comparing videos, plates, anything with a unique ID

In [None]:
cover_filter_data = data_obj.filter_edges('coverage_tot', '>=', 0.5)
cover_filter_data = cover_filter_data.filter_edges('plate_int', '<=', "Plate449")

filter_BF = cover_filter_data

labels = []
fig, ax, means_right  = filter_BF.plot_violins('speed_right', bin_separator='imaging_day', c='tab:orange', labels=labels)
fig, ax, means_left  = filter_BF.plot_violins('speed_left', bin_separator='imaging_day', c='tab:blue', ax=ax, fig=fig, labels=labels)
fig, ax, means_mean = filter_BF.plot_violins('speed_mean', bin_separator='imaging_day', c='tab:red', ax=ax, fig=fig, labels=labels)

ax.axhline(c='black', alpha=0.5, linestyle='--')
ax.set_ylabel(r'v $(\mu m / s)$')  # raw string avoids the invalid '\m' escape
ax.set_xlabel('Imaging day')  # violins are separated by 'imaging_day' above
ax.legend(*zip(*labels))
fig.tight_layout()

In [None]:
print(data_obj.video_frame['video_int'].to_string())

### Example code on visualizing 4x/50x comparisons

In [None]:
data_4x_filter = data_obj.filter_edges('magnification', '==', 4.0)
mag_corr_groups = [data_obj.context_4x(row) for index, row in data_4x_filter.video_frame.iterrows()]
for group in tqdm(mag_corr_groups):
    group.plot_4x_locs(analysis_folder)

In [None]:
plt.close('all')

spd_maxes_BF= []
spd_maxes_BF_440= []
spd_maxes_BF_441= []
spd_maxes_BF_449= []
spd_maxes_BF_414= []
flux_avgs_BF= []
spd_maxes_FL= []
flux_avgs_FL= []


print(data_obj.video_frame['plate_int'].unique())

linear_edges = data_obj
# linear_edges = linear_edges.filter_edges('plate_int',  '==', 'Plate440')
# linear_edges = linear_edges.filter_edges('imaging_day',  '>=', '20230820')
linear_edges = linear_edges.filter_edges('imaging_day',  '>=', '20230810')
# linear_edges = linear_edges.filter_edges('coverage_left', '>=', 0.45)
# linear_edges = linear_edges.filter_edges('coverage_right', '>=', 0.45)
linear_edges = linear_edges.filter_edges('coverage_tot', '>=', 0.5)
# linear_edges = linear_edges.filter_edges('mode',  '==', 'BF')
linear_edges_440 = linear_edges.filter_edges('plate_int',  '==', 'Plate440')
linear_edges_441 = linear_edges.filter_edges('plate_int',  '==', 'Plate441')
linear_edges_449 = linear_edges.filter_edges('plate_int',  '==', 'Plate449')
linear_edges_414 = linear_edges.filter_edges('plate_int',  '==', 'Plate414')
linear_edges_BF = linear_edges.filter_edges('mode',  '==', 'BF')
linear_edges_FL = linear_edges.filter_edges('mode',  '==', 'F')

for edge in tqdm(linear_edges_BF.edge_objs):
    spd_maxes_BF.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.5, spd_cutoff = 0.5, bin_res=1000, plot_fig=False))
    flux_avgs_BF.append(edge.time_data['flux_mean'].mean())
# for edge in tqdm(linear_edges_440.edge_objs):
#     spd_maxes_BF_440.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.0, spd_cutoff = 0.0, bin_res=1000, plot_fig=False))
# for edge in tqdm(linear_edges_441.edge_objs):
#     spd_maxes_BF_441.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.0, spd_cutoff = 0.0, bin_res=1000, plot_fig=False))
# for edge in tqdm(linear_edges_449.edge_objs):
#     spd_maxes_BF_449.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.0, spd_cutoff = 0.0, bin_res=1000, plot_fig=False))
# for edge in tqdm(linear_edges_414.edge_objs):
#     spd_maxes_BF_414.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.0, spd_cutoff = 0.0, bin_res=1000, plot_fig=False))
for edge in tqdm(linear_edges_FL.edge_objs):
    spd_maxes_FL.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.5, spd_cutoff = 0.5, bin_res=1000, plot_fig=False))
    flux_avgs_FL.append(edge.time_data['flux_mean'].mean())

    

In [None]:
print(linear_edges_BF.edges_frame['unique_id'].to_string())
print(linear_edges_FL.edges_frame['unique_id'].to_string())

In [None]:
spd_maxes_BF = np.array(spd_maxes_BF)
spd_maxes_FL = np.array(spd_maxes_FL)
spd_maxes_BF_440 = np.array(spd_maxes_BF_440)
spd_maxes_BF_441 = np.array(spd_maxes_BF_441)
spd_maxes_BF_449 = np.array(spd_maxes_BF_449)
spd_maxes_BF_414 = np.array(spd_maxes_BF_414)

colors=['tab:green', 'tab:red', 'tab:orange', 'tab:blue', 'tab:purple']

video_bins=[[0,  500]]



fig, ax = plt.subplots(1, figsize=(5, 5))

for i, bins in enumerate(video_bins):
    ax.scatter(linear_edges_BF.edges_frame['edge_width'].astype(float), spd_maxes_BF.T[1], s = 10, marker='o',c=colors[i], alpha=0.2)#, label=f'to tip, hypha {i+1}')
    
    ax.scatter(linear_edges_BF.edges_frame['edge_width'].astype(float), spd_maxes_BF.T[0], s = 10, marker='x',c=colors[i], alpha=0.2)#, label=f'to root, hypha {i+1}')
#     ax[0].scatter(-linear_edges_BF.edges_frame['ypos'].astype(float)[bins[0]:bins[1]], np.array(flux_avgs_BF[bins[0]:bins[1]]), marker='P',c=colors[i+1], alpha=0.7, label=f'flux, hypha {i+1}')
# for i, bins in enumerate(video_bins):
#     ax[1].scatter(linear_edges_FL.edges_frame['edge_width'].astype(float), spd_maxes_FL.T[1], marker='s',c=colors[i+1], alpha=0.7, label=f'to tip, hypha {i+1}')
#     ax[1].scatter(linear_edges_FL.edges_frame['edge_width'].astype(float), spd_maxes_FL.T[0], marker='o',c=colors[i+1], alpha=0.7, label=f'to root, hypha {i+1}')
#     ax[1].scatter(-linear_edges_FL.edges_frame['ypos'].astype(float)[bins[0]:bins[1]], np.array(flux_avgs_FL[bins[0]:bins[1]]), marker='P',c=colors[i+2], alpha=0.7, label=f'flux, hypha {i+1}')


# Linear fits of speed against edge width.
# np.polyfit returns [slope, intercept]; with cov=True it also returns their covariance matrix.
widths = linear_edges_BF.edges_frame['edge_width'].astype(float)

coeff1, covm1 = np.polyfit(widths, spd_maxes_BF.T[1], 1, cov=True)
func = np.polyval(coeff1, widths)
coefferror = np.sqrt(np.diag(covm1))
# Band from the slope uncertainty only (currently not plotted)
functop = func + coefferror[0] * widths
funcbot = func - coefferror[0] * widths

coeff2, covm2 = np.polyfit(widths, spd_maxes_BF.T[0], 1, cov=True)
func2 = np.polyval(coeff2, widths)
coefferror2 = np.sqrt(np.diag(covm2))
functop2 = func2 + coefferror2[0] * widths
funcbot2 = func2 - coefferror2[0] * widths

# for j in range(2):
ax.plot(widths, func, ':k')
ax.plot(widths, func2, ':k')
ax.grid(True)
# Raw strings so that "\m" is not treated as an (invalid) escape sequence
ax.set_ylabel(r"Measured speed $(\mu m /s)$")
ax.set_xlabel(r"width $(\mu m)$")
#     ax[j].set_title("Speeds along a single hypha")
ax.set_ylim([-20, 20])
ax.set_xlim([3, 15])
ax.axhline(c='black')
#     ax[j].legend()
fig.tight_layout()
# print(linear_edges.edges_frame['video_int'].to_string())
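
The cell above fits a straight line with `np.polyfit(..., cov=True)` and derives a band from the slope error alone. As a hedged alternative, the sketch below propagates the full covariance matrix (slope, intercept and their correlation) into a 1-sigma band on the fitted line. The data is synthetic, and the names `width`, `speed`, `x_line` and `y_err` are assumptions made for this illustration; nothing here comes from the dataset.

```python
import numpy as np

# Synthetic widths and speeds with a known slope of 0.4 (illustrative only)
rng = np.random.default_rng(0)
width = rng.uniform(3, 15, size=200)
speed = 0.4 * width + 1.0 + rng.normal(0, 0.5, size=200)

# Degree-1 fit: coeff = [slope, intercept], cov = 2x2 covariance of those coefficients
coeff, cov = np.polyfit(width, speed, 1, cov=True)
coeff_err = np.sqrt(np.diag(cov))

# Evaluate the fit on a sorted grid so the line and band plot cleanly
x_line = np.linspace(width.min(), width.max(), 100)
y_line = np.polyval(coeff, x_line)

# var(y) = [x, 1] @ cov @ [x, 1]^T for every x on the grid
design = np.vstack([x_line, np.ones_like(x_line)]).T
y_err = np.sqrt(np.einsum('ij,jk,ik->i', design, cov, design))

print(f"slope = {coeff[0]:.3f} +/- {coeff_err[0]:.3f}")
```

With matplotlib, `ax.fill_between(x_line, y_line - y_err, y_line + y_err, alpha=0.2)` would draw the band around the fitted line.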

In [None]:
fig, ax = plt.subplots()
# ax.hist(flux_avgs_BF, density=True, alpha=0.6, bins=40, range=[-350, 350], label="BF")
ax.hist(flux_avgs_FL, density=True, alpha=0.6, bins=40, range=[-350, 350], label="Flux histogram (Fluorescence)")
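# Compute the statistics once and reuse them in the legend instead of hard-coding the values
flux_median = np.median(flux_avgs_FL)
flux_std = np.std(flux_avgs_FL)
print(flux_median)
print(flux_std)
ax.set_ylabel("Frequency")
ax.set_xlabel("Flux (to root --> to tip)")
ax.axvline(linestyle='--', c='black')
ax.axvline(flux_median, linestyle='--', c='tab:blue', label=f'Median = {flux_median:.0f} (std = {flux_std:.0f})')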
ax.legend()

fig, ax = plt.subplots()
# Label one histogram per imaging mode so that ax.legend() has entries to show
ax.hist(spd_maxes_BF.T[0], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:cyan', label=f'Bright-field, n={len(spd_maxes_BF)}')
ax.hist(spd_maxes_BF.T[1], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:cyan')
ax.hist(spd_maxes_FL.T[0], alpha=0.6, density=True, bins=80, range=[-10, 10], color='m', label=f'Fluorescence, n={len(spd_maxes_FL)}')
ax.hist(spd_maxes_FL.T[1], alpha=0.6, density=True, bins=80, range=[-10, 10], color='m')
ax.set_xlabel(r"Velocity $(\mu m / s)$")
ax.set_ylabel("Frequency")
# ax.set_title("Speed profiles lid On")
ax.legend()

### Plot for a beautiful butterfly

In [None]:
# spd_maxes_BF = np.array(spd_maxes_BF)
# spd_maxes_FL = np.array(spd_maxes_FL)

colors=['tab:orange', 'tab:blue', 'tab:green', 'tab:red', 'tab:purple']
# titles=['Plate 440', 'Plate 441', 'Plate 449', 'Plate 414', 'tab:purple']

video_bins=[[0,  500]]

fig, ax = plt.subplots(1,2, figsize=(12, 4))

for i, bins in enumerate(video_bins):
    ax[0].scatter(-spd_maxes_BF.T[0], spd_maxes_BF.T[1], marker='o', c='cyan', alpha=0.2, label='Brightfield')
    ax[1].scatter(-spd_maxes_FL.T[0], spd_maxes_FL.T[1], marker='o', c='pink', alpha=0.2, label='Fluorescence')
#     ax[2].scatter(-spd_maxes_BF_449.T[0], spd_maxes_BF_449.T[1], marker='s',c=colors[i], alpha=0.2, label=f'Brightfield')
#     ax[1].scatter(-spd_maxes_FL.T[0], spd_maxes_FL.T[1], marker='s',c=colors[i+1], alpha=0.2, label=f'Fluorescence')

for i, axis in enumerate(ax.flatten()):    
    axis.grid(True)
#     axis.set_ylabel("Speed $(\mu m /s)$ towards Tip")
#     axis.set_xlabel("Speed $(\mu m /s)$ towards Root")
#     axis.set_title(f"{titles[i]}")
    axis.set_aspect('equal')
    axis.set_ylim([0, 10])
    axis.set_xlim([0, 10])
#     axis.legend()
# fig.suptitle("Root speed against tip speed")

fig.tight_layout()
# print(linear_edges.edges_frame['video_int'].to_string())

In [None]:
fig, ax = plt.subplots()

for plate_id in tqdm(data_obj.video_frame['plate_id'].unique()):
    plate_group = data_obj.filter_edges('coverage_tot', '>=', 0.3)
    plate_group = plate_group.filter_edges('plate_id', '==', plate_id)
#     plate_group = plate_group.filter_edges('mode', '==', "BF")
    if len(plate_group.video_frame) <= 1:
        continue
    else:
#         print("start time only hours: " + str(plate_group.edges_frame["record_time"][0]))
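        # Use positional indexing: after filtering, the first row need not carry index label 0
        first_time = plate_group.edges_frame["record_time"].iloc[0].split(":")
        start_time = int(first_time[0]) * 60 + int(first_time[1])
        # Convert every "HH:MM" entry to minutes since the first recording of this plate
        plate_group.edges_frame["record_time"] = [int(entry.split(":")[0]) * 60 + int(entry.split(":")[1]) - start_time for entry in plate_group.edges_frame["record_time"]]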
        ax.scatter(plate_group.edges_frame['record_time'], plate_group.edges_frame['flux_avg'], alpha=0.1, color='orange')
        
# data_filt = data_obj.filter_edges("imaging_day", "==", "20230814")
# data_filt = data_filt.filter_edges("plate_int", "==", "Plate449")

# data_filter = data_obj.filter_edges("plate_id","==","20230814_Plate449")



# data_filter.edges_frame['record_time'] = [int(entry.split(":")[0])*60 + int(entry.split(":")[1]) for entry in data_filter.edges_frame['record_time']]



ax.set_xlabel("time of recording (min)")
ax.set_ylabel("flux avg")
ax.set_title("Speed profiles lid On")
# ax.set_ylabel("")
# ax.set_xlabel("")
ax.legend()
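
The loop above converts each `record_time` string to minutes relative to the first recording of its plate. A minimal sketch of the same idea on a whole table at once is shown below, using a pandas `groupby(...).transform("first")`. The table is synthetic, the helper `to_minutes` and the column `minutes_since_start` are hypothetical names, and the `HH:MM` time format is an assumption based on how the cell splits on `":"`.

```python
import pandas as pd

# Synthetic table: two plates, each with its own first recording time
frame = pd.DataFrame({
    "plate_id": ["A", "A", "A", "B", "B"],
    "record_time": ["09:15", "09:45", "10:30", "14:00", "14:20"],
})

def to_minutes(hhmm):
    """Convert 'HH:MM' (any trailing ':SS' is ignored) to minutes since midnight."""
    hours, minutes = hhmm.split(":")[:2]
    return int(hours) * 60 + int(minutes)

minutes = frame["record_time"].map(to_minutes)

# Subtract, per plate, the time of that plate's first row
start = minutes.groupby(frame["plate_id"]).transform("first")
frame["minutes_since_start"] = minutes - start

print(frame)
```

Writing to a new column rather than overwriting `record_time` keeps the original strings, so the cell can be re-run without the conversion failing on values that are already integers.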

### Example code for comparing velocities between groups

In [None]:
# THIS CODE IS MADE FOR 20230814_Plate449
# It doesn't make sense to filter this way for other datasets.
# Other parts of the code can be copied to filter on things that make more sense, such as mode == BF.
spd_maxes_Prelaser = []
spd_maxes_Postlaser = []
flux_avgs_Prelaser = []
flux_avgs_Postlaser = []

filtered_data_obj = data_obj.filter_edges("video_int", ">=", "071")
filtered_data_obj = filtered_data_obj.filter_edges("video_int", "<=", "076")

for i, edge in enumerate(tqdm(filtered_data_obj.edge_objs)):
    
    if i % 3 == 0:
        spd_maxes_Prelaser.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.5, spd_cutoff = 0.5, bin_res=1000, plot_fig=False))
        flux_avgs_Prelaser.append(edge.time_data['flux_mean'].mean())
    elif i % 3 == 1:
        spd_maxes_Postlaser.append(edge.plot_speed_histo(spd_extent=10, spd_tiff_lowbound=0.5, spd_cutoff = 0.5, bin_res=1000, plot_fig=False))
        flux_avgs_Postlaser.append(edge.time_data['flux_mean'].mean())
    

# videos_before_laser = filtered_data_obj.filter_edges("video_int","==", )
spd_maxes_Prelaser = np.array(spd_maxes_Prelaser)
spd_maxes_Postlaser = np.array(spd_maxes_Postlaser)




fig, ax = plt.subplots()
# Report the sample size from the data instead of hard-coding it in the label
ax.hist(spd_maxes_Prelaser.T[0], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:orange', label=f'before laser, n={len(spd_maxes_Prelaser)}')
ax.hist(spd_maxes_Postlaser.T[0], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:blue', label=f'after laser, n={len(spd_maxes_Postlaser)}')
ax.hist(spd_maxes_Prelaser.T[1], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:orange')
ax.hist(spd_maxes_Postlaser.T[1], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:blue')
# ax.hist(spd_maxes_FL.T[0], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:blue', label='Fluorescence, n=1011')
# ax.hist(spd_maxes_FL.T[1], alpha=0.6, density=True, bins=80, range=[-10, 10], color='tab:blue')
ax.set_xlabel(r"Velocity $(\mu m / s)$")
ax.set_ylabel("Frequency")
ax.set_title("Speed profiles lid On")
ax.legend()

In [None]:
plt.close('all')

In [None]:
small_data_obj = HighmagDataset(analysis_frame, analysis_folder, videos_folder)