# This notebook takes the dataframe produced by tracking and uses the spot coordinates to extract activity from the other channels.

In [None]:
from os import path
import pandas as pd
from IPython.display import display
import numpy as np 
import sys 
import time 
import zarr
import os

sys.path.append('../src/')

from extract_pixel_data import Extractor

### Do not change the code in the cell below

In [None]:
# This assumes that your notebook is inside 'Jupyter Notebooks', which is at the same level as 'movie_data'
base_dir = os.path.join(os.path.dirname(os.path.abspath("__file__")), '..', 'movie_data')

zarr_directory = 'zarr_file/all_channels_data'
zarr_full_path = os.path.join(base_dir, zarr_directory)

input_directory = 'datasets'
input_file_name = 'track_df_c3_cleaned.pkl'
input_directory_full = os.path.join(base_dir,input_directory, input_file_name)

output_directory = 'datasets'
output_file_name = 'track_df_cleaned_final_full.pkl'
output_directory_full = os.path.join(base_dir,output_directory, output_file_name)

In [None]:
track_df = pd.read_pickle(input_directory_full)
# read the zarr file which contains the data of all three channels 
z = zarr.open(zarr_full_path, mode='r')

### Read the instructions carefully before proceeding with the next steps in the notebook

The **main object** you will work with is the **Extractor** object; its methods are used in the steps that follow. You can view the documentation of the class or of any of its methods by adding a ? before the object or method name. A detailed explanation of how to run the next steps is given below.

First, it is important to note the channel on which you performed detection and tracking (both are performed on the same channel). In the example in this notebook, detection and tracking were performed on channel 3.

**Parameters for setting up the extractor Object**

Parameters to change: 

1. **radii**: The list of sigma values estimated by Gaussian fitting (note: the convention for the list is z,y,x). In the example below it is [4,2,2]. These are the same estimates you provided in notebook 01, where you performed the detection. 
2. **n_jobs**: This parameter allows for parallel processing in certain methods of the class. The default value is -1, which means it will use all available cores - 1. You can change it to any number below the number of cores in your computer, e.g. set it to 3 if you want to use 3 cores. 

**If you have not changed the names of the dataframe columns in the previous steps, you can ignore the parameters below**

Fixed Parameters (unlikely to be changed): 

You will not need to change the parameters below unless you changed the column names in the previous notebooks. 

1. **radi_col_name**: The list of names of the columns that store the sigma values estimated by Gaussian fitting (note: the convention for the list is z,y,x). This parameter stays fixed in most cases and is used when the function extracts pixel information from other channels. 
2. **frame_col_name**: The name of the column that stores the frame number. 


**Parameters to change in the extractor.voxel_sum_fixed_background() method**
This function calculates a localised voxel sum. This means that it constructs a volume around the spots and a larger volume as background. Then it finds the background locally and subtracts it to get the voxel sum.

You only have to change the channel number each time this function is called. You will run this function even on the channel on which you performed detection and tracking. 

1. **channel**: The number of the channel for which you want to find the voxel sum, e.g. if you want to find the voxel sum for channel 3 spots then you will pass 3. 
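The localised voxel sum described above can be sketched with NumPy. This is a minimal illustration, not the `Extractor` implementation: the helper name `local_voxel_sum`, its return values, the use of the median of the surrounding shell as the background estimate, and the clipping at the volume borders are all assumptions made for this example.

```python
import numpy as np

def local_voxel_sum(volume, center, radii, background_radius):
    """Hypothetical helper: background-subtracted voxel sum around one spot.

    All coordinates and sizes follow the (z, y, x) convention.
    """
    center = np.round(center).astype(int)
    radii = np.asarray(radii)
    shape = np.asarray(volume.shape)

    def bounds(half_size):
        # Clip the box to the volume so spots near a border still work
        return np.maximum(center - half_size, 0), np.minimum(center + half_size + 1, shape)

    def as_slices(lo, hi):
        return tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))

    lo_in, hi_in = bounds(radii)
    lo_out, hi_out = bounds(radii + np.asarray(background_radius))

    outer = np.asarray(volume[as_slices(lo_out, hi_out)], dtype=float)
    # Mask is True on the background shell and False on the inner spot box
    shell = np.ones(outer.shape, dtype=bool)
    shell[as_slices(lo_in - lo_out, hi_in - lo_out)] = False

    spot_values = outer[~shell]
    raw_sum = spot_values.sum()
    # Assumption: the per-voxel background is the median of the shell
    background = np.median(outer[shell])
    adjusted_sum = raw_sum - background * spot_values.size
    return raw_sum, background, adjusted_sum


# Synthetic check: flat background of 10 with a spot box that is 5 brighter
volume = np.full((20, 20, 20), 10.0)
volume[6:15, 8:13, 8:13] += 5
raw_sum, background, adjusted_sum = local_voxel_sum(volume, (10, 10, 10), [4, 2, 2], [1, 1, 1])
print(raw_sum, background, adjusted_sum)
```

On this synthetic volume the adjusted sum keeps only the signal above the local background, which is the quantity the adjusted voxel sum columns are meant to capture.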


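**Parameters to change in the extractor.extract_pixels_data_fixed_bd() and extractor.extract_pixels_data_variable_bd() methods**
These functions calculate multiple values for each spot: the mean, the maximum and minimum values, the pixel values, and the location of the maximum amplitude pixel. The cells below call `extract_pixels_data_fixed_bd()` on the fitted centers and `extract_pixels_data_variable_bd()` around the channel 2 peak pixel. Example call:

`mean,maximum,minimum,pixel_values,max_loc = extractor.extract_pixels_data_variable_bd(center_col_names = col_names, channel = 3)`

1. **channel**: The number of the channel for which you want to extract the pixel values, e.g. if you want the values for channel 3 spots then you will pass 3. 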
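As a rough illustration of what the per-spot values mean, the sketch below computes them for a single box with NumPy. The helper `box_statistics` is hypothetical and is not the `Extractor` code; it assumes a rectangular box of half-size `radii` around the center, clipped at the volume borders, with everything in (z, y, x) order.

```python
import numpy as np

def box_statistics(volume, center, radii):
    """Hypothetical helper: mean, max, min, values and peak location in a box."""
    center = np.round(center).astype(int)
    radii = np.asarray(radii)
    lo = np.maximum(center - radii, 0)
    hi = np.minimum(center + radii + 1, np.asarray(volume.shape))
    values = volume[tuple(slice(int(a), int(b)) for a, b in zip(lo, hi))]

    # Index of the brightest voxel inside the box, converted back to volume coordinates
    local_peak = np.unravel_index(np.argmax(values), values.shape)
    peak_location = lo + np.array(local_peak)
    return values.mean(), values.max(), values.min(), values, peak_location


# Synthetic check: a single bright voxel slightly off-center
volume = np.zeros((20, 20, 20))
volume[11, 9, 12] = 7.0
mean, maximum, minimum, pixel_values, max_loc = box_statistics(volume, (10, 10, 10), [4, 2, 2])
print(mean, maximum, minimum, pixel_values.shape, max_loc)
```

The returned peak location is in (z, y, x) order, which matches how the notebook later reads `max_loc[:,0]` as z and `max_loc[:,2]` as x.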


You will have to use the following two functions for each channel to get the desired results
1. voxel_sum_fixed_background()
2. extract_pixels_data_fixed_bd() (or extract_pixels_data_variable_bd())


Run the function below only for the two channels on which detection was not performed. It finds the Gaussian estimates, which already exist for the detection channel (channel 3 in our case). 

`extractor.run_parallel_frame_processing()`

**Parameters to change in the extractor.run_parallel_frame_processing()**: 
1. **expected_sigma**: The expected radius value for a spot along the z,y,x axes. This is the same parameter as the sigma estimates in notebook 01. 
2. **center_col_name**: The names of the peak pixel columns in z,y,x order. In this notebook, for channel 2, it is ['c2_peak_z', 'c2_peak_y', 'c2_peak_x']. You can adjust these column names. If you are working with channel 1 then you can name the columns 'c1_peak_z', 'c1_peak_y', 'c1_peak_x'. 
3. **dist_between_spots**: The same parameter as explained in notebook 01. It defines the radius, in pixels, within which two spots cannot both exist. If the parameter is 10, no two spots can exist within a radius of 5 pixels. 
4. **channel**: The channel on which to perform the Gaussian fitting. 
5. **max_frames**: The number of frames on which to perform the Gaussian fitting. This is useful for testing; for the final run, set all_frames = True, in which case max_frames is ignored. 
6. **all_frames**: Set this to True to perform Gaussian fitting on all frames. Set it to False to process only max_frames frames. 
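To make the role of **expected_sigma** more concrete, here is a minimal sketch of fitting a 3D Gaussian to a small patch around a peak pixel using SciPy's `curve_fit`. It is not the code behind `run_parallel_frame_processing()`: the helper `fit_spot`, the constant `offset` term in the model, and the use of `expected_sigma` as the initial guess are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_3d(coords, amplitude, mu_z, mu_y, mu_x, sigma_z, sigma_y, sigma_x, offset):
    """Axis-aligned 3D Gaussian on top of a constant offset."""
    z, y, x = coords
    return offset + amplitude * np.exp(
        -((z - mu_z) ** 2 / (2 * sigma_z ** 2)
          + (y - mu_y) ** 2 / (2 * sigma_y ** 2)
          + (x - mu_x) ** 2 / (2 * sigma_x ** 2)))

def fit_spot(patch, expected_sigma):
    """Hypothetical helper: fit a 3D Gaussian to a small (z, y, x) patch."""
    coords = np.indices(patch.shape).reshape(3, -1).astype(float)
    peak = np.unravel_index(np.argmax(patch), patch.shape)
    # Initial guess: peak pixel as center, expected_sigma as width
    p0 = [patch.max() - patch.min(), *peak, *expected_sigma, patch.min()]
    params, _ = curve_fit(gaussian_3d, coords, patch.ravel().astype(float), p0=p0)
    names = ['amplitude', 'mu_z', 'mu_y', 'mu_x', 'sigma_z', 'sigma_y', 'sigma_x', 'offset']
    result = dict(zip(names, params))
    # The model is symmetric in the sign of sigma, so report the magnitude
    for key in ('sigma_z', 'sigma_y', 'sigma_x'):
        result[key] = abs(result[key])
    return result


# Synthetic, noise-free spot with known parameters
zz, yy, xx = np.indices((17, 9, 9)).astype(float)
patch = gaussian_3d((zz, yy, xx), 100.0, 8.3, 4.1, 3.8, 3.5, 1.8, 2.2, 10.0)
fit = fit_spot(patch, expected_sigma=[4, 2, 2])
print({name: round(value, 3) for name, value in fit.items()})
```

A starting guess close to the true width helps the optimiser converge, which is why it is worth reusing the sigma estimates from notebook 01.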

In [None]:
# Set the values of the parameters. Note that if the order of your channels changes, i.e. the number of the channel on 
# which you performed detection changes, then you might have to change the parameters as explained above. 
#########################

radii_extractor = [4,2,2] #radii for the extractor object 
n_cores = -1 #n_jobs for extractor class 

background_radius_for_voxel_sum = [1,1,1] # the background_radius for voxel_sum_fixed_background().
                                            # This is the same for all channels and can be changed if you think it yields better results in your case

expected_sigma_value = [4,2,2] # for gaussian fitting 
distance_between_spots = 10 # for gaussian fitting (in terms of pixels)
max_frames_to_do_gaussian_fitting = 2 # for gaussian fitting 
process_all_frames = True #for gaussian fitting 

########################

## The cell below creates the Extractor object 

In [None]:
extractor = Extractor(z, dataframe = track_df, radii=radii_extractor, frame_col_name = 'frame', 
                      radi_col_name = ['sigma_z', 'sigma_y', 'sigma_x'], n_jobs = n_cores)

# Extract Information for Channel 3 (Clathrin)

## Extracting voxel sum

In [None]:
# pass the channel for which voxel sum is needed and the coordinates around which voxel sum is supposed to be calculated
# convention for coords [z,y,x] 
# convention for channel is 1 for channel 1, 2 for channel 2 and so on 
# the channel number is to be passed according to whichever channel we want to extract the data for
start_time = time.time()
voxel_sum_array_3, _, adj_voxel_sum_3 = extractor.voxel_sum_fixed_background(center_col_names = ['mu_z', 'mu_y', 'mu_x'], channel = 3,
                                                                            background_radius=background_radius_for_voxel_sum)
end_time = time.time()
print('time taken (seconds)', end_time - start_time)

In [None]:
# pass the channel for which pixel values are needed and the coordinates around which values are supposed to be calculated
# convention for coords [z,y,x] 
# convention for channel is 1 for channel 1, 2 for channel 2 and so on 
start_time = time.time()
offset = [0,0]
col_names = ['mu_z', 'mu_y', 'mu_x']
# mean,maximum,minimum,pixel_values,max_loc = extractor.extract_pixels_data_variable_bd(center_col_names = col_names, channel = 3)

mean,maximum,minimum,pixel_values,max_loc = extractor.extract_pixels_data_fixed_bd(center_col_names = col_names, channel = 3)

end_time = time.time()

# Adding the extracted data to the dataframe

max_loc = np.array(max_loc)
track_df['c3_mean_amp'] = mean
track_df['c3_voxel_sum'] = voxel_sum_array_3
track_df['c3_voxel_sum_adjusted'] = adj_voxel_sum_3 
track_df['c3_peak_amp'] = maximum 
track_df['c3_peak_x'] = max_loc[:,2]
track_df['c3_peak_y'] = max_loc[:,1]
track_df['c3_peak_z'] = max_loc[:,0]
print('time taken (seconds)', end_time - start_time)

# Extract information for Channel 2 

In [None]:
# pass the channel for which pixel values are needed and the coordinates around which values are supposed to be calculated
# convention for coords [z,y,x] 
# convention for channel is 1 for channel 1, 2 for channel 2 and so on 
start_time = time.time()
offset = [0,0]
col_names = ['mu_z', 'mu_y', 'mu_x']
# mean,maximum,minimum,pixel_values,max_loc = extractor.extract_pixels_data_variable_bd(center_col_names = col_names, channel = 2)
mean,maximum,minimum,pixel_values,max_loc = extractor.extract_pixels_data_fixed_bd(center_col_names = col_names, channel = 2)

end_time = time.time()

max_loc = np.array(max_loc)
track_df['c2_amp'] = mean
track_df['c2_peak'] = maximum
track_df['c2_peak_x'] = max_loc[:,2]
track_df['c2_peak_y'] = max_loc[:,1]
track_df['c2_peak_z'] = max_loc[:,0]
print('time taken (seconds)', end_time - start_time)

## Finding the peak pixel for Channel 2, because the offset between Channel 3 and Channel 2 is not known
**The offset arises because the two channels are recorded with different cameras**
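One way to get a feel for this offset is to compare, spot by spot, the channel 3 fitted center with the channel 2 peak pixel. The sketch below does this on a toy dataframe with made-up coordinates; only the column names (`mu_*`, `c2_peak_*`) are taken from this notebook, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for track_df with made-up coordinates (pixels)
toy_df = pd.DataFrame({
    'mu_z': [10.2, 20.5, 30.1], 'mu_y': [50.0, 60.4, 70.9], 'mu_x': [15.3, 25.7, 35.2],
    'c2_peak_z': [11, 21, 31], 'c2_peak_y': [50, 61, 71], 'c2_peak_x': [17, 27, 37],
})

axes = ['z', 'y', 'x']
center = toy_df[[f'mu_{a}' for a in axes]].to_numpy()
peak = toy_df[[f'c2_peak_{a}' for a in axes]].to_numpy()

# Per-spot displacement from the channel 3 center to the channel 2 peak, in (z, y, x) order
offsets = peak - center
distance = np.linalg.norm(offsets, axis=1)

# A consistent non-zero mean along one axis suggests a systematic shift between the cameras
mean_offset = offsets.mean(axis=0)
std_offset = offsets.std(axis=0)
print('mean offset (z, y, x):', mean_offset)
print('std offset (z, y, x):', std_offset)
```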

## Finding mean value around the peak pixel value for channel 2 

In [None]:
start_time = time.time()
col_names = ['c2_peak_z', 'c2_peak_y', 'c2_peak_x']
peak_mean,maxima,_,_,_ = extractor.extract_pixels_data_variable_bd(center_col_names = col_names, 
                                                     channel = 2)
end_time = time.time()
track_df['c2_peak_mean'] = peak_mean
print('time taken (seconds)', end_time - start_time)

## Finding voxel sum around peak for channel 2 

In [None]:
# pass the channel for which voxel sum is needed and the coordinates around which voxel sum is supposed to be calculated
# convention for coords [z,y,x] 
# convention for channel is 1 for channel 1, 2 for channel 2 and so on 
start_time = time.time()
voxel_sum_array_2, _, adj_voxel_sum_array_2 = extractor.voxel_sum_fixed_background(center_col_names = ['c2_peak_z', 'c2_peak_y', 'c2_peak_x'],
                                       channel = 2, background_radius = background_radius_for_voxel_sum)
end_time = time.time()

#calculated around the peak value coordinates
track_df['c2_voxel_sum'] = voxel_sum_array_2
track_df['c2_voxel_sum_adjusted'] = adj_voxel_sum_array_2

print('time taken (seconds)', end_time - start_time)

### Gaussian Fitting for Channel 2 around peak values 

In [None]:
#Calculating the gaussian fitting estimates around peak coords for channel 2 
#expected sigma value needed for gaussian fitting 
#set all frames to False for processing limited frames 
#max_frames determines the frames to be processed if all_frames is false
start_time = time.time()
channel2_gaussians_df = extractor.run_parallel_frame_processing(expected_sigma = expected_sigma_value, 
                                        center_col_name = ['c2_peak_z', 'c2_peak_y', 'c2_peak_x'], 
                                       dist_between_spots = distance_between_spots , channel = 2,  
                                       max_frames =  max_frames_to_do_gaussian_fitting, all_frames = process_all_frames)
end_time = time.time()
print('time taken (seconds)', end_time - start_time)

In [None]:
track_df['c2_gaussian_amp'] = channel2_gaussians_df['amplitude']
track_df['c2_mu_x'] = channel2_gaussians_df['mu_x']
track_df['c2_mu_y'] = channel2_gaussians_df['mu_y']
track_df['c2_mu_z'] = channel2_gaussians_df['mu_z']
track_df['c2_sigma_x'] = channel2_gaussians_df['sigma_x']
track_df['c2_sigma_y'] = channel2_gaussians_df['sigma_y']
track_df['c2_sigma_z'] = channel2_gaussians_df['sigma_z']

# Extract information for channel 1

In [None]:
offset = [0,0]
col_names = ['mu_z', 'mu_y', 'mu_x']
radi_list = ['sigma_z', 'sigma_y', 'sigma_x']
# mean,maximum,minimum,pixel_values,max_loc = extractor.extract_pixels_data_variable_bd(center_col_names = col_names, channel = 1)
mean,maximum,minimum,pixel_values,max_loc = extractor.extract_pixels_data_fixed_bd(center_col_names = col_names, channel = 1)

max_loc = np.array(max_loc)
track_df['c1_amp'] = mean
track_df['c1_peak'] = maximum
track_df['c1_peak_x'] = max_loc[:,2]
track_df['c1_peak_y'] = max_loc[:,1]
track_df['c1_peak_z'] = max_loc[:,0]


In [None]:
voxel_sum_array_1, _, adj_voxel_sum_array_1 = extractor.voxel_sum_fixed_background(center_col_names = ['mu_z', 'mu_y', 'mu_x'], channel = 1,
                                                                                  background_radius = background_radius_for_voxel_sum)
# calculated around the channel 3 fitted centers (mu_z, mu_y, mu_x), not the channel 1 peak coordinates
track_df['c1_voxel_sum'] = voxel_sum_array_1
track_df['c1_voxel_sum_adjusted'] = adj_voxel_sum_array_1

### Perform Gaussian Fitting for channel 1

In [None]:
start_time = time.time()
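# Reuse the parameters set at the top of the notebook so that both channels are fitted with the same settings
channel1_gaussians_df = extractor.run_parallel_frame_processing(expected_sigma = expected_sigma_value, 
                                        center_col_name = ['c1_peak_z', 'c1_peak_y', 'c1_peak_x'], 
                                       dist_between_spots = distance_between_spots, channel = 1,  
                                       max_frames =  max_frames_to_do_gaussian_fitting, all_frames = process_all_frames)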
end_time = time.time()
print('time taken (seconds)', end_time - start_time)

In [None]:
track_df['c1_gaussian_amp'] = channel1_gaussians_df['amplitude']
track_df['c1_mu_x'] = channel1_gaussians_df['mu_x']
track_df['c1_mu_y'] = channel1_gaussians_df['mu_y']
track_df['c1_mu_z'] = channel1_gaussians_df['mu_z']
track_df['c1_sigma_x'] = channel1_gaussians_df['sigma_x']
track_df['c1_sigma_y'] = channel1_gaussians_df['sigma_y']
track_df['c1_sigma_z'] = channel1_gaussians_df['sigma_z']

### In the steps below, the dataframe is cleaned so that column names are consistent for the next steps

In [None]:
# Rename columns so that every channel follows the same naming convention
new_col_names = {
    'amplitude': 'c3_gaussian_amp', 
    'mu_x': 'c3_mu_x', 
    'mu_y': 'c3_mu_y', 
    'mu_z': 'c3_mu_z', 
    'sigma_x': 'c3_sigma_x', 
    'sigma_y': 'c3_sigma_y', 
    'sigma_z': 'c3_sigma_z', 
    'c2_peak': 'c2_peak_amp', 
    'c1_peak': 'c1_peak_amp'
}
track_df.rename(columns=new_col_names, inplace=True)
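By default, pandas silently skips keys of the mapping that are not existing columns, so a misspelled column name would go unnoticed. If you prefer the renaming to fail loudly, `DataFrame.rename` accepts `errors='raise'`. The sketch below shows the behaviour on a toy dataframe; the data and the name `toy_df` are made up for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({'amplitude': [1.0], 'mu_x': [2.0], 'c2_peak': [3.0]})
new_col_names = {'amplitude': 'c3_gaussian_amp', 'mu_x': 'c3_mu_x', 'c2_peak': 'c2_peak_amp'}

# errors='raise' makes pandas raise a KeyError if a key is not an existing column
renamed = toy_df.rename(columns=new_col_names, errors='raise')
print(list(renamed.columns))

# 'c1_peak' does not exist in toy_df, so this rename is rejected instead of ignored
try:
    toy_df.rename(columns={'c1_peak': 'c1_peak_amp'}, errors='raise')
    missing_detected = False
except KeyError:
    missing_detected = True
print('missing column detected:', missing_detected)
```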

In [None]:
track_df.to_pickle(output_directory_full)

# To test the following functions, change the markdown cells to code cells 
1. Voxel sum using the variable sigma/radii values inferred directly from the dataset 
2. Extracting pixel values (mean, max, etc.) using fixed radii
3. Performing Gaussian fitting on peak coordinates for a single frame

In [None]:
# # calculate distances between the calculated mu values for the channels
# distances = np.sqrt(np.sum((track_df[['c1_mu_x', 'c1_mu_y', 'c1_mu_z']].values - track_df[['c2_mu_x', 'c2_mu_y', 'c2_mu_z']].values)**2, axis=1))
# track_df['c1_c2_dist'] = distances
# distances_ch3_1 = np.sqrt(np.sum((track_df[['c3_mu_x', 'c3_mu_y', 'c3_mu_z']].values - track_df[['c1_mu_x', 'c1_mu_y', 'c1_mu_z']].values)**2, axis=1))
# track_df['c3_c1_dist'] = distances_ch3_1
# import matplotlib.pyplot as plt
# plt.hist(distances, bins=100)
# plt.xlabel('Distance between mu values of channel 1 and channel 2 (pixels)')
# plt.ylabel('Frequency')
# plt.hist(distances_ch3_1, bins=100)
# plt.xlabel('Distance between mu values of channel 1 and channel 3 (pixels)')
# plt.ylabel('Frequency')
# plt.show()

In [None]:
# # calculate offset between the calculated mu values for the channels
# offsets = track_df[['c3_mu_x', 'c3_mu_y', 'c3_mu_z']].values - track_df[['c2_mu_x', 'c2_mu_y', 'c2_mu_z']].values
# track_df['c3_c2_offset_x'] = offsets[:,0]
# track_df['c3_c2_offset_y'] = offsets[:,1]
# track_df['c3_c2_offset_z'] = offsets[:,2]

# offsets = track_df[['c3_mu_x', 'c3_mu_y', 'c3_mu_z']].values - track_df[['c1_mu_x', 'c1_mu_y', 'c1_mu_z']].values
# track_df['c3_c1_offset_x'] = offsets[:,0]
# track_df['c3_c1_offset_y'] = offsets[:,1]
# track_df['c3_c1_offset_z'] = offsets[:,2]

# plt.plot(track_df['c3_c2_offset_x'], track_df['c3_c2_offset_y'], 'o')
# plt.xlabel('Offset in x (pixels)')
# plt.ylabel('Offset in y (pixels)')
# plt.title('Offset between mu values of channel 3 and channel 2')
# plt.show()

In [None]:
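import matplotlib.pyplot as plt

# Requires the 'c3_c2_offset_*' columns: uncomment and run the offset cell above first
plt.hist(track_df['c3_c2_offset_x'], 50, alpha=0.5, label='x')
plt.hist(track_df['c3_c2_offset_y'], 50, alpha=0.5, label='y')
plt.hist(track_df['c3_c2_offset_z'], 50, alpha=0.5, label='z')
plt.xlabel('Offset (pixels)')
plt.ylabel('Frequency')
plt.legend()

plt.show()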



In [None]:
# report the mean and std of the channel 3 to channel 2 offset along x, y and z
# (requires the 'c3_c2_offset_*' columns created in the commented-out offset cell above)
for axis in ['x', 'y', 'z']:
    print(f'Mean offset {axis}:', np.mean(track_df[f'c3_c2_offset_{axis}']))
for axis in ['x', 'y', 'z']:
    print(f'Std offset {axis}:', np.std(track_df[f'c3_c2_offset_{axis}']))

### Function 1 (Voxel Sum)
start_time = time.time()
voxel_sum_array_variable, _ = extractor.voxel_sum_variable_bd(center_col_names = ['mu_z', 'mu_y', 'mu_x'], channel = 3)
end_time = time.time()
print('time taken (seconds)', end_time - start_time)

### Function 2 (Extracting Pixel Values)
start_time = time.time()
offset = [0,0]
col_names = ['mu_z', 'mu_y', 'mu_x']
radii = [4, 2, 2]
mean_v,maximum_v,minimum_v,pixel_values_v,max_loc_v = extractor.extract_pixels_data_fixed_bd(center_col_names = col_names, channel = 3)

end_time = time.time()
print('time taken (seconds)', end_time - start_time)

### Function 3 (Gaussian fitting on one frame for one channel)
df = extractor.gaussian_fitting_single_frame(expected_sigma = [4,2,2], 
                              center_col_names = ['c2_peak_z', 'c2_peak_y', 'c2_peak_x'],
                                      frame = 0, channel = 2, dist_between_spots = 10)