# This is the first step in the pipeline
### Spots are detected in this notebook. The input file is expected to be in the zarr format.
### All three channels are loaded at once, and the detection step supports parallel processing. The number of cores used can be increased as available, but keep in mind that RAM can become the limiting factor: the more cores are used, the more RAM is needed.


## Note: 
**Detection can only be performed on one channel at a time.**

In [1]:
import pandas as pd
import time
import os
import sys
import zarr
%load_ext memory_profiler


pythonPackagePath = os.path.abspath('../src/')
sys.path.append(pythonPackagePath)
from parallel import Detector

### Do not change the code in the cell below

In [2]:
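# This assumes that your notebook is inside 'Jupyter Notebooks', which is at the same level as 'test_data'.
# "__file__" is a plain string here (notebooks do not define __file__), so the path
# resolves relative to the current working directory.
base_dir = os.path.join(os.path.dirname(os.path.abspath("__file__")), '..', 'test_data')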

zarr_directory = 'zarr_file/all_channels_data'
zarr_full_path = os.path.join(base_dir, zarr_directory)

save_directory = 'datasets'
save_directory_full = os.path.join(base_dir, save_directory)

In [3]:
# Open the zarr file at the given path in read-only mode
z2 = zarr.open(zarr_full_path, mode='r')

In [4]:
z2.info

0,1
Type,zarr.core.Array
Data type,uint16
Shape,"(130, 3, 75, 258, 275)"
Chunk shape,"(1, 1, 75, 258, 275)"
Order,C
Read-only,True
Compressor,"Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)"
Store type,zarr.storage.DirectoryStore
No. bytes,4150575000 (3.9G)
No. bytes stored,1126870917 (1.0G)
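
The figures reported by `z2.info` can be reproduced by hand, which is useful when budgeting RAM for parallel detection. The following is a minimal sketch that uses only the shape, chunk shape and dtype shown above. The axis order in the comments and the idea that each worker holds roughly one decompressed chunk at a time are assumptions, not something taken from the `Detector` source; the real peak usage will be higher because of intermediate arrays.

```python
import math

# Values copied from the z2.info output above.
shape = (130, 3, 75, 258, 275)        # assumed order: (frames, channels, z, y, x)
chunk_shape = (1, 1, 75, 258, 275)    # one frame of one channel per chunk
bytes_per_voxel = 2                   # uint16

total_bytes = math.prod(shape) * bytes_per_voxel
chunk_bytes = math.prod(chunk_shape) * bytes_per_voxel

print(total_bytes)                    # 4150575000, matches "No. bytes"
print(round(chunk_bytes / 2**20, 2))  # size of one decompressed chunk in MiB

# Assumption: each worker holds about one decompressed chunk at a time,
# so the raw-data floor grows linearly with the number of workers.
n_jobs = 2
print(round(n_jobs * chunk_bytes / 2**20, 2))
```

Because the chunk shape covers exactly one frame of one channel, reading a single frame for a single channel touches one chunk only, which fits the one-channel-at-a-time detection described in the note above.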


In [5]:
frames = z2.shape[0]
print(f'the number of frames are {frames}')

the number of frames are 130


## In the cell below, a Detector object is initialized to perform detection. More details on the Detector object can be obtained with the following line of code:
**copy and paste into a new cell**

?Detector

In [6]:
# channel_to_detect: enter 1 for channel 1, 2 for channel 2, and so on
# save_directory: directory where the parallel_frame_processing output is saved
# n_jobs: -1 by default, which uses all the cores of your machine minus one
# dist_between_spots is divided by two, so in this case no two maxima can be within 5 pixels of each other
detector = Detector(zarr_obj=z2,
                    save_directory=save_directory_full,
                    spot_intensity=180,
                    dist_between_spots=10,
                    sigma_estimations=[4, 2, 2],
                    n_jobs=2,
                    channel_to_detect=3)
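
The comments in the cell above state two conventions: `n_jobs = -1` means "all cores minus one", and `dist_between_spots` is halved to give the minimum separation between maxima. A small sketch of what these conventions imply is given below. The helpers `resolve_n_jobs` and `min_separation` are hypothetical names introduced for illustration; they follow the notebook comments and are not the `Detector` implementation.

```python
import os


def resolve_n_jobs(n_jobs, available_cores=None):
    """Hypothetical helper: turn an n_jobs setting into a worker count.

    Follows the convention stated in the notebook comment (-1 means all
    cores minus one); this is not taken from the Detector source.
    """
    if available_cores is None:
        available_cores = os.cpu_count() or 1
    if n_jobs == -1:
        return max(1, available_cores - 1)
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    # Never ask for more workers than there are cores.
    return min(n_jobs, available_cores)


def min_separation(dist_between_spots):
    """Hypothetical helper: minimum pixel distance between two maxima."""
    return dist_between_spots / 2


print(resolve_n_jobs(-1, available_cores=8))  # 7
print(resolve_n_jobs(2, available_cores=8))   # 2
print(min_separation(10))                     # 5.0
```

Starting with a small `n_jobs` and raising it while watching memory usage is a cautious way to find the setting your machine can sustain.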

In [10]:
# The following function returns the dataframe and also saves it to the provided path in pkl format.
# Set all_frames=True to process all the time frames.
# max_frames is useful when you only want to perform detection on a subset of frames.
# Note: when all_frames=True, max_frames is ignored.
df = detector.run_parallel_frame_processing(max_frames=2, all_frames=True)

Processing frames: 100%|██████████████████████| 130/130 [07:38<00:00,  3.52s/it]


frame number is 0
(75, 258, 275)
(75, 275, 258)
local_maximas detected are 578
10%(58 of 578)
20%(116 of 578)
30%(174 of 578)
40%(232 of 578)
50%(289 of 578)
60%(347 of 578)
70%(405 of 578)
80%(463 of 578)
90%(521 of 578)
100%(578 of 578)
(578, 7)
the number of times the gaussian fitting worked was 578 and the number of times the gaussian did not fit was 0
frame number is 3
(75, 258, 275)
(75, 275, 258)
local_maximas detected are 552
10%(56 of 552)
20%(111 of 552)
30%(166 of 552)
40%(221 of 552)
50%(276 of 552)
60%(332 of 552)
70%(387 of 552)
80%(442 of 552)
90%(497 of 552)
100%(552 of 552)
(552, 7)
the number of times the gaussian fitting worked was 552 and the number of times the gaussian did not fit was 0
frame number is 5
(75, 258, 275)
(75, 275, 258)
local_maximas detected are 558
10%(56 of 558)
20%(112 of 558)
30%(168 of 558)
40%(224 of 558)
50%(279 of 558)
60%(335 of 558)
70%(391 of 558)
80%(447 of 558)
90%(503 of 558)
100%(558 of 558)
(558, 7)
the number of times the gaussian f
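
The interaction between `all_frames` and `max_frames` described in the cell comments can be summarised as a small selection rule. The sketch below uses a hypothetical helper, `frames_to_process`, and assumes that `max_frames` selects the first `max_frames` frames; neither is taken from the `Detector` source.

```python
def frames_to_process(total_frames, max_frames=None, all_frames=False):
    """Hypothetical helper: list the frame indices that would be processed."""
    if all_frames or max_frames is None:
        # all_frames=True overrides max_frames, as the cell comment states.
        return list(range(total_frames))
    # Assumption: a subset means the first max_frames frames.
    return list(range(min(max_frames, total_frames)))


selected = frames_to_process(130, max_frames=2, all_frames=True)
print(len(selected))  # 130

subset = frames_to_process(130, max_frames=2, all_frames=False)
print(subset)  # [0, 1]
```

This matches the progress bar above, which reports 130 of 130 frames even though `max_frames = 2` was passed.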

In [11]:
df.shape

(59178, 8)

In [12]:
df['frame'].value_counts()

frame
0      578
2      567
5      558
3      552
1      552
      ... 
129    349
127    347
121    345
126    344
128    335
Name: count, Length: 130, dtype: int64
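
`value_counts()` orders its result by count, which makes it hard to see how the number of spots changes over time or whether any frame is missing. The sketch below shows one way to get a per-frame count in chronological order. It runs on a small made-up dataframe, `toy_df`, whose values are illustrative only and are not taken from the real dataset; on the real output you would use `df` and the `frames` variable defined earlier.

```python
import pandas as pd

# Toy stand-in for the detection output; the values are made up for
# illustration and are not taken from the real dataset.
toy_df = pd.DataFrame({"frame": [0, 0, 0, 2, 2, 1, 3, 3]})
expected_frames = 5  # pretend the movie has frames 0..4

# Spots per frame in chronological order; frames with no rows get 0.
per_frame = (
    toy_df["frame"]
    .value_counts()
    .reindex(range(expected_frames), fill_value=0)
)
print(per_frame.tolist())  # [3, 1, 2, 2, 0]

# Frames for which no spot was recorded.
missing = per_frame[per_frame == 0].index.tolist()
print(missing)  # [4]
```

An empty `missing` list on the real dataframe would indicate that every frame contributed at least one detection.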