### Preliminary

The `lab_black` extension automatically formats the code in our notebook cells.

In [1]:
%reload_ext lab_black

### Step 1: Import the function `get_mda_files_dataframe`

+ `get_mda_files_dataframe` returns information about your `.mda` files in the form of a pandas dataframe.

+ Each row of the dataframe represents a `.mda` file.

+ You may filter this dataframe to only run a subset of the `.mda` files.

In [2]:
from franklab_mountainsort import get_mda_files_dataframe

We can see what inputs the function takes by looking at its docstring. In IPython or Jupyter, you can view the docstring of a Python function by typing the function name followed immediately by a question mark.
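
Outside IPython the `?` shortcut is not available. A rough equivalent using the standard library's `inspect` module is sketched below. `example_function` is a hypothetical stand-in defined only for this illustration; it is not part of `franklab_mountainsort`.

```python
import inspect


def example_function(data_path, recursive=False):
    """Toy stand-in used only to demonstrate introspection.

    Parameters
    ----------
    data_path : str
    recursive : bool, optional
    """
    return data_path, recursive


# The call signature, similar to the "Signature:" line shown by `?`
signature = inspect.signature(example_function)
# The cleaned-up docstring, similar to the "Docstring:" section
docstring = inspect.getdoc(example_function)

print(f"Signature: {example_function.__name__}{signature}")
print(docstring)
```

The same two calls should work on any imported Python function, including the one used in this notebook.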

In [3]:
get_mda_files_dataframe?

Signature: get_mda_files_dataframe(data_path, recursive=False)
Docstring:
Parameters
----------
data_path : str
recursive : bool, optional
    Recursive search using glob.

Returns
-------
mda_files_dataframe : pandas.DataFrame

Examples
--------
```
all_animal_info = get_mda_files_dataframe(
    '/data2/data1_backup/anna/*/preprocessing/')
```

```
single_animal_info = get_mda_files_dataframe(
    '/data2/data1_backup/anna/remy/preprocessing/')
```
File:      /data2/edeno/franklab_mountainsort/franklab_mountainsort/core.py
Type:      function


We see that `get_mda_files_dataframe` takes a string called `data_path` and an optional boolean, `recursive`, which we will ignore for now.

`data_path` is the path to the `preprocessing` folder for a particular animal. For example, to look at the animal Jaq, we would set `data_path` to the string `"/stelmo/abhilasha/animals/Jaq/preprocessing"`.
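
The docstring examples use a `*` wildcard in `data_path`, and `recursive` is described as a recursive search using glob. The sketch below illustrates how those two ideas behave in Python's standard `glob` module, using a throwaway directory tree. The folder and file names are made up for the demonstration, and this shows `glob` itself, not how `get_mda_files_dataframe` is implemented internally.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny fake layout: <root>/<animal>/preprocessing/<date>/<file>.mda
    for animal in ("remy", "jaq"):
        day_dir = os.path.join(root, animal, "preprocessing", "20170920")
        os.makedirs(day_dir)
        open(os.path.join(day_dir, f"{animal}.mda"), "w").close()

    # A single * matches exactly one path component (here, the animal folder)
    one_level = glob.glob(os.path.join(root, "*", "preprocessing"))
    animals_found = sorted(os.path.basename(os.path.dirname(p)) for p in one_level)

    # With recursive=True, ** matches any number of nested directories
    nested = glob.glob(os.path.join(root, "**", "*.mda"), recursive=True)
    n_mda = len(nested)

print(animals_found)
print(n_mda)
```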

In [4]:
import os

data_path = "/data2/edeno/"
animal = "remy"

preprocessing_folder = os.path.join(data_path, animal, "preprocessing")
print(f"data_path = {preprocessing_folder}")

mda_file_info = get_mda_files_dataframe(preprocessing_folder)

mda_file_info

data_path = /data2/edeno/remy/preprocessing


| animal | date | electrode_number | epoch | task | mda_filepath | geom_filepath |
|---|---|---|---|---|---|---|
| remy | 20170920 | 1 | 1 | s1 | /data2/edeno/remy/preprocessing/20170920/20170... | |
| remy | 20170920 | 2 | 1 | s1 | /data2/edeno/remy/preprocessing/20170920/20170... | |
| remy | 20170920 | 3 | 1 | s1 | /data2/edeno/remy/preprocessing/20170920/20170... | |
| remy | 20170920 | 4 | 1 | s1 | /data2/edeno/remy/preprocessing/20170920/20170... | |
| remy | 20170920 | 5 | 1 | s1 | /data2/edeno/remy/preprocessing/20170920/20170... | |
| remy | ... | ... | ... | ... | ... | ... |
| remy | 20170922 | 28 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 29 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 30 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 31 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |


We see that this returns a pandas dataframe with 1280 rows, each representing a `.mda` file. For example, the filepath of the first entry shows that it is the `.mda` file for a single tetrode during a sleep epoch.

In [5]:
mda_file_info.iloc[0].mda_filepath

'/data2/edeno/remy/preprocessing/20170920/20170920_remy_01_s1.mda/20170920_remy_01_s1.nt1.mda'
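
The filename appears to encode the same fields that index the dataframe. A minimal sketch of pulling them back out with a regular expression follows. The naming pattern (`<date>_<animal>_<epoch>_<task>.nt<electrode>.mda`) is inferred from this single example path, and `parse_mda_filename` is a hypothetical helper, not a function provided by `franklab_mountainsort`.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from one example filename; other datasets may differ.
MDA_NAME = re.compile(
    r"^(?P<date>\d{8})_(?P<animal>[^_]+)_(?P<epoch>\d+)_(?P<task>[^.]+)"
    r"\.nt(?P<electrode_number>\d+)\.mda$"
)


def parse_mda_filename(filepath):
    """Split an .mda filename into its named components."""
    name = PurePosixPath(filepath).name
    match = MDA_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected .mda filename: {name}")
    fields = match.groupdict()
    # Epoch and electrode number are numeric in the dataframe index
    fields["epoch"] = int(fields["epoch"])
    fields["electrode_number"] = int(fields["electrode_number"])
    return fields


example = (
    "/data2/edeno/remy/preprocessing/20170920/"
    "20170920_remy_01_s1.mda/20170920_remy_01_s1.nt1.mda"
)
info = parse_mda_filename(example)
print(info)
```

For the example path, the parsed fields agree with the first row of the dataframe: animal `remy`, date `20170920`, electrode 1, epoch 1, task `s1`.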

We can filter the pandas dataframe to specific days if we only want to run those days. For example, suppose we want the last day, `20170922`. We can use the `query` method:

In [6]:
mda_file_info.query("date == 20170922")

| animal | date | electrode_number | epoch | task | mda_filepath | geom_filepath |
|---|---|---|---|---|---|---|
| remy | 20170922 | 1 | 1 | s1 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 2 | 1 | s1 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 3 | 1 | s1 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 4 | 1 | s1 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 5 | 1 | s1 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | ... | ... | ... | ... | ... |
| remy | 20170922 | 28 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 29 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 30 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |
| remy | 20170922 | 31 | 5 | s3 | /data2/edeno/remy/preprocessing/20170922/20170... | |
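
Because `query` can refer to named index levels, several conditions can be combined in one expression. The sketch below uses a small toy dataframe built with the same index level names as `mda_file_info`. The toy values and the integer dtype of `date` and `electrode_number` are assumptions made for the illustration.

```python
import pandas as pd

# Toy stand-in with the same index level names as mda_file_info
index = pd.MultiIndex.from_product(
    [["remy"], [20170920, 20170922], [1, 2, 3], [1, 2]],
    names=["animal", "date", "electrode_number", "epoch"],
)
toy_info = pd.DataFrame({"task": ["s1"] * len(index)}, index=index)

# Keep one day and only the first two electrodes
subset = toy_info.query("date == 20170922 and electrode_number in [1, 2]")
print(subset)
```

The same expression syntax should carry over to the real dataframe, for example to sort only a few electrodes before committing to a full day.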


### Step 2: Spike sort using the mda files dataframe

First we import the function `spike_sort_all`.

In [7]:
from franklab_mountainsort import spike_sort_all

We can again see which inputs are required and what the parameters mean by looking at the docstring.

In [8]:
spike_sort_all?

Signature:
spike_sort_all(
    mda_file_info,
    mountainlab_output_folder=None,
    firing_rate_thresh=0.01,
    isolation_thresh=0.97,
    noise_overlap_thresh=0.03,
    peak_snr_thresh=1.5,
    extract_marks=True,
    extract_clips=True,
    clip_time=1.5,
    freq_min=300,
    freq_max=6000,
    ...

Next we import `logging` and configure it so that we can see the progress messages the function emits.

In [9]:
import logging

FORMAT = "%(asctime)s %(message)s"

logging.basicConfig(level="INFO", format=FORMAT, datefmt="%d-%b-%y %H:%M:%S")
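
To see what this format produces without running a sort, the following sketch sends one message through a separate logger that uses the same format strings. The logger name `format_demo` and the in-memory stream are choices made for the demonstration only.

```python
import io
import logging

FORMAT = "%(asctime)s %(message)s"
DATEFMT = "%d-%b-%y %H:%M:%S"

# Capture output in memory instead of printing to the notebook
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(FORMAT, datefmt=DATEFMT))

demo_logger = logging.getLogger("format_demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo separate from the root logger

demo_logger.info("Processing 32 electrodes...")
demo_logger.debug("This DEBUG message is below the INFO level and is dropped")

log_text = stream.getvalue()
print(log_text, end="")
```

Each emitted line starts with a timestamp such as `15-Jan-20 11:23:51`, followed by the message.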

Next we set the location of the temporary directory. Choose it with care: if the directory is on a different machine without a fast connection, reading and writing files there incurs data transfer costs.

In [10]:
temp_path = os.path.join(data_path, animal, "temp")
os.environ["ML_TEMPORARY_DIRECTORY"] = temp_path
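
It can be useful to confirm that the temporary directory exists and is writable before starting a long sort. A minimal sketch follows, assuming it is acceptable to create the directory up front. `set_ml_temp_dir` is a hypothetical helper, and whether the sorting tools create the directory themselves is not something this notebook states.

```python
import os
import tempfile


def set_ml_temp_dir(temp_path):
    """Create temp_path if needed, check it is writable, and export it."""
    os.makedirs(temp_path, exist_ok=True)
    if not os.access(temp_path, os.W_OK):
        raise PermissionError(f"Cannot write to {temp_path}")
    os.environ["ML_TEMPORARY_DIRECTORY"] = temp_path
    return temp_path


# Demonstrate on a throwaway directory, then restore the previous setting
previous = os.environ.get("ML_TEMPORARY_DIRECTORY")
with tempfile.TemporaryDirectory() as scratch:
    demo_temp = set_ml_temp_dir(os.path.join(scratch, "remy", "temp"))
    demo_exists = os.path.isdir(demo_temp)
    demo_env = os.environ["ML_TEMPORARY_DIRECTORY"]

if previous is None:
    os.environ.pop("ML_TEMPORARY_DIRECTORY", None)
else:
    os.environ["ML_TEMPORARY_DIRECTORY"] = previous
```

In the notebook itself, the call would simply be `set_ml_temp_dir(temp_path)` in place of the direct assignment.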

Finally, we pass the `mda_file_info` dataframe (here filtered to a single day) to the `spike_sort_all` function. This is where we set the spike sorting parameters and run the sort.

In [None]:
spike_sort_all(
    mda_file_info.query("date == 20170922"),
    mountainlab_output_folder=None,
    firing_rate_thresh=0.01,
    isolation_thresh=0.97,
    noise_overlap_thresh=0.03,
    peak_snr_thresh=1.5,
    extract_marks=True,
    extract_clips=True,
    clip_time=1.5,
    freq_min=600,
    freq_max=6000,
    adjacency_radius=-1,
    detect_threshold=3,
    detect_interval=10,
    detect_sign=-1,
    sampling_rate=30000,
    drift_track=True,
)
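
The log output of this call reports each segment's boundaries both as sample indices (`t1`, `t2`) and in minutes (`t1_min`, `t2_min`). With `sampling_rate=30000`, the logged values are consistent with dividing the sample index by the sampling rate and then by 60. The sketch below checks that arithmetic; the formula is inferred from the logged numbers rather than taken from the library source, and `samples_to_minutes` is a hypothetical helper.

```python
SAMPLING_RATE = 30000  # samples per second, as passed to spike_sort_all


def samples_to_minutes(sample_index, sampling_rate=SAMPLING_RATE):
    """Convert a sample index to elapsed minutes."""
    return sample_index / sampling_rate / 60


# Segment boundaries copied from the log lines of this run
segments = {1: (0, 36599584), 2: (36599585, 72664702)}

for number, (t1, t2) in segments.items():
    print(
        f"Segment {number}: "
        f"t1_min={samples_to_minutes(t1):.3f}, "
        f"t2_min={samples_to_minutes(t2):.3f}"
    )
```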

15-Jan-20 11:23:51 Processing 32 electrodes...
15-Jan-20 11:23:51 Temp directory: /data2/edeno/remy/temp
15-Jan-20 11:23:51 Processing animal: remy, date: 20170922, electrode: 1
15-Jan-20 11:23:51 Parameters: {'logger': <Logger remy_20170922_nt1 (DEBUG)>, 'mountain_out_electrode_dir': '/data2/edeno/remy/mountainlab_output/20170922/nt1', 'drift_track': True, 'geom': None, 'sampling_rate': 30000, 'detect_sign': -1, 'detect_interval': 10, 'detect_threshold': 3, 'adjacency_radius': -1, 'freq_max': 6000, 'freq_min': 600, 'clip_time': 1.5, 'extract_clips': True, 'extract_marks': True, 'peak_snr_thresh': 1.5, 'noise_overlap_thresh': 0.03, 'isolation_thresh': 0.97, 'firing_rate_thresh': 0.01, 'mountainlab_output_folder': '/data2/edeno/remy/mountainlab_output', 'preprocessing_folder': '/data2/edeno/remy/preprocessing', 'electrode_number': 1, 'date': '20170922', 'animal': 'remy'}
15-Jan-20 11:23:51 Concatenated .mda file does not exist.Creating /data2/edeno/remy/mountainlab_output/20170922/nt1/r

RUNNING: ml-run-process ephys.bandpass_filter --inputs timeseries:/data2/edeno/remy/mountainlab_output/20170922/nt1/raw.mda --parameters freq_max:6000 freq_min:600 samplerate:30000 --outputs timeseries_out:/data2/edeno/remy/mountainlab_output/20170922/nt1/filt.mda.prv
[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
Process signature: edff6a3f511354023b66309fe775c44b239d821f
[ Checking outputs... ]
{"timeseries_out":"/data2/edeno/remy/mountainlab_output/20170922/nt1/filt.mda.prv"}
Processing ouput - /data2/edeno/remy/mountainlab_output/20170922/nt1/filt.mda.prv
false
{"timeseries_out":"/data2/edeno/remy/temp/output_edff6a3f511354023b66309fe775c44b239d821f_timeseries_out.mda"}
[ Checking process cache ... ]
[ Creating temporary directory ... ]
[ Creating links to input files... ]
[ Preparing temporary outputs... ]

15-Jan-20 11:28:43 remy 20170922 nt1 sorting spikes...
15-Jan-20 11:28:43 Finding list of mda file from mda directories of date:20170922, ntrode:1
15-Jan-20 11:28:43 Segment 1: t1=0, t2=36599584, t1_min=0.000, t2_min=20.333


[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
Process signature: 8a3811cd0792ed3cd8d1ba18082201e9dd949a3b
[ Checking outputs... ]
{"timeseries_out":"/data2/edeno/remy/mountainlab_output/20170922/nt1/pre.mda.prv"}
Processing ouput - /data2/edeno/remy/mountainlab_output/20170922/nt1/pre.mda.prv
false
{"timeseries_out":"/data2/edeno/remy/temp/output_8a3811cd0792ed3cd8d1ba18082201e9dd949a3b_timeseries_out.mda"}
[ Checking process cache ... ]
[ Creating temporary directory ... ]
[ Creating links to input files... ]
[ Preparing temporary outputs... ]
Processing ouput - /data2/edeno/remy/temp/output_8a3811cd0792ed3cd8d1ba18082201e9dd949a3b_timeseries_out.mda
false
[ Initializing process ... ]
[ Running ... ] /home/edeno/miniconda3/envs/franklab_mountainsort/bin/python3 /home/edeno/

15-Jan-20 11:30:16 Segment 2: t1=36599585, t2=72664702, t1_min=20.333, t2_min=40.369


[ Getting processor spec... ]
[ Checking inputs and substituting prvs ... ]
[ Computing process signature ... ]
Process signature: d11643e1bdc229ca54650483bebcb3aba91cf5c5
[ Checking outputs... ]
{"firings_out":"/data2/edeno/remy/mountainlab_output/20170922/nt1/firings-1.mda"}
Processing ouput - /data2/edeno/remy/mountainlab_output/20170922/nt1/firings-1.mda
false
{"firings_out":"/data2/edeno/remy/mountainlab_output/20170922/nt1/firings-1.mda"}
[ Checking process cache ... ]
[ Creating temporary directory ... ]
[ Creating links to input files... ]
[ Preparing temporary outputs... ]
Processing ouput - /data2/edeno/remy/mountainlab_output/20170922/nt1/firings-1.mda
false
[ Initializing process ... ]
[ Running ... ] /home/edeno/miniconda3/envs/franklab_mountainsort/bin/python3 /home/edeno/miniconda3/envs/franklab_mountainsort/etc/mountainlab/