## NWB-Datajoint tutorial 1

**Note: make a copy of this notebook and run the copy to avoid git conflicts in the future**

This is the first in a multi-part tutorial on the NWB-Datajoint pipeline used in Loren Frank's lab, UCSF. It demonstrates how to run spike sorting within the pipeline.

If you have not done [tutorial 0](0_intro.ipynb) yet, make sure to do so before proceeding.

Let's start by importing the `nwb_datajoint` package, along with a few others. 

In [None]:
import os
import numpy as np
import datajoint as dj


In [None]:


#import nwb_datajoint as nd

# ignore datajoint+jupyter async warnings
import warnings
warnings.simplefilter('ignore', category=DeprecationWarning)
warnings.simplefilter('ignore', category=ResourceWarning)
os.environ['NWB_DATAJOINT_TEMP_DIR']="/stelmo/nwb/tmp"
os.environ['KACHERY_STORAGE_DIR']="/stelmo/nwb/kachery-storage"
os.environ['FIGURL_CHANNEL']="franklab2"



In [None]:
# import tables so that we can call them easily
from nwb_datajoint.common import (RawPosition, HeadDir, Speed, LinPos, StateScriptFile, VideoFile,
                                  IntervalPositionInfo, IntervalLinearizedPosition,
                                  DataAcquisitionDevice, CameraDevice, Probe,
                                  DIOEvents,
                                  ElectrodeGroup, Electrode, Raw, SampleCount,
                                  LFPSelection, LFP, LFPBandSelection, LFPBand,
                                  SortGroup, SpikeSortingFilterParameters, SpikeSortingArtifactDetectionParameters,
                                  SpikeSortingRecordingSelection, SpikeSortingRecording, 
                                  SpikeSortingWorkspace, 
                                  SpikeSorter, SpikeSorterParameters, SortingID,
                                  SpikeSortingSelection, SpikeSorting, 
                                  SpikeSortingMetricParameters,
                                  ModifySortingParameters, ModifySortingSelection, ModifySorting, 
                                  AutomaticCurationParameters, AutomaticCurationSelection,
                                  AutomaticCuration,
                                  CuratedSpikeSortingSelection, CuratedSpikeSorting,
                                  UnitInclusionParameters,
                                  FirFilter,
                                  IntervalList, SortInterval,
                                  Lab, LabMember, LabTeam, Institution,
                                  BrainRegion,
                                  SensorData,
                                  Session, ExperimenterList,
                                  Subject,
                                  Task, TaskEpoch,
                                  Nwbfile, AnalysisNwbfile, 
                                  KacheryChannel, NwbfileKacherySelection, NwbfileKachery,
                                  AnalysisNwbfileKacherySelection, AnalysisNwbfileKachery)

In [None]:
poskey = {'nwb_file_name': 'chimi20200216_new_.nwb', 'position_info_param_name':'default_decoding'}
#IntervalPositionInfo. 
#IntervalLinearizedPosition 
lposkey= {'position_info_param_name': 'default',  'nwb_file_name': 'chimi20200216_new_.nwb', 'interval_list_name': 'pos 1 valid times',  'track_graph_name': '6 arm', 'linearization_param_name': 'default'}

In [3]:
ipi = (IntervalPositionInfo & poskey).fetch1()
AnalysisNwbfileKacherySelection().insert1({'channel_name':'franklab2', 'analysis_file_name': ipi['analysis_file_name']}, skip_duplicates=True)
AnalysisNwbfileKachery.populate()
AnalysisNwbfileKachery()

In [None]:
ilp = (IntervalLinearizedPosition & lposkey).fetch1()
AnalysisNwbfileKacherySelection().insert1({'channel_name':'franklab2', 'analysis_file_name': ilp['analysis_file_name']}, skip_duplicates=True)
ilp

### nwb_file_name = 'despereaux20191125_.nwb'

In [None]:
SpikeSortingRecording & {'nwb_file_name':nwb_file_name}

In [None]:
key = {'nwb_file_name': 'chimi20200216_new_.nwb', 'position_info_param_name':'default_decoding'}

In [4]:
key = {'nwb_file_name':nwb_file_name}
(SpikeSortingRecording & key)

nwb_file_name  name of the NWB file,electrode_group_name  electrode group name from NWBFile,electrode_id  the unique number for this electrode,probe_type,probe_shank  shank number within probe,probe_electrode  electrode,region_id,name  unique label for each contact,original_reference_electrode  the configured reference electrode for this electrode,x  the x coordinate of the electrode position in the brain,y  the y coordinate of the electrode position in the brain,z  the z coordinate of the electrode position in the brain,filtering  description of the signal filtering,impedance  electrode impedance,bad_channel  if electrode is 'good' or 'bad' as observed during recording,x_warped  x coordinate of electrode position warped to common template brain,y_warped  y coordinate of electrode position warped to common template brain,z_warped  z coordinate of electrode position warped to common template brain,contacts  label of electrode contacts used for a bipolar signal -- current workaround
beans20190718_.nwb,0,0,128c-4s8mm6cm-20um-40um-sl,0,0,1,0,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,1,128c-4s8mm6cm-20um-40um-sl,0,1,1,1,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,3,128c-4s8mm6cm-20um-40um-sl,0,3,1,3,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,4,128c-4s8mm6cm-20um-40um-sl,0,4,1,4,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,5,128c-4s8mm6cm-20um-40um-sl,0,5,1,5,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,7,128c-4s8mm6cm-20um-40um-sl,0,7,1,7,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,8,128c-4s8mm6cm-20um-40um-sl,0,8,1,8,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,9,128c-4s8mm6cm-20um-40um-sl,0,9,1,9,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,11,128c-4s8mm6cm-20um-40um-sl,0,11,1,11,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,
beans20190718_.nwb,0,12,128c-4s8mm6cm-20um-40um-sl,0,12,1,12,1,0.0,0.0,0.0,,0.0,False,0.0,0.0,0.0,


In [None]:
key = {'nwb_file_name':  nwb_file_name}
SpikeSortingWorkspace().url(key)

In [None]:
from nwb_datajoint.decoding import UnitMarkParameters, UnitMarks, MarkParameters

In [5]:
UnitMarks.populate()

array(['0', '1'], dtype=object)

Set up the lab members and team information for this sort

In [6]:
#Uncomment to set sort group
#SortGroup().set_group_by_electrode_group(nwb_file_name)

array([0, 1, 2, 3])

In [None]:
SortGroup.SortGroupElectrode & {'nwb_file_name': nwb_file_name}

#### Define sort interval
Next, we make a decision about the time interval for our spike sorting. Let's re-examine `IntervalList`.

In [7]:
IntervalList & {'nwb_file_name' : nwb_file_name}
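
As a rough mental model of the `table & {...}` restriction used above, the following plain-Python sketch keeps only the rows whose fields match every name/value pair in the dictionary. It is a toy illustration, not how DataJoint is implemented; the `restrict` helper and the `other_session_.nwb` file name are made up for this example.

```python
def restrict(rows, key):
    """Return the rows whose fields match every name/value pair in `key`."""
    return [row for row in rows if all(row.get(k) == v for k, v in key.items())]

# Toy stand-in for a few IntervalList rows (valid_times omitted).
# 'other_session_.nwb' is a hypothetical second file, used only for contrast.
rows = [
    {"nwb_file_name": "beans20190718_.nwb", "interval_list_name": "01_s1"},
    {"nwb_file_name": "beans20190718_.nwb", "interval_list_name": "02_r1"},
    {"nwb_file_name": "other_session_.nwb", "interval_list_name": "02_r1"},
]

matches = restrict(rows, {"nwb_file_name": "beans20190718_.nwb"})
print([row["interval_list_name"] for row in matches])
```

Adding more name/value pairs to the dictionary narrows the result further, which is how a single interval is selected later in this tutorial.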

Deleting 0 rows from `common_spikesorting`.`sort_group`
Nothing to delete.


For our example, let's choose the first 300 seconds of the first run interval (`02_r1`) as our sort interval. To do so, we first fetch the `valid_times` of this interval, define our new sort interval, and add it to the `SortInterval` table.
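
The truncation step can be sketched on its own, without a database connection. This is a minimal sketch using plain lists; in the pipeline `valid_times` is a NumPy array, and the helper name `first_seconds` is hypothetical. The duration of 300 s matches the offset used in the code cell that follows, and the example times are rounded from the `valid_times` printed in this notebook.

```python
def first_seconds(valid_times, duration_s):
    """Return [start, stop] covering at most `duration_s` seconds of the first interval."""
    start, end = valid_times[0]
    # Never extend past the end of the original interval.
    return [start, min(end, start + duration_s)]

# Example valid_times (one interval, in seconds), rounded for illustration.
valid_times = [[1563490630.0, 1563493400.0]]
sort_interval = first_seconds(valid_times, 300)
print(sort_interval)
```

Clamping to the interval end is a design choice of this sketch: it keeps the sort interval inside the valid times even when the run is shorter than the requested duration.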

In [8]:
interval_list_name = '02_r1'

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,"sort_reference_electrode_id  the electrode to use for reference. -1: no reference, -2: common median"
beans20190718_.nwb,0,1
beans20190718_.nwb,1,1
beans20190718_.nwb,2,1
beans20190718_.nwb,3,1
beans20190718_.nwb,4,1
beans20190718_.nwb,5,1
beans20190718_.nwb,6,1
beans20190718_.nwb,7,1


In [None]:
interval_list = (IntervalList & {'nwb_file_name' : nwb_file_name,
                            'interval_list_name' : interval_list_name}).fetch1('valid_times')
print(interval_list)

In [9]:
# copy so that the fetched valid_times array is not modified in place
sort_interval = np.copy(interval_list[0])
# keep only the first 300 seconds of the interval
sort_interval[1] = sort_interval[0] + 300
sort_interval_name = 'test'

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,electrode_group_name  electrode group name from NWBFile,electrode_id  the unique number for this electrode
beans20190718_.nwb,0,0,0
beans20190718_.nwb,0,0,1
beans20190718_.nwb,0,0,3
beans20190718_.nwb,0,0,4
beans20190718_.nwb,0,0,5
beans20190718_.nwb,0,0,7
beans20190718_.nwb,0,0,8
beans20190718_.nwb,0,0,9
beans20190718_.nwb,0,0,11
beans20190718_.nwb,0,0,12


In [None]:
# Check out SortInterval
(SortInterval & {'nwb_file_name' : nwb_file_name})

In [10]:
# Specify the required attributes.
# This time, the entries take the form of a dictionary.
#SortInterval.insert1({'nwb_file_name' : nwb_file_name,
#                      'sort_interval_name' : sort_interval_name,
#                      'sort_interval' : sort_interval}, replace=True)

In [None]:
# See results
SortInterval & {'nwb_file_name' : nwb_file_name, 'sort_interval_name': sort_interval_name}

Now set the filtering parameters. Here we insert the default parameters and then a new set of filtering parameters for hippocampal data, which raises the lower cutoff frequency to 600.

In [12]:
SpikeSortingFilterParameters().insert_default()
filter_param_dict = SpikeSortingFilterParameters.fetch('filter_parameter_dict')
filter_param_dict = filter_param_dict[0]

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,"sort_reference_electrode_id  the electrode to use for reference. -1: no reference, -2: common median"
beans20190718_.nwb,8,-1


In [13]:
filter_param_dict['frequency_min'] = 600
SpikeSortingFilterParameters().insert1({'filter_parameter_set_name': 'franklab_default_hippocampus', 
                                       'filter_parameter_dict' : filter_param_dict}, skip_duplicates=True)
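
The pattern above (start from a default parameter dictionary, change one value, store the result under a new name) can be sketched without the database. This is an illustrative sketch only: `with_overrides` is a hypothetical helper, and the default values are copied from the `filter_parameter_dict` printed elsewhere in this notebook. Copying the defaults first avoids accidentally changing the dictionary that was fetched.

```python
import copy

# Default filter parameters as printed elsewhere in this notebook.
default_filter_params = {
    "frequency_min": 300,
    "frequency_max": 6000,
    "filter_width": 1000,
    "filter_chunk_size": 2000000,
}

def with_overrides(defaults, **overrides):
    """Copy `defaults` and apply overrides, rejecting unknown parameter names."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    params = copy.deepcopy(defaults)
    params.update(overrides)
    return params

hippocampus_params = with_overrides(default_filter_params, frequency_min=600)
print(hippocampus_params["frequency_min"], default_filter_params["frequency_min"])
```

Rejecting unknown names is a small safeguard against typos such as `freq_min`, which would otherwise be added silently as a new, unused entry.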

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,electrode_group_name  electrode group name from NWBFile,electrode_id  the unique number for this electrode
beans20190718_.nwb,8,0,0
beans20190718_.nwb,8,0,4
beans20190718_.nwb,8,0,8
beans20190718_.nwb,8,0,12
beans20190718_.nwb,8,0,16
beans20190718_.nwb,8,0,20
beans20190718_.nwb,8,0,24
beans20190718_.nwb,8,0,28


Similarly, we set up `SpikeSortingArtifactDetectionParameters`, which allows us to remove artifacts from the data.
For the moment we just insert the default "none" parameter set, which does nothing when used.

In [14]:
SpikeSortingArtifactDetectionParameters().insert_default()

nwb_file_name  name of the NWB file,interval_list_name  descriptive name of this interval list,valid_times  numpy array with start and end times for each interval
beans20190718_.nwb,01_s1,=BLOB=
beans20190718_.nwb,02_r1,=BLOB=
beans20190718_.nwb,03_s2,=BLOB=
beans20190718_.nwb,04_r2,=BLOB=
beans20190718_.nwb,pos 0 valid times,=BLOB=
beans20190718_.nwb,pos 1 valid times,=BLOB=
beans20190718_.nwb,pos 2 valid times,=BLOB=
beans20190718_.nwb,pos 3 valid times,=BLOB=
beans20190718_.nwb,raw data valid times,=BLOB=


Now we set up the recording parameters so that we can get the recording extractor.

In [15]:
sort_group_id = 2 # use sort group 2
sort_interval_name = 'test'
filter_param_name = 'franklab_default_hippocampus'
artifact_param_name = 'none'
interval_list = '02_r1'
lab_team = 'Loren Frank'

In [16]:
# collect the params
key = dict()
key['nwb_file_name'] = nwb_file_name
key['sort_group_id'] = sort_group_id
key['filter_parameter_set_name'] = filter_param_name
key['sort_interval_name'] = sort_interval_name
key['artifact_parameter_name'] = artifact_param_name
key['interval_list_name'] = interval_list
key['team_name'] = lab_team

ssr_key = key

[[1.56349063e+09 1.56349340e+09]]


In [17]:
SpikeSortingRecordingSelection()

[1.56349064e+09 1.56349065e+09]


In [18]:
SpikeSortingRecordingSelection.insert1(key, skip_duplicates=True)

nwb_file_name  name of the NWB file,sort_interval_name  name for this interval,sort_interval  1D numpy array with start and end time for a single interval to be used for spike sorting
,,


In [19]:
SpikeSortingRecording.populate()

Now we need to populate the `SpikeSortingWorkspace` table to make this recording available via kachery.

In [20]:
SpikeSortingRecording()

In [None]:
SpikeSortingWorkspace.populate()


For our example, we will be using `mountainsort4`.

In [22]:
#SpikeSortingWorkspace().url(key)

In [None]:
SpikeSorter().insert_from_spikeinterface()
SpikeSorterParameters().insert_from_spikeinterface()

In [None]:
sorter_name='mountainsort4'

In [24]:
# Let's look at the default params
ms4_default_params = (SpikeSorterParameters & {'sorter_name' : sorter_name,
                                               'spikesorter_parameter_set_name' : 'default'}).fetch1()
print(ms4_default_params)

In [None]:
# Change the default params
param_dict = ms4_default_params['parameter_dict']
# Detect downward-going (negative) spikes
param_dict['detect_sign'] = -1 
# We will sort electrodes together that are within 100 microns of each other
param_dict['adjacency_radius'] = 100
param_dict['curation'] = False
# Turn filter off since we will filter it prior to starting sort
param_dict['filter'] = False
param_dict['freq_min'] = 0
param_dict['freq_max'] = 0
# Turn whiten off since we will whiten it prior to starting sort
param_dict['whiten'] = False
# set num_workers to be the same number as the number of electrodes
param_dict['num_workers'] = 4
param_dict['verbose'] = True
# set clip size as the number of samples in 1.33 ms
param_dict['clip_size'] = int(1.33e-3 * (Raw & {'nwb_file_name' : nwb_file_name}).fetch1('sampling_rate'))
param_dict['noise_overlap_threshold'] = 0
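
To make the clip size calculation concrete, here is a small standalone sketch. It assumes a sampling rate of 30 kHz, which is only suggested by the `30KHz` in the parameter set name used below; the real value is read from the `Raw` table. The helper name `clip_size_samples` is hypothetical.

```python
def clip_size_samples(duration_s, sampling_rate_hz):
    """Number of whole samples in `duration_s` seconds (any fraction is truncated)."""
    return int(duration_s * sampling_rate_hz)

sampling_rate_hz = 30000.0  # assumption: a 30 kHz recording
print(clip_size_samples(1.33e-3, sampling_rate_hz))
```

Note that `int` truncates rather than rounds, so 1.33 ms at 30 kHz (39.9 samples) gives 39 samples.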



In [25]:
param_dict

In [None]:
# Give a unique name here
parameter_set_name = 'franklab_tetrode_hippocampus_30KHz'
SpikeSorterParameters()

In [None]:
# Insert
SpikeSorterParameters.insert1({'sorter_name': sorter_name,
                               'spikesorter_parameter_set_name': parameter_set_name,
                               'parameter_dict': param_dict}, skip_duplicates=True)

In [None]:
# Check that insert was successful
#p = (SpikeSorterParameters & {'sorter_name': sorter_name, 'spikesorter_parameter_set_name': parameter_set_name}).fetch1()
p = (SpikeSorterParameters & {'sorter_name': sorter_name}).fetch()
p

#### Bringing everything together

We now collect all the decisions we have made so far and put them into the `SpikeSortingSelection` table (note: this is different from the spike sor*ter* parameters defined above).

In [27]:
key = (SpikeSortingWorkspace & ssr_key).fetch1("KEY")
key['sorter_name'] = sorter_name
key['spikesorter_parameter_set_name'] = 'franklab_tetrode_hippocampus_30KHz'
ss_key = key

{'sorter_name': 'mountainsort4', 'parameter_set_name': 'default', 'parameter_dict': {'detect_sign': -1, 'adjacency_radius': -1, 'freq_min': 300, 'freq_max': 6000, 'filter': True, 'whiten': True, 'curation': False, 'num_workers': None, 'clip_size': 50, 'detect_threshold': 3, 'detect_interval': 10, 'noise_overlap_threshold': 0.15}, 'filter_parameter_dict': {'frequency_min': 300, 'frequency_max': 6000, 'filter_width': 1000, 'filter_chunk_size': 2000000}}


In [None]:
# insert
SpikeSortingSelection.insert1(key, skip_duplicates=True)

In [None]:
#(SpikeSortingParameters & {'nwb_file_name' : nwb_file_name, 'sort_interval_name' : sort_interval_name}).delete()

In [None]:
# inspect
(SpikeSortingSelection & {'nwb_file_name' : nwb_file_name})

#### Running spike sorting
Now we can run spike sorting. As we said, it's nothing more than populating another table (`SpikeSorting`) from the entries of `SpikeSortingSelection`.
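
The idea of "populating" a table can be pictured with a toy sketch: for every entry in the selection that has no result yet, run the computation and store its output. This is a conceptual illustration under simplifying assumptions, not DataJoint's implementation; `populate`, `toy_sort`, and the dictionary used as a results table are all hypothetical.

```python
def populate(selection_keys, computed, make):
    """Compute and store a result for every selection key that has none yet."""
    for key in selection_keys:
        frozen = tuple(sorted(key.items()))  # hashable form of the key
        if frozen not in computed:
            computed[frozen] = make(key)
    return computed

def toy_sort(key):
    """Hypothetical stand-in for the real sorting step."""
    return f"sorted {key['nwb_file_name']} group {key['sort_group_id']}"

selection = [
    {"nwb_file_name": "beans20190718_.nwb", "sort_group_id": 2},
    {"nwb_file_name": "beans20190718_.nwb", "sort_group_id": 3},
]
results = {}
populate(selection, results, toy_sort)
populate(selection, results, toy_sort)  # second call finds nothing new to compute
print(len(results))
```

Passing a restricted set of keys, as the cell below does, limits the computation to those entries instead of everything in the selection table.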

In [None]:
# Specify entry (otherwise runs everything in SpikeSortingSelection)
# `proj` gives you the primary key
SpikeSorting.populate([(SpikeSortingSelection & {'nwb_file_name' : nwb_file_name, 'sort_interval_name' : sort_interval_name}).proj()])

In [29]:
#SpikeSortingWorkspace().url(key)

sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,parameter_dict  dictionary of parameter names and values,filter_parameter_dict  dictionary of filter parameter names and
mountainsort4,beans,=BLOB=,=BLOB=


#### Define quality metric parameters

We're almost done. There are more parameters related to how to compute the quality metrics for curation. We start from the default options and make sure a few metrics are enabled.

In [30]:
SpikeSortingMetricParameters()

cluster_metrics_list_name  the name for this list of cluster metrics,metric_dict  dict of SpikeInterface metrics with True / False elements to indicate whether a given metric should be computed.,metric_parameter_dict  dict of parameters for the metrics
franklab_default_cluster_metrics,=BLOB=,=BLOB=
franklab_default_cluster_metrics_test,=BLOB=,=BLOB=
kyu,=BLOB=,=BLOB=


In [31]:
metric_dict = SpikeSortingMetricParameters().get_metric_dict()
metric_param_dict = SpikeSortingMetricParameters().get_metric_parameter_dict()

In [None]:
for k in metric_dict:
    print(f"'{k}': {metric_dict[k]}\n")

In [None]:
metric_dict['noise_overlap'] = True
metric_dict['firing_rate'] = True
metric_dict['num_spikes'] = True
for k in metric_dict:
    print(f"'{k}': {metric_dict[k]}\n")

In [None]:
cluster_metrics_list_name = 'franklab_cluster_metrics_09-19-2021'

In [None]:
#(SpikeSortingMetricParameters & {'cluster_metrics_list_name' : cluster_metrics_list_name}).delete()

Add the cluster metrics to the table if they are not there already.

In [None]:
SpikeSortingMetricParameters.insert1({'cluster_metrics_list_name' : cluster_metrics_list_name,
                            'metric_dict' : metric_dict, 
                            'metric_parameter_dict' : metric_param_dict}, skip_duplicates=True)
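
The effect of `skip_duplicates=True` in the insert above can be illustrated with a toy table: a row whose primary key already exists is silently skipped instead of raising an error. This is a simplified sketch, with a dictionary standing in for the table; the `insert1` function here is a hypothetical stand-in and not the DataJoint method itself.

```python
def insert1(table, row, primary_key, skip_duplicates=False):
    """Insert `row` into `table` (a dict keyed by primary-key tuples).

    Returns True if the row was inserted and False if it was skipped.
    """
    pk = tuple(row[name] for name in primary_key)
    if pk in table:
        if skip_duplicates:
            return False  # the existing row is left untouched
        raise ValueError(f"duplicate primary key: {pk}")
    table[pk] = row
    return True

table = {}
row = {
    "cluster_metrics_list_name": "franklab_cluster_metrics_09-19-2021",
    "metric_dict": {"num_spikes": True},
}
print(insert1(table, row, ["cluster_metrics_list_name"], skip_duplicates=True))
print(insert1(table, row, ["cluster_metrics_list_name"], skip_duplicates=True))
```

One consequence worth keeping in mind: re-running the cell with a changed `metric_dict` but the same name leaves the stored entry unchanged, so a new name is needed for a new set of metrics.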


Add the default automatic curation parameters.

In [33]:
param = AutomaticCurationParameters().get_default_parameters()
AutomaticCurationParameters().insert1({'automatic_curation_parameter_set_name':'none', 
                                      'automatic_curation_parameter_dict': param}, skip_duplicates=True)

Now add an entry to select those parameters for automatic curation of this recording

In [None]:
# first get the sorting ID
acs_key = (SpikeSortingRecording & ssr_key).fetch1('KEY')
acs_key['sorting_id'] = (SpikeSorting & ss_key).fetch1('sorting_id')
acs_key['automatic_curation_parameter_set_name'] = 'none'
acs_key['cluster_metrics_list_name'] = cluster_metrics_list_name
AutomaticCurationSelection.insert1(acs_key, skip_duplicates=True)

Now we populate the `AutomaticCuration` table, which in this case just computes the metrics and does not add labels.

In [34]:
#AutomaticCuration.delete()

team_name,team_description
Alison Comrie,
Anna Gillespie,
AutoTrack,physiological data from dorsal and intermediate hippocampus
Beans,test
BeansXulu,
Jennifer Guidera,
Loren Frank,
"Rhino Nevers, David Kastner",


In [35]:
AutomaticCuration.populate(acs_key)

We can now curate the recording using the figurl interface. To do so, we get the figurl link for this recording.

In [36]:
SpikeSortingWorkspace().url(ssr_key)

Once you're done with manual curation, you can add the units (with an optional new set of metrics) to the final `CuratedSpikeSorting` table, which includes only accepted units.

In [37]:
css_key = (AutomaticCuration & acs_key).fetch1('KEY')
css_key['final_cluster_metrics_list_name'] = cluster_metrics_list_name
CuratedSpikeSortingSelection.insert1(css_key, skip_duplicates=True)

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,sort_interval_name  name for this interval,artifact_param_name  name for this set of parameters,cluster_metrics_list_name  the name for this list of cluster metrics,interval_list_name  descriptive name of this interval list,team_name,import_path  optional path to previous curated sorting output
beans20190718_ emonroe_.nwb,8,mountainsort4,beans,beans_02_r1_10s,default,franklab_default_cluster_metrics,02_r1,Beans,
beans20190718_asilva_.nwb,8,mountainsort4,beans,beans_02_r1_10s,default,franklab_default_cluster_metrics,02_r1,Beans,
beans20190718_dkastner_.nwb,8,mountainsort4,beans,beans_02_r1_10s,default,franklab_default_cluster_metrics,02_r1,AutoTrack,
beans20190718_emonroe_.nwb,8,mountainsort4,beans,beans_02_r1_10s,default,franklab_default_cluster_metrics,02_r1,Beans,
beans20190718_jhbak_.nwb,8,mountainsort4,beans,beans_02_r1_10s,default,franklab_default_cluster_metrics,02_r1,Beans,
beans20190718_xulu_.nwb,8,mountainsort4,beans,beans_02_r1_10s,default,franklab_default_cluster_metrics,02_r1,BeansXulu,
CH5_20210109_.nwb,1,mountainsort4,franklab_tetrode_hippocampus_up_down_30KHz,CH5_01_r1_full,default,franklab_default_cluster_metrics,01_r1,AutoTrack,
CH5_20210109_.nwb,3,mountainsort4,franklab_tetrode_hippocampus_up_down_30KHz,CH5_01_r1_full,default,franklab_default_cluster_metrics,01_r1,AutoTrack,
CH5_20210109_.nwb,4,mountainsort4,franklab_tetrode_hippocampus_up_down_30KHz,CH5_01_r1_full,default,franklab_default_cluster_metrics,01_r1,AutoTrack,
CH5_20210109_.nwb,5,mountainsort4,franklab_tetrode_hippocampus_up_down_30KHz,CH5_01_r1_full,default,franklab_default_cluster_metrics,01_r1,AutoTrack,


In [38]:
CuratedSpikeSorting.populate(css_key)

In [39]:
CuratedSpikeSorting.Unit()

nwb_file_name  name of the NWB file,sort_group_id  identifier for a group of electrodes,sorter_name  the name of the spike sorting algorithm,parameter_set_name  label for this set of parameters,sort_interval_name  name for this interval,artifact_param_name  name for this set of parameters,cluster_metrics_list_name  the name for this list of cluster metrics,interval_list_name  descriptive name of this interval list,team_name,import_path  optional path to previous curated sorting output
beans20190718_.nwb,8,mountainsort4,beans,beans_02_r1_10s,default,franklab_default_cluster_metrics,02_r1,Beans,


In [None]:
sort_groups = (SortGroup & {'nwb_file_name' : nwb_file_name}).fetch('sort_group_id')
sort_groups

In [None]:
SpikeSorting()

In [None]:
dj.ERD(SpikeSorting)+5-6

In [None]:
dj.ERD(ModifySorting)+3-3

In [None]:
units = CuratedSpikeSorting().Unit().fetch()

In [None]:
units['noise_overlap']