# Running Array Electrophysiology Analysis Locally

### **Overview**

This notebook provides a step-by-step guide to running electrophysiology (ephys) data processing with SpikeInterface using the Utah Organoids pipeline. This includes preprocessing, spike sorting, and quality assessment.

By the end of this notebook, you will:
- Select and verify a session for analysis
- Populate ephys tables with computed results
- Run spike sorting and post-processing

**_Note:_**

- This notebook uses example data; replace the values with actual database entries.
- Ensure your local machine has access to the raw ephys files before running computations.
- Processing can take several minutes to hours depending on data size and system resources.

### **Key Steps**

- **Setup**

- **Step 1: Select a Session of Interest**

- **Step 2: Run Ephys Pipeline Computations**


#### **Setup**


First, import the required packages and the pipeline schemas.


```python
import os

# If the notebook was launched from the notebooks/ folder, move up to the repository root
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
```

```python
import datajoint as dj
import numpy as np
import matplotlib.pyplot as plt
import datetime
```

```python
from workflow.pipeline import culture, ephys, ephys_sorter
```

```
[2025-03-03 14:17:02,584][INFO]: Connecting milagros@db.datajoint.com:3306
[2025-03-03 14:17:04,144][INFO]: Connected milagros@db.datajoint.com:3306
```


#### **Step 1: Select a Session of Interest**

To process ephys data, select a session stored in the `EphysSession` table.

```python
ephys_key = {
    "organoid_id": "MB07",
    "experiment_start_time": "2024-09-07 14:49:00",
    "insertion_number": 0,
    "start_time": "2024-09-07 14:49:00",
    "end_time": "2024-09-07 14:54:00",
}
```

Verify the session exists:

```python
ephys.EphysSession * ephys.EphysSessionProbe & ephys_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | session_type | probe | port_id | used_electrodes |
|---|---|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | spike_sorting | Q983 | C | =BLOB= |

`probe` is the unique identifier of the probe (e.g., its serial number), and `used_electrodes` lists the electrode IDs used in this session (if null, all electrodes are used).


If the session is not found, double-check the organoid ID and timestamps.
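A mistyped organoid ID or timestamp is the usual reason the lookup comes back empty. One way to catch malformed or reversed timestamps before querying is to build the key with a small helper. `make_ephys_key` below is hypothetical (it is not part of the pipeline); it only mirrors the layout of `ephys_key` and assumes the `YYYY-MM-DD HH:MM:SS` format used above.

```python
from datetime import datetime
from typing import Optional

FMT = "%Y-%m-%d %H:%M:%S"  # timestamp format used in ephys_key above


def make_ephys_key(organoid_id: str, start: str, end: str,
                   experiment_start: Optional[str] = None,
                   insertion_number: int = 0) -> dict:
    """Build a restriction dict shaped like `ephys_key`, checking the timestamps.

    Hypothetical helper: it mirrors the key layout used in this notebook and
    does not query the database.
    """
    # strptime raises ValueError on malformed timestamps
    if datetime.strptime(end, FMT) <= datetime.strptime(start, FMT):
        raise ValueError(f"end_time {end!r} must be after start_time {start!r}")
    return {
        "organoid_id": organoid_id,
        # In this example the experiment and the session start at the same time.
        "experiment_start_time": experiment_start or start,
        "insertion_number": insertion_number,
        "start_time": start,
        "end_time": end,
    }
```

With the pipeline loaded, `len(ephys.EphysSession & make_ephys_key("MB07", "2024-09-07 14:49:00", "2024-09-07 14:54:00"))` should then be 1 for an existing session.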

#### **Step 2: Run Ephys Pipeline Computations**


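Once the session is verified, populate the downstream tables that handle spike sorting, in order. Each `populate()` call computes only the entries that are still missing for the given key; in the outputs below, a `success_count` of 0 with an empty `error_list` means the entries already existed, so nothing was recomputed.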
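The computations in this step are a fixed sequence of `populate()` calls on one key. A minimal sketch of chaining them follows; `run_steps` and `Populatable` are hypothetical names, and the sketch assumes `populate` returns the `success_count`/`error_list` dict shown in this notebook's outputs.

```python
from typing import Any, Dict, Iterable, Protocol, Tuple


class Populatable(Protocol):
    """Anything with a DataJoint-style `populate` method."""

    def populate(self, *restrictions: Any) -> Dict[str, Any]: ...


def run_steps(steps: Iterable[Tuple[str, Populatable]], key: dict) -> Dict[str, int]:
    """Populate each table in order for one key; stop at the first reported error."""
    counts: Dict[str, int] = {}
    for name, table in steps:
        result = table.populate(key)
        if result["error_list"]:
            raise RuntimeError(f"{name} reported errors: {result['error_list']}")
        counts[name] = result["success_count"]
    return counts
```

For this session, the steps would be `ephys_sorter.PreProcessing`, `ephys_sorter.SIClustering`, `ephys_sorter.PostProcessing`, `ephys.CuratedClustering`, `ephys.WaveformSet`, and `ephys.QualityMetrics`, each paired with a label and called with the task key. Keeping the order explicit makes the dependency chain easy to read.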

#### Populate `EphysSessionInfo` Table

```python
ephys.EphysSessionInfo.populate(ephys_key)
```

```
{'success_count': 0, 'error_list': []}
```

```python
ephys.EphysSessionInfo & ephys_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | session_info |
|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | =BLOB= |

`session_info` holds the session header read from the first Intan `.rhd` file of the session.


#### Select Clustering Parameters & Task

Check available clustering parameter sets:

```python
ephys.ClusteringParamSet()
```

| paramset_idx | clustering_method | paramset_desc | param_set_hash | params |
|---|---|---|---|---|
| 0 | spykingcircus2 | Default parameters for spyking circus2 using SpikeInterface v0.100.1 | b6fb9ec2-768c-66b0-2b71-9b8ac91e94da | =BLOB= |
| 1 | spykingcircus2 | Default parameter set for spyking circus2 using SpikeInterface v0.101.* | 434894d0-eb7b-db6c-80e6-638a1322c568 | =BLOB= |
| 2 | kilosort2 | kilosort2 with SpikeInterface version 0.101+ | 79a731f3-f1b6-c110-5f8a-e25227464de7 | =BLOB= |
| 5 | spykingcircus2 | Spyking circus2 with a detection threshold 5 (neg direction) | 4c895afd-a1b1-5d64-b747-e8489078e2e3 | =BLOB= |
| 11 | spykingcircus2 | waveform>threshold: .25->2 | 17d41d84-067d-791c-8706-8cab83020b84 | =BLOB= |
| 12 | spykingcircus2 | waveform>threshold: .25->2 attempt 2 | 2b28cf23-2456-8202-b70f-96871b837a26 | =BLOB= |
| 13 | spykingcircus2 | waveform>threshold: .25->2 attempt 2 | 1faf6aee-71d6-fe26-74ec-6bb7cdc0f30f | =BLOB= |
| 14 | spykingcircus2 | apply_preprocessing = False | ce720015-b59a-08d6-198e-def81c860f46 | =BLOB= |
| 15 | spykingcircus2 | apply_preprocessing, matched_filtering, and apply_motion_correction = False | 5f7a8362-c31c-061e-14b2-74ad55466546 | =BLOB= |
| 16 | spykingcircus2 | default parameters, different format | 0a3d0360-c0de-6c30-9c35-7c931a9a6f62 | =BLOB= |

`params` holds the dictionary of all applicable parameters for each set.
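When the list grows, it helps to narrow it to one clustering method. A minimal sketch, assuming the rows come from `ephys.ClusteringParamSet.fetch(as_dict=True)`; `paramsets_for_method` is a hypothetical helper, not part of the pipeline.

```python
from typing import Iterable, List, Tuple


def paramsets_for_method(rows: Iterable[dict], method: str) -> List[Tuple[int, str]]:
    """Return (paramset_idx, paramset_desc) pairs for one method, sorted by index."""
    return sorted(
        (row["paramset_idx"], row["paramset_desc"])
        for row in rows
        if row["clustering_method"] == method
    )
```

The same filter can be pushed to the database as a DataJoint restriction, e.g. `ephys.ClusteringParamSet & {"clustering_method": "kilosort2"}`. The parameters of a single set can be inspected with `(ephys.ClusteringParamSet & {"paramset_idx": 2}).fetch1("params")`.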


List the clustering tasks already defined for this session:

```python
ephys.ClusteringTask & ephys_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | clustering_output_dir |
|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 2 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2_2 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 23 | MB5-8_raw/202409071449_202409071454/MB07/kilosort3_23 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 24 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2_24 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 27 | MB5-8_raw/202409071449_202409071454/MB07/kilosort3_27 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2-5_250 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 401 | MB5-8_raw/202409071449_202409071454/MB07/kilosort4_401 |
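The output directories listed above appear to follow a `<folder>/<start>_<end>/<organoid_id>/<sorter>_<paramset_idx>` pattern, which makes it easy to see which sorter each task used. The sketch below splits a path along that pattern. The layout is inferred from these example paths only, not from the pipeline's code, so verify it against your own data before relying on it; `parse_output_dir` is hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_output_dir(rel_dir: str) -> dict:
    """Split a clustering_output_dir into its parts (layout inferred from examples)."""
    folder, window, organoid_id, run = PurePosixPath(rel_dir).parts
    start_str, end_str = window.split("_")
    # rsplit keeps sorter names such as "kilosort2-5" intact
    sorter, paramset_idx = run.rsplit("_", 1)
    fmt = "%Y%m%d%H%M"
    return {
        "folder": folder,
        "start_time": datetime.strptime(start_str, fmt),
        "end_time": datetime.strptime(end_str, fmt),
        "organoid_id": organoid_id,
        "sorter": sorter,
        "paramset_idx": int(paramset_idx),
    }
```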


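Fetch the primary key of the task that uses parameter set 250 (Kilosort 2.5, as its output directory indicates). `fetch1` raises an error unless exactly one row matches: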

```python
task_key = (ephys.ClusteringTask & ephys_key & "paramset_idx=250").fetch1("KEY")
task_key
```

```
{'organoid_id': 'MB07',
 'experiment_start_time': datetime.datetime(2024, 9, 7, 14, 49),
 'insertion_number': 0,
 'start_time': datetime.datetime(2024, 9, 7, 14, 49),
 'end_time': datetime.datetime(2024, 9, 7, 14, 54),
 'paramset_idx': 250}
```

#### Preprocessing

Preprocessing prepares the raw data for spike sorting.

```python
ephys_sorter.PreProcessing.populate(task_key)
```

```
{'success_count': 0, 'error_list': []}
```

```python
ephys_sorter.PreProcessing & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | execution_time | execution_duration (h) |
|---|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 2024-11-14 21:43:50 | 0.0106521 |

`execution_time` is when the step started, and `execution_duration` is given in hours.
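Preprocessing for extracellular recordings commonly means band-pass filtering each channel and removing noise shared across channels. The sketch below illustrates that generic idea with NumPy/SciPy. It is not the code behind `ephys_sorter.PreProcessing`, whose actual steps depend on the selected parameter set, and the 300–6000 Hz band and 4th-order Butterworth filter are assumed typical values. SpikeInterface provides equivalent operations as `spikeinterface.preprocessing.bandpass_filter` and `common_reference`.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_and_cmr(traces: np.ndarray, fs: float,
                     freq_min: float = 300.0, freq_max: float = 6000.0) -> np.ndarray:
    """Band-pass filter each channel, then subtract the across-channel median.

    `traces` has shape (n_channels, n_samples); `freq_max` must be below fs / 2.
    The cut-offs are assumed spike-band defaults, not this pipeline's settings.
    """
    sos = butter(4, [freq_min, freq_max], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering along the time axis
    filtered = sosfiltfilt(sos, traces, axis=1)
    # Common median reference: remove the signal shared by all channels at each sample
    return filtered - np.median(filtered, axis=0, keepdims=True)
```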


#### Spike Sorting

Now, run automated spike sorting using the selected clustering method.

```python
ephys_sorter.SIClustering.populate(task_key)
```

```
{'success_count': 0, 'error_list': []}
```

```python
ephys_sorter.SIClustering & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | execution_time | execution_duration (h) |
|---|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 2024-11-14 21:45:01 | 0.0711042 |
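Spike sorters start by detecting threshold crossings; parameter set 5 above, for instance, is described as using a detection threshold of 5 in the negative direction. The sketch below illustrates only that detection idea on a single channel, using the common MAD-based noise estimate (median(|x|) / 0.6745). It is a simplification and not how Kilosort or Spyking Circus 2 detect or cluster spikes; `detect_negative_crossings` and `k` are illustrative names.

```python
import numpy as np


def detect_negative_crossings(trace: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Return sample indices where `trace` first drops below -k * noise.

    Noise is estimated robustly as median(|trace|) / 0.6745, so a few large
    spikes barely shift the threshold.
    """
    sigma = np.median(np.abs(trace)) / 0.6745
    below = trace < -k * sigma
    # Keep only the first sample of each run below threshold (the onset)
    onsets = below & ~np.concatenate(([False], below[:-1]))
    return np.flatnonzero(onsets)
```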


The sorter writes its results to the `clustering_output_dir` recorded in `ephys.ClusteringTask`; this path is relative to the clustering root data directory:

```python
# This is the output directory of the spike sorting
ephys.ClusteringTask & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | clustering_output_dir |
|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | MB5-8_raw/202409071449_202409071454/MB07/kilosort2-5_250 |
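To open the sorter's files directly, join the relative path with the clustering root data directory. A minimal sketch: the root location is an assumption you must supply for your installation, and `resolve_output_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Union


def resolve_output_dir(root: Union[str, Path], rel_dir: str) -> Path:
    """Join the clustering root data directory with a relative output directory.

    `root` is an assumption: pass the clustering root configured for your
    installation of the pipeline.
    """
    path = Path(root) / rel_dir
    if not path.is_dir():
        raise FileNotFoundError(f"No sorting output at {path}")
    return path
```

The relative path itself can be fetched with `(ephys.ClusteringTask & task_key).fetch1("clustering_output_dir")`.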


#### Post-processing

Post-processing extracts spike features and firing rates.



```python
ephys_sorter.PostProcessing.populate(task_key)
```

```
{'success_count': 0, 'error_list': []}
```

```python
ephys_sorter.PostProcessing & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | execution_time | execution_duration (h) |
|---|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 2024-11-14 21:49:17 | 0.0109288 |
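Because `execution_duration` is stored in hours, the values are hard to compare at a glance. Converting the three durations reported above shows roughly 38 s for preprocessing, 256 s for sorting, and 39 s for post-processing, about 5.6 minutes in total for this 5-minute recording. The numbers below are copied from the tables; with the pipeline loaded they can be read with, e.g., `(ephys_sorter.PreProcessing & task_key).fetch1("execution_duration")`.

```python
from datetime import timedelta

# execution_duration values (hours) copied from the tables above
durations_h = {
    "PreProcessing": 0.0106521,
    "SIClustering": 0.0711042,
    "PostProcessing": 0.0109288,
}
recording = timedelta(minutes=5)  # end_time - start_time for this session

total = timedelta(hours=sum(durations_h.values()))
for step, hours in durations_h.items():
    print(f"{step:15s} {timedelta(hours=hours).total_seconds():7.1f} s")
print(f"total: {total.total_seconds() / 60:.1f} min "
      f"({total / recording:.2f}x the recording length)")
```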


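#### Curate Clustering Results

Populating `CuratedClustering` loads the sorted units into the pipeline; each unit, with its electrode, quality label, and spike times, is stored in the `CuratedClustering.Unit` part table.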

```python
ephys.CuratedClustering.populate(task_key)
```

```
{'success_count': 0, 'error_list': []}
```

```python
ephys.CuratedClustering & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx |
|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 |


```python
ephys.CuratedClustering.Unit & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | unit | electrode_config_hash | probe_type | electrode | cluster_quality_label | spike_count | spike_times (s) | spike_sites | spike_depths (µm) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 0 | 699af5e0-31fa-acc9-1aeb-132c6972d25e | A1x32-6mm-100-177-H32_21mm | 31 | mua | 13 | =BLOB= | =BLOB= | =BLOB= |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 1 | 699af5e0-31fa-acc9-1aeb-132c6972d25e | A1x32-6mm-100-177-H32_21mm | 31 | mua | 14 | =BLOB= | =BLOB= | =BLOB= |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 2 | 699af5e0-31fa-acc9-1aeb-132c6972d25e | A1x32-6mm-100-177-H32_21mm | 31 | mua | 134 | =BLOB= | =BLOB= | =BLOB= |

`electrode` indices start at 0, and `cluster_quality_label` takes values such as 'good', 'MUA', or 'noise'. `spike_times` are relative to the start of the EphysRecording, `spike_sites` gives the electrode associated with each spike, and `spike_depths` are relative to the (0, 0) position of the probe.
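`spike_count` and firing rate are linked by the recording length. For this 5-minute (300 s) session, dividing each unit's spike count by the duration reproduces the `firing_rate` values reported in `QualityMetrics.Cluster` (0.0433333, 0.0466667, and 0.446667 Hz). The check below uses numbers copied from the table and assumes the units' spikes span the whole session from `start_time` to `end_time`. With the pipeline loaded, the counts can be read with `(ephys.CuratedClustering.Unit & task_key).fetch("unit", "spike_count")`.

```python
from datetime import datetime

# Values copied from the tables above for this session
start = datetime(2024, 9, 7, 14, 49)
end = datetime(2024, 9, 7, 14, 54)
spike_counts = {0: 13, 1: 14, 2: 134}  # unit -> spike_count

duration_s = (end - start).total_seconds()
# Mean firing rate (Hz) = number of spikes / recording duration (s)
firing_rates = {unit: n / duration_s for unit, n in spike_counts.items()}
```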


#### Extract Waveforms

Waveform extraction helps assess unit quality.

```python
ephys.WaveformSet.populate(task_key)
```

```
{'success_count': 0, 'error_list': []}
```

```python
ephys.WaveformSet & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx |
|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 |
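One simple way waveforms inform unit quality is the peak-to-peak amplitude of a unit's mean waveform and the channel where it is largest. A minimal sketch, assuming the mean waveform is available as a NumPy array of shape (n_channels, n_samples); check how waveforms are actually stored in `WaveformSet` before reusing it. `peak_channel_and_ptp` is a hypothetical helper.

```python
from typing import Tuple

import numpy as np


def peak_channel_and_ptp(mean_waveform: np.ndarray) -> Tuple[int, float]:
    """Return the channel with the largest peak-to-peak amplitude and that amplitude.

    `mean_waveform` is assumed to have shape (n_channels, n_samples).
    """
    ptp = mean_waveform.max(axis=1) - mean_waveform.min(axis=1)
    peak = int(np.argmax(ptp))
    return peak, float(ptp[peak])
```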


#### Compute Quality Metrics

Finally, assess the quality of spike sorting results.
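Two of the reported metrics, `number_violation` (total ISI violations) and `presence_ratio` (fraction of time in which spikes are present), are simple enough to sketch directly from spike times. These are simplified versions for illustration: the 1.5 ms refractory period and 60 s bins are assumed values, and SpikeInterface's implementations may differ in details.

```python
import numpy as np


def isi_violation_count(spike_times_s: np.ndarray, refractory_s: float = 0.0015) -> int:
    """Count inter-spike intervals shorter than the assumed refractory period."""
    isis = np.diff(np.sort(spike_times_s))
    return int(np.sum(isis < refractory_s))


def presence_ratio(spike_times_s: np.ndarray, duration_s: float, bin_s: float = 60.0) -> float:
    """Fraction of fixed-width time bins that contain at least one spike."""
    n_bins = int(np.ceil(duration_s / bin_s))
    counts, _ = np.histogram(spike_times_s, bins=n_bins, range=(0.0, n_bins * bin_s))
    return float(np.mean(counts > 0))
```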



```python
ephys.QualityMetrics.populate(task_key)
```

```
{'success_count': 0, 'error_list': []}
```

```python
ephys.QualityMetrics & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx |
|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 |


```python
ephys.QualityMetrics.Cluster & task_key
```

| organoid_id | experiment_start_time | insertion_number | start_time | end_time | paramset_idx | unit | firing_rate (Hz) | snr | presence_ratio | isi_violation | number_violation | amplitude_cutoff | isolation_distance | l_ratio | d_prime | nn_hit_rate | nn_miss_rate | silhouette_score | max_drift | cumulative_drift | contamination_rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 0 | 0.0433333 | 13.1681 | 0.4 | 591.716 | 1 |  | 34389800000000.0 |  | 1.65063 | 0.25 | 0.0523649 | 0.0534661 |  |  | 0.0 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 1 | 0.0466667 | 15.0743 | 0.6 | 0.0 | 0 |  | 2992050000000000.0 |  | 5.11831 | 0.607143 | 0.0 | 0.21984 |  |  | 0.0 |
| MB07 | 2024-09-07 14:49:00 | 0 | 2024-09-07 14:49:00 | 2024-09-07 14:54:00 | 250 | 2 | 0.446667 | 6.57527 | 0.6 | 250.613 | 45 |  | 3021.65 | 0.0416452 | 2.15045 | 0.94403 | 0.37037 | 0.38304 |  |  | 1.0 |

Empty cells are null values. The metrics are defined as follows:

- `firing_rate`: firing rate of the unit (Hz)
- `snr`: signal-to-noise ratio of the unit
- `presence_ratio`: fraction of time in which spikes are present
- `isi_violation`: rate of ISI violations as a fraction of the overall rate
- `number_violation`: total number of ISI violations
- `amplitude_cutoff`: estimate of the miss rate based on the amplitude histogram
- `isolation_distance`: distance to the nearest cluster in Mahalanobis space
- `d_prime`: classification accuracy based on LDA
- `nn_hit_rate`: fraction of neighbors of the target cluster that are also in the target cluster
- `nn_miss_rate`: fraction of neighbors outside the target cluster that are in the target cluster
- `silhouette_score`: standard metric for cluster overlap
- `max_drift`: maximum change in spike depth throughout the recording
- `cumulative_drift`: cumulative change in spike depth throughout the recording
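A common next step is to keep only units that pass a few metric thresholds. The sketch applies illustrative cut-offs to the values copied from the table above. The thresholds are assumptions chosen for the example, not recommendations; pick criteria appropriate for your data.

```python
# Metrics copied from the QualityMetrics.Cluster table above
units = [
    {"unit": 0, "snr": 13.1681, "isi_violation": 591.716, "presence_ratio": 0.4},
    {"unit": 1, "snr": 15.0743, "isi_violation": 0.0, "presence_ratio": 0.6},
    {"unit": 2, "snr": 6.57527, "isi_violation": 250.613, "presence_ratio": 0.6},
]

# Illustrative thresholds only
criteria = {
    "snr": lambda v: v >= 5.0,
    "isi_violation": lambda v: v <= 0.5,
    "presence_ratio": lambda v: v >= 0.5,
}


def passes(row: dict) -> bool:
    """True if the unit satisfies every criterion."""
    return all(check(row[name]) for name, check in criteria.items())


kept = [row["unit"] for row in units if passes(row)]
```

With these cut-offs only unit 1 is kept. The same filter can be expressed as DataJoint restrictions, e.g. `ephys.QualityMetrics.Cluster & task_key & "snr >= 5" & "isi_violation <= 0.5"`, and `.fetch(format="frame")` returns the result as a pandas DataFrame.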


### **Next Steps**

Now that you have run the full ephys pipeline, you can:
- Visualize clustering results in `EXPLORE` notebooks
- Analyze extracted spikes and firing rates
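As a starting point for analyzing firing rates, a unit's spike times (in seconds, relative to the start of the recording) can be binned into a rate over time. A minimal sketch with an assumed 10 s bin width; `binned_rate` is a hypothetical helper. With the pipeline loaded, spike times for one unit can be fetched with `(ephys.CuratedClustering.Unit & task_key & {"unit": 2}).fetch1("spike_times")` and the result plotted with `plt.plot(centres, rate)`.

```python
import numpy as np


def binned_rate(spike_times_s: np.ndarray, duration_s: float, bin_s: float = 10.0):
    """Return bin centres (s) and firing rate (Hz) in fixed-width bins."""
    edges = np.arange(0.0, duration_s + bin_s, bin_s)
    counts, edges = np.histogram(spike_times_s, bins=edges)
    centres = (edges[:-1] + edges[1:]) / 2
    # Convert spike counts per bin into spikes per second
    return centres, counts / bin_s
```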