# Creating a New Clustering Parameter Set in the Utah Organoids Pipeline


### **Overview**

This notebook guides users through the process of **defining and inserting a new clustering parameter set** for spike sorting within the Utah Organoids pipeline.

A parameter set (`paramset_idx`) stores the settings used for **SpikeInterface sorting algorithms** (e.g., Kilosort, SpykingCircus). Defining a **custom parameter set** allows for **tuned spike sorting** based on experimental needs.

By the end of this notebook, you will:
- Inspect available sorting methods and parameter sets
- Modify default parameters or define a new configuration
- Insert a new ClusteringParamSet entry into the database



**_Note:_**

- This notebook uses example data, replace values with actual database entries.
- You can modify existing parameters or create a new one from scratch.

### **Key Steps**


**- Setup**

**- Step 1: Inspect Available Parameter Sets**

**- Step 2: Define and Insert a New Parameter Set (Optional)**


### **Setup**


First, import the necessary packages for the data pipeline and essential schemas.


In [1]:
## Ensure the correct working directory

import os

if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")

In [2]:
import datajoint as dj

In [3]:
from workflow.pipeline import ephys

[2025-03-03 12:44:48,440][INFO]: Connecting milagros@db.datajoint.com:3306
[2025-03-03 12:44:50,107][INFO]: Connected milagros@db.datajoint.com:3306


#### **Step 1: Inspect Available Parameter Sets**


Before defining a new parameter set, check the existing clustering methods and stored parameter sets.

View available spike sorting methods supported by SpikeInterface:

In [4]:
ephys.ClusteringMethod()

clustering_method,clustering_method_desc
combinato,
hdsort,
herdingspikes,
ironclust,
kilosort,
kilosort2,
kilosort2.5,
kilosort3,
kilosort4,
klusta,


Check existing parameter sets:


In [5]:
ephys.ClusteringParamSet()

paramset_idx,clustering_method,paramset_desc,param_set_hash,params  dictionary of all applicable parameters
0,spykingcircus2,Default parameters for spyking circus2 using SpikeInterface v0.100.1,b6fb9ec2-768c-66b0-2b71-9b8ac91e94da,=BLOB=
1,spykingcircus2,Default parameter set for spyking circus2 using SpikeInterface v0.101.*,434894d0-eb7b-db6c-80e6-638a1322c568,=BLOB=
2,kilosort2,kilosort2 with SpikeInterface version 0.101+,79a731f3-f1b6-c110-5f8a-e25227464de7,=BLOB=
5,spykingcircus2,Spyking circus2 with a detection threshold 5 (neg direction),4c895afd-a1b1-5d64-b747-e8489078e2e3,=BLOB=
11,spykingcircus2,waveform>threshold: .25->2,17d41d84-067d-791c-8706-8cab83020b84,=BLOB=
12,spykingcircus2,waveform>threshold: .25->2 attempt 2,2b28cf23-2456-8202-b70f-96871b837a26,=BLOB=
13,spykingcircus2,waveform>threshold: .25->2 attempt 2,1faf6aee-71d6-fe26-74ec-6bb7cdc0f30f,=BLOB=
14,spykingcircus2,apply_preprocessing = False,ce720015-b59a-08d6-198e-def81c860f46,=BLOB=
15,spykingcircus2,"apply_preprocessing, matched_filtering, and apply_motion_correction = False",5f7a8362-c31c-061e-14b2-74ad55466546,=BLOB=
16,spykingcircus2,"default parameters, different format",0a3d0360-c0de-6c30-9c35-7c931a9a6f62,=BLOB=


Filter for specific sorting methods (e.g., Kilosort):

In [6]:
ephys.ClusteringParamSet & 'clustering_method LIKE "%kilosort%"'

paramset_idx,clustering_method,paramset_desc,param_set_hash,params  dictionary of all applicable parameters
2,kilosort2,kilosort2 with SpikeInterface version 0.101+,79a731f3-f1b6-c110-5f8a-e25227464de7,=BLOB=
23,kilosort3,default kilosort3 with no drift correction,35b4975e-704f-b1a8-1648-5b89086a711c,=BLOB=
24,kilosort2,default kilosort2,512b734c-0b8d-8833-1e2b-b3967e8652de,=BLOB=
25,pykilosort,default pykilosort,d492d2c0-f5f8-5523-f4c6-f6e821fb77ae,=BLOB=
27,kilosort3,kilosort3 skipping kilosort preprocessing,d74142fc-5a80-d7f4-dff2-607342b0c5c6,=BLOB=
250,kilosort2.5,kilosort2.5 params,00eb514d-f8c8-543b-572a-2ec7fec0acf3,=BLOB=
400,kilosort4,Kilosort4 default params with SpikeInterface version 0.101+,85dd2fa2-9e7c-7984-9d3c-dc24264c432a,=BLOB=
401,kilosort4,Kilosort4 default params with SpikeInterface version 0.101+ without drift correction,75a2f1d3-b077-78bb-3a68-b8456644a9f1,=BLOB=


### **Step 2: Define and Insert a New Parameter Set (Optional)**

A new parameter set can be defined based on a modification of an existing set or a custom configuration from default sorter parameters.

#### Option 1: Modify an Existing Parameter Set

To modify an existing parameter set, fetch an existing configuration (e.g., `paramset_idx = 400`), update its values, and insert it as a new entry.

In [7]:
# Fetch an existing parameter set
param_dict = (ephys.ClusteringParamSet & "paramset_idx = 400").fetch1()

# Modify specific parameters
param_dict["params"]["SI_SORTING_PARAMS"]["do_correction"] = False
param_dict["paramset_idx"] = 401
param_dict["paramset_desc"] = (
    "Kilosort4 default params with SpikeInterface version 0.101+ without drift correction"
)

# Remove hash before inserting as a new entry
param_dict.pop("param_set_hash")

# Confirm changes
param_dict

{'paramset_idx': 401,
 'clustering_method': 'kilosort4',
 'paramset_desc': 'Kilosort4 default params with SpikeInterface version 0.101+ without drift correction',
 'params': {'SI_PREPROCESSING_METHOD': 'organoid_preprocessing',
  'SI_SORTING_PARAMS': {'scaleproc': 200,
   'n_pcs': 3,
   'do_CAR': False,
   'skip_kilosort_preprocessing': True,
   'keep_good_only': True,
   'do_correction': False},
  'SI_POSTPROCESSING_PARAMS': {'extensions': {'random_spikes': {},
    'waveforms': {},
    'templates': {},
    'noise_levels': {},
    'correlograms': {},
    'isi_histograms': {},
    'principal_components': {'n_components': 5, 'mode': 'by_channel_local'},
    'spike_amplitudes': {},
    'spike_locations': {},
    'template_metrics': {'include_multi_channel_metrics': True},
    'template_similarity': {},
    'unit_locations': {},
    'quality_metrics': {}},
   'job_kwargs': {'n_jobs': 10, 'chunk_duration': '2s'},
   'export_to_phy': True,
   'export_report': True}}}

##### Sample parameter dictionary. 

It expects to have `SI_SORTING_PARAMS`, `SI_PREPROCESSING_METHOD`, `SI_QUALITY_METRICS_PARAMS`, `SI_JOB_KWARGS`

- `SI_SORTING_PARAMS`: Run `si.sorters.get_default_sorter_params(sorter_name)` to get the default parameter for a sorter. Modify values if needed. If empty, the sorter will be run with the default parameter.

- `SI_PREPROCESSING_METHOD`: Select a preprocesesing function from `si_preprocessing.py`
- `SI_WAVEFORM_EXTRACTION_PARAMS`: Waveform extraction parameters. If empty, the sorter will use the default parameter.
- `SI_QUALITY_METRICS_PARAMS`: Quality metric parameters. If empty, the sorter will use the default parameter.
- `SI_JOB_KWARGS`: Sorter job parameters. If empty, the sorter will use the default parameter.


Insert modified parameter set into the database:

In [8]:
# Insert it to the `ephys.ClusteringParamSet` table
ephys.ClusteringParamSet.insert_new_params(
    clustering_method=param_dict["clustering_method"],
    paramset_idx=param_dict["paramset_idx"],
    paramset_desc=param_dict["paramset_desc"],
    params=param_dict["params"],
)

Verify insertion:

In [9]:
# Check if the new paramset has been inserted
ephys.ClusteringParamSet & "paramset_idx = 401"

paramset_idx,clustering_method,paramset_desc,param_set_hash,params  dictionary of all applicable parameters
401,kilosort4,Kilosort4 default params with SpikeInterface version 0.101+ without drift correction,75a2f1d3-b077-78bb-3a68-b8456644a9f1,=BLOB=


#### Option 2: Create a New Parameter Set from Default Values

Alternatively, you can create a new parameter set from default sorter settings.

Get default sorting parameters for `SpykingCircus2`:

In [11]:
import spikeinterface as si

params_spykingcircus2 = si.sorters.get_default_sorter_params("spykingcircus2")
params_spykingcircus2


{'general': {'ms_before': 2, 'ms_after': 2, 'radius_um': 100},
 'sparsity': {'method': 'snr',
  'amplitude_mode': 'peak_to_peak',
  'threshold': 0.25},
 'filtering': {'freq_min': 150,
  'freq_max': 7000,
  'ftype': 'bessel',
  'filter_order': 2},
 'detection': {'peak_sign': 'neg', 'detect_threshold': 4},
 'selection': {'method': 'uniform',
  'n_peaks_per_channel': 5000,
  'min_n_peaks': 100000,
  'select_per_channel': False,
  'seed': 42},
 'apply_motion_correction': True,
 'motion_correction': {'preset': 'nonrigid_fast_and_accurate'},
 'merging': {'similarity_kwargs': {'method': 'cosine',
   'support': 'union',
   'max_lag_ms': 0.2},
  'correlograms_kwargs': {},
  'auto_merge': {'min_spikes': 10, 'corr_diff_thresh': 0.25}},
 'clustering': {'legacy': True},
 'matching': {'method': 'wobble'},
 'apply_preprocessing': True,
 'matched_filtering': True,
 'cache_preprocessing': {'mode': 'memory',
  'memory_limit': 0.5,
  'delete_cache': True},
 'multi_units_only': False,
 'job_kwargs': {'n_j

Modify specific parameters:

In [None]:
params_spykingcircus2["detect_threshold"] = 10  # Adjust detection threshold


In [12]:
params = {}
params["SI_PREPROCESSING_METHOD"] = "CatGT"
params["SI_SORTING_PARAMS"] = params_spykingcircus2
params["SI_POSTPROCESSING_PARAMS"] = {
    "extensions": {
        "random_spikes": {},
        "waveforms": {},
        "templates": {},
        "noise_levels": {},
        # "amplitude_scalings": {},
        "correlograms": {},
        "isi_histograms": {},
        "principal_components": {"n_components": 5, "mode": "by_channel_local"},
        "spike_amplitudes": {},
        "spike_locations": {},
        "template_metrics": {"include_multi_channel_metrics": True},
        "template_similarity": {},
        "unit_locations": {},
        "quality_metrics": {},
    },
    "job_kwargs": {"n_jobs": 0.8, "chunk_duration": "2s"},
    "export_to_phy": True,
    "export_report": True,
}

Insert the new parameter set into the database:

In [13]:
ephys.ClusteringParamSet.insert_new_params(
    clustering_method="spykingcircus2",
    paramset_idx=102,
    paramset_desc="spykingcircus2 with SpikeInterface version 0.101+ and `detect_threshold=10`",
    params=params,
)

Verify the new parameter set:

In [14]:
ephys.ClusteringParamSet & "paramset_idx = 102"


paramset_idx,clustering_method,paramset_desc,param_set_hash,params  dictionary of all applicable parameters
102,spykingcircus2,spykingcircus2 with SpikeInterface version 0.101+ and `detect_threshold=10`,8a9e5403-fd6e-fa93-1486-212f57613b64,=BLOB=


### **Next Steps**

Now that the parameter set is created, you can:
- Assign it to a clustering task (`CREATE_new_clustering_task.ipynb`)
- Run spike sorting with the new parameter set (`RUN` notebook)
