# DataJoint Workflow Guide for Creating a New Clustering Parameter Set


This notebook shows how to optionally define a new parameter set for the `array-ephys` analysis within the broader DataJoint pipeline, so that spike sorting can be tailored to the requirements of a specific electrophysiology processing workflow.


**_Note:_**

- The examples in this notebook use a sample dataset. Replace these entries with your actual database entries to access and analyze your data.


### **Key Steps**


- **Setup**
- **Step 1: Create and Insert a New `paramset_idx` (Optional)**


### **Setup**


First, import the packages and schemas that the data pipeline needs.


In [1]:
## Ensure the correct working directory

import os

if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")

In [2]:
import datajoint as dj

In [3]:
from workflow.pipeline import ephys

[2024-07-16 16:07:15,141][INFO]: Connecting milagros@db.datajoint.com:3306
[2024-07-16 16:07:16,732][INFO]: Connected milagros@db.datajoint.com:3306


#### **Step 1: Create and Insert a New `paramset_idx` (Optional)**


The `ephys.ClusteringParamSet` table stores the parameters passed to SpikeInterface. Each entry is identified by a `paramset_idx`, which selects the parameter set used for spike sorting.


The `ephys.ClusteringMethod` table lists the available sorters supported by SpikeInterface:

In [4]:
ephys.ClusteringMethod()

clustering_method,clustering_method_desc
combinato,
hdsort,
herdingspikes,
ironclust,
kilosort,
kilosort2,
kilosort2.5,
kilosort3,
kilosort4,
klusta,


In [5]:
ephys.ClusteringParamSet()

paramset_idx,clustering_method,paramset_desc,param_set_hash,params  dictionary of all applicable parameters
0,spykingcircus2,Default parameters for spyking circus2 using SpikeInterface v0.100.1,b6fb9ec2-768c-66b0-2b71-9b8ac91e94da,=BLOB=
1,spykingcircus2,Default parameter set for spyking circus2 using SpikeInterface v0.101.*,434894d0-eb7b-db6c-80e6-638a1322c568,=BLOB=
2,kilosort2,kilosort2 with SpikeInterface version 0.101+,79a731f3-f1b6-c110-5f8a-e25227464de7,=BLOB=


To insert a new parameter set based on an existing configuration, fetch the default paramset (e.g., `paramset_idx = 1`) as a dictionary, apply your modifications, update the description, and insert the result into the `ClusteringParamSet` table under a new, unique `paramset_idx`.


In [10]:
param_dict = (ephys.ClusteringParamSet & "paramset_idx = 1").fetch1()
param_dict

{'paramset_idx': 1,
 'clustering_method': 'spykingcircus2',
 'paramset_desc': 'Default parameter set for spyking circus2 using SpikeInterface v0.101.*',
 'param_set_hash': UUID('434894d0-eb7b-db6c-80e6-638a1322c568'),
 'params': {'SI_PREPROCESSING_METHOD': 'organoid_preprocessing',
  'SI_SORTING_PARAMS': {'general': {'ms_before': 2,
    'ms_after': 2,
    'radius_um': 100},
   'filtering': {'freq_min': 150},
   'detection': {'peak_sign': 'neg', 'detect_threshold': 4},
   'selection': {'method': 'smart_sampling_amplitudes',
    'n_peaks_per_channel': 5000,
    'min_n_peaks': 20000,
    'select_per_channel': False},
   'clustering': {'legacy': False},
   'matching': {'method': 'circus-omp-svd', 'method_kwargs': {}},
   'apply_preprocessing': True,
   'cache_preprocessing': {'mode': 'memory',
    'memory_limit': 0.5,
    'delete_cache': True},
   'multi_units_only': False,
   'job_kwargs': {'n_jobs': 0.8},
   'debug': False},
  'SI_POSTPROCESSING_PARAMS': {'extensions': {'random_spikes': 

##### Sample parameter dictionary

The dictionary is expected to contain `SI_SORTING_PARAMS`, `SI_PREPROCESSING_METHOD`, `SI_QUALITY_METRICS_PARAMS`, and `SI_JOB_KWARGS`:

- `SI_SORTING_PARAMS`: Run `si.sorters.get_default_sorter_params(sorter_name)` to get the default parameters for a sorter, and modify values as needed. If empty, the sorter runs with its default parameters.
- `SI_PREPROCESSING_METHOD`: Select a preprocessing function from `si_preprocessing.py`.
- `SI_WAVEFORM_EXTRACTION_PARAMS`: Waveform extraction parameters. If empty, the default parameters are used.
- `SI_QUALITY_METRICS_PARAMS`: Quality metric parameters. If empty, the default parameters are used.
- `SI_JOB_KWARGS`: Sorter job parameters. If empty, the default parameters are used.
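
Before inserting a parameter set, it can help to check which of the expected top-level keys are present. The following is a minimal sketch using only plain Python. `missing_keys` and `EXPECTED_KEYS` are hypothetical names, not part of DataJoint or the pipeline, and the key list is simply copied from the text above; which keys are actually required may depend on your pipeline version.

```python
def missing_keys(params, expected_keys):
    """Return the expected top-level keys that are absent from `params`."""
    return [key for key in expected_keys if key not in params]


# Key names taken from the description above (an assumption about what
# the pipeline expects; adjust to match your installation).
EXPECTED_KEYS = [
    "SI_SORTING_PARAMS",
    "SI_PREPROCESSING_METHOD",
    "SI_QUALITY_METRICS_PARAMS",
    "SI_JOB_KWARGS",
]

# A small stand-in dictionary, not a real paramset from the database.
candidate = {
    "SI_PREPROCESSING_METHOD": "CatGT",
    "SI_SORTING_PARAMS": {},  # empty: the sorter's defaults would be used
}

absent = missing_keys(candidate, EXPECTED_KEYS)
print(absent)
```

An empty result means every listed key is present. A non-empty result is only a prompt to double-check, since keys left out may fall back to defaults.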


In [12]:
# Modify the default paramset with the new param value/s, `paramset_idx`, and `paramset_desc`
param_dict["params"]["SI_POSTPROCESSING_PARAMS"]["extensions"]["template_metrics"][
    "include_multi_channel_metrics"
] = False
param_dict["paramset_idx"] = 101
param_dict["paramset_desc"] = (
    "Spyking circus2 using SpikeInterface v0.101.* and `include_multi_channel_metrics=False`"
)
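
The cell above edits the fetched dictionary in place, so the original values are no longer available for comparison. As an alternative, here is a minimal sketch that applies the same kind of nested change to a deep copy. `set_nested` is a hypothetical helper, not part of DataJoint or the pipeline, and the dictionary is a small stand-in for the fetched paramset.

```python
import copy


def set_nested(d, keys, value):
    """Return a deep copy of `d` with the value at the nested `keys` path replaced.

    Raises KeyError if an intermediate key in the path does not exist.
    """
    out = copy.deepcopy(d)
    node = out
    for key in keys[:-1]:
        node = node[key]
    node[keys[-1]] = value
    return out


# Stand-in for the dictionary returned by `fetch1()`.
original = {
    "paramset_idx": 1,
    "params": {
        "SI_POSTPROCESSING_PARAMS": {
            "extensions": {
                "template_metrics": {"include_multi_channel_metrics": True}
            }
        }
    },
}

path = [
    "params",
    "SI_POSTPROCESSING_PARAMS",
    "extensions",
    "template_metrics",
    "include_multi_channel_metrics",
]
modified = set_nested(original, path, False)
modified["paramset_idx"] = 101  # top-level keys can be set directly on the copy
```

Because `original` is untouched, the two dictionaries can be compared side by side before anything is inserted.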

In [13]:
# Confirm the new paramset before inserting
param_dict

{'paramset_idx': 101,
 'clustering_method': 'spykingcircus2',
 'paramset_desc': 'Spyking circus2 using SpikeInterface v0.101.* and `include_multi_channel_metrics=False`',
 'param_set_hash': UUID('434894d0-eb7b-db6c-80e6-638a1322c568'),
 'params': {'SI_PREPROCESSING_METHOD': 'organoid_preprocessing',
  'SI_SORTING_PARAMS': {'general': {'ms_before': 2,
    'ms_after': 2,
    'radius_um': 100},
   'filtering': {'freq_min': 150},
   'detection': {'peak_sign': 'neg', 'detect_threshold': 4},
   'selection': {'method': 'smart_sampling_amplitudes',
    'n_peaks_per_channel': 5000,
    'min_n_peaks': 20000,
    'select_per_channel': False},
   'clustering': {'legacy': False},
   'matching': {'method': 'circus-omp-svd', 'method_kwargs': {}},
   'apply_preprocessing': True,
   'cache_preprocessing': {'mode': 'memory',
    'memory_limit': 0.5,
    'delete_cache': True},
   'multi_units_only': False,
   'job_kwargs': {'n_jobs': 0.8},
   'debug': False},
  'SI_POSTPROCESSING_PARAMS': {'extensions': 

In [14]:
# Insert it into the `ephys.ClusteringParamSet` table
ephys.ClusteringParamSet.insert_new_params(
    clustering_method=param_dict["clustering_method"],
    paramset_idx=param_dict["paramset_idx"],
    paramset_desc=param_dict["paramset_desc"],
    params=param_dict["params"],
)

In [15]:
# Check if the new paramset has been inserted
ephys.ClusteringParamSet & "paramset_idx = 101"

paramset_idx,clustering_method,paramset_desc,param_set_hash,params  dictionary of all applicable parameters
101,spykingcircus2,Spyking circus2 using SpikeInterface v0.101.* and `include_multi_channel_metrics=False`,fd4eb67f-5784-a6ae-6cd8-25a429cad653,=BLOB=
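
Note that the `param_set_hash` of the new entry differs from that of `paramset_idx = 1`, which suggests the hash is derived from the contents of `params`. The sketch below illustrates the general idea of content-based hashing: identical parameters give the same identifier, and any changed value gives a different one. `params_to_uuid` is a hypothetical function written for this illustration. It is not the pipeline's implementation and will not reproduce the hashes shown above.

```python
import hashlib
import json
import uuid


def params_to_uuid(params):
    """Map a (possibly nested) parameter dictionary to a deterministic UUID.

    Keys are sorted during serialization, so the result does not depend
    on the order in which keys were added to the dictionary.
    """
    canonical = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return uuid.UUID(hex=digest)


# Stand-in parameter dictionaries.
base = {"detection": {"peak_sign": "neg", "detect_threshold": 4}}
same = {"detection": {"detect_threshold": 4, "peak_sign": "neg"}}
changed = {"detection": {"peak_sign": "neg", "detect_threshold": 10}}

print(params_to_uuid(base) == params_to_uuid(same))
print(params_to_uuid(base) == params_to_uuid(changed))
```

A content-derived hash of this kind makes it possible to detect when the same parameters are being registered a second time under a different `paramset_idx`.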


Another option is to use the default parameter set from `SpykingCircus2` and modify the dictionary as desired:


In [16]:
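import spikeinterface.sorters as ss

# Get SpikeInterface's default sorting parameters for this sorter
params_spykingcircus2 = ss.get_default_sorter_params("spykingcircus2")

# In the paramset printed above, `detect_threshold` is nested under "detection"
params_spykingcircus2["detection"]["detect_threshold"] = 10
params_spykingcircus2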

In [17]:
params = {}
params["SI_PREPROCESSING_METHOD"] = "CatGT"
params["SI_SORTING_PARAMS"] = params_spykingcircus2
params["SI_POSTPROCESSING_PARAMS"] = {
    "extensions": {
        "random_spikes": {},
        "waveforms": {},
        "templates": {},
        "noise_levels": {},
        # "amplitude_scalings": {},
        "correlograms": {},
        "isi_histograms": {},
        "principal_components": {"n_components": 5, "mode": "by_channel_local"},
        "spike_amplitudes": {},
        "spike_locations": {},
        "template_metrics": {"include_multi_channel_metrics": True},
        "template_similarity": {},
        "unit_locations": {},
        "quality_metrics": {},
    },
    "job_kwargs": {"n_jobs": 0.8, "chunk_duration": "2s"},
    "export_to_phy": True,
    "export_report": True,
}

In [19]:
# ephys.ClusteringParamSet.insert_new_params(
#     clustering_method='spykingcircus2',
#     paramset_idx=102,
#     paramset_desc="spykingcircus2 with SpikeInterface version 0.101+ and `detect_threshold=10`",
#     params=params,
# )