# Sorting Notebook

This notebook downloads and sorts electrophysiology data recorded with an Intan headstage and stored in the .rhd format.

The data are intracranial mouse recordings from a 16-channel microarray. The paper can be found here: https://doi.org/10.1371/journal.pone.0221510


# Getting Set Up

Open a terminal and make sure the `sorter` conda environment is active.

```
conda deactivate
conda activate sorter
```
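
As an optional check from inside the notebook, conda exports the name of the active environment in the `CONDA_DEFAULT_ENV` variable. A minimal sketch, assuming the environment is named `sorter` as in the command above; the helper name `is_env_active` is hypothetical:

```python
import os


def is_env_active(expected, environ=None):
    """Return True if the active conda environment is named `expected`."""
    environ = os.environ if environ is None else environ
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") == expected


if not is_env_active("sorter"):
    print("Warning: the 'sorter' conda environment does not appear to be active.")
```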


In [3]:
# Imports
import json
import os
from pathlib import Path

import probeinterface as pi
import spikeinterface.full as si
import spikeinterface.sorters as ss
from spikeinterface.curation import apply_curation
from spikeinterface.sorters import run_sorter

from sorting_scripts import get_file


In [4]:
# Set Patient and Session
patient = "raw_intan"
session = "Session1"

In [5]:
# Set base paths
codespace = Path.home() / "codespace"
base_folder = codespace / "data"
session_location =  base_folder / patient / session
sorted_data = session_location / "sorted"
sorter_output_folder = sorted_data / "sorter_folder" 

analyzer_folder = sorted_data / "analyzer_folder"

os.chdir(session_location)

intan_file = get_file.get_rhd_file(session_location)

1
Found Intan file: /home/marco/codespace/data/raw_intan/Session1/raw/Intan RHD file1.rhd
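
`get_file.get_rhd_file` comes from the local `sorting_scripts` package, and its implementation is not shown here. A hypothetical stand-in, assuming it searches the session folder recursively for a single `.rhd` file (as the path in the output above suggests), might look like this; the name `find_rhd_file` is illustrative only:

```python
from pathlib import Path


def find_rhd_file(session_location):
    """Return the single .rhd file found under session_location."""
    # Hypothetical stand-in for sorting_scripts.get_file.get_rhd_file.
    matches = sorted(Path(session_location).rglob("*.rhd"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"Expected exactly one .rhd file under {session_location}, found {len(matches)}"
        )
    print(f"Found Intan file: {matches[0]}")
    return matches[0]
```

Failing loudly when zero or several files match avoids silently sorting the wrong recording.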


# Load recording into spike interface

In [7]:
# Load Recording, creates recording object in memory
import spikeinterface.full as si
rec = si.read_intan(intan_file, stream_id = "0")
rec

In [8]:
# Attach probe to recording object

# from probeinterface.plotting import plot_probe, plot_probegroup

probe_path = codespace / "sorting_script/Custom_Probes/neuronexus-A16x1_2mm_50_177_A16.json"

# Load from JSON
probegroup = pi.read_probeinterface(probe_path)

# Extract the single Probe for SpikeInterface
probe = probegroup.probes[0]

# Attach to recording
rec = rec.set_probe(probe)

n_rec = rec.get_num_channels()
n_probe = probe.get_contact_count()

if n_probe != n_rec:
    raise ValueError(f"Probe contacts ({n_probe}) != recording channels ({n_rec}). "
                     f"Pick the correct probe variant or subset/remap accordingly.")


# Load sorter and analyzer if they exist

In [None]:
# Load sorting object from sorting directory
sorting_KS4 = ss.read_sorter_folder(sorter_output_folder)

In [13]:
# Load analyzer object from analyzer directory
sorting_analyzer = si.load_sorting_analyzer(analyzer_folder)
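
The choice between this section and the next can be made explicit by checking whether the output folders exist. A minimal sketch, assuming a non-empty folder means the corresponding step has already run; the helper `folder_has_content` is hypothetical:

```python
from pathlib import Path


def folder_has_content(folder):
    """Return True if folder exists and contains at least one entry."""
    folder = Path(folder)
    return folder.is_dir() and any(folder.iterdir())


# Usage in this notebook (paths come from the "Set base paths" cell):
# if folder_has_content(sorter_output_folder) and folder_has_content(analyzer_folder):
#     ...run the two loading cells above
# else:
#     ...run the sorter and analyzer cells in the next section
```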



# If they do not exist, run the sorter and create analyzer

In [9]:
# Run Kilosort, in order to create sorting object as well as sorting folder

sorting_KS4 = run_sorter(
    sorter_name="kilosort4",
    recording=rec,
    folder=sorter_output_folder,
    remove_existing_folder = True,
    verbose = True
)

write_binary_recording (no parallelization):   0%|          | 0/1201 [00:00<?, ?it/s]

kilosort.run_kilosort:  
kilosort.run_kilosort: Computing preprocessing variables.
kilosort.run_kilosort: ----------------------------------------
kilosort.run_kilosort: N samples: 24000480
kilosort.run_kilosort: N seconds: 1200.024
kilosort.run_kilosort: N batches: 401
kilosort.run_kilosort: Preprocessing filters computed in 0.97s; total 0.97s
kilosort.run_kilosort:  
kilosort.run_kilosort: Resource usage after preprocessing
kilosort.run_kilosort: ********************************************************
kilosort.run_kilosort: CPU usage:     6.70 %
kilosort.run_kilosort: Mem used:      4.70 %     |       2.93 GB
kilosort.run_kilosort: Mem avail:    59.17 / 62.10 GB
kilosort.run_kilosort: ------------------------------------------------------
kilosort.run_kilosort: GPU usage:    `conda install pynvml` for GPU usage
kilosort.run_kilosort: GPU memory:    1.86 %     |      0.27   /    14.58 GB
kilosort.run_kilosort: Allocated:     0.06 %     |      0.01   /    14.58 GB
kilosort.run_kilosor

kilosort4 run time 44.52s
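
The counts in this log can be cross-checked by hand. The sketch below assumes the 20 kHz sampling rate reported later in this notebook, Kilosort4's default batch size of 60000 samples, and SpikeInterface's default chunk duration of 1 s; the last two are assumptions about library defaults, not values read from this run.

```python
import math

n_samples = 24_000_480        # "N samples" from the log
sampling_frequency = 20_000   # Hz, as reported by the sorting object later on
batch_size = 60_000           # assumed Kilosort4 default
chunk_duration_s = 1.0        # assumed SpikeInterface default chunk duration

duration_s = n_samples / sampling_frequency
n_batches = math.ceil(n_samples / batch_size)
n_chunks = math.ceil(duration_s / chunk_duration_s)

print(duration_s)  # 1200.024, matches "N seconds"
print(n_batches)   # 401, matches "N batches"
print(n_chunks)    # 1201, matches the write_binary_recording progress bar
```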


In [10]:
# Create Sorting Analyzer
import spikeinterface.full as si

# Load Recording
recording = si.read_intan(intan_file, stream_id = "0")
recording = recording.set_probe(probe, in_place=False)
recording = si.unsigned_to_signed(recording)
recording_filtered = si.bandpass_filter(recording)

job_kwargs = dict(n_jobs=-1, progress_bar=True, chunk_duration="1s")

sorting_analyzer = si.create_sorting_analyzer(
    sorting=sorting_KS4,
    recording=recording_filtered,
    folder=analyzer_folder,
    overwrite=True,
    format="binary_folder",
    **job_kwargs,
)

sorting_analyzer.compute("random_spikes", method="uniform", max_spikes_per_unit=500)
sorting_analyzer.compute("waveforms", **job_kwargs)
sorting_analyzer.compute("templates", **job_kwargs)
sorting_analyzer.compute("noise_levels")
sorting_analyzer.compute("unit_locations", method = "monopolar_triangulation")
sorting_analyzer.compute("isi_histograms")
sorting_analyzer.compute("correlograms", window_ms=100, bin_ms=5)
sorting_analyzer.compute("principal_components", n_components=3, mode="by_channel_global", whiten=True, **job_kwargs)
sorting_analyzer.compute("quality_metrics", metric_names=["snr", "firing_rate"])
sorting_analyzer.compute("template_similarity")
sorting_analyzer.compute("spike_amplitudes", **job_kwargs)

estimate_sparsity (workers: 16 processes):   0%|          | 0/1201 [00:00<?, ?it/s]



compute_waveforms (workers: 16 processes):   0%|          | 0/1201 [00:00<?, ?it/s]

noise_level (no parallelization):   0%|          | 0/20 [00:00<?, ?it/s]

Fitting PCA:   0%|          | 0/28 [00:00<?, ?it/s]

Projecting waveforms:   0%|          | 0/28 [00:00<?, ?it/s]

spike_amplitudes (workers: 16 processes):   0%|          | 0/1201 [00:00<?, ?it/s]

<spikeinterface.postprocessing.spike_amplitudes.ComputeSpikeAmplitudes at 0x731e0ff93790>

# Now that the sorter and analyzer outputs both exist, run the curation GUI from the terminal
## Make sure to replace the path with the correct one


```

sigui --mode=web --curation "/home/marco/codespace/data/raw_intan/Session1/sorted/analyzer_folder"

```
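
To avoid typing the path by hand, the command can be printed from the notebook. A minimal sketch that uses the same `sigui` flags as the command above; `shlex.quote` guards against spaces in the path, and the helper name `sigui_command` is hypothetical:

```python
import shlex
from pathlib import Path


def sigui_command(analyzer_folder):
    """Build the sigui command line for a given analyzer folder."""
    return f"sigui --mode=web --curation {shlex.quote(str(analyzer_folder))}"


print(sigui_command(Path.home() / "codespace/data/raw_intan/Session1/sorted/analyzer_folder"))
```

In this notebook, `sigui_command(analyzer_folder)` would give the command for the current session.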

In [12]:
# Apply curation and save new analyzer to disk
import json
from spikeinterface.curation import apply_curation

curation_filepath = analyzer_folder / "spikeinterface_gui" / "curation_data.json"

with open(curation_filepath, "r") as f:
    curation_dict = json.load(f)

out = sorted_data / "clean_analyzer-12-17"

# Report the target path; the zarr folder carries a ".zarr" suffix.
print("Wrote:", out.with_suffix(".zarr") if out.suffix != ".zarr" else out)

clean_analyzer = apply_curation(sorting_analyzer, curation_dict_or_model=curation_dict)
# save_as raises ValueError if the target folder already exists (see the output below).
clean_analyzer = clean_analyzer.save_as(format="zarr", folder=out)

Wrote: /home/marco/codespace/data/raw_intan/Session1/sorted/clean_analyzer-12-17.zarr


ValueError: Folder already exists /home/marco/codespace/data/raw_intan/Session1/sorted/clean_analyzer-12-17.zarr
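
The `ValueError` above appears when the cell is re-run while the `.zarr` folder from an earlier run is still on disk. One option is to delete the old folder; another, sketched below, is to pick a fresh name. The helper `next_free_folder` is hypothetical and assumes, as the output above shows, that the analyzer is written to the given name with a `.zarr` suffix:

```python
from pathlib import Path


def next_free_folder(base):
    """Return base, or base plus a numeric suffix, for which no .zarr folder exists yet."""
    base = Path(base)
    candidate = base
    counter = 1
    # Check the name with ".zarr" appended, since that is the folder written to disk.
    while Path(f"{candidate}.zarr").exists():
        candidate = base.with_name(f"{base.name}_{counter}")
        counter += 1
    return candidate
```

For example, `out = next_free_folder(sorted_data / "clean_analyzer-12-17")` keeps earlier curated results instead of overwriting them.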

In [13]:
# access the sorting object wrapped by the analyzer, which contains newly updated data
from spikeinterface.core import load_sorting_analyzer
analyzer_obj = load_sorting_analyzer("/home/marco/codespace/data/raw_intan/Session1/sorted/clean_analyzer-12-17.zarr")

sorting_obj = analyzer_obj.sorting

In [14]:
# Check out the contents

unit_ids = sorting_obj.unit_ids
sampling_frequency = sorting_obj.sampling_frequency
print(f"Unit IDs: {unit_ids}")
print(f"Sampling Frequency: {sampling_frequency} Hz")

Unit IDs: [ 0  2  3 27]
Sampling Frequency: 20000.0 Hz


In [15]:
# Check out the spike times for some unit

unit_to_get = unit_ids[1]  # Get the second unit (index 1)

# Spike train as sample indices; segment_index=0 for single-segment data
spike_times = sorting_obj.get_unit_spike_train(unit_id=unit_to_get, segment_index=0)

sampling_frequency = sorting_obj.get_sampling_frequency()

# Convert sample indices to seconds
spike_times_sec = spike_times / sampling_frequency

print(spike_times_sec[:100])



[ 1.325    2.13905  2.49245  3.23335  5.857    6.2363   6.5755  10.94465
 10.9523  10.96305 12.8091  12.8155  13.638   16.91485 16.9198  16.92625
 16.93545 25.22505 25.39205 25.75585 26.06215 26.2456  26.4304  26.43485
 26.44085 26.4478  26.81735 27.0411  27.33275 31.0245  31.2123  31.21775
 31.2304  31.2399  34.58165 35.16525 35.1706  35.85885 35.8665  35.8739
 35.9816  35.98715 35.9972  36.492   43.9117  44.01015 44.07445 44.12115
 44.267   44.3721  49.70565 49.9829  52.3549  52.49505 53.73875 55.34415
 56.27095 57.4014  57.828   58.03445 58.7424  65.6309  66.374   67.8293
 68.1064  68.22225 68.3775  68.5525  68.6454  68.80635 69.10745 69.1138
 69.1243  74.2405  74.3888  75.11585 81.83085 82.23575 82.9987  84.57395
 84.687   84.81035 89.3278  90.34405 90.54505 91.0445  91.27265 91.5672
 91.8034  97.2213  98.70675 98.76985 98.80545 98.87365 98.9727  99.06875
 99.0775  99.0817  99.1407  99.18715]
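
The conversion from sample indices to seconds, and the inter-spike intervals that follow from it, can be illustrated without SpikeInterface. A minimal sketch; the three sample indices are illustrative values chosen to reproduce the first three spike times printed above at the 20 kHz sampling rate, and the function names are hypothetical:

```python
def samples_to_seconds(spike_samples, sampling_frequency):
    """Convert spike sample indices to times in seconds."""
    return [s / sampling_frequency for s in spike_samples]


def inter_spike_intervals(spike_times_sec):
    """Return the differences between consecutive spike times."""
    return [b - a for a, b in zip(spike_times_sec, spike_times_sec[1:])]


times = samples_to_seconds([26500, 42781, 49849], 20000.0)
print(times)
print(inter_spike_intervals(times))
```

Short intervals, such as the clusters around 10.95 s and 16.92 s in the output above, are worth inspecting during curation.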


In [16]:
!box folders:items 352606395707

----- Folder 352606396623 -----
Type: folder
ID: '352606396623'
Sequence ID: '0'
ETag: '0'
Name: Intan_RDH_2000

----- Folder 354522525287 -----
Type: folder
ID: '354522525287'
Sequence ID: '0'
ETag: '0'
Name: Intan_RDH_2000 (1)


In [28]:
!box folders:items 352606396623

----- Folder 352605477299 -----
Type: folder
ID: '352605477299'
Sequence ID: '0'
ETag: '0'
Name: Session1

----- Folder 352604968054 -----
Type: folder
ID: '352604968054'
Sequence ID: '0'
ETag: '0'
Name: Session2


In [29]:
!box folders:items 352605477299

----- Folder 352607353389 -----
Type: folder
ID: '352607353389'
Sequence ID: '0'
ETag: '0'
Name: raw

----- Folder 354623627238 -----
Type: folder
ID: '354623627238'
Sequence ID: '0'
ETag: '0'
Name: sorted


In [30]:
!box folders:upload 354623627238

Could not read directory 354623627238

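
The upload attempt above appears to have failed because `box folders:upload` was given a Box folder ID where it expects a local directory; in the next cell the local path is the argument and the destination ID goes in `-p`. A small sketch that checks the local path before building the command; the helper `box_upload_command` is hypothetical and uses only the command form shown in this notebook:

```python
from pathlib import Path


def box_upload_command(local_folder, parent_id):
    """Return the argument list for uploading local_folder into a Box folder."""
    local_folder = Path(local_folder)
    if not local_folder.is_dir():
        raise NotADirectoryError(f"Not a local directory: {local_folder}")
    return ["box", "folders:upload", str(local_folder), "-p", str(parent_id)]
```

The returned list can be passed to `subprocess.run` in place of the `!box ...` shell line.
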
In [58]:
!box folders:upload "/home/marco/codespace/data/Intan_RDH_2000/Session1/sorted/cleaned_analyzer.zarr" -p 354623627238

Type: folder
ID: '355655922049'
Sequence ID: '0'
ETag: '0'
Name: cleaned_analyzer.zarr
Created At: '2025-12-12T11:40:22-08:00'
Modified At: '2025-12-12T11:40:22-08:00'
Description: ''
Size: 0
Path Collection:
    Total Count: 5
    Entries:
        -
            Type: folder
            ID: '0'
            Sequence ID: null
            ETag: null
            Name: All Files
        -
            Type: folder
            ID: '352606395707'
            Sequence ID: '0'
            ETag: '0'
            Name: Cloud_Sorter
        -
            Type: folder
            ID: '352606396623'
            Sequence ID: '0'
            ETag: '0'
            Name: Intan_RDH_2000
        -