# Spike sorting

Ready to move on? Let's try to figure out what we have in terms of action potentials. Extracting action potentials from multielectrode voltage traces, and assigning them to specific neurons, has long been more art than science. One reason is that ground truth is generally not available, so it has been difficult to quantify whether one algorithm performs better than another. In addition, different spike sorters have traditionally required input data in slightly different formats, and presented their results in different formats. Format conversion is not hard, but it is unpleasant enough that unbiased, quantitative comparisons have been rare. Finally, spike sorting is slow, so running lots of spike sorters on your data requires a real commitment of time.

Faster computers and the publication of a generalized interface to spike sorting have improved the situation recently. In this exercise, you will feed a section of our data into one modern spike sorter, then compare your results with other students who used different sorters.

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cheninstitutecaltech/Caltech_DATASAI_Neuroscience_23/blob/main/07_13_23_day4_adapting_preprocessing_data/code/diy_notebooks/colab/DataSAI_Wagenaar_Sorting.ipynb)

## Installing the sorter and the generalized interface

The wrapper software is called "spikeinterface" (https://elifesciences.org/articles/61834) and is just one pip away:

In [None]:
!pip install spikeinterface

It comes pre-configured with just a few spike sorters:

In [None]:
import spikeinterface.sorters as ss
ss.installed_sorters()

but we can easily install several more:

In [None]:
!pip install herdingspikes
!pip install mountainsort5

The standard invocation for importing spikeinterface is a little elaborate:

In [None]:
import spikeinterface as si
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre
import spikeinterface.sorters as ss
import spikeinterface.postprocessing as spost
import spikeinterface.qualitymetrics as sqm
import spikeinterface.comparison as sc
import spikeinterface.exporters as sexp
import spikeinterface.widgets as sw
from probeinterface import Probe
from probeinterface.plotting import plot_probe

import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path


## Loading data into spikeinterface

Loading raw data into spikeinterface is straightforward, though I had to jump through some hoops to make it load our SALPA-preprocessed data. Check out the "SI Hoops" notebook for details. It also teaches you how to tell spikeinterface about the geometry of the Neuropixels probe.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
!ls /content/drive/MyDrive/datasai-daw

In [None]:
root = "/content/drive/MyDrive/datasai-daw/data/2021-07-20_11-59-01"
src = Path(root) / "Record Node 115"

To load the raw data, you would do:

    rec = se.read_openephys(src, stream_name="Record Node 115#Neuropix-PXI-111.0")

However, we will read the pre-processed data:

In [None]:
rec = si.load_extractor(src / "salpa")

This data set is an hour long, so spike sorting can take many hours. For the purposes of this tutorial, we will work with a subset of the data:

In [None]:
fs_Hz = rec.get_sampling_frequency() # sampling rate in Hz
rec_sub = rec.frame_slice(start_frame=0, end_frame=int(5.0*60*fs_Hz)) # grab 5 minutes

That's still 6.5 GB of data, so feel free to experiment with an even shorter snippet. However, too short a snippet will make the sorters produce unreliable output.
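To judge how short a snippet you can afford, it helps to estimate its size up front. The sketch below is plain arithmetic. `snippet_size_bytes` is a hypothetical helper, and the channel count, sampling rate and sample width are assumptions for illustration (typical for a Neuropixels recording stored as 16-bit integers), not values read from our file. On the real recording you can query `rec.get_num_channels()`, `rec.get_sampling_frequency()` and `rec.get_dtype()` instead.

```python
def snippet_size_bytes(duration_s, fs_Hz, n_channels, bytes_per_sample=2):
    """Uncompressed size of a recording snippet, in bytes."""
    n_frames = int(round(duration_s * fs_Hz))  # frames must be whole numbers
    return n_frames * n_channels * bytes_per_sample

# Assumed values, for illustration only
assumed_fs_Hz = 30_000.0
assumed_n_channels = 384

for minutes in (1, 2, 5):
    size = snippet_size_bytes(minutes * 60, assumed_fs_Hz, assumed_n_channels)
    print(f"{minutes} min: {size / 2**30:.2f} GiB")
```

Under these assumptions, size scales linearly with duration, so halving the snippet halves both the memory footprint and, roughly, the amount of data the sorter has to chew through.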

Choose one of the installed sorters:

In [None]:
ss.installed_sorters()

and educate yourself on the available parameters for that sorter:

In [None]:
ss.get_default_sorter_params('herdingspikes') # or 'mountainsort5', etc.

It is worth reading the sorter's documentation to see what it says about these parameters. Especially important are options that allow you to use more than one CPU or GPU core. Also, make sure that your Colab runtime has a GPU and lots of memory.
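Keyword arguments that a sorter does not recognise are a likely source of errors when you launch it. One way to catch them early is to compare your overrides against the defaults before starting a long run. The sketch below is plain Python: `check_overrides` is a hypothetical helper, and the `example_defaults` dictionary is made up for illustration. In the notebook you would pass the dictionary returned by `ss.get_default_sorter_params(...)` instead.

```python
def check_overrides(defaults, overrides):
    """Merge overrides into a copy of defaults, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"Parameters not in the defaults: {unknown}")
    merged = dict(defaults)  # copy, so the defaults stay untouched
    merged.update(overrides)
    return merged

# Made-up defaults, standing in for ss.get_default_sorter_params(...)
example_defaults = {"filter": True, "detect_threshold": 5.0}

params = check_overrides(example_defaults, {"filter": False})
print(params)

# A misspelled or unsupported parameter name is rejected up front
try:
    check_overrides(example_defaults, {"filtering": False})
except KeyError as err:
    print("Rejected:", err)
```

Checking against the defaults of the sorter you actually picked matters because parameter names differ between sorters: an option that one sorter accepts may not exist in another.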

Next, set a destination folder:

In [None]:
dst = Path("/content/drive/MyDrive")

and run *one* of the following:

In [None]:
sorting_hs = ss.run_sorter("herdingspikes", rec_sub, output_folder=dst / 'res_slp_hs', verbose=True, filter=False)

In [None]:
sorting_ms = ss.run_sorter("mountainsort5", rec_sub, output_folder=dst / 'res_slp_ms', verbose=True, filter=False)

In [None]:
sorting_tri = ss.run_sorter("tridesclous2", rec_sub, output_folder=dst / 'res_slp_tri', verbose=True, filter=False)

In [None]:
sorting_sc2 = ss.run_sorter("spykingcircus2", rec_sub, output_folder=dst / 'res_slp_sc2', verbose=True, filter=False)

You may well run into a few errors. That's OK. Resolving those is part of the exercise. But don't bang your head against any brick walls. Ask for help instead!
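Once a sorter does finish, a quick sanity check is to count the units it found and look at their firing rates. The sketch below is plain Python operating on a dictionary of spike times expressed in frames. `firing_rates` is a hypothetical helper and the spike trains are made-up numbers. With a real sorting object, the dictionary can be built from `sorting.get_unit_ids()` and `sorting.get_unit_spike_train(unit_id=...)`.

```python
def firing_rates(spike_trains, fs_Hz, n_frames):
    """Mean firing rate (spikes per second) of each unit over the snippet."""
    duration_s = n_frames / fs_Hz
    return {unit: len(frames) / duration_s
            for unit, frames in spike_trains.items()}

# Made-up spike trains (in frames) for two units, for illustration only
example_fs_Hz = 30_000.0
example_n_frames = 60_000  # a 2-second snippet at 30 kHz
spike_trains = {
    0: [1_200, 15_000, 31_000, 58_000],
    1: [9_000, 45_000],
}

rates = firing_rates(spike_trains, example_fs_Hz, example_n_frames)
for unit, rate in sorted(rates.items()):
    print(f"unit {unit}: {len(spike_trains[unit])} spikes, {rate:.1f} Hz")
```

Units with implausibly high or near-zero rates are worth a closer look before you compare your results with those from other sorters.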