## Quality metrics and Curation modules

In this workshop, we will look at how to curate the output of a spike-sorting analysis using the `curation` and `qualitymetrics` modules.

The dataset we will be using is a cerebellar cortex recording (cut down to 5 minutes and 26 channels) analyzed with Kilosort 2.

In [None]:
from pathlib import Path
import numpy as np

import spikeinterface.core as si
import spikeinterface.curation as scur
import spikeinterface.preprocessing as spre
import spikeinterface.qualitymetrics as sqm
import spikeinterface.widgets as sw

In [None]:
%matplotlib widget

In [None]:
base_folder = Path("../../SpikeInterface Dataset Tutorial/")
curation_dataset = base_folder / "dataset_curation"

In [None]:
recording = si.load_extractor(curation_dataset / "curation_recording")
sorting = si.load_extractor(curation_dataset / "curation_sorting")

print(recording)
print(sorting)

Before analyzing our output, we can perform a fast curation:
- Remove any duplicated spikes (spikes happening less than 0.3 ms apart)
- Remove excess spikes (Kilosort sometimes outputs spikes happening outside the recording bounds)
- Remove redundant units (high fraction of shared spikes)
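
To make the first step concrete, here is a minimal NumPy sketch of what removing duplicated spikes means for a single unit. It is an illustration under assumptions, not SpikeInterface's implementation: the function name `remove_duplicated_spikes_sketch` is hypothetical, times are in milliseconds rather than samples, and the "keep first, iteratively" behaviour is my reading of the `keep_first_iterative` method name.

```python
import numpy as np


def remove_duplicated_spikes_sketch(spike_times_ms, censored_period_ms=0.3):
    """Keep a spike only if it is at least `censored_period_ms` after the last kept spike.

    Hypothetical helper for illustration; works on the spike train of one unit.
    """
    spike_times_ms = np.sort(np.asarray(spike_times_ms, dtype=float))
    kept = []
    for t in spike_times_ms:
        # The comparison is against the last *kept* spike, so a burst of
        # near-simultaneous detections collapses onto its first spike.
        if not kept or t - kept[-1] >= censored_period_ms:
            kept.append(float(t))
    return np.array(kept)


spikes = [10.0, 10.1, 10.25, 10.45, 25.0, 25.2, 40.0]
cleaned = remove_duplicated_spikes_sketch(spikes)
print(cleaned)
```

In this toy train, the spikes at 10.1, 10.25 and 25.2 ms are dropped because they fall less than 0.3 ms after a kept spike.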

In [None]:
sorting = scur.remove_duplicated_spikes(sorting, censored_period_ms=0.3, method="keep_first_iterative")
sorting = scur.remove_excess_spikes(sorting, recording)
sorting = scur.remove_redundant_units(sorting, align=False, remove_strategy="max_spikes")
sorting

We still have 52 units (there are no redundant units in this dataset), but probably not all of them are really good!

Let's create a `SortingAnalyzer` to start looking at our data:

In [None]:
recording_f = spre.bandpass_filter(recording, freq_min=120, freq_max=8000, filter_order=2, ftype="bessel")

analyzer = si.create_sorting_analyzer(sorting, recording_f, format="memory", sparse=False)
analyzer.compute({
    'noise_levels': {},
    'random_spikes': {'max_spikes_per_unit': 1_000},
    'templates': {'ms_before': 1.5, 'ms_after': 3.5},
    'spike_amplitudes': {},
    'correlograms': {'bin_ms': 0.5}
})

Let's look at the most basic metric: the average firing rate (in Hz) of our units:

In [None]:
sqm.compute_firing_rates(analyzer)

We can see that the firing rate varies a lot, with some units below 0.5 Hz (probably bad units) and some units above 100 Hz (not uncommon for Purkinje cell simple spikes).

We can compute all of SpikeInterface's quality metrics with the following command:

In [None]:
quality_metrics = sqm.compute_quality_metrics(analyzer)
quality_metrics.head()

As we can see, there are a lot of metrics (some containing redundant information).
For demonstration purposes, we will focus on 4 of those metrics (which I use all the time):
- `firing_rate`: The mean firing rate (in Hz), i.e. the total number of spikes divided by the duration of the recording. This helps a lot for classifying units (knowing the cell type) and for finding aberrant units.
- `snr`: The signal-to-noise ratio (amplitude of the spike divided by the noise level). A low SNR (< 3) is usually problematic.
- `rp_contamination`: An estimate of the contamination (i.e. $\frac{FP}{TP+FP}$) obtained from the refractory period violations. It assumes that the contaminant spikes occur at random.
- `sd_ratio`: The ratio between the standard deviation of the spike amplitudes and the noise level. Under the assumption that all spikes have the same shape, this ratio should be $1.0$. Several safeguards are included to remove the effects of drift, bursting, etc.
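
The definitions above can be sketched on toy numbers. This is only an illustration of the formulas, not how SpikeInterface computes the metrics: all values below are made up, and in real data the noise level and the number of false positives are unknown and have to be estimated (for example, from refractory period violations for `rp_contamination`).

```python
import numpy as np

rng = np.random.default_rng(0)

# firing_rate: number of spikes divided by the recording duration.
duration_s = 300.0  # 5 minutes, as in this dataset
n_spikes = 4500     # made-up spike count
firing_rate = n_spikes / duration_s

# snr: spike amplitude divided by the noise level (made-up values, in µV).
noise_level = 10.0
template_peak = -80.0
snr = abs(template_peak) / noise_level

# Contamination: fraction of the unit's spikes that are false positives.
n_true_positive = 950
n_false_positive = 50
contamination = n_false_positive / (n_true_positive + n_false_positive)

# sd_ratio: if every spike has the same shape, the spread of the spike
# amplitudes comes only from the noise, so the ratio is close to 1.
spike_amplitudes = template_peak + rng.normal(0.0, noise_level, size=n_spikes)
sd_ratio = spike_amplitudes.std() / noise_level

print(f"firing_rate = {firing_rate:.1f} Hz")
print(f"snr = {snr:.1f}")
print(f"contamination = {contamination:.2f}")
print(f"sd_ratio = {sd_ratio:.2f}")
```

A `sd_ratio` well above 1 would mean the amplitudes vary more than the noise alone explains, which is what happens when spikes from several neurons are mixed in one unit.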

In [None]:
quality_metrics = quality_metrics[["firing_rate", "snr", "rp_contamination", "sd_ratio"]]
quality_metrics

From having looked at the dataset extensively, I know which units are very good, and which are acceptable:

In [None]:
good_unit_ids = np.array([3, 13, 19, 34, 39, 40, 41], dtype=np.int32)
ok_unit_ids = np.array([11, 18, 22, 51], dtype=np.int32)

In [None]:
quality_metrics.loc[good_unit_ids]

In [None]:
quality_metrics.loc[ok_unit_ids]

Looking at the metrics on the good units, we can create rules to only keep units that are of sufficient quality. For example:
- A `firing_rate` greater than 1.0 Hz
- A `snr` greater than 1.1
- A `rp_contamination` below 20%
- A `sd_ratio` below 1.5

In [None]:
rule = "firing_rate > 1.0 & snr > 1.1 & rp_contamination < 0.2 & sd_ratio < 1.5"
good_metrics = quality_metrics.query(rule)

curated_unit_ids = list(good_metrics.index)
print(curated_unit_ids)
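
To see how such a rule string is evaluated, here is a small sketch on a made-up table (the values in `toy_metrics` are invented for illustration). Inside `DataFrame.query`, `&` combines the comparisons with the precedence of `and`, so no parentheses are needed; the equivalent boolean mask does need them.

```python
import pandas as pd

# Made-up metrics for four hypothetical units (index = unit id).
toy_metrics = pd.DataFrame(
    {
        "firing_rate": [0.3, 12.0, 45.0, 8.0],
        "snr": [4.0, 6.5, 0.9, 5.0],
        "rp_contamination": [0.05, 0.02, 0.10, 0.60],
        "sd_ratio": [1.1, 1.0, 1.2, 2.3],
    },
    index=[0, 1, 2, 3],
)

rule = "firing_rate > 1.0 & snr > 1.1 & rp_contamination < 0.2 & sd_ratio < 1.5"
kept_by_query = toy_metrics.query(rule)

# Same rule written as an explicit boolean mask.
mask = (
    (toy_metrics["firing_rate"] > 1.0)
    & (toy_metrics["snr"] > 1.1)
    & (toy_metrics["rp_contamination"] < 0.2)
    & (toy_metrics["sd_ratio"] < 1.5)
)
kept_by_mask = toy_metrics[mask]

print(list(kept_by_query.index))  # [1]
```

Unit 0 fails on firing rate, unit 2 on SNR, and unit 3 on contamination and `sd_ratio`. Note that a comparison with `NaN` is false, so a unit whose metric could not be computed is also dropped by this kind of rule.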

In [None]:
curated_sorting = sorting.select_units(curated_unit_ids)
curated_analyzer = analyzer.select_units(curated_unit_ids)

curated_sorting

We removed half of the units in the Kilosort output! (we started with 52).

The metrics and thresholds used will, of course, depend on the recording type, and need to be tuned.
After tuning, this gives a powerful automated curation that is not perfect, but removes a lot of the obviously garbage units.

The `curation` module also offers a function to find potentially split units, which you can inspect before deciding whether to merge them.

In [None]:
pairs = scur.get_potential_auto_merge(curated_analyzer)
pairs

We can see that the merge function found a pair that is potentially a good merge.

We can check it by plotting the correlograms and templates:

In [None]:
for pair in pairs:
    sw.plot_crosscorrelograms(analyzer, unit_ids=pair, min_similarity_for_correlograms=None, backend="matplotlib")
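
What to look for in these plots can be illustrated with a toy example. The sketch below is not SpikeInterface code: `cross_correlogram` is a hypothetical helper, and the spike train is simulated with a 3 ms refractory period and then randomly split into two "units". Because both halves come from the same neuron, their cross-correlogram is empty around zero lag, which is the signature of a split unit.

```python
import numpy as np


def cross_correlogram(times_a_ms, times_b_ms, window_ms=10.0, bin_ms=0.5):
    """Histogram of the lags (b - a) that fall within +/- `window_ms`.

    Hypothetical helper; uses all pairs, which is fine for small toy trains.
    """
    times_a_ms = np.asarray(times_a_ms, dtype=float)
    times_b_ms = np.asarray(times_b_ms, dtype=float)
    lags = (times_b_ms[None, :] - times_a_ms[:, None]).ravel()
    lags = lags[np.abs(lags) <= window_ms]
    n_bins = int(round(2 * window_ms / bin_ms))
    edges = np.linspace(-window_ms, window_ms, n_bins + 1)
    counts, _ = np.histogram(lags, bins=edges)
    return counts, edges


rng = np.random.default_rng(42)

# Simulated neuron: every inter-spike interval is at least 3 ms.
refractory_ms = 3.0
isis = refractory_ms + rng.exponential(scale=20.0, size=2000)
spike_times = np.cumsum(isis)

# Randomly split the spikes into two "units", as an over-split sorter might.
assigned_to_a = rng.random(spike_times.size) < 0.5
counts, edges = cross_correlogram(spike_times[assigned_to_a], spike_times[~assigned_to_a])

# Bins lying entirely inside the refractory period around zero lag.
center = (edges[:-1] >= -refractory_ms) & (edges[1:] <= refractory_ms)
print("counts inside refractory window:", counts[center].sum())
print("counts outside refractory window:", counts[~center].sum())
```

Two unrelated neurons would not show this dip: their spikes can fall arbitrarily close to each other, so the central bins would be populated like the others.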

In [None]:
sparsity_for_plot = si.estimate_sparsity(recording_f, sorting)

In [None]:
# `pair` is still the last pair from the loop above
sw.plot_unit_templates(analyzer, unit_ids=pair, sparsity=sparsity_for_plot, unit_colors={pair[0]: "r", pair[1]: "b"}, backend="ipywidgets")

Indeed, the correlograms and templates seem to match!

We can thus create a script to merge the units together:

In [None]:
curation_sorting = scur.CurationSorting(curated_sorting)
curation_sorting.merge(pairs)

In [None]:
curation_sorting.sorting

## Exercise

In the `get_potential_auto_merge` function, two important parameters are:
- `corr_diff_thresh` (0.16 by default): the maximum accepted difference between the correlograms.
- `template_diff_thresh` (0.25 by default): the maximum accepted difference between the templates.

Increase these values to see if you can find new pairs that are potential merges, and check them.