diff --git a/doc/modules/qualitymetrics/example_cutoff.png b/doc/modules/qualitymetrics/example_cutoff.png new file mode 100644 index 0000000000..91743d80c2 Binary files /dev/null and b/doc/modules/qualitymetrics/example_cutoff.png differ diff --git a/doc/modules/qualitymetrics/noise_cutoff.rst b/doc/modules/qualitymetrics/noise_cutoff.rst index 429c0b94b3..af15571cf2 100644 --- a/doc/modules/qualitymetrics/noise_cutoff.rst +++ b/doc/modules/qualitymetrics/noise_cutoff.rst @@ -1,30 +1,128 @@ -Noise cutoff (not currently implemented) -======================================== +Noise cutoff (:code:`noise_cutoff`) +=================================== Calculation ----------- -Metric describing whether an amplitude distribution is cut off, similar to _amp_cutoff :ref:`amplitude cutoff ` but without a Gaussian assumption. -A histogram of amplitudes is created and quantifies the distance between the low tail, mean number of spikes and high tail in terms of standard deviations. +Metric describing whether an amplitude distribution is cut off as it approaches zero, similar to :ref:`amplitude cutoff ` but without a Gaussian assumption. -A SpikeInterface implementation is not yet available. +The **noise cutoff** metric assesses whether a unit’s spike‐amplitude distribution is truncated +at the low-end, which may be due to the high amplitude detection threshold in the deconvolution step, +i.e., if low‐amplitude spikes were missed. It does not assume a Gaussian shape; +instead, it directly compares counts in the low‐amplitude bins to counts in high‐amplitude bins. + +1. **Build a histogram** + + For each unit, divide all amplitudes into ``n_bins`` equally spaced bins over the range of the amplitude. + If the number of spikes is large, you may consider using a larger ``n_bins``. For a small number of spikes, consider a smaller ``n_bins``. + Let :math:`n_i` denote the count in the :math:`i`-th bin. + +2. **Identify the “low” region** + - Compute the amplitude value at the specified ``low_quantile`` (for example, 0.10 = 10th percentile), denoted as :math:`\text{amp}_{low}`. + - Find all histogram bins whose upper edge is below that quantile value. These bins form the "low‐quantile region". + - Compute + + .. math:: + L_{\mathrm{bins}} = \bigl\{i : \text{upper_edge}_i \le \text{amp}_{low}\bigr\}, \quad + \mu_{\mathrm{low}} = \frac{1}{|L_{\mathrm{bins}}|}\sum_{i\in L_{\mathrm{bins}}} n_i. + +3. **Identify the “high” region** + + - Compute the amplitude value at the specified ``high_quantile`` (for example, 0.25 = top 25th percentile), denoted as :math:`\text{amp}_{high}`. + - Find all histogram bins whose lower edge is greater than that quantile value. These bins form the "high‐quantile region". + - Compute + + .. math:: + H_{\mathrm{bins}} &= \bigl\{i : \text{lower_edge}_i \ge \text{amp}_{high}\bigr\}, \\ + \mu_{\mathrm{high}} &= \frac{1}{|H_{\mathrm{bins}}|}\sum_{i\in H_{\mathrm{bins}}} n_i, \quad + \sigma_{\mathrm{high}} = \sqrt{\frac{1}{|H_{\mathrm{bins}}|-1}\sum_{i\in H_{\mathrm{bins}}}\bigl(n_i-\mu_{\mathrm{high}} \bigr)^2}. + +4. **Compute cutoff** + + The *cutoff* is given by how many standard deviations away the low-amplitude bins are from the high-amplitude bins, defined as + + .. math:: + \mathrm{cutoff} = \frac{\mu_{\mathrm{low}} - \mu_{\mathrm{high}}}{\sigma_{\mathrm{high}}}. + + + - If no low‐quantile bins exist, a warning is issued and ``cutoff = NaN``. + - If no high‐quantile bins exist or :math:`\sigma_{\mathrm{high}} = 0`, a warning is issued and ``cutoff = NaN``. + +5. **Compute the low-to-peak ratio** + + - Let :math:`M = \max_i\,n_i` be the height of the largest bin in the histogram. + - Define + + .. math:: + \mathrm{ratio} = \frac{\mu_{\mathrm{low}}}{M}. + + + - If there are no low bins, :math:`\mathrm{ratio} = NaN`. + + +Together, ``(cutoff, ratio)`` quantify how suppressed the low‐end of the amplitude distribution is relative to the top quantile and to the peak. Expectation and use ------------------- Noise cutoff attempts to describe whether an amplitude distribution is cut off. -The metric is loosely based on [Hill]_'s amplitude cutoff, but is here adapted (originally by [IBL]_) to avoid making the Gaussianity assumption on spike distributions. -Noise cutoff provides an estimate of false negative rate, so a lower value indicates fewer missed spikes (a more complete unit). +Larger values of ``cutoff`` and ``ratio`` suggest that the distribution is cut-off. +IBL uses the default value of 1 (equivalent to e.g. ``low_quantile=0.01, n_bins=100``) to choose the number of +lower bins, with a suggested threshold of 5 for ``cutoff`` to determine whether a unit is cut off or not. +In practice, the IBL threshold is quite conservative, and a lower threshold might work better for your data. +We suggest plotting the data using the :py:func:`~spikeinterface.widgets.plot_amplitudes` widget to view your data when choosing your threshold. +It is suggested to use this metric when the amplitude histogram is **unimodal**. + +The metric is loosely based on [Hill]_'s amplitude cutoff, but is here adapted (originally by [IBL2024]_) to avoid making the Gaussian assumption on spike distributions. + +Example code +------------ + +.. code-block:: python + + import numpy as np + import matplotlib.pyplot as plt + from spikeinterface.full as si + + # Suppose `sorting_analyzer` has been computed with spike amplitudes: + # Select units you are interested in visualizing + unit_ids = ... + + # Compute noise_cutoff: + summary_dict = compute_noise_cutoff( + sorting_analyzer=sorting_analyzer + high_quantile=0.25, + low_quantile=0.10, + n_bins=100, + unit_ids=unit_ids + ) + +Reference +--------- + +.. autofunction:: spikeinterface.qualitymetrics.misc_metrics.compute_noise_cutoff + +Examples with plots +------------------- + +Here is shown the histogram of two units, with the vertical lines separating low- and high-amplitude regions. + +- On the left, we have a unit with no truncation at the left end, and the cutoff and ratio are small. +- On the right, we have a unit with truncation at -1, and the cutoff and ratio are much larger. +.. image:: example_cutoff.png + :width: 600 Links to original implementations --------------------------------- * From `IBL implementation `_ +Note: Compared to the original implementation, we have added a comparison between the low-amplitude bins to the largest bin (``noise_ratio``). +The selection of low-amplitude bins is based on the ``low_quantile`` rather than the number of bins. Literature ---------- -Metric introduced by [IBL]_ (adapted from [Hill]_'s amplitude cutoff metric). +Metric introduced by [IBL2024]_. diff --git a/doc/references.rst b/doc/references.rst index b583df82b0..10048a9d91 100644 --- a/doc/references.rst +++ b/doc/references.rst @@ -123,6 +123,8 @@ References .. [IBL] `Spike sorting pipeline for the International Brain Laboratory. 2022. `_ +.. [IBL2024] `Spike sorting pipeline for the International Brain Laboratory - Version 2. 2024. `_ + .. [Jackson] `Quantitative assessment of extracellular multichannel recording quality using measures of cluster separation. Society of Neuroscience Abstract. 2005. `_ .. [Jain] `UnitRefine: A Community Toolbox for Automated Spike Sorting Curation. 2025 `_ diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py index a2a04a656a..e2102d3ff2 100644 --- a/src/spikeinterface/qualitymetrics/misc_metrics.py +++ b/src/spikeinterface/qualitymetrics/misc_metrics.py @@ -37,6 +37,167 @@ _default_params = dict() +def compute_noise_cutoffs(sorting_analyzer, high_quantile=0.25, low_quantile=0.1, n_bins=100, unit_ids=None): + """ + A metric to determine if a unit's amplitude distribution is cut off as it approaches zero, without assuming a Gaussian distribution. + + Based on the histogram of the (transformed) amplitude: + + 1. This method compares counts in the lower-amplitude bins to counts in the top 'high_quantile' of the amplitude range. + It computes the mean and std of an upper quantile of the distribution, and calculates how many standard deviations away + from that mean the lower-quantile bins lie. + + 2. The method also compares the counts in the lower-amplitude bins to the count in the highest bin and return their ratio. + + Parameters + ---------- + sorting_analyzer : SortingAnalyzer + A SortingAnalyzer object. + high_quantile : float, default: 0.25 + Quantile of the amplitude range above which values are treated as "high" (e.g. 0.25 = top 25%), the reference region. + low_quantile : int, default: 0.1 + Quantile of the amplitude range below which values are treated as "low" (e.g. 0.1 = lower 10%), the test region. + n_bins: int, default: 100 + The number of bins to use to compute the amplitude histogram. + unit_ids : list or None + List of unit ids to compute the amplitude cutoffs. If None, all units are used. + + Returns + ------- + noise_cutoff_dict : dict of floats + Estimated metrics based on the amplitude distribution, for each unit ID. + + References + ---------- + Inspired by metric described in [IBL2024]_ + + """ + res = namedtuple("cutoff_metrics", ["noise_cutoff", "noise_ratio"]) + if unit_ids is None: + unit_ids = sorting_analyzer.unit_ids + + noise_cutoff_dict = {} + noise_ratio_dict = {} + if not sorting_analyzer.has_extension("spike_amplitudes"): + warnings.warn( + "`compute_noise_cutoffs` requires the 'spike_amplitudes` extension. Please run sorting_analyzer.compute('spike_amplitudes') to be able to compute `noise_cutoff`" + ) + for unit_id in unit_ids: + noise_cutoff_dict[unit_id] = np.nan + noise_ratio_dict[unit_id] = np.nan + return res(noise_cutoff_dict, noise_ratio_dict) + + amplitude_extension = sorting_analyzer.get_extension("spike_amplitudes") + peak_sign = amplitude_extension.params["peak_sign"] + if peak_sign == "both": + raise TypeError( + '`peak_sign` should either be "pos" or "neg". You can set `peak_sign` as an argument when you compute spike_amplitudes.' + ) + + amplitudes_by_units = _get_amplitudes_by_units(sorting_analyzer, unit_ids, peak_sign) + + for unit_id in unit_ids: + amplitudes = amplitudes_by_units[unit_id] + + # We assume the noise (zero values) is on the lower tail of the amplitude distribution. + # But if peak_sign == 'neg', the noise will be on the higher tail, so we flip the distribution. + if peak_sign == "neg": + amplitudes = -amplitudes + + cutoff, ratio = _noise_cutoff(amplitudes, high_quantile=high_quantile, low_quantile=low_quantile, n_bins=n_bins) + noise_cutoff_dict[unit_id] = cutoff + noise_ratio_dict[unit_id] = ratio + + return res(noise_cutoff_dict, noise_ratio_dict) + + +_default_params["noise_cutoff"] = dict(high_quantile=0.25, low_quantile=0.1, n_bins=100) + + +def _noise_cutoff(amps, high_quantile=0.25, low_quantile=0.1, n_bins=100): + """ + A metric to determine if a unit's amplitude distribution is cut off as it approaches zero, without assuming a Gaussian distribution. + + Based on the histogram of the (transformed) amplitude: + + 1. This method compares counts in the lower-amplitude bins to counts in the higher_amplitude bins. + It computes the mean and std of an upper quantile of the distribution, and calculates how many standard deviations away + from that mean the lower-quantile bins lie. + + 2. The method also compares the counts in the lower-amplitude bins to the count in the highest bin and return their ratio. + + Parameters + ---------- + amps : array-like + Spike amplitudes. + high_quantile : float, default: 0.25 + Quantile of the amplitude range above which values are treated as "high" (e.g. 0.25 = top 25%), the reference region. + low_quantile : int, default: 0.1 + Quantile of the amplitude range below which values are treated as "low" (e.g. 0.1 = lower 10%), the test region. + n_bins: int, default: 100 + The number of bins to use to compute the amplitude histogram. + + Returns + ------- + cutoff : float + (mean(lower_bins_count) - mean(high_bins_count)) / std(high_bins_count) + ratio: float + mean(lower_bins_count) / highest_bin_count + + """ + n_per_bin, bin_edges = np.histogram(amps, bins=n_bins) + + maximum_bin_height = np.max(n_per_bin) + + low_quantile_value = np.quantile(amps, q=low_quantile) + + # the indices for low-amplitude bins + low_indices = np.where(bin_edges[1:] <= low_quantile_value)[0] + + high_quantile_value = np.quantile(amps, q=1 - high_quantile) + + # the indices for high-amplitude bins + high_indices = np.where(bin_edges[:-1] >= high_quantile_value)[0] + + if len(low_indices) == 0: + warnings.warn( + "No bin is selected to test cutoff. Please increase low_quantile. Setting noise cutoff and ratio to NaN" + ) + return np.nan, np.nan + + # compute ratio between low-amplitude bins and the largest bin + low_counts = n_per_bin[low_indices] + mean_low_counts = np.mean(low_counts) + ratio = mean_low_counts / maximum_bin_height + + if len(high_indices) == 0: + warnings.warn( + "No bin is selected as the reference region. Please increase high_quantile. Setting noise cutoff to NaN" + ) + return np.nan, ratio + + if len(high_indices) == 1: + warnings.warn( + "Only one bin is selected as the reference region, and thus the standard deviation cannot be computed. " + "Please increase high_quantile. Setting noise cutoff to NaN" + ) + return np.nan, ratio + + # compute cutoff from low-amplitude and high-amplitude bins + high_counts = n_per_bin[high_indices] + mean_high_counts = np.mean(high_counts) + std_high_counts = np.std(high_counts) + if std_high_counts == 0: + warnings.warn( + "All the high-amplitude bins have the same size. Please consider changing n_bins. " + "Setting noise cutoff to NaN" + ) + return np.nan, ratio + + cutoff = (mean_low_counts - mean_high_counts) / std_high_counts + return cutoff, ratio + + def compute_num_spikes(sorting_analyzer, unit_ids=None, **kwargs): """ Compute the number of spike across segments. diff --git a/src/spikeinterface/qualitymetrics/quality_metric_list.py b/src/spikeinterface/qualitymetrics/quality_metric_list.py index f7411f6376..89fea6a25e 100644 --- a/src/spikeinterface/qualitymetrics/quality_metric_list.py +++ b/src/spikeinterface/qualitymetrics/quality_metric_list.py @@ -18,6 +18,7 @@ compute_firing_ranges, compute_amplitude_cv_metrics, compute_sd_ratio, + compute_noise_cutoffs, ) from .pca_metrics import ( @@ -51,6 +52,7 @@ "firing_range": compute_firing_ranges, "drift": compute_drift_metrics, "sd_ratio": compute_sd_ratio, + "noise_cutoff": compute_noise_cutoffs, } # a dict converting the name of the metric for computation to the output of that computation @@ -81,6 +83,7 @@ "nn_noise_overlap": ["nn_noise_overlap"], "silhouette": ["silhouette"], "silhouette_full": ["silhouette_full"], + "noise_cutoff": ["noise_cutoff", "noise_ratio"], } # this dict allows us to ensure the appropriate dtype of metrics rather than allow Pandas to infer them @@ -116,4 +119,6 @@ "nn_noise_overlap": float, "silhouette": float, "silhouette_full": float, + "noise_cutoff": float, + "noise_ratio": float, } diff --git a/src/spikeinterface/qualitymetrics/tests/test_metrics_functions.py b/src/spikeinterface/qualitymetrics/tests/test_metrics_functions.py index f35ad0861c..db3e55b629 100644 --- a/src/spikeinterface/qualitymetrics/tests/test_metrics_functions.py +++ b/src/spikeinterface/qualitymetrics/tests/test_metrics_functions.py @@ -43,6 +43,7 @@ compute_quality_metrics, ) +from spikeinterface.qualitymetrics.misc_metrics import _noise_cutoff from spikeinterface.core.basesorting import minimum_spike_dtype @@ -50,6 +51,21 @@ job_kwargs = dict(n_jobs=2, progress_bar=True, chunk_duration="1s") +def test_noise_cutoff(): + """ + Generate two artifical gaussian, one truncated and one not. Check the metrics are higher for the truncated one. + """ + np.random.seed(1) + amps = np.random.normal(0, 1, 1000) + amps_trunc = amps[amps > -1] + + cutoff1, ratio1 = _noise_cutoff(amps=amps) + cutoff2, ratio2 = _noise_cutoff(amps=amps_trunc) + + assert cutoff1 <= cutoff2 + assert ratio1 <= ratio2 + + def test_compute_new_quality_metrics(small_sorting_analyzer): """ Computes quality metrics then computes a subset of quality metrics, and checks