Skip to content

Latest commit

 

History

History
85 lines (51 loc) · 3.71 KB

nearest_neighbor.rst

File metadata and controls

85 lines (51 loc) · 3.71 KB

Nearest Neighbor Metrics (nn_hit_rate, nn_miss_rate, nn_isolation, nn_noise_overlap)

Calculation

There are several implementations of nearest neighbor metrics which can be used to evaluate unit quality.

When calling the compute_quality_metrics() function, the following options are available to calculate NN metrics:

  • The nearest_neighbor option will return nn_hit_rate and nn_miss_rate (based on [Siegle] inspired by [Chung]).
  • The nn_isolation option will return the nearest neighbor isolation metric (adapted from [Chung]).
  • The nn_noise_overlap option will return the nearest neighbor isolation metric (adapted from [Chung]).

All options involve non-parametric calculations in PCA space.

nearest_neighbor

The membership function, ρ is defined such that for any spike gi in some cluster :math:`G, ρ(gi) = G. Additionally, the nearest neighbour function nk(gi) is defined such that the output of the function is the set of k spikes which are closest to gi.

For a unit associated with cluster C, a subset of spikes are randomly drawn to form the cluster A. A subset of spikes which are not in C are drawn to form the cluster B. Note that |A| = |B|. The NN-hit rate for C is then:

$$NN_{\textrm{hit}}(C) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in A : \rho(n_i(x)) = A \} |}{ | A | }$$

Similarly, the NN-miss rate for C is:

$$NN_{\textrm{miss}}(C) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in B : \rho(n_i(x)) = A \} |}{ | B | }$$

NN-hit rate gives an estimate of contamination (an uncontaminated unit should have a high NN-hit rate). NN-miss rate gives an estimate of completeness. A more complete unit should have a low NN-miss rate.

nn_isolation

The overall logic of this approach is to choose a cluster for which the isolation is to be computed, and compute the pairwise isolation score between the chosen cluster and every other cluster. The isolation score is then the minimum of the pairwise scores (the worst case).

Let A and B be two clusters from sorting. We set |A| = |B| by subsampling as appropriate to match the size of the smaller cluster (or the max_spikes_for_nn parameter value, if using). We also restrict the waveforms to channels with significant signal.

The pairwise isolation between clusters A and B is then:

$$NN_{\textrm{isolation}}(A, B) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in A \cup B : \rho(n_i(x)) = \rho(x) \} |}{ | A \cup B | }$$

Note that nn_isolation is affected by the size of the clusters, so setting the max_spikes_for_nn may aid downstream comparison of scores.

nn_noise_overlap

A noise cluster is generated by randomly sampling voltage snippets from the recording. Following a similar procedure to that of the nn_isolation method, compute isolation between the cluster of interest and the generated noise cluster. noise overlap is then 1 − NNisolation.

This metric gives an indication of the contamination present in the unit cluster.

References

spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_metrics

spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_isolation

spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_noise_overlap

Literature

Introduced by [Chung] and adapted by [Siegle] and Kyu Hyun Lee.