Natural breaks splitting #225

JelleAalbers · 2019-12-27T14:56:48Z

This adds support for variations of Jenks/Fisher natural breaks for splitting peaks, as an alternative to, or replacement for, the local-minimum prominence-finding algorithm.

You can find some background info on Wikipedia and here. In computer vision, when applied to an image intensity histogram, this is known as Otsu's method.

Natural breaks finds a point in the waveform that maximizes the 'goodness of split':

      s(left)  +  s(right)
1  -  --------------------
           s(original)

Here s is the (weighted) sum of squared deviations from the (weighted) mean, and 'left' and 'right' refer to the left and right parts of the waveform after the split. To be fully precise, for a waveform w(t), the (weighted) mean is m = sum[t w(t)]/sum[w(t)] and s = sum[ w(t) (t - m)^2 ].

Values near 1 can be interpreted as strong advice to split the peak; values near 0 as strong advice not to do so.
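As a rough sketch of the formula above (not the code in this PR), the goodness of split can be evaluated at every candidate split point with plain NumPy. The names `weighted_ssd` and `goodness_of_split` are hypothetical, t is taken to be the sample index, and negative samples are clipped to zero before the computation, which is an assumption based on the "Clip waveform to 0" commit.

```python
import numpy as np


def weighted_ssd(w, t):
    """s = sum[w (t - m)^2], with m the w-weighted mean of t."""
    total = w.sum()
    if total <= 0:
        return 0.0
    m = (w * t).sum() / total
    return float((w * (t - m) ** 2).sum())


def goodness_of_split(w):
    """Return gof, where gof[i] is the goodness of splitting w into
    w[:i] and w[i:]. gof[0] stays 0 (no split).

    Naive O(n^2) version for illustration only.
    """
    # Assumption: negative samples carry no weight
    w = np.clip(np.asarray(w, dtype=float), 0, None)
    t = np.arange(len(w), dtype=float)  # assumption: t = sample index
    s_orig = weighted_ssd(w, t)
    gof = np.zeros(len(w))
    if s_orig == 0:
        return gof
    for i in range(1, len(w)):
        s_left = weighted_ssd(w[:i], t[:i])
        s_right = weighted_ssd(w[i:], t[i:])
        gof[i] = 1 - (s_left + s_right) / s_orig
    return gof
```

For a completely flat 8-sample waveform, the best split is in the middle with a value of 1 - 10/42, or about 0.76, which illustrates why the unmodified measure can look fairly high even when there is nothing to split.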

In this implementation, the user supplies a threshold function that depends on the peak area. The algorithm evaluates this function to decide whether to accept the best split and split the peak in two. If it does, we recurse on the two parts until they drop below some minimum area, have no acceptable split left, or we reach a configurable recursion limit.
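The recursion described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the implementation in this PR: `best_split`, `split_recursive`, `example_threshold` and all parameter names are hypothetical, and the threshold values are made up. The threshold is interpolated in log10(area), as one of the commit titles suggests.

```python
import numpy as np


def _ssd(w, t):
    """Weighted sum of squared deviations of t from its weighted mean."""
    total = w.sum()
    if total <= 0:
        return 0.0
    m = (w * t).sum() / total
    return float((w * (t - m) ** 2).sum())


def best_split(w):
    """Return (index, gof) of the best split of w into w[:index], w[index:].

    Returns (0, 0.0) if no split improves on doing nothing.
    On ties, the first (leftmost) index wins.
    """
    t = np.arange(len(w), dtype=float)
    s_orig = _ssd(w, t)
    best_i, best_gof = 0, 0.0
    if s_orig == 0:
        return best_i, best_gof
    for i in range(1, len(w)):
        gof = 1 - (_ssd(w[:i], t[:i]) + _ssd(w[i:], t[i:])) / s_orig
        if gof > best_gof:
            best_i, best_gof = i, gof
    return best_i, best_gof


def example_threshold(area):
    """Hypothetical threshold, interpolated in log10(area)."""
    log_area = np.log10(max(area, 1e-9))
    return np.interp(log_area, [0, 2, 4], [0.95, 0.9, 0.85])


def split_recursive(w, threshold, min_area=0.0, max_depth=3,
                    offset=0, depth=0):
    """Return a list of (start, stop) sample ranges after splitting."""
    w = np.asarray(w, dtype=float)
    whole = [(offset, offset + len(w))]
    area = w.sum()
    # Stop at the recursion limit or below the minimum area
    if depth >= max_depth or area < min_area or len(w) < 2:
        return whole
    i, gof = best_split(w)
    # Stop if the best split is not good enough for this area
    if i == 0 or gof < threshold(area):
        return whole
    return (
        split_recursive(w[:i], threshold, min_area, max_depth,
                        offset, depth + 1)
        + split_recursive(w[i:], threshold, min_area, max_depth,
                          offset + i, depth + 1)
    )
```

For example, `split_recursive([0, 5, 10, 5, 0, 0, 0, 0, 0, 5, 10, 5, 0], example_threshold)` separates the two bumps once and then stops, because neither bump on its own has a split that clears the threshold.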

The code also supports two modified goodness of split functions:

Normalized variance: This replaces s with the usual, i.e. normalized, variance: s = sum[ w(t) (t - m)^2 ] / sum[w(t)]. This is a kind of F-statistic, and has been proposed as a test for bimodality in the past (Larkin, 1979).
Low Split: Suppress splits at high parts of the waveform, by multiplying the goodness of split by one minus [the sum waveform value at the sample where we split] divided by [the maximum sum waveform amplitude]. To make this work for jagged low-energy waveforms, we first apply a ~150-ns square filter, i.e. a moving average, to the waveform.
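The two modifications can be sketched as options on a plain goodness-of-split function. This is an illustration, not the code in this PR, and all names are hypothetical. Two assumptions are worth flagging: the Low Split factor is taken to be 1 - (smoothed value at the split) / (smoothed maximum), so that it vanishes at the top of the peak and leaves the value untouched where the waveform is zero; and the filter width is given as `filter_wing` samples on each side, since converting ~150 ns to samples depends on the sampling period.

```python
import numpy as np


def _s(w, t, normalize):
    """Sum of squared deviations; divided by sum(w) if normalize."""
    total = w.sum()
    if total <= 0:
        return 0.0
    m = (w * t).sum() / total
    ssd = float((w * (t - m) ** 2).sum())
    return ssd / total if normalize else ssd


def moving_average(w, wing):
    """Square filter spanning 2 * wing + 1 samples, zero-padded at the edges."""
    wing = min(wing, (len(w) - 1) // 2)  # keep the kernel no longer than w
    if wing <= 0:
        return w.copy()
    kernel = np.ones(2 * wing + 1) / (2 * wing + 1)
    return np.convolve(w, kernel, mode="same")


def gof_variants(w, normalize=False, low_split=False, filter_wing=0):
    """gof[i] for splitting w into w[:i] and w[i:], with optional
    'normalized variance' and 'low split' modifications."""
    w = np.clip(np.asarray(w, dtype=float), 0, None)
    t = np.arange(len(w), dtype=float)
    s_orig = _s(w, t, normalize)
    gof = np.zeros(len(w))
    if s_orig == 0:
        return gof
    for i in range(1, len(w)):
        s_parts = _s(w[:i], t[:i], normalize) + _s(w[i:], t[i:], normalize)
        gof[i] = 1 - s_parts / s_orig
    if low_split:
        smooth = moving_average(w, filter_wing)
        # Assumed form of the factor: 0 at the peak top, 1 where w is zero
        gof *= 1 - smooth / smooth.max()
    return gof
```

With these definitions, a flat 8-sample waveform gives about 0.52 at the midpoint for the normalized variant, compared to about 0.76 for the unmodified measure.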

Below you can see how these perform on a few prototypical peaks in XENON1T data at high energies (so the features are clear enough to see by eye). These peaks were found with our current default local minimum clustering.

You can see that the ordinary natural breaks algorithm reaches quite high 'goodness of split' values in the middle of a normal Gaussian-ish peak. Low Split and Normalized Variance both show a much larger difference between good and bad cases (their values are just lower overall). Normalized Variance, however, has trouble recognizing long tails, whereas Low Split does quite well there too. Thus I'd lean towards using Low Split for now.

Well-resolved peaks

Sticky tails

Multiple modes

JelleAalbers force-pushed the natbreaks branch 2 times, most recently from 678c68a to 43aef02 December 30, 2019 14:08

JelleAalbers added 9 commits January 5, 2020 12:38

Natural breaks peak splitting, recursion

f610450

Export gof computation

7e729ab

Clip waveform to 0

f091707

Use total instead of average intraclass variance

1d1b7cd

Refactor peak splitting, filtering for low_split

4955262

Specify threshold by log10(area) interpolation

21d19da

Default run start time to 0 (with warning)

b281de5

Minor changes

969d94f

Store gofs, flexible threshold and actual recursion

da5960e

JelleAalbers force-pushed the natbreaks branch from da255ae to da5960e January 5, 2020 11:39

JelleAalbers and others added 3 commits January 5, 2020 20:06

Recurse for real now

aa22cc0

NB value to function, turn off numba caching for splitters

ba7943b

Merge branch 'master' into natbreaks

414647f

JelleAalbers merged commit 2146f9e into AxFoundation:master Jan 16, 2020

JelleAalbers deleted the natbreaks branch January 16, 2020 13:13

JelleAalbers mentioned this pull request Jan 28, 2020

Natural breaks clustering XENONnT/straxen#45

Merged

JoranAngevaare mentioned this pull request Jul 16, 2020

change S1 split threshold main contexts XENONnT/straxen#150

Merged
