Guard hitlet entropy test from numerical errors #772
Merged
... by avoiding waveforms for which normalization is not reliable in float32.
What is the problem / what does the code in this PR do?
While installing strax I got a failure in one of the hitlet tests that uses hypothesis to generate test data. This is the same test as in #539 / #544. This PR makes sure the test is skipped on data that is very susceptible to floating-point errors. This avoids failures caused by slight implementation differences between the numba code and the numpy code we test against.
Can you briefly describe how it works?
The waveform that triggered the failure was something like [1, -1, 1e-5] (with a few extra zeros). Note that its amplitude is much larger than its sum, causing numerical errors in float32 (e.g. the first step in the conditional entropy computation is to divide by the waveform sum), and ~1% differences between the numpy and numba computations for waveforms like the one above.
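To see why float32 is problematic here, consider the condition number of the sum: per-element rounding on the order of the float32 machine epsilon gets amplified by roughly amplitude/|sum| once we normalize by the sum. A back-of-the-envelope sketch (illustrative only, not the actual strax computation):

```python
import numpy as np

# Waveform of the shape that triggered the failure:
# large min-to-max amplitude, near-zero sum.
w = np.array([1.0, -1.0, 1e-5])

amplitude = w.max() - w.min()   # 2.0
total = w.sum()                  # ~1e-5

# Per-element float32 rounding is on the order of eps32 * amplitude.
# Dividing by the tiny sum amplifies it to roughly this relative error:
eps32 = np.finfo(np.float32).eps
worst_rel_err = eps32 * amplitude / abs(total)  # ~2.4e-2, i.e. percent-level
```

This matches the observed ~1% discrepancy between the numpy and numba results: the sum itself is fine, but any quantity divided by it inherits a percent-level relative error in float32.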
I arbitrarily put the threshold for skipping the test at |sum| < 1e-4 × (min-to-max amplitude). With that guard I get no more failures from hypothesis, even with max_examples set to several thousand.
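A minimal sketch of such a guard is below; the function name and the standalone form are illustrative (in the test it would feed a hypothesis `assume(...)` call), not the actual strax test code:

```python
import numpy as np

def reliable_in_float32(w, threshold=1e-4):
    """Return True if normalizing `w` by its sum is numerically safe
    in float32, i.e. the sum is not tiny compared to the min-to-max
    amplitude. Illustrative helper, not the actual strax code."""
    amplitude = np.max(w) - np.min(w)
    return abs(np.sum(w)) >= threshold * amplitude
```

For the failing waveform this returns False (sum ~1e-5 vs. threshold × amplitude = 2e-4), so hypothesis would discard that example instead of comparing entropy values that differ only through float32 round-off.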