
Property-based testing with hypothesis #208

Open
GenevieveBuckley opened this issue Apr 23, 2021 · 0 comments

GenevieveBuckley commented Apr 23, 2021

Hi @Zac-HD and @FudgeMunkey,

We talked a while ago about whether it would be good to add property-based testing with hypothesis to dask-image.

I'd like to suggest a good place to start: writing an equivalence test for the affine_transform function (found in the ndinterp subpackage). You can see some existing (non-property-based) equivalence testing in validate_affine_transform.

Unlike most of the rest of dask-image, the affine_transform function doesn't simply call the equivalent scipy function on chunks of the dask array. It's a bit more complicated, which makes it a really good one to try. Here's a recent bug report where a user found a mismatch between the default kwarg values expected by dask-image and scipy, which led to this error: #204
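
Here's a minimal sketch of what that equivalence test could look like. The strategy bounds, chunk size, and pure-translation matrix are illustrative choices, and `order` is passed explicitly to both libraries since the linked bug report is about mismatched defaults:

```python
import dask.array as da
import numpy as np
import scipy.ndimage
from hypothesis import given, settings, strategies as st
from hypothesis.extra import numpy as npst

import dask_image.ndinterp


@given(
    image=npst.arrays(
        dtype=np.float64,
        shape=npst.array_shapes(min_dims=2, max_dims=2, min_side=2, max_side=16),
        elements=st.floats(-1e3, 1e3),
    ),
    offset=st.tuples(st.floats(-2, 2), st.floats(-2, 2)),
)
@settings(deadline=None)  # building and computing the dask graph can exceed the default deadline
def test_affine_transform_matches_scipy(image, offset):
    matrix = np.eye(2)  # start simple: identity matrix, so `offset` does a pure translation
    expected = scipy.ndimage.affine_transform(image, matrix, offset=offset, order=1)
    actual = dask_image.ndinterp.affine_transform(
        da.from_array(image, chunks=4), matrix, offset=offset, order=1
    ).compute()
    np.testing.assert_allclose(actual, expected, atol=1e-6)
```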

Earlier discussion: napari/napari#2444

As background (quoting @Zac-HD's checklist from that napari thread): Hypothesis is basically super-powered random testing, and I wrote this short paper on applying it to scientific code. ... For Napari's first property-based tests, I'd use the following as a checklist (and try the ghostwriter):

  • "fuzz tests": feed in weird but technically-possible inputs and see if anything crashes (sketched just after this list). In Numpy, for example, this has found issues with unicode strings, zero-dimensional arrays, and comparing arrays-of-structs containing nan. The key is to think carefully about how to generate all possible values, since bugs often arise from the interaction of two or more edge cases. Once you have some library-specific generators, it's easy to reuse them for later steps.
  • Round-trip tests: save your data, load it, and assert nothing changed! Simple, important, and highly effective (sketched after this list). Test in-memory format conversions as well as persistent storage formats.
  • Equivalence: fantastic when you have a function that should behave identically to a reference implementation, even if only over a subset of inputs (the affine_transform sketch above is exactly this pattern). Traditionally great for refactoring, but perhaps you also have Dask- or Napari-specific implementations of functions from scipy or scikit-*, and could run both on small arrays? Presenting the same data as Numpy vs Dask arrays could also be interesting.
  • Look for parametrized tests, and consider converting them to Hypothesis. This is best when the full parameter product is too slow to run regularly, e.g. array dimensionality * size * dtype * bitwidth * endianness * random contents (for each operand...), and Hypothesis would allow a less restricted test. Note that you can also supply some arguments from @pytest.mark.parametrize and others from @hypothesis.given in the same test (shown after this list)!
  • "Classic" properties like asserting bounds on outputs, idempotence, commutativity, etc. (sketched after this list). These mostly apply to "algorithmic" code and are far from the only use of a PBT library, but they're still very useful when applicable.
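
For the "fuzz tests" item, a sketch along these lines, with uniform_filter picked arbitrarily as the target (the dtype and shape ranges are illustrative too):

```python
import dask.array as da
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as npst

import dask_image.ndfilters


@given(
    image=npst.arrays(
        dtype=st.sampled_from([np.uint8, np.int32, np.float64]),
        shape=npst.array_shapes(min_dims=2, max_dims=3, min_side=2, max_side=12),
    )
)
def test_uniform_filter_survives_weird_inputs(image):
    # Pure fuzzing: only assert that valid-but-unusual inputs don't crash
    # and that the basic output contract (same shape) holds.
    result = dask_image.ndfilters.uniform_filter(
        da.from_array(image, chunks=4), size=3
    ).compute()
    assert result.shape == image.shape
```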
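
For round-trips, an in-memory .npy example; swap in whatever formats dask-image actually cares about:

```python
import io

import numpy as np
from hypothesis import given
from hypothesis.extra import numpy as npst


@given(
    arr=npst.arrays(
        dtype=npst.integer_dtypes() | npst.floating_dtypes(),
        shape=npst.array_shapes(max_dims=3),
    )
)
def test_npy_roundtrip(arr):
    # Save, load, and assert nothing changed (assert_array_equal treats
    # NaNs in matching positions as equal).
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    np.testing.assert_array_equal(np.load(buf), arr)
```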
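
Mixing @pytest.mark.parametrize with @hypothesis.given looks like this (the property being checked is just a placeholder):

```python
import numpy as np
import pytest
from hypothesis import given
from hypothesis.extra import numpy as npst


@pytest.mark.parametrize("dtype", [np.uint8, np.float32, np.float64])
@given(shape=npst.array_shapes(min_dims=1, max_dims=4))
def test_zeros_preserve_shape_and_dtype(dtype, shape):
    # `dtype` comes from pytest's parametrize, `shape` from Hypothesis;
    # the decorators compose, so we never enumerate the full product.
    arr = np.zeros(shape, dtype=dtype)
    assert arr.shape == shape and arr.dtype == dtype
```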
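
And a "classic" property in the same style: bounds plus idempotence, with np.clip standing in for dask-image's algorithmic code:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as npst


@given(
    arr=npst.arrays(
        dtype=np.float64,
        shape=npst.array_shapes(min_dims=1, max_dims=3),
        elements=st.floats(-100, 100),
    )
)
def test_clip_bounds_and_idempotence(arr):
    clipped = np.clip(arr, -1.0, 1.0)
    assert clipped.min() >= -1.0 and clipped.max() <= 1.0  # output bounds
    # Idempotence: clipping an already-clipped array changes nothing.
    np.testing.assert_array_equal(np.clip(clipped, -1.0, 1.0), clipped)
```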

(No worries if you're swamped or busy with other projects. It's useful to document good starting points so people can pick them up in the future.)
