
Property-based testing with hypothesis #208

Open
GenevieveBuckley opened this issue Apr 23, 2021 · 0 comments

GenevieveBuckley commented Apr 23, 2021

Hi @Zac-HD and @FudgeMunkey,

We talked a while ago about whether it would be good to add property-based testing with hypothesis to dask-image.

I'd like to suggest a good place to start: writing an equivalence test for the affine_transform function (found in the ndinterp subpackage). You can see some existing (non-property-based) equivalence testing in validate_affine_transform.

Unlike most of the rest of dask-image, the affine_transform function doesn't simply call the equivalent scipy function on chunks of the dask array. It's a bit more complicated, which makes it a really good one to try. Here's a recent bug report where a user found a mismatch between the default kwarg values expected by dask-image and scipy, which led to this error: #204
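
Here's a minimal sketch of what that equivalence test could look like. The strategy bounds, chunk size, and pure-translation matrix are illustrative choices, and `order` is passed explicitly to both libraries since the linked bug report is about mismatched defaults:

```python
import dask.array as da
import numpy as np
import scipy.ndimage
from hypothesis import given, settings, strategies as st
from hypothesis.extra import numpy as npst

import dask_image.ndinterp


@given(
    image=npst.arrays(
        dtype=np.float64,
        shape=npst.array_shapes(min_dims=2, max_dims=2, min_side=2, max_side=16),
        elements=st.floats(-1e3, 1e3),
    ),
    offset=st.tuples(st.floats(-2, 2), st.floats(-2, 2)),
)
@settings(deadline=None)  # building and computing the dask graph can exceed the default deadline
def test_affine_transform_matches_scipy(image, offset):
    matrix = np.eye(2)  # start simple: identity matrix, so `offset` does a pure translation
    expected = scipy.ndimage.affine_transform(image, matrix, offset=offset, order=1)
    actual = dask_image.ndinterp.affine_transform(
        da.from_array(image, chunks=4), matrix, offset=offset, order=1
    ).compute()
    np.testing.assert_allclose(actual, expected, atol=1e-6)
```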

Earlier discussion: napari/napari#2444

As background (quoting @Zac-HD's checklist from that napari thread): Hypothesis is basically super-powered random testing, and I wrote this short paper on applying it to scientific code. ... For Napari's first property-based tests, I'd use the following as a checklist (and try the ghostwriter):

  • "fuzz tests": feed in weird but technically-possible inputs and see if anything crashes (sketched just after this list). In Numpy, for example, this has found issues with unicode strings, zero-dimensional arrays, and comparing arrays-of-structs containing nan. The key is to think carefully about how to generate all possible values, since bugs often arise from the interaction of two or more edge cases. Once you have some library-specific generators, it's easy to reuse them for later steps.
  • Round-trip tests: save your data, load it, and assert nothing changed! Simple, important, and highly effective (sketched after this list). Test in-memory format conversions as well as persistent storage formats.
  • Equivalence: fantastic when you have a function that should behave identically to a reference implementation, even if only over a subset of inputs (the affine_transform sketch above is exactly this pattern). Traditionally great for refactoring, but perhaps you also have Dask- or Napari-specific implementations of functions from scipy or scikit-*, and could run both on small arrays? Presenting the same data as Numpy vs Dask arrays could also be interesting.
  • Look for parametrized tests, and consider converting them to Hypothesis. This is best when the full parameter product is too slow to run regularly, e.g. array dimensionality * size * dtype * bitwidth * endianness * random contents (for each operand...), and Hypothesis would allow a less restricted test. Note that you can also supply some arguments from @pytest.mark.parametrize and others from @hypothesis.given in the same test (shown after this list)!
  • "Classic" properties like asserting bounds on outputs, idempotence, commutativity, etc. (sketched after this list). These mostly apply to "algorithmic" code and are far from the only use of a PBT library, but they're still very useful when applicable.
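
For the "fuzz tests" item, a sketch along these lines, with uniform_filter picked arbitrarily as the target (the dtype and shape ranges are illustrative too):

```python
import dask.array as da
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as npst

import dask_image.ndfilters


@given(
    image=npst.arrays(
        dtype=st.sampled_from([np.uint8, np.int32, np.float64]),
        shape=npst.array_shapes(min_dims=2, max_dims=3, min_side=2, max_side=12),
    )
)
def test_uniform_filter_survives_weird_inputs(image):
    # Pure fuzzing: only assert that valid-but-unusual inputs don't crash
    # and that the basic output contract (same shape) holds.
    result = dask_image.ndfilters.uniform_filter(
        da.from_array(image, chunks=4), size=3
    ).compute()
    assert result.shape == image.shape
```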
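
For round-trips, an in-memory .npy example; swap in whatever formats dask-image actually cares about:

```python
import io

import numpy as np
from hypothesis import given
from hypothesis.extra import numpy as npst


@given(
    arr=npst.arrays(
        dtype=npst.integer_dtypes() | npst.floating_dtypes(),
        shape=npst.array_shapes(max_dims=3),
    )
)
def test_npy_roundtrip(arr):
    # Save, load, and assert nothing changed (assert_array_equal treats
    # NaNs in matching positions as equal).
    buf = io.BytesIO()
    np.save(buf, arr)
    buf.seek(0)
    np.testing.assert_array_equal(np.load(buf), arr)
```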
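
Mixing @pytest.mark.parametrize with @hypothesis.given looks like this (the property being checked is just a placeholder):

```python
import numpy as np
import pytest
from hypothesis import given
from hypothesis.extra import numpy as npst


@pytest.mark.parametrize("dtype", [np.uint8, np.float32, np.float64])
@given(shape=npst.array_shapes(min_dims=1, max_dims=4))
def test_zeros_preserve_shape_and_dtype(dtype, shape):
    # `dtype` comes from pytest's parametrize, `shape` from Hypothesis;
    # the decorators compose, so we never enumerate the full product.
    arr = np.zeros(shape, dtype=dtype)
    assert arr.shape == shape and arr.dtype == dtype
```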
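
And a "classic" property in the same style: bounds plus idempotence, with np.clip standing in for dask-image's algorithmic code:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as npst


@given(
    arr=npst.arrays(
        dtype=np.float64,
        shape=npst.array_shapes(min_dims=1, max_dims=3),
        elements=st.floats(-100, 100),
    )
)
def test_clip_bounds_and_idempotence(arr):
    clipped = np.clip(arr, -1.0, 1.0)
    assert clipped.min() >= -1.0 and clipped.max() <= 1.0  # output bounds
    # Idempotence: clipping an already-clipped array changes nothing.
    np.testing.assert_array_equal(np.clip(clipped, -1.0, 1.0), clipped)
```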

(No worries if you're swamped or busy with other projects. It's useful to document good starting points so people can pick them up in the future.)
