assume plus example interacts badly with pytest.mark.parametrize #2125

Closed
aarchiba opened this issue Oct 7, 2019 · 5 comments · Fixed by #2420
Labels: interop (how to play nicely with other packages)

Comments

aarchiba (Contributor) commented Oct 7, 2019

I can understand why I see hypothesis.errors.UnsatisfiedAssumption when I supply an example that fails to satisfy the assumption, though it is inconvenient (I prefer to leave old failures lying around in @example in case I want to remove the assumption later). But @example does not seem to respect the parametrized argument it contains: the Hypothesis-generated parts are combined with both parametrize values, and when the example doesn't pass the assumption I see UnsatisfiedAssumption - even for a combination of values I did not specify.

Sorry, that's not a very clear explanation. But perhaps this contrived example will help:

import pytest
from hypothesis import assume, given, example
from hypothesis.strategies import floats


@given(floats(0, 1))
@example(t=1, f=0.56789)
@pytest.mark.parametrize("t", [0.5, 1])
def test_unsatisfied_assumption(t, f):
    assume(f < t)
    assert f < t

yields

E           hypothesis.errors.UnsatisfiedAssumption

and

Falsifying example: test_unsatisfied_assumption(t=0.5, f=0.56789)
Zac-HD added the interop (how to play nicely with other packages) label on Oct 7, 2019
Zac-HD (Member) commented Oct 7, 2019

Hmm, this will probably be a pain to track down and fix - @example and @parametrize both work by attaching some attributes to the test for later use, and apparently we don't check for overlap that well.
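
As a minimal sketch of the attribute-stashing pattern described above (the decorator body and attribute name here are hypothetical illustrations, not Hypothesis' or pytest's actual internals):

# Hypothetical illustration only - not real Hypothesis or pytest code.
# Both @example and @pytest.mark.parametrize follow this general pattern:
# record their arguments on the test function, and let a later hook
# (the @given wrapper or the pytest collector) read them back.
def example(**kwargs):
    def accept(test):
        stored = getattr(test, "_stored_examples", [])
        test._stored_examples = stored + [kwargs]  # hypothetical attribute
        return test
    return accept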

We'll also need to decide what should happen when you pass different arguments to @example() and @given() - it's not actually checked for but I think if it works that's an accident - and write either some tests or a deprecation warning.

Either way, thanks so much for bringing this to our attention!

(I should also mention that I'm a huge fan of how you're applying property-based testing in science! 😁 I've been pushing it at conferences for a while but I'm no longer a regular enough contributor to have much luck with PRs...)

aarchiba (Contributor, Author) commented Oct 7, 2019

> Hmm, this will probably be a pain to track down and fix - @example and @parametrize both work by attaching some attributes to the test for later use, and apparently we don't check for overlap that well.

Ah, that makes sense mechanically. There's also some fragility in terms of order of fixtures, parametrize arguments, and hypothesis-supplied arguments.

> We'll also need to decide what should happen when you pass different arguments to @example() and @given() - it's not actually checked for but I think if it works that's an accident - and write either some tests or a deprecation warning.

I'm not quite sure what you mean here? I really like being able to combine @given and parametrize; not using parametrize would mean a lot of copied code. And if you've got both, @example really needs to specify which parametrized values cause the problems - in my case there are two "formats" that go through quite different code paths. Well, okay, I suppose you could supply only the given values in example and rely on the fact that parametrize hits all the possibilities. But I make my @example lines by copying the falsifying-example code from hypothesis reports, which really does need to indicate which parameter values triggered the problem.
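
As an illustration, the "given values only" variant described above would look something like the sketch below; whether Hypothesis accepts an @example that omits the parametrized argument is part of what this issue is about:

import pytest
from hypothesis import assume, given, example
from hypothesis.strategies import floats


# Sketch only: @example lists just the @given-drawn argument and relies on
# parametrize to run it against every value of t. The comment above argues
# instead for an @example that also pins t, so a reported falsifying example
# can be pasted back in unchanged.
@pytest.mark.parametrize("t", [0.5, 1])
@given(f=floats(0, 1))
@example(f=0.56789)
def test_unsatisfied_assumption(t, f):
    assume(f < t)
    assert f < t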

> Either way, thanks so much for bringing this to our attention!

> (I should also mention that I'm a huge fan of how you're applying property-based testing in science! 😁 I've been pushing it at conferences for a while but I'm no longer a regular enough contributor to have much luck with PRs...)

Thanks! I actually gave a talk at a python in astronomy conference about property-based testing several years before I even heard of hypothesis, though it was a case that I don't see how hypothesis covers (yet?): I was talking about checking frequentist statistical tests - if the K-S test tells you that there's only a one in a thousand chance that your uniformly-distributed values would be this clumpy, there's a decent chance you have a bug somewhere. If you repeat the test with a different seed and it fails again? Probably a bug. I put a few such tests into scipy years and years ago but I bet people have replaced them because they took a long time.
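
A rough sketch of the kind of frequentist check described above, assuming scipy and numpy are available (the threshold and sample size are arbitrary placeholders):

# Sketch: a K-S uniformity check as described above. A failure is only
# evidence of a bug - rerun with a different seed before taking it seriously.
import numpy as np
from scipy import stats

def check_uniformity(seed, n=1000, alpha=1e-3):
    rng = np.random.default_rng(seed)
    sample = rng.uniform(size=n)  # stand-in for the values under test
    p_value = stats.kstest(sample, "uniform").pvalue
    assert p_value > alpha, "suspiciously clumpy - retry with another seed"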

Zac-HD (Member) commented Oct 8, 2019

> I'm not quite sure what you mean here? I really like being able to combine @given and parametrize; not using parametrize would mean a lot of copied code. And if you've got both, @example really needs to specify which parametrized values cause the problems

Yes, absolutely! Mixing @given and @parametrize is really useful, but as #1996 demonstrates there are a bunch of subtle interactions that we don't quite have nailed down yet. I definitely want to support this workflow well, just a little nervous about the level of hackery it might take 🙂

> Thanks! I actually gave a talk at a python in astronomy conference about property-based testing several years before I even heard of hypothesis, though it was a case that I don't see how hypothesis covers (yet?)

Niiice. Hypothesis by design doesn't cover this kind of thing though; the net effect of a pile of heuristics and an example-generator that often mutates existing examples is that you really can't rely on getting any particular distribution out, or even find a simple way to characterise the one you get beyond "it's likely to trigger exceptions" 😉

aarchiba (Contributor, Author) commented Oct 8, 2019

> Thanks! I actually gave a talk at a python in astronomy conference about property-based testing several years before I even heard of hypothesis, though it was a case that I don't see how hypothesis covers (yet?)

> Niiice. Hypothesis by design doesn't cover this kind of thing though; the net effect of a pile of heuristics and an example-generator that often mutates existing examples is that you really can't rely on getting any particular distribution out, or even find a simple way to characterise the one you get beyond "it's likely to trigger exceptions" 😉

Oh, that's not what I'm worried about. I'm picturing using Hypothesis to generate, say, distribution parameters or array sizes, and then the endpoint test would be one of these statistical checks - I generate as many points as Hypothesis asked for according to the distribution parameters, then run them through one of these tests. The problem is that these tests are inherently flaky by Hypothesis standards (and also slow). I'm still thinking about whether it's feasible to combine the two techniques.
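
A sketch of the combined workflow described above, with a placeholder distribution, thresholds, and settings (not a recommendation - as noted, such a test is slow and inherently flaky):

# Sketch: Hypothesis draws distribution parameters and a sample size, and a
# K-S test serves as the property being checked.
import numpy as np
from scipy import stats
from hypothesis import given, settings, strategies as st

@settings(max_examples=20, deadline=None)
@given(
    loc=st.floats(-10, 10),
    scale=st.floats(0.1, 10),
    n=st.integers(200, 2000),
)
def test_normal_sampler_passes_ks(loc, scale, n):
    rng = np.random.default_rng(0)
    sample = rng.normal(loc, scale, size=n)
    assert stats.kstest(sample, "norm", args=(loc, scale)).pvalue > 1e-4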

Zac-HD (Member) commented Oct 8, 2019

Right, you could use Hypothesis for that but personally I wouldn't, partly because it would be flaky and partly because I'm not sure shrinking would help much.

If you're also generating a seed for your PRNG though, I can see that Hypothesis' replay capabilities could make it worthwhile 😄
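
For instance, a sketch in which the seed itself is drawn by Hypothesis, so a failing statistical check is reported with the exact seed that triggered it and can be replayed (names and thresholds are illustrative):

# Sketch: draw the PRNG seed with Hypothesis so that a failure can be
# replayed deterministically from the reported falsifying example.
import numpy as np
from scipy import stats
from hypothesis import given, strategies as st

@given(seed=st.integers(min_value=0, max_value=2**32 - 1))
def test_uniform_sampler_is_replayable(seed):
    rng = np.random.default_rng(seed)
    sample = rng.uniform(size=1000)
    assert stats.kstest(sample, "uniform").pvalue > 1e-4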
