assume plus example interacts badly with pytest.mark.parametrize #2125

Closed
aarchiba opened this issue Oct 7, 2019 · 5 comments · Fixed by #2420
Labels: interop (how to play nicely with other packages)

Comments

aarchiba (Contributor) commented Oct 7, 2019

I can understand why I see hypothesis.errors.UnsatisfiedAssumption when I supply an example that fails to satisfy the assumption, though it is inconvenient (I prefer to leave old failures lying around in @example in case I want to remove the assumption later). But @example does not seem to respect the parametrized argument it contains: the Hypothesis-generated parts are combined with both parametrize values, and when the example doesn't pass the assumption I see UnsatisfiedAssumption - even for a combination of values I did not specify.

Sorry, that's not a very clear explanation. But perhaps this contrived example will help:

import pytest
from hypothesis import assume, given, example
from hypothesis.strategies import floats


@given(floats(0, 1))
@example(t=1, f=0.56789)
@pytest.mark.parametrize("t", [0.5, 1])
def test_unsatisfied_assumption(t, f):
    assume(f < t)
    assert f < t

yields

E           hypothesis.errors.UnsatisfiedAssumption

and

Falsifying example: test_unsatisfied_assumption(t=0.5, f=0.56789)
Zac-HD added the interop (how to play nicely with other packages) label on Oct 7, 2019
Zac-HD (Member) commented Oct 7, 2019

Hmm, this will probably be a pain to track down and fix - @example and @parametrize both work by attaching some attributes to the test for later use, and apparently we don't check for overlap that well.
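
As a minimal sketch of the attribute-stashing pattern described above (the decorator body and attribute name here are hypothetical illustrations, not Hypothesis' or pytest's actual internals):

# Hypothetical illustration only - not real Hypothesis or pytest code.
# Both @example and @pytest.mark.parametrize follow this general pattern:
# record their arguments on the test function, and let a later hook
# (the @given wrapper or the pytest collector) read them back.
def example(**kwargs):
    def accept(test):
        stored = getattr(test, "_stored_examples", [])
        test._stored_examples = stored + [kwargs]  # hypothetical attribute
        return test
    return accept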

We'll also need to decide what should happen when you pass different arguments to @example() and @given() - it's not actually checked for but I think if it works that's an accident - and write either some tests or a deprecation warning.

Either way, thanks so much for bringing this to our attention!

(I should also mention that I'm a huge fan of how you're applying property-based testing in science! 😁 I've been pushing it at conferences for a while but I'm no longer a regular enough contributor to have much luck with PRs...)

aarchiba (Contributor, Author) commented Oct 7, 2019

> Hmm, this will probably be a pain to track down and fix - @example and @parametrize both work by attaching some attributes to the test for later use, and apparently we don't check for overlap that well.

Ah, that makes sense mechanically. There's also some fragility in terms of order of fixtures, parametrize arguments, and hypothesis-supplied arguments.

> We'll also need to decide what should happen when you pass different arguments to @example() and @given() - it's not actually checked for but I think if it works that's an accident - and write either some tests or a deprecation warning.

I'm not quite sure what you mean here? I really like being able to combine @given and parametrize; not using parametrize would mean a lot of copied code. And if you've got both, @example really needs to specify which parametrized values cause the problems - in my case there are two "formats" that go through quite different code paths. Well, okay, I suppose you could supply only the given values in example and rely on the fact that parametrize hits all the possibilities. But I make my @example lines by copying the falsifying-example code from hypothesis reports, which really does need to indicate which parameter values triggered the problem.
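
As an illustration, the "given values only" variant described above would look something like the sketch below; whether Hypothesis accepts an @example that omits the parametrized argument is part of what this issue is about:

import pytest
from hypothesis import assume, given, example
from hypothesis.strategies import floats


# Sketch only: @example lists just the @given-drawn argument and relies on
# parametrize to run it against every value of t. The comment above argues
# instead for an @example that also pins t, so a reported falsifying example
# can be pasted back in unchanged.
@pytest.mark.parametrize("t", [0.5, 1])
@given(f=floats(0, 1))
@example(f=0.56789)
def test_unsatisfied_assumption(t, f):
    assume(f < t)
    assert f < t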

> Either way, thanks so much for bringing this to our attention!

> (I should also mention that I'm a huge fan of how you're applying property-based testing in science! 😁 I've been pushing it at conferences for a while but I'm no longer a regular enough contributor to have much luck with PRs...)

Thanks! I actually gave a talk at a python in astronomy conference about property-based testing several years before I even heard of hypothesis, though it was a case that I don't see how hypothesis covers (yet?): I was talking about checking frequentist statistical tests - if the K-S test tells you that there's only a one in a thousand chance that your uniformly-distributed values would be this clumpy, there's a decent chance you have a bug somewhere. If you repeat the test with a different seed and it fails again? Probably a bug. I put a few such tests into scipy years and years ago but I bet people have replaced them because they took a long time.
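
A rough sketch of the kind of frequentist check described above, assuming scipy and numpy are available (the threshold and sample size are arbitrary placeholders):

# Sketch: a K-S uniformity check as described above. A failure is only
# evidence of a bug - rerun with a different seed before taking it seriously.
import numpy as np
from scipy import stats

def check_uniformity(seed, n=1000, alpha=1e-3):
    rng = np.random.default_rng(seed)
    sample = rng.uniform(size=n)  # stand-in for the values under test
    p_value = stats.kstest(sample, "uniform").pvalue
    assert p_value > alpha, "suspiciously clumpy - retry with another seed"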

Zac-HD (Member) commented Oct 8, 2019

> I'm not quite sure what you mean here? I really like being able to combine @given and parametrize; not using parametrize would mean a lot of copied code. And if you've got both, @example really needs to specify which parametrized values cause the problems

Yes, absolutely! Mixing @given and @parametrize is really useful, but as #1996 demonstrates there are a bunch of subtle interactions that we don't quite have nailed down yet. I definitely want to support this workflow well, just a little nervous about the level of hackery it might take 🙂

> Thanks! I actually gave a talk at a python in astronomy conference about property-based testing several years before I even heard of hypothesis, though it was a case that I don't see how hypothesis covers (yet?)

Niiice. Hypothesis by design doesn't cover this kind of thing though; the net effect of a pile of heuristics and an example-generator that often mutates existing examples is that you really can't rely on getting any particular distribution out, or even find a simple way to characterise the one you get beyond "it's likely to trigger exceptions" 😉

aarchiba (Contributor, Author) commented Oct 8, 2019

> Thanks! I actually gave a talk at a python in astronomy conference about property-based testing several years before I even heard of hypothesis, though it was a case that I don't see how hypothesis covers (yet?)

> Niiice. Hypothesis by design doesn't cover this kind of thing though; the net effect of a pile of heuristics and an example-generator that often mutates existing examples is that you really can't rely on getting any particular distribution out, or even find a simple way to characterise the one you get beyond "it's likely to trigger exceptions" 😉

Oh, that's not what I'm worried about. I'm picturing using Hypothesis to generate, say, distribution parameters or array sizes, and then the endpoint test would be one of these statistical checks - I generate as many points as Hypothesis asked for according to the distribution parameters, then run them through one of these tests. The problem is that these tests are inherently flaky by Hypothesis standards (and also slow). I'm still thinking about whether it's feasible to combine the two techniques.
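
A sketch of the combined workflow described above, with a placeholder distribution, thresholds, and settings (not a recommendation - as noted, such a test is slow and inherently flaky):

# Sketch: Hypothesis draws distribution parameters and a sample size, and a
# K-S test serves as the property being checked.
import numpy as np
from scipy import stats
from hypothesis import given, settings, strategies as st

@settings(max_examples=20, deadline=None)
@given(
    loc=st.floats(-10, 10),
    scale=st.floats(0.1, 10),
    n=st.integers(200, 2000),
)
def test_normal_sampler_passes_ks(loc, scale, n):
    rng = np.random.default_rng(0)
    sample = rng.normal(loc, scale, size=n)
    assert stats.kstest(sample, "norm", args=(loc, scale)).pvalue > 1e-4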

Zac-HD (Member) commented Oct 8, 2019

Right, you could use Hypothesis for that but personally I wouldn't, partly because it would be flaky and partly because I'm not sure shrinking would help much.

If you're also generating a seed for your PRNG though, I can see that Hypothesis' replay capabilities could make it worthwhile 😄
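
For instance, a sketch in which the seed itself is drawn by Hypothesis, so a failing statistical check is reported with the exact seed that triggered it and can be replayed (names and thresholds are illustrative):

# Sketch: draw the PRNG seed with Hypothesis so that a failure can be
# replayed deterministically from the reported falsifying example.
import numpy as np
from scipy import stats
from hypothesis import given, strategies as st

@given(seed=st.integers(min_value=0, max_value=2**32 - 1))
def test_uniform_sampler_is_replayable(seed):
    rng = np.random.default_rng(seed)
    sample = rng.uniform(size=1000)
    assert stats.kstest(sample, "uniform").pvalue > 1e-4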
