Multi-bug discovery during generation #847
We should indeed fix this! I'm much less eager to skip generation though; in particular, I don't see much use in "run until you find a bug, then stop". That seems to fall into an awkward spot where it's mostly a case for AFL, and if Hypothesis can't find the bug you'll just waste a lot of cycles on it, especially if your tests are running approximately serially rather than interleaved.

I agree with the "correct logic"; not much more to say there.

I don't like the idea of reintroducing time into the number of examples run. If we're not simply running until we hit `max_examples` (which is after all the expected behaviour in all non-buggy runs!) I favor option three as a simple version of option four. If you found one or several bugs in
I find myself running with large numbers of examples fairly often. It's not quite "run until you hit a bug", but it's still going to be running for minutes or tens of minutes. When I'm doing this I would definitely find it annoying if Hypothesis found a bug and then spent ages looking for another one. I agree that something like AFL would be more useful here, but for now Hypothesis scratches a lot of itches that AFL can't (and Hypothesis should get more AFL-like over time, ideally).
I don't think it's too problematic. The problem with timeout was that it gave users a false sense of confidence, because it looked like you were running more examples than you actually were. In this case that's a non-issue: Hypothesis is either running the number of examples you asked for, or it found a bug, in which case there was never any guarantee of how many examples were run anyway.
To make the point, we should also update our headline example in the readme:

```python
@given(st.lists(st.floats(allow_nan=False)))
def test_mean(xs):
    mean = sum(xs) / len(xs)
    assert min(xs) <= mean <= max(xs)
```

(Note that this is on the current released version! We already find two distinct bugs, just by shrinking the list 😄)
I'm going to start working on this.

I'm not working on this anymore. Can a maintainer grab this?
As a consequence of #836 we can now find and display multiple bugs if these occur during either shrinking or retrieving examples from the database.
Amusingly though, we can't currently find multiple bugs in the phase that is actually designed to find bugs, generation! We should probably fix that.
However, it is not sufficient to just run generation to completion, because a reasonable usage pattern is "run generation until you find a bug, then stop and tell me about that bug", which could be achieved by e.g. setting `max_examples` absurdly high and turning off the time limit. In that case we would not want to keep running indefinitely after we've found a bug.

I think the correct logic here is: we respect `max_examples` in the sense that we stop running generation after we've generated that many valid or interesting examples, but once we've found a bug we stop reasonably soon afterwards.

The devil is of course in the details of what "reasonably soon" means. A non-exhaustive list of possibilities:
My preference is probably some hybrid of 1 and 3.
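The numbered options themselves did not survive in this copy of the thread, so the following is only a rough sketch of one plausible reading of the proposed logic: respect `max_examples` as the overall budget, but after the first bug, spend only a small fixed extra budget before stopping. All names here (`run_generation`, `post_bug_budget`) are hypothetical and are not Hypothesis internals:

```python
def run_generation(test, generate, max_examples=100, post_bug_budget=10):
    """Hypothetical sketch: try up to max_examples examples, but once a
    bug is found, stop "reasonably soon" (here: after a fixed extra
    budget of examples). Failures are collected rather than raised, so
    multiple distinct bugs can surface during generation."""
    bugs = []
    remaining_after_bug = None  # None until the first bug is seen
    for _ in range(max_examples):
        example = generate()
        try:
            test(example)
        except Exception as err:
            bugs.append((example, err))
            if remaining_after_bug is None:
                remaining_after_bug = post_bug_budget
        if remaining_after_bug is not None:
            remaining_after_bug -= 1
            if remaining_after_bug <= 0:
                break
    return bugs
```

With a generator yielding 0, 1, 2, … and a test that fails only on 3, this runs just a handful of examples past the first failure instead of exhausting all of `max_examples`, which matches the "don't spend ages looking for another bug" concern above.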