Add deadline feature to make slow tests an error #843
Conversation
A whole bunch of questions for discussion:
Broadly I'm sympathetic to the aims and would probably find this net useful in my own tests, but I'm also concerned about the impact on users with inherently slow tests (e.g. using largish arrays, AI/ML stuff, etc.). Changing the deadline on the default settings object is an obvious solution to me, but we need to think about how to advertise that kind of option for when many tests are slow. And then, once we have a good story for "this module has slow tests", can we reduce the default deadline? I'm also cautious in light of our review guide on adding new settings - not a blocker, but it gives me pause.
The answer to both of these is that we're doing a lot of low-level byte stream processing, and it's currently very copy-heavy and written in pure Python. Trying to improve the low-level operations before I move the core out of Python isn't really worth the time - it's high cost and low return, and we should see a lot of improvement for free once we have a Rust/C/whatever core. Places where I know it's currently slower than I'd like:
Django is a special case. There I'm almost certain it's not generation that's slow but Django's set-up/tear-down for transactional tests. (I think this is also a separate issue for discussion. This feature is useful regardless of whether data generation is slow - the only connection to slow data generation is that it stops people from assuming Hypothesis is slower than it is when the real problem is their tests.)
Mostly it's as good as anything else. 300ms is the natural point at which you would hit a minute (the historic default timeout, which is going away) with the default number of examples.
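(That works out as: with the default of 200 examples per test, 200 × 300ms = 60s - i.e. the old one-minute timeout.)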
There are already reporting options for total test execution time in
I would rather not special-case this in the data generation. Hypothesis can already generate quite large examples,
Yes, changing the defaults is the expected thing to do here. I'll try to make that clearer. I'm not that concerned about the impact on users with intrinsically slow tests, because it's a very simple piece of configuration to turn off the behaviour.
I think this is more or less the optimal kind of setting: it's something the user can meaningfully have an opinion about, it makes no reference to Hypothesis internals, and there's no one-size-fits-all answer we could just infer automatically, so not making it configurable would be extremely frustrating for people.
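For illustration, the "simple piece of configuration" mentioned above might look something like this - a sketch using Hypothesis's settings profiles, assuming the deadline setting this PR introduces (the profile name "slow" is made up):

```python
# Sketch: a project- or module-wide opt-out for inherently slow tests,
# assuming the deadline setting proposed in this PR. The profile
# name "slow" is illustrative.
from hypothesis import settings

settings.register_profile("slow", deadline=None)  # disable the deadline
settings.load_profile("slow")
```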
That all makes sense!
I was also thinking it's on the high side - 300ms, to match the historical timeout of one minute, would be my preference among arbitrary numbers.
We're misreporting it as part of generation time then, but that's #846
Fair enough. Happy to change it to 300ms.
force-pushed from 6aa1a3b to ed0f6c1
RELEASE.rst (Outdated)

```diff
@@ -4,7 +4,7 @@ This release introduces a :attr:`~hypothesis.settings.deadline`
 setting to Hypothesis.

 When set this turns slow tests into errors. By default it is unset but will
-warn if you exceed 500ms, which will become the default value in a future
+warn if you exceed 300ms, which will become the default value in a future
```
300 here but 200 in implementation?
Oops. It helps to save the file you're editing. I originally went for 300, then decided that actually I preferred 200 - it gives you a comfortable amount of time to get in under the timeout even with other, uncounted things contributing time.
You need to decide if it’s 200ms, 500ms, or something else!
```python
self.__warned_deadline = True
note_deprecation((
    'Test took %.2fms to run. In future the default '
    'deadline setting will be 200ms, which will '
```
That’s not what it says in the setting definition.
src/hypothesis/core.py (Outdated)

```python
        '%.2fms') % (
            runtime, self.settings.deadline,
        )
    )
```
Nit: these parentheses are all over the place. I think

```python
raise DeadlineExceeded(
    'Test took %.2fms, which exceeds the deadline of '
    '%.2fms' % (runtime, self.settings.deadline)
)
```

works just as well, no?
force-pushed from cbc5ed8 to b4368c9

force-pushed from b4368c9 to 9873212
Can you add those additional stats as part of this PR? I'd like to see them together, but will approve as-is if you think that's going to take too long.
No, it's a substantial chunk more work, because it has to deal with things like time spent in inline
Slow tests are a major problem for property-based testing - they significantly limit its utility and also make us look bad (performance is high on the list of things I hear complaints about; some of this is legitimate - Hypothesis data generation performance is quite bad in places - but a lot of the time it's actually the tests that are slow).
With the deprecation of timeout (#580) this problem is only going to get worse, because it's been made more visible to the user (which was the goal! Long-run this is a clear improvement).
This change is designed to ratchet up that visibility in a way that is actually actionable, by showing users how slow their tests are and where they are slow. In particular, because the deadline participates in the normal shrinking process, users can see whether the slowness is triggered by a particular example (in which case Hypothesis is giving them ad hoc performance testing! Though I suspect they're more likely to use assume to steer away from it than to fix the problem), or whether the test is just intrinsically slow.
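To make the shrinking interaction concrete, here is a sketch (the 200ms deadline and the slow branch are invented for illustration) where only large inputs exceed the deadline, so shrinking surfaces a minimal slow example rather than a uniformly slow test:

```python
# Illustrative only: a test whose slowness is input-dependent.
# Hypothesis should shrink toward the smallest input that still
# exceeds the deadline, pointing at the slow code path.
import time

from hypothesis import given, settings, strategies as st

@settings(deadline=200)  # milliseconds
@given(st.integers(min_value=0, max_value=10_000))
def test_fast_enough(n):
    if n > 1_000:        # only large inputs take the slow path
        time.sleep(0.3)  # 300ms, which exceeds the 200ms deadline
```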
It's likely that in a lot of cases people will respond by either raising the deadline or removing it entirely, but I think that's OK. Even then, it makes it more visible where the problem is, and it at least gives them the ability to debug it even if they choose not to.
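For reference, those two responses (raising or removing the deadline for a given test) might look like this - a sketch with invented names and values:

```python
# Sketch: per-test escape hatches, assuming the deadline setting
# from this PR. Names and values are illustrative.
from hypothesis import given, settings, strategies as st

@settings(deadline=1000)   # raise the deadline to one second
@given(st.binary(min_size=4096))
def test_known_slow_codepath(payload):
    ...

@settings(deadline=None)   # or remove it entirely
@given(st.binary())
def test_opted_out(payload):
    ...
```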
Notes