Add deadline feature to make slow tests an error #843

Merged: 11 commits into master from DRMacIver/deadlines, Sep 13, 2017

Conversation

@DRMacIver (Member) commented Sep 10, 2017

Slow tests are a major problem for property-based testing - they both significantly limit its utility and also make us look bad (performance is high on the list of things I hear complaints about. Some of this is legitimate - Hypothesis data generation performance is quite bad in places - but a lot of the time it's actually the tests that are slow).

With the deprecation of timeout (#580) this problem is only going to get worse, because it's been made more visible to the user (which was the goal! Long-run this is a clear improvement).

This change is designed to ratchet up that visibility in a way that is actually actionable for users, by showing them how slow their tests are and where they are slow. In particular, because the deadline participates in the normal shrinking process, users can see whether the slowness is triggered by a particular example (in which case Hypothesis is giving them ad hoc performance testing! Though I suspect they're more likely to use assume to steer away from that than to fix the problem), or whether the test is just intrinsically slow.

It's likely that in a lot of cases people will respond by raising the deadline or removing it entirely, but I think that's OK. Even then, it makes it more visible where the problem is, and it at least gives them the capability to debug it even if they choose not to.
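For concreteness, a minimal sketch of what the feature looks like from the user's side (the test itself is hypothetical; deadline takes a number of milliseconds):

    from hypothesis import given, settings, strategies as st

    # Hypothetical example: any single example that takes longer than
    # 200ms to execute raises DeadlineExceeded, and that failure then
    # shrinks like any other.
    @settings(deadline=200)
    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        assert sorted(sorted(xs)) == sorted(xs)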

Notes

  • This tests the amount of time spent in the test body, not the amount of time spent in test generation. I think that's the correct behaviour, but it also happened to just be easier.
  • This gets away with not having to deal with a bunch of annoying edge cases around what happens if you find slow tests while shrinking other examples, etc., because of #836 (Display all slipped errors found during shrinking). As such, it should be merged after that.
  • The tests for this will not pass until #844 (Remove strategy_test_suite) is merged.

@Zac-HD (Member) commented Sep 10, 2017

A whole bunch of questions for discussion:

  • You say "generation is slow in places" - do you have a sense of what those places are and how slow? (e.g. the Django test_is_blank_slate semi-regularly takes more than a second for ten examples)
  • What's stopping performance improvements in generation? Simply that implementation is tricky, unclear problem, no obvious target, etc...
  • Why 500ms as the default value? "It's as good as anything else" is sufficient, but we should probably leave a comment to that effect somewhere.
  • If we gather these execution stats, I'd like to at least think about reporting options even when time is below the threshold.
  • What about working out when there's a ~quadratic (or worse) slowdown, and trying to trigger issues by feeding the test larger inputs?

Broadly I'm sympathetic to the aims and would probably find this useful on net in my own tests, but I'm also concerned about the impact on users with inherently slow tests (e.g. largish arrays, AI/ML stuff, etc.). Changing the deadline on the default settings object is an obvious solution to me, but we need to think about advertising that kind of option for when many tests are slow. And then once we have a good story for "this module has slow tests", can we reduce the default deadline?
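(For concreteness, a sketch of that kind of project-wide configuration using the existing settings profiles mechanism - the profile name and value here are hypothetical:)

    from hypothesis import settings

    # Hypothetical profile for a codebase with inherently slow tests
    # (largish arrays, ML models, ...): relax the deadline once here
    # rather than decorating every individual test.
    settings.register_profile('slow_tests', settings(deadline=1000))
    settings.load_profile('slow_tests')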

I'm also wary in light of our review guide on adding new settings. Not a blocker, but it makes me cautious.

@DRMacIver (Member, Author) commented Sep 10, 2017

> You say "generation is slow in places" - do you have a sense of what those places are and how slow? (e.g. the Django test_is_blank_slate semi-regularly takes more than a second for ten examples)
> What's stopping performance improvements in generation? Simply that implementation is tricky, unclear problem, no obvious target, etc...

The answer to both of these is that we're doing a lot of low level byte stream processing, and it's currently very copy heavy and written in pure Python. Trying to improve on the low level operations before I move the core out of Python isn't really worth the time - it's high cost and low return, and we should see a lot of improvements for free once we have a Rust/C/whatever core.

Places where I know it's currently slower than I'd like:

  • Floating point generation (this was true before recent changes and I don't think the recent changes made it that much worse)
  • Anything where examples are large

Django is a special case. In that case I'm almost certain it's not generation that's slow but Django's setup/teardown for transactional tests.

(I think this is also a separate issue for discussion. This feature is useful regardless of whether data generation is slow - the only connection to slow data generation is that it stops people from thinking Hypothesis is slower than it is when the problem is actually their tests.)

> Why 500ms as the default value? "It's as good as anything else" is sufficient, but we should probably leave a comment to that effect somewhere.

Mostly it's as good as anything else. 300ms is the natural point at which you would hit a minute (the historic default timeout, which is going away) with the default max_examples of 200 (200 × 300ms = 60s), and 500ms was a nice round number close to that. I'm not wedded to 500ms - if anything I suspect it's too large and am tempted to lower it.

> If we gather these execution stats, I'd like to at least think about reporting options even when time is below the threshold.

There are already reporting options for total test execution time in --hypothesis-show-statistics, and I'm planning to add additional statistics around what fraction of that is generation vs. test execution.

> What about working out when there's a ~quadratic (or worse) slowdown, and trying to trigger issues by feeding the test larger inputs?

I would rather not special-case this in the data generation. Hypothesis can already generate quite large examples.

> Changing the deadline on the default settings object is an obvious solution to me, but we need to think about advertising that kind of option for when many tests are slow.

Yes, changing the defaults is the expected thing to do here. I'll try to make that clearer.

I'm not that concerned about the impact on users with intrinsically slow tests, because it's a very simple piece of configuration to turn off the behaviour.
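(A sketch of that configuration for a single intrinsically slow test - the test body is hypothetical; deadline=None disables the check entirely:)

    import time

    from hypothesis import given, settings, strategies as st

    # Hypothetical stand-in for an intrinsically slow test:
    # deadline=None turns the deadline check off for this test only.
    @settings(deadline=None)
    @given(st.integers(min_value=1, max_value=5))
    def test_intrinsically_slow(n):
        time.sleep(0.3)  # simulates slow work, well over any default deadline
        assert n <= 5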

> I'm also wary in light of our review guide on adding new settings. Not a blocker, but it makes me cautious.

I think this is more or less the optimal kind of setting: it's something the user can meaningfully have an opinion about, it makes no reference to Hypothesis internals, and there's no one-size-fits-all value we could just infer automatically, so not making it configurable would be extremely frustrating to people.

@Zac-HD (Member) commented Sep 11, 2017

That all makes sense!

> [500ms] is as good as anything else.

I was also thinking it's on the high side - 300ms to match the historical timeout of one minute would be my preference among arbitrary numbers.

> Django is a special case. In [test_is_blank_slate] I'm almost certain it's not generation that's slow but Django's setup/teardown for transactional tests.

We're misreporting it as part of generation time then, but that's #846.

@DRMacIver (Member, Author) commented:

> I was also thinking it's on the high side - 300ms to match the historical timeout of one minute would be my preference among arbitrary numbers.

Fair enough. Happy to change it to 300ms.

Review comment on RELEASE.rst (outdated):

    @@ -4,7 +4,7 @@ This release introduces a :attr:`~hypothesis.settings.deadline`
     setting to Hypothesis.

     When set this turns slow tests into errors. By default it is unset but will
    -warn if you exceed 500ms, which will become the default value in a future
    +warn if you exceed 300ms, which will become the default value in a future
A Member commented:

300 here but 200 in implementation?

@DRMacIver (Member, Author) replied:

Oops. It helps to save the file you're editing. I originally went for 300, then decided that actually I preferred 200 - it gives you a comfortable amount of time to get in under the timeout with other things that are not being counted contributing time.

@alexwlchan (Contributor) left a comment:

You need to decide if it’s 200ms, 500ms, or something else!

    self.__warned_deadline = True
    note_deprecation((
        'Test took %.2fms to run. In future the default '
        'deadline setting will be 200ms, which will '
@alexwlchan (Contributor):

That’s not what it says in the setting definition.

    '%.2fms') % (
        runtime, self.settings.deadline,
    )
    )
@alexwlchan (Contributor):

Nit: these parentheses are all over the place. I think

                    raise DeadlineExceeded(
                        'Test took %.2fms, which exceeds the deadline of '
                        '%.2fms' % (runtime, self.settings.deadline)
                    )

works just as well, no?

@Zac-HD (Member) commented Sep 13, 2017

> There are already reporting options for total test execution time in --hypothesis-show-statistics, and I'm planning to add additional statistics around what fraction of that is generation vs. test execution.

Can you add those additional stats as part of this PR? I'd like to see them together, but will approve as-is if you think that's going to take too long.

@DRMacIver (Member, Author) commented:

> Can you add those additional stats as part of this PR? I'd like to see them together, but will approve as-is if you think that's going to take too long.

No, it's a substantial chunk more work because it has to deal with things like time spent in inline draw() calls.
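(For readers unfamiliar with the complication being referred to: with the data() strategy, draws happen inside the test body, so generation time and test-execution time are interleaved. A hypothetical illustration:)

    from hypothesis import given, strategies as st

    @given(st.data())
    def test_with_inline_draws(data):
        # These draws run inside the test body, so the time they take is
        # generation time even though it is spent "in the test" - which is
        # what makes a clean generation/execution split a chunk more work.
        n = data.draw(st.integers(min_value=0, max_value=100))
        xs = data.draw(st.lists(st.integers(), min_size=n, max_size=n))
        assert len(xs) == n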

@DRMacIver DRMacIver merged commit 8482cc2 into master Sep 13, 2017
@DRMacIver DRMacIver deleted the DRMacIver/deadlines branch September 13, 2017 09:41