hypothesis.errors.Flaky raised with tight deadline setting #892

Closed · The-Compiler opened this issue Sep 25, 2017 · 1 comment · Fixed by #899
@The-Compiler (Contributor)

I'm currently seeing a failure which confuses me a bit (probably introduced by things getting slower because of coverage). It roughly looks like this:

_________________ test_parse_content_disposition[inline; {}] __________________
caplog = <pytest_catchlog.CompatLogCaptureFixture object at 0x7feb5c413470>
template = 'inline; {}'
stubs = <module 'helpers.stubs' (<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7febd89cf780>)>
    @pytest.mark.parametrize('template', [
>       '{}',
        'attachment; filename="{}"',
        'inline; {}',
        'attachment; {}="foo"',
        'attachment; filename*=iso-8859-1''{}',
        'attachment; filename*={}',
    ])
    @hypothesis.given(strategies.text(alphabet=[chr(x) for x in range(255)]))
    def test_parse_content_disposition(caplog, template, stubs, s):
tests/unit/browser/webkit/http/test_http_hypothesis.py:30: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.tox/py36/lib/python3.6/site-packages/hypothesis/executors.py:58: in default_new_style_executor
    return function(data)
.tox/py36/lib/python3.6/site-packages/hypothesis/core.py:140: in run
    return test(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
args = ()
kwargs = {'caplog': <pytest_catchlog.CompatLogCaptureFixture object at 0x7feb5c413470>, 's': '\x00', 'stubs': <module 'helpers.stubs' (<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7febd89cf780>)>, 'template': 'inline; {}'}
text_repr = "caplog=<pytest_catchlog.CompatLogCaptureFixture at 0x7feb5c413470>, template='inline; {}', stubs=<module 'helpers.stubs' (<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7febd89cf780>)>, s='\\x00'"
    @functools.wraps(test)
    def test_or_flaky(*args, **kwargs):
        text_repr = arg_string(test, args, kwargs)
        raise Flaky(
            (
                'Hypothesis %s(%s) produces unreliable results: Falsified'
                ' on the first call but did not on a subsequent one'
>           ) % (test.__name__, text_repr,))
E       hypothesis.errors.Flaky: Hypothesis test_parse_content_disposition(caplog=<pytest_catchlog.CompatLogCaptureFixture at 0x7feb5c413470>, template='inline; {}', stubs=<module 'helpers.stubs' (<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7febd89cf780>)>, s='\x00') produces unreliable results: Falsified on the first call but did not on a subsequent one
.tox/py36/lib/python3.6/site-packages/hypothesis/core.py:87: Flaky
------------------------------ Captured log call -------------------------------
[...]
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_parse_content_disposition(caplog=<pytest_catchlog.CompatLogCaptureFixture at 0x7feb5c413470>, template='inline; {}', stubs=<module 'helpers.stubs' (<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7febd89cf780>)>, s='\x00')
Failed to reproduce exception. Expected: 
Traceback (most recent call last):
  File "/home/user/qutebrowser.git/.tox/py36/lib/python3.6/site-packages/hypothesis/core.py", line 705, in evaluate_test_data
    self.search_strategy, self.test,
  File "/home/user/qutebrowser.git/.tox/py36/lib/python3.6/site-packages/hypothesis/executors.py", line 58, in default_new_style_executor
    return function(data)
  File "/home/user/qutebrowser.git/.tox/py36/lib/python3.6/site-packages/hypothesis/core.py", line 140, in run
    return test(*args, **kwargs)
  File "/home/user/qutebrowser.git/tests/unit/browser/webkit/http/test_http_hypothesis.py", line 30, in test_parse_content_disposition
    '{}',
  File "/home/user/qutebrowser.git/.tox/py36/lib/python3.6/site-packages/hypothesis/core.py", line 608, in timed_test
    '%.2fms') % (runtime, self.settings.deadline)
hypothesis.errors.DeadlineExceeded: Test took 406.39ms, which exceeds the deadline of 400.00ms
Falsifying example: test_parse_content_disposition(caplog=<pytest_catchlog.CompatLogCaptureFixture at 0x7feb5c413470>, s='\x00', stubs=<module 'helpers.stubs' (<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7febd89cf780>)>, template='inline; {}')
You can add @seed(302934307671667531413257853548643485645) to this test or run pytest with --hypothesis-seed=302934307671667531413257853548643485645 to reproduce this failure.

I don't see how that test could be flaky; it'll always pass with that input. However, I think what happened is that it was just above the deadline once, and just below it when re-running? I'm not sure how, but I feel like Hypothesis could give me a better indication of what happened there.
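In the meantime, a minimal sketch of a workaround, assuming Hypothesis's `settings` decorator: raise the per-example deadline (or disable it with `deadline=None`) so that timing jitter near the boundary can't trigger this. The 600ms value is an arbitrary example, and the test body is elided:

```python
import hypothesis
import hypothesis.strategies as strategies

# deadline is given in milliseconds in Hypothesis 3.x;
# deadline=None disables the per-example deadline entirely.
@hypothesis.settings(deadline=600)
@hypothesis.given(strategies.text(alphabet=[chr(x) for x in range(255)]))
def test_parse_content_disposition(s):
    # Test body elided; the point is the settings decorator above.
    pass
```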

@DRMacIver (Member)

> I don't see how that test could be flaky; it'll always pass with that input. However, I think what happened is that it was just above the deadline once, and just below it when re-running? I'm not sure how, but I feel like Hypothesis could give me a better indication of what happened there.

Hmm. Yeah, I was worried about this possibility and was hoping it wouldn't come up quite so soon. Thanks, past-@DRMacIver 😜.

I agree we should handle this better. The easy options, as I see them, are:

  • Lower the deadline slightly for replay - if, say, the final example still takes > 90% of the deadline, raise anyway with a better error (sketched below).
  • (Possibly overlapping with the above) When shrinking a slow example that initially exceeds the deadline by some large margin, temporarily raise the deadline so that when we replay it at the end we get something that is comfortably above the original deadline.
  • Change the error message for Flaky when the underlying exception is a DeadlineExceeded, to be more informative (NB: must deal properly with the case where the deadline is only one of multiple failures).

I suspect the correct answer is "all of the above".
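A rough sketch of the first option (not Hypothesis's actual implementation; the function name and 90% threshold are illustrative assumptions): the replay step could explicitly distinguish "genuinely over the deadline" from "close enough that the original failure was probably timing jitter":

```python
from hypothesis.errors import DeadlineExceeded, Flaky

def check_replayed_runtime(runtime_ms, deadline_ms, margin=0.9):
    # The deadline failure genuinely reproduced on replay.
    if runtime_ms >= deadline_ms:
        raise DeadlineExceeded(
            'Test took %.2fms, which exceeds the deadline of %.2fms'
            % (runtime_ms, deadline_ms))
    # Under the deadline, but close enough that the original
    # DeadlineExceeded was probably timing jitter: say so explicitly
    # instead of raising a generic Flaky error.
    if runtime_ms >= margin * deadline_ms:
        raise Flaky(
            'Test took %.2fms on replay, within %.0f%% of the %.2fms '
            'deadline; the earlier DeadlineExceeded was likely caused '
            'by timing variability rather than a real bug.'
            % (runtime_ms, margin * 100, deadline_ms))
```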

DRMacIver added a commit that referenced this issue Sep 28, 2017
Deadlines currently interact poorly with shrinking: the
shrinker will try to find an example which is right on the
boundary of failing, which means that, given the slightly
unreliable nature of timing, the example often doesn't fail
when you replay it! This results in a confusing user
experience.

The fix is twofold:

1. Give a better error message when this happens.
2. Try to avoid it happening in the first place by raising
   the deadline temporarily during the shrinking process, so
   that we stop at a boundary that is significantly above that
   of the deadline.

This fixes #892.
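A rough sketch of the second part of the fix (not the commit's actual code; the helper name and 1.25 multiplier are assumptions for illustration): judge examples against an inflated deadline while shrinking, so the minimal example that survives exceeds the user's real deadline by a comfortable margin and therefore fails reliably on the final replay.

```python
# Multiplier applied to the user's deadline during shrinking; any
# example kept as "failing" must exceed even this looser bound.
SHRINK_DEADLINE_MULTIPLIER = 1.25

def effective_deadline_ms(user_deadline_ms, shrinking):
    if user_deadline_ms is None:
        return None  # deadlines disabled entirely
    if shrinking:
        # Shrink against a looser bound, so the final minimal example
        # is comfortably over the real deadline rather than right on it.
        return user_deadline_ms * SHRINK_DEADLINE_MULTIPLIER
    # Generation and the final replay use the deadline the user set.
    return user_deadline_ms
```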