Uninformative flaky error messages when interestingness_origin changes #3940

Closed · Fixed by #3949
DRMacIver opened this issue Mar 28, 2024 · 1 comment
Labels: interop (how to play nicely with other packages), legibility (make errors helpful and Hypothesis grokable)

Comments

@DRMacIver (Member)

In some circumstances (I can provide a reproducer if necessary, but it's a bit fiddly) I get the following error message:

    def conclude_test(self, status, interesting_origin):
        """Says that ``status`` occurred at node ``node``. This updates the
        node if necessary and checks for consistency."""
        if status == Status.OVERRUN:
            return
        i = self.__index_in_current_node
        node = self.__current_node

        if i < len(node.values) or isinstance(node.transition, Branch):
            inconsistent_generation()

        new_transition = Conclusion(status, interesting_origin)

        if node.transition is not None and node.transition != new_transition:
            # As an, I'm afraid, horrible bodge, we deliberately ignore flakiness
            # where tests go from interesting to valid, because it's much easier
            # to produce good error messages for these further up the stack.
            if isinstance(node.transition, Conclusion) and (
                node.transition.status != Status.INTERESTING
                or new_transition.status != Status.VALID
            ):
>               raise Flaky(
                    f"Inconsistent test results! Test case was {node.transition!r} "
                    f"on first run but {new_transition!r} on second"
                )
E               hypothesis.errors.Flaky: Inconsistent test results! Test case was Conclusion(status=Status.INTERESTING, interesting_origin=InterestingOrigin(exc_type=<class 'trio.NonBaseMultiError'>, filename='/Users/drmaciver/Projects/ai-evaluation/.venv/lib/python3.12/site-packages/trio/_core/_run.py', lineno=881, context=(), group_elems=(InterestingOrigin(exc_type=<class 'hypothesis.errors.InvalidArgument'>, filename='/Users/drmaciver/Projects/ai-evaluation/.venv/lib/python3.12/site-packages/hypothesis/control.py', lineno=92, context=InterestingOrigin(exc_type=<class 'KeyError'>, filename='/Users/drmaciver/Projects/ai-evaluation/tests/test_problem_set_operations.py', lineno=34, context=(), group_elems=()), group_elems=()), InterestingOrigin(exc_type=<class 'hypothesis.errors.InvalidArgument'>, filename='/Users/drmaciver/Projects/ai-evaluation/.venv/lib/python3.12/site-packages/hypothesis/control.py', lineno=92, context=InterestingOrigin(exc_type=<class 'KeyError'>, filename='/Users/drmaciver/Projects/ai-evaluation/tests/test_problem_set_operations.py', lineno=34, context=(), group_elems=()), group_elems=())))) on first run but Conclusion(status=Status.INTERESTING, interesting_origin=InterestingOrigin(exc_type=<class 'trio.NonBaseMultiError'>, filename='/Users/drmaciver/Projects/ai-evaluation/.venv/lib/python3.12/site-packages/trio/_core/_run.py', lineno=881, context=(), group_elems=(InterestingOrigin(exc_type=<class 'hypothesis.errors.InvalidArgument'>, filename='/Users/drmaciver/Projects/ai-evaluation/.venv/lib/python3.12/site-packages/hypothesis/control.py', lineno=92, context=InterestingOrigin(exc_type=<class 'KeyError'>, filename='/Users/drmaciver/Projects/ai-evaluation/tests/test_problem_set_operations.py', lineno=34, context=(), group_elems=()), group_elems=()), InterestingOrigin(exc_type=<class 'hypothesis.errors.InvalidArgument'>, filename='/Users/drmaciver/Projects/ai-evaluation/.venv/lib/python3.12/site-packages/hypothesis/control.py', lineno=92, context=InterestingOrigin(exc_type=<class 'KeyError'>, filename='/Users/drmaciver/Projects/ai-evaluation/tests/test_problem_set_operations.py', lineno=34, context=(), group_elems=()), group_elems=()), InterestingOrigin(exc_type=<class 'hypothesis.errors.InvalidArgument'>, filename='/Users/drmaciver/Projects/ai-evaluation/.venv/lib/python3.12/site-packages/hypothesis/control.py', lineno=92, context=InterestingOrigin(exc_type=<class 'KeyError'>, filename='/Users/drmaciver/Projects/ai-evaluation/tests/test_problem_set_operations.py', lineno=34, context=(), group_elems=()), group_elems=())))) on second

.venv/lib/python3.12/site-packages/hypothesis/internal/conjecture/datatree.py:1045: Flaky

This is extremely hard to read, of course. After some massaging of the data I concluded that this test is indeed slightly flaky, in ways that are hard to fix: although it consistently fails with the same exception each time, the number of times that exception appears in the resulting exception group varies between runs, which changes the InterestingOrigin and produces this result.
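A minimal sketch of the kind of test that can hit this, assuming Python 3.11+ and using the built-in ExceptionGroup in place of trio.NonBaseMultiError; the test name and the counter trick are illustrative, not the actual failing test:

    from itertools import count

    from hypothesis import given, strategies as st

    _calls = count()

    @given(st.integers())
    def test_varying_group_size(n):
        # The kind of failure is stable (always a KeyError inside a group),
        # but the group's size depends on hidden state outside Hypothesis'
        # choice sequence, so replaying the same input should yield a
        # different InterestingOrigin (its group_elems differ) and lead
        # Hypothesis to report the test as Flaky.
        width = 1 + next(_calls) % 3
        raise ExceptionGroup("grouped failures", [KeyError("missing")] * width)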

This is a particularly uninformative example of the error message, but I suspect it should be fixed in a fairly general way by extending the workaround labelled a horrible bodge: don't raise here when the Status of the two Conclusions is the same, even if the InterestingOrigin differs (a sketch of this relaxation follows below). If we flakily transition from one sort of failure to another, it's probably better not to worry about it, and to assume the problem is in our detection of whether two errors are the same rather than in the test itself.
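A minimal sketch of that relaxation, edited against the conclude_test excerpt in the traceback above (all names come from that excerpt); this is the proposal as stated here, not necessarily the fix that landed in #3949:

    # Sketch only: two Conclusions with the same Status but different
    # interesting_origins are no longer treated as flaky; a genuine status
    # change (other than the existing interesting -> valid carve-out)
    # still raises.
    if node.transition is not None and node.transition != new_transition:
        if isinstance(node.transition, Conclusion) and (
            node.transition.status != new_transition.status
            and not (
                node.transition.status == Status.INTERESTING
                and new_transition.status == Status.VALID
            )
        ):
            raise Flaky(
                f"Inconsistent test results! Test case was {node.transition!r} "
                f"on first run but {new_transition!r} on second"
            )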

@Zac-HD (Member) commented Mar 30, 2024

Regardless of what else happens here, I think we should:

  • Overhaul that error message. Specifically, I want it to look like <message>\n first: ...\n ...\n{what this means}. We can use our pretty-printer to show everything within a reasonable line-length without wasting vertical space. The last chunk there should be a clear 1-3 sentence explanation of what it means for Hypothesis to consider a test flaky and why it matters. I'd also be keen to offer automatic suggestions for resolving the problem, if we can think of any (see the sketch after this list).
  • Fix python-trio/trio#2981 upstream ("Move pytest-trio's hook for deterministic Hypothesis tests upstream into Trio").
    We don't need additional work for anyio, because asyncio has deterministic scheduling by default.
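One possible shape for the overhauled message from the first bullet; the flaky_message helper is hypothetical, pretty stands in for Hypothesis' vendored pretty-printer, and the wording is illustrative rather than anything shipped:

    # Illustrative only: one conclusion per line, followed by a short
    # explanation of what "flaky" means and why it matters.
    def flaky_message(first, second, pretty=repr):
        return (
            "Inconsistent results from replaying a test case!\n"
            f"  first run:  {pretty(first)}\n"
            f"  second run: {pretty(second)}\n"
            "Hypothesis saw different outcomes on two runs with the same "
            "inputs. Flaky tests like this cannot be reliably shrunk or "
            "replayed, so the underlying nondeterminism (shared state, "
            "timing, ordering, ...) is worth tracking down first."
        )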

Beyond that though, I'd prefer not to weaken our flakiness detection - once your test is nondeterministic to the point that the same choice sequence can give different test outcomes, it's a matter of pure luck whether you can still shrink and replay failing examples.

Zac-HD added the interop label on Mar 30, 2024