
Explaining failing examples - by showing which arguments (don't) matter #3411

Closed
Zac-HD opened this issue Jul 17, 2022 · 2 comments · Fixed by #3555
Labels: internals, legibility, new-feature

Zac-HD commented Jul 17, 2022

Hypothesis has many features designed to help users find bugs - but helping users understand bugs is equally important! Our headline feature for that is shrinking, but I think we should treat minimal failing examples as a baseline¹. That's why I implemented basic fault-localization in explain mode, and want to take that further by generalizing failing examples.

One key insight here is that the feature should be UX-first, defined by the question "what output would help users understand why their test failed?"². The approach I've chosen (sketched in code just after this list) amounts to:

  1. Shrink to a minimal failing example,
  2. Determine which arguments can be freely varied without changing the failure, and
  3. Print a comment like # or any other generated value next to each such argument.
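
As a minimal sketch of steps (2)-(3) at the argument level - where explain, still_fails, and generate are hypothetical stand-ins for illustration, not real Hypothesis internals:

import random

def explain(minimal_example, still_fails, generate, attempts=10):
    # Step (2): hold every other argument at its shrunk value and re-draw
    # just this one; if the test (probably) fails every time, the argument
    # can be varied freely.
    freely_varying = set()
    for name in minimal_example:
        trials = ({**minimal_example, name: generate(name)} for _ in range(attempts))
        if all(still_fails(kwargs) for kwargs in trials):
            freely_varying.add(name)
    # Step (3): annotate the report for arguments that never mattered.
    print("Falsifying example: test_fn(")
    for name, value in minimal_example.items():
        comment = "  # or any other generated value" if name in freely_varying else ""
        print(f"    {name}={value!r},{comment}")
    print(")")

explain(
    {"x": 0, "y": 0},
    still_fails=lambda kwargs: kwargs["y"] == 0,  # e.g. a test computing x / y
    generate=lambda name: random.randint(-100, 100),
)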

Of these, the difficult part is modifying the conjecture internals for (2):

  • Identify the span corresponding to each argument to @given
  • Replay up to the start of that span, use new random bits within it, and replay the suffix after the span (using some new ConjectureData internals; see the sketch after this list)
  • Track which arguments ever failed to reproduce the failure. Optimization: check which previously-executed examples met the criteria and count them towards the analysis.
  • We'll use a distinct comment when varying all of these arguments together still reproduces the failure, and otherwise just report the arguments that reproduce it when varied one at a time. Trying to report subsets is confusing, expensive to compute, and not that useful.
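
A simplified buffer-level sketch of that replay step, modelling the choice sequence as plain bytes rather than the real ConjectureData internals - buffer, span, and fails here are hypothetical stand-ins:

import random

def argument_varies_freely(buffer, span, fails, attempts=10):
    # buffer: the shrunk failing choice sequence, as bytes
    # span: a (start, end) slice covering one argument to @given
    # fails: re-runs the test from a buffer, returning True if it still fails
    start, end = span
    for _ in range(attempts):
        fresh = bytes(random.randrange(256) for _ in range(end - start))
        # Replay the prefix, substitute new random bits within the span,
        # then replay the saved suffix.
        candidate = buffer[:start] + fresh + buffer[end:]
        if not fails(candidate):
            return False  # varying these bits changed the outcome: it matters
    return True  # failure reproduced every time: flag as freely varying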

This approach is coarser-grained than the prior art (see #2192), but conversely can be used with data that does not match a context-free grammar. On the whole, I like it much more 🙂

Footnotes

  1. not least because the threshold problem can make failures look less important, e.g. https://github.com/HypothesisWorks/hypothesis/issues/2180

  2. rather than e.g. "what cool algorithmic or instrumentation trick could I pull?", as is considerably more common.

@Zac-HD added the new-feature, legibility, and internals labels on Jul 17, 2022
Zac-HD commented Jul 17, 2022

A simple but concrete example to illustrate:

from hypothesis import given, strategies as st

@given(st.integers(), st.integers())
def test_division(x, y):
    x / y

Currently reports:

Falsifying example: test_division(
    y=0, x=0,
)

Desired report:

Falsifying example: test_division(
    x=0,  # or any other generated value
    y=0,
)

Zac-HD commented Jan 9, 2023

I have a working prototype! It only shows comments if the end of the buffer can vary, but I've plumbed everything through, and handling arbitrary segments shouldn't be much harder - just need to work out when to start replaying the saved suffix. Still very exciting to see 🎉

update: complete working implementation at Zac-HD/hypothesis@creation-reprs...which-parts-matter
