
RFC: Pull coverage guided testing from the feature set #1551

Closed
DRMacIver opened this issue Aug 30, 2018 · 10 comments · Fixed by #1564
Labels
opinions-sought performance

Comments

@DRMacIver
Member

DRMacIver commented Aug 30, 2018

I like that we have coverage-guided testing, but it causes a bunch of problems (e.g. #1392, #1493, and general performance issues), and at the sort of workloads we actually have a good UX for right now, I don't think it can realistically ever pull its weight.

On the UX front, I think what we want is some sort of fuzzer support for Hypothesis, so that we can e.g. run a Hypothesis-based test under python-afl. Most of the benefits of the coverage guidance would be better served by adding that, and the current generation of coverage-guided testing adds very little, either now or towards that goal.

So I'd like to propose that we do the following:

  • Rip out all of the coverage-guidance code, so that all behaviour is equivalent to the current behaviour with use_coverage=False
  • Deprecate the use_coverage setting
  • Open up a ticket describing a concrete plan towards fuzzer support (I actually think this can be done quite easily)

Thoughts?

@DRMacIver DRMacIver added opinions-sought performance language: Python labels Aug 30, 2018
@Zac-HD
Member

Zac-HD commented Aug 31, 2018

Well... python-afl is going to have exactly the same problem as discussed in #1493 (still using sys.settrace) and the only way it wouldn't also reproduce #1392 is if that's something about our @proxies decorator instead of coverage (which I think is likely).
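The sys.settrace conflict can be demonstrated with a stdlib-only sketch: CPython allows only one trace function per thread, so any two settrace-based tools (a coverage collector, a debugger, or settrace-driven guidance) displace each other rather than composing.

```python
import sys

# Stdlib-only illustration of the settrace conflict: installing a second
# trace function silently replaces the first, so settrace-based tools
# cannot run side by side.

events = []

def tracer_a(frame, event, arg):
    events.append("a")
    return tracer_a

def tracer_b(frame, event, arg):
    events.append("b")
    return tracer_b

def traced():
    return 1 + 1

sys.settrace(tracer_a)
traced()
sys.settrace(tracer_b)  # silently replaces tracer_a; "a" stops firing
traced()
sys.settrace(None)

assert "a" in events and "b" in events
assert "a" not in events[events.index("b"):]  # after the swap, only "b" fires
```

This is exactly why running a test suite under both coverage.py and a settrace-based guidance mechanism requires hacks rather than clean composition.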

So while I wouldn't mind an option to have test case generation driven by afl, I don't think removing or deprecating the current coverage-driven system would be helpful. I could be convinced to have use_coverage=False as the default, though.

@Zalathar
Contributor

Zalathar commented Aug 31, 2018

True, but I think the idea is that people will be more willing to accept those limitations in a dedicated fuzzing mode, as long as they can run debuggers and coverage on their “normal” test runs without jumping through hoops.

@DRMacIver
Member Author

DRMacIver commented Aug 31, 2018

Well... python-afl is going to have exactly the same problem as discussed in #1493 (still using sys.settrace) and the only way it wouldn't also reproduce #1392 is if that's something about our @proxies decorator instead of coverage (which I think is likely).

Yes, but I think that's OK.

  • It would result in things working by default, which is an important property
  • The fuzzing workflow would populate the example database, so you could then run it with Hypothesis in normal mode
  • There is vastly less code (like three orders of magnitude less) in the pathway from fuzzing to running Hypothesis than there is in the normal Hypothesis execution, so there is much less to go wrong and need debugging.
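The database-population point can be sketched with a stdlib-only stand-in. MiniExampleDatabase below is hypothetical glue in the spirit of Hypothesis's real DirectoryBasedExampleDatabase, not an actual API: the fuzzing run saves interesting byte buffers under a per-test key, and a normal Hypothesis run replays whatever it finds under the same key.

```python
import hashlib
import os
import tempfile

class MiniExampleDatabase:
    """Hypothetical directory-backed store; keys and values are bytes."""

    def __init__(self, root):
        self.root = root

    def _dir(self, key):
        d = os.path.join(self.root, hashlib.sha384(key).hexdigest()[:16])
        os.makedirs(d, exist_ok=True)
        return d

    def save(self, key, value):
        path = os.path.join(self._dir(key), hashlib.sha384(value).hexdigest()[:16])
        with open(path, "wb") as f:
            f.write(value)

    def fetch(self, key):
        out = []
        for name in sorted(os.listdir(self._dir(key))):
            with open(os.path.join(self._dir(key), name), "rb") as f:
                out.append(f.read())
        return out

db = MiniExampleDatabase(tempfile.mkdtemp())
db.save(b"test_foo", b"\x00\x01crashing-input")      # written by the fuzzing run
assert b"\x00\x01crashing-input" in db.fetch(b"test_foo")  # replayed by a normal run
```

Because the store is just files keyed by test, an external fuzzer and a normal test run can share it without either needing to know about the other.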

So while I wouldn't mind an option to have test case generation driven by afl, I don't think removing or deprecating the current coverage-driven system would be helpful.

I don't think that's the right way of looking at it. The question is not "would removing it be helpful?" but "would retaining it be helpful?". It's a very costly feature in terms of sharp edges for users, performance, and code complexity, and it offers no user-visible functionality except for allegedly being better at finding bugs; I'm not convinced it either does or can pull its weight there. The original idea was that it would improve over time as I did more research into the subject, but the reality is that (a) I'm not going to be doing that research in the next year or two, and (b) it's really unclear whether this is a problem worth solving within the normal Hypothesis execution budget.

I could be convinced to have use_coverage=False as the default, though.

Setting use_coverage=False was going to be my initial proposal, but then I asked myself the following question: Under what circumstances would we ever recommend setting use_coverage to True? I couldn't come up with any. Can you?

@Zac-HD
Member

Zac-HD commented Aug 31, 2018

The question is not "would removing it be helpful?" but "would retaining it be helpful?".

Setting use_coverage=False was going to be my initial proposal, but then I asked myself the following question: Under what circumstances would we ever recommend setting use_coverage to True? I couldn't come up with any. Can you?

When you want to run in fuzzing mode! My suggestion is basically to start by flipping the default, then add the fuzzing mode, which makes use_coverage=True use a python-afl backend (and emits a HypothesisWarning if max_examples <= 9999).

So retaining it basically buys us out of a compatibility deprecation cycle, and clarifies the messaging around what the setting is for.
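The suggested check can be sketched with the stdlib warnings module. The class and threshold here are stand-ins taken from the comment above, not real Hypothesis internals:

```python
import warnings

class HypothesisWarning(UserWarning):
    """Stand-in for Hypothesis's real HypothesisWarning class."""

def check_fuzzing_settings(use_coverage, max_examples):
    # Warn when the coverage/fuzzing setting is enabled but max_examples
    # is too small for fuzzing to be worthwhile (threshold from the
    # comment above).
    if use_coverage and max_examples <= 9999:
        warnings.warn(
            "use_coverage=True is intended for fuzzing mode, but "
            "max_examples=%d is probably too small to benefit" % max_examples,
            HypothesisWarning,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_fuzzing_settings(use_coverage=True, max_examples=100)

assert len(caught) == 1
assert issubclass(caught[0].category, HypothesisWarning)
```

The point of the warning is purely messaging: steering the flag toward "long fuzzing runs only" rather than leaving it ambiguous.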

@DRMacIver
Member Author

DRMacIver commented Aug 31, 2018

When you want to run in fuzzing mode! My suggestion is basically to start by flipping the default, then add the fuzzing mode, which makes use_coverage=True use a python-afl backend (and emits a HypothesisWarning if max_examples <= 9999).

No, the setting is basically useless for fuzzing mode, and we should be using someone else's fuzzer for that. It will work better and be less effort for us to maintain.

@DRMacIver
Member Author

DRMacIver commented Aug 31, 2018

Oh, sorry, I misread part of that:

use a python-afl backend (and emits a HypothesisWarning if max_examples <= 9999).

This also can't happen. The whole point of using someone else's fuzzer is that:

  1. The current workflow and UI for using Hypothesis as a fuzzer is very bad to the point of almost total uselessness.
  2. The current backend for using Hypothesis as a fuzzer is also very bad to the point of almost total uselessness.

The solution to (1) in particular is that we let AFL do what it basically demands to do and own main. In this case AFL plays a role analogous to something like pytest: it's not something under our control.
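The "fuzzer owns main" shape can be sketched as follows. This is a hedged illustration only: run_one_test_case is hypothetical glue, not a real Hypothesis API, and real AFL integration would go through python-afl rather than a bare script.

```python
import sys

def run_one_test_case(buf):
    # Hypothetical glue: real code would feed `buf` to Hypothesis's
    # byte-stream-based data generation to drive one test case. Here we
    # just fail on a marker to show the shape.
    if buf.startswith(b"BOOM"):
        raise AssertionError("fuzzer found a failing input")

def main():
    # The fuzzer execs this script and supplies one byte buffer on stdin;
    # an uncaught exception is what it observes as a failure. None of
    # Hypothesis's settings are involved: the fuzzer owns the loop.
    data = sys.stdin.buffer.read()
    run_one_test_case(data)
    return 0

# Entry point when run under the fuzzer (commented out so the sketch can
# be imported without blocking on stdin):
#   if __name__ == "__main__":
#       sys.exit(main())
```

Because the external tool drives execution end to end, the harness stays tiny, which is the "vastly less code in the pathway" point made earlier in the thread.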

@Zac-HD
Member

Zac-HD commented Aug 31, 2018

Huh, I was thinking about basically using afl to drive the getbits primitive in a long-running process via a shim that AFL could control.

In any case, my position is basically just leave the actual deprecation until we have a working version of the new fuzzing workflow.

@DRMacIver
Member Author

DRMacIver commented Aug 31, 2018

Huh, I was thinking about basically using afl to drive the getbits primitive in a long-running process via a shim that AFL could control.

Yes, that's the idea, but if we're doing that then literally none of the settings are relevant because nothing is under our control.

In any case, my position is basically just leave the actual deprecation until we have a working version of the new fuzzing workflow.

I don't see why we should gate removing a feature that has never really lived up to its promise on another feature that does what it would have done if it had worked. But if you really feel that strongly, I don't object to simply turning use_coverage=False into the default and sitting back as literally nobody ever sets it to True, because there is no good reason to do so.

@DRMacIver
Member Author

DRMacIver commented Sep 8, 2018

nedbat/coveragepy#702 has reminded me of another non-trivial cost of keeping this feature: our interaction with coverage is brittle and unofficial, and makes life worse for @nedbat as well as for us, and fixing that issue is going to require a non-trivial amount of debugging and probably more bad hacks on the Coverage API.

Given this, unless someone comes up with at least one example of an actual concrete benefit that retaining this feature would bring, I am very strongly inclined to go ahead with the removal process.

@Zac-HD if you still don't like the deprecation plan I am happy to remove the feature and turn the flag into a dummy flag that silently does nothing until we backfill it with something else or actually deprecate it.

@Zac-HD
Member

Zac-HD commented Sep 8, 2018

Causing real problems for other projects is enough reason to tear it out IMO, so go ahead 🚀
