To paraphrase Andreas Löscher and Konstantinos Sagonas (ISSTA 2017 and ICST 2018):

> Targeted property-based testing combines the advantages of both search-based and property-based testing. Instead of being completely random, T-PBT uses a search-based component to guide input generation towards values that have a higher probability of falsifying a property. This explores the input space more effectively and requires fewer tests to find a bug or achieve high confidence in the SUT than random PBT.
Here's a sample API for Hypothesis:
```python
def target(observation, label=None):
    # type: (float, str) -> None
    """Calling this function with a ``float`` observation gives it a way
    to do a directed, rather than heuristic or random, search for inputs
    that will cause an error.

    Observations must always be finite.  Hypothesis will try to drive the
    observed value to zero and as far from zero as possible over several
    examples, so almost any metric will work as an observation.  You can
    think of this as bundling together calls to imaginary ``target_zero``,
    ``target_positive``, and ``target_negative`` functions which are not
    available separately.

    If ``hypothesis.target`` is called multiple times within a single
    test case, the associated input will be treated as if you only called
    it once with the best result for each of the three target values.

    The optional ``label`` argument can be used to distinguish between,
    and therefore separately optimise, distinct observations, such as the
    mean and standard deviation of a dataset.
    """
```
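To make the intent concrete, here is a hedged sketch of how a test might use it - the test, strategy, and metric are invented for illustration, and it assumes `target` would be importable from the top-level `hypothesis` namespace:

```python
from hypothesis import given, target, strategies as st

@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    # Steer generation towards longer lists, where bugs are more likely
    # to hide, instead of relying on purely random sizes.
    target(float(len(xs)))
    once = sorted(xs)
    assert sorted(once) == once
```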
Non-finite and non-float observations raise an `InvalidArgument` error, to make upgrading or just messing with the search backend easier. More importantly, it prevents a poorly written test from feeding us an uninteresting test case with an infinite observation, where we would get stuck. This way, Hypothesis will shrink to the minimal example that produces an infinite observation and report that as a bug, so it gets fixed.
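A minimal sketch of that validation, assuming a hypothetical helper name (the real code path would live wherever `target` does):

```python
import math

from hypothesis.errors import InvalidArgument

def check_observation(observation):
    # Hypothetical helper: reject non-floats first, then non-finite
    # floats, so bad metrics fail loudly instead of stalling the search.
    if not isinstance(observation, float):
        raise InvalidArgument(
            "observation=%r must be a float, not %s"
            % (observation, type(observation).__name__)
        )
    if math.isinf(observation) or math.isnan(observation):
        raise InvalidArgument("observation=%r must be finite" % (observation,))
```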
All we need for an initial implementation is a way to generate bytestrings 'similar' to something existing. Then we just track the best buffer for each target so far, and instead of always doing random generation we sometimes perturb one of the current-best buffers and try that. Tada, evolutionary fuzzing!
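As a rough illustration of that loop - not Hypothesis's actual internals, and every name here is hypothetical - the whole idea fits in a dictionary of best-so-far buffers plus an occasional mutation step:

```python
import random

best = {}  # label -> (score, buffer) for the best observation seen so far

def record(label, score, buffer):
    # Remember the buffer that produced the best score for each target.
    if label not in best or score > best[label][0]:
        best[label] = (score, buffer)

def mutate(buffer):
    # Crude byte-level perturbation: overwrite, insert, or delete a byte.
    if not buffer:
        return bytes([random.randrange(256)])
    i = random.randrange(len(buffer))
    r = random.random()
    if r < 1 / 3:
        return buffer[:i] + bytes([random.randrange(256)]) + buffer[i + 1:]
    if r < 2 / 3:
        return buffer[:i] + bytes([random.randrange(256)]) + buffer[i:]
    return buffer[:i] + buffer[i + 1:]

def next_buffer(fresh):
    # Usually generate randomly, but sometimes perturb a current best.
    if best and random.random() < 0.25:
        _, buffer = random.choice(list(best.values()))
        return mutate(buffer)
    return fresh()
```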
I've been thinking of doing some internals work to make this sort of thing easier, but in principle we've already got most of the pieces for this - they're just not currently very good. Hypothesis is already a mutational fuzzer internally.
One thing I had been wondering about was the range of possible score types to include - it might be interesting to allow scores with arbitrary orderable types. In particular, I feel like there are cases where supporting some sort of lexicographic ordering might be nice. That being said, floats do make it a lot simpler from an API and error-handling point of view.
I'd be happy with "pretty good", which is probably around "works at all" for this. Especially for something like this where we can radically change the internals without breaking compatibility, I'd rather get a prototype out there than wait a long time for a polished implementation.
Only accepting finite floats will make things so much simpler - and it should be trivial for users to calculate one from whatever metric they're really using. We could even provide a few examples of e.g. shortlex scoring, with some commentary on why this is useful. A big part of my argument for finite-floats-only, though, is that it forces the user to think about what the metric actually means, and picking a good metric is really, really important to make gradient descent work well.
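For instance, here is one hedged sketch of how a user could collapse a shortlex ordering (length first, ties broken lexicographically) into a single finite float for `target`; the helper name, byte-sized alphabet, and truncation depth are arbitrary illustrative choices, not a proposed API:

```python
def shortlex_score(s):
    # Length dominates the score; a fractional component in [0, 1) breaks
    # ties lexicographically.  Assumes code points below 256, and only the
    # first few characters contribute, keeping the float finite and precise.
    frac = 0.0
    for i, ch in enumerate(s[:6]):
        frac += ord(ch) / 256.0 ** (i + 1)
    return len(s) + frac
```

A test would then just call `target(shortlex_score(s))`, and Hypothesis can optimise it like any other float observation.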
Aims are to develop an API that we can commit to (easy), and a proof-of-concept implementation which is good enough to be worth using (less so). Fortunately I suspect that "works at all" is also "good enough" - but hopefully we'll find out soon!