
Support 'targeted PBT' for generating examples #1779

Closed
Zac-HD opened this issue Jan 26, 2019 · 3 comments · Fixed by #2006

Labels: new-feature (entirely novel capabilities or strategies)


Zac-HD commented Jan 26, 2019

To paraphrase Andreas Löscher and Konstantinos Sagonas (ISSTA 2017 and ICST 2018),

Targeted property-based testing combines the advantages of both search-based and property-based testing. Instead of being completely random, T-PBT uses a search-based component to guide input generation towards values that have a higher probability of falsifying a property. This explores the input space more effectively, and requires fewer tests to find a bug or achieve high confidence in the SUT than random PBT.

Here's a sample API for Hypothesis:

def target(observation, label=None):
    # type: (float, Optional[str]) -> None
    """Calling this function with a ``float`` observation gives Hypothesis
    a way to do a directed, rather than heuristic or random, search for
    inputs that will cause an error.  Observations must always be finite.

    Hypothesis will try to drive the observed value to zero and as far from
    zero as possible over several examples, so almost any metric will work
    as an observation.  You can think of this as bundling together calls to
    imaginary ``target_zero``, ``target_positive``, and ``target_negative``
    functions which are not available separately.

    If ``hypothesis.target`` is called multiple times within a single test
    case, the observations will be treated as if you had called it once
    with the best result for each of the three target values.

    The optional ``label`` argument can be used to distinguish, and
    therefore separately optimise, distinct observations, such as the
    mean and standard deviation of a dataset.
    """

Non-finite and non-float observations raise an InvalidArgument error, to make upgrading or just messing with the search backend easier. More importantly, it prevents a poorly implemented test from feeding us an uninteresting test case with an infinite observation, where we would get stuck. This way, Hypothesis will shrink to a minimal example that produces an infinite observation and report that as a bug in the test, so it gets fixed.
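
A sketch of the validation this implies - the helper name is invented, but InvalidArgument is Hypothesis's existing error type for bad arguments:

import math

from hypothesis.errors import InvalidArgument

def check_valid_observation(observation):
    # Hypothetical helper: only finite floats may be observed.
    if not isinstance(observation, float):
        raise InvalidArgument("observation=%r must be a float" % (observation,))
    if math.isinf(observation) or math.isnan(observation):
        raise InvalidArgument("observation=%r must be finite" % (observation,))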

All we need for an initial implementation is a way to generate bytestrings 'similar' to something existing. Then we just track the best buffer for each target so far, and instead of always doing random generation we sometimes perturb one of the current-best buffers and try that. Tada, evolutionary fuzzing!
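
A toy version of that loop, entirely separate from Hypothesis's real internals - the 50/50 explore/exploit split and the single maximised score are simplifications:

import random

def similar(buffer):
    # A bytestring 'similar' to an existing one: a copy with one byte replaced.
    i = random.randrange(len(buffer))
    return buffer[:i] + bytes([random.randrange(256)]) + buffer[i + 1:]

def fuzz(score, n_iterations=10000, size=64):
    # Track the best buffer so far; sometimes perturb it instead of
    # generating a fresh random buffer.
    best_buf = bytes(random.randrange(256) for _ in range(size))
    best_score = score(best_buf)
    for _ in range(n_iterations):
        if random.random() < 0.5:
            candidate = similar(best_buf)  # exploit the current best
        else:
            candidate = bytes(random.randrange(256) for _ in range(size))  # explore
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_buf, best_score = candidate, candidate_score
    return best_buf, best_score

# e.g. maximising the number of 0xFF bytes converges far faster than
# unguided random generation would:
print(fuzz(lambda buf: float(buf.count(0xFF))))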

Zac-HD added the new-feature label on Jan 26, 2019

DRMacIver commented

I've been thinking of doing some internals work to make this sort of thing easier, but in principle we've already got most of the pieces for this; they're just not currently very good. Hypothesis is already a mutational fuzzer internally.

One thing I had been wondering about was the range of possible score types to include - I feel like it might be interesting to add scores with arbitrary orderable types. In particular I feel like there are cases where supporting some sort of lexicographic ordering might be nice. That being said, floats do make it a lot simpler from an API and error handling point of view.


Zac-HD commented Jan 26, 2019

I'd be happy with "pretty good", which is probably around "works at all" for this. Especially for something like this where we can radically change the internals without breaking compatibility, I'd rather get a prototype out there than wait a long time for a polished implementation.

Only accepting finite floats will make things so much simpler - and it should be trivial for users to calculate one from whatever metric they're really using. We could even provide a few examples of e.g. shortlex scoring, with some commentary on why this is useful. A big part of my argument for finite-floats-only, though, is that it forces the user to think about what the metric actually means, and picking a good metric is really important to make gradient descent work well.
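
For example, a hypothetical shortlex score flattened into a single finite float, where length dominates and a few leading bytes break ties (only approximately, since a float's 53-bit mantissa cannot hold the whole string):

def shortlex_score(s):
    # type: (bytes) -> float
    # Length is the primary key; a base-256 fraction of the first few
    # bytes is the (approximate) lexicographic tie-breaker.
    frac = sum(byte / 256.0 ** (i + 1) for i, byte in enumerate(s[:6]))
    return len(s) + frac

# inside a test: target(shortlex_score(data), label="shortlex")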

Zac-HD self-assigned this on Jun 8, 2019

Zac-HD commented Jun 8, 2019

I'm going to try a spike over this weekend at master...Zac-HD:targeted-pbt.

Aims are to develop an API that we can commit to (easy), and a proof-of-concept implementation which is good enough to be worth using (less so). Fortunately I suspect that "works at all" is also "good enough" - but hopefully we'll find out soon!
