Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up fair integer drawing #366

Merged
merged 1 commit into from
Sep 30, 2016

Conversation

cpennington
Copy link
Contributor

Without this patch, drawing an integer outside the valid range marks the
data as invalid, which forces a re-draw of all other data up to that
point. After the patch, we immediately redraw the integer until it is
within the target range. In practice, this doubled the number of valid
examples found within a bounded time period.

@DRMacIver
Copy link
Member

I'm curious if you've seen this make an improvement in practice. I'm happy to merge it if you get the build green, but I suspect that it mostly doesn't help that much because of the way the bytes get generated meaning it rarely actually fails this check.

@cpennington
Copy link
Contributor Author

Yeah, I made the change because I was running a test with hypothesis-print-statistics, and saw about equal numbers of valid and invalid examples. I dug in, and saw that this mark_invalid was responsible. When I made this change, I ended up with almost no invalid examples (and twice the number of valid ones).


probe = gap + 1
while probe > gap:
probe = int_from_bytes(data.draw_bytes(nbytes, byte_distribution)) & mask
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is currently too long (81 characters I think). If you run 'make format' and commit the changes it will fix that for you.

probe = int_from_bytes(data.draw_bytes(nbytes, byte_distribution)) & mask

probe = gap + 1
while probe > gap:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this loop always executes at least once by design, coverage is complaining about it. If you add a "# pragma: no branch" to it that will fix it.

@DRMacIver
Copy link
Member

Interesting. In that case I'm definitely happy to merge it! I've added some review comments explaining what you need to do to get a green build.

@cpennington
Copy link
Contributor Author

Ok, I've fixed the formatting/coverage issues.

I think in general, re-drawing rather than declaring a sample invalid will lead to faster results, since it minimizes the amount of work that's re-done. Not sure if there's a good way to add catches for UnsatisfiedAssumption that would be generally correct while still giving that behavior.

@DRMacIver
Copy link
Member

I think in general, re-drawing rather than declaring a sample invalid will lead to faster results, since it minimizes the amount of work that's re-done. Not sure if there's a good way to add catches for UnsatisfiedAssumption that would be generally correct while still giving that behavior.

This happens in a few places (e.g. composite does this for UnsatisfiedAssumption within it).

There's a sort of balancing act to do with more complex situations though - sometimes aborting the test is the correct thing to do because otherwise you're going to keep retrying but never succeeding, stopping only when you hit the end of the buffer, which is potentially quite slow. In filter and composite I think he default is that it retries three times and then gives up.

I agree that for small fast cases like this one you're probably right though.

@cpennington
Copy link
Contributor Author

Yeah, that makes sense.

probe = int_from_bytes(
data.draw_bytes(nbytes, byte_distribution)
) & mask

if probe <= gap:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misdiagnosed the coverage issue before. The actual problem is that this condition is now always satisfied because the previous bit loops until it is, so the if is now redundant.

Without this patch, drawing an integer outside the valid range marks the
data as invalid, which forces a re-draw of all other data up to that
point. After the patch, we immediately redraw the integer until it is
within the target range. In practice, this doubled the number of valid
examples found within a bounded time period.
@cpennington
Copy link
Contributor Author

@DRMacIver: Can this merge now?

@DRMacIver
Copy link
Member

It can! Sorry I didn't pick up on your last change sooner. Thanks!

@DRMacIver DRMacIver merged commit 6ce793f into HypothesisWorks:master Sep 30, 2016
@cpennington
Copy link
Contributor Author

Great, thanks!

@cpennington cpennington deleted the fast-integers branch September 30, 2016 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants