Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Also calculate blocks lazily #1830

Merged
merged 18 commits into from Feb 28, 2019

Conversation

Projects
None yet
3 participants
@DRMacIver
Copy link
Member

DRMacIver commented Feb 22, 2019

Hey, psst, buddy, would you like to see some really weird code? Then boy do I have the PR for you.

Following on from #1828 where we calculated examples on first use, this also allows us to calculate Block objects on first usage.

Unfortunately the big differences between Example and Block are that a) We use blocks a lot more and b) We need them during generation time. This means that the Example approach of just deferring until we first access the collection is a non-starter.

So this does uh something else. The basic idea is that we can create a collection that only looks like a list of blocks but actually only stores the endpoints until we first ask it for each block object.

Important features:

  • If we only ask for a small number of blocks then we only build a small number of blocks. e.g. stateful testing asks for the last block generated (due to slightly dodgy reasons), and doing that should not cause all blocks to be calculated.
  • A lot of things that could allocate blocks are exposed as helper methods that don't.
  • We do a bunch of probably over-clever stuff for transforming the representation into more memory efficient forms as we go.

Builds on top of #1826, so earlier commits are from that PR.

@DRMacIver DRMacIver force-pushed the DRMacIver/magic-blocks branch 2 times, most recently from 2730259 to e34c5ed Feb 22, 2019

@Zalathar

This comment has been minimized.

Copy link
Contributor

Zalathar commented Feb 27, 2019

We need them during generation time. This means that the Example approach of just deferring until we first access the collection is a non-starter.

I suspect that the number of places that actually need blocks at generation-time is small enough that we can edit them all to not need an actual Block.

If so, then we could assume that block instances aren't created until after generation is finished and the ConjectureData is frozen. That could eliminate the need for some of the resizing code, even if we still need lazy instance-creation.

(However, I haven't investigated whether this actually works in practice.)

I tried this briefly, and (predictably) it was not as easy as I thought. I still think it's possible in principle, but it needs more setup than I'm willing to do at the moment, so don't worry about it.

@DRMacIver

This comment has been minimized.

Copy link
Member Author

DRMacIver commented Feb 27, 2019

I tried this briefly, and (predictably) it was not as easy as I thought. I still think it's possible in principle, but it needs more setup than I'm willing to do at the moment, so don't worry about it.

Yeah this is about where I got to.

ETA: Also I think that even if we didn't need them during generation time this would probably still be a win because of the reduced storage requirements - many block usages don't actually need the whole list. I'm actually wondering about making examples a similarly lazy collection, just one where we still have to start by doing the hard work of precomputing the endpoints structure.

@DRMacIver DRMacIver force-pushed the DRMacIver/magic-blocks branch from e34c5ed to 121803f Feb 27, 2019

@Zac-HD

Zac-HD approved these changes Feb 27, 2019

Copy link
Member

Zac-HD left a comment

I'd add another assertion to catch mistakes in use, but otherwise LGTM.

@Zac-HD

This comment has been minimized.

Copy link
Member

Zac-HD commented Feb 27, 2019

Closing and reopening to reset CI...

@Zac-HD Zac-HD closed this Feb 27, 2019

@Zac-HD Zac-HD reopened this Feb 27, 2019

@DRMacIver DRMacIver force-pushed the DRMacIver/magic-blocks branch from 00d54eb to 9680f8f Feb 27, 2019

# actual result, but this isn't the best representation once we
# stop being sparse and want to use most of the blocks. Switch
# over to a list at that point.
if self.__sparse and len(self.__blocks) * 2 >= len(self):

This comment has been minimized.

@Zalathar

Zalathar Feb 28, 2019

Contributor

I can imagine a scenario where this gets tripped early during generation (e.g. 50% of 2 blocks is 1 block), and then we end up switching over to a list when it would be better to stick with the dictionary for a while.

I don’t know how realistic that is, and the penalty should only be a modest space increase, so I don’t know if it’s worth worrying about.

This comment has been minimized.

@DRMacIver

DRMacIver Feb 28, 2019

Author Member

Yeah I wondered about that. My inclination that it was probably not worrying about until we see it in the wild as a significant problem - the usage patterns that trigger it are likely to be not that common and the failure mode is just increased memory usage. Also until we actually see it happening it's hard to know what a good heuristic for avoiding it is. I could add a lower bound if you want (say don't trigger it until we have at least 5 elements?) but I thought it might be better to wait and see.

This comment has been minimized.

@Zalathar

Zalathar Feb 28, 2019

Contributor

I think it's OK to try as-is.

This comment has been minimized.

@Zac-HD

Zac-HD Feb 28, 2019

Member

IMO we shouldn't complicate our optimisations unless and until we actually observe a problem.

At which point porting to Rust might be a better choice anyway...

This comment has been minimized.

@DRMacIver

DRMacIver Feb 28, 2019

Author Member

At which point porting to Rust might be a better choice anyway...

I have found myself thinking "goddammit I wish I'd moved this code to Rust already" so many times in the last two weeks.

@DRMacIver DRMacIver force-pushed the DRMacIver/magic-blocks branch from 9680f8f to ffd6f0c Feb 28, 2019

@DRMacIver DRMacIver merged commit 58e1892 into master Feb 28, 2019

4 checks passed

HypothesisWorks.hypothesis #20190228.8 succeeded
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@DRMacIver DRMacIver deleted the DRMacIver/magic-blocks branch Feb 28, 2019

@Zalathar

This comment has been minimized.

Copy link
Contributor

Zalathar commented Feb 28, 2019

After some more poking around, I confirmed that there's only one genuine attempt to obtain a Block during generation, and it's easy to update if necessary:

block_length = data.blocks[-1].length

If I find a way to turn this knowledge into a useful simplification of Blocks, I'll open a separate PR.

@DRMacIver

This comment has been minimized.

Copy link
Member Author

DRMacIver commented Feb 28, 2019

If I find a way to turn this knowledge into a useful simplification of Blocks, I'll open a separate PR.

I guess we could just always write the length of the last block generated to the ConjectureData object. I do think this won't dramatically simplify Blocks (the lazy creation is still desirable) but it would be nice to be able to remove some of the logic around whether the collection could still grow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.