
Factor out iteration over pairs of blocks #1594

Merged
merged 5 commits on Sep 24, 2018

Conversation

Zalathar (Contributor):

This change introduces Shrinker.for_each_pair_of_blocks, and updates shrinker passes shrink_offset_pairs and minimize_block_pairs_retaining_sum to use it instead of manually iterating over block pairs.

Along the way it introduces a Block class that ConjectureData now uses to record its block information, instead of keeping a list of (start, end) pairs.
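As a rough illustration of the record described above, a Block of this kind remembers its position in the block list plus its (start, end) bounds, replacing bare tuples. This is a hypothetical sketch with illustrative names, not the PR's exact definition:

```python
class Block:
    """Sketch of a record replacing bare (start, end) tuples."""

    def __init__(self, index, start, end):
        self.index = index  # position in ConjectureData's block list
        self.start = start  # offset of the block's first byte in the buffer
        self.end = end      # offset one past the block's last byte

    @property
    def bounds(self):
        # The old tuple representation, still available where needed.
        return (self.start, self.end)

    @property
    def length(self):
        return self.end - self.start
```

Keeping `bounds` as a property lets call sites that relied on the old tuples migrate gradually.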

@Zalathar (Contributor Author):

(This isn't really motivated by anything in particular, other than the fact that it seemed like a nice cleanup.)

@alexwlchan (Contributor) left a comment:

Based on a cursory reading, it looks like the diffs are functionally equivalent.

One thought: it looks like Block is created once and then never modified. You can set frozen=True (at a small runtime performance cost), and then any attempt to modify an instance will raise an exception. Not sure if it's worth adding here, but noting for interest: http://www.attrs.org/en/stable/api.html#attr.s
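The frozen-class behaviour being suggested looks roughly like this. The comment refers to attrs' `@attr.s(frozen=True)`; this sketch uses the stdlib `dataclasses` analogue so it runs without attrs installed, and the field names are illustrative:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)  # attrs equivalent: @attr.s(frozen=True)
class Block:
    start: int
    end: int

b = Block(0, 4)
try:
    b.start = 1  # any mutation attempt raises
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

The trade-off is the same in both libraries: assignment goes through a checking `__setattr__`, which costs a little at construction time but turns accidental mutation into a loud failure.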

@@ -123,6 +144,9 @@ def depth(self):
def index(self):
return len(self.buffer)

def each_block_bounds(self):
return (block.bounds for block in self.blocks)
Member:

I'd be more comfortable returning a list than a generator here - slightly less efficient, but fewer ways to stuff up later.
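One concrete "way to stuff up" with a generator: it can only be consumed once, so a caller that iterates it twice silently sees nothing on the second pass. A minimal illustration (names are stand-ins, not the real code):

```python
def bounds_as_generator(blocks):
    # Generator version: cheap, but single-use.
    return ((start, end) for (start, end) in blocks)

blocks = [(0, 2), (2, 5)]
gen = bounds_as_generator(blocks)

first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted

# A list doesn't have this failure mode:
as_list = [(start, end) for (start, end) in blocks]
```

Returning a list trades a small allocation for immunity to this silent-empty-second-iteration bug.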

Contributor Author:

Originally the idea was to not need this method at all. It exists as a concession to the existing code sites that did for u, v in data.blocks, because I couldn't find a cleaner way to update them.

I have a slight preference for it being a generator, to encourage any new code to use data.blocks directly instead.

But I could go either way on it, and I can't deny that I already made one of the mistakes you hinted at, in a previous version of update_shrink_target.

Member:

I don't have a strong feeling either way, but FWIW using list comprehensions is usually slightly more efficient, due to the vagaries of Python performance. It's almost certainly dwarfed by the other micro-inefficiencies in the implementation of the shrinker though.

@DRMacIver (Member) left a comment:

This is a nice cleanup, thanks! I've left some thoughts on a few specific things, but they're all entirely optional, and I'm happy for this to be merged when you're happy to merge it.

i = block_i.index
j = block_j.index

value_i = int_from_block(i)
Member:

This is not a request for a change, more of a thought for discussion: when I was looking at the Block class, I thought it was a shame that you couldn't access a block's contents from the class itself. I wonder if it would make sense to give Block a weak reference to the enclosing data, so that the contents could be calculated on demand (actually storing them on each block would be quite memory hungry); then this would read value_i = block_i.as_int() or something.
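A rough sketch of the weak-reference idea. The names here (`_owner`, `as_int`, the stand-in `Data` class) are hypothetical, not anything the PR defines; the weak reference keeps blocks from pinning the enclosing data object or forming a reference cycle:

```python
import weakref

class Block:
    def __init__(self, owner, start, end):
        # Weak back-reference to the enclosing data object, so blocks
        # don't keep it alive and there is no reference cycle.
        self._owner = weakref.ref(owner)
        self.start = start
        self.end = end

    def as_int(self):
        # Compute the block's value on demand from the owner's buffer.
        data = self._owner()
        assert data is not None, "enclosing data object was collected"
        return int.from_bytes(data.buffer[self.start:self.end], "big")

class Data:  # stand-in for the enclosing ConjectureData
    def __init__(self, buffer):
        self.buffer = buffer

data = Data(b"\x00\x01\x02\xff")
block = Block(data, 1, 3)
```

Here `block.as_int()` reads bytes 1–3 of the owner's buffer and interprets them big-endian.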

Contributor Author:

I had been thinking about similar helper methods, for slicing block data out of a buffer and for splicing it back into a candidate buffer.

(One thing to be careful of is keeping track of which buffer we actually want to read from. This can be subtle when running one of the shrinking kernels and potentially updating the shrink target between iterations.)

if new_target.blocks != self.shrink_target.blocks:
if (
len(new_target.blocks) != len(self.shrink_target.blocks) or
list(new_target.each_block_bounds()) !=
Member:

FWIW attrs gives us good equality methods, so it might make sense to just do new_target.blocks != self.shrink_target.blocks. That's technically a functionality change, though (there are some cases where this would now trigger that wouldn't have previously, but I think those are all fine), so it's up to you whether to do it. Also, we'd need to take care to exclude the reference to the enclosing example from equality if we go ahead with the idea from my other comment.
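If Block did grow a back-reference to its enclosing data, the generated equality method would need that field excluded, as suggested above. A sketch using the stdlib `dataclasses` spelling (`field(compare=False)`; attrs spells this `attr.ib(eq=False)`), with illustrative names:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Block:
    start: int
    end: int
    # Hypothetical back-reference, excluded from __eq__ (and repr) so
    # that two equal blocks from different data objects still compare equal.
    owner: object = field(default=None, compare=False, repr=False)

a = Block(0, 4, owner=object())
b =Ock = Block(0, 4, owner=object())
```

With the field excluded, equality depends only on the block's bounds, so list comparisons like `new_target.blocks != self.shrink_target.blocks` behave as intended.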

Contributor Author:

I think the presence of all_zero would make this a net loss.

We don't necessarily want to throw away change-tracking information just because a block was reduced to zeros.

When the shrinker pass 'minimize_block_pairs_retaining_sum' scans the
block list, it should include blocks containing only zeros as potential
destinations for value-shifting.

The current implementation happens to satisfy this condition, but it's
easy to accidentally break during refactoring, making the shrinker
silently worse.
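The invariant the commit message is protecting can be sketched with a hypothetical helper (not the shrinker's actual code): the pass moves value from one block to another while keeping the total constant, and an all-zero destination must remain a valid target:

```python
def shift_retaining_sum(values, src, dst, amount):
    # Move `amount` from values[src] to values[dst]; the sum is unchanged.
    result = list(values)
    result[src] -= amount
    result[dst] += amount
    return result

before = [5, 0]  # the destination block contains only zeros
after = shift_retaining_sum(before, 0, 1, 5)
```

Skipping all-zero destinations would rule out shifts like this one, which reduce an earlier block to zero while preserving the sum.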
Block information should be read-only, though it would be reasonable to
remove this flag if it becomes necessary to add extra information after
the block is created.
@Zalathar Zalathar merged commit d01835b into HypothesisWorks:master Sep 24, 2018
@Zalathar Zalathar deleted the each-pair-of-blocks branch September 24, 2018 10:15
@@ -1771,6 +1771,32 @@ def buffer(self):
def blocks(self):
return self.shrink_target.blocks

def all_block_bounds(self):
return self.shrink_target.all_block_bounds()
Member:

Argh. I've belatedly realised that this logic is wrong. The problem is that if self.shrink_target changes during iteration then this uses stale information.

My suspicion is that this pans out in a way that mostly just causes performance problems, but it would be nice to fix this.
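A minimal illustration of the staleness hazard (the names are stand-ins, not the shrinker's real structure): a list snapshotted from the current shrink target does not track a target that is later replaced:

```python
from types import SimpleNamespace

class Shrinker:
    def __init__(self):
        self.shrink_target = SimpleNamespace(blocks=[(0, 2), (2, 4)])

    def all_block_bounds(self):
        # Snapshot of the *current* target's block bounds.
        return list(self.shrink_target.blocks)

s = Shrinker()
snapshot = s.all_block_bounds()
# A successful shrink replaces the target mid-iteration...
s.shrink_target = SimpleNamespace(blocks=[(0, 1)])
# ...and the snapshot no longer reflects it.
stale = snapshot != s.shrink_target.blocks
```

Any pass that keeps iterating over `snapshot` after the replacement is working from stale bounds, which is the failure mode described above.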

Contributor Author:

I inspected all of the callers, and I'm pretty sure that none of them will update the target while iterating over this list.

(If they did, then they would have been wrong before I touched them, since they were previously iterating directly over blocks.)

The places that do need up-to-date information should already all be doing the right thing.
