Factor out iteration over pairs of blocks #1594
Conversation
(This isn't really motivated by anything in particular, other than the fact that it seemed like a nice cleanup.)
Based on a cursory reading, it looks like the diffs are functionally equivalent.
One thought: it looks like `Block` is created once and then never modified. You can set `frozen=True`, at a small runtime performance cost, so that any attempt to modify it will throw an exception. Not sure if it's worth adding here, but noting for interest: http://www.attrs.org/en/stable/api.html#attr.s
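As a minimal sketch of the `frozen=True` idea from the attrs docs linked above — the field names here are illustrative, not the actual `Block` definition in this PR:

```python
import attr

# Hypothetical sketch: a Block holding (start, end) bounds, made immutable
# with frozen=True so mutation attempts raise instead of silently succeeding.
@attr.s(frozen=True, slots=True)
class Block(object):
    start = attr.ib()
    end = attr.ib()

    @property
    def bounds(self):
        return (self.start, self.end)

b = Block(start=0, end=4)
print(b.bounds)  # (0, 4)

# With frozen=True, assignment raises attr.exceptions.FrozenInstanceError:
try:
    b.start = 1
except attr.exceptions.FrozenInstanceError:
    print("blocks are immutable")
```

The small runtime cost comes from attrs routing `__setattr__` through an exception-raising override during normal use (construction uses `object.__setattr__` directly).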
@@ -123,6 +144,9 @@ def depth(self):
    def index(self):
        return len(self.buffer)

    def each_block_bounds(self):
        return (block.bounds for block in self.blocks)
I'd be more comfortable returning a list than a generator here - slightly less efficient, but fewer ways to stuff up later.
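To illustrate the failure mode being guarded against here (a toy stand-in, not the PR's code): a generator is exhausted after one pass, so a second iteration silently sees nothing, while a list can be traversed any number of times.

```python
# Illustrative only: plain (start, end) tuples stand in for real blocks.
blocks = [(0, 2), (2, 4), (4, 8)]

def each_block_bounds_gen():
    return (b for b in blocks)      # generator: single-use

def each_block_bounds_list():
    return [b for b in blocks]      # list: safely re-iterable

gen = each_block_bounds_gen()
first_pass = list(gen)
second_pass = list(gen)             # already exhausted, so this is []
print(first_pass, second_pass)

lst = each_block_bounds_list()
print(list(lst) == list(lst))       # True: lists don't get consumed
```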
Originally the idea was to not need this method at all. It exists as a concession to the existing code sites that did `for u, v in data.blocks`, because I couldn't find a cleaner way to update them.

I have a slight preference for it being a generator, to encourage any new code to use `data.blocks` directly instead.

But I could go either way on it, and I can't deny that I already made one of the mistakes you hinted at, in a previous version of `update_shrink_target`.
I don't have a strong feeling either way, but FWIW using list comprehensions is usually slightly more efficient, due to the vagaries of Python performance. It's almost certainly dwarfed by the other micro-inefficiencies in the implementation of the shrinker though.
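A rough micro-benchmark sketch of the comparison being discussed; absolute numbers vary by interpreter and machine, so no expected winner is asserted here:

```python
import timeit

data = list(range(1000))

# Build the same list two ways: directly via a list comprehension,
# and by draining a generator expression through list().
t_listcomp = timeit.timeit(lambda: [x * x for x in data], number=2000)
t_genexpr = timeit.timeit(lambda: list(x * x for x in data), number=2000)

print("list comprehension: %.4fs" % t_listcomp)
print("list(genexpr):      %.4fs" % t_genexpr)
```

On CPython the comprehension usually edges out the generator route because it avoids the generator frame's suspend/resume overhead, which is the "vagaries of Python performance" point above.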
This is a nice cleanup, thanks! I've left some thoughts on a few specific things, but they're all entirely optional, and I'm happy for this to be merged when you're happy to merge it.
    i = block_i.index
    j = block_j.index

    value_i = int_from_block(i)
This is not a request for change, more of a thought for discussion: when I was looking at the block class, I was thinking it was a shame that you couldn't access its contents from the class. I wonder if it would make sense to give `Block` a weak reference to the enclosing data so that its contents could be calculated on demand (actually storing them for each block would be quite memory hungry), and then this value would be `value_i = block_i.as_int()` or something.
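A hedged sketch of that weak-reference idea — `owner`, `as_int`, and the field names below are illustrative, not an API this PR adds:

```python
import weakref

class Data(object):
    """Stand-in for the enclosing ConjectureData-like object."""
    def __init__(self, buffer):
        self.buffer = buffer

class Block(object):
    def __init__(self, owner, start, end):
        # A weak reference avoids a reference cycle and keeps per-block
        # memory cost low: the value is computed on demand, not stored.
        self._owner = weakref.ref(owner)
        self.start = start
        self.end = end

    def as_int(self):
        owner = self._owner()
        if owner is None:
            raise RuntimeError("enclosing data has been garbage collected")
        return int.from_bytes(owner.buffer[self.start:self.end], "big")

data = Data(bytes([0, 0, 1, 2]))
block = Block(data, 2, 4)
print(block.as_int())  # 258, i.e. 0x0102 read big-endian from bytes 2..4
```

The `weakref.ref` call is the key design choice: the block can reach its owner while the owner is alive, without keeping it alive or paying for a stored copy of the slice.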
I had been thinking about similar helper methods, for slicing block data out of a buffer and for splicing it back into a candidate buffer.
(One thing to be careful of is keeping track of which buffer we actually want to read from. This can be subtle when running one of the shrinking kernels, and potentially updating the shrink target between iterations.)
    if new_target.blocks != self.shrink_target.blocks:
    if (
        len(new_target.blocks) != len(self.shrink_target.blocks) or
        list(new_target.each_block_bounds()) !=
FWIW attrs gives us good equality methods, so it might make sense to just do `new_target.blocks != self.shrink_target.blocks`. That's technically a functionality change though (there are some cases in which this would now trigger that wouldn't previously, but I think those are all fine), so up to you if you want to do it. Also we'd need to take care to then exclude the reference to the enclosing example from equality if we go ahead with the thing from my other comment.
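A sketch of both points: attrs generates field-by-field `__eq__`, and a field can be excluded from comparisons. In current attrs the per-field switch is spelled `eq=False` (older releases used `cmp=False`); the `owner` field below is illustrative.

```python
import attr

@attr.s
class Block(object):
    start = attr.ib()
    end = attr.ib()
    # The back-reference to the enclosing data must not participate in
    # equality, otherwise otherwise-identical blocks would compare unequal.
    owner = attr.ib(default=None, eq=False)

a = Block(0, 4, owner="data-1")
b = Block(0, 4, owner="data-2")
print(a == b)                      # True: owner is ignored by __eq__
print(Block(0, 4) == Block(0, 8))  # False: bounds differ
```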
I think the presence of `all_zero` would make this a net loss.
We don't necessarily want to throw away change-tracking information just because a block was reduced to zeros.
When the shrinker pass 'minimize_block_pairs_retaining_sum' scans the block list, it should include blocks containing only zeros as potential destinations for value-shifting. The current implementation happens to satisfy this condition, but it's easy to accidentally break during refactoring, making the shrinker silently worse.
Block information should be read-only, though it would be reasonable to remove this flag if it becomes necessary to add extra information after the block is created.
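The invariant described above can be modelled with a toy version of the pass (names and shapes here are illustrative, not the shrinker's real code): value is shifted between a pair of blocks while their sum stays constant, and an all-zero destination must remain eligible.

```python
def shift_retaining_sum(u, v, amount):
    """Toy model: move `amount` from block value u to block value v,
    keeping u + v constant."""
    return u - amount, v + amount

# The later block being zero must not disqualify it as a destination;
# shifting everything into it is exactly what makes the earlier block shrink:
u, v = 5, 0
u, v = shift_retaining_sum(u, v, 5)
print((u, v))       # (0, 5)
print(u + v == 5)   # True: the sum is retained
```

A destination filter that skipped currently-zero blocks would still "work", which is why the regression would be silent rather than a test failure.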
Force-pushed from 35932b7 to ea79769.
@@ -1771,6 +1771,32 @@ def buffer(self):
    def blocks(self):
        return self.shrink_target.blocks

    def all_block_bounds(self):
        return self.shrink_target.all_block_bounds()
Argh. I've belatedly realised that this logic is wrong. The problem is that if `self.shrink_target` changes during iteration then this uses stale information.
My suspicion is that this pans out in a way that mostly just causes performance problems, but it would be nice to fix this.
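An illustrative model of that staleness hazard (a toy stand-in, not the shrinker itself): the bounds are snapshotted once up front, so a shrink target that improves mid-loop is not reflected in the remaining iterations.

```python
class ShrinkerModel(object):
    def __init__(self, bounds):
        # Stand-in for self.shrink_target's block bounds.
        self.target_bounds = bounds

    def iterate_stale(self):
        snapshot = list(self.target_bounds)  # captured once, up front
        seen = []
        for bounds in snapshot:
            seen.append(bounds)
            # Simulate the shrink target improving mid-loop:
            self.target_bounds = [(0, 1)]
        return seen

s = ShrinkerModel([(0, 2), (2, 4), (4, 8)])
print(s.iterate_stale())  # still visits (2, 4) and (4, 8) from the old target
```

Re-reading `self.target_bounds` (or indexing into it lazily) on each step is what "doing the right thing" looks like for callers that do need up-to-date information.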
I inspected all of the callers, and I'm pretty sure that none of them will update the target while iterating over this list.
(If they did, then they would have been wrong before I touched them, since they were previously iterating directly over `blocks`.)
The places that do need up-to-date information should already all be doing the right thing.
This change introduces `Shrinker.for_each_pair_of_blocks`, and updates the shrinker passes `shrink_offset_pairs` and `minimize_block_pairs_retaining_sum` to use it instead of manually iterating over block pairs.

Along the way it introduces a `Block` class that `ConjectureData` now uses to record its block information, instead of keeping a list of `(start, end)` pairs.
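To make the summary concrete, here is a hedged sketch of what a pair-iteration helper in the spirit of `Shrinker.for_each_pair_of_blocks` might look like; the real method's signature and behaviour may differ, and blocks are modelled as `(start, end)` tuples.

```python
import itertools

class ShrinkerSketch(object):
    def __init__(self, blocks):
        self.blocks = blocks

    def for_each_pair_of_blocks(self, accept_first, accept_second):
        """Yield index pairs (i, j), i < j, whose blocks pass both filters.

        Centralising this loop means every pass filters and orders pairs
        the same way, instead of hand-rolling nested loops at each call site.
        """
        for i, j in itertools.combinations(range(len(self.blocks)), 2):
            if accept_first(self.blocks[i]) and accept_second(self.blocks[j]):
                yield i, j

s = ShrinkerSketch([(0, 2), (2, 4), (4, 8)])
pairs = list(s.for_each_pair_of_blocks(lambda b: True, lambda b: True))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
```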