
Factor out iteration over pairs of blocks #1594

Merged
merged 5 commits on Sep 24, 2018

Conversation

Zalathar (Contributor):

This change introduces Shrinker.for_each_pair_of_blocks, and updates shrinker passes shrink_offset_pairs and minimize_block_pairs_retaining_sum to use it instead of manually iterating over block pairs.

Along the way it introduces a Block class that ConjectureData now uses to record its block information, instead of keeping a list of (start, end) pairs.
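As a rough illustration of the record described above, a Block of this kind remembers its position in the block list plus its (start, end) bounds, replacing bare tuples. This is a hypothetical sketch with illustrative names, not the PR's exact definition:

```python
class Block:
    """Sketch of a record replacing bare (start, end) tuples."""

    def __init__(self, index, start, end):
        self.index = index  # position in ConjectureData's block list
        self.start = start  # offset of the block's first byte in the buffer
        self.end = end      # offset one past the block's last byte

    @property
    def bounds(self):
        # The old tuple representation, still available where needed.
        return (self.start, self.end)

    @property
    def length(self):
        return self.end - self.start
```

Keeping `bounds` as a property lets call sites that relied on the old tuples migrate gradually.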

@Zalathar (Contributor Author):

(This isn't really motivated by anything in particular, other than the fact that it seemed like a nice cleanup.)

@alexwlchan (Contributor) left a comment:

Based on a cursory reading, it looks like the diffs are functionally equivalent.

One thought: it looks like Block is created once and then never modified. You can set frozen=True (at a small runtime performance cost), and then any attempt to modify an instance will raise an exception. Not sure if it's worth adding here, but noting for interest: http://www.attrs.org/en/stable/api.html#attr.s
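The frozen-class behaviour being suggested looks roughly like this. The comment refers to attrs' `@attr.s(frozen=True)`; this sketch uses the stdlib `dataclasses` analogue so it runs without attrs installed, and the field names are illustrative:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)  # attrs equivalent: @attr.s(frozen=True)
class Block:
    start: int
    end: int

b = Block(0, 4)
try:
    b.start = 1  # any mutation attempt raises
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

The trade-off is the same in both libraries: assignment goes through a checking `__setattr__`, which costs a little at construction time but turns accidental mutation into a loud failure.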

@@ -123,6 +144,9 @@ def depth(self):
def index(self):
return len(self.buffer)

def each_block_bounds(self):
return (block.bounds for block in self.blocks)
Member:

I'd be more comfortable returning a list than a generator here - slightly less efficient, but fewer ways to stuff up later.
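One concrete "way to stuff up" with a generator: it can only be consumed once, so a caller that iterates it twice silently sees nothing on the second pass. A minimal illustration (names are stand-ins, not the real code):

```python
def bounds_as_generator(blocks):
    # Generator version: cheap, but single-use.
    return ((start, end) for (start, end) in blocks)

blocks = [(0, 2), (2, 5)]
gen = bounds_as_generator(blocks)

first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted

# A list doesn't have this failure mode:
as_list = [(start, end) for (start, end) in blocks]
```

Returning a list trades a small allocation for immunity to this silent-empty-second-iteration bug.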

Contributor Author:

Originally the idea was to not need this method at all. It exists as a concession to the existing code sites that did for u, v in data.blocks, because I couldn't find a cleaner way to update them.

I have a slight preference for it being a generator, to encourage any new code to use data.blocks directly instead.

But I could go either way on it, and I can't deny that I already made one of the mistakes you hinted at, in a previous version of update_shrink_target.

Member:

I don't have a strong feeling either way, but FWIW using list comprehensions is usually slightly more efficient, due to the vagaries of Python performance. It's almost certainly dwarfed by the other micro-inefficiencies in the implementation of the shrinker though.

@DRMacIver (Member) left a comment:

This is a nice cleanup, thanks! I've left some thoughts on a few specific things, but they're all entirely optional, and I'm happy for this to be merged when you're happy to merge it.

i = block_i.index
j = block_j.index

value_i = int_from_block(i)
Member:

This is not a request for a change, more of a thought for discussion: when I was looking at the Block class, I thought it was a shame that you couldn't access a block's contents from the class itself. I wonder if it would make sense to give Block a weak reference to the enclosing data, so that the contents could be calculated on demand (actually storing them on each block would be quite memory hungry); then this would read value_i = block_i.as_int() or something.
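A rough sketch of the weak-reference idea. The names here (`_owner`, `as_int`, the stand-in `Data` class) are hypothetical, not anything the PR defines; the weak reference keeps blocks from pinning the enclosing data object or forming a reference cycle:

```python
import weakref

class Block:
    def __init__(self, owner, start, end):
        # Weak back-reference to the enclosing data object, so blocks
        # don't keep it alive and there is no reference cycle.
        self._owner = weakref.ref(owner)
        self.start = start
        self.end = end

    def as_int(self):
        # Compute the block's value on demand from the owner's buffer.
        data = self._owner()
        assert data is not None, "enclosing data object was collected"
        return int.from_bytes(data.buffer[self.start:self.end], "big")

class Data:  # stand-in for the enclosing ConjectureData
    def __init__(self, buffer):
        self.buffer = buffer

data = Data(b"\x00\x01\x02\xff")
block = Block(data, 1, 3)
```

Here `block.as_int()` reads bytes 1–3 of the owner's buffer and interprets them big-endian.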

Contributor Author:

I had been thinking about similar helper methods, for slicing block data out of a buffer and for splicing it back into a candidate buffer.

(One thing to be careful of is keeping track of which buffer we actually want to read from. This can be subtle when running one of the shrinking kernels and potentially updating the shrink target between iterations.)

if new_target.blocks != self.shrink_target.blocks:
if (
len(new_target.blocks) != len(self.shrink_target.blocks) or
list(new_target.each_block_bounds()) !=
Member:

FWIW attrs gives us good equality methods, so it might make sense to just do new_target.blocks != self.shrink_target.blocks. That's technically a functionality change, though (there are some cases where this would now trigger that wouldn't have previously, but I think those are all fine), so it's up to you whether to do it. Also, we'd need to take care to exclude the reference to the enclosing example from equality if we go ahead with the idea from my other comment.
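If Block did grow a back-reference to its enclosing data, the generated equality method would need that field excluded, as suggested above. A sketch using the stdlib `dataclasses` spelling (`field(compare=False)`; attrs spells this `attr.ib(eq=False)`), with illustrative names:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Block:
    start: int
    end: int
    # Hypothetical back-reference, excluded from __eq__ (and repr) so
    # that two equal blocks from different data objects still compare equal.
    owner: object = field(default=None, compare=False, repr=False)

a = Block(0, 4, owner=object())
b =Ock = Block(0, 4, owner=object())
```

With the field excluded, equality depends only on the block's bounds, so list comparisons like `new_target.blocks != self.shrink_target.blocks` behave as intended.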

Contributor Author:

I think the presence of all_zero would make this a net loss.

We don't necessarily want to throw away change-tracking information just because a block was reduced to zeros.

When the shrinker pass 'minimize_block_pairs_retaining_sum' scans the
block list, it should include blocks containing only zeros as potential
destinations for value-shifting.

The current implementation happens to satisfy this condition, but it's
easy to accidentally break during refactoring, making the shrinker
silently worse.
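The invariant the commit message is protecting can be sketched with a hypothetical helper (not the shrinker's actual code): the pass moves value from one block to another while keeping the total constant, and an all-zero destination must remain a valid target:

```python
def shift_retaining_sum(values, src, dst, amount):
    # Move `amount` from values[src] to values[dst]; the sum is unchanged.
    result = list(values)
    result[src] -= amount
    result[dst] += amount
    return result

before = [5, 0]  # the destination block contains only zeros
after = shift_retaining_sum(before, 0, 1, 5)
```

Skipping all-zero destinations would rule out shifts like this one, which reduce an earlier block to zero while preserving the sum.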
Block information should be read-only, though it would be reasonable to
remove this flag if it becomes necessary to add extra information after
the block is created.
@Zalathar Zalathar merged commit d01835b into HypothesisWorks:master Sep 24, 2018
@Zalathar Zalathar deleted the each-pair-of-blocks branch September 24, 2018 10:15
@@ -1771,6 +1771,32 @@ def buffer(self):
def blocks(self):
return self.shrink_target.blocks

def all_block_bounds(self):
return self.shrink_target.all_block_bounds()
Member:

Argh. I've belatedly realised that this logic is wrong. The problem is that if self.shrink_target changes during iteration then this uses stale information.

My suspicion is that this pans out in a way that mostly just causes performance problems, but it would be nice to fix this.
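A minimal illustration of the staleness hazard (the names are stand-ins, not the shrinker's real structure): a list snapshotted from the current shrink target does not track a target that is later replaced:

```python
from types import SimpleNamespace

class Shrinker:
    def __init__(self):
        self.shrink_target = SimpleNamespace(blocks=[(0, 2), (2, 4)])

    def all_block_bounds(self):
        # Snapshot of the *current* target's block bounds.
        return list(self.shrink_target.blocks)

s = Shrinker()
snapshot = s.all_block_bounds()
# A successful shrink replaces the target mid-iteration...
s.shrink_target = SimpleNamespace(blocks=[(0, 1)])
# ...and the snapshot no longer reflects it.
stale = snapshot != s.shrink_target.blocks
```

Any pass that keeps iterating over `snapshot` after the replacement is working from stale bounds, which is the failure mode described above.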

Contributor Author:

I inspected all of the callers, and I'm pretty sure that none of them will update the target while iterating over this list.

(If they did, then they would have been wrong before I touched them, since they were previously iterating directly over blocks.)

The places that do need up-to-date information should already all be doing the right thing.
