migrate Float shrinker to the ir #3899
Conversation
Seeing some slow shrinks locally (…)
leaf = self.ir_tree_leaves[self.ir_tree_leaves_index]
if leaf.ir_type != ir_type or kwargs != leaf.kwargs:
    # Unlike buffers, not every ir tree is a valid choice sequence. If
    # we want to maintain determinism we don't have many options here
    # beyond giving up when a modified tree becomes misaligned.
    #
    # For what it's worth, misaligned buffers — albeit valid — are rather
    # unlikely to be *useful* buffers, so this isn't an enormous
    # downgrade.
    self.mark_overrun()
Shrinking may make modifications that misalign the ir tree. When working with buffers, we still run these all the way to a result, but my guess is they are very unlikely to be useful shrinks.
I don't know if there's anything better we can do for ir trees than giving up, beyond realigning tricks that are research papers in their own right. One idea is passing shrinker.engine.random here and continuing randomly, which might work, but I don't know if the same ir tree prefix running to different buffers will blow up.
By bailing out early we do less work in the test function, which means we could actually be spending proportionally more time on useful shrinks by overrunning. So it might be a wash at the end of the day, but I don't have a good intuition.
(Also, just realizing: this should maybe be mark_invalid rather than mark_overrun.)
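A toy sketch of that "continue randomly" fallback, in case it helps make the trade-off concrete. Everything here (Leaf, draw_value) is a made-up stand-in rather than Hypothesis' real internals; the real shrinker would hook this into mark_overrun()/mark_invalid():

```python
# Toy, self-contained sketch of the "continue randomly" idea discussed above.
# Leaf and draw_value are hypothetical stand-ins, not Hypothesis code.
import random
from dataclasses import dataclass, field


@dataclass
class Leaf:
    ir_type: str
    kwargs: dict = field(default_factory=dict)
    value: object = None


def draw_value(leaves, index, ir_type, kwargs, rng=None):
    leaf = leaves[index]
    if leaf.ir_type != ir_type or kwargs != leaf.kwargs:
        if rng is None:
            # Deterministic mode: give up, like mark_overrun() above.
            raise ValueError("misaligned ir tree")
        # Speculative mode: keep going with a fresh random value
        # (only floats handled here, for brevity).
        if ir_type == "float":
            return rng.uniform(kwargs.get("min_value", 0.0),
                               kwargs.get("max_value", 1.0))
        raise NotImplementedError(ir_type)
    return leaf.value


leaves = [Leaf("float", {"min_value": 0.0, "max_value": 1.0}, 0.25)]
# Aligned draws replay the recorded value; misaligned ones fall back to random.
assert draw_value(leaves, 0, "float", {"min_value": 0.0, "max_value": 1.0}) == 0.25
draw_value(leaves, 0, "float", {"min_value": 0.0, "max_value": 2.0}, rng=random.Random(0))
```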
@DRMacIver would be the world expert on this one 😅
Let's start with the simplest implementation that gets all our tests passing, and then we can profile it and see if there's anything obviously-silly going on. "Continue randomly until not-a-shrink" would make sense to me, although I'm not confident it'd be a win; "continue but try realigning each time we leave an Example" is also salient to me for the explain phase (for shrinking, we'd also abandon if the misaligned Example was > the base).
Good and bad news, mostly good:
I've pushed all my work here so we can see how everything fits together in a single place, but I'm going to bundle a few smaller changes in a split pr shortly. As is typical, this pr got very large very quickly 😄
ex.label == DRAW_FLOAT_LABEL
and len(ex.children) == 2
and ex.children[1].length == 8
leaf = chooser.choose(self.leaves, lambda leaf: leaf.ir_type == "float")
Right now, we call Float.shrink on any float node (plus some short-circuiting behavior in Float). The previous behavior skipped Float.shrink on floats that happened to also be integers:
buf = self.shrink_target.buffer
b = buf[u:v]
f = lex_to_float(int_from_bytes(b))
b2 = int_to_bytes(float_to_lex(f), 8)
if b == b2 or self.consider_new_buffer(buf[:u] + b2 + buf[v:]):
I think the if b == b2 condition there is equivalent to checking is_simple(f), i.e. don't try to float-shrink float nodes if they are already integers. The comments in lexical.py in this commit give some further insight:
# This floating point number can be represented in our simple format.
# So we try converting it to that (which will give the same float, but
# a different encoding of it). If that doesn't work then the float
# value of this doesn't unambiguously give the desired predicate, so
# this approach isn't useful. If it *does* work, then we're now in a
# situation where we don't need it, so either way we return here.
if is_simple(f):
    self.incorporate_float(f)
    return
To be honest, I'm not sold that we don't want to shrink float nodes as floats even if they happen to currently be integers. We mostly defer to the Integer shrinker anyway, and it also seems to make logical sense to keep all the float-shrinking logic in one place instead of jumping around. But I'm probably missing a good historical reason for it being done this way. It's easy to put this condition back either way.
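For context, here is how I read is_simple; the 56-bit threshold and the handling of inf/nan are my assumptions rather than a quote of the library's actual definition. Under this reading, the b == b2 check above succeeds exactly when the float already uses the canonical "simple" encoding:

```python
# Hedged sketch of is_simple (assumed behavior, not copied from Hypothesis):
# a float is "simple" if it is integer-valued and small enough to live in
# the tagged-integer branch of the lexicographic encoding (~56 bits here).
def is_simple(f: float) -> bool:
    try:
        i = int(f)
    except (ValueError, OverflowError):
        return False  # inf and nan can never be simple
    if i != f:
        return False  # non-integer-valued floats are never simple
    return i.bit_length() <= 56


assert is_simple(3.0)
assert not is_simple(0.5)
```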
I think the intuition here is that we consider integer-valued floats simpler than all non-integer-valued floats, and so once we get to an integer value we might as well use the machinery we have for shrinking integers.
You could probably convince me that a different ordering is better, but that's how I understand the status quo.
OK, I think that makes sense to me. I'll add this safeguard back. I was thinking about the case where shrinking an integer-valued float is faster if you allow it to try intermediate float values, but Integer probably gets you close enough to the minimum, and then you can finish it off with a final Float pass.
I'd like to take a detailed look at the shrink ordering at some point. I saw a few weird things, like values in (0, 1) being ordered after almost everything else; assert float_to_lex(sys.float_info.max) < float_to_lex(0.5). But that's for another day!
I think Hypothesis' shrinker is so well tuned that I shouldn't make changes like this without profiling or being very confident in them first. Of course, we may have no choice but to be adventurous when we get to the more complicated passes 🙂
Values between zero and one: not integers, and we order kinda by number of decimal places, so. And the largest finite float is an integer, which makes it "simpler" than 0.5! (I did say I'm not that attached to the ordering)
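An illustrative check of that quirk; float_to_lex is internal, so the import path below is an assumption and may move between Hypothesis versions:

```python
import sys

# Internal API; this import path is an assumption and may change.
from hypothesis.internal.conjecture.floats import float_to_lex

# Values in (0, 1) sort as more complex than almost everything else, so
# even the largest finite float gets a smaller lex key than 0.5.
assert float_to_lex(sys.float_info.max) < float_to_lex(0.5)
```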
Unfortunately ran out of time tonight for the aforementioned split pr, but will get to it tomorrow.
Now that the infra is in place, I'm hopeful we can fly through these shrinker migrations. Only had time for this tonight, but more pulls to come!
ahh, coverage failure - will have to wait for tomorrow.
Looking good! Small comments below, but this doesn't look like it'll suddenly grow on us...
3ca282e is a probably-bug-fix: we weren't considering …
Looks great with some very small comments below, please merge when they're addressed (or not) to your satisfaction!
thanks for the comments, they were all helpful!
Since we've closed #3086, do you want to open an issue to track all the remaining parts of the migrate-to-IR project? #3086 (comment) is not particularly detailed, and I think we could probably break it down further now that we've come this far.
I have also been thinking that we need a better issue for the IR project. I'll write something up.
I had originally tried writing a shrinker_v2.py from scratch... and quickly abandoned it as infeasible. I think my current plan is to convert shrinking passes one by one (where possible, which may or may not be all of them). minimize_floats is one of the more directly applicable ones. Definitely open to feedback on the details of the migration approach here.
There is overlap in the IRTree definition with #3806, though it's not exactly the same. Either is fine to merge before the other; I'll handle conflict resolution whichever way.