Consider adding a simple subsets / subsequences strategy? #1115
Comments
|
I'd definitely be fine with adding this to the API! There are a couple things to consider though:
One option for the latter would be to only support this for orderable types and to sort the values up front. Would that be acceptable for your use case? |
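A minimal sketch of the "sort the values up front" option, assuming the elements are mutually orderable and hashable. The helper name `sorted_subsets` is hypothetical and not part of Hypothesis; only `st.sets`, `st.sampled_from` and `st.builds` are existing API.

```python
import hypothesis.strategies as st


def sorted_subsets(elements):
    """Hypothetical helper: subsets of `elements`, drawn from a stable order."""
    # Sorting once means the same underlying draws map to the same elements,
    # regardless of the iteration order of the set that was passed in.
    ordered = sorted(elements)
    if not ordered:
        # sampled_from() rejects an empty sequence, so the only possible
        # subset (the empty set) is produced directly.
        return st.builds(set)
    return st.sets(st.sampled_from(ordered), max_size=len(ordered))
```

Sorting raises `TypeError` for values that cannot be compared with each other, which is one way the "orderable types only" restriction would surface.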
|
Yup, those are good considerations! The collection size parameters might be useful, yeah. If there's an elegant way to implement them without taking away from the simplicity of the base strategy, that would be cool (though I'd personally also be happy without them). Regarding the ordering problem, my first instinct would be that the strategy should simply assume / require that the input is given in some reliable order, and warn or error if it isn't. This is what the |
|
I thought we were trying to get rid of Perhaps this could be implemented as an upgrade to n-choose-k is at least related, and equivalent to taking a subset once you discard the ordering. It would make sense to me to use it as a general strategy, and just have a more efficient path for unordered operations internally. |
I'm definitely fine with doing this as a first pass! The problem is that it results in making it invalid to call |
I don't think this is much different to |
|
IIRC |
|
Hmm, just thinking out loud, how's this for a strategy that also supports
import hypothesis.strategies as st

@st.composite
def subsets(draw, elements, *, min_size, max_size):
    def choices():
        # min_size slots are always chosen, len(elements) - max_size never are,
        # and the remaining slots are decided by a drawn boolean each.
        always_size = min_size
        never_size = len(elements) - max_size
        maybe_size = len(elements) - always_size - never_size
        choices = (
            [True] * always_size +
            [draw(st.booleans()) for _ in range(maybe_size)] +
            [False] * never_size
        )
        assert len(elements) == len(choices)
        # Shuffle the mask so the forced slots are not tied to fixed positions.
        return draw(st.permutations(choices))
    return {element for (element, choose) in zip(elements, choices()) if choose}
If I understand it right, this variation should have the bonus feature of shrinking toward selecting earlier elements in the presence of |
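The size arithmetic in that sketch can be checked on its own, without Hypothesis. Below, `build_mask` is a hypothetical stand-in that takes the coin flips as an explicit argument instead of drawing them, and leaves out the final permutation; it only illustrates why the number of selected elements always lands between `min_size` and `max_size`.

```python
def build_mask(n, min_size, max_size, coin_flips):
    """Hypothetical helper mirroring the mask construction, minus the shuffle."""
    always_size = min_size
    never_size = n - max_size
    maybe_size = n - always_size - never_size  # equals max_size - min_size
    assert len(coin_flips) == maybe_size
    return [True] * always_size + list(coin_flips) + [False] * never_size


# 5 elements, between 1 and 3 chosen: 1 forced True, 2 coin flips, 2 forced False.
mask = build_mask(5, min_size=1, max_size=3, coin_flips=[True, False])
assert len(mask) == 5
assert 1 <= sum(mask) <= 3
```

Whatever the coin flips are, the count of `True` values is `min_size` plus at most `max_size - min_size`, and permuting the mask afterwards does not change that count.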
|
Thinking out loud some more, the above might actually be more generally useful as a list-based n-choose-k strategy for subsequences, simply by changing the final set comprehension to a list comprehension. It could be named In that case, getting actual subsets of a set would simply be a special case of doing |
|
Simpler: add the min and max size arguments to |
|
@Zac-HD: That wouldn't be the same thing as It also "wastes" the entropy spent on the elements that get discarded: I'm not sure how much effect that has on the effectiveness of shrinking, but my intuition tells me it might? (With the |
|
True! (assuming we make it not a set comprehension 😉) I see that as a feature though; and it shrinks towards shortest prefix in input order. The shrinker would be fine; worst case it lowers that value alone and notices no change in status. The most efficient way is probably to loop over sampling an unused index, then including it in the output if drawing from an appropriately biased coin is True - this would allow the shrinker to delete pairs of draw calls. However at this higher level I wouldn't worry about the shrinker at all! |
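One possible reading of the "sample an unused index, then flip a coin" loop, as a rough sketch only. The name `subsequences_by_index` is hypothetical, and `st.booleans()` (a fair coin) stands in for the "appropriately biased coin", since no public biased-coin strategy is assumed here.

```python
import hypothesis.strategies as st


@st.composite
def subsequences_by_index(draw, elements):
    """Hypothetical sketch: pair each index draw with a coin-flip draw."""
    elements = list(elements)
    unused = list(range(len(elements)))
    kept = []
    while unused:
        # Draw a position in the remaining pool and remove that index from it.
        index = unused.pop(draw(st.integers(0, len(unused) - 1)))
        # Fair coin as a placeholder; a biased coin would be substituted here.
        if draw(st.booleans()):
            kept.append(index)
    # Restore input order so the result is a subsequence of `elements`.
    return [elements[i] for i in sorted(kept)]
```

Each loop iteration makes exactly two draws (an index and a coin), which is the pairing the comment above refers to; whether the shrinker handles this better than the mask-based version is not something this sketch demonstrates.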
|
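Here's the updated straw implementation, just for easier reference:
import hypothesis.strategies as st

@st.composite
def subsequences(draw, elements, *, min_size, max_size):
    def choices():
        # min_size slots are always chosen, len(elements) - max_size never are,
        # and the remaining slots are decided by a drawn boolean each.
        always_size = min_size
        never_size = len(elements) - max_size
        maybe_size = len(elements) - always_size - never_size
        choices = (
            [True] * always_size +
            [draw(st.booleans()) for _ in range(maybe_size)] +
            [False] * never_size
        )
        assert len(elements) == len(choices)
        # Shuffle the mask so the forced slots are not tied to fixed positions.
        return draw(st.permutations(choices))
    # A list comprehension keeps the input order, giving a subsequence.
    return [element for (element, choose) in zip(elements, choices()) if choose]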
|
Hi, this looks like a good one to implement for the PyCon sprints; if no-one has any objections, I'll get started implementing this 😁 (see #1513) |
|
Ok, I have a proof-of-concept implemented by tkcranny@e49a018. I'd love some feedback on how the argument validation should be done, and perhaps on more/better testing. |
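On the validation question, one plausible shape for the checks, as a sketch only. `check_sizes` is a hypothetical helper, not taken from the linked commit, and it raises `ValueError` to stay within the standard library; Hypothesis itself reports bad strategy arguments through its own `InvalidArgument` exception.

```python
def check_sizes(n, min_size, max_size):
    """Hypothetical validation for a strategy over `n` elements."""
    if min_size < 0:
        raise ValueError(f"min_size={min_size} must be non-negative")
    if max_size > n:
        raise ValueError(f"max_size={max_size} exceeds the {n} available elements")
    if min_size > max_size:
        raise ValueError(f"min_size={min_size} is larger than max_size={max_size}")


check_sizes(5, min_size=1, max_size=3)  # valid, returns None
```

Without checks like these, the straw implementation above would only fail on its internal `assert` (for example when `max_size > len(elements)`), which gives the user no hint about which argument was wrong.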
|
@tkcranny: Thanks for taking this to implementation! :) |
|
Future work on this issue should build on #1533, with appropriate credit for that work. |
I find myself implementing this every now and then, to draw subsets of some fixed set of elements:
(This is essentially equivalent to st.sets(st.sampled_from(elements), max_size=len(elements)), but simpler and presumably more efficient. (?))
Would this, or something like it, be useful enough to consider adding to the core Hypothesis strategies?
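For reference, a minimal sketch of what such a strategy could look like. This is not the original snippet: the name `subsets` and the boolean-mask approach are assumptions, and the elements are assumed to be hashable.

```python
import hypothesis.strategies as st


def subsets(elements):
    """Hypothetical sketch: one drawn boolean per element decides membership."""
    elements = list(elements)
    n = len(elements)
    # A fixed-length list of booleans acts as a keep/discard mask.
    masks = st.lists(st.booleans(), min_size=n, max_size=n)
    return masks.map(lambda mask: {e for e, keep in zip(elements, mask) if keep})
```

Compared with `st.sets(st.sampled_from(elements), max_size=len(elements))`, this makes exactly one draw per element and never has to reject a duplicate. If `elements` is itself a set, its iteration order decides which boolean maps to which element.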