# `mutually_broadcastable_shapes` much more constrained than expected #3170
@Zac-HD Is the documented defaulting of `base_shape` to `()` actually accurate here?
No idea yet on how to improve this.
Thanks for pointing this out!
These points are because of the conservative default values, which you can override by passing explicit arguments.
This is a bug that we should fix. Thanks again for reporting! I'd love to get your feedback on the diversity of the examples it generates again once the fix is merged; I think this might address most of your issues, and I'll re-engage on any remaining points once you confirm them.
Do you mean that you need all-nonzero side lengths, in which case passing `min_side=1` would be enough?
Yes, it is - though it's hard to imagine a change here which would be backwards-compatible and still fit our API style guide. Separately, I'm reluctant to change it - @rsokl and I put a lot of work into the design and tuning of this strategy.
Thanks for bringing this to our attention. I just confirmed that this is a bug in the strategy, and I have a fix on the way.
@asmeurer the patch has been merged. You should now find that there are no constraints on where singletons can manifest in the drawn values.
I am going to go ahead and close this. The fix in #3172 should have remedied the fundamental issues here (thank you for bringing them to our attention!). The other points raised can all be addressed by simply modifying the default values that are provided to the strategy.
Sorry for being slow to get back on this. I just tested the latest release, and I think this should stay open. Indeed the bug with not generating shapes that broadcast on a 1 dimension has been fixed, but the more general issue of the defaults being too tame remains.

I find it very surprising when the default behavior for this strategy doesn't produce "adversarial" draws. When I use floats(), for instance, I know it will always tell me if some strange floating point value that I failed to consider will break my code. If my code doesn't actually work with some class of broadcastable shapes, I want the strategy to tell me so.

The default values of the parameters may make sense when combined with other parameters provided explicitly. But for the case when no parameters are provided (which I would expect to be the most common case), they don't make sense. Obviously the dimension sizes and number of dimensions should be limited so that if an actual array is created with the given shape it will fit in memory, but things could be increased many orders of magnitude before running into that limitation.

As another aside, why does it not generate size 0 dimensions by default? This is another corner-case situation I would expect to have to turn off, not on.
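For readers following along, widening the defaults explicitly looks like the sketch below; the specific bounds are illustrative assumptions, not values endorsed in the thread.

```python
# A sketch of overriding the conservative defaults of
# mutually_broadcastable_shapes(); the bounds are illustrative only.
from hypothesis.extra.numpy import mutually_broadcastable_shapes

wider = mutually_broadcastable_shapes(
    num_shapes=3,
    min_dims=0,
    max_dims=6,   # the default max_dims is only 2
    min_side=0,   # also exercise size-0 dimensions
    max_side=8,   # the default max_side is only 3
)
```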
Either I'm not understanding you or you're not understanding me. prod(shape) is the number of elements of the array. It limits the size of the array. I always expect to have to do something like a filter on `prod(result_shape)` whenever I create actual arrays.

To be sure, it would be great if I could pass that as a parameter to a strategy rather than filtering, but that in itself isn't a huge deal. Again, filtering when hypothesis goes too far is fine. But the converse, hypothesis not going far enough in the first place, leaves me very worried that I'm not actually testing as much as I think I am.
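A sketch of that filtering pattern (the element cap and other bounds below are assumed illustrative values, not from the thread):

```python
# Cap the total number of elements in the broadcast result, rather than
# bounding each side length and dimension count separately.
import math

from hypothesis.extra.numpy import mutually_broadcastable_shapes

MAX_ELEMENTS = 10_000  # assumed illustrative bound

shapes = mutually_broadcastable_shapes(
    num_shapes=3, max_dims=6, max_side=32
).filter(lambda bs: math.prod(bs.result_shape) <= MAX_ELEMENTS)
```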
Small note on this:
What are the use cases of 0-sized dims? I've asked this before and I believe SciPy uses 'em, but outside the more core ecosystem I'm not sure if they're useful or annoying, i.e. we'll be finding "bugs" in array-consuming functions where 99% of the time 0-sized dims would be erroneous input anyway, and users would always have to write `min_side=1` to opt back out. I think, say, with `min_side=1` as the default, that trade-off seems right.
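If size-0 sides were generated by default, the opt-out described above would look something like this sketch (my illustration, using `array_shapes` for brevity):

```python
from hypothesis.extra.numpy import array_shapes

# Opting back out of size-0 dimensions, if they were generated by default:
no_empty = array_shapes(min_side=0).filter(lambda shape: 0 not in shape)

# ...versus simply requesting nonzero sides up front:
nonzero = array_shapes(min_side=1)
```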
Actually this is looking like it would be more useful than I had previously thought. It's hard for me to use the existing parameters to create a result shape that isn't too large. The problem is that you can either make the average side length larger, or add more dimensions, but you can't do both, because the total size is side_length**n_dimensions. Correct me if I'm wrong on this, but this seems like it would be much more tractable if done at the generation stage, because you could keep track of the size so far when constructing the shape and avoid adding dimensions that exceed the total size (it might be a little more complicated than that, to avoid introducing bias and to keep shrinking working).
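A rough sketch of that generation-stage idea (my illustration, not Hypothesis internals; plain `random` stands in for Hypothesis's draw machinery):

```python
import random

def bounded_shape(max_dims: int, max_side: int, max_elements: int) -> tuple:
    """Build a shape one dimension at a time, tracking the running size."""
    shape = []
    size = 1
    for _ in range(random.randint(0, max_dims)):
        # Only consider side lengths that keep the element count in budget.
        cap = min(max_side, max_elements // size)
        if cap < 1:
            break
        side = random.randint(1, cap)
        shape.append(side)
        size *= side
    return tuple(shape)
```

As the comment notes, a real implementation would need extra care to avoid biasing the distribution and to keep shrinking working.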
I think I've found another issue with the generation in this strategy. Furthermore, the generation seems to be quite slow: the code above, which generates 100 examples, takes over 2 seconds to run. I am generating 32 shapes for the reason described at #3151. However, given this issue, I may have to switch back to the old way of generating them.
Ah, yep, just a miscommunication. I thought you meant the side lengths rather than the total number of elements.
@honno has our reasoning for excluding side-length-zero arrays exactly right. While some super-generic code needs to handle such cases, this is so rare that we leave it off by default. A large part of that is that often the problems it reveals are upstream bugs which most users can't do much about. Concretely, how much does skipping these problems matter for most users when even Numpy itself leaves bug reports like numpy/numpy#15753 and numpy/numpy#16410 unfixed for so long?
Yeah, we can't do it that way, precisely because shrinking would become intractably difficult. One idea that might work is to choose a side length of one with probability related to the minimum dimensionality of the array shape, to constrain the elements-in-expectation. You'd still need to filter, but it would have much better scaling properties - literally picking a random-dimensional subspace to fill out.

Overall, I think your main issue here is simply that you have a very different workload to almost all users of Hypothesis' array support. I regret that this means our API doesn't help you as much as we both want it to, but I just don't see a way to improve your workflow without compromising that of many more users. If you get stuck again, I'd suggest recreating our strategies using the public building blocks.
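A loose sketch of that elements-in-expectation idea (my reading of the comment, not Hypothesis code; `random` again stands in for the real draw machinery):

```python
import random

def expectation_bounded_shape(ndims: int, max_side: int, target: float) -> tuple:
    """Draw side lengths so the expected element count is roughly `target`."""
    if ndims == 0 or max_side < 2:
        return (1,) * ndims
    per_dim = target ** (1 / ndims)   # each dim contributes this factor
    big_mean = (2 + max_side) / 2     # mean of a non-singleton side draw
    # E[side] = 1 + p*(big_mean - 1); choose p so that E[side] == per_dim,
    # hence E[prod(shape)] == per_dim**ndims == target (sides independent).
    p = max(0.0, min(1.0, (per_dim - 1) / (big_mean - 1)))
    return tuple(
        random.randint(2, max_side) if random.random() < p else 1
        for _ in range(ndims)
    )
```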
@asmeurer I can assure you that, if you raise `max_dims`, the strategy does exercise the extra room.
Looking at:

```python
from hypothesis.extra.numpy import mutually_broadcastable_shapes

strat = mutually_broadcastable_shapes(num_shapes=32, max_dims=10)

for _ in range(10):
    print([len(x) for x in strat.example().input_shapes])
```

the diversity of shape-lengths generated is impressive! The length of the result-shape is the max length within each set of 32 shapes, which leads to the extreme statistics. Consider randomly sub-selecting from the collection of 32 shapes to improve your chances of getting more max-shape-length diversity.
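One way to do that sub-selection (a sketch of my own, not from the thread; it reuses `strat` from the snippet above):

```python
from hypothesis import strategies as st

# Draw all 32 shapes, then keep only a random subset of them, so the
# result shape is no longer dominated by the longest of 32 draws.
subset = strat.flatmap(
    lambda bs: st.lists(st.sampled_from(bs.input_shapes), min_size=1, max_size=4)
)
```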
I'm obviously pretty ignorant of how these things work, but how does this affect shrinking, given that a shrunk example will almost always be well below the size limit? Or do you mean specifically shrinking for the cases when it is near that boundary?
The 32-shapes thing that I'm using is irrelevant to my point here. That's why I used 3 in my examples. My point is about the size of the resulting shapes, which has nothing to do with how many shapes are generated (none of the defaults are dependent on num_shapes).

I'll reiterate what I said before: "Obviously the dimension sizes and number of dimensions should be limited so that if an actual array is created with the given shape it will fit in memory, but things could be increased many orders of magnitude before running into that limitation." I fail to see how the current defaults are useful for the people they were designed for. Again, please look at the shapes that are generated. The biggest shape that can be generated is (3, 3). 9 elements. That's it. My own code had bugs that weren't being caught because the input shapes were only 2-dimensional. Only 2 dimensions with side length <= 3 is hardly what I would call "broad/general as possible" or "pretty interesting".

By the way, in my use-case, I am generating arrays of the given shapes. Actually, I'm looping (in Python) over every element of the arrays in my test. So the size constraints you are talking about definitely apply to my own code. Allowing users to specify a max size instead of the other parameters also seems to go along with "generating shapes that are useful for creating arrays with", which we both seem to be on board with.
I guess that makes sense. I think I'll try generating a base shape and using it instead. I'm actually having a pretty hard time finding a middle ground of input parameters to `mutually_broadcastable_shapes` that generates interesting enough examples while not generating shapes that are too large or triggering health-check errors from too much filtering. I also want to mention again the performance problem I noted in #3170 (comment) in case you didn't see it.
What we are worried about is that the subsequent arrays will be large, and thus draws from the elements strategy will be slow.
I don't think there is much for us to do here. Drawing up to the ~4K-element ceiling is already about as far as we can reasonably go.
But even array_shapes() itself draws more interesting shapes by default than mutually_broadcastable_shapes() (I would still say ndim=3 is too small to catch many bugs, but it's better than 2). Again, it seems the more appropriate fix here is to limit the final array size, so that arrays with large dimensionalities or large sides can be generated, just not both at the same time. Limiting both at the same time means you can't catch a bug that only triggers for larger dimensionality, or a bug that only triggers for larger side length.

Worth mentioning also that I've never noticed any performance issues with using arrays() with much larger shapes than the defaults here.
I think something else is going on here. It actually takes 2 seconds to draw just the first example (this is evident if you actually run my code). All subsequent examples are drawn instantly. Does hypothesis have some sort of precomputation?
@rsokl, nope, we can generate much longer lists than that; the average size before heuristics is considerably higher.
This is a good point, and I think making this easier to tune would be reasonable. That said, I think we're always going to be working with smaller arrays than you're talking about - Hypothesis is literally unable to generate more than 4K array elements, so much larger arrays are going to be very sparse indeed (we assign an average of sqrt(elements) non-fill values for large arrays).

This is not as bad as it sounds, because bugs per minute of runtime is more important than bugs per example, and smaller examples tend to be proportionally faster but almost as likely to find bugs (aside from size-related bugs we probably can't trigger anyway).
If you want to time generation, note that `.example()` generates and caches a whole batch of examples up front (see hypothesis/hypothesis-python/src/hypothesis/strategies/_internal/strategies.py, lines 313 to 336 at f7155d2).
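A way to time generation that avoids `.example()`'s batch-and-cache behavior (my sketch, with assumed bounds):

```python
import time

from hypothesis import given, settings
from hypothesis.extra.numpy import mutually_broadcastable_shapes

@settings(max_examples=100)
@given(mutually_broadcastable_shapes(num_shapes=32))
def consume(shapes):
    pass  # do nothing; we only want to measure generation time

start = time.perf_counter()
consume()
print(f"{time.perf_counter() - start:.2f}s for 100 examples")
```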
Oh whoops! I realized there was a mistake in the code I was using to probe this. You are right, a batch of 100 examples can often produce lists of length 25 (or higher).
I've been using the `mutually_broadcastable_shapes` strategy to generate some shapes. My test has been working well, but it appears the strategy is not really generating very interesting inputs by default. I only ended up noticing this by accident, because I noticed that a line of code wasn't being covered. I am basically using `mutually_broadcastable_shapes(num_shapes=n)`, where `n` is chosen between 1 and 32 (see #3151). After playing with the strategy interactively, I found:

- The documentation refers to a `shape` keyword, but no such keyword argument exists. I'm assuming this is actually supposed to be `base_shape`.
- `base_shape` defaults to `()`, which "corresponds to a scalar and thus does not constrain broadcasting at all." However, this appears to constrain the shapes that are actually generated by the strategy by quite a bit.
- The `max_dims` argument defaults to 2, if I understand the documentation correctly. 2 dimensions is not very many for getting good tests for mutually broadcastable shapes.
- The `max_side` argument defaults to 3.
- It never generates shapes that broadcast against each other along an existing dimension, like `(2, 2)` and `(2, 1)`. It only generates shapes that broadcast against missing dimensions. For example:
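A minimal probe of the default behavior (my sketch; the original report's concrete output isn't reproduced here):

```python
from hypothesis.extra.numpy import mutually_broadcastable_shapes

# Inspect what the strategy draws with default parameters.
strat = mutually_broadcastable_shapes(num_shapes=3)
for _ in range(5):
    print(strat.example())
```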
It would seem that to get good behavior from this strategy, you must first generate a base shape and then run the strategy against that. But 1) this is not really documented at all, 2) it seems pointless (dangerous, in fact) for the default behavior to generate almost useless outputs, since I expect most users won't notice this as I have, and 3) I'm not sure, but couldn't chaining strategies like "generate `base_shape`" -> `mutually_broadcastable_shapes(base_shape=base_shape)` produce the same sorts of problems I ran into in #3151?
To me, just having `mutually_broadcastable_shapes()` produce reasonable but sufficiently complex shapes by default would be much better. I already expect to have to filter the strategy based on `prod(result_shape)`, which I generally expect to do for any shape-generating strategy in any instance where I create an actual array of the given shape.

CC @honno