Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move NumPy extra's strategies which are not dependent on NumPy into a seperate helper module #3067

Merged
merged 24 commits into from Aug 29, 2021

Conversation

honno
Copy link
Member

@honno honno commented Aug 24, 2021

As @Zac-HD suggested for #3065, this PR currently moves BasicIndexStrategy and MutuallyBroadcastableShapesStrategy from the NumPy extra into hypothesis.extra.__array_helpers.py. This change is useless on its own but should avoid duplicated code in the future. The public API remains unchanged.

In anticipation for Array API support, I have removed NumPy-specific naming conventions and boundary checks in the private helper strategies, which is covered by the NumPy extra's methods anywho. I also replace np.newaxis with None in the (private) index strategy.

See the honno/array-helpers-array-api branch for a working example of the Array API extra interfacing with __array_helpers.py too.

TODO: I need to properly evaluate array_shapes() and valid_tuple_axis(), as well as the public broadcastable and basic_indices() strategies. I'm concerned about the NumPy-specific checks that are involved, as well as what helper methods will need to go into __array_helpers.py, and how it could all effect #3065.

@honno honno changed the title Move BasicIndexStrategy and MutuallyBroadcastableShapesStrategy to seperate module Move NumPy extra's strategies which are not dependent on NumPy into a seperate helper module Aug 24, 2021
Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'd like to share as much code as possible, which would include e.g. using an allow_newaxis argument for the generic array_api and documenting that newaxis is None - it might be a little less natural, but making the new module a drop-in replacement as much as possible seems useful.

Just let me know when this is ready for another review 😄

hypothesis-python/src/hypothesis/extra/__array_helpers.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/__array_helpers.py Outdated Show resolved Hide resolved
honno and others added 3 commits August 25, 2021 11:52
Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>
Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>
@honno
Copy link
Member Author

honno commented Aug 25, 2021

Just noting there's a force push because I accidentality started this PR branch a few commits behind master's HEAD 😅

Boundary checks were temporarily removed
@honno
Copy link
Member Author

honno commented Aug 25, 2021

Still need to work on this, namely commenting if @rsokl would like to take a look heh.

I've moved all the independent strategies to __array_helpers.py and have for the most part just mirrored them in the NumPy extra, modifying __doc__ where appropriate (it works surprisingly okay)—my honno/array-helpers-array-api branch should demonstrate how this change fits the needs of the Array API extra (as it currently exists). Something I'll sleep on is how I can nicely check and default-to bounds for NumPy and Array API respectively.


Update: I'm trying out this factory thing for all the strategies, which is a bit jarring but meets the needs of both the NumPy and Array API extras. Not sure if it's a necessary evil or if there's a better solution—I'm sure I can make it more palatable at least, and possibly a revised make_strategies_namespace() method in extra.array_api could remove the need for the clunkier logic. Something to sleep on.

@Zac-HD Zac-HD mentioned this pull request Aug 26, 2021
@Zac-HD
Copy link
Member

Zac-HD commented Aug 27, 2021

It looks like the only difference here is that Numpy is limited to 32 dims, while the array API standard does not impose a limit. Is there any real downside to just enforcing the 32-dim limit for everyone? I suspect that it still applies to numpy.array_api, for example, and it's not like working with dense arrays of >32 dimensions is practical anyway! (there are obviously some sparse use-cases, but you can't do that generically...)

Mostly this is because the factory code looks complicated enough to cause headaches for both users and maintainers as soon as something unexpected happens, so I'd rather avoid it.

@rsokl
Copy link
Contributor

rsokl commented Aug 27, 2021

Mostly this is because the factory code looks complicated enough to cause headaches for both users and maintainers as soon as something unexpected happens, so I'd rather avoid it.

Agreed, and in this current form the factory code prevents static type checkers and other tooling from seeing the signatures of the functions that are produced. I was going to recommend a more-complicated decorator that changes global state (the array-lib's) name within the context of the function's execution. This would enable transparency to tooling but would be even more complicated.

Given that this is all in service of merely changing max-dim, I have to wonder if it is simply okay to go with Zac's recommendation of enforcing MAXDIM=32. (And adjusting the error message so as to not mention "numpy" explicitly).

I think __array_helpers.py can be renamed to _array_helpers.py so that it matches the pattern of hypothesis' other internal modules.

@honno
Copy link
Member Author

honno commented Aug 27, 2021

Ok, so I think these are the issues that make any refactor potentially problematic:

  • Limit dims: I too have no idea who'd want to generate arrays with dims greater than 32, so yeah just hardcoding it makes life much easier 🥳 I have mostly (see below) just removed the factory methods.
  • ufuncs: Array API doesn't support them, so all that behaviour could be left in extra.numpy... but my experimentation led me to put all that behaviour in _array_helpers.py so that it can be nicely used in both the NumPy and future Array API extra—none of the signature behaviour requires importing NumPy!
  • 0d indices: Array API only supports indices for arrays with dims greater than 0, so it seems that a strategy factory make_basic_indices is required due to 1) default arg for min_dims 2) checking min_dims 3) doc niceties.

Assuming I'm on the right course, I need to work out the failing ghostwriter test case and sort out some more coverage issues (I didn't understand @check_function before today heh).

Copy link
Contributor

@rsokl rsokl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of your work on this! This should be pretty easy to wrap up. The remaining changes to be made are relatively simple; see below.

Regarding the ghostwriter test, hypothesis\hypothesis-python\tests\ghostwriter\recorded\magic_gufunc.txt needs to be updated to reflect the fact that ghostwriter will ultimately import mutually_broadcastable_shapes from _array_helpers.

I am a little confused by the loss of coverage at 301 in _array_helpers. @Zac-HD any idea how this might be happening?

@honno
Copy link
Member Author

honno commented Aug 27, 2021

I am a little confused by the loss of coverage at 301 in _array_helpers. @Zac-HD any idea how this might be happening?

Just incase folk missed the change—I removed np.lib.function_base._SIGNATURE to drop the NumPy dependency in _array_helpers.py, and used the (string) regex expression it pointed to instead (other than the coverage issue I hope that's okay).

hypothesis-python/tests/numpy/test_gen_data.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/_array_helpers.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/numpy.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/numpy.py Outdated Show resolved Hide resolved
hypothesis-python/RELEASE.rst Outdated Show resolved Hide resolved
honno and others added 4 commits August 28, 2021 09:45
Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>
Co-authored-by: Ryan Soklaski <ry26099@mit.edu>
Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>
@honno
Copy link
Member Author

honno commented Aug 28, 2021

I can't figure out the existing coverage problem (mind I'm rather ignorant about the coverage tool and how Hypothesis uses it), but otherwise I feel happy about this PR now.

@rsokl
Copy link
Contributor

rsokl commented Aug 28, 2021

Contra @rsokl, if it's not documented it's not public API - and shouldn't be added to all

@Zac-HD I think we should at least include BroadcastableShapes in __all__, since this is the type that is returned by broadcastable_shapes. I would argue that it is documented by proxy. It is not unreasonable that users would want to access this type for annotations.

@Zac-HD
Copy link
Member

Zac-HD commented Aug 28, 2021

Contra @rsokl, if it's not documented it's not public API - and shouldn't be added to all

@Zac-HD I think we should at least include BroadcastableShapes, since this is the type that is returned by broadcastable_shapes. It is not unreasonable that users would want to access this type for annotations.

Fair - it is mentioned in the docs, too. Let's also add that to __all__ then.

@honno
Copy link
Member Author

honno commented Aug 28, 2021

Contra @rsokl, if it's not documented it's not public API - and shouldn't be added to all

@Zac-HD I think we should at least include BroadcastableShapes, since this is the type that is returned by broadcastable_shapes. It is not unreasonable that users would want to access this type for annotations.

Fair - it is mentioned in the docs, too. Let's also add that to __all__ then.

Ah I've possibly misread—I had just put all returned custom types (i.e. also Shape and BasicIndex) in the extra.numpy namespace and specified them in __all__. (Update: Fixed to be just BroadcastableShapes on Ryan's suggestion)

@rsokl
Copy link
Contributor

rsokl commented Aug 28, 2021

@honno, regarding the coverage issue:

Under this current branch this test case now raises a different error:

>>> sig = "(" + ",".join(f"d{n}" for n in range(33)) + ")->()"
>>> mutually_broadcastable_shapes(signature=sig).example()
InvalidArgument: '(d0,d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15,d16,d17,d18,d19,d20,d21,d22,d23,d24,d25,d26,d27,d28,d29,d30,d31,d32)->()' 
is not a valid gufunc signature

on master it raised the expected: matches Numpy's regex for gufunc signatures, but contains shapes with more than 32 dimensions and is thus invalid.

It looks like the regex signature you pasted in has escaped backslashes. This is the correct signature (it restores coverage):

r"^\((?:\w+(?:,\w+)*)?\)(?:,\((?:\w+(?:,\w+)*)?\))*->"
r"\((?:\w+(?:,\w+)*)?\)(?:,\((?:\w+(?:,\w+)*)?\))*$"

(I was surprised to find that Jupyter will automatically insert these additional slashes in its repr vs printing!)

image

Co-authored-by: Ryan Soklaski <ry26099@mit.edu>
@honno
Copy link
Member Author

honno commented Aug 28, 2021

I see now, hopefully fixed.

(I was surprised to find that Jupyter will automatically insert these additional slashes in its repr vs printing!)

Yeah I had evaluated np.lib.function_base._SIGNATURE in Jupyter and clearly this behaviour threw me haha.

hypothesis-python/src/hypothesis/extra/numpy.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/numpy.py Outdated Show resolved Hide resolved
Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With one tiny nitpick about an error message, I think this is ready to merge! Thanks again @honno for your contribution - and patience with us through the process 🥰

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@Zac-HD Zac-HD merged commit 0f9ec99 into HypothesisWorks:master Aug 29, 2021
@honno
Copy link
Member Author

honno commented Aug 29, 2021

Awesome! Thank you Ryan & Zac for all your help :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants