Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array API extra #3065

Merged
merged 28 commits into from Sep 11, 2021
Merged

Array API extra #3065

merged 28 commits into from Sep 11, 2021

Conversation

honno
Copy link
Member

@honno honno commented Aug 24, 2021

What this introduces

This implements strategies for Array API implementations inside hypothesis.extra.array_api (closes #3037). As the Array API is largely based on NumPy behaviour, I imitated hypothesis.extra.numpy where appropriate and so these strategies will hopefully feel familiar to the extra's contributors and users.

Strategies in array_api do not import array modules, instead taking them as the xp argument. For the most part they assume that the user has indeed provided an Array API-compliant module.

The strategies would be used by Array API implementers (e.g. @asmeurer's compliance suite) and array-consuming libraries (examples).

Many tests are based on the test/numpy ones. There is a mock array-API implementation in xputils.py. Tests will try to import NumPy's Array API implementation (numpy/numpy#18585 was merged just today) and will fall back to the mocked one. I couldn't easily mock a "compliant-looking" array object so a particular non-compliance warning is suppressed and some assertions are skipped when necessary.

cc @mattip

Specific requests for feedback

An immediate concern I have is with how I form pretty ("reflective") reprs for xp consuming strategies. Default use of @defines_strategy (and thus LazyStrategy) is prone to produce some rather noisy repr strings, so I ended up wrapping these strategies with a custom decorator @pretty_xp_repr... it seems to be a hacky solution, especially since it gets called multiple times.

I'm also wondering if you're happy with the get_strategies_namespace() method that Zac suggested. It could be nice to not even have the top-level strategies (i.e. array module needs to be passed) to require users to use it, which could mitigate confusion. There could also be some magic where a user could import say extra.array_api.pytorch and have Hypothesis auto import a (future) Array API implementation in PyTorch.

I see the NumPy extra (and thus this PR) violates the house API style. Let me know if for array_api I should take the oppurtunity to drop potentially undesirable features in arrays() such as inferring strategies via dtype and passing kwargs via elements.

The shape/axis/index strategies were implemented namely to avoid importing NumPy by using extra.numpy, which also allows use of Array API naming conventions and removes some NumPy-specific limitations. They near-verbatim emulate those in the NumPy extra, so a future PR could have extra.numpy wrap them to deal with small differences.

The Array API is not finalised yet. Some tests may need slight modification in the future if the consortium decide in data-apis/array-api#212 that they don't want polymorphic return values in xp.unique().

@mattip
Copy link

mattip commented Aug 24, 2021

@asmeurer

@Zac-HD Zac-HD added interop how to play nicely with other packages new-feature entirely novel capabilities or strategies labels Aug 24, 2021
Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo! Thanks @honno; this is a really exciting new capability and great way help (test) the array ecosystem 😁

I've dumped some preliminary feedback below, but the key thing is that this looks good - I am confident that we can fix any small concerns and get this merged. Definitely a relief for such a large diff!

Remaining design-level issues:

  • Do we really need to expose a module full of functions in addition to the get_strategies_namespace() function? It seems to me that in practice this is much more ergonomic than manually passing around the xp array module, even when you're using several of them. I also want to think about the name - make feels more appropriate than get, but while strategies_namespace seems too vague I don't yet have a better idea.
  • Moving the non-array-module dependent strategies like array_shapes() or mutually_broadcastable_shapes() into a private namespace (e.g. hypothesis.extra.__array_helpers) to avoid duplicating code. IMO we can test these only against numpy, saving CI time. This should probably be a precursor PR (Move NumPy extra's strategies which are not dependent on NumPy into a seperate helper module #3067) after which we can rebase this one.

And more trivially, you'll need to add a RELEASE.rst for a minor release and add yourself (and any other contributors) to AUTHORS.rst so we give proper credit 😉

hypothesis-python/src/hypothesis/extra/array_api.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/array_api.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/array_api.py Outdated Show resolved Hide resolved
hypothesis-python/src/hypothesis/extra/array_api.py Outdated Show resolved Hide resolved
hypothesis-python/tests/array_api/test_scalar_dtypes.py Outdated Show resolved Hide resolved
hypothesis-python/tests/array_api/xputils.py Outdated Show resolved Hide resolved
@Zac-HD
Copy link
Member

Zac-HD commented Aug 25, 2021

An immediate concern I have is with how I form pretty ("reflective") reprs for xp consuming strategies ... it seems to be a hacky solution

If it's a hack but it works, we'll happily ship it! Especially really awful hacks which somehow improve the user experience; that's basically our signature around here 😉

There could also be some magic where a user could import say extra.array_api.pytorch and have Hypothesis auto import a (future) Array API implementation in PyTorch.

PEP 562 makes this easy with module-__getattr__; it would even support from hypothesis.extra.array_api import pytorch for free. Is there a standard way to discover the mapping from module names to their corresponding array-api implementations, e.g. numpy -> numpy.array_api?

If not I'm going to propose an update to the standard - the setuptools "entry points" API would be perfect for this - and a way to discover the installed array-api implementations seems generally useful. Just drop entry_points={"array_api": ["numpy = numpy.array_api"]} into your setup.py, and you're done!

I see the NumPy extra (and thus this PR) violates the house API style. Let me know if for array_api I should take the oppurtunity to drop potentially undesirable features in arrays() such as inferring strategies via dtype and passing kwargs via elements.

Thanks for checking! I think these are mostly fine, it's more array-specific style than bad style.

The Array API is not finalised yet. Some tests may need slight modification in the future if the consortium decide in data-apis/array-api#212 that they don't want polymorphic return values in xp.unique().

IMO we should probably add a warning to the docs: "The Array API standard is not yet final. We may therefore make breaking changes in minor releases, if this is necessary to support newer versions of the standard."

@honno
Copy link
Member Author

honno commented Aug 25, 2021

I'm really glad to hear I'm on the right track :) Thanks for the great feedback too!

PEP 562 makes this easy with module-__getattr__; it would even support from hypothesis.extra.array_api import pytorch for free. Is there a standard way to discover the mapping from module names to their corresponding array-api implementations, e.g. numpy -> numpy.array_api?

There is no standard way to discover the Array API implementation from a top-level package, where the only module discovery mechanism was built for array-consuming libraries by way of xp = x.__array_namespace__().

I was thinking of directly mapping the imports per module, e.g. if Jax had jax.numpy.array_api then it would be hardcoded in. I imagine <top-level-module>.array_api will become a convetion at least, so that could be tried first. A general solution would indeed be nicer.

Note to self:

  • Have make_strategies_namespace() as the single public top-level function
    • Docs still need to explain strategies
    • Could possibly remove current @pretty_xp_repr pattern
    • Review reprs
  • Write Update RELEASE.rst
  • Update AUTHORS.rst
  • Warn on Array API instability in docs
  • Ensure correct coverage behaviour for checking logic
  • Remove tests already covered in tests/numpy
  • Workout what Faster strategy for arrays(..., unique=True)  #3076 is doing and how it can apply to my arrays() or leave it.
  • Remove duplicated tests

@rsokl
Copy link
Contributor

rsokl commented Aug 25, 2021

@honno this is looking great! I have been involved in maintaining hypothesis.extra.numpy, so I am happy to lend a hand wherever it might be useful. @Zac-HD please let me know if there is anything that I might take off of your very-full plate for you.

@asmeurer
Copy link
Contributor

I was thinking of directly mapping the imports per module, e.g. if Jax had jax.numpy.array_api then it would be hardcoded in. I imagine .array_api will become a convetion at least, so that could be tried first. A general solution would indeed be nicer.

The entry points would be much nicer. I imagine some libraries will indeed follow NumPy with .array_api, but not all. At least some will just implement the array API at the top-level namespace (I believe pytorch is planning on doing this).

@Zac-HD
Copy link
Member

Zac-HD commented Aug 26, 2021

Zac-HD please let me know if there is anything that I might take off of your very-full plate for you.

🤗 In order of priority, it would be great if you could shepard #3067 - just ping me for a final review once it's ready to merge. I'd also appreciate a second set of eyes on this PR for design-level review (and probably eventually nitpicky code review, but after we rebase).

If you're up for both I can take a stab at #3066 this weekend 🚀

@Zac-HD
Copy link
Member

Zac-HD commented Aug 30, 2021

@honno - are you up for squash-rebasing on master, to make this a little easier to review?

@honno
Copy link
Member Author

honno commented Aug 30, 2021

@honno - are you up for squash-rebasing on master, to make this a little easier to review?

@Zac-HD done! (I think this is what you meant? :)

@Zac-HD
Copy link
Member

Zac-HD commented Aug 30, 2021

I think our main remaining points are

  • documentation, including updated release notes
  • review pretty reprs - how should we show which module strategies are bound to?
  • removing duplicated tests where we share functions with the Numpy extra
  • (Zac) add a module-__getattr__+entrypoint hack to support from hypothesis.extra.array_api import xp

Actually this last thing, together with our mock module, might solve the docs issue - just point Sphinx at array_api.xp! I'm also happy to make the array API extra require Python 3.7+ so that we can unconditionally rely on __getattr__; 3.6 has already been dropped by Numpy et al and Hypothesis will drop it entirely with the PSF this December.

@rsokl, have I missed anything?

@honno
Copy link
Member Author

honno commented Aug 30, 2021

Actually this last thing, together with our mock module, might solve the docs issue - just point Sphinx at array_api.xp!

This seems like the cleanest easy solution, although I think hypothesis.extra.array_api.xp.* Sphinx signatures could be a bit confusing as opposed to xps.* or hypothesis.extra.array_api.<LIBRARY>.*.

Also noting the current mock depends on NumPy, so maybe a specific hook triggers for array_api.xp which could try import numpy anyway and raise a helpful error if it's not installed.

I'm also happy to make the array API extra require Python 3.7+ so that we can unconditionally rely on __getattr__; 3.6 has already been dropped by Numpy et al and Hypothesis will drop it entirely with the PSF this December.

Incase this could be useful, Array API implementations have to be Python 3.8 or above due to the use of positional-only parameters, although maybe there's a world folk will ship/want to play around with 3.7 partial adopters.

review pretty reprs - how should we show which module strategies are bound to?

Yep my commit today e2ca6af have the namespaced strategies not show the module name anymore—more of a consequence of how I've dropped top-level strategies than a purposeful decision. Showing the module binds still seems nice tho.

@honno
Copy link
Member Author

honno commented Aug 31, 2021

On my end I'm pretty happy with everything for this PR, it's just the docs issue.

If we go the entry points route then we can utilise a mock like Zac suggested and generate docs from there at least (note it's not as simple as automodule:: hypothesis.extra.array_api.xp if xp is a SimpleNamespace), although per my previous comment I'd prefer signatures to look generic-y e.g. hypothesis.extra.array_api.<LIBRARY>.

If we're not doing entry points for this PR, or would rather highlight the xps = make_strategies_namespace(xp) method anyway, then I'll need to play around more to nicely generate xps.* Sphinx signatures + docs. And if the entry points proposal doesn't resolve soon, maybe it's best to go this route for now.

The TODO now seems to be:

  • Docs solution
  • Remove duplicated tests
  • See if we want module names in namespaced strategy reprs... and how we'd get that
  • Update RELEASE.rst, depending on how entry points support goes

Future PR ideas:

  • Optimise unique=True and empty fill like extra.numpy now does
  • Have tests/array_api able to run on multiple Array API implementations (see comment)

@Zac-HD
Copy link
Member

Zac-HD commented Aug 31, 2021

I've got a paper deadline on Saturday (ICSE'22); unlikely I'll have much time before then but happy to implement my getattr idea and see about docs once the paper is in.

Thanks again for all your work on this - it's been skilled, patient, valuable, and very much appreciated. I'm really looking forward to shipping it to the array-oriented parts of the Python community ☺️

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to merge!

And as I've said before, thanks for all your work on this - it's been skilled, patient, valuable, and very much appreciated. I look forward to seeing what you work on next 😁

@Zac-HD Zac-HD merged commit 420fdf8 into HypothesisWorks:master Sep 11, 2021
@honno
Copy link
Member Author

honno commented Sep 11, 2021

Oh wow, awesome! Likewise I really appreciated the feedback process for all the PRs, and grateful to of learnt sooo much about Hypothesis and beyond. Down the line feel free to ping me on Array API issues.

I completely forgot to ask about the check_function typing problem heh, thanks for fixing it.

@Zac-HD
Copy link
Member

Zac-HD commented Sep 11, 2021

Oh wow, awesome! Likewise I really appreciated the feedback process for all the PRs, and grateful to of learnt sooo much about Hypothesis and beyond. Down the line feel free to ping me on Array API issues.

I'm happy to pay it forward - expanding on hypothesis.extra.numpy was my first ever contribution to Hypothesis, and I learned a lot from David's reviews there too 😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interop how to play nicely with other packages new-feature entirely novel capabilities or strategies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Array API
5 participants