Array API extra #3065

honno · 2021-08-24T09:25:56Z

What this introduces

This implements strategies for Array API implementations inside hypothesis.extra.array_api (closes #3037). As the Array API is largely based on NumPy behaviour, I imitated hypothesis.extra.numpy where appropriate and so these strategies will hopefully feel familiar to the extra's contributors and users.

Strategies in array_api do not import array modules, instead taking them as the xp argument. For the most part they assume that the user has indeed provided an Array API-compliant module.

The strategies would be used by Array API implementers (e.g. @asmeurer's compliance suite) and array-consuming libraries (examples).

Many tests are based on the tests/numpy ones. There is a mock Array API implementation in xputils.py. Tests will try to import NumPy's Array API implementation (numpy/numpy#18585 was merged just today) and will fall back to the mocked one. I couldn't easily mock a "compliant-looking" array object, so a particular non-compliance warning is suppressed and some assertions are skipped when necessary.

cc @mattip

Specific requests for feedback

An immediate concern I have is with how I form pretty ("reflective") reprs for xp-consuming strategies. Default use of @defines_strategy (and thus LazyStrategy) is prone to producing rather noisy repr strings, so I ended up wrapping these strategies with a custom decorator, @pretty_xp_repr. It seems to be a hacky solution, especially since it gets called multiple times.
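For readers unfamiliar with the idea, a rough sketch of a repr-rewriting decorator follows. It is not the PR's @pretty_xp_repr: `FakeStrategy`, `pretty_call_repr`, `from_dtype` and `fake_xp` are hypothetical stand-ins, and the only assumption is that the returned object lets its repr string be overwritten.

```python
import functools
from types import ModuleType


class FakeStrategy:
    """Hypothetical stand-in for a strategy object with a settable repr."""

    def __init__(self):
        self.pretty = "FakeStrategy()"

    def __repr__(self):
        return self.pretty


def _show(value):
    # Render modules by name rather than "<module 'x' from '...'>".
    return value.__name__ if isinstance(value, ModuleType) else repr(value)


def pretty_call_repr(func):
    """Make the returned object's repr mirror the call that created it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        strategy = func(*args, **kwargs)
        parts = [_show(a) for a in args]
        parts += [f"{k}={_show(v)}" for k, v in kwargs.items()]
        strategy.pretty = f"{func.__name__}({', '.join(parts)})"
        return strategy

    return wrapper


@pretty_call_repr
def from_dtype(xp, dtype, *, min_value=None):
    # Hypothetical xp-consuming function; a real one would build a strategy.
    return FakeStrategy()


fake_xp = ModuleType("fake_xp")  # dummy array module, for illustration only
print(repr(from_dtype(fake_xp, "int8", min_value=0)))
```

The repr is rebuilt on every call of the wrapped function, which is the kind of repeated work the comment above describes as hacky.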

I'm also wondering if you're happy with the get_strategies_namespace() method that Zac suggested. It could be nice to drop the top-level strategies entirely (i.e. those where the array module needs to be passed), so that users are required to go through the namespace, which could mitigate confusion. There could also be some magic where a user could import say extra.array_api.pytorch and have Hypothesis auto import a (future) Array API implementation in PyTorch.
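To make the namespace idea concrete, here is a minimal sketch of binding an array module once and exposing the bound functions as attributes. It is not the PR's implementation: `make_namespace`, `scalar_dtypes` and `arrays` are hypothetical stand-ins that return plain strings instead of strategies, and `fake_xp` is a dummy module.

```python
from functools import partial
from types import ModuleType, SimpleNamespace


def scalar_dtypes(xp):
    # Hypothetical stand-in for a top-level strategy that needs the array module.
    return f"scalar_dtypes bound to {xp.__name__}"


def arrays(xp, dtype, shape):
    # Hypothetical stand-in; a real strategy would generate xp arrays.
    return f"arrays(dtype={dtype!r}, shape={shape!r}) bound to {xp.__name__}"


def make_namespace(xp):
    # Bind xp once so callers never pass the array module around again.
    return SimpleNamespace(
        scalar_dtypes=partial(scalar_dtypes, xp),
        arrays=partial(arrays, xp),
    )


fake_xp = ModuleType("fake_xp")  # dummy array module, for illustration only
xps = make_namespace(fake_xp)
print(xps.arrays("int8", (2, 3)))
```

With this shape, a test suite that targets several array libraries would hold one namespace per library rather than threading `xp` through every call.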

I see the NumPy extra (and thus this PR) violates the house API style. Let me know if for array_api I should take the opportunity to drop potentially undesirable features in arrays() such as inferring strategies via dtype and passing kwargs via elements.

The shape/axis/index strategies were reimplemented mainly to avoid importing NumPy via extra.numpy; doing so also allows use of Array API naming conventions and removes some NumPy-specific limitations. They emulate those in the NumPy extra near-verbatim, so a future PR could have extra.numpy wrap them to deal with small differences.

The Array API is not finalised yet. Some tests may need slight modification in the future if the consortium decide in data-apis/array-api#212 that they don't want polymorphic return values in xp.unique().

mattip · 2021-08-24T10:41:14Z

@asmeurer

Zac-HD

Woohoo! Thanks @honno; this is a really exciting new capability and a great way to help (test) the array ecosystem 😁

I've dumped some preliminary feedback below, but the key thing is that this looks good - I am confident that we can fix any small concerns and get this merged. Definitely a relief for such a large diff!

Remaining design-level issues:

Do we really need to expose a module full of functions in addition to the get_strategies_namespace() function? It seems to me that in practice this is much more ergonomic than manually passing around the xp array module, even when you're using several of them. I also want to think about the name - make feels more appropriate than get, but while strategies_namespace seems too vague I don't yet have a better idea.
Moving the non-array-module dependent strategies like array_shapes() or mutually_broadcastable_shapes() into a private namespace (e.g. hypothesis.extra.__array_helpers) to avoid duplicating code. IMO we can test these only against numpy, saving CI time. This should probably be a precursor PR (Move NumPy extra's strategies which are not dependent on NumPy into a seperate helper module #3067) after which we can rebase this one.

And more trivially, you'll need to add a RELEASE.rst for a minor release and add yourself (and any other contributors) to AUTHORS.rst so we give proper credit 😉

hypothesis-python/src/hypothesis/extra/array_api.py

hypothesis-python/tests/array_api/test_scalar_dtypes.py

hypothesis-python/tests/array_api/xputils.py

Zac-HD · 2021-08-25T01:55:05Z

An immediate concern I have is with how I form pretty ("reflective") reprs for xp consuming strategies ... it seems to be a hacky solution

If it's a hack but it works, we'll happily ship it! Especially really awful hacks which somehow improve the user experience; that's basically our signature around here 😉

There could also be some magic where a user could import say extra.array_api.pytorch and have Hypothesis auto import a (future) Array API implementation in PyTorch.

PEP 562 makes this easy with module-__getattr__; it would even support from hypothesis.extra.array_api import pytorch for free. Is there a standard way to discover the mapping from module names to their corresponding array-api implementations, e.g. numpy -> numpy.array_api?
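As a rough illustration of the module-`__getattr__` mechanism, the sketch below builds a throwaway module in memory and resolves attribute names lazily. Everything named here is hypothetical: `demo_array_api` is a made-up module, and the `KNOWN` mapping points at stdlib modules purely as stand-ins for array libraries.

```python
import importlib
import sys
from types import ModuleType

# Hypothetical mapping from attribute name to the module to import lazily.
# Stdlib modules stand in for real Array API implementations.
KNOWN = {"fake_numpy": "math", "fake_torch": "cmath"}


def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails (PEP 562).
    try:
        target = KNOWN[name]
    except KeyError:
        raise AttributeError(f"no known implementation named {name!r}") from None
    return importlib.import_module(target)


# In a real package this would simply be `def __getattr__(name): ...`
# at the top level of the module's source file.
demo = ModuleType("demo_array_api")
demo.__getattr__ = _lazy_getattr
sys.modules["demo_array_api"] = demo

# Both spellings go through the hook.
from demo_array_api import fake_numpy

print(fake_numpy.__name__, demo.fake_torch.__name__)
```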

If not I'm going to propose an update to the standard - the setuptools "entry points" API would be perfect for this - and a way to discover the installed array-api implementations seems generally useful. Just drop entry_points={"array_api": ["numpy = numpy.array_api"]} into your setup.py, and you're done!
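On the consuming side, discovery could look roughly like the sketch below, which uses the standard library's `importlib.metadata`. The `array_api` group name is taken from the proposal above; `discover_array_apis` is a hypothetical helper, and whether any installed package actually registers under that group depends on the environment.

```python
from importlib.metadata import entry_points


def discover_array_apis(group="array_api"):
    """Map entry point names to target strings, e.g. {"numpy": "numpy.array_api"}."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


found = discover_array_apis()
print(found)  # may well be empty if nothing registers the group
```

Calling `ep.load()` on a discovered entry point would import the target module; the sketch stops at listing names so that nothing is imported as a side effect.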

I see the NumPy extra (and thus this PR) violates the house API style. Let me know if for array_api I should take the opportunity to drop potentially undesirable features in arrays() such as inferring strategies via dtype and passing kwargs via elements.

Thanks for checking! I think these are mostly fine, it's more array-specific style than bad style.

The Array API is not finalised yet. Some tests may need slight modification in the future if the consortium decide in data-apis/array-api#212 that they don't want polymorphic return values in xp.unique().

IMO we should probably add a warning to the docs: "The Array API standard is not yet final. We may therefore make breaking changes in minor releases, if this is necessary to support newer versions of the standard."

honno · 2021-08-25T10:03:15Z

rsokl · 2021-08-25T17:36:56Z

@honno this is looking great! I have been involved in maintaining hypothesis.extra.numpy, so I am happy to lend a hand wherever it might be useful. @Zac-HD please let me know if there is anything that I might take off of your very-full plate for you.

asmeurer · 2021-08-25T20:14:33Z

I was thinking of directly mapping the imports per module, e.g. if Jax had jax.numpy.array_api then it would be hardcoded in. I imagine .array_api will become a convention at least, so that could be tried first. A general solution would indeed be nicer.

The entry points would be much nicer. I imagine some libraries will indeed follow NumPy with .array_api, but not all. At least some will just implement the array API at the top-level namespace (I believe pytorch is planning on doing this).

Zac-HD · 2021-08-26T00:46:40Z

Zac-HD please let me know if there is anything that I might take off of your very-full plate for you.

🤗 In order of priority, it would be great if you could shepherd #3067 - just ping me for a final review once it's ready to merge. I'd also appreciate a second set of eyes on this PR for design-level review (and probably eventually nitpicky code review, but after we rebase).

If you're up for both I can take a stab at #3066 this weekend 🚀

Zac-HD · 2021-08-30T11:26:56Z

@honno - are you up for squash-rebasing on master, to make this a little easier to review?

honno · 2021-08-30T11:39:52Z

@honno - are you up for squash-rebasing on master, to make this a little easier to review?

@Zac-HD done! (I think this is what you meant? :)

hypothesis-python/tests/array_api/test_indices.py

Zac-HD · 2021-08-30T13:34:02Z

I think our main remaining points are

documentation, including updated release notes
review pretty reprs - how should we show which module strategies are bound to?
removing duplicated tests where we share functions with the NumPy extra
(Zac) add a module-__getattr__+entrypoint hack to support from hypothesis.extra.array_api import xp

Actually this last thing, together with our mock module, might solve the docs issue - just point Sphinx at array_api.xp! I'm also happy to make the array API extra require Python 3.7+ so that we can unconditionally rely on __getattr__; 3.6 has already been dropped by NumPy et al. and Hypothesis will drop it entirely with the PSF this December.

@rsokl, have I missed anything?

honno · 2021-08-30T15:41:33Z

Actually this last thing, together with our mock module, might solve the docs issue - just point Sphinx at array_api.xp!

This seems like the cleanest easy solution, although I think hypothesis.extra.array_api.xp.* Sphinx signatures could be a bit confusing as opposed to xps.* or hypothesis.extra.array_api.<LIBRARY>.*.

Also noting the current mock depends on NumPy, so maybe a specific hook triggers for array_api.xp which could try import numpy anyway and raise a helpful error if it's not installed.

I'm also happy to make the array API extra require Python 3.7+ so that we can unconditionally rely on __getattr__; 3.6 has already been dropped by NumPy et al. and Hypothesis will drop it entirely with the PSF this December.

In case this is useful: Array API implementations have to be Python 3.8 or above due to the use of positional-only parameters, although maybe there's a world where folks will ship/want to play around with 3.7 partial adopters.
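For context, positional-only parameters are the `/` marker in a signature, which is a syntax error before Python 3.8. A small sketch, where `total` is a hypothetical function and not part of the Array API:

```python
def total(x, /, *, start=0):
    # `x` is positional-only (it precedes "/");
    # `start` is keyword-only (it follows "*").
    return sum(x, start)


print(total([1, 2, 3]))           # positional call works
print(total([1, 2, 3], start=10))  # keyword-only argument works

try:
    total(x=[1, 2, 3])  # passing `x` by keyword is rejected
except TypeError as exc:
    print("rejected:", exc)
```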

review pretty reprs - how should we show which module strategies are bound to?

Yep, my commit today (e2ca6af) means the namespaced strategies no longer show the module name, which is more a consequence of how I've dropped top-level strategies than a purposeful decision. Showing the module binding still seems nice tho.

honno · 2021-08-31T15:32:34Z

On my end I'm pretty happy with everything for this PR, it's just the docs issue.

If we go the entry points route then we can utilise a mock like Zac suggested and generate docs from there at least (note it's not as simple as automodule:: hypothesis.extra.array_api.xp if xp is a SimpleNamespace), although per my previous comment I'd prefer signatures to look generic-y e.g. hypothesis.extra.array_api.<LIBRARY>.

If we're not doing entry points for this PR, or would rather highlight the xps = make_strategies_namespace(xp) method anyway, then I'll need to play around more to nicely generate xps.* Sphinx signatures + docs. And if the entry points proposal doesn't resolve soon, maybe it's best to go this route for now.

The TODO now seems to be:

Docs solution
Remove duplicated tests
~~See if we want module names in namespaced strategy reprs... and how we'd get that~~
Update RELEASE.rst, depending on how entry points support goes

Future PR ideas:

Optimise unique=True and empty fill like extra.numpy now does
Have tests/array_api able to run on multiple Array API implementations (see comment)

Zac-HD · 2021-08-31T16:24:27Z

I've got a paper deadline on Saturday (ICSE'22); unlikely I'll have much time before then but happy to implement my getattr idea and see about docs once the paper is in.

Thanks again for all your work on this - it's been skilled, patient, valuable, and very much appreciated. I'm really looking forward to shipping it to the array-oriented parts of the Python community ☺️

Co-authored-by: Aaron Meurer <asmeurer@gmail.com>

Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>

Co-authored-by: Ryan Soklaski <ry26099@mit.edu>

Zac-HD

Time to merge!

And as I've said before, thanks for all your work on this - it's been skilled, patient, valuable, and very much appreciated. I look forward to seeing what you work on next 😁

honno · 2021-09-11T08:05:08Z

Oh wow, awesome! Likewise I really appreciated the feedback process for all the PRs, and I'm grateful to have learnt sooo much about Hypothesis and beyond. Down the line feel free to ping me on Array API issues.

I completely forgot to ask about the check_function typing problem heh, thanks for fixing it.

Zac-HD · 2021-09-11T09:00:54Z

Oh wow, awesome! Likewise I really appreciated the feedback process for all the PRs, and I'm grateful to have learnt sooo much about Hypothesis and beyond. Down the line feel free to ping me on Array API issues.

I'm happy to pay it forward - expanding on hypothesis.extra.numpy was my first ever contribution to Hypothesis, and I learned a lot from David's reviews there too 😁

Zac-HD added the interop (how to play nicely with other packages) and new-feature (entirely novel capabilities or strategies) labels Aug 24, 2021

Zac-HD mentioned this pull request Aug 24, 2021

Improve arrays() performance with unique-sampled-list tricks #3066

Closed

Zac-HD reviewed Aug 24, 2021

honno mentioned this pull request Aug 24, 2021

Move NumPy extra's strategies which are not dependent on NumPy into a seperate helper module #3067

Merged

Zac-HD mentioned this pull request Aug 25, 2021

Proposal: use entrypoints to make array-api submodules discoverable data-apis/array-api#244

Closed

honno force-pushed the honno/array-api branch 3 times, most recently from 105c5a9 to 80b3017 Compare August 30, 2021 08:54

honno mentioned this pull request Aug 30, 2021

Use @proxies instead of deepcopy for mutated array helpers #3077

Merged

honno force-pushed the honno/array-api branch from fb0be82 to 1581785 Compare August 30, 2021 11:37

Zac-HD reviewed Aug 30, 2021

hypothesis-python/tests/array_api/test_indices.py

honno mentioned this pull request Aug 30, 2021

Wrap array_shapes in @defines_strategy #3078

Merged

honno force-pushed the honno/array-api branch 2 times, most recently from 4cd19d9 to b252efd Compare August 31, 2021 09:25

honno mentioned this pull request Aug 31, 2021

ENH: Add entry point for Array API implementation numpy/numpy#19800

Merged

honno force-pushed the honno/array-api branch 2 times, most recently from 7818835 to d98ce1e Compare August 31, 2021 13:23

honno and others added 18 commits September 10, 2021 17:24

Rudimentary valid indexers (slow generation)

ef55531

Co-authored-by: Aaron Meurer <asmeurer@gmail.com>

Appropiate min/max dims behaviour for indices(), expose xps.slices()

f584cae

Made indices tests ported from tests/numpy work with array_api

c229f31

xarray -> Xarray in docs

6fca573

Compact indices effeciency test

9993004

Label valid indexer draws

c72374f

Clearer error in indices() for shapes with non-negative integers

06202ad

Fix lint issue

19f71b2

Revert indices() injecting custom slices method

52ae7bc

Used check_argument for basic_indices() shapes check, improved msg

1d49cc7

Use appropiate Sphinx func links

b81dad0

Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>

Internal testing/coverage for dtype methods

cbcf09d

Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>

Use assigntment expressions in error messages

389cde2

Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>

Remove redundant skip in large uint arrays test

d8ea29b

Clarified comment in ellipsis indices test (NumPy and Array API)

b6c2e42

Improved clarity in ellipsis indices test (NumPy and Array API)

fba0fb2

Co-authored-by: Zac Hatfield-Dodds <zac.hatfield.dodds@gmail.com>

Change max_dims defaulting in indices() to len(shape)

72dd427

Dtype name lists are now tuples

520d018

Co-authored-by: Ryan Soklaski <ry26099@mit.edu>

honno force-pushed the honno/array-api branch from 1fefd54 to 520d018 Compare September 10, 2021 16:25

Zac-HD added 4 commits September 11, 2021 13:30

Refactor test code

e90e34d

Link to xp.unique() issue

47ba696

Avoid type: ignore

bd9797b

Set, not defaultdict

500fd15

Zac-HD approved these changes Sep 11, 2021

Zac-HD merged commit 420fdf8 into HypothesisWorks:master Sep 11, 2021

honno mentioned this pull request Sep 21, 2021

Clean-up and slightly optimise extra.array_api.ArrayStrategy #3105

Closed

honno deleted the honno/array-api branch September 23, 2021 12:41

honno mentioned this pull request Sep 23, 2021

Hypothesis + Array API blog post Quansight-Labs/quansight-labs-site#252

Merged
