Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run-time version of API #584

Open
nstarman opened this issue Jan 14, 2023 · 13 comments
Open

Run-time version of API #584

nstarman opened this issue Jan 14, 2023 · 13 comments
Labels
RFC Request for comments. Feature requests and proposed changes.

Comments

@nstarman
Copy link
Contributor

nstarman commented Jan 14, 2023

Hi, first time contributor, long-time fan. I've been following the development of the Array API and having discussions with my fellow Astropy maintainers about how Astropy's Quantity class can support the API and how this might impact our users. Overall we feel that the adoption of the API will be beneficial (preaching to the choir here) but we, or at least I, have a point of concern.

Right now NumPy serves as the primary source of mathematical functions: np.cos, etc. Through their interoperability protocols, like __array_function__ we can vendor custom array classes like Quantity and for users most of numpy just works. This API standard has a somewhat different paradigm: where astropy.units will need to have a Quantity namespace that defines ``cos`, etc. In some ways this is good: for Astropy, Quantity can more easily support non-numpy array objects by acting as a wrapper and forwarding actual computations to the wrapped array object. For example,

def cos(x: Quantity, /) -> Quantity:
    unit = ...
    return Quantity(x.__array_namespace__().cos(x.value), unit)

However, we worry the loss of a central dispatching library is a regression in ease of user experience. Yes numpy will continue to __array_(u)func(tion)__ forever/ a long time and so can still serve as a central dispatching library, but base numpy does not conform to the array API standard. So there will not be an array-API-conformant central dispatching library and users will need to uses non-conformant libraries, which seems at odds to the purpose of this standard.

The situation becomes more difficult for Astropy and users when we consider Astropy's number of other array-like objects for which we might want to implement the array API: table Columns, coordinate representations, Distributions. Users will need to import a new array namespace for each — e.g. from astropy.table import column_api as cp.

I'm sure this issue has come up in discussion before. To add to that discussion, I'd like to propose what came up in ours. We felt the best solution might be if array-api published a package just of the supported functions.
These functions would be thin wrappers, dispatching the actual computations to the array namespace of the argument.
We know that a new Array implementation is out of scope, but hopefully this suggestion is within scope.
For example:

def cos(x: ArrayAPIConformant, /) -> ArrayAPIConformant:
    return get_namespace(x).cos(x)

Now the following is possible:

>>> import array_api as ap
>>> import numpy.array_api as np
>>> x = np.linspace(1, 2, num=5)[:, None]
>>> x
Array([[1.  ],
       [2.  ]], dtype=float64)
>>> np.cos(x)  # specific library if a user knows the input type and wants speed
Array([[ 0.54030231],
       [-0.41614684]], dtype=float64)
>>> ap.cos(x)  # universal library for just ease of use.
Array([[ 0.54030231],
       [-0.41614684]], dtype=float64)

For users this is especially convenient because they do not need to worry about the output types of non-standard functions. For example, a function

def angle_between(x1: ArrayAPIConformant | float, x2: ArrayAPIConformant | float) -> Quantity:
    ...

will return a Quantity regardless of the input type. Currently a user would have to switch array namespaces.

>>> x1, x2 = np.Array(...), np.Array(...)
>>> angle = angle_between(x1, x2)
>>> xp = angle.__array_namespace()
>>> xp.cos(angle)

With the import array_api as ap that is unnecessary (though they might choose to if optimizing for speed).

>>> import array_api as ap
>>> import numpy.array_api as np
>>> x1, x2 = np.Array(...), np.Array(...)
>>> angle = angle_between(x1, x2)
>>> ap.cos(angle)

Another aspect to this proposal is inclusion of a (preferably runtime-checkable) typing.Protocol of the Array API definition. In the above examples I called this ArrayAPIConformant.

In summary

To demonstrate this proposal I have created an example implementation of the 2021.12 version of the API at https://github.com/nstarman/array_api.

This library is just copying from here, adding the dispatch code, some static typing, a get_namespace function, and a run-time checkable set of Protocols: in particularArrayAPIConformant which makes it easy to check that an object conforms to the array-API.
Also, I mypyc compile the library for extra performance.
In keeping with the scope of this standard, I have not included array-creation functions like ones, though it may be better to instead just have them raise a NotImplementedError with a helpful error message.

If you are interested, I would be more than happy to submit PRs that add the dispatching, protocols, mypyc compilation, etc.

Regardless, I look forward to working with you to address these concerns so Astropy can start benefitting from the great work being done here.

@nstarman
Copy link
Contributor Author

ping @mhvk for this discussion.

@rgommers
Copy link
Member

Thanks for the detailed description, ideas and prototype @nstarman! It will take me a little while to write a more detailed reply, but here are a few initial thoughts:

but base numpy does not conform to the array API standard.

That will hopefully change in the coming year, see this message about a NumPy 2.0 proposal/roadmap.

In the meantime, https://github.com/data-apis/array-api-compat/ should hopefully be helpful for standard adoption purposes.

In keeping with the scope of this standard, I have not included array-creation functions like ones,

Creation functions like ones are included in the array API standard (see https://data-apis.org/array-api/latest/API_specification/creation_functions.html). These functions don't take an array as input, which is something that __array_ufunc__/__array_function__ cannot handle unfortunately. That is the main limitation of those protocols, and why several attempts at improvements to them have been started. What ended up in the array API standard is basically derived from NEP 37.

and a run-time checkable set of Protocols: in particularArrayAPIConformant which makes it easy to check that an object conforms to the array-API.

That sounds cool. It may address gh-229.

@asmeurer
Copy link
Member

I agree with your general point that a wrapper library like this could be helpful for your use-case.

I do have one technical comment on your implementation, however. Based on my experience with writing array-api-compat, this approach:

def cos(x: ArrayAPIConformant, /) -> ArrayAPIConformant:
    return get_namespace(x).cos(x)

is not a good one. The problem is that some functions do not accept an array object from which the array namespace can be deduced (the array creation functions are a great example of this, but there are others too, like the data type functions which accept dtypes). Furthermore, it leads to inherent ambiguity if a user decides to mix array objects from multiple libraries, which is not supported by the standard.

Instead, what you should do is specify the array library, which can be inferred from an array object or could just be the module object itself. And then from that return the entire array API compatible namespace. In the compat library I do something along these lines with a decorator to allow code sharing between the NumPy and CuPy compat layers, which are almost identical (see the @get_xp decorator at https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/_internal.py and https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/common/_aliases.py and https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/numpy/_aliases.py). One could also use a bit more metaprogramming to reduce the amount of typing.

@nstarman
Copy link
Contributor Author

Thanks for the quick responses @rgommers and @asmeurer

That will hopefully change in the coming year, see this message about a NumPy 2.0 proposal/roadmap.

Exciting!

In keeping with the scope of this standard, I have not included array-creation functions like ones,

Creation functions like ones are included in the array API standard (see https://data-apis.org/array-api/latest/API_specification/creation_functions.html).

Apologies for the confusion, what I meant was since ones cannot determine a namespace to which to dispatch, I didn't include it in the example library.

That sounds cool. It may address #229.

I can submit a PR for this, but this library would have to have an installable package for the Protocols to be useful...

The problem is that some functions do not accept an array object from which the array namespace can be deduced (the array creation functions are a great example of this, but there are others too, like the data type functions which accept dtypes)

The data type functions are a good example! A point of curiosity, is there a reason why dtypes don't have the same __array_namespace__ method, which would break this ambiguity?

Furthermore, it leads to inherent ambiguity if a user decides to mix array objects from multiple libraries, which is not supported by the standard.

I think this is less of an issue. I modeled my get_namespace off the one suggested in the numpy docs

def get_namespace(*xs: Any, api_version: str | None = None) -> ArrayAPINamespace:
    # `xs` contains one or more arrays.
    namespaces = {x.__array_namespace__(api_version=api_version) for x in xs if isinstance(x, ArrayAPIConformant)}

    if not namespaces:
        raise ValueError("Unrecognized array input")
    elif len(namespaces) != 1:
        raise ValueError(f"Multiple namespaces for array inputs: {namespaces}")

    return namespaces.pop()

It will error if passed objects from multiple libraries.

While definitely a limitation, having dispatching functions will still help with the following example:

>>> import array_api as ap
>>> import numpy.array_api as np
>>> x1, x2 = np.Array(...), np.Array(...)
>>> angle = angle_between(x1, x2)
>>> ap.cos(angle)

even if not for

>>> ap.subtract(x1, angle)

@asmeurer
Copy link
Member

The data type functions are a good example! A point of curiosity, is there a reason why dtypes don't have the same array_namespace method, which would break this ambiguity?

There's also asarray(), which accepts built-in types. asarray has to know a priori which array library it corresponds to.

It will error if passed objects from multiple libraries.

But the point is that in calling it in every function, you cannot control one array being passed to one function and another being passed to another.

It's better to move the get_namespace() call as high up as possible, ideally just one level below the user, so that the first thing you do is get a namespace, and after that everything just operates on that one namespace.

@nstarman
Copy link
Contributor Author

It's better to move the get_namespace() call as high up as possible, ideally just one level below the user, so that the first thing you do is get a namespace, and after that everything just operates on that one namespace.

Thanks for the example codes in the combat library! If I understand everything, each compat library is still specific to a non ArrayAPI library, e.g. one for numpy, another for cupy. A single dispatching library is still useful. Would it be in scope to have a generic version, e.g. array_api_compat.common that will accept ANY array-API compatible object and dispatch to the correct library? It might even accept ndarray etc and dispatch to array_api_compat.numpy.

@nstarman
Copy link
Contributor Author

@rgommers,

and a run-time checkable set of Protocols: in particularArrayAPIConformant which makes it easy to check that an object conforms to the array-API.

That sounds cool. It may address #229.

I can submit a PR for this, but this library would have to have an installable package for the Protocols to be useful. I'm happy to contribute to that as well, if this is in the project's scope.

@asmeurer
Copy link
Member

Would it be in scope to have a generic version, e.g. array_api_compat.common that will accept ANY array-API compatible object and dispatch to the correct library? It might even accept ndarray etc and dispatch to array_api_compat.numpy.

I think so. The array_api_compat.get_namespace already dispatches to either the corresponding compat namespace or library. I'm not sure if this works out-of-the-box with your Quantity use-case or if we need to adjust it. Feel free to open an issue on that repo.

@honno
Copy link
Member

honno commented Jan 24, 2023

Thanks for all your work and thought here.

First of, heads up that as of last week we now store the Python stubs of all versions in the repo's HEAD at src/array_api_stubs, where each sub-folder represents a version of the spec, where all work will be done in src/array_api_stubs/draft/ going forward. As you can see, this is actually organised like a package, but (right now) only for the purposes of reading stub signatures/docs in tooling (i.e.array-api-tests)—so not for say type hinting or dispatching. Generally we still need to mull over whether we want this as a publicly-advertised package.


Your main interest here looks to be dispatching, which I can't say I know how to feel about myself right now heh, but I was interested in a type hinting package a ways back (relevant discussion). Your repo demonstrates that having a Protocol for array and dtype objects (which can be used as type hints in the stub signatures) is a rather trivial change for the current stubs, so I feel like we should just implement that for this repo, irrespective if we want a first-party type hinting package (or dispatching).

If you'd like to take a stab at a PR converting our array_api_stubs._draft.array_object._array class into a Protocol, it will help see how consortium members feel about just having a first-party package containing an array protocol, which could help discussion on a dispatching library move along. Keep in mind:

  • Only modify src/array_api_stubs/_draft/ files. We can always backport things later.
  • As you've seen we don't have an equivalent _dtype class, only an __eq__ function defined in data_types.py, but it did feel like something we should have so feel free to implement a _dtype class+protocol likewise.
  • We wouldn't want these protocols to be runtime-checkable.
  • We need to be considerate of what gets autodoc'd in spec/draft/ so the docs still render docstrings where we expect them to. I'd happily help with that stuff on review tho.

@rgommers
Copy link
Member

rgommers commented Jan 26, 2023

Regarding static typing, the key thing here is the array protocol I think. I'll sketch a very brief version of what you (@nstarman) implemented here:

@runtime_checkable
class ArrayAPIConformant(Protocol):
    """Runtime checkable protocol for conformance with the array API."""

    @property
    def dtype(self) -> DTypeConformant:
        ...

    @property
    def device(self) -> DeviceConformant:
        ...

    @property
    def mT(self: Self) -> Self:
        ...

    @property
    def ndim(self) -> int:
        ...

A few thoughts:

  • This ArrayAPIConformant protocol seems good to go. I'd just name it plain Array (or array), like is already used for annotations in this repo as well as in jaxtyping.
    • Should we add it inside this repo, as well as figure out how best to ship it in a separate package and/or make it work with jaxtyping?
  • Dealing with dtype and/or dimensionality annotation inside this protocol seems useful and should be considered, but can possibly be done as a second step. I'll note that @honno sketched Array[Bool], jaxtyping does the opposite (Float[Array] - this seems wrong conceptually), and NumPy supports NDArray[np.float64] (see the docs).
  • I like the examples given in the issue description here (e.g., ArrayAPIConformant | float) - no ArrayLike or some such thing.
  • DTypeConformant doesn't seem viable (or useful) to define, as discussed in Array API type hinting package consortium-feedback#8 (reply in thread).
  • Same for DeviceConformat - it's only an arbitrary object with an __eq__ method, so might as well use Any.

So perhaps step 1 here is to open a PR to add an Array protocol to this repo? And step 2 is to extend it to Array[dtype-specifier]?

@NeilGirdhar
Copy link

NeilGirdhar commented Jan 26, 2023

Dealing with dtype and/or dimensionality annotation inside this protocol seems useful and should be considered, but can possibly be done as a second step.

I'm just curious, but if you start to do things in later steps, won't this cause breaks in instance checks because the new protocol won't match classes that implemented the old protocol?

I know you're avoiding have a root package (although I've read some comments saying that it's still an open question), but the advantage to having a base class is that you can change things, and publish a new version of root package. Client libraries can depend on it, and there's guaranteed consistency. (And if two client libraries require different versions of the root package, that's a very helpful error suggesting that the interoperation you're expecting may not work, right?)

I know there's a lot of effort going into careful design and it looks like everything might come out perfect and so there may never need to be a new (incompatible) version of protocols, but I thought I'd mention it 😄

@nstarman
Copy link
Contributor Author

If you'd like to take a stab at a PR converting our array_api_stubs._draft.array_object._array class into a Protocol, it will help see how consortium members feel about just having a first-party package containing an array protocol, which could help discussion on a dispatching library move along.

So perhaps step 1 here is to open a PR to add an Array protocol to this repo? And step 2 is to extend it to Array[dtype-specifier]?

Sounds great! I think I'll have some time next week.

@rgommers
Copy link
Member

I'm just curious, but if you start to do things in later steps, won't this cause breaks in instance checks because the new protocol won't match classes that implemented the old protocol?

That should not be a problem. Array[float32] is more specific than plain Array, so if isinstance(x, Array) passes then it will still pass if the type annotation for x gets more specific later on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comments. Feature requests and proposed changes.
Projects
None yet
Development

No branches or pull requests

5 participants