- Author: Hameer Abbasi <habbasi@quansight.com>
- Author: Ralf Gommers <rgommers@quansight.com>
- Author: Peter Bell <pbell@quansight.com>
- Status: Draft
- Type: Standards Track
- Created: 2019-08-22
This NEP proposes to make all of NumPy's public API overridable via an extensible backend mechanism.

Acceptance of this NEP means NumPy would provide global and context-local overrides in a separate namespace, as well as a dispatch mechanism similar to NEP-18. First experiences with `__array_function__` show that it is necessary to be able to override NumPy functions that do not take an array-like argument, and hence aren't overridable via `__array_function__`. The most pressing need is for array creation and coercion functions, such as `numpy.zeros` or `numpy.asarray`; see e.g. NEP-30.

This NEP proposes to allow, in an opt-in fashion, overriding any part of the NumPy API. It is intended as a comprehensive resolution to NEP-22, and obviates the need to add an ever-growing list of new protocols for each new type of function or object that needs to become overridable.
The primary end-goal of this NEP is to make the following possible:
```python
# On the library side
import numpy.overridable as unp

def library_function(array):
    array = unp.asarray(array)
    # Code using unumpy as usual
    return array

# On the user side:
import numpy.overridable as unp
import uarray as ua
import dask.array as da

ua.register_backend(da)  # Can be done within Dask itself

library_function(dask_array)  # works and returns dask_array

with unp.set_backend(da):
    library_function([1, 2, 3, 4])  # actually returns a Dask array.
```
Here, the backend can be any compatible object defined either by NumPy or an external library, such as Dask or CuPy. Ideally, it should be the module `dask.array` or `cupy` itself.

These kinds of overrides are useful for end-users as well as library authors. End-users may have written code that they later wish to speed up or move to a different implementation, say PyData/Sparse; they can do this simply by setting a backend. Library authors may wish to write code that is portable across array implementations: for example, `sklearn` may wish to write a machine learning algorithm that works across array implementations while also using array creation functions.

This NEP takes a holistic approach: it assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism to avoid designing a new protocol each time this is required. This was the goal of `uarray`: to allow for overrides in an API without needing the design of a new protocol.
This NEP proposes the following: that `unumpy` becomes the recommended override mechanism for the parts of the NumPy API not yet covered by `__array_function__` or `__array_ufunc__`, and that `uarray` is vendored into a new namespace within NumPy to give users and downstream dependencies access to these overrides. This vendoring mechanism is similar to what SciPy decided to do for making `scipy.fft` overridable.
The motivation behind `uarray` is manifold. There have been several attempts to allow dispatch of parts of the NumPy API, most prominently the `__array_ufunc__` protocol in NEP-13 and the `__array_function__` protocol in NEP-18, but these have shown the need for further protocols to be developed, including a protocol for coercion (see NEP-30 and the related mailing-list discussion). The reasons these overrides are needed have been extensively discussed in the references, and this NEP will not attempt to go into the details of why they are needed. In short: it is necessary for library authors to be able to coerce arbitrary objects into arrays of their own types, such as CuPy needing to coerce to a CuPy array instead of a NumPy array. In simpler words, one needs things like `np.asarray(...)`, or an alternative, to "just work" and return duck-arrays.

This NEP allows for global and context-local overrides, as well as automatic overrides à la `__array_function__`.
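To make the override semantics concrete, here is a minimal toy model of how a global default backend interacts with context-local ones. This is pure Python for illustration only; `set_global_backend`, `set_backend` and `active_backend` are illustrative names, not `uarray`'s actual implementation.

```python
import contextlib

# Toy model of global vs. context-local backend selection.
_global_backend = None
_local_backends = []

def set_global_backend(backend):
    """Install a process-wide default backend."""
    global _global_backend
    _global_backend = backend

@contextlib.contextmanager
def set_backend(backend):
    """Install a backend only for the duration of the with-block."""
    _local_backends.append(backend)
    try:
        yield
    finally:
        _local_backends.pop()

def active_backend():
    # Context-local backends take priority over the global default.
    return _local_backends[-1] if _local_backends else _global_backend
```

A context-local backend set in a `with` block shadows the global one and is restored on exit, which is the behaviour the `unp.set_backend` examples above rely on.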
Here are some use-cases this NEP would enable, besides the first one stated in the motivation section.

The first is allowing alternate dtypes to return their respective arrays:

```python
# Returns an XND array
x = unp.ones((5, 5), dtype=xnd_dtype)  # Or torch dtype
```
The second is allowing overrides for parts of the API. This is to allow alternate and/or optimised implementations for `np.linalg`, BLAS, and `np.random`:

```python
import numpy as np
import pyfftw  # Or mkl_fft

# Makes pyfftw the default for FFT
np.set_global_backend(pyfftw)

# Uses pyfftw without monkeypatching
np.fft.fft(numpy_array)

with np.set_backend(pyfftw):  # Or mkl_fft, or numpy
    # Uses the backend you specified
    np.fft.fft(numpy_array)
```
This will allow an official way for overrides to work with NumPy without monkeypatching or distributing a modified version of NumPy.
Here are a few other use-cases, implied but not already stated:

```python
from dask import array as da

data = da.from_zarr('myfile.zarr')
# result should still be dask, all things being equal
result = library_function(data)
result.to_zarr('output.zarr')
```
This second one would work if `magic_library` were built on top of `unumpy`:

```python
from dask import array as da
from magic_library import pytorch_predict

data = da.from_zarr('myfile.zarr')
# normally here one would use e.g. data.map_overlap
result = pytorch_predict(data)
result.to_zarr('output.zarr')
```
There are some backends which may depend on other backends: for example, xarray depending on `numpy.fft` when transforming a time axis into a frequency axis, or Dask/xarray holding an array other than a NumPy array inside it. This would be handled in the following manner inside code:

```python
with ua.set_backend(cupy), ua.set_backend(dask.array):
    # Code that has distributed GPU arrays here
    ...
```
There are no backward incompatible changes proposed in this NEP.
The only change this NEP proposes at its acceptance is to make `unumpy` the officially recommended way to override NumPy, along with making some submodules overridable by default via `uarray`. `unumpy` will remain a separate repository/package, which we propose to vendor to avoid a hard dependency, using the separately installed `unumpy` package only if it is available. In concrete terms, `numpy.overridable` becomes an alias for `unumpy` if it is installed, with a fallback to the vendored version if not. `uarray` and `unumpy` will be developed primarily with the input of duck-array authors and, secondarily, custom dtype authors, via the usual GitHub workflow. There are a few reasons for this:

- Faster iteration in the case of bugs or issues.
- Faster design changes, in the case of needed functionality.
- `unumpy` will work with older versions of NumPy as well.
- The user and library author opt in to the override process, rather than breakages happening when least expected. In simple terms, bugs in `unumpy` mean that `numpy` remains unaffected.
- For `numpy.fft`, `numpy.linalg` and `numpy.random`, the functions in the main namespace will mirror those in the `numpy.overridable` namespace. The reason for this is that there may exist functions in these submodules that need backends, even for `numpy.ndarray` inputs.
`unumpy` offers a number of advantages over the approach of defining a new protocol for every problem encountered: whenever there is something requiring an override, `unumpy` will be able to offer a unified API with very minor changes. For example:

- `ufunc` objects can be overridden via their `__call__`, `reduce` and other methods.
- Other functions can be overridden in a similar fashion.
- `np.asduckarray` goes away, and becomes `np.overridable.asarray` with a backend set.
- The same holds for array creation functions such as `np.zeros`, `np.empty` and so on.
This also holds for the future: making something overridable would require only minor changes to `unumpy`.
Another promise `unumpy` holds is one of default implementations. A default implementation can be provided for any multimethod, in terms of other multimethods. This allows one to override a large part of the NumPy API by defining only a small part of it, easing the creation of new duck-arrays: many functions that can be easily expressed in terms of others get default implementations, alongside a repository of utility functions that most duck-arrays would require. This would allow us to avoid designing entire protocols; e.g., a protocol for stacking and concatenating would be replaced by simply implementing `stack` and/or `concatenate` and then providing default implementations for everything else in that class. The same applies to transposing, and to many other functions for which protocols haven't been proposed, such as `isin` in terms of `in1d`, `setdiff1d` in terms of `unique`, and so on.
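As an illustration of the idea, consider a toy sketch (nothing here is the real `unumpy` machinery, and plain nested lists stand in for arrays) in which a backend implements only `full`, yet gets `ones` and `zeros` for free through defaults written in terms of it:

```python
# Toy sketch: the backend implements only full(); ones() and zeros()
# are "default implementations" expressed purely in terms of full().
def full(shape, fill_value):
    # Nested lists stand in for a real array type.
    return [[fill_value] * shape[1] for _ in range(shape[0])]

def ones(shape):
    # Default implementation in terms of another multimethod.
    return full(shape, 1)

def zeros(shape):
    return full(shape, 0)
```

A backend author who provides only `full` thus overrides three functions at once; the same pattern scales to larger families of functions.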
It also allows one to override functions in a manner which `__array_function__` simply cannot, such as overriding `np.einsum` with the version from the `opt_einsum` package, or Intel MKL overriding FFT, BLAS or `ufunc` objects. Such packages would define a backend with the appropriate multimethods, and the user would select it via a `with` statement, or by registering it as a backend.
The last benefit is a clear way to coerce to a given backend (via the `coerce` keyword in `ua.set_backend`), and a protocol for coercing not only arrays, but also `dtype` objects and `ufunc` objects, with similar ones from other libraries. This is motivated by the existence of actual third-party dtype packages, and their desire to blend into the NumPy ecosystem (see the custom dtype discussion referenced below). This is a separate issue from the C-level dtype redesign proposed in the dtype cleanup plan: it is about allowing third-party dtype implementations to work with NumPy, much like third-party array implementations. These can provide features such as units or jagged arrays that are outside the scope of NumPy.
Normally, one would want to import only one of `unumpy` or `numpy`, importing it as `np` for familiarity. However, there may be situations where one wishes to mix NumPy and the overrides, and there are a few ways to do this, depending on the user's style:

```python
from numpy import overridable as unp
import numpy as np
```

or:

```python
import numpy as np

# Use unumpy via np.overridable
```
There are inherent problems with returning objects that are not NumPy arrays from `numpy.array` or `numpy.asarray`, particularly in the context of C/C++ or Cython code that may get an object with a different memory layout than the one it expects. However, we believe this problem may apply not only to these two functions but to all functions that return NumPy arrays. For this reason, overrides are opt-in for the user: they use the submodule `numpy.overridable` rather than `numpy`. NumPy will continue to work unaffected by anything in `numpy.overridable`.
If the user wishes to obtain a NumPy array, there are two ways of doing it:

- Use `numpy.asarray` (the non-overridable version).
- Use `numpy.overridable.asarray` with the NumPy backend set and coercion enabled.
All functionality in `numpy.random`, `numpy.linalg` and `numpy.fft` will be aliased to their respective overridable versions inside `numpy.overridable`. The reason for this is that there are alternative implementations of RNGs (`mkl-random`), linear algebra routines (`eigen`, `blis`) and FFT routines (`mkl-fft`, `pyFFTW`) that need to operate on `numpy.ndarray` inputs, but still need the ability to switch behaviour.
This is different from monkeypatching in a few ways:

- The caller-facing signature of the function is always the same, so there is at least a loose sense of an API contract. Monkeypatching does not provide this ability.
- There is the ability to switch the backend locally.
- It has been suggested that the reason NumPy 1.17 hasn't landed in the Anaconda defaults channel is the incompatibility between monkeypatching and `__array_function__`, as monkeypatching would bypass the protocol completely.
- Statements of the form `from numpy import x; x` and `np.x` would have different results depending on whether the import was made before or after monkeypatching happened.

None of this is possible at all with `__array_function__` or `__array_ufunc__`.
It has been formally realised in the NumPy roadmap, at least in part, that a backend system is needed for this.
For `numpy.random`, it's still necessary to make the C-API fit the one proposed in NEP-19. This is impossible for `mkl-random`, because it would need to be rewritten to fit that framework. The guarantees on stream compatibility will be the same as before, but if a backend affecting `numpy.random` is set, we make no guarantees about stream compatibility; it is up to the backend author to provide their own guarantees.
It has been suggested that the ability to dispatch methods which do not take a dispatchable is needed, while guessing the backend from another dispatchable. As a concrete example, consider the following:

```python
with unumpy.determine_backend(array_like, np.ndarray):
    unumpy.arange(len(array_like))
```

While this does not exist yet in `uarray`, it is trivial to add. The need for this kind of code exists because one might want an alternative to the proposed `*_like` functions or the `like=` keyword argument: there are functions in the NumPy API that do not take a dispatchable argument, but for which one still needs to select a backend based on a different dispatchable.
The need for an opt-in module arises for a few reasons:

- There are parts of the API (like `numpy.asarray`) that simply cannot be overridden due to incompatibility concerns with C/Cython extensions; however, one may still want to coerce to a duck-array using `asarray` with a backend set.
- There are possible issues around an implicit option and monkeypatching, such as those mentioned above.
NEP 18 notes that this may require maintenance of two separate APIs. However, this burden may be lessened by, for example, parametrizing all tests over numpy.overridable
separately via a fixture. This also has the side-effect of thoroughly testing it, unlike __array_function__
. We also feel that it provides an opportunity to separate the NumPy API contract properly from the implementation.
Mixing backends is easy in `uarray`; one only has to do:

```python
# Explicitly say which backends you want to mix
ua.register_backend(backend1)
ua.register_backend(backend2)
ua.register_backend(backend3)

# Freely use code that mixes backends here.
```
The benefits to end-users extend beyond just writing new code. Old code (usually in the form of scripts) can be easily ported to different backends by a simple import switch and a line adding the preferred backend. This way, users may find it easier to port existing code to GPU or distributed computing.
- NEP-18, the `__array_function__` protocol
- NEP-13, the `__array_ufunc__` protocol
- NEP-30, the `__duck_array__` protocol
- Dask: https://dask.org/
- CuPy: https://cupy.chainer.org/
- PyData/Sparse: https://sparse.pydata.org/
- Xnd: https://xnd.readthedocs.io/
- Astropy's Quantity: https://docs.astropy.org/en/stable/units/
- Dask: https://dask.org/
- scikit-learn: https://scikit-learn.org/
- xarray: https://xarray.pydata.org/
- TensorLy: http://tensorly.org/
- ndtypes: https://ndtypes.readthedocs.io/en/latest/
- Datashape: https://datashape.readthedocs.io
- Plum: https://plum-py.readthedocs.io/
- mkl_random: https://github.com/IntelPython/mkl_random
- mkl_fft: https://github.com/IntelPython/mkl_fft
- bottleneck: https://github.com/pydata/bottleneck
- opt_einsum: https://github.com/dgasmith/opt_einsum
The implementation of this NEP will require the following steps:
- Implementation of `uarray` multimethods corresponding to the NumPy API, including classes for overriding `dtype`, `ufunc` and `array` objects, in the `unumpy` repository. These are usually very easy to create.
- Moving backends from `unumpy` into the respective array libraries.
Maintenance can be eased by testing over `{numpy, unumpy}` via parameterized tests. If a new argument is added to a method, the corresponding argument extractor and replacer will need to be updated within `unumpy`.
A lot of argument extractors can be re-used from the existing implementation of the `__array_function__` protocol, and the replacers can usually be re-used across many methods.
For the parts of the namespace which are going to be overridable by default, the main method will need to be renamed and hidden behind a `uarray` multimethod.
Default implementations are usually seen in the documentation using the words "equivalent to", and thus, are easily available.
Note: This section will not attempt to go into too much detail about `uarray`; that is the purpose of the `uarray` documentation. However, the NumPy community will have input into the design of `uarray`, via the issue tracker.
`unumpy` is the interface that defines a set of overridable functions (multimethods) compatible with the NumPy API. To do this, it uses the `uarray` library. `uarray` is a general-purpose tool for creating multimethods that dispatch to one of multiple different possible backend implementations. In this sense, it is similar to the `__array_function__` protocol, but with the key difference that the backend is explicitly installed by the end-user and not coupled to the array type.
Decoupling the backend from the array type gives much more flexibility to end-users and backend authors. For example, it is possible to:
- override functions not taking arrays as arguments
- create backends out of source from the array type
- install multiple backends for the same array type
This decoupling also means that `uarray` is not constrained to dispatching over array-like types. The backend is free to inspect the entire set of function arguments to determine if it can implement the function, e.g. `dtype` parameter dispatching.
`uarray` consists of two main protocols, `__ua_convert__` and `__ua_function__`, called in that order, along with `__ua_domain__`. `__ua_convert__` is for conversion and coercion. It has the signature `(dispatchables, coerce)`, where `dispatchables` is an iterable of `ua.Dispatchable` objects and `coerce` is a boolean indicating whether or not to force the conversion. `ua.Dispatchable` is a simple class consisting of three values: `type`, `value`, and `coercible`. `__ua_convert__` returns an iterable of the converted values, or `NotImplemented` in the case of failure.
`__ua_function__` has the signature `(func, args, kwargs)` and defines the actual implementation of the function. It receives the function and its arguments. Returning `NotImplemented` will cause a move to the default implementation of the function if one exists, and failing that, to the next backend.
Here is what will happen when a `uarray` multimethod is called:

1. We canonicalise the arguments so any arguments without a default are placed in `*args` and those with one are placed in `**kwargs`.
2. We check the list of backends. If it is empty, we try the default implementation.
3. We check if the backend's `__ua_convert__` method exists. If it exists:
   - We pass it the output of the dispatcher, which is an iterable of `ua.Dispatchable` objects.
   - We feed this output, along with the arguments, to the argument replacer. `NotImplemented` means we move to step 3 with the next backend.
   - We store the replaced arguments as the new arguments.
4. We feed the arguments into `__ua_function__`, and return the output if it isn't `NotImplemented`.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to step 3 with the next backend. If there are no more backends, we move to step 7.
7. We raise a `ua.BackendNotImplementedError`.
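The steps above can be sketched as a plain-Python loop. This is a greatly simplified toy, not `uarray`'s internals: backends here are dicts with an optional `convert` and a required `function` entry, and the canonicalisation step is omitted.

```python
class BackendNotImplementedError(Exception):
    pass

def call_multimethod(backends, func, args, default=None):
    # Step 2: with no backends installed, only the default can answer.
    if not backends and default is not None:
        return default(*args)
    for backend in backends:
        # Step 3: let the backend convert/replace the arguments.
        convert = backend.get("convert")
        if convert is not None:
            replaced = convert(args)
            if replaced is NotImplemented:
                continue  # move to step 3 with the next backend
            args = replaced
        # Step 4: the backend's __ua_function__ equivalent.
        result = backend["function"](func, args)
        if result is not NotImplemented:
            return result
        # Steps 5-6: try the default under this backend, else move on.
        if default is not None:
            result = default(*args)
            if result is not NotImplemented:
                return result
    # Step 7: every backend (and the default) declined.
    raise BackendNotImplementedError(func)
```

A backend that returns `NotImplemented` simply passes the call along, so installing a specialised backend in front of a general one degrades gracefully.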
To define an overridable function (a multimethod), one needs a few things:

- A dispatcher that returns an iterable of `ua.Dispatchable` objects.
- A reverse dispatcher that replaces dispatchable values with the supplied ones.
- A domain.
- Optionally, a default implementation, which can be provided in terms of other multimethods.
As an example, consider the following:

```python
import uarray as ua
import numpy as np

def full_argreplacer(args, kwargs, dispatchables):
    def full(shape, fill_value, dtype=None, order='C'):
        return (shape, fill_value), dict(
            dtype=dispatchables[0],
            order=order
        )

    return full(*args, **kwargs)

@ua.create_multimethod(full_argreplacer, domain="numpy")
def full(shape, fill_value, dtype=None, order='C'):
    return (ua.Dispatchable(dtype, np.dtype),)
```
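The argument replacer on its own is plain Python and can be exercised directly; the nested function with the multimethod's own signature makes the canonicalisation come for free. In this hypothetical call, string dtype names stand in for real `dtype` objects:

```python
# The same replacer as above, reproduced so it runs standalone.
def full_argreplacer(args, kwargs, dispatchables):
    def full(shape, fill_value, dtype=None, order='C'):
        return (shape, fill_value), dict(
            dtype=dispatchables[0],
            order=order
        )

    return full(*args, **kwargs)

# Whatever dtype the caller passed is replaced by the (converted)
# dispatchable supplied by the backend.
new_args, new_kwargs = full_argreplacer(
    ((5, 5), 1.0), {"dtype": "float32"}, ("float64",)
)
```

Here the caller's `"float32"` is discarded in favour of the backend-converted `"float64"`, while positional arguments and the untouched `order` keyword pass through unchanged.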
A large set of examples can be found in the `unumpy` repository. This simple act of overriding callables allows us to override:

- Methods
- Properties, via `fget` and `fset`
- Entire objects, via `__get__`
A library that implements a NumPy-like API will use it in the following manner (as an example):

```python
import numpy.overridable as unp

_ua_implementations = {}

__ua_domain__ = "numpy"

def __ua_function__(func, args, kwargs):
    fn = _ua_implementations.get(func, None)
    return fn(*args, **kwargs) if fn is not None else NotImplemented

def implements(ua_func):
    def inner(func):
        _ua_implementations[ua_func] = func
        return func

    return inner

@implements(unp.asarray)
def asarray(a, dtype=None, order=None):
    # Code here
    # Either this method or __ua_convert__ must
    # return NotImplemented for unsupported types,
    # or they shouldn't be marked as dispatchable.
    ...

# Provides a default implementation for ones and zeros.
@implements(unp.full)
def full(shape, fill_value, dtype=None, order='C'):
    # Code here
    ...
```
The current alternative to this problem is a combination of NEP-18, NEP-13 and NEP-30, plus adding more protocols (not yet specified) in addition to them. Even then, some parts of the NumPy API will remain non-overridable, so it's only a partial alternative.
The main alternative to vendoring `unumpy` is to move it into NumPy completely and not distribute it as a separate package. This would also achieve the proposed goals; however, we prefer to keep it a separate package for now, for the reasons already stated above.

The third alternative is to move `unumpy` into the NumPy organisation and develop it as a NumPy project. This would also achieve the stated goals, and is a possibility that can be considered by this NEP. However, the act of doing an extra `pip install` or `conda install` may discourage some users from adopting this method.
An alternative to requiring opt-in is to not override `np.asarray` and `np.array`, making the rest of the NumPy API surface overridable, and instead providing `np.duckarray` and `np.asduckarray` as duck-array-friendly alternatives that use the respective overrides. However, this has the downside of adding a minor overhead to NumPy calls.
- `uarray` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
- The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
- NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
- Dask issue #4462: dask/dask#4462
- PR #13046: numpy#13046
- Dask issue #4883: dask/dask#4883
- Issue #13831: numpy#13831
- Discussion PR 1: hameerabbasi#3
- Discussion PR 2: hameerabbasi#4
- Discussion PR 3: numpy#14389
This document has been placed in the public domain.
- NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
- NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
- NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
- NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
- uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
- unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
- Reply to "Adding to the non-dispatched implementation of NumPy methods": http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html
- Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html
- The epic dtype cleanup plan: numpy#2899