-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an Array Protocol & improve static typing support #589
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your work on this :)
The Array
protocl stuff looks grand. And personally I like having a DType
class just for consistency, even if practically type hinting an object has __eq__
isn't useful.
On namespace type hints (e.g. _CreatoinFuncs
, ArrayAPINamespace
): IMO I'm not sure on this yet, as they make contributing to the spec less ergonomic given we repeat the signatures, although they are pretty nifty to have. Is there a world where these could be dynamically construct them from the stubs instead? We might want to put these in a follow-up PR if just to ship the concretely non-controversial stuff first—interested in what others think.
|
||
@property | ||
def ndim(self: array) -> int: | ||
def ndim(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I see methods/properties which don't return arrays don't get any type hint for their self
arg—could these be hinted with Array
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static type checkers don't like this because the TypeVar
is not properly bound — see https://peps.python.org/pep-0484/#scoping-rules-for-type-variables ("Unbound type variables should not appear in the bodies of generic functions, or in the class bodies apart from method definitions:").
In general, there's no need to type hint the self arg unless the return type depends on the type of self — e.g. see https://peps.python.org/pep-0484/#user-defined-generic-types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nice thing about using Protocol is that self
is understood by static type checkers to be the Protocol
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this only done for certain methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it for any methods I looked at over the course of the PR. This is by no means comprehensive and should be done: here or elsewhere.
I was thinking the same thing last evening about moving this to a followup PR.
Maybe? I haven't experimented yet but perhaps mypy will be satisfied with: class ArrayAPINamespace(Protocol):
zeros_like = staticmethod(creation_funcs.zeros_like) Also for a future PR: class Array(Protocol):
def __array_namespace__(self, ...) -> ArrayAPINamespace[Self]: ...
class ArrayAPINamespace(Protocol[Array]):
zeros_like = staticmethod(zeros_like)
class zeros_like(Protocol[Array]):
def __call__(self, x: Array, /, *, dtype: Optional[dtype] = None, device: Optional[device] = None) -> Array: ... The advantage of this is that functions can be checked to see if their signature is compatible with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nstarman, and apologies for the delay in reviewing (your PR came in right at the start of my 2 week holiday). This is looking quite good. I pushed a couple of small commits to resolve some Mypy and Sphinx complaints.
This is close to ready I'd say. One thing that isn't nice is that Self
now shows up instead of array
in the signatures of the html docs. I tried to fiddle with autodoc_type_aliases
in src/_array_api_conf.py
, but was not successful in mapping that back to array
. Using "self" will be pretty confusing in docs for regular methods that return arrays. I think we'll have to figure that one out.
The DType
protocol is fine with me - it won't add much to type-checking, but it should be fine to have it here for consistency.
The Array
protocol looks good.
@BvB93 if you have any feedback on this PR, that would be great to see. |
@nstarman there are 3 errors left when running |
If the
That sounds like a good idea. I can try adding to the CI and create an ignore file that can later be whittled down as the package is made more type friendly. |
That works. It's not clickable - there is no array object in the standard on purpose, because in the end we don't care whether it's named |
c4d977e
to
9d50eb5
Compare
@rgommers Sorry for the long interlude. I can't seem to resolve 3 linking errors in the docs https://app.circleci.com/pipelines/github/data-apis/array-api/1088/workflows/ed2dfa75-c3c7-4845-8f28-e01fe35d0c8d/jobs/996?invite=true#step-103-1646 |
Sphinx is trying to create links for the type hints. I think to fix that you need to update the list(s) here https://github.com/data-apis/array-api/blob/main/src/_array_api_conf.py#L52-L77 |
I can squash commits if you don't use Squash & Merge (assuming this is approved, of course :) ). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
|
||
@property | ||
def shape(self: array) -> Tuple[Optional[int], ...]: | ||
def shape(self) -> tuple[int | None, ...]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to say more for myself than anyone else—I initially wasn't too sure if we wanted to use the py3.9+ type hint features, but:
- Forgot that
from future import __annotations__
makes this work for py3.8 - py3.8 doesn't get updates end of next year
- The scientific ecosystem already is adopting py3.9+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good on my end!
I think pre-commit
is a good addition (would love to add hooks for other spec things myself), but probably not too useful until we get a workflow that runs it. That said, I'm not too sure if we'd want to do that just yet 🤔
|
Thanks @honno for the reviews and discussion. |
Any reason to change |
|
||
@property | ||
def device(self: array) -> Device: | ||
def device(self) -> "Device": # type: ignore[type-var] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def device(self) -> "Device": # type: ignore[type-var] | |
def device(self) -> Any: |
Type variables (such as Device
here) don't really make sense when it's only used as return type without using them in either a generic class or another parameter within the function; this is the source of mypy's error here and why you need the type: ignore
; there must always be at least two of them.
Instead, unless you're willing to make the protocol itself generic w.r.t. the device (which might be a bit excessive here), then it's customary to use Any
. I'm not sure if this Device
type is used anywhere else, but if not you could even consider just repurposing it as an Any
alias in order to keep its more descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless you're willing to make the protocol itself generic w.r.t. the device (which might be a bit excessive here),
This is planned, but for a followup PR. (See opening comment to this PR).
then it's customary to use Any
Alternatively, this PR introduces a Device Protocol which can be used instead of the TypeVar
. No binding issues because it's a Protocol. When Array is made generic wrt Device we can change this back to a TypeVar
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, this PR introduces a Device Protocol which can be used instead of the TypeVar. No binding issues because it's a Protocol. When Array is made generic wrt Device we can change this back to a TypeVar.
I guess that would be sort of possible? Rereading the Device Support docs concrete API implementations have a lot of freedom regarding the implementation of their Device type, to the point where only __eq__
(and __ne__
) really matter? Honestly, with an interface that minimal it might just be worthwhile to stick to plain object
rather than Any
or a protocol, as there's very little (or nothing) to gain by usage of the latter two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, currently a Device
Protocol has only __eq__
in it and as such is effectively equivalent to object
.
But it's designing with the future in mind ("The choice to not include a standardized Device object may be revisited in a future revision of this standard.") and we'd need to alias object
anyway to get the correct rendering on the docs...
@@ -39,6 +43,7 @@ def device(self: array) -> Device: | |||
out: device | |||
a ``device`` object (see :ref:`device-support`). | |||
""" | |||
... | |||
|
|||
@property | |||
def mT(self: array) -> array: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a heads up: the PEP 673 Self
type (a special type variable that's always bound to the respective classes self
parameter) is now also supported by mypy and the likes, so you could, if so desired, express this as:
if TYPE_CHECKING:
from typing_extensions import Self
class Array(Protocol):
def mT(self) -> Self:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a note already
array = TypeVar("array", bound="Array")
# NOTE: when working with py3.11+ this can be ``typing.Self``.
But I'm happy to change it now. I was avoiding adding additional dependencies, even if they aren't runtime and
are made by the core python devs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way is perfectly fine in my opinion; at the end of the day it's just a nice bit of syntactic sugar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's tackle this in a followup, since Self
will need to be rendered in the docs as Array
.
@@ -272,7 +272,7 @@ def matrix_norm( | |||
/, | |||
*, | |||
keepdims: bool = False, | |||
ord: Optional[Union[int, float, Literal[inf, -inf, "fro", "nuc"]]] = "fro", | |||
ord: Optional[Union[int, float, Literal[inf, -inf, "fro", "nuc"]]] = "fro", # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there's currently no way to express literal floats in the type system, hence the need for the # type: ignore
comment. Could maybe consider adding two Inf
and NInf
types (or something along those lines) and declare them as float
aliases?
Inf = float
NInf = float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's annoying Python can't do Literal[<float>]
. I considered doing this, but I think I prefer not to change the spec in this PR. Changing to alias values will require additional docs and explanation since Inf = float
is not the same as math.inf
, and the latter is part of the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing to alias values will require additional docs and explanation since Inf = float is not the same as math.inf, and the latter is part of the spec.
True, though the float
already being present as a valid ord
parameter means that the inclusion of math.inf
is redundant either way (from a typing perspective at least), so to me this doesn't sound like such a change would violate the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the type ignores only to make my local mypy happier (mypy isn't part of the the CI on this repo).
I think making these changes should be done in a followup PR. I'm happy to make the changes in another PR.
|
||
def to_device( | ||
self: array, device: Device, /, *, stream: Optional[Union[int, Any]] = None | ||
self: array, device: "Device", /, *, stream: int | Any | None = None # type: ignore[type-var] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's been some minor discussion about this before (xref numpy/numpy#19083 (comment)), but considering this class is now becoming a fully fledged protocol and not just a documentation device the int
type in the stream
annotation is starting to become a bit problematic as it makes it mandatory for any (concrete) subtype to support it (while this is, in fact, optional).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good point. I'm trying to translate this class, not change the spec, so I think this is best done in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this would not be a spec violation as it'd merely remove an inconsistency between the type signature as implied by the annotations (..., stream: int | Any | None = ...)
versus the one described in docstring (..., stream: None = ...
).
As a little bit more background on the nitty gritty part of the world of typing that's going on here: Contrary to the return type of callables (which is covariant and concrete protocol implementations can thus return any sub-type), argument types are contravariant, meaning that concrete implementations will either have to exactly match it or return a supertype thereof (in the most extreme scenario that'd mean that even plain object
is acceptable, though whether or not that's an actual good practice is another discussion entirely...).
What this means is that in order to proper subtype something like def (stream: int | None = ...) -> object: ...
then concrete implementation must support both int
and None
, something that contracts the actual docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
869cfde
to
496c356
Compare
LGTM, though it might be good to verify if the currently available downstream implementations properly satisfy subtype the protocol as this could potentially reveal some typing-related corner cases that we've missed thus far. Could be something for a follow up though. |
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Sphinx doesn't set `TYPE_CHECKING`, but does use the type annotations. `Self` is unknown to Sphinx, so should be filtered out to prevent lots of errors.
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
Signed-off-by: nstarman <nstarman@users.noreply.github.com>
It would be great to see this released. It looks very developed to me, and would be useful for us within xarray. |
@@ -793,7 +793,7 @@ def vector_norm( | |||
*, | |||
axis: Optional[Union[int, Tuple[int, ...]]] = None, | |||
keepdims: bool = False, | |||
ord: Union[int, float, Literal[inf, -inf]] = 2, | |||
ord: Union[int, float, Literal[inf, -inf]] = 2, # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this is out of scope, but this Literal
is redundant (and as this change implies, also incorrect), since inf
and -inf
are both assignable to float
.
Continuing from #584, I've done a quick refactor of the array (and dtype) objects to make them typing Protocols.
@rgommers, I agree that making Array generic wrt the dtype is good for a followup.
It's not perfect, but mypy doesn't complain a whole lot, so I think this is a good start. I'm happy to make refinements.