-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: Type annotations overhaul, part 1 #257
Conversation
a7a848e
to
807ce41
Compare
PSA: CI failing left and right is a transient artifact, will get back to reasonable after gh-250 is in and 1.11 is out. This is too large to sneak into 1.11 unless we want to delay it (I don't). So this PR is the first large feature of 1.12. |
Then we should temporarily remove |
Yes. I'll do this in gh-250 |
30ed6ef
to
607d9bb
Compare
DiscussionThe Array API documentation calls the types |
756a3fa
to
07e34bb
Compare
Great idea, would be best to coordinate with the spec indeed. Consortium meetings are weekly on Mondays (hint hint). |
Also would be best to coordinate with the array-api-typing initiative, cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay, typing! 🎉
Apart from the inline comments, there's also some generally applicable opportunities for improvement
- Using
TypeAlias
when defining things likeDType: TypeAlias = np.dtype[...]
, so that type-checkers know how to treat it - Avoid dynamic
__all__
construction, so that static type-checkers know what's exported - Since Python 3.9
typing.Tuple
is deprecated in favor of usingtuple
directly Optional
andUnion
are not needed when annotating things if you use__future__.annotations
, even if you use Python 3.9.- all
*args
and**kwargs
I've seen are missing annotations. If unsure what these should be, then you should use: object
for both (Any
is not needed in input positions like these). bool | int
simplifies toint
,int | float
simplifies tofloat
, andfloat | complex
simplifies tocomplex
. Shorter annotations helps to avoid type-checker error message spaghetti.
|
||
SupportsBufferProtocol = Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In NumPy, we've been using this in the stubs:
@runtime_checkable
class _Buffer(Protocol):
def __buffer__(self, flags: int, /) -> memoryview: ...
if sys.version_info >= (3, 12):
_SupportsBuffer: TypeAlias = _Buffer
else:
import array as _array
import mmap as _mmap
_SupportsBuffer: TypeAlias = (
_Buffer | bytes | bytearray | memoryview | _array.array[Any] | _mmap.mmap | NDArray[Any] | generic
)
https://github.com/numpy/numtype/blob/main/src/numpy-stubs/__init__.pyi#L469-L483
This very same code should also work in a .py
.
And in case you're not comfortable with the <3.12 workaround, you could in that case use Any
instead, so e.g.
@runtime_checkable
class _Buffer(Protocol):
def __buffer__(self, flags: int, /) -> memoryview: ...
if sys.version_info >= (3, 12):
_SupportsBuffer: TypeAlias = _Buffer
else:
_SupportsBuffer: TypeAlias = _Buffer | Any
Note that _Buffer | Any
is not the same as Any
:
The union
T | Any
is not reducible to a simpler form. It represents an unknown static type with lower boundT
. That is, it represents an unknown set of objects which may be as large as object, or as small asT
, but no smaller.
https://typing.readthedocs.io/en/latest/spec/concepts.html#union-types
And even before 3.12, _Buffer
will already work in the stdlib, because typeshed conveniently chose to lie about the existence of __buffer__
in the stdlib stubs (even though it's very type-unsafe).
But you shouldn't expect other stub packages to also lie about this: In the current numpy
stubs, for example, annotates it correctly: https://github.com/numpy/numpy/blob/e1e985cd0e2a8d40ec84fc6aed1ee98934baeea8/numpy/__init__.pyi#L2066-L2067
tldr; this type could be meaningfully narrowed using a Protocol
implementing __buffer__
, that will even help before Python 3.12 in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and don't forget about the : TypeAlias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming-wise, this is rather verbose, and in a typing context the term Protocol
already has a special meaning, that's from the "protocol" that's used in "buffer protocol".
For protocols with a single (dunder) method, I personally prefer the Can*
prefix over the Supports*
prefix. Not only because it's much shorter, but also because it's more meaningful.
From the optype docs:
optype.Can{}
types describe what can be done with it. For instance, anyCanAbs[T]
type can be used as argument to theabs()
builtin function with return typeT
. MostCan{}
implement a single special method, whose name directly matched that of the type.CanAbs
implements__abs__
,CanAdd
implements__add__
, etc.
So I propose to instead name this CanBuffer
, as I've done so myself with the equivalent optype.Buffer
protocol (docs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was a placeholder? Aren't they ultimately going to point SupportsBufferProtocol
to array-api-typing
?
I was going to suggest adding a comment that says that that group of annotations (Array
, DType
, etc.) are just placeholders until array-api-typing
exports the full definitions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was a placeholder? Aren't they ultimately going to point
SupportsBufferProtocol
toarray-api-typing
?
My point is that there's no need to wait, because it can already be done without much effort
array_api_compat/common/_aliases.py
Outdated
import inspect | ||
from typing import Any, NamedTuple, Optional, Sequence, Tuple, Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the __future__.annotations
import, you can already use A | B
instead of Union[A, B]
, and T | None
instead of Optional[T]
, even before 3.10.
So on Python 3.9, this is valid:
from __future__ import annotations
def ensure_str(x: str | bytes) -> str:
return x if isinstance(x, str) else x.decode()
Which can equivalently we written __future__.annotations
as
def ensure_str(x: "str | bytes") -> "str":
return x if isinstance(x, str) else x.decode()
Note that this will only work within annotations. In case of type aliases, you need to manually wrap the type expression as a string:
from __future__ import annotations
from typing import TypeAlias
# This will raise (and doesn't care about the future)
# Stringy: TypeAlias = str | bytes
# Take the future into your own hands (apply string theory)
Stringy: TypeAlias = "str | bytes"
The TypeAlias
is very mandatory here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Tuple
type should not be used; tuple
itself should be used instead (which is supported as generic type since 3.9).
x: Array, | ||
/, | ||
xp: Namespace, | ||
*, | ||
dtype: Optional[DType] = None, | ||
device: Optional[Device] = None, | ||
**kwargs, | ||
) -> Array: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When dtype
and device
are None
, then it's certain that the returned array is of the same type x
. This can be annotated by adding an additional generic overload:
# additional required imports
from typing import TypeVar, overload
# [...]
# this is the recommended naming scheme for (invariant)
# type parameters like this one, i.e. as `_{}T`
# TODO: add `bound=Array` once `Array` is defined (never set `bound` to `Any`)
_ArrayT = TypeVar("_ArrayT")
@overload
def empty_like(
x: _ArrayT,
/,
xp: Namespace,
*,
dtype: None = None,
device: None = None,
**kwargs: object,
) -> _ArrayT: ...
@overload
def empty_like(
x: Array,
/,
xp: Namespace,
*,
dtype: DType | None = None,
device: Device | None = None,
**kwargs: object,
) -> Array: ...
def empty_like(
x: Array,
/,
xp: Namespace,
*,
dtype: DType | None = None,
device: DType | None = None,
**kwargs: object, # don't forget about these ones
) -> Array:
# do the thing
This way, xp.empty_like(np.zeros(1, np.int8))
will result in a np.ndarray[tuple[int], np.dtype[np.int8]]
type, instead of Any
.
_check_device(xp, device) | ||
return xp.full(shape, fill_value, dtype=dtype, **kwargs) | ||
|
||
def full_like( | ||
x: ndarray, | ||
x: Array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment for empty_like
also applies here
SupportsBufferProtocol, | ||
], | ||
/, | ||
*, | ||
dtype: Optional[Dtype] = None, | ||
dtype: Optional[DType] = None, | ||
device: Optional[Device] = None, | ||
copy: "Optional[Union[bool, np._CopyMode]]" = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for quotes here; np._CopyMode
even exists at runtime.
A naive question. Ideally, we'd use the same annotations for array api stubs, array-api-compat wrappers and array-api-strict objects. Can we actually have this kind of reuse? If yes in principle, what are the blockers and what's missing? I suppose it doesn't matter all that much whether it matures here or in array-api-typing or somewhere else, as long as it does mature and gets reused. Wherever is more convenient, as long as we have some sort of a rough plan for the grand unification in the end. |
Stubs target a specific package. What your talking about sounds more similar to optype, which is a collection of re-usable protocols, aliases, and other typing utilities, that's heavy used by It's indeed that role that At the moment I'm busy rewriting the numpy stubs, and building an accompanying In the meantime, I don't see any need for holding off on writing |
Awesome, so ultimately most of these typing definitions will be lifted into array-api-typing, and this repo will point to it and depend on it? (And therefore, most of this PR will be deleted from this repo in the long run.) Sounds like a great plan to me. (Sorry if I'm so excited about this. Can't wait to have my Array API code be type checked!) |
Array-api-compat is still going to need annotations. But array-api-typing will make it easier to do this in a more accurate way.
Haha, it's good to see that I'm not the only one that thinks typing things is exciting. |
@jorenham, I really do appreciate your extensive review, but it goes much beyond the intended scope of my PR. Most of your comments look like great material for a follow-up. I must confess that I'm struggling to pick out which of your comments request changes within the original scope of the PR, e.g. where you believe that what the PR changes needs tweaking. |
Yea of course; most of my suggestions fall into the category of dotting the "i"'s. Specifically the stuff about overloads, the It's just that when I see some opportunity for improving annotations, I get enthusiastic, which sometimes leads to out-of-scope nitpicking or mostly off-topic rants. But I can see how that could be experienced as overwhelming 😅. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @crusaderky, @jorenham ! Please feel free to open issues for follow-ups if that would be useful to track ideas.
there is a merge conflict |
Please don't remove explicit bools. Python bools notwithstanding, Array API promotions (https://data-apis.org/array-api/2023.12/API_specification/type_promotion.html) do distinguish between One other thing I'd really like to see is the same notation (DType vs Dtype, Array or array or ndarray) here and in |
solved conflicts. |
I agree that explicit is better than implicit. But keep in mind that unions can have a big impact on type-checker performance when used to annotate parameters of overloaded functions. It also makes the error messages that type-checkers report longer, which makes typing errors more difficult to debug. If you're new to static typing in python, and you see Also, if you're going to be consistent about this, then you should do this for |
One last thing about import time
from typing import assert_type
def f[T1, T2](a: T1, b: T2) -> T1 | T2:
return a if time.time() % 2 else b
b: bool = bool("")
i: int = int("42")
assert_type(f(b, i), int) Here, mypy reports no errors, but
Playground links: |
How is this not a bug? I'm +1 in favour of compactness myself. FWIW, Eventually, I'd hope for |
In pyright there are no bugs; everything is "as designed" 🤭
Amen. Intersection types is probably the no. 1 or no. 2 on my python typing wishlist.
That's pretty much what I had in mind as well. And protocols are actually essential to building that, so no worries |
Well, implementation details aside, the semantics of bools and ints is clearly different, at least in numeric contexts. |
I agree. In fact, I agree so much that I created a workaround for this: A type that only accepts |
I think this PR is ready to be merged? |
Ruff failure looks real? |
It's looks similar to a bug in ruff 0.11.1, which has been fixed in 0.11.2 |
thanks again Guido |
if not endpoint: | ||
return torch.linspace(start, stop, num+1, dtype=dtype, device=device, **kwargs)[:-1] | ||
return torch.linspace(start, stop, num, dtype=dtype, device=device, **kwargs) | ||
|
||
# torch.full does not accept an int size | ||
# https://github.com/pytorch/pytorch/issues/70906 | ||
def full(shape: Union[int, Tuple[int, ...]], | ||
fill_value: Union[bool, int, float, complex], | ||
fill_value: complex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and similar changes are still incorrect. Please revert this and other instances of this @lucascolley @crusaderky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a matter of preference as they are equivalent. I absolutely agree that we should be as explicit as possible in docstrings, but I am yet to be convinced that the benefits of being explicit in type annotations outweigh the drawbacks.
I'll be working on the follow-up |
To explain further my motivation for merge here: I think the problems highlighted with implicit subtypes are a (serious) issue for everyone unfamiliar with the nuances of Python typing, but they are problems that need to be addressed at a wider level. I think it would be a big mistake to remove this explicitness from the spec, as that is the source of truth for these APIs to which people will look. On the other hand, those looking at type annotations for array-api-compat are likely to be those more familiar with these typing nuances. What we really need to make as clear as possible are the spec pages. I'm hoping that array-api-typing can provide concise, documented type aliases so that we can get rid of types like Until we get there, I don't see the harm in simplifying annotations in these auxiliary libraries, especially when changes are (however begrudgingly) consistent with the wider typing ecosystem. |
This alone is a smoking-gun sign that things should not be merged as is until there's a consensus, or at least a wider level attempt at reaching one. |
feel free to revert the merge if you would like to continue the discussion! |
I'd rather that you don't actually, as I'm quite far into the follow-up PR now |
If we're going to conform to the Python typing spec, then If we don't, then we're need to have an exceptionally good reason for doing so. But so far, I haven't heard any. |
Can we open a high-level discussion on |
Review all type annotations.
DType
andDtype
. I pickedDType
in accordance withnumpy.typing
.Array
,ndarray
, orarray
. I pickedArray
as it is more idiomatic.TYPE_CHECKING
guards, which were doing nothing but reduce readability (performance impact at runtime is absolutely inconsequential)py.typed
. Note that this will typically not work forarray_api_compat.<namespace>.
modules as they are typically pulled in byarray_namespace
, so for most people it is exclusively aboutarray_api_compat.common._helpers
.Out of scope
common._typing
public. This will be in a later PR.