Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: Type annotations overhaul, part 1 #257

Merged
merged 14 commits into from
Mar 22, 2025

Conversation

crusaderky
Copy link
Contributor

@crusaderky crusaderky commented Feb 27, 2025

Review all type annotations.

  • dtypes were called a mix of DType and Dtype. I picked DType in accordance with numpy.typing.
  • arrays were called a mix of Array, ndarray, or array. I picked Array as it is more idiomatic.
  • Added Namespace annotation
  • Removed many TYPE_CHECKING guards, which were doing nothing but reduce readability (performance impact at runtime is absolutely inconsequential)
  • Add py.typed. Note that this will typically not work for array_api_compat.<namespace>. modules as they are typically pulled in by array_namespace, so for most people it is exclusively about array_api_compat.common._helpers.

Out of scope

  • Add type checkers to CI
  • Make common._typing public. This will be in a later PR.

@crusaderky crusaderky marked this pull request as draft February 27, 2025 13:38
@ev-br
Copy link
Member

ev-br commented Feb 27, 2025

PSA: CI failing left and right is a transient artifact, will get back to reasonable after gh-250 is in and 1.11 is out.

This is too large to sneak into 1.11 unless we want to delay it (I don't). So this PR is the first large feature of 1.12.

@crusaderky
Copy link
Contributor Author

This is too large to sneak into 1.11 unless we want to delay it (I don't). So this PR is the first large feature of 1.12.

Then we should temporarily remove py.typed introduced in #254 CC @NeilGirdhar

@ev-br
Copy link
Member

ev-br commented Feb 27, 2025

Then we should temporarily remove py.typed

Yes. I'll do this in gh-250

@crusaderky crusaderky force-pushed the annotations branch 2 times, most recently from 30ed6ef to 607d9bb Compare February 27, 2025 14:09
@crusaderky
Copy link
Contributor Author

Discussion

The Array API documentation calls the types array, dtype, and device all lowercase.
I'm not a fan for obvious reasons. Should we change them here to be all lowercase, or should we propose changing them to uppercase in array-api?

@crusaderky crusaderky marked this pull request as ready for review February 27, 2025 14:58
@ev-br ev-br added this to the 1.12 milestone Feb 27, 2025
@crusaderky crusaderky force-pushed the annotations branch 3 times, most recently from 756a3fa to 07e34bb Compare February 27, 2025 17:00
@ev-br
Copy link
Member

ev-br commented Feb 28, 2025

The Array API documentation calls the types array, dtype, and device all lowercase. ... Should we change them here to be all lowercase, or should we propose changing them to uppercase in array-api?

Great idea, would be best to coordinate with the spec indeed. Consortium meetings are weekly on Mondays (hint hint).

@ev-br
Copy link
Member

ev-br commented Mar 2, 2025

Also would be best to coordinate with the array-api-typing initiative, cc
@jorenham @nstarman

Copy link

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, typing! 🎉

Apart from the inline comments, there's also some generally applicable opportunities for improvement

  • Using TypeAlias when defining things like DType: TypeAlias = np.dtype[...], so that type-checkers know how to treat it
  • Avoid dynamic __all__ construction, so that static type-checkers know what's exported
  • Since Python 3.9 typing.Tuple is deprecated in favor of using tuple directly
  • Optional and Union are not needed when annotating things if you use __future__.annotations, even if you use Python 3.9.
  • all *args and **kwargs I've seen are missing annotations. If unsure what these should be, then you should use : object for both (Any is not needed in input positions like these).
  • bool | int simplifies to int, int | float simplifies to float, and float | complex simplifies to complex. Shorter annotations helps to avoid type-checker error message spaghetti.


SupportsBufferProtocol = Any
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In NumPy, we've been using this in the stubs:

@runtime_checkable
class _Buffer(Protocol):
    def __buffer__(self, flags: int, /) -> memoryview: ...

if sys.version_info >= (3, 12):
    _SupportsBuffer: TypeAlias = _Buffer
else:
    import array as _array
    import mmap as _mmap

    _SupportsBuffer: TypeAlias = (
        _Buffer | bytes | bytearray | memoryview | _array.array[Any] | _mmap.mmap | NDArray[Any] | generic
    )

https://github.com/numpy/numtype/blob/main/src/numpy-stubs/__init__.pyi#L469-L483

This very same code should also work in a .py.

And in case you're not comfortable with the <3.12 workaround, you could in that case use Any instead, so e.g.

@runtime_checkable
class _Buffer(Protocol):
    def __buffer__(self, flags: int, /) -> memoryview: ...

if sys.version_info >= (3, 12):
    _SupportsBuffer: TypeAlias = _Buffer
else:
    _SupportsBuffer: TypeAlias = _Buffer | Any

Note that _Buffer | Any is not the same as Any:

The union T | Any is not reducible to a simpler form. It represents an unknown static type with lower bound T. That is, it represents an unknown set of objects which may be as large as object, or as small as T, but no smaller.

https://typing.readthedocs.io/en/latest/spec/concepts.html#union-types

And even before 3.12, _Buffer will already work in the stdlib, because typeshed conveniently chose to lie about the existence of __buffer__ in the stdlib stubs (even though it's very type-unsafe).
But you shouldn't expect other stub packages to also lie about this: In the current numpy stubs, for example, annotates it correctly: https://github.com/numpy/numpy/blob/e1e985cd0e2a8d40ec84fc6aed1ee98934baeea8/numpy/__init__.pyi#L2066-L2067


tldr; this type could be meaningfully narrowed using a Protocol implementing __buffer__, that will even help before Python 3.12 in some cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and don't forget about the : TypeAlias

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming-wise, this is rather verbose, and in a typing context the term Protocol already has a special meaning, that's from the "protocol" that's used in "buffer protocol".

For protocols with a single (dunder) method, I personally prefer the Can* prefix over the Supports* prefix. Not only because it's much shorter, but also because it's more meaningful.
From the optype docs:

optype.Can{} types describe what can be done with it. For instance, any CanAbs[T] type can be used as argument to the abs() builtin function with return type T. Most Can{} implement a single special method, whose name directly matched that of the type. CanAbs implements __abs__, CanAdd implements __add__, etc.

So I propose to instead name this CanBuffer, as I've done so myself with the equivalent optype.Buffer protocol (docs).

Copy link
Contributor

@NeilGirdhar NeilGirdhar Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was a placeholder? Aren't they ultimately going to point SupportsBufferProtocol to array-api-typing?

I was going to suggest adding a comment that says that that group of annotations (Array, DType, etc.) are just placeholders until array-api-typing exports the full definitions?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was a placeholder? Aren't they ultimately going to point SupportsBufferProtocol to array-api-typing?

My point is that there's no need to wait, because it can already be done without much effort

import inspect
from typing import Any, NamedTuple, Optional, Sequence, Tuple, Union
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the __future__.annotations import, you can already use A | B instead of Union[A, B], and T | None instead of Optional[T], even before 3.10.

So on Python 3.9, this is valid:

from __future__ import annotations

def ensure_str(x: str | bytes) -> str:
    return x if isinstance(x, str) else x.decode()

Which can equivalently we written __future__.annotations as

def ensure_str(x: "str | bytes") -> "str":
    return x if isinstance(x, str) else x.decode()

Note that this will only work within annotations. In case of type aliases, you need to manually wrap the type expression as a string:

from __future__ import annotations
from typing import TypeAlias

# This will raise (and doesn't care about the future)
# Stringy: TypeAlias = str | bytes 

# Take the future into your own hands (apply string theory)
Stringy: TypeAlias = "str | bytes"

The TypeAlias is very mandatory here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tuple type should not be used; tuple itself should be used instead (which is supported as generic type since 3.9).

Comment on lines +43 to +50
x: Array,
/,
xp: Namespace,
*,
dtype: Optional[DType] = None,
device: Optional[Device] = None,
**kwargs,
) -> Array:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dtype and device are None, then it's certain that the returned array is of the same type x. This can be annotated by adding an additional generic overload:

# additional required imports
from typing import TypeVar, overload

# [...]

# this is the recommended naming scheme for (invariant) 
# type parameters like this one, i.e. as `_{}T`
# TODO: add `bound=Array` once `Array` is defined (never set `bound` to `Any`)
_ArrayT = TypeVar("_ArrayT") 

@overload
def empty_like(
    x: _ArrayT,
    /,
    xp: Namespace,
    *,
    dtype: None = None, 
    device: None = None,
    **kwargs: object,
) -> _ArrayT: ...
@overload
def empty_like(
    x: Array,
    /,
    xp: Namespace,
    *,
    dtype: DType | None = None, 
    device: Device | None = None,
    **kwargs: object,
) -> Array: ...
def empty_like(
    x: Array,
    /,
    xp: Namespace,
    *,
    dtype: DType | None = None, 
    device: DType | None = None,
    **kwargs: object,  # don't forget about these ones
) -> Array:
    # do the thing

This way, xp.empty_like(np.zeros(1, np.int8)) will result in a np.ndarray[tuple[int], np.dtype[np.int8]] type, instead of Any.

_check_device(xp, device)
return xp.full(shape, fill_value, dtype=dtype, **kwargs)

def full_like(
x: ndarray,
x: Array,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment for empty_like also applies here

SupportsBufferProtocol,
],
/,
*,
dtype: Optional[Dtype] = None,
dtype: Optional[DType] = None,
device: Optional[Device] = None,
copy: "Optional[Union[bool, np._CopyMode]]" = None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for quotes here; np._CopyMode even exists at runtime.

@ev-br
Copy link
Member

ev-br commented Mar 2, 2025

A naive question. Ideally, we'd use the same annotations for array api stubs, array-api-compat wrappers and array-api-strict objects. Can we actually have this kind of reuse? If yes in principle, what are the blockers and what's missing?
Then, it probably means the typing stubs live somewhere not inline---similar to scipy-stubs?

I suppose it doesn't matter all that much whether it matures here or in array-api-typing or somewhere else, as long as it does mature and gets reused. Wherever is more convenient, as long as we have some sort of a rough plan for the grand unification in the end.

@jorenham
Copy link

jorenham commented Mar 2, 2025

A naive question. Ideally, we'd use the same annotations for array api stubs, array-api-compat wrappers and array-api-strict objects. Can we actually have this kind of reuse? If yes in principle, what are the blockers and what's missing? Then, it probably means the typing stubs live somewhere not inline---similar to scipy-stubs?

I suppose it doesn't matter all that much whether it matures here or in array-api-typing or somewhere else, as long as it does mature and gets reused. Wherever is more convenient, as long as we have some sort of a rough plan for the grand unification in the end.

Stubs target a specific package. What your talking about sounds more similar to optype, which is a collection of re-usable protocols, aliases, and other typing utilities, that's heavy used by scipy-stubs.

It's indeed that role that array-api-typing is intended to fulfill; a collection of types and typing-related utilities to help annotate array-api-related code.

At the moment I'm busy rewriting the numpy stubs, and building an accompanying optype-like (optional) typing package for NumPy. Once that's off the ground, then I plan to redirect my focus towards array-api-typing.

In the meantime, I don't see any need for holding off on writing Protocols yourself, because such a protocol might be provided by array-api-typing in the future. In fact, it will probably help speed up the development of array-api-typing that way, because that would give us a better idea of what works, and what doesn't.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 2, 2025

Awesome, so ultimately most of these typing definitions will be lifted into array-api-typing, and this repo will point to it and depend on it? (And therefore, most of this PR will be deleted from this repo in the long run.)

Sounds like a great plan to me. (Sorry if I'm so excited about this. Can't wait to have my Array API code be type checked!)

@jorenham
Copy link

jorenham commented Mar 2, 2025

Awesome, so ultimately most of these typing definitions will be lifted into array-api-typing, and this repo will point to it and depend on it? (And therefore, most of this PR will be deleted from this repo in the long run.)

Array-api-compat is still going to need annotations. But array-api-typing will make it easier to do this in a more accurate way.

Sounds like a great plan to me. (Sorry if I'm so excited about this. Can't wait to have my Array API code be type checked!)

Haha, it's good to see that I'm not the only one that thinks typing things is exciting.

@ev-br ev-br modified the milestones: 1.11.1, 1.12 Mar 4, 2025
@crusaderky
Copy link
Contributor Author

crusaderky commented Mar 4, 2025

@jorenham, I really do appreciate your extensive review, but it goes much beyond the intended scope of my PR. Most of your comments look like great material for a follow-up.
There are places where I added deliberately obsolete notation that is consistent with the notation style that is extensively used across the repo, specifically because I didn't want to update the whole repo in a single PR.

I must confess that I'm struggling to pick out which of your comments request changes within the original scope of the PR, e.g. where you believe that what the PR changes needs tweaking.

@crusaderky crusaderky changed the title ENH: Type annotations overhaul ENH: Type annotations overhaul, part 1 Mar 4, 2025
@jorenham
Copy link

jorenham commented Mar 4, 2025

@jorenham, I really do appreciate your extensive review, but it goes much beyond the intended scope of my PR. Most of your comments look like great material for a follow-up. There are places where I added deliberately obsolete notation that is consistent with the notation style that is extensively used across the repo, specifically because I didn't want to update the whole repo in a single PR.

I must confess that I'm struggling to pick out which of your comments request changes within the original scope of the PR, e.g. where you believe that what the PR changes needs tweaking.

Yea of course; most of my suggestions fall into the category of dotting the "i"'s. Specifically the stuff about overloads, the **kwargs: object and simplifications for float | int and the NestedSequence unions, are perfectly fine to ignore, and perhaps reconsider at a later point.

It's just that when I see some opportunity for improving annotations, I get enthusiastic, which sometimes leads to out-of-scope nitpicking or mostly off-topic rants. But I can see how that could be experienced as overwhelming 😅.

@ev-br ev-br added this to the 1.12 milestone Mar 20, 2025
Copy link
Member

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @crusaderky, @jorenham ! Please feel free to open issues for follow-ups if that would be useful to track ideas.

@lucascolley
Copy link
Member

there is a merge conflict

@ev-br
Copy link
Member

ev-br commented Mar 21, 2025

I can't find any mention that bool | int simplifies to int?

Please don't remove explicit bools. Python bools notwithstanding, Array API promotions (https://data-apis.org/array-api/2023.12/API_specification/type_promotion.html) do distinguish between bool and int.
So unless and until that changes in the spec, array-api-compat must treat them separately.

One other thing I'd really like to see is the same notation (DType vs Dtype, Array or array or ndarray) here and in array-api-strict. It's great that you made it uniform here @crusaderky, could we please have -strict follow.

@crusaderky
Copy link
Contributor Author

solved conflicts.
numpy 1.21 failure is unrelated to this PR.
ruff failure is an upstream regression: astral-sh/ruff#16878

@jorenham
Copy link

jorenham commented Mar 21, 2025

Please don't remove explicit bools.

I agree that explicit is better than implicit. But keep in mind that unions can have a big impact on type-checker performance when used to annotate parameters of overloaded functions. It also makes the error messages that type-checkers report longer, which makes typing errors more difficult to debug.

If you're new to static typing in python, and you see bool | int, it wouldn't be strange to think that annotating x: int would disallow bool (which is wrong).
In the same way, if a biology textbook aways says "animals and ducks" instead of just "animals", then you shouldn't be surprised that readers interpret that as "ducks aren't animals".

Also, if you're going to be consistent about this, then you should do this for complex as well. So that means writing it complex | float | int | bool. It's not difficult to imagine how that would make the code pretty unreadable, and how it would spaghettify the type-checker error messages.

@jorenham
Copy link

One last thing about bool | int: It can lead to unexpected typing errors, depending on which type-checker you use:

import time
from typing import assert_type

def f[T1, T2](a: T1, b: T2) -> T1 | T2:
    return a if time.time() % 2 else b

b: bool = bool("")
i: int = int("42")

assert_type(f(b, i), int)

Here, mypy reports no errors, but (based)?pyright reports

"assert_type" mismatch: expected "int" but received "bool | int"  (reportAssertTypeFailure)

Playground links:

@crusaderky
Copy link
Contributor Author

crusaderky commented Mar 21, 2025

Here, mypy reports no errors, but (based)?pyright reports

"assert_type" mismatch: expected "int" but received "bool | int"  (reportAssertTypeFailure)

How is this not a bug?

I'm +1 in favour of compactness myself. FWIW, typing is currently very prominently lacking "and" and "not" operators, so it's impossible to write complex & ~bool or complex and not bool.

Eventually, I'd hope for array-api-types to define Array as a generic that accepts bespoke types that mimic the isdtype argument, e.g. Array[Integral | Floating] (where Integral is a superclass of SignedInteger and UnsignedInteger, and Floating is a superclass of RealFloating and ComplexFloating).
How doubt that will integrate well with it being a Protocol though.

@jorenham
Copy link

Here, mypy reports no errors, but (based)?pyright reports

"assert_type" mismatch: expected "int" but received "bool | int"  (reportAssertTypeFailure)

How is this not a bug?

In pyright there are no bugs; everything is "as designed" 🤭

I'm +1 in favour of compactness myself. FWIW, typing is currently very prominently lacking "and" and "not" operators, so it's impossible to write complex & ~bool or complex and not bool.

Amen. Intersection types is probably the no. 1 or no. 2 on my python typing wishlist.

Eventually, I'd hope for array-api-types to define Array as a generic that accepts bespoke types that mimic the isdtype argument, e.g. Array[Integral | Floating] (where Integral is a superclass of SignedInteger and UnsignedInteger, and Floating is a superclass of RealFloating and ComplexFloating).
How doubt that will integrate well with it being a Protocol though.

That's pretty much what I had in mind as well. And protocols are actually essential to building that, so no worries

@ev-br
Copy link
Member

ev-br commented Mar 21, 2025

I agree that explicit is better than implicit. But keep in mind that unions can have a big impact on type-checker performance when used to annotate parameters of overloaded functions. It also makes the error messages that type-checkers report longer, which makes typing errors more difficult to debug.

If you're new to static typing in python, and you see bool | int, it wouldn't be strange to think that annotating x: int would disallow bool (which is wrong).

Well, implementation details aside, the semantics of bools and ints is clearly different, at least in numeric contexts.
If the typing community does not recognize it, maybe it takes somebody immersed in that community (hint hint) to advocate for fixing this. Array API promotion rules, https://data-apis.org/array-api/2023.12/API_specification/type_promotion.html, can be thought of a distillation of a couple of decades of collective experience.

@jorenham
Copy link

Well, implementation details aside, the semantics of bools and ints is clearly different, at least in numeric contexts.
If the typing community does not recognize it, maybe it takes somebody immersed in that community (hint hint) to advocate for fixing this.

I agree. In fact, I agree so much that I created a workaround for this: A type that only accepts int but rejects bool. And coincidentally, it's already part of optype, and is called JustInt. And as a cherry on top, it even has documentation).

@crusaderky
Copy link
Contributor Author

I think this PR is ready to be merged?

@lucascolley
Copy link
Member

Ruff failure looks real?

@jorenham
Copy link

jorenham commented Mar 22, 2025

Ruff failure looks real?

It's looks similar to a bug in ruff 0.11.1, which has been fixed in 0.11.2
astral-sh/ruff#16874

@lucascolley lucascolley changed the title ENH: Type annotations overhaul, part 1 TYP: Type annotations overhaul, part 1 Mar 22, 2025
@lucascolley lucascolley merged commit a5a1d8b into data-apis:main Mar 22, 2025
37 of 40 checks passed
@lucascolley
Copy link
Member

thanks again Guido

if not endpoint:
return torch.linspace(start, stop, num+1, dtype=dtype, device=device, **kwargs)[:-1]
return torch.linspace(start, stop, num, dtype=dtype, device=device, **kwargs)

# torch.full does not accept an int size
# https://github.com/pytorch/pytorch/issues/70906
def full(shape: Union[int, Tuple[int, ...]],
fill_value: Union[bool, int, float, complex],
fill_value: complex,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and similar changes are still incorrect. Please revert this and other instances of this @lucascolley @crusaderky

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a matter of preference as they are equivalent. I absolutely agree that we should be as explicit as possible in docstrings, but I am yet to be convinced that the benefits of being explicit in type annotations outweigh the drawbacks.

@jorenham
Copy link

I'll be working on the follow-up

@lucascolley
Copy link
Member

lucascolley commented Mar 22, 2025

To explain further my motivation for merge here: I think the problems highlighted with implicit subtypes are a (serious) issue for everyone unfamiliar with the nuances of Python typing, but they are problems that need to be addressed at a wider level. I think it would be a big mistake to remove this explicitness from the spec, as that is the source of truth for these APIs to which people will look. On the other hand, those looking at type annotations for array-api-compat are likely to be those more familiar with these typing nuances.

What we really need to make as clear as possible are the spec pages. I'm hoping that array-api-typing can provide concise, documented type aliases so that we can get rid of types like Union[bool, int, float, complex] from https://data-apis.org/array-api/draft/API_specification/generated/array_api.full.html. Aliases like Scalar and RealScalar and NumericScalar (which would be an alias for a union of JustInt, JustFloat and JustComplex) will make the spec read much better in my opinion.

Until we get there, I don't see the harm in simplifying annotations in these auxiliary libraries, especially when changes are (however begrudgingly) consistent with the wider typing ecosystem.

@ev-br
Copy link
Member

ev-br commented Mar 22, 2025

but they are problems that need to be addressed at a wider level.

This alone is a smoking-gun sign that things should not be merged as is until there's a consensus, or at least a wider level attempt at reaching one.

@lucascolley
Copy link
Member

feel free to revert the merge if you would like to continue the discussion!

@jorenham
Copy link

e if you would like to continue the discussion!

I'd rather that you don't actually, as I'm quite far into the follow-up PR now

@jorenham
Copy link

If we're going to conform to the Python typing spec, then int | bool simply isn't an option: https://typing.python.org/en/latest/reference/best_practices.html#types

If we don't, then we're need to have an exceptionally good reason for doing so. But so far, I haven't heard any.

@crusaderky crusaderky deleted the annotations branch March 23, 2025 18:52
@crusaderky
Copy link
Contributor Author

Can we open a high-level discussion on array-api/ and reach a final consensus there with the wider community, instead of arguing among the 4 of us on individual PRs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants