Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Python's typing instead of attaching Type objects to Variables #134

Open
brandonwillard opened this issue Oct 28, 2020 · 19 comments · May be fixed by #1207
Open

Use Python's typing instead of attaching Type objects to Variables #134

brandonwillard opened this issue Oct 28, 2020 · 19 comments · May be fixed by #1207
Labels
graph objects help wanted Extra attention is needed important question Further information is requested refactor This issue involves refactoring typing Work related to type checking

Comments

@brandonwillard
Copy link
Member

brandonwillard commented Oct 28, 2020

Instead of creating "type" classes (i.e. Types) and manually attaching them to Aesara objects (i.e. via Variable.type), we should actual use Python's built-in type/class inheritance. I'm not immediately aware of any required functionality in the current Aesara type system that wouldn't allow this.

Here's an illustration of the current situation:

import aesara.tensor as at


x = at.vector()

We created a simple Aesara vector, x; now, let's inspect its (Python) types:

>>> type(x).mro()

[aesara.tensor.var.TensorVariable,
 aesara.tensor.var._tensor_py_operators,
 aesara.gof.graph.Variable,
 aesara.gof.graph.Node,
 aesara.gof.utils.object2,
 object]

Oddly, there's also a separate Variable.type field/member/property introduced by inheriting from Variable:

>>> x.type

TensorType(float64, vector)

>>> x.type.Variable

aesara.tensor.var.TensorVariable

>>> x.type.Constant

aesara.tensor.var.TensorConstant

As we can see, this extra Type object holds information about the variable's dtype and broadcastable dimensions (i.e. the vector means x.type.broadcastable equals [False]/[True]). From the latter two properties, it's also clear that the actual Python type (i.e. TensorVariable) is directly associated with the Aesara type (i.e. TensorType) as illustrated by the value of TensorType.Variable.

This leads to the question: why doesn't TensorVariable just inherit from its x.type type/class (i.e. some TensorType)?

If anything, the mechanics behind such inheritance might seem non-standard, as it could require some form of automatic type generation for sub-classes like TensorVariable, but—even so—that's pretty straightforward to do in a __new__ class method (e.g. we already do that in symbolic-pymc).

Regarding actual code changes, it seems like we would at least need to change all the Variable.type invocations with the standard type(...) calls, and/or more direct calls to isinstance(x, TensorType) instead of things like x.type == TensorType(...).

We could also keep Variable.type and have it return type(self) just to make the initial transition easy.

@brandonwillard brandonwillard added important refactor This issue involves refactoring help wanted Extra attention is needed labels Oct 28, 2020
@brandonwillard brandonwillard changed the title Use actual types for Theano types Inherit from Theano types Oct 28, 2020
@michaelosthege michaelosthege added this to the 2.0 milestone Dec 15, 2020
@brandonwillard brandonwillard removed this from the 2.0 milestone Feb 2, 2021
@brandonwillard brandonwillard added the question Further information is requested label Feb 2, 2021
@brandonwillard
Copy link
Member Author

@LegrandNico, since you're becoming Aesara's type expert, I would like to introduce you to this potentially revolutionary refactoring topic...

@brandonwillard brandonwillard changed the title Inherit from Theano types Use actual Python types instead of separate Type classes Mar 12, 2021
@brandonwillard brandonwillard changed the title Use actual Python types instead of separate Type classes Use actual Python types instead of attaching Type objects to Variables Mar 12, 2021
@brandonwillard brandonwillard changed the title Use actual Python types instead of attaching Type objects to Variables Use Python's inheritance instead of attaching Type objects to Variables Mar 12, 2021
@brandonwillard brandonwillard changed the title Use Python's inheritance instead of attaching Type objects to Variables Use Python's OOP typing instead of attaching Type objects to Variables Mar 12, 2021
@brandonwillard brandonwillard changed the title Use Python's OOP typing instead of attaching Type objects to Variables Use Python's OOP typing instead of attaching Type objects to Variables Dec 31, 2021
@brandonwillard
Copy link
Member Author

brandonwillard commented Apr 7, 2022

The issue fixed in #892 could possibly have been caught by Mypy's type checking if the approach described here was in place.

In that example, the problematic at.minimum(x, y)-like expression could've been noticed by Mypy, since at.minimum could be declared to have a (FloatLikeType, FloatLikeType) inputs signature, which would've failed when given the incompatible y: BoolLikeType that it was being given.

@brandonwillard brandonwillard changed the title Use Python's OOP typing instead of attaching Type objects to Variables Use Python types instead of attaching Type objects to Variables Apr 7, 2022
@brandonwillard brandonwillard changed the title Use Python types instead of attaching Type objects to Variables Use Python's typing instead of attaching Type objects to Variables Apr 7, 2022
@brandonwillard brandonwillard added the typing Work related to type checking label Apr 7, 2022
@brandonwillard
Copy link
Member Author

brandonwillard commented Apr 7, 2022

It looks like we would need to do something exactly like NumPy is currently doing with its ndarray typing: i.e. use typing.Generic parameterized on dtype and shape information.

Here's a simple example:

import numpy as np
import typing
import typing_inspect


DataType = typing.TypeVar("DataType")


class TensorType(typing.Generic[DataType]):

    @classmethod
    @property
    def dtype(cls) -> DataType:
        generic_bases = typing_inspect.get_generic_bases(cls)
        type_base = next((b for b in generic_bases if typing_inspect.get_origin(b) == TensorType))
        dtype = typing_inspect.get_args(type_base)[0]
        return dtype


Float32TensorType = TensorType[np.float32]


assert Float32TensorType == TensorType[np.float32]


class Float32Variable(Float32TensorType):
    pass


scalar_float_inst = Float32Variable()

assert scalar_float_inst.dtype is np.float32

assert isinstance(scalar_float_inst, TensorType)
assert isinstance(scalar_float_inst, Float32Variable)

Unfortunately, checks based on the parameterized TensorTypes won't work, but it might not matter given that checks like the preceding Float32Variable one work:

# Fails
assert isinstance(scalar_float_inst, Float32TensorType)

@dhirschfeld
Copy link

There might possibly be some useful typing examples in the array-api project:

@brandonwillard
Copy link
Member Author

brandonwillard commented Apr 8, 2022

To clarify, adding this kind of parameterized typing will allow type checking (i.e. Mypy) to catch issues involving incompatible dtypes and (static) shapes. Currently, these kinds of issues require explicit unit tests to catch, and that's quite a burden on our testing requirements.

Also, it should be possible to add this on top of the existing codebase without making the changes implied by this issue.

In other words, we don't need to make the change from using TensorType instances (i.e. where TensorType is a standard class and we create instance of that class for each specific dtype and static shape/ndim) to explicit TensorType types (i.e. where TensorType is a "base" type and "instances" are new type-compatible objects specific to each dtype and static shape/ndim).

@markusschmaus
Copy link

Based on @brandonwillard's code example above I developed a more complete example of how a TensorType based on Python's typing system could look like.

https://github.com/markusschmaus/Documents/blob/main/aesara_typing_demo.py

One key difference between his code and mine, is that @brandonwillard used inspection to extract the dtype information from the GenericAlias. Since python's typing system is still rapidly evolving, runtime behavior of GenericAlias has changed in the past (isinstance no longer works with GenericAlias), and typing_inspection is still marked experimental, I decided to use a different approach. Since, as he suggested, we need to dynamically generate sub-classes, we can store the relevant information in these sub-classes when they are created in a way which can be easily retrieved later.

The demo passes mypy without errors and contains a few asserts which also succeed.

I tried to get TensorType((3, 7), np.float64) to work without mypy errors, but I believe this is currently impossible using __call__ (see python/mypy#6721 (comment)).

Based on this demo I believe that implementing this suggestion is possible and I would be interested in working on this. However this would be a major refactoring and only makes sense if there is support from existing aesara developers.

One aspect I would like to explore, when working on this, is the role of Variables as part of the shape specification and how they interact with broadcasting, and how they could be leveraged for aesara to support dimensions (see #1013).

@brandonwillard
Copy link
Member Author

brandonwillard commented Sep 22, 2022

One key difference between his code and mine, is that @brandonwillard used inspection to extract the dtype information from the GenericAlias. Since python's typing system is still rapidly evolving, runtime behavior of GenericAlias has changed in the past (isinstance no longer works with GenericAlias), and typing_inspection is still marked experimental, I decided to use a different approach. Since, as he suggested, we need to dynamically generate sub-classes, we can store the relevant information in these sub-classes when they are created in a way which can be easily retrieved later.

Yes, that example was completely experimental; we definitely don't need to take an approach like that. As long as what we're currently calling Variables become instances of Types, and we can adequately track "static"/type-level information like dtype and (partial) shapes using those new Types, we're much better off.

Based on this demo I believe that implementing this suggestion is possible and I would be interested in working on this. However this would be a major refactoring and only makes sense if there is support from existing aesara developers.

I'm ready to make these changes sooner than later, so, if you want to work on this, we can start now. A draft PR is the next step.

One aspect I would like to explore, when working on this, is the role of Variables as part of the shape specification and how they interact with broadcasting, and how they could be leveraged for aesara to support dimensions (see #1013).

Be sure to read the comments here: #1013 (reply in thread). In other words, we shouldn't need anything at the type-level to support most kinds of dimension tracking. Regardless, these changes would help any related efforts a lot.

@markusschmaus

This comment was marked as off-topic.

@brandonwillard

This comment was marked as off-topic.

@brandonwillard
Copy link
Member Author

Before I forget, here's some related (and interesting) work on Python typing for shapes: https://peps.python.org/pep-0646/

@brandonwillard
Copy link
Member Author

Come to think of it, we could start using TypeVarTuple and Unpack from typing_extensions right now.

@markusschmaus

This comment was marked as off-topic.

@ricardoV94

This comment was marked as off-topic.

@markusschmaus

This comment was marked as off-topic.

@ricardoV94

This comment was marked as off-topic.

@markusschmaus

This comment was marked as off-topic.

@markusschmaus markusschmaus linked a pull request Sep 23, 2022 that will close this issue
10 tasks
@markusschmaus
Copy link

Here is the PR
#1207

@brandonwillard brandonwillard linked a pull request Sep 23, 2022 that will close this issue
10 tasks
@aseyboldt

This comment was marked as off-topic.

@brandonwillard

This comment was marked as off-topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
graph objects help wanted Extra attention is needed important question Further information is requested refactor This issue involves refactoring typing Work related to type checking
Projects
Status: Graph
Development

Successfully merging a pull request may close this issue.

6 participants