Unify `Type` with Python's type system #1207

markusschmaus · 2022-09-23T12:38:44Z

Ongoing work on #134

Here is my plan of action:

1. Turn Type and all its sub-classes into meta-classes, such that their instances become classes themselves, making shape and dimension read-only
2. Setup a proper class hierarchy for the dynamically created classes introducing generic abstract base classes, using abc for this could be useful, but is probably not strictly necessary.
3. Merge Variable and its sub-classes with these base classes, such that type(x) == x.type
4. Add additional type hints where appropriate

After each of these steps I want Aesara to be in a valid state with all tests passing.

Here are a few important guidelines and requirements to check before your PR can be merged:

There is an informative high-level description of the changes.
The description and/or commit message(s) references the relevant GitHub issue(s).
pre-commit is installed and set up.
The commit messages follow these guidelines.
The commits correspond to relevant logical changes, and there are no commits that fix changes introduced by other commits in the same branch/BR.
There are tests covering the changes introduced in the PR.

brandonwillard · 2022-09-24T00:02:51Z

I just rebased and pushed to this branch. In general, if you ever want to update a branch/PR in this repo, rebasing (e.g. onto upstream/main) is the way to go.

brandonwillard · 2022-09-24T00:34:25Z

This initial commit appears to locally expose a Mypy issue with an if sys.platform != "win32": statement and the (platform-dependent) unreachable code that follows the implicit else branch. (I don't run Windows, so that condition is always True, I assume.)

markusschmaus · 2022-09-24T08:01:51Z

This initial commit appears to locally expose a Mypy issue with an if sys.platform != "win32": statement and the (platform-dependent) unreachable code that follows the implicit else branch. (I don't run Windows, so that condition is always True, I assume.)

I again committed with --no-verify. This branch will go through intermediate steps which are invalid in one or more ways (mypy errors, tests failing, downward incompatibility, ...)

markusschmaus · 2022-10-03T13:08:59Z

This branch still has major issues, but it's in a good enough place to have a conversation about it.

The key change in this branch so far was to make Type a subclass of type and to change the concrete types with defined parameters like shape and dtype from instances to sub-classes of their abstract parent. For the rest of the code this has the following consequences:

Since the concrete types are now sub-classes the former TensorType(shape=shape, dtype=dtype) no longer makes sense, so I changed it to TensorType.subtype(shape=shape, dtype=dtype)
There are many places in which isinstance(x, SomeType) is used to check for the type, but since concrete types are now sub-classes issubclass is more appropriate, this gets complicated since in some cases x may or may not be a type, the issubtype helper function fixes this. Identifying all the places where this change is necessary was the bulk of the work for the latest commit, though I probably haven't found all the places where this is necessary, and now that I have done the work I realize, that in its current form issubtype could probably replace isinstance in all places where it is used
In a few places type.__class__ is used to identify the abstract base of a concrete type, this is changed to type.base_type

brandonwillard · 2022-10-05T20:05:44Z

Since the concrete types are now sub-classes the former TensorType(shape=shape, dtype=dtype) no longer makes sense, so I changed it to TensorType.subtype(shape=shape, dtype=dtype)

There are many places in which isinstance(x, SomeType) is used to check for the type, but since concrete types are now sub-classes issubclass is more appropriate, this gets complicated since in some cases x may or may not be a type, the issubtype helper function fixes this. Identifying all the places where this change is necessary was the bulk of the work for the latest commit, though I probably haven't found all the places where this is necessary, and now that I have done the work I realize, that in its current form issubtype could probably replace isinstance in all places where it is used

It seems like we need to use class Variable(metaclass=Type) before these points can really be addressed, because, for instance, those checks ultimately need to become isinstance(some_var, SomeVariableSubclass) or something similar. Basically, Variable subclasses becomes the new Type instances.

This stuff starts to get pretty confusing right about here, especially when the Python/typing/Mypy restrictions get involved, so I might not be characterizing this correctly, but I do know that—as a result of these changes—we want to start using Variables where we previously were using Types.

Regarding the first point, it would be great if we could obtain (cached) subclasses via a Generics-like __getitem__ interface (e.g. Variable[dtype, static_shape]). In other words, if we can use/co-opt the type constructor interface for generics (e.g. λω with λP "emulation" for static shapes), there would be less syntactic "overhead".

markusschmaus · 2022-10-06T15:35:23Z

Regarding the first point, it would be great if we could obtain (cached) subclasses via a Generics-like __getitem__ interface (e.g. Variable[dtype, static_shape]). In other words, if we can use/co-opt the type constructor interface for generics (e.g. λω with λP "emulation" for static shapes), there would be less syntactic "overhead".

in Variable[dtype, static_shape] mypy will always give an error error: Type expected within [...] unless both dtype and static_shape are Types, (2, 3) is not a Type, but tuple[Literal[2], Literal[3]] is, similarly I couldn't get np.float32 to work, but only Type[np.float32]. It also needs to be static, so typle[Literal[someshape[0]], Literal[someshape[1]]] also always results in a mypy error. With these limitations I was able to update my demo and implement __getitem__.

https://github.com/markusschmaus/Documents/blob/main/aesara_typing_demo.py

But given these limitations, I think there needs to be a more flexible subtype method, which probably will be the standard way to construct a subtype.

It seems like we need to use class Variable(metaclass=Type) before these points can really be addressed, because, for instance, those checks ultimately need to become isinstance(some_var, SomeVariableSubclass) or something similar. Basically, Variable subclasses becomes the new Type instances.

This stuff starts to get pretty confusing right about here, especially when the Python/typing/Mypy restrictions get involved, so I might not be characterizing this correctly, but I do know that—as a result of these changes—we want to start using Variables where we previously were using Types.

This isinstance(some_var, SomeVariableSubclass) works fine. The issue is testing whether SomeVariableSubclass is a Variable. This can be done using issubclass(SomeVariableSubclass, Variable. But then there are places in the code looking like isinstance(x, Variable), where x may be SomeVariableSublcass, but also an object which isn't a type, for which issubclass(x, Variable) throws an error. And in some places there is isinstance(x, (Variable, OtherClass)). The function issubtype(x, SomeClass) solves these two cases by checking for subclass if x is a type and for isinstance otherwise. As I'm writing this, I realize that another option would be to check isinstance(SomeVariableSubclass, VariableMeta), do you think this would be the better idea?

This should prevent errors when the input is already a Numba scalar, and it will use Numba's type information to selectively apply the scalar conversion.

Using the `auto_name` values will result in cache misses when caching is based on the generated source code, so we're not going to use it.

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>

markusschmaus · 2022-10-10T23:29:22Z

I asked about Type.find_inplace in #1248

markusschmaus · 2022-10-12T11:16:21Z

I'm looking at the class hierarchy of Type and Variable and it looks quite convoluted. So I am trying to summarize it here as I understand it.

(1) There are two Types which can't hold any value (DisconnectedType, NullType), for these types, there is only one kind of Variable and the following code would make perfect sense:

var = DisconnectedType()  # maybe renamed to DisconnectedVar()
assert type(var) == DisconnectedType
assert var.type == DisconnectedType

for_reasons = NullType.subtype(why_null="For reasons")
var2 = for_reasons()
assert type(var2) == for_reasons
assert var2.type == for_reasons

(2) While the two kinds of random types (RandomStateType, RandomGeneratorType) can hold a value, there is also only one kind of variable that is reasonable, (3) similarly two types can only be constant (CDataType, SliceType), so again:

var = RandomStateType()  # maybe renamed to RandomStateVar()
assert type(var) == RandomStateType
assert var.type == RandomStateType

(4) The more interesting case is that of types like TensorType which first can hold a value and there are at least four flavors of variables of this type: Variable which represents a local value which can change, SharedVariable which represents a global value which can change, Constant which represents a constant value, and NominalVariable which, as far as I understand, represents a variable of that type (with shape and dtype) without the ability to actually hold a specific value. So TensorType.subtype(shape=shape, dtype=dtype) isn't actually specific enough to be the actual python type of a variable. Here is the interface that I would suggest:

shape_dtype_spec = TensorType.subtype(shape=shape, dtype=dtype)
shape_dtype_const = shape_dtype_spec.subtype(var="constant")
shape_dtype_const2 = TensorType.subtype(shape=shape, dtype=dtype, var="constant")

assert shape_dtype_const is shape_dtype_const2

const = shape_dtype_const(value)

assert const.type == shape_dtype_const
assert type(const) == shape_dtype_const

assert isinstance(const, TensorType)
assert isinstance(const, shape_dtype_spec)
assert isinstance(const, shape_dtype_const)

assert isinstance(const, Constant)

shape_dtype_shared = TensorType.subtype(shape=shape, dtype=dtype, var="shared")

shared = shape_dtype_shared()

assert isinstance(shared, TensorType)
assert isinstance(shared, SharedVariable)

SparseTensorType and ScalarType also belong into this category, though I don't fully understand why there are two different classes named ScalarSharedVariable.

This leaves EnumType, EnumList, CEnumType, ParamsType, and TypedListType which I believe fall into category (3), but I am less certain about that.

@brandonwillard, is my understanding correct? How does the suggested interface look like to you? Am I missing something? Should the interface be different?

brandonwillard · 2022-10-14T23:05:09Z

@brandonwillard, is my understanding correct? How does the suggested interface look like to you? Am I missing something? Should the interface be different?

Yeah, that all sounds consistent enough to use.

Here's something else to keep in mind: constants don't need to be a Variable type like Constant. We could make it so that constants are represented by Apply nodes with Ops that take no inputs and always return the same values. This representation is considerably more consistent with the idea that all Variables are the output of some Apply node, and it matches the way in which constant terms are defined in mathematical and logical contexts (i.e. as nullary functions).

markusschmaus · 2022-10-16T15:20:41Z

Yeah, that all sounds consistent enough to use.

Here's something else to keep in mind: constants don't need to be a Variable type like Constant. We could make it so that constants are represented by Apply nodes with Ops that takes no inputs and always return the same values. This representation is considerably more consistent with the idea that all Variables are the output of some Apply node, and it matches the way in which constant terms are defined in mathematical and logical contexts (i.e. as nullary functions).

Would this make anything here easier? There would still be Variable, SharedVariable, as well as NominalVariable.

brandonwillard · 2022-10-19T01:02:53Z

Would this make anything here easier?

It could, if only because the way that Variable is currently typed is already problematic (for Mypy, at least), and it's due to the fact that Variable.owner can also be None. If you start making changes to Variable and notice some seemingly unrelated Mypy issues involving OptionalApplyType, it could be due to this questionable design choice.

The Mypy issues arise whenever we've attempted to take a parametric approach other than the current one that uses OptionalApplyType (e.g. using Optional[Apply[...]] instead), and also when attempting to use the parameterized information carried by the Apply types represent by OptionalApplyType (when it's non-None, of course). There may be some other ways around these specific Mypy issues, but the underlying complication is that the parameter currently represented by OptionalApplyType can be two distinct types, and that's forced by the design choice we're currently questioning.

There would still be Variable, SharedVariable, as well as NominalVariable.

I haven't thought through all of this yet, but—aside from keeping a few of those types around in some form for backward compatibility—we might be able to remove them. For instance, we wouldn't need a Constant type anymore, because all the defining type-level properties of Constant (e.g. Constant.data) could be moved to the Ops (i.e. the nullary functions that return/represent constant values).

NominalVariables could be represented by the same Variable singletons that they currently are, but with Ops taking the role of the id values. Likewise, SharedVariables could be represented by Ops that carry the underlying shared value objects and can be updated.

The real question is how we would like to distinguish between "input" Variables (i.e. Variables having Apply nodes with no inputs and that aren't to be evaluated in any way) and "constant" Variables (i.e. Variables with no inputs that can be evaluated). In many/most/all cases where the input/constant distinction is important, we will have an explicit collection telling us which Variables are inputs, and, in these cases, the question isn't very relevant, because we can always check for membership in said collection. In all other cases, we could use the type of the Op in the Variable's Apply node to distinguish between the two.

For instance, the Variable output of Apply(NullOp(), []) could represent a non-constant "input". (Note that I've removed the superfluous outputs argument of Apply's constructor for simplicity. We're likely going to remove that argument soon, and we would need/want to do that in this Variable.owner != None scenario anyway.) Constant Variables could have their own Op type(s) as well: Apply(ConstantOp(value), []).

Anyway, all this is only broadly related, and hopefully not necessary in any way. Regardless, it might be worth knowing in case there is a concrete way that these changes could help here.

brandonwillard · 2022-10-19T01:21:29Z

aesara/graph/type.py

@@ -35,7 +49,90 @@ class Type(MetaObject, Generic[D]):
    The `Type` that will be created by a call to `Type.make_constant`.
    """

-    def in_same_class(self, otype: "Type") -> Optional[bool]:
+    _prop_names: tuple[str, ...] = tuple()
+    _subclass_cache: dict[Any, "NewTypeMeta"] = dict()


While not necessarily important right now, we might want to use weak reference dictionaries here; especially when/if we start constructing distinct type/class objects for each Literal shape value.

brandonwillard

The NewTypeMeta approach is very interesting, and, as far as I can tell, looks quite promising! It seems like I'll need to run the test suite locally in order to really understand some of the details and implications of these changes (e.g. what's actually working, what's not, and why), though.

Otherwise, there appear to be some merge commits in this branch, so it will need to be rebased. That would also help make it easier to follow the changes via their distinct commits. Looking at the cumulative diffs alone gets too difficult, especially as the number of changes grows.

Also, at some point soon, we'll need to get the tests passing, at least up to the last commit. That way, reviewers can get some concrete understanding of what works, what doesn't, and what needs to be done via the failures in CI.

brandonwillard · 2022-10-19T01:52:34Z

in Variable[dtype, static_shape] mypy will always give an error error: Type expected within [...] unless both dtype and static_shape are Types, (2, 3) is not a Type, but tuple[Literal[2], Literal[3]] is, similarly I couldn't get np.float32 to work, but only Type[np.float32]. It also needs to be static, so typle[Literal[someshape[0]], Literal[someshape[1]]] also always results in a mypy error. With these limitations I was able to update my demo and implement __getitem__.

https://github.com/markusschmaus/Documents/blob/main/aesara_typing_demo.py

By the way, that demo is a great illustration, and the Literal shape approach is exactly where we need to take this.

FYI: That file would make for a good Gist use-case.

markusschmaus mentioned this pull request Sep 23, 2022

Use Python's typing instead of attaching Type objects to Variables #134

Open

brandonwillard marked this pull request as draft September 23, 2022 15:14

brandonwillard linked an issue Sep 23, 2022 that may be closed by this pull request

Use Python's typing instead of attaching Type objects to Variables #134

Open

brandonwillard added important refactor This issue involves refactoring graph objects typing Work related to type checking labels Sep 23, 2022

brandonwillard changed the title ~~Unify Type with Python type system~~ Unify Type with Python's type system Sep 23, 2022

brandonwillard force-pushed the python_types branch from 4e5add1 to d51271f Compare September 24, 2022 00:01

brandonwillard force-pushed the python_types branch from d51271f to 94c6a00 Compare September 24, 2022 00:03

brandonwillard mentioned this pull request Sep 29, 2022

Implement at.eye using existing Ops #1217

Open

1 task

markusschmaus added 3 commits September 29, 2022 16:10

Turn HasDataType and HasShape into Protocols

02f1435

Prepare switch from class to metaclass for Type

0929c9d

Convert __init__s to only accept keyword arguments

b1d4098

brandonwillard force-pushed the python_types branch from 457ebf4 to b1d4098 Compare September 29, 2022 21:10

brandonwillard and others added 4 commits October 3, 2022 15:08

Stop renaming variables in Scan's merge rewrite

c99a120

Allow shared RandomState/Generator updates in JAX compiled functions

600a118

Fix _debugprint's handling of empty profile data

3125091

Make Type a subclass of type

bc2f292

brandonwillard added 5 commits October 10, 2022 19:10

Make compilation modes configurable in compare_numba_and_py

0813e34

Enable Numba bounds checking during testing

21c46ac

Use to_scalar in numba_funcify_ScalarFromTensor

47ecc5f

This should prevent errors when the input is already a Numba scalar, and it will use Numba's type information to selectively apply the scalar conversion.

Allow FunctionGraph arguments in create_numba_signature

ff1ad55

Stop using auto_name for transpilation

26bcc9b

Using the `auto_name` values will result in cache misses when caching is based on the generated source code, so we're not going to use it.

brandonwillard and others added 9 commits October 10, 2022 19:10

Do not overwrite arguments in Numba's Scan implementation

8e67e04

Fix storage handling in numba_funcify_Scan

e7b8e9b

Make fgraph_to_python process constant FunctionGraph outputs correctly

9d07ce9

Add gufunc signature to RandomVariable\'s docstrings

a25afa3

Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>

Update pre-commit hooks

6302849

Fix make_numba_random_fn RandomStateType check

6198d7a

Support updates and more input types in compare_numba_and_py

2799bad

Add support for shared inputs in numba_funcify_Scan

41eeeb6

change issubtype to isinstance on metaclass

4ef874c

brandonwillard mentioned this pull request Oct 11, 2022

Remove Type.filter_inplace #1250

Open

brandonwillard mentioned this pull request Oct 16, 2022

Remove Type.filter_inplace #1261

Closed

brandonwillard reviewed Oct 19, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify `Type` with Python's type system #1207

Unify `Type` with Python's type system #1207

markusschmaus commented Sep 23, 2022 •

edited

Loading

brandonwillard commented Sep 24, 2022

brandonwillard commented Sep 24, 2022 •

edited

Loading

markusschmaus commented Sep 24, 2022

markusschmaus commented Oct 3, 2022

brandonwillard commented Oct 5, 2022

markusschmaus commented Oct 6, 2022

markusschmaus commented Oct 10, 2022

markusschmaus commented Oct 12, 2022

brandonwillard commented Oct 14, 2022 •

edited

Loading

markusschmaus commented Oct 16, 2022

brandonwillard commented Oct 19, 2022 •

edited

Loading

brandonwillard Oct 19, 2022

brandonwillard left a comment •

edited

Loading

brandonwillard commented Oct 19, 2022 •

edited

Loading

Unify Type with Python's type system #1207

Are you sure you want to change the base?

Unify Type with Python's type system #1207

Conversation

markusschmaus commented Sep 23, 2022 • edited Loading

brandonwillard commented Sep 24, 2022

brandonwillard commented Sep 24, 2022 • edited Loading

markusschmaus commented Sep 24, 2022

markusschmaus commented Oct 3, 2022

brandonwillard commented Oct 5, 2022

markusschmaus commented Oct 6, 2022

markusschmaus commented Oct 10, 2022

markusschmaus commented Oct 12, 2022

brandonwillard commented Oct 14, 2022 • edited Loading

markusschmaus commented Oct 16, 2022

brandonwillard commented Oct 19, 2022 • edited Loading

brandonwillard Oct 19, 2022

Choose a reason for hiding this comment

brandonwillard left a comment • edited Loading

Choose a reason for hiding this comment

brandonwillard commented Oct 19, 2022 • edited Loading

Unify `Type` with Python's type system #1207

Unify `Type` with Python's type system #1207

markusschmaus commented Sep 23, 2022 •

edited

Loading

brandonwillard commented Sep 24, 2022 •

edited

Loading

brandonwillard commented Oct 14, 2022 •

edited

Loading

brandonwillard commented Oct 19, 2022 •

edited

Loading

brandonwillard left a comment •

edited

Loading

brandonwillard commented Oct 19, 2022 •

edited

Loading