Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Implement "typing.Optional[]" and give it Python semantics ("not None" vs. "or None") #3883

Closed
scoder opened this issue Oct 18, 2020 · 33 comments

Comments

@scoder
Copy link
Contributor

scoder commented Oct 18, 2020

In Cython, def func(list arg) allows arg to be None. In Python's PEP-484 typing universe, def func(arg: list) excludes None and passing None requires the spelling def func(arg: Optional[list]).

Given that Optional[…] exists, we could change the type analysis of PEP-484 argument annotations to exclude None by default (i.e. interpret them as if not None was given), since this it is a common source of mistakes (see #2696). The use of Optional[] would then be interpreted as or None.

This would probably not apply to variable annotations. While PEP-526 doesn't seem to mention the topic anywhere explicitly, it seems hard to prevent variables from being set to None, and it has a runtime impact to prevent that case. Also, it does not seem very useful to actively prevent users from setting variables to None.

Also note that there is the issue of subtypes, which Python typing allows but Cython typing rejects. This difference would probably remain.

@scoder scoder added this to the 3.0 milestone Oct 18, 2020
@scoder scoder changed the title [ENH] Implement typing.Optional[] and give it Python semantics (not None vs. or None) [ENH] Implement "typing.Optional[]" and give it Python semantics ("not None" vs. "or None") Oct 18, 2020
@da-woods
Copy link
Contributor

Some of the problem here is identifying typing.Optional. It's fairly easy when you have an annotation of typing.Optional but as far as I can tell there isn't an easy mechanism to do it for from typing import Optional or import typing as not_typing; somevar: not_typing.Optional[int].

I'm trying to get this to work for #3400. If I get something satisfactory I'll try to make a PR for it on its own, because it'd probably be useful here too.

@0dminnimda
Copy link
Contributor

from typing import Optional or import typing as not_typing; somevar: not_typing.Optional[int].

@da-woods Should we allow this at all? Because it seems that the only real cases are

import typing
var: typing.Optional[...]
from typing import Optional
var: Optional[...]

And we can still just watch the var: x.Optional[...] and give warnings that "this is not something that we will take it as typing.Optional, could you mean typing.Optional?"

@da-woods
Copy link
Contributor

@da-woods Should we allow this at all?

I don't see why not. You have to track it anyway because users could name their own global variables typing or Optional and these shouldn't affect the type. Anyway, the code to allow it already exists and already works so why not allow it?

@0dminnimda
Copy link
Contributor

Anyway, the code to allow it already exists and already works so why not allow it?

I don't mind, I just suggested an easier way.

@matusvalo
Copy link
Contributor

@scoder this should be closed based on PR #3400 right?

@da-woods
Copy link
Contributor

@matusvalo yes I think it should

@scoder
Copy link
Contributor Author

scoder commented Jan 29, 2022

Is it complete, though? I also suggested to change the default semantics of func(x: str) to disallow None for x and require users to write func(x: Optional[str]). While I rejected #2696 as such, I'm not sure that this equally applies to type annotations. Providing them with full Python semantics, given that new code has to be written or old code modified in order to use them, seems tempting.

@scoder
Copy link
Contributor Author

scoder commented Jan 29, 2022

Ah, right. It seems that this was also implemented:

# inject type declaration from annotations
# this is called without 'env' by AdjustDefByDirectives transform before declaration analysis
if (self.annotation and env and env.directives['annotation_typing']
# CSimpleBaseTypeNode has a name attribute; CAnalysedBaseTypeNode
# (and maybe other options) doesn't
and getattr(self.base_type, "name", None) is None):
arg_type = self.inject_type_from_annotations(env)
if arg_type is not None:
base_type = arg_type
return self.declarator.analyse(base_type, env, nonempty=nonempty)
def inject_type_from_annotations(self, env):
annotation = self.annotation
if not annotation:
return None
base_type, arg_type = annotation.analyse_type_annotation(env, assigned_value=self.default)
if base_type is not None:
self.base_type = base_type
if arg_type and arg_type.python_type_constructor_name == "typing.Optional":
self.or_none = True
arg_type = arg_type.resolve()
if arg_type and arg_type.is_pyobject and not self.or_none:
self.not_none = True
return arg_type

@scoder
Copy link
Contributor Author

scoder commented Jan 29, 2022

One issue that I found is that (as before) this only applies to Python calls. cdef functions look the same but neither the function nor the caller check for None when it's disallowed. This is because previously, not None was not allowed on cdef function arguments. Now it's intended to be the default.

Should we make this fail on the caller side?

cdef func(x: list): ...

func(None)

It feels a bit mixed. Here, the developer is usually in charge of the calling code. User provided None values should be ruled out at this point. So there doesn't seem to be a strict requirement to change this into a runtime error.

What do you think?

@scoder
Copy link
Contributor Author

scoder commented Jan 29, 2022

Oh, and we probably also need some error tests for things like func(x: Optional[SomeStruct]) or other C types, right? That shouldn't be allowed.

@scoder scoder reopened this Jan 29, 2022
@da-woods
Copy link
Contributor

I'm not completely sure what the right thing to do is, but it should definitely be tested and documented!

Having different rules for cdef functions feels a bit obscure to me.

Optional[ctype] probably should be an error (as you say). For structs it obviously can't work. And I think people would find Optional[double] being interpretted as a PyObject to be confusing

@scoder
Copy link
Contributor Author

scoder commented Jan 29, 2022

One more tricky thing: func(x: Optional[double[:]]). This, on the other hand, should probably be allowed. Memory views definitely suffer from the same "didn't expect that to be None" gotcha.

scoder added a commit to scoder/cython that referenced this issue Jan 29, 2022
@scoder
Copy link
Contributor Author

scoder commented Jan 29, 2022

I have implemented Optional / or None semantics for func(x: list = None) in 13f8cd7.

@scoder
Copy link
Contributor Author

scoder commented Jan 29, 2022

I'm not completely sure what the right thing to do is, but it should definitely be tested and documented!

Yes.

Having different rules for cdef functions feels a bit obscure to me.

Sort-of, yes. I mean, cdef functions already have different semantics, but not when it comes to argument type behaviour.

Optional[ctype] probably should be an error (as you say).

Turns out, it already is, because or None is disallowed for non-Python types. I'll just improve the error message there.

For structs it obviously can't work. And I think people would find Optional[double] being interpreted as a PyObject to be confusing

It looks like Optional[float] and Optional[int] are rejected now, though. That shouldn't be the case. They should be resolved to the Python types instead of the C types, as should generally be the case for Python type annotations.

scoder added a commit that referenced this issue Jan 29, 2022
…d complaining about "or None", which wasn't the actual syntax construct on user side.

See #3883
@scoder
Copy link
Contributor Author

scoder commented Jan 30, 2022

It looks like Optional[float] and Optional[int] are rejected now, though.

Turns out that this is difficult to handle. The type analysis converts explicit type references like cython.int to c_int_type, and it does the same for int, because it lacks the context whether C types are expected here. Thus, Optional[int] (which is ok and 'obviously' refers to a Python int) and Optional[cython.int] (which should be rejected) look really the same and cannot easily be distinguished. I think we might have to reject C types with Optional[] at an earlier stage, so that the type declaration stages can rely on valid type declarations already (or at least don't have to second-guess what the type declaration looked like in actual code).

scoder added a commit that referenced this issue Jan 30, 2022
… "Optional[int]" etc. interprets "int" as (valid) Python int type and not (invalid) C int type.

See #3883
@scoder
Copy link
Contributor Author

scoder commented Jan 30, 2022

I looked into the test failures that I introduced in the master branch (sorry for that) and one of the issues is this test:

x: typing.Tuple[int, float] = (a[0], a[1])
y: Tuple[int, ...] = (1,2.)
z = a[0] # should infer to int

As you can see, it assumes that x: Tuple[int, float] refers to a tuple of C int, C float. This is a wrong assumption. It sees a Python type declaration and should infer (Python int, Python float) instead (or C double, for that matter). Otherwise, it would incorrectly constrain the value range of the first tuple item, if this comes in from Python code.

I've reverted my changes in master and will see how I can fix this up in a separate branch.

@da-woods
Copy link
Contributor

It was definitely my intent that typing.Tuple should be able to produce C-tuples (and I think this is a reasonable thing to do).

I think here it shouldn't though, because it should require cython.int and cython.float? Sorry about that... I probably should have spotted this.

@da-woods
Copy link
Contributor

So my view: AnnotationNode.analyse_type_annotation seems to have a specific workaround for int/long/float which applies directly to the annotation.

if annotation.is_name and not annotation.cython_attribute and annotation.name in ('int', 'long', 'float'):

These subscripted types can end up with fairly deeply nested types, which we want to detect these in. That suggests we'd be better passing an safe_annotation_typing flag to analyse_as_type and then using that flag to decide what types to keep, rather than doing it all in the method of AnnotationNode?

@scoder
Copy link
Contributor Author

scoder commented Jan 30, 2022

Yes, nesting is an issue, but also e.g. cdef Set[int] x vs. x: Set[int]. In the first case, int is a C int, in the second, it needs to be a Python int.

scoder added a commit to scoder/cython that referenced this issue Jan 31, 2022
… "Optional[int]" etc. interprets "int" as (valid) Python int type and not (invalid) C int type.

See cython#3883
scoder added a commit to scoder/cython that referenced this issue Jan 31, 2022
… "Optional[int]" etc. interprets "int" as (valid) Python int type and not (invalid) C int type.

See cython#3883
@jakirkham
Copy link
Contributor

Yes, nesting is an issue, but also e.g. cdef Set[int] x vs. x: Set[int]. In the first case, int is a C int, in the second, it needs to be a Python int.

How should these be handled in a .pyx/.pxd vs. a .py file?

@scoder
Copy link
Contributor Author

scoder commented Feb 2, 2022

Yes, nesting is an issue, but also e.g. cdef Set[int] x vs. x: Set[int]. In the first case, int is a C int, in the second, it needs to be a Python int.

How should these be handled in a .pyx/.pxd vs. a .py file?

I also had to stop and think about that for a moment, but the file type should make no difference here.

The reason for the different meanings is that cdef Set[int] x should work like cdef int x, where int has always referred to a C int (because there is no reason to refer to a Python int here). If you use Python typing syntax, however, it needs to mimic Python semantics, so x: Set[int] and x: int must both refer to a Python int, simply because we might really be looking at code here that was written for Python and not for Cython. Whether that code appears in a .py file or in a (potentially renamed) .pyx file (or even a .pxd file) is irrelevant.

@da-woods
Copy link
Contributor

da-woods commented Feb 2, 2022

Set[int] presumably refers to set in both cases (because we can't do much with the container contents). I think Tuple is the only container where we try to make use of the element types (with ctuples).

@scoder
Copy link
Contributor Author

scoder commented Feb 2, 2022

Set[int] presumably refers to set in both cases (because we can't do much with the container contents). I think Tuple is the only container where we try to make use of the element types (with ctuples).

Correct as it stands, but it doesn't need to stay that way. We could certainly start using Set[int] to infer the type of iteration variables and other kinds of item access. (#1979)

@scoder
Copy link
Contributor Author

scoder commented Feb 2, 2022

In fact, we could consider the PythonTypeConstructor type a feature specifically of type inference, and only resolve types before setting node.type. That way, we could use the full set of available type information in the inference (and possibly optimisation) phases. Even after type analysis, whenever we can make use of nested type information, we can let nodes infer their type again instead of just looking at node.type.

@jakirkham
Copy link
Contributor

Set[int] presumably refers to set in both cases (because we can't do much with the container contents). I think Tuple is the only container where we try to make use of the element types (with ctuples).

Correct as it stands, but it doesn't need to stay that way. We could certainly start using Set[int] to infer the type of iteration variables and other kinds of item access. (#1979)

Just as another thought, this could be a place where C++ developers might want to use std::set. Whether/how that would be setup is admittedly a whole other discussion, but just want to surface that use case.

To the original question about whether file type should affect type interpretation, this could be posed with Tuple instead of Set. It wasn't really meant to be focused on Set specifically.

@da-woods
Copy link
Contributor

da-woods commented Feb 2, 2022

I'd be reluctant to start substituting C++ containers for Python types. There's two set implementations for C++ (set and unordered_set), and they have different internal designs and different performance characteristics, and different requirements for what they hold. I believe the Python set is probably closer to unordered_set. It also creates a copy (which often wouldn't be wanted).

I think I agree that file-type shouldn't affect interpretation. Mainly because I think it'd be a complicated rule that'd be hard to explain and end up confusing people (so it'd need a very good justification why it's needed)

@jakirkham
Copy link
Contributor

Sorry didn't mean to send us down a rabbit hole with C++. Just thinking of other cases where Set[int] and others might be useful. Though would add there are a lot of ways this could be implemented to handle these differences, but maybe that can be saved for another discussion.

scoder added a commit to scoder/cython that referenced this issue Feb 4, 2022
… "Optional[int]" etc. interprets "int" as (valid) Python int type and not (invalid) C int type.

See cython#3883
@wyfo
Copy link

wyfo commented Feb 10, 2022

I have implemented Optional / or None semantics for func(x: list = None) in 13f8cd7.

I think it could be a good idea to also add PEP 604 support, i.e. var: dict | None

@scoder
Copy link
Contributor Author

scoder commented Feb 10, 2022

I think it could be a good idea to also add PEP 604 support, i.e. var: dict | None

Right. But writing up #4631, I can see that that's not a small topic.

@wyfo
Copy link

wyfo commented Feb 10, 2022

By the way, I see in 13f8cd7 that you've supported implicit optionals (non-optional but with None default value).
However, this behavior is now deprecated, as stated in PEP 484:

This is no longer the recommended behavior. Type checkers should move towards requiring the optional type to be made explicit.

I think it would be better thus to not support it, but I'm maybe too strict.

scoder added a commit that referenced this issue Feb 10, 2022
scoder added a commit to scoder/cython that referenced this issue Feb 10, 2022
… "Optional[int]" etc. interprets "int" as (valid) Python int type and not (invalid) C int type.

See cython#3883
@scoder
Copy link
Contributor Author

scoder commented Feb 10, 2022

By the way, I see in 13f8cd7 that you've supported implicit optionals (non-optional but with None default value). However, this behavior is now deprecated
I think it would be better thus to not support it, but I'm maybe too strict.

Thanks for the notice. I'm sure there's a considerable body of code out there by now that makes use of this 'obvious' simplification, so a warning seems better than a syntax error.
b1b23de

scoder added a commit to scoder/cython that referenced this issue Feb 24, 2022
… "Optional[int]" etc. interprets "int" as (valid) Python int type and not (invalid) C int type.

See cython#3883
scoder added a commit that referenced this issue Jul 11, 2022
* Check for "Optional[ctype]" earlier because we need to make sure that "Optional[int]" etc. interprets "int" as (valid) Python int type and not (invalid) C int type.

See #3883

* Fix typing assumptions in PEP 526 variable annotations test: in a Python type annotation, "int" means Python int and "float" means Python float, not the C types.

* Use a context manager to make it explicit in annotation type analysis when C types are allowed, and when Python types are required or expected.

* Generalise the concept of equivalent Python and C types for more efficient type inference: PyFloat/double, PyBool/bint, PyComplex/double complex.

* Refactor analyse_type_annotation() to prepare the extraction of type modifiers (as opposed to special types).

See discussion in #4606 (comment)

* Refactor handling of "typing.Optional", "dataclasses.InitVar" etc. annotations to move them into the declared Entry during type analysis and keep only the bare type in the type system.

* Force ClassVar[...] types to be object types.

* Add a warning when users define a ClassVar[] with a non-Python type.

See #4606 (comment)

* Provide a helpful warning when users write plain C types in a non-C annotation context.

* Only consider Python object item types from list/tuple as self.type in IndexNode since that will be the result of the index access. Coercion needs to happen externally, then based on the type inference.

* Ignore Python annotation type "long" since it almost certainly does not refer to PyLong but to C long. Issue a warning to make users aware of it.

* Fix PEP-526 test by working around incomplete type inference, but leave FIXME comments.
@da-woods
Copy link
Contributor

This issue is largely done I think?

@scoder
Copy link
Contributor Author

scoder commented Jul 31, 2022

I think so, too. Let's close it.

@scoder scoder closed this as completed Jul 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants