Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for typing.Union #6206

Closed
wants to merge 9 commits into from
Closed

Conversation

matusvalo
Copy link
Contributor

This PR adds support of typing.Union. It is optimised to fused type.

Note

This PR does not support Union of None. Hence Union[int, None] will trigger an error.

Currently only as a draft. The test suite is passing but needs #6204 merged to master.

@bzoracler
Copy link

As I understand, this can be considered work towards some of the union semantics outlined in #4631.

I don't think Cython fused types should be expressed by Python unions. Constrained type variables would be a more natural fit. Consider the following example, which fails to compile in Cython:

ctypedef fused char_or_float:
    char
    double

cpdef char_or_float plus_one(char_or_float var):
    return 0.0
cpdef char_or_float plus_one(char_or_float var):
    return 0.0
           ^
error: Cannot assign type 'double' to 'char'

In Python, if char_or_float was a union type, this example is perfectly valid. A constrained type variable on the other hand would behave like Cython, raising type-checking errors with roughly equivalent semantics to Cython. See the following example (mypy Playground, Pyright playground):

import typing as t

char_or_float_union: t.TypeAlias = str | float
char_or_float_typevar = t.TypeVar("char_or_float_typevar", str, float)

def plus_one_union(var: char_or_float_union) -> char_or_float_union:
    return 0.0  # No errors

def plus_one_typevar(var: char_or_float_typevar) -> char_or_float_typevar:
    return 0.0  # mypy: Incompatible return value type (got "float", expected "str")  [return-value]

@matusvalo matusvalo marked this pull request as draft May 21, 2024 06:02
@matusvalo
Copy link
Contributor Author

matusvalo commented May 21, 2024

I don't think Cython fused types should be expressed by Python unions.

I see your point and I agree with your example. But I think this should be more optimisation technique than expression of type. I don't see any wrong to optimise Union[] type with fused type where it make sense. E.g. when function contains parameter with the Union[] type, cython can pick optimised version of function:

def expensive_test(x: Union[cython.int, cython.float]) -> cython.bint:
    # Optimised Implementation of time consuming test.

Of course we cannot optimise/translate Union[] type to fused of result type.

But maybe I am missing something that's why i created draft to get feedback.

@da-woods
Copy link
Contributor

Yeah it's a fair point and we don't want to break too much working code by using type-hints in an odd way. The only thing about fused return types is that they're ignored in a def function, so anyone hitting this is already partly into cdef/cpdef functions so they aren't breaking real pure Python code.

Just thinking about other places it can break:

  • Annotations on classes
  • Multiple arguments with the same fused type are assumed to all be the same type

Constrained type variables [...]

Reading the documentation, that looks like a better fit for what we're trying to write.

I'm personally pretty sceptical of the more advanced features of typing - the advantage of Union is that people will at least know about it. (But for me in my own code, Union is the absolute limit of how far I'm prepared to take annotation typing). Constrained type variables look like an obscure detail of a wider "generics" feature.

So I'm a little unsure of what's best to do here.

@matusvalo
Copy link
Contributor Author

I'm personally pretty sceptical of the more advanced features of typing

Me too. I think implementing more advanced typing features seems to be complex task. My personal motivation of this PR is to implement following optional type using | but I end up implementing optional type using Union[] which naturally lead to this PR. Later on, I would like to add support of None too.

So I'm a little unsure of what's best to do here.

I would optimise what does make sense and has reasonable complexity. Rest I would ignore as we ignore types now. Currently, I don't know what is exact scope of optimisation but hopefully we will figure out in time.

@bzoracler
Copy link

bzoracler commented May 21, 2024

def expensive_test(x: Union[cython.int, cython.float]) -> cython.bint:
    # Optimised Implementation of time consuming test.

If the purpose is to optimise inline Python unions into anonymous fused types behind-the-hood, then I agree that in this example a union could be fine under standard Python typing (but only if the body of the function [doesn't pass this argument to another function which uses a named fused type appearing 2 or more times in the function signature*]). More generally speaking, I believe that Python union types are compatible with a fused type in a function signature only if the union type is (1) declared as a type alias which only appears once in the function signature* or (2) declared inline as an argument annotation.

I initially understood this PR as a general way of declaring functions taking and returning fused types - if that's the case, the same subtleties for named fused types as explained in the Cython docs apply to constrained type variables, which would be a much better fit.

*Function signature here includes all parameter types and the return type.

@bzoracler
Copy link

So I'm a little unsure of what's best to do here.

What's Cython's general direction of Pure Python mode, with regards to annotations and utilisation of items from the typing module?

  1. Match Python static typing behaviour as much as possible, utilise existing constructs in Python to emit C or C++ code where appropriate, and encouraging users to employ existing Python tooling for static analysis.
  2. De-emphasise the importance of Python static typing behaviour; Cython may utilise existing Python typing where convenient, and may develop new idioms not recognised by Python typing. Cython will avoid any incompatible behaviours with idioms already existing in Python static typing.
  3. Cython defines its own interpretation of typing and type annotations, which may be incompatible with Python static typing behaviour.

@da-woods
Copy link
Contributor

It's somewhere between the three.

  • generally if we use existing typing constructs we try to match the existing defined behaviour.
  • the one notable exception is that we interpret builtin types like list as "exact list" because allowing derived types is pretty useless to us.
  • Not everything that you can do with Cython can be expressed with existing static typing constructs. In which case we invent something suitable of our own (trying to avoid conflicts)

@bzoracler
Copy link

It's somewhere between the three.
...

That sounds like it matches (2), which is fairly straightforward to work with. I was mainly afraid that it might be (3), which is a lot harder to work with in terms of 3rd party static analysis tools.

For future developments of Python unions in Cython, then, I would not expect Cython to compile this into C functions taking and returning each member of a fused type, because that goes against what a Python union type means:

import typing as t
import cython

char_or_float: t.TypeAlias = cython.char | cython.float

@cython.cfunc
def f(arg1: char_or_float, arg2: char_or_float) -> char_or_float:
    ...

if t.TYPE_CHECKING:
    # Standard Python type-checker behaviour
    reveal_type(f(0.0, 1.0))  # revealed type is "int | float"

As for this example, an implicit optimisation with a fused type is fine:

import typing as t
import cython

@cython.cfunc
def f(arg1: cython.char | cython.float, arg2: cython.char | cython.float) -> cython.char | cython.float:
    """
    # OK for Cython to see this as:
    import cython
    from typing import Any

    fused_type1 = cython.fused_type(cython.char, cython.float)
    fused_type2 = cython.fused_type(cython.char, cython.float)
    
    def f(arg1: fused_type_1, arg2: fused_type_2) -> Any: ...
    """

@da-woods
Copy link
Contributor

My current view is that the Union -> Fused idea doesn't really work and that we probably shouldn't do this. Sorry for now spotting this flaw when I made the issue for the idea.

@matusvalo
Copy link
Contributor Author

My current view is that the Union -> Fused idea doesn't really work and that we probably shouldn't do this

OK I will close this PR then. It was still just an experiment. Any any case, I would like to implement following optimisation though (since it is pretty commonly used):

  • Union[XYZ, None] -> Optional[XYZ]
  • XYZ | None -> Optional[XYZ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants