Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(@cython.cfunc + Type Annotation) Fragility #2529

Closed
jbrockmendel opened this issue Aug 3, 2018 · 16 comments · Fixed by #4433
Closed

(@cython.cfunc + Type Annotation) Fragility #2529

jbrockmendel opened this issue Aug 3, 2018 · 16 comments · Fixed by #4433

Comments

@jbrockmendel
Copy link
Contributor

Going through some of pandas' cython modules to try to make code valid-python where possible, I'm seeing cythonize-time errors. It seems like some combinations are invalid for surprising reasons, e.g:

cpdef object get_rule_month(object source, object default='DEC')
# existing, works as expected

cpdef object get_rule_month(source: object, default: object='DEC'):
# seems OK

cpdef get_rule_month(source: object, default: object='DEC') -> object:
# breaks --> Syntax error in C variable declaration

@cython.ccall
def get_rule_month(source: object, default: object='DEC') -> object:
# breaks --> Exception clause not allowed for function returning Python object

@cython.ccall
def object get_rule_month(source: object, default: object='DEC'):
# breaks --> Expected '(', found 'get_rule_month'

@cython.ccall
@cython.returns(object)
def get_rule_month(source: object, default: object='DEC'):
# appears OK

Another example with a slightly different set of errors:

cdef inline bint is_null_datetime64(v):
# existing, OK

cdef inline is_null_datetime64(v) -> bint:
# breaks --> Syntax error in C variable declaration

@cython.cfunc
def inline is_null_datetime64(v) -> bint:
# breaks --> Expected '(', found 'is_null_datetime64

@cython.cfunc
@cython.inline
def is_null_datetime64(v) -> bint:
# breaks --> Function signature does not match previous declaration

This last one here is the one that is noticeably different from the first example.

Is this behavior intentional?

@scoder
Copy link
Contributor

scoder commented Aug 3, 2018

Thanks for that list (and for trying it in the first place). It means we clearly lack tests here, and it's good to have real-world feedback about where people encounter real-world problems.

Regarding the "obvious" cases:

def inline func()

This can never work, because def requires Python syntax. Similarly, cdef/cpdef syntax cannot be combined with -> return type annotations, which are a Python def function feature. While cpdef + Python syntax is a debatable edge case, I'd suggest that providing the return type like this should stay illegal for now. Either you use Python syntax (def ... -> type) or Cython syntax (cpdef type ...), not both. There are probably ways to give better error messages here (I created #2530 for that).

There is an @cython.inline decorator, so the (currently) only correct way to use inline in Python code is

@cython.cfunc
@cython.inline
def is_null_datetime64(v) -> bint:

There is a (similar) test for this in our test suite, so I don't know right now why this would fail. Probably something trivial.

@cython.cfunc
@cython.inline
def cdef_inline(x) -> cython.double:

Similarly, cpdef should be expressed like this:

@cython.ccall
def get_rule_month(source: object, default: object='DEC') -> object:

which also fails for you, as you said. I remember noticing this kind of error, too, and fixing something in that corner recently. Could you try the latest master on that?

@jbrockmendel
Copy link
Contributor Author

There is a (similar) test for this in our test suite, so I don't know right now why this would fail. Probably something trivial.

Possibly relevant: there is an accompanying .pxd file that was unchanged throughout all these iterations with "cdef bint is_null_datetime64(v)`

I remember noticing this kind of error, too, and fixing something in that corner recently. Could you try the latest master on that?

Will do. I'll also add to the list permutations involving nogil and except? -1

Similarly, cdef/cpdef syntax cannot be combined with -> return type annotations, which are a Python def function feature.

Good to know, thanks. My intuition was that the parser would translate each of the supported forms to a canonical representation, allowing users to mix and match (even though doing so would be really weird in some cases). Not the first time intuition has led me astray. (Can you point me towards the part of the code responsible for this? If it doesn't require learning the entire code-base, I'd be happy to help with fleshing out the tests)

While cpdef + Python syntax is a debatable edge case, I'd suggest that providing the return type like this should stay illegal for now.

A use case I can imagine where this would be really convenient is linting (a large part of the motivation for trying this syntax in the first place). With cp?def + Python syntax, we can probably cobble together something to pass re.sub('^cp?def ', 'def', source).replace('cimport', 'import') and be most of the way towards having something we can pass to flake8.

@scoder
Copy link
Contributor

scoder commented Aug 3, 2018

there is an accompanying .pxd file that was unchanged throughout all these iterations with "cdef bint is_null_datetime64(v)`

Ah, yes, I'd be surprised if that didn't make a difference. But if you have a .pxd file with the types in it, why do you still need to define them in the actual source file?

part of the code responsible for this

There are different aspects to this. One is the AdjustDefByDirectives transform, which adapts Python declarations to overrides provided in an external .pxd file, in decorators, or in annotations. Other tasks are done directly by certain syntax tree nodes, e.g. declaring a variable on an expression like x: int. Look for annotation_typing in the Cython.Compiler package.

Tests for this ended up in the pure Python mode tests (pure_*.py), in cython3.pyx, in pep526_variable_annotations.py, and in annotation_typing.pyx. The pure mode tests focus on anything cythonic that can run in Python, and the last one focusses on static typing with PEP 484. .py files are also tested in Python.

re.sub('^cp?def ', 'def', source).replace('cimport', 'import')

For the first part, use def with decorators.

For the second, there isn't currently a way to use cimport in Python syntax. The only use case I can see is sharing declarations, and for that, it's probably best to require .pxd files – in which case you can get pretty far with having your implementation in .py files and override the Python declarations with C data types/functions/classes in the corresponding .pxd file.

@jbrockmendel
Copy link
Contributor Author

jbrockmendel commented Aug 4, 2018

But if you have a .pxd file with the types in it, why do you still need to define them in the actual source file?

Not sure how this status quo developed; presumably easier for contributors/reviewers if they match.

re.sub('^cp?def ', 'def', source).replace('cimport', 'import')

For the first part, use def with decorators.

For the second, there isn't currently a way to use cimport in Python syntax.

This runs into a "if it were up to me..." issue where its hard to predict what maintainers will or won't have a strong opinion about. Last time I tried to move to the decorator syntax the idea was shot down. It's totally reasonable if you want to declare that Not Cython's Problem.

As for the source.replace("cimport", "import"), that's just a hack to trick a linter into think its valid python. Feel free to ignore.

I'll install from master and update results in a bit.

@jbrockmendel
Copy link
Contributor Author

Installed from master, no apparent change:

cpdef get_rule_month(source: object, default: object='DEC') -> object:
# still breaks --> Syntax error in C variable declaration

@cython.ccall
def get_rule_month(source: object, default: object='DEC') -> object:
# still breaks --> Exception clause not allowed for function returning Python object

cdef inline is_null_datetime64(v) -> bint:
# still breaks --> Syntax error in C variable declaration

@cython.cfunc
def inline is_null_datetime64(v) -> bint:
# still breaks --> Expected '(', found 'is_null_datetime64

@cython.cfunc
@cython.inline
def is_null_datetime64(v) -> bint:
# still breaks --> Function signature does not match previous declaration

Some more cases, these without an accompanying pxd file:

@cython.cfunc
@cython.inline
def median_linear(a: float64_t*, n: int) -> float64_t nogil:
# breaks --> Expected an identifier or literal

@cython.cfunc
@cython.inline
@cython.nogil
def median_linear(a: float64_t*, n: int) -> float64_t:
# breaks --> Expected an identifier or literal

@cython.cfunc
@cython.inline
@cython.nogil
def median_linear(a: cython.pointer(float64_t), n: int) -> float64_t:
# breaks --> Invalid directive: 'nogil'.  (+1 on the error message)

So it looks like there exists no valid way to use type annotations on this function.

A weird thing a user might do is mix/match (here using type annotation only for the labels kwarg:

@cython.boundscheck(False)
@cython.wraparound(False)
def group_median_float64(ndarray[float64_t, ndim=2] out,
                         ndarray[int64_t] counts,
                         ndarray[float64_t, ndim=2] values,
                         labels: ndarray[int64_t],
                         Py_ssize_t min_count=-1):
# breaks --> 'int64_t' is not a constant, variable or function identifier

Of course the ndim=2 bit isn't valid python (we're in the process of converting these to foo[:, :]), so trying to put that in the new format:

@cython.boundscheck(False)
@cython.wraparound(False)
def group_median_float64(out ndarray[float64_t, ndim=2],
                         counts: ndarray[int64_t],
                         values: ndarray[float64_t, ndim=2],
                         labels: ndarray[int64_t],
                         min_count: Py_ssize_t=-1):
# breaks --> Expected ']', found '='

But even with the memoryview syntax it chokes:

@cython.boundscheck(False)
@cython.wraparound(False)
def group_cumsum(out: numeric[:, :],
                 values: numeric[:, :],
                 labels: int64_t[:],
                 is_datetimelike,
                 skipna: bint=True):
# breaks --> Compiler crash in AnalyseDeclarationsTransform

Now one with an except? -1 clause (some of these I now know will break, but listing them for thoroughness)

cdef inline get_datetime64_nanos(val: object) -> int64_t except? -1:
# Syntax error in C variable declaration

@cython.cfunc
@cython.inline
@cython.exceptval(-1)
def get_datetime64_nanos(val: object) -> int64_t:
# OK!  Even with an accompanying pxd file

cdef inline int64_t get_datetime64_nanos(val: object) except? -1:
# OK

@cython.inline
cdef int64_t get_datetime64_nanos(val: object) except? -1:
# OK

@cython.exceptval(-1)
cdef inline int64_t get_datetime64_nanos(val: object):
# breaks --> Function signature does not match previous declaration

@cython.exceptval(-1)
@cython.inline
cdef int64_t get_datetime64_nanos(val: object):
# breaks --> Function signature does not match previous declaration

@cython.inline
@cython.cfunc
def int64_t get_datetime64_nanos(val: object) except? -1:
# breaks --> Expected '(', found 'get_datetime64_nanos'

@cython.inline
@cython.cfunc
@cython.exceptval(-1)
def int64_t get_datetime64_nanos(val: object):
# breaks --> Expected '(', found 'get_datetime64_nanos'

@scoder
Copy link
Contributor

scoder commented Aug 15, 2018

Thanks for testing these.

cpdef get_rule_month(source: object, default: object='DEC') -> object:
# still breaks --> Syntax error in C variable declaration

@cython.cfunc
def inline is_null_datetime64(v) -> bint:
# still breaks --> Expected '(', found 'is_null_datetime64

I improved the error message for these.

@cython.cfunc
@cython.inline
def is_null_datetime64(v) -> bint:
# still breaks --> Function signature does not match previous declaration

This is a bug and should work.

@cython.cfunc
def inline is_null_datetime64(v) -> bint:
# still breaks --> Expected '(', found 'is_null_datetime64

@cython.cfunc
@cython.inline
def median_linear(a: float64_t*, n: int) -> float64_t nogil:
# breaks --> Expected an identifier or literal

These are not valid Python syntax (note that you say def) and are correctly rejected.
Using pointer types in Python syntax requires a typedef. Improving that error is difficult, because it would be perfectly ok to write a: int * 5, but a: int* is only … half of an expression.

@cython.cfunc
@cython.inline
@cython.nogil
def median_linear(a: cython.pointer(float64_t), n: int) -> float64_t:
# breaks --> Invalid directive: 'nogil'.  (+1 on the error message)

This is a missing feature. I created #2557 for it.

@cython.boundscheck(False)
@cython.wraparound(False)
def group_cumsum(out: numeric[:, :],
                 values: numeric[:, :],
                 labels: int64_t[:],
                 is_datetimelike,
                 skipna: bint=True):
# breaks --> Compiler crash in AnalyseDeclarationsTransform

This is a bug and should work. (fixed in 7bebbdd)

@cython.inline
@cython.cfunc
def int64_t get_datetime64_nanos(val: object) except? -1:
# breaks --> Expected '(', found 'get_datetime64_nanos'

I improved the error message for this one.

scoder added a commit that referenced this issue Aug 15, 2018
scoder added a commit that referenced this issue Aug 15, 2018
scoder added a commit that referenced this issue Aug 15, 2018
@scoder
Copy link
Contributor

scoder commented Aug 15, 2018

I implemented support for memoryviews in type annotations.

scoder added a commit that referenced this issue Aug 15, 2018
@jbrockmendel
Copy link
Contributor Author

Thanks, I'll give this a try on master and report back.

@jbrockmendel
Copy link
Contributor Author

Trying out 0.29

@cython.cfunc
@cython.inline
def is_null_datetimelike(val : object) -> bint:
# breaks --> pandas/_libs/tslibs/nattype.pyx:627:0: Function signature does not match previous declaration

Tried changing this in the accompanying pxd to match

@cython.cfunc
def is_null_datetimelike(val : object) -> bint
# breaks --> pandas/_libs/tslibs/nattype.pxd:14:0: def statement not allowed here

@Yobmod
Copy link

Yobmod commented Oct 18, 2018

I'm having the similar issues on cython V0.29 with return types in the .pyx file. Any python object return type breaks compilation in python compatible mode. Cython types work for returns (int, float, cython.int, cython.void, etc), np.ndarray and python builtins don't.

@cython.ccall
def get_object() -> object:  ...
# -> Exception clause not allowed for function returning Python object
@cython.cfunc
def get_str() -> str:  ...
# -> Exception clause not allowed for function returning Python object

Also returning None gives a different error:

@cython.ccall
def do_something_without_return() -> None:  ...
# -> Not a type

which makes it not compatible with typed python even with just builtin types. If returning None is redundant, can cython just ignore it? or alias it to a cython void / NULL? (Should this be a separate issue?)

@black-square
Copy link
Contributor

@Yobmod It looks like that there is a workaround for -> object case: If we add a decorator @cython.exceptval(check=False) it works fine. The generated code still checks the return value for exceptions.

@JesseRMeyer
Copy link

@Yobmod I'm encountering this too.

@scoder
Copy link
Contributor

scoder commented Jul 13, 2021

Marking this as a blocker for 3.0 since we're aiming at making the pure Python mode generally usable in that version.

@0dminnimda
Copy link
Contributor

0dminnimda commented Oct 9, 2021

It's mention in the docs (or rather inspired by to_unicode.pyx), so:

@cython.cfunc
def foo(s) -> unicode:
    return unicode(s)

doesn't work, while this does:

cdef unicode foo(s):
    return unicode(s)

Also

def foo(s) -> unicode:
    return unicode(s)

compiles fine, and to the same code as pyx.

The same goes with

@cython.cfunc
def foo(s) -> str:
    return str(s)

All of that shouldn't be a new info, but rather additional use/test cases
This also should serve as a reminder to come back to the documentation and correct everything when the issue will be fixed.

@matusvalo
Copy link
Contributor

OK all cases raising exception: Exception clause not allowed for function returning Python object are caused by annotating return type as python object:

@cython.ccall
def get_rule_month(source: object, default: object='DEC') -> object:

@cython.cfunc
def foo(s) -> str:

@cython.cfunc
def foo(s) -> unicode:

etc...

Exception is raised here:

if (return_type.is_pyobject
and (self.exception_value or self.exception_check)
and self.exception_check != '+'):
error(self.pos, "Exception clause not allowed for function returning Python object")

because self.exception_check is set to True.

The root cause of it is following (self.exception_check is set to True based on second item in tuple):

# for Python annotations, prefer safe exception handling by default
if return_type_node is not None and except_val is None:
except_val = (None, True) # except *

And basically it boils down to principle:

  1. cython language defaults to no exception checking and hence is not affected by this bug
  2. pure python defautls to exception checking and hence by default it causes this bug.

I have pretty simple solution for this issue, PR will be created.

scoder pushed a commit that referenced this issue Dec 14, 2021
…Object. (GH-4433)

With Cython syntax, we ignore the exception_check when a Python object is returned.
In pure Python mode, with a Python object type as return type, we reject it.
Instead of raising an error, we now just reset exception_check to False.

Found in #2529
@scoder
Copy link
Contributor

scoder commented Dec 14, 2021

This got auto-closed by #4433. Could someone please verify that we've resolved all of the issues? Or are there still open ones?

matusvalo added a commit to matusvalo/cython that referenced this issue Dec 15, 2021
scoder pushed a commit that referenced this issue Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants