[BUG] UB from __Pyx_pretend_to_initialize not initializing #5278
Comments
Thanks. We'll need to look into this properly I think. I'm a bit wary of the |
I'm trying to find a temporary workaround to unblock the upgrade of LLVM, but so far no success. Is there any way we can modify the python source code to make cython generate initialization of |
So I'm a little puzzled by this.
In terms of workarounds I'm not completely sure, since
|
We're using C++ templates:
I'm not sure about C, but in C++ returning an uninitialized variable from a function is undefined behavior, regardless of whether the return value of the function is used or not. Compilers can exploit UB for optimization purposes in quite surprising ways (https://en.cppreference.com/w/cpp/language/ub). As for the suggested workarounds, I'd be glad to try them, but I have problems expressing them within Cython syntax. Ideally, I would like some way to tell Cython to insert a snippet of C++ into the output verbatim. Is there such an option? |
I didn't actually know you could do that with templates. Thanks for the clarification.
There isn't really, but option 2 is fairly close to that |
https://docs.cython.org/en/latest/src/userguide/external_C_code.html#including-verbatim-c-code |
Thanks for the advice! I came up with the following two options for a (hopefully) short-term workaround:
|
I would certainly not use the asm version. Compilers have very limited possibility to optimize around asm blocks, so this can seriously affect compiler optimizations. |
Yes, I am not really sure about this either, since in our specific case we actually do take the address:
__Pyx_pretend_to_initialize(&__pyx_r);
return __pyx_r;
One thing in there is that it could still be undefined behavior if the system supports trap representations, so maybe that's why it is undefined? In C we can always just memset the value. In C++ since C++11 we can memset if |
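To make the memset idea concrete, here is a minimal C++11 sketch. The names `zero_fill` and `decode_or_zero` are hypothetical and not part of Cython. The guard on `std::is_trivially_copyable` is an assumption about when byte-wise zeroing is safe; the condition the comment above had in mind is cut off in the text, so this is only one plausible reading.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not part of Cython): zero-fills the bytes of *ptr so
// that a later read no longer observes an indeterminate value.
template <typename T>
void zero_fill(T* ptr) {
    // Assumption: restrict byte-wise zeroing to trivially copyable types and
    // reject everything else at compile time.
    static_assert(std::is_trivially_copyable<T>::value,
                  "zero_fill requires a trivially copyable type");
    assert(ptr != nullptr);
    std::memset(static_cast<void*>(ptr), 0, sizeof(T));
}

// Hypothetical stand-in for a generated function whose error path returns
// without ever assigning the result variable.
inline int decode_or_zero(bool fail) {
    int result;           // deliberately not initialized at declaration
    zero_fill(&result);   // every byte is now zero, so reading it is defined
    if (!fail) {
        result = 42;      // normal path assigns a real value
    }
    return result;        // error path returns 0 instead of garbage
}
```

For integer types all-bits-zero is the value 0, so `decode_or_zero(true)` returns 0. For pointers and floating-point types all-bits-zero is null or 0.0 on common platforms, but the standard does not promise it.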
I think in C taking the address is sufficient to avoid this. Although that's based on a fairly superficial understanding of the standard. I think in C++ we probably just need to initialize |
Yes, I was certainly making this more complicated in my head than it actually was. Simply always initializing the return value should absolutely be enough. |
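As a rough illustration of "always initializing the return value", the sketch below value-initializes the result variable of a templated function before any early exit. The function name `result_or_default` and its parameters are hypothetical; this is not the code Cython generates, only the shape of the idea.

```cpp
#include <cassert>

// Hypothetical sketch: the return variable is value-initialized up front, so
// every path out of the function returns an object with a defined value.
// Assumes T is default-constructible and copy-assignable.
template <typename T>
T result_or_default(bool fail, const T& value) {
    T result = T();     // zero for scalars and pointers, default ctor for classes
    if (fail) {
        return result;  // error path: defined value, no UB
    }
    result = value;     // normal path: overwrite with the real result
    return result;
}
```

With this shape, `assert(result_or_default<int>(true, 7) == 0);` holds on the error path and `assert(result_or_default<int>(false, 7) == 7);` on the normal one. The trade-off is that the type must be default-constructible, which may not hold for every C++ class a template can be instantiated with.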
Please validate that this also works in generator/coroutine functions, where we use goto->yield_label at the beginning. |
Do you have an example, where you are concerned? At least the generators in my code base always return PyObjects, which are already set to |
Changing
|
rather than relying on __Pyx_pretend_to_initialize which isn't sufficient to avoid undefined behaviour there. Fixes cython#5278
In this case we actually have to initialize rather than just do nothing. Fixes cython#5278. Supersedes cython#5296 (I think this is better since it limits the amount of work Cython itself has to do)
In this case we actually have to initialize rather than just do nothing. Fixes #5278. * Move code to utility code file
Is this something we should backport to 3.0.9? |
Probably. |
Describe the bug
Apologies in advance: I don't use cython, but I'm debugging a bug caused by using cython with a fresh version of clang. I'll try to get a better repro soon, but for now I'm just filing this in case the cause/fix is obvious. My repro is representative of something internal, but I haven't reduced it to something standalone yet.
With a very new LLVM commit, we're seeing failures in Cython code. The issue seems to be caused by UB in Cython when returning uninitialized data.
Given some Cython code like this:
The generated code will look something like this:
The UB is that __Pyx_pretend_to_initialize doesn't actually do any initialization, and so decode will return the uninitialized __pyx_r value. It's enough to silence static compiler warnings, but the optimizer will see that it doesn't do anything to the variable, and mis-optimize the UB accordingly.
There are two possible changes to __Pyx_pretend_to_initialize that would make it satisfy this requirement:
Code to reproduce the behaviour:
Expected behaviour
When passed invalid data, it should raise an error. Instead, it crashes.
Environment
OS: Linux
Python version: 3.10
Cython version: 0.29.32 (but I still see the bug in trunk)
LLVM/Clang version: something very recent; newer than this: llvm/llvm-project@b6a0be8
Additional context
No response