As some of you might know, Stackless Python is an old and venerable fork of regular C-Python. It enhances C-Python by modifying the core of the interpreter in a way that can't be done as an extension module. Otherwise, Stackless Python is fully API and ABI compatible with the corresponding C-Python version: for instance, you can replace C-Python 3.6.4 with Stackless Python 3.6.4 and everything should continue to work. This works well, except for Cython modules, because Cython uses undocumented implementation details of C-Python. Additionally, there is Stackless bug 166, which is already fixed for Stackless 2.7 and will soon be fixed for 3.x.
Currently, I know of the following problems:
PyMethodDef ml_flags
In Cython/Utility/ModuleSetupCode.c and Cython/Utility/ObjectHandling.c, Cython uses the METH_xxx flags but ignores METH_STACKLESS. This probably affects Stackless 3.6 and 3.7 and is trivial to fix. I've created pull request #2554.
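To illustrate the kind of check that goes wrong, here is a minimal, self-contained sketch. It is not the code from the pull request. The `DEMO_*` constants and function names are hypothetical, and the flag values are placeholders; real code would use the `METH_*` macros from the interpreter headers. The point is only that a comparison on `ml_flags` has to mask out the Stackless bit along with the other modifier bits.

```c
/* Placeholder flag values for this sketch only; real code takes the
   METH_* macros from the Python headers. */
#define DEMO_METH_VARARGS   0x0001
#define DEMO_METH_KEYWORDS  0x0002
#define DEMO_METH_NOARGS    0x0004
#define DEMO_METH_O         0x0008
#define DEMO_METH_CLASS     0x0010
#define DEMO_METH_STATIC    0x0020
#define DEMO_METH_COEXIST   0x0040
#define DEMO_METH_STACKLESS 0x0100  /* assumed value of the extra Stackless bit */

/* Naive check: strips the usual modifier bits but not the Stackless bit. */
int demo_is_noargs_naive(int ml_flags)
{
    int convention = ml_flags & ~(DEMO_METH_CLASS | DEMO_METH_STATIC |
                                  DEMO_METH_COEXIST);
    return convention == DEMO_METH_NOARGS;
}

/* Stackless-aware check: also strips DEMO_METH_STACKLESS before comparing. */
int demo_is_noargs(int ml_flags)
{
    int convention = ml_flags & ~(DEMO_METH_CLASS | DEMO_METH_STATIC |
                                  DEMO_METH_COEXIST | DEMO_METH_STACKLESS);
    return convention == DEMO_METH_NOARGS;
}
```

With the naive check, a no-argument method that also carries the Stackless bit is not recognised as such, so the caller falls back to a slower or wrong path.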
Incompatible PyFrameObject
Cython uses fields of PyFrameObject, which are documented to be subject to change at any time. Unfortunately, Stackless has a different PyFrameObject. Let's compare them:
Stackless Python:

    typedef struct _frame {
        PyObject_VAR_HEAD
        struct _frame *f_back;       /* previous frame, or NULL */
        PyFrame_ExecFunc *f_execute; /* support for soft stackless */
        /* ... other fields, identical to C-Python ... */
        PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
        PyCodeObject *f_code;        /* code segment */
        PyObject *f_localsplus[1];   /* locals+stack, dynamically sized */
    } PyFrameObject;

As you can see, Stackless has an additional field, f_execute, and the fields f_code and f_localsplus have different offsets. This makes __Pyx_PyFunction_FastCallNoKw() ABI incompatible between C-Python and Stackless, because this function uses f_localsplus. This is fatal, because it breaks compatibility with binary wheels, which are almost always compiled against C-Python. How can we resolve this incompatibility? Currently I see three options:
1. Change Cython to use only documented API. Not realistic for performance reasons.
2. Change Cython to determine the offset of f_localsplus at runtime. Probably the cleanest solution. It is possible to compute the offset of f_localsplus from the size of a frame (Py_SIZE(f)) and the information in the code object. Of course, the offset is constant at run time and may be cached.
3. Change Stackless to detect and "repair" a corrupted PyFrameObject. This would first require a change to the layout of the Stackless PyFrameObject, because currently the Stackless f_code gets overwritten when __Pyx_PyFunction_FastCallNoKw() writes f_localsplus[0]. Once f_code is preserved, Stackless can shift the f_localsplus array to the correct offset. It could work, but it would be an ugly hack that relies on Cython implementation details. The advantage is that it restores compatibility with already existing Cython-based extension modules.
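The runtime-offset idea (option 2) can be sketched as follows. This is a simplified model, not Cython's actual code: the two `demo_*_frame` structs are hypothetical stand-ins for the real layouts, and the offset is derived from the frame type's basic size as reported by the running interpreter (which I assume would be `PyFrame_Type.tp_basicsize` in practice) rather than from Py_SIZE(f) and the code object. It relies on f_localsplus[1] being the last member, with no padding after it.

```c
#include <stddef.h>

/* Simplified stand-ins for the two layouts (NOT the real PyFrameObject).
   Every field is pointer-sized, so the structs contain no padding. */
typedef struct demo_cpython_frame {
    void *head[3];                        /* models PyObject_VAR_HEAD */
    struct demo_cpython_frame *f_back;
    void *f_code;
    void *f_localsplus[1];                /* dynamically sized tail */
} demo_cpython_frame;

typedef struct demo_stackless_frame {
    void *head[3];                        /* models PyObject_VAR_HEAD */
    struct demo_stackless_frame *f_back;
    void *f_execute;                      /* extra Stackless field */
    void *f_code;
    void *f_localsplus[1];                /* dynamically sized tail */
} demo_stackless_frame;

/* basicsize is the frame type's size as seen at run time. Because the
   one-element f_localsplus array is assumed to be the last member, its
   offset is the basic size minus one pointer. */
size_t demo_localsplus_offset(size_t basicsize)
{
    return basicsize - sizeof(void *);
}

/* Cached variant: the offset is constant for the lifetime of the process,
   so it is computed on the first call only. */
size_t demo_cached_localsplus_offset(size_t basicsize)
{
    static size_t cached = 0;
    if (cached == 0)
        cached = demo_localsplus_offset(basicsize);
    return cached;
}

/* Locate the locals array through the computed offset instead of the
   compile-time field offset. */
void **demo_frame_localsplus(void *frame, size_t basicsize)
{
    return (void **)((char *)frame + demo_localsplus_offset(basicsize));
}
```

An extension compiled against one layout would then pick up the other interpreter's offset at import time, instead of baking in the offset from the headers it was built with.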
Perhaps there are better options.
Users can disable the usage of f_localsplus by setting the macro CYTHON_FAST_PYCALL=0 at C compile time, and Cython could also do that automatically when compiling against Stackless. (Ok, doesn't solve the problem of exchanging extensions between CPython and Stackless.)
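A rough sketch of how the automatic variant might look in the generated C preamble. It assumes that the Stackless headers define a `STACKLESS` macro; the fallback default of 1 is a simplification for this sketch, and `demo_fast_pycall_enabled` is a hypothetical helper used only to make the result observable.

```c
/* Assumption: the Stackless headers define STACKLESS. If the user has not
   chosen a value explicitly, switch the fast-call path off there. */
#if defined(STACKLESS) && !defined(CYTHON_FAST_PYCALL)
  #define CYTHON_FAST_PYCALL 0
#endif

/* Simplified fallback for this sketch: enabled everywhere else. */
#ifndef CYTHON_FAST_PYCALL
  #define CYTHON_FAST_PYCALL 1
#endif

/* Hypothetical helper that reports the setting chosen at compile time. */
int demo_fast_pycall_enabled(void)
{
    return CYTHON_FAST_PYCALL;
}
```

An explicit -DCYTHON_FAST_PYCALL=0 on the compiler command line still takes precedence, since the guard only applies when the macro is not already defined.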
I'd vote for 2). A static C variable in the call helper function to cache the offset should be enough.
I should also note that most of the fast-call code in Cython is essentially a copy from CPython. There is PEP 580 to generalise the fast-call protocol (and some other things) in Py3.8, assuming that it finally gets some traction (endorsing it on python-dev is welcome).