Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Link-time issues when Cython output compiled as C++ #4111

Closed
aytey opened this issue Apr 12, 2021 · 7 comments
Closed

[BUG] Link-time issues when Cython output compiled as C++ #4111

aytey opened this issue Apr 12, 2021 · 7 comments
Milestone

Comments

@aytey
Copy link
Contributor

aytey commented Apr 12, 2021

Overview

I recently created the following BPO:

as well as associated PR:

to fix (what I felt was) a bug in (at least) Python 3.8/Python 3.9.

However, I received push-back from at least one commenter on the Python mailing list that the issue was discussing was an "API misuse" of Python.h (maybe that's true).

Anyway, since this came about because of Cython, I thought it was worth discussing it here.

Actual issue

If you Cythonate something like:

float(1)

then you end-up with some code that looks like this in your output:

static CYTHON_UNUSED double __Pyx__PyBytes_AsDouble(PyObject *obj, const char* start, Py_ssize_t length) {
    double value;
    Py_ssize_t i, digits;
    const char *last = start + length;
    char *end;
    while (Py_ISSPACE(*start))
        start++;
...

And where Py_ISSPACE has an expansion like:

#define Py_ISSPACE(c)  (_Py_ctype_table[Py_CHARMASK(c)] & PY_CTF_SPACE)

(in pyctype.h)

and where the variable _Py_ctype_table looks like this:

PyAPI_DATA(const unsigned int) _Py_ctype_table[256];

Problematically, the inclusion of (cpython) pyctype.h in Python.h does not guard _Py_ctype_table against the case of C++ compilation.

This then leads to the following issue:

  1. You use pre-compiled version of Python38 (e.g., on Windows), which has been compiled with a C compiler
  2. This means that all symbols are not name-managled
  3. You Cythonate some code and pull in a C++-only header
  4. You then compile your Cythonated code as C++ (e.g., with /TP if you're using Visual Studio)
  5. Finally, you try to link your Cythonated object code with Python38.lib

This then falls apart because Python38.lib has a un-name-mangled version of _Py_ctype_table, while your Cythonated code has C++-name-managled version of _Py_ctype_table, which means the linker cannot resolve the symbol, giving a link time error:

cl /Fe:test.exe /TP /I include cython_output.c test.c /link libs/python39.lib
Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29336 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

cython_output.c
test.c
Generating Code...
Microsoft (R) Incremental Linker Version 14.28.29336.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:test.exe
libs/python39.lib
cython_output.obj
test.obj
   Creating library test.lib and object test.exp
cython_output.obj : error LNK2019: unresolved external symbol "__declspec(dllimport) unsigned int const * const _Py_ctype_table" (__imp_?_Py_ctype_table@@3QBIB) referenced in function "double __cdecl __Pyx__PyBytes_AsDouble(struct _object * __ptr64,char const * __ptr64,__int64)" (?__Pyx__PyBytes_AsDouble@@YANPEAU_object@@PEBD_J@Z)
test.exe : fatal error LNK1120: 1 unresolved externals

Resolution

Without changing the use of Py_ISSPACE within Cython, I feel that the only solution is to change the generated C code to look like this:

#ifdef __cplusplus
extern "C" {
#endif
#include <Python.h>
#ifdef __cplusplus
}
#endif

Personally, given that other files downstream of Python.h do have extern "C", I felt that this really is a Python bug rather than a "user" bug.

However, I'm sharing this in here in case my feelings are off-base and, indeed, this is a Cython bug.

If this is considered a Cython bug, I'm more than happy to make a PR with the above change.

@aytey aytey changed the title [BUG] [BUG] Link-time issues when Cython output compiled as C++ Apr 12, 2021
@da-woods
Copy link
Contributor

da-woods commented Apr 12, 2021

This does look to turn up in our Appveyor CI logs (for Python version 3.7, 3.8, 3.9):

[00:05:13]    Creating library C:\projects\cython\TEST_TMP\1\memoryview\cpp\numpy_memoryview_readonly\numpy_memoryview_readonly.cp38-win_amd64.lib and object C:\projects\cython\TEST_TMP\1\memoryview\cpp\numpy_memoryview_readonly\numpy_memoryview_readonly.cp38-win_amd64.exp
[00:05:13] numpy_memoryview_readonly.obj : error LNK2001: unresolved external symbol "__declspec(dllimport) unsigned int const * const _Py_ctype_table" (__imp_?_Py_ctype_table@@3QBIB)
[00:05:13] C:\projects\cython\TEST_TMP\1\memoryview\cpp\numpy_memoryview_readonly\numpy_memoryview_readonly.cp38-win_amd64.pyd : fatal error LNK1120: 1 unresolved externals

It's only an issue on Windows I think - presumably GCC doesn't mangle this kind of symbol.

I think whether it's ultimately a Python bug or not, it should be fixed in Cython since there are enough Python versions with this unguarded code in the wild.

@da-woods
Copy link
Contributor

Maybe we should use Py_UNICODE_ISSPACE universally rather than just as a workaround?

#if CYTHON_COMPILING_IN_PYPY && !defined(Py_ISSPACE)
#define Py_ISSPACE(c) Py_UNICODE_ISSPACE(c)
#endif

That has the advantage that it's both documented and in the correct C++ guards

@aytey
Copy link
Contributor Author

aytey commented Apr 12, 2021

I think whether it's ultimately a Python bug or not, it should be fixed in Cython since there are enough Python versions with this unguarded code in the wild.

Okay, so would you like the #ifdef __cplusplus PR?

That has the advantage that it's both documented and in the correct C++ guards

Would you like me to take a look at doing this?

Plus, yes, it does have the correct guards.

@da-woods
Copy link
Contributor

That has the advantage that it's both documented and in the correct C++ guards

Would you like me to take a look at doing this?

To me the Py_UNICODE_ISSPACE fix feels like a more proportionate change, so a PR doing that would be welcome. It's always possible that I'm wrong about this though...

It looks like this is already tested on appveyor (https://ci.appveyor.com/project/cython/cython) so I don't think there's any need to add specific tests.

@da-woods da-woods added the C++ label Apr 12, 2021
@da-woods
Copy link
Contributor

Or possibly don't do anything until a decision has been made on the Python bug...

@aytey
Copy link
Contributor Author

aytey commented Apr 13, 2021

Or possibly don't do anything until a decision has been made on the Python bug...

Okay, so given my PR got a positive review, it seems this is likely to go in ...

However, we still have:

To me the Py_UNICODE_ISSPACE fix feels like a more proportionate change

I can take a look at this

aytey added a commit to aytey/cython that referenced this issue Apr 13, 2021
…sues cython#4111

Signed-off-by: Andrew V. Jones <andrew.jones@vector.com>
@scoder scoder closed this as completed in 7ce24bc Apr 13, 2021
@scoder scoder added this to the 3.0 milestone Apr 13, 2021
@scoder scoder added the defect label Apr 13, 2021
@scoder
Copy link
Contributor

scoder commented Apr 13, 2021

I also searched for other occurrences of the Py_IS… macros, but it seems that all we're doing is testing for space characters.

The only difference that I can see is that Py_UNICODE_ISSPACE() treats NBSP (0xA0) as space whereas Py_ISSPACE() does not since it only looks at US-ASCII characters. This would mean that parsing floats surrounded by non-breaking spaces from a byte string would work in Cython but not in Python. Seems rather nice to have, but it's not how Python works here.

Given that we are talking about either 0x20 or 0x9 … 0xd as positive matches here, and the number of spaces around a value should be very low on average, I guess that an inline function to test that range should be fast enough. See 7ce24bc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants