Support Unicode identifiers #2601

Open
phdoerfler opened this Issue Sep 11, 2018 · 1 comment

phdoerfler commented Sep 11, 2018

I want to cythonize this Python 3 code:

def say_hello_to_λ(name):
    print("Hello λ %s!" % name)

which fails:

cythoning src/main/cython/hellocython.pyx to src/main/cython/hellocython.cpp

Error compiling Cython file:
------------------------------------------------------------
...
def say_hello_to_λ(name):
                ^
------------------------------------------------------------

src/main/cython/hellocython.pyx:1:17: Unrecognized character
building 'cignificance' extension
clang -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -std=c++11 -I/usr/local/miniconda3/envs/spiht/include/python3.6m -c src/main/cython/cignificance.cpp -o build/temp.macosx-10.7-x86_64-3.6/src/main/cython/cignificance.o -std=c++11 -fextended-identifiers
src/main/cython/cignificance.cpp:1:2: error: Do not use this file, it is the result of a failed Cython compilation.
#error Do not use this file, it is the result of a failed Cython compilation.
 ^
1 error generated.
error: command 'clang' failed with exit status 1

I am not the only one who wants to use Unicode in identifiers:

https://stackoverflow.com/questions/47462127/avoid-unrecognized-character-when-compiling-pyx-to-c-without-deleting-the-nord

And apparently certain Unicode characters have been permitted in C identifiers since C11, according to this answer: https://stackoverflow.com/a/12693346/969122

So I don't see a reason why Cython should not allow these characters.

scoder Sep 11, 2018

Contributor

Enough of PEP 489 should now be supported in the latest master to consider this a realistic feature. PR welcome.

Note that it would be better not to require C11. The C code that Cython generates usually gets by with C89. Punycode can be used to encode Unicode identifier names as ASCII identifiers.
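To illustrate the Punycode idea, here is a minimal sketch in Python. It is not Cython's implementation: the function name `ascii_identifier` and the `U_` prefix are assumptions made for this example, loosely modeled on how PEP 489 derives init function names for non-ASCII module names (Punycode with hyphens replaced by underscores). It relies only on the `punycode` codec from the Python standard library.

```python
def ascii_identifier(name: str) -> str:
    """Map a possibly non-ASCII identifier to an ASCII-only one.

    Hypothetical helper for illustration; the "U_" prefix is an assumption.
    """
    try:
        name.encode("ascii")
        return name  # already ASCII: leave it untouched
    except UnicodeEncodeError:
        pass
    # The punycode codec copies the ASCII characters first, then appends
    # "-" and an ASCII encoding of the non-ASCII characters.
    encoded = name.encode("punycode").decode("ascii")
    # "-" is not valid in a C identifier, so replace it with "_".
    return "U_" + encoded.replace("-", "_")


mangled = ascii_identifier("say_hello_to_λ")
print(mangled)  # an ASCII-only name that a C89 compiler can accept
```

With a mapping like this, the generated C code would only ever contain ASCII identifiers, so neither C11 nor `-fextended-identifiers` would be needed.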

scoder changed the title from "Unrecognized character" to "Support Unicode identifiers" on Sep 12, 2018
