Support Unicode identifiers #2601

phdoerfler opened this issue Sep 11, 2018 · 2 comments


phdoerfler commented Sep 11, 2018

I want to cythonize this Python 3 code:

def say_hello_to_λ(name):
    print("Hello λ %s!" % name)

which fails:

cythoning src/main/cython/hellocython.pyx to src/main/cython/hellocython.cpp

Error compiling Cython file:
def say_hello_to_λ(name):

src/main/cython/hellocython.pyx:1:17: Unrecognized character
building 'cignificance' extension
clang -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -std=c++11 -I/usr/local/miniconda3/envs/spiht/include/python3.6m -c src/main/cython/cignificance.cpp -o build/temp.macosx-10.7-x86_64-3.6/src/main/cython/cignificance.o -std=c++11 -fextended-identifiers
src/main/cython/cignificance.cpp:1:2: error: Do not use this file, it is the result of a failed Cython compilation.
#error Do not use this file, it is the result of a failed Cython compilation.
1 error generated.
error: command 'clang' failed with exit status 1

I am not the only one who wants to use Unicode in identifiers.

And apparently certain Unicode characters have been permitted in identifiers since C11.

So I don't see a reason Cython should not allow these characters.

scoder commented Sep 11, 2018

Enough of PEP-489 should now be supported in the latest master to consider this a realistic feature. PR welcome.

Note that it's worth avoiding a hard requirement on C11. The C code that Cython generates usually gets by with C89. Punycode can be used to encode Unicode identifier names as ASCII identifiers.

@scoder scoder changed the title Unrecognized character Support Unicode identifiers Sep 12, 2018

ghost commented Oct 15, 2018

Note that Punycode uses the hyphen as separator. As the separator is not allowed in C or C++ identifiers, we'd have to use a custom variant of Punycode which uses the underscore as separator.

Edit: There is already a note 'with hyphens ("-") replaced by underscores ("_")' in the PEP-489 which you linked to.
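
For illustration, here is a minimal sketch of that naming scheme in Python, modelled on the rule PEP-489 gives for module init function names: Punycode-encode the name, then replace hyphens with underscores. The function name `export_hook_name` is only chosen for this example, and whether Cython would mangle ordinary (non-module) identifiers the same way is an assumption, not something decided in this thread.

```python
def export_hook_name(name):
    """Build a C-safe init function name for a module name.

    Sketch of the PEP-489 rule: ASCII names get a plain "PyInit_" prefix,
    non-ASCII names get "PyInitU_" plus the Punycode form of the name,
    with hyphens replaced by underscores.
    """
    try:
        # Pure-ASCII names need no encoding at all.
        suffix = b"_" + name.encode("ascii")
    except UnicodeEncodeError:
        # Python's built-in "punycode" codec uses "-" as the delimiter,
        # which is not valid in a C identifier, so swap it for "_".
        suffix = b"U_" + name.encode("punycode").replace(b"-", b"_")
    return b"PyInit" + suffix


if __name__ == "__main__":
    for candidate in ("hellocython", "say_hello_to_λ"):
        print(candidate, "->", export_hook_name(candidate).decode("ascii"))
```

Since an underscore can also occur in the original name, the result is not guaranteed to decode back unambiguously. That is fine for generating unique C symbol names, but it would matter if the mapping ever had to be reversed.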
