Unicode identifiers (C structures) #3156

Open
wants to merge 6 commits into
base: master

da-woods
Contributor

This is a separate pull request for the parts of #3119 that needed more discussion. These were essentially the features that go beyond Python compatibility.

This adds support for Unicode identifiers in C/C++ features such as structs and cppclasses. For structs used purely in Python I've mangled the names with punycode. For features that are exported to or imported from C with "public" or "extern", I've translated the names to be \uXXXX escaped (without any mangling or normalization). Pretty much every modern C/C++ compiler supports Unicode in identifiers in this form (only Clang supports it in raw form, I think), so this seems like the most compatible thing to do. I've trusted that the user knows what names they want and not performed any normalization for these (I don't think normalization is yet defined in the C/C++ standards, so it's hard to do anything else).

When they are to be used internally the name is mangled with
punycode as normal. When they are to be used externally
(e.g. "cdef public" or "cdef extern from") the name is taken
exactly as-is and simply backslash-escaped ("\uNNNN"). The
vast majority of C compilers are capable of dealing with
\uNNNN escapes in identifiers.
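
To make the two naming schemes concrete, here is a minimal Python sketch of the idea. It is not the code from this pull request: the function names `escape_for_c` and `mangle_with_punycode`, the `__pyx_U_` prefix, and the replacement of `-` with `_` are all assumptions made for illustration. It also writes characters above U+FFFF as `\UNNNNNNNN`, which is the C spelling for those code points and is not something the description above mentions.

```python
def escape_for_c(name: str) -> str:
    """External names: keep the name as-is, writing each non-ASCII
    character as a C universal character name. No normalization."""
    parts = []
    for ch in name:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)                  # plain ASCII passes through
        elif cp <= 0xFFFF:
            parts.append("\\u%04X" % cp)      # e.g. U+00E9 becomes \u00E9
        else:
            parts.append("\\U%08X" % cp)      # code points beyond the BMP
    return "".join(parts)


def mangle_with_punycode(name: str, prefix: str = "__pyx_U_") -> str:
    """Internal names: hypothetical punycode-based mangling.

    Punycode output is pure ASCII but may contain '-', which is not
    legal in a C identifier, so this sketch swaps it for '_'. That
    swap is lossy and the prefix is made up for this example.
    """
    if name.isascii():
        return name                           # nothing to mangle
    encoded = name.encode("punycode").decode("ascii")
    return prefix + encoded.replace("-", "_")


if __name__ == "__main__":
    print(escape_for_c("café"))
    print(mangle_with_punycode("café"))
```

The escaped form keeps a one-to-one correspondence with the name the user wrote, which is what matters when the C side has to refer to the same identifier. The mangled form only needs to be a valid, ASCII-only C identifier, since nothing outside the generated module sees it.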