
Change sys.maxunicode to 0xffff #1196

Merged
slozier merged 3 commits into IronLanguages:master from slozier:maxunicode
May 1, 2021

@slozier slozier commented Apr 29, 2021

Using 0xffff for sys.maxunicode is probably more accurate, since our implementation of unicode is closer to that of CPython 3.2.

Some Python code uses this value to check whether it can use "wide" characters. For example, from docutils:

if sys.maxunicode >= 0x10FFFF: # "wide" build
    delimiters += '\U00010100\U00010101\U0001039f\U000103d0\U00010857'

Another example from pyyaml:

has_ucs4 = sys.maxunicode > 0xffff
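
Both checks follow the same idea: choose a character class based on sys.maxunicode. A minimal sketch of that idea follows. `build_nonbmp_pattern` is a hypothetical helper written for illustration, not code taken from docutils or pyyaml, and the narrow-build fallback (matching a surrogate pair) is an assumption about what such code might do.

```python
import re
import sys


def build_nonbmp_pattern(maxunicode=sys.maxunicode):
    """Return a regex source string matching one character outside the BMP.

    Hypothetical helper: the branch taken depends on the reported maxunicode.
    """
    if maxunicode > 0xFFFF:
        # "Wide" build: each non-BMP character is a single code point,
        # so a plain range works.
        return "[\U00010000-\U0010ffff]"
    # "Narrow" build: a non-BMP character is stored as a surrogate pair,
    # so match a high surrogate followed by a low surrogate.
    return "[\ud800-\udbff][\udc00-\udfff]"


wide_pattern = build_nonbmp_pattern(0x10FFFF)
narrow_pattern = build_nonbmp_pattern(0xFFFF)
```

With sys.maxunicode reported as 0xffff, code written this way would take the second branch and never build the range that fails.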

In both of these modules, we end up with errors about "invalid" character ranges in regular expressions:

re.compile("[\U00010000-\U0010ffff]")

re.error: parsing "[-]" - [x-y] range in reverse order.
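
One likely reading of this error, assuming the regex engine works on 16-bit UTF-16 code units (as .NET strings do): each endpoint of the range is stored as a surrogate pair, so the class is read as four units and the middle two form a descending range. The sketch below only computes the code units to show this; `utf16_units` is a hypothetical helper, not part of IronPython.

```python
import struct


def utf16_units(ch):
    """Return the list of UTF-16 code units that encode a single character."""
    data = ch.encode("utf-16-be", "surrogatepass")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


lo = utf16_units("\U00010000")  # high and low surrogate for U+10000
hi = utf16_units("\U0010ffff")  # high and low surrogate for U+10FFFF

# An engine working on 16-bit units sees "[lo-hi]" as four units:
# lo[0], then the range lo[1]-hi[0], then hi[1].
range_start, range_end = lo[1], hi[0]
range_is_reversed = range_start > range_end

print([hex(u) for u in lo], [hex(u) for u in hi], range_is_reversed)
```

Here the range the engine sees runs from 0xDC00 down to 0xDBFF, which matches the "[x-y] range in reverse order" message.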

@slozier slozier merged commit d95fcae into IronLanguages:master May 1, 2021
@slozier slozier deleted the maxunicode branch May 1, 2021 17:16