Support named unicode characters in f-strings #160

thatch · 2020-11-21T03:47:12Z

Fixes #154

The previous behavior misinterpreted the curly braces as enclosing an
expression. This change does some cursory validation so we can still
get parse errors in the most egregious cases, but does not validate that
the names are actually valid, only that they are name-shaped and have a
chance of being valid.

The character names appear to obey a few rules:

Case insensitive
Name characters are [A-Z0-9 \-]
Whitespace before or after is not allowed
Whitespace in the middle may only be a single space between words
Dashes may occur at the start or middle of a word

f"\N{A B}"           # might be legal
f"\N{a b}"           # equivalent to above
f"\N{A     B}"       # no way
f"\N{    A B     }"  # no way
f"""\N{A
B}"""                # no way
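
As a minimal sketch of how the rules above play out, the examples can be run through the same pattern used in the verification script in this description. The helper name `is_name_shaped` is hypothetical and is not part of parso; it only checks the shape of a name, not whether the name exists.

```python
import re

# Same pattern as the verification script in this description.
# NAME_SHAPE and is_name_shaped are illustrative names, not parso API.
NAME_SHAPE = re.compile(r"[A-Za-z0-9\-]+(?: [A-Za-z0-9\-]+)*")


def is_name_shaped(text: str) -> bool:
    """Return True if text has the shape of a unicode character name."""
    return NAME_SHAPE.fullmatch(text) is not None


# The contents of the braces from the examples above.
examples = ["A B", "a b", "A     B", "    A B     ", "A\nB"]
for text in examples:
    print(repr(text), is_name_shaped(text))
```

Using `fullmatch` rather than `match` is what rejects leading and trailing whitespace, and the single literal space in the repeated group is what rejects runs of spaces and newlines.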

To confirm that the regex below matches all (current) unicode character names:

import re
import sys
import unicodedata

R = re.compile(r"[A-Za-z0-9\-]+(?: [A-Za-z0-9\-]+)*")

for i in range(sys.maxunicode + 1):
    try:
        name = unicodedata.name(chr(i))
    except ValueError:
        # Some small values like 0 and 1 have no name, /shrug
        continue
    m = R.fullmatch(name)
    if m is None:
        print("FAIL", repr(name))
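
To illustrate the difference between a name that is merely name-shaped and one that is actually valid, here is a small sketch using the standard library's `unicodedata.lookup`. The helper `resolve` is a hypothetical name introduced for this example; parso itself does not perform this lookup according to the description above.

```python
import unicodedata


def resolve(name: str):
    """Return the character for name, or None if no such name exists."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # lookup() raises KeyError for names that are not in the database.
        return None


# Lookup ignores case, matching the "case insensitive" rule.
print(resolve("BULLET") == resolve("bullet"))
# "A B" is name-shaped but is not a real character name.
print(resolve("A B"))
```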


This is the same as my pull request davidhalter/parso#160

isidentical

LGTM. Could you please add a few tests where these unicode escapes are used in the middle and at the end of the f-string? For example, f"some {stuff} and \N{escape}".
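
The kind of cases requested here can be sketched with CPython's own `ast` module as a stand-in for parso. This is only an illustration under that assumption: the tests actually added to the PR are not shown in this conversation, and `count_expressions` is a hypothetical helper. The point is that the braces of a `\N{...}` escape should not be counted as a replacement field, wherever the escape sits in the string.

```python
import ast


def count_expressions(source: str) -> int:
    """Count replacement fields ({...} expressions) in an f-string literal."""
    tree = ast.parse(source, mode="eval")
    return sum(isinstance(node, ast.FormattedValue) for node in ast.walk(tree))


# Raw strings keep the backslash so the parsed source contains \N{BULLET}.
cases = [
    r'f"\N{BULLET} at the start"',
    r'f"some {stuff} and \N{BULLET} in the middle {more}"',
    r'f"some {stuff} and \N{BULLET}"',
]
for source in cases:
    print(source, count_expressions(source))
```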

thatch · 2020-11-22T01:17:25Z

Additional tests in latest push.

isidentical · 2020-11-22T12:37:37Z

Thanks for the patch @thatch! Also loved your work on https://github.com/thatch/python-grammar-changes

* Support named unicode characters in f-strings

  This is the same as my pull request davidhalter/parso#160

* A small bugfix to what is allowed in f-string expressions

  Thanks to davidhalter/parso#159 for catching that yield (as an expression, I suppose) is allowed on 3.6.

thatch added a commit to thatch/LibCST that referenced this pull request Nov 21, 2020

Support named unicode characters in f-strings

324bc7e

This is the same as my pull request davidhalter/parso#160

thatch mentioned this pull request Nov 21, 2020

Support Named Unicode Characters in f-strings Instagram/LibCST#424

Merged

thatch added a commit to thatch/LibCST that referenced this pull request Nov 21, 2020

Support named unicode characters in f-strings

fadede1

This is the same as my pull request davidhalter/parso#160

isidentical approved these changes Nov 21, 2020

Improve tests for named unicode escapes

ed7f35a

isidentical merged commit d39aadc into davidhalter:master Nov 22, 2020

jimmylai mentioned this pull request Nov 30, 2020

Parse error for unicode character name literals in f-strings Instagram/LibCST#400

Closed

dependabot bot mentioned this pull request Mar 11, 2021

Bump parso from 0.7.0 to 0.8.1 qutech/Qcodes#11

Closed
