Support named unicode characters in f-strings #160
Fixes davidhalter#154

The previous behavior misinterpreted the curly braces as enclosing an expression. This change does some cursory validation so we can still get parse errors in the most egregious cases, but does not validate that the names are actually valid, only that they are name-shaped and have a chance of being valid.

The character names appear to obey a few rules:

* Case insensitive
* Name characters are `[A-Z0-9 \-]`
* Whitespace before or after is not allowed
* Whitespace in the middle may only be a single space between words
* Dashes may occur at the start or middle of a word

```py
f"\N{A B}"    # might be legal
f"\N{a b}"    # equivalent to above
f"\N{A  B}"   # no way
f"\N{ A B }"  # no way
f"""\N{A
B}"""         # no way
```

For confirming this regex matches all (current) unicode character names:

```py
import re
import sys
import unicodedata

R = re.compile(r"[A-Za-z0-9\-]+(?: [A-Za-z0-9\-]+)*")

# range() excludes its end, so add 1 to cover sys.maxunicode itself
for i in range(sys.maxunicode + 1):
    try:
        name = unicodedata.name(chr(i))
    except ValueError:
        # Some small values like 0 and 1 have no name, /shrug
        continue
    m = R.fullmatch(name)
    if m is None:
        print("FAIL", repr(name))
```
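To make the difference between "name-shaped" and "actually valid" concrete, here is a minimal sketch that reuses the regex from the description. The helper names `is_name_shaped` and `is_real_name` are hypothetical and not part of the patch. The strict check uses `unicodedata.lookup`, which is one way a caller could validate names after parsing.

```python
import re
import unicodedata

# Same pattern as in the description: words made of letters, digits and
# dashes, separated by exactly one space, with no surrounding whitespace.
NAME_SHAPED = re.compile(r"[A-Za-z0-9\-]+(?: [A-Za-z0-9\-]+)*")


def is_name_shaped(name: str) -> bool:
    """Cheap check: could this text possibly be a Unicode character name?"""
    return NAME_SHAPED.fullmatch(name) is not None


def is_real_name(name: str) -> bool:
    """Strict check: does the Unicode database know this name?"""
    try:
        unicodedata.lookup(name)
    except KeyError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ["A B", "A  B", " A B ", "A\nB", "BULLET"]:
        print(repr(candidate), is_name_shaped(candidate), is_real_name(candidate))
```

Under this split, `A B` passes the shape check without being a real character name, which is the "might be legal" case the parser deliberately lets through.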
This is the same as my pull request davidhalter/parso#160
LGTM. Could you please add a few tests where these unicode escapes are used in the middle and at the end of the f-string? For example, `f"some {stuff} and \N{escape}"`, etc.
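The placements requested here (start, middle, end, and adjacent to a replacement field) can be enumerated as in the sketch below. It uses CPython's own `ast` module as a reference rather than this project's parser, and the helper `count_replacement_fields` is a hypothetical name. Because CPython checks names strictly, the cases use real names such as `BULLET` and `EM DASH` instead of the placeholder `escape`.

```python
import ast


def count_replacement_fields(source: str) -> int:
    """Count the {expression} fields CPython sees in an f-string literal.

    A \\N{...} escape is part of the literal text, so it must not be counted.
    """
    tree = ast.parse(source, mode="eval")
    return sum(isinstance(node, ast.FormattedValue) for node in ast.walk(tree))


# (source, expected number of replacement fields)
CASES = [
    (r'f"\N{BULLET} at the start"', 0),
    (r'f"some \N{EM DASH} in the {middle}"', 1),
    (r'f"some {stuff} and \N{BULLET}"', 1),
    (r'f"\N{BULLET}{stuff}\N{BULLET}"', 1),
]

if __name__ == "__main__":
    for source, expected in CASES:
        found = count_replacement_fields(source)
        print(source, "fields:", found, "expected:", expected)
```

A parser that treats the braces of `\N{...}` as an expression would report an extra field, or a syntax error, for each of these inputs.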
Additional tests in latest push.
Thanks for the patch @thatch! Also loved your work on https://github.com/thatch/python-grammar-changes
* Support named unicode characters in f-strings

  This is the same as my pull request davidhalter/parso#160

* A small bugfix to what is allowed in f-string expressions

  Thanks to davidhalter/parso#159 for catching that yield (as an expression, I suppose) is allowed on 3.6.