New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revise comma handling on templates #2213
Revise comma handling on templates #2213
Conversation
Add unit tests for the use of the separator special character (comma) outside a function argument.
Remove ARG_SEP from Parser.special_chars, and promote some groups of characters used in parse_expression to class variables. ARG_SEP is still considered an "escapable" character, pending a decision on whether both escaped ('$,') and unescaped (',') syntax would be allowed.
Add ArgumentParser as a subclass of Parser that considers ARG_SEP a special character (ie. always needs escaping, terminates a block); and use it for parsing the substring that contains the list of arguments at parse_argument_list().
special_char_re = re.compile(r'[%s]|$' % | ||
u''.join(re.escape(c) for c in special_chars)) | ||
escapable_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left ARG_SEP
deliberately in the list of escapable chars in order to allow both escaped and unescaped commands outside a function (-f 'foo, $bar'
would work exactly the same as -f 'foo$, $bar'
).
I'm undecided about it, though: on one hand, it keeps the syntax backwards compatible-ish, but on the other hand introduces some ambiguity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree here—I think it's more predictable to make $, work the same way—as an escape—both inside and outside of function arguments.
special_char_re = re.compile(r'[%s]|$' % | ||
u''.join(re.escape(c) for c in special_chars)) | ||
escapable_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP) | ||
terminator_chars = (GROUP_CLOSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want (GROUP_CLOSE,)
here to make it a tuple. Otherwise this is just a one-character string, which happens to work in this case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that was the idea - thanks for catching it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I have a few small suggestions; and we should also add a changelog entry.
@@ -512,6 +513,18 @@ def _parse_ident(self): | |||
return ident | |||
|
|||
|
|||
class ArgumentsParser(Parser): | |||
"""``Parser`` that considers ``ARG_SEP`` to be a special character. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a more descriptive docstring would say "a parser used inside function arguments," i.e., state the purpose first before getting into the implementation details.
special_char_re = re.compile(r'[%s]|$' % | ||
u''.join(re.escape(c) for c in special_chars)) | ||
escapable_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP) | ||
terminator_chars = (GROUP_CLOSE, ARG_SEP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this subclassing approach, but I'm a little sad that it required us to copy n' paste the definitions from the base class. This seems like it could get us into trouble if we eventually change one set of lists and forget to update the other one.
It's a tough call, but maybe just an in_argument
flag on Parser
would be simpler?
Use a `in_argument` flag on Parser constructor for specifying if the parser should treat commas as a special character, including the logic in parse_expression.
extra_special_chars = (ARG_SEP,) | ||
special_char_re = re.compile( | ||
r'[%s]|$' % u''.join(re.escape(c) for c in | ||
self.special_chars + extra_special_chars)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take 2, using an in_argument
flag! This block introduces an amount of extra-cruft on parse_expression
, but I wasn't too comfortable directly overwriting the class variables with instance variables on the constructor, ie:
def __init__(self, string, in_argument=False):
...
if in_argument:
self.special_chars = ...
self.special_char_re = ...
It might be quite a minor concern, as the Parsers that are used for list arguments only get a single call to parse_expression (and none to the other methods) in practice, so I'd be up for another refactoring if you think the trade-off makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; this looks great! The small bit of extra cruft is a little bit annoying, but I agree this is the right direction. It's preferable, as you note, to overriding those class variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! And quick work too. Thank you for tackling that issue. 🎉 ✨
extra_special_chars = (ARG_SEP,) | ||
special_char_re = re.compile( | ||
r'[%s]|$' % u''.join(re.escape(c) for c in | ||
self.special_chars + extra_special_chars)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; this looks great! The small bit of extra cruft is a little bit annoying, but I agree this is the right direction. It's preferable, as you note, to overriding those class variables.
Thanks for the quick review, it's my pleasure to squash bugs! I'm wondering if I could trouble you to revise the Path Formats\Syntax Details documentation before merging, as it will probably result in a clearer explanation of the situation? |
I'd be happy to, but I don't think I can push to your fork to update this PR? I can also just add the docs post-merge, but I'd say something like this:
|
Fix a formatting problem related to sphinx not allowing spaces at the beginning or end of an inline literal, and removed an extra sentence at th end of the %first template function documentation.
Hmmm, I think it should be possible since the recent-ish github changes (I have an "Allow edit from mantainers" box checked right after the "Lock conversation" and was prompted when creating the pull request), but honestly haven't tried out that feature. Nevertheless, I adjusted the documentation based on your suggestion, plus included a fix for a formatting issue one the
|
Aha; thanks for fixing that! And for committing the docs. I honestly can't quite figure out what's going on with the "allow edit from maintainers" checkbox—AFAICT, I still don't have ordinary push permissions to your repository, but maybe there's some kind of invisible exception that requires some incantation I don't know? Anyway, this looks great! Please merge whenever you like. |
Thanks! As for the permissions issue, it seems that they are "per-branch" (in this case, just for the Merging! |
Yay! Thanks again. ✨ I'll look into that new feature more closely in future PRs too. |
A first take for treating commas (
ARG_SEP
) as a regular character during the template parsing, except when it is used inside a function's argument (ie.%foo{bar,baz}
), as discussed on #2166.I have added a couple of tests checking for the usage of commas before other elements, as it seems that the problem was that the parser stopped processing the rest of the string whenever an unescaped comma was found. The test suite already includes other tests that deal with separators, (
a $, b
,a , b
,%foo{bar,baz}
,%foo{bar$,baz}
, ...) but I'd be happy to add more if needed.The solution basically splits the handling of the special characters (
special_chars,
special_char_re
, and the newly promotedescapable_chars
andterminator_chars
) in two cases: one for regular parsing that does not treats commas in a special way, and one just forparse_argument_list()
which does consider commas a special character (ie. in the same way it has been until these changes). I opted for subclassing, albeit I'm not sure it's entirely justified - it could easily be moved to a lower abstraction level, ie.__init__
orparse_expression
if preferred.