Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise comma handling on templates #2213

Merged
merged 7 commits into from Oct 4, 2016

Conversation

diego-plan9
Copy link
Member

@diego-plan9 diego-plan9 commented Oct 2, 2016

A first take for treating commas (ARG_SEP) as a regular character during the template parsing, except when it is used inside a function's argument (ie. %foo{bar,baz}), as discussed on #2166.

I have added a couple of tests checking for the usage of commas before other elements, as it seems that the problem was that the parser stopped processing the rest of the string whenever an unescaped comma was found. The test suite already includes other tests that deal with separators, (a $, b, a , b, %foo{bar,baz}, %foo{bar$,baz}, ...) but I'd be happy to add more if needed.

The solution basically splits the handling of the special characters (special_chars, special_char_re, and the newly promoted escapable_chars and terminator_chars) in two cases: one for regular parsing that does not treats commas in a special way, and one just for parse_argument_list() which does consider commas a special character (ie. in the same way it has been until these changes). I opted for subclassing, albeit I'm not sure it's entirely justified - it could easily be moved to a lower abstraction level, ie. __init__ or parse_expression if preferred.

Add unit tests for the use of the separator special character (comma)
outside a function argument.
Remove ARG_SEP from Parser.special_chars, and promote some groups of
characters used in parse_expression to class variables.
ARG_SEP is still considered an "escapable" character, pending a decision
on whether both escaped ('$,') and unescaped (',') syntax would be
allowed.
Add ArgumentParser as a subclass of Parser that considers ARG_SEP a
special character (ie. always needs escaping, terminates a block); and
use it for parsing the substring that contains the list of arguments at
parse_argument_list().
special_char_re = re.compile(r'[%s]|$' %
u''.join(re.escape(c) for c in special_chars))
escapable_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left ARG_SEP deliberately in the list of escapable chars in order to allow both escaped and unescaped commands outside a function (-f 'foo, $bar' would work exactly the same as -f 'foo$, $bar').

I'm undecided about it, though: on one hand, it keeps the syntax backwards compatible-ish, but on the other hand introduces some ambiguity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree here—I think it's more predictable to make $, work the same way—as an escape—both inside and outside of function arguments.

@diego-plan9 diego-plan9 changed the title Revise comma handling on templates (#2166) Revise comma handling on templates Oct 2, 2016
special_char_re = re.compile(r'[%s]|$' %
u''.join(re.escape(c) for c in special_chars))
escapable_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP)
terminator_chars = (GROUP_CLOSE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you want (GROUP_CLOSE,) here to make it a tuple. Otherwise this is just a one-character string, which happens to work in this case!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that was the idea - thanks for catching it!

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I have a few small suggestions; and we should also add a changelog entry.

@@ -512,6 +513,18 @@ def _parse_ident(self):
return ident


class ArgumentsParser(Parser):
"""``Parser`` that considers ``ARG_SEP`` to be a special character.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a more descriptive docstring would say "a parser used inside function arguments," i.e., state the purpose first before getting into the implementation details.

special_char_re = re.compile(r'[%s]|$' %
u''.join(re.escape(c) for c in special_chars))
escapable_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP)
terminator_chars = (GROUP_CLOSE, ARG_SEP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this subclassing approach, but I'm a little sad that it required us to copy n' paste the definitions from the base class. This seems like it could get us into trouble if we eventually change one set of lists and forget to update the other one.

It's a tough call, but maybe just an in_argument flag on Parser would be simpler?

Use a `in_argument` flag on Parser constructor for specifying if the
parser should treat commas as a special character, including the logic
in parse_expression.
extra_special_chars = (ARG_SEP,)
special_char_re = re.compile(
r'[%s]|$' % u''.join(re.escape(c) for c in
self.special_chars + extra_special_chars))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take 2, using an in_argument flag! This block introduces an amount of extra-cruft on parse_expression, but I wasn't too comfortable directly overwriting the class variables with instance variables on the constructor, ie:

def __init__(self, string, in_argument=False):
    ...
    if in_argument:
        self.special_chars = ...
        self.special_char_re = ...

It might be quite a minor concern, as the Parsers that are used for list arguments only get a single call to parse_expression (and none to the other methods) in practice, so I'd be up for another refactoring if you think the trade-off makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; this looks great! The small bit of extra cruft is a little bit annoying, but I agree this is the right direction. It's preferable, as you note, to overriding those class variables.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! And quick work too. Thank you for tackling that issue. 🎉 ✨

extra_special_chars = (ARG_SEP,)
special_char_re = re.compile(
r'[%s]|$' % u''.join(re.escape(c) for c in
self.special_chars + extra_special_chars))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; this looks great! The small bit of extra cruft is a little bit annoying, but I agree this is the right direction. It's preferable, as you note, to overriding those class variables.

@diego-plan9
Copy link
Member Author

Thanks for the quick review, it's my pleasure to squash bugs! I'm wondering if I could trouble you to revise the Path Formats\Syntax Details documentation before merging, as it will probably result in a clearer explanation of the situation?

@sampsyo
Copy link
Member

sampsyo commented Oct 4, 2016

I'd be happy to, but I don't think I can push to your fork to update this PR? I can also just add the docs post-merge, but I'd say something like this:

Commas are used as argument separators in function calls. Inside of a function's argument, use $, to get a literal , character. Outside of any function argument, escaping is not necessary: , by itself will produce , in the output.

Fix a formatting problem related to sphinx not allowing spaces at the
beginning or end of an inline literal, and removed an extra sentence
at th end of the %first template function documentation.
@diego-plan9
Copy link
Member Author

I'd be happy to, but I don't think I can push to your fork to update this PR?

Hmmm, I think it should be possible since the recent-ish github changes (I have an "Allow edit from mantainers" box checked right after the "Lock conversation" and was prompted when creating the pull request), but honestly haven't tried out that feature.

Nevertheless, I adjusted the documentation based on your suggestion, plus included a fix for a formatting issue one the %first{} section that I noticed while revising generated html file that seems to be caused by a limitation on reST inline markup:

content may not start or end with whitespace: * text* is wrong,

@sampsyo
Copy link
Member

sampsyo commented Oct 4, 2016

Aha; thanks for fixing that!

And for committing the docs. I honestly can't quite figure out what's going on with the "allow edit from maintainers" checkbox—AFAICT, I still don't have ordinary push permissions to your repository, but maybe there's some kind of invisible exception that requires some incantation I don't know?

Anyway, this looks great! Please merge whenever you like.

@diego-plan9
Copy link
Member Author

Thanks! As for the permissions issue, it seems that they are "per-branch" (in this case, just for the template-comma-behaviour branch), so there might be some magic involved - or alternatively, I might have missed something on my end, which is always an option! I'll look into it for future pull requests.

Merging!

@sampsyo
Copy link
Member

sampsyo commented Oct 4, 2016

Yay! Thanks again. ✨

I'll look into that new feature more closely in future PRs too.

@diego-plan9 diego-plan9 deleted the template-comma-behaviour branch October 6, 2016 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants