Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise comma handling on templates #2213

Merged
merged 7 commits into from Oct 4, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
36 changes: 27 additions & 9 deletions beets/util/functemplate.py
Expand Up @@ -311,33 +311,51 @@ class Parser(object):
replaced with a real, accepted parsing technique (PEG, parser
generator, etc.).
"""
def __init__(self, string):
def __init__(self, string, in_argument=False):
""" Create a new parser.
:param in_arguments: boolean that indicates the parser is to be
used for parsing function arguments, ie. considering commas
(`ARG_SEP`) a special character
"""
self.string = string
self.in_argument = in_argument
self.pos = 0
self.parts = []

# Common parsing resources.
special_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_OPEN, GROUP_CLOSE,
ARG_SEP, ESCAPE_CHAR)
ESCAPE_CHAR)
special_char_re = re.compile(r'[%s]|$' %
u''.join(re.escape(c) for c in special_chars))
escapable_chars = (SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left ARG_SEP deliberately in the list of escapable chars in order to allow both escaped and unescaped commands outside a function (-f 'foo, $bar' would work exactly the same as -f 'foo$, $bar').

I'm undecided about it, though: on one hand, it keeps the syntax backwards compatible-ish, but on the other hand introduces some ambiguity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree here—I think it's more predictable to make $, work the same way—as an escape—both inside and outside of function arguments.

terminator_chars = (GROUP_CLOSE,)

def parse_expression(self):
"""Parse a template expression starting at ``pos``. Resulting
components (Unicode strings, Symbols, and Calls) are added to
the ``parts`` field, a list. The ``pos`` field is updated to be
the next character after the expression.
"""
# Append comma (ARG_SEP) to the list of special characters only when
# parsing function arguments.
extra_special_chars = ()
special_char_re = self.special_char_re
if self.in_argument:
extra_special_chars = (ARG_SEP,)
special_char_re = re.compile(
r'[%s]|$' % u''.join(re.escape(c) for c in
self.special_chars + extra_special_chars))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take 2, using an in_argument flag! This block introduces an amount of extra-cruft on parse_expression, but I wasn't too comfortable directly overwriting the class variables with instance variables on the constructor, ie:

def __init__(self, string, in_argument=False):
    ...
    if in_argument:
        self.special_chars = ...
        self.special_char_re = ...

It might be quite a minor concern, as the Parsers that are used for list arguments only get a single call to parse_expression (and none to the other methods) in practice, so I'd be up for another refactoring if you think the trade-off makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; this looks great! The small bit of extra cruft is a little bit annoying, but I agree this is the right direction. It's preferable, as you note, to overriding those class variables.


text_parts = []

while self.pos < len(self.string):
char = self.string[self.pos]

if char not in self.special_chars:
if char not in self.special_chars + extra_special_chars:
# A non-special character. Skip to the next special
# character, treating the interstice as literal text.
next_pos = (
self.special_char_re.search(
special_char_re.search(
self.string[self.pos:]).start() + self.pos
)
text_parts.append(self.string[self.pos:next_pos])
Expand All @@ -348,14 +366,14 @@ def parse_expression(self):
# The last character can never begin a structure, so we
# just interpret it as a literal character (unless it
# terminates the expression, as with , and }).
if char not in (GROUP_CLOSE, ARG_SEP):
if char not in self.terminator_chars + extra_special_chars:
text_parts.append(char)
self.pos += 1
break

next_char = self.string[self.pos + 1]
if char == ESCAPE_CHAR and next_char in \
(SYMBOL_DELIM, FUNC_DELIM, GROUP_CLOSE, ARG_SEP):
if char == ESCAPE_CHAR and next_char in (self.escapable_chars +
extra_special_chars):
# An escaped special character ($$, $}, etc.). Note that
# ${ is not an escape sequence: this is ambiguous with
# the start of a symbol and it's not necessary (just
Expand All @@ -375,7 +393,7 @@ def parse_expression(self):
elif char == FUNC_DELIM:
# Parse a function call.
self.parse_call()
elif char in (GROUP_CLOSE, ARG_SEP):
elif char in self.terminator_chars + extra_special_chars:
# Template terminated.
break
elif char == GROUP_OPEN:
Expand Down Expand Up @@ -483,7 +501,7 @@ def parse_argument_list(self):
expressions = []

while self.pos < len(self.string):
subparser = Parser(self.string[self.pos:])
subparser = Parser(self.string[self.pos:], in_argument=True)
subparser.parse_expression()

# Extract and advance past the parsed expression.
Expand Down
2 changes: 2 additions & 0 deletions docs/changelog.rst
Expand Up @@ -45,6 +45,8 @@ The are a couple of small new features:
when a song can be found on AcousticBrainz, this is faster and more
automatic than using the :doc:`/plugins/bpm`.
* ``beet --version`` now includes the python version used to run beets.
* :doc:`/reference/pathformat` can now include unescaped commas (``,``) when
they are not part of a function call. :bug:`2166` :bug:`2213`

And there are a few bug fixes too:

Expand Down
20 changes: 13 additions & 7 deletions docs/reference/pathformat.rst
Expand Up @@ -76,12 +76,12 @@ These functions are built in to beets:
* ``%time{date_time,format}``: Return the date and time in any format accepted
by `strftime`_. For example, to get the year some music was added to your
library, use ``%time{$added,%Y}``.
* ``%first{text}``: Returns the first item, separated by ``; ``.
* ``%first{text}``: Returns the first item, separated by ``;`` (a semicolon
followed by a space).
You can use ``%first{text,count,skip}``, where ``count`` is the number of
items (default 1) and ``skip`` is number to skip (default 0). You can also use
``%first{text,count,skip,sep,join}`` where ``sep`` is the separator, like
``;`` or ``/`` and join is the text to concatenate the items.
For example,
* ``%ifdef{field}``, ``%ifdef{field,truetext}`` or
``%ifdef{field,truetext,falsetext}``: If ``field`` exists, then return
``truetext`` or ``field`` (default). Otherwise, returns ``falsetext``.
Expand Down Expand Up @@ -142,11 +142,17 @@ Syntax Details
The characters ``$``, ``%``, ``{``, ``}``, and ``,`` are "special" in the path
template syntax. This means that, for example, if you want a ``%`` character to
appear in your paths, you'll need to be careful that you don't accidentally
write a function call. To escape any of these characters (except ``{``), prefix
it with a ``$``. For example, ``$$`` becomes ``$``; ``$%`` becomes ``%``, etc.
The only exception is ``${``, which is ambiguous with the variable reference
syntax (like ``${title}``). To insert a ``{`` alone, it's always sufficient to
just type ``{``.
write a function call. To escape any of these characters (except ``{``, and
``,`` outside a function argument), prefix it with a ``$``. For example,
``$$`` becomes ``$``; ``$%`` becomes ``%``, etc. The only exceptions are:

* ``${``, which is ambiguous with the variable reference syntax (like
``${title}``). To insert a ``{`` alone, it's always sufficient to just type
``{``.
* commas are used as argument separators in function calls. Inside of a
function's argument, use ``$,`` to get a literal ``,`` character. Outside of
any function argument, escaping is not necessary: ``,`` by itself will
produce ``,`` in the output.

If a value or function is undefined, the syntax is simply left unreplaced. For
example, if you write ``$foo`` in a path template, this will yield ``$foo`` in
Expand Down
16 changes: 16 additions & 0 deletions test/test_template.py
Expand Up @@ -211,6 +211,22 @@ def test_nested_call_with_argument(self):
self._assert_call(arg_parts[0], u"bar", 1)
self.assertEqual(list(_normexpr(arg_parts[0].args[0])), [u'baz'])

def test_sep_before_call_two_args(self):
parts = list(_normparse(u'hello, %foo{bar,baz}'))
self.assertEqual(len(parts), 2)
self.assertEqual(parts[0], u'hello, ')
self._assert_call(parts[1], u"foo", 2)
self.assertEqual(list(_normexpr(parts[1].args[0])), [u'bar'])
self.assertEqual(list(_normexpr(parts[1].args[1])), [u'baz'])

def test_sep_with_symbols(self):
parts = list(_normparse(u'hello,$foo,$bar'))
self.assertEqual(len(parts), 4)
self.assertEqual(parts[0], u'hello,')
self._assert_symbol(parts[1], u"foo")
self.assertEqual(parts[2], u',')
self._assert_symbol(parts[3], u"bar")


class EvalTest(unittest.TestCase):
def _eval(self, template):
Expand Down