Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add boolean "not" query operator #1728

Merged
merged 7 commits into from Nov 20, 2015
Merged

Conversation

diego-plan9
Copy link
Member

A tentative effort for tackling #819, which hopefully will be welcome by users such as myself whose brain decides to forget regular expression syntax more often than not.

It started as a small test on my end (basically, wrap the existing SQL queries with NOT ()) and might have turned into a rough but working prototype for several scenarios, and even though the formal tests are far for complete, user should be able to do things like:

beets ls Dylan title:¬"tired horses" year:¬1980..1990

I'll add some line comments and clarifications within a day, but the main points are:

  • Is this the right direction (prepending SQL with (NOT), negating the match() of the subquery)? I was a bit hesitant at first, but kept adding tests and glancing at the sqlite SQL specs and, well, haven't found large issues yet.
  • NotQuery definition is currently a bit messy: it is a subclass of MutableCollectionQuery for ... historical reasons (ie. it was the quickest way for me to play with a Query that had subqueries). I'm not really sure if there is a valid case where more than one subquery will be needed, and tests have gone in that direction.
  • The syntax is currently field:¬value, as it was the only option (the others being field.not:value and not field:value, unless I missed some comments) that allowed quickly reusing the prefixes mechanism, and avoid introduce extra code for resolving ambiguities (specially in the case of the third option). I'm wondering if a consensus has been reached on what would be the preferred syntax?
  • Tests (and everything in general) are a work in progress, and are mainly geared towards finding out how well NotQuery plays with the different Query subclasses, and comparing it to existing tests on test_query.py - I did not run tests for other modules yet.

* Add support for user friendly "not" operator in queries without requiring to
use regular expressions, via adding NotQuery to beets.dbcore.query.
* Update the prefix list at library.parse_query_parts() and the query parsing
mechanisms in order to create a NotQuery when the prefix is found.
* Add two TestCases, NotQueryMatchTest as the negated version of MatchTest, and
the more generic NotQueryTest for testing the integration of the NotQuery with
the rest of the existing Query subclasses.
@@ -447,6 +447,24 @@ def match(self, item):
return any([q.match(item) for q in self.subqueries])


class NotQuery(MutableCollectionQuery):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As hinted on the main comment, the reason for being a subclass of MutableCollectionQuery is rather weak, and mainly due to laziness. Conceptually, I can't really think of an argument for NotQuery having more than one subquery (and if there is a case for that, one could always use a NotQuery(AndQuery([...])) or similar).

The "only one subquery" is not enforced at the moment, and has some impact on readability (self.clause_with_joiner(joiner) never uses the joiner, self.match() iterates over only one element, constructor receives a list of one element as its first parameter, etc). I'm still revising the implications, but seems like preserving the rest of niceties and special methods of MutableCollectionQuery (or at least CollectionQuery) would be desirable. Any hints on what would be the best option? May just be a case of overriding __init__, accepting only one subquery.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can see why this was easy but might feel awkward. Probably the most straightforward approach is to not inherit from anything special (i.e., make Query itself the superclass). The __init__ method would take a single subquery, and clause and match would work more or less as they do now.

@sampsyo
Copy link
Member

sampsyo commented Nov 17, 2015

Awesome! Thank you for tackling this! You've made a lot of progress already. ✨

On the specific syntax: Personally, I would find a prefix outside of the query term the most intuitive. For example, you would write ¬artist:beatles instead of artist:¬beatles. Here are a couple of reasons this makes more sense to me:

  • Intuitively, you're negating the entire query term—not just the substring. That is, in my head I translate more to "tracks that do not have an artist matching beatles" rather than "tracks whose artist does not match beatles."
  • As you noted above, this makes it easier to negate other query types. Not just regular expressions, but also the fuzzy matches provided by the corresponding plugin and any other query extensions we come up with in the future.
  • If we eventually add parentheses, it will be natural to write ¬(foo , bar) or other more complex combinations.

Also, on the concrete character used: It would be nice to have an alternative when ¬ is hard to type. It's perfectly easy to get on my Mac, of course, but I'm sure it's tricky on Windows or whatever. The natural thing to use would be a minus - sign, but that has clear problems with being mistaken for command-line options. So here's an idea: let's allow either ¬ or -, and then on the command line use the old -- trick to mean "everything after this is an argument, not an option flag."

This will also work well with other interfaces, like the web plugin, where - doesn't pose a problem.

@diego-plan9
Copy link
Member Author

Thanks for the feedback and for sharing your thoughts, in particular with the syntax issue!

As mentioned, the choice was rather arbitrary and meant to be changed indeed. I'd be open to dropping ¬ entirely after revising the keyboard layout differences (I'm using a Spanish keyboard on GNU/Linux, and it's available at altgr+6, but seems to be missing on many variants, most notably the US-non international one); so in practice very few people might end up using it. I'd still love to find a way to avoid having to use the -- trick (even if it shell users have probably encountered it in the past one way or another), but I'm out of ideas about finding a character that fills all the ("universally recognized as negation", "easy to type in all variants", "unambigous", "not conflicting with reserved shell characters") criteria.

* Modify NotQuery so it subclasses Query instead of MutableCollectionQuery.
* Update instances where NotQuery objects are created on tests and queryparse,
as NotQuery expects a single Query as a parameter to the constructor instead of
a list of Queries.
* Update the negation query sintax so that the negation prefix ("-" or "¬") is
always the first part of a query, instead of after the ":" separator.
* Modify queryparse, so that the detection of the negation is done inside
parse_query_part() (using the modified PARSE_QUERY_PART_REGEX, and returning
the negation flag) instead of construct_query_part.
* Revert the prefixes dict on beets.library to the original dict (only one item,
with the RegexpQuery prefix).
* Add tests to NotQueryTest for testing the results of using queries with
negation.
* Fix issue on test_dbcore due to the modifications on the tuple returned by
parse_query_part (the number of elements was changed from 3 to 4).
* Fix issue with an assertion that was order-sensitive and caused a problem on
some tox runs.
@diego-plan9
Copy link
Member Author

Updated the code with the suggested changes:

  • NotQuery now subclasses Query directly, hopefully getting rid of the awkwardness and making it less cumbersome to instantiate.
  • The syntax is now the suggested -foo:bar or ¬foo:bar. I hope the changes are not too intrusive: I was a bit hesitant about modifying PARSE_QUERY_PART_REGEX and the number of elements on the return tuple of parse_query_part(), but in the end went for it as the alternative solutions I could think of seems a bit too cumbersome with no real gain (they would potentially need to access the original query string, discerning between keyed and not-keyed cases, and access to the desired query_class, and make sure to preserve the original behaviour if the query is not negated, if I'm not mistaken).

Also, I added a handful of new tests checking the behaviour of NotQuery when building the queries from strings directly (using both prefixes, keyed and non-keyed terms, mixing negated and non-negated terms, using negated regexps, etc).

@sampsyo
Copy link
Member

sampsyo commented Nov 18, 2015

Awesome! This is looking great. I'll do a full code review now.

On the syntax: here's one other idea for a magic character. The carat, ^, is on most keyboards, not a "special" character for any shell I know, and vaguely suggests negation to me because it's often XOR in programming languages. Does that make any sense to anybody else?

if clause:
return 'not ({0})'.format(clause), subvals
else:
# special case for RegexpQuery, (None, ())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, this is actually the case for any "slow" query. For explicitness, I might suggest putting this in __init__:

self.fast = subquery.fast

And then here:

if self.fast:
    clause, subvals = ...
    return ...
else:
    # A slow query.
    return None, ()

as in the clause method of FieldQuery, for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After playing a bit with the suggestion and inspecting the code more carefully, I'm wondering if the suggested change for explicitness would actually decrease readability? In particular:

  • only the FieldQuery subclasses have the fast attribute, and ideally NotQuery should work fine with any 'Query' (for example, -foo results in a NotQuery(AnyQuery(...))).
  • RegexpQuery is still a bit problematic, as it is the only FieldQuery descendant that does not implement col_clause(). As a result, I can't seem to find a way to avoid the explicit checking for clause != None at 458-462, otherwise NotQuery(RegexpQuery(..., fast=True).clause() ends up returning (u'not (None)', ()), which breaks many test cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point; I forgot that reading the fast flag would assume a FieldQuery.

Let's keep it like this. I suggest a comment something like this:

If there is no clause, there is nothing to negate. All the logic is handled by match() for slow queries.

@sampsyo
Copy link
Member

sampsyo commented Nov 18, 2015

The tests are looking awesome, by the way! Extremely thorough.

* Revise the NotQuery syntax, replacing the '¬' character with '^'. Fix tests
to conform to this change, and cleanup the PARSE_QUERY_PART_REGEX.
* Modify parse_query_part() docstring to mention the negate parameter on the
returned tuple, and added an example.
@diego-plan9
Copy link
Member Author

Thanks for the review! Just pushed some of the changes: the main difference is that now the syntax uses ^ instead of ¬. There are still a couple of issues where more input or guidance would be much appreciated, at comments #1728 (comment) and #1728 (comment).

@@ -447,6 +447,34 @@ def match(self, item):
return any([q.match(item) for q in self.subqueries])


class NotQuery(Query):
"""A query that matches the negation of its `subquery`, as a shorcut for
performing `not(subquery)` without using regular expressions."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny style nit: PEP8 suggests putting the closing """ on its own line (for some reason).

@sampsyo
Copy link
Member

sampsyo commented Nov 19, 2015

Looking great! The only loose ends from here are a changelog entry and documentation on the page about queries. Then, please feel free to merge at will!

Afterward, we can make sure -- works on the command line. We could also consider allowing | as a synonym for the "or" operator that is currently spelled only as , for symmetry with this new operator.

* Add changelog and query.rst documentation entries for the usage of negated
queries.
* Cleanup NotQuery class as suggested during code review (PEP conforming
docstring, clarification on empty clause match behaviour).
@diego-plan9
Copy link
Member Author

Updated the code with the suggested changes, and added an entry to changelog.rst and documentation to query.rst, hopefully getting the pull request closer to being merged! I'd still prefer if you would double-check the documentation changes (as my English can get a bit creative sometimes) instead of merging them myself.

As for the next steps, I'd be up for taking a look at them during next week (via a new pull request, I assume?), seems like a logical follow-up. I did not test the -- issue extensively, but some informal test on the command line seem to suggest that it is working already.

@sampsyo sampsyo merged commit 51bf6a1 into beetbox:master Nov 20, 2015
sampsyo added a commit that referenced this pull request Nov 20, 2015
Add boolean "not" query operator
sampsyo added a commit that referenced this pull request Nov 20, 2015
- Use a marginally more realistic example in the changelog.
- The -- convention is actually not the purview of shells; it's just a de
  facto standard for command-line parsers. Fortunately, argparse supports it
  out of the box.
@sampsyo
Copy link
Member

sampsyo commented Nov 20, 2015

Wonderful! Thank you again---this is an awesome addition! ✨ Much gratitude on behalf of the entire beets community.

I merged with just a few tiny changes to the docs.

@diego-plan9 diego-plan9 deleted the notquery branch October 17, 2016 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants