Fix incorrect index selection when sort specified #866

willholley · 2017-10-05T12:26:48Z

Overview

Mango has an apparently long-standing bug whereby if an index is deemed valid for sorting a query but is not valid according to mango_idx:is_usable, the query planner would fall back to _all_docs silently, therefore ignoring the sort order specified in the query.

This reverses the index selection logic so that the list of usable indexes is generated prior to filtering them based on the sort order specified in the query.

Similar logic is applied to use_index, allowing us to generate a more specific error message when a user specifies an index which isn't valid for the current
selector.

This fix exposes that the change introduced in #816 may cause existing queries with sort fields to fail (indeed, several queries in our test suite needed fixing), so we need to consider whether this is a severe enough breaking change to warrant a major version bump.

Testing recommendations

Run the test suite. Test that you can run Mango queries with sort criteria.

Related Pull Requests

#816

Checklist

Code is written and works correctly;
Changes are covered by tests;
Documentation reflects the changes;

garrensmith

I've tested and it looks good to me

garrensmith · 2017-10-05T13:36:22Z

src/mango/src/mango_idx.erl

-    % If a sort was specified we have to find an index that
-    % can satisfy the request.
+has_use_index(Opts) ->
+    erlang:display(<<"sort opts">>),


garrensmith · 2017-10-05T13:36:51Z

src/mango/test/02-basic-find-test.py

@@ -222,6 +222,20 @@ def test_sort(self):
        docs2 = list(reversed(sorted(docs1, key=lambda d: d["age"])))
        assert docs1 is not docs2 and docs1 == docs2

+    def test_sort_desc_complex(self):


Could you add a test without the manager: $exists: true and make sure it returns an error

garrensmith

LGTM

tonysun83 · 2017-10-05T18:38:22Z

src/mango/src/mango_idx.erl

+
+    UsableIndexes1 = case has_use_index(Opts) of
+        true ->
+            UsableFilteredIndexes = mango_cursor:maybe_filter_indexes_by_ddoc(UsableIndexes0, Opts),


maybe this can be written as:

case mango_cursor:maybe_filter_indexes_by_ddoc(UsableIndexes0, Opts) of [] -> ?MANGO_ERROR({no_usable_index, no_usable_index_matching_name}); UsableFilteredIndexes -> UsableFilteredIndexes end;

tonysun83 · 2017-10-05T18:40:50Z

src/mango/src/mango_idx.erl

+    % can satisfy the request.
+    case has_sort(Opts) of
+        true ->
+            SortIndexes = mango_idx:for_sort(UsableIndexes1, Opts),


similar stylist change like above:

case mango_idx:for_sort(UsableIndexes1, Opts) of [] -> ?MANGO_ERROR({no_usable_index, missing_sort_index}); SortIndexes -> SortIndexes end;

tonysun83 · 2017-10-05T19:13:42Z

This is a tricky spot. I think the situation where the user would complain about backwards compatibility is as follows:

User had a sort specified on one field a
User had an index created for fields : [a, b]
User Searched on a and incorrectly got back documents which only included docs with both a and b.
User gets a list of sorted docs returned, but missing some documents. User doesn't complain because the docs look sorted.

In the scenario where we ignore sort order and used _all_docs, I would assume the user would have already complained that their documents are not sorted.

Perhaps we should improve the error messages returned?

willholley · 2017-10-06T08:51:08Z

@tonysun83 this PR should improve the error message in the case where a user specifies an index with use_index that is no longer deemed valid. We should have a corresponding documentation PR which explains the error messages and how to address them.

willholley · 2017-10-06T09:30:12Z

@tonysun83 If a sort is specified, perhaps it's reasonable to assume that the user expects the field to exist in all documents that are returned. To preserve existing behaviour, we could implicitly add an {"$exists": true} predicate for each sort field to the selector.

This would allow us to safely validate the selector/index combination against is_usable whilst preserving existing behaviour as much as possible.

tonysun83 · 2017-10-06T14:48:33Z

@willholley : I considered appending {"$exists":true} for backwards compatibility. We've done it in the past when we appended {"$gt": null} to do the _all_docs trick for users. If we do this though, users would not see an error message, but they could still be getting incorrect results back as we're forcing the predicate fields to exist. I.e

Index: [a, b, c]
Query: {"c": "foo"}
Sort:[a, b]
New modified becomes: {"c": "foo", {"a":{"$exists": true}}, {"b":{"$exists": true}}}

It's a tradeoff: let their apps continue to run with possible incorrect data or let it break and force them to possibly re-index and modify their app. I'd like more opinions @garrensmith @davisp ?

garrensmith · 2017-10-10T13:27:34Z

Adding $exists: true is a nice idea. +1

davisp

Overall the change looks good. I can't figure out norm_deduplicate though. I assume that's to try and deduplicate the $exists check which a) I'm not sure is a requirement, and b) seems to be unused near as I can tell.

Also there's a bug in has_use_index I noted below.

davisp · 2017-10-10T18:03:47Z

src/mango/src/mango_idx.erl

-    % can satisfy the request.
+has_use_index(Opts) ->
+    case lists:keyfind(sort, 1, Opts) of
+        {use_index, _} ->


This check will never return true as sort is not equal to use_index.

Which means we're likely missing a test for this since travis returned green.

There is a test but this situation is handled elsewhere. I'll remove has_use_index and its usage.

davisp · 2017-10-10T18:06:23Z

src/mango/src/mango_selector.erl

@@ -398,6 +400,14 @@ negate({[{Field, Cond}]}) ->
    {[{Field, negate(Cond)}]}.


+norm_deduplicate({[{Op, [H|T]}]}) ->


I'm not seeing where this is called.

It's a step in the selector normalization - (line 40 above). The intention is to deduplicate predicates (particularly "$exists": true) but this isn't necessary for correctness - it just makes the normalized selector cleaner. Thinking about it again, I'll remove it here and will consider improving selector normalization in a different PR.

davisp · 2017-10-10T18:07:36Z

src/mango/src/mango_selector.erl

@@ -398,6 +400,14 @@ negate({[{Field, Cond}]}) ->
    {[{Field, negate(Cond)}]}.


+norm_deduplicate({[{Op, [H|T]}]}) ->
+    Arg = [H | [X || X <- norm_deduplicate(T), X /= H]],


This really hurts my brain to read. And not knowing what's supposed to call it I can't figure out if its correct.

davisp · 2017-10-10T18:47:20Z

Couple things to unpack in the discussion:

First, for #816 I think that's actually the opposite, its not whether it needs a new version bump but how many versions should be be back porting this fix to. A database that gives bad answers to queries is more a bug that needs to be patched backwards, not something we cut the cord on and just bump version numbers.

There is of course the issue around notifying users that a pending change may break their application (technically fix, but users have a different perspective :), so this is definitely something we should talk about. We'll also want to update the documentation as we've long said that using column prefixes was a suggestion for sharing indexes so that'll need to be fixed.

For @tonysun83's example of this changing a query from something that uses _all_docs to something that uses an existing index I don't think that's an issue. Originally I would have ixnayed it because of the lack of range restriction on A and B but we've already relaxed that by having an _all_docs default. Also the behavior here is different than the issue introduced by #816 in that previously Tony's example would have been an incorrect sort returned where the new version changes that (ie, same set of docs would be returned both places, just now its sorted). Which I think is quite a bit different than subtley different sets of documents returned. Also, if a user specifies a sort and isn't detecting the wrong sort order based on the _all_docs fall back then I'm failing to see how they'd be relying on that sort, and thus fixing it seems like not an issue.

For #816 with the _all_docs fallback is that also not more of a concern about performance rather than an app that stops working? Ie, all queries will continue to work and be "fixed" but some unknown number of queries will suddenly switch to using _all_docs and become much slower? Granted suddenly slower performance that lasts until an index can be built isn't nothing either.

willholley · 2017-10-12T10:03:58Z

I've pushed a new fix which doesn't require rewriting the selector. Instead, we just include sort fields when considering whether a JSON index provides coverage for the selector. This seems a lot simpler and, unlike the previous implementation, doesn't impact the validity of other index types (i.e. text indexes).

davisp

Looks good other than I think there's some dead code left over.

davisp · 2017-10-12T14:43:36Z

src/mango/src/mango_selector.erl

@@ -16,7 +16,8 @@
 -export([
    normalize/1,
    match/2,
-    has_required_fields/2
+    has_required_fields/2,
+    append_sort_predicate/2


This appears to be left over from the previous commits?

Mango has an apparently long-standing bug whereby if an index is deemed valid for sorting a query but is not valid according to mango_idx:is_usable, the query planner would fall back to _all_docs silently, therefore ignoring the sort order specified in the query. This reverses the index selection logic so that the list of usable indexes is generated prior to filtering them based on the sort order specified in the query. Similar logic is applied to use_index, allowing us to generate a more specific error message when a user specifies an index which isn't valid for the current selector.

It seems safe to assume that if a user specifies that results should be sorted by a field, that field needs to exist (but could be null) in the results returned. In #816 we can only use an index if all its columns are required to exist in the selector, so this improves compatibility with the old behaviour by allowing sort fields to be included in the index coverage check for JSON indexes.

This reverts commit d0445f5.

garrensmith approved these changes Oct 5, 2017

View reviewed changes

willholley force-pushed the mango_fix_sort_index_selection branch from 891ab6d to 5cf449a Compare October 5, 2017 14:02

garrensmith approved these changes Oct 5, 2017

View reviewed changes

tonysun83 reviewed Oct 5, 2017

View reviewed changes

willholley force-pushed the mango_fix_sort_index_selection branch from 44c78a9 to 0146ced Compare October 9, 2017 12:13

willholley force-pushed the mango_fix_sort_index_selection branch from 0146ced to e411886 Compare October 10, 2017 14:01

davisp requested changes Oct 10, 2017

View reviewed changes

willholley force-pushed the mango_fix_sort_index_selection branch 2 times, most recently from b494c44 to a06115d Compare October 12, 2017 10:00

willholley force-pushed the mango_fix_sort_index_selection branch 2 times, most recently from 595a234 to 2acaf40 Compare October 12, 2017 10:29

davisp requested changes Oct 12, 2017

View reviewed changes

willholley added 2 commits October 12, 2017 16:28

willholley force-pushed the mango_fix_sort_index_selection branch from 2acaf40 to 235b68a Compare October 12, 2017 15:28

willholley merged commit d0445f5 into apache:master Oct 12, 2017

iilyak added a commit to cloudant/couchdb that referenced this pull request Nov 6, 2017

Revert "Fix incorrect index selection when sort specified (apache#866)"

483d193

This reverts commit d0445f5.

iilyak added a commit to cloudant/couchdb that referenced this pull request Nov 13, 2017

Revert "Fix incorrect index selection when sort specified (apache#866)"

dc89343

This reverts commit d0445f5.

wohali mentioned this pull request May 7, 2018

Different behavior "sorting" in the couchdb in version 2.1.0 and 2.1.1 #1315

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incorrect index selection when sort specified #866

Fix incorrect index selection when sort specified #866

willholley commented Oct 5, 2017 •

edited

Loading

garrensmith left a comment

garrensmith Oct 5, 2017

garrensmith Oct 5, 2017

garrensmith left a comment

tonysun83 Oct 5, 2017

tonysun83 Oct 5, 2017

tonysun83 commented Oct 5, 2017 •

edited

Loading

willholley commented Oct 6, 2017

willholley commented Oct 6, 2017

tonysun83 commented Oct 6, 2017 •

edited

Loading

garrensmith commented Oct 10, 2017

davisp left a comment

davisp Oct 10, 2017

davisp Oct 10, 2017

willholley Oct 11, 2017

davisp Oct 10, 2017

willholley Oct 11, 2017

davisp Oct 10, 2017

davisp commented Oct 10, 2017

willholley commented Oct 12, 2017

davisp left a comment

davisp Oct 12, 2017

		@@ -398,6 +400,14 @@ negate({[{Field, Cond}]}) ->
		{[{Field, negate(Cond)}]}.


		norm_deduplicate({[{Op, [H\|T]}]}) ->

Fix incorrect index selection when sort specified #866

Fix incorrect index selection when sort specified #866

Conversation

willholley commented Oct 5, 2017 • edited Loading

Overview

Testing recommendations

Related Pull Requests

Checklist

garrensmith left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

garrensmith left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tonysun83 commented Oct 5, 2017 • edited Loading

willholley commented Oct 6, 2017

willholley commented Oct 6, 2017

tonysun83 commented Oct 6, 2017 • edited Loading

garrensmith commented Oct 10, 2017

davisp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davisp commented Oct 10, 2017

willholley commented Oct 12, 2017

davisp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

willholley commented Oct 5, 2017 •

edited

Loading

tonysun83 commented Oct 5, 2017 •

edited

Loading

tonysun83 commented Oct 6, 2017 •

edited

Loading