-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix incorrect index selection when sort specified #866
Fix incorrect index selection when sort specified #866
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested and it looks good to me
src/mango/src/mango_idx.erl
Outdated
% If a sort was specified we have to find an index that | ||
% can satisfy the request. | ||
has_use_index(Opts) -> | ||
erlang:display(<<"sort opts">>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debugging
@@ -222,6 +222,20 @@ def test_sort(self): | |||
docs2 = list(reversed(sorted(docs1, key=lambda d: d["age"]))) | |||
assert docs1 is not docs2 and docs1 == docs2 | |||
|
|||
def test_sort_desc_complex(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test without the manager: $exists: true
and make sure it returns an error
891ab6d
to
5cf449a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/mango/src/mango_idx.erl
Outdated
|
||
UsableIndexes1 = case has_use_index(Opts) of | ||
true -> | ||
UsableFilteredIndexes = mango_cursor:maybe_filter_indexes_by_ddoc(UsableIndexes0, Opts), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this can be written as:
case mango_cursor:maybe_filter_indexes_by_ddoc(UsableIndexes0, Opts) of
[] -> ?MANGO_ERROR({no_usable_index, no_usable_index_matching_name});
UsableFilteredIndexes -> UsableFilteredIndexes
end;
src/mango/src/mango_idx.erl
Outdated
% can satisfy the request. | ||
case has_sort(Opts) of | ||
true -> | ||
SortIndexes = mango_idx:for_sort(UsableIndexes1, Opts), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar stylist change like above:
case mango_idx:for_sort(UsableIndexes1, Opts) of
[] -> ?MANGO_ERROR({no_usable_index, missing_sort_index});
SortIndexes -> SortIndexes
end;
This is a tricky spot. I think the situation where the user would complain about backwards compatibility is as follows:
In the scenario where we ignore sort order and used _all_docs, I would assume the user would have already complained that their documents are not sorted. Perhaps we should improve the error messages returned? |
@tonysun83 this PR should improve the error message in the case where a user specifies an index with |
@tonysun83 If a This would allow us to safely validate the selector/index combination against |
@willholley : I considered appending Index: [a, b, c] It's a tradeoff: let their apps continue to run with possible incorrect data or let it break and force them to possibly re-index and modify their app. I'd like more opinions @garrensmith @davisp ? |
44c78a9
to
0146ced
Compare
Adding $exists: true is a nice idea. +1 |
0146ced
to
e411886
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the change looks good. I can't figure out norm_deduplicate though. I assume that's to try and deduplicate the $exists check which a) I'm not sure is a requirement, and b) seems to be unused near as I can tell.
Also there's a bug in has_use_index I noted below.
src/mango/src/mango_idx.erl
Outdated
% can satisfy the request. | ||
has_use_index(Opts) -> | ||
case lists:keyfind(sort, 1, Opts) of | ||
{use_index, _} -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check will never return true as sort is not equal to use_index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which means we're likely missing a test for this since travis returned green.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test but this situation is handled elsewhere. I'll remove has_use_index
and its usage.
src/mango/src/mango_selector.erl
Outdated
@@ -398,6 +400,14 @@ negate({[{Field, Cond}]}) -> | |||
{[{Field, negate(Cond)}]}. | |||
|
|||
|
|||
norm_deduplicate({[{Op, [H|T]}]}) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not seeing where this is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a step in the selector normalization - (line 40 above). The intention is to deduplicate predicates (particularly "$exists": true
) but this isn't necessary for correctness - it just makes the normalized selector cleaner. Thinking about it again, I'll remove it here and will consider improving selector normalization in a different PR.
src/mango/src/mango_selector.erl
Outdated
@@ -398,6 +400,14 @@ negate({[{Field, Cond}]}) -> | |||
{[{Field, negate(Cond)}]}. | |||
|
|||
|
|||
norm_deduplicate({[{Op, [H|T]}]}) -> | |||
Arg = [H | [X || X <- norm_deduplicate(T), X /= H]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really hurts my brain to read. And not knowing what's supposed to call it I can't figure out if its correct.
Couple things to unpack in the discussion: First, for #816 I think that's actually the opposite, its not whether it needs a new version bump but how many versions should be be back porting this fix to. A database that gives bad answers to queries is more a bug that needs to be patched backwards, not something we cut the cord on and just bump version numbers. There is of course the issue around notifying users that a pending change may break their application (technically fix, but users have a different perspective :), so this is definitely something we should talk about. We'll also want to update the documentation as we've long said that using column prefixes was a suggestion for sharing indexes so that'll need to be fixed. For @tonysun83's example of this changing a query from something that uses _all_docs to something that uses an existing index I don't think that's an issue. Originally I would have ixnayed it because of the lack of range restriction on A and B but we've already relaxed that by having an _all_docs default. Also the behavior here is different than the issue introduced by #816 in that previously Tony's example would have been an incorrect sort returned where the new version changes that (ie, same set of docs would be returned both places, just now its sorted). Which I think is quite a bit different than subtley different sets of documents returned. Also, if a user specifies a sort and isn't detecting the wrong sort order based on the _all_docs fall back then I'm failing to see how they'd be relying on that sort, and thus fixing it seems like not an issue. For #816 with the _all_docs fallback is that also not more of a concern about performance rather than an app that stops working? Ie, all queries will continue to work and be "fixed" but some unknown number of queries will suddenly switch to using _all_docs and become much slower? Granted suddenly slower performance that lasts until an index can be built isn't nothing either. |
b494c44
to
a06115d
Compare
I've pushed a new fix which doesn't require rewriting the selector. Instead, we just include sort fields when considering whether a JSON index provides coverage for the selector. This seems a lot simpler and, unlike the previous implementation, doesn't impact the validity of other index types (i.e. text indexes). |
595a234
to
2acaf40
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good other than I think there's some dead code left over.
src/mango/src/mango_selector.erl
Outdated
@@ -16,7 +16,8 @@ | |||
-export([ | |||
normalize/1, | |||
match/2, | |||
has_required_fields/2 | |||
has_required_fields/2, | |||
append_sort_predicate/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be left over from the previous commits?
Mango has an apparently long-standing bug whereby if an index is deemed valid for sorting a query but is not valid according to mango_idx:is_usable, the query planner would fall back to _all_docs silently, therefore ignoring the sort order specified in the query. This reverses the index selection logic so that the list of usable indexes is generated prior to filtering them based on the sort order specified in the query. Similar logic is applied to use_index, allowing us to generate a more specific error message when a user specifies an index which isn't valid for the current selector.
It seems safe to assume that if a user specifies that results should be sorted by a field, that field needs to exist (but could be null) in the results returned. In #816 we can only use an index if all its columns are required to exist in the selector, so this improves compatibility with the old behaviour by allowing sort fields to be included in the index coverage check for JSON indexes.
2acaf40
to
235b68a
Compare
It seems safe to assume that if a user specifies that results should be sorted by a field, that field needs to exist (but could be null) in the results returned. In #816 we can only use an index if all its columns are required to exist in the selector, so this improves compatibility with the old behaviour by allowing sort fields to be included in the index coverage check for JSON indexes.
This reverts commit d0445f5.
This reverts commit d0445f5.
Overview
Mango has an apparently long-standing bug whereby if an index is deemed valid for sorting a query but is not valid according to mango_idx:is_usable, the query planner would fall back to _all_docs silently, therefore ignoring the sort order specified in the query.
This reverses the index selection logic so that the list of usable indexes is generated prior to filtering them based on the sort order specified in the query.
Similar logic is applied to use_index, allowing us to generate a more specific error message when a user specifies an index which isn't valid for the current
selector.
This fix exposes that the change introduced in #816 may cause existing queries with sort fields to fail (indeed, several queries in our test suite needed fixing), so we need to consider whether this is a severe enough breaking change to warrant a major version bump.
Testing recommendations
Run the test suite. Test that you can run Mango queries with sort criteria.
Related Pull Requests
#816
Checklist