New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mango json index selection #816

Merged
merged 1 commit into from Sep 20, 2017

Conversation

Projects
None yet
2 participants
@willholley
Member

willholley commented Sep 14, 2017

Overview

JSON index selection in Mango previously deemed an index to be usable if a range scan on the first component of its compound key could be generated from the query selector.

For instance, if I had an index:

[A, B]

is_usable would return true for the selector:

{"A": "foo"}

This is incorrect because JSON indexes only index documents that contain all the fields in the index; null values are ok, but the field must be present. That means that for the above selector, the index would implicitly include only documents where B exists, missing documents where {"A":5} matched but field B was not present.

This commit changes is_usable so that it only returns true if all the keys in the index are required to exist by the selector. This means that in the worst case e.g. none of the predicates can be used to generate a range query, we should end up performing a full index scan, but this is still more efficient
than a full database scan.

We leave the generation of the optimal range for a given index as a separate exercise - currently this happens after index selection.

Potentially we'd want to score indexes during index selection based on their ability to restrict the result set, etc.

Testing recommendations

Run the mango test suite. Manually run some mango queries with/without JSON indexes and check that the results are as expected.

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

@willholley willholley requested review from davisp and garrensmith Sep 14, 2017

@davisp

At the very least I think the naming is misleading on this. The fields in the index must exist in the selector, not the other way around which is how it reads.

Also I'm hesitant to modify indexable like this for this check rather than writing the similar yet subtly different check. This definitely isn't an obvious rule but it seems odd to overload the meaning of "indexable" with "covers index field". For instance, the $exists:false call out seems out of place.

Also the maybe return value for unknown operators I'm also not 100% on. It seems like having a new covered index thing which explicitly lists all operators that can operate on the index would be a better approach.

@willholley

This comment has been minimized.

Show comment
Hide comment
@willholley

willholley Sep 14, 2017

Member

@davisp thanks for the review! I'll have a think about the naming and overloaded function issues.

To your last point, my thought there is that all predicates except, by definition, "$exists":false, "operate" on the index because selection of the index restricts the result set to documents where the indexed fields exist.

Member

willholley commented Sep 14, 2017

@davisp thanks for the review! I'll have a think about the naming and overloaded function issues.

To your last point, my thought there is that all predicates except, by definition, "$exists":false, "operate" on the index because selection of the index restricts the result set to documents where the indexed fields exist.

@willholley

This comment has been minimized.

Show comment
Hide comment
@willholley

willholley Sep 18, 2017

Member

@davisp I went with your second suggestion and implemented a has_required_fields function

Member

willholley commented Sep 18, 2017

@davisp I went with your second suggestion and implemented a has_required_fields function

@davisp

Really good work on this. Requested changes here are purely stylistic, once those are addressed then +1 to merging this.

Show outdated Hide outdated src/mango/src/mango_idx_view.erl
Show outdated Hide outdated src/mango/src/mango_selector.erl
Show outdated Hide outdated src/mango/src/mango_selector.erl
Show outdated Hide outdated src/mango/src/mango_selector.erl
Show outdated Hide outdated src/mango/src/mango_selector.erl
Show outdated Hide outdated src/mango/src/mango_selector.erl
Show outdated Hide outdated src/mango/src/mango_selector.erl
%%%%%%%% module tests below %%%%%%%%
-ifdef(TEST).
-include_lib("eunit/include/eunit.hrl").

This comment has been minimized.

@davisp

davisp Sep 18, 2017

Member

Good test cases!

@davisp

davisp Sep 18, 2017

Member

Good test cases!

Show outdated Hide outdated src/mango/test/05-index-selection-test.py
Show outdated Hide outdated src/mango/test/05-index-selection-test.py
Fix json index selection
JSON index selection in Mango previously deemed an
index to be usable if a range scan on the first component
of its compound key could be generated from the query selector.

For instance, if I had an index:

[A, B]

is_usable would return true for the selector:

{"A": "foo"}

This is incorrect because JSON indexes only index documents that contain all
the fields in the index; null values are ok, but the field must be present.
That means that for the above selector, the index would implicitly include
only documents where B exists, missing documents where {"A":5} matched but
field B was not present.

This commit changes is_usable so that it only returns true if all the keys
in the index are required to exist by the selector. This means that in the
worst case e.g. none of the predicates can be used to generate a range query,
we should end up performing a full index scan, but this is still more
efficient than a full database scan.

We leave the generation of the optimal range for a given index as a separate
exercise - currently this happens after index selection.

Potentially we'd want to score indexes during index selection based on their
ability to restrict the result set, etc.

@willholley willholley merged commit dae81be into apache:master Sep 20, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@willholley willholley referenced this pull request Oct 5, 2017

Merged

Fix incorrect index selection when sort specified #866

2 of 3 tasks complete

willholley added a commit to willholley/couchdb that referenced this pull request Oct 9, 2017

automatically inject "$exists": true for sort fields
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by automatically
injecting an { "$exists": true } predicate for each
sort field prior to index selection.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 10, 2017

automatically inject "$exists": true for sort fields
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by automatically
injecting an { "$exists": true } predicate for each
sort field prior to index selection.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 11, 2017

automatically inject "$exists": true for sort fields
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by automatically
injecting an { "$exists": true } predicate for each
sort field prior to index selection.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 12, 2017

Make JSON indexes usable if covered by sort fields
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by allowing sort
fields to be included in the index coverage check for
JSON indexes.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 12, 2017

Make JSON indexes usable if covered by sort fields
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by allowing sort
fields to be included in the index coverage check for
JSON indexes.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 12, 2017

Make JSON indexes usable if covered by sort fields
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by allowing sort
fields to be included in the index coverage check for
JSON indexes.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 12, 2017

Make JSON indexes usable if covered by sort fields
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by allowing sort
fields to be included in the index coverage check for
JSON indexes.

willholley added a commit that referenced this pull request Oct 12, 2017

Fix incorrect index selection when sort specified (#866)
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by allowing sort
fields to be included in the index coverage check for
JSON indexes.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 17, 2017

Fix index selection tests
These tests were not running previously and were incorrect.
This fixes the assertions so that they match the expected
behaviour which changed in #816.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 17, 2017

Fix index selection tests
These tests were not running previously and were incorrect.
This fixes the assertions so that they match the expected
behaviour which changed in #816.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 17, 2017

Fix index selection tests
These tests were not running previously and were incorrect.
This fixes the assertions so that they match the expected
behaviour which changed in #816.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 17, 2017

Fix index selection tests
These tests were not running previously and were incorrect.
This fixes the assertions so that they match the expected
behaviour which changed in #816.

willholley added a commit to willholley/couchdb that referenced this pull request Oct 17, 2017

Fix index selection tests
These tests were not running previously and were incorrect.
This fixes the assertions so that they match the expected
behaviour which changed in #816.

willholley added a commit that referenced this pull request Oct 17, 2017

Mango text index test fixes (#894)
* Split out text index selection tests
* Skip operator tests that do not apply to text indexes
* Only run array length tests against text indexes
* Fix index crud tests when text indexes available
* Use environment variable to switch on text tests
* Fix incorrect text sort assertion in test
* Always use -test in fixture filename
* Fix index selection test compatibility with #816.
* Improve test README

wohali added a commit that referenced this pull request Oct 19, 2017

Fix json index selection (#816)
JSON index selection in Mango previously deemed an
index to be usable if a range scan on the first component
of its compound key could be generated from the query selector.

For instance, if I had an index:

[A, B]

is_usable would return true for the selector:

{"A": "foo"}

This is incorrect because JSON indexes only index documents that contain all
the fields in the index; null values are ok, but the field must be present.
That means that for the above selector, the index would implicitly include
only documents where B exists, missing documents where {"A":5} matched but
field B was not present.

This commit changes is_usable so that it only returns true if all the keys
in the index are required to exist by the selector. This means that in the
worst case e.g. none of the predicates can be used to generate a range query,
we should end up performing a full index scan, but this is still more
efficient than a full database scan.

We leave the generation of the optimal range for a given index as a separate
exercise - currently this happens after index selection.

Potentially we'd want to score indexes during index selection based on their
ability to restrict the result set, etc.

wohali added a commit that referenced this pull request Oct 19, 2017

Fix incorrect index selection when sort specified (#866)
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by allowing sort
fields to be included in the index coverage check for
JSON indexes.

wohali added a commit that referenced this pull request Oct 19, 2017

Mango text index test fixes (#894)
* Split out text index selection tests
* Skip operator tests that do not apply to text indexes
* Only run array length tests against text indexes
* Fix index crud tests when text indexes available
* Use environment variable to switch on text tests
* Fix incorrect text sort assertion in test
* Always use -test in fixture filename
* Fix index selection test compatibility with #816.
* Improve test README

willholley added a commit to willholley/couchdb that referenced this pull request Nov 3, 2017

Fallback to valid index when invalid use_index specified
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

willholley added a commit to willholley/couchdb that referenced this pull request Nov 3, 2017

Fallback to valid index when invalid use_index specified
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

iilyak pushed a commit to cloudant/couchdb that referenced this pull request Nov 6, 2017

ILYA Khlopotov
Revert "Fix json index selection (#816)"
This reverts commit dae81be.

Conflicts:
	src/mango/test/05-index-selection-test.py

willholley added a commit to willholley/couchdb that referenced this pull request Nov 8, 2017

Fallback to valid index when invalid use_index specified
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

iilyak pushed a commit to cloudant/couchdb that referenced this pull request Nov 13, 2017

ILYA Khlopotov
Revert "Fix json index selection (#816)"
This reverts commit dae81be.

Conflicts:
	src/mango/test/05-index-selection-test.py

nono added a commit to nono/couchdb that referenced this pull request Nov 20, 2017

Allow to use $or with a mango JSON index
Since #816, mango JSON index on compound fields can be selected only if
the selector make sure that all the fields listed in the index are
always present. This commit add a special case where all clauses of an
`$or` can ensure that a field is present.

For instance, if I had an index:

[A, B]

is_usable would now return true for the selector:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "B": "baz"
  }
}

@nono nono referenced this pull request Nov 20, 2017

Closed

Allow to use $or with a mango JSON index #1011

0 of 3 tasks complete

willholley added a commit to willholley/couchdb that referenced this pull request Nov 28, 2017

Fallback to valid index when invalid use_index specified
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

willholley added a commit to willholley/couchdb that referenced this pull request Nov 30, 2017

Fallback to valid index when invalid use_index specified
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

willholley added a commit to willholley/couchdb that referenced this pull request Nov 30, 2017

Fallback to valid index when invalid use_index specified
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

willholley added a commit that referenced this pull request Nov 30, 2017

warn instead of error when use_index not valid (#962)
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

willholley added a commit to willholley/couchdb that referenced this pull request Dec 1, 2017

Allow to use $or with a mango JSON index
Since #816, mango JSON index on compound fields can be selected only if
the selector make sure that all the fields listed in the index are
always present. This commit add a special case where all clauses of an
`$or` can ensure that a field is present.

For instance, if I had an index:

[A, B]

is_usable would now return true for the selector:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "B": "baz"
  }
}

willholley added a commit to willholley/couchdb that referenced this pull request Dec 1, 2017

Allow to use $or with a mango JSON index
Since #816, mango JSON index on compound fields can be selected only if
the selector make sure that all the fields listed in the index are
always present. This commit add a special case where all clauses of an
`$or` can ensure that a field is present.

For instance, if I had an index:

[A, B]

is_usable would now return true for the selector:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "B": "baz"
  }
}

willholley added a commit to willholley/couchdb that referenced this pull request Dec 11, 2017

Allow to use $or with a mango JSON index
Since #816, mango JSON index on compound fields can be selected only if
the selector make sure that all the fields listed in the index are
always present. This commit add a special case where all clauses of an
`$or` can ensure that a field is present.

For instance, if I had an index:

[A, B]

is_usable would now return true for the selector:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "B": "baz"
  }
}

willholley added a commit that referenced this pull request Dec 11, 2017

Allow to use index with or (#1038)
Since #816, mango JSON index on compound fields can be 
selected only if the selector make sure that all the fields 
listed in the index are always present. 

This adds a special case where all clauses of an `$or` can
ensure that a field is present.

For instance, if I had an index:

[A, B]

is_usable would now return true for the selector:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "B": "baz"
  }
}

but false for:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "C": "bar"
  }
}

@willholley willholley deleted the willholley:mango_json_index_selection branch Dec 20, 2017

willholley added a commit to willholley/couchdb that referenced this pull request May 22, 2018

Fix json index selection (#816)
JSON index selection in Mango previously deemed an
index to be usable if a range scan on the first component
of its compound key could be generated from the query selector.

For instance, if I had an index:

[A, B]

is_usable would return true for the selector:

{"A": "foo"}

This is incorrect because JSON indexes only index documents that contain all
the fields in the index; null values are ok, but the field must be present.
That means that for the above selector, the index would implicitly include
only documents where B exists, missing documents where {"A":5} matched but
field B was not present.

This commit changes is_usable so that it only returns true if all the keys
in the index are required to exist by the selector. This means that in the
worst case e.g. none of the predicates can be used to generate a range query,
we should end up performing a full index scan, but this is still more
efficient than a full database scan.

We leave the generation of the optimal range for a given index as a separate
exercise - currently this happens after index selection.

Potentially we'd want to score indexes during index selection based on their
ability to restrict the result set, etc.

willholley added a commit to willholley/couchdb that referenced this pull request May 22, 2018

Fix incorrect index selection when sort specified (#866)
It seems safe to assume that if a user specifies that
results should be sorted by a field, that field needs to exist
(but could be null) in the results returned.

In #816 we can only use an index if all its columns are
required to exist in the selector, so this improves
compatibility with the old behaviour by allowing sort
fields to be included in the index coverage check for
JSON indexes.

willholley added a commit to willholley/couchdb that referenced this pull request May 22, 2018

Mango text index test fixes (#894)
* Split out text index selection tests
* Skip operator tests that do not apply to text indexes
* Only run array length tests against text indexes
* Fix index crud tests when text indexes available
* Use environment variable to switch on text tests
* Fix incorrect text sort assertion in test
* Always use -test in fixture filename
* Fix index selection test compatibility with #816.
* Improve test README

willholley added a commit to willholley/couchdb that referenced this pull request May 22, 2018

warn instead of error when use_index not valid (#962)
If a user specifies a value for use_index that is not
valid for the selector - i.e. it does not meet the coverage
requirements of the selector or sort fields - attempt
to fall back to a valid index (or database scan) rather
than returning a 400 error.

When a fallback occurs, populate the "warning" field
in the response (as we already do when a full database
scan takes place) with details of the fallback.

This change is partially as mitigation for #816, which may
lead to some previously valid indexes being deemed invalid,
and also to make use_index less brittle in general. If
an index that is used explicitly by active queries is removed,
Couch will now generate warnings and there may be a performance
impact, but the client will still get correct results.

willholley added a commit to willholley/couchdb that referenced this pull request May 22, 2018

Allow to use index with or (#1038)
Since #816, mango JSON index on compound fields can be 
selected only if the selector make sure that all the fields 
listed in the index are always present. 

This adds a special case where all clauses of an `$or` can
ensure that a field is present.

For instance, if I had an index:

[A, B]

is_usable would now return true for the selector:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "B": "baz"
  }
}

but false for:

{
  "A": "foo",
  "$or": {
    "B": "bar",
    "C": "bar"
  }
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment