New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choose index based on fields match #469

Merged
merged 3 commits into from May 9, 2017

Conversation

Projects
None yet
4 participants
@garrensmith
Member

garrensmith commented Apr 4, 2017

Overview

If two indexes can be used, then instead of choosing the index based on
alphabet order of index names, rather choose based on a score. This
score is calculated by determining which index has the least number of
fields

Testing recommendations

There is a test that proves the issue and passes based on this fix.

JIRA issue number

https://issues.apache.org/jira/browse/COUCHDB-3357

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;
  • I will not forget to update rebar.config.script
    with the correct commit hash once this PR get merged.
@garrensmith

This comment has been minimized.

Show comment
Hide comment
@garrensmith

garrensmith Apr 4, 2017

Member

I ran the pouchdb-find tests locally against this fix and all tests passed.

Member

garrensmith commented Apr 4, 2017

I ran the pouchdb-find tests locally against this fix and all tests passed.

@davisp

For the most part this looks pretty good. Although it doesn't solve the case where we have to choose an index with the same number of columns. Ie, if you have [a, b, c] and [a, d, e] and you have a selector for a.

While I don't want to tack it on as part of this PR, we do have a _count reduce for every view query. In the future we could get fancy and cache selectors and a reduce query with the prefix range and prefer the view that has more documents. Granted that gets complicated and does strange things to latencies when hitting cached vs. not. So basically ignore this paragraph for now and we'll want to pick some other arbitrary metric. The docid thing I think is wrong in hindsight but the only slightly less terrible thing I can think of is using the field names left in the index.

Show outdated Hide outdated src/mango/rebar.config.script Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
@tonysun83

This comment has been minimized.

Show comment
Hide comment
@tonysun83

tonysun83 Apr 5, 2017

Contributor

Paul hit the nail for the gist of this, so the only thing I'm going to add is that for the tests, we should probably use the _explain endpoint and make sure we're choosing the right index. So

  1. Create a bunch of indexes with various columns lengths.

  2. Create selectors that use various combinations of fields

  3. Execute those queries but with _explain, and ensure that the correct index with the right prefix was used by checking the index name.

Contributor

tonysun83 commented Apr 5, 2017

Paul hit the nail for the gist of this, so the only thing I'm going to add is that for the tests, we should probably use the _explain endpoint and make sure we're choosing the right index. So

  1. Create a bunch of indexes with various columns lengths.

  2. Create selectors that use various combinations of fields

  3. Execute those queries but with _explain, and ensure that the correct index with the right prefix was used by checking the index name.

@garrensmith

This comment has been minimized.

Show comment
Hide comment
@garrensmith

garrensmith Apr 5, 2017

Member

This got lost:
I've made an attempt at fixing this. I'm not 100% I fully understood @davisp explanation. In the situation of [a, b, c] and [a, d, e] I've defaulted to the indices names. The only problem with that is there is a possibility of a doc being in one index but not in another and this could lead to unexpected results for a user.

Member

garrensmith commented Apr 5, 2017

This got lost:
I've made an attempt at fixing this. I'm not 100% I fully understood @davisp explanation. In the situation of [a, b, c] and [a, d, e] I've defaulted to the indices names. The only problem with that is there is a possibility of a doc being in one index but not in another and this could lead to unexpected results for a user.

@tonysun83

This comment has been minimized.

Show comment
Hide comment
@tonysun83

tonysun83 Apr 6, 2017

Contributor

@garrensmith : @davisp was just saying that if you have the following indexes:

Idx1 - [a, b, c, d]
Idx2 - [a, b, c, e]
Idx3 - [a, b, c]

and you issued a query with selectors using a,b,c, then we would choose Idx3.

If Idx3 didn't exist, then you need some sort of tiebreaker. The old way used a combination of
[dbname, ddocid, viewname]. He was just saying you could still use this crude method as a tiebreaker, and use the ddocid (not all three elements).

You are right that this won't solve the issue when Idx1 is always chosen, and then a new Idx4 [a, b, c, f] which is alphabetically ahead of Idx1is added, then the user could possibly get different results.

Paul did mention:

"In the future we could get fancy and cache selectors and a reduce query with the prefix range and prefer the view that has more documents."

so that we could possibly choose the index with more documents and be a little more consistent. However, it's still probable that some indexes will contain different docs. I'll think about it some more, but do you think that's more of index design on their part?

Contributor

tonysun83 commented Apr 6, 2017

@garrensmith : @davisp was just saying that if you have the following indexes:

Idx1 - [a, b, c, d]
Idx2 - [a, b, c, e]
Idx3 - [a, b, c]

and you issued a query with selectors using a,b,c, then we would choose Idx3.

If Idx3 didn't exist, then you need some sort of tiebreaker. The old way used a combination of
[dbname, ddocid, viewname]. He was just saying you could still use this crude method as a tiebreaker, and use the ddocid (not all three elements).

You are right that this won't solve the issue when Idx1 is always chosen, and then a new Idx4 [a, b, c, f] which is alphabetically ahead of Idx1is added, then the user could possibly get different results.

Paul did mention:

"In the future we could get fancy and cache selectors and a reduce query with the prefix range and prefer the view that has more documents."

so that we could possibly choose the index with more documents and be a little more consistent. However, it's still probable that some indexes will contain different docs. I'll think about it some more, but do you think that's more of index design on their part?

@garrensmith

This comment has been minimized.

Show comment
Hide comment
@garrensmith

garrensmith Apr 6, 2017

Member

@tonysun83 great. That makes sense. I think I have it to the point now where it will cover the above situations. Then in terms of choosing indexes with different docs, I think for now the best we can do is add documentation to explain that part.

Member

garrensmith commented Apr 6, 2017

@tonysun83 great. That makes sense. I think I have it to the point now where it will cover the above situations. Then in terms of choosing indexes with different docs, I think for now the best we can do is add documentation to explain that part.

@tonysun83

This comment has been minimized.

Show comment
Hide comment
@tonysun83

tonysun83 Apr 25, 2017

Contributor

Good work @garrensmith, +1 for me. @davisp way want to double check

Contributor

tonysun83 commented Apr 25, 2017

Good work @garrensmith, +1 for me. @davisp way want to double check

@davisp

Generally looks pretty good with some minor style issues. We should also clear up the comments a bit. Took me a good fifteen minutes of contemplating to realize what was going on.

Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated
Show outdated Hide outdated src/mango/src/mango_cursor_view.erl Outdated

@wohali wohali added the enhancement label Apr 30, 2017

garrensmith added some commits Apr 5, 2017

Choose index based on Prefix and fields
If two indexes can be used, then instead of choosing the index based on
alphabet order of index names, rather choose based on a query prefix and
number of columns.
Improve index selector
Choose the index for the query based on Prefix and number of fields in
the index
@davisp

davisp approved these changes May 9, 2017

+1 to squerge

@garrensmith garrensmith merged commit 9568bb7 into apache:master May 9, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment