Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mango to nodes #1423

Merged
merged 2 commits into from Aug 8, 2018
Merged

Mango to nodes #1423

merged 2 commits into from Aug 8, 2018

Conversation

@garrensmith
Copy link
Member

@garrensmith garrensmith commented Jul 4, 2018

Overview

Move mango selector matching to the shard level

This moves the Mango selector matching down to the shard level.
This would mean that the document is retrieved from the index and
matched against the selector before being sent to the coordinator node.
That way only the documents that match the query are sent to the coordinator node.

Testing recommendations

Mango tests should all pass

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;
@@ -209,22 +213,66 @@ choose_best_index(_DbName, IndexRanges) ->
{SelectedIndex, SelectedIndexRanges}.


baseViewRow(Row) ->

This comment has been minimized.

@davisp

davisp Jul 5, 2018
Member

baseViewRow -> base_view_row

This comment has been minimized.

@davisp

davisp Jul 5, 2018
Member

Granted you only use it once, so I'd not even make it a function call.

Copy link
Member

@davisp davisp left a comment

Minor nits. You can mostly ignore the comment about adding a message when we're filtering docs for now. Just wanted to call that out as I saw it, it doesn't require any immediate changes. Other than that its minor style nits.

{ok, Acc};
view_cb({row, Row}, #mrargs{extra = Options} = Acc) ->
ViewRow = baseViewRow(Row),
case couch_util:get_value(doc, Row) of

This comment has been minimized.

@davisp

davisp Jul 5, 2018
Member

Super minor nit, but case ViewRow#view_row.doc of would be slightly faster since you've already pulled it out of the Row value.

{ok, couch_doc:to_json_obj(Doc, []), {execution_stats, ExecutionStats1}};
{ok, #doc{}=DocProps} ->
Doc = couch_doc:to_json_obj(DocProps, []),
case mango_selector:match(Selector, Doc) of

This comment has been minimized.

@davisp

davisp Jul 5, 2018
Member

You'll need to modify the ExecutionStats here to increment the examined doc count.

This comment has been minimized.

@garrensmith

garrensmith Jul 9, 2018
Author Member

I think we want to keep the quorum docs examined separate to the other docs examined.

ok = rexi:stream2(ViewRow2),
put(mango_docs_examined, 0);
false ->
put(mango_docs_examined, get(mango_docs_examined) + 1)

This comment has been minimized.

@davisp

davisp Jul 5, 2018
Member

Something to think about for the future will be how to tunnel a message through from here occasionally if we happen upon a highly restrictive filter. This doesn't change anything from the user POV currently as we could have the same effect from filtering at the collector level.

@@ -28,6 +29,8 @@

-include_lib("couch/include/couch_db.hrl").
-include_lib("couch_mrview/include/couch_mrview.hrl").
-include_lib("fabric/include/fabric.hrl").

This comment has been minimized.

@davisp

davisp Jul 5, 2018
Member

I'm missing why we need to include fabric.hrl.

This comment has been minimized.

@davisp
Copy link
Member

@davisp davisp commented Jul 5, 2018

Also, have you had a chance to do any sort of performance testing with this? It seems like it'd be helpful but I'm having a bit of premature-optimization-itis wondering if it makes any difference in practice. I can certainly dream up some scenarios where it'd help a cluster in a specific set of circumstances but I can't remember ever having seen such issues crop up in production.


view_cb({meta, Meta}, Acc) ->
% Map function starting
put(mango_docs_examined, 0),

This comment has been minimized.

@tonysun83

tonysun83 Jul 9, 2018
Contributor

having trouble figuring out why we need this

This comment has been minimized.

@tonysun83

tonysun83 Jul 9, 2018
Contributor

nevermind, it's for execution stats below

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from e8a87f0 to 72038af Jul 9, 2018
@garrensmith
Copy link
Member Author

@garrensmith garrensmith commented Jul 9, 2018

Thanks for reviewing @davisp and @tonysun83 I'm going to setup and run some perf tests on this, this week.

@garrensmith
Copy link
Member Author

@garrensmith garrensmith commented Jul 18, 2018

I've run some perf tests on this. First a very basic performance test. Below are metrics for network traffic between nodes. The first image is for current mango. The second image is CouchDB with this PR.
old-mango2

new-mango2

The graphs are not at the same scale which is a bit frustrating. There is no regression and a slight improvement on network traffic.
Then I ran a full regression test:
regression-test

There are no regressions and in some cases a nice performance improvement.

@davisp
Copy link
Member

@davisp davisp commented Jul 18, 2018

From what I see there's basically no difference. For the network traffic graphs it looks maybe a bit less on the front end but then spikes badly on the second half. And if memory serves you were attempting to have a best case scenario for this test by restricting results to an empty set. Given there's not a dramatic difference I'm hesitant on whether this change is actually providing any benefit.

The regression tests I think are even more indicative of this not having much if any effect on the results. Your largest median change on a single test is only 23ms and it looks like nearly half of them are measuring nearly exactly the same suggesting this PR isn't providing a measurable difference.

Can you describe the tests you did for the network graphs? Perhaps we missed something there or we can figure out how to measure things better?

@garrensmith
Copy link
Member Author

@garrensmith garrensmith commented Jul 18, 2018

@davisp I was expecting much better performance. I'm happy to park this PR for now. I think it would be useful for when we look at mango aggregation. So we can either close this PR or keep it open for a bit.

I'll revisit it once we get around to mango aggregations.

@davisp
Copy link
Member

@davisp davisp commented Jul 24, 2018

Up to you whether you want to keep it open or closed. We can always re-open later if you get more time for perf testing as well. Also agree that I would have expected a much starker difference which suggests we're either missing something (like maybe that timeout you found) or there's some other bottleneck that prevents this from having a noticeable effect. Doesn't mean we don't want to pursue this, just that we need to investigate more to figure out what's up before we can merge it.

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from f2db817 to 9f56eb6 Aug 6, 2018

class LongRunningMangoTest(mango.DbPerClass):

def setUp(self):

This comment has been minimized.

@davisp

davisp Aug 6, 2018
Member

Python uses 4 space indents.

Copy link
Member

@davisp davisp left a comment

Looks great other than two super minor style nits.

false ->
put(mango_docs_examined, get(mango_docs_examined) + 1)
put(mango_docs_examined, get(mango_docs_examined) + 1),

This comment has been minimized.

@davisp

davisp Aug 6, 2018
Member

Move this whole block to its own function.

@garrensmith
Copy link
Member Author

@garrensmith garrensmith commented Aug 7, 2018

I think this branch is ready to go now.
Here is some final performance testing. I have ran a perf test that would exaggerate this so we can measure. It is a database with 1 million docs that does a query that uses _all_docs but doesn't actually return a doc. So each query requires scanning through all the documents in the database. In the first screenshot is the test run on current master CouchDB. Each doc queried is streamed to the coordinator node for the final mango selector test.

current-couchdb

This is with this PR, each doc is checked at the node and only the documents that would match the selector would get streamed to the coordinator. In this test no docs would match the selector. As you can see the network activity is greatly reduced.
mango-pr

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from f3bee87 to 953bf6c Aug 7, 2018
@davisp
davisp approved these changes Aug 7, 2018
Copy link
Member

@davisp davisp left a comment

Looks good. I'd merge this into two commits. One for rexi ping and then the mango updates. Other than that it looks good to merge.

@tonysun83
Copy link
Contributor

@tonysun83 tonysun83 commented Aug 8, 2018

+1, great work @garrensmith

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch 3 times, most recently from 0eeb5fa to 471790f Aug 8, 2018
garrensmith and others added 2 commits Aug 8, 2018
Add a ping message to rexi to avoid any long running operations from
timing out. Long running operations at the node level can exceed the
fabric timeout and be cancelled. Sending a ping message back will
stop that from happening.
This moves the Mango selector matching down to the shard level.
this would mean that the document is retrieved from the index and
matched against the selector before being sent to the coordinator node.
This reduces the network traffic for a mango query

Co-authored-by: Paul J. Davis <paul.joseph.davis@gmail.com>
Co-authored-by: Garren Smith <garren.smith@gmail.com>
@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from 471790f to e90ee1f Aug 8, 2018
@garrensmith garrensmith merged commit a6bc72e into apache:master Aug 8, 2018
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@garrensmith garrensmith deleted the garrensmith:mango-to-nodes branch Dec 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants