New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mango to nodes #1423

Merged
merged 2 commits into from Aug 8, 2018

Conversation

Projects
None yet
3 participants
@garrensmith
Member

garrensmith commented Jul 4, 2018

Overview

Move mango selector matching to the shard level

This moves the Mango selector matching down to the shard level.
This would mean that the document is retrieved from the index and
matched against the selector before being sent to the coordinator node.
That way only the documents that match the query are sent to the coordinator node.

Testing recommendations

Mango tests should all pass

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;
@@ -209,22 +213,66 @@ choose_best_index(_DbName, IndexRanges) ->
{SelectedIndex, SelectedIndexRanges}.
baseViewRow(Row) ->

This comment has been minimized.

@davisp

davisp Jul 5, 2018

Member

baseViewRow -> base_view_row

This comment has been minimized.

@davisp

davisp Jul 5, 2018

Member

Granted you only use it once, so I'd not even make it a function call.

@davisp

Minor nits. You can mostly ignore the comment about adding a message when we're filtering docs for now. Just wanted to call that out as I saw it, it doesn't require any immediate changes. Other than that its minor style nits.

{ok, Acc};
view_cb({row, Row}, #mrargs{extra = Options} = Acc) ->
ViewRow = baseViewRow(Row),
case couch_util:get_value(doc, Row) of

This comment has been minimized.

@davisp

davisp Jul 5, 2018

Member

Super minor nit, but case ViewRow#view_row.doc of would be slightly faster since you've already pulled it out of the Row value.

{ok, couch_doc:to_json_obj(Doc, []), {execution_stats, ExecutionStats1}};
{ok, #doc{}=DocProps} ->
Doc = couch_doc:to_json_obj(DocProps, []),
case mango_selector:match(Selector, Doc) of

This comment has been minimized.

@davisp

davisp Jul 5, 2018

Member

You'll need to modify the ExecutionStats here to increment the examined doc count.

This comment has been minimized.

@garrensmith

garrensmith Jul 9, 2018

Member

I think we want to keep the quorum docs examined separate to the other docs examined.

ok = rexi:stream2(ViewRow2),
put(mango_docs_examined, 0);
false ->
put(mango_docs_examined, get(mango_docs_examined) + 1)

This comment has been minimized.

@davisp

davisp Jul 5, 2018

Member

Something to think about for the future will be how to tunnel a message through from here occasionally if we happen upon a highly restrictive filter. This doesn't change anything from the user POV currently as we could have the same effect from filtering at the collector level.

@@ -28,6 +29,8 @@
-include_lib("couch/include/couch_db.hrl").
-include_lib("couch_mrview/include/couch_mrview.hrl").
-include_lib("fabric/include/fabric.hrl").

This comment has been minimized.

@davisp

davisp Jul 5, 2018

Member

I'm missing why we need to include fabric.hrl.

This comment has been minimized.

@garrensmith
@davisp

This comment has been minimized.

Member

davisp commented Jul 5, 2018

Also, have you had a chance to do any sort of performance testing with this? It seems like it'd be helpful but I'm having a bit of premature-optimization-itis wondering if it makes any difference in practice. I can certainly dream up some scenarios where it'd help a cluster in a specific set of circumstances but I can't remember ever having seen such issues crop up in production.

view_cb({meta, Meta}, Acc) ->
% Map function starting
put(mango_docs_examined, 0),

This comment has been minimized.

@tonysun83

tonysun83 Jul 9, 2018

Contributor

having trouble figuring out why we need this

This comment has been minimized.

@tonysun83

tonysun83 Jul 9, 2018

Contributor

nevermind, it's for execution stats below

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from e8a87f0 to 72038af Jul 9, 2018

@garrensmith

This comment has been minimized.

Member

garrensmith commented Jul 9, 2018

Thanks for reviewing @davisp and @tonysun83 I'm going to setup and run some perf tests on this, this week.

@garrensmith

This comment has been minimized.

Member

garrensmith commented Jul 18, 2018

I've run some perf tests on this. First a very basic performance test. Below are metrics for network traffic between nodes. The first image is for current mango. The second image is CouchDB with this PR.
old-mango2

new-mango2

The graphs are not at the same scale which is a bit frustrating. There is no regression and a slight improvement on network traffic.
Then I ran a full regression test:
regression-test

There are no regressions and in some cases a nice performance improvement.

@davisp

This comment has been minimized.

Member

davisp commented Jul 18, 2018

From what I see there's basically no difference. For the network traffic graphs it looks maybe a bit less on the front end but then spikes badly on the second half. And if memory serves you were attempting to have a best case scenario for this test by restricting results to an empty set. Given there's not a dramatic difference I'm hesitant on whether this change is actually providing any benefit.

The regression tests I think are even more indicative of this not having much if any effect on the results. Your largest median change on a single test is only 23ms and it looks like nearly half of them are measuring nearly exactly the same suggesting this PR isn't providing a measurable difference.

Can you describe the tests you did for the network graphs? Perhaps we missed something there or we can figure out how to measure things better?

@garrensmith

This comment has been minimized.

Member

garrensmith commented Jul 18, 2018

@davisp I was expecting much better performance. I'm happy to park this PR for now. I think it would be useful for when we look at mango aggregation. So we can either close this PR or keep it open for a bit.

I'll revisit it once we get around to mango aggregations.

@davisp

This comment has been minimized.

Member

davisp commented Jul 24, 2018

Up to you whether you want to keep it open or closed. We can always re-open later if you get more time for perf testing as well. Also agree that I would have expected a much starker difference which suggests we're either missing something (like maybe that timeout you found) or there's some other bottleneck that prevents this from having a noticeable effect. Doesn't mean we don't want to pursue this, just that we need to investigate more to figure out what's up before we can merge it.

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from f2db817 to 9f56eb6 Aug 6, 2018

class LongRunningMangoTest(mango.DbPerClass):
def setUp(self):

This comment has been minimized.

@davisp

davisp Aug 6, 2018

Member

Python uses 4 space indents.

@davisp

Looks great other than two super minor style nits.

false ->
put(mango_docs_examined, get(mango_docs_examined) + 1)
put(mango_docs_examined, get(mango_docs_examined) + 1),

This comment has been minimized.

@davisp

davisp Aug 6, 2018

Member

Move this whole block to its own function.

@garrensmith

This comment has been minimized.

Member

garrensmith commented Aug 7, 2018

I think this branch is ready to go now.
Here is some final performance testing. I have ran a perf test that would exaggerate this so we can measure. It is a database with 1 million docs that does a query that uses _all_docs but doesn't actually return a doc. So each query requires scanning through all the documents in the database. In the first screenshot is the test run on current master CouchDB. Each doc queried is streamed to the coordinator node for the final mango selector test.

current-couchdb

This is with this PR, each doc is checked at the node and only the documents that would match the selector would get streamed to the coordinator. In this test no docs would match the selector. As you can see the network activity is greatly reduced.
mango-pr

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from f3bee87 to 953bf6c Aug 7, 2018

@davisp

davisp approved these changes Aug 7, 2018

Looks good. I'd merge this into two commits. One for rexi ping and then the mango updates. Other than that it looks good to merge.

@tonysun83

This comment has been minimized.

Contributor

tonysun83 commented Aug 8, 2018

+1, great work @garrensmith

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch 3 times, most recently from 0eeb5fa to 471790f Aug 8, 2018

garrensmith and others added some commits Aug 8, 2018

Add rexi ping message
Add a ping message to rexi to avoid any long running operations from
timing out. Long running operations at the node level can exceed the
fabric timeout and be cancelled. Sending a ping message back will
stop that from happening.
Move mango selector matching to the shard level
This moves the Mango selector matching down to the shard level.
this would mean that the document is retrieved from the index and
matched against the selector before being sent to the coordinator node.
This reduces the network traffic for a mango query

Co-authored-by: Paul J. Davis <paul.joseph.davis@gmail.com>
Co-authored-by: Garren Smith <garren.smith@gmail.com>

@garrensmith garrensmith force-pushed the garrensmith:mango-to-nodes branch from 471790f to e90ee1f Aug 8, 2018

@garrensmith garrensmith merged commit a6bc72e into apache:master Aug 8, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment