Add option to enforce fetching data from local shards, instead of from shards on remote nodes (if data is present on the local node) #2329
We're not trying to actively load balance here, so it wouldn't make sense to add a parameter to disable anything. What ends up happening is that when we wait for quorum we essentially pick a random representative, which may have references to attachments on a remote node. However, we could add some smarts in fabric_doc_open.
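The behavior described above can be sketched in a few lines of Python (purely illustrative; CouchDB's fabric layer is Erlang, and the function and node names here are hypothetical). The stock behavior picks a representative reply without regard to locality; a `prefer_local` flag shows what a local-preference tweak would change:

```python
import random

def pick_representative(replies, local_node, prefer_local=False):
    """Pick one reply from the nodes that answered within quorum.

    `replies` maps node name -> doc body. In the stock behavior the
    choice ignores locality, so attachment references may point at a
    remote node; with prefer_local, the coordinator's own copy wins
    whenever it is among the quorum replies.
    """
    if prefer_local and local_node in replies:
        return local_node, replies[local_node]
    node = random.choice(sorted(replies))  # locality-blind pick
    return node, replies[node]

replies = {"node1@zone-a": {"_id": "doc1"}, "node2@zone-b": {"_id": "doc1"}}
node, doc = pick_representative(replies, "node1@zone-a", prefer_local=True)
# With prefer_local=True the local node's copy is always chosen.
```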
@davisp Thanks a lot! Agree that adding smarts in fabric_doc_open is the better approach. Is it possible to address this in the next release, say 2.4? This problem (performance degradation in cluster mode) is preventing us from adopting cluster mode.
We won't be able to land this in the next release (3.0), but I do think an enhancement here makes sense. One technique that I've seen in other systems is …
We tested some behavior with a 3-zone cluster (each zone with 5 nodes, n=3, q=1, and placement of one copy in each zone {a,b,c}). We use network impairment tools so that there is a 60ms RTD between each zone. We used CouchDB 2.3.1.
We haven't tested attachment retrieval yet, but it seems to me that it should follow the same view query logic as 2.b.2, if it doesn't already. We will try to test this some time next week. Also, not sure if this needs to be a different ticket, but we would really like to see 2.a.ii.1 optimized so that it queries its local zone Couchdb_METALOOKUP_NODE first. Currently, we plan to work around this by changing the "by_range" order in 5986/dbs/mydb to favor the primary zone for our service.
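The manual "by_range" workaround described above amounts to moving a zone's nodes to the front of each shard's node list. A minimal Python sketch, assuming a shard map shaped like the `by_range` object in the 5986/dbs/mydb doc (node names here are made up):

```python
def favor_nodes(by_range, preferred_nodes):
    """Reorder each range's node list so nodes in `preferred_nodes`
    come first. sorted() is stable, so the relative order within the
    preferred and non-preferred groups is preserved."""
    return {
        rng: sorted(nodes, key=lambda n: n not in preferred_nodes)
        for rng, nodes in by_range.items()
    }

by_range = {"00000000-ffffffff": ["node1@zone-b", "node1@zone-c", "node1@zone-a"]}
# Favor zone-a: its node moves to the front of the list.
print(favor_nodes(by_range, {"node1@zone-a"}))
```

This only models the reordering itself; in practice the doc is edited by hand via the node-local 5986 port, as described above.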
Heya @nickimho, great to see those empirical results 👍 Regarding `couchdb/src/fabric/src/fabric_util.erl` lines 93 to 105 in dd1b281:
Can you say how you tracked down that the traffic was flowing to the first node in the "by_range" list?
Oh, and regarding attachments ... unfortunately they share less in common with view streaming than you might imagine. I had to go back and re-read the code, but it looks like we select one of the DATA_NODES that contributed to the quorum from …
Thanks @kocolosk, for the detailed info! I just want to say that I am not a coder, and we have been looking at this from a black-box point of view so far. So please forgive me if I am not very precise in some of the discussion or not using the right terms =). At a high level, here is how we set up the environment:
For the data points I provided above:
For 2.b.2, we ran the view tests without any parameters specified. We noticed that for a given mydb, one zone always performs better (by almost 60ms). This becomes clearer if we use "time curl Couchdb_QUERY_NODE:5984/mydb/mydoc?r=1 #from Client of the same zone" (r=1 eliminates the delay from 2.a.iii before returning the result). The preferred node (the first entry in "by_range", if I remember correctly) always gets picked for Couchdb_METALOOKUP_NODE, resulting in almost no network delay for the query, whereas the other two zones always incur the added 60ms. We then tweaked the :5986/dbs/mydb doc to rearrange the "by_range" array and saw that Couchdb_METALOOKUP_NODE follows the first one in the array. Actually, I should check the notes later on this; there might be an exception to this.
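The latency pattern observed above can be modeled in a few lines (illustrative Python, not CouchDB code; the `node@zone` naming scheme and the assumption that the first "by_range" entry is always chosen are taken from the measurements described here):

```python
RTT_MS = 60  # inter-zone round-trip delay injected in the test setup

def zone(node):
    # Hypothetical naming scheme: "node1@zone-a" -> "zone-a"
    return node.split("@", 1)[1]

def metalookup_latency(by_range_nodes, query_node):
    """Latency added by the metadata-lookup step, assuming (as observed)
    the first entry in "by_range" is always the lookup target."""
    target = by_range_nodes[0]
    return 0 if zone(target) == zone(query_node) else RTT_MS

nodes = ["node1@zone-a", "node1@zone-b", "node1@zone-c"]
print(metalookup_latency(nodes, "node2@zone-a"))  # same zone as the first entry
print(metalookup_latency(nodes, "node2@zone-b"))  # different zone: pays the RTT
```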
And thanks for the insight on attachments! This is good to know. For what we do now, we should be OK (as long as this behavior also exists in BigCouch, which is what we are upgrading from). I will have my team run the analysis against BigCouch and CouchDB next week and will provide the results here later. In general, we see attachments as larger data retrievals, and the higher-layer application should have logic to handle extra delay and caching; also, we are moving attachments to external storage in general and just using the doc as a pointer to those external resources. We do want doc and view queries to be as optimized as possible (or at least consistent across all zones) for a better user experience.
Actually, the commit you provided (`couchdb/src/fabric/src/fabric_util.erl` lines 93 to 105 in dd1b281) …
I just looked it up, and this was the build we used: `rpm -qi couchdb`
No, I just grabbed the permalink to the head of the master branch. If you do the …
We brought up the test environment and rechecked the behavior; I updated 2.a.ii in #2329 (comment). The Couchdb_METALOOKUP_NODE lookup is skipped if Couchdb_QUERY_NODE holds the shard. We also reconfirmed that the Couchdb_METALOOKUP_NODE lookup is based on "by_range".
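Putting both observations together, the lookup decision reduces to a small rule (again an illustrative Python sketch of the observed behavior, not CouchDB internals):

```python
def metalookup_target(by_range_nodes, query_node):
    """Which node the metadata lookup goes to, per the behavior
    observed above: skipped entirely (None) when the query node
    itself holds the shard, otherwise the first "by_range" entry."""
    if query_node in by_range_nodes:
        return None  # shard is local; no extra network hop
    return by_range_nodes[0]

nodes = ["node1@zone-a", "node1@zone-b"]
print(metalookup_target(nodes, "node1@zone-b"))   # local shard: lookup skipped
print(metalookup_target(nodes, "node9@zone-c"))   # remote: first by_range entry
```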
@kocolosk Regarding the 39c0b24 fix you mentioned, I did a quick search in the src folders and only "get_revs_limit", "get_purge_infos_limit", and "get_security" seem to use it. Here is an example of the data sent in step 2.a.ii for Couchdb_QUERY_NODE -> Couchdb_METALOOKUP_NODE:11500. It seems to have more to do with shard lookup or confirming the revision is up to date (I am just guessing here). The capture below was extracted using "Show Packet Bytes" in Wireshark.
Add option to enforce fetching data from local shards, instead of from shards/copies on remote nodes (if data is present on the local node), something like GET /mydb/doc1?preferlocal=true.
Summary
In the current software, it seems CouchDB will always try to distribute disk I/O across nodes during queries.
For example: in a two-node cluster (MACHINE1.SOMECOMPANY.COM and MACHINE2.SOMECOMPANY.COM), database mydb was created with q=8 and n=2, as follows. Each node contains a copy of all the data. Documents in the example database contain large binary data (i.e., as attachments).
When querying/fetching data from mydb, it seems CouchDB may load some data from local shards and some from shards on the remote machine. Due to the large document size (and thus the heavy network traffic to transfer data from the remote node to the coordinating node), the query speed is much slower than with a standalone deployment.
Queries are 2+ times faster with the standalone deployment for the same test data.
Adding ?r=1 did not help.
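A back-of-envelope check is consistent with the slowdown reported above: if the coordinator picks one of the n=2 copies uniformly at random per shard, about half of the reads cross the network. This Python sketch is a simulation under that assumption, not CouchDB code:

```python
import random

def remote_fraction(q=8, n=2, trials=20000, seed=1):
    """Monte Carlo estimate of how often a uniformly random
    representative lives on a remote node, given that copy 0 of
    each of the q shards is the local one."""
    rng = random.Random(seed)
    total = trials * q
    remote = sum(rng.randrange(n) != 0 for _ in range(total))
    return remote / total

# For n=2, roughly half the per-shard reads are remote.
print(remote_fraction())
```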
Possible Solution
It would be very helpful to add an option to the query request, something like GET /mydb/id0?preferlocal=true, to enforce fetching data from shards on the local node if the data is present locally, and otherwise from remote shards.
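The proposed semantics are simple to state as a sketch (illustrative Python; `choose_copy` and its arguments are hypothetical names, not part of CouchDB):

```python
def choose_copy(shard_nodes, local_node, prefer_local=False):
    """Proposed ?preferlocal=true behavior: use the coordinator's own
    copy when it holds one of the shard's replicas, and fall back to
    a remote copy only when no local replica exists."""
    if prefer_local and local_node in shard_nodes:
        return local_node
    return shard_nodes[0]

# Local copy exists: preferlocal avoids the network hop entirely.
print(choose_copy(["node2", "node1"], "node1", prefer_local=True))
# No local copy: falls back to a remote replica as today.
print(choose_copy(["node2", "node3"], "node1", prefer_local=True))
```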
Additional context
Software Tested: CouchDB 2.2
OS: Linux.