Problems managing cluster connections in production [JIRA: CLIENTS-697] #121

qix · 2015-12-29T21:18:20Z

We've been using Riak in production for around six months now and have had a ton of issues, primarily around connection handling.

Our system usually requires values stored in Riak in a very bursty fashion, requesting around 500 keys within a few milliseconds. We were shocked to realize that each get request required its own connection, and that the default maxConnections was set to 10000. This essentially means that whenever we requested the keys, each key would open its own connection and overwhelm the Riak servers [keep in mind this is happening on 50-100 boxes at similar times.]

In an ideal world we would open a connection to each riak server, send all the commands down them in round-robin fashion and then wait for responses. I understand the protocol requires a roundtrip for every request right now which is its own problem -- I'm not sure if there is anything in the pipeline to solve that.

The logic in queueCommands is useless to us, as it would create a whole lot of CPU load by repeatedly creating 500 timeouts. It also takes (N [requests] / M [connections]) * T [queueSubmitInterval] time. With twenty connections our five hundred gets would take 10+ seconds to fetch, and that's ignoring the speed/latency of the actual Riak servers. Yes, we could drop queueSubmitInterval, but dropping it low then causes a ton of CPU burn creating useless timers.
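To make the (N / M) * T estimate concrete, here is a small sketch of the arithmetic. The function name and the 500 ms interval are assumptions for illustration, not values read from the client; the model is simply "at most M commands go out per timer tick".

```typescript
// Rough lower bound on the time to drain a queue that submits at most
// `connections` commands per timer tick. Server latency is ignored.
// `estimateDrainMs` is a hypothetical helper, not part of any library.
function estimateDrainMs(
  requests: number,
  connections: number,
  submitIntervalMs: number
): number {
  const ticks = Math.ceil(requests / connections);
  return ticks * submitIntervalMs;
}

// 500 gets over 20 connections at an assumed 500 ms interval is 25 ticks.
const drainMs = estimateDrainMs(500, 20, 500);
console.log(drainMs); // 12500
```

Under that assumed interval the burst takes about 12.5 seconds before any server latency is counted, which lines up with the "10+ seconds" figure above.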

I know this was a bunch of complaints and not much in the line of solutions... we're actually looking at switching datastores for our simpler "write once" key-value requests, which will alleviate most of the load. As a stop-gap we've implemented a super simple RiakCluster on our end which creates a bunch of RiakClients and load-balances across them properly.
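The stop-gap code itself isn't shown here, so the following is only a sketch of the general idea: hand out members of a fixed pool in round-robin order. `RoundRobin` is a hypothetical helper, and plain strings stand in for real client objects.

```typescript
// Hypothetical helper: cycles through a fixed, non-empty list of items.
class RoundRobin<T> {
  private readonly items: T[];
  private next = 0;

  constructor(items: T[]) {
    if (items.length === 0) {
      throw new Error("at least one item is required");
    }
    this.items = items.slice(); // copy so later caller mutations don't leak in
  }

  // Return the next item and advance the cursor, wrapping at the end.
  pick(): T {
    const item = this.items[this.next] as T;
    this.next = (this.next + 1) % this.items.length;
    return item;
  }
}

// Strings stand in for client instances in this sketch.
const pool = new RoundRobin(["client-a", "client-b", "client-c"]);
const picks = [pool.pick(), pool.pick(), pool.pick(), pool.pick()];
console.log(picks.join(",")); // client-a,client-b,client-c,client-a
```

Plain round-robin ignores how busy each client is; a real balancer would probably want to skip clients whose connections are all in use.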

Some suggestions that would help a ton:

Drop the maxConnections default to something more sane. Perhaps 100?
Get rid of queueSubmitInterval, and instead have a list of waiting commands that get popped whenever a node is free.
Update the protocol so that multiple requests can be sent down a single connection [not likely - I know.]
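As a rough illustration of the second suggestion, here is a minimal sketch of a waiting list that is drained when a connection frees up rather than on a timer. Every name in it (`Command`, `Dispatcher`, `submit`, `queued`) is hypothetical and not part of riak-nodejs-client; a counter stands in for real connections.

```typescript
// A command receives a `done` callback and calls it when its response arrives.
type Command = (done: () => void) => void;

// Hypothetical dispatcher: no timers, commands start as soon as a slot is free.
class Dispatcher {
  private waiting: Command[] = [];
  private idle: number;

  constructor(connections: number) {
    this.idle = connections;
  }

  // Number of commands parked because every connection is busy.
  get queued(): number {
    return this.waiting.length;
  }

  // Run the command now if a connection is idle, otherwise park it.
  submit(cmd: Command): void {
    if (this.idle > 0) {
      this.run(cmd);
    } else {
      this.waiting.push(cmd);
    }
  }

  private run(cmd: Command): void {
    this.idle--;
    let finished = false;
    cmd(() => {
      if (finished) return; // ignore a second completion of the same command
      finished = true;
      this.idle++;
      const next = this.waiting.shift();
      if (next !== undefined) this.run(next); // pop the oldest waiting command
    });
  }
}

// Two "connections", five commands: two start at once, three wait.
const dispatcher = new Dispatcher(2);
const started: number[] = [];
const completions: Array<() => void> = [];
for (let i = 0; i < 5; i++) {
  dispatcher.submit((done) => {
    started.push(i);
    completions.push(done);
  });
}
```

Completing any running command immediately starts the oldest waiting one, so a burst drains at the speed of the servers with no polling interval to tune.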

lukebakken · 2015-12-29T22:48:55Z

Do you have a load balancer like HAProxy between your application servers and Riak?
There is no plan to change the protocol at this time. Given your use case, I would recommend setting minConnections to at least 500, with an appropriately higher maxConnections value. Please be sure that you have tuned your servers according to our recommendations (I assume you are using Linux). Linux or FreeBSD should have no issue handling this many connections.
The default for maxConnections changed to 128 in this commit but I overlooked changing the documentation (see Update maxConnections documentation. [JIRA: CLIENTS-698] #122)
I would gladly accept a PR to modify the command queue and, in the meantime, will see what I can do to improve it given your suggestions. Thank you for the pointers.

Basho-JIRA changed the title ~~Problems managing cluster connections in production~~ Problems managing cluster connections in production [JIRA: CLIENTS-697] Dec 29, 2015

Basho-JIRA added the JIRA: To Do label Dec 29, 2015

lukebakken mentioned this issue Dec 29, 2015

Update maxConnections documentation. [JIRA: CLIENTS-698] #122

Closed

lukebakken added this to the riak-nodejs-client-2.2.0 milestone Feb 22, 2016

Basho-JIRA assigned lukebakken Feb 22, 2016

lukebakken modified the milestones: riak-nodejs-client-2.2.0, riak-nodejs-client-2.3.0 Apr 26, 2016

lukebakken modified the milestone: riak-nodejs-client-2.3.0 Nov 17, 2016

lukebakken modified the milestone: riak-nodejs-client-2.5.0 Dec 22, 2016

antn unassigned lukebakken Jun 15, 2017
