Problems managing cluster connections in production [JIRA: CLIENTS-697] #121

Open
qix opened this issue Dec 29, 2015 · 1 comment

qix commented Dec 29, 2015

We've been using Riak in production for around six months now and have had a ton of issues, primarily around connection handling.

Our system usually requests values stored in Riak in a very bursty fashion, asking for roughly 500 keys within a few milliseconds. We were shocked to realize that each get request requires its own connection, and that the default maxConnections was set to 10000. This essentially means that whenever we requested the keys, each key would open its own connection and overwhelm the Riak servers [keep in mind this is happening on 50-100 boxes at similar times.]

In an ideal world we would open a connection to each riak server, send all the commands down them in round-robin fashion and then wait for responses. I understand the protocol requires a roundtrip for every request right now which is its own problem -- I'm not sure if there is anything in the pipeline to solve that.

The queueCommands logic is useless to us, as it would create a whole lot of CPU load by repeatedly creating 500 timeouts. It also takes (N [requests] / M [connections]) * T [queueSubmitInterval] time to drain the queue. With twenty connections, our five hundred gets would take 10+ seconds to fetch, and that's ignoring the speed/latency of the actual Riak servers. Yes, we could lower queueSubmitInterval, but setting it low then causes a ton of CPU burn creating useless timers.
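To make the arithmetic concrete, here is a rough sketch of that estimate. The function name `estimateQueueDrainMs` is made up for illustration, and the 500 ms interval is an assumed value rather than a confirmed client default; server latency is ignored.

```javascript
// Rough lower bound on how long a timer-driven queue takes to drain:
// ceil(N / M) * T, where N = requests, M = connections,
// T = queueSubmitInterval in milliseconds. Server latency is ignored.
function estimateQueueDrainMs(requests, connections, submitIntervalMs) {
  if (connections <= 0) {
    throw new RangeError('connections must be positive');
  }
  const rounds = Math.ceil(requests / connections);
  return rounds * submitIntervalMs;
}

// 500 gets over 20 connections is 25 rounds.
// The 500 ms interval is an assumption for illustration only.
console.log(estimateQueueDrainMs(500, 20, 500)); // 12500
```

Under those assumptions the burst needs about 12.5 seconds before the last get is even submitted, which lines up with the 10+ seconds quoted above.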

I know this was a bunch of complaints and not much in the way of solutions... we're actually looking at switching datastores for our simpler "write once" key-value requests, which will alleviate most of the load. As a stop-gap we've implemented a super simple RiakCluster on our end which creates a bunch of RiakClients and load-balances across them properly.
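For readers wondering what such a stop-gap might look like, below is a minimal round-robin sketch. It is not the code described above: the `RoundRobin` class is hypothetical, and the client objects are plain stand-ins rather than real client instances.

```javascript
// Hypothetical helper: hands out clients from a fixed list in rotation,
// so consecutive requests are spread evenly across them.
class RoundRobin {
  constructor(clients) {
    if (!Array.isArray(clients) || clients.length === 0) {
      throw new Error('at least one client is required');
    }
    this.clients = clients.slice(); // copy so callers cannot mutate the list
    this.index = 0;
  }

  // Return the next client and advance the cursor, wrapping at the end.
  next() {
    const client = this.clients[this.index];
    this.index = (this.index + 1) % this.clients.length;
    return client;
  }
}

// Stand-in objects; in practice each would wrap one real client.
const balancer = new RoundRobin([{ name: 'a' }, { name: 'b' }, { name: 'c' }]);
console.log(balancer.next().name, balancer.next().name,
            balancer.next().name, balancer.next().name); // a b c a
```

Plain rotation ignores how busy each client is, so it only balances well when requests cost roughly the same.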

Some suggestions that would help a ton:

  • Drop the maxConnections default to something more sane. Perhaps 100?
  • Get rid of queueSubmitInterval, and instead have a list of waiting commands that get popped whenever a node is free.
  • Update the protocol so that multiple requests can be sent down a single connection [not likely - I know.]
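
The second suggestion could be sketched roughly as follows. This is an illustration under stated assumptions, not the client's actual queue: `ConnectionSlots` and `submit` are hypothetical names, and a "command" here is any function that calls `done` when its connection is free again.

```javascript
// Hypothetical sketch: a fixed number of connection slots plus a FIFO
// list of waiting commands. No timers are used; a waiting command starts
// the moment a running one finishes.
class ConnectionSlots {
  constructor(maxConnections) {
    this.free = maxConnections; // slots currently unused
    this.waiting = [];          // commands waiting for a slot, oldest first
  }

  // Run command(done) now if a slot is free, otherwise queue it.
  submit(command) {
    if (this.free > 0) {
      this.free -= 1;
      this._run(command);
    } else {
      this.waiting.push(command);
    }
  }

  _run(command) {
    let finished = false;
    command(() => {
      if (finished) return; // ignore a second call to done
      finished = true;
      const next = this.waiting.shift();
      if (next) {
        this._run(next);    // hand the slot straight to the next waiter
      } else {
        this.free += 1;     // nothing waiting, so the slot becomes free
      }
    });
  }
}
```

With this shape, total time depends on server latency and the slot count rather than on a polling interval. A real implementation would also need timeouts and error handling, and should avoid deep recursion when commands complete synchronously.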
@Basho-JIRA Basho-JIRA changed the title Problems managing cluster connections in production Problems managing cluster connections in production [JIRA: CLIENTS-697] Dec 29, 2015
@lukebakken
Contributor

  • Do you have a load balancer like HAProxy between your application servers and Riak?
  • There is no plan to change the protocol at this time. Given your use case, I would recommend setting minConnections to at least 500, with an appropriately higher maxConnections value. Please be sure that you have tuned your servers according to our recommendations (I assume you are using Linux). Linux or FreeBSD should have no issue handling this many connections.
  • The default for maxConnections changed to 128 in this commit but I overlooked changing the documentation (see Update maxConnections documentation. [JIRA: CLIENTS-698] #122)
  • I would gladly accept a PR to modify the command queue and, in the meantime, will see what I can do to improve it given your suggestions. Thank you for the pointers.

@lukebakken lukebakken added this to the riak-nodejs-client-2.2.0 milestone Feb 22, 2016
@lukebakken lukebakken modified the milestone: riak-nodejs-client-2.5.0 Dec 22, 2016