Doubt with property "hosts" (YCSB) #140

man12 opened this Issue Aug 23, 2013 · 6 comments



man12 commented Aug 23, 2013

I have a question about the "hosts" property of the YCSB benchmark. I had a quick look at the code but couldn't find the answer.
I'm trying out YCSB with Cassandra, and here it says "... the client can load balance its connections among all of the servers".

In my own test case, I have a multinode cluster with 4 nodes. I define the property like this: hosts=compute0,compute1... and so on. According to the previous link, I assume that with one thread all requests are balanced across the nodes, similar to round robin.

My real question is about multiple client threads: how does the benchmark behave in that case? Is it still round robin, or is there a mapping between threads and nodes? I ask because I want to know whether this parameter can affect database performance or efficiency when there are several clients. I don't know whether each thread "goes" to a different machine (node) or not. If it does, performance would obviously be better.

Finally, related to the previous questions: is it possible in YCSB for each client to know where the data is located, so that it can send each request to the node that holds it? Or am I wrong about this?

Thank you very much for your attention.


man12 commented Oct 8, 2013

Any answer? Am I crazy? Thanks..


westonplatter commented Mar 23, 2014

@man12 hey - we're backlogged with the long list of GitHub issues. Sorry for the delay :)

Which Cassandra client from YCSB were you using?

I don't know the answer at the moment. I'll dig into the Cassandra Java code in the next week. Any insight you may have from looking at the Java code yourself is more than welcome.


cmatser commented Mar 23, 2014

The load balancing depends on the driver. YCSB divides the operationcount by the number of threads and gives each thread an equal share, with all threads executing at the same time. It's up to the driver to be properly thread safe and to load balance. The drivers I've come across so far (MongoDB and Cassandra) have handled this well.
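A rough sketch of that split, in Python for brevity. This is not the actual YCSB code: the function name is made up, and how leftover operations are handled when the count doesn't divide evenly is an assumption.

```python
from typing import List


def split_operations(operationcount: int, threadcount: int) -> List[int]:
    """Return how many operations each client thread would run."""
    if threadcount <= 0:
        raise ValueError("threadcount must be positive")
    base, remainder = divmod(operationcount, threadcount)
    # Assumption: leftover operations go to the first threads, one each.
    return [base + (1 if i < remainder else 0) for i in range(threadcount)]


print(split_operations(1000, 4))  # [250, 250, 250, 250]
print(split_operations(10, 4))    # [3, 3, 2, 2]
```

Each thread then works through its own share independently, so which node a given request lands on is decided by the driver, not by this split.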

For Cassandra, even if you list just one of the nodes in your hosts property, the driver will discover all the nodes in your cluster and round-robin across them by default. However, for fail-over purposes, it's best to list all nodes in your cluster in the client hosts property.
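To illustrate what round-robin across hosts means when several client threads share one picker, here is a minimal simulation in Python. It is not the Cassandra driver's code: the class and function names are hypothetical, and node discovery is left out, so only the hosts you pass in are used.

```python
import itertools
import threading
from collections import Counter


class RoundRobinHosts:
    """Hand out hosts in rotation; safe to call from many threads."""

    def __init__(self, hosts_property: str):
        # Same comma-separated form as the YCSB "hosts" property.
        self.hosts = [h.strip() for h in hosts_property.split(",") if h.strip()]
        if not self.hosts:
            raise ValueError("no hosts given")
        self._cycle = itertools.cycle(self.hosts)
        self._lock = threading.Lock()

    def next_host(self) -> str:
        # The lock makes each step of the rotation atomic across threads.
        with self._lock:
            return next(self._cycle)


def run_clients(picker, threads, requests_per_thread):
    """Simulate client threads and count the requests each host receives."""
    counts = Counter()
    counts_lock = threading.Lock()

    def worker():
        for _ in range(requests_per_thread):
            host = picker.next_host()
            with counts_lock:
                counts[host] += 1

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counts


counts = run_clients(RoundRobinHosts("compute0,compute1"), threads=8,
                     requests_per_thread=100)
print(counts)
```

In this sketch no thread is pinned to a node: the rotation is shared, so the 800 requests end up split evenly between the two hosts regardless of which thread issued them.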

As for intelligent routing, that is not built into the driver. Each cassandra node has the ability to handle the request routing. So, the driver will load balance, and once the request is received by a node, it will determine whether it can service it locally or if it needs to go to one or more of the other nodes. Your read/write consistency values will factor into this as well.
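A simplified sketch of how the consistency level changes the work done by the node that receives the request. The node names and both functions are hypothetical, and real Cassandra has more consistency levels and more detail than this; the only rule taken as given is that QUORUM means a majority of the replicas.

```python
def required_acks(consistency: str, replication_factor: int) -> int:
    """Number of replica responses needed before the request succeeds."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency.upper()]


def plan_request(coordinator: str, replicas: list, consistency: str) -> dict:
    """Describe what the receiving node (the coordinator) has to do."""
    acks = required_acks(consistency, len(replicas))
    local = coordinator in replicas
    # Simplification: if the coordinator holds a replica, its own copy
    # counts as one of the required responses.
    remote_needed = acks - 1 if local else acks
    return {"served_locally": local, "remote_responses_needed": remote_needed}


replicas = ["n1", "n2", "n3"]  # hypothetical nodes holding the key
print(plan_request("n1", replicas, "ONE"))
print(plan_request("n1", replicas, "QUORUM"))
print(plan_request("n4", replicas, "QUORUM"))
```

With consistency ONE, a request that happens to land on a replica needs no other node at all, whereas at QUORUM or ALL some inter-node traffic is unavoidable wherever the request lands.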

YCSB does not have built-in intelligent routing either, as it can be very db dependent. Some dbs will partition by key range whereas others will hash the key, for example. Even then, your db may decide it needs to rearrange the data based on its own algorithms. Doing routing on the client is, I would imagine, difficult, maybe even discouraged. Still, if that is something you want to test, you can certainly build something yourself as an add-on to YCSB.
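To show why client-side routing is db dependent, here is a minimal Python sketch of the two placement styles mentioned above. Everything in it is an assumption for illustration: the node names, the split points, and the use of md5 as a stand-in for whatever partitioner a real database uses.

```python
import bisect
import hashlib
from typing import List


def hash_route(key: str, nodes: List[str]) -> str:
    """Pick a node from a hash of the key (md5 is only a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def range_route(key: str, split_points: List[str], nodes: List[str]) -> str:
    """Pick a node by key range.

    split_points must be sorted and contain len(nodes) - 1 entries;
    keys below the first split point go to the first node, and so on.
    """
    return nodes[bisect.bisect_right(split_points, key)]


nodes = ["n0", "n1", "n2", "n3"]                 # hypothetical nodes
splits = ["user2500", "user5000", "user7500"]    # hypothetical ranges

print(range_route("user1000", splits, nodes))    # n0
print(range_route("user9000", splits, nodes))    # n3
print(hash_route("user1000", nodes))             # depends on the hash
```

A client that wanted to route requests itself would have to replicate the database's exact placement function and keep up with any rebalancing, which is the part that makes this hard to do generically.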

Sorry I didn't see this post 7 months ago. Although, 7 months ago, I was not as familiar with Cassandra. Hope this is still helpful now.


ghost commented Mar 24, 2014

Issue regarding CPU & MEMORY utilization
Hello,
I am working on a data-serving benchmark and have not run into any problems running it. However, when I vary the read/write/insert ratio and the record count, CPU and memory utilization do not increase as the load increases during the transaction phase. %CPU rises only at certain points in time, and it does not stay high for even a second.
My work is to build a mathematical model from these recorded values, but I am unable to build a model from readings like this.
Could you please suggest any kind of benchmark that would help me do this?

Thanks & Regards


busbey commented Jun 6, 2015

Closing as stale. If you are still having issues, please check with the current Cassandra bindings.

busbey closed this Jun 6, 2015

i-refugee commented May 8, 2016 edited

When I run a YCSB script against a Cassandra cluster, should I run it only on the master/seed node or on all nodes simultaneously?
I ask because when I run it on the master/seed node and check Ganglia, I see more activity on this node than on the other nodes of the cluster. Why is that?
Shouldn't all requests be balanced among the nodes?

Thanks and regards,
