New ElasticSearch Database Implementation #93

Closed
wants to merge 3 commits into from

3 participants

@saden1
saden1 commented Aug 26, 2012

I started a project to benchmark ElasticSearch. I used YCSB as part of my testing process and I would like to share my ElasticSearch database implementation.

Questions and feedback are welcome.

http://www.elasticsearch.org/

@m1ch1 m1ch1 pushed a commit that referenced this pull request Sep 11, 2012
saden1 gh-93 New ElasticSearch Database Implementation 3827d9a
@m1ch1
Collaborator
m1ch1 commented Sep 11, 2012

Hi Sharmarke,

Sorry it took a while to get to this. Thanks for the patch!

--Michi

@m1ch1 m1ch1 closed this Sep 11, 2012
@vaidik
vaidik commented Dec 19, 2012

I am new to both YCSB and ElasticSearch. I was able to run YCSB easily for Cassandra. However, I have not been able to do the same with ES (or perhaps I have but I am not sure).

Following the steps documented in YCSB/elasticsearch, I was able to start the test and I even got results. But I am not sure which ES instance it is running against. For Cassandra, I had to start the Cassandra server myself and then run the tests (providing the host details along with the ycsb command). ES, on the other hand, does not require anything of that sort. So how does YCSB run these tests? I didn't even have my local ES instance up, but the tests still gave results.

Any insights would really help.

Thanks!

@saden1
saden1 commented Dec 19, 2012

ElasticSearch enables you to start a local node ("embedded" instance inside the JVM) and that's exactly what is being used in YCSB. ElasticSearch can be confusing initially but it all comes down to understanding the fact that you can create a local node that does or doesn't contain data and that clients are obtained via these nodes. If a node doesn't contain data then it simply routes data to other nodes in the cluster.

You can also use a TransportClient to avoid having to create a node. If you wish to test against a "remote" node, you will have to create your own properties file (e.g. "myproperties.data") that contains your custom ElasticSearch node configuration and pass it into YCSB as described in the documentation. If you're planning on doing some funky and cool "remote" ElasticSearch testing, be sure to explicitly override the default properties, as that will ensure your configuration file is in full command of ElasticSearch.

Please take a look at "http://www.elasticsearch.org/guide/reference/java-api/client.html" and note the following:

"[C]ommon usage is to start the Node and use the Client in unit/integration tests. In such a case, we would like to start a “local” Node (with a “local” discovery and transport). Again, this is just a matter of a simple setting when starting the Node. Note, “local” here means local on the JVM (well, actually class loader) level, meaning that two local servers started within the same JVM will discover themselves and form a cluster."

@vaidik
vaidik commented Dec 20, 2012

And the current ESClient in YCSB does not implement TransportClient. Thanks for all the information and the link. Got to know a lot about ES this way.

Thanks :)

@saden1
saden1 commented Dec 20, 2012

There really isn't a good reason to use Transport Client as it is less efficient. For maximum performance it is recommended that you start a local dataless node that connects to your cluster and obtain a client from it.

If you still want to benchmark ES using Transport Client feel free to change the code to fit your test case.

@vaidik
vaidik commented Dec 20, 2012

Well you're right. TransportClient does look less efficient. Therefore, I completely ignored that option.

Local dataless node looks like the way to go. But how do I verify that the local node is able to connect to my cluster? As far as I know, my other ES instances show connection info as soon as another ES instance comes up. But that doesn't happen when I connect with the local node.

UPDATE: Running everything locally. Also, on checking /tmp/esdata manually, I was able to confirm that the supposedly dataless node is writing data to itself on running bin/ycsb load .... And the other ES node does not get any writes. Exactly the opposite is happening, most probably because the local node is not able to join the cluster.

Any clue what might be wrong?

@saden1
saden1 commented Dec 20, 2012

I suspect you don't have auto-discovery turned on or you're not using the same cluster name on both nodes. Try setting the following on both your local and remote nodes:

{code}
cluster.name=MyCluster
discovery.zen.ping.multicast.enabled=true
{code}

Also, you may want to check out one of the many front ends available to monitor your cluster:

http://www.elasticsearch.org/guide/appendix/clients.html

@saden1
saden1 commented Dec 20, 2012

"I was able to confirm that the supposedly dataless node is writing data to itself on running"

Ensure that "node.data" is set to false or "node.client" is set to true, and try setting "node.local" to false.
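As a rough illustration of that advice, here is a minimal Java sketch that inspects a properties object and warns when the three settings are not set as suggested. The class and method names (`NodeSettingsCheck`, `diagnose`) are hypothetical and not part of YCSB or ElasticSearch. The check treats a missing key as "not set as advised" rather than assuming any default value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of YCSB or ElasticSearch.
class NodeSettingsCheck {

    /** Returns warnings for settings that differ from the advice in this thread. */
    static List<String> diagnose(Properties props) {
        List<String> warnings = new ArrayList<>();

        // Either setting is described above as making the node dataless.
        boolean dataless = "false".equals(props.getProperty("node.data"))
                || "true".equals(props.getProperty("node.client"));
        if (!dataless) {
            warnings.add("node may store data: set node.data=false or node.client=true");
        }

        // A missing key is flagged too; no default value is assumed here.
        if (!"false".equals(props.getProperty("node.local"))) {
            warnings.add("node.local is not false: node may stay JVM-local");
        }
        return warnings;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("node.client", "true");
        props.setProperty("node.local", "true");
        for (String warning : diagnose(props)) {
            System.out.println("WARNING: " + warning);
        }
    }
}
```

This only checks the configuration text. It cannot tell whether the node actually joined the cluster.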

@saden1
saden1 commented Dec 20, 2012

One more thing: if you have a firewall or proxy in place, it can prevent you from using multicast discovery, and you may need to open the relevant ports for communication. An alternative to multicast discovery is unicast discovery. Assuming you have correctly configured your remote data nodes and know their IP addresses and ports, here's a configuration that should work:

# all nodes must have the same cluster name
cluster.name=ycsb.cluster

# create a dataless client node
node.client=true

# make sure this is unique: each node needs its own name, either set here or generated by elasticsearch
node.name=ycsb.client

node.local=false

# disable multicast and enable unicast
discovery.zen.ping.multicast.enabled=false
discovery.zen.ping.unicast.enabled=true

# you will need to make sure these match the ip addresses and ports of your remote nodes
# In this case I have two local nodes started and I made sure to explicitly
# set their "network.host" config properties to 127.0.0.1 and their "transport.tcp.port" 
# to 9301 and 9302 respectively
discovery.zen.ping.unicast.hosts=127.0.0.1[9301],127.0.0.1[9302]
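
Before starting YCSB, it can help to confirm that the transport ports listed in "discovery.zen.ping.unicast.hosts" accept TCP connections at all. The following is a minimal Java sketch for that. The class and method names (`UnicastHostProbe`, `parseHosts`, `reachable`) are hypothetical, and the parser assumes only the `host[port]` form shown above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YCSB or ElasticSearch.
class UnicastHostProbe {

    /** Parses comma-separated entries of the form host[port]. */
    static List<InetSocketAddress> parseHosts(String value) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            int open = trimmed.indexOf('[');
            int close = trimmed.indexOf(']');
            if (open <= 0 || close <= open + 1) {
                throw new IllegalArgumentException("expected host[port], got: " + trimmed);
            }
            String host = trimmed.substring(0, open);
            int port = Integer.parseInt(trimmed.substring(open + 1, close));
            // createUnresolved avoids a DNS lookup while parsing.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    /** Returns true if a TCP connection can be opened within timeoutMillis. */
    static boolean reachable(InetSocketAddress address, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(
                    new InetSocketAddress(address.getHostString(), address.getPort()),
                    timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String hosts = "127.0.0.1[9301],127.0.0.1[9302]";
        for (InetSocketAddress address : parseHosts(hosts)) {
            System.out.println(address.getHostString() + ":" + address.getPort()
                    + (reachable(address, 1000) ? " reachable" : " NOT reachable"));
        }
    }
}
```

A successful connection only shows that the port is open. It says nothing about whether the cluster name or the client and server versions match.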

Hope this helps.

@vaidik
vaidik commented Dec 23, 2012

I tried multicast following your directions and got a warning saying "failed to receive confirmation on sent ping response...". After trying a couple of things, I realized that this might be a firewall issue that I was not able to ID. Therefore, I switched to unicast with your settings. That failed again on my machine.

Then I set up a fresh Ubuntu install and tried unicast again. Finally I figured out that the client version specified in elasticsearch/pom.xml was 0.19.8, whereas I was using the latest ES server, 0.20.1. So the Java client used in YCSB/elasticsearch was not compatible with the server version, and that's why everything was failing. I changed the version in pom.xml, rebuilt YCSB, and it worked!
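
A quick guard against this kind of mismatch can be sketched in Java. The helper below is hypothetical (`VersionCheck` is not part of YCSB), and it assumes the safest rule is to require identical version strings. The thread only establishes that the 0.19.8 client failed against a 0.20.1 server, not which differences are tolerated.

```java
// Hypothetical helper, not part of YCSB or ElasticSearch.
class VersionCheck {

    /** Returns true only when both version strings are identical after trimming. */
    static boolean sameVersion(String clientVersion, String serverVersion) {
        return clientVersion.trim().equals(serverVersion.trim());
    }

    /** Describes the first dotted component that differs, or "" if none does. */
    static String firstDifference(String clientVersion, String serverVersion) {
        String[] client = clientVersion.trim().split("\\.");
        String[] server = serverVersion.trim().split("\\.");
        int length = Math.max(client.length, server.length);
        for (int i = 0; i < length; i++) {
            String c = i < client.length ? client[i] : "(missing)";
            String s = i < server.length ? server[i] : "(missing)";
            if (!c.equals(s)) {
                return "component " + (i + 1) + ": client=" + c + ", server=" + s;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        String client = "0.19.8"; // version from elasticsearch/pom.xml in this thread
        String server = "0.20.1"; // server version used in this thread
        if (!sameVersion(client, server)) {
            System.out.println("Version mismatch at " + firstDifference(client, server));
        }
    }
}
```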

Perhaps you should point this out somewhere in the README.

Thanks for your patience and all the help! :)

@joey joey added a commit to joey/YCSB that referenced this pull request Feb 7, 2013
saden1 gh-93 New ElasticSearch Database Implementation f0c8083
@wolfgangihloff wolfgangihloff pushed a commit that referenced this pull request Jan 21, 2015
saden1 gh-93 New ElasticSearch Database Implementation 968990f