I found this open comment, and it may be related to a question I was going to ask.
In your webinar, you mention that search requests from es-hadoop to elasticsearch distribute the query to nodes. You also mention if one node goes down, es-hadoop will move to another node.
My question is this:
We use client nodes (no data) for all our queries; no queries go directly to a data node.
Our environment is 1 dedicated master (no data), 2 or more client nodes (no data, HTTP enabled), and several data nodes (non-client).
How does es-hadoop distribute load in this case? Does it distribute load across the list of client nodes passed in via "es.nodes" in SparkConf, or is it doing some type of request routing within the query (through the client node)?
@jeffsteinmetz For some reason I've only found this comment now; apologies for the huge delay. The latest Beta (4) has support for client-only nodes: in other words, es-hadoop can be forced to connect to the cluster only through these nodes. Clearly this affects parallelism, since the queries are distributed between these nodes (instead of going to the data nodes directly), but performance doesn't seem to be affected too much; it really depends on the volume and how important locality is.
In other words, if you are doing HUGE bulk reads, you might find it slower; if not, you are unlikely to spot any difference.
By the way, I've closed the issue since Beta4 was just released. Let me know if you have any issues/queries potentially through the mailing list or another issue.
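For anyone landing here later, the client-only setup described above might be configured roughly as follows. This is only a sketch: the host names and the `client_only_settings` helper are hypothetical, and `es.nodes.client.only` is the setting name as I understand it from the es-hadoop configuration docs, so please verify it against the docs for the version you run.

```python
def client_only_settings(client_nodes, port=9200):
    """Build es-hadoop settings that send all traffic through client nodes.

    client_nodes: host names of the no-data, HTTP-enabled client nodes
                  (hypothetical values in the example below).
    """
    if not client_nodes:
        raise ValueError("at least one client node is required")
    return {
        # Comma-separated list of nodes es-hadoop connects to.
        "es.nodes": ",".join(client_nodes),
        "es.port": str(port),
        # Assumed setting name: restricts es-hadoop to client nodes only.
        "es.nodes.client.only": "true",
    }


settings = client_only_settings(["client1.example.com", "client2.example.com"])
for key, value in sorted(settings.items()):
    print(f"{key}={value}")

# With PySpark installed, the same pairs could be applied like this:
#   from pyspark import SparkConf
#   conf = SparkConf().setAll(list(settings.items()))
```

With this in place, reads and writes would be spread across the listed client nodes instead of going to the data nodes directly, which is the parallelism trade-off mentioned above.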
Double check behaviour in cluster with master-only or client/tribe nodes in terms of writing/reading data.