Resolve IP Address for spark.es.nodes param #623
Closed
This pull request proposes a fix for the following situation.
Given the parameters:
es.nodes.discovery = false
es.nodes.client.only = false
es.nodes.data.only = true
es.nodes = abc.example.com
where abc.example.com resolves to 1.2.3.4.
The data node on abc.example.com is incorrectly filtered out because the InitializationUtils class compares the IP address extracted from "_nodes/http" with the configured hostname, not with the IP address that hostname resolves to, so the two never match.
With debug logging enabled, the log shows:
"DEBUG ScalaEsRDD: Found data nodes [1.2.3.4:9200]"
"DEBUG ScalaEsRDD: Filtered discovered only nodes [abc.example.com:9200] to data-only []"
This raises an EsHadoopIllegalArgumentException: "No data nodes with HTTP-enabled available; node discovery is disabled and none of nodes specified fits the criterion [abc.example.com:9200]"
The proposed solution resolves the hostname to its IP address when qualifying nodes, so that subsequent comparisons are made between values of the same form.
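The idea can be illustrated with a minimal, standalone sketch. It is not the actual patch: the class `NodeResolver` and the methods `resolveNode` and `filterDataNodes` are hypothetical names introduced here for illustration, and the sketch assumes nodes are written as "host:port" with IPv4 addresses or hostnames (IPv6 literals are not handled).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of elasticsearch-hadoop.
public class NodeResolver {

    // Turns "host:port" into "ip:port" so it can be compared with the
    // addresses reported by the cluster. Assumes no IPv6 literals.
    static String resolveNode(String node) {
        int idx = node.lastIndexOf(':');
        String host = idx > 0 ? node.substring(0, idx) : node;
        String port = idx > 0 ? node.substring(idx) : "";
        try {
            return InetAddress.getByName(host).getHostAddress() + port;
        } catch (UnknownHostException e) {
            // Leave the entry untouched if the name cannot be resolved.
            return node;
        }
    }

    // Keeps only the configured nodes whose resolved address appears
    // in the list of data nodes.
    static List<String> filterDataNodes(List<String> configured, List<String> dataNodes) {
        List<String> kept = new ArrayList<>();
        for (String node : configured) {
            if (dataNodes.contains(resolveNode(node))) {
                kept.add(node);
            }
        }
        return kept;
    }
}
```

With this approach, a configured entry such as abc.example.com:9200 would be compared as 1.2.3.4:9200 (assuming the name resolves as described above) and would therefore no longer be filtered out.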