
Elasticsearch : Cannot detect ES version #791

Closed
FK7 opened this issue Jun 22, 2016 · 6 comments

@FK7

FK7 commented Jun 22, 2016

Hi,
I am using the Amazon Elasticsearch Service and a separate Spark cluster on EMR, and I am trying to run the elasticsearch-hadoop Apache Spark writing example. However, on submitting the job locally first, I get the following exception logged at INFO level:

16/06/22 13:56:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
16/06/22 13:56:26 INFO HttpMethodDirector: Retrying request
16/06/22 13:56:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
16/06/22 13:56:26 INFO HttpMethodDirector: Retrying request

Later on, I get the following exception.

Stack trace:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:190)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[search-spark-elasticsearch-5cengch4w4hghs5coa5xu2tyoq.us-east-1.es.amazonaws.com:9200]] 
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:414)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:122)
    at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:564)
    at org.elast

Following is the code I am running:

Code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.elasticsearch.spark.rdd.EsSpark

object SparkElasticSearch extends App {

  val conf = new SparkConf().setAppName("ElasticSearchTest").setMaster("local[*]")

  val sc = new SparkContext(conf)
  var esconf = Map("es.nodes" -> "search-spark-elasticsearch-xxxxx.xxxx.es.amazonaws.com", "es.nodes.wan.only" -> "true")

  val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
  val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

  import org.elasticsearch.spark._
  sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esconf)

}

Maven dependency

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>2.3.2</version>
</dependency>

Is it a bug, or am I missing something here?

@costin
Member

costin commented Jun 27, 2016

If you only specify the host, es-hadoop will use the default port (namely 9200), which is rarely the right one. Moreover, with AWS (as with any service provider) one needs to open up the firewall to allow access; otherwise connections from outside are not allowed.
You can double-check the connectivity by enabling logging (see the dedicated chapter in the docs).
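
As a minimal sketch of what an explicit configuration might look like (port 443 and the SSL flag are assumptions, since Amazon's managed service typically fronts the cluster over HTTPS; adjust them to your domain's actual endpoint):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(new SparkConf().setAppName("ElasticSearchTest").setMaster("local[*]"))

// Hypothetical endpoint and port -- substitute your own AWS ES domain.
val esconf = Map(
  "es.nodes" -> "search-spark-elasticsearch-xxxxx.xxxx.es.amazonaws.com",
  "es.port" -> "443",            // set the port explicitly instead of relying on the 9200 default
  "es.net.ssl" -> "true",        // assumption: the AWS endpoint only speaks HTTPS
  "es.nodes.wan.only" -> "true"  // required when the cluster sits behind a cloud/WAN boundary
)

sc.makeRDD(Seq(Map("one" -> 1))).saveToEs("spark/docs", esconf)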

Cheers,

@FK7
Author

FK7 commented Jun 28, 2016

@costin Thanks for your assistance.

@swapnilkumbhar1602

Please stop your firewall on the Elasticsearch host:
sudo service firewalld stop

@ghostband

I have the same question here, and my ES cluster was deployed alongside my Hadoop cluster.
I want to write data to ES with a Spark DataFrame.
Following is my Maven dependency:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark_2.10</artifactId>
  <version>5.0.0-alpha4</version>
</dependency>

es version 5.5.0
spark 1.6.0
scala 2.10.5

The exception is

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Can anyone help me?
Thanks very much.

@pavan9396

In our case, the issue was that we were overwriting the Spark default configuration in /opt/mapr/spark/spark-2.1.0/conf (we are using the MapR distribution),

and the Spark configuration we passed in our application was not binding to the SparkConf. As a result, it pointed to localhost (127.0.0.1:9200) during index creation; check your exception log to see if you hit the same thing.

I changed the configuration details in the application, passed them while creating the SparkSession object, and tested the application.

Now the application is working fine and I'm able to create the index in Elasticsearch and load the data.

SparkConf passed while creating the SparkSession:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.es.index.auto.create", "true")
  .set("spark.es.nodes", "yourESaddress")
  .set("spark.es.port", "9200")
  .set("spark.es.net.http.auth.user", "") // credentials redacted here
  .set("spark.es.net.http.auth.pass", "")
  .set("spark.es.resource", indexName)
  .set("spark.es.nodes.wan.only", "true")
val sparkSession = SparkSession.builder().config(sparkConf).appName("sourcedashboard").getOrCreate()
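
With that session in place, a write might look like the following sketch (the input path is a hypothetical stand-in; indexName is the same variable as above):

import org.elasticsearch.spark.sql._

// Hypothetical input -- any DataFrame will do.
val df = sparkSession.read.json("path/to/input.json")

// saveToEs comes from the org.elasticsearch.spark.sql implicits.
df.saveToEs(indexName)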

@marceau06

You have to add your ES port to your SparkConf; it is probably 9243 or 443, since your ES is running on AWS.

You can also redirect your local calls to ES through a forwarder:

apt install -y socat && socat tcp-listen:9200,fork tcp:your_elasticsearch_ip:9200

(replace 9200 in the last argument with 9243 if that is the port your cluster listens on)
