
Elasticsearch : Cannot detect ES version #791

Closed
FK7 opened this issue Jun 22, 2016 · 6 comments

@FK7

FK7 commented Jun 22, 2016

Hi,
I am using the Amazon Elasticsearch Service and a separate Spark cluster on EMR, and I am trying to run the elasticsearch-hadoop Apache Spark writing example. However, on submitting the job locally first, I get the following exception logged at INFO level:

16/06/22 13:56:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
16/06/22 13:56:26 INFO HttpMethodDirector: Retrying request
16/06/22 13:56:26 INFO HttpMethodDirector: I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
16/06/22 13:56:26 INFO HttpMethodDirector: Retrying request

Later on, I get the following exception.

Stack trace:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
    at org.elasticsearch.hadoop.rest.InitializationUtils.discoverEsVersion(InitializationUtils.java:190)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:379)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:40)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:67)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[search-spark-elasticsearch-5cengch4w4hghs5coa5xu2tyoq.us-east-1.es.amazonaws.com:9200]] 
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:434)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:414)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:418)
    at org.elasticsearch.hadoop.rest.RestClient.get(RestClient.java:122)
    at org.elasticsearch.hadoop.rest.RestClient.esVersion(RestClient.java:564)
    at org.elast

Following is the code I am running:

Code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.elasticsearch.spark.rdd.EsSpark

object SparkElasticSearch extends App {

  val conf = new SparkConf().setAppName("ElasticSearchTest").setMaster("local[*]")

  val sc = new SparkContext(conf)
  var esconf = Map("es.nodes" -> "search-spark-elasticsearch-xxxxx.xxxx.es.amazonaws.com", "es.nodes.wan.only" -> "true")

  val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
  val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

  import org.elasticsearch.spark._
  sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esconf)

}

Maven dependency

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>2.3.2</version>
</dependency>

Is it a bug, or am I missing something here?

@costin
Member

costin commented Jun 27, 2016

If you only specify the host, es-hadoop will use the default port (namely 9200), which is rarely the right one. Moreover, with AWS (as with any service provider) one needs to open up the firewall to allow access; otherwise connections from outside are not allowed.
You can double-check the connectivity by enabling logging (see the dedicated chapter in the docs).
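
As a minimal sketch of what an explicit configuration might look like (port 443 and the SSL flag are assumptions, since Amazon's managed service typically fronts the cluster over HTTPS; adjust them to your domain's actual endpoint):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(new SparkConf().setAppName("ElasticSearchTest").setMaster("local[*]"))

// Hypothetical endpoint and port -- substitute your own AWS ES domain.
val esconf = Map(
  "es.nodes" -> "search-spark-elasticsearch-xxxxx.xxxx.es.amazonaws.com",
  "es.port" -> "443",            // set the port explicitly instead of relying on the 9200 default
  "es.net.ssl" -> "true",        // assumption: the AWS endpoint only speaks HTTPS
  "es.nodes.wan.only" -> "true"  // required when the cluster sits behind a cloud/WAN boundary
)

sc.makeRDD(Seq(Map("one" -> 1))).saveToEs("spark/docs", esconf)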

Cheers,

@FK7
Author

FK7 commented Jun 28, 2016

@costin Thanks for your assistance.

@swapnilkumbhar1602

Please stop your firewall on the Elasticsearch host:
sudo service firewalld stop

@ghostband

I have the same question here, and my ES cluster was deployed alongside my Hadoop cluster.
I want to write data to ES with a Spark DataFrame.
Following is my Maven dependency:

<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark_2.10</artifactId>
  <version>5.0.0-alpha4</version>
</dependency>

es version 5.5.0
spark 1.6.0
scala 2.10.5

The exception is

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'

Can anyone help me?
Thanks very much.

@pavan9396

In our case, the issue was that we were overwriting the Spark default configuration in /opt/mapr/spark/spark-2.1.0/conf (we are using the MapR distribution),

and the Spark configuration we passed in our application was not binding to the SparkConf. As a result, it pointed to localhost (127.0.0.1:9200) during index creation; check your exception log to see if you hit the same thing.

I changed the configuration details in the application, passed them while creating the SparkSession object, and tested the application.

Now the application is working fine and I'm able to create the index in Elasticsearch and load the data.

SparkConf passed while creating the SparkSession:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.es.index.auto.create", "true")
  .set("spark.es.nodes", "yourESaddress")
  .set("spark.es.port", "9200")
  .set("spark.es.net.http.auth.user", "") // credentials redacted here
  .set("spark.es.net.http.auth.pass", "")
  .set("spark.es.resource", indexName)
  .set("spark.es.nodes.wan.only", "true")
val sparkSession = SparkSession.builder().config(sparkConf).appName("sourcedashboard").getOrCreate()
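
With that session in place, a write might look like the following sketch (the input path is a hypothetical stand-in; indexName is the same variable as above):

import org.elasticsearch.spark.sql._

// Hypothetical input -- any DataFrame will do.
val df = sparkSession.read.json("path/to/input.json")

// saveToEs comes from the org.elasticsearch.spark.sql implicits.
df.saveToEs(indexName)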

@marceau06

You have to add your ES port to your SparkConf; it is probably 9243 or 443, since your ES is running on AWS.

You can also redirect your local calls to ES through a forwarder:

apt install -y socat && socat tcp-listen:9200,fork tcp:your_elasticsearch_ip:9200

(replace 9200 in the last argument with 9243 if that is the port your cluster listens on)
