I'm seeing an issue when using pyspark, Elasticsearch 1.7.0, and elasticsearch-hadoop-2.1.2.jar on Spark 1.5.1 (all running on my OS X Yosemite system). I run the simple program shown below (from the article at http://qbox.io/blog/elasticsearch-in-apache-spark-python). After the print(es_rdd.first()) statement is executed, pyspark just hangs:
Using Python version 2.7.10 (default, Oct 19 2015 18:31:17)
SparkContext available as sc, HiveContext available as sqlContext.
>>> es_rdd = sc.newAPIHadoopRDD(
... inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
... keyClass="org.apache.hadoop.io.NullWritable",
... valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
... conf={ "es.resource" : "titanic/passenger" })
15/11/27 18:16:41 WARN EsInputFormat: Cannot determine task id...
>>> print(es_rdd.first())
15/11/27 18:16:50 WARN EsInputFormat: Cannot determine task id...
15/11/27 18:16:51 WARN SimpleHttpConnectionManager: SimpleHttpConnectionManager being used incorrectly. Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or method is using this connection manager at a time.
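For what it's worth, pinning the connector to a known REST endpoint can help rule out node-discovery problems when the job hangs like this. A minimal sketch of the same conf dict with es.nodes and es.port set explicitly (the host value is illustrative, not taken from my logs):

```python
# Hypothetical conf for sc.newAPIHadoopRDD: point the connector at a fixed
# HTTP endpoint instead of relying on node discovery.
conf = {
    "es.resource": "titanic/passenger",  # index/type to read
    "es.nodes": "10.0.0.2",              # assumed ES host (illustrative)
    "es.port": "9200",                   # REST port (9300 is the transport port)
}

# With a live SparkContext `sc`, the call would then be:
# es_rdd = sc.newAPIHadoopRDD(
#     inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=conf)
```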
When I stop Elasticsearch I get the following output:
15/11/27 18:33:32 ERROR NetworkClient: Node [10.0.0.2:9200] failed (The server 10.0.0.2 failed to respond with a valid HTTP response); no other nodes left - aborting...
15/11/27 18:33:32 WARN NewHadoopRDD: Exception in RecordReader.close()
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.0.0.2:9200]]
at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:329)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:337)
at org.elasticsearch.hadoop.rest.RestClient.deleteScroll(RestClient.java:403)
at org.elasticsearch.hadoop.rest.ScrollQuery.close(ScrollQuery.java:70)
at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.close(EsInputFormat.java:262)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.org$apache$spark$rdd$NewHadoopRDD$$anon$$close(NewHadoopRDD.scala:190)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$3.apply(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$3.apply(NewHadoopRDD.scala:156)
at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
at org.apache.spark.scheduler.Task.run(Task.scala:90)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
(u'892', {u'fare': u'7.8292', u'name': u'Kelly, Mr. James', u'embarked': u'Q', u'age': u'34.5', u'parch': u'0', u'pclass': u'3', u'sex': u'male', u'ticket': u'330911', u'passengerid': u'892', u'sibsp': u'0', u'cabin': None})
Note that 10.0.0.2 is the IP address of my Mac. At any rate, I do eventually get the expected output (the last line above) after a series of error messages. When I use elasticsearch-hadoop-2.1.0.jar instead of 2.1.2, I do not see this problem and the program runs without error.
Is this an incompatibility between elasticsearch-hadoop-2.1.2.jar, Elasticsearch 1.7.0, and Spark 1.5.1?
I am getting a similar error: ERROR NetworkClient: Node [<...>:9300] failed (The server <...> failed to respond with a valid HTTP response); no other nodes left - aborting... Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[<...>:9300]]
(<...> stands for the actual IP.)
I am using Spark 1.5.1, Elasticsearch 1.7.1, and elasticsearch-spark_2.11 version 2.1.1.
Port is open, I am able to connect with a different client.
Any hints appreciated.
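One thing worth double-checking: the log above shows port 9300, which is Elasticsearch's transport port, while the connector talks to the REST API, which listens on 9200 by default. A quick stdlib-only probe to confirm the REST port is actually reachable (host and port here are illustrative assumptions):

```python
import socket

def es_reachable(host, port=9200, timeout=3):
    """Return True if something accepts TCP connections on host:port.
    The ES REST port is 9200 by default; 9300 is the transport port,
    which the hadoop/spark connector does not use."""
    try:
        s = socket.create_connection((host, port), timeout)
        s.close()
        return True
    except socket.error:
        return False
```

Calling es_reachable("10.0.0.2") against the actual node should return True before attempting the Spark job; if only port 9300 answers, the connector's configured port is wrong.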
I'm pretty sure you bumped into the same issue described in #591. This has been fixed in master and will be included in the upcoming 2.2 rc1. Can you please check it out once it is released and, if it's still not working, reopen the issue?