
SimpleHttpConnectionManager problem with elasticsearch-hadoop-2.1.2.jar on Spark #618

Closed
vichargrave opened this issue Nov 28, 2015 · 3 comments

vichargrave commented Nov 28, 2015

I'm seeing an issue when using pyspark with Elasticsearch 1.7.0 and elasticsearch-hadoop-2.1.2.jar on Spark 1.5.1 (all running on my OS X Yosemite system). I run the simple program shown below (from the article at http://qbox.io/blog/elasticsearch-in-apache-spark-python). After the print(es_rdd.first()) statement is executed, pyspark just hangs:

Using Python version 2.7.10 (default, Oct 19 2015 18:31:17)
SparkContext available as sc, HiveContext available as sqlContext.

>>> es_rdd = sc.newAPIHadoopRDD(
...     inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
...     keyClass="org.apache.hadoop.io.NullWritable",
...     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
...     conf={ "es.resource" : "titanic/passenger" })
15/11/27 18:16:41 WARN EsInputFormat: Cannot determine task id...
>>> print(es_rdd.first())
15/11/27 18:16:50 WARN EsInputFormat: Cannot determine task id...
15/11/27 18:16:51 WARN SimpleHttpConnectionManager: SimpleHttpConnectionManager being used incorrectly.  Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or method is using this connection manager at a time.

When I stop Elasticsearch, I get the following output:

15/11/27 18:33:32 ERROR NetworkClient: Node [10.0.0.2:9200] failed (The server 10.0.0.2 failed to respond with a valid HTTP response); no other nodes left - aborting...
15/11/27 18:33:32 WARN NewHadoopRDD: Exception in RecordReader.close()
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.0.0.2:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:142)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:329)
    at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:337)
    at org.elasticsearch.hadoop.rest.RestClient.deleteScroll(RestClient.java:403)
    at org.elasticsearch.hadoop.rest.ScrollQuery.close(ScrollQuery.java:70)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.close(EsInputFormat.java:262)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.org$apache$spark$rdd$NewHadoopRDD$$anon$$close(NewHadoopRDD.scala:190)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$3.apply(NewHadoopRDD.scala:156)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$3.apply(NewHadoopRDD.scala:156)
    at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
(u'892', {u'fare': u'7.8292', u'name': u'Kelly, Mr. James', u'embarked': u'Q', u'age': u'34.5', u'parch': u'0', u'pclass': u'3', u'sex': u'male', u'ticket': u'330911', u'passengerid': u'892', u'sibsp': u'0', u'cabin': None})

Note that 10.0.0.2 is the IP address of my Mac. At any rate, I end up getting the expected output (the last line above) after a series of error messages. When I use elasticsearch-hadoop-2.1.0.jar instead of 2.1.2, I do not see this problem and the program runs without error.

Is this an incompatibility between elasticsearch-hadoop-2.1.2.jar, Elasticsearch 1.7.0, and Spark 1.5.1?
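
For reference, the shell session above can also be run as a standalone script when comparing the two connector versions; here is a minimal sketch (the file name and jar path are hypothetical, and it assumes Elasticsearch is reachable on 10.0.0.2:9200):

    # es_read_test.py -- standalone version of the shell session above.
    # Submitted with, e.g.: spark-submit --jars elasticsearch-hadoop-2.1.2.jar es_read_test.py
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("es-read-test")
    sc = SparkContext(conf=conf)

    es_rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={
            "es.resource": "titanic/passenger",
            # Pinning the node and port explicitly avoids relying on defaults:
            "es.nodes": "10.0.0.2",
            "es.port": "9200",
        })

    print(es_rdd.first())
    sc.stop()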

larghir commented Dec 22, 2015

I am getting a similar error:
ERROR NetworkClient: Node [<...>:9300] failed (The server <...> failed to respond with a valid HTTP response); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[<...>:9300]]

<...> stands for the actual IP

I am using Spark 1.5.1, Elasticsearch 1.7.1, and elasticsearch-spark_2.11 version 2.1.1.

The port is open and I am able to connect to it with a different client.
Any hints appreciated.
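
One thing worth double-checking in a setup like this: elasticsearch-hadoop talks to Elasticsearch over its HTTP interface (port 9200 by default), not the 9300 transport port used by the native Java client, so a quick probe of the REST endpoint confirms the connector can actually reach it. A minimal sketch in Python 2 (substitute the real host for the placeholder):

    import urllib2

    # Hit the Elasticsearch REST endpoint that elasticsearch-hadoop uses;
    # a JSON banner with the cluster name and version indicates success.
    # Replace <es-host> with the actual IP; 9200 is the connector's default port.
    response = urllib2.urlopen("http://<es-host>:9200/", timeout=5)
    print(response.read())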

costin (Member) commented Jan 8, 2016

I'm pretty sure you bumped into the same issue as described in #591. This has been fixed in master and will be included in the upcoming 2.2 rc1. Can you please check it out once it is released and, if it's not working, reopen the issue?

Thanks,

vichargrave (Author) commented:

OK, will do. Thanks.

On 1/8/16, Costin Leau notifications@github.com wrote:

Closed #618.


