scan search type has been deprecated but still post in search #682

Closed
fmyblack opened this issue Jan 29, 2016 · 2 comments

Comments

@fmyblack

Hi,
I want to read from Elasticsearch into Spark, following the native Spark integration described in the "Apache Spark support" documentation (spark-native).
Here is my configuration:

conf.set("es.nodes", "192.168.2.93");
conf.set("es.port", "9205");
conf.set("es.internal.es.version", "2.1.1");
conf.set("es.resource", "9zdata_ip_multi/access_log");
String query = "{\"query\": {\"match_all\": {}}}";
conf.set("es.query", query);

JavaSparkContext jsc = new JavaSparkContext(conf);
JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(jsc);

I found this WARN message in the Spark job:

16/01/29 10:09:53 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.2.95): 
org.apache.spark.util.TaskCompletionListenerException: 
[POST] on [9zdata_ip_multi/access_log/_search?search_type=scan&scroll=5&size=50&preference=_shards:0;_only_node:a2U5CzjsS0iAdz_hov_-ag] failed; 
server[192.168.2.93:9205] returned [400|Bad Request:]
        at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:90)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

I know the scan search type has been deprecated in Elasticsearch 2.1, so why is search_type=scan still being posted? Also, scroll=5 looks invalid; I think it should be scroll=5m.
How can I resolve this issue?

@costin
Member

costin commented Jan 29, 2016

@fmyblack Specifying es.internal properties is not supported; they are undocumented and are not meant to be used outside the library itself.
Additionally, if you don't specify a query, the default matches everything; since your query is just a match_all, simply remove it.

Also, there's no indication of what caused the bad request. You assume it's the URL, but it could just as well be the body.

Based on the scroll param, you are likely using ES-Hadoop 2.1.x, which does not support Elasticsearch 2.0 or higher. For that, you need to run ES-Hadoop 2.2 (currently rc1).

P.S. Thanks for taking the time to format the post and look through the code to make things work.
However, the project should not be hard to use; if something fails, please report it right away (including details such as the version), as that helps resolve the situation faster.
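
For readers landing here, the advice above can be sketched in plain Java. This is only an illustration under assumptions: `EsReadSettings`, `forResource` and `requireNoInternalKeys` are hypothetical names and are not part of ES-Hadoop. The sketch keeps the three documented settings from the original post, leaves out `es.internal.es.version` and `es.query`, and rejects any `es.internal.*` key before the settings would be copied onto the `SparkConf`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of ES-Hadoop) that collects connector settings. */
final class EsReadSettings {

    // Prefix of the undocumented, library-internal settings mentioned in the reply above.
    static final String INTERNAL_PREFIX = "es.internal.";

    /** Builds the settings from the original post, minus es.internal.es.version and es.query. */
    static Map<String, String> forResource(String nodes, String port, String resource) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.nodes", nodes);
        settings.put("es.port", port);
        settings.put("es.resource", resource);
        // No es.query entry: per the reply above, everything is matched by default.
        return settings;
    }

    /** Throws IllegalArgumentException if any key starts with the internal prefix. */
    static void requireNoInternalKeys(Map<String, String> settings) {
        for (String key : settings.keySet()) {
            if (key.startsWith(INTERNAL_PREFIX)) {
                throw new IllegalArgumentException("Unsupported internal setting: " + key);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> settings =
                forResource("192.168.2.93", "9205", "9zdata_ip_multi/access_log");
        requireNoInternalKeys(settings);
        // Each entry would then be copied onto the SparkConf with conf.set(key, value)
        // before calling JavaEsSpark.esRDD(jsc), as in the original post.
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The check does not replace the version fix: according to the reply above, reading from Elasticsearch 2.x still requires ES-Hadoop 2.2 or later on the classpath.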

costin added v2.3.0 and removed v2.2.0 labels Jan 31, 2016
@fmyblack
Author

fmyblack commented Feb 1, 2016

@costin OK, it works. Thank you

fmyblack closed this as completed Feb 1, 2016