As @costin mentioned in #469, I have been trying to use a batch.size setting to control the number of documents returned from Elasticsearch into the RDD.
However, when I looked at the configuration options, the only batch.size settings I could find were es.batch.size.bytes and es.batch.size.entries, which do not appear to limit the number of documents returned from Elasticsearch. When I tried these options, elasticsearch-spark did not limit the results either.
What is the option to limit the number of documents returned from elasticsearch-spark?
Thanks
Provided a fix in master through es.scroll.limit. By default it reads all entries; however, when a positive value is specified, it limits the reads to that number.
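For reference, a minimal sketch of how the setting can be applied from a Spark job (the node address and the index name "my-index/my-type" are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf()
  .setAppName("es-scroll-limit-example")
  .set("es.nodes", "localhost:9200")
  // read at most this many documents per scroll (i.e. per task)
  .set("es.scroll.limit", "1000")

val sc = new SparkContext(conf)
val rdd = sc.esRDD("my-index/my-type")
println(rdd.count())
```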
Thanks! It works nicely. I am guessing the es.scroll.limit value is applied to each shard? When I set it to 1 and run it, I get 5 results (index with 5 primary shards).
It is the total number of results/items returned by each individual scroll. A negative value indicates that all matching documents should be returned. Note that this applies per scroll, which is typically bound to one of the job tasks. Thus the total number of documents returned is LIMIT * NUMBER_OF_SCROLLS (or tasks).
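So with one task per primary shard, es.scroll.limit = 1 on a 5-shard index yields 1 * 5 = 5 documents, which matches the behaviour reported above. The limit can also be supplied per read via the configuration-map overload of esRDD (a sketch; the index name is a placeholder):

```scala
// es.scroll.limit applies per scroll/task, so the total returned is
// limit * number of tasks (typically one task per shard).
val limited = sc.esRDD("my-index/my-type", Map("es.scroll.limit" -> "1"))
```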
When the index is empty, I get the following error: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: ElasticsearchIllegalArgumentException[Malformed scrollId []]
Is there another way to limit the number of docs returned?