Using the EsSpark.esRDD method to read from Elasticsearch does not honor size parameters in either URI or DSL #469

Closed
sherry-ger opened this issue Jun 5, 2015 · 4 comments

Comments

@sherry-ger

The size option is ignored: both the URI form and the query DSL form return all matching results.

import org.apache.hadoop.io.{MapWritable, Text}
import org.elasticsearch.hadoop.mr.EsInputFormat

// URI query variant:
// conf.set("es.query", "?q=text_entry:rebel&size=1")

// Query DSL variant:
val q1 = """{"query": {"filtered": {"query": {"term": {"text_entry": "rebel"}}}}, "size": 1}"""
conf.set("es.query", q1)
val esRDD = sc.newAPIHadoopRDD(conf, classOf[EsInputFormat[Text, MapWritable]], classOf[Text], classOf[MapWritable])

Maybe this is similar to #444.

@costin
Member

costin commented Jun 12, 2015

This is actually on purpose and needs to be documented. Because the connector runs the query in parallel, it also controls the number of documents being returned, so if the user specifies a size parameter, the connector overrides it according to the batch.size setting (see the configuration options).
In other words, if you want to control the size, do so through that setting, as it always takes precedence.
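
For readers landing here, a minimal sketch of what that advice could look like in practice. It assumes the "batch.size setting" mentioned above corresponds to the read-side `es.scroll.size` option from the connector's configuration reference (documents fetched per scroll request, not a cap on the total). The index name `shakespeare/line`, the application name, and the local master are placeholders, and Elasticsearch is assumed to be reachable at the connector's default address. To get a fixed number of documents overall, the sketch limits on the Spark side with `take`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{MapWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.hadoop.mr.EsInputFormat

object EsSizeSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder Spark setup; adjust master and app name for your environment.
    val sc = new SparkContext(
      new SparkConf().setAppName("es-size-sketch").setMaster("local[*]"))

    val conf = new Configuration()
    conf.set("es.resource", "shakespeare/line") // assumed index/type, not from the issue
    conf.set("es.query", "?q=text_entry:rebel") // no size here: the connector overrides it
    // Assumption: this is the setting the comment above refers to.
    // It sets how many documents each scroll request returns, not the total.
    conf.set("es.scroll.size", "1")

    val esRDD = sc.newAPIHadoopRDD(
      conf,
      classOf[EsInputFormat[Text, MapWritable]],
      classOf[Text],
      classOf[MapWritable])

    // Hadoop Writables are not java.io.Serializable, so convert to plain
    // Scala values before bringing anything back to the driver.
    val firstIds = esRDD.map { case (id, _) => id.toString }.take(1)

    firstIds.foreach(println)
    sc.stop()
  }
}
```

Because each task scrolls its own shard, the total number of documents read is not bounded by `es.scroll.size`; the `take(1)` call is what limits the result here. Later connector releases also document an `es.scroll.limit` option that caps the number of results per scroll; whether it is available depends on the connector version in use.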

@apatrida

It's nice to have docs...

@costin
Member

costin commented Oct 28, 2015

Relates to #546.

@costin
Member

costin commented Oct 29, 2015

Fixed through #546

@costin costin closed this as completed Oct 29, 2015