
Shard size estimations for Slice API do not target shards #843

Closed
jbaiera opened this issue Sep 8, 2016 · 0 comments

Comments

@jbaiera
Member

jbaiera commented Sep 8, 2016

When passing a shard id to the RestClient#count method, I noticed that the request parameter it renders is &preference=0 when it should be &preference=_shards:0. According to the Elasticsearch documentation:

_shards:2,3: Restricts the operation to the specified shards.
Custom (string) value: A custom value will be used to guarantee that the same shards will be used for the same custom value.

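A bare 0 is treated as a custom preference string, so it only makes Elasticsearch route the request to the same shard copies each time; it does not restrict the request to shard 0. As a result, with &preference=0 the count method returns the count for the entire index instead of just the count for the shard it is targeting.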
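A minimal Java sketch of the two renderings, for illustration only. The class ShardCountQuery, its methods, and the index name my-index are hypothetical and are not taken from the es-hadoop RestClient; the sketch just shows where the _shards: prefix belongs.

```java
// Hypothetical helper, not the actual es-hadoop RestClient code. It only
// illustrates how the preference parameter should be rendered when a count
// request targets a single shard.
class ShardCountQuery {

    // Buggy rendering described in the issue: the bare shard id is sent,
    // which Elasticsearch interprets as a custom preference string.
    static String buggyPreference(int shardId) {
        return Integer.toString(shardId);
    }

    // Corrected rendering: the _shards: prefix restricts the request to that shard.
    static String shardPreference(int shardId) {
        return "_shards:" + shardId;
    }

    // Builds a count URI (relative to the cluster root) for one shard of an index.
    static String countUri(String index, int shardId) {
        return index + "/_count?preference=" + shardPreference(shardId);
    }

    public static void main(String[] args) {
        System.out.println(countUri("my-index", 0)); // my-index/_count?preference=_shards:0
    }
}
```

With the prefix in place, each per-shard count request reports only the documents held by that shard.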

For example, suppose an index holds 100 documents across 5 shards, so roughly 20 documents land in each shard. If we set es.input.maxdocsperpartition to 10, we would expect about 10 input splits (20 docs per shard ÷ 10 max docs per partition = 2 splits per shard, × 5 shards). Currently, because the count method returns the count for the entire index instead of on a shard-by-shard basis, we get 50 input splits (100 docs ÷ 10 max docs per partition = 10 splits per shard, × 5 shards).
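The arithmetic in the example can be checked with a small Java sketch. It assumes each shard is split into ceil(count / maxDocsPerPartition) partitions; that ceiling rule and the class name SplitEstimate are assumptions for illustration, not the es-hadoop implementation.

```java
import java.util.Arrays;

// Sketch of the partition arithmetic from the example. The per-shard ceiling
// division is an assumption about how partitions follow from
// es.input.maxdocsperpartition, not code from es-hadoop.
class SplitEstimate {

    // Partitions needed for one shard, given the document count used for it.
    static long partitionsForShard(long docCount, long maxDocsPerPartition) {
        // Ceiling division: a partially filled partition still counts as one.
        return (docCount + maxDocsPerPartition - 1) / maxDocsPerPartition;
    }

    // Sums the partitions over all shards.
    static long totalPartitions(long[] countsPerShard, long maxDocsPerPartition) {
        return Arrays.stream(countsPerShard)
                .map(c -> partitionsForShard(c, maxDocsPerPartition))
                .sum();
    }

    public static void main(String[] args) {
        long maxDocs = 10;
        // Expected: each of the 5 shards reports its own count of about 20.
        long[] perShard = {20, 20, 20, 20, 20};
        // Current behavior: every shard reports the whole-index count of 100.
        long[] wholeIndex = {100, 100, 100, 100, 100};
        System.out.println(totalPartitions(perShard, maxDocs));   // 10
        System.out.println(totalPartitions(wholeIndex, maxDocs)); // 50
    }
}
```

Feeding the whole-index count into every shard inflates the split count by a factor equal to the number of shards, which matches the 10 versus 50 splits described above.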
