When a shard id is passed to the RestClient#count method, the request parameter it renders is &preference=0 when it should be &preference=_shards:0. According to the Elasticsearch documentation:
_shards:2,3: Restricts the operation to the specified shards.
Custom (string) value: A custom value will be used to guarantee that the same shards will be used for the same custom value.
When using &preference=0, the count method pulls back the count for the entire index instead of just the count for the shard it's targeting.
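As a minimal sketch of the expected parameter, the snippet below builds the query string for a shard-scoped count. The helper name `count_query` is hypothetical and is not part of elasticsearch-hadoop; it only illustrates the `_shards:<id>` form quoted from the documentation above.

```python
from urllib.parse import urlencode


def count_query(shard_id):
    """Build the query string for a count restricted to one shard.

    Hypothetical helper for illustration only.
    """
    # "_shards:<id>" restricts the operation to that shard. Per the
    # documentation quoted above, a bare value such as "0" falls under the
    # custom-string form instead.
    # safe=":" keeps the colon readable instead of encoding it as %3A.
    return urlencode({"preference": f"_shards:{shard_id}"}, safe=":")


print(count_query(0))  # preference=_shards:0
```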
For example, if we have 100 documents in an index with 5 shards, we can assume that about 20 documents will appear in each shard. If we set es.input.maxdocsperpartition to 10, we would expect about 10 input splits (20 docs/shard divided by 10 maximum docs/partition, times 5 shards). Currently, because the count method returns the count for the entire index instead of the count for each shard, we get 50 input splits (100 docs divided by 10 maximum docs/partition, times 5 shards).
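The arithmetic above can be checked with a short sketch. The function name `total_splits` is hypothetical, and rounding up with `ceil` is an assumption on my part; the numbers in this example divide evenly, so the rounding does not affect the result.

```python
import math


def total_splits(counts_seen_per_shard, max_docs_per_partition):
    """Sum the splits created for each shard, given the count seen for it.

    Hypothetical helper; rounding up is an assumption.
    """
    return sum(
        math.ceil(count / max_docs_per_partition)
        for count in counts_seen_per_shard
    )


# Expected: each of the 5 shards reports its own ~20 documents.
print(total_splits([20] * 5, 10))   # 10

# Current behaviour: each shard reports the whole index (100 documents).
print(total_splits([100] * 5, 10))  # 50
```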