Use new parallel reader API in ES 5.x #778

costin · 2016-06-02T13:27:24Z

ES 5.x has a proposal/PR (elastic/elasticsearch/pull/18237) that allows parallel reads of an alias/index for a given query, split into an arbitrary number of slices. This goes beyond the capabilities of scan/scroll in ES 1.x/2.x, and it is desirable in highly parallel environments or those with memory/processing constraints (but a high number of instances).
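
For readers unfamiliar with the slicing idea, here is a minimal sketch of the request bodies involved, assuming the sliced scroll shape documented for ES 5.x (a `slice` object with `id` and `max` next to the `query`). The helper name `sliced_scroll_bodies` is hypothetical and nothing here is taken from the connector's code; each body would be sent as its own scroll search, so the slices can be consumed by independent workers.

```python
import json


def sliced_scroll_bodies(query, num_slices):
    """Build one search body per slice for the same query.

    Assumption: Elasticsearch expects `max` to be greater than 1,
    so a single slice is treated as an error here.
    """
    if num_slices < 2:
        raise ValueError("num_slices must be at least 2")
    return [
        {"slice": {"id": slice_id, "max": num_slices}, "query": query}
        for slice_id in range(num_slices)
    ]


if __name__ == "__main__":
    # Each printed body is meant for a separate scroll request.
    for body in sliced_scroll_bodies({"match_all": {}}, 3):
        print(json.dumps(body))
```

Every slice carries the same query and differs only in its `id`, which is what lets a reader assign one slice per task.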

garyelephant · 2016-07-05T09:38:55Z

When will this feature be added?

Q-RK · 2016-07-05T10:19:19Z

It will help me a lot

costin · 2016-07-12T17:10:57Z

@jimferenczi Can you please link (if you already have a ticket in place) or update this issue with your progress? Thanks.

jimczi · 2016-07-21T13:55:38Z

@costin you can find the progress here:
#812

This commit changes how we split the query into multiple partitions.

For clusters running a version prior to v5.x:

* We create one partition for each shard of each requested index.
* If an alias is requested, the search routing and the alias filter are respected.
* The partition is no longer attached to a node or an IP. Only the shardId and index name are defined, so that any replica in the cluster can be used when the partition is consumed. This makes a retry possible if a node disappears during a job.
* The ability to consume a partition on the node that is responsible for the index/shardId has been removed temporarily and should be re-added in a follow-up.

For clusters running version v5.x:

* We first split by index, then by shard, and finally by the maximum number of documents allowed per partition (configurable through the new option named es.input.maxdocsperpartition). For instance, for an index with 5 shards, 1M documents and a maximum of 100,000 documents per partition, a match all query would be split into 50 partitions, 10 partitions per shard.
* If an alias is requested, the search routing and the alias filter are respected.

Fixes elastic#778
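
The v5.x split (by index, then by shard, then by the maximum number of documents per partition) can be sketched as below. This is an illustration only, not the connector's code: the function names and the tuple layout are hypothetical, and rounding up to a whole number of partitions is an assumption. Note that the quoted figures (10 partitions per shard, 50 in total) match a reading where each shard holds 1M documents; if the 1M documents were spread evenly over the 5 shards, the same arithmetic would give 2 per shard and 10 in total.

```python
def partitions_per_shard(docs_in_shard, max_docs_per_partition):
    """Number of partitions for one shard (assumed: round up, minimum 1)."""
    if max_docs_per_partition <= 0:
        raise ValueError("max_docs_per_partition must be positive")
    # Integer ceiling division avoids floating-point rounding.
    needed = (docs_in_shard + max_docs_per_partition - 1) // max_docs_per_partition
    return max(1, needed)


def plan_partitions(shard_doc_counts, max_docs_per_partition):
    """Return (index, shard_id, slice_id, slice_count) tuples.

    shard_doc_counts maps an index name to a list holding the number of
    matching documents in each of its shards (hypothetical input shape).
    """
    plan = []
    for index in sorted(shard_doc_counts):              # split by index
        for shard_id, docs in enumerate(shard_doc_counts[index]):  # then by shard
            count = partitions_per_shard(docs, max_docs_per_partition)
            for slice_id in range(count):               # then by document budget
                plan.append((index, shard_id, slice_id, count))
    return plan


if __name__ == "__main__":
    per_shard = plan_partitions({"logs": [1_000_000] * 5}, 100_000)
    print(len(per_shard))   # 50: 10 partitions for each of the 5 shards
    spread = plan_partitions({"logs": [200_000] * 5}, 100_000)
    print(len(spread))      # 10: 2 partitions for each of the 5 shards
```

Because each tuple names only the index, shard and slice, a partition planned this way is not tied to a node, which fits the retry behaviour described above.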

costin added feature :Rest v5.0.0-alpha4 labels Jun 2, 2016

This was referenced Jun 2, 2016

Read failure when index/alias spread among 32 or more nodes. #737

Closed

Allow a higher number of partitions with an RDD #528

Closed

costin added v5.0.0-alpha5 and removed v5.0.0-alpha4 labels Jun 28, 2016

costin assigned jbaiera Jul 12, 2016

jimczi mentioned this issue Jul 21, 2016

Add the ability to create IndexPartition based on the desired number of documents per split #812

Closed

jbaiera closed this as completed in 77f2564 Jul 28, 2016
