Use new parallel reader API in ES 5.x #778
This was referenced Jun 2, 2016
When will this feature be added?
It will help me a lot.
@jimferenczi Can you please link (if you already have a ticket in place) or update this issue with your progress? Thanks.
jimczi added a commit to jimczi/elasticsearch-hadoop that referenced this issue Jul 26, 2016
This commit changes how we split the query into multiple partitions.

For clusters running a version prior to v5.x:
* We create one partition for each shard of each requested index.
* If an alias is requested, the search routing and the alias filter are respected.
* The partition is no longer attached to a node or an IP. Only the shardId and index name are defined, so that any replica in the cluster can be used when the partition is consumed. This makes a retry possible if a node disappears during a job.
* The ability to consume a partition on the node that is responsible for the index/shardId has been removed temporarily and should be re-added in a follow-up.

For clusters running v5.x:
* We first split by index, then by shard, and finally by the maximum number of documents allowed per partition (configurable through the new option named es.input.maxdocsperpartition). For instance, for an index with 5 shards, 1M documents per shard, and a maximum of 100,000 documents per partition, a match-all query would be split into 50 partitions, 10 partitions per shard.
* If an alias is requested, the search routing and the alias filter are respected.

Fixes elastic#778
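The v5.x split arithmetic described in the commit message can be sketched as follows. This is only an illustration, not the connector's code: the function name `partitions_per_shard` is hypothetical, the example is read as 1M documents in each of the 5 shards (the reading under which the quoted figures of 50 and 10 hold), and giving an empty shard one partition is an assumption.

```python
def partitions_per_shard(shard_doc_counts, max_docs_per_partition):
    """Return the number of partitions for each shard.

    Hypothetical helper: each shard is cut into ceil(docs / max) partitions.
    Assumption: an empty shard still yields a single partition.
    """
    if max_docs_per_partition <= 0:
        raise ValueError("max_docs_per_partition must be positive")
    counts = []
    for docs in shard_doc_counts:
        # Integer ceiling division, avoids float rounding on large counts.
        needed = -(-docs // max_docs_per_partition)
        counts.append(max(1, needed))
    return counts


# 5 shards holding 1M documents each, at most 100,000 documents per partition.
per_shard = partitions_per_shard([1_000_000] * 5, 100_000)
total = sum(per_shard)
print(per_shard, total)
```

With a shard that does not divide evenly (say 250,000 documents), the same call rounds up to 3 partitions, the last one being smaller than the limit.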
ES 5.x has a proposal/PR (elastic/elasticsearch/pull/18237) that allows parallel reads of an alias/index for a given query, split into an arbitrary number of slices. This goes beyond the capabilities of scan/scroll in ES 1.x/2.x and is desirable in highly parallel environments or those with memory/processing constraints (but a high number of instances).
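To make the idea of slices concrete, here is a rough sketch of how one search body per slice could be built. It assumes the request shape of the sliced scroll feature as it shipped in Elasticsearch 5.0 (a `slice` object with `id` and `max`), which may differ from the proposal as it stood when this issue was opened. The helper name `sliced_scroll_bodies` is hypothetical, and nothing is sent to a cluster.

```python
import json


def sliced_scroll_bodies(query, num_slices):
    """Build one search request body per slice (hypothetical helper).

    Assumption: each body carries {"slice": {"id": i, "max": num_slices}},
    so every slice can be scrolled by a separate worker.
    """
    if num_slices < 2:
        # Assumption: slicing only makes sense with at least two slices.
        raise ValueError("num_slices must be at least 2")
    return [
        {"slice": {"id": slice_id, "max": num_slices}, "query": query}
        for slice_id in range(num_slices)
    ]


bodies = sliced_scroll_bodies({"match_all": {}}, 3)
for body in bodies:
    print(json.dumps(body))
```

Each body would be sent as its own scroll request, so the number of parallel readers is chosen by the client rather than being tied to the number of shards.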