Problem Statement
Currently, when a cluster is configured with a restrictive `max_shards_per_node` and the allocation algorithm ends up distributing the shards so that the last shard to be placed can only go on the same node as its primary, that replica is never allocated, and the allocation logic does not move other shards around to break the deadlock.
Example:
Given an 8-node cluster, a table with 16 shards and 1 replica (32 shards total), and the setting `max_shards_per_node = 4`, the following behavior is observed. When the last shard goes through the allocation process, it cannot be placed on any node given the constraint of 4 shards per node: the explanation in `sys.allocations` shows `too many shards [4] allocated to this node for index` for 7 of the nodes, while the remaining node shows `a copy of this shard is already allocated to this node`.
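The stuck replica can be confirmed by querying the `sys.allocations` table. A minimal sketch, assuming a table named `my_table` (the name is hypothetical):

```sql
-- Show shards that could not be allocated and why
-- (table name 'my_table' is illustrative).
SELECT shard_id, "primary", current_state, explanation
FROM sys.allocations
WHERE table_name = 'my_table'
  AND current_state = 'UNASSIGNED';
```

In the scenario above, this should return the single unassigned replica, with the per-node decisions explaining why each node was rejected.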
Possible Solutions
Move shards around automatically to conform with both constraints: max_shards_per_node and the primary/replica placement.
Considered Alternatives
Manually increase max_shards_per_node so the replica can be placed on another node
Use ALTER TABLE REROUTE to manually control shard allocation, moving a couple of shards around to guarantee successful allocation
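The two workarounds above could look roughly like this in SQL. The table name, shard id, and node names are all illustrative, and the exact setting name for the per-node limit may differ in your setup:

```sql
-- Workaround 1: raise the per-node shard limit so the replica fits
-- (setting name and value are assumptions for illustration).
ALTER TABLE my_table SET ("routing.allocation.total_shards_per_node" = 5);

-- Workaround 2: manually move a shard off the blocked node to free a slot
-- (shard id and node names are hypothetical).
ALTER TABLE my_table REROUTE MOVE SHARD 2 FROM 'node-2' TO 'node-4';
```

Both are manual interventions; the issue proposes that the allocator should find such a move on its own.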
This is kinda expected. In order to do a re-allocation the cluster has to temporarily create additional shard copies.
E.g. if you have a table with a single shard, moving it from node1 to node2 will temporarily result in 2 shards in the cluster. It first creates a new empty shard copy on node2, then copies the data from the still active shard on node1, and only afterwards removes the copy from node1.
But in the example I provided, one of the nodes is not full yet. So the allocation process could move one of the other shards from the full node to that node and then allocate the missing replica on the node that now has room. Something like this:
Given a 4-node cluster and a table with 4 shards and 1 replica (8 shards total), the following decision process could be used. When the replica for the 4th shard cannot be placed on node 4, because its primary is already allocated there, the allocation process moves the primary for shard 2 from node 2 to node 4 and then places the replica for shard 4 on node 1. This is just one example; of course there are many other possibilities.
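Until the allocator can do this automatically, the same two-step fix can be expressed manually with `ALTER TABLE ... REROUTE`. A sketch, assuming a table named `my_table` and node names `node2`, `node4`, `node1` (all hypothetical, and shard ids here follow the example's numbering rather than real zero-based ids):

```sql
-- Step 1: move the primary of shard 2 to the node that still has room,
-- freeing a slot on its original node.
ALTER TABLE my_table REROUTE MOVE SHARD 2 FROM 'node2' TO 'node4';

-- Step 2: place the stuck replica on the node that now has a free slot.
ALTER TABLE my_table REROUTE ALLOCATE REPLICA SHARD 4 ON 'node1';
```

This mirrors the decision process described above, just driven by an operator instead of the allocation algorithm.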