
Reconsider cluster.role.<role-name>.min-nr-of-members fallback #28177

Closed
raboof opened this issue Nov 18, 2019 · 2 comments

raboof (Member) commented Nov 18, 2019

cluster.role.<role-name>.min-nr-of-members can be used to postpone shard allocation until at least this number of nodes with this role have joined the cluster. This is useful to avoid 'flooding' nodes with entities when there are few nodes, for example during startup or when an update is in progress.

However, when cluster.role.<role-name>.min-nr-of-members is not defined, this setting falls back to cluster.min-nr-of-members. Since it is unlikely that all nodes of the cluster have the same role, this seems too strict.

I'm proposing to remove this fallback, and default cluster.role.<role-name>.min-nr-of-members to 1.
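
For illustration, a minimal configuration sketch of what the proposal would mean in practice (the role name worker and the node counts are hypothetical):

    # today: with no per-role setting, the per-role minimum falls back
    # to the cluster-wide minimum
    akka.cluster.min-nr-of-members = 5
    # implied fallback: cluster.role.worker.min-nr-of-members = 5,
    # even if only 2 nodes in the cluster carry the 'worker' role

    # proposed: the per-role minimum defaults to 1, and is only
    # higher when set explicitly
    akka.cluster.role.worker.min-nr-of-members = 2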

patriknw (Member) commented Nov 18, 2019

I agree with this reasoning.

helena self-assigned this Nov 18, 2019
helena (Member) commented Nov 18, 2019

Relatedly, here is a suspicious case I'm testing:

Issue found where:

  • 5-node cluster
  • 2 cluster roles: 3 nodes with role R1, 2 nodes with role R2
  • akka.cluster.min-nr-of-members = 2
  • no per-role minimum set
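
A sketch of the per-node configuration this scenario describes (the exact settings are an assumption reconstructed from the bullet points above):

    # common to all 5 nodes
    akka.cluster.min-nr-of-members = 2
    # no cluster.role.R1.min-nr-of-members or
    # cluster.role.R2.min-nr-of-members, so both roles fall back to 2

    # on the three R1 nodes
    akka.cluster.roles = ["R1"]

    # on the two R2 nodes
    akka.cluster.roles = ["R2"]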

The question is why nodes of role R2 log "not all regions have registered yet", yet consider all regions registered once a third node with that role joins.

  1. An attempt to access R2 from one of the proxy nodes shows:
    :2551/system/sharding/X2Proxy - X2: Request shard [16] home. Coordinator [Some(Actor[akka://system@127.0.0.1:2555/system/sharding/X2Coordinator/singleton/coordinator#757556079])]
  2. Several retries follow, after which the request times out:
    :2551/system/sharding/X2Proxy - X2: Retry request for shard [16] homes from coordinator at [Actor[akka://system@127.0.0.1:2555/system/sharding/X2Coordinator/singleton/coordinator#757556079]]. [1] buffered messages.
  3. On node 2555, which hosts the shard, the logs show the above triggering 20+ messages:
    :2555/system/sharding/X2Coordinator/singleton/coordinator - GetShardHome [16] request ignored, because not all regions have registered yet.
  4. But after adding a 6th node with role R2 to the cluster and starting both entities, node 2555 logs:
    :2555/system/sharding/X2Coordinator/singleton/coordinator - ShardRegion registered: [Actor[akka://system@127.0.0.1:2556/system/sharding/X2#697707116]]
    :2555/system/sharding/X2 - X2: Starting shard [16] in region
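
For context, a hedged sketch of how the X2 regions would be tied to role R2 through configuration (assuming the default ClusterShardingSettings are used, which read akka.cluster.sharding.role):

    # on the R2 nodes that host the X2 entities
    akka.cluster.sharding.role = "R2"
    # nodes without the role reach the entities via a ShardRegion proxy,
    # as the :2551 node does in the logs above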
helena added this to Reviewing in Akka 2.6.x Nov 19, 2019
helena added a commit that referenced this issue Nov 20, 2019
helena added this to the 2.6.1 milestone Nov 20, 2019
helena closed this Nov 20, 2019
Akka 2.6.x automation moved this from Reviewing to Done Nov 20, 2019