Improve resilience in some failure cases (1.x) #25

Merged: ghostdogpr merged 1 commit into series/1.x from failure_behavior on Sep 30, 2022

@ghostdogpr (Collaborator) commented Sep 29, 2022

  • When unregistering, ping the shard manager first and abort the unregistration immediately if it is down. In that case, don't even stop the local entities, because they would not be rebalanced to other nodes right away.
  • When a node is unresponsive, fetch the latest shard assignments from the shard manager, in case the problem lies in the storage layer rather than in the node. This happened to us once: Redis had an outage, and shard assignment updates were not pushed to the nodes.
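The two behaviors above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the project's actual implementation: `FakeShardManager`, `Pod`, `ping`, `unregister`, `get_assignments`, `handle_unresponsive`, and `ShardManagerUnavailable` are all hypothetical names. The shard manager is modeled as an in-memory object whose `up` flag simulates an outage. The sketch also assumes that unregistration stops local entities before notifying the manager, an ordering the PR description does not specify.

```python
from dataclasses import dataclass, field


class ShardManagerUnavailable(Exception):
    """Hypothetical error raised when the shard manager cannot be reached."""


@dataclass
class FakeShardManager:
    # Hypothetical in-memory stand-in for the shard manager service.
    up: bool = True
    assignments: dict = field(default_factory=dict)  # shard id -> owning pod
    unregistered: list = field(default_factory=list)

    def ping(self):
        if not self.up:
            raise ShardManagerUnavailable("shard manager is down")

    def unregister(self, pod_address):
        self.ping()
        self.unregistered.append(pod_address)

    def get_assignments(self):
        self.ping()
        return dict(self.assignments)


@dataclass
class Pod:
    address: str
    manager: FakeShardManager
    entities_running: bool = True
    local_assignments: dict = field(default_factory=dict)  # cached view

    def unregister(self):
        """Behavior 1: ping first, and bail out before touching entities."""
        try:
            self.manager.ping()
        except ShardManagerUnavailable:
            # Keep entities running: nobody would rebalance them right now.
            return False
        # Assumed ordering: stop local entities, then notify the manager.
        self.entities_running = False
        self.manager.unregister(self.address)
        return True

    def handle_unresponsive(self, shard_id):
        """Behavior 2: refresh assignments when the owning pod doesn't answer.

        Returns True if the shard's owner changed, meaning a retry may
        reach the new owner; False if nothing changed or the manager
        itself is unreachable (the cached view is then kept).
        """
        try:
            fresh = self.manager.get_assignments()
        except ShardManagerUnavailable:
            return False
        changed = fresh.get(shard_id) != self.local_assignments.get(shard_id)
        self.local_assignments = fresh
        return changed
```

The ping in `unregister` only narrows the failure window: if the manager goes down between the ping and the `unregister` call, the sketch simply lets `ShardManagerUnavailable` propagate to the caller.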

ghostdogpr changed the title from "Improve resilience in some failure cases" to "Improve resilience in some failure cases (1.x)" on Sep 29, 2022
ghostdogpr merged commit 6842062 into series/1.x on Sep 30, 2022
ghostdogpr deleted the failure_behavior branch on Sep 30, 2022 at 04:44