How to rolling update protoactor cluster without losing states? #404

Open
cupen opened this issue Oct 14, 2020 · 3 comments

Comments

@cupen
Contributor

cupen commented Oct 14, 2020

I have some scenarios that require strong consistency, such as upgrading business logic in production.
I noticed that actors/grains in protoactor-go/cluster lose their state when the cluster shrinks via cluster.Shutdown(true).

So I created a workaround:

  • When my actor/grain starts, it tries to fetch the PID of the old one from external storage.
  • If an old one exists, the new one sends it a message saying: "just persist your state and forward messages to me".
  • After that, the new one persists its own PID, overwriting the old one.
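
A minimal sketch of the three steps above, written as plain Go without any protoactor-go calls. Registry, Activation, Start and Add are hypothetical names used only for illustration, and the in-process mutex merely stands in for whatever serializes access to the real external storage.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry stands in for the external storage: it maps a grain identity to
// its current activation and to its last persisted state.
// Hypothetical type, not part of protoactor-go.
type Registry struct {
	mu     sync.Mutex
	owner  map[string]*Activation
	states map[string]int
}

func NewRegistry() *Registry {
	return &Registry{owner: map[string]*Activation{}, states: map[string]int{}}
}

// Activation is a toy actor/grain whose state is a single counter.
type Activation struct {
	id      string
	reg     *Registry
	state   int
	forward *Activation // non-nil once this activation has handed off
}

// Start runs the three steps of the workaround for a new activation.
func Start(id string, reg *Registry) *Activation {
	a := &Activation{id: id, reg: reg}
	reg.mu.Lock() // stand-in for the lock around the external storage
	defer reg.mu.Unlock()
	// Step 1: look up the old activation in external storage.
	if old, ok := reg.owner[id]; ok {
		// Step 2: tell the old one to persist its state and forward to us.
		old.handOff(a)
		a.state = reg.states[id]
	}
	// Step 3: overwrite the stored owner with the new activation.
	reg.owner[id] = a
	return a
}

// handOff persists the current state and redirects later messages.
// The caller must hold reg.mu.
func (a *Activation) handOff(next *Activation) {
	a.reg.states[a.id] = a.state
	a.forward = next
}

// Add is the only "message" this toy grain understands. It is not safe for
// concurrent use; a real actor processes one message at a time.
func (a *Activation) Add(n int) int {
	if a.forward != nil {
		return a.forward.Add(n) // old activation: forward to the new owner
	}
	a.state += n
	return a.state
}

func main() {
	reg := NewRegistry()
	old := Start("counter-1", reg)
	old.Add(5)
	fresh := Start("counter-1", reg)
	fmt.Println(fresh.Add(1)) // 6: state was carried over
	fmt.Println(old.Add(1))   // 7: the old activation forwarded the message
}
```

The window between step 1 and step 3 is where the race lives: two new activations starting at the same time could both find the same old owner, which is why the sketch holds a lock for the whole of Start.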

The race condition could be resolved with a distributed lock, but that performs poorly and is inelegant.
I'm looking for a better solution, maybe something similar to what Akka or Orleans do:
https://doc.akka.io/docs/akka/current/additional/rolling-updates.html
https://dotnet.github.io/orleans/docs/grains/grain_versioning/deploying_new_versions_of_grains.html

Any suggestions?

@rogeralsing
Collaborator

Related to #408

@cupen
Contributor Author

cupen commented Jan 21, 2021

I'm trying to design a new workflow for restarting a cluster without losing state and without cluster.Request timeouts.
For example:

  1. Start some new nodes and have them join the cluster.
  2. Keep the old nodes, but mark them as "outflowing only", meaning they still work like normal nodes but refuse any new ActivationRequest and respond with "Sorry, I'm outflowing only. Please send the request to a normal node."
  3. Wait, possibly for a long time, until all of the actors/grains owned by the old nodes become inactive.
  4. Then shut down the old nodes.

That's all. Step 2 will introduce a lot of complexity and some latency jitter, but I think that's better than losing state and hitting cluster.Request timeouts.

@cupen
Contributor Author

cupen commented Jan 11, 2022

On moving activations atomically, see:
asynkron/protoactor-dotnet#741
