Evaluating our use case #36
I'm trying to figure out if Swarm can be used to fulfill some of our needs. We're running an application inside of a Kubernetes cluster, and we'd like to be able to make a few things happen: certain processes should retain state throughout a Kubernetes-level rolling update, and certain gen_servers should always be running exactly once somewhere in the cluster. At first I was thinking we could use Swarm pretty easily, but from looking at #11 it looks like that isn't currently part of the implementation. I've already got the clustering set up w/ libcluster. If Swarm isn't the right tool for this, any recommendations for alternative solutions that would fill our use cases would be awesome.
Comments
Yes, currently Swarm doesn't have an API for saying "hey, we want to shut down, so hand off your state to the node which will become responsible for owning you". It is something I'd like to add, but it will require some thought as to the best way to do so. On the other hand, if your node shuts down and the process is restarted elsewhere in the cluster, then as long as it can "rebuild" its state from some authoritative source (say mnesia, Redis, or some other datastore), you can just let Swarm handle redistributing those processes when the node shuts down, the way it does already. The only key there is that those processes must be started with `Swarm.register_name` (the variant that takes a module, function, and arguments, so Swarm knows how to restart them elsewhere).

Also, Swarm guarantees that if a network partition occurs, a copy of those registered processes will be running in every partition - so if you need a guarantee that a given process will only ever be present once across all partitions (in a netsplit scenario), Swarm wouldn't be the right solution.
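A minimal sketch of that "rebuild from an authoritative source" pattern, assuming a hypothetical `MyApp.Store` module in front of Redis/mnesia (the `{:swarm, ...}` messages follow the handoff protocol documented in Swarm's README):

```elixir
defmodule MyApp.Worker do
  use GenServer

  # Started via Swarm rather than a local supervisor, e.g.:
  #   Swarm.register_name({:worker, id}, MyApp.Worker, :start_link, [{:worker, id}])
  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  def init(name) do
    # Rebuild state from the authoritative store instead of relying on
    # an in-memory handoff. MyApp.Store is a hypothetical placeholder.
    {:ok, %{name: name, data: MyApp.Store.fetch(name)}}
  end

  # When Swarm wants to move this process, ask for a plain restart:
  # init/1 on the new node will rebuild the state from the store.
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, :restart, state}
  end

  # Sent to the old process once it should shut down after a move.
  def handle_info({:swarm, :die}, state) do
    {:stop, :shutdown, state}
  end
end
```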
I would say, at least initially, I'm not too concerned about what the behavior is in the case of a network partition; I find something like that unlikely in the context of Kubernetes. I think I might be a bit confused as to the purpose of Swarm's handoff capabilities. What is the purpose of being able to hand off process state if not that the current node is going down? Is it about balancing work amongst a cluster?
I would never underestimate the ability of the network to mess up your day ;), even within k8s (maybe even especially in k8s, due to the software-defined networking layer).

Swarm's current handoff implementation handles the case where a cluster suddenly loses one of its nodes - all of the processes (those registered via `Swarm.register_name`) that were owned by the lost node are redistributed across the remaining nodes.

It gets tricky when you want to simulate a node-down event, because you need to do the handoff before you go down. Due to the way the internals are written (they rely on the hash ring for a priori knowledge of where to register processes), we need to remove the node from everyone's hash ring, but still allow handoff events from that node. That can be risky if the broadcast which tells the rest of the cluster that a node should be "soft-killed" fails to reach all parties - the hash rings become out of sync and chaos ensues. Only performing handoffs when a node actually goes down is safer, because the message is generated within each node when it loses communication with a connected node - we don't risk losing the event and getting out of sync.

This is definitely not an impossible problem to solve, but it is tricky, hence why it wasn't done at the same time as the current handoff behavior implementation.
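To make the current handoff behavior concrete, here is a sketch of the callbacks a Swarm-registered GenServer implements to take part in it, following the protocol from Swarm's README (`MyApp.StatefulWorker` is just an illustrative name):

```elixir
defmodule MyApp.StatefulWorker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  def init(arg), do: {:ok, arg}

  # Called before the process is moved: hand the current state to
  # Swarm so it can be delivered to the replacement process.
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, {:resume, state}, state}
  end

  # Called on the newly started process on the new node, with the
  # state captured by :begin_handoff.
  def handle_cast({:swarm, :end_handoff, handed_off_state}, _state) do
    {:noreply, handed_off_state}
  end

  # Called when a netsplit heals and two copies of the process exist;
  # here we keep our own state and let the duplicate die.
  def handle_cast({:swarm, :resolve_conflict, _other_state}, state) do
    {:noreply, state}
  end

  def handle_info({:swarm, :die}, state) do
    {:stop, :shutdown, state}
  end
end
```

Replying `{:resume, state}` is what transfers the state to the replacement process via `:end_handoff`; replying `:restart` instead skips the transfer and restarts the process from its original arguments.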