Node not rejoining when restarted with other node unreachable #18067
Comments

Thanks for reporting @tpantelis, we should soon have some capacity to look into this (have not read it in detail yet). AFAIR it could be that we don't allow nodes to join if there is no convergence in the cluster, but I will have to double check.

As you have found, node2 cannot join until its previous incarnation has been removed. When the new incarnation tries to join, the old one is downed automatically because we see the new uid. It will not be removed until all other unreachable nodes have been downed as well, i.e. node3 must be downed. Then all will be removed and node2 can join. If you had tried to join node3 at the same time, this would have happened automatically. This works as designed.
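
For illustration, downing the still-unreachable node3 programmatically could look roughly like the sketch below; the system name, host and port are made up for the example, and the same thing can be done via the cluster JMX interface as mentioned later in the thread.

```scala
// Minimal sketch: mark the unreachable node3 as Down so the cluster can reach
// convergence again, remove node2's old incarnation and let its new incarnation join.
// The system name, host and port are hypothetical.
import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

object DownNode3 extends App {
  val system = ActorSystem("ClusterSystem")
  val cluster = Cluster(system)

  val node3 = Address("akka.tcp", "ClusterSystem", "10.0.0.3", 2551)
  cluster.down(node3)
}
```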

It may be working as designed but I'm questioning the design. I'll reiterate a couple of passages from above: "It is perfectly fine to have node1 and node2 running in the cluster without node3. In fact, I can start node1 and node2 from an empty cluster and they join and form a cluster just fine with node3 not running." But once all 3 are running, if node2 and node3 are both taken down, node2 can't rejoin until node3 is also restarted, unless both are auto-downed. This is the part I don't quite get: node2 was already in the cluster, so, when restarted, why can't it rejoin without node3, regardless of being "downed"? That seems inconsistent to me, as it is allowed to join from an empty cluster without node3.

We're using akka for HA so we only need 2 nodes running. We want to disable auto-down because it has an undesirable effect when a node is unreachable but not actually down: the node becomes quarantined and is not allowed back in until it is restarted. We need a node to automatically rejoin regardless. We're developing a commercial product and end users will expect automatic recovery, as would I. I have no control over when or how many nodes will become unreachable or stopped in real deployments. So we have kind of a catch-22 with auto-downing: if we set it low we get one undesirable behavior and if we set it high we get another. How can we get a happy medium here?

Why would you want to leave node-3 in the cluster if it is unreachable?

We're using akka to provide an HA solution (using Raft), so the cluster is for the most part static (seed-nodes is configured with all IPs in the cluster during initial setup). If a node is unreachable or down it will eventually come back, except in the uncommon case where the VM/machine suffers a catastrophic failure and needs to be replaced by another machine - that will require some manual intervention, which is fine. So regardless of whether a node becomes unreachable due to a network partition or is shut down/restarted (e.g. for an upgrade), we want it to seamlessly rejoin the cluster without manual intervention. Whether or not the node is actually removed as a member by the leader doesn't really matter, as long as the shut down/unreachable node is allowed to rejoin once restarted/healed.

Disabling auto-down-unreachable-after does this, except that if more than one node is shut down/unreachable, one of them can't rejoin until all of them are able to rejoin - this is not desirable. If we enable auto-down-unreachable-after (with a small value) then one shut down/unreachable node can rejoin without the others, but then a node that is unreachable due to a network partition can't rejoin unless restarted, which is also not desirable. Temporary network blips will occur in production environments and it will be unacceptable for customers to have to know when a blip occurred and restart nodes. So that's our quandary.
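
To make the setup concrete, the configuration described above would look roughly like this; the system name, addresses and ports are invented for the example.

```scala
// Hypothetical configuration for the static 3-node cluster described above:
// all members are listed as seed nodes and automatic downing is disabled.
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object ClusterNode extends App {
  val config = ConfigFactory.parseString("""
    akka.cluster {
      seed-nodes = [
        "akka.tcp://ClusterSystem@10.0.0.1:2551",
        "akka.tcp://ClusterSystem@10.0.0.2:2551",
        "akka.tcp://ClusterSystem@10.0.0.3:2551"
      ]
      # the setting discussed in this thread; off disables automatic downing
      auto-down-unreachable-after = off
    }
  """).withFallback(ConfigFactory.load())

  val system = ActorSystem("ClusterSystem", config)
}
```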

What you are requesting is not that simple. Even if it were allowed to join, its status would not be changed to Up until there is convergence, i.e. no unreachable members. See http://doc.akka.io/docs/akka/2.4-M2/common/cluster.html#Membership_Lifecycle. I think the solution is to use a proper downing strategy, which is something we are working on improving.

Setting auto-down to a low value (which we had originally) would be fine except for the network partition case. Why can't a node rejoin with the same UUID after it's been downed? Can a flag be added to allow this? We could also potentially auto-restart the actor system if we could programmatically know it is not being allowed to rejoin because it needs to be restarted. Is there any way to detect this (a cluster event would be nice)?
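
One possible signal is the MemberRemoved event for the node's own address, which is published when the local member has been removed from the cluster; whether it fires in every case discussed here would need to be verified. A rough sketch of listening for it:

```scala
// Sketch: watch for this node's own member being removed from the cluster. At that
// point the old incarnation can never rejoin, so the application could decide to
// restart the actor system (the restart itself is not shown here).
import akka.actor.{Actor, ActorLogging}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{CurrentClusterState, MemberRemoved}

class SelfRemovedListener extends Actor with ActorLogging {
  val cluster = Cluster(context.system)

  override def preStart(): Unit = cluster.subscribe(self, classOf[MemberRemoved])
  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case _: CurrentClusterState => // initial snapshot of the membership, nothing to do
    case MemberRemoved(member, previousStatus) if member.address == cluster.selfAddress =>
      log.warning("Removed from the cluster (previous status: {}); the actor system " +
        "must be restarted before this node can join again.", previousStatus)
    case _: MemberRemoved => // some other member was removed
  }
}
```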

Hi Patrik, I work with Tom in OpenDaylight and, in essence, I see three possible solutions to this. I'd love to get feedback on which one makes sense and how to go about implementing it:

1. Fix Akka's clustering logic to allow us to make use of reachable nodes as soon as they are reachable, without conditions on the reachability of other nodes, at least in certain configurations.
2. Reboot the actor system when we discover that we've been downed but can still talk to other nodes. We don't know how to tell whether the local node has been downed, but presumably that is possible.
3. Avoid using Akka clustering altogether and just use Akka remoting, which we'd rather not do because we do actually like the failure detectors and other features.

Based on the comments on this thread, my guess is that the second option is really the only valid one in the short run. As Tom asks, could you provide some insight there?
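
A rough sketch of what that second option could look like, building on the MemberRemoved detection above; createClusterSystem() is a hypothetical factory for the application's actor system, and shutdown()/registerOnTermination are the Akka 2.3.x ways of stopping a system and reacting to its termination.

```scala
// Sketch of option 2: when the local member has been removed (i.e. this incarnation
// has been downed), shut the ActorSystem down and start a fresh one so the node can
// rejoin with a new uid.
import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.MemberRemoved

object RestartingNode {
  // Hypothetical factory that builds the application's actor system and top-level actors.
  def createClusterSystem(): ActorSystem = ActorSystem("ClusterSystem")

  def start(): Unit = {
    val system = createClusterSystem()
    val cluster = Cluster(system)

    system.actorOf(Props(new Actor {
      cluster.subscribe(self, classOf[MemberRemoved])
      def receive = {
        case MemberRemoved(m, _) if m.address == cluster.selfAddress =>
          // Once the old system has fully stopped, start a new incarnation.
          system.registerOnTermination(start())
          system.shutdown()
        case _ => // ignore other cluster events
      }
    }), "restart-on-removal")
  }

  def main(args: Array[String]): Unit = start()
}
```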

Please send me an email in private (patrik dot nordwall at typesafe.com) and I will give you some information about a feature that is not released yet. Are you a Typesafe subscriber, by the way?

We're an open source project so we don't typically subscribe to commercial versions or support for our dependencies. Is there any way you can talk publicly about the feature and give us an idea of when we might be able to use it?

I understand. Let's continue here then. I'd love to talk publicly about the feature, but can't just yet. Hopefully that will change within a few weeks.

I repeat: what are you going to do with node-3, the unreachable node that has not been downed? Akka Cluster is not designed for having unreachable nodes lingering around forever. At some point you have to decide that they are dead and will never come back (no zombies please). Then node-2 would be removed and the restarted node-2 can join.

Akka Cluster is designed around the principle that some membership state transitions are only performed when there is so-called convergence, and that can only happen when all unreachable members have been marked as down. One such transition is changing state from Down to Removed (and actually removing the member from the cluster). Another such transition is changing Joining to Up. I understand that you challenge the design, but changing that is not something we can work on any time soon.

You must solve the downing of unreachable nodes anyway, and that is not a trivial problem to solve. Our default recommendation is that the decision of what to do should be taken by a human operator or an external monitoring system. That is not the solution you are looking for. Then you must build something that decides based on membership state. You must ensure that both sides of a network partition can make this decision by themselves, and that they take the same decision about which part will keep running and which part will shut itself down. Another solution is to coordinate via an external component, such as a database or ZooKeeper. The problem is then: what happens if that component is unavailable?
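
For illustration only, a very naive version of the "both sides decide for themselves" approach could look like the sketch below: each node waits for the set of unreachable members to be stable for a while, then either downs the unreachable members (if it is on the majority side) or downs itself (if it is on the minority side). A real downing strategy also has to deal with joining members, shifting partitions, exact ties and so on, which this sketch ignores.

```scala
// Naive majority-based downing sketch. Not production quality: it ignores member
// roles, ties, members that are joining, and cascading partitions.
import scala.concurrent.duration.FiniteDuration
import akka.actor.{Actor, Cancellable}
import akka.cluster.{Cluster, Member}
import akka.cluster.ClusterEvent._

class NaiveMajorityDowning(stableAfter: FiniteDuration) extends Actor {
  import context.dispatcher

  val cluster = Cluster(context.system)
  var unreachable = Set.empty[Member]
  var pending: Option[Cancellable] = None

  case object Decide

  override def preStart(): Unit =
    cluster.subscribe(self, classOf[UnreachableMember], classOf[ReachableMember])
  override def postStop(): Unit = cluster.unsubscribe(self)

  def receive = {
    case state: CurrentClusterState => unreachable = state.unreachable; reschedule()
    case UnreachableMember(m)       => unreachable += m; reschedule()
    case ReachableMember(m)         => unreachable -= m; reschedule()
    case Decide if unreachable.nonEmpty =>
      val total = cluster.state.members.size
      if (unreachable.size * 2 < total)
        unreachable.foreach(m => cluster.down(m.address)) // we are in the majority
      else
        cluster.down(cluster.selfAddress)                 // we are in the minority
    case Decide => // everything became reachable again, nothing to do
  }

  // Only act once unreachability has been stable for `stableAfter`,
  // to avoid reacting to short network blips.
  def reschedule(): Unit = {
    pending.foreach(_.cancel())
    pending = Some(context.system.scheduler.scheduleOnce(stableAfter, self, Decide))
  }
}
```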

@nilok We run an akka cluster for our game server too, and encountered this problem when we just wanted to quickly restart part of the cluster (for an update). In our case, we provide an external endpoint via spray to send a command asking the cluster to remove/down the specified node that we want to restart and then rejoin to the cluster. I think you may need to take some manual control over it.

@nilok Still, I don't think restarting the akka node via a kill command is the right way; ConductR may provide some help, but we have not tried it.

Thanks for all the feedback. This all makes sense. In essence we are doing a version of what was suggested using ZooKeeper, but we're building it on top of Akka remoting using Raft. Since that shares fate with the Akka nodes, we're not worried about it being unavailable differently from Akka. I think what we're really looking for is to use Akka clustering's failure detector, i.e. the reachable/unreachable flag, without having to use the full Akka clustering membership, since we provide our own notion of consensus without needing to rely on the Akka clustering notion of convergence. Can you expand a bit on how people have used something like ZooKeeper for this in the past?

@hepin1989 It sounds like you're basically just doing a graceful shutdown and restart. Can you expand on exactly how you did that? I know that @tpantelis tried to do that, but ran into some issues where it wasn't behaving like he'd want.

I tried to programmatically issue a "leave" on graceful shutdown, just prior to shutting down the actor system, but the cluster leader didn't receive the message. Perhaps the actor system was shut down too quickly such that the leave message wasn't processed - I didn't try a delay or wait to verify the member was removed.
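
A sketch of one way that ordering could be enforced, assuming the leave is still issued from the node itself: call leave(selfAddress) and only shut the actor system down once the local member has actually been removed. A real implementation would also want a timeout in case removal never completes, e.g. when other nodes are unreachable.

```scala
// Sketch: leave the cluster gracefully and defer shutting down the actor system
// until the local member has actually been removed, instead of stopping immediately.
import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.MemberRemoved

object GracefulLeave {
  def leaveAndShutdown(system: ActorSystem): Unit = {
    val cluster = Cluster(system)

    system.actorOf(Props(new Actor {
      cluster.subscribe(self, classOf[MemberRemoved])
      def receive = {
        case MemberRemoved(m, _) if m.address == cluster.selfAddress =>
          system.shutdown() // only stop the system once the leave has completed
        case _ => // ignore other cluster events
      }
    }), "graceful-leaver")

    cluster.leave(cluster.selfAddress)
  }
}
```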

@tpantelis I don't think the leave should be issued from the node that is leaving; a better way, I think, would be to use a cluster singleton to issue it, and the cluster singleton would be better running on the master. I don't know why it's necessary for the node to restart in order to rejoin the cluster; why couldn't we let the other nodes drop the stashed messages and let it in? Maybe it would be great if an akka cluster could be associated dynamically.

@nilok In practice the zk cluster is small too, 3/5 nodes, and it's more sensitive to latency. We could build what zk does on top of an akka cluster too, with 3/5 nodes. If you need more control over the akka cluster, I think you need to control/assist it from the outside.

We're getting similar behavior that I could use some help with. We have three nodes in different AWS data centers, one of which is the sole seed node and the exclusive possessor of a singleton, accomplished by using:

When I take down a non-seed node, the seed node marks it as UNREACHABLE and begins to periodically log the following:

Fair enough. When I bring the node back up, however, the newly-started node begins to repeat:

The seed node logs:

and repeatedly thereafter:

It seems strange to me that the log indicates things "will" happen that don't: the existing member being removed and the new member being allowed to join. I've been googling that message and can't find an explanation of what I may need to do to make that actually happen.

@nelsonblaha this scenario manifests when the cluster is in a non-convergent state while a node is trying to re-join it after a restart: as soon as the re-started member issues a

We're using akka (version 2.3.10) in a large open source project - the OpenDaylight SDN controller. We ran into an issue where a node is not allowed to rejoin a cluster when other node(s) are also down/unreachable, until they all become reachable or are started.

Here's the scenario:

I have a 3-node cluster - call them node1, node2 and node3 - with node1 being the cluster leader. I stopped node2 and node3 (my app, that is). node1 quickly declared the other nodes unreachable and continued to retry the connection, i.e.:

I also got this message, which is expected based on the akka docs:

After restarting node2, it tried to join, and this output from node1 seems to indicate it did:

But it actually didn't, and the "New incarnation of existing member..." message repeated over and over every 11 seconds. After more testing I found that node1 would only allow node2 back in after node3 was started or was "downed", either via auto-down or a manual down via JMX.

Setting auto-down-unreachable-after to a small value like 10s gives us the behavior we want, but auto-down-unreachable-after causes another issue, so we disable it.

The akka docs say: "When a member is considered by the failure detector to be unreachable the leader is not allowed to perform its duties, such as changing status of new joining members to 'Up'. The node must first become reachable again, or the status of the unreachable member must be changed to 'Down'."

I take this to mean a new joining member that hadn't previously joined. However, in my case, node2 and node3 were already in the cluster and thus weren't new joining members. Also, node2 did become reachable again. The behavior seems to be that all unreachable nodes must become reachable or be downed before any node is allowed to rejoin.

I'm not clear on the reasoning behind this behavior. I can live with the former case above, as new nodes aren't commonly added in our environment (clusters are mainly static). But for the latter case, it seems like a bug to me that node2, which was already in the cluster, isn't allowed to be transitioned to Up (i.e. rejoin) until node3 is restarted. It is perfectly fine to have node1 and node2 running in the cluster without node3. In fact, I can start node1 and node2 from an empty cluster and they join and form a cluster just fine with node3 not running. Therefore node2 should be able to rejoin node1 on restart with node3 not running. To me that's basic functionality.

On a side note, while node2 was in cluster "purgatory", node1 had a connection with node2 at the remoting layer and actors on node1 were able to send messages to node2. All this while akka clustering still deemed and reported node2 as unreachable. That seems broken - a major disconnect between the 2 components.