cluster.yaml can become out of date for killed nodes #175

Open
SimonRichardson opened this issue Dec 8, 2021 · 7 comments
Labels: Feature (New feature, not a bug)

@SimonRichardson (Member)

The cluster.yaml can become out of date if a node in the cluster is removed in a non-programmatic way or without user interaction. A typical scenario could be an OOM-killed node, or a restart that gives us a different IP address. In that case, cluster.yaml will still show the old node even after it has gone away and a substantial amount of time has passed.

Having spoken with @MathieuBordere, a possible solution would be to include a last-seen timestamp in cluster.yaml and have the leader run a goroutine that spots when a node's last-seen timestamp is older than we can tolerate, and then removes it with client.Remove().

Alternatively, this could be done directly in the app abstraction's run loop, removing nodes after a configurable timeout.
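
Roughly something like the sketch below (names like deadNodeTimeout, lastSeen and nodeIsReachable are placeholders I've made up here; the real version would keep the last-seen timestamp in cluster.yaml rather than in process memory):

```go
// Sketch of a leader-side "reaper", assuming go-dqlite's app and client packages.
package reaper

import (
	"context"
	"log"
	"time"

	"github.com/canonical/go-dqlite/app"
	"github.com/canonical/go-dqlite/client"
)

const deadNodeTimeout = 10 * time.Minute // illustrative value

// reapDeadNodes periodically asks the leader for the cluster topology and
// removes any node that has been unreachable for longer than the timeout.
func reapDeadNodes(ctx context.Context, dqlite *app.App) {
	lastSeen := map[uint64]time.Time{} // node ID -> last successful contact

	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}

		leader, err := dqlite.Leader(ctx)
		if err != nil {
			continue // no leader reachable right now, try again later
		}

		nodes, err := leader.Cluster(ctx)
		if err != nil {
			leader.Close()
			continue
		}

		now := time.Now()
		for _, node := range nodes {
			if node.ID == dqlite.ID() {
				continue // never reap ourselves
			}
			if nodeIsReachable(ctx, node) {
				lastSeen[node.ID] = now
				continue
			}
			seen, ok := lastSeen[node.ID]
			if !ok {
				lastSeen[node.ID] = now // first failed check, start the clock
				continue
			}
			if now.Sub(seen) > deadNodeTimeout {
				if err := leader.Remove(ctx, node.ID); err != nil {
					log.Printf("remove node %d: %v", node.ID, err)
				}
			}
		}
		leader.Close()
	}
}

// nodeIsReachable stands in for whatever health check the caller uses,
// e.g. dialing node.Address with a short timeout.
func nodeIsReachable(ctx context.Context, node client.NodeInfo) bool {
	return false // placeholder
}
```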

@MathieuBordere (Contributor)

You should be able to use a hostname instead of an IP in the address of a Node; would that also provide a solution for your case?
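
Something along these lines, just as a sketch, with a made-up data directory and hostnames:

```go
// Minimal sketch of a node advertising a DNS name rather than an IP, using the app helper.
package main

import (
	"context"
	"log"
	"time"

	"github.com/canonical/go-dqlite/app"
)

func main() {
	dqlite, err := app.New("/var/lib/dqlite",
		// This node is known to the cluster by a stable hostname; the IP
		// behind it is free to change.
		app.WithAddress("node-1.dqlite.internal:9001"),
		// Existing members are referenced by hostname too.
		app.WithCluster([]string{"node-0.dqlite.internal:9001"}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer dqlite.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	if err := dqlite.Ready(ctx); err != nil {
		log.Fatal(err)
	}
}
```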

@SimonRichardson (Member, Author)

I don't believe so.

When Juju runs in a cloud (AWS, GCP, etc.) we can get an unexpected termination; although rare, it can and does happen. What the issue is asking for is to have one location where the nodes in a cluster are managed, without building another health-check layer on top of dqlite. If a node goes away without any intervention, how do we know the topology of the cluster as it is? In Juju, if a node goes away it requires manual intervention to re-establish HA. Although we expect someone to have observability alerting to notify them when a node goes away, this isn't always the case. Juju uses controller nodes as a load balancer, which means we could have nodes attempting to communicate with a controller node that hasn't existed for some time.

At the bare minimum, we would want to see a last-seen timestamp, so that the leader newly elected from the other nodes can remove the failed node.

@MathieuBordere added the Feature (New feature, not a bug) label on Jun 12, 2023
@manadart

What is the rationale for having the node addresses in the Raft log?

Nodes have an ID, so addresses are not used to identify them...

We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs are hosing clusters. The kindest usage scenario would be one in which we can modify cluster.yaml and bounce nodes.

@cole-miller (Contributor)

Recording addresses in the raft log is done so that log replication can be used to teach followers about changes to the cluster membership, both during normal operation and when they're joining for the first time or after a long time offline. cluster.yaml just exists to reduce the number of cases where you have to manually tell a node about some current cluster member on startup: if things aren't changing too rapidly and you come back online after a crash, hopefully at least one of the servers in cluster.yaml is still active. After startup we don't read from cluster.yaml, we only refresh it periodically.
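
In other words, roughly this usage pattern (a sketch with a made-up path, assuming the YAML node store in the client package):

```go
// Sketch of cluster.yaml as a bootstrap hint only: load the last refreshed
// list of candidates and let FindLeader walk it until a live node answers.
package main

import (
	"context"
	"log"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx := context.Background()

	// Candidate nodes as recorded at the last periodic refresh.
	store, err := client.NewYamlNodeStore("/var/lib/dqlite/cluster.yaml")
	if err != nil {
		log.Fatal(err)
	}

	// Any entry that is still alive is enough to find the current leader,
	// even if the other entries have gone stale.
	leader, err := client.FindLeader(ctx, store)
	if err != nil {
		log.Fatal(err)
	}
	defer leader.Close()

	nodes, err := leader.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		log.Printf("id=%d address=%s role=%v", n.ID, n.Address, n.Role)
	}
}
```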

@freeekanayaka (Contributor)

> What is the rationale for having the node addresses in the Raft log?
>
> Nodes have an ID, so addresses are not used to identify them...
>
> We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs

Note that changing the IP of a node is currently not supported, unless you reconfigure the cluster manually.

@manadart

>> What is the rationale for having the node addresses in the Raft log?
>> Nodes have an ID, so addresses are not used to identify them...
>> We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs
>
> Note that changing the IP of a node is currently not supported, unless you reconfigure the cluster manually.

Indeed; this is the reason I ask. We have cases in Juju where we do this, but that's due to topology changes that we're effecting. When it isn't in our control, such as when a node is rescheduled, our options are limited.

@freeekanayaka (Contributor)

>>> What is the rationale for having the node addresses in the Raft log?
>>> Nodes have an ID, so addresses are not used to identify them...
>>> We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs
>>
>> Note that changing the IP of a node is currently not supported, unless you reconfigure the cluster manually.
>
> Indeed; this is the reason I ask. We have cases in Juju where we do this, but that's due to topology changes that we're effecting. When it isn't in our control, such as when a node is rescheduled, our options are limited.

If you are talking about k8s, you should be able to assign a stable identity (hostname) to nodes with things like StatefulSet. At that point the IP can change at will, since what will be recorded in Raft is the node identity (hostname).
