cluster.yaml can become out of date for killed nodes #175

Open
SimonRichardson opened this issue Dec 8, 2021 · 7 comments
Labels: Feature (New feature, not a bug)

@SimonRichardson (Member)

The cluster.yaml can become out of date if a node in the cluster is removed in a non-programmatic way or without user interaction. A typical scenario could be an OOM-killed node, or a restart that gives us a different IP address. In that case, cluster.yaml will still show the old node even after it has gone away and a substantial amount of time has passed.

Having spoken with @MathieuBordere, a possible solution would be to include a last-seen timestamp in cluster.yaml and have the leader run a goroutine that spots when a node's last-seen timestamp is older than we can tolerate, and then removes it with client.Remove().

Alternatively, this could be done directly in the app abstraction's run loop, removing nodes after a configurable timeout.
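
Roughly something like the sketch below (names like deadNodeTimeout, lastSeen and nodeIsReachable are placeholders I've made up here; the real version would keep the last-seen timestamp in cluster.yaml rather than in process memory):

```go
// Sketch of a leader-side "reaper", assuming go-dqlite's app and client packages.
package reaper

import (
	"context"
	"log"
	"time"

	"github.com/canonical/go-dqlite/app"
	"github.com/canonical/go-dqlite/client"
)

const deadNodeTimeout = 10 * time.Minute // illustrative value

// reapDeadNodes periodically asks the leader for the cluster topology and
// removes any node that has been unreachable for longer than the timeout.
func reapDeadNodes(ctx context.Context, dqlite *app.App) {
	lastSeen := map[uint64]time.Time{} // node ID -> last successful contact

	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}

		leader, err := dqlite.Leader(ctx)
		if err != nil {
			continue // no leader reachable right now, try again later
		}

		nodes, err := leader.Cluster(ctx)
		if err != nil {
			leader.Close()
			continue
		}

		now := time.Now()
		for _, node := range nodes {
			if node.ID == dqlite.ID() {
				continue // never reap ourselves
			}
			if nodeIsReachable(ctx, node) {
				lastSeen[node.ID] = now
				continue
			}
			seen, ok := lastSeen[node.ID]
			if !ok {
				lastSeen[node.ID] = now // first failed check, start the clock
				continue
			}
			if now.Sub(seen) > deadNodeTimeout {
				if err := leader.Remove(ctx, node.ID); err != nil {
					log.Printf("remove node %d: %v", node.ID, err)
				}
			}
		}
		leader.Close()
	}
}

// nodeIsReachable stands in for whatever health check the caller uses,
// e.g. dialing node.Address with a short timeout.
func nodeIsReachable(ctx context.Context, node client.NodeInfo) bool {
	return false // placeholder
}
```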

@MathieuBordere (Contributor)

You should be able to use a hostname instead of an IP in the address of a Node; would that also provide a solution for your case?
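
Something along these lines, just as a sketch, with a made-up data directory and hostnames:

```go
// Minimal sketch of a node advertising a DNS name rather than an IP, using the app helper.
package main

import (
	"context"
	"log"
	"time"

	"github.com/canonical/go-dqlite/app"
)

func main() {
	dqlite, err := app.New("/var/lib/dqlite",
		// This node is known to the cluster by a stable hostname; the IP
		// behind it is free to change.
		app.WithAddress("node-1.dqlite.internal:9001"),
		// Existing members are referenced by hostname too.
		app.WithCluster([]string{"node-0.dqlite.internal:9001"}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer dqlite.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	if err := dqlite.Ready(ctx); err != nil {
		log.Fatal(err)
	}
}
```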

@SimonRichardson (Member, Author)

I don't believe so.

When Juju runs in a cloud (AWS, GCP, etc.) we can get an unexpected termination; although rare, it can and does happen. What the issue is asking for is to have one location where the nodes in a cluster are managed, without building another health-check layer on top of dqlite. If a node goes away without any intervention, how do we know the topology of the cluster as it is? In Juju, if a node goes away it requires manual intervention to re-establish HA. Although we expect someone to have observability alerting to notify them when a node goes away, this isn't always the case. Juju uses controller nodes as a load balancer, which means we could have nodes attempting to communicate with a controller node that hasn't existed for some time.

At the bare minimum, we would want to see a last-seen timestamp, so that the leader newly elected from the other nodes can remove the failed node.

@MathieuBordere added the Feature (New feature, not a bug) label on Jun 12, 2023
@manadart

What is the rationale for having the node addresses in the Raft log?

Nodes have an ID, so addresses are not used to identify them...

We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs are hosing clusters. The kindest usage scenario would be one in which we can modify cluster.yaml and bounce nodes.

@cole-miller (Contributor)

Recording addresses in the raft log is done so that log replication can be used to teach followers about changes to the cluster membership, both during normal operation and when they're joining for the first time or after a long time offline. cluster.yaml just exists to reduce the number of cases where you have to manually tell a node about some current cluster member on startup: if things aren't changing too rapidly and you come back online after a crash, hopefully at least one of the servers in cluster.yaml is still active. After startup we don't read from cluster.yaml, we only refresh it periodically.
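
In other words, roughly this usage pattern (a sketch with a made-up path, assuming the YAML node store in the client package):

```go
// Sketch of cluster.yaml as a bootstrap hint only: load the last refreshed
// list of candidates and let FindLeader walk it until a live node answers.
package main

import (
	"context"
	"log"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	ctx := context.Background()

	// Candidate nodes as recorded at the last periodic refresh.
	store, err := client.NewYamlNodeStore("/var/lib/dqlite/cluster.yaml")
	if err != nil {
		log.Fatal(err)
	}

	// Any entry that is still alive is enough to find the current leader,
	// even if the other entries have gone stale.
	leader, err := client.FindLeader(ctx, store)
	if err != nil {
		log.Fatal(err)
	}
	defer leader.Close()

	nodes, err := leader.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes {
		log.Printf("id=%d address=%s role=%v", n.ID, n.Address, n.Role)
	}
}
```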

@freeekanayaka (Contributor)

> What is the rationale for having the node addresses in the Raft log?
>
> Nodes have an ID, so addresses are not used to identify them...
>
> We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs

Note that changing the IP of a node is currently not supported, unless you reconfigure the cluster manually.

@manadart

>> What is the rationale for having the node addresses in the Raft log?
>> Nodes have an ID, so addresses are not used to identify them...
>> We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs
>
> Note that changing the IP of a node is currently not supported, unless you reconfigure the cluster manually.

Indeed; this is the reason I ask. We have cases in Juju where we do this, but that's due to topology changes that we're effecting. When it isn't in our control, such as when a node is rescheduled, our options are limited.

@freeekanayaka (Contributor)

>>> What is the rationale for having the node addresses in the Raft log?
>>> Nodes have an ID, so addresses are not used to identify them...
>>> We're currently seeing more scenarios where the cluster is brittle, because rescheduled nodes/changing IPs
>>
>> Note that changing the IP of a node is currently not supported, unless you reconfigure the cluster manually.
>
> Indeed; this is the reason I ask. We have cases in Juju where we do this, but that's due to topology changes that we're effecting. When it isn't in our control, such as when a node is rescheduled, our options are limited.

If you are talking about k8s, you should be able to assign a stable identity (hostname) to nodes with things like StatefulSet. At that point the IP can change at will, since what will be recorded in Raft is the node identity (hostname).
