Allow manual promotion of nodes with candidate priority zero. #661
Conversation
At the moment the following scenario requires some more fixes, client side only:
In that case, we get three nodes, all with candidate priority 0. In step 2, thanks to this PR, the failover is triggered, leaving two nodes in the REPORT_LSN state and the old primary in the DRAINING state. New in this PR, it is then possible to tweak a node's candidate priority, which immediately triggers the end of the failover process. The limitation we still need to fix (and I plan to include that later in this PR) is that the command in step 3 (set node candidate priority) doesn't recognise when the setting has been applied. In this PR we don't always go through APPLY_SETTINGS anymore: this intermediary step is only needed when we have a stable primary node in PRIMARY or WAIT_PRIMARY…
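The situation described above can be sketched as follows. This is an illustrative Python model of the monitor's behaviour, not the project's actual C code; the node names and the priority value 50 are made up for the example. After the failover is triggered, two nodes sit in REPORT_LSN and the old primary in DRAINING; raising one node's candidate priority gives the stalled election a candidate, so it can complete immediately.

```python
def set_candidate_priority(nodes, target, priority):
    """Tweak a node's candidate priority; if the election was stalled
    because every node had priority zero, it can now pick a winner."""
    nodes[target]["candidate_priority"] = priority

    # Nodes stuck in REPORT_LSN with a non-zero priority are candidates.
    candidates = [name for name, node in nodes.items()
                  if node["state"] == "report_lsn"
                  and node["candidate_priority"] > 0]
    if candidates:
        # The highest-priority candidate proceeds towards primary.
        winner = max(candidates, key=lambda n: nodes[n]["candidate_priority"])
        nodes[winner]["state"] = "wait_primary"
    return nodes


nodes = {
    "node_1": {"state": "draining",   "candidate_priority": 0},  # old primary
    "node_2": {"state": "report_lsn", "candidate_priority": 0},
    "node_3": {"state": "report_lsn", "candidate_priority": 0},
}
set_candidate_priority(nodes, "node_2", 50)
```

With all three priorities at zero nothing happens; as soon as `node_2` is given a non-zero priority, it is the only candidate and wins the election.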
Force-pushed from a6a7a2b to 5ea089b.
Fixed now.
Force-pushed from 889e75a to c80ec2f.
Force-pushed from 3b59ef5 to 36b7994.
When promoting a node that is not set up as a candidate for failover, we increment its candidate priority, perform the failover, and then reset its candidate priority, as usual. Supporting this mode of operation makes it possible to run a pg_auto_failover cluster in a fully manual way: even when all the nodes have candidate priority set to zero, it is still possible to manually trigger a promotion and target any node. This is a prerequisite to being able to drop a node in all cases, including when it's the primary node and there is no candidate for failover. Our design is complete when it's always possible to run `pg_autoctl drop node`.
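The increment/failover/reset flow described above can be sketched like this. It is a hedged Python model of the idea, not pg_auto_failover's implementation; the node names, roles, and the temporary priority 50 are invented for the example.

```python
def promote(nodes, target):
    """Promote `target` even though its candidate priority is zero,
    restoring its priority once the failover is done."""
    saved = nodes[target]["candidate_priority"]

    # 1. Temporarily make the target a candidate for failover.
    nodes[target]["candidate_priority"] = 50

    # 2. Perform the failover: the only candidate wins the election.
    for name, node in nodes.items():
        node["role"] = "primary" if name == target else "secondary"

    # 3. Reset the candidate priority, as usual.
    nodes[target]["candidate_priority"] = saved
    return nodes


nodes = {
    "node_1": {"role": "primary",   "candidate_priority": 0},
    "node_2": {"role": "secondary", "candidate_priority": 0},
    "node_3": {"role": "secondary", "candidate_priority": 0},
}
promote(nodes, "node_2")
```

After the call, `node_2` is the primary and its candidate priority is back to zero, so the cluster remains fully manual.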
We now depend on the bleeding edge of pyroute2, thanks to changes and bug fixes that were discovered when modernizing our usage of it.
It might be that we don't have a candidate for failover and we have lost the primary. In that case we can register a new node from one of the standby nodes and then see about promoting it. At the moment we don't arrange for the new node to be automatically promoted as a primary.
Handle setting the replication quorum and APPLY_SETTINGS the same way we handle candidate priority.
The code based some of its decision making on the candidate priority, which is the wrong replication setting to consider here. To decide whether a node should be in WAIT_PRIMARY or PRIMARY, what matters is whether there are standby nodes with replication quorum enabled, and whether those standbys are in the SECONDARY state. There is then a special case when all the nodes are async: we allow the primary to be in the PRIMARY state as long as at least one node is in the SECONDARY state, whereas in some cases it would otherwise be set to WAIT_PRIMARY to disable sync rep. When all the nodes are async anyway, synchronous_standby_names is always computed to be ''.
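The decision rule above can be sketched as follows. This is a Python sketch of the logic, not the project's actual C code; the `Standby` type and function name are invented for the example, and the state names mirror the FSM states mentioned in this PR.

```python
from dataclasses import dataclass

@dataclass
class Standby:
    state: str                # e.g. "secondary", "report_lsn"
    replication_quorum: bool  # does this standby take part in sync rep?

def expected_primary_state(standbys):
    """Decide whether the primary should be "primary" or "wait_primary"."""
    sync_standbys = [s for s in standbys if s.replication_quorum]

    if not sync_standbys:
        # All-async special case: synchronous_standby_names is always
        # computed to be '', so the primary can stay "primary" as soon
        # as at least one standby reaches SECONDARY.
        return ("primary"
                if any(s.state == "secondary" for s in standbys)
                else "wait_primary")

    # Otherwise the primary is "primary" only when at least one
    # replication-quorum-enabled standby is in SECONDARY, and
    # "wait_primary" (sync rep disabled) otherwise.
    return ("primary"
            if any(s.state == "secondary" for s in sync_standbys)
            else "wait_primary")
```

Note that candidate priority appears nowhere in this decision: only the replication quorum setting and the SECONDARY state of the standbys matter.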
Force-pushed from 36b7994 to 229f808.
See #654.