Allow manual promotion of nodes with candidate priority zero. #661
Conversation
At the moment the following scenario requires some more fixes, client side only:
In that case, we get three nodes, all with candidate priority 0. In step 2, thanks to this PR, the failover is triggered, leaving two nodes in the REPORT_LSN state and the old primary in the DRAINING state. New in this PR, it is then possible to tweak a node's candidate priority, which immediately triggers the end of the failover process. The limitation we still need to fix (and I plan to include that later in this PR) is that the command in step 3 (set node candidate priority) doesn't recognise when the setting has been applied. In this PR we don't always go through APPLY_SETTINGS anymore: this intermediary step is only needed when we have a stable primary node in PRIMARY or WAIT_PRIMARY…
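The situation described above can be sketched as follows. This is an illustrative Python model of the monitor's behaviour, not the project's actual C code; the node names and the priority value 50 are made up for the example. After the failover is triggered, two nodes sit in REPORT_LSN and the old primary in DRAINING; raising one node's candidate priority gives the stalled election a candidate, so it can complete immediately.

```python
def set_candidate_priority(nodes, target, priority):
    """Tweak a node's candidate priority; if the election was stalled
    because every node had priority zero, it can now pick a winner."""
    nodes[target]["candidate_priority"] = priority

    # Nodes stuck in REPORT_LSN with a non-zero priority are candidates.
    candidates = [name for name, node in nodes.items()
                  if node["state"] == "report_lsn"
                  and node["candidate_priority"] > 0]
    if candidates:
        # The highest-priority candidate proceeds towards primary.
        winner = max(candidates, key=lambda n: nodes[n]["candidate_priority"])
        nodes[winner]["state"] = "wait_primary"
    return nodes


nodes = {
    "node_1": {"state": "draining",   "candidate_priority": 0},  # old primary
    "node_2": {"state": "report_lsn", "candidate_priority": 0},
    "node_3": {"state": "report_lsn", "candidate_priority": 0},
}
set_candidate_priority(nodes, "node_2", 50)
```

With all three priorities at zero nothing happens; as soon as `node_2` is given a non-zero priority, it is the only candidate and wins the election.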
Force-pushed from a6a7a2b to 5ea089b.
Fixed now.
Force-pushed from 889e75a to c80ec2f.
Force-pushed from 3b59ef5 to 36b7994.
When promoting a node that is not set up as a candidate for failover, we increment its candidate priority, perform the failover, and then reset its candidate priority, as usual. Supporting this mode of operation makes it possible to run a pg_auto_failover cluster in a fully manual way: even when all the nodes have candidate priority set to zero, it is still possible to manually trigger a promotion and target any node. This is a prerequisite to being able to drop a node in all cases, including when it's the primary node and there is no candidate for failover. Our design is complete when it's always possible to run `pg_autoctl drop node`.
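The increment/failover/reset flow described above can be sketched like this. It is a hedged Python model of the idea, not pg_auto_failover's implementation; the node names, roles, and the temporary priority 50 are invented for the example.

```python
def promote(nodes, target):
    """Promote `target` even though its candidate priority is zero,
    restoring its priority once the failover is done."""
    saved = nodes[target]["candidate_priority"]

    # 1. Temporarily make the target a candidate for failover.
    nodes[target]["candidate_priority"] = 50

    # 2. Perform the failover: the only candidate wins the election.
    for name, node in nodes.items():
        node["role"] = "primary" if name == target else "secondary"

    # 3. Reset the candidate priority, as usual.
    nodes[target]["candidate_priority"] = saved
    return nodes


nodes = {
    "node_1": {"role": "primary",   "candidate_priority": 0},
    "node_2": {"role": "secondary", "candidate_priority": 0},
    "node_3": {"role": "secondary", "candidate_priority": 0},
}
promote(nodes, "node_2")
```

After the call, `node_2` is the primary and its candidate priority is back to zero, so the cluster remains fully manual.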
We now depend on the bleeding edge of pyroute2, thanks to changes and bug fixes that were discovered when modernizing our usage of it.
It might be that we don't have a candidate for failover and we have lost the primary. In that case we can register a new node from one of the standby nodes and then see about promoting it. At the moment we don't arrange for the new node to be automatically promoted as a primary.
Handle setting the replication quorum and APPLY_SETTINGS the same way we handle candidate priority.
The code based some of its decision making on the candidate priority, which is the wrong replication setting to consider here. To decide whether a node should be in WAIT_PRIMARY or PRIMARY, what matters is whether there are standby nodes with replication quorum enabled, and whether those standbys are in the SECONDARY state. There is then a special case when all the nodes are async: we allow the primary to be in the PRIMARY state as long as at least one node is in the SECONDARY state, whereas in some cases it would otherwise be set to WAIT_PRIMARY to disable sync rep. When all the nodes are async anyway, synchronous_standby_names is always computed to be ''.
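The decision rule above can be sketched as follows. This is a Python sketch of the logic, not the project's actual C code; the `Standby` type and function name are invented for the example, and the state names mirror the FSM states mentioned in this PR.

```python
from dataclasses import dataclass

@dataclass
class Standby:
    state: str                # e.g. "secondary", "report_lsn"
    replication_quorum: bool  # does this standby take part in sync rep?

def expected_primary_state(standbys):
    """Decide whether the primary should be "primary" or "wait_primary"."""
    sync_standbys = [s for s in standbys if s.replication_quorum]

    if not sync_standbys:
        # All-async special case: synchronous_standby_names is always
        # computed to be '', so the primary can stay "primary" as soon
        # as at least one standby reaches SECONDARY.
        return ("primary"
                if any(s.state == "secondary" for s in standbys)
                else "wait_primary")

    # Otherwise the primary is "primary" only when at least one
    # replication-quorum-enabled standby is in SECONDARY, and
    # "wait_primary" (sync rep disabled) otherwise.
    return ("primary"
            if any(s.state == "secondary" for s in sync_standbys)
            else "wait_primary")
```

Note that candidate priority appears nowhere in this decision: only the replication quorum setting and the SECONDARY state of the standbys matter.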
Force-pushed from 36b7994 to 229f808.
See #654.