
Manage failover of additional physical replication slots #2385

Closed
gbartolini opened this issue Jun 29, 2023 · 1 comment · Fixed by #3710
Labels
enhancement 🪄 New feature or request

Comments

@gbartolini (Contributor) commented:

Since version 1.18, CloudNativePG has supported automated management of the physical replication slots that the operator itself creates, improving both the self-healing and high availability of a PostgreSQL cluster. This includes surviving a failover, a capability that PostgreSQL alone cannot offer.

A stanza called .spec.replicationSlots.highAvailability in the Cluster resource spec controls this feature. For details, please refer to the “Additional Background” section below.

The current implementation, however, does not synchronize on the replicas any additional replication slots that users have created and manage independently, so those slots do not persist after a failover.

The proposal is to add a new stanza called synchronizeReplicas in the replicationSlots stanza:

  • enabled: when set to true (the default), every replication slot present on the primary is synchronized on each standby; when the value is changed from true to false, the operator must remove from each standby any replication slot it previously created
  • excludePatterns: a list of regular expression patterns matched against the names of replication slots to be excluded from synchronization (empty by default)

These two parameters will not control the replication slots for high availability mentioned above. However, this feature will reuse the same logic for synchronizing slots on the replicas.

Users who want to take advantage of this feature must monitor such replication slots to ensure they don’t interfere with operations on the primary.

Here is an excerpt showing how to configure this feature:

replicationSlots:
  synchronizeReplicas:
    enabled: true
    excludePatterns:
    - "^foo"
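The selection behavior the proposal describes can be sketched as follows. This is a minimal, hypothetical illustration of the intended semantics, not the operator's actual code: the function name and signature are invented for clarity.

```python
import re

def slots_to_synchronize(primary_slots, enabled=True, exclude_patterns=()):
    """Hypothetical sketch: return the user-defined slots a standby
    should mirror. Every physical slot found on the primary is
    synchronized unless the feature is disabled or the slot name
    matches one of the exclusion patterns."""
    if not enabled:
        return []
    compiled = [re.compile(p) for p in exclude_patterns]
    return [name for name in primary_slots
            if not any(rx.search(name) for rx in compiled)]

# With the example configuration above ("^foo" excluded):
print(slots_to_synchronize(["foo_backup", "barman", "app_consumer"],
                           exclude_patterns=["^foo"]))
# → ['barman', 'app_consumer']
```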

Additional Background

Replication slots are a native PostgreSQL feature, introduced in version 9.4, that provides an automated way to ensure the primary does not remove WAL segments until all attached streaming replication clients have received them, and does not remove rows that could cause a recovery conflict even when a standby is temporarily disconnected. A replication slot exists solely on the instance that created it; PostgreSQL does not replicate it to the standby servers. As a result, after a failover or a switchover, the new primary does not contain the replication slots from the old primary, and streaming replication clients previously connected to the old primary cannot connect to the new one because the slots are missing.

CloudNativePG 1.18 introduced the concept of cluster-managed replication slots, for High Availability purposes only, by automatically managing physical replication slots for each hot standby replica in the cluster, on both the primary and the standby instances. Specifically, the operator manages slots on the primary (one for each HA standby) and on each standby (one for every other HA standby in the cluster), by reading the pg_replication_slots view on the primary and then advancing the local slots at regular intervals using pg_replication_slot_advance().
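The per-standby reconciliation just described can be sketched in pseudo-operator form. This is an assumption-laden illustration: the function and the action tuples are invented here; the real operator issues the corresponding PostgreSQL function calls rather than returning a plan.

```python
def reconcile_standby_slots(primary_slots, standby_slots):
    """Hypothetical sketch of the reconciliation a standby performs at
    regular intervals. primary_slots maps slot name -> restart_lsn, as
    read from pg_replication_slots on the primary; standby_slots is the
    set of slot names currently present locally. Returns the plan of
    actions instead of executing SQL."""
    actions = []
    for name, lsn in primary_slots.items():
        if name not in standby_slots:
            # would call pg_create_physical_replication_slot(name)
            actions.append(("create", name))
        # would call pg_replication_slot_advance(name, lsn)
        # to keep the local copy aligned with the primary
        actions.append(("advance", name, lsn))
    for name in standby_slots - set(primary_slots):
        # slot no longer exists on the primary: drop the stale local copy
        actions.append(("drop", name))
    return actions
```

For example, a standby holding {"barman", "stale_slot"} while the primary reports slots app_consumer and barman would create and advance app_consumer, advance barman, and drop stale_slot.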

Users can create and drop replication slots through the pg_create_physical_replication_slot and pg_drop_replication_slot functions in PostgreSQL.

@gbartolini gbartolini added the enhancement 🪄 New feature or request label Jun 29, 2023
@Husselbossel commented:

Hi @gbartolini

Great feature, exactly the thing that I'm looking for.
How would you currently advise solving this, without this feature?
Any direction you can point me in?

Cheers,

Jos

mnencia added a commit that referenced this issue Feb 6, 2024
Extend the existing synchronization of physical replication slots
(currently limited to HA replication slots) to include those defined by
the user on the primary, enhancing self-healing, high availability, and
third-party application integration.

The feature is enabled by default but can be disabled or customized
through exclusion patterns by configuring the newly introduced stanza:
`replicationSlots.synchronizeReplicas`.

Closes #2385

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Tao Li <tao.li@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
@gbartolini gbartolini added this to the 1.23.0 milestone Apr 10, 2024