
Manage failover of additional physical replication slots #2385

Closed
gbartolini opened this issue Jun 29, 2023 · 1 comment · Fixed by #3710
Labels
enhancement 🪄 New feature or request

Comments

@gbartolini (Contributor) commented:

Since version 1.18, CloudNativePG has supported automated management of the physical replication slots that the operator itself creates, improving both the self-healing and high availability of a PostgreSQL cluster. This includes surviving a failover, a capability that PostgreSQL alone cannot offer.

A stanza called .spec.replicationSlots.highAvailability in the Cluster resource spec controls this feature. For details, please refer to the “Additional Background” section below.

The current implementation, however, does not synchronize on the replicas any additional replication slots that users have created and manage independently, so those slots do not persist after a failover.

The proposal is to add a new stanza called synchronizeReplicas in the replicationSlots stanza:

  • enabled: when set to true (the default), every replication slot present on the primary is synchronized on each standby; when the value is changed from true to false, the operator must remove from each standby any replication slot it previously created
  • excludePatterns: a list of regular expression patterns matched against the names of replication slots to be excluded from synchronization (empty by default)

These two parameters will not control the replication slots for high availability mentioned above. However, this feature will reuse the same logic for synchronizing slots on the replicas.

Users who want to take advantage of this feature must monitor such replication slots to ensure they don’t interfere with operations on the primary.

Here is an excerpt showing how to configure this feature:

replicationSlots:
  synchronizeReplicas:
    enabled: true
    excludePatterns:
    - "^foo"
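The selection behavior the proposal describes can be sketched as follows. This is a minimal, hypothetical illustration of the intended semantics, not the operator's actual code: the function name and signature are invented for clarity.

```python
import re

def slots_to_synchronize(primary_slots, enabled=True, exclude_patterns=()):
    """Hypothetical sketch: return the user-defined slots a standby
    should mirror. Every physical slot found on the primary is
    synchronized unless the feature is disabled or the slot name
    matches one of the exclusion patterns."""
    if not enabled:
        return []
    compiled = [re.compile(p) for p in exclude_patterns]
    return [name for name in primary_slots
            if not any(rx.search(name) for rx in compiled)]

# With the example configuration above ("^foo" excluded):
print(slots_to_synchronize(["foo_backup", "barman", "app_consumer"],
                           exclude_patterns=["^foo"]))
# → ['barman', 'app_consumer']
```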

Additional Background

Replication slots are a native PostgreSQL feature, introduced in version 9.4, that provides an automated way to ensure the primary does not remove WAL segments until all attached streaming replication clients have received them, and does not remove rows that could cause a recovery conflict even when a standby is temporarily disconnected. A replication slot exists solely on the instance that created it; PostgreSQL does not replicate it to the standby servers. As a result, after a failover or a switchover, the new primary does not contain the replication slots from the old primary, and streaming replication clients previously connected to the old primary cannot connect to the new one because the slots are missing.

CloudNativePG 1.18 introduced the concept of cluster-managed replication slots, for High Availability purposes only, by automatically managing physical replication slots for each hot standby replica in the cluster, on both the primary and the standby instances. Specifically, the operator manages slots on the primary (one for each HA standby) and on each standby (one for every other HA standby in the cluster), by reading the pg_replication_slots view on the primary and then advancing the local slots at regular intervals using pg_replication_slot_advance().
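The per-standby reconciliation just described can be sketched in pseudo-operator form. This is an assumption-laden illustration: the function and the action tuples are invented here; the real operator issues the corresponding PostgreSQL function calls rather than returning a plan.

```python
def reconcile_standby_slots(primary_slots, standby_slots):
    """Hypothetical sketch of the reconciliation a standby performs at
    regular intervals. primary_slots maps slot name -> restart_lsn, as
    read from pg_replication_slots on the primary; standby_slots is the
    set of slot names currently present locally. Returns the plan of
    actions instead of executing SQL."""
    actions = []
    for name, lsn in primary_slots.items():
        if name not in standby_slots:
            # would call pg_create_physical_replication_slot(name)
            actions.append(("create", name))
        # would call pg_replication_slot_advance(name, lsn)
        # to keep the local copy aligned with the primary
        actions.append(("advance", name, lsn))
    for name in standby_slots - set(primary_slots):
        # slot no longer exists on the primary: drop the stale local copy
        actions.append(("drop", name))
    return actions
```

For example, a standby holding {"barman", "stale_slot"} while the primary reports slots app_consumer and barman would create and advance app_consumer, advance barman, and drop stale_slot.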

Users can create and drop replication slots through the pg_create_physical_replication_slot and pg_drop_replication_slot functions in PostgreSQL.

@gbartolini gbartolini added the enhancement 🪄 New feature or request label Jun 29, 2023
@Husselbossel commented:

Hi @gbartolini

Great feature, exactly the thing that I'm looking for.
How would you currently advise solving this, without this feature?
Any direction you can point me in?

Cheers,

Jos

mnencia added a commit that referenced this issue Feb 6, 2024
Extend the existing synchronization of physical replication slots
(currently limited to HA replication slots) to include those defined by
the user on the primary, enhancing self-healing, high availability, and
third-party application integration.

The feature is enabled by default but can be disabled or customized
through exclusion patterns by configuring the newly introduced stanza:
`replicationSlots.synchronizeReplicas`.

Closes #2385

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Signed-off-by: Tao Li <tao.li@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Tao Li <tao.li@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
@gbartolini gbartolini added this to the 1.23.0 milestone Apr 10, 2024