Kubernetes Volume Snapshots support #2081

gbartolini · 2023-05-24T09:27:46Z

Currently, CloudNativePG supports only object stores for continuous backup, requiring that both WAL files and base backups reside in the same backup object store.

However, this approach is not adequate for very large databases, especially on the recovery side, where it leads to a high recovery time objective (RTO) following a (rare) full disaster.

This epic ticket proposes the introduction in CloudNativePG of:

- Hot (online) physical base backups for Postgres databases through Kubernetes native volume snapshots
- Full recovery and Point In Time Recovery (PITR) through Kubernetes native volume snapshots

Both capabilities need to be declarative.

Volume snapshotting was first introduced in Kubernetes 1.12 as alpha, promoted to beta in 1.17, and moved to GA in 1.20. It’s now stable and widely available, and provides 3 custom resource definitions: VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass.

Thanks to the transparent support for both incremental and differential copy that volume snapshots provide through the underlying storage classes, this feature has two main benefits:

- make CloudNativePG excel in the Very Large DataBase (VLDB) context
- make CloudNativePG more “Kubernetes native” by using the Kubernetes API for volume snapshotting for both backup and recovery operations, including across Kubernetes clusters, and by delegating the complexity and management to the underlying storage classes

Hot backup has been the default in PostgreSQL for 20 years now, and it is one of the core capabilities behind the success of Postgres. It enables taking physical base backups from any instance in the Postgres cluster (either primary or standby) without shutting down the server, and it requires WAL archiving (currently only on object stores).

As part of this activity, we should also address cold (offline) physical base backups and "database snapshot" recovery situations. Unlike hot backup, cold backup enables the operator to work even in the worst-case scenario where a WAL archive is not present and “standalone consistent database snapshots” are acceptable: it fences a replica, takes a snapshot, and restarts the replica, without disrupting the primary.

Volume snapshots have the potential to become the main base backup method in the public cloud, as well as in some enterprise on-premises environments.

The general idea of this feature is to extend the operator's existing API for the “Backup”, “ScheduledBackup” and “Cluster” CRDs so that, in addition to the existing method based on object stores with Barman Cloud, physical base backups can also be taken and recovered through volume snapshots, in a declarative way.


gbartolini · 2023-10-26T09:47:01Z

The initial release of volume snapshots (version 1.21.0) only supported cold backups, which required fencing of the instance. This limitation has been waived starting with version 1.21.1. Given the minimal impact of the change on the code, maintainers have decided to backport this feature immediately instead of waiting for version 1.22.0 to be out, and make online backups the default behavior on volume snapshots too. If you are planning to rely instead on cold backups, make sure you follow the instructions below.

The initial release of volume snapshots (version 1.21.0) only supported cold backups, which required fencing of the instance. This patch waives that limitation through the introduction of the following options in the `.spec.backup.volumeSnapshot` stanza of the `Cluster` resource:

- `online`: accepting `true` (default) or `false` as a value
- `onlineConfiguration.immediateCheckpoint`: whether you want to request an immediate checkpoint before you start the backup procedure or not; technically, it corresponds to the `fast` argument you pass to the `pg_backup_start`/`pg_start_backup()` function in PostgreSQL, accepting `true` (default) or `false`
- `onlineConfiguration.waitForArchive`: whether you want to wait for the archiver to process the last segment of the backup or not; technically, it corresponds to the `wait_for_archive` argument you pass to the `pg_backup_stop`/`pg_stop_backup()` function in PostgreSQL, accepting `true` (default) or `false`

Given the minimal impact of the change on the code, the patch is also backported to the release-1.21 branch. Online backups are now the default behavior on volume snapshots too.

Closes #2081

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
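To make the mapping between these options and the PostgreSQL backup functions concrete, here is a minimal Python sketch. It is not the operator's implementation: the class names, the `backup_statements` helper, and the backup label are hypothetical, and the SQL assumes the PostgreSQL 15+ function names (`pg_backup_start`/`pg_backup_stop`). Only the option names and their defaults come from the commit message above.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineConfiguration:
    # Both options default to true, as stated in the commit message.
    immediate_checkpoint: bool = True
    wait_for_archive: bool = True


@dataclass
class VolumeSnapshotConfig:
    # Hypothetical model of the `.spec.backup.volumeSnapshot` options.
    online: bool = True
    online_configuration: OnlineConfiguration = field(
        default_factory=OnlineConfiguration
    )

    def to_stanza(self) -> dict:
        """Render the options using the key names from the commit message."""
        return {
            "online": self.online,
            "onlineConfiguration": {
                "immediateCheckpoint": self.online_configuration.immediate_checkpoint,
                "waitForArchive": self.online_configuration.wait_for_archive,
            },
        }


def backup_statements(cfg: VolumeSnapshotConfig, label: str = "snapshot-backup") -> list:
    """Return the SQL that would bracket an online snapshot.

    A cold backup (online=False) needs no SQL here, because the instance
    is fenced instead. Both statements must run in the same session.
    """
    if not cfg.online:
        return []
    fast = "true" if cfg.online_configuration.immediate_checkpoint else "false"
    wait = "true" if cfg.online_configuration.wait_for_archive else "false"
    return [
        f"SELECT pg_backup_start(label => '{label}', fast => {fast})",
        f"SELECT * FROM pg_backup_stop(wait_for_archive => {wait})",
    ]


if __name__ == "__main__":
    for statement in backup_statements(VolumeSnapshotConfig()):
        print(statement)
```

With the defaults, the sketch requests an immediate checkpoint and waits for the archiver; setting `online=False` models the cold backup path, where the snapshot is taken on a fenced instance and no backup functions are called.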

(cherry picked from commit 4e82248)

gbartolini added the enhancement 🪄 New feature or request label May 24, 2023

gbartolini mentioned this issue May 24, 2023

Support recovery from VolumeSnapshot #1968

Closed

gbartolini mentioned this issue Aug 5, 2023

Reorganize the backup and recovery section in the docs #2536

Closed

gbartolini modified the milestones: 1.21.0, 1.22.0 Oct 12, 2023

gbartolini modified the milestones: 1.21.0, 1.22.0, 1.21.1, 1.20.4, 1.19.6 Oct 23, 2023

gbartolini linked a pull request Oct 26, 2023 that will close this issue

feat: PostgreSQL online/hot backup with volume snapshots #3102

Merged

gbartolini mentioned this issue Oct 26, 2023

feat: PostgreSQL online/hot backup with volume snapshots #3102

Merged

mnencia closed this as completed in #3102 Oct 26, 2023
