Kubernetes Volume Snapshots support #2081

Closed
gbartolini opened this issue May 24, 2023 · 1 comment · Fixed by #3102
Labels
enhancement 🪄 New feature or request

Comments

@gbartolini
Contributor

gbartolini commented May 24, 2023

Currently, CloudNativePG supports only object stores for continuous backup, requiring both WAL files and base backups to reside in the same backup object store.

However, this approach is not adequate for very large databases, especially on the recovery side, where it leads to a high recovery time objective (RTO) following a (rare) full disaster.

This epic ticket proposes the introduction in CloudNativePG of:

  • Hot (online) physical base backups for Postgres databases through Kubernetes native volume snapshots
  • Full recovery and Point In Time Recovery (PITR) through Kubernetes native volume snapshots

Both capabilities need to be declarative.

Volume snapshotting was first introduced in Kubernetes 1.12 as alpha, promoted to beta in 1.17, and moved to GA in 1.20. It’s now stable and widely available, and provides 3 custom resource definitions: VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass.
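For readers unfamiliar with these resources, here is a minimal sketch of how a snapshot class and a snapshot of a single PVC are declared with the standard `snapshot.storage.k8s.io/v1` API. All names (`example-snapclass`, `example-snapshot`, `example-pvc`) and the CSI driver are placeholders chosen for illustration; the driver in particular depends on your storage provider.

```yaml
# A VolumeSnapshotClass tells Kubernetes which CSI driver takes the snapshot.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: example-snapclass        # hypothetical name
driver: hostpath.csi.k8s.io      # assumption: replace with your CSI driver
deletionPolicy: Delete           # or Retain, to keep the underlying snapshot
---
# A VolumeSnapshot requests a snapshot of one existing PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: example-snapshot         # hypothetical name
spec:
  volumeSnapshotClassName: example-snapclass
  source:
    persistentVolumeClaimName: example-pvc   # hypothetical PVC
```

The `VolumeSnapshotContent` object is normally created by the snapshot controller once the snapshot is taken, so it rarely needs to be written by hand.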

Thanks to the transparent support for both incremental and differential copy that volume snapshots provide through the underlying storage classes, this feature has two main benefits:

  • make CloudNativePG excel in the Very Large DataBase (VLDB) context
  • make CloudNativePG more “Kubernetes native” by using the Kubernetes volume snapshot API for both backup and recovery operations, including across Kubernetes clusters, and by delegating complexity and management to the underlying storage classes

Hot backup has been the default in PostgreSQL for 20 years now, and is one of the core capabilities behind the success of Postgres. It makes it possible to take physical base backups from any instance in the Postgres cluster (either primary or standby) without shutting down the server, and it requires WAL archiving (currently only on object stores).

As part of this activity, we should also address cold (offline) physical base backups and "database snapshot" recovery situations. Unlike hot backup, cold backup lets the operator also work in the worst-case scenario where a WAL archive is not present and “standalone consistent database snapshots” are acceptable: it fences a replica, takes a snapshot and restarts the replica, without disrupting the primary.

Volume snapshots have the potential to become the main base backup method in the public cloud, as well as in some enterprise on-premises environments.

The general idea of this feature is to extend the operator's existing API for the “Backup”, “ScheduledBackup” and “Cluster” CRDs with another declarative way of taking physical base backups, alongside the existing one based on object stores with Barman Cloud, and of recovering from them.
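To make the declarative idea concrete, here is a rough sketch of what such resources can look like. It assumes the field names used by later CloudNativePG releases (`method: volumeSnapshot` on `Backup` and `ScheduledBackup`, and `bootstrap.recovery.volumeSnapshots` on `Cluster`); these are not spelled out in this ticket, and all resource names are hypothetical.

```yaml
# On-demand base backup of an existing cluster through a volume snapshot.
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: backup-example               # hypothetical name
spec:
  method: volumeSnapshot             # assumed field; the default method uses the object store
  cluster:
    name: cluster-example            # hypothetical cluster
---
# The same request on a schedule (six-field cron expression, seconds first).
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: scheduled-backup-example     # hypothetical name
spec:
  schedule: "0 0 0 * * *"            # every day at midnight
  method: volumeSnapshot
  cluster:
    name: cluster-example
---
# A new cluster bootstrapped from a previously taken snapshot.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-restore              # hypothetical name
spec:
  instances: 3
  storage:
    size: 1Gi
  bootstrap:
    recovery:
      volumeSnapshots:
        storage:
          name: example-snapshot     # hypothetical VolumeSnapshot to restore from
          kind: VolumeSnapshot
          apiGroup: snapshot.storage.k8s.io
```

Point In Time Recovery would additionally need access to the WAL archive, which this sketch leaves out.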

@gbartolini
Contributor Author

The initial release of volume snapshots (version 1.21.0) supported only cold backups, which required fencing the instance. This limitation was lifted in version 1.21.1. Given the minimal impact of the change on the code, the maintainers decided to backport the feature immediately rather than wait for version 1.22.0, and to make online backups the default behavior on volume snapshots too. If you plan to rely on cold backups instead, make sure you follow the instructions below.
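As an illustration only, cold backups can be requested by turning off the `online` option in the `.spec.backup.volumeSnapshot` stanza of the `Cluster` resource. The fragment below is a sketch that assumes a snapshot class named `example-snapclass` (a placeholder) and omits the rest of the cluster definition.

```yaml
# Fragment of a Cluster resource: only the backup stanza is shown.
spec:
  backup:
    volumeSnapshot:
      className: example-snapclass   # hypothetical VolumeSnapshotClass
      online: false                  # cold backup: the instance is fenced while the snapshot is taken
```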

@gbartolini gbartolini linked a pull request Oct 26, 2023 that will close this issue
mnencia added a commit that referenced this issue Oct 26, 2023
The initial release of volume snapshots (version 1.21.0) only
supported cold backups, which required fencing of the instance.
This patch waives that limitation through the introduction of the
following options in the `.spec.backup.volumeSnapshot` stanza of the
`Cluster` resource:

- `online`: accepting `true` (default) or `false` as a value

- `onlineConfiguration.immediateCheckpoint`: whether you want to
  request an immediate checkpoint before you start the backup
  procedure or not; technically, it corresponds to the `fast` argument
  you pass to the `pg_backup_start`/`pg_start_backup()` function in
  PostgreSQL, accepting `true` (default) or `false`

- `onlineConfiguration.waitForArchive`: whether you want to wait for
  the archiver to process the last segment of the backup or not;
  technically, it corresponds to the `wait_for_archive` argument you
  pass to the `pg_backup_stop`/`pg_stop_backup()` function in
  PostgreSQL, accepting `true` (default) or `false`

Given the minimal impact of the change on the code, the patch is also
backported to the release-1.21 branch.

Online backups are now the default behavior on volume snapshots too.

Closes #2081

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Co-authored-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
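The options described in the commit message above can be put together as in the following sketch. The cluster name, storage size and snapshot class are placeholders; the three option values shown are the defaults stated in the commit message, written out explicitly for clarity.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example              # hypothetical name
spec:
  instances: 3
  storage:
    size: 1Gi
  backup:
    volumeSnapshot:
      className: example-snapclass   # hypothetical VolumeSnapshotClass
      online: true                   # hot backup, no fencing of the instance
      onlineConfiguration:
        # Maps to the `fast` argument of pg_backup_start / pg_start_backup()
        immediateCheckpoint: true
        # Maps to the `wait_for_archive` argument of pg_backup_stop / pg_stop_backup()
        waitForArchive: true
```

Setting `immediateCheckpoint: false` lets the initial checkpoint be spread out over time, which trades a slower backup start for a lighter I/O spike on the instance.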
cnpg-bot pushed a commit that referenced this issue Oct 26, 2023
(cherry picked from commit 4e82248)