## How To Pick

### Key Features

- Automation Capabilities
- Reliability and Robustness
- Incremental Backup Support

### Types Of Tools

#### Open Source Standards

- pg_dump / pg_dumpall: built-in logical backups
- pg_basebackup: built-in physical base backups, the starting point for PITR when combined with WAL archiving
- pgBackRest: very widely used open source tool (a community standard) that has almost all the capabilities needed; requires a bit of expertise (uses the low-level PostgreSQL backup API)
- Barman: EDB's open source tool for backup and recovery
- Percona: open source PostgreSQL tools bundled and tested to work together; an enterprise-ready open source solution (partnered with Red Hat, AWS, VMware and more)

#### Managed (paid) Tools

Most of these require little or no code to use:
- Cloud native backup tools for managed SQL deployments
- SimpleBackups
- EDB (managed)

## Demo With pgBackRest (pg-debian)

### Installation

In [None]:
apt update
apt install pgbackrest
pgbackrest --help

### Setting Up The Environment

In [None]:
pg_createcluster 16 demo -- --data-checksums --auth peer
DB_CONF_FILE=/etc/postgresql/16/demo/postgresql.conf
# Debian already creates a default cluster, so make sure our cluster listens on port 5432
sed -i "s/^#*port *=.*/port = 5432/" "$DB_CONF_FILE"
pg_lsclusters

#### Configurations

#### Basic Config For Stanza

In [None]:
# Create a configuration file (INI format) with the configuration for a single DB cluster, which pgBackRest calls a "stanza"
BACK_REST_CONFIG_FILE=/etc/pgbackrest/pgbackrest.conf
mkdir -p /etc/pgbackrest/
printf '[demo]
pg1-path=/var/lib/postgresql/16/demo\n' > $BACK_REST_CONFIG_FILE
chown -R postgres:postgres /etc/pgbackrest/

#### Configure Repository

In production the repository will most likely live on a separate server, so that it does not go down together with the database host. A repository on the same host can still be a valid production setup if the whole host machine is backed up at the file system level by a different backup tool

In [None]:
# Add repo path
mkdir -p /var/lib/pgbackrest
chmod 750 /var/lib/pgbackrest
chown postgres:postgres /var/lib/pgbackrest

# Add repo path to the [global] section (shared by all stanzas)
printf '\n[global]
repo1-path=/var/lib/pgbackrest\n' >> $BACK_REST_CONFIG_FILE

#### Configure WAL Archiving On Server

In [None]:
# Configure (details in WAL archiving lesson)
sed -i "s/^#*archive_command.*/archive_command = 'pgbackrest --stanza=demo archive-push %p'/" $DB_CONF_FILE
sed -i "s/^#*archive_mode.*/archive_mode = on/" $DB_CONF_FILE
sed -i "s/^#*max_wal_senders.*/max_wal_senders = 3/" $DB_CONF_FILE
sed -i "s/^#*wal_level.*/wal_level = replica/" $DB_CONF_FILE
# Check
grep "^archive_command\|^archive_mode\|^max_wal_senders\|^wal_level" $DB_CONF_FILE
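
Before editing the live `postgresql.conf`, it can help to preview the substitutions on a scratch copy. The following is a minimal sketch under assumptions: `scratch_conf` is a temporary file created only for illustration, and its four lines imitate the stock commented-out defaults rather than a full configuration file.

```shell
# Scratch file with stock (commented-out) defaults; this is NOT the live config.
scratch_conf=$(mktemp)
cat > "$scratch_conf" <<'EOF'
#wal_level = replica            # minimal, replica, or logical
#archive_mode = off             # enables archiving; off, on, or always
#archive_command = ''           # command to use to archive a WAL file
#max_wal_senders = 10           # max number of walsender processes
EOF

# The same substitutions as in the demo, anchored to the start of the line
# so that only the setting itself (commented or not) is rewritten.
sed -i "s/^#*archive_command.*/archive_command = 'pgbackrest --stanza=demo archive-push %p'/" "$scratch_conf"
sed -i "s/^#*archive_mode.*/archive_mode = on/" "$scratch_conf"
sed -i "s/^#*max_wal_senders.*/max_wal_senders = 3/" "$scratch_conf"
sed -i "s/^#*wal_level.*/wal_level = replica/" "$scratch_conf"

# Count the active (uncommented) settings and show the result.
active_settings=$(grep -c '^[a-z_]* = ' "$scratch_conf")
echo "active settings: $active_settings"
cat "$scratch_conf"
```

Once the preview looks right, the same commands can be pointed at `$DB_CONF_FILE`.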

#### Configure Archive Push Command

Configuration for the pgBackRest `archive-push` command, which pushes a WAL file to the repository whenever archiving is triggered (that is, when the WAL file is switched)

In [None]:
# Add compression to archived WAL files
printf '\n[global:archive-push]
compress-level=3\n' >> $BACK_REST_CONFIG_FILE

#### Configure Backup Retention

To free disk space, we implement a retention policy that deletes backups that are no longer needed

It's always a good idea to keep as many backups as possible to widen our point-in-time recovery window

There are 2 types of retention policies:
1. Time based - backups older than x days are deleted, provided at least one backup that is new enough remains
2. Count based - whenever the number of backups exceeds the count, the oldest are deleted

The retention policy is enforced by the `expire` command, which is called automatically every time a backup completes successfully, but it can also be run manually by the user

In [None]:
# Add count based retention
sed -i "s/^\[global\].*/[global]\nrepo1-retention-full=2/" $BACK_REST_CONFIG_FILE
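
The demo uses count based retention. For comparison, here is a minimal sketch of the time based variant, written to a scratch file so the live configuration is untouched. `retention_conf` is a hypothetical name and the 30-day value is an arbitrary example, not a recommendation.

```shell
# Scratch config for illustration only; the real file is /etc/pgbackrest/pgbackrest.conf.
retention_conf=$(mktemp)
printf '[demo]\npg1-path=/var/lib/postgresql/16/demo\n\n[global]\nrepo1-path=/var/lib/pgbackrest\n' > "$retention_conf"

# Switch full-backup retention from the default count mode to time mode.
# With type=time, repo1-retention-full is read as a number of days.
# [global] is the last section of the scratch file, so appending lands inside it.
printf 'repo1-retention-full-type=time\nrepo1-retention-full=30\n' >> "$retention_conf"

cat "$retention_conf"
```

With either policy in place, expiration can also be triggered by hand with `su postgres -c "pgbackrest --stanza=demo expire"`.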

#### Configure Repo Encryption

Since the repository contains all of our production data and even more (historic data as well), it had better be secured with encryption!

It's better to encrypt on the client side (the pgBackRest side) even when the storage server supports built-in encryption (as all major cloud storage services do), so that the data is already protected while it travels over the network between them

In [None]:
# Add repository encryption
ENCRYPTION_KEY=$(openssl rand -base64 48)
sed -i "s/^\[global\].*/[global]\nrepo1-cipher-type=aes-256-cbc/" $BACK_REST_CONFIG_FILE
sed -i 's,^\[global\].*,[global]\nrepo1-cipher-pass='"$ENCRYPTION_KEY"',' $BACK_REST_CONFIG_FILE
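
Because the passphrase is stored in plain text in the configuration file, anyone who can read that file can decrypt the repository. Below is a minimal sketch of tightening the file permissions, shown on a scratch file. `cipher_conf` and the `640` mode are assumptions made for illustration, and the `/dev/urandom` fallback is only there in case `openssl` is unavailable.

```shell
# Scratch file standing in for /etc/pgbackrest/pgbackrest.conf.
cipher_conf=$(mktemp)

# Generate a passphrase the same way as the demo; fall back to /dev/urandom
# when openssl is not installed.
if command -v openssl >/dev/null 2>&1; then
  passphrase=$(openssl rand -base64 48)
else
  passphrase=$(head -c 48 /dev/urandom | base64 | tr -d '\n')
fi

printf '[global]\nrepo1-cipher-type=aes-256-cbc\nrepo1-cipher-pass=%s\n' "$passphrase" > "$cipher_conf"

# Owner read/write, group read, no access for anyone else.
chmod 640 "$cipher_conf"
conf_mode=$(stat -c '%a' "$cipher_conf")
echo "mode: $conf_mode"
```

On the demo host the equivalent would be `chmod 640 /etc/pgbackrest/pgbackrest.conf`. It is also worth keeping a copy of the passphrase somewhere safe outside the host, since an encrypted repository cannot be read without it.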

#### Create Stanza

In [None]:
pg_ctlcluster 16 demo start
su postgres -c "pgbackrest --stanza=demo --log-level-console=info stanza-create"

#### Check Configurations

This checks that the configuration of the created stanza and its repositories is valid and that WAL archiving works as expected

It archives a real WAL file by forcing the server to switch WAL segments with `pg_switch_wal()`

In [None]:
su postgres -c "pgbackrest --stanza=demo --log-level-console=info check"

#### Don't Wait For Checkpoint

By default pgBackRest waits for the next regularly scheduled `CHECKPOINT` on the PG server before starting a backup; for demo purposes we make it request an immediate `CHECKPOINT` for its own use

Even in production, in most cases, it's a good idea to turn this option on so that backups start on time. The forced checkpoint can hurt performance only on very busy servers, and even there the impact is limited because backups are typically scheduled about once a day

In [None]:
sed -i "s/^\[global\].*/[global]\nstart-fast=y/" $BACK_REST_CONFIG_FILE
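
The `sed` pattern used in these cells inserts a new line under `[global]` every time it runs, so re-running a cell leaves duplicate options in the file. One way around that is a small helper that replaces an existing option instead of adding it again. This is a sketch only: `set_global_option` and `helper_conf` are hypothetical names, not part of pgBackRest, and the helper assumes GNU `sed`, that the key appears in no other section, and that the value contains neither `|` nor `&`.

```shell
# Hypothetical helper (not part of pgBackRest): set KEY=VALUE under [global]
# in FILE, replacing an existing KEY line instead of adding a duplicate.
set_global_option() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # Key already present: overwrite its value in place.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # Key missing: insert it right below the [global] header.
    sed -i "s|^\[global\].*|[global]\n${key}=${value}|" "$file"
  fi
}

# Try it on a scratch file rather than the live configuration.
helper_conf=$(mktemp)
printf '[demo]\npg1-path=/var/lib/postgresql/16/demo\n\n[global]\nrepo1-path=/var/lib/pgbackrest\n' > "$helper_conf"

set_global_option "$helper_conf" start-fast y
set_global_option "$helper_conf" start-fast y   # second call changes nothing
set_global_option "$helper_conf" repo1-retention-full 2

cat "$helper_conf"
```

Calling the helper again with a different value, for example `set_global_option "$helper_conf" start-fast n`, rewrites the existing line rather than adding a second one.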

### Backups

#### Perform Backups

By default pgBackRest tries to perform an incremental backup, but since there is no full backup yet it falls back to a full backup

You can change the backup type by passing the `--type=` option to the backup command

In [None]:
# Full Backup
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=full backup"

In [None]:
# Incremental backup: the only prior backup is the full one, so it is based on that
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=incr backup"
# Add some data: the next incremental backup holds the usual baseline amount plus the new data
su postgres -c "psql -c 'select * into some_table from generate_series(1,100000)'"
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=incr backup"

In [None]:
# A differential backup is always based on the last full backup, so it uses the same full backup as before
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=diff backup"

In [None]:
# This incremental backup is based on the diff backup, which is now the most recent backup
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=incr backup"

#### Information About Backups

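- WAL archive min/max - range of archived WAL segments
- Type of backup
- When performed
- Sizes:
    - db size - full size of the DB
    - db backup size - amount of DB data this backup actually had to back up (equal to db size for a full backup)
    - backup size - size of this backup alone
    - set backup size - amount of data needed to restore from this backup, including the backups it references (for incremental backups this is calculated recursively)
- Backup reference list - the earlier backups this backup depends on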

In [None]:
su postgres -c "pgbackrest info"

#### Schedule Backups

In [None]:
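# Run as root: "crontab -u postgres" edits the postgres user's crontab
# (a bare "su postgres" opens an interactive shell, so the following lines would not run as postgres)
# Add weekly full backup on Sundays
(crontab -u postgres -l 2>/dev/null; echo "0 0 * * 0 pgbackrest --stanza=demo --type=full backup") | awk '!x[$0]++' | crontab -u postgres -
# Add weekly diff backup on Wednesdays
(crontab -u postgres -l 2>/dev/null; echo "0 0 * * 3 pgbackrest --stanza=demo --type=diff backup") | awk '!x[$0]++' | crontab -u postgres -
# Add incremental backup on the remaining days (Mon, Tue, Thu-Sat)
(crontab -u postgres -l 2>/dev/null; echo "0 0 * * 1-2,4-6 pgbackrest --stanza=demo --type=incr backup") | awk '!x[$0]++' | crontab -u postgres -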
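
The `awk '!x[$0]++'` filter in the cron cell prints each distinct line only the first time it appears, which is what makes re-running the cell safe. Here is a minimal sketch of the same idiom on a scratch file standing in for the crontab; `cron_file` and `add_cron_line` are hypothetical names used only for illustration.

```shell
# Scratch file standing in for the postgres user's crontab.
cron_file=$(mktemp)

# Hypothetical helper: append LINE to the scratch crontab, then drop repeats.
# x[$0]++ evaluates to 0 (false) the first time a line is seen, so !x[$0]++
# is true only for first occurrences and only those lines are printed.
add_cron_line() {
  { cat "$cron_file"; printf '%s\n' "$1"; } | awk '!x[$0]++' > "$cron_file.new"
  mv "$cron_file.new" "$cron_file"
}

add_cron_line "0 0 * * 0 pgbackrest --stanza=demo --type=full backup"
add_cron_line "0 0 * * 3 pgbackrest --stanza=demo --type=diff backup"
add_cron_line "0 0 * * 0 pgbackrest --stanza=demo --type=full backup"   # repeat, ignored

entry_count=$(wc -l < "$cron_file")
echo "entries: $entry_count"
cat "$cron_file"
```

Note that the filter compares whole lines, so changing the schedule of an existing job adds a second entry instead of replacing the first.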

### Recovery

#### Regular

Let's look at a case where the `pg_control` file, which contains crucial information such as the last REDO point (discussed in detail in the WAL lesson), has been removed

The DB can't start without it!

Let's rescue it with a restore

In [None]:
# Stop and remove file
pg_ctlcluster 16 demo stop
su postgres -c "rm /var/lib/postgresql/16/demo/global/pg_control"
# Try to start again
pg_ctlcluster 16 demo start
tail -n 20 /var/log/postgresql/postgresql-16-demo.log
# Damn...

In [None]:
# Let's recover
# First, remove all db files
su postgres -c "find /var/lib/postgresql/16/demo -mindepth 1 -delete"
ls /var/lib/postgresql/16/demo/ # Nothing
# Now let's restore
su postgres -c "pgbackrest --stanza=demo restore"
ls /var/lib/postgresql/16/demo/ # It's all here!
pg_ctlcluster 16 demo start # All good, the server just runs recovery on startup by replaying WAL
tail -n 20 /var/log/postgresql/postgresql-16-demo.log

#### Particular Point In Time