# Backup

It's time to actually do the backups!

## Types Of Backups

### Full Backup

- A complete backup of the entire PostgreSQL cluster.
- Includes all database files, configuration files, and WAL (Write-Ahead Logging) files.
- Serves as the base for all other backup types. It is the largest backup and takes the longest to complete.

### Differential Backup

- A backup of all the changes made since the last full backup.
- Smaller and faster than a full backup but larger and slower than an incremental backup.
- Requires a preceding full backup to be useful for recovery.

### Incremental Backup

- A backup of all the changes made since the last full, differential, or incremental backup.
- Smallest and fastest, but a complete recovery requires every previous backup in the chain, back to the `full` one.
- Also known as a "delta" backup.
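
To make the dependency rules above concrete, here is a small sketch that computes which backups are needed to restore a given one. `restore_chain` is a hypothetical helper written for this lesson, not a pgBackRest command; it only encodes the rules from the lists above (a diff depends on the last full, an incr depends on whatever came right before it).

```BASH
# Hypothetical helper: print the positions (1-based, oldest first) of the
# backups needed to restore the backup at position $1.
# Usage: restore_chain <target-position> <type> [<type> ...]
restore_chain() {
    local target=$1 pos=0 chain="" last_full="" type
    shift
    for type in "$@"; do
        pos=$((pos + 1))
        case "$type" in
            full) last_full=$pos; chain="$pos" ;;   # a full backup starts a new chain
            diff) chain="$last_full $pos" ;;        # a diff needs only the last full
            incr) chain="$chain $pos" ;;            # an incr needs everything before it
        esac
        if [ "$pos" -eq "$target" ]; then
            break
        fi
    done
    echo "$chain"
}

# Backup history: full, incr, incr, diff, incr
restore_chain 3 full incr incr diff incr   # prints: 1 2 3
restore_chain 5 full incr incr diff incr   # prints: 1 4 5
```

Note how the differential backup at position 4 shortens the chain for everything taken after it.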

## Perform Backups

By default, pgBackRest tries to perform an incremental backup, but since there is no full backup yet, it falls back to a full backup.

You can change the backup type by passing the `--type=` option to the `backup` command.

```BASH
# Full Backup
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=full backup"
```

```BASH
# Incremental backup: the only prior backup is the full one, so it is based on that
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=incr backup"
# After adding data, the next incremental takes its usual baseline size plus the new data
su postgres -c "psql -c 'select * into some_table from generate_series(1,100000)'"
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=incr backup"
```

```BASH
# A diff backup can only be based on a full backup, so it uses the same full backup as above
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=diff backup"
```

```BASH
# Incremental uses the diff backup
su postgres -c "pgbackrest --stanza=demo --log-level-console=info --type=incr backup"
```

## Information About Backups

```BASH
su postgres -c "pgbackrest info"
```

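The output includes:

- WAL archive min/max - the range of archived WAL segments
- Type of backup (full, diff, or incr)
- When it was performed
- Sizes:
    - db size - the actual size of the DB
    - db backup size - the amount of data this backup had to copy
    - backup size - the size of this backup on its own
    - set backup size - the amount of data needed for a valid restore of this backup (for an incremental it is calculated recursively along the chain)
- Backup reference list - the backups this backup depends on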

## Schedule Backups

We can schedule backups with `cron`, the widely used Linux scheduler.

```BASH
# Switch to the postgres user, whose crontab we are editing
su postgres
# Add a weekly full backup on Sundays
(crontab -l; echo "0 0 * * 0 pgbackrest --stanza=demo --type=full backup") | awk '!x[$0]++' | crontab -
# Add a weekly diff backup on Wednesdays
(crontab -l; echo "0 0 * * 3 pgbackrest --stanza=demo --type=diff backup") | awk '!x[$0]++' | crontab -
# Add an incremental backup on the remaining days (Mon, Tue, Thu, Fri, Sat)
(crontab -l; echo "0 0 * * 1-2,4-6 pgbackrest --stanza=demo --type=incr backup") | awk '!x[$0]++' | crontab -
exit
```
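
As an alternative to three separate cron entries, the same weekly plan can live in one wrapper script that picks the backup type from the day of the week. This is only a sketch: `backup_type_for_day` is a hypothetical helper, and the script prints the command instead of running it so it is safe to try anywhere.

```BASH
# Hypothetical helper: map a cron-style day of week (0 = Sunday ... 6 = Saturday)
# to the backup type used in the schedule above.
backup_type_for_day() {
    case "$1" in
        0) echo full ;;            # Sunday: weekly full backup
        3) echo diff ;;            # Wednesday: weekly diff backup
        1|2|4|5|6) echo incr ;;    # remaining days: incremental backup
        *) echo "invalid day of week: $1" >&2; return 1 ;;
    esac
}

today=$(date +%w)   # %w prints the day of week as 0..6, Sunday first
# Dry run: print the command that a nightly cron entry would execute
echo "pgbackrest --stanza=demo --type=$(backup_type_for_day "$today") backup"
```

With a script like this, a single `0 0 * * *` cron entry would be enough, and the schedule logic stays in one place.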

### Optimizations

#### File Bundling

**Configuration**:
- Name: `repo-bundle`
- Default: `n`
- Allowed: `n / y`

Postgres's file-centric approach (for example, a separate file for every fork of every relation) creates a lot of DB files on disk. For Postgres it's easier to manage this way, but for a backup it's unnecessary overhead: it costs disk space and hurts performance, especially on object stores such as cloud object storage (AWS S3, Azure Blobs, ...).

To avoid that, pgBackRest can bundle the contents of many files into larger files and record in the manifest where each one lives, so they can be restored correctly.

Advantages:
- Fewer files
- Less storage consumed
- Quicker copy on backup, especially with object stores
- Empty files are not copied, just recorded in the manifest

Disadvantages:
- A backup can't be stopped and resumed
- A single file can't be taken directly from the backup

#### Block Incremental

**Configuration**:
- Name: `repo-block`
- Default: `n`
- Allowed: `n / y`

Incremental backups can go below the file level: only the delta between the backed-up file and its current state is stored.

**Note**: This option depends on the `repo-bundle` option.
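
A minimal sketch of enabling both options together. In the configuration file these options carry the repository index, so for the first repository they are written as `repo1-bundle` and `repo1-block`. The snippet writes to a scratch file created with `mktemp` (an assumption made to keep the example harmless); in a real setup the lines would go into the `[global]` section of your existing pgBackRest configuration file.

```BASH
# Write an example [global] section to a scratch file (not the real config)
conf_file=$(mktemp)
cat > "$conf_file" <<'EOF'
[global]
repo1-bundle=y
repo1-block=y
EOF

# Show the result
cat "$conf_file"
```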

#### Parallel Backup / Recovery

**Configuration**:
- Name: `process-max`
- Default: `1`
- Allowed: `1-999`

Use multiple processes to compress and transfer files during `backup` or `restore`.

On `backup`, avoid using too many processes, since the DB needs the CPU resources as well.

On `restore`, you can potentially use most of the server's resources, because nothing competes for them and speed is key for low downtime.
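
One way to turn this advice into numbers is sketched below. The ratios (a quarter of the cores for backup, all but one for restore) are arbitrary assumptions chosen for illustration, not pgBackRest recommendations, and `suggest_process_max` is a hypothetical helper. The script only prints the resulting commands.

```BASH
# Hypothetical heuristic: derive a process-max value from the core count.
# Usage: suggest_process_max <backup|restore> <cores>
suggest_process_max() {
    local action=$1 cores=$2 n
    case "$action" in
        backup)  n=$((cores / 4)) ;;   # leave most of the CPU to the running DB
        restore) n=$((cores - 1)) ;;   # the DB is down, so use almost everything
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    # Clamp to the allowed range 1-999
    if [ "$n" -lt 1 ]; then n=1; fi
    if [ "$n" -gt 999 ]; then n=999; fi
    echo "$n"
}

cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
# Dry run: print the commands instead of executing them
echo "pgbackrest --stanza=demo --process-max=$(suggest_process_max backup "$cores") backup"
echo "pgbackrest --stanza=demo --process-max=$(suggest_process_max restore "$cores") restore"
```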

### Recovery

#### Regular

Let's look at a case where the `pg_control` file, which contains crucial information about the last REDO point (discussed in detail in the WAL lesson), has been removed.

The DB can't start without it!

Let's save it with a restore.

```BASH
# Stop the cluster and remove the file
pg_ctlcluster 16 demo stop
su postgres -c "rm /var/lib/postgresql/16/demo/global/pg_control"
# Try to start again
pg_ctlcluster 16 demo start
tail -n 20 /var/log/postgresql/postgresql-16-demo.log
# Damn...
```

```BASH
# Let's recover
# First, remove all DB files
su postgres -c "find /var/lib/postgresql/16/demo -mindepth 1 -delete"
ls /var/lib/postgresql/16/demo/ # Nothing
# Now let's restore
su postgres -c "pgbackrest --stanza=demo restore"
ls /var/lib/postgresql/16/demo/ # It's all here!
pg_ctlcluster 16 demo start # All good, the server just runs recovery on startup based on pg_wal
tail -n 20 /var/log/postgresql/postgresql-16-demo.log
```

#### Delta Recovery

**Configuration**:
- Config file: `delta=y`
- Run time: `--delta`

It's possible, and strongly advised, to avoid deleting all the DB files before a restore. With this option, each existing file is compared by hash against its copy in the backup, and files that are equal are skipped.
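
The idea behind a delta restore can be illustrated with plain shell tools. The sketch below is a simplified illustration only, not pgBackRest's actual implementation: `delta_copy` is a hypothetical helper that copies a file from a "backup" directory only when the target is missing or its SHA-1 hash differs, and it leaves out everything else a real restore does.

```BASH
# Hypothetical helper: copy only files that are missing or differ by hash.
# Prints the number of files that had to be copied.
delta_copy() {
    local src=$1 dst=$2 f name copied=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # Skip the file when an identical copy is already in place
        if [ -f "$dst/$name" ] &&
           [ "$(sha1sum < "$f")" = "$(sha1sum < "$dst/$name")" ]; then
            continue
        fi
        cp "$f" "$dst/$name"
        copied=$((copied + 1))
    done
    echo "$copied"
}

# Demo with scratch directories standing in for the backup and the data dir
backup_dir=$(mktemp -d)
data_dir=$(mktemp -d)
echo "same" > "$backup_dir/a";  echo "same" > "$data_dir/a"      # identical, skipped
echo "good" > "$backup_dir/b";  echo "corrupt" > "$data_dir/b"   # differs, copied
echo "only in backup" > "$backup_dir/c"                          # missing, copied
delta_copy "$backup_dir" "$data_dir"   # prints: 2
```

With pgBackRest itself, the equivalent is simply adding `--delta` to the restore command, for example `pgbackrest --stanza=demo --delta restore`.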

#### Selected DBs

Recover only selected DBs instead of all of them.

Use Cases:
- Recover a specific DB on a different machine
- Performance - incremental recovery
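
pgBackRest selects databases for restore with the `--db-include` option, which can be repeated once per database. The sketch below builds such an argument list; `db_include_args` is a hypothetical helper and the database names `sales` and `reporting` are made up for the example. The command is printed, not executed.

```BASH
# Hypothetical helper: turn a list of DB names into repeated --db-include options
db_include_args() {
    local db out=""
    for db in "$@"; do
        out="$out --db-include=$db"
    done
    echo "${out# }"   # drop the leading space
}

# Dry run: print the restore command for two example databases
echo "pgbackrest --stanza=demo --delta $(db_include_args sales reporting) restore"
```

After such a restore, the databases that were left out are not usable, so check the pgBackRest user guide for the cleanup steps before relying on this in practice.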

#### Particular point in time

Let's demonstrate a common PITR (point-in-time recovery) scenario: undoing a human error such as dropping an important table.

```BASH
# Log MOD (DDL + data-modifying) statements to be able to determine when the DROP happened
su postgres
sed -i "s/#*log_statement.*/log_statement = 'mod'/" "$DB_CONF_FILE"
pg_ctlcluster 16 demo reload
```

```BASH
# Create the table
pg_ctlcluster 16 demo reload
psql << EOM
    BEGIN;
    DROP TABLE IF EXISTS important_table;
    CREATE TABLE important_table (message TEXT);
    INSERT INTO important_table VALUES ('Important Data');
    COMMIT;
    SELECT * FROM important_table;
EOM
```

```BASH
# Drop the table
psql -c "DROP TABLE important_table;"
psql -c "SELECT * FROM important_table;" # error!
```

```BASH
# Recovery
## Find the time of the DROP query in the server log
DROP_QUERY_TS=$(grep "DROP TABLE important_table;" /var/log/postgresql/postgresql-16-demo.log | tail -1 | cut -d" " -f1,2)
echo "Drop query ts is: $DROP_QUERY_TS"
## Pick a recovery target 2 seconds before the DROP
RECOVERY_TS=$(date -d "$(date -d "$DROP_QUERY_TS") - 2 seconds" +"%Y-%m-%d %H:%M:%S")
echo "Recovery ts, right before drop happened is: $RECOVERY_TS"

## Stop the server
pg_ctlcluster 16 demo stop

## Restore until the timestamp
pgbackrest --stanza=demo --delta \
    --type=time "--target=$RECOVERY_TS" \
    --target-action=promote \
    --log-level-console=info restore
# --type selects the kind of recovery target, e.g. time or lsn
# --target-action=promote ends recovery and opens the server for writes once the target is reached

## Start the server
pg_ctlcluster 16 demo start
psql -c "SELECT * FROM important_table;" # We are saved!

## Check out the server log
cat /var/log/postgresql/postgresql-16-demo.log
```
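
The `RECOVERY_TS` computation in the recovery step relies on nested `date -d` calls, which can be hard to read. A possible alternative, sketched below, goes through epoch seconds instead. It assumes GNU `date` (as found on most Linux systems), and `ts_minus_seconds` is a hypothetical helper name.

```BASH
# Hypothetical helper: subtract N seconds from a "YYYY-MM-DD HH:MM:SS[.mmm]" timestamp.
# Assumes GNU date; fractional seconds in the input are dropped.
ts_minus_seconds() {
    local ts=$1 secs=$2 epoch
    epoch=$(date -d "$ts" +%s) || return 1
    date -d "@$((epoch - secs))" +"%Y-%m-%d %H:%M:%S"
}

# Example with a timestamp in the format found in the server log
ts_minus_seconds "2024-05-01 12:00:01.250" 2   # prints: 2024-05-01 11:59:59
```

In the recovery step this could be used as `RECOVERY_TS=$(ts_minus_seconds "$DROP_QUERY_TS" 2)`.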