## Lesson 8. Full Backup - ARCHIVE Mode

### Theory: STREAM vs ARCHIVE Mode

In the previous lesson, we used STREAM mode. Now let's explore ARCHIVE mode.

| Feature | STREAM Mode | ARCHIVE Mode |
|---------|-------------|---------------|
| WAL source | Streamed during backup | Continuously archived |
| PITR support | No | Yes |
| Setup complexity | Simple | More complex |
| Backup self-contained | Yes | Depends on WAL archive |

**ARCHIVE mode enables Point-in-Time Recovery (PITR)**: you can restore to any moment between the end of the backup and the most recent WAL in the archive, as long as the archived WAL chain has no gaps.

### How WAL Archiving Works

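1. PostgreSQL records every change in the WAL (Write-Ahead Log), which is split into segment files (16 MB each by default).
2. When a segment is complete, PostgreSQL runs `archive_command`, which copies it to the backup catalog.
3. pg_probackup uses the archived WAL both to make an ARCHIVE-mode backup consistent and to replay changes during recovery.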
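The archiver's contract can be modelled in a few lines of Python. This is a simplified sketch, not PostgreSQL code: `run_archiver` and the callable standing in for `archive_command` are hypothetical, and `max_attempts` is an illustrative retry limit. It shows the rules that matter here: segments are archived oldest first, a segment counts as archived only when the command exits with status 0, and a failing segment is retried rather than skipped, so newer segments wait behind it.

```python
from collections import deque


def run_archiver(ready_segments, archive_command, max_attempts=3):
    """Simplified model of the PostgreSQL archiver loop (illustration only).

    ready_segments:  names of WAL segments that have a .ready status file
    archive_command: hypothetical callable(name) -> exit status, standing in
                     for the shell command configured in postgresql.conf
    Returns (done, pending): segments now marked .done, and those still waiting.
    """
    queue = deque(sorted(ready_segments))  # oldest segment first
    done = []
    while queue:
        name = queue[0]
        for _ in range(max_attempts):
            if archive_command(name) == 0:
                # Exit status 0: .ready becomes .done, segment may be recycled
                done.append(name)
                queue.popleft()
                break
        else:
            # Every attempt failed: the segment stays .ready in pg_wal and is
            # retried later, so newer segments wait behind it
            break
    return done, list(queue)
```

For example, if the command keeps failing for segment `...03`, segments `...01` and `...02` end up archived while `...03` and `...04` stay pending.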

### 1. Set up environment variables

In [None]:
import os
import time

os.environ['PG_CONFIG'] = '/usr/local/pgsql-17/bin/pg_config'
os.environ['PGPROBACKUPBIN'] = '/usr/local/pgsql-17/bin/pg_probackup'
os.environ['PGPROBACKUP_TMP_DIR'] = 'pg_probackup_demo'

### 2. Initialize pg_probackup2 environment

In [None]:
import testgres
from pg_probackup2.app import ProbackupApp
from pg_probackup2.init_helpers import Init
from pg_probackup2.storage.fs_backup import FSTestBackupDir


init_params = Init()
pg_node = testgres.NodeApp()

backup_dir = FSTestBackupDir(rel_path='rel_backup_dir', backup='backup_demo')
print(f"Backup catalog directory:\n{backup_dir.path}")

pb = ProbackupApp(
    pg_node=pg_node,
    pb_log_path=os.path.join(backup_dir.path, 'log'),
    backup_dir=backup_dir,
    probackup_path=init_params.probackup_path
)

### 3. Initialize backup catalog first

For ARCHIVE mode, initialize the catalog BEFORE configuring `archive_command`: `archive-push` writes WAL files into this catalog and fails if the catalog does not exist yet.

> pg_probackup init -B /mnt/backups
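After `init`, the catalog should contain `backups/` and `wal/` subdirectories, and `add-instance` later adds a directory named after the instance under each (the WAL archive path shown in step 10 follows this layout). The helper below is a hypothetical sanity check that only looks for those directories; it does not replace `pg_probackup validate`.

```python
import os


def catalog_status(backup_dir, instance=None):
    """Report which parts of the expected pg_probackup catalog layout exist.

    Assumed layout: `init` creates backups/ and wal/, and `add-instance`
    adds backups/<instance> and wal/<instance>. Only directories are checked.
    """
    status = {
        "backups": os.path.isdir(os.path.join(backup_dir, "backups")),
        "wal": os.path.isdir(os.path.join(backup_dir, "wal")),
    }
    if instance is not None:
        status["instance_backups"] = os.path.isdir(
            os.path.join(backup_dir, "backups", instance))
        status["instance_wal"] = os.path.isdir(
            os.path.join(backup_dir, "wal", instance))
    return status
```

For example, `catalog_status(backup_dir.path, 'main')` run after step 5 would be expected to report all four entries as `True`.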

In [None]:
pb.init()

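### 4. Create PostgreSQL cluster for ARCHIVE mode

Required settings for WAL archiving (this step sets `wal_level`; `archive_mode` and `archive_command` are added in step 6, once the instance is registered):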

```
wal_level = replica
archive_mode = on
archive_command = 'pg_probackup archive-push -B /mnt/backups --instance=main --wal-file-path=%p --wal-file-name=%f'
```
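These requirements can be checked mechanically before the server is started. A minimal sketch, assuming the settings are available as a dict of strings (for example parsed from `postgresql.conf` or collected with `SHOW`); `check_archiving_settings` is a hypothetical helper, and its `archive-push` checks only look for the options used in this lesson.

```python
def check_archiving_settings(settings):
    """Return a list of problems with WAL-archiving settings (empty if none).

    `settings` maps parameter names to string values. Missing keys fall back
    to PostgreSQL defaults (wal_level=replica, archive_mode=off, empty command).
    """
    problems = []
    if settings.get("wal_level", "replica") not in ("replica", "logical"):
        problems.append("wal_level must be 'replica' or 'logical' for archiving")
    if settings.get("archive_mode", "off") not in ("on", "always"):
        problems.append("archive_mode must be 'on' (or 'always')")
    cmd = settings.get("archive_command", "")
    if not cmd.strip():
        # With archiving enabled but no command, WAL piles up in pg_wal
        problems.append("archive_command is empty: WAL accumulates but is not archived")
    elif "archive-push" in cmd:
        # pg_probackup needs the target instance and the WAL file name
        for token in ("--instance", "%f"):
            if token not in cmd:
                problems.append(f"archive-push command is missing {token}")
    return problems
```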

> initdb -D <data_directory>

In [None]:
node = pg_node.make_simple(
    base_dir=os.path.join(init_params.tmp_path, 'pg_node'),
    pg_options={
        "unix_socket_directories": "/tmp",
        "wal_level": "replica",
        "max_wal_senders": "2"
    }
)
print(f"PostgreSQL data directory: {node.data_dir}")
node.status()

### 5. Add instance to backup catalog

Add the instance BEFORE starting the node, so that `archive-push` can find it as soon as the server starts archiving WAL.

> pg_probackup add-instance -B /mnt/backups -D /var/lib/pgpro/std-17/data --instance=main

In [None]:
pb.add_instance(instance='main', node=node)

### 6. Configure archive_command

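The `archive_command` tells PostgreSQL how to archive WAL segments: it runs once for each completed segment, and only an exit status of 0 counts as success. Because `archive_mode` can only be set at server start, both settings are written to `postgresql.conf` before the node is started.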

**pg_probackup archive-push parameters:**
- `-B` - path to backup catalog
- `--instance` - instance name
- `--wal-file-path=%p` - path of the WAL file to archive (PostgreSQL replaces `%p` with the path relative to the data directory)
- `--wal-file-name=%f` - WAL file name (PostgreSQL replaces `%f` with the file name only)
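To see what actually runs for each segment, here is a sketch of the placeholder substitution: `%p` becomes the WAL file path relative to the data directory, `%f` the bare file name, and `%%` a literal `%`. `expand_archive_command` is a hypothetical illustration of that rule, not a PostgreSQL function; unrecognized `%` sequences are left unchanged.

```python
def expand_archive_command(template, wal_path, wal_name):
    """Expand %p, %f and %% in an archive_command template (illustration)."""
    replacements = {"p": wal_path, "f": wal_name, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        # A recognized two-character escape is replaced as a unit
        if (template[i] == "%" and i + 1 < len(template)
                and template[i + 1] in replacements):
            out.append(replacements[template[i + 1]])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)
```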

> ALTER SYSTEM SET archive_mode = 'on';
> ALTER SYSTEM SET archive_command = 'pg_probackup archive-push ...';

In [None]:
# Build archive_command using pg_probackup path
archive_cmd = (
    f"{init_params.probackup_path} archive-push "
    f"-B {backup_dir.path} "
    f"--instance=main "
    f"--wal-file-path=%p "
    f"--wal-file-name=%f"
)

# Enable archive mode
node.append_conf('postgresql.conf', "archive_mode = on")
node.append_conf('postgresql.conf', f"archive_command = '{archive_cmd}'")

print(f"archive_command configured:\n{archive_cmd}")

### 7. Start PostgreSQL and verify archive settings

> pg_ctl start -D <data_directory>
> psql -c "SHOW archive_mode;"

In [None]:
node.slow_start()
print(f"PostgreSQL port: {node.port}")

# Verify archive settings
archive_mode = node.execute('postgres', 'SHOW archive_mode;')[0][0]
wal_level = node.execute('postgres', 'SHOW wal_level;')[0][0]

print("\nArchive configuration:")
print(f"  archive_mode: {archive_mode}")
print(f"  wal_level: {wal_level}")

node.status()

### 8. Create sample database with data

In [None]:
node.execute('postgres', 'CREATE DATABASE inventory;')

node.execute('inventory', '''
CREATE TABLE warehouses (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT,
    capacity INT
);

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    sku TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    warehouse_id INT REFERENCES warehouses(id),
    quantity INT DEFAULT 0
);

INSERT INTO warehouses (name, location, capacity) VALUES
    ('Main Warehouse', 'New York', 10000),
    ('West Coast', 'Los Angeles', 8000),
    ('Distribution Center', 'Chicago', 5000);

INSERT INTO items (sku, name, warehouse_id, quantity) VALUES
    ('SKU-001', 'Widget A', 1, 500),
    ('SKU-002', 'Widget B', 1, 300),
    ('SKU-003', 'Gadget X', 2, 150);
''')

result = node.execute('inventory', 'SELECT COUNT(*) FROM warehouses;')
print(f"Created {result[0][0]} warehouses")
result = node.execute('inventory', 'SELECT COUNT(*) FROM items;')
print(f"Created {result[0][0]} items")

### 9. Force WAL switch and verify archiving

`pg_switch_wal()` closes the current WAL segment and starts a new one, so the closed segment becomes ready for archiving and `archive_command` runs for it.

> SELECT pg_switch_wal();
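The cell below waits a fixed two seconds for the archiver, which can be too short on a slow machine. A polling loop is a sturdier alternative. `wait_until` is a hypothetical helper, not part of testgres or pg_probackup2:

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call `predicate` until it returns a truthy value or `timeout` expires.

    Returns the truthy value, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

For example, `wait_until(lambda: node.execute('postgres', 'SELECT last_archived_wal IS NOT NULL FROM pg_stat_archiver;')[0][0], timeout=10)` could stand in for `time.sleep(2)`.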

In [None]:
# Force WAL switch
node.execute('postgres', 'SELECT pg_switch_wal();')
print("Forced WAL switch")

# Wait for archive_command to execute
time.sleep(2)

# Check archive status
result = node.execute('postgres', '''
    SELECT archived_count, failed_count, last_archived_wal
    FROM pg_stat_archiver;
''')
print("\nArchive statistics:")
print(f"  Archived count: {result[0][0]}")
print(f"  Failed count: {result[0][1]}")
print(f"  Last archived: {result[0][2]}")

### 10. View WAL archive directory

Archived WAL files are stored in the backup catalog, under `wal/<instance_name>`:

```
/mnt/backups/wal/main/
├── 000000010000000000000001
├── 000000010000000000000002
└── ...
```
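Each segment name encodes the timeline, the high 32 bits of the LSN, and the segment number within that 4 GB range, each as 8 hex digits. The sketch below assumes the default 16 MB `wal_segment_size` and uncompressed files in the archive (compressed segments may carry a suffix such as `.gz` and are skipped here). `lsn_to_segment_name` maps an LSN such as `0/3000060` to the segment that contains it, and `find_gaps` lists segments missing from an archive listing, which matters because recovery cannot replay past a missing segment. Both are hypothetical helpers.

```python
SEG_SIZE = 16 * 1024 * 1024  # assumed default wal_segment_size


def lsn_to_segment_name(lsn, timeline=1, seg_size=SEG_SIZE):
    """Name of the WAL segment containing LSN 'X/Y' (two hex halves)."""
    hi, lo = lsn.split("/")
    pos = (int(hi, 16) << 32) | int(lo, 16)
    segno = pos // seg_size
    per_id = 0x100000000 // seg_size  # segments per 4 GB range (256 for 16 MB)
    return f"{timeline:08X}{segno // per_id:08X}{segno % per_id:08X}"


def find_gaps(names, seg_size=SEG_SIZE):
    """List segments missing between the first and last archived file of each
    timeline. Names that are not 24 uppercase hex digits are ignored."""
    per_id = 0x100000000 // seg_size
    by_tli = {}
    for name in names:
        if len(name) != 24 or any(c not in "0123456789ABCDEF" for c in name):
            continue  # e.g. .history or compressed files
        tli, log, seg = int(name[:8], 16), int(name[8:16], 16), int(name[16:], 16)
        by_tli.setdefault(tli, set()).add(log * per_id + seg)
    missing = []
    for tli, segnos in sorted(by_tli.items()):
        for segno in range(min(segnos), max(segnos) + 1):
            if segno not in segnos:
                missing.append(f"{tli:08X}{segno // per_id:08X}{segno % per_id:08X}")
    return missing
```

Applied to the `wal_files` list from the cell below, `find_gaps(wal_files)` would be expected to return an empty list for a healthy archive.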

In [None]:
wal_archive_dir = os.path.join(backup_dir.path, 'wal', 'main')
print(f"WAL archive directory: {wal_archive_dir}\n")

if os.path.exists(wal_archive_dir):
    wal_files = sorted(os.listdir(wal_archive_dir))
    print(f"Archived WAL files: {len(wal_files)}")
    for f in wal_files[:5]:
        print(f"  - {f}")
else:
    print("WAL archive directory not yet created (may take a moment)")

### 11. Create FULL backup in ARCHIVE mode

```bash
pg_probackup backup -B /mnt/backups --instance=main -b FULL --archive
```

In [None]:
backup_id = pb.backup_node(
    instance='main',
    node=node,
    backup_type='full',
    options=['--archive', f'--pguser={init_params.username}']
)
print(f"Backup ID: {backup_id}")

### 12. Add more data (for PITR demonstration)

In [None]:
for i in range(3):
    node.execute('inventory', f'''
        INSERT INTO items (sku, name, warehouse_id, quantity)
        VALUES ('SKU-{100+i}', 'New Item {i+1}', 1, {(i+1)*100});
    ''')
    time.sleep(1)
    print(f"Added item {i+1}")

result = node.execute('inventory', 'SELECT COUNT(*) FROM items;')
print(f"\nTotal items now: {result[0][0]}")

### 13. View backup and archive information

> pg_probackup show -B /mnt/backups --instance=main

In [None]:
print(pb.show(as_text=True, as_json=False))

### 14. Validate backup

> pg_probackup validate -B /mnt/backups --instance=main

In [None]:
print(pb.validate(instance='main'))

### 15. Theory: Point-in-Time Recovery (PITR)

With an ARCHIVE-mode backup and a continuous WAL archive, you can restore to any point in time after the end of the backup, up to the last archived WAL record:

```bash
pg_probackup restore -B /mnt/backups --instance=main \
    -D /path/to/new/pgdata \
    --recovery-target-time="2025-01-15 14:30:00"
```

**Recovery target options:**

| Option | Description |
|--------|-------------|
| `--recovery-target-time` | Restore to a specific timestamp |
| `--recovery-target-xid` | Restore to a specific transaction ID |
| `--recovery-target-lsn` | Restore to a specific LSN |
| `--recovery-target-name` | Restore to a named restore point (created with `pg_create_restore_point()`) |
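When scripting a restore, building the command as an argument list means a timestamp containing a space needs no shell quoting. A minimal sketch: `build_restore_argv` and `RECOVERY_TARGETS` are hypothetical names, and only the options listed in the table are used.

```python
RECOVERY_TARGETS = {
    "time": "--recovery-target-time",
    "xid": "--recovery-target-xid",
    "lsn": "--recovery-target-lsn",
    "name": "--recovery-target-name",
}


def build_restore_argv(backup_dir, instance, pgdata, target_kind, target_value,
                       binary="pg_probackup"):
    """Build a `pg_probackup restore` argument list with one recovery target."""
    if target_kind not in RECOVERY_TARGETS:
        raise ValueError(f"unknown recovery target kind: {target_kind!r}")
    return [
        binary, "restore",
        "-B", backup_dir,
        f"--instance={instance}",
        "-D", pgdata,
        f"{RECOVERY_TARGETS[target_kind]}={target_value}",
    ]
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`.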

### Cleanup

In [None]:
node.stop()

import shutil
shutil.rmtree(os.environ['PGPROBACKUP_TMP_DIR'])