# Chapter 33: Streaming Replication (Physical)

Streaming replication provides high availability and read scaling by maintaining one or more standby servers that continuously apply changes from the primary. This chapter covers the architecture, configuration, and operational procedures for robust physical replication, emphasizing the critical tradeoffs between consistency, availability, and performance that define production replication topologies.

---

## 33.1 Replication Architecture Fundamentals

### 33.1.1 Primary-Standby Concepts

PostgreSQL physical replication operates on a single-primary, multi-standby model where the primary server streams Write-Ahead Log (WAL) records to standby servers.

**Core Components**:
- **Primary**: The read-write source of truth; generates WAL
- **Standby**: Read-only replica; applies WAL via startup process
- **WAL Sender**: Primary process that transmits WAL to standbys
- **WAL Receiver**: Standby process that receives and writes WAL to disk
- **Startup Process**: Standby process that replays WAL into the database

**Replication Flow**:
```
Primary: INSERT → WAL Record → WAL Sender → Network → WAL Receiver → pg_wal/ → Startup Process → Data Files
```

### 33.1.2 Hot Standby vs Warm Standby

**Hot Standby** (Production Standard):
```ini
# postgresql.conf on standby
hot_standby = on
```
- Accepts read-only queries while applying replication
- Requires `wal_level = replica` or higher on primary
- May see replication lag (slight delay between primary commit and standby visibility)

**Warm Standby**:
```ini
hot_standby = off  # Reject all connections during recovery (default is on since PG10)
```
- Does not accept connections during recovery
- Used for disaster recovery only, not read scaling
- Slightly faster WAL application (no query conflict resolution)

### 33.1.3 WAL Shipping vs Streaming

**WAL Shipping** (Archive-based):
- Standby restores from `restore_command` (files copied from archive)
- Delayed by archive interval (typically minutes)
- Used as fallback when streaming disconnects

**Streaming Replication** (Real-time):
- Direct TCP connection between primary and standby
- Synchronous or asynchronous
- Near-real-time lag (milliseconds to seconds)

**Hybrid Mode** (Industry Standard):
```ini
# Standby configuration
restore_command = 'cp /archive/%f %p'  # Fallback
primary_conninfo = 'host=primary.internal port=5432 user=replicator ...'  # Streaming
```
- Uses streaming when connected
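- Falls back to WAL shipping if the streaming connection is interrupted
- Lets the standby catch up from the archive after a temporary network partition instead of needing a new base backup (if archiving is configured); commits not yet archived can still be lost if the primary fails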

---

## 33.2 Primary Server Configuration

### 33.2.1 WAL Level and Replication Parameters

```ini
# postgresql.conf on PRIMARY

# 1. WAL Generation (requires restart)
wal_level = replica          # Minimum for physical replication
                             # 'replica' = supports archiving and hot standby
                             # 'logical' = adds logical decoding (more WAL volume)

# 2. Connection Slots
max_wal_senders = 10         # Maximum concurrent replication connections
                             # Count: 1 per standby + 2 per pg_basebackup using -X stream + headroom

max_replication_slots = 10   # Maximum replication slots; size for the slots you plan to create
                             # (independent of max_wal_senders)

# 3. WAL Retention
wal_keep_size = 1GB          # Minimum WAL to retain for streaming connections (PG13+)
                             # Prevents deletion of WAL still needed by lagging standby
                             # Previously wal_keep_segments in PG12 and earlier

# 4. Archive Mode (optional but recommended for PITR and as safety net)
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
                             # Or use wal-g, pgbackrest, etc.

# 5. Checkpointing (balance between recovery time and I/O)
checkpoint_timeout = 10min
max_wal_size = 4GB
checkpoint_completion_target = 0.9
```
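A quick preflight check can catch mis-sized settings before a standby is attached. The sketch below is illustrative only: `parse_conf` and `check_primary` are hypothetical helper names, the parser is deliberately naive (it ignores `include` directives and `#` characters inside quoted values), and the defaults it falls back to (`wal_level = replica`, `max_wal_senders = 10`) are assumed to be the PostgreSQL 10+ defaults.

```python
import re

def parse_conf(text):
    """Parse simple `name = value` lines from postgresql.conf-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        if not line:
            continue
        match = re.match(r"^(\w+)\s*=?\s*(.*)$", line)
        if match:
            settings[match.group(1).lower()] = match.group(2).strip().strip("'")
    return settings

def check_primary(settings, standbys, basebackups=1, headroom=2):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    wal_level = settings.get("wal_level", "replica")
    if wal_level not in ("replica", "logical"):
        problems.append(
            f"wal_level is '{wal_level}'; physical replication needs 'replica' or 'logical'"
        )
    # Assumption: each pg_basebackup streams WAL (-X stream) and uses 2 connections.
    needed = standbys + 2 * basebackups + headroom
    senders = int(settings.get("max_wal_senders", "10"))
    if senders < needed:
        problems.append(f"max_wal_senders is {senders}; estimated need is {needed}")
    return problems

sample = """
wal_level = minimal      # too low for replication
max_wal_senders = 3
"""
for problem in check_primary(parse_conf(sample), standbys=2):
    print(problem)
```

The sizing rule (standbys, plus base backups, plus headroom) is a rule of thumb rather than a server requirement, so adjust `headroom` to match how many ad hoc replication connections your tooling opens.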

### 33.2.2 Replication User Creation

**Security Principle**: Dedicated replication user with minimal privileges.

```sql
-- On PRIMARY
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'cryptographically_random_string';

-- Do NOT grant superuser or createdb
-- The REPLICATION attribute allows:
-- - Streaming replication connections
-- - pg_basebackup
-- - Creating and dropping replication slots
-- It is still highly privileged: a base backup contains every database in the cluster

-- Optional: Connection limit to prevent resource exhaustion
ALTER USER replicator WITH CONNECTION LIMIT 5;
```

**pg_hba.conf Entries**:
```conf
# Replication connections (hostssl mandatory for production)
hostssl replication replicator 10.0.2.0/24 scram-sha-256

# Specific standby IPs (more secure)
hostssl replication replicator 10.0.2.10/32 scram-sha-256  # standby-1
hostssl replication replicator 10.0.2.11/32 scram-sha-256  # standby-2

# Never allow replication from 0.0.0.0/0 (wildcards)
```
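The rules above (TLS only, strong authentication, no wildcard addresses) are easy to lint automatically. The following is a minimal sketch, not a full `pg_hba.conf` parser: `audit_hba_line` is a hypothetical helper, it only understands the `TYPE DATABASE USER ADDRESS METHOD` layout with CIDR addresses, and treating `md5` as weak is this sketch's own policy choice.

```python
import ipaddress

WEAK_METHODS = {"trust", "password", "md5"}

def audit_hba_line(line):
    """Return warnings for one pg_hba.conf line; non-replication lines yield []."""
    fields = line.split("#", 1)[0].split()
    # Expected layout: TYPE DATABASE USER ADDRESS METHOD (CIDR form only here)
    if len(fields) < 5 or fields[1] != "replication":
        return []
    conn_type, _database, _user, address, method = fields[:5]
    warnings = []
    if conn_type != "hostssl":
        warnings.append("connection type is not hostssl")
    if method in WEAK_METHODS:
        warnings.append(f"weak authentication method: {method}")
    if address == "all":
        warnings.append("address matches every client")
    else:
        try:
            if ipaddress.ip_network(address, strict=False).prefixlen == 0:
                warnings.append("address matches every client")
        except ValueError:
            pass  # host names and keywords such as samenet are not judged here
    return warnings

for entry in (
    "hostssl replication replicator 10.0.2.10/32 scram-sha-256  # standby-1",
    "host replication replicator 0.0.0.0/0 md5",
):
    print(entry, "->", audit_hba_line(entry))
```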

---

## 33.3 Standby Server Setup

### 33.3.1 Initial Base Backup

The standby begins as a byte-for-byte copy of the primary.

```bash
# On STANDBY server as postgres user

# 1. Create data directory
mkdir -p /var/lib/postgresql/16/main
chown postgres:postgres /var/lib/postgresql/16/main
chmod 700 /var/lib/postgresql/16/main

# 2. Execute pg_basebackup from standby machine
# -Fp: plain format (files as-is); -Xs: stream WAL during the backup (essential)
# -P: progress bar; -v: verbose; -W: force password prompt (or use .pgpass)
# --checkpoint=fast: request an immediate checkpoint instead of waiting
pg_basebackup \
    -h primary.internal \
    -p 5432 \
    -U replicator \
    -D /var/lib/postgresql/16/main \
    -Fp \
    -Xs \
    -P \
    -v \
    -W \
    --checkpoint=fast

# 3. Verify backup_label exists (proves base backup taken)
cat /var/lib/postgresql/16/main/backup_label
# OUTPUT: START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
#         CHECKPOINT LOCATION: 0/2000060
#         BACKUP METHOD: streamed
#         BACKUP FROM: primary
#         START TIME: 2024-10-02 14:30:00 GMT
```
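When base backups are scripted, the same check can be automated by parsing `backup_label` rather than reading it by eye. This is a small sketch that assumes the `KEY: value` layout shown in the sample output above; `parse_backup_label` and the two derived keys (`start_lsn`, `start_wal_file`) are names invented for the example.

```python
import re

START_RE = re.compile(r"^([0-9A-F]+/[0-9A-F]+) \(file ([0-9A-F]{24})\)$")

def parse_backup_label(text):
    """Return backup_label fields as a dict, plus start_lsn and start_wal_file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    match = START_RE.match(fields.get("START WAL LOCATION", ""))
    if match is None:
        raise ValueError("START WAL LOCATION line missing or malformed")
    fields["start_lsn"], fields["start_wal_file"] = match.groups()
    return fields

label = """START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2024-10-02 14:30:00 GMT
"""
info = parse_backup_label(label)
print(info["start_lsn"], info["start_wal_file"], info["BACKUP METHOD"])
```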

### 33.3.2 Standby Configuration

```ini
# postgresql.conf on STANDBY

# Connection to primary
primary_conninfo = 'host=primary.internal port=5432 user=replicator password=secret sslmode=require application_name=standby_1'

# Hot standby for read queries
hot_standby = on

# Recovery settings
restore_command = 'cp /archive/%f %p'  # Fallback if streaming lags

# Optional: Delayed replication (disaster recovery protection against operator error)
recovery_min_apply_delay = 5min        # Apply WAL 5 minutes after primary
                                       # Gives operators a window to stop replay before a mistaken DELETE/DROP reaches this standby
```

**Standby Signal File** (PostgreSQL 12+):
```bash
# Create file to indicate this is a standby (not a crashed primary)
touch /var/lib/postgresql/16/main/standby.signal

# PostgreSQL 11 and earlier used recovery.conf instead; PostgreSQL 12+ will not start if that file is present
```

### 33.3.3 Read-Only Workload Tuning

Standby servers benefit from different tuning than primaries.

```ini
# postgresql.conf on STANDBY (read replica)

# More aggressive query planning for reporting workloads
random_page_cost = 1.1         # Assume SSD/NVMe, encourage index usage
effective_cache_size = 24GB    # Adjust to available RAM

# Hot standby feedback (prevent query cancellation from cleanup records)
hot_standby_feedback = on      # Send feedback to primary about queries in progress
                               # Prevents primary from removing dead tuples still needed by standby
                               # Tradeoff: May cause bloat on primary if standby has long queries

# Max conflict resolution delay
max_standby_streaming_delay = 30s    # Cancel queries blocking WAL application after 30s
max_standby_archive_delay = 60s      # Same for archive recovery
```

---

## 33.4 Synchronous vs Asynchronous Replication

### 33.4.1 Asynchronous Replication (Default)

**Characteristics**:
- Primary commits immediately; WAL streamed asynchronously
- Near-zero latency impact on primary
- Risk: Recent commits may be lost if primary fails (RPO > 0)
- Use case: Read scaling, disaster recovery with acceptable data loss (seconds)

```ini
# Primary configuration (default)
synchronous_commit = on        # Local durability only
synchronous_standby_names = '' # Empty = asynchronous
```

**Monitoring Lag**:
```sql
-- On PRIMARY: Check replication lag
SELECT 
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as replication_lag
FROM pg_stat_replication;

-- On STANDBY: Check received vs applied
SELECT 
    pg_last_wal_receive_lsn() as received,
    pg_last_wal_replay_lsn() as replayed,
    pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) as apply_lag_bytes,
    pg_last_xact_replay_timestamp() as last_replay_time,
    NOW() - pg_last_xact_replay_timestamp() as lag_interval;  -- Also grows while the primary is idle
```
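Monitoring agents often receive LSNs as text and need the same byte arithmetic outside the database. An LSN such as `0/2000028` is two hexadecimal halves of a 64-bit byte position, so the difference that `pg_wal_lsn_diff` reports can be reproduced as in the sketch below (`lsn_to_int` and `lsn_diff` are illustrative names, not PostgreSQL APIs).

```python
def lsn_to_int(lsn):
    """Convert a textual LSN ('high/low' in hex) to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lsn_diff(newer, older):
    """Bytes of WAL between two LSNs; negative if `newer` is behind `older`."""
    return lsn_to_int(newer) - lsn_to_int(older)

sent, replayed = "0/3000000", "0/2000028"
print(f"replay is {lsn_diff(sent, replayed)} bytes behind")
```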

### 33.4.2 Synchronous Replication

**Characteristics**:
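- Primary waits for standby confirmation before reporting the commit to the client
- Zero data loss (RPO = 0) for acknowledged commits, provided a synchronous standby is up to date at failure
- Latency impact: At least one network round trip to the standby per commit
- Risk: If too few synchronous standbys are reachable, commits on the primary block until one returns (mitigate by listing several standbys with `ANY` or `FIRST`)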

**Configuration Levels**:
```ini
# postgresql.conf on PRIMARY

# Option 1: ANY (quorum commit, PG10+)
# Wait for confirmation from any 1 of the 2 listed standbys
synchronous_standby_names = 'ANY 1 (standby_1, standby_2)'

# Option 2: FIRST (priority order)
# Wait for the 2 highest-priority connected standbys; with only 2 listed, both must confirm
synchronous_standby_names = 'FIRST 2 (standby_1, standby_2)'

# Option 3: Single standby (older style)
synchronous_standby_names = 'standby_1'

# Application choice per transaction (flexible)
synchronous_commit = remote_apply  # Strictest: visible on standby
# Other options:
# - remote_write: Written to standby OS (not fsynced)
# - on: Fsynced on standby (durable but not yet visible); this is the default value
# - local: Primary only (async for this transaction)
```
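The difference between `ANY` and `FIRST` is easiest to see by modelling when a commit may be acknowledged. The sketch below is a simplified model for illustration, not the server's implementation: `parse_sync_names` and `commit_can_return` are hypothetical functions, `connected` stands for standbys currently streaming, and `confirmed` stands for those that have acknowledged the commit at the configured `synchronous_commit` level.

```python
import re

SYNC_RE = re.compile(r"^(?:(ANY|FIRST)\s+)?(\d+)\s*\((.*)\)$", re.IGNORECASE)

def parse_sync_names(value):
    """Return (method, count, names) for a synchronous_standby_names string."""
    value = value.strip()
    if not value:
        return ("ASYNC", 0, [])
    match = SYNC_RE.match(value)
    if match:
        method = (match.group(1) or "FIRST").upper()
        count, body = int(match.group(2)), match.group(3)
    else:
        # A bare list such as 'standby_1' behaves like FIRST 1 (...)
        method, count, body = "FIRST", 1, value
    names = [n.strip().strip('"') for n in body.split(",") if n.strip()]
    return (method, count, names)

def commit_can_return(value, connected, confirmed):
    """True if a commit could be acknowledged under this simplified model."""
    method, count, names = parse_sync_names(value)
    if method == "ASYNC":
        return True
    candidates = [n for n in names if n in connected]  # keeps priority order
    if method == "ANY":
        return sum(1 for n in candidates if n in confirmed) >= count
    # FIRST: the `count` highest-priority connected standbys must all confirm
    sync = candidates[:count]
    return len(sync) == count and all(n in confirmed for n in sync)

both = {"standby_1", "standby_2"}
print(commit_can_return("ANY 1 (standby_1, standby_2)", both, {"standby_2"}))
print(commit_can_return("FIRST 2 (standby_1, standby_2)", {"standby_1"}, {"standby_1"}))
```

In this model the second call reports that the commit must keep waiting, which mirrors the availability risk described above: `FIRST 2` with only one reachable standby cannot be satisfied.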

**Application-Level Control**:
```sql
-- Critical transaction (wait for sync replica)
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
INSERT INTO payments (...) VALUES (...);
COMMIT;

-- Non-critical transaction (async for speed)
BEGIN;
SET LOCAL synchronous_commit = local;
INSERT INTO logs (...) VALUES (...);
COMMIT;
```

### 33.4.3 Tradeoff Matrix

| Mode | Durability | Latency | Availability Risk | Use Case |
|------|------------|---------|-------------------|----------|
| **async** | Local only | ~1ms | Low | Read replicas, analytics |
| **remote_write** | OS cache | ~5-20ms | Medium | Near-sync, minor risk |
| **on** (remote flush) | Disk | ~10-50ms | High | Financial transactions |
| **remote_apply** | Visible | ~20-100ms | High | Strict consistency |

---

## 33.5 Replication Slots and Retention Management

### 33.5.1 The Purpose of Replication Slots

Replication slots prevent the primary from removing WAL that a standby still needs, even while that standby is disconnected.

```sql
-- On PRIMARY: Create slot for specific standby
SELECT pg_create_physical_replication_slot('standby_1_slot', true);
-- Second parameter 'true' = immediately_reserve (reserve WAL)

-- View slots
SELECT 
    slot_name,
    plugin,
    slot_type,
    database,
    active,
    restart_lsn,
    confirmed_flush_lsn,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as retained_wal
FROM pg_replication_slots;
```

### 33.5.2 The Retention Risk (Critical)

**Danger**: If a standby with a slot goes offline permanently, the primary retains WAL indefinitely, causing disk space exhaustion.

**Monitoring**:
```sql
-- Alert if retained WAL > 50GB
SELECT 
    slot_name,
    active,
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) as lag_bytes
FROM pg_replication_slots
WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 53687091200;  -- 50GB
```
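The same threshold check can run inside an external alerting job that has already fetched rows from `pg_replication_slots`. The sketch below assumes each row arrives as a `(slot_name, active, restart_lsn)` tuple with LSNs as text; `slots_over_limit` is an illustrative name and the 50GB default simply mirrors the query above.

```python
def lsn_to_int(lsn):
    """Convert a textual LSN ('high/low' in hex) to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def slots_over_limit(current_lsn, slots, limit_bytes=50 * 1024**3):
    """Return (slot_name, active, retained_bytes) for slots above the limit,
    largest first. Slots with no restart_lsn (nothing reserved) are skipped."""
    flagged = []
    for name, active, restart_lsn in slots:
        if restart_lsn is None:
            continue
        retained = lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)
        if retained > limit_bytes:
            flagged.append((name, active, retained))
    return sorted(flagged, key=lambda row: row[2], reverse=True)

rows = [
    ("standby_1_slot", False, "10/0"),  # inactive and far behind
    ("standby_2_slot", True, "1F/0"),   # active, 4 GiB behind
    ("unused_slot", False, None),       # never reserved WAL
]
for name, active, retained in slots_over_limit("20/0", rows):
    print(f"ALERT {name}: active={active}, retained={retained} bytes")
```

Sorting by retained bytes puts the slot most likely to fill the disk first, and reporting `active` alongside it helps distinguish a slow standby from an abandoned one.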

**Mitigation Strategies**:
```sql
-- 1. Set max_slot_wal_keep_size (PG13+) to limit retention
-- Primary will remove WAL even if a slot needs it (invalidates the slot, prevents disk full)
ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';  -- Lose the standby rather than fill the disk
SELECT pg_reload_conf();

-- 2. Monitor and drop dead slots
SELECT pg_drop_replication_slot('standby_1_slot');
-- Only drop if standby permanently decommissioned

-- 3. Use temporary slots (dropped automatically when the session ends)
-- pg_basebackup -X stream uses one by default when no --slot is given (PG10+);
-- --create-slot together with --slot=NAME creates a permanent slot instead
```

### 33.5.3 Slot Synchronization

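Physical replication slots are not copied to standbys: `pg_basebackup` excludes slot state, so a promoted standby starts without the old primary's slots. After failover, create the slots that the remaining standbys need on the new primary, and remove stale slots from the old primary before it rejoins.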

```sql
-- After failover, old primary may have slots that should be recreated on new primary
-- Check for inactive slots consuming space
SELECT * FROM pg_replication_slots WHERE NOT active;
```

---

## 33.6 Failover Basics and Caveats

### 33.6.1 Manual Failover Procedure

**Scenario**: Primary is down; promote standby.

```bash
# Step 1 (on STANDBY to be promoted): Verify replay has caught up with received WAL
psql -c "SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn());"
# Should return 0 or small number

# Step 2 (on OLD PRIMARY, only if it is still reachable): Stop it so it cannot accept writes
pg_ctl stop -m fast -D /var/lib/postgresql/16/main
# If the primary is truly dead, skip this step and fence it instead

# Step 3 (on STANDBY): Promote standby to primary
pg_ctl promote -D /var/lib/postgresql/16/main
# Or: SELECT pg_promote();

# Step 4: Verify promotion
psql -c "SELECT pg_is_in_recovery();"  # Should return 'f' (false)
```
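With more than one standby, Step 1 becomes a comparison: the usual choice is the standby that has received the most WAL, since it loses the least data. The following sketch shows that selection under stated assumptions: the `(receive_lsn, replay_lsn)` pairs were collected from each standby beforehand, and `pick_promotion_candidate` is a hypothetical helper rather than part of any failover tool.

```python
def lsn_to_int(lsn):
    """Convert a textual LSN ('high/low' in hex) to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def pick_promotion_candidate(standbys):
    """standbys maps name -> (receive_lsn, replay_lsn).

    Returns (name, pending_bytes): the standby that has received the most WAL
    (ties broken by replay position) and the bytes it still has to replay.
    """
    if not standbys:
        raise ValueError("no standbys to choose from")
    best = max(
        standbys,
        key=lambda name: (lsn_to_int(standbys[name][0]), lsn_to_int(standbys[name][1])),
    )
    receive_lsn, replay_lsn = standbys[best]
    return best, lsn_to_int(receive_lsn) - lsn_to_int(replay_lsn)

observed = {
    "standby_1": ("0/5000100", "0/5000100"),
    "standby_2": ("0/5000200", "0/5000150"),
}
print(pick_promotion_candidate(observed))
```

Here `standby_2` is preferred even though it still has WAL to replay, because received WAL is replayed before promotion completes, whereas WAL that never arrived cannot be recovered from the standby alone.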

### 33.6.2 The Split-Brain Risk

**Critical Danger**: If the old primary comes back online after failover, you have two primaries (split-brain), causing data divergence.

**Prevention**:
```bash
# On old primary (before bringing back), ensure it cannot accept writes:
# Option 1: Start in recovery mode (as new standby)
touch /var/lib/postgresql/16/main/standby.signal
# Configure primary_conninfo to point to new primary

# Option 2: Use fencing (STONITH - Shoot The Other Node In The Head)
# AWS: Detach EBS volumes, terminate instance
# VMware: Power off VM
# Physical: IPMI power off
```
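A health check can flag split-brain early by asking every node whether it is in recovery and counting the ones that are not. This is only a detection sketch, not a substitute for fencing: it assumes the caller has already collected the result of `pg_is_in_recovery()` from each reachable node, and the function names are invented for the example.

```python
def writable_nodes(recovery_state):
    """recovery_state maps node name -> result of pg_is_in_recovery().

    Nodes that are NOT in recovery accept writes, so they act as primaries.
    """
    return sorted(name for name, in_recovery in recovery_state.items() if not in_recovery)

def is_split_brain(recovery_state):
    """True when more than one node reports that it is not in recovery."""
    return len(writable_nodes(recovery_state)) > 1

cluster = {"old_primary": False, "standby_1": False, "standby_2": True}
if is_split_brain(cluster):
    print("split-brain suspected, writable nodes:", writable_nodes(cluster))
```

An unreachable node is simply absent from the input here, which is exactly why this check cannot prove safety: a partitioned old primary may still be accepting writes from clients on its side of the network.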

### 33.6.3 Timeline Divergence

Each promotion starts a new timeline, so WAL written by the new primary can never be confused with WAL the old primary generated after the fork point.

```bash
# Check current timeline
psql -c "SELECT timeline_id FROM pg_control_checkpoint();"

# WAL file naming includes timeline: TTTTTTTTXXXXXXXXYYYYYYYY
# 00000002.history contains fork point from timeline 1
```
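The naming scheme can be made concrete by computing a segment file name from a timeline and an LSN. The sketch below assumes the default 16MB WAL segment size (clusters initialized with a different `--wal-segsize` need a different constant); `wal_file_name` and `parse_wal_file_name` are illustrative helpers that return the segment containing the byte at the given LSN.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024                     # assumption: default segment size
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE   # 256 with 16MB segments

def wal_file_name(timeline, lsn):
    """Name of the WAL segment holding the byte at `lsn` on `timeline`."""
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)
    segno = position // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )

def parse_wal_file_name(name):
    """Split a 24-character segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("WAL segment names are 24 hexadecimal characters")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)

print(wal_file_name(1, "0/2000028"))
print(parse_wal_file_name("000000020000000100000000"))
```

The first call reproduces the file name shown in the `backup_label` sample earlier in the chapter; after a promotion the same LSN would map to a name beginning with `00000002`.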

**Recovery**: If old primary is started accidentally and diverges, you must:
1. Resync from new primary using `pg_basebackup`, OR
2. Use `pg_rewind`, which copies only the blocks that changed after the fork point (fastest when divergence is small)

```bash
# pg_rewind (requires wal_log_hints = on or data checksums on the old primary, enabled before it diverged)
# Run it on the old primary after a clean shutdown
pg_rewind \
    --target-pgdata=/var/lib/postgresql/16/main \
    --source-server='host=new_primary.internal port=5432 user=postgres' \
    --write-recovery-conf
# Rewinds old primary to match new primary, allowing it to become standby
```

### 33.6.4 Automated Failover Tools

Manual failover takes minutes. For RTO < 60 seconds, use:

- **Patroni**: Python-based HA template with etcd/ZooKeeper/Consul
- **repmgr**: Replication manager with automatic failover
- **pg_auto_failover**: Open-source extension and monitor service originally developed at Citus Data/Microsoft (not part of core PostgreSQL)
- **Stolon**: Cloud-native PostgreSQL HA (Kubernetes)

**Caution**: Automated failover without proper fencing and health checks causes more outages than it prevents. Implement only after thorough testing.

---

## 33.7 Monitoring and Maintenance

### 33.7.1 Critical Replication Metrics

```sql
-- Replication lag in human-readable format
SELECT 
    client_addr,
    application_name,
    state,
    pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) as lag_size,
    EXTRACT(EPOCH FROM (now() - backend_start)) as connected_seconds
FROM pg_stat_replication;

-- Rank asynchronous standbys by replay lag (largest first)
SELECT
    usename,
    application_name,
    client_addr,
    state,
    sync_state,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) as lag_bytes
FROM pg_stat_replication
WHERE sync_state = 'async'  -- Monitor async standbys for lag spikes
ORDER BY lag_bytes DESC;
```

### 33.7.2 Handling Replication Conflicts

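Hot standby may cancel queries that conflict with WAL replay, for example when replay removes row versions (after VACUUM on the primary) that a standby query's snapshot still needs, or replays an exclusive lock on a table the query is reading.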

```sql
-- Check for conflicts (run on the STANDBY; counters are per database)
SELECT
    datname,
    confl_tablespace,
    confl_lock,
    confl_snapshot,
    confl_bufferpin,
    confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = 'production';

-- Reduce conflicts:
-- 1. hot_standby_feedback = on (in standby postgresql.conf)
-- 2. Increase max_standby_streaming_delay (lets queries delay replay longer, at the cost of more lag)
-- 3. Avoid long-running queries on standbys during heavy write periods
```

---

## Chapter Summary

In this chapter, you learned:

1. **Physical Replication Architecture**: Primary generates WAL; standbys apply via startup process. Streaming provides near-real-time replication, while WAL shipping provides resilience against network partitions. Configure `wal_level = replica` and dedicated `replicator` users restricted by `pg_hba.conf` to specific standby IPs.

2. **Standby Setup**: Use `pg_basebackup` with `-Xs` (stream WAL) to initialize standbys. Create `standby.signal` file to indicate recovery mode. Configure `primary_conninfo` with SSL requirements and `hot_standby = on` for read replicas.

3. **Synchronous vs Asynchronous**: Asynchronous replication (default) offers minimal latency but risks data loss (RPO > 0). Synchronous replication with `synchronous_commit = on` (standby flush) or `remote_apply` protects every acknowledged commit as long as a synchronous standby survives, but adds latency (network round-trip) and availability risk (commits stall if the required standbys are unavailable). Use `ANY 1 (standby_1, standby_2)` quorum commit for balance.

4. **Replication Slots**: Slots (`pg_create_physical_replication_slot`) prevent WAL removal needed by standbys, but create disk space risk if standby disconnects permanently. Set `max_slot_wal_keep_size` to prevent disk exhaustion, monitor `pg_replication_slots` for inactive slots, and drop slots only when standbys are permanently decommissioned.

5. **Failover Procedures**: Promote standby with `pg_ctl promote` or `SELECT pg_promote()`. Always fence the old primary (power off or reconfigure as standby) to prevent split-brain. Use `pg_rewind` to resync diverged old primaries if `wal_log_hints` or data checksums were enabled; otherwise a full `pg_basebackup` resync is required.

6. **Operational Safety**: Monitor replication lag via `pg_stat_replication` and `pg_stat_wal_receiver`. Use `hot_standby_feedback` to reduce query conflicts but watch for primary bloat. Never let replication lag exceed `wal_keep_size` or slot retention limits, or the standby will need the WAL archive or a full reinitialization to catch up.

**Next**: In Chapter 34, we will cover Logical Replication—exploring publication/subscription mechanics, selective data replication, conflict handling for multi-master scenarios, and use cases for zero-downtime migrations and data warehousing.