# Chapter 32: Upgrades and Major Version Changes

PostgreSQL's annual release cycle delivers significant performance improvements, security enhancements, and feature additions. However, major version upgrades (e.g., 15 → 16) can change the system catalogs and internal storage format, so a data directory created by one major version cannot be started by another and an explicit migration procedure is required. This chapter covers the strategic planning, execution methods, and validation procedures for zero-downtime or minimal-downtime upgrades in production environments.

---

## 32.1 Major Version Upgrade Strategies

PostgreSQL offers three distinct approaches for major version upgrades, each with different downtime, risk, and resource tradeoffs.

### 32.1.1 Strategy Comparison Matrix

| Strategy | Downtime | Complexity | Risk | Use Case |
|----------|----------|------------|------|----------|
| **pg_dump/pg_restore** | Hours to days | Low | Low (tested) | Small databases (< 500GB), cloud migrations |
| **pg_upgrade** | Minutes | Medium | Medium | Large databases, same-server upgrades |
| **Logical Replication** | Seconds | High | Low | Zero-downtime requirements, complex topologies |

### 32.1.2 pg_dump/pg_restore (Logical Migration)

Best for small databases or when changing architectures (x86 → ARM, different locale).

```bash
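# Pre-upgrade: use the client tools from the TARGET version for both steps.
# A newer pg_dump can read an older server; an older pg_dump cannot read a newer one.
pg_dump --version     # Should report the target major version
pg_restore --version  # Should report the target major version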

# Step 1: Schema-only dump (analyze and adjust)
pg_dump -h source.db -U postgres -d production \
    --schema-only \
    --no-owner \
    --no-privileges \
    -f pre_upgrade_schema.sql

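# Review for features removed or deprecated in newer releases, for example:
# - tsearch2 compatibility module (removed in PG 10)
# - WITH OIDS tables and abstime/reltime/tinterval types (removed in PG 12)
# - postfix operators such as factorial "!" (removed in PG 14)
# - xml2 functions (deprecated)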

# Step 2: Create target database
createdb -h target.db -U postgres production_new

# Step 3: Restore schema
psql -h target.db -U postgres -d production_new -f pre_upgrade_schema.sql

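# Step 4: Load data (the two pipelines can run concurrently in separate shells).
# pg_restore cannot use --jobs when reading from a pipe; for a parallel restore,
# dump to a directory first (pg_dump -Fd -j 4 -f dumpdir) and restore with -j 4.
# --disable-triggers (superuser only) avoids foreign-key failures caused by load order.
# Large tables separately
pg_dump -h source.db -U postgres -d production \
    --data-only \
    --table=orders \
    -Fc | pg_restore -h target.db -U postgres -d production_new \
              --data-only --disable-triggers

# Remaining tables
pg_dump -h source.db -U postgres -d production \
    --data-only \
    --exclude-table=orders \
    -Fc | pg_restore -h target.db -U postgres -d production_new \
              --data-only --disable-triggers

# Step 5: Verify row counts match
# Step 6: No index rebuild is needed: indexes created in Step 3 are maintained during
#         the load (for faster loads, restore with --section=pre-data/data/post-data)
# Step 7: Run ANALYZE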
# Step 8: Cutover application connections
```
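
Step 5 (verifying row counts) can be scripted. The following is a minimal sketch, not a definitive tool: the function names `dump_counts` and `compare_counts` are hypothetical, the host and database names are the ones used in the example above, and exact counts scan every table, so this is best run while writes are paused.

```bash
# Emit "schema.table|count" for every user table, sorted by name.
# Assumes psql can authenticate to the given host as postgres.
dump_counts() {  # usage: dump_counts <host> <dbname>
  psql -h "$1" -U postgres -d "$2" -At -c "
    SELECT format('SELECT %L, count(*) FROM %I.%I;',
                  schemaname || '.' || tablename, schemaname, tablename)
    FROM pg_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" \
  | psql -h "$1" -U postgres -d "$2" -At \
  | sort
}

# Compare two files produced by dump_counts; prints the differences on mismatch.
compare_counts() {  # usage: compare_counts <source_file> <target_file>
  if diff "$1" "$2" > /dev/null; then
    echo "row counts match"
  else
    echo "row count mismatch:"
    diff "$1" "$2" || true
    return 1
  fi
}

# Example run against the clusters from this section:
#   dump_counts source.db production     > source_counts.txt
#   dump_counts target.db production_new > target_counts.txt
#   compare_counts source_counts.txt target_counts.txt
```

Keeping the comparison separate from the collection step means the count files can be archived with the migration logs.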

### 32.1.3 pg_upgrade (In-Place Upgrade)

The fastest method for large databases, using hard links or copying files.

**Prerequisites**:
- Both old and new PostgreSQL versions installed
- New data directory initialized with `initdb`
- Compatible locales and encodings

```bash
# Step 1: Install new PostgreSQL version (e.g., 16)
# Keep old version running (e.g., 15)

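# Step 2: Initialize new data directory
# Encoding, locale, and the checksum setting must match the old cluster;
# drop --data-checksums if the old cluster was initialized without it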
/usr/lib/postgresql/16/bin/initdb \
    -D /var/lib/postgresql/16/main \
    --encoding=UTF8 \
    --locale=en_US.UTF-8 \
    --data-checksums

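# Step 3: Dry run with --check while the old server is still running
# (run pg_upgrade as the postgres OS user; --check changes nothing)
/usr/lib/postgresql/16/bin/pg_upgrade --check \
    --old-bindir=/usr/lib/postgresql/15/bin \
    --new-bindir=/usr/lib/postgresql/16/bin \
    --old-datadir=/var/lib/postgresql/15/main \
    --new-datadir=/var/lib/postgresql/16/main

# Then stop both servers (brief downtime begins)
systemctl stop postgresql@15-main
systemctl stop postgresql@16-main  # Should be down already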

# Step 4: Run pg_upgrade (as the postgres OS user)
# --link uses hard links (fast, shares disk blocks); omit it for copy mode
#   (slower, but leaves the old cluster untouched)
# --jobs sets how many databases and tablespaces are processed in parallel
# Note: comments cannot follow a trailing backslash, so they are kept up here
/usr/lib/postgresql/16/bin/pg_upgrade \
    --old-bindir=/usr/lib/postgresql/15/bin \
    --new-bindir=/usr/lib/postgresql/16/bin \
    --old-datadir=/var/lib/postgresql/15/main \
    --new-datadir=/var/lib/postgresql/16/main \
    --link \
    --jobs=4 \
    --verbose

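# Step 5: Review pg_upgrade's output
# It may generate scripts in the current directory:
# ./update_extensions.sql        # Only if installed extensions need updating
# ./delete_old_cluster.sh        # Only after verification
# Since PG 14, analyze_new_cluster.sh is no longer generated; once the new
# cluster is started, rebuild statistics with: vacuumdb --all --analyze-in-stages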

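# Step 6: Carry over configuration
# Paths assume the config files live in the data directory; Debian/Ubuntu packages
# keep them under /etc/postgresql/<version>/main instead
cp /var/lib/postgresql/15/main/postgresql.conf /var/lib/postgresql/16/main/
cp /var/lib/postgresql/15/main/pg_hba.conf /var/lib/postgresql/16/main/
# Review postgresql.conf for parameters renamed or removed in the new version
# before starting; an unrecognized parameter prevents startup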

# Step 7: Start new cluster
systemctl start postgresql@16-main

# Step 8: Verify
psql -c "SELECT version();"  # Should show PostgreSQL 16
psql -c "SELECT pg_database.datname FROM pg_database;"

# Step 9: Cleanup (only after 48-hour burn-in)
# ./delete_old_cluster.sh
```
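
The version check in Step 8 can be made scriptable by comparing `server_version_num` rather than parsing the `version()` string. This is a small sketch under the assumption that the server is PostgreSQL 10 or later, where the major version is the number divided by 10000; the function names are hypothetical.

```bash
# Convert server_version_num (e.g. 160002) to a major version (16).
major_from_version_num() {  # usage: major_from_version_num <server_version_num>
  echo $(( $1 / 10000 ))
}

# Succeeds only when the running server reports the expected major version.
verify_major() {  # usage: verify_major <expected_major> <server_version_num>
  actual=$(major_from_version_num "$2")
  if [ "$actual" -eq "$1" ]; then
    echo "ok: running major version $actual"
  else
    echo "mismatch: expected $1, found $actual"
    return 1
  fi
}

# Example against a live server:
#   verify_major 16 "$(psql -At -c 'SHOW server_version_num')"
```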

### 32.1.4 Logical Replication (Zero-Downtime)

For systems requiring continuous availability during upgrades.

**Architecture**:
- Source: PostgreSQL 15 (production)
- Target: PostgreSQL 16 (new instance)
- Replication: Logical replication of specific tables
- Cutover: Switch application connection string

```sql
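-- Prerequisites on the source (PG 15), set in postgresql.conf
-- (changing wal_level requires a restart):
-- wal_level = logical
-- max_replication_slots = 10
-- max_wal_senders = 10
-- The target must already contain the table definitions (for example, restored
-- from pg_dump --schema-only); logical replication does not copy DDL.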

-- On source (PG 15):
CREATE PUBLICATION upgrade_migration FOR TABLE users, orders, products;

-- On target (PG 16):
CREATE SUBSCRIPTION upgrade_sub 
CONNECTION 'host=pg15.internal port=5432 dbname=production user=repl_user password=...' 
PUBLICATION upgrade_migration;

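-- Monitor replication lag (run on the source; by default the slot is named
-- after the subscription)
SELECT 
    slot_name, 
    active, 
    confirmed_flush_lsn,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name = 'upgrade_sub';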

-- Once lag is near 0, perform cutover:
-- 1. Stop writes on source (brief maintenance window)
-- 2. Wait for final lag to clear
-- 3. Copy current sequence values to the target (sequences are not replicated)
-- 4. Drop subscription on target (stops replication)
-- 5. Update application connection strings to PG 16
-- 6. Resume writes
```
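
The "wait for final lag to clear" step in the cutover list can be automated with a bounded polling loop. The sketch below is one possible shape, with several assumptions: `SOURCE_DSN` is a hypothetical environment variable holding a connection string for the PG 15 publisher, the slot carries the default name `upgrade_sub`, and the function names are illustrative.

```bash
# Replication lag in bytes for the slot that backs the subscription.
# Assumes SOURCE_DSN points at the publisher (hypothetical variable).
get_lag_bytes() {
  psql "$SOURCE_DSN" -At -c "
    SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)::bigint
    FROM pg_replication_slots
    WHERE slot_name = 'upgrade_sub'"
}

# Poll until lag drops to <max_bytes> or fewer, giving up after <max_attempts>.
wait_for_lag_below() {  # usage: wait_for_lag_below <max_bytes> <max_attempts> <sleep_seconds>
  attempt=0
  while [ "$attempt" -lt "$2" ]; do
    lag=$(get_lag_bytes) || return 2   # psql failed: report a distinct status
    if [ -n "$lag" ] && [ "$lag" -le "$1" ]; then
      echo "caught up"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$3"
  done
  echo "still lagging"
  return 1
}

# Example: allow up to 60 checks, 5 seconds apart, before aborting the cutover
#   wait_for_lag_below 0 60 5 || echo "abort cutover and resume writes on source"
```

Bounding the number of attempts matters here because writes are paused while the loop runs; a cutover that cannot converge should be abandoned rather than left waiting.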

---

## 32.2 Pre-Upgrade Checklist

### 32.2.1 Compatibility Assessment

```bash
# Check for deprecated or removed features
pg_dump -d production --schema-only | grep -i "tsearch2\|xml2\|abstime\|reltime\|tinterval"

# Check for removed or changed features in target version
# PostgreSQL 15: CREATE on the "public" schema is no longer granted to all users by default
# PostgreSQL 14: postfix operators (such as factorial "!") removed
# PostgreSQL 13: wal_keep_segments replaced by wal_keep_size

# Check extension compatibility
psql -c "SELECT * FROM pg_available_extensions WHERE name IN (SELECT extname FROM pg_extension);"
# Verify target version has same extensions available
```
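
To turn the extension check into a pass/fail gate, the installed list from the source can be compared with the available list on the target. This is a sketch only: `missing_extensions` is a hypothetical helper, and the input files are assumed to contain one extension name per line.

```bash
# Print extensions installed on the source that the target does not offer.
missing_extensions() {  # usage: missing_extensions <installed_on_source> <available_on_target>
  src_sorted=$(mktemp)
  tgt_sorted=$(mktemp)
  sort -u "$1" > "$src_sorted"
  sort -u "$2" > "$tgt_sorted"
  comm -23 "$src_sorted" "$tgt_sorted"   # lines found only in the source list
  rm -f "$src_sorted" "$tgt_sorted"
}

# Example of producing the input files (host names as used earlier in this chapter):
#   psql -h source.db -U postgres -d production -At \
#        -c "SELECT extname FROM pg_extension" > installed.txt
#   psql -h target.db -U postgres -d postgres -At \
#        -c "SELECT name FROM pg_available_extensions" > available.txt
#   missing_extensions installed.txt available.txt
```

An empty result means every installed extension has a package on the target; it does not confirm that the packaged version has an update path from the installed one.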

### 32.2.2 Resource Preparation

```bash
# Disk space check (pg_upgrade with --link needs same filesystem)
df -h /var/lib/postgresql

# For copy mode: need 2x current data directory size
du -sh /var/lib/postgresql/15/main

# Memory check (pg_upgrade --jobs needs RAM)
free -h

# Port availability (if running side-by-side temporarily)
ss -tlnp | grep 5432
```
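
The filesystem and disk-space checks above can be wrapped in small functions for a pre-flight script. The following is a sketch with hypothetical function names; it relies on the POSIX `df -P` output format and assumes mount points without spaces.

```bash
# Mount point that holds a path (last column of POSIX df output).
mount_point_of() {  # usage: mount_point_of <path>
  df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeeds when both directories sit on the same filesystem (required for --link).
same_filesystem() {  # usage: same_filesystem <old_dir> <new_dir>
  [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}

# Succeeds when the filesystem holding <target_dir> has room for a full copy
# of <old_datadir> (copy mode duplicates the data files).
copy_mode_fits() {  # usage: copy_mode_fits <old_datadir> <target_dir>
  needed_kb=$(du -sk "$1" | awk '{ print $1 }')
  free_kb=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
  [ "$free_kb" -gt "$needed_kb" ]
}

# Example with the paths used in this chapter:
#   same_filesystem /var/lib/postgresql/15 /var/lib/postgresql/16 || echo "--link not possible"
#   copy_mode_fits /var/lib/postgresql/15/main /var/lib/postgresql || echo "not enough space"
```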

### 32.2.3 Extension Compatibility Matrix

| Extension | PG 14 | PG 15 | PG 16 | Notes |
|-----------|-------|-------|-------|-------|
| postgis | 3.1+ | 3.3+ | 3.4+ | Rebuild required |
| pg_stat_statements | Bundled (contrib) | Bundled (contrib) | Bundled (contrib) | Statistics reset on upgrade |
| pgcrypto | Bundled (contrib) | Bundled (contrib) | Bundled (contrib) | No changes |
| pg_trgm | Bundled (contrib) | Bundled (contrib) | Bundled (contrib) | No changes |
| timescaledb | 2.6+ | 2.10+ | 2.13+ | Major version specific |
| pg_partman | 4.6+ | 4.7+ | 5.0+ | Check partition maintenance |

**Post-Upgrade Extension Handling**:

```bash
# After pg_upgrade, extensions may need rebuilding
psql -c "ALTER EXTENSION postgis UPDATE;"
psql -c "SELECT postgis_extensions_upgrade();"

# For extensions with C libraries, may need:
# 1. Install new version binaries
# 2. DROP EXTENSION ... CASCADE
# 3. CREATE EXTENSION ...
# (Only if update path not available)
```

---

## 32.3 Post-Upgrade Validation

### 32.3.1 Immediate Health Checks

```sql
-- Version verification
SELECT version();

-- Confirm the database accepts connections (not a data-integrity check)
SELECT datname, pg_database.datallowconn 
FROM pg_database 
WHERE datname = current_database();

-- Extension status
SELECT extname, extversion 
FROM pg_extension 
ORDER BY extname;

-- Largest tables by size (compare against pre-upgrade values; this does not count rows)
SELECT 
    schemaname, 
    tablename, 
    pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size
FROM pg_tables 
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC
LIMIT 10;
```

### 32.3.2 Performance Validation

```sql
-- Update statistics (critical after upgrade)
ANALYZE VERBOSE;

-- Check for tables that have never been analyzed
SELECT schemaname, relname
FROM pg_stat_user_tables
WHERE schemaname = 'public'
  AND last_analyze IS NULL
  AND last_autoanalyze IS NULL;

-- Query plan stability check
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders 
WHERE created_at > '2024-01-01' 
ORDER BY created_at DESC 
LIMIT 100;
-- Compare plans pre/post upgrade
```

### 32.3.3 Application Connectivity Testing

```bash
# Test application connection strings
psql "postgresql://app_user:password@localhost:5432/production?sslmode=require" -c "SELECT 1"

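# Verify connection pooling (PgBouncer)
psql "postgresql://app_user:password@pgbouncer-host:6432/production" -c "SELECT 1"
# Pool statistics come from the admin console: connect to the special "pgbouncer"
# database as a user listed in admin_users or stats_users
# (pgbouncer_admin is a placeholder name)
psql "postgresql://pgbouncer_admin@pgbouncer-host:6432/pgbouncer" -c "SHOW STATS;"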

# Check for connection leaks or authentication issues
tail -f /var/log/postgresql/postgresql-16-main.log | grep -i "connection\|authentication\|error"
```

---

## 32.4 Rollback Strategies

### 32.4.1 When Rollback Is Possible

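**pg_dump/pg_restore**: The source cluster is untouched, so applications can be pointed back at it. Writes made on the new cluster after cutover are lost unless they are copied back.

**pg_upgrade with --link**: Rollback is possible only until the new cluster is started for the first time. Before that point, the old cluster can be restored by renaming `global/pg_control.old` back to `global/pg_control`. Once the new cluster has run, the shared files may have changed and the old cluster can no longer be used safely; you must restore from a pre-upgrade backup.

**pg_upgrade without --link**: Old data directory is intact, but switching back requires downtime and discards writes made on the new cluster.

**Logical Replication**: Can reverse direction (PG 16 → PG 15) by creating a publication on the new cluster and a subscription on the old one at cutover, without re-copying existing data. This works, but it is risky and should be rehearsed.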

### 32.4.2 The "Escape Hatch" Pattern

```bash
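# At cutover, before application writes resume, create a logical slot on the NEW
# cluster so every change made after cutover is retained for a possible rollback
# (requires wal_level = logical on the new cluster)
pg_recvlogical -d production --slot upgrade_rollback_slot --create-slot

# The slot has no built-in expiry: it retains WAL until it is consumed or dropped
# (or until max_slot_wal_keep_size is exceeded), so watch disk usage and drop the
# slot when the rollback window closes. If the upgrade has to be abandoned:
# 1. Stop writes on the new cluster
# 2. Start old cluster
# 3. Consume changes from slot to catch up (if using logical replication)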
```

### 32.4.3 Documentation Requirements

Every upgrade must document:
- Exact start time and target version
- `pg_upgrade` output or migration logs
- Post-upgrade validation results
- Rollback decision point (time after which rollback is no longer viable)
- Who performed the upgrade and who approved it

---

## Chapter Summary

In this chapter, you learned:

1. **Upgrade Strategies**: Choose `pg_dump/pg_restore` for small databases (<500GB) or cross-architecture migrations; `pg_upgrade` for large databases where downtime must be minimized; and logical replication for zero-downtime requirements. The `--link` mode in `pg_upgrade` provides the fastest upgrade but rules out falling back to the old cluster once the new one has been started.

2. **pg_upgrade Execution**: Install both PostgreSQL versions in parallel, initialize the new data directory with `initdb`, run `pg_upgrade` with appropriate `--jobs` parallelism, and rebuild planner statistics afterward (`vacuumdb --all --analyze-in-stages` on PG 14 and later, where `analyze_new_cluster.sh` is no longer generated). Never delete the old cluster until the new one has passed burn-in testing.

3. **Pre-Upgrade Validation**: Verify extension compatibility matrices, check for deprecated features (removed operators, changed defaults), ensure sufficient disk space (2x for copy mode), and confirm WAL archiving is current. Test the upgrade process on a production-sized staging environment first.

4. **Post-Upgrade Procedures**: Run `ANALYZE` to rebuild statistics, update extensions (`ALTER EXTENSION ... UPDATE`), verify query plan stability, test application connectivity through connection pools, and monitor error logs for authentication or compatibility issues.

5. **Rollback Planning**: Understand that after `pg_upgrade --link`, the old cluster can no longer be used safely once the new cluster has been started, so rollback from that point requires a pre-upgrade physical backup. Maintain logical replication slots or physical backups as escape hatches. Document the point of no return and ensure the team knows when rollback is no longer viable.

6. **Extension Compatibility**: Major PostgreSQL versions often require extension updates. Check that PostGIS, TimescaleDB, and custom C extensions have versions compiled for the target PostgreSQL release. Rebuild extensions against new server headers when necessary.

**Next**: In Chapter 33, we will explore Streaming Replication (Physical)—covering primary/standby architecture, synchronous vs asynchronous replication, replication slots and retention management, and failover basics with their operational caveats.