# Chapter 13: Partitioning (Declarative)

PostgreSQL's declarative partitioning, introduced in PostgreSQL 10 and significantly enhanced in subsequent versions, provides a robust mechanism for dividing large tables into smaller, more manageable physical pieces. This chapter covers when to partition, how to implement range, list, and hash strategies, and operational patterns for maintaining high-performance partitioned tables in production environments.

## 13.1 Partitioning Fundamentals and Decision Criteria

### 13.1.1 When Partitioning Helps (and When It Doesn't)

Partitioning solves specific scale problems but introduces operational complexity. Understanding the trade-offs is essential before implementing.

```sql
-- Scenario analysis: Should you partition?

-- GOOD candidates for partitioning:
-- 1. Time-series data with rolling retention (logs, events, metrics)
-- 2. Tables > 100GB with clear access patterns (recent data hot, old data cold)
-- 3. Need to efficiently bulk-drop old data (DROP vs DELETE)
-- 4. Different storage requirements per data segment (SSD for recent, HDD for archive)

-- BAD candidates for partitioning:
-- 1. Tables < 10GB (overhead outweighs benefits)
-- 2. No clear partition key (random access patterns)
-- 3. Heavy cross-partition joins/aggregations without partition-wise support
-- 4. Frequent updates that move rows between partitions (partition key updates are expensive)

-- Example: Time-series logging (IDEAL for partitioning)
CREATE TABLE events (
    event_id BIGSERIAL,
    event_type VARCHAR(50),
    user_id BIGINT,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (event_id, created_at)  -- Must include partition key
) PARTITION BY RANGE (created_at);  -- Partition by time

-- Partitioning benefits demonstrated:
-- 1. Fast bulk deletion: DROP TABLE events_2023_q1; (milliseconds vs DELETE hours)
-- 2. Index efficiency: Indexes per partition are smaller, faster to maintain
-- 3. Vacuum efficiency: Each partition is vacuumed separately, so autovacuum workers can process several at once
-- 4. Query pruning: Planner skips irrelevant partitions (partition pruning)
```
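
Before applying the size criteria above, it helps to see which tables are actually large. The query below is a minimal sketch built on the standard catalog and size functions; the `public` schema filter and the `LIMIT` are assumptions to adapt to your database.

```sql
-- Ten largest ordinary tables in the public schema (schema name is an assumption)
SELECT
    c.relname AS table_name,
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,  -- heap + indexes + TOAST
    pg_size_pretty(pg_relation_size(c.oid)) AS heap_size          -- heap only
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'          -- ordinary tables (includes leaf partitions)
  AND n.nspname = 'public'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```

Tables near the top of this list with a clear time or category column are the ones worth evaluating further.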

### 13.1.2 Partitioning Types: Range, List, and Hash

PostgreSQL supports three partitioning strategies, each suited to different data distribution patterns.

```sql
-- RANGE PARTITIONING: Best for time-series, sequential data
-- Each partition holds a specific range of values (non-overlapping)
CREATE TABLE measurements (
    city_id INT NOT NULL,
    logdate DATE NOT NULL,
    peaktemp INT,
    unitsales INT
) PARTITION BY RANGE (logdate);

-- LIST PARTITIONING: Best for categorical data with known discrete values
-- Each partition holds specific values from a list
CREATE TABLE orders_by_region (
    order_id BIGINT,
    region VARCHAR(20) NOT NULL,  -- 'NORTH', 'SOUTH', 'EAST', 'WEST'
    amount_cents INT,
    created_at TIMESTAMPTZ
) PARTITION BY LIST (region);

-- HASH PARTITIONING: Best for even distribution when no natural range/list exists
-- Distributes rows based on hash of partition key modulo number of partitions
CREATE TABLE transactions (
    tx_id BIGSERIAL,
    account_id BIGINT NOT NULL,
    amount_cents INT,
    tx_type VARCHAR(20)
) PARTITION BY HASH (account_id);

-- Sub-partitioning (Composite): Partition by range, then sub-partition by hash
CREATE TABLE events_composite (
    event_id BIGSERIAL,
    tenant_id INT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    event_data JSONB
) PARTITION BY RANGE (created_at);

-- Later: Each monthly partition will be sub-partitioned by tenant_id hash
```
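
The sub-partitioning step mentioned at the end of the block above could look like the following sketch. The partition names and the choice of four hash sub-partitions are illustrative assumptions, not part of the original schema.

```sql
-- Monthly range partition that is itself partitioned by hash of tenant_id
CREATE TABLE events_composite_2024_01 PARTITION OF events_composite
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    PARTITION BY HASH (tenant_id);

-- Four hash sub-partitions hold the actual rows for January 2024
CREATE TABLE events_composite_2024_01_h0 PARTITION OF events_composite_2024_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_composite_2024_01_h1 PARTITION OF events_composite_2024_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_composite_2024_01_h2 PARTITION OF events_composite_2024_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_composite_2024_01_h3 PARTITION OF events_composite_2024_01
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

Each level multiplies the total partition count, so a scheme like this is usually reserved for tables where a single monthly partition is itself too large to manage.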

## 13.2 Implementing Range Partitioning (Time-Series Pattern)

### 13.2.1 Creating the Partitioned Table Structure

```sql
-- Step 1: Create parent table (no data stored here, only structure)
CREATE TABLE events (
    event_id BIGSERIAL,
    user_id BIGINT NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (event_id, created_at)  -- PK must include partition key
) PARTITION BY RANGE (created_at);

-- Important constraints:
-- 1. Primary keys must include all partition key columns
-- 2. Unique constraints must include all partition key columns  
-- 3. CHECK constraints are inherited by partitions

-- Step 2: Create individual partitions (child tables)
-- Each partition is a separate physical table with bounds
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE events_2024_03 PARTITION OF events
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Partition boundaries are [inclusive, exclusive)
-- '2024-02-01' goes into February partition, not January

-- Step 3: Create indexes (on parent, inherited by partitions)
CREATE INDEX idx_events_user_id ON events(user_id);
CREATE INDEX idx_events_type ON events(event_type);
-- Each partition gets its own index files (smaller, faster to maintain)

-- Step 4: Default partition (catches data outside defined ranges)
CREATE TABLE events_default PARTITION OF events DEFAULT;
-- Safety net for data outside expected ranges
-- Monitor this closely - if it grows, your partitioning logic is incomplete
```
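
To confirm that rows are routed as the boundary comment describes, a quick check along these lines can help. The sample rows are made up for illustration, and the result depends on the session time zone in effect when the partitions were created, because the bounds are `TIMESTAMPTZ` values.

```sql
-- Two rows on either side of the January/February boundary (sample data)
INSERT INTO events (user_id, event_type, created_at) VALUES
    (1, 'login', '2024-01-31 23:59:59+00'),
    (1, 'login', '2024-02-01 00:00:00+00');

-- tableoid::regclass reports the partition that physically stores each row
SELECT tableoid::regclass AS stored_in, event_id, created_at
FROM events
ORDER BY created_at;
```

If the partitions were created in a UTC session, the first row should be stored in `events_2024_01` and the second in `events_2024_02`.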

### 13.2.2 Automated Partition Management (Rolling Windows)

Production systems require automated creation and archival of time-based partitions.

```sql
-- Function to create next month's partition
CREATE OR REPLACE FUNCTION create_monthly_partition(
    target_table TEXT,
    target_date DATE
) RETURNS TEXT AS $$
DECLARE
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
    create_sql TEXT;
BEGIN
    partition_name := target_table || '_' || TO_CHAR(target_date, 'YYYY_MM');
    start_date := DATE_TRUNC('month', target_date);
    end_date := start_date + INTERVAL '1 month';
    
    -- Check if partition exists
    IF EXISTS (
        SELECT 1 FROM pg_tables 
        WHERE tablename = partition_name AND schemaname = 'public'
    ) THEN
        RAISE NOTICE 'Partition % already exists', partition_name;
        RETURN partition_name;
    END IF;
    
    create_sql := format(
        'CREATE TABLE %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
        partition_name,
        target_table,
        start_date,
        end_date
    );
    
    EXECUTE create_sql;
    
    -- Indexes defined on the parent are created on the new partition automatically.
    -- ANALYZE is cheap on an empty partition; run it again once data has been loaded.
    EXECUTE format('ANALYZE %I', partition_name);
    
    RAISE NOTICE 'Created partition % for range [%, %)', partition_name, start_date, end_date;
    RETURN partition_name;
END;
$$ LANGUAGE plpgsql;

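-- Scheduled job (using pg_cron or external scheduler)
-- The explicit ::DATE cast is required: TIMESTAMPTZ is not implicitly cast to DATE in a function call
SELECT create_monthly_partition('events', (DATE_TRUNC('month', NOW()) + INTERVAL '1 month')::DATE);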

-- Archival function (detach and drop old partitions)
CREATE OR REPLACE FUNCTION archive_old_partitions(
    target_table TEXT,
    retention_months INT
) RETURNS TABLE (dropped_partition TEXT, archived_bytes BIGINT) AS $$
DECLARE
    partition_rec RECORD;
    cutoff_date DATE;
BEGIN
    cutoff_date := DATE_TRUNC('month', NOW()) - (retention_months || ' months')::INTERVAL;
    
    FOR partition_rec IN 
        SELECT 
            parent.relname as parent_table,
            child.relname as partition_name,
            pg_total_relation_size(child.oid) as partition_size
        FROM pg_inherits
        JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
        JOIN pg_class child ON pg_inherits.inhrelid = child.oid
        WHERE parent.relname = target_table
          AND child.relname ~ '^\w+_\d{4}_\d{2}$'  -- Pattern: table_YYYY_MM
    LOOP
        -- Extract date from partition name (assuming standard naming)
        DECLARE
            partition_year INT;
            partition_month INT;
            partition_date DATE;
        BEGIN
            partition_year := (regexp_match(partition_rec.partition_name, '_(\d{4})_(\d{2})$'))[1]::INT;
            partition_month := (regexp_match(partition_rec.partition_name, '_(\d{4})_(\d{2})$'))[2]::INT;
            partition_date := make_date(partition_year, partition_month, 1);
            
            IF partition_date < cutoff_date THEN
                -- Step 1: Detach (removes the table from the partition set but keeps its data)
                EXECUTE format('ALTER TABLE %I DETACH PARTITION %I', 
                              target_table, partition_rec.partition_name);
                
                -- Step 2 (optional, done outside this function): archive to cold storage,
                -- e.g. pg_dump -t partition_name ...
                -- To archive first, stop after Step 1 and remove the DROP below.
                
                -- Step 3: Drop completely (irreversible)
                EXECUTE format('DROP TABLE %I', partition_rec.partition_name);
                
                dropped_partition := partition_rec.partition_name;
                archived_bytes := partition_rec.partition_size;
                RETURN NEXT;
            END IF;
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Usage: Keep only 12 months of data
SELECT * FROM archive_old_partitions('events', 12);
```
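
One way to schedule both functions is the `pg_cron` extension. The sketch below assumes `pg_cron` is installed and uses its `cron.schedule(job_name, schedule, command)` form; the job names and times are hypothetical.

```sql
-- Create next month's partition at 03:00 on the 25th of each month
SELECT cron.schedule(
    'events-create-partition',      -- job name (assumption)
    '0 3 25 * *',
    $$SELECT create_monthly_partition('events', (date_trunc('month', now()) + interval '1 month')::date)$$
);

-- Drop partitions older than 12 months at 04:00 on the 1st of each month
SELECT cron.schedule(
    'events-archive-partitions',    -- job name (assumption)
    '0 4 1 * *',
    $$SELECT * FROM archive_old_partitions('events', 12)$$
);
```

Creating the partition several days before it is needed leaves time to notice and fix a failed job before rows start landing in the default partition.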

### 13.2.3 Partition Pruning (Query Optimization)

The query planner eliminates irrelevant partitions at planning or execution time.

```sql
-- Partition pruning is on by default since PostgreSQL 11; shown here for completeness
SET enable_partition_pruning = on;

-- Demonstrate pruning with EXPLAIN
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT * FROM events 
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-20';

-- Output shows:
-- Seq Scan on events_2024_01  (only this partition scanned)
-- Planning Time: 0.5 ms
-- Execution Time: 2.1 ms

-- Without pruning (if we query across all partitions or disable pruning):
EXPLAIN (ANALYZE)
SELECT * FROM events WHERE user_id = 12345;
-- Seq Scan on events_2024_01
-- Seq Scan on events_2024_02
-- Seq Scan on events_2024_03
-- ... all partitions scanned because user_id isn't partition key

-- Runtime pruning (PostgreSQL 11+)
-- Pruning happens at execution for parameterized queries
PREPARE get_events(DATE, DATE) AS
SELECT * FROM events 
WHERE created_at BETWEEN $1 AND $2;

EXPLAIN (ANALYZE) EXECUTE get_events('2024-01-01', '2024-01-05');
-- Only the January partition is scanned, even with parameters
-- Partitions pruned at executor startup are reported as "Subplans Removed: N"

-- Constraints for effective pruning:
-- 1. Partition key must be in WHERE clause
-- 2. Operators must be partition-boundary compatible (=, <, >, <=, >=, IN, BETWEEN)
-- 3. Functions on partition key prevent pruning:
EXPLAIN SELECT * FROM events WHERE DATE_TRUNC('month', created_at) = '2024-01-01';
-- Seq Scan on ALL partitions (function prevents pruning)

-- Fix: Rewrite to use direct comparison
EXPLAIN SELECT * FROM events 
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
-- Only scans January partition
```
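
Execution-time pruning also applies to stable expressions such as `now()`, whose value is not known when the plan is built. The following sketch compares the plan with pruning enabled and disabled; the seven-day window is an arbitrary example.

```sql
-- now() is STABLE, so pruning is deferred to executor startup
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM events
WHERE created_at >= now() - INTERVAL '7 days';
-- Look for a line such as "Subplans Removed: N" under the Append node

-- Same query with pruning disabled for this session only
SET enable_partition_pruning = off;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM events
WHERE created_at >= now() - INTERVAL '7 days';
RESET enable_partition_pruning;
```

With pruning disabled, every partition should appear in the plan, which makes the difference easy to see side by side.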

## 13.3 List and Hash Partitioning Strategies

### 13.3.1 List Partitioning for Categorical Data

```sql
-- Multi-tenant SaaS: Partition by tenant_id (if few tenants, or tiered tenants)
CREATE TABLE user_data (
    user_id BIGSERIAL,
    tenant_id VARCHAR(20) NOT NULL,  -- 'enterprise', 'midmarket', 'startup'
    email VARCHAR(255),
    profile JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (user_id, tenant_id)  -- PK includes partition key
) PARTITION BY LIST (tenant_id);

-- Create partitions for known tenants
CREATE TABLE user_data_enterprise PARTITION OF user_data
    FOR VALUES IN ('enterprise', 'fortune500');

CREATE TABLE user_data_midmarket PARTITION OF user_data
    FOR VALUES IN ('midmarket', 'smb');

CREATE TABLE user_data_startup PARTITION OF user_data
    FOR VALUES IN ('startup', 'individual');

-- Default partition for new/unexpected tenants
CREATE TABLE user_data_default PARTITION OF user_data DEFAULT;

-- Moving data between partitions
-- PostgreSQL 11+: an UPDATE that changes the partition key moves the row
-- automatically (internally a DELETE from the old partition plus an INSERT into the new one)
UPDATE user_data SET tenant_id = 'enterprise' WHERE user_id = 123;

-- PostgreSQL 10 only: the same UPDATE fails with
-- ERROR: new row for relation "user_data_startup" violates partition constraint
-- Workaround for PostgreSQL 10:
BEGIN;
-- 1. Insert into target partition (carry created_at over so it is not reset to NOW())
INSERT INTO user_data_enterprise (user_id, tenant_id, email, profile, created_at)
SELECT user_id, 'enterprise', email, profile, created_at
FROM user_data_startup 
WHERE user_id = 123;

-- 2. Delete from source
DELETE FROM user_data_startup WHERE user_id = 123;
COMMIT;
```
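
To review which values each list partition accepts, the bounds can be read back from the catalog. This is a small sketch using `pg_inherits` and `pg_get_expr`; it assumes the `user_data` table defined above is on the search path.

```sql
-- Show every partition of user_data with its bound definition
SELECT
    c.relname AS partition_name,
    pg_get_expr(c.relpartbound, c.oid) AS partition_bound
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'user_data'::regclass
ORDER BY c.relname;
```

The `partition_bound` column contains the `FOR VALUES IN (...)` clause for regular partitions and `DEFAULT` for the default partition.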

### 13.3.2 Hash Partitioning for Even Distribution

```sql
-- When no natural range or list exists, use hash for I/O parallelism
CREATE TABLE sensor_readings (
    reading_id BIGSERIAL,
    sensor_uuid UUID NOT NULL,  -- Random distribution
    temperature NUMERIC(5,2),
    humidity NUMERIC(5,2),
    recorded_at TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY HASH (sensor_uuid);

-- Create 8 partitions (any count balances evenly; a power of 2 simplifies later splits,
-- because each modulus must be a factor of the next larger one)
CREATE TABLE sensor_readings_p0 PARTITION OF sensor_readings
    FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE sensor_readings_p1 PARTITION OF sensor_readings
    FOR VALUES WITH (MODULUS 8, REMAINDER 1);
-- ... continue through p7

-- Benefits:
-- 1. Even distribution regardless of sensor UUID distribution
-- 2. Parallel vacuum and queries across partitions
-- 3. Can attach/detach partitions for rebalancing (complex)

-- Limitations:
-- 1. Cannot easily add partitions (data must be redistributed)
-- 2. Range queries on sensor_uuid don't prune (hash is random)
-- 3. Hard to query specific "recent" data without secondary indexes
```
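
The elided partitions and the even-distribution claim can be checked with a short script. This is a sketch under a few assumptions: the loop fills in whichever of the eight partitions are missing, the 10,000 synthetic rows are arbitrary, and `gen_random_uuid()` is built in only from PostgreSQL 13.

```sql
-- Create any of the 8 hash partitions that do not exist yet
DO $$
BEGIN
    FOR i IN 0..7 LOOP
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS %I PARTITION OF sensor_readings
                 FOR VALUES WITH (MODULUS 8, REMAINDER %s)',
            'sensor_readings_p' || i, i
        );
    END LOOP;
END
$$;

-- Load synthetic rows with random sensor UUIDs
INSERT INTO sensor_readings (sensor_uuid, temperature, humidity)
SELECT gen_random_uuid(), 20.00, 50.00
FROM generate_series(1, 10000);

-- Row count per partition; the counts should be roughly equal
SELECT tableoid::regclass AS partition_name, COUNT(*) AS row_count
FROM sensor_readings
GROUP BY 1
ORDER BY 1;
```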

## 13.4 Partition-Wise Operations

### 13.4.1 Partition-Wise Joins

When two partitioned tables share compatible partition schemes, PostgreSQL can join partitions pairwise rather than scanning entire tables.

```sql
-- Create two identically partitioned tables.
-- order_shipments carries order_date too, so both tables share the same partition key.
CREATE TABLE orders (
    order_id BIGSERIAL,
    customer_id BIGINT,
    order_date DATE NOT NULL,
    amount_cents INT
) PARTITION BY RANGE (order_date);

CREATE TABLE order_shipments (
    shipment_id BIGSERIAL,
    order_id BIGINT,
    order_date DATE NOT NULL,   -- copied from orders; the partition key
    shipped_date DATE,
    carrier VARCHAR(50)
) PARTITION BY RANGE (order_date);

-- Create monthly partitions for both (same ranges)
-- orders_2024_01, orders_2024_02...
-- order_shipments_2024_01, order_shipments_2024_02...

-- Partition-wise join (PostgreSQL 11+, off by default)
SET enable_partitionwise_join = on;

EXPLAIN (ANALYZE)
SELECT o.order_id, o.amount_cents, s.carrier
FROM orders o
JOIN order_shipments s 
  ON o.order_id = s.order_id
 AND o.order_date = s.order_date   -- the join must include the partition key
WHERE o.order_date >= '2024-01-01' 
  AND o.order_date < '2024-03-01';

-- Without partition-wise join:
-- Hash Join on (orders_all_partitions vs order_shipments_all_partitions)

-- With partition-wise join:
-- Append
--   -> Hash Join on (orders_2024_01 vs order_shipments_2024_01)
--   -> Hash Join on (orders_2024_02 vs order_shipments_2024_02)
-- Each join works on a smaller input, so its hash table needs less memory

-- Requirements for partition-wise join:
-- 1. Both tables partitioned by same type (range/range, list/list, hash/hash)
-- 2. Partition bounds must match exactly (PostgreSQL 13+ relaxes this in some cases)
-- 3. Join condition must include equality on the partition keys of both tables
```
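
Because the partition bounds of both tables have to line up, it is safer to generate them together than to type them by hand. The block below is a sketch that creates three matching monthly partitions for each table; the starting month and the count are assumptions.

```sql
DO $$
DECLARE
    tbl TEXT;
    month_start DATE;
BEGIN
    FOR m IN 0..2 LOOP
        month_start := DATE '2024-01-01' + make_interval(months => m);
        -- Same bounds for both tables, so their partitions pair up one to one
        FOREACH tbl IN ARRAY ARRAY['orders', 'order_shipments'] LOOP
            EXECUTE format(
                'CREATE TABLE IF NOT EXISTS %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
                tbl || '_' || to_char(month_start, 'YYYY_MM'),
                tbl,
                month_start,
                (month_start + INTERVAL '1 month')::date
            );
        END LOOP;
    END LOOP;
END
$$;
```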

### 13.4.2 Partition-Wise Aggregation

```sql
-- Aggregation pushed to partitions, then combined (off by default)
SET enable_partitionwise_aggregate = on;

EXPLAIN (ANALYZE)
SELECT DATE_TRUNC('day', recorded_at) as day, COUNT(*), AVG(temperature)
FROM sensor_readings  -- Hash partitioned on sensor_uuid
WHERE recorded_at >= '2024-01-01' AND recorded_at < '2024-02-01'
GROUP BY 1;

-- Simplified plan shape (GROUP BY does not include the partition key,
-- so each partition produces partial results that are combined at the top):
-- Finalize HashAggregate
--   -> Append
--         -> Partial HashAggregate on partition p0
--         -> Partial HashAggregate on partition p1
--         -> ... (one partial aggregate per partition)

-- Without partition-wise aggregate:
-- Rows from every partition are appended first, then aggregated in a single step
```
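
When the `GROUP BY` includes the partition key, every group lives in exactly one partition, so each partition can be aggregated completely on its own. A sketch against the `sensor_readings` table from section 13.3.2; the plan shape in the comments is what the planner may choose, not a guaranteed result, since the decision is cost-based.

```sql
SET enable_partitionwise_aggregate = on;

-- sensor_uuid is the partition key, so no group spans two partitions
EXPLAIN (COSTS OFF)
SELECT sensor_uuid, COUNT(*) AS readings, AVG(temperature) AS avg_temp
FROM sensor_readings
GROUP BY sensor_uuid;
-- Possible plan shape:
-- Append
--   -> HashAggregate on sensor_readings_p0
--   -> HashAggregate on sensor_readings_p1
--   -> ...
```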

## 13.5 Partition Maintenance and Best Practices

### 13.5.1 Index Strategies for Partitioned Tables

```sql
-- Global vs Local indexes
-- PostgreSQL only supports local indexes (indexes on partitions)
-- There are no "global" indexes spanning all partitions (Oracle feature)

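-- Creating an index on the partitioned table automatically creates one on every partition
CREATE INDEX idx_events_user_id ON events(user_id);
-- Partition indexes get generated names such as events_2024_01_user_id_idx

-- Unique indexes must include partition key
CREATE UNIQUE INDEX idx_events_unique ON events(event_id, created_at);
-- Must include created_at because it's the partition key

-- Adding an index to an existing partitioned table without blocking writes:
-- CREATE INDEX CONCURRENTLY is not supported on a partitioned table, so build per partition
CREATE INDEX idx_events_payload ON ONLY events USING GIN(payload);  -- parent only, starts invalid
CREATE INDEX CONCURRENTLY idx_events_2024_01_payload ON events_2024_01 USING GIN(payload);
ALTER INDEX idx_events_payload ATTACH PARTITION idx_events_2024_01_payload;
-- Repeat for each partition; the parent index becomes valid once all are attached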

-- Partition-specific indexes (not inherited)
CREATE INDEX idx_events_2024_01_special ON events_2024_01(user_id) 
WHERE event_type = 'error';  -- Partial index on specific partition only
```
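
To see which indexes actually exist on each partition, and whether any are not yet usable, the catalog can be queried directly. A minimal sketch; the `public` schema and the `events%` name pattern are assumptions.

```sql
-- Indexes on the parent and on every partition whose name starts with "events"
SELECT tablename, indexname
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename LIKE 'events%'
ORDER BY tablename, indexname;

-- Indexes that are not valid, for example after an interrupted CREATE INDEX CONCURRENTLY
SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name
FROM pg_index
WHERE NOT indisvalid;
```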

### 13.5.2 Attach and Detach Operations

```sql
-- Detach partition (metadata-only operation, no data movement)
ALTER TABLE events DETACH PARTITION events_2023_12;
-- Partition becomes a standalone table
-- Takes a brief ACCESS EXCLUSIVE lock on the parent, so it waits behind open transactions
-- Can then archive, drop, or process separately

-- Attach existing table as partition (must have compatible structure)
CREATE TABLE events_2024_04 (LIKE events INCLUDING ALL);

-- Load data into standalone table first (faster than inserting through parent)
COPY events_2024_04 FROM '/data/april_events.csv' WITH CSV;

-- Add a validated CHECK constraint matching the bounds before attaching.
-- A NOT VALID constraint would not help: ATTACH skips its scan only for validated constraints.
ALTER TABLE events_2024_04 ADD CONSTRAINT valid_range 
    CHECK (created_at >= '2024-04-01' AND created_at < '2024-05-01');

-- Attach (no full scan of the new partition, thanks to the constraint above)
ALTER TABLE events ATTACH PARTITION events_2024_04 
    FOR VALUES FROM ('2024-04-01') TO ('2024-05-01');
-- PostgreSQL 12+: SHARE UPDATE EXCLUSIVE lock on the parent, so reads and writes continue
-- If a DEFAULT partition exists, it is locked and checked for rows in the new range
ALTER TABLE events_2024_04 DROP CONSTRAINT valid_range;  -- now redundant

-- Detach without blocking queries (PostgreSQL 14+)
ALTER TABLE events DETACH PARTITION events_2024_01 CONCURRENTLY;
-- Cannot run inside a transaction block, and is not allowed while the parent
-- has a DEFAULT partition (such as events_default created earlier)
```
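
Because a plain `DETACH PARTITION` has to wait for conflicting locks, it is often wrapped in a lock timeout so that it fails fast instead of stalling other sessions. A sketch, with the five-second limit chosen arbitrarily:

```sql
BEGIN;
-- Give up after 5 seconds rather than queue behind a long-running transaction
SET LOCAL lock_timeout = '5s';
ALTER TABLE events DETACH PARTITION events_2023_12;
COMMIT;
-- If the timeout fires, the ALTER TABLE fails; issue ROLLBACK and retry later
```

`SET LOCAL` limits the setting to this transaction, so other statements in the session keep their normal timeout.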

### 13.5.3 Common Pitfalls and Solutions

```sql
-- PITFALL 1: Too many partitions
-- Creating daily partitions for 10 years = 3650 partitions
-- Slows down planning time, catalog bloat
-- Solution: Use coarser partitions (for example monthly) and retire old ones;
-- sub-partitioning (month then day) does not reduce the total partition count

-- PITFALL 2: Updating partition key
-- PostgreSQL 10: fails if the new value belongs to a different partition
-- ("new row for relation ... violates partition constraint")
UPDATE events SET created_at = created_at + INTERVAL '1 month' WHERE event_id = 123;
-- PostgreSQL 11+: the row is moved automatically; no setting is needed
-- (ALTER TABLE ... ENABLE ROW MOVEMENT is Oracle syntax and does not exist in PostgreSQL)
-- But: It's slow (DELETE + INSERT), avoid frequent updates to partition keys

-- PITFALL 3: Long transactions holding back drop
-- DROP and DETACH need a lock that conflicts with any query still using the table,
-- so they wait behind long-running transactions (and new queries queue behind them)
-- Check for locks:
SELECT * FROM pg_locks WHERE NOT granted;
-- Wait for or terminate blocking queries before maintenance

-- PITFALL 4: Foreign keys TO partitioned tables
-- Foreign keys referencing partitioned tables are supported (PostgreSQL 12+) but:
-- 1. The FK must target a PRIMARY KEY or UNIQUE constraint, which always includes the
--    partition key, so the referencing table has to store the partition key columns too
-- 2. Detaching or dropping a referenced partition must verify that no referencing rows remain
-- Solution: Reference the parent table, not individual partitions

-- PITFALL 5: Cross-partition constraints
-- Cannot enforce unique constraints across partitions (except with PK including partition key)
-- Solution: Application-level enforcement or careful partition key selection

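-- PITFALL 6: Statistics and planning
-- Each partition has separate statistics
-- If partition is empty or new, planner may make bad estimates
-- Solution: Analyze after loading data into a new partition
ANALYZE events_2024_05;
-- Autovacuum does not analyze the partitioned parent itself; run ANALYZE events; periodically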
```
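
For Pitfall 3, the raw `pg_locks` output does not say who is blocking whom. The following sketch joins `pg_stat_activity` with `pg_blocking_pids()` (available since PostgreSQL 9.6) to pair each waiting session with the sessions holding it up.

```sql
SELECT
    blocked.pid                   AS blocked_pid,
    blocked.query                 AS blocked_query,
    blocking.pid                  AS blocking_pid,
    blocking.query                AS blocking_query,
    now() - blocking.xact_start   AS blocking_xact_age
FROM pg_stat_activity AS blocked
JOIN LATERAL unnest(pg_blocking_pids(blocked.pid)) AS b(pid) ON true
JOIN pg_stat_activity AS blocking ON blocking.pid = b.pid;
```

An empty result means nothing is currently waiting on a lock.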

### 13.5.4 Monitoring Partition Health

```sql
-- List all partitions and sizes
SELECT 
    parent.relname AS parent_table,
    child.relname AS partition_name,
    pg_size_pretty(pg_total_relation_size(child.oid)) AS total_size,
    pg_size_pretty(pg_relation_size(child.oid)) AS table_size,
    pg_size_pretty(pg_indexes_size(child.oid)) AS index_size
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE parent.relname = 'events'
ORDER BY child.relname;

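-- Check for rows that landed in the default partition, grouped by month
SELECT 
    DATE_TRUNC('month', created_at) AS month,
    COUNT(*) as row_count
FROM events_default
GROUP BY 1
ORDER BY 1;

-- If the default partition has data for June 2024, move it out BEFORE adding the partition
-- (CREATE TABLE ... PARTITION OF fails while the default still holds rows in the new range):
-- 1. CREATE TABLE events_2024_06 (LIKE events INCLUDING ALL);
-- 2. WITH moved AS (DELETE FROM events_default WHERE created_at >= '2024-06-01' AND created_at < '2024-07-01' RETURNING *)
--    INSERT INTO events_2024_06 SELECT * FROM moved;
-- 3. ALTER TABLE events ATTACH PARTITION events_2024_06 FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');
-- Run all three steps in one transaction so readers never see the rows missing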

-- Check partition pruning effectiveness
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM events WHERE created_at = '2024-01-15';
-- Pruning worked if only events_2024_01 appears as a scan node
-- Pruning failed if every partition appears under an Append node
-- (PostgreSQL's EXPLAIN has no "Partitions:" line; "Subplans Removed: N" marks execution-time pruning)
```
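
Two further checks are cheap enough to run from a monitoring job: the total number of leaf partitions, and whether next month's partition already exists. This sketch assumes PostgreSQL 12+ for `pg_partition_tree` and the `events_YYYY_MM` naming convention used in this chapter.

```sql
-- Number of leaf partitions under events
SELECT COUNT(*) AS leaf_partitions
FROM pg_partition_tree('events')
WHERE isleaf;

-- Does a partition named events_YYYY_MM exist for next month?
SELECT to_regclass(
           'events_' || to_char(now() + INTERVAL '1 month', 'YYYY_MM')
       ) IS NOT NULL AS next_month_ready;
```

An alert on `next_month_ready = false` catches a failed partition-creation job before rows start falling into the default partition.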

---

## Chapter Summary

In this chapter, you learned:

1. **When to Partition**: Use partitioning for tables >100GB with time-series or categorical access patterns requiring efficient bulk deletion; avoid for small tables (<10GB), random access patterns, or frequent partition key updates. Partitioning trades operational complexity for query performance and maintenance efficiency.

2. **Partitioning Strategies**: Use **Range partitioning** for time-series data (rolling windows, archival); **List partitioning** for discrete categorical values (tenant tiers, regions); **Hash partitioning** for even distribution when no natural range exists (random UUIDs). Combine strategies via sub-partitioning for large-scale data (range by date, then hash by tenant).

3. **Partition Management**: Implement automated scripts to create future partitions (proactive) and detach/drop old partitions (retention policies); use the `DEFAULT` partition as a safety net but monitor it closely; attach pre-populated tables for fast bulk loads that take only a light lock on the parent table.

4. **Performance Optimization**: Leave `enable_partition_pruning` on (the default) so irrelevant partitions are eliminated at planning or execution time; turn on `enable_partitionwise_join` and `enable_partitionwise_aggregate` (both off by default) when joining or aggregating compatibly partitioned tables; ensure partition keys appear in `WHERE` clauses without functions to enable pruning.

5. **Constraints and Indexing**: All unique constraints (including `PRIMARY KEY`) must include partition key columns; indexes are always local (per-partition), and an index created on the parent is created automatically on every existing and future partition; use partial indexes on specific partitions for specialized access patterns.

6. **Operational Best Practices**: Use `DETACH PARTITION CONCURRENTLY` (PostgreSQL 14+) for low-lock archival, keeping in mind that it is not allowed while the table has a `DEFAULT` partition; avoid updating partition keys (PostgreSQL 11+ moves the row automatically, at the cost of a DELETE plus INSERT); run `ANALYZE` on new partitions once they contain data to update planner statistics; monitor for partition bloat and default partition growth indicating incomplete partitioning logic.

---

**Next:** In Chapter 14, we will dive into indexing and query performance—covering B-tree internals, multi-column index strategies, covering indexes, specialized index types (GIN, GiST, BRIN), and the art of reading and optimizing execution plans with EXPLAIN.

