# Chapter 18: Performance Tuning Playbook

Performance optimization requires methodical diagnosis followed by targeted refactoring. This chapter provides actionable patterns for query restructuring, systematic index selection, batch operation optimization, and elimination of N+1 anti-patterns. These are the practical techniques used in production environments to achieve sub-100ms response times at scale.

## 18.1 Query Refactoring Patterns

Query structure fundamentally determines execution efficiency. These patterns transform computationally expensive operations into index-friendly, set-based operations.

### 18.1.1 Eliminating SELECT * and Projection Pushdown

```sql
-- Anti-pattern: SELECT * with ORM defaults
SELECT * FROM orders 
WHERE customer_id = 123 
ORDER BY created_at DESC 
LIMIT 20;

-- Problems:
-- 1. Retrieves TOAST columns (large text/json) unnecessarily
-- 2. Prevents index-only scans (must fetch heap for all columns)
-- 3. Increases network bandwidth
-- 4. Breaks covering index strategies

-- Refactored: Explicit column selection
SELECT order_id, total_amount, status, created_at 
FROM orders 
WHERE customer_id = 123 
ORDER BY created_at DESC 
LIMIT 20;

-- Benefits:
-- 1. Enables index-only scan if covering index exists:
--    CREATE INDEX idx_orders_cust_created_cover ON orders(customer_id, created_at) 
--    INCLUDE (order_id, total_amount, status);
-- 2. Heap fetches reduced from 20 full rows to 0 (if covering) or minimal
-- 3. TOAST data (large JSONB descriptions) not fetched

-- Projection and CTEs (PostgreSQL 12+):
WITH recent_orders AS MATERIALIZED (
    SELECT order_id, customer_id, total FROM orders 
    WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT ro.order_id, ro.total, c.name
FROM recent_orders ro
JOIN customers c ON ro.customer_id = c.customer_id;

-- Without MATERIALIZED, PostgreSQL 12+ inlines a side-effect-free CTE that is
--   referenced once, so outer filters and projections can reach the base table
-- With MATERIALIZED, the CTE is computed in full before the outer query reads it
-- Use MATERIALIZED when the CTE should act as an optimization fence (to avoid a bad plan)
-- Avoid MATERIALIZED when the CTE result is large and the outer query filters it further
```

### 18.1.2 Subquery vs JOIN: Performance Characteristics

```sql
-- Scenario: Find customers with recent orders

-- Approach 1: Correlated EXISTS subquery (planned as a semi-join)
SELECT customer_id, name 
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id 
      AND o.created_at > NOW() - INTERVAL '30 days'
);

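-- Plan: Semi Join (Nested Loop, Hash, or Merge, chosen by cost)
-- Nested Loop variant: for each customer (outer), probe orders index (inner)
--   and stop at the first match
-- Nested Loop is optimal when customers table small, orders index selective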

-- Approach 2: JOIN with DISTINCT (set-based)
SELECT DISTINCT c.customer_id, c.name 
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id 
WHERE o.created_at > NOW() - INTERVAL '30 days';

-- Plan: Hash Join or Merge Join, followed by deduplication for DISTINCT
-- Hash Join builds a hash table on the smaller input and probes it with the other
-- Optimal when orders table moderate size, many customers have recent orders

-- Approach 3: Semi-Join with IN (alternative syntax)
SELECT customer_id, name 
FROM customers 
WHERE customer_id IN (
    SELECT customer_id FROM orders 
    WHERE created_at > NOW() - INTERVAL '30 days'
);

-- Planner typically plans IN (subquery) as a semi-join, the same as EXISTS
-- IN is safe for positive matches even with NULLs in the subquery; NOT IN is not:
--   a single NULL in the subquery result makes NOT IN return no rows

-- Decision matrix:
-- Small outer, selective inner index: EXISTS (Nested Loop Semi Join)
-- Large outer, moderate inner: EXISTS or JOIN with DISTINCT (Hash/Merge)
-- Need additional columns from inner: JOIN (not EXISTS)
-- Anti-joins: Use NOT EXISTS instead of NOT IN (handles NULLs correctly)
```
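The NULL behavior of `NOT IN` is easy to check in isolation. The sketch below uses inline `VALUES` lists as hypothetical stand-ins for the `customers` and `orders` tables; one order row has a NULL `customer_id`.

```sql
-- Hypothetical inline data (not the real tables): one order has a NULL customer_id
WITH customers(customer_id) AS (VALUES (1), (2), (3)),
     orders(customer_id)    AS (VALUES (1), (NULL::int))
SELECT
    -- NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so no customer qualifies
    (SELECT count(*) FROM customers c
      WHERE c.customer_id NOT IN (SELECT o.customer_id FROM orders o)) AS not_in_rows,
    -- NOT EXISTS: the NULL row never matches the equality, so customers 2 and 3 qualify
    (SELECT count(*) FROM customers c
      WHERE NOT EXISTS (SELECT 1 FROM orders o
                         WHERE o.customer_id = c.customer_id)) AS not_exists_rows;
-- not_in_rows = 0, not_exists_rows = 2
```

If the subquery column is declared `NOT NULL`, both forms return the same rows, but `NOT EXISTS` stays correct if that constraint is ever dropped.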

### 18.1.3 LATERAL Joins for Top-N per Group

```sql
-- Problem: Find the 3 most recent orders per customer
-- Anti-pattern: Self-join with correlated subquery (O(n²))
SELECT c.customer_id, o.order_id, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IN (
    SELECT order_id 
    FROM orders o2 
    WHERE o2.customer_id = c.customer_id 
    ORDER BY created_at DESC 
    LIMIT 3
);

-- Inefficient: the correlated top-3 subquery is re-evaluated for every joined order row,
--   not once per customer

-- Solution: LATERAL join (row-by-row with optimizer support)
SELECT c.customer_id, o.order_id, o.total, o.created_at
FROM customers c
LEFT JOIN LATERAL (
    SELECT order_id, total, created_at
    FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY created_at DESC
    LIMIT 3
) o ON true
WHERE c.status = 'active';

-- Plan structure:
-- Nested Loop Left Join
--   -> Index Scan on customers (filtered by status)
--   -> Limit (child of Nested Loop)
--         -> Index Scan on orders (using idx_orders_customer_created)
--               Index Cond: (customer_id = c.customer_id)

-- Key advantages:
-- 1. Index on (customer_id, created_at DESC) used efficiently for each customer
-- 2. Limit 3 applied per customer before join (minimal rows fetched)
-- 3. No sorting of entire result set

-- Alternative for small groups: Window functions
SELECT customer_id, order_id, total, created_at
FROM (
    SELECT 
        c.customer_id, 
        o.order_id, 
        o.total, 
        o.created_at,
        ROW_NUMBER() OVER (
            PARTITION BY c.customer_id 
            ORDER BY o.created_at DESC
        ) as rn
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE c.status = 'active'
) sub
WHERE rn <= 3;

-- Comparison:
-- LATERAL: Best when few customers, many orders per customer
-- Window functions: Best when many customers, scanning all orders acceptable
-- LATERAL uses index efficiently for each probe; Window scans then filters
```

### 18.1.4 Window Functions vs Self-Joins

```sql
-- Problem: Calculate running total of orders per customer

-- Anti-pattern: Self-join (quadratic growth)
SELECT 
    a.customer_id,
    a.order_id,
    a.created_at,
    SUM(b.total) as running_total
FROM orders a
JOIN orders b ON a.customer_id = b.customer_id 
    AND b.created_at <= a.created_at
GROUP BY a.customer_id, a.order_id, a.created_at;

-- Complexity: O(n²) per customer - joins every row with all earlier rows
-- Becomes impractical beyond a few thousand rows per customer

-- Solution: Window functions (linear complexity)
SELECT 
    customer_id,
    order_id,
    created_at,
    SUM(total) OVER (
        PARTITION BY customer_id 
        ORDER BY created_at
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total
FROM orders;

-- Plan: WindowAgg over a Sort (the Sort disappears if an index on
--   (customer_id, created_at) already delivers rows in order)
-- Complexity: O(n log n) for the sort, then O(n) for the aggregation
-- The advantage over the self-join grows with partition size

-- Advanced window frame: Moving average
SELECT 
    customer_id,
    created_at,
    total,
    AVG(total) OVER (
        PARTITION BY customer_id 
        ORDER BY created_at
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as seven_order_avg
FROM orders;

-- Excluding the current row by ending the frame at 1 PRECEDING:
SELECT
    customer_id,
    order_id,
    SUM(total) OVER (
        PARTITION BY customer_id 
        ORDER BY created_at
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) as total_before_this_order  -- NULL for each customer's first order
FROM orders;
```
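One subtlety when replacing the self-join: its `b.created_at <= a.created_at` condition counts rows with the same timestamp as each other's predecessors, which matches a `RANGE` frame rather than a `ROWS` frame. The sketch below, using hypothetical inline data with two orders on the same date, shows the difference.

```sql
-- Hypothetical inline data: orders 2 and 3 share the same created_at
WITH orders(order_id, created_at, total) AS (
    VALUES (1, DATE '2024-01-01', 10),
           (2, DATE '2024-01-02', 20),
           (3, DATE '2024-01-02', 30)
)
SELECT
    order_id,
    -- ROWS counts physical rows; order_id is added as a tiebreaker for a stable result
    SUM(total) OVER (
        ORDER BY created_at, order_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_total,
    -- RANGE includes all peers of the current row (same created_at)
    SUM(total) OVER (
        ORDER BY created_at
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS range_total
FROM orders
ORDER BY order_id;
-- order_id | rows_total | range_total
--        1 |         10 |          10
--        2 |         30 |          60
--        3 |         60 |          60
```

Choose `RANGE` to reproduce the self-join's results exactly, or `ROWS` with a unique tiebreaker column when each row should get its own cumulative value.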

### 18.1.5 CTE Optimization Strategies

```sql
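-- PostgreSQL 12+ behavior: non-recursive, side-effect-free CTEs referenced once
--   are inlined by default (no optimization fence)
-- PostgreSQL 11 and earlier: CTEs always materialized (optimization fence)

-- Example: filter pushdown through an inlined CTE
--   (on PostgreSQL 11 and earlier, this query materializes the whole day of orders first)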
WITH recent_orders AS (
    SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '1 day'
)
SELECT * FROM recent_orders 
WHERE customer_id = 123;

-- In PostgreSQL 12+, this inlines to:
-- SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '1 day' AND customer_id = 123;
-- Can use index on customer_id efficiently

-- When to force materialization (optimization fence):
WITH monthly_stats AS MATERIALIZED (
    SELECT 
        customer_id, 
        COUNT(*) as order_count,
        SUM(total) as revenue
    FROM orders
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY customer_id
)
SELECT * FROM monthly_stats 
WHERE order_count > 5;

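-- MATERIALIZED computes the aggregate once and stores the result
-- The order_count filter applies to an aggregate, so it cannot be pushed below
--   the GROUP BY in either form; inlined, it behaves like HAVING COUNT(*) > 5
-- Good when: the CTE is referenced more than once, or inlining produces a worse plan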

-- CTE for recursive queries (hierarchies):
WITH RECURSIVE subordinates AS (
    -- Anchor: Direct reports
    SELECT employee_id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id = 123
    
    UNION ALL
    
    -- Recursive: Next level down
    SELECT e.employee_id, e.name, e.manager_id, s.level + 1
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.employee_id
    WHERE s.level < 5  -- Prevent infinite recursion
)
SELECT * FROM subordinates;

-- Recursive CTEs are never inlined; each iteration reads the previous iteration's rows
-- Ensure recursion terminates (cycle detection or depth limit)
-- Index on manager_id critical for performance
```
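A depth limit stops runaway recursion but still returns duplicate rows when the data contains a cycle. A minimal sketch of path-based cycle detection follows; the inline `employees` rows are hypothetical and deliberately form a loop (1 manages 2, 2 manages 3, 3 manages 1).

```sql
WITH RECURSIVE employees(employee_id, manager_id) AS (
    -- Hypothetical data containing a cycle: 1 -> 2 -> 3 -> 1
    VALUES (2, 1), (3, 2), (1, 3)
),
subordinates AS (
    -- Anchor: direct reports of employee 1, starting a path of visited ids
    SELECT employee_id, manager_id, ARRAY[employee_id] AS path
    FROM employees
    WHERE manager_id = 1

    UNION ALL

    -- Recursive step: skip any employee already on the current path
    SELECT e.employee_id, e.manager_id, s.path || e.employee_id
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.employee_id
    WHERE e.employee_id <> ALL (s.path)
)
SELECT employee_id, path FROM subordinates;
-- Returns three rows (employees 2, 3, 1) and then terminates
```

The path array grows with depth, so for deep hierarchies a depth limit remains a useful second safeguard.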

## 18.2 Index Selection Workflow

Creating the right index requires analyzing query patterns, selectivity, and write overhead systematically.

### 18.2.1 Decision Tree for Index Creation

```sql
-- Step 1: Identify candidate queries (slow query log, pg_stat_statements)
--   (columns are named mean_time/total_time before PostgreSQL 13)
SELECT 
    query,
    calls,
    mean_exec_time,
    total_exec_time,
    rows
FROM pg_stat_statements
WHERE query LIKE '%orders%'
ORDER BY total_exec_time DESC
LIMIT 10;

-- Step 2: Analyze query patterns with EXPLAIN
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, total, status 
FROM orders 
WHERE customer_id = 123 
  AND created_at BETWEEN '2024-01-01' AND '2024-01-31';

-- Current plan shows: Seq Scan, high buffers
-- Decision: Needs index

-- Step 3: Determine selectivity of each predicate
SELECT 
    COUNT(DISTINCT customer_id) as unique_customers,
    COUNT(*) as total_orders,
    1.0 / NULLIF(COUNT(DISTINCT customer_id), 0) as avg_customer_selectivity,  -- avg fraction of rows per customer_id
    AVG(CASE WHEN created_at BETWEEN '2024-01-01' AND '2024-01-31' 
        THEN 1.0 ELSE 0.0 END) as date_selectivity
FROM orders;

-- Rule: Place equality columns first in a composite index, range columns after
-- If customer_id = 123 returns 0.1% of rows, and date range returns 10%,
-- Index (customer_id, created_at) reads only that customer's entries within the range;
-- (created_at, customer_id) would walk every entry in the date range

-- Step 4: Check for covering index opportunity
-- Query selects: order_id, total, status
-- Index (customer_id, created_at) enables Index Scan
-- Index (customer_id, created_at) INCLUDE (order_id, total, status) 
--   enables Index-Only Scan (no heap access)

-- Step 5: Validate index effectiveness
CREATE INDEX CONCURRENTLY idx_orders_customer_created_cover 
ON orders(customer_id, created_at) 
INCLUDE (order_id, total, status);

-- Verify usage:
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_id, total, status 
FROM orders 
WHERE customer_id = 123 
  AND created_at BETWEEN '2024-01-01' AND '2024-01-31';

-- Should show: Index Only Scan using idx_orders_customer_created_cover
-- Heap Fetches: 0 when the visibility map is current; a high count means
--   the table needs VACUUM before index-only scans pay off

-- Step 6: Monitor for 48 hours then decide
SELECT 
    indexrelname,
    idx_scan,  -- Should increase if query runs frequently
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_customer_created_cover';

-- If idx_scan = 0 after 48 hours (and no weekly/monthly job or read replica
--   depends on it), the index is unused - drop it
```
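The selectivity query in Step 3 scans the whole table. On large tables, the planner's own statistics in the `pg_stats` view give a cheaper approximation; the sketch below assumes the table lives in the `public` schema and has been analyzed recently.

```sql
-- Read the planner's statistics instead of scanning orders
-- Assumption: schema is 'public' and ANALYZE has run recently
SELECT
    attname,
    n_distinct,   -- positive: estimated distinct count; negative: fraction of row count
    null_frac,    -- fraction of NULL values
    correlation   -- physical vs logical ordering (near 1 or -1 favors range scans)
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'orders'
  AND attname IN ('customer_id', 'created_at');
```

These values are estimates from a sample, so treat them as a first pass and fall back to the exact counts when a decision is close.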

### 18.2.2 Composite Index Strategy

```sql
-- Multi-column index ordering rules:
-- 1. Equality columns (=) before range columns (<, >, BETWEEN)
-- 2. High selectivity before low selectivity (for equality)
-- 3. Columns used for ORDER BY after predicates
-- 4. Include columns for covering after key columns

-- Example: Query patterns on events table
-- Pattern A: WHERE device_id = ? AND event_time BETWEEN ? AND ?
-- Pattern B: WHERE device_id = ? AND event_type = ? AND event_time > ?
-- Pattern C: WHERE event_type = ? ORDER BY event_time

-- Analysis:
-- Pattern A: (device_id, event_time) - device_id equality, event_time range
-- Pattern B: (device_id, event_type, event_time) - two equalities, one range
-- Pattern C: (event_type, event_time) - equality + order by

-- Consolidation strategy:
-- Option 1: Single index for A and B
CREATE INDEX idx_events_device_type_time 
ON events(device_id, event_type, event_time);

-- Pattern A can use only the leading column (device_id) to narrow the scan
--   event_type sits between device_id and event_time, so the event_time range
--   is checked against every event_type entry for that device
-- Pattern B uses all three perfectly

-- Option 2: Separate indexes (higher write cost, better read performance)
CREATE INDEX idx_events_device_time ON events(device_id, event_time);
CREATE INDEX IF NOT EXISTS idx_events_device_type_time ON events(device_id, event_type, event_time);  -- same index as Option 1

-- Option 3: Partial index for Pattern C (if event_type low cardinality)
CREATE INDEX idx_events_type_time ON events(event_type, event_time) 
WHERE event_type IN ('error', 'critical');

-- Decision criteria:
-- If Pattern A frequent and each device has few distinct event_type values: Option 1
-- If Pattern A frequent and event_type has many distinct values: Option 2
-- If Pattern B dominant: Option 1 (optimized for B)
-- If write performance critical: Minimize indexes (Option 1 alone)
```

### 18.2.3 Partial Index Opportunities

```sql
-- High-write tables: Index only "hot" data (recent, active, pending)

-- Example: Orders table with 90% completed, 10% pending/processing
-- Query pattern: Check pending orders frequently
-- Full index on status is wasteful (90% of index rarely used)

-- Solution: Partial index on active statuses only
CREATE INDEX idx_orders_pending ON orders(created_at, priority) 
WHERE status IN ('pending', 'processing', 'on_hold');

-- Benefits:
-- 1. Index size ~10% of full index
-- 2. Faster inserts (completed orders don't touch this index)
-- 3. Better cache locality (frequently accessed data in compact index)

-- Query's WHERE clause must imply the index predicate:
SELECT * FROM orders 
WHERE status = 'pending' AND created_at < NOW() - INTERVAL '1 hour';
-- Uses index (status = 'pending' implies the IN list)

SELECT * FROM orders 
WHERE status IN ('pending', 'processing') AND created_at < NOW() - INTERVAL '1 hour';
-- Uses index (both values fall within the index predicate)

SELECT * FROM orders 
WHERE status = 'completed';
-- Cannot use the partial index ('completed' is excluded from it, as intended)

-- Unique partial indexes for conditional uniqueness:
CREATE UNIQUE INDEX idx_unique_email_active ON users(email) 
WHERE deleted_at IS NULL;
-- Allows: alice@example.com (active), then soft delete, then new alice@example.com
-- Prevents: Two active users with same email
```

## 18.3 Batch Operations and Round-Trip Minimization

Database round-trips are often the bottleneck, not query complexity. Batch operations amortize network latency and transaction overhead.

### 18.3.1 Bulk Insert Strategies

```sql
-- Anti-pattern: Individual inserts in loop (N round trips)
-- Application code:
-- for order in orders:
--     db.execute("INSERT INTO orders (...) VALUES (...)", order)  -- N trips

-- Solution 1: Multi-row VALUES (a few hundred to a few thousand rows per statement;
--   with bind parameters, the protocol caps a statement at 65,535 parameters)
INSERT INTO orders (order_id, customer_id, total, status) 
VALUES 
    (1001, 1, 100.00, 'pending'),
    (1002, 2, 200.00, 'pending'),
    (1003, 3, 150.00, 'completed')
    -- ... more rows, comma-separated
ON CONFLICT (order_id) DO UPDATE 
SET status = EXCLUDED.status, updated_at = NOW();

-- Benefits:
-- 1. Single parse/plan cycle
-- 2. Single network round trip
-- 3. One commit (one WAL flush) instead of one per row
-- 4. Per-statement overhead paid once (foreign key checks still run per row)

-- Solution 2: COPY protocol (fastest for large loads)
-- Application uses COPY FROM STDIN or file
COPY orders (customer_id, total, status) FROM STDIN WITH (FORMAT CSV);
-- Or from a server-side file (requires superuser or the pg_read_server_files role):
COPY orders FROM '/data/orders.csv' WITH (FORMAT CSV, HEADER);

-- Performance: often an order of magnitude faster than individual inserts
-- Skips per-row parse/plan overhead; constraints, indexes, and WAL still apply
-- Row-level triggers still fire per row; statement-level triggers fire once per COPY

-- Solution 3: Unnest for dynamic batching (prepared statement friendly)
INSERT INTO orders (customer_id, total, status)
SELECT * FROM UNNEST(
    $1::int[],      -- customer_ids
    $2::numeric[],  -- totals  
    $3::text[]      -- statuses
) AS t(customer_id, total, status);

-- Application passes arrays:
-- execute(query, [array_of_ids], [array_of_totals], [array_of_statuses])
-- Single round trip, dynamic batch size
```
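`COPY` has no `ON CONFLICT` clause, so a common way to combine its speed with upsert semantics is to load into a temporary staging table first. The following is a sketch under assumptions: the staging table name `orders_staging` is hypothetical, and `orders` is assumed to have `order_id` as its primary key and an `updated_at` column, as in the upsert example above.

```sql
BEGIN;

-- Hypothetical staging table with the same columns as orders; dropped at commit
CREATE TEMP TABLE orders_staging (LIKE orders INCLUDING DEFAULTS) ON COMMIT DROP;

-- Fast load into the staging table (the client streams CSV rows after this statement)
COPY orders_staging (order_id, customer_id, total, status) FROM STDIN WITH (FORMAT CSV);

-- Single set-based upsert from staging into the real table
-- DISTINCT ON keeps one row per order_id; without ORDER BY the kept row is arbitrary
INSERT INTO orders (order_id, customer_id, total, status)
SELECT DISTINCT ON (order_id) order_id, customer_id, total, status
FROM orders_staging
ON CONFLICT (order_id) DO UPDATE
SET status = EXCLUDED.status, updated_at = NOW();

COMMIT;
```

The `DISTINCT ON` guard matters because `ON CONFLICT DO UPDATE` raises an error if the same statement tries to update one target row twice.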

### 18.3.2 Batch Updates with CTID

```sql
-- Problem: Update millions of rows without holding locks for the whole duration
-- Anti-pattern:
UPDATE large_table SET status = 'archived' WHERE created_at < '2023-01-01';
-- Holds row locks in one transaction for minutes/hours, generates massive WAL

-- Solution: Batch updates by CTID (physical row ID)
-- COMMIT inside DO requires PostgreSQL 11+ and must not run inside an outer transaction block
DO $$
DECLARE
    batch_size CONSTANT int := 1000;
    rows_updated int;
BEGIN
    LOOP
        UPDATE large_table 
        SET status = 'archived'
        WHERE ctid IN (
            SELECT ctid 
            FROM large_table 
            WHERE status IS DISTINCT FROM 'archived'  -- also matches NULL status
              AND created_at < '2023-01-01'
            LIMIT batch_size
        );
        
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        
        COMMIT;  -- Release locks every batch
        PERFORM pg_sleep(0.1);  -- Brief pause for concurrent queries
        
        -- Optional: Check replication lag and pause if needed
    END LOOP;
END $$;

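-- CTID advantages:
-- 1. Rows located by physical address (no index lookup for the outer UPDATE)
-- 2. No sorting overhead
-- 3. Minimal locking per batch
-- Note: the inner SELECT still has to find matching rows; an index on created_at helps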

-- Alternative: Primary-key batching (repeat until no rows are updated)
UPDATE large_table 
SET status = 'archived'
WHERE id IN (
    SELECT id 
    FROM large_table 
    WHERE status IS DISTINCT FROM 'archived'  -- also matches NULL status
      AND created_at < '2023-01-01'
    ORDER BY id
    LIMIT 1000
);
-- Slower than CTID (requires index scan) but works with primary keys
-- Safer as a row identifier: a row's CTID changes on UPDATE and after VACUUM FULL
```
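When several workers run the same batch job, or when application traffic is updating the same rows, a batch can stall behind a row another session has locked. One way to avoid that, sketched here against the same hypothetical `large_table`, is to lock each batch with `FOR UPDATE SKIP LOCKED` (available since PostgreSQL 9.5).

```sql
-- One batch: pick up to 1000 unlocked candidate rows, lock them, and update them
WITH batch AS (
    SELECT id
    FROM large_table
    WHERE status IS DISTINCT FROM 'archived'
      AND created_at < '2023-01-01'
    ORDER BY id
    LIMIT 1000
    FOR UPDATE SKIP LOCKED   -- rows locked by other sessions are skipped, not waited on
)
UPDATE large_table t
SET status = 'archived'
FROM batch b
WHERE t.id = b.id
RETURNING t.id;              -- an empty result means nothing was updated in this pass
```

Skipped rows are picked up by a later pass once their locks are released, so the caller should repeat the statement until it returns no rows and no other session is still holding candidates.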

### 18.3.3 Array Operations vs JOINs

```sql
-- Scenario: Check if any of user's tags match campaign tags

-- Anti-pattern: JOIN with DISTINCT (expensive deduplication)
SELECT DISTINCT u.user_id, u.email
FROM users u
JOIN user_tags ut ON u.user_id = ut.user_id
JOIN campaign_tags ct ON ut.tag = ct.tag
WHERE ct.campaign_id = 123;

-- Solution 1: Array overlap (if tags stored as arrays)
SELECT user_id, email 
FROM users 
WHERE tags && ARRAY['postgres', 'database', 'performance'];
-- && = overlap operator
-- Uses GIN index on tags array: CREATE INDEX idx_user_tags ON users USING GIN(tags);

-- Solution 2: EXISTS (stops at the first match per user; LIMIT 1 is redundant)
SELECT u.user_id, u.email
FROM users u
WHERE EXISTS (
    SELECT 1 FROM user_tags ut
    JOIN campaign_tags ct ON ut.tag = ct.tag
    WHERE ut.user_id = u.user_id 
      AND ct.campaign_id = 123
);

-- Solution 3: Integer arrays for many-to-many (space efficient)
-- Store tags as int[] referencing tag_ids instead of text
-- Smaller indexes, faster comparisons

-- Solution 4: Bulk lookup with VALUES
WITH target_tags(tag) AS (
    VALUES ('postgres'), ('database'), ('performance')
)
SELECT DISTINCT u.user_id, u.email
FROM users u
JOIN user_tags ut ON u.user_id = ut.user_id
JOIN target_tags tt ON ut.tag = tt.tag;
-- Efficient for small tag lists (under 100)
```

## 18.4 Eliminating N+1 Queries

The N+1 query pattern occurs when an application queries a set of parent rows and then issues one more query per parent to fetch its children, resulting in N+1 round trips.

### 18.4.1 The N+1 Problem Defined

```sql
-- Application pattern causing N+1:
-- customers = db.query("SELECT * FROM customers LIMIT 100")
-- for customer in customers:
--     orders = db.query("SELECT * FROM orders WHERE customer_id = ?", customer.id)
-- Total queries: 1 + 100 = 101 queries

-- PostgreSQL perspective: 100 separate queries with identical plan:
-- Index Scan using idx_orders_customer on orders
--   Index Cond: (customer_id = $1)
--   Planning Time: 0.5ms * 100 = 50ms overhead
--   Execution Time: 0.2ms * 100 = 20ms
--   Network round trips: 100 * latency (2ms) = 200ms
-- Total: 270ms for simple operation

-- Solution 1: JOIN (single query)
SELECT c.*, o.order_id, o.total, o.status
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id IN (SELECT customer_id FROM customers ORDER BY customer_id LIMIT 100);

-- Returns: one row per (customer, order) pair, plus one row per customer without orders
-- Application must group by customer_id to reconstruct objects
-- Single round trip, but transfers redundant customer data

-- Solution 2: Aggregation (optimal for limited children)
SELECT 
    c.customer_id,
    c.name,
    c.email,
    COALESCE(
        jsonb_agg(
            jsonb_build_object(
                'order_id', o.order_id,
                'total', o.total,
                'status', o.status
            ) ORDER BY o.created_at DESC
        ) FILTER (WHERE o.order_id IS NOT NULL),
        '[]'::jsonb
    ) as orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id IN (SELECT customer_id FROM customers ORDER BY customer_id LIMIT 100)
GROUP BY c.customer_id, c.name, c.email;

-- Returns: Exactly 100 rows
-- Orders embedded as JSONB array
-- Application parses single row into parent + children
-- Network efficient, single query
```
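A third option, which many ORMs use for eager loading, keeps parents and children as separate result sets but fetches all children in one query by passing the parent ids as an array. The sketch below uses a prepared statement named `orders_for_customers` (a hypothetical name) so the array parameter is visible in plain SQL.

```sql
-- Query 1: fetch the parent rows
SELECT customer_id, name, email
FROM customers
ORDER BY customer_id
LIMIT 100;

-- Query 2: fetch all children for those parents in a single round trip
-- orders_for_customers is a hypothetical statement name
PREPARE orders_for_customers(int[]) AS
SELECT order_id, customer_id, total, status
FROM orders
WHERE customer_id = ANY($1)
ORDER BY customer_id, created_at DESC;

-- The application passes the ids collected from Query 1
EXECUTE orders_for_customers(ARRAY[1, 2, 3]);
```

This costs two round trips instead of 101, avoids repeating customer columns on every order row, and leaves grouping children under parents to the application.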

### 18.4.2 LATERAL for Limited Children

```sql
-- Scenario: Get top 5 orders per customer (limit children per parent)

-- Solution: LATERAL with LIMIT (as shown in 18.1.3, expanded here)
SELECT 
    c.customer_id,
    c.name,
    o.order_id,
    o.total,
    o.created_at
FROM customers c
LEFT JOIN LATERAL (
    SELECT order_id, total, created_at
    FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY o.created_at DESC
    LIMIT 5
) o ON true
WHERE c.segment = 'premium';

-- Plan characteristics:
-- Nested Loop Left Join
--   -> Seq Scan on customers (filtered by segment)
--   -> Limit
--         -> Index Scan on orders 
--              Index Cond: (customer_id = c.customer_id)

-- Scalability: roughly O(customers * (log(total_orders) + 5))
-- Efficient even with 10,000 customers (10k index lookups, at most 50k rows fetched)
-- vs JOIN + ROW_NUMBER(), which reads and sorts every order of those customers
```

### 18.4.3 Array Aggregation Pattern

```sql
-- Alternative to JSONB: Array aggregation for scalar children
SELECT 
    c.customer_id,
    c.name,
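    -- FILTER avoids {NULL} arrays for customers without orders (they get NULL instead)
    array_agg(o.order_id ORDER BY o.created_at) FILTER (WHERE o.order_id IS NOT NULL) as order_ids,
    array_agg(o.total ORDER BY o.created_at) FILTER (WHERE o.order_id IS NOT NULL) as totals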
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.created_at > '2024-01-01'
GROUP BY c.customer_id, c.name;

-- Returns parallel arrays:
-- customer_id | name | order_ids    | totals
-- 1           | Alice| {101,102,103}| {100,200,150}

-- Application reconstructs by array index
-- More compact than JSONB for simple data
-- Faster parsing in some languages

-- For fixed number of children (pivot pattern):
SELECT 
    customer_id,
    max(case when rn = 1 then order_id end) as order_1_id,
    max(case when rn = 1 then total end) as order_1_total,
    max(case when rn = 2 then order_id end) as order_2_id,
    max(case when rn = 2 then total end) as order_2_total
FROM (
    SELECT 
        c.customer_id,
        o.order_id,
        o.total,
        row_number() OVER (PARTITION BY c.customer_id ORDER BY o.created_at DESC) as rn
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
) sub
WHERE rn <= 2
GROUP BY customer_id;
-- Returns flat structure (good for CSV export, simple APIs)
```

## 18.5 Slow Endpoint Diagnostics Checklist

Systematic diagnosis prevents guesswork when production queries degrade.

### 18.5.1 Immediate Checks (Under 2 Minutes)

```sql
-- 1. Check for locks blocking query (pg_blocking_pids requires PostgreSQL 9.6+)
--    Also catches row-level lock waits, which have no relation entry to join on in pg_locks
SELECT 
    blocked_activity.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_activity.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_stat_activity blocked_activity
JOIN pg_catalog.pg_stat_activity blocking_activity 
    ON blocking_activity.pid = ANY(pg_blocking_pids(blocked_activity.pid));

-- 2. Check long-running active queries and what they wait on (backend_type requires PostgreSQL 10+)
SELECT 
    pid,
    query,
    backend_type,
    wait_event_type,
    wait_event,
    query_start,
    now() - query_start as duration
FROM pg_stat_activity 
WHERE state = 'active' 
  AND query ILIKE '%your_table%'
  AND now() - query_start > interval '10 seconds';

-- 3. Check for waiting on I/O (buffer cache miss)
-- Run EXPLAIN (ANALYZE, BUFFERS) on slow query
-- If shared read >> shared hit, cache cold or working set too large

-- 4. Check table bloat (affects seq scan speed)
SELECT 
    schemaname,
    relname,
    n_dead_tup,
    n_live_tup,
    round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 2) as dead_pct
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- If dead_pct > 20%, table likely bloated (needs vacuum; reindex if indexes are bloated too)
```

### 18.5.2 Query Plan Analysis

```sql
-- Check for plan regression (different plan than yesterday):
-- Compare current EXPLAIN output to historical plans (if logged)

-- Key indicators of bad plan:
-- 1. Seq Scan on large table with selective filter
--    - Fix: Missing index or statistics stale (ANALYZE)
-- 2. Nested Loop with high loops count (>1000)
--    - Fix: Usually a row-count misestimate; run ANALYZE first. Use SET enable_nestloop = off
--      only in a test session to confirm that a hash join would be faster
-- 3. High "Rows Removed by Filter" relative to rows returned
--    - Fix: Index not selective enough, consider partial index
-- 4. Sort Method: external merge (disk sort)
--    - Fix: Increase work_mem or add ORDER BY matching index
-- 5. High Buffer reads with low row counts (inefficient index)
--    - Fix: Check for implicit casts or function usage on indexed column

-- Quick statistics refresh (if plan looks wrong):
ANALYZE (VERBOSE) orders;  -- Update stats for suspected table
```

### 18.5.3 System-Level Checks

```sql
-- Check connection pool saturation:
SELECT 
    state,
    count(*)
FROM pg_stat_activity
GROUP BY state;
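-- If the total across all states approaches max_connections, connections are exhausted
-- Many 'idle in transaction' sessions point to transactions the application left open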

-- Check index usage (unused indexes waste write performance):
SELECT 
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Check for missing foreign key indexes (cause sequential scans of the child table on parent delete/update):
SELECT
    tc.table_name, 
    kcu.column_name,
    ccu.table_name AS foreign_table_name,
    CASE 
        WHEN EXISTS (
            SELECT 1 FROM pg_indexes 
            WHERE tablename = tc.table_name 
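            -- Heuristic text match: the FK column is the only or the leading index column
            AND (indexdef LIKE '%(' || kcu.column_name || ')%'
                 OR indexdef LIKE '%(' || kcu.column_name || ',%')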
        ) THEN 'Indexed'
        ELSE 'MISSING INDEX - Add immediately'
    END as index_status
FROM 
    information_schema.table_constraints AS tc 
    JOIN information_schema.key_column_usage AS kcu
      ON tc.constraint_name = kcu.constraint_name
    JOIN information_schema.constraint_column_usage AS ccu
      ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';

-- Check replication lag (if on replica):
SELECT 
    extract(epoch from (now() - pg_last_xact_replay_timestamp())) as lag_seconds
WHERE pg_is_in_recovery();
-- If lag > 10 seconds on OLTP system, check WAL generation rate
```

### 18.5.4 Vacuum and Maintenance Status

```sql
-- Check auto-vacuum health:
SELECT 
    relname,
    n_tup_ins,
    n_tup_upd,
    n_tup_del,
    n_live_tup,
    n_dead_tup,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze,
    vacuum_count,
    autovacuum_count
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 10;

-- If last_autovacuum is old and n_dead_tup high:
-- 1. Check if autovacuum is running: SELECT * FROM pg_stat_activity WHERE query LIKE 'autovacuum:%';
-- 2. May need aggressive settings: ALTER TABLE table_name SET (autovacuum_vacuum_scale_factor = 0.01);
-- 3. Or manual: VACUUM ANALYZE table_name;

-- Check for long-running transactions (prevent vacuum progress):
SELECT 
    pid,
    usename,
    application_name,
    state,
    now() - xact_start as xact_duration,
    now() - query_start as query_duration,
    left(query, 100) as query_snippet
FROM pg_stat_activity
WHERE xact_start < now() - interval '5 minutes'
  AND state != 'idle'
ORDER BY xact_start;
-- Long transactions hold back vacuum, causing bloat
```

---

## Chapter Summary

In this chapter, you learned:

1. **Query Refactoring**: Replace `SELECT *` with explicit columns to enable index-only scans. Use `NOT EXISTS` instead of `NOT IN` for anti-joins when the subquery can return NULLs. Prefer window functions over self-joins for running totals (O(n log n) vs O(n²)). Use `LATERAL` joins for top-N-per-group queries rather than correlated subqueries.

2. **Index Selection Workflow**: Identify slow queries via `pg_stat_statements`. Place equality columns before range columns in composite indexes. Use partial indexes for selective filtering on high-churn tables (indexing only "active" rows). Verify effectiveness with `EXPLAIN (ANALYZE, BUFFERS)` and monitor `pg_stat_user_indexes` for usage.

3. **Batch Operations**: Use multi-row `INSERT ... VALUES` (batched, within the 65,535 bind-parameter limit) or `COPY` for bulk loading. Implement CTID-based batching for large updates to avoid long transactions and give vacuum a chance to run between batches. Replace iterative single-row inserts with array `UNNEST` operations to minimize round trips.

4. **N+1 Elimination**: Replace application-layer loops with single queries using `JOIN` + aggregation or `LATERAL` with `LIMIT`. Use `jsonb_agg()` or `array_agg()` to return hierarchical data in flat result sets. `LATERAL` joins provide optimal performance for fetching limited children per parent (top-N pattern).

5. **Diagnostics Checklist**: Check for blocking locks (`pg_locks`), connection pool saturation (`pg_stat_activity` state counts), and plan regression (compare estimates vs actuals in `EXPLAIN`). Verify table bloat (`n_dead_tup` ratio) and missing foreign key indexes. Monitor replication lag and long-running transactions that prevent vacuum progress.

**Next:** In Chapter 19, we will explore Transactions and MVCC, covering isolation levels, row versioning, visibility rules, and the practical implications of PostgreSQL's multi-version concurrency control for application correctness.