# Chapter 17: EXPLAIN Like You Mean It

Reading execution plans is the definitive skill for PostgreSQL performance tuning. This chapter transforms EXPLAIN output from cryptic text into actionable intelligence, covering interpretation methodologies, buffer analysis, anti-pattern detection, and statistical rigor when validating optimizations.

## 17.1 EXPLAIN Fundamentals

PostgreSQL provides multiple EXPLAIN modes that serve different diagnostic purposes. Understanding when to use each format prevents misinterpretation of performance characteristics.

### 17.1.1 EXPLAIN vs EXPLAIN ANALYZE

```sql
-- EXPLAIN (estimates only): Shows planner's cost model predictions
EXPLAIN SELECT * FROM users WHERE user_id = 123;

-- Output:
-- Index Scan using users_pkey on users  (cost=0.29..8.30 rows=1 width=72)
--   Index Cond: (user_id = 123)

-- Key characteristics:
-- - No query execution (fast, safe on production)
-- - Shows estimated costs (arbitrary units) and row counts
-- - Reveals chosen plan without runtime overhead
-- - Cannot detect runtime issues (memory spills, lock contention)

-- EXPLAIN ANALYZE (actual execution): Runs query and compares estimates to reality
EXPLAIN (ANALYZE) SELECT * FROM users WHERE user_id = 123;

-- Output adds:
-- (actual time=0.012..0.013 rows=1 loops=1)
-- Planning Time: 0.150 ms
-- Execution Time: 0.025 ms

-- Critical differences:
-- - Actually executes the query (writes happen, locks acquired)
-- - Shows actual time (milliseconds) and actual row counts
-- - Planning Time: Time to generate and optimize the plan (excludes parsing and rewriting)
-- - Execution Time: Executor runtime, including startup/shutdown and triggers (excludes planning)
-- - loops: How many times the node executed; actual time and rows are per-loop averages

-- DANGER: EXPLAIN ANALYZE executes writes!
EXPLAIN (ANALYZE) DELETE FROM orders WHERE status = 'pending';
-- Actually deletes rows! Use transactions for safety:
BEGIN;
EXPLAIN (ANALYZE) DELETE FROM orders WHERE status = 'pending';
ROLLBACK;

-- EXPLAIN (ANALYZE, BUFFERS): Adds I/O statistics (essential for performance)
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE user_id = 123;

-- Additional output:
-- Buffers: shared hit=3
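-- - shared hit: Block found in PostgreSQL's shared buffer cache (fast)
-- - shared read: Block not in shared buffers; read from the OS (page cache or disk)
-- - shared dirtied: Previously clean block modified by this query
-- - shared written: Previously dirtied block this backend had to write out to evict it from cache
-- - local hit/read: Same counters for temporary tables
-- - temp read/written: Temporary files, e.g. sorts and hashes that exceed work_mem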
```

### 17.1.2 Output Formats

```sql
-- TEXT (default): Human-readable, compact
EXPLAIN (FORMAT TEXT) SELECT * FROM users;

-- JSON (machine-readable, programmatic analysis):
EXPLAIN (FORMAT JSON) SELECT * FROM users;
-- Useful for: Automated plan analysis tools, diffing plans, storage

-- XML (verbose, tool integration):
EXPLAIN (FORMAT XML) SELECT * FROM users;

-- YAML (structured, moderately readable):
EXPLAIN (FORMAT YAML) SELECT * FROM users;

-- Recommended combination for deep analysis:
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON, COSTS, VERBOSE, TIMING)
SELECT * FROM users WHERE email = 'test@example.com';

-- COSTS: Show cost calculations (can disable for cleaner output)
-- VERBOSE: Show additional info (output column lists, schema-qualified table and function names)
-- TIMING: Include actual time (can disable if timing overhead concerns)
-- SETTINGS: Include modified configuration parameters (PostgreSQL 12+)
```
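
Because `FORMAT JSON` output is ordinary JSON, a short script can walk it. The sketch below is illustrative only: the `SAMPLE` plan is hand-written for this example rather than captured from a server, and `iter_nodes` is a hypothetical helper name. It assumes the usual shape of the JSON output, a one-element array whose object holds the root node under `Plan`, with child nodes under `Plans`.

```python
import json

# Hand-written sample in the shape produced by EXPLAIN (FORMAT JSON).
SAMPLE = """
[
  {
    "Plan": {
      "Node Type": "Sort",
      "Total Cost": 105.0,
      "Plan Rows": 100,
      "Plans": [
        {
          "Node Type": "Seq Scan",
          "Relation Name": "users",
          "Total Cost": 15.0,
          "Plan Rows": 1000
        }
      ]
    }
  }
]
"""


def iter_nodes(node, depth=0):
    """Yield (depth, node) for a plan node and all of its descendants."""
    yield depth, node
    for child in node.get("Plans", []):
        yield from iter_nodes(child, depth + 1)


root = json.loads(SAMPLE)[0]["Plan"]
summary = [(depth, n["Node Type"], n["Plan Rows"]) for depth, n in iter_nodes(root)]

# Print an indented outline similar to the TEXT format.
for depth, node_type, rows in summary:
    print("  " * depth + f"{node_type} (rows={rows})")
```

The same traversal works on a plan captured with `ANALYZE`, where each node also carries keys such as `Actual Rows` and `Actual Total Time`.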

## 17.2 Reading Plans from the Bottom Up

Execution plans are trees where data flows from leaf nodes (scans) through intermediate nodes (joins, sorts) to the root. Reading bottom-up reveals the actual execution flow.

### 17.2.1 Node Structure and Indentation

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.email, COUNT(o.order_id)
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active'
  AND o.created_at > '2024-01-01'
GROUP BY u.email
ORDER BY COUNT(o.order_id) DESC
LIMIT 10;

-- Typical plan structure:
-- Limit  (cost=100.00..110.00 rows=10 width=36) (actual time=5.234..5.240 rows=10 loops=1)
--   -> Sort  (cost=100.00..105.00 rows=100 width=36) (actual time=5.230..5.232 rows=10 loops=1)
--         Sort Key: (count(o.order_id)) DESC
--         Sort Method: top-N heapsort  Memory: 26kB
--         -> HashAggregate  (cost=50.00..60.00 rows=100 width=36) (actual time=4.800..4.900 rows=150 loops=1)
--               Group Key: u.email
--               Batches: 1  Memory Usage: 40kB
--               -> Hash Join  (cost=20.00..40.00 rows=500 width=72) (actual time=0.500..3.200 rows=1000 loops=1)
--                     Hash Cond: (o.user_id = u.user_id)
--                     -> Seq Scan on orders o  (cost=0.00..15.00 rows=500 width=16) (actual time=0.100..2.000 rows=5000 loops=1)
--                           Filter: (created_at > '2024-01-01')
--                           Rows Removed by Filter: 45000
--                     -> Hash  (cost=15.00..15.00 rows=1000 width=72) (actual time=0.200..0.200 rows=1000 loops=1)
--                           Buckets: 1024  Batches: 1  Memory Usage: 50kB
--                           -> Seq Scan on users u  (cost=0.00..15.00 rows=1000 width=72) (actual time=0.050..0.150 rows=1000 loops=1)
--                                 Filter: (status = 'active')
--                                 Rows Removed by Filter: 500

-- Reading methodology:
-- 1. Start at bottom: Seq Scan on users (line 15)
--    - Reads users table, filters for status='active'
--    - Returns 1000 rows (actual), estimated 1000 (rows=1000)
--    - Cost: 0.00..15.00 (startup..total)
--    - Time: 0.050ms to first row, 0.150ms total
--    - Filter removed 500 rows (1500 total, 1000 passed)

-- 2. Next up: Hash (line 13)
--    - Builds hash table from users scan
--    - Memory: 50kB (fits in work_mem)
--    - Buckets: 1024 (hash table size)

-- 3. Next: Seq Scan on orders (line 10)
--    - Reads orders table
--    - Filter: created_at > '2024-01-01'
--    - Returns 5000 rows and discards 45000 (the filter keeps 10% of the table)
--    - Estimated 500 rows but returned 5000: a 10x underestimate worth investigating
--    - This is the "outer" table for the hash join

-- 4. Hash Join (line 8)
--    - Joins orders (outer) with users hash (inner)
--    - Hash Cond: user_id match
--    - Returns 1000 rows (actual)
--    - loops=1 (executed once)

-- 5. HashAggregate (line 5)
--    - Groups by email, counts orders
--    - Memory: 40kB (hash table for groups)
--    - Returns 150 groups

-- 6. Sort (line 2)
--    - Sorts by count DESC
--    - Method: top-N heapsort (efficient for LIMIT)
--    - Only sorts enough to get top 10, not full result

-- 7. Limit (root, line 1)
--    - Stops after 10 rows
--    - Root node finished at 5.240ms (5.234ms is the time to its first row)

-- Indentation meaning:
-- -> indicates child-parent relationship
-- Nodes at same indentation level are siblings
-- Data flows upward (child to parent)
```
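
Reading bottom-up also shows where the time goes: a node's reported time includes its children, so subtracting the children's time gives the node's own share. The sketch below applies that to the timings of the example plan above. The dictionaries are typed in by hand using EXPLAIN's JSON key names, and `node`, `label`, and `exclusive_times` are hypothetical helpers. The subtraction is an approximation that ignores parallel workers, CTEs, and InitPlans.

```python
def node(node_type, total_ms, loops=1, relation=None, children=()):
    """Build a plan node using the key names of EXPLAIN (FORMAT JSON)."""
    return {
        "Node Type": node_type,
        "Relation Name": relation,
        "Actual Total Time": total_ms,
        "Actual Loops": loops,
        "Plans": list(children),
    }


# Timings copied by hand from the example plan in this section.
PLAN = node("Limit", 5.240, children=[
    node("Sort", 5.232, children=[
        node("HashAggregate", 4.900, children=[
            node("Hash Join", 3.200, children=[
                node("Seq Scan", 2.000, relation="orders"),
                node("Hash", 0.200, children=[
                    node("Seq Scan", 0.150, relation="users"),
                ]),
            ]),
        ]),
    ]),
])


def label(n):
    rel = n["Relation Name"]
    return f'{n["Node Type"]} on {rel}' if rel else n["Node Type"]


def inclusive_ms(n):
    # Actual Total Time is a per-loop average, so scale it by the loop count.
    return n["Actual Total Time"] * n["Actual Loops"]


def exclusive_times(n, out=None):
    """Post-order walk: children are processed before their parent."""
    out = [] if out is None else out
    for child in n["Plans"]:
        exclusive_times(child, out)
    own = inclusive_ms(n) - sum(inclusive_ms(c) for c in n["Plans"])
    out.append((label(n), round(own, 3)))
    return out


times = exclusive_times(PLAN)
for name, ms in sorted(times, key=lambda t: -t[1]):
    print(f"{ms:7.3f} ms  {name}")
```

Sorted by own time, the scan on `orders` and the `HashAggregate` come out on top, which is where tuning effort for this plan would start.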

### 17.2.2 Understanding Cost Components

```sql
-- Cost format: (startup_cost..total_cost)
-- startup_cost: Cost to produce first row (e.g., sort must complete first)
-- total_cost: Cost to produce all rows

-- Example where startup matters:
EXPLAIN (ANALYZE)
SELECT * FROM orders ORDER BY total DESC;

-- Sort node:
-- (cost=15000.00..17500.00 rows=100000 width=72)
-- startup=15000: Must read and sort all rows before returning first
-- total=17500: Sorting cost + scanning cost

-- Contrast with Index Scan:
-- Index Scan using idx_orders_total (cost=0.29..3000.00 rows=100000 width=72)
-- startup=0.29: Can return first row immediately (index is sorted)
-- total=3000: Just the scan cost, no sort penalty

-- Width: Estimated bytes per row
-- Important for memory calculations (work_mem usage)
-- Wide rows (TOASTed data) consume more memory during sorts/hashes

-- Loops: Number of executions
-- Critical for nested loops (outer row count = loops)
EXPLAIN (ANALYZE)
SELECT * FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active';

-- Nested Loop (loops=1)
--   -> Index Scan on users (loops=1, returns 100 rows)
--   -> Index Scan on orders (loops=100, rows=5 per loop)
-- Actual rows and time are per-loop averages, so multiply by loops:
-- total orders rows = 100 loops * 5 rows = 500 rows
-- A high loops count on the inner side means the outer side is driving many iterations
```

## 17.3 Buffer Analysis and I/O Patterns

The BUFFERS option reveals physical I/O patterns, distinguishing memory-resident queries from disk-bound performance killers.

### 17.3.1 Interpreting Buffer Metrics

```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM large_table WHERE category_id = 5;

-- Output:
-- Seq Scan on large_table  (cost=0.00..15406.00 rows=50000 width=72)
--   (actual time=0.015..125.432 rows=50000 loops=1)
--   Filter: (category_id = 5)
--   Rows Removed by Filter: 50000
--   Buffers: shared hit=200 read=10834

-- Analysis:
-- shared read=10834: Read 10,834 pages from disk (slow)
-- shared hit=200: Found 200 pages in shared buffer cache (fast)
-- Total pages: 11,034 pages scanned
-- If table is 11,034 pages, this is a full table scan (expected for seq scan)

-- I/O time estimation:
-- shared read * 8KB = bytes read from disk
-- 10834 * 8KB = ~84MB read
-- On SSD (200MB/s): ~420ms expected I/O time
-- Actual time of 125ms is well below that: many "reads" likely came from the OS page cache, not the disk

-- Bad pattern: High shared read under a high-loops index scan (random I/O)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table WHERE id IN (SELECT id FROM small_table);

-- Nested Loop
--   -> Seq Scan on small_table
--   -> Index Scan using large_table_pkey
--        Buffers: shared read=1000000  -- Random I/O disaster
-- If outer table has 100k rows, that's 100k random lookups (cache misses)

-- Good pattern: Sequential I/O
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table WHERE created_at > '2024-01-01';

-- Seq Scan
--   Buffers: shared read=10000
-- Sequential read of 10k pages is much faster than 10k random reads
-- SSD throughput makes seq scan viable even for moderate selectivity
```
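
The arithmetic in the analysis above can be written down as a small calculator. This is a sketch under stated assumptions: the default 8 kB block size, throughput expressed in MiB/s, and hypothetical function names (`read_mib`, `io_seconds`, `hit_ratio`). The inputs are the buffer counts from the `large_table` example.

```python
BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes


def read_mib(pages, block_size=BLOCK_SIZE):
    """Volume read for a given number of blocks, in MiB."""
    return pages * block_size / (1024 * 1024)


def io_seconds(pages, throughput_mib_s, block_size=BLOCK_SIZE):
    """Expected transfer time if every block really came from storage."""
    return read_mib(pages, block_size) / throughput_mib_s


def hit_ratio(shared_hit, shared_read):
    """Fraction of touched blocks that were already in shared buffers."""
    total = shared_hit + shared_read
    return shared_hit / total if total else 0.0


mib = read_mib(10834)
seconds = io_seconds(10834, 200)
ratio = hit_ratio(200, 10834)

print(f"read volume: {mib:.1f} MiB")
print(f"expected I/O time at 200 MiB/s: {seconds * 1000:.0f} ms")
print(f"shared buffer hit ratio: {ratio:.1%}")
```

A measured runtime far below the expected I/O time is a hint that the operating system's page cache served most of the blocks counted as `read`.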

### 17.3.2 Work Memory and Disk Spills

```sql
-- Detecting work_mem spills (slow disk-based operations)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
ORDER BY customer_id, created_at;
-- (No LIMIT here: with a LIMIT the Sort would use a bounded top-N heapsort instead)

-- Sort node indicators:
-- Sort Method: external merge  Disk: 5000kB
-- OR
-- Sort Method: quicksort  Memory: 4096kB

-- Analysis:
-- "external merge" = Spilled to disk (work_mem exceeded)
-- "quicksort" = In-memory sort (fast)
-- "top-N heapsort" = In-memory optimized for LIMIT

-- Hash operations spills:
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;

-- HashAggregate (spill details are reported in PostgreSQL 13+)
--   Batches: 5  Memory Usage: 4145kB  Disk Usage: 2048kB
-- Batches > 1 indicates the hash table didn't fit in memory (work_mem * hash_mem_multiplier)
-- Disk Usage shows temp files written

-- Fix: Increase work_mem (session or globally)
SET work_mem = '128MB';  -- Per-operation limit
-- Or optimize query (better indexing, reduce GROUP BY cardinality)

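-- Caution: work_mem applies per sort/hash operation, not per connection
-- 100 connections * 128MB = 12.8GB if each runs just one such operation;
-- a query with several sort/hash nodes or parallel workers can use multiples of that
-- Set conservatively (default 4MB), increase only for specific queries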
```
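
The caution about memory budgeting is easy to turn into a rough upper bound. The sketch below is a back-of-the-envelope estimate, not a model of PostgreSQL's allocator: `worst_case_mb` is a hypothetical helper, and it ignores `hash_mem_multiplier`, parallel workers, and the fact that most operations use far less than their limit.

```python
def worst_case_mb(connections, work_mem_mb, nodes_per_query=1):
    """Upper bound if every connection runs nodes_per_query sort/hash
    operations at once and each one uses its full work_mem allowance."""
    return connections * nodes_per_query * work_mem_mb


single = worst_case_mb(100, 128)      # one sort or hash per query
triple = worst_case_mb(100, 128, 3)   # three sort/hash nodes per query
default = worst_case_mb(100, 4, 3)    # same workload at the 4MB default

for name, mb in [("128MB, 1 node", single),
                 ("128MB, 3 nodes", triple),
                 ("4MB, 3 nodes", default)]:
    print(f"{name}: {mb} MB ({mb / 1000:.1f} GB)")
```

Raising `work_mem` for one session or transaction keeps the multiplier at a single connection instead of all of them.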

## 17.4 Common Anti-Patterns and Solutions

Execution plans reveal fundamental query flaws that prevent index usage or cause excessive resource consumption.

### 17.4.1 Functions on Indexed Columns

```sql
-- Anti-pattern: Function prevents index usage
EXPLAIN (ANALYZE)
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Seq Scan on users
--   Filter: (lower(email) = 'alice@example.com')
--   Rows Removed by Filter: 99999

-- Problem: Function on column prevents B-tree index usage
-- Index stores original values, not LOWER(values)
-- Planner must scan every row and apply function

-- Solution 1: Expression index (the function must be immutable)
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Now query uses Index Scan

-- Solution 2: Case-insensitive data type (if comparisons are always case-insensitive)
-- Use the citext extension:
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE citext;
-- Now standard equality works case-insensitively with B-tree index

-- Solution 3: Normalize on write (application layer)
-- Store email_lower column, index that, query that column

-- Date function anti-pattern:
EXPLAIN SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2024;

-- Seq Scan with Filter: (EXTRACT(year FROM created_at) = 2024)
-- Cannot use index on created_at

-- Solution: Range query (SARGable)
EXPLAIN SELECT * FROM orders 
WHERE created_at >= '2024-01-01' 
  AND created_at < '2025-01-01';
-- Index Scan using idx_orders_created_at

-- Timestamp truncation anti-pattern:
EXPLAIN SELECT * FROM events 
WHERE DATE_TRUNC('day', event_time) = '2024-01-01';

-- Solution: Range query
EXPLAIN SELECT * FROM events
WHERE event_time >= '2024-01-01'
  AND event_time < '2024-01-02';
```

### 17.4.2 Implicit Type Conversion

```sql
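-- Anti-pattern: A type mismatch that forces a cast onto the column
EXPLAIN SELECT * FROM users WHERE user_id = '123';
-- user_id is BIGINT; the quoted literal '123' starts out as type "unknown"

-- This particular case is harmless. The plan shows:
-- Index Cond: (user_id = '123'::bigint)
-- The literal is coerced to bigint, so the index is still used

-- The problem appears with a typed value that cannot be narrowed to the column type:
EXPLAIN SELECT * FROM users WHERE user_id = 123.0;
-- Filter: ((user_id)::numeric = 123.0)  -- Cast on the column: cannot use index!
-- A cast on the column defeats the index just like a function call does

-- Solution: Make the value match the column type
EXPLAIN SELECT * FROM users WHERE user_id = 123::BIGINT;
-- Or ensure the application binds parameters with the correct type

-- Common type mismatch scenarios:
-- 1. Integer IDs bound as numeric or floating point (JavaScript numbers)
-- 2. UUIDs bound as typed text parameters (fails: operator does not exist: uuid = text)
-- 3. Timestamps bound as typed text parameters (same kind of error; add an explicit cast)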

-- JSONB type pitfalls:
EXPLAIN SELECT * FROM events WHERE payload->>'user_id' = 123;
-- payload->>'user_id' returns text; comparing it to integer 123 fails:
-- ERROR:  operator does not exist: text = integer
-- PostgreSQL does not implicitly cast text to integer

-- Solution: Compare to a text literal
EXPLAIN SELECT * FROM events WHERE payload->>'user_id' = '123';
-- Index it with an expression index on (payload->>'user_id'),
-- or use containment, which a GIN index on payload supports:
-- WHERE payload @> '{"user_id": 123}'
```

### 17.4.3 Leading Wildcards and Pattern Matching

```sql
-- Anti-pattern: Leading wildcard prevents index usage
EXPLAIN SELECT * FROM users WHERE email LIKE '%@example.com';

-- Seq Scan
-- Filter: (email ~~ '%@example.com'::text)

-- B-tree indexes support only prefix patterns such as 'alice@%'
-- (and only with the C collation or a text_pattern_ops index); a leading wildcard forces a full scan

-- Solution 1: Trigram index (GIN or GiST)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING GIN(email gin_trgm_ops);

-- Now LIKE '%@example.com' uses Bitmap Index Scan

-- Solution 2: Reversed-string expression index (for suffix matching)
CREATE INDEX idx_users_email_reverse ON users (REVERSE(email) text_pattern_ops);
-- Query: WHERE REVERSE(email) LIKE REVERSE('%@example.com')  -- pattern evaluates to 'moc.elpmaxe@%'

-- Solution 3: Full-text search (for word matching)
-- Use tsvector/tsquery instead of LIKE for word-level matching
```

### 17.4.4 OR Conditions and Union Alternatives

```sql
-- Anti-pattern: OR across different columns when one side has no usable index
EXPLAIN SELECT * FROM users 
WHERE email = 'test@example.com' 
   OR phone = '555-1234';

-- Seq Scan
-- Filter: ((email = 'test@example.com') OR (phone = '555-1234'))

-- A single B-tree index cannot answer an OR across two columns
-- With an index on each column the planner can combine them with a BitmapOr;
-- if either column lacks an index (or matches many rows), it falls back to a Seq Scan

-- Solution 1: Rewrite as UNION (can use two indexes)
EXPLAIN
SELECT * FROM users WHERE email = 'test@example.com'
UNION
SELECT * FROM users WHERE phone = '555-1234';

-- Plan:
-- HashAggregate  (UNION removes duplicates; Sort + Unique is also possible)
--   -> Append
--         -> Index Scan using idx_users_email
--         -> Index Scan using idx_users_phone

-- Solution 2: Create composite index (if OR is actually AND)
-- If query should be AND not OR:
CREATE INDEX idx_users_email_phone ON users(email, phone);
-- Supports: WHERE email = 'x' AND phone = 'y'

-- Solution 3: Use pg_trgm for multi-column text search
-- If email and phone are both text, single GIN index on expression

-- OR across columns with very different selectivity:
SELECT * FROM orders 
WHERE customer_id = 123 OR status = 'urgent';
-- If customer_id is selective and status is not, the status branch dominates the cost
-- and the planner may fall back to a Seq Scan
-- Consider the UNION approach or a partial index on the rare status values
```

### 17.4.5 OFFSET Pagination Performance Cliff

```sql
-- Anti-pattern: OFFSET for pagination (linear slowdown)
EXPLAIN (ANALYZE)
SELECT * FROM orders 
ORDER BY created_at DESC 
LIMIT 10 OFFSET 10000;

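-- Limit  (cost=9025.00..9025.03 rows=10 width=72)
--   -> Sort  (cost=9000.00..9250.00 rows=100000 width=72)
--         Sort Key: created_at DESC
--         -> Seq Scan on orders  (cost=0.00..2500.00 rows=100000 width=72)

-- Analysis:
-- Without an index on created_at, every page request scans the whole table,
-- keeps the top 10,010 rows in the sort, and discards the first 10,000
-- With an index, the scan still walks and throws away 10,000 entries
-- Time increases linearly with OFFSET
-- At OFFSET 1,000,000, the query processes 1M+ rows to return 10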

-- Solution: Keyset pagination (seek method)
EXPLAIN (ANALYZE)
SELECT * FROM orders 
WHERE created_at < '2024-01-15 10:00:00'  -- Last seen value
ORDER BY created_at DESC 
LIMIT 10;

-- Index Scan using idx_orders_created
-- Index Cond: (created_at < '2024-01-15 10:00:00')
-- Cost depends on the page size, not on the page number

-- Implementation requires:
-- 1. Unique sort key (or composite: ORDER BY created_at DESC, id DESC
--    with WHERE (created_at, id) < (last_created_at, last_id))
-- 2. Store last seen values from previous page
-- 3. Cannot jump to arbitrary page number (sequential navigation only)
```
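
The composite-key variant of keyset pagination can be tried end to end without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine, which is an assumption of this example and not part of the chapter's setup. The row-value comparison `(created_at, id) < (?, ?)` is also valid PostgreSQL syntax. The table contents, the index name `idx_orders_created_id`, and the `fetch_page` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
# 25 rows; pairs of rows share a timestamp, so created_at alone is not unique.
rows = [(i, f"2024-01-{(i + 1) // 2:02d} 10:00:00") for i in range(1, 26)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
conn.execute("CREATE INDEX idx_orders_created_id ON orders (created_at, id)")

PAGE_SIZE = 10


def fetch_page(last_seen=None):
    """Return the page after last_seen, a (created_at, id) tuple, or the first page."""
    if last_seen is None:
        sql = ("SELECT id, created_at FROM orders "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        params = (PAGE_SIZE,)
    else:
        sql = ("SELECT id, created_at FROM orders "
               "WHERE (created_at, id) < (?, ?) "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        params = (*last_seen, PAGE_SIZE)
    return conn.execute(sql, params).fetchall()


pages = []
cursor = None
while True:
    page = fetch_page(cursor)
    if not page:
        break
    pages.append(page)
    last_id, last_created = page[-1]
    cursor = (last_created, last_id)  # remember the last row seen

all_ids = [row[0] for page in pages for row in page]
print([len(p) for p in pages], all_ids[:3], all_ids[-3:])
```

Including `id` as a tiebreaker is what keeps rows with identical timestamps from being skipped or repeated at page boundaries.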

## 17.5 Parameter Sniffing and Plan Stability

PostgreSQL caches execution plans for prepared statements (and for statements inside PL/pgSQL functions), which can cause performance regressions when data distribution varies significantly across parameter values.

### 17.5.1 Generic vs Custom Plans

```sql
-- Prepared statements and plan caching:
PREPARE get_orders_by_status (TEXT) AS
SELECT * FROM orders WHERE status = $1;

-- First 5 executions: Custom plans, each planned for the actual parameter value
-- Then PostgreSQL builds a generic (parameter-independent) plan and compares its
-- estimated cost with the average estimated cost of the custom plans
-- If the generic plan is not much more expensive, it is reused from then on;
-- otherwise PostgreSQL keeps planning a custom plan per execution

-- Check which plan is being used:
EXPLAIN EXECUTE get_orders_by_status('pending');
-- vs
EXPLAIN EXECUTE get_orders_by_status('completed');

-- Problem: 'pending' might be 10 rows, 'completed' might be 1M rows
-- Plan optimized for 10 rows (nested loop) is disaster for 1M rows

-- Solution 1: Force custom plans
SET plan_cache_mode = 'force_custom_plan';
-- Generates new plan for each execution
-- Cost: Planning overhead on every execution (negligible for long analytic queries, noticeable for very short OLTP queries)

-- Solution 2: Partitioning (separate plans per partition)
-- If orders partitioned by status, each partition has own statistics

-- Solution 3: Query structure that works for both
-- Use hash join instead of nested loop (robust for both small and large inputs)
-- Discourage nested loops with SET enable_nestloop = off (last resort; a planner setting, not a hint)
```

### 17.5.2 Statistics and Correlation

```sql
-- Plan changes when statistics become stale:
-- Table grows from 10k to 10M rows, but stats still show 10k
EXPLAIN SELECT * FROM users WHERE created_at > NOW() - INTERVAL '1 day';
-- May show Seq Scan (thinks table is small)
-- Actually should be Index Scan (recent data is small % of large table)

-- Detecting statistics issues:
EXPLAIN (ANALYZE)
SELECT * FROM large_table WHERE rare_column = 'unique_value';
-- Index Scan (cost=0.29..8.30 rows=1 width=72)
-- (actual time=0.010..150.000 rows=50000 loops=1)
-- ^^^ Estimated 1 row, got 50,000 = statistics are wrong!

-- Fix:
ANALYZE large_table;
-- Or for specific columns:
ANALYZE large_table (rare_column);

-- Extended statistics for correlated columns:
CREATE STATISTICS stats_orders_status_date ON status, created_at FROM orders;
ANALYZE orders;
-- Helps planner understand that 'pending' orders are recent (correlated)
-- Without this, assumes independence (multiplies selectivities)
```
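
The independence assumption mentioned above is plain multiplication, which makes the size of the resulting error easy to see. The numbers in this sketch are hypothetical (a one-million-row table, 1% pending orders, 5% recent orders), and `estimated_rows` is an illustrative helper, not the planner's actual code.

```python
def estimated_rows(total_rows, *selectivities):
    """Row estimate when all predicates are assumed to be independent."""
    estimate = float(total_rows)
    for selectivity in selectivities:
        estimate *= selectivity
    return estimate


TOTAL = 1_000_000
SEL_PENDING = 0.01   # hypothetical: 1% of orders are pending
SEL_RECENT = 0.05    # hypothetical: 5% of orders are recent

# Planner-style estimate: multiply the two selectivities.
independent = estimated_rows(TOTAL, SEL_PENDING, SEL_RECENT)

# Fully correlated case: every pending order is also recent, so the
# combined selectivity is just that of the rarer predicate.
correlated = estimated_rows(TOTAL, min(SEL_PENDING, SEL_RECENT))

factor = correlated / independent
print(f"independent estimate: {independent:.0f} rows")
print(f"correlated reality:   {correlated:.0f} rows")
print(f"underestimate factor: {factor:.0f}x")
```

An underestimate of this size is often enough to tip the planner toward a nested loop that is sized for hundreds of rows but ends up processing thousands.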

## 17.6 Measuring Improvements Correctly

Validating optimizations requires statistical rigor to distinguish genuine improvements from measurement noise.

### 17.6.1 Timing Methodology

```sql
-- Bad practice: Measuring once
EXPLAIN (ANALYZE) SELECT ...;  -- 5ms
-- Add index
EXPLAIN (ANALYZE) SELECT ...;  -- 3ms
-- Conclusion: "40% faster!" (maybe)

-- Good practice: Multiple samples with cache warming
-- 1. Run query 5 times to warm caches
-- 2. Run EXPLAIN (ANALYZE, TIMING) 10 times
-- 3. Treat the first run separately (often slower: cold buffers and catalog caches)
-- 4. Compare medians rather than means so a single outlier does not skew the result

-- Better: Use pgbench or custom script for load testing
-- Single query timing != production performance under concurrency

-- Buffer counts are more stable than timing:
-- Compare Buffers: shared hit + shared read (total pages touched)
-- If pages touched drop from 10000 to 10, improvement is real regardless of timing noise
-- (the hit/read split still depends on what happens to be cached)

-- EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) for programmatic analysis:
-- Extract "Actual Total Time", "Actual Rows", "Shared Hit Blocks", "Shared Read Blocks"
-- Calculate ratios: hit / (hit + read) = cache hit ratio
```
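
Summarizing repeated runs takes only a few lines. The sketch below uses made-up timings, each list containing one slow outlier, to show why the median is a safer summary than the mean. `summarize` and the sample values are hypothetical; in practice the samples would be the `Execution Time` values collected from repeated `EXPLAIN (ANALYZE)` runs.

```python
import statistics


def summarize(samples_ms):
    """Median, mean, and range of a list of execution times in milliseconds."""
    ordered = sorted(samples_ms)
    return {
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
    }


# Hypothetical timings (ms) from ten warm-cache runs before and after a change.
before = [5.1, 4.9, 5.3, 9.8, 5.0, 5.2, 4.8, 5.1, 5.0, 5.2]
after = [3.1, 3.0, 3.3, 2.9, 3.2, 7.5, 3.0, 3.1, 3.2, 3.0]

b, a = summarize(before), summarize(after)
improvement = 1 - a["median"] / b["median"]

print(f"before: median {b['median']:.2f} ms, mean {b['mean']:.2f} ms")
print(f"after:  median {a['median']:.2f} ms, mean {a['mean']:.2f} ms")
print(f"median improvement: {improvement:.1%}")
```

If the ranges of the two sample sets overlap heavily, more runs are needed before claiming an improvement.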

### 17.6.2 Common Measurement Pitfalls

```sql
-- Pitfall 1: Cold cache vs warm cache
-- First run after restart: All shared read (slow)
-- Second run: All shared hit (fast)
-- Always specify which you're measuring:
-- "Cold cache performance" = after restarting PostgreSQL and dropping the OS page cache
-- "Warm cache performance" = after running query 3+ times

-- Pitfall 2: Planning time dominates
EXPLAIN (ANALYZE) SELECT * FROM complex_view WHERE id = 1;
-- Planning Time: 45.000 ms  -- Complex view expansion
-- Execution Time: 0.500 ms   -- Fast execution
-- An "optimization" that adds 1ms to execution but cuts planning by 40ms is a net win

-- Pitfall 3: Timing overhead of EXPLAIN ANALYZE itself
-- EXPLAIN ANALYZE adds instrumentation overhead (especially TIMING)
-- For very fast queries (<0.1ms), overhead can exceed actual execution time
-- Use EXPLAIN (ANALYZE, TIMING OFF) to reduce overhead (still counts rows, buffers)

-- Pitfall 4: Concurrency effects
-- Single-user EXPLAIN != multi-user performance
-- Lock contention, buffer cache eviction, I/O saturation not visible in isolation
-- Always test under realistic concurrency (pgbench, pgbench-tools, custom scripts)

-- Pitfall 5: Different data volumes
-- EXPLAIN on dev (10k rows) != production (10M rows)
-- Plan types change (nested loop -> hash join) at different scales
-- Use production-like data volumes for valid comparisons
```

## 17.7 Visual Plan Analysis Tools

While text EXPLAIN is essential, visualization tools reveal plan structure and bottlenecks more intuitively.

### 17.7.1 Using explain.depesz.com

```sql
-- Generate plan with all options:
EXPLAIN (ANALYZE, BUFFERS, COSTS, VERBOSE, FORMAT JSON)
YOUR_QUERY_HERE;

-- Copy JSON output to https://explain.depesz.com
-- Features:
-- 1. Hierarchical tree view with color coding
-- 2. "Rows x" column showing estimated vs actual row ratios
--    - Red: Underestimated (bad for nested loop outer tables)
--    - Blue: Overestimated (bad for memory allocation)
-- 3. Exclusive vs inclusive time breakdown
-- 4. Buffer usage visualization

-- Interpretation:
-- Look for "fat" nodes (high exclusive time)
-- Look for red rows (underestimation indicates statistics issues)
```

### 17.7.2 explain.dalibo.com (PEV2)

```sql
-- Alternative: https://explain.dalibo.com (PEV2: PostgreSQL Explain Visualizer)
-- Paste JSON plan output
-- Features:
-- 1. Flame graph style visualization
-- 2. I/O highlighting (high buffer usage nodes)
-- 3. Row estimation error highlighting
-- 4. Shareable URLs for team collaboration

-- Both tools support:
-- - Comparing before/after plans (side-by-side)
-- - Highlighting nodes with high "loops" counts (nested loop issues)
-- - Identifying sequential scans on large tables
```

## 17.8 Advanced Plan Diagnostics

### 17.8.1 Partition Pruning Verification

```sql
-- Check if partition pruning is working:
EXPLAIN (ANALYZE, VERBOSE)
SELECT * FROM events 
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31';

-- Output should show only the January partition:
-- Index Scan using events_2024_01_pkey on events_2024_01
--   Index Cond: ((event_date >= '2024-01-01') AND (event_date <= '2024-01-31'))
-- (PostgreSQL 12+ omits the Append node when a single partition remains)

-- If other partitions such as events_2023_12 or events_2024_02 appear, pruning failed
-- Check: enable_partition_pruning = on, partition bounds correct,
-- and the WHERE clause compares the partition key directly to constants or parameters
```

### 17.8.2 Parallel Query Plans

```sql
-- Check parallel execution:
EXPLAIN (ANALYZE, VERBOSE)
SELECT COUNT(*) FROM large_table WHERE amount > 100;

-- Finalize Aggregate
--   -> Gather  (cost=1000.00..5000.00 rows=2 width=8)
--         Workers Planned: 2
--         Workers Launched: 2
--         -> Partial Aggregate
--               -> Parallel Seq Scan on large_table
--                     Filter: (amount > 100)
--                     Rows Removed by Filter: 300000

-- Key indicators:
-- Workers Launched: Number of workers that actually started
-- If Workers Planned > Workers Launched: the worker pool was exhausted
--   (max_parallel_workers or max_worker_processes too low for the concurrent load)
-- Parallel Seq Scan / Parallel Index Scan / Parallel Bitmap Heap Scan

-- Limits of parallelism:
-- Hash joins and nested loops can both run inside parallel plans
--   (only hash joins can share their inner side across workers: Parallel Hash)
-- Sorts run per worker and are combined by Gather Merge; there is no "Parallel Sort" node
-- Cursors disable parallelism
```

---

## Chapter Summary

In this chapter, you learned:

1. **EXPLAIN Modes**: `EXPLAIN` shows planner estimates (no execution); `EXPLAIN ANALYZE` executes and shows actuals; `EXPLAIN (ANALYZE, BUFFERS)` adds I/O statistics essential for performance analysis. Always use `BUFFERS` for disk I/O visibility.

2. **Plan Reading**: Read bottom-up (leaf scans to root). Indentation indicates parent-child relationships. `actual time=X..Y` shows first-row and total time per loop; `loops=N` indicates iteration count (critical for nested loops); comparing the estimated `rows=N` with the actual `rows=M` reveals estimation errors.

3. **Buffer Analysis**: `shared hit` = cache reads (fast); `shared read` = disk reads (slow). High `shared read` with index scans indicates cache misses or random I/O. `external merge` or `Batches: 2+` indicates `work_mem` spills to disk (performance killer).

4. **Anti-Patterns**: Functions on columns prevent index usage (use expression indexes or rewrite queries); type mismatches can force a cast onto the column (bind values with the column's type); leading wildcards (`%text`) require trigram indexes; `OR` conditions across columns need an index on each column for a `BitmapOr` and otherwise fall back to sequential scans (`UNION` is an alternative); `OFFSET` pagination is O(n) (use keyset pagination).

5. **Plan Stability**: Parameter sniffing occurs when generic plans optimized for one parameter value perform poorly for others. Use `plan_cache_mode = 'force_custom_plan'` for variable data distributions. Stale statistics cause cardinality misestimates (red rows in Depesz); run `ANALYZE` after significant data changes.

6. **Measurement Rigor**: Single timings are noise; prefer buffer counts (far more stable) over wall-clock time where possible. Warm caches before benchmarking. Account for planning time in complex views. Test at production scale and concurrency, not just single-user EXPLAIN.

**Next:** In Chapter 18, we will explore the Performance Tuning Playbook, covering query refactoring patterns, index selection workflows, batch operations, N+1 elimination strategies, and practical checklists for slow endpoint diagnosis.