# Chapter 15: Index Fundamentals

Indexes are the primary tool for query optimization in PostgreSQL, but they come with significant trade-offs. This chapter establishes the foundational knowledge required to design, implement, and maintain B-tree indexes—the default and most common index type—while understanding their costs and limitations in production environments.

## 15.1 B-Tree Index Architecture

PostgreSQL's default index type is a B+-tree variant based on the Lehman-Yao algorithm. Understanding its structure explains why some queries can use an index while others cannot.

### 15.1.1 How B-Tree Indexes Work

```sql
-- Create a standard B-tree index
CREATE INDEX idx_users_email ON users(email);

-- B-tree structure visualization (conceptual):
-- Root Node (Level 2)
--   [A-M] -> Intermediate Node
--   [N-Z] -> Intermediate Node
--
-- Intermediate Node (Level 1)
--   [A-F] -> Leaf Block 1
--   [G-M] -> Leaf Block 2
--
-- Leaf Nodes (Level 0) - Doubly linked list
--   Block 1: [alice@, bob@, charlie@, david@, emma@, frank@]
--   Block 2: [george@, henry@, ...]
--   Block 3: [nancy@, oscar@, ...]

-- Key properties:
-- 1. Balanced tree: All leaf nodes at same depth (O(log n) lookup)
-- 2. Leaf nodes contain actual index entries (value + ctid pointer)
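-- 3. Pages on each level are doubly linked to their siblings (range scans in either direction)
-- 4. Non-leaf nodes hold separator (pivot) keys and downlinks to child pages;
--    every page except the rightmost on its level also stores a high key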

-- Index entry format in leaf:
-- (email_value, ctid) -> e.g., ('alice@example.com', '(0,15)')
-- ctid = (block_number, line_pointer_offset) - physical row location in the heap
```
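The leaf entry format above can be observed on a live table. The following is a minimal sketch using a throwaway temp table, `demo_users` (a hypothetical name, not part of the chapter's schema): rows inserted into a freshly created table land in block 0 at consecutive line pointers, and the `ctid` system column exposes exactly the location a leaf entry stores next to its key.

```sql
-- Hypothetical throwaway table; TEMP tables disappear when the session ends
CREATE TEMP TABLE demo_users (
    id    int PRIMARY KEY,
    email text NOT NULL
);
INSERT INTO demo_users VALUES
    (1, 'alice@example.com'),
    (2, 'bob@example.com'),
    (3, 'charlie@example.com');
CREATE INDEX demo_users_email_idx ON demo_users (email);

-- ctid = (block number, line pointer): the heap location each leaf entry points to
SELECT ctid, email FROM demo_users ORDER BY email;

-- Index size in pages (metapage + a single leaf page that is also the root)
SELECT pg_relation_size('demo_users_email_idx')
       / current_setting('block_size')::int AS index_pages;
```

On a table this small the tree has only one level; extra levels appear once the single leaf page splits.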

### 15.1.2 Index Pages and Fill Factor

```sql
-- B-tree pages default to 8KB (same as heap pages)
-- Default fillfactor: 90% for B-trees (leaves room for inserts)

CREATE INDEX idx_users_email_ff ON users(email) WITH (fillfactor = 70);
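-- Lower fillfactor (e.g., 70%) leaves more free space in leaf pages
-- B-trees apply it during the initial build and when extending the index
-- at the right edge; pages that later fill up split regardless
-- Delays page splits on write-heavy tables
-- Trade-off: Index becomes larger (more pages to scan)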

-- When to adjust fillfactor:
-- 1. Static tables (no updates): fillfactor = 100 (most compact)
-- 2. Heavy insert at end (sequences): fillfactor = 90 (default, good)
-- 3. Random-key inserts (UUIDs, random strings): fillfactor = 70-80
-- 4. Heavy update workload: fillfactor = 50 (extreme, rarely needed)

-- Check index size and usage (measuring bloat needs pgstattuple, see 15.5.1):
SELECT
    schemaname,
    relname AS table_name,
    indexrelname as index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
    idx_scan as times_used,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
```
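Fillfactor is stored as a per-index storage parameter in `pg_class.reloptions`. A sketch on a hypothetical `ff_demo` table: changing the option with `ALTER INDEX ... SET` does not touch existing pages, so it is the `REINDEX` that actually repacks them, and the rebuilt index at fillfactor 100 should come out smaller than the original built at 70.

```sql
-- Hypothetical table with 50,000 random-looking keys
CREATE TEMP TABLE ff_demo (id int, code text);
INSERT INTO ff_demo SELECT g, md5(g::text) FROM generate_series(1, 50000) AS g;

-- Build with leaf pages filled to 70%, then inspect the stored option
CREATE INDEX ff_demo_code_idx ON ff_demo (code) WITH (fillfactor = 70);
SELECT reloptions, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class WHERE oid = 'ff_demo_code_idx'::regclass;

-- Remember the size for comparison
CREATE TEMP TABLE ff_sizes AS
SELECT pg_relation_size('ff_demo_code_idx') AS bytes_at_70;

-- Changing the option alone does not rewrite existing pages...
ALTER INDEX ff_demo_code_idx SET (fillfactor = 100);
-- ...a rebuild applies it to every leaf page
REINDEX INDEX ff_demo_code_idx;
SELECT reloptions, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class WHERE oid = 'ff_demo_code_idx'::regclass;
```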

## 15.2 When to Index (and When Not To)

Indexes accelerate reads but slow down writes and consume disk space. Index selectively, driven by the queries your application actually runs.

### 15.2.1 Index Selection Criteria

```sql
-- Cardinality rule: Index columns with high cardinality (many distinct values)
-- Low cardinality (boolean, status with few values) rarely benefits from B-tree

-- Good candidates for indexing:
-- 1. Primary keys (automatically indexed)
-- 2. Foreign keys (should be indexed for JOIN performance)
-- 3. Frequently filtered columns in WHERE clauses
-- 4. Columns used in ORDER BY (avoids sort)
-- 5. Columns used in JOIN conditions

-- Example: E-commerce order queries
-- High cardinality: order_id, email, tracking_number (good indexes)
-- Low cardinality: status (pending/shipped/delivered), is_paid (boolean)

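-- Check cardinality before indexing (one column at a time):
SELECT
    COUNT(DISTINCT status) AS distinct_values,
    COUNT(*) AS total_rows,
    ROUND(100.0 * COUNT(DISTINCT status) / NULLIF(COUNT(*), 0), 2) AS distinct_pct
FROM orders;

-- Rule of thumb: what matters is the fraction of rows a query returns
-- If a typical predicate matches a large share of the table (tens of percent),
-- a plain B-tree rarely beats a sequential scan
-- If queries target only a rare value (e.g., 1% of rows), a partial index
-- (WHERE clause) on that value is usually the better tool
-- PostgreSQL has no on-disk bitmap index type; bitmap scans are built at
-- query time from ordinary indexes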

-- Anti-pattern: Indexing low cardinality without partial predicate
CREATE INDEX idx_orders_status ON orders(status);  -- Usually bad
-- If status has 3 values and evenly distributed, index returns 33% of table
-- Sequential scan often faster (sequential I/O vs random I/O)

-- Solution: Partial index for specific high-value status
CREATE INDEX idx_orders_pending ON orders(created_at) 
WHERE status = 'pending';
-- Only indexes pending orders (high selectivity within this subset)
-- Smaller index, faster queries for "find old pending orders"
```
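Besides counting distinct values by hand, the planner's own statistics in `pg_stats` show both the distinct count and how skewed the values are, which is what decides whether a partial index pays off. A sketch with a hypothetical `sel_orders` table in which 1% of rows are `pending`:

```sql
-- Hypothetical orders table: 1% pending, the rest split between two statuses
CREATE TEMP TABLE sel_orders (
    order_id   int PRIMARY KEY,
    status     text NOT NULL,
    created_at timestamptz NOT NULL
);
INSERT INTO sel_orders
SELECT g,
       CASE WHEN g % 100 = 0 THEN 'pending'
            WHEN g % 2 = 0   THEN 'shipped'
            ELSE 'delivered' END,
       now() - g * interval '1 minute'
FROM generate_series(1, 10000) AS g;
ANALYZE sel_orders;

-- Planner statistics: number of distinct values and per-value frequencies
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'sel_orders' AND attname = 'status';

-- Partial index that only holds the rare value
CREATE INDEX sel_orders_pending_idx ON sel_orders (created_at)
WHERE status = 'pending';
```

A positive `n_distinct` is an absolute count of distinct values; a negative one is a fraction of the row count, used when the number of distinct values is expected to grow with the table.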

### 15.2.2 The Write Amplification Cost

```sql
-- Every index adds write overhead:
-- INSERT: Write to heap + write to every index
-- UPDATE: Unless HOT applies, insert a new entry into every index
--         (old entries stay until VACUUM removes them)
-- DELETE: Only marks the heap tuple deleted; VACUUM later removes
--         the corresponding index entries

-- Demonstration of write cost:
-- Table with 1 index vs 5 indexes
CREATE TABLE test_insert (
    id SERIAL PRIMARY KEY,
    col1 TEXT,
    col2 TEXT,
    col3 TEXT,
    col4 TEXT,
    col5 TEXT
);

-- Test 1: Baseline (only PK index)
INSERT INTO test_insert (col1, col2, col3, col4, col5) 
SELECT md5(random()::text), md5(random()::text), 
       md5(random()::text), md5(random()::text), md5(random()::text)
FROM generate_series(1, 100000);
-- Time: ~2 seconds (illustrative; enable \timing in psql to measure)

-- Add 4 more indexes
CREATE INDEX idx1 ON test_insert(col1);
CREATE INDEX idx2 ON test_insert(col2);
CREATE INDEX idx3 ON test_insert(col3);
CREATE INDEX idx4 ON test_insert(col4);

-- Test 2: With 5 indexes total
TRUNCATE test_insert;
INSERT INTO test_insert (col1, col2, col3, col4, col5) 
SELECT md5(random()::text), md5(random()::text), 
       md5(random()::text), md5(random()::text), md5(random()::text)
FROM generate_series(1, 100000);
-- Time: ~6-8 seconds (3-4x slower; illustrative, varies by hardware)

-- UPDATE impact (worst case: updating indexed columns)
UPDATE test_insert SET col1 = col1 || '_updated';
-- col1 is indexed, so HOT is impossible: every row gets a new heap tuple
-- plus a new entry in all five indexes (old entries wait for VACUUM)
-- Updating only col5 (not indexed) can be a heap-only tuple (HOT) update,
-- provided the new row version fits on the same heap page
-- HOT updates: No index modification at all (much faster)

-- Find large indexes that cost writes but have never been scanned:
SELECT
    schemaname,
    relname AS table_name,
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND pg_relation_size(indexrelid) > 1000000
ORDER BY pg_relation_size(indexrelid) DESC;
-- Finds large unused indexes (candidates for removal, unless they enforce
-- a PRIMARY KEY or UNIQUE constraint)
```
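The HOT behaviour described above can be checked through the per-transaction statistics view `pg_stat_xact_user_tables`. This sketch uses a hypothetical 20-row table, small enough that every new row version fits on the same heap page. After the first UPDATE, `n_tup_hot_upd` should equal `n_tup_upd`; after the second, which touches an indexed column, `n_tup_upd` doubles while `n_tup_hot_upd` stays where it was.

```sql
-- Hypothetical table: one indexed column, one unindexed column
CREATE TEMP TABLE hot_demo (
    id          int PRIMARY KEY,
    indexed_col text,
    plain_col   text
);
CREATE INDEX hot_demo_indexed_idx ON hot_demo (indexed_col);
INSERT INTO hot_demo
SELECT g, 'a' || g, 'b' || g FROM generate_series(1, 20) AS g;

BEGIN;
-- Only an unindexed column changes: eligible for HOT (page has free space)
UPDATE hot_demo SET plain_col = plain_col || '_x';
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_xact_user_tables WHERE relname = 'hot_demo';

-- An indexed column changes: new entries in every index, no HOT
UPDATE hot_demo SET indexed_col = indexed_col || '_x';
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_xact_user_tables WHERE relname = 'hot_demo';
COMMIT;
```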

## 15.3 Multi-Column Indexes

Multi-column (composite) indexes can satisfy multiple query predicates, but column order determines effectiveness.

### 15.3.1 Column Ordering Rules

```sql
-- Create composite index
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- B-tree composite structure:
-- Leaf entries sorted by (user_id first, then created_at)
-- ('user_123', '2024-01-15'), ('user_123', '2024-01-16'), ('user_124', '2024-01-10')...

-- Query patterns that use this index:
-- Good: Equality on first column, any condition on second
SELECT * FROM orders 
WHERE user_id = 123 AND created_at > '2024-01-01';
-- Index Scan using idx_orders_user_created
-- Can use both columns efficiently

-- Good: Equality on first column only
SELECT * FROM orders WHERE user_id = 123;
-- Index Scan using idx_orders_user_created
-- Uses first column, stops at the end of the user_id = 123 range

-- Partial: Range on first column, equality on second
SELECT * FROM orders 
WHERE user_id > 100 AND user_id < 200 
  AND created_at = '2024-01-01';
-- Index Scan (may or may not be chosen, depending on selectivity)
-- Reads the whole user_id range; created_at is checked against each index
-- entry in that range but does not narrow the portion of the index read

-- Bad: Only second column in predicate
SELECT * FROM orders WHERE created_at = '2024-01-01';
-- Cannot use idx_orders_user_created efficiently (created_at is the second column)
-- Expect a Seq Scan, or create a separate index on created_at

-- Rule: Column order should match query patterns
-- 1. Equality filters (=) first (most selective)
-- 2. Range filters (>, <, BETWEEN) second
-- 3. Columns used for sorting (ORDER BY) third
-- 4. Columns for covering (INCLUDE) last
```
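To check which conditions actually drive the index, read the `Index Cond` line of `EXPLAIN`. A sketch with a hypothetical `comp_orders` table (100,000 rows, 1,000 users): for the first query both `user_id` and `created_at` should appear in `Index Cond`; for the second, with no condition on `user_id`, expect a sequential scan on most setups.

```sql
-- Hypothetical table mirroring orders(user_id, created_at)
CREATE TEMP TABLE comp_orders (
    order_id   int PRIMARY KEY,
    user_id    int NOT NULL,
    created_at timestamptz NOT NULL
);
INSERT INTO comp_orders
SELECT g, g % 1000, timestamptz '2024-01-01' + g * interval '1 minute'
FROM generate_series(1, 100000) AS g;
CREATE INDEX comp_orders_user_created_idx ON comp_orders (user_id, created_at);
ANALYZE comp_orders;

-- Equality on user_id + range on created_at: both in Index Cond
EXPLAIN (COSTS OFF)
SELECT * FROM comp_orders
WHERE user_id = 123 AND created_at > '2024-02-01';

-- Second column only: no efficient entry point, usually a Seq Scan
EXPLAIN (COSTS OFF)
SELECT * FROM comp_orders WHERE created_at = '2024-01-15';
```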

### 15.3.2 Index Prefix Usage and Skip Scans

```sql
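-- PostgreSQL uses multicolumn indexes best via a leftmost prefix of columns
-- Index: (a, b, c)
-- Efficient: (a), (a,b), (a,b,c)
-- Partly efficient: (a,c): a bounds the scan, c is checked on index entries
-- Inefficient: (b), (c), (b,c): with no condition on a, the whole index must be read
-- (PostgreSQL 18+ B-tree skip scan can help when a has few distinct values)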

-- Example with three columns:
CREATE INDEX idx_triple ON table_a(col1, col2, col3);

-- Uses index (leftmost prefix):
SELECT * FROM table_a WHERE col1 = 'x';
SELECT * FROM table_a WHERE col1 = 'x' AND col2 = 'y';
SELECT * FROM table_a WHERE col1 = 'x' AND col2 = 'y' AND col3 = 'z';

-- No efficient index use (missing leading column):
SELECT * FROM table_a WHERE col2 = 'y';
SELECT * FROM table_a WHERE col3 = 'z';
SELECT * FROM table_a WHERE col2 = 'y' AND col3 = 'z';

-- Special case: Index condition with IN (treated as multiple equalities)
SELECT * FROM table_a WHERE col1 IN ('x', 'v', 'w') AND col2 = 'y';
-- Uses index (col1 is leading, even with IN)

-- Less efficient usage (columns after the first range condition):
CREATE INDEX idx_range ON orders(created_at, status);
SELECT * FROM orders 
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
  AND status = 'shipped';
-- created_at bounds the scan; status is checked on each index entry in that range
-- but does not narrow the portion of the index read
-- Not as efficient as (status, created_at) if status is selective
```
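The `(a,c)` case can be confirmed the same way. In this sketch (hypothetical `triple_demo` table), the condition on `col3` still shows up in `Index Cond` even though `col2` is skipped: it is evaluated on index entries, but only `col1` limits which part of the index is read.

```sql
-- Hypothetical table with a three-column index
CREATE TEMP TABLE triple_demo (col1 text, col2 text, col3 text);
INSERT INTO triple_demo
SELECT 'k' || (g % 50), 'm' || (g % 7), 'n' || (g % 11)
FROM generate_series(1, 20000) AS g;
CREATE INDEX triple_demo_idx ON triple_demo (col1, col2, col3);
ANALYZE triple_demo;

-- Condition on col1 and col3, nothing on col2
EXPLAIN (COSTS OFF)
SELECT * FROM triple_demo WHERE col1 = 'k1' AND col3 = 'n2';
```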

### 15.3.3 Multiple Indexes vs Composite Indexes

```sql
-- Option A: Two separate indexes
CREATE INDEX idx_orders_user ON orders(user_id);
CREATE INDEX idx_orders_created ON orders(created_at);

-- Option B: One composite index
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- Decision matrix:
-- Query pattern 1: WHERE user_id = ?
-- Both work (the composite index is somewhat larger to scan)

-- Query pattern 2: WHERE created_at = ?
-- Only Option A works efficiently (Option B would need a full index scan)

-- Query pattern 3: WHERE user_id = ? AND created_at = ?
-- Option B is better (single index lookup)
-- Option A might use BitmapAnd (combine two indexes) or pick one and filter

-- Query pattern 4: WHERE user_id = ? OR created_at = ?
-- Option A might use BitmapOr (combine two indexes)
-- Option B alone cannot serve the created_at branch of the OR

-- Rules of thumb:
-- 1. If you always query both columns together -> Composite index
-- 2. If you query columns independently -> Separate indexes
-- 3. If mixed workload -> Consider both (separate + composite) if write load allows

-- Check actual usage with pg_stat_user_indexes:
SELECT indexrelname, idx_scan, idx_tup_read 
FROM pg_stat_user_indexes 
WHERE relname = 'orders';
-- If one index never scanned, consider dropping it
```
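Query pattern 4 is where separate indexes shine. A sketch on a hypothetical `or_orders` table with one index per column: the OR spans two columns, so neither index can answer it alone, and the planner combines both with a `BitmapOr` (or falls back to a sequential scan when the OR matches too many rows).

```sql
-- Hypothetical table with one single-column index per filter column
CREATE TEMP TABLE or_orders (
    order_id   int PRIMARY KEY,
    user_id    int NOT NULL,
    created_at date NOT NULL
);
INSERT INTO or_orders
SELECT g, g % 1000, date '2024-01-01' + (g % 365)
FROM generate_series(1, 100000) AS g;
CREATE INDEX or_orders_user_idx ON or_orders (user_id);
CREATE INDEX or_orders_created_idx ON or_orders (created_at);
ANALYZE or_orders;

-- Expect a BitmapOr over two Bitmap Index Scans, one per index
EXPLAIN (COSTS OFF)
SELECT * FROM or_orders
WHERE user_id = 42 OR created_at = date '2024-03-01';
```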

## 15.4 Covering Indexes (INCLUDE)

PostgreSQL 11 introduced the `INCLUDE` clause for covering indexes. It stores extra payload columns in leaf pages only, enabling index-only scans without widening the upper levels of the tree.

### 15.4.1 INCLUDE Clause Fundamentals

```sql
-- Traditional composite index (key columns):
CREATE INDEX idx_orders_user_created_old ON orders(user_id, created_at, status, total);
-- All columns are part of the index key
-- Tree structure organized by (user_id, created_at, status, total)
-- status and total take part in the sort order and can appear in non-leaf pivot keys

-- Modern covering index (INCLUDE for payload):
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at) 
INCLUDE (status, total);
-- user_id, created_at: Key columns (tree structure, ordered)
-- status, total: Included columns (payload, stored only in leaf pages)

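-- Benefits:
-- 1. Non-leaf pages store only key columns, so upper levels stay compact
-- 2. INCLUDE columns need no B-tree operator class and don't affect uniqueness
-- 3. Enables index-only scans for: SELECT status, total FROM orders WHERE user_id = ?
-- Note: updating status or total still rules out HOT and adds a new index
--       entry, exactly as updating a key column would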

-- Index-only scan demonstration:
SELECT status, total FROM orders WHERE user_id = 123;
-- Index Only Scan using idx_orders_user_created
-- Heap Fetches: 0 when the visibility map marks every relevant page all-visible
-- (then no table access is required)

-- Contrast with an all-key composite index:
-- If index was (user_id, created_at, status):
-- SELECT status FROM orders WHERE user_id = 123 ORDER BY created_at;
-- Uses index (good), and status could also serve WHERE/ORDER BY clauses
-- Update cost is the same as with INCLUDE: changing status rules out HOT
```
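A sketch of the covering index on a hypothetical `cov_orders` table. `pg_index` shows the split between key and included columns (`indnkeyatts` vs `indnatts`), and once `VACUUM` has set the visibility map, `EXPLAIN (ANALYZE)` should report an Index Only Scan with `Heap Fetches: 0`.

```sql
-- Hypothetical table mirroring orders
CREATE TEMP TABLE cov_orders (
    order_id   int PRIMARY KEY,
    user_id    int NOT NULL,
    created_at timestamptz NOT NULL,
    status     text NOT NULL,
    total      numeric(10,2) NOT NULL
);
INSERT INTO cov_orders
SELECT g, g % 1000, timestamptz '2024-01-01' + g * interval '1 minute',
       CASE WHEN g % 3 = 0 THEN 'shipped' ELSE 'pending' END, g % 500
FROM generate_series(1, 100000) AS g;
CREATE INDEX cov_orders_user_created_idx ON cov_orders (user_id, created_at)
    INCLUDE (status, total);
VACUUM ANALYZE cov_orders;   -- sets the visibility map so heap fetches can be skipped

-- 2 key columns, 4 columns in total
SELECT indnkeyatts, indnatts FROM pg_index
WHERE indexrelid = 'cov_orders_user_created_idx'::regclass;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
SELECT status, total FROM cov_orders WHERE user_id = 123;
```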

### 15.4.2 INCLUDE Column Selection

```sql
-- Best practices for INCLUDE columns:
-- 1. Columns frequently selected but rarely filtered
-- 2. Columns that would enable index-only scans
-- 3. Avoid: Large text columns, JSONB, bytea (bloats index)
-- 4. Avoid: Frequently updated columns (write amplification)

-- Good example:
CREATE INDEX idx_products_category ON products(category_id) 
INCLUDE (name, price);
-- Query: SELECT name, price FROM products WHERE category_id = 5;
-- Index-only scan possible
-- stock_quantity is left out: it changes constantly, and updating any
-- indexed column (key or INCLUDE) rules out HOT updates

-- Bad example (anti-pattern):
CREATE INDEX idx_bad ON orders(user_id) 
INCLUDE (description, notes, full_text);
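-- Large text columns bloat the index, and a B-tree entry (even after
-- compression) cannot exceed roughly a third of a page (~2.7KB at 8KB pages)
-- Better to fetch these from the heap only for the rows that need them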

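-- Size comparison:
-- Key-only index: Smallest leaves, fastest traversal
-- INCLUDE index: Larger leaves; non-leaf pages hold only key columns
-- All-key composite index: Larger leaves, and extra key columns can also appear
--   in non-leaf pivot keys (PostgreSQL 12+ suffix truncation often trims them)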

-- Monitoring index-only scan effectiveness:
SELECT 
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_user_created';
-- idx_tup_fetch counts heap rows fetched through this index; for a covering
-- index it should stay low relative to idx_tup_read
-- EXPLAIN (ANALYZE) reports the exact "Heap Fetches" for a given query
```
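Because included columns are not part of the key, they need no B-tree operator class and are ignored by uniqueness checks. The sketch below relies on that: `point` has no default B-tree operator class, so it cannot be a key column, but it can ride along as payload in a unique index on `id` (hypothetical `places` table).

```sql
-- Hypothetical table; point has no default B-tree operator class
CREATE TEMP TABLE places (
    id       int NOT NULL,
    location point
);
-- As a key column this would fail:
-- CREATE INDEX places_location_idx ON places (location);
-- As an INCLUDE payload it is accepted, and uniqueness considers only id
CREATE UNIQUE INDEX places_id_idx ON places (id) INCLUDE (location);

INSERT INTO places VALUES (1, point(0, 0)), (2, point(0, 0));  -- same location: allowed
-- INSERT INTO places VALUES (1, point(5, 5));  -- rejected: duplicate id
```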

## 15.5 Index Maintenance and Bloat

Indexes suffer from bloat (dead space) due to updates and deletes. Proper maintenance ensures consistent performance.

### 15.5.1 Understanding Index Bloat

```sql
-- How bloat occurs:
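-- UPDATE (non-HOT): A new entry is inserted into every index; the old
--   entry stays behind as dead space until VACUUM removes it
-- DELETE: The heap tuple is marked deleted; its index entries become dead
--   once no transaction can see the row, and VACUUM removes them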

-- Index size next to its table (a quick sanity check, not a bloat measurement):
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

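-- Better bloat check using the pgstattuple extension:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('idx_orders_user_created');
-- Returns: tuple_count, tuple_len, dead_tuple_count, dead_tuple_len, free_space, ...
-- B-tree specific view: leaf density well below the fillfactor signals bloat
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_orders_user_created');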

-- When to REINDEX:
-- 1. Free space > 30% of total size
-- 2. Index size significantly larger than expected based on row count
-- 3. Query performance degradation on indexed lookups

-- REINDEX methods:
-- Method 1: CONCURRENTLY (PostgreSQL 12+; does not block reads or writes, preferred for production)
REINDEX INDEX CONCURRENTLY idx_orders_user_created;
-- Builds a new index, swaps it in, drops the old one
-- Needs temporary disk space for a second copy of the index

-- Method 2: REINDEX INDEX (blocks writes to the table, plus reads that would use this index)
REINDEX INDEX idx_orders_user_created;

-- Method 3: Recreate with new name, drop old (manual swap)
CREATE INDEX CONCURRENTLY idx_new ON orders(user_id);
DROP INDEX CONCURRENTLY idx_old;
ALTER INDEX idx_new RENAME TO idx_old;
```
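A sketch of why a rebuild matters after heavy churn, on a hypothetical regular table `reindex_demo` (not a temp table, so `CONCURRENTLY` genuinely runs concurrently): deleting half the rows and vacuuming frees space inside the leaf pages, but the index file does not shrink; rebuilding it should roughly halve its size.

```sql
-- Hypothetical regular table; VACUUM and CONCURRENTLY must run outside a transaction block
CREATE TABLE reindex_demo (id int, val text);
INSERT INTO reindex_demo SELECT g, md5(g::text) FROM generate_series(1, 100000) AS g;
CREATE INDEX reindex_demo_val_idx ON reindex_demo (val);

-- Simulate churn: delete half the rows, then let VACUUM remove the dead entries
DELETE FROM reindex_demo WHERE id % 2 = 0;
VACUUM reindex_demo;

-- Space is freed inside leaf pages, but the index file keeps its size
CREATE TEMP TABLE reindex_sizes AS
SELECT pg_relation_size('reindex_demo_val_idx') AS bytes_before;

REINDEX INDEX CONCURRENTLY reindex_demo_val_idx;   -- PostgreSQL 12+

SELECT pg_size_pretty(bytes_before) AS before,
       pg_size_pretty(pg_relation_size('reindex_demo_val_idx')) AS after
FROM reindex_sizes;
```

Drop `reindex_demo` when you are done; unlike the temp tables in the other sketches, it outlives the session.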

### 15.5.2 Vacuum and Index Cleanup

```sql
-- Autovacuum handles index cleanup, but update-heavy tables may need tuning:

-- Check which tables are accumulating dead tuples:
SELECT
    relname AS table_name,
    n_live_tup,
    n_dead_tup,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;

-- Index-only scans require clean visibility map:
-- Many heap fetches mean many pages are not yet marked all-visible
-- Solution: More aggressive vacuum settings for high-churn tables:
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.1,    -- default 0.2
    autovacuum_vacuum_threshold = 50,        -- default 50 rows
    autovacuum_analyze_scale_factor = 0.05   -- default 0.1
);

-- For massive tables (millions of rows), lower scale factor:
ALTER TABLE huge_table SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- 1% instead of 10% above (default 20%)
    autovacuum_vacuum_threshold = 1000
);

-- Manual vacuum for urgent cleanup:
VACUUM ANALYZE orders;  -- Cleans dead tuples, updates stats
VACUUM FULL orders;     -- Rewrites table and indexes compactly (ACCESS EXCLUSIVE lock: blocks reads and writes!)
REINDEX TABLE CONCURRENTLY orders;  -- Rebuild all indexes (PostgreSQL 12+)
```
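Autovacuum vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. With the `huge_table` settings above, a 10,000,000-row table is vacuumed after about 1000 + 0.01 × 10,000,000 = 101,000 dead tuples, versus roughly 50 + 0.2 × 10,000,000 = 2,000,050 under the defaults. The sketch below (hypothetical `av_demo` table) shows where per-table overrides are stored and estimates each table's trigger point from the global settings only.

```sql
-- Hypothetical table with huge_table-style overrides
CREATE TABLE av_demo (id int PRIMARY KEY, payload text);
ALTER TABLE av_demo SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000
);

-- Per-table overrides are stored in pg_class.reloptions
SELECT relname, reloptions FROM pg_class WHERE oid = 'av_demo'::regclass;

-- Rough trigger point per table from the GLOBAL settings only
-- (per-table overrides are ignored here; reltuples is -1 before the
--  first VACUUM/ANALYZE on PostgreSQL 14+)
SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::int
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples
         AS approx_vacuum_trigger
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC;
```

Drop `av_demo` afterwards; it is a regular table so that autovacuum would actually consider it.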

## 15.6 Specialized B-Tree Features

### 15.6.1 Unique Indexes and Constraints

```sql
-- Unique constraint (preferred method):
ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);
-- Creates unique index automatically named users_email_unique

-- Explicit unique index (needed for partial or expression-based uniqueness,
-- which a UNIQUE constraint cannot express):
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial unique indexes (conditional uniqueness):
CREATE UNIQUE INDEX idx_users_active_email ON users(email) 
WHERE deleted_at IS NULL;
-- Only active users must have unique emails
-- Allows: 'alice@' deleted, new 'alice@' inserted later

-- Unique index with INCLUDE:
CREATE UNIQUE INDEX idx_users_email_covering ON users(email) INCLUDE (created_at);
-- Enforces uniqueness on email, but covers created_at for index-only scans
-- Note: INCLUDE columns don't affect uniqueness (only key columns do)

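-- Handling NULLs in unique indexes:
-- By default NULLs are treated as distinct (standard SQL), so
-- multiple NULL values are allowed in a unique index
-- To allow at most one NULL (PostgreSQL 15+):
CREATE UNIQUE INDEX idx_users_phone_unique ON users(phone) NULLS NOT DISTINCT;
-- Conversely, to enforce uniqueness only among non-NULL values
-- (keeping NULL rows out of the index entirely):
CREATE UNIQUE INDEX idx_users_phone_not_null ON users(phone) 
WHERE phone IS NOT NULL;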
```
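A sketch of both uniqueness tools on a hypothetical `uq_users` temp table: the partial index lets a soft-deleted email be reused, and `NULLS NOT DISTINCT` (PostgreSQL 15 or later) rejects a second NULL phone number.

```sql
-- Hypothetical table with soft deletes
CREATE TEMP TABLE uq_users (
    id         int PRIMARY KEY,
    email      text NOT NULL,
    phone      text,
    deleted_at timestamptz
);
-- Conditional uniqueness: only live rows compete for an email
CREATE UNIQUE INDEX uq_users_active_email ON uq_users (email)
WHERE deleted_at IS NULL;
-- PostgreSQL 15+: NULLs compare equal, so at most one row may lack a phone
CREATE UNIQUE INDEX uq_users_phone ON uq_users (phone) NULLS NOT DISTINCT;

INSERT INTO uq_users VALUES (1, 'alice@example.com', '555-0100', now());  -- soft-deleted
INSERT INTO uq_users VALUES (2, 'alice@example.com', NULL, NULL);         -- allowed: row 1 is deleted
-- INSERT INTO uq_users VALUES (3, 'bob@example.com', NULL, NULL);  -- fails: second NULL phone
```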

### 15.6.2 Expression Indexes and Functional Indexing

```sql
-- Index on expression (function-based index):
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Supports case-insensitive lookups efficiently:
SELECT * FROM users WHERE LOWER(email) = LOWER('Alice@Example.com');

-- Fuzzy/substring matching uses a trigram GIN index (requires pg_trgm, see 15.7.3):
CREATE INDEX idx_users_name_trgm ON users USING GIN(name gin_trgm_ops);
-- But for exact normalized matches, use expression index:
CREATE INDEX idx_users_normalized_phone ON users(REGEXP_REPLACE(phone, '[^0-9]', '', 'g'));
-- Supports: SELECT * FROM users WHERE REGEXP_REPLACE(phone, '[^0-9]', '', 'g') = '5551234567';

-- Date extraction indexes (avoid if possible, prefer range queries):
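CREATE INDEX idx_orders_month ON orders(EXTRACT(MONTH FROM created_at));
-- Only works if created_at is timestamp WITHOUT time zone; for timestamptz the
-- result depends on the TimeZone setting (not IMMUTABLE) and is rejected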
-- Supports: WHERE EXTRACT(MONTH FROM created_at) = 12
-- But better approach: Index on created_at, query with ranges
-- Only use expression index if range query not possible

-- Immutable functions required:
-- Index functions must be IMMUTABLE (same input always same output)
-- now() / CURRENT_TIMESTAMP are STABLE: fine in column defaults,
--   not allowed in index expressions or partial-index predicates
-- lower() is immutable (good)
-- random() is VOLATILE (cannot index)

-- Check function volatility ('i' = immutable, 's' = stable, 'v' = volatile):
SELECT provolatile FROM pg_proc WHERE oid = 'lower(text)'::regprocedure;  -- Returns 'i'
```
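A sketch on a hypothetical `expr_users` table showing both halves of the rule: a query must repeat the indexed expression to use the index, and an expression that depends on session settings is rejected. Casting `timestamptz` to `date` depends on the `TimeZone` setting, so it is only STABLE; converting to a fixed zone first makes the expression immutable.

```sql
-- Hypothetical table with mixed-case emails
CREATE TEMP TABLE expr_users (
    id         int PRIMARY KEY,
    email      text NOT NULL,
    created_at timestamptz NOT NULL
);
INSERT INTO expr_users
SELECT g, 'User' || g || '@Example.com',
       timestamptz '2024-01-01 00:00+00' + g * interval '1 hour'
FROM generate_series(1, 10000) AS g;
CREATE INDEX expr_users_email_lower_idx ON expr_users (lower(email));
ANALYZE expr_users;

-- The WHERE clause repeats the indexed expression, so the index is usable
EXPLAIN (COSTS OFF)
SELECT * FROM expr_users WHERE lower(email) = lower('User42@example.com');

-- Rejected: timestamptz -> date depends on the TimeZone setting (STABLE)
-- CREATE INDEX ON expr_users ((created_at::date));
-- Accepted: converting to a fixed zone first is IMMUTABLE
CREATE INDEX expr_users_created_day_idx
    ON expr_users (((created_at AT TIME ZONE 'UTC')::date));
```

Queries must then filter on the same `(created_at AT TIME ZONE 'UTC')::date` expression for this index to apply.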

## 15.7 Indexing Strategy Patterns

### 15.7.1 Foreign Key Indexing

```sql
-- Referencing (child) FK columns should almost always be indexed;
-- PostgreSQL does not create these indexes automatically
-- Without one, a DELETE (or key UPDATE) on the parent scans the child sequentially
-- Example: orders.user_id references users.user_id

-- Parent table (users) - has PK index automatically
-- Child table (orders) - needs index for FK:
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Why this matters:
-- DELETE FROM users WHERE user_id = 123;
-- PostgreSQL must check if any orders reference user 123
-- Without index: Seq Scan on orders (slow for large tables)
-- With index: Index Scan on orders (fast)

-- Composite FKs need composite indexes:
ALTER TABLE order_items 
ADD CONSTRAINT fk_order_items_order 
FOREIGN KEY (order_id, tenant_id) REFERENCES orders(order_id, tenant_id);
-- Create matching index:
CREATE INDEX idx_order_items_order_tenant ON order_items(order_id, tenant_id);
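-- The check uses equality on every FK column, so any index whose leading
-- columns are exactly these columns (in either order) serves it efficiently
-- (orders itself needs a PRIMARY KEY or UNIQUE constraint on (order_id, tenant_id))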
```
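PostgreSQL really does leave the referencing side unindexed. In this sketch (hypothetical `fk_users`/`fk_orders` temp tables), the only index on the child right after the FK is declared is its primary key.

```sql
-- Hypothetical parent/child pair (temp tables may reference only temp tables)
CREATE TEMP TABLE fk_users (user_id int PRIMARY KEY);
CREATE TEMP TABLE fk_orders (
    order_id int PRIMARY KEY,
    user_id  int NOT NULL REFERENCES fk_users (user_id)
);

-- Lists only fk_orders_pkey: the FK column has no index yet
SELECT indexrelid::regclass AS index_name
FROM pg_index WHERE indrelid = 'fk_orders'::regclass;

-- Add it explicitly
CREATE INDEX fk_orders_user_id_idx ON fk_orders (user_id);
```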

### 15.7.2 Indexing for Sorting (ORDER BY)

```sql
-- Indexes can avoid sort operations:
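CREATE INDEX idx_orders_created_sort ON orders(created_at DESC, order_id DESC);
-- Serves: SELECT * FROM orders ORDER BY created_at DESC, order_id DESC LIMIT 10;
-- (also a good fit for keyset pagination)

-- Per-user "latest orders" with the (user_id, created_at) index from 15.3.1:
SELECT * FROM orders 
WHERE user_id = 123 
ORDER BY created_at DESC 
LIMIT 10;
-- Index Scan Backward using idx_orders_user_created: no Sort node needed,
-- and the scan stops after 10 rows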

-- Mixed sort directions:
CREATE INDEX idx_orders_mixed ON orders(user_id ASC, created_at DESC);
-- Supports: ORDER BY user_id ASC, created_at DESC (and its exact reverse)
-- A plain (user_id, created_at) index cannot, since both columns scan in the same direction
-- With WHERE user_id = ?, either index handles ORDER BY created_at ASC or DESC

-- NULL ordering:
-- Default: NULLS LAST for ASC, NULLS FIRST for DESC
-- If query uses NULLS FIRST/LAST explicitly, match index:
CREATE INDEX idx_orders_nulls ON orders(priority NULLS FIRST);
-- Supports: ORDER BY priority NULLS FIRST
-- (and ORDER BY priority DESC NULLS LAST, by scanning backward)
```
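A sketch with a hypothetical `sort_orders` table. For the per-user query, the plan should show `Index Scan Backward` under `Limit` with no `Sort` node; the mixed-direction `ORDER BY` can only skip the sort once an index with matching directions exists.

```sql
-- Hypothetical table mirroring orders
CREATE TEMP TABLE sort_orders (
    order_id   int PRIMARY KEY,
    user_id    int NOT NULL,
    created_at timestamptz NOT NULL
);
INSERT INTO sort_orders
SELECT g, g % 1000, timestamptz '2024-01-01' + g * interval '1 minute'
FROM generate_series(1, 100000) AS g;
CREATE INDEX sort_orders_user_created_idx ON sort_orders (user_id, created_at);
ANALYZE sort_orders;

-- Newest 10 orders for one user: expect Index Scan Backward, no Sort node
EXPLAIN (COSTS OFF)
SELECT * FROM sort_orders WHERE user_id = 123 ORDER BY created_at DESC LIMIT 10;

-- Mixed directions across columns need an index declared the same way
CREATE INDEX sort_orders_user_asc_created_desc_idx
    ON sort_orders (user_id ASC, created_at DESC);
EXPLAIN (COSTS OFF)
SELECT * FROM sort_orders ORDER BY user_id ASC, created_at DESC LIMIT 10;
```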

### 15.7.3 Indexing for LIKE Patterns

```sql
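-- B-tree indexes support LIKE only with prefix patterns, and only when the
-- index uses the C collation or a pattern operator class:
CREATE INDEX idx_users_email_prefix ON users(email text_pattern_ops);
-- (text_pattern_ops does not help ordinary <, >, or ORDER BY; keep a default
-- index as well if those are needed)

-- Works (can use index):
SELECT * FROM users WHERE email LIKE 'alice@%';
-- The planner turns the prefix into a range condition on the index
-- (email >= 'alice@' and below the next possible prefix), then rechecks LIKE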

-- Does NOT work (cannot use B-tree index):
SELECT * FROM users WHERE email LIKE '%@example.com';
SELECT * FROM users WHERE email LIKE '%alice%';

-- Solution for suffix/pattern matching: Trigram indexes (GIN/GiST)
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING GIN(email gin_trgm_ops);
-- Now supports: email LIKE '%@example.com' (efficiently)
-- Supports: email ILIKE '%ALICE%' (case insensitive)

-- Pattern matching with wildcards in middle:
-- B-tree: Not supported
-- Trigram: Supported but slower than prefix
```
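A sketch of the prefix case on a hypothetical `like_users` table. The `Index Cond` line should show a range built from the prefix using the pattern operators (such as `~>=~`), while the original LIKE is rechecked as a `Filter`.

```sql
-- Hypothetical table of emails
CREATE TEMP TABLE like_users (id int PRIMARY KEY, email text NOT NULL);
INSERT INTO like_users
SELECT g, 'user' || g || '@example.com' FROM generate_series(1, 10000) AS g;
-- text_pattern_ops compares strings character by character, ignoring the
-- collation, which is what prefix LIKE needs in a non-C database
CREATE INDEX like_users_email_pattern_idx ON like_users (email text_pattern_ops);
ANALYZE like_users;

EXPLAIN (COSTS OFF)
SELECT * FROM like_users WHERE email LIKE 'user42@%';
```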

## 15.8 Industry Best Practices and Anti-Patterns

### 15.8.1 Indexing Checklist for Production

```sql
-- Pre-deployment validation:

-- 1. Verify index is used (check with EXPLAIN)
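EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM my_table WHERE my_column = 'value';  -- placeholder names
-- Look for Index Scan, Index Only Scan, or Bitmap Index Scan rather than Seq Scan
-- (Seq Scan is fine on small tables); note that ANALYZE actually runs the query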

-- 2. Check for duplicate indexes (same table, columns, operator classes,
--    collations, options, expressions, and predicate):
SELECT
    indrelid::regclass AS table_name,
    array_agg(indexrelid::regclass) AS duplicate_indexes,
    pg_size_pretty(sum(pg_relation_size(indexrelid))) AS total_size
FROM pg_index
GROUP BY indrelid, indkey::text, indclass::text, indcollation::text,
         indoption::text, coalesce(indexprs::text, ''), coalesce(indpred::text, '')
HAVING count(*) > 1;
-- Drop exact duplicates (keep the one backing any constraint); also review
-- prefix overlaps such as (a) vs (a,b), where (a) is often redundant

-- 3. Validate foreign key indexes (FKs whose columns are not the leading
--    columns of any index on the referencing table):
SELECT
    c.conrelid::regclass AS table_name,
    c.conname AS fk_name,
    c.confrelid::regclass AS referenced_table
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid = c.conrelid
        AND (i.indkey::smallint[])[0:cardinality(c.conkey) - 1] @> c.conkey
  );
-- Every row returned is an FK without a supporting index

-- 4. Monitor index usage in production:
SELECT 
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE idx_scan < 50 
  AND pg_relation_size(indexrelid) > 10000000
ORDER BY pg_relation_size(indexrelid) DESC;
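-- Candidates for removal if unused for 30+ days of statistics (check
-- pg_stat_database.stats_reset; standbys keep their own counters)
-- Never drop indexes that back PRIMARY KEY or UNIQUE constraints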
```

### 15.8.2 Common Indexing Mistakes

```sql
-- Mistake 1: Indexing every column
-- Creates write amplification nightmare
-- Rule: Index only columns that appear in WHERE, JOIN, ORDER BY frequently

-- Mistake 2: Indexing columns with heavy updates
-- Indexes on updated_at, last_login cause constant index churn
-- Consider if queries actually benefit vs cost

-- Mistake 3: Using indexes for small tables
-- Tables < 1000 rows usually don't benefit (sequential scan faster)
-- Exception: Unique constraints require indexes regardless

-- Mistake 4: Leading wildcard LIKE without trigram
-- idx_email with LIKE '%domain.com' won't use index
-- Must use pg_trgm extension for suffix matching

-- Mistake 5: Composite index with wrong column order
-- Index on (created_at, status) won't help WHERE status = 'x'
-- Must match query patterns: equality columns first, then range

-- Mistake 6: Ignoring partial indexes for low cardinality
-- Index on status (3 values) is wasteful
-- Partial index WHERE status = 'rare_value' is efficient

-- Mistake 7: Not using CONCURRENTLY in production
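CREATE INDEX idx_name ON my_table(my_column);  -- Blocks all writes to the table until done!
-- In production, use:
CREATE INDEX CONCURRENTLY idx_name ON my_table(my_column);
-- Slower (two table scans, waits for running transactions) but doesn't block writes
-- Cannot run inside a transaction block; if it fails, it leaves an INVALID
-- index behind that must be dropped (or rebuilt with REINDEX INDEX CONCURRENTLY)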

-- Mistake 8: Index bloat blindness
-- Never checking bloat on high-churn tables
-- Measure periodically (pgstatindex) and REINDEX CONCURRENTLY when bloat is confirmed
```

---

## Chapter Summary

In this chapter, you learned:

1. **B-Tree Architecture**: PostgreSQL uses B+-trees with leaf pages containing (value, ctid) pairs linked in sorted order. Tree depth remains balanced (O(log n)) for consistent lookup performance.

2. **Index Selection**: Index columns with high cardinality (many distinct values). Low cardinality columns (booleans, enums) require partial indexes (WHERE clause) to be effective. Every index adds write overhead: the cost of INSERT and non-HOT UPDATE grows with the number of indexes.

3. **Multi-Column Indexes**: Column order follows the leftmost prefix rule. Place equality columns (=) first, range columns (>, <) second. An index serves queries on a leftmost subset of its columns efficiently, but helps little with queries that skip leading columns.

4. **Covering Indexes (INCLUDE)**: Store additional columns in leaf pages only (not in the upper levels of the tree) to enable Index-Only Scans. Key columns determine tree organization; included columns are payload that lets queries skip the heap when the visibility map allows it. This reduces I/O, but avoid including large text or frequently updated columns.

5. **Index Maintenance**: Indexes bloat from UPDATE/DELETE operations. Dead space accumulates until VACUUM reclaims it, and even then the index file does not shrink. Use `REINDEX INDEX CONCURRENTLY` to rebuild bloated indexes without blocking reads or writes. Monitor `pg_stat_user_indexes` for usage patterns and `pgstattuple` for bloat.

6. **Specialized Features**: Unique indexes enforce constraints; partial unique indexes enforce conditional uniqueness (e.g., active emails only). Expression indexes support functional lookups (LOWER(email)) but require immutable functions. PostgreSQL does not index the referencing columns of a foreign key automatically, so add those indexes yourself.

**Next:** In Chapter 16, we will explore Advanced Index Types—covering GIN and GiST for full-text search and JSONB, BRIN for large time-series tables, and specialized indexes for ranges and arrays.

