# Chapter 41: JSONB in Production

PostgreSQL's JSONB (binary JSON) storage bridges the gap between rigid relational schemas and flexible document models. Unlike text-based JSON storage, JSONB uses a decomposed binary format with indexing support, making it suitable for high-performance production workloads. However, misuse leads to table bloat, inefficient queries, and vacuum nightmares. This chapter establishes the decision framework, indexing strategies, and operational patterns required to deploy JSONB successfully at scale.

## 41.1 JSONB Architecture and Storage Mechanics

Understanding the physical storage model prevents costly architectural mistakes.

### 41.1.1 Binary Representation and Compression

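JSONB stores data in a decomposed binary format rather than as raw text. Insignificant whitespace is discarded, object keys are stored in sorted order (shorter keys first, then bytewise), duplicate keys are eliminated at every nesting level with the last value winning, and numbers are stored using the variable-precision `numeric` type.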

```sql
-- Storage comparison
CREATE TABLE json_storage_test (
    id serial PRIMARY KEY,
    json_text json,      -- Raw text storage
    json_binary jsonb    -- Decomposed binary
);

INSERT INTO json_storage_test (json_text, json_binary) 
VALUES (
    '{"b": 1, "a": 2, "a": 3}',  -- json: stored verbatim, duplicate keys preserved
    '{"b": 1, "a": 2, "a": 3}'   -- jsonb: sorts keys, removes duplicates, drops whitespace
);

-- jsonb normalizes to: {"a": 3, "b": 1} (keys sorted, last duplicate kept)

-- Size comparison
SELECT 
    pg_column_size(json_text) AS text_bytes,
    pg_column_size(json_binary) AS binary_bytes,
    pg_column_size('{"a": 3, "b": 1}'::json) AS normalized_text
FROM json_storage_test;
-- Sizes depend on document shape: jsonb is often slightly larger than the
-- equivalent json text because it stores per-element headers and offsets
-- Both types carry a varlena header per value
```

**TOAST Behavior**:
When a row exceeds roughly 2KB, PostgreSQL first tries to compress its large JSONB values and, if the row is still too big, moves them out-of-line into the table's TOAST relation (see Chapter 38 for TOAST mechanics). This impacts performance:

```sql
-- Compare heap, TOAST, and index sizes (shows where the bytes live, not a compression ratio)
SELECT 
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS main_size,
    pg_size_pretty(pg_table_size(relid) - pg_relation_size(relid)) AS toast_size,  -- TOAST plus small FSM/VM forks
    pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables
WHERE relname = 'json_storage_test';

-- Large JSONB documents in TOAST must be fetched and decompressed on access
-- Pattern: Keep hot data in columns, cold/variable data in JSONB
```
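To see which individual documents are large enough to be compressed or moved out of line, one option is to compare each value's stored size with the length of its text rendering. This is a minimal sketch, assuming the `json_storage_test` table above; `pg_column_size` reports the stored (possibly compressed) size, so a large gap between the two columns suggests compression is doing real work.

```sql
-- Largest documents first: stored size versus size of the text form
SELECT
    id,
    pg_column_size(json_binary)     AS stored_bytes,  -- on-disk size, after any compression
    octet_length(json_binary::text) AS text_bytes     -- size of the text rendering
FROM json_storage_test
ORDER BY stored_bytes DESC
LIMIT 10;
```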

### 41.1.2 When JSONB vs. Normalized Columns

**Use JSONB when**:
- Schema varies per row (heterogeneous data sources, user-defined fields)
- Deep nesting represents complex objects not queried relationally (configuration blobs)
- Rapid prototyping before schema crystallization
- Storing API payloads for audit/compliance (write-once, rarely query)

**Use Normalized Tables when**:
- Fields are queried individually with high selectivity (WHERE clauses)
- Data requires foreign key constraints or complex joins
- Aggregations run across the dataset (SUM, AVG on specific fields)
- Schema is stable and relationships are well-defined

**The Hybrid Pattern** (Industry Standard):
```sql
-- Hot fields in columns, variable metadata in JSONB
CREATE TABLE products (
    product_id bigint PRIMARY KEY,
    sku text NOT NULL UNIQUE,
    price_cents int NOT NULL CHECK (price_cents > 0),
    category_id int REFERENCES categories(category_id),
    
    -- Static core: queried, indexed, constrained
    
    attributes jsonb DEFAULT '{}',
    -- Variable: color, size, material, specifications per category
    -- Not every product has the same attributes
    
    created_at timestamptz DEFAULT now()
);

-- Create GIN index on JSONB for flexible attribute filtering
CREATE INDEX idx_product_attrs ON products USING GIN (attributes);

-- But query hot fields (price) via columns, not JSONB
SELECT * FROM products 
WHERE price_cents BETWEEN 1000 AND 5000  -- Column: B-tree efficient
  AND attributes @> '{"color": "red"}';   -- JSONB: GIN index
```

## 41.2 Indexing JSONB Effectively

Sequential scans that evaluate JSONB predicates must read, and often detoast, every document in the table, so they degrade quickly as data grows. Strategic indexing is non-negotiable.

### 41.2.1 GIN Indexes (Generalized Inverted Index)

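GIN indexes efficiently support containment (`@>`) and the key-existence operators: `?` (key exists), `?|` (any key exists), and `?&` (all keys exist). From PostgreSQL 12, they also support the jsonpath operators `@?` and `@@`.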

```sql
-- Default GIN index (good for general containment queries)
CREATE INDEX idx_events_data ON events USING GIN (event_data);

-- Query patterns supported:
SELECT * FROM events 
WHERE event_data @> '{"user_id": 123, "action": "purchase"}';  -- Fast: uses GIN

SELECT * FROM events 
WHERE event_data ? 'error_code';  -- Fast: checks key existence

SELECT * FROM events 
WHERE event_data ?& array['user_id', 'session_id'];  -- Has all keys
```

**GIN Index Variants**:

```sql
-- jsonb_ops (default): Supports @>, ?, ?&, ?| (plus jsonpath @? and @@ in PostgreSQL 12+)
-- Larger index, but the only variant that covers key-existence queries

CREATE INDEX idx_events_data_ops ON events 
USING GIN (event_data jsonb_ops);

-- jsonb_path_ops: Supports @> (plus jsonpath @? and @@), but not ?, ?&, ?|
-- Usually a much smaller index with faster containment lookups; measure on your own data
CREATE INDEX idx_events_data_path ON events 
USING GIN (event_data jsonb_path_ops);

-- Decision tree:
-- Need ? or ?& operators? -> jsonb_ops
-- Only @> containment? -> jsonb_path_ops (preferred for space/performance)
```
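The size difference between the two operator classes depends heavily on the shape of the documents, so it is worth measuring rather than assuming. A minimal sketch, assuming both indexes above have been built on the same `events` table:

```sql
-- Compare on-disk size of the two GIN variants
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'events'
  AND indexrelname IN ('idx_events_data_ops', 'idx_events_data_path')
ORDER BY pg_relation_size(indexrelid) DESC;
```

Once the comparison is done, keep only the variant you need; maintaining both doubles the write cost.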

**Partial GIN Indexes** (Critical for High-Write Tables):
GIN indexes are expensive to maintain. If only 10% of rows have JSONB worth indexing:

```sql
-- Only index events with non-empty metadata
-- Note: the planner uses a partial index only when the query's WHERE clause implies the index predicate
CREATE INDEX idx_events_metadata ON events 
USING GIN (event_data) 
WHERE event_data IS NOT NULL AND event_data <> '{}';

-- Or index only specific event types
CREATE INDEX idx_events_errors ON events 
USING GIN (event_data) 
WHERE event_data @> '{"level": "error"}';
```
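A partial index is only considered when the planner can prove that the query's conditions imply the index predicate. The sketch below illustrates the difference for `idx_events_errors`; the `service` key is a hypothetical attribute used only for illustration, and actual plans should be confirmed with `EXPLAIN` on your own data.

```sql
-- Can use idx_events_errors: the WHERE clause repeats the index predicate verbatim
EXPLAIN
SELECT * FROM events
WHERE event_data @> '{"level": "error"}'
  AND event_data @> '{"service": "billing"}';   -- hypothetical key

-- May not use idx_events_errors: the planner generally cannot prove that this
-- single combined containment test implies the index predicate
EXPLAIN
SELECT * FROM events
WHERE event_data @> '{"level": "error", "service": "billing"}';
```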

### 41.2.2 Expression Indexes (Projected Fields)

When querying specific JSONB paths frequently, extract them into expression indexes with B-trees for range queries and exact matching.

```sql
-- Extracting nested values for indexing
CREATE INDEX idx_events_user_id ON events 
USING BTREE ((event_data ->> 'user_id'));  -- Note: ->> returns text

-- For numeric comparisons, cast in index
CREATE INDEX idx_events_amount ON events 
USING BTREE (((event_data ->> 'amount')::numeric));

-- Query must repeat the indexed expression exactly (a text comparison here)
SELECT * FROM events 
WHERE event_data ->> 'user_id' = '12345';
-- Uses idx_events_user_id; adding a ::bigint cast would no longer match the indexed expression

-- Compound index with column + JSONB extraction
CREATE INDEX idx_events_time_user ON events 
USING BTREE (created_at, ((event_data ->> 'user_id')::bigint));
```
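As an illustration of the range-query case, the following sketch shows a query shaped to match `idx_events_amount`. It assumes every `amount` value present in `event_data` is a valid number; a non-numeric value would cause the cast, and therefore the index build, to fail.

```sql
-- The filter and sort both repeat the indexed expression, cast included
EXPLAIN
SELECT *
FROM events
WHERE (event_data ->> 'amount')::numeric BETWEEN 100 AND 500
ORDER BY (event_data ->> 'amount')::numeric;
```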

**Computed Columns** (PostgreSQL 12+):
Store extracted values in generated columns for cleaner indexing and constraints:

```sql
ALTER TABLE events 
ADD COLUMN user_id bigint 
GENERATED ALWAYS AS ((event_data ->> 'user_id')::bigint) STORED;

-- Now index the column directly (simpler to query than an expression index,
-- and the planner collects ordinary column statistics for it)
CREATE INDEX idx_events_user ON events (user_id);

-- Can even create foreign keys on extracted data (if trusted/clean)
-- ALTER TABLE events ADD CONSTRAINT fk_user 
--   FOREIGN KEY (user_id) REFERENCES users(user_id);
```

### 41.2.3 Index Anti-Patterns

```sql
-- ANTI-PATTERN: Indexing large JSONB documents
CREATE INDEX idx_bad ON large_table USING GIN (huge_jsonb_column);
-- 10MB JSONB documents create massive index entries
-- Solution: Extract keys you query, index those specifically

-- ANTI-PATTERN: GIN index on high-churn JSONB
UPDATE events SET event_data = event_data || '{"count": 1}';
-- Every update touches every GIN index entry for that row
-- GIN indexes on frequently updated JSONB cause severe bloat

-- MITIGATION: 
-- 1. Use partial index (WHERE clause) to exclude updated rows
-- 2. Store volatile counters in separate columns, not JSONB
-- 3. Use HOT updates (fillfactor, keep JSONB unchanged)

-- ANTI-PATTERN: Leading wildcards in JSONB strings
SELECT * FROM events WHERE event_data ->> 'message' LIKE '%error%';
-- No B-tree or default GIN index can help (seq scan required)
-- Solution: Use the pg_trgm extension for substring search within JSONB
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_events_msg_trgm ON events 
USING GIN ((event_data ->> 'message') gin_trgm_ops);
```

## 41.3 Query Patterns and Operators

PostgreSQL provides extensive JSONB operators with varying performance characteristics.

### 41.3.1 Navigation Operators

```sql
-- ->  Returns JSONB (can chain)
SELECT event_data -> 'user' -> 'profile' ->> 'name' FROM events;
-- event_data -> 'user' returns JSONB object
-- ->> 'name' returns text

-- #>  Path extraction (array notation)
SELECT event_data #> '{user,profile,name}' FROM events;
-- Equivalent to -> chain, cleaner for deep paths

-- #>> Returns text
SELECT event_data #>> '{user,profile,email}' FROM events;
```

**Performance Note**:
Each navigation step searches the binary structure at that level, and a large document must be fetched from TOAST and decompressed in full before any key can be read. For hot paths, flatten to top-level keys or use computed columns.

### 41.3.2 Containment and Existence

```sql
-- @>  Contains: Does left JSONB contain the right JSONB structure?
SELECT * FROM events 
WHERE event_data @> '{"status": "completed", "payment_method": "card"}';
-- Order of keys doesn't matter
-- Matches: {"status": "completed", "payment_method": "card", "amount": 100}
-- Matches: {"payment_method": "card", "status": "completed"}
-- Does not match: {"status": "completed"} (missing payment_method)

-- <@  Contained by (reverse)
SELECT * FROM events 
WHERE '{"user_id": 123}' <@ event_data;
-- Same as event_data @> '{"user_id": 123}': matches a top-level user_id of 123,
-- not a user_id nested deeper in the structure

-- ?   Key/element existence
SELECT * FROM events
WHERE event_data ? 'error_trace';  -- Top-level key exists

-- ?|  Any key exists (OR)
SELECT * FROM events
WHERE event_data ?| array['error', 'warning', 'critical'];

-- ?&  All keys exist (AND)
SELECT * FROM events
WHERE event_data ?& array['user_id', 'session_id', 'timestamp'];
```

**Containment Index Usage**:
`@>` operator utilizes GIN indexes efficiently. This is the primary query pattern for document retrieval.
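
Containment is evaluated level by level: the right-hand document must match the left-hand one at the same nesting path. The following standalone checks sketch those semantics using literal values rather than any table from this chapter.

```sql
-- Array containment ignores element order and extra elements
SELECT '{"tags": ["a", "b"]}'::jsonb @> '{"tags": ["a"]}'::jsonb;       -- true

-- Nested objects match when the path is the same
SELECT '{"user": {"id": 1}}'::jsonb @> '{"user": {"id": 1}}'::jsonb;    -- true

-- A nested key is not found when queried as a top-level key
SELECT '{"user": {"id": 1}}'::jsonb @> '{"id": 1}'::jsonb;              -- false
```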

### 41.3.3 Aggregation and Unnesting

Processing arrays within JSONB:

```sql
-- jsonb_array_elements: Expands array to set (one row per element)
-- Expanding once in FROM avoids calling the function twice in the SELECT list
SELECT 
    o.event_id,
    item ->> 'sku' AS sku,
    (item ->> 'qty')::int AS quantity
FROM orders o
CROSS JOIN LATERAL jsonb_array_elements(o.event_data -> 'items') AS item
WHERE o.event_data @> '{"status": "confirmed"}';

-- Aggregation across JSONB arrays
SELECT 
    event_data ->> 'category' AS category,
    SUM((elem ->> 'amount')::numeric) AS total
FROM events,
LATERAL jsonb_array_elements(event_data -> 'line_items') AS elem
GROUP BY event_data ->> 'category';
-- Note: the function references event_data from the preceding FROM item; the LATERAL
-- keyword is optional for functions in FROM but makes that dependency explicit
```

**Performance Warning**:
`jsonb_array_elements` is a set-returning function that materializes the entire array. For large arrays (>1000 elements), consider normalizing to a separate table or using PL/pgSQL for batch processing.
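
If large arrays are queried repeatedly, normalizing them once can be cheaper than expanding them on every read. The sketch below assumes a hypothetical `event_line_items` table and that each array element carries `sku` and `amount` keys; neither is defined elsewhere in this chapter.

```sql
-- Hypothetical child table holding one row per array element
CREATE TABLE event_line_items (
    event_id bigint NOT NULL REFERENCES events(event_id),
    item_no  bigint NOT NULL,          -- 1-based position within the original array
    sku      text,
    amount   numeric,
    PRIMARY KEY (event_id, item_no)
);

-- One-time extraction; WITH ORDINALITY preserves the array order
INSERT INTO event_line_items (event_id, item_no, sku, amount)
SELECT
    e.event_id,
    t.ord,
    t.elem ->> 'sku',
    (t.elem ->> 'amount')::numeric
FROM events e
CROSS JOIN LATERAL jsonb_array_elements(e.event_data -> 'line_items')
     WITH ORDINALITY AS t(elem, ord)
WHERE jsonb_typeof(e.event_data -> 'line_items') = 'array';  -- skip rows where the key is missing or not an array
```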

## 41.4 Mutation Strategies and Bloat Prevention

JSONB updates are row replacements, not in-place edits. This creates bloat and vacuum pressure.

### 41.4.1 The Update Cost Reality

```sql
-- JSONB "partial update" is actually:
-- 1. Read entire row (including TOAST if compressed)
-- 2. Decompress JSONB
-- 3. Modify in memory
-- 4. Compress new JSONB
-- 5. Write new row version (dead tuple created)
-- 6. Update all indexes (including GIN)

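UPDATE events 
SET event_data = jsonb_set(
    event_data, 
    '{count}', 
    -- COALESCE matters: jsonb_set returns NULL if the new value is NULL,
    -- which would wipe the whole document when "count" is missing
    to_jsonb(COALESCE((event_data ->> 'count')::int, 0) + 1)
)
WHERE event_id = 123;
-- This creates dead tuples and WAL proportional to entire JSONB size, not just the key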
```

**Mitigation Strategies**:

1. **Separate Volatile Fields**:
```sql
-- Instead of updating JSONB counter
ALTER TABLE events ADD COLUMN view_count int DEFAULT 0;
UPDATE events SET view_count = view_count + 1 WHERE event_id = 123;
-- HOT update possible when view_count is not indexed and the page has free space; minimal WAL
```

2. **Accumulate then Batch**:
```sql
-- Write deltas to separate table, apply periodically
CREATE TABLE event_counters (
    event_id bigint PRIMARY KEY REFERENCES events(event_id),
    increment int DEFAULT 0
);

-- High-speed inserts/updates (smaller row, less bloat)
INSERT INTO event_counters (event_id, increment) 
VALUES (123, 1)
ON CONFLICT (event_id) DO UPDATE SET increment = event_counters.increment + 1;

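-- Periodic merge (every 5 minutes)
-- Drain the counters in the same statement so each delta is applied exactly once
WITH drained AS (
    DELETE FROM event_counters
    RETURNING event_id, increment
)
UPDATE events e
SET event_data = jsonb_set(
    e.event_data, 
    '{view_count}', 
    to_jsonb(COALESCE((e.event_data ->> 'view_count')::int, 0) + d.increment)
)
FROM drained d
WHERE e.event_id = d.event_id;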
```

3. **jsonb_set vs jsonb_insert vs concatenation**:
```sql
-- jsonb_set: Updates or inserts key (creates new object)
SELECT jsonb_set('{"a":1}'::jsonb, '{b}', '2'::jsonb);           -- {"a": 1, "b": 2}

-- jsonb_insert: Inserts into array at specific position
SELECT jsonb_insert('{"a":[1,3]}'::jsonb, '{a,1}', '2'::jsonb);  -- {"a": [1, 2, 3]}

-- || concatenation: Merges top-level keys (last wins on conflicts)
SELECT '{"a":1}'::jsonb || '{"b":2}'::jsonb;  -- {"a": 1, "b": 2}
SELECT '{"a":1}'::jsonb || '{"a":2}'::jsonb;  -- {"a": 2}  (overwrites)
```
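
One way to check whether these mitigations are working is to watch the table's update statistics. The sketch below reads the cumulative counters in `pg_stat_user_tables` for the `events` table; the percentage column is a derived figure added here for convenience.

```sql
-- Share of updates that were HOT, plus dead-tuple count and last autovacuum run
SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    n_tup_upd,
    n_tup_hot_upd,
    round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct,
    last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'events';
```

A low HOT percentage on a frequently updated table suggests that updates are touching indexed columns or JSONB covered by an index.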

### 41.4.2 Schema Evolution in JSONB

Unlike rigid columns, JSONB accommodates schema drift but requires migration discipline.

**Versioning Pattern**:
```sql
-- Track schema version inside the document under a "version" key

-- Migration logic in application or trigger
CREATE OR REPLACE FUNCTION migrate_event_data()
RETURNS trigger AS $$
BEGIN
    IF NOT NEW.event_data ? 'version' THEN
        -- Migrate v0 to v1: Flatten nested user object
        -- Guard: jsonb_set returns NULL (wiping the document) if the new value is NULL
        IF NEW.event_data #> '{user,id}' IS NOT NULL THEN
            NEW.event_data = jsonb_set(
                NEW.event_data,
                '{user_id}',
                NEW.event_data #> '{user,id}'
            ) - 'user';
        END IF;
        NEW.event_data = NEW.event_data || '{"version": "1"}'::jsonb;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER event_migration
    BEFORE INSERT OR UPDATE ON events
    FOR EACH ROW
    EXECUTE FUNCTION migrate_event_data();
```

**Constraint Validation** (Schema Enforcement):
```sql
-- Enforce required top-level keys
ALTER TABLE events 
ADD CONSTRAINT check_event_structure 
CHECK (
    event_data ? 'event_type' AND 
    event_data ? 'timestamp' AND
    jsonb_typeof(event_data -> 'payload') = 'object'
);

-- Validate data types (example: ensure amount is numeric)
CREATE OR REPLACE FUNCTION validate_event()
RETURNS trigger AS $$
BEGIN
    IF NEW.event_data ->> 'amount' IS NOT NULL THEN
        PERFORM (NEW.event_data ->> 'amount')::numeric;
    END IF;
    RETURN NEW;
EXCEPTION WHEN invalid_text_representation THEN
    RAISE EXCEPTION 'Amount must be numeric';
END;
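$$ LANGUAGE plpgsql;

-- Attach the validation function (it has no effect until a trigger calls it)
CREATE TRIGGER event_validation
    BEFORE INSERT OR UPDATE ON events
    FOR EACH ROW
    EXECUTE FUNCTION validate_event();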
```
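
When the requirement is that `amount` be stored as a JSON number rather than a numeric-looking string, a CHECK constraint may be enough and avoids the trigger. This is a sketch with a hypothetical constraint name; note that it is stricter than the trigger approach, since a value such as `"12.50"` (a JSON string) would be rejected.

```sql
-- Allow rows without "amount"; otherwise require a JSON number
ALTER TABLE events
ADD CONSTRAINT check_event_amount_type
CHECK (
    NOT (event_data ? 'amount')
    OR jsonb_typeof(event_data -> 'amount') = 'number'
);
```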

## 41.5 Migration Patterns

### 41.5.1 JSONB to Normalized Tables (Extraction)

When JSONB grows too large or query patterns stabilize:

```sql
-- Step 1: Add columns
ALTER TABLE events 
ADD COLUMN user_id bigint,
ADD COLUMN event_type text;

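-- Step 2: Backfill in batches (non-blocking)
-- COMMIT inside a DO block requires PostgreSQL 11+ and fails if the block runs inside an explicit transaction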
DO $$
DECLARE
    batch_size int := 1000;
    rows_updated int;
BEGIN
    LOOP
        WITH batch AS (
            SELECT event_id, event_data 
            FROM events 
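            WHERE user_id IS NULL  -- Unmigrated
              AND event_data ->> 'user_id' IS NOT NULL  -- Skip rows with nothing to extract, or the loop never ends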
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
        )
        UPDATE events e
        SET 
            user_id = (b.event_data ->> 'user_id')::bigint,
            event_type = b.event_data ->> 'type'
        FROM batch b
        WHERE e.event_id = b.event_id;
        
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        COMMIT;
    END LOOP;
END $$;

-- Step 3: Create indexes on new columns
CREATE INDEX idx_events_user ON events (user_id);

-- Step 4: Dual-write in application (write to both JSONB and columns)
-- Step 5: Switch reads to columns
-- Step 6: Eventually stop dual-writing and drop the migrated keys from JSONB
```
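
Before switching reads to the new columns, it helps to confirm that the backfill and dual-writes agree with the source document. A minimal sketch, assuming the `user_id` and `event_type` columns added in Step 1:

```sql
-- Count rows where the extracted columns disagree with the JSONB source
-- IS DISTINCT FROM treats NULL as a comparable value, so missing keys are handled
SELECT count(*) AS mismatched_rows
FROM events
WHERE user_id    IS DISTINCT FROM (event_data ->> 'user_id')::bigint
   OR event_type IS DISTINCT FROM (event_data ->> 'type');
```

A result of zero suggests the columns are safe to read from; anything else points to rows that still need backfilling or to a gap in the dual-write path.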

### 41.5.2 Normalized to JSONB (Consolidation)

For truly dynamic attributes that resist normalization:

```sql
-- Migrate variable attributes to JSONB
ALTER TABLE products 
ADD COLUMN attributes jsonb DEFAULT '{}';

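UPDATE products 
SET attributes = jsonb_strip_nulls(jsonb_build_object(
    'color', color,
    'size', size,
    'material', material
)) || COALESCE(attributes, '{}');
-- jsonb_strip_nulls omits keys whose source column is NULL instead of storing JSON nulls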

-- Drop old columns after verification
ALTER TABLE products DROP COLUMN color, DROP COLUMN size, DROP COLUMN material;
```

## Chapter Summary

In this chapter, you learned:

1. **Storage Architecture**: JSONB uses a decomposed binary format. It sorts object keys and eliminates duplicates (last value wins), and is often slightly larger on disk than the equivalent text JSON. Rows exceeding roughly 2KB trigger TOAST compression and out-of-line storage of large documents, which adds read overhead.

2. **Modeling Strategy**: Use hybrid approaches—store frequently queried, constrained fields in traditional columns; use JSONB for variable schema attributes, audit payloads, and heterogeneous metadata. Avoid storing volatile counters or frequently updated values in JSONB due to update costs.

3. **Indexing**: Use GIN indexes with `jsonb_path_ops` for containment queries (`@>`) to get a noticeably smaller index. Use expression indexes (`((data->>'key')::type)`) for range queries and exact matches on extracted values, and make sure queries repeat the indexed expression exactly. Partial GIN indexes avoid indexing overhead on rows that are never searched by JSONB content.

4. **Query Optimization**: Leverage containment operators (`@>`) for document lookup; they utilize GIN indexes efficiently. Use `jsonb_array_elements` sparingly on large arrays (consider normalization instead). Extract hot paths to computed/generated columns for B-tree indexing.

5. **Mutation Costs**: JSONB updates write a new row version and, unless the update qualifies as HOT, new entries in every index including GIN, creating bloat. Mitigate by separating volatile fields into regular columns, batching updates via delta tables, or using HOT-update-friendly patterns (keep JSONB immutable once written).

6. **Schema Evolution**: Implement versioning within JSONB documents and migration triggers for gradual schema upgrades. Use CHECK constraints and validation triggers to enforce structural integrity when strict typing is required.

7. **Migration Paths**: Extract JSONB to columns using batched backfills that commit per batch to avoid long transactions, with `FOR UPDATE SKIP LOCKED` to avoid waiting on rows locked by concurrent writers. Consolidate columns to JSONB for truly dynamic attributes, but maintain indexes on critical extracted fields for performance.

---

**Next:** In Chapter 42, we will examine Full-Text Search—covering the `tsvector` and `tsquery` types, text search configurations and dictionaries, ranking algorithms, highlighting search results, and architectural decisions between PostgreSQL FTS versus dedicated search engines like Elasticsearch.

