# Chapter 10: Data Modeling for OLTP Systems

Online Transaction Processing (OLTP) systems demand data models that prioritize data integrity, concurrency, and query performance for operational workloads. This chapter establishes industry-standard patterns for designing PostgreSQL schemas that scale from startup to enterprise, balancing theoretical normalization with pragmatic denormalization where appropriate.

## 10.1 Normalization Fundamentals

### 10.1.1 First Normal Form (1NF): Atomicity and Repeating Groups

1NF eliminates repeating groups and requires atomic column values. PostgreSQL's type system, constraints, and indexes all assume this: each of them operates on a whole column value, not on fragments inside it.

```sql
-- Anti-pattern: Violating 1NF with comma-separated values or arrays in OLTP
CREATE TABLE orders_violating_1nf (
    order_id SERIAL PRIMARY KEY,
    customer_email VARCHAR(255),
    product_skus VARCHAR(1000),  -- 'SKU-001,SKU-002,SKU-003' - Violation!
    quantities VARCHAR(255)      -- '2,1,5' - Parallel arrays, violation!
);

-- Problems:
-- 1. Cannot index individual SKUs for product-based queries
-- 2. Cannot enforce foreign key constraints to products table
-- 3. Quantities are strings, not integers (type safety lost)
-- 4. Querying "orders containing SKU-001" requires full table scan + regex

-- 1NF Compliant: Separate table for line items
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_email VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE order_items (
    order_item_id BIGSERIAL PRIMARY KEY,
    order_id BIGINT NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
    product_sku VARCHAR(50) NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price_cents INTEGER NOT NULL,
    
    UNIQUE(order_id, product_sku)  -- Prevent duplicate line items
);

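-- The UNIQUE (order_id, product_sku) constraint already creates a B-tree index
-- led by order_id, which also serves "all items of order X" lookups.
-- PostgreSQL does NOT index foreign-key columns automatically; only the
-- referenced side (orders.order_id) is indexed, by its primary key.
-- Product-based queries need their own index:
CREATE INDEX idx_order_items_product_sku ON order_items(product_sku);

-- Querying is now efficient and type-safe
SELECT o.order_id, o.customer_email, oi.product_sku, oi.quantity
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE oi.product_sku = 'SKU-001';  -- Can use idx_order_items_product_sku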
```
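Existing data in the comma-separated shape can usually be migrated with a single `INSERT ... SELECT`. The sketch below assumes hypothetical tables `legacy_orders` and `legacy_order_items` (customer columns omitted) and relies on `string_to_array` plus the multi-argument form of `unnest`, which zips parallel arrays position by position.

```sql
-- Hypothetical legacy table in the comma-separated shape shown above
CREATE TABLE legacy_orders (
    order_id INTEGER PRIMARY KEY,
    product_skus VARCHAR(1000),
    quantities VARCHAR(255)
);
INSERT INTO legacy_orders VALUES
    (1, 'SKU-001,SKU-002', '2,1'),
    (2, 'SKU-003', '5');

-- 1NF target: one row per (order, SKU) with a real integer quantity
CREATE TABLE legacy_order_items (
    order_id INTEGER NOT NULL REFERENCES legacy_orders(order_id),
    product_sku VARCHAR(50) NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_sku)
);

-- unnest() with two arguments (allowed only in FROM) zips the parallel lists.
-- If the lists differ in length, the shorter one is padded with NULL, so the
-- NOT NULL constraints reject the row instead of silently misaligning it.
-- ::INTEGER fails loudly on a non-numeric quantity.
INSERT INTO legacy_order_items (order_id, product_sku, quantity)
SELECT o.order_id, trim(t.sku), trim(t.qty)::INTEGER
FROM legacy_orders AS o
CROSS JOIN LATERAL unnest(
    string_to_array(o.product_skus, ','),
    string_to_array(o.quantities, ',')
) AS t(sku, qty);
```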

### 10.1.2 Second Normal Form (2NF): Eliminating Partial Dependencies

2NF requires that every non-key attribute depend on the **entire** candidate key, not just part of it. It only matters for composite keys: a 1NF table whose candidate keys are all single-column is automatically in 2NF.

```sql
-- Anti-pattern: Partial dependency (violates 2NF)
CREATE TABLE order_item_suppliers (
    order_id BIGINT,
    product_sku VARCHAR(50),
    supplier_name VARCHAR(255),    -- Depends only on product_sku, not order_id+sku
    supplier_address TEXT,         -- Also depends only on product_sku
    quantity INTEGER,
    PRIMARY KEY (order_id, product_sku)
);
-- supplier_name depends only on product_sku (part of the key), not the full key

-- 2NF Compliant: Separate supplier information
-- (suppliers is created first because products references it)
CREATE TABLE suppliers (
    supplier_id BIGSERIAL PRIMARY KEY,
    supplier_name VARCHAR(255) NOT NULL,
    supplier_address TEXT
);

CREATE TABLE products (
    product_sku VARCHAR(50) PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    supplier_id BIGINT REFERENCES suppliers(supplier_id)
    -- other product-specific fields
);

CREATE TABLE order_items_2nf (
    order_id BIGINT REFERENCES orders(order_id),
    product_sku VARCHAR(50) REFERENCES products(product_sku),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_sku)
);
-- Now supplier info is stored once per product, not duplicated per order line
```
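Before decomposing, it can help to confirm that the dependency actually holds in the data, because any violation is an update anomaly that has already happened. A minimal sketch, assuming a hypothetical table `fd_check_items` with sample rows: if `product_sku -> supplier_name` holds, every SKU maps to exactly one supplier name.

```sql
-- Hypothetical denormalized table with the same shape as order_item_suppliers
CREATE TABLE fd_check_items (
    order_id BIGINT,
    product_sku VARCHAR(50),
    supplier_name VARCHAR(255),
    quantity INTEGER,
    PRIMARY KEY (order_id, product_sku)
);
INSERT INTO fd_check_items VALUES
    (1, 'SKU-001', 'Acme', 2),
    (2, 'SKU-001', 'Acme Corp', 1),   -- same SKU, drifted supplier name
    (2, 'SKU-002', 'Globex', 4);

-- Each SKU returned here breaks product_sku -> supplier_name and must be
-- cleaned up before supplier data can move to products/suppliers
SELECT product_sku,
       array_agg(DISTINCT supplier_name ORDER BY supplier_name) AS conflicting_names
FROM fd_check_items
GROUP BY product_sku
HAVING COUNT(DISTINCT supplier_name) > 1;
```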

### 10.1.3 Third Normal Form (3NF): Eliminating Transitive Dependencies

3NF requires that non-key attributes depend **only** on the primary key, not on other non-key attributes.

```sql
-- Anti-pattern: Transitive dependency (violates 3NF)
CREATE TABLE employees_violating_3nf (
    employee_id SERIAL PRIMARY KEY,
    employee_name VARCHAR(255),
    department_id INTEGER,
    department_name VARCHAR(255),  -- Depends on department_id, not employee_id
    department_location VARCHAR(255)  -- Also depends on department_id
);
-- department_name depends on department_id (non-key), which depends on employee_id

-- 3NF Compliant
CREATE TABLE departments (
    department_id SERIAL PRIMARY KEY,
    department_name VARCHAR(255) NOT NULL,
    department_location VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE employees (
    employee_id BIGSERIAL PRIMARY KEY,
    employee_name VARCHAR(255) NOT NULL,
    department_id INTEGER REFERENCES departments(department_id),
    hire_date DATE NOT NULL,
    -- Only attributes directly describing the employee
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Benefits:
-- 1. Department info stored once (storage efficiency)
-- 2. Update department name in one place (update anomaly prevention)
-- 3. Deleting a department's last employee no longer loses the department's info (deletion anomaly prevention)
```
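Splitting an existing 3NF-violating table takes one `INSERT ... SELECT DISTINCT` per target table. The sketch below uses hypothetical tables `emp_flat`, `dept_3nf`, and `emp_3nf`; the primary key on the new department table doubles as a consistency check on the old data.

```sql
-- Hypothetical source rows in the 3NF-violating shape
CREATE TABLE emp_flat (
    employee_id INTEGER PRIMARY KEY,
    employee_name VARCHAR(255) NOT NULL,
    department_id INTEGER NOT NULL,
    department_name VARCHAR(255) NOT NULL,
    department_location VARCHAR(255) NOT NULL
);
INSERT INTO emp_flat VALUES
    (1, 'Ada',   10, 'Engineering', 'Berlin'),
    (2, 'Grace', 10, 'Engineering', 'Berlin'),
    (3, 'Alan',  20, 'Research',    'London');

CREATE TABLE dept_3nf (
    department_id INTEGER PRIMARY KEY,
    department_name VARCHAR(255) NOT NULL,
    department_location VARCHAR(255) NOT NULL
);
CREATE TABLE emp_3nf (
    employee_id INTEGER PRIMARY KEY,
    employee_name VARCHAR(255) NOT NULL,
    department_id INTEGER NOT NULL REFERENCES dept_3nf(department_id)
);

-- One row per department. If two employees disagreed on a department's name,
-- DISTINCT would yield two rows with the same department_id and the primary
-- key would reject them, surfacing the inconsistency.
INSERT INTO dept_3nf
SELECT DISTINCT department_id, department_name, department_location FROM emp_flat;

INSERT INTO emp_3nf
SELECT employee_id, employee_name, department_id FROM emp_flat;

-- Renaming a department is now a single-row update
UPDATE dept_3nf SET department_name = 'Platform Engineering' WHERE department_id = 10;
```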

### 10.1.4 Boyce-Codd Normal Form (BCNF): Handling Anomalies

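BCNF is a stricter form of 3NF. A table is in BCNF if every determinant is a candidate key: for every non-trivial functional dependency X → Y, X is a candidate key (or contains one). 3NF still tolerates X → Y when X is not a key but Y is part of some candidate key; BCNF does not.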

```sql
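-- Example: each professor teaches exactly one course, a course can have several
-- professors, and a student takes a given course with exactly one professor
-- FDs: (student_id, course_id) -> professor_id   and   professor_id -> course_id
CREATE TABLE student_courses (
    student_id BIGINT,
    course_id BIGINT,
    professor_id BIGINT NOT NULL,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id)
);
-- Candidate keys: (student_id, course_id) and (student_id, professor_id).
-- course_id is part of a candidate key, so professor_id -> course_id is allowed by 3NF,
-- but its determinant (professor_id) is not a candidate key: BCNF violation

-- Problem: moving a professor to another course means updating every enrollment row,
-- nothing stops two rows from disagreeing on a professor's course,
-- and deleting a professor's last student loses the professor-course assignment

-- BCNF Compliant: Decompose on the violating dependency
CREATE TABLE professor_courses (
    professor_id BIGINT PRIMARY KEY REFERENCES professors(professor_id),
    course_id BIGINT NOT NULL REFERENCES courses(course_id)
);

CREATE TABLE student_enrollments (
    student_id BIGINT REFERENCES students(student_id),
    professor_id BIGINT REFERENCES professor_courses(professor_id),
    grade VARCHAR(2),
    PRIMARY KEY (student_id, professor_id)
);
-- Now each professor's course is stored exactly once.
-- Trade-off: "one professor per student per course" is no longer a single-table key
-- and needs a trigger or application check (BCNF decompositions are not always
-- dependency-preserving)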
```

## 10.2 Pragmatic Denormalization

### 10.2.1 When to Denormalize (Calculated Redundancy)

Normalization reduces redundancy but increases join complexity. Strategic denormalization improves read performance for hot paths.

```sql
-- Scenario: Order totals are queried constantly, and each query joins and aggregates every line item
-- Normalized approach (correct but slow for high-traffic order history)
SELECT 
    o.order_id,
    o.customer_email,
    SUM(oi.quantity * oi.unit_price_cents) as total_cents,
    COUNT(DISTINCT oi.product_sku) as item_count
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.customer_email = 'user@example.com'
GROUP BY o.order_id;

-- Pragmatic denormalization: Store computed totals (OLTP cache pattern)
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_email VARCHAR(255) NOT NULL,
    total_cents INTEGER NOT NULL DEFAULT 0,  -- Denormalized
    item_count INTEGER NOT NULL DEFAULT 0,    -- Denormalized
    created_at TIMESTAMPTZ DEFAULT NOW(),
    
    CONSTRAINT positive_total CHECK (total_cents >= 0),
    CONSTRAINT positive_items CHECK (item_count >= 0)
);

-- Maintain consistency via triggers (runs inside the writing transaction)
CREATE OR REPLACE FUNCTION update_order_totals()
RETURNS TRIGGER AS $$
DECLARE
    target_id BIGINT;
BEGIN
    -- NEW is NULL for DELETE and OLD is NULL for INSERT (PostgreSQL 11+).
    -- An UPDATE that moves a line item to another order must refresh both orders.
    FOR target_id IN
        SELECT DISTINCT t.id
        FROM (VALUES (OLD.order_id), (NEW.order_id)) AS t(id)
        WHERE t.id IS NOT NULL
    LOOP
        UPDATE orders
        SET
            total_cents = (
                SELECT COALESCE(SUM(quantity * unit_price_cents), 0)
                FROM order_items
                WHERE order_id = target_id
            ),
            item_count = (
                SELECT COUNT(*)
                FROM order_items
                WHERE order_id = target_id
            )
        WHERE order_id = target_id;
    END LOOP;

    RETURN NULL;  -- Return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_update_order_totals
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW
EXECUTE FUNCTION update_order_totals();

-- Rules for denormalization:
-- 1. Only roll aggregates up the hierarchy (the parent row stores summaries of its children)
-- 2. Never store mutable data in multiple places without triggers/constraints keeping them in sync
-- 3. Document denormalized fields as "derived" in schema comments
COMMENT ON COLUMN orders.total_cents IS 'DERIVED: Sum of line items. Maintained by trigger.';
```
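Trigger-maintained aggregates are only as reliable as the paths that write them: bulk loads run with triggers disabled (for example under `session_replication_role = replica`) or a bug in the trigger leave totals stale, and a recompute-by-subquery trigger may also miss a concurrent transaction's line items under READ COMMITTED. A periodic reconciliation query catches such drift. This sketch assumes hypothetical tables `recon_orders` and `recon_order_items`, seeded with one deliberately wrong total.

```sql
-- Hypothetical minimal tables mirroring orders/order_items (no trigger here)
CREATE TABLE recon_orders (
    order_id BIGINT PRIMARY KEY,
    total_cents BIGINT NOT NULL DEFAULT 0,
    item_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE recon_order_items (
    order_id BIGINT NOT NULL REFERENCES recon_orders(order_id),
    product_sku VARCHAR(50) NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_sku)
);
INSERT INTO recon_orders VALUES (1, 700, 2), (2, 999, 1);  -- order 2 has drifted
INSERT INTO recon_order_items VALUES
    (1, 'SKU-001', 2, 250), (1, 'SKU-002', 1, 200),
    (2, 'SKU-003', 1, 500);

-- Compare stored aggregates with freshly computed ones.
-- LEFT JOIN keeps orders without line items (expected: 0 and 0);
-- ::BIGINT avoids integer overflow in the multiplication.
SELECT o.order_id,
       o.total_cents, COALESCE(a.total_cents, 0) AS expected_total_cents,
       o.item_count,  COALESCE(a.item_count, 0)  AS expected_item_count
FROM recon_orders AS o
LEFT JOIN (
    SELECT order_id,
           SUM(quantity::BIGINT * unit_price_cents) AS total_cents,
           COUNT(*) AS item_count
    FROM recon_order_items
    GROUP BY order_id
) AS a ON a.order_id = o.order_id
WHERE o.total_cents <> COALESCE(a.total_cents, 0)
   OR o.item_count  <> COALESCE(a.item_count, 0);
```

Rows returned by the check can be repaired with the same recomputation the trigger performs.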

### 10.2.2 Materialized View Strategy (Read-Heavy Workloads)

For complex aggregations that don't need real-time accuracy, materialized views provide controlled denormalization.

```sql
-- Complex report query joining 4 tables
CREATE MATERIALIZED VIEW mv_monthly_sales_summary AS
SELECT 
    DATE_TRUNC('month', o.created_at) as month,
    p.category_id,
    c.category_name,
    COUNT(DISTINCT o.order_id) as order_count,
    SUM(oi.quantity) as units_sold,
    SUM(oi.quantity * oi.unit_price_cents) as revenue_cents,
    SUM(oi.quantity * oi.unit_price_cents) / COUNT(DISTINCT o.order_id) as avg_order_value_cents  -- Per order, not per line item
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_sku = p.product_sku
JOIN categories c ON p.category_id = c.category_id
WHERE o.status = 'completed'
GROUP BY 1, 2, 3;

-- A unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_mv_monthly_summary_unique 
ON mv_monthly_sales_summary(month, category_id);

-- Refresh strategy (scheduled via pg_cron or application)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_sales_summary;
-- CONCURRENTLY lets reads continue during the refresh; it needs the unique index
-- above and a view that has already been populated once

-- Access pattern (treat as table)
SELECT * FROM mv_monthly_sales_summary 
WHERE month >= DATE_TRUNC('month', NOW() - INTERVAL '3 months');
```
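A materialized view that has never been populated cannot be refreshed concurrently, which matters when it is created `WITH NO DATA` (for instance to keep a deploy from running the full query). The sketch below, with hypothetical names `mv_demo_sales` and `mv_demo_monthly`, does the first refresh without `CONCURRENTLY` and later ones with it; how the later refreshes are scheduled is left out.

```sql
-- Hypothetical base table
CREATE TABLE mv_demo_sales (
    sale_id BIGINT PRIMARY KEY,
    sold_at TIMESTAMPTZ NOT NULL,
    amount_cents BIGINT NOT NULL
);
INSERT INTO mv_demo_sales VALUES
    (1, '2024-01-05 10:00+00', 1000),
    (2, '2024-01-20 12:00+00', 500),
    (3, '2024-02-02 09:00+00', 700);

-- WITH NO DATA: the view exists but raises an error when queried until refreshed
CREATE MATERIALIZED VIEW mv_demo_monthly AS
SELECT DATE_TRUNC('month', sold_at AT TIME ZONE 'UTC') AS month,
       SUM(amount_cents) AS revenue_cents
FROM mv_demo_sales
GROUP BY 1
WITH NO DATA;

CREATE UNIQUE INDEX mv_demo_monthly_month_idx ON mv_demo_monthly (month);

-- First refresh must be a plain one (CONCURRENTLY is rejected on an unpopulated view)
REFRESH MATERIALIZED VIEW mv_demo_monthly;

-- Subsequent refreshes can run alongside readers
INSERT INTO mv_demo_sales VALUES (4, '2024-02-15 09:00+00', 300);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_demo_monthly;
```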

## 10.3 Key Selection Strategies

### 10.3.1 Natural vs Surrogate Keys

The choice between meaningful business keys (natural) and artificial identifiers (surrogate) impacts schema flexibility and join performance.

```sql
-- Natural Key: SSN, Email, ISBN (has business meaning)
CREATE TABLE citizens (
    ssn VARCHAR(11) PRIMARY KEY,  -- Natural key (format: XXX-XX-XXXX)
    full_name VARCHAR(255) NOT NULL
);
-- Pros: No additional index needed, meaningful in URLs/debugging
-- Cons: Can change (SSN reissued, email changes), exposes PII, long/composite

-- Surrogate Key: Auto-increment or UUID (no business meaning)
CREATE TABLE citizens_surrogate (
    citizen_id BIGSERIAL PRIMARY KEY,  -- Surrogate
    ssn VARCHAR(11) UNIQUE NOT NULL,   -- Still enforce uniqueness, but not PK
    full_name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE
);
-- Pros: Immutable, stable references, short (4-8 bytes), no PII exposure
-- Cons: Extra index storage, meaningless to humans

-- Industry Standard: Surrogate keys for internal references, unique constraints on natural keys
CREATE TABLE products (
    product_id BIGSERIAL PRIMARY KEY,        -- Internal reference
    sku VARCHAR(50) UNIQUE NOT NULL,         -- Business natural key
    upc VARCHAR(12) UNIQUE,                  -- External natural key
    name VARCHAR(255) NOT NULL
);
-- Foreign keys reference product_id (stable, never changes even if SKU changes)
```
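The payoff of the surrogate key shows up when a business key is re-coded. In this sketch (hypothetical tables `sk_products` and `sk_order_lines`, using `GENERATED ALWAYS AS IDENTITY`, the SQL-standard alternative to `BIGSERIAL` available since PostgreSQL 10), changing a SKU updates one row and leaves every reference intact.

```sql
-- Hypothetical tables: order lines reference the surrogate product_id, not the SKU
CREATE TABLE sk_products (
    product_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL
);
CREATE TABLE sk_order_lines (
    order_line_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_id BIGINT NOT NULL REFERENCES sk_products(product_id),
    quantity INTEGER NOT NULL
);
INSERT INTO sk_products (sku, name) VALUES ('SKU-001', 'Widget');
INSERT INTO sk_order_lines (product_id, quantity)
SELECT product_id, 3 FROM sk_products WHERE sku = 'SKU-001';

-- Re-coding the SKU touches exactly one row; sk_order_lines is unchanged.
-- With sku as the FK target, every referencing row would have to be rewritten
-- (for example via ON UPDATE CASCADE).
UPDATE sk_products SET sku = 'WID-0001' WHERE sku = 'SKU-001';

SELECT l.order_line_id, p.sku, l.quantity
FROM sk_order_lines AS l
JOIN sk_products AS p USING (product_id);
```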

### 10.3.2 Sequential vs UUID Primary Keys

PostgreSQL-specific considerations for key generation strategies.

```sql
-- BigSerial (Sequential): Best for OLTP, clustered data, foreign key performance
CREATE TABLE events_sequential (
    event_id BIGSERIAL PRIMARY KEY,  -- 8 bytes
    event_type VARCHAR(50) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Pros: Compact (8 bytes), ordered inserts (clustering), human-readable
-- Cons: Predictable (security), collision risk in distributed systems, write hotspot

-- UUID v4 (Random): Best for distributed systems, merge safety
-- gen_random_uuid() is built in since PostgreSQL 13 (older versions: pgcrypto or uuid-ossp)
CREATE TABLE events_uuid (
    event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- 16 bytes
    event_type VARCHAR(50) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Pros: Globally unique, merge-safe across databases, non-predictable
-- Cons: 2x storage, random insert pattern (page splits), poor cache locality

-- UUID v7 (Time-ordered): a millisecond timestamp prefix followed by random bits,
-- so values stay globally unique but arrive in roughly ascending order
-- PostgreSQL 18 adds a built-in uuidv7() function; on earlier versions generate
-- v7 values in the application (uuid-ossp does not produce v7)
-- Stored in the UUID type, sorts chronologically

-- Sequential with distributed safety (Snowflake-style)
-- Use BIGINT with application-generated IDs combining timestamp + node + sequence
-- Or give each shard its own sequence with a shared INCREMENT BY and a distinct START

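-- Index considerations
-- Sequential IDs: heap order roughly follows insertion order, so PK range scans touch
-- few pages (PostgreSQL does not keep tables clustered; CLUSTER is a one-time reorder)
-- UUIDs: the PK B-tree gives no time locality; for time-range queries on an
-- append-mostly table, a small BRIN index on created_at is often enough
CREATE INDEX idx_events_uuid_created_at ON events_uuid USING BRIN (created_at);
-- A partial index with NOW() in its WHERE clause would be rejected:
-- index predicates may only use IMMUTABLE functions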
```
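The per-shard sequence idea can be sketched with plain `CREATE SEQUENCE`: every shard uses the same `INCREMENT BY` and a different `START WITH`, so the generated sets never overlap. The shard count (16) and all table and sequence names below are hypothetical.

```sql
-- Hypothetical layout with room for 16 shards; shard k starts at k
CREATE SEQUENCE shard1_event_id_seq START WITH 1 INCREMENT BY 16;  -- 1, 17, 33, ...
CREATE SEQUENCE shard2_event_id_seq START WITH 2 INCREMENT BY 16;  -- 2, 18, 34, ...

CREATE TABLE shard1_events (
    event_id BIGINT PRIMARY KEY DEFAULT nextval('shard1_event_id_seq'),
    event_type VARCHAR(50) NOT NULL
);
CREATE TABLE shard2_events (
    event_id BIGINT PRIMARY KEY DEFAULT nextval('shard2_event_id_seq'),
    event_type VARCHAR(50) NOT NULL
);

INSERT INTO shard1_events (event_type) VALUES ('login'), ('logout');
INSERT INTO shard2_events (event_type) VALUES ('login');

-- The owning shard is recoverable from the ID alone: ((event_id - 1) % 16) + 1
SELECT event_id, ((event_id - 1) % 16) + 1 AS shard FROM shard1_events
UNION ALL
SELECT event_id, ((event_id - 1) % 16) + 1 FROM shard2_events
ORDER BY event_id;
```

The increment fixes the maximum shard count up front, and, as with any sequence, IDs are unique but not gap-free because `nextval()` is never rolled back.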

### 10.3.3 Composite Keys for Junction Tables

Many-to-many relationships require thoughtful primary key design.

```sql
-- Pure junction table (just linking)
CREATE TABLE product_tags (
    product_id BIGINT REFERENCES products(product_id) ON DELETE CASCADE,
    tag_id BIGINT REFERENCES tags(tag_id) ON DELETE CASCADE,
    
    PRIMARY KEY (product_id, tag_id)  -- Composite natural key
);
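-- Pros: Prevents duplicate links; the PK index also serves lookups by product_id
-- (including ON DELETE CASCADE from products)
-- Cons: 16 bytes per PK (8+8), referencing this table requires both columns,
-- and lookups or cascades by tag_id need a separate index on tag_id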

-- Junction with payload (attributes on the relationship)
CREATE TABLE employee_projects (
    employee_id BIGINT REFERENCES employees(employee_id) ON DELETE CASCADE,
    project_id BIGINT REFERENCES projects(project_id) ON DELETE CASCADE,
    role_on_project VARCHAR(100) NOT NULL,
    joined_date DATE NOT NULL,
    allocation_percent INTEGER DEFAULT 100,
    
    PRIMARY KEY (employee_id, project_id),
    CONSTRAINT valid_allocation CHECK (allocation_percent BETWEEN 0 AND 100)
);

CREATE INDEX idx_employee_projects_project ON employee_projects(project_id);
-- Needed for "find all employees on project X" queries

-- Surrogate key alternative (if relationship has its own lifecycle)
CREATE TABLE enrollments (
    enrollment_id BIGSERIAL PRIMARY KEY,
    student_id BIGINT REFERENCES students(student_id),
    course_id BIGINT REFERENCES courses(course_id),
    enrolled_at TIMESTAMPTZ DEFAULT NOW(),
    grade VARCHAR(2),
    
    UNIQUE(student_id, course_id)  -- Still enforce uniqueness
);
-- Use when: Relationship has many attributes, history tracking (soft delete), 
-- or when other tables need to FK to this relationship
```
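A composite primary key also makes link creation idempotent with `ON CONFLICT ... DO NOTHING`. The sketch below uses hypothetical `jt_` tables and adds the reverse-direction index on `tag_id` that the composite key does not cover.

```sql
CREATE TABLE jt_products (product_id BIGINT PRIMARY KEY);
CREATE TABLE jt_tags (tag_id BIGINT PRIMARY KEY);
CREATE TABLE jt_product_tags (
    product_id BIGINT REFERENCES jt_products(product_id) ON DELETE CASCADE,
    tag_id BIGINT REFERENCES jt_tags(tag_id) ON DELETE CASCADE,
    PRIMARY KEY (product_id, tag_id)
);
-- Reverse direction: "all products with tag X" and cascades from jt_tags
CREATE INDEX jt_product_tags_tag_idx ON jt_product_tags (tag_id);

INSERT INTO jt_products VALUES (1), (2);
INSERT INTO jt_tags VALUES (10), (20);

INSERT INTO jt_product_tags (product_id, tag_id)
VALUES (1, 10), (1, 20), (2, 10)
ON CONFLICT (product_id, tag_id) DO NOTHING;

-- Re-tagging is a no-op instead of a unique-violation error
INSERT INTO jt_product_tags (product_id, tag_id)
VALUES (1, 10)
ON CONFLICT (product_id, tag_id) DO NOTHING;

-- Deleting a tag removes its links through ON DELETE CASCADE
DELETE FROM jt_tags WHERE tag_id = 20;
```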

## 10.4 Many-to-Many Relationships and Association Metadata

### 10.4.1 Rich Association Tables

Real-world many-to-many relationships almost always carry metadata about the association.

```sql
-- E-commerce: Orders and Products (line items = rich association)
CREATE TABLE order_items (
    order_item_id BIGSERIAL PRIMARY KEY,  -- Surrogate for flexibility
    order_id BIGINT NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
    product_sku VARCHAR(50) NOT NULL REFERENCES products(product_sku),
    
    -- Association metadata
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price_cents INTEGER NOT NULL,  -- Snapshot at time of order
    discount_cents INTEGER DEFAULT 0 CHECK (discount_cents >= 0),
    fulfillment_status VARCHAR(20) DEFAULT 'pending',
    shipped_at TIMESTAMPTZ,
    
    UNIQUE(order_id, product_sku)  -- Prevent duplicate products in same order
);

-- Indexes for common access patterns
-- (lookups by order_id are already served by the UNIQUE (order_id, product_sku) index)
CREATE INDEX idx_order_items_product ON order_items(product_sku);
CREATE INDEX idx_order_items_fulfillment ON order_items(fulfillment_status) 
WHERE fulfillment_status != 'delivered';  -- Partial index for pending items

-- Social network: Users following Users (self-referential many-to-many)
CREATE TABLE follows (
    follower_id BIGINT REFERENCES users(user_id) ON DELETE CASCADE,
    following_id BIGINT REFERENCES users(user_id) ON DELETE CASCADE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    notification_enabled BOOLEAN DEFAULT TRUE,
    
    PRIMARY KEY (follower_id, following_id),
    CONSTRAINT no_self_follow CHECK (follower_id != following_id)
);

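-- The primary key (follower_id, following_id) already serves "whom do I follow?";
-- the reverse direction needs its own index
CREATE INDEX idx_follows_following ON follows(following_id, created_at);
-- "Who follows me?" (optionally ordered by created_at) uses this index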
```
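With indexes in both directions, a "mutual follow" (friends) query is a self-join whose sides can each be served by a key lookup. A minimal sketch with hypothetical tables `mf_users` and `mf_follows`:

```sql
CREATE TABLE mf_users (user_id BIGINT PRIMARY KEY);
CREATE TABLE mf_follows (
    follower_id BIGINT REFERENCES mf_users(user_id) ON DELETE CASCADE,
    following_id BIGINT REFERENCES mf_users(user_id) ON DELETE CASCADE,
    PRIMARY KEY (follower_id, following_id),
    CHECK (follower_id <> following_id)
);
CREATE INDEX mf_follows_following_idx ON mf_follows (following_id);

INSERT INTO mf_users VALUES (1), (2), (3);
INSERT INTO mf_follows VALUES (1, 2), (2, 1), (1, 3);  -- only 1 <-> 2 is mutual

-- Users that user 1 follows and who follow user 1 back.
-- Both the outer filter and the "back" probe match the primary key's columns.
SELECT f.following_id AS friend_id
FROM mf_follows AS f
JOIN mf_follows AS back
  ON back.follower_id = f.following_id
 AND back.following_id = f.follower_id
WHERE f.follower_id = 1;
```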

### 10.4.2 Hierarchical Data (Adjacency List vs Closure Table)

Modeling tree structures efficiently in PostgreSQL.

```sql
-- Adjacency List (simple, but recursive queries needed for depth)
CREATE TABLE categories_adjacency (
    category_id SERIAL PRIMARY KEY,
    parent_id INTEGER REFERENCES categories_adjacency(category_id),
    name VARCHAR(255) NOT NULL
);
-- A generated column cannot hold a materialized path: generation expressions may
-- only reference the current row, never other rows or subqueries. Build paths in a
-- recursive CTE, maintain them with a trigger, or use the ltree extension.

-- Query children (fast)
SELECT * FROM categories_adjacency WHERE parent_id = 5;

-- Query all descendants (requires recursive CTE - see Chapter 9)
WITH RECURSIVE descendants AS (
    SELECT * FROM categories_adjacency WHERE category_id = 5
    UNION ALL
    SELECT c.* FROM categories_adjacency c
    JOIN descendants d ON c.parent_id = d.category_id
)
SELECT * FROM descendants;

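-- Closure Table (better for frequent subtree queries, complex maintenance)
-- One row per (ancestor, descendant) pair, including each node's self-row at depth 0
CREATE TABLE categories_closure (
    ancestor_id INTEGER REFERENCES categories_adjacency(category_id),
    descendant_id INTEGER REFERENCES categories_adjacency(category_id),
    depth INTEGER NOT NULL CHECK (depth >= 0),
    
    PRIMARY KEY (ancestor_id, descendant_id)
);
-- The PK serves "descendants of X"; ancestor lookups need an index on descendant_id
CREATE INDEX idx_categories_closure_descendant ON categories_closure(descendant_id);

-- All ancestors of node 10, root first (immediate parent is depth 1)
SELECT ancestor_id FROM categories_closure
WHERE descendant_id = 10 AND depth > 0
ORDER BY depth DESC;

-- All descendants of node 1 (root)
SELECT descendant_id FROM categories_closure WHERE ancestor_id = 1 AND depth > 0;

-- Inserting a node requires one row per ancestor plus its own self-row
-- More maintenance, but a subtree read is one index range scan instead of
-- one recursive-CTE iteration per tree level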
```
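Inserting into a closure table copies the parent's ancestor rows with `depth + 1` and adds the new node's self-row. A sketch with hypothetical tables `cl_categories` and `cl_closure`, building the path Electronics > Phones > Smartphones:

```sql
CREATE TABLE cl_categories (
    category_id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);
CREATE TABLE cl_closure (
    ancestor_id INTEGER REFERENCES cl_categories(category_id),
    descendant_id INTEGER REFERENCES cl_categories(category_id),
    depth INTEGER NOT NULL CHECK (depth >= 0),
    PRIMARY KEY (ancestor_id, descendant_id)
);
CREATE INDEX cl_closure_descendant_idx ON cl_closure (descendant_id);

-- Root node 1 has only its self-row
INSERT INTO cl_categories VALUES (1, 'Electronics');
INSERT INTO cl_closure VALUES (1, 1, 0);

-- Node 2 under node 1: every ancestor of the parent (including the parent's
-- self-row) becomes an ancestor of the new node, one level deeper
INSERT INTO cl_categories VALUES (2, 'Phones');
INSERT INTO cl_closure (ancestor_id, descendant_id, depth)
SELECT ancestor_id, 2, depth + 1 FROM cl_closure WHERE descendant_id = 1
UNION ALL
SELECT 2, 2, 0;

-- Node 3 under node 2
INSERT INTO cl_categories VALUES (3, 'Smartphones');
INSERT INTO cl_closure (ancestor_id, descendant_id, depth)
SELECT ancestor_id, 3, depth + 1 FROM cl_closure WHERE descendant_id = 2
UNION ALL
SELECT 3, 3, 0;
```

Moving a subtree is the expensive case: the rows linking the subtree to its old ancestors have to be deleted and re-created for the new ones.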

## 10.5 Soft Deletes and Temporal Patterns

### 10.5.1 Soft Delete Implementation Strategies

Hard deletes destroy data; soft deletes preserve history while hiding "deleted" records from standard queries.

```sql
-- Standard soft delete columns
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;
ALTER TABLE users ADD COLUMN deleted_by BIGINT REFERENCES users(user_id) ON DELETE SET NULL;  -- SET NULL so purging that user later does not fail on this reference
ALTER TABLE users ADD COLUMN is_deleted BOOLEAN GENERATED ALWAYS AS (deleted_at IS NOT NULL) STORED;

-- Partial unique index for business keys (allow reuse after delete)
CREATE UNIQUE INDEX idx_users_email_active ON users(email) 
WHERE deleted_at IS NULL;

-- View to hide deleted records (application uses this, not table directly)
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;

-- Soft delete function (encapsulates logic)
CREATE OR REPLACE FUNCTION soft_delete_user(target_user_id BIGINT, deleted_by_user_id BIGINT)
RETURNS VOID AS $$
BEGIN
    UPDATE users 
    SET deleted_at = NOW(), 
        deleted_by = deleted_by_user_id
    WHERE user_id = target_user_id 
      AND deleted_at IS NULL;  -- Idempotent: only if not already deleted
    
    -- Cascade soft delete to related data
    UPDATE user_sessions 
    SET expired_at = NOW() 
    WHERE user_id = target_user_id 
      AND expired_at IS NULL;
END;
$$ LANGUAGE plpgsql;

-- Hard delete (only after retention period)
DELETE FROM users 
WHERE deleted_at IS NOT NULL 
  AND deleted_at < NOW() - INTERVAL '90 days';

-- Querying with deleted (admin purposes); the half-open range covers all of January
SELECT * FROM users 
WHERE deleted_at >= '2024-01-01' AND deleted_at < '2024-02-01';
```
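The partial unique index is what allows a soft-deleted user's email to be registered again. In this sketch (hypothetical table `sd_users`), the second insert succeeds because the first row now falls outside the index predicate, and an `ON CONFLICT` clause that targets the partial index must repeat that predicate.

```sql
CREATE TABLE sd_users (
    user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    deleted_at TIMESTAMPTZ
);
CREATE UNIQUE INDEX sd_users_email_active ON sd_users (email) WHERE deleted_at IS NULL;

INSERT INTO sd_users (email) VALUES ('ada@example.com');
UPDATE sd_users SET deleted_at = NOW() WHERE email = 'ada@example.com';

-- Allowed: the existing row is soft-deleted, so it is not in the partial index
INSERT INTO sd_users (email) VALUES ('ada@example.com');

-- A second *active* duplicate is skipped here; without ON CONFLICT it would
-- raise a unique violation. The WHERE clause lets PostgreSQL infer the partial index.
INSERT INTO sd_users (email) VALUES ('ada@example.com')
ON CONFLICT (email) WHERE deleted_at IS NULL DO NOTHING;
```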

### 10.5.2 Temporal Tables (System-Versioned)

PostgreSQL doesn't have native system-versioned tables, but temporal patterns can be implemented.

```sql
-- Current data table
CREATE TABLE accounts (
    account_id BIGSERIAL PRIMARY KEY,
    balance_cents INTEGER NOT NULL,
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- History table (one row per version; valid_from/valid_to bound each version's lifetime)
CREATE TABLE accounts_history (
    account_id BIGINT NOT NULL,
    balance_cents INTEGER NOT NULL,
    valid_from TIMESTAMPTZ NOT NULL,
    valid_to TIMESTAMPTZ,  -- NULL means current version
    changed_by VARCHAR(100),
    
    PRIMARY KEY (account_id, valid_from)
);

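-- Trigger to maintain history: record the initial version on INSERT; on each
-- balance change, close the open version and open a new one.
-- clock_timestamp() is used instead of NOW() (fixed for the whole transaction) so
-- two updates of one account in the same transaction get distinct valid_from values.
CREATE OR REPLACE FUNCTION accounts_temporal_trigger()
RETURNS TRIGGER AS $$
DECLARE
    change_ts TIMESTAMPTZ := clock_timestamp();
BEGIN
    IF TG_OP = 'UPDATE' THEN
        -- Close previous version
        UPDATE accounts_history 
        SET valid_to = change_ts
        WHERE account_id = OLD.account_id 
          AND valid_to IS NULL;
    END IF;
    
    -- Insert new version
    INSERT INTO accounts_history (account_id, balance_cents, valid_from, changed_by)
    VALUES (NEW.account_id, NEW.balance_cents, change_ts, current_user);
    
    RETURN NULL;  -- Return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

-- An INSERT trigger's WHEN clause cannot reference OLD, hence two triggers
CREATE TRIGGER trg_accounts_history_insert
AFTER INSERT ON accounts
FOR EACH ROW
EXECUTE FUNCTION accounts_temporal_trigger();

CREATE TRIGGER trg_accounts_history
AFTER UPDATE ON accounts
FOR EACH ROW
WHEN (OLD.balance_cents IS DISTINCT FROM NEW.balance_cents)
EXECUTE FUNCTION accounts_temporal_trigger();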

-- Query as of specific time
SELECT * FROM accounts_history
WHERE account_id = 123 
  AND valid_from <= '2024-01-15 10:00:00'::timestamptz
  AND (valid_to IS NULL OR valid_to > '2024-01-15 10:00:00'::timestamptz);

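-- Range queries (using tstzrange)
-- GiST has no built-in equality operator class for BIGINT; btree_gist provides it
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE accounts_history_range (
    account_id BIGINT NOT NULL,
    balance_cents INTEGER NOT NULL,
    validity_period TSTZRANGE NOT NULL,
    EXCLUDE USING GIST (account_id WITH =, validity_period WITH &&)
);
-- Exclusion constraint prevents overlapping periods for same account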
```
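With range types, an as-of lookup becomes a single containment test (`@>`), which a GiST index on the range column can serve. A sketch using a hypothetical `rh_accounts_history` table with half-open `[)` periods (no exclusion constraint here, so `btree_gist` is not needed):

```sql
CREATE TABLE rh_accounts_history (
    account_id BIGINT NOT NULL,
    balance_cents INTEGER NOT NULL,
    validity_period TSTZRANGE NOT NULL
);
-- GiST index supports range operators such as @> and &&
CREATE INDEX rh_accounts_history_period_idx ON rh_accounts_history USING GIST (validity_period);

-- '[)' means lower bound included, upper bound excluded;
-- an unbounded (NULL) upper end marks the current version
INSERT INTO rh_accounts_history VALUES
    (123, 10000, tstzrange('2024-01-01 00:00+00', '2024-01-15 09:00+00', '[)')),
    (123, 12500, tstzrange('2024-01-15 09:00+00', '2024-02-01 00:00+00', '[)')),
    (123,  8000, tstzrange('2024-02-01 00:00+00', NULL, '[)'));

-- Balance as of a point in time: one predicate replaces the valid_from/valid_to pair
SELECT balance_cents
FROM rh_accounts_history
WHERE account_id = 123
  AND validity_period @> '2024-01-15 10:00+00'::timestamptz;
```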

## 10.6 Audit Fields and Change Tracking

### 10.6.1 Standard Audit Columns

Every OLTP table should track creation and modification metadata.

```sql
-- Template for audit columns
CREATE TABLE products (
    product_id BIGSERIAL PRIMARY KEY,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    price_cents INTEGER NOT NULL,
    
    -- Audit columns (industry standard naming)
    created_at TIMESTAMPTZ DEFAULT NOW() NOT NULL,
    created_by VARCHAR(100) DEFAULT current_user NOT NULL,
    updated_at TIMESTAMPTZ DEFAULT NOW() NOT NULL,
    updated_by VARCHAR(100) DEFAULT current_user NOT NULL,
    
    -- Version for optimistic locking
    version INTEGER DEFAULT 1 NOT NULL
);

-- Auto-update timestamp and version
CREATE OR REPLACE FUNCTION update_audit_fields()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    NEW.updated_by = current_user;
    NEW.version = OLD.version + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_products_audit
BEFORE UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION update_audit_fields();

-- Application context for user tracking (better than current_user)
-- SET LOCAL app.current_user_id = '12345';
-- Then in trigger: current_setting('app.current_user_id', true)
```
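The application user context and the `version` column can be combined as follows. This is a sketch with hypothetical names (`ol_products`, `ol_update_audit_fields`, and the custom setting `app.current_user_id`, which is an application convention rather than a built-in): the trigger reads the setting with `current_setting(name, true)` so a missing value yields NULL instead of an error, and the application enforces optimistic locking by putting the version it read into the `WHERE` clause.

```sql
CREATE TABLE ol_products (
    product_id BIGINT PRIMARY KEY,
    price_cents INTEGER NOT NULL,
    updated_by VARCHAR(100) NOT NULL DEFAULT current_user,
    version INTEGER NOT NULL DEFAULT 1
);

CREATE OR REPLACE FUNCTION ol_update_audit_fields()
RETURNS TRIGGER AS $$
BEGIN
    -- Prefer the application's user id and fall back to the database role.
    -- NULLIF covers the empty string a custom setting can report once an
    -- earlier SET LOCAL in the same session has gone out of scope.
    NEW.updated_by := COALESCE(NULLIF(current_setting('app.current_user_id', true), ''),
                               current_user);
    NEW.version := OLD.version + 1;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_ol_products_audit
BEFORE UPDATE ON ol_products
FOR EACH ROW EXECUTE FUNCTION ol_update_audit_fields();

INSERT INTO ol_products (product_id, price_cents) VALUES (1, 1000);

BEGIN;
SET LOCAL app.current_user_id = '12345';  -- only lasts until COMMIT/ROLLBACK
-- The client read version 1; this matches 0 rows if someone bumped it since
UPDATE ol_products SET price_cents = 1200
WHERE product_id = 1 AND version = 1;
COMMIT;

-- A second writer still holding version 1 now updates nothing
UPDATE ol_products SET price_cents = 900
WHERE product_id = 1 AND version = 1;
```

An `UPDATE` reporting 0 affected rows tells the application its copy is stale; it should re-read the row and retry, or surface the conflict to the user.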

### 10.6.2 Change Data Capture (CDC) with Triggers

For compliance requirements (HIPAA, SOX, GDPR), detailed change logs are necessary.

```sql
-- Generic audit log table
CREATE TABLE audit_log (
    audit_id BIGSERIAL PRIMARY KEY,
    table_name VARCHAR(63) NOT NULL,
    record_id VARCHAR(100) NOT NULL,  -- Primary-key value as text
    operation VARCHAR(10) NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
    old_values JSONB,
    new_values JSONB,
    changed_fields VARCHAR(255)[],  -- Array of changed column names
    performed_by VARCHAR(100) NOT NULL,
    performed_at TIMESTAMPTZ DEFAULT NOW() NOT NULL
);

-- Generic trigger function: attach to any table with a single-column primary key,
-- passing that column's name as the trigger argument
CREATE OR REPLACE FUNCTION audit_trigger_func()
RETURNS TRIGGER AS $$
DECLARE
    pk_col TEXT;
    excluded_cols TEXT[] := ARRAY['created_at', 'updated_at', 'version'];
    changed TEXT[];
BEGIN
    pk_col := TG_ARGV[0];  -- e.g. 'product_id'

    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, record_id, operation, new_values, performed_by)
        VALUES (TG_TABLE_NAME, TO_JSONB(NEW) ->> pk_col, TG_OP, TO_JSONB(NEW), current_user);

    ELSIF TG_OP = 'UPDATE' THEN
        -- Columns whose values differ between OLD and NEW, ignoring bookkeeping columns
        -- (there is no jsonb - jsonb operator, so compare key by key)
        SELECT ARRAY_AGG(n.key ORDER BY n.key) INTO changed
        FROM jsonb_each(TO_JSONB(NEW)) AS n
        JOIN jsonb_each(TO_JSONB(OLD)) AS o ON o.key = n.key
        WHERE n.value IS DISTINCT FROM o.value
          AND n.key <> ALL (excluded_cols);

        -- Only log if meaningful changes occurred (ARRAY_AGG over no rows is NULL)
        IF changed IS NOT NULL THEN
            INSERT INTO audit_log (table_name, record_id, operation, old_values, new_values,
                                   changed_fields, performed_by)
            VALUES (TG_TABLE_NAME, TO_JSONB(NEW) ->> pk_col, TG_OP,
                    TO_JSONB(OLD), TO_JSONB(NEW), changed, current_user);
        END IF;

    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, record_id, operation, old_values, performed_by)
        VALUES (TG_TABLE_NAME, TO_JSONB(OLD) ->> pk_col, TG_OP, TO_JSONB(OLD), current_user);
    END IF;

    -- audit_id and performed_at come from column defaults
    -- (an explicit NULL audit_id would violate the primary key)
    RETURN NULL;  -- Return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_products_audit_cdc
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW EXECUTE FUNCTION audit_trigger_func('product_id');
```
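Because each audit row stores full JSONB snapshots, point-in-time questions can be answered from the log alone. A sketch against a hypothetical `al_audit_log` table shaped like `audit_log` (only the columns needed here), holding three recorded changes to product 42:

```sql
CREATE TABLE al_audit_log (
    audit_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    table_name VARCHAR(63) NOT NULL,
    record_id VARCHAR(100) NOT NULL,
    operation VARCHAR(10) NOT NULL,
    old_values JSONB,
    new_values JSONB,
    performed_at TIMESTAMPTZ NOT NULL
);
INSERT INTO al_audit_log (table_name, record_id, operation, old_values, new_values, performed_at) VALUES
    ('products', '42', 'INSERT', NULL, '{"price_cents": 1000}', '2024-01-01 00:00+00'),
    ('products', '42', 'UPDATE', '{"price_cents": 1000}', '{"price_cents": 1200}', '2024-02-01 00:00+00'),
    ('products', '42', 'UPDATE', '{"price_cents": 1200}', '{"price_cents": 900}', '2024-03-01 00:00+00');

-- History lookups filter by (table_name, record_id) and sort by time
CREATE INDEX al_audit_log_record_idx ON al_audit_log (table_name, record_id, performed_at);

-- Price of product 42 as of 2024-02-15: the latest change at or before that instant.
-- ->> extracts the field as text; a trailing DELETE would yield NULL (row gone).
SELECT (new_values ->> 'price_cents')::INTEGER AS price_cents
FROM al_audit_log
WHERE table_name = 'products'
  AND record_id = '42'
  AND performed_at <= '2024-02-15 00:00+00'
ORDER BY performed_at DESC
LIMIT 1;

-- Field-level change history
SELECT performed_at,
       old_values ->> 'price_cents' AS old_price,
       new_values ->> 'price_cents' AS new_price
FROM al_audit_log
WHERE table_name = 'products' AND record_id = '42'
ORDER BY performed_at;
```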

---

## Chapter Summary

In this chapter, you learned:

1. **Normalization (1NF-BCNF)**: Enforce 1NF by eliminating repeating groups (move them into child tables); ensure 2NF by removing partial dependencies (attributes that depend on part of a composite key get their own table); achieve 3NF by eliminating transitive dependencies (facts about other entities, such as department names, live in those entities' tables); apply BCNF by decomposing on any dependency whose determinant is not a candidate key.

2. **Pragmatic Denormalization**: Store computed aggregates in parent tables using triggers for transactional consistency; use materialized views for complex read-heavy aggregations with scheduled refreshes; always document denormalized fields and maintain them in the database (triggers and constraints), not in application logic.

3. **Key Selection**: Prefer surrogate keys (BIGSERIAL or UUID) for internal references with unique constraints on natural business keys; use BIGSERIAL for single-node OLTP (better insert locality, smaller size); use UUID v7 (time-ordered) or v4 for distributed systems; use composite keys for junction tables, switching to a surrogate key when the association has its own lifecycle or is referenced by other tables.

4. **Many-to-Many Patterns**: Design association tables to carry relationship metadata (quantity, role, timestamps); create indexes for both directions of the relationship; consider closure tables for hierarchical data requiring fast subtree queries; use adjacency lists with recursive CTEs for simple hierarchies.

5. **Soft Deletes**: Implement `deleted_at` timestamps with partial unique indexes to allow business key reuse; create views (`active_users`) to hide deleted records from applications; cascade soft deletes to related rows in the same function or transaction; hard-delete rows only after the retention period expires.

6. **Temporal Modeling**: Maintain history tables with `valid_from`/`valid_to` ranges; use exclusion constraints to prevent overlapping periods; implement triggers to automatically archive changes; consider range types (`TSTZRANGE`) for efficient temporal queries.

7. **Audit Standards**: Include `created_at`, `created_by`, `updated_at`, `updated_by`, and `version` (optimistic locking) on all tables; use generic CDC triggers for compliance-heavy environments storing JSONB snapshots; prefer database-level auditing over application-level to catch all changes including ad-hoc updates.

---

**Next:** In Chapter 11, we will explore PostgreSQL schemas (namespaces) for multi-tenant architectures and bounded contexts—covering search paths, schema-based access control, and patterns for isolating tenant data while maintaining operational efficiency.

