# Chapter 22: SQL Functions and Procedures

PostgreSQL allows encapsulating business logic within the database through functions and procedures. Understanding the distinctions between SQL and procedural languages, volatility categories, and security contexts is essential for maintaining both performance and security boundaries.

## 22.1 Function Fundamentals and Volatility

Every function has a volatility classification that determines how the query optimizer can use it. Incorrect classification causes subtle bugs or performance degradation.

### 22.1.1 Volatility Categories

```sql
-- IMMUTABLE: Result never changes for the same inputs; no database lookups
-- Examples: mathematical operations, simple string manipulation (lower, substr), hashing (md5)
CREATE OR REPLACE FUNCTION calculate_tax(amount DECIMAL, rate DECIMAL)
RETURNS DECIMAL AS $$
    SELECT (amount * rate)::DECIMAL(10,2);
$$ LANGUAGE SQL IMMUTABLE;

-- Characteristics:
-- 1. Can be used in CREATE INDEX (functional indexes)
-- 2. Can be used in generated columns (PostgreSQL 12+)
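-- 3. With constant arguments, the planner may pre-evaluate (constant-fold) the call at plan time
-- 4. Parallel safety is a separate label (PARALLEL SAFE); user-defined functions default to PARALLEL UNSAFE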
-- 5. Must NOT access database tables or non-immutable functions

-- Critical violation example (WRONG):
CREATE OR REPLACE FUNCTION get_current_tax_rate()
RETURNS DECIMAL AS $$
    SELECT rate FROM tax_settings WHERE active = true;  -- Database access!
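$$ LANGUAGE SQL IMMUTABLE;  -- WRONG: reads a table, so it must be STABLE
-- PostgreSQL does not verify the label; the planner may pre-evaluate this call
-- once and keep returning a stale rate after tax_settings changes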

-- STABLE: Result consistent within a single statement, may change between statements
-- Examples: current_timestamp, lookup tables that rarely change
CREATE OR REPLACE FUNCTION get_customer_status(customer_id BIGINT)
RETURNS TEXT AS $$
    SELECT status FROM customers WHERE id = customer_id;
$$ LANGUAGE SQL STABLE;

-- Characteristics:
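-- 1. Cannot be used in index expressions (result may change between statements)
-- 2. Can supply an index-scan comparison value, e.g. WHERE created_at > now() - interval '1 day'
-- 3. Planner may assume the same result for the same arguments within one statement
-- 4. Can read the database but must not modify it (INSERT/UPDATE/DELETE are rejected in non-volatile functions)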

-- VOLATILE: Result may change on every call, or function has side effects
-- Examples: random(), nextval(), functions that modify data
CREATE OR REPLACE FUNCTION log_audit_event(event_type TEXT, payload JSONB)
RETURNS VOID AS $$
    INSERT INTO audit_log (event_type, payload, created_at)
    VALUES (event_type, payload, NOW());
$$ LANGUAGE SQL VOLATILE;

-- Characteristics:
-- 1. Cannot be optimized away (must execute for every row)
-- 2. Required for INSERT/UPDATE/DELETE operations
-- 3. Default if not specified (always specify explicitly for clarity)

-- Performance impact demonstration:
-- Expression index on the calculated value (requires IMMUTABLE):
CREATE INDEX idx_orders_tax_amount ON orders (calculate_tax(total, 0.10));
-- If calculate_tax were STABLE or VOLATILE, this fails with:
-- ERROR: functions in index expression must be marked IMMUTABLE

-- A query using the same expression can then match that index:
EXPLAIN ANALYZE SELECT * FROM orders WHERE calculate_tax(total, 0.10) > 100;
-- Without a matching expression index, the function must be evaluated
-- for every row (sequential scan)
```
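Volatility is not the only planner-relevant label: parallel safety is declared separately (`PARALLEL SAFE`, `RESTRICTED`, or `UNSAFE`, defaulting to `UNSAFE` for user-defined functions), and both labels are stored in the `pg_proc` catalog. A minimal sketch for checking what a function actually declares, re-creating `calculate_tax` from above with an explicit parallel marking:

```sql
-- Re-declare calculate_tax with an explicit parallel-safety label
CREATE OR REPLACE FUNCTION calculate_tax(amount DECIMAL, rate DECIMAL)
RETURNS DECIMAL AS $$
    SELECT (amount * rate)::DECIMAL(10,2);
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

-- provolatile: i = IMMUTABLE, s = STABLE, v = VOLATILE
-- proparallel: s = SAFE, r = RESTRICTED, u = UNSAFE
SELECT p.oid::regprocedure AS signature,
       p.provolatile,
       p.proparallel
FROM pg_proc p
WHERE p.proname = 'calculate_tax';
```

Running the same catalog query against a function that reads tables is a quick way to catch an `i` where an `s` belongs.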

### 22.1.2 Return Types and Parameters

```sql
-- Scalar return types:
CREATE FUNCTION get_user_email(user_id BIGINT) 
RETURNS TEXT AS $$
    SELECT email FROM users WHERE id = user_id;
$$ LANGUAGE SQL STABLE;

-- Multiple OUT parameters (returns record):
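CREATE FUNCTION get_user_stats(p_user_id BIGINT, OUT email TEXT, OUT created_at TIMESTAMPTZ, OUT order_count BIGINT)
AS $$
    -- p_ prefix: an unqualified "user_id" would resolve to the orders.user_id column,
    -- because column names take precedence over parameter names in SQL functions
    SELECT u.email, u.created_at, COUNT(o.id)
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.id = p_user_id
    GROUP BY u.email, u.created_at;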
$$ LANGUAGE SQL STABLE;

-- Usage:
SELECT * FROM get_user_stats(123);
-- Returns: (email, created_at, order_count)

-- INOUT parameters (input and output):
CREATE FUNCTION increment_counter(INOUT counter INT, increment INT)
AS $$
    SELECT counter + increment;
$$ LANGUAGE SQL IMMUTABLE;

-- Default values:
CREATE FUNCTION create_order(
    customer_id BIGINT,
    total DECIMAL,
    status TEXT DEFAULT 'pending'
) RETURNS BIGINT AS $$
    INSERT INTO orders (customer_id, total, status)
    VALUES (customer_id, total, status)
    RETURNING order_id;
$$ LANGUAGE SQL VOLATILE;

-- Named parameter notation (avoids positional errors):
SELECT create_order(customer_id := 123, total := 100.00, status := 'confirmed');
```
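Named notation matters most when several trailing parameters have defaults, because it lets a caller skip some of them. A small self-contained sketch; `format_price` is a hypothetical function, and it uses `=>`, the preferred named-argument syntax (`:=` is still accepted for backward compatibility):

```sql
-- Hypothetical helper with two defaulted trailing parameters
CREATE OR REPLACE FUNCTION format_price(
    amount   NUMERIC,
    currency TEXT DEFAULT 'USD',
    decimals INT  DEFAULT 2
) RETURNS TEXT AS $$
    SELECT currency || ' ' || round(amount, decimals)::TEXT;
$$ LANGUAGE SQL IMMUTABLE;

SELECT format_price(9.5);                              -- positional, both defaults: 'USD 9.50'
SELECT format_price(9.5, decimals => 0);               -- mixed, skips currency: 'USD 10'
SELECT format_price(amount => 9.5, currency => 'EUR'); -- fully named: 'EUR 9.50'
```

Mixed notation must list positional arguments first; after the first named argument, every following one must also be named.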

## 22.2 SQL Language Functions

SQL functions (LANGUAGE SQL) can be inlined by the query optimizer when they meet certain conditions, which makes simple ones performance-transparent.

### 22.2.1 Inlining and Optimization

```sql
-- Simple SQL function (inlined):
CREATE OR REPLACE FUNCTION get_active_users()
RETURNS SETOF users AS $$
    SELECT * FROM users WHERE status = 'active';
$$ LANGUAGE SQL STABLE;

-- When called:
SELECT * FROM get_active_users() WHERE created_at > '2024-01-01';
-- PostgreSQL INLINES the function body:
-- SELECT * FROM (SELECT * FROM users WHERE status = 'active') sub 
-- WHERE created_at > '2024-01-01';
-- Optimizer pushes down predicate: WHERE status = 'active' AND created_at > '2024-01-01'
-- Can use indexes efficiently

-- Non-inlinable patterns (avoid in performance-critical code):
-- 1. Multi-statement SQL function bodies (never inlined, in any version)
-- 2. LANGUAGE plpgsql (never inlined into the calling query)
-- 3. SECURITY DEFINER, or a SET clause (e.g. SET search_path) on the function
-- 4. Set-returning SQL functions declared VOLATILE (the default) or STRICT

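-- Multi-statement SQL function: never inlined. PostgreSQL 14's SQL-standard
-- bodies (BEGIN ATOMIC ... END) are parsed at creation time, but a body with
-- several statements is not inlined either.
-- Caution: only the LAST statement's result is returned to the caller.
CREATE FUNCTION get_user_summary(p_user_id BIGINT)
RETURNS TABLE(email TEXT, order_count BIGINT, total_spent DECIMAL) AS $$
    SELECT u.email, COUNT(o.id), COALESCE(SUM(o.total), 0)
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.id = p_user_id
    GROUP BY u.email;   -- Executed, but its result is discarded

    -- Second statement: this row is what the caller actually receives
    SELECT 'summary'::TEXT AS email, 0::BIGINT AS order_count, 0::DECIMAL AS total_spent;
$$ LANGUAGE SQL STABLE;

-- In every version this is a "black box": the optimizer cannot push filters inside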
```
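Whether the planner inlined a set-returning function is visible in `EXPLAIN`: an inlined function disappears and the plan nodes of its body appear instead, while a non-inlined one shows up as a `Function Scan` on the function itself. A self-contained sketch using two hypothetical functions over `generate_series` that differ only in their declared volatility:

```sql
-- Identical bodies; only the volatility label differs
CREATE OR REPLACE FUNCTION evens_stable(n INT) RETURNS SETOF INT AS $$
    SELECT g FROM generate_series(1, n) AS g WHERE g % 2 = 0;
$$ LANGUAGE SQL STABLE;

CREATE OR REPLACE FUNCTION evens_volatile(n INT) RETURNS SETOF INT AS $$
    SELECT g FROM generate_series(1, n) AS g WHERE g % 2 = 0;
$$ LANGUAGE SQL VOLATILE;

-- Inlined: the plan shows a Function Scan on generate_series with the filter
EXPLAIN SELECT * FROM evens_stable(10);

-- Not inlined (VOLATILE): the plan shows a Function Scan on evens_volatile
EXPLAIN SELECT * FROM evens_volatile(10);
```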

### 22.2.2 Variadic Arguments

```sql
-- Variable number of arguments:
-- (VALUES is a keyword and cannot name a parameter; NUMERIC is the type name)
CREATE FUNCTION sum_values(VARIADIC vals NUMERIC[])
RETURNS NUMERIC AS $$
    SELECT SUM(v) FROM UNNEST(vals) AS v;
$$ LANGUAGE SQL IMMUTABLE;

-- Usage:
SELECT sum_values(1, 2, 3, 4, 5);  -- Returns 15
SELECT sum_values(VARIADIC ARRAY[10, 20, 30]);  -- Array syntax also works
```

## 22.3 Procedures vs Functions

PostgreSQL 11 introduced procedures (CREATE PROCEDURE) which differ from functions in critical ways regarding transaction control and usage.

### 22.3.1 Key Differences

```sql
-- FUNCTION: Returns a value, runs in caller's transaction, cannot commit/rollback
CREATE OR REPLACE FUNCTION transfer_funds(
    from_acct BIGINT,
    to_acct BIGINT,
    amount DECIMAL
) RETURNS BOOLEAN AS $$
BEGIN
    -- Runs in caller's transaction context
    UPDATE accounts SET balance = balance - amount WHERE id = from_acct;
    UPDATE accounts SET balance = balance + amount WHERE id = to_acct;
    RETURN true;
END;
$$ LANGUAGE plpgsql;

-- PROCEDURE: No return value (can use INOUT), can issue COMMIT/ROLLBACK in its body
CREATE OR REPLACE PROCEDURE transfer_funds_proc(
    from_acct BIGINT,
    to_acct BIGINT,
    amount DECIMAL
)
AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = from_acct;
    UPDATE accounts SET balance = balance + amount WHERE id = to_acct;

    -- Check BEFORE committing: once COMMIT runs, the transfer is durable
    IF (SELECT balance FROM accounts WHERE id = from_acct) < 0 THEN
        ROLLBACK;  -- Discard both updates (impossible in a function)
        RAISE EXCEPTION 'Insufficient funds';
    END IF;

    -- Commit within the procedure (impossible in a function)
    COMMIT;
END;
$$ LANGUAGE plpgsql;

-- Calling differences:
-- Function:
SELECT transfer_funds(1, 2, 100.00);  -- Returns boolean

-- Procedure:
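CALL transfer_funds_proc(1, 2, 100.00);  -- No return value
-- COMMIT/ROLLBACK inside the procedure only work when CALL is issued outside an
-- explicit BEGIN ... COMMIT block; otherwise they fail with
-- "invalid transaction termination"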
```
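Procedures hand data back through `INOUT` parameters (PostgreSQL 14 also allows `OUT`). At top level, `CALL` returns a one-row result; inside PL/pgSQL, the `INOUT` argument must be a variable and is updated in place. A minimal sketch with a hypothetical `add_interest` procedure:

```sql
CREATE OR REPLACE PROCEDURE add_interest(INOUT balance NUMERIC, rate NUMERIC)
LANGUAGE plpgsql
AS $$
BEGIN
    balance := round(balance * (1 + rate), 2);
END;
$$;

-- Top level: returns a one-row result with column "balance" = 105.00
CALL add_interest(100.00, 0.05);

-- Inside PL/pgSQL: the variable receives the new value
DO $$
DECLARE
    acct_balance NUMERIC := 100.00;
BEGIN
    CALL add_interest(acct_balance, 0.05);
    RAISE NOTICE 'New balance: %', acct_balance;
END;
$$;
```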

### 22.3.2 When to Use Procedures

```sql
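-- Use PROCEDURES when:
-- 1. You need to commit intermediate results (long-running ETL)
-- 2. You want progress (e.g. a log row) made durable before later steps run.
--    This is NOT an autonomous transaction: COMMIT ends the whole current
--    transaction, including everything done before it.
-- 3. You want to execute DDL (CREATE TABLE, etc.) and commit it before continuing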

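-- Example: ETL with intermediate commits
-- (Assumes staging_table has a boolean "processed" column and processed_data
--  has the same column layout as staging_table.)
CREATE PROCEDURE process_large_batch(batch_size INT DEFAULT 10000)
AS $$
DECLARE
    rows_in_batch   INT;
    total_processed BIGINT := 0;  -- Not named "processed": that would clash with the column
BEGIN
    LOOP
        -- Flag one batch and copy exactly those rows in ONE statement, so the
        -- INSERT and the UPDATE cannot pick different rows
        WITH batch AS (
            UPDATE staging_table s
            SET processed = true
            WHERE s.ctid IN (
                SELECT ctid FROM staging_table
                WHERE processed = false
                LIMIT batch_size
            )
            RETURNING s.*
        )
        INSERT INTO processed_data
        SELECT * FROM batch;

        GET DIAGNOSTICS rows_in_batch = ROW_COUNT;
        EXIT WHEN rows_in_batch = 0;  -- Staging table drained

        COMMIT;  -- Commit every batch to avoid one long-running transaction
        total_processed := total_processed + rows_in_batch;
    END LOOP;

    RAISE NOTICE 'Processed % rows', total_processed;
END;
$$ LANGUAGE plpgsql;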

-- Use FUNCTIONS when:
-- 1. You need to return values for use in SQL queries
-- 2. You want atomic operations (all or nothing with caller)
-- 3. You want query inlining (SQL language functions)
-- 4. You need to use results in SELECT, WHERE, etc.
```

## 22.4 Security Contexts (SECURITY DEFINER)

Functions execute with either the invoker's privileges (default) or the definer's privileges (owner). SECURITY DEFINER is powerful but dangerous if misconfigured.

### 22.4.1 SECURITY INVOKER (Default)

```sql
-- Function runs with permissions of the user who calls it
CREATE FUNCTION get_my_orders()
RETURNS SETOF orders AS $$
    SELECT * FROM orders WHERE user_id = current_user_id();  -- hypothetical
$$ LANGUAGE SQL SECURITY INVOKER;  -- Default, explicit for clarity

-- If user 'alice' calls this, she sees only alice's orders (assuming RLS or filter)
-- If user 'bob' calls this, he sees only bob's orders
-- User must have SELECT permission on orders table
```

### 22.4.2 SECURITY DEFINER and Privilege Escalation

```sql
-- Function runs with permissions of the function OWNER
-- Useful for: Controlled access to sensitive data without granting table permissions

-- Scenario: Users should update their own profile, but not directly access table
CREATE TABLE user_profiles (
    user_id BIGINT PRIMARY KEY,
    email TEXT,
    private_data JSONB
);

-- Revoke direct access:
REVOKE ALL ON user_profiles FROM PUBLIC;

-- Create function that enforces row-level access:
CREATE OR REPLACE FUNCTION update_user_email(new_email TEXT)
RETURNS VOID AS $$
BEGIN
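    -- NOTE: any session can SET app.current_user_id itself, so this filter is only
    -- a real boundary if end users never get direct SQL access to the database
    UPDATE user_profiles 
    SET email = new_email 
    WHERE user_id = current_setting('app.current_user_id')::BIGINT;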
    
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Profile not found or access denied';
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

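-- Functions are executable by PUBLIC by default; revoke that first,
-- then grant execute to the application role only:
REVOKE EXECUTE ON FUNCTION update_user_email(TEXT) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION update_user_email(TEXT) TO app_user;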

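-- CRITICAL SECURITY FIX: Set search_path explicitly
-- Without it, unqualified names are resolved through the CALLER's search_path, so
-- an attacker could plant a same-named table, function, or operator (for example
-- in pg_temp or a writable schema) that the definer function then uses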
CREATE OR REPLACE FUNCTION update_user_email(new_email TEXT)
RETURNS VOID AS $$
DECLARE
    target_user_id BIGINT;
BEGIN
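    -- Explicit schema qualification prevents search_path attacks
    target_user_id := current_setting('app.current_user_id')::BIGINT;
    
    UPDATE public.user_profiles   -- Explicit schema!
    SET email = new_email 
    WHERE user_id = target_user_id;

    IF NOT FOUND THEN
        RAISE EXCEPTION 'Profile not found or access denied';
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER
SET search_path = public, pg_temp;  -- pg_temp last: if omitted, it is searched FIRST for tables
-- Safe only if untrusted roles cannot CREATE in public (the default since
-- PostgreSQL 15; on older versions: REVOKE CREATE ON SCHEMA public FROM PUBLIC)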

-- Even better: Use explicit schema for all objects inside function
```

### 22.4.3 Search Path Attacks

```sql
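-- Attack scenario if search_path is not set:
-- 1. Without SET search_path, a SECURITY DEFINER function resolves unqualified
--    names using the CALLER's search_path
-- 2. The attacker creates a same-named object that is found first, e.g.
--    CREATE TEMP TABLE user_profiles (...) (pg_temp is searched before all
--    other schemas for tables unless listed explicitly), or a function or
--    operator the body calls, placed in a schema the attacker can write to
-- 3. The definer function now reads, writes, or calls the attacker's object
-- 4. Any attacker-supplied function code runs with the definer's privileges

-- Prevention checklist for SECURITY DEFINER:
-- 1. Always SET search_path in the function definition, listing pg_temp last
-- 2. Always schema-qualify all table references (public.table_name)
-- 3. REVOKE EXECUTE ... FROM PUBLIC, then GRANT EXECUTE only to the roles that need it
-- 4. Grant minimum necessary permissions to the function owner role
-- Note: LEAKPROOF does not belong on this list. It only promises the planner
-- that a function reveals nothing about its arguments, so it may run before
-- security-barrier or row-level-security filters; only superusers can set it.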

CREATE OR REPLACE FUNCTION get_sensitive_data(p_user_id BIGINT)
RETURNS TEXT AS $$
    SELECT secret FROM public.secure_vault WHERE id = p_user_id;
$$ LANGUAGE SQL STABLE SECURITY DEFINER
SET search_path = public, pg_temp;

REVOKE EXECUTE ON FUNCTION get_sensitive_data(BIGINT) FROM PUBLIC;
```
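The checklist can be partly automated. The query below is a sketch that lists SECURITY DEFINER functions outside the system schemas that do not pin `search_path` with a `SET` clause; it relies on the catalog columns `pg_proc.prosecdef` and `pg_proc.proconfig`. The two `demo_definer_*` functions are hypothetical and exist only to exercise the query:

```sql
CREATE OR REPLACE FUNCTION demo_definer_unpinned() RETURNS INT AS $$
    SELECT 1;
$$ LANGUAGE SQL SECURITY DEFINER;

CREATE OR REPLACE FUNCTION demo_definer_pinned() RETURNS INT AS $$
    SELECT 1;
$$ LANGUAGE SQL SECURITY DEFINER SET search_path = pg_catalog, pg_temp;

-- SECURITY DEFINER functions with no search_path setting of their own
SELECT p.oid::regprocedure        AS function_signature,
       pg_get_userbyid(p.proowner) AS owner
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.prosecdef
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
      SELECT 1
      FROM unnest(p.proconfig) AS cfg   -- proconfig is NULL when no SET clause exists
      WHERE cfg LIKE 'search_path=%'
  );
```

Only `demo_definer_unpinned` should appear in the result; a scheduled run of this query is one way to make "audit DEFINER functions regularly" concrete.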

## 22.5 Set-Returning Functions (SRFs)

Functions that return multiple rows perform very differently depending on whether the planner can inline them or must execute them separately and collect their output.

### 22.5.1 RETURNS TABLE vs RETURNS SETOF

```sql
-- RETURNS SETOF: Return rows of existing table type
CREATE FUNCTION get_orders_by_status(order_status TEXT)
RETURNS SETOF orders AS $$
    SELECT * FROM orders WHERE status = order_status;
$$ LANGUAGE SQL STABLE;

-- RETURNS TABLE: Define columns inline (anonymous type)
CREATE FUNCTION get_order_summary(min_amount DECIMAL)
RETURNS TABLE(
    order_id BIGINT,
    customer_email TEXT,
    total DECIMAL,
    order_date TIMESTAMPTZ
) AS $$
    SELECT 
        o.order_id,
        c.email,
        o.total,
        o.created_at
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.total >= min_amount;
$$ LANGUAGE SQL STABLE;

-- Usage in FROM clause (like table):
SELECT * FROM get_order_summary(100.00) 
WHERE order_date > '2024-01-01'
ORDER BY total DESC;

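-- Materialization:
-- A function in FROM that is NOT inlined runs as a Function Scan, which collects
-- its entire result before the outer query reads it. An inlined SQL function
-- (get_order_summary is a single STABLE SELECT) is planned as part of the outer
-- query, so the order_date filter above can be pushed down.
-- Set-returning functions in the SELECT list run once per input row.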
```
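The difference between the two call sites is easiest to see with a built-in set-returning function. In the SELECT list, `generate_series` runs once per input row and multiplies rows; in FROM it is a standalone row source. The inline VALUES rows are illustrative data only:

```sql
-- SRF in the SELECT list: evaluated once per input row (rows multiply)
SELECT t.id, generate_series(1, t.id) AS n
FROM (VALUES (1), (2), (3)) AS t(id);
-- 1 + 2 + 3 = 6 rows

-- SRF in FROM: a row source, filtered and joined like a table
SELECT s.n
FROM generate_series(1, 3) AS s(n)
WHERE s.n > 1;
-- 2 rows
```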

### 22.5.2 LATERAL Function Calls

```sql
-- Functions can be used with LATERAL for row-by-row processing
CREATE FUNCTION get_customer_orders(cust_id BIGINT, limit_count INT)
RETURNS TABLE(order_id BIGINT, total DECIMAL) AS $$
    SELECT order_id, total 
    FROM orders 
    WHERE customer_id = cust_id 
    ORDER BY created_at DESC 
    LIMIT limit_count;
$$ LANGUAGE SQL STABLE;

-- Top-N per customer using LATERAL:
SELECT c.name, o.order_id, o.total
FROM customers c
LEFT JOIN LATERAL get_customer_orders(c.id, 5) o ON true
WHERE c.status = 'active';

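-- LATERAL allows the function to reference columns from preceding FROM items
-- (for function calls in FROM the keyword is optional: they may do so anyway)
-- Each row from customers drives a call to get_customer_orders
-- Efficient for small outer sets; index-friendly when orders is indexed on (customer_id, created_at)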
```
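A self-contained sketch of the same correlation pattern, with `generate_series` standing in for `get_customer_orders` and inline VALUES rows standing in for `customers` (both substitutions are for illustration only):

```sql
-- Each outer row drives one call; the function sees t.qty from that row
SELECT t.name, g.n
FROM (VALUES ('alice', 2), ('bob', 3)) AS t(name, qty)
CROSS JOIN LATERAL generate_series(1, t.qty) AS g(n);
-- 5 rows: alice 1..2, bob 1..3

-- Same rows without the LATERAL keyword (implied for function calls)
SELECT t.name, g.n
FROM (VALUES ('alice', 2), ('bob', 3)) AS t(name, qty),
     generate_series(1, t.qty) AS g(n);

-- LEFT JOIN LATERAL ... ON true keeps outer rows whose call returns nothing
SELECT t.name, g.n
FROM (VALUES ('alice', 2), ('carol', 0)) AS t(name, qty)
LEFT JOIN LATERAL generate_series(1, t.qty) AS g(n) ON true;
-- carol appears once, with n = NULL
```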

## 22.6 Error Handling and Validation

Robust functions validate inputs and handle errors gracefully without exposing internal details.

### 22.6.1 Input Validation

```sql
-- Check constraints at function entry:
CREATE OR REPLACE FUNCTION withdraw(account_id BIGINT, amount DECIMAL)
RETURNS VOID AS $$
BEGIN
    -- Validate inputs
    IF amount IS NULL OR amount <= 0 THEN  -- NULL would otherwise slip through and set balance to NULL
        RAISE EXCEPTION 'Withdrawal amount must be positive: %', amount
            USING ERRCODE = 'invalid_parameter_value';
    END IF;
    
    IF amount > 10000 THEN
        RAISE EXCEPTION 'Withdrawal amount % exceeds daily limit', amount
            USING ERRCODE = 'check_violation',
                  HINT = 'Contact support for large withdrawals';
    END IF;
    
    -- Perform operation
    UPDATE accounts SET balance = balance - amount WHERE id = account_id;
    
    IF NOT FOUND THEN
        RAISE EXCEPTION 'Account % not found', account_id
            USING ERRCODE = 'no_data_found';
    END IF;
END;
$$ LANGUAGE plpgsql;

-- SQLSTATE codes (ERRCODE):
-- 'invalid_parameter_value' (22023)
-- 'no_data_found' (P0002)
-- 'too_many_rows' (P0003)  
-- 'unique_violation' (23505)
-- 'check_violation' (23514)
-- 'insufficient_privilege' (42501)
```
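On the receiving side, the SQLSTATE and HINT travel with the error and can be read back with `GET STACKED DIAGNOSTICS` (client drivers expose the same fields). A minimal sketch; `check_withdrawal` is a hypothetical, table-free stand-in for the validation part of `withdraw`:

```sql
CREATE OR REPLACE FUNCTION check_withdrawal(amount NUMERIC)
RETURNS VOID AS $$
BEGIN
    IF amount IS NULL OR amount <= 0 THEN
        RAISE EXCEPTION 'Withdrawal amount must be positive: %', amount
            USING ERRCODE = 'invalid_parameter_value';
    END IF;
    IF amount > 10000 THEN
        RAISE EXCEPTION 'Withdrawal amount % exceeds daily limit', amount
            USING ERRCODE = 'check_violation',
                  HINT = 'Contact support for large withdrawals';
    END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Trap the error and read back its code, message, and hint
DO $$
DECLARE
    v_state TEXT;
    v_msg   TEXT;
    v_hint  TEXT;
BEGIN
    PERFORM check_withdrawal(50000);
EXCEPTION WHEN OTHERS THEN
    GET STACKED DIAGNOSTICS v_state = RETURNED_SQLSTATE,
                            v_msg   = MESSAGE_TEXT,
                            v_hint  = PG_EXCEPTION_HINT;
    RAISE NOTICE 'SQLSTATE %: % (hint: %)', v_state, v_msg, v_hint;
END;
$$;
```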

### 22.6.2 Exception Handling (PL/pgSQL)

```sql
-- Handling exceptions within functions
-- (assumes accounts has CHECK (balance >= 0), so an overdraft raises check_violation)
CREATE OR REPLACE FUNCTION safe_transfer(from_id BIGINT, to_id BIGINT, amount DECIMAL)
RETURNS BOOLEAN AS $$
DECLARE
    err_detail TEXT;
BEGIN
    PERFORM transfer_funds(from_id, to_id, amount);
    RETURN true;
    
EXCEPTION 
    WHEN check_violation THEN
        -- Insufficient funds: record the failure but don't crash
        INSERT INTO failed_transfers (from_id, to_id, amount, reason, attempted_at)
        VALUES (from_id, to_id, amount, 'insufficient_funds', NOW());
        RETURN false;
        
    WHEN unique_violation THEN
        -- Handle duplicate key gracefully
        RAISE WARNING 'Duplicate transaction detected for account %', from_id;
        RETURN false;
        
    WHEN OTHERS THEN
        -- Catch-all: log and re-raise. An INSERT into a log table here would be
        -- rolled back together with the failing transaction, so use the server log.
        -- PG_EXCEPTION_DETAIL is not a variable; it must be fetched explicitly.
        GET STACKED DIAGNOSTICS err_detail = PG_EXCEPTION_DETAIL;
        RAISE LOG 'safe_transfer failed: % (SQLSTATE %, detail: %)', SQLERRM, SQLSTATE, err_detail;
        RAISE;  -- Re-raise the original exception
END;
$$ LANGUAGE plpgsql;

-- Trapping by SQLSTATE code instead of condition name (inside a PL/pgSQL block):
--   EXCEPTION WHEN SQLSTATE '23505' THEN   -- same as WHEN unique_violation
--       ... handle duplicate ...
```
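One consequence of EXCEPTION clauses is worth seeing once: entering such a block starts a subtransaction, so when an error is trapped every change made inside that block is rolled back, while work done before the block survives. A self-contained sketch using a hypothetical temporary table `demo_ledger`:

```sql
CREATE TEMP TABLE demo_ledger (note TEXT);

DO $$
BEGIN
    INSERT INTO demo_ledger VALUES ('before block');      -- survives

    BEGIN
        INSERT INTO demo_ledger VALUES ('inside block');  -- rolled back by the trap
        RAISE EXCEPTION 'simulated failure';
    EXCEPTION WHEN raise_exception THEN                   -- SQLSTATE P0001
        RAISE NOTICE 'trapped: %', SQLERRM;
    END;
END;
$$;

SELECT note FROM demo_ledger;   -- only 'before block'
```

The same subtransaction is also why EXCEPTION blocks cost more than plain blocks and are best kept out of tight loops.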

## 22.7 Function Overloading

PostgreSQL supports function overloading (same name, different argument types), but requires careful management to avoid ambiguity.

```sql
-- Overloaded functions:
CREATE FUNCTION get_user(user_id BIGINT) RETURNS users AS $$
    SELECT * FROM users WHERE id = user_id;
$$ LANGUAGE SQL STABLE;

CREATE FUNCTION get_user(email_address TEXT) RETURNS users AS $$
    SELECT * FROM users WHERE email = email_address;
$$ LANGUAGE SQL STABLE;

-- Usage:
SELECT * FROM get_user(123);        -- Calls BIGINT version
SELECT * FROM get_user('a@b.com');  -- Calls TEXT version

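-- Surprise with an untyped NULL (no error is raised):
SELECT * FROM get_user(NULL);  
-- Resolves to the TEXT version: NULL has type "unknown", and unknown arguments
-- prefer the string category, so the lookup runs against email
-- Fix: Explicit cast: get_user(NULL::BIGINT)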

-- Best practices:
-- 1. Use overloading sparingly (confusing for developers)
-- 2. Ensure argument types are sufficiently distinct
-- 3. Document which version handles which case
-- 4. Consider naming conventions instead: get_user_by_id(), get_user_by_email()
```
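Two quick ways to keep overloads under control are listing every signature registered under a name (via `regprocedure`) and testing how untyped arguments resolve. A sketch with hypothetical `describe_arg` overloads that simply report which version ran:

```sql
CREATE OR REPLACE FUNCTION describe_arg(x BIGINT) RETURNS TEXT AS $$
    SELECT 'bigint version';
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION describe_arg(x TEXT) RETURNS TEXT AS $$
    SELECT 'text version';
$$ LANGUAGE SQL IMMUTABLE;

-- Every overload registered under this name
SELECT oid::regprocedure FROM pg_proc WHERE proname = 'describe_arg';

SELECT describe_arg(123);           -- 'bigint version' (integer casts implicitly to bigint)
SELECT describe_arg(NULL);          -- 'text version' (unknown prefers the string category)
SELECT describe_arg(NULL::BIGINT);  -- 'bigint version'
```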

## 22.8 Performance Anti-Patterns

### 22.8.1 Row-by-Row Processing

```sql
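-- Anti-pattern: a NON-inlinable function called once per row of a large result set
CREATE FUNCTION calculate_discount_plpgsql(price DECIMAL) RETURNS DECIMAL AS $$
BEGIN
    RETURN CASE 
        WHEN price > 1000 THEN price * 0.9
        WHEN price > 100 THEN price * 0.95
        ELSE price
    END;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Bad usage: the PL/pgSQL function is invoked separately for every row
SELECT product_id, calculate_discount_plpgsql(price) FROM products;

-- Better: a single-expression SQL function, which the planner inlines
CREATE FUNCTION calculate_discount(price DECIMAL) RETURNS DECIMAL AS $$
    SELECT CASE 
        WHEN price > 1000 THEN price * 0.9
        WHEN price > 100 THEN price * 0.95
        ELSE price
    END;
$$ LANGUAGE SQL IMMUTABLE;

-- After inlining, this plans exactly like writing the CASE expression directly
-- in the query, so the logic stays centralized without per-call overhead
SELECT product_id, calculate_discount(price) FROM products;

-- Worst case: a function that runs its own query for every row (N+1 lookups);
-- rewrite such calls as a JOIN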
```
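`EXPLAIN VERBOSE` shows whether a scalar function was inlined: the Output line contains either the function call or the substituted expression. A self-contained sketch with hypothetical `discount_sql` and `discount_plpgsql` functions wrapping the same CASE logic, run over a few inline VALUES rows:

```sql
CREATE OR REPLACE FUNCTION discount_sql(price NUMERIC) RETURNS NUMERIC AS $$
    SELECT CASE WHEN price > 1000 THEN price * 0.9
                WHEN price > 100  THEN price * 0.95
                ELSE price END;
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION discount_plpgsql(price NUMERIC) RETURNS NUMERIC AS $$
BEGIN
    RETURN CASE WHEN price > 1000 THEN price * 0.9
                WHEN price > 100  THEN price * 0.95
                ELSE price END;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- Output line shows the CASE expression itself: the SQL function was inlined
EXPLAIN VERBOSE
SELECT discount_sql(p) FROM (VALUES (50.0), (500.0), (5000.0)) AS v(p);

-- Output line shows discount_plpgsql(...): one call per row
EXPLAIN VERBOSE
SELECT discount_plpgsql(p) FROM (VALUES (50.0), (500.0), (5000.0)) AS v(p);
```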

### 22.8.2 Hidden Result-Set Materialization

```sql
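-- PL/pgSQL set-returning functions (RETURN NEXT / RETURN QUERY) accumulate the
-- entire result set in a tuplestore (spilling to disk beyond work_mem) before
-- the caller sees the first row, even if the caller only needs a few rows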

-- Anti-pattern: Returning huge sets from PL/pgSQL
CREATE FUNCTION get_all_orders()
RETURNS SETOF orders AS $$
BEGIN
    RETURN QUERY SELECT * FROM orders;  -- Returns all 10M rows
END;
$$ LANGUAGE plpgsql;

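-- Better: a SQL function the planner can inline. It must be declared STABLE
-- (or IMMUTABLE); a VOLATILE set-returning function is never inlined.
-- Once inlined, rows stream and a caller's LIMIT or WHERE reaches the scan.
CREATE OR REPLACE FUNCTION get_all_orders()
RETURNS SETOF orders AS $$
    SELECT * FROM orders;
$$ LANGUAGE SQL STABLE;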

-- Or use RETURNS TABLE with LIMIT/OFFSET parameters to control size
```

### 22.8.3 Volatility Misclassification

```sql
-- Worst case: Labeling random() as IMMUTABLE
CREATE FUNCTION get_random_id() RETURNS BIGINT AS $$
    SELECT (random() * 1000000)::BIGINT;
$$ LANGUAGE SQL IMMUTABLE;  -- WRONG! Should be VOLATILE

-- Consequences:
-- 1. Query planner caches result, returns same "random" number for all rows
-- SELECT get_random_id() FROM generate_series(1,10);  -- Returns same number 10 times!
-- 2. Could be used in index (nonsensical index that never updates)

-- Always correctly classify:
-- IMMUTABLE: Pure function, same output forever for same input (md5, lower, arithmetic)
-- STABLE: Same result within one statement; may read the database (now(), current_timestamp, lookups)
-- VOLATILE: Side effects or unpredictable results (random(), nextval(), clock_timestamp())
```
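The "same random number" symptom is easy to reproduce. A sketch with two hypothetical functions that differ only in their label; the mislabeled one has no arguments, so the planner can evaluate it once at plan time and reuse the constant for every row:

```sql
CREATE OR REPLACE FUNCTION random_id_mislabeled() RETURNS BIGINT AS $$
    SELECT (random() * 1000000)::BIGINT;
$$ LANGUAGE SQL IMMUTABLE;   -- wrong on purpose

CREATE OR REPLACE FUNCTION random_id_correct() RETURNS BIGINT AS $$
    SELECT (random() * 1000000)::BIGINT;
$$ LANGUAGE SQL VOLATILE;

-- Mislabeled: folded to a constant, so 10 rows share 1 distinct value
SELECT count(DISTINCT random_id_mislabeled()) FROM generate_series(1, 10);

-- Correct: re-evaluated per row, so almost certainly 10 distinct values
SELECT count(DISTINCT random_id_correct()) FROM generate_series(1, 10);
```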

---

## Chapter Summary

In this chapter, you learned:

1. **Volatility Categories**: IMMUTABLE functions enable expression (functional) indexes and generated columns but must not access the database. STABLE functions return consistent results within a single statement and can supply index-scan comparison values. VOLATILE functions have side effects or non-deterministic results and are re-evaluated on every call. Misclassification causes subtle bugs (stale, constant-folded results) or prevents index usage.

2. **SQL vs PL/pgSQL**: Simple SQL functions (LANGUAGE SQL with a single SELECT, not SECURITY DEFINER, and for set-returning functions not VOLATILE or STRICT) are inlined by the query optimizer, allowing predicate pushdown and index usage. PL/pgSQL functions are black boxes that execute per call. Use SQL for simple queries, PL/pgSQL for control flow and exception handling.

3. **Procedures vs Functions**: Procedures (CREATE PROCEDURE) support transaction control (COMMIT/ROLLBACK inside, provided CALL is not issued within an explicit transaction block) and are ideal for batch ETL with intermediate commits. Functions run in the caller's transaction context and return values for use in SQL expressions. Functions are composable; procedures are standalone CALL statements.

4. **SECURITY DEFINER**: Functions executing with owner privileges require explicit `SET search_path` (with `pg_temp` listed last), schema-qualified table names, and revocation of the default `EXECUTE` grant to `PUBLIC` to prevent search_path injection and unintended access. Always use `SECURITY INVOKER` unless privilege escalation is specifically required, and audit DEFINER functions regularly.

5. **Set-Returning Functions**: Use `RETURNS TABLE` for ad-hoc column definitions, `RETURNS SETOF` for existing table types. Combine with `LATERAL` for correlated function calls (top-N per group patterns). Inlined SQL functions stream rows as part of the outer plan, while PL/pgSQL and other non-inlined functions materialize their full result first.

6. **Error Handling**: Use specific SQLSTATE codes in RAISE EXCEPTION for application error handling. Trap exceptions in PL/pgSQL with EXCEPTION blocks for logging or fallback logic, but avoid excessive exception handling: every entry into a block with an EXCEPTION clause starts a subtransaction.

**Next:** In Chapter 23, we will explore PL/pgSQL Essentials—covering variables, control flow, cursors, record types, and advanced procedural patterns for complex database logic.

