# Chapter 5: Data Types Done Right

Choosing incorrect data types is one of the most expensive mistakes in database design. Type changes require table rewrites, index rebuilds, and application refactoring. This chapter establishes the decision criteria for PostgreSQL's type system, emphasizing correctness, storage efficiency, and operational safety.

## 5.1 Numeric Types: Precision, Money, and Performance

PostgreSQL offers multiple numeric categories: integers, arbitrary precision, floating-point, and the notorious `MONEY` type. Each has specific semantics that affect calculation accuracy and storage.

### 5.1.1 Integer Types: When Size Matters

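PostgreSQL provides three integer types (`SMALLINT`, `INTEGER`, `BIGINT`), plus the `SMALLSERIAL`/`SERIAL`/`BIGSERIAL` shorthands that attach a sequence to them. The industry standard is to use the smallest type that accommodates your data range, with a bias toward `BIGINT` for primary keys to prevent overflow in high-volume systems.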

```sql
-- Integer type selection guide
-- SMALLINT (-32768 to 32767): 2 bytes
-- INTEGER (-2147483648 to 2147483647): 4 bytes  
-- BIGINT (-9223372036854775808 to 9223372036854775807): 8 bytes

-- Industry standard for primary keys
CREATE TABLE user_account (
    user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    -- BIGINT prevents overflow in high-insert systems (9 quintillion max)
    age_in_years SMALLINT CHECK (age_in_years BETWEEN 0 AND 150),
    -- SMALLINT sufficient for human ages; up to 6 bytes smaller than BIGINT (alignment padding can reduce the per-row saving)
    reputation_score INTEGER
    -- INTEGER sufficient for most counters (2 billion max)
);

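-- Serial types (legacy vs modern), shown as column definitions
-- Old way (still common):
--   user_id SERIAL PRIMARY KEY
-- SERIAL is a 4-byte INTEGER (BIGSERIAL is the 8-byte variant);
-- it creates a SEQUENCE, sets DEFAULT nextval(), and links ownership

-- Modern standard (PostgreSQL 10+):
--   user_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY
-- SQL standard compliant, cleaner DDL; sequence options are changed through the column
-- (ALTER TABLE ... ALTER COLUMN ... RESTART / SET INCREMENT BY)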
```

**Critical Distinction: `SERIAL` vs `GENERATED ALWAYS AS IDENTITY`**

```sql
-- SERIAL behavior (legacy)
CREATE TABLE legacy_table (
    id SERIAL PRIMARY KEY,
    name TEXT
);
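-- Behind the scenes creates: CREATE SEQUENCE legacy_table_id_seq (owned by legacy_table.id)
-- Default: nextval('legacy_table_id_seq')
-- Not SQL standard; the sequence is a separate object that needs its own GRANTs,
-- explicit id values silently bypass it (causing duplicate-key errors later), and
-- CREATE TABLE ... (LIKE legacy_table INCLUDING DEFAULTS) copies a default that still uses legacy_table_id_seq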

-- IDENTITY columns (modern standard)
CREATE TABLE modern_table (
    id BIGINT GENERATED ALWAYS AS IDENTITY 
        (START WITH 1000 INCREMENT BY 1 CACHE 100) PRIMARY KEY,
    name TEXT
);
-- SQL standard compliant
-- Cannot supply id manually without the OVERRIDING SYSTEM VALUE clause (safety)
-- Sequence automatically dropped with table
```
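The `OVERRIDING` safety net and the claim that identity columns are easier to change can be seen in a short session. This is a minimal sketch against a hypothetical temporary table `identity_demo`; the rejected insert is left commented out so the script runs end to end.

```sql
-- Hypothetical demo table; TEMP so it disappears at session end
CREATE TEMP TABLE identity_demo (
    id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name TEXT NOT NULL
);

INSERT INTO identity_demo (name) VALUES ('generated');       -- gets id 1

-- Rejected: id is GENERATED ALWAYS, so a plain explicit value is an error
-- INSERT INTO identity_demo (id, name) VALUES (500, 'manual');

-- Explicit opt-in, e.g. when importing rows from another system
INSERT INTO identity_demo (id, name)
OVERRIDING SYSTEM VALUE
VALUES (500, 'imported');

-- The sequence does not know about 500, so move it past the imported rows
ALTER TABLE identity_demo ALTER COLUMN id RESTART WITH 501;

INSERT INTO identity_demo (name) VALUES ('after restart');   -- gets id 501
```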

### 5.1.2 Arbitrary Precision: `NUMERIC` vs `DECIMAL`

`NUMERIC` and `DECIMAL` are synonymous in PostgreSQL. They store exact values with up to 131072 digits before the decimal point and up to 16383 after (a precision declared as `NUMERIC(p, s)` is capped at 1000 digits), using variable-length storage: 2 bytes per group of 4 decimal digits plus 3 to 8 bytes of overhead.

**When to Use Exact Precision:**
- Financial calculations where rounding errors are unacceptable
- Scientific measurements requiring exact decimal representation
- Compliance requirements (GAAP, SOX) mandating precise arithmetic

```sql
-- Financial amounts: store as integer cents or exact decimal
CREATE TABLE transactions (
    -- Option 1: Integer cents (industry standard for high performance)
    amount_cents BIGINT NOT NULL CHECK (amount_cents != 0),
    
    -- Option 2: Exact decimal (when fractional cents exist, e.g., tax rates)
    exact_amount NUMERIC(19, 4) NOT NULL CHECK (exact_amount != 0)
    -- Precision 19, Scale 4 allows up to 999,999,999,999,999.9999
    -- 19 digits total, 4 after decimal
);

-- Why not (19,2)? Applying a rate such as 8.875% yields fractional cents; 4 decimals keep sub-cent intermediate amounts
-- Why not FLOAT? 0.1 + 0.2 != 0.3 in floating point
```
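A quick check of both claims, run on literals so nothing needs to exist first. The 8.875% rate and the $19.99 price are illustrative values, not taken from any real tax table.

```sql
-- Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly
SELECT 0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION
           = 0.3::DOUBLE PRECISION               AS float_equal,    -- false
       0.1::NUMERIC + 0.2::NUMERIC = 0.3::NUMERIC AS numeric_equal;  -- true

-- Tax on 19.99 at 8.875%: the exact product is 1.7741125
-- Keep 4 decimals for stored intermediate amounts, round to cents once at the end
SELECT ROUND(19.99::NUMERIC(19, 4) * 0.08875, 4) AS tax_4dp,   -- 1.7741
       ROUND(19.99::NUMERIC(19, 4) * 0.08875, 2) AS tax_2dp;   -- 1.77
```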

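**Performance Warning:** `NUMERIC` arithmetic is implemented in software and is considerably slower than native integer math; benchmark your own workload before putting it on a hot path. For high-frequency trading or real-time analytics, consider storing amounts as `BIGINT` (cents) and formatting at the application layer.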

### 5.1.3 Floating-Point: `REAL` and `DOUBLE PRECISION`

IEEE 754 floating-point types sacrifice exactness for performance and range. Never use them for money or equality comparisons.

```sql
-- Legitimate uses: scientific measurements, lat/long coordinates, ML features
CREATE TABLE sensor_readings (
    temperature_celsius REAL,           -- 6 decimal digits precision
    latitude DOUBLE PRECISION,          -- 15 decimal digits precision  
    longitude DOUBLE PRECISION
    
    -- NEVER do this:
    -- price DOUBLE PRECISION  -- Rounding errors accumulate
);

-- Equality is dangerous with floats
SELECT * FROM sensor_readings WHERE temperature_celsius = 23.5;  -- Risky
SELECT * FROM sensor_readings WHERE ABS(temperature_celsius - 23.5) < 0.001;  -- Safe
```
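The precision figures above have a concrete consequence: `REAL` carries a 24-bit significand, so distinct integers above 2^24 (16777216) can collapse to the same value. A small sketch on literals:

```sql
-- 16777217 is not representable as REAL; it rounds to 16777216
SELECT 16777217::REAL = 16777216::REAL                         AS real_collides,    -- true
       16777217::DOUBLE PRECISION = 16777216::DOUBLE PRECISION AS double_collides;  -- false

-- Tolerance comparison instead of equality
SELECT ABS(0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION
           - 0.3::DOUBLE PRECISION) < 1e-9 AS within_tolerance;                      -- true
```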

### 5.1.4 The `MONEY` Type: A Cautionary Tale

**Never use the `MONEY` type.** It is a legacy type with dangerous semantics:

- Stores a 64-bit integer whose fractional precision is fixed by the `lc_monetary` setting (typically 2 decimals), unsuitable for high-precision finance
- Input and output formatting are locale-specific, so dumped values may not load into a database with a different `lc_monetary`
- Portability issues (behavior changes with the `lc_monetary` setting)
- Limited operator support (cannot multiply MONEY by MONEY)

```sql
-- Instead of:
price MONEY,

-- Use:
price_cents BIGINT,  -- or
price NUMERIC(19,4)  -- if fractional cents required
```
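For an existing schema, moving off `MONEY` is a single type change, because `MONEY` casts to `NUMERIC` without loss of precision. A sketch with a hypothetical temporary table `legacy_prices`; `ALTER COLUMN ... TYPE` rewrites the table under an `ACCESS EXCLUSIVE` lock, so plan it like any other rewrite.

```sql
-- Hypothetical legacy table that used MONEY
CREATE TEMP TABLE legacy_prices (
    product_id BIGINT PRIMARY KEY,
    price      MONEY NOT NULL
);
INSERT INTO legacy_prices VALUES (1, '19.99');   -- MONEY input is locale-parsed

-- Convert in place: the USING expression casts each stored value
ALTER TABLE legacy_prices
    ALTER COLUMN price TYPE NUMERIC(19, 4) USING price::NUMERIC;

SELECT product_id, price FROM legacy_prices;     -- 1 | 19.9900
```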

## 5.2 Text Types: Encoding, Collation, and Constraints

PostgreSQL's text handling is powerful but requires understanding of character sets, collation (sorting rules), and the `VARCHAR(n)` vs `TEXT` distinction.

### 5.2.1 `TEXT` vs `VARCHAR(n)`: The Definitive Guide

In PostgreSQL, `TEXT` and `VARCHAR` (without length) are identical in storage and performance. `VARCHAR(n)` adds a length constraint check.

**Industry Standard:**
- Use `TEXT` for virtually all character data
- Add `CHECK` constraints for length limits (clearer error messages, easier to change)
- Reserve `VARCHAR(n)` only for strict interoperability requirements (e.g., fixed-width mainframe integration)

```sql
-- Preferred approach: TEXT with constraints
CREATE TABLE articles (
    article_id BIGINT PRIMARY KEY,
    title TEXT NOT NULL CHECK (LENGTH(title) BETWEEN 10 AND 200),
    -- Clear error: "violates check constraint articles_title_check"
    -- vs "value too long for type character varying(200)"
    
    slug TEXT NOT NULL CHECK (slug ~ '^[a-z0-9-]+$'),
    -- Regex validation more useful than length limit
    
    content TEXT NOT NULL CHECK (octet_length(content) < 1000000),
    -- Byte length check for storage limits (1MB max)
    
    summary TEXT CHECK (summary IS NULL OR LENGTH(summary) <= 500)
);

-- Legacy/Interop approach (when required)
CREATE TABLE legacy_integration (
    account_code VARCHAR(10) NOT NULL,  -- Fixed width from mainframe
    description VARCHAR(255)
);
```
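One way the "easier to change" point plays out: a named `CHECK` can be replaced without touching the column type. This sketch assumes a hypothetical temporary table `headlines`; `NOT VALID` skips the scan when the constraint is added, and `VALIDATE CONSTRAINT` checks existing rows later under a lighter lock. (Widening a `VARCHAR(n)` is also metadata-only in modern PostgreSQL, so the gain is mainly flexibility and clearer errors.)

```sql
CREATE TEMP TABLE headlines (
    headline_id BIGINT PRIMARY KEY,
    title       TEXT NOT NULL
        CONSTRAINT headlines_title_len CHECK (length(title) <= 100)
);

-- Add the new rule without scanning existing rows (brief lock only)...
ALTER TABLE headlines
    ADD CONSTRAINT headlines_title_len_v2 CHECK (length(title) <= 200) NOT VALID;

-- ...remove the old, stricter rule...
ALTER TABLE headlines DROP CONSTRAINT headlines_title_len;

-- ...then check existing rows; VALIDATE still allows concurrent reads and writes
ALTER TABLE headlines VALIDATE CONSTRAINT headlines_title_len_v2;
```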

**Storage Details:**
- Both use variable-length storage: a 1-byte header for values up to 126 bytes, a 4-byte header otherwise
- No padding (unlike `CHAR(n)`, which pads with spaces)
- When a row exceeds roughly 2KB, TOAST automatically compresses large values and/or moves them out of line
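
Because a UTF-8 character can occupy several bytes, the `LENGTH` and `octet_length` checks in the `articles` example measure different things. A sketch on a literal, assuming a UTF8 database:

```sql
-- 'Ü' is one character but two bytes in UTF-8
SELECT length('Über')       AS characters,   -- 4
       octet_length('Über') AS bytes;        -- 5
```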

### 5.2.2 Character Encoding: UTF-8 Enforcement

PostgreSQL databases are created with a specific encoding. Industry standard is **UTF-8** exclusively.

```sql
-- Verify database encoding
SELECT pg_encoding_to_char(encoding) 
FROM pg_database 
WHERE datname = current_database();
-- Must return 'UTF8'

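-- Client encoding: must describe the bytes the client actually sends and expects
SHOW client_encoding;
SET client_encoding TO 'UTF8';

-- Detect migration damage. A UTF8 database rejects invalid byte sequences on input,
-- so look for text that was mis-decoded before loading, such as U+FFFD replacement
-- characters (table_name and column_name are placeholders)
SELECT column_name 
FROM table_name 
WHERE strpos(column_name, U&'\FFFD') > 0;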
```

**Critical Warning:** If `client_encoding` differs from server encoding, PostgreSQL automatically transcodes data. This can cause errors (invalid byte sequences) or silent data corruption if the client claims wrong encoding.
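
The failure mode in the warning can be reproduced without reconfiguring a client: `convert_to` and `convert_from` stand in for a client that sends bytes in one encoding while claiming another. A sketch, assuming a UTF8 database; the decoding that fails is left commented out.

```sql
-- Encode as LATIN1: 'Ü' becomes the single byte 0xDC
SELECT convert_to('Über', 'LATIN1') AS latin1_bytes;          -- \xdc626572

-- Claiming those bytes are UTF8 fails: 0xDC starts a two-byte
-- sequence that 'b' (0x62) does not continue
-- SELECT convert_from('\xdc626572'::BYTEA, 'UTF8');           -- invalid byte sequence error

-- Declaring the true source encoding recovers the text
SELECT convert_from('\xdc626572'::BYTEA, 'LATIN1') AS text_value;   -- Über
```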

### 5.2.3 Collation (LC_COLLATE): Sorting and Comparison

Collation determines string sort order, equality, and range comparisons. It is set at database creation but can be overridden per column or query.

```sql
-- Database default (from template)
SELECT datcollate FROM pg_database WHERE datname = current_database();
-- Typically 'en_US.utf8' or 'C'

-- Column-specific collation
CREATE TABLE user_emails (
    email_address TEXT COLLATE "C" PRIMARY KEY,
    -- Binary collation: byte-order sorting, cheapest comparisons
    -- Equality is case-sensitive (John@example.com ≠ john@example.com), as under any
    -- deterministic collation; RFC 5321 allows the local part to be case-sensitive
    
    display_name TEXT COLLATE "en_US"
    -- Locale-aware sorting (Ö sorts with O in German, after Z in Swedish)
);

-- Query-level override (products.name is an assumed TEXT column)
SELECT * FROM products ORDER BY name COLLATE "C";      -- Fast binary sort
-- vs
SELECT * FROM products ORDER BY name COLLATE "en_US";  -- Locale-aware (slower)
```

**Performance Impact:**
- `C` or `POSIX` collation uses byte-wise comparison (fast, index-friendly)
- Locale collations (`en_US`, `de_DE`) use complex linguistic rules (slower, but correct for display)
- An index serves comparisons and `ORDER BY` only under the collation it was built with; a query that specifies a different collation cannot use it
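
To make a `C`-collation sort or prefix search indexable in a database whose default collation is locale-aware (assumed here, e.g. `en_US.utf8`), build the index with that collation explicitly. `catalog_items` is a hypothetical temporary table; whether the planner actually picks the index still depends on table size and statistics.

```sql
CREATE TEMP TABLE catalog_items (
    item_id BIGINT PRIMARY KEY,
    name    TEXT NOT NULL            -- uses the database default collation
);

-- Index built under the "C" collation
CREATE INDEX catalog_items_name_c_idx ON catalog_items (name COLLATE "C");

-- Candidates for the index: same collation as the index column
SELECT item_id FROM catalog_items ORDER BY name COLLATE "C" LIMIT 20;
SELECT item_id FROM catalog_items WHERE name COLLATE "C" LIKE 'Pro%';

-- Not a candidate: sorting under the default collation needs an index built with it
SELECT item_id FROM catalog_items ORDER BY name LIMIT 20;
```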

**Case-Insensitive Searching Standard:**

```sql
-- Option 1: citext extension (case-insensitive text type)
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE users (
    email CITEXT PRIMARY KEY  -- 'John@Example.com' matches 'john@example.com'
);

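-- Option 2: Expression index with LOWER() on plain TEXT (more control)
-- Uses user_emails from above; the UNIQUE index also enforces case-insensitive uniqueness
CREATE UNIQUE INDEX user_emails_lower_ukey ON user_emails (LOWER(email_address));
SELECT * FROM user_emails WHERE LOWER(email_address) = LOWER('John@Example.com');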
```
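A further option on PostgreSQL 12 or later, when the server is built with ICU, is a nondeterministic collation, which makes plain `TEXT` compare case-insensitively. This sketch follows the `case_insensitive` example in the PostgreSQL collation documentation; `ci_accounts` is a hypothetical table. Some operations, notably pattern matching, are restricted under nondeterministic collations, so check the documentation for your version.

```sql
-- 'und-u-ks-level2': root locale, secondary strength (accents count, case does not)
CREATE COLLATION IF NOT EXISTS case_insensitive (
    provider = icu, locale = 'und-u-ks-level2', deterministic = false
);

CREATE TEMP TABLE ci_accounts (
    email TEXT COLLATE case_insensitive PRIMARY KEY
);

INSERT INTO ci_accounts VALUES ('John@Example.com');
-- INSERT INTO ci_accounts VALUES ('john@example.com');   -- would fail: duplicate key

SELECT email FROM ci_accounts WHERE email = 'JOHN@EXAMPLE.COM';   -- matches the stored row
```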

## 5.3 Temporal Types: Time Zones, Precision, and Intervals

Time handling is a frequent source of production bugs. PostgreSQL offers rich temporal support, but it requires explicit time zone discipline.

### 5.3.1 `TIMESTAMPTZ` vs `TIMESTAMP`: The Critical Distinction

- **`TIMESTAMP WITH TIME ZONE` (`TIMESTAMPTZ`)**: Stores moment in time as UTC internally, displays in session timezone. **Always use this.**
- **`TIMESTAMP WITHOUT TIME ZONE` (`TIMESTAMP`)**: Stores calendar/wall-clock time without zone context. Use only for future local events (meetings) or abstract times.

```sql
-- Industry standard: Always use TIMESTAMPTZ for events
CREATE TABLE events (
    event_id BIGINT PRIMARY KEY,
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),  -- Moment in time
    scheduled_for TIMESTAMP,  -- Abstract: "9:00 AM" regardless of zone
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Critical demonstration
SET TIME ZONE 'America/New_York';
INSERT INTO events (event_id, occurred_at) VALUES (1, '2024-01-15 12:00:00');
-- Stored as UTC internally, displays as EST/EDT

SET TIME ZONE 'Asia/Tokyo';
SELECT occurred_at FROM events;
-- Displays as '2024-01-16 02:00:00+09' (same moment, different display)

-- TIMESTAMP (without zone) danger:
INSERT INTO events (event_id, scheduled_for) VALUES (2, '2024-01-15 12:00:00');
-- Always displays as '2024-01-15 12:00:00' regardless of session zone
-- Is this noon UTC? Noon local? Impossible to tell without metadata.
```

**Best Practices:**
1. **Database server**: Set `timezone = 'UTC'` in `postgresql.conf`
2. **Application connections**: Always use UTC (`SET TIME ZONE 'UTC'`)
3. **Display conversion**: Convert to local time in application layer (JavaScript, Python, etc.)
4. **Never store**: Local times with offset stripped (e.g., stripping '-05:00' from '12:00-05:00')
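
For the legitimate `TIMESTAMP` case (future local events), the missing metadata can be stored next to the value as an IANA zone name and resolved at query time, so the DST rules for the actual date apply. `meetings` and its columns are hypothetical:

```sql
CREATE TEMP TABLE meetings (
    meeting_id  BIGINT PRIMARY KEY,
    local_start TIMESTAMP NOT NULL,   -- wall-clock time as attendees see it
    time_zone   TEXT NOT NULL         -- IANA zone name
);

INSERT INTO meetings VALUES
    (1, '2024-01-15 09:00', 'America/New_York'),   -- EST, UTC-5
    (2, '2024-07-15 09:00', 'America/New_York');   -- EDT, UTC-4

-- TIMESTAMP AT TIME ZONE zone -> TIMESTAMPTZ (the absolute moment);
-- the outer AT TIME ZONE 'UTC' renders that moment as UTC wall-clock time
SELECT meeting_id,
       (local_start AT TIME ZONE time_zone) AT TIME ZONE 'UTC' AS start_utc
FROM meetings
ORDER BY meeting_id;
-- 1 | 2024-01-15 14:00:00
-- 2 | 2024-07-15 13:00:00
```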

### 5.3.2 Date, Time, and Intervals

```sql
-- DATE: For birthdates, anniversaries (no time component)
birth_date DATE NOT NULL CHECK (birth_date < CURRENT_DATE),

-- TIME: Time of day without date (rarely used)
opening_time TIME NOT NULL,

-- TIMETZ: Time with zone (avoid: defined by the SQL standard, but an offset without a
-- date cannot account for DST; the PostgreSQL docs discourage it)
-- closing_time TIMETZ  -- DON'T USE THIS

-- INTERVAL: Duration between timestamps
SELECT 
    AGE(NOW(), created_at) as human_readable_age,  -- '2 years 3 mons 4 days'
    NOW() - created_at as exact_interval,           -- '832 days 14:22:11.342985'
    EXTRACT(EPOCH FROM (NOW() - created_at)) as seconds_elapsed  -- total seconds in the interval
FROM events;
```

**Interval Storage Optimization:**

```sql
-- INTERVAL uses 16 bytes. For simple day counts, use INTEGER:
subscription_length_days INTEGER NOT NULL CHECK (subscription_length_days > 0),
-- vs
subscription_interval INTERVAL NOT NULL CHECK (subscription_interval > '0 days'),

-- Integer is smaller, faster for math, but requires application handling of months
-- Interval handles '1 month' correctly (variable days), but is bulkier
```
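The month-versus-days difference is easy to see with literals. 2024 is a leap year, and month arithmetic clamps to the last day of the target month:

```sql
SELECT DATE '2024-01-31' + INTERVAL '1 month' AS jan31_plus_month,   -- 2024-02-29 00:00:00
       DATE '2024-01-31' + 30                 AS jan31_plus_30_days, -- 2024-03-01
       DATE '2024-03-31' + INTERVAL '1 month' AS mar31_plus_month;   -- 2024-04-30 00:00:00
```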

### 5.3.3 Ranges: Temporal and Numeric

Range types store intervals with inclusive/exclusive bounds efficiently.

```sql
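-- Temporal ranges (most common)
-- btree_gist provides GiST support for = on INTEGER, which the exclusion constraint needs
CREATE EXTENSION IF NOT EXISTS btree_gist;
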
CREATE TABLE room_reservations (
    room_id INTEGER NOT NULL,
    during TSTZRANGE NOT NULL,  -- Timestamp with time zone range
    
    -- Exclusion constraint prevents overlaps
    CONSTRAINT no_double_booking 
        EXCLUDE USING GIST (room_id WITH =, during WITH &&)
);

-- Inserting ranges
INSERT INTO room_reservations (room_id, during)
VALUES (101, '[2024-01-15 09:00:00+00, 2024-01-15 17:00:00+00)');
-- [ = inclusive, ) = exclusive (standard mathematical notation)

-- Range operators
SELECT * FROM room_reservations 
WHERE during @> NOW();  -- Contains current time

SELECT * FROM room_reservations 
WHERE during && '[2024-01-15 12:00:00+00, 2024-01-16 12:00:00+00)';  -- Overlaps

-- Integer ranges (useful for versioning, chunking)
CREATE TABLE file_chunks (
    file_id BIGINT,
    byte_range INT8RANGE,
    data BYTEA
);
```
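The half-open `[)` bounds matter for the exclusion constraint: back-to-back bookings share an endpoint without overlapping. A self-contained sketch on a hypothetical temporary table `booking_demo`, with the conflicting insert wrapped in a `DO` block so the script keeps running:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TEMP TABLE booking_demo (
    room_id INTEGER NOT NULL,
    during  TSTZRANGE NOT NULL,
    EXCLUDE USING GIST (room_id WITH =, during WITH &&)
);

INSERT INTO booking_demo VALUES (101, '[2024-01-15 09:00+00, 2024-01-15 17:00+00)');
-- Back-to-back: the first range excludes 17:00, so these do not overlap
INSERT INTO booking_demo VALUES (101, '[2024-01-15 17:00+00, 2024-01-15 18:00+00)');
-- Same window, different room: allowed
INSERT INTO booking_demo VALUES (102, '[2024-01-15 09:00+00, 2024-01-15 17:00+00)');

-- Overlap in the same room is rejected with SQLSTATE 23P01 (exclusion_violation)
DO $$
BEGIN
    INSERT INTO booking_demo VALUES (101, '[2024-01-15 16:00+00, 2024-01-15 18:00+00)');
    RAISE EXCEPTION 'overlap was not rejected';
EXCEPTION WHEN exclusion_violation THEN
    RAISE NOTICE 'overlap rejected as expected';
END $$;

-- Range accessors
SELECT room_id, lower(during) AS starts, upper(during) AS ends
FROM booking_demo
ORDER BY room_id, lower(during);
```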

## 5.4 UUID: Universally Unique Identifiers

UUIDs prevent enumeration attacks and simplify distributed system primary key generation, but have storage and performance implications.

### 5.4.1 Generation Methods

```sql
-- Method 1: uuid-ossp extension (legacy, widely supported)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
-- uuid_generate_v4() - Random UUID (version 4)
-- uuid_generate_v1() - Timestamp-based (leaks MAC address, time)

-- Method 2: pgcrypto (cryptographically secure random)
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
-- gen_random_uuid() - Random UUID (version 4), preferred for security

-- Method 3: Built-in (PostgreSQL 13+)
-- gen_random_uuid() is built-in since PG 13, no extension needed
```

**Industry Standard:** Use `gen_random_uuid()` (Version 4 UUID) for primary keys in distributed systems or external-facing IDs.

### 5.4.2 UUID as Primary Key: Pros and Cons

```sql
-- UUID primary key (distributed system safe)
CREATE TABLE distributed_documents (
    document_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    -- 16 bytes (vs 8 bytes for BIGINT), random insertion order
    
    content TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

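-- Sequential UUID (compromise: time-based prefix for locality)
-- UUIDv7 puts a millisecond timestamp in the leading bits, so new keys sort near each other
-- Built in as uuidv7() from PostgreSQL 18; older versions need an extension such as pg_uuidv7
-- (UUIDv1 and uuid_generate_v1mc() store the low timestamp bits first, so they do not sort by time)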
```

**Storage and Performance:**
- **Size**: 16 bytes vs 8 bytes for `BIGINT` (100% overhead)
- **Index fragmentation**: Random UUIDs cause frequent page splits in B-trees
- **Sequential UUIDs**: UUIDv7 (time-based prefix) mitigates fragmentation while retaining uniqueness
- **Exposure**: Never expose sequential integer IDs in URLs (enumeration risk); UUIDs are preferred for public APIs
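
On PostgreSQL 18 or later, `uuidv7()` is built in, and `uuid_extract_version()` and `uuid_extract_timestamp()` let you inspect generated values. A sketch assuming PostgreSQL 18; `v7_documents` is a hypothetical table. On older versions, substitute the generator from whichever extension you use.

```sql
-- Requires PostgreSQL 18+ for uuidv7() and for extracting timestamps from v7 values
SELECT uuid_extract_version(gen_random_uuid()) AS random_version,   -- 4
       uuid_extract_version(uuidv7())          AS v7_version,       -- 7
       uuid_extract_timestamp(uuidv7())        AS v7_timestamp;     -- close to now()

-- Hypothetical table: time-ordered keys keep B-tree inserts near the right edge
CREATE TEMP TABLE v7_documents (
    document_id UUID PRIMARY KEY DEFAULT uuidv7(),
    content     TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
INSERT INTO v7_documents (content) VALUES ('first'), ('second');
```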

**Hybrid Approach (Industry Standard):**

```sql
-- Internal: BIGINT (fast joins, compact)
-- External: UUID (security, distribution)
CREATE TABLE orders (
    order_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_uuid UUID UNIQUE DEFAULT gen_random_uuid() NOT NULL,
    -- Internal joins use order_id (fast)
    -- External APIs expose order_uuid (secure)
    
    customer_id BIGINT REFERENCES customers(customer_id),
    total_cents BIGINT NOT NULL
);
```

## 5.5 JSONB: Structured Data in a Relational World

PostgreSQL's JSONB (binary JSON) bridges relational and document models. It is distinct from the `JSON` type (text storage) and offers indexing, binary efficiency, and rich operators.

### 5.5.1 `JSON` vs `JSONB`: Always Use `JSONB`

```sql
-- JSON: Text storage, validates syntax, preserves whitespace, key order, and duplicate keys
-- JSONB: Binary storage, normalized (no insignificant whitespace, keys reordered,
--        last duplicate key wins); supports GIN indexing and containment/existence operators

-- Always use JSONB unless:
-- 1. You must preserve exact formatting (whitespace, key order)
-- 2. You need to keep duplicate keys (RFC 8259 discourages them but does not forbid them)

CREATE TABLE event_logs (
    event_id BIGINT PRIMARY KEY,
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL CHECK (jsonb_typeof(payload) = 'object'),
    -- Ensures payload is always an object, not array or scalar
    
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```
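The normalization described in the comments above is visible on a single literal with a duplicate key:

```sql
SELECT '{"b": 1, "a": 2, "a": 3}'::JSON  AS as_json,    -- {"b": 1, "a": 2, "a": 3}
       '{"b": 1, "a": 2, "a": 3}'::JSONB AS as_jsonb;   -- {"a": 3, "b": 1}
```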

### 5.5.2 JSONB Operators and Querying

```sql
-- Sample data
INSERT INTO event_logs (event_id, event_type, payload) VALUES
(1, 'user_signup', '{"user_id": 123, "email": "test@example.com", "plan": "pro", "tags": ["mobile", "ios"]}');

-- Extraction operators
SELECT 
    payload -> 'email' as json_quote,           -- JSON string: "test@example.com"
    payload ->> 'email' as text_value,          -- Text: test@example.com
    payload #> '{tags,0}' as nested_access      -- Path access: "mobile"
FROM event_logs;

-- Containment (uses GIN index)
SELECT * FROM event_logs 
WHERE payload @> '{"plan": "pro"}';  -- Contains this key-value pair

-- Existence
SELECT * FROM event_logs WHERE payload ? 'tags';                    -- Has key 'tags'
SELECT * FROM event_logs WHERE payload ?| array['email', 'phone'];  -- Has any of these keys
SELECT * FROM event_logs WHERE payload ?& array['email', 'plan'];   -- Has all of these keys

-- Array containment
SELECT * FROM event_logs 
WHERE payload -> 'tags' @> '["mobile"]';

-- Update operators
UPDATE event_logs 
SET payload = payload || '{"verified": true}'  -- Merge/append keys
WHERE event_id = 1;

UPDATE event_logs 
SET payload = jsonb_set(payload, '{plan}', '"enterprise"')  -- Update specific path
WHERE event_id = 1;

-- Remove key (the WHERE clause avoids rewriting rows that lack it)
UPDATE event_logs 
SET payload = payload - 'temporary_field'
WHERE payload ? 'temporary_field';
```
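Since PostgreSQL 12, SQL/JSON path expressions complement the operators above. A sketch on a literal document shaped like the sample row; the `referrer` key is hypothetical. Against a table, the predicate form `payload @? '$.tags[*] ? (@ == "ios")'` can also use a GIN index.

```sql
SELECT jsonb_path_query_array(doc, '$.tags[*]')            AS all_tags,   -- ["mobile", "ios"]
       jsonb_path_exists(doc, '$.tags[*] ? (@ == "ios")')  AS has_ios,    -- true
       jsonb_set(doc, '{referrer}', '"newsletter"', true)
           ->> 'referrer'                                   AS added_key   -- newsletter
FROM (SELECT '{"user_id": 123, "plan": "pro", "tags": ["mobile", "ios"]}'::JSONB AS doc) AS s;
```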

### 5.5.3 Indexing JSONB

```sql
-- GIN index for containment and existence queries (essential)
CREATE INDEX idx_event_logs_payload ON event_logs USING GIN (payload);

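-- Specific path index (B-tree, for equality/range on specific field)
-- Queries must repeat the expression exactly: payload ->> 'user_id' compared as TEXT
CREATE INDEX idx_event_logs_user_id ON event_logs USING BTREE ((payload ->> 'user_id'));

-- Multi-column index with JSONB (the ::BIGINT cast makes inserts with a non-numeric user_id fail)
CREATE INDEX idx_event_logs_type_user ON event_logs (event_type, ((payload ->> 'user_id')::BIGINT));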

-- Expression index for partial JSONB
CREATE INDEX idx_event_logs_pro_users ON event_logs ((payload ->> 'user_id')) 
WHERE payload @> '{"plan": "pro"}';
```
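Two refinements, sketched on a hypothetical temporary table `payload_demo`: the `jsonb_path_ops` operator class trades operator coverage for a smaller GIN index, and an expression index only helps queries that repeat its exact expression.

```sql
CREATE TEMP TABLE payload_demo (
    event_id BIGINT PRIMARY KEY,
    payload  JSONB NOT NULL
);

-- jsonb_path_ops GIN: smaller than the default jsonb_ops, but supports only
-- @>, @? and @@ (not the key-existence operators ?, ?| and ?&)
CREATE INDEX payload_demo_path_gin ON payload_demo USING GIN (payload jsonb_path_ops);

-- Expression index on the TEXT value of user_id
CREATE INDEX payload_demo_user_id_txt ON payload_demo ((payload ->> 'user_id'));

-- Candidates for the indexes above
SELECT event_id FROM payload_demo WHERE payload @> '{"plan": "pro"}';
SELECT event_id FROM payload_demo WHERE payload ->> 'user_id' = '123';

-- Not a candidate for payload_demo_user_id_txt: the expression differs (cast to BIGINT)
SELECT event_id FROM payload_demo WHERE (payload ->> 'user_id')::BIGINT = 123;
```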

**When to Use JSONB vs Normalized Tables:**

Use **JSONB** when:
- Schema evolves frequently (event logs, CMS content)
- Data is hierarchical and rarely joined (nested configurations)
- You need flexible attributes (EAV replacement)

Use **Normalized Tables** when:
- You query specific fields frequently with complex joins
- Data integrity requires foreign key constraints (JSONB cannot reference other tables)
- You aggregate across the field (SUM, AVG on JSONB fields requires casting, no statistics)
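
When a JSONB key graduates to a real column, `jsonb_to_record` gives it a type during the migration or in a view. A sketch on a literal payload shaped like the sample row:

```sql
-- Keys not listed in the column definition (here: tags) are ignored
SELECT r.user_id, r.email, r.plan
FROM (VALUES ('{"user_id": 123, "email": "test@example.com", "plan": "pro", "tags": ["mobile"]}'::JSONB))
         AS e(payload),
     jsonb_to_record(e.payload) AS r(user_id BIGINT, email TEXT, plan TEXT);
-- 123 | test@example.com | pro
```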

## 5.6 Arrays: Collection Types

PostgreSQL supports arrays of any built-in or user-defined type. They are powerful but violate First Normal Form; use judiciously.

### 5.6.1 Array Basics and Syntax

```sql
-- Array columns
CREATE TABLE articles (
    article_id BIGINT PRIMARY KEY,
    title TEXT NOT NULL,
    tags TEXT[] NOT NULL DEFAULT '{}',  -- Array of text
    -- or: tags TEXT ARRAY,
    
    ratings INTEGER[] NOT NULL CHECK (array_length(ratings, 1) <= 5),
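    -- At most 5 elements in the first dimension (an empty array passes: array_length returns NULL)
    
    matrix DOUBLE PRECISION[][]  -- 2-D by convention only; PostgreSQL does not enforce declared dimensions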
);

-- Array literals
INSERT INTO articles (article_id, title, tags, ratings) VALUES
(1, 'PostgreSQL Guide', ARRAY['database', 'sql', 'tutorial'], ARRAY[5, 4, 5]);

-- Alternative literal syntax
INSERT INTO articles VALUES 
(2, 'Advanced SQL', '{performance,indexing,tuning}', '{4,5,4,5}');
```
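Two details behind the `CHECK` above: `array_length` returns `NULL` (not 0) for an empty array, and PostgreSQL does not enforce declared dimensions. A sketch that enforces both explicitly on a hypothetical temporary table `rating_demo`:

```sql
SELECT array_length('{}'::INTEGER[], 1)        AS len_of_empty,   -- NULL
       cardinality('{}'::INTEGER[])            AS card_of_empty,  -- 0
       array_ndims('{{1,2},{3,4}}'::INTEGER[]) AS ndims;          -- 2

-- Enforce one dimension and at most 5 elements
-- (an empty array still passes: array_ndims returns NULL for it)
CREATE TEMP TABLE rating_demo (
    ratings INTEGER[] NOT NULL
        CHECK (array_ndims(ratings) = 1 AND cardinality(ratings) <= 5)
);

INSERT INTO rating_demo VALUES ('{5,4,5}');
-- INSERT INTO rating_demo VALUES ('{{1,2},{3,4}}');   -- would fail the CHECK
```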

### 5.6.2 Array Operations

```sql
-- Containment (order-independent)
SELECT * FROM articles WHERE tags @> ARRAY['sql'];  -- Has 'sql' tag
SELECT * FROM articles WHERE tags && ARRAY['sql', 'nosql'];  -- Has any of these

-- Index access (1-based, not 0-based like most languages)
SELECT tags[1] as first_tag FROM articles;

-- Slicing
SELECT tags[1:2] as first_two_tags FROM articles;

-- Update
UPDATE articles SET tags = array_append(tags, 'advanced') WHERE article_id = 1;
UPDATE articles SET tags = array_remove(tags, 'old_tag') WHERE article_id = 1;

-- Unnest (convert array to rows)
SELECT article_id, unnest(tags) as tag FROM articles;
-- Returns one row per tag per article
```
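`unnest` can also report each element's position, and `array_agg` reverses the operation. A sketch on literals:

```sql
-- Positions are 1-based, matching array subscripts
SELECT t.tag, t.pos
FROM unnest(ARRAY['database', 'sql', 'tutorial']) WITH ORDINALITY AS t(tag, pos);
-- database | 1
-- sql      | 2
-- tutorial | 3

-- Rebuild a sorted, de-duplicated array
SELECT array_agg(DISTINCT tag ORDER BY tag) AS tags
FROM unnest(ARRAY['sql', 'database', 'sql']) AS u(tag);
-- {database,sql}
```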

### 5.6.3 Array Indexing

```sql
-- GIN index for containment/contains operations (essential for tags)
CREATE INDEX idx_articles_tags ON articles USING GIN (tags);

-- B-tree index for specific array values (rare)
CREATE INDEX idx_articles_first_tag ON articles USING BTREE ((tags[1]));
```

**Array Anti-Patterns:**
- Never use arrays to store foreign keys (no referential integrity)
- Never use arrays for searchable tags without GIN index (sequential scan)
- Avoid large arrays (>100 elements); consider normalization or JSONB
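
On the GIN point: the default GIN operator class for arrays serves `@>`, `<@`, `&&` and `=`, but not `value = ANY(column)`, even though the two predicates below select the same rows. A sketch on a hypothetical temporary table `tagged_posts`:

```sql
CREATE TEMP TABLE tagged_posts (
    post_id BIGINT PRIMARY KEY,
    tags    TEXT[] NOT NULL DEFAULT '{}'
);
CREATE INDEX tagged_posts_tags_gin ON tagged_posts USING GIN (tags);

INSERT INTO tagged_posts VALUES (1, '{sql,postgres}'), (2, '{nosql}');

-- GIN-indexable: array operator applied to the indexed column
SELECT post_id FROM tagged_posts WHERE tags @> ARRAY['sql'];

-- Same rows, but the GIN index cannot serve this form
SELECT post_id FROM tagged_posts WHERE 'sql' = ANY (tags);
```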

## 5.7 Enumerated Types (ENUM): Domain Constraints

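ENUMs create a static, ordered set of values. Each value occupies 4 bytes on disk (efficient). Adding a value with `ALTER TYPE` is a cheap catalog change, but values cannot be removed or reordered without recreating the type and rewriting every column that uses it.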

```sql
-- Define enum
CREATE TYPE order_status AS ENUM ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled');

CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    status order_status NOT NULL DEFAULT 'pending',
    -- Stored as 4 bytes (vs ~10-20 bytes for VARCHAR)
    
    updated_at TIMESTAMPTZ
);

-- Ordering respects enum definition order, not alphabetical
SELECT * FROM orders ORDER BY status;
-- Returns: pending, confirmed, shipped... (not alphabetical)

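-- Modifying enum: values can be added or renamed (RENAME VALUE, PG 10+), never dropped
-- Since PG 12, ADD VALUE may run inside a transaction block, but the new value is unusable until commit
ALTER TYPE order_status ADD VALUE 'returned';  -- Adds at end
ALTER TYPE order_status ADD VALUE 'on_hold' AFTER 'confirmed';  -- BEFORE/AFTER available since PG 9.1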
```

**ENUM vs Lookup Table Decision:**

Use **ENUM** when:
- Values are static (rarely change)
- No additional metadata needed per value (descriptions, codes)
- Space efficiency critical (millions of rows)

Use **Lookup Table** when:
- Values change frequently
- Need to attach metadata (display names, sort orders, external codes)
- Need foreign key relationships from other tables
- Need to track when values were added/deprecated

**Industry Standard:** Use lookup tables for business domain values (flexibility), use ENUMs for internal state machines with stable values (connection states, job queue states).
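
For comparison, a lookup-table version of `order_status`, sketched with hypothetical temporary tables `order_status_ref` and `orders_with_lookup`. The metadata columns are illustrative; adding or retiring a value is ordinary DML rather than `ALTER TYPE`.

```sql
CREATE TEMP TABLE order_status_ref (
    status_code   TEXT PRIMARY KEY,
    display_name  TEXT NOT NULL,
    sort_order    SMALLINT NOT NULL UNIQUE,
    deprecated_at TIMESTAMPTZ            -- NULL while the value is in use
);

INSERT INTO order_status_ref (status_code, display_name, sort_order) VALUES
    ('pending',   'Pending',   10),
    ('confirmed', 'Confirmed', 20),
    ('shipped',   'Shipped',   30),
    ('delivered', 'Delivered', 40),
    ('cancelled', 'Cancelled', 50);

CREATE TEMP TABLE orders_with_lookup (
    order_id BIGINT PRIMARY KEY,
    status   TEXT NOT NULL DEFAULT 'pending'
             REFERENCES order_status_ref (status_code)
);

-- A new value is an INSERT; the gap in sort_order places it between existing ones
INSERT INTO order_status_ref (status_code, display_name, sort_order)
VALUES ('on_hold', 'On hold', 25);

-- Retiring a value keeps history intact
UPDATE order_status_ref SET deprecated_at = now() WHERE status_code = 'cancelled';

-- Display order comes from data, not from the type definition
SELECT status_code, display_name FROM order_status_ref ORDER BY sort_order;
```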

## 5.8 Type Conversion and Casting

Explicit casting prevents ambiguity and runtime errors.

```sql
-- Syntax options
SELECT CAST('123' AS INTEGER);
SELECT '123'::INTEGER;  -- PostgreSQL-specific, shorter
SELECT INTEGER '123';   -- Typed literal (string constants only)

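-- Failed casts raise errors; there is no built-in TRY_CAST
SELECT 'abc'::INTEGER;  -- ERROR: invalid input syntax for type integer
SELECT NULLIF('abc', '')::INTEGER;  -- Still errors: NULLIF only maps '' to NULL

-- Guarded parsing with a regex, applied to a column (a bare literal such as
-- 'abc'::INTEGER is converted at parse time and errors even inside CASE)
SELECT raw,
       CASE WHEN raw ~ '^-?\d+$' THEN raw::INTEGER END AS parsed  -- out-of-range digits still error
FROM (VALUES ('123'), ('abc'), ('-7')) AS t(raw);

-- Date parsing: explicit formats avoid DateStyle-dependent interpretation
-- TO_TIMESTAMP reads the input in the session TimeZone and returns TIMESTAMPTZ;
-- AT TIME ZONE 'UTC' then renders that moment as a UTC wall-clock TIMESTAMP
SELECT TO_TIMESTAMP('2024-01-15 14:30', 'YYYY-MM-DD HH24:MI') AT TIME ZONE 'UTC';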
```
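On PostgreSQL 16 or later, `pg_input_is_valid(text, type)` tests whether input would be accepted without raising an error, which also catches out-of-range values that a digit regex lets through. A sketch on literals:

```sql
-- Requires PostgreSQL 16+
SELECT raw,
       pg_input_is_valid(raw, 'integer') AS ok,
       CASE WHEN pg_input_is_valid(raw, 'integer') THEN raw::INTEGER END AS parsed
FROM (VALUES ('123'), ('abc'), ('99999999999')) AS t(raw);
-- 123         | t | 123
-- abc         | f | NULL
-- 99999999999 | f | NULL   (out of range for INTEGER)
```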

---

## Chapter Summary

In this chapter, you learned:

1. **Numeric Types**: Use `BIGINT GENERATED ALWAYS AS IDENTITY` for surrogate keys, `NUMERIC` for exact financial math (or `BIGINT` cents for performance), avoid `FLOAT` for money, never use `MONEY` type.
2. **Text Types**: Use `TEXT` with `CHECK` constraints instead of `VARCHAR(n)`, enforce UTF-8 encoding, choose collation (`C` for identifiers, locale-specific for display names) carefully.
3. **Temporal Types**: Always use `TIMESTAMPTZ` (UTC storage) for timestamps, use `DATE` for birthdays, use `TSTZRANGE` for scheduling with exclusion constraints.
4. **UUID**: Use `gen_random_uuid()` for distributed/external IDs, prefer `BIGINT` for internal joins, consider UUIDv7 for time-sortable uniqueness.
5. **JSONB**: Prefer over `JSON` for binary efficiency and indexing, use GIN indexes for containment queries, use expression indexes for specific path queries, avoid storing foreign references in JSONB.
6. **Arrays**: Useful for tags and simple collections with GIN indexing, but violate normalization—avoid for foreign key references or large collections.
7. **Enums**: Space-efficient for static value sets, but prefer lookup tables for business values requiring metadata or frequent changes.

**Next:** In Chapter 6, we will explore table creation and constraints in depth—covering primary key strategies, foreign key enforcement with proper ON DELETE behaviors, CHECK constraints for data quality, exclusion constraints for complex uniqueness, and generated columns for derived data.