# Chapter 2: Installing PostgreSQL (Local + Team Standards)

## 2.1 Installation Options: Making the Right Choice

PostgreSQL runs on virtually every modern platform. However, the installation method you choose determines your operational flexibility, upgrade path, and team consistency. This section covers the four primary installation patterns used in industry.

### 2.1.1 Package Manager Installation (Recommended for Development)

Package managers provide the tightest integration with your operating system, automatic security updates, and standard file system layouts.

**macOS (Homebrew)**

```bash
# Install PostgreSQL
brew install postgresql@16

# Add to PATH (add to ~/.zshrc or ~/.bash_profile)
# $(brew --prefix) resolves to /opt/homebrew on Apple Silicon and /usr/local on Intel
export PATH="$(brew --prefix)/opt/postgresql@16/bin:$PATH"

# Start service
brew services start postgresql@16

# Create default database (matches your username)
createdb

# Verify installation
psql --version
psql -c "SELECT version();"
```

**Ubuntu/Debian (APT)**

```bash
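# Add official PostgreSQL repository (recommended over distribution packages)
# apt-key is deprecated; store the key in a file and reference it with signed-by
sudo apt-get install -y curl ca-certificates lsb-release
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get update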

# Install specific version (pin version in production)
sudo apt-get install -y postgresql-16 postgresql-client-16 postgresql-contrib-16

# Service management
sudo systemctl enable postgresql
sudo systemctl start postgresql

# Switch to postgres user for initial setup
sudo -u postgres psql -c "SELECT version();"
```

**RHEL/CentOS/Rocky Linux (DNF/YUM)**

```bash
# Add repository
sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm

# Disable default module to prevent conflicts
sudo dnf -qy module disable postgresql

# Install
sudo dnf install -y postgresql16-server postgresql16-contrib

# Initialize database cluster
sudo /usr/pgsql-16/bin/postgresql-16-setup initdb

# Enable and start
sudo systemctl enable postgresql-16
sudo systemctl start postgresql-16
```

### 2.1.2 Docker Installation (Recommended for Team Consistency)

Docker ensures every developer runs identical PostgreSQL versions with identical configurations, eliminating "works on my machine" issues.

**Basic Docker Run (Development Only)**

```bash
# Quick start (data lives in an anonymous volume and is lost when the container is removed)
docker run --name postgres-dev \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_USER=devuser \
  -e POSTGRES_DB=devdb \
  -p 5432:5432 \
  postgres:16-alpine

# With persistent named volume (remove the first container or pick a different --name)
docker run --name postgres-dev \
  -e POSTGRES_PASSWORD=devpassword \
  -e POSTGRES_USER=devuser \
  -e POSTGRES_DB=devdb \
  -p 5432:5432 \
  -v postgres_data:/var/lib/postgresql/data \
  postgres:16-alpine
```

**Production-Ready Docker Compose (Team Standard)**

```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: postgres:16-alpine
    container_name: app-postgres
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-appuser}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
      POSTGRES_DB: ${POSTGRES_DB:-appdb}
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      - type: volume
        source: postgres_data
        target: /var/lib/postgresql/data
      - type: bind
        source: ./init-scripts
        target: /docker-entrypoint-initdb.d
        read_only: true
      - type: bind
        source: ./postgresql.conf
        target: /etc/postgresql/postgresql.conf
        read_only: true
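    ports:
      - "127.0.0.1:5432:5432"  # Bind to localhost only for security
    # Without this, the mounted postgresql.conf is never read
    command: ["postgres", "-c", "config_file=/etc/postgresql/postgresql.conf"]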
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-appuser} -d ${POSTGRES_DB:-appdb}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '1.0'
          memory: 1G
    restart: unless-stopped

volumes:
  postgres_data:
    driver: local
```

**Key Docker Standards:**
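- Always pin versions: at minimum the major (`postgres:16-alpine`), ideally the minor as well (`postgres:16.2-alpine`); never `latest`
- Bind to `127.0.0.1` in development to prevent accidental exposure
- Use environment files (`.env`) for secrets, never commit credentials
- Mount custom `postgresql.conf` for team-standard configuration, and pass `-c config_file=...` so the server actually reads it
- Use healthchecks so dependent services and Docker Swarm can wait for readiness (Kubernetes ignores image and Compose healthchecks and uses its own probes)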
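
The version-pinning rule can be checked mechanically, for example in a pre-commit hook. The following is a minimal sketch; `image_tag_pinned` is a hypothetical helper, and it only inspects the tag text (it does not verify that the tag exists in a registry).

```shell
# Hypothetical helper: succeed only if an image reference carries an explicit
# tag other than "latest".
image_tag_pinned() {
    local image="$1" tag
    case "$image" in
        *:*) tag="${image##*:}" ;;
        *)   tag="latest" ;;       # Docker assumes :latest when no tag is given
    esac
    # A slash after the last colon means the colon belonged to a registry port
    case "$tag" in
        */*) tag="latest" ;;
    esac
    [ "$tag" != "latest" ]
}

for image in postgres:16.2-alpine postgres:latest postgres; do
    if image_tag_pinned "$image"; then
        echo "$image: pinned"
    else
        echo "$image: NOT pinned"
    fi
done
```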

### 2.1.3 Cloud-Managed PostgreSQL (Production Standard)

For production workloads, managed services handle backups, patching, replication, and monitoring.

**Amazon RDS/Aurora PostgreSQL**

```bash
# Connection string format (standard URI)
postgresql://username:password@hostname:port/database?sslmode=require

# Example with RDS-specific parameters
postgresql://appuser:SecurePass123@mydb.cluster-xyz.us-east-1.rds.amazonaws.com:5432/appdb?sslmode=require&connect_timeout=10&application_name=myapp
```
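
Passwords that contain characters such as `@`, `:` or `/` must be percent-encoded before they are placed in a connection URI. The sketch below shows one way to do that in plain shell. It assumes ASCII input, and `urlencode` is a hypothetical helper rather than something PostgreSQL ships; the credentials and host are placeholders.

```shell
# Hypothetical helper: percent-encode a string for use inside a connection URI.
# Assumes ASCII input; letters, digits and . ~ _ - pass through unchanged.
urlencode() {
    local s="$1" out="" c rest
    while [ -n "$s" ]; do
        rest="${s#?}"          # everything after the first character
        c="${s%"$rest"}"       # the first character itself
        case "$c" in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        s="$rest"
    done
    printf '%s\n' "$out"
}

# Build a URI from a password that would otherwise break URI parsing
password=$(urlencode 'p@ss:w/rd')
echo "postgresql://appuser:${password}@localhost:5432/appdb?sslmode=require"
```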

**Key Cloud Configuration Standards:**

```sql
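-- Verify SSL is enabled on the server
-- (enforcing it for every client is a separate setting, e.g. rds.force_ssl on RDS)
SHOW ssl;
-- Should return 'on'

-- Check encryption of the current connection
-- (pg_stat_ssl exposes the columns "version" and "cipher")
SELECT
    a.client_addr,
    a.client_port,
    s.ssl,
    s.version AS ssl_version,
    s.cipher  AS ssl_cipher
FROM pg_stat_ssl s
JOIN pg_stat_activity a USING (pid)
WHERE pid = pg_backend_pid();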
```

**Google Cloud SQL / Azure Database for PostgreSQL**

Similar connection patterns with cloud-specific CLI tools for provisioning:

```bash
# Google Cloud SQL example
gcloud sql instances create postgres-instance \
    --database-version=POSTGRES_16 \
    --tier=db-g1-small \
    --region=us-central1 \
    --availability-type=REGIONAL \
    --storage-size=100GB \
    --storage-auto-increase \
    --backup-start-time=03:00 \
    --maintenance-window-day=SUN \
    --maintenance-window-hour=4

# Create database and user
gcloud sql databases create appdb --instance=postgres-instance
gcloud sql users create appuser --instance=postgres-instance --password='SecureRandomPassword123!'
```

---

## 2.2 Choosing Your PostgreSQL Version

### 2.2.1 Version Numbering and Support Lifecycle

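PostgreSQL does not use semantic versioning. Since version 10, version numbers have two parts, `major.minor`, with a fixed support model:

- **Major versions**: Annual release (e.g., 15, 16, 17); may change the on-disk format
- **Minor versions**: Bug-fix and security releases roughly every three months (e.g., 16.1, 16.2); never change the on-disk format
- **Support duration**: 5 years from major version release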

**Current Version Strategy (as of 2024-2025):**

| Version | Status | Support Until | Recommendation |
|---------|--------|---------------|----------------|
| 12 | EOL (End of Life) | Nov 2024 | Upgrade immediately |
| 13 | Supported | Nov 2025 | Plan upgrade |
| 14 | Supported | Nov 2026 | Acceptable |
| 15 | Supported | Nov 2027 | Good |
| 16 | Current | Nov 2028 | **Recommended** |
| 17 | Latest | Nov 2029 | Evaluate for new projects |
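
The "Support Until" column follows directly from the five-year rule. As a rough sketch, the end-of-life year can be estimated from the major version number, assuming the annual cadence that started with version 10 (released in 2017) continues. `pg_eol_year` is a hypothetical helper; the official versioning policy page remains the authority for exact dates.

```shell
# Hypothetical helper: estimate the end-of-life year of a major version (10+).
# Assumption: one major release per year, with version 10 released in 2017.
pg_eol_year() {
    local major="$1"
    if [ "$major" -lt 10 ]; then
        echo "only versions 10 and later are handled: $major" >&2
        return 1
    fi
    local release_year=$(( 2007 + major ))
    echo $(( release_year + 5 ))
}

for v in 13 14 15 16 17; do
    echo "PostgreSQL $v: supported until about November $(pg_eol_year "$v")"
done
```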

### 2.2.2 Version Selection Decision Matrix

**For New Projects:**

```sql
-- Check current version
SELECT version();
-- Should return something like: PostgreSQL 16.2 on x86_64-pc-linux-gnu...

-- Check feature availability
SELECT 
    current_setting('server_version_num')::int >= 160000 
    AS has_sql_json_functions;
```

**Industry Standard Rules:**

1. **Never use EOL versions** in production (no security patches)
2. **N or N-1 strategy**: Run either the latest major version or the one before it for stability
3. **Major upgrades require planning**: The on-disk format changes between major versions, so an upgrade requires `pg_dump`/`pg_restore`, `pg_upgrade`, or logical replication
4. **Test extensions compatibility** before version upgrades

---

## 2.3 Directory Layout and Key Files

Understanding PostgreSQL's file system layout is crucial for troubleshooting, backup strategies, and performance tuning.

### 2.3.1 The Data Directory (PGDATA)

The data directory contains all database files, configuration, and WAL. Its location varies by installation method:

```bash
# Find your data directory
psql -c "SHOW data_directory;"

# Typical locations:
# macOS (Homebrew): /opt/homebrew/var/postgresql@16  or /usr/local/var/postgresql@16
# Ubuntu/Debian:  /var/lib/postgresql/16/main
# RHEL/CentOS:    /var/lib/pgsql/16/data
# Docker:         /var/lib/postgresql/data
# Windows:        C:\Program Files\PostgreSQL\16\data
```

**Directory Structure:**

```
$PGDATA/
├── base/                  # Database files (one subdirectory per database)
│   ├── 1/                # Template1 database
│   ├── 4/                # Template0 database
│   └── 16384/            # Your actual databases (OID numbers)
├── global/                # Cluster-wide catalogs (e.g., pg_database) and the pg_control file
├── pg_wal/                # Write-Ahead Log (WAL) files
│   └── archive_status/    # Archiving status of WAL segments (.ready/.done markers)
├── pg_commit_ts/          # Commit timestamps (if enabled)
├── pg_dynshmem/           # Dynamic shared memory
├── pg_logical/            # Logical decoding data
├── pg_multixact/          # Multi-transaction status
├── pg_notify/             # LISTEN/NOTIFY queue
├── pg_replslot/           # Replication slot data
├── pg_serial/             # Serializable transaction info
├── pg_snapshots/          # Exported snapshots
├── pg_stat/               # Permanent statistics data
├── pg_stat_tmp/           # Temporary statistics
├── pg_subtrans/           # Subtransaction data
├── pg_tblspc/             # Tablespace symbolic links
├── pg_twophase/           # Prepared transaction state
├── pg_xact/               # Transaction commit/abort status
├── current_logfiles       # Current log file locations
├── pg_hba.conf            # Host-based authentication
├── pg_ident.conf          # User name mapping
├── PG_VERSION             # Version marker (e.g., "16")
├── postgresql.auto.conf   # Settings written by ALTER SYSTEM (do not edit by hand)
├── postgresql.conf        # Main configuration
└── postmaster.opts        # Server start options
```

**Critical Files for Developers:**

| File | Purpose | When You Need It |
|------|---------|------------------|
| `postgresql.conf` | Server configuration | Tuning memory, connections, logging |
| `pg_hba.conf` | Authentication rules | Connecting from new networks, SSL enforcement |
| `PG_VERSION` | Version check | Troubleshooting, upgrade verification |
| `base/` directory | Database storage | Understanding bloat, sizing |
| `pg_wal/` | Transaction logs | PITR, replication troubleshooting |

### 2.3.2 Configuration File Hierarchy

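PostgreSQL applies configuration sources in a fixed order, with later sources overriding earlier ones: `postgresql.conf` (plus any included files), then `postgresql.auto.conf` (written by `ALTER SYSTEM`), then server command-line options, then per-database and per-role settings (`ALTER DATABASE ... SET`, `ALTER ROLE ... SET`), and finally session-level `SET`. Understanding this prevents "why didn't my change take effect?" confusion.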

```sql
-- Check configuration file locations
SHOW config_file;        -- Main config file
SHOW hba_file;           -- Authentication config
SHOW ident_file;         -- User mapping

-- Check current settings with context
SELECT name, setting, unit, context, vartype, source, sourcefile
FROM pg_settings
WHERE name IN ('max_connections', 'shared_buffers', 'work_mem')
ORDER BY name;
```

**Configuration Contexts (When Changes Take Effect):**

| Context | Description | Example Parameters |
|---------|-------------|-------------------|
| `internal` | Set at compile time | `block_size`, `max_identifier_length` |
| `postmaster` | Requires server restart | `max_connections`, `shared_buffers`, `port` |
| `sighup` | Reload configuration file | `log_line_prefix`, `archive_command`, `autovacuum` |
| `backend` | New connections only | `ignore_system_indexes`, `post_auth_delay` |
| `superuser` | Superuser can change anytime | `log_statement_stats` |
| `user` | Session-local changes | `search_path`, `application_name` |
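
The table translates into a simple lookup: given the `context` value that `pg_settings` reports for a parameter, decide what has to happen before a change takes effect. A minimal sketch follows; `setting_apply_action` and its return words are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: map a pg_settings context to the action a change needs.
setting_apply_action() {
    case "$1" in
        internal)                  echo "read-only" ;;   # fixed at build or initdb time
        postmaster)                echo "restart" ;;     # full server restart
        sighup)                    echo "reload" ;;      # pg_reload_conf() or SIGHUP
        backend|superuser-backend) echo "reconnect" ;;   # applies to new sessions only
        superuser|user)            echo "set" ;;         # SET inside a session
        *) echo "unknown"; return 1 ;;
    esac
}

for ctx in postmaster sighup user; do
    echo "$ctx -> $(setting_apply_action "$ctx")"
done
```

With a running server, the context could be fed in from a query such as `psql -Atc "SELECT context FROM pg_settings WHERE name = 'shared_buffers'"`.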

**Reloading Configuration:**

```bash
# Method 1: SQL command (superuser)
psql -c "SELECT pg_reload_conf();"

# Method 2: Signal (Unix)
kill -HUP $(head -1 $PGDATA/postmaster.pid)

# Method 3: Systemd
sudo systemctl reload postgresql
```

---

## 2.4 Environment Setup Checklist: Dev/Prod Parity

The most expensive bugs occur when development environments differ from production. This section establishes a reproducible, team-standard environment.

### 2.4.1 Version Locking Strategy

**Industry Standard:** Use the same major version locally as in production, ideally the same minor version.

```bash
# .tool-versions (asdf version manager)
postgres 16.2

# Dockerfile
FROM postgres:16.2-alpine

# docker-compose.yml
services:
  postgres:
    image: postgres:16.2-alpine
```
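
A pin is only useful if something checks it. The sketch below compares the major version reported by a local client against the pinned one. It assumes the banner format printed by `psql --version` (for example `psql (PostgreSQL) 16.2`, sometimes followed by a distribution suffix); both function names are hypothetical.

```shell
# Hypothetical helpers: extract the major version from a "psql --version"
# banner and compare it with the version the team has pinned.
pg_major_from_banner() {
    local version
    # Third whitespace-separated field is the version, e.g. "16.2"
    version=$(printf '%s\n' "$1" | awk '{print $3}')
    echo "${version%%.*}"      # drop everything from the first dot onward
}

check_pinned_major() {
    local pinned="$1" banner="$2"
    [ "$(pg_major_from_banner "$banner")" = "$pinned" ]
}

# Example with a literal banner; in a real script use: banner=$(psql --version)
banner='psql (PostgreSQL) 16.2'
if check_pinned_major 16 "$banner"; then
    echo "client matches pinned major version 16"
else
    echo "client version mismatch: $banner" >&2
fi
```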

### 2.4.2 Configuration Standards

Create a `postgresql.conf` that balances development convenience with production realism.

```ini
# postgresql.dev.conf
# Development configuration that mirrors production constraints

# Connectivity
listen_addresses = 'localhost'
port = 5432
max_connections = 100                  # Match production connection pool size

# Memory (scale down from production proportionally)
shared_buffers = 256MB                 # Production might use 25% of RAM
work_mem = 4MB                       # Per-operation, per-connection
maintenance_work_mem = 64MB          # For VACUUM, CREATE INDEX
effective_cache_size = 1GB           # Planner assumption about OS cache

# Write Ahead Logging (WAL)
wal_level = replica                  # Match production (logical if using logical replication)
max_wal_size = 1GB
min_wal_size = 80MB

# Query Planner
random_page_cost = 1.1               # Lower for SSDs (match production storage)
effective_io_concurrency = 200       # For SSDs

# Logging (Development: be verbose)
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_min_messages = info
log_min_error_statement = error
log_line_prefix = '%m [%p] %q%u@%d '  # Timestamp [pid] user@database
log_statement = 'ddl'                  # Log all DDL (CREATE, ALTER, DROP)
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on

# Locale and Formatting
datestyle = 'iso, mdy'
timezone = 'UTC'                       # Always UTC in database, convert in app
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
```

### 2.4.3 Docker Development Environment

The following `docker-compose.yml` provides a production-like local environment with persistent data, health checks, and resource limits.

```yaml
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: postgres:16.2-alpine
    container_name: app-postgres-dev
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-appuser}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-devpassword}
      POSTGRES_DB: ${POSTGRES_DB:-appdb}
      PGDATA: /var/lib/postgresql/data/pgdata
    volumes:
      # Named volume for data persistence
      - postgres_data:/var/lib/postgresql/data
      
      # Initialization scripts (run once on first start)
      - ./init:/docker-entrypoint-initdb.d:ro
      
      # Custom configuration
      - ./postgresql.conf:/etc/postgresql/postgresql.conf:ro
      
      # SSL certificates (if using TLS in dev)
      - ./certs:/var/lib/postgresql/certs:ro
    ports:
      - "127.0.0.1:5432:5432"  # Localhost only for security
    command: 
      - "postgres"
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-appuser} -d ${POSTGRES_DB:-appdb}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '1.0'
          memory: 1G
    restart: unless-stopped
    networks:
      - app-network

  # Optional: pgAdmin for GUI management (disable in production-like envs)
  pgadmin:
    image: dpage/pgadmin4:8
    container_name: app-pgadmin
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@localhost.com
      PGADMIN_DEFAULT_PASSWORD: admin
      PGADMIN_CONFIG_SERVER_MODE: 'False'
    volumes:
      - pgadmin_data:/var/lib/pgadmin
    ports:
      - "127.0.0.1:5050:80"
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - app-network

volumes:
  postgres_data:
    driver: local
  pgadmin_data:
    driver: local

networks:
  app-network:
    driver: bridge
```

**Environment File (`.env`)**

```bash
# .env - Never commit this to version control
POSTGRES_USER=appuser
POSTGRES_PASSWORD=ChangeMeInProduction123!
POSTGRES_DB=appdb
```

### 2.1.4 Windows Installation (Native)

While most production PostgreSQL runs on Linux, Windows development is common.

**Using EnterpriseDB Installer (Official)**

1. Download from postgresql.org/download/windows/
2. Run installer as Administrator
3. Select components: PostgreSQL Server, pgAdmin, Stack Builder (optional)
4. Set data directory (avoid paths with spaces: use `C:\pgsql\data` not `C:\Program Files\...`)
5. Set superuser password
6. Set port (default 5432)
7. Set locale (recommend UTF-8)

**Windows Service Management**

```powershell
# Check service status
Get-Service postgresql-x64-16

# Start/Stop/Restart
Start-Service postgresql-x64-16
Stop-Service postgresql-x64-16
Restart-Service postgresql-x64-16

# Set to automatic start
Set-Service -Name postgresql-x64-16 -StartupType Automatic
```

**Windows Path Considerations**

```powershell
# Add to PATH environment variable (User level)
[Environment]::SetEnvironmentVariable(
    "Path", 
    [Environment]::GetEnvironmentVariable("Path", "User") + ";C:\Program Files\PostgreSQL\16\bin", 
    "User"
)
```

---

## 2.2 Version Selection and Lifecycle Management

### 2.2.1 The PostgreSQL Release Cycle

PostgreSQL releases follow a predictable annual cycle:

- **September-October**: New major version released (e.g., 16.0)
- **Quarterly**: Minor releases (bug fixes, security patches) - 16.1, 16.2, etc.
- **5 years**: Support lifecycle for each major version

**Industry Standard Version Policy:**

```bash
# Development: Latest stable major version
# Staging/Production: Same major version as dev, specific minor version

# Pin version in infrastructure as code
# Terraform example:
# version = "16.2"
```

### 2.2.2 Checking Version and Capabilities

```sql
-- Detailed version information
SELECT version();
-- Returns: PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.0, 64-bit

-- Numeric version for programmatic checks
SHOW server_version_num;
-- Returns: 160002 (major 16, minor 2, i.e. 16.2)

-- Check specific feature availability
SELECT 
    current_setting('server_version_num')::int >= 150000 AS has_merge,
    current_setting('server_version_num')::int >= 160000 AS has_sql_json_constructor;
```
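
For PostgreSQL 10 and later, `server_version_num` encodes the version as `major * 10000 + minor`. A short sketch of decoding it in a shell script, where `decode_version_num` is a hypothetical helper name:

```shell
# Hypothetical helper: turn server_version_num into "major.minor".
# Valid for PostgreSQL 10+, where the value is major * 10000 + minor.
decode_version_num() {
    local num="$1"
    echo "$(( num / 10000 )).$(( num % 10000 ))"
}

decode_version_num 160002
decode_version_num 150010
```

With a running server, the input could come from `psql -Atc "SHOW server_version_num;"`.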

### 2.2.3 Upgrade Strategies

**Minor Version Upgrades (16.1 → 16.2):**
- Simple binary replacement
- No data directory changes
- Can use `pg_ctl restart` with new binaries

**Major Version Upgrades (15 → 16):**
Require data migration. Three industry-standard approaches:

1. **pg_dump/pg_restore** (Small databases, < 100GB)
2. **pg_upgrade** (Large databases, downtime acceptable)
3. **Logical Replication** (Large databases, minimal downtime)

```bash
# pg_upgrade workflow (simplified)
# 1. Install new version binaries
# 2. Stop old server
# 3. Run pg_upgrade
pg_upgrade \
    --old-datadir=/var/lib/postgresql/15/main \
    --new-datadir=/var/lib/postgresql/16/main \
    --old-bindir=/usr/lib/postgresql/15/bin \
    --new-bindir=/usr/lib/postgresql/16/bin \
    --check  # Dry run first

# 4. Start new server
# 5. Analyze new cluster
# 6. Delete old data (after verification)
```

---

## 2.3 Directory Layout and Key Files Deep Dive

### 2.3.1 File System Organization

Understanding the physical layout is essential for capacity planning, backup strategies, and troubleshooting.

```bash
# Navigate to data directory (needs read access, typically as the postgres OS user)
cd "$(psql -c "SHOW data_directory;" -t -A)"

# View structure
tree -L 2 -d
```

**Critical Subdirectories:**

| Directory | Purpose | Operational Relevance |
|-----------|---------|---------------------|
| `base/` | Database files (tables, indexes) | Size monitoring, bloat detection |
| `global/` | Cluster-wide catalogs | pg_database, pg_control |
| `pg_wal/` | Write-ahead log | PITR, replication, disk space critical |
| `pg_stat/` | Permanent statistics | Query performance analysis |
| `pg_stat_tmp/` | Temporary statistics | Unused by core since PostgreSQL 15, which keeps statistics in shared memory |
| `pg_tblspc/` | Tablespace symbolic links | Custom storage locations |
| `pg_logical/` | Logical decoding | Replication, change data capture |
| `pg_snapshots/` | Exported snapshots | Concurrent backup consistency |

### 2.3.2 Configuration Files

**postgresql.conf**
The primary configuration file. Location found via:

```sql
SHOW config_file;
-- /etc/postgresql/16/main/postgresql.conf
```

Key parameters categorized:

```ini
# CONNECTIONS AND AUTHENTICATION
max_connections = 100                 # Total concurrent connections
superuser_reserved_connections = 3    # Reserved for superuser/emergency
listen_addresses = 'localhost'        # '*' for all interfaces (security risk)
port = 5432

# MEMORY
shared_buffers = 256MB              # 25% of RAM is typical starting point
work_mem = 4MB                      # Per-sort/per-hash operation
maintenance_work_mem = 64MB         # VACUUM, CREATE INDEX, ALTER TABLE
effective_cache_size = 1GB          # Planner assumption about OS cache

# WRITE AHEAD LOG (WAL)
wal_level = replica                 # minimal, replica, logical
max_wal_size = 1GB
min_wal_size = 80MB
checkpoint_completion_target = 0.9  # Spread checkpoint I/O over time

# QUERY PLANNER
random_page_cost = 1.1              # 1.1 for SSD, 4.0 for HDD
effective_io_concurrency = 200      # Concurrent disk I/O operations
default_statistics_target = 100     # ANALYZE sample size

# LOGGING (Development verbose, production selective)
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_min_messages = info              # debug5 to panic
log_min_error_statement = error      # Log statements causing errors
log_line_prefix = '%m [%p] %q%u@%d/%a '  # Timestamp [pid] user@db/app
log_statement = 'ddl'                # Log CREATE, ALTER, DROP
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on                  # Detect contention
```

**pg_hba.conf (Host-Based Authentication)**
Controls who can connect, from where, and how they authenticate.

```bash
# Location
psql -c "SHOW hba_file;"

# Format: TYPE  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]
# TYPE: local (Unix socket), host (TCP/IP), hostssl (SSL only), hostnossl (no SSL)
```

**Standard pg_hba.conf for Development:**

```conf
# TYPE  DATABASE        USER            ADDRESS                 METHOD

# Local connections (Unix socket) - trust for local dev only
local   all             all                                     trust

# IPv4 local connections - password with MD5 or SCRAM-SHA-256
host    all             all             127.0.0.1/32            scram-sha-256

# IPv6 local connections
host    all             all             ::1/128                 scram-sha-256

# Docker network (example: 172.18.0.0/16)
# host    all             all             172.18.0.0/16           scram-sha-256

# Reject all other connections (explicit deny, IPv4 and IPv6)
host    all             all             0.0.0.0/0               reject
host    all             all             ::/0                    reject
```

**Production pg_hba.conf Standards:**

```conf
# Never use 'trust' in production
# Use 'scram-sha-256' (available since PostgreSQL 10, default password_encryption since 14) or 'md5' (legacy)
# Require SSL for remote connections

# Application specific user with least privilege
hostssl appdb         appuser         10.0.0.0/8              scram-sha-256
hostssl appdb         readonly        10.0.0.0/8              scram-sha-256

# Admin access only from bastion/jump host
hostssl all           postgres        10.0.1.10/32            scram-sha-256
```
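
The "never use `trust`" rule is easy to lint for. The sketch below prints every active rule whose authentication method is `trust`. It assumes addresses are written in CIDR form (as in the examples above) rather than as a separate IP and netmask, and `find_trust_rules` is a hypothetical helper name.

```shell
# Hypothetical helper: list pg_hba.conf rules that use the trust method.
# Assumes CIDR-style addresses, so the method is field 4 for "local" rules
# and field 5 for host, hostssl and hostnossl rules.
find_trust_rules() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            method = ($1 == "local") ? $4 : $5
            if (method == "trust") print NR ": " $0
        }
    ' "$1"
}

# Usage (path is an example; find yours with SHOW hba_file):
# find_trust_rules /etc/postgresql/16/main/pg_hba.conf
```

An empty result means no `trust` rules were found; each reported line is prefixed with its line number in the file.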

**pg_ident.conf (User Name Mapping)**
Maps operating system users to database users (useful for local development with peer authentication).

```conf
# MAPNAME       SYSTEM-USERNAME           PG-USERNAME
mymap           john_doe                  app_admin
mymap           deploy_user               postgres
```

Usage in pg_hba.conf:
```conf
local   all             all                     peer map=mymap
```

---

## 2.4 Environment Parity: Development vs Production

### 2.4.1 The Twelve-Factor App Database Principles

The Twelve-Factor methodology's dev/prod parity factor applies directly to database configuration:

1. **Version Parity**: Dev and prod run identical major versions
2. **Configuration Parity**: Critical parameters (memory settings, planner costs) scaled proportionally
3. **Feature Parity**: If prod uses partitioning, connection pooling, or replication, dev should simulate these

### 2.4.2 Development Environment Setup Script

Create a reproducible development environment:

```bash
#!/bin/bash
# setup-dev-env.sh
# Run this after package installation to configure development instance

set -e

PG_VERSION=${PG_VERSION:-16}
PG_USER=${PG_USER:-$USER}
DB_NAME=${DB_NAME:-devdb}

echo "Setting up PostgreSQL $PG_VERSION for development..."

# Initialize database cluster (if not already done)
if [ ! -d "$HOME/.postgres/$PG_VERSION/data" ]; then
    mkdir -p "$HOME/.postgres/$PG_VERSION"
    initdb -D "$HOME/.postgres/$PG_VERSION/data" \
           --auth-host=scram-sha-256 \
           --auth-local=peer \
           --encoding=UTF8 \
           --locale=en_US.UTF-8
fi

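# Start server unless it is already running (pg_ctl waits for startup by default)
if ! pg_ctl -D "$HOME/.postgres/$PG_VERSION/data" status > /dev/null 2>&1; then
    pg_ctl -D "$HOME/.postgres/$PG_VERSION/data" -l "$HOME/.postgres/$PG_VERSION/logfile" start
fi

# Wait until the server accepts connections
until pg_isready -q; do
    echo "Waiting for PostgreSQL to start..."
    sleep 1
done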

# Create user (if not exists) and database
psql -d postgres -tc "SELECT 1 FROM pg_roles WHERE rolname='$PG_USER'" | grep -q 1 || \
    psql -d postgres -c "CREATE ROLE $PG_USER WITH LOGIN SUPERUSER CREATEDB PASSWORD 'devpassword';"

psql -d postgres -tc "SELECT 1 FROM pg_database WHERE datname='$DB_NAME'" | grep -q 1 || \
    createdb -O "$PG_USER" "$DB_NAME"

echo "Development environment ready!"
echo "Connection: psql -d $DB_NAME"
echo "JDBC: jdbc:postgresql://localhost:5432/$DB_NAME"
```

### 2.4.3 Production Readiness Checklist

Before deploying to production, verify:

```sql
-- Security
SHOW ssl;                              -- Should be 'on'
SELECT name, setting FROM pg_settings WHERE name LIKE '%password%';
SELECT name, setting FROM pg_settings WHERE name = 'log_connections';

-- Performance Baseline
SELECT name, setting, unit FROM pg_settings 
WHERE name IN ('max_connections', 'shared_buffers', 'work_mem', 'maintenance_work_mem');

-- WAL and Replication
SHOW wal_level;
SHOW max_wal_size;
SELECT pg_is_in_recovery();            -- false for primary, true for standby

-- Statistics
SELECT * FROM pg_stat_database WHERE datname = current_database();
```

---

## 2.5 Verifying Your Installation

### 2.5.1 System Validation

```bash
# Check version and build info
psql -c "SELECT version();"

# Verify extensions available
psql -c "SELECT * FROM pg_available_extensions WHERE name IN ('pgcrypto', 'uuid-ossp', 'postgis') ORDER BY name;"

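# Test connectivity and authentication
# (sslmode=prefer works on a fresh install; switch to sslmode=require once SSL is configured)
psql "postgresql://localhost:5432/postgres?sslmode=prefer" -c "SELECT current_user, current_database();"

# Check data directory permissions (should be 700, or 750 if group access is enabled)
ls -ld "$(psql -c "SHOW data_directory;" -t -A)"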
```
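
The permission check can be automated so that a setup script fails fast on a misconfigured directory. A minimal sketch, assuming the usual `ls -ld` output layout; `datadir_mode_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if a directory has mode 700 or 750,
# the two modes PostgreSQL accepts for its data directory.
datadir_mode_ok() {
    local perms
    # First ten characters of ls -ld output, e.g. "drwx------"
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        drwx------|drwxr-x---) return 0 ;;
        *) echo "unexpected permissions on $1: $perms" >&2; return 1 ;;
    esac
}

# Usage against a running server:
# datadir_mode_ok "$(psql -Atc 'SHOW data_directory;')"
```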

### 2.5.2 Creating Your First Database (Properly)

```sql
-- Run these statements as a superuser, e.g. after starting psql from the shell
-- with: psql -U postgres

-- Create application role (not superuser)
CREATE ROLE app_admin WITH 
    LOGIN 
    PASSWORD 'SecureRandomPassword123!' 
    CREATEDB  -- Can create databases for migrations
    CREATEROLE -- Can create other app roles
    INHERIT;

-- Create application database with explicit encoding
CREATE DATABASE appdb
    WITH 
    OWNER = app_admin
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TEMPLATE = template0  -- Clean template, no custom objects
    CONNECTION LIMIT = -1;

-- Connect to new database and set up schema
\c appdb app_admin

-- Create schema (don't use public schema for application objects)
CREATE SCHEMA app;

-- Set search path (schema resolution order)
ALTER DATABASE appdb SET search_path TO app, public;

-- Grant usage
GRANT USAGE ON SCHEMA app TO app_admin;
```

---

## Chapter Summary

In this chapter, you learned:

1. **Installation Methods**: Package managers for OS integration, Docker for team consistency, cloud for production
2. **Version Strategy**: Run identical major versions across environments; understand the 5-year support lifecycle
3. **File System Layout**: `PGDATA` structure, configuration files (`postgresql.conf`, `pg_hba.conf`), and their purposes
4. **Configuration Hierarchy**: Understanding `postmaster` vs `sighup` contexts prevents operational surprises
5. **Environment Parity**: Development must mirror production constraints (memory settings, authentication, features) to catch issues early
6. **Security Baseline**: SCRAM-SHA-256 authentication, SSL enforcement, and least-privilege role setup from day one

---

# Chapter 3: First Steps with psql (Your Primary Interface)

## 3.1 Understanding the psql Philosophy

`psql` is PostgreSQL's official command-line client. Unlike GUI tools that abstract database mechanics, `psql` exposes the full power of PostgreSQL's protocol, metadata, and server-side features. Industry professionals use `psql` for:
- Debugging production issues (always available on servers)
- Running complex administrative commands
- Scripting database operations
- Verifying query plans and performance

### 3.1.1 Connection Fundamentals

PostgreSQL organizes objects in a hierarchy: **Cluster → Database → Schema → Object**. Each connection targets exactly one database in the cluster.

```bash
# Basic connection syntax
psql [options]... [dbname [username]]

# Connection parameters
psql -h hostname -p port -U username -d database

# Connection URI (RFC 3986)
psql "postgresql://username:password@host:port/database?param1=value1&..."

# Examples
psql -h localhost -U appuser -d appdb
psql "postgresql://appuser:secret@localhost:5432/appdb?sslmode=require"
```

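**Connection Parameter Precedence** (highest to lowest):
1. Explicit connection parameters: command-line flags (`-h`, `-U`, etc.) or a connection string/URI
2. Service file entries (`~/.pg_service.conf`), when a service is selected via `service=` or `PGSERVICE`
3. Environment variables (`PGHOST`, `PGUSER`, etc.)
4. Default values (local Unix socket where available, otherwise localhost; current OS username; port 5432)

`~/.pgpass` sits outside this chain: it is consulted only for the password, and only when none was supplied another way.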
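
The core of the precedence idea, reduced to three layers (explicit value, environment variable, built-in default), can be sketched as a fallback chain. This is an illustration of the lookup order only, not how libpq is implemented; `resolve_conn_param` is a hypothetical helper.

```shell
# Hypothetical helper: return the first non-empty value among an explicit
# setting, an environment value, and a built-in default.
resolve_conn_param() {
    local explicit="$1" env_value="$2" default="$3"
    if [ -n "$explicit" ]; then
        echo "$explicit"
    elif [ -n "$env_value" ]; then
        echo "$env_value"
    else
        echo "$default"
    fi
}

# No -p flag given, environment supplies 6543: the environment value is used
resolve_conn_param "" "6543" "5432"
# A -p 5433 flag overrides the environment
resolve_conn_param "5433" "6543" "5432"
# Nothing set anywhere: fall back to the default port
resolve_conn_param "" "" "5432"
```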

### 3.1.2 Environment Variables

Standard practice uses environment variables for connection defaults, never hardcoded credentials.

```bash
# ~/.bashrc or ~/.zshrc
export PGHOST=localhost
export PGPORT=5432
export PGUSER=appuser
export PGDATABASE=appdb
export PGSSLMODE=require

# Password (use .pgpass file instead of env var for security)
# Format: hostname:port:database:username:password
# Append with >> so existing entries are preserved
echo "localhost:5432:appdb:appuser:secretpassword" >> ~/.pgpass
chmod 600 ~/.pgpass  # Required: the file is ignored if group or others can access it
```
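
To make the `.pgpass` format concrete, the sketch below looks up a password the way the file is meant to be read: fields are colon-separated, `*` in any of the first four fields matches anything, and the first matching line wins. It assumes no field contains an escaped colon or backslash, and `pgpass_lookup` is a hypothetical helper, not a PostgreSQL tool.

```shell
# Hypothetical helper: print the password from the first pgpass line that
# matches host, port, database and user. Assumes fields contain no escaped colons.
pgpass_lookup() {
    local file="$1" host="$2" port="$3" db="$4" user="$5"
    awk -F: -v h="$host" -v p="$port" -v d="$db" -v u="$user" '
        /^#/ || NF < 5 { next }    # skip comments and malformed lines
        ($1 == "*" || $1 == h) && ($2 == "*" || $2 == p) &&
        ($3 == "*" || $3 == d) && ($4 == "*" || $4 == u) {
            print $5
            exit                   # first match wins
        }
    ' "$file"
}

# Usage:
# pgpass_lookup ~/.pgpass localhost 5432 appdb appuser
```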

---

## 3.2 psql Meta-Commands

Meta-commands (backslash commands) are `psql`-specific instructions that interact with the client or query database metadata. They are not SQL and are processed by `psql` before sending anything to the server.

### 3.2.1 Connection and Session Management

```sql
-- Connect to different database/user (within psql)
\c appdb appuser localhost
-- You are now connected to database "appdb" as user "appuser" on host "localhost".

-- Connect with a conninfo string (quote it so psql treats it as one argument)
\c "service=production sslmode=require"

-- Show current connection info
\conninfo
-- You are connected to database "appdb" as user "appuser" on host "localhost" (address "127.0.0.1") at port "5432".

-- Change session configuration
SET search_path TO app, public;
SET TIME ZONE 'America/New_York';
```

### 3.2.2 Schema Introspection (Information Discovery)

These commands query the system catalogs in `pg_catalog` and present human-readable output. Start `psql` with `-E` to see the underlying queries.

```sql
-- List databases
\l                 -- Short format
\l+                -- With size, tablespace, description

-- List schemas
\dn                -- Schemas
\dn+               -- With access privileges

-- List tables, views, sequences
\dt                -- Tables only
\dt *.*            -- Tables in all schemas
\dt app.*          -- Tables in 'app' schema
\dt+               -- With size, description
\dv                -- Views
\dv+               -- Views with size and description (use \d+ view_name for the definition)
\ds                -- Sequences
\ds+               -- Sequences with details

-- List all objects (tables, views, sequences, foreign tables)
\d                 -- All objects in search_path
\d app.*           -- All objects in schema
\d users           -- Describe specific table (columns, types, defaults, constraints, indexes, triggers)
\d+ users          -- Extended description (description, storage, stats)

-- List indexes
\di                -- Indexes
\di+               -- With size
\d users           -- Shows indexes at bottom of table description

-- List default privileges and per-role settings
-- (constraints appear in the \d table_name output above)
\ddp               -- Default privileges
\drds              -- Role database settings

-- List functions
\df                -- Functions
\df+               -- With volatility, owner, language, description (use \sf name for source code)
\df *text*         -- Filter by pattern
\do                -- Operators
\do+               -- Operator details

-- List tablespaces
\db                -- Tablespaces
\db+               -- With options, size, description

-- List roles (users/groups)
\du                -- Roles
\du+               -- With description
\dg                -- Groups (same as \du in modern Postgres)

-- List extensions
\dx                -- Installed extensions
\dx+               -- With objects belonging to extension
```

### 3.2.3 Query Execution and Output Control

```sql
-- Execute query from file
\i /path/to/script.sql

-- Execute command and show query execution time
\timing on
SELECT * FROM users LIMIT 10;
-- Time: 1.234 ms

-- Control output format
\pset format aligned     -- Default, human-readable columns
\pset format unaligned   -- CSV-like, useful for scripting
\pset format html        -- HTML tables
\pset format csv         -- Proper CSV
-- psql has no JSON output format (as of PostgreSQL 16); build JSON in SQL instead,
-- e.g. SELECT row_to_json(u) FROM users u;

-- Set output file
\o /tmp/output.txt
SELECT * FROM users;
\o                      -- Return to stdout

-- Pager control (disable for scripting)
\pset pager off
-- Or set specific pager
\setenv PAGER 'less -S'

-- Expanded display (vertical format for wide tables)
\x on
SELECT * FROM users WHERE user_id = 1;
-- Returns vertical format instead of horizontal

-- Auto-expanded for wide output
\pset expanded auto

-- Border style
\pset border 2          -- Full frame around the table
\pset border 1          -- Lines between columns (default)
\pset border 0          -- No border

-- Null display
\pset null '[NULL]'      -- Show NULLs explicitly
```

### 3.2.4 Scripting and Automation

```sql
-- Variables
\set myvar 'some_value'
\echo :myvar

-- Variable from shell environment (\getenv requires psql 15 or later)
\getenv home HOME
\echo :home

-- Conditional execution (psql features, not SQL)
\set some_var true
\if :some_var
  \echo 'Variable is true'
\else
  \echo 'Variable is false'
\endif

-- Prompt customization
\set PROMPT1 '%[%033[1m%]%M %n@%/%R%[%033[0m%]%# '
-- Shows: host user@database= (bold)

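-- Continuation prompt with transaction status
\set PROMPT2 '%R%x%# '
-- %R shows why input continues: - (more input expected), ' (open string), ( (open parenthesis)
-- %x shows transaction status: * (in transaction), ! (failed transaction), empty otherwise

-- Quiet mode (scripting)
\set QUIET on

-- Single-transaction mode (run from the shell, not inside psql)
-- -1 is --single-transaction; ON_ERROR_STOP=1 stops at the first error, which rolls everything back
--   psql -1 -v ON_ERROR_STOP=1 -f script.sql

-- Variable passing from the command line (shell)
-- -c does not interpolate psql variables, so feed the SQL through stdin or -f
--   echo "SELECT :'var_name';" | psql -v var_name=value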
```
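
For automation, the command-line options above are usually combined in one wrapper. The following is a minimal sketch, assuming a hypothetical function name `run_sql_file`; every `psql` flag it uses is standard.

```bash
#!/usr/bin/env bash
# Sketch: run a SQL file atomically from a shell script.
# run_sql_file is a hypothetical wrapper name.
run_sql_file() {
    local file="$1"
    shift
    # -X                ignore ~/.psqlrc so personal settings cannot change script behavior
    # -q                suppress informational messages
    # -1                wrap the whole file in a single transaction
    # ON_ERROR_STOP=1   stop at the first error, so the transaction rolls back
    # Remaining arguments (for example -v name=value) are passed through unchanged.
    psql -X -q -1 -v ON_ERROR_STOP=1 "$@" -f "$file"
}
```

A call such as `run_sql_file script.sql -v var_name=value` makes `:'var_name'` available inside `script.sql`, and a failing statement yields a non-zero exit status for the calling script to check.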

---

## 2.4 Directory Layout and Key Files in Detail

### 2.4.1 The Data Directory Structure

After installation, understanding the physical layout is crucial for troubleshooting and maintenance.

```bash
# Find your data directory
psql -c "SHOW data_directory;" -t -A

# Typical layout inspection (Debian/Ubuntu path shown; substitute the directory reported above)
sudo ls -la /var/lib/postgresql/16/main/
```

**Detailed File Purposes:**

**Configuration Files:**
- `postgresql.conf`: Main server configuration (location found via `SHOW config_file`)
- `postgresql.auto.conf`: ALTER SYSTEM changes (do not edit manually)
- `pg_hba.conf`: Client authentication rules
- `pg_ident.conf`: OS user to database user mapping

**Control Files:**
- `PG_VERSION`: Contains major version number (e.g., "16")
- `postmaster.pid`: Lock file containing PID and data directory path
- `postmaster.opts`: Command-line options from last start
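
The lock file is plain text with one value per line, so it can be inspected from a script. The sketch below reads the first four lines (PID, data directory, start timestamp, port); `show_postmaster_pid` is a hypothetical helper name, and the path you pass is whatever `SHOW data_directory` reported.

```bash
#!/usr/bin/env bash
# Sketch: print the first four fields recorded in postmaster.pid.
# show_postmaster_pid is a hypothetical helper name.
show_postmaster_pid() {
    local pidfile="$1"
    local pid datadir started port
    [ -r "$pidfile" ] || return 1       # missing file usually means the server is stopped
    {
        IFS= read -r pid                # line 1: postmaster process ID
        IFS= read -r datadir            # line 2: data directory path
        IFS= read -r started            # line 3: start time (epoch seconds)
        IFS= read -r port               # line 4: port number
    } < "$pidfile"
    printf 'pid=%s\ndata_directory=%s\nstart_epoch=%s\nport=%s\n' \
        "$pid" "$datadir" "$started" "$port"
}
```

Because the data directory is readable only by the server's OS account, run this as that user (for example via `sudo -u postgres`).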

**Database Files (in `base/`):**
Each database has a subdirectory named by its OID (Object ID).

```bash
# Find database OIDs
psql -c "SELECT oid, datname FROM pg_database;"
```

Inside each database directory:
- Files named by table/index filenode numbers
- `_fsm` files: Free Space Maps
- `_vm` files: Visibility Maps
- `_init` files: Unlogged table initialization forks
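
The naming scheme can be made concrete with a small classifier. This is a sketch only, with a hypothetical function name `classify_relfile`; it handles the suffixes listed above plus the `.N` suffix that PostgreSQL appends when a relation grows past one segment file (1 GB by default), and it does not handle temporary-relation file names.

```bash
#!/usr/bin/env bash
# Sketch: classify a file name found under base/<database-oid>/.
# classify_relfile is a hypothetical helper name.
classify_relfile() {
    local name base suffix
    name="${1##*/}"            # drop any directory prefix
    name="${name%%.*}"         # drop the ".N" segment suffix, if present
    base="${name%%_*}"         # filenode number before the first underscore
    suffix="${name#"$base"}"   # "", "_fsm", "_vm" or "_init"
    case "$base" in
        ''|*[!0-9]*) echo "not a relation file"; return 0 ;;
    esac
    case "$suffix" in
        '')    echo "main fork" ;;
        _fsm)  echo "free space map" ;;
        _vm)   echo "visibility map" ;;
        _init) echo "initialization fork" ;;
        *)     echo "not a relation file" ;;
    esac
}
```

To go the other way, from a table name to its file, `SELECT pg_relation_filepath('users');` returns the path relative to the data directory.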

**WAL Files (in `pg_wal/`):**
Write-ahead log segments (16MB each by default).

```sql
-- Check WAL configuration
SHOW wal_level;
SHOW max_wal_size;
SHOW archive_mode;
```
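
To relate `max_wal_size` to the segment size, a quick calculation helps. This sketch assumes a hypothetical helper name `wal_segments_for` and takes both sizes in megabytes; the 16 MB default comes from the text above.

```bash
#!/usr/bin/env bash
# Sketch: how many WAL segment files fit within a given max_wal_size.
# wal_segments_for is a hypothetical helper; both arguments are in megabytes.
wal_segments_for() {
    local max_wal_mb="$1"
    local segment_mb="${2:-16}"     # default segment size is 16 MB
    echo $(( max_wal_mb / segment_mb ))
}

wal_segments_for 1024    # 1 GB of WAL at 16 MB per segment: prints 64
```

Treat the result as a rough guide: `max_wal_size` is a soft limit, so `pg_wal/` can temporarily hold more segments than this.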

**Temporary Files (in `base/pgsql_tmp/`):**
Created when operations exceed `work_mem` (sorts, hashes, materialization).

### 2.4.2 Tablespaces (Advanced Layout)

Tablespaces allow spreading data across different storage volumes.

```sql
-- Create tablespace on fast SSD for hot tables
CREATE TABLESPACE hot_data
    OWNER postgres
    LOCATION '/mnt/fast_ssd/postgres/hot';

-- Create tablespace on cheap storage for archive
CREATE TABLESPACE cold_data
    OWNER postgres
    LOCATION '/mnt/slow_hdd/postgres/cold';

-- Use tablespaces
CREATE TABLE recent_transactions (...)
TABLESPACE hot_data;

CREATE TABLE archived_transactions (...)
TABLESPACE cold_data;

-- Move existing table (rewrites it under an ACCESS EXCLUSIVE lock; its indexes stay put unless moved separately)
ALTER TABLE big_table SET TABLESPACE hot_data;

-- View tablespaces
\db
\db+
```

---

## 2.5 Environment Setup Checklist

### 2.5.1 Pre-Development Checklist

Before writing application code, verify:

```sql
-- 1. Version check (match production)
SELECT version();

-- 2. Encoding check (should be UTF8)
SHOW server_encoding;  -- UTF8
SHOW client_encoding;  -- UTF8

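-- 3. Locale check (per database; the lc_collate/lc_ctype settings were removed in PostgreSQL 16)
SELECT datcollate, datctype
FROM pg_database
WHERE datname = current_database();  -- en_US.UTF-8 (or your locale)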

-- 4. Timezone (should be UTC for servers)
SHOW timezone;  -- UTC recommended

-- 5. Max connections (know your limits)
SHOW max_connections;

-- 6. Current user and privileges
SELECT current_user, session_user;
SELECT pg_has_role('appuser', 'MEMBER');

-- 7. Database size and location
SELECT 
    current_database(),
    pg_size_pretty(pg_database_size(current_database())),
    current_setting('data_directory');
```
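
Parts of this checklist can be scripted so that each developer gets a pass/fail report. The following is a minimal sketch, assuming a hypothetical function name `check_setting` and that the standard `PG*` connection variables already point at the target database.

```bash
#!/usr/bin/env bash
# Sketch: compare one server setting against the value the team expects.
# check_setting is a hypothetical helper name.
check_setting() {
    local name="$1" expected="$2" actual
    # -X skips ~/.psqlrc; -A -t print the bare value without headers or alignment.
    actual="$(psql -X -A -t -c "SHOW $name")" || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok   $name = $actual"
    else
        echo "FAIL $name = $actual (expected $expected)"
        return 1
    fi
}
```

Typical calls would be `check_setting server_encoding UTF8` and `check_setting timezone UTC`; the expected values are team decisions, not defaults guaranteed by PostgreSQL.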

### 2.5.2 Development Tools Setup

**Required CLI Tools:**

```bash
# PostgreSQL client utilities
which psql pg_dump pg_restore createdb dropdb pg_isready pg_ctl

# Connection testing
pg_isready -h localhost -p 5432 -U appuser

# Database creation
createdb -U appuser -E UTF8 -T template0 --locale=en_US.UTF-8 appdb

# Backup testing
pg_dump -U appuser -d appdb --schema-only --no-owner --no-privileges > schema.sql
```
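
Since `which` behaves differently across systems, a setup script can rely on the POSIX `command -v` instead. This sketch uses a hypothetical helper name `missing_tools` and prints only the tools that cannot be found on `PATH`.

```bash
#!/usr/bin/env bash
# Sketch: report which required client tools are not on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools psql pg_dump pg_restore createdb dropdb pg_isready` prints nothing when the client utilities are installed correctly.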

**Recommended GUI Tools (Optional):**
- **pgAdmin 4**: Official tool, comprehensive but heavy
- **DBeaver**: Universal database tool, good for multiple databases
- **DataGrip**: JetBrains IDE, excellent SQL completion and refactoring
- **TablePlus**: Lightweight, modern interface

**IDE Integration:**
- **VS Code**: PostgreSQL extension by Chris Kolkman
- **Vim**: `vim-dadbod` or `psql` integration
- **Emacs**: `sql-postgres` mode

### 2.5.3 Team Standardization

Create a `.psqlrc` file for consistent developer experience:

```sql
-- ~/.psqlrc
-- psql startup configuration

-- Enable timing for all queries
\timing on

-- Pretty output
\pset pager off
\pset format aligned
\pset border 2
\pset null '[NULL]'
\pset footer on

-- History configuration
\set HISTFILE ~/.psql_history-:DBNAME
\set HISTSIZE 10000
\set HISTCONTROL ignoredups

-- Prompt customization
-- Renders as: user@host:port/database=#
\set PROMPT1 '%[%033[1;32m%]%n@%[%033[1;36m%]%M:%>%[%033[0m%]/%/%R%# '

-- Require an explicit COMMIT for every change (optional, development only)
-- \set AUTOCOMMIT off

-- Load custom functions
-- \i ~/.psql_functions.sql
```
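
One way to distribute such a file is to keep it in the team repository and point `psql` at it through the standard `PSQLRC` environment variable. The following is a sketch under that assumption; the function name `install_team_psqlrc` and the default destination path are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: copy a shared startup file into place and print the export line
# to add to a shell profile. install_team_psqlrc and the default destination
# are hypothetical; PSQLRC itself is a standard psql environment variable.
install_team_psqlrc() {
    local src="$1"
    local dest="${2:-$HOME/.config/psql/team.psqlrc}"
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
    printf 'export PSQLRC="%s"\n' "$dest"
}
```

Note that when `PSQLRC` is set, `psql` reads that file instead of `~/.psqlrc`, so personal settings need to be merged into the shared file or loaded from it.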

---

## Chapter Summary

In this chapter, you learned:

1. **Installation Methods**: Package managers for OS integration, Docker for team consistency, cloud for production workloads
2. **Version Management**: Annual major releases, quarterly minor releases, 5-year support lifecycle, and the N-1 strategy
3. **Directory Structure**: `PGDATA` layout, configuration files, database files in `base/`, WAL in `pg_wal/`, and the purpose of each subdirectory
4. **Configuration Files**: `postgresql.conf` for server settings, `pg_hba.conf` for access control, `pg_ident.conf` for user mapping
5. **Environment Standards**: UTF-8 encoding, UTC timezone, SCRAM-SHA-256 authentication, and team-standardized `.psqlrc` configurations

**Key Takeaway**: Your development environment should mirror production constraints (version, encoding, authentication method, key configuration parameters) to prevent "works on my machine" issues and deployment surprises.

**Next**: In Chapter 3, we will master psql—the essential command-line interface for PostgreSQL—covering secure connection management, meta-commands for schema introspection, advanced scripting with variable substitution, and output formatting techniques that enable both interactive debugging and automation workflows.

