# Chapter 35: Connection Management and Pooling

PostgreSQL's process-per-connection architecture creates a fundamental tension: applications want many connections for concurrency, but database resources degrade with excessive process counts. This chapter explores the physics of PostgreSQL connections, the necessity of pooling, and production-hardened configurations that prevent the "too many clients" outages that plague growing systems.

---

## 35.1 The Connection Architecture Problem

### 35.1.1 Process-Per-Connection Model

Unlike threaded databases (MySQL, SQL Server), PostgreSQL spawns a separate OS process for every client connection.

```text
Client Connection → Postmaster forks → New Backend Process (postgres: user db ...)
```

**Resource Consumption Per Connection**:

```sql
-- List client backends (pg_stat_activity itself does not report memory)
SELECT 
    pid,
    usename,
    application_name,
    client_addr,
    state
FROM pg_stat_activity 
WHERE backend_type = 'client backend';

-- Memory held by the current backend's memory contexts (PostgreSQL 14+).
-- pg_backend_memory_contexts only describes the session that queries it.
SELECT pg_size_pretty(sum(used_bytes)) AS memory
FROM pg_backend_memory_contexts;

-- Typical memory per connection (work_mem allocations come on top of the base)
-- Base overhead: ~5-10 MB per process (RSS)
-- With work_mem = 4MB: Can grow to 20-50 MB per complex query
```

**The Math**:
- 100 connections: ~1 GB RAM baseline
- 1000 connections: ~10 GB RAM, heavy context switching
- 10,000 connections: System thrashing, OOM kills
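
The arithmetic behind these figures can be sketched in a few lines of Python. The 10 MB baseline is the upper end of the estimate above; the function name and the `busy_fraction` and `busy_mb` parameters are illustrative assumptions, not measurements.

```python
def estimate_backend_memory_mb(connections, base_mb=10, busy_fraction=0.0, busy_mb=50):
    """Rough RAM estimate for PostgreSQL backends, in MB.

    base_mb:       idle per-process overhead (upper end of the 5-10 MB range)
    busy_fraction: assumed share of backends running a complex query at once
    busy_mb:       assumed footprint of such a backend (upper end of 20-50 MB)
    """
    busy = int(connections * busy_fraction)
    idle = connections - busy
    return idle * base_mb + busy * busy_mb


for n in (100, 1000, 10000):
    print(n, "connections:", estimate_backend_memory_mb(n) / 1000, "GB baseline")
```

Raising `busy_fraction` shows how quickly the total grows once a share of the backends runs sorts or hash joins at the same time.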

### 35.1.2 The `max_connections` Limit

```sql
-- Check current and max connections
SELECT 
    current_setting('max_connections') as max_allowed,
    COUNT(*) FILTER (WHERE state = 'active') as active,
    COUNT(*) FILTER (WHERE state = 'idle') as idle,
    COUNT(*) FILTER (WHERE state = 'idle in transaction') as idle_in_tx,
    COUNT(*) as total_current
FROM pg_stat_activity;

-- Typical error when limit reached:
-- FATAL: sorry, too many clients already
-- connection failed: connection to server at "..." failed: FATAL: remaining connection slots are reserved for non-replication superuser connections
```

**Hard Limits**:
- Default: 100 connections
- Practical production maximum: 500-1000 (depends on RAM and workload)
- Beyond 1000: Requires connection pooling (PgBouncer) or connection throttling

**Reserved Connections**:
```ini
# postgresql.conf
max_connections = 200
superuser_reserved_connections = 3  # For emergency admin access when at limit
```

### 35.1.3 Connection Latency and Throughput

Each connection creation requires:
1. TCP handshake (1 RTT)
2. TLS negotiation (1-2 RTT depending on the TLS version, after PostgreSQL's own SSLRequest exchange)
3. Fork new process (OS context switch)
4. Authentication check (password hash or cert validation)
5. Memory allocation for process

**Total**: 5-20ms per connection on modern hardware; unacceptable for high-frequency microservices.

---

## 35.2 Connection Pooling Fundamentals

### 35.2.1 Pooling Concepts

Connection pooling maintains a persistent set of database connections, recycling them between application requests instead of creating/destroying per request.

**Without Pooling**:
```
Request 1: Connect → Query → Disconnect (20ms overhead + query time)
Request 2: Connect → Query → Disconnect
```

**With Pooling**:
```
Startup: Create 10 connections (200ms once)
Request 1: Borrow → Query → Return (0ms overhead)
Request 2: Borrow → Query → Return
```
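
The borrow/return cycle can be illustrated with a minimal in-process pool. This is a sketch only: `FakeConnection` and `SimplePool` are hypothetical stand-ins, and a real application would use its driver's pool rather than this class.

```python
import queue


class FakeConnection:
    """Stand-in for a real database connection (hypothetical, for illustration)."""
    opened = 0  # counts how many "expensive" connects have happened

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, sql):
        return f"result of {sql}"


class SimplePool:
    """Fixed-size pool: connections are created once and then recycled."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())  # pay the connect cost at startup

    def borrow(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout
        return self._idle.get(timeout=timeout)

    def give_back(self, conn):
        self._idle.put(conn)


pool = SimplePool(size=10)
for i in range(1000):
    conn = pool.borrow()
    try:
        conn.query(f"SELECT {i}")
    finally:
        pool.give_back(conn)  # always return, even if the query raised

print(FakeConnection.opened)  # 10 connects served 1000 requests
```

The `try`/`finally` around each borrow matters in practice: a connection that is never returned shrinks the pool permanently.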

### 35.2.2 Application vs. External Pooling

**Level 1: Application Pool** (HikariCP, SQLAlchemy pool, node-postgres pool)
- Lives in application memory
- Fastest (no network hop)
- Limited to single application instance
- Configuration per instance

**Level 2: Middleware Pool** (PgBouncer, PgPool-II)
- Separate process/server between apps and database
- Shared across multiple application instances
- Survives application restarts
- Centralized control and monitoring

**Level 3: Database Built-in** (PostgreSQL ships no pooler; the `libpq` pipeline mode added in PostgreSQL 14 is sometimes mistaken for one)
- Not true pooling, just query pipelining over a single connection

**Industry Standard**: Application pool + PgBouncer for defense in depth.

---

## 35.3 PgBouncer Architecture and Modes

PgBouncer is the de facto standard PostgreSQL connection pooler—lightweight, efficient, and proven at massive scale (handling 10,000+ application connections with 100 database connections).

### 35.3.1 Installation and Basic Configuration

```ini
; /etc/pgbouncer/pgbouncer.ini
; PgBouncer treats ";" or "#" as a comment only at the start of a line,
; so every comment below sits on its own line.

[databases]
; Database name mapping
production = host=localhost port=5432 dbname=production
analytics = host=analytics.internal port=5432 dbname=warehouse
legacy = host=old-db.internal port=5432 dbname=legacy_db

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

; Pool size configuration (critical)
; pool_mode: session | transaction | statement
pool_mode = transaction
; Client (application) connections PgBouncer accepts (high)
max_client_conn = 10000
; Actual database connections per user/db pair
default_pool_size = 20
; Keep warm connections ready
min_pool_size = 10
; Emergency overflow, used once a client has waited reserve_pool_timeout seconds
reserve_pool_size = 5
reserve_pool_timeout = 5

; Connection limits (safety)
; Recycle server connections hourly
server_lifetime = 3600
; Close idle server connections after 10 minutes
server_idle_timeout = 600
; Fail fast if the DB is unreachable
server_connect_timeout = 15
; Seconds to wait before retrying a failed server login
server_login_retry = 15

; Client timeouts (0 = disabled)
; Seconds a client has to authenticate
client_login_timeout = 60
; Close clients idle in a transaction for this long (use with caution)
idle_transaction_timeout = 0
; Max query duration
query_timeout = 0
; Max seconds a client waits for a pool slot
query_wait_timeout = 120

; Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
```

**User List File** (`/etc/pgbouncer/userlist.txt`):
```text
"app_user" "SCRAM-SHA-256$4096:..."
"readonly" "SCRAM-SHA-256$4096:..."
```
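
For readers curious about what the `SCRAM-SHA-256$4096:...` value contains, the sketch below derives a verifier in PostgreSQL's storage format with the standard library. It assumes an ASCII password (other passwords need SASLprep normalization first), and the function names are hypothetical. In practice, copying `rolpassword` from `pg_authid` is usually preferable, because PgBouncer can only reuse a SCRAM secret for logging in to the server when it is identical to the one PostgreSQL stores.

```python
import base64
import hashlib
import hmac
import os


def scram_sha256_verifier(password, salt=None, iterations=4096):
    """Build a SCRAM-SHA-256 verifier string (assumes an ASCII password)."""
    salt = salt if salt is not None else os.urandom(16)
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()

    def b64(raw):
        return base64.b64encode(raw).decode("ascii")

    # Layout: SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>
    return f"SCRAM-SHA-256${iterations}:{b64(salt)}${b64(stored_key)}:{b64(server_key)}"


def userlist_line(username, password):
    """Format one userlist.txt entry (username assumed to contain no quotes)."""
    return f'"{username}" "{scram_sha256_verifier(password)}"'


print(userlist_line("app_user", "example-password"))
```

Because the salt is random, each call produces a different but equally valid verifier for the same password.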

Or use `auth_query` to avoid duplicating passwords:
```ini
auth_type = scram-sha-256
auth_user = pgbouncer_auth
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename=$1
```

### 35.3.2 Pool Mode: Session (Default)

**Behavior**: Connection assigned to client until client disconnects.

```
Client A Connects → Server Connection 1 assigned → Client A Disconnects → Server Connection 1 released
Client B Connects → Server Connection 1 (or new) assigned...
```

**Characteristics**:
- Simplest mental model
- Supports prepared statements, temporary tables, SET commands, LISTEN/NOTIFY
- Poor resource utilization if clients hold connections while idle

**Use Case**: Applications using prepared statements extensively, or when `SET` session variables required.

**Drawback**:
```python
# Flask/Django keeping connection open during HTTP request
# (`app` and a psycopg2 `pool` are assumed to be defined elsewhere)
@app.route('/api/data')
def get_data():
    conn = pool.getconn()  # Holds DB connection for the entire request
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT ...")
            result = cur.fetchall()
        return jsonify(result)
    finally:
        pool.putconn(conn)  # Connection released only when the request ends
# If HTTP request takes 200ms, DB connection held 200ms
# With 1000 concurrent HTTP requests, need 1000 DB connections
```

### 35.3.3 Pool Mode: Transaction (Production Standard)

**Behavior**: Connection assigned only for duration of transaction; returned to pool on COMMIT/ROLLBACK.

```
BEGIN → Server Connection assigned → Query → COMMIT → Server Connection released
Next query → Different server connection may be used
```

**Characteristics**:
- Maximum efficiency: 100 DB connections can serve 10,000 application connections
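- Prepared statements created with SQL `PREPARE` do not survive across transactions; protocol-level prepared statements work only on PgBouncer 1.21+ with `max_prepared_statements` set above 0
- Temporary tables are not reliably reachable after the transaction that created them (create them with `ON COMMIT DROP` and use them within one transaction)
- `SET` commands don't persist across transactions (use `SET LOCAL` inside a transaction instead)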

**Configuration**:
```ini
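pool_mode = transaction
; server_reset_query runs only for session-mode pools unless
; server_reset_query_always = 1, so transaction-mode clients must not
; rely on session state being cleaned up for them
server_reset_query = DISCARD ALL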
```

**Application Adaptation Required**:
```python
# Bad: Assumes prepared statement persists
# (in autocommit mode each execute() is its own transaction)
cur.execute("PREPARE get_user AS SELECT * FROM users WHERE id = $1")
cur.execute("EXECUTE get_user(123)")  # Works in session mode
cur.execute("EXECUTE get_user(456)")  # May fail in transaction mode (different server connection)

# Good: Use parameterized queries (psycopg2 binds parameters client-side, no server-side PREPARE)
cur.execute("SELECT * FROM users WHERE id = %s", (123,))  # Safe for transaction mode
```

**Why This Is The Standard**:
- Most web applications use connection-per-request
- Transactions are short (milliseconds)
- Allows massive frontend connection scaling with modest DB resources
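
A back-of-the-envelope way to see why short transactions need so few server connections is Little's Law: average connections in use equals the transaction arrival rate multiplied by the average transaction duration. The sketch below applies it; the traffic figures and the `headroom` factor are illustrative assumptions.

```python
import math


def server_connections_needed(tx_per_second, avg_tx_ms, headroom=2.0):
    """Estimate the pool size via Little's Law (L = lambda * W), padded for bursts."""
    in_use = tx_per_second * avg_tx_ms / 1000
    return math.ceil(in_use * headroom)


# Transaction mode: 2,000 transactions/s at 5 ms each keep about 10 server
# connections busy on average; doubling for bursts suggests a pool of 20.
print(server_connections_needed(2000, 5))

# Session mode: the same traffic with each client holding its connection for a
# whole 200 ms request occupies about 400 connections on average.
print(server_connections_needed(2000, 200, headroom=1.0))
```

The estimate covers average load only; long-running transactions or traffic spikes call for measuring `cl_waiting` rather than trusting the formula.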

### 35.3.4 Pool Mode: Statement (Rarely Used)

**Behavior**: Connection returned to pool after every single statement.

```
Query 1 → Server Connection 1 → Return
Query 2 → Server Connection 2 (possibly different) → Return
```

**Characteristics**:
- Maximum churn
- No multi-statement transactions possible
- Breaks any session state

**Use Case**: Sharding proxies where each statement may route to different backend.

---

## 35.4 Advanced PgBouncer Patterns

### 35.4.1 Multiple Pool Sizes for Different Workloads

```ini
[databases]
; Production with different pool sizes for different user types
production = host=localhost port=5432 dbname=production pool_size=50
production_reporting = host=localhost port=5432 dbname=production pool_size=5
production_admin = host=localhost port=5432 dbname=production pool_size=2

[pgbouncer]
; Per-user overrides (for example pool_mode or max_user_connections) go in a [users] section, not in auth_file
```

**Connection String Routing**:
```python
import psycopg2

# Application routes to different pools based on operation type
READ_WRITE_URL = "postgresql://app_user@pgbouncer:6432/production"
REPORTING_URL = "postgresql://report_user@pgbouncer:6432/production_reporting"

def get_user(user_id):
    conn = psycopg2.connect(READ_WRITE_URL)  # Gets from 50-connection pool
    ...

def generate_report():
    conn = psycopg2.connect(REPORTING_URL)    # Gets from 5-connection pool
    # Long-running query can't starve main app
```

### 35.4.2 Handling Prepared Statements in Transaction Mode

Since prepared statements are session-bound and transaction mode rotates connections:

**Option 1**: Disable prepared statements in driver
```python
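from sqlalchemy import create_engine
import psycopg2

# SQLAlchemy with the psycopg (version 3) driver, e.g. a postgresql+psycopg:// URL
engine = create_engine(
    DATABASE_URL,
    connect_args={'prepare_threshold': None}  # Never use prepared statements
)

# Psycopg2 needs no change: it binds parameters client-side and never
# creates server-side prepared statements on its own
conn = psycopg2.connect(DATABASE_URL)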
```

**Option 2**: Use `server_reset_query` carefully
```ini
; server_reset_query runs when a server connection is released by a session-mode client
pool_mode = transaction
; Clears temp tables, prepared statements, and SET values
server_reset_query = DISCARD ALL
; 0 (default): run the reset query only for session-mode pools
; 1: also run it in transaction mode, adding a round trip per transaction
server_reset_query_always = 0
```

**Option 3**: Application-level statement cache
```python
# Cache SQL text in application memory; the server still plans each execution
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_user_cached_query():
    return "SELECT * FROM users WHERE id = %s"
```

### 35.4.3 Failover and High Availability

PgBouncer itself becomes a single point of failure. Solutions:

**Option 1**: Multiple PgBouncer instances behind HAProxy/NLB
```text
Apps → HAProxy (round-robin) → [PgBouncer-1, PgBouncer-2] → PostgreSQL
```

**Option 2**: Sidecar pattern (PgBouncer on same host as application)
```text
App Server 1: [App + PgBouncer] → PostgreSQL
App Server 2: [App + PgBouncer] → PostgreSQL
```
- No network hop to pooler
- No central bottleneck
- Harder to manage configuration consistency

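**Option 3**: DNS-based failover of the PostgreSQL host: PgBouncer resolves the host name again for new server connections, so `server_lifetime` bounds how long connections to the old address linger, and `server_round_robin` spreads load when several addresses sit behind one name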

---

## 35.5 Timeout Configuration and Safety

### 35.5.1 Client-Side Timeouts

Prevent applications from hanging indefinitely:

```ini
[pgbouncer]
; Seconds a client connection may sit idle before PgBouncer closes it
; 0 = never (let the app control); 600 would disconnect idle apps after 10 minutes
client_idle_timeout = 0

; Seconds a client has to complete authentication
client_login_timeout = 60

; Transaction duration limits (nuclear option); 0 = disabled
; Prefer statement_timeout in PostgreSQL
query_timeout = 0
; Prefer PostgreSQL's idle_in_transaction_session_timeout
idle_transaction_timeout = 0
```

### 35.5.2 Server-Side Timeouts

Prevent pool exhaustion from slow queries:

```ini
; If a query takes longer than this many seconds, the connection is closed (use with caution)
; Keep PostgreSQL's statement_timeout slightly lower so the server cancels first
query_timeout = 300

; Seconds to wait for a connection to PostgreSQL to establish (fail fast if DB is down)
server_connect_timeout = 15

; Seconds to keep a server connection in the pool before recycling it (hourly here)
server_lifetime = 3600

; Close server connections unused for 10 minutes
server_idle_timeout = 600
```

### 35.5.3 PostgreSQL Coordination

Set corresponding timeouts in PostgreSQL to match PgBouncer:

```ini
# postgresql.conf
statement_timeout = 290s         # Just under PgBouncer's query_timeout (300) so PostgreSQL cancels first
idle_in_transaction_session_timeout = 10min  # Kill idle transactions
tcp_keepalives_idle = 60         # Detect dead connections
tcp_keepalives_interval = 10
tcp_keepalives_count = 6
```
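
One way to keep the two layers consistent is to check the settings against each other before deploying. The sketch below is an illustration under stated assumptions: the function name and dictionary layout are hypothetical, all values are in seconds, and 0 means disabled. The ordering rule follows PgBouncer's own advice to pair `query_timeout` with a slightly smaller `statement_timeout`; the example values deliberately set both to 300 seconds to trigger the warning.

```python
def check_timeouts(pg, bouncer):
    """Return (warnings, dead_peer_seconds) for the given settings, in seconds."""
    warnings = []

    # If query_timeout fires first, PgBouncer drops the connection before
    # PostgreSQL gets the chance to cancel just the statement.
    if bouncer["query_timeout"] > 0 and (
        pg["statement_timeout"] == 0
        or pg["statement_timeout"] >= bouncer["query_timeout"]
    ):
        warnings.append("statement_timeout should be set and lower than query_timeout")

    # A dead peer is detected after the idle period plus all failed probes.
    dead_peer_seconds = (
        pg["tcp_keepalives_idle"]
        + pg["tcp_keepalives_interval"] * pg["tcp_keepalives_count"]
    )
    return warnings, dead_peer_seconds


pg = {
    "statement_timeout": 300,
    "tcp_keepalives_idle": 60,
    "tcp_keepalives_interval": 10,
    "tcp_keepalives_count": 6,
}
bouncer = {"query_timeout": 300}

warnings, dead_peer_seconds = check_timeouts(pg, bouncer)
print(warnings)
print("dead connection detected after", dead_peer_seconds, "seconds")
```

With the keepalive values shown in `postgresql.conf`, a vanished client is noticed after 60 + 10 × 6 = 120 seconds.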

---

## 35.6 Monitoring and Troubleshooting

### 35.6.1 PgBouncer Statistics

```bash
# Connect to PgBouncer admin console
psql -p 6432 pgbouncer -c "SHOW STATS"

# Key metrics:
# - total_xact_count: Transactions processed
# - total_query_count: Queries processed  
# - total_received: Bytes from clients
# - total_sent: Bytes to clients
# - total_wait_time: Microseconds clients spent waiting for pool slots (cumulative; should grow slowly)

# Pool status
psql -p 6432 pgbouncer -c "SHOW POOLS"

# Columns:
# - cl_active: Client connections linked to server connection
# - cl_waiting: Clients waiting for free connection (BAD if > 0)
# - sv_active: Server connections in use
# - sv_idle: Server connections idle in pool
# - sv_used: Server connections idle longer than server_check_delay (need server_check_query before reuse)
# - sv_tested: Server connections running server_reset_query or server_check_query
# - sv_login: Server connections in process of logging in
```

**Alerting Thresholds**:
- `cl_waiting` > 10: Pool exhaustion imminent
- `sv_idle` = 0 while `cl_waiting` > 0: Pool is saturated; increase `default_pool_size` or shorten transactions
- `avg_wait_time` rising between `SHOW STATS` samples: Transactions are arriving faster than the pool can serve them
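
A check along these lines could be scripted against the rows returned by `SHOW POOLS`. The sketch below works on plain dictionaries keyed by the real column names; the function name, the exact rules, and the sample rows are assumptions for illustration. Fetching the rows is left out: the admin console does not accept transactions, so a driver connection would need autocommit enabled.

```python
def pool_alerts(row, waiting_threshold=10):
    """Evaluate one SHOW POOLS row (as a dict) and return a list of alerts."""
    alerts = []
    if row["cl_waiting"] > waiting_threshold:
        alerts.append("pool exhaustion imminent")
    elif row["cl_waiting"] > 0:
        alerts.append("clients are queuing")
    if row["sv_idle"] == 0 and row["cl_waiting"] > 0:
        alerts.append("no idle server connections left")
    return alerts


# Hypothetical sample rows, not real measurements
rows = [
    {"database": "production", "cl_active": 180, "cl_waiting": 25, "sv_active": 20, "sv_idle": 0},
    {"database": "production_reporting", "cl_active": 2, "cl_waiting": 0, "sv_active": 2, "sv_idle": 3},
]
for row in rows:
    print(row["database"], pool_alerts(row))
```

Sampling repeatedly and alerting only on sustained queuing avoids paging on a single momentary spike.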

### 35.6.2 Common Failure Modes

**"Too many connections" despite PgBouncer**:
```ini
; Check max_client_conn limit: it must accommodate all app instances
; If 100 app servers × 100 connections each = 10,000 needed
max_client_conn = 10000
```
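
The same sizing arithmetic applies on both sides of PgBouncer, and a quick check can catch mismatches before they surface as errors. This is a minimal sketch: the function and parameter names are hypothetical, `pool_count` stands for the number of distinct user/database pairs, and `reserve_pool_size` is ignored for simplicity.

```python
def check_capacity(app_servers, conns_per_server, max_client_conn,
                   pool_count, pool_size, pg_max_connections, pg_reserved=3):
    """Compare connection demand against PgBouncer and PostgreSQL limits."""
    clients_needed = app_servers * conns_per_server
    servers_needed = pool_count * pool_size
    pg_available = pg_max_connections - pg_reserved  # slots open to regular roles
    return {
        "clients_needed": clients_needed,
        "client_side_ok": clients_needed <= max_client_conn,
        "servers_needed": servers_needed,
        "server_side_ok": servers_needed <= pg_available,
    }


report = check_capacity(
    app_servers=100, conns_per_server=100, max_client_conn=10000,
    pool_count=3, pool_size=20, pg_max_connections=200,
)
print(report)
```

A failing `server_side_ok` means PostgreSQL itself can still reject connections even though PgBouncer accepts every client.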

**Prepared statement errors in transaction mode**:
```
ERROR: prepared statement "pgsql_123" does not exist
```
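- Solution: Disable server-side prepared statements in the driver (for example `prepare_threshold=None` in psycopg 3 or `prepareThreshold=0` in pgJDBC), enable `max_prepared_statements` on PgBouncer 1.21+, or switch to session mode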

**"Server conn crashed?"**:
- PostgreSQL restart killed backend
- PgBouncer will reconnect automatically, but in-flight transactions lost

**Memory leak in long-running connections**:
```ini
; Mitigation: Aggressive connection recycling (every 5 minutes)
server_lifetime = 300
```

---

## 35.7 Alternative Poolers and Built-in Options

### 35.7.1 PgPool-II (Feature-rich but Complex)

PgPool-II provides pooling plus:
- Query caching (problematic with invalidation)
- Read/write splitting (sends SELECTs to standbys)
- Load balancing
- Connection limits

**Tradeoff**: More features but higher latency, more complex configuration, single point of failure without complex clustering.

### 35.7.2 Built-in Connection Pooling (PostgreSQL 14+)

PostgreSQL itself ships no connection pooler. The `libpq` pipeline mode added in PostgreSQL 14 lets a client send multiple queries over a single connection without waiting for each result, but this is a client-side optimization, not true pooling.

### 35.7.3 Cloud Provider Poolers

- **Amazon RDS Proxy**: Managed pooling for RDS/Aurora
- **Google Cloud SQL Proxy**: Connection handling, not pooling (use PgBouncer alongside)
- **Azure Database for PostgreSQL Flexible Server**: Offers an optional built-in PgBouncer

---

## Chapter Summary

In this chapter, you learned:

1. **Connection Architecture**: PostgreSQL's process-per-connection model consumes ~5-10 MB RAM per backend plus `work_mem` allocations. The practical limit is 500-1000 connections; beyond this requires pooling to prevent memory exhaustion and context-switching overhead.

2. **Pooling Modes**: 
   - **Session mode** maintains connection state (prepared statements, temp tables) but offers poor concurrency.
   - **Transaction mode** (industry standard) assigns connections only for transaction duration, allowing 10,000+ application connections to share 100 database connections. Session state (SQL-level prepared statements, `SET` values, temp tables) does not carry across transactions, and `server_reset_query` is not run in this mode unless `server_reset_query_always = 1`.
   - **Statement mode** rotates per query—rarely used except for sharding proxies.

3. **PgBouncer Configuration**: Deploy between applications and database with `pool_mode = transaction`. Set `max_client_conn` high (10,000+) to accommodate all application instances, `default_pool_size` modest (20-50) based on database capacity, and `reserve_pool_size` for burst handling. Use `server_lifetime` to recycle connections and prevent memory fragmentation.

4. **Safety Timeouts**: Coordinate `statement_timeout` in PostgreSQL with `query_timeout` in PgBouncer. Set `server_connect_timeout` low for fast failure detection. Use `client_idle_timeout` to reap dead application connections but ensure it exceeds application-side pool idle timeout.

5. **Operational Patterns**: Use separate database aliases in PgBouncer (`production`, `production_reporting`) to isolate pool resources between OLTP and analytical workloads. Monitor `cl_waiting` in `SHOW POOLS`: a sustained value above zero indicates pool exhaustion requiring size increases or query optimization.

**Next**: In Chapter 36, we will cover Scaling Strategies—exploring vertical scaling limits, read replica scaling with load balancing, connection pooling at scale, sharding fundamentals and tradeoffs, and caching layer integration with consistency considerations.

