SQLake Security Model

This document describes the security architecture, threat model, and mitigation strategies implemented in SQLake. It covers SQL injection prevention, R2 path traversal prevention, input validation at all boundaries, and the trust model for migrations.

Threat Model

SQLake operates in a Cloudflare Workers environment with the following trust boundaries:

Untrusted: User-supplied query parameters (values in sql tagged template literals)
Semi-trusted: Schema configuration (table names, column names) -- defined by developers but validated at runtime
Trusted (with verification): Migration manifests from R2 -- signed with SHA-256 checksums
Trusted: Internal CDC events (generated by SQLite triggers, never user-controlled)

SQL Injection Prevention

1. Parameterized Queries (Primary Defense)

The sql tagged template literal automatically converts interpolated values to parameterized placeholders. This is the primary defense against SQL injection:

// Safe -- values are parameterized via ? placeholders
const user = await sql`SELECT * FROM users WHERE id = ${userId}`.first()
// Generated: SELECT * FROM users WHERE id = ?
// Parameters: [userId]

Implementation in src/utils/build-sql.ts:

Template literal strings are concatenated as-is (they are developer-authored code)
Interpolated values become ? placeholders with values pushed to a params array
The only exception is TableRef objects, which are validated identifiers (see below)

2. Identifier Validation (Defense-in-Depth)

SQL identifiers (table names, column names) cannot be parameterized. SQLake validates all identifiers against a strict allowlist pattern before interpolation.

Pattern: ^[a-zA-Z_][a-zA-Z0-9_]*$

This rejects:

SQL keywords used as injection vectors (users; DROP TABLE)
Quote escaping attempts (users' OR '1'='1)
Empty strings, strings starting with numbers
Hyphens, dots, spaces, or any special characters

Validation call sites (all paths that interpolate identifiers into SQL):

Location	What is validated
`src/utils/build-sql.ts` `buildSQL()`	TableRef names in template literals
`src/query/builders/sql-builders.ts` `quoteIdentifier()`	All table/column names in ORM builders
`src/query/builders/sql-builders.ts` `buildWhereClause()`	WHERE column names
`src/query/builders/sql-builders.ts` `buildSetClause()`	SET column names
`src/query/builders/sql-builders.ts` `buildOnConflictClause()`	ON CONFLICT target columns
`src/query/builders/sql-builders.ts` `buildReturningClause()`	RETURNING column names
`src/query/builders/sql-builders.ts` `buildInsertSQL()`	Table name, column names
`src/query/builders/sql-builders.ts` `buildUpdateSQL()`	Table name
`src/query/builders/sql-builders.ts` `buildDeleteSQL()`	Table name
`src/do/cdc-manager.ts` `_createTableTriggers()`	Table name, shard key column
`src/do/cdc-manager.ts` `_jsonObjectArgs()`	All column names in json_object()
`src/utils/cdc.ts` `createCDCBufferTable()`	Buffer table name
`src/utils/cdc.ts` `createCDCTriggers()`	Table name, column names, row ID column, trigger prefix, buffer table name
`src/utils/cdc.ts` `dropCDCTriggers()`	Table name, trigger prefix

3. ORM Query Builders (Additional Safety)

The ORM-style builders (sql.insert, sql.update, sql.delete) use quoteIdentifier() which both validates AND double-quote-escapes identifiers:

export function quoteIdentifier(name: string, context?: string): string {
  validateSqlIdentifier(name, context)    // Rejects unsafe chars
  const escaped = name.replace(/"/g, '""') // Escape existing quotes
  return `"${escaped}"`                     // Wrap in double quotes
}

This provides defense-in-depth: even if the regex were to have a bypass, the quoting would prevent injection in most cases.

4. LIMIT Clause Parameterization

LIMIT values in UPDATE and DELETE builders are parameterized (not interpolated) to prevent injection through numeric values:

// In buildUpdateSQL and buildDeleteSQL:
sql += ` LIMIT ?`
params.push(limit)

Additionally, LIMIT values are validated as non-negative integers before use.

5. Delete Safety

The sql.delete() builder requires a WHERE clause by design. Attempting to execute a delete without a WHERE clause throws an error:

await sql.delete(db.users)               // Throws: "Delete requires a WHERE clause"
await sql.delete(db.users).where({ id }) // OK

R2 Path Security

Path Traversal Prevention

All R2 object key paths that include user-controlled or semi-trusted values use encodeURIComponent() to prevent path traversal attacks:

Function	Encoding
`src/r2/upload.ts` `generateCDCPath()`	`encodeURIComponent(shardId)`
`src/parquet/cdc-writer.ts` `cdcParquetPath()`	`encodeURIComponent(tableName)`, `encodeURIComponent(shardId)`

This prevents attacks such as:

shardId = "../../secrets" which would become ..%2F..%2Fsecrets
tableName = "../admin" which would become ..%2Fadmin

Fixed Path Structure

R2 paths follow deterministic structures that limit what can be written:

CDC data: _cdc/{encodedTable}/year={YYYY}/month={MM}/day={DD}/hour={HH}/{encodedShard}-{timestamp}.parquet
Upload data: data/{table}/_shard={encodedShardId}/cdc_{timestamp}_{seq}.parquet
Iceberg metadata: {tableLocation}/metadata/v{sequenceNumber}.metadata.json
Migration manifest: migrations/manifest.json (fixed path)

CDC Trigger Security

CDC triggers are SQL statements generated at runtime that execute automatically on INSERT, UPDATE, and DELETE. Because they contain dynamic identifiers, they require special validation.

Validation Chain

Table names from schema.tables keys are validated via validateSqlIdentifier()
Shard key column names from schema.getShardKey() are validated
All column names from table definitions are validated
Trigger prefix and buffer table name are validated

Trust Model

CDC trigger SQL is constructed server-side in the Durable Object using schema definitions that are developer-authored (not user-supplied). The validation provides defense-in-depth against:

Compromised or malformed schema objects
Future refactoring that might introduce untrusted input

Migration Security

Trust Model

Migration SQL is intentionally executed without parameterization because migrations contain DDL statements (CREATE TABLE, ALTER TABLE, etc.) that cannot be parameterized. The security model relies on:

Checksum verification: Each migration has a SHA-256 checksum verified before execution
Manifest integrity: The overall manifest has its own SHA-256 checksum
R2 source trust: Manifests are fetched from a configured R2 bucket (not user-supplied URLs)
Transaction safety: Each migration runs inside transactionSync() for atomicity

Risk: Semicolon Splitting

Migration SQL is split by semicolons to execute individual statements. This is a known limitation -- SQL strings containing semicolons in string literals could be incorrectly split. This is acceptable because:

Migrations are developer-authored, not user-supplied
The checksum ensures the migration content has not been tampered with

Input Validation at Boundaries

JSON Validation

All JSON parsed from external sources uses the runtime validation library (src/validation/index.ts):

Data Source	Validator
Migration manifest from R2	`parseMigrationManifest()`
Migration state from DB	`parseMigrationState()`
Iceberg manifest from R2	`parseIcebergManifest()`
Catalog state from R2	`parseCatalogState()`
Query request from HTTP	`parseQueryRequest()`
Local dev manifest	`parseLocalManifest()`
CDC buffer events	`parseCDCBufferEvent()`

Each validator returns either a typed result or a ValidationError -- they never throw, allowing callers to handle errors gracefully.

Configuration Validation

Runtime configuration is validated by src/validation/config.ts:

validateSQLakeConfig() validates storage mode, CDC flag, binding names
validateSQLakeDOOptions() validates schema structure, flush thresholds
validateCompactionOptions() validates file sizes, counts, ages

Variant Encoding Limits

The Parquet Variant encoder (src/parquet/cdc-writer.ts) enforces limits to prevent unbounded memory growth from malicious or deeply nested data:

Maximum dictionary size (unique field names)
Maximum recursion depth for nested objects/arrays

Exceeding these limits throws a VariantEncodingLimitError.

Security Audit Summary

Audited Query Paths

All paths that construct SQL have been verified to use either parameterized queries or validated identifiers:

sql template literal -- parameterized values, validated TableRef names
ORM builders (insert/update/delete) -- quoteIdentifier() for all identifiers, ? for all values
CDC triggers -- validateSqlIdentifier() on all interpolated identifiers
CDC utility functions -- validateSqlIdentifier() on table, column, trigger, and buffer names
Migration system -- trusted SQL with checksum verification
Internal queries (CDC buffer reads, dead-letter operations) -- parameterized with ?

Audited R2 Paths

All R2 key construction sites have been verified:

generateCDCPath() -- encodeURIComponent(shardId)
cdcParquetPath() -- encodeURIComponent(tableName), encodeURIComponent(shardId)
Iceberg paths -- constructed from tableLocation (developer config) + sequence numbers
Migration manifest -- fixed path migrations/manifest.json

Known Accepted Risks

Migration SQL execution: Raw SQL execution is by design, mitigated by checksum verification
Semicolon splitting in migrations: Could misparse SQL with semicolons in strings, but migrations are trusted code
Schema-defined identifiers: Table/column names from developer schema are validated but ultimately developer-controlled

Best Practices

Use the sql tagged template for all queries -- never construct SQL strings manually
Use ${db.tableName} references -- never interpolate string table names
Prefer ORM builders for INSERT/UPDATE/DELETE operations
Validate all external JSON using the validation library before use
Never disable checksum verification for migrations in production
Review R2 key construction when adding new upload paths

Reporting Security Issues

If you discover a security vulnerability in SQLake, please report it responsibly through the appropriate channel. Do not open a public issue for security vulnerabilities.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security

docs/security.md

SQLake Security Model

Threat Model

SQL Injection Prevention

1. Parameterized Queries (Primary Defense)

2. Identifier Validation (Defense-in-Depth)

3. ORM Query Builders (Additional Safety)

4. LIMIT Clause Parameterization

5. Delete Safety

R2 Path Security

Path Traversal Prevention

Fixed Path Structure

CDC Trigger Security

Validation Chain

Trust Model

Migration Security

Trust Model

Risk: Semicolon Splitting

Input Validation at Boundaries

JSON Validation

Configuration Validation

Variant Encoding Limits

Security Audit Summary

Audited Query Paths

Audited R2 Paths

Known Accepted Risks

Best Practices

Reporting Security Issues

There aren’t any published security advisories

Security: dot-do/sqlake

Security

docs/security.md

SQLake Security Model

Threat Model

SQL Injection Prevention

1. Parameterized Queries (Primary Defense)

2. Identifier Validation (Defense-in-Depth)

3. ORM Query Builders (Additional Safety)

4. LIMIT Clause Parameterization

5. Delete Safety

R2 Path Security

Path Traversal Prevention

Fixed Path Structure

CDC Trigger Security

Validation Chain

Trust Model

Migration Security

Trust Model

Risk: Semicolon Splitting

Input Validation at Boundaries

JSON Validation

Configuration Validation

Variant Encoding Limits

Security Audit Summary

Audited Query Paths

Audited R2 Paths

Known Accepted Risks

Best Practices

Reporting Security Issues

There aren’t any published security advisories