Skip to content

Security: dot-do/sqlake

Security

docs/security.md

SQLake Security Model

This document describes the security architecture, threat model, and mitigation strategies implemented in SQLake. It covers SQL injection prevention, R2 path traversal prevention, input validation at all boundaries, and the trust model for migrations.

Threat Model

SQLake operates in a Cloudflare Workers environment with the following trust boundaries:

  1. Untrusted: User-supplied query parameters (values in sql tagged template literals)
  2. Semi-trusted: Schema configuration (table names, column names) -- defined by developers but validated at runtime
  3. Trusted (with verification): Migration manifests from R2 -- signed with SHA-256 checksums
  4. Trusted: Internal CDC events (generated by SQLite triggers, never user-controlled)

SQL Injection Prevention

1. Parameterized Queries (Primary Defense)

The sql tagged template literal automatically converts interpolated values to parameterized placeholders. This is the primary defense against SQL injection:

// Safe -- values are parameterized via ? placeholders
const user = await sql`SELECT * FROM users WHERE id = ${userId}`.first()
// Generated: SELECT * FROM users WHERE id = ?
// Parameters: [userId]

Implementation in src/utils/build-sql.ts:

  • Template literal strings are concatenated as-is (they are developer-authored code)
  • Interpolated values become ? placeholders with values pushed to a params array
  • The only exception is TableRef objects, which are validated identifiers (see below)

2. Identifier Validation (Defense-in-Depth)

SQL identifiers (table names, column names) cannot be parameterized. SQLake validates all identifiers against a strict allowlist pattern before interpolation.

Pattern: ^[a-zA-Z_][a-zA-Z0-9_]*$

This rejects:

  • SQL keywords used as injection vectors (users; DROP TABLE)
  • Quote escaping attempts (users' OR '1'='1)
  • Empty strings, strings starting with numbers
  • Hyphens, dots, spaces, or any special characters

Validation call sites (all paths that interpolate identifiers into SQL):

Location What is validated
src/utils/build-sql.ts buildSQL() TableRef names in template literals
src/query/builders/sql-builders.ts quoteIdentifier() All table/column names in ORM builders
src/query/builders/sql-builders.ts buildWhereClause() WHERE column names
src/query/builders/sql-builders.ts buildSetClause() SET column names
src/query/builders/sql-builders.ts buildOnConflictClause() ON CONFLICT target columns
src/query/builders/sql-builders.ts buildReturningClause() RETURNING column names
src/query/builders/sql-builders.ts buildInsertSQL() Table name, column names
src/query/builders/sql-builders.ts buildUpdateSQL() Table name
src/query/builders/sql-builders.ts buildDeleteSQL() Table name
src/do/cdc-manager.ts _createTableTriggers() Table name, shard key column
src/do/cdc-manager.ts _jsonObjectArgs() All column names in json_object()
src/utils/cdc.ts createCDCBufferTable() Buffer table name
src/utils/cdc.ts createCDCTriggers() Table name, column names, row ID column, trigger prefix, buffer table name
src/utils/cdc.ts dropCDCTriggers() Table name, trigger prefix

3. ORM Query Builders (Additional Safety)

The ORM-style builders (sql.insert, sql.update, sql.delete) use quoteIdentifier() which both validates AND double-quote-escapes identifiers:

export function quoteIdentifier(name: string, context?: string): string {
  validateSqlIdentifier(name, context)    // Rejects unsafe chars
  const escaped = name.replace(/"/g, '""') // Escape existing quotes
  return `"${escaped}"`                     // Wrap in double quotes
}

This provides defense-in-depth: even if the regex were to have a bypass, the quoting would prevent injection in most cases.

4. LIMIT Clause Parameterization

LIMIT values in UPDATE and DELETE builders are parameterized (not interpolated) to prevent injection through numeric values:

// In buildUpdateSQL and buildDeleteSQL:
sql += ` LIMIT ?`
params.push(limit)

Additionally, LIMIT values are validated as non-negative integers before use.

5. Delete Safety

The sql.delete() builder requires a WHERE clause by design. Attempting to execute a delete without a WHERE clause throws an error:

await sql.delete(db.users)               // Throws: "Delete requires a WHERE clause"
await sql.delete(db.users).where({ id }) // OK

R2 Path Security

Path Traversal Prevention

All R2 object key paths that include user-controlled or semi-trusted values use encodeURIComponent() to prevent path traversal attacks:

Function Encoding
src/r2/upload.ts generateCDCPath() encodeURIComponent(shardId)
src/parquet/cdc-writer.ts cdcParquetPath() encodeURIComponent(tableName), encodeURIComponent(shardId)

This prevents attacks such as:

  • shardId = "../../secrets" which would become ..%2F..%2Fsecrets
  • tableName = "../admin" which would become ..%2Fadmin

Fixed Path Structure

R2 paths follow deterministic structures that limit what can be written:

  • CDC data: _cdc/{encodedTable}/year={YYYY}/month={MM}/day={DD}/hour={HH}/{encodedShard}-{timestamp}.parquet
  • Upload data: data/{table}/_shard={encodedShardId}/cdc_{timestamp}_{seq}.parquet
  • Iceberg metadata: {tableLocation}/metadata/v{sequenceNumber}.metadata.json
  • Migration manifest: migrations/manifest.json (fixed path)

CDC Trigger Security

CDC triggers are SQL statements generated at runtime that execute automatically on INSERT, UPDATE, and DELETE. Because they contain dynamic identifiers, they require special validation.

Validation Chain

  1. Table names from schema.tables keys are validated via validateSqlIdentifier()
  2. Shard key column names from schema.getShardKey() are validated
  3. All column names from table definitions are validated
  4. Trigger prefix and buffer table name are validated

Trust Model

CDC trigger SQL is constructed server-side in the Durable Object using schema definitions that are developer-authored (not user-supplied). The validation provides defense-in-depth against:

  • Compromised or malformed schema objects
  • Future refactoring that might introduce untrusted input

Migration Security

Trust Model

Migration SQL is intentionally executed without parameterization because migrations contain DDL statements (CREATE TABLE, ALTER TABLE, etc.) that cannot be parameterized. The security model relies on:

  1. Checksum verification: Each migration has a SHA-256 checksum verified before execution
  2. Manifest integrity: The overall manifest has its own SHA-256 checksum
  3. R2 source trust: Manifests are fetched from a configured R2 bucket (not user-supplied URLs)
  4. Transaction safety: Each migration runs inside transactionSync() for atomicity

Risk: Semicolon Splitting

Migration SQL is split by semicolons to execute individual statements. This is a known limitation -- SQL strings containing semicolons in string literals could be incorrectly split. This is acceptable because:

  • Migrations are developer-authored, not user-supplied
  • The checksum ensures the migration content has not been tampered with

Input Validation at Boundaries

JSON Validation

All JSON parsed from external sources uses the runtime validation library (src/validation/index.ts):

Data Source Validator
Migration manifest from R2 parseMigrationManifest()
Migration state from DB parseMigrationState()
Iceberg manifest from R2 parseIcebergManifest()
Catalog state from R2 parseCatalogState()
Query request from HTTP parseQueryRequest()
Local dev manifest parseLocalManifest()
CDC buffer events parseCDCBufferEvent()

Each validator returns either a typed result or a ValidationError -- they never throw, allowing callers to handle errors gracefully.

Configuration Validation

Runtime configuration is validated by src/validation/config.ts:

  • validateSQLakeConfig() validates storage mode, CDC flag, binding names
  • validateSQLakeDOOptions() validates schema structure, flush thresholds
  • validateCompactionOptions() validates file sizes, counts, ages

Variant Encoding Limits

The Parquet Variant encoder (src/parquet/cdc-writer.ts) enforces limits to prevent unbounded memory growth from malicious or deeply nested data:

  • Maximum dictionary size (unique field names)
  • Maximum recursion depth for nested objects/arrays

Exceeding these limits throws a VariantEncodingLimitError.

Security Audit Summary

Audited Query Paths

All paths that construct SQL have been verified to use either parameterized queries or validated identifiers:

  1. sql template literal -- parameterized values, validated TableRef names
  2. ORM builders (insert/update/delete) -- quoteIdentifier() for all identifiers, ? for all values
  3. CDC triggers -- validateSqlIdentifier() on all interpolated identifiers
  4. CDC utility functions -- validateSqlIdentifier() on table, column, trigger, and buffer names
  5. Migration system -- trusted SQL with checksum verification
  6. Internal queries (CDC buffer reads, dead-letter operations) -- parameterized with ?

Audited R2 Paths

All R2 key construction sites have been verified:

  1. generateCDCPath() -- encodeURIComponent(shardId)
  2. cdcParquetPath() -- encodeURIComponent(tableName), encodeURIComponent(shardId)
  3. Iceberg paths -- constructed from tableLocation (developer config) + sequence numbers
  4. Migration manifest -- fixed path migrations/manifest.json

Known Accepted Risks

  1. Migration SQL execution: Raw SQL execution is by design, mitigated by checksum verification
  2. Semicolon splitting in migrations: Could misparse SQL with semicolons in strings, but migrations are trusted code
  3. Schema-defined identifiers: Table/column names from developer schema are validated but ultimately developer-controlled

Best Practices

  1. Use the sql tagged template for all queries -- never construct SQL strings manually
  2. Use ${db.tableName} references -- never interpolate string table names
  3. Prefer ORM builders for INSERT/UPDATE/DELETE operations
  4. Validate all external JSON using the validation library before use
  5. Never disable checksum verification for migrations in production
  6. Review R2 key construction when adding new upload paths

Reporting Security Issues

If you discover a security vulnerability in SQLake, please report it responsibly through the appropriate channel. Do not open a public issue for security vulnerabilities.

There aren’t any published security advisories