This document describes the security architecture, threat model, and mitigation strategies implemented in SQLake. It covers SQL injection prevention, R2 path traversal prevention, input validation at all boundaries, and the trust model for migrations.
SQLake operates in a Cloudflare Workers environment with the following trust boundaries:
- Untrusted: User-supplied query parameters (values in `sql` tagged template literals)
- Semi-trusted: Schema configuration (table names, column names) -- defined by developers but validated at runtime
- Trusted (with verification): Migration manifests from R2 -- verified against SHA-256 checksums
- Trusted: Internal CDC events (generated by SQLite triggers, never user-controlled)
The `sql` tagged template literal automatically converts interpolated values to parameterized placeholders. This is the primary defense against SQL injection:

    // Safe -- values are parameterized via ? placeholders
    const user = await sql`SELECT * FROM users WHERE id = ${userId}`.first()
    // Generated: SELECT * FROM users WHERE id = ?
    // Parameters: [userId]

Implementation in `src/utils/build-sql.ts`:
- Template literal strings are concatenated as-is (they are developer-authored code)
- Interpolated values become `?` placeholders with values pushed to a params array
- The only exception is `TableRef` objects, which are validated identifiers (see below)
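
As a rough illustration of the mechanism described above, and not the actual contents of `src/utils/build-sql.ts`, a tagged template function can assemble the SQL text and the parameter array as follows. The names `buildSketch` and `BuiltQuery` are hypothetical, and `TableRef` handling is left out.

```typescript
// Hypothetical sketch: not SQLake's actual buildSQL() implementation.
interface BuiltQuery {
  text: string
  params: unknown[]
}

function buildSketch(strings: TemplateStringsArray, ...values: unknown[]): BuiltQuery {
  // Literal segments are developer-authored, so they are copied as-is.
  let text = strings[0] ?? ''
  const params: unknown[] = []
  values.forEach((value, i) => {
    // Each interpolated value goes into params; only a "?" reaches the SQL text.
    params.push(value)
    text += '?' + (strings[i + 1] ?? '')
  })
  return { text, params }
}

const userId = '1 OR 1=1'
const query = buildSketch`SELECT * FROM users WHERE id = ${userId}`
// query.text   is: SELECT * FROM users WHERE id = ?
// query.params is: ["1 OR 1=1"]
```

Because the hostile string only ever appears in `params`, the database driver binds it as a value rather than parsing it as SQL.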
SQL identifiers (table names, column names) cannot be parameterized. SQLake validates all identifiers against a strict allowlist pattern before interpolation.
Pattern: `^[a-zA-Z_][a-zA-Z0-9_]*$`
This rejects:
- Statement-stacking attempts (`users; DROP TABLE`)
- Quote escaping attempts (`users' OR '1'='1`)
- Empty strings and strings starting with a digit
- Hyphens, dots, spaces, or any other special characters
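
The allowlist check can be sketched in a few lines. This is an illustration only: `assertSafeIdentifier` and `isRejected` are hypothetical names, and the real `validateSqlIdentifier()` may differ in its error type and message.

```typescript
// Hypothetical sketch of an allowlist identifier check; names are illustrative.
const IDENTIFIER_PATTERN = /^[a-zA-Z_][a-zA-Z0-9_]*$/

function assertSafeIdentifier(name: string, context = 'identifier'): string {
  if (!IDENTIFIER_PATTERN.test(name)) {
    throw new Error(`Invalid SQL ${context}: ${JSON.stringify(name)}`)
  }
  return name
}

// Helper for the demo: reports whether a candidate name is rejected.
function isRejected(name: string): boolean {
  try {
    assertSafeIdentifier(name)
    return false
  } catch {
    return true
  }
}

const results = {
  plain: isRejected('users'),               // accepted
  stacked: isRejected('users; DROP TABLE'), // rejected: semicolon and spaces
  quoted: isRejected("users' OR '1'='1"),   // rejected: quotes and spaces
  empty: isRejected(''),                    // rejected: needs at least one character
  leadingDigit: isRejected('1users'),       // rejected: must start with a letter or underscore
  hyphen: isRejected('user-data'),          // rejected: hyphen is not in the allowlist
}
```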
Validation call sites (all paths that interpolate identifiers into SQL):
| Location | What is validated |
|---|---|
| `src/utils/build-sql.ts` `buildSQL()` | `TableRef` names in template literals |
| `src/query/builders/sql-builders.ts` `quoteIdentifier()` | All table/column names in ORM builders |
| `src/query/builders/sql-builders.ts` `buildWhereClause()` | WHERE column names |
| `src/query/builders/sql-builders.ts` `buildSetClause()` | SET column names |
| `src/query/builders/sql-builders.ts` `buildOnConflictClause()` | ON CONFLICT target columns |
| `src/query/builders/sql-builders.ts` `buildReturningClause()` | RETURNING column names |
| `src/query/builders/sql-builders.ts` `buildInsertSQL()` | Table name, column names |
| `src/query/builders/sql-builders.ts` `buildUpdateSQL()` | Table name |
| `src/query/builders/sql-builders.ts` `buildDeleteSQL()` | Table name |
| `src/do/cdc-manager.ts` `_createTableTriggers()` | Table name, shard key column |
| `src/do/cdc-manager.ts` `_jsonObjectArgs()` | All column names in `json_object()` |
| `src/utils/cdc.ts` `createCDCBufferTable()` | Buffer table name |
| `src/utils/cdc.ts` `createCDCTriggers()` | Table name, column names, row ID column, trigger prefix, buffer table name |
| `src/utils/cdc.ts` `dropCDCTriggers()` | Table name, trigger prefix |
The ORM-style builders (`sql.insert`, `sql.update`, `sql.delete`) use `quoteIdentifier()`, which both validates AND double-quote-escapes identifiers:

    export function quoteIdentifier(name: string, context?: string): string {
      validateSqlIdentifier(name, context) // Rejects unsafe chars
      const escaped = name.replace(/"/g, '""') // Escape existing quotes
      return `"${escaped}"` // Wrap in double quotes
    }

This provides defense-in-depth: even if the regex were to have a bypass, the quoting would prevent injection in most cases.
LIMIT values in UPDATE and DELETE builders are parameterized (not interpolated) to prevent injection through numeric values:

    // In buildUpdateSQL and buildDeleteSQL:
    sql += ` LIMIT ?`
    params.push(limit)

Additionally, LIMIT values are validated as non-negative integers before use.
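
Combining the two steps, validation followed by parameterization, might look like the sketch below. `appendLimit` is a hypothetical helper, not a SQLake function, and the error type is an assumption.

```typescript
// Hypothetical sketch; appendLimit is not part of SQLake.
function appendLimit(sql: string, params: unknown[], limit: number): string {
  // Reject NaN, Infinity, fractions, and negative numbers up front.
  if (!Number.isInteger(limit) || limit < 0) {
    throw new RangeError(`LIMIT must be a non-negative integer, got ${String(limit)}`)
  }
  // The value is bound as a parameter, never concatenated into the SQL text.
  params.push(limit)
  return `${sql} LIMIT ?`
}

const limitParams: unknown[] = ['inactive']
const limitedSql = appendLimit('DELETE FROM "users" WHERE "status" = ?', limitParams, 10)
// limitedSql  is: DELETE FROM "users" WHERE "status" = ? LIMIT ?
// limitParams is: ["inactive", 10]
```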
The `sql.delete()` builder requires a WHERE clause by design. Attempting to execute a delete without a WHERE clause throws an error:

    await sql.delete(db.users) // Throws: "Delete requires a WHERE clause"
    await sql.delete(db.users).where({ id }) // OK

All R2 object key paths that include user-controlled or semi-trusted values use `encodeURIComponent()` to prevent path traversal attacks:
| Function | Encoding |
|---|---|
| `src/r2/upload.ts` `generateCDCPath()` | `encodeURIComponent(shardId)` |
| `src/parquet/cdc-writer.ts` `cdcParquetPath()` | `encodeURIComponent(tableName)`, `encodeURIComponent(shardId)` |
This prevents attacks such as:
- `shardId = "../../secrets"`, which would become `..%2F..%2Fsecrets`
- `tableName = "../admin"`, which would become `..%2Fadmin`
R2 paths follow deterministic structures that limit what can be written:
- CDC data: `_cdc/{encodedTable}/year={YYYY}/month={MM}/day={DD}/hour={HH}/{encodedShard}-{timestamp}.parquet`
- Upload data: `data/{table}/_shard={encodedShardId}/cdc_{timestamp}_{seq}.parquet`
- Iceberg metadata: `{tableLocation}/metadata/v{sequenceNumber}.metadata.json`
- Migration manifest: `migrations/manifest.json` (fixed path)
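
To make the encoding step concrete, here is a minimal sketch of a key builder that follows the CDC data layout listed above. `cdcKeySketch` is a hypothetical stand-in for `cdcParquetPath()`, and two details are assumptions: the date parts are taken in UTC, and the timestamp is epoch milliseconds.

```typescript
// Hypothetical sketch; cdcKeySketch is not SQLake's cdcParquetPath().
function cdcKeySketch(tableName: string, shardId: string, when: Date): string {
  const pad = (n: number): string => String(n).padStart(2, '0')
  // encodeURIComponent turns "/" into "%2F", so neither value can add a path segment.
  const table = encodeURIComponent(tableName)
  const shard = encodeURIComponent(shardId)
  return [
    '_cdc',
    table,
    `year=${when.getUTCFullYear()}`,
    `month=${pad(when.getUTCMonth() + 1)}`,
    `day=${pad(when.getUTCDate())}`,
    `hour=${pad(when.getUTCHours())}`,
    `${shard}-${when.getTime()}.parquet`, // assumption: timestamp is epoch milliseconds
  ].join('/')
}

const key = cdcKeySketch('../admin', '../../secrets', new Date(Date.UTC(2024, 0, 15, 9)))
// key starts with:
// _cdc/..%2Fadmin/year=2024/month=01/day=15/hour=09/..%2F..%2Fsecrets-
```

The resulting key always has exactly seven `/`-separated segments, regardless of what the table name or shard ID contains.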
CDC triggers are SQL statements generated at runtime that execute automatically on INSERT, UPDATE, and DELETE. Because they contain dynamic identifiers, they require special validation.
- Table names from `schema.tables` keys are validated via `validateSqlIdentifier()`
- Shard key column names from `schema.getShardKey()` are validated
- All column names from table definitions are validated
- Trigger prefix and buffer table name are validated
CDC trigger SQL is constructed server-side in the Durable Object using schema definitions that are developer-authored (not user-supplied). The validation provides defense-in-depth against:
- Compromised or malformed schema objects
- Future refactoring that might introduce untrusted input
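
The following sketch shows how identifier validation can gate trigger generation. It is illustrative only: the trigger name format, the buffer table columns (`table_name`, `op`, `payload`), and the helper names are all assumptions, not SQLake's actual trigger shape.

```typescript
// Hypothetical sketch; trigger and buffer-table shapes are assumptions.
const SAFE_IDENTIFIER = /^[a-zA-Z_][a-zA-Z0-9_]*$/

function checked(name: string): string {
  if (!SAFE_IDENTIFIER.test(name)) {
    throw new Error(`Unsafe identifier: ${JSON.stringify(name)}`)
  }
  return name
}

function insertTriggerSketch(
  table: string,
  columns: string[],
  bufferTable: string,
  prefix: string,
): string {
  const t = checked(table)
  const buf = checked(bufferTable)
  const p = checked(prefix)
  // Every column is validated before it is placed into json_object().
  const args = columns
    .map((c) => checked(c))
    .map((c) => `'${c}', NEW."${c}"`)
    .join(', ')
  return (
    `CREATE TRIGGER IF NOT EXISTS "${p}_${t}_insert" AFTER INSERT ON "${t}" ` +
    `BEGIN INSERT INTO "${buf}" (table_name, op, payload) ` +
    `VALUES ('${t}', 'INSERT', json_object(${args})); END`
  )
}

const triggerSql = insertTriggerSketch('users', ['id', 'email'], '_cdc_buffer', 'cdc')
```

A malformed schema entry such as a column named `id"); DROP TABLE users; --` makes `checked()` throw before any SQL text is produced.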
Migration SQL is intentionally executed without parameterization because migrations contain DDL statements (CREATE TABLE, ALTER TABLE, etc.) that cannot be parameterized. The security model relies on:
- Checksum verification: Each migration has a SHA-256 checksum verified before execution
- Manifest integrity: The overall manifest has its own SHA-256 checksum
- R2 source trust: Manifests are fetched from a configured R2 bucket (not user-supplied URLs)
- Transaction safety: Each migration runs inside `transactionSync()` for atomicity
Migration SQL is split by semicolons to execute individual statements. This is a known limitation -- SQL strings containing semicolons in string literals could be incorrectly split. This is acceptable because:
- Migrations are developer-authored, not user-supplied
- The checksum ensures the migration content has not been tampered with
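
The limitation is easy to reproduce. The sketch below assumes the simplest possible splitter (split on `;`, trim, drop empties); `splitStatementsNaive` is a hypothetical name and SQLake's migration runner may differ in detail.

```typescript
// Hypothetical sketch of naive semicolon splitting; not SQLake's migration runner.
function splitStatementsNaive(sql: string): string[] {
  return sql
    .split(';')
    .map((statement) => statement.trim())
    .filter((statement) => statement.length > 0)
}

// Typical DDL migration: splits into two statements as intended.
const clean = splitStatementsNaive('CREATE TABLE t (id INTEGER); CREATE INDEX i ON t (id);')

// A semicolon inside a string literal cuts the statement in half,
// leaving two fragments that are not valid SQL on their own.
const misparsed = splitStatementsNaive("INSERT INTO t (note) VALUES ('a;b');")
```

Migration authors can sidestep the issue by keeping semicolons out of string literals in migration files.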
All JSON parsed from external sources uses the runtime validation library (`src/validation/index.ts`):
| Data Source | Validator |
|---|---|
| Migration manifest from R2 | `parseMigrationManifest()` |
| Migration state from DB | `parseMigrationState()` |
| Iceberg manifest from R2 | `parseIcebergManifest()` |
| Catalog state from R2 | `parseCatalogState()` |
| Query request from HTTP | `parseQueryRequest()` |
| Local dev manifest | `parseLocalManifest()` |
| CDC buffer events | `parseCDCBufferEvent()` |
Each validator returns either a typed result or a `ValidationError`. Validators never throw, so callers can handle errors gracefully.
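
The return-instead-of-throw pattern can be sketched as follows. Everything here is hypothetical: the request shape (`sql`, `params`), the error fields (`path`, `message`), and the `...Sketch` names are assumptions made for illustration and are not taken from `src/validation/index.ts`.

```typescript
// Hypothetical sketch; the request shape and error fields are assumptions.
class ValidationErrorSketch {
  readonly path: string
  readonly message: string
  constructor(path: string, message: string) {
    this.path = path
    this.message = message
  }
}

interface QueryRequestSketch {
  sql: string
  params: unknown[]
}

function parseQueryRequestSketch(raw: string): QueryRequestSketch | ValidationErrorSketch {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    // Malformed JSON is reported as a value, not rethrown.
    return new ValidationErrorSketch('$', 'body is not valid JSON')
  }
  if (typeof data !== 'object' || data === null) {
    return new ValidationErrorSketch('$', 'expected an object')
  }
  const { sql, params } = data as { sql?: unknown; params?: unknown }
  if (typeof sql !== 'string') return new ValidationErrorSketch('$.sql', 'expected a string')
  if (!Array.isArray(params)) return new ValidationErrorSketch('$.params', 'expected an array')
  return { sql, params }
}

const ok = parseQueryRequestSketch('{"sql":"SELECT 1","params":[]}')
const bad = parseQueryRequestSketch('{"sql":42}')
const broken = parseQueryRequestSketch('{not json')
```

Callers branch on the result with `instanceof` and never need a `try`/`catch` around the parser.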
Runtime configuration is validated by `src/validation/config.ts`:
- `validateSQLakeConfig()` validates storage mode, CDC flag, binding names
- `validateSQLakeDOOptions()` validates schema structure, flush thresholds
- `validateCompactionOptions()` validates file sizes, counts, ages
The Parquet Variant encoder (`src/parquet/cdc-writer.ts`) enforces limits to prevent unbounded memory growth from malicious or deeply nested data:
- Maximum dictionary size (unique field names)
- Maximum recursion depth for nested objects/arrays
Exceeding these limits throws a `VariantEncodingLimitError`.
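
A simplified walker shows how both limits can be enforced in a single pass. This is a sketch under assumptions: the limit values are passed in by the caller, `LimitErrorSketch` stands in for `VariantEncodingLimitError`, and the real encoder in `src/parquet/cdc-writer.ts` does considerably more than count.

```typescript
// Hypothetical sketch; not the encoder in src/parquet/cdc-writer.ts.
class LimitErrorSketch extends Error {}

// Returns the number of unique field names, or throws when a limit is exceeded.
function checkLimits(value: unknown, maxDepth: number, maxFieldNames: number): number {
  const fieldNames = new Set<string>()
  const walk = (node: unknown, depth: number): void => {
    if (typeof node !== 'object' || node === null) return // primitives add no depth
    if (depth > maxDepth) {
      throw new LimitErrorSketch(`nesting deeper than ${maxDepth}`)
    }
    if (Array.isArray(node)) {
      for (const item of node) walk(item, depth + 1)
      return
    }
    for (const [fieldName, child] of Object.entries(node)) {
      fieldNames.add(fieldName)
      if (fieldNames.size > maxFieldNames) {
        throw new LimitErrorSketch(`more than ${maxFieldNames} unique field names`)
      }
      walk(child, depth + 1)
    }
  }
  walk(value, 1)
  return fieldNames.size
}

const uniqueNames = checkLimits({ a: { b: [1, 2] } }, 8, 16)
// uniqueNames is 2 ("a" and "b")
```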
All paths that construct SQL have been verified to use either parameterized queries or validated identifiers:
- `sql` template literal -- parameterized values, validated `TableRef` names
- ORM builders (insert/update/delete) -- `quoteIdentifier()` for all identifiers, `?` for all values
- CDC triggers -- `validateSqlIdentifier()` on all interpolated identifiers
- CDC utility functions -- `validateSqlIdentifier()` on table, column, trigger, and buffer names
- Migration system -- trusted SQL with checksum verification
- Internal queries (CDC buffer reads, dead-letter operations) -- parameterized with `?`
All R2 key construction sites have been verified:
- `generateCDCPath()` -- `encodeURIComponent(shardId)`
- `cdcParquetPath()` -- `encodeURIComponent(tableName)`, `encodeURIComponent(shardId)`
- Iceberg paths -- constructed from `tableLocation` (developer config) + sequence numbers
- Migration manifest -- fixed path `migrations/manifest.json`
- Migration SQL execution: Raw SQL execution is by design, mitigated by checksum verification
- Semicolon splitting in migrations: Could misparse SQL with semicolons in strings, but migrations are trusted code
- Schema-defined identifiers: Table/column names from developer schema are validated but ultimately developer-controlled
- Use the `sql` tagged template for all queries -- never construct SQL strings manually
- Use `${db.tableName}` references -- never interpolate string table names
- Prefer ORM builders for INSERT/UPDATE/DELETE operations
- Validate all external JSON using the validation library before use
- Never disable checksum verification for migrations in production
- Review R2 key construction when adding new upload paths
If you discover a security vulnerability in SQLake, please report it responsibly through the appropriate channel. Do not open a public issue for security vulnerabilities.