fix(core): parameterize SQL across sampler and branching layers by faculopezscala · Pull Request #1 · Bugsterapp/sow

faculopezscala · 2026-04-06T07:50:21Z

Summary

Closes a SQL injection vulnerability in the database sampler and extends the same fix to six related sites in the branching layer.

The bug. sow connect runs a referential-integrity pass that builds SQL WHERE clauses by string-interpolating primary-key values from sampled source rows. Two concrete failure modes:

Benign: a text PK containing a single quote (e.g. "O'Brien") produced invalid SQL, caught by an empty catch {}, and silently dropped. Users got a branch with dangling foreign keys and never knew.
Security: sow's entire pitch is "safe for your production DB." A crafted PK value like '; DROP TABLE x; -- would execute against the source DB under whatever permissions the connection string has. Read-only in intent, not in SQL effect.

Scope expansion. A specialist review during /ship surfaced the same class of bug in several other files. Rather than ship a narrow fix that left five more injection sinks on main, the PR was expanded to close them all.

Commits (bisectable):

63ef8d8 — Parameterize the sampler. DatabaseAdapter.query() accepted a params argument in its interface but silently dropped it (_params?: unknown[]) and called sql.unsafe(sqlStr) with no binding — the interface was a lie. Now it actually passes params through to postgres@3's sql.unsafe(query, parameters). Three dynamic-SQL call sites in sampler/referential.ts rewritten to use $1, $2, ... placeholders. 9 regression tests including the O'Brien single-quote crash.
e0a724d — Parameterize the branching layer.
- Extract quoteIdent() to packages/core/src/sql/identifiers.ts as a shared module.
- branching/manager.ts:getBranchSample — the table argument comes directly from sow branch sample <branch> <table> (user argv) AND the sow_branch_sample MCP tool (AI agent input). Now quoted via quoteIdent; limit bound via $1 after numeric clamping.
- branching/providers/supabase.ts:fetchAuthUserMappings — the IN (...) clause was built by single-quote-concatenating user ids from sampled source data. Now uses $1, $2, ... placeholders.
- branching/supabase.ts RLS setup — 8 sites interpolated tableName into DDL and information_schema queries. Values go through $1 binds; identifiers through quoteIdent.
- branching/supabase.ts createTestAuthUsers — user.id and user.email were interpolated into DELETE and INSERT statements inside single-quoted literals. All bound now.
11e0615 — Adversarial review follow-ups (Claude subagent).
- quoteIdent throws on empty identifier and NUL byte.
- Dropped the re-export of quoteIdent from referential.ts so tests import from the canonical module.
- getBranchSample limit clamping uses an explicit Number.isFinite check so unset/NaN falls back to the documented default of 5 instead of silently returning zero rows.
- branching/supabase.ts RLS introspection hoisted outside the DDL try block to prevent a transient read error from landing in the policy-disable fallback path (a real fail-open security vector).
08eebb1 — Codex adversarial follow-ups.
- fetchAuthUserMappings now batches ids (1000 per query) so large full-copy snapshots don't hit Postgres's 65,535 bind-param limit, AND filters by UUID regex before binding so one non-UUID value doesn't poison the whole batch via cast error. A single batch failure no longer nukes the other batches.
- branching/supabase.ts RLS loop has a per-table introspection try block that continues on error (fail-safe: leaves RLS enabled) rather than aborting the whole loadIntoSupabase run. The DDL try block still has the dev-sandbox disable-RLS-on-policy-failure fallback.
- getBranchSample clamp fixed to [0, 100] (was [1, 100] — LIMIT 0 is legal SQL and a valid caller intent).

Test Coverage

Tests: 79 → 89 (+10 new) — 7 for quoteIdent (including throw cases for empty + NUL byte), 3 regression tests for the sampler SQL shape + params.

CODE PATH COVERAGE
===========================
[M] packages/core/src/sql/identifiers.ts — quoteIdent
    ├── [★★★ TESTED] simple identifier
    ├── [★★★ TESTED] underscored column
    ├── [★★★ TESTED] embedded double quote escape
    ├── [★★★ TESTED] multiple embedded quotes
    ├── [★★★ TESTED] hostile payload defense
    ├── [★★★ TESTED] throws on empty string
    └── [★★★ TESTED] throws on NUL byte

[M] packages/core/src/sampler/referential.ts
    ├── [★★★ TESTED] single-col PK with single quote (O'Brien regression)
    ├── [★★★ TESTED] composite FK → $1 + $2 placeholders
    └── [★★★ TESTED] hostile payload stays in params, not in SQL text

[M] packages/core/src/adapters/postgres.ts — query() passthrough
    └── [GAP]        Direct unit test of the new params branch.
                     Covered indirectly by every sampler test via SpyAdapter.
                     Full integration lives in __integration__ (needs live PG).

[M] packages/core/src/branching/manager.ts:getBranchSample
    └── [GAP]        Function creates a live postgres client; unit-testing
                     requires refactor. Existing integration test covers it.

[M] packages/core/src/branching/providers/supabase.ts:fetchAuthUserMappings
    └── [GAP]        Module-private; tested indirectly via supabase integration.

[M] packages/core/src/branching/supabase.ts RLS + auth setup
    └── [GAP]        Uses raw postgres() client. Integration test in
                     __integration__ covers the end-to-end path.

─────────────────────────────────
COVERAGE: 9/13 tested ≈ 69% (above 60% minimum, below 80% target)
─────────────────────────────────

The four untested paths all share a common shape: they use quoteIdent/parameterized queries in the same pattern as the fully-tested sampler path, and quoteIdent itself is exhaustively tested.

Pre-Landing Review

Specialist review (4 specialists):

Testing: 7 findings, 0 critical. Suggestions around SpyAdapter fragility and edge cases. Below the fix bar.
Maintainability: 9 findings, 0 critical. DRY suggestions, moving quoteIdent to a shared module (done).
Security: 6 CRITICAL findings — all addressed in commits e0a724d and 11e0615. This is why the PR expanded from 3 files to 7.
Performance: 5 INFORMATIONAL findings, all net-neutral or net-positive. Parameterized queries enable plan caching.

PR Quality Score: 7.0/10

Adversarial Review

Two rounds of adversarial review from both Claude and Codex:

Round 1 — Claude adversarial subagent: 15 findings. Actionable items (4) addressed in commit 11e0615.

Round 2 — Codex adversarial + structured review: 5 findings. 3 actionable (batching, UUID filter, per-table scope) addressed in 08eebb1. 2 pre-existing issues documented below.

Codex structured review P1 gate: PASS. No remaining P1 issues.

Scope Drift

Scope Check: CLEAN (with deliberate expansion). The PR started as a narrow sampler fix and deliberately expanded to close the entire SQL injection category after the specialist review surfaced 5 more sites of the same class. Every file changed is directly related to the security fix.

Plan Completion

This PR implements Issue #2 of the 11-issue plan from /plan-eng-review. Issues #1 (template-DB reset), #3 (sanitizer warnings), #4 (JSONB), #5 (sow sandbox), #6 (env-patch), #7 (dynamic skip list), #8 (N+1 batching in sampler), #9 (fail-closed sanitization), #10 (Postgres type coverage), and #11 (release workflow refactor) ship as separate PRs per the plan's parallelization lanes.

Known Non-Blocking (documented, deferred)

These surfaced during adversarial review but are pre-existing and not introduced by this PR:

keyStr.split("|") fragility in the formal-FK path of sampler/referential.ts. A primary key value containing a literal | is silently mis-split and the row is dropped.
fkValue.includes("null") bug in the missing-parent detection. A legitimate key containing the substring "null" is falsely treated as a NULL FK.
String(parent[col] ?? "") NULL→"" coercion in the ensure-1-child-per-parent path. A NULL parent key becomes '' and can match empty-string children.
Pre-existing lint debt in sanitizer/rules.ts (regex escapes) and sanitizer/transformer.ts (unused import).

Test plan

bunx vitest run — 89/89 tests passing (79 pre-existing + 10 new)
bunx turbo build — all 3 packages build cleanly
bunx eslint on every touched file — clean
Post-merge: verify auto-published v0.1.14 artifact on npm
Post-merge: run sow branch sample <branch> 'test"name' to manually verify the quoteIdent path

🤖 Generated with Claude Code

The referential-integrity sampler built WHERE clauses by string-concatenating primary key values taken from sampled source rows. Two failure modes: 1. Benign: a text PK containing a single quote (e.g. "O'Brien") produced invalid SQL, which was then caught by an empty catch{} and silently dropped. Users got a branch with dangling foreign keys and never knew. 2. Security: sow's entire pitch is "safe for your production database". A crafted PK value like "'; DROP TABLE x; --" would execute against the source DB under whatever permissions the connection string had. Read-only in intent, not in SQL effect. Changes: - adapters/postgres.ts: the query() method accepted a `params` argument in the interface but silently dropped it (`_params?: unknown[]`) and called sql.unsafe(sqlStr) with no binding. Now it passes params through when present. The interface finally matches the runtime behavior. - sampler/referential.ts: all three dynamic-SQL call sites now use $1,$2,... placeholders and pass values via the params array. Added a quoteIdent() helper that escapes double quotes in table/column names per the SQL standard as defense in depth for the catalog names. - sampler/referential.test.ts: 9 new tests covering quoteIdent, a regression test for the O'Brien single-quote crash, composite FK placeholders, and the hostile-payload case. Uses a spying adapter to assert that values flow through params and never into SQL text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Follow-up to the sampler fix. A /ship specialist review flagged that the same class of string-interpolated SQL existed in the branching layer and on paths that receive user/agent input. Five more sites closed. Changes: - sql/identifiers.ts (new): extract quoteIdent to a shared module so it can be used by any file that builds dynamic SQL. The sampler re-exports it so the prior test imports still resolve. - branching/manager.ts:473 getBranchSample — the `table` argument comes directly from `sow branch sample <branch> <table>` (user argv) and from the `sow_branch_sample` MCP tool (agent-supplied). It was interpolated into sql.unsafe unsanitized. Now quoted via quoteIdent, and the limit is bound via $1 after numeric clamping. - branching/providers/supabase.ts:55 fetchAuthUserMappings — the IN(...) clause was built by single-quote-concatenating user ids from sampled source data. Now uses $1,$2,... placeholders with userIds as the params array to adapter.query(). - branching/supabase.ts RLS setup block — 8 sites interpolating tableName into DDL and information_schema WHERE clauses. tableName values without quoted identifier escaping could break out of the wrapping quotes. Now uses quoteIdent for all identifier positions and $1 bind params for all value positions in the information_schema lookups. - branching/supabase.ts createTestAuthUsers — user.id and user.email were interpolated into DELETE and INSERT statements inside single-quoted literals. Now all three call sites (delete-by-id, insert auth.users row, insert matching identity) bind values via $1,$2 placeholders. The `gen_random_uuid()` SQL function literal path is split from the parameterized-id path to keep the query text honest about which part is value vs SQL. All fixes follow the same pattern: quoteIdent for identifiers (which Postgres cannot parameterize), $N placeholders for values. Testing: - Full unit suite green (88/88). - Build clean across all three packages. - Lint clean on every touched file. - The sampler regression tests from the previous commit still pass (quoteIdent is re-exported via the sampler module). - Integration tests against the bugsterdb schema in __integration__ exercise the RLS and auth-user paths; they require a live Postgres and are NOT run here (separate vitest config). Future work: add spy-adapter tests for fetchAuthUserMappings specifically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

An adversarial review pass on the fix/sql-injection-sampler branch surfaced four actionable issues. Addressed here. 1. quoteIdent hardening: - Throws on empty identifier (would have produced the invalid SQL token `""` and errored at execution time far from the call site). - Throws on embedded NUL byte (libpq rejects these at protocol level with a confusing error; fail loudly here instead). 2. Kill the quoteIdent re-export from sampler/referential.ts. The test file now imports quoteIdent directly from ../sql/identifiers.js so there is one canonical import path and tests cannot silently diverge from production. 3. branching/manager.ts getBranchSample limit clamping: previously used `limit | 0` which coerced undefined/NaN to 0 and silently returned empty result sets. Now uses an explicit Number.isFinite check and falls back to the documented default of 5, then clamps to [1, 100]. Restores pre-fix semantics for edge inputs without reintroducing any injection surface (the clamped value is still bound via $1). 4. branching/supabase.ts RLS setup: hoisted the two information_schema lookups OUTSIDE the try/catch block. Previously, a transient read error during introspection could end up in the fallback path that DISABLES row level security on the table — a real security regression vector even though the pre-fix code had the same shape. Now the try/catch only wraps the DDL that the fallback is actually designed to recover from (ENABLE RLS + CREATE POLICY). Tests: - 89/89 unit tests passing (up from 88; quoteIdent empty-string test flipped to two throw tests). - Build clean. - Lint clean on every touched file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…table RLS scope A second adversarial pass (Codex) surfaced three more issues. Addressed. 1. fetchAuthUserMappings: batch and UUID-filter before querying. - Batch size 1000 keeps us well under Postgres's 65,535 bind-param limit for full-copy snapshots with many auth users. - Filter userIds by UUID_RE before binding: the upstream heuristic in extractUserIds() only checks `length > 10`, so a non-UUID value reaching the IN clause would cause a cast error that the broad catch swallowed, silently dropping ALL mappings. Now we skip non-UUID ids before the query runs. - A single batch failure no longer nukes the whole mapping set — other batches still contribute their results. 2. supabase.ts RLS loop: per-table introspection scope. - The previous fix hoisted info_schema lookups outside the try to avoid the RLS-disable fallback firing on transient reads. But that made a catalog blip abort the entire loadIntoSupabase run. - New shape: two separate try blocks per table. (a) Introspection try: if this fails, `continue` to next table without touching RLS. Fail-safe: unconfigured tables remain locked, not disabled. (b) DDL try: if ENABLE RLS or CREATE POLICY fails, fall back to DISABLE RLS so the dev sandbox stays usable. Scope is now limited to actual DDL errors where the fallback is intended. - Transient errors no longer abort the whole load. - Security-regression vector (disable RLS on read error) is closed. 3. getBranchSample: allow limit=0. - Previous clamp was [1, 100], which silently bumped a caller's intentional `LIMIT 0` to 1 row (a behavior regression — LIMIT 0 is valid SQL and a legitimate probe). Clamp is now [0, 100]. Non-finite inputs still fall back to the documented default of 5. Tests: 89/89 passing. Build clean. Lint clean. Known non-blocking (documented in PR body): pre-existing `keyStr.split("|")` and `fkValue.includes("null")` fragility in the formal-FK path of referential.ts. Not introduced by this PR; tracked for a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The 5 code PRs (#2-#6) deliver the engineering for the launch. This PR delivers everything user-facing: the README pitch, the package metadata, the docs/ folder, and the CHANGELOG. This PR depends on PRs #2-#6 being merged first because the new README references commands those PRs add (sow sandbox, sow doctor <connector>, the --allow-unsafe flag, the sub-second reset). Land them first, rebase this against main, then merge. README.md Hero rewritten from "Safe test databases from production Postgres" to "Stop letting Claude touch your prod database". Body explains the anxiety-reduction pitch: a coding agent is about to do something database-adjacent and you feel that quiet pang. sow is the safety layer. New "Why sow" section. New "How It Works" diagram showing the template-DB shape (one container per connector, N branch DBs, reset in <1s). New "Cookbook" stub linking to docs/cookbook.md. New "Documentation" section with the docs/ index. packages/cli/package.json Description: "Stop letting Claude touch your prod database. PII-safe local Postgres sandbox for coding agents." Keywords: added ai-agents, coding-agents, claude-code, cursor, sandbox, mcp. packages/core/package.json Description: "sow core engine — analyze, sample, sanitize, and branch Postgres databases for safe coding-agent sandboxes" Keywords: added ai-agents, coding-agents, sandbox. packages/mcp/package.json Description corrected from "15 tools" to "22 tools" (the actual count in packages/mcp/src/index.ts) and repositioned: "sow MCP server — 22 tools for coding agents (Claude Code, Cursor, Codex) to safely manage Postgres sandboxes" Keywords: added claude-code, cursor, codex, coding-agents, sandbox. docs/sandbox.md (new) The sow sandbox flagship command — what it does, the flags, the .env.local backup/revert flow, when not to use it, and what's actually in the sandbox. docs/sanitization.md (new) What sow sanitizes (the PII type table), how JSONB walking works, the fail-closed gate, the --allow-unsafe escape hatch, custom rules via .sow.yml, what sow does NOT do (free-text NER, etc.), and the read-only-on-the-source guarantee. docs/cookbook.md (new) Three end-to-end workflows with concrete prompts: 1. Let Claude refactor your schema without fear 2. Let Cursor generate seed data for a new feature 3. Let your coding agent debug a failing migration Plus the "agent reset loop" pattern diagram, the MCP tool list, and operational tips (one long-running sandbox per project, checkpoints for known-good states, sow doctor as the inspection surface, principle of least privilege on the source DB user). CHANGELOG.md (new) Scaffold with three sections: - [Unreleased] documenting the planned PR #2-#6 features under Added/Changed - [0.1.14] documenting the SQL injection security fix that already shipped (PR #1, merged earlier in the session) - [0.1.13] one-line summary of the initial public release Test/build/lint all clean (89/89 tests, 3/3 packages built, no source code changed in this PR). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

faculopezscala and others added 4 commits April 6, 2026 00:16

faculopezscala changed the title ~~Fix/sql injection sampler~~ fix(core): parameterize SQL across sampler and branching layers Apr 6, 2026

faculopezscala merged commit cea77df into main Apr 6, 2026
1 check passed

This was referenced Apr 6, 2026

refactor(core): Postgres template databases for sub-second branch reset #2

Merged

refactor(core): sampler warnings + dynamic skip + batched fetch (Lane A rest) #6

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): parameterize SQL across sampler and branching layers#1

fix(core): parameterize SQL across sampler and branching layers#1
faculopezscala merged 4 commits into
mainfrom
fix/sql-injection-sampler

faculopezscala commented Apr 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

faculopezscala commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Coverage

Pre-Landing Review

Adversarial Review

Scope Drift

Plan Completion

Known Non-Blocking (documented, deferred)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

faculopezscala commented Apr 6, 2026 •

edited

Loading