fix(core): parameterize SQL across sampler and branching layers#1
Merged
Conversation
The referential-integrity sampler built WHERE clauses by string-concatenating
primary key values taken from sampled source rows. Two failure modes:
1. Benign: a text PK containing a single quote (e.g. "O'Brien") produced
invalid SQL, which was then caught by an empty catch{} and silently
dropped. Users got a branch with dangling foreign keys and never knew.
2. Security: sow's entire pitch is "safe for your production database".
A crafted PK value like "'; DROP TABLE x; --" would execute against
the source DB under whatever permissions the connection string had.
Read-only in intent, not in SQL effect.
Changes:
- adapters/postgres.ts: the query() method accepted a `params` argument
in the interface but silently dropped it (`_params?: unknown[]`) and
called sql.unsafe(sqlStr) with no binding. Now it passes params through
when present. The interface finally matches the runtime behavior.
- sampler/referential.ts: all three dynamic-SQL call sites now use
$1,$2,... placeholders and pass values via the params array. Added
a quoteIdent() helper that escapes double quotes in table/column
names per the SQL standard as defense in depth for the catalog names.
- sampler/referential.test.ts: 9 new tests covering quoteIdent, a
regression test for the O'Brien single-quote crash, composite FK
placeholders, and the hostile-payload case. Uses a spying adapter
to assert that values flow through params and never into SQL text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow-up to the sampler fix. A /ship specialist review flagged that the same class of string-interpolated SQL existed in the branching layer and on paths that receive user/agent input. Five more sites closed. Changes: - sql/identifiers.ts (new): extract quoteIdent to a shared module so it can be used by any file that builds dynamic SQL. The sampler re-exports it so the prior test imports still resolve. - branching/manager.ts:473 getBranchSample — the `table` argument comes directly from `sow branch sample <branch> <table>` (user argv) and from the `sow_branch_sample` MCP tool (agent-supplied). It was interpolated into sql.unsafe unsanitized. Now quoted via quoteIdent, and the limit is bound via $1 after numeric clamping. - branching/providers/supabase.ts:55 fetchAuthUserMappings — the IN(...) clause was built by single-quote-concatenating user ids from sampled source data. Now uses $1,$2,... placeholders with userIds as the params array to adapter.query(). - branching/supabase.ts RLS setup block — 8 sites interpolating tableName into DDL and information_schema WHERE clauses. tableName values without quoted identifier escaping could break out of the wrapping quotes. Now uses quoteIdent for all identifier positions and $1 bind params for all value positions in the information_schema lookups. - branching/supabase.ts createTestAuthUsers — user.id and user.email were interpolated into DELETE and INSERT statements inside single-quoted literals. Now all three call sites (delete-by-id, insert auth.users row, insert matching identity) bind values via $1,$2 placeholders. The `gen_random_uuid()` SQL function literal path is split from the parameterized-id path to keep the query text honest about which part is value vs SQL. All fixes follow the same pattern: quoteIdent for identifiers (which Postgres cannot parameterize), $N placeholders for values. Testing: - Full unit suite green (88/88). - Build clean across all three packages. - Lint clean on every touched file. - The sampler regression tests from the previous commit still pass (quoteIdent is re-exported via the sampler module). - Integration tests against the bugsterdb schema in __integration__ exercise the RLS and auth-user paths; they require a live Postgres and are NOT run here (separate vitest config). Future work: add spy-adapter tests for fetchAuthUserMappings specifically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
An adversarial review pass on the fix/sql-injection-sampler branch
surfaced four actionable issues. Addressed here.
1. quoteIdent hardening:
- Throws on empty identifier (would have produced the invalid SQL
token `""` and errored at execution time far from the call site).
- Throws on embedded NUL byte (libpq rejects these at protocol level
with a confusing error; fail loudly here instead).
2. Kill the quoteIdent re-export from sampler/referential.ts. The test
file now imports quoteIdent directly from ../sql/identifiers.js so
there is one canonical import path and tests cannot silently diverge
from production.
3. branching/manager.ts getBranchSample limit clamping:
previously used `limit | 0` which coerced undefined/NaN to 0 and
silently returned empty result sets. Now uses an explicit
Number.isFinite check and falls back to the documented default of
5, then clamps to [1, 100]. Restores pre-fix semantics for edge
inputs without reintroducing any injection surface (the clamped
value is still bound via $1).
4. branching/supabase.ts RLS setup: hoisted the two information_schema
lookups OUTSIDE the try/catch block. Previously, a transient read
error during introspection could end up in the fallback path that
DISABLES row level security on the table — a real security
regression vector even though the pre-fix code had the same shape.
Now the try/catch only wraps the DDL that the fallback is actually
designed to recover from (ENABLE RLS + CREATE POLICY).
Tests:
- 89/89 unit tests passing (up from 88; quoteIdent empty-string test
flipped to two throw tests).
- Build clean.
- Lint clean on every touched file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…table RLS scope
A second adversarial pass (Codex) surfaced three more issues. Addressed.
1. fetchAuthUserMappings: batch and UUID-filter before querying.
- Batch size 1000 keeps us well under Postgres's 65,535 bind-param
limit for full-copy snapshots with many auth users.
- Filter userIds by UUID_RE before binding: the upstream heuristic
in extractUserIds() only checks `length > 10`, so a non-UUID value
reaching the IN clause would cause a cast error that the broad
catch swallowed, silently dropping ALL mappings. Now we skip
non-UUID ids before the query runs.
- A single batch failure no longer nukes the whole mapping set —
other batches still contribute their results.
2. supabase.ts RLS loop: per-table introspection scope.
- The previous fix hoisted info_schema lookups outside the try to
avoid the RLS-disable fallback firing on transient reads. But
that made a catalog blip abort the entire loadIntoSupabase run.
- New shape: two separate try blocks per table.
(a) Introspection try: if this fails, `continue` to next table
without touching RLS. Fail-safe: unconfigured tables remain
locked, not disabled.
(b) DDL try: if ENABLE RLS or CREATE POLICY fails, fall back to
DISABLE RLS so the dev sandbox stays usable. Scope is now
limited to actual DDL errors where the fallback is intended.
- Transient errors no longer abort the whole load.
- Security-regression vector (disable RLS on read error) is closed.
3. getBranchSample: allow limit=0.
- Previous clamp was [1, 100], which silently bumped a caller's
intentional `LIMIT 0` to 1 row (a behavior regression — LIMIT 0
is valid SQL and a legitimate probe). Clamp is now [0, 100].
Non-finite inputs still fall back to the documented default of 5.
Tests: 89/89 passing. Build clean. Lint clean.
Known non-blocking (documented in PR body): pre-existing `keyStr.split("|")`
and `fkValue.includes("null")` fragility in the formal-FK path of
referential.ts. Not introduced by this PR; tracked for a follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 6, 2026
faculopezscala
added a commit
that referenced
this pull request
Apr 6, 2026
The 5 code PRs (#2-#6) deliver the engineering for the launch. This PR delivers everything user-facing: the README pitch, the package metadata, the docs/ folder, and the CHANGELOG. This PR depends on PRs #2-#6 being merged first because the new README references commands those PRs add (sow sandbox, sow doctor <connector>, the --allow-unsafe flag, the sub-second reset). Land them first, rebase this against main, then merge. README.md Hero rewritten from "Safe test databases from production Postgres" to "Stop letting Claude touch your prod database". Body explains the anxiety-reduction pitch: a coding agent is about to do something database-adjacent and you feel that quiet pang. sow is the safety layer. New "Why sow" section. New "How It Works" diagram showing the template-DB shape (one container per connector, N branch DBs, reset in <1s). New "Cookbook" stub linking to docs/cookbook.md. New "Documentation" section with the docs/ index. packages/cli/package.json Description: "Stop letting Claude touch your prod database. PII-safe local Postgres sandbox for coding agents." Keywords: added ai-agents, coding-agents, claude-code, cursor, sandbox, mcp. packages/core/package.json Description: "sow core engine — analyze, sample, sanitize, and branch Postgres databases for safe coding-agent sandboxes" Keywords: added ai-agents, coding-agents, sandbox. packages/mcp/package.json Description corrected from "15 tools" to "22 tools" (the actual count in packages/mcp/src/index.ts) and repositioned: "sow MCP server — 22 tools for coding agents (Claude Code, Cursor, Codex) to safely manage Postgres sandboxes" Keywords: added claude-code, cursor, codex, coding-agents, sandbox. docs/sandbox.md (new) The sow sandbox flagship command — what it does, the flags, the .env.local backup/revert flow, when not to use it, and what's actually in the sandbox. docs/sanitization.md (new) What sow sanitizes (the PII type table), how JSONB walking works, the fail-closed gate, the --allow-unsafe escape hatch, custom rules via .sow.yml, what sow does NOT do (free-text NER, etc.), and the read-only-on-the-source guarantee. docs/cookbook.md (new) Three end-to-end workflows with concrete prompts: 1. Let Claude refactor your schema without fear 2. Let Cursor generate seed data for a new feature 3. Let your coding agent debug a failing migration Plus the "agent reset loop" pattern diagram, the MCP tool list, and operational tips (one long-running sandbox per project, checkpoints for known-good states, sow doctor as the inspection surface, principle of least privilege on the source DB user). CHANGELOG.md (new) Scaffold with three sections: - [Unreleased] documenting the planned PR #2-#6 features under Added/Changed - [0.1.14] documenting the SQL injection security fix that already shipped (PR #1, merged earlier in the session) - [0.1.13] one-line summary of the initial public release Test/build/lint all clean (89/89 tests, 3/3 packages built, no source code changed in this PR). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes a SQL injection vulnerability in the database sampler and extends the same fix to six related sites in the branching layer.
The bug.
sow connectruns a referential-integrity pass that builds SQLWHEREclauses by string-interpolating primary-key values from sampled source rows. Two concrete failure modes:"O'Brien") produced invalid SQL, caught by an emptycatch {}, and silently dropped. Users got a branch with dangling foreign keys and never knew.'; DROP TABLE x; --would execute against the source DB under whatever permissions the connection string has. Read-only in intent, not in SQL effect.Scope expansion. A specialist review during /ship surfaced the same class of bug in several other files. Rather than ship a narrow fix that left five more injection sinks on
main, the PR was expanded to close them all.Commits (bisectable):
63ef8d8— Parameterize the sampler.DatabaseAdapter.query()accepted aparamsargument in its interface but silently dropped it (_params?: unknown[]) and calledsql.unsafe(sqlStr)with no binding — the interface was a lie. Now it actually passesparamsthrough topostgres@3'ssql.unsafe(query, parameters). Three dynamic-SQL call sites insampler/referential.tsrewritten to use$1, $2, ...placeholders. 9 regression tests including theO'Briensingle-quote crash.e0a724d— Parameterize the branching layer.quoteIdent()topackages/core/src/sql/identifiers.tsas a shared module.branching/manager.ts:getBranchSample— thetableargument comes directly fromsow branch sample <branch> <table>(user argv) AND thesow_branch_sampleMCP tool (AI agent input). Now quoted viaquoteIdent; limit bound via$1after numeric clamping.branching/providers/supabase.ts:fetchAuthUserMappings— theIN (...)clause was built by single-quote-concatenating user ids from sampled source data. Now uses$1, $2, ...placeholders.branching/supabase.tsRLS setup — 8 sites interpolatedtableNameinto DDL and information_schema queries. Values go through$1binds; identifiers throughquoteIdent.branching/supabase.ts createTestAuthUsers—user.idanduser.emailwere interpolated into DELETE and INSERT statements inside single-quoted literals. All bound now.11e0615— Adversarial review follow-ups (Claude subagent).quoteIdentthrows on empty identifier and NUL byte.quoteIdentfromreferential.tsso tests import from the canonical module.getBranchSamplelimit clamping uses an explicitNumber.isFinitecheck so unset/NaN falls back to the documented default of 5 instead of silently returning zero rows.branching/supabase.tsRLS introspection hoisted outside the DDL try block to prevent a transient read error from landing in the policy-disable fallback path (a real fail-open security vector).08eebb1— Codex adversarial follow-ups.fetchAuthUserMappingsnow batches ids (1000 per query) so large full-copy snapshots don't hit Postgres's 65,535 bind-param limit, AND filters by UUID regex before binding so one non-UUID value doesn't poison the whole batch via cast error. A single batch failure no longer nukes the other batches.branching/supabase.tsRLS loop has a per-table introspection try block thatcontinues on error (fail-safe: leaves RLS enabled) rather than aborting the wholeloadIntoSupabaserun. The DDL try block still has the dev-sandbox disable-RLS-on-policy-failure fallback.getBranchSampleclamp fixed to[0, 100](was[1, 100]—LIMIT 0is legal SQL and a valid caller intent).Test Coverage
Tests: 79 → 89 (+10 new) — 7 for
quoteIdent(including throw cases for empty + NUL byte), 3 regression tests for the sampler SQL shape + params.The four untested paths all share a common shape: they use
quoteIdent/parameterized queries in the same pattern as the fully-tested sampler path, andquoteIdentitself is exhaustively tested.Pre-Landing Review
Specialist review (4 specialists):
quoteIdentto a shared module (done).e0a724dand11e0615. This is why the PR expanded from 3 files to 7.PR Quality Score: 7.0/10
Adversarial Review
Two rounds of adversarial review from both Claude and Codex:
Round 1 — Claude adversarial subagent: 15 findings. Actionable items (4) addressed in commit
11e0615.Round 2 — Codex adversarial + structured review: 5 findings. 3 actionable (batching, UUID filter, per-table scope) addressed in
08eebb1. 2 pre-existing issues documented below.Codex structured review P1 gate: PASS. No remaining P1 issues.
Scope Drift
Scope Check: CLEAN (with deliberate expansion). The PR started as a narrow sampler fix and deliberately expanded to close the entire SQL injection category after the specialist review surfaced 5 more sites of the same class. Every file changed is directly related to the security fix.
Plan Completion
This PR implements Issue #2 of the 11-issue plan from
/plan-eng-review. Issues #1 (template-DB reset), #3 (sanitizer warnings), #4 (JSONB), #5 (sow sandbox), #6 (env-patch), #7 (dynamic skip list), #8 (N+1 batching in sampler), #9 (fail-closed sanitization), #10 (Postgres type coverage), and #11 (release workflow refactor) ship as separate PRs per the plan's parallelization lanes.Known Non-Blocking (documented, deferred)
These surfaced during adversarial review but are pre-existing and not introduced by this PR:
keyStr.split("|")fragility in the formal-FK path ofsampler/referential.ts. A primary key value containing a literal|is silently mis-split and the row is dropped.fkValue.includes("null")bug in the missing-parent detection. A legitimate key containing the substring "null" is falsely treated as a NULL FK.String(parent[col] ?? "")NULL→"" coercion in the ensure-1-child-per-parent path. A NULL parent key becomes''and can match empty-string children.sanitizer/rules.ts(regex escapes) andsanitizer/transformer.ts(unused import).Test plan
bunx vitest run— 89/89 tests passing (79 pre-existing + 10 new)bunx turbo build— all 3 packages build cleanlybunx eslinton every touched file — cleanv0.1.14artifact on npmsow branch sample <branch> 'test"name'to manually verify thequoteIdentpath🤖 Generated with Claude Code