Skip to content

docs(internal): agent-worktree path resolution bleed audit#70

Merged
TSavo merged 1 commit into
mainfrom
docs/agent-worktree-path-bleed
May 3, 2026
Merged

docs(internal): agent-worktree path resolution bleed audit#70
TSavo merged 1 commit into
mainfrom
docs/agent-worktree-path-bleed

Conversation

@TSavo
Copy link
Copy Markdown
Owner

@TSavo TSavo commented May 2, 2026

Summary

  • Root-cause analysis of the agent-worktree absolute-path resolution bleed flagged during PR feat(go): port binaryCid + targetProofCid + sourceContractCid + EvidenceTerm to Go kit #18's Go cross-impl conformance dispatch
  • Documents the failure mode: Edit/Write tool calls resolve absolute paths to the maintainer's main worktree instead of the agent's isolated git worktree
  • Reviews existing partial mitigation in captureChange.ts (commit 9d9f9d5) and notes it covers only the fix-loop C3/C6 path
  • Evaluates four structural fix options and recommends option (a): CWD invariant enforcement at the agent dispatch boundary
  • No code changes -- audit and recommendation only

Copilot AI review requested due to automatic review settings May 2, 2026 21:50
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 2, 2026

Warning

Rate limit exceeded

@TSavo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 59 minutes and 5 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 70d9fb4d-e3ba-4080-9f47-ec72ffdbcd6f

📥 Commits

Reviewing files that changed from the base of the PR and between 2bf8f8f and 5b46c31.

📒 Files selected for processing (1)
  • docs/internal/agent-worktree-path-bleed-audit.md
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/agent-worktree-path-bleed

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 59 minutes and 5 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an internal audit document analyzing an “agent worktree absolute-path resolution bleed” issue discovered during parallel agent dispatch, including the current partial mitigation and recommended structural fix.

Changes:

  • Introduces a root-cause analysis of how absolute paths can cause tools to operate on the maintainer’s main worktree instead of an agent worktree.
  • Documents observed failure modes and operational impact in parallel agent scenarios.
  • Evaluates four remediation options and recommends enforcing a worktree-root invariant at the tool-dispatch boundary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +57 to +59
Commit `9d9f9d5` (`fix(c6): tolerate Write-bypass when agent self-corrected to overlay path`) added
post-agent path enforcement in `src/fix/captureChange.ts`:

Comment on lines +36 to +50
(e.g. `/Users/tsavo/provekit/src/workflow/producers/foo.ts`), the agent's Edit/Write tools resolve
that path literally on the filesystem. The agent is executing in an isolated worktree at
`/Users/tsavo/provekit/.claude/worktrees/agent-XXXXXXXXXX/`, but the absolute path points to the
maintainer's primary clone at `/Users/tsavo/provekit/`.

**Why this happens:**

1. Task dispatch prompts are authored from the maintainer's main worktree and include absolute paths
(e.g. "Modify `src/workflow/producers/foo.ts`" becomes `file_path: "/Users/tsavo/provekit/src/workflow/producers/foo.ts"`).
2. The agent's worktree is a full clone -- the same directory hierarchy exists under a different root.
3. The LLM sees a string path, not a filesystem context. It passes it verbatim to the tool.
4. The tool implementation resolves the path against the real filesystem. `/Users/tsavo/provekit/`
is the maintainer's main worktree, not the agent's.

**Consequence:** a Write tool call against `/Users/tsavo/provekit/src/foo.ts` mutates the maintainer's
inconsistencies, option (a) surfaces the rejection to the agent. The agent sees the error, can
reason about it, and can self-correct by issuing a relative path. This matches the existing
`captureChange.ts` design principle: "This is detection, not prevention. By the time we get here
the agent has already run." The fix is to move that detection UPstream, from post-hoc to pre-tool,
Comment on lines +218 to +221
1. Resolve the tool's file_path against the filesystem (realpath).
2. Resolve the agent's worktree root (realpath of the cwd assigned during worktree creation).
3. If file_path does NOT start with worktree_root:
a. Log the full tool input for audit.
@TSavo TSavo force-pushed the docs/agent-worktree-path-bleed branch 4 times, most recently from 29cde25 to 32c55b0 Compare May 3, 2026 00:41
Root-cause analysis of the absolute-path resolution bleed where
Edit/Write tool calls from agents in isolated git worktrees resolve
absolute paths to the maintainer's main worktree instead of the
agent's worktree. Recommends CWD invariant enforcement at the agent
dispatch boundary (option a) as the structural fix.

Co-Authored-By: Claude DeepSeek v4 Pro <noreply@anthropic.com>
@TSavo TSavo force-pushed the docs/agent-worktree-path-bleed branch from 32c55b0 to 5b46c31 Compare May 3, 2026 06:35
@TSavo TSavo merged commit 6629e38 into main May 3, 2026
3 of 4 checks passed
@TSavo TSavo deleted the docs/agent-worktree-path-bleed branch May 3, 2026 06:38
TSavo added a commit that referenced this pull request May 14, 2026
3 mint_kit_integration tests have been quietly red on main because the
strict CI=1 path requires built self-contract binaries that the local
'cargo test' invocation does not produce. The tests were skipped silently
on dev boxes (missing binaries -> empty-set CID -> false skip) and not
gated in CI.

Fix: add a 'test-mint-kit-integration-pins' make target and wire it into
'make conformance' AFTER 'all-mint' (which builds and mints the rust/cpp/zig
self-contract binaries), running with CI=1 so strict mode is active.

3 tests fixed:
- kits_with_real_contracts_produce_nonempty_contract_set
- rust_kit_contract_set_cid_is_pinned_to_self_contracts_canonical
  Pinned CID verified: 3b41145b...
- cpp_kit_contract_set_cid_is_pinned_to_self_contracts_canonical
  Pinned CID verified: 0e17f718...

Verification:
- Reproduced strict failures with CI=1 before binaries were built.
- Built release Rust workspace + cpp self-contract binary + zig self-contract.
- CI=1 strict tests for the 3 fixed tests all passed.
- make -n test-mint-kit-integration-pins confirms the new gate runs after all-mint.
- Em-dash + en-dash sweep: clean.

Note: full 'mint_kit_integration' release run could not complete locally
(network blocked for tools/recompute-spec-cids crates.io fetch); CI will
have full network. The 3 named tests pass.

Audit doc updated: docs/audits/2026-05-12-mint-kit-integration-failures.md.

Closes #70.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@TSavo
Copy link
Copy Markdown
Owner Author

TSavo commented May 14, 2026

Closed via #949.

TSavo added a commit that referenced this pull request May 16, 2026
…closes A21) (#1090)

Three cleanups per the 2026-05-16 post-merge audit finding A21:

- site_cid: remove file from hash input. Call-site identity is function-relative
  (function, name, termShapeCid); file path is provenance, not bind-tier.
- NamedTerm::file: add skip_serializing_if = "String::is_empty" so empty defaults
  no longer leak into named-term JSON serialization.
- BindLiftEntry::fn_line: kept for legacy nonzero producers, but added
  default-zero serialization elision so empty defaults no longer leak.

All three closures complete the cascade started by 610abaf (which made
the struct fields optional) and a70a13d (which removed the file key emission
from walk_rpc.rs). Regression tests added for all three behaviors.

Pre-existing provekit-bridgeworks smoke red on origin/main reproduces with
this patch reverted; not introduced by this commit (tracked at task #70).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TSavo added a commit that referenced this pull request May 16, 2026
…ses A19 + A22 on Rust side) (#1091)

Two related findings from the 2026-05-16 post-merge audit:

A19: lift envelope still emitted attr_pre/attr_post/concept_annotation/fn_name
which got hashed into bind CID via address(payload). Federation broke for
any function with #[provekit_contract(pre=)] attributes because Rust extracted
literal attribute text while Python extracted differently-formatted comment
text. Strips these from the json! emission sites in walk_rpc.rs.

A22: syn::UnOp::Not collapsed bool-logical-not and integer-bitwise-not to
concept:not, hiding distinct algebras. Now routes by operand type: bool to
concept:not, integer types to concept:bitnot. Matches Python's
not/~ distinction.

11/11 walk_rpc bind_lift tests pass. Pre-existing provekit-bridgeworks smoke
red on origin/main is not introduced by this commit (tracked at task #70).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TSavo added a commit that referenced this pull request May 16, 2026
…loses #1093) (#1097)

A21 stripped source-location residuals from the bind payload; this completes
the envelope-tier strip by removing the function name field from the hashed
bytes. Federation now holds across languages with diverging function naming
conventions (e.g., camelCase vs snake_case).

Per the canonical-form ruling: function names are surface syntax, not
algebraic. Bind CID for the same algebra must be byte-identical regardless
of source function name.

Implementation:
- BindLiftEntry::fn_name and NamedTerm::function get skip-if-empty handling
  so the hashed bytes contain no function-name residual
- realize_function_name helper in lower_plugin.rs provides an in-place
  fallback to term.name when term.function is empty. NOT a sidecar channel
  (that is A25 follow-up); just a bridge so realize emission doesn't break
  during the transition window before A25 introduces source_function_name

Red-before / green-after regression test added (add vs adder produces
byte-identical bind CIDs). Pre-existing bridgeworks smoke red on origin/main
is task #70, not introduced by this commit.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TSavo added a commit that referenced this pull request May 16, 2026
…1095, #1096) (#1098)

Implements the locked β-with-integer-array-paths schema per
docs/plans/2026-05-16-operand-binding-sidecar-schema.md, closing both A24
(operand-binding-from-context derivation) and A25 (function-name non-hashed
sidecar channel) in one combined PR.

Both lift kits (Rust walk_rpc.rs + Python bind_lifter.py) now emit:
- operand_bindings: flat list of {position: [int...], symbol: str} tuples
  walked from term_shape root, recording the source symbol at each operand
  slot during AST traversal
- source_function_name: optional string carrying the source function name

Both fields live on the realize request, NOT in the bind payload. They
flow through libprovekit's lower_plugin.rs as the realize sidecar channel.
The bind CID hash strips these fields via strip_realize_sidecar_from_lift_term
before computing the address, so federation byte-identity at the algebra
layer is preserved regardless of operand naming or function-name conventions.

Realize-side: provekit-realize-python-core consumes operand_bindings,
pre-processes into a path-to-symbol map, and synthesizes function bodies
from term_shape using the map. Completeness gate (per Supra omnia rectum)
asserts every term_shape leaf has a binding entry and every binding entry's
path resolves to a leaf; refuses synthesis on misalignment with a clear
diagnostic.

Verification:
- python realize tests: 39 passed
- python lift tests: 39 passed
- Trinity composition census slow lane: 9 passed (all seams, including
  seam 3 positive with CORRECT operand bindings)
- Bridgeworks targeted smoke: passes (incidental fix to witness lowering
  via .provekit/lower/<surface> RPC manifest dispatch, addressing the
  pre-existing red from task #70)

Rebased against A23 (#1097). bind.rs and lower_plugin.rs combine both
A23's function-name strip + A24+A25's sidecar threading; both helper
function families coexist (A23's strip_bind_payload_source_function;
A24+A25's strip_realize_sidecar_from_lift_term + merge_realize_sidecar).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TSavo added a commit that referenced this pull request May 17, 2026
…-correctness landing

Updates the empirical artifact to reflect:
- A23 #1097 merged: libprovekit function-field strip from bind payload hash;
  federation byte-identity across diverging function naming conventions
- A24+A25 #1098 combined merged: operand-binding sidecar channel + function-name
  sidecar channel both implemented per the locked β-with-integer-array-paths
  schema. Both lift kits emit non-hashed sidecar fields flowing through lower
  request to realize plugin. Realize completeness gate (Supra omnia rectum)
  refuses synthesis on operand-binding misalignment.
- Trinity census slow lane 9/9 GREEN with CORRECT operand bindings (no longer
  positional fallback); behavioral-correctness layer fully operational on main
- Bridgeworks task #70 incidentally closed via .provekit/lower/<surface> RPC
  manifest dispatch fix bundled with A24+A25
- Adds operand-binding sidecar schema ruling to the codified architect rulings list

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TSavo added a commit that referenced this pull request May 17, 2026
Adds 5 Python fixtures (hello_world, recursive factorial, arithmetic,
control flow, transported concept citation comment) under
implementations/python/conformance/fixtures, plus a CI slow lane that
runs real LiftKit → BindKit → LowerKit, invokes py_compile then python3,
and asserts behavior equivalence to the original source.

Refusal paths verified: target-compile-failure and target-behavior-divergence
both emit CompositionRefusalMemento with the expected failure_kind.

Extended Python lift/realize support for leaf-only returns, `pass` as
concept:skip, ternary conditionals, and call callee slots — surfaced
naturally by the fixtures.

43 python realize tests pass, 39 python lift tests pass, 2 cargo slow-lane
conformance tests pass. Pre-existing bug-zoo TypeScript red on local
default lane is sandbox-environmental (tsx unavailable), task #70 not
introduced by this PR.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants