docs(internal): agent-worktree path resolution bleed audit#70
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 59 minutes and 5 seconds.Comment |
There was a problem hiding this comment.
Pull request overview
Adds an internal audit document analyzing an “agent worktree absolute-path resolution bleed” issue discovered during parallel agent dispatch, including the current partial mitigation and recommended structural fix.
Changes:
- Introduces a root-cause analysis of how absolute paths can cause tools to operate on the maintainer’s main worktree instead of an agent worktree.
- Documents observed failure modes and operational impact in parallel agent scenarios.
- Evaluates four remediation options and recommends enforcing a worktree-root invariant at the tool-dispatch boundary.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Commit `9d9f9d5` (`fix(c6): tolerate Write-bypass when agent self-corrected to overlay path`) added | ||
| post-agent path enforcement in `src/fix/captureChange.ts`: | ||
|
|
| (e.g. `/Users/tsavo/provekit/src/workflow/producers/foo.ts`), the agent's Edit/Write tools resolve | ||
| that path literally on the filesystem. The agent is executing in an isolated worktree at | ||
| `/Users/tsavo/provekit/.claude/worktrees/agent-XXXXXXXXXX/`, but the absolute path points to the | ||
| maintainer's primary clone at `/Users/tsavo/provekit/`. | ||
|
|
||
| **Why this happens:** | ||
|
|
||
| 1. Task dispatch prompts are authored from the maintainer's main worktree and include absolute paths | ||
| (e.g. "Modify `src/workflow/producers/foo.ts`" becomes `file_path: "/Users/tsavo/provekit/src/workflow/producers/foo.ts"`). | ||
| 2. The agent's worktree is a full clone -- the same directory hierarchy exists under a different root. | ||
| 3. The LLM sees a string path, not a filesystem context. It passes it verbatim to the tool. | ||
| 4. The tool implementation resolves the path against the real filesystem. `/Users/tsavo/provekit/` | ||
| is the maintainer's main worktree, not the agent's. | ||
|
|
||
| **Consequence:** a Write tool call against `/Users/tsavo/provekit/src/foo.ts` mutates the maintainer's |
| inconsistencies, option (a) surfaces the rejection to the agent. The agent sees the error, can | ||
| reason about it, and can self-correct by issuing a relative path. This matches the existing | ||
| `captureChange.ts` design principle: "This is detection, not prevention. By the time we get here | ||
| the agent has already run." The fix is to move that detection UPstream, from post-hoc to pre-tool, |
| 1. Resolve the tool's file_path against the filesystem (realpath). | ||
| 2. Resolve the agent's worktree root (realpath of the cwd assigned during worktree creation). | ||
| 3. If file_path does NOT start with worktree_root: | ||
| a. Log the full tool input for audit. |
29cde25 to
32c55b0
Compare
Root-cause analysis of the absolute-path resolution bleed where Edit/Write tool calls from agents in isolated git worktrees resolve absolute paths to the maintainer's main worktree instead of the agent's worktree. Recommends CWD invariant enforcement at the agent dispatch boundary (option a) as the structural fix. Co-Authored-By: Claude DeepSeek v4 Pro <noreply@anthropic.com>
32c55b0 to
5b46c31
Compare
3 mint_kit_integration tests have been quietly red on main because the strict CI=1 path requires built self-contract binaries that the local 'cargo test' invocation does not produce. The tests were skipped silently on dev boxes (missing binaries -> empty-set CID -> false skip) and not gated in CI. Fix: add a 'test-mint-kit-integration-pins' make target and wire it into 'make conformance' AFTER 'all-mint' (which builds and mints the rust/cpp/zig self-contract binaries), running with CI=1 so strict mode is active. 3 tests fixed: - kits_with_real_contracts_produce_nonempty_contract_set - rust_kit_contract_set_cid_is_pinned_to_self_contracts_canonical Pinned CID verified: 3b41145b... - cpp_kit_contract_set_cid_is_pinned_to_self_contracts_canonical Pinned CID verified: 0e17f718... Verification: - Reproduced strict failures with CI=1 before binaries were built. - Built release Rust workspace + cpp self-contract binary + zig self-contract. - CI=1 strict tests for the 3 fixed tests all passed. - make -n test-mint-kit-integration-pins confirms the new gate runs after all-mint. - Em-dash + en-dash sweep: clean. Note: full 'mint_kit_integration' release run could not complete locally (network blocked for tools/recompute-spec-cids crates.io fetch); CI will have full network. The 3 named tests pass. Audit doc updated: docs/audits/2026-05-12-mint-kit-integration-failures.md. Closes #70. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Closed via #949. |
…closes A21) (#1090) Three cleanups per the 2026-05-16 post-merge audit finding A21: - site_cid: remove file from hash input. Call-site identity is function-relative (function, name, termShapeCid); file path is provenance, not bind-tier. - NamedTerm::file: add skip_serializing_if = "String::is_empty" so empty defaults no longer leak into named-term JSON serialization. - BindLiftEntry::fn_line: kept for legacy nonzero producers, but added default-zero serialization elision so empty defaults no longer leak. All three closures complete the cascade started by 610abaf (which made the struct fields optional) and a70a13d (which removed the file key emission from walk_rpc.rs). Regression tests added for all three behaviors. Pre-existing provekit-bridgeworks smoke red on origin/main reproduces with this patch reverted; not introduced by this commit (tracked at task #70). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ses A19 + A22 on Rust side) (#1091) Two related findings from the 2026-05-16 post-merge audit: A19: lift envelope still emitted attr_pre/attr_post/concept_annotation/fn_name which got hashed into bind CID via address(payload). Federation broke for any function with #[provekit_contract(pre=)] attributes because Rust extracted literal attribute text while Python extracted differently-formatted comment text. Strips these from the json! emission sites in walk_rpc.rs. A22: syn::UnOp::Not collapsed bool-logical-not and integer-bitwise-not to concept:not, hiding distinct algebras. Now routes by operand type: bool to concept:not, integer types to concept:bitnot. Matches Python's not/~ distinction. 11/11 walk_rpc bind_lift tests pass. Pre-existing provekit-bridgeworks smoke red on origin/main is not introduced by this commit (tracked at task #70). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…loses #1093) (#1097) A21 stripped source-location residuals from the bind payload; this completes the envelope-tier strip by removing the function name field from the hashed bytes. Federation now holds across languages with diverging function naming conventions (e.g., camelCase vs snake_case). Per the canonical-form ruling: function names are surface syntax, not algebraic. Bind CID for the same algebra must be byte-identical regardless of source function name. Implementation: - BindLiftEntry::fn_name and NamedTerm::function get skip-if-empty handling so the hashed bytes contain no function-name residual - realize_function_name helper in lower_plugin.rs provides an in-place fallback to term.name when term.function is empty. NOT a sidecar channel (that is A25 follow-up); just a bridge so realize emission doesn't break during the transition window before A25 introduces source_function_name Red-before / green-after regression test added (add vs adder produces byte-identical bind CIDs). Pre-existing bridgeworks smoke red on origin/main is task #70, not introduced by this commit. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1095, #1096) (#1098) Implements the locked β-with-integer-array-paths schema per docs/plans/2026-05-16-operand-binding-sidecar-schema.md, closing both A24 (operand-binding-from-context derivation) and A25 (function-name non-hashed sidecar channel) in one combined PR. Both lift kits (Rust walk_rpc.rs + Python bind_lifter.py) now emit: - operand_bindings: flat list of {position: [int...], symbol: str} tuples walked from term_shape root, recording the source symbol at each operand slot during AST traversal - source_function_name: optional string carrying the source function name Both fields live on the realize request, NOT in the bind payload. They flow through libprovekit's lower_plugin.rs as the realize sidecar channel. The bind CID hash strips these fields via strip_realize_sidecar_from_lift_term before computing the address, so federation byte-identity at the algebra layer is preserved regardless of operand naming or function-name conventions. Realize-side: provekit-realize-python-core consumes operand_bindings, pre-processes into a path-to-symbol map, and synthesizes function bodies from term_shape using the map. Completeness gate (per Supra omnia rectum) asserts every term_shape leaf has a binding entry and every binding entry's path resolves to a leaf; refuses synthesis on misalignment with a clear diagnostic. Verification: - python realize tests: 39 passed - python lift tests: 39 passed - Trinity composition census slow lane: 9 passed (all seams, including seam 3 positive with CORRECT operand bindings) - Bridgeworks targeted smoke: passes (incidental fix to witness lowering via .provekit/lower/<surface> RPC manifest dispatch, addressing the pre-existing red from task #70) Rebased against A23 (#1097). bind.rs and lower_plugin.rs combine both A23's function-name strip + A24+A25's sidecar threading; both helper function families coexist (A23's strip_bind_payload_source_function; A24+A25's strip_realize_sidecar_from_lift_term + merge_realize_sidecar). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-correctness landing Updates the empirical artifact to reflect: - A23 #1097 merged: libprovekit function-field strip from bind payload hash; federation byte-identity across diverging function naming conventions - A24+A25 #1098 combined merged: operand-binding sidecar channel + function-name sidecar channel both implemented per the locked β-with-integer-array-paths schema. Both lift kits emit non-hashed sidecar fields flowing through lower request to realize plugin. Realize completeness gate (Supra omnia rectum) refuses synthesis on operand-binding misalignment. - Trinity census slow lane 9/9 GREEN with CORRECT operand bindings (no longer positional fallback); behavioral-correctness layer fully operational on main - Bridgeworks task #70 incidentally closed via .provekit/lower/<surface> RPC manifest dispatch fix bundled with A24+A25 - Adds operand-binding sidecar schema ruling to the codified architect rulings list Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 5 Python fixtures (hello_world, recursive factorial, arithmetic, control flow, transported concept citation comment) under implementations/python/conformance/fixtures, plus a CI slow lane that runs real LiftKit → BindKit → LowerKit, invokes py_compile then python3, and asserts behavior equivalence to the original source. Refusal paths verified: target-compile-failure and target-behavior-divergence both emit CompositionRefusalMemento with the expected failure_kind. Extended Python lift/realize support for leaf-only returns, `pass` as concept:skip, ternary conditionals, and call callee slots — surfaced naturally by the fixtures. 43 python realize tests pass, 39 python lift tests pass, 2 cargo slow-lane conformance tests pass. Pre-existing bug-zoo TypeScript red on local default lane is sandbox-environmental (tsx unavailable), task #70 not introduced by this PR. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
captureChange.ts(commit9d9f9d5) and notes it covers only the fix-loop C3/C6 path