[feat] Approval boundary: pause on authored intent, not session ids#5041
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Feedback needed (the four decisions in
Also worth a skim even if you skip the reviews: @coderabbitai review |
📝 WalkthroughWalkthroughMigrates tool/agent permission configuration from ChangesPermission model migration
Estimated code review effort: 5 (Critical) | ~150 minutes Sequence Diagram(s)sequenceDiagram
participant SandboxAgent as sandbox_agent.ts
participant ACPInteractions as acp-interactions.ts
participant ApprovalResponder as ApprovalResponder
participant PermissionPlan as permission-plan.ts
participant Relay as tools/relay.ts
SandboxAgent->>ACPInteractions: attachPermissionResponder(session)
ACPInteractions->>ApprovalResponder: onPermission(gate)
ApprovalResponder->>PermissionPlan: decide(gate, plan, stored)
PermissionPlan-->>ApprovalResponder: Verdict(allow/deny/pendingApproval)
alt pendingApproval
ApprovalResponder-->>ACPInteractions: pendingApproval
ACPInteractions->>SandboxAgent: emit interaction_request, pause()
else allow or deny
ApprovalResponder-->>ACPInteractions: allow/deny
ACPInteractions->>SandboxAgent: session.respondPermission(reply)
end
Relay->>PermissionPlan: decide(gate, plan, stored)
PermissionPlan-->>Relay: Verdict
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Oops, something went wrong! Please try again later. 🐰 💔 |
|
Here, |
361eb67 to
9d5161e
Compare
| (`sdks/python/agenta/sdk/agents/adapters/claude_settings.py:201-258`) by merging four rule | ||
| sources: the author's raw Claude rules, denies derived from the sandbox permission, per-MCP- | ||
| server permissions, and per-tool permissions. The rendering is deliberately conservative: a | ||
| per-tool `allow` becomes an allow rule (Claude runs the tool with no prompt), a `deny` |
There was a problem hiding this comment.
I'm not sure I understand this part. Do we override user settings on the permissions to these ones? Do we automatically set the permissions in the cloud setting depending on the other permissions?
There was a problem hiding this comment.
Neither override nor magic: we generate the whole settings.json from the agent config. The author's raw Claude rules (knob 3) are one of four inputs; they are kept verbatim and our derived rules (sandbox denies, per-MCP, per-tool) are appended, never rewritten. Conflicts are settled by Claude's own precedence at match time (deny beats allow), so an author deny always holds. Doc updated with the merge semantics.
| 3. The runner starts a Claude session in the sandbox, writing the rendered | ||
| `.claude/settings.json` into the workspace first. | ||
| 4. Claude runs the three reads without asking (allow rules from their `read_only` hints). | ||
| Their `tool_call` and `tool_result` events stream to the frontend as they happen. |
There was a problem hiding this comment.
If I understand correctly, in this case we don't even hit the gate to the ACP. Everything is auto-improved within Claude Code, and just to get the results, no request events or whatever.
There was a problem hiding this comment.
Exactly right. The three reads are settled entirely inside Claude by the rendered allow rules; nothing reaches the ACP layer for them, no permission or interaction events exist for them anywhere; you only see their tool_call/tool_result events. Doc step 4 now states this in your words.
Generalize the explainer across harnesses (per-harness gate table, Pi has no HITL today), explain Claude default_mode/bypassPermissions and the settings merge semantics, distinguish ACP request vs stream event and the two kinds of session, explain the cold-replay resume model before the responder code, and record the live approve-loop finding. Plan updates: relay becomes a pure executor (decision before execution), client tools resolve through the same ladder defaulting to allow, M2+M3 elevated with the approve-loop as an acceptance case. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
The live approve-loop is diagnosed: a constant stream messageId plus a level-triggered resume predicate on the frontend (new finding M7), compounded by tool-name drift across ACP frames breaking the decision key (M2's observed form, not argument drift). Updated the explainer's live-warning section, reframed M2 and added M7 in the code review, settled the fix direction in the plan (direct replay of the approved call; absorb #5054's message-id fix and edge-trigger guard; supersede its resolvedName patch and loop-breaker), and recorded the #5054 recommendation in status. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
…e target path Global policy becomes four explicit modes (allow|ask|deny|allow_reads): read-only-allow is a policy choice, not a hidden per-tool default, and needs_approval is deleted from the model. 'Disposition' renamed to 'effective permission' everywhere. New 'target path' section shows the clean end-state flow; resume is redesigned to replay the approved call directly. Corrected the session-id story (the playground sends a stable per-conversation id). Added the Pi-builtins explanation (selection is Pi's only native control). Plan gains the stacked-on-#5054 baseline (keep the message-id fix and resume guard; delete resolvedName and the loop-breaker) and updated phases/deltas. Status consolidated for final review. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
… Pi settings block (round 3)
…sume redesign, wire-first phases, test plan)
…y-ask pauses (phase 2b)
…y fallbacks (phase 4b)
…ause controller (phase 5)
…perseded workspaces
…he approval prompt
- handler.py: permission_default (the SDK deleted permission_policy in phase 3) - fold(): the terminal result's stop_reason wins over the done event (the live runner emits done without stopReason, so a real pause would otherwise drop its envelope metadata) and pending_interaction carries a derived top-level tool name - sandbox_agent.ts: fold upstream keepers lost to ours-wins conflict resolution (shared apiBase module, claude-model + resolved-model log lines) - service pause test pinned to the realistic stream shape (done event with no stopReason)
- playground: a rules-only runner.permissions override no longer drops the allow_reads default from the /invoke body (nested merge + test) - runner: a client-tool pause now seeds /sessions/interactions like every other pause (pauseClientTool was the one gate that left no row) - sdk: wire.py imports ToolPermission/PermissionRule from permission_rules.py instead of redeclaring them; greppable legacy-key literals in mcp/models.py; annotate_trace comment and claude_settings docstring tell the truth - docs: MCP-server permission step added to three precedence ladders; permissions wire block documents the optional default + fallbacks; client-tool carve-out paragraph matches onClientTool's actual semantics
…bit round, and merge
…l pauses with their own kind Codex pre-merge review of the rebase: - both ACP pause sites emit a stamped COPY of the toolCall (resolvedName = the gate's stable anchor) so the Vercel egress's preferred field is populated again (upstream stamped by mutating the shared ACP object; we keep it pure) - onCreateInteraction threads kind, so client-tool rows stop masquerading as user_approval, and the Pi relay's client-tool pause now seeds a row at all - the gateway integration test drops the deleted needs_approval vocabulary
62d2f49 to
f3a666f
Compare
|
Agenta Team seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
…5059 + decisions and implementation status Claude-Session: https://claude.ai/code/session_014iPB7HL5PjgT9npyPHaFMT
Context
An agent configured to auto-approve tools still stopped at the first gated tool and waited for a human who was not there. The runner decided "pause for a human" from whether the request carried a session id, and the SDK mints one for every request, so the
autopolicy was dead code: every gate paused, headless runs died silently (HTTP 200, mid-sentence text), and a fragile resume key produced the infinite approve-loop QA hit live. This PR carries the full design workspace (docs/design/agent-workflows/projects/approval-boundary/) plus the implementation.Changes
One permission model, authored to enforced with no inference:
permission: allow | ask | denyor inherit; per agentrunner.permissions.defaultwith four modes:allow,ask,deny,allow_reads(reads run, writes ask; the default).needs_approval, thepermission_modealiases, andrunner.interactions.headlessare deleted (legacy keys in old drafts are ignored, with a warning where they were a dead knob).permissions: {default, rules?};permissionPolicyandneedsApprovalare gone. Rules and Claude settings render from one shared parse so they cannot disagree.services/runner/src/permission-plan.ts) answers both gates. The ACP responder replies allow/deny or pauses; the relay enforces only where the harness does not gate, which gives Pi real human-in-the-loop for the first time.hasHumanSurface,PolicyResponder, and the feat(agent): agent playground — turn inspector, HITL dock, and tool I/O fidelity #5054 loop-breaker are deleted.tool_callname for Claude gates, canonical args), are consumed once, and a config changed to deny beats a stale approval. Drift produces a visible fresh prompt, never a silent loop and never an auto-deny.stop_reasonand, when paused,pending_interaction: {id, tool}. Both gates seed the durable interactions plane on every pause.{type: "builtin"}entries inagent.tools); the tool editor's permission select drops the legacy fallbacks.Before:
permissionPolicy: "auto"on the wire, park-by-default at the gate. After:permissions: {default: "allow_reads", rules: [...]}, verdicts from one function, pauses only on authoredask.Scope / risk
big-agents-work;big-agentshas since absorbed it via feat(agent): big-agents-work — turn inspector, HITL hardening, tool catalog, playground UX #5058 plus one HITL-hardening commit (5555e6ed81). Audit done: those runner deltas are formatting plus patches this redesign supersedes; the egressrawInput is not Nonefix merges untouched. The base flip + rebase happens before merge (five known overlapping files, resolution documented in build-notes.md).permissionis dropped with a logged warning. Selection-time enforcement is a designed follow-up (status.md).Tests
How to QA
Prerequisites: the EE dev stack; restart the runner sidecar after checkout (it compiles on start). A project with a Composio connection (pi-agents on the dev box).
Steps:
Expected: step 2 renders exactly one Approve/Deny prompt that stays; step 3 resumes and completes with no second prompt; step 4 continues with a plain refusal, no error card; step 5 runs with no prompt. Headless: POST a batch invoke with a write tool under
allow_readsand checkstop_reason: "paused"+pending_interactionin the envelope.Automated tests:
cd services/runner && pnpm test;cd sdks/python && uv run --no-sync python -m pytest oss/tests/pytest/unit/agents -n0;cd services && uv run --no-sync python -m pytest oss/tests/pytest/unit -n0.Edge cases: approve then let the model re-issue different args (fresh visible prompt, no silent loop); a config flipped to deny after an approval (deny wins); two gates in one turn (one prompt only).
Reading order for review
README.mdthenhow-approvals-work.md(the "target path" section describes what this PR ships) thenplan.md;build-notes.mdrecords every judgment call and incident;status.mdhas the decision log and follow-ups. Inline comments mark the load-bearing code.https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2