
[feat] Approval boundary: pause on authored intent, not session ids #5041

Merged
mmabrouk merged 24 commits into big-agents from docs/approval-boundary on Jul 4, 2026

mmabrouk (Member) commented Jul 2, 2026

Context

An agent configured to auto-approve tools still stopped at the first gated tool and waited for a human who was not there. The runner decided "pause for a human" from whether the request carried a session id, and the SDK mints one for every request, so the auto policy was dead code: every gate paused, headless runs died silently (HTTP 200, mid-sentence text), and a fragile resume key produced the infinite approve-loop QA hit live. This PR carries the full design workspace (docs/design/agent-workflows/projects/approval-boundary/) plus the implementation.

Changes

One permission model, authored to enforced with no inference:

  • Authored: per tool permission: allow | ask | deny or inherit; per agent runner.permissions.default with four modes: allow, ask, deny, allow_reads (reads run, writes ask; the default). needs_approval, the permission_mode aliases, and runner.interactions.headless are deleted (legacy keys in old drafts are ignored, with a warning where they were a dead knob).
  • Wire: the run request carries permissions: {default, rules?}; permissionPolicy and needsApproval are gone. Rules and Claude settings render from one shared parse so they cannot disagree.
  • Enforced: one decision module (services/runner/src/permission-plan.ts) answers both gates. The ACP responder replies allow/deny or pauses; the relay enforces only where the harness does not gate, which gives Pi real human-in-the-loop for the first time. hasHumanSurface, PolicyResponder, and the feat(agent): agent playground — turn inspector, HITL dock, and tool I/O fidelity #5054 loop-breaker are deleted.
  • Resume: stored decisions match on stable anchors (spec name for relay tools, recorded tool_call name for Claude gates, canonical args), are consumed once, and a config changed to deny beats a stale approval. Drift produces a visible fresh prompt, never a silent loop and never an auto-deny.
  • Events mean actions: the approval event fires only when a run actually pauses (single per-turn latch), and a paused call's teardown frames are suppressed so nothing clobbers the prompt (QA-found bug, fixed with regression tests).
  • Visibility: batch responses carry stop_reason and, when paused, pending_interaction: {id, tool}. Both gates seed the durable interactions plane on every pause.
  • Frontend: the policy select shows the four modes for every harness including Pi; a Pi settings block selects builtins (written as {type: "builtin"} entries in agent.tools); the tool editor's permission select drops the legacy fallbacks.
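The "Events mean actions" bullet can be pictured with a small latch. This is a minimal sketch, not the runner's code: `ApprovalLatch`, `events_for_turn`, and the tool names are hypothetical, and it assumes each gate in a turn has already been resolved to a verdict. Only `interaction_request` is taken from this PR's vocabulary.

```python
from typing import List, Tuple


class ApprovalLatch:
    """Hypothetical per-turn latch: fires at most once."""

    def __init__(self) -> None:
        self._fired = False

    def try_fire(self) -> bool:
        # Return True only for the first caller in this turn.
        if self._fired:
            return False
        self._fired = True
        return True


def events_for_turn(gates: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Emit one approval event per turn, and only for a gate that pauses.

    `gates` is a list of (tool_name, verdict) pairs; verdicts are
    "allow", "ask", or "deny". Allowed and denied gates emit nothing.
    """
    latch = ApprovalLatch()
    events: List[Tuple[str, str]] = []
    for tool_name, verdict in gates:
        if verdict == "ask" and latch.try_fire():
            events.append(("interaction_request", tool_name))
    return events
```

With two `ask` gates in one turn, only the first produces an event, which matches the "two gates in one turn (one prompt only)" edge case in the QA notes.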

Before: permissionPolicy: "auto" on the wire, park-by-default at the gate. After: permissions: {default: "allow_reads", rules: [...]}, verdicts from one function, pauses only on authored ask.
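As a rough illustration of "verdicts from one function", the precedence described above (explicit per-tool permission first, then the agent default, with `allow_reads` splitting on a read-only hint) might look like the sketch below. The function name and signature are assumptions made for this example, not the SDK's `effective_permission` API, and the MCP-server step of the real ladder is left out.

```python
from typing import Literal

ToolPermission = Literal["allow", "ask", "deny", "inherit"]
DefaultMode = Literal["allow", "ask", "deny", "allow_reads"]
Verdict = Literal["allow", "ask", "deny"]


def resolve_verdict(
    tool_permission: ToolPermission,
    default_mode: DefaultMode,
    read_only: bool,
) -> Verdict:
    """Hypothetical resolver: an explicit per-tool value beats the default."""
    if tool_permission != "inherit":
        return tool_permission
    if default_mode == "allow_reads":
        # Reads run, writes ask.
        return "allow" if read_only else "ask"
    return default_mode
```

Under this sketch, a read tool explicitly set to `ask` still pauses under `allow_reads`, and an inherited write tool runs without a prompt only when the default is `allow`.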

Scope / risk

  • Stacked on big-agents-work; big-agents has since absorbed it via feat(agent): big-agents-work — turn inspector, HITL hardening, tool catalog, playground UX #5058 plus one HITL-hardening commit (5555e6ed81). Audit done: those runner deltas are formatting plus patches this redesign supersedes; the egress rawInput is not None fix merges untouched. The base flip + rebase happens before merge (five known overlapping files, resolution documented in build-notes.md).
  • Pi native builtins are still governed by selection only; a per-builtin permission is dropped with a logged warning. Selection-time enforcement is a designed follow-up (status.md).
  • Claude-harness live runs were blocked by account credit; that path is covered by unit and settings-rendering tests and shares the Gate-2 machinery verified live on Pi.
  • Deny wording polish: a user deny currently reads "denied by the permission policy".

Tests

  • Runner 444 (vitest), SDK agents 480+ (pytest), services 49 (pytest); typecheck green.
  • A 40-case cross-language parity fixture pins the Python and TS resolvers to identical verdicts (zero disagreements found).
  • Regression pins for F-024/F-036/F-040/F-046, the M2/M7 loop, consume-once, the latch, and the pause-clobber fix.
  • Live QA: headless matrix 7/7 (all four modes, explicit-beats-policy both ways, paused envelope, legacy tolerance); playground: one prompt, Approve resumes with real tool output and zero re-prompts, Deny refuses cleanly.

How to QA

Prerequisites: the EE dev stack; restart the runner sidecar after checkout (it compiles on start). A project with a Composio connection (pi-agents on the dev box).

Steps:

  1. Open an agent app, Pi harness, cheap model. In Advanced > Permissions set policy "Allow reads". Add a Composio read tool (GitHub "Get the authenticated user") and set its Permission to "Ask".
  2. Chat: "You must call the GitHub get authenticated user tool now."
  3. Approve the prompt.
  4. Repeat and Deny.
  5. Set the tool to Inherit and policy to "Allow all"; repeat.

Expected: step 2 renders exactly one Approve/Deny prompt that stays; step 3 resumes and completes with no second prompt; step 4 continues with a plain refusal, no error card; step 5 runs with no prompt. Headless: POST a batch invoke with a write tool under allow_reads and check stop_reason: "paused" + pending_interaction in the envelope.
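For the headless check, a small helper can assert the paused envelope shape. This sketch assumes `stop_reason` and `pending_interaction` are top-level keys of the decoded response body; the helper name and the sample values are hypothetical, and the HTTP call itself is omitted because the endpoint is not given here.

```python
from typing import Any, Dict, Optional


def pending_interaction(envelope: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the pending interaction if the run paused, else None.

    Raises ValueError when the envelope says "paused" but does not
    carry a pending_interaction with both `id` and `tool`.
    """
    if envelope.get("stop_reason") != "paused":
        return None
    pending = envelope.get("pending_interaction")
    if not isinstance(pending, dict) or "id" not in pending or "tool" not in pending:
        raise ValueError("paused envelope is missing pending_interaction {id, tool}")
    return pending
```

A paused run should yield a dict with `id` and `tool`; any other `stop_reason` yields `None`.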

Automated tests: cd services/runner && pnpm test; cd sdks/python && uv run --no-sync python -m pytest oss/tests/pytest/unit/agents -n0; cd services && uv run --no-sync python -m pytest oss/tests/pytest/unit -n0.

Edge cases: approve then let the model re-issue different args (fresh visible prompt, no silent loop); a config flipped to deny after an approval (deny wins); two gates in one turn (one prompt only).
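The first two edge cases follow from the resume rules in the description: decisions are keyed on a stable anchor, consumed once, and overridden by a config deny. The sketch below is an assumption-laden illustration, not the runner's `ConversationDecisions`: `anchor_key`, `StoredDecisions`, and `decide` are hypothetical names, and sorted-key JSON stands in for whatever canonicalization the real code uses.

```python
import json
from typing import Any, Dict, Optional


def anchor_key(tool_name: str, args: Dict[str, Any]) -> str:
    # Assumed canonical form: tool name plus JSON with sorted keys.
    canonical_args = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return f"{tool_name}:{canonical_args}"


class StoredDecisions:
    """Hypothetical store of human decisions, consumed at most once."""

    def __init__(self) -> None:
        self._by_anchor: Dict[str, str] = {}

    def record(self, tool_name: str, args: Dict[str, Any], decision: str) -> None:
        self._by_anchor[anchor_key(tool_name, args)] = decision

    def consume(self, tool_name: str, args: Dict[str, Any]) -> Optional[str]:
        # pop() removes the entry, so a decision cannot be replayed twice.
        return self._by_anchor.pop(anchor_key(tool_name, args), None)


def decide(
    tool_name: str,
    args: Dict[str, Any],
    configured: str,
    stored: StoredDecisions,
) -> str:
    """Return "allow", "deny", or "ask" for one gated call."""
    if configured in ("allow", "deny"):
        # Current config wins; a config deny beats a stale approval.
        return configured
    previous = stored.consume(tool_name, args)
    # No matching anchor (drift) means a fresh prompt, never an auto-deny.
    return previous if previous in ("allow", "deny") else "ask"
```

In this sketch an approval for one argument set does not carry over to a re-issued call with different arguments; that call resolves to `ask` and surfaces a new prompt.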

Reading order for review

README.md then how-approvals-work.md (the "target path" section describes what this PR ships) then plan.md; build-notes.md records every judgment call and incident; status.md has the decision log and follow-ups. Inline comments mark the load-bearing code.

https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2

vercel Bot commented Jul 2, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Jul 4, 2026 11:56am |

mmabrouk added the needs-review label (Agent updated; awaiting Mahmoud's review) Jul 2, 2026
mmabrouk (Member, Author) commented Jul 2, 2026

Feedback needed (the four decisions in status.md, each with stakes):

  1. Fix shape: one-shot Option D (one resolved permission plan, SDK computes, both runner gates enforce) vs staged B+C first. The plan and the Codex second opinion both recommend D; the staged path ships sooner but temporarily auto-approves explicit ask rules on Claude builtins.
  2. Naming: approve permissions.default with vocabulary allow | ask | deny replacing runner.interactions.headless / permission_policy: auto|deny.
  3. Pi relay-ask scope: if relay parking proves heavy, accept a documented Pi-only collapse for the first slice?
  4. Batch paused-response shape: only "visible + carries the interaction reference" is required from this side; exact fields coordinate with the streaming-invoke workspace.

Also worth a skim even if you skip the reviews: how-approvals-work.md (the flow explainer this whole workspace hangs on) and the "One expectation to reset" bullet in the README (the reproducing agent still pauses at SEND_MESSAGE under default config, by authored design; the fix makes that pause authored, visible, and answerable).

@coderabbitai review

coderabbitai Bot commented Jul 2, 2026

Walkthrough

Migrates tool/agent permission configuration from permission_policy/needs_approval to a unified runner.permissions.default and per-tool permission model. Introduces a shared runner decision engine (permission-plan.ts, effective_permission) replacing park-based HITL with pause/ApprovalResponder. Updates Python SDK, runner, agent service, frontend, and extensive design docs.

Changes

Permission model migration

| Layer / File(s) | Summary |
| --- | --- |
| Design documentation: `docs/design/agent-workflows/...`, `docs/designs/sessions/...` | Rewrites protocol, harness-adapter, and tool-model docs for runner.permissions.default/permissions, adds approval-boundary and build-kit-cleanup project docs, marks capability-config/hitl-fix docs superseded. |
| Python SDK permission model & harness adapters: `sdks/python/agenta/sdk/agents/dtos.py`, `tools/models.py`, `wire_models.py`, `permission_rules.py`, `mcp/models.py`, `adapters/claude_settings.py`, `adapters/harnesses.py`, `platform/*`, `utils/types.py`, `__init__.py` | Introduces PermissionMode/effective_permission, replaces permission_policy/needs_approval fields, rewires Claude/Pi templates and _parse_run_selection to permission_default. |
| Python SDK tests: `sdks/python/oss/tests/pytest/unit/agents/...` | Updates/adds tests for Claude settings rules, tool spec permission derivation, op catalog, wire-contract golden fixtures, and cross-language permission parity. |
| Agent service batch/pause handling: `services/oss/src/agent/app.py`, tests | Wires permission_default into SessionConfig, tracks interaction_request, surfaces stop_reason/pending_interaction. |
| Runner permission engine & pause wiring: `services/runner/src/permission-plan.ts`, `protocol.ts`, `responder.ts`, `engines/sandbox_agent*`, `tools/relay.ts`, `tools/dispatch.ts` | Adds shared decision engine, ApprovalResponder/ConversationDecisions, replaces park mechanism with PendingApprovalPauseController/acp-interactions.ts. |
| Runner tests: `services/runner/tests/unit/...` | Adds/updates tests for permission-plan, responder, ACP interactions, orchestration, relay, and wire-contract. |
| Frontend permission controls: `web/oss/...`, `web/packages/agenta-entity-ui/...`, `web/packages/agenta-playground/...` | Updates defaults to runner.permissions.default, adds PiSettingsControl, refactors ToolFormView and useModelHarness permission UI. |

Estimated code review effort: 5 (Critical) | ~150 minutes

Sequence Diagram(s)

sequenceDiagram
  participant SandboxAgent as sandbox_agent.ts
  participant ACPInteractions as acp-interactions.ts
  participant ApprovalResponder as ApprovalResponder
  participant PermissionPlan as permission-plan.ts
  participant Relay as tools/relay.ts

  SandboxAgent->>ACPInteractions: attachPermissionResponder(session)
  ACPInteractions->>ApprovalResponder: onPermission(gate)
  ApprovalResponder->>PermissionPlan: decide(gate, plan, stored)
  PermissionPlan-->>ApprovalResponder: Verdict(allow/deny/pendingApproval)
  alt pendingApproval
    ApprovalResponder-->>ACPInteractions: pendingApproval
    ACPInteractions->>SandboxAgent: emit interaction_request, pause()
  else allow or deny
    ApprovalResponder-->>ACPInteractions: allow/deny
    ACPInteractions->>SandboxAgent: session.respondPermission(reply)
  end
  Relay->>PermissionPlan: decide(gate, plan, stored)
  PermissionPlan-->>Relay: Verdict

Possibly related PRs

  • Agenta-AI/agenta#4814: Overlaps in harness adapter wiring that removes legacy permission_policy forwarding and propagates the new permissions shape.
  • Agenta-AI/agenta#4848: Both touch the HITL permission/approval gating flow and related sandbox-agent/ACP wiring at the same responder boundary.
  • Agenta-AI/agenta#4888: Both modify the Claude settings renderer/permissions handling for the internal agenta-tools MCP server.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly summarizes the main change: replacing session-id-based pauses with authored-intent approval handling. |
| Description check | ✅ Passed | The description is detailed and directly matches the documented permission-model, runner, and UI changes in the PR. |


jp-agenta (Member) commented Jul 2, 2026

Here, permissions pertain to the runner, so instead of permissions.default it should at least be runner.permissions.default. Also, permissions.default sounds rather general for "just" interactions. Unless we foresee no other use of permissions in the runner, let's keep them under interactions: either runner.interactions.permissions.x or runner.permissions.interactions.x depending on what is being modelled and how.

Comment thread docs/design/agent-workflows/projects/approval-boundary/how-approvals-work.md Outdated
(`sdks/python/agenta/sdk/agents/adapters/claude_settings.py:201-258`) by merging four rule
sources: the author's raw Claude rules, denies derived from the sandbox permission, per-MCP-
server permissions, and per-tool permissions. The rendering is deliberately conservative: a
per-tool `allow` becomes an allow rule (Claude runs the tool with no prompt), a `deny`

Member Author

I'm not sure I understand this part. Do we override the user's permission settings with these? Or do we automatically set the permissions in the Claude settings based on the other permissions?

Member Author

Neither override nor magic: we generate the whole settings.json from the agent config. The author's raw Claude rules (knob 3) are one of four inputs; they are kept verbatim and our derived rules (sandbox denies, per-MCP, per-tool) are appended, never rewritten. Conflicts are settled by Claude's own precedence at match time (deny beats allow), so an author deny always holds. Doc updated with the merge semantics.
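The merge semantics in this reply can be sketched as follows. This is a simplified illustration under stated assumptions: rules are compared by exact string match (real Claude rules have their own pattern syntax), only the `allow` and `deny` buckets are modelled, and the function names and rule strings are hypothetical rather than taken from `claude_settings.py`.

```python
from typing import Dict, List

RuleSet = Dict[str, List[str]]


def merge_permission_rules(author_rules: RuleSet, derived_rules: RuleSet) -> RuleSet:
    """Keep the author's rules verbatim and append the derived ones."""
    return {
        bucket: list(author_rules.get(bucket, [])) + list(derived_rules.get(bucket, []))
        for bucket in ("allow", "deny")
    }


def is_allowed(rules: RuleSet, rule: str) -> bool:
    """Evaluate with deny-beats-allow precedence (exact match only)."""
    if rule in rules["deny"]:
        return False
    return rule in rules["allow"]
```

Because nothing is rewritten and deny is checked first, an author deny still holds even when a derived rule would allow the same tool.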

Comment thread docs/design/agent-workflows/projects/approval-boundary/how-approvals-work.md Outdated
Comment thread docs/design/agent-workflows/projects/approval-boundary/how-approvals-work.md Outdated
3. The runner starts a Claude session in the sandbox, writing the rendered
`.claude/settings.json` into the workspace first.
4. Claude runs the three reads without asking (allow rules from their `read_only` hints).
Their `tool_call` and `tool_result` events stream to the frontend as they happen.

Member Author

If I understand correctly, in this case we never even hit the ACP gate. Everything is auto-approved within Claude Code, and we just get the results: no request events or anything.

Member Author

Exactly right. The three reads are settled entirely inside Claude by the rendered allow rules; nothing reaches the ACP layer for them, no permission or interaction events exist for them anywhere; you only see their tool_call/tool_result events. Doc step 4 now states this in your words.

mmabrouk added 22 commits July 4, 2026 13:21
Generalize the explainer across harnesses (per-harness gate table, Pi has no
HITL today), explain Claude default_mode/bypassPermissions and the settings
merge semantics, distinguish ACP request vs stream event and the two kinds of
session, explain the cold-replay resume model before the responder code, and
record the live approve-loop finding. Plan updates: relay becomes a pure
executor (decision before execution), client tools resolve through the same
ladder defaulting to allow, M2+M3 elevated with the approve-loop as an
acceptance case.

Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
The live approve-loop is diagnosed: a constant stream messageId plus a
level-triggered resume predicate on the frontend (new finding M7), compounded
by tool-name drift across ACP frames breaking the decision key (M2's observed
form, not argument drift). Updated the explainer's live-warning section,
reframed M2 and added M7 in the code review, settled the fix direction in the
plan (direct replay of the approved call; absorb #5054's message-id fix and
edge-trigger guard; supersede its resolvedName patch and loop-breaker), and
recorded the #5054 recommendation in status.

Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
…e target path

Global policy becomes four explicit modes (allow|ask|deny|allow_reads):
read-only-allow is a policy choice, not a hidden per-tool default, and
needs_approval is deleted from the model. 'Disposition' renamed to 'effective
permission' everywhere. New 'target path' section shows the clean end-state
flow; resume is redesigned to replay the approved call directly. Corrected the
session-id story (the playground sends a stable per-conversation id). Added
the Pi-builtins explanation (selection is Pi's only native control). Plan
gains the stacked-on-#5054 baseline (keep the message-id fix and resume guard;
delete resolvedName and the loop-breaker) and updated phases/deltas. Status
consolidated for final review.

Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
…sume redesign, wire-first phases, test plan)
- handler.py: permission_default (the SDK deleted permission_policy in phase 3)
- fold(): the terminal result's stop_reason wins over the done event (the live runner
  emits done without stopReason, so a real pause would otherwise drop its envelope
  metadata) and pending_interaction carries a derived top-level tool name
- sandbox_agent.ts: fold upstream keepers lost to ours-wins conflict resolution
  (shared apiBase module, claude-model + resolved-model log lines)
- service pause test pinned to the realistic stream shape (done event with no stopReason)
- playground: a rules-only runner.permissions override no longer drops the
  allow_reads default from the /invoke body (nested merge + test)
- runner: a client-tool pause now seeds /sessions/interactions like every other
  pause (pauseClientTool was the one gate that left no row)
- sdk: wire.py imports ToolPermission/PermissionRule from permission_rules.py
  instead of redeclaring them; greppable legacy-key literals in mcp/models.py;
  annotate_trace comment and claude_settings docstring tell the truth
- docs: MCP-server permission step added to three precedence ladders; permissions
  wire block documents the optional default + fallbacks; client-tool carve-out
  paragraph matches onClientTool's actual semantics
…l pauses with their own kind

Codex pre-merge review of the rebase:
- both ACP pause sites emit a stamped COPY of the toolCall (resolvedName = the
  gate's stable anchor) so the Vercel egress's preferred field is populated again
  (upstream stamped by mutating the shared ACP object; we keep it pure)
- onCreateInteraction threads kind, so client-tool rows stop masquerading as
  user_approval, and the Pi relay's client-tool pause now seeds a row at all
- the gateway integration test drops the deleted needs_approval vocabulary
@mmabrouk mmabrouk force-pushed the docs/approval-boundary branch from 62d2f49 to f3a666f Compare July 4, 2026 11:55
@mmabrouk mmabrouk changed the base branch from big-agents-work to big-agents July 4, 2026 11:55
@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ jp-agenta
✅ ardaerzin
✅ junaway
✅ mmabrouk
❌ Agenta Team


Agenta Team seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@mmabrouk mmabrouk merged commit b839267 into big-agents Jul 4, 2026
22 of 24 checks passed
mmabrouk added a commit that referenced this pull request Jul 4, 2026

Labels

  • Backend
  • documentation: Improvements or additions to documentation
  • Feature Request: New feature or request
  • lgtm: This PR has been approved by a maintainer
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.


3 participants