[feat] Approval boundary: pause on authored intent, not session ids by mmabrouk · Pull Request #5041 · Agenta-AI/agenta

mmabrouk · 2026-07-02T17:56:38Z

Context

An agent configured to auto-approve tools still stopped at the first gated tool and waited for a human who was not there. The runner decided "pause for a human" from whether the request carried a session id, and the SDK mints one for every request, so the auto policy was dead code: every gate paused, headless runs died silently (HTTP 200, mid-sentence text), and a fragile resume key produced the infinite approve-loop that QA hit live. This PR carries the full design workspace (docs/design/agent-workflows/projects/approval-boundary/) plus the implementation.

Changes

One permission model, carried from authoring to enforcement with no inference:

Authored: per-tool permission: allow | ask | deny or inherit; per-agent runner.permissions.default with four modes: allow, ask, deny, allow_reads (reads run, writes ask; the default). needs_approval, the permission_mode aliases, and runner.interactions.headless are deleted (legacy keys in old drafts are ignored, with a warning where they were a dead knob).
Wire: the run request carries permissions: {default, rules?}; permissionPolicy and needsApproval are gone. Rules and Claude settings render from one shared parse so they cannot disagree.
Enforced: one decision module (services/runner/src/permission-plan.ts) answers both gates. The ACP responder replies allow/deny or pauses; the relay enforces only where the harness does not gate, which gives Pi real human-in-the-loop for the first time. hasHumanSurface, PolicyResponder, and the #5054 loop-breaker are deleted.
Resume: stored decisions match on stable anchors (spec name for relay tools, recorded tool_call name for Claude gates, canonical args), are consumed once, and a config changed to deny beats a stale approval. Drift produces a visible fresh prompt, never a silent loop and never an auto-deny.
Events mean actions: the approval event fires only when a run actually pauses (single per-turn latch), and a paused call's teardown frames are suppressed so nothing clobbers the prompt (QA-found bug, fixed with regression tests).
Visibility: batch responses carry stop_reason and, when paused, pending_interaction: {id, tool}. Both gates seed the durable interactions plane on every pause.
Frontend: the policy select shows the four modes for every harness including Pi; a Pi settings block selects builtins (written as {type: "builtin"} entries in agent.tools); the tool editor's permission select drops the legacy fallbacks.
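The Authored and Enforced items above amount to a resolution ladder plus one verdict function. Below is a minimal TypeScript sketch that assumes each tool carries a boolean read-only hint and that the stored human decision has already been looked up. `effectivePermission` and `decide` echo the SDK's effective_permission and the runner's permission-plan.ts, but the shapes and signatures here are hypothetical.

```typescript
// Authored vocabulary, as listed in the description above.
type ToolPermission = "allow" | "ask" | "deny" | "inherit";
type DefaultMode = "allow" | "ask" | "deny" | "allow_reads";
type Effective = "allow" | "ask" | "deny";
type Stored = "approved" | "denied";
type Verdict =
  | { kind: "allow" }
  | { kind: "deny"; by: "config" | "user" }
  | { kind: "pendingApproval" };

// Hypothetical tool shape: only the fields this sketch needs.
interface ToolSpec {
  name: string;
  permission: ToolPermission;
  readOnly: boolean; // assumed read/write hint
}

// An explicit per-tool permission wins; "inherit" falls through to the agent
// default, where allow_reads splits on the hint (reads run, writes ask).
function effectivePermission(tool: ToolSpec, mode: DefaultMode = "allow_reads"): Effective {
  if (tool.permission !== "inherit") return tool.permission;
  if (mode === "allow_reads") return tool.readOnly ? "allow" : "ask";
  return mode;
}

// One pure verdict for both gates. A config deny beats any stored approval;
// only an authored ask consults the stored human decision, and with none the
// run pauses instead of guessing.
function decide(tool: ToolSpec, mode: DefaultMode, stored?: Stored): Verdict {
  const effective = effectivePermission(tool, mode);
  if (effective === "deny") return { kind: "deny", by: "config" };
  if (effective === "allow") return { kind: "allow" };
  if (stored === "approved") return { kind: "allow" };
  if (stored === "denied") return { kind: "deny", by: "user" };
  return { kind: "pendingApproval" };
}
```

The verdict is a pure function of (tool, mode, stored decision). That property is what would let the ACP responder and the relay call the same code without being able to disagree.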

Before: permissionPolicy: "auto" on the wire, park-by-default at the gate. After: permissions: {default: "allow_reads", rules: [...]}, verdicts from one function, pauses only on authored ask.
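The After shape can be written down as types. This sketch assumes a rule is {tool, permission}; the SDK's real PermissionRule may carry more fields. It shows legacy keys being reported rather than honored. `toWirePermissions` and `RunnerConfig` are hypothetical names.

```typescript
type Mode = "allow" | "ask" | "deny" | "allow_reads";

// Hypothetical rule shape; the real PermissionRule may differ.
interface PermissionRule {
  tool: string;
  permission: "allow" | "ask" | "deny";
}

// The `permissions` block of a run request: {default, rules?}.
interface WirePermissions {
  default: Mode;
  rules?: PermissionRule[];
}

// Hypothetical authored runner config, including two legacy keys.
interface RunnerConfig {
  permissions?: Partial<WirePermissions>;
  permission_policy?: unknown; // legacy: ignored
  interactions?: { headless?: unknown }; // legacy: ignored
}

// Legacy keys are only reported, never honored; the default falls back to
// allow_reads when the author set none.
function toWirePermissions(runner: RunnerConfig, warnings: string[]): WirePermissions {
  if (runner.permission_policy !== undefined) warnings.push("ignored legacy key: permission_policy");
  if (runner.interactions?.headless !== undefined) warnings.push("ignored legacy key: interactions.headless");
  const wire: WirePermissions = { default: runner.permissions?.default ?? "allow_reads" };
  const rules = runner.permissions?.rules;
  if (rules && rules.length > 0) wire.rules = rules;
  return wire;
}
```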

Scope / risk

Stacked on big-agents-work, which big-agents has since absorbed via #5058 plus one HITL-hardening commit (5555e6ed81). Audit done: those runner deltas are formatting plus patches this redesign supersedes; the egress rawInput is not None fix merges untouched. The base flip and rebase happen before merge (five known overlapping files, resolution documented in build-notes.md).
Pi native builtins are still governed by selection only; a per-builtin permission is dropped with a logged warning. Selection-time enforcement is a designed follow-up (status.md).
Claude-harness live runs were blocked by insufficient account credit; that path is covered by unit and settings-rendering tests and shares the Gate-2 machinery verified live on Pi.
Deny wording still needs polish: a user deny currently reads "denied by the permission policy".

Tests

Runner 444 (vitest), SDK agents 480+ (pytest), services 49 (pytest); typecheck green.
A 40-case cross-language parity fixture pins the Python and TS resolvers to identical verdicts (zero disagreements found).
Regression pins for F-024/F-036/F-040/F-046, the M2/M7 loop, consume-once, the latch, and the pause-clobber fix.
Live QA: headless matrix 7/7 (all four modes, explicit-beats-policy both ways, paused envelope, legacy tolerance); playground: one prompt, Approve resumes with real tool output and zero re-prompts, Deny refuses cleanly.

How to QA

Prerequisites: the EE dev stack; restart the runner sidecar after checkout (it compiles on start). A project with a Composio connection (pi-agents on the dev box).

Steps:

1. Open an agent app with the Pi harness and a cheap model. In Advanced > Permissions, set the policy to "Allow reads". Add a Composio read tool (GitHub "Get the authenticated user") and set its Permission to "Ask".
2. Chat: "You must call the GitHub get authenticated user tool now."
3. Approve the prompt.
4. Repeat and Deny.
5. Set the tool to Inherit and the policy to "Allow all"; repeat.

Expected: step 2 renders exactly one Approve/Deny prompt that stays; step 3 resumes and completes with no second prompt; step 4 continues with a plain refusal, no error card; step 5 runs with no prompt. Headless: POST a batch invoke with a write tool under allow_reads and check stop_reason: "paused" + pending_interaction in the envelope.
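For the headless check, a caller needs only two fields of the envelope. The sketch below assumes the field names above and treats any other stop_reason as a finished run. `BatchEnvelope` and `pendingFrom` are hypothetical names.

```typescript
// Only the two envelope fields named above; the rest of the response is omitted.
interface PendingInteraction {
  id: string;
  tool: string;
}

interface BatchEnvelope {
  stop_reason: string;
  pending_interaction?: PendingInteraction;
}

// Returns the interaction someone must answer, or null if the run finished.
// A pause is explicit instead of an HTTP 200 with cut-off text.
function pendingFrom(env: BatchEnvelope): PendingInteraction | null {
  if (env.stop_reason === "paused" && env.pending_interaction) {
    return env.pending_interaction;
  }
  return null;
}
```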

Automated tests:
cd services/runner && pnpm test
cd sdks/python && uv run --no-sync python -m pytest oss/tests/pytest/unit/agents -n0
cd services && uv run --no-sync python -m pytest oss/tests/pytest/unit -n0

Edge cases: approve then let the model re-issue different args (fresh visible prompt, no silent loop); a config flipped to deny after an approval (deny wins); two gates in one turn (one prompt only).
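The three edge cases map onto three small mechanisms:

- a consume-once decision store keyed by stable anchor plus canonical args;
- a config check that runs before the store;
- a per-turn latch.

Below is a minimal sketch. `ConversationDecisions` borrows a name from the runner, but its methods here are hypothetical, as are `resolve` and `TurnLatch`.

```typescript
type Decision = "approved" | "denied";

// Canonical JSON: object keys sorted recursively so reordered args still match.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj).sort().map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical per-conversation store keyed by (stable anchor, canonical args).
class ConversationDecisions {
  private byKey = new Map<string, Decision>();

  private key(anchor: string, args: unknown): string {
    return `${anchor}|${canonical(args)}`;
  }

  record(anchor: string, args: unknown, d: Decision): void {
    this.byKey.set(this.key(anchor, args), d);
  }

  // Consume-once: a hit is deleted, so an approval can never replay itself.
  consume(anchor: string, args: unknown): Decision | undefined {
    const k = this.key(anchor, args);
    const d = this.byKey.get(k);
    this.byKey.delete(k);
    return d;
  }
}

// Config is checked before the store: a tool flipped to deny stays denied even
// with an approval on record. Drifted args miss the store and prompt again.
function resolve(
  config: "allow" | "ask" | "deny",
  store: ConversationDecisions,
  anchor: string,
  args: unknown,
): "allow" | "deny" | "prompt" {
  if (config !== "ask") return config;
  const d = store.consume(anchor, args);
  if (d === undefined) return "prompt";
  return d === "approved" ? "allow" : "deny";
}

// Per-turn latch: only the first gate in a turn gets to emit a prompt.
class TurnLatch {
  private paused = false;

  tryPause(): boolean {
    if (this.paused) return false;
    this.paused = true;
    return true;
  }
}
```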

Reading order for review

README.md then how-approvals-work.md (the "target path" section describes what this PR ships) then plan.md; build-notes.md records every judgment call and incident; status.md has the decision log and follow-ups. Inline comments mark the load-bearing code.

https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2

vercel · 2026-07-02T17:56:45Z

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jul 4, 2026 11:56am

mmabrouk · 2026-07-02T17:57:02Z

Feedback needed (the four decisions in status.md, each with stakes):

Fix shape: one-shot Option D (one resolved permission plan, SDK computes, both runner gates enforce) vs staged B+C first. The plan and the Codex second opinion both recommend D; the staged path ships sooner but temporarily auto-approves explicit ask rules on Claude builtins.
Naming: approve permissions.default with vocabulary allow | ask | deny replacing runner.interactions.headless / permission_policy: auto|deny.
Pi relay-ask scope: if relay parking proves heavy, accept a documented Pi-only collapse for the first slice?
Batch paused-response shape: only "visible + carries the interaction reference" is required from this side; exact fields coordinate with the streaming-invoke workspace.

Also worth a skim even if you skip the reviews: how-approvals-work.md (the flow explainer this whole workspace hangs on) and the "One expectation to reset" bullet in the README (the reproducing agent still pauses at SEND_MESSAGE under default config, by authored design; the fix makes that pause authored, visible, and answerable).

@coderabbitai review

coderabbitai · 2026-07-02T17:59:05Z

📝 Walkthrough

Walkthrough

Migrates tool/agent permission configuration from permission_policy/needs_approval to a unified runner.permissions.default and per-tool permission model. Introduces a shared runner decision engine (permission-plan.ts, effective_permission) replacing park-based HITL with pause/ApprovalResponder. Updates Python SDK, runner, agent service, frontend, and extensive design docs.

Changes

Permission model migration

Layer / File(s)	Summary
Design documentation `docs/design/agent-workflows/...`, `docs/designs/sessions/...`	Rewrites protocol, harness-adapter, and tool-model docs for `runner.permissions.default`/`permissions`, adds approval-boundary and build-kit-cleanup project docs, marks capability-config/hitl-fix docs superseded.
Python SDK permission model & harness adapters `sdks/python/agenta/sdk/agents/dtos.py`, `tools/models.py`, `wire_models.py`, `permission_rules.py`, `mcp/models.py`, `adapters/claude_settings.py`, `adapters/harnesses.py`, `platform/*`, `utils/types.py`, `__init__.py`	Introduces `PermissionMode`/`effective_permission`, replaces `permission_policy`/`needs_approval` fields, rewires Claude/Pi templates and `_parse_run_selection` to `permission_default`.
Python SDK tests `sdks/python/oss/tests/pytest/unit/agents/...`	Updates/adds tests for Claude settings rules, tool spec permission derivation, op catalog, wire-contract golden fixtures, and cross-language permission parity.
Agent service batch/pause handling `services/oss/src/agent/app.py`, tests	Wires `permission_default` into `SessionConfig`, tracks `interaction_request`, surfaces `stop_reason`/`pending_interaction`.
Runner permission engine & pause wiring `services/runner/src/permission-plan.ts`, `protocol.ts`, `responder.ts`, `engines/sandbox_agent*`, `tools/relay.ts`, `tools/dispatch.ts`	Adds shared decision engine, `ApprovalResponder`/`ConversationDecisions`, replaces park mechanism with `PendingApprovalPauseController`/`acp-interactions.ts`.
Runner tests `services/runner/tests/unit/...`	Adds/updates tests for permission-plan, responder, ACP interactions, orchestration, relay, and wire-contract.
Frontend permission controls `web/oss/...`, `web/packages/agenta-entity-ui/...`, `web/packages/agenta-playground/...`	Updates defaults to `runner.permissions.default`, adds `PiSettingsControl`, refactors `ToolFormView` and `useModelHarness` permission UI.

Estimated code review effort: 5 (Critical) | ~150 minutes

Sequence Diagram(s)

sequenceDiagram
  participant SandboxAgent as sandbox_agent.ts
  participant ACPInteractions as acp-interactions.ts
  participant ApprovalResponder as ApprovalResponder
  participant PermissionPlan as permission-plan.ts
  participant Relay as tools/relay.ts

  SandboxAgent->>ACPInteractions: attachPermissionResponder(session)
  ACPInteractions->>ApprovalResponder: onPermission(gate)
  ApprovalResponder->>PermissionPlan: decide(gate, plan, stored)
  PermissionPlan-->>ApprovalResponder: Verdict(allow/deny/pendingApproval)
  alt pendingApproval
    ApprovalResponder-->>ACPInteractions: pendingApproval
    ACPInteractions->>SandboxAgent: emit interaction_request, pause()
  else allow or deny
    ApprovalResponder-->>ACPInteractions: allow/deny
    ACPInteractions->>SandboxAgent: session.respondPermission(reply)
  end
  Relay->>PermissionPlan: decide(gate, plan, stored)
  PermissionPlan-->>Relay: Verdict

Possibly related PRs

Agenta-AI/agenta#4814: Overlaps in harness adapter wiring that removes legacy permission_policy forwarding and propagates the new permissions shape.
Agenta-AI/agenta#4848: Both touch the HITL permission/approval gating flow and related sandbox-agent/ACP wiring at the same responder boundary.
Agenta-AI/agenta#4888: Both modify the Claude settings renderer/permissions handling for the internal agenta-tools MCP server.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The title clearly summarizes the main change: replacing session-id-based pauses with authored-intent approval handling.
Description check	✅ Passed	The description is detailed and directly matches the documented permission-model, runner, and UI changes in the PR.

jp-agenta · 2026-07-02T21:06:43Z

Here, permissions pertain to the runner, so instead of permissions.default it should at least be runner.permissions.default. Also, permissions.default sounds rather general for "just" interactions. Unless we foresee no other use of permissions in the runner, let's keep them under interactions: either runner.interactions.permissions.x or runner.permissions.interactions.x depending on what is being modelled and how.

mmabrouk · 2026-07-03T09:55:58Z

+(`sdks/python/agenta/sdk/agents/adapters/claude_settings.py:201-258`) by merging four rule
+sources: the author's raw Claude rules, denies derived from the sandbox permission, per-MCP-
+server permissions, and per-tool permissions. The rendering is deliberately conservative: a
+per-tool `allow` becomes an allow rule (Claude runs the tool with no prompt), a `deny`


I'm not sure I understand this part. Do we override the user's permission settings with these ones? Do we automatically set the permissions in the Claude settings depending on the other permissions?

Neither override nor magic: we generate the whole settings.json from the agent config. The author's raw Claude rules (knob 3) are one of four inputs; they are kept verbatim and our derived rules (sandbox denies, per-MCP, per-tool) are appended, never rewritten. Conflicts are settled by Claude's own precedence at match time (deny beats allow), so an author deny always holds. Doc updated with the merge semantics.
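Here is the merge described in this answer as a sketch: author rules go first and verbatim, and derived rules are appended. It assumes the derived sources already arrive as Claude Code permissions.allow / ask / deny rule strings. The de-duplication step and the `mergeRuleSources` name are illustrative assumptions.

```typescript
// The three permission lists of Claude Code's settings.json.
interface ClaudePermissions {
  allow: string[];
  ask: string[];
  deny: string[];
}

// Hypothetical input: each source contributes rules to some of the lists.
type RuleSource = Partial<ClaudePermissions>;

// Author rules are copied verbatim; derived sources (sandbox denies, per-MCP,
// per-tool) are appended after them. Nothing is rewritten or removed, so an
// author deny always survives into the rendered file.
function mergeRuleSources(author: RuleSource, derived: RuleSource[]): ClaudePermissions {
  const out: ClaudePermissions = {
    allow: [...(author.allow ?? [])],
    ask: [...(author.ask ?? [])],
    deny: [...(author.deny ?? [])],
  };
  for (const src of derived) {
    for (const list of ["allow", "ask", "deny"] as const) {
      for (const rule of src[list] ?? []) {
        if (!out[list].includes(rule)) out[list].push(rule); // de-duplicate only
      }
    }
  }
  return out;
}
```

Conflicts between lists are left in place on purpose; Claude settles them at match time.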

mmabrouk · 2026-07-03T10:12:19Z

+3. The runner starts a Claude session in the sandbox, writing the rendered
+   `.claude/settings.json` into the workspace first.
+4. Claude runs the three reads without asking (allow rules from their `read_only` hints).
+   Their `tool_call` and `tool_result` events stream to the frontend as they happen.


If I understand correctly, in this case we don't even hit the ACP gate. Everything is auto-approved within Claude Code and we just get the results: no request events or anything else.

Exactly right. The three reads are settled entirely inside Claude by the rendered allow rules; nothing reaches the ACP layer for them, no permission or interaction events exist for them anywhere; you only see their tool_call/tool_result events. Doc step 4 now states this in your words.
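This sketch shows why the reads never reach ACP: only an "ask" outcome from Claude's own rule matching becomes a permission request. It makes three assumptions:

- rules match exact tool names only (Claude's real rule syntax is richer);
- an unmatched tool falls back to "ask";
- rules are checked deny first, then ask, then allow, so deny wins.

`claudeOutcome` and `reachesAcpGate` are hypothetical helpers.

```typescript
type Outcome = "allow" | "ask" | "deny";

interface ClaudePermissions {
  allow: string[];
  ask: string[];
  deny: string[];
}

// Exact-name matching only; deny first, then ask, then allow.
function claudeOutcome(p: ClaudePermissions, tool: string, fallback: Outcome = "ask"): Outcome {
  if (p.deny.includes(tool)) return "deny";
  if (p.ask.includes(tool)) return "ask";
  if (p.allow.includes(tool)) return "allow";
  return fallback;
}

// Only "ask" becomes an ACP permission request; an allowed read runs inside
// Claude and surfaces only as tool_call / tool_result stream events.
function reachesAcpGate(p: ClaudePermissions, tool: string): boolean {
  return claudeOutcome(p, tool) === "ask";
}
```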

Generalize the explainer across harnesses (per-harness gate table, Pi has no HITL today), explain Claude default_mode/bypassPermissions and the settings merge semantics, distinguish ACP request vs stream event and the two kinds of session, explain the cold-replay resume model before the responder code, and record the live approve-loop finding. Plan updates: relay becomes a pure executor (decision before execution), client tools resolve through the same ladder defaulting to allow, M2+M3 elevated with the approve-loop as an acceptance case. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2

The live approve-loop is diagnosed: a constant stream messageId plus a level-triggered resume predicate on the frontend (new finding M7), compounded by tool-name drift across ACP frames breaking the decision key (M2's observed form, not argument drift). Updated the explainer's live-warning section, reframed M2 and added M7 in the code review, settled the fix direction in the plan (direct replay of the approved call; absorb #5054's message-id fix and edge-trigger guard; supersede its resolvedName patch and loop-breaker), and recorded the #5054 recommendation in status. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2
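The level-triggered versus edge-triggered distinction behind M7 fits in a few lines. This sketch assumes the frontend tracks the id of the latest approved interaction. `ChatState`, `shouldResumeLevel`, and `ResumeEdge` are hypothetical and are not the playground's code.

```typescript
// Hypothetical frontend state: id of the latest approved, not-yet-resumed interaction.
interface ChatState {
  approvedInteractionId: string | null;
}

// Level-triggered: true on every evaluation while the approval is present. If
// nothing ever clears it (e.g. one constant message id), resume keeps firing.
function shouldResumeLevel(s: ChatState): boolean {
  return s.approvedInteractionId !== null;
}

// Edge-triggered guard: fires once when a new approved interaction appears.
class ResumeEdge {
  private last: string | null = null;

  shouldResume(s: ChatState): boolean {
    const fire = s.approvedInteractionId !== null && s.approvedInteractionId !== this.last;
    this.last = s.approvedInteractionId;
    return fire;
  }
}
```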

…e target path Global policy becomes four explicit modes (allow|ask|deny|allow_reads): read-only-allow is a policy choice, not a hidden per-tool default, and needs_approval is deleted from the model. 'Disposition' renamed to 'effective permission' everywhere. New 'target path' section shows the clean end-state flow; resume is redesigned to replay the approved call directly. Corrected the session-id story (the playground sends a stable per-conversation id). Added the Pi-builtins explanation (selection is Pi's only native control). Plan gains the stacked-on-#5054 baseline (keep the message-id fix and resume guard; delete resolvedName and the loop-breaker) and updated phases/deltas. Status consolidated for final review. Claude-Session: https://claude.ai/code/session_01DGj7GKafjkZeQXMsryWhb2

… Pi settings block (round 3)

…sume redesign, wire-first phases, test plan)

…y phase 1)

…lt (phase 2a)

…y-ask pauses (phase 2b)

…al (phase 3)

… tail)

…y fallbacks (phase 4b)

…ause controller (phase 5)

…perseded workspaces

…he approval prompt

- handler.py: permission_default (the SDK deleted permission_policy in phase 3)
- fold(): the terminal result's stop_reason wins over the done event (the live runner emits done without stopReason, so a real pause would otherwise drop its envelope metadata), and pending_interaction carries a derived top-level tool name
- sandbox_agent.ts: fold in upstream keepers lost to ours-wins conflict resolution (shared apiBase module, claude-model + resolved-model log lines)
- service pause test pinned to the realistic stream shape (done event with no stopReason)
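The fold() precedence rule is sketched here in TypeScript, although the service's fold() is Python. Apart from the stop_reason, stopReason, and pending_interaction names above, the event and field shapes are assumptions.

```typescript
// Assumed event shapes: the live runner's `done` frame may lack stopReason,
// while the terminal `result` carries stop_reason and the paused call.
type StreamEvent =
  | { type: "done"; stopReason?: string }
  | {
      type: "result";
      stop_reason?: string;
      pending?: { id: string; toolCall?: { name?: string } };
    };

interface Folded {
  stop_reason?: string;
  pending_interaction?: { id: string; tool?: string };
}

// The terminal result's stop_reason wins over the done event regardless of
// order, and the pending call's tool name is lifted to the top level.
function fold(events: StreamEvent[]): Folded {
  let fromDone: string | undefined;
  let fromResult: string | undefined;
  let pending: Folded["pending_interaction"];
  for (const ev of events) {
    if (ev.type === "done") {
      fromDone = ev.stopReason ?? fromDone;
    } else {
      fromResult = ev.stop_reason ?? fromResult;
      if (ev.pending) pending = { id: ev.pending.id, tool: ev.pending.toolCall?.name };
    }
  }
  return { stop_reason: fromResult ?? fromDone, pending_interaction: pending };
}
```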

- playground: a rules-only runner.permissions override no longer drops the allow_reads default from the /invoke body (nested merge + test)
- runner: a client-tool pause now seeds /sessions/interactions like every other pause (pauseClientTool was the one gate that left no row)
- sdk: wire.py imports ToolPermission/PermissionRule from permission_rules.py instead of redeclaring them; greppable legacy-key literals in mcp/models.py; annotate_trace comment and claude_settings docstring tell the truth
- docs: MCP-server permission step added to three precedence ladders; permissions wire block documents the optional default + fallbacks; client-tool carve-out paragraph matches onClientTool's actual semantics
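Here is why a shallow merge dropped the default, and how a nested merge keeps it. This is a generic sketch; `deepMerge` is a hypothetical helper, not the playground's actual merge.

```typescript
type Json = { [k: string]: unknown };

function isPlainObject(v: unknown): v is Json {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Nested objects merge key by key; arrays and scalars from the override
// replace the base value. A shallow spread would instead replace the whole
// runner object and lose permissions.default.
function deepMerge(base: Json, override: Json): Json {
  const out: Json = { ...base };
  for (const [k, v] of Object.entries(override)) {
    const b = out[k];
    out[k] = isPlainObject(b) && isPlainObject(v) ? deepMerge(b, v) : v;
  }
  return out;
}
```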

…bit round, and merge

…l pauses with their own kind

Codex pre-merge review of the rebase:
- both ACP pause sites emit a stamped COPY of the toolCall (resolvedName = the gate's stable anchor), so the Vercel egress's preferred field is populated again (upstream stamped by mutating the shared ACP object; we keep it pure)
- onCreateInteraction threads kind, so client-tool rows stop masquerading as user_approval, and the Pi relay's client-tool pause now seeds a row at all
- the gateway integration test drops the deleted needs_approval vocabulary
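Below is a sketch of the stamp-a-copy pattern and the kind-threaded row. `AcpToolCall` is trimmed to the fields used here: toolCallId, title, and rawInput follow the Agent Client Protocol's tool-call fields, while resolvedName is the runner's own field. `stampToolCall`, `interactionRow`, and the `client_tool` kind value are hypothetical names.

```typescript
interface AcpToolCall {
  toolCallId: string;
  title?: string;
  rawInput?: unknown;
  resolvedName?: string;
}

type InteractionKind = "user_approval" | "client_tool";

interface InteractionRow {
  id: string;
  kind: InteractionKind;
  tool: string;
}

// Emit a stamped copy; the shared ACP object is never mutated.
function stampToolCall(call: Readonly<AcpToolCall>, anchor: string): AcpToolCall {
  return { ...call, resolvedName: anchor };
}

// The caller threads the kind through, so a client-tool pause is not recorded
// as a user approval.
function interactionRow(id: string, kind: InteractionKind, call: AcpToolCall): InteractionRow {
  return { id, kind, tool: call.resolvedName ?? call.title ?? "unknown" };
}
```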

CLAassistant · 2026-07-04T11:55:24Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ jp-agenta
✅ ardaerzin
✅ junaway
✅ mmabrouk
❌ Agenta Team

Agenta Team seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

…5059 + decisions and implementation status Claude-Session: https://claude.ai/code/session_014iPB7HL5PjgT9npyPHaFMT

mmabrouk added the needs-review label (Agent updated; awaiting Mahmoud's review) on Jul 2, 2026

vercel Bot deployed to Preview July 2, 2026 17:57

mmabrouk force-pushed the docs/approval-boundary branch from 361eb67 to 9d5161e on July 3, 2026 09:42

vercel Bot deployed to Preview July 3, 2026 09:44