Run panel: auto-approve operator-invoked tools#1183

Merged
RhysSullivan merged 2 commits into
mainfrom
run-panel-auto-approve
Jun 28, 2026
@RhysSullivan

Owner

Invoking a tool through the Run/Test panel paused on approval-gated tools and dead-ended on a "This tool requires approval (a policy gates it)" message, with no way to actually run it from the panel. But the operator clicking Run is itself the approval, so the panel should just run the tool.

What changed

autoApprove is threaded through the execute path. When set, the execution engine runs the inline accept-all handler instead of intercepting the first elicitation as a pause, so an approval-gated tool runs to completion:

  • executeWithPause(code, { autoApprove }) in the engine
  • POST /executions accepts an optional autoApprove
  • the Run panel sends autoApprove: true

Safety is preserved: block policies fail before any elicitation, so this never bypasses a hard block. The MCP host path is unchanged and still pauses for the model to approve.
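
The branch described above can be pictured with a small, self-contained model. This is a sketch under stated assumptions, not the code in engine.ts: the `Tool` shape, the `blockedTools` list, and the `"pause"` handler result are hypothetical stand-ins, and the sketch is synchronous where the real engine appears to be built on Effect. Only `executeWithPause`, `autoApprove`, and the idea of an accept-all handler come from this PR.

```typescript
// Hypothetical, simplified model of the engine branch (not the real engine.ts).
type HandlerResult = { action: "accept" } | { action: "pause"; message: string };
type ElicitationHandler = (message: string) => HandlerResult;

type ExecuteOutcome =
  | { status: "completed"; result: string; isError: boolean }
  | { status: "paused"; reason: string };

interface Tool {
  name: string;
  requiresApproval: boolean;
  run: () => string;
}

// Accept every elicitation inline: the operator clicking Run is the approval.
const acceptAllHandler: ElicitationHandler = () => ({ action: "accept" });

// Default behavior: treat the first elicitation as a pause.
const pauseOnFirstElicitation: ElicitationHandler = (message) => ({
  action: "pause",
  message,
});

function executeWithPause(
  tool: Tool,
  blockedTools: readonly string[],
  options: { autoApprove?: boolean } = {},
): ExecuteOutcome {
  // Block policies are checked before any elicitation can fire,
  // so autoApprove never reaches a blocked tool.
  if (blockedTools.indexOf(tool.name) !== -1) {
    return { status: "completed", result: `blocked by policy: ${tool.name}`, isError: true };
  }

  const onElicitation = options.autoApprove ? acceptAllHandler : pauseOnFirstElicitation;

  if (tool.requiresApproval) {
    const response = onElicitation(`Approve ${tool.name}?`);
    if (response.action === "pause") {
      // The tool body has not run yet, so no side effect has happened.
      return { status: "paused", reason: response.message };
    }
  }

  return { status: "completed", result: tool.run(), isError: false };
}
```

In this model the only thing `autoApprove` changes is which handler answers the elicitation; the block check sits in front of both paths.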

Before / after

Same tool, same Require approval · npmdl.* policy badge. Before, the panel dead-ends; after, it returns the real result (HTTP 200, real download count).

Before: [screenshot: before]

After: [screenshot: after]

Tests

  • New cross-target e2e scenario e2e/scenarios/run-panel-auto-approve.test.ts: drives the same POST /executions endpoint the panel uses against a tool gated by its own requiresApproval annotation. Without autoApprove the call pauses and the side effect does not happen; with autoApprove it runs to completion and the side effect lands. Green against a live selfhost instance.
  • New engine unit test in tool-invoker.test.ts: the same eliciting tool that pauses without autoApprove runs straight to completion with it.
  • Existing pause/resume suites still green (the default, non-autoApprove path is untouched).
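
The two calls the e2e scenario compares differ only in the request body. The helper below is a minimal sketch of how a client might build them. `buildExecuteRequest` is a hypothetical name, the `code` field is taken from the sequence diagram in this PR, and the path is assumed to be `/executions` directly under the base URL; any path prefix or auth headers the real API needs are left out.

```typescript
// Hypothetical request builder for POST /executions (illustrative only).
interface ExecuteRequestBody {
  code: string;
  autoApprove?: boolean;
}

function buildExecuteRequest(baseUrl: string, code: string, autoApprove?: boolean) {
  // Leave the field out entirely when it is not set, so the server's
  // default (pause on approval) applies.
  const body: ExecuteRequestBody =
    autoApprove === undefined ? { code } : { code, autoApprove };

  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${baseUrl.replace(/\/+$/, "")}/executions`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

A caller would pass the result to `fetch(url, init)`: once without the flag to observe the pause, and once with `autoApprove` set to `true` to observe the completed run.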

Invoking a tool through the Run/Test panel paused on approval-gated tools
and dead-ended on a "This tool requires approval" message, even though the
operator clicking Run is itself the approval.

Thread an optional autoApprove through the execute path: the execution
engine runs the inline accept-all handler instead of intercepting the first
elicitation as a pause, so an approval-gated tool runs to completion. The
HTTP /executions endpoint takes autoApprove and the Run panel sends it.
block policies still fail before any elicitation, so this never bypasses a
hard block; the MCP host path is unchanged and still pauses for the model.
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 28, 2026


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! | executor-marketing | b0bb4a8 | Commit Preview URL, Branch Preview URL | Jun 28 2026, 06:28 PM |

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 28, 2026


Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Updated (UTC) |
| --- | --- | --- | --- |
| ✅ Deployment successful! | executor-cloud | b0bb4a8 | Jun 28 2026, 06:30 PM |

@github-actions

github-actions Bot commented Jun 28, 2026

Contributor

Cloudflare preview

Torn down — the PR is closed.

These two files were already unformatted on main (identical to origin/main);
oxfmt --check flags them repo-wide. Formatting-only, no behavior change.
@pkg-pr-new

pkg-pr-new Bot commented Jun 28, 2026


@executor-js/cli

npm i https://pkg.pr.new/@executor-js/cli@1183

@executor-js/config

npm i https://pkg.pr.new/@executor-js/config@1183

@executor-js/execution

npm i https://pkg.pr.new/@executor-js/execution@1183

@executor-js/sdk

npm i https://pkg.pr.new/@executor-js/sdk@1183

@executor-js/plugin-file-secrets

npm i https://pkg.pr.new/@executor-js/plugin-file-secrets@1183

@executor-js/plugin-graphql

npm i https://pkg.pr.new/@executor-js/plugin-graphql@1183

@executor-js/plugin-keychain

npm i https://pkg.pr.new/@executor-js/plugin-keychain@1183

@executor-js/plugin-mcp

npm i https://pkg.pr.new/@executor-js/plugin-mcp@1183

@executor-js/plugin-onepassword

npm i https://pkg.pr.new/@executor-js/plugin-onepassword@1183

@executor-js/plugin-openapi

npm i https://pkg.pr.new/@executor-js/plugin-openapi@1183

@executor-js/codemode-core

npm i https://pkg.pr.new/@executor-js/codemode-core@1183

@executor-js/runtime-quickjs

npm i https://pkg.pr.new/@executor-js/runtime-quickjs@1183

executor

npm i https://pkg.pr.new/executor@1183

commit: b0bb4a8

@greptile-apps

greptile-apps Bot commented Jun 28, 2026


Greptile Summary

This PR threads an autoApprove flag from the Run/Test panel through the HTTP API into the execution engine, so clicking Run is treated as the human approval rather than pausing on a requiresApproval-annotated tool.

  • Adds autoApprove?: boolean to the ExecuteRequest schema and handler, routes it to executeWithPause, and inside the engine the autoApprove path short-circuits the pause-queue entirely by calling runInlineExecution with an acceptAllHandler (() => Effect.succeed({ action: "accept" })).
  • block policies are unaffected because they reject before any elicitation fires; the new path is strictly narrower than bypassing a block.
  • New unit test verifies the same eliciting tool that pauses without the flag completes straight through with it; new e2e scenario drives the HTTP endpoint end-to-end and asserts the side effect (policy write) lands only on the auto-approved call.

Confidence Score: 4/5

Safe to merge. The core execution change is a clean short-circuit: autoApprove routes through the existing inline path with an accept-all handler, leaving the pause-queue path completely untouched. Block policies fire before any elicitation and are unaffected.

The engine and API changes are correct and well-tested, with both a unit test and an e2e scenario proving the before/after contract. The only loose end is in tool-run-panel.tsx: the paused branch and its UI message were not updated to match the new reality where autoApprove: true is always sent, leaving stale dead code with misleading advice. This does not affect runtime behavior under normal conditions but would give operators wrong guidance in an unexpected edge case.

packages/react/src/components/tool-run-panel.tsx — the paused result branch and its UI message should be updated or removed.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/execution/src/engine.ts | Adds acceptAllHandler and an early-return branch in startPausableExecution that routes through runInlineExecution when autoApprove is set, bypassing the pause queue entirely. Logic is clean, the typed error channel is preserved, and block policies still fire before elicitation. |
| packages/core/api/src/executions/api.ts | Adds autoApprove: Schema.optional(Schema.Boolean) to ExecuteRequest. Minimal, correct schema change with a clear comment. |
| packages/react/src/components/tool-run-panel.tsx | Adds autoApprove: true to the execute payload. The existing paused result branch is now dead code (the server never returns paused when autoApprove: true is sent), but the stale UI message in that branch remains. |
| e2e/scenarios/run-panel-auto-approve.test.ts | New cross-target e2e scenario that drives the HTTP endpoint directly, proves the tool pauses without autoApprove and completes with it, and cleans up via Effect.ensuring. Well-structured and reads as a spec. |
| packages/core/execution/src/tool-invoker.test.ts | Adds a unit test that confirms the same eliciting tool that pauses without autoApprove runs to completion with it. Good complementary coverage to the e2e test. |
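
To make the "optional boolean" contract of the api.ts change concrete, here is a hand-rolled equivalent in plain TypeScript. It is a sketch only: the PR itself uses `Schema.optional(Schema.Boolean)`, and `parseExecuteRequest`, `ParseResult`, and the assumption that the body otherwise carries a string `code` field are illustrative, not taken from the source.

```typescript
// Hypothetical stand-in for the ExecuteRequest schema (illustrative only).
interface ExecuteRequest {
  code: string;
  autoApprove?: boolean;
}

type ParseResult =
  | { ok: true; value: ExecuteRequest }
  | { ok: false; error: string };

function parseExecuteRequest(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "body must be an object" };
  }
  const record = input as Record<string, unknown>;
  const code = record["code"];
  const autoApprove = record["autoApprove"];

  if (typeof code !== "string") {
    return { ok: false, error: "code must be a string" };
  }
  // Absent flag: valid, and the default pausing behavior applies.
  if (autoApprove === undefined) {
    return { ok: true, value: { code } };
  }
  // Present flag: must be a real boolean, not a truthy string or number.
  if (typeof autoApprove !== "boolean") {
    return { ok: false, error: "autoApprove must be a boolean when present" };
  }
  return { ok: true, value: { code, autoApprove } };
}
```

Keeping the field optional means existing callers that never send it keep the untouched pause/resume path.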

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Panel as ToolRunPanel (React)
    participant API as POST /executions
    participant Engine as ExecutionEngine
    participant Invoker as ToolInvoker
    participant Tool as Approval-gated Tool

    Panel->>API: "{ code, autoApprove: true }"
    API->>Engine: "executeWithPause(code, { autoApprove: true })"
    Note over Engine: autoApprove branch: skip pause queue
    Engine->>Engine: runInlineExecution(code, acceptAllHandler)
    Engine->>Invoker: "execute(code, { onElicitation: acceptAllHandler })"
    Invoker->>Tool: invoke tool
    Tool-->>Invoker: ElicitationRequest (requiresApproval)
    Invoker->>Engine: acceptAllHandler(ctx)
    Engine-->>Invoker: "{ action: "accept" }"
    Tool-->>Invoker: tool result
    Invoker-->>Engine: ExecuteResult
    Engine-->>API: "{ status: "completed", result }"
    API-->>Panel: "{ status: "completed", text, structured, isError }"

Comments Outside Diff (2)

  1. packages/react/src/components/tool-run-panel.tsx, line 236-245 (link)

    P2 Stale dead-code branch: with autoApprove: true always in the payload, the server's startPausableExecution routes through runInlineExecution and can only ever return status: "completed". The paused branch here is therefore unreachable in normal operation, and if it were somehow hit (e.g. a schema version mismatch strips autoApprove), the message "adjust the policy to run it directly" would be wrong advice — the panel is already sending autoApprove: true, so adjusting a policy would not unblock the call. Either remove the branch or update the message to reflect the new reality.

  2. packages/react/src/components/tool-run-panel.tsx, line 374-376 (link)

    P2 The "paused" UI message is now stale. Since the panel always sends autoApprove: true, a requiresApproval-annotated tool runs straight through, and a block-policy tool fails (returning completed with isError: true) rather than pausing. If this message is ever shown it means something unexpected happened, and "adjust the policy" is no longer the right next step.
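
One way to act on the two comments above is to keep the paused case as an explicit "unexpected" state rather than policy advice. The sketch below is an assumption about how that could look, not the code in tool-run-panel.tsx: `ExecuteResponse`, `PanelView`, `toPanelView`, and the message wording are all hypothetical, with the response fields borrowed from the sequence diagram.

```typescript
// Hypothetical response and view types (illustrative only).
type ExecuteResponse =
  | { status: "completed"; text: string; structured?: unknown; isError: boolean }
  | { status: "paused" };

type PanelView =
  | { kind: "result"; text: string }
  | { kind: "error"; text: string };

function toPanelView(response: ExecuteResponse): PanelView {
  switch (response.status) {
    case "completed":
      // A block-policy failure arrives here as completed with isError set.
      return response.isError
        ? { kind: "error", text: response.text }
        : { kind: "result", text: response.text };
    case "paused":
      // The panel always sends autoApprove: true, so a pause is unexpected;
      // say so instead of telling the operator to adjust a policy.
      return {
        kind: "error",
        text: "The run paused unexpectedly even though auto-approval was requested.",
      };
  }
}
```

Treating the pause as an error state keeps the branch honest if it is ever reached, without repeating the old "adjust the policy" guidance.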

Reviews (1): Last reviewed commit: "Format files flagged by oxfmt" | Re-trigger Greptile

@RhysSullivan RhysSullivan merged commit a150db9 into main Jun 28, 2026
22 of 23 checks passed
@RhysSullivan RhysSullivan deleted the run-panel-auto-approve branch June 28, 2026 18:41
RhysSullivan added a commit that referenced this pull request Jul 2, 2026
The cloud e2e project never gated CI either, so ten scenarios rotted.
Refresh the four whose product behavior moved intentionally:
- connect-card-ssr-origin: install URLs are org-slug-scoped since the
  org-slug console URLs change (#974); accept the slug form.
- connection-owner-isolation: /api/auth/switch-organization was deleted
  with cookie-based org switching (#1000); switch orgs the way the web
  client does, via the x-executor-organization selector header.
- oauth-connections: the popup-state fix (#1235) envelopes the callback
  state as base64url JSON; decode it and assert the inner state + orgSlug.
- unauthenticated-skeleton: the 404 page shipped as a standalone page in
  the same commit as the shell-framed assertion (#986); assert the page
  it actually renders.

Quarantine the six that need product/harness work, each with a reason:
mcp-browser-approval-org-scope + the two browser-approval scenarios
(cloud-only: the mcporter browser-approval completion never lands),
cli-device-login (device-flow terminal never reaches the emulator), and
run-panel-auto-approve (autoApprove leaves the run paused; never green
since the feature landed in #1183).
RhysSullivan added a commit that referenced this pull request Jul 2, 2026
* e2e: fix stale docs, harden dev-CLI status, add cloud+selfhost CI jobs

- e2e/AGENTS.md: the anatomy example predated the service-yielding scenario()
  signature (no more needs/ctx); capability notes said browser was cloud-only
  and mcp-oauth selfhost-only, both wrong per targets/*.ts; file placement now
  lists cloudflare/, local/, cli/; document summary, motel, test:* scripts,
  the viewer/ SPA, pr-media, and the Windows desktop/cli VM targets.
- e2e dev CLI status: probe the app URL before reporting ready (a zombie
  runner with a dead server used to read as healthy), and only parse real
  state files in .dev/ (cloud.journey.json rendered as a garbage DEAD line).
- CI: run the cloud and selfhost e2e projects on every PR/push with failure
  artifacts (trace.zip, session.mp4, step screenshots) uploaded per target.

* Fix the MCP regressions and policy gaps the e2e suite caught

Cloud (hibernatable MCP DO rework fallout):
- server.ts no longer gates MCP dispatch behind the Axiom tracer install: with
  AXIOM_TOKEN unset (any dev boot without motel) every /mcp request fell
  through to the SPA router and 404ed.
- agent-handler mounts a second serve() on /mcp/toolkits/:slug — the agents
  SDK builds an exact-match URLPattern, so the single /mcp handler never saw
  toolkit paths.
- Restore the old envelope's transport contract: JSON-RPC 405 for verbs
  outside GET/POST/DELETE/OPTIONS (was a bare 404), 200 for session DELETE
  (agents SDK answers 204), and a reconnect-worded 404 for requests that
  race a condemned DO's abort.

Selfhost (org-scoped MCP OAuth discovery):
- The org-segment strip middleware now carries the original pathname in an
  internal header, and the protected-resource metadata echoes it, so a client
  that dialed /<org>/mcp/... passes the MCP SDK's RFC 9728 resource check.
  Bare paths are untouched; the header is stripped from unrewritten requests.

Microsoft Graph URL policy:
- microsoftHttpPlugin gains the hosts' local-network dev posture: selfhost,
  cloud, and the cloudflare host thread allowLocalNetwork into
  allowUnsafeUrlOverrides, and the override now also admits plain-http
  loopback URLs (local emulators). Production behavior is unchanged: the
  flag is unset there, and non-loopback http stays rejected even with it.

Stale e2e assertion refreshed for an intentional product change:
- tool-descriptions: the execute inventory is names-only since the skills
  tool slimming; drop the per-connection description assertions.

* test(e2e): repair self-host scenarios and gate the suite in CI

The self-host e2e project never ran in CI, so it drifted red while the app
moved on. Repair the failing scenarios (stale connect-modal selectors, a racy
action-bar position read, a shared-admin connection-count assertion, a
multi-tenant-only org-slug 404 step, and a cloud-shaped toolkit MCP URL), add a
documented skip affordance to the scenario helper, and quarantine the two
Microsoft emulator scenarios that need a canonical block-YAML Graph spec
(tracked separately).

Cherry-picked from origin/fix-selfhost-e2e-and-ci (PR #1239); its CI job is
superseded by the cloud+selfhost matrix job already on this branch.

* test(e2e): quarantine the two agents-SDK transport gaps

Both are real gaps in the hibernatable Agent bridge (standalone SSE
supersede never resolves; response routing scopes JSON-RPC ids per
session instead of per stream), not regressions on this branch. Skip
with reasons so the suite gates CI while the gaps stay visible;
fixing the bridge is tracked separately.

* test(e2e): repair or quarantine the cloud scenarios that drifted on main

The cloud e2e project never gated CI either, so ten scenarios rotted.
Refresh the four whose product behavior moved intentionally:
- connect-card-ssr-origin: install URLs are org-slug-scoped since the
  org-slug console URLs change (#974); accept the slug form.
- connection-owner-isolation: /api/auth/switch-organization was deleted
  with cookie-based org switching (#1000); switch orgs the way the web
  client does, via the x-executor-organization selector header.
- oauth-connections: the popup-state fix (#1235) envelopes the callback
  state as base64url JSON; decode it and assert the inner state + orgSlug.
- unauthenticated-skeleton: the 404 page shipped as a standalone page in
  the same commit as the shell-framed assertion (#986); assert the page
  it actually renders.

Quarantine the six that need product/harness work, each with a reason:
mcp-browser-approval-org-scope + the two browser-approval scenarios
(cloud-only: the mcporter browser-approval completion never lands),
cli-device-login (device-flow terminal never reaches the emulator), and
run-panel-auto-approve (autoApprove leaves the run paused; never green
since the feature landed in #1183).

* lint: suppress the adapter-boundary error checks in the MCP agent handler

The condemned-DO abort surfaces as a plain runtime Error thrown out of the
agents SDK's serve.fetch; its message string is the only signal. Narrow
suppressions with boundary reasons, per the typed-errors skill.

* test(e2e): quarantine the seat-limit scenario on the emulate 0.9.0 Autumn gap

emulate 0.9.0's Autumn customer balances omit the expanded feature object
autumn-js asserts, so useCustomer crashes the org page into the error
boundary. Fixed upstream in UsefulSoftwareCo/emulate#8 (0.9.1); unskip
once the publish lands and the e2e dependency is bumped.

* ci: retrigger

* ci: shard the cloud e2e job so each shard gets a fresh dev stack

A full-suite run against one long-lived cloud dev server degrades partway
through: sign-in starts refusing connections and everything after fails
with fetch errors (the same SSE/OTel memory growth being instrumented on
main). Four shards, each booting its own stack, stay under the threshold.
Re-merge into one job once the leak is fixed.

* ci: split the cloud e2e job into eight shards

Four shards still hit the dev-server degradation a few minutes in on
2-core runners; eight keeps each stack's lifetime under the threshold.

* ci: retry flaky browser scenarios twice on the same stack

The remaining shard failures are scattered single-test Playwright
waitFor timeouts on 2-core runners, not systemic stack death; vitest
--retry clears them without hiding real regressions (a consistent
failure still fails after 3 attempts).

* test(e2e): quarantine the Graph default-add scenario on CI runners

Compiling the Graph spec inside dev workerd 500s on 2-core GitHub
runners and takes the dev stack down for every scenario after it in the
shard (the auth-hint/org-slug/docs-link failures in the same shard were
all downstream of this). Local runs are unaffected; skip only under CI.

* selfhost: read the local-network posture from env in the plugins seam

plugins() runs per request; loadConfig() does filesystem work (data
dir, secret key resolution) that should not ride the request path. The
env read is the same computation loadConfig makes for the flag.

* e2e: bump @executor-js/emulate to 0.10.0, unskip the seat-limit scenario

0.10.0 ships the Autumn balances.feature expansion autumn-js asserts
(UsefulSoftwareCo/emulate#8), so the org page renders again and the
scenario passes.