feat(control-plane): plan-mode HITL gate (storage + API + sandbox preamble)#671
feat(control-plane): plan-mode HITL gate (storage + API + sandbox preamble)#671bleleve wants to merge 8 commits into
Conversation
…amble)
Adds the backend for plan-mode: a human-in-the-loop planning gate where
the agent persists a markdown plan that must be approved before any
code-changing turn runs. This PR is headless — no UI or bot triggers
yet (those follow as separate PRs in the stack).
Changes:
- New `plans` table in the SessionDO SQLite (monotonic versions per
session); `PlanService` + `plans.handler` covering save / get / list
/ approve / reject.
- New endpoints under `/sessions/:id/plan{,/approve,/reject}` and
`/sessions/:id/plans` (list).
- `SessionMessageQueue` reads `getCurrentPlan()` and attaches
`resumeContext.currentPlan` to dispatched `PromptCommand`s. The
planning-turn gate is terminal-status aware (approved/rejected exits
plan mode without flipping `plan_mode` so the history bubble stays
visible).
- Sandbox `bridge.py` builds a restate-and-confirm preamble from
`resumeContext.currentPlan` and prepends it to the user content
before forwarding to OpenCode.
- New `wrapUntrustedContent` helper in `@open-inspect/shared` wraps
plan + user content in XML tags from the sandbox runtime (security:
isolates plan-as-context from prompt-injection inside user messages).
- Shared `model-defaults.ts` ships the `fetchModelDefaults` helper —
used by bots in subsequent PRs to source the default model + default
plan model from a single place (the control plane). Falls back to
env vars + shared constants until the `/model-preferences` endpoint
is extended (next PR in the stack).
- `MODEL_ALIAS_MAP` and `DEFAULT_PLAN_MODEL` added to shared models.
Tests:
- Unit tests for `PlanService`, `plans.handler`, plan-mode behavior in
`MessageQueue`, plan persistence in the repository.
- Sandbox-runtime tests for `bridge.py` preamble + XML wrapping.
- All test fixtures updated to include the new required `plan_*`
fields on `SessionRow`.
Verification: `npm run typecheck` (workspace), `npm run lint`, `npm test`
(shared 183/183, control-plane 1148/1148), `pytest` (sandbox-runtime
plan-related tests 41/41) — all green on this branch based on upstream/main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a plan-first human-in-the-loop workflow: persisted versioned plans, approval/rejection endpoints and handlers, PlanService, repository CRUD and migrations, router and SessionDO wiring, message-queue dispatch gating with resume context, sandbox plan buffering/preamble/escape logic, shared utilities, and tests. ChangesPlan-Mode HITL Workflow
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (6)
packages/control-plane/src/session/http/handlers/session-lifecycle.handler.ts (1)
116-120: ⚡ Quick winConsider logging when planModel falls back to default and simplify redundant validation call.
The planModel derivation logic differs from the main model validation pattern:
Redundant validation: Line 118 calls
getValidModelOrDefault(body.planModel)afterisValidModel(body.planModel)has already confirmed it's valid. Since the model is already validated, this call just returnsbody.planModelunchanged—you could simplify tobody.planModeldirectly.Missing warning log: Lines 105-111 log a warning when the main model is invalid and falls back to a default. For consistency and easier debugging, consider logging when
planModeis enabled butbody.planModelis invalid/missing andDEFAULT_PLAN_MODELis used.♻️ Proposed refactor for consistency and clarity
-const planModel = planMode - ? body.planModel && isValidModel(body.planModel) - ? getValidModelOrDefault(body.planModel) - : DEFAULT_PLAN_MODEL - : null; +let planModel: string | null = null; +if (planMode) { + if (body.planModel && isValidModel(body.planModel)) { + planModel = body.planModel; + } else { + planModel = DEFAULT_PLAN_MODEL; + if (body.planModel) { + deps.getLog().warn("Invalid plan model name, using default", { + requested_plan_model: body.planModel, + default_plan_model: planModel, + }); + } + } +}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/session/http/handlers/session-lifecycle.handler.ts` around lines 116 - 120, The planModel assignment should avoid the redundant getValidModelOrDefault call and log when falling back: if planMode is true and isValidModel(body.planModel) is true, set planModel to body.planModel (not getValidModelOrDefault); if planMode is true but body.planModel is missing/invalid, set planModel to DEFAULT_PLAN_MODEL and emit the same kind of warning used earlier (use the existing logger.warn/logger.warn-like call pattern in this file) indicating planModel fell back to DEFAULT_PLAN_MODEL so debugging is consistent with the main model logic.packages/control-plane/src/types.ts (1)
94-94: ⚡ Quick winUse the default constant name in timeout comments, not a literal.
Replace the literal default value in this comment with the named default constant used by the resolver path.
As per coding guidelines, "Don't restate literal timeout values in comments; instead write 'Defaults to DEFAULT_CONSTANT_NAME'".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/types.ts` at line 94, The comment on the SANDBOX_INACTIVITY_TIMEOUT_MS property currently repeats a literal timeout; update it to reference the resolver's named default constant instead (e.g. "Defaults to DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS") so it matches the resolver's constant name; locate SANDBOX_INACTIVITY_TIMEOUT_MS in types.ts and replace the literal "900000 = 15 min" with the appropriate DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS constant name used by the resolver path.packages/control-plane/src/session/types.ts (1)
99-110: ⚡ Quick winUse a single source of truth for
PlanSource.
PlanSourceis redeclared here even though plan source is part of shared contracts. Reuse the shared type to prevent drift between control-plane DB row typing and API contracts.♻️ Proposed refactor
import type { Attachment, SessionStatus, SandboxStatus, GitSyncStatus, MessageStatus, MessageSource, ParticipantRole, SpawnSource, ArtifactType, EventType, PlanApprovalStatus, } from "../types"; +import type { PlanSource } from "`@open-inspect/shared`"; @@ -export type PlanSource = "api" | "agent" | "web";🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/session/types.ts` around lines 99 - 110, The file defines a local PlanSource type that duplicates the shared API contract; remove the local PlanSource declaration and import the canonical PlanSource type from the shared contracts package, then update the PlanRow interface to use that imported PlanSource. Specifically, delete the local "export type PlanSource = \"api\" | \"agent\" | \"web\";" and add an import for the shared PlanSource type (the same type used by your API contracts), ensuring PlanRow.source references the imported symbol instead of the local one.packages/control-plane/src/session/repository.test.ts (1)
77-115: ⚡ Quick winAdd a
planMode: trueassertion path forupsertSession.This only verifies the default path. Add one case that passes
planMode: true(andplanModel) so Line 109–Line 111 behavior is guarded for both branches.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/session/repository.test.ts` around lines 77 - 115, Add a new test case for upsertSession that covers the planMode=true branch: call repo.upsertSession with planMode: true and planModel set (e.g., "some-plan-model"), then assert mock.calls length increased and that mock.calls[...].params contains plan_mode set to 1 and plan_model equal to the provided value (and plan_approval_status remains the expected default/null). Target the upsertSession invocation and parameter assertion logic around the existing test (the repo.upsertSession call and the params assertion that currently checks plan_mode/plan_approval_status/plan_model).packages/control-plane/src/session/services/plan.service.test.ts (1)
30-100: ⚡ Quick winAdd regression coverage for dedup + size boundary paths in
savePlan.Given the gating behavior, add tests for: (1) same content with
messageId: nullshould still create a new version, and (2)MAX_PLAN_CONTENT_BYTESboundary acceptance/rejection.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/session/services/plan.service.test.ts` around lines 30 - 100, Add two regression tests for PlanService.savePlan: (1) verify that when saving identical content with messageId: null the service does not dedupe and still calls repository.savePlan to create a new version (use createService(), call service.savePlan with messageId: null, mock repository.savePlan and assert it was called and response indicates a new plan/version), and (2) verify MAX_PLAN_CONTENT_BYTES boundary behavior by importing/using the MAX_PLAN_CONTENT_BYTES constant and asserting that content whose byte length equals the limit is accepted (calls repository.savePlan) while content exceeding the limit throws/rejects and does not call repository.savePlan; use vi.mocked(repository.savePlan) to observe calls and expect(...) toCheck throws for the overflow case. Ensure tests reference service.savePlan, repository.savePlan, and MAX_PLAN_CONTENT_BYTES so they remain locatable.packages/control-plane/src/router.ts (1)
2209-2230: ⚡ Quick winRemove
as neverand typeinternalPathdirectly.Line 2229 is currently bypassing route-path type checks. Typing
internalPathasSessionInternalPathkeeps this helper compile-safe.Suggested refactor
-import { buildSessionInternalUrl, SessionInternalPaths } from "./session/contracts"; +import { + buildSessionInternalUrl, + SessionInternalPaths, + type SessionInternalPath, +} from "./session/contracts"; ... async function forwardPlanApproval( request: Request, env: Env, match: RegExpMatchArray, ctx: RequestContext, - internalPath: string + internalPath: SessionInternalPath ): Promise<Response> { ... - buildSessionInternalUrl(internalPath as never), + buildSessionInternalUrl(internalPath),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/router.ts` around lines 2209 - 2230, The call to buildSessionInternalUrl currently bypasses typing by using "as never"; update forwardPlanApproval to accept internalPath as the proper SessionInternalPath type (replace the untyped/internalPath parameter with internalPath: SessionInternalPath) and remove the "as never" cast when calling buildSessionInternalUrl so the function and buildSessionInternalUrl(type SessionInternalPath) are compile-time type-checked; ensure any callers pass a SessionInternalPath-compatible value or are adjusted to construct one.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/PLAN_MODE.md`:
- Line 133: The fenced code block in the architecture example that begins with a
bare triple-backtick should include a language identifier to satisfy
markdownlint MD040; update the opening fence for that block (the block
containing the lines starting with "`@mention` / label / web prompt" and ending
with "approve / reject / amend ──▶ next prompt runs as build") to use a
language tag such as "text" (i.e., change the opening ``` to ```text) so the
markdownlint rule is satisfied.
In `@packages/control-plane/README.md`:
- Around line 75-77: The README wording incorrectly implies message queueing is
paused in plan mode; update the sentence that begins "When a session opts into
plan mode..." to clarify that prompts continue to queue and are processed, but
are dispatched as planning turns (i.e., dispatch behavior changes) until the
user approves, rejects, or amends; reference the PLAN_MODE.md workflow and
replace "the message queue is gated" with language like "prompts continue to
queue and are processed but are dispatched as planning turns until
approval/rejection/amendment" to make the behavior explicit.
In `@packages/control-plane/src/session/durable-object.ts`:
- Line 779: Define a named constant for the default inactivity timeout (e.g.,
DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS = 900000) and use it as the fallback
instead of the hardcoded string; update the timeout assignment in
durable-object.ts so timeoutMs: parseInt(this.env.SANDBOX_INACTIVITY_TIMEOUT_MS
|| String(DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS), 10). Place the constant near
the top of the module (and export it if other modules need it) so the value is
defined exactly once and referenced everywhere.
In `@packages/control-plane/src/session/http/handlers/plans.handler.ts`:
- Around line 156-165: The current readApprovalBody function swallows JSON parse
errors and returns {}, allowing malformed bodies to proceed; instead, on parse
failure it should fail closed by throwing a specific error (e.g., an Error with
a clear message or a custom InvalidApprovalBodyError) rather than returning an
empty object. Keep the log.warn call, then throw the error so the caller of
readApprovalBody can catch it and map it to a 400 via the existing
errorResponseForApproval(...) handling; update any callers if necessary to
handle the thrown error path.
In `@packages/control-plane/src/session/services/plan.service.ts`:
- Around line 117-123: The deduplication check currently allows requestMessageId
to be null which prevents re-saving identical plans; update the guard in the
function that computes requestMessageId so the dedupe branch only runs when
requestMessageId is a concrete ID (e.g., !== null / !== undefined); specifically
add a condition to the existing if that checks requestMessageId is present
before returning { plan: toResponse(previous), approvalGated: false, deduped:
true } so identical content with a null message id will not be deduped.
In `@packages/sandbox-runtime/src/sandbox_runtime/bridge.py`:
- Around line 769-773: The code is appending every cumulative token snapshot
into text_buffer causing duplicated prefixes; inside the plan_mode handling for
token events in the method that calls self._send_event, instead of appending
token_text to text_buffer use the latest snapshot (e.g., replace the last buffer
entry or set text_buffer to contain only token_text) or reconstruct the full
text from deltas before storing; specifically modify the branch that checks
plan_mode and event.get("type") == "token" so that token_text (from
event.get("content")) overwrites the stored cumulative text rather than being
appended, ensuring text_buffer reflects only the most recent cumulative token
content.
- Around line 639-643: The code embeds untrusted plan text (currentPlan.content
/ plan_content) directly into the XML wrappers (<saved_plan> and
<previous_plan>), allowing injected closing tags to break XML boundaries; escape
XML special chars before interpolation. Locate the places building the XML
fragments (the return that composes "<resume_context>...<saved_plan>" and the
similar "<previous_plan>" construction) and replace direct interpolation of
plan_content/currentPlan.content with an escaped version (escape &, <, >, " and
') using a standard utility (e.g., xml.sax.saxutils.escape or html.escape) so
the content is safe to embed in the saved_plan/previous_plan elements. Ensure
both occurrences (the saved_plan and previous_plan construction sites) are
fixed.
- Line 844: Replace the hardcoded 30.0 timeout in the HTTP post call with a
named constant (e.g., PLAN_SAVE_TIMEOUT_SECONDS) defined once at module scope
(or a shared timeouts module) and use that constant in the call: change resp =
await self.http_client.post(url, json=payload, headers=headers,
timeout=PLAN_SAVE_TIMEOUT_SECONDS). Ensure the constant name encodes the unit
(seconds) and is exported/imported where needed so all calls (including the post
in the method containing resp = await self.http_client.post(...)) reference the
single canonical timeout value.
In `@packages/shared/src/prompt-safety.ts`:
- Around line 51-60: The current escapedContent replacement sequence on content
only matches exact literal tags and misses whitespace and case variants; update
the neutralization to use regular-expression replacements on content (instead of
replaceAll with literals) to match opening and closing tags with optional
internal whitespace and case-insensitively (e.g., patterns targeting
"<\s*user_content" and "<\s*/\s*user_content\s*>", with the i flag) and perform
the two-pass escaping strategy on those regex matches; apply these regex-based
replacements where escapedContent is created so all variants like
"</user_content >", "<USER_CONTENT>", or mixed-case/whitespace forms are
properly escaped.
---
Nitpick comments:
In `@packages/control-plane/src/router.ts`:
- Around line 2209-2230: The call to buildSessionInternalUrl currently bypasses
typing by using "as never"; update forwardPlanApproval to accept internalPath as
the proper SessionInternalPath type (replace the untyped/internalPath parameter
with internalPath: SessionInternalPath) and remove the "as never" cast when
calling buildSessionInternalUrl so the function and buildSessionInternalUrl(type
SessionInternalPath) are compile-time type-checked; ensure any callers pass a
SessionInternalPath-compatible value or are adjusted to construct one.
In
`@packages/control-plane/src/session/http/handlers/session-lifecycle.handler.ts`:
- Around line 116-120: The planModel assignment should avoid the redundant
getValidModelOrDefault call and log when falling back: if planMode is true and
isValidModel(body.planModel) is true, set planModel to body.planModel (not
getValidModelOrDefault); if planMode is true but body.planModel is
missing/invalid, set planModel to DEFAULT_PLAN_MODEL and emit the same kind of
warning used earlier (use the existing logger.warn/logger.warn-like call pattern
in this file) indicating planModel fell back to DEFAULT_PLAN_MODEL so debugging
is consistent with the main model logic.
In `@packages/control-plane/src/session/repository.test.ts`:
- Around line 77-115: Add a new test case for upsertSession that covers the
planMode=true branch: call repo.upsertSession with planMode: true and planModel
set (e.g., "some-plan-model"), then assert mock.calls length increased and that
mock.calls[...].params contains plan_mode set to 1 and plan_model equal to the
provided value (and plan_approval_status remains the expected default/null).
Target the upsertSession invocation and parameter assertion logic around the
existing test (the repo.upsertSession call and the params assertion that
currently checks plan_mode/plan_approval_status/plan_model).
In `@packages/control-plane/src/session/services/plan.service.test.ts`:
- Around line 30-100: Add two regression tests for PlanService.savePlan: (1)
verify that when saving identical content with messageId: null the service does
not dedupe and still calls repository.savePlan to create a new version (use
createService(), call service.savePlan with messageId: null, mock
repository.savePlan and assert it was called and response indicates a new
plan/version), and (2) verify MAX_PLAN_CONTENT_BYTES boundary behavior by
importing/using the MAX_PLAN_CONTENT_BYTES constant and asserting that content
whose byte length equals the limit is accepted (calls repository.savePlan) while
content exceeding the limit throws/rejects and does not call
repository.savePlan; use vi.mocked(repository.savePlan) to observe calls and
expect(...) toCheck throws for the overflow case. Ensure tests reference
service.savePlan, repository.savePlan, and MAX_PLAN_CONTENT_BYTES so they remain
locatable.
In `@packages/control-plane/src/session/types.ts`:
- Around line 99-110: The file defines a local PlanSource type that duplicates
the shared API contract; remove the local PlanSource declaration and import the
canonical PlanSource type from the shared contracts package, then update the
PlanRow interface to use that imported PlanSource. Specifically, delete the
local "export type PlanSource = \"api\" | \"agent\" | \"web\";" and add an
import for the shared PlanSource type (the same type used by your API
contracts), ensuring PlanRow.source references the imported symbol instead of
the local one.
In `@packages/control-plane/src/types.ts`:
- Line 94: The comment on the SANDBOX_INACTIVITY_TIMEOUT_MS property currently
repeats a literal timeout; update it to reference the resolver's named default
constant instead (e.g. "Defaults to DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS") so
it matches the resolver's constant name; locate SANDBOX_INACTIVITY_TIMEOUT_MS in
types.ts and replace the literal "900000 = 15 min" with the appropriate
DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS constant name used by the resolver path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 63e19c3a-0446-44e4-bf48-fccca934842e
📒 Files selected for processing (39)
docs/HOW_IT_WORKS.mddocs/PLAN_MODE.mdpackages/control-plane/README.mdpackages/control-plane/src/router.tspackages/control-plane/src/sandbox/lifecycle/manager.test.tspackages/control-plane/src/session/contracts.tspackages/control-plane/src/session/durable-object.tspackages/control-plane/src/session/http/handlers/child-sessions.handler.test.tspackages/control-plane/src/session/http/handlers/plans.handler.test.tspackages/control-plane/src/session/http/handlers/plans.handler.tspackages/control-plane/src/session/http/handlers/pull-request.handler.test.tspackages/control-plane/src/session/http/handlers/session-lifecycle.handler.test.tspackages/control-plane/src/session/http/handlers/session-lifecycle.handler.tspackages/control-plane/src/session/http/routes.test.tspackages/control-plane/src/session/http/routes.tspackages/control-plane/src/session/initialize.tspackages/control-plane/src/session/message-queue.test.tspackages/control-plane/src/session/message-queue.tspackages/control-plane/src/session/openai-token-refresh-service.test.tspackages/control-plane/src/session/pull-request-service.test.tspackages/control-plane/src/session/repository.test.tspackages/control-plane/src/session/repository.tspackages/control-plane/src/session/schema.tspackages/control-plane/src/session/services/plan.service.test.tspackages/control-plane/src/session/services/plan.service.tspackages/control-plane/src/session/types.tspackages/control-plane/src/types.tspackages/control-plane/src/utils/models.tspackages/sandbox-runtime/src/sandbox_runtime/bridge.pypackages/sandbox-runtime/tests/test_bridge_message_tracking.pypackages/sandbox-runtime/tests/test_bridge_resume_context.pypackages/shared/src/index.tspackages/shared/src/model-defaults.test.tspackages/shared/src/model-defaults.tspackages/shared/src/models.tspackages/shared/src/prompt-safety.test.tspackages/shared/src/prompt-safety.tspackages/shared/src/types/index.tspackages/web/src/lib/session-list.test.ts
…eMurray#671 Must-fix: - bridge.py: escape XML special chars in plan content before interpolating into <saved_plan> and <previous_plan> wrappers (1.1) — prevents a hostile `</saved_plan>` in the plan body from breaking out of the wrapper and injecting instructions outside it. - bridge.py: replace the cumulative token buffer's append with overwrite semantics (1.2) — OpenCode token events carry the FULL accumulated text since the response start (not incremental deltas), so appending produced corrupted plan bodies with compounded prefixes at end-of-turn. - plans.handler.ts: readApprovalBody now throws InvalidApprovalBodyError on JSON parse failure instead of silently coercing the body to {} (1.3) — errorResponseForApproval maps it to HTTP 400 so malformed client requests surface instead of leaking through with partial parameters. Should-fix: - plan.service.ts: dedup guard now requires non-null messageId (1.4) — null is the "no message context" marker, so two identical-body events with null messageId are legitimately distinct saves and must each bump the version. - session/types.ts: import PlanSource from @open-inspect/shared (1.5) instead of redeclaring it locally — prevents contract drift between the control plane row type and the shared API surface. - durable-object.ts: extract DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS module constant (1.6); also cleans up a stray /** doc-comment opener left over from an earlier opencode-config strip. - bridge.py: extract PLAN_SAVE_TIMEOUT_SECONDS module constant (1.7). - prompt-safety.ts: <user_content> tag neutralization is now case- insensitive and whitespace-tolerant (1.8) — catches `<USER_CONTENT>`, `< user_content >`, `</ user_content >`, attribute-bearing tags, etc. Nitpicks: - session-lifecycle.handler.ts: remove redundant getValidModelOrDefault after validation; add log.warn when planMode=true but planModel is invalid and we fall back to DEFAULT_PLAN_MODEL (1.9). - router.ts: type forwardPlanApproval's internalPath parameter with SessionInternalPath instead of bypassing via `as never` (1.10). - control-plane/src/types.ts: reference DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS in the SANDBOX_INACTIVITY_TIMEOUT_MS env-var comment (1.11). - docs/PLAN_MODE.md: add `text` language tag to architecture diagram fence to satisfy MD040 (1.12). - packages/control-plane/README.md: clarify Plan Mode behavior — prompts continue to queue and dispatch as *planning turns* until approval/rejection/amendment (not "the message queue is gated") (1.13). Test coverage gaps: - repository.test.ts: new case for upsertSession({ planMode: true, planModel }) asserting plan_mode=1 and plan_model are persisted (1.14). - plan.service.test.ts: new cases for null-messageId dedup (skipped guard) and MAX_PLAN_CONTENT_BYTES boundary (accept-at-limit, throw- over-limit) (1.15). Targeted regression tests: - test_bridge_resume_context.py: assertions that resume + planning preambles escape XML specials in the plan body and that token-buffer overwrite semantics produce a single non-duplicated snapshot. - plans.handler.test.ts: assertions that malformed-JSON bodies on approve/reject return HTTP 400 with code "invalid_body". Verification: npm run build -w @open-inspect/shared && npm run lint && npm run typecheck && npm test (shared 16 files, control-plane 65 files all green) && pytest (41 + 4 new regression cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/control-plane/src/session/http/handlers/plans.handler.ts (1)
107-125:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't clear the session reasoning effort when no model override is provided.
Line 114 falls back to
""as the validation target, and Line 124 passes the resultingnullintoapprovePlan. In the service,nullis treated as an explicit update, so an approval request that only sendsimplementationReasoningEffortwipes the existingreasoning_effortinstead of leaving it unchanged or rejecting the input.Proposed fix
let implementationReasoningEffort: string | null | undefined = undefined; if (body.implementationReasoningEffort !== undefined) { - const target = implementationModel ?? ""; - implementationReasoningEffort = deps.validateReasoningEffort( - target, - body.implementationReasoningEffort ?? undefined - ); + if (body.implementationReasoningEffort === null) { + implementationReasoningEffort = null; + } else if (implementationModel) { + implementationReasoningEffort = deps.validateReasoningEffort( + implementationModel, + body.implementationReasoningEffort + ); + } else { + return Response.json( + { + error: "implementationReasoningEffort requires implementationModel", + code: "invalid_reasoning_effort", + }, + { status: 400 } + ); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/session/http/handlers/plans.handler.ts` around lines 107 - 125, The current logic validates reasoning effort even when no implementationModel override is provided, producing a null that gets passed into approvePlanAndFlush and unintentionally clears the persisted reasoning_effort; change the branch so that you only call deps.validateReasoningEffort when implementationModel is non-null/defined: if body.implementationReasoningEffort is set and implementationModel is present, set implementationReasoningEffort = deps.validateReasoningEffort(implementationModel, body.implementationReasoningEffort), otherwise leave implementationReasoningEffort as undefined so approvePlanAndFlush does not treat it as an explicit update.
🧹 Nitpick comments (1)
packages/control-plane/src/session/durable-object.ts (1)
99-105: ⚡ Quick winAlign timeout comment wording with repository guideline.
Line 104 restates the literal duration (
// 15 minutes). Prefer comment text that references the default constant rather than restating the numeric value.As per coding guidelines, "Don't restate literal timeout values in comments; instead write 'Defaults to DEFAULT_CONSTANT_NAME'."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/control-plane/src/session/durable-object.ts` around lines 99 - 105, The inline comment currently restates the numeric duration for DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS; update the trailing comment to follow repo guideline by removing the literal "15 minutes" and instead indicate the default constant, e.g. change "// 15 minutes" to "// Defaults to DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS" so the comment references the constant name (DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS) rather than repeating the numeric timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/control-plane/src/session/durable-object.ts`:
- Around line 786-789: The timeoutMs assignment may produce NaN for malformed
SANDBOX_INACTIVITY_TIMEOUT_MS; update the logic where timeoutMs is set
(reference: timeoutMs, this.env.SANDBOX_INACTIVITY_TIMEOUT_MS,
DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS) to parse the env var, check
Number.isFinite(value) && value > 0, and if that fails use
DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS so lifecycle timing always receives a
positive finite number.
---
Outside diff comments:
In `@packages/control-plane/src/session/http/handlers/plans.handler.ts`:
- Around line 107-125: The current logic validates reasoning effort even when no
implementationModel override is provided, producing a null that gets passed into
approvePlanAndFlush and unintentionally clears the persisted reasoning_effort;
change the branch so that you only call deps.validateReasoningEffort when
implementationModel is non-null/defined: if body.implementationReasoningEffort
is set and implementationModel is present, set implementationReasoningEffort =
deps.validateReasoningEffort(implementationModel,
body.implementationReasoningEffort), otherwise leave
implementationReasoningEffort as undefined so approvePlanAndFlush does not treat
it as an explicit update.
---
Nitpick comments:
In `@packages/control-plane/src/session/durable-object.ts`:
- Around line 99-105: The inline comment currently restates the numeric duration
for DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS; update the trailing comment to follow
repo guideline by removing the literal "15 minutes" and instead indicate the
default constant, e.g. change "// 15 minutes" to "// Defaults to
DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS" so the comment references the constant
name (DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS) rather than repeating the numeric
timeout.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90e6ac7f-9b0a-4264-bc44-de33c465bfe2
📒 Files selected for processing (15)
docs/PLAN_MODE.mdpackages/control-plane/README.mdpackages/control-plane/src/router.tspackages/control-plane/src/session/durable-object.tspackages/control-plane/src/session/http/handlers/plans.handler.test.tspackages/control-plane/src/session/http/handlers/plans.handler.tspackages/control-plane/src/session/http/handlers/session-lifecycle.handler.tspackages/control-plane/src/session/repository.test.tspackages/control-plane/src/session/services/plan.service.test.tspackages/control-plane/src/session/services/plan.service.tspackages/control-plane/src/session/types.tspackages/control-plane/src/types.tspackages/sandbox-runtime/src/sandbox_runtime/bridge.pypackages/sandbox-runtime/tests/test_bridge_resume_context.pypackages/shared/src/prompt-safety.ts
✅ Files skipped from review due to trivial changes (2)
- packages/control-plane/README.md
- docs/PLAN_MODE.md
…#671 Three items flagged on the first fix commit (a2cad10): 🟠 Major (plans.handler.ts) — Don't clear reasoning effort when no model override is provided. Previously the approve handler validated the user's implementationReasoningEffort against `""` when no model override was set; validateReasoningEffort returned null, which then propagated to approvePlanAndFlush. The service treats null as an explicit "clear", so an approval request that sent reasoning effort without a model silently wiped the session's persisted reasoning_effort. New semantics: - undefined (omitted) → no change - explicit null → forwarded as null (intentional clear) - string + model → validated against the model - string, no model → return 400 with code "invalid_reasoning_effort" Two new regression cases in plans.handler.test.ts. ⚡ Quick win (durable-object.ts) — NaN-guard the inactivity timeout parse. parseInt(env_value, 10) returns NaN when the env var is set to a non-numeric string (e.g. "abc" or "5 minutes"); the lifecycle manager would then schedule alarms with NaN ms, which is undefined behavior. New parseSandboxInactivityTimeoutMs() helper checks Number.isFinite and value > 0 before accepting the env value, falling back to DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS otherwise. ⚡ Style (durable-object.ts) — Drop the "// 15 minutes" trailing comment on DEFAULT_SANDBOX_INACTIVITY_TIMEOUT_MS. Per repo guideline, don't restate literal timeout values in comments — the constant name is the source of truth. Verification: npm run typecheck, npm run lint, npm test -w @open-inspect/control-plane (65/65 files, all green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit follow-up review on ColeMurray#672 (the only thread that didn't auto- resolve after the previous fix pushes). The issue technically lives in ColeMurray#671's plans.handler.ts but CodeRabbit only flagged it now after seeing the post-fix state. `JSON.parse("null")` returns null, `JSON.parse("[1,2]")` returns an array, `JSON.parse("42")` returns a number. All three are syntactically valid JSON but none of them is the object payload approvePlan/rejectPlan expect. The previous code would parse successfully then crash later when dereferencing `body.implementationModel` on `null` (TypeError) or silently accept arrays and primitives as if they were valid bodies. Reject these early with HTTP 400 (`code: "invalid_body"`) via the existing InvalidApprovalBodyError path. Two new regression cases cover JSON-null and JSON-array bodies. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (65/65 files, 2 new test cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…leMurray#671 Two new items flagged after the previous fix pushes: 🟠 plans.handler.ts — Reject invalid implementationReasoningEffort. validateReasoningEffort() returns null for unsupported values, but null is also our explicit "clear the persisted effort" sentinel. A request with a typo (e.g. "hgih" instead of "high") was previously accepted and silently cleared the session's reasoning_effort. Now we distinguish the two cases: explicit-null body stays null (intentional clear), but a string that fails validation returns 400 with code "invalid_reasoning_effort" instead of being coerced. 🟡 bridge.py — Use quoteattr for the version XML attribute value. xml_escape() does NOT escape `"`, so a version string containing a quote could break the attribute boundary of <saved_plan version=...> and <previous_plan version=...>. Switching to xml.sax.saxutils. quoteattr (which handles the surrounding quotes itself) closes that edge case. Body content keeps xml_escape since it's not in an attribute context. Verification: npm run typecheck && npm test -w @open-inspect/control-plane (66/66 files green) && pytest tests/test_bridge_resume_context.py (19 cases green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a plan is approved, the control-plane now enqueues a synthetic "Implement the approved plan vN..." prompt server-side and flushes the queue, so every client (web + Slack/Linear/GitHub bots) starts the implementation turn through the same code path. Previously only the web client did this (via a follow-up WebSocket prompt after the approve call). Approvals coming from bots stalled at "Execution complete" right after the plan was generated — Build cost stayed at $0 — because nothing was kicking off the build turn. The synthetic prompt is authored by a stable per-session "system" participant (created lazily on first dispatch) and uses the new "system" MessageSource so the UI and event log can distinguish it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
enqueuePromptFromApi already flushes the message queue at the end of its own path, so the implementation-prompt enqueue done in onDispatchImplementationPrompt picks up both the synthetic prompt and any user messages that piled up during awaiting_approval. The separate onPlanApproved callback that fired a second processMessageQueue was a no-op (queue already busy) and the comment claiming it flushed the synthetic prompt was misleading. Remove the dep and the wiring. Addresses reef review on ColeMurray#65. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D1Tables Addresses CodeRabbit review on ColeMurray#672. Control-plane integration tests run under isolatedStorage=false (workers-sdk SQLite WAL cleanup bug), so D1 state can leak across files when each suite forgets to clean. Match the pattern used by other integration suites and reset D1 in beforeEach. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds a backend HITL planning gate: when a session is started in plan-mode, the agent proposes a markdown plan that must be approved before any code-changing turn runs. The plan persists across turns so a fresh prompt re-anchors on it instead of relying on compactable conversational memory.
This PR is headless — no UI or bot trigger yet. Those follow as separate PRs in the stack (see below). The full feature was developed and validated end-to-end on a fork; this stack splits it into reviewable pieces.
What's in this PR
Control plane
planstable in the SessionDO SQLite (monotonic versions per session)PlanService+plans.handlercovering save / get / list / approve / rejectPOST/GET /sessions/:id/plan,GET /sessions/:id/plans,POST /sessions/:id/plan/{approve,reject}SessionMessageQueueplanning-turn gate: readsgetCurrentPlan()and attachesresumeContext.currentPlanto dispatchedPromptCommands; terminal-status aware (approved/rejected exits plan mode without flippingplan_modeso the history bubble stays visible)plan_mode,plan_model,plan_approval_status,plan_cost_snapshotcolumns onsessionsSandbox runtime
bridge.pybuilds a restate-and-confirm preamble fromresumeContext.currentPlanand prepends it to user content before forwarding to OpenCode, so the agent re-anchors on the previous turn's planShared package
prompt-safety.ts—wrapUntrustedContent/buildUntrustedUserContentBlockhelpers wrap plan + user content in XML tags (security: isolates plan-as-context from prompt-injection inside user messages)model-defaults.ts—fetchModelDefaultshelper for the bots to source defaults from a single place (the control plane). Falls back to env vars + shared constants until the/model-preferencesendpoint is extended (next PR in stack)MODEL_ALIAS_MAP+DEFAULT_PLAN_MODELconstantsescapeHtmlmoved fromlinear-bot/webhook-handler.tsto shared so it can be reusedDocumentation
docs/PLAN_MODE.md— single source of truth for the workflowdocs/HOW_IT_WORKS.mdextended with "Plan-Mode Gate" + "Prompt-Safety Wrapping" subsectionspackages/control-plane/README.mddocuments the new API endpointsStack
C/D/E/F are mutually orthogonal — each only depends on A+B, none imports from any other. They can be reviewed in any order after A+B.
Test plan
npm run typecheck(workspace) — greennpm run lint— greennpm test -w @open-inspect/shared— 183/183npm test -w @open-inspect/control-plane— 1148/1148pytest packages/sandbox-runtime/tests/test_bridge_resume_context.py— 41/41🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation