feat(agent): apply --resume — resumable Claude SDK sessions across crashes#259
When `apply` runs the inner Claude SDK agent, capture the session id from
the first system/init message and persist it on the plan. A subsequent
`apply --plan-id <id> --yes --resume` passes that id to the SDK as
`resume:`, rehydrating the conversation instead of starting a fresh
agent that has to redo SDK install + package detection + file reads.
Surface area:
- `WizardPlan` now optionally carries:
  - `agentSessionId`: UUID of the prior SDK session
  - `agentSessionUpdatedAt`: when it was captured
- `apply --resume`: opt-in flag; pulls `agentSessionId` from the persisted plan and forwards it to the spawned child via `AMPLITUDE_WIZARD_RESUME_SESSION_ID`. When the plan has no captured session yet, logs a structured warning and falls through to a fresh run (the right default for the very first apply against a plan).
- `applyPlanPatch(planId, p)`: partial-update helper for plans on disk; best-effort, returns null on miss.
- `getApplyContextFromEnv()`: agent-runner reads `{ planId, resumeSessionId }` from env vars set by `apply` so the spawn boundary stays decoupled from the SDK call site. Both vars are optional, so fresh runs still work.
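A minimal sketch of the two helpers listed above, not the PR's actual code. It assumes a trimmed-down `WizardPlan` shape, takes the environment as a parameter instead of reading `process.env`, and uses an in-memory `Map` (the hypothetical `planStore`) as a stand-in for the on-disk plan store. Treating blank env values as unset is also an assumption.

```typescript
// Hypothetical, trimmed-down plan shape; the real WizardPlan carries more fields.
interface WizardPlan {
  planId: string;
  agentSessionId?: string;
  agentSessionUpdatedAt?: string;
}

interface ApplyContext {
  planId?: string | undefined;
  resumeSessionId?: string | undefined;
}

type EnvVars = Record<string, string | undefined>;

// Reads the two env vars named in this PR. Both are optional, so a fresh run
// simply yields undefined fields. Blank values count as unset (assumption).
function getApplyContextFromEnv(env: EnvVars): ApplyContext {
  const clean = (value: string | undefined): string | undefined => {
    const trimmed = value?.trim();
    return trimmed ? trimmed : undefined;
  };
  return {
    planId: clean(env["AMPLITUDE_WIZARD_PLAN_ID"]),
    resumeSessionId: clean(env["AMPLITUDE_WIZARD_RESUME_SESSION_ID"]),
  };
}

// In-memory stand-in for the on-disk plan store (assumption for this sketch).
const planStore = new Map<string, WizardPlan>();

// Merges a partial update into a stored plan; returns null when the plan is missing.
function applyPlanPatch(planId: string, patch: Partial<WizardPlan>): WizardPlan | null {
  const existing = planStore.get(planId);
  if (!existing) return null;
  // planId is written last so a patch can never re-key the plan.
  const updated: WizardPlan = { ...existing, ...patch, planId };
  planStore.set(planId, updated);
  return updated;
}
```

Passing the environment in as an argument keeps the sketch free of Node-specific globals and makes the "both vars optional" behaviour easy to exercise in a unit test.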
Wiring in agent-runner: pass `resumeSessionId` and `onSessionStart` into
`runAgent`. The latter fires once on system/init and patches the plan
with the SDK-assigned session id (handles both fresh runs AND forks from
a resumed session, which the SDK gives a new id).
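The handler described above could look roughly like the following sketch. `makeOnSessionStart`, `PatchPlan`, and the `log` parameter are hypothetical names introduced for illustration; only the behaviour (patch the plan with the SDK-assigned id, never let a failed patch break the run) comes from this PR.

```typescript
interface PlanSessionPatch {
  agentSessionId: string;
  agentSessionUpdatedAt: string;
}

// Hypothetical signature for whatever persists the patch (applyPlanPatch in this PR).
type PatchPlan = (planId: string, patch: PlanSessionPatch) => unknown;

// Builds the onSessionStart callback handed to runAgent.
function makeOnSessionStart(
  planId: string | undefined,
  patchPlan: PatchPlan,
  log: (message: string) => void = () => {},
): (sessionId: string) => void {
  return (sessionId: string): void => {
    // Runs without a plan have nowhere to persist the id.
    if (!planId) return;
    try {
      patchPlan(planId, {
        agentSessionId: sessionId,
        agentSessionUpdatedAt: new Date().toISOString(),
      });
    } catch (err) {
      // Best-effort: a failed patch is logged and the agent run continues.
      log(`could not persist agent session id: ${String(err)}`);
    }
  };
}
```

Because the handler always overwrites `agentSessionId` with whatever id the SDK reports, a forked session started by a resume replaces the older id, so the next `--resume` continues from the most recent attempt.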
Wiring in agent-interface: two new optional `runAgent` config fields
(`onSessionStart`, `resumeSessionId`). Surgical change — adds 6 lines to
the SDK message loop and 1 line to the query options.
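As an illustration of the two touch points in agent-interface, here is a sketch under stated assumptions: `AgentMessage` and `QueryOptions` are local stand-ins rather than the SDK's own types, the `model` field is a placeholder, and a synchronous iterable replaces the SDK's asynchronous message stream so the example stays self-contained.

```typescript
// Local stand-ins for the SDK message and option shapes (assumptions).
type AgentMessage =
  | { type: "system"; subtype: "init"; session_id: string }
  | { type: "assistant"; text: string };

interface QueryOptions {
  model: string;
  resume?: string;
}

interface RunAgentConfig {
  resumeSessionId?: string;
  onSessionStart?: (sessionId: string) => void;
}

// Query options: `resume` is only present when a session id was supplied.
function buildQueryOptions(config?: RunAgentConfig): QueryOptions {
  return {
    model: "example-model", // placeholder value
    ...(config?.resumeSessionId ? { resume: config.resumeSessionId } : {}),
  };
}

// Message loop: report the session id once, on the first system/init message.
function consumeMessages(messages: Iterable<AgentMessage>, config?: RunAgentConfig): number {
  let sessionReported = false;
  let handled = 0;
  for (const message of messages) {
    handled += 1;
    if (!sessionReported && message.type === "system" && message.subtype === "init") {
      sessionReported = true;
      config?.onSessionStart?.(message.session_id);
    }
  }
  return handled;
}
```

The `sessionReported` guard reflects the "fires once" wording above; whether the real loop needs such a guard depends on how often the SDK emits system/init, which this sketch does not assume.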
Tests: +8 in agent-plans.test.ts (1319 total). Suite green.
Smoke tests:
$ wizard plan --json → planId X (no agentSessionId)
$ wizard apply --plan-id X --yes → first run; captures session
$ wizard apply --plan-id X --resume --yes → second run; resumes
$ wizard apply --plan-id Y --resume --yes → warns "no captured session", runs fresh
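The smoke-test cases above reduce to a single decision in `apply`. The sketch below is an assumption about how that decision could be structured, not the code in `bin.ts`: `resolveResume`, `StoredPlan`, and `ResumeDecision` are hypothetical names, and the warning is modelled as a plain object carrying the `resume_unavailable` code.

```typescript
interface StoredPlan {
  planId: string;
  agentSessionId?: string;
}

interface ResumeDecision {
  // Env vars to hand to the spawned child process.
  childEnv: Record<string, string>;
  // Present only when --resume was requested but no session was captured.
  warning?: { code: "resume_unavailable"; planId: string };
}

function resolveResume(plan: StoredPlan, resumeFlag: boolean): ResumeDecision {
  const childEnv: Record<string, string> = {
    AMPLITUDE_WIZARD_PLAN_ID: plan.planId,
  };
  // Without --resume the child always starts a fresh session.
  if (!resumeFlag) return { childEnv };
  // --resume on a plan with no captured session: warn and fall through to a fresh run.
  if (!plan.agentSessionId) {
    return { childEnv, warning: { code: "resume_unavailable", planId: plan.planId } };
  }
  childEnv["AMPLITUDE_WIZARD_RESUME_SESSION_ID"] = plan.agentSessionId;
  return { childEnv };
}
```

Keeping the decision in a pure function like this makes each smoke-test row a one-line unit test, independent of process spawning.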
Stacked on #258. Closes the design-doc gap on mid-`apply` work loss
(SIGINT, network drop, terminal crash).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Autofix Details
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Vendor name "Claude SDK" in CLI help text
- Replaced "Claude SDK session" with "agent session" in the user-facing --resume yargs describe string.
Or push these changes by commenting:
@cursor push a5cfb236c3
Preview (a5cfb236c3)
diff --git a/bin.ts b/bin.ts
--- a/bin.ts
+++ b/bin.ts
@@ -2204,7 +2204,7 @@
},
resume: {
describe:
- 'resume the previous Claude SDK session captured against this plan (skip cold-start work after a SIGINT or crash)',
+ 'resume the previous agent session captured against this plan (skip cold-start work after a SIGINT or crash)',
type: 'boolean',
default: false,
},
Applied via @cursor push command
a90692a
into
kelsonpw/wizard-mcp-plan-verify-list
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON, but it could not run because the branch was deleted or merged before autofix could start.
Reviewed by Cursor Bugbot for commit 8c54f5e.
// invoked against a plan with a captured agentSessionId. The SDK
// either rehydrates the conversation or, on a stale id, falls
// back to a fresh run; agent-runner clears the id in that case.
...(config?.resumeSessionId && { resume: config.resumeSessionId }),
resume option missing from SDKQueryOptions type
Medium Severity
The new `resume` property is spread into the SDK `query()` options via `...(config?.resumeSessionId && { resume: config.resumeSessionId })`, but the local `SDKQueryOptions` type (the single source of truth for what gets passed to the SDK) doesn't declare a `resume` field. The spread operator bypasses TypeScript's excess-property checking, so the property reaches the SDK at runtime with zero compile-time validation: a misspelling (e.g. `Resume`) or wrong value type would be silently accepted. Every other SDK option has a declared field in `SDKQueryOptions`; `resume?: string` belongs there too.
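To make the concern concrete, here is a small sketch, assuming simplified option types (`OptionsBeforeFix` and this `SDKQueryOptions` are illustrative stand-ins for the project's real type, and `model` is a placeholder field). The first function type-checks despite the misspelled key, because properties that arrive through a spread are not subject to excess-property checks; the second routes the value through a declared field.

```typescript
// Stand-in for the options type before the fix: no `resume` field declared.
interface OptionsBeforeFix {
  model: string;
}

// Stand-in for the options type with the field the review asks for.
interface SDKQueryOptions {
  model: string;
  resume?: string;
}

// Type-checks even though `Resume` is misspelled: the property comes from a
// spread, so the compiler does not flag it as an excess property.
function looseOptions(sessionId?: string): OptionsBeforeFix {
  return {
    model: "example-model",
    ...(sessionId ? { Resume: sessionId } : {}),
  };
}

// With the field declared, the value goes through a typed intermediate, so a
// misspelled key or a non-string value should be rejected at compile time.
function strictOptions(sessionId?: string): SDKQueryOptions {
  const resumePart: Pick<SDKQueryOptions, "resume"> = sessionId ? { resume: sessionId } : {};
  return { model: "example-model", ...resumePart };
}
```

At runtime `looseOptions("sess-1")` carries a `Resume` key that nothing downstream would read, which is the silent failure mode the review describes.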



Summary
Closes the design-doc gap on mid-`apply` work loss. When the inner Claude SDK agent dies partway through (SIGINT, network drop, crashed terminal), today the user re-runs `apply` and the agent has to redo all the cold-start work: SDK install, package detection, file reads, framework analysis. This PR captures the SDK's session id and persists it on the plan, so `apply --resume` rehydrates the conversation instead of starting fresh.
Stacked on #258. This is gap #1 from the work-loss analysis; the only remaining case (event-plan approval mid-agent) is a future plan/apply boundary refactor, not a session-resumption thing.
Surface
How it works
The fork-and-update on resume means the chain works across multiple interruptions: each `--resume` picks up from the most recent attempt.
Smoke tests
What changed
- `src/lib/agent-plans.ts`: extends `WizardPlanSchema` with optional `agentSessionId` + `agentSessionUpdatedAt`. Adds `applyPlanPatch(planId, partial)` for atomic on-disk updates and `getApplyContextFromEnv()` for the env-var bridge.
- `src/lib/agent-interface.ts`: surgical addition of 2 optional `runAgent` config fields (`onSessionStart`, `resumeSessionId`). 6 lines in the SDK message loop to capture `session_id` from system/init; 1 line in the query options for `resume:`.
- `src/lib/agent-runner.ts`: reads env via `getApplyContextFromEnv()`, passes `resumeSessionId` + `onSessionStart` into `runAgent`. The handler patches the plan best-effort (failures logged, never break the run).
- `bin.ts`: `--resume` flag on the `apply` command. When set, looks up `agentSessionId` from the persisted plan and forwards via env. Surfaces a structured `resume_unavailable` warning when the plan has no captured session.
Test plan
- `pnpm test`: 1319 passed, 17 skipped (8 new tests in agent-plans.test.ts)
- `pnpm tsc --noEmit` clean
- `pnpm lint` clean
- `apply --resume` without a captured session warns + runs fresh
- `apply --resume` payload includes `resumeSessionId` when plan has one
- … `--resume`, watch SDK skip the cold-start
Out of scope
- … `onSessionStart` handler updates the plan with the new id. Handling explicit "session expired" surfacing is a future improvement.
- … `apply`: `wizard --agent --yes` (without a plan) doesn't persist a session. Adding that needs a separate "ad-hoc resume" path; not requested.
cc @amplitude/growth
🤖 Generated with Claude Code
Note
Medium Risk
Touches the `apply` execution path and Claude SDK invocation by persisting and replaying session IDs via env vars; failures should fall back to fresh runs, but regressions could break non-interactive `apply` flows.
Overview
Adds an `apply --resume` flag that, when a plan has a captured Claude SDK `agentSessionId`, re-runs `apply` by resuming the prior SDK conversation to skip cold-start work; if no session is available it warns and runs fresh.
Persists Claude SDK session IDs onto `WizardPlan` (with `agentSessionUpdatedAt`) via a new best-effort `applyPlanPatch`, and bridges resume context across the `apply` spawn boundary using `AMPLITUDE_WIZARD_PLAN_ID` / `AMPLITUDE_WIZARD_RESUME_SESSION_ID`.
Extends `runAgent` to accept `resumeSessionId` and an `onSessionStart` hook, forwarding `resume` into the SDK query options and capturing `session_id` from the SDK `system/init` message; adds tests covering patching, env parsing, and schema back-compat.
Reviewed by Cursor Bugbot for commit 8c54f5e. Bugbot is set up for automated code reviews on this repo.