feat: AgentState persistence for PreCompact recovery (Bet 2 slice 3) #173
kelsonpw wants to merge 3 commits into kelsonpw/agent-loop-status-recovered
Enable prompt caching via `excludeDynamicSections` on the systemPrompt block passed to `query()`. Strips per-run / per-machine sections (date, cwd, git status) from the Claude Code preset so the static prefix is byte-identical across turns; the Claude Agent SDK then attaches cache_control internally.

Verified upstream that per-run values (projectApiKey, projectId, framework version) already live in the user-message prompt built by buildIntegrationPrompt, not in the system prefix, so the prefix is cacheable as-is with no refactor.

Measurement already lives in the Bet 1 observability spine: the `wizard cli: agent completed` event now includes `cache read input tokens`, `cache creation 5m/1h tokens`, and `cache hit rate`.

Success threshold per the Bet 2 brief: ≥50% cache hit rate on run 2+. Kill criterion: <40% after two weeks → revert.

Defers to follow-up slices: three-phase Planner → Integrator → Instrumenter pipeline, structured status via report_status MCP tool (#125), real PreCompact / PostToolUse / UserPromptSubmit hooks, eval harness.

Recreated from closed #123 against flattened open-source main. The original was auto-closed during the 2026-04-20 history reset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
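To make the success and kill thresholds concrete, here is a minimal sketch that computes a hit rate from per-run token counts and compares it against the 50% and 40% marks. The field names and the formula (cache reads divided by all prompt-side input tokens) are assumptions for illustration; the commit does not say how the `cache hit rate` field is actually derived, and the two-week observation window is not modeled.

```typescript
// Hypothetical shape: field names are illustrative, not the wizard's event schema.
interface CacheUsage {
  inputTokens: number; // uncached prompt tokens
  cacheReadTokens: number; // prompt tokens served from the cache
  cacheCreationTokens: number; // prompt tokens written to the cache (5m + 1h)
}

// Assumed definition: share of prompt-side tokens that were served from cache.
function cacheHitRate(usage: CacheUsage): number {
  const total =
    usage.inputTokens + usage.cacheReadTokens + usage.cacheCreationTokens;
  return total === 0 ? 0 : usage.cacheReadTokens / total;
}

type Verdict = 'success' | 'inconclusive' | 'revert';

// Thresholds from the Bet 2 brief: >= 50% is a success, < 40% meets the kill criterion.
function evaluateHitRate(rate: number): Verdict {
  if (rate >= 0.5) return 'success';
  if (rate < 0.4) return 'revert';
  return 'inconclusive';
}
```

Rates between 40% and 50% fall outside both criteria in the brief, so the sketch reports them as inconclusive rather than guessing an outcome.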
Adds report_status(kind, code, detail) in-process MCP tool replacing the [STATUS] / [ERROR-MCP-MISSING] / [ERROR-RESOURCE-MISSING] text-marker regex scanner. Gives the --agent NDJSON surface and outro screen a typed source of truth instead of scraping plaintext.

- New report_status MCP tool in src/lib/wizard-tools.ts with Zod validation and 5-calls/second-per-(kind,code) rate limit
- StatusReporter interface + _activeStatusReporter slot wired per-run
- Deleted legacy text-marker scanner from src/lib/agent-interface.ts
- New commandment instructing agent to use the tool
- +6 tests in report-status.test.ts; removed 2 deprecated regression tests

Bet 2 slice 2. Recreated from closed #125 onto kelsonpw/agent-loop-caching-recovered after the 2026-04-20 history reset.
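The per-(kind,code) rate limit mentioned above could look roughly like the following. This is a sketch of a sliding-window limiter, not the code in src/lib/wizard-tools.ts; the class name `StatusRateLimiter`, the key format, and the choice of a sliding window rather than a fixed one are all assumptions.

```typescript
// Hypothetical limiter: allows at most `limit` calls per `windowMs` for each (kind, code) pair.
class StatusRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // Timestamps of recently accepted calls, keyed by "kind:code".
  private readonly accepted = new Map<string, number[]>();

  constructor(limit: number = 5, windowMs: number = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true when the call is allowed. `now` is injectable so tests are deterministic.
  allow(kind: string, code: string, now: number = Date.now()): boolean {
    const key = `${kind}:${code}`;
    // Keep only the timestamps that still fall inside the window.
    const recent = (this.accepted.get(key) ?? []).filter(
      (t) => now - t < this.windowMs,
    );
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.accepted.set(key, recent);
    return allowed;
  }
}
```

Keying on the pair means a burst of one status code does not starve a different code reported in the same second.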
Adds per-attempt AgentState bag that tracks modified files, last structured status, and compaction count. Serializes to a deterministic tmpdir path so a future post-compaction UserPromptSubmit hook can hydrate the agent back with the context that compaction dropped.

- New src/lib/agent-state.ts: AgentState class with setAttemptId, recordModifiedFile, recordStatus, recordCompaction, snapshot, persist, snapshotPath, reset. Schema versioned as amplitude-wizard-agent-state/1.
- Instantiated per runAgent call; reset between retry attempts.
- Wired into StatusReporter.onStatus so the persisted snapshot always carries the most recent status message.
- +11 tests (agent-state.test.ts) covering dedup/sort, status tracking, compaction count, JSON schema, tmpdir path, reset semantics.

Bet 2 slice 3. Recreated from #126 onto kelsonpw/agent-loop-status-recovered after the 2026-04-20 history reset. Hook-factory wiring for PreCompact and PostToolUse depends on Bet 1 ToolCallCounters landing first (PR #149); this slice ships the persistence primitive so later slices can wire it up with zero merge friction.
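For orientation, a state bag with the method names listed above might be shaped like this. It is a sketch under assumptions, not the contents of src/lib/agent-state.ts: the snapshot field names, the `recordStatus` signature, and the choice to clear the attempt id on `reset` are all guesses. Only the method names and the schema string come from the commit message. `persist` and `snapshotPath` are left out to keep the example free of file I/O.

```typescript
// Hypothetical snapshot layout; only the schema string is taken from the commit message.
interface AgentStateSnapshot {
  schema: 'amplitude-wizard-agent-state/1';
  attemptId: string | null;
  modifiedFiles: string[];
  lastStatus: { kind: string; code: string; detail: string } | null;
  compactionCount: number;
}

class AgentStateSketch {
  private attemptId: string | null = null;
  // A Set gives deduplication for free; ordering is applied in snapshot().
  private readonly modifiedFiles = new Set<string>();
  private lastStatus: AgentStateSnapshot['lastStatus'] = null;
  private compactionCount = 0;

  setAttemptId(id: string): void {
    this.attemptId = id;
  }

  recordModifiedFile(path: string): void {
    this.modifiedFiles.add(path);
  }

  // Assumed signature, mirroring report_status(kind, code, detail).
  recordStatus(kind: string, code: string, detail: string): void {
    this.lastStatus = { kind, code, detail };
  }

  recordCompaction(): void {
    this.compactionCount += 1;
  }

  // Sorting makes the serialized output independent of the order files were touched.
  snapshot(): AgentStateSnapshot {
    return {
      schema: 'amplitude-wizard-agent-state/1',
      attemptId: this.attemptId,
      modifiedFiles: Array.from(this.modifiedFiles).sort(),
      lastStatus: this.lastStatus,
      compactionCount: this.compactionCount,
    };
  }

  // Assumption: reset clears everything, including the attempt id.
  reset(): void {
    this.attemptId = null;
    this.modifiedFiles.clear();
    this.lastStatus = null;
    this.compactionCount = 0;
  }
}
```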
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Snapshot path collides across concurrent wizard instances
  - Included the per-session `runId` (a UUID) in the snapshot filename so concurrent wizard instances write to distinct files, matching the disambiguation pattern already used in `session-checkpoint.ts`.
Or push these changes by commenting:
@cursor push ed55184860
Preview (ed55184860)
diff --git a/src/lib/__tests__/agent-state.test.ts b/src/lib/__tests__/agent-state.test.ts
--- a/src/lib/__tests__/agent-state.test.ts
+++ b/src/lib/__tests__/agent-state.test.ts
@@ -25,7 +25,7 @@
   const attemptId = 'att-xyz';
   const snapshotPath = join(
     tmpdir(),
-    `amplitude-wizard-state-${attemptId}.json`,
+    `amplitude-wizard-state-run-abc-${attemptId}.json`,
   );
   beforeEach(() => {
diff --git a/src/lib/agent-state.ts b/src/lib/agent-state.ts
--- a/src/lib/agent-state.ts
+++ b/src/lib/agent-state.ts
@@ -81,7 +81,8 @@
   snapshotPath(): string {
     const id = this.attemptId ?? 'unknown';
-    return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
+    const run = getRunId() ?? 'unknown';
+    return join(tmpdir(), `amplitude-wizard-state-${run}-${id}.json`);
   }
   reset(): void {
Reviewed by Cursor Bugbot for commit 2bee709. Configure here.
    snapshotPath(): string {
      const id = this.attemptId ?? 'unknown';
      return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
    }
Snapshot path collides across concurrent wizard instances
Medium Severity
snapshotPath() derives the filename solely from attemptId (e.g. attempt-0), which is identical across every wizard run. Concurrent or overlapping wizard instances will read/write the same file, silently clobbering each other's state. The existing session-checkpoint.ts avoids this by hashing installDir into the filename. Although the serialized snapshot contains runId, the file path itself has no run-specific component, so the follow-up hydration hook (#127) could load stale data from a different run.
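The collision is easy to reproduce in isolation. The sketch below contrasts a path built from the attempt id alone with one that also includes a run id, along the lines of the Autofix preview. The standalone function names are hypothetical, and the run id is passed in as a parameter here rather than read from `getRunId()`.

```typescript
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Original scheme: the filename depends only on the attempt id.
function attemptOnlyPath(attemptId: string | undefined): string {
  const id = attemptId ?? 'unknown';
  return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
}

// Disambiguated scheme: a run-specific component keeps concurrent runs apart.
function runScopedPath(
  runId: string | undefined,
  attemptId: string | undefined,
): string {
  const run = runId ?? 'unknown';
  const id = attemptId ?? 'unknown';
  return join(tmpdir(), `amplitude-wizard-state-${run}-${id}.json`);
}
```

Two wizard processes that are both on `attempt-0` get the same result from `attemptOnlyPath`, whereas `runScopedPath` differs as long as their run ids differ. If the run id is unavailable in both processes, the `'unknown'` fallback brings the collision back.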
20700bd to d62adbc



What changes for users
The wizard saves a snapshot of what files the agent has touched before Claude compacts its context, so it doesn't lose that state. Nothing user-visible yet — this is the persistence half of a round-trip. The follow-up slice (#127) reads that snapshot back; together they stop the wizard from re-writing files or skipping steps after a compaction.
Scope of this slice
- `src/lib/agent-state.ts`: per-attempt `AgentState` bag tracking modified files (deduped + sorted), last status code/detail, and compaction count. Serializes to `/tmp/amplitude-wizard-state-<attemptId>.json` via sync write with `0o600` perms. Schema versioned as `amplitude-wizard-agent-state/1`.
- `runAgent` instantiates an `AgentState` per run, assigns an `attempt-N` id before each attempt, and resets it between retries.
- `StatusReporter.onStatus` calls `agentState.recordStatus` so the persisted snapshot always carries the most recent status message.
- `agent-state.test.ts` (dedup, sort, status tracking, compaction counter, JSON schema validity, tmpdir path, reset semantics).

How this advances Bet 2
Compaction drops earlier turns from the LLM's context window, which can erase the workflow step + list of files the agent has edited. Writing a small snapshot to disk before compaction runs gives a post-compaction recovery point that a future `UserPromptSubmit` hook can read. This slice lands the persistence primitive; restoration lands in slice 4 (hydrate, #127).

Scope note: recovered chain
The original PR (#126) extended `createPreCompactHook` and `createPostToolUseHook` from Bet 1's observability spine. Since this recovered Bet 2 chain is rooted on main (without obs-spine), those hook factory additions depend on Bet 1's `ToolCallCounters` landing first (PR #149). This slice ships the standalone `AgentState` primitive + `StatusReporter` wiring so later slices can attach hooks with zero merge friction once obs-spine merges.

Tests
+11 new (`agent-state.test.ts`): dedup, sort, last-status tracking, compaction counter, schema version, snapshot path, persist round-trip, reset. 1100 total passing (was 1090 on slice 2 recovered).

Test plan
- `pnpm test` green
- `pnpm build` smoke passes
- `pnpm try` locally → no runtime regression; status updates still flow through the `report_status` tool from slice 2

Deferred to later Bet 2 slices
- `UserPromptSubmit` hydration hook (slice 4, #127) reads the snapshot back after compaction
- `ToolCallCounters`)

Recreated from #126 onto the recovered Bet 2 chain after the 2026-04-20 history reset.
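The sync write with `0o600` perms described under "Scope of this slice" can be sketched with Node's `fs` module as follows. The function names are hypothetical and this is not the PR's `persist` method. `loadSnapshot` exists only to demonstrate a round-trip, since the real hydration logic is deferred to #127.

```typescript
import { readFileSync, statSync, writeFileSync } from 'node:fs';

// Writes the snapshot synchronously with owner-only permissions.
// Failures are swallowed and reported as `false` so a write error cannot abort the run.
function persistSnapshot(path: string, snapshot: object): boolean {
  try {
    // `mode` only takes effect when the file is created; an existing file keeps its permissions.
    writeFileSync(path, JSON.stringify(snapshot, null, 2), {
      encoding: 'utf8',
      mode: 0o600,
    });
    return true;
  } catch {
    return false;
  }
}

// Reads a snapshot back; returns null if the file is missing or is not valid JSON.
function loadSnapshot(path: string): unknown {
  try {
    return JSON.parse(readFileSync(path, 'utf8'));
  } catch {
    return null;
  }
}

// Returns the permission bits of an existing file, e.g. 0o600.
function permissionBits(path: string): number {
  return statSync(path).mode & 0o777;
}
```

On Windows the permission bits are largely ignored, so the owner-only guarantee should be read as POSIX-specific.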
cc @amplitude/growth
Note
Medium Risk
Introduces new disk persistence to the OS tmpdir and wires new state tracking into the agent retry/status flow; failures are caught/logged but incorrect attempt IDs or snapshot writes could affect future recovery behavior.
Overview
Adds a new `AgentState` module that tracks per-attempt recovery context (deduped/sorted modified files, last structured status, compaction count) and can persist a schema-versioned JSON snapshot to a deterministic tmpdir path with restrictive permissions. `runAgent` now instantiates this state, assigns an `attempt-N` id per retry attempt, records status updates via `StatusReporter.onStatus`, and resets the bag between retries. New Vitest coverage validates snapshot contents, ordering/deduping, tmpdir persistence round-trip, and reset behavior.
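The retry wiring summarized above can be pictured with a small loop. This is a sketch under assumptions rather than the actual `runAgent` code: the interfaces are trimmed to the calls the loop needs, the helper name `runWithRetries` is hypothetical, and the attempt callback is synchronous for brevity where the real agent call would be asynchronous.

```typescript
// Minimal stand-ins; only the calls the loop needs are modeled.
interface AttemptState {
  setAttemptId(id: string): void;
  recordStatus(code: string, detail: string): void;
  reset(): void;
}

interface StatusReporterSketch {
  onStatus(code: string, detail: string): void;
}

// Runs `attempt` up to `maxAttempts` times, tagging each try and clearing state in between.
function runWithRetries(
  state: AttemptState,
  maxAttempts: number,
  attempt: (reporter: StatusReporterSketch) => boolean,
): boolean {
  // Every status the agent reports is mirrored into the state bag.
  const reporter: StatusReporterSketch = {
    onStatus: (code, detail) => state.recordStatus(code, detail),
  };
  for (let n = 0; n < maxAttempts; n++) {
    if (n > 0) state.reset(); // drop what the failed attempt recorded
    state.setAttemptId(`attempt-${n}`);
    if (attempt(reporter)) return true;
  }
  return false;
}
```

Resetting before assigning the next id keeps a snapshot taken during attempt N from carrying files or statuses recorded during attempt N-1.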