feat: AgentState persistence for PreCompact recovery (Bet 2 slice 3) by kelsonpw · Pull Request #173 · amplitude/wizard

kelsonpw · 2026-04-21T06:38:44Z

Tracker: #143
Bet 2: Agent loop overhaul — prompt caching, three-phase Planner → Integrator → Instrumenter pipeline, structured status, real hooks, eval harness
Kill criterion: cache hit rate <40% after 2 weeks → revert

What changes for users

The wizard saves a snapshot of what files the agent has touched before Claude compacts its context, so it doesn't lose that state. Nothing user-visible yet — this is the persistence half of a round-trip. The follow-up slice (#127) reads that snapshot back; together they stop the wizard from re-writing files or skipping steps after a compaction.

Scope of this slice

New src/lib/agent-state.ts — per-attempt AgentState bag tracking modified files (deduped + sorted), last status code/detail, and compaction count. Serializes to /tmp/amplitude-wizard-state-<attemptId>.json via sync write with 0o600 perms. Schema versioned as amplitude-wizard-agent-state/1.
runAgent instantiates an AgentState per run, assigns an attempt-N id before each attempt, and resets the state between retries.
StatusReporter.onStatus calls agentState.recordStatus so the persisted snapshot always carries the most recent status message.
+11 tests in agent-state.test.ts (dedup, sort, status tracking, compaction counter, JSON schema validity, tmpdir path, reset semantics).
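
For readers without the diff open, the shape described above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/lib/agent-state.ts: the method names, filename pattern, schema id, and 0o600 mode come from this PR, while the snapshot field names, the recordStatus signature, and the exact reset semantics are assumptions.

```typescript
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Schema id as stated in the PR; the field names below are assumptions.
const SCHEMA = "amplitude-wizard-agent-state/1";

interface AgentStateSnapshot {
  schema: string;
  attemptId: string | null;
  modifiedFiles: string[];
  lastStatus: { code: string; detail: string } | null;
  compactionCount: number;
}

class AgentState {
  private attemptId: string | null = null;
  private modifiedFiles = new Set<string>(); // a Set gives dedup for free
  private lastStatus: { code: string; detail: string } | null = null;
  private compactionCount = 0;

  setAttemptId(id: string): void {
    this.attemptId = id;
  }

  recordModifiedFile(path: string): void {
    this.modifiedFiles.add(path);
  }

  // Assumed signature: the PR only says the last status code/detail is tracked.
  recordStatus(code: string, detail: string): void {
    this.lastStatus = { code, detail };
  }

  recordCompaction(): void {
    this.compactionCount += 1;
  }

  snapshot(): AgentStateSnapshot {
    return {
      schema: SCHEMA,
      attemptId: this.attemptId,
      modifiedFiles: [...this.modifiedFiles].sort(), // sorted for stable output
      lastStatus: this.lastStatus,
      compactionCount: this.compactionCount,
    };
  }

  snapshotPath(): string {
    const id = this.attemptId ?? "unknown";
    return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
  }

  // Sync write; the mode only applies when the file is newly created.
  // Returns false instead of throwing so a failed write cannot abort the run.
  persist(): boolean {
    try {
      const body = JSON.stringify(this.snapshot(), null, 2);
      writeFileSync(this.snapshotPath(), body, { mode: 0o600 });
      return true;
    } catch {
      return false;
    }
  }

  // Assumed semantics: clear everything, including the attempt id.
  reset(): void {
    this.attemptId = null;
    this.modifiedFiles.clear();
    this.lastStatus = null;
    this.compactionCount = 0;
  }
}
```

Keeping the files in a Set and sorting only in snapshot() means the on-disk order does not depend on the order in which the agent touched files.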

How this advances Bet 2

Compaction drops earlier turns from the LLM's context window, which can erase the workflow step + list of files the agent has edited. Writing a small snapshot to disk before compaction runs gives a post-compaction recovery point that a future UserPromptSubmit hook can read. This slice lands the persistence primitive; restoration lands in slice 4 (hydrate, #127).
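
To make the recovery point concrete, here is a hedged sketch of how a later reader might load the snapshot defensively. Nothing here is part of this slice: loadSnapshot, the RecoveredState shape, and the demo filename are hypothetical, and the real hydration hook is left to #127. Only the schema id comes from this PR.

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const SCHEMA = "amplitude-wizard-agent-state/1";

// Hypothetical subset of the snapshot that a reader would care about.
interface RecoveredState {
  modifiedFiles: string[];
  compactionCount: number;
}

// Returns null for a missing file, invalid JSON, or an unknown schema version,
// so the caller can fall back to running without recovered state.
function loadSnapshot(path: string): RecoveredState | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  if (obj.schema !== SCHEMA) return null;
  const files = obj.modifiedFiles;
  if (!Array.isArray(files)) return null;
  return {
    modifiedFiles: files.filter((f): f is string => typeof f === "string"),
    compactionCount:
      typeof obj.compactionCount === "number" ? obj.compactionCount : 0,
  };
}

// Usage: write a demo snapshot, then read it back.
const demoPath = join(tmpdir(), "amplitude-wizard-state-demo-sketch.json");
writeFileSync(
  demoPath,
  JSON.stringify({ schema: SCHEMA, modifiedFiles: ["src/app.ts"], compactionCount: 1 }),
  { mode: 0o600 },
);
const recovered = loadSnapshot(demoPath);
```

Rejecting unknown schema versions outright keeps a future schema bump from being misread by an older reader.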

Scope note — recovered chain

The original PR (#126) extended createPreCompactHook and createPostToolUseHook from Bet 1's observability spine. Since this recovered Bet 2 chain is rooted on main (without obs-spine), those hook factory additions depend on Bet 1's ToolCallCounters landing first (PR #149). This slice ships the standalone AgentState primitive + StatusReporter wiring so later slices can attach hooks with zero merge friction once obs-spine merges.

Tests

+11 new (agent-state.test.ts): dedup, sort, last-status tracking, compaction counter, schema version, snapshot path, persist round-trip, reset. 1100 total passing (was 1090 on slice 2 recovered).

Test plan

pnpm test green
pnpm build smoke passes
pnpm try locally → no runtime regression; status updates still flow through the report_status tool from slice 2

Deferred to later Bet 2 slices

UserPromptSubmit hydration hook (slice 4 — #127) reads the snapshot back after compaction
Hook-factory wiring for PreCompact / PostToolUse (needs Bet 1 ToolCallCounters)

Recreated from #126 onto the recovered Bet 2 chain after the 2026-04-20 history reset.

cc @amplitude/growth

Note

Medium Risk
Introduces new disk persistence to the OS tmpdir and wires new state tracking into the agent retry/status flow; failures are caught/logged but incorrect attempt IDs or snapshot writes could affect future recovery behavior.

Overview
Adds a new AgentState module that tracks per-attempt recovery context (deduped/sorted modified files, last structured status, compaction count) and can persist a schema-versioned JSON snapshot to a deterministic tmpdir path with restrictive permissions.

runAgent now instantiates this state, assigns an attempt-N id per retry attempt, records status updates via StatusReporter.onStatus, and resets the bag between retries. New Vitest coverage validates snapshot contents, ordering/deduping, tmpdir persistence round-trip, and reset behavior.

Reviewed by Cursor Bugbot for commit 2bee709. Bugbot is set up for automated code reviews on this repo.

Enable prompt caching via excludeDynamicSections on the systemPrompt block passed to query(). Strips per-run / per-machine sections (date, cwd, git status) from the Claude Code preset so the static prefix is byte-identical across turns; the Claude Agent SDK then attaches cache_control internally.

Verified upstream that per-run values (projectApiKey, projectId, framework version) already live in the user-message prompt built by buildIntegrationPrompt, not in the system prefix, so the prefix is cacheable as-is with no refactor.

Measurement already lives in the Bet 1 observability spine: the `wizard cli: agent completed` event now includes `cache read input tokens`, `cache creation 5m/1h tokens`, and `cache hit rate`.

Success threshold per the Bet 2 brief: ≥50% cache hit rate on run 2+. Kill criterion: <40% after two weeks → revert.

Defers to follow-up slices: three-phase Planner → Integrator → Instrumenter pipeline, structured status via report_status MCP tool (#125), real PreCompact / PostToolUse / UserPromptSubmit hooks, eval harness.

Recreated from closed #123 against flattened open-source main. The original was auto-closed during the 2026-04-20 history reset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
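
As an illustration of how the thresholds could be applied, the sketch below computes a hit rate and compares it against the 50% and 40% lines. The commit does not define how `cache hit rate` is calculated, so the formula here (cache reads divided by all input tokens) is an assumption, as are the field names and the intermediate "watch" band.

```typescript
// Token counts for one run. Field names are illustrative, not the event's schema.
interface CacheUsage {
  inputTokens: number; // uncached input tokens
  cacheReadTokens: number; // tokens served from the prompt cache
  cacheCreationTokens: number; // tokens written to the cache (5m + 1h)
}

// ASSUMPTION: hit rate = cache reads / (uncached + cache reads + cache writes).
function cacheHitRate(u: CacheUsage): number {
  const total = u.inputTokens + u.cacheReadTokens + u.cacheCreationTokens;
  return total === 0 ? 0 : u.cacheReadTokens / total;
}

type Verdict = "success" | "watch" | "revert";

// Thresholds from the Bet 2 brief: >= 50% is success, < 40% triggers the kill
// criterion. The band in between is labeled "watch" here for illustration only.
function classify(rate: number): Verdict {
  if (rate >= 0.5) return "success";
  if (rate < 0.4) return "revert";
  return "watch";
}
```

Note that the brief applies the success threshold to run 2+ and the kill criterion only after two weeks; a real check would need that timing context, which this sketch omits.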

Adds report_status(kind, code, detail) in-process MCP tool replacing the [STATUS] / [ERROR-MCP-MISSING] / [ERROR-RESOURCE-MISSING] text-marker regex scanner. Gives the --agent NDJSON surface and outro screen a typed source of truth instead of scraping plaintext.

- New report_status MCP tool in src/lib/wizard-tools.ts with Zod validation and 5-calls/second-per-(kind,code) rate limit
- StatusReporter interface + _activeStatusReporter slot wired per-run
- Deleted legacy text-marker scanner from src/lib/agent-interface.ts
- New commandment instructing agent to use the tool
- +6 tests in report-status.test.ts; removed 2 deprecated regression tests

Bet 2 slice 2. Recreated from closed #125 onto kelsonpw/agent-loop-caching-recovered after the 2026-04-20 history reset.
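
The per-(kind, code) rate limit can be pictured with a small sliding-window limiter. This is a sketch under assumptions, not the code in src/lib/wizard-tools.ts: the class name, the sliding-window strategy, and the injectable clock are all hypothetical; only the 5-calls-per-second-per-(kind, code) figure comes from the commit.

```typescript
// Sliding-window limiter keyed by (kind, code). Hypothetical helper.
class StatusRateLimiter {
  private readonly calls = new Map<string, number[]>();
  private readonly maxCalls: number;
  private readonly windowMs: number;

  constructor(maxCalls = 5, windowMs = 1000) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Returns true if the call is allowed and records it; false if it is dropped.
  // nowMs is injectable so tests do not depend on the wall clock.
  allow(kind: string, code: string, nowMs: number = Date.now()): boolean {
    // JSON-encode the pair so ("a:b", "c") and ("a", "b:c") never share a key.
    const key = JSON.stringify([kind, code]);
    const recent = (this.calls.get(key) ?? []).filter(
      (t) => nowMs - t < this.windowMs,
    );
    const allowed = recent.length < this.maxCalls;
    if (allowed) recent.push(nowMs);
    this.calls.set(key, recent);
    return allowed;
  }
}
```

Keying on the (kind, code) pair means a burst of one status code does not starve unrelated codes.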

Adds per-attempt AgentState bag that tracks modified files, last structured status, and compaction count. Serializes to a deterministic tmpdir path so a future post-compaction UserPromptSubmit hook can hydrate the agent back with the context that compaction dropped.

- New src/lib/agent-state.ts: AgentState class with setAttemptId, recordModifiedFile, recordStatus, recordCompaction, snapshot, persist, snapshotPath, reset. Schema versioned as amplitude-wizard-agent-state/1.
- Instantiated per runAgent call; reset between retry attempts.
- Wired into StatusReporter.onStatus so the persisted snapshot always carries the most recent status message.
- +11 tests (agent-state.test.ts) covering dedup/sort, status tracking, compaction count, JSON schema, tmpdir path, reset semantics.

Bet 2 slice 3. Recreated from #126 onto kelsonpw/agent-loop-status-recovered after the 2026-04-20 history reset. Hook-factory wiring for PreCompact and PostToolUse depends on Bet 1 ToolCallCounters landing first (PR #149); this slice ships the persistence primitive so later slices can wire it up with zero merge friction.

github-actions · 2026-04-21T06:38:56Z

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

/wizard-ci all

Test all apps in a directory:

/wizard-ci django
/wizard-ci fastapi
/wizard-ci flask
/wizard-ci javascript-node
/wizard-ci javascript-web
/wizard-ci next-js
/wizard-ci python
/wizard-ci react-router
/wizard-ci vue

Test an individual app:

/wizard-ci django/django3-saas
/wizard-ci fastapi/fastapi3-ai-saas
/wizard-ci flask/flask3-social-media


/wizard-ci javascript-node/express-todo
/wizard-ci javascript-node/fastify-blog
/wizard-ci javascript-node/hono-links
/wizard-ci javascript-node/koa-notes
/wizard-ci javascript-node/native-http-contacts
/wizard-ci javascript-web/saas-dashboard
/wizard-ci next-js/15-app-router-saas
/wizard-ci next-js/15-app-router-todo
/wizard-ci next-js/15-pages-router-saas
/wizard-ci next-js/15-pages-router-todo
/wizard-ci python/meeting-summarizer
/wizard-ci react-router/react-router-v7-project
/wizard-ci react-router/rrv7-starter
/wizard-ci react-router/saas-template
/wizard-ci react-router/shopper
/wizard-ci vue/movies

Results will be posted here when complete.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Snapshot path collides across concurrent wizard instances
- Included the per-session runId (a UUID) in the snapshot filename so concurrent wizard instances write to distinct files, matching the disambiguation pattern already used in session-checkpoint.ts.

Or push these changes by commenting:

@cursor push ed55184860

Preview (ed55184860)

diff --git a/src/lib/__tests__/agent-state.test.ts b/src/lib/__tests__/agent-state.test.ts
--- a/src/lib/__tests__/agent-state.test.ts
+++ b/src/lib/__tests__/agent-state.test.ts
@@ -25,7 +25,7 @@
   const attemptId = 'att-xyz';
   const snapshotPath = join(
     tmpdir(),
-    `amplitude-wizard-state-${attemptId}.json`,
+    `amplitude-wizard-state-run-abc-${attemptId}.json`,
   );
 
   beforeEach(() => {

diff --git a/src/lib/agent-state.ts b/src/lib/agent-state.ts
--- a/src/lib/agent-state.ts
+++ b/src/lib/agent-state.ts
@@ -81,7 +81,8 @@
 
   snapshotPath(): string {
     const id = this.attemptId ?? 'unknown';
-    return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
+    const run = getRunId() ?? 'unknown';
+    return join(tmpdir(), `amplitude-wizard-state-${run}-${id}.json`);
   }
 
   reset(): void {


cursor · 2026-04-21T06:43:28Z

+  snapshotPath(): string {
+    const id = this.attemptId ?? 'unknown';
+    return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
+  }


Snapshot path collides across concurrent wizard instances

Medium Severity

snapshotPath() derives the filename solely from attemptId (e.g. attempt-0), which is identical across every wizard run. Concurrent or overlapping wizard instances will read/write the same file, silently clobbering each other's state. The existing session-checkpoint.ts avoids this by hashing installDir into the filename. Although the serialized snapshot contains runId, the file path itself has no run-specific component, so the follow-up hydration hook (#127) could load stale data from a different run.

Additional Locations (1)

src/lib/agent-interface.ts#L1204-L1205


kelsonpw and others added 3 commits April 20, 2026 23:19

kelsonpw requested a review from a team as a code owner April 21, 2026 06:38

kelsonpw added the "futureproof bet" label (Part of the wizard vision bets rolled up in issue #143) Apr 21, 2026

cursor Bot reviewed Apr 21, 2026


This was referenced Apr 21, 2026

feat: PreCompact persists agent state (Bet 2 slice 3) #126

Closed

Wizard vision — rollup tracker #143

Closed

kelsonpw force-pushed the kelsonpw/agent-loop-status-recovered branch from 20700bd to d62adbc Compare April 26, 2026 01:20

kelsonpw deleted the branch kelsonpw/agent-loop-status-recovered April 26, 2026 01:25

kelsonpw closed this Apr 26, 2026

kelsonpw mentioned this pull request Apr 26, 2026

feat: AgentState persistence for PreCompact recovery (Bet 2 slice 3) #267

Merged

4 tasks

kelsonpw wants to merge 3 commits into kelsonpw/agent-loop-status-recovered from kelsonpw/agent-loop-precompact-recovered
