
feat: AgentState persistence for PreCompact recovery (Bet 2 slice 3)#173

Closed
kelsonpw wants to merge 3 commits into kelsonpw/agent-loop-status-recovered from kelsonpw/agent-loop-precompact-recovered

@kelsonpw (Collaborator) commented Apr 21, 2026

Tracker: #143
Bet 2: Agent loop overhaul — prompt caching, three-phase Planner → Integrator → Instrumenter pipeline, structured status, real hooks, eval harness
Kill criterion: cache hit rate <40% after 2 weeks → revert

What changes for users

The wizard saves a snapshot of which files the agent has touched before Claude compacts its context, so that state survives the compaction. Nothing is user-visible yet: this is the persistence half of a round-trip. The follow-up slice (#127) reads that snapshot back; together they stop the wizard from re-writing files or skipping steps after a compaction.

Scope of this slice

  • New src/lib/agent-state.ts: a per-attempt AgentState bag tracking modified files (deduped and sorted), the last status code/detail, and the compaction count. Serializes to the OS tmpdir (e.g. /tmp/amplitude-wizard-state-<attemptId>.json) via a synchronous write with 0o600 permissions. Schema versioned as amplitude-wizard-agent-state/1.
  • runAgent instantiates an AgentState per run, assigns an attempt-N id before each attempt, and resets the state between retries.
  • StatusReporter.onStatus calls agentState.recordStatus so the persisted snapshot always carries the most recent status message.
  • +11 tests in agent-state.test.ts (dedup, sort, status tracking, compaction counter, JSON schema validity, tmpdir path, reset semantics).
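A minimal sketch of the bag described above, for reviewers who want to picture its shape. It assumes the method names listed in the slice 3 commit message (setAttemptId, recordModifiedFile, recordStatus, recordCompaction, snapshot, persist, snapshotPath, reset); the snapshot field names, the `StatusUpdate` shape, the reset semantics (clear everything), and the optional `dir` constructor parameter are assumptions, not the PR's actual code.

```typescript
import { writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed status shape, mirroring report_status(kind, code, detail).
interface StatusUpdate {
  kind: string;
  code: string;
  detail?: string;
}

// Assumed snapshot layout; only the schema string is taken from the PR.
interface AgentStateSnapshot {
  schema: "amplitude-wizard-agent-state/1";
  attemptId: string | null;
  modifiedFiles: string[];
  lastStatus: StatusUpdate | null;
  compactionCount: number;
}

export class AgentState {
  private attemptId: string | null = null;
  private readonly modifiedFiles = new Set<string>(); // Set dedupes on insert
  private lastStatus: StatusUpdate | null = null;
  private compactionCount = 0;
  private readonly dir: string;

  // `dir` is a hypothetical test hook; it defaults to the OS tmpdir.
  constructor(dir: string = tmpdir()) {
    this.dir = dir;
  }

  setAttemptId(id: string): void {
    this.attemptId = id;
  }

  recordModifiedFile(path: string): void {
    this.modifiedFiles.add(path);
  }

  // Only the most recent status is kept.
  recordStatus(status: StatusUpdate): void {
    this.lastStatus = { ...status };
  }

  recordCompaction(): void {
    this.compactionCount += 1;
  }

  snapshot(): AgentStateSnapshot {
    return {
      schema: "amplitude-wizard-agent-state/1",
      attemptId: this.attemptId,
      modifiedFiles: [...this.modifiedFiles].sort(), // stable order on disk
      lastStatus: this.lastStatus,
      compactionCount: this.compactionCount,
    };
  }

  snapshotPath(): string {
    const id = this.attemptId ?? "unknown";
    return join(this.dir, `amplitude-wizard-state-${id}.json`);
  }

  // Synchronous, so the file is complete before persist() returns.
  // `mode` only takes effect when the file is first created.
  persist(): string {
    const path = this.snapshotPath();
    writeFileSync(path, JSON.stringify(this.snapshot(), null, 2), { mode: 0o600 });
    return path;
  }

  reset(): void {
    this.attemptId = null;
    this.modifiedFiles.clear();
    this.lastStatus = null;
    this.compactionCount = 0;
  }
}
```

Deduping through a Set and sorting only in snapshot() keeps each record call cheap while still producing deterministic JSON, which makes the snapshot easy to diff and to assert on in tests.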

How this advances Bet 2

Compaction drops earlier turns from the LLM's context window, which can erase the current workflow step and the list of files the agent has edited. Writing a small snapshot to disk before compaction runs gives a post-compaction recovery point that a future UserPromptSubmit hook can read. This slice lands the persistence primitive; restoration lands in slice 4 (hydration, #127).
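On the read side, a hydration hook would need to load the snapshot defensively and turn it into text the agent can use. A minimal sketch, assuming a snapshot with `schema`, `attemptId`, `modifiedFiles`, `lastStatus`, and `compactionCount` fields (field names are assumptions); `parseSnapshot` and `formatRecoveryContext` are hypothetical helpers, not #127's actual code.

```typescript
// Assumed on-disk layout for schema version 1.
interface SnapshotV1 {
  schema: "amplitude-wizard-agent-state/1";
  attemptId: string | null;
  modifiedFiles: string[];
  lastStatus: { kind: string; code: string; detail?: string } | null;
  compactionCount: number;
}

// Hypothetical: validate raw JSON before trusting it; null means "ignore".
export function parseSnapshot(raw: string): SnapshotV1 | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // truncated or corrupt file
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  if (d.schema !== "amplitude-wizard-agent-state/1") return null; // unknown version
  if (!Array.isArray(d.modifiedFiles) || !d.modifiedFiles.every((f) => typeof f === "string")) return null;
  if (typeof d.compactionCount !== "number") return null;
  const st = d.lastStatus;
  if (st !== null) {
    if (typeof st !== "object") return null;
    const s = st as Record<string, unknown>;
    if (typeof s.kind !== "string" || typeof s.code !== "string") return null;
  }
  return data as SnapshotV1;
}

// Hypothetical: render recovered state as context to re-inject after compaction.
export function formatRecoveryContext(s: SnapshotV1): string {
  const lines = [`Context was compacted ${s.compactionCount} time(s).`];
  if (s.lastStatus) {
    const detail = s.lastStatus.detail ? ` (${s.lastStatus.detail})` : "";
    lines.push(`Last status: ${s.lastStatus.kind}/${s.lastStatus.code}${detail}`);
  }
  lines.push(
    s.modifiedFiles.length > 0
      ? `Files already modified in this attempt: ${s.modifiedFiles.join(", ")}`
      : "No files modified yet.",
  );
  return lines.join("\n");
}
```

Returning null instead of throwing lets the hook degrade to a no-op when the file is missing, stale, or from a future schema version.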


Scope note — recovered chain

The original PR (#126) extended createPreCompactHook and createPostToolUseHook from Bet 1's observability spine. Since this recovered Bet 2 chain is rooted on main (without obs-spine), those hook factory additions depend on Bet 1's ToolCallCounters landing first (PR #149). This slice ships the standalone AgentState primitive + StatusReporter wiring so later slices can attach hooks with zero merge friction once obs-spine merges.
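For a sense of what the deferred hook-factory wiring might attach, here is a sketch in the spirit of createPreCompactHook and createPostToolUseHook. The callback signatures are simplified, hypothetical stand-ins modeled loosely on Claude Agent SDK hook inputs (`tool_name`, `tool_input`); the set of file-writing tool names and the `{ continue: true }` result are assumptions, and none of this is the code from #126.

```typescript
// Minimal slices of the AgentState API these hooks need (assumed).
interface CompactionRecorder {
  recordCompaction(): void;
  persist(): string;
}
interface FileRecorder {
  recordModifiedFile(path: string): void;
}

// Hypothetical hook result; the real SDK output type has more fields.
type HookResult = { continue: boolean };

// Assumed set of file-writing tools whose input carries a `file_path`.
const WRITE_TOOLS = new Set(["Edit", "MultiEdit", "Write"]);

export function createPreCompactHook(
  state: CompactionRecorder,
  log: (msg: string) => void = (msg) => console.warn(msg),
): () => Promise<HookResult> {
  return async () => {
    try {
      state.recordCompaction();
      state.persist(); // best-effort: a failed write must not block compaction
    } catch (err) {
      log(`agent-state persist failed: ${String(err)}`);
    }
    return { continue: true };
  };
}

export function createPostToolUseHook(
  state: FileRecorder,
): (input: { tool_name: string; tool_input: unknown }) => Promise<HookResult> {
  return async ({ tool_name, tool_input }) => {
    const path = (tool_input as { file_path?: unknown } | null)?.file_path;
    if (WRITE_TOOLS.has(tool_name) && typeof path === "string") {
      state.recordModifiedFile(path);
    }
    return { continue: true };
  };
}
```

Depending only on narrow interfaces rather than the full AgentState class is one way to keep these factories independent of the ToolCallCounters work they are waiting on.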

Tests

+11 new (agent-state.test.ts): dedup, sort, last-status tracking, compaction counter, schema version, snapshot path, persist round-trip, reset. 1100 total passing (up from 1090 on the recovered slice 2).

Test plan

  • pnpm test green
  • pnpm build smoke passes
  • pnpm try locally → no runtime regression; status updates still flow through the report_status tool from slice 2

Deferred to later Bet 2 slices

  • UserPromptSubmit hydration hook (slice 4 — #127) reads the snapshot back after compaction
  • Hook-factory wiring for PreCompact / PostToolUse (needs Bet 1 ToolCallCounters)

Recreated from #126 onto the recovered Bet 2 chain after the 2026-04-20 history reset.

cc @amplitude/growth


Note

Medium Risk
Introduces new disk persistence to the OS tmpdir and wires new state tracking into the agent retry/status flow; failures are caught/logged but incorrect attempt IDs or snapshot writes could affect future recovery behavior.

Overview
Adds a new AgentState module that tracks per-attempt recovery context (deduped/sorted modified files, last structured status, compaction count) and can persist a schema-versioned JSON snapshot to a deterministic tmpdir path with restrictive permissions.

runAgent now instantiates this state, assigns an attempt-N id per retry attempt, records status updates via StatusReporter.onStatus, and resets the bag between retries. New Vitest coverage validates snapshot contents, ordering/deduping, tmpdir persistence round-trip, and reset behavior.

Reviewed by Cursor Bugbot for commit 2bee709. Bugbot is set up for automated code reviews on this repo.

kelsonpw and others added 3 commits April 20, 2026 23:19
Enable prompt caching via excludeDynamicSections on the systemPrompt
block passed to query(). Strips per-run / per-machine sections (date,
cwd, git status) from the Claude Code preset so the static prefix is
byte-identical across turns — the Claude Agent SDK then attaches
cache_control internally.

Verified upstream that per-run values (projectApiKey, projectId,
framework version) already live in the user-message prompt built by
buildIntegrationPrompt, not in the system prefix, so the prefix is
cacheable as-is with no refactor.

Measurement already lives in the Bet 1 observability spine: the
`wizard cli: agent completed` event now includes `cache read input
tokens`, `cache creation 5m/1h tokens`, and `cache hit rate`.
Success threshold per the Bet 2 brief: ≥50% cache hit rate on run 2+.
Kill criterion: <40% after two weeks → revert.

Defers to follow-up slices: three-phase Planner → Integrator →
Instrumenter pipeline, structured status via report_status MCP tool
(#125), real PreCompact / PostToolUse / UserPromptSubmit hooks,
eval harness.

Recreated from closed #123 against flattened open-source main. The
original was auto-closed during the 2026-04-20 history reset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
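The first commit sets a ≥50% cache hit rate on run 2+ as success and <40% after two weeks as the kill criterion, but does not spell out the formula. A sketch, assuming the rate is cache-read tokens divided by all prompt input tokens, using the Anthropic usage field names (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`); the observability spine's actual computation may differ, and `classify` is a hypothetical helper.

```typescript
// Token counts as reported in the Anthropic Messages API `usage` object.
interface Usage {
  input_tokens: number; // input tokens neither read from nor written to cache
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
}

// Assumed definition: share of prompt tokens served from the cache.
export function cacheHitRate(u: Usage): number {
  const total = u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  return total === 0 ? 0 : u.cache_read_input_tokens / total;
}

export type Verdict = "success" | "watch" | "revert";

// Thresholds from the Bet 2 brief: >= 50% succeeds, < 40% hits the kill criterion.
export function classify(rate: number): Verdict {
  if (rate >= 0.5) return "success";
  if (rate < 0.4) return "revert";
  return "watch";
}
```

Since the kill criterion is evaluated over two weeks, classify would be applied to an aggregate rate across runs, not to any single run.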
Adds report_status(kind, code, detail) in-process MCP tool replacing the
[STATUS] / [ERROR-MCP-MISSING] / [ERROR-RESOURCE-MISSING] text-marker regex
scanner. Gives the --agent NDJSON surface and outro screen a typed source
of truth instead of scraping plaintext.

- New report_status MCP tool in src/lib/wizard-tools.ts with Zod validation
  and 5-calls/second-per-(kind,code) rate limit
- StatusReporter interface + _activeStatusReporter slot wired per-run
- Deleted legacy text-marker scanner from src/lib/agent-interface.ts
- New commandment instructing agent to use the tool
- +6 tests in report-status.test.ts; removed 2 deprecated regression tests

Bet 2 slice 2. Recreated from closed #125 onto kelsonpw/agent-loop-caching-recovered
after the 2026-04-20 history reset.
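The second commit mentions a 5-calls/second-per-(kind,code) rate limit on report_status. One way to implement that is a sliding one-second window keyed on the (kind, code) pair; this sketch assumes that algorithm, and the class name `StatusRateLimiter` and the injectable clock are hypothetical. The PR's actual limiter may work differently.

```typescript
export class StatusRateLimiter {
  // Timestamps (ms) of accepted calls, keyed per (kind, code) pair.
  private readonly calls = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit = 5, windowMs = 1000, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Records and allows the call if under the limit; otherwise rejects it.
  tryAcquire(kind: string, code: string): boolean {
    // JSON key avoids collisions like ("a:b", "c") vs ("a", "b:c").
    const key = JSON.stringify([kind, code]);
    const t = this.now();
    // Keep only calls still inside the sliding window.
    const recent = (this.calls.get(key) ?? []).filter((ts) => t - ts < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(t);
    this.calls.set(key, recent);
    return allowed;
  }
}
```

Rejected calls are not recorded, so a misbehaving agent that spams one code is throttled without starving other (kind, code) pairs.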
Adds per-attempt AgentState bag that tracks modified files, last structured
status, and compaction count. Serializes to a deterministic tmpdir path so a
future post-compaction UserPromptSubmit hook can hydrate the agent back with
the context that compaction dropped.

- New src/lib/agent-state.ts — AgentState class with setAttemptId,
  recordModifiedFile, recordStatus, recordCompaction, snapshot, persist,
  snapshotPath, reset. Schema versioned as amplitude-wizard-agent-state/1.
- Instantiated per runAgent call; reset between retry attempts.
- Wired into StatusReporter.onStatus so the persisted snapshot always carries
  the most recent status message.
- +11 tests (agent-state.test.ts) covering dedup/sort, status tracking,
  compaction count, JSON schema, tmpdir path, reset semantics.

Bet 2 slice 3. Recreated from #126 onto kelsonpw/agent-loop-status-recovered
after the 2026-04-20 history reset. Hook-factory wiring for PreCompact and
PostToolUse depends on Bet 1 ToolCallCounters landing first (PR #149); this
slice ships the persistence primitive so later slices can wire it up with
zero merge friction.
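The runAgent wiring described in this commit (one state per run, an attempt-N id per retry, a reset between retries, and StatusReporter.onStatus forwarding) could look roughly like this. `runAgentSketch`, `AttemptFn`, and the `StatusReporter` and `AttemptState` shapes are hypothetical stand-ins, since the real runAgent signature is not shown here; the caller is assumed to construct one state object per run and pass it in.

```typescript
interface StatusUpdate {
  kind: string;
  code: string;
  detail?: string;
}

// Hypothetical reporter shape; the PR only names the onStatus callback.
interface StatusReporter {
  onStatus(update: StatusUpdate): void;
}

// Minimal slice of the AgentState API used here (assumed).
interface AttemptState {
  setAttemptId(id: string): void;
  recordStatus(update: StatusUpdate): void;
  reset(): void;
}

// One attempt: resolves true on success, false to request a retry.
type AttemptFn = (attemptId: string, reporter: StatusReporter) => Promise<boolean>;

export async function runAgentSketch(
  state: AttemptState,
  attempt: AttemptFn,
  maxAttempts = 3,
  downstream?: StatusReporter, // e.g. the NDJSON / outro surface
): Promise<boolean> {
  for (let n = 0; n < maxAttempts; n++) {
    if (n > 0) state.reset(); // drop the previous attempt's files and status
    const attemptId = `attempt-${n}`;
    state.setAttemptId(attemptId);
    const reporter: StatusReporter = {
      onStatus(update) {
        state.recordStatus(update); // snapshot always carries the latest status
        downstream?.onStatus(update);
      },
    };
    if (await attempt(attemptId, reporter)) return true;
  }
  return false;
}
```

Resetting before assigning the new id (rather than after) is an assumption; either order works as long as no status from attempt N leaks into attempt N+1's snapshot.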
@kelsonpw requested a review from a team as a code owner April 21, 2026 06:38
@kelsonpw added the futureproof bet label (Part of the wizard vision bets rolled up in issue #143) Apr 21, 2026
@github-actions (Contributor)

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci django
  • /wizard-ci fastapi
  • /wizard-ci flask
  • /wizard-ci javascript-node
  • /wizard-ci javascript-web
  • /wizard-ci next-js
  • /wizard-ci python
  • /wizard-ci react-router
  • /wizard-ci vue

Test an individual app:

  • /wizard-ci django/django3-saas
  • /wizard-ci fastapi/fastapi3-ai-saas
  • /wizard-ci flask/flask3-social-media
  • /wizard-ci javascript-node/express-todo
  • /wizard-ci javascript-node/fastify-blog
  • /wizard-ci javascript-node/hono-links
  • /wizard-ci javascript-node/koa-notes
  • /wizard-ci javascript-node/native-http-contacts
  • /wizard-ci javascript-web/saas-dashboard
  • /wizard-ci next-js/15-app-router-saas
  • /wizard-ci next-js/15-app-router-todo
  • /wizard-ci next-js/15-pages-router-saas
  • /wizard-ci next-js/15-pages-router-todo
  • /wizard-ci python/meeting-summarizer
  • /wizard-ci react-router/react-router-v7-project
  • /wizard-ci react-router/rrv7-starter
  • /wizard-ci react-router/saas-template
  • /wizard-ci react-router/shopper
  • /wizard-ci vue/movies

Results will be posted here when complete.

@cursor (bot, Contributor) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Snapshot path collides across concurrent wizard instances
    • Included the per-session runId (a UUID) in the snapshot filename so concurrent wizard instances write to distinct files, matching the disambiguation pattern already used in session-checkpoint.ts.


Or push these changes by commenting:

@cursor push ed55184860
Preview (ed55184860)
diff --git a/src/lib/__tests__/agent-state.test.ts b/src/lib/__tests__/agent-state.test.ts
--- a/src/lib/__tests__/agent-state.test.ts
+++ b/src/lib/__tests__/agent-state.test.ts
@@ -25,7 +25,7 @@
   const attemptId = 'att-xyz';
   const snapshotPath = join(
     tmpdir(),
-    `amplitude-wizard-state-${attemptId}.json`,
+    `amplitude-wizard-state-run-abc-${attemptId}.json`,
   );
 
   beforeEach(() => {

diff --git a/src/lib/agent-state.ts b/src/lib/agent-state.ts
--- a/src/lib/agent-state.ts
+++ b/src/lib/agent-state.ts
@@ -81,7 +81,8 @@
 
   snapshotPath(): string {
     const id = this.attemptId ?? 'unknown';
-    return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
+    const run = getRunId() ?? 'unknown';
+    return join(tmpdir(), `amplitude-wizard-state-${run}-${id}.json`);
   }
 
   reset(): void {


Reviewed by Cursor Bugbot for commit 2bee709.

Comment thread src/lib/agent-state.ts
  snapshotPath(): string {
    const id = this.attemptId ?? 'unknown';
    return join(tmpdir(), `amplitude-wizard-state-${id}.json`);
  }

Snapshot path collides across concurrent wizard instances

Medium Severity

snapshotPath() derives the filename solely from attemptId (e.g. attempt-0), which is identical across every wizard run. Concurrent or overlapping wizard instances will read/write the same file, silently clobbering each other's state. The existing session-checkpoint.ts avoids this by hashing installDir into the filename. Although the serialized snapshot contains runId, the file path itself has no run-specific component, so the follow-up hydration hook (#127) could load stale data from a different run.

Additional Locations (1)

Reviewed by Cursor Bugbot for commit 2bee709.
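Bugbot's suggested fix folds the per-session runId into the filename. A sketch of that idea, assuming the runId is passed in explicitly rather than read from the getRunId() helper used in the proposed diff (whose module is not shown); the character allow-list in `safeSegment` is an editorial addition, not part of the diff, and `snapshotPathFor` is a hypothetical name.

```typescript
import { tmpdir } from "node:os";
import { join } from "node:path";

// Replace anything outside a conservative allow-list so an id can never
// introduce a path separator into the filename.
function safeSegment(value: string | undefined): string {
  const cleaned = (value ?? "").replace(/[^A-Za-z0-9._-]/g, "_");
  return cleaned.length > 0 ? cleaned : "unknown";
}

// Hypothetical helper: one snapshot file per (run, attempt) pair.
export function snapshotPathFor(
  runId: string | undefined,
  attemptId: string | undefined,
  dir: string = tmpdir(),
): string {
  return join(dir, `amplitude-wizard-state-${safeSegment(runId)}-${safeSegment(attemptId)}.json`);
}
```

With a UUID runId, two concurrent wizard instances on attempt-0 write distinct files. Hashing installDir, as session-checkpoint.ts does, would instead let two runs in the same project share a file, which suits resume-after-crash but not concurrent runs.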

@kelsonpw force-pushed the kelsonpw/agent-loop-status-recovered branch from 20700bd to d62adbc April 26, 2026 01:20
@kelsonpw deleted the branch kelsonpw/agent-loop-status-recovered April 26, 2026 01:25
@kelsonpw closed this Apr 26, 2026
