
fix(durable): episode-scoped stable checkpoint keying #497

Merged
cdayAI merged 1 commit into main from claude/security-code-audit-Nq8Tw
May 31, 2026

Conversation

Owner

@cdayAI cdayAI commented May 31, 2026

A correctness fix for durable execution, found while scoping Phase 2.

The bug

Phase 1 (#472) keyed checkpoints on Agent.name — but name carries a per-process random uuid suffix (f"{role}-{depth}-{uuid4().hex[:6]}", for blackboard/agent-bus uniqueness). On a real fresh-process resume, run_goal builds a new orchestrator with a new random name, so latest(goal_id, self.name) never matched → resume silently fell back to warm-restart. Phase 1's resume only worked in its test because the test pinned agent.name. In production the feature didn't actually resume.

Compounding it: run_goal_best_of_n runs N sequential run_goal calls under the same goal_id (distinct episodes), so a naive "stable id per goal" fix would make attempt 2 resume from attempt 1's checkpoint.

The fix (episode-scoped, stable keying)

Key checkpoints on (goal_id, episode_id, checkpoint_id):

  • checkpoint_id = "{role}-{depth}" — a new Agent property, stable across processes (one orchestrator per episode), distinct from the random name.
  • episode_id threaded through SwarmContext (default 0), set from world.start_episode() in run_goal. Discriminates best-of-N attempts so they never cross-resume.
  • checkpoints table gains an episode_id column (still its own table — no world-model schema migration); save/latest/_prune updated.
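A rough sketch of what the three-part key could look like against SQLite. The `CheckpointStore` class, its column layout, the `keep` parameter, and the free function `checkpoint_id` are all assumptions made for illustration; the actual schema and signatures in the PR may differ.

```python
import json
import sqlite3


def checkpoint_id(role: str, depth: int) -> str:
    # Stable across processes: no random suffix, unlike Agent.name.
    return f"{role}-{depth}"


class CheckpointStore:
    """Hypothetical store keyed on (goal_id, episode_id, checkpoint_id)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " goal_id TEXT NOT NULL,"
            " episode_id INTEGER NOT NULL,"
            " checkpoint_id TEXT NOT NULL,"
            " state TEXT NOT NULL)"
        )

    def save(self, goal_id, episode_id, checkpoint_id, state, keep=3):
        key = (goal_id, episode_id, checkpoint_id)
        self.db.execute(
            "INSERT INTO checkpoints (goal_id, episode_id, checkpoint_id, state)"
            " VALUES (?, ?, ?, ?)",
            (*key, json.dumps(state)),
        )
        self._prune(key, keep)
        self.db.commit()

    def latest(self, goal_id, episode_id, checkpoint_id):
        row = self.db.execute(
            "SELECT state FROM checkpoints"
            " WHERE goal_id = ? AND episode_id = ? AND checkpoint_id = ?"
            " ORDER BY id DESC LIMIT 1",
            (goal_id, episode_id, checkpoint_id),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def _prune(self, key, keep):
        # Keep only the newest `keep` rows for this exact key.
        self.db.execute(
            "DELETE FROM checkpoints"
            " WHERE goal_id = ? AND episode_id = ? AND checkpoint_id = ?"
            " AND id NOT IN ("
            "  SELECT id FROM checkpoints"
            "  WHERE goal_id = ? AND episode_id = ? AND checkpoint_id = ?"
            "  ORDER BY id DESC LIMIT ?)",
            (*key, *key, keep),
        )


store = CheckpointStore()
cid = checkpoint_id("orchestrator", 0)
store.save("goal-1", 1, cid, {"step": 3})
print(store.latest("goal-1", 1, cid))  # {'step': 3}
print(store.latest("goal-1", 2, cid))  # None
```

Because `episode_id` is part of every `WHERE` clause, a second best-of-N attempt under the same `goal_id` sees no checkpoint from the first and starts clean.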

Tests

  • test_resume_works_without_pinning_name — a fresh agent (new random name) resumes from the prior checkpoint via the stable id. This is the exact case Phase 1 silently failed; it now passes without test-only pinning.
  • test_episode_scoping_no_cross_resume — a checkpoint under episode 1 is not picked up resuming episode 2 (the best-of-N safety property).
  • Existing store/budget/gating tests updated for the new signature.
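The two new tests might look roughly like the sketch below. Only the test names come from this PR; the `Agent` dataclass and `InMemoryStore` are simplified, hypothetical stand-ins, so the real tests (which exercise the actual store and `run_goal`) will differ.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Agent:
    """Hypothetical stand-in: random name, stable checkpoint_id."""

    role: str
    depth: int
    name: str = field(init=False)

    def __post_init__(self) -> None:
        # Random per construction, as described for Agent.name.
        self.name = f"{self.role}-{self.depth}-{uuid4().hex[:6]}"

    @property
    def checkpoint_id(self) -> str:
        # Stable across processes.
        return f"{self.role}-{self.depth}"


class InMemoryStore:
    """Hypothetical store keyed on (goal_id, episode_id, checkpoint_id)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, int, str], dict] = {}

    def save(self, goal_id, episode_id, checkpoint_id, state) -> None:
        self._rows[(goal_id, episode_id, checkpoint_id)] = state

    def latest(self, goal_id, episode_id, checkpoint_id):
        return self._rows.get((goal_id, episode_id, checkpoint_id))


def test_resume_works_without_pinning_name():
    store = InMemoryStore()
    first = Agent("orchestrator", 0)
    store.save("goal-1", 1, first.checkpoint_id, {"step": 3})

    # A fresh agent has its own random name but the same stable id.
    fresh = Agent("orchestrator", 0)
    assert store.latest("goal-1", 1, fresh.checkpoint_id) == {"step": 3}


def test_episode_scoping_no_cross_resume():
    store = InMemoryStore()
    agent = Agent("orchestrator", 0)
    store.save("goal-1", 1, agent.checkpoint_id, {"step": 3})

    # Same goal, next episode: nothing to resume from.
    assert store.latest("goal-1", 2, agent.checkpoint_id) is None
```

Neither test sets `agent.name`, which is the point: resume has to work with whatever random name the fresh agent receives.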

Full suite 2459 passed, 0 regressions; durable still off by default; ruff clean.

This is foundational for Phase 2 (swarm child records key off the now-stable parent identity), which I'll do next. Phase 2 design is on #472 / tracked in #396.

https://claude.ai/code/session_01V4m74QKcM4ERqAu3rbkr6B


Generated by Claude Code

… bug)

Phase 1 (#472) keyed checkpoints on Agent.name, which carries a per-process
random uuid suffix (for blackboard/agent-bus uniqueness). On a real
fresh-process resume, run_goal builds a new orchestrator with a NEW random
name, so latest(goal_id, self.name) never matched -> resume silently fell back
to warm-restart. The Phase-1 test only passed because it PINNED agent.name.

Fix: key on (goal_id, episode_id, checkpoint_id):
- checkpoint_id = '{role}-{depth}' (stable; one orchestrator per episode), a
  new Agent property distinct from the random name.
- episode_id threaded through SwarmContext (default 0) and set from
  world.start_episode() in run_goal. This discriminates best-of-N attempts,
  which run sequential run_goal calls under the SAME goal_id but DISTINCT
  episodes -- without it, attempt 2 would resume from attempt 1's checkpoint.
- checkpoints table gains an episode_id column (own table, still no
  world-model schema migration); save/latest/_prune updated.

Tests: production-shape resume now works WITHOUT pinning the name; episode
scoping proves no cross-resume between attempts. Full suite 2459 passed, 0
regressions; durable still off by default. Foundational for Phase 2 (swarm
child records key off the now-stable parent identity).
@cdayAI cdayAI marked this pull request as ready for review May 31, 2026 15:37
@cdayAI cdayAI merged commit 75042b1 into main May 31, 2026
12 checks passed
@cdayAI cdayAI deleted the claude/security-code-audit-Nq8Tw branch May 31, 2026 18:23

2 participants