fix(durable): episode-scoped stable checkpoint keying by cdayAI · Pull Request #497 · cdayAI/Maverick

cdayAI · 2026-05-31T15:21:06Z

A correctness fix for durable execution, found while scoping Phase 2.

The bug

Phase 1 (#472) keyed checkpoints on Agent.name — but name carries a per-process random uuid suffix (f"{role}-{depth}-{uuid4().hex[:6]}", for blackboard/agent-bus uniqueness). On a real fresh-process resume, run_goal builds a new orchestrator with a new random name, so latest(goal_id, self.name) never matched → resume silently fell back to warm-restart. Phase 1's resume only worked in its test because the test pinned agent.name. In production the feature didn't actually resume.
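The mismatch is easy to reproduce in isolation. The following is a minimal sketch, not Maverick's code: the dict-backed `saved` store, the `latest` helper, and the `"orchestrator"` role string are hypothetical stand-ins, and only the name format is taken from the description above.

```python
from typing import Optional
from uuid import uuid4


def make_name(role: str, depth: int) -> str:
    # Same shape as the name format quoted above: a fresh random suffix per call.
    return f"{role}-{depth}-{uuid4().hex[:6]}"


# Hypothetical in-memory stand-in for the Phase 1 store, keyed on (goal_id, name).
saved: dict = {}


def latest(goal_id: str, name: str) -> Optional[dict]:
    return saved.get((goal_id, name))


# Process 1 writes a checkpoint under its own random name.
name_before = make_name("orchestrator", 0)
saved[("goal-1", name_before)] = {"step": 3}

# Process 2 (the resume) builds a new orchestrator, hence a new random name.
name_after = make_name("orchestrator", 0)
resumed = latest("goal-1", name_after)

# Barring a 1-in-16**6 suffix collision, the lookup misses and `resumed` is None.
print(name_before, name_after, resumed)
```

Pinning the name in a test makes `name_before == name_after`, which is why the Phase 1 test passed while a real fresh-process resume did not.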

Compounding it: run_goal_best_of_n runs N sequential run_goal calls under the same goal_id (distinct episodes), so a naive "stable id per goal" fix would make attempt 2 resume from attempt 1's checkpoint.
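To illustrate why a goal-only key is unsafe under best-of-N, here is a small sketch under stated assumptions: `run_attempt` and the dict-backed `checkpoints` store are hypothetical and only model the lookup-then-save order, not the real run_goal.

```python
from typing import Optional

# Naive fix: one stable key per goal, with no notion of episode.
checkpoints: dict = {}


def run_attempt(goal_id: str, attempt: int) -> Optional[dict]:
    # On start, resume from whatever is stored for this goal.
    resumed_from = checkpoints.get(goal_id)
    # Later in the attempt, checkpoint its own progress under the same key.
    checkpoints[goal_id] = {"attempt": attempt, "step": 5}
    return resumed_from


first = run_attempt("goal-1", attempt=1)   # nothing stored yet
second = run_attempt("goal-1", attempt=2)  # picks up attempt 1's state

print(first, second)
```

Attempt 2 is meant to be an independent episode, yet it starts from attempt 1's checkpoint, so the key needs an episode discriminator as well as a stable agent identity.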

The fix (episode-scoped, stable keying)

Key checkpoints on (goal_id, episode_id, checkpoint_id):

- checkpoint_id = "{role}-{depth}": a new Agent property, stable across processes (one orchestrator per episode), distinct from the random name.
- episode_id threaded through SwarmContext (default 0), set from world.start_episode() in run_goal. Discriminates best-of-N attempts so they never cross-resume.
- checkpoints table gains an episode_id column (still its own table; no world-model schema migration); save/latest/_prune updated.
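The three-part key can be sketched as a small SQLite-backed store. This is an illustration under assumptions, not the code in this PR: the `CheckpointStore` class, the column names other than episode_id, the JSON state encoding, and the `keep` retention count are all hypothetical; only the key shape and the save/latest/_prune method names come from the description above.

```python
import json
import sqlite3
from typing import Optional


class CheckpointStore:
    """Hypothetical minimal store keyed on (goal_id, episode_id, checkpoint_id)."""

    def __init__(self, path: str = ":memory:", keep: int = 3) -> None:
        self.db = sqlite3.connect(path)
        self.keep = keep  # assumed retention policy: newest `keep` rows per key
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS checkpoints (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   goal_id TEXT NOT NULL,
                   episode_id INTEGER NOT NULL DEFAULT 0,
                   checkpoint_id TEXT NOT NULL,
                   state TEXT NOT NULL
               )"""
        )

    def save(self, goal_id: str, episode_id: int, checkpoint_id: str, state: dict) -> None:
        self.db.execute(
            "INSERT INTO checkpoints (goal_id, episode_id, checkpoint_id, state) "
            "VALUES (?, ?, ?, ?)",
            (goal_id, episode_id, checkpoint_id, json.dumps(state)),
        )
        self._prune(goal_id, episode_id, checkpoint_id)
        self.db.commit()

    def latest(self, goal_id: str, episode_id: int, checkpoint_id: str) -> Optional[dict]:
        row = self.db.execute(
            "SELECT state FROM checkpoints "
            "WHERE goal_id = ? AND episode_id = ? AND checkpoint_id = ? "
            "ORDER BY id DESC LIMIT 1",
            (goal_id, episode_id, checkpoint_id),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def _prune(self, goal_id: str, episode_id: int, checkpoint_id: str) -> None:
        # Delete everything for this exact key except the newest `keep` rows.
        key = (goal_id, episode_id, checkpoint_id)
        self.db.execute(
            """DELETE FROM checkpoints
               WHERE goal_id = ? AND episode_id = ? AND checkpoint_id = ?
                 AND id NOT IN (
                     SELECT id FROM checkpoints
                     WHERE goal_id = ? AND episode_id = ? AND checkpoint_id = ?
                     ORDER BY id DESC LIMIT ?
                 )""",
            key + key + (self.keep,),
        )


store = CheckpointStore()
store.save("goal-1", 1, "orchestrator-0", {"step": 4})

# A fresh process rebuilds the same stable key, so the lookup hits.
same_episode = store.latest("goal-1", 1, "orchestrator-0")
# A different best-of-N attempt has its own episode_id, so it starts clean.
other_episode = store.latest("goal-1", 2, "orchestrator-0")
```

Because the random name never enters the key, a resume depends only on values that can be recomputed after a restart, while episode_id keeps sequential attempts under one goal_id apart.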

Tests

- test_resume_works_without_pinning_name: a fresh agent (new random name) resumes from the prior checkpoint via the stable id. This is the exact case Phase 1 silently failed; it now passes without test-only pinning.
- test_episode_scoping_no_cross_resume: a checkpoint saved under episode 1 is not picked up when resuming episode 2 (the best-of-N safety property).
- Existing store/budget/gating tests updated for the new signature.

Full suite 2459 passed, 0 regressions; durable still off by default; ruff clean.

This is foundational for Phase 2 (swarm child records key off the now-stable parent identity), which I'll do next. Phase 2 design is on #472 / tracked in #396.

https://claude.ai/code/session_01V4m74QKcM4ERqAu3rbkr6B

Generated by Claude Code

… bug)

Phase 1 (#472) keyed checkpoints on Agent.name, which carries a per-process random uuid suffix (for blackboard/agent-bus uniqueness). On a real fresh-process resume, run_goal builds a new orchestrator with a NEW random name, so latest(goal_id, self.name) never matched -> resume silently fell back to warm-restart. The Phase-1 test only passed because it PINNED agent.name.

Fix: key on (goal_id, episode_id, checkpoint_id):
- checkpoint_id = '{role}-{depth}' (stable; one orchestrator per episode), a new Agent property distinct from the random name.
- episode_id threaded through SwarmContext (default 0) and set from world.start_episode() in run_goal. This discriminates best-of-N attempts, which run sequential run_goal calls under the SAME goal_id but DISTINCT episodes -- without it, attempt 2 would resume from attempt 1's checkpoint.
- checkpoints table gains an episode_id column (own table, still no world-model schema migration); save/latest/_prune updated.

Tests: production-shape resume now works WITHOUT pinning the name; episode scoping proves no cross-resume between attempts.

Full suite 2459 passed, 0 regressions; durable still off by default. Foundational for Phase 2 (swarm child records key off the now-stable parent identity).

cdayAI marked this pull request as ready for review May 31, 2026 15:37

cdayAI merged commit 75042b1 into main May 31, 2026
12 checks passed

cdayAI deleted the claude/security-code-audit-Nq8Tw branch May 31, 2026 18:23

cdayAI mentioned this pull request May 31, 2026

feat(durable): Phase 2 swarm-tree checkpointing #506

Closed

cdayAI merged 1 commit into main from claude/security-code-audit-Nq8Tw
