fix(replay): seed step_counters on fork so replay steps continue after inherited prefix by risjai · Pull Request #164 · agentoptics/rewind

risjai · 2026-05-03T12:48:32Z

Summary

Bug: engine.fork() created the timeline row but didn't seed step_counters, so a runner's first replayed step landed at step_number=1. With PR fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps #162's owned-over-inherited dedup that shadowed the original turn-1..N steps and pushed the inherited turn-N user message to the END of the timeline. Operators saw "nothing after the edited question" because the agent's reply was above the question chronologically.
Fix: seed step_counters[fork_id] = fork_at_step right after create_timeline. First runner-recorded step on the fork now gets fork_at_step + 1. Chronological order preserved.
Live repro on dev1 session ray-agent-a18ac577: clicking Run replay on edited-fork at step 6 used to produce owned steps [1, 2, 3, 4, 5] + inherited 6. Now produces inherited 6 + owned [7, 8, 9, 10, 11].
50 steps across 21 broken replay-* forks repaired in place via scripts/repair-fork-step-numbering.py (idempotent; only touches replay-prefixed labels — leaves edited-fork / fork-at-N alone since their owned step at step_number == fork_at_step + 1 is intentional promote-and-mutate semantics).

Test plan

cargo test -p rewind-replay --lib — 16 tests pass including 3 new regression tests:
- fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefix
- fork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2
- fork_does_not_disturb_existing_step_counters_on_other_timelines
cargo test -p rewind-store -p rewind-web --lib — 108 tests pass, no regressions.
vitest run StepDetailPanel.test — 15 UI dispatch tests pass.
End-to-end through the dashboard on dev1 with a patched v0.14.9 binary on the PVC:
- Hooked window.fetch and clicked Run replay → captured POST body shows source_timeline_id=<edited-fork-uuid> and at_step=6 (not main, no UI bug).
- Resulting fork (replay-f0c54d86) renders steps 6 → 7, 8, 9, 10, 11 in order; final llm response is the agent's answer to the user's edited turn-2 question.
- Repair script applied to dev1 DB; replay-64e61b6c (the user's previously-broken fork) now displays in correct order.
Idempotence check: rerunning the repair script after apply reports "No broken forks found".

Version bumps (per CLAUDE.md)

Track 1 (Rust binary, since crates/rewind-replay changed and v0.14.8 has been released):

Cargo.toml: 0.14.8 → 0.14.9
python/rewind_cli.py CLI_VERSION: 0.14.8 → 0.14.9
python-mcp/pyproject.toml: 0.13.7 → 0.13.8
python-mcp/rewind_mcp_cli.py CLI_VERSION: 0.14.8 → 0.14.9

Track 2 (Python SDK, since python/rewind_cli.py changed and 0.15.2 is on PyPI):

python/pyproject.toml: 0.15.2 → 0.15.3
python/rewind_agent/__init__.py __version__: 0.15.2 → 0.15.3

Post-merge checklist

Tag v0.14.9 on GitHub Release (triggers binary build).
./scripts/publish-pypi.sh from python/.
./scripts/publish-mcp-pypi.sh from python-mcp/.

Made with Cursor

…r inherited prefix Bug A — engine.fork() created the new timeline row but didn't seed its step_counters entry. The runner's first replayed step landed at step_number=1, colliding with steps inherited from the parent. Combined with the owned-over-inherited dedup added in PR #162, this shadowed the original turn-1..N steps with the agent's NEW turn-(N+1) work and put the inherited turn-N user message at the END of the timeline — sorting the user's edited question AFTER the agent's response. Live repro on dev1 session ray-agent-a18ac577 (2026-05-03): clicking Run replay on edited-fork at step 6 produced owned steps numbered 1..5 + an inherited step 6, displayed in that order. Operators saw "nothing after the LLM step" because the agent's response was already above the question they'd edited. Fix: seed step_counters[fork_id] = fork_at_step right after create_timeline. Next runner-recorded step gets fork_at_step+1, chronological order is preserved. Verified end-to-end through the dashboard: - Pre-fix: replay-64e61b6c shape was [1,2,3,4,5,6_inherited] - Post-fix: replay-d4c0d5b9 / replay-f0c54d86 shape is [6_inherited, 7,8,9,10,11] — matches the agent's actual conversation flow. Existing broken forks repaired via scripts/repair-fork-step- numbering.py (21 forks, 50 steps renumbered on dev1). Script is idempotent and only touches `replay-*` forks (auto-generated runner replay forks); user-created forks like edited-fork or fork-at-N are left alone since their owned steps at step_number ≤ fork_at_step are intentional promote-and-mutate edits. Tests: - fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefix - fork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2 - fork_does_not_disturb_existing_step_counters_on_other_timelines Version bump (Track 1 + Track 2 per CLAUDE.md): - Cargo workspace: 0.14.8 -> 0.14.9 - python-mcp/pyproject: 0.13.7 -> 0.13.8 - CLI_VERSION (rewind_cli, rewind_mcp_cli): 0.14.8 -> 0.14.9 - python/rewind-agent SDK: 0.15.2 -> 0.15.3 Co-authored-by: Cursor <cursoragent@cursor.com>

vercel · 2026-05-03T12:48:34Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
rewind	Ready	Preview, Comment	May 3, 2026 1:19pm

risjai

Code Review — PR #164

Overall: The core Rust fix is correct and well-tested. The one-liner sync_step_counter call is the right approach and the three regression tests give excellent coverage (main path, edge-case fork@1, isolation from parent counter). Version bumps follow CLAUDE.md rules. PR body and test plan are thorough.

Findings (by severity)

P1 — Repair script creates step-number collisions when runner recorded more steps than fork_at_step

The script only renumbers owned steps with step_number <= fork_at_step, then shifts them by +fork_at_step. But in the broken scenario, the runner's counter started at 1 and kept incrementing — so a fork at step 5 where the runner recorded 8 steps has owned steps [1, 2, 3, 4, 5, 6, 7, 8]. The script renumbers 1→6, 2→7, 3→8, 4→9, 5→10 — but owned steps 6, 7, 8 already exist and don't get moved. Since steps has no UNIQUE constraint on (timeline_id, step_number), the UPDATEs succeed silently, creating duplicate rows at step_numbers 6, 7, 8. get_full_timeline_steps uses HashMap::insert so one of each pair is arbitrarily dropped.

Fix: shift ALL owned steps (not just those ≤ fork_at_step) by fork_at_step, and process in descending step_number order to avoid transient collisions if a UNIQUE index is ever added. See inline comment.

P2 — Non-atomic create_timeline + sync_step_counter in fork()

If sync_step_counter fails after create_timeline succeeds, the fork exists without a seeded counter — exactly the broken state this PR fixes. Low practical risk (same SQLite conn, same disk), but worth wrapping in a store-level transaction for correctness.

Nit — The 18-line doc comment on fork() duplicates the commit message and test comments. Consider trimming to 3-4 lines ("Seeds step_counters so the first runner-recorded step continues at at_step + 1. See regression test for rationale.").

risjai · 2026-05-03T12:52:00Z

+        cur.execute("""
+            SELECT id, step_number FROM steps
+            WHERE timeline_id = ? AND step_number <= ?
+            ORDER BY step_number


P1 Bug — step-number collision when runner recorded more than fork_at_step steps.

This query only selects owned steps with step_number <= fork_at_step, then shifts them by +fork_at_step. But the broken counter started at 1, so a runner that recorded M steps produces owned steps [1..M]. When M > fork_at_step, the renumbered early steps (e.g. 1 → fork_at+1) collide with existing un-shifted later steps (e.g. step at fork_at+1).

Since steps has no UNIQUE constraint on (timeline_id, step_number), the UPDATEs succeed silently, creating duplicate step_numbers. get_full_timeline_steps's HashMap::insert then drops one arbitrarily.

Suggested fix — shift ALL owned steps, not just those ≤ fork_at:

cur.execute(""" SELECT id, step_number FROM steps WHERE timeline_id = ? ORDER BY step_number DESC -- descending avoids transient collisions """, (fork["id"],))

Then new_num = s["step_number"] + fork_at for every owned step. Processing in DESC order ensures the highest step_number is moved first, so no in-flight collision can occur if a UNIQUE index is ever added.

risjai · 2026-05-03T12:52:00Z

+    /// its parent. Without seeding, the first new step a runner
+    /// records on the fork would land at step_number=1, shadowing
+    /// the inherited prefix in `get_full_timeline_steps` (which
+    /// dedups owned-over-inherited at the same number) and sorting


P2 — Non-atomic seeding. If sync_step_counter fails after create_timeline succeeds (e.g. disk-full between the two writes), the fork exists without a seeded counter — the same broken state this PR fixes.

Low practical risk on a single SQLite connection, but consider wrapping both calls in a store-level transaction (or adding a create_timeline_and_seed_counter helper) so they're atomic.

P1 (repair script): the SQL renumber was selecting only owned steps with step_number <= fork_at_step, then shifting each by +fork_at_step. That was wrong when the runner overran the fork point — a fork@5 with M=8 recorded iterations has owned steps [1..8]; shifting only [1..5] leaves [6..8] in place, and the renumbered early steps land on top of them ([1->6, 2->7, 3->8] all collide). Since `steps` has no UNIQUE on (timeline_id, step_number), these UPDATEs would silently create duplicates and HashMap::insert in get_full_timeline_steps would arbitrarily drop one of each pair. Fix: shift EVERY owned step on a flagged fork (not just ≤ fork_at_step) and process in DESCENDING step_number order. Each UPDATE then targets a step_number that doesn't yet exist among un-shifted rows, even if a UNIQUE constraint is added later. Idempotent: a second run finds nothing to fix because all owned steps now sit at step_number > fork_at_step. P2 (atomicity): the previous fork() called create_timeline followed by sync_step_counter as two separate writes on the same connection. If the second one failed (e.g. disk-full between the two calls), the fork existed without a seeded counter — the exact broken state this PR fixes. Wrap both into a single transaction via a new `Store::create_timeline_with_seeded_counter` helper. Nit: trimmed the 18-line doc comment on `fork()` to 3 lines and moved the rationale into the regression test (which is where future maintainers will read it anyway). CI fix: `test_patch_promote_main_protection_follows_target` was implicitly relying on `.last()` returning an inherited step (the buggy fork numbering put inherited steps after owned ones). After this PR's fix the fork's `.last()` is the fork-OWNED step at step_number=fork_at_step+1, and that step isn't visible on main, so the promote-and-mutate visibility check 400s. Pin the test to explicitly select step_number=2 (the inherited one) so it tests main-protection regardless of the step-counter behavior. Also added a `main_edits_env_lock()` mutex shared by the two tests that mutate REWIND_ALLOW_MAIN_EDITS — previously they raced under parallel `cargo test`, surfaced once both tests do real PATCHes against main. 296 tests across rewind-web pass, 16 in rewind-replay pass, full workspace green. Co-authored-by: Cursor <cursoragent@cursor.com>

risjai · 2026-05-03T13:19:56Z

Pushed c66064e addressing all three review findings + the CI failure:

P1 (repair script collision) — scripts/repair-fork-step-numbering.py now selects every owned step on a flagged replay fork (not just step_number ≤ fork_at_step) and processes them in descending step_number order, so each UPDATE targets a slot that's still unoccupied among the un-shifted rows. Catches the fork@5 + 8 iterations case you flagged. Idempotent: a second run sees no broken forks.

P2 (atomic fork seed) — added Store::create_timeline_with_seeded_counter that wraps both writes in unchecked_transaction(). engine.fork() calls that single helper now, so the fork can never exist without a seeded counter even on a mid-write failure.

Nit — trimmed fork()'s doc to 3 lines pointing at the regression test.

CI fix (separate from the review) — test_patch_promote_main_protection_follows_target was implicitly relying on the buggy .last() returning an inherited step. After this PR's fix the fork's .last() is the fork-OWNED step (correctly), which isn't visible on main → 400 from the visibility check. Pinned the test to explicitly pick step_number=2 (inherited from main, visible on both timelines) so it tests main-protection regardless of step-counter behavior. Also added a main_edits_env_lock() mutex shared by the two tests that mutate REWIND_ALLOW_MAIN_EDITS — they were racing under parallel cargo test once the second test started doing a real successful PATCH against main.

Full workspace green: 296 tests across rewind-web, 16 in rewind-replay, no failures elsewhere.

risjai

Re-review: all findings addressed ✓

P1 (collision) → Repair script now shifts ALL owned steps (no step_number <= fork_at filter), DESC order avoids transient collisions. Docstring updated with rationale.

P2 (non-atomic) → New create_timeline_with_seeded_counter wraps both writes in a single unchecked_transaction. fork() calls the one atomic method. Clean separation — the store owns the transaction boundary.

Nit (doc comment) → Trimmed to 4 lines, points to regression test.

Bonus: Good catch on test_patch_promote_main_protection_follows_target — the .last() lookup was implicitly relying on broken counter behavior (returning the inherited step at #2 by accident). Switching to explicit step_number == 2 lookup makes the test correct under both old and new counter semantics. The OnceLock<Mutex<()>> env-var serialization is the right pattern for the parallel test race.

No new issues found. LGTM — ready to merge.

vercel Bot deployed to Preview May 3, 2026 12:48 View deployment

risjai commented May 3, 2026

View reviewed changes

vercel Bot deployed to Preview May 3, 2026 13:19 View deployment

risjai commented May 3, 2026

View reviewed changes

risjai merged commit 8446bfb into master May 3, 2026
7 checks passed

risjai deleted the fix/fork-step-numbering branch May 3, 2026 13:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(replay): seed step_counters on fork so replay steps continue after inherited prefix#164

fix(replay): seed step_counters on fork so replay steps continue after inherited prefix#164
risjai merged 2 commits into
masterfrom
fix/fork-step-numbering

risjai commented May 3, 2026

Uh oh!

vercel Bot commented May 3, 2026 •

edited

Loading

Uh oh!

risjai left a comment

Uh oh!

risjai May 3, 2026

Uh oh!

risjai May 3, 2026

Uh oh!

risjai commented May 3, 2026

Uh oh!

risjai left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

risjai commented May 3, 2026

Summary

Test plan

Version bumps (per CLAUDE.md)

Post-merge checklist

Uh oh!

vercel Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

risjai left a comment

Choose a reason for hiding this comment

Code Review — PR #164

Findings (by severity)

Uh oh!

risjai May 3, 2026

Choose a reason for hiding this comment

Uh oh!

risjai May 3, 2026

Choose a reason for hiding this comment

Uh oh!

risjai commented May 3, 2026

Uh oh!

risjai left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 3, 2026 •

edited

Loading