
fix(replay): seed step_counters on fork so replay steps continue after inherited prefix#164

Merged: risjai merged 2 commits into master from fix/fork-step-numbering on May 3, 2026
Conversation


@risjai risjai commented May 3, 2026

Summary

  • Bug: engine.fork() created the timeline row but didn't seed step_counters, so a runner's first replayed step landed at step_number=1. Combined with the owned-over-inherited dedup from PR #162 (fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps), that shadowed the original turn-1..N steps and pushed the inherited turn-N user message to the END of the timeline. Operators saw "nothing after the edited question" because the agent's reply was sorted above the question it answered.
  • Fix: seed step_counters[fork_id] = fork_at_step right after create_timeline. First runner-recorded step on the fork now gets fork_at_step + 1. Chronological order preserved.
  • Live repro on dev1 session ray-agent-a18ac577: clicking Run replay on edited-fork at step 6 used to produce owned steps [1, 2, 3, 4, 5] + inherited 6. Now produces inherited 6 + owned [7, 8, 9, 10, 11].
  • 50 steps across 21 broken replay-* forks repaired in place via scripts/repair-fork-step-numbering.py (idempotent; only touches replay-prefixed labels — leaves edited-fork / fork-at-N alone since their owned step at step_number == fork_at_step + 1 is intentional promote-and-mutate semantics).
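
The numbering behavior described above can be reproduced with a toy model. This is only a sketch: `record_steps`, `full_timeline`, and `replay_on_fork` are hypothetical names, and the real logic lives in the Rust crates. It assumes a per-timeline counter that increments before each recorded step and a merge where an owned step wins over an inherited one at the same number.

```python
from typing import Dict, List, Tuple


def record_steps(step_counters: Dict[str, int], timeline_id: str, n: int) -> List[int]:
    """Allocate n step numbers from a per-timeline counter (hypothetical model)."""
    allocated = []
    for _ in range(n):
        step_counters[timeline_id] = step_counters.get(timeline_id, 0) + 1
        allocated.append(step_counters[timeline_id])
    return allocated


def full_timeline(inherited: List[int], owned: List[int]) -> List[Tuple[int, str]]:
    """Merge inherited and owned steps; an owned step shadows an inherited one."""
    merged = {n: "inherited" for n in inherited}
    merged.update({n: "owned" for n in owned})
    return sorted(merged.items())


def replay_on_fork(fork_at_step: int, new_steps: int, seed: bool) -> List[Tuple[int, str]]:
    """Simulate a runner recording new_steps on a fresh fork."""
    counters: Dict[str, int] = {}
    if seed:
        # The fix: seed the fork's counter with fork_at_step.
        counters["fork"] = fork_at_step
    owned = record_steps(counters, "fork", new_steps)
    inherited = list(range(1, fork_at_step + 1))
    return full_timeline(inherited, owned)


if __name__ == "__main__":
    print(replay_on_fork(6, 5, seed=False))
    print(replay_on_fork(6, 5, seed=True))
```

Without seeding, owned steps 1..5 shadow the inherited prefix and only inherited step 6 survives, at the end. With seeding, the inherited prefix stays intact and the owned steps follow it as 7..11.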

Test plan

  • cargo test -p rewind-replay --lib — 16 tests pass including 3 new regression tests:
    • fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefix
    • fork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2
    • fork_does_not_disturb_existing_step_counters_on_other_timelines
  • cargo test -p rewind-store -p rewind-web --lib — 108 tests pass, no regressions.
  • vitest run StepDetailPanel.test — 15 UI dispatch tests pass.
  • End-to-end through the dashboard on dev1 with a patched v0.14.9 binary on the PVC:
    • Hooked window.fetch and clicked Run replay → captured POST body shows source_timeline_id=<edited-fork-uuid> and at_step=6 (not main, no UI bug).
    • Resulting fork (replay-f0c54d86) renders steps 6 → 7, 8, 9, 10, 11 in order; final llm response is the agent's answer to the user's edited turn-2 question.
    • Repair script applied to dev1 DB; replay-64e61b6c (the user's previously-broken fork) now displays in correct order.
  • Idempotence check: rerunning the repair script after apply reports "No broken forks found".
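
The idempotence check can be illustrated with a minimal sketch. The schema, column names, and the `find_broken_forks` / `repair` helpers below are assumptions made for illustration, not the actual contents of scripts/repair-fork-step-numbering.py. A fork counts as broken when it carries a `replay-` label and owns a step at or below its fork point.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory DB with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE timelines (id TEXT PRIMARY KEY, label TEXT, fork_at_step INTEGER);
        CREATE TABLE steps (id INTEGER PRIMARY KEY, timeline_id TEXT, step_number INTEGER);
    """)
    conn.executemany(
        "INSERT INTO timelines VALUES (?, ?, ?)",
        [("f1", "replay-aaaa", 5), ("f2", "edited-fork", 5)],
    )
    conn.executemany(
        "INSERT INTO steps (timeline_id, step_number) VALUES (?, ?)",
        [("f1", 1), ("f1", 2), ("f1", 3), ("f2", 3)],
    )
    conn.commit()
    return conn


def find_broken_forks(conn: sqlite3.Connection):
    """Return (id, fork_at_step) for replay-* forks owning a step <= fork point."""
    return conn.execute("""
        SELECT t.id, t.fork_at_step FROM timelines t
        WHERE t.label LIKE 'replay-%'
          AND EXISTS (SELECT 1 FROM steps s
                      WHERE s.timeline_id = t.id
                        AND s.step_number <= t.fork_at_step)
        ORDER BY t.id
    """).fetchall()


def repair(conn: sqlite3.Connection) -> int:
    """Shift every owned step on each broken fork; return how many forks were fixed."""
    broken = find_broken_forks(conn)
    for fork_id, fork_at in broken:
        rows = conn.execute(
            "SELECT id, step_number FROM steps WHERE timeline_id = ? "
            "ORDER BY step_number DESC",
            (fork_id,),
        ).fetchall()
        for step_id, num in rows:
            conn.execute(
                "UPDATE steps SET step_number = ? WHERE id = ?",
                (num + fork_at, step_id),
            )
    conn.commit()
    return len(broken)
```

After one pass every owned step on a repaired fork sits above `fork_at_step`, so the detection query matches nothing on the second pass. The `edited-fork` row is never selected because of the label filter.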

Version bumps (per CLAUDE.md)

Track 1 (Rust binary, since crates/rewind-replay changed and v0.14.8 has been released):

  • Cargo.toml: 0.14.8 → 0.14.9
  • python/rewind_cli.py CLI_VERSION: 0.14.8 → 0.14.9
  • python-mcp/pyproject.toml: 0.13.7 → 0.13.8
  • python-mcp/rewind_mcp_cli.py CLI_VERSION: 0.14.8 → 0.14.9

Track 2 (Python SDK, since python/rewind_cli.py changed and 0.15.2 is on PyPI):

  • python/pyproject.toml: 0.15.2 → 0.15.3
  • python/rewind_agent/__init__.py __version__: 0.15.2 → 0.15.3

Post-merge checklist

  • Tag v0.14.9 on GitHub Release (triggers binary build).
  • ./scripts/publish-pypi.sh from python/.
  • ./scripts/publish-mcp-pypi.sh from python-mcp/.

Made with Cursor

…r inherited prefix

Bug A — engine.fork() created the new timeline row but didn't
seed its step_counters entry. The runner's first replayed step
landed at step_number=1, colliding with steps inherited from the
parent. Combined with the owned-over-inherited dedup added in
PR #162, this shadowed the original turn-1..N steps with the
agent's NEW turn-(N+1) work and put the inherited turn-N user
message at the END of the timeline — sorting the user's edited
question AFTER the agent's response.

Live repro on dev1 session ray-agent-a18ac577 (2026-05-03):
clicking Run replay on edited-fork at step 6 produced owned
steps numbered 1..5 + an inherited step 6, displayed in that
order. Operators saw "nothing after the LLM step" because the
agent's response was already above the question they'd edited.

Fix: seed step_counters[fork_id] = fork_at_step right after
create_timeline. Next runner-recorded step gets fork_at_step+1,
chronological order is preserved.

Verified end-to-end through the dashboard:
- Pre-fix: replay-64e61b6c shape was [1,2,3,4,5,6_inherited]
- Post-fix: replay-d4c0d5b9 / replay-f0c54d86 shape is
  [6_inherited, 7,8,9,10,11] — matches the agent's actual
  conversation flow.

Existing broken forks repaired via scripts/repair-fork-step-
numbering.py (21 forks, 50 steps renumbered on dev1). Script
is idempotent and only touches `replay-*` forks (auto-generated
runner replay forks); user-created forks like edited-fork or
fork-at-N are left alone since their owned steps at step_number
≤ fork_at_step are intentional promote-and-mutate edits.

Tests:
- fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefix
- fork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2
- fork_does_not_disturb_existing_step_counters_on_other_timelines

Version bump (Track 1 + Track 2 per CLAUDE.md):
- Cargo workspace: 0.14.8 -> 0.14.9
- python-mcp/pyproject: 0.13.7 -> 0.13.8
- CLI_VERSION (rewind_cli, rewind_mcp_cli): 0.14.8 -> 0.14.9
- python/rewind-agent SDK: 0.15.2 -> 0.15.3

Co-authored-by: Cursor <cursoragent@cursor.com>

vercel Bot commented May 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: rewind · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): May 3, 2026 1:19pm


@risjai risjai left a comment


Code Review — PR #164

Overall: The core Rust fix is correct and well-tested. The one-liner sync_step_counter call is the right approach and the three regression tests give excellent coverage (main path, edge-case fork@1, isolation from parent counter). Version bumps follow CLAUDE.md rules. PR body and test plan are thorough.

Findings (by severity)

P1 — Repair script creates step-number collisions when runner recorded more steps than fork_at_step

The script only renumbers owned steps with step_number <= fork_at_step, then shifts them by +fork_at_step. But in the broken scenario, the runner's counter started at 1 and kept incrementing — so a fork at step 5 where the runner recorded 8 steps has owned steps [1, 2, 3, 4, 5, 6, 7, 8]. The script renumbers 1→6, 2→7, 3→8, 4→9, 5→10 — but owned steps 6, 7, 8 already exist and don't get moved. Since steps has no UNIQUE constraint on (timeline_id, step_number), the UPDATEs succeed silently, creating duplicate rows at step_numbers 6, 7, 8. get_full_timeline_steps uses HashMap::insert so one of each pair is arbitrarily dropped.

Fix: shift ALL owned steps (not just those ≤ fork_at_step) by fork_at_step, and process in descending step_number order to avoid transient collisions if a UNIQUE index is ever added. See inline comment.
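
The fork@5 with 8 recorded steps example can be checked with plain arithmetic. This is a minimal sketch; the helper names are hypothetical and it models only the renumbering, not the database.

```python
from collections import Counter
from typing import List


def shift_partial(owned: List[int], fork_at: int) -> List[int]:
    """Flawed approach: shift only the steps at or below the fork point."""
    return [n + fork_at if n <= fork_at else n for n in owned]


def shift_all(owned: List[int], fork_at: int) -> List[int]:
    """Proposed fix: shift every owned step by the fork point."""
    return [n + fork_at for n in owned]


def duplicates(step_numbers: List[int]) -> List[int]:
    """Return step numbers that occur more than once, sorted."""
    return sorted(n for n, count in Counter(step_numbers).items() if count > 1)


if __name__ == "__main__":
    owned = list(range(1, 9))                  # runner recorded steps 1..8
    print(duplicates(shift_partial(owned, 5)))  # [6, 7, 8]
    print(shift_all(owned, 5))                  # [6, 7, 8, 9, 10, 11, 12, 13]
```

Shifting everything by the same offset preserves the relative order and cannot produce duplicates, because the mapping n → n + fork_at is one-to-one.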

P2 — Non-atomic create_timeline + sync_step_counter in fork()

If sync_step_counter fails after create_timeline succeeds, the fork exists without a seeded counter — exactly the broken state this PR fixes. Low practical risk (same SQLite conn, same disk), but worth wrapping in a store-level transaction for correctness.

Nit — The 18-line doc comment on fork() duplicates the commit message and test comments. Consider trimming to 3-4 lines ("Seeds step_counters so the first runner-recorded step continues at at_step + 1. See regression test for rationale.").

Comment thread scripts/repair-fork-step-numbering.py Outdated
cur.execute("""
SELECT id, step_number FROM steps
WHERE timeline_id = ? AND step_number <= ?
ORDER BY step_number

P1 Bug — step-number collision when runner recorded more than fork_at_step steps.

This query only selects owned steps with step_number <= fork_at_step, then shifts them by +fork_at_step. But the broken counter started at 1, so a runner that recorded M steps produces owned steps [1..M]. When M > fork_at_step, the renumbered early steps (e.g. 1 → fork_at+1) collide with existing un-shifted later steps (e.g. step at fork_at+1).

Since steps has no UNIQUE constraint on (timeline_id, step_number), the UPDATEs succeed silently, creating duplicate step_numbers. get_full_timeline_steps's HashMap::insert then drops one arbitrarily.

Suggested fix — shift ALL owned steps, not just those ≤ fork_at:

cur.execute("""
    SELECT id, step_number FROM steps
    WHERE timeline_id = ?
    ORDER BY step_number DESC  -- descending avoids transient collisions
""", (fork["id"],))

Then new_num = s["step_number"] + fork_at for every owned step. Processing in DESC order ensures the highest step_number is moved first, so no in-flight collision can occur if a UNIQUE index is ever added.
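
To see why the ordering matters, here is a sketch against an in-memory SQLite database. It adds a UNIQUE index that, per the comment above, the real `steps` table does not have; the index, the simplified schema, and the `make_db` / `shift_owned` helpers are assumptions for illustration only.

```python
import sqlite3


def make_db(owned_steps) -> sqlite3.Connection:
    """Create a steps table WITH a hypothetical UNIQUE index on (timeline, number)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steps (id INTEGER PRIMARY KEY, timeline_id TEXT, step_number INTEGER)"
    )
    conn.execute("CREATE UNIQUE INDEX steps_uniq ON steps (timeline_id, step_number)")
    conn.executemany(
        "INSERT INTO steps (timeline_id, step_number) VALUES ('fork', ?)",
        [(n,) for n in owned_steps],
    )
    conn.commit()
    return conn


def shift_owned(conn: sqlite3.Connection, timeline_id: str, fork_at: int, descending: bool) -> None:
    """Shift every owned step by fork_at, one UPDATE per row, in the given order."""
    order = "DESC" if descending else "ASC"
    rows = conn.execute(
        f"SELECT id, step_number FROM steps WHERE timeline_id = ? ORDER BY step_number {order}",
        (timeline_id,),
    ).fetchall()
    for step_id, num in rows:
        conn.execute(
            "UPDATE steps SET step_number = ? WHERE id = ?",
            (num + fork_at, step_id),
        )
    conn.commit()
```

With owned steps 1..8 and `fork_at=5`, the descending pass moves 8 to 13 first, so every later target slot is already vacated. An ascending pass fails on its first UPDATE (1 to 6) with `sqlite3.IntegrityError`, because step 6 has not moved yet.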

Comment thread crates/rewind-replay/src/lib.rs Outdated
/// its parent. Without seeding, the first new step a runner
/// records on the fork would land at step_number=1, shadowing
/// the inherited prefix in `get_full_timeline_steps` (which
/// dedups owned-over-inherited at the same number) and sorting

P2 — Non-atomic seeding. If sync_step_counter fails after create_timeline succeeds (e.g. disk-full between the two writes), the fork exists without a seeded counter — the same broken state this PR fixes.

Low practical risk on a single SQLite connection, but consider wrapping both calls in a store-level transaction (or adding a create_timeline_and_seed_counter helper) so they're atomic.
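
A combined helper along these lines might look like the following Python sketch. The real store is Rust; the table layout, column names, and function signature here are assumptions, and the point is only that both writes share one transaction.

```python
import sqlite3
import uuid
from typing import Optional

# Hypothetical, simplified schema for illustration.
SCHEMA = """
CREATE TABLE timelines (id TEXT PRIMARY KEY, parent_id TEXT, fork_at_step INTEGER, label TEXT);
CREATE TABLE step_counters (timeline_id TEXT PRIMARY KEY, counter INTEGER NOT NULL);
"""


def create_timeline_and_seed_counter(
    conn: sqlite3.Connection, parent_id: str, fork_at_step: Optional[int], label: str
) -> str:
    """Insert the fork row and its seeded counter atomically."""
    fork_id = str(uuid.uuid4())
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so a failure on the second INSERT also undoes the first.
    with conn:
        conn.execute(
            "INSERT INTO timelines (id, parent_id, fork_at_step, label) VALUES (?, ?, ?, ?)",
            (fork_id, parent_id, fork_at_step, label),
        )
        conn.execute(
            "INSERT INTO step_counters (timeline_id, counter) VALUES (?, ?)",
            (fork_id, fork_at_step),
        )
    return fork_id
```

If the counter insert fails (for example on a constraint violation), the timeline row is rolled back as well, so a fork without a seeded counter never becomes visible.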

P1 (repair script): the SQL renumber was selecting only owned steps
with step_number <= fork_at_step, then shifting each by +fork_at_step.
That was wrong when the runner overran the fork point — a fork@5 with
M=8 recorded iterations has owned steps [1..8]; shifting only [1..5]
leaves [6..8] in place, and the renumbered early steps land on top of
them ([1->6, 2->7, 3->8] all collide). Since `steps` has no UNIQUE on
(timeline_id, step_number), these UPDATEs would silently create
duplicates and HashMap::insert in get_full_timeline_steps would
arbitrarily drop one of each pair.

Fix: shift EVERY owned step on a flagged fork (not just ≤ fork_at_step)
and process in DESCENDING step_number order. Each UPDATE then targets
a step_number that doesn't yet exist among un-shifted rows, even if a
UNIQUE constraint is added later. Idempotent: a second run finds
nothing to fix because all owned steps now sit at step_number > fork_at_step.

P2 (atomicity): the previous fork() called create_timeline followed by
sync_step_counter as two separate writes on the same connection. If
the second one failed (e.g. disk-full between the two calls), the
fork existed without a seeded counter — the exact broken state this
PR fixes. Wrap both into a single transaction via a new
`Store::create_timeline_with_seeded_counter` helper.

Nit: trimmed the 18-line doc comment on `fork()` to 3 lines and
moved the rationale into the regression test (which is where future
maintainers will read it anyway).

CI fix: `test_patch_promote_main_protection_follows_target` was
implicitly relying on `.last()` returning an inherited step (the
buggy fork numbering put inherited steps after owned ones). After
this PR's fix the fork's `.last()` is the fork-OWNED step at
step_number=fork_at_step+1, and that step isn't visible on main, so
the promote-and-mutate visibility check 400s. Pin the test to
explicitly select step_number=2 (the inherited one) so it tests
main-protection regardless of the step-counter behavior.

Also added a `main_edits_env_lock()` mutex shared by the two tests
that mutate REWIND_ALLOW_MAIN_EDITS — previously they raced under
parallel `cargo test`, surfaced once both tests do real PATCHes
against main.

296 tests across rewind-web pass, 16 in rewind-replay pass, full
workspace green.

Co-authored-by: Cursor <cursoragent@cursor.com>

risjai commented May 3, 2026

Pushed c66064e addressing all three review findings + the CI failure:

P1 (repair script collision): scripts/repair-fork-step-numbering.py now selects every owned step on a flagged replay fork (not just step_number ≤ fork_at_step) and processes them in descending step_number order, so each UPDATE targets a slot that's still unoccupied among the un-shifted rows. Catches the fork@5 + 8 iterations case you flagged. Idempotent: a second run sees no broken forks.

P2 (atomic fork seed) — added Store::create_timeline_with_seeded_counter that wraps both writes in unchecked_transaction(). engine.fork() calls that single helper now, so the fork can never exist without a seeded counter even on a mid-write failure.

Nit — trimmed fork()'s doc to 3 lines pointing at the regression test.

CI fix (separate from the review): test_patch_promote_main_protection_follows_target was implicitly relying on the buggy .last() returning an inherited step. After this PR's fix the fork's .last() is the fork-OWNED step (correctly), which isn't visible on main → 400 from the visibility check. Pinned the test to explicitly pick step_number=2 (inherited from main, visible on both timelines) so it tests main-protection regardless of step-counter behavior. Also added a main_edits_env_lock() mutex shared by the two tests that mutate REWIND_ALLOW_MAIN_EDITS; they were racing under parallel cargo test once the second test started doing a real successful PATCH against main.
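
The difference between taking the last step and selecting by number can be shown with toy data. The dict shape and the `pick_step` helper are hypothetical; the actual test is Rust and works on the store's own step type.

```python
from typing import Dict, List


def pick_step(steps: List[Dict], step_number: int) -> Dict:
    """Select a step by its number instead of by list position."""
    return next(s for s in steps if s["step_number"] == step_number)


# Fork timeline as rendered under the old (unseeded) numbering:
# the owned step landed at 1, so the inherited step 2 happened to be last.
old_order = [
    {"step_number": 1, "owned": True},
    {"step_number": 2, "owned": False},
]

# Same fork under the fixed numbering: inherited prefix first, owned step last.
new_order = [
    {"step_number": 1, "owned": False},
    {"step_number": 2, "owned": False},
    {"step_number": 3, "owned": True},
]

if __name__ == "__main__":
    print(old_order[-1]["owned"], new_order[-1]["owned"])
    print(pick_step(old_order, 2)["owned"], pick_step(new_order, 2)["owned"])
```

The positional lookup flips from an inherited step to an owned one when the numbering changes, while the explicit lookup returns the inherited step in both cases.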

Full workspace green: 296 tests across rewind-web, 16 in rewind-replay, no failures elsewhere.


@risjai risjai left a comment


Re-review: all findings addressed

P1 (collision) → Repair script now shifts ALL owned steps (no step_number <= fork_at filter), DESC order avoids transient collisions. Docstring updated with rationale.

P2 (non-atomic) → New create_timeline_with_seeded_counter wraps both writes in a single unchecked_transaction. fork() calls the one atomic method. Clean separation — the store owns the transaction boundary.

Nit (doc comment) → Trimmed to 4 lines, points to regression test.

Bonus: Good catch on test_patch_promote_main_protection_follows_target — the .last() lookup was implicitly relying on broken counter behavior (returning the inherited step at #2 by accident). Switching to explicit step_number == 2 lookup makes the test correct under both old and new counter semantics. The OnceLock<Mutex<()>> env-var serialization is the right pattern for the parallel test race.
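
For readers less familiar with the pattern, here is a rough Python analogue of serializing env-var mutation behind a shared lock. The PR's version is a Rust OnceLock<Mutex<()>>; the `main_edits_env` context manager below is a hypothetical stand-in, not code from the repository.

```python
import os
import threading
from contextlib import contextmanager
from typing import Iterator, Optional

# One process-wide lock shared by every test that touches the variable.
_MAIN_EDITS_ENV_LOCK = threading.Lock()
_VAR = "REWIND_ALLOW_MAIN_EDITS"


@contextmanager
def main_edits_env(value: Optional[str]) -> Iterator[None]:
    """Set (or unset, if value is None) the env var while holding the lock."""
    with _MAIN_EDITS_ENV_LOCK:
        previous = os.environ.get(_VAR)
        if value is None:
            os.environ.pop(_VAR, None)
        else:
            os.environ[_VAR] = value
        try:
            yield
        finally:
            # Restore the prior state so later tests see a clean environment.
            if previous is None:
                os.environ.pop(_VAR, None)
            else:
                os.environ[_VAR] = previous
```

Because the process environment is global state, two tests that set different values in parallel can observe each other's writes. Holding the lock for the whole body of each test removes that interleaving.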

No new issues found. LGTM — ready to merge.

@risjai risjai merged commit 8446bfb into master May 3, 2026
7 checks passed
@risjai risjai deleted the fix/fork-step-numbering branch May 3, 2026 13:40