fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps#162
Conversation
When a fork OWNS a step at the same step_number as an inherited
(parent) step — e.g. after a promote-and-mutate PATCH /steps/{id}/edit
with target_timeline_id=fork — the union view returned BOTH rows.
The dashboard's step picker then showed two #N entries on the fork
and the inherited row visually masked the user's edit.
Caught live in dev1 with session ray-agent-30053072 on 2026-04-29:
the user edited an inherited step's request_body (temperature 0 -> 0.7)
and the API stored it correctly, but the picker kept showing the
inherited row (temperature 0). The owned row with the edit existed
at a different step_id but never surfaced in the UI.
Fix the union view to dedupe by step_number, owned overrides inherited.
HashMap-based: insert parents first, then own steps — `HashMap::insert`
returning the previous value gives us "owned wins" for free, then sort
by step_number and return.
Two pre-existing tests had to be adjusted because they exercised a
scenario the dashboard would never produce after this fix:
- test_patch_promote_idempotent_re_edit: now PATCHes the resolved
owned id directly for the second edit (matches dashboard flow:
refetch list, click pencil on the now-visible owned row).
- test_cascade_count_target_aware: no longer seeds owned steps that
collide with inherited step_numbers; comment spells out why the
collision case isn't reachable from the UI after dedup.
New tests:
- rewind-replay::tests::get_full_timeline_steps_dedupes_owned_over_inherited
pins the unit invariant.
- rewind-web::recording_api_tests::test_steps_endpoint_dedupes_owned_over_inherited_after_promote
pins the API surface (the actual screenshot-bug repro from dev1).
Counts: 68/68 web integration tests, 14/14 rewind-replay unit tests.
Track 1 version bump 0.14.6 -> 0.14.7 per CLAUDE.md.
Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
risjai
left a comment
There was a problem hiding this comment.
Review: fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps
Clean fix for the exact bug reported in ray-agent-30053072. The root cause is correctly identified: after promote-and-mutate, the union view returned both the inherited and owned rows at the same step_number, so the dashboard showed stale data.
Findings
| # | Severity | Finding |
|---|---|---|
| 1 | Info | Test adjustment rationale is sound |
| 2 | Info | Dedup correctly cascades to visibility checks |
| 3 | Suggestion | HashMap could use with_capacity |
I1 — Test adjustments are correct and well-documented
Both adjusted tests have inline comments explaining why the old scenario is unreachable after dedup:
test_patch_promote_idempotent_re_edit— second edit now targets the resolved owned id directly (matches dashboard flow: refetch → click pencil on owned row)test_cascade_count_target_aware— removes owned-step seeding at colliding step_numbers
The before/after logic is sound: once owned wins over inherited at the same step_number, the old "PATCH inherited id twice" test was testing a flow the dashboard can't produce.
I2 — Dedup cascades correctly to both visibility checks
The patch_step handler's visibility check (s.id == step.id) and the cascade_count handler's visibility check both call get_full_timeline_steps. After dedup:
- First promote of an inherited step: inherited id IS visible (no owned twin yet) → 200
- Second edit after promote: inherited id is hidden (owned twin exists) → would 400 if you pass the inherited id → correct, because the dashboard now shows the owned step's id
- The owned step's id is always visible → in-place PATCH without target_timeline_id works
This is the right behavior: the dedup makes the visibility check stricter in a way that matches the UI's display.
S3 — HashMap capacity hint
Minor optimization: since we know the upper bound is fork_at + own_steps.len(), HashMap::with_capacity would avoid rehash for large sessions:
let mut by_step_number: HashMap<u32, Step> = HashMap::with_capacity(fork_at as usize + own_steps.len());Not blocking.
What I verified
- The HashMap approach is correct: insert parents first → insert own →
HashMap::insertoverwrites parent at same key → owned wins - Sort after dedup preserves step_number ordering
- The
filter(|s| s.step_number <= fork_at)still correctly scopes inherited steps to the fork boundary - Own steps are inserted WITHOUT the fork_at filter, so owned steps at any step_number (including below fork_at) correctly override inherited ones — this is the promote case
- New unit test and integration test both pin the invariant from the right angle (unit tests dedup directly, integration test goes through the API endpoint)
Verdict
Ship it. This is the missing piece from the ray-agent-30053072 repro — the edit was landing correctly on the server but the view was masking it with the stale inherited row. Version bump is correct (0.14.6 released → 0.14.7).
Apply the only actionable item from PR #162's review: pre-size the HashMap to fork_at + own_steps.len() so long sessions don't rehash during the union-view dedup. Optional micro-optimization, no behavior change. Made-with: Cursor
|
Folded the only actionable bit from your review (S3, HashMap with_capacity) in [ CI was already green except for build (macos-latest) which was still running when I left it; the new commit will re-trigger. Will merge once green and then drive the dev1 swap + Run-replay end-to-end against ray-agent-30053072 to close the original bug repro. |
…r inherited prefix (#164) * fix(replay): seed step_counters on fork so replay steps continue after inherited prefix Bug A — engine.fork() created the new timeline row but didn't seed its step_counters entry. The runner's first replayed step landed at step_number=1, colliding with steps inherited from the parent. Combined with the owned-over-inherited dedup added in PR #162, this shadowed the original turn-1..N steps with the agent's NEW turn-(N+1) work and put the inherited turn-N user message at the END of the timeline — sorting the user's edited question AFTER the agent's response. Live repro on dev1 session ray-agent-a18ac577 (2026-05-03): clicking Run replay on edited-fork at step 6 produced owned steps numbered 1..5 + an inherited step 6, displayed in that order. Operators saw "nothing after the LLM step" because the agent's response was already above the question they'd edited. Fix: seed step_counters[fork_id] = fork_at_step right after create_timeline. Next runner-recorded step gets fork_at_step+1, chronological order is preserved. Verified end-to-end through the dashboard: - Pre-fix: replay-64e61b6c shape was [1,2,3,4,5,6_inherited] - Post-fix: replay-d4c0d5b9 / replay-f0c54d86 shape is [6_inherited, 7,8,9,10,11] — matches the agent's actual conversation flow. Existing broken forks repaired via scripts/repair-fork-step- numbering.py (21 forks, 50 steps renumbered on dev1). Script is idempotent and only touches `replay-*` forks (auto-generated runner replay forks); user-created forks like edited-fork or fork-at-N are left alone since their owned steps at step_number ≤ fork_at_step are intentional promote-and-mutate edits. Tests: - fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefix - fork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2 - fork_does_not_disturb_existing_step_counters_on_other_timelines Version bump (Track 1 + Track 2 per CLAUDE.md): - Cargo workspace: 0.14.8 -> 0.14.9 - python-mcp/pyproject: 0.13.7 -> 0.13.8 - CLI_VERSION (rewind_cli, rewind_mcp_cli): 0.14.8 -> 0.14.9 - python/rewind-agent SDK: 0.15.2 -> 0.15.3 Co-authored-by: Cursor <cursoragent@cursor.com> * review #164: fix repair-script collision, atomic fork seeding, fix CI P1 (repair script): the SQL renumber was selecting only owned steps with step_number <= fork_at_step, then shifting each by +fork_at_step. That was wrong when the runner overran the fork point — a fork@5 with M=8 recorded iterations has owned steps [1..8]; shifting only [1..5] leaves [6..8] in place, and the renumbered early steps land on top of them ([1->6, 2->7, 3->8] all collide). Since `steps` has no UNIQUE on (timeline_id, step_number), these UPDATEs would silently create duplicates and HashMap::insert in get_full_timeline_steps would arbitrarily drop one of each pair. Fix: shift EVERY owned step on a flagged fork (not just ≤ fork_at_step) and process in DESCENDING step_number order. Each UPDATE then targets a step_number that doesn't yet exist among un-shifted rows, even if a UNIQUE constraint is added later. Idempotent: a second run finds nothing to fix because all owned steps now sit at step_number > fork_at_step. P2 (atomicity): the previous fork() called create_timeline followed by sync_step_counter as two separate writes on the same connection. If the second one failed (e.g. disk-full between the two calls), the fork existed without a seeded counter — the exact broken state this PR fixes. Wrap both into a single transaction via a new `Store::create_timeline_with_seeded_counter` helper. Nit: trimmed the 18-line doc comment on `fork()` to 3 lines and moved the rationale into the regression test (which is where future maintainers will read it anyway). CI fix: `test_patch_promote_main_protection_follows_target` was implicitly relying on `.last()` returning an inherited step (the buggy fork numbering put inherited steps after owned ones). After this PR's fix the fork's `.last()` is the fork-OWNED step at step_number=fork_at_step+1, and that step isn't visible on main, so the promote-and-mutate visibility check 400s. Pin the test to explicitly select step_number=2 (the inherited one) so it tests main-protection regardless of the step-counter behavior. Also added a `main_edits_env_lock()` mutex shared by the two tests that mutate REWIND_ALLOW_MAIN_EDITS — previously they raced under parallel `cargo test`, surfaced once both tests do real PATCHes against main. 296 tests across rewind-web pass, 16 in rewind-replay pass, full workspace green. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
After v0.14.6's promote-and-mutate landed, the dashboard's step picker showed two rows at the same step_number for forks where an inherited step had been promoted — and the inherited row visually masked the user's edit.
Caught live in dev1 with session
ray-agent-30053072on 2026-04-29:temperature: 0 -> 0.7)temperature: 0because the inherited row was still in the union view returned byengine.get_full_timeline_stepsFix
get_full_timeline_stepsnow de-duplicates bystep_number, with owned overriding inherited. HashMap-based: insert parents first, then own steps;HashMap::insertreturning the previous value gives "owned wins" for free.After the fix, the fork's view returns one row at #N — the owned one if it exists, else the inherited one.
Pre-existing test adjustments
Two tests needed updates because they exercised a scenario the dashboard would never produce after dedup:
test_patch_promote_idempotent_re_edit— the second edit now PATCHes the resolved owned id directly (matches the dashboard flow: refetch the list, click pencil on the now-visible owned row). Previously the test PATCHed the inherited id twice withtarget_timeline_id, which after dedup correctly fails the visibility check (the inherited row is no longer visible by id once the owned twin exists).test_cascade_count_target_aware— no longer seeds owned steps with step_numbers that collide with inherited ones. The comment spells out why this scenario isn't reachable from the UI after dedup.New tests
crates/rewind-replay/src/lib.rs—get_full_timeline_steps_dedupes_owned_over_inheritedpins the unit invariant.crates/rewind-web/tests/recording_api_tests.rs—test_steps_endpoint_dedupes_owned_over_inherited_after_promotepins the API surface (mirrors the dev1 screenshot-bug repro).Test plan
cargo test -p rewind-replay --lib→ 14/14 pass (was 13; +1 dedup test)cargo test -p rewind-web --test recording_api_tests→ 68/68 pass (was 68 minus 2 broken + 1 new + 2 adjusted)cargo build -p rewind-web→ cleanOut of scope (existing follow-ups)
get_full_timeline_steps(only walks one parent level → grandchild forks lose grandparent steps). This dedup applies once-per-level; the recursive walk would extend it.StoreEvent::StepUpdatedso other tabs refresh on edit.Made with Cursor