Skip to content

fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps#162

Merged
risjai merged 2 commits into
masterfrom
fix/dedup-inherited-vs-owned-steps
Apr 29, 2026
Merged

fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps#162
risjai merged 2 commits into
masterfrom
fix/dedup-inherited-vs-owned-steps

Conversation

@risjai
Copy link
Copy Markdown
Collaborator

@risjai risjai commented Apr 29, 2026

Summary

After v0.14.6's promote-and-mutate landed, the dashboard's step picker showed two rows at the same step_number for forks where an inherited step had been promoted — and the inherited row visually masked the user's edit.

Caught live in dev1 with session ray-agent-30053072 on 2026-04-29:

  • User edited inherited step Add MCP server for AI assistant integration #3's request_body (temperature: 0 -> 0.7)
  • API correctly stored a new owned step on the fork (different step_id, edited content)
  • Dashboard kept showing temperature: 0 because the inherited row was still in the union view returned by engine.get_full_timeline_steps

Fix

get_full_timeline_steps now de-duplicates by step_number, with owned overriding inherited. HashMap-based: insert parents first, then own steps; HashMap::insert returning the previous value gives "owned wins" for free.

let mut by_step_number: HashMap<u32, Step> = parent_steps.into_iter()
    .filter(|s| s.step_number <= fork_at)
    .map(|s| (s.step_number, s))
    .collect();
for s in own_steps {
    by_step_number.insert(s.step_number, s);
}

After the fix, the fork's view returns one row at #N — the owned one if it exists, else the inherited one.

Pre-existing test adjustments

Two tests needed updates because they exercised a scenario the dashboard would never produce after dedup:

  • test_patch_promote_idempotent_re_edit — the second edit now PATCHes the resolved owned id directly (matches the dashboard flow: refetch the list, click pencil on the now-visible owned row). Previously the test PATCHed the inherited id twice with target_timeline_id, which after dedup correctly fails the visibility check (the inherited row is no longer visible by id once the owned twin exists).
  • test_cascade_count_target_aware — no longer seeds owned steps with step_numbers that collide with inherited ones. The comment spells out why this scenario isn't reachable from the UI after dedup.

New tests

Test plan

  • cargo test -p rewind-replay --lib → 14/14 pass (was 13; +1 dedup test)
  • cargo test -p rewind-web --test recording_api_tests → 68/68 pass (was 68 minus 2 broken + 1 new + 2 adjusted)
  • cargo build -p rewind-web → clean
  • Track 1 version bump 0.14.6 → 0.14.7 per CLAUDE.md (workspace + 2x CLI_VERSION); rewind-mcp pyproject 0.13.5 → 0.13.6

Out of scope (existing follow-ups)

  • Recursive get_full_timeline_steps (only walks one parent level → grandchild forks lose grandparent steps). This dedup applies once-per-level; the recursive walk would extend it.
  • WS forwarding for StoreEvent::StepUpdated so other tabs refresh on edit.

Made with Cursor

When a fork OWNS a step at the same step_number as an inherited
(parent) step — e.g. after a promote-and-mutate PATCH /steps/{id}/edit
with target_timeline_id=fork — the union view returned BOTH rows.
The dashboard's step picker then showed two #N entries on the fork
and the inherited row visually masked the user's edit.

Caught live in dev1 with session ray-agent-30053072 on 2026-04-29:
the user edited an inherited step's request_body (temperature 0 -> 0.7)
and the API stored it correctly, but the picker kept showing the
inherited row (temperature 0). The owned row with the edit existed
at a different step_id but never surfaced in the UI.

Fix the union view to dedupe by step_number, owned overrides inherited.
HashMap-based: insert parents first, then own steps — `HashMap::insert`
returning the previous value gives us "owned wins" for free, then sort
by step_number and return.

Two pre-existing tests had to be adjusted because they exercised a
scenario the dashboard would never produce after this fix:
- test_patch_promote_idempotent_re_edit: now PATCHes the resolved
  owned id directly for the second edit (matches dashboard flow:
  refetch list, click pencil on the now-visible owned row).
- test_cascade_count_target_aware: no longer seeds owned steps that
  collide with inherited step_numbers; comment spells out why the
  collision case isn't reachable from the UI after dedup.

New tests:
- rewind-replay::tests::get_full_timeline_steps_dedupes_owned_over_inherited
  pins the unit invariant.
- rewind-web::recording_api_tests::test_steps_endpoint_dedupes_owned_over_inherited_after_promote
  pins the API surface (the actual screenshot-bug repro from dev1).

Counts: 68/68 web integration tests, 14/14 rewind-replay unit tests.
Track 1 version bump 0.14.6 -> 0.14.7 per CLAUDE.md.

Made-with: Cursor
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
rewind Ready Ready Preview, Comment Apr 29, 2026 10:06am

Copy link
Copy Markdown
Collaborator Author

@risjai risjai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix(replay): dedupe inherited vs owned steps in get_full_timeline_steps

Clean fix for the exact bug reported in ray-agent-30053072. The root cause is correctly identified: after promote-and-mutate, the union view returned both the inherited and owned rows at the same step_number, so the dashboard showed stale data.

Findings

# Severity Finding
1 Info Test adjustment rationale is sound
2 Info Dedup correctly cascades to visibility checks
3 Suggestion HashMap could use with_capacity

I1 — Test adjustments are correct and well-documented

Both adjusted tests have inline comments explaining why the old scenario is unreachable after dedup:

  • test_patch_promote_idempotent_re_edit — second edit now targets the resolved owned id directly (matches dashboard flow: refetch → click pencil on owned row)
  • test_cascade_count_target_aware — removes owned-step seeding at colliding step_numbers

The before/after logic is sound: once owned wins over inherited at the same step_number, the old "PATCH inherited id twice" test was testing a flow the dashboard can't produce.

I2 — Dedup cascades correctly to both visibility checks

The patch_step handler's visibility check (s.id == step.id) and the cascade_count handler's visibility check both call get_full_timeline_steps. After dedup:

  • First promote of an inherited step: inherited id IS visible (no owned twin yet) → 200
  • Second edit after promote: inherited id is hidden (owned twin exists) → would 400 if you pass the inherited id → correct, because the dashboard now shows the owned step's id
  • The owned step's id is always visible → in-place PATCH without target_timeline_id works

This is the right behavior: the dedup makes the visibility check stricter in a way that matches the UI's display.

S3 — HashMap capacity hint

Minor optimization: since we know the upper bound is fork_at + own_steps.len(), HashMap::with_capacity would avoid rehash for large sessions:

let mut by_step_number: HashMap<u32, Step> = HashMap::with_capacity(fork_at as usize + own_steps.len());

Not blocking.

What I verified

  • The HashMap approach is correct: insert parents first → insert own → HashMap::insert overwrites parent at same key → owned wins
  • Sort after dedup preserves step_number ordering
  • The filter(|s| s.step_number <= fork_at) still correctly scopes inherited steps to the fork boundary
  • Own steps are inserted WITHOUT the fork_at filter, so owned steps at any step_number (including below fork_at) correctly override inherited ones — this is the promote case
  • New unit test and integration test both pin the invariant from the right angle (unit tests dedup directly, integration test goes through the API endpoint)

Verdict

Ship it. This is the missing piece from the ray-agent-30053072 repro — the edit was landing correctly on the server but the view was masking it with the stale inherited row. Version bump is correct (0.14.6 released → 0.14.7).

Apply the only actionable item from PR #162's review: pre-size the
HashMap to fork_at + own_steps.len() so long sessions don't rehash
during the union-view dedup. Optional micro-optimization, no
behavior change.

Made-with: Cursor
@risjai
Copy link
Copy Markdown
Collaborator Author

risjai commented Apr 29, 2026

Folded the only actionable bit from your review (S3, HashMap with_capacity) in [b8997ed-ish commit\u2014see latest commit on the branch`]. I1 and I2 were info-only confirmations.

CI was already green except for build (macos-latest) which was still running when I left it; the new commit will re-trigger. Will merge once green and then drive the dev1 swap + Run-replay end-to-end against ray-agent-30053072 to close the original bug repro.

@risjai risjai merged commit 1c00f1e into master Apr 29, 2026
7 checks passed
@risjai risjai deleted the fix/dedup-inherited-vs-owned-steps branch April 29, 2026 10:11
risjai added a commit that referenced this pull request May 3, 2026
…r inherited prefix (#164)

* fix(replay): seed step_counters on fork so replay steps continue after inherited prefix

Bug A — engine.fork() created the new timeline row but didn't
seed its step_counters entry. The runner's first replayed step
landed at step_number=1, colliding with steps inherited from the
parent. Combined with the owned-over-inherited dedup added in
PR #162, this shadowed the original turn-1..N steps with the
agent's NEW turn-(N+1) work and put the inherited turn-N user
message at the END of the timeline — sorting the user's edited
question AFTER the agent's response.

Live repro on dev1 session ray-agent-a18ac577 (2026-05-03):
clicking Run replay on edited-fork at step 6 produced owned
steps numbered 1..5 + an inherited step 6, displayed in that
order. Operators saw "nothing after the LLM step" because the
agent's response was already above the question they'd edited.

Fix: seed step_counters[fork_id] = fork_at_step right after
create_timeline. Next runner-recorded step gets fork_at_step+1,
chronological order is preserved.

Verified end-to-end through the dashboard:
- Pre-fix: replay-64e61b6c shape was [1,2,3,4,5,6_inherited]
- Post-fix: replay-d4c0d5b9 / replay-f0c54d86 shape is
  [6_inherited, 7,8,9,10,11] — matches the agent's actual
  conversation flow.

Existing broken forks repaired via scripts/repair-fork-step-
numbering.py (21 forks, 50 steps renumbered on dev1). Script
is idempotent and only touches `replay-*` forks (auto-generated
runner replay forks); user-created forks like edited-fork or
fork-at-N are left alone since their owned steps at step_number
≤ fork_at_step are intentional promote-and-mutate edits.

Tests:
- fork_seeds_step_counter_so_replay_steps_continue_after_inherited_prefix
- fork_at_step_1_seeds_counter_to_1_so_first_replay_step_is_2
- fork_does_not_disturb_existing_step_counters_on_other_timelines

Version bump (Track 1 + Track 2 per CLAUDE.md):
- Cargo workspace: 0.14.8 -> 0.14.9
- python-mcp/pyproject: 0.13.7 -> 0.13.8
- CLI_VERSION (rewind_cli, rewind_mcp_cli): 0.14.8 -> 0.14.9
- python/rewind-agent SDK: 0.15.2 -> 0.15.3

Co-authored-by: Cursor <cursoragent@cursor.com>

* review #164: fix repair-script collision, atomic fork seeding, fix CI

P1 (repair script): the SQL renumber was selecting only owned steps
with step_number <= fork_at_step, then shifting each by +fork_at_step.
That was wrong when the runner overran the fork point — a fork@5 with
M=8 recorded iterations has owned steps [1..8]; shifting only [1..5]
leaves [6..8] in place, and the renumbered early steps land on top of
them ([1->6, 2->7, 3->8] all collide). Since `steps` has no UNIQUE on
(timeline_id, step_number), these UPDATEs would silently create
duplicates and HashMap::insert in get_full_timeline_steps would
arbitrarily drop one of each pair.

Fix: shift EVERY owned step on a flagged fork (not just ≤ fork_at_step)
and process in DESCENDING step_number order. Each UPDATE then targets
a step_number that doesn't yet exist among un-shifted rows, even if a
UNIQUE constraint is added later. Idempotent: a second run finds
nothing to fix because all owned steps now sit at step_number > fork_at_step.

P2 (atomicity): the previous fork() called create_timeline followed by
sync_step_counter as two separate writes on the same connection. If
the second one failed (e.g. disk-full between the two calls), the
fork existed without a seeded counter — the exact broken state this
PR fixes. Wrap both into a single transaction via a new
`Store::create_timeline_with_seeded_counter` helper.

Nit: trimmed the 18-line doc comment on `fork()` to 3 lines and
moved the rationale into the regression test (which is where future
maintainers will read it anyway).

CI fix: `test_patch_promote_main_protection_follows_target` was
implicitly relying on `.last()` returning an inherited step (the
buggy fork numbering put inherited steps after owned ones). After
this PR's fix the fork's `.last()` is the fork-OWNED step at
step_number=fork_at_step+1, and that step isn't visible on main, so
the promote-and-mutate visibility check 400s. Pin the test to
explicitly select step_number=2 (the inherited one) so it tests
main-protection regardless of the step-counter behavior.

Also added a `main_edits_env_lock()` mutex shared by the two tests
that mutate REWIND_ALLOW_MAIN_EDITS — previously they raced under
parallel `cargo test`, surfaced once both tests do real PATCHes
against main.

296 tests across rewind-web pass, 16 in rewind-replay pass, full
workspace green.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant