Skip to content

Merge routed experts deltas with start offsets#37

Merged
S1ro1 merged 1 commit into
mainfrom
feat/routed-experts-delta-replay
May 27, 2026
Merged

Merge routed experts deltas with start offsets#37
S1ro1 merged 1 commit into
mainfrom
feat/routed-experts-delta-replay

Conversation

@S1ro1
Copy link
Copy Markdown

@S1ro1 S1ro1 commented May 25, 2026

Summary:

  • Carry routed-experts start through the compact sidecar payload.
  • Decode and encode {data, shape, start} in the P/D router path.
  • Merge prefill/decode routed experts offset-aware while preserving delta-only transfer.

Note

Add start offset field to routed experts delta encoding and decoding

  • Adds a start: usize field to RoutedExpertsPayload in routed_experts_merge.rs to track the row offset of a payload within a sequence.
  • suffix_rows now increments start by the number of removed rows; concat_rows preserves start from the left-hand operand.
  • encode_routed_experts_payload now includes start in the output JSON; decode_routed_experts_value now requires and parses a numeric start field.
  • Risk: decoding existing routed experts JSON without a start field will now fail with a parse error.

Macroscope summarized af649b5.


Note

Medium Risk
Wire-format change requires start on all routed_experts blobs; older prefill/decode payloads without it will break merge until backends align.

Overview
Adds a start row offset to compact routed-experts sidecars so prefill/decode deltas can be merged correctly in the vLLM P/D path.

RoutedExpertsPayload now tracks where its rows sit in the full sequence: suffix_rows bumps start when dropping overlapping prompt rows from decode, and concat_rows keeps the prefill start on the merged result. The JSON sidecar becomes { data, shape, start }—decode/encode require start, and merged responses emit it.

Compatibility: payloads without start will fail decode until producers are updated.

Reviewed by Cursor Bugbot for commit af649b5. Bugbot is set up for automated code reviews on this repo. Configure here.

@S1ro1 S1ro1 marked this pull request as ready for review May 27, 2026 23:22
@S1ro1 S1ro1 merged commit 4249247 into main May 27, 2026
8 of 9 checks passed
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit af649b5. Configure here.

data.extend_from_slice(&other.data);

Ok(Self {
start: self.start,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge ignores decode start offset when computing rows to skip

High Severity

merge_routed_experts_in_json always calls decode.suffix_rows(prompt.seq_len) regardless of decode.start. When the decode payload is a delta-only transfer (with start > 0), the decode data doesn't contain the prompt rows, yet the code unconditionally tries to strip prompt.seq_len rows. This will either error out (if prompt.seq_len > decode.seq_len) or incorrectly remove completion tokens. The rows-to-skip calculation needs to account for decode.start, e.g. (prompt.start + prompt.seq_len).saturating_sub(decode.start).

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit af649b5. Configure here.

@macroscopeapp
Copy link
Copy Markdown

macroscopeapp Bot commented May 27, 2026

Approvability

Verdict: Needs human review

This PR has an unresolved high-severity bug: the merge logic doesn't account for the new start offset when computing rows to skip, which would cause errors or incorrect token removal for delta transfers. Human review is needed to address this implementation issue.

You can customize Macroscope's approvability policy. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants