Merge routed experts deltas with start offsets #37
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit af649b5. Configure here.
    data.extend_from_slice(&other.data);

    Ok(Self {
        start: self.start,
Merge ignores decode start offset when computing rows to skip
High Severity
`merge_routed_experts_in_json` always calls `decode.suffix_rows(prompt.seq_len)` regardless of `decode.start`. When the decode payload is a delta-only transfer (with `start > 0`), the decode data doesn't contain the prompt rows, yet the code unconditionally tries to strip `prompt.seq_len` rows. This will either error out (if `prompt.seq_len > decode.seq_len`) or incorrectly remove completion tokens. The rows-to-skip calculation needs to account for `decode.start`, e.g. `(prompt.start + prompt.seq_len).saturating_sub(decode.start)`.
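The suggested calculation can be sketched as a small standalone function. This is only an illustration of the arithmetic in the comment above, not the code in the PR: `rows_to_skip` and its parameter names are hypothetical, standing in for `prompt.start`, `prompt.seq_len`, and `decode.start`.

```rust
/// Hypothetical helper (not part of the PR): number of leading decode rows
/// that overlap rows already covered by the prompt (prefill) payload.
///
/// `prompt_start` and `prompt_len` describe the prompt payload;
/// `decode_start` is the offset of the first decode row in the full sequence.
fn rows_to_skip(prompt_start: usize, prompt_len: usize, decode_start: usize) -> usize {
    // Index of the first row that the prompt payload does NOT cover.
    let prompt_end = prompt_start + prompt_len;
    // If decode begins at or after prompt_end there is no overlap;
    // saturating_sub clamps the result to 0 instead of underflowing.
    prompt_end.saturating_sub(decode_start)
}
```

With a 4-row prompt starting at 0, a full decode payload (`decode_start = 0`) skips 4 rows, a partially overlapping one (`decode_start = 2`) skips 2, and a delta-only payload (`decode_start = 4`) skips none. Note that a decode payload starting past the end of the prompt also yields 0 here even though rows would be missing in between; a caller may want to treat that case as an error.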
Approvability
Verdict: Needs human review
This PR has an unresolved high-severity bug: the merge logic doesn't account for the new …


Summary:
Note
Add `start` offset field to routed experts delta encoding and decoding
- Adds a `start: usize` field to `RoutedExpertsPayload` in routed_experts_merge.rs to track the row offset of a payload within a sequence.
- `suffix_rows` now increments `start` by the number of removed rows; `concat_rows` preserves `start` from the left-hand operand.
- `encode_routed_experts_payload` now includes `start` in the output JSON; `decode_routed_experts_value` now requires and parses a numeric `start` field.
- Payloads without a `start` field will now fail with a parse error.

Macroscope summarized af649b5.
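The `start` bookkeeping described in this summary can be illustrated with a minimal sketch. Only the names `RoutedExpertsPayload`, `start`, `data`, `suffix_rows`, and `concat_rows` come from the PR; the `seq_len` and `row_width` fields, the `u32` element type, and the `String` error type are assumptions made to keep the example self-contained, and the real payload carries a `shape` that may differ.

```rust
#[derive(Debug, Clone, PartialEq)]
struct RoutedExpertsPayload {
    start: usize,     // row offset of the first row within the full sequence
    seq_len: usize,   // number of rows held in `data` (assumed field)
    row_width: usize, // values per row (assumed flattening of the real shape)
    data: Vec<u32>,   // row-major values (element type is an assumption)
}

impl RoutedExpertsPayload {
    /// Drop the first `skip` rows and advance `start` by the same amount.
    fn suffix_rows(&self, skip: usize) -> Result<Self, String> {
        if skip > self.seq_len {
            return Err(format!("cannot skip {} rows of {}", skip, self.seq_len));
        }
        Ok(Self {
            start: self.start + skip,
            seq_len: self.seq_len - skip,
            row_width: self.row_width,
            data: self.data[skip * self.row_width..].to_vec(),
        })
    }

    /// Append `other`'s rows; the result keeps the left-hand `start`.
    fn concat_rows(&self, other: &Self) -> Result<Self, String> {
        if self.row_width != other.row_width {
            return Err("row width mismatch".to_string());
        }
        let mut data = self.data.clone();
        data.extend_from_slice(&other.data);
        Ok(Self {
            start: self.start,
            seq_len: self.seq_len + other.seq_len,
            row_width: self.row_width,
            data,
        })
    }
}
```

For example, stripping two overlapping rows from a three-row decode payload that starts at 0 leaves a one-row payload with `start` equal to 2, and concatenating it onto a two-row prefill payload gives three rows with `start` equal to 0.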
Note
Medium Risk
Wire-format change requires `start` on all routed_experts blobs; older prefill/decode payloads without it will break merge until backends align.

Overview
Adds a `start` row offset to compact routed-experts sidecars so prefill/decode deltas can be merged correctly in the vLLM P/D path.

`RoutedExpertsPayload` now tracks where its rows sit in the full sequence: `suffix_rows` bumps `start` when dropping overlapping prompt rows from decode, and `concat_rows` keeps the prefill `start` on the merged result. The JSON sidecar becomes `{ data, shape, start }`; decode/encode require `start`, and merged responses emit it.

Compatibility: payloads without `start` will fail decode until producers are updated.