[CELEBORN-1577][FOLLOWUP] Fix backward compatiblity issue with interrupt shuffle#3675
[CELEBORN-1577][FOLLOWUP] Fix backward compatiblity issue with interrupt shuffle#3675s0nskar wants to merge 3 commits into
Conversation
|
Using Since If we do need robust backward compatibility, a better approach would be to add an explicit capability field (e.g., — Reviewed by Claude Code |
There was a problem hiding this comment.
Pull request overview
Updates the client-side quota interrupt handling in ApplicationHeartbeater to be backward compatible with older masters that don’t populate checkQuotaResponse in HeartbeatFromApplicationResponse, preventing accidental job cancellation due to protobuf default values.
Changes:
- Gate shuffle interruption on
CheckQuotaResponse.reason.nonEmptyin addition to!isAvailable. - Avoid treating protobuf default
isAvailable=false, reason=""(from missing message field) as a real quota-exceeded signal.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@s0nskar, please rebase the latest main branch for Grafana Dashboard CI. |
|
@s0nskar, could you please help to update? |
1d92a40 to
cf8d472
Compare
|
Ack, will work on this. |
|
@s0nskar, any update? |
|
@SteNicholas / @RexXiong PTAL. Instead of adding a new field, i have made changes in the parsing of HeartbeatFromApplicationResponse. |
|
Thanks. Merged to main(v0.7.0). |
What changes were proposed in this pull request?
Fix backward compatibility issue with interrupt shuffle by checking if the reason is nonEmpty.
Why are the changes needed?
If someone uses a new client with old server which is not sending
CheckQuotaResponseinHeartbeatFromApplicationResponsethen proto uses default value to build CheckQuotaResponse with isAvailable=false and reason="". In this case the job will always fails without breaching the quota, we should not fail the job if the reason is empty to make it backward compatible.Does this PR resolve a correctness bug?
No
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested in local setup.