Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multistage] fix mailbox cancel race condition #10432

Merged
merged 1 commit into from Mar 16, 2023

Conversation

walterddr
Copy link
Contributor

@walterddr walterddr commented Mar 16, 2023

see #10424 for more details
this is another attempt to address the problem discovered in #10417

  • comparing to approach [multistage] Handle Stream Cancellations for Unstarted Streams #10425 this one doesn't do a fix delay instead will try to leverage the onNext sequential nature to guarantee delivery of at least 1 message, thus not causing the un-started stream issue.
  • this approach could block when onNext() is slow and thus potentially slow down the closing of an exception channel.
  • This PR decided to handle any non-GRPC-related failures by sending normal onNext error blocks before shutting down This way:
    • we treat opChain failures as a recoverable failure and pass-through the system;
    • any GRPC-related issues as non-recoverable failures and are not addressed in this PR

@@ -72,6 +75,7 @@ public void send(TransferableBlock block)
@Override
public void complete()
throws Exception {
// TODO: should wait for _mailboxContentStreamObserver.onNext() finish before calling onComplete().
Copy link
Contributor Author

@walterddr walterddr Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think currently it is guaranteed that onComplete is called after all onNext() --> implicitly via the OpChain scheduler. however we should still fix this systematically

@codecov-commenter
Copy link

codecov-commenter commented Mar 16, 2023

Codecov Report

Merging #10432 (265dc3b) into master (11b6bcd) will decrease coverage by 28.01%.
The diff coverage is 0.00%.

@@              Coverage Diff              @@
##             master   #10432       +/-   ##
=============================================
- Coverage     63.23%   35.22%   -28.01%     
+ Complexity     5068      282     -4786     
=============================================
  Files          2028     2053       +25     
  Lines        110632   111405      +773     
  Branches      16847    16940       +93     
=============================================
- Hits          69955    39240    -30715     
- Misses        35505    68753    +33248     
+ Partials       5172     3412     -1760     
Flag Coverage Δ
integration1 24.51% <0.00%> (+0.12%) ⬆️
integration2 24.30% <0.00%> (-0.01%) ⬇️
unittests1 ?
unittests2 13.83% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...apache/pinot/query/mailbox/GrpcSendingMailbox.java 0.00% <0.00%> (-81.58%) ⬇️

... and 1434 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@xiangfu0 xiangfu0 merged commit ce299cf into apache:master Mar 16, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants