[Backports stable/1.0] Ensure requests are always completed #7383
Merged
Conversation
Rewrites the request timeout logic in NettyMessagingService by scheduling the request timeout to cover the complete request lifecycle: creating the connection, performing the handshake, sending the request, receiving the response, etc. Previously, the timeout was scheduled only after the handshake, meaning that if anything took too long or went wrong before that point, the method's contract was broken. The number of threads for the timeout executor was also lowered from 4 to 1, as the executor shouldn't be doing anything but failing futures (so it's not expected to be doing much). (cherry picked from commit 5b6d286)
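The idea above can be sketched in plain Java. This is a hypothetical illustration, not the actual NettyMessagingService code: the class and method names (`FullLifecycleTimeout`, `withTimeout`) are invented, but it shows the key point that the timeout is armed before any connection work begins, so connect, handshake, send, and receive all count against the same deadline, and a single thread suffices because the executor only fails futures.

```java
import java.util.concurrent.*;

// Hypothetical sketch: arm the timeout before any connection work starts,
// so connect + handshake + send + receive all count against one deadline.
class FullLifecycleTimeout {
  // One thread is enough: it does nothing but complete futures exceptionally.
  static final ScheduledExecutorService TIMEOUT_EXECUTOR =
      Executors.newSingleThreadScheduledExecutor(r -> {
        final Thread t = new Thread(r, "timeout-scheduler");
        t.setDaemon(true); // don't keep the JVM alive just for timeouts
        return t;
      });

  static <T> CompletableFuture<T> withTimeout(
      final CompletableFuture<T> response, final long timeoutMs) {
    final ScheduledFuture<?> timeout = TIMEOUT_EXECUTOR.schedule(
        () -> response.completeExceptionally(
            new TimeoutException("request timed out after " + timeoutMs + "ms")),
        timeoutMs, TimeUnit.MILLISECONDS);
    // Cancel the scheduled task once the request completes either way.
    response.whenComplete((ok, err) -> timeout.cancel(false));
    return response;
  }
}
```

A caller would create the response future, wrap it with `withTimeout`, and only then start connecting, so a hung handshake still fails the future on time.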
Simplifies local and remote client connections by dropping the callback construct in favor of a simple map of message ID -> response future. The futures are returned already set up to clean up after themselves. Additionally, when the connection is closed mid-flight, the returned exception now has a slightly improved error message (though it's still far from ideal). (cherry picked from commit 8713f4c)
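A minimal sketch of the "map of message ID -> response future" construct, with invented names (`PendingRequests`, `register`, `complete`) rather than the real connection classes. The key detail is that each future removes its own map entry on completion, whether it completed normally or exceptionally, so no separate cleanup callback is needed.

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: pending requests tracked as message ID -> future,
// where each future cleans its own entry out of the map on completion.
class PendingRequests {
  private final Map<Long, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();
  private final AtomicLong idGenerator = new AtomicLong();

  CompletableFuture<byte[]> register() {
    final long id = idGenerator.incrementAndGet();
    final CompletableFuture<byte[]> future = new CompletableFuture<>();
    pending.put(id, future);
    // Self-cleaning: drop the entry whether it completed normally or not.
    future.whenComplete((ok, err) -> pending.remove(id));
    return future;
  }

  boolean complete(final long id, final byte[] response) {
    final CompletableFuture<byte[]> future = pending.get(id);
    return future != null && future.complete(response);
  }

  int pendingCount() {
    return pending.size();
  }
}
```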
Ensure we report the channel close event by failing the future if the future has not yet been completed. Previously, if the future was not completed (e.g. the channel was closed mid-way through the handshake), we were not notified of the channel closing, and the future remained open forever. (cherry picked from commit 3c2772f)
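The close-handling fix can be sketched as follows; the class and method names are hypothetical, and the pending map stands in for the per-connection state described above. Since `completeExceptionally` is a no-op on already-finished futures, it's safe to fail every tracked future unconditionally on close.

```java
import java.net.ConnectException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: when the channel closes, fail every still-pending
// future so callers are notified instead of waiting forever.
class ChannelCloseHandler {
  static void failPendingOnClose(
      final Map<Long, CompletableFuture<byte[]>> pending, final String remoteAddress) {
    final ConnectException cause =
        new ConnectException("connection to " + remoteAddress + " closed mid-request");
    // completeExceptionally is a no-op on futures that already completed.
    pending.values().forEach(future -> future.completeExceptionally(cause));
  }
}
```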
Zelldon approved these changes on Jun 28, 2021.
bors r+
The quiet and timeout periods used when shutting down the service are now configurable. This means the messaging service test now takes 10 seconds instead of 40, as previously every service had to wait at least 2 seconds to shut down, and we spawned a couple of them. (cherry picked from commit 2a10b0f)
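The quiet-period/timeout pattern here mirrors the semantics of Netty's `shutdownGracefully(quietPeriod, timeout, unit)`. Below is a hypothetical stdlib-only sketch of the same idea (`ConfigurableShutdown` is an invented name): wait up to a configurable quiet period for in-flight work to drain, then force shutdown and wait up to the overall timeout.

```java
import java.util.concurrent.*;

// Hypothetical sketch of configurable quiet/timeout shutdown periods:
// small values make tests fast; larger values are gentler in production.
class ConfigurableShutdown {
  static boolean shutdown(
      final ExecutorService service, final long quietMs, final long timeoutMs) {
    service.shutdown(); // stop accepting new tasks
    try {
      if (service.awaitTermination(quietMs, TimeUnit.MILLISECONDS)) {
        return true; // quiesced within the quiet period
      }
      service.shutdownNow(); // interrupt whatever is still running
      return service.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      service.shutdownNow();
      return false;
    }
  }
}
```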
npepinpe force-pushed the backport-7362-to-stable/1.0 branch from ea5e611 to b714573 on June 28, 2021 at 07:35.
Canceled.
Some linting/formatting issues 😅
bors retry
ghost pushed a commit that referenced this pull request on Jun 28, 2021.
7383: [Backports stable/1.0] Ensure requests are always completed r=Zelldon a=npepinpe

## Description

This PR backports #7362. The only conflicts were with the PartitionCommandSenderImpl, where we still use the Atomix object directly in 1.0.x, but have replaced it with using the communication service directly in 1.1.x.

## Related issues

relates to #7360

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Build failed.
Removes usages of sendAndReceive where the timeout is null. This means adding the unicast method back to the ClusterCommunicationService (so callers that don't care about a reply can use unicast), and making sure previous callers now use the previous maximum timeout (5 seconds). We could consider making these timeouts configurable, but in general when using a request-reply pattern, we should ALWAYS set an explicit timeout! (cherry picked from commit 1726194)
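The "no more null timeouts" rule can be sketched as a tiny fallback helper. This is an illustration, not the actual code: the class name `Timeouts` and method `orDefault` are invented, but the 5-second default matches the previous maximum timeout mentioned above.

```java
import java.time.Duration;

// Hypothetical sketch: callers that previously passed a null timeout now
// fall back to an explicit default matching the old 5-second maximum.
class Timeouts {
  static final Duration DEFAULT_REQUEST_TIMEOUT = Duration.ofSeconds(5);

  static Duration orDefault(final Duration requested) {
    // null used to mean "no timeout"; always resolve to an explicit one
    return requested == null ? DEFAULT_REQUEST_TIMEOUT : requested;
  }
}
```

Fire-and-forget callers, by contrast, switch to unicast and skip the timeout entirely, since there is no reply to wait for.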
As the PartitionCommandSenderImpl doesn't await a reply, use a reliable (TCP) unicast, i.e. send the message and don't wait for the reply. This removes a usage of a "null" timeout, but generally also makes more sense since we don't care about the reply here. (cherry picked from commit ac67e89)
npepinpe force-pushed the backport-7362-to-stable/1.0 branch from b714573 to 11e1532 on June 28, 2021 at 08:13.
bors retry
Build succeeded.
This pull request was closed.
Description
This PR backports #7362. The only conflicts were with the PartitionCommandSenderImpl, where we still use the Atomix object directly in 1.0.x, but have replaced it with using the communication service directly in 1.1.x.
Related issues
relates to #7360
Definition of Done
Not all items need to be done depending on the issue and the pull request.
Code changes:
Add the `backport stable/0.25` label to the PR; in case that fails you need to create backports manually.
Testing:
Documentation: