[Backports stable/1.0] Ensure requests are always completed #7383

Merged
6 commits merged into stable/1.0 from backport-7362-to-stable/1.0
Jun 28, 2021

Conversation

@npepinpe npepinpe commented Jun 28, 2021

Description

This PR backports #7362. The only conflicts were with the PartitionCommandSenderImpl, where we still use the Atomix object directly in 1.0.x, but have replaced it with using the communication service directly in 1.1.x.

Related issues

relates to #7360

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug, then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/0.25) to the PR; if that fails, you need to create the backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with further versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement

Rewrites the request timeout logic in NettyMessagingService by
scheduling the request timeout to include the complete request
lifecycle, including creating the connection, performing the handshake,
sending the request, receiving the response, etc. Previously, it was
scheduled only after performing the handshake, which means if anything
took too long or went wrong before, the contract of the method was
broken.

The number of threads for the timeout executor was also lowered from 4
to 1, as the executor shouldn't be doing anything but failing futures
(so it's not expected to be doing much).

(cherry picked from commit 5b6d286)
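
To illustrate the idea in the commit message above, here is a minimal sketch in plain Java. It is not the actual NettyMessagingService code: the class `RequestTimeoutSketch`, its `sendAndReceive` signature, and the `lifecycle` supplier (standing in for connect, handshake, send and receive) are all assumptions made for the example. The point it shows is that the timeout is scheduled before the lifecycle starts, so a stall in any step still completes the returned future.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not the real messaging service. */
final class RequestTimeoutSketch {
  // A single thread is enough here: the executor only fails futures.
  private final ScheduledExecutorService timeoutExecutor =
      Executors.newSingleThreadScheduledExecutor();

  /**
   * Schedules the timeout first and only then starts the lifecycle, so the timeout covers
   * every step and not just the part after the handshake.
   */
  <T> CompletableFuture<T> sendAndReceive(
      final Supplier<CompletableFuture<T>> lifecycle, final Duration timeout) {
    final CompletableFuture<T> response = new CompletableFuture<>();

    final ScheduledFuture<?> timer =
        timeoutExecutor.schedule(
            () -> {
              // No-op if the response already completed.
              response.completeExceptionally(
                  new TimeoutException("Request timed out after " + timeout));
            },
            timeout.toMillis(),
            TimeUnit.MILLISECONDS);

    // Once the response is done, for whatever reason, the timer is no longer needed.
    response.whenComplete((value, error) -> timer.cancel(false));

    // Forward the outcome of the (assumed) connect/handshake/send/receive chain.
    lifecycle
        .get()
        .whenComplete(
            (value, error) -> {
              if (error != null) {
                response.completeExceptionally(error);
              } else {
                response.complete(value);
              }
            });
    return response;
  }

  void close() {
    timeoutExecutor.shutdownNow();
  }
}
```

With this shape, a lifecycle that never completes (for example, a handshake that hangs) still fails with a `TimeoutException` once the timeout elapses.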
Simplifies local and remote client connections by dropping the callback
construct for a simple map of message ID -> response futures. The
futures are returned already set up to clean up after themselves.
Additionally, when the connection is closed in flight, the exception
returned now has a slightly improved error message (though it's still
far from ideal).

(cherry picked from commit 8713f4c)
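
A rough sketch of the "map of message ID -> response futures" approach described above, again in plain Java. The class and method names (`PendingResponsesSketch`, `register`, `onResponse`, `pending`) and the `byte[]` payload type are assumptions for the example, not the names used in the actual change. Each future removes its own map entry when it completes, whether normally, exceptionally, or by cancellation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry for illustration; not the real connection classes. */
final class PendingResponsesSketch {
  private final Map<Long, CompletableFuture<byte[]>> responseFutures =
      new ConcurrentHashMap<>();

  /** Returns a future that is already set up to remove itself from the map on completion. */
  CompletableFuture<byte[]> register(final long messageId) {
    final CompletableFuture<byte[]> future = new CompletableFuture<>();
    responseFutures.put(messageId, future);
    // Remove only this exact mapping, in case the ID was re-registered in the meantime.
    future.whenComplete((payload, error) -> responseFutures.remove(messageId, future));
    return future;
  }

  /** Completes the matching future, if any; unknown or late responses are ignored. */
  void onResponse(final long messageId, final byte[] payload) {
    final CompletableFuture<byte[]> future = responseFutures.get(messageId);
    if (future != null) {
      future.complete(payload);
    }
  }

  int pending() {
    return responseFutures.size();
  }
}
```

Because the cleanup hangs off the future itself, callers that time out or cancel a request do not need to know about the map at all.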
Ensure we report the channel close event by failing the future if the
future has not yet been completed. Previously, if the future was not
completed (e.g. the channel was closed mid-way through the handshake),
we were not notified of the channel closing, and the future remained
open forever.

(cherry picked from commit 3c2772f)
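
The fix above can be sketched without Netty as follows. Here `channelClosed` is a stand-in `CompletableFuture<Void>` for whatever close notification the channel provides, and `ChannelCloseSketch.failOnClose` is a hypothetical helper name; both are assumptions for the example. The pending future is failed on close only if nothing else has completed it yet.

```java
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.CompletableFuture;

/** Hypothetical helper for illustration; the real code listens on the channel itself. */
final class ChannelCloseSketch {
  private ChannelCloseSketch() {}

  /**
   * Ties {@code pending} to the channel's close event so that it can never stay open
   * forever, e.g. when the channel is closed midway through the handshake.
   */
  static <T> CompletableFuture<T> failOnClose(
      final CompletableFuture<T> pending, final CompletableFuture<Void> channelClosed) {
    channelClosed.whenComplete(
        (ignored, error) ->
            // completeExceptionally is a no-op if the future has already completed.
            pending.completeExceptionally(new ClosedChannelException()));
    return pending;
  }
}
```

A future that completed before the close keeps its original result; only futures that are still open are failed.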

@Zelldon Zelldon left a comment

bors r+

The quiet and timeout periods when shutting down the service
are now configurable. This means the messaging service test now
takes 10 seconds instead of 40, as previously every service had to wait
at least 2 seconds to shut down, and we spawned a couple of them.

(cherry picked from commit 2a10b0f)
@npepinpe npepinpe force-pushed the backport-7362-to-stable/1.0 branch from ea5e611 to b714573 Compare June 28, 2021 07:35
@ghost
ghost commented Jun 28, 2021

Canceled.

@npepinpe

Some linting/formatting issues 😅

@npepinpe

bors retry

ghost pushed a commit that referenced this pull request Jun 28, 2021
7383: [Backports stable/1.0] Ensure requests are always completed r=Zelldon a=npepinpe

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
@ghost
ghost commented Jun 28, 2021

Build failed:

Removes usages of sendAndReceive where the timeout is null. This means
adding back the unicast method to the ClusterCommunicationService (so
cases where we don't care about the timeout can use the unicast), and
making sure previous callers now use the previous maximum timeout (5
seconds). We could consider making these timeouts configurable, but in
general, when using a request-reply pattern, we should ALWAYS set an
explicit timeout!

(cherry picked from commit 1726194)
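
As a sketch of the rule stated above, the following plain Java example separates fire-and-forget sends from request-reply calls and rejects a null timeout outright. The class `ExplicitTimeoutSketch`, the `transport` parameters, and the `String` message type are assumptions for illustration and do not mirror the real ClusterCommunicationService signatures; the 5 second constant is taken from the commit message. It relies on `CompletableFuture.orTimeout`, which requires Java 9 or later.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical API shape for illustration only. */
final class ExplicitTimeoutSketch {
  /** The previous maximum timeout mentioned in the commit message. */
  static final Duration PREVIOUS_MAX_TIMEOUT = Duration.ofSeconds(5);

  private ExplicitTimeoutSketch() {}

  /** Fire-and-forget: there is no reply to wait for, so there is no timeout parameter. */
  static void unicast(final Consumer<String> transport, final String message) {
    transport.accept(message);
  }

  /** Request-reply: a null timeout is rejected instead of meaning "wait forever". */
  static CompletableFuture<String> sendAndReceive(
      final Function<String, CompletableFuture<String>> transport,
      final String message,
      final Duration timeout) {
    Objects.requireNonNull(timeout, "request-reply calls must set an explicit timeout");
    // orTimeout fails the future with a TimeoutException if no reply arrives in time.
    return transport.apply(message).orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
  }
}
```

Callers that previously passed null would either switch to `unicast` or pass an explicit value such as `PREVIOUS_MAX_TIMEOUT`.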
As the PartitionCommandSenderImpl doesn't await a reply, use a reliable
(TCP) unicast, i.e. send the message and don't wait for the reply. This
removes a usage of a "null" timeout, but generally also makes more
sense since we don't care about the reply here.

(cherry picked from commit ac67e89)
@npepinpe npepinpe force-pushed the backport-7362-to-stable/1.0 branch from b714573 to 11e1532 Compare June 28, 2021 08:13
@npepinpe

bors retry

@ghost
ghost commented Jun 28, 2021

Build succeeded:

@ghost ghost merged commit 5a30f74 into stable/1.0 Jun 28, 2021
@ghost ghost deleted the backport-7362-to-stable/1.0 branch June 28, 2021 08:56
This pull request was closed.