ARTEMIS-3337: Correctly handle multiple connection failures by meierhofer08 · Pull Request #3613 · apache/artemis

meierhofer08 · 2021-06-07T14:57:16Z

Previously, if one session could not be transferred to the new connection
during reconnect, we returned immediately and did not execute failover
for the other sessions. As a result, the channels of the sessions for
which no failover was executed were still present on the old connection.
When the old connection was then destroyed, these channels were closed
although the reconnect was still ongoing, which led to "dead" sessions.

Now, we always execute failover for every session so that the channels
are guaranteed to be removed from the old connection before it is destroyed.
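
A minimal sketch of the problem described above, as a toy model rather than the actual Artemis client code. The classes `Connection`, `Channel` and `Session` and the method `deadSessionsAfterReconnect` are hypothetical stand-ins; only the names `handleFailover` and `transferConnection` are borrowed from this discussion. It assumes that destroying a connection closes every channel still registered on it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these classes are hypothetical stand-ins, not the Artemis client classes.
class EarlyReturnSketch {

    /** A connection that closes every channel still registered on it when destroyed. */
    static class Connection {
        final Set<Channel> channels = new LinkedHashSet<>();

        void destroy() {
            for (Channel channel : channels) {
                channel.closed = true;
            }
            channels.clear();
        }
    }

    static class Channel {
        Connection connection;
        boolean closed;

        Channel(Connection connection) {
            this.connection = connection;
            connection.channels.add(this);
        }

        /** Moves this channel from its current connection to the new one. */
        void transferConnection(Connection newConnection) {
            connection.channels.remove(this);
            newConnection.channels.add(this);
            connection = newConnection;
        }
    }

    static class Session {
        final Channel channel;
        final boolean reattachFails; // simulates a failing server-side reattach

        Session(Connection connection, boolean reattachFails) {
            this.channel = new Channel(connection);
            this.reattachFails = reattachFails;
        }

        /** Transfers the channel first, then reports whether the simulated reattach worked. */
        boolean handleFailover(Connection newConnection) {
            channel.transferConnection(newConnection);
            return !reattachFails;
        }
    }

    /** Returns how many sessions have a closed channel after the old connection is destroyed. */
    static int deadSessionsAfterReconnect(int sessionCount, int failingIndex, boolean returnOnFirstFailure) {
        Connection oldConnection = new Connection();
        Connection newConnection = new Connection();
        List<Session> sessions = new ArrayList<>();
        for (int i = 0; i < sessionCount; i++) {
            sessions.add(new Session(oldConnection, i == failingIndex));
        }

        for (Session session : sessions) {
            boolean ok = session.handleFailover(newConnection);
            if (!ok && returnOnFirstFailure) {
                break; // old behaviour: remaining sessions keep their channels on the old connection
            }
        }

        oldConnection.destroy();

        int dead = 0;
        for (Session session : sessions) {
            if (session.channel.closed) {
                dead++;
            }
        }
        return dead;
    }

    public static void main(String[] args) {
        System.out.println("early return:  " + deadSessionsAfterReconnect(3, 0, true));
        System.out.println("every session: " + deadSessionsAfterReconnect(3, 0, false));
    }
}
```

In this model, the sessions that come after the failing one are the ones that end up "dead" when the loop stops early, because their channels never leave the old connection.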

brusdev · 2021-06-08T09:45:01Z

@meierhofer08 good catch. Why not just detach the sessions from the old connection in preHandleFailover, avoiding a useless call to handleFailover?

meierhofer08 · 2021-06-08T10:24:00Z

Hmm, good question. I didn't want to change the existing code too much, to avoid any additional side effects.
In handleFailover, "ActiveMQSessionContext.reattachOnNewConnection" executes

this.remotingConnection = newConnection; and
sessionChannel.transferConnection((CoreRemotingConnection) newConnection);

before sending any packets to the broker. I wanted to ensure both are executed for every session, to stay as close as possible to the original behaviour of the code (session.handleFailover was called for every session in the original HornetQ donation).

Detaching the sessions in preHandleFailover is not so easy: "transferConnection" cannot be called because there is no new connection yet at that point. I'm also not sure whether just detaching from the old connection is fine. There is no method for this (yet), and the current transferConnection sets "transferring = true"; I'm not sure whether I can set this when only removing the old connection.

I think the best compromise that shouldn't create issues would be to call session.getSessionContext().getSessionChannel().transferConnection() for the remaining sessions after one session's handleFailover() fails, or to create something like a "clientHandleFailover" that wraps this and doesn't make any server requests for the remaining sessions.

meierhofer08 · 2021-06-09T09:43:53Z

I changed it now to only execute "client-side" failover for the remaining sessions if a previous session failed to do correct failover.

Previously, if one session could not be transferred to the new connection during reconnect, we returned immediately and did not execute failover for the other sessions. As a result, the channels of the sessions for which no failover was executed were still present on the old connection. When the old connection was then destroyed, these channels were closed although the reconnect was still ongoing, which led to "dead" sessions. Now, if a session's failover fails, only the "client-side" part of failover is executed for the remaining sessions. This removes them from the old connection, so they are not closed when the old connection is closed afterwards.
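
A rough sketch of the adapted approach, under stated assumptions and not taken from the merged commit. `Connection`, `Session` and `failoverAll` are hypothetical; `clientHandleFailover` is only the name floated earlier in this thread, and the real method may be named and structured differently. The `serverRequests` counter is an illustrative stand-in for the requests a full failover would send to the broker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, not the Artemis client implementation.
class ClientSideFailoverSketch {

    static class Connection {
        final List<Session> attached = new ArrayList<>();
    }

    static class Session {
        Connection connection;
        final boolean reattachFails; // simulates a failing server-side reattach
        int serverRequests;          // counts simulated requests to the broker

        Session(Connection connection, boolean reattachFails) {
            this.connection = connection;
            this.reattachFails = reattachFails;
            connection.attached.add(this);
        }

        /** Client-side part only: detach from the old connection, attach to the new one. */
        void clientHandleFailover(Connection newConnection) {
            connection.attached.remove(this);
            newConnection.attached.add(this);
            connection = newConnection;
        }

        /** Full failover: client-side transfer followed by one simulated server request. */
        boolean handleFailover(Connection newConnection) {
            clientHandleFailover(newConnection);
            serverRequests++;
            return !reattachFails;
        }
    }

    /** Runs full failover until one session fails, then client-side failover only. */
    static boolean failoverAll(List<Session> sessions, Connection newConnection) {
        boolean failed = false;
        for (Session session : sessions) {
            if (failed) {
                session.clientHandleFailover(newConnection);
            } else if (!session.handleFailover(newConnection)) {
                failed = true;
            }
        }
        return !failed;
    }

    public static void main(String[] args) {
        Connection oldConnection = new Connection();
        Connection newConnection = new Connection();
        List<Session> sessions = Arrays.asList(
            new Session(oldConnection, false),
            new Session(oldConnection, true),
            new Session(oldConnection, false));

        boolean ok = failoverAll(sessions, newConnection);
        System.out.println("failover ok: " + ok);
        System.out.println("sessions left on old connection: " + oldConnection.attached.size());
    }
}
```

The point of the split is that every session leaves the old connection regardless of the outcome, while sessions after the first failure make no further server requests.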

meierhofer08 · 2021-06-14T12:40:33Z

@brusdev Are you ok with the adapted solution and if yes, could you merge this PR?

brusdev · 2021-06-14T13:11:11Z

@meierhofer08 thanks for your contribution, it LGTM. I executed the test suite and didn't see any regressions. Could you possibly add a test to validate this fix and guard against regressions in the future?

brusdev · 2021-06-14T20:35:58Z

I have created PR #3623 to add a test. @meierhofer08 thanks for your contribution.

meierhofer08 · 2021-06-15T07:41:54Z

OK, thank you. Writing a test would probably have taken me longer.

meierhofer08 force-pushed the main branch from b41108e to f54bdd2 Compare June 9, 2021 09:41

meierhofer08 force-pushed the main branch from f54bdd2 to 10d19e9 Compare June 9, 2021 11:55

clebertsuconic merged commit 3b1f6ee into apache:main Jun 14, 2021
