Skip to content

ARTEMIS-3337: Correctly handle multiple connection failures#3613

Merged
clebertsuconic merged 1 commit intoapache:mainfrom
meierhofer08:main
Jun 14, 2021
Merged

ARTEMIS-3337: Correctly handle multiple connection failures#3613
clebertsuconic merged 1 commit intoapache:mainfrom
meierhofer08:main

Conversation

@meierhofer08
Copy link

Previously, when during reconnect one session couldn't be transferred
to the new connection, we instantly returned and didn't execute failover
for the other sessions. This produced the issue that for sessions
where no failover was executed, their channels were still present on the
old connection. When the old connection was then destroyed, these channels
were closed although the reconnect was still ongoing, which lead to
"dead" sessions.

Now, we always execute failover for every session so that the channels
are guaranteed to be removed from the old connection before it is destroyed.

@brusdev
Copy link
Member

brusdev commented Jun 8, 2021

@meierhofer08 good catch. Why not just detaching the sessions from the old connections in preHandleFailover avoiding a useless call to handleFailover?

@meierhofer08
Copy link
Author

meierhofer08 commented Jun 8, 2021

Hmm good question. I didn't want to change the existing code too much to avoid any additional side effects.
In handleFailover, the "ActiveMQSessionContext.reattachOnNewConnection" executes

  • this.remotingConnection = newConnection; and
  • sessionChannel.transferConnection((CoreRemotingConnection) newConnection);
    before sending any packages to the broker. I wanted to ensure both are executed for every session to stay as close as possible to the original behaviour of the code (session.handleFailover was called for every session in the original HornetQ donation).

Detaching the sessions in preHandleFailover is not so easy as "transferConnection" cannot be called because there is no new connection yet at that point. And I'm not sure if just detaching from the old connection is fine, there is no method for this (yet) and the current transferConnection sets "transferring = true", I'm not sure if I can set this when just removing the old connection.

I think the best compromise that shouldn't create issues would be to call session.getSessionContext().getSessionChannel().transferConnection() for the remaining sessions after 1 session's handleFailover() fails, or create something like a "clientHandleFailover" that wraps this and doesn't do any server requests for the remaining sessions.

@meierhofer08
Copy link
Author

I changed it now to only execute "client-side" failover for the remaining sessions if a previous session failed to do correct failover.

Previously, when during reconnect one session couldn't be transferred
to the new connection, we instantly returned and didn't execute failover
for the other sessions. This produced the issue that for sessions
where no failover was executed, their channels were still present on the
old connection. When the old connection was then destroyed, these channels
were closed although the reconnect was still ongoing, which lead to
"dead" sessions.

Now, if a session failover fails, for the remaining sessions the "client-side" part
of failover is executed, which removes the sessions from the old connection so that
they are not closed when the old connection is closed afterwards.
@meierhofer08
Copy link
Author

@brusdev Are you ok with the adapted solution and if yes, could you merge this PR?

@brusdev
Copy link
Member

brusdev commented Jun 14, 2021

@meierhofer08 thanks, for your contribution it LGTM, I executed the test-suite and I didn't see any regression. Could you possibly add a test to validate this fix and mitigate any regressions in the future?

@brusdev
Copy link
Member

brusdev commented Jun 14, 2021

I have created the #3623 PR to add a test. @meierhofer08 thanks for your contribution.

@clebertsuconic clebertsuconic merged commit 3b1f6ee into apache:main Jun 14, 2021
@meierhofer08
Copy link
Author

Ok thank you, writing a test would've taken myself longer probably.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants