ARTEMIS-3337: Correctly handle multiple connection failures#3613
ARTEMIS-3337: Correctly handle multiple connection failures#3613clebertsuconic merged 1 commit intoapache:mainfrom
Conversation
|
@meierhofer08 good catch. Why not just detaching the sessions from the old connections in preHandleFailover avoiding a useless call to handleFailover? |
|
Hmm good question. I didn't want to change the existing code too much to avoid any additional side effects.
Detaching the sessions in preHandleFailover is not so easy as "transferConnection" cannot be called because there is no new connection yet at that point. And I'm not sure if just detaching from the old connection is fine, there is no method for this (yet) and the current transferConnection sets "transferring = true", I'm not sure if I can set this when just removing the old connection. I think the best compromise that shouldn't create issues would be to call session.getSessionContext().getSessionChannel().transferConnection() for the remaining sessions after 1 session's handleFailover() fails, or create something like a "clientHandleFailover" that wraps this and doesn't do any server requests for the remaining sessions. |
|
I changed it now to only execute "client-side" failover for the remaining sessions if a previous session failed to do correct failover. |
Previously, when during reconnect one session couldn't be transferred to the new connection, we instantly returned and didn't execute failover for the other sessions. This produced the issue that for sessions where no failover was executed, their channels were still present on the old connection. When the old connection was then destroyed, these channels were closed although the reconnect was still ongoing, which lead to "dead" sessions. Now, if a session failover fails, for the remaining sessions the "client-side" part of failover is executed, which removes the sessions from the old connection so that they are not closed when the old connection is closed afterwards.
|
@brusdev Are you ok with the adapted solution and if yes, could you merge this PR? |
|
@meierhofer08 thanks, for your contribution it LGTM, I executed the test-suite and I didn't see any regression. Could you possibly add a test to validate this fix and mitigate any regressions in the future? |
|
I have created the #3623 PR to add a test. @meierhofer08 thanks for your contribution. |
|
Ok thank you, writing a test would've taken myself longer probably. |
Previously, when during reconnect one session couldn't be transferred
to the new connection, we instantly returned and didn't execute failover
for the other sessions. This produced the issue that for sessions
where no failover was executed, their channels were still present on the
old connection. When the old connection was then destroyed, these channels
were closed although the reconnect was still ongoing, which lead to
"dead" sessions.
Now, we always execute failover for every session so that the channels
are guaranteed to be removed from the old connection before it is destroyed.