ARTEMIS-3496 Replica connection to its live should fail fast #3771

franz1981 · 2021-09-24T07:02:05Z

https://issues.apache.org/jira/browse/ARTEMIS-3496

franz1981 · 2021-09-24T07:04:18Z

@clebertsuconic do you remember any reason why reconnect attempts was deliberately set to 1 on the replica's connection to its live?

clebertsuconic · 2021-09-24T12:27:59Z

It was set to 1 as in "not meant to retry": a single connection...

It's a bug... I meant for one connection only.

Replication and clustering should have the retries through the cluster bridge and clustered connection, not through the serverLocator.
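For readers following the thread, the mismatch described here can be sketched with a toy model. This is not Artemis code: `ReconnectModel` and `attemptsAfterFailure` are hypothetical names, and the sketch assumes the setting counts reconnect attempts made after a failure, so a value of 1 still permits one retry while 0 permits none.

```java
import java.util.function.BooleanSupplier;

public class ReconnectModel {

    // Hypothetical model: counts how many reconnect attempts are made
    // after a connection failure, given a budget of reconnectAttempts.
    static int attemptsAfterFailure(int reconnectAttempts, BooleanSupplier tryConnect) {
        int made = 0;
        while (made < reconnectAttempts) {
            made++;
            if (tryConnect.getAsBoolean()) {
                break; // reconnected, stop retrying
            }
        }
        return made;
    }

    public static void main(String[] args) {
        BooleanSupplier liveIsDown = () -> false; // every attempt fails

        // A budget of 1 still retries once before giving up.
        System.out.println("budget 1 -> " + attemptsAfterFailure(1, liveIsDown));

        // A budget of 0 fails fast: no retry at this level, so a higher
        // layer can decide what to do next.
        System.out.println("budget 0 -> " + attemptsAfterFailure(0, liveIsDown));
    }
}
```

Under this assumption, a "single connection, no retry" policy corresponds to a budget of 0 rather than 1, which is the off-by-one the comment above calls a bug.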

Is there a test that fails with this? Run the whole test suite just in case, and if you can add a failing test, it would be even better.

franz1981 · 2021-09-27T11:44:06Z

> Is there a test that fails with this?

I'm still investigating multiple test failures after applying this change :(
I was probably using an old version of the branch that was leaking fds: I've just sent another round to the CI, let's see.

franz1981 · 2021-09-27T17:33:23Z

@clebertsuconic I'm going to add a separate test for this tomorrow: I've rebased and now the CI is fully green with this change

gtully · 2021-10-07T09:57:47Z

I guess a test would be best, but this looks like a trivial fix, and it is important for the time to recover from failure. I would like to see this in 2.19.0.

gtully · 2021-10-08T15:55:57Z

I started with Mockito and wow, was it involved. The end result is not pretty, but it does validate the fix. Comments welcome!

gemmellr · 2021-10-11T10:25:10Z

Main comment, without looking at the code: the tests all hung and caused the GHA job to time out after 6 hours.

The tests should have an appropriate timeout to stop any one test taking an excessive amount of time before failing (this also makes clear which one goes bang when it does, aiding later analysis of what happened).
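As an illustration of the point about bounding each test, here is a minimal JDK-only sketch of a timeout guard. `TimeoutGuard` and `finishesWithin` are hypothetical names, not part of the Artemis test suite; in a JUnit 4 project the same effect is usually obtained with `@Test(timeout = ...)` or a `Timeout` rule, and which mechanism this PR ended up using is not stated in the thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutGuard {

    // Runs body on a daemon thread and reports whether it finished in time.
    // A hung body is interrupted instead of stalling the whole run.
    static boolean finishesWithin(Runnable body, long millis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "guarded-test");
            t.setDaemon(true); // never keeps the JVM alive
            return t;
        });
        Future<?> future = executor.submit(body);
        try {
            future.get(millis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the hung body
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("test body failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Runnable hangs = () -> {
            try {
                Thread.sleep(60_000); // stands in for a test that never returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("quick test in time: " + finishesWithin(() -> { }, 1_000));
        System.out.println("hung test in time: " + finishesWithin(hangs, 200));
    }
}
```

With a guard like this, a hung test fails after its own budget and is identified by name, rather than surfacing as a job-level timeout hours later.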

gtully · 2021-10-11T11:39:08Z

Fair, I guess it is related to the use of the default port; that needs sorting.
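One common way to keep a test off a fixed default port is to ask the operating system for a free one. The sketch below is an assumption about how that could be done, not a record of what this PR changed; `FreePort` and `findFreePort` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class FreePort {

    // Binding to port 0 lets the OS pick a currently unused port.
    // The socket is closed again, so another process could in principle
    // grab the port before the test binds it; this is a best-effort helper.
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException("could not allocate a free port", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("test could listen on port " + findFreePort());
    }
}
```

A test that binds its acceptor to such a port avoids colliding with a broker or another test already listening on the default one.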

ARTEMIS-3496 Replica connection to its live should fail fast

14ebe11

franz1981 force-pushed the ARTEMIS-3496 branch from 188f67e to 14ebe11 Compare September 27, 2021 12:00

franz1981 marked this pull request as ready for review September 27, 2021 17:32

gtully force-pushed the ARTEMIS-3496 branch 3 times, most recently from daf5f5f to 650a92c Compare October 11, 2021 16:05

ARTEMIS-3496 - add test to verify no reconnect on locators - mokito b…

3c52883

…ased and quite involved

gtully force-pushed the ARTEMIS-3496 branch from 650a92c to 3c52883 Compare October 11, 2021 16:26

gtully merged commit 6f4c609 into apache:main Oct 12, 2021
