Node in Cluster loses connection #5513
I can reproduce this on another platform:
Some more numbers please? :)
Details
Features on all nodes:
"/etc/icinga2/zones.conf" without satellite endpoints:
"/etc/icinga2/zones.d" on master01:
Configuration validation on master01:
Configuration validation on master02: ("/var/lib/icinga2/api/zones" is empty)
Behaviour
master01 establishes a connection with master02 and starts to sync its config: (debug.log on master01)
After 60s master02 closes the connection: ("/var/lib/icinga2/api/zones" is still empty, debug.log of master02)
And master01 gets a stacktrace: (debug.log of master01)
Test 1
The first test was to reduce the large number of files, so I merged all configs into a single file per zone:

Unfortunately the behaviour is still the same as described above; this has no noticeable effect.

Test 2
My second test was to reduce the number of objects, so I removed dependency objects until the sync worked.
The same test on other systems showed that the number of objects that must be removed depends on the system's processing speed, sometimes more and sometimes less. Accordingly, it is not a fixed limit.
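The merge step from Test 1 could be scripted roughly like this. This is only a sketch; the function name is made up, and it assumes one subdirectory per zone under zones.d, which is not stated in the report:

```shell
# merge_zone_configs: for each zone subdirectory of the given directory,
# concatenate all its .conf files into a single <zone>.conf file.
# Hypothetical helper; directory layout is an assumption.
merge_zone_configs() {
  dir=$1
  for zone in "$dir"/*/; do
    zone=${zone%/}                      # strip trailing slash
    cat "$zone"/*.conf > "$zone.conf"   # one merged file per zone
  done
}

# e.g. merge_zone_configs /etc/icinga2/zones.d
```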
Seems like it's related to #2972
-> empty config object blobs, no config parsing involved (just 200MB each)
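The 200MB empty blobs mentioned above could be generated with a small helper like this. This is a sketch; the helper name and the example path are assumptions, not taken from this issue:

```shell
# make_blob: write N megabytes of null bytes into a file, so it syncs
# as a config blob without any config parsing being involved.
# Hypothetical helper; name and paths are illustrative.
make_blob() {
  dd if=/dev/zero of="$1" bs=1M count="$2" 2>/dev/null
}

# e.g. make_blob /tmp/zones-test/blob.conf 200   # one 200MB blob
```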
Applied a possible fix, tests.
@mwaldmueller that's what a perfect issue report for reproducing an issue should look like. This should be the standard case 👍
I'm playing with some additional logging, e.g. the number of bytes synced for a specific zone. This does not replace #5509, it only serves as a base though.
This was added years ago when we had one thread per client. That has changed, but we shouldn't remove this cleanup timer entirely. Instead, raise the disconnect timeout from 1m to 5m. refs #5513
I've split the commits into 3 separate PRs.
Please test this in your master zone using the snapshot packages. Use the attached files and just let them sync between the two masters inside their master zone. Successful tests and feedback are mandatory for a backport to 2.7.x.
This is independent of the fix, but helps with support. It can be backported to 2.7.x without any special requirements.
This PR is to be discussed and won't be merged to master now. This change won't be backported to 2.7.x, since it changes the behaviour.
@darmagan waiting for your tests.
We have tested this on Ubuntu 14.04 and SUSE Linux Enterprise. It fixes the closed-socket crashes.
Summary
There are two master nodes, m1 and m2. m2 waits for m1 while m1 performs a sync; m2 then cuts the connection because m1 hit the 60-second timeout with:
Infos
Master01:
At the same time on master02:
Your Environment
Version used (icinga2 --version):

Enabled features (icinga2 feature list):

Config validation (icinga2 daemon -C):

Regards