[dev.icinga.com #11390] Command pipe overloaded: Can't send external Icinga command to the local command file #4037
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11390
Created by critical on 2016-03-15 17:47:41 +00:00
I have a custom icingaweb2 module that uses the icinga php command pipe library to schedule dynamic downtime in icinga2.
Before I schedule the downtime (above), I have to make sure that all other scheduled downtimes are cleared. Since I don't know the downtime IDs, I first have to make a query:
And then use the command pipe library:
So for each host that is currently in downtime I have two commands to send (clear service and host downtime), and if the host requires downtime scheduling I need one more. With 100 hosts that means (at worst) 300 commands. When this happens, a few of my `$this->transport->send($cmd);` calls sometimes fail with:
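The worst case above is simple arithmetic: three command lines per host, so 100 hosts put 300 lines on the pipe. The C++ sketch below builds such a batch and writes each line with a loop that tolerates partial writes and `EINTR`. It is an illustration under assumptions, not the icingaweb2 transport the reporter used: `BuildWorstCaseBatch` and `WriteAll` are hypothetical helpers, and the downtime IDs, host names, author, and timestamps are made up. Only the `[timestamp] COMMAND;args` line format and the classic `DEL_SVC_DOWNTIME`, `DEL_HOST_DOWNTIME`, and `SCHEDULE_HOST_DOWNTIME` command names are taken as given; the exact commands the module sends are not shown in this issue.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical: the worst-case batch described above, i.e. for every host one
// DEL_SVC_DOWNTIME, one DEL_HOST_DOWNTIME and one SCHEDULE_HOST_DOWNTIME line.
// Downtime IDs and host names are invented for illustration.
std::vector<std::string> BuildWorstCaseBatch(int hostCount, std::time_t now) {
    std::vector<std::string> lines;
    const std::string ts = "[" + std::to_string(now) + "] ";
    for (int i = 0; i < hostCount; ++i) {
        const std::string host = "host" + std::to_string(i);
        lines.push_back(ts + "DEL_SVC_DOWNTIME;" + std::to_string(2 * i) + "\n");
        lines.push_back(ts + "DEL_HOST_DOWNTIME;" + std::to_string(2 * i + 1) + "\n");
        // SCHEDULE_HOST_DOWNTIME;<host>;<start>;<end>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
        lines.push_back(ts + "SCHEDULE_HOST_DOWNTIME;" + host + ";" +
                        std::to_string(now) + ";" + std::to_string(now + 3600) +
                        ";1;0;3600;admin;maintenance\n");
    }
    return lines;
}

// Write the whole buffer, retrying on partial writes and EINTR.
// Any other error (e.g. EAGAIN on a full non-blocking pipe) is reported
// to the caller as false instead of being silently dropped.
bool WriteAll(int fd, const std::string& data) {
    std::size_t off = 0;
    while (off < data.size()) {
        ssize_t n = ::write(fd, data.data() + off, data.size() - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return false;
        }
        off += static_cast<std::size_t>(n);
    }
    return true;
}
```

POSIX makes a single `write()` of at most `PIPE_BUF` bytes to a pipe or FIFO atomic, so writing one command line per call keeps lines from different writers from interleaving.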
But the pipe exists both before and after the failure. It seems to get overloaded with data, close, and then reopen.
2016-05-11 09:04:28 +00:00 by mfriedrich a529725
2016-05-12 09:11:02 +00:00 by mfriedrich b39634d
Updated by critical on 2016-03-15 18:57:02 +00:00
https://dev.icinga.org/issues/8815: Already have this patch.
Updated by critical on 2016-03-15 21:41:22 +00:00
Can someone move this to icinga2?
I've applied the patches from 0a6505c#diff-41f5f9b62fd89e63b82e66bbe76c0e73 to my 2.3.11 release, and I am still having problems.
After reviewing the code further, I see some implementation problems. Specifically, in https://github.com/Icinga/icinga2/blob/master/lib/compat/externalcommandlistener.cpp#L120, the loop breaks on every return code of sock->Read(...). It should only break if errno != EAGAIN, because we are using non-blocking resources.
After applying these fixes and running my tests I have not seen the pipe unexpectedly close.
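The fix described here (and, per the last comment, later applied) amounts to not leaving the read loop on every non-positive return value. Below is a minimal C++ sketch of that decision on a plain POSIX descriptor. It assumes `read(2)` rather than Icinga 2's own `Socket::Read()` wrapper, and `ReadOutcome`, `ClassifyRead`, and `SetNonBlocking` are hypothetical names, not Icinga 2 code. Following the patch, it treats `rc == 0` as "try again"; note that on an ordinary pipe, 0 normally means every writer has closed its end.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

enum class ReadOutcome { Data, RetryLater, Fatal };

// Decide what one non-blocking read() result means for the listener loop.
// rc > 0: bytes arrived. rc == 0 or EAGAIN/EWOULDBLOCK/EINTR: nothing usable
// right now, so wait and try again instead of breaking out. Anything else is
// a real error.
ReadOutcome ClassifyRead(ssize_t rc, int err) {
    if (rc > 0)
        return ReadOutcome::Data;
    if (rc == 0)
        return ReadOutcome::RetryLater; // as in the patch: do not break on 0
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR)
        return ReadOutcome::RetryLater;
    return ReadOutcome::Fatal;
}

// Put a descriptor into non-blocking mode.
bool SetNonBlocking(int fd) {
    int flags = ::fcntl(fd, F_GETFL, 0);
    return flags >= 0 && ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}
```

In a listener loop, `RetryLater` would mean waiting until the descriptor is readable again (for example with `poll()`) and continuing; only `Fatal` should end the loop.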
Some extra work might be required in base/socket.cpp Read(...) to reduce log spam; this shouldn't be critical:
Doing more testing.
Updated by mfriedrich on 2016-03-16 09:16:50 +00:00
First, thanks for all the comments. I would suggest, though, setting up a test VM with 2.4.x+ so you can develop and test changes.
We'll happily review and test your patch then.
Updated by critical on 2016-03-16 17:36:50 +00:00
Note: it seems to fail on both rc < 0 and rc == 0 when it shouldn't.
A way to verify that all data has passed through the pipe when this situation occurs.
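One possible way to check this, sketched under assumptions: have the reader reassemble newline-terminated commands across arbitrary `read()` chunk boundaries, count the complete lines, and compare that count with the number of commands the sender wrote (300 in the worst case described above). `LineCounter` is a hypothetical helper for illustration, not part of Icinga 2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reassembles newline-terminated commands from arbitrary read() chunks.
class LineCounter {
public:
    // Feed one chunk as returned by read(). Complete lines are collected;
    // a trailing partial line is kept until the rest of it arrives.
    void Feed(const char* data, std::size_t len) {
        pending_.append(data, len);
        std::size_t start = 0;
        std::size_t nl;
        while ((nl = pending_.find('\n', start)) != std::string::npos) {
            lines_.push_back(pending_.substr(start, nl - start));
            start = nl + 1;
        }
        pending_.erase(0, start);
    }

    std::size_t Complete() const { return lines_.size(); }
    bool HasPartial() const { return !pending_.empty(); }
    const std::vector<std::string>& Lines() const { return lines_; }

private:
    std::string pending_;
    std::vector<std::string> lines_;
};
```

If the sender reports N commands but `Complete()` is lower once the writer is done, lines never reached the reader; a leftover partial line (`HasPartial()` still true) points at a truncated write instead.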
Updated by mfriedrich on 2016-04-21 15:41:17 +00:00
Hm, I'm looking into a reasonably easy way to reproduce the issue. Putting my Icinga 2 box under stress (10k hosts, 100k services) and then firing that small script does not produce any errors at the moment. Any hints or scripts that would help tackle the issue? Thanks.
Updated by critical on 2016-04-21 16:17:08 +00:00
Unfortunately I have no updates on my end - I have been using the patch I have provided to mitigate this issue.
Would this bash script fail if the CMDFILE did not exist? Or would it write a file instead? Could you try using a C alternative?
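On the first question: a plain shell redirection (`>` or `>>`) creates a regular file when the target path does not exist, so a test could write to a file instead of the pipe without any error. A C++ alternative could check the path before writing, as in this sketch using POSIX `stat()` and `S_ISFIFO`. `IsCommandFifo` and `OpenCommandFifo` are hypothetical helpers. Opening a FIFO with `O_WRONLY | O_NONBLOCK` fails with `ENXIO` when no process has it open for reading, which separates "nobody is listening" from an ordinary write failure.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// True only if the path exists and is a FIFO (not a regular file).
bool IsCommandFifo(const char* path) {
    struct stat st;
    return ::stat(path, &st) == 0 && S_ISFIFO(st.st_mode);
}

// Returns a writable fd; -1 with errno from open() (ENXIO: no reader yet);
// or -2 if the path is missing or is not a FIFO.
int OpenCommandFifo(const char* path) {
    if (!IsCommandFifo(path))
        return -2;
    return ::open(path, O_WRONLY | O_NONBLOCK);
}
```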
Also, what are the specs of the machine you are using? When I am executing these commands, my machine's CPU usage (icinga2 and MySQL) rises to 80-90%. Could you try these tests on a VM where you can limit the CPU to 1 or 2 cores with 1GB memory?
Updated by mfriedrich on 2016-05-11 09:34:36 +00:00
FYI, the tests were run on my MacBook Pro (early 2015, i5, 8 GB RAM), which also had to provide the resources for the icinga2 and MariaDB processes while adding 10k hosts and 100k services. At some point the database might block, but all tests were fired before that point. Your patch is reasonable (checking EAGAIN as well as continuing on rc == 0), so we applied it. Please test the git master / snapshot packages :)