[dev.icinga.com #10426] Icinga crashes with a segfault on receiving a lot of check results for nonexisting hosts/services #3526

Closed
icinga-migration opened this issue Oct 21, 2015 · 17 comments

@icinga-migration

commented Oct 21, 2015

This issue has been migrated from Redmine: https://dev.icinga.com/issues/10426

Created by vytenis on 2015-10-21 18:31:32 +00:00

Assignee: vytenis
Status: Resolved (closed on 2016-02-24 22:27:10 +00:00)
Target Version: 2.4.8
Last Update: 2016-04-21 07:46:57 +00:00 (in Redmine)

Icinga Version: 2.4.0
Backport?: Not yet backported
Include in Changelog: 1

Large setup with ~500k services; check results are fed in via nsca-ng and the external command pipe. Under real load, with many check results arriving through the external command pipe and some of them (<10%) referring to hosts/services that are not registered in Icinga2, it crashes a few seconds after startup:
http://hastebin.com/yivajejape.sm

Running icinga2 git master from Oct 16 (commit 21a2986)

Attachments

Changesets

2016-02-24 22:25:22 +00:00 by vytenis 6729679

Try to queue all PROCESS_FILE commands instead of exploding the stack

fixes #10426

Signed-off-by: Michael Friedrich <michael.friedrich@netways.de>

2016-02-24 22:25:59 +00:00 by mfriedrich 8e0cc70

Update AUTHORS

refs #10426

2016-05-12 09:08:19 +00:00 by vytenis 9f3a6b9

Try to queue all PROCESS_FILE commands instead of exploding the stack

fixes #10426

Signed-off-by: Michael Friedrich <michael.friedrich@netways.de>

2016-05-12 09:08:19 +00:00 by mfriedrich 7175174

Update AUTHORS

refs #10426
@icinga-migration

commented Oct 21, 2015

Updated by vytenis on 2015-10-21 20:57:44 +00:00

BTW, removing the calls to `BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot process passive host check result for non-existent host '" + arguments[0] + "'"));` fixes the issue
... or not, it only delays it for a while.

@icinga-migration

commented Oct 22, 2015

Updated by gbeutner on 2015-10-22 05:54:24 +00:00

  • Category set to libicinga
  • Status changed from New to Feedback
  • Assigned to set to vytenis

What does the file you're passing to PROCESS_FILE look like?

@icinga-migration

commented Oct 22, 2015

Updated by vytenis on 2015-10-22 10:49:34 +00:00

Most of the files look like this - all `PROCESS_SERVICE_CHECK_RESULT` with one chained `PROCESS_FILE` somewhere

[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_FILE;/dev/shm/nsca.hMLg5v;1
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz-;puppet;0;PUPPET OK: last successful puppet run at Thu Oct 22 05:28:35 UTC 2015 Duration: 212 seconds
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;check-results;0;Processed 53 checks in 3.02s
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;num-procs;0;PROCS OK: 477 total processes
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;config-validation-check;0;config validated
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostbca;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxzapp02;error_watcher;0;OK: no unwanted lines found (15 examined)
[1445494578] PROCESS_SERVICE_CHECK_RESULT;hostyxz;service1234;0;plugin output ...................................................
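Each line in the sample above has the shape `[timestamp] COMMAND;arg;arg;...`. As a rough illustration only, here is a minimal C++ sketch of splitting such a line into its parts. `ExternalCommand` and `ParseCommandLine` are hypothetical names made up for this example and are not Icinga 2 types or functions. The sketch splits naively on every `;`, so plugin output that itself contains a semicolon would be broken into extra fields.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <vector>

// Hypothetical container for one parsed line; not an Icinga 2 type.
struct ExternalCommand {
    long long timestamp;
    std::string name;
    std::vector<std::string> args;
};

// Parses "[<timestamp>] <NAME>;<arg>;<arg>..." into 'out'.
// Returns false if the line does not have that shape.
bool ParseCommandLine(const std::string& line, ExternalCommand& out)
{
    if (line.empty() || line[0] != '[')
        return false;

    const std::size_t close = line.find(']');
    if (close == std::string::npos)
        return false;

    try {
        out.timestamp = std::stoll(line.substr(1, close - 1));
    } catch (const std::exception&) {
        return false; // timestamp is missing or not numeric
    }

    // Exactly one space separates the timestamp from the command name.
    std::size_t pos = close + 1;
    if (pos >= line.size() || line[pos] != ' ')
        return false;
    ++pos;

    // Naive split on ';' (does not protect semicolons in plugin output).
    std::vector<std::string> fields;
    for (;;) {
        const std::size_t semi = line.find(';', pos);
        if (semi == std::string::npos) {
            fields.push_back(line.substr(pos));
            break;
        }
        fields.push_back(line.substr(pos, semi - pos));
        pos = semi + 1;
    }

    if (fields.front().empty())
        return false; // no command name

    out.name = fields.front();
    out.args.assign(fields.begin() + 1, fields.end());
    return true;
}
```

For the chained line in the sample, this would yield the name `PROCESS_FILE` with the two arguments `/dev/shm/nsca.hMLg5v` and `1`.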
@icinga-migration

commented Oct 22, 2015

Updated by mfriedrich on 2015-10-22 11:29:19 +00:00

So you are processing that file with PROCESS_FILE and then chaining another PROCESS_FILE request inside it? That's strange.

@icinga-migration

commented Oct 22, 2015

Updated by vytenis on 2015-10-22 12:00:55 +00:00

That's how https://www.nsca-ng.org works: it sends a single PROCESS_FILE command instead of thousands of individual commands.

@icinga-migration

commented Oct 22, 2015

Updated by mfriedrich on 2015-10-22 12:13:06 +00:00

Sure. But I've never seen nesting PROCESS_FILE inside files actually work, which is most likely the problem here.

@icinga-migration

commented Oct 22, 2015

Updated by gbeutner on 2015-10-22 12:46:00 +00:00

Well, there's no inherent problem with nesting PROCESS_FILE calls, but I suspect you might be calling PROCESS_FILE for the same file recursively (i.e. file 'a' calls PROCESS_FILE for file 'a').
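One way to check for the self-referencing case described here is to follow the PROCESS_FILE references and see whether any file shows up again on the current chain. The following is only a sketch under stated assumptions: `FileMap` is a hypothetical in-memory stand-in for the files on disk, and `ProcessFileTarget` and `HasProcessFileCycle` are names invented for this example, not Icinga 2 functions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the filesystem: path -> command lines in that file.
typedef std::map<std::string, std::vector<std::string> > FileMap;

// Returns the target path if 'line' is "[ts] PROCESS_FILE;<path>;<flag>", else "".
std::string ProcessFileTarget(const std::string& line)
{
    const std::string marker = "] PROCESS_FILE;";
    const std::size_t close = line.find(']');
    if (close == std::string::npos || line.compare(close, marker.size(), marker) != 0)
        return "";
    const std::size_t begin = close + marker.size();
    const std::size_t end = line.find(';', begin);
    return line.substr(begin, end == std::string::npos ? std::string::npos : end - begin);
}

// Depth-first walk; 'active' holds the files on the current reference chain.
static bool Visit(const FileMap& files, const std::string& path,
                  std::set<std::string>& active, std::set<std::string>& done)
{
    if (active.count(path))
        return true;  // file references itself, directly or indirectly
    if (done.count(path))
        return false; // already fully explored without finding a cycle
    const FileMap::const_iterator it = files.find(path);
    if (it == files.end())
        return false; // unknown file: nothing to follow

    active.insert(path);
    for (std::size_t i = 0; i < it->second.size(); ++i) {
        const std::string target = ProcessFileTarget(it->second[i]);
        if (!target.empty() && Visit(files, target, active, done))
            return true;
    }
    active.erase(path);
    done.insert(path);
    return false;
}

// Returns true if following PROCESS_FILE references from 'start' loops back.
bool HasProcessFileCycle(const FileMap& files, const std::string& start)
{
    std::set<std::string> active, done;
    return Visit(files, start, active, done);
}
```

With a map where file `a` contains `PROCESS_FILE;a;1`, `HasProcessFileCycle(files, "a")` reports a cycle, while plain nesting (file `b` referencing a different file `c`) does not.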

@icinga-migration

commented Oct 22, 2015

Updated by vytenis on 2015-10-22 13:14:40 +00:00

Same exact setup works with Nagios4 :-/

@icinga-migration

commented Oct 22, 2015

Updated by mfriedrich on 2015-10-22 13:21:37 +00:00

vytenis wrote:

Most of the files look like this - all `PROCESS_SERVICE_CHECK_RESULT` with one chained `PROCESS_FILE` somewhere
[...]

What's the exact path for that file being processed?

@icinga-migration

commented Oct 22, 2015

Updated by vytenis on 2015-10-22 13:30:09 +00:00

dnsmichi wrote:

What's the exact path for that file being processed?

/dev/shm/nsca.****, e.g. /dev/shm/nsca.hMLg5v
It is a file owned by the nagios user, no permission errors there.

@icinga-migration

commented Oct 27, 2015

Updated by vytenis on 2015-10-27 17:08:43 +00:00

  • File added 0001-Try-to-queue-all-PROCESS_FILE-commands-instead-of-ex.patch

We fixed it in our setup with the attached patch.
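The patch title describes queueing PROCESS_FILE commands rather than handling them recursively. The general idea can be sketched as below. This is not the attached patch, only an illustration of the technique: `FileMap` is a hypothetical in-memory stand-in for the command files, `DrainCommandFiles` is a made-up name, and skipping files that were already seen is an extra assumption added so the example always terminates.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the filesystem: path -> command lines in that file.
typedef std::map<std::string, std::vector<std::string> > FileMap;

// Processes 'start' and every file reachable through PROCESS_FILE lines using
// a work queue, so nesting depth never grows the call stack.
// Returns the non-PROCESS_FILE commands in the order they would be executed.
std::vector<std::string> DrainCommandFiles(const FileMap& files, const std::string& start)
{
    const std::string marker = "] PROCESS_FILE;";
    std::vector<std::string> executed;
    std::set<std::string> seen;
    std::deque<std::string> pending;
    pending.push_back(start);

    while (!pending.empty()) {
        const std::string path = pending.front();
        pending.pop_front();

        // Assumption of this sketch: each file is handled at most once.
        if (!seen.insert(path).second)
            continue;

        const FileMap::const_iterator it = files.find(path);
        if (it == files.end())
            continue; // unknown file: nothing to do

        for (std::size_t i = 0; i < it->second.size(); ++i) {
            const std::string& line = it->second[i];
            const std::size_t close = line.find(']');
            if (close != std::string::npos &&
                line.compare(close, marker.size(), marker) == 0) {
                // Nested PROCESS_FILE: enqueue the target instead of recursing.
                const std::size_t begin = close + marker.size();
                const std::size_t end = line.find(';', begin);
                pending.push_back(line.substr(
                    begin, end == std::string::npos ? std::string::npos : end - begin));
            } else {
                executed.push_back(line);
            }
        }
    }
    return executed;
}
```

One consequence of queueing in this sketch is ordering: the commands of a nested file run after the remaining commands of the file that referenced it, rather than at the point of the PROCESS_FILE line.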

@icinga-migration

commented Nov 25, 2015

Updated by mfriedrich on 2015-11-25 09:42:27 +00:00

  • Status changed from Feedback to New
  • Assigned to deleted vytenis
@icinga-migration

commented Feb 24, 2016

Updated by mfriedrich on 2016-02-24 22:26:57 +00:00

  • Status changed from New to Assigned
  • Assigned to set to vytenis
  • Target Version set to 2.5.0

Sorry for the delay, and thanks for the patch :)

@icinga-migration

commented Feb 24, 2016

Updated by vytenis on 2016-02-24 22:27:10 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 6729679.

@icinga-migration

commented Apr 20, 2016

Updated by gbeutner on 2016-04-20 08:14:38 +00:00

  • Target Version changed from 2.5.0 to 2.4.6
@icinga-migration

commented Apr 20, 2016

Updated by gbeutner on 2016-04-20 16:35:39 +00:00

  • Target Version changed from 2.4.6 to 2.4.7
@icinga-migration

commented Apr 21, 2016

Updated by gbeutner on 2016-04-21 07:46:57 +00:00

  • Target Version changed from 2.4.7 to 2.4.8

@icinga-migration icinga-migration added this to the 2.4.8 milestone Jan 17, 2017
