[dev.icinga.com #4427] Persistent ido2db process after an ido2db service restart #1312

Closed
icinga-migration opened this Issue Jul 18, 2013 · 8 comments

icinga-migration commented Jul 18, 2013

This issue has been migrated from Redmine: https://dev.icinga.com/issues/4427

Created by tontonitch on 2013-07-18 07:52:09 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2014-01-03 20:03:52 +00:00)
Target Version: 1.10.3
Last Update: 2014-12-08 14:38:12 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: any

Hi,

Since I upgraded from Icinga 1.8 to 1.9 (currently 1.9.3), one ido2db process is sometimes not stopped correctly after an ido2db restart. The problem started with Icinga 1.9.0.

Consequently, there are sometimes 3 ido2db processes running:

    # ps -ef | grep ido2db
    icinga 52898     1 0 Jul17 ? 00:03:10 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg
    icinga 79274     1 0 01:00 ? 00:00:07 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg
    icinga 79298 79274 0 01:00 ? 00:01:53 /Monitoring/icinga/bin/ido2db -c /Monitoring/icinga/etc/ido2db.cfg

Even if I stop the ido2db service, one process remains and I have to kill it manually (kill -9 52898).

This does not happen on every ido2db restart. I have tried to reproduce the problem with debugging enabled, but without success so far.

Regards,
Yannick

Changesets

2014-01-03 19:46:41 +00:00 by (unknown) 9516b8c

idoutils: wait for child processes on exit preventing zombies

Refs #4427

2014-01-03 19:59:04 +00:00 by (unknown) 238aa46

Merge branch 'fix/ido2db-kill-waitpid-4427' into next

Fixes #4427

2014-01-03 20:01:38 +00:00 by (unknown) b1ed17b

Update Changelog/THANKS.

Refs #4427

2014-01-09 22:28:36 +00:00 by (unknown) 5164ca6

idoutils: wait for child processes on exit preventing zombies

Refs #4427

Conflicts:
	Changelog

2014-01-23 15:15:33 +00:00 by (unknown) 144a0b7

Update Changelog.

Refs #4968
Refs #5434
Refs #4427
Refs #4825
Refs #5263
Refs #5545

icinga-migration commented Jul 27, 2013

Updated by mfriedrich on 2013-07-27 17:37:40 +00:00

I had that once, but I cannot reproduce it easily.

icinga-migration commented Dec 17, 2013

Updated by bigon on 2013-12-17 11:23:56 +00:00

Hi,

I'm experiencing this issue quite often on my infrastructure when the database is busy. It can actually lead to a situation where the new ido2db process gets stuck and then blocks everything in the core.

I looked at the code and IMHO the problem is in the ido2db_parent_sighandler() function, which is racy. If the child is busy writing to the database, it might miss the kill signal; in turn, the parent never receives SIGCHLD and thus never waits for the child to die. In that case the rest of the function still runs, and ido2db_cleanup_socket() removes both the socket and the pidfile. Most init scripts rely on the pidfile to check whether the process has exited properly and otherwise try to kill -9 the processes; this does not work because the pidfile is already gone.
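
A minimal C sketch of the pidfile check described above, assuming the pidfile contains a single decimal PID. `pidfile_process_alive()` is a hypothetical helper, not part of IDOUtils; it mimics what a pidfile-based init script typically does in shell with `kill -0`. Once the pidfile has been removed, the check has nothing to inspect, so a still-running child is never found or killed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mimicking a pidfile-based init script:
 * read the PID stored in the pidfile and probe it with signal 0.
 * Returns 1 if the process exists, 0 if it does not, and -1 if the
 * pidfile is missing or unreadable (the situation described above,
 * where the parent already removed it while a child was still running).
 */
int pidfile_process_alive(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;              /* pidfile gone: nothing left to check or kill */

    long pid = 0;
    int ok = fscanf(fp, "%ld", &pid);
    fclose(fp);
    if (ok != 1 || pid <= 0)
        return -1;              /* empty or malformed pidfile */

    /* Signal 0 performs only the existence/permission check; nothing is delivered. */
    if (kill((pid_t)pid, 0) == 0)
        return 1;
    /* EPERM: the process exists but belongs to another user. ESRCH: it is gone. */
    return errno == EPERM ? 1 : 0;
}
```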

IMHO, wait()/waitpid() should be called right after the kill() call, so that the parent waits until all its children have died.
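
The proposed ordering (signal the children first, then block in waitpid() before any cleanup) can be sketched in C as follows. `stop_and_reap_children()` and its parameters are hypothetical illustrations, not the actual ido2db_parent_sighandler() code; the point is only that the socket and pidfile would be removed after this call returns, not before.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical shutdown helper: send `sig` to every child PID, then block
 * in waitpid() until each one has actually exited. Only after this returns
 * should the caller remove the socket and the pidfile.
 * Returns the number of children reaped.
 */
int stop_and_reap_children(const pid_t *pids, size_t n, int sig)
{
    int reaped = 0;

    /* Ask each child to terminate. */
    for (size_t i = 0; i < n; i++)
        kill(pids[i], sig);

    /* Wait explicitly for each child instead of relying on SIGCHLD. */
    for (size_t i = 0; i < n; i++) {
        pid_t r;
        do {
            r = waitpid(pids[i], NULL, 0);
        } while (r == -1 && errno == EINTR);   /* retry if interrupted */
        if (r == pids[i])
            reaped++;
    }
    return reaped;
}
```

If a child blocks or ignores the signal, this waits indefinitely; adding a timeout with escalation to SIGKILL would be a separate design choice not discussed in this thread.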

Edit: The same code seems to be present in the Nagios codebase as well.

icinga-migration commented Jan 3, 2014

Updated by mfriedrich on 2014-01-03 20:03:18 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Target Version set to 1.11

Regarding waitpid(), you're absolutely right: the parent process should make sure to wait for all child processes to exit properly before terminating itself (and return early if there are no children). I cannot reproduce it easily, but I've pushed your proposed fix to the current development tree.
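
The "wait for all children, return early if there are none" behavior can be sketched with a `waitpid(-1, ...)` loop. `reap_all_children()` is a hypothetical name; this illustrates the technique and is not the code merged in changeset 238aa46.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork(), _exit() in callers */

/*
 * Hypothetical exit-path helper: block until every child of this process
 * has been reaped, so none is left behind as a zombie or orphan.
 * With no children, waitpid() fails immediately with ECHILD and the
 * function returns early. Returns the number of children reaped.
 */
int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        pid_t r = waitpid(-1, NULL, 0);   /* wait for any child */
        if (r > 0) {
            reaped++;
            continue;
        }
        if (r == -1 && errno == EINTR)
            continue;                     /* interrupted by a signal: keep waiting */
        break;                            /* ECHILD: no children left */
    }
    return reaped;
}
```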

icinga-migration commented Jan 3, 2014

Updated by Anonymous on 2014-01-03 20:03:52 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset icinga-core:238aa46023953de0e16c197a83851e317b97aaa6.

icinga-migration commented Jan 9, 2014

Updated by bigon on 2014-01-09 14:10:04 +00:00

Would it be possible to backport this for the next 1.10 point release?

icinga-migration commented Jan 9, 2014

Updated by mfriedrich on 2014-01-09 22:29:31 +00:00

Cherry-picked into support/1.10.

icinga-migration commented Jan 27, 2014

Updated by mfriedrich on 2014-01-27 19:30:45 +00:00

  • Target Version changed from 1.11 to 1.10.3

icinga-migration commented Dec 8, 2014

Updated by mfriedrich on 2014-12-08 14:38:12 +00:00

  • Project changed from 18 to Core, Classic UI, IDOUtils
  • Category changed from 79 to IDOUtils
  • Icinga Version changed from 1 to 1
  • OS Version set to any

@icinga-migration icinga-migration added this to the 1.10.3 milestone Jan 17, 2017
