[dev.icinga.com #1745] ido2db not recovering from system crash #695
This issue has been migrated from Redmine: https://dev.icinga.com/issues/1745
Created by c.hirschmann on 2011-07-23 11:22:58 +00:00
After a system crash in which ido2db had no chance to shut down properly, it can't recover once the system has booted again, because ido2db's own leftover socket gets in the way.
First thing I noticed were a lot of lines like the following in the system log:
icinga: idomod: Still unable to connect to data sink. 0 items lost, 1041 queued items to flush.
This soon escalated into:
icinga: idomod: Still unable to connect to data sink. 4410 items lost, 5000 queued items to flush.
I then noticed that the ido2db service wasn't running. When I tried to start it manually, it found its old lock file and tested whether there was a process with the PID that was apparently stored in that lock file. After finding no such process, it tried to start and immediately exited with the following message:
Could not bind socket: Address already in use
This error message is misleading, since there is no process blocking the network port and address.
But manually removing the old socket file fixed the problem.
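For illustration, the behaviour can be reproduced outside of ido2db. The sketch below is Python, not ido2db's own code, and assumes the data sink is a Unix domain socket; the file name `ido.sock` and the helper names are made up for the example. It shows that a socket file left behind on disk makes the next `bind()` fail with "Address already in use" even though no process is listening, and that removing the file lets the bind succeed.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path):
    """Bind a Unix domain stream socket at path and return it."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
    except OSError:
        s.close()
        raise
    return s


def demo():
    """Return (socket file left behind?, errno on second bind, bind ok after unlink?)."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "ido.sock")  # hypothetical socket file name

    # Simulate an unclean exit: closing the socket does not remove its file.
    bind_unix(path).close()
    leftover = os.path.exists(path)

    # A fresh start now fails, although nothing is listening on the socket.
    try:
        bind_unix(path).close()
        err = None
    except OSError as exc:
        err = exc.errno

    # Removing the stale file is enough to make bind() work again.
    os.unlink(path)
    bind_unix(path).close()
    rebound = True

    os.unlink(path)
    os.rmdir(workdir)
    return leftover, err, rebound


if __name__ == "__main__":
    leftover, err, rebound = demo()
    print(leftover, errno.errorcode.get(err), rebound)
```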
This was observed on a system running the latest CentOS 5.6, with icinga 1.4.0, icinga-api 1.4.0, icinga-doc-1.4.0, icinga-gui 1.4.0, icinga-idoutils 1.4.0.
ido2db should be able to recover without assistance after it has crashed.
Once it has found the old lock file and discovered that, despite the lock file, no other ido2db process is running, it should probably remove the old socket just as it removes the old lock file.
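A minimal sketch of that recovery logic, written in Python purely for illustration (ido2db itself and its init script are not Python). The function names and the idea of passing both paths as arguments are assumptions for this example; the real locations would come from the ido2db configuration. Note that the check only tests whether the PID exists, so a recycled PID belonging to some unrelated process would be mistaken for a running ido2db.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID exists."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_files(lock_path, sock_path):
    """Remove lock and socket files if no process owns the lock.

    Returns True if it is safe to start, False if the PID in the
    lock file still belongs to a running process.
    """
    try:
        with open(lock_path) as fh:
            pid = int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        pid = None  # no lock file, or no usable PID in it

    if pid is not None and pid > 0 and pid_alive(pid):
        return False

    # Nothing is running: clean up both leftovers, not just the lock file.
    for path in (lock_path, sock_path):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
    return True
```

A start routine would call `remove_stale_files(...)` first and only try to bind the socket when it returns `True`.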
2011-07-23 18:19:21 +00:00 by mfriedrich 2b9eece
Updated by mfriedrich on 2011-07-23 18:20:42 +00:00
You might fix that in the init script as well. I wouldn't touch the ido2db binary in this regard.
Updated by c.hirschmann on 2011-08-05 18:59:25 +00:00
Thanks, that's probably the best place to fix it. I just wasn't sure whether the init script came from icinga or my distro.
I have a slightly different patch: I basically just put the removal of the socket on the same line where all the other leftover files are removed.