This repository has been archived by the owner. It is now read-only.

[dev.icinga.com #1745] ido2db not recovering from system crash #695

Closed
icinga-migration opened this issue Jul 23, 2011 · 4 comments

@icinga-migration
commented Jul 23, 2011

This issue has been migrated from Redmine: https://dev.icinga.com/issues/1745

Created by c.hirschmann on 2011-07-23 11:22:58 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2011-07-23 19:23:26 +00:00)
Target Version: 1.5
Last Update: 2014-12-08 14:35:52 +00:00 (in Redmine)

Icinga Version: 1.10.0
OS Version: any

Observed behaviour:

After a system crash in which ido2db had no chance to shut down properly, it can't recover once the system has booted again, because ido2db's own socket gets in the way.

The first thing I noticed was a lot of lines like the following in the system log:

icinga: idomod: Still unable to connect to data sink. 0 items lost, 1041 queued items to flush.

This soon escalated into:

icinga: idomod: Still unable to connect to data sink. 4410 items lost, 5000 queued items to flush.

I then noticed that the ido2db service wasn't running. When I tried to start it manually, it found its old lock file, tested whether there was a process with the PID stored in that lock file and, after finding no such process, tried to start and immediately exited with the following message:

Could not bind socket: Address already in use

This error message is misleading, since there is no process blocking the network port and address.

But manually removing the old socket file fixed the problem.
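
Before deleting anything by hand, it can help to confirm that the path really holds a leftover socket. If ido2db is configured to listen on a Unix domain socket, as the socket file here suggests, binding fails with "Address already in use" whenever the path still exists, even when no process is listening on it. The following is a minimal sketch, not part of Icinga: the helper name `socket_state` is hypothetical, and the socket location depends on how Icinga was built and configured.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Icinga): report what sits at a path.
# Prints "socket" for a Unix domain socket, "other" for any other kind of
# file, and "absent" when nothing exists at that path.
socket_state() {
    ss_path=$1
    if [ -S "$ss_path" ]; then
        echo "socket"
    elif [ -e "$ss_path" ]; then
        echo "other"
    else
        echo "absent"
    fi
}

# Example use; SOCK is an assumed location, adjust it to your installation:
#   SOCK=/path/to/icinga/var/ido.sock
#   [ "$(socket_state "$SOCK")" = "socket" ] && rm -f "$SOCK"
```

Checking the file type first avoids removing something unrelated if the configured path turns out to be wrong.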

This was observed on a system running the latest CentOS 5.6, with icinga 1.4.0, icinga-api 1.4.0, icinga-doc-1.4.0, icinga-gui 1.4.0, icinga-idoutils 1.4.0.

Expected behaviour:

ido2db should be able to recover without assistance after it has crashed.

After it found the old lock file and discovered that, despite the lock file, no other ido2db process was running, it should probably remove the old socket just as it removes the old lock file.
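
The behaviour requested above could be sketched in shell roughly as follows. This is an illustration under stated assumptions, not the code of ido2db or its init script: the function name `cleanup_stale_socket` is hypothetical, and the lock file is assumed to contain nothing but the PID.

```shell
#!/bin/sh
# Hypothetical sketch: remove a leftover socket file only when the lock file
# no longer points at a live process.
# Usage: cleanup_stale_socket LOCKFILE SOCKFILE
# Returns 0 if the socket was removed (or was already gone),
# 1 if the PID recorded in LOCKFILE still appears to be alive.
cleanup_stale_socket() {
    css_lock=$1
    css_sock=$2
    if [ -f "$css_lock" ]; then
        # Assumption: the lock file holds just the PID.
        css_pid=$(cat "$css_lock" 2>/dev/null)
        # kill -0 sends no signal; it only checks that the PID can be
        # signalled. Run as root (as init scripts usually are), otherwise a
        # live process owned by another user is reported as dead.
        if [ -n "$css_pid" ] && kill -0 "$css_pid" 2>/dev/null; then
            echo "process $css_pid is still running, keeping $css_sock" >&2
            return 1
        fi
    fi
    rm -f "$css_sock"
}

# Example use with assumed paths, adjust them to your installation:
#   cleanup_stale_socket /path/to/icinga/var/ido2db.lock /path/to/icinga/var/ido.sock
```

Tying the removal to the PID check keeps a second start attempt from deleting the socket of an instance that is still running.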

Attachments

Changesets

2011-07-23 18:19:21 +00:00 by mfriedrich 2b9eece

idoutils: remove leftover socket file in init-script startup, e.g. from a system crash #1745

fixes #1745

Relations:

@icinga-migration

commented Jul 23, 2011

Updated by mfriedrich on 2011-07-23 18:20:42 +00:00

  • Category set to 25
  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Target Version set to 1.5

you might fix that in the init script as well. i wouldn't touch ido2db binary in this regard.

so e.g.

diff --git a/rc.ido2db.in b/rc.ido2db.in
index 6361d76..33d97e1 100644
--- a/rc.ido2db.in
+++ b/rc.ido2db.in
@@ -45,6 +45,7 @@ Ido2dbBin=@bindir@/@ido2db_name@
 Ido2dbCfgFile=@sysconfdir@/ido2db.cfg
 Ido2dbVarDir=@localstatedir@
 Ido2dbRunFile=$Ido2dbVarDir/ido2db.lock
+Ido2dbSockFile=$Ido2dbVarDir/ido.sock
 Ido2dbLockDir=/var/lock/subsys
 Ido2dbLockFile=@ido2db_name@
 Ido2dbUser=@icinga_user@
@@ -146,6 +147,8 @@ case "$1" in
                        fi
                fi
                printf "Starting $servicename:"
+               # remove leftover sockfile, from a system crash
+               rm -f $Ido2dbSockFile
                touch $Ido2dbRunFile
                chown $Ido2dbUser:$Ido2dbGroup $Ido2dbRunFile
                $Ido2dbBin -c $Ido2dbCfgFile
@icinga-migration

commented Jul 23, 2011

Updated by mfriedrich on 2011-07-23 19:23:26 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 2b9eece.

@icinga-migration

commented Aug 5, 2011

Updated by c.hirschmann on 2011-08-05 18:59:25 +00:00

  • File added ido2db-init.patch

dnsmichi wrote:

you might fix that in the init script as well. i wouldn't touch ido2db binary in this regard.

Thanks, that's probably the best place to fix it. I just wasn't sure whether the init script came from icinga or my distro.

I have a slightly different patch: I basically just put the removal of the socket on the same line where all the other leftover files are removed.

@icinga-migration

commented Dec 8, 2014

Updated by mfriedrich on 2014-12-08 14:35:52 +00:00

  • Project changed from 18 to Core, Classic UI, IDOUtils
  • Category changed from 25 to IDOUtils
  • Icinga Version set to 1
  • OS Version set to any

@icinga-migration icinga-migration added this to the 1.5 milestone Jan 17, 2017
