Icinga2 sends notifications without logging about it and despite having a downtime #6057
Comments
Notifications in conjunction with reloads are a known problem; see #5521, for example. |
Is there anything we can do to solve this problem? |
I don't know of any workaround |
What I meant was: is there any information we can provide that would help you track down and fix the issue? |
Oh! A minimal configuration example/list of steps that lets us reliably reproduce the issue is always welcome. |
Still trying to reliably reproduce this on a small setup... -.- One strange thing we noticed while looking through our logs is that the NotificationComponent gets started after the config reloads, but we never see any stop log message. Is this correct? It seems like it should log https://github.com/Icinga/icinga2/blob/master/lib/notification/notificationcomponent.cpp#L69 |
During a reload this does not happen, the new process takes over as the old one silently dies. I'm going to check whether two NotificationComponents can run at the same time and if this can be a problem in case it is. |
Same problem in our setup; let us know if we can provide more info. To give a little context on how important this fix is: almost every day it wakes me and other admins up, often after a long night of work, because Icinga gets reloaded in the morning when the on-call contacts change, or because some admin adds a new check while I'm sleeping. Please help us save our sleep and sanity :) We believed it would be fixed (similar/same behaviour was already mentioned in #4696, #5224, #5361), but we are on Icinga 2.8.1 now and the problem remains. |
This happens a lot if you use scheduledDowntimes. |
We see similar issues with notifications being sent out right after a reload of icinga2 for services / hosts which have a scheduled downtime. I'm not really able to reproduce this in a small Vagrant setup. To me it seems to happen only in larger installations where restarts / reloads take some time and a lot of system resources.
|
I have the same notification issue: servers are kept in scheduled downtime and Icinga randomly sends notification mails. - Icinga 2.8.0 |
We have the same notification issue. Icinga 2.8.1, setup with 6 satellites & 2 masters. It seems to be related to config reloads too. |
Looks like a race condition. The NotificationComponent is started before the FileLogger object is instantiated, and as such the icinga2.log doesn't have this record. startup.log should hold this detail. Notification events without a call to Log() do not exist. It is hard to reproduce in this case, as one cannot influence the object load order. I would advise to dump all notification (and related objects) attributes in a debug dummy notification.
https://www.icinga.com/docs/icinga2/latest/doc/09-object-types/#host
|
It has existed in our environment since Icinga 2.5.4 ;-) |
You've missed my point. In terms of code, no such events will ever pass through the Log() function call. It may be the case that the logging file stream is not there, and this results in an event literally logging to /dev/null. |
@lippserd: this issue, #5521 and quite a few other linked/closed issues all have the same root cause and hurt terribly, especially in large environments. Could you please raise the priority and give someone some dedicated time to fix this? During restart/reload (read: with every config deployment) we are sending out erroneous notifications because the notification component starts and fires before the core has learned about currently active downtimes. I've just been shown another proof of this in 2.8.1. Notifications are sent during downtime with no trace in IDO and/or log files. This happens immediately after reloading/restarting Icinga. Thanks a lot, |
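The ordering problem described in this comment can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Core`, `load_downtimes`, `run_notifications` and `reload` are hypothetical names invented for illustration and do not correspond to Icinga 2's actual classes or startup code. It only shows why a notification pass that runs before the downtimes are known will notify for a host that is in downtime.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical stand-in for the monitoring core's runtime state."""
    active_downtimes: set = field(default_factory=set)
    sent: list = field(default_factory=list)

    def load_downtimes(self, hosts):
        # Downtimes become known to the core only once this step has run.
        self.active_downtimes.update(hosts)

    def run_notifications(self, problem_hosts):
        # Notify for every problem host not (yet) known to be in downtime.
        for host in problem_hosts:
            if host not in self.active_downtimes:
                self.sent.append(host)


def reload(steps, problem_hosts, downtime_hosts):
    """Run the startup steps in the given order; return the notified hosts."""
    core = Core()
    actions = {
        "downtimes": lambda: core.load_downtimes(downtime_hosts),
        "notifications": lambda: core.run_notifications(problem_hosts),
    }
    for step in steps:
        actions[step]()
    return core.sent


problems = ["db1", "web1"]
downtimes = ["db1"]

# Notification pass runs first: db1 is notified despite its downtime.
racy = reload(["notifications", "downtimes"], problems, downtimes)
# Downtimes are loaded first: only web1 is notified.
ordered = reload(["downtimes", "notifications"], problems, downtimes)
print(racy, ordered)
```

In the first call the downtime for `db1` exists but is applied too late, which matches the reported symptom of notifications being sent during a downtime right after a reload.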
@Thomas-Gelf Indeed, this is a big issue. We'll see what we can do to fix this as soon as possible. |
Proposal: Add a virtual type to all config objects to help Icinga2 work with them.
Explanation: Currently, Icinga sees little difference between a Host and the NotificationComponent, so we can't give any guarantees about the load order.
Problems: Figuring out the correct types for everything is a lot of work, so a sensible default is needed. There are also a lot of possible bugs and dependency misjudgments which might have already existed but were just not visible since we loaded/started everything at more or less the same time. |
removed comment because it was related to different issues. |
@Wintermute2k6 that does not belong to this issue. The first issue is this one; the second one is what you copied here. It's a completely different problem. |
Agreed. @Wintermute2k6 your problem sounds like #2844 and #4272. Please proceed there. |
@Mikesch-mp & @dnsmichi I agree with both of you. But it's not my issue at all. |
I've read #558349 and this issue is just one of many. ScheduledDowntime objects and their generation of runtime downtime objects is not part of this issue. We're discussing and elaborating fixes for unwanted notifications caused by runtime race conditions, either influenced by not-yet activated downtimes/acks, or too-soon-running features like the notification component. |
No problem with that at all; I will split it into the matching issues. |
Test config attached. The bug was forced by adding a sleep to the Downtime start method.
This patch ensures that specific configuration types are pre-activated and post-activated. In general, logging comes first, then common configuration objects like host/service, downtimes, etc. Finally, all features are activated afterwards to ensure that notifications are only sent once downtimes are applied. The same applies to checks that would otherwise start too early. The ApiListener feature runs first to allow cluster connections early on. fixes #6057 fixes #6231
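The ordering idea in this patch description can be sketched as a stable sort by activation priority. The type names come from the thread, but the numeric priorities, the default value, and the `activation_order` helper are assumptions made for illustration, not values taken from the patch. Placing ApiListener first among the features is one reading of the description above.

```python
# Hypothetical priorities: lower values are activated earlier.
# Types without an entry fall back to DEFAULT_PRIORITY.
DEFAULT_PRIORITY = 0

ACTIVATION_PRIORITY = {
    "FileLogger": -100,            # logging first, so nothing is logged into the void
    "ApiListener": 100,            # assumed here: first among the features
    "NotificationComponent": 200,  # features last, after downtimes are applied
    "CheckerComponent": 200,
}


def activation_order(objects):
    """Sort (type, name) pairs by priority; ties keep their input order."""
    return sorted(
        objects,
        key=lambda obj: ACTIVATION_PRIORITY.get(obj[0], DEFAULT_PRIORITY),
    )


# Objects in the arbitrary order a config loader might produce them.
loaded = [
    ("NotificationComponent", "notification"),
    ("Downtime", "db1!maintenance"),
    ("FileLogger", "main-log"),
    ("Host", "db1"),
    ("ApiListener", "api"),
    ("CheckerComponent", "checker"),
]

types = [obj_type for obj_type, _ in activation_order(loaded)]
print(types)
```

Because Python's `sorted` is stable, objects sharing a priority keep their load order, so only the relative order between the groups (logging, common objects, features) is enforced. The default priority plays the role of the "sensible default" for types nobody has classified yet.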
Expected Behavior
Current Behavior
We found that Icinga2 sends notifications without logging about it and despite having a downtime set.
Steps we do/see:
.*information/Notification: Sending 'Problem' notification.*
), it only gets logged by our notification script.
Notification script log:
Icinga2 log of the master which send the notification (and reloaded) from 15:58:*
We also have a debug log from this point in time, but it would be really hard to anonymize. Please tell me if you think it would help with debugging, and we will probably find a way to get it to you.
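For anyone who wants to check their own logs for the pattern quoted above, a minimal sketch follows. The regular expression is the one from this report. The sample log lines and the `find_problem_notifications` helper are hypothetical and are not taken from the reporter's logs.

```python
import re

# Pattern quoted in the report above.
PATTERN = re.compile(r".*information/Notification: Sending 'Problem' notification.*")


def find_problem_notifications(lines):
    """Return the log lines that record a 'Problem' notification being sent."""
    return [line.rstrip("\n") for line in lines if PATTERN.match(line)]


# Hypothetical sample lines for illustration only.
sample = [
    "[2018-02-01 15:58:01 +0100] information/Application: reload requested\n",
    "[2018-02-01 15:58:07 +0100] information/Notification: Sending 'Problem' "
    "notification 'db1!disk!mail' for user 'oncall'\n",
    "[2018-02-01 15:58:09 +0100] information/Checkable: check finished\n",
]

hits = find_problem_notifications(sample)
print(len(hits))
```

To scan a real log, pass an open file object instead of `sample`. If a notification script reports a delivery at a time for which this search finds no matching line, that is the missing-log symptom described in this report.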
Possible Solution
?
Steps to Reproduce (for bugs)
Not really sure yet on a small test setup. I can more or less frequently reproduce it on our setup.
Context
We're trying to avoid sending notifications to users for hosts which are normally stopped.
Your Environment
icinga2 --version: r2.8.1-1
icinga2 feature list: Enabled features: api checker command ido-mysql mainlog notification
icinga2 daemon -C:
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.