When times.begin is set to a non-zero value, the first notification is delayed #5561
Please fetch the runtime information for the
I am seeing the same issue with initial notification delays (the first notification arrives later than the "begin" time). I used the REST API to monitor the notification objects, and I believe I at least understand what it is doing.
So, if the begin time is 10 minutes, the interval is 15 minutes, and a service goes critical 14 minutes before its currently listed "next_notification", then the notification goes out at 10 minutes. If the same service goes critical only 5 minutes before the "next" notification, then the notification goes out at 15 minutes instead of 10 minutes (the 5 remaining minutes + the 10 minute "begin" time).
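The sequence above can be reproduced with a small Python model. It is a sketch under assumptions, not the Icinga 2 code: times are in minutes, the hard state change happens at t = 0, and whenever the timer reaches next_notification it is assumed to reschedule to now + interval before re-checking the begin window. Only the comparison next_notification > next_proposed mirrors the condition quoted further down in this thread; the function name is hypothetical.

```python
def first_notification_time(minutes_to_next, begin, interval):
    """Return the minute at which the first notification is sent.

    Hypothetical model: the hard state change happens at t = 0 and
    minutes_to_next is the "next_notification" already scheduled then.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    next_notification = minutes_to_next
    last_hard_state_change = 0

    def evaluate(now):
        nonlocal next_notification
        if now >= last_hard_state_change + begin:
            return True  # begin window reached: notification is sent
        # Before times.begin: propose now + begin, but only ever move
        # next_notification earlier (the condition questioned in this thread).
        next_proposed = now + begin
        if next_notification > next_proposed:
            next_notification = next_proposed
        return False

    # The hard state change triggers an immediate evaluation.
    if evaluate(0):
        return 0
    # Afterwards the timer fires whenever next_notification is reached;
    # assumption: it reschedules to now + interval before re-evaluating.
    while True:
        now = next_notification
        next_notification = now + interval
        if evaluate(now):
            return now


print(first_notification_time(14, begin=10, interval=15))  # 10
print(first_notification_time(5, begin=10, interval=15))   # 15
```

In this model the delay in the second case is exactly the remaining 5 minutes plus the 10 minute begin time, because the earlier next_notification blocks the now + begin proposal on the first evaluation.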
I dug into the source code a bit and believe I have found where the problem is located. In lib/icinga/notification.cpp, in the function void Notification::BeginExecuteNotification(...), starting around line 293, there are these instructions for setting up the next notification based on times.begin:
I have tweaked this a bit and could make it work with either of two changes. I could remove the conditional entirely:
Or I could offset the begin time against the last state change instead of "now" (which could be "later" if the condition prevented the SetNextNotification on the first try).
I am not sure why the condition (GetNextNotification() > nextProposedNotification) is taken into consideration, but I suspect there is a good reason for it (re-notifications for previous events?). In any case, shouldn't the "begin" time of a notification reference the object's hard state change time instead of "now"?
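Both tweaks can be compared in the same kind of simplified Python model. This is a hypothetical sketch, not the actual C++ patch: times are in minutes, the hard state change is at t = 0, the timer is assumed to reschedule to now + interval whenever it fires, and the strategy names are made up for illustration.

```python
STRATEGIES = ("remove_condition", "from_state_change")


def first_notification_time(minutes_to_next, begin, interval, strategy):
    """Minute of the first notification under one of the two tweaks.

    Hypothetical model: the hard state change happens at t = 0 and
    minutes_to_next is the "next_notification" already scheduled then.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    next_notification = minutes_to_next
    last_hard_state_change = 0

    def evaluate(now):
        nonlocal next_notification
        if now >= last_hard_state_change + begin:
            return True  # begin window reached: notification is sent
        if strategy == "remove_condition":
            # Tweak 1: always reschedule to now + begin.
            next_notification = now + begin
        else:
            # Tweak 2: keep the condition, but propose a time relative to
            # the last hard state change instead of "now".
            next_proposed = last_hard_state_change + begin
            if next_notification > next_proposed:
                next_notification = next_proposed
        return False

    if evaluate(0):
        return 0
    # Assumption: the timer reschedules to now + interval whenever it fires.
    while True:
        now = next_notification
        next_notification = now + interval
        if evaluate(now):
            return now


for strategy in STRATEGIES:
    # prints [10, 10] for both strategies
    print(strategy, [first_notification_time(m, 10, 15, strategy) for m in (14, 5)])
```

With begin = 10 and interval = 15, both tweaks send the first notification at 10 minutes whether next_notification was 14 or 5 minutes away when the service went critical.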
I have researched this issue a bit more and just submitted a pull request with some changes which I think better address the problem. The last hard state change time isn't necessarily the best reference for the begin and end times: for a notification set up for, say, warning and critical states, those times will be pushed back whenever the service flips between critical and warning before they are reached. This could delay notifications indefinitely and lead to some uncomfortable questions from admins who never got notified of the problem. In the pull request, I implemented a field that tracks the actual trigger time of the notification (when the object first enters an applicable hard problem state). That trigger time is reset on recovery or on a change to a non-applicable state. It's a small change in the behaviour of the system, so it should be reviewed carefully, but I believe most users would expect and want notifications to act like this, especially if they rely on a specific layered escalation system for handling problems.
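A minimal Python sketch of the trigger-time idea, assuming on_hard_state_change is called on every hard state change (including warning to critical). The class, field, and method names are hypothetical and do not claim to match the field added in the pull request. It shows how flipping between warning and critical moves the last hard state change but leaves the trigger time, and therefore the begin deadline, untouched.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class TriggerTracker:
    """Hypothetical per-notification state for the trigger-time idea."""

    applicable_states: FrozenSet[str]  # e.g. the notification's state filter
    begin: float  # seconds, like times.begin
    trigger_time: Optional[float] = None
    last_hard_state_change: Optional[float] = None

    def on_hard_state_change(self, state: str, now: float) -> None:
        self.last_hard_state_change = now
        if state in self.applicable_states:
            # Only the first entry into an applicable problem state counts;
            # flipping between applicable states keeps the original time.
            if self.trigger_time is None:
                self.trigger_time = now
        else:
            # Recovery or a non-applicable state resets the trigger time.
            self.trigger_time = None

    def begin_due(self) -> Optional[float]:
        """Earliest time the first notification may go out, if triggered."""
        if self.trigger_time is None:
            return None
        return self.trigger_time + self.begin


t = TriggerTracker(frozenset({"warning", "critical"}), begin=900)
t.on_hard_state_change("critical", 0)
t.on_hard_state_change("warning", 600)
t.on_hard_state_change("critical", 800)
print(t.begin_due())                   # 900 (based on the trigger time)
print(t.last_hard_state_change + 900)  # 1700 (based on the last hard state change)
t.on_hard_state_change("ok", 1000)
print(t.begin_due())                   # None
```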
refs #5890
Expected Behavior
Current Behavior
There are three kinds of notifications for each of my monitoring items:
mail-notification:
push-notification:
ivr-notification:
Currently, the first ivr-notification is delayed: it is sent later than the time configured in "times.begin".
Possible Solution
With "times.begin = 900", the first notification is currently delayed by more than 1 minute. If there is no way to fix this, we may have to compensate by shortening the begin time, e.g. "times.begin = 900 - 60s".
Steps to Reproduce (for bugs)
As the picture shows:
In theory, the first "ivr-notification" should be sent at 19:39:51 + 15 min = 19:54:51;
but the first "ivr-notification" is actually sent at 19:56:10;
delay: 19:56:10 - 19:54:51 = 1 min 19 s.
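The arithmetic can be double-checked with Python's datetime module, treating 19:39:51 as the moment the begin window starts counting (as in the calculation above) and times.begin = 900 seconds:

```python
from datetime import datetime, timedelta

# Times from the report; strptime fills in a dummy date, which cancels out.
state_change = datetime.strptime("19:39:51", "%H:%M:%S")
actual = datetime.strptime("19:56:10", "%H:%M:%S")
begin = timedelta(seconds=900)  # times.begin = 900

expected = state_change + begin
delay = actual - expected
print(expected.strftime("%H:%M:%S"))  # 19:54:51
print(delay)                          # 0:01:19
```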
Context
Your Environment
icinga2 --version
icinga2 feature list
icinga2 daemon -C
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.