This repository has been archived by the owner. It is now read-only.

[dev.icinga.com #1782] Icinga forgets to schedule non-24x7 checks #712

Closed
icinga-migration opened this issue Aug 4, 2011 · 60 comments

Comments

@icinga-migration
Member

@icinga-migration icinga-migration commented Aug 4, 2011

This issue has been migrated from Redmine: https://dev.icinga.com/issues/1782

Created by mcp on 2011-08-04 10:34:46 +00:00

Assignee: (none)
Status: Rejected (closed on 2015-03-12 19:14:02 +00:00)
Target Version: (none)
Last Update: 2015-03-12 19:14:02 +00:00 (in Redmine)

Icinga Version: 1.4.2?
OS Version: unknown

Hi,

I'm using Icinga v1.4.2, but I had the same problems with v1.4.1 and v1.4.0 as well,
and it seems to have gotten worse over the past few days.

Icinga forgets to schedule checks which are not 24x7.

I have a group of checks that run from 7am to 10pm and another group
that runs from 8am to 6pm, but these checks are not rescheduled when a
new day and their timeperiod begin. Today this affected 341 service
checks due at 7am and 228 due at 8am.

All of those checks show "Next Scheduled Check: N/A".

When I restart Icinga after 7am but before 8am, all checks which should start
at 7am are scheduled again, but not those which should start at 8am. I have
to restart Icinga once more after 8am to get these checks scheduled as well.

I'm using the same config I used with Nagios 3 over the past years, and I didn't
have this problem with Nagios.

Any ideas?

thanks!

Attachments

Changesets

2011-10-22 18:09:37 +00:00 by mfriedrich 348a795

add more comments and thoughts to timeperiods, dst handling and checking for the next valid time #1782

refs #1782

Relations:

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 4, 2011

Updated by mfriedrich on 2011-08-04 10:41:49 +00:00

  • Status changed from New to Feedback
  • Priority changed from Urgent to Normal

Please attach:

  • config examples
  • performance output of icingastats
  • graphs of localhost performance and icingastats
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 4, 2011

Updated by mcp on 2011-08-04 10:59:16 +00:00

  • File added icingastats.txt
  • File added example.cfg
  • File added localhost-CPU.png
  • File added localhost-Load.png
  • File added localhost-Memory.png

Config example attached.

icingastats output attached.

I don't have graphs for icingastats; some others are attached.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 4, 2011

Updated by mcp on 2011-08-04 11:01:18 +00:00

This is a quad-core machine, by the way.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mcp on 2011-08-05 11:30:51 +00:00

Yesterday I tried setting use_retained_scheduling_info=0, and this morning
I had 568 services with next_check 0 (N/A).

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mfriedrich on 2011-08-05 12:10:30 +00:00

Well, monitoring the overall core performance via the icingastats output over time would give a better idea than wild guesses. Did it work with versions prior to 1.4.x?

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mcp on 2011-08-05 12:21:41 +00:00

Well, the problem isn't a performance problem at all.

I wasn't aware of the problem with Icinga 1.3.x, but I think
the problem didn't exist in versions before 1.4.

And please note, it only affects services that are not 24x7. The other ones,
which are 24x7 (>3000 services), work just fine.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mfriedrich on 2011-08-05 13:45:22 +00:00

So it's a timeperiod problem, so to speak. Performance would only have been a guess regarding "forgetting things over time", i.e. no room left for calculations.

What is the system default timezone: UTC or something else? Did this start happening now, during DST?

And just a wild guess, based on the example:

        timeperiod_name         07-22 Uhr - x7

Turn on debugging and check whether the event for rescheduling your service shows up. It would also be interesting to know which state the service is currently in ...

Just for the record: I've just looked into base/events.c, where the logic happens.

                                        if(nudge_seconds) {
                                                /* We nudge the next check time when it is due to too many concurrent service checks */
                                                temp_service->next_check=(time_t)(temp_service->next_check+nudge_seconds);
                                                }
                                        else {  
                                                /* Otherwise reschedule (TODO: This should be smarter as it doesn't consider its timeperiod) */
                                                if(temp_service->state_type==SOFT_STATE && temp_service->current_state!=STATE_OK)
                                                        temp_service->next_check=(time_t)(temp_service->next_check+(temp_service->retry_interval*interval_length));
                                                else
                                                        temp_service->next_check=(time_t)(temp_service->next_check+(temp_service->check_interval*interval_length));
                                                }

So, given the TODO, I would expect that this is exactly what happens in your case.

git blame shows:

e8c48eb4 (Ton Voon          2009-06-11 22:15:39 +0000 1256)

so let's have a look into ...

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mcp on 2011-08-05 13:48:50 +00:00

The services are mostly in state OK.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mcp on 2011-08-05 14:13:33 +00:00

default system timezone is Europe/Berlin (Fri Aug 5 16:13:11 CEST 2011)

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mfriedrich on 2011-08-05 15:27:04 +00:00

OK, to trigger the reschedule (if this really happens), run_event must be set to false. This is the case if:

  • /* don't run a service check if we're already maxed out on the number of parallel service checks... */
  • /* don't run a service check if active checks are disabled */

It would be generally interesting to see whether any of these messages show up in your debug log (set the debug level to match both "checks" and "events"):

  • before scheduling the check
    "Max concurrent service checks (%d) has been reached." (where %d is a number)
  • checking on checks to be scheduled
    "We're not executing service checks right now, so we'll skip this event"
  • after the check should be rescheduled
    "Did not execute scheduled event. Idling for a bit..."
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mfriedrich on 2011-08-05 15:42:08 +00:00

After analyzing this, more event output could lead us to the cause (in init_timing_loop, which is called before the event_loop starts monitoring things).

Look for either of these two groups:

  • "Service check should not be scheduled"
  • "Service is already scheduled to be checked in the future"

or

  • "Preferred Check Time:"
  • "Preferred Time is Invalid In Timeperiod"
  • "Actual Check Time:"
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mcp on 2011-08-05 16:52:37 +00:00

Hi dnsmichi,

OK, I set debug_level to 24 and debug_verbosity to 1.

Is 1 enough for the verbosity, or should I set it to 2?

thanks!

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 5, 2011

Updated by mfriedrich on 2011-08-05 16:54:58 +00:00

Use the most verbose setting, so 2 ...

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 6, 2011

Updated by mcp on 2011-08-06 08:26:13 +00:00

Preferred Time is Invalid In Timeperiod '07-22 Uhr - x7': 1312603714 --> Sat Aug 6 06:08:34 2011

But I think that is because it should start at 7am.

At startup I see some with "Service check should not be scheduled", but that's OK because
the service timeperiod has not started yet or has already ended.

at 06.08.2011 06:02:02:

[1312603322.977948] [008.2] [pid=31568] Service 'Updates' on host 'loghost'
[1312603322.977953] [008.2] [pid=31568] Service check should not be scheduled.
[1312603322.977958] [008.2] [pid=31568] Service 'Uptime' on host 'loghost'
[1312603322.977963] [008.2] [pid=31568] CIB: 42, IBI: 12, TIB: 152, SIF: 22
[1312603322.977968] [008.2] [pid=31568] Mult factor: 1866
[1312603322.977975] [008.2] [pid=31568] Preferred Check Time: 1312603550 --> Sat Aug 6 06:05:50 2011
[1312603322.977982] [008.2] [pid=31568] Actual Check Time: 1312603550 --> Sat Aug 6 06:05:50 2011

And if I understand this correctly, it is totally wrong, because that
check should start at 08:00. That's the last occurrence of that service
on that host for today :(

And for the 'Updates' service on host loghost:

Last Check Time: 2011-08-05 18:12:06
Check Type: ACTIVE
Check Latency / Duration: 0.111 /5.982 seconds
Next Scheduled Check: N/A
Last State Change: 2011-07-19 07:28:41

It's the same for the other 450 checks still missing today.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 6, 2011

Updated by mfriedrich on 2011-08-06 10:56:11 +00:00

mcp wrote:

Preferred Time is Invalid In Timeperiod '07-22 Uhr - x7': 1312603714 --> Sat Aug 6 06:08:34 2011

but I think that is because of it should start at 7 am.

Hmm, that sounds like a DST problem again.

at startup I see some with "Service check should not be scheduled" but that's OK because
the service timeperiod has not started/ended.

at 06.08.2011 06:02:02:

[1312603322.977948] [008.2] [pid=31568] Service 'Updates' on host 'loghost'
[1312603322.977953] [008.2] [pid=31568] Service check should not be scheduled.
[1312603322.977958] [008.2] [pid=31568] Service 'Uptime' on host 'loghost'
[1312603322.977963] [008.2] [pid=31568] CIB: 42, IBI: 12, TIB: 152, SIF: 22
[1312603322.977968] [008.2] [pid=31568] Mult factor: 1866
[1312603322.977975] [008.2] [pid=31568] Preferred Check Time: 1312603550 --> Sat Aug 6 06:05:50 2011
[1312603322.977982] [008.2] [pid=31568] Actual Check Time: 1312603550 --> Sat Aug 6 06:05:50 2011

and if I understand this correctly, it is totally wrong, because that
check should start at 0800. That's the last occurence for that service
on that host for today :(

So the core calculates a check time assuming that DST is not active, which results in wrong check times that are never executed because they fall outside the service's check_period.

That actually requires debugging ...

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 6, 2011

Updated by mfriedrich on 2011-08-06 12:06:18 +00:00

OK, ...

The next check time is calculated from several factors.

First, the CIB (current interleave block) plus the IBI (interleave block index) multiplied by the TIB (total interleave blocks):

                        interleave_block_index++; // increased on each service looping

                        mult_factor=current_interleave_block+(interleave_block_index*total_interleave_blocks);

This creates the mult factor, which is used to calculate next_check:

                        /* set the preferred next check time for the service */
                        temp_service->next_check=(time_t)(current_time+(mult_factor*scheduling_info.service_inter_check_delay));

That produces your output. The code block below checks whether the preferred check time is a valid time within the timeperiod.

                        /* make sure the service can actually be scheduled when we want */
                        is_valid_time=check_time_against_period(temp_service->next_check,temp_service->check_period_ptr);
                        if(is_valid_time==ERROR){
                                log_debug_info(DEBUGL_EVENTS,2,"Preferred Time is Invalid In Timeperiod '%s': %lu --> %s",temp_service->check_period_ptr->name,(unsigned long)temp_service->next_check,ctime(&temp_service->next_check));
                                get_next_valid_time(temp_service->next_check,&next_valid_time,temp_service->check_period_ptr);
                                temp_service->next_check=next_valid_time;
                                }

Given your example, this check fails and writes the error message to the debug log.
In that case, it calls get_next_valid_time to recalculate a possible next_check time within your preferred timeperiod.

[1312603322.977963] [008.2] [pid=31568] CIB: 42, IBI: 12, TIB: 152, SIF: 22
[1312603322.977968] [008.2] [pid=31568] Mult factor: 1866
[1312603322.977975] [008.2] [pid=31568] Preferred Check Time: 1312603550 --> Sat Aug 6 06:05:50 2011
[1312603322.977982] [008.2] [pid=31568] Actual Check Time: 1312603550 --> Sat Aug 6 06:05:50 2011

note: those two entries do not match! but the calculation is wrong either way.

Preferred Time is Invalid In Timeperiod '07-22 Uhr - x7': 1312603714 --> Sat Aug 6 06:08:34 2011

Actually, it doesn't change anything, because the next_check time stays the same afterwards:

                        log_debug_info(DEBUGL_EVENTS,2,"Actual Check Time: %lu --> %s",(unsigned long)temp_service->next_check,ctime(&temp_service->next_check));

So the conclusion of this analysis: the calculated next_check time does not fit within the timeperiod and triggers the error, but the retry to get a valid check time does something wrong and probably ignores the timeperiod setting (and/or has DST problems). Otherwise, the Actual Check Time would have been corrected.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 6, 2011

Updated by mfriedrich on 2011-08-06 12:29:40 +00:00

And now for the worse part, reading the Nagios bug tracker and mailing lists...

http://tracker.nagios.org/view.php?id=31
http://thread.gmane.org/gmane.network.nagios.devel/5530

and
http://thread.gmane.org/gmane.network.nagios.devel/6170
http://thread.gmane.org/gmane.network.nagios.devel/6158
http://thread.gmane.org/gmane.network.nagios.devel/6067

Seems like we've got a winner: an old and long-lasting bug :(

edit:
Once more: http://article.gmane.org/gmane.network.nagios.devel/5554

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 6, 2011

Updated by mfriedrich on 2011-08-06 12:45:07 +00:00

One last question, in case we can build a proposed patch for this: which exact version of Icinga is installed, the 1.4.2 release tarball or something different? Testing will require a recompiled source install.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 8, 2011

Updated by mcp on 2011-08-08 06:28:59 +00:00

1.4.2 release tarball.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 8, 2011

Updated by mfriedrich on 2011-08-08 06:50:37 +00:00

OK. Since I think this is hurting your production setup, and I currently can't reproduce the overall problem, I'd propose a workaround:

  • set check_period to 24x7 again
  • create a script that uses external commands to schedule a fixed downtime between 7 and 8 for all these services (or use host_svc_downtime), and run it via a cron job
  • otherwise, if you don't like the cron job, set the notification_period accordingly

Regarding the problem, I'd love to see the full debug log (with level=-1 and verbosity=2) from -1 to +1 around the time window. If it contains information that shouldn't be public, tar.gz it and send it to my mailbox michael.friedrich (at) univie.ac.at

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 8, 2011

Updated by mfriedrich on 2011-08-08 09:14:25 +00:00

I've created a test config and can't reproduce it. Checks are rescheduled as I expect them to be, according to the timeperiods that are set.

define command {
  command_name                  check_ping
  command_line                  /usr/lib64/nagios/plugins/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
}

define service{
  use                             generic-service
  check_interval                  5
  retry_interval                  1
  max_check_attempts              3
  host_name                       localhost
  service_description             Testping
  check_command                   check_ping!100.0,20%!500.0,60%
  check_period                    test_timeperiod
}

 # now it's 10:43
##
#Last Check Time:       2011-08-08 10:42:51
#Next Scheduled Check:          2011-08-08 10:47:51

# now it's 10:48
# Last Check Time:       2011-08-08 10:42:51 (Last Update:      2011-08-08 10:48:30  ( 0d 0h 0m 20s ago)) <- ok, check did not happen due to timeperiod
# Next Scheduled Check:         2011-08-08 10:52:51

# now it's 10:53
# Last Check Time:      2011-08-08 10:52:51
# Next Scheduled Check:         2011-08-08 10:57:51

# date -s 10:47

# Aug  8 10:47:00 xxx icinga: Warning: A system time change of 0d 0h 7m 18s (backwards in time) has been detected.  Compensating...
# Aug  8 10:47:00 xxx icinga: TIMEPERIOD TRANSITION: test_timeperiod;1;0


# now it's 10:55 (faked: 10:48)
# Last Check Time:      2011-08-08 10:45:33
# Next Scheduled Check:         2011-08-08 10:50:33

# resync it
# Aug  8 10:50:00 xxx icinga: TIMEPERIOD TRANSITION: test_timeperiod;0;1
# Aug  8 10:57:49 xxx icinga: Warning: The results of service 'cert[exp]: thomas-1.office.crt' on host 'localhost' are stale by 0d 0h 6m 52s (threshold=0d 0h 7m 15s).  I'm forcing an immediate check of the service.
# Aug  8 10:57:49 xxx icinga: Warning: The results of service 'sys[proc]: httpd' on host 'localhost' are stale by 0d 0h 6m 52s (threshold=0d 0h 7m 15s).  I'm forcing an immediate check of the service.

## 2nd try
## now it's 11:01, disallow checks at 11:05

# Last Check Time:      2011-08-08 10:57:49
# Next Scheduled Check:         2011-08-08 11:02:49

# Last Check Time:      2011-08-08 11:02:49
# Next Scheduled Check:         2011-08-08 11:15:00 <- this is exactly the timeperiod definition ...

## 3rd try
# remove the timeperiod, force a check
# Last Check Time:      2011-08-08 11:06:28
# Next Scheduled Check:         2011-08-08 11:11:28


# now enable the timeperiod
# reforce a check
# Last Check Time:      2011-08-08 11:08:42
# Next Scheduled Check:         2011-08-08 11:13:42 <- ok

# adjust the timeperiod again to 11:20
# reload core, force check, wait
# Last Check Time:      2011-08-08 11:11:45
# Next Scheduled Check:         2011-08-08 11:20:00 <- this is also correct ...


define timeperiod {
  timeperiod_name test_timeperiod
  alias           Test Timeperiod
  sunday          00:00-06:00,18:00-24:00
  monday          00:00-11:05,11:20-24:00
  tuesday         00:00-06:00,19:40-24:00
  wednesday       00:00-06:00,21:00-24:00
  thursday        00:00-06:00,22:00-24:00
  friday          00:00-06:00,18:00-24:00
  saturday        00:00-06:00,18:00-24:00
}
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 8, 2011

Updated by mfriedrich on 2011-08-08 13:38:32 +00:00

This thread is very interesting, as is the proposed patch.

http://thread.gmane.org/gmane.network.nagios.devel/5530/focus=5554

Currently, if the next_check time is not a valid time within the timeperiod, the check gets rescheduled for the next week (does that happen here?).

By changing that to happen at the next check cycle, the core could retry once more when the changed timeperiod / time window is reached.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 8, 2011

Updated by mcp on 2011-08-08 14:33:42 +00:00

next_check is 0 (N/A)

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 15, 2011

Updated by mcp on 2011-08-15 14:06:32 +00:00

I've narrowed the problem down. If you restart Icinga during the time when the check period should NOT check, the check won't get rescheduled when it should, i.e. when the timeperiod begins again.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 2, 2011

Updated by mcp on 2011-09-02 07:00:56 +00:00

any news on this?

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 4, 2011

Updated by mfriedrich on 2011-09-04 16:00:05 +00:00

Still, even though it's narrowed down, I'd love to see the debug file to find out exactly what happens.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 16, 2011

Updated by mfriedrich on 2011-09-16 12:35:29 +00:00

Possibly this valuable patch from Andreas could resolve the rescheduling issue with check periods on (changed) timeperiods.

X-Git-Url: http://git.weiss.in-berlin.de/cgi-bin/gitweb/nagios.git/blobdiff_plain/2b933716d8d24312e359cddfb0c101200459b3d7..de852ab49e812abbed936abcf21f9a530382989e:/base/events.c

diff --git a/base/events.c b/base/events.c
index ed30d53..3ff0717 100644
--- a/base/events.c
+++ b/base/events.c
@@ -338,6 +338,7 @@ void init_timing_loop(void) {
        log_debug_info(DEBUGL_EVENTS, 2, "Current Interleave Block: %d\n", current_interleave_block);

        for(interleave_block_index = 0; interleave_block_index < scheduling_info.service_interleave_factor && temp_service != NULL; temp_service = temp_service->next) {
+           int check_delay = 0;

            log_debug_info(DEBUGL_EVENTS, 2, "Service '%s' on host '%s'\n", temp_service->description, temp_service->host_name);
            /* skip this service if it shouldn't be scheduled */
@@ -346,8 +347,17 @@ void init_timing_loop(void) {
                continue;
                }

-           /* skip services that are already scheduled for the future (from retention data), but reschedule ones that were supposed to happen while we weren't running... */
-           if(temp_service->next_check > current_time) {
+           /*
+            * skip services that are already scheduled for the (near)
+            * future from retention data, but reschedule ones that
+            * were supposed to happen while we weren't running...
+            * We check to make sure the check isn't scheduled to run
+            * far in the future to make sure checks who've hade their
+            * timeperiods changed during the restart aren't left
+            * hanging too long without being run.
+            */
+           check_delay = temp_service->next_check - current_time;
+           if(check_delay > 0 && check_delay < (temp_service->check_interval * interval_length)) {
                log_debug_info(DEBUGL_EVENTS, 2, "Service is already scheduled to be checked in the future: %s\n", ctime(&temp_service->next_check));
                continue;
                }
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 16, 2011

Updated by mcp on 2011-09-16 12:49:16 +00:00

OK, will test. With a few very small modifications, mostly whitespace, it applies
cleanly to Icinga 1.5.1.

Will test this evening. More to come ...

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 19, 2011

Updated by mcp on 2011-09-19 08:07:23 +00:00

the patch does not fix this problem.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 20, 2011

Updated by mcp on 2011-09-20 15:42:29 +00:00

http://www.nagios-portal.org/wbb/index.php?page=Thread&postID=85322

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 22, 2011

Updated by mfriedrich on 2011-10-22 15:29:32 +00:00

Regarding host check viability: when running an asynchronous host check, check_time_against_period() determines whether the check should be run (perform_check) and whether time_is_valid is TRUE or FALSE.

                /* make sure this is a valid time to check the host */
                if (check_time_against_period((unsigned long)current_time, hst->check_period_ptr) == ERROR) {
                        preferred_time = current_time;
                        if (time_is_valid)
                                *time_is_valid = FALSE;
                        perform_check = FALSE;
                }

time_is_valid is used here:

        /* run a scheduled host check asynchronously */
        int run_scheduled_host_check_3x(host *hst, int check_options, double latency) {

                /* attempt to run the check */
                result = run_async_host_check_3x(hst, check_options, latency, TRUE, TRUE, &time_is_valid, &preferred_time);

                /* an error occurred, so reschedule the check */
                if (result == ERROR) {

                        /* the host could not be rescheduled properly - set the next check time for next week */
                        if (time_is_valid == FALSE && next_valid_time == preferred_time) {

time_is_valid is set to FALSE if the check could not be executed, so this is not the actual check but a timeperiod check performed beforehand.

preferred_time is calculated as well and then used as the starting point for finding the next valid time:

                        /* make sure we rescheduled the next host check at a valid time */
                        get_next_valid_time(preferred_time, &next_valid_time, hst->check_period_ptr);

The question is why the condition tests next_valid_time == preferred_time, instead of relying on a valid time pre-checked during the check attempt, before setting a reschedule time one week in the future.

For services, this is a bit different.

/* executes a scheduled service check */
int run_scheduled_service_check(service *svc, int check_options, double latency) {

        int time_is_valid = TRUE;


        /* attempt to run the check */
        result = run_async_service_check(svc, check_options, latency, TRUE, TRUE, &time_is_valid, &preferred_time);

        /* an error occurred, so reschedule the check */
        if (result == ERROR) {


                        /* get current time */
                        time(&current_time);

                        /* determine next time we should check the service if needed */
                        /* if service has no check interval, schedule it again for 5 minutes from now */
                        if (current_time >= preferred_time)
                                preferred_time = current_time + ((svc->check_interval <= 0) ? 300 : (svc->check_interval * interval_length));

                        /* make sure we rescheduled the next service check at a valid time */
                        get_next_valid_time(preferred_time, &next_valid_time, svc->check_period_ptr);
                        /* the service could not be rescheduled properly - set the next check time for next week */
                        /*if(time_is_valid==FALSE && next_valid_time==preferred_time){*/
                        /* UPDATED 08/12/09 EG to reflect proper timeperod check logic */
                        if (time_is_valid == FALSE &&  check_time_against_period(next_valid_time, svc->check_period_ptr) == ERROR) {

So here, too, time_is_valid is FALSE if the check could not happen; but if the next_valid_time calculated by the preceding get_next_valid_time call does not pass check_time_against_period, the check still gets rescheduled for next week.

I suspect a bug in one of these functions: time_is_valid is always FALSE (so that part of the condition is always true), and whenever next_valid_time falls outside the timeperiod, the whole condition matches and the check is rescheduled for next week.

But as a matter of fact, this only occurs with non-24x7 timeperiods.

Both need deeper analysis:

  1. get_next_valid_time
  2. check_time_against_period
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 22, 2011

Updated by mfriedrich on 2011-10-22 15:31:35 +00:00

base/checks.c:                  get_next_valid_time(preferred_time, &next_valid_time, svc->check_period_ptr);
base/checks.c:          get_next_valid_time(preferred_time, &next_valid_time, temp_service->check_period_ptr);
base/checks.c:                  get_next_valid_time(preferred_time, &next_valid_time, hst->check_period_ptr);
base/checks.c:          get_next_valid_time(preferred_time, &next_valid_time, hst->check_period_ptr);
base/utils.c:/* Separate this out from public get_next_valid_time for testing, so we can mock current_time */
base/utils.c:void _get_next_valid_time(time_t pref_time, time_t current_time, time_t *valid_time, timeperiod *tperiod) {
base/utils.c:           _get_next_valid_time_per_timeperiod(pref_time, &earliest_time, current_time, tperiod);
base/utils.c:void _get_next_valid_time_per_timeperiod(time_t pref_time, time_t *valid_time, time_t current_time, timeperiod *tperiod) {
base/utils.c:   log_debug_info(DEBUGL_FUNCTIONS, 0, "get_next_valid_time_per_timeperiod()\n");
base/utils.c:void get_next_valid_time(time_t pref_time, time_t *valid_time, timeperiod *tperiod) {
base/utils.c:   log_debug_info(DEBUGL_FUNCTIONS, 0, "get_next_valid_time()\n");
base/utils.c:   _get_next_valid_time(pref_time, current_time, valid_time, tperiod);
base/notifications.c:                   get_next_valid_time(current_time, &timeperiod_start, svc->notification_period_ptr);
base/notifications.c:                   get_next_valid_time(current_time, &timeperiod_start, hst->notification_period_ptr);
base/commands.c:                                get_next_valid_time(preferred_time, &next_valid_time, temp_host->check_period_ptr);
base/commands.c:                                get_next_valid_time(preferred_time, &next_valid_time, temp_service->check_period_ptr);
base/commands.c:                get_next_valid_time(preferred_time, &next_valid_time, svc->check_period_ptr);
base/commands.c:                get_next_valid_time(preferred_time, &next_valid_time, hst->check_period_ptr);
base/events.c:                  get_next_valid_time(current_time, &next_valid_time, temp_service->check_period_ptr);
base/events.c:                  get_next_valid_time(current_time, &next_valid_time, temp_host->check_period_ptr);
base/events.c:                          get_next_valid_time(temp_service->next_check, &next_valid_time, temp_service->check_period_ptr);
base/events.c:                  get_next_valid_time(temp_host->next_check, &next_valid_time, temp_host->check_period_ptr);
base/icinga.c:          get_next_valid_time(pref_time, &valid_time, tp);
common/macros.c:                get_next_valid_time(test_time, &next_valid_time, temp_timeperiod);
include/icinga.h:void get_next_valid_time(time_t, time_t *,timeperiod *);       /* get the next valid time in a time period */
include/icinga.h:void _get_next_valid_time_per_timeperiod(time_t, time_t *, time_t, timeperiod *);
t-tap/test-stubs.c:void get_next_valid_time(time_t time_t1, time_t *time_t2, timeperiod *temp_timeperiod) {}
t-tap/test_events.c:void get_next_valid_time(time_t time_t1, time_t *time_t2, timeperiod *temp_timeperiod) {}
t-tap/test_timeperiods.c:       get_next_valid_time(current_time, &next_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       get_next_valid_time(current_time, &next_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:               _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:               ok(test_time == chosen_valid_time, "get_next_valid_time always returns same time");
t-tap/test_timeperiods.c:               _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:               ok(test_time == chosen_valid_time, "get_next_valid_time always returns same time, time_t=%lu", test_time);
t-tap/test_timeperiods.c:               _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:               ok(test_time == chosen_valid_time, "get_next_valid_time always returns same time, time_t=%lu", test_time);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       todo_start("Is a bug in get_next_valid_time for a time that falls in the DST change hour period");
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       //_get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time_per_timeperiod(test_time, &chosen_valid_time, test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       //_get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time_per_timeperiod(test_time, &chosen_valid_time, test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       //_get_next_valid_time(test_time, test_time, &chosen_valid_time, temp_timeperiod);
t-tap/test_timeperiods.c:       _get_next_valid_time_per_timeperiod(test_time, &chosen_valid_time, test_time, temp_timeperiod);
t-tap/stub_utils.c:void get_next_valid_time(time_t pref_time, time_t *valid_time, timeperiod *tperiod) {}


base/checks.c:                  if (time_is_valid == FALSE &&  check_time_against_period(next_valid_time, svc->check_period_ptr) == ERROR) {
base/checks.c:          if (check_time_against_period((unsigned long)current_time, svc->check_period_ptr) == ERROR) {
base/checks.c:          if (temp_dependency->dependency_period != NULL && check_time_against_period(current_time, temp_dependency->dependency_period_ptr) == ERROR)
base/checks.c:          if (check_time_against_period(current_time, temp_service->check_period_ptr) == ERROR)
base/checks.c:          if (temp_dependency->dependency_period != NULL && check_time_against_period(current_time, temp_dependency->dependency_period_ptr) == ERROR)
base/checks.c:          if (check_time_against_period(current_time, temp_host->check_period_ptr) == ERROR)
base/checks.c:          if (check_time_against_period((unsigned long)current_time, hst->check_period_ptr) == ERROR) {
base/utils.c:int check_time_against_period(time_t test_time, timeperiod *tperiod) {
base/utils.c:   log_debug_info(DEBUGL_FUNCTIONS, 0, "check_time_against_period()\n");
base/utils.c:           if (check_time_against_period(test_time, temp_timeperiodexclusion->timeperiod_ptr) == OK) {
base/utils.c:   if (check_time_against_period(preferred_time, tperiod) == OK) {
base/notifications.c:   if (check_time_against_period(current_time, temp_period) == ERROR) {
base/notifications.c:   if (check_time_against_period(time(NULL), cntct->service_notification_period_ptr) == ERROR) {
base/notifications.c:   if (se->escalation_period != NULL && check_time_against_period(current_time, se->escalation_period_ptr) == ERROR)
base/notifications.c:   if (check_time_against_period(current_time, hst->notification_period_ptr) == ERROR) {
base/notifications.c:   if (check_time_against_period(time(NULL), cntct->host_notification_period_ptr) == ERROR) {
base/notifications.c:   if (he->escalation_period != NULL && check_time_against_period(current_time, he->escalation_period_ptr) == ERROR)
base/commands.c:                        if (check_time_against_period(preferred_time, temp_host->check_period_ptr) == ERROR) {
base/commands.c:                        if (check_time_against_period(preferred_time, temp_service->check_period_ptr) == ERROR) {
base/commands.c:        if (check_time_against_period(preferred_time, svc->check_period_ptr) == ERROR) {
base/commands.c:        if (check_time_against_period(preferred_time, hst->check_period_ptr) == ERROR) {
base/events.c:          is_valid_time = check_time_against_period(current_time, temp_service->check_period_ptr);
base/events.c:          is_valid_time = check_time_against_period(current_time, temp_host->check_period_ptr);
base/events.c:                  is_valid_time = check_time_against_period(temp_service->next_check, temp_service->check_period_ptr);
base/events.c:          is_valid_time = check_time_against_period(temp_host->next_check, temp_host->check_period_ptr);
common/macros.c:                dummy = asprintf(output, "%d", (check_time_against_period(test_time, temp_timeperiod) == OK) ? 1 : 0);
common/macros.c:                if (next_valid_time == test_time && check_time_against_period(test_time, temp_timeperiod) == ERROR)
include/icinga.h:int check_time_against_period(time_t,timeperiod *);    /* check to see if a specific time is covered by a time period */
t-tap/test-stubs.c:int check_time_against_period(time_t time_t1, timeperiod *timeperiod) {}
t-tap/test_events.c:int check_time_against_period(time_t time_t1, timeperiod *timeperiod) {}
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:               is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:               is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:               is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/test_timeperiods.c:       is_valid_time = check_time_against_period(test_time, temp_timeperiod);
t-tap/stub_utils.c:int check_time_against_period(time_t test_time, timeperiod *tperiod) {}
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 22, 2011

Updated by mfriedrich on 2011-10-22 15:48:59 +00:00

I found the reason for this. After the last check of the day, the service is scheduled for the next morning, as it should be. However, it is executed a couple of seconds early, so run_async_service_check returns ERROR because that time is outside the timeperiod. time_is_valid is then set to FALSE.
Next, run_scheduled_service_check tries to reschedule with a preferred time of now + 5 minutes, which is inside the timeperiod and is therefore accepted. Now next_valid_time == preferred_time while time_is_valid is still FALSE, which delays scheduling by a week (a week in 3.1.2, a year in 3.0.6).

/* checks.c, run_scheduled_service_check(): */
result = run_async_service_check(svc, check_options, latency, TRUE, TRUE, &time_is_valid, &preferred_time);
/* this returns result == ERROR and time_is_valid == FALSE, as it is called at 07:59:54, just before the timeperiod begins */

if (result == ERROR) {
    if (svc->should_be_scheduled == TRUE) {
        time(&current_time);
        if (current_time >= preferred_time)    /* true */
            preferred_time = current_time + ((svc->check_interval <= 0) ? 300 : (svc->check_interval * interval_length));
        /* preferred_time is now 08:04:54 */
        get_next_valid_time(preferred_time, &next_valid_time, svc->check_period_ptr);
        if (time_is_valid == FALSE && next_valid_time == preferred_time) {
            /* get_next_valid_time() accepted preferred_time, as it falls within the timeperiod.
             * But time_is_valid is still FALSE,
             * so scheduling of the check is delayed by a week */
            /* ... */
        }
    }
}

I think the time_is_valid tested here should either be set by get_next_valid_time or not be tested at all. Apart from that, it is of course wrong to execute the check before its scheduled time.

Compare this with:

Actually this is not the case. What seems to be happening is that the scheduled jobs are getting rescheduled, which is moving them outside of their timeperiods.

I don't understand why these checks would be rescheduled.

That could be true as well: the timeperiod check fails, which makes the check viability test return an error. On the subsequent reschedule attempt, the check is immediately rescheduled away again, probably because of the same bug. All of this happens within milliseconds, if not nanoseconds:

  • the viability check (check_time_against_period) says not to run the check
  • we couldn't run it, so try to get a new preferred_time
  • check that again with check_time_against_period
  • second step: that function still reports no valid time
  • reschedule the check one week later, to remove the problem from the current scheduler queue
  • one week later, the same thing happens again
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 22, 2011

Updated by mfriedrich on 2011-10-22 17:48:44 +00:00

get_next_valid_time contains a nested call to check_time_against_period, which tests preferred_time against tperiod.

/* given a preferred time, get the next valid time within a time period */
void get_next_valid_time(time_t pref_time, time_t *valid_time, timeperiod *tperiod) {
        time_t current_time = (time_t)0L;

        log_debug_info(DEBUGL_FUNCTIONS, 0, "get_next_valid_time()\n");

        /* get time right now, preferred time must be now or in the future */
        time(&current_time);

        _get_next_valid_time(pref_time, current_time, valid_time, tperiod);
}


/* Separate this out from public get_next_valid_time for testing, so we can mock current_time */

void _get_next_valid_time(time_t pref_time, time_t current_time, time_t *valid_time, timeperiod *tperiod) {
        time_t preferred_time = (time_t)0L;
        preferred_time = (pref_time < current_time) ? current_time : pref_time;

        if (tperiod == NULL) {
                *valid_time = preferred_time;
                return;
        }

        if (check_time_against_period(preferred_time, tperiod) == OK) {
#ifdef TEST_TIMEPERIODS_B
                printf("PREF TIME IS VALID\n");
#endif
                *valid_time = preferred_time;
                return;
        }

        get_earliest_time(preferred_time, valid_time, current_time, tperiod, 0);
}

The level argument 0 makes get_earliest_time take this branch:

        if ((level % 2) == 0) {
                _get_next_valid_time_per_timeperiod(pref_time, &earliest_time, current_time, tperiod);
                if (*valid_time == 0)
                        *valid_time = earliest_time;
                else if (earliest_time < *valid_time)
                        *valid_time = earliest_time;

and then it takes the timeperiod exclusions into account. I expect that no timeperiod exclusions are in use here?

If they are, get_earliest_time is called recursively with an incremented level, which also hits the else branch. Why on earth is valid_time set to earliest_time + 1 there?

        } else {
                get_min_invalid_time_per_timeperiod(pref_time, &earliest_time, current_time, tperiod);
                if (*valid_time == 0)
                        *valid_time = earliest_time;
                else if (earliest_time < *valid_time)
                        *valid_time = earliest_time + 1;
        }

Anyhow, the actual logic happens within _get_next_valid_time_per_timeperiod.

This is where tm_isdst is set to -1 in many places. Shouldn't the DST flag instead be determined by the system and then checked and compared by the application, as is done for the shift?

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 22, 2011

Updated by mfriedrich on 2011-10-22 17:50:45 +00:00

  • File added dst.c

@mcp

can you save dst.c and compile it somewhere with

$ gcc dst.c -o dst

and run it with

$ ./dst

and post the output? Maybe there is a daylight saving time problem.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 22, 2011

Updated by mfriedrich on 2011-10-22 17:58:04 +00:00

just checked ...

$ grep -n 'isdst = -1' base/*
base/utils.c:907:                               t->tm_isdst = -1;
base/utils.c:937:                               t->tm_isdst = -1;
base/utils.c:1297:                              t->tm_isdst = -1;
base/utils.c:1369:                              t->tm_isdst = -1;
base/utils.c:1704:                              t->tm_isdst = -1;
base/utils.c:1772:                              t->tm_isdst = -1;
base/utils.c:2059:              t.tm_isdst = -1;
base/utils.c:2085:                      t.tm_isdst = -1;
base/utils.c:2096:              t.tm_isdst = -1;
base/utils.c:2123:      t.tm_isdst = -1;
base/utils.c:2143:              t.tm_isdst = -1;
base/utils.c:2164:                      t.tm_isdst = -1;
base/utils.c:2177:              t.tm_isdst = -1;

isdst should be determined with

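   struct tm *tmp, tm;
   time_t t2;
   int is_dst;

   /* take the DST flag from localtime() for t instead of passing -1 */
   tmp = localtime(&t);
   is_dst = tmp->tm_isdst;

   /* copy the fields of rtct and convert them with that DST flag */
   memset(&tm, 0, sizeof(tm));
   tm.tm_sec = rtct.tm_sec;
   tm.tm_min = rtct.tm_min;
   tm.tm_hour = rtct.tm_hour;
   tm.tm_mday = rtct.tm_mday;
   tm.tm_mon = rtct.tm_mon;
   tm.tm_year = rtct.tm_year;
   tm.tm_isdst = is_dst;
   t2 = mktime(&tm);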

which is actually done within

/* Checks if the given time is in daylight time saving period */
int is_dlst_time(time_t *time) {
        struct tm *bt = localtime(time);
        return bt->tm_isdst;
}

It even calculates the shift:

/* Returns the shift in seconds if the given times are across the daylight time saving period change */
int get_dlst_shift(time_t *start, time_t *end) {
        int shift = 0, dlst_end, dlst_start;
        dlst_start = is_dlst_time(start);
        dlst_end = is_dlst_time(end);
        if (dlst_start < dlst_end) {
                shift = 3600;
        } else if (dlst_start > dlst_end) {
                shift = -3600;
        }
        return shift;
}

But in fact, the shift is not applied everywhere new times are calculated. That could be a problem too.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 22, 2011

Updated by mfriedrich on 2011-10-22 18:28:47 +00:00

@mcp

can you send the full debug log of what's happening there to michael.friedrich(at)univie.ac.at?

icinga.cfg

debug_level=16
debug_verbosity=2
max_debug_file_size=1000000000

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 23, 2011

Updated by mcp on 2011-10-23 09:50:14 +00:00

root@lokalhorst:/root/ # ./dst
TimeDiff=-3600 is_dst=0 at Sun Oct 29 02:00:00 2006
TimeDiff=-3600 is_dst=1 at Sun Mar 25 03:00:00 2007
TimeDiff=-3600 is_dst=0 at Sun Oct 28 02:00:00 2007
TimeDiff=-3600 is_dst=1 at Sun Mar 30 03:00:00 2008
TimeDiff=-3600 is_dst=0 at Sun Oct 26 02:00:00 2008
TimeDiff=-3600 is_dst=1 at Sun Mar 29 03:00:00 2009
TimeDiff=-3600 is_dst=0 at Sun Oct 25 02:00:00 2009
TimeDiff=-3600 is_dst=1 at Sun Mar 28 03:00:00 2010
TimeDiff=-3600 is_dst=0 at Sun Oct 31 02:00:00 2010
TimeDiff=-3600 is_dst=1 at Sun Mar 27 03:00:00 2011
TimeDiff=-3600 is_dst=0 at Sun Oct 30 02:00:00 2011
TimeDiff=-3600 is_dst=1 at Sun Mar 25 03:00:00 2012
TimeDiff=-3600 is_dst=0 at Sun Oct 28 02:00:00 2012
TimeDiff=-3600 is_dst=1 at Sun Mar 31 03:00:00 2013
TimeDiff=-3600 is_dst=0 at Sun Oct 27 02:00:00 2013
TimeDiff=-3600 is_dst=1 at Sun Mar 30 03:00:00 2014
TimeDiff=-3600 is_dst=0 at Sun Oct 26 02:00:00 2014
TimeDiff=-3600 is_dst=1 at Sun Mar 29 03:00:00 2015
TimeDiff=-3600 is_dst=0 at Sun Oct 25 02:00:00 2015
TimeDiff=-3600 is_dst=1 at Sun Mar 27 03:00:00 2016

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 25, 2011

Updated by mcp on 2011-10-25 10:16:52 +00:00

OK, the debug log is almost ready. I have to replace customer names etc.; when I'm finished, I'll send it to you.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 25, 2011

Updated by mcp on 2011-10-25 11:46:52 +00:00

Just sent the debug output via email to you.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 27, 2012

Updated by elagon on 2012-08-27 15:30:52 +00:00

  • Icinga Version set to 1
  • OS Version set to unknown

@mcp Can you try the latest stable version and tell us if you notice the same issue?
Thanks

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 27, 2012

Updated by mcp on 2012-08-27 15:56:31 +00:00

I will.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 12, 2012

Updated by mcp on 2012-09-12 16:17:05 +00:00

Hi elagon,

nice, after several days I have not had a single check that wasn't scheduled.

No Icinga restarts at 7am and 8am anymore.

Thank you very much! Nice work :)

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 12, 2012

Updated by mfriedrich on 2012-09-12 17:19:05 +00:00

Interesting. I did not change anything in this code region. Let me guess: we are in the wrong time zone (DST)?

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 12, 2012

Updated by mfriedrich on 2012-09-12 17:32:35 +00:00

We could test a patch that was originally written for the Nagios tree:
https://github.com/dnsmichi/nagios-svn/commit/64f9abb6bbc9e7cee439469250ee1eb40da2500d

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 12, 2012

Updated by mcp on 2012-09-12 18:07:16 +00:00

to test what? breakage again? ;-)

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 13, 2012

Updated by mcp on 2012-09-13 08:09:56 +00:00

grmpf, for whatever fucking reason, this morning I had 554 services which were not scheduled :-(

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 14, 2012

Updated by mjbrooks on 2012-09-14 00:23:59 +00:00

@mcp please keep your language family friendly. Updates go out to other places such as Twitter.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Sep 24, 2012

Updated by mfriedrich on 2012-09-24 10:42:10 +00:00

As I said, we can test that patch. However, I won't take it into 1.8 without long-term tests.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 8, 2012

Updated by mcp on 2012-10-08 10:06:44 +00:00

OK, applied and I will test it.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Oct 9, 2012

Updated by mcp on 2012-10-09 15:13:32 +00:00

the patch does not fix the problem :(

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Jul 15, 2013

Updated by shankarpatel on 2013-07-15 06:02:07 +00:00

I am facing the same issue, so I registered issue 4368 for it and am now waiting for a reply here, as suggested by dnsmichi.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Jun 20, 2014

Updated by mfriedrich on 2014-06-20 09:12:59 +00:00

You might try Icinga 2. The event scheduler handling was (like the rest) rewritten from scratch, and Icinga 2 won't forget about tasks being handled.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Jul 19, 2014

Updated by mfriedrich on 2014-07-19 12:51:08 +00:00

I guess that's a long-term won't fix unless someone comes up with a patch that works here. If there is none in the next months, I'll close the issue.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Aug 20, 2014

Updated by mfriedrich on 2014-08-20 20:45:03 +00:00

  • Duplicated set to 6971
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Mar 12, 2015

Updated by mfriedrich on 2015-03-12 19:14:02 +00:00

  • Status changed from Feedback to Rejected

Closing as wontfix. If you come up with a patch, we can re-open the issue.
