This repository has been archived by the owner. It is now read-only.

[dev.icinga.com #2536] scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window #944

Closed
icinga-migration opened this issue Apr 22, 2012 · 18 comments

@icinga-migration icinga-migration commented Apr 22, 2012

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2536

Created by Wolfgang on 2012-04-22 13:47:24 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2012-04-30 16:07:33 +00:00)
Target Version: 1.7
Last Update: 2012-04-30 16:07:33 +00:00 (in Redmine)


When a flexible downtime is scheduled and does not end within the defined duration, the counter scheduled_downtime_depth is falsely incremented again and again, at an interval equal to the duration.

Example:
A flexible downtime for a host was scheduled from 15:30 to 15:45 with a duration of three minutes. The host was already down, so the downtime started right at the scheduled start time.

The following shell script was run every minute via crontab to get the current values:

#!/bin/bash
date >> /usr/local/icinga/var/depth.log
grep depth /usr/local/icinga/var/status.dat | sort -u >> /usr/local/icinga/var/depth.log

The result is:

Sun Apr 22 15:30:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Sun Apr 22 15:31:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Sun Apr 22 15:32:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Sun Apr 22 15:33:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Sun Apr 22 15:34:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=2
Sun Apr 22 15:35:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=2
Sun Apr 22 15:36:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=2
Sun Apr 22 15:37:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=3
Sun Apr 22 15:38:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=3
Sun Apr 22 15:39:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=3
Sun Apr 22 15:40:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=4
Sun Apr 22 15:41:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=4
Sun Apr 22 15:42:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=4
Sun Apr 22 15:43:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=5
Sun Apr 22 15:44:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=5
Sun Apr 22 15:45:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=5
Sun Apr 22 15:46:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
    scheduled_downtime_depth=4

The counter is incremented every three minutes (the duration of the flexible downtime) although no other downtime is planned for the host. At the end of the downtime window the counter is decremented only once and then keeps that value instead of returning to 0.

Attachments

Changesets

2012-04-22 18:08:41 +00:00 by mfriedrich 4bbd7e8

add more debug logging to downtime start/end to help debug #2536

refs #2536

2012-04-22 20:02:28 +00:00 by mfriedrich f03dbcd

core: add trigger_time to downtimes to allow calculating of flexible downtimes endtime #2537

in order to fix #2536 we must introduce that as a
feature. fetching the actual time when a downtime
is started (triggered) is mandatory for calculating
the flexible downtimes, especially when the duration
is less than end-start time.
the bug found in #2536 actually does not trigger the
endtime after trigger time + duration (because it is
unknown), but waits for the end time provided by the
external command.
this leads into rescheduling the flexible downtime
looping duration by duration til endtime, incrementing
the scheduled_downtime_depth counter as long as
possible.
the docs clearly state that a flexible downtime only
lasts one duration, and then exits. the overlapping
is NOT what we want.

furthermore, introducing this as a basis for fixing
issue #2536 we can actually re-use that in future
changes to show the user the actual time a downtime
was entered - and not the start time which could
be somewhere in the past.

entry_time is only the time when the command was
sent and does not help here.

refs #2537
refs #2536

2012-04-22 20:52:41 +00:00 by mfriedrich 8d315d0

core: fix scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window #2536

since we now got support for the trigger_time
of a scheduled downtime, we can now decide if
a flexible downtime is to be ended after trigger
time + duration, or not.

adding further tap tests is currently not possible
as we would have to workaround the not-to-be-found
host in skiplist in handle_scheduled_downtime which
is a pita and requires more rework on clearly
abstracted functionality, beyond the checks on hosts
and services existing, to create the relation to
the downtime being handled.

the issue #2536 holds further analysis and debug logs
on the tests, which has been enhanced in #2537 as well.

refs #2536
refs #2537

2012-04-23 11:37:19 +00:00 by mfriedrich 2bfc1d4

fix int vs unsigned long mismatches from previous commits #2536 #2537

refs #2536
refs #2537

2012-04-28 08:49:12 +00:00 by mfriedrich 95e0400

add more debug logging to downtime start/end to help debug #2536

refs #2536

2012-04-28 08:52:20 +00:00 by mfriedrich 51997db

core: add trigger_time to downtimes to allow calculating of flexible downtimes endtime #2537

in order to fix #2536 we must introduce that as a
feature. fetching the actual time when a downtime
is started (triggered) is mandatory for calculating
the flexible downtimes, especially when the duration
is less than end-start time.
the bug found in #2536 actually does not trigger the
endtime after trigger time + duration (because it is
unknown), but waits for the end time provided by the
external command.
this leads into rescheduling the flexible downtime
looping duration by duration til endtime, incrementing
the scheduled_downtime_depth counter as long as
possible.
the docs clearly state that a flexible downtime only
lasts one duration, and then exits. the overlapping
is NOT what we want.

furthermore, introducing this as a basis for fixing
issue #2536 we can actually re-use that in future
changes to show the user the actual time a downtime
was entered - and not the start time which could
be somewhere in the past.

entry_time is only the time when the command was
sent and does not help here.

refs #2537
refs #2536

Conflicts:

	Changelog

2012-04-28 08:53:29 +00:00 by mfriedrich dc1569b

core: fix scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window #2536

since we now got support for the trigger_time
of a scheduled downtime, we can now decide if
a flexible downtime is to be ended after trigger
time + duration, or not.

adding further tap tests is currently not possible
as we would have to workaround the not-to-be-found
host in skiplist in handle_scheduled_downtime which
is a pita and requires more rework on clearly
abstracted functionality, beyond the checks on hosts
and services existing, to create the relation to
the downtime being handled.

the issue #2536 holds further analysis and debug logs
on the tests, which has been enhanced in #2537 as well.

refs #2536
refs #2537

Conflicts:

	Changelog

2012-04-28 08:56:48 +00:00 by mfriedrich 8422f24

fix int vs unsigned long mismatches from previous commits #2536 #2537

refs #2536
refs #2537

Relations:

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by Wolfgang on 2012-04-22 14:51:31 +00:00

  • File added icinga.debug.gz

Attached a debug file using debug_level=520 (events and downtime), debug_verbosity=2.
It shows six additional "Scheduled Downtime Event" entries starting at 16:24:36 resulting in scheduled_downtime_depth=7 instead of remaining at ...depth=1.
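
The debug_level value is a bitmask, so 520 is simply 512 (scheduled downtime) plus 8 (scheduled events). The following is a minimal sketch of that decomposition. The flag values are taken from the debug_level value list quoted further down in this thread; the enum and function names are made up for illustration and are not Icinga identifiers.

```c
#include <assert.h>
#include <stdio.h>

/* Flag values as listed in the debug_level comment block;
 * the names are hypothetical and exist only in this sketch. */
enum {
    DBG_FUNCTIONS        = 1,
    DBG_SCHEDULED_EVENTS = 8,
    DBG_DOWNTIME         = 512
};

/* Returns 1 if every bit of `flag` is set in `debug_level`, else 0. */
int debug_flag_enabled(int debug_level, int flag)
{
    assert(flag > 0);
    return (debug_level & flag) == flag;
}

/* Prints which of the three flags above a given debug_level turns on. */
void describe_debug_level(int debug_level)
{
    printf("debug_level=%d: functions=%d events=%d downtime=%d\n",
           debug_level,
           debug_flag_enabled(debug_level, DBG_FUNCTIONS),
           debug_flag_enabled(debug_level, DBG_SCHEDULED_EVENTS),
           debug_flag_enabled(debug_level, DBG_DOWNTIME));
}
```

Calling describe_debug_level(520) reports events and downtime as enabled and functions as disabled.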

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 17:24:36 +00:00

hmmm, my guess is that the first place where it detects an ending downtime may be wrong.

common/downtime.c

        /* have we come to the end of the scheduled downtime? */
        if (temp_downtime->is_in_effect == TRUE && current_time >= temp_downtime->end_time) {

if that section is hit, the downtime depth is decremented as needed.

                /* decrement the downtime depth variable */
                if (temp_downtime->type == HOST_DOWNTIME)
                        hst->scheduled_downtime_depth--;

if the "if" condition does not match, you fall into the "else" branch, where the downtime depth gets incremented.

        /* else we are just starting the scheduled downtime */
        else {

there go the startup fixes by ricardo, which may be cleared and reported correctly after restart then.

depth is incremented, in_effect is set.

                /* increment the downtime depth variable */
                if (temp_downtime->type == HOST_DOWNTIME)
                        hst->scheduled_downtime_depth++;
                else
                        svc->scheduled_downtime_depth++;

                /* set the in effect flag */
                temp_downtime->is_in_effect = TRUE;

given the times

[1335104355.549157] [512.0] [pid=14802] Scheduled Downtime Details:
[1335104355.549164] [512.0] [pid=14802]  Type:        Host Downtime
[1335104355.549168] [512.0] [pid=14802]  Host:        localhost
[1335104355.549172] [512.0] [pid=14802]  Fixed/Flex:  Flexible
[1335104355.549176] [512.0] [pid=14802]  Start:       04-22-2012 16:20:00
[1335104355.549180] [512.0] [pid=14802]  End:         04-22-2012 16:40:00
[1335104355.549184] [512.0] [pid=14802]  Duration:    0h 3m 0s
[1335104355.549188] [512.0] [pid=14802]  Downtime ID: 1
[1335104355.549191] [512.0] [pid=14802]  Trigger ID:  0

[1335104496.188777] [512.0] [pid=14802] Flexible downtime (id=1) for host 'localhost' starting now...
[1335104496.188796] [512.0] [pid=14802] Host 'localhost' has entered a period of scheduled downtime (id=1).

1335104496 = Sun, 22 Apr 2012 16:21:36 GMT+2

the first match is ok, but the rest fail. since debug logging is missing there, i'll add some while checking why the "if" condition fails. this may require changing the "if" condition a bit, or changing the "else" branch as well.

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 17:35:16 +00:00

# Values: 
#          -1 = Everything
#          0 = Nothing
#          1 = Functions
#          2 = Configuration
#          4 = Process information
#          8 = Scheduled events
#          16 = Host/service checks
#          32 = Notifications
#          64 = Event broker
#          128 = External commands
#          256 = Commands
#          512 = Scheduled downtime
#          1024 = Comments
#          2048 = Macros
debug_level=513

[1335115843.061482] [512.0] [pid=9908] Scheduled Downtime Details:
[1335115843.061485] [512.0] [pid=9908]  Type:        Host Downtime
[1335115843.061487] [512.0] [pid=9908]  Host:        localhost
[1335115843.061490] [512.0] [pid=9908]  Fixed/Flex:  Flexible
[1335115843.061492] [512.0] [pid=9908]  Start:       04-22-2012 19:28:59
[1335115843.061494] [512.0] [pid=9908]  End:         04-22-2012 19:48:59
[1335115843.061497] [512.0] [pid=9908]  Duration:    3h 0m 0s
[1335115843.061499] [512.0] [pid=9908]  Downtime ID: 1
[1335115843.061502] [512.0] [pid=9908]  Trigger ID:  0

now let's go to the webinterface again, disable active checks for the host, and submit a passive one.

[1335115993.081317] [512.0] [pid=9908] Flexible downtime (id=1) for host 'localhost' starting now...
[1335115993.081323] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=1) with depth=0, endtime=1335116939, duration=10800.
[1335115993.081326] [512.0] [pid=9908] Host 'localhost' has entered a period of scheduled downtime (id=1).

if i am right, the expiry event for the scheduled downtime is now scheduled for now+3 minutes. when that event (not a check!) fires, it will tell us a bit more.

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 17:52:07 +00:00

grml mistake, 3h instead of 3 min.

[1335116443.054191] [512.0] [pid=9908] Scheduled Downtime Details:
[1335116443.054194] [512.0] [pid=9908]  Type:        Host Downtime
[1335116443.054197] [512.0] [pid=9908]  Host:        localhost
[1335116443.054199] [512.0] [pid=9908]  Fixed/Flex:  Flexible
[1335116443.054202] [512.0] [pid=9908]  Start:       04-22-2012 19:40:17
[1335116443.054204] [512.0] [pid=9908]  End:         04-22-2012 20:00:00
[1335116443.054206] [512.0] [pid=9908]  Duration:    0h 3m 0s
[1335116443.054209] [512.0] [pid=9908]  Downtime ID: 2
[1335116443.054211] [512.0] [pid=9908]  Trigger ID:  0
[1335116473.109389] [512.0] [pid=9908] Flexible downtime (id=2) for host 'localhost' starting now...
[1335116473.109396] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=0, endtime=1335117600, duration=180.
[1335116473.109410] [512.0] [pid=9908] Host 'localhost' has entered a period of scheduled downtime (id=2).

1335116473 = Sun, 22 Apr 2012 19:41:13 GMT+2
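
For reference, the conversion from the epoch value in the debug log to wall-clock time can be reproduced with the standard C time functions. This is only a sketch: the function names are made up, and the UTC offset is passed in explicitly (2 hours for CEST here) instead of relying on the TZ setting of the machine.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Breaks an epoch timestamp into calendar fields for a fixed UTC offset. */
struct tm epoch_at_offset(long epoch, int utc_offset_hours)
{
    time_t shifted = (time_t)(epoch + utc_offset_hours * 3600L);
    struct tm *parts = gmtime(&shifted);

    assert(parts != NULL);
    return *parts;
}

/* Prints the timestamp in the same style as the line above. */
void print_epoch(long epoch, int utc_offset_hours)
{
    char buf[64];
    struct tm parts = epoch_at_offset(epoch, utc_offset_hours);

    if (strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S", &parts) > 0)
        printf("%ld = %s GMT%+d\n", epoch, buf, utc_offset_hours);
}
```

In the default C locale, print_epoch(1335116473L, 2) prints the same line as quoted above.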

winner.

[1335116473.109389] [512.0] [pid=9908] Flexible downtime (id=2) for host 'localhost' starting now...
[1335116473.109396] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=0, endtime=1335117600, duration=180.
[1335116473.109410] [512.0] [pid=9908] Host 'localhost' has entered a period of scheduled downtime (id=2).
[1335116653.009806] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=1, endtime=1335117600, duration=180.
[1335116833.014835] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=2, endtime=1335117600, duration=180.
[1335117013.009045] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=3, endtime=1335117600, duration=180.
[1335117193.158962] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=4, endtime=1335117600, duration=180.
[1335117373.013955] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=5, endtime=1335117600, duration=180.
[1335117553.003186] [512.0] [pid=9908] Host 'localhost' starting flexible scheduled downtime (id=2) with depth=6, endtime=1335117600, duration=180.
[1335117733.009258] [512.0] [pid=9908] Host 'localhost' ending flexible scheduled downtime (id=2) with depth=7, endtime=1335117600, duration=180.

this is wrong: it means the same downtime gets rescheduled over and over.

current_time = 1335116653
end_time = 1335117600

the downtime is in_effect, but current_time is still less than end_time, so the "else" branch matches and the downtime is started again.

the actual end of a flexible downtime is sort of precalculated: the expiry event is scheduled at the current time plus duration.
in this case that point lies within the end-start window, so end_time cannot be the condition for deciding when a flexible downtime really ends.

so what we now know: for a flexible downtime with (duration < end-start), the end cannot be detected with (in_effect && current >= end).
so we basically need an else-if, where we introduce a new condition for the flexible downtime. or we adjust the first one to only be true for fixed + (current_time >= end_time) and add flexible + (current_time >= entry_time + duration).

need to enhance the debug log for entry_time.

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 17:59:57 +00:00

another brain summary.

basically this means that there are two cases to test for flexible downtimes.

  1. duration >= endtime-starttime
  2. duration < endtime-starttime

if you hit 1), the expiration event is scheduled just like for a normal fixed downtime and fires at or slightly after end_time, so the "else" branch is not taken.

if you hit 2) because of a short duration, the expiration event is scheduled at current_time+duration, which is still before the set end_time. that leads into the "else" branch, which increments the counter again and again until the actual end_time is reached somewhere in the future.
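
The two cases can be played through with a small toy model in C. This is a sketch and not the real handle_scheduled_downtime code: the struct, the function names and the assumption that the handler is simply re-run every `duration` seconds after the trigger are simplifications made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Toy state for one flexible downtime; not the real Icinga struct. */
typedef struct {
    long end_time;     /* end of the scheduling window */
    int  is_in_effect; /* set once the downtime has started */
    int  depth;        /* stands in for scheduled_downtime_depth */
} toy_downtime;

/* One handler call using the old end check. Returns 1 when the downtime ends. */
int toy_handle_old(toy_downtime *dt, long now)
{
    if (dt->is_in_effect && now >= dt->end_time) {
        dt->depth--;
        dt->is_in_effect = 0;
        return 1;
    }
    /* old "else" branch: treated as a fresh start every time */
    dt->depth++;
    dt->is_in_effect = 1;
    return 0;
}

/* Re-runs the handler every `duration` seconds from `trigger` until the
 * downtime ends. Returns the final depth and stores the peak depth. */
int toy_run_old(long trigger, long end_time, long duration, int *peak)
{
    toy_downtime dt = { end_time, 0, 0 };
    long now = trigger;

    assert(duration > 0 && peak != NULL);
    *peak = 0;
    for (;;) {
        int ended = toy_handle_old(&dt, now);
        if (dt.depth > *peak)
            *peak = dt.depth;
        if (ended)
            break;
        printf("t=%ld depth=%d\n", now, dt.depth);
        now += duration;
    }
    return dt.depth;
}
```

With a 15 minute window, a 3 minute duration and a trigger 30 seconds after the start, toy_run_old(30, 900, 180, &peak) peaks at depth 5 and ends at depth 4, which is the pattern from the original report. With duration equal to the window, toy_run_old(30, 300, 300, &peak) peaks at 1 and ends at 0.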

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 18:25:24 +00:00

enhanced debuglog

[1335117981.010313] [512.0] [pid=32752] Scheduled Downtime Details:
[1335117981.010316] [512.0] [pid=32752]  Type:        Host Downtime
[1335117981.010319] [512.0] [pid=32752]  Host:        localhost
[1335117981.010321] [512.0] [pid=32752]  Fixed/Flex:  Flexible
[1335117981.010324] [512.0] [pid=32752]  Start:       04-22-2012 20:05:53
[1335117981.010327] [512.0] [pid=32752]  End:         04-22-2012 20:25:53
[1335117981.010329] [512.0] [pid=32752]  Duration:    0h 3m 0s
[1335117981.010332] [512.0] [pid=32752]  Downtime ID: 3
[1335117981.010334] [512.0] [pid=32752]  Trigger ID:  0
[1335118006.243759] [512.0] [pid=32752] Flexible downtime (id=3) for host 'localhost' starting now...
[1335118006.243765] [512.0] [pid=32752] Host 'localhost' starting flexible scheduled downtime (id=3) with depth=0, starttime=1335117953, entrytime=1335117977, endtime=1335119153, duration=180.
[1335118006.243769] [512.0] [pid=32752] Host 'localhost' has entered a period of scheduled downtime (id=3).

[1335118366.030741] [512.0] [pid=32752] Host 'localhost' starting flexible scheduled downtime (id=3) with depth=2, starttime=1335117953, entrytime=1335117977, endtime=1335119153, duration=180.
[1335118546.019234] [512.0] [pid=32752] Host 'localhost' starting flexible scheduled downtime (id=3) with depth=3, starttime=1335117953, entrytime=1335117977, endtime=1335119153, duration=180.
[1335118726.009602] [512.0] [pid=32752] Host 'localhost' starting flexible scheduled downtime (id=3) with depth=4, starttime=1335117953, entrytime=1335117977, endtime=1335119153, duration=180.
[1335118906.025225] [512.0] [pid=32752] Host 'localhost' starting flexible scheduled downtime (id=3) with depth=5, starttime=1335117953, entrytime=1335117977, endtime=1335119153, duration=180.

and this leads to another problem. entry_time is NOT what you would expect. it is actually the time at which the command for scheduling the downtime was sent. it is NOT the time when the downtime is actually triggered.

so the core does not know when a flexible downtime was triggered. it just assumes that within the start and end time window, with some given duration, it can keep going, and the latest point at which the downtime gets cancelled is end_time.

so with that concept, one could not fix that bug.

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:32:26 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Target Version set to 1.7

while working on adding the trigger_time in #2537 i've been reading the docs on flexible downtime. it clearly says "lasts duration" and not "lasts forever".

"Flexible" downtime is intended for times when you know that a host or service is going to be down for X minutes (or hours), but you don't know exactly when that'll start. When you schedule flexible downtime, Icinga will start the scheduled downtime sometime between the start and end times you specified. The downtime will last for as long as the duration you specified when you scheduled the downtime. This assumes that the host or service for which you scheduled flexible downtime either goes down (or becomes unreachable) or goes into a non-OK state sometime between the start and end times you specified. The time at which a host or service transitions to a problem state determines the time at which Icinga actually starts the downtime. The downtime will then last for the duration you specified, even if the host or service recovers before the downtime expires. This is done for a very good reason. As we all know, you might think you've got a problem fixed, but then have to restart a server ten times before it actually works right. Smart, eh?

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:41:20 +00:00

to clarify what we do for fixing #2536, as stated in #2537

        /* have we come to the end of the scheduled downtime? */
        if (temp_downtime->is_in_effect == TRUE && ( /* downtime needs to be in effect and ... */
                (temp_downtime->fixed == TRUE && current_time >= temp_downtime->end_time) || /* fixed downtime, endtime means end of downtime */
                (temp_downtime->fixed == FALSE && current_time >= (temp_downtime->trigger_time+temp_downtime->duration)) /* flexible downtime, endtime of downtime is trigger_time+duration */
                )){

once the flexible downtime has been triggered, we check whether the current time is greater than or equal to trigger_time (the time when the flex downtime started) plus its duration. that way we can be sure it lasts exactly 1x duration, and can safely expire the downtime.
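
The condition quoted above can be written as a small pure function, which makes it easy to check against timestamps from the debug log. This is a sketch: the parameter names follow the struct fields used in the excerpt, but the function itself is hypothetical and not part of the Icinga source.

```c
#include <assert.h>

/* Returns 1 if a downtime that is in effect should end at current_time.
 * fixed != 0:  ends when end_time is reached.
 * fixed == 0:  (flexible) ends one duration after trigger_time. */
int downtime_has_ended(int is_in_effect, int fixed, long current_time,
                       long end_time, long trigger_time, long duration)
{
    assert(duration >= 0);
    if (!is_in_effect)
        return 0; /* not started yet, so there is nothing to end */
    if (fixed)
        return current_time >= end_time;
    return current_time >= trigger_time + duration;
}
```

For a flexible downtime with trigger_time=1335122710, duration=180 and endtime=1335123848, the function reports the end at 1335122890, well before end_time, while the old check on end_time alone would not.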

so in order to fix this issue here, we must implement the change in #2537

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:43:10 +00:00

basic test with a 5minute fixed downtime.

[1335122890.014894] [512.0] [pid=2290] Host 'localhost' ending flexible scheduled downtime (id=4) with depth=1, starttime=1335122648, entrytime=1335122669, triggertime=1335122710, endtime=1335123848, duration=180.
[1335122890.014901] [512.0] [pid=2290] Host 'localhost' has exited from a period of scheduled downtime (id=4).
[1335123320.055726] [512.0] [pid=2290] Scheduled Downtime Details:
[1335123320.055729] [512.0] [pid=2290]  Type:        Host Downtime
[1335123320.055731] [512.0] [pid=2290]  Host:        localhost
[1335123320.055734] [512.0] [pid=2290]  Fixed/Flex:  Fixed
[1335123320.055736] [512.0] [pid=2290]  Start:       04-22-2012 21:35:02
[1335123320.055739] [512.0] [pid=2290]  End:         04-22-2012 21:40:02
[1335123320.055741] [512.0] [pid=2290]  Duration:    0h 5m 0s
[1335123320.055744] [512.0] [pid=2290]  Downtime ID: 5
[1335123320.055746] [512.0] [pid=2290]  Trigger ID:  0
[1335123320.131192] [512.0] [pid=2290] Host 'localhost' starting fixed scheduled downtime (id=5) with depth=0, starttime=1335123302, entrytime=1335123316, endtime=1335123602, duration=300.
[1335123320.131197] [512.0] [pid=2290] Host 'localhost' has entered a period of scheduled downtime (id=5) at triggertime=1335123320.
[1335123602.007077] [512.0] [pid=2290] Host 'localhost' ending fixed scheduled downtime (id=5) with depth=1, starttime=1335123302, entrytime=1335123316, triggertime=1335123320, endtime=1335123602, duration=300.
[1335123602.007085] [512.0] [pid=2290] Host 'localhost' has exited from a period of scheduled downtime (id=5).

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:44:15 +00:00

test with a 3min flexible downtime in a 20min window.

[1335122669.965720] [512.0] [pid=2290] Scheduled Downtime Details:
[1335122669.965723] [512.0] [pid=2290]  Type:        Host Downtime
[1335122669.965725] [512.0] [pid=2290]  Host:        localhost
[1335122669.965728] [512.0] [pid=2290]  Fixed/Flex:  Flexible
[1335122669.965731] [512.0] [pid=2290]  Start:       04-22-2012 21:24:08
[1335122669.965733] [512.0] [pid=2290]  End:         04-22-2012 21:44:08
[1335122669.965736] [512.0] [pid=2290]  Duration:    0h 3m 0s
[1335122669.965738] [512.0] [pid=2290]  Downtime ID: 4
[1335122669.965741] [512.0] [pid=2290]  Trigger ID:  0
[1335122710.055156] [512.0] [pid=2290] Flexible downtime (id=4) for host 'localhost' starting now...
[1335122710.055162] [512.0] [pid=2290] Host 'localhost' starting flexible scheduled downtime (id=4) with depth=0, starttime=1335122648, entrytime=1335122669, endtime=1335123848, duration=180.
[1335122710.055166] [512.0] [pid=2290] Host 'localhost' has entered a period of scheduled downtime (id=4) at triggertime=1335122710.
[1335122890.014894] [512.0] [pid=2290] Host 'localhost' ending flexible scheduled downtime (id=4) with depth=1, starttime=1335122648, entrytime=1335122669, triggertime=1335122710, endtime=1335123848, duration=180.
[1335122890.014901] [512.0] [pid=2290] Host 'localhost' has exited from a period of scheduled downtime (id=4).

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:53:18 +00:00

test with a 5min flexible downtime in a 5min window (which stands for the default, the gui says, but for 2h)

[1335123965.004528] [512.0] [pid=2290] Scheduled Downtime Details:
[1335123965.004531] [512.0] [pid=2290]  Type:        Host Downtime
[1335123965.004533] [512.0] [pid=2290]  Host:        localhost
[1335123965.004536] [512.0] [pid=2290]  Fixed/Flex:  Flexible
[1335123965.004539] [512.0] [pid=2290]  Start:       04-22-2012 21:45:42
[1335123965.004542] [512.0] [pid=2290]  End:         04-22-2012 21:50:42
[1335123965.004544] [512.0] [pid=2290]  Duration:    0h 5m 0s
[1335123965.004547] [512.0] [pid=2290]  Downtime ID: 6
[1335123965.004549] [512.0] [pid=2290]  Trigger ID:  0
[1335123990.076185] [512.0] [pid=2290] Flexible downtime (id=6) for host 'localhost' starting now...
[1335123990.076195] [512.0] [pid=2290] Host 'localhost' starting flexible scheduled downtime (id=6) with depth=0, starttime=1335123942, entrytime=1335123963, endtime=1335124242, duration=300.
[1335123990.076199] [512.0] [pid=2290] Host 'localhost' has entered a period of scheduled downtime (id=6) at triggertime=1335123990.
[1335124290.196179] [512.0] [pid=2290] Host 'localhost' ending flexible scheduled downtime (id=6) with depth=1, starttime=1335123942, entrytime=1335123963, triggertime=1335123990, endtime=1335124242, duration=300.
[1335124290.196199] [512.0] [pid=2290] Host 'localhost' has exited from a period of scheduled downtime (id=6).

@icinga-migration icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 20:09:32 +00:00

  • Subject changed from scheduled_downtime_depth falsely incremented if in flexible downtime to scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window

@icinga-migration icinga-migration commented Apr 23, 2012

Updated by mfriedrich on 2012-04-23 00:05:44 +00:00

  • Status changed from Assigned to Feedback
  • Done % changed from 0 to 90

requires more tests.

@icinga-migration icinga-migration commented Apr 26, 2012

Updated by melle on 2012-04-26 13:08:00 +00:00

Seems to work for me, though I'm not sure if the results are what the "downtime experts" expect them to be.
I ran the following tests using an already-down host.

Results in detail:

  • fixed downtime from START_TIME to END_TIME --> host entered downtime instantly (~START_TIME) and exited from downtime at END_TIME
  • flexible downtime of 3min in a 7min window between START_TIME and END_TIME --> host entered downtime instantly (~START_TIME) and exited from downtime at START_TIME + 3min
  • flexible downtime of 5min in a 3min window between START_TIME and END_TIME --> host entered downtime instantly (~START_TIME) and exited from downtime at START_TIME + 5min (which was after END_TIME)

@icinga-migration icinga-migration commented Apr 27, 2012

Updated by Frankstar on 2012-04-27 09:44:48 +00:00

Downtime test

04-27-2012 11:31:05 Icinga Admin    2536    04-27-2012 11:35:00 04-27-2012 11:55:00 Flexible    01-01-1970 01:00:00 0d 0h 5m 0s False   25  N/A

depth.log

Fri Apr 27 11:35:01 CEST 2012
        scheduled_downtime_depth=0
Fri Apr 27 11:36:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:37:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:38:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:39:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:40:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:41:01 CEST 2012
        scheduled_downtime_depth=0
Fri Apr 27 11:42:01 CEST 2012
        scheduled_downtime_depth=0
Fri Apr 27 11:43:01 CEST 2012
        scheduled_downtime_depth=0

also the gui displayed everything correctly.

next step: fixed downtime.

@icinga-migration icinga-migration commented Apr 27, 2012

Updated by Frankstar on 2012-04-27 10:06:30 +00:00

Downtime test, fixed downtime
Info: only tested flexible and fixed downtime with classic-gui!

04-27-2012 11:45:57 Icinga Admin    fix down,2536   04-27-2012 11:50:00 04-27-2012 12:05:00 Fixed   01-01-1970 01:00:00 0d 0h 15m 0s    False   33

depth.log

Fri Apr 27 11:50:01 CEST 2012
        scheduled_downtime_depth=0
Fri Apr 27 11:51:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:52:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:53:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:54:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:55:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:56:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:57:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:58:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 11:59:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 12:00:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 12:01:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 12:02:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 12:03:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 12:04:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 12:05:01 CEST 2012
        scheduled_downtime_depth=0
        scheduled_downtime_depth=1
Fri Apr 27 12:06:01 CEST 2012
        scheduled_downtime_depth=0

seems to work fine.
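
For longer logs it can be easier to collapse each sample to its highest depth value. A minimal sketch, assuming the depth.log layout shown above (a `date` line followed by the `sort -u` output of the grep); the function name `summarize_depth` is made up for illustration. Since the original grep covers every object in status.dat, the value is the maximum over all hosts and services, not only the test host.

```shell
# Read depth.log from stdin and print "<HH:MM:SS> <max depth>" per sample.
# Assumes each sample starts with a `date` line ("Fri Apr 27 11:50:01 CEST 2012")
# followed by one or more "scheduled_downtime_depth=N" lines.
summarize_depth() {
    awk '
        /scheduled_downtime_depth=/ {
            # Keep the highest depth seen within the current sample.
            split($0, kv, "=")
            depth = kv[2] + 0
            if (depth > max) max = depth
            next
        }
        NF >= 4 {
            # A new date line: flush the previous sample, then start a new one.
            if (seen) print stamp, max
            stamp = $4; max = 0; seen = 1
        }
        END { if (seen) print stamp, max }
    '
}
```

Something like `summarize_depth < /usr/local/icinga/var/depth.log | awk '$2 > 1'` would then list only the samples where the depth went above 1, which is the symptom from the original report.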


@icinga-migration icinga-migration commented Apr 27, 2012

Updated by Frankstar on 2012-04-27 11:11:19 +00:00

Flexible downtime test: 1h window, 7min duration, with an up/down/up/down simulation.
The host was already down before scheduling.

04-27-2012 12:07:18 | Icinga Admin | flexible,1h,7min dur., #2536 | 04-27-2012 12:10:00 | 04-27-2012 13:10:00 | Flexible | 01-01-1970 01:00:00 | 0d 0h 7m 0s | False | 41

send up: 12:18
send down: 12:38
send up: 12:47
send down: 13:00

depth.log

Fri Apr 27 12:09:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:10:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:11:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Fri Apr 27 12:12:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Fri Apr 27 12:13:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Fri Apr 27 12:14:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Fri Apr 27 12:15:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Fri Apr 27 12:16:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Fri Apr 27 12:17:01 CEST 2012
    scheduled_downtime_depth=0
    scheduled_downtime_depth=1
Fri Apr 27 12:18:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:19:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:20:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:21:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:22:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:23:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:24:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:25:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:26:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:27:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:28:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:29:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:30:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:31:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:32:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:33:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:34:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:35:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:36:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:37:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:38:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:39:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:40:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:41:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:42:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:43:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:44:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:45:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:46:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:47:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:48:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:49:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:50:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:51:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:52:02 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:53:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:54:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:55:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:56:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:57:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:58:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 12:59:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:00:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:01:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:02:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:03:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:04:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:05:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:06:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:07:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:08:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:09:01 CEST 2012
    scheduled_downtime_depth=0
Fri Apr 27 13:10:01 CEST 2012
    scheduled_downtime_depth=0

Worked fine. No more testing from me.
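
For anyone who wants to repeat this without the GUI: a flexible downtime like the one above can presumably also be scheduled through the external command interface. This is only a sketch and not what was used in this test (the downtime here was scheduled via the classic GUI). `SCHEDULE_HOST_DOWNTIME` takes host, start, end, fixed flag, trigger ID, duration in seconds, author and comment; the function name and the argument order of the helper are assumptions for illustration.

```shell
# Build a SCHEDULE_HOST_DOWNTIME external command line for a flexible downtime.
# Arguments (all hypothetical helper parameters):
#   $1 current Unix time, $2 host name, $3 start time, $4 end time (Unix timestamps),
#   $5 duration in seconds, $6 author, $7 comment
# fixed=0 makes the downtime flexible; trigger_id=0 means it is not triggered
# by another downtime.
flexible_host_downtime_cmd() {
    printf '[%s] SCHEDULE_HOST_DOWNTIME;%s;%s;%s;0;0;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}
```

For the 1h window with 7min duration used here, the end time would be start + 3600 and the duration 420. The resulting line has to be appended to the Icinga command file; its location depends on the installation (the `command_file` setting in the main config).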


@icinga-migration icinga-migration commented Apr 30, 2012

Updated by mfriedrich on 2012-04-30 16:07:33 +00:00

  • Status changed from Feedback to Resolved
  • Done % changed from 90 to 100

Got some reports from Thomas, so it looks good to me for now. Thanks for testing!
