[dev.icinga.com #2537] add trigger_time to downtimes to allow calculating of flexible downtimes endtime #945

Closed
icinga-migration opened this Issue Apr 22, 2012 · 5 comments

icinga-migration commented Apr 22, 2012

This issue has been migrated from Redmine: https://dev.icinga.com/issues/2537

Created by mfriedrich on 2012-04-22 18:30:42 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2012-04-27 17:14:56 +00:00)
Target Version: 1.7
Last Update: 2012-04-27 17:14:56 +00:00 (in Redmine)


as we have learned in #2536, the core does not keep track of the downtime trigger time; only the start, end and entry times are kept.

the problem is that a flexible downtime with a duration less than the end-start window gets relooped/rescheduled for the host/service, even though it should have ended after one duration.

in order to allow fixing #2536, we need to add an entry to the downtime section. my proposal is "trigger_time", which gets populated only once, when the downtime is started. when the downtime ends, it must be reset to 0L.

as this changes the objects again, the new member has to be added at the end of the downtime struct, keeping the workaround that avoids breaking the abi.

furthermore, this requires changes to the event broker, and idoutils needs to recognize trigger_time as well.
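
a minimal sketch of the proposal, for illustration only. the struct below is a hypothetical, trimmed-down record and not the real downtime struct from the core; `sketch_downtime`, `sketch_start` and `sketch_end` are made-up names. it only shows the two points from the description: the new member is appended last, and trigger_time is set once on start and reset to 0 on end.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, trimmed-down downtime record (not the real core struct). */
typedef struct sketch_downtime {
    time_t entry_time;       /* when the external command was received */
    time_t start_time;       /* start of the scheduling window */
    time_t end_time;         /* end of the scheduling window */
    int fixed;               /* 1 = fixed, 0 = flexible */
    unsigned long duration;  /* seconds a flexible downtime lasts */
    int is_in_effect;        /* 1 while the downtime is active */
    /* new member goes last so the offsets of existing members stay unchanged */
    time_t trigger_time;     /* 0 until started, reset to 0 when ended */
} sketch_downtime;

/* Record the trigger time only once, when the downtime actually starts. */
static void sketch_start(sketch_downtime *dt, time_t now) {
    assert(dt != NULL);
    if (dt->trigger_time == 0)
        dt->trigger_time = now;
    dt->is_in_effect = 1;
}

/* Reset the trigger time when the downtime ends. */
static void sketch_end(sketch_downtime *dt) {
    assert(dt != NULL);
    dt->is_in_effect = 0;
    dt->trigger_time = 0;
}
```

calling `sketch_start` a second time while the downtime is running leaves trigger_time untouched, which is the "populated only once" behaviour asked for above.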

Changesets

2012-04-22 20:02:28 +00:00 by mfriedrich f03dbcd

core: add trigger_time to downtimes to allow calculating of flexible downtimes endtime #2537

in order to fix #2536 we must introduce that as a
feature. fetching the actual time when a downtime
is started (triggered) is mandatory for calculating
the flexible downtimes, especially when the duration
is less than end-start time.
the bug found in #2536 actually does not trigger the
endtime after trigger time + duration (because it is
unknown), but waits for the end time provided by the
external command.
this leads into rescheduling the flexible downtime
looping duration by duration til endtime, incrementing
the scheduled_downtime_depth counter as long as
possible.
the docs clearly state that a flexible downtime only
lasts one duration, and then exits. the overlapping
is NOT what we want.

furthermore, introducing this as a basis for fixing
issue #2536 we can actually re-use that in future
changes to show the user the actual time a downtime
was entered - and not the start time which could
be somewhere in the past.

entry_time is only the time when the command was
sent and does not help here.

refs #2537
refs #2536

2012-04-22 20:52:41 +00:00 by mfriedrich 8d315d0

core: fix scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window #2536

since we now got support for the trigger_time
of a scheduled downtime, we can now decide if
a flexible downtime is to be ended after trigger
time + duration, or not.

adding further tap tests is currently not possible
as we would have to workaround the not-to-be-found
host in skiplist in handle_scheduled_downtime which
is a pita and requires more rework on clearly
abstracted functionality, beyond the checks on hosts
and services existing, to create the relation to
the downtime being handled.

the issue #2536 holds further analysis and debug logs
on the tests, which has been enhanced in #2537 as well.

refs #2536
refs #2537

2012-04-22 21:23:58 +00:00 by mfriedrich e17125c

fix copy paste error in reading trigger_time from status.dat #2537

refs #2537

2012-04-23 00:06:38 +00:00 by mfriedrich b7a29ac

idoutils: add is_in_effect and trigger_time to scheduleddowntime and downtimehistory tables #2539 - MF

requires change on doing neb callback after
having fetched all necessary data when
starting/triggering a downtime.

all 3 rdbms get scheduleddowntime and
downtimehistory populated.

db sqls and upgrade scripts require tests!

refs #2537
refs #2539

2012-04-23 11:37:19 +00:00 by mfriedrich 2bfc1d4

fix int vs unsigned long mismatches from previous commits #2536 #2537

refs #2536
refs #2537

2012-04-28 08:52:20 +00:00 by mfriedrich 51997db

core: add trigger_time to downtimes to allow calculating of flexible downtimes endtime #2537

in order to fix #2536 we must introduce that as a
feature. fetching the actual time when a downtime
is started (triggered) is mandatory for calculating
the flexible downtimes, especially when the duration
is less than end-start time.
the bug found in #2536 actually does not trigger the
endtime after trigger time + duration (because it is
unknown), but waits for the end time provided by the
external command.
this leads into rescheduling the flexible downtime
looping duration by duration til endtime, incrementing
the scheduled_downtime_depth counter as long as
possible.
the docs clearly state that a flexible downtime only
lasts one duration, and then exits. the overlapping
is NOT what we want.

furthermore, introducing this as a basis for fixing
issue #2536 we can actually re-use that in future
changes to show the user the actual time a downtime
was entered - and not the start time which could
be somewhere in the past.

entry_time is only the time when the command was
sent and does not help here.

refs #2537
refs #2536

Conflicts:

	Changelog

2012-04-28 08:53:29 +00:00 by mfriedrich dc1569b

core: fix scheduled_downtime_depth falsely incremented if in flexible downtime with duration < end-starttime window #2536

since we now got support for the trigger_time
of a scheduled downtime, we can now decide if
a flexible downtime is to be ended after trigger
time + duration, or not.

adding further tap tests is currently not possible
as we would have to workaround the not-to-be-found
host in skiplist in handle_scheduled_downtime which
is a pita and requires more rework on clearly
abstracted functionality, beyond the checks on hosts
and services existing, to create the relation to
the downtime being handled.

the issue #2536 holds further analysis and debug logs
on the tests, which has been enhanced in #2537 as well.

refs #2536
refs #2537

Conflicts:

	Changelog

2012-04-28 08:56:48 +00:00 by mfriedrich 8422f24

fix int vs unsigned long mismatches from previous commits #2536 #2537

refs #2536
refs #2537

icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:29:36 +00:00

adding this requires further changes.

  • xssdefault.c - save and read statusdata
  • xrddefault.c - save and read retained data over restarts
  • nebstructs.h+broker.h+broker.c - send trigger_time to neb api+struct so neb modules can match on that
    • with that, we will add is_in_effect as well.

hostdowntime {
        host_name=localhost
        downtime_id=4
        entry_time=1335122669
        start_time=1335122648
        end_time=1335123848
        triggered_by=0
        fixed=0
        duration=180
        is_in_effect=1
        author=icinga
        comment=test flex fix
        trigger_time=1335122710
        }

[1335122669.965720] [512.0] [pid=2290] Scheduled Downtime Details:
[1335122669.965723] [512.0] [pid=2290]  Type:        Host Downtime
[1335122669.965725] [512.0] [pid=2290]  Host:        localhost
[1335122669.965728] [512.0] [pid=2290]  Fixed/Flex:  Flexible
[1335122669.965731] [512.0] [pid=2290]  Start:       04-22-2012 21:24:08
[1335122669.965733] [512.0] [pid=2290]  End:         04-22-2012 21:44:08
[1335122669.965736] [512.0] [pid=2290]  Duration:    0h 3m 0s
[1335122669.965738] [512.0] [pid=2290]  Downtime ID: 4
[1335122669.965741] [512.0] [pid=2290]  Trigger ID:  0
[1335122710.055156] [512.0] [pid=2290] Flexible downtime (id=4) for host 'localhost' starting now...
[1335122710.055162] [512.0] [pid=2290] Host 'localhost' starting flexible scheduled downtime (id=4) with depth=0, starttime=1335122648, entrytime=1335122669, endtime=1335123848, duration=180.
[1335122710.055166] [512.0] [pid=2290] Host 'localhost' has entered a period of scheduled downtime (id=4) at triggertime=1335122710.
[1335122890.014894] [512.0] [pid=2290] Host 'localhost' ending flexible scheduled downtime (id=4) with depth=1, starttime=1335122648, entrytime=1335122669, triggertime=1335122710, endtime=1335123848, duration=180.
[1335122890.014901] [512.0] [pid=2290] Host 'localhost' has exited from a period of scheduled downtime (id=4).
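
a rough sketch of how trigger_time could be read back from a hostdowntime block like the one above. this is not the parser from the core; `sketch_read_ulong` is a hypothetical helper, and it assumes the block is available as one string of newline-separated key=value lines. the value is read as unsigned long to match the other time fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find "key=value" in a block of lines and parse the
 * value as an unsigned long. Returns 1 on success, 0 if the key is missing
 * or its value does not start with a number. */
static int sketch_read_ulong(const char *block, const char *key,
                             unsigned long *out) {
    size_t keylen;
    const char *line;

    assert(block != NULL && key != NULL && out != NULL);
    keylen = strlen(key);
    line = block;

    while (*line != '\0') {
        const char *p = line;

        /* skip the indentation in front of each entry */
        while (*p == ' ' || *p == '\t')
            p++;

        /* match the whole key, directly followed by '=' */
        if (strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            const char *value = p + keylen + 1;
            char *end;
            unsigned long v = strtoul(value, &end, 10);

            if (end == value)
                return 0; /* no digits after '=' */
            *out = v;
            return 1;
        }

        /* advance to the next line, if there is one */
        line = strchr(line, '\n');
        if (line == NULL)
            break;
        line++;
    }
    return 0;
}
```

with the values from the dump above, trigger_time (1335122710) plus duration (180) gives 1335122890, which matches the timestamp of the "ending flexible scheduled downtime" log line.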

icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:32:41 +00:00

  • Category changed from Downtimes to Scheduled Downtime
  • Priority changed from Normal to Urgent

icinga-migration commented Apr 22, 2012

Updated by mfriedrich on 2012-04-22 19:37:29 +00:00

to clarify what we do for fixing #2536

        /* have we come to the end of the scheduled downtime? */
        if (temp_downtime->is_in_effect == TRUE && ( /* downtime needs to be in effect and ... */
                (temp_downtime->fixed == TRUE && current_time >= temp_downtime->end_time) || /* fixed downtime, endtime means end of downtime */
                (temp_downtime->fixed == FALSE && current_time >= (temp_downtime->trigger_time+temp_downtime->duration)) /* flexible downtime, endtime of downtime is trigger_time+duration */
                )){

once the flexible downtime has been triggered, we check whether the current time is greater than or equal to trigger_time (the time when the flex downtime started) plus its duration. that way we can be sure it lasts exactly one duration, and can safely expire the downtime.

this change requires further tests for all variants of course.
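
to make the condition easier to test in isolation, it can be sketched as a standalone function. this is an illustration under assumptions: `flex_sketch` is a hypothetical struct holding only the members the check needs, and `sketch_downtime_has_ended` is a made-up name, not a function from the core.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical struct with only the members used by the end check. */
typedef struct flex_sketch {
    int fixed;               /* 1 = fixed, 0 = flexible */
    int is_in_effect;        /* 1 while the downtime is active */
    time_t end_time;         /* end of the scheduling window */
    time_t trigger_time;     /* when the downtime actually started */
    unsigned long duration;  /* seconds a flexible downtime lasts */
} flex_sketch;

/* Returns 1 if the downtime has come to its end at `now`, 0 otherwise. */
static int sketch_downtime_has_ended(const flex_sketch *dt, time_t now) {
    assert(dt != NULL);

    /* a downtime that is not in effect cannot end */
    if (!dt->is_in_effect)
        return 0;

    /* fixed downtime: end_time is the end of the downtime */
    if (dt->fixed)
        return now >= dt->end_time;

    /* flexible downtime: it ends at trigger_time + duration */
    return now >= dt->trigger_time + (time_t)dt->duration;
}
```

with the values from the debug log (trigger_time=1335122710, duration=180), the flexible branch turns true at 1335122890, well before end_time=1335123848.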

icinga-migration commented Apr 23, 2012

Updated by mfriedrich on 2012-04-23 00:06:11 +00:00

  • Status changed from Assigned to Feedback
  • Done % changed from 0 to 90

tests required.

icinga-migration commented Apr 27, 2012

Updated by mfriedrich on 2012-04-27 17:14:56 +00:00

  • Status changed from Feedback to Resolved

works for me, as this issue only adds the new attribute. the if condition is handled in #2536.

icinga-migration added this to the 1.7 milestone Jan 17, 2017
