[dev.icinga.com #9897] First SOFT state is recognized as second SOFT state #3260

Closed · icinga-migration opened this issue Aug 13, 2015 · 9 comments
Labels: bug

icinga-migration commented Aug 13, 2015

This issue has been migrated from Redmine: https://dev.icinga.com/issues/9897

Created by mwaldmueller on 2015-08-13 11:05:44 +00:00

Assignee: mfriedrich
Status: Resolved (closed on 2015-08-21 08:28:44 +00:00)
Target Version: 2.3.9
Last Update: 2015-08-21 08:28:44 +00:00 (in Redmine)

Icinga Version: 2.3.8
Backport?: Already backported
Include in Changelog: 1

My setup:

  • checker zone with 3 nodes
  • master zone with 1 node as parent zone

Configuration for the service in the checker zone:

template Service "generic-service" {
  max_check_attempts = 3
  check_interval = 10m
  retry_interval = 2m

  enable_notifications = true
  enable_flapping = true
  enable_perfdata = true

  import "pnp-svc"
}

template Service "pnp-svc" {
  action_url = "/pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/popup?host=$HOSTNAME$&srv=$SERVICEDESC$"
}

object Service "win-disks" {
    import "generic-service"
    host_name = "s999vmscd05"
    check_command = "nsclient-disk-all-mit-wmi"
    groups = [ "win-disks" ]
    enable_notifications = 0
    vars.ARG1 = "95%"
    vars.ARG2 = "98%"
    vars.ARG3 = ""
    vars.ARG4 = "25GB"
    vars.ARG5 = "20GB"
}
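Given the template values above (`max_check_attempts = 3`, `retry_interval = 2m`), the expected escalation timing can be sketched as follows. This is simple illustrative arithmetic, not Icinga code:

```python
# Hypothetical timing arithmetic for the config above (not Icinga code):
# with max_check_attempts = 3 and retry_interval = 2m, a failing service
# is rechecked every 2 minutes while in a SOFT state, so the HARD state
# should be reached two retries after the first NOT-OK result.
max_check_attempts = 3
retry_interval_min = 2

# attempts 1 and 2 are SOFT; attempt 3 turns the state HARD
time_to_hard_min = (max_check_attempts - 1) * retry_interval_min
print(time_to_hard_min)  # 4 (minutes after the first failed check)
```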

Logentry in Icinga 2 debug log:

[2015-08-13 13:00:32 +0200] notice/Checkable: State Change: Checkable s999vmscd05!win-disks soft state change from OK to WARNING detected.

The problem is that the first state change is recognized as the second SOFT state (2/3), when it should be (1/3). The resulting order is:
SOFT (2/3)
SOFT (3/3)
HARD (1/3)
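A minimal Python sketch (hypothetical, not Icinga's actual implementation) of how incrementing the attempt counter already on the OK -> NOT-OK transition skips the SOFT 1/3 state:

```python
# Sketch of the reported off-by-one (not Icinga's source code): if the
# check_attempt counter is bumped on the OK -> NOT-OK transition itself,
# the first soft state is logged as 2/3 and SOFT 1/3 never appears.

MAX_CHECK_ATTEMPTS = 3

def state_sequence(failures, buggy):
    """Labels logged for `failures` consecutive NOT-OK results after an OK."""
    labels = []
    attempt = 0                        # counter while the service is OK
    for _ in range(failures):
        attempt += 1
        if buggy and attempt == 1:     # the bug: the transition itself
            attempt = 2                # already counts as a second attempt
        if attempt >= MAX_CHECK_ATTEMPTS:
            labels.append(f"HARD {MAX_CHECK_ATTEMPTS}/{MAX_CHECK_ATTEMPTS}")
            break
        labels.append(f"SOFT {attempt}/{MAX_CHECK_ATTEMPTS}")
    return labels

print(state_sequence(3, buggy=True))   # ['SOFT 2/3', 'HARD 3/3']
print(state_sequence(3, buggy=False))  # ['SOFT 1/3', 'SOFT 2/3', 'HARD 3/3']
```

In this simplification the buggy variant also reaches HARD one check early; the 2.3.8 log excerpts below still show a SOFT 3/3 entry before HARD, so the real transition logic differed in detail, but the missing SOFT 1/3 is the same effect.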

Changesets

2015-08-21 08:24:49 +00:00 by mfriedrich 6f252bb

Don't increment check attempt counter on OK->NOT-OK transition

This fixes the problem that the first SOFT state is actually considered
the second state.

refs #7287
fixes #9897

Signed-off-by: Michael Friedrich <michael.friedrich@netways.de>

icinga-migration commented Aug 13, 2015

Updated by mwaldmueller on 2015-08-13 12:50:12 +00:00

There are no SOFT1 states for CRITICAL, WARNING or UNKNOWN.

icinga-migration commented Aug 13, 2015

Updated by mwaldmueller on 2015-08-13 13:02:24 +00:00

Examples for missing SOFT1 state:

[08-11-2015 14:22:41] SERVICE ALERT: s999vmscd05;win-disks;CRITICAL;HARD;1;CRITICAL: C:\ (SYSTEM) is 99.61% full (139.54MB free)
[08-11-2015 14:20:54] SERVICE ALERT: s999vmscd05;win-disks;CRITICAL;SOFT;3;CRITICAL: C:\ (SYSTEM) is 99.61% full (139.54MB free)
[08-11-2015 14:10:32] SERVICE ALERT: s999vmscd05;win-disks;CRITICAL;SOFT;2;CRITICAL: C:\ (SYSTEM) is 99.72% full (100.49MB free)

But there are SOFT 1 states for OK results. I think this is unnecessary, because there should only be a HARD 1 state:

[08-11-2015 16:04:04] SERVICE ALERT: centera3_an4;Health;OK;HARD;1;OK: 6 online Nodes. 2 online Switches.
[08-11-2015 16:03:56] SERVICE ALERT: centera3_an4;Health;OK;SOFT;1;OK: 6 online Nodes. 2 online Switches.

icinga-migration commented Aug 17, 2015

Updated by mfriedrich on 2015-08-17 18:47:39 +00:00

  • Relates set to 7287

icinga-migration commented Aug 17, 2015

Updated by mfriedrich on 2015-08-17 18:48:36 +00:00

  • Category changed from Cluster to libicinga
  • Status changed from New to Feedback
  • Assigned to set to mwaldmueller

Probably related to #7287 and not really a cluster problem. Please re-test with the snapshot packages where the patch is already applied from #7287.

icinga-migration commented Aug 18, 2015

Updated by mwaldmueller on 2015-08-18 06:51:04 +00:00

It works as expected with the current snapshot, thank you very much!!

icinga-migration commented Aug 18, 2015

Updated by mfriedrich on 2015-08-18 07:20:26 +00:00

Ok, thanks for the fast tests. So we might consider splitting #7287 into two issues: retry interval and check attempts.

icinga-migration commented Aug 18, 2015

Updated by mfriedrich on 2015-08-18 16:13:35 +00:00

  • Status changed from Feedback to Assigned
  • Assigned to changed from mwaldmueller to mfriedrich
  • Target Version set to 2.4.0
  • Estimated Hours set to 2

icinga-migration commented Aug 21, 2015

Updated by mfriedrich on 2015-08-21 08:21:16 +00:00

  • Target Version changed from 2.4.0 to 2.3.9
  • Backport? changed from TBD to Yes

The fix is already in master, I'll cherry-pick this one into 2.3.9.

icinga-migration commented Aug 21, 2015

Updated by mfriedrich on 2015-08-21 08:28:44 +00:00

  • Status changed from Assigned to Resolved
  • Done % changed from 0 to 100

Applied in changeset 6f252bb.
