[dev.icinga.com #11020] Master reloads with agents generate false alarms #3871

Closed
icinga-migration opened this issue Jan 22, 2016 · 8 comments

Comments

@icinga-migration
Member

@icinga-migration icinga-migration commented Jan 22, 2016

This issue has been migrated from Redmine: https://dev.icinga.com/issues/11020

Created by tgelf on 2016-01-22 15:23:50 +00:00

Assignee: gbeutner
Status: Resolved (closed on 2016-02-23 09:59:37 +00:00)
Target Version: 2.4.2
Last Update: 2016-02-23 09:59:54 +00:00 (in Redmine)

Icinga Version: 2.4.1
Backport?: Already backported
Include in Changelog: 1

The most convenient configuration variant for Icinga 2 agents is the command endpoint. In such an environment, master reloads generate a lot of superfluous state changes (OK/UNKNOWN/OK). I haven't tried it, but I suspect that slow reloads combined with typical retry_interval settings could let a check reach a hard state quite quickly, resulting in false alarms. Even if not, these state changes cause overhead in the IDO, may skew SLA reports, and so on. We need some kind of "reload awareness" or grace period to handle this.

Best,
Thomas
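A minimal Python sketch of the arithmetic behind the "hard state pretty fast" concern, assuming the usual soft/hard state model: a check turns HARD once max_check_attempts consecutive non-OK results have been seen, with retries spaced retry_interval apart. The function names are hypothetical, the numbers are illustrative only, and the model ignores check scheduling jitter and how a reload reschedules checks.

```python
def seconds_until_hard_state(max_check_attempts: int, retry_interval: float) -> float:
    """Seconds from the first non-OK result until the state becomes HARD,
    assuming every retry also fails (simplified model)."""
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # Attempt 1 is the first failure; each further attempt waits retry_interval.
    return (max_check_attempts - 1) * retry_interval


def reload_may_cause_hard_state(reload_seconds: float,
                                max_check_attempts: int,
                                retry_interval: float) -> bool:
    """True if an agent outage lasting reload_seconds is long enough for all
    attempts to fail inside it, i.e. a false HARD state is possible."""
    return reload_seconds >= seconds_until_hard_state(max_check_attempts, retry_interval)


if __name__ == "__main__":
    # Illustrative values only, not measured reload times.
    print(seconds_until_hard_state(2, 60.0))           # 60.0
    print(reload_may_cause_hard_state(90.0, 2, 60.0))  # True
    print(reload_may_cause_hard_state(90.0, 3, 60.0))  # False
```

Under this model, a lower max_check_attempts or a shorter retry_interval shrinks the window a reload must stay under to avoid a false hard state.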

Changesets

2016-02-08 08:46:01 +00:00 by (unknown) 6d5014b

Increase grace period for agent-based checks

refs #11020

2016-02-23 09:51:12 +00:00 by (unknown) b8195be

Increase grace period for agent-based checks

refs #11020
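The changesets above only name the remedy ("Increase grace period for agent-based checks"); their diffs are not reproduced here. As a purely hypothetical illustration of what a reload grace period means, the Python sketch below drops UNKNOWN results caused by a disconnected agent endpoint while the master is still within a grace window after a reload, and processes everything else normally. The `AgentCheckResult` type, `should_process_result()` and the 60-second `GRACE_PERIOD_SECONDS` are invented for this example. They are not Icinga 2 APIs or settings, and the sketch does not claim to reflect the actual Icinga 2 implementation.

```python
from dataclasses import dataclass

# Hypothetical grace period in seconds; not an Icinga 2 setting.
GRACE_PERIOD_SECONDS = 60.0


@dataclass
class AgentCheckResult:
    """Simplified result of a command-endpoint check (hypothetical type)."""
    state: str                # "OK", "WARNING", "CRITICAL" or "UNKNOWN"
    endpoint_connected: bool  # whether the agent endpoint was reachable


def should_process_result(result: AgentCheckResult,
                          seconds_since_reload: float,
                          grace_period: float = GRACE_PERIOD_SECONDS) -> bool:
    """Return False for results that should be dropped instead of causing a
    state change: UNKNOWN results caused by a disconnected endpoint while the
    master is still inside the grace period after a reload."""
    if (result.state == "UNKNOWN"
            and not result.endpoint_connected
            and seconds_since_reload < grace_period):
        return False
    return True


if __name__ == "__main__":
    during_reload = AgentCheckResult("UNKNOWN", endpoint_connected=False)
    print(should_process_result(during_reload, seconds_since_reload=5.0))    # False
    print(should_process_result(during_reload, seconds_since_reload=120.0))  # True
    print(should_process_result(AgentCheckResult("CRITICAL", True), 5.0))   # True
```

Limiting the filter to results caused by a missing connection keeps genuine WARNING or CRITICAL results reported by the agent visible even during the grace period.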
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Jan 25, 2016

Updated by mfriedrich on 2016-01-25 09:59:19 +00:00

  • Category set to Cluster
  • Target Version set to 2.5.0
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Jan 25, 2016

Updated by mfriedrich on 2016-01-25 10:30:57 +00:00

  • Target Version changed from 2.5.0 to 2.4.2
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Jan 28, 2016

Updated by ziaunys on 2016-01-28 18:08:11 +00:00

I just started to encounter this issue. I'm not sure whether it's because I now have a lot of agents (270 in total). In my environment, when Puppet runs and adds a new host, Icinga 2 reloads and most of the agent cluster-zone checks fail once. I have 2 check attempts configured, so sometimes a handful of checks fail and page our on-call person. This is confusing because it's usually a random set of hosts, and from their perspective it looks like a bunch of hosts have gone down.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Feb 5, 2016

Updated by mfriedrich on 2016-02-05 13:59:51 +00:00

  • Status changed from New to Assigned
  • Assigned to set to mfriedrich
  • Priority changed from Normal to High
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Feb 5, 2016

Updated by mfriedrich on 2016-02-05 14:11:06 +00:00

I'll take a look at it, as a customer requires it.

@icinga-migration
Member Author

@icinga-migration icinga-migration commented Feb 8, 2016

Updated by mfriedrich on 2016-02-08 12:46:44 +00:00

  • Assigned to changed from mfriedrich to gbeutner
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Feb 23, 2016

Updated by gbeutner on 2016-02-23 09:59:37 +00:00

  • Status changed from Assigned to Resolved
@icinga-migration
Member Author

@icinga-migration icinga-migration commented Feb 23, 2016

Updated by gbeutner on 2016-02-23 09:59:54 +00:00

  • Backport? changed from Not yet backported to Already backported