[dev.icinga.com #11020] Master reloads with agents generate false alarms #3871
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11020
Created by tgelf on 2016-01-22 15:23:50 +00:00
The most convenient configuration variant for Icinga 2 agents is command endpoints. In such an environment, reloads generate a lot of superfluous state changes (ok/unknown/ok). I haven't tried it, but I guess that slow reloads combined with typical retry_interval settings could reach a hard state pretty fast, resulting in false alarms. Even if they don't, this causes overhead in the IDO, might influence SLA reports, and so on. We need some kind of "reload awareness" or grace period to handle this.
2016-02-08 08:46:01 +00:00 by (unknown) 6d5014b
2016-02-23 09:51:12 +00:00 by (unknown) b8195be
Updated by ziaunys on 2016-01-28 18:08:11 +00:00
I just started to encounter this issue. I'm not sure whether it's because I have a lot of agents now; there are 270 in total. In my environment, when Puppet runs and adds a new host, Icinga 2 reloads and most of the agent cluster-zone checks fail once. I have 2 attempts set, so sometimes a handful of checks fail and page our on-call person. That is confusing because it's usually a random set of hosts, and from the on-call person's perspective it looks like a bunch of hosts have gone down.
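
To illustrate the failure mode described in this thread, here is a rough Python sketch. It is not Icinga 2 code and not the fix that was eventually committed. It assumes a simplified model in which a check reaches a hard problem state after `max_check_attempts` consecutive non-OK results. The function names, the timestamps, and the `grace_seconds` parameter are all hypothetical and only show what a "grace period" could mean.

```python
OK, UNKNOWN = "OK", "UNKNOWN"


def reaches_hard_problem(states, max_check_attempts):
    """Simplified model: True once max_check_attempts consecutive
    non-OK results have been seen."""
    attempt = 0
    for state in states:
        if state == OK:
            attempt = 0  # recovery resets the attempt counter
        else:
            attempt += 1
            if attempt >= max_check_attempts:
                return True
    return False


def drop_reload_noise(results, reload_times, grace_seconds):
    """Hypothetical grace period: ignore UNKNOWN results that arrive
    within grace_seconds after any reload timestamp."""
    kept = []
    for ts, state in results:
        in_grace = any(0 <= ts - r < grace_seconds for r in reload_times)
        if state == UNKNOWN and in_grace:
            continue
        kept.append((ts, state))
    return kept


# (timestamp in seconds, state); a slow reload starts at t=55 (made-up data)
results = [(0, OK), (60, UNKNOWN), (90, UNKNOWN), (150, OK)]
reloads = [55]

without_grace = reaches_hard_problem([s for _, s in results], 2)
filtered = drop_reload_noise(results, reloads, grace_seconds=60)
with_grace = reaches_hard_problem([s for _, s in filtered], 2)
print(without_grace, with_grace)
```

With 2 attempts configured, two UNKNOWN results during one slow reload are enough to page someone in this model. Ignoring results that fall inside the grace window avoids that.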