Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/var/log/icinga2/icinga2.log is growing very fast on satellites #6542

Closed
stevie-sy opened this issue Aug 10, 2018 · 26 comments
Closed

/var/log/icinga2/icinga2.log is growing very fast on satellites #6542

stevie-sy opened this issue Aug 10, 2018 · 26 comments
Labels
area/configuration DSL, parser, compiler, error handling area/log Logging related bug Something isn't working ref/IP ref/NC
Milestone

Comments

@stevie-sy
Copy link
Contributor

stevie-sy commented Aug 10, 2018

We have a HA configuration with 2 master nodes, 2 satellites in the Network Zone and 2 satellites in the Server Zone. For configuration we use the director.

Since some time - we noticed it after upgrade to Version 2.9.1 - that icinga2.log ist growing and growing very fast from one point to another. But only on the satellites in the Network Zone. It seems it happend after deployment with the director.

For example: that are times when there was a deplolyment today:

deployment_director

That is the visualisation with grafana where you can see the growing of the log file:
grafana

and direct in the bash:
filesystem

I think the problem are the ServiceSets which wie generate with the director (see also Icinga/icingaweb2-module-director#1589). Because it will deploy them at every node. So if the ServerSet is for example for the Server Zone, the satellite in the Network Zone won't find the devices to apply. And if I take a look into the icinga2.log at the time of the depolyment ist is full of Messages like this:
apply_rule

the only work arround is to empty the file with
> icinga2.log

Your Environment

•Director version (System - About): Git Master 71ad855
•Icinga Web 2 version and modules (System - About): 2.6.1
•Icinga 2 version (icinga2 --version): r2.9.1-1
•Operating System and version: CentOS 7
•Webserver, PHP versions: Apache 2.4.6-80.el7, rh-php 7.1.8-1.el7

@dbu1986
Copy link

dbu1986 commented Aug 17, 2018

Awesome, opening an issue for this has been on my agenda, too.
+1!

Same here:

Your Environment

  • Director version (System - About): 1.5.1
  • Icinga Web 2 version and modules (System - About): 2.6.1
  • Icinga 2 version (icinga2 --version): 2.9.1
  • Operating System and version: RHEL 7.5

Currently we are "handling" this problem by creating a dedicated check_disk service for the logging volume /var, which executes an EventCommand @ warning free space <15%. The EventCommand calls a shell script which executes: > icinga2.log

Rotating & zipping the log may be another approach to handle this, depending on the need to keep the logs.

@dbu1986
Copy link

dbu1986 commented Aug 20, 2018

More thoughts on this:

The commits are not exclusively triggered by Director's Deployments. I triggered 4 Deploys this morning:
image

The icinga2.log counts much more sets of Apply Rule mismatches (generally headed by the message 'Commiting config item(s)', so i grepped for it):

[root@satellite_z1n1 icinga2]# grep 'Committing config item' icinga2.log | wc -l
421

Each commit triggers 690 warnings of Apply Rule mismatches on one of our satellites in Zone A, which seems valid, since theses services need to be applied to hosts in a different Zone B. The log on a sattelite in Zone B looks vice versa: 294 warnings of Apply Rule mismatches, because these services need to be deployed at Zone A.

It may also be helpful to know the reason for the huge amount of commits.

@stevie-sy
Copy link
Contributor Author

@dbu1986 a script for deleting the log file would be a work-around, we thought also in our department. But it first it would be nice if this would be patched and like you wrote why this happend.

@dnsmichi
Copy link
Contributor

When apply rules are configured, and they do not match anywhere, the user should be made aware of them. This is a general principle with Icinga and the config compiler, and has been sitting there since 2.0. I'm not sure which change has had an impact on generating more of such non-matching apply rules.

In any case, I would analyse where these apply rules source from, and why they are deployed into the satellite node. Highly likely they do exist as global zone and are synced to any cluster node, or they are wrongly pushed to the satellite via cluster config sync and the corresponding zone.

Removing the configuration warning, or changing its verbosity level is not an option here, as this would remove a well known configuration validation feature used in many places.

@dnsmichi dnsmichi added question area/configuration DSL, parser, compiler, error handling area/log Logging related labels Aug 22, 2018
@stevie-sy
Copy link
Contributor Author

@dnsmichi thanks for your answer.
I think the "problem" in our case is the director. Because he depolys the ServerSets-configuration into /var/lib/icinga2/api/zones/director-global/director/servicesets.conf on every node in the HA-Cluster.
But why the icinga produce in a couple of minutes so a big log file is the mystery. Maybe you can fix this in combination with the director. Like my Suggestion in the Issue here: Icinga/icingaweb2-module-director#1589

@Thomas-Gelf
Copy link
Contributor

I'm currently on vacation, just some quick thoughts on this:

  • when there are 4 Director-triggered deployments on a single day, but 420 restarts (if I correctly understood your numbers), then there is something weird going on
  • 690 warnings for every one of those 420 restarts may account for 30MB log entries, but never for 11 GB
  • the linked feature request for the Director is reasonable and can be implemented, it will however not solve your main issue here

Cheers,
Thomas

@dbu1986
Copy link

dbu1986 commented Aug 22, 2018

When apply rules are configured, and they do not match anywhere, the user should be made aware of them.

Yes, absolutely! I highly appreciate these messages and it's not my intention to get them suppressed.

The Servicesets are created via Director, and deployed via global zone "director-global". Here is an example:

[2018-08-22 13:16:40 +0200] warning/ApplyRule: Apply rule 'SERVICENAME' (in /var/lib/icinga2/api/zones/director-global/director/servicesets.conf: 13096:1-13096:58) for type 'Service' does not match anywhere!

This seems to be standard behaviour. Director currently does not offer an option to set the zone per Serviceset (Icinga/icingaweb2-module-director#1589).

In my environment the global zone "director-global" is configured on both master nodes and all satellite nodes (2 satellites per zone). Aside of that I have a global zone "globalconfig" configured on all nodes (Masters, Satellites, and client hosts) for deploying Director's CheckCommands. The intention of using these two global zones was to avoid non-matching Apply Rule warnings at all client hosts.

However, I'm confident that the logs will get much smaller if we get the deploying of apply rules solved, I think that is a side effect. But I'm not sure if we have another issue here which causes the high rate of commits. Anyway there still would be a questionble high rate of "Committing config item(s)" mentions in the log.

We also have a problem of reload looping. Sorry for not mentioning earlier, it just came to my mind. At this moment the icinga2.log's look like this:

MasterA (configuration master):

...
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
...

MasterB:

...
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
...

satellite_z1n1 & satellite_z1n2:

same as on MasterB, but additionally with all the apply rules mismatch warnings. That may be the reason for their huge logs.

Any advice, how I can get this under control? I've been struggling for days trying to debug when this starts and when it ends. Currently I'm establishing another separate environment for testing our config, but for now I don't have any clue where this is coming from.

@stevie-sy
Copy link
Contributor Author

@Thomas-Gelf thanks for the answer. Maybe it is a mixture of more things? But I think it reduces the warnings in the logs, if the director deploy the servicesets in the correct zones. For example in our case if the serviceset will deploy it only for the server-zone and not for the network Zone and vice versa.

One information for you: in our configuration we import the hosts every 2 hours from our CMDB via CSV. If there are changes the directore will sync. Sometimes we have to do some manual configuration.

But why icinga produce so much warnings is the other point.

For now: enjoy your holidays ;-) you deserve it! You are doing a good job with the director!

@Thomas-Gelf
Copy link
Contributor

@stevie-sy: glad to hear that, thank you! Just wanted to point you to that discrepancy in your numbers, you should try to dig deeper. Eventually you can track down a looping behavior similar to what @dbu1986 describes. Your log file is so huge, take an average log line length and do the math on your own. Regular deployments can hardly account for even 0.1% of that data.

@dnsmichi
Copy link
Contributor

dnsmichi commented Sep 14, 2018

I'm wondering why these messages are coming in from the satellite to the master. This should have been denied earlier.

[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.

https://github.com/Icinga/icinga2/blob/master/lib/remote/apilistener-configsync.cpp#L79

Please show the Endpoint and Zone configuration on your master, including all the satellites.

@dnsmichi dnsmichi added the needs feedback We'll only proceed once we hear from you again label Sep 14, 2018
@dbu1986
Copy link

dbu1986 commented Sep 17, 2018

6542 zonesconf.zip

@stevie-sy
Copy link
Contributor Author

I want to give you a small update to this Issue:
After updating to the newest Version of Icinga (2.10.x) and the latest version of the director in the master branch we didn't got this behavior again. We also startet with a new and empty director database.

Maybe this [(https://github.com//pull/6663]) from the update 2.10.1 helped.

If @dbu1986 didn't get this Problem also again, maybe this could be closed.

@dnsmichi
Copy link
Contributor

Likely it helps a bit if you have many created objects and non-matching apply rules. It popped into my mind how to silence these logged lines, so I just did it.

@stevie-sy
Copy link
Contributor Author

And you did a good job (y).
if @Thomas-Gelf can also supress this with the director like my suggestion, that would also help I think. If it happen again than it need some more analysis and investigation like you wrote before.

@dnsmichi
Copy link
Contributor

The thing I'm still curious about is that log line

[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.

This must never happen, and I want to know how to reproduce this. @dbu1986 Unfortunately the zones configuration doesn't tell me much, how's the configuration deployed and synced top down, what exactly causes this behaviour?

@dbu1986
Copy link

dbu1986 commented Nov 5, 2018

@dnsmichi

https://github.com/Icinga/icinga2/blob/master/lib/remote/apilistener-configsync.cpp#L79

I assume you are reffering to #L224 ? The log message might be related to the ConfigDeleteObjectAPIHandler, since here no further details on the discarded object are being written into the log (following origin->FromClient->GetIdentity() ).
Could you provide further details on what Icinga2 is attempting to do here?

, what exactly causes this behaviour?

I'd desperately love to reveal that, too 😄 Unfortunately I have no idea what triggers it and how to find a point to start from.

However, I'm not sure if I got your question well, but as mentioned above, we're using HA distributed monitoring infrastructure in top/down mode, containing 2 Master, 3 Zones with 2 Satellites each, 4 Zones with 1 Satellite each, and a load of clients per Zone. Most of the objects are managed by Director (~10.000 Services, ~800 Hosts), plus ~5.000 Services created as runtime objects via REST-API on the config master, plus <20 Services defined in a configfile on the config master.

Additionally we are using a ticket system, fed by Icinga2 Notifications (passed in a perl script via REST API), and as a return path the ticket system sends Acknowledgements and Comments back to Icinga2 asynchronously.

One thing coming up my mind is the discrepancy between the timestamps of these messages. Today's messages as an example:
MasterA:

[root@MasterA ~]# grep "Discarding.*from 'satellite_z1n1" /var/log/icinga2/icinga2.log | head -40
[2018-11-05 04:06:40 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:06:53 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:08:02 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:08:06 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:08:53 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:16 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:20 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:24 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:31 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:51 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:52 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:15:51 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:22:28 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:22:50 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:23:51 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:24:35 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:24:49 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:25:02 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:25:10 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:25:43 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:26:05 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:26:15 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:26:41 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:30:23 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:39:10 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:39:39 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:44:47 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:48:54 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:49:04 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:49:05 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:55:37 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:56:42 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:59:50 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:05:16 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:18:12 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:20:43 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:24:39 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:39:24 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:42:23 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:55:07 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[root@MasterA ~]#

satellite_z1n1:

[root@satellite_z1n1 ~]# grep "Sending config updates for endpoint 'MasterA" /var/log/icinga2/icinga2.log
[2018-11-05 08:53:44 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 08:54:05 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 08:58:30 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 08:58:48 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:01:16 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:01:35 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:14:51 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:15:10 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:22:30 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:22:52 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:06:06 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:06:23 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:29:28 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:37:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:37:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:45:04 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:48:34 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:51:28 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:22:48 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:23:05 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:29:26 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:29:45 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:20:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:21:12 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:26:17 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:26:36 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:55:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 14:12:35 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[root@satellite_z1n1 ~]#

I somehow expected the message timestamps to nearly match to each other, but the 'discarding' messages at the master started around 04:06 and the 'sending config' messages at the satellite started much later, around 08:53. So there may be config updates on the satellite, that I am totally missing. Director's deployment log is listing today's first deployment at 07:52. And the icinga2 system processes are up and running since a couple of days, so I can also exclude a restart of Icinga2.

A few log lines from satellite_z1n1 before the first 'sending config' message to MasterA:

[2018-11-05 08:53:25 +0100] information/WorkQueue: #10 (JsonRpcConnection, #1) items: 1, rate: 111.45/s (6687/min 32724/5min 98056/15min);
[2018-11-05 08:53:26 +0100] information/ApiListener: New client connection for identity 'q4dee1syw4s' from [139.1.67.48]:44861 (no Endpoint object found for identity)
[2018-11-05 08:53:26 +0100] information/JsonRpcConnection: Received certificate request for CN 'q4dee1syw4s' signed by our CA.
[2018-11-05 08:53:26 +0100] information/JsonRpcConnection: The certificate for CN 'q4dee1syw4s' is valid and uptodate. Skipping automated renewal.
[2018-11-05 08:53:27 +0100] information/ApiListener: New client connection for identity 'linclutest01' from [139.1.67.126]:33682 (no Endpoint object found for identity)
[2018-11-05 08:53:31 +0100] information/ApiListener: New client connection for identity 'clientvm250509' from [139.1.66.123]:55057 (no Endpoint object found for identity)
[2018-11-05 08:53:38 +0100] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-05 08:53:40 +0100] information/ApiListener: New client connection for identity 'cldvm00004' from [172.30.14.11]:44500 (no Endpoint object found for identity)
[2018-11-05 08:53:40 +0100] information/JsonRpcConnection: Received certificate request for CN 'cldvm00004' signed by our CA.
[2018-11-05 08:53:40 +0100] information/JsonRpcConnection: The certificate for CN 'cldvm00004' is valid and uptodate. Skipping automated renewal.
[2018-11-05 08:53:40 +0100] warning/TlsStream: TLS stream was disconnected.
[2018-11-05 08:53:40 +0100] warning/JsonRpcConnection: API client disconnected for identity 'MasterA'
[2018-11-05 08:53:40 +0100] warning/ApiListener: Removing API client for endpoint 'MasterA'. 0 API clients left.
[2018-11-05 08:53:44 +0100] information/ApiListener: New client connection for identity 'MasterA' from [139.1.170.8]:39106
[2018-11-05 08:53:44 +0100] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'MasterA'.
[2018-11-05 08:53:44 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.

Could the 'API client disconnected' message be related to this?

@hardoverflow
Copy link

We have the same problem here.

Your Environment

  • Icinga 2 version (icinga2 --version): r2.10.2-1
  • Operating System and version: CentOS 7
  • Icinga Web 2 version: 2.6.1

[2018-11-16 06:55:00 +0100] information/ConfigObjectUtility: Created and activated object 'host.example.com!Service Name!8b488fc1-26a3-47cd-bfbe-97010e9a9ddd' of type 'Downtime'.

This is flooding our logs.

cat /var/log/icinga2/icinga2.log | grep "Created and activated object" | wc -l
277086

@dnsmichi
Copy link
Contributor

Are you syncing the ScheduledDowntime objects via global zone to that satellite? Those downtime creation messages can either originate from REST API creation calls (or the legacy command pipe) or they are created from those SD objects.

@dnsmichi
Copy link
Contributor

For further tests, please remove any ScheduledDowntime objects/apply rules from your configuration, and test again. I'd like to know whether this has an influence here or not.

@Al2Klimov
Copy link
Member

@lippserd @dnsmichi After lots of other tasks, tech. problems and (uncached) Icinga 2 builds I've finally set up a playground for this and tried to reproduce the issue. Either I've overseen an elephant (again) or there's something really interesting (and creepy) with that ScheduledDowntimes.

@dnsmichi Let's talk offline about the tech. details on monday.

@Al2Klimov Al2Klimov self-assigned this Nov 23, 2018
@Al2Klimov
Copy link
Member

The ScheduledDowntimes by themselves purr like a cat. But the created downtimes disappear suddenly.

@Al2Klimov
Copy link
Member

The web app shown here polls the Icinga 2 API all the time and appends new downtimes to its list and colors them green. Once a downtime disappear, it colors it red.

@Al2Klimov Al2Klimov removed their assignment Nov 29, 2018
Al2Klimov added a commit that referenced this issue Dec 4, 2018
@dnsmichi
Copy link
Contributor

dnsmichi commented Dec 7, 2018

This seems to be the following case:

  • Downtime -> ExpiredHandler() kicks in after 60 seconds
  • GetObject in HasValidConfigOwner() returns are nullptr
  • Downtime is deleted
  • SD detects that there is downtime missing, so it again creates one

Both calls lead into lib/remote/configobjectutility.cpp where CreateObject() always logs that it has created a runtime object (a downtime in this case). When a downtime is deleted, nothing is logged.

Therefore a local loop exists which immediately stops on restart, but then begins again after the expired downtime handler kicks in.

It doesn't matter whether the zone is the master or the satellite, creation and deletion happens locally.
If the configuration activation is fast enough this might not happen.

@dnsmichi dnsmichi added bug Something isn't working and removed needs feedback We'll only proceed once we hear from you again question labels Dec 7, 2018
@dnsmichi dnsmichi added this to the 2.10.3 milestone Dec 7, 2018
@Al2Klimov Al2Klimov removed their assignment Dec 7, 2018
Al2Klimov added a commit that referenced this issue Dec 17, 2018
dnsmichi pushed a commit that referenced this issue Feb 11, 2019
dnsmichi pushed a commit that referenced this issue Feb 11, 2019
dnsmichi pushed a commit that referenced this issue Feb 11, 2019
dnsmichi pushed a commit that referenced this issue Feb 11, 2019
@dnsmichi
Copy link
Contributor

@ALL

Can you please test the snapshot packages, it includes all referenced issues and PRs.

Thanks,
Michael

@dnsmichi dnsmichi added the needs feedback We'll only proceed once we hear from you again label Feb 11, 2019
@dnsmichi
Copy link
Contributor

dnsmichi commented Feb 11, 2019

ref/NC/585559
ref/IP/9673

@dnsmichi dnsmichi removed the needs feedback We'll only proceed once we hear from you again label Feb 25, 2019
@dnsmichi
Copy link
Contributor

dnsmichi commented Feb 25, 2019

ref/IP/9673 reported this working as expected. Since there's no more feedback here, I consider this being resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/configuration DSL, parser, compiler, error handling area/log Logging related bug Something isn't working ref/IP ref/NC
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants