Icinga2 sends notifications without logging about it and despite having a downtime #6057

baurmatt · 2018-02-01T16:23:53Z

Expected Behavior

Every notifications send generates a Icinga2 log entry
A downtimed host shouldn't send notifications

Current Behavior

We found that Icinga2 sends notifications without logging about it and despite having a downtime set.

Steps we do/see:

Host and services are included and monitored from Icinga2 and all checks are UP/OK
Host gets downtimed via API call
Host gets normally stopped
Icinga2 checks the host/sercices - checks are going into DOWN/CRITICAL state
Icinga2 regularly gets reloaded because of config changes being synced
After some time (minutes/perhaps 1-2 hours) another reload happens

While this reload happens, a notifications get set to the user telling about the DOWN state of the host
We never saw any notifications for services, only for hosts
This notification isn't logged by Icinga2 (.*information/Notification: Sending 'Problem' notification.*), it gets only logged by our notification script.

Notification script log:

{"date": "2018-02-01 15:58:47", "type": "email", "destination": "foobar@example.org", "notification_type": "PROBLEM", "service": "", "host": "my-fancy-hostname"}

Icinga2 log of the master which send the notification (and reloaded) from 15:58:*

[2018-02-01 15:58:00 +0100] information/HttpServerConnection: No messages for Http connection have been received in the last 10 seconds.
[2018-02-01 15:58:23 +0100] warning/TlsStream: TLS stream was disconnected.
[2018-02-01 15:58:23 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master01'
[2018-02-01 15:58:23 +0100] warning/ApiListener: Removing API client for endpoint 'master01'. 0 API clients left.
[2018-02-01 15:58:28 +0100] information/ApiListener: New client connection for identity 'master01' from [89.89.89.89]:47040
[2018-02-01 15:58:28 +0100] information/ApiListener: Sending config updates for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Syncing configuration files for zone 'bki' to endpoint 'master01'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Syncing configuration files for zone 'blu' to endpoint 'master01'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'master01'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Syncing configuration files for zone 'master' to endpoint 'master01'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Syncing configuration files for zone 'mgmtbki' to endpoint 'master01'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Syncing configuration files for zone 'mgmtblu' to endpoint 'master01'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Finished sending config file updates for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:28 +0100] information/ApiListener: Syncing runtime objects to endpoint 'master01'.
[2018-02-01 15:58:29 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'master01'.
[2018-02-01 15:58:29 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:29 +0100] information/ApiListener: Sending replay log for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:29 +0100] information/ApiListener: Finished sending replay log for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:29 +0100] information/ApiListener: Finished syncing endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:30 +0100] information/ApiListener: Applying config update from endpoint 'master01' of zone 'master'.
[2018-02-01 15:58:30 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/bki//.timestamp
[2018-02-01 15:58:30 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/bki' (17 Bytes). Received timestamp '2018-02-01 15:58:28 +0100' (1517497108.852829), Current timestamp '2018-02-01 15:39:11 +0100' (1517495951.020393).
[2018-02-01 15:58:30 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/blu//.timestamp
[2018-02-01 15:58:30 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/blu' (17 Bytes). Received timestamp '2018-02-01 15:58:28 +0100' (1517497108.819657), Current timestamp '2018-02-01 15:39:11 +0100' (1517495951.050538).
[2018-02-01 15:58:30 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/globals.d//.timestamp
[2018-02-01 15:58:30 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/globals.d' (17 Bytes). Received timestamp '2018-02-01 15:58:28 +0100' (1517497108.823386), Current timestamp '2018-02-01 15:39:10 +0100' (1517495950.980507).
[2018-02-01 15:58:30 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/master//.timestamp
[2018-02-01 15:58:30 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/master' (17 Bytes). Received timestamp '2018-02-01 15:58:28 +0100' (1517497108.823882), Current timestamp '2018-02-01 15:39:11 +0100' (1517495951.024695).
[2018-02-01 15:58:30 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/mgmtbki//.timestamp
[2018-02-01 15:58:30 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/mgmtbki' (17 Bytes). Received timestamp '2018-02-01 15:58:28 +0100' (1517497108.826017), Current timestamp '2018-02-01 15:39:11 +0100' (1517495951.026952).
[2018-02-01 15:58:30 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/mgmtblu//.timestamp
[2018-02-01 15:58:30 +0100] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/mgmtblu//_etc/hosts.conf
[2018-02-01 15:58:30 +0100] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/mgmtblu' (105135 Bytes). Received timestamp '2018-02-01 15:58:28 +0100' (1517497108.824826), Current timestamp '2018-02-01 15:39:10 +0100' (1517495950.981618).
[2018-02-01 15:58:30 +0100] information/ApiListener: Restarting after configuration change.
[2018-02-01 15:58:30 +0100] warning/TlsStream: TLS stream was disconnected.
[2018-02-01 15:58:30 +0100] warning/JsonRpcConnection: API client disconnected for identity 'satellite06'
[2018-02-01 15:58:30 +0100] warning/ApiListener: Removing API client for endpoint 'satellite06'. 0 API clients left.
[2018-02-01 15:58:31 +0100] information/WorkQueue: #17 (JsonRpcConnection, #7) items: 2365, rate: 373.167/s (22390/min 22390/5min 22390/15min);
[2018-02-01 15:58:31 +0100] information/Application: Got reload command: Starting new instance.
[2018-02-01 15:58:41 +0100] information/ApiListener: Reconnecting to endpoint 'satellite06' via host 'satellite06.example.org' and port '5665'
[2018-02-01 15:58:41 +0100] information/ApiListener: New client connection for identity 'satellite06' to [89.89.89.88]:5665
[2018-02-01 15:58:41 +0100] information/ApiListener: Finished reconnecting to endpoint 'satellite06' via host 'satellite06.example.org' and port '5665'
[2018-02-01 15:58:41 +0100] information/ApiListener: Sending config updates for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:41 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'satellite06'.
[2018-02-01 15:58:41 +0100] information/JsonRpcConnection: Received certificate request for CN 'satellite06' signed by our CA.
[2018-02-01 15:58:41 +0100] information/ApiListener: Syncing configuration files for zone 'mgmtblu' to endpoint 'satellite06'.
[2018-02-01 15:58:41 +0100] information/JsonRpcConnection: The certificate for CN 'satellite06' cannot be renewed yet.
[2018-02-01 15:58:41 +0100] information/ApiListener: Finished sending config file updates for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:41 +0100] information/ApiListener: Syncing runtime objects to endpoint 'satellite06'.
[2018-02-01 15:58:41 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'satellite06'.
[2018-02-01 15:58:41 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:41 +0100] information/ApiListener: Sending replay log for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:41 +0100] information/ApiListener: Finished sending replay log for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:41 +0100] information/ApiListener: Finished syncing endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:45 +0100] information/ApiListener: New client connection from [127.0.0.1]:58126 (no client certificate)
[2018-02-01 15:58:45 +0100] information/HttpServerConnection: Request: GET /v1/status (from [127.0.0.1]:58126, user: icingabeat)
[2018-02-01 15:58:46 +0100] information/Application: Received request to shut down.
[2018-02-01 15:58:46 +0100] information/Application: Shutting down...
[2018-02-01 15:58:46 +0100] information/ApiListener: 'api' stopped.
[2018-02-01 15:58:46 +0100] information/CheckerComponent: 'checker' stopped.
[2018-02-01 15:58:46 +0100] information/ExternalCommandListener: 'command' stopped.
[2018-02-01 15:58:57 +0100] information/NotificationComponent: 'notification' started.
[2018-02-01 15:58:57 +0100] information/ApiListener: Adding new listener on port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Reconnecting to endpoint 'satellite04' via host 'satellite04.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Reconnecting to endpoint 'satellite03' via host 'satellite03.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Reconnecting to endpoint 'satellite06' via host 'satellite06.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Reconnecting to endpoint 'satellite01' via host 'satellite01.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Reconnecting to endpoint 'satellite02' via host 'satellite02.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Reconnecting to endpoint 'master01' via host 'master01.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Reconnecting to endpoint 'satellite05' via host 'mgmtsatellite01.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: New client connection for identity 'satellite01' to [89.89.89.87]:5665
[2018-02-01 15:58:57 +0100] information/ApiListener: Finished reconnecting to endpoint 'satellite01' via host 'satellite01.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Sending config updates for endpoint 'satellite01' in zone 'bki'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'bki' to endpoint 'satellite01'.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'satellite01' signed by our CA.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: The certificate for CN 'satellite01' cannot be renewed yet.
[2018-02-01 15:58:57 +0100] information/ApiListener: New client connection for identity 'satellite06' to [89.89.89.88]:5665
[2018-02-01 15:58:57 +0100] information/ApiListener: Finished reconnecting to endpoint 'satellite06' via host 'satellite06.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Sending config updates for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'mgmtblu' to endpoint 'satellite06'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'satellite06'.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'satellite06' signed by our CA.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: The certificate for CN 'satellite06' cannot be renewed yet.
[2018-02-01 15:58:57 +0100] information/ApiListener: New client connection for identity 'master01' to [89.89.89.89]:5665
[2018-02-01 15:58:57 +0100] information/ApiListener: Finished reconnecting to endpoint 'master01' via host 'master01.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Sending config updates for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'mgmtbki' to endpoint 'master01'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'master' to endpoint 'master01'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'blu' to endpoint 'master01'.
[2018-02-01 15:58:57 +0100] information/ApiListener: New client connection for identity 'satellite04' to [89.89.89.86]:5665
[2018-02-01 15:58:57 +0100] information/ApiListener: Finished reconnecting to endpoint 'satellite04' via host 'satellite04.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: Sending config updates for endpoint 'satellite04' in zone 'blu'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'blu' to endpoint 'satellite04'.
[2018-02-01 15:58:57 +0100] information/ApiListener: New client connection for identity 'satellite02' to [89.89.89.85]:5665
[2018-02-01 15:58:57 +0100] information/ApiListener: Finished reconnecting to endpoint 'satellite02' via host 'satellite02.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/ApiListener: New client connection for identity 'satellite03' to [89.89.89.84]:5665
[2018-02-01 15:58:57 +0100] information/ApiListener: Finished reconnecting to endpoint 'satellite03' via host 'satellite03.example.org' and port '5665'
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'satellite02' signed by our CA.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: The certificate for CN 'satellite02' cannot be renewed yet.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'satellite04' signed by our CA.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: The certificate for CN 'satellite04' cannot be renewed yet.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'satellite03' signed by our CA.
[2018-02-01 15:58:57 +0100] information/JsonRpcConnection: The certificate for CN 'satellite03' cannot be renewed yet.
[2018-02-01 15:58:57 +0100] information/ApiListener: Finished sending config file updates for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing runtime objects to endpoint 'satellite06'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'satellite01'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'mgmtblu' to endpoint 'master01'.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for zone 'bki' to endpoint 'master01'.
[2018-02-01 15:58:57 +0100] information/CheckerComponent: 'checker' started.
[2018-02-01 15:58:57 +0100] information/DbConnection: 'ido-mysql' started.
[2018-02-01 15:58:57 +0100] information/ExternalCommandListener: 'command' started.
[2018-02-01 15:58:57 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'satellite04'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'master01'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'satellite06'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending replay log for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending config file updates for endpoint 'satellite04' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing runtime objects to endpoint 'satellite04'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending replay log for endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing endpoint 'satellite06' in zone 'mgmtblu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending config updates for endpoint 'satellite02' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing configuration files for zone 'bki' to endpoint 'satellite02'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'satellite02'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending config file updates for endpoint 'satellite01' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing runtime objects to endpoint 'satellite01'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending config file updates for endpoint 'satellite02' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing runtime objects to endpoint 'satellite02'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending config file updates for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing runtime objects to endpoint 'master01'.
[2018-02-01 15:58:58 +0100] information/ApiListener: New client connection for identity 'satellite05' to [89.89.89.84]:5665
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished reconnecting to endpoint 'satellite05' via host 'mgmtsatellite01.example.org' and port '5665'
[2018-02-01 15:58:58 +0100] information/JsonRpcConnection: Received certificate request for CN 'satellite05' signed by our CA.
[2018-02-01 15:58:58 +0100] information/JsonRpcConnection: The certificate for CN 'satellite05' cannot be renewed yet.
[2018-02-01 15:58:58 +0100] information/ApiListener: Applying config update from endpoint 'master01' of zone 'master'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'satellite02'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'satellite02' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending replay log for endpoint 'satellite02' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending replay log for endpoint 'satellite02' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing endpoint 'satellite02' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending config updates for endpoint 'satellite03' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing configuration files for zone 'blu' to endpoint 'satellite03'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'satellite03'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'satellite01'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'satellite01' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending replay log for endpoint 'satellite01' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending replay log for endpoint 'satellite01' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing endpoint 'satellite01' in zone 'bki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending config updates for endpoint 'satellite05' in zone 'mgmtbki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing configuration files for zone 'mgmtbki' to endpoint 'satellite05'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing configuration files for global zone 'globals.d' to endpoint 'satellite05'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending config file updates for endpoint 'satellite05' in zone 'mgmtbki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing runtime objects to endpoint 'satellite05'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'satellite04'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'satellite04' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending replay log for endpoint 'satellite04' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending replay log for endpoint 'satellite04' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing endpoint 'satellite04' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending config file updates for endpoint 'satellite03' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Syncing runtime objects to endpoint 'satellite03'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'master01'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending replay log for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Replayed 353 messages.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'satellite05'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'satellite05' in zone 'mgmtbki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending replay log for endpoint 'satellite05' in zone 'mgmtbki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending replay log for endpoint 'satellite05' in zone 'mgmtbki'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing endpoint 'satellite05' in zone 'mgmtbki'.
[2018-02-01 15:58:58 +0100] information/DbConnection: Resuming IDO connection: ido-mysql
[2018-02-01 15:58:58 +0100] information/IdoMysqlConnection: 'ido-mysql' resumed.
[2018-02-01 15:58:58 +0100] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.3')
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'satellite03'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'satellite03' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Sending replay log for endpoint 'satellite03' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending replay log for endpoint 'satellite03' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing endpoint 'satellite03' in zone 'blu'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished sending replay log for endpoint 'master01' in zone 'master'.
[2018-02-01 15:58:58 +0100] information/ApiListener: Finished syncing endpoint 'master01' in zone 'master'.

We also have debuglog from this point in time but i would be really hard to anonymize. Please tell me if you think this would help debugging this and we probably find a way to get it to you.

Possible Solution

?

Steps to Reproduce (for bugs)

Not really sure yet on a small test setup. I can more or less frequently reproduce it on our setup.

Context

We're trying to avoid sending notifications to users for hosts which are normally stopped.

Your Environment

Version used (icinga2 --version): r2.8.1-1
Operating System and version: Ubuntu 16.04.3
Enabled features (icinga2 feature list): Enabled features: api checker command ido-mysql mainlog notification
Icinga Web 2 version and modules (System - About): 2.5.0, monitoring/custom
Config validation (icinga2 daemon -C):

root@master02:/var/log/icinga2# icinga2 daemon -C
information/cli: Icinga application loader (version: r2.8.1-1)
information/cli: Loading configuration file(s).
information/ConfigItem: Committing config item(s).
warning/ApiListener: Attribute 'key_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
warning/ApiListener: Attribute 'ca_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
warning/ApiListener: Attribute 'cert_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
warning/ApiListener: Please read the upgrading documentation for v2.8: https://www.icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/
information/ApiListener: My API identity: master02
information/WorkQueue: #4 (DaemonUtility::LoadConfigFiles) items: 2, rate: 8214.58/s (492875/min 492875/5min 492875/15min);
information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
information/WorkQueue: #7 (IdoMysqlConnection, ido-mysql) items: 0, rate:  0/s (0/min 0/5min 0/15min);
...
Some apply rule warnings
...
information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 6 Zones.
information/ConfigItem: Instantiated 8 Endpoints.
information/ConfigItem: Instantiated 5 ApiUsers.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ConfigItem: Instantiated 90709 Notifications.
information/ConfigItem: Instantiated 2 NotificationCommands.
information/ConfigItem: Instantiated 11 CheckCommands.
information/ConfigItem: Instantiated 2595 Downtimes.
information/ConfigItem: Instantiated 444 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 2738 Hosts.
information/ConfigItem: Instantiated 259 Comments.
information/ConfigItem: Instantiated 257 UserGroups.
information/ConfigItem: Instantiated 51944 Dependencies.
information/ConfigItem: Instantiated 413 Users.
information/ConfigItem: Instantiated 4 TimePeriods.
information/ConfigItem: Instantiated 60952 Services.
information/ConfigItem: Instantiated 1 ServiceGroup.
information/ConfigItem: Instantiated 262 ScheduledDowntimes.
information/ConfigItem: Instantiated 1 ExternalCommandListener.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
information/cli: Finished validating the configuration file(s).

If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

object Endpoint NodeName {
  host = "10.0.0.1"
}

object Endpoint "satellite05"  {
  host = "mgmtsatellite01.example.org"
}

object Endpoint "master02"  {
  host = "master02.example.org"
}

object Endpoint "satellite05"  {
  host = "satellite06.example.org"
}

object Endpoint "satellite03"  {
  host = "satellite03.example.org"
}

object Endpoint "satellite04"  {
  host = "satellite04.example.org"
}

object Endpoint "satellite01"  {
  host = "satellite01.example.org"
}

object Endpoint "satellite02"  {
  host = "satellite02.example.org"
}

object Zone "bki"  {
  endpoints = [ "satellite01", "satellite02", ]
  parent = "master"
}

object Zone "blu"  {
  endpoints = [ "satellite03", "satellite04", ]
  parent = "master"
}

object Zone "globals.d"  {
  global = true
}

object Zone "master"  {
  endpoints = [ "master01", "master02", ]
}

object Zone "mgmtbki"  {
  endpoints = [ "satellite05", ]
  parent = "master"
}

object Zone "mgmtblu"  {
  endpoints = [ "satellite05", ]
  parent = "master"
}

The text was updated successfully, but these errors were encountered:

Crunsher · 2018-02-02T09:36:59Z

Notifications in conjuncture with reloads are a known problem. #5521 for example.

baurmatt · 2018-02-02T09:43:34Z

Is there anything we can do to solve this problem?

Crunsher · 2018-02-02T09:45:13Z

I don't know of any workaround

baurmatt · 2018-02-02T10:00:12Z

What i meant was, is there any information we can provide which helps you track down and fix the issue?

Crunsher · 2018-02-02T10:26:10Z

Oh! A minimal configuration example/list of steps that lets us reliably reproduce the issue is always welcome.

baurmatt · 2018-02-02T17:00:32Z

Still trying to realiably reproduce this on a small setup... -.- Strange thing we realized while looking through our logs is, that the NotificationComponent gets started after the config reloads but we never see any stop log message. Is this correct? It seems like it should log https://github.com/Icinga/icinga2/blob/master/lib/notification/notificationcomponent.cpp#L69

Crunsher · 2018-02-05T08:37:07Z

During a reload this does not happen, the new process takes over as the old one silently dies. I'm going to check whether two NotificationComponents can run at the same time and if this can be a problem in case it is.

kepi · 2018-02-09T01:38:30Z

Same problem in our setup, let us know if we can provide some more info.

To provide little context how important this fix is: Almost every day it wakes me and other admins up, often after long night work. Because icinga gets reloaded at morning when changing oncall contacts. Or just some admin is adding new check when I'm sleeping. Please help us save our sleep and sanity :)

We belived it will be fixed (similar/same behaviour already mentioned in #4696, #5224, #5361) but we have icinga 2.8.1 now and problem remains.

Mikesch-mp · 2018-02-13T14:42:42Z

This happens a lot if you use scheduledDowntimes.

marcofl · 2018-02-15T08:12:57Z

We see similar issues with notifications being sent out right after reload of icinga2 for services / hosts which have a scheduled Downtime. I'm not really able to reproduce this in a small vagrant setup. For me it seems to only happen in lager installations where restarts / reloads take some time and a lot of system resources.

information/ConfigItem: Instantiated 1 ApiListener.
information/ConfigItem: Instantiated 640 Zones.
information/ConfigItem: Instantiated 638 Endpoints.
information/ConfigItem: Instantiated 5 ApiUsers.
information/ConfigItem: Instantiated 1 FileLogger.
information/ConfigItem: Instantiated 8786 Notifications.
information/ConfigItem: Instantiated 8 NotificationCommands.
information/ConfigItem: Instantiated 301 CheckCommands.
information/ConfigItem: Instantiated 26239 Downtimes.
information/ConfigItem: Instantiated 38 HostGroups.
information/ConfigItem: Instantiated 1 IcingaApplication.
information/ConfigItem: Instantiated 696 Hosts.
information/ConfigItem: Instantiated 237 Comments.
information/ConfigItem: Instantiated 5 UserGroups.
information/ConfigItem: Instantiated 10 Users.
information/ConfigItem: Instantiated 2 TimePeriods.
information/ConfigItem: Instantiated 39544 Services.
information/ConfigItem: Instantiated 2 ServiceGroups.
information/ConfigItem: Instantiated 13054 ScheduledDowntimes.
information/ConfigItem: Instantiated 1 IdoMysqlConnection.
information/ConfigItem: Instantiated 1 InfluxdbWriter.
information/ConfigItem: Instantiated 1 NotificationComponent.
information/ConfigItem: Instantiated 1 CheckerComponent.
information/ConfigItem: Instantiated 1 ExternalCommandListener.

anan80 · 2018-02-20T04:03:02Z

i have same notification issue when servers are kept in scheduled downtime and icinga randomly sending notification mails. - icinga 2.8.0

n0npax · 2018-03-20T09:48:06Z

We have the same notification issue. Icinga 2.8.1, config 6 satellites & 2 masters. Seems to be related with config reload too.

dnsmichi · 2018-03-26T12:58:57Z

Looks like a race condition. The NotificationComponent is started before the FileLogger object is instantiated, and as such the icinga2.log doesn't have this record. startup.log should hold this detail.

Notification events without a call to Log() do not exist. It is hard to reproduce in this case, as one cannot influence the object load order.

I would advise to dump all notification (and related objects) attributes in a debug dummy notification.
https://www.icinga.com/docs/icinga2/latest/doc/09-object-types/#notification

last_notification
next_notification
last_problem_notification
"now" (depending on your script language, date +%s or likewise).

https://www.icinga.com/docs/icinga2/latest/doc/09-object-types/#host

host state, last_state, state_type, last_in_down, downtime_depth, etc.

winter1967 · 2018-03-26T13:41:08Z

Notification events without a call to Log() do not exist.

It exists at our environment since Icinga 2.5.4 ;-)
It happens sporadicaly, but not all time. So we have over 900 host object it happens every day but on differents systems.
Maybe it happens more often, when we create / import new systems with icinga2 director which causes a lot of load.

dnsmichi · 2018-03-26T14:48:56Z

It exists at our environment since Icinga 2.5.4 ;-)

You've missed my point. In terms of code, no such events will ever pass through the Log() function call. It may be the case that the logging file stream is not there, and this results in an event literally logging to /dev/null.

Thomas-Gelf · 2018-04-10T12:40:59Z

@lippserd: this issue, #5521 and quite some other linked/closed issues all have the same root cause and terribly hurt, especially large environments. Could you please raise priority and give someone some dedicated time to fix this? During restart/reload (read: with every config deployment) we are sending out erroneous notifications because the notification component starts and fires before the core learned about currently active downtimes. I've been shown another proof for this in 2.8.1 right now.

Notification are sent during downtime with no trace in IDO and/or log files. Happens immediately after reloading/restarting Icinga.

Thanks a lot,
Thomas

lippserd · 2018-04-10T12:49:07Z

@Thomas-Gelf Indeed, this is a big issue. We'll see what we can do to fix this as soon as possible.

dnsmichi · 2018-04-18T09:18:46Z

Crunsher · 2018-04-18T09:30:59Z

Proposal

Add a virtual type to all config objects to help Icinga2 work with them.

Explanation

Currently there is not much difference to icinga between a Host and the NotificationComponent, we can't give any securities about the load order.
To remedy this add a virtual type to all objects and start them by type. Something like Logger -> Hosts, Services, etc. -> NotificationComponent. In this case three types would be required, pre-config, config, post-config ('config' refering to monitoring objects like Hosts, Services and Downtimes here)

Problems

Figuring out the correct types for everything is a big load of work, a sensible default is needed. And there a a lot of possible bugs and dependency misjudgments which might have already existed but were just not visible since we loaded/started everything at more or less the same time.

Wintermute2k6 · 2018-04-23T14:53:18Z

removed comment because it was related to different issues.

Mikesch-mp · 2018-04-23T19:00:49Z

@Wintermute2k6 that should not belong to this issue. First issue is this one, the second one is what you copied here. Its a complete different problem.

dnsmichi · 2018-04-24T06:51:30Z

Agreed. @Wintermute2k6 your problem sounds like #2844 and #4272. Please proceed there.

Wintermute2k6 · 2018-04-24T07:12:27Z

@Mikesch-mp & @dnsmichi i agree with both of you. But its not my issue at all.
Regards

dnsmichi · 2018-04-24T09:00:50Z

I've read #558349 and this issue is just one of many. ScheduledDowntime objects and their generation of runtime downtime objects is not part of this issue.

We're discussing and elaborating fixes for unwanted notifications caused by runtime race conditions, either influenced by not-yet activated downtimes/acks, or too-soon-running features like the notification component.

Wintermute2k6 · 2018-04-24T09:35:49Z

No problem with that at all i will split it into the accordingly matching issues.

Crunsher · 2018-04-27T09:35:18Z

Config to test attached, was done with a sleep in the Downtime start method to force the bug
i2_test.zip

This patch ensures that specific configuration types are pre-activated and post-activated. In general, logging is first, then common configuration objects like host/service, downtimes, etc. In the end, all features are activated after to ensure that notifications are only sent once downtimes are applied. A similar thing happens for starting with checks too early. The ApiListener feature runs first to allow cluster connections at first glance. fixes #6057 fixes #6231

Crunsher added bug Something isn't working area/notifications Notification events labels Feb 2, 2018

Crunsher mentioned this issue Feb 20, 2018

Downtime didn't work, Notifications were sent. #5088

Closed

dnsmichi added the blocker Blocks a release or needs immediate attention label Apr 17, 2018

dnsmichi assigned Crunsher Apr 17, 2018

dnsmichi added this to the 2.9.0 milestone Apr 18, 2018

Crunsher mentioned this issue Apr 27, 2018

Add activation priority for config object types #6270

Merged

dnsmichi closed this as completed in #6270 May 4, 2018

ekeih mentioned this issue Dec 19, 2018

Icinga2 sends notification during reload (despite downtime) and does not log it #6846

Closed

dnsmichi unassigned Crunsher Sep 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Icinga2 sends notifications without logging about it and despite having a downtime #6057

Icinga2 sends notifications without logging about it and despite having a downtime #6057

baurmatt commented Feb 1, 2018 •

edited

Loading

Crunsher commented Feb 2, 2018

baurmatt commented Feb 2, 2018

Crunsher commented Feb 2, 2018

baurmatt commented Feb 2, 2018

Crunsher commented Feb 2, 2018

baurmatt commented Feb 2, 2018

Crunsher commented Feb 5, 2018

kepi commented Feb 9, 2018

Mikesch-mp commented Feb 13, 2018

marcofl commented Feb 15, 2018

anan80 commented Feb 20, 2018

n0npax commented Mar 20, 2018

dnsmichi commented Mar 26, 2018

winter1967 commented Mar 26, 2018 •

edited

Loading

dnsmichi commented Mar 26, 2018

Thomas-Gelf commented Apr 10, 2018

lippserd commented Apr 10, 2018

dnsmichi commented Apr 18, 2018

Crunsher commented Apr 18, 2018

Wintermute2k6 commented Apr 23, 2018 •

edited

Loading

Mikesch-mp commented Apr 23, 2018

dnsmichi commented Apr 24, 2018

Wintermute2k6 commented Apr 24, 2018

dnsmichi commented Apr 24, 2018

Wintermute2k6 commented Apr 24, 2018

Crunsher commented Apr 27, 2018

Icinga2 sends notifications without logging about it and despite having a downtime #6057

Icinga2 sends notifications without logging about it and despite having a downtime #6057

Comments

baurmatt commented Feb 1, 2018 • edited Loading

Expected Behavior

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

Crunsher commented Feb 2, 2018

baurmatt commented Feb 2, 2018

Crunsher commented Feb 2, 2018

baurmatt commented Feb 2, 2018

Crunsher commented Feb 2, 2018

baurmatt commented Feb 2, 2018

Crunsher commented Feb 5, 2018

kepi commented Feb 9, 2018

Mikesch-mp commented Feb 13, 2018

marcofl commented Feb 15, 2018

anan80 commented Feb 20, 2018

n0npax commented Mar 20, 2018

dnsmichi commented Mar 26, 2018

winter1967 commented Mar 26, 2018 • edited Loading

dnsmichi commented Mar 26, 2018

Thomas-Gelf commented Apr 10, 2018

lippserd commented Apr 10, 2018

dnsmichi commented Apr 18, 2018

Crunsher commented Apr 18, 2018

Proposal

Explanation

Problems

Wintermute2k6 commented Apr 23, 2018 • edited Loading

Mikesch-mp commented Apr 23, 2018

dnsmichi commented Apr 24, 2018

Wintermute2k6 commented Apr 24, 2018

dnsmichi commented Apr 24, 2018

Wintermute2k6 commented Apr 24, 2018

Crunsher commented Apr 27, 2018

baurmatt commented Feb 1, 2018 •

edited

Loading

winter1967 commented Mar 26, 2018 •

edited

Loading

Wintermute2k6 commented Apr 23, 2018 •

edited

Loading