Description
Describe the bug
We have discovered issues with some of our event handlers.
The issue occurred with one of our service checks in an Icinga 2 HA zone,
where command_endpoint was pinned to one of the two HA nodes.
It did not execute our event handler, a Python script that writes its own log at an early stage.
We discovered that this event handler was executed only once in the past two weeks, when it should have been executed at least twice every Monday.
(We also noticed similar issues with other event handlers several months ago, but couldn't take a closer look at those.)
Finally, we evaluated the debug log and the Icinga 2 source code and found that it only occurs when we have an active cluster setup, command_endpoint is set on a service (or host), and additionally the check is scheduled and executed on that same host.
According to the following code reference, the event command is sent via RPC to the configured endpoint.
icinga2/lib/icinga/checkable-event.cpp
Line 52 in 5c651e4
if (endpoint && !GetExtension("agent_check")) {
Our debug log shows that this does happen in our case:
notice/ApiListener: Sending message 'event::ExecuteCommand' to 'master01'
(Here master01 is our configuration master and master02 would be the secondary node in the master zone.)
But there is no debug message like the following that would indicate to us that the Icinga 2 instance has received the command:
notice/JsonRpcConnection: Received 'event::SetLastCheckStarted' message from identity 'master01'
There is also no further information about any execution or errors related to the event command.
We don't know whether the RPC message was silently dropped (because of drop rules in the RPC message handling) or never reaches the listener (or never even leaves it).
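To narrow this down, the debug log can be scanned for sent and received RPC messages per method and peer. The following is only a minimal sketch: the two regular expressions are modelled on the log lines quoted in this report, the `summarize` helper is a hypothetical name, and the sample lines are the quoted forms used for illustration rather than a real log excerpt.

```python
import re
from collections import Counter

# Patterns modelled on the two debug-log line forms quoted in this report
# (assumption: real log lines contain these substrings after the timestamp).
SENT = re.compile(r"notice/ApiListener: Sending message '([^']+)' to '([^']+)'")
RECEIVED = re.compile(
    r"notice/JsonRpcConnection: Received '([^']+)' message from identity '([^']+)'"
)


def summarize(lines):
    """Count sent and received RPC messages, keyed by (method, peer)."""
    sent, received = Counter(), Counter()
    for line in lines:
        match = SENT.search(line)
        if match:
            sent[match.groups()] += 1
            continue
        match = RECEIVED.search(line)
        if match:
            received[match.groups()] += 1
    return sent, received


# Illustrative input only: the two line forms quoted above.
sample = [
    "notice/ApiListener: Sending message 'event::ExecuteCommand' to 'master01'",
    "notice/JsonRpcConnection: Received 'event::SetLastCheckStarted' "
    "message from identity 'master01'",
]
sent, received = summarize(sample)

# Report every (method, peer) pair that was sent but never seen as received.
for key, count in sent.items():
    if received[key] == 0:
        print(f"sent {count}x but never received: {key}")
```

In our case the sender and the expected receiver are the same node, so a single debug log would have to contain both lines for `event::ExecuteCommand`.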
Perhaps the line mentioned above could verify that the configured endpoint is not the local instance itself before deciding to send the message to the endpoint.
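The suggested check can be modelled as follows. This is not Icinga 2 code, only a Python sketch of the decision on the referenced line with the additional comparison against the local identity; the function name `event_command_target` and its parameters are hypothetical.

```python
from typing import Optional


def event_command_target(
    command_endpoint: Optional[str],
    local_identity: str,
    agent_check: bool = False,
) -> str:
    """Return "remote" if the event command should be sent via RPC, else "local".

    Models `endpoint && !GetExtension("agent_check")` plus the proposed
    extra condition that the endpoint is not the local instance.
    """
    if command_endpoint and not agent_check and command_endpoint != local_identity:
        return "remote"
    return "local"


# The case from this report: the check is scheduled on master01 and
# command_endpoint is master01 as well, so no RPC message would be needed.
print(event_command_target("master01", "master01"))
print(event_command_target("master02", "master01"))
```

With such a guard, the first call would fall through to local execution instead of sending `event::ExecuteCommand` to the node itself.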
To Reproduce
- Use an active HA zone
- Add a Service (to a host) with:
  - an (enabled) event handler
  - command_endpoint set to one of the two zone endpoints
- Reload the icinga2 daemon until the schedule_source of the check equals the configured command_endpoint
- Trigger the service so that the event handler fires
Expected behavior
The EventCommand should be executed on the configured command_endpoint or, when none is set, on the local instance.
Your Environment
Include as many relevant details about the environment you experienced the problem in
- Version used (icinga2 --version):
icinga2 - The Icinga 2 network monitoring daemon (version: r2.14.5-1)
Copyright (c) 2012-2025 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
Platform: Debian GNU/Linux
Platform version: 12 (bookworm)
Kernel: Linux
Kernel version: 6.1.0-31-amd64
Architecture: x86_64
Build information:
Compiler: GNU 12.2.0
Build host: runner-hh8q3bz2-project-575-concurrent-0
OpenSSL version: OpenSSL 3.0.15 3 Sep 2024
Application information:
General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2
Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var
Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
- Enabled features (icinga2 feature list):
Disabled features: command compatlog debuglog elasticsearch gelf graphite influxdb2 journald opentsdb perfdata statusdata syslog
Enabled features: api checker icingadb ido-mysql influxdb livestatus mainlog notification
- Config validation (icinga2 daemon -C):
[2025-03-04 17:36:06 +0100] information/cli: Icinga application loader (version: r2.14.5-1)
[2025-03-04 17:36:06 +0100] information/cli: Loading configuration file(s).
[2025-03-04 17:36:06 +0100] information/config: NotificationModes defined
[2025-03-04 17:36:08 +0100] warning/config: Ignoring directory '/var/lib/icinga2/api/zones/m00' for unknown zone 'm00'.
[2025-03-04 17:36:08 +0100] information/ConfigItem: Committing config item(s).
[2025-03-04 17:36:08 +0100] information/ApiListener: My API identity: master01
[2025-03-04 17:36:18 +0100] information/WorkQueue: #5 (DaemonUtility::LoadConfigFiles) items: 0, rate: 7.18333/s (431/min 431/5min 431/15min);
[2025-03-04 17:36:18 +0100] information/WorkQueue: #8 (InfluxdbWriter, influxdb) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2025-03-04 17:36:18 +0100] information/WorkQueue: #10 (ApiListener, SyncQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
[2025-03-04 17:36:18 +0100] information/WorkQueue: #9 (ApiListener, RelayQueue) items: 0, rate: 0/s (0/min 0/5min 0/15min);
...
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 12 NotificationCommands.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 42202 Notifications.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 374 HostGroups.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1131 Hosts.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 10 EventCommands.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 9194 Downtimes.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 11394 Dependencies.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 101 Comments.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 IcingaDB.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 18 Zones.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 18 Endpoints.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 15 ApiUsers.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 97 Users.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 LivestatusListener.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 399 CheckCommands.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 73 UserGroups.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 141 ServiceGroups.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 7 TimePeriods.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 8112 ScheduledDowntimes.
[2025-03-04 17:36:38 +0100] information/ConfigItem: Instantiated 19337 Services.
[2025-03-04 17:36:38 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2025-03-04 17:36:38 +0100] information/cli: Finished validating the configuration file(s).