Service state appears to switch directly to hard without passing through soft #9835
Comments
On first glance this sounds like the Please check if your service or templates have that setting set to
<insert huge facepalm emoticon here> You're absolutely right, in a dark corner of the deployment setup there was indeed a volatile set to true which I missed. As a last note on the max check attempts, can anybody confirm that what it says in the documentation: Thank you.
Glad it helped. I suggest closing this issue then ;) As for the check attempts: yes, the first check that detects the problem already counts toward the max_check_attempts value.
Issue closed as the reported behaviour was a configuration error and the extra question was answered.
Describe the bug
Hello,
I'm not 100% sure the following is a bug, but after reading the documentation and observing the behaviour I don't understand why it happens.
If the below is normal behaviour, please set this issue as a question and help me understand why it happens.
In short, the documentation says:
"When detecting a problem with a host/service, Icinga re-checks the object a number of times (based on the max_check_attempts and retry_interval settings) before sending notifications."
"The number of times a host/service is re-checked before changing into a hard state. Defaults to 3."
We have set a service with max_check_attempts to 4 and retry_interval to 30s.
What we expect:
What happens:
As an observation, on hosts the transition via soft to hard state works properly.
The question is why services are reported as hard state directly on the first failing check.
Also, considering the behaviour, it seems to me that max_check_attempts also includes the first check that failed, not just the retries.
So max_check_attempts = 4 means 1 fail + 3 retries.
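The counting described above, together with the effect of the volatile setting mentioned in the comments, can be sketched with a small toy model. This is a simplified illustration and not Icinga's actual implementation: the class `ToyCheckable`, its fields and the `simulate` helper are hypothetical names, recoveries are not modelled, and the model simply assumes that a volatile object turns hard on its first failing check.

```python
from dataclasses import dataclass


@dataclass
class ToyCheckable:
    """Toy model of soft/hard state handling; not Icinga's actual code."""

    max_check_attempts: int = 3
    volatile: bool = False
    attempt: int = 1
    in_problem: bool = False
    state_type: str = "HARD"

    def process_failure(self) -> tuple[str, int]:
        """Feed one failing check result and return (state_type, attempt)."""
        if not self.in_problem:
            # The check that first detects the problem counts as attempt 1.
            self.in_problem = True
            self.attempt = 1
        elif self.state_type == "SOFT":
            # Every re-check while still soft uses up one more attempt.
            self.attempt += 1
        # Hard once the attempts are used up, or immediately when volatile.
        if self.volatile or self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"
        else:
            self.state_type = "SOFT"
        return self.state_type, self.attempt


def simulate(max_check_attempts: int, volatile: bool, failures: int) -> list[tuple[str, int]]:
    """Run a number of consecutive failing checks through the toy model."""
    svc = ToyCheckable(max_check_attempts=max_check_attempts, volatile=volatile)
    return [svc.process_failure() for _ in range(failures)]


# One initial failure plus three retries before the state turns hard.
print(simulate(4, False, 4))
# With volatile enabled, the first failing check is already hard.
print(simulate(4, True, 1))
```

Under this model, and ignoring check latency and execution time, a service with max_check_attempts of 4 and a retry_interval of 30s would turn hard roughly 90 seconds after the first failing check.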
To Reproduce
We manage our hosts, service templates and applies via Icinga Director.
Please let us know if it would be useful to provide the configuration jsons for Director, or extract the data via the Icinga API.
Expected behavior
See the bug description
Screenshots
Screenshots attached of:
before the first critical check
{ "check_attempt": 1, "check_commandline": "'/usr/lib/nagios/plugins/check_procs' '-C' 'apache2' '-c' '1:1' '-p' '1' '-u' 'root' '-w' '1:1'", "check_source": "[REDACTED]", "check_timeout": 60000, "environment_id": "3f6acd65f0d7d3677481c1eeb047bc71ef57b0ec", "execution_time": 10, "hard_state": 0, "host_id": "75306fea55a71675ad96ba85056ab9b9f68ac501", "id": "a9d11b26e6060c8b54a3795122e37aa9d26a05d4", "in_downtime": false, "is_acknowledged": 0, "is_active": true, "is_flapping": false, "is_handled": false, "is_problem": false, "is_reachable": true, "last_state_change": 1689584909460, "last_update": 1689589534101, "latency": 0, "next_check": 1689589712223, "next_update": 1689589892223, "normalized_performance_data": "procs=1", "output": "PROCS OK: 1 process with command name 'apache2', PPID = 1, UID = 0 (root) ", "performance_data": "procs=1;1:1;1:1;0;", "previous_hard_state": 0, "previous_soft_state": 2, "scheduling_source": "[REDACTED]", "service_id": "a9d11b26e6060c8b54a3795122e37aa9d26a05d4", "severity": 0, "soft_state": 0, "state_type": 1 }
and after the first check that returned critical
{ "check_attempt": 1, "check_commandline": "'/usr/lib/nagios/plugins/check_procs' '-C' 'apache2' '-c' '1:1' '-p' '1' '-u' 'root' '-w' '1:1'", "check_source": "[REDACTED]", "check_timeout": 60000, "environment_id": "3f6acd65f0d7d3677481c1eeb047bc71ef57b0ec", "execution_time": 10, "hard_state": 2, "host_id": "75306fea55a71675ad96ba85056ab9b9f68ac501", "id": "a9d11b26e6060c8b54a3795122e37aa9d26a05d4", "in_downtime": false, "is_acknowledged": 0, "is_active": true, "is_flapping": false, "is_handled": false, "is_problem": true, "is_reachable": true, "last_state_change": 1689589894101, "last_update": 1689589894101, "latency": 0, "next_check": 1689589922222, "next_update": 1689589952222, "normalized_performance_data": "procs=0", "output": "PROCS CRITICAL: 0 processes with command name 'apache2', PPID = 1, UID = 0 (root) ", "performance_data": "procs=0;1:1;1:1;0;", "previous_hard_state": 0, "previous_soft_state": 0, "scheduling_source": "[REDACTED]", "service_id": "a9d11b26e6060c8b54a3795122e37aa9d26a05d4", "severity": 2176, "soft_state": 2, "state_type": 0 }
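To make the difference between the two snapshots easier to spot, a small comparison like the following may help. It is only a sketch: `changed_fields` is a hypothetical helper, and the dictionaries hold a hand-picked subset of the fields above, with values copied from the two JSON objects.

```python
def changed_fields(before: dict, after: dict) -> dict:
    """Return {key: (old, new)} for every shared key whose value differs."""
    return {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }


# Subset of the state object before the first critical check.
before = {
    "check_attempt": 1,
    "hard_state": 0,
    "soft_state": 0,
    "state_type": 1,
    "is_problem": False,
    "severity": 0,
}

# Same fields after the first check that returned critical.
after = {
    "check_attempt": 1,
    "hard_state": 2,
    "soft_state": 2,
    "state_type": 0,
    "is_problem": True,
    "severity": 2176,
}

diff = changed_fields(before, after)
for key, (old, new) in diff.items():
    print(f"{key}: {old} -> {new}")
```

In this subset, check_attempt stays at 1 while hard_state and soft_state both move from 0 to 2 after a single failing check.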
Your Environment
Include as many relevant details about the environment you experienced the problem in
icinga2 --version: 2.13.7
icinga2 feature list: api checker icingadb mainlog notification (on masters)
icinga2 daemon -C:
``
icinga2 daemon -C
[2023-07-17 11:33:51 +0000] information/cli: Icinga application loader (version: r2.13.7-1)
[2023-07-17 11:33:51 +0000] information/cli: Loading configuration file(s).
[2023-07-17 11:33:51 +0000] information/ConfigItem: Committing config item(s).
[2023-07-17 11:33:51 +0000] information/ApiListener: My API identity: [REDACTED]
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 3 HostGroups.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 78 Hosts.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 943 Notifications.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 1 IcingaDB.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 76 Zones.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 76 Endpoints.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 4 ApiUsers.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 265 CheckCommands.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 2 UserGroups.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 1 TimePeriod.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 2 Users.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 866 Services.
[2023-07-17 11:33:51 +0000] information/ConfigItem: Instantiated 2 NotificationCommands.
[2023-07-17 11:33:51 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2023-07-17 11:33:51 +0000] information/cli: Finished validating the configuration file(s).
``
Additional context
Thank you.