All recoveries are HARD #575
Comments
Soft OK states were not being triggered when a soft non-OK state turned back into an OK state.
Made this change in the maint branch for 4.4.3.
Hi!
max_check_attempts is 4; there is nothing else special about this service.
Did you apply this patch to 4.4.2, or did you use the current maint branch when you rebuilt Core?
I can try it again using only active checks too.
Hi,
Great, please let me know if it happens again. I will also be testing it again just to be sure.
Hi, How to reproduce:
which is not good, and no one gets notified either. Service definition:
Tested with active checks, not forced. I've analyzed the logs and noticed that repeated "CRITICAL;SOFT;X" alerts (where X is max_check_attempts) happen the day after soft recoveries. This never happened before our Core 4.4.2 was upgraded to the maint branch. I've reverted 766d0d9 and will test again; if it happens again, I'll revert the last commits one by one and test...
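For anyone who wants to check their own logs for the same pattern, here is a rough sketch that lists non-OK SERVICE ALERT entries logged as SOFT at an attempt number equal to max_check_attempts. The function name `soft_alerts_at_max` and the regular expression are my own; it only assumes the `host;service;state;state_type;attempt;output` layout seen in the log excerpts in this issue, and you have to pass in the service's max_check_attempts yourself.

```python
import re

# Matches "[ts] SERVICE ALERT: host;service;state;SOFT;attempt;..." lines only.
SOFT_ALERT_RE = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([^;]+);SOFT;(\d+);"
)

def soft_alerts_at_max(lines, max_check_attempts):
    """Return (timestamp, host, service, state) for non-OK SOFT alerts
    whose attempt number equals max_check_attempts."""
    hits = []
    for line in lines:
        m = SOFT_ALERT_RE.match(line.strip())
        if not m:
            continue
        ts, host, service, state, attempt = m.groups()
        if state != "OK" and int(attempt) == max_check_attempts:
            hits.append((int(ts), host, service, state))
    return hits
```

Feeding it the lines of nagios.log with `max_check_attempts=4` should list the suspicious entries, if any. It does not know per-service settings, so filter by host and service first if they differ.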
Seems related to #576
…Core. In order to do this we are moving some of the resetting logic for service OK states so that the notification for soft recovery goes out before setting it to a HARD OK state.
Soft OK states were not being triggered when a soft non-OK state turned back into an OK state.
…ther versions of Core. In order to do this we are moving some of the resetting logic for service OK states so that the notification for soft recovery goes out before setting it to a HARD OK state.
The issue is described on the Nagios XI support forum here:
https://support.nagios.com/forum/viewtopic.php?t=50067#261007
According to the Nagios Core official documentation:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/statetypes.html
however this is not happening in Nagios Core 4.4.2.
Example (Core 4.4.2):
[1535649657] SERVICE ALERT: CentOS6-NRPE;Users;CRITICAL;SOFT;1;USERS CRITICAL - 2 users currently logged in
[1535649657] GLOBAL SERVICE EVENT HANDLER: CentOS6-NRPE;Users;CRITICAL;SOFT;1;xi_service_event_handler
[1535649717] SERVICE ALERT: CentOS6-NRPE;Users;CRITICAL;SOFT;2;USERS CRITICAL - 2 users currently logged in
[1535649717] GLOBAL SERVICE EVENT HANDLER: CentOS6-NRPE;Users;CRITICAL;SOFT;2;xi_service_event_handler
[1535649776] SERVICE ALERT: CentOS6-NRPE;Users;OK;HARD;1;USERS OK - 1 users currently logged in
[1535649776] GLOBAL SERVICE EVENT HANDLER: CentOS6-NRPE;Users;OK;HARD;1;xi_service_event_handler
For comparison (Core 4.2.4):
[1535650778] SERVICE ALERT: localhost;Current Users;CRITICAL;SOFT;1;USERS CRITICAL - 2 users currently logged in
[1535650778] GLOBAL SERVICE EVENT HANDLER: localhost;Current Users;CRITICAL;SOFT;1;xi_service_event_handler
[1535650841] SERVICE ALERT: localhost;Current Users;CRITICAL;SOFT;2;USERS CRITICAL - 2 users currently logged in
[1535650841] GLOBAL SERVICE EVENT HANDLER: localhost;Current Users;CRITICAL;SOFT;2;xi_service_event_handler
[1535650902] SERVICE ALERT: localhost;Current Users;OK;SOFT;3;USERS OK - 1 users currently logged in
[1535650902] GLOBAL SERVICE EVENT HANDLER: localhost;Current Users;OK;SOFT;3;xi_service_event_handler
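The difference between the two excerpts can be checked mechanically. Below is a minimal sketch, not part of Nagios, that parses SERVICE ALERT lines and reports every OK alert logged as HARD immediately after a SOFT non-OK alert for the same host and service. The names `Alert`, `parse_alerts` and `hard_recoveries_from_soft` are hypothetical, and the regular expression assumes only the field layout visible in the logs above.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple

# Field layout taken from the excerpts above:
# [timestamp] SERVICE ALERT: host;service;state;state_type;attempt;output
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>SOFT|HARD);(?P<attempt>\d+);"
)

class Alert(NamedTuple):
    ts: int
    host: str
    service: str
    state: str
    state_type: str
    attempt: int

def parse_alerts(lines: Iterable[str]) -> Iterator[Alert]:
    """Yield one Alert per SERVICE ALERT line; other lines are skipped."""
    for line in lines:
        m = ALERT_RE.match(line.strip())
        if m:
            yield Alert(int(m["ts"]), m["host"], m["service"],
                        m["state"], m["state_type"], int(m["attempt"]))

def hard_recoveries_from_soft(lines: Iterable[str]) -> List[Alert]:
    """Return OK;HARD alerts that directly follow a SOFT non-OK alert."""
    last = {}   # (host, service) -> previous Alert
    found = []
    for alert in parse_alerts(lines):
        key = (alert.host, alert.service)
        prev = last.get(key)
        if (prev is not None
                and prev.state != "OK" and prev.state_type == "SOFT"
                and alert.state == "OK" and alert.state_type == "HARD"):
            found.append(alert)
        last[key] = alert
    return found
```

Run against the Core 4.4.2 excerpt this should flag the alert at 1535649776; against the Core 4.2.4 excerpt it should return nothing, because that recovery is logged as OK;SOFT;3.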