
Space Health perf data not working when critical #74

Open
edwaars opened this issue Jul 3, 2018 · 2 comments

Comments

edwaars commented Jul 3, 2018

Issue Type

Bug report

Issue Detail

  • check_netapp_ontap version: 3.01.171611
  • NetApp ONTAP version: 9.3
  • Monitoring solution: Nagios XI 5.4.13

Expected Behavior
When checking for aggregate_health, we expect to get performance data for all aggregates.

Actual Behavior
This works fine when the state is OK (0) or WARNING (1). However, when one of the aggregates is in CRITICAL state, it disappears from the performance data output, which causes problems for the Nagios XI PNP performance data engine. In our case we are checking 3 aggregates, but as soon as 1 goes into CRITICAL state we only get performance data for 2 aggregates. The RRD file still expects 3 datasources, so we no longer see any performance graphs.
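For illustration (the aggregate names, values and thresholds below are made up, not actual output from our system), the plugin output drops from three datasources to two as soon as one aggregate crosses the critical threshold:

```
OK - all aggregates healthy | 'aggr0'=72%;85;95 'aggr1'=80%;85;95 'aggr2'=78%;85;95
CRITICAL - aggr2 at 97%     | 'aggr0'=72%;85;95 'aggr1'=80%;85;95
```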

I expect this behaviour will also occur with other checks that use the calc_space_health sub: when an object is critical it is removed from the list before the warning check runs, and perf data is only added during the warning check.
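To make the expected behaviour concrete, here is a minimal, hypothetical sketch (simplified standalone Perl, not the plugin's actual code; the data structure, variable names and thresholds are invented) of a space-health loop that appends perf data for every object before classifying it, so a critical aggregate stays in the output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-aggregate used-space percentages (not the plugin's real data structure).
my %aggregates = ( aggr0 => 72, aggr1 => 80, aggr2 => 97 );
my ( $warn, $crit ) = ( 85, 95 );

my $alert_level = 0;    # 0 = OK, 1 = WARNING, 2 = CRITICAL
my @problems;
my $perfdata = '';

for my $name ( sort keys %aggregates ) {
    my $used = $aggregates{$name};

    # Append perf data first, independent of the alert state, so a
    # critical aggregate is not dropped from the output.
    $perfdata .= sprintf( " '%s'=%d%%;%d;%d", $name, $used, $warn, $crit );

    my $level = $used >= $crit ? 2 : $used >= $warn ? 1 : 0;
    push @problems, "$name at $used%" if $level;
    $alert_level = $level if $level > $alert_level;
}

my @state   = ( 'OK', 'WARNING', 'CRITICAL' );
my $summary = @problems ? join( ', ', @problems ) : 'all aggregates healthy';
print "$state[$alert_level] - $summary |$perfdata\n";
exit $alert_level;
```

The key point is only that the perfdata append happens unconditionally, before the object can be filtered out by the critical branch.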

How to reproduce Behavior
Run the check for aggregate health with thresholds such that no aggregate is critical; you should see perf data for all checked aggregates. Rerun the check with a critical threshold low enough that one or more aggregates reach CRITICAL state; those aggregates will then be missing from the performance data.

It would be great if this could be fixed soon.
Thanks
Edward

@mmarodin

Please see my fix here:
https://github.com/mmarodin/check_netapp_ontap

@willemdh
Collaborator

@mmarodin Please provide a PR with your fix :)

Elias481 added a commit to Elias481/check_netapp_ontap that referenced this issue Jul 20, 2020
…here skipped)

* this is just a least-intrusive fix for this important issue (fixes district09#74)

from my point of view the thing should be restructured further:
* call space_threshold_helper just once with both thresholds
* use a threshold-check function for the recurring task to determine intAlertLevel for a condition result
* include thresholds in perf-data (also requested in district09#82), which would be quite hacky currently and which I did not do yet despite also wanting it
* output perfdata also for metrics where no threshold is defined (a threshold is for alerting, but a historical graph would be fine even if no alert is defined)
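The restructuring proposed in the commit message above could look roughly like the following hypothetical sketch (standalone Perl, not the plugin's actual code; the sub names check_threshold and perf_item and all values are invented): a single threshold-check function returns the alert level for a value, and perf data is emitted for every metric, with empty threshold fields when no threshold is defined.

```perl
use strict;
use warnings;

# Hypothetical reusable threshold check: returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
# Either threshold may be undef, meaning "no alerting on this metric".
sub check_threshold {
    my ( $value, $warn, $crit ) = @_;
    return 2 if defined $crit && $value >= $crit;
    return 1 if defined $warn && $value >= $warn;
    return 0;
}

# Hypothetical perf data formatter: thresholds are included when defined,
# and the metric is emitted even when no threshold exists (for graphing).
sub perf_item {
    my ( $label, $value, $warn, $crit ) = @_;
    return sprintf "'%s'=%d%%;%s;%s", $label, $value,
        defined $warn ? $warn : '', defined $crit ? $crit : '';
}

# Invented example values: aggr1 is critical but stays in perf data;
# aggr2 has no thresholds but is still emitted for graphing.
my @metrics = (
    [ 'aggr0', 72, 85,    95 ],
    [ 'aggr1', 97, 85,    95 ],
    [ 'aggr2', 64, undef, undef ],
);

my $alert_level = 0;
my @perf;
for my $m (@metrics) {
    my ( $label, $value, $warn, $crit ) = @$m;
    my $level = check_threshold( $value, $warn, $crit );
    $alert_level = $level if $level > $alert_level;
    push @perf, perf_item( $label, $value, $warn, $crit );
}
print "alert level $alert_level | @perf\n";
```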
Elias481 added a commit to Elias481/check_netapp_ontap that referenced this issue Sep 24, 2020
…here skipped)