Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface plugin.AddPerfData() error, exit with UNKNOWN exit state #53

Closed
12 tasks done
atc0005 opened this issue Feb 17, 2023 · 2 comments
Closed
12 tasks done

Surface plugin.AddPerfData() error, exit with UNKNOWN exit state #53

atc0005 opened this issue Feb 17, 2023 · 2 comments
Assignees

Comments

@atc0005
Copy link
Owner

atc0005 commented Feb 17, 2023

Overview

Example of code block where the issue is not surfaced:

	pd := getPerfData(allAssertions, fileAssertions, registryAssertions)
	if err := plugin.AddPerfData(false, pd...); err != nil {
		log.Error().
			Err(err).
			Msg("failed to add performance data")
	}

Until now the approach has been just to log the error and continue.

At a minimum I believe that I should surface the error by adding to the error collection; an error "collection" is a relatively new feature of the nagios plugin library, so if I had to guess (without checking my memory) I'd say that I opted to log the error vs collecting it as another error (earlier or later) might be more significant and not something that could be skipped.

Example of the modified code block:

	pd := getPerfData(allAssertions, fileAssertions, registryAssertions)
	if err := plugin.AddPerfData(false, pd...); err != nil {
		log.Error().
			Err(err).
			Msg("failed to add performance data")

		// Surface the error in plugin output.
		plugin.AddError(err)

		// Abort plugin execution with UNKNOWN status. This should not fail,
		// but if it does, then it's *very* likely a logic bug that should be
		// surfaced.
		plugin.ExitStatusCode = nagios.StateUNKNOWNExitCode
		plugin.ServiceOutput = fmt.Sprintf(
			"%s: Failed to process performance data metrics",
			nagios.StateUNKNOWNLabel,
		)

		return
	}

TODO

Review these projects and apply changes where applicable:

  • atc0005/check-cert
  • atc0005/check-illiad
  • atc0005/check-mail
  • atc0005/check-ntpt
  • atc0005/check-path
  • atc0005/check-process
  • atc0005/check-restart
  • atc0005/check-statuspage
  • atc0005/check-vmware
  • atc0005/check-whois
  • atc0005/mysql2sqlite
  • atc0005/nagios-debug
@atc0005
Copy link
Owner Author

atc0005 commented Mar 2, 2023

Example diff from the check-cert project:

diff --git a/cmd/check_cert/main.go b/cmd/check_cert/main.go
index b119daf..fa13b79 100644
--- a/cmd/check_cert/main.go
+++ b/cmd/check_cert/main.go
@@ -391,7 +391,13 @@ func main() {
 		// Surface the error in plugin output.
 		plugin.AddError(perfDataErr)
 
-		// TODO: Abort plugin execution with UNKNOWN status?
+		plugin.ExitStatusCode = nagios.StateUNKNOWNExitCode
+		plugin.ServiceOutput = fmt.Sprintf(
+			"%s: Failed to generate performance data metrics",
+			nagios.StateUNKNOWNLabel,
+		)
+
+		return
 	}
 
 	if err := plugin.AddPerfData(false, pd...); err != nil {
@@ -402,7 +408,11 @@ func main() {
 		// Surface the error in plugin output.
 		plugin.AddError(err)
 
-		// TODO: Abort plugin execution with UNKNOWN status?
+		plugin.ExitStatusCode = nagios.StateUNKNOWNExitCode
+		plugin.ServiceOutput = fmt.Sprintf(
+			"%s: Failed to process performance data metrics",
+			nagios.StateUNKNOWNLabel,
+		)
 	}
 
 	switch {

This was referenced Mar 2, 2023
@atc0005
Copy link
Owner Author

atc0005 commented Apr 6, 2023

NOTE: Checking off TODO items if project doesn't currently emit performance data.

atc0005 added a commit to atc0005/check-vmware that referenced this issue Jul 10, 2023
- abort with UNKNOWN state when failing to add/gather generated
  performance data
- remove duplicate perfdata add step(s)
  - both the problem & OK state switch case blocks attempted to add
    perfdata metrics. Instead of updating the additional case blocks
    to use an UNKNOWN exit state we just move the step outside of the
    switch statements so the step is performed just once

refs #767
refs atc0005/todo#53
atc0005 added a commit to atc0005/check-vmware that referenced this issue Jul 10, 2023
- abort with UNKNOWN state when failing to add/gather generated
  performance data
- remove duplicate perfdata add step(s)
  - both the problem & OK state switch case blocks attempted to add
    perfdata metrics. Instead of updating the additional case blocks
    to use an UNKNOWN exit state we just move the step outside of the
    switch statements so the step is performed just once

refs #767
refs atc0005/todo#53
@atc0005 atc0005 closed this as completed Jul 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant