Neutron netns check CRIT/CRITICAL mismatch #94

Open
sudeephb opened this issue Jan 4, 2024 · 3 comments

@sudeephb
Member

sudeephb commented Jan 4, 2024

The included check_netns.sh script for checking neutron gateway netns issues uses "CRIT" to report critical errors, but check_status_file.py expects "CRITICAL".

The former script is here, and writes its output to a file when called by cron, so the "exit $STATE_CRIT" never reaches the check, which can only read the contents of the log file:

https://git.launchpad.net/charm-nrpe/tree/files/plugins/check_netns.sh

An error state will be recorded in the log file as e.g. "CRIT: [...] aren't responding".
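To illustrate why the exit status is lost, here is a minimal sketch, not the charm's actual cron setup. A child process stands in for check_netns.sh, the temporary file stands in for the log file, and the exit status of 2 is an assumption based on the conventional Nagios critical code. Only the printed text ends up in the file:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for check_netns.sh: prints a CRIT line, exits with 2.
child = "import sys; print(\"CRIT: [...] aren't responding\"); sys.exit(2)"

with tempfile.TemporaryDirectory() as tmp:
    status_file = Path(tmp) / "netns-check.txt"  # stand-in for the real log file
    with status_file.open("w") as out:
        # Like cron redirecting the script's output into a file.
        result = subprocess.run([sys.executable, "-c", child], stdout=out)
    exit_status = result.returncode         # seen only by the caller, never stored
    file_content = status_file.read_text()  # all that a later reader of the file gets

print(exit_status)
print(file_content, end="")
```

Anything that reads the file afterwards has to infer the state from the text alone.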

But the script used by the check, check_status_file.py, only looks for "CRITICAL" in the output, so it doesn't match "CRIT":

https://git.launchpad.net/charm-nrpe/tree/files/plugins/check_status_file.py
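The mismatch can be sketched as follows. This is a hypothetical keyword matcher written for illustration, not the real check_status_file.py code. The function name, its parameters and the use of a regular-expression search are all assumptions; the return values follow the usual Nagios convention (0 OK, 1 WARNING, 2 CRITICAL):

```python
import re

def classify_status(text, critical_text="CRITICAL", warning_text="WARNING"):
    """Hypothetical matcher: map status-file text to a Nagios-style code."""
    if re.search(critical_text, text):
        return 2  # CRITICAL
    if re.search(warning_text, text):
        return 1  # WARNING
    return 0      # nothing matched, so the state is treated as OK

log_line = "CRIT: [...] aren't responding"

default_result = classify_status(log_line)                         # looks for "CRITICAL"
adjusted_result = classify_status(log_line, critical_text="CRIT")  # looks for "CRIT"
print(default_result, adjusted_result)
```

Under these assumptions, the default keyword never matches the "CRIT:" line, so the failure goes unreported until the critical keyword is changed to "CRIT".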


Imported from Launchpad using lp2gh.

  • date created: 2023-01-20T05:54:44Z

  • owner: barryprice

  • assignee: None

  • the launchpad url

@sudeephb
Member Author

sudeephb commented Jan 4, 2024

(by barryprice)
Marking this as also affecting the neutron-gateway charm, as that's where I came across it.

If desired, this could be worked around in that charm's code with something like:

-    check_cmd='check_status_file.py -f /var/lib/nagios/netns-check.txt'
+    check_cmd='check_status_file.py -c CRIT -f /var/lib/nagios/netns-check.txt'

https://opendev.org/openstack/charm-neutron-gateway/src/branch/master/hooks/neutron_hooks.py#L347

Happy to propose that fix if desired.

@sudeephb
Member Author

sudeephb commented Jan 4, 2024

(by barryprice)
I'm actually coming around to the idea that this may be a bad check: I've found it in CRIT state on other clouds which appear to be working fine.

Have marked the linked MP as WIP while we investigate further, but it may be that a better fix is to either remove or update check_netns.sh rather than to trust its current output.

@sudeephb
Member Author

sudeephb commented Jan 4, 2024

(by barryprice)
I don't think Launchpad allows blocking one bug against another, but I believe the linked bug here should be addressed before we decide what to do with this one:

https://bugs.launchpad.net/charm-nrpe/+bug/2003641
