Add capability for check_ceph_health to look at specific checks
This lets check_ceph_health select specific monitor health checks
(https://docs.ceph.com/en/latest/rados/operations/health-checks/)
by matching their names against a regexp. It can be combined with the whitelist.

Possible use cases are running separate Nagios checks for different health
checks, or excluding something specific, such as flags, from a given check.

With the right regexp the filter can also act as a blacklist; an example is
included below.

I also added the nagios check command check_ceph_health_filtered, which takes
`--check` and `--whitelist` as arguments, since I can imagine it being useful
and the two work well together. Leaving either argument empty should behave the
same as omitting it.
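
For `--check`, the empty-versus-omitted behaviour follows from the guard in the script, which only applies the regexp when `args.check` is truthy. A minimal sketch, assuming a stripped-down parser that mirrors only the `--check` option; the helper name `is_selected` is hypothetical and not part of the plugin:

```python
import argparse
import re


def is_selected(check_regexp, check_name):
    """Return True if the check should be reported (hypothetical helper)."""
    # Same guard as in the plugin: None and '' are both falsy,
    # so neither of them filters anything out.
    if check_regexp and not re.search(check_regexp, check_name):
        return False
    return True


# Assumption: a reduced parser with only the --check option.
parser = argparse.ArgumentParser()
parser.add_argument('--check')

omitted = parser.parse_args([]).check             # argument not given
empty = parser.parse_args(['--check', '']).check  # argument given but empty

for value in (omitted, empty):
    print(repr(value), is_selected(value, 'MON_CLOCK_SKEW'))
```

Both calls report the check as selected, which is why an empty `$ARG1$` in the Nagios command definition should not change the result.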

Some dummy examples:
$ ./check_ceph_health
WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a )
OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) )
PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded )

$ ./check_ceph_health --check 'PG_DEGRADED|OBJECT_MISPLACED'
WARNING: OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) )
PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded )

$ ./check_ceph_health --check '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$'
WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a )
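
The two regexps from the examples can be tried in isolation. The following sketch only assumes that the check names are matched with `re.search`, as in the changed script; the list of names is taken from the dummy output above:

```python
import re

# Check names from the dummy examples above.
checks = ['MON_CLOCK_SKEW', 'OBJECT_MISPLACED', 'PG_DEGRADED']

# Plain alternation: keep only the named checks.
select = 'PG_DEGRADED|OBJECT_MISPLACED'

# Negative lookahead at every position: keep names that do NOT
# contain either alternative, which turns the filter into a blacklist.
invert = '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$'

selected = [c for c in checks if re.search(select, c)]
remaining = [c for c in checks if re.search(invert, c)]

print(selected)   # ['OBJECT_MISPLACED', 'PG_DEGRADED']
print(remaining)  # ['MON_CLOCK_SKEW']
```

Note that the inverted pattern rejects any name that contains one of the alternatives anywhere, not only exact matches.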

PS: I am not quite happy with the argument name and open to suggestions and
candidates for a short version of the `--check` argument. (`-hc`?)
Christian Kugler committed Sep 21, 2020
1 parent 3e9591a commit f33dc2b
Showing 3 changed files with 26 additions and 1 deletion.
15 changes: 15 additions & 0 deletions README.md
@@ -34,6 +34,8 @@ The `check_ceph_health` nagios plugin monitors the ceph cluster, and report its
-n NAME, --name NAME ceph client name
-k KEYRING, --keyring KEYRING
ceph client keyring file
--check CHECK regexp of which check(s) to check (luminous+) Can be
inverted, e.g. '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$'
-w, --whitelist REGEXP
whitelist regexp for ceph health warnings
-d, --detail exec 'ceph health detail'
@@ -49,6 +51,19 @@ The `check_ceph_health` nagios plugin monitors the ceph cluster, and report its

nagios$ ./check_ceph_health --id nagios --whitelist 'requests.are.blocked(\s)*32.sec'

nagios$ ./check_ceph_health --id nagios
WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a )
OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) )
PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded )

nagios$ ./check_ceph_health --id nagios --check 'PG_DEGRADED|OBJECT_MISPLACED'
WARNING: OBJECT_MISPLACED( 1937172/695961284 objects misplaced (0.278%) )
PG_DEGRADED( Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded )

nagios$ ./check_ceph_health --id nagios --check '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$'
WARNING: MON_CLOCK_SKEW( clock skew detected on mon.a )


## check_ceph_mon

The `check_ceph_mon` nagios plugin monitors an individual mon daemon, reporting its status.
4 changes: 4 additions & 0 deletions config/ceph.cfg
@@ -7,6 +7,10 @@ define command{
command_name check_ceph_health_wargs
command_line /usr/lib/nagios/plugins/check_ceph_health -H '$HOSTADDRESS$'
}
define command{
command_name check_ceph_health_filtered
command_line /usr/lib/nagios/plugins/check_ceph_health -H '$HOSTADDRESS$' --check '$ARG1$' --whitelist '$ARG2$'
}
define command{
command_name check_ceph_mon
command_line /usr/lib/nagios/plugins/check_ceph_mon -I '$ARG1$'
8 changes: 7 additions & 1 deletion src/check_ceph_health
@@ -23,7 +23,7 @@ import sys
import re
import json

-__version__ = '1.5.2'
+__version__ = '1.6.0'

# default ceph values
CEPH_COMMAND = '/usr/bin/ceph'
@@ -45,6 +45,8 @@ def main():
parser.add_argument('-i','--id', help='ceph client id')
parser.add_argument('-n','--name', help='ceph client name')
parser.add_argument('-k','--keyring', help='ceph client keyring file')
parser.add_argument('--check', help='regexp of which check(s) to check (luminous+) '
"Can be inverted, e.g. '^((?!PG_DEGRADED|OBJECT_MISPLACED).)*$'")
parser.add_argument('-w','--whitelist', help='whitelist regexp for ceph health warnings')
parser.add_argument('-d','--detail', help="exec 'ceph health detail'", action='store_true')
parser.add_argument('-V','--version', help='show version and exit', action='store_true')
@@ -114,6 +116,10 @@ def main():
if output.has_key('checks'):
    #luminous
    for check,status in output['checks'].iteritems():
        # skip check if not selected
        if args.check and not re.search(args.check, check):
            continue

        if status["severity"] == "HEALTH_ERR":
            extended += msg
            msg = "CRITICAL: %s( %s )" % (check,status['summary']['message'])
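
The hunk above relies on `dict.has_key()` and `dict.iteritems()`, which exist only in Python 2. As a rough sketch of how the same selection step might look under Python 3, the function below reads only the fields the diff shows (`checks`, `severity`, `summary.message`). The name `filter_checks` and the sample data are assumptions for illustration, not part of the plugin:

```python
import re


def filter_checks(output, check_regexp=None):
    """Return {check name: (severity, message)} for the selected checks.

    Hypothetical helper; the plugin builds its message strings inline.
    """
    selected = {}
    if 'checks' in output:  # Python 3 replacement for output.has_key('checks')
        # .items() replaces the Python 2 only .iteritems()
        for check, status in output['checks'].items():
            # skip check if not selected
            if check_regexp and not re.search(check_regexp, check):
                continue
            selected[check] = (status['severity'], status['summary']['message'])
    return selected


# Assumed sample data, shaped after the fields read in the diff.
sample = {
    'checks': {
        'MON_CLOCK_SKEW': {
            'severity': 'HEALTH_WARN',
            'summary': {'message': 'clock skew detected on mon.a'},
        },
        'PG_DEGRADED': {
            'severity': 'HEALTH_WARN',
            'summary': {'message': 'Degraded data redundancy: 98/695961284 objects degraded (0.000%), 1 pg degraded'},
        },
    }
}

print(list(filter_checks(sample, 'PG_DEGRADED')))  # ['PG_DEGRADED']
print(list(filter_checks(sample)))                 # ['MON_CLOCK_SKEW', 'PG_DEGRADED']
```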