Skip to content

basho-labs/riak_nagios

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

85 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A Repository for reusable nagios monitoring scripts.

A Basic Nagios Primer

  • A Nagios check is a shell script

  • This is the documentation for them: http://nagiosplug.sourceforge.net/developer-guidelines.html

  • The most important part to a nagios script is the return codes.

    0 OK
    1 Warning
    2 Critical
    3 Unknown

  • Can be written in any language as long as the exit codes are right.

    • Perl is the standard
    • Python / Bash is usually installed by default, and easier to work with
    • Ruby is not usually installed by default

Riak Nagios

Building

make escript

Deployment

scp check_node root@host.net:/usr/sbin/

Note: This script creates an .erlang.cookie inside the user's home directory. This can cause issues with users with no home directories set by default, or users with non-writable home directories.

NRPE

Riak related checks are configured in /etc/nagios/nrpe.d/riak.cfg, for example:

command[check_riak_up]=/usr/lib/riak/erts-5.8.5/bin/escript check_node --node riak@host.net riak_kv_up
command[check_riak_repl]=/usr/lib/riak/erts-5.8.5/bin/escript check_node --node riak@host.net riak_repl
command[check_riak_cs_up]=/usr/lib/riak/erts-5.8.5/bin/escript check_node --node riak_cs@host.net node_up

After any change to this file, you'll need to restart the nrpe server. On ubuntu, that's as easy as service nagios-nrpe-server restart

You can find the current nrpe config files in the cfg directory

Testing NRPE

Once it's configured, you can use this nrpe plugin to test. If you see any of the possible status messages, and not the NRPE: Unable to read output error, your nrpe monitors are working.

/usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_riak_up
/usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_riak_repl

Response

check_riak_up

  • On critical alert: Riak needs to be restarted by signing in and running riak start. If alerts continue to be triggered after restarting Riak, open a ticket with Basho.

check_riak_repl

  • On critical alert: the check_node script received an unexpected return from Riak and further investigation is required. Open a ticket with Basho.
  • On warning alert: the check_node script detected a socket error. The script automatically informs Riak of the error and Riak resets the connection. No action is required.