riaknostic is an escript and set of tools that diagnoses common problems which could affect a Riak node or cluster. When experiencing any problem with Riak,
riaknostic should be the first thing run during troubleshooting. The tool is integrated with Riak via the
To diagnose problems with Riak, Riaknostic uses a series of checks which are derived from the experience of the Basho Client Services Team as well as numerous public discussions on the mailing list, IRC room, and other online media.
Here is a basic example of using
riaknostic followed immediately by the command's output:
$ riak-admin diag 15:34:52.736 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file. 15:34:52.736 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
As shown in the above output, Riaknostic tells us about two problems right away. First, an Erlang crash dump is present, indicating that Riak has experienced a crash. Second, a performance problem is mentioned (disk mounted without
noatime argument)along with a helpful tip to resolve the issue.
Important: If you are running Riak v1.3.0 or greater, you already have Riaknostic, so you can skip to the Usage section below.
Riaknostic depends on features introduced by Erlang version R14B04, so verify that you've installed this version of Erlang before proceeding with installation.
riaknostic, download the latest package version, and extract it within the directory shown for your operating system in the following table:
|Linux (Redhat, CentOS, Debian, Ubuntu)||
|Mac OS/X or Self-built||
An example Riaknostic installation for Linux looks like this:
wget https://github.com/basho/riaknostic/downloads/riaknostic-1.0.2.tar.gz -P /tmp cd /usr/lib/riak/lib sudo tar xzvf /tmp/riaknostic-1.0.2.tar.gz
The package will expand to a
riaknostic/ directory which contains the
riaknostic script, source code in the
src/ directory, and documentation.
Now try it out!
For most cases, you can just run the
riak-admin diag command as given at the top of this README. However, sometimes you might want to know some extra detail or run only specific checks. For that, there are command-line options. Execute
riaknostic --help to learn more about these options:
riak-admin diag --help Usage: riak-admin diag [-d <level>] [-l] [-h] [check_name ...] -d, --level Minimum message severity level (default: notice) -l, --list Describe available diagnostic tasks -h, --help Display help/usage check_name A specific check to run
To get an idea of what checks will be run, use the
riak-admin diag --list Available diagnostic checks: disk Data directory permissions and atime dumps Find crash dumps memory_use Measure memory usage nodes_connected Cluster node liveness ring_membership Cluster membership validity ring_size Ring size valid
If you want all the gory details about what Riaknostic is doing, you can run the checks at a more verbose logging level with the --level option:
riak-admin diag --level debug 18:34:19.708 [debug] Lager installed handler lager_console_backend into lager_event 18:34:19.720 [debug] Lager installed handler error_logger_lager_h into error_logger 18:34:19.720 [info] Application lager started on node nonode@nohost 18:34:20.736 [debug] Not connected to the local Riak node, trying to connect. alive:false connect_failed:undefined 18:34:20.737 [debug] Starting distributed Erlang. 18:34:20.740 [debug] Supervisor net_sup started erl_epmd:start_link() at pid <0.42.0> 18:34:20.742 [debug] Supervisor net_sup started auth:start_link() at pid <0.43.0> 18:34:20.771 [debug] Supervisor net_sup started net_kernel:start_link(['email@example.com',longnames]) at pid <0.44.0> 18:34:20.771 [debug] Supervisor kernel_sup started erl_distribution:start_link(['firstname.lastname@example.org',longnames]) at pid <0.41.0> 18:34:20.781 [debug] Supervisor inet_gethost_native_sup started undefined at pid <0.49.0> 18:34:20.782 [debug] Supervisor kernel_safe_sup started inet_gethost_native:start_link() at pid <0.48.0> 18:34:20.834 [debug] Connected to local Riak node 'email@example.com'. 18:34:20.939 [debug] Local RPC: os:getpid()  18:34:20.939 [debug] Running shell command: ps -o pmem,rss,command -p 83144 18:34:20.946 [debug] Shell command output: %MEM RSS COMMAND 0.4 31004 /srv/riak/erts-5.8.4/bin/beam.smp -K true -A 64 -W w -- -root /srv/riak/rel/riak -progname riak -- -home /Users/sean -- -boot /srv/riak/releases/1.0.2/riak -embedded -config /srv/riak/etc/app.config -name firstname.lastname@example.org -setcookie riak -- console 18:34:20.960 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file. 18:34:20.961 [notice] Data directory /srv/riak/data/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance. 18:34:20.961 [info] Riak process is using 0.4% of available RAM, totalling 31004 KB of real memory.
Most times you'll want to use the defaults, but any Syslog severity name will do (from most to least verbose):
debug, info, notice, warning, error, critical, alert, emergency.
Finally, if you want to run just a single diagnostic or a list of specific ones, you can pass their name(s):
riak-admin diag dumps 18:41:24.083 [warning] Riak crashed at Wed, 07 Dec 2011 21:47:50 GMT, leaving crash dump in /srv/riak/log/erl_crash.dump. Please inspect or remove the file.
- Read DEVELOPMENT.md
- Fork the project on Github.
- Make your changes or additions on a "topic" branch, test and document them. If you are making a new diagnostic, make sure you give some module-level information about the checks it performs. Note: diagnostics should not make modifications to Riak, only inspect things.
- Push to your fork and send a pull-request.
- A Basho Developer Advocate or Engineer will review your pull-request and get back to you.