Prerequisites and Requirements
Operating System Requirements
Only Debian/Ubuntu Linux is supported as a platform for the master and tablet server nodes. Tested to work on Debian 8 Jessie.
The Clojure code is integrated into the project using the
The kudu-jepsen tests are invoked by executing the
plugin-specific goal. The parameters are passed via the standard
-D<property>=<value> notation. There is a dedicated Clojure wrapper script
populates the test environment with appropriate properties and iteratively
runs all the registered tests with different nemeses scenarios.
To build the library the following components are required:
Apache Maven version 3.3.6 or higher
To build the project, run in the parent directory (i.e.
$ mvn clean compile test-compile -Pjepsen
The machines for the Kudu master and tserver nodes must be created before running the test: the tests do not create them. The machines should be up and running when the test starts.
To run the test, the following components are required at the control node:
Apache Maven version 3.3.6 or higher
SSH client (and optionally, SSH authentication agent)
gnuplot (to visualize test results)
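Before starting a run, the presence of these tools on the control node can be checked with a few lines of shell. This is a minimal sketch, not part of kudu-jepsen: the helper name `missing_tools` is hypothetical, and the check only confirms that the executables are on `PATH`. It does not verify the Maven version.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of kudu-jepsen.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named in the list above: Maven, an SSH client and gnuplot.
missing="$(missing_tools mvn ssh gnuplot)"
if [ -n "$missing" ]; then
    echo "Missing on the control node:" $missing
else
    echo "All required tools found"
fi
```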
Jepsen uses SSH to perform operations on the DB nodes. kudu-jepsen assumes that the SSH keys are installed accordingly:
The public part of the SSH key should be added into the
authorized_keys file at all DB nodes for the
For the SSH private key the options are:
Add the key to the SSH authentication agent running at the control node
Specify the path to the file with the key in plain (non-encrypted) format via the
If an SSH authentication agent holds the SSH key used to access the DB nodes, run in the current directory:
$ mvn clojure:run -DtserverNodes="t0,t1,t2,t3,t4" -DmasterNodes="m0"
If not using SSH authentication agent, specify the SSH key location via the
$ mvn clojure:run -DtserverNodes="t0,t1,t2,t3,t4" -DmasterNodes="m0" -DsshKeyPath="./vm_root_id_rsa"
Note that commas (not spaces) are used to separate the names of the nodes. The DNS resolver should be properly configured to resolve the specified hostnames into IP addresses.
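Name resolution can be verified from the control node before the test starts. The sketch below assumes `getent` is available (it normally is on Debian/Ubuntu); the helper names `split_nodes` and `unresolved_nodes` are hypothetical and not part of kudu-jepsen.

```shell
#!/bin/sh
# Split a comma-separated node list (the format used by -DtserverNodes and
# -DmasterNodes) into one name per line.
split_nodes() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Print every node name that the system resolver cannot resolve.
# Assumption: getent is installed on the control node.
unresolved_nodes() {
    split_nodes "$1" | while read -r node; do
        [ -n "$node" ] || continue
        getent hosts "$node" >/dev/null 2>&1 || echo "$node"
    done
}
```

For the placeholder names used in this document, `unresolved_nodes "t0,t1,t2,t3,t4,m0"` prints the names that still need a DNS or hosts-file entry, and prints nothing when all of them resolve.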
tserverNodes property is used to specify the set of nodes where to run
Kudu tablet servers. The
masterNodes property is used to specify the set of
nodes to run Kudu master servers.
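When the test is launched from scripts, the command line can be assembled from the node lists and the optional key path. The following is only an illustrative sketch: the function name `jepsen_cmd` is hypothetical, while the property names are the ones shown in the commands above.

```shell
#!/bin/sh
# Print the mvn command line for a kudu-jepsen run.
#   $1: comma-separated tablet server nodes
#   $2: comma-separated master nodes
#   $3: optional path to an unencrypted SSH private key; when omitted,
#       the key is expected to be held by the SSH authentication agent
# (jepsen_cmd is a hypothetical helper, not part of kudu-jepsen.)
jepsen_cmd() {
    cmd="mvn clojure:run -DtserverNodes=\"$1\" -DmasterNodes=\"$2\""
    if [ -n "${3:-}" ]; then
        cmd="$cmd -DsshKeyPath=\"$3\""
    fi
    printf '%s\n' "$cmd"
}

# With an SSH authentication agent:
jepsen_cmd "t0,t1,t2,t3,t4" "m0"
# With an explicit key file:
jepsen_cmd "t0,t1,t2,t3,t4" "m0" "./vm_root_id_rsa"
```

The function only prints the command, so the result can be reviewed before it is executed.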
In Jepsen terminology, the Kudu master and tserver nodes play the role of Jepsen DB nodes. The machine where the above-mentioned Maven command is run plays the role of the Jepsen control node.
When Jepsen’s analysis doesn’t find inconsistencies in the history of operations, it outputs the following at the end of a test:
Everything looks good! ヽ(‘ー`)ノ
However, this might not be the case. If a test fails, it’s crucial to understand why.
The majority of the kudu-jepsen test failures can be put into two classification buckets:
An error happened while setting up the testing environment, contacting machines in the Kudu cluster, starting up Kudu server-side components, or in any of the third-party components Jepsen uses (like clj-ssh), etc.
Jepsen’s analysis detected an inconsistent history of operations.
The former class of failures might be a manifestation of a wrong configuration, a problem with the test environment, a bug in the test code itself, or some other intermittent failure. Usually, encountering such issues means the consistency analysis (which is the last step of a test scenario) cannot run. Such issues are reported as errors in the summary message. E.g., the example summary message below reports 10 errors in 10 tests run:
21:41:42 Ran 10 tests containing 10 assertions.
21:41:42 0 failures, 10 errors.
To get more details, take a closer look at the output of
or at particular
jepsen.log files under
The latter class represents a more serious issue: a manifestation of a non-linearizable history of operations. This is reported as a failure in the summary message. E.g., the summary message below reports 2 instances of non-linearizable history among 10 tests run:
22:21:52 Ran 10 tests containing 10 assertions.
22:21:52 2 failures, 0 errors.
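The two classes can be told apart automatically from the summary line. The sketch below assumes the summary keeps the "N failures, M errors." wording shown in the examples; the function name `classify_summary` and the labels it prints are hypothetical.

```shell
#!/bin/sh
# Classify a summary line of the form "<time> <F> failures, <E> errors."
# Prints one of: inconsistent-history, setup-error, ok, unrecognized.
# A non-zero failure count takes priority because it is the more serious case.
# (classify_summary and its labels are hypothetical, not part of kudu-jepsen.)
classify_summary() {
    failures="$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) failures.*/\1/p')"
    errors="$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) errors.*/\1/p')"
    if [ -z "$failures" ] || [ -z "$errors" ]; then
        echo "unrecognized"
        return 1
    fi
    if [ "$failures" -gt 0 ]; then
        echo "inconsistent-history"
    elif [ "$errors" -gt 0 ]; then
        echo "setup-error"
    else
        echo "ok"
    fi
}

classify_summary "21:41:42 0 failures, 10 errors."
classify_summary "22:21:52 2 failures, 0 errors."
```

Applied to the two example summaries, the first call reports a setup or environment problem and the second reports an inconsistent history.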
If Jepsen’s analysis finds a non-linearizable history of operations, it outputs the following at the end of a test:
Analysis invalid! (ﾉಥ益ಥ）ﾉ ┻━┻
To troubleshoot, first find where the failed test stored its results:
it should be one of the timestamp-named sub-directories under
$KUDU_HOME/java/kudu-jepsen/store/rw-register. One possible way
to find the directory:
$ cd $KUDU_HOME/java/kudu-jepsen/store/rw-register
$ find . -name jepsen.log | xargs grep 'Analysis invalid'
./20170109T071938.000-0800/jepsen.log:Analysis invalid! (ﾉಥ益ಥ）ﾉ ┻━┻
$
Another way is to find sub-directories where the
linear.svg file is present:
$ cd $KUDU_HOME/java/kudu-jepsen/store/rw-register
$ find . -name linear.svg
./20170109T071938.000-0800/linear.svg
$
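When many runs have accumulated in the store, it can be convenient to list only the result directories of failed runs. The following is a minimal sketch of the log-based search above wrapped in a function; the name `failed_runs` is hypothetical, and the store path is the one given in this section.

```shell
#!/bin/sh
# List, in sorted order, the directories under a store directory whose
# jepsen.log contains the 'Analysis invalid' marker.
# (failed_runs is a hypothetical helper, not part of kudu-jepsen.)
failed_runs() {
    find "$1" -name jepsen.log -exec grep -l 'Analysis invalid' {} + |
        while read -r log; do
            dirname "$log"
        done | sort
}

# Run the search only if the store directory exists.
store="${KUDU_HOME:-.}/java/kudu-jepsen/store/rw-register"
if [ -d "$store" ]; then
    failed_runs "$store"
fi
```

Each printed directory is a starting point for the analysis of that run's output files.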
history.txt files the failed test generates
linear.svg file (gnuplot is required for that). The diagram in
illustrates the part of the history which Jepsen found inconsistent:
the diagram shows the time/client operation status/system state relationship
and the sequences of legal/illegal operations paths. From this point, the next
step is to locate the corresponding part of the history in the
file. Usually the problem appears around an activation interval of the test
nemesis scenario. Once found, it’s possible to tie the vicinity of the
inconsistent operation sequence with the timestamps in the
Having the timestamps of the operations and their sequence, it’s possible to
find the related messages in
kudu-master.log log files
in sub-directories named as Kudu cluster nodes. Hopefully, that information
is enough to create a reproducible scenario for further troubleshooting