Running Jepsen tests on a pn-counter can produce invalid final reads even with no faults.
Configuration
Antidote in 5 data-centers, each data-center 1 node.
Erlang client with static transactions, 1 sec timeout.
Workload
random mix of increment/decrement and read
random distribution across nodes
5 keys per test
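As a rough illustration of this workload, a generator along the following lines would produce a random mix of operations spread across nodes and keys. This is a sketch in Python, not the test's actual generator; the node names, `gen_ops`, `max_delta`, and the dict shape are all assumptions made for the example.

```python
import random

NODES = ["n1", "n2", "n3", "n4", "n5"]  # assumed names: one node per data center
KEYS = list(range(5))                   # 5 keys per test


def gen_ops(count, seed=0, max_delta=1_000_000):
    """Yield `count` hypothetical operations: a random mix of
    increment / decrement / read, each on a random node and key."""
    rng = random.Random(seed)
    for _ in range(count):
        f = rng.choice(["increment", "decrement", "read"])
        op = {"f": f, "node": rng.choice(NODES), "key": rng.choice(KEYS)}
        if f != "read":
            # Only updates carry a value; reads return one.
            op["value"] = rng.randint(1, max_delta)
        yield op


if __name__ == "__main__":
    for op in gen_ops(5, seed=42):
        print(op)
```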
Test Operation Generators Use Different Counter Strategies
random values
grow-only
swinging between p's and n's
Uses largish unique random values to help in checking the results.
E.g. when calculating all of the possible partial eventual states to evaluate a read, larger values produce a sparser set of possible values than +/-1 increments would.
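The sparsity argument can be made concrete with a small sketch. Assuming a read may reflect any subset of the concurrent updates, the set of acceptable values is the base value plus every subset sum. The helpers `possible_values` and `density` and the sample deltas are hypothetical, chosen only to show the effect.

```python
from itertools import combinations


def possible_values(base, pending):
    """All values a read could return if it reflects `base` plus
    any subset of the `pending` signed deltas."""
    sums = set()
    for r in range(len(pending) + 1):
        for combo in combinations(pending, r):
            sums.add(base + sum(combo))
    return sums


def density(values):
    """Fraction of integers between min and max that are acceptable reads."""
    return len(values) / (max(values) - min(values) + 1)


if __name__ == "__main__":
    unit = possible_values(0, [1, 1, 1, -1, -1, -1])
    large = possible_values(
        0, [100003, 250007, 400009, -150001, -300007, -50021]
    )
    # With +/-1 every integer in the range is acceptable, so a wrong
    # read is hard to tell from a valid one; large values leave gaps.
    print(density(unit), density(large))
```

With unit deltas a bad read almost always lands on an acceptable value, whereas with large unique deltas it almost never does.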
No Faults
no faults
all client operations are :ok
not all increment(s) are always reflected on all node(s)
unacceptable final read on some node(s)
it appears that some operation(s) are replicated to all nodes except the node the operation was performed on, or that a node drops an increment it :ok'd
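A final-read check of the kind described here might look like the sketch below. It assumes every client operation is :ok (as observed above) and that ops are available as dicts whose keys mirror Jepsen's op maps (`type`, `f`, `value`, `node`, `final?`); the function names and the exact shape of update ops are assumptions, not the test's real checker.

```python
def expected_total(history):
    """Sum of all :ok increments minus all :ok decrements."""
    total = 0
    for op in history:
        if op["type"] != "ok":
            continue
        if op["f"] == "increment":
            total += op["value"]
        elif op["f"] == "decrement":
            total -= op["value"]
    return total


def invalid_final_reads(history):
    """Map node -> (final read - expected total) for every final
    read that does not match the expected converged value."""
    want = expected_total(history)
    return {
        op["node"]: op["value"] - want
        for op in history
        if op["type"] == "ok"
        and op["f"] == "read"
        and op.get("final?")
        and op["value"] != want
    }
```

An empty result means every node converged; a non-empty one gives the size of the gap per node.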
Reproducing
~15+% of the tests fail.
This repository is a Docker scripted environment that brings up Antidote and runs Jepsen tests.
(Add recon to Antidote app config if tracing is desired.)
Interpreting the Results
Most of the time using the web interface to the test results is a good first step.
All of the files (including individual node's Antidote logs) are saved with each test run for further analysis.
In the browser:
Test Runs -> Failing Test -> 'independent' (results for each key) -> failed key:
Going to node n2's Antidote log:
results.edn
(This is a case where using larger unique random numbers as increments allows
a single transaction to be identified as the only possible missing value.)
;; looking at node n2's history
;; last valid read
{:type :ok, :f :read, :value 5651785, :monotonic? true, :time 64261497446, :process 1, :node "n2", :index 11335}
;; this read is not possible:
;; - the op's seen by this client plus any combination of other op's cannot equal this value
;; - the difference is equal to dropping a single :ok transaction that happened on this node
{:type :ok, :f :read, :value 5852405, :monotonic? true, :time 66418677772, :process 1, :node "n2", :index 11465}
;; all further reads return same not possible value
{:type :ok, :f :read, :value 5852405, :monotonic? true, :time 68087898812, :process 1, :node "n2", :index 11557}
...
;; final read is invalid, the difference is a missing :ok op on this node
{:index 11889, :value 5852405, :time 75466391766, :process 1, :type :ok, :node "n2", :final? true, :monotonic? true, :f :read}
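The step of tying an invalid read to a single dropped transaction can be sketched as follows. Given the expected value and the value actually read, the gap is compared against each :ok update; with unique values at most one update should match. The function names, the dict shape, and the sample numbers in the usage are hypothetical and are not taken from the run above.

```python
def signed(op):
    """Signed contribution of an update op to the counter."""
    return op["value"] if op["f"] == "increment" else -op["value"]


def missing_candidates(ok_updates, expected, actual, node=None):
    """Return the :ok updates whose signed value equals the gap
    between the expected and the actual read, optionally limited
    to updates performed on `node`."""
    gap = expected - actual
    return [
        op
        for op in ok_updates
        if signed(op) == gap and (node is None or op["node"] == node)
    ]


if __name__ == "__main__":
    updates = [
        {"f": "increment", "value": 200123, "node": "n2"},
        {"f": "decrement", "value": 75011, "node": "n3"},
        {"f": "increment", "value": 310457, "node": "n1"},
    ]
    expected = sum(signed(op) for op in updates)
    # Hypothetical read on n2 that is missing n2's own increment.
    print(missing_candidates(updates, expected, expected - 200123, node="n2"))
```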
Plotting the difference between the read values from each node and the complete total order counter (one can see node n2 not fully updating):
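The data behind such a plot could be derived roughly as in this sketch: walk the history in index order, keep a running total of all :ok updates, and record for each read how far it is from that total. The function name and dict shape are assumptions modeled on the read records in results.edn; the output can be fed to any plotting library.

```python
from collections import defaultdict


def read_lag_by_node(history):
    """Return {node: [(time, read_value - total_so_far), ...]}, where
    total_so_far is the counter implied by all :ok updates that
    precede the read in index order."""
    total = 0
    series = defaultdict(list)
    for op in sorted(history, key=lambda o: o["index"]):
        if op["type"] != "ok":
            continue
        if op["f"] == "increment":
            total += op["value"]
        elif op["f"] == "decrement":
            total -= op["value"]
        elif op["f"] == "read":
            series[op["node"]].append((op["time"], op["value"] - total))
    return dict(series)
```

A node that eventually converges returns to a difference of 0, while a node that has lost an update settles at a constant non-zero offset.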
Please ask if there's any way the tests could be more useful, or any questions.
To run 20 randomly configured tests:
To run tests until a failure (or max 20 times), leave the db running for debugging, and trace an interesting Erlang function call: