improved Txn restart behaviour #221
Conversation
272defc to 313b4dc
// that as a reaction to this error, the transaction's timestamp is forwarded
// appropriately to reflect that node's clock uncertainty. Future reads to
// the same node are therefore freed from uncertainty restarts.
optional SeenNodes certain_nodes = 11 [(gogoproto.nullable) = false];
@spencerkimball we'll need MaxTimestamp for serializability/linearizability; that's why I didn't fold it into SeenNodes.
see #222 for the failed testrace, which is not specific to this branch.
@@ -125,6 +125,24 @@ func (ls *LocalSender) Send(call *client.Call) {
	if err != nil {
		call.Reply.Header().SetGoError(err)
	} else {
		if header := call.Args.Header(); proto.IsReadOnly(call.Method) &&
You still want to do this same thing on some writeable calls. For example, ConditionalPut shouldn't throw a ReadWithinUncertaintyIntervalError twice on txn restart. Eliminate the IsReadOnly check here. It costs nothing to set max_timestamp = timestamp, so there's no point in excepting write-only calls anyway.
done.
I realize there's one more thing you need here to make this correct. You need to change the code in storage/engine/mvcc.go where we return the ReadWithinUncertaintyIntervalError: instead of setting the timestamp of the intent, use the current timestamp of the node. Imagine a scenario of reading two keys ("a" and "b") from Node 1. Both keys are in the uncertainty window, key "a" being at t=1, and "b" being at t=2. The txn reads "a" first and restarts (currently) with t=1. With this change, the max_timestamp is set to t=1 and all other reads to Node 1 will set max_timestamp=1, meaning the subsequent read of "b" will get an old value, even though it is properly within the uncertainty interval. By returning the node's current time, we fix this problem and also get the maximum timestamp, which minimizes the chance of restart when continuing reads in the txn to other nodes. Otherwise, LGTM
The NodeIDs are stored within the Transaction associated with the request. After a node has been stored, future read requests within the same transaction can set MaxTimestamp = Timestamp in order to prevent uncertainty-related restarts. Implementing this should considerably improve performance in the relevant case of a transaction reading a busy key from a node with a fast clock, which would otherwise lead to repeated restarts until MaxTimestamp passes.
Building on the last commit, this implements intelligent handling of MaxTimestamp, taking prior uncertainty-related restarts into account. In such a case, the host is known to the Txn via its NodeID, and the request may be carried out with MaxTimestamp = Timestamp, preventing a further uncertainty-related restart.
TestUncertaintyRestarts verifies that transactional reads within the uncertainty interval cause exactly one restart. The test runs a transaction which attempts to read a single key, but just before that read, a future version of that key is written directly through the MVCC layer. Indirectly this tests that the transaction correctly remembers the NodeID of the node being read from, at least in this simple case. Not remembering the node would lead to thousands of transaction restarts. This can easily be verified by disabling that code in the local sender and running the test.
The benchmark runs N transactions back to back, each writing once to the same key. I've been seeing a bunch of WriteIntentError-related restarts, presumably because cleaning up the intents happens asynchronously. That seems to be expected in such a benchmark. Once Raft is in place, that will likely be the bottleneck.
@spencerkimball, rebased and ready to go. Give it another look if you like, I will merge it over the weekend.
@@ -384,7 +384,8 @@ func mvccGetInternal(engine Engine, key proto.Key, kv proto.RawKeyValue, timesta
	// Second case: Our read timestamp is behind the latest write, but the
	// latest write could possibly have happened before our read in
	// absolute time if the writer had a fast clock.
-	// The reader should try again at meta.Timestamp+1.
+	// The reader should try again with a later timestamp than the
+	// one given below.
As I mentioned in a previous comment, the timestamp we return here needs to be set to this node's current clock, instead of using meta.Timestamp.
LGTM
Notably:
- the range will forward an uncertain read error using the local node's clock, as per spencerkimball's comment
- added a test for that
- updated comments in all affected files to point to the proto.Transaction comment for details on this somewhat intricate interplay.
The NodeIDs of nodes that cause an uncertainty-related restart are now stored
within the Transaction associated with the request.
After a NodeID has been stored, future read requests within the same transaction
will set MaxTimestamp = Timestamp in order to prevent uncertainty-related
restarts for that node.
This should considerably improve performance in the relevant case
of a transaction reading a busy key from a node with a fast clock, which
would otherwise lead to repeated restarts until MaxTimestamp passes.
TestUncertaintyRestarts verifies that transactional reads within the
uncertainty interval cause exactly one restart. The test runs a transaction
which attempts to read a single key, but just before that read, a future
version of that key is written directly through the MVCC layer.
Indirectly this tests that the transaction correctly remembers the NodeID of
the node being read from, at least in this simple case. Not remembering the
node would lead to thousands of transaction restarts.
This can easily be verified by disabling that code in the local sender and
running the test.
I've also refactored createTestDB to return an error instead of taking a
*testing.T, since reusing that code from a benchmark requires it.
Added a simple txn write benchmark.
The benchmark runs N transactions back to back, each writing once to the same
key. This has nothing to do with the uncertainty stuff (and is an isolated commit).
I've been seeing a bunch of WriteIntentError-related restarts, presumably
because cleaning up the intents happens asynchronously. That seems to be
expected in such a benchmark.
Once Raft is in place, that will likely be the bottleneck.