improved Txn restart behaviour #221
Conversation
272defc to 313b4dc
// that as a reaction to this error, the transaction's timestamp is forwarded
// appropriately to reflect that node's clock uncertainty. Future reads to
// the same node are therefore freed from uncertainty restarts.
optional SeenNodes certain_nodes = 11 [(gogoproto.nullable) = false];
@spencerkimball we'll need MaxTimestamp for serializability/linearizability; that's why I didn't fold it into SeenNodes.
see #222 for the failed testrace, which is not specific to this branch.
@@ -125,6 +125,24 @@ func (ls *LocalSender) Send(call *client.Call) {
	if err != nil {
		call.Reply.Header().SetGoError(err)
	} else {
		if header := call.Args.Header(); proto.IsReadOnly(call.Method) &&
You still want to do this same thing on some writeable calls. For example, ConditionalPut shouldn't throw a ReadWithinUncertaintyIntervalError twice on txn restart. Eliminate the IsReadOnly check here. It costs nothing to set max_timestamp = timestamp, so there's no point in excepting write-only calls anyway.
done.
I realize there's one more thing you need here to make this correct. You need to change the code in storage/engine/mvcc.go where we return the ReadWithinUncertaintyIntervalError: instead of setting the timestamp of the intent, use the current timestamp of the node. Imagine a scenario of reading two keys ("a" and "b") from Node 1. Both keys are in the uncertainty window, key "a" being at t=1, and "b" being at t=2. The txn reads "a" first and restarts (currently) with t=1. With this change, the max_timestamp is set to t=1 and all other reads to Node 1 will set max_timestamp=1, meaning the subsequent read of "b" will get an old value, even though it is properly within the uncertainty interval. By returning the node's current time, we fix this problem and also get the maximum timestamp, which minimizes the chance of restart when continuing reads in the txn to other nodes. Otherwise, LGTM
The NodeIDs are stored within the Transaction associated with the request. After a node has been stored, future read requests within the same transaction can set MaxTimestamp = Timestamp in order to prevent uncertainty-related restarts. Implementing this should considerably improve performance in the relevant case of a transaction reading a busy key from a node with a fast clock, which would otherwise lead to repeated restarts until MaxTimestamp passes.
Building on the last commit, this implements intelligent handling of MaxTimestamp, taking prior uncertainty-related restarts into account. In such a case, the host is known to the Txn via its NodeID, and the request may be carried out with MaxTimestamp = Timestamp, preventing a further uncertainty-related restart.
TestUncertaintyRestarts verifies that transactional reads within the uncertainty interval cause exactly one restart. The test runs a transaction which attempts to read a single key, but just before that read, a future version of that key is written directly through the MVCC layer. Indirectly this tests that the transaction correctly remembers the NodeID of the node being read from, at least in this simple case. Not remembering the node would lead to thousands of transaction restarts. This can easily be verified by disabling that code in the local sender and running the test.
The benchmark runs N transactions back to back, each writing once to the same key. I've been seeing a bunch of WriteIntentError-related restarts, presumably because cleaning up the intents happens asynchronously. That seems to be expected in such a benchmark. Once Raft is in place, that will likely be the bottleneck.
@spencerkimball, rebased and ready to go. Give it another look if you like, I will merge it over the weekend.
@@ -384,7 +384,8 @@ func mvccGetInternal(engine Engine, key proto.Key, kv proto.RawKeyValue, timesta
	// Second case: Our read timestamp is behind the latest write, but the
	// latest write could possibly have happened before our read in
	// absolute time if the writer had a fast clock.
-	// The reader should try again at meta.Timestamp+1.
+	// The reader should try again with a later timestamp than the
+	// one given below.
As I mentioned in a previous comment, the timestamp we return here needs to be set to this node's current clock, instead of using meta.Timestamp.
LGTM
Notably:
- the range will forward an uncertain read error using the local node's clock, as per spencerkimball's comment
- added a test for that
- updated comments in all affected files to point to the proto.Transaction comment for details on this somewhat intricate interplay.
The NodeIDs of nodes that cause an uncertainty-related restart are now stored
within the Transaction associated with the request.
After a NodeID has been stored, future read requests within the same transaction
will set MaxTimestamp = Timestamp in order to prevent uncertainty-related
restarts for that node.
This should considerably improve performance in the relevant case
of a transaction reading a busy key from a node with a fast clock, which
would otherwise lead to repeated restarts until MaxTimestamp passes.
TestUncertaintyRestarts verifies that transactional reads within the
uncertainty interval cause exactly one restart. The test runs a transaction
which attempts to read a single key, but just before that read, a future
version of that key is written directly through the MVCC layer.
Indirectly this tests that the transaction correctly remembers the NodeID of
the node being read from, at least in this simple case. Not remembering the
node would lead to thousands of transaction restarts.
This can easily be verified by disabling that code in the local sender and
running the test.
I've also refactored createTestDB to return an error instead of taking a
*testing.T, since reusing that code from a benchmark requires it.
Added a simple txn write benchmark.
The benchmark runs N transactions back to back, each writing once to the same
key. This has nothing to do with the uncertainty stuff (and is an isolated commit).
I've been seeing a bunch of WriteIntentError-related restarts, presumably
because cleaning up the intents happens asynchronously. That seems to be
expected in such a benchmark.
Once Raft is in place, that will likely be the bottleneck.