Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improved Txn restart behaviour #221

Merged
merged 5 commits into from Dec 20, 2014
Merged

improved Txn restart behaviour #221

merged 5 commits into from Dec 20, 2014

Conversation

tbg
Copy link
Member

@tbg tbg commented Dec 13, 2014

The NodeIDs of nodes that cause an uncertainty related restart are now stored
within the Transaction associated to the request.

After having stored a NodeID, future read requests within the same transaction
will set MaxTimestamp = Timestamp in order to prevent uncertainty related
restarts for that node.

This should considerably impact performance in the relevant case
of a transaction reading a busy key from a node with a fast clock, which
would otherwise lead to repeated restarts until MaxTimestamp passes.

TestUncertaintyRestarts verifies that transactional reads within the
uncertainty interval cause exactly one restart.
The test runs a transaction
which attempts to read a single key, but just before that read, a future
version of that key is written directly through the MVCC layer.
Indirectly this tests that the transaction remembers the NodeID of the node
being read from correctly, at least in this simple case. Not remembering the
node would lead to thousands of transaction restarts.
This can easily be verified by disabling that code in the local sender and
running the test.

I've also refactored createTestDB to return an error instead of taking a *testing.T
because once you want to reuse that code in a benchmark, you will need that.

Added a simple txn write benchmark.
The benchmark runs N transactions back to back, each writing once to the same
key. This has nothing to do with the uncertainty stuff (and is an isolated commit).

I've been seeing a bunch of WriteIntentError related restarts, supposedly
because cleaning up the intents happens asynchronously. This looks like it's
something to expect in such a benchmark to me.

Once Raft is in place, likely the bottleneck will be that.

@tbg tbg force-pushed the certainty branch 2 times, most recently from 272defc to 313b4dc Compare December 13, 2014 23:59
// that as a reaction to this error, the transaction's timestamp is forwarded
// appropriately to reflect that node's clock uncertainty. Future reads to
// the same node are therefore freed from uncertainty restarts.
optional SeenNodes certain_nodes = 11 [(gogoproto.nullable) = false];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spencerkimball we'll need MaxTimestamp for the serializability linearizability, that's why I didn't marry it off into SeenNodes.

@tbg
Copy link
Member Author

tbg commented Dec 14, 2014

see #222 for the failed testrace, which is not specific to this branch.

@@ -125,6 +125,24 @@ func (ls *LocalSender) Send(call *client.Call) {
if err != nil {
call.Reply.Header().SetGoError(err)
} else {
if header := call.Args.Header(); proto.IsReadOnly(call.Method) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still want to do this same thing on some writeable calls. For example, ConditionalPut shouldn't throw a readwithinuncertainty interval error twice on txn restart. Eliminate the IsReadOnly check here. It costs nothing to set max_timestamp = timestamp, so no point in excepting write-only calls anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@spencerkimball
Copy link
Member

I realize there's one more thing you need here to make this correct. You need to change the code in storage/engine/mvcc.go where we return the ReadWithinUncertaintyIntervalError and instead of setting the timestamp of the intent, use the current timestamp of the node instead. Imagine scenario of reading two keys ("a" and "b") from Node 1. Both keys are in the uncertainty window, key "a" being at t=1, and "b" being at t=2. The txn reads "a" first and restarts (currently) with t=1. With this change, the max_timestamp is set to t=1 and all other reads to Node 1 will set max_timestamp=1, meaning the subsequent read of "b" will get an old value, even though it is properly within the uncertainty interval.

By returning the node's current time, we fix this problem and also get the maximum timestamp which will minimize chances of restart when continuing reads in txn to other nodes.

Otherwise, LGTM

the NodeIDs are stored within the Transaction associated to the request.
After having stored a node, future read requests within the same transaction
can set MaxTimestamp = Timestamp in order to prevent uncertainty related
restarts.

Implementing this should considerably impact performance in the relevant case
of transaction reading a busy key from a node with a fast clock, which
otherwise would lead to repeated restarts until MaxTimestamp passes.
Building on the last commit, this implements intelligent handling of
MaxTimestamp, taking into account a prior uncertainty related restarts.
In such a case, the host is known to the Txn via its NodeID, and the
request may be carried out with Timestamp = MaxTimestamp, preventing
a further uncertainty related restart
TestUncertaintyRestarts verifies that transactional reads within the
uncertainty interval cause exactly one restart. The test runs a transaction
which attempts to read a single key, but just before that read, a future
version of that key is written directly through the MVCC layer.

Indirectly this tests that the transaction remembers the NodeID of the node
being read from correctly, at least in this simple case. Not remembering the
node would lead to thousands of transaction restarts.

This can easily be tested by disabling that code in the local sender and
running the test.
The benchmark runs N transactions back to back, each writing once to the same
key.

I've been seeing a bunch of WriteIntentError related restarts, supposedly
because cleaning up the intents happens asynchronously. This looks like it's
something to expect in such a benchmark to me.

Once Raft is in place, likely the bottleneck will be that.
@tbg
Copy link
Member Author

tbg commented Dec 18, 2014

@spencerkimball, rebased and ready to go. Give it another look if you like, I will merge it over the weekend.

@@ -384,7 +384,8 @@ func mvccGetInternal(engine Engine, key proto.Key, kv proto.RawKeyValue, timesta
// Second case: Our read timestamp is behind the latest write, but the
// latest write could possibly have happened before our read in
// absolute time if the writer had a fast clock.
// The reader should try again at meta.Timestamp+1.
// The reader should try again with a later timestamp than the
// one given below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in a previous comment, the timestamp we return here needs to be set the this node's current clock, instead of using meta.Timestamp.

@spencerkimball
Copy link
Member

LGTM

notably,
- the range will forward an uncertain read error using the local node's clock,
  as per spencerkimball's comment
- added a test for that
- updated comments in all affected files to point to the proto.Transaction
  comment for details on this somewhat intricate interplay.
tbg added a commit that referenced this pull request Dec 20, 2014
improved Txn restart behaviour
@tbg tbg merged commit 72e33df into cockroachdb:master Dec 20, 2014
@tbg tbg deleted the certainty branch December 20, 2014 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants