Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: augment command queue with timestamps for non-interfering reads #14342

Merged
merged 1 commit into from Mar 24, 2017

Conversation

spencerkimball
Copy link
Member

@spencerkimball spencerkimball commented Mar 23, 2017

Previously, the command queue considered any two commands to overlap if the
key ranges overlapped, regardless of whether the timestamps on the two
commands guaranteed they were non-overlapping.

Consider the case of a read over key ranges "a" - "z" at timestamp t=1s.
If a write arrives for key "c" at timestamp > t=1s, then it should be
free to proceed regardless of the disposition of the read because the
read cannot affect the write's timestamp.

Similarly, if a write is active for key "c" at timestamp t=1s, a read
that arrives for key range "a" - "z" should be free to proceed as long
as its timestamp is < t=1s.

Note that local keys do not allow non-interference between earlier reads
and overlapping writes.

Fixes #14298


This change is Reviewable

@spencerkimball
Copy link
Member Author

+cc @danhhz @dt

@tbg
Copy link
Member

tbg commented Mar 23, 2017

The read can't affect the later write, but the write can affect the read's outcome because of uncertainty errors. With the change (didn't look at the code, just the PR message) it seems that we could fail to let uncertain reads fail, which in turn would lead to anomalies.

@spencerkimball
Copy link
Member Author

@tschottdorf I don't think this is an anomaly. There is no way for the concurrent write at the higher timestamp to return and send a back channel message, back in time, to the reader such that the reader experiences a "does not read what was written previously in absolute time".

A less tongue in cheek way to say this is that the uncertainty interval errors protect us from failing to read something that provably happened before the read started.

Here's another way to look at it: imagine that the write happens at t=10, and the max offset = 0.25s. The case you bring up is that the read and write are concurrent, with read at t=9.90. You're worried that if the read is allowed to proceed with this change, the behavior will change from getting an uncertainty interval error to not getting one. But the same thing is already true if the read arrives just slightly before the write, finishes without any uncertainty error for t=9.90, and then the write arrives and puts a value at t=10.

The reason we return the uncertainty errors is to avoid the failed external guarantee of not reading what some observer can prove already was written. That's not the case with concurrent reads and writes.

@danhhz
Copy link
Contributor

danhhz commented Mar 23, 2017

I tested this locally and can confirm it seems to fix #14298. I'll also push it to lapis to make extra sure

@danhhz
Copy link
Contributor

danhhz commented Mar 23, 2017

Works on lapis, too 👌

@tamird
Copy link
Contributor

tamird commented Mar 23, 2017

Reviewed 2 of 4 files at r1.
Review status: 2 of 4 files reviewed at latest revision, 5 unresolved discussions.


pkg/storage/command_queue.go, line 309 at r1 (raw file):

		//   We solve this problem by always including read commands with timestamps less than
		//   the latest write timestamp seen so far, which guarantees that we will wait on any
		//   reads which might not be dependend on by writes with higher command IDs. Similarly,

command IDs?


pkg/storage/command_queue.go, line 337 at r1 (raw file):

				// this current command to the combined RangeGroup.
				cq.rwRg.Add(keyRange)
				if !cq.wRg.Overlaps(keyRange) || mustWait {

nit: move mustWait before the || to save a call to Overlaps.


pkg/storage/command_queue.go, line 369 at r1 (raw file):

				// dependency established with a dependent of the current overlap, meaning we already established
				// an implicit transitive dependency to the current overlap.
				if !overlapRg.Overlaps(keyRange) || mustWait {

ditto


pkg/storage/command_queue.go, line 425 at r1 (raw file):

timestamp == hlc.Timestamp{}

This condition isn't necessary to check, is it? ditto below.


pkg/storage/replica.go, line 1687 at r1 (raw file):

		// of earlier reads with later writes, but only for the global
		// command queue. Reads and writes to local keys always interfere.
		reqTS := ba.Timestamp

to make the comment above clearer, can you name this reqGlobalTS and add another var reqLocalTS hlc.Timestamp?


Comments from Reviewable

@bdarnell
Copy link
Member

:lgtm:


Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, 6 unresolved discussions.


pkg/storage/command_queue.go, line 207 at r1 (raw file):

// read-only command; false for read-write.
func (cq *CommandQueue) getWait(
	readOnly bool, timestamp hlc.Timestamp, spans []roachpb.Span,

Add a comment about the timestamp argument, and the special meaning of a zero timestamp.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/cmd-queue-timestamps branch from b7e956b to 9f2fe51 Compare March 23, 2017 19:43
@spencerkimball
Copy link
Member Author

Review status: all files reviewed at latest revision, 6 unresolved discussions.


pkg/storage/command_queue.go, line 207 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Add a comment about the timestamp argument, and the special meaning of a zero timestamp.

Done.


pkg/storage/command_queue.go, line 309 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

command IDs?

These are allocated by the command queue.


pkg/storage/command_queue.go, line 337 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: move mustWait before the || to save a call to Overlaps.

Done.


pkg/storage/command_queue.go, line 369 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

ditto

Done.


pkg/storage/command_queue.go, line 425 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

timestamp == hlc.Timestamp{}

This condition isn't necessary to check, is it? ditto below.

Done. Could you check my logic? These timestamp Less functions utterly confuse me.


pkg/storage/replica.go, line 1687 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

to make the comment above clearer, can you name this reqGlobalTS and add another var reqLocalTS hlc.Timestamp?

Done.


Comments from Reviewable

@tamird
Copy link
Contributor

tamird commented Mar 23, 2017

Reviewed 2 of 4 files at r2.
Review status: 2 of 4 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


pkg/storage/command_queue.go, line 425 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

Done. Could you check my logic? These timestamp Less functions utterly confuse me.

Yeah, I think this is correct; if timestamp is the zero value and c.timestamp isn't then c.timestamp.Less(timestamp) is not true.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/cmd-queue-timestamps branch from 9f2fe51 to 2e3cdd9 Compare March 23, 2017 23:12
@nvanbenschoten
Copy link
Member

:lgtm:


Review status: 2 of 4 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/command_queue.go, line 327 at r3 (raw file):

			if cmd.readOnly {
				if (cmd.timestamp != hlc.Timestamp{}) {

I'd pull cmd.timestamp != hlc.Timestamp{} up above to a hasTimestamp variable that we can use here and below.


pkg/storage/command_queue_test.go, line 423 at r3 (raw file):

	// Verify that an earlier writer for whole span waits on all commands.
	w = cq.getWait(false, makeTS(0, 1), []roachpb.Span{mkSpan("a", "g")})
	allW := []<-chan struct{}{

This is a pretty nice way to test this. My only concern is that it's dependent on the order that getWait returns the channels in, but that's not a huge deal.


pkg/storage/replica.go, line 1690 at r3 (raw file):

		// Get the requested timestamp. This is used for non-interference
		// of earlier reads with later writes, but only for the global
		// command queue. Reads and writes to local keys always interfere.

Could you expand on why "Reads and writes to local keys always interfere"?


Comments from Reviewable

Previously, the command queue considered any two commands to overlap if the
key ranges overlapped, regardless of whether the timestamps on the two
commands guaranteed they were non-overlapping.

Consider the case of a read over key ranges "a" - "z" at timestamp t=1s.
If a write arrives for key "c" at timestamp > t=1s, then it should be
free to proceed regardless of the disposition of the read because the
read cannot affect the write's timestamp.

Similarly, if a write is active for key "c" at timestamp t=1s, a read
that arrives for key range "a" - "z" should be free to proceed as long
as its timestamp is < t=1s.

Note that local keys do not allow non-interference between earlier reads
and overlapping writes.

Fixes #14298
@spencerkimball
Copy link
Member Author

Review status: 2 of 4 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/command_queue.go, line 327 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd pull cmd.timestamp != hlc.Timestamp{} up above to a hasTimestamp variable that we can use here and below.

Done.


pkg/storage/command_queue_test.go, line 423 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is a pretty nice way to test this. My only concern is that it's dependent on the order that getWait returns the channels in, but that's not a huge deal.

Yes this will break if that changes, but should be easy enough to fix the test.


pkg/storage/replica.go, line 1690 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you expand on why "Reads and writes to local keys always interfere"?

Done.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/cmd-queue-timestamps branch from 2e3cdd9 to 81ce3af Compare March 24, 2017 13:31
@spencerkimball spencerkimball merged commit 6c1b3f1 into master Mar 24, 2017
@spencerkimball spencerkimball deleted the spencerkimball/cmd-queue-timestamps branch March 24, 2017 15:07
spencerkimball added a commit that referenced this pull request Apr 9, 2017
Change #14342 had a bug in that it did not remove the optimization to stop
checking dependencies in the event that the range group being built up
completely enclosed the candidate span. Bailing out early is no longer
allowed because we must additionally include any read-only spans which
might not be transitively depended on by the enclosing read-write span(s).

This bug applies to both prop-eval-KV as well as non-prop-eval-KV, but
manifested much more readily with prop-eval-KV due to the massively
enclosing span which is added to provide linearizability across all ops
hitting the range's command queue.

Fixes #14434
Fixes #8057
spencerkimball added a commit that referenced this pull request Apr 9, 2017
Change #14342 had a bug in that it did not remove the optimization to stop
checking dependencies in the event that the range group being built up
completely enclosed the candidate span. Bailing out early is no longer
allowed because we must additionally include any read-only spans which
might not be transitively depended on by the enclosing read-write span(s).

This bug applies to both prop-eval-KV as well as non-prop-eval-KV, but
manifested much more readily with prop-eval-KV due to the massively
enclosing span which is added to provide linearizability across all ops
hitting the range's command queue.

Fixes #14434
Fixes #8057
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 4, 2017
The test added here verifies that if a dependency relation between
commands inserted into the `CommandQueue` should exist, it is always
transitively maintained even if other commands are inserted between
them. This is important because the transitive dependency relation
is required for the correctness of certain optimizations performed
by the `CommandQueue`, as well as by our approach to command
cancellation and prerequisite migration.

The test made me feel more confident that even with the timestamp
handling added in cockroachdb#14342, our optimizations still work correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants