Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: fix command queue dependencies where candidate span is enclosed #14733

Merged

Conversation

spencerkimball
Copy link
Member

Change #14342 had a bug in that it did not remove the optimization to stop
checking dependencies in the event that the range group being built up
completely enclosed the candidate span. Bailing out early is no longer
allowed because we must additionally include any read-only spans which
might not be transitively depended on by the enclosing read-write span(s).

This bug applies to both prop-eval-KV as well as non-prop-eval-KV, but
manifested much more readily with prop-eval-KV due to the massively
enclosing span which is added to provide linearizability across all ops
hitting the range's command queue.

Fixes #14434
Fixes #8057

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@bdarnell
Copy link
Member

bdarnell commented Apr 9, 2017

:lgtm_strong:

I've been chasing red herrings related to this all weekend; thanks for finding it!


Reviewed 4 of 4 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/sql/parallel_test.go, line 238 at r1 (raw file):

	defer leaktest.AfterTest(t)()

	if testutils.Stress() {

This LGTM but cc @RaduBerinde for context about why this was skipped under stress to begin with. Was it just copied from logic_test.go which is large and arguably less interesting to run under stress?


pkg/storage/command_queue.go, line 392 at r1 (raw file):

				// The current command is a write, so add it to the write RangeGroup.
				_ = cq.wRg.Add(keyRange)

Remove the _ =; we aren't explicit about ignoring the return value at other call sites of RangeGroup.Add (which is now only used in tests)


pkg/storage/command_queue_test.go, line 474 at r1 (raw file):

// fail to return read-only dependencies that a candidate read/write
// span depends on, despite there being an overlapping span which
// completely encloses the candidate.

Link back to the issue that this is testing for.

Should we test other permutations too? (a read depending on a write, etc)


Comments from Reviewable

@RaduBerinde
Copy link
Member

pkg/sql/parallel_test.go, line 238 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This LGTM but cc @RaduBerinde for context about why this was skipped under stress to begin with. Was it just copied from logic_test.go which is large and arguably less interesting to run under stress?

I am not sure, I did some digging around and it was introduced in #10966; there's some context in #10939.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/serializable-failure-prop-eval-kv branch from 7015e6d to 31576ef Compare April 9, 2017 22:38
@spencerkimball
Copy link
Member Author

Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/sql/parallel_test.go, line 238 at r1 (raw file):

Previously, RaduBerinde wrote…

I am not sure, I did some digging around and it was introduced in #10966; there's some context in #10939.

Hmm, OK I'll revert this.


pkg/storage/command_queue.go, line 392 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Remove the _ =; we aren't explicit about ignoring the return value at other call sites of RangeGroup.Add (which is now only used in tests)

Done.


pkg/storage/command_queue_test.go, line 474 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Link back to the issue that this is testing for.

Should we test other permutations too? (a read depending on a write, etc)

Other combinations don't really make sense. This is the simplest test which will reproduce the bug and it crucially requires that the waiter is a write, the enclosing span is a write, and there is a read-only span which is depended on by the waiter but not the enclosing span.


Comments from Reviewable

@spencerkimball spencerkimball force-pushed the spencerkimball/serializable-failure-prop-eval-kv branch from 31576ef to 6610d39 Compare April 9, 2017 22:57
@bdarnell
Copy link
Member

bdarnell commented Apr 9, 2017

Review status: 1 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/sql/parallel_test.go, line 238 at r1 (raw file):

Previously, spencerkimball (Spencer Kimball) wrote…

Hmm, OK I'll revert this.

I'd leave it (so the parallel tests are run under stress). The logic tests are skipped under stress because A) the quantity and running time of the tests causes problems when run under stress and B) they don't have a history of differing behavior under stress. Neither of these apply to the parallel tests; there are only two (for now; even if it grows it's got a long way to go before it's an issue) and both have an understandable history of flakiness under stress.


Comments from Reviewable

Change #14342 had a bug in that it did not remove the optimization to stop
checking dependencies in the event that the range group being built up
completely enclosed the candidate span. Bailing out early is no longer
allowed because we must additionally include any read-only spans which
might not be transitively depended on by the enclosing read-write span(s).

This bug applies to both prop-eval-KV as well as non-prop-eval-KV, but
manifested much more readily with prop-eval-KV due to the massively
enclosing span which is added to provide linearizability across all ops
hitting the range's command queue.

Fixes #14434
Fixes #8057
@spencerkimball spencerkimball force-pushed the spencerkimball/serializable-failure-prop-eval-kv branch from 6610d39 to ae253ca Compare April 9, 2017 23:08
@spencerkimball
Copy link
Member Author

Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/parallel_test.go, line 238 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I'd leave it (so the parallel tests are run under stress). The logic tests are skipped under stress because A) the quantity and running time of the tests causes problems when run under stress and B) they don't have a history of differing behavior under stress. Neither of these apply to the parallel tests; there are only two (for now; even if it grows it's got a long way to go before it's an issue) and both have an understandable history of flakiness under stress.

Done.


Comments from Reviewable

@RaduBerinde
Copy link
Member

Awesome! Exciting to have the multinode parallel test finally enabled!

@nvanbenschoten
Copy link
Member

:lgtm: Nice catch!

With the change in #14342 we might eventually want to consider changing the structure here, now that the classic "reads depend on writes, writes depend on reads and write" rules are getting more complicated. I have a feeling some of the complexity we've been incrementally adding can be simplified if we reconsider the problem from scratch in light of these new constraints.


Reviewed 2 of 4 files at r1.
Review status: 2 of 4 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/storage/command_queue.go, line 358 at r2 (raw file):

					chans = append(chans, cmd.pending)
				}
			} else {

It looks like we could have retained the optimization for cases where !cmd.readOnly && enclosed == true. I'm not sure this is very important in practice though, as all of the operations below should have constant amortized time in the regime where enclosed is already true (cq.wRg.Encloses(newCmdRange)). Still, might be worth adding back in.


Comments from Reviewable

@spencerkimball
Copy link
Member Author

@nvanbenschoten I agree. If the range group data structure could give us enough info to efficiently track inter-command dependencies we could get linear performance in usual case and only quadratic time penalty in worst case scenarios. There is probably something better. Something to keep in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants