New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: fix command queue dependencies where candidate span is enclosed #14733
storage: fix command queue dependencies where candidate span is enclosed #14733
Conversation
I've been chasing red herrings related to this all weekend; thanks for finding it! Reviewed 4 of 4 files at r1. pkg/sql/parallel_test.go, line 238 at r1 (raw file):
This LGTM but cc @RaduBerinde for context about why this was skipped under stress to begin with. Was it just copied from pkg/storage/command_queue.go, line 392 at r1 (raw file):
Remove the pkg/storage/command_queue_test.go, line 474 at r1 (raw file):
Link back to the issue that this is testing for. Should we test other permutations too? (a read depending on a write, etc) Comments from Reviewable |
pkg/sql/parallel_test.go, line 238 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I am not sure, I did some digging around and it was introduced in #10966; there's some context in #10939. Comments from Reviewable |
7015e6d
to
31576ef
Compare
Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending. pkg/sql/parallel_test.go, line 238 at r1 (raw file): Previously, RaduBerinde wrote…
Hmm, OK I'll revert this. pkg/storage/command_queue.go, line 392 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. pkg/storage/command_queue_test.go, line 474 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Other combinations don't really make sense. This is the simplest test which will reproduce the bug and it crucially requires that the waiter is a write, the enclosing span is a write, and there is a read-only span which is depended on by the waiter but not the enclosing span. Comments from Reviewable |
31576ef
to
6610d39
Compare
Review status: 1 of 3 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending. pkg/sql/parallel_test.go, line 238 at r1 (raw file): Previously, spencerkimball (Spencer Kimball) wrote…
I'd leave it (so the parallel tests are run under stress). The logic tests are skipped under stress because A) the quantity and running time of the tests causes problems when run under stress and B) they don't have a history of differing behavior under stress. Neither of these apply to the parallel tests; there are only two (for now; even if it grows it's got a long way to go before it's an issue) and both have an understandable history of flakiness under stress. Comments from Reviewable |
Change #14342 had a bug in that it did not remove the optimization to stop checking dependencies in the event that the range group being built up completely enclosed the candidate span. Bailing out early is no longer allowed because we must additionally include any read-only spans which might not be transitively depended on by the enclosing read-write span(s). This bug applies to both prop-eval-KV as well as non-prop-eval-KV, but manifested much more readily with prop-eval-KV due to the massively enclosing span which is added to provide linearizability across all ops hitting the range's command queue. Fixes #14434 Fixes #8057
6610d39
to
ae253ca
Compare
Review status: 2 of 4 files reviewed at latest revision, 3 unresolved discussions. pkg/sql/parallel_test.go, line 238 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Comments from Reviewable |
Awesome! Exciting to have the multinode parallel test finally enabled! |
With the change in #14342 we might eventually want to consider changing the structure here, now that the classic "reads depend on writes, writes depend on reads and write" rules are getting more complicated. I have a feeling some of the complexity we've been incrementally adding can be simplified if we reconsider the problem from scratch in light of these new constraints. Reviewed 2 of 4 files at r1. pkg/storage/command_queue.go, line 358 at r2 (raw file):
It looks like we could have retained the optimization for cases where Comments from Reviewable |
@nvanbenschoten I agree. If the range group data structure could give us enough info to efficiently track inter-command dependencies we could get linear performance in usual case and only quadratic time penalty in worst case scenarios. There is probably something better. Something to keep in mind. |
Change #14342 had a bug in that it did not remove the optimization to stop
checking dependencies in the event that the range group being built up
completely enclosed the candidate span. Bailing out early is no longer
allowed because we must additionally include any read-only spans which
might not be transitively depended on by the enclosing read-write span(s).
This bug applies to both prop-eval-KV as well as non-prop-eval-KV, but
manifested much more readily with prop-eval-KV due to the massively
enclosing span which is added to provide linearizability across all ops
hitting the range's command queue.
Fixes #14434
Fixes #8057