New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.2: kvserver: avoid load based splits in middle of SQL row #104563
Conversation
We investigated why running YCSB Workload E results in a single hot range and we observed that range queries of the form SELECT * FROM table WHERE pkey >= A LIMIT B will result in all request spans having the same end key - similar to [A, range_end) - rather than end keys that take into account the specified LIMIT. Since the majority of request spans have the same end key, the load splitter algorithm cannot find a split key without too many contained and balance between left and right requests. A proposed solution is to use the response span rather than the request span, since the response span is more accurate in reflecting the keys that this request truly iterated over. We utilize the request span as well as the response's resume span to derive the key span that this request truly iterated over. Using response data (resume span) rather than just the request span in the load-based splitter (experimentally) allows the load-based splitter to find a split key under range query workloads (YCSB Workload E, KV workload with spans). Release note (ops change): We use response data rather than just the request span in the load-based splitter to pass more accurate data about the keys iterated over to the load splitter to find a suitable split key, enabling the load splitter to find a split key under heavy range query workloads.
Previously, there were no metrics or logging in the load-based splitter. This was inadequate because there is minimal observability into why the load splitter could not find a split key. To address this, this patch adds metrics and logging to the load splitter, including counter metrics indicating number of times could not find a split key and popular key (>25% occurrence) and logging indicating causes for no split key (insufficient counters, imbalance, too many contained). Release note (ops change): Added observability for when load based splitting cannot find a key to indicate the reasons why the load splitter could not find a split key, enabling us to have more observability and insight to debug why a range is not splitting more easily.
When we incorporated the use of response data in the load-based splitter, we called collectSpansRead, which is allocation heavy and computationally expensive, resulting in a performance regression. To address this, this patch performs 3 optimizations: 1. Remove the call to collectSpansRead; instead, add a custom function to iterate over the batch of requests / responses and calculate the true spans 2. Instead of constructing a *spanset.SpanSet and finding the union of spans (which uses O(batch_size) memory), we directly compute the union of spans while iterating over the batch resulting in only O(1) memory used 3. Lazily compute the union of true spans only when it is truly needed i.e. we are under heavy load (e.g. >2500QPS) and a load-based splitter has been initialized Release note: None
It was possible for a SQL row to be torn across two ranges due to the load-based splitter not rejecting potentially unsafe split keys. It is impossible to determine with keys sampled from response spans, whether a key is certainly unsafe or safe. This commit side steps this problem by re-using the `adminSplitWithDescriptor` command to find the first real key, after or at the provided `args.SplitKey`. This ensures that the split key will always be a real key whilst not requiring any checks in the splitter itself. The updated `adminSplitWithDescriptor` is local only and requires opting into finding the first safe key by setting `findFirstSafeKey` to `true`. As such, all safe split key checks are also removed from the `split` pkg, with a warning added that the any split key returned is unsafe. Resolves: cockroachdb#103483 Release note (bug fix): It was possible for a SQL row to be split across two ranges. When this occurred, SQL queries could return unexpected errors. This bug is resolved by these changes, as we now inspect the real keys, rather than just request keys to determine load-based split points.
It was possible that a load based split was suggested for `meta1`, which would call `MVCCFirstSplitKey` and panic as the `meta1` start key `\Min` is local, whilst the `meta1` end key is global `0x02 0xff 0xff`. Add a check that the start key is greater than the `meta1` end key before processing in `MVCCFirstSplitKey` to prevent the panic. Note `meta1` would never be split regardless, as `storage.IsValidSplitKey` would fail after finding a split key. Also note that if the desired split key is a local key, the same problem doesn't exist as the minimum split key would be used to seek the first split key instead. Fixes: cockroachdb#104007 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 16 of 16 files at r2, 4 of 4 files at r3, 10 of 10 files at r4, 2 of 2 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @sumeerbhola)
pkg/kv/kvserver/replica_split_load.go
line 60 at r2 (raw file):
} shouldInitSplit := r.loadBasedSplitter.Record(ctx, timeutil.Now(), len(ba.Requests), func() roachpb.Span { return spans.BoundarySpan(spanset.SpanGlobal)
We discussed on Slack that this change allows us to recommend local split points. These local keys will be handled correctly in adminSplitWithDescriptor
. There is also protection against local keys being used as split points later in the function.
pkg/storage/mvcc.go
line 5506 at r5 (raw file):
// cannot be split. if startKey.Less(roachpb.RKey(keys.LocalMax)) { return nil, nil
Did you consider setting startKey = keys.MetaMin
in this case, instead of failing the split? We don't want to split in meta1, but we do want to allow meta1
to split away from meta2
. The effect of this logic is that we disable all load-based splits on range 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/storage/mvcc.go
line 5506 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you consider setting
startKey = keys.MetaMin
in this case, instead of failing the split? We don't want to split in meta1, but we do want to allowmeta1
to split away frommeta2
. The effect of this logic is that we disable all load-based splits on range 1.
I did consider it but assumed that there are plenty of size splits on meta2 such that we'd never need to look for LB splits in r1.
The size split code does what you're suggesting.
Lines 5864 to 5866 in 6235182
if key.Less(roachpb.RKey(keys.LocalMax)) { | |
key = roachpb.RKey(keys.LocalMax) | |
} |
I'm thinking my assumption is okay but doing what you suggested and size splits do would be better.
Should I open up a PR on master and start the backport process back to here - also should this hold up the backport to 22.2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @sumeerbhola)
pkg/storage/mvcc.go
line 5506 at r5 (raw file):
Should I open up a PR on master and start the backport process back to here
That sounds good. Disabling load-based splits on range 1 is a limitation, but it doesn't feel like a serious probable (for the reasons you listed), so we don't need to hold up this backport on that other PR.
Filed the issue #104995 TYFTR! |
Backport 1/1 commits from #89217 on behalf of @kvoli.
Backport 1/1 commits from #88720 on behalf of @kvoli.
Backport 1/1 commits from #91462 on behalf of @kvoli.
Backport 1/3 commits from #103690 on behalf of @kvoli.
Backport 1/1 commits from #104082 on behalf of @kvoli.
/cc @cockroachdb/release
See individual commit messages. The most important commits are the
last two, backported to 23.1 in #103876. There was some serious
wrestling to backport this fix. In order to avoid future problems if
we decide to backport again, I included some dependency commits which
are lower risk:
#103690 fixes the unsafe load based splits and #104082 addresses
a bug in #103690 where a panic could occur if attempting to load-based
split meta1. These are backported already to 23.1.
It was possible for a SQL row to be torn across two ranges due to the
load-based splitter not rejecting potentially unsafe split keys. It is
impossible to determine with just the sampled request keys, whether a
key is certainly unsafe or safe, so a split key is returned regardless of error.
This PR side steps this problem by re-using the
adminSplitWithDescriptor
command to find the first real key, after orat the provided
args.SplitKey
. This ensures that the split key willalways be a real key whilst not requiring any checks in the splitter
itself.
The updated
adminSplitWithDescriptor
is local only and requires optinginto finding the first safe key by setting
findFirstSafeKey
totrue
.As such, all safe split key checks are also removed from the
split
pkg, with a warning added that the any split key returned is unsafe.
Resolves: #103483
Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes, as we now sample the real
keys, rather than just request keys to determine load-based split points.
Release justification: Serious bug fix.