
release-22.2: kvserver: avoid load based splits in middle of SQL row #104563

Merged (5 commits, Jun 15, 2023)

Conversation

@kvoli (Collaborator) commented Jun 7, 2023

Backport 1/1 commits from #89217 on behalf of @kvoli.
Backport 1/1 commits from #88720 on behalf of @kvoli.
Backport 1/1 commits from #91462 on behalf of @kvoli.
Backport 1/3 commits from #103690 on behalf of @kvoli.
Backport 1/1 commits from #104082 on behalf of @kvoli.

/cc @cockroachdb/release


See individual commit messages. The most important commits are the
last two, backported to 23.1 in #103876. There was some serious
wrestling to backport this fix. In order to avoid future problems if
we decide to backport again, I included some dependency commits which
are lower risk:

#103690 fixes the unsafe load based splits and #104082 addresses
a bug in #103690 where a panic could occur if attempting to load-based
split meta1. These are backported already to 23.1.

It was possible for a SQL row to be torn across two ranges due to the
load-based splitter not rejecting potentially unsafe split keys. It is
impossible to determine, from just the sampled request keys, whether a
key is certainly unsafe or safe, so a split key is returned regardless,
with no safety guarantee.

This PR sidesteps this problem by reusing the
adminSplitWithDescriptor command to find the first real key at or
after the provided args.SplitKey. This ensures that the split key will
always be a real key, whilst not requiring any checks in the splitter
itself.

The updated adminSplitWithDescriptor is local only and requires opting
into finding the first safe key by setting findFirstSafeKey to true.

As such, all safe split key checks are also removed from the split
pkg, with a warning added that any split key returned is unsafe.
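
A minimal sketch of the approach, with hypothetical names (the real logic lives in adminSplitWithDescriptor and the storage pkg):

```go
package example

import (
	"github.com/cockroachdb/cockroach/pkg/keys"
	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// iter is a hypothetical, minimal view of a storage iterator.
type iter interface {
	SeekGE(key roachpb.Key)
	Valid() bool
	Key() roachpb.Key
}

// firstSafeSplitKey sketches the findFirstSafeKey behavior: rather than
// validating sampled request keys inside the splitter, seek to the first
// real stored key at or after the suggested split key and truncate it to
// its row prefix, so a split can never land in the middle of a SQL row.
func firstSafeSplitKey(it iter, suggested roachpb.Key) (roachpb.Key, bool) {
	it.SeekGE(suggested)
	if !it.Valid() {
		return nil, false // no real key at or after the suggestion
	}
	// keys.EnsureSafeSplitKey truncates a key to its SQL row prefix so
	// all column families of a row stay on one range.
	safe, err := keys.EnsureSafeSplitKey(it.Key())
	if err != nil || len(safe) == 0 {
		return nil, false
	}
	return safe, true
}
```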

Resolves: #103483

Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes, as we now sample the real
keys, rather than just the request keys, to determine load-based split points.


Release justification: Serious bug fix.

KaiSun314 and others added 5 commits June 7, 2023 12:46
We investigated why running YCSB Workload E results in a single hot
range, and we observed that range queries of the form
SELECT * FROM table WHERE pkey >= A LIMIT B result in all request
spans having the same end key - roughly [A, range_end) - rather than
end keys that take the specified LIMIT into account. Since the majority
of request spans have the same end key, the load splitter algorithm
cannot find a split key that avoids too many contained requests while
keeping requests balanced between the left and right sides.

A proposed solution is to use the response span rather than the request
span, since the response span more accurately reflects the keys that
this request truly iterated over. We utilize the request span together
with the response's resume span to derive the key span that this
request truly iterated over (see the sketch after this commit message).
Using response data (the resume span) rather than just the request span
in the load-based splitter (experimentally) allows the load-based
splitter to find a split key under range query workloads
(YCSB Workload E, KV workload with spans).

Release note (ops change): The load-based splitter now uses response
data, rather than just the request span, to pass more accurate data
about the keys iterated over to the load splitter, enabling it to find
a suitable split key under heavy range query workloads.
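
A sketch of the derivation, assuming a forward scan (function name hypothetical; the real computation lives in the kvserver batch handling):

```go
package example

import "github.com/cockroachdb/cockroach/pkg/roachpb"

// trueSpan derives the span a request actually iterated over from its
// request span and the response's resume span. For a forward scan with
// a LIMIT, a resume span [R, end) means iteration stopped at R, so the
// keys truly read were [start, R). Reverse scans would mirror this on
// the start key; that case is elided here for brevity.
func trueSpan(req roachpb.Span, resume *roachpb.Span) roachpb.Span {
	if resume == nil {
		// No resume span: the request iterated its entire span.
		return req
	}
	return roachpb.Span{Key: req.Key, EndKey: resume.Key}
}
```
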
Previously, there were no metrics or logging in the load-based splitter.

This was inadequate because it left minimal observability into why the
load splitter could not find a split key.

To address this, this patch adds metrics and logging to the load
splitter, including counter metrics indicating the number of times a
split key could not be found and the number of times a popular key
(>25% occurrence) was detected, as well as logging indicating the
causes for finding no split key (insufficient counters, imbalance,
too many contained). A sketch of the shape of these counters follows
the release note below.

Release note (ops change): Added observability for when load-based
splitting cannot find a key, indicating the reasons why the load
splitter could not find a split key and making it easier to debug why
a range is not splitting.
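
A rough sketch of the shape of this observability (field names and causes here are illustrative, not the actual metric names):

```go
package example

import "log"

// splitterMetrics illustrates the counters described above. The real
// counters are registered kvserver metrics; these names are
// illustrative only.
type splitterMetrics struct {
	NoSplitKeyCount int64 // times no split key could be found
	PopularKeyCount int64 // times one key exceeded 25% of samples
}

// recordNoSplitKey bumps the counter and logs the possible causes.
func (m *splitterMetrics) recordNoSplitKey(insufficientCounters, imbalance, tooManyContained int) {
	m.NoSplitKeyCount++
	log.Printf("no split key found: insufficient counters=%d, imbalance=%d, too many contained=%d",
		insufficientCounters, imbalance, tooManyContained)
}
```
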
When we incorporated the use of response data in the load-based
splitter, we called collectSpansRead, which is allocation heavy and
computationally expensive, resulting in a performance regression.

To address this, this patch performs 3 optimizations:
1. Remove the call to collectSpansRead; instead, add a custom function
to iterate over the batch of requests / responses and calculate the true
spans.
2. Instead of constructing a *spanset.SpanSet and finding the union of
spans (which uses O(batch_size) memory), directly compute the union
of spans while iterating over the batch, resulting in only O(1) memory
used (see the sketch following this commit message).
3. Lazily compute the union of true spans only when it is truly needed,
i.e., when we are under heavy load (e.g., >2500 QPS) and a load-based
splitter has been initialized.

Release note: None
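
A minimal sketch of optimization 2, the O(1) running union (helper name hypothetical; point requests with an empty EndKey are elided for brevity):

```go
package example

import (
	"bytes"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// unionSpan folds the next request's true span into a single running
// union, so the batch can be processed in O(1) memory instead of
// collecting every span into a *spanset.SpanSet.
func unionSpan(acc, next roachpb.Span) roachpb.Span {
	if acc.Key == nil {
		return next // first span in the batch
	}
	if bytes.Compare(next.Key, acc.Key) < 0 {
		acc.Key = next.Key
	}
	if bytes.Compare(next.EndKey, acc.EndKey) > 0 {
		acc.EndKey = next.EndKey
	}
	return acc
}
```
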
It was possible for a SQL row to be torn across two ranges due to the
load-based splitter not rejecting potentially unsafe split keys. It is
impossible to determine, with keys sampled from response spans, whether
a key is certainly unsafe or safe.

This commit sidesteps this problem by reusing the
`adminSplitWithDescriptor` command to find the first real key at or
after the provided `args.SplitKey`. This ensures that the split key will
always be a real key, whilst not requiring any checks in the splitter
itself.

The updated `adminSplitWithDescriptor` is local only and requires opting
into finding the first safe key by setting `findFirstSafeKey` to `true`.

As such, all safe split key checks are also removed from the `split`
pkg, with a warning added that any split key returned is unsafe.

Resolves: cockroachdb#103483

Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes, as we now inspect the real
keys, rather than just the request keys, to determine load-based split points.
It was possible for a load-based split to be suggested for `meta1`, which
would call `MVCCFirstSplitKey` and panic, as the `meta1` start key
`\Min` is local whilst the `meta1` end key `0x02 0xff 0xff` is global.

Add a check that the start key is greater than the `meta1` end key before
processing in `MVCCFirstSplitKey` to prevent the panic.

Note `meta1` would never be split regardless, as
`storage.IsValidSplitKey` would fail after finding a split key.

Also note that if the desired split key is a local key, the same problem
doesn't exist as the minimum split key would be used to seek the first
split key instead.

Fixes: cockroachdb#104007

Release note: None
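
Simplified shape of the added guard, mirroring the excerpt quoted later in the review thread:

```go
// In MVCCFirstSplitKey: a range whose start key sorts before
// keys.LocalMax is local (meta1 starts at \Min), and seeking from a
// local to a global key would panic, so bail out and decline to split.
if startKey.Less(roachpb.RKey(keys.LocalMax)) {
	return nil, nil
}
```
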
@cockroach-teamcity (Member)

This change is Reviewable

@kvoli kvoli changed the title kvserver: use response data in the load-based splitter kvserver: avoid load based splits in middle of SQL row Jun 7, 2023
@kvoli kvoli marked this pull request as ready for review June 8, 2023 02:27
@kvoli kvoli requested review from a team as code owners June 8, 2023 02:27
@kvoli kvoli requested a review from sumeerbhola June 8, 2023 02:27
@dhartunian dhartunian removed the request for review from a team June 8, 2023 14:23
@yuzefovich yuzefovich changed the title kvserver: avoid load based splits in middle of SQL row release-22.2: kvserver: avoid load based splits in middle of SQL row Jun 8, 2023
@nvanbenschoten nvanbenschoten self-requested a review June 9, 2023 14:45
@nvanbenschoten (Member) left a comment

Reviewed 1 of 1 files at r1, 16 of 16 files at r2, 4 of 4 files at r3, 10 of 10 files at r4, 2 of 2 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli and @sumeerbhola)


pkg/kv/kvserver/replica_split_load.go line 60 at r2 (raw file):

	}
	shouldInitSplit := r.loadBasedSplitter.Record(ctx, timeutil.Now(), len(ba.Requests), func() roachpb.Span {
		return spans.BoundarySpan(spanset.SpanGlobal)

We discussed on Slack that this change allows us to recommend local split points. These local keys will be handled correctly in adminSplitWithDescriptor. There is also protection against local keys being used as split points later in the function.


pkg/storage/mvcc.go line 5506 at r5 (raw file):

	// cannot be split.
	if startKey.Less(roachpb.RKey(keys.LocalMax)) {
		return nil, nil

Did you consider setting startKey = keys.MetaMin in this case, instead of failing the split? We don't want to split in meta1, but we do want to allow meta1 to split away from meta2. The effect of this logic is that we disable all load-based splits on range 1.

@kvoli (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/storage/mvcc.go line 5506 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you consider setting startKey = keys.MetaMin in this case, instead of failing the split? We don't want to split in meta1, but we do want to allow meta1 to split away from meta2. The effect of this logic is that we disable all load-based splits on range 1.

I did consider it but assumed that there are plenty of size splits on meta2 such that we'd never need to look for LB splits in r1.

The size split code does what you're suggesting.

cockroach/pkg/storage/mvcc.go, lines 5864-5866 at 6235182:

if key.Less(roachpb.RKey(keys.LocalMax)) {
	key = roachpb.RKey(keys.LocalMax)
}

I think my assumption is okay, but doing what you suggested (and what the size split code already does) would be better.

Should I open up a PR on master and start the backport process back to here? Also, should this hold up the backport to 22.2?

@nvanbenschoten (Member) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli and @sumeerbhola)


pkg/storage/mvcc.go line 5506 at r5 (raw file):

Should I open up a PR on master and start the backport process back to here

That sounds good. Disabling load-based splits on range 1 is a limitation, but it doesn't feel like a serious problem (for the reasons you listed), so we don't need to hold up this backport on that other PR.

@kvoli (Collaborator, Author) commented Jun 15, 2023

Filed the issue #104995

TYFTR!
