[Release-7.1] Add Manual Shard Split #10909
Merged
Conversation
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Result of foundationdb-pr-macos on macOS Ventura 13.x
kakaiu force-pushed the add-manual-shard-split branch from 40fef68 to f0c8226 on September 19, 2023 05:24
kakaiu force-pushed the add-manual-shard-split branch from b94052a to 9892151 on September 20, 2023 16:57
kakaiu force-pushed the add-manual-shard-split branch from b9e42f2 to a03faa2 on September 23, 2023 04:30
kakaiu force-pushed the add-manual-shard-split branch from a03faa2 to 599dfa3 on September 23, 2023 05:04
kakaiu force-pushed the add-manual-shard-split branch from 98f1be8 to cfeaa31 on September 23, 2023 21:04
kakaiu force-pushed the add-manual-shard-split branch from cfeaa31 to 229b237 on September 23, 2023 21:05
liquid-helium approved these changes on Sep 25, 2023
jzhou77 approved these changes on Sep 25, 2023
LGTM
Review comment on:

    splitMetrics.iosPerKSecond = splitMetrics.infinity;
    splitMetrics.bytesReadPerKSecond = splitMetrics.infinity; // Don't split by readBandwidth

    Standalone<VectorRef<KeyRef>> splitKeys_ = wait(getSplitKeys(self, keys, splitMetrics, metrics));

Nit: these two lines can be replaced with
    wait(store(splitKeys, getSplitKeys(self, keys, splitMetrics, metrics)));
This PR implements the fdbcli command:
redistribute <BeginKey> <EndKey>
The input range is passed to the data distributor (DD), and DD issues data moves covering all data within the range. For example, suppose the current shard boundaries are [a, c), [c, d), [d, f) and the input range is [b, d): DD splits the shard [a, c) into [a, b) and [b, c), then triggers data moves for [a, b), [b, c), and [c, d).

100K correctness test:
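The boundary arithmetic described above can be illustrated with a short sketch (a hypothetical model for illustration, not the PR's actual C++/Flow code): every shard overlapping the input range is cut at the input endpoints, and every resulting fragment of a touched shard receives a data move.

```python
def plan_redistribute(boundaries, begin, end):
    """Model of the split planning. 'boundaries' are the sorted shard
    boundary keys; returns the list of ranges that receive data moves."""
    moves = []
    for s, e in zip(boundaries, boundaries[1:]):
        if s < end and e > begin:  # shard [s, e) overlaps [begin, end)
            # Cut the shard at any input endpoint falling strictly inside it.
            cuts = sorted({s, e} | {p for p in (begin, end) if s < p < e})
            # Every fragment of a touched shard is moved.
            moves.extend(zip(cuts, cuts[1:]))
    return moves

# The example from the description: shards [a,c), [c,d), [d,f), input [b,d)
print(plan_redistribute(["a", "c", "d", "f"], "b", "d"))
# → [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Note that [a, b) is moved even though it lies outside the input range: the shard [a, c) was split, so both halves are relocated.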
20230923-200049-zhewang-a2ab10ff266a2d92 compressed=True data_size=24112130 duration=4887150 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:56:16 sanity=False started=100000 stopped=20230923-205705 submitted=20230923-200049 timeout=5400 username=zhewang
100K ManualShardSplit test:
20230924-004553-zhewang-70bf59bc9d72a8b7 compressed=True data_size=24111795 duration=4863033 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:53:25 sanity=False started=100000 stopped=20230924-013918 submitted=20230924-004553 timeout=5400 username=zhewang
Also tested on a Kubernetes cluster.
Limitation:
Although this PR guarantees that data moves are triggered for the split, the moves do not always complete. Sometimes data moves are aborted because DD restarts or because the source server already has too many in-flight data moves. For any shard whose triggered data moves fail to complete, this PR does not re-issue the moves. We may need to persist and monitor the progress of the data redistribution.
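The limitation suggests an obvious follow-up: persist the requested ranges and re-issue moves until each completes. A minimal sketch under assumed interfaces (`store`, `issue_move`, and `is_complete` are hypothetical, not APIs from this PR):

```python
def drive_redistribute(store, issue_move, is_complete):
    """Hypothetical driver loop: 'store' persists the requested ranges
    across DD restarts; re-issue a data move for any range whose earlier
    move was aborted or has not yet completed."""
    pending = [r for r in store.load() if not is_complete(r)]
    for rng in pending:
        issue_move(rng)          # may still be dropped if the source is overloaded
    done = [r for r in store.load() if is_complete(r)]
    for rng in done:
        store.remove(rng)        # forget ranges whose moves completed
    return pending               # ranges still awaiting completion
```

Running this periodically would make the redistribution eventually complete despite aborted moves, at the cost of persisting per-range progress.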
Concerns:
What if too many ranges are included? DDQueue can drop data moves if the source is overloaded by data moves.
It is possible that a move is issued but gets aborted for some reason. Do we want to re-issue the data move if any move gets aborted? -- i.e., persist data moves until they complete.
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch (or main if this is the youngest branch)