Support loading parquet file in parallel by splitting row group #3945
Merged
chaoyli merged 1 commit into StarRocks:main on Mar 18, 2022
833839f to
dadf3db
Compare
Contributor
Author
|
run starrocks_be_unittest |
04e0117 to
2943aa9
Compare
Contributor
Author
|
run starrocks_fe_unittest |
Contributor
|
Add a more concise message about the performance test in the commit message. |
rickif
previously approved these changes
Mar 16, 2022
decster
previously approved these changes
Mar 16, 2022
ABingHuang
reviewed
Mar 17, 2022
2a14611 to
3d63d96
Compare
Collaborator
[FE PR Coverage check]😍 pass : 0 / 0 (0%) |
chaoyli
approved these changes
Mar 18, 2022
decster
approved these changes
Mar 18, 2022
ABingHuang
approved these changes
Mar 18, 2022
wyb
added a commit
to wyb/starrocks
that referenced
this pull request
Mar 19, 2022
StarRocks#3945 This commit supports loading parquet file in parallel by splitting row group, and requires that start offset and size must be set. So set broker range start offset and size in spark load push task.
4 tasks
gengjun-git
pushed a commit
that referenced
this pull request
Mar 19, 2022
) #3945 This commit supports loading parquet file in parallel by splitting row group, and requires that start offset and size must be set. So set broker range start offset and size in spark load push task. * Update BE for smooth upgrade
jaogoy
pushed a commit
to jaogoy/starrocks
that referenced
this pull request
Nov 15, 2023
* Add std.md Signed-off-by: Sida Shen <shenstan1@gmail.com> * Update std.md * Update std.md * Update std.md * Update std.md Signed-off-by: Sida Shen <shenstan1@gmail.com> Co-authored-by: Sida Shen <shenstan1@gmail.com> Co-authored-by: evelyn.zhaojie <98087056+evelynzhaojie@users.noreply.github.com> (cherry picked from commit 0173ebf) Co-authored-by: SidaShen <star.0731@163.com>
What type of PR is this:
Which issues does this PR fix:
Fixes #3942
Problem Summary (Required):
StarRocks broker load parallelizes scans at file granularity, so loading one large file runs as a single process with no parallelism.
To address this, this PR splits a Parquet file by row group and scans the row groups in parallel.
With the FE parameter load_parallel_instance_num=8, the load performance of a single Parquet file improves by 6x.
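As a rough illustration of the idea (not the code in this PR), the sketch below cuts a file into (start offset, size) ranges and lets each scan instance read only the row groups whose start offset falls inside its range, so every row group is read exactly once. The names `RowGroup`, `make_ranges`, and `row_groups_for_range` are hypothetical, and the "start offset falls in range" rule is an assumed convention; the actual BE logic may assign row groups differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for Parquet row group metadata."""
    index: int
    offset: int  # byte offset where the row group starts in the file
    size: int    # byte size of the row group


def make_ranges(file_size: int, num_instances: int) -> List[Tuple[int, int]]:
    """Cut [0, file_size) into at most num_instances contiguous (start, size) ranges."""
    if file_size <= 0 or num_instances <= 0:
        raise ValueError("file_size and num_instances must be positive")
    step = -(-file_size // num_instances)  # ceiling division
    return [(start, min(step, file_size - start))
            for start in range(0, file_size, step)]


def row_groups_for_range(row_groups: List[RowGroup], start: int, size: int) -> List[int]:
    """Return indexes of row groups whose start offset lies in [start, start + size).

    Assumed convention: a row group belongs to the range that contains its
    first byte, even if its data extends past the end of that range.
    """
    return [rg.index for rg in row_groups if start <= rg.offset < start + size]


# Example: a 420-byte file with four 100-byte row groups, scanned by 2 instances.
row_groups = [RowGroup(i, 4 + 100 * i, 100) for i in range(4)]
ranges = make_ranges(file_size=420, num_instances=2)
assignments = [row_groups_for_range(row_groups, s, n) for s, n in ranges]
```

Because ranges are cut by byte offset rather than by row group count, the split can be uneven when row groups do not line up with range boundaries. A range that contains no row group start simply reads nothing.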