[Feature](multi-catalog) Add max_file_splits_num to prevent OOM when file_split_size is too small #58759
run buildall
TPC-H: Total hot run time: 34570 ms
TPC-DS: Total hot run time: 180264 ms
ClickBench: Total hot run time: 27.19 s
run buildall
TPC-H: Total hot run time: 34415 ms
TPC-DS: Total hot run time: 180789 ms
ClickBench: Total hot run time: 27.57 s
// Calculate total file size
long totalFileSize = 0;
for (long fileSize : fileSizes) {
You iterate over `fileSizes` twice.
A single pass is enough.
What problem does this PR solve?
Problem Summary:
When querying external tables, if `file_split_size` is set too small, large files are split into a huge number of splits. In non-batch mode, all splits are loaded into memory at once, which can cause OOM when there are too many of them.
Solution
Add a `max_file_splits_num` configuration to limit the total number of splits across all files. Before generating splits, the system performs a global assessment:
- Estimate the total number of splits from the current `file_split_size`
- If the estimate exceeds `max_file_splits_num`, automatically increase `file_split_size` to `ceil(total_file_size / max_file_splits_num)`
Main Changes
- `max_file_splits_num` configuration (default: 100000)
- `adjustSplitSizeForTotalLimit()` method to estimate and adjust split size
Usage
-- Set maximum total split count (default: 1000000)
SET max_file_splits_num = 1000000;
-- Set to 0 to disable the limit
SET max_file_splits_num = 0;
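For reviewers, here is a minimal sketch of the adjustment described under Solution. It is not the code in this PR: the class `SplitSizeSketch`, the method `adjustSplitSize`, and its parameters are hypothetical names chosen for illustration, and it assumes each file is cut into `ceil(fileSize / splitSize)` splits. It accumulates the total size and the estimated split count in a single pass over the file sizes.

```java
import java.util.List;

final class SplitSizeSketch {
    private SplitSizeSketch() {}

    /**
     * Returns the split size to use so that the estimated number of splits
     * stays close to maxFileSplitsNum. A limit of 0 (or less) disables the check.
     * Hypothetical helper, not the PR's adjustSplitSizeForTotalLimit().
     */
    public static long adjustSplitSize(List<Long> fileSizes, long splitSize, long maxFileSplitsNum) {
        if (maxFileSplitsNum <= 0 || splitSize <= 0) {
            return splitSize; // limit disabled, or nothing sensible to adjust
        }
        long totalFileSize = 0;
        long estimatedSplits = 0;
        // One pass: accumulate the total size and the per-file split estimate together.
        for (long fileSize : fileSizes) {
            totalFileSize += fileSize;
            estimatedSplits += (fileSize + splitSize - 1) / splitSize; // ceil(fileSize / splitSize)
        }
        if (estimatedSplits <= maxFileSplitsNum) {
            return splitSize; // already within the limit
        }
        // ceil(totalFileSize / maxFileSplitsNum), using integer arithmetic only
        long minSplitSize = (totalFileSize + maxFileSplitsNum - 1) / maxFileSplitsNum;
        return Math.max(splitSize, minSplitSize);
    }

    public static void main(String[] args) {
        // One 100 GiB file with a 1 MiB split size would yield 102400 splits.
        long adjusted = adjustSplitSize(List.of(107374182400L), 1048576L, 100000L);
        System.out.println(adjusted); // prints 1073742
    }
}
```

In this sketch, a single 100 GiB file with a 1 MiB split size is estimated at 102400 splits, so with a limit of 100000 the split size is raised to 1073742 bytes. Because each file is rounded up independently, the final count in the sketch can still exceed the limit by fewer than one split per file.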
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)