
[VL] Add config to velox's file read #3990

Merged (11 commits) Jan 10, 2024

Conversation

FelixYBW (Contributor):
Velox PR facebookincubator/velox#7217 added DirectBufferedInput, which caused a serious performance regression. The root cause is that the default config in that PR is not optimal for remote storage. You may find more discussion here: facebookincubator/velox#7873

The PR added 3 configs:
- loadQuantum: 256M (make sure it is larger than the row-group size; the Parquet default is 128M)
- maxCoalesceDistance: 1M (in case the columns are not loaded contiguously, e.g. `select a, c from table_with_columns_a_b_c`: if column b is smaller than 1M, we can load it as well and form one large block)
- CoalesceBytes: 64M (break the row-group fetches into smaller chunks)
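The interplay of the three settings can be sketched roughly as follows. This is an illustrative Python model of the coalescing idea, not Velox's actual implementation; the function name `coalesce` and the exact splitting logic are assumptions for illustration.

```python
# Illustrative sketch (NOT Velox's real code): merge nearby byte ranges,
# then split merged runs into bounded-size fetch chunks.

MAX_COALESCE_DISTANCE = 1 * 1024 * 1024   # merge ranges separated by <= 1M
COALESCE_BYTES = 64 * 1024 * 1024         # split merged runs into <= 64M fetches

def coalesce(ranges):
    """Merge (start, end) byte ranges whose gap is at most
    MAX_COALESCE_DISTANCE, then split each merged run into
    chunks no larger than COALESCE_BYTES."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= MAX_COALESCE_DISTANCE:
            # Small gap (e.g. a skipped small column): read through it
            # so the whole run becomes one large sequential block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    chunks = []
    for start, end in merged:
        pos = start
        while pos < end:
            chunks.append((pos, min(pos + COALESCE_BYTES, end)))
            pos += COALESCE_BYTES
    return chunks
```

For example, two column ranges separated by a sub-1M gap would be fetched as a single block, while a 130M run would be broken into three chunks of at most 64M each.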

With this configuration, here is the final trace view: you can see the S3 reads run fully in parallel with data processing.

[Trace view screenshot: S3 reads overlapping data processing]


Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


Run Gluten Clickhouse CI

FelixYBW (Contributor, Author):

Velox PR created. facebookincubator/velox#7978

FelixYBW (Contributor, Author):

With DirectBufferedInput, Velox fetches data in IO threads (the thread count is configured by spark.gluten.sql.columnar.backend.velox.IOThreads). The task's memory pool, passed from the query context, is used to allocate memory for the fetched data, so there is a risk that the memory-leak check may fail.

The new PR #4005 disables the executor crash on memory leak and logs a warning instead.
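The behavior change can be sketched as follows. This is an illustrative model only; the flag name, function name, and return values are assumptions, not Gluten's actual API.

```python
import logging

# Hypothetical sketch of the described behavior change: a non-zero
# memory-pool balance at task exit logs a warning instead of crashing
# the executor. CRASH_ON_MEMORY_LEAK = False models PR #4005's effect.
CRASH_ON_MEMORY_LEAK = False

def check_pool_on_task_exit(pool_used_bytes: int) -> str:
    """Return 'clean' if the pool is empty; otherwise warn (or crash)."""
    if pool_used_bytes == 0:
        return "clean"
    msg = f"memory pool still holds {pool_used_bytes} bytes at task exit"
    if CRASH_ON_MEMORY_LEAK:
        raise RuntimeError(msg)   # old behavior: fail the executor
    logging.warning(msg)          # new behavior: warn and continue
    return "leaked"
```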


FelixYBW (Contributor, Author):

Details are shown here: facebookincubator/velox#8041


github-actions bot commented Jan 6, 2024

Run Gluten Clickhouse CI

solve conflict

FelixYBW (Contributor, Author):

FelixYBW commented Jan 7, 2024

Velox facebookincubator/velox#7978 has been merged; waiting for rebase.

conf->get<std::string>(kFilePreloadThreshold, "1048576")); // 1M

// Set cache_prefetch_min_pct = 0 to force all loads to be prefetched in DirectBufferedInput.
FLAGS_cache_prefetch_min_pct = 0;
Contributor:

Could you also add a Gluten config for this property instead of hard-coding 0 here?

FelixYBW (Contributor, Author):

Added, thank you
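Making the value config-driven could look roughly like the sketch below. The property key is a hypothetical name chosen for illustration; the actual Gluten config key may differ.

```python
# Hypothetical sketch: read cache_prefetch_min_pct from a config map with
# a default of 0 instead of hard-coding it. The key name is an assumption.
PREFETCH_KEY = "spark.gluten.sql.columnar.backend.velox.cachePrefetchMinPct"

def get_prefetch_min_pct(conf: dict) -> int:
    """Return the configured prefetch percentage, defaulting to 0."""
    return int(conf.get(PREFETCH_KEY, "0"))
```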

@FelixYBW FelixYBW changed the title [VL] Add config to velox's file read, DNM until Velox PR merged [VL] Add config to velox's file read Jan 9, 2024

zhli1142015 (Contributor) left a comment:

LGTM, thanks


@FelixYBW FelixYBW merged commit b57f560 into apache:main Jan 10, 2024
15 of 19 checks passed
@FelixYBW FelixYBW deleted the addfilescanconfig branch January 10, 2024 07:19
GlutenPerfBot (Contributor):

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3990_time.csv | log/native_master_01_09_2024_d3df97aa0_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 32.18 | 33.38 | 1.202 | 103.74% |
| q2 | 23.31 | 25.66 | 2.358 | 110.12% |
| q3 | 38.80 | 37.88 | -0.917 | 97.64% |
| q4 | 37.57 | 39.11 | 1.542 | 104.10% |
| q5 | 72.85 | 71.57 | -1.274 | 98.25% |
| q6 | 6.91 | 6.96 | 0.053 | 100.77% |
| q7 | 86.77 | 86.82 | 0.055 | 100.06% |
| q8 | 87.58 | 86.62 | -0.960 | 98.90% |
| q9 | 125.82 | 125.62 | -0.207 | 99.84% |
| q10 | 44.32 | 42.71 | -1.611 | 96.36% |
| q11 | 19.53 | 20.21 | 0.684 | 103.50% |
| q12 | 21.78 | 29.40 | 7.628 | 135.03% |
| q13 | 47.10 | 47.13 | 0.033 | 100.07% |
| q14 | 18.96 | 16.36 | -2.603 | 86.27% |
| q15 | 28.64 | 27.94 | -0.695 | 97.57% |
| q16 | 14.20 | 14.99 | 0.789 | 105.55% |
| q17 | 105.59 | 156.46 | 50.875 | 148.18% |
| q18 | 150.49 | 194.42 | 43.928 | 129.19% |
| q19 | 15.40 | 16.64 | 1.242 | 108.06% |
| q20 | 30.64 | 28.42 | -2.224 | 92.74% |
| q21 | 227.54 | 225.07 | -2.463 | 98.92% |
| q22 | 15.47 | 14.11 | -1.359 | 91.21% |
| total | 1251.42 | 1347.49 | 96.073 | 107.68% |
