
[VL] Add config to velox's file read #3990

Merged (11 commits) Jan 10, 2024

Conversation

FelixYBW (Contributor):
Velox PR facebookincubator/velox#7217 added DirectBufferedInput, which caused a serious performance regression. The root cause is that the default config in that PR is not optimal for remote storage. You may find more discussion here: facebookincubator/velox#7873

The PR added 3 configs:
- loadQuantum: 256M (make sure it is larger than the row-group size; the Parquet default is 128M)
- maxCoalesceDistance: 1M (in case the columns are not loaded contiguously, e.g. `select a, c from table_with_columns_a_b_c`: if column b is smaller than 1M, we can load it as well and form one large block)
- CoalesceBytes: 64M (break the row-group fetches into smaller chunks)
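The interplay of the three settings can be sketched roughly as follows. This is an illustrative Python model of the coalescing idea, not Velox's actual implementation; the function name `coalesce` and the exact splitting logic are assumptions for illustration.

```python
# Illustrative sketch (NOT Velox's real code): merge nearby byte ranges,
# then split merged runs into bounded-size fetch chunks.

MAX_COALESCE_DISTANCE = 1 * 1024 * 1024   # merge ranges separated by <= 1M
COALESCE_BYTES = 64 * 1024 * 1024         # split merged runs into <= 64M fetches

def coalesce(ranges):
    """Merge (start, end) byte ranges whose gap is at most
    MAX_COALESCE_DISTANCE, then split each merged run into
    chunks no larger than COALESCE_BYTES."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= MAX_COALESCE_DISTANCE:
            # Small gap (e.g. a skipped small column): read through it
            # so the whole run becomes one large sequential block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    chunks = []
    for start, end in merged:
        pos = start
        while pos < end:
            chunks.append((pos, min(pos + COALESCE_BYTES, end)))
            pos += COALESCE_BYTES
    return chunks
```

For example, two column ranges separated by a sub-1M gap would be fetched as a single block, while a 130M run would be broken into three chunks of at most 64M each.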

With this configuration, here is the final trace view: you can see the S3 reads run fully in parallel with data processing.

[Trace view screenshot: S3 reads overlapping data processing]


Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


Run Gluten Clickhouse CI

FelixYBW (Contributor, Author):

Velox PR created. facebookincubator/velox#7978

FelixYBW (Contributor, Author):

With DirectBufferedInput, Velox fetches data in IO threads (the thread count is configured by spark.gluten.sql.columnar.backend.velox.IOThreads). The task's memory pool, passed from the query context, is used to allocate memory for the fetched data, so there is a risk that the memory-leak check may fail.

The new PR #4005 disables the executor crash on memory leak and logs a warning instead.
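The behavior change can be sketched as follows. This is an illustrative model only; the flag name, function name, and return values are assumptions, not Gluten's actual API.

```python
import logging

# Hypothetical sketch of the described behavior change: a non-zero
# memory-pool balance at task exit logs a warning instead of crashing
# the executor. CRASH_ON_MEMORY_LEAK = False models PR #4005's effect.
CRASH_ON_MEMORY_LEAK = False

def check_pool_on_task_exit(pool_used_bytes: int) -> str:
    """Return 'clean' if the pool is empty; otherwise warn (or crash)."""
    if pool_used_bytes == 0:
        return "clean"
    msg = f"memory pool still holds {pool_used_bytes} bytes at task exit"
    if CRASH_ON_MEMORY_LEAK:
        raise RuntimeError(msg)   # old behavior: fail the executor
    logging.warning(msg)          # new behavior: warn and continue
    return "leaked"
```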


FelixYBW (Contributor, Author):

Details are shown here: facebookincubator/velox#8041


github-actions bot commented Jan 6, 2024

Run Gluten Clickhouse CI

solve conflict

FelixYBW (Contributor, Author):

FelixYBW commented Jan 7, 2024

Velox facebookincubator/velox#7978 has been merged; waiting for rebase.

conf->get<std::string>(kFilePreloadThreshold, "1048576")); // 1M

// Set cache_prefetch_min_pct = 0 to force all loads to be prefetched in DirectBufferedInput.
FLAGS_cache_prefetch_min_pct = 0;
Contributor:

Could you also add a Gluten config for this property instead of hard-coding 0 here?

FelixYBW (Contributor, Author):

Added, thank you
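Making the value config-driven could look roughly like the sketch below. The property key is a hypothetical name chosen for illustration; the actual Gluten config key may differ.

```python
# Hypothetical sketch: read cache_prefetch_min_pct from a config map with
# a default of 0 instead of hard-coding it. The key name is an assumption.
PREFETCH_KEY = "spark.gluten.sql.columnar.backend.velox.cachePrefetchMinPct"

def get_prefetch_min_pct(conf: dict) -> int:
    """Return the configured prefetch percentage, defaulting to 0."""
    return int(conf.get(PREFETCH_KEY, "0"))
```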

@FelixYBW FelixYBW changed the title [VL] Add config to velox's file read, DNM until Velox PR merged [VL] Add config to velox's file read Jan 9, 2024

zhli1142015 (Contributor) left a comment:

LGTM, thanks


@FelixYBW FelixYBW merged commit b57f560 into apache:main Jan 10, 2024
15 of 19 checks passed
@FelixYBW FelixYBW deleted the addfilescanconfig branch January 10, 2024 07:19
GlutenPerfBot (Contributor):

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3990_time.csv | log/native_master_01_09_2024_d3df97aa0_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 32.18 | 33.38 | 1.202 | 103.74% |
| q2 | 23.31 | 25.66 | 2.358 | 110.12% |
| q3 | 38.80 | 37.88 | -0.917 | 97.64% |
| q4 | 37.57 | 39.11 | 1.542 | 104.10% |
| q5 | 72.85 | 71.57 | -1.274 | 98.25% |
| q6 | 6.91 | 6.96 | 0.053 | 100.77% |
| q7 | 86.77 | 86.82 | 0.055 | 100.06% |
| q8 | 87.58 | 86.62 | -0.960 | 98.90% |
| q9 | 125.82 | 125.62 | -0.207 | 99.84% |
| q10 | 44.32 | 42.71 | -1.611 | 96.36% |
| q11 | 19.53 | 20.21 | 0.684 | 103.50% |
| q12 | 21.78 | 29.40 | 7.628 | 135.03% |
| q13 | 47.10 | 47.13 | 0.033 | 100.07% |
| q14 | 18.96 | 16.36 | -2.603 | 86.27% |
| q15 | 28.64 | 27.94 | -0.695 | 97.57% |
| q16 | 14.20 | 14.99 | 0.789 | 105.55% |
| q17 | 105.59 | 156.46 | 50.875 | 148.18% |
| q18 | 150.49 | 194.42 | 43.928 | 129.19% |
| q19 | 15.40 | 16.64 | 1.242 | 108.06% |
| q20 | 30.64 | 28.42 | -2.224 | 92.74% |
| q21 | 227.54 | 225.07 | -2.463 | 98.92% |
| q22 | 15.47 | 14.11 | -1.359 | 91.21% |
| total | 1251.42 | 1347.49 | 96.073 | 107.68% |
