[Enhancement](multi-catalog) Make meta cache batch loading concurrently. by dutyu · Pull Request #21471 · apache/doris

dutyu · 2023-07-04T04:03:25Z

Proposed changes

I plan to improve the performance of querying the meta cache for HMS tables in two steps:
Step 1: use concurrent batch loading for the meta cache
Step 2: run other tasks concurrently as early as possible

This PR covers step 1 and mainly does the following:

Create a CacheBulkLoader for batch loading.
Remove the executor of the previous async cache loader and change the loader's type to CacheBulkLoader. (We do not set any refresh strategy for the LoadingCache, so the previous executor was not useful.)
Replace the CacheThreadPool with a FixedCacheThreadPool. (The previous CacheThreadPool only logged a warning and did not throw an exception when the pool was full.)
Remove parallel streams and use the CacheBulkLoader for batch loading.
Change the value of max_external_cache_loader_thread_pool_size to 64, and set the HMS client pool size to max_external_cache_loader_thread_pool_size.
Fix the spelling mistake in max_hive_table_catch_num.
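
For readers unfamiliar with the pattern, the idea behind concurrent batch loading on a fixed-size pool can be sketched roughly as follows. This is a minimal illustration using only the JDK, not the actual CacheBulkLoader or FixedCacheThreadPool code from this PR; the class name `BulkLoadSketch`, the method `loadAll`, and the `loadOne` function are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: load many cache keys concurrently on a shared pool.
final class BulkLoadSketch {

    /** Loads every key on the given pool and returns the values in key order. */
    static <K, V> Map<K, V> loadAll(List<K> keys, Function<K, V> loadOne, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // Submit all loads first so they can run concurrently.
        List<Future<V>> futures = new ArrayList<>(keys.size());
        for (K key : keys) {
            Callable<V> task = () -> loadOne.apply(key);
            futures.add(pool.submit(task));
        }
        // Then wait for each result; a failed load surfaces as ExecutionException.
        Map<K, V> result = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            result.put(keys.get(i), futures.get(i).get());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The pool size of 4 is arbitrary here; the PR ties its pool size to a config value.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // String::length stands in for an expensive metadata lookup.
            Map<String, Integer> sizes =
                    loadAll(List.of("db1.t1", "db1.t2", "db2.t3"), String::length, pool);
            System.out.println(sizes);
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting every task before waiting on any future is what makes the batch concurrent. Unlike a parallel stream, the work runs on a pool whose size the caller controls.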

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

dutyu · 2023-07-04T10:43:18Z

run buildall

dutyu · 2023-07-04T11:33:36Z

run buildall

dutyu · 2023-07-04T11:43:45Z

run buildall

dutyu · 2023-07-04T12:00:02Z

run buildall

hello-stephen · 2023-07-04T12:25:44Z

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.69 seconds
stream load tsv: 451 seconds loaded 74807831229 Bytes, about 158 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 58 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 69.4 seconds inserted 10000000 Rows, about 144K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230704122541_clickbench_pr_172086.html

hello-stephen · 2023-07-04T13:19:29Z

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 38.11 seconds
stream load tsv: 457 seconds loaded 74807831229 Bytes, about 156 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 57 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 68.6 seconds inserted 10000000 Rows, about 145K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230704131926_clickbench_pr_172135.html

dutyu · 2023-07-05T01:40:18Z

run buildall

hello-stephen · 2023-07-05T03:45:28Z

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.28 seconds
stream load tsv: 451 seconds loaded 74807831229 Bytes, about 158 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 56 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 68.8 seconds inserted 10000000 Rows, about 145K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705034526_clickbench_pr_172419.html

morningman

LGTM

github-actions · 2023-07-05T10:27:35Z

PR approved by at least one committer and no changes requested.

github-actions · 2023-07-05T10:27:38Z

PR approved by anyone and no changes requested.

dutyu · 2023-07-05T10:57:52Z

run buildall

hello-stephen · 2023-07-05T11:31:05Z

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.82 seconds
stream load tsv: 508 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 87.1 seconds inserted 10000000 Rows, about 114K ops/s
storage size: 17167708611 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705193104_clickbench_pr_172786.html

hello-stephen · 2023-07-05T12:53:43Z

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.85 seconds
stream load tsv: 460 seconds loaded 74807831229 Bytes, about 155 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 56 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 66.6 seconds inserted 10000000 Rows, about 150K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705125340_clickbench_pr_172879.html

morningman

LGTM

Jibing-Li

LGTM

dutyu force-pushed the enhance-batchget-metacache branch from 6602348 to 03ae40a Compare July 4, 2023 10:42

dutyu marked this pull request as ready for review July 4, 2023 11:27

dutyu changed the title ~~[Enhancement](multi-catalog) Make batch-get meta cache concurrently.~~ [Enhancement](multi-catalog) Make meta cache batch loading concurrently. Jul 4, 2023

dutyu force-pushed the enhance-batchget-metacache branch from 5f31fc7 to ddedcbf Compare July 5, 2023 01:39

morningman added the dev/2.0.0 2.0.0 release label Jul 5, 2023

morningman approved these changes Jul 5, 2023

github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 5, 2023

github-actions bot added the reviewed label Jul 5, 2023

王翔宇 added 6 commits July 5, 2023 18:56

[Enhancement](multi-catalog) Make batch-get meta cache concurrently.

606e1b5

[Enhancement](multi-catalog) Make batch-get meta cache concurrently.

bf11e5c

[Enhancement](multi-catalog) Make batch-get meta cache concurrently.

84baa01

[Enhancement](multi-catalog) Make batch-get meta cache concurrently.

1914c2f

[Enhancement](multi-catalog) Make batch-get meta cache concurrently.

d5d256d

[Enhancement](multi-catalog) Make batch-get meta cache concurrently.

d1d52cc

dutyu force-pushed the enhance-batchget-metacache branch from ddedcbf to d1d52cc Compare July 5, 2023 10:56

morningman approved these changes Jul 5, 2023

Jibing-Li approved these changes Jul 6, 2023

morningman merged commit bb3b677 into apache:master Jul 6, 2023

xiaokang added dev/2.0.0-merged and removed dev/2.0.0 2.0.0 release labels Jul 6, 2023

dutyu deleted the enhance-batchget-metacache branch September 16, 2023 10:40


5 participants
