
[Enhancement](multi-catalog) Make meta cache batch loading concurrently. #21471

Merged
morningman merged 6 commits into apache:master from dutyu:enhance-batchget-metacache
Jul 6, 2023

Conversation

@dutyu
Contributor

@dutyu dutyu commented Jul 4, 2023

Proposed changes

I plan to improve the performance of meta cache queries for HMS tables in two steps:
Step 1: use concurrent batch loading for the meta cache
Step 2: run other tasks concurrently as early as possible

This PR covers step 1. It makes the following changes:

  • Create a CacheBulkLoader for batch loading.
  • Remove the executor from the previous async cache loader and change the loader's type to CacheBulkLoader. (We do not set any refresh strategy on the LoadingCache, so that executor served no purpose.)
  • Replace the CacheThreadPool with a FixedCacheThreadPool. (The previous CacheThreadPool only logged a warning and did not throw an exception when the pool was full.)
  • Remove the parallel streams and use the CacheBulkLoader for batch loading.
  • Change the value of max_external_cache_loader_thread_pool_size to 64, and set the HMS client pool size to max_external_cache_loader_thread_pool_size.
  • Fix the spelling mistake in max_hive_table_catch_num.
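
To illustrate the idea behind concurrent batch loading, here is a minimal sketch that uses only the JDK. It is not the CacheBulkLoader class from this PR: the class name BulkLoadSketch, the loadAll helper, and the loadOne function are hypothetical names chosen for illustration. It submits one load task per key to a fixed thread pool and then collects the results, so slow metadata lookups overlap instead of running one after another.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch of concurrent batch loading; not the PR's CacheBulkLoader. */
public class BulkLoadSketch {

    /** Loads every key on the given executor and returns key -> value in input order. */
    public static <K, V> Map<K, V> loadAll(Iterable<K> keys, Function<K, V> loadOne,
            ExecutorService executor) throws InterruptedException, ExecutionException {
        // Submit one task per key so that the individual loads can run in parallel.
        List<K> submittedKeys = new ArrayList<>();
        List<Future<V>> futures = new ArrayList<>();
        for (K key : keys) {
            submittedKeys.add(key);
            futures.add(executor.submit(() -> loadOne.apply(key)));
        }
        // Collect results in submission order; a failed load surfaces as ExecutionException.
        Map<K, V> result = new LinkedHashMap<>();
        for (int i = 0; i < futures.size(); i++) {
            result.put(submittedKeys.get(i), futures.get(i).get());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The pool size of 4 is arbitrary and only for this demo.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // String::length stands in for an expensive metadata lookup.
            Map<String, Integer> sizes = loadAll(List.of("p=1", "p=22", "p=333"), String::length, pool);
            System.out.println(sizes);
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch the caller waits for every future before returning, so an exception in any single load fails the whole batch. Whether the PR's loader behaves the same way is not stated here.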


@dutyu dutyu force-pushed the enhance-batchget-metacache branch from 6602348 to 03ae40a Compare July 4, 2023 10:42
@dutyu
Contributor Author

dutyu commented Jul 4, 2023

run buildall

@dutyu dutyu marked this pull request as ready for review July 4, 2023 11:27
@dutyu
Contributor Author

dutyu commented Jul 4, 2023

run buildall

@dutyu
Contributor Author

dutyu commented Jul 4, 2023

run buildall

@dutyu
Contributor Author

dutyu commented Jul 4, 2023

run buildall

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.69 seconds
stream load tsv: 451 seconds loaded 74807831229 Bytes, about 158 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 58 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 69.4 seconds inserted 10000000 Rows, about 144K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230704122541_clickbench_pr_172086.html

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 38.11 seconds
stream load tsv: 457 seconds loaded 74807831229 Bytes, about 156 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 57 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 68.6 seconds inserted 10000000 Rows, about 145K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230704131926_clickbench_pr_172135.html

@dutyu dutyu changed the title [Enhancement](multi-catalog) Make batch-get meta cache concurrently. [Enhancement](multi-catalog) Make meta cache batch loading concurrently. Jul 4, 2023
@dutyu dutyu force-pushed the enhance-batchget-metacache branch from 5f31fc7 to ddedcbf Compare July 5, 2023 01:39
@dutyu
Contributor Author

dutyu commented Jul 5, 2023

run buildall

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.28 seconds
stream load tsv: 451 seconds loaded 74807831229 Bytes, about 158 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 56 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 68.8 seconds inserted 10000000 Rows, about 145K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705034526_clickbench_pr_172419.html

@morningman morningman added the dev/2.0.0 2.0.0 release label Jul 5, 2023
Contributor

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 5, 2023
@github-actions
Contributor

github-actions bot commented Jul 5, 2023

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Jul 5, 2023

PR approved by anyone and no changes requested.

@dutyu dutyu force-pushed the enhance-batchget-metacache branch from ddedcbf to d1d52cc Compare July 5, 2023 10:56
@dutyu
Contributor Author

dutyu commented Jul 5, 2023

run buildall

@hello-stephen
Contributor

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.82 seconds
stream load tsv: 508 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 87.1 seconds inserted 10000000 Rows, about 114K ops/s
storage size: 17167708611 Bytes
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705193104_clickbench_pr_172786.html

@hello-stephen
Contributor

TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 51.85 seconds
stream load tsv: 460 seconds loaded 74807831229 Bytes, about 155 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 56 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 28 seconds loaded 861443392 Bytes, about 29 MB/s
insert into select: 66.6 seconds inserted 10000000 Rows, about 150K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230705125340_clickbench_pr_172879.html

Contributor

@morningman morningman left a comment


LGTM

Contributor

@Jibing-Li Jibing-Li left a comment


LGTM

@morningman morningman merged commit bb3b677 into apache:master Jul 6, 2023
@xiaokang xiaokang added dev/2.0.0-merged and removed dev/2.0.0 2.0.0 release labels Jul 6, 2023
xiaokang pushed a commit that referenced this pull request Jul 6, 2023
…ly. (#21471)

I plan to improve the performance of meta cache queries for HMS tables in two steps:
**Step 1**: use concurrent batch loading for the meta cache
**Step 2**: run other tasks concurrently as early as possible

**This PR covers step 1. It makes the following changes:**
- Create a `CacheBulkLoader` for batch loading.
- Remove the executor from the previous async cache loader and change the loader's type to `CacheBulkLoader`. (We do not set any refresh strategy on the LoadingCache, so that executor served no purpose.)
- Replace the `CacheThreadPool` with a `FixedCacheThreadPool`. (The previous `CacheThreadPool` only logged a warning and did not throw an exception when the pool was full.)
- Remove the parallel streams and use the `CacheBulkLoader` for batch loading.
- Change the value of `max_external_cache_loader_thread_pool_size` to 64, and set the HMS client pool size to `max_external_cache_loader_thread_pool_size`.
- Fix the spelling mistake in `max_hive_table_catch_num`.
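
The thread pool item above contrasts a pool that only logs a warning when it is full with one that makes the failure visible. As a rough illustration using only the JDK, the sketch below builds a fixed-size `ThreadPoolExecutor` with a bounded queue and the standard `AbortPolicy`, which throws `RejectedExecutionException` once all threads and queue slots are taken. The class and method names are hypothetical, and this PR does not state which rejection policy `FixedCacheThreadPool` actually uses, so treat the policy choice as an assumption.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the FixedCacheThreadPool implementation from this PR. */
public class FixedPoolRejectionSketch {

    /** Builds a pool with a fixed thread count, a bounded queue, and the JDK AbortPolicy. */
    public static ThreadPoolExecutor newFixedBoundedPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns true if submitting to a saturated pool raised RejectedExecutionException. */
    public static boolean rejectsWhenFull() throws InterruptedException {
        ThreadPoolExecutor pool = newFixedBoundedPool(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        // Each blocker waits on the latch, which keeps the pool saturated during the check.
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(blocker); // occupies the single worker thread
            pool.execute(blocker); // fills the single queue slot
            try {
                pool.execute(blocker); // no free thread and no free queue slot
                return false;
            } catch (RejectedExecutionException e) {
                return true; // the caller sees the overload instead of a silent warning
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected when full: " + rejectsWhenFull());
    }
}
```

A blocking or caller-runs policy would be an equally valid way to avoid silently dropping work. The point of the sketch is only that the overload becomes observable to the submitting code.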
@dutyu dutyu deleted the enhance-batchget-metacache branch September 16, 2023 10:40

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.0-merged reviewed

5 participants