[improvement](topn) Add option to skip file cache writes in topn lazy materialization #65021
Open
bobhan1 wants to merge 3 commits into
e33b38c to 59e4e37
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: TopN lazy materialization phase 2 may populate file cache while fetching deferred columns. This can pollute cache when the requested ranges are cache misses. Add a cloud-only session switch for the new PMultiGetRequestV2 path so phase-2 reads use cached blocks only when the full range is already downloaded, and otherwise read remote data directly without writing file cache. The change also exposes phase-2 file-cache counters in the MaterializeNode profile and covers row-store and column-store fetch paths.
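The read policy described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Doris implementation: `ToyFileCache`, `get_if_fully_covered`, and `read_range` are hypothetical names, the cache is a plain in-memory dict, and the "remote" file is a `bytes` object.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFileCache:
    """Hypothetical stand-in for a block file cache: start offset -> cached bytes."""
    blocks: dict = field(default_factory=dict)
    writes: int = 0  # number of times the cache was populated

    def put(self, start: int, data: bytes) -> None:
        self.blocks[start] = data
        self.writes += 1

    def get_if_fully_covered(self, offset: int, size: int):
        """Return the bytes only if [offset, offset + size) is entirely cached.

        Read-only: this never adds, removes, or modifies blocks.
        """
        out = bytearray()
        pos, end = offset, offset + size
        while pos < end:
            hit = None
            for start, data in self.blocks.items():
                if start <= pos < start + len(data):
                    hit = (start, data)
                    break
            if hit is None:
                return None  # a gap exists, so treat the whole range as a miss
            start, data = hit
            take = min(end, start + len(data)) - pos
            out += data[pos - start:pos - start + take]
            pos += take
        return bytes(out)


def read_range(cache: ToyFileCache, remote: bytes, offset: int, size: int,
               no_write_on_miss: bool):
    """Serve from cache when fully covered; otherwise read the remote data.

    With no_write_on_miss=True the miss path leaves the cache untouched.
    """
    cached = cache.get_if_fully_covered(offset, size)
    if cached is not None:
        return cached, "cache"
    data = remote[offset:offset + size]
    if not no_write_on_miss:
        cache.put(offset, data)
    return data, "remote"
```

In this model a partially cached range counts as a miss, which mirrors the "only when the full range is already downloaded" wording in the summary.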
### Release note
Added session variable `enable_topn_lazy_mat_phase2_no_write_file_cache` to avoid file-cache writes on TopN lazy materialization phase-2 cache misses.
### Check List (For Author)
- Test:
    - Unit Test: `./run-be-ut.sh --run --filter=BlockFileCacheTest.get_downloaded_blocks_if_fully_covered_is_read_only:BlockFileCacheTest.cached_remote_file_reader_remote_only_on_miss -j20`
    - Build: `./build.sh --be --fe --cloud -j100`
    - Format: `build-support/check-format.sh`
    - Regression test: `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d cloud_p0/cache/topn_lazy_file_cache -s test_topn_lazy_mat_phase2_no_write_file_cache -g docker -runMode=cloud -dockerSuiteParallel 1`
- Behavior changed: Yes. When the new session variable is enabled in cloud mode, TopN lazy materialization phase-2 cache misses read remote data without writing file cache.
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: TopN lazy materialization phase 2 exposed only aggregated profile counters, which makes backend-level skew and IO differences hard to identify when phase-2 fetch fans out to multiple backends. Add aggregate rows/segments counters and per-backend rows, segments, and file-cache statistics for the new TopN lazy materialization V2 path. The per-backend values are accumulated in MaterializationSharedState so multiple phase-2 fetch calls in one query are reflected in the final profile.
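The accumulation described above can be sketched as follows. This is a minimal illustration, not the actual `MaterializationSharedState` code: the class name `SharedStateSketch`, the `BackendStats` fields, and the backend IDs are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BackendStats:
    """Hypothetical per-backend counters for phase-2 fetches."""
    rows: int = 0
    segments: int = 0
    cache_hit_bytes: int = 0
    remote_bytes: int = 0


class SharedStateSketch:
    """Accumulates counters per backend across several fetch calls in one query."""

    def __init__(self) -> None:
        self.per_backend = defaultdict(BackendStats)

    def merge_response(self, backend_id: int, rows: int, segments: int,
                       cache_hit_bytes: int, remote_bytes: int) -> None:
        # Add to the running totals instead of overwriting them, so every
        # fetch call is reflected in the final profile.
        stats = self.per_backend[backend_id]
        stats.rows += rows
        stats.segments += segments
        stats.cache_hit_bytes += cache_hit_bytes
        stats.remote_bytes += remote_bytes

    def aggregate(self) -> BackendStats:
        """Query-level totals derived from the per-backend values."""
        total = BackendStats()
        for stats in self.per_backend.values():
            total.rows += stats.rows
            total.segments += stats.segments
            total.cache_hit_bytes += stats.cache_hit_bytes
            total.remote_bytes += stats.remote_bytes
        return total
```

Keeping the per-backend values and deriving the aggregate from them makes skew visible: two backends with the same total can still differ sharply in rows fetched or remote bytes read.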
### Release note
Added per-backend TopN lazy materialization phase-2 profile counters.
### Check List (For Author)
- Test:
    - Build: `./build.sh --be --fe --cloud -j100`
    - Format: `build-support/clang-format.sh; build-support/check-format.sh; git diff --check`
    - Regression test: `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d cloud_p0/cache/topn_lazy_file_cache -s test_topn_lazy_mat_phase2_no_write_file_cache -g docker -runMode=cloud -dockerSuiteParallel 1`
- Behavior changed: Yes. TopN lazy materialization phase-2 profiles now include aggregate row/segment counts and per-backend detail counters.
- Does this need documentation: No
59e4e37 to a7efbdf
Contributor (Author)
run buildall
Contributor (Author)
/review
Contributor
TPC-H: Total hot run time: 29469 ms
Contributor
TPC-DS: Total hot run time: 174136 ms
Contributor
ClickBench: Total hot run time: 25.73 s
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: The cached remote file reader unit test reads back a fully cached block and leaves the opened cache file reader in the process-wide FDCache. Later FDCache-specific unit tests expect only their own entries to exist, so running the new test before those cases makes their cache size and eviction assertions fail. Clean the exact FDCache entry created by the cached remote file reader test when the test exits.
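The cleanup pattern described above can be shown with a toy model. This is a sketch under stated assumptions, not the actual unit test: `FD_CACHE` stands in for a process-wide cache, and `read_back_cached_block` and `run_reader_test` are hypothetical helpers.

```python
# Hypothetical stand-in for a process-wide cache shared by all tests.
FD_CACHE = {}


def read_back_cached_block(key):
    """Simulates a read that leaves an opened reader behind in the shared cache."""
    FD_CACHE.setdefault(key, object())
    return True


def run_reader_test(key):
    """Runs the read and removes exactly the entry it created, pass or fail."""
    try:
        return read_back_cached_block(key)
    finally:
        # Remove only this test's own key so unrelated entries are untouched.
        FD_CACHE.pop(key, None)
```

Removing the exact key, rather than clearing the whole cache, keeps the test from disturbing entries that other tests may rely on while still leaving the size they observe unchanged.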
### Release note
None
### Check List (For Author)
- Test: Unit Test
    - `./run-be-ut.sh --run --filter=BlockFileCacheTest.cached_remote_file_reader_remote_only_on_miss:BlockFileCacheTest.fd_cache_remove:BlockFileCacheTest.fd_cache_evict`
- Behavior changed: No
- Does this need documentation: No
Contributor (Author)
run buildall
Contributor
TPC-H: Total hot run time: 30137 ms
Contributor
TPC-DS: Total hot run time: 175443 ms
Contributor
ClickBench: Total hot run time: 29.21 s
### Summary
This PR adds an opt-in session variable for TopN lazy materialization phase-2 file-cache miss handling.
- `enable_topn_lazy_mat_phase2_no_write_file_cache`
- `MaterializeNode` profile
### Validation
- `./build.sh --be --fe --cloud -j100`
- `output/fe/conf/fe_custom.conf`, no `output/be/conf/be_custom.conf`, empty `output/fe/doris-meta`, and empty `output/be/storage`
- `output/fe/conf/fe.conf`: `enable_debug_points=true`
- `output/be/conf/be.conf`: `enable_debug_points=true`, `enable_java_support=false`
- `env -u HTTP_PROXY -u HTTPS_PROXY -u http_proxy -u https_proxy -u ALL_PROXY -u all_proxy ./run-regression-test.sh --run -d cloud_p0/cache/topn_lazy_file_cache -s test_topn_lazy_mat_phase2_no_write_file_cache -g docker -runMode=cloud -dockerSuiteParallel 1`
    - Test 1 suites, failed 0 suites, fatal 0 scripts, skipped 0 scripts
- `./run-be-ut.sh --run --filter=MaterializationSharedStateTest.*:BlockFileCacheTest.get_downloaded_blocks_if_fully_covered_is_read_only:BlockFileCacheTest.cached_remote_file_reader_remote_only_on_miss`
    - 7 tests from 2 test suites ran
    - PASSED 7 tests
- `./run-be-ut.sh --run --filter=BlockFileCacheTest.cached_remote_file_reader_remote_only_on_miss:BlockFileCacheTest.fd_cache_remove:BlockFileCacheTest.fd_cache_evict`
    - 3 tests from 1 test suite ran
    - PASSED 3 tests