[fix](load) avoid query scanner updating load counters #63781
Pull request overview
This PR fixes load-quality counter accounting in BE scanners by ensuring only load scanners update RuntimeState’s load filtered/unselected row counters, preventing query-side predicate filtering from polluting INSERT/DELETE load statistics (especially when enable_profile is enabled).
Changes:
- Gate `RuntimeState::update_num_rows_load_filtered/unselected()` updates behind `_is_load` in `Scanner::_collect_profile_before_close()`.
- Clarify via comments that query predicate filtering must not contribute to load-quality counters.
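The gating described in the changes above can be illustrated with a minimal sketch. It is not the Doris source: `RuntimeStateSketch`, `ScannerSketch`, and their fields are hypothetical stand-ins, and only the idea that load-quality counters are updated when the scanner is a load scanner is taken from the overview.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for RuntimeState; only the two load-quality counters are modeled.
struct RuntimeStateSketch {
    std::atomic<int64_t> num_rows_load_filtered{0};
    std::atomic<int64_t> num_rows_load_unselected{0};

    void update_num_rows_load_filtered(int64_t n) {
        assert(n >= 0);  // row counts are never negative
        num_rows_load_filtered.fetch_add(n);
    }
    void update_num_rows_load_unselected(int64_t n) {
        assert(n >= 0);
        num_rows_load_unselected.fetch_add(n);
    }
};

// Hypothetical stand-in for a scanner; field names are assumptions for illustration.
struct ScannerSketch {
    ScannerSketch(bool load, int64_t filtered, int64_t unselected)
        : is_load(load), rows_filtered(filtered), rows_unselected(unselected) {}

    bool is_load;             // true only for scanners that are part of a load
    int64_t rows_filtered;    // rows dropped as bad data or by predicates
    int64_t rows_unselected;  // rows skipped by a WHERE-style filter
    int64_t profile_rows_filtered = 0;    // stands in for scan profile counters
    int64_t profile_rows_unselected = 0;

    void collect_profile_before_close(RuntimeStateSketch& state) {
        // Scan profile counters are always collected, for observability.
        profile_rows_filtered = rows_filtered;
        profile_rows_unselected = rows_unselected;
        // Load-quality counters are updated only by load scanners.
        if (is_load) {
            state.update_num_rows_load_filtered(rows_filtered);
            state.update_num_rows_load_unselected(rows_unselected);
        }
    }
};
```

With this shape, a query-side scanner still reports its filtered rows in the profile, but the shared load counters stay untouched.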
6a52d13 to 3b09c76
/review
Review summary: no blocking issues found.
Critical checkpoint conclusions:
- Goal/test: The PR restricts load quality counter updates to scanners that are actually part of load processing. This matches the described DELETE/subquery failure mode, and the added regression enables profile and exercises DELETE with a TVF subquery predicate.
- Scope/focus: The production change is small and focused; it preserves scan profile counter updates while excluding only load-quality counters for non-load scanners.
- Concurrency/lifecycle: No new shared mutable state, threads, locks, lifecycle ownership, or static initialization concerns were introduced. The existing atomic RuntimeState counters remain used only for statistics/accounting.
- Compatibility/configuration: No protocol, storage format, persisted metadata, or configuration changes are introduced.
- Parallel paths: The central Scanner::_collect_profile_before_close path covers OlapScanner/FileScanner subclasses that delegate to it; load scanners still update load filtered/unselected counters through the existing _is_load classification.
- Error handling/data correctness: No ignored Status or visibility-version/delete-bitmap changes. The change prevents query-side predicate filtering from corrupting load success/filtered row accounting.
- Performance/observability: The added branch is trivial and not on a hot per-row path; existing scan profile counters remain collected for observability.
- Test coverage: Regression coverage was added for the profile-enabled DELETE + TVF subquery case. I did not run the regression suite in this runner.
User focus: No additional user-provided review focus was specified.
run buildall
TPC-H: Total hot run time: 31896 ms
TPC-DS: Total hot run time: 173476 ms
BE UT Coverage Report
3b09c76 to 1568a28
1568a28 to 989d36e
What
- `skip_query_scan_load_counters` query option for UPDATE/DELETE plans that are executed through the load path.
- `http_stream`.
- `enable_profile=true`.

Why
For DELETE statements with subquery scans, rows filtered by normal scan predicates could be added to the RuntimeState load counters. The insert/delete sink and the FE insert result checks can then see invalid load totals such as `0/-2` and fail with `Insert has too many filtered data`.
The first fix restricted counters to `_is_load` scanners, but that was too broad and broke `test_group_commit_http_stream`: `INSERT ... SELECT ... FROM http_stream WHERE ...` needs the non-load http_stream scan predicate to contribute `NumberUnselectedRows`.

Test
- `git diff --check`
- `DORIS_THIRDPARTY=/data/data1/liaoxin/code/doris/thirdparty ./run-regression-test.sh --run -f regression-test/suites/load_p0/http_stream/test_group_commit_http_stream.groovy` (local run reached FE, but stopped before assertions because this local cluster returned `401 Access denied for user root@127.0.0.1` on HTTP stream load)
- `DORIS_THIRDPARTY=/data/data1/liaoxin/code/doris/thirdparty ./run-regression-test.sh --run -f regression-test/suites/external_table_p0/tvf/test_delete_with_tvf_profile.groovy` (framework built and suite started; local run was interrupted at `scpFiles` because this machine prompts for the `root@BE` password)

Issue: CIR-20393
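The revised rule in this description (skip query-scan contributions only when the plan asks for it, so that `http_stream` scans keep contributing) might look roughly like the sketch below. This is an assumption about the decision logic, not the actual patch: `ScanContextSketch`, `LoadCountersSketch`, `should_update_load_counters`, and `report_scan_rows` are hypothetical names, and only `skip_query_scan_load_counters` and `_is_load` come from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulator for the load-quality counters.
struct LoadCountersSketch {
    int64_t filtered = 0;
    int64_t unselected = 0;
};

// Hypothetical per-scanner context; field names mirror the PR description.
struct ScanContextSketch {
    bool is_load_scanner;                // scanner classified as a load scanner (_is_load)
    bool skip_query_scan_load_counters;  // assumed to be set for UPDATE/DELETE plans
};

// Assumed rule: load scanners always contribute; query scanners contribute
// unless the plan set skip_query_scan_load_counters.
inline bool should_update_load_counters(const ScanContextSketch& ctx) {
    if (ctx.is_load_scanner) {
        return true;
    }
    return !ctx.skip_query_scan_load_counters;
}

// Adds a scanner's row counts to the load counters when the rule allows it.
inline void report_scan_rows(const ScanContextSketch& ctx, int64_t filtered,
                             int64_t unselected, LoadCountersSketch& out) {
    assert(filtered >= 0 && unselected >= 0);  // row counts are never negative
    if (!should_update_load_counters(ctx)) {
        return;
    }
    out.filtered += filtered;
    out.unselected += unselected;
}
```

Under these assumptions, a DELETE subquery scan (non-load scanner, option set) leaves the load counters alone, while an `INSERT ... SELECT ... FROM http_stream WHERE ...` scan (non-load scanner, option not set) still adds its unselected rows.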