
[fix](load) avoid query scanner updating load counters #63781

Draft
liaoxin01 wants to merge 1 commit into apache:master from liaoxin01:fix-cir-20393-master

Conversation

@liaoxin01
Contributor

@liaoxin01 liaoxin01 commented May 28, 2026

What

  • Add a skip_query_scan_load_counters query option for UPDATE/DELETE plans that are executed through the load path.
  • Keep load scanner counters unchanged, and keep the unselected-row counters from query-side scans in INSERT/LOAD plans, which cases such as group commit http_stream depend on.
  • Add a regression case for DELETE with a query-side TVF scan while enable_profile=true.

Why

For DELETE statements with subquery scans, rows filtered by ordinary scan predicates could be added to the RuntimeState load counters. The insert/delete sink and the FE insert result checks could then see invalid load totals such as 0/-2 and fail with Insert has too many filtered data.
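To make the failure mode concrete, here is a minimal C++ sketch. It assumes, purely for illustration, that the loaded-row total is derived as total rows minus filtered and unselected rows; `LoadCounters`, `loaded_rows`, and `simulate_delete` are hypothetical names, not the actual Doris types, and the real accounting may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the load counters kept in RuntimeState.
struct LoadCounters {
    int64_t total_rows = 0;       // rows that reached the sink
    int64_t filtered_rows = 0;    // rows rejected for data-quality reasons
    int64_t unselected_rows = 0;  // rows dropped by load-side predicates
};

// Assumed derivation of the "loaded" total (an assumption of this sketch).
inline int64_t loaded_rows(const LoadCounters& c) {
    return c.total_rows - c.filtered_rows - c.unselected_rows;
}

// Models a DELETE whose subquery scan filters `scan_filtered` rows.
// When the query-side scan also writes to the load counters, the
// filtered count no longer relates to the rows the sink actually saw.
inline LoadCounters simulate_delete(int64_t sink_rows, int64_t scan_filtered,
                                    bool scan_updates_load_counters) {
    LoadCounters c;
    c.total_rows = sink_rows;
    if (scan_updates_load_counters) {
        c.filtered_rows += scan_filtered;
    }
    return c;
}
```

Under this assumed model, a sink that sees 0 rows while the scan reports 2 filtered rows yields a negative loaded total, which is the kind of inconsistent value a filtered-data check would reject.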

The first fix restricted counters to _is_load scanners, but that was too broad and broke test_group_commit_http_stream: INSERT ... SELECT ... FROM http_stream WHERE ... needs the non-load http_stream scan predicate to contribute NumberUnselectedRows.
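The difference between the first fix and the option-based approach can be sketched as a small decision function. This is a simplified model, not the code in the PR: `ScanContext`, `Policy`, and `should_update_load_counters` are hypothetical names, and the sketch collapses the filtered and unselected counters into a single decision.

```cpp
#include <cassert>

// Hypothetical inputs; the real decision lives inside the BE scanner code.
struct ScanContext {
    bool is_load_scanner;                // scanner created by the load path (_is_load)
    bool skip_query_scan_load_counters;  // query option added by this PR
};

enum class Policy {
    LoadOnly,     // first fix: only _is_load scanners update load counters
    OptionBased,  // this PR: query-side scans update unless the option is set
};

inline bool should_update_load_counters(const ScanContext& ctx, Policy policy) {
    if (ctx.is_load_scanner) {
        return true;  // load scanners are unchanged under both policies
    }
    if (policy == Policy::LoadOnly) {
        return false;  // excludes every query-side scan, including http_stream
    }
    // Query-side scan: skip only when the plan (UPDATE/DELETE) asked for it.
    return !ctx.skip_query_scan_load_counters;
}
```

With `Policy::OptionBased`, a DELETE subquery scan (option set) is excluded, while the non-load http_stream scan in INSERT ... SELECT (option not set) still contributes its unselected rows.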

Test

  • git diff --check
  • DORIS_THIRDPARTY=/data/data1/liaoxin/code/doris/thirdparty ./run-regression-test.sh --run -f regression-test/suites/load_p0/http_stream/test_group_commit_http_stream.groovy (local run reached FE, but stopped before assertions because this local cluster returned 401 Access denied for user root@127.0.0.1 on HTTP stream load)
  • An earlier attempt at DORIS_THIRDPARTY=/data/data1/liaoxin/code/doris/thirdparty ./run-regression-test.sh --run -f regression-test/suites/external_table_p0/tvf/test_delete_with_tvf_profile.groovy (the framework built and the suite started; the local run was interrupted at scpFiles because this machine prompts for the root@BE password)

Issue: CIR-20393

Copilot AI review requested due to automatic review settings May 28, 2026 03:30
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes load-quality counter accounting in BE scanners by ensuring only load scanners update RuntimeState’s load filtered/unselected row counters, preventing query-side predicate filtering from polluting INSERT/DELETE load statistics (especially when enable_profile is enabled).

Changes:

  • Gate RuntimeState::update_num_rows_load_filtered/unselected() updates behind _is_load in Scanner::_collect_profile_before_close().
  • Clarify via comments that query predicate filtering must not contribute to load-quality counters.
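A minimal sketch of the gate described in this review, assuming simplified stand-in classes: `RuntimeStateSketch` and `ScannerSketch` are hypothetical and only mirror the method names mentioned above. Counters are accumulated locally per scanner and flushed once at close, so the branch is not on a per-row path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for RuntimeState's load counters.
class RuntimeStateSketch {
public:
    void update_num_rows_load_filtered(int64_t n) {
        _filtered.fetch_add(n, std::memory_order_relaxed);
    }
    void update_num_rows_load_unselected(int64_t n) {
        _unselected.fetch_add(n, std::memory_order_relaxed);
    }
    int64_t num_rows_load_filtered() const { return _filtered.load(); }
    int64_t num_rows_load_unselected() const { return _unselected.load(); }

private:
    std::atomic<int64_t> _filtered{0};
    std::atomic<int64_t> _unselected{0};
};

// Hypothetical scanner that tracks its own row counts while scanning.
class ScannerSketch {
public:
    ScannerSketch(RuntimeStateSketch* state, bool is_load)
            : _state(state), _is_load(is_load) {}

    void on_row_filtered() { ++_local_filtered; }
    void on_row_unselected() { ++_local_unselected; }

    // Called once when the scanner closes. Scan profile counters would be
    // reported here unconditionally; only load counters are gated.
    void collect_profile_before_close() {
        if (_is_load) {
            _state->update_num_rows_load_filtered(_local_filtered);
            _state->update_num_rows_load_unselected(_local_unselected);
        }
    }

private:
    RuntimeStateSketch* _state;
    bool _is_load;
    int64_t _local_filtered = 0;
    int64_t _local_unselected = 0;
};
```

In this model a query-side scanner leaves the shared counters untouched no matter how many rows its predicates drop, while a load scanner flushes its local counts exactly once.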

Comment thread be/src/exec/scan/scanner.cpp
@liaoxin01 liaoxin01 force-pushed the fix-cir-20393-master branch from 6a52d13 to 3b09c76 Compare May 28, 2026 15:39
@liaoxin01
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

Review summary: no blocking issues found.

Critical checkpoint conclusions:

  • Goal/test: The PR restricts load quality counter updates to scanners that are actually part of load processing. This matches the described DELETE/subquery failure mode, and the added regression enables profile and exercises DELETE with a TVF subquery predicate.
  • Scope/focus: The production change is small and focused; it preserves scan profile counter updates while excluding only load-quality counters for non-load scanners.
  • Concurrency/lifecycle: No new shared mutable state, threads, locks, lifecycle ownership, or static initialization concerns were introduced. The existing atomic RuntimeState counters remain used only for statistics/accounting.
  • Compatibility/configuration: No protocol, storage format, persisted metadata, or configuration changes are introduced.
  • Parallel paths: The central Scanner::_collect_profile_before_close path covers OlapScanner/FileScanner subclasses that delegate to it; load scanners still update load filtered/unselected counters through the existing _is_load classification.
  • Error handling/data correctness: No ignored Status or visibility-version/delete-bitmap changes. The change prevents query-side predicate filtering from corrupting load success/filtered row accounting.
  • Performance/observability: The added branch is trivial and not on a hot per-row path; existing scan profile counters remain collected for observability.
  • Test coverage: Regression coverage was added for the profile-enabled DELETE + TVF subquery case. I did not run the regression suite in this runner.

User focus: No additional user-provided review focus was specified.

@liaoxin01
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31896 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b09c7607eaad3d8e7fa7bc922df315d9faa71e1, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17848	4295	4176	4176
q2
q3	10752	1388	825	825
q4	4685	479	354	354
q5	7702	2296	2104	2104
q6	238	183	139	139
q7	985	789	637	637
q8	9353	1688	1604	1604
q9	5205	4960	4958	4958
q10	6386	2224	1878	1878
q11	431	277	246	246
q12	632	424	296	296
q13	18118	3431	2746	2746
q14	271	261	246	246
q15
q16	818	776	709	709
q17	938	975	971	971
q18	6999	5792	5530	5530
q19	1345	1356	1236	1236
q20	547	435	314	314
q21	6193	2878	2595	2595
q22	460	372	332	332
Total cold run time: 99906 ms
Total hot run time: 31896 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5100	4974	4963	4963
q2
q3	4921	5377	4559	4559
q4	2134	2209	1365	1365
q5	4980	4727	4744	4727
q6	229	185	135	135
q7	1877	1739	1636	1636
q8	2476	2210	2207	2207
q9	7879	7450	7404	7404
q10	4786	4692	4291	4291
q11	531	404	382	382
q12	714	736	538	538
q13	3084	3370	2829	2829
q14	279	272	265	265
q15
q16	685	705	613	613
q17	1288	1259	1255	1255
q18	7314	6899	6849	6849
q19	1101	1100	1086	1086
q20	2236	2228	1963	1963
q21	5301	4731	4580	4580
q22	538	492	411	411
Total cold run time: 57453 ms
Total hot run time: 52058 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 173476 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b09c7607eaad3d8e7fa7bc922df315d9faa71e1, data reload: false

query5	4352	689	509	509
query6	338	237	200	200
query7	4220	593	317	317
query8	335	238	224	224
query9	8846	4082	4096	4082
query10	458	355	298	298
query11	5806	2638	2222	2222
query12	193	129	130	129
query13	1315	608	430	430
query14	6279	5541	5252	5252
query14_1	4563	4590	4520	4520
query15	216	206	180	180
query16	994	460	381	381
query17	1062	740	599	599
query18	2521	496	360	360
query19	216	205	159	159
query20	138	143	129	129
query21	217	148	122	122
query22	13746	13681	13427	13427
query23	17360	16700	16258	16258
query23_1	16372	16381	16420	16381
query24	7495	1842	1349	1349
query24_1	1346	1347	1338	1338
query25	592	533	454	454
query26	1309	327	180	180
query27	2677	556	361	361
query28	4474	2045	1997	1997
query29	1039	623	522	522
query30	305	248	196	196
query31	1151	1090	968	968
query32	87	76	77	76
query33	564	383	293	293
query34	1183	1176	653	653
query35	823	832	715	715
query36	1374	1420	1202	1202
query37	159	106	91	91
query38	3259	3242	3145	3145
query39	926	903	919	903
query39_1	882	868	866	866
query40	232	146	126	126
query41	63	63	62	62
query42	116	110	111	110
query43	343	350	307	307
query44	
query45	218	207	195	195
query46	1104	1243	775	775
query47	2339	2341	2215	2215
query48	418	433	310	310
query49	665	518	403	403
query50	955	362	263	263
query51	4404	4356	4308	4308
query52	112	117	101	101
query53	254	285	208	208
query54	317	275	262	262
query55	95	91	86	86
query56	306	322	305	305
query57	1430	1407	1287	1287
query58	304	280	279	279
query59	1679	1738	1514	1514
query60	329	328	314	314
query61	163	160	184	160
query62	709	679	579	579
query63	246	203	212	203
query64	2446	816	648	648
query65	
query66	1750	520	365	365
query67	29217	29795	29669	29669
query68	
query69	478	345	313	313
query70	1053	1074	1030	1030
query71	307	279	274	274
query72	3003	2773	2554	2554
query73	836	832	460	460
query74	5139	4985	4903	4903
query75	2816	2669	2296	2296
query76	2276	1224	804	804
query77	431	437	358	358
query78	12472	12517	11910	11910
query79	1319	1028	752	752
query80	630	601	488	488
query81	457	291	243	243
query82	245	162	126	126
query83	295	289	263	263
query84	273	148	121	121
query85	936	648	548	548
query86	391	335	334	334
query87	3469	3440	3295	3295
query88	3721	2780	2746	2746
query89	432	402	346	346
query90	2167	194	202	194
query91	182	170	143	143
query92	83	83	82	82
query93	1406	1442	856	856
query94	556	346	290	290
query95	678	395	353	353
query96	1112	801	362	362
query97	2711	2751	2576	2576
query98	238	233	228	228
query99	1158	1143	1033	1033
Total cold run time: 253645 ms
Total hot run time: 173476 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.99% (20982/38863)
Line Coverage 37.55% (198950/529853)
Region Coverage 33.83% (155935/460884)
Branch Coverage 34.84% (67915/194936)

@liaoxin01 liaoxin01 force-pushed the fix-cir-20393-master branch from 3b09c76 to 1568a28 Compare May 29, 2026 06:30
@liaoxin01 liaoxin01 force-pushed the fix-cir-20393-master branch from 1568a28 to 989d36e Compare May 29, 2026 06:57
@liaoxin01 liaoxin01 marked this pull request as draft May 30, 2026 03:08