[refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack by starocean999 · Pull Request #65095 · apache/doris

starocean999 · 2026-07-01T08:10:57Z

Related PR: #48502

Problem Summary:

The SetPreAggStatus rule uses a bottom-up traversal to determine whether
pre-aggregation can be enabled for OlapScan nodes. The original implementation
used a Stack<PreAggInfoContext> as the traversal context, which had a
fundamental flaw: multiple scan nodes under sibling subtrees (e.g., both sides
of a Join) incorrectly shared the same stack-level context.

This caused two problems:

A scan could inherit filter/join/project information from unrelated branches
of the plan tree.
To work around this, visitLogicalAggregate used retainAll() to filter out
scan nodes that were not actually children of the current aggregate — an
inelegant patch that partially masked the underlying issue.

Root cause: The stack-based context could not guarantee a one-to-one
correspondence between a scan node and its nearest aggregate ancestor.

Fix: Replace the Stack<PreAggInfoContext> with a Map<RelationId, PreAggInfoContext>, where each scan node gets its own dedicated context entry.
Each visitor method uses a snapshot-diff pattern:

Snapshot the current map keys before visiting children
Visit children (they add scan contexts to the map)
Diff to identify which scans were added by this subtree
Apply node-specific info only to those scans

At LogicalAggregate, child scan contexts are consumed (removed from the map)
and their results are stored — no more retainAll() filtering needed.

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-07-01T08:11:02Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

starocean999 · 2026-07-01T08:11:09Z

/review

github-actions

Automated review completed. I did not find a substantiated blocking issue in this PR.

Critical checkpoint conclusions:

Goal: the PR replaces the shared stack in SetPreAggStatus with per-RelationId contexts. The changed traversal matches that goal and avoids sharing scan context between sibling branches.
Scope: the change is focused to the pre-aggregation status rewriter plus two regression explain expectations.
Optimizer correctness: the snapshot/diff pattern was checked against DefaultPlanRewriter.visitChildren, LogicalFilter, LogicalJoin, LogicalProject, LogicalAggregate, and LogicalRepeat. The handled nodes preserve context until the nearest aggregate consumes it; generic nodes remove only contexts introduced below that node, preserving pre-existing sibling/ancestor contexts.
Parallel paths: rule placement remains in the existing late rewrite stage, and the second registration path still uses the same custom rule. I did not find a missed sibling join/aggregate path in the changed code.
Configuration, concurrency, lifecycle, persistence, compatibility, FE/BE protocol, and storage format: not applicable; this PR does not add or alter those surfaces.
Tests: the regression suite updates the two right-join sibling aggregate cases from preagg_t3 OFF to ON, which targets the reported shared-context behavior. No .out artifact is expected for these explain { contains ... } assertions.
Validation: exact PR-range git diff --check passed. I did not run FE build or regression tests because this runner lacks .worktree_initialized, thirdparty/installed, and executable thirdparty/installed/bin/protoc.

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite reported NO_NEW_VALUABLE_FINDINGS. tests-session-config proposed TSC-1 for trailing whitespace, but I dismissed it because it came from git show --check HEAD in a grafted checkout; the exact base-to-head git diff --check passes and the PR patch changes only two assertion strings. Final convergence round 1 ended with both subagents reporting NO_NEW_VALUABLE_FINDINGS against the same ledger and empty proposed inline comment set.

starocean999 · 2026-07-01T08:26:03Z

run buildall

hello-stephen · 2026-07-01T10:06:00Z

FE UT Coverage Report

Increment line coverage 100.00% (39/39) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-07-01T10:18:09Z

TPC-H: Total hot run time: 30121 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f60111f0536c7fcbce709339ba8d900fcb5152ef, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17699	4143	4111	4111
q2	2028	333	205	205
q3	10302	1452	871	871
q4	4681	482	340	340
q5	7496	856	580	580
q6	182	173	144	144
q7	809	858	626	626
q8	9357	1644	1592	1592
q9	5750	4485	4458	4458
q10	6783	1822	1565	1565
q11	526	343	314	314
q12	734	563	441	441
q13	18110	3363	2810	2810
q14	267	259	244	244
q15	q16	800	784	715	715
q17	980	1042	1022	1022
q18	7230	5896	5675	5675
q19	1328	1296	1088	1088
q20	780	677	595	595
q21	5972	2747	2427	2427
q22	445	364	298	298
Total cold run time: 102259 ms
Total hot run time: 30121 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4408	4290	4344	4290
q2	302	324	219	219
q3	4681	4997	4490	4490
q4	2097	2191	1387	1387
q5	4482	4325	4336	4325
q6	231	177	126	126
q7	2113	1999	1627	1627
q8	2628	2278	2213	2213
q9	8288	7869	7865	7865
q10	4839	4802	4359	4359
q11	597	418	389	389
q12	779	766	559	559
q13	3265	3645	2872	2872
q14	307	321	286	286
q15	q16	719	741	672	672
q17	1374	1342	1476	1342
q18	7773	7277	7331	7277
q19	1144	1120	1128	1120
q20	2219	2220	1932	1932
q21	5378	4695	4508	4508
q22	538	467	417	417
Total cold run time: 58162 ms
Total hot run time: 52275 ms

hello-stephen · 2026-07-01T10:29:11Z

TPC-DS: Total hot run time: 174162 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f60111f0536c7fcbce709339ba8d900fcb5152ef, data reload: false

query5	4332	637	519	519
query6	495	212	202	202
query7	4850	619	341	341
query8	340	195	172	172
query9	8794	4070	4045	4045
query10	471	359	296	296
query11	5905	2350	2146	2146
query12	159	106	105	105
query13	1282	635	461	461
query14	6323	5311	4953	4953
query14_1	4309	4275	4288	4275
query15	225	212	182	182
query16	1034	447	443	443
query17	950	747	623	623
query18	2454	478	364	364
query19	217	200	156	156
query20	112	108	111	108
query21	241	164	137	137
query22	13595	13604	13307	13307
query23	17489	16600	16220	16220
query23_1	16277	16359	16366	16359
query24	7753	1801	1314	1314
query24_1	1324	1323	1337	1323
query25	602	477	408	408
query26	1351	371	210	210
query27	2615	606	376	376
query28	4434	2037	2046	2037
query29	1097	638	524	524
query30	338	266	228	228
query31	1129	1096	996	996
query32	106	65	64	64
query33	545	330	266	266
query34	1197	1150	649	649
query35	779	794	671	671
query36	1431	1360	1224	1224
query37	165	117	138	117
query38	1877	1726	1674	1674
query39	931	946	902	902
query39_1	872	886	891	886
query40	250	170	141	141
query41	67	62	63	62
query42	95	94	93	93
query43	326	327	281	281
query44	1427	773	779	773
query45	221	192	180	180
query46	1116	1220	748	748
query47	2346	2300	2272	2272
query48	397	439	290	290
query49	589	422	325	325
query50	1076	411	351	351
query51	4505	4396	4380	4380
query52	88	87	74	74
query53	271	279	209	209
query54	299	229	210	210
query55	75	71	70	70
query56	294	310	286	286
query57	1434	1392	1324	1324
query58	279	252	274	252
query59	1568	1651	1437	1437
query60	322	271	260	260
query61	161	153	146	146
query62	699	648	613	613
query63	247	206	203	203
query64	2530	800	624	624
query65	4862	4800	4755	4755
query66	1817	552	397	397
query67	29634	29590	29408	29408
query68	3451	1558	944	944
query69	409	315	266	266
query70	1083	964	973	964
query71	345	351	322	322
query72	2967	2716	2312	2312
query73	846	790	458	458
query74	5109	4989	4750	4750
query75	2604	2607	2266	2266
query76	2329	1188	786	786
query77	354	388	279	279
query78	12447	12692	11807	11807
query79	1412	1185	766	766
query80	1113	544	468	468
query81	516	332	285	285
query82	572	160	126	126
query83	373	328	294	294
query84	322	163	132	132
query85	1058	605	521	521
query86	415	294	296	294
query87	1827	1820	1745	1745
query88	3778	2824	2814	2814
query89	464	402	363	363
query90	1759	197	197	197
query91	208	192	163	163
query92	68	62	59	59
query93	1589	1665	943	943
query94	630	355	340	340
query95	787	609	506	506
query96	1077	814	349	349
query97	2682	2722	2581	2581
query98	219	211	202	202
query99	1179	1154	1027	1027
Total cold run time: 259761 ms
Total hot run time: 174162 ms

hello-stephen · 2026-07-01T10:34:07Z

ClickBench: Total hot run time: 25.47 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f60111f0536c7fcbce709339ba8d900fcb5152ef, data reload: false

query1	0.00	0.00	0.00
query2	0.09	0.05	0.05
query3	0.26	0.14	0.14
query4	1.61	0.14	0.13
query5	0.27	0.23	0.22
query6	1.29	1.10	1.11
query7	0.03	0.00	0.01
query8	0.06	0.04	0.04
query9	0.38	0.32	0.32
query10	0.55	0.58	0.56
query11	0.21	0.14	0.14
query12	0.20	0.15	0.15
query13	0.47	0.48	0.47
query14	1.01	1.02	1.00
query15	0.61	0.59	0.59
query16	0.32	0.32	0.33
query17	1.16	1.09	1.18
query18	0.24	0.22	0.22
query19	2.11	2.00	1.97
query20	0.02	0.01	0.01
query21	15.43	0.22	0.15
query22	4.76	0.06	0.05
query23	16.10	0.30	0.13
query24	2.97	0.43	0.34
query25	0.10	0.04	0.06
query26	0.77	0.20	0.16
query27	0.05	0.03	0.03
query28	3.50	0.94	0.55
query29	12.49	4.31	3.54
query30	0.27	0.16	0.15
query31	2.77	0.58	0.31
query32	3.23	0.59	0.49
query33	3.15	3.26	3.22
query34	15.66	4.19	3.52
query35	3.51	3.50	3.56
query36	0.55	0.44	0.42
query37	0.09	0.07	0.06
query38	0.05	0.03	0.04
query39	0.04	0.02	0.02
query40	0.18	0.17	0.15
query41	0.08	0.03	0.03
query42	0.03	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 96.72 s
Total hot run time: 25.47 s

hello-stephen · 2026-07-01T10:58:59Z

FE Regression Coverage Report

Increment line coverage 57.35% (39/68) 🎉
Increment coverage report
Complete coverage report

…text map instead of shared stack

starocean999 · 2026-07-02T14:46:04Z

/review

starocean999 · 2026-07-02T14:46:11Z

run buildall

github-actions

Automated review completed. I did not find a substantiated blocking issue in this PR.

Critical checkpoint conclusions:

Goal: the PR replaces the shared stack in SetPreAggStatus with per-RelationId scan contexts. The changed traversal matches that goal and prevents sibling subtrees from sharing one pre-aggregation context.
Scope: the change is focused to the Nereids pre-aggregation status rewriter plus two regression explain expectations.
Optimizer correctness: I checked DefaultPlanRewriter.visitChildren, LogicalFilter, LogicalJoin, LogicalProject, LogicalAggregate, LogicalRepeat, SetOlapScanPreAgg, rule registration, and alias replacement. Contexts are added per scan, enriched only by ancestors whose child traversal introduced that scan context, and consumed at the nearest aggregate.
Parallel paths: both SET_PREAGG_STATUS registrations continue to use the same custom rule. I did not find a missed sibling join or aggregate path in the changed code.
Configuration, concurrency, lifecycle, persistence, compatibility, FE/BE protocol, and storage format: not applicable; this PR does not add or alter those surfaces.
Tests: the regression suite updates the two right-join sibling aggregate explain cases from preagg_t3 OFF to ON, which targets the reported shared-context behavior. No .out artifact is expected for these explain contains assertions.
Validation: exact base-to-head git diff --check passed after fetching base SHA 8caf0de. I did not run FE build or regression tests because this runner lacks .worktree_initialized, thirdparty/installed, and executable thirdparty/installed/bin/protoc.

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite reported NO_NEW_VALUABLE_FINDINGS and added no candidates. tests-session-config reported NO_NEW_VALUABLE_FINDINGS and added no candidates. The main proposed final comment set remained empty. Final convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS against the same ledger and empty proposed inline comment set.

hello-stephen · 2026-07-02T16:16:19Z

FE UT Coverage Report

Increment line coverage 100.00% (39/39) 🎉
Increment coverage report
Complete coverage report

hello-stephen · 2026-07-02T16:27:31Z

TPC-H: Total hot run time: 30003 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 43be30c52b5a462eb468e50c8cd5c4b7b1cecad0, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17603	4380	4162	4162
q2	2018	332	201	201
q3	10298	1513	856	856
q4	4738	471	340	340
q5	7985	876	579	579
q6	282	174	140	140
q7	827	828	611	611
q8	10676	1610	1659	1610
q9	5926	4453	4451	4451
q10	6797	1786	1509	1509
q11	515	373	310	310
q12	739	566	462	462
q13	18105	3462	2787	2787
q14	272	266	243	243
q15	q16	789	777	709	709
q17	1000	1065	1063	1063
q18	6862	5793	5703	5703
q19	1169	1303	1049	1049
q20	777	625	574	574
q21	5662	2752	2342	2342
q22	423	366	302	302
Total cold run time: 103463 ms
Total hot run time: 30003 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4483	4351	4361	4351
q2	282	333	213	213
q3	4585	4984	4447	4447
q4	2070	2159	1380	1380
q5	4493	4398	4653	4398
q6	278	215	157	157
q7	2087	1873	1603	1603
q8	2583	2303	2304	2303
q9	7866	7773	7818	7773
q10	4768	4773	4454	4454
q11	616	452	407	407
q12	760	776	562	562
q13	3282	3601	2990	2990
q14	299	317	287	287
q15	q16	731	736	640	640
q17	1378	1390	1367	1367
q18	8199	7627	7057	7057
q19	1112	1119	1123	1119
q20	2224	2278	1951	1951
q21	5513	4687	4685	4685
q22	531	483	444	444
Total cold run time: 58140 ms
Total hot run time: 52588 ms

hello-stephen · 2026-07-02T16:38:41Z

TPC-DS: Total hot run time: 174394 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 43be30c52b5a462eb468e50c8cd5c4b7b1cecad0, data reload: false

query5	4333	632	498	498
query6	464	221	196	196
query7	4951	620	347	347
query8	343	193	177	177
query9	8735	4094	4093	4093
query10	477	363	299	299
query11	5932	2337	2122	2122
query12	157	103	120	103
query13	1256	597	413	413
query14	6276	5369	5011	5011
query14_1	4360	4372	4344	4344
query15	209	201	187	187
query16	989	469	442	442
query17	945	707	564	564
query18	2417	457	329	329
query19	213	195	150	150
query20	109	110	104	104
query21	230	161	135	135
query22	13599	13668	13377	13377
query23	17367	16600	16260	16260
query23_1	16261	16316	16254	16254
query24	7480	1801	1323	1323
query24_1	1341	1316	1322	1316
query25	572	483	407	407
query26	1339	368	214	214
query27	2628	636	395	395
query28	4481	2022	2041	2022
query29	1110	662	503	503
query30	347	258	226	226
query31	1119	1104	987	987
query32	112	68	61	61
query33	549	328	250	250
query34	1189	1155	664	664
query35	779	788	675	675
query36	1362	1344	1208	1208
query37	153	104	96	96
query38	1890	1710	1645	1645
query39	927	921	911	911
query39_1	892	865	893	865
query40	246	163	148	148
query41	67	62	61	61
query42	93	90	92	90
query43	330	339	291	291
query44	1479	790	768	768
query45	200	189	176	176
query46	1103	1210	768	768
query47	2348	2241	2213	2213
query48	367	426	270	270
query49	586	420	340	340
query50	1124	420	346	346
query51	4419	4350	4305	4305
query52	89	88	77	77
query53	254	293	200	200
query54	281	228	234	228
query55	76	71	72	71
query56	295	296	287	287
query57	1420	1377	1338	1338
query58	283	240	242	240
query59	1635	1686	1494	1494
query60	312	269	249	249
query61	157	157	159	157
query62	699	649	590	590
query63	246	211	218	211
query64	2571	775	646	646
query65	4854	4803	4794	4794
query66	1832	498	387	387
query67	29571	29606	29434	29434
query68	3137	1561	1025	1025
query69	435	323	282	282
query70	1107	948	1021	948
query71	384	348	317	317
query72	2953	2619	2333	2333
query73	870	808	427	427
query74	5165	5020	4767	4767
query75	2627	2610	2237	2237
query76	2347	1213	837	837
query77	366	397	287	287
query78	12537	12584	11877	11877
query79	1411	1172	790	790
query80	1278	579	471	471
query81	507	321	287	287
query82	600	160	123	123
query83	425	328	297	297
query84	324	162	131	131
query85	972	607	523	523
query86	424	301	288	288
query87	1839	1838	1775	1775
query88	3769	2829	2840	2829
query89	474	403	357	357
query90	1842	208	203	203
query91	202	189	164	164
query92	59	62	59	59
query93	1557	1560	993	993
query94	724	363	286	286
query95	803	518	559	518
query96	1094	810	359	359
query97	2695	2680	2556	2556
query98	214	211	201	201
query99	1170	1175	1013	1013
Total cold run time: 259300 ms
Total hot run time: 174394 ms

hello-stephen · 2026-07-02T16:43:33Z

ClickBench: Total hot run time: 25.47 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 43be30c52b5a462eb468e50c8cd5c4b7b1cecad0, data reload: false

query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.27	0.14	0.14
query4	1.61	0.14	0.15
query5	0.29	0.22	0.22
query6	1.25	1.09	1.07
query7	0.04	0.00	0.00
query8	0.05	0.04	0.04
query9	0.39	0.33	0.32
query10	0.56	0.58	0.56
query11	0.21	0.15	0.14
query12	0.19	0.16	0.14
query13	0.48	0.48	0.47
query14	1.03	1.01	1.00
query15	0.62	0.60	0.60
query16	0.33	0.33	0.33
query17	1.10	1.07	1.13
query18	0.23	0.22	0.22
query19	2.00	1.98	1.95
query20	0.02	0.01	0.01
query21	15.42	0.18	0.14
query22	5.00	0.05	0.05
query23	16.16	0.32	0.13
query24	2.99	0.44	0.35
query25	0.13	0.05	0.04
query26	0.73	0.21	0.17
query27	0.04	0.03	0.04
query28	3.55	0.89	0.55
query29	12.59	4.38	3.49
query30	0.28	0.15	0.18
query31	2.78	0.60	0.32
query32	3.22	0.60	0.49
query33	3.20	3.24	3.28
query34	15.71	4.21	3.52
query35	3.52	3.55	3.52
query36	0.55	0.45	0.44
query37	0.09	0.07	0.06
query38	0.05	0.04	0.03
query39	0.04	0.03	0.03
query40	0.17	0.17	0.15
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 97.17 s
Total hot run time: 25.47 s

hello-stephen · 2026-07-03T04:37:17Z

FE Regression Coverage Report

Increment line coverage 42.39% (39/92) 🎉
Increment coverage report
Complete coverage report

morrySnow · 2026-07-03T08:56:00Z

    @Override
    public Plan visitLogicalAggregate(LogicalAggregate<? extends Plan> logicalAggregate,
-            Stack<PreAggInfoContext> context) {
+            Map<RelationId, PreAggInfoContext> context) {


agg -> join
join -> agg

生成一个空context
后续遍历
再处理这里context

starocean999 changed the title ~~[refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack…~~ [refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack Jul 1, 2026

github-actions Bot reviewed Jul 1, 2026

View reviewed changes

[refactor](fe) Refactor SetPreAggStatus traversal to use per-scan con…

43be30c

…text map instead of shared stack

starocean999 force-pushed the master_0414 branch from f60111f to 43be30c Compare July 2, 2026 14:45

github-actions Bot reviewed Jul 2, 2026

View reviewed changes

starocean999 marked this pull request as ready for review July 3, 2026 07:52

starocean999 requested review from 924060929, englefly and morrySnow as code owners July 3, 2026 07:52

morrySnow added the dev/4.2.x label Jul 3, 2026

morrySnow reviewed Jul 3, 2026

View reviewed changes

Uh oh!

Conversation

starocean999 commented Jul 1, 2026

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

hello-stephen commented Jul 1, 2026

Uh oh!

starocean999 commented Jul 1, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

starocean999 commented Jul 1, 2026

Uh oh!

hello-stephen commented Jul 1, 2026

FE UT Coverage Report

Uh oh!

hello-stephen commented Jul 1, 2026

Uh oh!

hello-stephen commented Jul 1, 2026

Uh oh!

hello-stephen commented Jul 1, 2026

Uh oh!

hello-stephen commented Jul 1, 2026

FE Regression Coverage Report

Uh oh!

starocean999 commented Jul 2, 2026

Uh oh!

starocean999 commented Jul 2, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

hello-stephen commented Jul 2, 2026

FE UT Coverage Report

Uh oh!

hello-stephen commented Jul 2, 2026

Uh oh!

hello-stephen commented Jul 2, 2026

Uh oh!

hello-stephen commented Jul 2, 2026

Uh oh!

hello-stephen commented Jul 3, 2026

FE Regression Coverage Report

Uh oh!

morrySnow Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants