[refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack#65095
[refactor](fe) Refactor SetPreAggStatus traversal to use per-scan context map instead of shared stack#65095starocean999 wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Automated review completed. I did not find a substantiated blocking issue in this PR.
Critical checkpoint conclusions:
- Goal: the PR replaces the shared stack in
SetPreAggStatuswith per-RelationIdcontexts. The changed traversal matches that goal and avoids sharing scan context between sibling branches. - Scope: the change is focused to the pre-aggregation status rewriter plus two regression
explainexpectations. - Optimizer correctness: the snapshot/diff pattern was checked against
DefaultPlanRewriter.visitChildren,LogicalFilter,LogicalJoin,LogicalProject,LogicalAggregate, andLogicalRepeat. The handled nodes preserve context until the nearest aggregate consumes it; generic nodes remove only contexts introduced below that node, preserving pre-existing sibling/ancestor contexts. - Parallel paths: rule placement remains in the existing late rewrite stage, and the second registration path still uses the same custom rule. I did not find a missed sibling join/aggregate path in the changed code.
- Configuration, concurrency, lifecycle, persistence, compatibility, FE/BE protocol, and storage format: not applicable; this PR does not add or alter those surfaces.
- Tests: the regression suite updates the two right-join sibling aggregate cases from
preagg_t3OFF to ON, which targets the reported shared-context behavior. No.outartifact is expected for theseexplain { contains ... }assertions. - Validation: exact PR-range
git diff --checkpassed. I did not run FE build or regression tests because this runner lacks.worktree_initialized,thirdparty/installed, and executablethirdparty/installed/bin/protoc.
User focus: no additional user-provided review focus was present.
Subagent conclusions: optimizer-rewrite reported NO_NEW_VALUABLE_FINDINGS. tests-session-config proposed TSC-1 for trailing whitespace, but I dismissed it because it came from git show --check HEAD in a grafted checkout; the exact base-to-head git diff --check passes and the PR patch changes only two assertion strings. Final convergence round 1 ended with both subagents reporting NO_NEW_VALUABLE_FINDINGS against the same ledger and empty proposed inline comment set.
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 30121 ms |
TPC-DS: Total hot run time: 174162 ms |
ClickBench: Total hot run time: 25.47 s |
FE Regression Coverage ReportIncrement line coverage |
…text map instead of shared stack
|
/review |
|
run buildall |
There was a problem hiding this comment.
Automated review completed. I did not find a substantiated blocking issue in this PR.
Critical checkpoint conclusions:
- Goal: the PR replaces the shared stack in SetPreAggStatus with per-RelationId scan contexts. The changed traversal matches that goal and prevents sibling subtrees from sharing one pre-aggregation context.
- Scope: the change is focused to the Nereids pre-aggregation status rewriter plus two regression explain expectations.
- Optimizer correctness: I checked DefaultPlanRewriter.visitChildren, LogicalFilter, LogicalJoin, LogicalProject, LogicalAggregate, LogicalRepeat, SetOlapScanPreAgg, rule registration, and alias replacement. Contexts are added per scan, enriched only by ancestors whose child traversal introduced that scan context, and consumed at the nearest aggregate.
- Parallel paths: both SET_PREAGG_STATUS registrations continue to use the same custom rule. I did not find a missed sibling join or aggregate path in the changed code.
- Configuration, concurrency, lifecycle, persistence, compatibility, FE/BE protocol, and storage format: not applicable; this PR does not add or alter those surfaces.
- Tests: the regression suite updates the two right-join sibling aggregate explain cases from preagg_t3 OFF to ON, which targets the reported shared-context behavior. No .out artifact is expected for these explain contains assertions.
- Validation: exact base-to-head git diff --check passed after fetching base SHA 8caf0de. I did not run FE build or regression tests because this runner lacks .worktree_initialized, thirdparty/installed, and executable thirdparty/installed/bin/protoc.
User focus: no additional user-provided review focus was present.
Subagent conclusions: optimizer-rewrite reported NO_NEW_VALUABLE_FINDINGS and added no candidates. tests-session-config reported NO_NEW_VALUABLE_FINDINGS and added no candidates. The main proposed final comment set remained empty. Final convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS against the same ledger and empty proposed inline comment set.
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 30003 ms |
TPC-DS: Total hot run time: 174394 ms |
ClickBench: Total hot run time: 25.47 s |
FE Regression Coverage ReportIncrement line coverage |
| @Override | ||
| public Plan visitLogicalAggregate(LogicalAggregate<? extends Plan> logicalAggregate, | ||
| Stack<PreAggInfoContext> context) { | ||
| Map<RelationId, PreAggInfoContext> context) { |
There was a problem hiding this comment.
agg -> join
join -> agg
生成一个空context
后续遍历
再处理这里context
Related PR: #48502
Problem Summary:
The SetPreAggStatus rule uses a bottom-up traversal to determine whether
pre-aggregation can be enabled for OlapScan nodes. The original implementation
used a
Stack<PreAggInfoContext>as the traversal context, which had afundamental flaw: multiple scan nodes under sibling subtrees (e.g., both sides
of a Join) incorrectly shared the same stack-level context.
This caused two problems:
of the plan tree.
visitLogicalAggregateusedretainAll()to filter outscan nodes that were not actually children of the current aggregate — an
inelegant patch that partially masked the underlying issue.
Root cause: The stack-based context could not guarantee a one-to-one
correspondence between a scan node and its nearest aggregate ancestor.
Fix: Replace the
Stack<PreAggInfoContext>with aMap<RelationId, PreAggInfoContext>, where each scan node gets its own dedicated context entry.Each visitor method uses a snapshot-diff pattern:
At
LogicalAggregate, child scan contexts are consumed (removed from the map)and their results are stored — no more
retainAll()filtering needed.None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)