[feature](eager-agg) Bilateral push-down for eager aggregation #63690
Open
feiniaofeiafei wants to merge 3 commits into
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Extend PushDownAggregation so that an aggregate above an
inner/cross join can be pushed down to *both* sides simultaneously. Each side
pre-aggregates its own aggregate functions and emits an extra count(*) that
records the row-multiplicity of the pre-aggregated groups. At the top-level
rollup we then combine each side's partial aggregate with the opposite side's
count(*) as a multiplier, using the unified formula
acc = value * m1 * m2 * ...
sum(value) -> sum(acc)
count(value) -> ifnull(sum(acc), 0)
min/max(value) -> min/max(pushedSlot) (multipliers ignored)
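The rollup formula above can be checked on a small example. The following is a minimal sketch in Python, not the Nereids implementation: it assumes two hypothetical tables of non-null (key, value) rows joined on the key with no group-by columns, and every function name here is illustrative. It pre-aggregates each side by join key, keeps a count(*) per group, and scales sum/count by the opposite side's count(*) while leaving min untouched.

```python
from collections import defaultdict


def sql_sum(values):
    """SUM with SQL semantics for empty input: returns None (NULL)."""
    values = list(values)
    return sum(values) if values else None


def sql_min(values):
    """MIN with SQL semantics for empty input: returns None (NULL)."""
    values = list(values)
    return min(values) if values else None


def pre_aggregate(rows):
    """Group (key, value) rows by join key; keep sum, count, min and count(*)."""
    groups = defaultdict(lambda: {"sum": 0, "count": 0, "min": None, "cnt_star": 0})
    for key, value in rows:  # values are assumed non-null in this sketch
        g = groups[key]
        g["sum"] += value
        g["count"] += 1
        g["min"] = value if g["min"] is None else min(g["min"], value)
        g["cnt_star"] += 1
    return groups


def direct(left, right):
    """Reference result: aggregate over the fully materialized inner join."""
    joined = [(a, b) for k1, a in left for k2, b in right if k1 == k2]
    return {
        "sum_a": sql_sum(a for a, _ in joined),
        "count_a": len(joined),
        "min_a": sql_min(a for a, _ in joined),
        "sum_b": sql_sum(b for _, b in joined),
    }


def bilateral(left, right):
    """Pre-aggregate both sides, then roll up with the opposite side's count(*)."""
    l_groups, r_groups = pre_aggregate(left), pre_aggregate(right)
    matched = [(l_groups[k], r_groups[k]) for k in l_groups if k in r_groups]
    count_acc = sql_sum(l["count"] * r["cnt_star"] for l, r in matched)
    return {
        "sum_a": sql_sum(l["sum"] * r["cnt_star"] for l, r in matched),
        "count_a": 0 if count_acc is None else count_acc,  # ifnull(sum(acc), 0)
        "min_a": sql_min(l["min"] for l, _ in matched),  # multipliers ignored
        "sum_b": sql_sum(r["sum"] * l["cnt_star"] for l, r in matched),
    }


left = [(1, 10), (1, 20), (2, 5)]
right = [(1, 100), (1, 200), (1, 300), (3, 7)]
assert bilateral(left, right) == direct(left, right)
```

Only key 1 matches here, with 2 rows on the left and 3 on the right, so each left partial is scaled by 3 and each right partial by 2. With no matching rows the scaled sum is NULL, which is why count needs the ifnull wrapper.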
Implementation follows the map1/map2/countList framework from the design doc:
- map1: pushDownExprId -> list of cnt(*) multiplier slots
- map2: pushDownExprId -> value expression (partial slot)
- countList: per-scope list of cnt(*) slots produced by leaf aggregates
The map1/map2 state is shared across one rewrite invocation; inner joins
inherit the parent's countList, so 3-way (and deeper) inner joins compose
naturally (each branch's cnt(*) contributes to every other branch's rollup).
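To make the composition across a multi-way inner join concrete, here is a rough Python sketch of how the map1/map2/countList state could produce rollup expressions. It is an illustration under assumptions, not the code in this PR: the branch names, slot names, the cnt_<branch> naming, and the build_rollup helper are all hypothetical, and expressions are modeled as plain strings.

```python
def build_rollup(branches):
    """Build rollup expression strings for a flat set of inner-join branches.

    branches maps a branch name to a list of (expr_id, func, partial_slot).
    """
    # countList: one cnt(*) slot per leaf aggregate (hypothetical naming).
    count_list = [f"cnt_{name}" for name in branches]
    map1, map2, funcs = {}, {}, {}
    for name, aggs in branches.items():
        # Every other branch's cnt(*) acts as a multiplier for this branch.
        others = [c for c in count_list if c != f"cnt_{name}"]
        for expr_id, func, slot in aggs:
            map1[expr_id] = others  # pushDownExprId -> multiplier slots
            map2[expr_id] = slot    # pushDownExprId -> partial value slot
            funcs[expr_id] = func

    rollup = {}
    for expr_id, slot in map2.items():
        func = funcs[expr_id]
        if func in ("min", "max"):
            rollup[expr_id] = f"{func}({slot})"  # multipliers ignored
            continue
        acc = " * ".join([slot] + map1[expr_id])  # acc = value * m1 * m2 * ...
        expr = f"sum({acc})"
        rollup[expr_id] = f"ifnull({expr}, 0)" if func == "count" else expr
    return rollup


three_way = {
    "t1": [(1, "sum", "s1")],
    "t2": [(2, "count", "c2")],
    "t3": [(3, "min", "m3")],
}
rollup = build_rollup(three_way)
```

For this three-branch input, the sum pushed into t1 is multiplied by the cnt(*) slots of t2 and t3, the count pushed into t2 by those of t1 and t3, and the min from t3 is rolled up without any multiplier.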
Release note: None
- Test: Unit Test
- Added EagerAggRewriterTest#testBilateralPushMinMax,
testBilateralPushSumCount, testBilateralPushMultiLevelJoin
covering the three design-document scenarios. All 13 tests pass.
- Behavior changed: No (new rewrite only activates when both sides of an
inner/cross join are valid push targets; falls back to prior behavior otherwise)
- Does this need documentation: No
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
init
init
fix
fix
add session variable force_eager_agg_hint
fix
fix
fix
fix
refactor
fix hint through union
fix
Fix outer join compute jcnt
Fix
make init join order do before push down aggregation
fix produce alias:c1#01 as c1#01
make function name easy to read
fix
fix
Fix
fix
Contributor: Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor (Author): run buildall
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
The current eager aggregation push-down in Nereids mainly covers one-sided pre-aggregation across joins. When aggregate functions reference columns from both sides of an inner join, the optimizer usually can only push aggregation to one side, or give up push-down entirely. As a result, the input size before the join cannot be reduced further, and optimization opportunities for more complex join shapes are limited.
This PR extends PushDownAggregation to support bilateral eager aggregation push-down for eligible inner joins. For aggregates such as sum/count/min/max, the optimizer can build partial aggregates on both join branches and restore join multiplicity during the upper rollup phase using the count(*) information produced by the opposite branch. In this process, sum/count are scaled by branch multiplicity before the final aggregation, while min/max are rolled up directly without multiplier adjustment.
To support this path, this PR also adds the required state propagation logic across join/project/union/filter during bilateral rewrite, and introduces force_eager_agg_hint for testing and debugging. The hint is matched by aggregate-function key, but its effect is applied at the current candidate push-down branch level: if any matched entry in a branch is nopush, push-down is disabled for that branch; otherwise, if any matched entry is push, push-down may be forced for that branch, and the other aggregates in the same branch follow that branch-level decision.
In addition, this PR moves init join order before eager aggregation so that bilateral push-down can work on a more stable join shape, and adds corresponding FE unit tests and query cases.
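The branch-level hint rule described above can be summarized in a few lines of Python. This is only a sketch of the stated precedence (nopush wins over push, and unmatched branches fall back to the optimizer's normal decision). The dictionary representation of the hint, the aggregate-function keys, and the branch_decision helper are assumptions made for illustration; the actual syntax of force_eager_agg_hint is not shown in this PR description.

```python
def branch_decision(hint, branch_agg_keys):
    """Return 'nopush', 'push', or 'default' for one candidate push-down branch.

    hint maps an aggregate-function key to 'push' or 'nopush' (hypothetical
    representation). branch_agg_keys lists the keys of the aggregates that
    would be pushed into this branch.
    """
    matched = [hint[key] for key in branch_agg_keys if key in hint]
    if "nopush" in matched:
        return "nopush"   # any nopush entry disables push-down for the branch
    if "push" in matched:
        return "push"     # push-down may be forced; other aggregates follow
    return "default"      # no matching entry: normal optimizer decision


hint = {"sum(a)": "push", "min(b)": "nopush"}
decision_left = branch_decision(hint, ["sum(a)", "count(a)"])
decision_right = branch_decision(hint, ["min(b)", "sum(b)"])
decision_mixed = branch_decision(hint, ["sum(a)", "min(b)"])
decision_none = branch_decision(hint, ["max(c)"])
```

In the mixed case both a push and a nopush entry match the same branch, and the nopush entry takes precedence.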
Release note
The Nereids optimizer now supports more eager aggregation push-down scenarios. For eligible inner joins, it can pre-aggregate both join branches and provides force_eager_agg_hint for branch-level testing/debug control.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)