[feature](eager-agg) Bilateral push-down for eager aggregation#63690

Open
feiniaofeiafei wants to merge 3 commits into apache:master from feiniaofeiafei:costbasedRm2

@feiniaofeiafei
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
The current eager aggregation push-down in Nereids mainly covers one-sided pre-aggregation across joins. When aggregate functions reference columns from both sides of an inner join, the optimizer can usually push aggregation to only one side, or it gives up push-down entirely. As a result, the join input cannot be reduced any further, and optimization opportunities for more complex join shapes are limited.

This PR extends PushDownAggregation to support bilateral eager aggregation push-down for eligible inner joins. For aggregates such as sum/count/min/max, the optimizer can build partial aggregates on both join branches and restore join multiplicity during the upper rollup phase using the count(*) information produced by the opposite branch. In this process, sum/count are scaled by branch multiplicity before the final aggregation, while min/max are rolled up directly without multiplier adjustment.
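As a rough illustration of the rewrite described above, the following Python sketch (not the Nereids implementation) compares join-then-aggregate with pre-aggregating both branches and restoring multiplicity from the opposite branch's count(*). The table contents, function names, and the query shape (a global aggregate with no GROUP BY, values assumed non-NULL) are made up for the example.

```python
from collections import defaultdict

# Hypothetical inputs: rows are (join_key, value) pairs.
LEFT = [(1, 10), (1, 20), (2, 5)]
RIGHT = [(1, 7), (1, 8), (1, 9), (2, 4), (3, 100)]

def join_then_aggregate(left, right):
    """Baseline: sum(l.a), count(r.b), min(l.a), max(r.b) over an inner join."""
    joined = [(a, b) for lk, a in left for rk, b in right if lk == rk]
    return {
        "sum_a": sum(a for a, _ in joined),
        "count_b": len(joined),          # values are assumed non-NULL
        "min_a": min(a for a, _ in joined),
        "max_b": max(b for _, b in joined),
    }

def pre_aggregate(rows):
    """Group one branch by join key; keep partial aggregates plus count(*)."""
    groups = defaultdict(
        lambda: {"sum": 0, "cnt": 0, "min": None, "max": None, "rows": 0})
    for key, value in rows:
        g = groups[key]
        g["rows"] += 1   # count(*): multiplicity this group contributes
        g["sum"] += value
        g["cnt"] += 1    # equals "rows" here because values are non-NULL
        g["min"] = value if g["min"] is None else min(g["min"], value)
        g["max"] = value if g["max"] is None else max(g["max"], value)
    return groups

def bilateral_eager_aggregate(left, right):
    """Pre-aggregate both branches, join the groups, then roll up."""
    l_groups, r_groups = pre_aggregate(left), pre_aggregate(right)
    out = {"sum_a": 0, "count_b": 0, "min_a": None, "max_b": None}
    for key in l_groups.keys() & r_groups.keys():   # inner join on the key
        lg, rg = l_groups[key], r_groups[key]
        # sum/count are scaled by the opposite branch's count(*).
        out["sum_a"] += lg["sum"] * rg["rows"]
        out["count_b"] += rg["cnt"] * lg["rows"]
        # min/max are rolled up directly, without any multiplier.
        out["min_a"] = lg["min"] if out["min_a"] is None else min(out["min_a"], lg["min"])
        out["max_b"] = rg["max"] if out["max_b"] is None else max(out["max_b"], rg["max"])
    return out
```

With the sample data, both functions return the same result, while the bilateral path joins 2 left groups against 3 right groups instead of 3 rows against 5.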

To support this path, this PR also adds the state propagation logic required across join/project/union/filter nodes during bilateral rewrite, and introduces force_eager_agg_hint for testing and debugging. The hint is matched by aggregate-function key, but it takes effect at the level of the current candidate push-down branch. If any matched entry in a branch is nopush, push-down is disabled for that branch. Otherwise, if any matched entry is push, push-down may be forced for that branch, and the other aggregates in the same branch follow that branch-level decision.
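The branch-level precedence described above can be sketched as follows. This is only an illustration of the stated rule: the hint's real syntax is not shown in this PR description, so the dictionary format, the aggregate keys, and the names `BranchDecision` and `resolve_branch_decision` are all hypothetical, as is the `DEFAULT` outcome for branches with no matching entry.

```python
from enum import Enum

class BranchDecision(Enum):
    NO_PUSH = "nopush"      # push-down disabled for the whole branch
    FORCE_PUSH = "push"     # push-down may be forced for the whole branch
    DEFAULT = "default"     # assumption: no matching entry, optimizer decides as usual

def resolve_branch_decision(branch_agg_keys, hint_entries):
    """Combine per-aggregate hint entries into one decision for a branch.

    branch_agg_keys: aggregate-function keys that are candidates in this branch.
    hint_entries: hypothetical mapping of aggregate key -> "push" or "nopush".
    """
    matched = [hint_entries[k] for k in branch_agg_keys if k in hint_entries]
    if "nopush" in matched:   # any nopush entry wins
        return BranchDecision.NO_PUSH
    if "push" in matched:     # otherwise any push entry applies to the branch
        return BranchDecision.FORCE_PUSH
    return BranchDecision.DEFAULT

# Usage: sum(a) is hinted push, max(b) is hinted nopush, count(c) is unhinted.
hints = {"sum(a)": "push", "max(b)": "nopush"}
print(resolve_branch_decision(["sum(a)", "count(c)"], hints))  # count(c) follows the push decision
print(resolve_branch_decision(["sum(a)", "max(b)"], hints))    # nopush takes precedence
```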

In addition, this PR moves init join order before eager aggregation so that bilateral push-down can work on a more stable join shape, and adds corresponding FE unit tests and query cases.

Release note

The Nereids optimizer now supports more eager aggregation push-down scenarios. For eligible inner joins, it can pre-aggregate both join branches and provides force_eager_agg_hint for branch-level testing/debug control.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Problem Summary: Extend PushDownAggregation so that an aggregate above an
inner/cross join can be pushed down to *both* sides simultaneously. Each side
pre-aggregates its own aggregate functions and emits an extra count(*) that
records the row-multiplicity of the pre-aggregated groups. At the top-level
rollup we then combine each side's partial aggregate with the opposite side's
count(*) as a multiplier, using the unified formula
  acc = value * m1 * m2 * ...
  sum(value)    -> sum(acc)
  count(value)  -> ifnull(sum(acc), 0)
  min/max(value)-> min/max(pushedSlot)    (multipliers ignored)

Implementation follows the map1/map2/countList framework from the design doc:
  - map1: pushDownExprId -> list of cnt(*) multiplier slots
  - map2: pushDownExprId -> value expression (partial slot)
  - countList: per-scope list of cnt(*) slots produced by leaf aggregates

The map1/map2 state is shared across one rewrite invocation; inner joins
inherit the parent's countList, so 3-way (and deeper) inner joins compose
naturally (each branch's cnt(*) contributes to every other branch's rollup).
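To make the 3-way composition concrete, here is a small Python sketch of the map1/map2/countList idea. It is an assumption-laden model rather than the rewriter's actual data structures: slot names such as `sum_a` and `cnt_b`, the string expression ids, and the single shared join key `k` are invented for the example, and map1 is filled in by hand instead of being built up during a plan traversal.

```python
from collections import defaultdict
from math import prod

# Hypothetical tables, all joined on column "k".
A = [{"k": 1, "x": 10}, {"k": 1, "x": 20}, {"k": 2, "x": 5}]
B = [{"k": 1, "y": 3}, {"k": 1, "y": 4}, {"k": 2, "y": 1}]
C = [{"k": 1}, {"k": 1}, {"k": 1}, {"k": 2}, {"k": 2}]

def leaf_aggregate(rows, branch, value_col=None):
    """Pre-aggregate one branch by key; always emit a count(*) slot."""
    out = defaultdict(lambda: defaultdict(int))
    for row in rows:
        slots = out[row["k"]]
        slots[f"cnt_{branch}"] += 1                  # count(*) slot
        if value_col is not None:
            slots[f"sum_{branch}"] += row[value_col]  # partial sum slot
    return out

def rollup_sum(branches, expr_id, map1, map2):
    """sum(acc) where acc = value * m1 * m2 * ... over the joined groups."""
    keys = set.intersection(*(set(b) for b in branches))  # inner join
    total = 0
    for k in keys:
        joined = {}
        for b in branches:
            joined.update(b[k])   # one joined row holding every branch's slots
        total += joined[map2[expr_id]] * prod(joined[m] for m in map1[expr_id])
    return total

def naive_sum(pick):
    """Baseline: join all raw rows first, then sum the picked value."""
    return sum(pick(a, b, c) for a in A for b in B for c in C
               if a["k"] == b["k"] == c["k"])

branches = [leaf_aggregate(A, "a", "x"), leaf_aggregate(B, "b", "y"),
            leaf_aggregate(C, "c")]
count_list = ["cnt_a", "cnt_b", "cnt_c"]                 # countList
map2 = {"sum(a.x)": "sum_a", "sum(b.y)": "sum_b"}        # id -> partial slot
own_cnt = {"sum(a.x)": "cnt_a", "sum(b.y)": "cnt_b"}
# map1: every other branch's count(*) slot acts as a multiplier.
map1 = {e: [c for c in count_list if c != own_cnt[e]] for e in map2}
```

In this model, `rollup_sum(branches, "sum(a.x)", map1, map2)` multiplies each partial sum from A by the count(*) slots of B and C, which is the sense in which each branch's count contributes to every other branch's rollup.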

Release note: None

- Test: Unit Test
    - Added EagerAggRewriterTest#testBilateralPushMinMax,
      testBilateralPushSumCount, testBilateralPushMultiLevelJoin
      covering the three design-document scenarios. All 13 tests pass.
- Behavior changed: No (new rewrite only activates when both sides of an
  inner/cross join are valid push targets; falls back to prior behavior otherwise)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

init

init

fix

fix

add session variable force_eager_agg_hint

fix

fix

fix

fix

refactor

fix hint through union

fix

Fix outer join compute jcnt

Fix

make init join order do before push down aggregation

fix produce alias:c1#01 as c1#01

make function name easy to read

fix

fix

Fix

fix
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@feiniaofeiafei
Contributor Author

run buildall
