Skip to content

fix(mtmv): Support agg state combinator rollup rewrite for window function materialized view#62607

Open
XnY-wei wants to merge 1 commit into
apache:masterfrom
XnY-wei:bugfix_mv
Open

fix(mtmv): Support agg state combinator rollup rewrite for window function materialized view#62607
XnY-wei wants to merge 1 commit into
apache:masterfrom
XnY-wei:bugfix_mv

Conversation

@XnY-wei
Copy link
Copy Markdown
Contributor

@XnY-wei XnY-wei commented Apr 20, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
Support agg state combinator rollup rewrite when materialized view contains
window function with combinator aggregate function.

When MV uses xx_union in a window function, query using xx_merge in the
same window function can now be rewritten to use the MV.

Supported rollup combinations for window aggregate functions:

query (window) materialized view (window) rewrite result
xx_merge(xx_state(col)) OVER (...) xx_union(xx_state(col)) OVER (...) xx_merge(mv_col)

For example, MV definition:

CREATE MATERIALIZED VIEW mv AS SELECT dt, approx_count_distinct_union(approx_count_distinct_state(uuid)) OVER (PARTITION BY 1 ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_uv_state FROM tb_detail GROUP BY dt;
Query that can now be rewritten by MV:

SELECT dt, approx_count_distinct_merge(uv_state) OVER (PARTITION BY 1 ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM ( SELECT dt, approx_count_distinct_union(approx_count_distinct_state(uuid)) AS uv_state FROM tb_detail GROUP BY dt ) t;

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 20, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow morrySnow changed the title fix(mtmv): Support agg state combinator rollup rewrite for window fun… fix(mtmv): Support agg state combinator rollup rewrite for window function materialized view Apr 20, 2026
@morrySnow
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking findings:

  1. fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/AbstractMaterializedViewWindowRule.java: rollupWindowAggregateFunction() now drops the full WindowExpression down to its aggregate function before picking the MV slot. If the MV exposes multiple window columns with the same aggregate function but different PARTITION BY / ORDER BY / frame specs, this can bind the query to the wrong MV column because the window spec is no longer part of the match.
  2. fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/AbstractMaterializedViewWindowRule.java: after selecting an MV window-result slot, visitWindow() keeps the original OVER (...) and returns xx_merge(mv_col) OVER (...). That re-applies a window over a value that is already the MV's window result. This contradicts the PR's intended rewrite shape xx_merge(mv_col) and is wrong for non-idempotent combinators such as sum_union / sum_merge.

Critical checkpoints:

  • Goal of the task: support agg-state combinator rollup rewrite for window-function MVs. Conclusion: not safely achieved yet, because the current implementation can choose the wrong MV window column and can re-window an already-windowed MV result.
  • Is the modification small, clear, and focused: partially. Only one file changes, but the new fallback reuses all aggregate rollup handlers, which broadens behavior beyond the PR's stated merge/union window case and increases correctness risk.
  • Concurrency: not involved. No thread-safety or lock-order issues found.
  • Special lifecycle or static initialization: not involved.
  • Configuration changes: none.
  • Incompatible FE/BE or storage changes: none found.
  • Functionally parallel code paths: yes. This code reuses non-window aggregate rollup handlers, but window rewrites have different semantics and need additional guards that are currently missing.
  • Special conditional checks: the new fallback removes the generic rewrite failure path and adds a looser rollup path, but the safety conditions around window-spec matching are insufficient.
  • Test coverage: insufficient. No FE unit test or regression test was added for the intended supported case, for ambiguous multiple-window outputs, or for non-idempotent combinators.
  • Test results modified: none.
  • Observability: no new logging/metrics appear necessary here.
  • Transaction, persistence, data-write, and FE-BE variable passing concerns: not applicable for this change.
  • Performance: no obvious blocker; correctness is the main issue.

Overall opinion: request changes.

if (viewExpression instanceof WindowExpression) {
viewExpression = ((WindowExpression) viewExpression).getFunction();
}
Pair<Expression, Expression> mvExprToMvScanExprQueryBasedPair = Pair.of(viewExpression,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: this strips the candidate MV output from a full WindowExpression down to just its aggregate function before choosing expressionEntry.getValue(). If the MV has two window outputs like xx_union(...) OVER (PARTITION BY a ORDER BY dt) and xx_union(...) OVER (PARTITION BY b ORDER BY dt), both candidates collapse to the same xx_union(...) here, so the rewriter can bind the query to the wrong MV column. The window spec needs to stay part of the match, or you need an explicit equality check for partition keys, order keys, and frame before using the candidate slot.

}
return super.visitWindow(windowExpression.withFunction(rewrittenFunction), context);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: rewrittenFunction is built on top of expressionEntry.getValue(), which in this new path is the MV's window-result slot. Wrapping it back into windowExpression.withFunction(rewrittenFunction) therefore re-applies the same OVER (...) to an already-windowed value. That only happens to be harmless for idempotent set-style states such as HLL; for non-idempotent combinators like sum_union / sum_merge it changes the result. The rewrite described in the PR body should collapse to the scalar xx_merge(mv_col) once you've selected the MV window column, not xx_merge(mv_col) OVER (...).

@seawinde
Copy link
Copy Markdown
Member

run buildall

materializationContext.getShuttledExprToScanExprMapping(),
viewToQuerySlotMapping));
return null;
expressionsRewritten = rollupWindowAggregateFunctions(queryStructInfo.getExpressions(),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this fallback actually be reached for the documented case? Before rewriteQueryByView(), the MV rewrite flow calls StructInfo.isGraphLogicalEquals(), and window nodes appear to be compared as full
WindowExpression objects. A query window using xx_merge(...) and an MV window using xx_union(...) may be rejected before this fallback runs. Please add a regression test proving the PR example is
rewritten, or handle this equivalence during graph comparison.

context.setValid(false);
return windowExpression;
}
return super.visitWindow(windowExpression.withFunction(rewrittenFunction), context);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to keep the expression as a WindowExpression after rollup. The PR description says the rewrite result should be xx_merge(mv_col), but this code builds xx_merge(mv_col) OVER (...). That can
compute a new window over the already-windowed MV column instead of using the precomputed MV result directly, which may double-aggregate state values. Should this return the rolled-up scalar function
instead of windowExpression.withFunction(...)?

for (Map.Entry<Expression, Expression> expressionEntry : mvExprToMvScanExprQueryBased.entrySet()) {
Expression viewExpression = expressionEntry.getKey();
// Window mapping keys may be full WindowExpression while rollup handlers match aggregate functions.
if (viewExpression instanceof WindowExpression) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping the WindowExpression wrapper here removes the partition/order/frame information from the rollup check. The aggregate rollup handlers only validate function compatibility, so a view expression
with the same aggregate function but a different window spec could be selected incorrectly if graph comparison is relaxed for this feature. Please explicitly verify the query and MV window specs match
before using the MV window column.

@seawinde
Copy link
Copy Markdown
Member

This optimizer change needs regression coverage. At minimum, please add one positive case for the documented xx_merge(...) OVER (...) query rewritten by an MV containing xx_union(...) OVER (...), and
negative cases for mismatched PARTITION BY / ORDER BY / frame and for expressions containing slots not available from the MV scan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants