fix(mtmv): Support agg state combinator rollup rewrite for window function materialized view#62607
fix(mtmv): Support agg state combinator rollup rewrite for window function materialized view#62607XnY-wei wants to merge 1 commit into
Conversation
…ction materialized view
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Blocking findings:
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/AbstractMaterializedViewWindowRule.java:rollupWindowAggregateFunction()now drops the fullWindowExpressiondown to its aggregate function before picking the MV slot. If the MV exposes multiple window columns with the same aggregate function but differentPARTITION BY/ORDER BY/ frame specs, this can bind the query to the wrong MV column because the window spec is no longer part of the match.fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/AbstractMaterializedViewWindowRule.java: after selecting an MV window-result slot,visitWindow()keeps the originalOVER (...)and returnsxx_merge(mv_col) OVER (...). That re-applies a window over a value that is already the MV's window result. This contradicts the PR's intended rewrite shapexx_merge(mv_col)and is wrong for non-idempotent combinators such assum_union/sum_merge.
Critical checkpoints:
- Goal of the task: support agg-state combinator rollup rewrite for window-function MVs. Conclusion: not safely achieved yet, because the current implementation can choose the wrong MV window column and can re-window an already-windowed MV result.
- Is the modification small, clear, and focused: partially. Only one file changes, but the new fallback reuses all aggregate rollup handlers, which broadens behavior beyond the PR's stated merge/union window case and increases correctness risk.
- Concurrency: not involved. No thread-safety or lock-order issues found.
- Special lifecycle or static initialization: not involved.
- Configuration changes: none.
- Incompatible FE/BE or storage changes: none found.
- Functionally parallel code paths: yes. This code reuses non-window aggregate rollup handlers, but window rewrites have different semantics and need additional guards that are currently missing.
- Special conditional checks: the new fallback removes the generic rewrite failure path and adds a looser rollup path, but the safety conditions around window-spec matching are insufficient.
- Test coverage: insufficient. No FE unit test or regression test was added for the intended supported case, for ambiguous multiple-window outputs, or for non-idempotent combinators.
- Test results modified: none.
- Observability: no new logging/metrics appear necessary here.
- Transaction, persistence, data-write, and FE-BE variable passing concerns: not applicable for this change.
- Performance: no obvious blocker; correctness is the main issue.
Overall opinion: request changes.
| if (viewExpression instanceof WindowExpression) { | ||
| viewExpression = ((WindowExpression) viewExpression).getFunction(); | ||
| } | ||
| Pair<Expression, Expression> mvExprToMvScanExprQueryBasedPair = Pair.of(viewExpression, |
There was a problem hiding this comment.
Blocking: this strips the candidate MV output from a full WindowExpression down to just its aggregate function before choosing expressionEntry.getValue(). If the MV has two window outputs like xx_union(...) OVER (PARTITION BY a ORDER BY dt) and xx_union(...) OVER (PARTITION BY b ORDER BY dt), both candidates collapse to the same xx_union(...) here, so the rewriter can bind the query to the wrong MV column. The window spec needs to stay part of the match, or you need an explicit equality check for partition keys, order keys, and frame before using the candidate slot.
| } | ||
| return super.visitWindow(windowExpression.withFunction(rewrittenFunction), context); | ||
| } | ||
|
|
There was a problem hiding this comment.
Blocking: rewrittenFunction is built on top of expressionEntry.getValue(), which in this new path is the MV's window-result slot. Wrapping it back into windowExpression.withFunction(rewrittenFunction) therefore re-applies the same OVER (...) to an already-windowed value. That only happens to be harmless for idempotent set-style states such as HLL; for non-idempotent combinators like sum_union / sum_merge it changes the result. The rewrite described in the PR body should collapse to the scalar xx_merge(mv_col) once you've selected the MV window column, not xx_merge(mv_col) OVER (...).
|
run buildall |
| materializationContext.getShuttledExprToScanExprMapping(), | ||
| viewToQuerySlotMapping)); | ||
| return null; | ||
| expressionsRewritten = rollupWindowAggregateFunctions(queryStructInfo.getExpressions(), |
There was a problem hiding this comment.
Can this fallback actually be reached for the documented case? Before rewriteQueryByView(), the MV rewrite flow calls StructInfo.isGraphLogicalEquals(), and window nodes appear to be compared as full
WindowExpression objects. A query window using xx_merge(...) and an MV window using xx_union(...) may be rejected before this fallback runs. Please add a regression test proving the PR example is
rewritten, or handle this equivalence during graph comparison.
| context.setValid(false); | ||
| return windowExpression; | ||
| } | ||
| return super.visitWindow(windowExpression.withFunction(rewrittenFunction), context); |
There was a problem hiding this comment.
This seems to keep the expression as a WindowExpression after rollup. The PR description says the rewrite result should be xx_merge(mv_col), but this code builds xx_merge(mv_col) OVER (...). That can
compute a new window over the already-windowed MV column instead of using the precomputed MV result directly, which may double-aggregate state values. Should this return the rolled-up scalar function
instead of windowExpression.withFunction(...)?
| for (Map.Entry<Expression, Expression> expressionEntry : mvExprToMvScanExprQueryBased.entrySet()) { | ||
| Expression viewExpression = expressionEntry.getKey(); | ||
| // Window mapping keys may be full WindowExpression while rollup handlers match aggregate functions. | ||
| if (viewExpression instanceof WindowExpression) { |
There was a problem hiding this comment.
Dropping the WindowExpression wrapper here removes the partition/order/frame information from the rollup check. The aggregate rollup handlers only validate function compatibility, so a view expression
with the same aggregate function but a different window spec could be selected incorrectly if graph comparison is relaxed for this feature. Please explicitly verify the query and MV window specs match
before using the MV window column.
|
This optimizer change needs regression coverage. At minimum, please add one positive case for the documented xx_merge(...) OVER (...) query rewritten by an MV containing xx_union(...) OVER (...), and |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Support agg state combinator rollup rewrite when materialized view contains
window function with combinator aggregate function.
When MV uses
xx_unionin a window function, query usingxx_mergein thesame window function can now be rewritten to use the MV.
Supported rollup combinations for window aggregate functions:
xx_merge(xx_state(col)) OVER (...)xx_union(xx_state(col)) OVER (...)xx_merge(mv_col)For example, MV definition:
CREATE MATERIALIZED VIEW mv AS SELECT dt, approx_count_distinct_union(approx_count_distinct_state(uuid)) OVER (PARTITION BY 1 ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_uv_state FROM tb_detail GROUP BY dt;Query that can now be rewritten by MV:
SELECT dt, approx_count_distinct_merge(uv_state) OVER (PARTITION BY 1 ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM ( SELECT dt, approx_count_distinct_union(approx_count_distinct_state(uuid)) AS uv_state FROM tb_detail GROUP BY dt ) t;Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)