[opt](Nereids) strip redundant widening integer cast in SumLiteralRewrite#61220
[opt](Nereids) strip redundant widening integer cast in SumLiteralRewrite#61220
Conversation
…rite SumLiteralRewrite transforms SUM(expr +/- literal) into SUM(expr) +/- literal * COUNT(expr). When type coercion has introduced an implicit widening cast (e.g. CAST(smallint_col AS INT)), the rewritten SUM/COUNT still operates on the wider type, forcing unnecessary wider data reads. This is redundant because SUM always returns BIGINT for any integer input (TINYINT/SMALLINT/INT/BIGINT). Strip implicit widening integer casts in extractSumLiteral() so the aggregate operates on the original narrow column directly. This benefits ClickBench Q29-style queries where SUM(col), SUM(col+1), SUM(col+2) share a narrow integer column — after stripping the cast, SUM(col+1) and SUM(col+2) reuse the existing SUM(col).
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Code Review Summary
This PR adds a cast-stripping optimization to SumLiteralRewrite so that implicit widening integer casts (introduced by type coercion) are removed before constructing the SumInfo. This enables better aggregate sharing for patterns like ClickBench Q29 where SUM(col), SUM(col+1), SUM(col+2) can share a single base SUM(col) + COUNT(col) instead of being treated as separate aggregates due to the intervening cast.
Critical Checkpoint Conclusions
-
Goal / Correctness: The optimization is mathematically correct. SUM returns BIGINT for all integer-like inputs (TINYINT/SMALLINT/INT/BIGINT), so stripping a widening cast does not change the SUM result. COUNT is unaffected by the input type. Nullable semantics are preserved since widening integer casts do not introduce nullability. The
isExplicitType()guard correctly preserves user-written casts. -
Modification scope: The change is small, focused, and well-scoped — 32 lines of production code (one new method + one call site) and 60 lines of tests.
-
Concurrency: Not applicable — this is a single-threaded rewrite rule.
-
Lifecycle / static init: Not applicable.
-
Configuration items: None added.
-
Incompatible changes: None — this is a pure optimizer optimization, no storage/protocol changes.
-
Parallel code paths: The cast stripping is only applied to the
leftoperand ofSUM(expr +/- literal). There are no other functionally parallel paths inSumLiteralRewritethat need the same treatment. -
Test coverage: Two new unit tests cover: (1) implicit cast is stripped and enables sharing, (2) explicit cast is preserved, (3) stripping enables reuse of an existing
SUM(slot). Tests are well-structured. One minor suggestion below. -
Performance: This is a pure optimization — fewer aggregations in the plan.
Minor Suggestion
One inline comment below regarding a defensive widening-direction check. Not a blocker — in the current call context, only widening casts from binary arithmetic type coercion can reach this code path. But the method name promises "widening" while the implementation doesn't verify direction.
Overall: Looks good. Clean, correct, well-tested optimization.
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/SumLiteralRewrite.java
Show resolved
Hide resolved
|
run buildall |
TPC-H: Total hot run time: 27380 ms |
TPC-DS: Total hot run time: 153430 ms |
What problem does this PR solve?
SumLiteralRewrite transforms SUM(expr +/- literal) into SUM(expr) +/- literal * COUNT(expr). When type coercion has introduced an implicit widening cast (e.g. CAST(smallint_col AS INT)), the rewritten SUM/COUNT still operates on the wider type, forcing unnecessary wider data reads.
This is redundant because SUM always returns BIGINT for any integer input (TINYINT/SMALLINT/INT/BIGINT). Strip implicit widening integer casts in extractSumLiteral() so the aggregate operates on the original narrow column directly.
This benefits ClickBench Q29-style queries where SUM(col), SUM(col+1), SUM(col+2) share a narrow integer column — after stripping the cast, SUM(col+1) and SUM(col+2) reuse the existing SUM(col).
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)