[opt](Nereids) strip redundant widening integer cast in SumLiteralRewrite #61220

Closed
englefly wants to merge 1 commit into master from sum-literal-cast

@englefly
Contributor

What problem does this PR solve?

SumLiteralRewrite transforms SUM(expr +/- literal) into SUM(expr) +/- literal * COUNT(expr). When type coercion has introduced an implicit widening cast (e.g. CAST(smallint_col AS INT)), the rewritten SUM/COUNT still operates on the wider type, forcing unnecessary wider data reads.

The cast is redundant because SUM returns BIGINT for any integer input (TINYINT/SMALLINT/INT/BIGINT). This PR strips implicit widening integer casts in extractSumLiteral() so the aggregate operates on the original narrow column directly.

This benefits ClickBench Q29-style queries where SUM(col), SUM(col+1), SUM(col+2) share a narrow integer column — after stripping the cast, SUM(col+1) and SUM(col+2) reuse the existing SUM(col).
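
The rewrite rests on the identity SUM(x + c) = SUM(x) + c * COUNT(x), where both aggregates skip NULL rows. Below is a minimal Java sketch of that arithmetic only, not the Doris code: the class and method names are hypothetical, and a nullable `Short[]` stands in for a SMALLINT column. Accumulating into a `long` mirrors the BIGINT result type, which is why widening the inputs first changes nothing.

```java
import java.util.Arrays;
import java.util.Objects;

// Illustrative only: hypothetical names, not Doris or Nereids classes.
final class SumLiteralIdentity {

    // SUM(col): skips NULLs and accumulates in a 64-bit long, mirroring a BIGINT result.
    // Returns null when there is no non-NULL input, as SQL SUM does.
    static Long sum(Short[] col) {
        long total = 0;
        boolean seen = false;
        for (Short v : col) {
            if (v != null) {
                total += v;
                seen = true;
            }
        }
        return seen ? Long.valueOf(total) : null;
    }

    // COUNT(col): number of non-NULL values.
    static long count(Short[] col) {
        return Arrays.stream(col).filter(Objects::nonNull).count();
    }

    // Direct evaluation of SUM(col + literal), one pass per literal.
    static Long sumPlusLiteralDirect(Short[] col, long literal) {
        long total = 0;
        boolean seen = false;
        for (Short v : col) {
            if (v != null) {
                total += v + literal;
                seen = true;
            }
        }
        return seen ? Long.valueOf(total) : null;
    }

    // Rewritten form: SUM(col) + literal * COUNT(col), reusing one shared SUM and COUNT.
    static Long sumPlusLiteralRewritten(Long sharedSum, long sharedCount, long literal) {
        return sharedSum == null ? null : Long.valueOf(sharedSum + literal * sharedCount);
    }

    public static void main(String[] args) {
        Short[] col = {10, null, 20, 30};
        Long sharedSum = sum(col);      // computed once
        long sharedCount = count(col);  // computed once
        for (long literal = 1; literal <= 2; literal++) {
            System.out.println("literal=" + literal
                    + " direct=" + sumPlusLiteralDirect(col, literal)
                    + " rewritten=" + sumPlusLiteralRewritten(sharedSum, sharedCount, literal));
        }
    }
}
```

In this sketch, SUM(col+1) and SUM(col+2) are both derived from the same `sharedSum` and `sharedCount`, which is the sharing the description refers to.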

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Contributor Author

run buildall

@englefly
Contributor Author

/review

Contributor

@github-actions bot left a comment

Code Review Summary

This PR adds a cast-stripping optimization to SumLiteralRewrite so that implicit widening integer casts (introduced by type coercion) are removed before constructing the SumInfo. This enables better aggregate sharing for patterns like ClickBench Q29 where SUM(col), SUM(col+1), SUM(col+2) can share a single base SUM(col) + COUNT(col) instead of being treated as separate aggregates due to the intervening cast.
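
Why an intervening cast blocks sharing can be sketched with a toy expression model. This is an assumption-laden illustration, not the Nereids implementation: `Slot`, `Cast`, `stripImplicitCast`, and `distinctSumArgs` are hypothetical names, equality is plain structural equality, and the integer-type check is omitted for brevity.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Illustrative only: a toy expression model with hypothetical names.
final class AggregateSharingSketch {

    interface Expr {}

    // A column reference, e.g. a SMALLINT column.
    static final class Slot implements Expr {
        final String name;
        Slot(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Slot && ((Slot) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // CAST(child AS targetType); explicit == true means the user wrote the cast.
    static final class Cast implements Expr {
        final Expr child;
        final String targetType;
        final boolean explicit;
        Cast(Expr child, String targetType, boolean explicit) {
            this.child = child;
            this.targetType = targetType;
            this.explicit = explicit;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Cast)) return false;
            Cast c = (Cast) o;
            return c.child.equals(child) && c.targetType.equals(targetType) && c.explicit == explicit;
        }
        @Override public int hashCode() { return Objects.hash(child, targetType, explicit); }
    }

    // Drop a cast only when coercion inserted it; user-written casts are kept.
    static Expr stripImplicitCast(Expr e) {
        if (e instanceof Cast && !((Cast) e).explicit) {
            return ((Cast) e).child;
        }
        return e;
    }

    // Distinct SUM arguments the plan has to aggregate for the given operands.
    static Set<Expr> distinctSumArgs(List<Expr> operands, boolean strip) {
        Set<Expr> args = new LinkedHashSet<>();
        for (Expr e : operands) {
            args.add(strip ? stripImplicitCast(e) : e);
        }
        return args;
    }

    public static void main(String[] args) {
        Expr col = new Slot("col");
        Expr widened = new Cast(col, "INT", false); // inserted by coercion for col + 1, col + 2
        List<Expr> operands = Arrays.asList(col, widened, widened);
        System.out.println("without stripping: " + distinctSumArgs(operands, false).size());
        System.out.println("with stripping:    " + distinctSumArgs(operands, true).size());
    }
}
```

In this model the operands of SUM(col), SUM(col+1), and SUM(col+2) collapse to a single aggregate argument once the implicit cast is removed, while an explicit cast would remain a separate argument.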

Critical Checkpoint Conclusions

  • Goal / Correctness: The optimization is mathematically correct. SUM returns BIGINT for all integer-like inputs (TINYINT/SMALLINT/INT/BIGINT), so stripping a widening cast does not change the SUM result. COUNT is unaffected by the input type. Nullable semantics are preserved since widening integer casts do not introduce nullability. The isExplicitType() guard correctly preserves user-written casts.

  • Modification scope: The change is small, focused, and well-scoped — 32 lines of production code (one new method + one call site) and 60 lines of tests.

  • Concurrency: Not applicable — this is a single-threaded rewrite rule.

  • Lifecycle / static init: Not applicable.

  • Configuration items: None added.

  • Incompatible changes: None — this is a pure optimizer optimization, no storage/protocol changes.

  • Parallel code paths: The cast stripping is only applied to the left operand of SUM(expr +/- literal). There are no other functionally parallel paths in SumLiteralRewrite that need the same treatment.

  • Test coverage: Two new unit tests cover three cases: (1) an implicit cast is stripped and enables sharing, (2) an explicit cast is preserved, (3) stripping enables reuse of an existing SUM(slot). Tests are well-structured. One minor suggestion below.

  • Performance: This is a pure optimization — fewer aggregations in the plan.

Minor Suggestion

There is one inline comment below about a defensive widening-direction check. It is not a blocker: in the current call context, only widening casts from binary arithmetic type coercion can reach this code path. However, the method name promises "widening" while the implementation does not verify the direction.
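
A defensive direction check of the kind suggested here could look roughly like the following. This is a sketch under stated assumptions, not the PR's code: the `IntType` enum and `shouldStrip` method are hypothetical, and the byte widths are the usual ones for these SQL integer types. The guard matters because a narrowing cast can change values, so removing it would not be result-preserving.

```java
// Illustrative only: hypothetical enum and method names, not Doris type classes.
final class WideningCheckSketch {

    enum IntType {
        TINYINT(1), SMALLINT(2), INT(4), BIGINT(8);

        final int bytes;

        IntType(int bytes) {
            this.bytes = bytes;
        }
    }

    // True only when the cast goes from a narrower to a strictly wider integer type.
    static boolean isWidening(IntType from, IntType to) {
        return to.bytes > from.bytes;
    }

    // A cast is safe to strip only if coercion inserted it AND it widens.
    static boolean shouldStrip(IntType source, IntType castTarget, boolean explicitCast) {
        return !explicitCast && isWidening(source, castTarget);
    }

    public static void main(String[] args) {
        System.out.println(shouldStrip(IntType.SMALLINT, IntType.INT, false)); // implicit widening
        System.out.println(shouldStrip(IntType.INT, IntType.SMALLINT, false)); // implicit narrowing
        System.out.println(shouldStrip(IntType.SMALLINT, IntType.INT, true));  // user-written cast
    }
}
```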

Overall: Looks good. Clean, correct, well-tested optimization.

@englefly
Contributor Author

run buildall

@englefly closed this Mar 11, 2026

@doris-robot

TPC-H: Total hot run time: 27380 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8619e768ee7f3ba83057db7b134165654e3136d3, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17697	4483	4317	4317
q2
q3	10657	770	512	512
q4	4679	358	251	251
q5	7562	1186	1017	1017
q6	173	174	150	150
q7	793	832	680	680
q8	9288	1466	1301	1301
q9	4864	4592	4687	4592
q10	6329	1895	1628	1628
q11	473	272	239	239
q12	767	569	475	475
q13	18052	2972	2199	2199
q14	230	231	213	213
q15	934	791	810	791
q16	768	727	675	675
q17	710	856	413	413
q18	6243	5397	5155	5155
q19	1156	969	602	602
q20	488	492	386	386
q21	4519	2065	1506	1506
q22	350	346	278	278
Total cold run time: 96732 ms
Total hot run time: 27380 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4584	4619	4549	4549
q2
q3	3975	4404	3881	3881
q4	887	1220	793	793
q5	4034	4380	4285	4285
q6	178	174	141	141
q7	1769	1595	1526	1526
q8	2496	2702	2527	2527
q9	7660	7405	7181	7181
q10	3769	3965	3663	3663
q11	539	449	421	421
q12	511	600	447	447
q13	2762	3246	2365	2365
q14	277	287	282	282
q15	846	817	822	817
q16	751	831	735	735
q17	1157	1368	1452	1368
q18	7211	6839	6607	6607
q19	836	848	881	848
q20	2059	2139	2025	2025
q21	3960	3553	3370	3370
q22	444	444	384	384
Total cold run time: 50705 ms
Total hot run time: 48215 ms

@doris-robot

TPC-DS: Total hot run time: 153430 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8619e768ee7f3ba83057db7b134165654e3136d3, data reload: false

query5	4320	657	541	541
query6	362	231	222	222
query7	4355	487	282	282
query8	359	252	237	237
query9	9656	2827	2854	2827
query10	548	387	346	346
query11	7427	5914	5566	5566
query12	180	128	126	126
query13	1266	461	362	362
query14	5680	3838	3624	3624
query14_1	2796	2838	2804	2804
query15	207	200	177	177
query16	993	474	445	445
query17	1107	727	619	619
query18	2448	453	347	347
query19	217	210	181	181
query20	134	134	129	129
query21	231	150	126	126
query22	4916	4989	4803	4803
query23	16062	15603	15439	15439
query23_1	15897	15965	15993	15965
query24	7523	1854	1283	1283
query24_1	1279	1279	1317	1279
query25	637	582	493	493
query26	1282	288	161	161
query27	3348	637	333	333
query28	5438	2005	1961	1961
query29	931	633	513	513
query30	323	272	223	223
query31	1438	1427	1275	1275
query32	78	84	69	69
query33	513	339	320	320
query34	952	968	595	595
query35	688	713	617	617
query36	1121	1243	1010	1010
query37	133	103	81	81
query38	2936	2928	2928	2928
query39	895	870	847	847
query39_1	822	830	825	825
query40	231	151	138	138
query41	66	60	58	58
query42	303	310	306	306
query43	251	251	222	222
query44	
query45	202	190	179	179
query46	877	988	601	601
query47	2108	2117	2055	2055
query48	323	329	227	227
query49	628	465	403	403
query50	677	283	210	210
query51	4092	4081	4063	4063
query52	292	291	285	285
query53	287	347	293	293
query54	310	271	293	271
query55	97	96	81	81
query56	321	325	322	322
query57	1377	1342	1269	1269
query58	290	276	271	271
query59	1326	1456	1244	1244
query60	335	334	328	328
query61	148	140	154	140
query62	645	589	536	536
query63	316	274	277	274
query64	5016	1270	976	976
query65	
query66	1454	449	349	349
query67	16367	16330	16305	16305
query68	
query69	395	312	288	288
query70	992	980	951	951
query71	344	315	314	314
query72	2857	2857	2426	2426
query73	533	551	314	314
query74	10072	9995	9761	9761
query75	2833	2753	2457	2457
query76	2288	1037	684	684
query77	357	383	315	315
query78	11124	11351	10668	10668
query79	1877	766	602	602
query80	1429	642	550	550
query81	550	273	243	243
query82	1009	152	120	120
query83	332	267	241	241
query84	247	117	105	105
query85	913	475	449	449
query86	421	336	303	303
query87	3199	3114	3000	3000
query88	3505	2637	2615	2615
query89	431	371	344	344
query90	2024	177	171	171
query91	168	166	139	139
query92	83	80	71	71
query93	1019	827	488	488
query94	635	282	311	282
query95	579	402	315	315
query96	641	524	227	227
query97	2498	2464	2395	2395
query98	228	219	215	215
query99	1000	965	928	928
Total cold run time: 238130 ms
Total hot run time: 153430 ms

3 participants