[fix](nereids) EliminateGroupByConstant should replace agg's group by after removing constant group by keys #49473


Merged

Conversation

feiniaofeiafei
Contributor

@feiniaofeiafei feiniaofeiafei commented Mar 25, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

 SELECT
    IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    ) AS x0,
    TIMESTAMPDIFF(
        YEAR,
        NOW(),
        NOW()
    ) AS x1
FROM
t1 AS t
GROUP BY
    x0,
    x1;

After EliminateGroupByConstant, this SQL is rewritten to:

 SELECT
    IF(
        t.`gender` IN ('女'),
        0,
        1
    ) AS x0,
    0 AS x1
FROM
t1 AS t
GROUP BY
     IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    ) ;

The SELECT expression and the GROUP BY expression now differ, so normalizeAgg reports an error.
This PR uses the foldmap to rewrite the GROUP BY expressions as well. After the change, the SQL produced by EliminateGroupByConstant becomes:

 SELECT
    IF(
        t.`gender` IN ('女'),
        0,
        1
    ) AS x0,
    0 AS x1
FROM
t1 AS t
GROUP BY
     IF(
        t.`gender` IN ('女'),
        0,
        1
    ) ;

The SELECT expression and the GROUP BY expression are now identical.
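
The rewrite described above can be sketched with a toy expression model. This is only an illustration in Python, not the Doris (Java) implementation: the `Col` and `Call` classes, the `fold` helper, and the `rewrite_keys` flag are all hypothetical names introduced here, and the only folding rule modeled is `TIMESTAMPDIFF(YEAR, NOW(), NOW())` becoming `0`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    args: tuple


NOW = Call("NOW", ())
TSDIFF = Call("TIMESTAMPDIFF", ("YEAR", NOW, NOW))


def fold(expr):
    """Toy constant folding: TIMESTAMPDIFF(unit, NOW(), NOW()) becomes 0."""
    if not isinstance(expr, Call):
        return expr
    args = tuple(fold(a) for a in expr.args)
    if expr.func == "TIMESTAMPDIFF" and args[1] == NOW and args[2] == NOW:
        return 0
    return Call(expr.func, args)


def eliminate_group_by_constant(outputs, group_by, rewrite_keys=True):
    """Fold the select list and drop group-by keys that fold to a constant.

    rewrite_keys=False models the old behavior (surviving keys keep their
    unfolded form); rewrite_keys=True models the behavior this PR describes.
    """
    new_outputs = [fold(e) for e in outputs]
    new_keys = []
    for key in group_by:
        folded = fold(key)
        if isinstance(folded, int):
            continue  # the whole key is constant, so it is removed
        new_keys.append(folded if rewrite_keys else key)
    return new_outputs, new_keys


# x0 and x1 from the query in the PR description
X0 = Call("IF", (Call("IN", (Col("gender"), "女")), TSDIFF, 1))
X1 = TSDIFF

old_outputs, old_keys = eliminate_group_by_constant([X0, X1], [X0, X1], rewrite_keys=False)
new_outputs, new_keys = eliminate_group_by_constant([X0, X1], [X0, X1])
```

In this model, `old_outputs[0]` and `old_keys[0]` are different expressions (the mismatch that normalizeAgg rejects), while `new_outputs[0]` and `new_keys[0]` are equal.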

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@feiniaofeiafei feiniaofeiafei force-pushed the fix_eliminate_group_by_constant branch from 34c8909 to f1d2701 Compare March 25, 2025 10:06
@feiniaofeiafei
Contributor Author

run buildall

@englefly
Contributor

any test case?

@doris-robot

TPC-H: Total hot run time: 34175 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f1d270169a67c7bd31c6e2c46e23e518894294c2, data reload: false

------ Round 1 ----------------------------------
q1	25855	5064	5067	5064
q2	2067	327	177	177
q3	10366	1319	686	686
q4	10222	1000	527	527
q5	7541	2377	2415	2377
q6	195	166	134	134
q7	924	736	618	618
q8	9343	1292	1121	1121
q9	6983	5149	5227	5149
q10	6794	2316	1889	1889
q11	469	275	262	262
q12	345	363	221	221
q13	17775	3659	3056	3056
q14	227	224	199	199
q15	552	497	475	475
q16	619	620	582	582
q17	578	865	347	347
q18	7522	7175	7266	7175
q19	1220	964	562	562
q20	332	320	190	190
q21	3963	3313	2397	2397
q22	1092	1021	967	967
Total cold run time: 114984 ms
Total hot run time: 34175 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5102	5106	5131	5106
q2	242	330	229	229
q3	2102	2657	2304	2304
q4	1409	1797	1376	1376
q5	4514	4514	4462	4462
q6	213	164	127	127
q7	1996	1957	1795	1795
q8	2584	2633	2502	2502
q9	7326	7329	7239	7239
q10	2950	3155	2748	2748
q11	577	494	502	494
q12	696	789	624	624
q13	3447	3881	3253	3253
q14	282	293	267	267
q15	530	480	464	464
q16	676	675	653	653
q17	1163	1619	1357	1357
q18	7574	7611	7474	7474
q19	813	797	874	797
q20	1968	2004	1807	1807
q21	5205	4617	4667	4617
q22	1083	1028	1008	1008
Total cold run time: 52452 ms
Total hot run time: 50703 ms

@doris-robot

TPC-DS: Total hot run time: 186302 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f1d270169a67c7bd31c6e2c46e23e518894294c2, data reload: false

query1	1018	463	465	463
query2	6544	1911	1924	1911
query3	6800	227	226	226
query4	26548	23936	23338	23338
query5	4378	670	465	465
query6	303	192	184	184
query7	4612	490	265	265
query8	294	229	218	218
query9	8610	2577	2585	2577
query10	465	315	268	268
query11	15411	15162	14863	14863
query12	155	108	102	102
query13	1633	519	374	374
query14	8613	6072	6056	6056
query15	212	185	168	168
query16	7130	638	455	455
query17	917	696	537	537
query18	1962	421	290	290
query19	181	180	151	151
query20	120	112	119	112
query21	216	126	102	102
query22	4528	4286	4158	4158
query23	33806	32976	33075	32976
query24	8475	2334	2420	2334
query25	535	444	394	394
query26	1238	266	142	142
query27	2757	495	322	322
query28	4365	2392	2378	2378
query29	781	598	445	445
query30	289	218	197	197
query31	913	885	756	756
query32	77	75	66	66
query33	574	374	365	365
query34	773	842	500	500
query35	781	818	743	743
query36	937	957	871	871
query37	120	106	80	80
query38	4172	4025	4207	4025
query39	1457	1385	1447	1385
query40	203	116	108	108
query41	55	54	52	52
query42	120	107	101	101
query43	510	511	481	481
query44	1292	801	784	784
query45	175	170	166	166
query46	850	1013	615	615
query47	1750	1871	1764	1764
query48	378	403	297	297
query49	780	508	425	425
query50	679	726	417	417
query51	4223	4147	4114	4114
query52	113	104	103	103
query53	220	249	176	176
query54	479	489	421	421
query55	84	82	79	79
query56	278	274	271	271
query57	1155	1175	1113	1113
query58	246	242	235	235
query59	2591	2768	2765	2765
query60	277	269	272	269
query61	126	122	127	122
query62	776	740	678	678
query63	228	185	185	185
query64	4464	1005	682	682
query65	4320	4234	4251	4234
query66	1177	435	313	313
query67	15711	15661	15408	15408
query68	7938	879	511	511
query69	480	314	274	274
query70	1205	1128	1071	1071
query71	389	319	273	273
query72	5755	4817	4599	4599
query73	647	555	344	344
query74	8963	9014	8915	8915
query75	3252	3230	2739	2739
query76	3164	1188	756	756
query77	487	361	282	282
query78	10051	10283	9342	9342
query79	2270	813	572	572
query80	607	510	442	442
query81	578	257	217	217
query82	480	126	101	101
query83	178	172	157	157
query84	286	96	74	74
query85	773	349	305	305
query86	378	316	292	292
query87	4535	4721	4410	4410
query88	3957	2252	2266	2252
query89	382	309	279	279
query90	1897	214	206	206
query91	147	146	111	111
query92	77	61	59	59
query93	1849	1053	580	580
query94	711	413	311	311
query95	356	272	266	266
query96	542	563	288	288
query97	3129	3231	3119	3119
query98	222	211	198	198
query99	1638	1430	1317	1317
Total cold run time: 273283 ms
Total hot run time: 186302 ms

@doris-robot

ClickBench: Total hot run time: 31.79 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f1d270169a67c7bd31c6e2c46e23e518894294c2, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.11	0.10
query3	0.25	0.19	0.19
query4	1.60	0.19	0.19
query5	0.57	0.54	0.54
query6	1.19	0.72	0.73
query7	0.03	0.01	0.01
query8	0.04	0.04	0.04
query9	0.58	0.54	0.52
query10	0.63	0.59	0.56
query11	0.15	0.11	0.10
query12	0.15	0.11	0.12
query13	0.62	0.61	0.59
query14	2.74	2.72	2.81
query15	0.94	0.86	0.86
query16	0.39	0.37	0.39
query17	1.03	1.03	1.04
query18	0.22	0.21	0.20
query19	2.08	1.80	1.86
query20	0.01	0.01	0.01
query21	15.35	0.90	0.55
query22	0.76	1.11	0.60
query23	15.08	1.40	0.63
query24	6.97	1.69	1.22
query25	0.43	0.23	0.20
query26	0.61	0.17	0.14
query27	0.05	0.06	0.05
query28	9.23	0.88	0.44
query29	12.55	4.18	3.43
query30	0.25	0.09	0.06
query31	2.82	0.61	0.39
query32	3.23	0.57	0.46
query33	2.99	3.03	3.11
query34	15.80	5.25	4.55
query35	4.51	4.54	4.56
query36	0.67	0.49	0.49
query37	0.09	0.07	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.03
query40	0.18	0.13	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.16 s
Total hot run time: 31.79 s

@feiniaofeiafei feiniaofeiafei force-pushed the fix_eliminate_group_by_constant branch from f1d2701 to 40e6a29 Compare March 25, 2025 10:57
@feiniaofeiafei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34471 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 40e6a2950b4026fb4699cb10a24a36435475987c, data reload: false

------ Round 1 ----------------------------------
q1	25917	5161	5334	5161
q2	2064	295	169	169
q3	10533	1302	707	707
q4	10245	1015	551	551
q5	7543	2424	2309	2309
q6	186	165	135	135
q7	923	790	629	629
q8	9323	1242	1155	1155
q9	6900	5146	5086	5086
q10	6868	2310	1913	1913
q11	488	276	265	265
q12	355	358	231	231
q13	17761	3696	3084	3084
q14	252	241	212	212
q15	529	494	482	482
q16	645	611	592	592
q17	596	857	359	359
q18	7551	7244	7160	7160
q19	1717	961	603	603
q20	328	343	206	206
q21	4068	3456	2472	2472
q22	1035	1021	990	990
Total cold run time: 115827 ms
Total hot run time: 34471 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5528	5220	5134	5134
q2	247	339	234	234
q3	2192	2693	2310	2310
q4	1450	1877	1471	1471
q5	4509	4440	4412	4412
q6	212	187	136	136
q7	1985	1934	1758	1758
q8	2623	2557	2591	2557
q9	7344	7103	7252	7103
q10	3009	3173	2743	2743
q11	564	543	504	504
q12	684	809	613	613
q13	3546	3971	3408	3408
q14	280	286	267	267
q15	517	478	476	476
q16	650	695	621	621
q17	1182	1570	1410	1410
q18	7951	7651	7430	7430
q19	812	852	1033	852
q20	1972	1971	1891	1891
q21	5341	5014	4859	4859
q22	1099	1070	1049	1049
Total cold run time: 53697 ms
Total hot run time: 51238 ms

@doris-robot

TPC-DS: Total hot run time: 193154 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 40e6a2950b4026fb4699cb10a24a36435475987c, data reload: false

query1	1423	1038	1047	1038
query2	6254	1966	1923	1923
query3	11043	4524	4522	4522
query4	53028	25232	23103	23103
query5	5317	526	488	488
query6	398	204	199	199
query7	5173	494	285	285
query8	349	262	243	243
query9	6656	2619	2639	2619
query10	425	331	254	254
query11	15469	15121	14974	14974
query12	166	110	105	105
query13	1166	516	387	387
query14	10110	6582	6447	6447
query15	206	172	183	172
query16	6898	656	461	461
query17	1064	725	579	579
query18	1550	403	307	307
query19	194	193	162	162
query20	130	130	119	119
query21	209	124	102	102
query22	4499	4374	4337	4337
query23	34303	33374	33240	33240
query24	6527	2454	2445	2445
query25	471	482	403	403
query26	655	273	144	144
query27	2220	495	335	335
query28	3128	2447	2486	2447
query29	569	568	437	437
query30	279	218	188	188
query31	859	849	807	807
query32	73	64	59	59
query33	444	385	314	314
query34	756	855	521	521
query35	837	864	785	785
query36	964	1009	960	960
query37	133	103	85	85
query38	4084	4296	4122	4122
query39	1513	1465	1436	1436
query40	202	116	103	103
query41	55	57	52	52
query42	128	113	113	113
query43	521	519	481	481
query44	1344	845	819	819
query45	196	175	173	173
query46	908	1031	667	667
query47	1885	1859	1801	1801
query48	404	435	311	311
query49	701	547	424	424
query50	717	773	441	441
query51	4281	4342	4268	4268
query52	114	110	112	110
query53	225	258	182	182
query54	528	511	414	414
query55	91	82	81	81
query56	272	266	270	266
query57	1152	1228	1115	1115
query58	262	257	263	257
query59	2892	2932	2748	2748
query60	288	289	270	270
query61	137	130	127	127
query62	760	737	688	688
query63	225	194	192	192
query64	1473	1053	689	689
query65	4419	4339	4354	4339
query66	753	393	304	304
query67	15973	15491	15340	15340
query68	5334	880	502	502
query69	521	307	271	271
query70	1220	1099	1116	1099
query71	435	300	256	256
query72	5800	4922	5016	4922
query73	984	620	351	351
query74	8923	8888	8961	8888
query75	3551	3271	2710	2710
query76	3815	1190	763	763
query77	541	381	281	281
query78	10135	10218	9406	9406
query79	2523	812	578	578
query80	668	505	440	440
query81	509	257	219	219
query82	486	134	94	94
query83	174	172	161	161
query84	287	86	75	75
query85	792	355	416	355
query86	430	315	291	291
query87	4394	4468	4301	4301
query88	3543	2229	2239	2229
query89	413	307	279	279
query90	1897	207	206	206
query91	147	156	111	111
query92	76	58	57	57
query93	2181	1049	566	566
query94	681	418	308	308
query95	350	283	266	266
query96	485	574	278	278
query97	3161	3209	3080	3080
query98	267	208	210	208
query99	1419	1376	1274	1274
Total cold run time: 296590 ms
Total hot run time: 193154 ms

@doris-robot

ClickBench: Total hot run time: 32.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 40e6a2950b4026fb4699cb10a24a36435475987c, data reload: false

query1	0.04	0.03	0.04
query2	0.12	0.10	0.10
query3	0.25	0.20	0.20
query4	1.60	0.19	0.19
query5	0.59	0.57	0.58
query6	1.20	0.71	0.72
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.58	0.51	0.52
query10	0.58	0.61	0.57
query11	0.16	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.60
query14	2.83	2.82	2.78
query15	0.93	0.86	0.86
query16	0.38	0.37	0.38
query17	1.03	1.02	1.02
query18	0.21	0.20	0.20
query19	1.92	2.01	1.79
query20	0.02	0.01	0.01
query21	15.35	0.91	0.54
query22	0.77	1.31	0.88
query23	14.73	1.35	0.61
query24	6.76	2.11	1.20
query25	0.51	0.17	0.16
query26	0.58	0.16	0.14
query27	0.06	0.05	0.05
query28	10.37	0.84	0.44
query29	12.60	4.16	3.50
query30	0.24	0.09	0.06
query31	2.80	0.58	0.39
query32	3.23	0.55	0.47
query33	3.02	3.04	3.13
query34	15.66	5.07	4.49
query35	4.57	4.53	4.49
query36	0.67	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.13	0.13
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.67 s
Total hot run time: 32.04 s

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 26, 2025
Contributor

PR approved by anyone and no changes requested.

@englefly englefly merged commit c602ed5 into apache:master Mar 26, 2025
32 of 33 checks passed
morrySnow pushed a commit that referenced this pull request Apr 15, 2025
…9589)

### What problem does this PR solve?

Related PR: #32878 #49473

Problem Summary:

 SELECT
    IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    ) AS x0,
    TIMESTAMPDIFF(
        YEAR,
        NOW(),
        NOW()
    ) AS x1
FROM
t1 AS t
GROUP BY
    x0,
    x1;

After EliminateGroupByConstant, this SQL is rewritten to:

 SELECT
    IF(
        t.`gender` IN ('女'),
        0,
        1
    ) AS x0,
    0 AS x1
FROM
t1 AS t
GROUP BY
     IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    ) ;

The SELECT expression and the GROUP BY expression now differ, so
normalizeAgg reports an error.

The fix in PR #49473 may introduce another issue. Consider the following
query:

SELECT func2(100) FROM t GROUP BY func1(), func2(func1());

If func1() can be constant-folded to 100, then func2(func1()) will be
replaced with func2(100), allowing the query to execute successfully.
However, when func1() cannot be folded to 100, the query will fail. This
creates an inconsistent behavior where query execution depends on
whether func1() can be constant-folded or not, which is not an ideal
implementation.

To address this issue, this PR modifies the normalizeAgg logic to
eliminate constant group by keys. With this change, the query will
consistently fail regardless of whether func1() can be folded or not,
ensuring more predictable behavior.
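
The inconsistency described above for the #49473 approach can be illustrated with a small model. This is a sketch in Python under stated assumptions, not Doris code: expressions are plain tuples such as `("func2", ("func1",))`, the `foldable` dictionary is a hypothetical stand-in for whatever the constant folder can evaluate, and `select_matches_group_by` is a simplified version of the "every select expression must match a group-by key" check.

```python
def fold(expr, foldable):
    """Replace any sub-expression listed in `foldable` with its constant value."""
    if expr in foldable:
        return foldable[expr]
    if isinstance(expr, tuple):
        # expr[0] is the function name, the rest are its arguments
        return (expr[0],) + tuple(fold(a, foldable) for a in expr[1:])
    return expr


def select_matches_group_by(outputs, group_by, foldable):
    """Model of the #49473 behavior: fold keys, drop constant keys, then check
    that every folded select expression is one of the remaining keys."""
    keys = [fold(k, foldable) for k in group_by]
    keys = [k for k in keys if isinstance(k, tuple)]  # constants are dropped
    return all(fold(e, foldable) in keys for e in outputs)


# SELECT func2(100) FROM t GROUP BY func1(), func2(func1());
outputs = [("func2", 100)]
group_by = [("func1",), ("func2", ("func1",))]

# Case 1: func1() can be folded to 100, so func2(func1()) becomes func2(100).
accepted_when_foldable = select_matches_group_by(outputs, group_by, {("func1",): 100})

# Case 2: func1() cannot be folded, so no key equals func2(100).
accepted_when_not_foldable = select_matches_group_by(outputs, group_by, {})
```

Here `accepted_when_foldable` is `True` and `accepted_when_not_foldable` is `False`: the same query is accepted or rejected depending only on whether `func1()` happens to be foldable, which is the behavior the follow-up change sets out to make consistent.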
@morrySnow morrySnow added usercase Important user case type label dev/2.1.x dev/3.0.x labels Apr 15, 2025
github-actions bot pushed a commit that referenced this pull request Apr 15, 2025
…9589)

github-actions bot pushed a commit that referenced this pull request Apr 15, 2025
… after removing constant group by keys (#49473)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note
```sql
 SELECT
    IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    ) AS x0,
    TIMESTAMPDIFF(
        YEAR,
        NOW(),
        NOW()
    ) AS x1
FROM
t1 AS t
GROUP BY
    x0,
    x1;
```
After EliminateGroupByConstant, this SQL is rewritten to:
```sql
 SELECT
    IF(
        t.`gender` IN ('女'),
        0,
        1
    ) AS x0,
    0 AS x1
FROM
t1 AS t
GROUP BY
     IF(
        t.`gender` IN ('女'),
        (
            TIMESTAMPDIFF(
                YEAR,
                NOW(),
                NOW()
            )
        ),
        1
    ) ;
```
The SELECT expression and the GROUP BY expression now differ, so
normalizeAgg reports an error.
This PR uses the foldmap to rewrite the GROUP BY expressions as well.
After the change, the SQL produced by EliminateGroupByConstant becomes:
```sql
 SELECT
    IF(
        t.`gender` IN ('女'),
        0,
        1
    ) AS x0,
    0 AS x1
FROM
t1 AS t
GROUP BY
     IF(
        t.`gender` IN ('女'),
        0,
        1
    ) ;
```
The SELECT expression and the GROUP BY expression are now identical.
github-actions bot pushed a commit that referenced this pull request Apr 15, 2025
… after removing constant group by keys (#49473)

yiguolei pushed a commit that referenced this pull request Apr 16, 2025
…g's group by after removing constant group by keys #49473 (#50044)

Cherry-picked from #49473

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
seawinde pushed a commit to seawinde/doris that referenced this pull request Apr 17, 2025
…ache#49589)

feiniaofeiafei added a commit to feiniaofeiafei/doris that referenced this pull request Apr 21, 2025
…ache#49589)

dataroaring pushed a commit that referenced this pull request Apr 22, 2025
…g's group by after removing constant group by keys #49473 (#50043)

Cherry-picked from #49473

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
feiniaofeiafei added a commit to feiniaofeiafei/doris that referenced this pull request May 8, 2025
…ache#49589)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
… after removing constant group by keys (apache#49473)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…ache#49589)

Labels
approved Indicates a PR has been approved by one committer. dev/2.1.10-merged dev/3.0.6-merged reviewed usercase Important user case type label

8 participants