[fix](nereids) simplify self compare exclude non-foldable expression #47054

Merged
englefly merged 3 commits into apache:master from yujun777:fix-self-compare-not-determinal
Jan 16, 2025

Conversation

@yujun777
Contributor

@yujun777 yujun777 commented Jan 16, 2025

What problem does this PR solve?

#46905 added an expression rewrite rule that simplifies self-comparisons; for example, a = a evaluates to TRUE.

However, if a is non-foldable, the comparison must not be simplified. For example, the function random is non-foldable, so random(1, 10) = random(1, 10) cannot be evaluated to TRUE.

Conversely, an expression that is non-deterministic but foldable can still be simplified by the self-comparison rule. For example, the function user is foldable and non-deterministic, so user() = user() can still be simplified to TRUE.
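
The rule described above can be illustrated with a minimal, self-contained sketch. It is not the Nereids implementation: the class `SelfCompareSketch`, the `Expr` types, and the `functionFoldable` flag are all hypothetical names invented for illustration, column references are assumed to be foldable, and NULL handling is ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SelfCompareSketch {
    /** Minimal expression node; foldable() stands in for the property named in the description. */
    interface Expr {
        boolean foldable();
    }

    /** Column reference such as `a`; assumed foldable for this sketch. */
    static final class Column implements Expr {
        final String name;
        Column(String name) { this.name = name; }
        @Override public boolean foldable() { return true; }
        @Override public boolean equals(Object o) {
            return o instanceof Column && ((Column) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    /** Constant value such as 1 or TRUE. */
    static final class Literal implements Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
        @Override public boolean foldable() { return true; }
        @Override public boolean equals(Object o) {
            return o instanceof Literal && Objects.equals(((Literal) o).value, value);
        }
        @Override public int hashCode() { return Objects.hashCode(value); }
    }

    /** Function call; foldable only if the function and all of its arguments are foldable. */
    static final class Call implements Expr {
        final String function;
        final boolean functionFoldable; // hypothetical flag: false for random, true for user
        final List<Expr> args;
        Call(String function, boolean functionFoldable, Expr... args) {
            this.function = function;
            this.functionFoldable = functionFoldable;
            this.args = Arrays.asList(args);
        }
        @Override public boolean foldable() {
            return functionFoldable && args.stream().allMatch(Expr::foldable);
        }
        @Override public boolean equals(Object o) {
            return o instanceof Call
                    && ((Call) o).function.equals(function)
                    && ((Call) o).args.equals(args);
        }
        @Override public int hashCode() { return Objects.hash(function, args); }
    }

    /** Equality comparison `left = right`. */
    static final class EqualTo implements Expr {
        final Expr left;
        final Expr right;
        EqualTo(Expr left, Expr right) { this.left = left; this.right = right; }
        @Override public boolean foldable() { return left.foldable() && right.foldable(); }
    }

    static final Expr TRUE = new Literal(Boolean.TRUE);

    /** Rewrites `x = x` to TRUE only when x is foldable; otherwise returns the input unchanged. */
    static Expr simplify(Expr e) {
        if (e instanceof EqualTo) {
            EqualTo eq = (EqualTo) e;
            if (eq.left.equals(eq.right) && eq.left.foldable()) {
                return TRUE;
            }
        }
        return e;
    }

    public static void main(String[] args) {
        Expr colCmp = new EqualTo(new Column("a"), new Column("a"));
        Expr rndCmp = new EqualTo(
                new Call("random", false, new Literal(1), new Literal(10)),
                new Call("random", false, new Literal(1), new Literal(10)));
        Expr userCmp = new EqualTo(new Call("user", true), new Call("user", true));
        System.out.println("a = a folded: " + (simplify(colCmp) == TRUE));
        System.out.println("random(1, 10) = random(1, 10) folded: " + (simplify(rndCmp) == TRUE));
        System.out.println("user() = user() folded: " + (simplify(userCmp) == TRUE));
    }
}
```

The check is on foldability rather than determinism: two structurally identical `random` calls may return different values at execution time, so the comparison is left untouched, while `user()` is folded because it can be evaluated once for the whole query.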

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 16, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Contributor Author

run buildall

morrySnow
morrySnow previously approved these changes Jan 16, 2025
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 16, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yujun777
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Jan 16, 2025
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 16, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@doris-robot

TPC-H: Total hot run time: 32335 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9a2eb1216f0b820dee33ad9e4f20b48a75f3a2f5, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17563	5454	5928	5454
q2	2052	316	173	173
q3	10626	1349	714	714
q4	10192	956	510	510
q5	7535	2410	2135	2135
q6	190	164	132	132
q7	896	744	618	618
q8	9438	1409	1209	1209
q9	5375	5021	4988	4988
q10	6901	2338	1879	1879
q11	474	269	263	263
q12	339	350	209	209
q13	17780	3665	3057	3057
q14	239	229	200	200
q15	520	489	475	475
q16	643	626	599	599
q17	572	839	334	334
q18	6938	6592	6409	6409
q19	2385	939	526	526
q20	301	316	190	190
q21	2804	2168	1964	1964
q22	359	321	297	297
Total cold run time: 104122 ms
Total hot run time: 32335 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5513	5465	5428	5428
q2	242	330	238	238
q3	2246	2647	2296	2296
q4	1421	1799	1372	1372
q5	4303	4766	4672	4672
q6	177	166	126	126
q7	2038	1941	1786	1786
q8	2603	2792	2688	2688
q9	7358	7268	7325	7268
q10	2992	3312	2751	2751
q11	577	497	478	478
q12	640	722	604	604
q13	3532	3846	3290	3290
q14	282	301	291	291
q15	527	470	470	470
q16	661	713	650	650
q17	1199	1759	1251	1251
q18	7707	7493	7430	7430
q19	802	1047	1090	1047
q20	2036	2042	1966	1966
q21	5665	5084	4955	4955
q22	643	578	562	562
Total cold run time: 53164 ms
Total hot run time: 51619 ms

@doris-robot

TPC-DS: Total hot run time: 194570 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9a2eb1216f0b820dee33ad9e4f20b48a75f3a2f5, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1324	959	942	942
query2	6318	2034	2062	2034
query3	11091	4514	4711	4514
query4	61108	29016	23034	23034
query5	5570	606	438	438
query6	427	201	193	193
query7	5579	495	301	301
query8	317	228	225	225
query9	8334	2652	2643	2643
query10	465	304	251	251
query11	17522	15445	15525	15445
query12	156	107	102	102
query13	1491	568	426	426
query14	11469	7408	7219	7219
query15	214	199	188	188
query16	7326	681	489	489
query17	1165	752	588	588
query18	1901	436	315	315
query19	208	187	165	165
query20	121	129	120	120
query21	220	127	106	106
query22	4480	5020	4473	4473
query23	33938	33927	33237	33237
query24	5551	2353	2339	2339
query25	464	437	387	387
query26	650	237	154	154
query27	1965	497	334	334
query28	3940	2453	2444	2444
query29	551	535	419	419
query30	211	190	155	155
query31	906	877	816	816
query32	71	59	59	59
query33	475	367	334	334
query34	717	867	505	505
query35	833	829	778	778
query36	1028	1062	956	956
query37	115	130	77	77
query38	4366	4367	4270	4270
query39	1505	1415	1452	1415
query40	208	117	110	110
query41	52	48	48	48
query42	126	111	98	98
query43	514	547	495	495
query44	1307	851	832	832
query45	194	175	172	172
query46	863	1040	645	645
query47	1923	1955	1855	1855
query48	384	414	332	332
query49	740	502	411	411
query50	636	683	404	404
query51	7024	7132	7044	7044
query52	105	104	93	93
query53	226	263	187	187
query54	492	521	413	413
query55	90	83	79	79
query56	263	246	243	243
query57	1180	1150	1107	1107
query58	230	224	250	224
query59	2918	3095	2951	2951
query60	275	272	255	255
query61	122	112	109	109
query62	759	734	647	647
query63	224	182	188	182
query64	1286	1064	667	667
query65	3237	3192	3156	3156
query66	678	397	332	332
query67	16106	15662	15515	15515
query68	5012	826	533	533
query69	494	303	272	272
query70	1214	1113	1150	1113
query71	439	279	251	251
query72	6078	3668	3776	3668
query73	800	749	360	360
query74	10105	8908	8819	8819
query75	3219	3141	2633	2633
query76	3753	1154	734	734
query77	466	371	269	269
query78	10331	9971	9414	9414
query79	3190	802	581	581
query80	1062	536	437	437
query81	536	270	240	240
query82	351	150	123	123
query83	221	166	149	149
query84	295	97	89	89
query85	753	354	297	297
query86	408	286	308	286
query87	4486	4651	4399	4399
query88	4380	2185	2162	2162
query89	415	321	298	298
query90	1596	182	187	182
query91	134	132	109	109
query92	64	58	52	52
query93	2778	834	530	530
query94	779	410	293	293
query95	331	273	252	252
query96	482	615	280	280
query97	2814	2874	2785	2785
query98	218	203	197	197
query99	1282	1354	1267	1267
Total cold run time: 314740 ms
Total hot run time: 194570 ms

@doris-robot

ClickBench: Total hot run time: 31.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9a2eb1216f0b820dee33ad9e4f20b48a75f3a2f5, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.60	0.10	0.10
query5	0.42	0.42	0.41
query6	1.14	0.67	0.65
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.59	0.51	0.51
query10	0.55	0.56	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.10
query13	0.61	0.60	0.59
query14	2.73	2.75	2.73
query15	0.90	0.84	0.81
query16	0.39	0.37	0.38
query17	1.04	1.06	1.07
query18	0.23	0.21	0.20
query19	1.95	1.87	1.99
query20	0.02	0.01	0.01
query21	15.36	0.92	0.58
query22	0.74	0.92	0.71
query23	15.18	1.47	0.64
query24	2.67	1.63	1.36
query25	0.18	0.17	0.12
query26	0.25	0.15	0.14
query27	0.06	0.04	0.05
query28	14.17	0.97	0.43
query29	12.61	3.96	3.28
query30	0.25	0.09	0.08
query31	2.81	0.59	0.39
query32	3.22	0.55	0.47
query33	2.97	3.01	3.05
query34	16.74	5.16	4.55
query35	4.58	4.58	4.56
query36	0.62	0.50	0.48
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.02	0.03
query40	0.17	0.13	0.13
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.74 s
Total hot run time: 31.4 s

@englefly
Contributor

run p0

@yujun777
Contributor Author

run p0

@englefly englefly merged commit cc61c5f into apache:master Jan 16, 2025
16 of 17 checks passed
lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
Labels

approved Indicates a PR has been approved by one committer. reviewed


5 participants