
[fix](nereids) Use correct PREAGGREGATION in agg(filter(scan)) #33454

Merged
merged 5 commits into from
Apr 15, 2024


liutang123
Contributor

@liutang123 liutang123 commented Apr 10, 2024

Proposed changes

Issue Number: close #33351

  1. Set `PreAggStatus` to `ON` when a key column is aggregated by `max` or `min`.
  2. [Feature](materialized-view) support match logicalAggregate(logicalProject(logicalFilter(logicalOl… #28747 may change the `PreAggStatus` of the scan, so the status is now inherited from the previous scan.
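Rule 1 rests on how an Aggregate Key table is read. Rows with the same key can sit unmerged in different rowsets; with preaggregation ON the scan skips the storage-level merge and leaves it to the aggregation above, which is only safe when every aggregate returns the same result on unmerged rows. The following is a minimal Java sketch of that argument under a simplified model: the column names are borrowed from the regression test table `test_scan_preaggregation`, but their roles here (three key columns and one SUM value column `v1`) are assumed, and all class and method names are hypothetical, not Nereids code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PreAggDemo {
    // One row of a hypothetical Aggregate Key table: k1..k3 are keys, v1 is a SUM value column.
    record Row(int k1, int k2, int k3, long v1) {}

    // Storage-level merge (what the scan does when preaggregation is OFF):
    // rows with identical keys collapse into one row, summing v1.
    static List<Row> merge(List<Row> rows) {
        Map<List<Integer>, Long> merged = new LinkedHashMap<>();
        for (Row r : rows) {
            merged.merge(List.of(r.k1(), r.k2(), r.k3()), r.v1(), Long::sum);
        }
        List<Row> out = new ArrayList<>();
        merged.forEach((key, v1) -> out.add(new Row(key.get(0), key.get(1), key.get(2), v1)));
        return out;
    }

    static int minK2(List<Row> rows) { return rows.stream().mapToInt(Row::k2).min().orElseThrow(); }
    static int maxK3(List<Row> rows) { return rows.stream().mapToInt(Row::k3).max().orElseThrow(); }
    static long sumK2(List<Row> rows) { return rows.stream().mapToLong(Row::k2).sum(); }
    static long sumV1(List<Row> rows) { return rows.stream().mapToLong(Row::v1).sum(); }

    public static void main(String[] args) {
        // Key (1, 2, 3) appears twice, e.g. loaded in two rowsets and not yet compacted.
        List<Row> unmerged = List.of(
                new Row(1, 2, 3, 10),
                new Row(1, 2, 3, 5),
                new Row(1, 4, 7, 1));
        List<Row> merged = merge(unmerged);

        System.out.println(minK2(unmerged) == minK2(merged)); // true: MIN on a key column
        System.out.println(maxK3(unmerged) == maxK3(merged)); // true: MAX on a key column
        System.out.println(sumV1(unmerged) == sumV1(merged)); // true: SUM on a SUM value column
        System.out.println(sumK2(unmerged) == sumK2(merged)); // false: SUM on a key column double-counts
    }
}
```

MIN and MAX over key columns are unaffected by unmerged duplicates because the duplicates carry identical key values; SUM over a key column is not, which is why preaggregation has to stay OFF in that case.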

Further comments

Further regression test cases will be added later.

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@liutang123 liutang123 changed the title from "[fix] (nereids) Use correct PREAGGREGATION in agg(filter(scan))" to "[fix](nereids) Use correct PREAGGREGATION in agg(filter(scan))" on Apr 10, 2024
@liutang123
Contributor Author

@BiteTheDDDDt Hello, do you have time to review this PR?

@liutang123
Contributor Author

run buildall

String childNameWithFuncName = ctx.isBaseIndex()
        ? normalizeName(aggFunc.child(0).toSql())
        : normalizeName(CreateMaterializedViewStmt.mvColumnBuilder(
                matchingAggType, normalizeName(aggFunc.child(0).toSql())));

boolean contains = containsAllColumn(aggFunc.child(0), ctx.keyNameToColumn.keySet());
if (contains || ctx.keyNameToColumn.containsKey(childNameWithFuncName)) {
    // removed by this PR:
    if (ctx.isDupKeysOrMergeOnWrite || (!ctx.isBaseIndex() && contains)) {
    // added by this PR:
    if (canUseKeyColumn || ctx.isDupKeysOrMergeOnWrite || (!ctx.isBaseIndex() && contains)) {
Contributor

Maybe we can use `ctx.isBaseIndex()` instead of `canUseKeyColumn`?

Contributor Author

It doesn't seem to work.
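The thread does not say why the suggestion fails. One way to see a difference, assuming `canUseKeyColumn` is true only when the aggregate is `MAX` or `MIN` and that a true condition keeps preaggregation ON (both are assumptions, not confirmed here), is to evaluate the two variants side by side. The `Ctx` record and the method names are hypothetical.

```java
public class KeyColumnCondition {
    // Hypothetical stand-in for the fields read from ctx in the diff context.
    record Ctx(boolean isBaseIndex, boolean isDupKeysOrMergeOnWrite) {}

    // Condition as written in the PR.
    static boolean withCanUseKeyColumn(Ctx ctx, boolean canUseKeyColumn, boolean contains) {
        return canUseKeyColumn || ctx.isDupKeysOrMergeOnWrite() || (!ctx.isBaseIndex() && contains);
    }

    // Suggested variant: ctx.isBaseIndex() in place of canUseKeyColumn.
    static boolean withIsBaseIndex(Ctx ctx, boolean contains) {
        return ctx.isBaseIndex() || ctx.isDupKeysOrMergeOnWrite() || (!ctx.isBaseIndex() && contains);
    }

    public static void main(String[] args) {
        // Base index of an Aggregate Key table with SUM over a key column:
        // under the assumption above, canUseKeyColumn is false (not MAX/MIN).
        Ctx aggBase = new Ctx(true, false);
        System.out.println(withCanUseKeyColumn(aggBase, false, true)); // false
        System.out.println(withIsBaseIndex(aggBase, true));            // true
    }
}
```

`isBaseIndex || isDup || (!isBaseIndex && contains)` reduces to `isBaseIndex || isDup || contains`, so on the base index the suggested variant no longer depends on which aggregate function is applied.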

@liutang123
Contributor Author

run feut

morrySnow previously approved these changes Apr 11, 2024
@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 11, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Apr 12, 2024
@liutang123
Copy link
Contributor Author

run buildall

@liutang123
Copy link
Contributor Author

run buildall

BiteTheDDDDt previously approved these changes Apr 12, 2024
@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 12, 2024
Contributor

PR approved by at least one committer and no changes requested.

@liutang123
Copy link
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38325 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 253d58af7913b0540c57c83d6bd845fcf8aea32b, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17617	4261	4237	4237
q2	2008	199	188	188
q3	10419	1201	1109	1109
q4	10177	812	782	782
q5	7536	2858	2658	2658
q6	214	130	135	130
q7	998	593	574	574
q8	9211	2032	2018	2018
q9	7860	6530	6540	6530
q10	8479	3494	3539	3494
q11	460	231	226	226
q12	444	225	211	211
q13	18947	2899	2936	2899
q14	279	228	232	228
q15	504	485	483	483
q16	504	399	376	376
q17	956	721	659	659
q18	7306	6778	6786	6778
q19	5039	1520	1464	1464
q20	696	317	299	299
q21	3493	2682	2774	2682
q22	360	302	300	300
Total cold run time: 113507 ms
Total hot run time: 38325 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	4317	4208	4201	4201
q2	360	269	258	258
q3	3000	2702	2698	2698
q4	1838	1604	1584	1584
q5	5344	5325	5296	5296
q6	210	123	124	123
q7	2267	1887	1892	1887
q8	3175	3348	3344	3344
q9	8503	8543	8550	8543
q10	4070	3835	3971	3835
q11	616	500	477	477
q12	811	623	647	623
q13	16242	3239	3182	3182
q14	323	292	283	283
q15	504	474	491	474
q16	503	453	463	453
q17	1827	1558	1510	1510
q18	8044	8084	7857	7857
q19	1655	1588	1593	1588
q20	2026	1876	1819	1819
q21	5207	5012	4969	4969
q22	541	472	484	472
Total cold run time: 71383 ms
Total hot run time: 55476 ms
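The summary lines can be reproduced from the per-query rows: in the Round 1 data above, the total cold run time equals the sum of the first timing column, and the total hot run time equals the sum of the last column, which is the smaller of the two hot runs. A small Java check of that reading (the class and method names are illustrative):

```java
public class TpchTotals {
    // {cold run, hot run 1, hot run 2} in ms for q1..q22, copied from Round 1.
    static final int[][] ROUND1 = {
        {17617, 4261, 4237}, {2008, 199, 188}, {10419, 1201, 1109}, {10177, 812, 782},
        {7536, 2858, 2658}, {214, 130, 135}, {998, 593, 574}, {9211, 2032, 2018},
        {7860, 6530, 6540}, {8479, 3494, 3539}, {460, 231, 226}, {444, 225, 211},
        {18947, 2899, 2936}, {279, 228, 232}, {504, 485, 483}, {504, 399, 376},
        {956, 721, 659}, {7306, 6778, 6786}, {5039, 1520, 1464}, {696, 317, 299},
        {3493, 2682, 2774}, {360, 302, 300},
    };

    // Sum of the cold-run column.
    static long totalCold(int[][] rows) {
        long sum = 0;
        for (int[] r : rows) sum += r[0];
        return sum;
    }

    // Sum of the best (minimum) hot run per query.
    static long totalHot(int[][] rows) {
        long sum = 0;
        for (int[] r : rows) sum += Math.min(r[1], r[2]);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(totalCold(ROUND1)); // 113507
        System.out.println(totalHot(ROUND1));  // 38325
    }
}
```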

@doris-robot

TPC-DS: Total hot run time: 183712 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 253d58af7913b0540c57c83d6bd845fcf8aea32b, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query1	890	1122	1115	1115
query2	6215	2663	2416	2416
query3	6634	202	197	197
query4	36968	21626	21277	21277
query5	4143	386	393	386
query6	243	214	182	182
query7	4053	296	290	290
query8	224	170	174	170
query9	5806	2331	2319	2319
query10	371	238	246	238
query11	14600	14218	14317	14218
query12	138	101	87	87
query13	984	380	364	364
query14	9845	6843	6791	6791
query15	200	177	184	177
query16	6788	254	251	251
query17	1670	557	543	543
query18	1376	279	266	266
query19	193	159	151	151
query20	95	88	84	84
query21	200	127	122	122
query22	4957	4823	4810	4810
query23	33663	32738	33292	32738
query24	9154	3031	2969	2969
query25	524	394	388	388
query26	894	160	151	151
query27	2951	357	359	357
query28	6429	2140	2106	2106
query29	879	639	623	623
query30	292	176	169	169
query31	968	791	761	761
query32	58	50	55	50
query33	520	269	240	240
query34	922	483	501	483
query35	798	734	711	711
query36	1058	934	928	928
query37	106	68	67	67
query38	3720	3509	3573	3509
query39	1635	1566	1606	1566
query40	170	130	122	122
query41	47	44	43	43
query42	105	96	96	96
query43	576	550	562	550
query44	1168	734	735	734
query45	293	268	252	252
query46	1090	727	768	727
query47	2041	1953	1951	1951
query48	376	319	297	297
query49	807	375	370	370
query50	781	387	388	387
query51	6846	6890	6605	6605
query52	97	93	86	86
query53	343	283	283	283
query54	267	235	226	226
query55	79	72	71	71
query56	244	227	225	225
query57	1186	1161	1132	1132
query58	229	210	206	206
query59	3337	3070	3078	3070
query60	269	247	246	246
query61	109	108	105	105
query62	589	456	439	439
query63	306	289	286	286
query64	4211	4171	4092	4092
query65	3037	3004	3030	3004
query66	771	325	339	325
query67	15641	15129	14986	14986
query68	7144	537	546	537
query69	556	311	319	311
query70	1218	1124	1197	1124
query71	440	270	272	270
query72	6642	2748	2557	2557
query73	815	315	314	314
query74	7132	6469	6364	6364
query75	3111	2396	2337	2337
query76	4182	1095	1131	1095
query77	596	243	249	243
query78	10924	10250	10267	10250
query79	8112	522	524	522
query80	2310	423	418	418
query81	525	224	228	224
query82	1570	95	102	95
query83	325	161	163	161
query84	258	80	85	80
query85	1070	263	263	263
query86	474	280	295	280
query87	3739	3441	3519	3441
query88	6060	2253	2280	2253
query89	541	369	367	367
query90	1845	175	170	170
query91	119	92	91	91
query92	58	48	47	47
query93	6515	516	496	496
query94	1145	175	172	172
query95	372	284	288	284
query96	607	262	256	256
query97	2709	2475	2459	2459
query98	263	222	211	211
query99	1225	842	834	834
Total cold run time: 298781 ms
Total hot run time: 183712 ms

@doris-robot

ClickBench: Total hot run time: 30.28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 253d58af7913b0540c57c83d6bd845fcf8aea32b, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.04	0.04	0.02
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.69	0.06	0.06
query5	0.47	0.49	0.50
query6	1.47	0.66	0.65
query7	0.02	0.01	0.02
query8	0.04	0.04	0.04
query9	0.56	0.49	0.47
query10	0.55	0.56	0.56
query11	0.16	0.12	0.12
query12	0.15	0.12	0.12
query13	0.60	0.58	0.57
query14	0.76	0.78	0.77
query15	0.85	0.81	0.81
query16	0.36	0.34	0.36
query17	1.01	0.94	0.93
query18	0.24	0.24	0.22
query19	1.74	1.70	1.73
query20	0.01	0.02	0.01
query21	15.44	0.64	0.63
query22	4.68	6.81	1.97
query23	18.34	1.30	1.31
query24	1.66	0.29	0.21
query25	0.14	0.08	0.08
query26	0.26	0.16	0.15
query27	0.08	0.08	0.08
query28	13.45	1.01	0.99
query29	13.86	3.32	3.30
query30	0.27	0.07	0.05
query31	2.91	0.37	0.38
query32	3.30	0.47	0.45
query33	2.79	2.85	2.80
query34	17.11	4.42	4.49
query35	4.45	4.48	4.44
query36	0.64	0.46	0.47
query37	0.18	0.15	0.15
query38	0.15	0.14	0.15
query39	0.05	0.04	0.03
query40	0.17	0.15	0.16
query41	0.09	0.06	0.05
query42	0.06	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 111.13 s
Total hot run time: 30.28 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 253d58af7913b0540c57c83d6bd845fcf8aea32b with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       12.8 seconds inserted 10000000 Rows, about 781K ops/s
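The load rates appear to be computed as bytes divided by seconds, expressed in MiB (2^20 bytes) per second and truncated to an integer; this is an inference from the numbers, not something the bot documents. A quick check in Java (class and method names are illustrative):

```java
public class LoadThroughput {
    // Assumption: "MB/s" in the report means MiB/s, truncated.
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / (1024L * 1024L);
    }

    // Rows per second in thousands, truncated ("781K ops/s").
    static long kOpsPerSecond(long rows, double seconds) {
        return (long) (rows / seconds / 1000);
    }

    public static void main(String[] args) {
        System.out.println(mibPerSecond(2_358_488_459L, 18)); // 124 (json)
        System.out.println(mibPerSecond(1_101_869_774L, 58)); // 18 (orc)
        System.out.println(mibPerSecond(861_443_392L, 32));   // 25 (parquet)
        System.out.println(kOpsPerSecond(10_000_000L, 12.8)); // 781 (insert into select)
    }
}
```

The same arithmetic gives 746K rows/s for the later run (10,000,000 rows in 13.4 seconds).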

@liutang123
Contributor Author

@morrySnow Hi, Wenxin. This PR is ready to merge. Do you have time to take a look at it?

}

qt_right_when_preagg_on "select k1, min(k2), max(k3) from test_scan_preaggregation where k1 = 1 group by k1;"
qt_right_when_preagg_off "select k1, sum(v1) from test_scan_preaggregation group by k1;"
Contributor

@starocean999 starocean999 Apr 15, 2024

`qt_right_when_preagg_off` is misleading; shouldn't preagg be on here? Also, please use `explain` to verify the preagg status.

Contributor Author

Sorry, I originally intended to test aggregation on key columns. I have modified the test SQL.
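The reviewer asks for the preaggregation status to be verified with `explain` rather than only through query results. A minimal Java sketch of that check, assuming the plan text contains a line such as `PREAGGREGATION: ON` or `PREAGGREGATION: OFF...` for each OLAP scan (the exact wording may vary between Doris versions); fetching the plan text, for example over JDBC, is left out, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreAggExplainCheck {
    // Assumed plan-text format: "PREAGGREGATION: ON" or "PREAGGREGATION: OFF..." per scan.
    private static final Pattern PREAGG = Pattern.compile("PREAGGREGATION:\\s*(ON|OFF)");

    // Returns the status of every scan found in the plan text, in order.
    static List<String> preAggStatuses(String explainText) {
        List<String> out = new ArrayList<>();
        Matcher m = PREAGG.matcher(explainText);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hand-written sample, not captured from a real Doris plan.
        String plan = String.join("\n",
                "     TABLE: test_scan_preaggregation",
                "     PREAGGREGATION: ON");
        System.out.println(preAggStatuses(plan)); // [ON]
    }
}
```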

@github-actions github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Apr 15, 2024
@liutang123
Copy link
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38509 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 54209764bb3d81abd63b09d45bd0c6cd46d20d96, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17772	4527	4421	4421
q2	2508	206	191	191
q3	11131	1220	1181	1181
q4	10509	744	752	744
q5	8277	2670	2630	2630
q6	217	131	129	129
q7	1025	593	579	579
q8	9214	2043	2058	2043
q9	7919	6627	6515	6515
q10	8432	3482	3481	3481
q11	455	238	230	230
q12	381	217	215	215
q13	19086	2910	2960	2910
q14	266	239	233	233
q15	508	491	460	460
q16	506	374	375	374
q17	962	682	713	682
q18	7349	6716	6635	6635
q19	7572	1543	1497	1497
q20	694	312	306	306
q21	3523	2920	2759	2759
q22	359	307	294	294
Total cold run time: 118665 ms
Total hot run time: 38509 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	4375	4191	4198	4191
q2	370	270	264	264
q3	2952	2783	2790	2783
q4	1858	1588	1579	1579
q5	5340	5336	5306	5306
q6	207	121	122	121
q7	2230	1868	1864	1864
q8	3210	3331	3313	3313
q9	8648	8556	8678	8556
q10	3865	3719	3670	3670
q11	578	471	480	471
q12	721	566	581	566
q13	16451	2930	2920	2920
q14	297	269	259	259
q15	506	466	474	466
q16	469	434	428	428
q17	1750	1483	1443	1443
q18	7475	7501	7352	7352
q19	1619	1564	1543	1543
q20	1961	1757	1739	1739
q21	4890	4682	4725	4682
q22	544	456	476	456
Total cold run time: 70316 ms
Total hot run time: 53972 ms

@doris-robot

TPC-DS: Total hot run time: 183093 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 54209764bb3d81abd63b09d45bd0c6cd46d20d96, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query1	881	1112	1119	1112
query2	6522	2570	2420	2420
query3	6652	209	204	204
query4	37520	21661	21530	21530
query5	4132	402	388	388
query6	228	178	179	178
query7	4048	290	279	279
query8	217	174	176	174
query9	5809	2281	2277	2277
query10	546	250	240	240
query11	14690	14270	14293	14270
query12	143	91	90	90
query13	984	365	377	365
query14	9073	6886	6821	6821
query15	205	173	176	173
query16	7045	255	265	255
query17	1482	582	549	549
query18	1460	278	270	270
query19	186	154	147	147
query20	91	86	86	86
query21	203	129	126	126
query22	4992	4834	4825	4825
query23	33709	32892	33071	32892
query24	12814	2963	2933	2933
query25	546	369	365	365
query26	1878	146	146	146
query27	3085	304	308	304
query28	7707	2017	2013	2013
query29	853	604	595	595
query30	301	164	163	163
query31	896	734	702	702
query32	59	53	52	52
query33	591	247	242	242
query34	893	466	470	466
query35	852	688	690	688
query36	1015	920	945	920
query37	256	72	68	68
query38	3523	3406	3400	3400
query39	1572	1531	1521	1521
query40	280	128	123	123
query41	48	48	45	45
query42	107	99	99	99
query43	578	541	534	534
query44	1355	698	692	692
query45	276	274	255	255
query46	1061	718	714	714
query47	1912	1827	1872	1827
query48	355	291	289	289
query49	1145	369	352	352
query50	747	373	369	369
query51	6602	6644	6630	6630
query52	116	90	91	90
query53	352	285	272	272
query54	250	227	216	216
query55	107	70	71	70
query56	240	223	212	212
query57	1220	1114	1119	1114
query58	221	196	200	196
query59	3382	3067	3197	3067
query60	238	228	234	228
query61	104	89	90	89
query62	640	438	458	438
query63	307	272	272	272
query64	4711	4048	3970	3970
query65	3114	3019	3019	3019
query66	1321	348	312	312
query67	15388	14996	14872	14872
query68	4547	539	537	537
query69	480	308	320	308
query70	1239	1175	1176	1175
query71	402	281	265	265
query72	6531	2610	2431	2431
query73	732	313	318	313
query74	6904	6314	6394	6314
query75	3093	2324	2357	2324
query76	2924	1091	1120	1091
query77	624	245	238	238
query78	10851	10171	10191	10171
query79	2858	517	517	517
query80	943	411	408	408
query81	551	243	232	232
query82	1075	91	90	90
query83	283	167	169	167
query84	233	83	83	83
query85	1673	307	313	307
query86	472	304	302	302
query87	3763	3540	3612	3540
query88	5218	2358	2266	2266
query89	450	369	370	369
query90	1953	171	174	171
query91	118	96	96	96
query92	60	45	46	45
query93	3161	496	503	496
query94	1275	176	177	176
query95	400	300	284	284
query96	602	264	256	256
query97	2654	2499	2452	2452
query98	238	222	213	213
query99	1146	841	848	841
Total cold run time: 291289 ms
Total hot run time: 183093 ms

@doris-robot

ClickBench: Total hot run time: 30.41 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 54209764bb3d81abd63b09d45bd0c6cd46d20d96, data reload: false

query	cold run	hot run 1	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.09	0.03	0.04
query3	0.23	0.05	0.06
query4	1.68	0.07	0.07
query5	0.50	0.48	0.50
query6	1.47	0.65	0.66
query7	0.02	0.01	0.01
query8	0.04	0.04	0.04
query9	0.55	0.49	0.50
query10	0.54	0.55	0.54
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.58	0.58	0.57
query14	0.74	0.77	0.78
query15	0.81	0.81	0.81
query16	0.36	0.35	0.36
query17	0.98	1.00	1.01
query18	0.21	0.21	0.25
query19	1.83	1.66	1.76
query20	0.01	0.01	0.02
query21	15.46	0.65	0.66
query22	4.04	7.37	2.26
query23	18.30	1.38	1.25
query24	1.80	0.24	0.20
query25	0.14	0.08	0.07
query26	0.26	0.17	0.16
query27	0.08	0.08	0.07
query28	13.39	0.99	0.98
query29	12.59	3.31	3.29
query30	0.26	0.06	0.06
query31	2.87	0.38	0.36
query32	3.28	0.46	0.46
query33	2.77	2.75	2.82
query34	17.04	4.38	4.41
query35	4.50	4.46	4.44
query36	0.66	0.46	0.46
query37	0.18	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.04	0.03
query40	0.17	0.16	0.13
query41	0.09	0.05	0.05
query42	0.04	0.04	0.04
query43	0.04	0.04	0.03
Total cold run time: 109.12 s
Total hot run time: 30.41 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 54209764bb3d81abd63b09d45bd0c6cd46d20d96 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.4 seconds inserted 10000000 Rows, about 746K ops/s

@liutang123
Contributor Author

@starocean999 Hi, I fixed the regression test case. Can you take another look at this PR?

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 15, 2024
@morningman morningman merged commit 4462704 into apache:master Apr 15, 2024
29 of 32 checks passed
yiguolei pushed a commit that referenced this pull request Apr 16, 2024
1. Set `PreAggStatus` to `ON` when a key column is aggregated by `max` or `min`;
2. #28747 may change the `PreAggStatus` of the scan, so inherit it from the previous scan.
yiguolei pushed a commit that referenced this pull request Apr 17, 2024
dataroaring pushed a commit that referenced this pull request Apr 24, 2024
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.1.3-merged, reviewed
Development

Successfully merging this pull request may close these issues.

[Bug] (nereids) Close PREAGGREGATION in simple aggregation
7 participants