
[fix](agg) Adjust agg strategy when table satisfies distinct key distribution #61248

Open
feiniaofeiafei wants to merge 5 commits into apache:master from feiniaofeiafei:adjust_agg_strategy

@feiniaofeiafei
Contributor

feiniaofeiafei commented Mar 12, 2026

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Update the distinct aggregation strategy to prefer multi_distinct only when statistics are available and gbyNdv / instanceNum (the group-by NDV per instance) <= 30.
Avoid multi_distinct when statistics are missing and the distinct keys satisfy the table's hash distribution, including when the keys are traced back to the table through simple project/filter chains.
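A minimal Python sketch of the rule in the summary, for illustration only. The plan node classes (`Scan`, `Filter`, `Project`), the function names, and the True/False/None return convention are hypothetical and are not Doris Nereids APIs. It assumes "satisfies the table hash distribution" means that the distinct keys, traced back to base columns, cover every hash distribution column (the usual co-location condition), and that a project node only passes through plain column references, so any computed expression stops the trace. The PR's actual check may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Hypothetical, simplified plan nodes (not Doris Nereids classes).
@dataclass
class Scan:
    hash_columns: List[str]  # the table's hash distribution columns

@dataclass
class Filter:
    child: "Node"

@dataclass
class Project:
    child: "Node"
    # output name -> input column name for a plain column reference,
    # or None for a computed expression
    outputs: Dict[str, Optional[str]]

Node = Union[Scan, Filter, Project]

# Threshold taken from the PR summary: gbyNdv / instanceNum <= 30.
MULTI_DISTINCT_NDV_PER_INSTANCE = 30


def keys_satisfy_hash_distribution(node: Node, distinct_keys: List[str]) -> bool:
    """Trace the distinct keys down through Filter/Project nodes to the scan
    and check that they cover every hash distribution column."""
    keys = set(distinct_keys)
    while not isinstance(node, Scan):
        if isinstance(node, Filter):
            node = node.child  # a filter does not rename columns
        elif isinstance(node, Project):
            mapped = set()
            for key in keys:
                source = node.outputs.get(key)
                if source is None:  # computed or unknown column: stop tracing
                    return False
                mapped.add(source)
            keys = mapped
            node = node.child
        else:
            return False  # any other operator is outside the "simple chain"
    hash_cols = set(node.hash_columns)
    # Equal values on a superset of the hash columns imply the same bucket.
    return bool(hash_cols) and hash_cols <= keys


def prefer_multi_distinct(plan: Node, distinct_keys: List[str],
                          gby_ndv: Optional[float], instance_num: int) -> Optional[bool]:
    """True: prefer multi_distinct; False: avoid it; None: the summary does
    not cover this case, so existing behavior would apply.
    Assumes instance_num > 0; gby_ndv is None when statistics are missing."""
    if gby_ndv is not None:
        return gby_ndv / instance_num <= MULTI_DISTINCT_NDV_PER_INSTANCE
    if keys_satisfy_hash_distribution(plan, distinct_keys):
        return False
    return None
```

For example, with 10 instances and an estimated group-by NDV of 300 the ratio is exactly 30, so the sketch still prefers multi_distinct; at an NDV of 310 it does not. Returning `None` when statistics are missing and the keys do not match the distribution reflects that the summary leaves that case to the existing logic.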

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@feiniaofeiafei
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 36.00% (27/75) 🎉
Increment coverage report
Complete coverage report

@feiniaofeiafei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 27725 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0f0730b16d7a1e0e97edfb7d158288a8fb9fb4a4, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17682	4539	4263	4263
q2
q3	10640	789	510	510
q4	4689	355	256	256
q5	7553	1212	1014	1014
q6	194	177	147	147
q7	785	836	661	661
q8	9893	1509	1337	1337
q9	5798	4775	4739	4739
q10	6307	1924	1676	1676
q11	464	260	233	233
q12	691	575	471	471
q13	18046	2938	2179	2179
q14	228	232	211	211
q15	932	802	826	802
q16	749	715	696	696
q17	725	871	404	404
q18	5880	5473	5376	5376
q19	1115	998	610	610
q20	502	506	391	391
q21	4549	1984	1449	1449
q22	354	300	301	300
Total cold run time: 97776 ms
Total hot run time: 27725 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4629	4518	4585	4518
q2
q3	3838	4362	3805	3805
q4	883	1209	793	793
q5	4120	4377	4434	4377
q6	195	177	141	141
q7	1762	1651	1593	1593
q8	2538	2800	2556	2556
q9	7571	7483	7507	7483
q10	3754	3972	3587	3587
q11	513	432	408	408
q12	564	610	440	440
q13	2940	3110	2307	2307
q14	300	299	293	293
q15	865	798	793	793
q16	706	765	723	723
q17	1192	1437	1371	1371
q18	7305	6773	6508	6508
q19	875	831	860	831
q20	2145	2216	2000	2000
q21	4203	3520	3398	3398
q22	479	421	397	397
Total cold run time: 51377 ms
Total hot run time: 48322 ms
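The totals in these reports can be reproduced from the rows. The column meanings here are inferred from the numbers, not from the tpch-tools scripts: the first numeric column is the cold run, the next two are hot runs, and the last column is the faster of the two hot runs; "Total cold run time" and "Total hot run time" then sum the first and last columns. A small Python check against Round 1 of the report above (the q2 line carries no timings and is left out):

```python
# Round 1 rows from the TPC-H report above: (query, cold, hot1, hot2, best), in ms.
ROUND1 = [
    ("q1", 17682, 4539, 4263, 4263), ("q3", 10640, 789, 510, 510),
    ("q4", 4689, 355, 256, 256), ("q5", 7553, 1212, 1014, 1014),
    ("q6", 194, 177, 147, 147), ("q7", 785, 836, 661, 661),
    ("q8", 9893, 1509, 1337, 1337), ("q9", 5798, 4775, 4739, 4739),
    ("q10", 6307, 1924, 1676, 1676), ("q11", 464, 260, 233, 233),
    ("q12", 691, 575, 471, 471), ("q13", 18046, 2938, 2179, 2179),
    ("q14", 228, 232, 211, 211), ("q15", 932, 802, 826, 802),
    ("q16", 749, 715, 696, 696), ("q17", 725, 871, 404, 404),
    ("q18", 5880, 5473, 5376, 5376), ("q19", 1115, 998, 610, 610),
    ("q20", 502, 506, 391, 391), ("q21", 4549, 1984, 1449, 1449),
    ("q22", 354, 300, 301, 300),
]


def summarize(rows):
    """Return (total_cold_ms, total_hot_ms), where each query's hot time
    is the faster of its two hot runs."""
    total_cold = 0
    total_hot = 0
    for name, cold, hot1, hot2, best in rows:
        # The reported last column should equal the faster hot run.
        assert best == min(hot1, hot2), name
        total_cold += cold
        total_hot += min(hot1, hot2)
    return total_cold, total_hot


print(summarize(ROUND1))  # → (97776, 27725)
```

Both values match the "Total cold run time: 97776 ms" and "Total hot run time: 27725 ms" lines of Round 1.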

@doris-robot

TPC-DS: Total hot run time: 153093 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0f0730b16d7a1e0e97edfb7d158288a8fb9fb4a4, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4336	624	507	507
query6	326	223	226	223
query7	4212	458	272	272
query8	346	245	231	231
query9	8709	2724	2688	2688
query10	485	407	339	339
query11	7454	5883	5600	5600
query12	186	133	128	128
query13	1261	434	354	354
query14	5510	3873	3559	3559
query14_1	2824	2794	2824	2794
query15	204	197	176	176
query16	1010	517	466	466
query17	1099	698	589	589
query18	2455	442	339	339
query19	210	216	179	179
query20	132	131	130	130
query21	227	149	120	120
query22	4707	5019	4913	4913
query23	15923	15592	15361	15361
query23_1	15422	16217	15777	15777
query24	8056	1707	1275	1275
query24_1	1313	1362	1260	1260
query25	631	489	422	422
query26	1313	286	175	175
query27	2949	558	321	321
query28	6096	1923	1946	1923
query29	902	592	519	519
query30	373	259	221	221
query31	1453	1358	1280	1280
query32	89	80	83	80
query33	525	348	289	289
query34	980	965	566	566
query35	630	679	587	587
query36	1074	1114	964	964
query37	134	97	88	88
query38	2916	2947	2888	2888
query39	1040	860	870	860
query39_1	856	872	831	831
query40	230	154	136	136
query41	62	58	58	58
query42	296	302	303	302
query43	240	246	216	216
query44	
query45	200	200	189	189
query46	899	991	617	617
query47	2125	2146	2058	2058
query48	316	317	234	234
query49	628	472	387	387
query50	698	284	217	217
query51	4153	4102	4069	4069
query52	287	292	280	280
query53	294	344	286	286
query54	317	275	264	264
query55	95	91	86	86
query56	331	323	328	323
query57	1370	1359	1279	1279
query58	290	285	277	277
query59	1330	1474	1305	1305
query60	344	338	341	338
query61	153	147	168	147
query62	618	582	546	546
query63	304	292	280	280
query64	5258	1366	1119	1119
query65	
query66	1467	481	371	371
query67	16368	16406	16257	16257
query68	
query69	393	308	281	281
query70	955	1005	926	926
query71	339	305	313	305
query72	2954	2852	2457	2457
query73	532	548	329	329
query74	10004	9967	9796	9796
query75	2864	2755	2456	2456
query76	2274	1043	658	658
query77	363	402	323	323
query78	11177	11299	10638	10638
query79	3035	812	615	615
query80	1749	621	552	552
query81	570	286	241	241
query82	1014	157	119	119
query83	335	276	254	254
query84	253	131	100	100
query85	891	481	476	476
query86	433	337	308	308
query87	3133	3100	2969	2969
query88	3543	2664	2650	2650
query89	422	366	350	350
query90	2018	183	177	177
query91	163	161	144	144
query92	81	76	68	68
query93	1195	825	495	495
query94	637	328	306	306
query95	580	356	325	325
query96	658	515	229	229
query97	2495	2508	2459	2459
query98	236	231	216	216
query99	998	1003	919	919
Total cold run time: 238101 ms
Total hot run time: 153093 ms

@feiniaofeiafei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 27793 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 73f986232cb3a27aade3ad7d50fc2062ee69b54f, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17605	4533	4303	4303
q2
q3	10637	827	516	516
q4	4677	367	259	259
q5	7570	1204	1015	1015
q6	175	176	152	152
q7	811	874	672	672
q8	9291	1495	1372	1372
q9	4819	4717	4603	4603
q10	6246	1894	1659	1659
q11	458	272	240	240
q12	737	574	465	465
q13	18033	2937	2227	2227
q14	226	227	212	212
q15	901	792	800	792
q16	741	715	700	700
q17	708	854	419	419
q18	5949	5331	5287	5287
q19	1115	1007	649	649
q20	498	503	387	387
q21	4402	2032	1578	1578
q22	433	331	286	286
Total cold run time: 96032 ms
Total hot run time: 27793 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4673	4608	4611	4608
q2
q3	3849	4330	3835	3835
q4	928	1202	794	794
q5	4084	4403	4315	4315
q6	189	174	143	143
q7	1802	1629	1599	1599
q8	2485	2866	2588	2588
q9	7526	7415	7454	7415
q10	3822	3960	3577	3577
q11	507	439	418	418
q12	507	563	454	454
q13	2713	3156	2331	2331
q14	307	315	285	285
q15	886	842	839	839
q16	787	761	718	718
q17	1139	1423	1330	1330
q18	7184	6880	6760	6760
q19	984	972	925	925
q20	2176	2148	1971	1971
q21	3982	3457	3472	3457
q22	474	431	394	394
Total cold run time: 51004 ms
Total hot run time: 48756 ms

@doris-robot

TPC-DS: Total hot run time: 152759 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 73f986232cb3a27aade3ad7d50fc2062ee69b54f, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4326	644	511	511
query6	334	246	220	220
query7	4227	492	272	272
query8	352	256	236	236
query9	8727	2798	2746	2746
query10	525	382	340	340
query11	7296	5909	5661	5661
query12	197	132	131	131
query13	1288	460	349	349
query14	5766	3878	3617	3617
query14_1	2812	2785	2860	2785
query15	220	196	178	178
query16	985	478	453	453
query17	1110	730	626	626
query18	2450	456	350	350
query19	231	217	182	182
query20	136	130	133	130
query21	232	149	127	127
query22	4838	4990	4589	4589
query23	16586	15975	15603	15603
query23_1	15941	15814	15784	15784
query24	7768	1710	1253	1253
query24_1	1216	1238	1232	1232
query25	581	466	408	408
query26	1236	275	150	150
query27	2768	477	293	293
query28	4522	1873	1865	1865
query29	852	573	479	479
query30	314	249	209	209
query31	1395	1309	1217	1217
query32	84	72	73	72
query33	517	378	274	274
query34	980	975	565	565
query35	667	696	606	606
query36	1092	1118	965	965
query37	137	99	84	84
query38	2943	2952	2879	2879
query39	895	863	830	830
query39_1	832	828	824	824
query40	233	162	135	135
query41	62	59	58	58
query42	297	304	304	304
query43	246	252	217	217
query44	
query45	201	188	186	186
query46	944	1061	606	606
query47	2098	2125	2029	2029
query48	321	322	230	230
query49	630	485	386	386
query50	737	290	217	217
query51	4089	4072	4022	4022
query52	292	294	276	276
query53	322	352	294	294
query54	301	269	274	269
query55	98	85	84	84
query56	321	326	297	297
query57	1364	1343	1268	1268
query58	312	278	277	277
query59	1340	1459	1282	1282
query60	371	345	313	313
query61	149	142	139	139
query62	614	584	552	552
query63	329	280	277	277
query64	5077	1289	1000	1000
query65	
query66	1474	469	353	353
query67	16545	16333	16361	16333
query68	
query69	385	309	287	287
query70	979	964	972	964
query71	343	309	300	300
query72	2790	2694	2469	2469
query73	550	561	321	321
query74	9955	9950	9800	9800
query75	3014	2782	2474	2474
query76	2273	1150	719	719
query77	386	429	324	324
query78	11128	11285	10630	10630
query79	1160	814	606	606
query80	1319	664	561	561
query81	532	289	259	259
query82	1328	159	127	127
query83	341	282	250	250
query84	259	130	106	106
query85	1002	539	434	434
query86	414	316	299	299
query87	3198	3139	2997	2997
query88	3618	2688	2678	2678
query89	452	376	343	343
query90	1987	190	193	190
query91	168	158	135	135
query92	78	77	74	74
query93	971	859	502	502
query94	606	327	290	290
query95	605	358	387	358
query96	661	543	236	236
query97	2462	2499	2417	2417
query98	239	221	219	219
query99	997	991	919	919
Total cold run time: 235149 ms
Total hot run time: 152759 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 72.00% (54/75) 🎉
Increment coverage report
Complete coverage report

@feiniaofeiafei
Contributor Author

run external


5 participants