@yujun777
Contributor

@yujun777 yujun777 commented Jun 23, 2025

What problem does this PR solve?

Guava's Sets.intersection returns a SetView, which is an unmodifiable view over its two arguments rather than a copy. When Sets.intersection calls are chained, as in the code below:

List<Set<T>> B;        // the sets to intersect, assumed non-empty
Set<T> A = B.get(0);
for (int i = 1; i < B.size(); i++) {
    A = Sets.intersection(A, B.get(i));
}

A ends up as a chain of nested SetViews: A = SetView(SetView(SetView(B0 intersect B1) intersect B2) ... intersect Bn).

Any access to A then recurses through the corresponding method of every nested SetView.

For example, calling A.isEmpty() produces a call stack like this:

 java.lang.StackOverflowError: null
      at com.google.common.collect.Sets$2$1.computeNext(Sets.java:838) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.Sets$2$1.computeNext(Sets.java:838) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.Sets$2$1.computeNext(Sets.java:838) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.Sets$2$1.computeNext(Sets.java:838) ~[guava-33.2.1-jre.jar:?]
      at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145) ~[guava-33.2.1-jre.jar:?]
      ...

If the chain is long enough, this recursion exhausts the thread stack and throws StackOverflowError.

So for a long chain of set intersections, the nested SetViews need to be eliminated by converting the result to a concrete Set.
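The idea of replacing nested views with a concrete set can be sketched with plain JDK collections. This is a minimal illustration, not the code from this PR: the class name EagerIntersection and the method intersectAll are hypothetical, and returning an empty set for an empty input list is an assumption. With Guava on the classpath, copying each intermediate view (for example with SetView.copyInto or SetView.immutableCopy) would flatten the chain in a similar way.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from the PR: intersects many sets eagerly.
final class EagerIntersection {
    private EagerIntersection() {}

    /**
     * Returns a new mutable set holding the elements common to every input set.
     * Each step shrinks one concrete HashSet in place, so no view wraps another
     * view and later calls such as isEmpty() do not recurse through a chain.
     * Assumption: an empty input list yields an empty set.
     */
    static <T> Set<T> intersectAll(List<? extends Set<T>> sets) {
        if (sets.isEmpty()) {
            return new HashSet<>();
        }
        // Copy the first set so the caller's input is left untouched.
        Set<T> result = new HashSet<>(sets.get(0));
        // Stop early once the intersection is empty; it cannot grow again.
        for (int i = 1; i < sets.size() && !result.isEmpty(); i++) {
            result.retainAll(sets.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<Integer>> b = List.of(Set.of(1, 2, 3), Set.of(2, 3, 4), Set.of(3, 4, 5));
        System.out.println(intersectAll(b)); // prints [3]
    }
}
```

The trade-off is that each call allocates a copy up front, whereas a SetView defers all work until it is read. For long chains the copy keeps the depth of any later call constant regardless of how many sets were intersected.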

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jun 23, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33874 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 465f76625dd02eac5e73f2d0f59a7836a46a2d0e, data reload: false

------ Round 1 ----------------------------------
q1	17576	5148	5043	5043
q2	1915	287	191	191
q3	10347	1274	743	743
q4	10269	997	543	543
q5	7779	2318	2302	2302
q6	180	163	131	131
q7	906	739	598	598
q8	9311	1241	1094	1094
q9	6732	5062	5044	5044
q10	6931	2368	1967	1967
q11	494	285	278	278
q12	346	345	219	219
q13	17772	3586	3081	3081
q14	222	229	216	216
q15	574	478	474	474
q16	428	419	367	367
q17	586	857	362	362
q18	7689	7214	7132	7132
q19	1856	949	551	551
q20	330	336	228	228
q21	3662	3104	2336	2336
q22	1047	1017	974	974
Total cold run time: 106952 ms
Total hot run time: 33874 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5204	5102	5074	5074
q2	252	312	221	221
q3	2152	2569	2246	2246
q4	1321	1762	1359	1359
q5	4191	4061	4324	4061
q6	243	166	127	127
q7	1982	1959	1804	1804
q8	2568	2537	2445	2445
q9	7114	7037	7126	7037
q10	3048	3232	2822	2822
q11	561	504	488	488
q12	675	743	617	617
q13	3441	3912	3244	3244
q14	283	306	266	266
q15	520	482	470	470
q16	439	492	424	424
q17	1139	1495	1390	1390
q18	7654	7106	7051	7051
q19	754	748	915	748
q20	1901	1963	1795	1795
q21	4694	4381	4246	4246
q22	1064	1039	964	964
Total cold run time: 51200 ms
Total hot run time: 48899 ms

@doris-robot

TPC-DS: Total hot run time: 185625 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 465f76625dd02eac5e73f2d0f59a7836a46a2d0e, data reload: false

query1	988	392	413	392
query2	6559	1876	1893	1876
query3	6743	222	218	218
query4	26009	23726	22884	22884
query5	4377	646	483	483
query6	312	210	224	210
query7	4621	499	282	282
query8	263	222	212	212
query9	8630	2645	2678	2645
query10	473	322	269	269
query11	15703	15023	15279	15023
query12	186	116	111	111
query13	1666	518	410	410
query14	10051	6328	6239	6239
query15	213	194	177	177
query16	7660	633	451	451
query17	1222	721	606	606
query18	2022	408	310	310
query19	197	200	160	160
query20	120	111	119	111
query21	219	127	106	106
query22	4167	4185	4173	4173
query23	33726	32935	32862	32862
query24	8344	2388	2376	2376
query25	534	481	391	391
query26	1241	292	154	154
query27	2697	509	345	345
query28	4298	2138	2111	2111
query29	727	543	442	442
query30	283	222	202	202
query31	905	804	762	762
query32	74	63	63	63
query33	550	400	314	314
query34	790	839	546	546
query35	803	832	725	725
query36	931	977	873	873
query37	107	96	78	78
query38	4078	4027	4054	4027
query39	1496	1407	1410	1407
query40	210	115	102	102
query41	62	63	60	60
query42	124	111	110	110
query43	485	513	481	481
query44	1307	833	844	833
query45	176	178	164	164
query46	820	1024	625	625
query47	1754	1787	1722	1722
query48	381	423	307	307
query49	750	478	402	402
query50	635	681	415	415
query51	4142	4104	4204	4104
query52	111	107	101	101
query53	225	259	201	201
query54	573	573	514	514
query55	88	83	85	83
query56	310	296	289	289
query57	1198	1200	1131	1131
query58	267	254	255	254
query59	2627	2694	2610	2610
query60	332	324	292	292
query61	129	128	129	128
query62	825	731	664	664
query63	234	196	192	192
query64	4268	1028	657	657
query65	4271	4174	4163	4163
query66	1057	418	353	353
query67	15567	15603	15384	15384
query68	8793	876	529	529
query69	464	303	267	267
query70	1139	1111	1102	1102
query71	468	324	304	304
query72	5610	4640	4581	4581
query73	701	639	357	357
query74	9059	9136	8927	8927
query75	4174	3208	2681	2681
query76	3617	1194	745	745
query77	794	378	288	288
query78	10008	10296	9478	9478
query79	1923	859	579	579
query80	748	515	458	458
query81	488	264	226	226
query82	419	129	106	106
query83	294	245	236	236
query84	294	112	91	91
query85	779	352	309	309
query86	336	316	278	278
query87	4410	4434	4225	4225
query88	3319	2297	2281	2281
query89	392	317	289	289
query90	1953	211	204	204
query91	145	157	116	116
query92	77	63	56	56
query93	1215	929	579	579
query94	673	398	303	303
query95	379	292	287	287
query96	491	577	280	280
query97	2722	2776	2657	2657
query98	233	206	206	206
query99	1426	1408	1233	1233
Total cold run time: 274456 ms
Total hot run time: 185625 ms

@doris-robot

ClickBench: Total hot run time: 29.91 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 465f76625dd02eac5e73f2d0f59a7836a46a2d0e, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.07
query4	1.61	0.10	0.11
query5	0.43	0.41	0.40
query6	1.16	0.64	0.67
query7	0.03	0.02	0.01
query8	0.04	0.04	0.04
query9	0.58	0.50	0.52
query10	0.57	0.58	0.57
query11	0.15	0.11	0.11
query12	0.15	0.11	0.12
query13	0.63	0.60	0.61
query14	0.80	0.84	0.80
query15	0.90	0.86	0.86
query16	0.38	0.39	0.42
query17	1.04	1.09	1.07
query18	0.23	0.20	0.21
query19	1.93	1.79	1.88
query20	0.02	0.02	0.01
query21	15.39	0.88	0.54
query22	0.76	1.24	0.75
query23	14.82	1.39	0.63
query24	6.65	2.12	1.02
query25	0.54	0.19	0.10
query26	0.64	0.16	0.14
query27	0.06	0.05	0.06
query28	9.85	0.88	0.43
query29	12.53	3.93	3.30
query30	0.26	0.10	0.07
query31	2.86	0.59	0.39
query32	3.22	0.56	0.46
query33	3.04	3.11	3.14
query34	16.13	5.37	4.88
query35	4.89	4.89	4.85
query36	0.69	0.51	0.51
query37	0.09	0.08	0.07
query38	0.05	0.05	0.04
query39	0.03	0.02	0.03
query40	0.17	0.15	0.14
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.82 s
Total hot run time: 29.91 s

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33571 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7306c57982fd2e429a0edf02bf671d7ed3160f8b, data reload: false

------ Round 1 ----------------------------------
q1	17599	5177	5009	5009
q2	1924	290	192	192
q3	10286	1243	724	724
q4	10218	969	501	501
q5	7508	2809	2270	2270
q6	179	177	131	131
q7	896	744	608	608
q8	9293	1280	1052	1052
q9	6747	5119	5040	5040
q10	6871	2377	1959	1959
q11	487	292	268	268
q12	334	344	216	216
q13	17759	3620	3078	3078
q14	230	215	215	215
q15	563	473	488	473
q16	425	419	364	364
q17	576	839	361	361
q18	7466	7225	7056	7056
q19	1972	943	530	530
q20	326	337	220	220
q21	3541	3136	2349	2349
q22	1045	1021	955	955
Total cold run time: 106245 ms
Total hot run time: 33571 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5197	5057	5098	5057
q2	253	318	231	231
q3	2198	2639	2253	2253
q4	1308	1777	1337	1337
q5	4182	4150	4364	4150
q6	221	176	131	131
q7	1952	1930	1721	1721
q8	2592	2546	2542	2542
q9	7100	7064	7083	7064
q10	3023	3192	2826	2826
q11	571	494	482	482
q12	656	772	623	623
q13	3468	3886	3192	3192
q14	278	287	276	276
q15	531	482	466	466
q16	438	487	439	439
q17	1149	1536	1372	1372
q18	7581	7050	7036	7036
q19	737	707	754	707
q20	1899	1995	1816	1816
q21	4738	4294	4272	4272
q22	1047	1008	1000	1000
Total cold run time: 51119 ms
Total hot run time: 48993 ms

@doris-robot

TPC-DS: Total hot run time: 185912 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7306c57982fd2e429a0edf02bf671d7ed3160f8b, data reload: false

query1	980	384	404	384
query2	6553	1876	1833	1833
query3	6751	240	221	221
query4	25946	23635	23689	23635
query5	4895	652	474	474
query6	317	213	221	213
query7	4635	484	290	290
query8	271	220	208	208
query9	8603	2672	2662	2662
query10	524	335	282	282
query11	15675	14981	14763	14763
query12	179	115	117	115
query13	1657	547	400	400
query14	9712	6074	6171	6074
query15	201	189	167	167
query16	7651	614	477	477
query17	1157	738	590	590
query18	2022	414	302	302
query19	200	199	163	163
query20	117	127	115	115
query21	213	124	110	110
query22	4132	4284	4185	4185
query23	33754	33018	32979	32979
query24	8379	2391	2383	2383
query25	506	488	391	391
query26	721	261	149	149
query27	2680	495	341	341
query28	4283	2151	2127	2127
query29	677	557	431	431
query30	287	232	190	190
query31	931	839	767	767
query32	71	67	64	64
query33	548	364	314	314
query34	790	854	548	548
query35	770	854	735	735
query36	950	979	879	879
query37	116	100	76	76
query38	4126	4116	4012	4012
query39	1500	1401	1421	1401
query40	212	118	106	106
query41	63	59	60	59
query42	134	110	107	107
query43	518	510	455	455
query44	1303	827	821	821
query45	193	176	170	170
query46	846	1034	615	615
query47	1746	1779	1743	1743
query48	397	432	314	314
query49	707	483	398	398
query50	640	671	404	404
query51	4183	4154	4095	4095
query52	108	110	105	105
query53	225	248	188	188
query54	573	567	508	508
query55	97	84	90	84
query56	314	297	276	276
query57	1190	1193	1122	1122
query58	269	262	275	262
query59	2604	2792	2707	2707
query60	320	330	311	311
query61	127	130	129	129
query62	787	723	649	649
query63	222	183	185	183
query64	3158	994	667	667
query65	4256	4173	4201	4173
query66	873	425	365	365
query67	15892	15562	15257	15257
query68	8600	889	533	533
query69	474	300	279	279
query70	1198	1073	1070	1070
query71	474	314	309	309
query72	5263	4690	4725	4690
query73	729	617	356	356
query74	9142	8995	8716	8716
query75	3984	3162	2670	2670
query76	3655	1179	749	749
query77	799	386	283	283
query78	10189	10228	9400	9400
query79	2098	820	581	581
query80	579	505	461	461
query81	501	260	229	229
query82	451	127	102	102
query83	261	253	228	228
query84	249	105	93	93
query85	819	353	311	311
query86	388	290	304	290
query87	4343	4484	4347	4347
query88	3477	2365	2276	2276
query89	404	315	294	294
query90	1857	211	198	198
query91	141	144	113	113
query92	82	62	60	60
query93	1691	928	594	594
query94	657	397	286	286
query95	382	287	279	279
query96	497	561	281	281
query97	2683	2766	2613	2613
query98	226	205	205	205
query99	1313	1385	1293	1293
Total cold run time: 273043 ms
Total hot run time: 185912 ms

@doris-robot

ClickBench: Total hot run time: 29.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7306c57982fd2e429a0edf02bf671d7ed3160f8b, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.04
query3	0.24	0.07	0.07
query4	1.61	0.11	0.10
query5	0.42	0.45	0.41
query6	1.16	0.66	0.66
query7	0.03	0.01	0.02
query8	0.05	0.03	0.04
query9	0.58	0.53	0.52
query10	0.57	0.58	0.56
query11	0.15	0.11	0.12
query12	0.14	0.11	0.11
query13	0.63	0.61	0.61
query14	0.80	0.81	0.79
query15	0.89	0.87	0.88
query16	0.38	0.39	0.40
query17	1.10	1.09	1.04
query18	0.22	0.21	0.21
query19	1.91	1.81	1.87
query20	0.02	0.01	0.01
query21	15.39	0.89	0.52
query22	0.75	1.15	0.62
query23	15.08	1.37	0.63
query24	6.62	1.48	0.89
query25	0.45	0.14	0.21
query26	0.66	0.16	0.14
query27	0.07	0.05	0.05
query28	9.93	0.91	0.43
query29	12.54	3.94	3.27
query30	0.26	0.09	0.08
query31	2.82	0.59	0.39
query32	3.28	0.55	0.47
query33	3.00	3.05	3.29
query34	16.05	5.34	4.73
query35	4.79	4.84	4.80
query36	0.69	0.50	0.48
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.18	0.15	0.14
query41	0.09	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 103.88 s
Total hot run time: 29.4 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (4/4) 🎉
Increment coverage report
Complete coverage report

@morrySnow added the usercase, dev/2.1.x, dev/3.0.x, and dev/3.1.x labels Jun 24, 2025
@github-actions bot added the approved label Jun 24, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morrySnow morrySnow merged commit 81ac747 into apache:master Jun 24, 2025
30 of 31 checks passed
github-actions bot pushed a commit that referenced this pull request Jun 24, 2025
github-actions bot pushed a commit that referenced this pull request Jun 24, 2025
morrySnow pushed a commit that referenced this pull request Jun 24, 2025
…2159 (#52184)

Cherry-picked from #52159

Co-authored-by: yujun <yujun@selectdb.com>
dataroaring pushed a commit that referenced this pull request Jun 25, 2025
…2159 (#52183)

Cherry-picked from #52159

Co-authored-by: yujun <yujun@selectdb.com>
Hastyshell pushed a commit to Hastyshell/doris that referenced this pull request Jul 30, 2025
BiteTheDDDDt pushed a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Sep 1, 2025
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.11-merged dev/3.0.7-merged dev/3.1.0-merged reviewed usercase Important user case type label

8 participants