[Improve](inverted_index) update clucene and improve array inverted index writer #32436

Merged
merged 11 commits into from
Mar 28, 2024

amorynan
Contributor

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

// EXPECT_TRUE(io::global_local_filesystem()->delete_directory(kTestDir).ok());
}

void test_string(std::string testname, Field* field) {
Contributor

warning: function 'test_string' exceeds recommended size/complexity thresholds [readability-function-size]

    void test_string(std::string testname, Field* field) {
         ^
Additional context

be/test/olap/rowset/segment_v2/inverted_index_array_test.cpp:101: 90 lines including whitespace and comments (threshold 80)

    void test_string(std::string testname, Field* field) {
         ^

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

Comment on lines +53 to +54
namespace doris {
namespace segment_v2 {
Contributor

warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]

Suggested change
- namespace doris {
- namespace segment_v2 {
+ namespace doris::segment_v2 {

be/test/olap/rowset/segment_v2/inverted_index_array_test.cpp:223:

- } // namespace segment_v2
- } // namespace doris
+ } // namespace doris
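
For readers unfamiliar with the check, the suggestion above relies on the C++17 nested namespace definition syntax. The following is a minimal sketch, not code from this PR; the `demo` namespaces and the `where()` helper are made up for illustration, and a C++17 compiler is assumed.

```cpp
#include <string>

// Pre-C++17 spelling: one block (and one closing brace) per namespace level.
namespace demo {
namespace segment_v2_old {
inline std::string where() { return "demo::segment_v2_old"; }
} // namespace segment_v2_old
} // namespace demo

// C++17 spelling of the kind modernize-concat-nested-namespaces proposes:
// a single block and a single closing brace.
namespace demo::segment_v2_new {
inline std::string where() { return "demo::segment_v2_new"; }
} // namespace demo::segment_v2_new
```

Both forms declare ordinary nested namespaces, so callers qualify names the same way (`demo::segment_v2_new::where()`); only the declaration syntax changes.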

EXPECT_TRUE(io::global_local_filesystem()->delete_directory(kTestDir).ok());
}

void test_string(std::string testname, Field* field) {
Contributor

warning: function 'test_string' exceeds recommended size/complexity thresholds [readability-function-size]

    void test_string(std::string testname, Field* field) {
         ^
Additional context

be/test/olap/rowset/segment_v2/inverted_index_array_test.cpp:111: 94 lines including whitespace and comments (threshold 80)

    void test_string(std::string testname, Field* field) {
         ^

}
RETURN_IF_ERROR(add_document());
Member

If add_null_document() has already been called, then calling add_document() afterwards will cause unexpected problems.

Contributor Author

so maybe we should not make this if branch?
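
One possible reading of the concern in this thread is that a null row should produce exactly one null document and must not fall through to the regular `add_document()` call. The sketch below illustrates that idea only. `SketchArrayIndexWriter` and all of its members are hypothetical stand-ins, not the Doris or CLucene API, and it does not show how the merged change actually resolved the comment.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an inverted index writer (illustration only).
class SketchArrayIndexWriter {
public:
    // Index one array row; a nullptr models a null row.
    void add_row(const std::vector<std::string>* elements) {
        if (elements == nullptr) {
            add_null_document();
            return; // early return: never also call add_document() for this row
        }
        pending_ = *elements;
        add_document();
    }

    int documents() const { return documents_; }
    int null_documents() const { return null_documents_; }

private:
    // A null row is recorded as a single (null) document.
    void add_null_document() {
        ++null_documents_;
        ++documents_;
    }

    // A non-null row is recorded as a single regular document.
    void add_document() {
        ++documents_;
        pending_.clear();
    }

    std::vector<std::string> pending_;
    int documents_ = 0;
    int null_documents_ = 0;
};
```

In this sketch every row contributes exactly one document, so row ids and document ids stay aligned whether or not the row is null.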

@amorynan
Contributor Author

run buildall

Member

@airborne12 airborne12 left a comment

LGTM

Contributor

PR approved by anyone and no changes requested.

Contributor

@zzzxl1993 zzzxl1993 left a comment

LGTM

@amorynan
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.69% (8823/24722)
Line Coverage: 27.43% (72235/263347)
Region Coverage: 26.66% (37474/140587)
Branch Coverage: 23.47% (19114/81444)
Coverage Report: http://coverage.selectdb-in.cc/coverage/fefabd15cbbfaca699dcdc34c5fa84c0c3f4ea80_fefabd15cbbfaca699dcdc34c5fa84c0c3f4ea80/report/index.html

@amorynan
Contributor Author

run p0

@amorynan
Contributor Author

run external

@amorynan
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.64% (8822/24752)
Line Coverage: 27.39% (72275/263870)
Region Coverage: 26.62% (37500/140894)
Branch Coverage: 23.44% (19133/81628)
Coverage Report: http://coverage.selectdb-in.cc/coverage/fefabd15cbbfaca699dcdc34c5fa84c0c3f4ea80_fefabd15cbbfaca699dcdc34c5fa84c0c3f4ea80/report/index.html

@amorynan
Contributor Author

run p0

@amorynan
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 37925 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 881cf28eb1f2cc4cfef274efa4c4174cd6090f34, data reload: false

------ Round 1 ----------------------------------
q1	17651	4964	4116	4116
q2	2122	164	159	159
q3	10589	1126	1187	1126
q4	10226	733	771	733
q5	7468	3040	2921	2921
q6	210	126	123	123
q7	1028	586	567	567
q8	9337	1994	1993	1993
q9	7127	6469	6494	6469
q10	8384	3406	3490	3406
q11	439	222	220	220
q12	427	202	202	202
q13	17824	2868	2865	2865
q14	230	201	207	201
q15	521	467	470	467
q16	487	375	372	372
q17	963	549	595	549
q18	7301	6633	6542	6542
q19	1633	1454	1417	1417
q20	543	260	264	260
q21	3672	3147	2916	2916
q22	336	302	301	301
Total cold run time: 108518 ms
Total hot run time: 37925 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4117	4064	4057	4057
q2	327	231	233	231
q3	3002	2863	2799	2799
q4	1861	1571	1541	1541
q5	5257	5282	5252	5252
q6	196	115	117	115
q7	2251	1839	1845	1839
q8	3172	3300	3280	3280
q9	8601	8570	8609	8570
q10	3715	3699	3657	3657
q11	545	440	443	440
q12	747	552	586	552
q13	16907	2872	2880	2872
q14	279	265	264	264
q15	511	465	479	465
q16	473	428	415	415
q17	1724	1493	1499	1493
q18	7504	7285	7071	7071
q19	1641	1509	1460	1460
q20	1921	1729	1741	1729
q21	4807	4866	4645	4645
q22	533	471	454	454
Total cold run time: 70091 ms
Total hot run time: 53201 ms
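
The figures in the tables above are consistent with each row listing one cold run, two hot runs, and the faster of the two hot runs, with the two totals being the sums of the first and last columns. That interpretation is inferred from the numbers, not taken from the tpch-tools scripts. A minimal sketch of the arithmetic, with a hypothetical `QueryTiming` struct and helper names:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical row layout: query name, one cold run, two hot runs (milliseconds).
struct QueryTiming {
    std::string query;
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

// Total cold run time: sum of the cold-run column.
long total_cold_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const auto& r : rows) total += r.cold_ms;
    return total;
}

// Total hot run time: sum of the smaller of the two hot runs per query.
long total_hot_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const auto& r : rows) total += std::min(r.hot1_ms, r.hot2_ms);
    return total;
}
```

For the Round 1 rows q2, q6 and q12 above (2122/164/159, 210/126/123, 427/202/202), this gives 2759 ms cold and 484 ms hot; summing all 22 rows the same way reproduces the reported 108518 ms and 37925 ms.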

@doris-robot

TPC-DS: Total hot run time: 181416 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 881cf28eb1f2cc4cfef274efa4c4174cd6090f34, data reload: false

query1	925	373	345	345
query2	7407	2007	2018	2007
query3	6942	219	206	206
query4	31224	20820	20736	20736
query5	4444	440	417	417
query6	268	186	179	179
query7	4633	297	293	293
query8	234	170	172	170
query9	9395	2333	2317	2317
query10	572	259	261	259
query11	17115	13996	14082	13996
query12	144	95	91	91
query13	1635	436	424	424
query14	11420	10308	11364	10308
query15	261	205	194	194
query16	8233	274	264	264
query17	1976	568	527	527
query18	2096	290	277	277
query19	325	155	155	155
query20	92	92	90	90
query21	200	133	133	133
query22	4596	4436	4404	4404
query23	31751	31104	31068	31068
query24	10321	2838	2828	2828
query25	574	376	367	367
query26	733	154	162	154
query27	2168	348	361	348
query28	5847	1931	1879	1879
query29	879	650	647	647
query30	309	156	152	152
query31	983	734	734	734
query32	97	59	59	59
query33	676	285	261	261
query34	862	483	506	483
query35	848	621	606	606
query36	1012	872	885	872
query37	113	80	87	80
query38	3586	3455	3426	3426
query39	1418	1402	1481	1402
query40	217	120	116	116
query41	55	50	49	49
query42	104	96	99	96
query43	486	455	454	454
query44	1121	737	719	719
query45	281	273	260	260
query46	1116	686	721	686
query47	1664	1611	1595	1595
query48	469	354	351	351
query49	1072	346	351	346
query50	767	388	383	383
query51	6678	6668	6649	6649
query52	110	100	91	91
query53	360	285	286	285
query54	328	260	260	260
query55	89	80	79	79
query56	261	249	235	235
query57	1081	1006	1009	1006
query58	242	215	221	215
query59	2806	2724	2560	2560
query60	316	259	261	259
query61	117	115	118	115
query62	603	401	422	401
query63	319	290	282	282
query64	5463	4024	3969	3969
query65	3031	2990	3042	2990
query66	897	374	359	359
query67	14832	14658	14331	14331
query68	7804	540	550	540
query69	650	395	398	395
query70	1281	1201	1122	1122
query71	490	289	286	286
query72	6602	2659	2509	2509
query73	757	318	320	318
query74	6974	6486	6590	6486
query75	4480	2805	2834	2805
query76	4494	871	980	871
query77	665	268	259	259
query78	10519	9669	9630	9630
query79	10266	532	551	532
query80	1975	409	418	409
query81	542	214	218	214
query82	795	198	208	198
query83	216	147	144	144
query84	287	80	78	78
query85	1284	335	317	317
query86	408	312	289	289
query87	3765	3576	3529	3529
query88	4788	2370	2365	2365
query89	496	379	375	375
query90	1912	180	179	179
query91	175	140	137	137
query92	62	49	50	49
query93	6994	514	502	502
query94	1185	183	177	177
query95	443	337	330	330
query96	619	264	281	264
query97	3064	2858	2900	2858
query98	230	216	203	203
query99	1124	747	761	747
Total cold run time: 306545 ms
Total hot run time: 181416 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 881cf28eb1f2cc4cfef274efa4c4174cd6090f34 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       22.3 seconds inserted 10000000 Rows, about 448K ops/s
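
The "about N MB/s" values in the stream load lines above appear to be bytes divided by seconds, expressed in units of 1024 * 1024 bytes and truncated to a whole number. This is inferred from the reported figures rather than from the load-test script, and `mib_per_second` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in whole MiB per second (truncated), from bytes and elapsed seconds.
std::int64_t mib_per_second(std::int64_t bytes, std::int64_t seconds) {
    assert(seconds > 0); // guard against division by zero
    return bytes / seconds / (1024 * 1024);
}
```

With the JSON figures above, `mib_per_second(2358488459LL, 18)` evaluates to 124, which matches the reported value; the ORC and Parquet lines give 17 and 25 the same way.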

@amorynan
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38643 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7bed466d9aa101bbd7d26937b3c66c4e00b4e0bd, data reload: false

------ Round 1 ----------------------------------
q1	17677	4321	4234	4234
q2	2563	180	171	171
q3	11324	1125	1216	1125
q4	10735	779	843	779
q5	7859	3046	3013	3013
q6	210	131	130	130
q7	1080	633	593	593
q8	9708	2068	2052	2052
q9	7211	6631	6627	6627
q10	8494	3421	3507	3421
q11	438	236	222	222
q12	388	211	202	202
q13	17812	2885	2893	2885
q14	239	203	210	203
q15	508	474	463	463
q16	484	370	369	369
q17	958	554	608	554
q18	7299	6787	6568	6568
q19	1734	1472	1477	1472
q20	568	269	255	255
q21	3648	3001	3008	3001
q22	350	313	304	304
Total cold run time: 111287 ms
Total hot run time: 38643 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4141	4112	4063	4063
q2	334	235	235	235
q3	2972	2820	2866	2820
q4	1857	1584	1630	1584
q5	5433	5401	5299	5299
q6	201	115	119	115
q7	2277	1854	1839	1839
q8	3176	3317	3320	3317
q9	8625	8624	8622	8622
q10	3694	3738	3709	3709
q11	555	441	442	441
q12	749	560	550	550
q13	16907	2890	2856	2856
q14	279	242	269	242
q15	498	458	450	450
q16	467	448	413	413
q17	1746	1499	1480	1480
q18	7533	7084	7205	7084
q19	1621	1522	1523	1522
q20	1896	1727	1695	1695
q21	4879	4782	4623	4623
q22	514	445	443	443
Total cold run time: 70354 ms
Total hot run time: 53402 ms

@doris-robot

TPC-DS: Total hot run time: 183465 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7bed466d9aa101bbd7d26937b3c66c4e00b4e0bd, data reload: false

query1	931	374	350	350
query2	7365	2047	2144	2047
query3	6716	213	218	213
query4	31062	20918	20915	20915
query5	4396	431	495	431
query6	272	184	179	179
query7	4640	299	308	299
query8	241	172	170	170
query9	9187	2350	2352	2350
query10	582	275	263	263
query11	14523	14241	14203	14203
query12	136	93	91	91
query13	1627	431	433	431
query14	12111	11632	11508	11508
query15	324	193	197	193
query16	8248	266	256	256
query17	2035	564	524	524
query18	2113	281	286	281
query19	338	149	158	149
query20	94	87	90	87
query21	206	132	127	127
query22	4749	4465	4434	4434
query23	31854	31305	31054	31054
query24	10562	2811	2845	2811
query25	599	372	383	372
query26	1369	159	159	159
query27	2972	356	366	356
query28	7608	1958	1910	1910
query29	884	658	640	640
query30	307	148	148	148
query31	962	755	735	735
query32	99	64	61	61
query33	779	269	271	269
query34	1005	484	493	484
query35	842	617	652	617
query36	984	873	900	873
query37	123	82	79	79
query38	3587	3461	3407	3407
query39	1439	1401	1367	1367
query40	219	119	114	114
query41	51	48	48	48
query42	107	98	99	98
query43	469	465	441	441
query44	1187	745	719	719
query45	265	269	269	269
query46	1108	737	714	714
query47	1685	1603	1624	1603
query48	471	363	370	363
query49	1105	340	337	337
query50	762	383	386	383
query51	6806	6721	6765	6721
query52	111	93	92	92
query53	350	282	277	277
query54	302	280	254	254
query55	90	81	84	81
query56	253	230	251	230
query57	1095	1010	1011	1010
query58	242	216	213	213
query59	2811	2757	2681	2681
query60	267	257	258	257
query61	127	114	119	114
query62	590	402	408	402
query63	302	283	282	282
query64	5269	3988	3988	3988
query65	3163	3005	3044	3005
query66	1457	391	356	356
query67	15294	14336	14493	14336
query68	6788	541	539	539
query69	644	391	404	391
query70	1218	1212	1174	1174
query71	533	292	300	292
query72	6914	2697	2523	2523
query73	739	317	319	317
query74	7046	6610	6595	6595
query75	4085	2848	2866	2848
query76	4990	871	833	833
query77	637	266	262	262
query78	10585	9673	9583	9583
query79	10287	528	510	510
query80	1816	385	416	385
query81	539	221	208	208
query82	890	218	209	209
query83	226	151	141	141
query84	283	80	83	80
query85	1499	334	320	320
query86	464	312	305	305
query87	3753	3571	3550	3550
query88	5179	2357	2367	2357
query89	507	366	359	359
query90	1970	178	178	178
query91	172	153	155	153
query92	59	47	51	47
query93	7352	519	500	500
query94	1236	185	180	180
query95	438	345	331	331
query96	597	281	274	274
query97	3070	2841	2860	2841
query98	231	219	197	197
query99	1251	755	752	752
Total cold run time: 309827 ms
Total hot run time: 183465 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 7bed466d9aa101bbd7d26937b3c66c4e00b4e0bd with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       20.5 seconds inserted 10000000 Rows, about 487K ops/s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.63% (8825/24765)
Line Coverage: 27.37% (72350/264296)
Region Coverage: 26.59% (37525/141113)
Branch Coverage: 23.40% (19145/81826)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7bed466d9aa101bbd7d26937b3c66c4e00b4e0bd_7bed466d9aa101bbd7d26937b3c66c4e00b4e0bd/report/index.html

Member

@airborne12 airborne12 left a comment

LGTM

@amorynan
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38461 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3be99854032bd4fb3ae33bf9b9d01a18e32f9a0e, data reload: false

------ Round 1 ----------------------------------
q1	17603	4287	4140	4140
q2	2114	161	153	153
q3	10822	1208	1253	1208
q4	10544	817	828	817
q5	7549	3102	3066	3066
q6	207	126	124	124
q7	1093	636	604	604
q8	9902	2080	2051	2051
q9	7317	6716	6680	6680
q10	8387	3430	3603	3430
q11	424	229	225	225
q12	373	200	192	192
q13	17793	2861	2864	2861
q14	242	198	201	198
q15	512	485	459	459
q16	462	371	370	370
q17	968	561	610	561
q18	7083	6540	6425	6425
q19	1589	1447	1500	1447
q20	545	253	255	253
q21	3638	2985	2894	2894
q22	359	305	303	303
Total cold run time: 109526 ms
Total hot run time: 38461 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4115	4065	4106	4065
q2	339	235	231	231
q3	3031	2845	2828	2828
q4	1839	1578	1547	1547
q5	5333	5311	5373	5311
q6	194	117	118	117
q7	2263	1866	1856	1856
q8	3156	3326	3326	3326
q9	8704	8687	8711	8687
q10	3819	3758	3819	3758
q11	543	435	433	433
q12	719	562	546	546
q13	16895	2851	2844	2844
q14	276	253	269	253
q15	501	466	464	464
q16	480	420	423	420
q17	1717	1497	1478	1478
q18	7461	7080	7189	7080
q19	1611	1551	1543	1543
q20	1923	1737	1707	1707
q21	4694	4717	4643	4643
q22	532	445	453	445
Total cold run time: 70145 ms
Total hot run time: 53582 ms

@doris-robot

TPC-DS: Total hot run time: 182664 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3be99854032bd4fb3ae33bf9b9d01a18e32f9a0e, data reload: false

query1	921	373	350	350
query2	6542	2017	1985	1985
query3	6714	214	213	213
query4	31659	21691	21351	21351
query5	4284	393	399	393
query6	261	190	187	187
query7	4652	309	287	287
query8	246	172	170	170
query9	9452	2342	2340	2340
query10	553	248	244	244
query11	14724	14330	14262	14262
query12	138	94	89	89
query13	1623	422	425	422
query14	10285	7976	7804	7804
query15	334	195	198	195
query16	8248	262	261	261
query17	2051	588	583	583
query18	2100	299	284	284
query19	344	163	158	158
query20	95	88	87	87
query21	205	138	134	134
query22	5029	4849	4870	4849
query23	34007	32873	32790	32790
query24	11806	2907	2833	2833
query25	542	384	394	384
query26	1360	157	158	157
query27	2898	343	351	343
query28	7577	1910	1900	1900
query29	860	624	635	624
query30	306	150	150	150
query31	975	727	730	727
query32	95	57	57	57
query33	766	255	251	251
query34	1053	501	496	496
query35	831	611	594	594
query36	1012	875	881	875
query37	100	66	69	66
query38	3604	3431	3419	3419
query39	1460	1469	1451	1451
query40	292	118	110	110
query41	57	52	49	49
query42	108	95	94	94
query43	482	468	479	468
query44	1195	737	743	737
query45	289	252	273	252
query46	1106	713	708	708
query47	1959	1883	1887	1883
query48	446	359	353	353
query49	1204	343	348	343
query50	754	373	380	373
query51	6657	6697	6587	6587
query52	113	91	95	91
query53	351	275	282	275
query54	316	253	247	247
query55	91	78	83	78
query56	248	236	224	224
query57	1252	1163	1159	1159
query58	243	212	218	212
query59	2854	2598	2689	2598
query60	266	247	255	247
query61	130	113	110	110
query62	645	461	448	448
query63	307	276	286	276
query64	7004	3976	4032	3976
query65	3148	3040	3072	3040
query66	1390	389	349	349
query67	15740	15184	15092	15092
query68	8690	538	525	525
query69	629	381	389	381
query70	1283	1159	1155	1155
query71	517	265	267	265
query72	6093	2721	2582	2582
query73	736	322	325	322
query74	7114	6389	6368	6368
query75	3705	2207	2177	2177
query76	5299	875	856	856
query77	639	253	255	253
query78	10938	10252	10167	10167
query79	10542	538	539	538
query80	1733	378	372	372
query81	504	218	213	213
query82	233	91	83	83
query83	218	141	143	141
query84	279	80	84	80
query85	1123	320	308	308
query86	392	301	312	301
query87	3751	3569	3541	3541
query88	5259	2334	2328	2328
query89	501	377	367	367
query90	2238	185	179	179
query91	172	137	138	137
query92	61	47	50	47
query93	6621	496	488	488
query94	1441	175	178	175
query95	441	332	336	332
query96	604	283	269	269
query97	2632	2508	2463	2463
query98	235	226	209	209
query99	1121	932	898	898
Total cold run time: 314103 ms
Total hot run time: 182664 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 3be99854032bd4fb3ae33bf9b9d01a18e32f9a0e with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.63% (8834/24797)
Line Coverage: 27.35% (72400/264677)
Region Coverage: 26.58% (37559/141295)
Branch Coverage: 23.39% (19157/81914)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3be99854032bd4fb3ae33bf9b9d01a18e32f9a0e_3be99854032bd4fb3ae33bf9b9d01a18e32f9a0e/report/index.html

@amorynan amorynan requested a review from xiaokang March 28, 2024 02:24
Contributor

@xiaokang xiaokang left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 28, 2024
Contributor

PR approved by at least one committer and no changes requested.

@xiaokang xiaokang merged commit fce88d7 into apache:master Mar 28, 2024
29 of 32 checks passed
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <hello-stephen@qq.com>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This rewriting pattern is supported for multi-table joins. The supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for varint type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a Hive table on cosn, Doris returns an empty result even though the table has data.
Iceberg on cosn is OK.
The reason is misuse of cosn's file system. According to cosn's documentation, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the Hive metastore's `getCatalog()` method.
    But this method only exists in Hive 3+, so it fails when using Hive 2.x.

    I temporarily removed this logic, because it is only used for Iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some of the P2 test cases were missing `order_qt`. And because the output format of the floating point
    type has changed, some results in the `out` files need to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol and connections are usually not actively disconnected, so evicting a bearer token from the cache unregisters its ConnectContext.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH

* [bugfix](cloud) few variable not initialized (#32868)

../../cloud/src/recycler/meta_checker.cpp
can cause uninitialised memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
Groovy, using a Flight SQL connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), reports the error: AGGREGATE clause must not contain analytic expressions. Executing the same query from Java with jdbc::arrow-flight-sql works without problems.
Groovy does not support printing the Arrow array type and throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two-phase read.
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer maybe may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support to get tables in materialized view when collecting table in plan

The table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we get the tables from the plan as follows, we get [table1, table3, table2]; mv1 is expanded to get its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In Fe:
Get: If Fe is cloud mode, get table version from meta service.
Update: Op drop/replace temp partition, commit transaction.

In meta service:
Add: create Index. init value is 1.
Remove: by recycler.
Update: commit/drop partition rpc, commit txn rpc. Atomic++.
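
The version lifecycle described above (initial value 1, atomic increment on each committing operation) can be pictured with a small in-memory sketch. This is only an illustration of the counter semantics; `SketchTableVersion` is a hypothetical class, and the real meta service storage and RPCs are not modeled here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical in-memory model of a per-table version counter.
class SketchTableVersion {
public:
    // Read the current version.
    std::int64_t get() const { return version_.load(); }

    // Atomically increment the version and return the new value, as would
    // happen on a commit txn or commit/drop partition in this sketch.
    std::int64_t bump() { return version_.fetch_add(1) + 1; }

private:
    std::atomic<std::int64_t> version_{1}; // initial value is 1
};
```

A reader can compare the value returned by `get()` with a previously seen version to tell whether the table has changed in between.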

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == for type comparison.
2. The null bitmap is resized according to the size of the ref column.

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy)fix row policy has been exist (#32880)

* [fix](pipeline) fix use error row desc when origin block clear (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support show of plsql procedure using select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` exception is thrown.

We need to write a separate Utils class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column name in different index. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
-  `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
after change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: yongjinhou <109586248+yongjinhou@users.noreply.github.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: lihangyu <15605149486@163.com>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Vallish Pai <vallishpai@gmail.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
yiguolei pushed a commit that referenced this pull request Apr 1, 2024
yiguolei pushed a commit that referenced this pull request Apr 10, 2024
Labels
approved Indicates a PR has been approved by one committer. reviewed
5 participants