[fix](routine load) add read lock to fix some concurrent bugs #39242

Merged
merged 1 commit into from
Aug 13, 2024

@sollhui sollhui commented Aug 12, 2024

Executing show routine load can throw the following exception:

java.util.ConcurrentModificationException: null
    at java.util.ArrayList.forEach(ArrayList.java:1252) ~[?:1.8.0_131]
    at org.apache.doris.load.routineload.RoutineLoadJob.getTasksShowInfo(RoutineLoadJob.java:1573) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.ShowExecutor.handleShowRoutineLoadTask(ShowExecutor.java:1643) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:327) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2643) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.executeByLegacy(StmtExecutor.java:955) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:599) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:525) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:328) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:206) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:272) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:300) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:358) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]

The cause is that show routine load traverses routineLoadTaskInfoList while another thread may be modifying it, for example when a transaction is committed at that moment.

Therefore, a thread-safe data structure is used to solve this problem.
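
For illustration only, here is a minimal sketch of the failure mode and of the read-lock approach named in the PR title. It is not the Doris code: the class `TaskListSketch`, its methods, and the string tasks are hypothetical stand-ins for `RoutineLoadJob` and `routineLoadTaskInfoList`. The helper `modifyDuringForEachThrows` reproduces the exception on a single thread by changing an `ArrayList` during `forEach`, which is the same check that fails when another thread changes the list mid-traversal.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a job that owns a mutable task list;
// the class and method names are illustrative, not the Doris ones.
class TaskListSketch {
    private final List<String> tasks = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Writer path: structural changes happen under the write lock.
    void addTask(String task) {
        lock.writeLock().lock();
        try {
            tasks.add(task);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void removeTask(String task) {
        lock.writeLock().lock();
        try {
            tasks.remove(task);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader path: traverse under the read lock, so no writer can
    // modify the list until the traversal has finished.
    List<String> showTasks() {
        lock.readLock().lock();
        try {
            List<String> rows = new ArrayList<>();
            tasks.forEach(task -> rows.add("task=" + task));
            return rows;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Single-threaded reproduction of the failure mode: ArrayList.forEach
    // detects a structural change made during the traversal and throws.
    static boolean modifyDuringForEachThrows() {
        List<String> unsafe = new ArrayList<>();
        unsafe.add("a");
        unsafe.add("b");
        try {
            unsafe.forEach(item -> unsafe.add("c"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unguarded forEach throws: " + modifyDuringForEachThrows());

        TaskListSketch job = new TaskListSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                job.addTask("t" + i);
            }
        });
        writer.start();
        for (int i = 0; i < 100; i++) {
            job.showTasks(); // safe to call while the writer is running
        }
        writer.join();
        System.out.println("tasks after writer finished: " + job.showTasks().size());
    }
}
```

The sketch only protects readers if every code path that mutates the list takes the write lock; a single unguarded `add` or `remove` would bring the exception back. A copy-on-write list would be another option, at the cost of copying the list on each change.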

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See the Doris documentation.

@sollhui
Contributor Author

sollhui commented Aug 12, 2024

run buildall

@github-actions github-actions bot added the doing label Aug 12, 2024
@doris-robot

TPC-H: Total hot run time: 39603 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 957653f2117b43011065d6297d20a842eca7b2b6, data reload: false

------ Round 1 ----------------------------------
q1	17608	4340	4239	4239
q2	2013	176	171	171
q3	10514	1164	1100	1100
q4	10148	710	733	710
q5	7506	2510	2474	2474
q6	232	147	141	141
q7	993	606	605	605
q8	9550	1945	1893	1893
q9	8636	6546	6571	6546
q10	7036	2110	2140	2110
q11	447	247	247	247
q12	390	222	223	222
q13	17760	3005	3004	3004
q14	271	236	229	229
q15	526	492	495	492
q16	511	384	393	384
q17	967	691	713	691
q18	8239	7462	7494	7462
q19	4481	1063	910	910
q20	665	325	327	325
q21	5565	4644	4643	4643
q22	1120	1011	1005	1005
Total cold run time: 115178 ms
Total hot run time: 39603 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4359	4249	4200	4200
q2	373	258	262	258
q3	2798	2616	2610	2610
q4	1862	1668	1608	1608
q5	5199	5274	5273	5273
q6	216	135	134	134
q7	2039	1682	1646	1646
q8	3134	3307	3306	3306
q9	8292	8331	8354	8331
q10	3357	3161	3143	3143
q11	595	511	487	487
q12	791	600	611	600
q13	17480	2977	2978	2977
q14	305	285	278	278
q15	519	495	475	475
q16	474	410	423	410
q17	1813	1439	1485	1439
q18	7745	7434	7534	7434
q19	1681	1506	1548	1506
q20	2008	1787	1788	1787
q21	5172	5108	5026	5026
q22	1119	1015	1031	1015
Total cold run time: 71331 ms
Total hot run time: 53943 ms

@doris-robot

TPC-DS: Total hot run time: 200149 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 957653f2117b43011065d6297d20a842eca7b2b6, data reload: false

query1	912	365	355	355
query2	6443	1961	1921	1921
query3	6650	205	215	205
query4	31405	23086	23084	23084
query5	4193	462	471	462
query6	281	173	168	168
query7	4574	291	283	283
query8	236	190	206	190
query9	8732	2492	2479	2479
query10	558	490	444	444
query11	17679	14964	14918	14918
query12	147	103	93	93
query13	1657	381	359	359
query14	9254	7642	7546	7546
query15	262	217	239	217
query16	7638	485	520	485
query17	1766	559	564	559
query18	1946	296	276	276
query19	203	149	140	140
query20	118	103	107	103
query21	204	106	100	100
query22	4215	3968	3987	3968
query23	34103	33191	32894	32894
query24	11940	2577	2636	2577
query25	683	392	395	392
query26	1743	156	152	152
query27	2901	280	287	280
query28	7379	2034	2042	2034
query29	1099	426	429	426
query30	310	148	143	143
query31	970	727	772	727
query32	97	57	55	55
query33	748	301	302	301
query34	915	462	465	462
query35	977	768	759	759
query36	1092	923	937	923
query37	269	89	78	78
query38	4258	4023	4145	4023
query39	1436	1358	1356	1356
query40	270	117	114	114
query41	48	46	44	44
query42	113	94	101	94
query43	509	480	479	479
query44	1231	731	741	731
query45	228	208	206	206
query46	1089	771	718	718
query47	1858	1779	1785	1779
query48	366	297	303	297
query49	1207	430	422	422
query50	799	405	418	405
query51	6744	6680	6690	6680
query52	98	91	90	90
query53	259	180	180	180
query54	1008	452	452	452
query55	76	82	76	76
query56	275	253	252	252
query57	1163	1049	1069	1049
query58	246	228	229	228
query59	2952	2845	2978	2845
query60	290	263	264	263
query61	120	118	217	118
query62	857	615	652	615
query63	215	187	176	176
query64	10489	2231	1757	1757
query65	3198	3116	3197	3116
query66	1380	329	323	323
query67	15469	14680	14885	14680
query68	7965	551	557	551
query69	479	402	382	382
query70	1204	1067	1137	1067
query71	546	270	268	268
query72	20139	16355	16484	16355
query73	831	327	323	323
query74	9014	8710	8798	8710
query75	5241	2685	2607	2607
query76	4947	996	1013	996
query77	765	300	301	300
query78	9849	9131	8953	8953
query79	8397	516	527	516
query80	992	536	487	487
query81	593	225	222	222
query82	701	134	133	133
query83	324	141	144	141
query84	271	75	74	74
query85	1320	267	267	267
query86	381	276	290	276
query87	4638	4469	4516	4469
query88	4170	2488	2467	2467
query89	516	290	295	290
query90	2052	196	196	196
query91	123	95	96	95
query92	58	51	49	49
query93	7017	536	526	526
query94	944	286	293	286
query95	341	264	263	263
query96	679	278	271	271
query97	3207	3041	3042	3041
query98	221	200	196	196
query99	1562	1246	1248	1246
Total cold run time: 332037 ms
Total hot run time: 200149 ms

@doris-robot

ClickBench: Total hot run time: 30.73 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 957653f2117b43011065d6297d20a842eca7b2b6, data reload: false

query1	0.05	0.05	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.69	0.08	0.08
query5	0.51	0.50	0.48
query6	1.15	0.73	0.73
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.48	0.48
query10	0.54	0.54	0.54
query11	0.15	0.11	0.12
query12	0.14	0.11	0.13
query13	0.60	0.61	0.58
query14	0.76	0.77	0.76
query15	0.85	0.81	0.81
query16	0.37	0.36	0.36
query17	1.03	0.99	0.99
query18	0.22	0.21	0.21
query19	1.89	1.73	1.72
query20	0.01	0.01	0.00
query21	15.43	0.74	0.64
query22	4.81	6.42	2.09
query23	18.29	1.32	1.29
query24	2.19	0.22	0.22
query25	0.15	0.08	0.08
query26	0.32	0.21	0.22
query27	0.45	0.23	0.22
query28	13.28	1.00	0.98
query29	12.58	3.31	3.34
query30	0.24	0.05	0.05
query31	2.91	0.38	0.39
query32	3.27	0.48	0.47
query33	2.87	2.95	2.89
query34	17.01	4.31	4.37
query35	4.42	4.44	4.39
query36	0.65	0.46	0.48
query37	0.19	0.15	0.16
query38	0.15	0.15	0.14
query39	0.05	0.04	0.05
query40	0.15	0.13	0.13
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 110.51 s
Total hot run time: 30.73 s

dataroaring
dataroaring previously approved these changes Aug 13, 2024
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 13, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Aug 13, 2024
@sollhui
Contributor Author

sollhui commented Aug 13, 2024

run buildall

@sollhui sollhui changed the title [fix](routine load) make task list thread safe to fix some concurrent bugs [fix](routine load) add read lock to fix some concurrent bugs Aug 13, 2024
@sollhui
Contributor Author

sollhui commented Aug 13, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39968 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit f473fddc9f53e646ee3526bd368977310b3083ed, data reload: false

------ Round 1 ----------------------------------
q1	17613	4529	4335	4335
q2	2027	186	192	186
q3	11626	1070	1073	1070
q4	10469	708	663	663
q5	7744	2816	2847	2816
q6	228	141	138	138
q7	964	600	597	597
q8	9338	2056	2055	2055
q9	8574	6533	6548	6533
q10	7008	2193	2256	2193
q11	458	252	252	252
q12	392	224	223	223
q13	18896	2984	3001	2984
q14	285	243	238	238
q15	533	477	490	477
q16	494	403	386	386
q17	972	721	738	721
q18	8237	7468	7333	7333
q19	6910	1046	913	913
q20	716	335	326	326
q21	5549	4522	4648	4522
q22	1124	1023	1007	1007
Total cold run time: 120157 ms
Total hot run time: 39968 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4577	4292	4266	4266
q2	396	271	268	268
q3	3020	2765	2795	2765
q4	1972	1670	1778	1670
q5	5595	5615	5595	5595
q6	243	151	141	141
q7	2141	1778	1782	1778
q8	3349	3495	3425	3425
q9	8768	8962	8744	8744
q10	3419	3248	3287	3248
q11	617	547	501	501
q12	804	666	650	650
q13	16012	3221	3104	3104
q14	309	287	298	287
q15	536	501	494	494
q16	504	436	453	436
q17	1854	1546	1487	1487
q18	8342	8131	8019	8019
q19	1753	1652	1669	1652
q20	2130	1910	1859	1859
q21	10447	5246	5215	5215
q22	1106	1067	1043	1043
Total cold run time: 77894 ms
Total hot run time: 56647 ms

@doris-robot

TPC-DS: Total hot run time: 189998 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f473fddc9f53e646ee3526bd368977310b3083ed, data reload: false

query1	1267	883	880	880
query2	6354	1899	1869	1869
query3	10596	3911	3713	3713
query4	58664	24962	23242	23242
query5	5788	512	493	493
query6	502	165	175	165
query7	6517	315	303	303
query8	304	208	198	198
query9	9402	2460	2445	2445
query10	543	260	258	258
query11	18775	15060	15270	15060
query12	153	102	103	102
query13	1624	381	380	380
query14	12818	8201	7427	7427
query15	243	165	187	165
query16	7787	541	499	499
query17	1142	571	562	562
query18	2127	299	307	299
query19	272	151	150	150
query20	123	106	108	106
query21	204	102	98	98
query22	4340	4239	4215	4215
query23	34247	33572	33232	33232
query24	5519	2887	2859	2859
query25	532	380	378	378
query26	688	164	168	164
query27	1805	273	277	273
query28	3700	2053	2051	2051
query29	697	412	409	409
query30	232	157	145	145
query31	904	761	728	728
query32	97	56	55	55
query33	491	286	278	278
query34	836	452	475	452
query35	809	714	729	714
query36	1067	916	914	914
query37	138	79	83	79
query38	4061	3861	3848	3848
query39	1421	1363	1370	1363
query40	197	147	119	119
query41	46	43	44	43
query42	127	101	97	97
query43	508	465	467	465
query44	1056	720	718	718
query45	201	166	160	160
query46	1084	739	714	714
query47	1842	1776	1755	1755
query48	371	302	301	301
query49	746	408	424	408
query50	796	406	404	404
query51	6743	6702	6741	6702
query52	106	93	89	89
query53	256	179	179	179
query54	561	447	447	447
query55	80	76	76	76
query56	272	255	244	244
query57	1145	1066	1054	1054
query58	212	222	229	222
query59	2875	2750	2569	2569
query60	298	267	268	267
query61	136	96	99	96
query62	754	655	657	655
query63	206	182	181	181
query64	3946	1770	1701	1701
query65	3226	3131	3146	3131
query66	658	329	330	329
query67	14995	14852	14733	14733
query68	4779	539	533	533
query69	440	268	264	264
query70	1161	1168	1084	1084
query71	391	269	272	269
query72	6958	2262	2083	2083
query73	767	344	325	325
query74	9191	8836	8825	8825
query75	3331	2698	2672	2672
query76	2598	991	1025	991
query77	546	299	312	299
query78	9528	9062	8900	8900
query79	1636	531	529	529
query80	1193	502	492	492
query81	552	222	220	220
query82	608	134	145	134
query83	240	146	145	145
query84	266	78	76	76
query85	1075	278	320	278
query86	458	301	309	301
query87	4428	4268	4201	4201
query88	3860	2471	2487	2471
query89	394	299	283	283
query90	1877	194	196	194
query91	123	98	98	98
query92	66	49	50	49
query93	2357	528	527	527
query94	878	303	279	279
query95	348	258	261	258
query96	603	275	269	269
query97	3268	3105	3075	3075
query98	225	209	201	201
query99	1541	1303	1243	1243
Total cold run time: 315918 ms
Total hot run time: 189998 ms

@doris-robot

ClickBench: Total hot run time: 30.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f473fddc9f53e646ee3526bd368977310b3083ed, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.04
query4	1.68	0.07	0.07
query5	0.50	0.48	0.49
query6	1.13	0.74	0.73
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.49	0.50
query10	0.55	0.54	0.53
query11	0.16	0.11	0.12
query12	0.16	0.12	0.12
query13	0.60	0.61	0.59
query14	0.75	0.80	0.79
query15	0.86	0.83	0.81
query16	0.37	0.36	0.38
query17	0.99	1.03	1.01
query18	0.22	0.22	0.22
query19	1.79	1.68	1.75
query20	0.02	0.01	0.01
query21	15.40	0.73	0.65
query22	4.29	7.68	1.36
query23	18.17	1.36	1.40
query24	2.05	0.26	0.22
query25	0.15	0.08	0.09
query26	0.29	0.21	0.21
query27	0.45	0.23	0.22
query28	13.20	1.03	1.01
query29	12.58	3.41	3.35
query30	0.23	0.05	0.05
query31	2.88	0.41	0.39
query32	3.26	0.47	0.46
query33	2.92	2.85	2.96
query34	17.45	4.36	4.41
query35	4.43	4.41	4.46
query36	0.65	0.47	0.49
query37	0.19	0.16	0.16
query38	0.16	0.15	0.15
query39	0.04	0.03	0.04
query40	0.15	0.13	0.12
query41	0.10	0.05	0.05
query42	0.06	0.05	0.06
query43	0.05	0.04	0.04
Total cold run time: 109.9 s
Total hot run time: 30.22 s

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 13, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@XuJianxu XuJianxu left a comment

LGTM

@liaoxin01 liaoxin01 merged commit 2d7683f into apache:master Aug 13, 2024
29 of 30 checks passed
wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Aug 14, 2024