[Fix](Job)Replace BlockingWaitStrategy with LiteTimeoutBlockingWaitStrategy to avoid deadlock issues. #40625

Merged
morrySnow merged 6 commits into apache:master from CalvinKirs:master-disruptor
Sep 11, 2024

CalvinKirs
Member

Proposed changes

FYI https://issues.apache.org/jira/browse/LOG4J2-1221

  • BlockingWaitStrategy is a Disruptor wait strategy that makes consumer (event processor) threads block on a lock until the sequence they are waiting for has been published to the ring buffer.

A blocked consumer wakes up only when a publisher signals it. If that wake-up is ever missed, the consumer can stay blocked indefinitely even though events are available, which shows up as a deadlock.
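To make the failure mode concrete, here is a simplified Java model of a blocking wait built only on `java.util.concurrent`, not on the Disruptor classes. `BlockingWaitModel`, `waitFor`, and `publish` are hypothetical names; this is a sketch of the idea, not Disruptor's implementation. The consumer's untimed `await()` returns only after `publish` signals.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of a blocking wait strategy (NOT the Disruptor source):
// consumers sleep on a condition until the published cursor reaches the
// sequence they need, and rely entirely on the publisher's signal to wake up.
class BlockingWaitModel {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition published = lock.newCondition();
    // Highest sequence published so far; -1 means nothing has been published.
    private volatile long cursor = -1L;

    // Consumer side: block until `sequence` has been published.
    long waitFor(long sequence) throws InterruptedException {
        lock.lock();
        try {
            while (cursor < sequence) {
                // Untimed wait: if publish()'s signal were ever missed,
                // this call would never return.
                published.await();
            }
            return cursor;
        } finally {
            lock.unlock();
        }
    }

    // Producer side: make `sequence` visible, then wake every waiting consumer.
    void publish(long sequence) {
        cursor = sequence;
        lock.lock();
        try {
            published.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

This works as long as every `publish` reaches `signalAll()`; the model has no way to recover on its own if a wake-up is lost.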
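
Timeout Handling:

  • LiteTimeoutBlockingWaitStrategy bounds each wait with a timeout. If the requested sequence is not published within the timeout period, the waiting thread wakes up and re-checks instead of staying blocked indefinitely. The "Lite" variant also skips the lock and wake-up on the publishing side when no consumer is waiting.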
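A comparable sketch of the timeout variant follows. It is again a hypothetical, stdlib-only model (`LiteTimeoutWaitModel`), not Disruptor's `LiteTimeoutBlockingWaitStrategy` itself: each wait is capped with `awaitNanos`, and an `AtomicBoolean` lets the publisher skip the lock when no consumer has asked for a signal. The model throws `java.util.concurrent.TimeoutException`, whereas Disruptor uses its own `com.lmax.disruptor.TimeoutException`. In Disruptor the real strategy is created with `new LiteTimeoutBlockingWaitStrategy(timeout, TimeUnit)`; the timeout value Doris uses is not shown on this page.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of a "lite" timeout blocking wait (NOT the Disruptor source).
// Each wait is bounded by a timeout, and the publisher only takes the lock
// when a consumer has announced that it is waiting.
class LiteTimeoutWaitModel {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition published = lock.newCondition();
    private final AtomicBoolean signalNeeded = new AtomicBoolean(false);
    private final long timeoutNanos;
    private volatile long cursor = -1L;

    LiteTimeoutWaitModel(long timeout, TimeUnit unit) {
        this.timeoutNanos = unit.toNanos(timeout);
    }

    // Consumer side: wait at most `timeout` for `sequence` to be published.
    long waitFor(long sequence) throws InterruptedException, TimeoutException {
        long nanos = timeoutNanos;
        lock.lock();
        try {
            while (cursor < sequence) {
                if (nanos <= 0L) {
                    // Budget used up and the sequence is still missing.
                    throw new TimeoutException("sequence " + sequence + " not published in time");
                }
                signalNeeded.set(true);               // ask publishers for a wake-up
                nanos = published.awaitNanos(nanos);  // bounded wait, returns remaining time
            }
            return cursor;
        } finally {
            lock.unlock();
        }
    }

    // Producer side: publish, and pay for lock + signal only if someone is waiting.
    void publish(long sequence) {
        cursor = sequence;
        if (signalNeeded.getAndSet(false)) {
            lock.lock();
            try {
                published.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }
}
```

In this model the flag check in `publish` can race with a consumer that has just read the old cursor, so a signal may occasionally be skipped; the timeout bounds how long that costs, and the loop re-checks the cursor before giving up.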

Reduced Risk of Deadlocks:

  • Because no thread waits forever, a missed wake-up costs roughly one timeout period instead of hanging the consumer. The timeout lets the system recover when events are temporarily unavailable or a signal is lost.
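The recovery argument can be checked with a small stdlib-only demo (hypothetical class `MissedSignalDemo`, not Doris or Disruptor code). The publisher advances the cursor but never signals, simulating a lost notification. An untimed `await()` would never return here; waiting in bounded slices lets the consumer notice the new cursor on its next wake-up.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Demonstrates why bounding each wait helps: the publisher advances the cursor
// but "loses" its wake-up signal (simulated by never signalling at all).
class MissedSignalDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition published = lock.newCondition();
    private volatile long cursor = -1L;

    // Simulated lost notification: the cursor moves, but nobody is signalled.
    void publishWithoutSignal(long sequence) {
        cursor = sequence;
    }

    // Wait in bounded slices; returns how many slices expired before success.
    int waitWithTimeout(long sequence, long sliceMillis) throws InterruptedException {
        int expiredSlices = 0;
        lock.lock();
        try {
            while (cursor < sequence) {
                // await(...) returns false when the slice elapsed without a signal;
                // either way the loop re-checks the cursor.
                if (!published.await(sliceMillis, TimeUnit.MILLISECONDS)) {
                    expiredSlices++;
                }
            }
            return expiredSlices;
        } finally {
            lock.unlock();
        }
    }
}
```

The slice length trades latency for overhead: shorter timeouts recover faster from a missed signal but wake idle consumers more often.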

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38769 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 164b756e1c28d7fdd8b0bd1145a9a24826c4c25f, data reload: false

------ Round 1 ----------------------------------
query	run 1 (cold, ms)	run 2 (ms)	run 3 (ms)	hot (min of runs 2-3, ms)
q1	18171	4635	4455	4455
q2	2624	191	186	186
q3	12336	1138	1124	1124
q4	10504	871	659	659
q5	8036	2963	2863	2863
q6	231	139	138	138
q7	971	624	604	604
q8	9317	2072	2101	2072
q9	7082	6589	6640	6589
q10	7021	2264	2237	2237
q11	454	254	257	254
q12	402	225	231	225
q13	17772	3117	3092	3092
q14	290	252	258	252
q15	535	491	494	491
q16	517	446	435	435
q17	979	764	730	730
q18	7424	6988	6936	6936
q19	1400	1197	1043	1043
q20	687	323	337	323
q21	3975	3139	3054	3054
q22	1102	1007	1031	1007
Total cold run time: 111830 ms
Total hot run time: 38769 ms

----- Round 2, with runtime_filter_mode=off -----
query	run 1 (cold, ms)	run 2 (ms)	run 3 (ms)	hot (min of runs 2-3, ms)
q1	4427	4320	4364	4320
q2	390	277	280	277
q3	2944	2723	2696	2696
q4	1921	1647	1719	1647
q5	5485	5465	5450	5450
q6	217	129	131	129
q7	2112	1782	1797	1782
q8	3223	3392	3371	3371
q9	8562	8539	8506	8506
q10	3487	3270	3281	3270
q11	617	513	493	493
q12	807	605	609	605
q13	8964	3120	3120	3120
q14	306	277	281	277
q15	535	494	477	477
q16	513	473	480	473
q17	1805	1516	1481	1481
q18	7783	7448	7673	7448
q19	1707	1616	1607	1607
q20	2094	1860	1867	1860
q21	5438	5353	5306	5306
q22	1112	1022	1025	1022
Total cold run time: 64449 ms
Total hot run time: 55617 ms

@doris-robot

TPC-DS: Total hot run time: 194845 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 164b756e1c28d7fdd8b0bd1145a9a24826c4c25f, data reload: false

query	run 1 (cold, ms)	run 2 (ms)	run 3 (ms)	hot (min of runs 2-3, ms)
query1	942	383	387	383
query2	6524	1993	1933	1933
query3	6655	212	216	212
query4	34476	23374	23439	23374
query5	4203	542	513	513
query6	268	175	172	172
query7	4584	301	296	296
query8	295	232	233	232
query9	8644	2528	2552	2528
query10	449	274	269	269
query11	18068	15263	15217	15217
query12	160	104	102	102
query13	1643	395	375	375
query14	10137	7293	7478	7293
query15	262	185	181	181
query16	7664	405	449	405
query17	1640	577	546	546
query18	1354	289	289	289
query19	327	144	142	142
query20	123	109	111	109
query21	208	108	104	104
query22	4356	4427	4420	4420
query23	34542	33752	33790	33752
query24	11275	2869	2866	2866
query25	611	381	383	381
query26	1129	158	157	157
query27	2412	277	284	277
query28	7174	2053	2049	2049
query29	719	413	409	409
query30	300	162	154	154
query31	1000	767	747	747
query32	98	56	53	53
query33	753	281	296	281
query34	957	493	481	481
query35	866	714	714	714
query36	1129	951	947	947
query37	159	94	86	86
query38	3983	3852	3869	3852
query39	1473	1397	1430	1397
query40	199	115	115	115
query41	49	49	45	45
query42	115	95	94	94
query43	518	489	482	482
query44	1186	764	742	742
query45	196	169	164	164
query46	1118	742	787	742
query47	1927	1798	1832	1798
query48	382	313	306	306
query49	1064	470	459	459
query50	835	416	415	415
query51	7110	6847	6840	6840
query52	102	88	89	88
query53	258	192	191	191
query54	916	467	486	467
query55	82	76	77	76
query56	298	265	264	264
query57	1219	1100	1115	1100
query58	257	242	245	242
query59	3112	2941	2825	2825
query60	312	282	279	279
query61	127	124	126	124
query62	853	646	682	646
query63	230	190	197	190
query64	4436	793	746	746
query65	3277	3185	3209	3185
query66	1276	360	346	346
query67	15986	15780	15502	15502
query68	3112	867	849	849
query69	429	327	326	326
query70	1185	1174	1215	1174
query71	349	345	346	345
query72	6301	3732	3730	3730
query73	590	587	587	587
query74	9095	8927	8942	8927
query75	3176	3007	2940	2940
query76	1880	853	842	842
query77	463	398	402	398
query78	9829	9443	9325	9325
query79	903	875	844	844
query80	808	804	780	780
query81	452	261	268	261
query82	268	262	269	262
query83	194	193	200	193
query84	231	138	108	108
query85	655	392	382	382
query86	326	321	314	314
query87	4394	4319	4308	4308
query88	4436	4156	4118	4118
query89	370	370	364	364
query90	1466	320	317	317
query91	123	125	126	125
query92	79	73	71	71
query93	907	921	905	905
query94	593	354	381	354
query95	434	415	413	413
query96	474	472	471	471
query97	3124	3143	3131	3131
query98	238	228	267	228
query99	1410	1282	1320	1282
Total cold run time: 287296 ms
Total hot run time: 194845 ms

@doris-robot

ClickBench: Total hot run time: 31.71 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 164b756e1c28d7fdd8b0bd1145a9a24826c4c25f, data reload: false

query	run 1 (cold, s)	run 2 (s)	run 3 (s)
query1	0.04	0.04	0.04
query2	0.09	0.05	0.04
query3	0.22	0.05	0.05
query4	1.67	0.08	0.08
query5	0.49	0.50	0.49
query6	1.13	0.74	0.73
query7	0.01	0.01	0.02
query8	0.05	0.04	0.05
query9	0.55	0.52	0.50
query10	0.54	0.58	0.55
query11	0.15	0.11	0.11
query12	0.14	0.13	0.12
query13	0.60	0.60	0.59
query14	1.35	1.41	1.41
query15	0.84	0.83	0.83
query16	0.39	0.37	0.38
query17	1.01	1.02	1.00
query18	0.21	0.20	0.20
query19	1.96	1.79	1.92
query20	0.01	0.01	0.00
query21	15.42	0.66	0.66
query22	4.65	7.22	2.08
query23	18.27	1.33	1.23
query24	2.15	0.21	0.22
query25	0.14	0.07	0.08
query26	0.28	0.18	0.17
query27	0.08	0.07	0.07
query28	13.23	1.04	1.01
query29	12.60	3.37	3.34
query30	0.25	0.06	0.06
query31	2.86	0.39	0.40
query32	3.26	0.47	0.47
query33	3.00	3.05	3.05
query34	17.08	4.45	4.40
query35	4.52	4.48	4.47
query36	0.66	0.47	0.50
query37	0.19	0.16	0.16
query38	0.16	0.14	0.14
query39	0.05	0.04	0.04
query40	0.16	0.13	0.13
query41	0.10	0.05	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.65 s
Total hot run time: 31.71 s

…ror management.

Path Normalization: Added normalize() to standardize the input path, removing redundant elements to enhance security and consistency.
Path Separator Handling: Addressed Windows path separators by converting them to Unix style to ensure consistent URL formatting.
URL Encoding: Implemented URL encoding for the path to safely handle special characters and avoid issues in the URL format.
Exception Handling: Added general exception handling with a default return value to maintain system stability in case of errors.
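The commit message above lists four steps without showing the code. As an illustration only, assuming Java 10+ and that "normalize()" refers to `java.nio.file.Path.normalize()`, a helper combining the steps could look like this; `UrlPathSketch.toUrlPath` and its default-value parameter are hypothetical and not taken from the commit.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.StringJoiner;

// Hypothetical helper illustrating the four steps in the commit message;
// the commit's real code is not shown on this page.
class UrlPathSketch {
    static String toUrlPath(String rawPath, String defaultValue) {
        try {
            // Path separator handling: treat Windows '\' as '/' before normalizing,
            // because on Unix-like systems '\' is not a path separator.
            String unix = rawPath.replace('\\', '/');
            // Path normalization: collapse "." and ".." segments; convert back to
            // '/' in case the platform's toString() uses '\'.
            String normalized = Paths.get(unix).normalize().toString().replace('\\', '/');
            // URL encoding: encode each segment separately so '/' stays a separator.
            StringJoiner joined = new StringJoiner("/");
            for (String segment : normalized.split("/", -1)) {
                // URLEncoder does form encoding (space -> '+'); URL paths want %20.
                joined.add(URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20"));
            }
            return joined.toString();
        } catch (Exception e) {
            // Exception handling: fall back to a default instead of propagating.
            return defaultValue;
        }
    }
}
```

Encoding segment by segment keeps `/` intact; passing the whole path to `URLEncoder` would turn every separator into `%2F` and every space into `+`.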
@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38689 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9258c1e577eee4bcc2c6f32ee58970b1e73a0ab9, data reload: false

------ Round 1 ----------------------------------
query	run 1 (cold, ms)	run 2 (ms)	run 3 (ms)	hot (min of runs 2-3, ms)
q1	17626	4951	4330	4330
q2	2032	193	189	189
q3	11755	990	1153	990
q4	10508	742	812	742
q5	7755	2896	2834	2834
q6	232	141	139	139
q7	972	647	632	632
q8	9302	2138	2139	2138
q9	7297	6622	6616	6616
q10	7020	2288	2204	2204
q11	476	258	249	249
q12	414	230	232	230
q13	17760	3144	3149	3144
q14	285	244	236	236
q15	545	493	487	487
q16	535	451	444	444
q17	1005	734	747	734
q18	7369	6891	6998	6891
q19	1384	1125	1116	1116
q20	705	344	350	344
q21	4108	3206	3012	3012
q22	1131	1030	988	988
Total cold run time: 110216 ms
Total hot run time: 38689 ms

----- Round 2, with runtime_filter_mode=off -----
query	run 1 (cold, ms)	run 2 (ms)	run 3 (ms)	hot (min of runs 2-3, ms)
q1	4449	4490	4390	4390
q2	403	280	267	267
q3	2947	2680	2692	2680
q4	1971	1658	1748	1658
q5	5822	5872	5889	5872
q6	239	140	136	136
q7	2327	1885	1891	1885
q8	3359	3518	3587	3518
q9	9043	9075	8986	8986
q10	3780	3460	3396	3396
q11	612	531	514	514
q12	878	679	716	679
q13	14807	3456	3491	3456
q14	366	316	330	316
q15	576	508	521	508
q16	590	513	542	513
q17	1871	1618	1563	1563
q18	8793	8084	8196	8084
q19	2881	1673	1605	1605
q20	2236	1970	1987	1970
q21	5772	5755	5584	5584
q22	1207	1088	1036	1036
Total cold run time: 74929 ms
Total hot run time: 58616 ms

Contributor

@morningman left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Sep 11, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@morrySnow merged commit 087048f into apache:master Sep 11, 2024
25 of 28 checks passed
@morrySnow added the p0_b and usercase (Important user case type label) labels Sep 11, 2024
@CalvinKirs deleted the master-disruptor branch September 12, 2024 01:53
CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Sep 12, 2024
…rategy to avoid deadlock issues. (apache#40625)

(cherry picked from commit 087048f)
@gavinchou added the p0_c label Sep 12, 2024
gavinchou pushed a commit that referenced this pull request Sep 12, 2024
…rategy to avoid deadlock issues. (#40625)

yiguolei pushed a commit that referenced this pull request Sep 12, 2024
…ockingWaitStrategy to avoid deadlock issues. (#40625) (#40707)

(cherry picked from commit 087048f)
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.1.7-merged, dev/3.0.2-merged, p0_b, p0_c, reviewed, usercase (Important user case type label)

6 participants