Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](scanner) Fix deadlock when scanner submit failed #40495

Merged
merged 4 commits into from
Sep 10, 2024

Conversation

zhiqiang-hhhh
Copy link
Contributor

We have dead lock when submit scanner to scheduler failed.

pstack looks like

Thread 2012 (Thread 0x7f87363fb700 (LWP 4179707) "Pipe_normal [wo"):
#0  0x00007f8b8f3dc82d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8b8f3d5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x000055b20f333e7a in __gthread_mutex_lock (__mutex=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default
.h:749
#3  std::mutex::lock (this=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:100
#4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<optimized out>) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229
#5  doris::vectorized::ScannerContext::append_block_to_queue (this=<optimized out>, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:234
#6  0x000055b20f32c0f9 in doris::vectorized::ScannerScheduler::submit (this=<optimized out>, ctx=..., scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:209
#7  0x000055b20f3338fc in doris::vectorized::ScannerContext::submit_scan_task (this=this@entry=0x7f8733d96010, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:217
#8  0x000055b20f3346cd in doris::vectorized::ScannerContext::get_block_from_queue (this=0x7f8733d96010, state=<optimized out>, block=0x7f871f728de0, eos=0x7f871abce470, id=<optimized out>) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:290
#9  0x000055b214cb4f13 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block (this=<optimized out>, state=0x7f872f0eb400, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f871abce470) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.cpp:1292
#10 0x000055b2142b5772 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block_after_projects (this=0x80, state=0x0, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.h:363
#11 0x000055b2142e7880 in doris::pipeline::StatefulOperatorX<doris::pipeline::StreamingAggLocalState>::get_block (this=0x7f871f9bee00, state=0x7f872f0eb400, block=0x7f8716d49060, eos=0x7f87363f4937) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/operator.cpp:587

Deallock happens with following

Status ScannerContext::get_block_from_queue {
     std::unique_lock l(_transfer_lock);
     ...
     if (scan_task->is_eos()) {
     ...
     } else {
          // resubmit current running scanner to read the next block
         submit_scan_task(scan_task);
     }
}

ScannerContext::submit_scan_task(std::shared_ptr<ScanTask> scan_task) {
     _scanner_scheduler->submit(shared_from_this(), scan_task);
}

void ScannerScheduler::submit(std::shared_ptr<ScannerContext> ctx,
                              std::shared_ptr<ScanTask> scan_task) {
    ...
    if (auto ret = sumbit_task(); !ret) {
        scan_task->set_status(Status::InternalError(
                "Failed to submit scanner to scanner pool reason:" + std::string(ret.msg()) +
                "|type:" + std::to_string(type)));
        ctx->append_block_to_queue(scan_task);
        return;
    }
}

void ScannerContext::append_block_to_queue(std::shared_ptr<ScanTask> scan_task) {
    ...
    std::lock_guard<std::mutex> l(_transfer_lock);
    ...
}

Since mutex in cpp is not re-enterable, so the scanner thread will deadlock with itself.

This pr fix the problem by making ScannerScheduler::submit return a Status instead of doing append failed task to the ScannerContext. The caller itself will decide where resubmit the scanner or just abort the execution of the query.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@zhiqiang-hhhh
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.85% (9381/25460)
Line Coverage: 28.22% (77339/274044)
Region Coverage: 27.63% (39954/144594)
Branch Coverage: 24.27% (20335/83800)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b42cb56f4d47a027a1a7a3b48524a4bb570b1d8e_b42cb56f4d47a027a1a7a3b48524a4bb570b1d8e/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 37953 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b42cb56f4d47a027a1a7a3b48524a4bb570b1d8e, data reload: false

------ Round 1 ----------------------------------
q1	17619	4563	4300	4300
q2	2034	194	195	194
q3	11815	974	1089	974
q4	10518	708	803	708
q5	7765	2868	2781	2781
q6	226	140	139	139
q7	956	620	611	611
q8	9322	2115	2075	2075
q9	6943	6514	6552	6514
q10	6990	2270	2303	2270
q11	460	247	241	241
q12	398	220	228	220
q13	18895	3091	3071	3071
q14	279	240	240	240
q15	540	493	483	483
q16	529	425	422	422
q17	1002	735	685	685
q18	7364	6890	6927	6890
q19	1381	963	996	963
q20	682	341	327	327
q21	3960	3012	2834	2834
q22	1121	1011	1032	1011
Total cold run time: 110799 ms
Total hot run time: 37953 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4356	4324	4267	4267
q2	389	288	273	273
q3	2913	2691	2615	2615
q4	1965	1692	1642	1642
q5	5653	5665	5755	5665
q6	227	135	140	135
q7	2222	1828	1838	1828
q8	3341	3441	3447	3441
q9	8811	8791	8878	8791
q10	3590	3389	3429	3389
q11	610	506	512	506
q12	843	678	644	644
q13	14743	3238	3286	3238
q14	337	285	292	285
q15	518	487	486	486
q16	554	510	497	497
q17	1851	1565	1573	1565
q18	8299	8038	7829	7829
q19	1731	1478	1619	1478
q20	2158	1904	1904	1904
q21	5773	5522	5495	5495
q22	1164	1057	1036	1036
Total cold run time: 72048 ms
Total hot run time: 57009 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 192227 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b42cb56f4d47a027a1a7a3b48524a4bb570b1d8e, data reload: false

query1	1257	896	846	846
query2	6344	1971	1978	1971
query3	10769	4179	4043	4043
query4	60056	27306	23102	23102
query5	5408	503	504	503
query6	412	158	157	157
query7	5774	293	306	293
query8	319	220	228	220
query9	9141	2513	2492	2492
query10	495	285	261	261
query11	18286	14916	15499	14916
query12	159	103	109	103
query13	1617	403	398	398
query14	10110	7153	7358	7153
query15	226	174	182	174
query16	7525	450	482	450
query17	1117	573	588	573
query18	2042	288	326	288
query19	284	146	140	140
query20	114	106	106	106
query21	212	104	109	104
query22	4611	4601	4792	4601
query23	34188	33139	33268	33139
query24	5995	2934	2866	2866
query25	538	387	383	383
query26	698	147	150	147
query27	1809	273	275	273
query28	3734	2043	2015	2015
query29	692	401	402	401
query30	230	150	153	150
query31	921	754	728	728
query32	75	53	51	51
query33	427	283	292	283
query34	878	469	477	469
query35	817	742	709	709
query36	1060	935	931	931
query37	143	84	86	84
query38	3980	3894	3887	3887
query39	1452	1393	1403	1393
query40	196	118	119	118
query41	54	85	45	45
query42	111	94	99	94
query43	514	484	477	477
query44	1076	736	737	736
query45	192	165	158	158
query46	1082	733	721	721
query47	1933	1790	1842	1790
query48	359	283	286	283
query49	775	421	440	421
query50	822	402	401	401
query51	7044	6949	7054	6949
query52	97	88	90	88
query53	251	178	176	176
query54	549	437	444	437
query55	75	71	73	71
query56	267	254	253	253
query57	1189	1050	1071	1050
query58	216	237	226	226
query59	3051	2822	2880	2822
query60	286	260	261	260
query61	116	98	97	97
query62	751	645	648	645
query63	216	189	183	183
query64	2805	685	687	685
query65	3222	3158	3164	3158
query66	680	326	332	326
query67	15392	15337	15224	15224
query68	2977	578	583	578
query69	408	267	280	267
query70	1141	1034	1077	1034
query71	355	263	286	263
query72	5312	4035	4069	4035
query73	747	319	320	319
query74	9111	8826	8819	8819
query75	3335	2675	2733	2675
query76	1515	1045	1041	1041
query77	532	312	324	312
query78	9492	9178	8974	8974
query79	1036	539	529	529
query80	684	494	492	492
query81	454	230	236	230
query82	244	137	138	137
query83	171	149	148	148
query84	258	79	77	77
query85	669	328	335	328
query86	316	289	259	259
query87	4556	4312	4161	4161
query88	3033	2327	2325	2325
query89	395	294	293	293
query90	1898	187	190	187
query91	134	110	116	110
query92	59	56	50	50
query93	1079	530	537	530
query94	662	315	326	315
query95	322	284	264	264
query96	588	262	269	262
query97	3163	3056	3051	3051
query98	219	202	202	202
query99	1534	1263	1402	1263
Total cold run time: 303635 ms
Total hot run time: 192227 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b42cb56f4d47a027a1a7a3b48524a4bb570b1d8e, data reload: false

query1	0.04	0.05	0.04
query2	0.08	0.03	0.04
query3	0.24	0.05	0.05
query4	1.68	0.08	0.08
query5	0.53	0.50	0.51
query6	1.13	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.05	0.05
query9	0.54	0.48	0.49
query10	0.53	0.55	0.54
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.59	0.58	0.58
query14	1.40	1.43	1.41
query15	0.84	0.82	0.82
query16	0.37	0.37	0.38
query17	1.06	1.08	1.04
query18	0.21	0.21	0.20
query19	1.88	1.81	1.87
query20	0.01	0.00	0.01
query21	15.41	0.69	0.67
query22	3.94	7.69	1.67
query23	18.20	1.34	1.28
query24	2.09	0.23	0.21
query25	0.14	0.08	0.08
query26	0.26	0.18	0.18
query27	0.08	0.08	0.08
query28	13.26	1.02	0.99
query29	12.64	3.36	3.30
query30	0.24	0.05	0.06
query31	2.86	0.40	0.39
query32	3.27	0.48	0.48
query33	2.95	3.00	3.07
query34	16.86	4.43	4.44
query35	4.49	4.55	4.53
query36	0.64	0.46	0.47
query37	0.18	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.04	0.04
query40	0.16	0.13	0.13
query41	0.09	0.04	0.04
query42	0.05	0.05	0.05
query43	0.04	0.03	0.04
Total cold run time: 109.54 s
Total hot run time: 31.35 s

@@ -83,6 +83,8 @@ enum SortType : int;

enum TabletStorageType { STORAGE_TYPE_LOCAL, STORAGE_TYPE_REMOTE, STORAGE_TYPE_REMOTE_AND_LOCAL };

std::string to_string(const TabletStorageType&);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method could be a static method

@@ -139,10 +139,12 @@ class ScannerContext : public std::enable_shared_from_this<ScannerContext>,
// set the next scanned block to `ScanTask::current_block`
// set the error state to `ScanTask::status`
// set the `eos` to `ScanTask::eos` if there is no more data in current scanner
void submit_scan_task(std::shared_ptr<ScanTask> scan_task);
[[nodiscard]] Status submit_scan_task(std::shared_ptr<ScanTask> scan_task);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not need discard, because status class is always nodiscard

auto scan_task_ptr = std::make_shared<ScanTask>(next_scanner);
Status submit_status = submit_scan_task(scan_task_ptr);
if (!submit_status.ok()) {
append_block_to_queue_locked(scan_task_ptr, l);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just return error if submit failed

Copy link
Contributor

@yiguolei yiguolei Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also should update the scanner context's status
if (!_process_status.ok()) {
_set_scanner_done();
return _process_status;
}

submit_scan_task(scan_task);
Status submit_status = submit_scan_task(scan_task);
if (!submit_status.ok()) {
append_block_to_queue_locked(scan_task, l);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not need

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set process status to abnormal and return the error status

@zhiqiang-hhhh
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 38608 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3f0b67f8373235035345b44ff22f9d46b4be8643, data reload: false

------ Round 1 ----------------------------------
q1	17686	4684	4456	4456
q2	2389	187	180	180
q3	10457	1133	1065	1065
q4	10144	703	722	703
q5	7740	2934	2876	2876
q6	225	141	143	141
q7	987	650	603	603
q8	9349	2085	2101	2085
q9	6837	6564	6589	6564
q10	7007	2268	2180	2180
q11	469	261	244	244
q12	396	228	225	225
q13	17763	3119	3090	3090
q14	286	254	246	246
q15	541	484	492	484
q16	526	419	421	419
q17	983	733	737	733
q18	7478	6831	6908	6831
q19	1402	1072	1022	1022
q20	692	343	320	320
q21	4348	3188	3141	3141
q22	1107	1000	1011	1000
Total cold run time: 108812 ms
Total hot run time: 38608 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4392	4308	4328	4308
q2	379	273	266	266
q3	2899	2705	2684	2684
q4	1927	1675	1652	1652
q5	5496	5430	5455	5430
q6	218	130	134	130
q7	2106	1743	1751	1743
q8	3252	3382	3390	3382
q9	8491	8495	8540	8495
q10	3478	3218	3185	3185
q11	628	526	508	508
q12	782	619	627	619
q13	12706	3135	3088	3088
q14	312	272	277	272
q15	516	469	480	469
q16	531	488	489	488
q17	1809	1490	1493	1490
q18	7734	7465	7463	7463
q19	1708	1594	1607	1594
q20	2094	1840	1809	1809
q21	5628	5263	5159	5159
q22	1122	1027	1020	1020
Total cold run time: 68208 ms
Total hot run time: 55254 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 192673 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3f0b67f8373235035345b44ff22f9d46b4be8643, data reload: false

query1	927	382	372	372
query2	6465	1982	1843	1843
query3	6641	207	217	207
query4	27779	23376	23185	23185
query5	4240	507	491	491
query6	257	160	170	160
query7	4592	297	293	293
query8	288	234	230	230
query9	8445	2521	2495	2495
query10	465	289	286	286
query11	16143	15162	15146	15146
query12	158	102	101	101
query13	1648	395	372	372
query14	9668	7396	7900	7396
query15	314	174	164	164
query16	7363	445	469	445
query17	1689	562	567	562
query18	1278	287	289	287
query19	327	154	163	154
query20	125	108	115	108
query21	215	103	100	100
query22	4565	4094	4115	4094
query23	34273	33469	33445	33445
query24	11145	2855	2871	2855
query25	611	379	403	379
query26	1124	155	152	152
query27	2285	277	276	276
query28	6861	2060	2034	2034
query29	677	403	410	403
query30	311	151	149	149
query31	980	782	779	779
query32	102	56	56	56
query33	794	299	285	285
query34	958	494	498	494
query35	881	708	722	708
query36	1093	892	935	892
query37	159	86	82	82
query38	3923	3893	3854	3854
query39	1441	1387	1456	1387
query40	198	117	115	115
query41	48	48	45	45
query42	121	96	101	96
query43	511	471	457	457
query44	1241	760	741	741
query45	198	169	170	169
query46	1110	763	743	743
query47	1878	1780	1797	1780
query48	389	306	296	296
query49	1030	437	436	436
query50	843	415	424	415
query51	6921	6948	6859	6859
query52	99	87	87	87
query53	263	185	185	185
query54	967	455	464	455
query55	80	76	75	75
query56	275	265	249	249
query57	1187	1072	1044	1044
query58	238	235	232	232
query59	3003	2767	2799	2767
query60	302	270	280	270
query61	107	100	105	100
query62	816	643	653	643
query63	220	191	186	186
query64	4052	731	657	657
query65	3252	3148	3148	3148
query66	788	336	341	336
query67	15865	15233	15146	15146
query68	3090	867	851	851
query69	426	336	340	336
query70	1090	1183	1156	1156
query71	347	354	348	348
query72	4939	3460	3441	3441
query73	597	585	594	585
query74	9089	8829	8832	8829
query75	3141	3006	2978	2978
query76	1819	859	855	855
query77	460	409	418	409
query78	9425	9220	9183	9183
query79	906	884	861	861
query80	841	897	806	806
query81	461	258	259	258
query82	266	272	262	262
query83	193	191	195	191
query84	231	109	107	107
query85	643	400	381	381
query86	303	326	322	322
query87	4384	4407	4321	4321
query88	4350	4098	4125	4098
query89	382	370	366	366
query90	1473	316	308	308
query91	124	124	131	124
query92	79	77	77	77
query93	915	953	934	934
query94	555	354	374	354
query95	482	413	416	413
query96	475	474	471	471
query97	3117	3106	3092	3092
query98	229	230	222	222
query99	1378	1281	1293	1281
Total cold run time: 273422 ms
Total hot run time: 192673 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.85% (9392/25485)
Line Coverage: 28.24% (77460/274275)
Region Coverage: 27.64% (39997/144682)
Branch Coverage: 24.27% (20345/83830)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3f0b67f8373235035345b44ff22f9d46b4be8643_3f0b67f8373235035345b44ff22f9d46b4be8643/report/index.html

@doris-robot
Copy link

ClickBench: Total hot run time: 31.77 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3f0b67f8373235035345b44ff22f9d46b4be8643, data reload: false

query1	0.05	0.05	0.04
query2	0.08	0.05	0.04
query3	0.23	0.05	0.05
query4	1.68	0.08	0.07
query5	0.53	0.50	0.48
query6	1.12	0.74	0.73
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.55	0.50	0.49
query10	0.54	0.55	0.56
query11	0.15	0.12	0.12
query12	0.14	0.12	0.12
query13	0.60	0.59	0.60
query14	1.44	1.42	1.47
query15	0.84	0.82	0.82
query16	0.38	0.36	0.38
query17	0.97	0.99	1.03
query18	0.20	0.20	0.20
query19	1.83	1.76	1.71
query20	0.02	0.01	0.01
query21	15.42	0.69	0.67
query22	4.12	6.76	2.28
query23	18.28	1.34	1.25
query24	2.10	0.22	0.22
query25	0.16	0.08	0.08
query26	0.26	0.18	0.19
query27	0.08	0.08	0.07
query28	13.27	1.02	1.01
query29	12.64	3.35	3.33
query30	0.25	0.06	0.05
query31	2.86	0.39	0.40
query32	3.26	0.49	0.48
query33	2.99	3.00	3.03
query34	17.24	4.44	4.38
query35	4.43	4.48	4.45
query36	0.66	0.48	0.52
query37	0.18	0.15	0.16
query38	0.15	0.15	0.15
query39	0.05	0.04	0.04
query40	0.16	0.13	0.12
query41	0.09	0.05	0.05
query42	0.05	0.06	0.05
query43	0.06	0.04	0.04
Total cold run time: 110.18 s
Total hot run time: 31.77 s

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 9, 2024
Copy link
Contributor

github-actions bot commented Sep 9, 2024

PR approved by at least one committer and no changes requested.

Copy link
Contributor

github-actions bot commented Sep 9, 2024

PR approved by anyone and no changes requested.

@yiguolei
Copy link
Contributor

yiguolei commented Sep 9, 2024

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 36.82% (9382/25478)
Line Coverage: 28.21% (77350/274182)
Region Coverage: 27.62% (39948/144658)
Branch Coverage: 24.24% (20321/83818)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7fcfcf35dd64986989944d68b178a5a9e49cbd03_7fcfcf35dd64986989944d68b178a5a9e49cbd03/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 37835 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7fcfcf35dd64986989944d68b178a5a9e49cbd03, data reload: false

------ Round 1 ----------------------------------
q1	17696	5266	4279	4279
q2	2022	193	189	189
q3	11653	992	1172	992
q4	10477	677	844	677
q5	7752	2824	2807	2807
q6	227	139	136	136
q7	960	622	616	616
q8	9313	2045	2069	2045
q9	7092	6559	6573	6559
q10	7046	2166	2182	2166
q11	461	246	238	238
q12	393	227	227	227
q13	17782	3141	3097	3097
q14	279	246	236	236
q15	529	486	480	480
q16	544	426	443	426
q17	985	732	627	627
q18	7332	6836	6897	6836
q19	1398	1045	992	992
q20	672	335	332	332
q21	4044	2867	3107	2867
q22	1148	1042	1011	1011
Total cold run time: 109805 ms
Total hot run time: 37835 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4348	4294	4318	4294
q2	386	273	277	273
q3	2864	2638	2732	2638
q4	1939	1664	1614	1614
q5	5628	5698	5734	5698
q6	228	132	135	132
q7	2203	1833	1843	1833
q8	3365	3388	3432	3388
q9	8840	8804	8812	8804
q10	3588	3475	3378	3378
q11	615	541	520	520
q12	833	653	649	649
q13	14405	3308	3272	3272
q14	321	286	287	286
q15	537	477	499	477
q16	528	496	503	496
q17	1827	1567	1572	1567
q18	8273	7940	8111	7940
q19	1713	1486	1522	1486
q20	2176	1911	1914	1911
q21	5803	5506	5340	5340
q22	1159	1119	1087	1087
Total cold run time: 71579 ms
Total hot run time: 57083 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 193218 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7fcfcf35dd64986989944d68b178a5a9e49cbd03, data reload: false

query1	1272	873	875	873
query2	6339	1984	1967	1967
query3	10736	4144	4004	4004
query4	59274	25737	23338	23338
query5	5379	517	511	511
query6	409	160	157	157
query7	5780	304	299	299
query8	324	239	227	227
query9	8792	2522	2502	2502
query10	478	275	262	262
query11	18133	15013	15439	15013
query12	152	106	101	101
query13	1518	390	377	377
query14	11250	7590	7661	7590
query15	235	176	187	176
query16	7539	499	484	484
query17	1111	593	573	573
query18	2052	301	316	301
query19	290	158	155	155
query20	122	115	131	115
query21	208	106	105	105
query22	4567	4436	4426	4426
query23	34298	33466	33343	33343
query24	5988	2808	2839	2808
query25	554	413	400	400
query26	696	156	158	156
query27	1768	280	281	280
query28	3595	2054	2027	2027
query29	711	424	429	424
query30	232	155	154	154
query31	949	772	782	772
query32	82	56	57	56
query33	469	297	284	284
query34	890	483	481	481
query35	828	744	712	712
query36	1089	906	935	906
query37	155	88	90	88
query38	3947	3886	3880	3880
query39	1441	1406	1391	1391
query40	198	116	118	116
query41	50	52	47	47
query42	128	96	95	95
query43	501	482	495	482
query44	1088	759	750	750
query45	197	165	173	165
query46	1109	751	749	749
query47	1918	1806	1788	1788
query48	361	288	300	288
query49	787	455	454	454
query50	834	422	425	422
query51	7064	6883	6887	6883
query52	97	88	88	88
query53	249	184	185	184
query54	569	475	467	467
query55	79	75	76	75
query56	284	267	267	267
query57	1232	1061	1074	1061
query58	233	241	230	230
query59	3143	2941	2909	2909
query60	293	268	268	268
query61	104	96	101	96
query62	746	673	651	651
query63	224	181	191	181
query64	2734	700	682	682
query65	3209	3107	3154	3107
query66	623	343	360	343
query67	15470	15189	15186	15186
query68	4362	574	569	569
query69	432	275	274	274
query70	1203	1046	1080	1046
query71	429	285	282	282
query72	6582	4063	3982	3982
query73	764	324	328	324
query74	9262	8711	8891	8711
query75	3382	2686	2730	2686
query76	1776	1047	1002	1002
query77	595	331	332	331
query78	9855	8984	9039	8984
query79	1091	546	526	526
query80	816	521	513	513
query81	564	231	228	228
query82	259	140	140	140
query83	178	150	150	150
query84	267	74	74	74
query85	834	335	285	285
query86	358	308	302	302
query87	4549	4253	4283	4253
query88	3032	2278	2286	2278
query89	384	284	287	284
query90	2026	194	195	194
query91	123	96	105	96
query92	58	50	51	50
query93	1505	543	543	543
query94	760	299	298	298
query95	340	258	255	255
query96	592	266	265	265
query97	3199	3062	3109	3062
query98	222	203	198	198
query99	1563	1281	1334	1281
Total cold run time: 308538 ms
Total hot run time: 193218 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7fcfcf35dd64986989944d68b178a5a9e49cbd03, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.04	0.04
query3	0.23	0.04	0.05
query4	1.68	0.08	0.07
query5	0.51	0.49	0.48
query6	1.12	0.73	0.72
query7	0.02	0.01	0.01
query8	0.06	0.05	0.05
query9	0.55	0.50	0.50
query10	0.54	0.56	0.56
query11	0.15	0.12	0.11
query12	0.14	0.13	0.12
query13	0.60	0.59	0.58
query14	1.37	1.41	1.38
query15	0.85	0.83	0.81
query16	0.37	0.35	0.37
query17	1.01	1.04	1.04
query18	0.21	0.20	0.20
query19	1.89	1.69	1.79
query20	0.01	0.01	0.01
query21	15.41	0.66	0.65
query22	4.28	6.87	2.38
query23	18.31	1.48	1.36
query24	2.13	0.22	0.21
query25	0.15	0.08	0.07
query26	0.28	0.18	0.19
query27	0.08	0.08	0.08
query28	13.31	1.01	0.99
query29	12.65	3.30	3.32
query30	0.24	0.06	0.06
query31	2.86	0.39	0.40
query32	3.26	0.47	0.49
query33	2.99	3.06	3.02
query34	17.04	4.43	4.41
query35	4.57	4.51	4.46
query36	0.65	0.48	0.47
query37	0.19	0.15	0.15
query38	0.17	0.14	0.14
query39	0.05	0.04	0.04
query40	0.16	0.13	0.13
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.03
Total cold run time: 110.43 s
Total hot run time: 31.89 s

@zhiqiang-hhhh
Copy link
Contributor Author

run cloud p0

@yiguolei yiguolei merged commit 5f1fe81 into apache:master Sep 10, 2024
22 of 28 checks passed
@yiguolei yiguolei added the p0_b label Sep 10, 2024
zhiqiang-hhhh added a commit to zhiqiang-hhhh/doris that referenced this pull request Sep 10, 2024
We have dead lock when submit scanner to scheduler failed.

pstack looks like
```txt
Thread 2012 (Thread 0x7f87363fb700 (LWP 4179707) "Pipe_normal [wo"):

.h:749

```
Deallock happens with following
```cpp
Status ScannerContext::get_block_from_queue {
     std::unique_lock l(_transfer_lock);
     ...
     if (scan_task->is_eos()) {
     ...
     } else {
          // resubmit current running scanner to read the next block
         submit_scan_task(scan_task);
     }
}

ScannerContext::submit_scan_task(std::shared_ptr<ScanTask> scan_task) {
     _scanner_scheduler->submit(shared_from_this(), scan_task);
}

void ScannerScheduler::submit(std::shared_ptr<ScannerContext> ctx,
                              std::shared_ptr<ScanTask> scan_task) {
    ...
    if (auto ret = sumbit_task(); !ret) {
        scan_task->set_status(Status::InternalError(
                "Failed to submit scanner to scanner pool reason:" + std::string(ret.msg()) +
                "|type:" + std::to_string(type)));
        ctx->append_block_to_queue(scan_task);
        return;
    }
}

void ScannerContext::append_block_to_queue(std::shared_ptr<ScanTask> scan_task) {
    ...
    std::lock_guard<std::mutex> l(_transfer_lock);
    ...
}
```
Since mutex in cpp is not re-enterable, so the scanner thread will
deadlock with itself.

This pr fix the problem by making `ScannerScheduler::submit` return a
Status instead of doing append failed task to the ScannerContext. The
caller itself will decide where resubmit the scanner or just abort the
execution of the query.
@zhiqiang-hhhh zhiqiang-hhhh deleted the fix-scanner-deadlock branch September 10, 2024 16:35
yiguolei pushed a commit that referenced this pull request Sep 11, 2024
yiguolei pushed a commit that referenced this pull request Sep 11, 2024
We have dead lock when submit scanner to scheduler failed.

pstack looks like
```txt
Thread 2012 (Thread 0x7f87363fb700 (LWP 4179707) "Pipe_normal [wo"):
#0  0x00007f8b8f3dc82d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8b8f3d5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x000055b20f333e7a in __gthread_mutex_lock (__mutex=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default
.h:749
#3  std::mutex::lock (this=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:100
#4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<optimized out>) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229
#5  doris::vectorized::ScannerContext::append_block_to_queue (this=<optimized out>, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:234
#6  0x000055b20f32c0f9 in doris::vectorized::ScannerScheduler::submit (this=<optimized out>, ctx=..., scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:209
#7  0x000055b20f3338fc in doris::vectorized::ScannerContext::submit_scan_task (this=this@entry=0x7f8733d96010, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:217
#8  0x000055b20f3346cd in doris::vectorized::ScannerContext::get_block_from_queue (this=0x7f8733d96010, state=<optimized out>, block=0x7f871f728de0, eos=0x7f871abce470, id=<optimized out>) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:290
#9  0x000055b214cb4f13 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block (this=<optimized out>, state=0x7f872f0eb400, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f871abce470) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.cpp:1292
#10 0x000055b2142b5772 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block_after_projects (this=0x80, state=0x0, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.h:363
#11 0x000055b2142e7880 in doris::pipeline::StatefulOperatorX<doris::pipeline::StreamingAggLocalState>::get_block (this=0x7f871f9bee00, state=0x7f872f0eb400, block=0x7f8716d49060, eos=0x7f87363f4937) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/operator.cpp:587
```
Deallock happens with following
```cpp
Status ScannerContext::get_block_from_queue {
     std::unique_lock l(_transfer_lock);
     ...
     if (scan_task->is_eos()) {
     ...
     } else {
          // resubmit current running scanner to read the next block
         submit_scan_task(scan_task);
     }
}

ScannerContext::submit_scan_task(std::shared_ptr<ScanTask> scan_task) {
     _scanner_scheduler->submit(shared_from_this(), scan_task);
}

void ScannerScheduler::submit(std::shared_ptr<ScannerContext> ctx,
                              std::shared_ptr<ScanTask> scan_task) {
    ...
    if (auto ret = sumbit_task(); !ret) {
        scan_task->set_status(Status::InternalError(
                "Failed to submit scanner to scanner pool reason:" + std::string(ret.msg()) +
                "|type:" + std::to_string(type)));
        ctx->append_block_to_queue(scan_task);
        return;
    }
}

void ScannerContext::append_block_to_queue(std::shared_ptr<ScanTask> scan_task) {
    ...
    std::lock_guard<std::mutex> l(_transfer_lock);
    ...
}
```
Since mutex in cpp is not re-enterable, so the scanner thread will
deadlock with itself.

This pr fix the problem by making `ScannerScheduler::submit` return a
Status instead of doing append failed task to the ScannerContext. The
caller itself will decide where resubmit the scanner or just abort the
execution of the query.
gavinchou pushed a commit that referenced this pull request Sep 12, 2024
We have dead lock when submit scanner to scheduler failed.

pstack looks like
```txt
Thread 2012 (Thread 0x7f87363fb700 (LWP 4179707) "Pipe_normal [wo"):
#0  0x00007f8b8f3dc82d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8b8f3d5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x000055b20f333e7a in __gthread_mutex_lock (__mutex=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default
.h:749
#3  std::mutex::lock (this=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:100
#4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<optimized out>) at /mnt/disk1/hezhiqiang/toolchains/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:229
#5  doris::vectorized::ScannerContext::append_block_to_queue (this=<optimized out>, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:234
#6  0x000055b20f32c0f9 in doris::vectorized::ScannerScheduler::submit (this=<optimized out>, ctx=..., scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:209
#7  0x000055b20f3338fc in doris::vectorized::ScannerContext::submit_scan_task (this=this@entry=0x7f8733d96010, scan_task=...) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:217
#8  0x000055b20f3346cd in doris::vectorized::ScannerContext::get_block_from_queue (this=0x7f8733d96010, state=<optimized out>, block=0x7f871f728de0, eos=0x7f871abce470, id=<optimized out>) at /mnt/disk1/hezhiqiang/doris/be/src/vec/exec/scan/scanner_context.cpp:290
#9  0x000055b214cb4f13 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block (this=<optimized out>, state=0x7f872f0eb400, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f871abce470) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.cpp:1292
#10 0x000055b2142b5772 in doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::get_block_after_projects (this=0x80, state=0x0, block=0x7f8b8f3dc82d <__lll_lock_wait+29>, eos=0x7f8733d960a8) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/scan_operator.h:363
#11 0x000055b2142e7880 in doris::pipeline::StatefulOperatorX<doris::pipeline::StreamingAggLocalState>::get_block (this=0x7f871f9bee00, state=0x7f872f0eb400, block=0x7f8716d49060, eos=0x7f87363f4937) at /mnt/disk1/hezhiqiang/doris/be/src/pipeline/exec/operator.cpp:587
```
Deallock happens with following
```cpp
Status ScannerContext::get_block_from_queue {
     std::unique_lock l(_transfer_lock);
     ...
     if (scan_task->is_eos()) {
     ...
     } else {
          // resubmit current running scanner to read the next block
         submit_scan_task(scan_task);
     }
}

ScannerContext::submit_scan_task(std::shared_ptr<ScanTask> scan_task) {
     _scanner_scheduler->submit(shared_from_this(), scan_task);
}

void ScannerScheduler::submit(std::shared_ptr<ScannerContext> ctx,
                              std::shared_ptr<ScanTask> scan_task) {
    ...
    if (auto ret = sumbit_task(); !ret) {
        scan_task->set_status(Status::InternalError(
                "Failed to submit scanner to scanner pool reason:" + std::string(ret.msg()) +
                "|type:" + std::to_string(type)));
        ctx->append_block_to_queue(scan_task);
        return;
    }
}

void ScannerContext::append_block_to_queue(std::shared_ptr<ScanTask> scan_task) {
    ...
    std::lock_guard<std::mutex> l(_transfer_lock);
    ...
}
```
Since mutex in cpp is not re-enterable, so the scanner thread will
deadlock with itself.

This pr fix the problem by making `ScannerScheduler::submit` return a
Status instead of doing append failed task to the ScannerContext. The
caller itself will decide where resubmit the scanner or just abort the
execution of the query.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.7-merged dev/3.0.2-merged p0_b reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants