[Bug](join) fix broadcast join running when hash table build not finished #37643

Merged (1 commit) on Jul 15, 2024

zhangstar333
Contributor

@zhangstar333 commented Jul 11, 2024

Proposed changes

Previously, when a PipelineTask was closed, its sink operator was always set to ready.
However, not every sink can run at that point. In a broadcast join, some instances do not build the hash table themselves;
they must wait until another instance has finished building it and then share that table. Running such a sink early trips the following check:

F20240710 17:29:09.628299 221449 hashjoin_build_sink.cpp:582] Check failed: _shared_hash_table_context->signaled
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/zhangsida/doris/be/src/common/signal_handler.h:421
 1# 0x00007FEF9BF64B50 in /lib64/libc.so.6
 2# gsignal in /lib64/libc.so.6
 3# __GI_abort in /lib64/libc.so.6
 4# 0x0000559C8BD8BE8D in /mnt/disk2/zhangsida/doris/output/be/lib/doris_be
 5# 0x0000559C8BD7E52A in /mnt/disk2/zhangsida/doris/output/be/lib/doris_be
 6# google::LogMessage::SendToLog() in /mnt/disk2/zhangsida/doris/output/be/lib/doris_be
 7# google::LogMessage::Flush() in /mnt/disk2/zhangsida/doris/output/be/lib/doris_be
 8# google::LogMessageFatal::~LogMessageFatal() in /mnt/disk2/zhangsida/doris/output/be/lib/doris_be
 9# doris::pipeline::HashJoinBuildSinkOperatorX::sink(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/zhangsida/doris/be/src/pipeline/exec/hashjoin_build_sink.cpp:582
10# doris::pipeline::PipelineTask::execute(bool*)::$_1::operator()() const at /mnt/disk2/zhangsida/doris/be/src/pipeline/pipeline_task.cpp:361
11# doris::pipeline::PipelineTask::execute(bool*) at /mnt/disk2/zhangsida/doris/be/src/pipeline/pipeline_task.cpp:364
12# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /mnt/disk2/zhangsida/doris/be/src/pipeline/task_scheduler.cpp:138
13# doris::pipeline::TaskScheduler::start()::$_0::operator()() const at /mnt/disk2/zhangsida/doris/be/src/pipeline/task_scheduler.cpp:64

@zhangstar333
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39968 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 28045266c674393f53e20de9e6bc3486e1bbb390, data reload: false

------ Round 1 ----------------------------------
q1	17617	4328	4240	4240
q2	2013	194	182	182
q3	10453	1175	1075	1075
q4	10190	738	789	738
q5	7653	2704	2641	2641
q6	224	137	137	137
q7	948	605	613	605
q8	9221	2060	2099	2060
q9	8609	6535	6544	6535
q10	8750	3779	3777	3777
q11	479	246	236	236
q12	464	223	227	223
q13	17761	2998	3033	2998
q14	289	240	235	235
q15	528	478	462	462
q16	485	374	384	374
q17	972	689	680	680
q18	8074	7644	7423	7423
q19	5072	1490	1514	1490
q20	696	316	317	316
q21	4952	3235	3209	3209
q22	383	332	333	332
Total cold run time: 115833 ms
Total hot run time: 39968 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4433	4268	4236	4236
q2	370	257	270	257
q3	2964	2793	2891	2793
q4	1974	1682	1667	1667
q5	5676	5551	5509	5509
q6	226	130	136	130
q7	2197	1858	1889	1858
q8	3284	3402	3434	3402
q9	8810	8813	8968	8813
q10	4055	3862	3754	3754
q11	615	492	505	492
q12	843	639	645	639
q13	17266	3159	3183	3159
q14	318	286	281	281
q15	529	492	485	485
q16	511	427	429	427
q17	1819	1566	1541	1541
q18	8093	8075	7740	7740
q19	1880	1621	1570	1570
q20	2096	1878	1860	1860
q21	5107	4810	4784	4784
q22	626	533	576	533
Total cold run time: 73692 ms
Total hot run time: 55930 ms

@doris-robot

TPC-DS: Total hot run time: 175698 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 28045266c674393f53e20de9e6bc3486e1bbb390, data reload: false

query1	911	368	365	365
query2	6458	2407	2533	2407
query3	6635	206	221	206
query4	28367	17642	17430	17430
query5	3616	489	478	478
query6	272	165	157	157
query7	4581	297	281	281
query8	330	304	303	303
query9	8504	2435	2432	2432
query10	446	303	285	285
query11	11584	10022	10147	10022
query12	117	86	85	85
query13	1649	386	380	380
query14	10264	7827	7759	7759
query15	258	193	193	193
query16	7777	315	309	309
query17	1794	554	583	554
query18	1832	271	270	270
query19	192	165	150	150
query20	87	80	82	80
query21	204	132	123	123
query22	4408	4201	4033	4033
query23	34036	33711	33614	33614
query24	10714	2968	2984	2968
query25	588	387	393	387
query26	708	151	150	150
query27	2232	281	280	280
query28	5887	2157	2163	2157
query29	905	641	647	641
query30	265	156	159	156
query31	941	760	751	751
query32	98	58	54	54
query33	664	343	285	285
query34	871	492	482	482
query35	684	599	605	599
query36	1140	1004	960	960
query37	148	81	85	81
query38	2957	2840	2829	2829
query39	913	858	845	845
query40	204	126	116	116
query41	54	52	50	50
query42	113	105	105	105
query43	628	556	549	549
query44	1075	755	733	733
query45	202	163	171	163
query46	1091	762	730	730
query47	1872	1761	1761	1761
query48	359	286	290	286
query49	826	410	408	408
query50	777	391	391	391
query51	6920	6762	6727	6727
query52	102	96	92	92
query53	357	291	295	291
query54	854	448	438	438
query55	76	74	77	74
query56	279	275	295	275
query57	1166	1053	1072	1053
query58	250	248	266	248
query59	3370	3161	3176	3161
query60	298	274	286	274
query61	101	100	101	100
query62	828	655	652	652
query63	316	283	293	283
query64	9116	2231	7470	2231
query65	3166	3102	3124	3102
query66	752	326	344	326
query67	15590	15137	14996	14996
query68	5179	542	548	542
query69	641	464	365	365
query70	1178	1091	1080	1080
query71	417	326	271	271
query72	7638	5408	5950	5408
query73	765	317	321	317
query74	5945	5509	5478	5478
query75	3404	2715	2691	2691
query76	3090	908	888	888
query77	621	300	310	300
query78	9737	9110	8954	8954
query79	3250	522	525	522
query80	2435	481	482	481
query81	603	221	233	221
query82	1444	147	152	147
query83	318	172	173	172
query84	275	88	91	88
query85	1369	324	305	305
query86	467	319	321	319
query87	3274	3099	3110	3099
query88	3950	2388	2378	2378
query89	479	389	401	389
query90	1790	199	187	187
query91	135	107	107	107
query92	58	51	50	50
query93	2496	511	514	511
query94	1168	228	214	214
query95	413	314	318	314
query96	609	271	271	271
query97	3247	3009	3009	3009
query98	227	199	199	199
query99	1548	1276	1265	1265
Total cold run time: 282949 ms
Total hot run time: 175698 ms

@doris-robot

ClickBench: Total hot run time: 30.86 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 28045266c674393f53e20de9e6bc3486e1bbb390, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.03
query3	0.22	0.05	0.06
query4	1.67	0.07	0.07
query5	0.49	0.48	0.50
query6	1.14	0.73	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.50	0.49
query10	0.55	0.54	0.54
query11	0.14	0.11	0.12
query12	0.16	0.13	0.12
query13	0.60	0.60	0.59
query14	0.76	0.78	0.78
query15	0.87	0.80	0.82
query16	0.36	0.37	0.37
query17	1.02	0.97	1.01
query18	0.22	0.22	0.21
query19	1.77	1.68	1.85
query20	0.01	0.01	0.01
query21	15.39	0.76	0.67
query22	3.71	8.27	2.12
query23	18.28	1.41	1.34
query24	2.07	0.23	0.23
query25	0.16	0.08	0.09
query26	0.31	0.21	0.20
query27	0.45	0.24	0.23
query28	13.29	1.02	1.00
query29	12.59	3.27	3.29
query30	0.25	0.06	0.06
query31	2.87	0.38	0.40
query32	3.27	0.48	0.48
query33	2.91	2.96	2.94
query34	17.01	4.37	4.34
query35	4.43	4.39	4.47
query36	0.66	0.46	0.48
query37	0.19	0.15	0.16
query38	0.16	0.15	0.15
query39	0.04	0.03	0.04
query40	0.14	0.11	0.12
query41	0.09	0.04	0.04
query42	0.06	0.04	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.08 s
Total hot run time: 30.86 s

@yiguolei left a comment

LGTM

PR approved by at least one committer and no changes requested.

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Jul 15, 2024
PR approved by anyone and no changes requested.

@HappenLee left a comment

LGTM

@yiguolei merged commit 49e123c into apache:master on Jul 15, 2024
26 of 31 checks passed
@zhangstar333
Contributor Author

Use the fix in PR #37792.

seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
…shed (apache#37643)

dataroaring pushed a commit that referenced this pull request Jul 17, 2024
…shed (#37643)

Labels
approved Indicates a PR has been approved by one committer. dev/3.0.1-merged p0_c reviewed
6 participants