Skip to content

[chore](be) Reject broadcast joins that finalize build side#63445

Merged
BiteTheDDDDt merged 1 commit into
apache:masterfrom
BiteTheDDDDt:fix-broadcast-build-side-finalize
May 21, 2026
Merged

[chore](be) Reject broadcast joins that finalize build side#63445
BiteTheDDDDt merged 1 commit into
apache:masterfrom
BiteTheDDDDt:fix-broadcast-build-side-finalize

Conversation

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Broadcast hash joins that need to emit build-side rows after probing, such as right outer, full outer, right semi, and right anti joins, cannot safely share one broadcast build side across multiple local probe instances without an additional global finalization protocol. The frontend already avoids generating these broadcast hash joins, but the backend had no guard if such a plan reached execution. This change rejects unsupported broadcast hash joins during build sink preparation and adds a BE unit test covering the rejected join types.

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./run-be-ut.sh --run --filter=HashJoinBuildSinkTest.RejectBroadcastJoinThatRequiresBuildSideFinalize
    • build-support/clang-format.sh
    • build-support/check-format.sh
  • Behavior changed: No
  • Does this need documentation: No

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Broadcast hash joins that need to emit build-side rows after probing, such as right outer, full outer, right semi, and right anti joins, cannot safely share one broadcast build side across multiple local probe instances without an additional global finalization protocol. The frontend already avoids generating these broadcast hash joins, but the backend had no guard if such a plan reached execution. This change rejects unsupported broadcast hash joins during build sink preparation and adds a BE unit test covering the rejected join types.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-be-ut.sh --run --filter=HashJoinBuildSinkTest.RejectBroadcastJoinThatRequiresBuildSideFinalize
    - build-support/clang-format.sh
    - build-support/check-format.sh
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 20, 2026 09:38
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

run buildall

@BiteTheDDDDt BiteTheDDDDt changed the title [fix](be) Reject broadcast joins that finalize build side [chore](be) Reject broadcast joins that finalize build side May 20, 2026
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31429 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 50fab34f0801c27db30b929e071bbf25a586a28c, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17638	3897	3888	3888
q2	q3	10789	1405	824	824
q4	4700	473	346	346
q5	7575	2233	2050	2050
q6	239	176	139	139
q7	947	768	640	640
q8	9536	1771	1709	1709
q9	5065	4867	4957	4867
q10	6381	2077	1812	1812
q11	438	275	255	255
q12	632	431	296	296
q13	18120	3319	2778	2778
q14	262	256	232	232
q15	q16	820	787	700	700
q17	985	865	935	865
q18	6859	5753	5589	5589
q19	1306	1261	1106	1106
q20	500	399	352	352
q21	6347	2923	2656	2656
q22	471	387	325	325
Total cold run time: 99610 ms
Total hot run time: 31429 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4849	4487	4523	4487
q2	q3	4891	5219	4635	4635
q4	2123	2216	1433	1433
q5	4703	4834	4648	4648
q6	234	178	129	129
q7	1897	1746	1545	1545
q8	2362	2056	2103	2056
q9	7680	7435	7207	7207
q10	4480	4400	4022	4022
q11	527	375	358	358
q12	712	729	504	504
q13	3013	3347	2813	2813
q14	276	281	256	256
q15	q16	685	708	612	612
q17	1261	1236	1270	1236
q18	7307	6811	6642	6642
q19	1103	1087	1072	1072
q20	2209	2229	1938	1938
q21	5364	4648	4494	4494
q22	532	462	413	413
Total cold run time: 56208 ms
Total hot run time: 50500 ms

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found.

Critical checkpoint conclusions:

  • Goal/test: The PR adds a BE guard rejecting broadcast hash joins that need build-side finalization, and adds a focused BE unit test for right outer, full outer, right semi, and right anti joins.
  • Scope: The implementation is small and localized to hash join build sink preparation plus one unit test.
  • Concurrency/lifecycle: No new concurrency or lifecycle management is introduced. The guard runs before shared-hash-table enablement/local-state setup, so it avoids the unsafe execution path without changing dependency behavior.
  • Configuration/compatibility: No new configs, storage formats, RPC fields, or FE-BE protocol changes.
  • Parallel paths: FE already avoids these broadcast joins; this adds the BE-side safety check for the same unsupported class.
  • Error handling: Uses a non-OK NotSupported status with the join type in the message; no ignored Status paths observed.
  • Memory/transaction/data visibility: No new allocations, transaction handling, or data visibility paths.
  • Tests: Unit coverage is present for the rejected join types. I did not run tests in this review runner.
  • User focus: No additional user-provided review focus was specified.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169874 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 50fab34f0801c27db30b929e071bbf25a586a28c, data reload: false

query5	4315	679	532	532
query6	329	216	222	216
query7	4262	557	309	309
query8	330	235	216	216
query9	8807	4024	3981	3981
query10	464	331	303	303
query11	5748	2436	2182	2182
query12	189	133	128	128
query13	1300	605	426	426
query14	5975	5402	5125	5125
query14_1	4421	4430	4379	4379
query15	214	204	189	189
query16	1005	418	443	418
query17	1147	765	628	628
query18	2557	500	373	373
query19	220	217	175	175
query20	143	133	135	133
query21	221	140	124	124
query22	13598	13597	13507	13507
query23	17360	16422	16136	16136
query23_1	16201	16188	16271	16188
query24	7420	1742	1296	1296
query24_1	1301	1298	1287	1287
query25	572	477	426	426
query26	1313	319	176	176
query27	2717	556	331	331
query28	4388	1942	1930	1930
query29	991	621	499	499
query30	315	239	206	206
query31	1109	1066	945	945
query32	88	76	74	74
query33	549	353	288	288
query34	1181	1094	629	629
query35	762	776	670	670
query36	1350	1360	1218	1218
query37	157	105	90	90
query38	3204	3142	3040	3040
query39	938	926	903	903
query39_1	875	878	877	877
query40	233	155	129	129
query41	66	64	64	64
query42	114	113	112	112
query43	320	333	306	306
query44	
query45	219	213	195	195
query46	1073	1218	732	732
query47	2346	2327	2170	2170
query48	396	396	301	301
query49	632	492	394	394
query50	1003	357	250	250
query51	4308	4285	4256	4256
query52	106	115	97	97
query53	266	283	204	204
query54	317	273	269	269
query55	94	93	89	89
query56	318	324	312	312
query57	1423	1396	1324	1324
query58	295	279	276	276
query59	1560	1635	1454	1454
query60	326	359	308	308
query61	157	162	164	162
query62	670	622	558	558
query63	241	199	213	199
query64	2402	808	643	643
query65	
query66	1698	484	357	357
query67	30042	30065	29914	29914
query68	
query69	455	352	311	311
query70	1018	987	934	934
query71	325	274	271	271
query72	3091	2619	2495	2495
query73	843	766	424	424
query74	5086	4921	4743	4743
query75	2674	2644	2251	2251
query76	2269	1142	781	781
query77	410	411	353	353
query78	12190	12177	11653	11653
query79	1389	1056	688	688
query80	643	547	453	453
query81	449	278	239	239
query82	1380	161	125	125
query83	352	274	247	247
query84	255	144	114	114
query85	895	554	462	462
query86	400	318	318	318
query87	3394	3409	3200	3200
query88	3587	2731	2654	2654
query89	439	385	344	344
query90	1955	184	187	184
query91	181	174	141	141
query92	75	79	77	77
query93	1564	1476	920	920
query94	544	362	307	307
query95	691	477	358	358
query96	1042	809	339	339
query97	2737	2712	2578	2578
query98	239	240	244	240
query99	1116	1100	999	999
Total cold run time: 253674 ms
Total hot run time: 169874 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (6/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.61% (27925/37938)
Line Coverage 57.57% (302991/526309)
Region Coverage 54.81% (253925/463295)
Branch Coverage 56.31% (109687/194791)

@BiteTheDDDDt BiteTheDDDDt merged commit dc5d139 into apache:master May 21, 2026
33 of 34 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants