[MSE] Add broadcast_right join strategy hint for star-schema joins#18829
Closed
Akanksha-kedia wants to merge 2 commits into
Adds a new joinOptions(join_strategy='broadcast_right') hint that broadcasts the entire right side of a join to every worker while hash- or random-distributing the left side. This eliminates the right-side network shuffle for star-schema patterns where the right table is small enough to fit in memory but is not pre-replicated as a dimension table.

Changes:
- PinotHintOptions: add BROADCAST_RIGHT_JOIN_STRATEGY constant and useBroadcastRightJoinStrategy() helper
- PinotJoinExchangeNodeInsertRule (V1): detect hint and force BROADCAST distribution on the right exchange
- TraitAssignment (V2 physical planner): detect hint and assign BROADCAST_DISTRIBUTED trait to the right input
- RelToPlanNodeConverter (V1): set JoinStrategy.BROADCAST_RIGHT on the JoinNode
- PRelToPlanNodeConverter (V2): same
- JoinNode: add BROADCAST_RIGHT to the JoinStrategy enum
- QueryCompilationTest: add two tests (equi-join and non-equi-join) verifying correct distribution types and JoinStrategy assignment

Closes apache#14518
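The distribution choice described in this commit can be sketched as a small standalone model. This is not the planner code from the PR: the class, enum, and method names below are hypothetical, and the no-hint branch is an assumption included only so the hinted case has something to contrast with.

```java
/** Simplified model of the exchange choice described above; not Pinot's planner code. */
final class BroadcastRightSketch {
  /** Hypothetical stand-in for the planner's distribution types. */
  enum Distribution { HASH, RANDOM, BROADCAST }

  static final String BROADCAST_RIGHT = "broadcast_right";

  /** Left input: hash on the join keys if there are any, otherwise random. */
  static Distribution leftDistribution(boolean hasEquiKeys) {
    return hasEquiKeys ? Distribution.HASH : Distribution.RANDOM;
  }

  /** Right input: always BROADCAST under the hint; otherwise an assumed default. */
  static Distribution rightDistribution(String joinStrategyHint, boolean hasEquiKeys) {
    if (BROADCAST_RIGHT.equals(joinStrategyHint)) {
      return Distribution.BROADCAST;
    }
    // Assumption for illustration only: without the hint, an equi-join hashes both sides.
    return hasEquiKeys ? Distribution.HASH : Distribution.BROADCAST;
  }

  public static void main(String[] args) {
    // Equi-join with the hint: left is hashed, right is broadcast.
    System.out.println(leftDistribution(true) + " / " + rightDistribution(BROADCAST_RIGHT, true));
    // Non-equi-join with the hint: left is random, right is broadcast.
    System.out.println(leftDistribution(false) + " / " + rightDistribution(BROADCAST_RIGHT, false));
  }
}
```

The two calls in main correspond to the equi-join and non-equi-join cases that the new QueryCompilationTest tests are said to verify.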
…oin guard

Per xiangfu0's review:
1. Add BROADCAST_RIGHT = 3 to the plan.proto JoinStrategy enum so plans survive serialization/deserialization across the broker↔server boundary.
2. Add a BROADCAST_RIGHT case to PlanNodeSerializer and PlanNodeDeserializer.
3. Add a BROADCAST_RIGHT case to DefaultJoinOperatorFactory. At execution time it runs as HashJoinOperator (distribution was handled at planning time).
4. Fix InStageStatsTreeBuilder.visitJoin to handle BROADCAST_RIGHT and ASOF without a fragile assert (LOOKUP→LOOKUP_JOIN, everything else→HASH_JOIN).
5. Add a guard in PinotJoinExchangeNodeInsertRule and TraitAssignment.assignJoin that rejects RIGHT/FULL OUTER joins with the broadcast_right hint: broadcasting the right table means each worker independently emits unmatched right rows, which would produce duplicate null-extended rows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
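The reasoning behind item 5 can be checked with a toy simulation. The sketch below is not Pinot code: it models each worker as a loop over its own left partition plus a full copy of the right keys, and assumes (as the commit message describes) that every worker emits null-extended rows for the right keys it did not match locally. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy simulation of a RIGHT OUTER join with a broadcast right side; not Pinot code. */
final class BroadcastRightOuterSketch {

  /**
   * Each worker joins its own left partition against a full copy of the right keys,
   * then emits a null-extended row for every right key it did not match locally.
   * Rows are rendered as "leftKey,rightKey".
   */
  static List<String> rightOuterJoin(List<List<Integer>> leftPartitions, List<Integer> rightKeys) {
    List<String> output = new ArrayList<>();
    for (List<Integer> leftPartition : leftPartitions) {
      // Match tracking is local to this worker, which is the root of the problem.
      Set<Integer> matched = new HashSet<>();
      for (Integer leftKey : leftPartition) {
        if (rightKeys.contains(leftKey)) {
          matched.add(leftKey);
          output.add(leftKey + "," + leftKey);
        }
      }
      for (Integer rightKey : rightKeys) {
        if (!matched.contains(rightKey)) {
          output.add("null," + rightKey);
        }
      }
    }
    return output;
  }

  public static void main(String[] args) {
    // Two workers; right key 3 has no matching left row on either worker.
    List<List<Integer>> leftPartitions = List.of(List.of(1), List.of(2));
    List<Integer> rightKeys = List.of(1, 2, 3);
    rightOuterJoin(leftPartitions, rightKeys).forEach(System.out::println);
  }
}
```

With two workers, the row for unmatched right key 3 is emitted once per worker. In this toy model, keys 1 and 2 are also reported as unmatched by the worker that did not hold their matching left row, so rejecting RIGHT/FULL OUTER joins at plan time avoids more than plain duplication.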
Contributor
Author
|
CC @Jackie-Jiang @xiangfu0 — this adds a |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18829 +/- ##
============================================
- Coverage 64.82% 64.77% -0.05%
Complexity 1319 1319
============================================
Files 3388 3392 +4
Lines 210228 210978 +750
Branches 32948 33128 +180
============================================
+ Hits 136282 136667 +385
- Misses 62978 63296 +318
- Partials 10968 11015 +47
Contributor
|
Duplicate of #18514, which is not needed |
Summary
Adds a joinOptions(join_strategy='broadcast_right') query hint to the multi-stage query engine (MSE). When set, the entire right-side table is broadcast to every join worker while the left side is hash- or random-distributed. This eliminates the right-side network shuffle for star-schema patterns where the right table is small enough to fit in memory but is not pre-replicated as a dimension table.

Why this is useful
[Comparison table of join strategies; only the labels lookup and broadcast_right (new) are recoverable.]

Changes
- PinotHintOptions: add BROADCAST_RIGHT_JOIN_STRATEGY constant + useBroadcastRightJoinStrategy() helper
- PinotJoinExchangeNodeInsertRule (V1 planner): detect hint and force BROADCAST distribution on the right exchange
- TraitAssignment (V2 planner): assign BROADCAST_DISTRIBUTED trait to right input
- RelToPlanNodeConverter (V1): set JoinStrategy.BROADCAST_RIGHT on the JoinNode
- PRelToPlanNodeConverter (V2): same
- JoinNode: add BROADCAST_RIGHT to JoinStrategy enum
- plan.proto: add BROADCAST_RIGHT = 3 to JoinStrategy proto enum
- PlanNodeSerializer / PlanNodeDeserializer: handle BROADCAST_RIGHT across the broker↔server boundary
- DefaultJoinOperatorFactory: run BROADCAST_RIGHT as HashJoinOperator (distribution handled at plan time)
- InStageStatsTreeBuilder: replace assert LOOKUP with an explicit switch that handles BROADCAST_RIGHT and ASOF cleanly

Why RIGHT/FULL OUTER joins are blocked
HashJoinOperator tracks unmatched right rows locally per worker. Broadcasting the right table to N workers means each worker independently emits null-extended rows for the same unmatched right row, which yields N duplicate rows in the output. The planner therefore rejects RIGHT/FULL OUTER joins that carry the broadcast_right hint, with a clear error message at query compile time.

Tests
- testBroadcastRightJoinHintEquiJoin: equi-join; verifies BROADCAST distribution on right, HASH on left, and BROADCAST_RIGHT strategy on the JoinNode
- testBroadcastRightJoinHintNonEquiJoin: non-equi-join; verifies BROADCAST on right, RANDOM on left
- mvn test -pl pinot-query-planner -Dtest=QueryCompilationTest: 201 tests, 0 failures

Closes #14518
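For reference, a query carrying the hint might be assembled as in the sketch below. It assumes the hint sits in the same /*+ ... */ comment position that Pinot's existing join_strategy hints use, and that the build includes this change. The helper name, table aliases, and the table and column names passed in main are placeholders, not taken from the PR's tests.

```java
/** Builds a hinted MSE join query string; all table and column names are placeholders. */
final class BroadcastRightQueryExample {

  /** Returns an inner equi-join on joinColumn with the broadcast_right hint attached. */
  static String hintedJoinQuery(String leftTable, String rightTable, String joinColumn) {
    return "SELECT /*+ joinOptions(join_strategy='broadcast_right') */ "
        + "l." + joinColumn + ", r." + joinColumn
        + " FROM " + leftTable + " l JOIN " + rightTable + " r"
        + " ON l." + joinColumn + " = r." + joinColumn;
  }

  public static void main(String[] args) {
    // Placeholder names: a large fact-like table on the left, a small table on the right.
    System.out.println(hintedJoinQuery("a", "b", "col1"));
  }
}
```

The example uses an inner join on purpose, since the description above states that RIGHT and FULL OUTER joins are rejected when this hint is present.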