Add Spark join strategy hints to dimension links #1938
Merged
shangyian merged 10 commits into `DataJunction:main` on Mar 30, 2026
Summary
This enables users to declare a Spark join strategy hint on any dimension link via a new `spark_hints` field (`broadcast`, `merge`, `shuffle_hash`, `shuffle_replicate_nl`). When the field is set, DJ emits the hint in a `/*+ ... */` comment in the generated SQL, giving whoever defines the dimension link more control over which join algorithm Spark chooses. Hints from multiple dimension links are combined into a single comment, e.g. `/*+ BROADCAST(t2), MERGE(t3) */`.

Before: there was no way to influence Spark's join strategy from the semantic layer.
After: a `broadcast` hint can be declared on the dimension link.
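As an illustration of how per-link hints could be merged into one comment, here is a minimal sketch. It assumes each hinted dimension link contributes a `(table_alias, strategy)` pair; `SparkJoinHint` and `render_hint_comment` are hypothetical names, not DJ's actual code. Spark reads such a comment when it is placed directly after the `SELECT` keyword, e.g. `SELECT /*+ BROADCAST(t2) */ ...`.

```python
from enum import Enum


class SparkJoinHint(str, Enum):
    """Hypothetical enum of the four strategies listed in this PR."""

    BROADCAST = "broadcast"
    MERGE = "merge"
    SHUFFLE_HASH = "shuffle_hash"
    SHUFFLE_REPLICATE_NL = "shuffle_replicate_nl"


def render_hint_comment(hints: list[tuple[str, str]]) -> str:
    """Combine (table_alias, strategy) pairs into one Spark hint comment.

    Returns an empty string when no dimension link declares a hint.
    """
    parts = []
    for alias, strategy in hints:
        # Raises ValueError for anything outside the four supported strategies.
        hint = SparkJoinHint(strategy.lower())
        parts.append(f"{hint.value.upper()}({alias})")
    if not parts:
        return ""
    return "/*+ " + ", ".join(parts) + " */"


print(render_hint_comment([("t2", "broadcast"), ("t3", "merge")]))
```

Validating against a closed set of strategies means a typo in a dimension link fails loudly instead of producing a hint that Spark would ignore.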
The field is available via the REST API, YAML deployment specs, and the Python client. It also round-trips through all node revision copy paths, so it is never silently dropped on node updates or redeployments.
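The round-trip requirement matters because a copy constructor that lists fields by hand will quietly reset any new field it forgets. A minimal sketch, assuming a simplified link model where `spark_hints` is an optional string; `DimensionLinkSketch` and `copy_link` are hypothetical stand-ins, not DJ's real node revision code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionLinkSketch:
    """Hypothetical stand-in for a dimension link; not DJ's real model."""

    dimension: str
    join_on: str
    spark_hints: Optional[str] = None  # assumed type for illustration


def copy_link(link: DimensionLinkSketch) -> DimensionLinkSketch:
    """Field-by-field copy, as a node revision copy path might do it."""
    return DimensionLinkSketch(
        dimension=link.dimension,
        join_on=link.join_on,
        # Omitting this argument would silently reset the hint to None.
        spark_hints=link.spark_hints,
    )


original = DimensionLinkSketch("default.dim_users", "t1.user_id = t2.id", "broadcast")
copied = copy_link(original)
```

Since dataclass equality compares every field, a check like `copied == original` catches a dropped hint in tests.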
Test Plan
Added various tests
- `make check` passes
- `make test` shows 100% unit test coverage

Deployment Plan