Add Spark join strategy hints to dimension links#1938

Merged
shangyian merged 10 commits into DataJunction:main from shangyian:join-strategy-hints
Mar 30, 2026

@shangyian shangyian commented Mar 29, 2026

Summary

This PR lets users declare a Spark join strategy hint on any dimension link via a new spark_hints field, which accepts broadcast, merge, shuffle_hash, or shuffle_replicate_nl. When the field is set, DJ emits the hint in a /*+ ... */ comment in the generated SQL, giving whoever defines the dimension link more control over which join algorithm Spark chooses. Hints declared on different dimension links are combined into a single comment, for example /*+ BROADCAST(t2), MERGE(t3) */.

Before: no way to influence Spark's join strategy from the semantic layer:

  node.link_complex_dimension("dim.customer", join_on="...", join_type="left")

After: declare a broadcast hint on the dimension link:

  node.link_complex_dimension("dim.customer", join_on="...", join_type="left",
                              spark_hints="broadcast")
  # Generated SQL:
  #  SELECT /*+ BROADCAST(t2) */ t2.country, COUNT(t1.order_id) ...
  #  FROM orders t1 LEFT JOIN customer t2 ON ...
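
To make the combining behavior described in the summary concrete, here is a minimal sketch of how several per-link hints could be rendered into one comment. It is not DJ's actual implementation: the names SparkJoinHint and render_hint_comment are hypothetical, and the sketch assumes each hint arrives paired with the table alias it applies to.

```python
from enum import Enum


class SparkJoinHint(str, Enum):
    """Hint values listed in the PR summary (the enum name itself is hypothetical)."""

    BROADCAST = "broadcast"
    MERGE = "merge"
    SHUFFLE_HASH = "shuffle_hash"
    SHUFFLE_REPLICATE_NL = "shuffle_replicate_nl"


def render_hint_comment(hints):
    """Combine (hint_value, table_alias) pairs into a single Spark hint comment.

    Returns an empty string when no dimension link declares a hint.
    Raises ValueError if a hint value is not one of the four supported ones.
    """
    if not hints:
        return ""
    # Looking the value up in the enum both validates it and gives the
    # upper-case name that appears in the generated SQL.
    parts = [f"{SparkJoinHint(value).name}({alias})" for value, alias in hints]
    return "/*+ " + ", ".join(parts) + " */"


comment = render_hint_comment([("broadcast", "t2"), ("merge", "t3")])
print(f"SELECT {comment} t2.country, COUNT(t1.order_id) ...")
```

Validating through the enum means an unsupported value fails early with a ValueError instead of producing a hint that Spark would ignore.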

The field is available via the REST API, YAML deployment specs, and the Python client. It also round-trips through all node revision copy paths so it is never silently dropped on node updates or redeployments.
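
The round-trip guarantee can be illustrated with a small sketch. The DimensionLink dataclass and copy_link_for_new_revision helper below are hypothetical stand-ins, not DJ's real models. The point is only that a copy path built on a whole-object copy keeps spark_hints automatically, whereas a copy path that lists fields by hand can silently drop a newly added field.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DimensionLink:
    """Hypothetical stand-in for a dimension link record."""

    dimension: str
    join_on: str
    join_type: str = "left"
    spark_hints: Optional[str] = None


def copy_link_for_new_revision(link, **changes):
    """Copy a link into a new revision, overriding only the given fields.

    dataclasses.replace carries over every field that is not overridden,
    so spark_hints survives without being named here.
    """
    return replace(link, **changes)


original = DimensionLink(
    dimension="dim.customer",
    join_on="...",
    spark_hints="broadcast",
)
updated = copy_link_for_new_revision(original, join_type="inner")
print(updated.spark_hints)
```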

Test Plan

Added various tests

Deployment Plan

netlify bot commented Mar 29, 2026

Deploy Preview for thriving-cassata-78ae72 canceled.

🔨 Latest commit: b2f84a6
🔍 Latest deploy log: https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/69c9dc4c729a8d00080b205e

@shangyian shangyian force-pushed the join-strategy-hints branch from 312a21e to 4d11c8f March 30, 2026 01:42
@shangyian shangyian force-pushed the join-strategy-hints branch from 4d234d5 to b2f84a6 March 30, 2026 02:13
@shangyian shangyian marked this pull request as ready for review March 30, 2026 03:12
@shangyian shangyian merged commit 1051f3a into DataJunction:main Mar 30, 2026
17 checks passed