[SPARK-56374][BUILD] Align SBT assembly shade rules with Maven #55306

Closed
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-56374-sbt-shade-alignment


@yadavay-amzn

What changes were proposed in this pull request?

Add missing shade rules to project/SparkBuild.scala to align SBT assembly output with Maven for three connect modules:

  1. SparkConnect (server): Add com.google.common → org.sparkproject.guava and com.google.thirdparty → org.sparkproject.guava.thirdparty relocations. Maven's sql/connect/server/pom.xml has these but SBT was missing them.

  2. SparkConnectClient (jvm): Add org.apache.arrow → org.sparkproject.connect.client.org.apache.arrow relocation. Maven's connector/connect/client/jvm/pom.xml has this but SBT was missing it.

  3. SparkConnectJdbc: Add org.apache.arrow relocation for consistency with Maven's sql/connect/client/jdbc/pom.xml.
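
For orientation, relocations like the first two above are usually written in sbt-assembly with `ShadeRule.rename`. The fragment below is only a sketch of that shape, not the actual diff: the value names `connectServerShadeRules` and `connectClientJvmShadeRules` are hypothetical, and the JDBC rule is left out because its target package is not stated here.

```scala
// Sketch only: assumes a project/*.scala build definition with the
// sbt-assembly plugin on the build classpath.
import sbt._
import sbtassembly.AssemblyPlugin.autoImport._

object ShadeRuleSketch {
  // Hypothetical name; the real settings live in project/SparkBuild.scala.
  lazy val connectServerShadeRules: Seq[Setting[_]] = Seq(
    assembly / assemblyShadeRules ++= Seq(
      // com.google.common.* -> org.sparkproject.guava.*
      // "@1" stands for whatever the "**" wildcard matched.
      ShadeRule.rename("com.google.common.**" -> "org.sparkproject.guava.@1").inAll,
      // com.google.thirdparty.* -> org.sparkproject.guava.thirdparty.*
      ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.guava.thirdparty.@1").inAll
    )
  )

  // Hypothetical name, same caveat as above.
  lazy val connectClientJvmShadeRules: Seq[Setting[_]] = Seq(
    assembly / assemblyShadeRules ++= Seq(
      // org.apache.arrow.* -> org.sparkproject.connect.client.org.apache.arrow.*
      ShadeRule.rename(
        "org.apache.arrow.**" -> "org.sparkproject.connect.client.org.apache.arrow.@1"
      ).inAll
    )
  )
}
```

`.inAll` applies the rename to every JAR that ends up in the assembly. That matters for the server case, because the references that need rewriting sit inside third-party grpc classes rather than in Spark's own code.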

Why are the changes needed?

SBT assembly shade rules were out of sync with Maven, causing differences in the assembled JARs:

  • Server: Without the guava relocation, grpc classes in the server assembly reference unshaded com.google.common.*. At runtime these fail to resolve because Guava is shaded to org.sparkproject.guava in spark-network-common. Verified by inspecting ManagedChannelImpl.class — after the fix, references correctly point to org/sparkproject/guava/base/MoreObjects instead of com/google/common/base/MoreObjects.

  • Client JVM: Arrow classes were not being shaded. After the fix, they appear under org/sparkproject/connect/client/org/apache/arrow/.

Other Maven modules with shade rules (core, sql/core, network-yarn, root pom.xml) were verified to be intentionally different in SBT — those modules don't produce separate assembly JARs in SBT, and their shading is handled at a different level in the SBT build architecture.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Built all three affected SBT assemblies and verified the output JARs:

build/sbt connect/assembly              # ✅ success
build/sbt connect-client-jvm/assembly   # ✅ success
build/sbt connect-client-jdbc/assembly  # ✅ success

Verified shading in the output JARs:

  • Server assembly: strings ManagedChannelImpl.class shows org/sparkproject/guava/base/MoreObjects (was com/google/common/base/MoreObjects before fix)
  • Client JVM assembly: jar tf shows arrow classes under org/sparkproject/connect/client/org/apache/arrow/ with zero unshaded org/apache/arrow entries
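
The two manual checks above (`jar tf` and `strings`) could also be scripted. The following is a minimal sketch assuming Scala 2.13+; `ShadeCheck` and its method names are hypothetical and not part of the Spark build.

```scala
import java.nio.charset.StandardCharsets
import java.util.jar.JarFile
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of Spark: scripted versions of the manual checks.
object ShadeCheck {

  /** Class entries still located under an unshaded prefix such as "org/apache/arrow/". */
  def unshadedEntries(entryNames: Seq[String], unshadedPrefix: String): Seq[String] =
    entryNames.filter(n => n.startsWith(unshadedPrefix) && n.endsWith(".class"))

  /** Rough equivalent of `strings X.class | grep needle` for ASCII needles.
    * ISO-8859-1 maps each byte to one char, so ASCII constant-pool strings survive intact. */
  def bytesMention(classBytes: Array[Byte], needle: String): Boolean =
    new String(classBytes, StandardCharsets.ISO_8859_1).contains(needle)

  /** All entry names in a JAR, like `jar tf`. */
  def listEntries(jarPath: String): Seq[String] = {
    val jar = new JarFile(jarPath)
    // Materialize the names before the JAR is closed.
    try jar.entries().asScala.map(_.getName).toList
    finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("usage: ShadeCheck <assembly.jar> <unshaded/prefix/>")
      sys.exit(2)
    }
    val leftovers = unshadedEntries(listEntries(args(0)), args(1))
    leftovers.take(10).foreach(println)
    println(s"${leftovers.size} unshaded class entries under ${args(1)}")
    if (leftovers.nonEmpty) sys.exit(1)
  }
}
```

Run against the client JVM assembly with the prefix `org/apache/arrow/`, a correctly shaded JAR should report zero entries. `bytesMention` can be applied to the bytes of `ManagedChannelImpl.class` to distinguish `org/sparkproject/guava/` references from `com/google/common/` ones.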

Was this patch authored or co-authored using generative AI tooling?

Yes

yadavay-amzn force-pushed the fix/SPARK-56374-sbt-shade-alignment branch from 49bfad3 to 3dafde2 on April 11, 2026 02:26
yadavay-amzn reopened this on Apr 11, 2026
Add missing shade rules to SparkBuild.scala for three connect modules:

1. SparkConnect (server): Add guava and guava.thirdparty relocations to
   match Maven. Without these, grpc classes in the assembly reference
   unshaded com.google.common.* which fails at runtime since Guava is
   shaded to org.sparkproject.guava in spark-network-common.

2. SparkConnectClient (jvm): Add org.apache.arrow relocation to match
   Maven. Arrow classes are now shaded under
   org/sparkproject/connect/client/org/apache/arrow/.

3. SparkConnectJdbc: Add org.apache.arrow relocation for consistency
   with Maven and the jvm client module.

Closes SPARK-56374