[SEDONA-276][SEDONA-277] Support Spark 3.4 #825

Merged: 7 commits merged into apache:master on May 1, 2023

Kontinuation
Member

@Kontinuation Kontinuation commented Apr 27, 2023

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

  • Yes, this patch associates with 2 JIRA tickets, namely:
    • SEDONA-276: Support Spark 3.4
    • SEDONA-277: Fix POMs for artifacts produced by profile-guided builds

What changes were proposed in this PR?

Support Spark 3.4

Introduction

This patch adds support for Spark 3.4 by building separate artifacts for this minor version. Later Spark minor versions will be supported the same way, as a workaround for changes in Spark's internal APIs between releases.

Run the following command to build artifacts for Spark 3.4:

mvn clean install -Dspark=3.4 -Dscala=2.12

Run the following command to build artifacts for Spark 3.0 to 3.3:

mvn clean install -Dspark=3.0 -Dscala=2.12

In this case, a plain mvn clean install also works, since the default setup builds artifacts for Spark 3.0 with Scala 2.12 (3.0_2.12).
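
As a rough illustration of the two commands above, the following shell sketch maps a Spark runtime version to the -Dspark value and prints the matching Maven command without running it. The function names spark_build_flag and sedona_build_cmd are hypothetical helpers, not part of Sedona. The version ranges follow the description above: 3.0 to 3.3 share one build, and 3.4 has its own.

```shell
#!/bin/sh
# Map a Spark runtime version to the -Dspark value from the PR description.
# Assumption: 3.0 through 3.3 share the "3.0" build; 3.4 has a separate build.
spark_build_flag() {
  case "$1" in
    3.0|3.0.*|3.1|3.1.*|3.2|3.2.*|3.3|3.3.*) echo "3.0" ;;
    3.4|3.4.*) echo "3.4" ;;
    *) echo "unsupported Spark version: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the Maven command for a Spark version and Scala version.
# The Scala version defaults to 2.12, matching the default build described above.
sedona_build_cmd() {
  flag=$(spark_build_flag "$1") || return 1
  echo "mvn clean install -Dspark=${flag} -Dscala=${2:-2.12}"
}

sedona_build_cmd 3.2.1
sedona_build_cmd 3.4.0 2.12
```

Printing the command instead of invoking Maven keeps the sketch safe to run outside a Sedona checkout; the output can be copied into a terminal at the repository root.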

Implementation

This patch divided sedona-sql into several modules:

  • sedona-sql-common: common Sedona SQL code compatible with all Spark versions
  • sedona-sql: depends on sedona-sql-common and contains code specific to individual Spark minor versions. This code lives in sql/spark-3.x directories, and a Maven profile selects which directory is built into artifacts.

Tuned Dependency Management

We've also tuned the dependency management of profile-guided dependencies. These dependencies are now declared explicitly in the submodules instead of relying on the dependencyManagement section of the parent POM, and their versions are replaced with constants in the published POMs.
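
To make "replaced with constants" concrete, here is a toy shell sketch. It assumes the profile-selected version is carried in a Maven property named spark.version; that name and the dependency coordinates are assumptions for illustration, since the PR text does not name them. It shows the intended effect on a published POM, not the mechanism the build actually uses.

```shell
#!/bin/sh
# Hypothetical dependency entry as it might appear in a submodule POM.
# The property name spark.version is an assumption, not taken from the PR.
pom_fragment='<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>${spark.version}</version>
</dependency>'

# Replace the ${spark.version} placeholder with a fixed version string.
resolve_spark_version() {
  printf '%s\n' "$1" | sed 's/[$]{spark[.]version}/'"$2"'/g'
}

resolve_spark_version "$pom_fragment" 3.4.0
```

A consumer reading the resolved fragment would see a fixed version and would not need to activate any build profile to interpret it.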

How was this patch tested?

Added Spark 3.4 tests to GitHub Workflow.

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

@Kontinuation Kontinuation changed the title from "[SEDONA-276] Support Spark 3.4" to "[SEDONA-276, SEDONA-277] Support Spark 3.4" Apr 27, 2023
@Kontinuation Kontinuation changed the title from "[SEDONA-276, SEDONA-277] Support Spark 3.4" to "[SEDONA-276][SEDONA-277] Support Spark 3.4" Apr 27, 2023
@Kontinuation Kontinuation marked this pull request as ready for review April 27, 2023 02:21
@Kontinuation Kontinuation added the attention needed, dependencies, and github-actions labels Apr 27, 2023
@jiayuasu
Member

@umartin @Kimahriman Martin and Adam, any comments on this since this introduces changes to the underlying POM design?

Contributor

@Kimahriman Kimahriman left a comment

Looks fine to me. Is it worth dropping the tests for 3.0? Or running the cross Scala/JDK tests only for the latest version and dropping them for 3.3?

@umartin
Contributor

umartin commented Apr 28, 2023

Looks good to me. Nice work!

@jiayuasu jiayuasu added this to the sedona-1.4.1 milestone Apr 30, 2023
@jiayuasu
Member

I agree with Adam. We can drop the cross Scala/JDK tests for 3.3 and run them only for 3.4. Likewise, the Python tests should cover multiple Python versions only for Spark 3.4.

Just a heads-up: by default this PR still builds Sedona-Spark against Spark 3.3 (not 3.4). We should change it to Spark 3.4 after we release Sedona 1.4.1. As a maintenance release, Sedona 1.4.1 is not supposed to change the underlying Spark version.

Regarding dropping support for Spark 3.0, I will start a discussion on our mailing list to gauge the impact. If we'd like to proceed, it should be done in another PR.

@jiayuasu jiayuasu merged commit 56239fd into apache:master May 1, 2023
39 checks passed
@Kontinuation Kontinuation deleted the support-spark-3.4 branch August 23, 2023 15:05
Labels
attention needed, dependencies, github-actions, improvement

4 participants