@jiayuasu jiayuasu commented Nov 23, 2025

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

This pull request upgrades the Sedona Docker environment to support Apache Spark 4.0.1, Sedona 1.8.0, and related dependencies, while improving compatibility and notebook testing. The changes update the CI workflow, Docker build scripts, Python requirements, and documentation to use the latest versions and ensure tests run reliably in local mode.

Dependency and environment upgrades:

  • Updated .github/workflows/docker-build.yml, docker/sedona-docker.dockerfile, and related scripts to use Spark 4.0.1, Sedona 1.8.0, GeoTools 33.1, and Spark extension 2.14.2, with Scala 2.13 and Ubuntu 24.04. Also switched the Java version to 17 and adjusted Maven, pipenv, and Python dependency installation for compatibility.

  • Updated docker/requirements.txt to use newer versions of geopandas, numpy, pandas, and shapely, and added py4j for Spark 4 compatibility.

Build and installation script improvements:

  • Updated docker/install-spark.sh and docker/install-sedona.sh to use AWS SDK v2 and drop the spark-xml dependency, aligning with the new Spark extension requirements and simplifying the installation process.

Notebook testing enhancements:

  • Added docker/test-notebooks.sh and integrated notebook testing into the CI workflow, ensuring all example notebooks are converted, cleaned, and executed in local mode with robust error handling and output reporting.
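
For readers who want a feel for the convert, clean, and execute flow described above, here is a minimal Python sketch. It is not the contents of docker/test-notebooks.sh (which is a shell script); the function names `clean_converted_script` and `run_scripts` are hypothetical, and it assumes the notebooks were already converted to `.py` files (for example with `jupyter nbconvert --to script`), where IPython magics show up as `get_ipython()` calls.

```python
import subprocess
import sys
from pathlib import Path


def clean_converted_script(source: str) -> str:
    """Neutralize IPython-only lines so a converted notebook runs under plain Python.

    Assumption: "cleaning" means removing the get_ipython() calls that
    nbconvert emits for magics; the real script may do more or less.
    """
    kept = []
    for line in source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("get_ipython()"):
            # Keep the indentation and substitute `pass` so an enclosing
            # block never ends up empty.
            indent = line[: len(line) - len(stripped)]
            kept.append(f"{indent}pass  # IPython-only call removed")
        else:
            kept.append(line)
    return "\n".join(kept) + "\n"


def run_scripts(paths, timeout=600):
    """Run each script with the current interpreter and collect failures.

    Returns a list of (path, reason) tuples; an empty list means all passed.
    """
    failures = []
    for path in paths:
        try:
            result = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            failures.append((str(path), f"timed out after {timeout}s"))
            continue
        if result.returncode != 0:
            failures.append((str(path), result.stderr.strip()))
    return failures
```

Collecting failures instead of stopping at the first one lets a CI step report every broken notebook in a single run and then exit non-zero if the list is not empty.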

Documentation updates:

  • Updated example notebooks in docs/usecases/ to use the new Spark extension and dependency versions, and added configuration for anonymous S3 access where needed.
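
As a rough illustration of the notebook configuration this refers to, the sketch below builds the Spark settings as a plain dictionary. The helper name `notebook_spark_conf` is hypothetical, and the Maven group `uk.co.gresearch.spark` is an assumption; only the `2.13:2.14.2-4.0` suffix comes from this PR. The credentials provider class is the anonymous provider shipped with Hadoop's S3A connector, and the per-bucket key form relies on S3A's per-bucket configuration.

```python
SCALA_VERSION = "2.13"
SPARK_EXTENSION_VERSION = "2.14.2-4.0"  # version string referenced in this PR


def notebook_spark_conf(bucket=None):
    """Return Spark settings for the extension package and anonymous S3 reads.

    If `bucket` is given, anonymous access is scoped to that bucket only;
    otherwise it applies to every s3a:// path.
    """
    # Assumption: the package lives under the uk.co.gresearch.spark group.
    package = (
        f"uk.co.gresearch.spark:spark-extension_{SCALA_VERSION}:"
        f"{SPARK_EXTENSION_VERSION}"
    )
    if bucket:
        provider_key = f"spark.hadoop.fs.s3a.bucket.{bucket}.aws.credentials.provider"
    else:
        provider_key = "spark.hadoop.fs.s3a.aws.credentials.provider"
    return {
        "spark.jars.packages": package,
        provider_key: "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
    }
```

In a notebook, each key and value would typically be passed to the session builder with `.config(key, value)` before the session is created.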

Miscellaneous improvements:

  • Minor fixes to notebook execution counts and code cells for consistency and compatibility.

How was this patch tested?

Did this PR include necessary documentation updates?

  • No, this PR does not affect any public API, so there is no need to change the documentation.

@github-actions github-actions bot added the docs label Nov 24, 2025
@jiayuasu jiayuasu force-pushed the fix-image-building branch 2 times, most recently from f0e6ab4 to 912376f Compare November 24, 2025 03:03
@jiayuasu jiayuasu requested a review from Copilot November 24, 2025 03:05
Copilot AI left a comment

Pull request overview

This PR upgrades Apache Sedona's Docker environment from Spark 3.x to Spark 4.0.1, Sedona from 1.7.1 to 1.8.0, and modernizes all related dependencies. The changes prepare the project for the latest major versions while ensuring backward compatibility through updated configurations and automated testing.

Key changes:

  • Upgraded to Spark 4.0.1, Sedona 1.8.0, and GeoTools 33.1 with Scala 2.13
  • Modernized base Docker image to Ubuntu 24.04 and Java 17
  • Added automated notebook testing in CI to validate example notebooks

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .github/workflows/docker-build.yml | Updated CI matrix to Spark 4.0.1, Java 17, and added notebook testing step |
| docker/sedona-docker.dockerfile | Upgraded base image to Ubuntu 24.04, Java 17, Spark 4.0.1, and switched to AWS SDK v2 |
| docker/requirements.txt | Updated Python dependencies (geopandas, numpy, pandas, shapely) and added py4j |
| docker/install-spark.sh | Removed spark-xml jar and switched from AWS SDK v1 to v2 bundle |
| docker/install-sedona.sh | Changed Sedona/spark-extension downloads from Scala 2.12 to 2.13 |
| docker/build.sh | Updated Maven build to use Scala 2.13 |
| docker/test-notebooks.sh | New automated test script to validate Jupyter notebooks |
| docker/zeppelin/conf/interpreter.json | Updated Sedona and GeoTools JAR references to new versions |
| docs/usecases/*.ipynb | Updated spark-extension package references to 2.13:2.14.2-4.0 across all example notebooks |

@jiayuasu jiayuasu marked this pull request as ready for review November 24, 2025 04:47
@jiayuasu jiayuasu requested a review from Copilot November 24, 2025 04:47
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.



@jiayuasu jiayuasu added this to the sedona-1.8.1 milestone Nov 24, 2025
@jiayuasu jiayuasu merged commit eee44b5 into master Nov 24, 2025
10 checks passed
@jiayuasu jiayuasu deleted the fix-image-building branch December 17, 2025 05:54
Development

Successfully merging this pull request may close these issues.

sedona:latest Docker image has dated packages
