[GH-2489] Update the old dependencies in Sedona docker image #2518
Pull request overview
This PR upgrades Apache Sedona's Docker environment from Spark 3.x to Spark 4.0.1, Sedona from 1.7.1 to 1.8.0, and modernizes all related dependencies. The changes prepare the project for the latest major versions while ensuring backward compatibility through updated configurations and automated testing.
Key changes:
- Upgraded to Spark 4.0.1, Sedona 1.8.0, and GeoTools 33.1 with Scala 2.13
- Modernized base Docker image to Ubuntu 24.04 and Java 17
- Added automated notebook testing in CI to validate example notebooks
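The notebook testing step can be pictured with a rough sketch. The PR description says the notebooks are converted, cleaned, and executed in local mode; the Python below is only an illustration of that flow, not the contents of `docker/test-notebooks.sh`. The helper names (`clean_script`, `force_local_master`, `run_notebook`) are hypothetical, and the cleaning rules are assumptions about what "cleaned" might involve.

```python
import re
import subprocess
import sys
import tempfile

# Hypothetical helpers: the real docker/test-notebooks.sh may work differently.

def clean_script(source: str) -> str:
    """Replace IPython-only lines with `pass` so the script runs under plain Python.

    Assumption: lines starting with % or ! are magics or shell escapes. A
    continuation line that starts with the modulo operator would be a false
    positive.
    """
    kept = []
    for line in source.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(("%", "!")) or "get_ipython()" in stripped:
            indent = line[: len(line) - len(stripped)]
            # Keep the indentation so an enclosing block never becomes empty.
            kept.append(indent + "pass  # removed IPython-only line")
        else:
            kept.append(line)
    return "\n".join(kept) + "\n"


def force_local_master(source: str) -> str:
    """Point every literal .master("...") call at local mode."""
    return re.sub(r'\.master\(\s*(["\']).*?\1\s*\)', '.master("local[*]")', source)


def run_notebook(path: str) -> int:
    """Convert one notebook to a script, clean it, run it, and return the exit code."""
    converted = subprocess.run(
        ["jupyter", "nbconvert", "--to", "script", "--stdout", path],
        check=True, capture_output=True, text=True,
    ).stdout
    script = force_local_master(clean_script(converted))
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(script)
    return subprocess.run([sys.executable, tmp.name]).returncode
```

A CI step could call `run_notebook` for each file under `docs/usecases/` and fail the job on the first non-zero exit code.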
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/docker-build.yml | Updated CI matrix to Spark 4.0.1, Java 17, and added notebook testing step |
| docker/sedona-docker.dockerfile | Upgraded base image to Ubuntu 24.04, Java 17, Spark 4.0.1, and switched to AWS SDK v2 |
| docker/requirements.txt | Updated Python dependencies (geopandas, numpy, pandas, shapely) and added py4j |
| docker/install-spark.sh | Removed spark-xml jar and switched from AWS SDK v1 to v2 bundle |
| docker/install-sedona.sh | Changed Sedona/spark-extension downloads from Scala 2.12 to 2.13 |
| docker/build.sh | Updated Maven build to use Scala 2.13 |
| docker/test-notebooks.sh | New automated test script to validate Jupyter notebooks |
| docker/zeppelin/conf/interpreter.json | Updated Sedona and GeoTools JAR references to new versions |
| docs/usecases/*.ipynb | Updated spark-extension package references to 2.13:2.14.2-4.0 across all example notebooks |
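
To make the version bumps in the table concrete, here is a small sketch of how the package coordinates might be assembled for `spark.jars.packages`. Only the version numbers and the `2.13:2.14.2-4.0` suffix come from this PR; the group IDs and artifact names below are assumptions based on how these projects are usually published, and the `Versions` class and `maven_packages` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versions:
    # Version numbers taken from the PR summary.
    spark_compat: str = "4.0"
    scala: str = "2.13"
    sedona: str = "1.8.0"
    geotools: str = "33.1"
    spark_extension: str = "2.14.2"


def maven_packages(v: Versions) -> list[str]:
    """Build Maven coordinates; group and artifact names are assumed, not from the PR."""
    return [
        f"org.apache.sedona:sedona-spark-shaded-{v.spark_compat}_{v.scala}:{v.sedona}",
        f"org.datasyslab:geotools-wrapper:{v.sedona}-{v.geotools}",
        f"uk.co.gresearch.spark:spark-extension_{v.scala}:{v.spark_extension}-{v.spark_compat}",
    ]


# Spark expects a comma-separated list in spark.jars.packages.
packages_value = ",".join(maven_packages(Versions()))
```

Keeping the versions in one place like this means a future upgrade touches a single definition instead of every notebook.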
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-2489] my subject. Closes sedona:latest Docker image has dated packages #2489
What changes were proposed in this PR?
This pull request upgrades the Sedona Docker environment to support Apache Spark 4.0.1, Sedona 1.8.0, and related dependencies, while improving compatibility and notebook testing. The changes update the CI workflow, Docker build scripts, Python requirements, and documentation to use the latest versions and ensure tests run reliably in local mode.
Dependency and environment upgrades:
- Updated `.github/workflows/docker-build.yml`, `docker/sedona-docker.dockerfile`, and related scripts to use Spark 4.0.1, Sedona 1.8.0, GeoTools 33.1, and Spark extension 2.14.2, with Scala 2.13 and Ubuntu 24.04. Also switched the Java version to 17 and adjusted Maven, pipenv, and Python dependency installation for compatibility.
- Updated `docker/requirements.txt` to use newer versions of `geopandas`, `numpy`, `pandas`, and `shapely`, and added `py4j` for Spark 4 compatibility.
Build and installation script improvements:
- `docker/install-spark.sh` and `docker/install-sedona.sh` now use AWS SDK v2 and drop the spark-xml dependency, aligning with the new Spark extension requirements and simplifying the installation process.
Notebook testing enhancements:
- Added `docker/test-notebooks.sh` and integrated notebook testing into the CI workflow, so that all example notebooks are converted, cleaned, and executed in local mode with error handling and output reporting.
Documentation updates:
- Updated the notebooks in `docs/usecases/` to use the new Spark extension and dependency versions, and added configuration for anonymous S3 access where needed.
Miscellaneous improvements:
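
For the anonymous S3 access mentioned under the documentation updates, the exact settings used in the notebooks are not shown in this description. The sketch below is one plausible form, assuming the Hadoop S3A connector and its `AnonymousAWSCredentialsProvider` class; the function names `anonymous_s3_conf` and `apply_conf` are hypothetical.

```python
from typing import Optional

# Assumption: the notebooks read public buckets through the Hadoop S3A connector.
ANONYMOUS_PROVIDER = "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"


def anonymous_s3_conf(bucket: Optional[str] = None) -> dict[str, str]:
    """Return Spark settings for unauthenticated S3A reads.

    With `bucket` set, the setting is scoped to that bucket through S3A's
    per-bucket configuration; otherwise it applies to every bucket.
    """
    scope = f"bucket.{bucket}." if bucket else ""
    return {f"spark.hadoop.fs.s3a.{scope}aws.credentials.provider": ANONYMOUS_PROVIDER}


def apply_conf(builder, conf: dict[str, str]):
    """Apply settings to any builder exposing .config(key, value), e.g. SparkSession.builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

Scoping the provider to a single bucket leaves credentials for other buckets untouched, which may matter if a notebook mixes public and private data.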
How was this patch tested?
Did this PR include necessary documentation updates?