feat(spark): Drop Apache Spark 3.3 integration support #18786
Closes apache#18784

After this change, the minimum supported Spark version is 3.4.

- Delete hudi-spark-datasource/hudi-spark3.3.x module
- Remove spark3.3 Maven profile and spark33.version property from root pom
- Drop Spark 3.3 jobs from .asf.yaml, GitHub workflows, release scripts, and bundle validation (ci_run.sh, run_docker_java17.sh, Dockerfile)
- Remove isSpark3_3 / gteqSpark3_3_2 helpers from SparkVersionsSupport
- Remove Spark3_3Adapter and Spark33* fallback branches in SparkAdapterSupport and HoodieAnalysis (now throw IllegalStateException for unsupported versions)
- Update READMEs, PySpark quickstart, and bundle-validation README to refer to Spark 3.4+ examples
- Clean up dead Spark 3.3 branches in tests

Follow-up cleanup (simplifying logic where 3.4+ is now the minimum, e.g., inlining gteqSpark3_3_2 checks, removing historical "borrowed from Spark 3.3" comments) is intentionally out of scope for this PR.
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR drops Apache Spark 3.3 integration support by removing the hudi-spark3.3.x module, the spark3.3 Maven profile, the associated version helpers, and the Spark 3.3 conditional branches in HoodieAnalysis, SparkAdapterSupport, tests, and bundle-validation docs. The remaining version-dispatch chains correctly cover Spark 3.4+, and no out-of-scope callers reference the removed helpers. No correctness issues found. There are a few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here.

One naming nit below: the timestampNTZCompatibility wrapper method is now a no-op, which makes its name misleading; everything else is a clean removal. A few IllegalStateException messages in HoodieAnalysis.scala drop the actual version number that SparkAdapterSupport.scala includes, making those errors harder to diagnose.
cc @yihua
- Inline timestampNTZCompatibility wrapper at its call sites (no longer needed without the Spark 3.3 quirk) and drop the helper interface
- Replace IllegalStateException fallbacks in HoodieAnalysis and SparkAdapterSupport with a Spark 3.4 default branch for brevity
- Remove the now-dead spark-3.2 conditional in bundle-validation Dockerfile
- Restore Flink 1.17 bundle validation by bumping to Spark 3.5.1 in .asf.yaml, bot.yml, maven_artifact_validation.yml, release_candidate_validation.yml
- ci_run.sh: branch on FLINK_PROFILE for scala-2.13 + spark3.5.1 so the Flink 1.19 + Spark 3.5.1 + scala 2.13 matrix entry actually uses the flink1190hive313spark351scala213 image
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #18786 +/- ##
============================================
+ Coverage 68.22% 68.90% +0.68%
+ Complexity 29290 29076 -214
============================================
Files 2525 2509 -16
Lines 141733 139442 -2291
Branches 17614 17107 -507
============================================
- Hits 96698 96089 -609
+ Misses 37065 35599 -1466
+ Partials 7970 7754 -216
CTTY
left a comment
LGTM in general! Have some minor comments
"org.apache.spark.sql.hudi.Spark33ResolveHudiAlterTableCommand"
} else {
throw new IllegalStateException("Unsupported Spark version")
"org.apache.spark.sql.hudi.Spark34ResolveHudiAlterTableCommand"
We are silently falling back to Spark 3.4 here. I think we should still throw the exception.
I cleaned this up to use only if-else branches to be consistent across the board, since we now compile against Spark 3.5 and 3.4. Having throw new IllegalStateException("Unsupported Spark version") is redundant.
else
  IMAGE_TAG=flink1200hive313spark351scala213
  FLINK_VERSION=1.20.1
fi
Flink 1.19 was missed in the bundle validation before this change; thus, I'm adding it back.
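As a rough illustration of the branch discussed here, the selection might look like the sketch below. It assumes the script is already on the scala-2.13 + Spark 3.5.1 path. The function name `select_bundle_image` and the profile string `flink1.19` are hypothetical, and the Flink 1.19 patch version is left as a placeholder because the thread does not state it. Only the two image tags and `FLINK_VERSION=1.20.1` come from the PR itself.

```shell
#!/bin/sh
# Sketch only: not the actual contents of ci_run.sh.
# select_bundle_image and the "flink1.19" profile string are hypothetical names.
select_bundle_image() {
  flink_profile="$1"
  if [ "$flink_profile" = "flink1.19" ]; then
    # Image tag named in the PR for the Flink 1.19 + Spark 3.5.1 + scala 2.13 entry.
    IMAGE_TAG=flink1190hive313spark351scala213
    # Placeholder: the exact Flink 1.19 patch version is not given in this thread.
    FLINK_VERSION=1.19.x
  else
    # Default branch, as shown in the reviewed diff fragment.
    IMAGE_TAG=flink1200hive313spark351scala213
    FLINK_VERSION=1.20.1
  fi
}

# Fall back to the default branch when FLINK_PROFILE is unset.
select_bundle_image "${FLINK_PROFILE:-flink1.20}"
echo "IMAGE_TAG=${IMAGE_TAG} FLINK_VERSION=${FLINK_VERSION}"
```

Without the first branch, every profile would fall through to the Flink 1.20 image, which is how a matrix entry can appear to pass while never validating the Flink 1.19 bundle.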
Describe the issue this Pull Request addresses
Closes #18784
Drops the hudi-spark3.3.x module and spark3.3 Maven profile. After this change, the minimum supported Spark version is 3.4. Spark 3.3 is end-of-life upstream and maintaining the adapter blocks simplifications in shared Spark code.
Summary and Changelog
- Delete the hudi-spark-datasource/hudi-spark3.3.x/ module and its sources.
- Remove the spark3.3 Maven profile and spark33.version property from the root pom.xml.
- Drop Spark 3.3 jobs from .asf.yaml, .github/workflows/bot.yml, release_candidate_validation.yml, maven_artifact_validation.yml.
- Drop Spark 3.3 from release scripts and bundle validation (deploy_staging_jars.sh, validate_staged_bundles.sh, ci_run.sh, run_docker_java17.sh, the Spark 3.3.4 base image build script, Dockerfile).
- Remove the isSpark3_3 / gteqSpark3_3_2 helpers and the Spark3_3Adapter / Spark33* fallback branches in SparkAdapterSupport and HoodieAnalysis. Unsupported Spark versions now throw IllegalStateException.
- Update hudi-spark-datasource/README.md, the PySpark quickstart README, and the bundle-validation README to refer to Spark 3.4+ examples.
- Clean up dead Spark 3.3 branches in TestHoodieSparkUtils, TestCOWDataSource, TestMORDataSource, TestMergeIntoTable2, TestHoodieDeltaStreamer, and TestMercifulJsonToRowConverterBase.
Follow-up cleanup (simplifying logic where 3.4+ is now the minimum, e.g., inlining gteqSpark3_3_2 checks that are now always true, removing historical "borrowed from Spark 3.3" comments, etc.) is intentionally out of scope for this PR.
Impact
Breaking change: Spark 3.3 users must upgrade to Spark 3.4 or later to use Hudi master. No data-format or wire-protocol changes.
Risk Level
low — purely deletion of a Spark version path. Remaining Spark 3.4/3.5/4.0/4.1 CI matrices cover the supported versions.
Documentation Update
- Update the README.md Maven build options table.
- Update the hudi-spark-datasource/README.md module and version-support tables.
- Update hudi-examples/.../python/README.md and the HoodiePySparkQuickstart.py example to use spark3.5.
- Update packaging/bundle-validation/README.md to use the flink1181hive313spark343 example.
Contributor's checklist