
feat(spark): Drop Apache Spark 3.3 integration support#18786

Merged
CTTY merged 3 commits into apache:master from yihua:drop-spark33-integration
May 21, 2026

Conversation

Contributor

@yihua yihua commented May 20, 2026

Describe the issue this Pull Request addresses

Closes #18784

Drops the hudi-spark3.3.x module and spark3.3 Maven profile. After this change, the minimum supported Spark version is 3.4. Spark 3.3 is end-of-life upstream and maintaining the adapter blocks simplifications in shared Spark code.

Summary and Changelog

  • Delete hudi-spark-datasource/hudi-spark3.3.x/ module and its sources.
  • Remove the spark3.3 Maven profile and spark33.version property from the root pom.xml.
  • Drop Spark 3.3 jobs from .asf.yaml, .github/workflows/bot.yml, release_candidate_validation.yml, maven_artifact_validation.yml.
  • Drop Spark 3.3 handling from release/bundle scripts (deploy_staging_jars.sh, validate_staged_bundles.sh, ci_run.sh, run_docker_java17.sh, the Spark 3.3.4 base image build script, Dockerfile).
  • Remove isSpark3_3 / gteqSpark3_3_2 helpers and the Spark3_3Adapter / Spark33* fallback branches in SparkAdapterSupport and HoodieAnalysis. Unsupported Spark versions now throw IllegalStateException.
  • Update root README, hudi-spark-datasource/README.md, the PySpark quickstart README, and the bundle-validation README to refer to Spark 3.4+ examples.
  • Remove dead Spark 3.3 branches in TestHoodieSparkUtils, TestCOWDataSource, TestMORDataSource, TestMergeIntoTable2, TestHoodieDeltaStreamer, and TestMercifulJsonToRowConverterBase.
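The effect of removing the Spark 3.3 helpers and fallback branches can be pictured with a small, self-contained sketch. This is not Hudi's actual `SparkAdapterSupport` code: `AdapterDispatchSketch`, `majorMinor`, `adapterClassFor`, and the `example.*` class-name strings are hypothetical names chosen for illustration. Only the behavior is taken from the description above: Spark 3.4 is the minimum, and an unsupported version results in an `IllegalStateException`.

```scala
// Hypothetical sketch; names are illustrative and not Hudi's real API.
object AdapterDispatchSketch {
  // Parse "major.minor[.patch]" into a (major, minor) pair.
  def majorMinor(version: String): (Int, Int) = {
    val parts = version.split("\\.")
    require(parts.length >= 2, s"Malformed Spark version: $version")
    (parts(0).toInt, parts(1).toInt)
  }

  // Map a Spark version to a (made-up) adapter class name.
  // There is no (3, 3) case any more, so Spark 3.3 falls through to the error.
  def adapterClassFor(sparkVersion: String): String =
    majorMinor(sparkVersion) match {
      case (3, 4) => "example.Spark3_4Adapter"
      case (3, 5) => "example.Spark3_5Adapter"
      case (4, 0) => "example.Spark4_0Adapter"
      case (4, 1) => "example.Spark4_1Adapter"
      case _ =>
        throw new IllegalStateException(s"Unsupported Spark version: $sparkVersion")
    }
}
```

Under these assumptions, `AdapterDispatchSketch.adapterClassFor("3.5.1")` resolves to the 3.5 entry, while `"3.3.4"` fails fast instead of being routed to a removed adapter.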

Follow-up cleanup (simplifying logic where 3.4+ is now the minimum, e.g., inlining gteqSpark3_3_2 checks that are now always true, removing historical "borrowed from Spark 3.3" comments, etc.) is intentionally out of scope for this PR.

Impact

Breaking change: Spark 3.3 users must upgrade to Spark 3.4 or later to use Hudi master. No data-format or wire-protocol changes.

Risk Level

low — purely deletion of a Spark version path. Remaining Spark 3.4/3.5/4.0/4.1 CI matrices cover the supported versions.

Documentation Update

  • Updated README.md Maven build options table.
  • Updated hudi-spark-datasource/README.md module and version-support tables.
  • Updated hudi-examples/.../python/README.md and HoodiePySparkQuickstart.py example to use spark3.5.
  • Updated packaging/bundle-validation/README.md to use the flink1181hive313spark343 example.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable (N/A — coverage moves to the remaining Spark profiles)

Closes apache#18784

After this change, the minimum supported Spark version is 3.4.

- Delete hudi-spark-datasource/hudi-spark3.3.x module
- Remove spark3.3 Maven profile and spark33.version property from root pom
- Drop Spark 3.3 jobs from .asf.yaml, GitHub workflows, release scripts,
  and bundle validation (ci_run.sh, run_docker_java17.sh, Dockerfile)
- Remove isSpark3_3 / gteqSpark3_3_2 helpers from SparkVersionsSupport
- Remove Spark3_3Adapter and Spark33* fallback branches in SparkAdapterSupport
  and HoodieAnalysis (now throw IllegalStateException for unsupported versions)
- Update READMEs, PySpark quickstart, and bundle-validation README to refer
  to Spark 3.4+ examples
- Clean up dead Spark 3.3 branches in tests

Follow-up cleanup (simplifying logic where 3.4+ is now the minimum, e.g.,
inlining gteqSpark3_3_2 checks, removing historical "borrowed from Spark
3.3" comments) is intentionally out of scope for this PR.
@github-actions github-actions Bot added the size:XL PR with lines of changes > 1000 label May 20, 2026
Contributor

@hudi-agent hudi-agent left a comment


🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR drops Apache Spark 3.3 integration support by removing the hudi-spark3.3.x module, the spark3.3 Maven profile, the associated version helpers, and the Spark 3.3 conditional branches in HoodieAnalysis, SparkAdapterSupport, tests, and bundle-validation docs. The remaining version-dispatch chains correctly cover Spark 3.4+ and no out-of-scope callers reference the removed helpers. No correctness issues found.

There are a few style/readability suggestions in the inline comments:

  • One naming nit: the timestampNTZCompatibility wrapper method is now a no-op, which makes its name misleading. Everything else is a clean removal.
  • A few IllegalStateException messages in HoodieAnalysis.scala drop the actual version number that SparkAdapterSupport.scala includes, which makes those errors harder to diagnose.

Please take a look. After that, this should be ready for a Hudi committer or PMC member to take it from here.

cc @yihua
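The diagnosability point in the review above can be illustrated with a short sketch. `ErrorMessageSketch`, `terse`, and `detailed` are hypothetical names, not part of Hudi. The only claim taken from the review is that including the detected version in the exception message makes the error easier to diagnose.

```scala
// Hypothetical sketch; neither method exists in Hudi.
object ErrorMessageSketch {
  // Without the version, the reader cannot tell which Spark was detected.
  def terse(): IllegalStateException =
    new IllegalStateException("Unsupported Spark version")

  // With the version, the message alone identifies the offending runtime.
  def detailed(sparkVersion: String): IllegalStateException =
    new IllegalStateException(s"Unsupported Spark version: $sparkVersion")
}
```

For example, `ErrorMessageSketch.detailed("3.3.4").getMessage` names the version that triggered the failure, whereas the terse variant leaves it to be recovered from the surrounding logs.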

@yihua yihua changed the title from "fix: Drop Apache Spark 3.3 integration support" to "feat(spark): Drop Apache Spark 3.3 integration support" May 20, 2026
Comment thread .github/workflows/bot.yml
Comment thread .github/workflows/maven_artifact_validation.yml
Comment thread .github/workflows/release_candidate_validation.yml
Comment thread packaging/bundle-validation/base/build_flink1171hive313spark334.sh
Comment thread packaging/bundle-validation/ci_run.sh
Comment thread packaging/bundle-validation/Dockerfile Outdated
yihua added 2 commits May 20, 2026 15:39
- Inline timestampNTZCompatibility wrapper at its call sites (no longer
  needed without the Spark 3.3 quirk) and drop the helper interface
- Replace IllegalStateException fallbacks in HoodieAnalysis and
  SparkAdapterSupport with a Spark 3.4 default branch for brevity
- Remove the now-dead spark-3.2 conditional in bundle-validation Dockerfile
- Restore Flink 1.17 bundle validation by bumping to Spark 3.5.1 in
  .asf.yaml, bot.yml, maven_artifact_validation.yml,
  release_candidate_validation.yml
- ci_run.sh: branch on FLINK_PROFILE for scala-2.13 + spark3.5.1 so the
  Flink 1.19 + Spark 3.5.1 + scala 2.13 matrix entry actually uses the
  flink1190hive313spark351scala213 image
@yihua yihua requested a review from CTTY May 20, 2026 23:16
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.90%. Comparing base (f044d3d) to head (7ec563a).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18786      +/-   ##
============================================
+ Coverage     68.22%   68.90%   +0.68%     
+ Complexity    29290    29076     -214     
============================================
  Files          2525     2509      -16     
  Lines        141733   139442    -2291     
  Branches      17614    17107     -507     
============================================
- Hits          96698    96089     -609     
+ Misses        37065    35599    -1466     
+ Partials       7970     7754     -216     
Flag Coverage Δ
common-and-other-modules 44.42% <100.00%> (+0.04%) ⬆️
hadoop-mr-java-client 44.91% <ø> (-0.05%) ⬇️
spark-client-hadoop-common 48.23% <100.00%> (-0.04%) ⬇️
spark-java-tests 49.36% <100.00%> (+0.51%) ⬆️
spark-scala-tests 45.26% <100.00%> (+0.31%) ⬆️
utilities 37.44% <100.00%> (-0.03%) ⬇️


Files with missing lines Coverage Δ
.../main/scala/org/apache/hudi/HoodieSparkUtils.scala 74.61% <ø> (-0.27%) ⬇️
...in/scala/org/apache/hudi/SparkAdapterSupport.scala 69.23% <100.00%> (+2.56%) ⬆️
...pache/spark/sql/hudi/analysis/HoodieAnalysis.scala 74.11% <100.00%> (+0.15%) ⬆️

... and 35 files with indirect coverage changes


@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Contributor

@CTTY CTTY left a comment


LGTM in general! Have some minor comments

"org.apache.spark.sql.hudi.Spark33ResolveHudiAlterTableCommand"
} else {
throw new IllegalStateException("Unsupported Spark version")
"org.apache.spark.sql.hudi.Spark34ResolveHudiAlterTableCommand"
Contributor


We are silently falling back to spark 34 here. I think we should still throw the exception

Contributor Author


I cleaned this up to use only if-else branches to be consistent across the board, since we now compile against Spark 3.5 and 3.4. Having throw new IllegalStateException("Unsupported Spark version") is redundant.
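The two positions in this thread can be contrasted in a small sketch. It is deliberately simplified to two versions, and `FallbackSketch`, `strict`, `withDefault`, and the `resolver-for-*` strings are hypothetical placeholders rather than Hudi identifiers. It shows only the structural difference between an explicit failure branch and a final `else` that doubles as the Spark 3.4 branch.

```scala
// Hypothetical sketch of the two styles discussed in this thread.
object FallbackSketch {
  // Reviewer's preference: each supported version is named explicitly,
  // and anything else fails fast.
  def strict(isSpark3_5: Boolean, isSpark3_4: Boolean): String =
    if (isSpark3_5) "resolver-for-3.5"
    else if (isSpark3_4) "resolver-for-3.4"
    else throw new IllegalStateException("Unsupported Spark version")

  // Author's choice: the final else is the Spark 3.4 branch, so no
  // separate failure path exists.
  def withDefault(isSpark3_5: Boolean): String =
    if (isSpark3_5) "resolver-for-3.5" else "resolver-for-3.4"
}
```

In the `withDefault` style, a version that matches no earlier check reaches the 3.4 branch, which is the silent fallback the reviewer points out. The author's argument is that the build only targets supported versions, so the extra failure branch would be redundant.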

else
IMAGE_TAG=flink1200hive313spark351scala213
FLINK_VERSION=1.20.1
fi
Contributor


why is this change needed?

Contributor Author


Flink 1.19 was missed in the bundle validation before this change; thus, I'm adding it back.

Contributor

@CTTY CTTY left a comment


LGTM!

@CTTY CTTY merged commit facb517 into apache:master May 21, 2026
57 checks passed

Labels

size:XL PR with lines of changes > 1000


Development

Successfully merging this pull request may close these issues.

Drop Apache Spark 3.3 integration support

5 participants