Spark 3.4: Backport Async Micro Batch Planner to 3.4#16311

Merged
kevinjqliu merged 1 commit into
apache:mainfrom
kevinjqliu:spark-3.4-async-microbatch
May 13, 2026
Conversation

Contributor

@kevinjqliu kevinjqliu commented May 13, 2026

Backport of #15992 to spark/v3.4.

Adaptations from the source PR

  • SparkMicroBatchStream.java was replaced wholesale with the v3.5 post-#15992 version because v3.4 had structural drift relative to v3.5. The refactor extracts the planning logic into the new planner classes (BaseSparkMicroBatchPlanner, SyncSparkMicroBatchPlanner, AsyncSparkMicroBatchPlanner, MicroBatchUtils) and there are no v3.4-only features in this file, so a verbatim copy keeps the implementations aligned.
  • TestStructuredStreamingRead3.java was likewise replaced with the v3.5 version (which adds parameterized sync/async coverage). The only non-mechanical change was using SparkCatalogConfig.SPARK instead of SparkCatalogConfig.SPARK_SESSION because v3.4 still uses the older enum name.
  • All other files in #15992 applied cleanly via git apply --3way.
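
For readers unfamiliar with the git apply --3way step mentioned above, here is a minimal sketch of the general technique in a throwaway repository. It is not the exact command sequence used for this PR, which the description does not show. The v3.5/ and v3.4/ directories, the Planner.txt file, and the sed path rewrite are all hypothetical stand-ins chosen for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository used only for this demonstration.
work=$(mktemp -d)
cd "$work"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Hypothetical layout: two version directories that start out identical.
mkdir -p v3.5 v3.4
printf 'line1\nline2\nline3\n' > v3.5/Planner.txt
cp v3.5/Planner.txt v3.4/Planner.txt
git add .
git commit -q -m "base"

# A change that lands only in the v3.5 copy.
printf 'line1\nline2 changed\nline3\n' > v3.5/Planner.txt
git commit -q -am "change v3.5"

# Export that change, rewrite its paths to point at v3.4, and apply it.
# --3way lets git fall back to a three-way merge using the blob ids
# recorded in the patch, and it also stages the result in the index.
git diff HEAD~1 HEAD -- v3.5 | sed 's#v3\.5/#v3.4/#g' | git apply --3way
```

When the two copies have drifted too far for the patch to apply or merge, git apply --3way leaves conflict markers instead, which is the situation where replacing a file wholesale (as done for SparkMicroBatchStream.java) can be the simpler option.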

Validation

  • ./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:classes :iceberg-spark:iceberg-spark-3.4_2.12:testClasses
  • ./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "*TestAsyncSparkMicroBatchPlanner*" --tests "*TestMicroBatchPlanningUtils*"
  • ./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "org.apache.iceberg.spark.source.TestStructuredStreamingRead3.testReadStreamOnIcebergTableWithMultipleSnapshots"
  • ./gradlew :iceberg-spark:iceberg-spark-3.4_2.12:spotlessApply :iceberg-spark:iceberg-spark-extensions-3.4_2.12:spotlessApply

@github-actions github-actions Bot added the spark label May 13, 2026
Backport of apache#15992 to spark/v3.4. Stacked on PR apache#16307 (apache#15683 SerializableFileIOWithSize), which is itself a backport.

Adaptations from the source PR:

- SparkMicroBatchStream.java was replaced wholesale with the v3.5 post-apache#15992 version because v3.4 had structural drift; the refactor extracts the planning logic into the new planner classes and there are no v3.4-only features in this file.

- TestStructuredStreamingRead3.java was likewise replaced with the v3.5 version (which adds parameterized sync/async coverage). The only non-mechanical change is using 'SparkCatalogConfig.SPARK' instead of 'SparkCatalogConfig.SPARK_SESSION', because v3.4 still uses the older enum name.
@kevinjqliu kevinjqliu force-pushed the spark-3.4-async-microbatch branch from efa8943 to 178fb2d Compare May 13, 2026 02:04
@kevinjqliu kevinjqliu requested a review from huaxingao May 13, 2026 02:12
Contributor

@huaxingao huaxingao left a comment

LGTM. Pending CI

Contributor

@aihuaxu aihuaxu left a comment

LGTM.

@kevinjqliu kevinjqliu merged commit e57247b into apache:main May 13, 2026
27 checks passed
@kevinjqliu kevinjqliu deleted the spark-3.4-async-microbatch branch May 13, 2026 03:29
3 participants