[Spark Runner] Prepare Spark 3 structured-streaming to shared base, adopt Flink-style version overrides#38233
[Spark Runner] Prepare Spark 3 structured-streaming to shared base, adopt Flink-style version overrides#38233tkaymak wants to merge 1 commit intoapache:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request performs a structural refactor of the Spark runner to support a more scalable multi-version build architecture. By centralizing the core source code and implementing a layered override system for version-specific logic, the repository is now better prepared for the upcoming integration of Spark 4 without requiring extensive code duplication. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements a version-specific source override mechanism for Spark runners, similar to the existing Flink runner implementation. It introduces logic in spark_runner.gradle to merge shared base sources with version-specific overrides into a single directory for compilation. The review feedback suggests improving the robustness of version comparison to handle multi-digit version numbers, simplifying the override detection logic using list indices, and ensuring that local source directories are consistently included even for the base Spark version.
a7a019d to
b391eec
Compare
|
Force-pushed the updated commit after review comments by Gemini. |
There was a problem hiding this comment.
Code Review
This pull request introduces a per-version source override mechanism for Spark runners, enabling higher Spark versions to merge shared base code with version-specific overrides. Key changes involve updates to gradle.properties and the Spark runner's build configuration to manage merged source sets. A review comment recommends a more robust and idiomatic Gradle configuration by using TaskProvider directly within srcDirs to automatically manage task dependencies.
…pt Flink-style version overrides Move runners/spark/3/src/.../structuredstreaming/ (the only sources Spark 3 shipped) into the shared runners/spark/src/, and replace the existing copySourceBase block in runners/spark/spark_runner.gradle with the per-version source-overrides layout used by runners/flink/flink_runner.gradle: the lowest spark_major builds straight from the shared base; higher majors get a Copy task with DuplicatesStrategy.INCLUDE that merges shared + previous majors + ./src so per-version files override. Pure refactor; Spark 3's compiled output is unchanged. Prepares the tree for the Spark 4 runner (apache#36841 / apache#38212), which lands as a small overrides layer on top.
b391eec to
b6629d1
Compare
|
Followed the last open comment by @gemini-code-assist — applied the fix. |
|
Thanks for the update, @tkaymak. The revised Gradle configuration looks much cleaner and correctly leverages Gradle's task dependency auto-wiring. It's great to see the removal of the manual |
|
assign set of reviewers |
|
Assigning reviewers: R: @Abacn added as fallback since no labels match configuration Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
What
Pure refactor of
runners/spark/. Two changes:runners/spark/3/src/.../structuredstreaming/(the only sources Spark 3 ships) into the sharedrunners/spark/src/.runners/spark/3/src/is removed entirely.copySourceBase/useCopiedSourceSetblock inrunners/spark/spark_runner.gradlewith the per-version source-overrides layering already used byrunners/flink/flink_runner.gradle:spark_major(currently3) builds straight from the shared base;Copytask withDuplicatesStrategy.INCLUDEthat merges shared + previous majors +./srcso per-version files override.runners/spark/3/build.gradlenow setsspark_major = '3'instead of the removedcopySourceBase = falseflag, andgradle.propertiesgainsspark_versions=3(mirrors the existingflink_versions).Why
Prepares the tree for the Spark 4 runner (#36841 / #38212), so it can land as a small overrides layer on top of this restructuring instead of duplicating the entire structured-streaming source tree.
Compatibility
Pure refactor — Spark 3's compiled output and test classpath are unchanged. The hoisted files are git-renames (verified with
git diff --find-renames).Next Step
Spark 4 follow-up branch ready: https://github.com/tkaymak/beam/tree/spark4-runner-slim