[spark] Make LakeSplit extend Serializable to simplify Spark serialization by YannByron · Pull Request #3123 · apache/fluss

YannByron · 2026-04-18T03:23:02Z

Summary

Make LakeSplit extend java.io.Serializable so Spark can transport splits directly instead of manual byte-level serialize/deserialize via SimpleVersionedSerializer
Replace lakeSplitBytes: Array[Byte] with lakeSplit: LakeSplit / lakeSplits: java.util.List[LakeSplit] in InputPartition case classes
Remove serializeLakeSplits/deserializeLakeSplits from FlussLakeUtils and splitSerializer from batch/reader method chains
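
As a rough illustration of the summary above, the sketch below shows a partition object holding a split directly and surviving a Java serialization round trip. It is not the Fluss code: the nested LakeSplit interface is a simplified stand-in, and DemoSplit, DemoInputPartition, toBytes and fromBytes are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LakeSplitTransportSketch {

    /** Simplified stand-in for LakeSplit (assumption: the real interface may differ). */
    public interface LakeSplit extends Serializable {
        int bucket();

        List<String> partition();
    }

    /** Hypothetical split implementation used only for this sketch. */
    public static final class DemoSplit implements LakeSplit {
        private static final long serialVersionUID = 1L;
        private final int bucket;
        private final ArrayList<String> partition;

        public DemoSplit(int bucket, List<String> partition) {
            this.bucket = bucket;
            // Copy into an ArrayList so the field is known to be serializable.
            this.partition = new ArrayList<>(partition);
        }

        @Override
        public int bucket() {
            return bucket;
        }

        @Override
        public List<String> partition() {
            return partition;
        }
    }

    /** Hypothetical analogue of an InputPartition that holds the split itself, not bytes. */
    public static final class DemoInputPartition implements Serializable {
        private static final long serialVersionUID = 1L;
        private final LakeSplit lakeSplit;

        public DemoInputPartition(LakeSplit lakeSplit) {
            this.lakeSplit = lakeSplit;
        }

        public LakeSplit lakeSplit() {
            return lakeSplit;
        }
    }

    /** Serializes a value with plain Java serialization. */
    public static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    /** Reads back a value written by toBytes. */
    public static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoInputPartition sent =
                new DemoInputPartition(new DemoSplit(3, Arrays.asList("2026-04-18")));
        DemoInputPartition received = (DemoInputPartition) fromBytes(toBytes(sent));
        System.out.println(
                received.lakeSplit().bucket() + " " + received.lakeSplit().partition());
    }
}
```

Because the partition and the split are both Serializable, no separate serializer has to be created on the sending side and re-created on the receiving side.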

Test plan

LakeSplitSerializationTest — verifies TestingLakeSplit round-trips through Java serialization
fluss-common, fluss-lake-paimon, fluss-lake-iceberg, fluss-spark-common all compile
Spark lake integration tests in fluss-spark/fluss-spark-ut/

🤖 Generated with Claude Code

…ation

LakeSplit objects were previously serialized to Array[Byte] via SimpleVersionedSerializer on the Spark driver side, stored in InputPartition case classes, then deserialized on executors using a re-created serializer. This added unnecessary complexity and coupling.

Since both Paimon's DataSplit and Iceberg's FileScanTask are already java.io.Serializable, LakeSplit can safely extend Serializable, allowing Spark to transport splits directly via Java serialization.

Changes:
- LakeSplit extends java.io.Serializable; PaimonSplit/TestingLakeSplit add serialVersionUID
- InputPartition case classes use LakeSplit directly instead of Array[Byte]
- Remove splitSerializer from batch/reader method chains
- Remove serializeLakeSplits/deserializeLakeSplits from FlussLakeUtils
- Add LakeSplitSerializationTest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

beryllw · 2026-04-21T10:20:50Z

Thanks for the pr. LGTM!

Use a unique table name in testJavaSerializationRoundTrip to avoid AlreadyExistsException when running alongside testSerializeAndDeserialize, since both tests share a static Iceberg catalog without per-test cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…lation

Same issue as the Iceberg counterpart: testJavaSerializationRoundTrip shared DEFAULT_TABLE with testSerializeAndDeserialize. While Paimon's createTable uses ignoreIfExists=true so it wouldn't throw, the second test would silently append to the existing table, breaking test isolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…nique table names

Drop the default table before each test method in IcebergSourceTestBase and PaimonSourceTestBase to ensure test isolation. This is a more robust approach than requiring each test to use a unique table name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
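
The drop-before-each-test approach described in the commit messages above can be sketched without any lake dependency. Everything here is hypothetical: InMemoryCatalog stands in for the shared static catalog, beforeEach plays the role of a per-test setup hook, and runTest imitates one test method. It is not the code in IcebergSourceTestBase or PaimonSourceTestBase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TableIsolationSketch {

    /** Hypothetical in-memory stand-in for a catalog shared by all tests. */
    public static final class InMemoryCatalog {
        private final Map<String, List<String>> tables = new HashMap<>();

        /** Fails if the table exists, like a catalog without ignoreIfExists. */
        public void createTable(String name) {
            if (tables.containsKey(name)) {
                throw new IllegalStateException("Table already exists: " + name);
            }
            tables.put(name, new ArrayList<>());
        }

        public void dropTableIfExists(String name) {
            tables.remove(name);
        }

        public void append(String name, String row) {
            tables.get(name).add(row);
        }

        public int rowCount(String name) {
            return tables.get(name).size();
        }
    }

    static final InMemoryCatalog CATALOG = new InMemoryCatalog();
    static final String DEFAULT_TABLE = "default_table";

    /** Per-test setup: start every test from a clean slate. */
    static void beforeEach() {
        CATALOG.dropTableIfExists(DEFAULT_TABLE);
    }

    /** Imitates one test method that creates the default table and writes one row. */
    static int runTest(String row) {
        beforeEach();
        CATALOG.createTable(DEFAULT_TABLE);
        CATALOG.append(DEFAULT_TABLE, row);
        return CATALOG.rowCount(DEFAULT_TABLE);
    }

    public static void main(String[] args) {
        // Both "tests" reuse DEFAULT_TABLE, yet neither sees the other's data.
        System.out.println(runTest("from first test"));
        System.out.println(runTest("from second test"));
    }
}
```

Without the beforeEach call, the second runTest would throw on createTable, which mirrors the AlreadyExistsException case; with a create-if-absent catalog it would instead see two rows, which mirrors the silent-append case.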

luoyuxia

@YannByron Thanks for the pr. Left minor comments.

luoyuxia · 2026-04-24T03:58:03Z


 /** Split for Iceberg table. */
-public class IcebergSplit implements LakeSplit, Serializable {
-    private static final long serialVersionUID = 1L;


nit: why remove this serialVersionUID?
We may also want to add a serialVersionUID to PaimonSplit.
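
For context on the serialVersionUID question, here is a small sketch of what declaring the field changes. PinnedSplit, UnpinnedSplit and uidOf are hypothetical names for this example only; ObjectStreamClass.lookup and getSerialVersionUID are standard JDK calls.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionUidSketch {

    /** Declares its version explicitly, so the stream version is fixed at 1. */
    public static final class PinnedSplit implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int bucket;

        public PinnedSplit(int bucket) {
            this.bucket = bucket;
        }

        public int bucket() {
            return bucket;
        }
    }

    /** Declares no version, so the JVM derives one from the class structure. */
    public static final class UnpinnedSplit implements Serializable {
        private final int bucket;

        public UnpinnedSplit(int bucket) {
            this.bucket = bucket;
        }

        public int bucket() {
            return bucket;
        }
    }

    /** Returns the serialVersionUID Java serialization uses for a Serializable class. */
    public static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("pinned:   " + uidOf(PinnedSplit.class));
        // The derived value depends on the class's members and may change when they change.
        System.out.println("unpinned: " + uidOf(UnpinnedSplit.class));
    }
}
```

An explicit serialVersionUID keeps the stream version stable when the class gains or loses members, whereas the derived value can change and make the sender and receiver reject each other's streams if they run different builds of the class.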

luoyuxia

Thanks. LGTM!

…ation (apache#3123)

YannByron force-pushed the main-lakesplit branch from 5956695 to 727afbb Compare April 20, 2026 06:14

update

17d7ec9

leonardBang self-requested a review April 21, 2026 08:20

beryllw reviewed Apr 21, 2026


Comment thread fluss-common/src/test/java/org/apache/fluss/lake/source/TestingLakeSplit.java Outdated

beryllw reviewed Apr 21, 2026


Comment thread fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/source/PaimonSplit.java Outdated

beryllw reviewed Apr 21, 2026


Comment thread fluss-common/src/test/java/org/apache/fluss/lake/source/LakeSplitSerializationTest.java Outdated

[update]

9745994

YannByron and others added 3 commits April 21, 2026 22:05

luoyuxia reviewed Apr 24, 2026


[update]

8caa2a8

luoyuxia approved these changes Apr 24, 2026


luoyuxia merged commit 27b7cca into apache:main Apr 24, 2026
11 of 13 checks passed

Ugbot pushed a commit to Ugbot/fluss that referenced this pull request Apr 26, 2026

[spark] Make LakeSplit extend Serializable to simplify Spark serializ…

95c4b8e

…ation (apache#3123)

luoyuxia merged 7 commits into apache:main from YannByron:main-lakesplit

3 participants
