
[spark] support merge-read between kv snapshot and log for primary-key table#2523

Merged
wuchong merged 4 commits into apache:main from YannByron:main-spark
Feb 1, 2026
Conversation

@YannByron
Contributor

Purpose

Linked issue: close #2427

Brief change log

Tests

API and Format

Documentation

@YannByron
Contributor Author

@wuchong @Yohahaha please review this.

Member

@wuchong wuchong left a comment


@YannByron thanks for the contribution.

I rebased the branch and appended a commit to address my minor comments. Will merge it if you don't have concerns.


private def createSortMergeReader(): SortMergeReader = {
// Create key encoder for primary keys
val keyEncoder = encode.KeyEncoder.of(rowType, tableInfo.getPhysicalPrimaryKeys, null)
Member


In the latest main branch, we’ve refactored KeyEncoder.
Could you please rebase onto the latest main and use the KeyEncoder.ofPrimaryKey(...) method? Otherwise, the key encoding won’t align with the keys stored in RocksDB, leading to incorrect query results.
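To illustrate why the reader's key encoding must match the writer's, here is a minimal, self-contained sketch. The `encode` helper below is hypothetical and not Fluss's actual `KeyEncoder` API; it only demonstrates that two encoders built over different key columns produce different byte keys, so lookups against a RocksDB-style key-value store silently miss:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class KeyEncodingMismatch {
    // Hypothetical compact key encoder: concatenates the selected fields in order.
    public static byte[] encode(Object[] row, int[] keyIndexes) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        for (int i : keyIndexes) {
            Object v = row[i];
            if (v instanceof Integer) {
                buf.putInt((Integer) v);
            } else {
                buf.put(v.toString().getBytes(StandardCharsets.UTF_8));
            }
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    public static void main(String[] args) {
        Object[] row = {1, "a", 10};                        // schema: id, name, pk
        byte[] writerKey = encode(row, new int[] {2});      // writer keys on `pk`
        byte[] wrongReaderKey = encode(row, new int[] {0}); // reader mistakenly keys on `id`

        Map<String, Object[]> store = new HashMap<>();      // stands in for RocksDB
        store.put(Arrays.toString(writerKey), row);

        // The mismatched key never finds the stored row.
        System.out.println(store.containsKey(Arrays.toString(writerKey)));      // true
        System.out.println(store.containsKey(Arrays.toString(wrongReaderKey))); // false
    }
}
```

This is the failure mode described above: nothing throws, the query simply returns wrong results.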

3a803df

public SortMergeReader(
@Nullable int[] projectedFields,
int[] pkIndexes,
@Nullable CloseableIterator<LogRecord> lakeRecordIterator,
Member


Could you rename this parameter and the member variable lakeRecordIterator to snapshotRecordIterator? That better reflects how Spark uses it.

@wuchong wuchong merged commit d3a935f into apache:main Feb 1, 2026
6 checks passed
}

// Collect all log records until logStoppingOffset
val allLogRecords = mutable.ArrayBuffer[ScanRecord]()
Contributor


We need to fetch by size to avoid OOM when the log store has huge records.

Member


@Yohahaha However, we need to sort the changelog, which requires buffering the entire changelog. Therefore, fetching by size doesn't help much in this context. If the changelog is truly huge, we may need to consider spilling the changelog buffer to local disk.
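The merge logic under discussion can be sketched in a simplified, self-contained form (hypothetical types, not the actual SortMergeReader): the changelog must be fully buffered and deduplicated by key before it can be merged, in key order, with the sorted snapshot, with the log side overriding the snapshot side and deletes dropping rows:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

public class SortMergeSketch {
    // A record is (key, value); value == null marks a delete in the changelog.
    public record Rec(int key, String value) {}

    // Merge a key-sorted snapshot with buffered log records. The whole log is
    // buffered into the TreeMap before merging, which is why fetch-by-size
    // alone does not bound memory here.
    public static List<Rec> mergeRead(List<Rec> snapshot, List<Rec> log) {
        TreeMap<Integer, Rec> latest = new TreeMap<>();
        for (Rec r : log) latest.put(r.key(), r); // last change per key wins

        List<Rec> out = new ArrayList<>();
        Iterator<Rec> snap = snapshot.iterator();
        Iterator<Rec> chg = latest.values().iterator();
        Rec s = next(snap), c = next(chg);
        while (s != null || c != null) {
            if (c == null || (s != null && s.key() < c.key())) {
                out.add(s);                        // snapshot-only key
                s = next(snap);
            } else if (s == null || c.key() < s.key()) {
                if (c.value() != null) out.add(c); // log-only key (skip deletes)
                c = next(chg);
            } else {
                if (c.value() != null) out.add(c); // same key: log overrides snapshot
                s = next(snap);
                c = next(chg);
            }
        }
        return out;
    }

    private static Rec next(Iterator<Rec> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

Spilling the `latest` buffer to local disk, as suggested above, would replace the in-memory TreeMap with an external sort.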

@Yohahaha
Contributor

Yohahaha commented Feb 2, 2026

@YannByron I found a bug while testing reads of primary-key tables: it fails when the last column is used as a primary key. The current cases in SparkPrimaryKeyTableReadTest all use the first column and the partition column as primary keys.

test("Spark Read: primary key table with last pk") {
  withTable("t") {
    sql("CREATE TABLE t (id int, name string, pk int, pk2 string) TBLPROPERTIES('primary.key'='pk,pk2')")
    checkAnswer(sql("SELECT * FROM t"), Nil)
    sql("INSERT INTO t VALUES (1, 'a', 10, 'x'), (2, 'b', 20, 'y')")
    checkAnswer(sql("SELECT * FROM t ORDER BY id"), Row(1, "a", 10, "x") :: Row(2, "b", 20, "y") :: Nil)
  }
}

The above case fails with:

Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 4) (192.168.0.116 executor driver): java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
	at org.apache.fluss.row.ProjectedRow.getInt(ProjectedRow.java:90)
	at org.apache.fluss.row.InternalRow.lambda$createFieldGetter$ff31e09f$6(InternalRow.java:198)
	at org.apache.fluss.row.encode.CompactedKeyEncoder.encodeKey(CompactedKeyEncoder.java:83)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:113)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:111)
	at org.apache.fluss.spark.utils.LogChangesIterator.hasSamePrimaryKey(LogChangesIterator.scala:117)
	at org.apache.fluss.spark.utils.LogChangesIterator.hasNext(LogChangesIterator.scala:85)
	at org.apache.fluss.client.table.scanner.SortMergeReader.readBatch(SortMergeReader.java:90)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader.initialize(FlussUpsertPartitionReader.scala:217)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader.<init>(FlussUpsertPartitionReader.scala:86)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReaderFactory.createReader(FlussPartitionReaderFactory.scala:61)
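The failure pattern behind this stack trace can be reproduced in a minimal, standalone form (hypothetical code, not Fluss's actual ProjectedRow or CompactedKeyEncoder): a projected row exposes only the mapped fields, so addressing a key column by its physical position in the full schema instead of its position within the projection walks off the end of the mapping array:

```java
public class ProjectedKeyBug {
    // A projected view exposes only the mapped fields of the underlying row;
    // `pos` must be an index into the projection, not the physical schema.
    public static int getInt(Object[] row, int[] projection, int pos) {
        return (Integer) row[projection[pos]];
    }

    public static void main(String[] args) {
        Object[] row = {1, "a", 10, "x"}; // schema: id, name, pk, pk2
        int[] projection = {2, 3};        // projected row exposes only (pk, pk2)

        // Correct: address pk by its index within the projection (0).
        System.out.println(getInt(row, projection, 0)); // 10

        // Bug pattern: address pk by its physical index (2) on the projected
        // row -> projection[2] is out of bounds (length 2).
        try {
            getInt(row, projection, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds, as in the reported stack trace");
        }
    }
}
```

This also explains why the existing tests pass: with the primary key in the first column, the physical index and the projected index coincide, masking the bug.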

@wuchong
Member

wuchong commented Feb 4, 2026

> @YannByron I found a bug while testing reading PK tables, it fails when using the last column as the primary key [...]

Thank you @Yohahaha , could you open a pull request to add and fix the test case?



Development

Successfully merging this pull request may close these issues.

[Spark] Support union read that can combine snapshot data and change-log data
