Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode #7338

chenjunjiedada · 2023-04-13T12:11:36Z

When a table is consumed in TABLE_SCAN_THEN_INCREMENTAL mode after part of its snapshot history has expired, data can be lost. This happens because checkScanMode returns the incremental scan mode whenever the scan context is streaming, so the initial table scan is planned as an incremental scan instead of a batch scan. To fix this, this PR adds a case that handles TABLE_SCAN_THEN_INCREMENTAL mode.
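To make the failure mode concrete, here is a minimal, self-contained sketch of the mode selection described above. It is not the Iceberg code: `Ctx` is a hypothetical stand-in for ScanContext, `checkScanModeBeforeFix` is an illustrative name, and the snapshot-range conditions are an assumption about what else the check looks at.

```java
public class ScanModeBugSketch {
  enum ScanMode { BATCH, INCREMENTAL_APPEND_SCAN }

  /** Hypothetical stand-in for ScanContext, reduced to the fields this sketch needs. */
  static final class Ctx {
    final boolean streaming;
    final Long startSnapshotId; // null when no incremental range was requested
    final Long endSnapshotId;   // null when no incremental range was requested

    Ctx(boolean streaming, Long startSnapshotId, Long endSnapshotId) {
      this.streaming = streaming;
      this.startSnapshotId = startSnapshotId;
      this.endSnapshotId = endSnapshotId;
    }
  }

  // Behavior as described in this PR: a streaming context alone selects incremental mode.
  // The snapshot-range checks are an assumption made for illustration.
  static ScanMode checkScanModeBeforeFix(Ctx ctx) {
    if (ctx.streaming || ctx.startSnapshotId != null || ctx.endSnapshotId != null) {
      return ScanMode.INCREMENTAL_APPEND_SCAN;
    }
    return ScanMode.BATCH;
  }

  public static void main(String[] args) {
    // Initial scan of a TABLE_SCAN_THEN_INCREMENTAL job: streaming, no snapshot range set.
    Ctx initialScan = new Ctx(true, null, null);
    // Prints INCREMENTAL_APPEND_SCAN even though a full batch scan is wanted here.
    System.out.println(checkScanModeBeforeFix(initialScan));
  }
}
```

In this model the initial scan is never planned as a batch scan for a streaming job, which matches the symptom reported in the description.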

stevenzwu · 2023-04-13T14:53:50Z

flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/source/FlinkSplitPlanner.java

    BATCH,
    INCREMENTAL_APPEND_SCAN
  }

-  private static ScanMode checkScanMode(ScanContext context) {
+  @VisibleForTesting
+  static ScanMode checkScanMode(ScanContext context) {


@chenjunjiedada thx for catching the bug and creating the PR fix.

For the conditions here, is there simpler logic? E.g., is it enough to just remove the context.isStreaming() condition in the original if clause?

Also, I think it is safer and clearer to construct a new ScanContext object and set useSnapshotId on it.

    if (scanContext.streamingStartingStrategy()
        == StreamingStartingStrategy.TABLE_SCAN_THEN_INCREMENTAL) {
      // do a batch table scan first
      splits = FlinkSplitPlanner.planIcebergSourceSplits(table, scanContext, workerPool);
      LOG.info(
          "Discovered {} splits from initial batch table scan with snapshot Id {}",
          splits.size(),
          startSnapshot.snapshotId());
      // For TABLE_SCAN_THEN_INCREMENTAL, incremental mode starts exclusive from the startSnapshot
      toPosition =
          IcebergEnumeratorPosition.of(startSnapshot.snapshotId(), startSnapshot.timestampMillis());

chenjunjiedada · 2023-04-14

> For the conditions here, is there simpler logic? E.g., is it enough to just remove the context.isStreaming() condition in the original if clause?

Yes, that is simpler and more direct.

> Also, I think it is safer and clearer to construct a new ScanContext object and set useSnapshotId on it.

Agreed, we can use scanContext.copyWithSnapshotId to achieve that.
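The two points agreed on in this thread can be sketched together as follows. This is a simplified model under stated assumptions, not the merged change: `Ctx` is a hypothetical stand-in for ScanContext, its `copyWithSnapshotId` only imitates the idea of the real method (which may copy or reset more fields), and the snapshot-range conditions are assumed for illustration.

```java
public class ScanModeFixSketch {
  enum ScanMode { BATCH, INCREMENTAL_APPEND_SCAN }

  /** Hypothetical, immutable stand-in for ScanContext. */
  static final class Ctx {
    final boolean streaming;
    final Long useSnapshotId;   // snapshot pinned for a batch scan, or null
    final Long startSnapshotId; // incremental range start, or null
    final Long endSnapshotId;   // incremental range end, or null

    Ctx(boolean streaming, Long useSnapshotId, Long startSnapshotId, Long endSnapshotId) {
      this.streaming = streaming;
      this.useSnapshotId = useSnapshotId;
      this.startSnapshotId = startSnapshotId;
      this.endSnapshotId = endSnapshotId;
    }

    // Returns a new context pinned to the given snapshot; the original is left untouched.
    Ctx copyWithSnapshotId(long snapshotId) {
      return new Ctx(streaming, snapshotId, startSnapshotId, endSnapshotId);
    }
  }

  // Simplified check: the streaming flag no longer decides the mode on its own.
  static ScanMode checkScanMode(Ctx ctx) {
    if (ctx.startSnapshotId != null || ctx.endSnapshotId != null) {
      return ScanMode.INCREMENTAL_APPEND_SCAN;
    }
    return ScanMode.BATCH;
  }

  public static void main(String[] args) {
    Ctx streamingCtx = new Ctx(true, null, null, null);
    // Pin the start snapshot on a copy for the initial table scan (42 is an arbitrary id).
    Ctx initialScan = streamingCtx.copyWithSnapshotId(42L);
    System.out.println(checkScanMode(initialScan)); // BATCH
    System.out.println(initialScan.useSnapshotId);  // 42
    System.out.println(streamingCtx.useSnapshotId); // null
  }
}
```

Working on a copy keeps the enumerator's long-lived context free of the pinned snapshot id, so later incremental planning is not affected by the initial scan.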

stevenzwu · 2023-04-16T03:58:54Z

@chenjunjiedada thx for finding and fixing this bug

stevenzwu · 2023-04-18T18:35:47Z

@chenjunjiedada can you create a backport PR too?

…apache#7338)

Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode

310aae3

github-actions bot added the flink label Apr 13, 2023

stevenzwu self-requested a review April 13, 2023 13:41

stevenzwu reviewed Apr 13, 2023


refactor

5a552e6

stevenzwu approved these changes Apr 16, 2023


stevenzwu merged commit b78d336 into apache:master Apr 16, 2023
12 checks passed

chenjunjiedada deleted the fix-incr-start branch April 19, 2023 01:45

chenjunjiedada added a commit to chenjunjiedada/incubator-iceberg that referenced this pull request Apr 19, 2023

Flink: backport apache#7338 to 1.16 and 1.15

4e5a220

stevenzwu pushed a commit that referenced this pull request Apr 19, 2023

Flink: backport #7338 to 1.16 and 1.15 (#7373)

c7b2e95

manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023

Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode (…

725bc20

…apache#7338)

manisin pushed a commit to Snowflake-Labs/iceberg that referenced this pull request May 9, 2023

Flink: backport apache#7338 to 1.16 and 1.15 (apache#7373)

a9435d7
