
SparkMicroBatchStream doesn't respect start-snapshot-id or end-snapshot-id #5538

@kbendick

Description

Feature Request / Improvement

Currently, SparkReadConf can be used with the start-snapshot-id and end-snapshot-id options to read just a specific range of snapshots from a table.
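
For reference, the batch read path does honor these options. A minimal sketch of the equivalent batch query (rauls_table, startId, and endId are the same placeholder names used in the streaming example below):

// Batch incremental read between two snapshots; this path respects the options.
val batchDf = spark
  .read
  .format("iceberg")
  .option("start-snapshot-id", startId)
  .option("end-snapshot-id", endId)
  .table("rauls_table")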

These options are not respected when SparkMicroBatchStream is initialized; a streaming query like the following silently ignores them.

val df = spark
  .readStream
  .format("iceberg")
  .option("start-snapshot-id", startId)
  .option("end-snapshot-id", endId)
  .table("rauls_table")

We should implement support for more than just timestamp-based incremental reads in SparkMicroBatchStream.
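
For context, the start position that the streaming path currently supports is timestamp-based, via the stream-from-timestamp option. A rough sketch of that existing usage (streamStartTimestamp is a placeholder millisecond timestamp):

// Existing timestamp-based streaming read; the start position is a
// millisecond timestamp rather than a snapshot id.
val streamingDf = spark
  .readStream
  .format("iceberg")
  .option("stream-from-timestamp", streamStartTimestamp.toString)
  .table("rauls_table")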

This was reported on Slack and I'm opening an issue for it. Please feel free to work on this if you have time 👍

Query engine

Spark
