Flink RowDataIterator should support reuseObject flag #1643

@stevenzwu

While debugging a unit test failure in the FLIP-27 source PoC code, I realized that RowDataIterator always reuses and returns the same GenericRowData object. It seems to expect the caller to clone or convert the object. TestFlinkInputFormat was using TestHelpers.copyRowData to do that. I guess Flink SQL converts RowData to Row.

This behavior makes it difficult for the SplitReader to use the iterator, because SplitReader fetches a batch of records, and every entry in the batch would end up pointing at the same reused object.
https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/src/main/java/org/apache/iceberg/flink/source/reader/IcebergSourceSplitReader.java#L80-L87
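
To illustrate the problem, here is a minimal sketch in plain Java. `MutableRow` and `ReusingIterator` are hypothetical stand-ins that I made up for this example, not Iceberg or Flink classes. They only mimic an iterator that mutates and returns a single object, which is how RowDataIterator appears to behave.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseAliasingDemo {
  // Hypothetical stand-in for a mutable row such as GenericRowData.
  static final class MutableRow {
    int value;
  }

  // Hypothetical iterator that always mutates and returns the same instance.
  static final class ReusingIterator implements Iterator<MutableRow> {
    private final int[] source;
    private final MutableRow reused = new MutableRow();
    private int pos = 0;

    ReusingIterator(int[] source) {
      this.source = source;
    }

    @Override
    public boolean hasNext() {
      return pos < source.length;
    }

    @Override
    public MutableRow next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      reused.value = source[pos++];
      return reused;
    }
  }

  // Buffers every record first, then reads the values back, like a batch fetch.
  static List<Integer> fetchBatch(int[] source) {
    List<MutableRow> batch = new ArrayList<>();
    Iterator<MutableRow> it = new ReusingIterator(source);
    while (it.hasNext()) {
      batch.add(it.next());
    }
    List<Integer> seen = new ArrayList<>();
    for (MutableRow row : batch) {
      seen.add(row.value);
    }
    return seen;
  }

  public static void main(String[] args) {
    // All three batch entries alias the same object, so this prints [3, 3, 3].
    System.out.println(fetchBatch(new int[] {1, 2, 3}));
  }
}
```

A caller that consumes each record before calling next() again is not affected; only callers that buffer records, as a batch fetch does, see the overwritten values.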

Should we make the RowDataIterator support a reuse flag?
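
A rough sketch of what I have in mind, again in plain Java with made-up names. `FlaggedIterator`, `MutableRow`, and the `reuseObject` constructor argument are assumptions for illustration only; the real RowDataIterator change might look quite different.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseFlagSketch {
  // Hypothetical stand-in for a mutable row such as GenericRowData.
  static final class MutableRow {
    int value;
  }

  // Hypothetical iterator whose reuseObject flag chooses between
  // one shared instance and a fresh instance per record.
  static final class FlaggedIterator implements Iterator<MutableRow> {
    private final int[] source;
    private final boolean reuseObject;
    private final MutableRow reused = new MutableRow();
    private int pos = 0;

    FlaggedIterator(int[] source, boolean reuseObject) {
      this.source = source;
      this.reuseObject = reuseObject;
    }

    @Override
    public boolean hasNext() {
      return pos < source.length;
    }

    @Override
    public MutableRow next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      // Allocate a new row only when the caller opted out of reuse.
      MutableRow row = reuseObject ? reused : new MutableRow();
      row.value = source[pos++];
      return row;
    }
  }

  // Buffers all records, then reads their values, as a batching reader would.
  static List<Integer> collect(int[] source, boolean reuseObject) {
    List<MutableRow> batch = new ArrayList<>();
    Iterator<MutableRow> it = new FlaggedIterator(source, reuseObject);
    while (it.hasNext()) {
      batch.add(it.next());
    }
    List<Integer> values = new ArrayList<>();
    for (MutableRow row : batch) {
      values.add(row.value);
    }
    return values;
  }
}
```

With the flag off, a batching caller such as the SplitReader could buffer records safely at the cost of one allocation per record. With the flag on, the current single-object behavior is preserved for callers that copy or convert each record immediately.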

Also, what should the default reuse value be (true or false)? When I was looking at this code in FlinkInputFormat, it seemed that the base InputFormat class expects the iterator to return a reused object only if the reuse input argument is not null. So returning iterator.next() regardless of the reuse argument seems to break that contract.

  @Override
  public RowData nextRecord(RowData reuse) {
    return iterator.next();
  }

@JingsongLi @openinx please share your thoughts.
