Description
While debugging a unit test failure in the FLIP-27 source PoC code, I realized that RowDataIterator always reuses and returns the same GenericRowData object. It seems to expect the caller to clone or convert the object. TestFlinkInputFormat was using TestHelpers.copyRowData to do this. I guess Flink SQL converts RowData to Row.
This behavior makes it difficult for the SplitReader to use the iterator, because the SplitReader fetches a batch of records, and every record in the batch would end up pointing to the same reused object.
https://github.com/stevenzwu/iceberg/blob/flip27IcebergSource/flink/src/main/java/org/apache/iceberg/flink/source/reader/IcebergSourceSplitReader.java#L80-L87
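To illustrate the batching problem, here is a minimal, self-contained sketch. It does not use Flink or Iceberg classes: `MutableRow`, `ReusingIterator`, and `BatchDemo` are hypothetical stand-ins I made up to mimic an iterator that hands back one shared mutable object, which is how I understand RowDataIterator to behave.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable row such as GenericRowData.
final class MutableRow {
  int value;

  MutableRow copy() {
    MutableRow r = new MutableRow();
    r.value = value;
    return r;
  }
}

// Hypothetical iterator that overwrites and returns the same instance on every call.
final class ReusingIterator implements Iterator<MutableRow> {
  private final int[] source;
  private final MutableRow reused = new MutableRow();
  private int pos = 0;

  ReusingIterator(int... source) {
    this.source = source;
  }

  @Override
  public boolean hasNext() {
    return pos < source.length;
  }

  @Override
  public MutableRow next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    reused.value = source[pos++];
    return reused;
  }
}

final class BatchDemo {
  // Buffers the returned references directly; all entries alias the same object.
  static List<Integer> batchWithoutCopy(Iterator<MutableRow> it) {
    List<MutableRow> batch = new ArrayList<>();
    while (it.hasNext()) {
      batch.add(it.next());
    }
    return values(batch);
  }

  // Copies each row before buffering it, so the entries stay independent.
  static List<Integer> batchWithCopy(Iterator<MutableRow> it) {
    List<MutableRow> batch = new ArrayList<>();
    while (it.hasNext()) {
      batch.add(it.next().copy());
    }
    return values(batch);
  }

  private static List<Integer> values(List<MutableRow> rows) {
    List<Integer> out = new ArrayList<>();
    for (MutableRow row : rows) {
      out.add(row.value);
    }
    return out;
  }
}
```

With input 1, 2, 3, `batchWithoutCopy` yields three copies of the last value, while `batchWithCopy` preserves all three. The copy is the cost the caller currently has to pay to batch safely.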
Should we make RowDataIterator support a reuse flag?
Also, what should the default reuse value be (true or false)? Looking at this code in FlinkInputFormat, it seems that the base InputFormat class expects the iterator to return a reused object only if the reuse input argument is not null. So iterator.next() seems to break that contract.
@Override
public RowData nextRecord(RowData reuse) {
  return iterator.next();
}
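One possible shape for the reuse flag, as a rough sketch only. `SketchRow` and `ConfigurableIterator` are hypothetical names, not existing Iceberg or Flink classes, and the real RowDataIterator would need the flag pushed down into the underlying readers rather than handled in `next()` like this.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a mutable row type.
final class SketchRow {
  int value;
}

// Hypothetical iterator with a reuse flag: when reuse is true it overwrites one
// shared instance, otherwise it allocates a fresh row for every record.
final class ConfigurableIterator implements Iterator<SketchRow> {
  private final boolean reuse;
  private final int[] source;
  private final SketchRow shared = new SketchRow();
  private int pos = 0;

  ConfigurableIterator(boolean reuse, int... source) {
    this.reuse = reuse;
    this.source = source;
  }

  @Override
  public boolean hasNext() {
    return pos < source.length;
  }

  @Override
  public SketchRow next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    SketchRow out = reuse ? shared : new SketchRow();
    out.value = source[pos++];
    return out;
  }
}
```

With something like this, a batching reader could ask for `reuse = false`, while callers that consume one record at a time could keep `reuse = true` to avoid allocations. Which default is right is the open question.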
@JingsongLi @openinx please share your thoughts.