Flink: RecordFactory interface that can create record batch and clone…#3866
rdblue merged 3 commits into apache:master
… record, which are needed by DataIteratorBatcher
/**
 * Clone record
 */
void clone(T from, T to);
/**
 * Create a batch of records
 */
T[] createBatch(int batchSize);
public RowData[] createBatch(int batchSize) {
  RowData[] arr = new RowData[batchSize];
  for (int i = 0; i < batchSize; ++i) {
    arr[i] = new GenericRowData(rowType.getFieldCount());
Can the use of rowType.getFieldCount() be removed from the loop? Seems like it only needs to be called once.
It is just a getter, so the JVM should be able to optimize the repeated call.
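For reference, a minimal sketch of what the hoisted variant would look like. `SketchRowType`, `SketchRow`, and `SketchBatchFactory` are hypothetical stand-ins (not Flink or Iceberg classes) so the snippet compiles on its own; only the shape of `createBatch` mirrors the diff above.

```java
/** Hypothetical stand-in for Flink's RowType; only the field count matters here. */
final class SketchRowType {
  private final int fieldCount;

  SketchRowType(int fieldCount) {
    this.fieldCount = fieldCount;
  }

  int getFieldCount() {
    return fieldCount;
  }
}

/** Hypothetical stand-in for GenericRowData: a fixed-arity array of field values. */
final class SketchRow {
  private final Object[] fields;

  SketchRow(int arity) {
    this.fields = new Object[arity];
  }

  int getArity() {
    return fields.length;
  }
}

/** Hypothetical factory with the same createBatch shape as in the diff. */
final class SketchBatchFactory {
  private final SketchRowType rowType;

  SketchBatchFactory(SketchRowType rowType) {
    this.rowType = rowType;
  }

  SketchRow[] createBatch(int batchSize) {
    // Read the field count once, before the loop, instead of on every iteration.
    int fieldCount = rowType.getFieldCount();
    SketchRow[] batch = new SketchRow[batchSize];
    for (int i = 0; i < batchSize; ++i) {
      batch[i] = new SketchRow(fieldCount);
    }
    return batch;
  }
}
```

Both versions produce the same batch; hoisting mainly makes it explicit that the value does not change across iterations.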
kbendick left a comment
LGTM.
Outside of the loop-related concern, I'm good with these interfaces.
private final RowType rowType;
private final TypeSerializer[] fieldSerializers;

RowDataRecordFactory(final RowType rowType) {
We usually omit final unless it is an instance field.
@Override
public void clone(RowData from, RowData to) {
  RowDataUtil.clone(from, to, rowType, fieldSerializers);
I think this needs to return the RowData produced by clone to be correct. Otherwise, a few possible modifications to this class or others would cause correctness bugs.
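To illustrate the concern, here is a minimal sketch under the assumption that the underlying clone helper may hand back an object other than the one passed in (for example, when the target cannot be reused). `ReturningFactory`, `ArrayRecord`, and `ArrayRecordFactory` are hypothetical names, not part of this PR.

```java
/** Hypothetical variant of the interface whose clone returns the produced record. */
interface ReturningFactory<T> {
  T clone(T from, T reuse);
}

/** Hypothetical record type: a plain array of field values. */
final class ArrayRecord {
  final Object[] fields;

  ArrayRecord(int arity) {
    this.fields = new Object[arity];
  }
}

final class ArrayRecordFactory implements ReturningFactory<ArrayRecord> {
  @Override
  public ArrayRecord clone(ArrayRecord from, ArrayRecord reuse) {
    // Assumed behavior: reuse the target only when it is usable, otherwise allocate.
    ArrayRecord target =
        (reuse != null && reuse.fields.length == from.fields.length)
            ? reuse
            : new ArrayRecord(from.fields.length);
    // Shallow copy of the field values; a real implementation would copy per field type.
    System.arraycopy(from.fields, 0, target.fields, 0, from.fields.length);
    // Returning the target means callers never depend on `reuse` having been filled.
    return target;
  }
}
```

With a `void` signature, a caller in the allocate branch would be left holding an unfilled `reuse` object, which is the kind of silent bug the return value avoids.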
}

@Override
public void clone(RowData from, RowData to) {
You might consider changing this to clone(RowData from, RowData[] toBatch, int position)
I adopted this API.
The reason I had the RecordFactory interface is to work with both the RowData and the Avro GenericRecord iterators. Both Netflix and Apple use Avro GenericRecord. For the DataIterator with Avro GenericRecord, the iterator returns a fresh, non-reused object for every record. In the Avro case, the clone implementation can directly set the from object into the toBatch array.
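A rough sketch of how the adopted signature covers both cases. Everything here is hypothetical and simplified: `BatchRecordFactory` only approximates the interface shape discussed in this thread, `MutableRecord` stands in for a reused RowData-like object, and plain `String` stands in for a fresh-per-record type such as an Avro GenericRecord.

```java
/** Hypothetical shape of the interface after adopting the suggested signature. */
interface BatchRecordFactory<T> {
  T[] createBatch(int batchSize);

  void clone(T from, T[] toBatch, int position);
}

/** Hypothetical mutable record, standing in for an object the iterator reuses. */
final class MutableRecord {
  Object value;
}

/** Reused-object case: the source is overwritten later, so copy into the slot. */
final class ReusingFactory implements BatchRecordFactory<MutableRecord> {
  @Override
  public MutableRecord[] createBatch(int batchSize) {
    MutableRecord[] batch = new MutableRecord[batchSize];
    for (int i = 0; i < batchSize; ++i) {
      batch[i] = new MutableRecord();
    }
    return batch;
  }

  @Override
  public void clone(MutableRecord from, MutableRecord[] toBatch, int position) {
    toBatch[position].value = from.value;
  }
}

/** Fresh-object case: each record is a new object, so store the reference directly. */
final class PassThroughFactory implements BatchRecordFactory<String> {
  @Override
  public String[] createBatch(int batchSize) {
    // Slots start as null; clone fills them with the iterator's own objects.
    return new String[batchSize];
  }

  @Override
  public void clone(String from, String[] toBatch, int position) {
    toBatch[position] = from;
  }
}
```

Passing the batch array and position lets the implementation decide whether to copy into a preallocated slot or simply replace it, which a `clone(T from, T to)` signature cannot express.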