Fix record delimiting on row group boundaries (#2025) #2027
The batch_size bug was a real bug, but it was masked by the MIN_BATCH_SIZE setting of 1024 and so would only occur with batch sizes >= 1024. The actual cause of the failure was more subtle: once exhausted, RecordReader would continue to re-read the last record on subsequent calls to read_records. This would only occur if it had returned exactly batch_size records when reaching the end of a chunk. The re-read wouldn't consume any new data, so the reader would end up returning fewer records than it claimed to have read.
With only a single list column, this somewhat funky behaviour would not be noticeable, as the reader would eventually read all the data, corresponding to the total number of rows. However, it would read batches of 8, 7, 5 instead of 8, 8, 4, which would throw off the StructArrayReader.
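
To make the batch arithmetic concrete, here is a minimal standalone sketch (not code from the parquet crate; `batch_ends` and `first_mismatch` are hypothetical helpers written for illustration). It shows that both batch sequences cover the same 20 rows but put their boundaries in different places. That is presumably what trips up a reader combining sibling columns, on the assumption that it expects every child to yield the same number of rows per batch.

```rust
/// Running row offsets at which each batch ends.
/// Hypothetical helper for illustration only.
fn batch_ends(batch_sizes: &[usize]) -> Vec<usize> {
    batch_sizes
        .iter()
        .scan(0usize, |total, &n| {
            *total += n;
            Some(*total)
        })
        .collect()
}

/// Index of the first batch whose length differs between two columns, if any.
/// Hypothetical helper for illustration only.
fn first_mismatch(a: &[usize], b: &[usize]) -> Option<usize> {
    a.iter().zip(b.iter()).position(|(x, y)| x != y)
}
```

With the numbers above, `batch_ends(&[8, 7, 5])` and `batch_ends(&[8, 8, 4])` both finish at row 20, but the second batch ends at row 15 in one case and row 16 in the other, and `first_mismatch` reports the second batch (index 1) as the first point of disagreement.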