Open
Description
Describe the enhancement requested
Problem / Motivation
In PyArrow, the only way I found to get the total number of rows contained in a Feather file with pyarrow.ipc.RecordBatchFileReader is to do something along the lines of:
num_rows = sum(reader.get_batch(i).num_rows for i in range(reader.num_record_batches))
This is not very efficient, since it seems the rows can be counted directly from the metadata (as in RecordBatchFileReader::CountRows), if I understand the code correctly.
The current approach is impractical when reading from remote file systems, because every record batch has to be read just to count its rows.
Proposed solution
Expose RecordBatchFileReader::CountRows in Python.
References
arrow/cpp/src/arrow/ipc/reader.h
Line 204 in 76f7815
virtual Result<int64_t> CountRows() = 0;
Thank you for reading my suggestion and all the amazing work!!
Component(s)
Python