Feature request: Return iterator of RecordBatches (in Python) #58

shenker · 2024-01-17T13:15:47Z

Oxbow's Python functions (read_bam, etc.) currently return a bytes object. It would be great if they returned an iterator of pa.RecordBatch objects instead. The goal would be to allow reading files in chunks (rather than loading the whole file into memory), and to return PyArrow objects (which can be turned into pa.Tables, polars/pandas dataframes, etc.) instead of bare bytes objects. Ideally, the desired chunk size (in number of rows? in bytes?) would be exposed as a kwarg.
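To illustrate the requested behavior, here is a minimal sketch of a chunked iterator with a row-count kwarg. It is not oxbow's API: `iter_chunks`, `chunk_size`, and `fake_records` are hypothetical names, and the records are plain dicts standing in for rows decoded lazily from a file. With PyArrow installed, each yielded chunk could be converted with `pa.RecordBatch.from_pylist(chunk)`.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def iter_chunks(records: Iterable[Record], chunk_size: int = 1000) -> Iterator[List[Record]]:
    """Yield successive lists of at most `chunk_size` records (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(records)
    while True:
        # Pull only the next chunk from the source; earlier chunks can be freed.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def fake_records(n: int) -> Iterator[Record]:
    """Hypothetical stand-in for rows decoded lazily from a BAM file."""
    for i in range(n):
        yield {"qname": f"read{i}", "pos": i * 10}


chunk_lengths = [len(c) for c in iter_chunks(fake_records(2500), chunk_size=1000)]
print(chunk_lengths)  # [1000, 1000, 500]
```

This sketch sizes chunks by row count because that is the simpler option to implement. A byte-based limit would need a size estimate for each record.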
