Feature request: Return iterator of RecordBatches (in Python) #58

Open
shenker opened this issue Jan 17, 2024 · 0 comments

shenker commented Jan 17, 2024

Oxbow's Python functions (read_bam, etc.) currently return a bytes object. It would be great if they returned an iterator of pa.RecordBatch objects instead. This would allow reading files in chunks rather than loading the whole file into memory, and it would return PyArrow objects (which can be turned into pa.Tables, polars/pandas dataframes, etc.) rather than bare bytes objects. The desired chunk size (in number of rows? in bytes?) would ideally be exposed as a kwarg.
