Implement DataFrame execute_stream and execute_stream_partitioned #12

Closed
Tracked by #24
kylebrooks-8451 opened this issue Jul 24, 2022 · 2 comments · Fixed by #610

Comments

@kylebrooks-8451
Contributor

kylebrooks-8451 commented Jul 24, 2022

arrow-rs now has zero-copy to PyArrow for RecordBatchReaders.

We could use this to implement the execute_stream and execute_stream_partitioned methods for the Python DataFrame, similar to the Rust DataFrame.

@kszlim

kszlim commented Apr 9, 2023

Curious if anyone is actively working on this? It would be a nice addition to the Python bindings. I wouldn't mind taking a shot at it, but I would probably have a hard time unless heavily guided.

@judahrand
Contributor

This is something that would be absolutely fantastic. DuckDB enables this kind of workflow, and we've found it really valuable. The ibis API for DataFusion pretends to do it too, but in reality the .collect() call buffers the whole query result into memory, so it saves nothing. It'd be nice to improve this!
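The buffering-vs-streaming distinction above can be sketched in plain Python (the function names here are illustrative, not the real DataFusion or ibis API): a collect-style call forces every batch into a list before returning, while a streaming call yields batches one at a time so peak memory stays at one batch.

```python
def produce_batches(n):
    """Simulate a query engine lazily producing n record batches."""
    for i in range(n):
        yield [i] * 3  # stand-in for a RecordBatch

def collect(n):
    # Buffering: materializes every batch before returning anything.
    return list(produce_batches(n))

def execute_stream(n):
    # Streaming: hands batches to the caller as they are produced.
    yield from produce_batches(n)

print(len(collect(4)))            # all four batches held at once
print(next(execute_stream(4)))    # only the first batch exists so far
```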

mesejo added a commit to mesejo/arrow-datafusion-python that referenced this issue Mar 6, 2024