Description
What happens?
Thanks for the great work on this project! I noticed an issue with exports to Apache Arrow from DuckDB queries in v1.4.0 that changes previously observed behavior. Calling .arrow() on a DuckDB result now returns a pyarrow.lib.RecordBatchReader instead of a pyarrow.lib.Table. This contradicts the documentation here, which presents RecordBatchReaders as the alternative (streaming) export rather than the default: https://duckdb.org/docs/stable/guides/python/export_arrow.html. If the intention is instead to prefer streaming/batched processing by default, I suggest updating all relevant docs pages to reflect that change.
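For context, the linked guide requests a streaming export explicitly rather than through .arrow(). A minimal sketch of that documented pattern is below (the fetch_record_batch call and the rows_per_batch value follow the guide's example and have not been re-verified against v1.4.0):

import duckdb

con = duckdb.connect()
# Per the export guide, a RecordBatchReader is requested explicitly via
# fetch_record_batch(), while .arrow() is documented to return a Table.
reader = con.execute("SELECT * FROM range(10000) tbl(i)").fetch_record_batch(rows_per_batch=1024)
for batch in reader:
    print(batch.num_rows)
con.close()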
To Reproduce
Please see the following gist for an example of output: https://gist.github.com/d33bs/d49279c46142f6375b3d4f070ddd6e0f
Generally, we should see a pyarrow.lib.Table type returned from the following code (and instead see a pyarrow.lib.RecordBatchReader with v1.4.0):
import duckdb
with duckdb.connect() as ddb:
    result = ddb.execute("SELECT 1,2,3;").arrow()
    # On v1.4.0 this prints <class 'pyarrow.lib.RecordBatchReader'>
    # rather than <class 'pyarrow.lib.Table'>
    print(type(result))
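As an interim workaround, the reader returned on v1.4.0 can be materialized back into a table. This is a minimal sketch assuming the new behavior, using pyarrow's RecordBatchReader.read_all():

import duckdb

with duckdb.connect() as ddb:
    # Under v1.4.0, .arrow() yields a pyarrow.lib.RecordBatchReader
    reader = ddb.execute("SELECT 1,2,3;").arrow()
    # read_all() drains the stream into a pyarrow.lib.Table
    table = reader.read_all()
    print(type(table))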
OS:
macOS
DuckDB Package Version:
1.4.0
Python Version:
3.12
Full Name:
Dave Bunten
Affiliation:
University of Colorado Anschutz Medical Campus
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration to reproduce the issue?
- Yes, I have