
Export to Apache Arrow now defaults to pyarrow.lib.RecordBatchReader #95

@d33bs

Description

What happens?

Thanks for the great work on this project! I noticed an issue with exports to Apache Arrow from DuckDB queries in v1.4.0 that changes previously observed behavior. Calling .arrow() on a DuckDB result now returns a pyarrow.lib.RecordBatchReader instead of a pyarrow.lib.Table. This contradicts the documentation here, which presents RecordBatchReader as the alternative: https://duckdb.org/docs/stable/guides/python/export_arrow.html.

If the intention is instead to prefer streaming/batched processing by default, I suggest updating all documentation pages to reflect this.

To Reproduce

Please see the following gist for example output: https://gist.github.com/d33bs/d49279c46142f6375b3d4f070ddd6e0f

The following code should return a pyarrow.lib.Table; with v1.4.0 it returns a pyarrow.lib.RecordBatchReader instead:

import duckdb

with duckdb.connect() as ddb:
    result = ddb.execute("SELECT 1,2,3;").arrow()

print(type(result))

OS:

MacOS

DuckDB Package Version:

1.4.0

Python Version:

3.12

Full Name:

Dave Bunten

Affiliation:

University of Colorado Anschutz Medical Campus

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a nightly build

Did you include all relevant data sets for reproducing the issue?

Not applicable - the reproduction does not require a data set

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration to reproduce the issue?

  • Yes, I have
