Use pyarrow.substrait to execute scans on PyArrow Datasets #362

@kdbrooks

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current Dataset TableProvider is ugly: it converts DataFusion Exprs to PyArrow expressions in large match statements. I propose we move towards using Substrait as the method for translating between DataFusion and PyArrow. Substrait support was not yet available when this TableProvider was written.

Describe the solution you'd like
Now that both PyArrow and DataFusion support Substrait, we could clean up and improve the PyArrow Dataset TableProvider and ExecutionPlan by using pyarrow.substrait to execute the scan.

Describe alternatives you've considered
Keep the existing ugly Dataset code.

Labels: enhancement (New feature or request)