[Python] Support for 'pa.compute.Expression' in filter argument to 'pa.read_table' #32745

Closed
asfimport opened this issue Aug 20, 2022 · 1 comment

Comments

@asfimport

Currently, the filters argument supports List[Tuple], List[List[Tuple]], or None as its input types. I was surprised to see that Expressions were not supported, considering that filters are converted to expressions internally when using use_legacy_dataset=False.

The check on L150-L153 short-circuits and succeeds when encountering an expression, but later fails on L2343 as the expression is evaluated as part of a boolean expression. 

I think declaring filters using pa.compute.Expression is more Pythonic and less error-prone, and ill-formed filters will be detected much earlier than with the list-of-tuple-of-string equivalents.

Example:

import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

# Creating a dummy table
table = pa.table({
    'year': [2020, 2022, 2021, 2022, 2019, 2021],
    'n_legs': [2, 2, 4, 4, 5, 100],
    'animal': ["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"]
})
pq.write_to_dataset(table, root_path='dataset_name_2', partition_cols=['year'])

# Reading using 'pyarrow.compute.Expression'
pq.read_table('dataset_name_2', columns=["n_legs", "animal"], filters=pc.field("n_legs") < 4)

# Reading using List[Tuple]
pq.read_table('dataset_name_2', columns=["n_legs", "animal"], filters=[('n_legs', '<', 4)])  

Reporter: Patrik Kjærran
Assignee: Miles Granger / @milesgranger

PRs and other links:

Note: This issue was originally created as ARROW-17483. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Issue resolved by pull request 14011
#14011
