ARROW-10574: [Python][Parquet] Allow collections for 'in' / 'not in' filter (in addition to sets) #8672
Thanks for working on this! Added a few comments
Thanks for the updates!
Hi Joris, Thank you for reviewing my work. I saw that Python 3.5 is no longer supported, so I changed back to using "Collection" as you suggested. I believe everything has been taken care of. Please review. Weiyang
Thanks @wyzhao!
…filter (in addition to sets)
Closes apache#8672 from wyzhao/feature/partition_filter
Lead-authored-by: Weiyang Zhao <weiyzha@blackrock.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
I would like to enhance the partition filters accepted by methods such as:
pyarrow.parquet.ParquetDataset(path, filters)
I am proposing the following enhancements:
1. For the operators "in" and "not in", the value should be allowed to be any typing.Iterable that is also a container. Currently only a set is supported, while other iterables such as list and tuple do not work correctly. I would like to change this so that any such iterable is accepted.
2. Improve the documentation of the partition filters.
There is also a new implementation, _ParquetDatasetV2, which passed my tests with an iterable for "in" and "not in", so the documentation update applies to the new version as well.
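To illustrate the idea behind the proposal, here is a minimal pure-Python sketch of how a partition filter could accept any collection for "in" / "not in" by normalizing it to a set before the membership test. This is not pyarrow's actual implementation: the helper names `normalize_in_values` and `partition_matches`, the dict representation of a partition, and the decision to reject bare strings are all assumptions made for this example.

```python
from collections.abc import Collection


def normalize_in_values(op, value):
    """Hypothetical helper: normalize the value of an 'in' / 'not in' filter."""
    if op not in ("in", "not in"):
        return value
    # Assumption: a bare string is technically a Collection, but passing one
    # here is almost certainly a mistake, so it is rejected explicitly.
    if isinstance(value, (str, bytes)) or not isinstance(value, Collection):
        raise TypeError(
            f"'{op}' expects a collection such as a list, tuple or set, "
            f"got {type(value).__name__}"
        )
    # Lists, tuples, sets and frozensets all end up as a set.
    return set(value)


def partition_matches(partition, filters):
    """Return True if `partition` (a dict of key -> value) satisfies
    every (key, op, value) tuple in `filters`."""
    for key, op, value in filters:
        value = normalize_in_values(op, value)
        actual = partition[key]
        if op == "in":
            ok = actual in value
        elif op == "not in":
            ok = actual not in value
        elif op in ("=", "=="):
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        else:
            raise ValueError(f"unsupported operator: {op!r}")
        if not ok:
            return False
    return True


# A list, a tuple and a set are all accepted for 'in' / 'not in'.
partition = {"year": 2020, "month": 11}
print(partition_matches(partition, [("year", "in", [2019, 2020])]))
print(partition_matches(partition, [("month", "not in", (1, 2, 3))]))
print(partition_matches(partition, [("year", "in", {2018, 2019})]))
```

Checking against `Collection` rather than a plain iterable means one-shot generators are rejected up front instead of being silently exhausted by the first membership test.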