Closed
Labels
question: Further information is requested
Description
I'm attempting to filter data from a Parquet file to reduce memory usage:
import awswrangler as wr

df = wr.s3.read_parquet(
    path="s3://bucket/path/part-00000.snappy.parquet",
    filters=[('iso_country_code', '==', 'US')]
)
This results in:
Traceback (most recent call last):
  File "filter_parquet_from_aws.py", line 5, in <module>
    filters=[('iso_country_code', '==', 'US')]
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/awswrangler/s3.py", line 1726, in read_parquet
    s3_additional_kwargs=s3_additional_kwargs,
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/awswrangler/s3.py", line 1603, in _read_parquet_init
    split_row_groups=False,
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1200, in __init__
    self._filter(filters)
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1314, in _filter
    accepts_filter = self.partitions.filter_accepts_partition
AttributeError: 'NoneType' object has no attribute 'filter_accepts_partition'
There don't seem to be many examples or clear docs on how to use this filters option, so I'm not sure what I might be doing wrong.