
Error when attempting to use filters #267

@jeremyakers


I'm attempting to filter data from a Parquet file to reduce memory usage:

```python
import awswrangler as wr

df = wr.s3.read_parquet(
    path="s3://bucket/path/part-00000.snappy.parquet",
    filters=[('iso_country_code', '==', 'US')]
)
```

This results in:

```text
Traceback (most recent call last):
  File "filter_parquet_from_aws.py", line 5, in <module>
    filters=[('iso_country_code', '==', 'US')]
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/awswrangler/s3.py", line 1726, in read_parquet
    s3_additional_kwargs=s3_additional_kwargs,
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/awswrangler/s3.py", line 1603, in _read_parquet_init
    split_row_groups=False,
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1200, in __init__
    self._filter(filters)
  File "/home/jeremyakers/.local/lib/python3.6/site-packages/pyarrow/parquet.py", line 1314, in _filter
    accepts_filter = self.partitions.filter_accepts_partition
AttributeError: 'NoneType' object has no attribute 'filter_accepts_partition'
```
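One plausible reading of the traceback is that this pyarrow code path applies `filters` by pruning Hive-style partition directories, meaning `column=value/` segments in the object key. This is an assumption that the issue itself does not confirm. The path above points at a single file with no such directories, so there would be nothing to build `self.partitions` from. The plain-Python sketch below illustrates the pruning idea. `parse_hive_partitions` and `prune_files` are hypothetical helpers, not awswrangler or pyarrow APIs, and the partitioned keys are made up for illustration.

```python
from typing import Dict, List


def parse_hive_partitions(key: str) -> Dict[str, str]:
    """Extract column=value directory segments from an object key (hypothetical helper)."""
    partitions: Dict[str, str] = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in key.split("/")[:-1]:
        if "=" in segment:
            name, _, value = segment.partition("=")
            partitions[name] = value
    return partitions


def prune_files(keys: List[str], column: str, value: str) -> List[str]:
    """Keep files whose partition value for `column` equals `value` (equality only).

    A file with no partition directories has nothing to match on, so it is dropped.
    """
    return [key for key in keys if parse_hive_partitions(key).get(column) == value]


if __name__ == "__main__":
    # The single-file path from the issue carries no partition information.
    print(parse_hive_partitions("s3://bucket/path/part-00000.snappy.parquet"))  # {}

    # Hypothetical partitioned layout: pruning works on directory names alone.
    keys = [
        "s3://bucket/path/iso_country_code=US/part-00000.snappy.parquet",
        "s3://bucket/path/iso_country_code=CA/part-00000.snappy.parquet",
    ]
    print(prune_files(keys, "iso_country_code", "US"))
```

Pruning like this only ever inspects directory names, so it cannot filter on a column stored inside the file. That limitation is consistent with the error above, but it is an inference, not something the issue states.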

There don't seem to be many examples or much documentation on how to use the `filters` option, so I'm not sure what I'm doing wrong.
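For reference, pyarrow describes `filters` in disjunctive normal form:

- Each tuple is `(column, op, value)`.
- Tuples inside an inner list are ANDed together.
- The inner lists are ORed together.
- A flat list of tuples, like the one in the snippet above, counts as a single AND group.

The sketch below evaluates that rule against one row held in a dict. `matches_filter` is a hypothetical helper for illustration only. Its operator set is an assumption modeled on pyarrow's (`=`/`==`, `!=`, `<`, `>`, `<=`, `>=`, `in`, `not in`).

```python
import operator
from typing import Any, Dict, List, Tuple, Union

Predicate = Tuple[str, str, Any]
Filters = Union[List[Predicate], List[List[Predicate]]]

# Operators assumed to mirror those accepted by pyarrow's `filters`.
_OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def _as_dnf(filters: Filters) -> List[List[Predicate]]:
    """Treat a flat list of tuples as one AND group; otherwise copy the groups."""
    if filters and isinstance(filters[0], tuple):
        return [list(filters)]
    return [list(group) for group in filters]


def matches_filter(row: Dict[str, Any], filters: Filters) -> bool:
    """True if `row` satisfies at least one group whose predicates all hold."""
    return any(
        all(_OPS[op](row[col], val) for col, op, val in group)
        for group in _as_dnf(filters)
    )


if __name__ == "__main__":
    rows = [{"iso_country_code": "US", "n": 1}, {"iso_country_code": "CA", "n": 5}]
    # Same filter shape as in the issue: a flat list with one predicate.
    print([r for r in rows if matches_filter(r, [("iso_country_code", "==", "US")])])
    # OR of two groups: country is US, or (country is CA and n > 3).
    dnf = [[("iso_country_code", "==", "US")],
           [("iso_country_code", "==", "CA"), ("n", ">", 3)]]
    print([r for r in rows if matches_filter(r, dnf)])
```

The filter shape in the original call, `[('iso_country_code', '==', 'US')]`, is valid under this convention. Whether pyarrow applies it to in-file columns or only to partition keys depends on the pyarrow version and code path, which this sketch does not model.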

Labels: question (Further information is requested)