using filters with pyarrow_additional_kwargs in wr.s3.read_parquet #1032

@mutasem-mattar

I am trying to read a large dataset and keep only some of its rows. I tried passing a filters entry in pyarrow_additional_kwargs, but it had no effect: I got the same, unfiltered data back.

import awswrangler as wr

# session is an existing boto3.Session
x = wr.s3.read_parquet("s3://xxx/yyyy/",
                       chunked=True,
                       boto3_session=session,
                       dataset=False,
                       use_threads=True,
                       pyarrow_additional_kwargs={"filters": [('purchases', '=', 1)]},
                       )

I also tried with dataset=True, but that didn't work either.

x = wr.s3.read_parquet("s3://xxx/yyyy/", 
                       chunked=True, 
                       boto3_session=session, 
                       dataset=True, 
                       use_threads=True,
                       pyarrow_additional_kwargs={"filters":[('purchases', '=', 1)]},
                      ) 
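
For reference, here is a minimal sketch of the result I am after. It applies the predicate to each chunk after it has been read, on the assumption that the filters entry is simply not being used as a row filter here. filter_chunks is a hypothetical helper of my own, not part of awswrangler, and the two in-memory frames are made-up stand-ins for the chunks that the chunked=True call yields, since the real call needs S3 access.

```python
from typing import Any, Iterable, Iterator

import pandas as pd


def filter_chunks(
    chunks: Iterable[pd.DataFrame], column: str, value: Any
) -> Iterator[pd.DataFrame]:
    """Yield only the rows of each chunk where `column` equals `value`."""
    for chunk in chunks:
        # Boolean-mask filter applied client-side, after the chunk is loaded.
        yield chunk[chunk[column] == value]


# Stand-in for the iterator returned by wr.s3.read_parquet(..., chunked=True).
demo_chunks = [
    pd.DataFrame({"purchases": [1, 0, 1], "user": ["a", "b", "c"]}),
    pd.DataFrame({"purchases": [0, 1], "user": ["d", "e"]}),
]

# Combine the filtered chunks into a single frame.
filtered = pd.concat(filter_chunks(demo_chunks, "purchases", 1), ignore_index=True)
print(filtered)
```

With the calls above, this would be used as `for df in filter_chunks(x, "purchases", 1): ...`. Every row is still downloaded and parsed, so it only limits what is kept in memory. It is not a push-down filter.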
