fix(python, rust): prevent table scan returning large arrow dtypes #2274

ion-elgreco · 2024-03-09T18:19:12Z

Description

A small change, but it fixes many issues with Parquet files that were written with Arrow from source data in large dtype format.

By default, parquet::ParquetReader decodes the Arrow metadata embedded in the file, which in turn may give you large dtypes. This caused issues during a DataFusion Parquet scan with a filter, since the filter wouldn't coerce to the large dtypes. Simply disabling the Arrow metadata decoding gives us the Parquet schema converted to an Arrow schema without large types 👯‍♂️

Related issue(s)

Closes #1470: Can't write data from parquet file to delta table (Rust)

Blajda

LGTM!
Based on the documentation for this function I don't foresee any issue.

ion-elgreco requested review from wjones127, fvaleye, roeap and rtyler as code owners March 9, 2024 18:19

ion-elgreco force-pushed the fix/large_dtype_coercion branch from d5bbdd2 to ac54d9c Compare March 9, 2024 18:19

github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Mar 9, 2024

ion-elgreco enabled auto-merge (squash) March 9, 2024 18:20

ion-elgreco force-pushed the fix/large_dtype_coercion branch from ac54d9c to 07e617c Compare March 13, 2024 18:18

prevent reading to return large arrow schema

adf43f0

ion-elgreco force-pushed the fix/large_dtype_coercion branch from 07e617c to adf43f0 Compare March 14, 2024 08:39

Blajda approved these changes Mar 15, 2024

ion-elgreco merged commit 456c6ce into delta-io:main Mar 15, 2024
23 checks passed
