fix(python, rust): prevent table scan returning large arrow dtypes #2274

Merged: 1 commit into delta-io:main on Mar 15, 2024

Conversation

ion-elgreco
Collaborator

Description

A small change, but it fixes many issues where Parquet files were written with Arrow from source data in a large dtype format.

By default, the parquet::ParquetReader decodes the Arrow metadata embedded in the file, which in turn may give you large dtypes. This caused issues during a DataFusion Parquet scan with a filter, since the filter expression wouldn't be coerced to the large dtypes. Simply disabling the Arrow metadata decoding gives us the Parquet schema converted to an Arrow schema without large types 👯‍♂️
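To make the reasoning above concrete, here is a minimal toy model of the schema resolution. It is a sketch under stated assumptions, not the actual change in this PR: `ParquetType`, `ArrowType`, `Column`, and `resolve_schema` are hypothetical names and are not the parquet or arrow crate APIs. In the real parquet crate, the relevant switch is, to my knowledge, `ArrowReaderOptions::with_skip_arrow_metadata`, but the exact call touched by this PR is not shown here.

```rust
// Toy model only: every type and function below is hypothetical and is NOT
// part of the parquet, arrow, or DataFusion crates.

/// Type information stored in the Parquet schema itself.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ParquetType {
    ByteArrayUtf8, // BYTE_ARRAY annotated as a UTF-8 string
    ByteArray,     // plain BYTE_ARRAY
    Int64,
}

/// Arrow data type that the reader exposes to the query engine.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrowType {
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    Int64,
}

pub struct Column {
    pub name: &'static str,
    pub parquet_type: ParquetType,
    /// Arrow type hint the writer embedded in the file metadata, if any.
    pub embedded_arrow_type: Option<ArrowType>,
}

/// Conversion from the Parquet schema alone: never yields a large type.
fn from_parquet(t: ParquetType) -> ArrowType {
    match t {
        ParquetType::ByteArrayUtf8 => ArrowType::Utf8,
        ParquetType::ByteArray => ArrowType::Binary,
        ParquetType::Int64 => ArrowType::Int64,
    }
}

/// Resolve the Arrow schema a scan would see.
/// With `skip_arrow_metadata = false`, the embedded hint wins when present.
/// With `skip_arrow_metadata = true`, the hint is ignored entirely.
pub fn resolve_schema(
    columns: &[Column],
    skip_arrow_metadata: bool,
) -> Vec<(&'static str, ArrowType)> {
    columns
        .iter()
        .map(|c| {
            let ty = match (skip_arrow_metadata, c.embedded_arrow_type) {
                (false, Some(hint)) => hint,
                _ => from_parquet(c.parquet_type),
            };
            (c.name, ty)
        })
        .collect()
}
```

In this model, a string column written with a `LargeUtf8` hint resolves to `LargeUtf8` when the metadata is decoded and to `Utf8` when it is skipped, so a filter built against `Utf8` only lines up with the scan schema in the second case.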

Related issue(s)

github-actions bot added the binding/python and binding/rust labels on Mar 9, 2024
ion-elgreco enabled auto-merge (squash) on March 9, 2024, 18:20
Collaborator

Blajda left a comment

LGTM!
Based on the documentation for this function I don't foresee any issue.

ion-elgreco merged commit 456c6ce into delta-io:main on Mar 15, 2024
23 checks passed
Labels
binding/python (Issues for the Python package), binding/rust (Issues for the Rust crate)
Development

Successfully merging this pull request may close these issues.

Can't write data from parquet file to delta table (Rust)
2 participants