Extend Parquet writer support for UNSIGNED types, various timestamp types, and correctly set converted types in more cases #2597

Merged 9 commits on Nov 22, 2021


Mytherin

This fixes #2545 by adding support for unsigned types to the Parquet writer, and fixes #2454 by correctly setting the converted type of DATE columns to date.
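As a rough illustration of what "setting the converted type" means, the sketch below pairs each affected DuckDB type with a Parquet physical type and a `ConvertedType` annotation. The annotation names (`INT_8`, `UINT_64`, `DATE`, and so on) come from the Parquet format; the table and the `parquet_annotation` helper are hypothetical and are not the code in this PR.

```python
# Hypothetical lookup table: DuckDB type -> (Parquet physical type, ConvertedType).
# The pairing is an assumption for illustration, not the writer's actual source.
CONVERTED_TYPES = {
    "TINYINT":   ("INT32", "INT_8"),
    "SMALLINT":  ("INT32", "INT_16"),
    "UTINYINT":  ("INT32", "UINT_8"),
    "USMALLINT": ("INT32", "UINT_16"),
    "UINTEGER":  ("INT32", "UINT_32"),
    "UBIGINT":   ("INT64", "UINT_64"),
    "DATE":      ("INT32", "DATE"),
}


def parquet_annotation(duckdb_type: str) -> tuple[str, str]:
    """Return the (physical type, converted type) pair for a DuckDB type name."""
    try:
        return CONVERTED_TYPES[duckdb_type.upper()]
    except KeyError:
        raise ValueError(f"no annotation listed for {duckdb_type!r}") from None


for name in ("UTINYINT", "UBIGINT", "DATE"):
    physical, converted = parquet_annotation(name)
    print(f"{name}: stored as {physical}, annotated as {converted}")
```

Without the annotation, a reader only sees the physical type, so a USMALLINT column would come back as a plain 32-bit integer and a DATE column as a day count.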

In addition, we now store dates and timestamps as proper date/timestamp types instead of legacy Impala timestamps, and add support for all timestamp types (TIMESTAMP_MS, TIMESTAMP_S, and TIMESTAMP_NS in addition to TIMESTAMP).
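The four timestamp types differ only in resolution. The sketch below shows the same instant expressed as an integer count since the Unix epoch at each resolution. It assumes TIMESTAMP is microsecond precision, and it says nothing about which unit the writer picks in the file; `epoch_value` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Ticks per second for each DuckDB timestamp type (TIMESTAMP assumed to be microseconds).
TICKS_PER_SECOND = {
    "TIMESTAMP_S": 1,
    "TIMESTAMP_MS": 1_000,
    "TIMESTAMP": 1_000_000,
    "TIMESTAMP_NS": 1_000_000_000,
}


def epoch_value(ts: datetime, duckdb_type: str) -> int:
    """Integer count since the epoch at the resolution of duckdb_type (floored)."""
    delta = ts - EPOCH
    # Use integer arithmetic throughout to avoid floating-point rounding.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * TICKS_PER_SECOND[duckdb_type] // 1_000_000


ts = EPOCH + timedelta(seconds=1, microseconds=500)
for type_name in TICKS_PER_SECOND:
    print(type_name, epoch_value(ts, type_name))
```

Coarser types drop the sub-unit part of the value, so a round trip through TIMESTAMP_S or TIMESTAMP_MS only preserves timestamps that are already aligned to that unit.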

This PR also adds support for HUGEINT (by converting to double) and sets the correct converted type for INT8 and INT16 columns.
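Converting HUGEINT to double is lossy for large values: HUGEINT is a 128-bit integer, while a double has a 53-bit significand. The following is a small standalone check of which integers survive that conversion. It uses plain Python floats (IEEE 754 doubles) and does not call DuckDB; `survives_double` is a hypothetical helper.

```python
# Largest HUGEINT value, assuming a signed 128-bit integer.
HUGEINT_MAX = 2**127 - 1


def survives_double(value: int) -> bool:
    """Return True if value is unchanged after an int -> double -> int round trip."""
    return int(float(value)) == value


print(survives_double(2**53))        # every integer up to 2**53 is exact
print(survives_double(2**53 + 1))    # first integer a double cannot represent
print(survives_double(HUGEINT_MAX))  # rounds to 2**127 as a double
```

Values with magnitude up to 2**53 round-trip exactly; beyond that only integers that happen to fit in 53 significant bits do.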

We also add several round-trip tests with the pyarrow/pandas parquet reader/writer as a sanity check.

@Mytherin

Ready to merge after feature freeze.

Mytherin merged commit f0f378a into duckdb:master on Nov 22, 2021
Mytherin deleted the parquetwritertypes branch on December 3, 2021 14:25