Extend Parquet writer support for UNSIGNED types, various timestamp types, and correctly set converted types in more cases #2597
This fixes #2545 by adding support for unsigned types to the Parquet reader, and fixes #2454 by correctly setting the converted type to date.
In addition, we now store dates and timestamps as proper date/timestamp types instead of legacy Impala timestamps, and add support for all timestamp types (`TIMESTAMP_MS`, `TIMESTAMP_S`, and `TIMESTAMP_NS`, in addition to `TIMESTAMP`).
This PR also adds support for `HUGEINT` (by converting to double) and sets the correct converted type for `INT8` and `INT16` columns.
We also add several round-trip tests with the pyarrow/pandas Parquet reader/writer as a sanity check.
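Since `HUGEINT` values are written by converting to double, large values can lose precision. The following is a minimal sketch of that effect in plain Python, not the code from this PR: the helper names `hugeint_to_double` and `survives_round_trip` are hypothetical, and Python's `float` stands in for a 64-bit double.

```python
# Illustrative model only: a HUGEINT is a signed 128-bit integer, and a
# double has a 53-bit significand, so not every HUGEINT survives conversion.
HUGEINT_MIN = -(2**127)
HUGEINT_MAX = 2**127 - 1


def hugeint_to_double(value: int) -> float:
    """Convert a 128-bit signed integer to the nearest double (hypothetical helper)."""
    if not HUGEINT_MIN <= value <= HUGEINT_MAX:
        raise ValueError("value does not fit in a signed 128-bit integer")
    return float(value)


def survives_round_trip(value: int) -> bool:
    """Return True if converting to double and back yields the same integer."""
    return int(hugeint_to_double(value)) == value


if __name__ == "__main__":
    # Values up to 2**53 in magnitude are always exactly representable.
    print(survives_round_trip(2**53))
    # Above that, only some integers are (for example, powers of two).
    print(survives_round_trip(2**53 + 1))
    print(survives_round_trip(HUGEINT_MIN))
```

Under these assumptions, any `HUGEINT` column whose values exceed 2^53 in magnitude should be expected to round when written, so an exact round-trip comparison is only meaningful for smaller values.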