PARQUET-1928: Interpret Parquet INT96 type as FIXED[12] AVRO Schema #831
Gentle bump.
The Parquet community was against adding INT96 support so as not to encourage our clients to use it, although I understand the requirement to support data already written with this type. (Meanwhile, as parquet-avro never supported INT96, this change is required for developing new functionality that depends on the deprecated INT96.)
…defaulted to false to discourage use of INT96.
Thanks for the review. I have incorporated the requested changes.
…The flag is defaulted to false to discourage use of INT96.
@gszadovszky Thanks for the approval.
@anantdamle, thank you for the contribution!
Thanks @gszadovszky. A quick request: please squash and merge, as there are 2 useless commits that only revert my IDE's automatic changes to the LICENSE comments.
@anantdamle, our usual process is to squash all the changes related to one jira before merging. Thanks a lot for your contribution!
Make sure you have checked all steps below.
Reading Parquet files in Apache Beam using ParquetIO uses AvroParquetReader, causing it to throw IllegalArgumentException("INT96 not implemented and is deprecated").
Customers have large datasets that cannot be reprocessed to convert them into a supported type. An easier approach is to expose each value as a 12-byte array, which the developer can then interpret in whatever way they want.
This patch interprets the INT96 Parquet type as a 12-byte array. The developer/user can then handle it as appropriate, interpreting it as a timestamp or simply as raw bytes.
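As an illustration of the "interpret it as a timestamp" case, here is a minimal sketch of decoding the 12 bytes on the consumer side. It is not part of this patch. It assumes the value was written using the common Impala/Hive INT96 timestamp layout (8 bytes of little-endian nanoseconds within the day, followed by a 4-byte little-endian Julian day number); the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;

// Hypothetical helper, not part of parquet-avro: decodes a 12-byte INT96 value
// under the ASSUMPTION that it uses the Impala/Hive timestamp layout.
final class Int96Timestamps {
    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    private Int96Timestamps() {}

    static Instant toInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("Expected exactly 12 bytes for an INT96 value");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0..7: nanoseconds since midnight
        long julianDay = buf.getInt();   // bytes 8..11: Julian day number
        long epochSeconds = (julianDay - JULIAN_DAY_OF_EPOCH) * SECONDS_PER_DAY;
        // The nanosecond adjustment may exceed one second; Instant normalizes it.
        return Instant.ofEpochSecond(epochSeconds, nanosOfDay);
    }
}
```

With the Avro FIXED[12] mapping, the input array would typically come from the fixed value's bytes() accessor. Values written with a different convention need their own decoding, which is why the patch leaves the bytes uninterpreted.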
testParquetInt96AsFixed12AvroType and testParquetInt96DefaultFail
https://issues.apache.org/jira/browse/PARQUET-1928