fix(parquet/pqarrow): fix Decimal256 sign extension #710
Closed
dimakuz wants to merge 1 commit into apache:main
…imal256

`bigEndianToDecimal256` in column_readers.go has a bug in the partial-word sign-extension path: it shifts by `wordLen` (a byte count) instead of `wordLen * 8` (a bit count). This is a copy-paste error from `bigEndianToDecimal128`, which correctly uses `* 8`.

The bug corrupts negative Decimal256 values when the Parquet byte width is not a multiple of 8. For example, a Decimal256 with precision 40 uses 17 bytes (`DecimalSize(40) = 17`). Reading such a value back splits it into iterations of 8 + 8 + 1 bytes, and the 1-byte partial word triggers the broken shift, mangling the sign extension.
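To make the failure mode concrete, here is a minimal standalone sketch of sign-extending a partial big-endian word. It is not the code from column_readers.go: the helper `partialWord` and its `shiftInBits` flag are hypothetical names introduced only to contrast the byte-count shift with the bit-count shift, and `0xE2` is just an arbitrary leading byte with the sign bit set.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// partialWord is a hypothetical helper (not part of pqarrow). It decodes
// 0 to 8 big-endian bytes into a uint64 and, for a short word, fills the
// unused high bits with the sign. shiftInBits selects the corrected
// behaviour (shift by wordLen*8); false reproduces the byte-count shift.
func partialWord(buf []byte, negative bool, shiftInBits bool) uint64 {
	wordLen := len(buf) // assumed to be at most 8

	// Right-align the payload bytes in an 8-byte buffer and decode them.
	var padded [8]byte
	copy(padded[8-wordLen:], buf)
	word := binary.BigEndian.Uint64(padded[:])
	if wordLen == 8 {
		return word // full word, nothing to extend
	}

	// Start from all ones for a negative value, all zeros otherwise.
	var result uint64
	if negative {
		result = ^uint64(0)
	}

	shift := uint(wordLen) // byte count: the shift described as the bug
	if shiftInBits {
		shift = uint(wordLen * 8) // bit count: makes room for the payload
	}
	return (result << shift) | word
}

func main() {
	top := []byte{0xE2} // 1-byte partial word of a negative value

	fmt.Printf("%016x\n", partialWord(top, true, true))  // ffffffffffffffe2
	fmt.Printf("%016x\n", partialWord(top, true, false)) // fffffffffffffffe
}
```

With the byte-count shift, only one low bit is cleared before the OR, so the leftover sign bits overwrite the payload byte. Non-negative values are unaffected in this sketch because the initial word is zero and the shift has no effect.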