fix(parquet/pqarrow): fix Decimal256 sign extension #711
Merged
zeroshade merged 1 commit into apache:main on Mar 13, 2026
…imal256
bigEndianToDecimal256 in column_readers.go has a bug in the partial-word sign extension path: it shifts by wordLen (byte count) instead of wordLen*8 (bit count). This is a copy-paste error from bigEndianToDecimal128, which correctly uses `* 8`.
The bug corrupts negative Decimal256 values when the Parquet byte width is not a multiple of 8. For example, Decimal256 with precision 40 uses 17 bytes (DecimalSize(40) = 17). Reading such values back splits into iterations of 8 + 8 + 1 bytes, and the 1-byte partial word triggers the broken shift, mangling the sign extension.
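The effect of the wrong shift can be seen in isolation. The following is a minimal sketch using only the standard library, not the code from column_readers.go; `partialWord` and its `shiftBits` parameter are hypothetical names introduced for illustration. It assumes the partial word starts as all ones for a negative value, is shifted left to make room for the payload bytes, and is then OR-ed with them.

```go
package main

import "fmt"

// partialWord sign-extends a big-endian fragment of 1..7 bytes into a
// 64-bit word. shiftBits is how far the sign fill is moved left before
// the payload is OR-ed in. Hypothetical helper, for illustration only.
func partialWord(frag []byte, negative bool, shiftBits uint) uint64 {
	var word uint64
	if negative {
		word = ^uint64(0) // all ones: sign fill for a negative value
	}
	word <<= shiftBits

	var payload uint64
	for _, b := range frag {
		payload = payload<<8 | uint64(b)
	}
	return word | payload
}

func main() {
	// The most significant byte of some negative 17-byte value.
	frag := []byte{0xC0}
	n := uint(len(frag))

	buggy := partialWord(frag, true, n)   // shift by byte count
	fixed := partialWord(frag, true, n*8) // shift by bit count

	fmt.Printf("buggy: %016x\n", buggy) // fffffffffffffffe
	fmt.Printf("fixed: %016x\n", fixed) // ffffffffffffffc0
}
```

With a 1-byte fragment the byte-count shift leaves sign-fill ones in bits 1 through 7, which overwrite any zero bits of the payload. Fragments whose bits are already set there (for example 0xFF) come out unchanged, so not every negative value is affected.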
zeroshade approved these changes on Mar 13, 2026
Rationale for this change
Found while reading a Parquet file containing a Decimal256 column with go-arrow.
What changes are included in this PR?
A fix for the partial-word sign extension shift in bigEndianToDecimal256.
Are these changes tested?
Yes, a round-trip test is included.
Are there any user-facing changes?
None