changefeedccl: fix decimal writes into parquet#145248
Draft
KeithCh wants to merge 1 commit intocockroachdb:masterfrom
Draft
changefeedccl: fix decimal writes into parquet#145248KeithCh wants to merge 1 commit intocockroachdb:masterfrom
KeithCh wants to merge 1 commit intocockroachdb:masterfrom
Conversation
Member
a4c6470 to
298d3b7
Compare
Contributor
There was a problem hiding this comment.
Pull Request Overview
This PR fixes the encoding of decimal values in parquet files by switching from string bytes to a proper twos-complement byte representation. Key changes include:
- Expanding the decimal test case in writer_test.go to cover a broader range of decimals.
- Introducing new helper functions (twosComplement and formatDecimal) in write_functions.go for proper decimal encoding.
- Updating the decimal decoding logic in decoders.go to convert from the twos-complement format.
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/util/parquet/writer_test.go | Updated the test case for decimals by adding more decimal columns and test values. |
| pkg/util/parquet/write_functions.go | Added functions to compute the twos-complement and format decimals correctly. |
| pkg/util/parquet/testutils.go | Extended datum validation to include comparisons of decimal coefficients and sign. |
| pkg/util/parquet/decoders.go | Revised decimal decoding to convert twos-complement bytes to decimal values. |
Files not reviewed (1)
- pkg/util/parquet/BUILD.bazel: Language not supported
Comments suppressed due to low confidence (1)
pkg/util/parquet/writer_test.go:240
- [nitpick] The updated test values for decimals now use finite numbers instead of '-inf' and 'nan'. Please verify that these finite values provide sufficient coverage of edge cases previously intended.
if datums[1], err = tree.ParseDDecimal("1.222"); err != nil {
298d3b7 to
6ab739a
Compare
c223a07 to
e7a2847
Compare
Contributor
Author
|
The current status is that it works for 95% of cases, with the following caveats:
|
Previously we wrote decimals into parquet as string bytes. This caused issues when being read since the readers expect the bytes to be a valid decimal representation. We now write all decimals columns in parquet as a (decimal, string) tuple. The decimal value is null if it is not a finite decimal (e.g. NaN, Inf). The string value is null if the decimal value is finite. Epic: CRDB-41784 Fixes: cockroachdb#130909 Release note (bug fix): Fix decimal writes into parquet files from changefeeds and exports.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Previously we wrote decimals into parquet as
string bytes. This caused issues when being read
since the readers expect the bytes to be a valid
decimal representation.
Epic: CRDB-41784
Fixes: #130909
Release note (bug fix): Fix decimal writes into parquet files from
changefeeds and exports.