[BUG]: Deserialising delta binary packed encoded data produces incorrect results #486

Open
swindiggie opened this issue Mar 8, 2024 · 0 comments

swindiggie (Contributor) commented Mar 8, 2024

Library Version

4.23.4

OS

MacOS 13.4

OS Architecture

ARM 64

How to reproduce?

I generated a parquet file using the following one-column CSV:

"column1"
1000
1
2
3
4
5
6
7
8
9
10

The parquet file has a single Int32 column encoded with DELTA_BINARY_PACKED.

When I deserialise the parquet file using parquet-dotnet, I get:

"column1"
1000
1
-998
-1997
-2996
-3995
-4994
-5993
-6992
-7991
-8990

The incorrect results can be viewed using Parquet Floor 4.23.4.
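
One pattern in the output above: from the third value onward, each value is exactly 999 lower than the previous one. In DELTA_BINARY_PACKED, each block stores a minimum delta and then bit-packs the non-negative offsets `delta - min_delta`. For this data the minimum delta is -999, so the output looks like what a decoder would produce if it applied only the minimum delta and read every packed offset as 0. The Python sketch below illustrates that arithmetic only. It is not parquet-dotnet code, it ignores the real on-disk layout (headers, blocks, miniblocks, bit widths), and the names `delta_decode` and `faulty` are hypothetical. It is a guess at the symptom, not a confirmed diagnosis.

```python
def delta_decode(first_value, min_delta, offsets):
    """Rebuild values from a first value, a block minimum delta, and the
    per-value non-negative offsets (delta - min_delta).

    Simplified model: one block, no bit-packing, no headers.
    """
    values = [first_value]
    for offset in offsets:
        values.append(values[-1] + min_delta + offset)
    return values


# Values from the CSV in this report.
expected = [1000] + list(range(1, 11))

# What an encoder would conceptually store for a single block.
deltas = [b - a for a, b in zip(expected, expected[1:])]  # -999, then 1s
min_delta = min(deltas)                                   # -999
offsets = [d - min_delta for d in deltas]                 # 0, then 1000s

# Decoding with the stored offsets recovers the CSV values.
correct = delta_decode(expected[0], min_delta, offsets)

# Assumption: a decoder that reads every packed offset as 0.
faulty = delta_decode(expected[0], min_delta, [0] * len(offsets))

print(correct)
print(faulty)
```

If this model is right, the place to look would be where the packed offsets are unpacked for a block whose minimum delta is large and negative, since the first value and the minimum delta both appear to be read correctly.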

I have also opened a PR with a failing unit test against the parquet file as a reference:

I verified that a few other websites display correct results for the parquet file:

The CSV file and a screenshot of Parquet Floor are attached.

test.csv

Screenshot 2024-03-07 at 4 26 59 pm

Failing test

No response

swindiggie changed the title from "[BUG]: Deserialising DELTA_BINARY_PACKED produces incorrect results" to "[BUG]: Deserialising delta binary packed encoded data produces incorrect results" on Mar 8, 2024