[Go][Parquet] RLE problem with ''(blank)" #34603
Comments
@allinux can you provide the code you are using to perform the writes?
@zeroshade
zeroshade added a commit to zeroshade/arrow that referenced this issue Mar 23, 2023
zeroshade added a commit that referenced this issue Mar 31, 2023

…#34709)

### Rationale for this change
Writing a dictionary encoded column consisting of empty strings ended up writing values that consisted of a string with a NUL character in them rather than actually writing an empty string. This fixes that issue and also cleans the code up a little bit in doing so.

### Are these changes tested?
A unit test is added to test for the behavior.

### Are there any user-facing changes?
Users who wrote dictionary ByteArray or FixedLenByteArray columns that contained empty strings will see this fixed when it comes to handling those empty strings rather than having written strings containing a single NUL character (`\x00`).

* Closes: #34603

Authored-by: Matt Topol <zotthewizard@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
minyoung pushed a commit to minyoung/arrow that referenced this issue May 1, 2023: …trings (apache#34709)
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023: …trings (apache#34709)
Describe the bug, including details regarding any error messages, version, and platform.
There is a bug where `''` is read back as `\x00` when `WithDictionaryDefault` of `parquet.WriterProperties` is set to `true` and all data in the column is `''`. When it is set to `false`, the data is read back as `''` as expected. It seems to occur during RLE processing, but this still needs to be checked.

sample_parquet.zip contains two files, src.parquet and dest.parquet: `src.parquet` is a file with `''`, and `dest.parquet` is a file reloaded with `WithDictionaryDefault` set to `true`.

sample_parquet.zip
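
To make the symptom concrete, here is a minimal sketch in plain Go of how the two cases differ once the values are in memory. It does not call the Arrow or Parquet API. The `src` and `dest` slices are hand-written stand-ins for what reading the two sample files is described as yielding, and the helper names `nulOnlyIndexes` and `allNulOnly` are hypothetical.

```go
package main

import "fmt"

// nulOnlyIndexes returns the positions whose value is exactly one NUL byte,
// which is how an affected column reports an originally empty string.
// Hypothetical helper, not part of any library.
func nulOnlyIndexes(values []string) []int {
	var hits []int
	for i, v := range values {
		if v == "\x00" {
			hits = append(hits, i)
		}
	}
	return hits
}

// allNulOnly reports whether a non-empty column consists solely of "\x00".
func allNulOnly(values []string) bool {
	return len(values) > 0 && len(nulOnlyIndexes(values)) == len(values)
}

func main() {
	// Assumed stand-ins for the column contents of src.parquet and dest.parquet.
	src := []string{"", "", ""}
	dest := []string{"\x00", "\x00", "\x00"}

	fmt.Println(len(src[0]), len(dest[0]))          // 0 1
	fmt.Println(allNulOnly(src), allNulOnly(dest)) // false true
}
```

A check like this can flag columns that may have been written by an affected version, but it cannot tell an affected value apart from a string that was intentionally a single NUL byte.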
Component(s)
Go