PARQUET-18: Fix all-null value pages with dict encoding. #18
Closed
TestDictionary#testZeroValues demonstrates the problem, where a page of all null values is decoded using the DictionaryValuesReader. Because there are no non-null values, the values section of the page is 0 bytes long, but the DictionaryValuesReader assumes there is at least one encoded value and attempts to read a bit width. The test passes a byte array to initFromPage with the offset equal to the array's length. The fix is to detect that there are no input bytes to read. To avoid adding validity checks to the read path, this sets the internal decoder to one that throws an exception if any reads are attempted.
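For readers without the diff open, the approach can be sketched roughly as follows. This is a simplified illustration, not the parquet-mr source: the class `DictionaryIndexReaderSketch`, the `IntDecoder` interface, `FailingDecoder`, and the toy `ByteDecoder` (one bit-width byte followed by one byte per index) are all hypothetical names and formats chosen for the example, and the `initFromPage(valueCount, page, offset)` signature is assumed from the description above.

```java
// Hypothetical, simplified stand-in for a dictionary-index reader.
// It is NOT the parquet-mr implementation; names and page layout are assumptions.
final class DictionaryIndexReaderSketch {

    /** Minimal decoder abstraction (an assumption made for this sketch). */
    interface IntDecoder {
        int readInt();
    }

    /** Installed when the page has no encoded values: any read indicates a bug. */
    static final class FailingDecoder implements IntDecoder {
        @Override
        public int readInt() {
            throw new IllegalStateException("Attempt to read from a page with no values");
        }
    }

    /** Toy decoder: the first byte is the bit width, each following byte is one index. */
    static final class ByteDecoder implements IntDecoder {
        private final byte[] page;
        private int pos;
        final int bitWidth;

        ByteDecoder(byte[] page, int offset) {
            // Reading the bit width fails if offset == page.length (no bytes left).
            this.bitWidth = page[offset] & 0xFF;
            this.page = page;
            this.pos = offset + 1;
        }

        @Override
        public int readInt() {
            return page[pos++] & 0xFF;
        }
    }

    private IntDecoder decoder;

    /** Picks the decoder once per page, so the read path needs no extra checks. */
    void initFromPage(int valueCount, byte[] page, int offset) {
        if (page.length - offset > 0) {
            decoder = new ByteDecoder(page, offset);
        } else {
            // All values in the page are null: there is no bit width to read.
            decoder = new FailingDecoder();
        }
    }

    int readValueDictionaryId() {
        return decoder.readInt();
    }
}
```

With this shape, calling `initFromPage` with the offset equal to the array's length succeeds, and an exception is raised only if a caller later tries to read a value that cannot exist.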
LGTM
Thanks, @julienledem!
rdblue
added a commit
to rdblue/parquet-mr
that referenced
this pull request
Aug 11, 2014
TestDictionary#testZeroValues demonstrates the problem, where a page of all null values is decoded using the DictionaryValuesReader. Because there are no non-null values, the values section of the page is 0 bytes long, but the DictionaryValuesReader assumes there is at least one encoded value and attempts to read a bit width. The test passes a byte array to initFromPage with the offset equal to the array's length. The fix is to detect that there are no input bytes to read. To avoid adding validity checks to the read path, this sets the internal decoder to one that throws an exception if any reads are attempted.
Author: Ryan Blue <rblue@cloudera.com>
Closes apache#18 from rdblue/PARQUET-18-fix-nulls-with-dictionary and squashes the following commits:
0711766 [Ryan Blue] PARQUET-18: Fix all-null value pages with dict encoding.
Conflicts: parquet-column/src/main/java/parquet/column/values/dictionary/DictionaryValuesReader.java
rdblue
added a commit
to rdblue/parquet-mr
that referenced
this pull request
Feb 6, 2015
TestDictionary#testZeroValues demonstrates the problem, where a page of all null values is decoded using the DictionaryValuesReader. Because there are no non-null values, the values section of the page is 0 bytes long, but the DictionaryValuesReader assumes there is at least one encoded value and attempts to read a bit width. The test passes a byte array to initFromPage with the offset equal to the array's length. The fix is to detect that there are no input bytes to read. To avoid adding validity checks to the read path, this sets the internal decoder to one that throws an exception if any reads are attempted.
Author: Ryan Blue <rblue@cloudera.com>
Closes apache#18 from rdblue/PARQUET-18-fix-nulls-with-dictionary and squashes the following commits:
0711766 [Ryan Blue] PARQUET-18: Fix all-null value pages with dict encoding.
gszadovszky
pushed a commit
to gszadovszky/parquet-mr
that referenced
this pull request
Aug 22, 2018
This updates parquet-format to use org.apache names.
Still need to:
* Validate that parquet-mr works as expected when relying on these changes
Author: Ryan Blue <blue@apache.org>
Closes apache#18 from rdblue/PARQUET-23-rename-to-org-apache and squashes the following commits:
ddcd50e [Ryan Blue] PARQUET-23: Update changelog for org.apache parquet-format 2.2.0.
5c339d4 [Ryan Blue] PARQUET-23: Update POM to use Apache maven release config.
ac982ca [Ryan Blue] PARQUET-23: Refactor parquet-format to org.apache names.
parthchandra
added a commit
to parthchandra/incubator-parquet-mr
that referenced
this pull request
May 13, 2022
apache#18) * Async I/O - read one buffer at a time instead of all at once to allow multiple streams fair use of a shared thread pool.
sunchao
pushed a commit
to sunchao/parquet-mr
that referenced
this pull request
Aug 1, 2022
apache#18) * Async I/O - read one buffer at a time instead of all at once to allow multiple streams fair use of a shared thread pool.
This pull request was closed.