PARQUET-18: Fix all-null value pages with dict encoding. #18

Closed


rdblue
Contributor

rdblue commented Jul 12, 2014

TestDictionary#testZeroValues demonstrates the problem: a page containing
only null values is decoded using the DictionaryValuesReader. Because
there are no non-null values, the page's values section is 0 bytes long,
but DictionaryValuesReader assumes there is at least one encoded value
and attempts to read a bit width. The test passes a byte array to
initFromPage with the offset equal to the array's length.
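To make the failure concrete, here is a simplified, hypothetical reader (`NaiveDictionaryReader` is an illustrative name, not the parquet-mr class) that, like the reader described above, unconditionally reads a bit width from the first byte of the values section. With the offset equal to the array's length there is no byte to read, so in this sketch the access throws `ArrayIndexOutOfBoundsException`. The exact exception raised by the real reader is not stated in this PR.

```java
// Hypothetical, simplified stand-in for a dictionary values reader; NOT the
// parquet-mr class. It treats the first byte of the values section as the bit width.
class NaiveDictionaryReader {
    private int bitWidth;

    void initFromPage(int valueCount, byte[] page, int offset) {
        // Reads one byte unconditionally, even if the values section is empty.
        bitWidth = page[offset] & 0xFF;
    }

    int bitWidth() {
        return bitWidth;
    }
}

public class ZeroValuesFailure {
    public static void main(String[] args) {
        // Hypothetical page bytes: whatever precedes the values section (for
        // example, encoded definition levels) is arbitrary here. An all-null page
        // has nothing after it, so the values section starts at page.length.
        byte[] page = {0x02, 0x00, 0x00};
        try {
            new NaiveDictionaryReader().initFromPage(3, page, page.length);
            System.out.println("read a bit width");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("failed: offset == page.length"); // this branch runs
        }
    }
}
```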

The fix is to detect that there are no input bytes to read. To avoid
adding validity checks to the read path, this change sets the internal
decoder to one that throws an exception if any read is attempted.
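A minimal sketch of the approach described above: check once, at page initialization, whether the values section is empty, and if so install a "null object" decoder whose reads always throw. All names here (`IntDecoder`, `ByteIdDecoder`, `EmptyPageDecoder`, `SketchDictionaryReader`) are hypothetical and not the actual parquet-mr classes. As a further simplification, the non-empty path stores each dictionary id in one whole byte, whereas real dictionary-encoded pages use the RLE/bit-packing hybrid encoding after the bit-width byte.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder interface used only to illustrate the technique.
interface IntDecoder {
    int readInt();
}

// Simplified decoder for a non-empty page: one unsigned byte per dictionary id.
class ByteIdDecoder implements IntDecoder {
    private final ByteBuffer buf;

    ByteIdDecoder(ByteBuffer buf) {
        this.buf = buf;
    }

    public int readInt() {
        return buf.get() & 0xFF;
    }
}

// Installed when the values section is empty. An all-null page has no encoded
// values, so any read here indicates a caller bug and fails loudly.
class EmptyPageDecoder implements IntDecoder {
    public int readInt() {
        throw new IllegalStateException("Attempted to read a value from a page with no encoded values");
    }
}

class SketchDictionaryReader {
    private IntDecoder decoder;

    void initFromPage(int valueCount, byte[] page, int offset) {
        if (offset >= page.length) {
            // No input bytes: do not try to read a bit width at all.
            decoder = new EmptyPageDecoder();
            return;
        }
        // First byte of a non-empty values section is the bit width. This sketch
        // only supports widths up to 8 and stores each id in one whole byte.
        int bitWidth = page[offset] & 0xFF;
        if (bitWidth > 8) {
            throw new UnsupportedOperationException("sketch supports bit widths <= 8");
        }
        decoder = new ByteIdDecoder(ByteBuffer.wrap(page, offset + 1, page.length - offset - 1));
    }

    // Hot read path: no emptiness check, it simply delegates to the decoder.
    int readValueDictionaryId() {
        return decoder.readInt();
    }
}

public class ZeroValuesFix {
    public static void main(String[] args) {
        SketchDictionaryReader empty = new SketchDictionaryReader();
        byte[] page = {0x02, 0x00, 0x00};
        empty.initFromPage(3, page, page.length); // no longer fails here
        try {
            empty.readValueDictionaryId();
        } catch (IllegalStateException e) {
            System.out.println("read rejected: " + e.getMessage());
        }

        SketchDictionaryReader nonEmpty = new SketchDictionaryReader();
        nonEmpty.initFromPage(3, new byte[]{0x01, 0x01, 0x00, 0x01}, 0);
        System.out.println(nonEmpty.readValueDictionaryId()); // prints 1
    }
}
```

Because the emptiness check runs once per page in `initFromPage`, the per-value read method stays free of extra branches, while a misuse (reading from an all-null page) still surfaces as an exception instead of silently returning garbage.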

@julienledem
Member

LGTM
+1

asfgit closed this in fb01048 Jul 18, 2014
@rdblue
Contributor Author

rdblue commented Jul 18, 2014

Thanks, @julienledem!

rdblue added a commit to rdblue/parquet-mr that referenced this pull request Aug 11, 2014

Author: Ryan Blue <rblue@cloudera.com>

Closes apache#18 from rdblue/PARQUET-18-fix-nulls-with-dictionary and squashes the following commits:

0711766 [Ryan Blue] PARQUET-18: Fix all-null value pages with dict encoding.

Conflicts:
	parquet-column/src/main/java/parquet/column/values/dictionary/DictionaryValuesReader.java
rdblue added a commit to rdblue/parquet-mr that referenced this pull request Feb 6, 2015

Author: Ryan Blue <rblue@cloudera.com>

Closes apache#18 from rdblue/PARQUET-18-fix-nulls-with-dictionary and squashes the following commits:

0711766 [Ryan Blue] PARQUET-18: Fix all-null value pages with dict encoding.
gszadovszky pushed a commit to gszadovszky/parquet-mr that referenced this pull request Aug 22, 2018
This updates parquet-format to use org.apache names. Still need to:
* Validate that parquet-mr works as expected when relying on these changes

Author: Ryan Blue <blue@apache.org>

Closes apache#18 from rdblue/PARQUET-23-rename-to-org-apache and squashes the following commits:

ddcd50e [Ryan Blue] PARQUET-23: Update changelog for org.apache parquet-format 2.2.0.
5c339d4 [Ryan Blue] PARQUET-23: Update POM to use Apache maven release config.
ac982ca [Ryan Blue] PARQUET-23: Refactor parquet-format to org.apache names.
parthchandra added a commit to parthchandra/incubator-parquet-mr that referenced this pull request May 13, 2022
apache#18)

* Async I/O - read one buffer at a time instead of all at once to allow multiple streams fair use of a shared thread pool.
sunchao pushed a commit to sunchao/parquet-mr that referenced this pull request Aug 1, 2022
apache#18)

* Async I/O - read one buffer at a time instead of all at once to allow multiple streams fair use of a shared thread pool.
This pull request was closed.