Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++][Parquet] DeltaLengthByteArrayDecoder fails to reject empty buffers #34667

Closed
wjones127 opened this issue Mar 21, 2023 · 0 comments · Fixed by #34668
Closed

[C++][Parquet] DeltaLengthByteArrayDecoder fails to reject empty buffers #34667

wjones127 opened this issue Mar 21, 2023 · 0 comments · Fixed by #34668
Assignees
Labels
Component: C++ Component: Parquet Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Milestone

Comments

@wjones127
Copy link
Member

Describe the bug, including details regarding any error messages, version, and platform.

If a corrupted Parquet file indicates it has num_values > 0 in a page while the page is empty, DeltaLengthByteArrayDecoder will cause buffer overflow.

Component(s)

C++, Parquet

@wjones127 wjones127 self-assigned this Mar 21, 2023
@wjones127 wjones127 added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Mar 21, 2023
wjones127 added a commit that referenced this issue Mar 23, 2023
…d inputs (#34668)

### Rationale for this change

We should make sure that we are defensive when it comes to invalid inputs. Otherwise malicious actors might send invalid Parquet files as a way to crash a users service.

### What changes are included in this PR?

Makes sure we reject empty pages by deferring to the DeltaBitPackDecoder to error when the lengths header is missing. Also adds a DCHECK that validates `num_valid_values_` was initialized (it wasn't when `len = 0`). Also adds a few more general tests cases for the encoding, covering empty arrays and empty strings.

### Are these changes tested?

Yes, several unit tests have been added.

### Are there any user-facing changes?

**This PR contains a "Critical Fix".**
* Closes: #34667

Lead-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Will Jones <willjones127@gmail.com>
@wjones127 wjones127 added this to the 12.0.0 milestone Mar 23, 2023
rtpsw pushed a commit to rtpsw/arrow that referenced this issue Mar 27, 2023
…invalid inputs (apache#34668)

### Rationale for this change

We should make sure that we are defensive when it comes to invalid inputs. Otherwise malicious actors might send invalid Parquet files as a way to crash a users service.

### What changes are included in this PR?

Makes sure we reject empty pages by deferring to the DeltaBitPackDecoder to error when the lengths header is missing. Also adds a DCHECK that validates `num_valid_values_` was initialized (it wasn't when `len = 0`). Also adds a few more general tests cases for the encoding, covering empty arrays and empty strings.

### Are these changes tested?

Yes, several unit tests have been added.

### Are there any user-facing changes?

**This PR contains a "Critical Fix".**
* Closes: apache#34667

Lead-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Will Jones <willjones127@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…invalid inputs (apache#34668)

### Rationale for this change

We should make sure that we are defensive when it comes to invalid inputs. Otherwise malicious actors might send invalid Parquet files as a way to crash a users service.

### What changes are included in this PR?

Makes sure we reject empty pages by deferring to the DeltaBitPackDecoder to error when the lengths header is missing. Also adds a DCHECK that validates `num_valid_values_` was initialized (it wasn't when `len = 0`). Also adds a few more general tests cases for the encoding, covering empty arrays and empty strings.

### Are these changes tested?

Yes, several unit tests have been added.

### Are there any user-facing changes?

**This PR contains a "Critical Fix".**
* Closes: apache#34667

Lead-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Will Jones <willjones127@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: C++ Component: Parquet Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant