Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37829: [Java] Avoid resizing data buffer twice when appending variable length vectors #37844

Merged
merged 1 commit into from
Sep 25, 2023

Conversation

hrishisd
Copy link
Contributor

@hrishisd hrishisd commented Sep 24, 2023

Rationale for this change

This change prevents avoidable OversizedAllocationExceptions when appending a variable-length vector with many small elements to a variable-length vector with a few large elements.

When appending variable-length vectors, VectorAppender iteratively doubles the offset and validity buffers until they can accommodate the combined elements. In the previous implementation, each iteration would also double the data buffer's capacity. This behavior is appropriate for vectors of fixed-size types but can result in an oversized data buffers when appending many small elements to a variable length vector with a large data buffer.

What changes are included in this PR?

The new behavior only resizes the offset and validity buffers when resizing the target vector's buffers to ensure they can hold the total number of combined elements.

The data buffer is resized based on the total required data size of the combined elements.

Are these changes tested?

Yes. I added a unit test that results in an OversizedAllocationException when run against the previous version of the code.

Are there any user-facing changes?

No.

@github-actions
Copy link

⚠️ GitHub issue #37829 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Sep 25, 2023
@lidavidm lidavidm merged commit 0f94eb6 into apache:main Sep 25, 2023
15 checks passed
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Sep 25, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 0f94eb6.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

etseidl pushed a commit to etseidl/arrow that referenced this pull request Sep 28, 2023
…g variable length vectors (apache#37844)

### Rationale for this change
This change prevents avoidable `OversizedAllocationException`s when appending a variable-length vector with many small elements to a variable-length vector with a few large elements. 

When appending variable-length vectors, `VectorAppender` iteratively doubles the offset and validity buffers until they can accommodate the combined elements. In the previous implementation, each iteration would also double the data buffer's capacity. This behavior is appropriate for vectors of fixed-size types but can result in an oversized data buffers when appending many small elements to a variable length vector with a large data buffer. 

### What changes are included in this PR?
The new behavior only resizes the offset and validity buffers when resizing the target vector's buffers to ensure they can hold the total number of combined elements.

The data buffer is resized based on the total required data size of the combined elements.

### Are these changes tested?
Yes. I added a unit test that results in an `OversizedAllocationException` when run against the previous version of the code. 

### Are there any user-facing changes?
No.

* Closes: apache#37829

Authored-by: hrishisd <hdharam@berkeley.edu>
Signed-off-by: David Li <li.davidm96@gmail.com>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…g variable length vectors (apache#37844)

### Rationale for this change
This change prevents avoidable `OversizedAllocationException`s when appending a variable-length vector with many small elements to a variable-length vector with a few large elements. 

When appending variable-length vectors, `VectorAppender` iteratively doubles the offset and validity buffers until they can accommodate the combined elements. In the previous implementation, each iteration would also double the data buffer's capacity. This behavior is appropriate for vectors of fixed-size types but can result in an oversized data buffers when appending many small elements to a variable length vector with a large data buffer. 

### What changes are included in this PR?
The new behavior only resizes the offset and validity buffers when resizing the target vector's buffers to ensure they can hold the total number of combined elements.

The data buffer is resized based on the total required data size of the combined elements.

### Are these changes tested?
Yes. I added a unit test that results in an `OversizedAllocationException` when run against the previous version of the code. 

### Are there any user-facing changes?
No.

* Closes: apache#37829

Authored-by: hrishisd <hdharam@berkeley.edu>
Signed-off-by: David Li <li.davidm96@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…g variable length vectors (apache#37844)

### Rationale for this change
This change prevents avoidable `OversizedAllocationException`s when appending a variable-length vector with many small elements to a variable-length vector with a few large elements. 

When appending variable-length vectors, `VectorAppender` iteratively doubles the offset and validity buffers until they can accommodate the combined elements. In the previous implementation, each iteration would also double the data buffer's capacity. This behavior is appropriate for vectors of fixed-size types but can result in an oversized data buffers when appending many small elements to a variable length vector with a large data buffer. 

### What changes are included in this PR?
The new behavior only resizes the offset and validity buffers when resizing the target vector's buffers to ensure they can hold the total number of combined elements.

The data buffer is resized based on the total required data size of the combined elements.

### Are these changes tested?
Yes. I added a unit test that results in an `OversizedAllocationException` when run against the previous version of the code. 

### Are there any user-facing changes?
No.

* Closes: apache#37829

Authored-by: hrishisd <hdharam@berkeley.edu>
Signed-off-by: David Li <li.davidm96@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…g variable length vectors (apache#37844)

### Rationale for this change
This change prevents avoidable `OversizedAllocationException`s when appending a variable-length vector with many small elements to a variable-length vector with a few large elements. 

When appending variable-length vectors, `VectorAppender` iteratively doubles the offset and validity buffers until they can accommodate the combined elements. In the previous implementation, each iteration would also double the data buffer's capacity. This behavior is appropriate for vectors of fixed-size types but can result in an oversized data buffers when appending many small elements to a variable length vector with a large data buffer. 

### What changes are included in this PR?
The new behavior only resizes the offset and validity buffers when resizing the target vector's buffers to ensure they can hold the total number of combined elements.

The data buffer is resized based on the total required data size of the combined elements.

### Are these changes tested?
Yes. I added a unit test that results in an `OversizedAllocationException` when run against the previous version of the code. 

### Are there any user-facing changes?
No.

* Closes: apache#37829

Authored-by: hrishisd <hdharam@berkeley.edu>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] Excessive resizing in VectorAppender leads to avoidable OversizedAllocationException
2 participants