[JAVA] realloc should consider the existing buffer capacity for computing target memory requirement

We recently encountered a problem when we were trying to add JSON files with complex schema as datasets.

Initially we started with a Float8Vector with the default memory allocation of 32KB (4096 \* 8 bytes).
Several rounds of setSafe() calls triggered a realloc() from 32KB to 64KB.
Further setSafe() calls triggered another realloc() from 64KB to 128KB.

After that we encountered a BigInt and promoted our vector to UnionVector.

This required us to create a UnionVector with BigIntVector and Float8Vector. The latter required us to transfer the Float8Vector we were earlier working with to the Float8Vector inside the Union.

As part of transferTo(), the target Float8Vector received all of the ArrowBuf state (capacity, buffer contents, etc.) from the source vector.

Later, a realloc was triggered on the Float8Vector inside the UnionVector.

The computation inside realloc() that determines how much memory to allocate goes wrong because it bases the decision on allocateSizeInBytes. Although this vector was created as part of transfer() from a 128KB source vector, allocateSizeInBytes is still at the initial/default value of 32KB.

We end up allocating a 64KB buffer, attempt to copy 128KB into it, and seg fault when invoking setBytes().

realloc() wrongly assumes that allocateSizeInBytes is always equal to data.capacity(). The scenario described above shows one way this assumption can break.
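
The size arithmetic can be illustrated with a small standalone sketch. This is not the Arrow code: the class `ReallocSketch` and its methods are hypothetical names used only to model the numbers in this report, and taking the larger of the two values is just one possible way to account for the existing buffer capacity.

```java
// Hypothetical model of the sizing decision described above; not Arrow's implementation.
class ReallocSketch {
    // Default allocation from the report: 4096 values * 8 bytes = 32KB.
    static final long DEFAULT_ALLOCATION = 4096L * 8;

    // Sizing as described in the report: doubles the bookkeeping field only,
    // ignoring how large the current buffer really is.
    static long targetFromAllocationSize(long allocateSizeInBytes) {
        return allocateSizeInBytes * 2;
    }

    // One possible correction (an assumption, not necessarily the merged fix):
    // start from whichever is larger, the bookkeeping field or the real capacity.
    static long targetFromCapacity(long allocateSizeInBytes, long currentCapacity) {
        return Math.max(allocateSizeInBytes, currentCapacity) * 2;
    }

    // The copy into the new buffer is only safe if the new buffer can hold the old contents.
    static boolean copyFits(long newBufferSize, long currentCapacity) {
        return newBufferSize >= currentCapacity;
    }

    public static void main(String[] args) {
        // State of the target vector after the transfer: the buffer is 128KB,
        // but the bookkeeping field was never updated from its default.
        long allocateSizeInBytes = DEFAULT_ALLOCATION;
        long capacity = 128L * 1024;

        long buggy = targetFromAllocationSize(allocateSizeInBytes);
        long corrected = targetFromCapacity(allocateSizeInBytes, capacity);

        System.out.println("target from allocateSizeInBytes: " + buggy
                + " bytes, copy fits: " + copyFits(buggy, capacity));
        System.out.println("target from capacity:            " + corrected
                + " bytes, copy fits: " + copyFits(corrected, capacity));
    }
}
```

With the state from this report, the first computation yields a 64KB target that cannot hold the existing 128KB of data, while the capacity-aware computation yields 256KB.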

**Reporter**: [Siddharth Teotia](https://issues.apache.org/jira/browse/ARROW-1533) / @siddharthteotia
**Assignee**: [Siddharth Teotia](https://issues.apache.org/jira/browse/ARROW-1533) / @siddharthteotia
#### PRs and other links:
- [GitHub Pull Request #1097](https://github.com/apache/arrow/pull/1097)
- [GitHub Pull Request #1112](https://github.com/apache/arrow/pull/1112)

<sub>**Note**: *This issue was originally created as [ARROW-1533](https://issues.apache.org/jira/browse/ARROW-1533). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*</sub>
