Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Incorrect getBufferSize for a dense union vector #38242

Closed
wotbrew opened this issue Oct 12, 2023 · 0 comments · Fixed by #38305
Closed

[Java] Incorrect getBufferSize for a dense union vector #38242

wotbrew opened this issue Oct 12, 2023 · 0 comments · Fixed by #38305

Comments

@wotbrew
Copy link
Contributor

wotbrew commented Oct 12, 2023

Describe the bug, including details regarding any error messages, version, and platform.

DenseUnionVector.getBufferSizeFor takes a count parameter. My expectation is the count represents the number of elements in the union you wish to account for.

However, that count is passed directly to internalStruct.getBufferSizeFor, which I suspect is a bug.

This is because struct fields normally have the same valueCount, but this is not likely true when used in a union. If your children have different lengths:

  • for fixed vectors, you will calculate the size of the leg contents incorrectly
  • dynamic vectors like VarBinary may require reading buffer contents to find the data size, potentially causing an out-of-bounds dereference

Observed on 13.0.0 and 12.0.1.

Repro:

package org.apache.arrow.vector.complex;

import static org.junit.jupiter.api.Assertions.assertDoesNotThrow;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.Arrays;
import java.util.List;

import org.apache.arrow.memory.BoundsChecking;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BaseValueVector;
import org.apache.arrow.vector.complex.DenseUnionVector;
import org.apache.arrow.vector.holders.NullableIntHolder;
import org.apache.arrow.vector.holders.NullableVarBinaryHolder;
import org.apache.arrow.vector.types.UnionMode;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.junit.jupiter.api.Test;

public class TestDuvBufferCount {
  @Test
  public void testBufferSize() {
    try (BufferAllocator allocator = new RootAllocator();
         DenseUnionVector duv = new DenseUnionVector("duv", allocator,
                 FieldType.nullable(new ArrowType.Union(UnionMode.Dense, null)), null)) {

      List<Field> fields = Arrays.asList(
              new Field("a", FieldType.notNullable(new ArrowType.Int(32, true)), null),
              new Field("b", FieldType.notNullable(new ArrowType.Binary()), null)
      );

      duv.initializeChildrenFromFields(fields);

      NullableIntHolder nih = new NullableIntHolder();
      NullableVarBinaryHolder nvbh = new NullableVarBinaryHolder();

      int ac = BaseValueVector.INITIAL_VALUE_ALLOCATION + 1;
      for (int i = 0; i < ac; i++) {
        duv.setTypeId(i, (byte) 0);
        duv.setSafe(i, nih);
      }

      int bc = 1;
      for (int i = 0; i < bc; i++) {
        duv.setTypeId(i + ac, (byte) 1);
        duv.setSafe(i + ac, nvbh);
      }

      duv.setValueCount(ac + bc);

      // will not necessarily see an error unless bounds checking is on.
      assertTrue(BoundsChecking.BOUNDS_CHECKING_ENABLED);
      assertDoesNotThrow(duv::getBufferSize);
    }
  }
}

Component(s)

Java

@wotbrew wotbrew changed the title Incorrect getBufferSize for a dense union vector with (non-fixed) legs of different lengths. Incorrect getBufferSize for a dense union vector Oct 12, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 17, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 17, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 17, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 17, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 17, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 17, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 17, 2023
@wotbrew wotbrew changed the title Incorrect getBufferSize for a dense union vector [Java] Incorrect getBufferSize for a dense union vector Oct 17, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 20, 2023
wotbrew added a commit to wotbrew/arrow that referenced this issue Oct 20, 2023
@pitrou pitrou added this to the 15.0.0 milestone Oct 23, 2023
pitrou pushed a commit that referenced this issue Oct 23, 2023
…ionVector#getBufferSizeFor (#38305)

### What changes are included in this PR?

Fix incorrect implementation of `DenseUnionVector.getBufferSizeFor`.
Sum the type id counts before calling `getBufferSizeFor` on the union's child vectors.

### Are these changes tested?

Yes. A test verifies the OOB read (requires bounds check), and an example count is correct, fails as expected before fix.

**This PR contains a "Critical Fix".** 

For users of DenseUnionVector the size can calculated incorrectly, as well as cause of out-of-bounds buffer reads which may return garbage (or potentially segfaulting) if bounds checking is off.

* Closes: #38242

Authored-by: Dan Stone <wotbrew@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
…enseUnionVector#getBufferSizeFor (apache#38305)

### What changes are included in this PR?

Fix incorrect implementation of `DenseUnionVector.getBufferSizeFor`.
Sum the type id counts before calling `getBufferSizeFor` on the union's child vectors.

### Are these changes tested?

Yes. A test verifies the OOB read (requires bounds check), and an example count is correct, fails as expected before fix.

**This PR contains a "Critical Fix".** 

For users of DenseUnionVector the size can calculated incorrectly, as well as cause of out-of-bounds buffer reads which may return garbage (or potentially segfaulting) if bounds checking is off.

* Closes: apache#38242

Authored-by: Dan Stone <wotbrew@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 25, 2023
…enseUnionVector#getBufferSizeFor (apache#38305)

### What changes are included in this PR?

Fix incorrect implementation of `DenseUnionVector.getBufferSizeFor`.
Sum the type id counts before calling `getBufferSizeFor` on the union's child vectors.

### Are these changes tested?

Yes. A test verifies the OOB read (requires bounds check), and an example count is correct, fails as expected before fix.

**This PR contains a "Critical Fix".** 

For users of DenseUnionVector the size can calculated incorrectly, as well as cause of out-of-bounds buffer reads which may return garbage (or potentially segfaulting) if bounds checking is off.

* Closes: apache#38242

Authored-by: Dan Stone <wotbrew@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…enseUnionVector#getBufferSizeFor (apache#38305)

### What changes are included in this PR?

Fix incorrect implementation of `DenseUnionVector.getBufferSizeFor`.
Sum the type id counts before calling `getBufferSizeFor` on the union's child vectors.

### Are these changes tested?

Yes. A test verifies the OOB read (requires bounds check), and an example count is correct, fails as expected before fix.

**This PR contains a "Critical Fix".** 

For users of DenseUnionVector the size can calculated incorrectly, as well as cause of out-of-bounds buffer reads which may return garbage (or potentially segfaulting) if bounds checking is off.

* Closes: apache#38242

Authored-by: Dan Stone <wotbrew@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…enseUnionVector#getBufferSizeFor (apache#38305)

### What changes are included in this PR?

Fix incorrect implementation of `DenseUnionVector.getBufferSizeFor`.
Sum the type id counts before calling `getBufferSizeFor` on the union's child vectors.

### Are these changes tested?

Yes. A test verifies the OOB read (requires bounds check), and an example count is correct, fails as expected before fix.

**This PR contains a "Critical Fix".** 

For users of DenseUnionVector the size can calculated incorrectly, as well as cause of out-of-bounds buffer reads which may return garbage (or potentially segfaulting) if bounds checking is off.

* Closes: apache#38242

Authored-by: Dan Stone <wotbrew@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants