Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Why does VectorSchemaRoot::slice return reference to same root when slice index is [0, rowCount] ? #35275

Closed
ParthChonkar opened this issue Apr 21, 2023 · 1 comment · Fixed by #35476
Assignees
Labels
Component: Java Type: usage Issue is a user question
Milestone

Comments

@ParthChonkar
Copy link
Contributor

ParthChonkar commented Apr 21, 2023

Describe the usage question you have. Please include as many useful details as possible.

Hi Arrow community,

In the VectorSchemaRoot::slice implementation it checks if the slice covers the entirety of the root and returns the reference to the same root if so.

    if (index == 0 && length == rowCount) {
      return this;
    }

Looks like this is done as an optimization to prevent creating another root + transferring to it.

Doesn't this optimization make slice inconsistent? Sometimes you get back a new autocloseable resource that has to be separately closed and sometimes you get back the same root.

So if you want to calculate something based on a set of different slices of your root then you have to manually check whether the current slice you are requesting isn't the whole root to avoid closing the original root.

Since splitAndTransfer do no copying of the actual data, is this optimization worth this tradeoff?

Component(s)

Java

@ParthChonkar ParthChonkar added the Type: usage Issue is a user question label Apr 21, 2023
ParthChonkar added a commit to ParthChonkar/arrow that referenced this issue May 8, 2023
@ParthChonkar
Copy link
Contributor Author

Also in this documentation https://arrow.apache.org/docs/java/vector_schema_root.html#vectorschemaroot
it says that A **new** VectorSchemaRoot can be sliced from an existing root without copying data:

lidavidm pushed a commit that referenced this issue May 8, 2023
…5476)

### Rationale for this change
The optimization in the `slice` function for VectorSchemaRoot that returns the reference to the original root when the slice covers the entire root should be removed because this is an inconsistent interface (sometimes you get a totally new autocloseable and sometimes you get the original one). 

Also - in the [this documentation page](https://arrow.apache.org/docs/java/vector_schema_root.html#vectorschemaroot) it explicitly says that "A **NEW** VectorSchemaRoot can be sliced from an existing root without copying data:" so any time you call `slice`, the slice root should be a new one (even if it's the entire root). 

The test for slice has also been changed to have more coverage of the slice function (not just slicing from the start index).  

### What changes are included in this PR?
Remove self reference return in `VectorSchemaRoot#split` and expand test coverage for the function.

### Are these changes tested?
Yes

### Are there any user-facing changes?
Yes 

**This PR includes breaking changes to public APIs.** 

* Closes: #35275

Authored-by: Parth <parthchonkar@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@lidavidm lidavidm added this to the 13.0.0 milestone May 8, 2023
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
…ot (apache#35476)

### Rationale for this change
The optimization in the `slice` function for VectorSchemaRoot that returns the reference to the original root when the slice covers the entire root should be removed because this is an inconsistent interface (sometimes you get a totally new autocloseable and sometimes you get the original one). 

Also - in the [this documentation page](https://arrow.apache.org/docs/java/vector_schema_root.html#vectorschemaroot) it explicitly says that "A **NEW** VectorSchemaRoot can be sliced from an existing root without copying data:" so any time you call `slice`, the slice root should be a new one (even if it's the entire root). 

The test for slice has also been changed to have more coverage of the slice function (not just slicing from the start index).  

### What changes are included in this PR?
Remove self reference return in `VectorSchemaRoot#split` and expand test coverage for the function.

### Are these changes tested?
Yes

### Are there any user-facing changes?
Yes 

**This PR includes breaking changes to public APIs.** 

* Closes: apache#35275

Authored-by: Parth <parthchonkar@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…ot (apache#35476)

### Rationale for this change
The optimization in the `slice` function for VectorSchemaRoot that returns the reference to the original root when the slice covers the entire root should be removed because this is an inconsistent interface (sometimes you get a totally new autocloseable and sometimes you get the original one). 

Also - in the [this documentation page](https://arrow.apache.org/docs/java/vector_schema_root.html#vectorschemaroot) it explicitly says that "A **NEW** VectorSchemaRoot can be sliced from an existing root without copying data:" so any time you call `slice`, the slice root should be a new one (even if it's the entire root). 

The test for slice has also been changed to have more coverage of the slice function (not just slicing from the start index).  

### What changes are included in this PR?
Remove self reference return in `VectorSchemaRoot#split` and expand test coverage for the function.

### Are these changes tested?
Yes

### Are there any user-facing changes?
Yes 

**This PR includes breaking changes to public APIs.** 

* Closes: apache#35275

Authored-by: Parth <parthchonkar@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…ot (apache#35476)

### Rationale for this change
The optimization in the `slice` function for VectorSchemaRoot that returns the reference to the original root when the slice covers the entire root should be removed because this is an inconsistent interface (sometimes you get a totally new autocloseable and sometimes you get the original one). 

Also - in the [this documentation page](https://arrow.apache.org/docs/java/vector_schema_root.html#vectorschemaroot) it explicitly says that "A **NEW** VectorSchemaRoot can be sliced from an existing root without copying data:" so any time you call `slice`, the slice root should be a new one (even if it's the entire root). 

The test for slice has also been changed to have more coverage of the slice function (not just slicing from the start index).  

### What changes are included in this PR?
Remove self reference return in `VectorSchemaRoot#split` and expand test coverage for the function.

### Are these changes tested?
Yes

### Are there any user-facing changes?
Yes 

**This PR includes breaking changes to public APIs.** 

* Closes: apache#35275

Authored-by: Parth <parthchonkar@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Java Type: usage Issue is a user question
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants