Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9680: [Java] Support non-nullable vectors #8147

Closed
wants to merge 2 commits into from

Conversation

liyafan82
Copy link
Contributor

This issue was first discussed in the ML (https://lists.apache.org/thread.html/r480387ec9ec822f3ed30e9131109e43874a1c4d18af74ede1a7e41c5%40%3Cdev.arrow.apache.org%3E), from which we have received some feedback.

We briefly resate it here as below:

  1. Non-nullable vectors are widely used in practice. For example, in a database engine, a column can be declared as not null, so it cannot contain null values.
    2.Non-nullable vectors has significant performance advantages compared with their nullable conterparts, such as:
  1. the memory space of the validity buffer can be saved.
  2. manipulation of the validity buffer can be bypassed
  3. some if-else branches can be replaced by sequential instructions (by the JIT compiler), leading to high throughput for the CPU pipeline.

We open this Jira to facilitate further discussions, and we may provide a sample PR, which will help us make a clearer decision.

@liyafan82
Copy link
Contributor Author

In the first commit of this PR, we only support IntVector.

@github-actions
Copy link

github-actions bot commented Sep 9, 2020

@@ -164,7 +165,9 @@ private void setValue(int index, int value) {
* @param value value of element
*/
public void set(int index, int value) {
BitVectorHelper.setBit(validityBuffer, index);
if (!NON_NULLABLE_VECTORS_ENABLED || nullable) {
Copy link
Member

@kiszk kiszk Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we declare NON_NULLABLE_VECTORS_ENABLED && nullable as a final instance variable? If so, we can avoid a similar conditional expression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kiszk Thanks for your feedback. I have changed the code accordingly.
It really makes the code simpler, and I've verified that there were no performance regression.

However, it could also lead to some consufing behavior. For example, a vector that is known to have no null values may have nullable field equal to true.

@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Nov 25, 2020
@emkornfield
Copy link
Contributor

I think consensus on the mailing list is that this isn't something we want to pursue?

@liyafan82
Copy link
Contributor Author

I think consensus on the mailing list is that this isn't something we want to pursue?

@emkornfield Thanks for your kind reminder. I am closing it.

@liyafan82 liyafan82 closed this Feb 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase A PR that needs to be rebased by the author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants