Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-7616: [Java] Support comparing value ranges for dense union vector #6355

Closed
wants to merge 1 commit into from

Conversation

liyafan82
Copy link
Contributor

After we support dense union vectors, we should support range value comparisons for them.

@github-actions
Copy link

github-actions bot commented Feb 5, 2020

@emkornfield
Copy link
Contributor

@TheNeuralBit do you have time to review?

@TheNeuralBit TheNeuralBit self-requested a review February 22, 2020 17:53
int curIdx = childFields.size();
typeIds[curIdx] = i;
childFields.add(typeFields[i]);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the motivation for this change? I'm not really following why it's necessary to add support for comparing value ranges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheNeuralBit Thanks a lot for your feedback.

Motivation for comparing value ranges:
For each type of vector, we need a way to compare individual elements/contiguous range of elements. This is implemented in class RangeEqualsVisitor, with each overload method processing one type of vectors.

Recently, we have introduced a new vector type: DenseUnionVector, so we need to add a new overload to support it.

Motivation for this change:

The old implementation of this method mimics that of class UnionVector, as they are both union vectors, and share many similarities. When implementating this PR, we found this implementation caused some problems. So we fix it now.

The fundamental reason is that the type ids are defined differently for union and dense union vectors. For union vectors, the type id is the ordinal of the minor type of the child vector; whereas for dense union vectors, the type id is an incremental tiny int allocated by the vector when registering a child vector.

When comparing vector elements, we need to compare their fields first, and the fields include child vector ids (type ids), so we need to change this method.

If there are any other problems, please feel free to contact us. Thanks a lot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fundamental reason is that the type ids are defined differently for union and dense union vectors.

But why is there this difference? I don't actually see anything in the specification about a difference between type ids for sparse vs. dense unions. To me it looks like this PR is creating that difference.

It just seems like an unrelated change that's been pulled into this PR. An answer to this question would help me understand: Could you add support for comparing value ranges without this change to type ids? If not, why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheNeuralBit Good question.

You are right, as the difference is not mentioned in the specification.

For Java, the dense union is aligning with the specification, while the sparse union is not. The dense union was implemented only recently, while the sparse union was implemented long ago. So it is not easy to fix the layout of sparse union, for the sake of backward compatibility.

So it is the implementation of dense union that started introducing the difference, and this change makes the difference complete. I am afraid it is difficult to remove the difference, at least in the near future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks for the explanation

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution and for your patience :)

@liyafan82
Copy link
Contributor Author

LGTM. Thanks for the contribution and for your patience :)

Thanks a lot for your effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants