-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-7616: [Java] Support comparing value ranges for dense union vector #6355
Conversation
@TheNeuralBit do you have time to review? |
int curIdx = childFields.size(); | ||
typeIds[curIdx] = i; | ||
childFields.add(typeFields[i]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain the motivation for this change? I'm not really following why it's necessary to add support for comparing value ranges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit Thanks a lot for your feedback.
Motivation for comparing value ranges:
For each type of vector, we need a way to compare individual elements/contiguous range of elements. This is implemented in class RangeEqualsVisitor, with each overload method processing one type of vectors.
Recently, we have introduced a new vector type: DenseUnionVector, so we need to add a new overload to support it.
Motivation for this change:
The old implementation of this method mimics that of class UnionVector, as they are both union vectors, and share many similarities. When implementating this PR, we found this implementation caused some problems. So we fix it now.
The fundamental reason is that the type ids are defined differently for union and dense union vectors. For union vectors, the type id is the ordinal of the minor type of the child vector; whereas for dense union vectors, the type id is an incremental tiny int allocated by the vector when registering a child vector.
When comparing vector elements, we need to compare their fields first, and the fields include child vector ids (type ids), so we need to change this method.
If there are any other problems, please feel free to contact us. Thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fundamental reason is that the type ids are defined differently for union and dense union vectors.
But why is there this difference? I don't actually see anything in the specification about a difference between type ids for sparse vs. dense unions. To me it looks like this PR is creating that difference.
It just seems like an unrelated change that's been pulled into this PR. An answer to this question would help me understand: Could you add support for comparing value ranges without this change to type ids? If not, why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit Good question.
You are right, as the difference is not mentioned in the specification.
For Java, the dense union is aligning with the specification, while the sparse union is not. The dense union was implemented only recently, while the sparse union was implemented long ago. So it is not easy to fix the layout of sparse union, for the sake of backward compatibility.
So it is the implementation of dense union that started introducing the difference, and this change makes the difference complete. I am afraid it is difficult to remove the difference, at least in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thanks for the explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the contribution and for your patience :)
Thanks a lot for your effort. |
After we support dense union vectors, we should support range value comparisons for them.