ARROW-10378: [Rust] Update take() kernel with support for LargeList. #8556
Hi @drusso, thanks a lot. I haven't looked into the changes in detail, but I have always understood offsets and indices as different concepts. As far as I understand, according to the spec,
So, if anything, what we could change is to make
I'll add some background context on the implementation of … Let's start with a list array:
For this example, let's take indices
Let's look specifically at the input values array and the output values array:
We can think of both values arrays as sequences of indices into the input values array:
Constructing that output values array can be accomplished by:
This is what the implementation of … The … I'll also note that the generic implementation is in …
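The walkthrough above can be sketched in plain Rust. This is a simplified stand-in, not the arrow crate's code: the `ListArray` struct and `take_list` function here are hypothetical, nulls are ignored, offsets are `i32`, and child values are `u32`. For each requested row, the sketch collects the positions of that row's child values, derives the new offsets from the row lengths, and then gathers the child values by those positions, which plays the role of the recursive `take()` on the values array.

```rust
/// Hypothetical, simplified list array: no null bitmap, i32 offsets.
/// Row `i` spans `values[offsets[i]..offsets[i + 1]]`.
struct ListArray {
    offsets: Vec<i32>,
    values: Vec<u32>,
}

/// Sketch of take() on a list array:
/// 1. collect the positions of each selected row's child values,
/// 2. build the new offsets from the lengths of the selected rows,
/// 3. gather the child values by those positions (a take() on the values).
fn take_list(list: &ListArray, indices: &[usize]) -> ListArray {
    let mut value_indices: Vec<i32> = Vec::new();
    let mut new_offsets: Vec<i32> = Vec::with_capacity(indices.len() + 1);
    let mut current_offset: i32 = 0;
    new_offsets.push(current_offset);

    for &ix in indices {
        let start = list.offsets[ix];
        let end = list.offsets[ix + 1];
        // Every child value of row `ix` lands in the output, in order.
        value_indices.extend(start..end);
        current_offset += end - start;
        new_offsets.push(current_offset);
    }

    // The "recursive" step: take the child values by position.
    let values = value_indices
        .iter()
        .map(|&i| list.values[i as usize])
        .collect();

    ListArray { offsets: new_offsets, values }
}

fn main() {
    // Three rows: [10, 11], [12], [13, 14, 15]
    let list = ListArray {
        offsets: vec![0, 2, 3, 6],
        values: vec![10, 11, 12, 13, 14, 15],
    };
    let taken = take_list(&list, &[2, 0]);
    assert_eq!(taken.offsets, vec![0, 3, 5]);
    assert_eq!(taken.values, vec![13, 14, 15, 10, 11]);
    println!("{:?} {:?}", taken.offsets, taken.values);
}
```

Taking rows `[2, 0]` yields child positions `[3, 4, 5, 0, 1]`, so the output rows are `[13, 14, 15]` and `[10, 11]` with offsets `[0, 3, 5]`.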
You are obviously right, @drusso. I am sorry for the confusion.
I've been away for a few days; I'll review this in the evening (GMT+2), as I worked on the initial
The changes look good to me. I had some code style questions, but all in all this looks like a nice generalization.
LGTM
```diff
 let start = offsets[ix];
 let end = offsets[ix + 1];
-current_offset += (end - start) as i32;
+current_offset = current_offset + (end - start);
```
I probably did this in response to a Clippy lint, but it's fine, as we have more lints to fix at some point.
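The diff drops the `as i32` cast and spells the accumulation as `current_offset = current_offset + (end - start)`. A minimal sketch of why a generic version might be written that way, assuming the offset type is bounded only by `Add` and `Sub` (the hypothetical `taken_offsets` helper below is not the arrow crate's API): without an `AddAssign` bound, `+=` does not compile, while `a = a + b` does, and the same code then serves both `i32` (List) and `i64` (LargeList) offsets.

```rust
use std::ops::{Add, Sub};

/// Hypothetical helper: offsets of the taken rows, for any offset type.
/// The bound is `Add + Sub` only (no `AddAssign`), so the running total is
/// updated with `current = current + len` instead of `+=`.
fn taken_offsets<O>(offsets: &[O], indices: &[usize]) -> Vec<O>
where
    O: Copy + Default + Add<Output = O> + Sub<Output = O>,
{
    let mut current_offset = O::default(); // 0 for i32 and i64
    let mut out = Vec::with_capacity(indices.len() + 1);
    out.push(current_offset);
    for &ix in indices {
        let start = offsets[ix];
        let end = offsets[ix + 1];
        // No `as i32` cast: the row length keeps the offset type.
        current_offset = current_offset + (end - start);
        out.push(current_offset);
    }
    out
}

fn main() {
    // The same rows described with List-style (i32) and LargeList-style (i64) offsets.
    let small: Vec<i32> = vec![0, 2, 3, 6];
    let large: Vec<i64> = vec![0, 2, 3, 6];
    assert_eq!(taken_offsets(&small, &[2, 0]), vec![0, 3, 5]);
    assert_eq!(taken_offsets(&large, &[2, 0]), vec![0i64, 3, 5]);
}
```

Note that Clippy's `assign_op_pattern` lint flags the `a = a + b` form when `+=` would work, which is consistent with the reviewer's remark about lints.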
This change adds support for `LargeList` in `take()`. There is an additional update to the underlying implementation of `take()` such that the indices may be any `PrimitiveArray` of `ArrowNumericType`, rather than only `UInt32Array`. This change is motivated by the recursive call to `take()` in `take_list()` ([here](https://github.com/apache/arrow/blob/b109195b77d85e513aab80650bd4b193e26a5471/rust/arrow/src/compute/kernels/take.rs#L324)): to support `LargeListArray`, which uses `i64` offsets, the recursive call must accept indices arrays that are `Int64Array`.

Closes apache#8556 from drusso/ARROW-10378

Authored-by: Daniel Russo <danrusso@gmail.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
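A minimal sketch of the generalized indices in plain Rust, under stated assumptions: the `take` function below is hypothetical, works on slices rather than Arrow arrays, ignores nulls, and uses the standard `TryInto<usize>` conversion as a stand-in for however the arrow kernel turns an `ArrowNumericType` index into a position. It illustrates how one kernel can accept `u32` indices from a caller as well as `i64` positions such as those produced when recursing from a `LargeListArray`.

```rust
use std::convert::TryInto;
use std::fmt::Debug;

/// Hypothetical, simplified take(): gathers `values[i]` for each index `i`.
/// The index type is generic, so u32, i32, i64, ... indices all work.
fn take<T, I>(values: &[T], indices: &[I]) -> Vec<T>
where
    T: Clone,
    I: Copy + TryInto<usize>,
    <I as TryInto<usize>>::Error: Debug,
{
    indices
        .iter()
        .map(|&i| {
            // A negative or oversized index fails the conversion and panics here.
            let ix: usize = i.try_into().expect("index must fit in usize");
            values[ix].clone()
        })
        .collect()
}

fn main() {
    let values = vec!["a", "b", "c", "d"];
    // UInt32Array-style indices.
    assert_eq!(take(&values, &[3u32, 0]), vec!["d", "a"]);
    // Int64Array-style indices, as a LargeList recursion would produce.
    assert_eq!(take(&values, &[1i64, 2, 2]), vec!["b", "c", "c"]);
}
```

Converting with `try_into()` rejects a negative index explicitly instead of reinterpreting it through an `as usize` cast.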