-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-16824: [C++] Migrate VectorKernels to use ExecSpan, split out ChunkedArray execution #13398
Conversation
} else { | ||
value.SetScalar(batch[i].scalar().get()); | ||
} | ||
RETURN_NOT_OK(encoders_[i]->Encode(value, batch.length, key_buf_ptrs.data())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is temporary because Grouper and things hashing related are going to switch soon to use ExecSpan, so this will not be necessary once that's done
CI is passing — this Windows thread pool failure is weird though (cc @westonpace). I’m going to start working on the next stage of work but I’ll just stack the work on this PR in a branch while this PR is in review |
if (data.buffers[0] == nullptr && type_id != Type::NA && | ||
type_id != Type::SPARSE_UNION && type_id != Type::DENSE_UNION) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there's arrow::internal::HasValidityBitmap
in arrow/type.h
@@ -208,7 +215,6 @@ int64_t ArraySpan::GetNullCount() const { | |||
int GetNumBuffers(const DataType& type) { | |||
switch (type.id()) { | |||
case Type::NA: | |||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes — if you look at NullType::layout()
there's a single buffer that's always null, but this seems to conflict with the columnar format documentation. @pitrou do you know more about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I suppose it's like how here we have unions listed with three buffers even though the spec lists only one/two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that was to avoid breaking compatibility. We noticed too late that this was inconsistent with the format spec.
We bridge for the inconsistency in the IPC and C data layers.
// TODO(wesm): these assertions that the arguments cannot be ChunkedArray | ||
// should happen someplace more generic, not here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for this function we eventually do want to support chunked arrays in all positions, it just wasn't implemented
…hunkedArray execution (apache#13398) This is mostly mechanical refactoring. Since many VectorKernels support being passed in a ChunkedArray, I separated the `ExecSpan` code path (which does not support chunked arrays) from a separate `VectorKernel::exec_chunked` function which continues to use the `const ExecBatch&` input format. Aggregate (scalar and hash) kernels will have to get refactored in a follow up PR. I was going to try to pack that into this branch but it got to be a little bit too much. I would like to work on ARROW-16757 next anyway which will help a lot with cleaning up a lot of accumulated cruft I also removed the "scalar" input mode of "cumulative_sum" because the result was ill-defined per a separate discussion. Authored-by: Wes McKinney <wesm@apache.org> Signed-off-by: Wes McKinney <wesm@apache.org>
This is mostly mechanical refactoring. Since many VectorKernels support being passed in a ChunkedArray, I separated the
ExecSpan
code path (which does not support chunked arrays) from a separateVectorKernel::exec_chunked
function which continues to use theconst ExecBatch&
input format.Aggregate (scalar and hash) kernels will have to get refactored in a follow up PR. I was going to try to pack that into this branch but it got to be a little bit too much. I would like to work on ARROW-16757 next anyway which will help a lot with cleaning up a lot of accumulated cruft
I also removed the "scalar" input mode of "cumulative_sum" because the result was ill-defined per a separate discussion.