Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-16824: [C++] Migrate VectorKernels to use ExecSpan, split out ChunkedArray execution #13398

Merged
merged 15 commits into from
Jun 21, 2022

Conversation

wesm
Copy link
Member

@wesm wesm commented Jun 18, 2022

This is mostly mechanical refactoring. Since many VectorKernels support being passed in a ChunkedArray, I separated the ExecSpan code path (which does not support chunked arrays) from a separate VectorKernel::exec_chunked function which continues to use the const ExecBatch& input format.

Aggregate (scalar and hash) kernels will have to get refactored in a follow up PR. I was going to try to pack that into this branch but it got to be a little bit too much. I would like to work on ARROW-16757 next anyway which will help a lot with cleaning up a lot of accumulated cruft

I also removed the "scalar" input mode of "cumulative_sum" because the result was ill-defined per a separate discussion.

@github-actions
Copy link

} else {
value.SetScalar(batch[i].scalar().get());
}
RETURN_NOT_OK(encoders_[i]->Encode(value, batch.length, key_buf_ptrs.data()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary because Grouper and things hashing related are going to switch soon to use ExecSpan, so this will not be necessary once that's done

@wesm
Copy link
Member Author

wesm commented Jun 18, 2022

CI is passing — this Windows thread pool failure is weird though (cc @westonpace). I’m going to start working on the next stage of work but I’ll just stack the work on this PR in a branch while this PR is in review

Comment on lines +155 to +156
if (data.buffers[0] == nullptr && type_id != Type::NA &&
type_id != Type::SPARSE_UNION && type_id != Type::DENSE_UNION) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -208,7 +215,6 @@ int64_t ArraySpan::GetNullCount() const {
int GetNumBuffers(const DataType& type) {
switch (type.id()) {
case Type::NA:
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — if you look at NullType::layout() there's a single buffer that's always null, but this seems to conflict with the columnar format documentation. @pitrou do you know more about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I suppose it's like how here we have unions listed with three buffers even though the spec lists only one/two.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was to avoid breaking compatibility. We noticed too late that this was inconsistent with the format spec.
We bridge for the inconsistency in the IPC and C data layers.

Comment on lines +391 to +392
// TODO(wesm): these assertions that the arguments cannot be ChunkedArray
// should happen someplace more generic, not here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this function we eventually do want to support chunked arrays in all positions, it just wasn't implemented

@wesm wesm merged commit 982ea6c into apache:master Jun 21, 2022
@wesm wesm deleted the ARROW-16824-migrate-other-kernels branch June 21, 2022 22:58
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Jul 5, 2022
…hunkedArray execution (apache#13398)

This is mostly mechanical refactoring. Since many VectorKernels support being passed in a ChunkedArray, I separated the `ExecSpan` code path (which does not support chunked arrays) from a separate `VectorKernel::exec_chunked` function which continues to use the `const ExecBatch&` input format. 

Aggregate (scalar and hash) kernels will have to get refactored in a follow up PR. I was going to try to pack that into this branch but it got to be a little bit too much. I would like to work on ARROW-16757 next anyway which will help a lot with cleaning up a lot of accumulated cruft

I also removed the "scalar" input mode of "cumulative_sum" because the result was ill-defined per a separate discussion.

Authored-by: Wes McKinney <wesm@apache.org>
Signed-off-by: Wes McKinney <wesm@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants