ARROW-17520: [C++] Implement SubStrait SetRel (UnionAll) #14186

Merged
merged 9 commits into apache:master on Dec 14, 2022

Conversation

vibhatha
Collaborator

This PR includes the initial version of union operator support for Substrait->Acero.
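
For readers unfamiliar with the operator: UNION ALL concatenates the rows of its inputs without removing duplicates. The sketch below illustrates only those semantics on plain vectors. `UnionAll` is a hypothetical helper, not Acero's union node or any Arrow API, and it assumes all inputs share a single row type.

```cpp
#include <vector>

// Hypothetical helper illustrating UNION ALL semantics: every row of every
// input appears in the output, duplicates included.
template <typename Row>
std::vector<Row> UnionAll(const std::vector<std::vector<Row>>& inputs) {
  std::vector<Row> out;
  for (const auto& input : inputs) {
    // Append the whole input; no deduplication is performed.
    out.insert(out.end(), input.begin(), input.end());
  }
  return out;
}
```

A real engine may emit rows in a different order than this simple concatenation, which is why tests for such an operator often compare results after sorting.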

@vibhatha
Collaborator Author

cc @westonpace @jvanstraten

@github-actions

⚠️ Ticket has no components in JIRA, make sure you assign one.

Member

@westonpace westonpace left a comment

A few minor suggestions, but this seems pretty solid to me.

// Note: at the moment Acero only supports UNION_ALL operation
switch (op) {
  case substrait::SetRel::SET_OP_UNSPECIFIED:
    return Status::NotImplemented("NotImplemented union type");
Member

Maybe you can use EnumToString to have a better error message.
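
A minimal sketch of what this suggestion could look like, written without protobuf so that it compiles on its own. The `EnumNames` map is a stand-in for a protobuf `EnumDescriptor`, `CheckSetOp` is a hypothetical helper that is not part of the PR, and the enum names are assumed to mirror the values of Substrait's `SetRel.SetOp`.

```cpp
#include <map>
#include <string>

// Stand-in for a protobuf enum descriptor: maps enum numbers to names.
// The names below are assumed to mirror Substrait's SetRel.SetOp values.
using EnumNames = std::map<int, std::string>;

const EnumNames& SetOpNames() {
  static const EnumNames names = {
      {0, "SET_OP_UNSPECIFIED"},
      {1, "SET_OP_MINUS_PRIMARY"},
      {2, "SET_OP_MINUS_MULTISET"},
      {3, "SET_OP_INTERSECTION_PRIMARY"},
      {4, "SET_OP_INTERSECTION_MULTISET"},
      {5, "SET_OP_UNION_DISTINCT"},
      {6, "SET_OP_UNION_ALL"},
  };
  return names;
}

// Same shape as the EnumToString under review, but backed by the map above.
std::string EnumToString(int value, const EnumNames& names) {
  auto it = names.find(value);
  if (it == names.end()) {
    return "unknown";
  }
  return it->second;
}

// Hypothetical check: returns an empty string if the operation is supported,
// otherwise an error message that names the rejected operation.
std::string CheckSetOp(int op) {
  constexpr int kUnionAll = 6;
  if (op == kUnionAll) {
    return "";
  }
  return "Unsupported set operation: " + EnumToString(op, SetOpNames());
}
```

Naming the rejected value in the message tells the plan author which operation was rejected, rather than only that some union type is unsupported.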

Collaborator Author

To use this, I did a minor refactor. I will point it out below.

cpp/src/arrow/engine/substrait/serde_test.cc (resolved)
Comment on lines -85 to -138
std::string EnumToString(int value, const google::protobuf::EnumDescriptor* descriptor) {
  const google::protobuf::EnumValueDescriptor* value_desc =
      descriptor->FindValueByNumber(value);
  if (value_desc == nullptr) {
    return "unknown";
  }
  return value_desc->name();
}

Collaborator Author

I removed this function from expression_internal.cc, since it can be a generic utility, and moved it to util.cc.

@@ -179,7 +179,8 @@ void CheckRoundTripResult(const std::shared_ptr<Schema> output_schema,
                           compute::ExecContext& exec_context,
                           std::shared_ptr<Buffer>& buf,
                           const std::vector<int>& include_columns = {},
-                          const ConversionOptions& conversion_options = {}) {
+                          const ConversionOptions& conversion_options = {},
+                          const compute::SortOptions* sort_options = NULLPTR) {
Collaborator Author

The logic here is the same as in the glob PR.

Comment on lines 150 to 158
std::string EnumToString(int value, const google::protobuf::EnumDescriptor* descriptor) {
  const google::protobuf::EnumValueDescriptor* value_desc =
      descriptor->FindValueByNumber(value);
  if (value_desc == nullptr) {
    return "unknown";
  }
  return value_desc->name();
}

Collaborator Author

I moved the EnumToString function to util.h.

@@ -25,6 +25,8 @@
 #include "arrow/engine/substrait/options.h"
 #include "arrow/util/iterator.h"

+#include "substrait/algebra.pb.h"  // IWYU pragma: export
Collaborator Author

I am not sure about this part. We could include the proto headers, which are required for this, but what is the best practice here? There will likely be more utilities like this that reference the interfaces in algebra.pb.h, so including it would be convenient. What is the best approach for this PR?

Member

Unfortunately, we can't do this; I made the same mistake when playing around with extension rels. This header is part of the public API (these methods are used by Python), so it should not pull in the generated protobuf header.

I agree it would be useful to have this method in a common spot. Can you make a util_internal.h for this?
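
The split being proposed can be sketched in a single file as follows. This is an illustration under stated assumptions, not the PR's code: the `sketch` namespace, `DescribeSetOp`, and the `EnumNames` map (standing in for a protobuf descriptor) are all hypothetical, and the comments mark which file each part would live in.

```cpp
#include <map>
#include <string>

// ---- util.h (public header, hypothetical contents) ----
// Uses only standard types, so bindings can include it without needing the
// generated protobuf headers.
namespace sketch {
std::string DescribeSetOp(int set_op_number);
}  // namespace sketch

// ---- util_internal.h (internal header, hypothetical contents) ----
// Implementation-only types live here. In the PR this role would be played by
// the protobuf include; a plain map stands in for the enum descriptor.
namespace sketch {
namespace internal {
using EnumNames = std::map<int, std::string>;
std::string EnumToString(int value, const EnumNames& names);
}  // namespace internal
}  // namespace sketch

// ---- util.cc (implementation, sees both headers) ----
namespace sketch {
namespace internal {
std::string EnumToString(int value, const EnumNames& names) {
  auto it = names.find(value);
  if (it == names.end()) {
    return "unknown";
  }
  return it->second;
}
}  // namespace internal

std::string DescribeSetOp(int set_op_number) {
  static const internal::EnumNames kNames = {{0, "SET_OP_UNSPECIFIED"},
                                             {6, "SET_OP_UNION_ALL"}};
  return internal::EnumToString(set_op_number, kNames);
}
}  // namespace sketch
```

With this layout, consumers of the public header never see the implementation-only dependency, while every source file inside the library can still share the helper.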

Collaborator Author

@westonpace I added an internal file; could you please check?

@vibhatha
Collaborator Author

@westonpace Thanks for the review. I updated the PR, including a few out-of-scope changes. Please take a look.

Member

@westonpace westonpace left a comment

The include in util.h will have to change. But I think we can solve it with util_internal.h

cpp/src/arrow/engine/substrait/relation_internal.cc (outdated, resolved)

Member

@westonpace westonpace left a comment

Looks good. A few nits and one slight change needed to how we handle the new internal header.

cpp/src/arrow/engine/substrait/util_internal.h (outdated, resolved)
cpp/src/arrow/engine/substrait/util_internal.h (outdated, resolved)
cpp/src/arrow/engine/substrait/util.h (outdated, resolved)
@vibhatha
Collaborator Author

@westonpace I updated the PR.

Member

@westonpace westonpace left a comment

Looks good to me now. Thanks for your persistence.

@westonpace westonpace merged commit 4647739 into apache:master Dec 14, 2022
@vibhatha
Collaborator Author

> Looks good to me now. Thanks for your persistence.

Thank you for keeping up with the modifications. 🙂

@ursabot

ursabot commented Dec 15, 2022

Benchmark runs are scheduled for baseline = f668537 and contender = 4647739. 4647739 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.66% ⬆️0.07%] test-mac-arm
[Finished ⬇️0.27% ⬆️0.27%] ursa-i9-9960x
[Finished ⬇️0.93% ⬆️0.07%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 4647739e ec2-t3-xlarge-us-east-2
[Finished] 4647739e test-mac-arm
[Finished] 4647739e ursa-i9-9960x
[Finished] 4647739e ursa-thinkcentre-m75q
[Finished] f6685371 ec2-t3-xlarge-us-east-2
[Finished] f6685371 test-mac-arm
[Finished] f6685371 ursa-i9-9960x
[Finished] f6685371 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
