-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-17520: [C++] Implement SubStrait SetRel (UnionAll) #14186
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor suggestions but this seems pretty solid to me
// Note: at the moment Acero only supports UNION_ALL operation | ||
switch (op) { | ||
case substrait::SetRel::SET_OP_UNSPECIFIED: | ||
return Status::NotImplemented("NotImplemented union type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can use EnumToString to have a better error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use this, I did a minor refactor. I will point it out below.
std::string EnumToString(int value, const google::protobuf::EnumDescriptor* descriptor) { | ||
const google::protobuf::EnumValueDescriptor* value_desc = | ||
descriptor->FindValueByNumber(value); | ||
if (value_desc == nullptr) { | ||
return "unknown"; | ||
} | ||
return value_desc->name(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this method from expression_internal.cc
since this can be a generic util. And put it to the util.cc
@@ -179,7 +179,8 @@ void CheckRoundTripResult(const std::shared_ptr<Schema> output_schema, | |||
compute::ExecContext& exec_context, | |||
std::shared_ptr<Buffer>& buf, | |||
const std::vector<int>& include_columns = {}, | |||
const ConversionOptions& conversion_options = {}) { | |||
const ConversionOptions& conversion_options = {}, | |||
const compute::SortOptions* sort_options = NULLPTR) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic here is the same as the glob PR.
std::string EnumToString(int value, const google::protobuf::EnumDescriptor* descriptor) { | ||
const google::protobuf::EnumValueDescriptor* value_desc = | ||
descriptor->FindValueByNumber(value); | ||
if (value_desc == nullptr) { | ||
return "unknown"; | ||
} | ||
return value_desc->name(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the EnumToString
function to util.h
@@ -25,6 +25,8 @@ | |||
#include "arrow/engine/substrait/options.h" | |||
#include "arrow/util/iterator.h" | |||
|
|||
#include "substrait/algebra.pb.h" // IWYU pragma: export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is not very sure. We could include proto headers which is required for this, but what is the best practice here? Since there would be more utils like this which could reference the interfaces in the algebra.pb.h
, we could use it. But as per this PR what is the best?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we can't do this. I made the same mistake playing around with extension rels. However, this is part of the public API (these methods are used by python).
I agree it would be useful to have this method in a common spot. Can you make a util_internal.h
for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@westonpace I added and internal file, could you please check?
@westonpace thanks for the review, and I updated it with a few changes out of scope. Please check it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The include in util.h
will have to change. But I think we can solve it with util_internal.h
@@ -25,6 +25,8 @@ | |||
#include "arrow/engine/substrait/options.h" | |||
#include "arrow/util/iterator.h" | |||
|
|||
#include "substrait/algebra.pb.h" // IWYU pragma: export |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we can't do this. I made the same mistake playing around with extension rels. However, this is part of the public API (these methods are used by python).
I agree it would be useful to have this method in a common spot. Can you make a util_internal.h
for this?
f055b7b
to
d85b388
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A few nits and one slight change needed to how we handle the new internal header.
d85b388
to
2541256
Compare
@westonpace I updated the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now. Thanks for your persistence.
Thank you for keeping up with the modifications. 🙂 |
Benchmark runs are scheduled for baseline = f668537 and contender = 4647739. 4647739 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
This PR includes the initial version of union operator support for Substrait->Acero.