-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-14615: [C++] Refactor nested field refs and add union support #11641
Conversation
|
cpp/src/arrow/array/array_nested.cc
Outdated
flattened_null_bitmap->mutable_data()); | ||
} | ||
|
||
auto flattened_data = child_data->Copy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
child_data
was already a copy, so is this necessary?
ASSERT_OK_AND_ASSIGN(flattened, sliced->GetFlattenedField(1)); | ||
AssertArraysEqual(*ArrayFromJSON(utf8(), R"(["c"])"), *flattened, /*verbose=*/true); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test with an empty union array?
/// Options for struct_field function | ||
class ARROW_EXPORT StructFieldOptions : public FunctionOptions { | ||
public: | ||
explicit StructFieldOptions(std::vector<int> indices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this should also accept a FieldRef
or field resolution should be left to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FieldRef is relative to a schema so we'd want/need a variant of this function that operates on a RecordBatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But a plain string name could be useful for a StructArray?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd still need a type to resolve it to an index, right? Unless you mean storing std::vector<std::string>
or std::string
directly? (Which might be reasonable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, storing it as strings on the options, I meant. Because when actually executing the kernel, the struct array itself can perfectly resolve the name I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. Want to file a followup? I think we can support having a FieldRef internally, basically. (Though the interpretation will be a little different - it'll be relative to an array, not a schema.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think of it, it's probably better to resolve the field up front (using the schema) than pay the cost for every kernel invocation with the same schema.
EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, | ||
::testing::HasSubstr("out-of-bounds field reference"), | ||
CallFunction("struct_field", {arr}, &invalid2)); | ||
EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, ::testing::HasSubstr("cannot subscript"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be TypeError
in this case?
CallFunction("struct_field", {arr}, &invalid2)); | ||
EXPECT_RAISES_WITH_MESSAGE_THAT(Invalid, ::testing::HasSubstr("cannot subscript"), | ||
CallFunction("struct_field", {arr}, &invalid3)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with non-nested arr
and trivial
options? Should it be tested here?
("Given a series of indices, extract the child array or scalar referenced " | ||
"by the index. For union values, mask the child based on the type codes " | ||
"of the union array. The indices are always the child index and not the " | ||
"type code (for unions) - so the first child is always index 0."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention that the indices are given in StructFieldOptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thank you @lidavidm
Benchmark runs are scheduled for baseline = a9f2091 and contender = 140b0b2. 140b0b2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
No description provided.