New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10959: [C++] Add scalar string join kernel #8990
Conversation
8f6e26e
to
609a388
Compare
d4608a9
to
356c300
Compare
609a388
to
d42058a
Compare
d42058a
to
9c0836d
Compare
Failures seem unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @maartenbreddels . Here are some comments.
Regarding "join" in the name: Regarding "binary" in the name: It would also be good for the name to reflect that this operates on a list array. How about renaming it |
I'll just point out that "join" is not a python-ism. There is a string join in Java, Rust, C#, JavaScript, etc. and it is consistently called join. I think R is the only language I can find that doesn't call it "join". |
SQL—which probably has more users than all those languages combined 😁 |
Snark aside, that's a good point, and since users who are thinking in the SQL way will always be at least one abstraction layer removed from the C++ library, that makes me think "join" is fine 👍 |
In SQL it's called |
In SQL, |
@maartenbreddels Do you want to update this PR? |
I'm going to take this up. |
f06549e
to
9353e4e
Compare
I adressed review comments, added a benchmark, and made |
Still a TODO to address (also support large_list inputs). Edit: done |
@@ -113,6 +113,11 @@ void CheckScalarBinary(std::string func_name, std::shared_ptr<Array> left_input, | |||
std::shared_ptr<Array> expected, | |||
const FunctionOptions* options = nullptr); | |||
|
|||
void CheckScalarBinary(std::string func_name, std::shared_ptr<Array> left_input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep the public overloads to a minimum, these should probably just take Datum
s. Added https://issues.apache.org/jira/browse/ARROW-12953
std::shared_ptr<Array> string_array; | ||
RETURN_NOT_OK(builder->Finish(&string_array)); | ||
*out = *string_array->data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't lose anything for string kernels which always do their own allocation, but it's worth noting that *out
is always a correctly shaped ArrayData. We could also safely write
std::shared_ptr<Array> string_array; | |
RETURN_NOT_OK(builder->Finish(&string_array)); | |
*out = *string_array->data(); | |
RETURN_NOT_OK(builder->FinishInternal(const_cast<std::shared_ptr<ArrayData>*>(&out->array()))); |
@jorisvandenbossche I've implemented this kernel as a binary (arity) kernel, so the input list array *and* the separator input string array can both be an array (see python test). I did not implement the case where the input list is a scalar, and the separator an array, since I don't think that's very common. And note that the kernel is named `binary_join` because it takes string-like and binary-like inputs. Closes apache#8990 from maartenbreddels/ARROW-10959 Lead-authored-by: Antoine Pitrou <antoine@python.org> Co-authored-by: Maarten A. Breddels <maartenbreddels@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
@jorisvandenbossche I've implemented this kernel as a binary (arity) kernel, so the input list array and the separator input string array can both be an array (see python test).
I did not implement the case where the input list is a scalar, and the separator an array, since I don't think that's very common.
And note that the kernel is named
binary_join
because it takes string-like and binary-like inputs.