New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1560: [C++] Kernel implementations for "match" function #5665
Conversation
@psuman65 if you think this is ready to merge, could you please rebase on master and see if that resolves the Appveyor failure (and if not, then see about fixing that)? |
00798fa
to
ebab59f
Compare
CI is "green" now (Appveyor is still backed up, but I saw it pass before the most recent rebase). @emkornfield do you have any further concerns with this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please assign yourself to the JIRA
// of a value from left array in right array. | ||
/// | ||
/// If null occurs in left and right, | ||
/// it returns the index, else returns null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include a concrete example in the comment, see compute::Filter for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please explicitly state that if no match is found then null is yielded
/* in_is_valid= */ {}, | ||
/* member_set_values= */ {2, 1, 2, 3}, | ||
/* member_set_is_valid= */ {}, | ||
/* out_values= */ {0, 1, 0, 1, 0, 2}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected the last entry to be 3, since the index of "3" in `member_set_values is 3. The doccomment for Match should mention that the member set is uniqued first or something
/* member_set_is_valid= */ {}, | ||
/* out_values= */ {0, 1, 0, 1, 0, 2}, | ||
/* out_is_valid= */ {}); | ||
// Nulls in left array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing behavior with an all-null array is valuable, but please also include more tests for partially null arrays.
const std::vector<bool>& member_set_is_valid, | ||
const std::vector<int32_t>& out_values, | ||
const std::vector<bool>& out_is_valid) { | ||
std::shared_ptr<Array> input = _MakeArray<Type, T>(type, in_values, in_is_valid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your tests would be significantly more readable if you used ArrayFromJSON
// Using a visitor create a memo_table_ for the right array | ||
// TODO: Implement for small lists | ||
|
||
template <typename T, typename Scalar> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match's implementation appears to share a lot of code with IsIn. Could you consolidate them by putting shared classes like MemoTableRight into match_internal.h?
|
||
if (left_null_count != 0 && right_null_count == 0) { | ||
for (int64_t i = 0; i < left_data.length; ++i) { | ||
indices_builder_.UnsafeAppendNull(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indices_builder_.UnsafeAppendNull(); | |
indices_builder_.AppendNulls(left_data.length); |
Status ConstructRight(FunctionContext* ctx, const Datum& right) override { | ||
if (right.kind() == Datum::ARRAY) { | ||
const ArrayData& right_data = *right.array(); | ||
right_null_count = right_data.GetNullCount(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right_null_count = right_data.GetNullCount(); | |
right_null_count = right.array()->GetNullCount(); |
const ChunkedArray& right_array = *right.chunked_array(); | ||
for (int i = 0; i < right_array.num_chunks(); i++) { | ||
right_null_count += right_array.chunk(i)->null_count(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
} | |
right_null_count = right.chunked_array()->null_count(); |
} | ||
|
||
private: | ||
int64_t left_null_count{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int64_t left_null_count{}; | |
int64_t left_null_count = 0; |
@psuman65 pinging on this one |
@wesm @emkornfield I apologize, I couldn't find much time to work on Match. Let me know if I need to close the PR for anyone else to work on it. |
@bkietz could you commit your suggestions and make a followup Jira if necessary? Would you be willing to approve and merge given that? |
Co-Authored-By: Benjamin Kietzman <bengilgit@gmail.com>
@psuman65 @nealrichardson I'll commit my suggestions shortly and rebase. Stand by for force push |
No description provided.