Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-1560: [C++] Kernel implementations for "match" function #5665

Closed
wants to merge 7 commits into from

Conversation

psuman65
Copy link
Contributor

No description provided.

@github-actions
Copy link

@nealrichardson
Copy link
Contributor

nealrichardson commented Nov 8, 2019

@psuman65 if you think this is ready to merge, could you please rebase on master and see if that resolves the Appveyor failure (and if not, then see about fixing that)?

@nealrichardson
Copy link
Contributor

CI is "green" now (Appveyor is still backed up, but I saw it pass before the most recent rebase). @emkornfield do you have any further concerns with this PR?

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assign yourself to the JIRA

cpp/src/arrow/compute/kernels/match.h Outdated Show resolved Hide resolved
// of a value from left array in right array.
///
/// If null occurs in left and right,
/// it returns the index, else returns null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include a concrete example in the comment, see compute::Filter for an example

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please explicitly state that if no match is found then null is yielded

/* in_is_valid= */ {},
/* member_set_values= */ {2, 1, 2, 3},
/* member_set_is_valid= */ {},
/* out_values= */ {0, 1, 0, 1, 0, 2},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected the last entry to be 3, since the index of "3" in `member_set_values is 3. The doccomment for Match should mention that the member set is uniqued first or something

/* member_set_is_valid= */ {},
/* out_values= */ {0, 1, 0, 1, 0, 2},
/* out_is_valid= */ {});
// Nulls in left array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing behavior with an all-null array is valuable, but please also include more tests for partially null arrays.

const std::vector<bool>& member_set_is_valid,
const std::vector<int32_t>& out_values,
const std::vector<bool>& out_is_valid) {
std::shared_ptr<Array> input = _MakeArray<Type, T>(type, in_values, in_is_valid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your tests would be significantly more readable if you used ArrayFromJSON

// Using a visitor create a memo_table_ for the right array
// TODO: Implement for small lists

template <typename T, typename Scalar>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match's implementation appears to share a lot of code with IsIn. Could you consolidate them by putting shared classes like MemoTableRight into match_internal.h?


if (left_null_count != 0 && right_null_count == 0) {
for (int64_t i = 0; i < left_data.length; ++i) {
indices_builder_.UnsafeAppendNull();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
indices_builder_.UnsafeAppendNull();
indices_builder_.AppendNulls(left_data.length);

Status ConstructRight(FunctionContext* ctx, const Datum& right) override {
if (right.kind() == Datum::ARRAY) {
const ArrayData& right_data = *right.array();
right_null_count = right_data.GetNullCount();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
right_null_count = right_data.GetNullCount();
right_null_count = right.array()->GetNullCount();

const ChunkedArray& right_array = *right.chunked_array();
for (int i = 0; i < right_array.num_chunks(); i++) {
right_null_count += right_array.chunk(i)->null_count();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}
right_null_count = right.chunked_array()->null_count();

}

private:
int64_t left_null_count{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int64_t left_null_count{};
int64_t left_null_count = 0;

@emkornfield
Copy link
Contributor

@psuman65 did/do you have time to address @bkietz comments?

@wesm
Copy link
Member

wesm commented Feb 21, 2020

@psuman65 pinging on this one

@psuman65
Copy link
Contributor Author

@wesm @emkornfield I apologize, I couldn't find much time to work on Match. Let me know if I need to close the PR for anyone else to work on it.

@nealrichardson
Copy link
Contributor

@bkietz could you commit your suggestions and make a followup Jira if necessary? Would you be willing to approve and merge given that?

@bkietz
Copy link
Member

bkietz commented Mar 12, 2020

@psuman65 @nealrichardson I'll commit my suggestions shortly and rebase. Stand by for force push

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants