[C++] Kernel implementations for "match" function #17575
Comments
Atri Sharma / @atris: |
Atri Sharma / @atris: |
Micah Kornfield / @emkornfield: |
Uwe Korn / @xhochy: |
Francois Saint-Jacques / @fsaintjacques: |
Wes McKinney / @wesm:
> match(c(1, 2, 3), c(2, 3, 4))
[1] NA 1 2
It's configurable though:
> match(c(1, 2, 3), c(2, 3, 4), nomatch=-1)
[1] -1 1 2
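The R behavior shown above can be sketched in plain C++ as a hash-table lookup. This is only an illustration of the semantics, not the Arrow kernel: `MatchSketch` is a hypothetical name, it uses `std::unordered_map` rather than Arrow's hashing machinery, and it returns 0-based positions (R's are 1-based) with a configurable `nomatch` sentinel.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not an Arrow API): for each element of `values`,
// return the 0-based position of its first occurrence in `categories`,
// or `nomatch` when the element does not occur there.
std::vector<int64_t> MatchSketch(const std::vector<int64_t>& values,
                                 const std::vector<int64_t>& categories,
                                 int64_t nomatch = -1) {
  std::unordered_map<int64_t, int64_t> first_position;
  first_position.reserve(categories.size());
  for (int64_t i = 0; i < static_cast<int64_t>(categories.size()); ++i) {
    // emplace does not overwrite, so a duplicate category keeps its first position.
    first_position.emplace(categories[i], i);
  }

  std::vector<int64_t> out;
  out.reserve(values.size());
  for (int64_t v : values) {
    auto it = first_position.find(v);
    out.push_back(it == first_position.end() ? nomatch : it->second);
  }
  return out;
}
```

With the inputs from the R session, `MatchSketch({1, 2, 3}, {2, 3, 4})` yields `{-1, 0, 1}`, the 0-based counterpart of R's `-1 1 2`.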
Preeti Suman / @psuman65: |
Wes McKinney / @wesm: Could you describe your implementation approach before you go too far down the rabbit hole? We want to make use of the existing hashing machinery that we are using for the |
Micah Kornfield / @emkornfield: |
Wes McKinney / @wesm: |
Preeti Suman / @psuman65:
a) Do we need to match null with null, or ignore null completely? Example: match(['a', 'b', null], ['a', 'c', null]). Expected output: [0, null, 2]
b) If we need to compare, what is the suggested way to traverse nulls if we use VisitValue and VisitNull (via ArrayDataVisitor) for the array?
Wes McKinney / @wesm:
> match(c(NA, NA, NA, NA), NA)
[1] 1 1 1 1
On the second question, I'm not sure. We aren't accounting for nulls in other hash-related functions like ValueCounts. See ARROW-4787. When you populate the hash table with the right-hand-side values, you can set a flag recording whether a null was present (and at what position), and then use it when VisitNull is invoked (if using ArrayDataVisitor turns out to be the most efficient method for this, which I'm also not sure about).
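The null-flag idea described above can be sketched as follows, assuming C++17. Everything here is illustrative: `MatchWithNullsSketch` and `MaybeString` are hypothetical names, `std::optional` stands in for Arrow's validity bitmap, and the two branches of the lookup loop only mimic what VisitNull and VisitValue callbacks might do. The sketch records the position of the first null on the right-hand side while building the table, then reuses it whenever a null is encountered on the left-hand side.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: a null slot is modeled as an empty std::optional.
using MaybeString = std::optional<std::string>;

// Hypothetical helper (not an Arrow API): null-aware match that returns
// 0-based positions, with an empty optional meaning "no match".
std::vector<std::optional<int64_t>> MatchWithNullsSketch(
    const std::vector<MaybeString>& values,
    const std::vector<MaybeString>& categories) {
  std::unordered_map<std::string, int64_t> first_position;
  std::optional<int64_t> null_position;  // set only if categories contain a null

  for (int64_t i = 0; i < static_cast<int64_t>(categories.size()); ++i) {
    if (!categories[i].has_value()) {
      if (!null_position.has_value()) null_position = i;  // remember first null
    } else {
      first_position.emplace(*categories[i], i);  // keeps first occurrence
    }
  }

  std::vector<std::optional<int64_t>> out;
  out.reserve(values.size());
  for (const MaybeString& v : values) {
    if (!v.has_value()) {
      // "VisitNull" case: a null matches only if the categories had a null.
      out.push_back(null_position);
    } else {
      // "VisitValue" case: ordinary hash lookup.
      auto it = first_position.find(*v);
      if (it == first_position.end()) {
        out.push_back(std::nullopt);
      } else {
        out.push_back(it->second);
      }
    }
  }
  return out;
}
```

Applied to the example from the question, `['a', 'b', null]` matched against `['a', 'c', null]` gives `[0, null, 2]`. If nulls should instead be ignored, the only change needed is to push an empty optional in the null branch.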
Ben Kietzman / @bkietz: |
Match computes a position index array from an array of values into a set of categories.
Reporter: Wes McKinney / @wesm
Assignee: Preeti Suman / @psuman65
PRs and other links:
Note: This issue was originally created as ARROW-1560. Please see the migration documentation for further details.