[C++] Kernel implementations for "match" function #17575

asfimport · 2017-09-19T21:51:44Z

Match computes, for each value in an array, its position index in a set of categories

match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])

returns [1, 0, 1, null, 0, 1, 0]
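To make the semantics above concrete, here is a minimal standalone sketch in plain C++. It is not the Arrow kernel or its API: `MatchSketch`, `Value`, and `Index` are hypothetical names, `std::nullopt` stands in for a null slot, and the choice to report the first occurrence of a duplicated category is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for Arrow arrays: std::nullopt models a null slot.
using Value = std::optional<std::string>;
using Index = std::optional<std::int64_t>;

// For each input value, return its zero-based position in `categories`.
// Null inputs and values that are not in `categories` both yield null.
std::vector<Index> MatchSketch(const std::vector<Value>& values,
                               const std::vector<std::string>& categories) {
  // Map each category to the position of its first occurrence (assumption).
  std::unordered_map<std::string, std::int64_t> position;
  for (std::size_t i = 0; i < categories.size(); ++i) {
    position.emplace(categories[i], static_cast<std::int64_t>(i));
  }

  std::vector<Index> out;
  out.reserve(values.size());
  for (const Value& v : values) {
    if (!v.has_value()) {
      out.push_back(std::nullopt);  // null in, null out
      continue;
    }
    auto it = position.find(*v);
    if (it == position.end()) {
      out.push_back(std::nullopt);  // no match
    } else {
      out.push_back(it->second);
    }
  }
  return out;
}
```

Calling `MatchSketch` with the values `['a', 'b', 'a', null, 'b', 'a', 'b']` and the categories `['b', 'a']` gives the index array shown in the example.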

Reporter: Wes McKinney / @wesm
Assignee: Preeti Suman / @psuman65

PRs and other links:

GitHub Pull Request #5665

Note: This issue was originally created as ARROW-1560. Please see the migration documentation for further details.

asfimport · 2018-09-15T17:19:55Z

Atri Sharma / @atris:
Can someone please assign this to me?

asfimport · 2018-09-15T17:35:24Z

Uwe Korn / @xhochy:
@atris Assigned you

asfimport · 2018-09-16T13:08:30Z

Atri Sharma / @atris:
Thanks!

asfimport · 2019-01-22T07:27:27Z

Micah Kornfield / @emkornfield:
Is the intended output an array of the smallest numeric type capable of holding the index (or some fixed size)?

asfimport · 2019-01-22T09:22:04Z

Uwe Korn / @xhochy:
@emkornfield This could also be an array of int64, for simplicity or as a starting point.

asfimport · 2019-01-22T15:46:38Z

Francois Saint-Jacques / @fsaintjacques:
What should it return when it doesn't match?

asfimport · 2019-01-22T15:48:42Z

Wes McKinney / @wesm:
R returns null by default

> match(c(1, 2, 3), c(2, 3, 4))
[1] NA  1  2

It's configurable though

> match(c(1, 2, 3), c(2, 3, 4), nomatch=-1)
[1] -1  1  2

asfimport · 2019-02-19T22:47:46Z

Preeti Suman / @psuman65:
Can someone assign this to me?

asfimport · 2019-02-19T23:28:49Z

Wes McKinney / @wesm:
Just made you a Contributor and assigned the issue to you.

Could you describe your implementation approach before you go too far down the rabbit hole? We want to make use of the existing hashing machinery that we are using for the Unique and DictionaryEncode functions.

asfimport · 2019-02-19T23:54:00Z

Micah Kornfield / @emkornfield:
Should there be two implementations? One for small lists (linear scan) and one with a hash table?

asfimport · 2019-03-13T02:50:28Z

Wes McKinney / @wesm:
Yes that sounds right to me
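The two-implementation idea could look roughly like the sketch below. It is an illustration in plain C++, not Arrow code: the function names, the `nomatch` sentinel (borrowed from R's `nomatch` argument), and the cutoff `kLinearScanThreshold` are all assumptions, and a real cutoff would need benchmarking. Indices are zero-based here, unlike R.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <unordered_map>
#include <vector>

// Assumed cutoff below which a linear scan is used; not a measured value.
constexpr std::size_t kLinearScanThreshold = 8;

// Linear scan: O(n * m), but no hash table to build for tiny category lists.
std::vector<std::int64_t> MatchLinear(const std::vector<std::int64_t>& values,
                                      const std::vector<std::int64_t>& categories,
                                      std::int64_t nomatch = -1) {
  std::vector<std::int64_t> out;
  out.reserve(values.size());
  for (std::int64_t v : values) {
    auto it = std::find(categories.begin(), categories.end(), v);
    out.push_back(it == categories.end()
                      ? nomatch
                      : static_cast<std::int64_t>(
                            std::distance(categories.begin(), it)));
  }
  return out;
}

// Hash table: O(n + m) on average; keeps the first position of each category.
std::vector<std::int64_t> MatchHashed(const std::vector<std::int64_t>& values,
                                      const std::vector<std::int64_t>& categories,
                                      std::int64_t nomatch = -1) {
  std::unordered_map<std::int64_t, std::int64_t> position;
  for (std::size_t i = 0; i < categories.size(); ++i) {
    position.emplace(categories[i], static_cast<std::int64_t>(i));
  }
  std::vector<std::int64_t> out;
  out.reserve(values.size());
  for (std::int64_t v : values) {
    auto it = position.find(v);
    out.push_back(it == position.end() ? nomatch : it->second);
  }
  return out;
}

// Pick an implementation based on the size of the category list.
std::vector<std::int64_t> MatchDispatch(const std::vector<std::int64_t>& values,
                                        const std::vector<std::int64_t>& categories,
                                        std::int64_t nomatch = -1) {
  return categories.size() <= kLinearScanThreshold
             ? MatchLinear(values, categories, nomatch)
             : MatchHashed(values, categories, nomatch);
}
```

Both paths return the same result for the same input, so the dispatch only affects performance.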

asfimport · 2019-04-04T15:33:37Z

Preeti Suman / @psuman65:
For the match (and isin) compute kernels, if there are nulls in the left or right input array:

a) Do we need to match null with null or ignore null completely?

Example:

match(['a', 'b', null], ['a', 'c', null])

Expected output [0, null, 2]

b) If we need to compare, what is the suggested way to handle nulls when traversing the array with VisitValue and VisitNull (using ArrayDataVisitor)?

asfimport · 2019-04-04T18:44:24Z

Wes McKinney / @wesm:
R does match NA (null-ish values) so that should probably be the default

> match(c(NA, NA, NA, NA), NA)
[1] 1 1 1 1

On the second question, I'm not sure. We aren't accounting for nulls in other hash-related functions like ValueCounts. See ARROW-4787. When you populate the hash table with the right-hand-side values, you can set a flag recording whether null was present (and at what position), and then use this when VisitNull is invoked (if using ArrayDataVisitor turns out to be the most efficient method for this, which I'm also not sure about).
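A rough sketch of the null-flag idea, in plain C++ rather than Arrow's hashing machinery or ArrayDataVisitor: `MatchWithNulls`, `Value`, and `Index` are hypothetical names, and `std::nullopt` models a null slot. The position of the first null on the right-hand side is recorded while the table is populated, and that position is emitted whenever a null is seen on the left.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

using Value = std::optional<std::string>;   // std::nullopt models a null slot
using Index = std::optional<std::int64_t>;  // std::nullopt models "no match"

std::vector<Index> MatchWithNulls(const std::vector<Value>& left,
                                  const std::vector<Value>& right) {
  std::unordered_map<std::string, std::int64_t> table;
  Index null_position;  // stays empty if the right-hand side has no null

  // Populate the table from the right-hand side, noting the first null seen.
  for (std::size_t i = 0; i < right.size(); ++i) {
    const auto pos = static_cast<std::int64_t>(i);
    if (right[i].has_value()) {
      table.emplace(*right[i], pos);  // keeps the first occurrence
    } else if (!null_position.has_value()) {
      null_position = pos;
    }
  }

  std::vector<Index> out;
  out.reserve(left.size());
  for (const Value& v : left) {
    if (!v.has_value()) {
      // Plays the role of a VisitNull callback: null matches null if present.
      out.push_back(null_position);
    } else {
      // Plays the role of a VisitValue callback: ordinary table lookup.
      auto it = table.find(*v);
      if (it == table.end()) {
        out.push_back(std::nullopt);
      } else {
        out.push_back(it->second);
      }
    }
  }
  return out;
}
```

With the left values `['a', 'b', null]` and the right values `['a', 'c', null]`, this yields `[0, null, 2]`, the expected output from the question above.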

asfimport · 2020-03-12T21:54:08Z

Ben Kietzman / @bkietz:
Issue resolved by pull request #5665

asfimport closed this as completed Mar 12, 2020

asfimport added this to the 0.17.0 milestone Jan 11, 2023
