[C++/Python] Kernel for SetItem(IntegerArray, values) ("replace_with_indices") #25505
Comments
Niranda Perera / @nirandaperera: |
Joris Van den Bossche / @jorisvandenbossche: |
Niranda Perera / @nirandaperera: |
Niranda Perera / @nirandaperera: Just to be clear, an example of this would be,
arr1 = [a, b, c, d]
arr2 = [l, m, n, o, p]
to_replace = [0, 2, 3]
res = replace_by_index(to_replace, arr1, arr2)
# res --> [l, b, n, o]
What is a good name for the kernel? I'm not sure about replace_by_index because Arrays are immutable. Any suggestions? Can we assume that 'to_replace' is a non-null Int64Array? And should we enforce 'to_replace' being sorted (IMO this would be more efficient)? What is the arity of this kernel? Unary or binary?
|
Joris Van den Bossche / @jorisvandenbossche:
That might be the easiest for now. Or otherwise we can say nulls in
Hmm, not sure. Also for
Ternary? |
Antoine Pitrou / @pitrou: |
Antoine Pitrou / @pitrou:
arr1 = [a, b, c, d]
arr2 = [l, m, n, o, p]
to_replace = [0, 2, 3]
res = replace_by_index(to_replace, arr1, arr2)
# res --> [l, b, m, n]
@xhochy What do you think? |
Niranda Perera / @nirandaperera:
def replace_by_index(to_replace: Array, arr1: Array, arr2: Array) -> Array:
    out = arr1  # copy array
    for i in to_replace:
        out[i] = arr2[i]
    return out
It's like an if-else mask, but the mask's set bits are encoded in an array (see ARROW-10640). |
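To make the "if-else mask" reading above concrete, here is a minimal sketch in plain Python that first expands the index list into a boolean mask and then selects element-wise. The helper names `indices_to_mask` and `replace_by_index_masked` are hypothetical and used only for illustration; they are not Arrow APIs, and lists stand in for Arrays.

```python
def indices_to_mask(indices, length):
    """Expand a list of positions into a boolean mask of the given length."""
    mask = [False] * length
    for i in indices:
        mask[i] = True
    return mask


def replace_by_index_masked(to_replace, arr1, arr2):
    """Pick arr2[i] where i is listed in to_replace, otherwise arr1[i]."""
    mask = indices_to_mask(to_replace, len(arr1))
    # Build a new list; arr1 is left untouched, mirroring immutable Arrays.
    return [arr2[i] if mask[i] else arr1[i] for i in range(len(arr1))]


arr1 = ["a", "b", "c", "d"]
arr2 = ["l", "m", "n", "o", "p"]
res = replace_by_index_masked([0, 2, 3], arr1, arr2)
print(res)  # ['l', 'b', 'n', 'o']
```

Under this reading `arr2` is indexed by position in `arr1`, so it must be at least as long as `arr1`; this reproduces the `[l, b, n, o]` result from the first example in the thread.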
Antoine Pitrou / @pitrou: |
Niranda Perera / @nirandaperera: |
>>> a = np.array(["a", "b", "c", "d"])
>>> b = np.array(["m", "n", "o"])
>>> a[[0,2,3]] = b
>>> a
array(['m', 'b', 'n', 'o'], dtype='<U1') |
Antoine Pitrou / @pitrou: |
Niranda Perera / @nirandaperera: |
Antoine Pitrou / @pitrou: |
Niranda Perera / @nirandaperera: |
The intention was to support pandas' |
Joris Van den Bossche / @jorisvandenbossche: @nirandaperera to use your pseudo-code, the expected behaviour looks like:
def replace_by_index(to_replace: Array, arr1: Array, arr2: Array) -> Array:
    out = arr1  # copy array
    for idx, val in zip(to_replace, arr2):
        out[idx] = val
    return out
so where |
Niranda Perera / @nirandaperera: |
Niranda Perera / @nirandaperera: def replace_by_index(source, indices, values) array(k) --> array of size k, M <= N |source|indices|values|output| |
Joris Van den Bossche / @jorisvandenbossche: |
Niranda Perera / @nirandaperera: |
Niranda Perera / @nirandaperera: |
Joris Van den Bossche / @jorisvandenbossche: The related ARROW-9430 has been done by now, and there we went with a name |
Todd Farmer / @toddfarmer: |
We should have a kernel that allows overriding the values of an array using an integer array as the indexer and a scalar or array of equal length as the values.
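As a rough illustration of the behaviour described above, the following plain-Python sketch accepts either a scalar or a sequence of the same length as the indexer and returns a new list instead of mutating its input. The name `set_item_by_indices` and its argument order are assumptions made for this example, not the kernel's actual name or signature, and nulls in the indexer are not handled.

```python
from collections.abc import Sequence


def set_item_by_indices(source, indices, values):
    """Return a copy of source with source[indices[k]] set to values[k].

    values may be a single scalar (broadcast to every index) or a
    sequence with exactly one entry per index.
    """
    out = list(source)  # copy, so the input stays unchanged
    # Strings are treated as scalars here, not as sequences of characters.
    is_array = isinstance(values, Sequence) and not isinstance(values, (str, bytes))
    if is_array:
        if len(values) != len(indices):
            raise ValueError("values must have the same length as indices")
        pairs = zip(indices, values)
    else:
        pairs = ((i, values) for i in indices)
    for idx, val in pairs:
        out[idx] = val
    return out


by_array = set_item_by_indices(["a", "b", "c", "d"], [0, 2, 3], ["l", "m", "n"])
by_scalar = set_item_by_indices(["a", "b", "c", "d"], [0, 2, 3], "x")
print(by_array)   # ['l', 'b', 'm', 'n']
print(by_scalar)  # ['x', 'b', 'x', 'x']
```

The k-th value goes to the k-th index, which matches the `[l, b, m, n]` result and the NumPy `a[[0,2,3]] = b` example from the comments. A real kernel would also have to decide how to treat out-of-range, unsorted, or duplicate indices, which this sketch leaves to Python's list semantics.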
Reporter: Uwe Korn / @xhochy
Related issues:
Note: This issue was originally created as ARROW-9431. Please see the migration documentation for further details.