[C++] Implement arithmetic kernels on List(number) #33042

Open
asfimport opened this issue Sep 22, 2022 · 3 comments

Comments

@asfimport

E.g. rounding the values inside a list(float64()) array, similar to a map or foreach over each list's elements.

Reporter: Adam Lippai / @alippai

Note: This issue was originally created as ARROW-17820. Please see the migration documentation for further details.

@asfimport

Weston Pace / @westonpace:
Could we do this by running the kernel on the child array? For example, a LIST array consists of a parent list array with one child array, so we could run the unary kernel on that child array and keep the parent's offsets. I think it would only work for unary kernels, however.
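A minimal sketch of that idea, in plain Python rather than Arrow C++. The `ListArray` class, `map_unary_on_child`, and `round_kernel` are hypothetical names invented for illustration; the class is a toy stand-in for a list array (offsets plus one flat child array) and ignores nulls and sliced arrays.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ListArray:
    """Toy stand-in for a list array: offsets into one flat child array."""
    offsets: List[int]   # length is number of lists + 1
    values: List[float]  # the single flat child array

    def to_pylist(self) -> List[List[float]]:
        return [self.values[self.offsets[i]:self.offsets[i + 1]]
                for i in range(len(self.offsets) - 1)]


def map_unary_on_child(arr: ListArray,
                       kernel: Callable[[List[float]], List[float]]) -> ListArray:
    """Run a unary element-wise kernel on the child values and reuse the offsets."""
    new_values = kernel(arr.values)
    # A unary scalar kernel must return exactly one output per input value,
    # otherwise the original offsets would no longer be valid.
    if len(new_values) != len(arr.values):
        raise ValueError("kernel changed the number of values")
    return ListArray(list(arr.offsets), new_values)


def round_kernel(values: List[float]) -> List[float]:
    """Hypothetical unary kernel: round every value to the nearest integer."""
    return [float(round(v)) for v in values]


lists = ListArray(offsets=[0, 2, 2, 4], values=[1.4, 2.6, 3.2, 4.9])
rounded = map_unary_on_child(lists, round_kernel)
```

Because the kernel only sees the flat values, the list structure (including the empty list in the middle) is untouched.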

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
It would be nice to have a way to apply all unary scalar kernels to list arrays (indeed by applying them to the single child array of flat values).

I think in SQL one could do this with a subquery with unnesting and aggregating again (eg https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#creating_arrays_from_subqueries, although that example is actually not a unary kernel but a binary).

Such an approach doesn't really fit our kernels / Acero, I think. One option could be to have a generic kernel to "map" another kernel on the list values. Like

list_map_function(list_array, "kernel_name", FunctionOptions)

where you can pass the function name you want to apply, and a FunctionOptions object matching the kernel. Would something like this be possible technically?
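To make the proposal concrete, here is a rough sketch in plain Python. Nothing in it is an existing Arrow API: the `KERNELS` registry, the options dictionary standing in for FunctionOptions, and the offsets/values arguments standing in for a list array are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical registry: kernel name -> function over flat values plus options.
KERNELS: Dict[str, Callable[[List[Any], Dict[str, Any]], List[Any]]] = {
    "round": lambda values, opts: [round(v, opts.get("ndigits", 0)) for v in values],
    # str.lower is used as a stand-in; it is not restricted to ASCII.
    "ascii_lower": lambda values, opts: [v.lower() for v in values],
}


def list_map_function(offsets: List[int],
                      values: List[Any],
                      kernel_name: str,
                      options: Optional[Dict[str, Any]] = None) -> List[List[Any]]:
    """Look up a kernel by name, apply it to the flat values, rebuild the lists."""
    if kernel_name not in KERNELS:
        raise KeyError(f"unknown kernel: {kernel_name!r}")
    new_values = KERNELS[kernel_name](values, options or {})
    return [new_values[offsets[i]:offsets[i + 1]]
            for i in range(len(offsets) - 1)]


rounded = list_map_function([0, 2, 3], [1.234, 5.678, 9.01], "round", {"ndigits": 1})
lowered = list_map_function([0, 1, 3], ["Ab", "C", "dE"], "ascii_lower")
```

Passing the name and options separately keeps the mapping function generic: it does not need to know anything about the kernel beyond how to look it up and call it.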

Another option could be to register the list type directly for unary kernels. In many cases it is unambiguous that the function should be applied to each value in the list rather than to each list, for example round(list<float>) or ascii_lower(list<string>).

@asfimport

Weston Pace / @westonpace:

> Such an approach doesn't really fit our kernels / Acero, I think. One option could be to have a generic kernel to "map" another kernel on the list values [...] where you can pass the function name you want to apply, and a FunctionOptions object matching the kernel. Would something like this be possible technically?

Yes, I think that should be possible for unary kernels. Mapping a single kernel (as opposed to a single expression) might be a bit limiting, though maybe it isn't so bad. For example, what if a user wants to do something like map(lambda f: f.upper() * 2, ["a", "b", "c"])?

Another thing is that it should be valid to use n-ary functions too provided the other arguments are scalars. This discussion has come up in Substrait with respect to lambdas (substrait-io/substrait#349).
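One way to picture the n-ary case is to fix the extra scalar arguments up front, which leaves a unary function that can be mapped over the flat values. The sketch below is plain Python with hypothetical helper names (`bind_scalars`, `map_list`); it is not how Arrow or Substrait express this.

```python
import operator
from typing import Any, Callable, List


def bind_scalars(kernel: Callable[..., Any],
                 *scalars: Any) -> Callable[[List[Any]], List[Any]]:
    """Fix every argument after the first to a scalar, yielding a unary kernel."""
    def unary(values: List[Any]) -> List[Any]:
        return [kernel(v, *scalars) for v in values]
    return unary


def map_list(offsets: List[int],
             values: List[Any],
             unary: Callable[[List[Any]], List[Any]]) -> List[List[Any]]:
    """Apply a unary kernel to the flat values and rebuild the lists."""
    new_values = unary(values)
    return [new_values[offsets[i]:offsets[i + 1]]
            for i in range(len(offsets) - 1)]


# A binary function (add) becomes unary once its second argument is a scalar.
add_ten = bind_scalars(operator.add, 10)
result = map_list([0, 1, 3], [1, 2, 3], add_ten)
```

This only works because the other arguments are scalars; a second array argument would need its own alignment with the list offsets.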

Perhaps the "map function" for List<T> could be an expression bound to a schema of "{item: T}" (e.g. so you could do field_ref(0) or field_ref("item")).
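As a rough illustration of an expression bound to a schema of {item: T}, the sketch below builds a tiny expression tree and evaluates it once per list element. The `FieldRef` and `Call` classes, the `FUNCTIONS` table, and `list_map_expression` are all hypothetical, and a real implementation would presumably evaluate the expression over whole batches rather than value by value.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FieldRef:
    """Reference to a field of the bound schema, here always "item"."""
    name: str


@dataclass(frozen=True)
class Call:
    """Call of a named function on sub-expressions or scalar literals."""
    function: str
    args: Tuple[Any, ...]


# Hypothetical function table used by the toy evaluator.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "upper": lambda s: s.upper(),
    "repeat": lambda s, n: s * n,
}


def evaluate(expr: Any, row: Dict[str, Any]) -> Any:
    """Evaluate an expression against one row of the {item: T} schema."""
    if isinstance(expr, FieldRef):
        return row[expr.name]
    if isinstance(expr, Call):
        return FUNCTIONS[expr.function](*(evaluate(a, row) for a in expr.args))
    return expr  # anything else is treated as a scalar literal


def list_map_expression(offsets: List[int],
                        values: List[Any],
                        expr: Any) -> List[List[Any]]:
    """Evaluate the expression for every list element, keeping the list layout."""
    new_values = [evaluate(expr, {"item": v}) for v in values]
    return [new_values[offsets[i]:offsets[i + 1]]
            for i in range(len(offsets) - 1)]


# Equivalent of map(lambda f: f.upper() * 2, ...) applied inside each list.
expr = Call("repeat", (Call("upper", (FieldRef("item"),)), 2))
result = list_map_expression([0, 2, 3], ["a", "b", "c"], expr)
```

An expression composes several functions and scalar arguments in one pass, which is the flexibility a single kernel name cannot offer.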

Though if the map function is an expression then a kernel would have to execute an entire expression which may or may not be doable (I've reached the limit of my imagination for a Friday :)
